Over a year and a half ago, I wrote up a post about the rights and responsibilities due any engineer at Honeycomb. At the time we were in the middle of a growth spurt, had just hired several new engineers, and I was in the process of turning day-to-day engineering management over to Emily. Writing things down helped me codify what I actually cared about, and helped keep us true to our principles as we grew.
Tacked on to the end of the post was a list of manager responsibilities, almost as an afterthought. Many people protested, “don’t managers get any rights??” (and naturally I snapped “NO! hahahahahha”)
I always intended to circle back and write a followup post with the rights and responsibilities for managers. But it wasn’t til recently, as we are gearing up for another hiring spurt and have expanded our managerial ranks, that it really felt like its time had come.
The time has come, the time is now, as Marvin K. Mooney once said. I’ve added the bill of rights, and updated and expanded the list of responsibilities. Thanks to Emily Nakashima for co-writing it with me.
Manager’s Bill of Rights
You shall receive honest, courageous, timely feedback about yourself and your team, from your reports, your peers, and your leaders. (No one is exempt from feeding the hungry hungry feedback hippo! NOO ONNEEEE!) 🦛🦛🦛🦛🦛🦛🦛
Management will be treated with the same respect and importance as individual work.
You have the final say over hiring, firing, and leveling decisions for your team. It is expected that you solicit feedback from your team and peers and drive consensus where possible. But in the end, the say is yours.
Management can be draining, difficult work, even at places that do it well. You will get tactical, strategic, and emotional support from other managers.
You cannot take care of others unless you first practice self-care. You damn well better take vacations. (Real ones.)
You have the right to personal development, career progression, and professional support. We will retain a leadership coach for you.
You do not have to be a manager if you do not want to. No one will ever pressure you.
Manager’s Responsibilities
Recruit and hire and train your team. Foster a sense of solidarity and “teaminess” as well as real emotional safety.
Cultivate an inclusive culture and redistribute opportunity. Fuck a pedigree. Resist monoculture.
Care for the people on your team. Support them in their career trajectory, personal goals, work/life balance, and inter- and intra-team dynamics.
Keep an eye out for people on other teams who aren’t getting the support they need, and work with your leadership and manager peers to fix the situation.
Give feedback early and often. Receive feedback gracefully. Always say the hard things, but say them with love.
Move us relentlessly forward, staying alert for rabbit-holing and work that doesn’t contribute to our goals. Ensure redundancy/coverage of critical areas.
Own the planning process for your team, and be accountable for the goals you set. Allocate resources by communicating priorities and requesting support. Add focus or urgency where needed.
Own your time and attention. Be accessible. Actively manage your calendar. Try not to make your emotions everyone else’s problems (but do lean on your own manager and your peers for support).
Make your own personal growth and self-care a priority. Model the values and traits we want employees to pattern themselves after.
I just read this piece, which is basically a very long subtweet about my Friday deploy threads. Go on and read it: I’ll wait.
Here’s the thing. After getting over some of the personal gibes (smug optimism? literally no one has ever accused me of being an optimist, kind sir), you may be expecting me to issue a vigorous rebuttal. But I shan’t. Because we are actually in violent agreement, almost entirely.
I have repeatedly stressed the following points:
I want to make engineers’ lives better, by giving them more uninterrupted weekends and nights of sleep. This is the goal that underpins everything I do.
Anyone who ships code should develop and exercise good engineering judgment about when to deploy, every day of the week.
Every team has to make their own determination about which policies and norms are right given their circumstances and risk tolerance.
A policy of “no Friday deploys” may be reasonable for now but should be seen as a smell, a sign that your deploys are risky. It is also likely to make things WORSE for you, not better, by causing you to adopt other risky practices (e.g. elongating the interval between merge and deploy, batching changes up in a single deploy)
This has been the most frustrating thing about this conversation: that a) I am not in fact the absolutist y’all are arguing against, and b) MY number one priority is engineers and their work/life balance. Which makes this particularly aggravating:
Lastly there is some strange argument that choosing not to deploy on Friday “Shouldn’t be a source of glee and pride”. That one I haven’t figured out yet, because I have always had a lot of glee and pride in being extremely (overly?) protective of the work/life balance of the engineers who either work for me, or with me. I don’t expect that to change.
Hold up. Did you catch that clever little logic switcheroo? You defined “not deploying on Friday” as being a priori synonymous with “protecting the work/life balance of engineers”. This is how I know you haven’t actually grasped my point, and are arguing against a straw man. My entire point is that the behaviors and practices associated with blocking Friday deploys are in fact hurting your engineers.
I, too, take a lot of glee and pride in being extremely, massively, yes even OVERLY protective of the work/life balance of the engineers who either work for me, or with me.
AND THAT IS WHY WE DEPLOY ON FRIDAYS.
Because it is BETTER for them. Because it is part of a deploy ecosystem which results in them being woken up less and having fewer weekends interrupted overall than if I had blocked deploys on Fridays.
It’s not about Fridays. It’s about having a healthy ecosystem and feedback loop where you trust your deploys, where deploys aren’t a big deal, and they never cause engineers to have to work outside working hours. And part of how you get there is by not artificially blocking off a big bunch of the week and not deploying during that time, because that breaks up your virtuous feedback loop and causes your deploys to be much more likely to fail in terrible ways.
The other thing that annoys me is when people say, primly, “you can’t guarantee any deploy is safe, but you can guarantee people have plans for the weekend.”
Know what else you can guarantee? That people would like to sleep through the fucking night, even on weeknights.
When I hear people say this all I hear is that they don’t care enough to invest the time to actually fix their shit so it won’t wake people up or interrupt their off time, seven days a week. Enough with the virtue signaling already.
You cannot have it both ways, where you block off a bunch of undeployable time AND you have robust, resilient, swift deploys. Somehow I keep not getting this core point across to a substantial number of very intelligent people. So let me try a different way.
Let’s try telling a story.
A tale of two startups
Here are two case studies.
Company X is a three-year-old startup. It is a large, fast-growing multi-tenant platform on a large distributed system with spiky traffic, lots of user-submitted data, and a very green database. Company X deploys the API about once per day, and does a global deploy of all services every Tuesday. Deploys often involve some firefighting and a rollback or two, and Tuesdays often involve deploying and reverting all day (sigh).
Pager volume at Company X isn’t the worst, but usually involves getting woken up a couple times a week, and there are deploy-related alerts after maybe a third of deploys, which then need to be triaged to figure out whose diff was the cause.
Company Z is a three-year-old startup. It is a large, fast-growing multi-tenant platform on a large distributed system with spiky traffic, lots of user-submitted data, and a very green house-built distributed storage engine. Company Z automatically triggers a deploy within 30 minutes of a merge to master, for all services impacted by that merge. Developers at company Z practice observability-driven deployment, where they instrument all changes, ask “how will I know if this change doesn’t work?” during code review, and have a muscle memory habit of checking to see if their changes are working as intended or not after they merge to master.
Deploys rarely result in the pager going off at Company Z; most problems are caught visually by the engineer and reverted or fixed before any paging alert can fire. Pager volume consists of roughly one alert per week outside of working hours, and no one is woken up more than a couple times per year.
Same damn problem, better damn solutions.
If it wasn’t extremely obvious, these companies are my last two jobs, Parse (company X, from 2012-2016) and Honeycomb (company Z, from 2016-present).
They have a LOT in common. Both are services for developers, both are platforms, both are running highly elastic microservices written in golang, both get lots of spiky traffic and store lots of user-defined data in a young, homebrewed columnar storage engine. They were even built by some of the same people (I built infra for both, and they share four more of the same developers).
At Parse, deploys were run by ops engineers because of how common it was for there to be some firefighting involved. We discouraged people from deploying on Fridays, we locked deploys around holidays and big launches. At Honeycomb, none of these things are true. In fact, we literally can’t remember a time when it was hard to debug a deploy-related change.
What’s the difference between Company X and Company Z?
So: what’s the difference? Why are the two companies so dramatically different in the riskiness of their deploys, and the amount of human toil it takes to keep them up?
I’ve thought about this a lot. It comes down to three main things.
1. Observability.
I think that I’ve been reluctant to hammer this home as much as I ought to, because I’m exquisitely sensitive about sounding like an obnoxious vendor trying to sell you things. 😛 (Which has absolutely been detrimental to my argument.)
When I say observability, I mean in the precise technical definition as I laid out in this piece: with high cardinality, arbitrarily wide structured events, etc. Metrics and other generic telemetry will not give you the ability to do the necessary things, e.g. break down by build id in combination with all your other dimensions to see the world through the lens of your instrumentation. Here, for example, are all the deploys for a particular service last Friday:
Each shaded area is the duration of an individual deploy: you can see the counters for each build id as the new versions replace the old ones.
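To make that concrete, here is a minimal sketch in Go of what emitting one arbitrarily wide, structured event per request can look like, with build_id as just another queryable field. The field names and the JSON-to-stdout transport are illustrative assumptions on my part, not Honeycomb’s actual SDK:

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// event is one arbitrarily wide, structured event describing a single
// unit of work. Add as many fields as you want; high-cardinality values
// like build_id and user_id are exactly the point.
type event map[string]interface{}

// emit writes the event as one JSON line. A real pipeline would ship it
// to your observability backend instead of stdout.
func emit(e event) {
	json.NewEncoder(os.Stdout).Encode(e)
}

func handleRequest(userID, endpoint string) {
	start := time.Now()
	// ... do the actual work here ...
	emit(event{
		"timestamp":   start.UTC(),
		"build_id":    os.Getenv("BUILD_ID"), // ties every event to a specific deploy
		"endpoint":    endpoint,
		"user_id":     userID,
		"duration_ms": time.Since(start).Milliseconds(),
		"error":       nil,
	})
}

func main() {
	handleRequest("user-42", "/api/query")
}
```

Once every event carries a build_id, “show me errors broken down by build id, in combination with everything else” becomes an ordinary query instead of an archaeology project.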
2. Observability-driven development.
This is cultural as well as technical. By this I mean instrumenting a couple steps ahead of yourself as you are developing and shipping code. I mean making a cultural practice of asking each other “how will you know if this is broken?” during code review. I mean always going and looking at your service through the lens of your instrumentation after every diff you ship. Like muscle memory.
3. Single merge per deploy.
The number one thing you can do to make your deploys intelligible, other than observability and instrumentation, is this: deploy one changeset at a time, as swiftly as possible after it is merged to master. NEVER glom multiple changesets into a single deploy — that’s how you get into a state where you aren’t sure which change is at fault, or who to escalate to, or if it’s an intersection of multiple changes, or if you should just start bisecting blindly to try and isolate the source of the problem. THIS is what turns deploys into long, painful marathons.
And NEVER wait hours or days to deploy after the change is merged. As a developer, you know full well how this goes. After you merge to master one of two things will happen. Either:
you promptly pull up a window to watch your changes roll out, checking on your instrumentation to see if it’s doing what you intended it to or if anything looks weird, OR
you close the project and open a new one.
When you switch to a new project, your brain starts rapidly evicting all the rich context about what you had intended to do and overwriting it with all the new details about the new project.
Whereas if you shipped that changeset right after merging, then you can WATCH it roll out. And 80-90% of all problems can be, should be caught right here, before your users ever notice — before alerts can fire off and page you. If you have the ability to break down by build id, you can zoom in on any errors that happen to arise, see exactly which dimensions all the errors have in common and how they differ from the healthy requests, and see exactly what the context is for any erroring requests.
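As a toy illustration of what that build-id breakdown reduces to once your events carry one (every name and type here is invented for the example):

```go
package main

import "fmt"

// reqEvent is a trimmed-down view of the wide events described above.
type reqEvent struct {
	BuildID string
	Error   bool
}

// errorRateByBuild computes the error rate per build id, which is the
// heart of comparing an incoming deploy against the last stable one.
func errorRateByBuild(events []reqEvent) map[string]float64 {
	totals := map[string]int{}
	errors := map[string]int{}
	for _, e := range events {
		totals[e.BuildID]++
		if e.Error {
			errors[e.BuildID]++
		}
	}
	rates := map[string]float64{}
	for build, n := range totals {
		rates[build] = float64(errors[build]) / float64(n)
	}
	return rates
}

func main() {
	events := []reqEvent{
		{"build-1041", false}, {"build-1041", false},
		{"build-1042", true}, {"build-1042", false},
	}
	for build, rate := range errorRateByBuild(events) {
		fmt.Printf("%s: %.0f%% errors\n", build, rate*100)
	}
}
```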
Healthy feedback loops == healthy systems.
That tight, short feedback loop of build/ship/observe is the beating heart of a healthy, observable distributed system that can be run and maintained by human beings, without it sucking your life force or ruining your sleep schedule or will to live.
Most engineers have never worked on a system like this. Most engineers have no idea what a yawning chasm exists between a healthy, tractable system and where they are now. Most engineers have no idea what a difference observability can make. Most engineers are far more familiar with spending 40-50% of their week fumbling around in the dark, trying to figure out where in the system the problem they are trying to fix actually lives, and what kind of context they need to reproduce it.
Most engineers are dealing with systems where they blindly shipped bugs with no observability, and reports about those bugs started to trickle in over the next hours, days, weeks, months, or years. Most engineers are dealing with systems that are obfuscated and obscure, systems which are tangled heaps of bugs and poorly understood behavior for years compounding upon years on end.
That’s why it doesn’t seem like such a big deal to you to break up that tight, short feedback loop. That’s why it doesn’t fill you with horror to think of merging on Friday morning and deploying on Monday. That’s why it doesn’t appall you to clump together all the changes that happen to get merged between Friday and Monday and push them out in a single deploy.
It just doesn’t seem that much worse than what you normally deal with. You think this raging trash fire is, unfortunately … normal.
How realistic is this, though, really?
Maybe you’re rolling your eyes at me now. “Sure, Charity, that’s nice for you, on your brand new shiny system. Ours has years of technical debt. It’s unrealistic to hold us to the same standard.”
Yeah, I know. It is much harder to dig yourself out of a hole than it is to not create a hole in the first place. No doubt about that.
Harder, yes. But not impossible.
I have done it.
Parse in 2013 was a trash fire. It woke us up every night, we spent a lot of time stabbing around in the dark after every deploy. But after we got acquired by Facebook, after we started shipping some data sets into Scuba, after (in retrospect, I can say) we had event-level observability for our systems, we were able to start paying down that debt and fixing our deploy systems.
We started hooking up that virtuous feedback loop, step by step.
We reworked our CI/CD system so that it built a new artifact after every single merge.
We put developers at the steering wheel so they could push their own changes out.
We got better at instrumentation, and we made a habit of going to look at it during or after each deploy.
We hooked up the pager so it would alert the person who merged the last diff, if an alert was generated within an hour after that service was deployed (there’s a sketch of this rule in code after this list).
We started finding bugs faster and faster, and paying down the tech debt we had amassed from shipping code without observability/visibility for many years.
Developers got in the habit of shipping their own changes, and watching them as they rolled out, and finding/fixing their bugs immediately.
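That pager hookup is worth sketching, because it is so simple and so effective. A rough Go version, assuming you can look up the most recent deploy of a service and who merged it (the one-hour window comes from the list above; everything else is invented for illustration):

```go
package main

import (
	"fmt"
	"time"
)

type deploy struct {
	Service  string    // which service was deployed
	MergedBy string    // engineer who merged the diff being deployed
	At       time.Time // when the deploy finished
}

// pageTarget routes an alert to the engineer who merged the last diff
// if the alert fired within an hour of their deploy; otherwise it
// falls back to the on-call rotation.
func pageTarget(alertAt time.Time, last deploy, onCall string) string {
	since := alertAt.Sub(last.At)
	if since >= 0 && since <= time.Hour {
		return last.MergedBy
	}
	return onCall
}

func main() {
	d := deploy{Service: "api", MergedBy: "sam", At: time.Now().Add(-20 * time.Minute)}
	fmt.Println(pageTarget(time.Now(), d, "oncall-engineer"))
	// -> "sam": the alert fired 20 minutes after sam's deploy
}
```

Paging the person with the context in their head, instead of whoever happens to be on call, is most of what makes deploy problems fast to resolve.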
It took some time. But after a year of this, our formerly flaky, obscure, mysterious, massively multi-tenant service that was going down every day and wreaking havoc on our sleep schedules was tamed. Deploys were swift and drama-free. We stopped blocking deploys on Fridays, holidays, or any other days, because we realized our systems were more stable when we always shipped consistently and quickly.
Allow me to repeat. Our systems were more stable when we always shipped right after the changes were merged. Our systems were less stable when we carved out times to pause deployments. This was not common wisdom at the time, so it surprised me; yet I found it to be true over and over and over again.
This is literally why I started Honeycomb.
When I was leaving Facebook, I suddenly realized that this meant going back to the Dark Ages in terms of tooling. I had become so accustomed to having the Parse+Scuba tooling and being able to iteratively explore and ask any question without having to predict it in advance. I couldn’t fathom giving it up.
The idea of going back to a world without observability, a world where one deployed and then stared anxiously at dashboards — it was unthinkable. It was like I was being asked to give up my five senses for production — like I was going to be blind, deaf, dumb, without taste or touch.
Look, I agree with nearly everything in the author’s piece. I could have written that piece myself five years ago.
But since then, I’ve learned that systems can be better. They MUST be better. Our systems are getting so rapidly more complex, they are outstripping our ability to understand and manage them using the past generation of tools. If we don’t change our ways, it will chew up another generation of engineering lives, sleep schedules, relationships.
Observability isn’t the whole story. But it’s certainly where it starts. If you can’t see where you’re going, you can’t go very far.
Get you some observability.
And then raise your standards for how systems should feel, and how much of your human life they should consume. Do better.
Because I couldn’t agree with that other post more: it really is all about people and their real lives.
Listen, if you can swing a four day work week, more power to you (most of us can’t). Any day you aren’t merging code to master, you have no need to deploy either. It’s not about Fridays; it’s about the swift, virtuous feedback loop.
And nobody should be shamed for what they need to do to survive, given the state of their systems today.
But things aren’t gonna get better unless you see clearly how you are contributing to your present pain. And congratulating ourselves for blocking Friday deploys is like congratulating ourselves for swatting ourselves in the face with the flyswatter. It’s a gross hack.
Maybe you had a good reason. Sure. But I’m telling you, if you truly do care about people and their work/life balance: we can do a lot better.
(With 🙏 to Joe Beda, whose brilliant idea for a blog post this was. Thanks for letting me borrow it!)
Interviewing is hard and it sucks.
In theory, it really shouldn’t be. You’re a highly paid professional and your skills are in high demand. This ought to be a meeting between equals to mutually explore what a longer-term relationship might look like. Why take the outcome personally? There are at least as many reasons for you to decide not to join a company as for the company to decide not to hire you, right?
In reality, of course, all the situational cues and incentives line up to make you feel like the whole thing is a referendum on whether or not you personally are Good Enough (smart enough, senior enough, skilled enough, cool enough) to join their fancy club.
People stay at shitty jobs far, far longer than they ought to, just because interviews can be so genuinely crushing to your spirit and sense of self. Even when the interview isn’t the worst, it can leave a lasting sting when a company declines to hire you.
But there is an important asymmetry here. By not hiring someone, I very rarely mean it as a rejection of that person. (Not unless they were, like, mean to the office manager, or directed all their technical questions to the male interviewers.) On the contrary, I generally hold the people we decline to hire — or have had to let go! — in extremely high regard.
So if someone interviews at Honeycomb, I do not want them to walk away feeling stung, hurt, or bad about themselves. I would like them to walk away feeling good about themselves and our interactions, even if one or both of us are disappointed by the outcome. I want them to feel the same way about themselves as I feel about them, especially since there’s a high likelihood that I may want to work with them in the future.
So here are the real, honest-to-god most common reasons why I don’t hire someone.
1. Startup hiring is defined by scarcity.
If you’ve worked at a Google or Facebook before, you may have a certain mental model of how hiring works. You ask the candidate a bunch of questions, and if they do well enough, you hire them. This could not be more different from early-stage startup hiring, which is defined in every way by scarcity.
I only have a few precious slots to fill this year, and every single one of them is tied to one or more key company initiatives or goals, without which we may fail as a company. Emily and I spend hours obsessively discussing the profile we are looking for: the smallest possible set of key strengths and skills this hire must have, inter-team and intra-team dynamics, and what elements are missing or need to be bolstered from the team as it stands. And at the end of the day, there are not nearly as many slots to fill as there are awesome people we’d like to hire. Not even close. Having to choose between several differently wonderful people can be *excruciating*.
2. We need diversity.
No, not that kind. (Yes, we care about cultivating a diverse team and support that goal through our recruiting and hiring processes, but it’s not a factor in our hiring decisions.) I mean your level, stage in your career, educational background, professional background, trajectory, areas of focus and strengths. We are trying to build radical new tools for sociotechnical systems; tools that are friendly, intuitive, and accessible to every engineer (and engineering-adjacent profession) in the world.
How well do you think we’re going to do at our goal if the people building it are all ex-Facebook, ex-MIT senior engineers? If everyone has the exact same reference points and professional training, we will all have the same blind spots. Even if our team looks like a fucking Benetton ad.
3. We are assembling a team, not hiring individuals.
We spend at least as much time hashing out what the subtle needs of the team are right now as talking about the individual candidate. Maybe what we need is a senior candidate who loves mentoring with her whole heart, or a language polyglot who can help unify the look and feel of our integrations across ten different languages and platforms. Or maybe we have plenty of accomplished mentors, but the team is really lacking someone with expertise in query profiling and db tuning, and we expect this to be a big source of pain in the coming year. Maybe we realize we have nobody on the team who is interested in management, and we are definitely going to need someone to grow into or be hired on as a manager a year or two from now.
There is no value judgment or hierarchy attached to any of these skills or particulars. We simply need what we need, and you are who you are.
4. I am not confident that we can make you successful in this role at this time.
We rarely turn people down for purely technical reasons, because technical skills can be learned. But there can be some combination of your skills, past experience, geographical location, time zone, experience with working remotely, etc — that just gives us pause. If we cast forward a year, do we think you are going to be joyfully humming along and enjoying yourself, working more-or-less independently and collaboratively? If we can’t convince ourselves this is true, for whatever reasons, we are unlikely to hire you. (But we would love to talk with you again someday.)
5. The team needs someone operating at a different level.
Don’t assume this always means “you aren’t senior enough”. We have had to turn down people at least as often for being too senior as not senior enough. An organization can only absorb so many principal and senior engineers; there just isn’t enough high-level strategic work to go around. I believe happy, healthy teams are composed of a range of levels — you need more junior folks asking naive questions that give senior folks the opportunity to explain themselves and catch their dumb mistakes. You need there to be at least one sweet child who is just so completely stoked to build their very first login page.
A team staffed with nothing but extremely senior developers will be a dysfunctional, bored and contentious team where no one is really growing up or being challenged as they should.
6. We don’t have the kind of work you need or want.
The first time we tried hiring junior developers, we ran into this problem hardcore. We simply didn’t have enough entry-level work for them to do. Everything was frustratingly complex and hard for them, so they weren’t able to operate independently, and we couldn’t spare an engineer to pair with them full time.
This also manifests in other ways. Like, lots of SREs and data engineers would LOVE to work at Honeycomb. But we don’t have enough ops engineering work or data problems to keep them busy full time. (Well — that’s not precisely true. They could probably keep busy. But it wouldn’t be aligned with our core needs as a business, which makes them premature optimizations we cannot afford.)
7. Communication skills.
We select highly for communication skills. The core of our technical interview involves improving and extending a piece of code, then bringing it in the next day to discuss it with your peers. We believe that if you can explain what you did and why, you can definitely do the work, and the reverse is not necessarily true. We also believe that communication skills are at the foundation of a team’s ability to learn from its mistakes and improve as a unit. We value high-performing teams, therefore we select for those skills.
There are many excellent engineers who are not good communicators, or who do not value communication the way we do, and while we may respect you very much, it’s not a great fit for our team.
8. You don’t actually want to work at a startup.
“I really want to work at a startup. Also the things that are really important to me are: work/life balance, predictability, high salary, gold benefits, stability, working from 10 to 5 on the dot, knowing what I’ll be working on for the next month, not having things change unexpectedly, never being on call, never needing to think or care about work out of hours …”
To be clear, it is not a red flag if you care about work/life balance. We care about that too — who the hell doesn’t? But startups are inherently more chaotic and unpredictable, and roles are more fluid and dynamic, and I want to make sure your expectations are aligned with reality.
9. You just want to work for women.
I hate it when I’m interviewing someone and I ask why they’re interested in Honeycomb, and they enthusiastically say “Because it was founded by women!”, and I wait for the rest of it, but that’s all there is. That’s it? Nothing interests you about the problem, the competitive space, the people, the customers … nothing?? It’s fine if the leadership team is what first caught your eye. But it’s kind of insulting to just stop there. Just imagine if somebody asked you out on a date “because you’re a woman”. Low. Fucking. Bar.
10. I truly want you to be happy.
I have no interest in making a hard sell to people who are dubious about Honeycomb. I don’t want to hire people who can capably do the job, but whose hearts are really elsewhere doing other things, or who barely tolerate going to work every day. I want to join with people who see their labor as an extension of themselves, who see work as an important part of their life’s project. I only want you to work here if it’s what’s best for you.
11. I’m not perfect.
We have made the wrong decision before, and will do so again. >_<
As a candidate, it is tempting to feel like you will get the job if you are awesome enough, therefore if you do not get the job it must be because you were insufficiently awesome. But that is not how hiring works — not for highly constrained startups, anyway.
If we brought you in for an interview, we already think you’re awesome. Period. Now we’re just trying to figure out if you narrowly intersect the skill sets we are lacking that we need to succeed this year.
If you could be a fly on the wall, listening to us talk about you, the phrase you would hear over and over is not “how good are they?”, but “what will they need to be successful? can we provide the support they need?” We know this is as much of a referendum on us as it is on you. And we are not perfect.
Yesterday we had a super fun meetup here at Intercom in Dublin. We split up into small discussion groups and talked about things related to managing teams and being a senior individual contributor (IC), and going back and forth throughout your career.
One interesting question that came up repeatedly was: “what are some reasons that someone might not want to be a manager?”
"Things would be different if I was in charge", the all belief that authority is an all powerful magic wand you can wave and fix things.
Fascinatingly, I heard it asked over the full range of tones from extremely positive (“what kind of nutter wouldn’t want to manage a team?!”) to extremely negative (“who would ever want to manage a team?!”). So I said I would write a piece and list some reasons.
Point of order: I am going to focus on intrinsic reasons, not external ones. There are lots of toxic orgs where you wouldn’t want to be a manager for many reasons — but that list is too long and overwhelming, and I would argue you probably don’t want to work there in ANY capacity. Please assume the surroundings of a functional, healthy org (I know, I know — whopping assumption).
it's a huge responsibility. if you are having trouble advocating for yourself and your own needs/career goals/work output, then you may not have the capacity to do it for the people you're responsible for managing. i take the role extremely seriously, and it takes a toll.
1. You love the work you get to do as an engineer.
Never underestimate this one, and never take it for granted. If you look forward to work and even miss it on vacation; if you occasionally leave work whistling with delight and/or triumph; if your brain has figured out how to wring out regular doses of dopamine and serotonin while delivering ever-increasing value; if you look back with pride at what you have learned and built and achieved, if you regularly tap into your creative happy place … hell, your life is already better than 99.99% of all the humans who have ever labored and lived. Don’t underestimate the magnitude of your achievement, and don’t assume it will always be there waiting for you to just pick it right back up again.
I got into tech because I like writing code. As a manager, I didn’t get to do that. Becoming a not-manager lets me do that again.
2. It is easy to get a new engineering job. Really, really easy.
Getting your first gig as an engineer can be a challenge, but after that? It is possibly easier for an experienced engineer to find a new job than anyone else on the planet. There is so much demand for this skill set that we actually complain about how annoying it is being constantly recruited! Amazing.
It is typically harder to find a new job as a manager. If you think interview processes for engineers are terrible (and they are, honey), they are even weirder and less predictable (and more prone to implicit bias) for managers. So much of manager hiring is about intangibles like “culture fit” and “do I like you” — things you can’t practice or study or know if you’ve answered correctly. And soooo much of your skill set is inevitably bound up in navigating the personalities and bureaucracies of particular teams and a particular company. A manager’s effectiveness is grounded in trust and relationships, which makes it much less transferrable than engineering skills.
Someone has probably said it already: management will always be an option, but going back from management to writing code again can be very difficult (after some period of time). Anyway, looking forward to the post.
I am not claiming it is equally trivial for everyone to get a new job; it can be hard if you live in an out-of-the-way place, or have an unusual skill set, etc. But in almost every case, it becomes harder if you’re a manager. Besides — given that the ratio of engineers to line managers is roughly seven to one — there will be almost an order of magnitude fewer eng manager jobs than engineering jobs.
Regardless of org health, there's a _lot_ of emotional labor involved. Whether that's good for you personally depends a lot on circumstances, and how much of it you tend to take home with you. If it's too much to take, probably not good to manage, either for you or your team.
Engineers (in theory) add value directly to the bottom line. Management is, to be brutally frank, overhead. Middle management is often the first to be cut during layoffs.
Remember how I said that creation is the engineering superpower? That’s a nicer way of saying that managers don’t directly create any value. They may indirectly contribute to increased value over time — the good ones do — but only by working through other people as a force multiplier, mentor, etc. When times get tough, you don’t cut the people who build the product; you cut the ones whose added value is contingent or harder to measure.
Another way this plays out is when companies are getting acquired. As a baseline for acquihires, the acquiring company will estimate a value of $1 million per engineer, then deduct $500k for every other role being acquired. Ouch.
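By that rule of thumb, a fifteen-person startup with ten engineers and five people in other roles would pencil out at roughly 10 × $1M − 5 × $500k = $7.5M. (Illustrative numbers, obviously; the baseline is the estimate above.)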
I noticed that as soon as I had a competent manager, I never considered going into management ever again 😀
Where it’s completely normal for an engineer to hop jobs every 1-3 years, a manager who does this will not get points for learning a wide range of skills, they’ll be seen as “probably difficult to work with”. I have no data to support this, but I suspect the job tenure of a successful manager is at least 2-3x as long as that of a successful IC. It takes a year or two just to gain the trust of everyone on your team and the adjacent teams, and to learn the personalities involved in navigating the organization. At a large company, it may take a few times that long. I was a manager at Facebook for 2.5 years and I still learned some critical new detail about managing teams there on a weekly basis. Your value to the org really kicks in after a few years have gone by, once a significant part of the way things get done resides in your cranium.
As a PE who deliberately "leads" but has no interest in "management": I have stomach-churning aversion to the disciplinary/compensation/downsizing side of management, and a nontrivial chunk of my job satisfaction still comes from learning/exploring hard technical problems.
You know the type. Sneering about how managers don’t do any “real work”, looking down on them for being “less technical”. Basically everyone who utters the question “.. but how technical are they?” in that particular tone of voice is a shitbird. Hilariously, we had a great conversation about whether a great manager needs to be technical or not — many people sheepishly admitted that the best managers they had ever had knew absolutely nothing about technology, and yet they gave managers coding interviews and expected them to be technical. Why? Mostly because the engineers wouldn’t respect them otherwise.
7. As a manager, you will need to have some hard conversations. Really, really hard ones.
Do you shy away from confrontation? Does it seriously stress you out to give people feedback they don’t want to hear? Manager life may not be for you. There hopefully won’t be too many of these moments, but when they do happen, they are likely to be of outsized importance. Having a manager who avoids giving critical feedback can be really damaging, because it deprives you of the information you need to make course corrections before the problem becomes really big and hard.
Being a good manager takes emotional maturity, and it can be exhausting to always handle interpersonal problems well. Idk, I like to think I did better than ave, but holding people accountable? Giving the tough talks? If you hate that, do us all a fav and don't be a mgr.
As an engineer, if you really feel strongly about something, you just go off and do it yourself. As a manager, you have to lead through influence and persuasion and inspiring other people to do things. It can be quite frustrating. “But can’t I just tell people what to do?” you might be thinking. And the answer is no. Any time you have to tell someone what to do using your formal authority, you have failed in some way and your actual influence and power will decrease. Formal authority is a blunt, fragile instrument.
For a technical person, being a principal in a company with a two-track career ladder is all the best parts of managing a team without the downsides.
There is still plenty of room to learn and grow, career-wise.
The best companies enable people to swap tracks back and forth.
3. If you go become a manager because you want to be the one making the decisions, imagine how happy you'd be with a manager like that. Also remember you're also going to have your own manager. 4. Your current skillset is irrelevant. Humans are random & heterogeneous. It's hard.
10. Use your position as an IC to bring balance to the Force.
I LOVE working in orgs where ICs have power and use their voices. I love having senior ICs around who model that, who walk around confidently assuming that their voice is wanted and needed in the decision-making process. If your org is not like that, do you know who is best positioned to shift the balance of power back? Senior ICs, with some behind-the-scenes support from managers. For this reason, I am always a little sad when a vocal, powerful IC who models this behavior transitions to management. If ALL of the ICs who act this way become managers, it sends a very dismaying message to the ranks — that you only speak up if you’re in the process of converting to management.
Not the optimal way to achieve impact given the setup of our organization, my personal skills, and work it would necessarily trade off with.
11. Management is just a collection of skills, and you should be able to do all the fun ones as an IC.
Do you love mentoring? Interviewing, constructing hiring loops, defining the career ladder? Do you love technical leadership and teaching other people, or running meetings and running projects? Any reasonably healthy org should encourage all senior ICs to participate and have leadership roles in these areas. Management can be unbundled into a lot of different skills and roles, and the only ones that are necessarily confined to management are the shitty ones, like performance reviews and firing people. I LOVE it when an engineer expresses the desire to start learning more management skills, and will happily brainstorm with them on next steps — get an intern? run team meetings? there are so many things to choose from! When I say that all engineers should try management at some point in their career, what I really mean is these are skills that every senior engineer should develop. Or as Jill says:
I tell people all the time that you can do most of the "fun" management things (mentoring, coaching, watching people grow, contributing to decision making) as an IC without doing all the terrible parts of management (firing, budgeting, serious HR things).
That dopamine drip in your brain from fixing problems and learning things goes away, and it’s … real tough. This is why I say you need to commit to a two year stint if you’re going to try management: that, plus it takes that long to start to get your feet under you and is hard on your team if they’re switching managers all the time. It usually takes a year or two to rewire your brain to look for the longer timeline, less intense rewards you get from coaching other people to do great things. For some of us, it never does kick in. It’s genuinely hard to know whether you’ve done anything worth doing.
As a manager who frequently falls down a mental hole about not being totally sure I ever achieve anything or add value: sometimes you can go for long periods unsure you have achieved anything or added value 🙂
13. It will take up emotional space at the expense of your personal life.
When I was an IC, I would work late and then go out and see friends or meet up at the pub almost every night. It was great for my dating life and social life in general. As a manager, I feel like curling up in a fetal position and rolling home around 4 pm. I’m an introvert, and while my capacity has increased a LOT over the past several years, I am still sapped every single day by the emotional needs of my team.
As an engineer who's survived this long in the biz I know two things: a) I'm really good at dealing with technical stuff, and b) I'm really not good at dealing with people.
Schedule flexibility is an often overlooked reason. Coming back from maternity leave, a big trip, or sick days is easier if you don’t have a team whose day-to-day you are responsible for. Also, meetings tend not to be very movable time-wise.
16. If technical leadership is what your heart loves most, you should NOT be a manager.
If you are a strong tech lead and you convert to management, it is your job to begin slowly taking yourself out of the loop as tech lead and promoting others in your place. Your technical skills will stop growing at the point that you switch careers, and will slowly decay after that. Moreover, if you stay on as tech lead/manager you will slowly suck all the oxygen from the room. It is your job to train up and hand over to your replacements and gradually step out of the way, period.
For a while, I personally struggled to switch my mindset from deriving my sense of personal success on the code I shipped to the impact the team(s) I supported were delivering. I have definitely seen others fail to make that change and personally suffer for it.
Wish we could avoid the either/or of manager vs individual contributor. There’s also practice leaders who might not manage within a formal org sense but are specialists and still lead teams and innovative thinking. Best job at the company IMHO
Given all this, why should ANYONE ever be a manager? Shrug. I don’t think there’s any one good or bad answer. I used to think a bad answer would be “to gain power and influence” or “to route around shitty communication systems”, but in retrospect those were my reasons and I think things turned out fine. It’s a complex calculation. If you want to try it and the opportunity arises, try it! Just commit to the full two year experiment, and pour yourself into learning it like you’re learning a new career — since, you know, you are.
"If you want to spend your emotional energy outside of work "
But please do be honest with yourself. One thing I hate is when someone wants to be a manager, and I ask why, and they rattle off a list of reasons they’ve heard that people SHOULD want to become managers (“to have a greater impact than I can with just myself, because I love helping other people learn and grow, etc”) but I am damn sure they are lying to themselves and/or me.
Introspection and self-knowledge are absolutely key to being a decent manager, and lord knows we need more of those. So don’t kick off your grand experiment by lying to yourself, ok?
And also, the people who excel at all those management tasks, the ICs who would actually make *great* managers but don't want to do it? They make the *best* ICs. Literally a dream. They make my job so much easier in so many ways. Wouldn't trade them.
I hadn’t seen anyone say something like this in quite a while. I remember saying things like this myself as recently as, oh, 2016, but I thought the zeitgeist had moved on to continuous delivery.
Which is not to say that Friday freezes don’t happen anymore, or even that they shouldn’t; I just thought that this was no longer seen as a badge of responsibility and honor, but rather as a source of mild embarrassment. (Much like the fact that you still don’t automatically restore your db backups and verify them every night. Do you.)
So I responded with an equally hyperbolic and indefensible claim:
If you're scared of pushing to production on Fridays, I recommend reassigning all your developer cycles off of feature development and onto your CI/CD process and observability tooling for as long as it takes to ✨fix that✨.
Now obviously, OBVIOUSLY, reassigning all your developer cycles is probably a terrible idea. You don’t get 100x parallel efficiency if you put 100 developers on a single problem. So I thought it was clear that this was said somewhat tongue in cheek, serious-but-not-really. I was wrong there too.
So let me explain.
There’s nothing morally “wrong” with Friday freezes. But it is a costly and cumbersome bandage for a problem that you would be better served to address directly. And if your stated goal is to protect people’s off hours, this strategy is likely to sabotage that goal and cause them to waste far more time and get woken up much more often, and it stunts your engineers’ technical development on top of that.
Fear is the mind-killer.
Fear of deploys is the ultimate technical debt. How much time does your company waste, between engineers:
waiting until it is “safe” to deploy,
batching up changes into bigger changes that are decidedly unsafe to deploy,
debugging broken deploys that had many changes batched into them,
waiting nervously to get paged after a deploy goes out,
figuring out if now is a good time to deploy or not,
cleaning up terrible deploy-related catastrophucks
Anxiety related to deploys is the single largest source of technical debt in many, many orgs. Technical debt, lest we forget, is not the same as “bad code”. Tech debt hurts your people.
Saying “don’t push to production” is a code smell. Hearing it once a month at unpredictable intervals is concerning. Hearing it EVERY WEEK for an ENTIRE DAY OF THE WEEK should be a heartstopper alarm. If you’ve been living under this policy you may be numb to its horror, but just because you’re used to hearing it doesn’t make it any less noxious.
If you’re used to hearing it and saying it on a weekly basis, you are afraid of your deploys and you should fix that.
It’s a smell. If you can’t deploy at 6pm on a Friday it means you don’t understand or don’t trust your systems or process. That might be ok if you openly acknowledge it, but if you’re not talking about it, that’s dangerous
If you are a software company, shipping code is your heartbeat. Shipping code should be as reliable and sturdy and fast and unremarkable as possible, because this is the drumbeat by which value gets delivered to your org.
Deploys are the heartbeat of your company.
Every time your production pipeline stops, it is a heart attack. It should not be ok to go around nonchalantly telling people to halt the lifeblood of their systems based on something as pedestrian as the day of the week.
Why are you afraid to push to prod? Usually it boils down to one or more factors:
your deploys frequently break, and require manual intervention just to get to a good state
your test coverage is not good, your monitoring checks are not good, so you rely on users to report problems back to you and this trickles in over days
recovering from deploys gone bad can regularly cause everything to grind to a halt for hours or days while you recover, so you don’t even want to embark on a deploy without a full working day ahead of you
your deploys are painfully slow, and take hours to run tests and go live.
These are pretty darn good reasons. If this is the state you are in, I totally get why you don’t want to deploy on Fridays. So what are you doing to actively fix those states? How long do you think these emergency controls will be in effect?
The answers of “nothing” and “forever” are unacceptable. These are eminently fixable problems, and the amount of drag they create on your engineering team and its ability to execute is the equivalent of a five-alarm fire.
Fix. That. Take some cycles off product and fix your fucking deploy pipeline.
It is difficult to combine "you shouldn't do this because it is a symptom of a systemic problem, you should instead address the systemic problem" with "yes actually you should do it for as long as the systemic problem exists, because it is adaptive".
If you’ve been paying attention to the DORA report or Accelerate, you know that the way you address the problem of flaky deploys is NOT by slowing down or adding roadblocks and friction, but by shipping more QUICKLY.
Science says: ship fast, ship often.
Deploy on every commit. Smaller, coherent changesets transform into debuggable, understandable deploys. If we’ve learned anything from recent research, it’s that velocity of deploys and lowered error rates are not in tension with each other; they actually reinforce each other. When one gets better, the other does too.
So by slowing down or batching up or pausing your deploys, you are materially contributing to the worsening of your own overall state.
If you block devs from merging on Fridays, then you are sacrificing a fifth of your velocity and overall output. That’s a lot of fucking output.
If you do not block merges on Fridays, and only block deploys, you are queueing up a bunch of changes to all get shipped days later, long after the engineers wrote the code and have forgotten half of the context. Any problems you encounter will be MUCH harder to debug on Monday in a muddled blob of changes than they would have been just shipping crisply, one at a time on Friday. Is it worth sacrificing your entire Monday? Monday-Tuesday? Monday-Tuesday-Wednesday?
The worst, most PTSD-inducing outages of my life have all happened after holiday code freezes. Every. Single. One.
Don't use rules. Practice good judgment, build tools to align incentives and deploy often; practice tolerating and recovering from pedestrian failures often too. https://t.co/GH7lef274z
I am not saying that you should make a habit of shipping a large feature at 4:55 pm on Friday and then sauntering out the door at 5. For fuck’s sake. Every engineer needs to learn and practice good technical judgment around deploy hygiene. Like:
Don’t ship before you walk out the door on *any* day.
Don’t ship big, gnarly features right before the weekend, if you aren’t going to be around to watch them.
Instrument your code, and go and LOOK at the damn thing once it’s live.
Use feature flags and other tools that separate turning on code paths from deploys (a minimal sketch follows this list).
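As a sketch of that last item: the whole trick is that flipping a flag is a config change, not a deploy. Here is a deliberately dumb version in Go, with invented names (a real system would use a flag service with per-user targeting):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// flagEnabled is a deliberately dumb feature-flag check: flags are a
// comma-separated list in an environment variable, so flipping a flag
// is a config change rather than a deploy.
func flagEnabled(name string) bool {
	for _, f := range strings.Split(os.Getenv("ENABLED_FLAGS"), ",") {
		if strings.TrimSpace(f) == name {
			return true
		}
	}
	return false
}

func handleCheckout() {
	// The new code path ships dark: it is deployed but not released
	// until someone turns the flag on.
	if flagEnabled("new_checkout_flow") {
		fmt.Println("running new checkout flow")
		return
	}
	fmt.Println("running old checkout flow")
}

func main() {
	handleCheckout()
}
```

The point is the decoupling: the deploy happens on merge, every day of the week, while the release is a flag flip that someone can make (and revert) independently.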
But you don’t need rules for this; in fact, rules actually inhibit the development of good judgment!
Policies (and enumerated exceptions to policies, and exceptions to exceptions) are a piss-poor substitute for judgment. Rules are blunt instruments that stunt your engineers' development and critical thinking skills.
Most deploy-related problems are readily obvious, if the person who has the context for the change in their heads goes and looks at it.
But if you aren’t looking for them, then sure — you probably won’t find out until user reports start to trickle in over the next few days.
So go and LOOK.
Stop shipping blind. Actually LOOK at what you ship.
I mean, if it takes 48 hours for a bug to show up, then maybe you better freeze deploys on Thursdays too, just to be safe! 🙄
I get why this seems obvious and tempting. The “safety” of no-deploy Friday is realized immediately, while the costs are felt later. They’re felt when you lose Monday (and Tuesday) to debugging the big blob deploy. Or they get amortized out over time. Or you experience them as sluggish ship rates and a general culture of fear and avoidance, or learned helplessness, and the broad acceptance of fucked up situations as “normal”.
If pushing to production is a painful event, make it a *non-event* (same philosophy as chaos engineering)
But if recovering from deploys is long and painful and hard, then you should fix that. If you don’t tend to detect reliability events until long after the event, you should fix that. If people are regularly getting paged on Saturdays and Sundays, they are probably getting paged throughout the night, too. You should fix that.
On call paging events should be extremely rare. There’s no excuse for on call being something that significantly impacts a person’s life on the regular. None.
I’m not saying that every place is perfect, or that every company can run like a tech startup. I am saying that deploy tooling is systematically underinvested in, and we abuse people far too much by paging them incessantly and running them ragged, because we don’t actually believe it can be any better.
It can. If you work towards it.
Devote some real engineering hours to your deploy pipeline, and some real creativity to your processes, and someday you too can lift the Friday ban on deploys and relieve your oncall from burnout and increase your overall velocity and productivity.
On virtue signaling
Finally, I heard from an alarming number of people who admitted that Friday deploy bans were useless or counterproductive, but they supported them anyway as a purely symbolic gesture to show that they supported work/life balance.
This makes me really sad. I’m … glad they want to support work/life balance, but surely we can come up with some other gestures that don’t work directly counter to that very goal.
That's it. Because if you make it a virtue signal, it will NEVER GET FIXED. Blocking Friday deploys is not a mark of moral virtue; it is a bash script patching over technical rot.
And technical rot is bad because it HURTS PEOPLE. It is in your interest to fix it.
Ways to begin recovering from a toxic deploy culture:
Have a deploy philosophy, make sure everybody knows what it is. Be consistent.
Build and deploy on every set of committed changes. Do not batch up multiple people’s commits into a deploy. (A minimal trigger for this is sketched in code after this list.)
Train every engineer so they can run their own deploys, if they aren’t fully automated. Make every engineer responsible for their own deploys.
(Work towards fully automated deploys.)
Every deploy should be owned by the developer who made the changes that are rolling out. Page the person who committed the change that triggered the deploy, not whoever is oncall.
Set expectations around what “ownership” means. Provide observability tooling so they can break down by build id and compare the last known stable deploy with the one rolling out.
Never accept a diff without an answer to the questions, “how will you know when this code breaks? how will you know if the deploy is not behaving as planned?” Instrument every commit so you can answer these questions in production.
Shipping software and running tests should be fast. Super fast. Minutes, tops.
It should be muscle memory for every developer to check up on their deploy and see if it is behaving as expected, and if anything else looks “weird”.
Practice good deploy hygiene using feature flags. Decouple deploys from feature releases. Empower support and other teams to flip flags without involving engineers.
Each deploy should be owned by the developer who made the code changes. But your deploy pipeline needs to have a team that owns it too. I recommend putting your most experienced, senior developers on this problem to signal its high value.
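To sketch the first item on this list: one deploy per merged changeset can be as small as a webhook that kicks off your pipeline for exactly one commit. This is a sketch under assumptions, not a prescription; the payload shape and deploy.sh are stand-ins for whatever your source host and pipeline actually provide:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os/exec"
)

// mergeEvent is a made-up payload shape; adapt it to whatever your
// source host actually posts on merge to the main branch.
type mergeEvent struct {
	Commit  string `json:"commit"`
	Author  string `json:"author"`
	Service string `json:"service"`
}

// handleMerge kicks off exactly one deploy per merged changeset, so
// every deploy maps to a single diff and a single owner.
func handleMerge(w http.ResponseWriter, r *http.Request) {
	var ev mergeEvent
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
		http.Error(w, "bad payload", http.StatusBadRequest)
		return
	}
	// "./deploy.sh <service> <commit>" stands in for your real pipeline.
	cmd := exec.Command("./deploy.sh", ev.Service, ev.Commit)
	if err := cmd.Start(); err != nil {
		http.Error(w, "deploy failed to start", http.StatusInternalServerError)
		return
	}
	log.Printf("deploying %s at %s (merged by %s)", ev.Service, ev.Commit, ev.Author)
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	http.HandleFunc("/hooks/merge", handleMerge)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Everything else on the list builds on that property: one deploy, one changeset, one owner.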
Ultimately, I am not dogmatic about Friday deploys. Truly, I’m not. If that’s the only lever you have to protect your time, use it. But call it and treat it like the hack it is. It’s a gross workaround, not an ideal state.
Don’t let your people settle into the idea that it’s some kind of moral stance instead of a butt-ugly hack. Because if you do you will never, ever get rid of it.
There are plenty of good reasons to block deploys on Fridays, but it's not a good policy to cargo cult blindly. It imposes real costs and ultimately hinders you from achieving safe, boring deploys.
Remember: a team’s maturity and efficiency can be represented by how long it takes to get their shit into users’ hands after they write it. Ship it fast, while it’s still fresh in your developers’ heads. Ship one change set at a time, so you can swiftly debug and revert them. I promise your lives will be so much better. Every step helps. <3
Seven years ago I was working on backend infra for mobile apps at Parse, resenting MongoDB and its accursed single write lock per replica with all my dirty, blackened soul. That’s when Miles Ward asked me to give a customer testimonial for MongoDB at AWS reinvent.
It was my first time EVER speaking in public, and I had never been more terrified. I have always been a writer, not a talker, and I was pathologically afraid of speaking in public, or even having groups of people look at me. I scripted every word, memorized my lines, even printed it all out just in case my laptop didn’t work. I had nightmares every night. For three months I woke up every night in a cold sweat, shaking.
And I bombed, completely and utterly. The laptop DIDN’T work, my limbs and tongue froze, I was shaking so badly I could hardly read my printout, and after I rushed through the last sentences I turned and stumbled robotically off the stage, fully unaware that people were raising their hands and asking questions. I even tripped over the microphone cord in my haste to escape the stage.
Afterwards I burned with unpleasantries — fear, anger, humiliation, rage at being so bad at anything. It was excruciating. For the next two years I sought out every opportunity I could get to talk at a meetup, conference, anything. I got a prescription for propranolol to help manage the physical symptoms of panic. I gave 17 more talks that year, spending most nights and weekends working on them or rehearsing, and 21 the year after that. I hated every second of it.
I hated it, but I burned up my fear and aversion as fuel. Until around 18 months later, when I realized that I no longer had nightmares and had forgotten to pack my meds for a conference. I brute forced my way through to the other side, and public speaking became just an ordinary skill or a tool like any other.
I was on a podcast last week where the topic was career journeys. They asked me what piece of career advice I would like to give to people. I promptly said that following your bliss is nice, but I think it’s important to learn to lean into pain.
“Pain is nature’s teacher,” I said. Feedback loops train us every day, mostly unconsciously. We feel aversion to pain, and we enjoy dopamine hits, and out of those and other brain chemicals our habits are made. All it takes is a little tolerance for discomfort and some conscious tweaking of those feedback loops, and you can train yourself to achieve big things without even really trying.
But then I hesitated. Yes, leaning in to pain has served me well in my career. But that is not the whole story; it leaves out some important truths. It has also hurt me and held me back.
Misery is not a virtue. Pain is awful. That’s why it’s so powerful and primal. It’s a pre-conscious mechanism, an acute response that kicks in long before your conscious mind. Even just the suggestion of pain (or memory of past trauma) will train you to twist and contort around to avoid it.
When you are in pain, your horizons shrink. Your vision narrows, you curl inward. You have to expend enormous amounts of energy just moving forward through the day inch by inch.
Everything is hard when you’re in pain. Your creative brain shuts down. Basic life functions become impossible tests. You have to spend so much time compensating for your reduced capacity that learning new things is nearly impossible. You can’t pick up on subtle signals when your nerves are screaming in agony. And you grow numb over time, as they die off from sheer exhaustion.
I am no longer the CEO of Honeycomb.
I never wanted to be CEO; I always fiercely wanted a technical role. But it was a matter of company survival, and I did my best. I wasn’t a great CEO, although we did pretty well at the things I am good at or care about. But I couldn’t expand past them.
I hated every second of it. I cried every single day for the first year and a half. I tried to will myself into loving a role I couldn’t stand, tried to brute force my way to success like I always do. It didn’t get better. My ability to be present and curious and expansive withered. I got numb.
Turns out not every problem can be powered through on a high pain tolerance. The collateral damage starts to rack up. Sometimes the only way to succeed is to redefine success.
Pain is a terrific teacher, but pain is an acute response. Chronic pain will hijack your reward pathways, your perspective, your relationships, and every other productive system and leave them stunted.
Leaning in to pain can be powerful if you have the agency and ability to change it, or practice it to mastery, or even just adapt your own emotional responses to it. If you don’t or you can’t, leaning in to pain will kill you. Having the wisdom to know the difference is everything. Or so I’m learning.
From here on out I’ll be in the CTO seat. I don’t know what that even means yet, but I guess we’ll find out. Stay tuned. <3
Last night I was out with a dear friend who has been an engineering manager for a year now, and by two drinks in I was rattling off a long list of things I always say to newer engineering managers.
Then I remembered: I should write a post! It’s one of my goals this year to write more long form instead of just twittering off into the abyss.
There’s a piece I wrote two years ago, The Engineer/Manager Pendulum, which is probably my all time favorite. It was a love letter to a friend who I desperately wanted to see go back to engineering, for his own happiness and mental health. Well, this piece is a sequel to that one.
It’s primarily aimed at new managers, who aren’t sure what their career options look like or how to evaluate the opportunities that come their way, or how it may expand or shrink their future opportunities.
The first fork in the manager’s path
Every manager reaches a point where they need to choose: do they want to manage engineers (a “line manager”), or do they want to try to climb the org chart? — manage managers, managers of other managers, even other divisions; while being “promoted” from manager to senior manager, director to senior director, all the way up to VP and so forth. Almost everyone’s instinct is to say “climb the org chart”, but we’ll talk about why you should be critical of this instinct.
They also face a closely related question: how technical do they wish to stay, and how badly do they care?
Are you an “engineering MANAGER” or an “ENGINEERING manager”?
These are not unlike the decisions every engineer ends up making about whether to go deep or go broad, whether to specialize or be a generalist. The problem is that both engineers and managers often make these career choices with very little information — or even awareness that they are doing it.
And managers in particular then have a tendency to look up ten years later and realize that those choices, witting or unwitting, have made them a) less employable and b) deeply unhappy.
Lots of people have the mindset that once they become an engineering manager, they should just go from gig to gig as an engineering manager who manages other engineers: that’s who they are now. But this is actually a very fragile place to sit long-term, as we’ll discuss further on in this piece.
But let’s start at the beginning, so I can speak to those of you who are considering management for the very first time.
“So you want to try engineering management.”
COOL! I think lots of senior engineers should try management, maybe even most senior engineers. It’s so good for you; it makes you better at your job. (If you aren’t a senior engineer, and by that I mean at least 7+ years of engineering experience, be very wary: management this early usually isn’t in your best interest.)
Hopefully you have already gathered that management is a career change, not a promotion, and you’re aware that nobody is very good at it when they first start.
That’s okay! It takes a solid year or two to find new rhythms and reward mechanisms before you can even begin to find your own voice or trust your judgment. Management problems look deceptively easy. Reasons this is hard include:
Most tech companies are absolutely abysmal at providing any sort of training or structure to help you learn the ropes and find your feet.
Even if they do, you still have to own your own career development. If learning to be a good engineer was sort of like getting your bachelor’s, learning to be a good manager is like getting your PhD — much more custom to who you are.
It will exhaust you mentally and emotionally in the weirdest ways for much longer than you think it should. You’ll be tired a lot, and you’ll miss feeling like you’re good at something (anything).
This is because you need to change your habits and practices, which in turn will actually change who you are. This takes time. Which is why …
The minimum tour of duty as a new manager is two years.
If you really want to try being a manager, and the opportunity presents itself, do it! But only if you are prepared to fully commit to a two year long experiment.
Commit to it like a proper career change. Seek out new peers, find new heroes. Bring fresh eyes and a beginner’s mindset. Ask lots of questions. Re-examine every one of your patterns and habits and priorities: do they still serve you? your team?
Don’t even bother thinking about it in terms of whether you “enjoy managing” for a while, or trying to figure out if you are any good at it. Of course you aren’t any good at it yet. And even if you are, you don’t know how to recognize when you’ve succeeded at something, and you haven’t yet connected your brain’s reward systems to your successes. A long stretch of time without satisfying brain drugs is just the price of admission if you want to earn these experiences, sadly.
It takes more than one year to learn management skills and wire up your brain to like it. If you are waffling over the two year commitment, maybe now is not the time. Switching managers too frequently is disruptive to the team, and it’s not fair to make them report to someone who would rather be doing something else or isn’t trying their ass off.
It takes about 3-5 years for your skills to deteriorate.
So you’ve been managing a team for a couple years, and it’s starting to feel … comfortable? Hey, you’re pretty good at this! Yay!
With a couple of years under your belt as a line manager, you now have TWO powerful skill sets. You can build things, AND you can organize people into teams to build even bigger things. Right now, both sets are sharp. You could return to engineering pretty easily, or keep on as a manager — your choice.
But this state of grace doesn’t last very long. Your technical skills stop advancing when you become a manager, and instead begin eroding. Two years in, you aren’t the effective tech lead you once were; your information is out of date and full of gaps, the hard parts are led by other people these days.
More critically, your patterns of mind and habits shift over time, and become those of a manager, not an engineer. Consider how excited an engineer becomes at the prospect of a justifiable greenfield project; now compare to her manager’s glum reaction as she instinctively winces at having to plan for something so reprehensibly unpredictable and difficult to estimate. It takes time to rewire yourself back.
If you like engineering management, your tendency is to go “cool, now I’m a manager”, and move from job to job as an engineering manager, managing team after team of engineers. But this is a trap. It is not a sound long term plan. It leads too many people off to a place they never wanted to end up: technically sidelined.
Why can’t I just make a career out of being a combo tech lead+line manager?
One of the most common paths to management is this: you’re a tech lead, you’re directing ever larger chunks of technical work, doing 1x1s and picking up some of the people stuff, when your boss asks if you’d like to manage the team. “Sure!”, you say, and voila — you are an engineering manager with deep domain expertise.
But if you are doing your job, you begin the process of divesting yourself of technical leadership responsibilities starting immediately. Your own technical development should screech to a halt once you become a manager, because you have a whole new career to focus on learning.
Your job is to leverage that technical expertise to grow your engineers into great senior engineers and tech leads themselves. Your job is not to hog the glory and squat on the hard problems yourself, it’s to empower and challenge and guide your team. Don’t suck up all the oxygen: you’ll stunt the growth of your team.
But your technical knowledge gets dated, and your skills atrophy. The longer it’s been since you worked as an engineer, the harder it will be to switch back. It gets real hard around three years, and five years seems like a tipping point.
And because so much of your credibility and effectiveness as an engineering leader comes from your expertise in the technology that your team uses every day, ultimately you will be no longer capable of technical leadership, only people management.
On being an “engineering manager” who only does people management
I mean, there’s a reason we don’t lure good people managers away from Starbucks to run engineering teams. It’s the intersection and juxtaposition of skill sets that gives engineering managers such outsize impact.
The great ones can make a large team thrum with energy. The great ones can break down a massive project into projects that challenge (but do not overwhelm) a dozen or more engineers, from new grads to grizzled veterans, pushing everyone to grow. The great ones can look ahead and guess which rocks you are going to die on if you don’t work to avoid them right now.
The great ones are a treasure: and they are rare. And in order to stay great, they regularly need to go back to the well to refresh their own hands-on technical abilities.
There is an enormous demand for technical engineering leaders — far more demand than supply. The most common hackaround is to pair a people manager (who can speak the language and knows the concepts, but stopped engineering ages ago) with a tech lead, and make them collaborate to co-lead the team. This unwieldy setup often works pretty well.
But most of those people managers didn’t want or expect to end up sidelined in this way when they were told to stop engineering.
If you want to be a pure people manager and not do engineering work, and don’t want to climb the ladder or can’t find a ladder to climb, more power to you. But I don’t know that I’ve met many people who chose this on purpose. I have met a lot of people in this situation by accident, and they are always kinda angsty and unhappy about it. Don’t let yourself become this person by accident. Please.
Which brings me to my next point.
You will be advised to stop writing code or engineering.
Everybody’s favorite hobby is hassling new managers about whether or not they’ve stopped writing code yet, and not letting up until they say that they have. This is a terrible, horrible, no-good VERY bad idea that seems like it must originally have been a botched retelling of the correct advice, which is:
Stop writing code and engineering
in the critical path
Can you spot the difference? It’s very subtle. Let’s run a quick test:
Authoring a feature? ⛔️
Covering on-call when someone needs a break? ✅
Diving on the biggest project after a post mortem? ⛔️
Code reviews? ✅
Picking up a p2 bug that’s annoying but never seems to become top priority? ✅
Insisting that all commits be gated on their approval? ⛔️
Cleaning up the monitoring checks and writing a library to generate coverage? ✅
The more you can keep your hands warm, the more effective you will be as a coach and a leader. You’ll have a richer instinct for what people need and want from you and each other, which will help you keep a light touch. You will write better reviews and resolve technical disputes with more authority. You will also slow the erosion and geriatric creep of your own technical chops.
I firmly believe every line manager should either be in the on call rotation or pinch hit liberally and regularly, but that’s a different post.
Technical Leadership Track
If you love technology and want to remain a subject-matter expert in designing, building and shipping cutting-edge technical products and systems, you cannot afford to let yourself drift too far or too long away from hands-on engineering work. You need to consciously cultivate your path, probably by practicing some form of the engineer/manager pendulum.
If you love managing engineers, and being a technical leader is a part of your identity that you take great pride in, then you must keep up your technical skills and periodically invest in your practice and renew your education. Again: this is simply the price of admission. You need to renew your technical abilities, your habits of mind, and your visceral senses around creating and maintaining systems. There is no way to do this besides doing it. If management isn’t a promotion, then returning to hands-on work isn’t a demotion, either. Right?
One warning: Your company may be great, but it doesn’t exist for your benefit. You and only you can decide what your needs are and advocate for them. Remember that next time your boss tries to guilt you into staying on as manager because you’re so badly needed, when you can feel your skills getting rusty and your effectiveness dwindling. You owe it to yourself to figure out what makes you happy and build a portfolio of experiences that liberate you to do what you love. Don’t sacrifice your happiness at the altar of any company. There are always other companies.
Honestly, I would try not to think of yourself as a manager at all: you are an “engineering leader” performing a tour of duty in management. You’re pursuing a long-term strategy towards being a well-respected technologist, someone who can sling code, give informed technical guidance, and explain things in detail, customized to anyone at any level of sophistication.
Organizational Leadership Track
Most managers assume they want to climb the ladder. Leveling up feels like an achievement, and that pull can be hard to resist.
Resist it. Or at least, resist doing it unthinkingly. Don’t do it because the ladder is there and must be climbed. Know as much as you can about what you’re in for before you decide it’s what you want.
Here are a few reasons to think critically about climbing the ladder to director and executive roles.
Your choices shrink. There are fewer jobs, with more competition, mostly at bigger companies. (Do you even like big companies?)
You basically need to do real time at a big company where they teach effective management skills, or you’ll start from a disadvantage.
Bureaucracies are highly idiosyncratic; skills and relationships may or may not transfer with you between companies. As an engineer, you could skip off every year or two for greener pastures if you landed a crap gig. An engineer has … about 2-3x more leeway in this regard than an exec does. A string of short director/exec gigs is a career ender, or a coach seat straight to consultant life.
You are going to become less employable overall. The ever-higher continuous climb almost never happens, usually for reasons you have no control over. This can be a very bitter pill.
Your employability becomes more about your “likability” and other problematic things. Your company’s success determines the shape of your career much more than your own performance. (Actually, this probably begins the day you start managing people.)
Your time is not your own. Your flaws are no longer cute. You will see your worst failings ripple outward and be magnified and reflected. (Ditto, applies to all leaders but intensifies as you rise.)
You may never feel the dopamine hit of “i learned something, i fixed something, i did something” that comes so freely as an I.C. Some people learn to feel satisfaction from managery things, others never do. Most describe it as a very subdued version of the thrill you get from building things.
You will go home tired every night, unable to articulate what you did that day. You cannot compartmentalize or push it aside. If the project failed for reasons outside your control, you will be identified with the failure anyway.
Nobody really thinks of you as a person anymore, you turn into a totem for them to project shit on. (Things will only get worse if you hit back.) Can you handle that? Are you sure?
It’s pretty much a one-way trip.
Sure, there are compensating rewards. Money, power, impact. But I’m pointing out the negatives because most people don’t stop to consider them when they start saying they want to try managing managers. Every manager says that.
The mere existence of a ladder compels us all to climb.
I know people who have climbed, gotten stuck, and wished they hadn’t. I know people who never realized how hard it would be for them to go back to something they loved doing after 5+ years climbing the ladder farther and farther away from tech. I know some who are struggling their way back, others who have no idea how or where to start. For those who try, it is hard.
You can’t go back and forth from engineering to executive, or even director to manager, in the way you can traverse freely between management and engineering as a technologist.
I just want more of you entering management with eyes wide open. That’s all I’m saying.
If you don’t know what you want, act to maximize your options.
Engineering is a creative act. Managing engineers will require your full attentive and authentic self. You will be more successful if you figure out what that self is, and honor its needs. Try to resist the default narratives about promotions and titles and roles, they have nothing to do with what satisfies your soul. If you have influence, use it to lean hard against things like paying managers more than ICs of the same level.
It’s totally normal not to know who you want to be, or have some passionate end goal. It’s great to live your life and work your work and keep an eye out for interesting opportunities, and see what resonates. It’s awesome when you get asked to step up and opportunistically build on your successes.
If you want a sustainable career in tech, you are going to need to keep learning your whole life. The world is changing much faster than humans evolved to naturally adapt, so you need to stay a little bit restless and unnaturally hungry to succeed in this industry.
The best way to do that is to make sure you a) know yourself and what makes you happy, and b) spend your time mostly in alignment with that. Doing things that make you happy gives you energy. Doing things that drain you is antithetical to your success. Find out what those things are, and don’t do them.
Don’t be a martyr, don’t let your spending habits shackle you, and don’t build things that trouble your conscience.
And have fun.
Yours in inverting $(allthehierarchies),
 Important point: I am not saying you can’t pick up the skills and patience to practice engineering again. You probably can! But employers are extremely reluctant to pay you a salary as an engineer if you haven’t been paid to ship code recently. The tipping point for hireability comes long before the tipping point for learning ability, in my experience.
 It is in no one’s best interest for money to factor into the decision of whether to be a manager or not. Slack pays their managers LESS than engineers of the same level, and I think this is incredibly smart: sends a strong signal of servant leadership.
The company is growing like crazy, your engineering team keeps rising to the challenge, and you are ferociously proud of them. But some cracks are beginning to show, and frankly you’re a little worried. You have always advocated for engineers to have broad latitude in technical decisions, including choosing languages and tools. This autonomy and culture of ownership is part of how you have successfully hired and retained top talent despite the siren song of the Faceboogles.
But recently you saw something terrifying that you cannot unsee: your company is using all the languages, all the environments, all the databases, all the build tools. Shit!!! Your ops team is in full revolt and you can’t really blame them. It’s grown into an unsupportable nightmare and something MUST be done, but you don’t know what or how — let alone how to solve it while retaining the autonomy and personal agency that you all value so highly.
I hear a version of this everywhere I’ve gone for the past year or two. It’s crazy how often. I’ve been meaning to write my answer up for ages, and here it (finally) is.
First of all: you aren’t alone. This is extremely common among high-performing teams, so congratulations. Really!
There actually seems to be a direct link between teams that give engineers lots of leeway to own their technical decisions and that team’s ability to hire and retain top-tier talent, particularly senior talent. Everything is a tradeoff, obviously, but accepting somewhat more chaos in exchange for a stronger sense of individual ownership is usually the right one, and leads to higher-performing teams in the long run.
Second, there is actually already a well-trod path out of this hole to a better place, and it doesn’t involve sacrificing developer agency. It’s fairly simple! Just five short steps, which I will describe to you now.
How to build a golden path and reverse software sprawl
Assemble a small council of trusted senior engineers.
Task them with creating a recommended list of default components for developers to use when building out new services. This will be your Golden Path, the path of convergence (and the path of least resistance).
Tell all your engineers that going forward, the Golden Path will be fully supported by the org. Upgrades, patches, security fixes; backups, monitoring, build pipeline; deploy tooling, artifact versioning, development environment, even tier 1 on call support. Pave the path with gold. Nobody HAS to use these components … but if they don’t, they’re on their own. They will have to support it themselves.
Work with team leads to draw up an umbrella plan for adopting the Golden Path for their current projects as well as older production services, as much as is reasonable or possible or desirable. Come up with a timeline for the whole eng org to deprecate as many other tools as possible. Allocate real engineering time to the effort. Hell, make a party out of it!
After the cutoff date (and once things have stabilized), establish a regular process for reviewing and incorporating feedback about the blessed Path and considering any proposed changes, additions or removals.
There you go. That’s it. Easy, right??
(It’s not easy. I never said it was easy, I said it was simple. 👼🏼)
Your engineers are currently used to picking the best tool for the job by optimizing locally. What data store has a data model that is easiest for them to fit to their needs? Which language is fastest for I/O throughput? What are they already proficient in? What you need to do is start building your muscles for optimizing globally. Not in isolation of other considerations, but in conjunction with them. It will always be a balancing act between optimizing locally for the problem at hand and optimizing globally for operability and general sanity.
(Oh, incidentally, requiring an engineer to write up a proposal any time they want to use a non-standard component, and then defend their case while the council grills them in person — this will be nothing but good for them, guaran-fucking-teed.)
Let’s go into a bit more detail on each of the five points. But quick disclaimer: this is not a prescription. I don’t know your system, your team, your cultural land mines or technical interdependencies or anything else about your situation. I am just telling stories here.
1. Assemble your council
Three is a good number for a council. More than that gets unwieldy, and may have trouble reaching consensus. Less than three and you run into SPOFs. You never want to have a single person making unilateral decisions because a) the decision-making process will be weaker, b) it sets that person up for too much interpersonal friction, and c) it denies your other engineers the opportunity to practice making these kinds of decisions.
Your council members need technical breadth more than depth, and should be widely respected by engineers.
At least one member should have a long history with the company so they know lots of stupid little details about what’s been tried before and why it failed.
At least one member should be deeply versed in practical data and operability concerns.
They should all have enough patience and political skill to drive consensus for their decisions. Absolutely no bombthrowers.
If you’re super lucky, you just tap the three senior technologists who immediately come to mind … your mind and everyone else’s. If you don’t have this kind of automatic consensus, you may want to let teams or orgs nominate their own representative so they feel they have some say.
2. Task the council with defining a Golden Path
Your council cannot vanish for a week and then descend from the mountain lugging lists engraved on stone tablets. The process of discovery and consensus is what validates the result.
The process must include talking to and gathering feedback from your engineers, talking to experts outside the company, talking to teams at other companies who are farther along with that technology, and coming up with detailed pro/con lists and reasons for their choices. Maybe sometimes it includes prototyping something or investigating the technical depths … but yeah, no, mostly it’s just the talking.
You need your council members to have enough political skill to handle these conversations deftly, building support and driving consensus through the process. Everybody doesn’t have to love the outcome, but it shouldn’t be a *surprise* to anyone by the end.
3. Know where you’re going
Your council should create a detailed written plan describing which technologies are going to be supported … and a stab at what “supported” means. (Ask the experts in each component what the best practices are for backups, versioning, dependency management, etc.)
You might start with something like this:
* Backend lang: Go 1.11 ## we will no longer be supporting backend scripting languages
* Frontend lang: ReactJS v16.5
* Primary db: Aurora v2.0 ## Yes, we know postgres is "better", but we have many mysql experts and 0 pg experts except the one guy who is going to complain about this. You know who you are.
* Deploy pipeline: github -> jenkins + docker -> S3 -> custom k8s
* Message broker: kafka v2.10, confluent build
* Mail: SES
* ... etc
Circulate the draft regularly for feedback, especially with eng managers. Some team reorganization will probably be necessary to bear the new weight of your support specifications, and managers will need some lead time to wrangle this.
This is also a great time to reconceive of the way on call works at your company. But I am not going to go into all that here.
4. Set a date, draft a plan: go!
Get approval from leadership to devote a certain amount of time to consolidating your stack and paying down a lump sum of tech debt. It depends on your stage of decay, but a reasonable amount of time might be “25% of engineering time for three months”. Whatever you agree to, make sure it’s enough to make the world demonstrably better for the humans who run it; you don’t want to leave them with a tire fire or you’ll blow your credibility.
The council and team leads should come up with a rough outer estimate for how long it would take to rewrite everything and move the whole stack onto the Golden Path. (It’s probably impossible and/or would take years, but that’s okay.) Next, look for the quick wins and the swollen, inflamed pain points.
If you are running two pieces of functionally similar software, like postgres and mysql, can you eliminate one?
If you are managing something yourself that AWS could manage for you (e.g. postfix instead of SES, or kafka instead of kinesis), can you migrate that?
If you are managing anything yourself that is not core to your business value, in fact, you should try to not manage it.
If you are running any services by hand on an AWS instance somewhere, could you use a managed service instead?
If you are running your own monitoring software, etc … can you not?
If you have multiple versions of a piece of software, can you upgrade or consolidate on one version?
The hardest parts are always going to be the ones around migrating data or rewriting components. Not everything is worth doing, or can be done within the span of your project, and that’s okay.
Next, brainstorm up some carrots. Can you write templates so that anybody who writes a service using your approved library, magically gets monitoring checks without having to configure anything? Can you write a wrapper so they get a bunch of end-to-end tests for free? Anything you can do to delight people or save them time and effort by using your preferred components is worth considering.
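As a sketch of what one of those carrots might look like, here is a tiny golden-path service wrapper in Go. Every name in it is hypothetical, not a real framework; the idea is just that anyone who builds on the blessed library gets a health check endpoint, request timing, and sane timeouts without configuring anything.

package main

import (
	"log"
	"net/http"
	"time"
)

// NewService is the kind of carrot the golden-path team might ship: any
// service built on it gets a /healthz endpoint for the monitoring robots,
// request logging, and sane timeouts, all with zero configuration.
func NewService(name string, mux *http.ServeMux) *http.Server {
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // monitoring checks come for free
	})
	return &http.Server{
		Addr:         ":8080",
		Handler:      withMetrics(name, mux),
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 10 * time.Second,
	}
}

// withMetrics records duration and path for every request; a real version
// would emit structured events to your telemetry store, not a log line.
func withMetrics(service string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("service=%s path=%s duration=%s", service, r.URL.Path, time.Since(start))
	})
}

func main() {
	mux := http.NewServeMux()
	// teams register their own routes on mux here ...
	log.Fatal(NewService("billing", mux).ListenAndServe())
}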
(By the way, if you don’t have any engineers devoted to internal tooling, you’re probably way overdue at this point.)
Pay down as much debt as you can, but be pragmatic: it’s better to get rid of five small things than one large thing, from a support perspective. Your main goal is to shrink the number of types of software your team has to support, particularly databases.
Do look for ways to make it fun, like … running a competition to see who can move the most tools to AWS in a week, or throwing a hack week party, or giving dorky prizes like trophies that entitle you to put your manager on call instead of you for a day, etc.
5. Make the process sustainable
After your target date has come and gone, you probably want to hold a retrospective and do lots of listening. (Well — first might I recommend a bubble bath and a bottle of champagne? But then a retrospective.)
Nothing is ever fixed forever. The company’s needs are going to expand and contract, and people will come and go, because change is the only constant. So you need to bake some flex into your system. How are you going to handle the need for changes to the Golden Path? Monthly discussions? An email list? Quarterly meetings with a formal agenda? I’ve seen people do all of these and more, it doesn’t really matter afaict.
Nobody likes a cabal, though, so the original council should gradually rotate out. I recommend replacing one person at a time, one per quarter, and rotating in another senior engineer in their place. This provides continuity while giving others a chance to learn these technical and political skills.
In the end, engineers are still free to use any tool or component at any time, just like before; only now they are solely responsible for it, which puts pressure on them not to do it unless it’s REALLY necessary. So if someone wants to propose adding a new tool to the default golden path, they can always adopt it themselves and gain some experience with it before bringing it to the council to discuss a formal place for it.
That’s all folks
See, wasn’t that simple?
(It’s never simple.)
I dearly wish more people would write up their experiences with this sort of thing in detail. I think engineering teams are too reluctant to show their warts and struggles to the world — or maybe it’s their executives who are afraid? Dunno.
Regardless, I think it’s actually a highly effective recruiting tool when teams aren’t afraid to share their struggles. The companies that brag about how awesome they are are the ones who come off looking weak and fragile. Whereas you can always trust the ones who are willing to laugh about all the ways they screwed up. Right?
In conclusion, don’t feel like an asshole for insisting on some process here. There should be friction around adding new components to your stack. (Add in haste, repent at leisure, as they say.) Anybody who argues with you probably needs to be exposed to way, way more of the support load for that software. That’s my professional opinion.
Anyway. You win or you die. Good luck with your sprawl.
“How can I learn to be a better manager? I have no idea what I’m doing.”
“I’m a tech lead, and responsible for shipping large projects, but all of my power is informal. How do I get people to listen to me and do what I need them to?”
“As a manager, I don’t feel safe talking to anyone about my work problems. What if it gets passed around as gossip?”
Leadership is such a weird thing. It’s something every one of us does more and more of as we get older and more experienced, but mostly you learn leadership lessons from trial and error and failing a lot. Which is too bad when you’re doing something you really care about, for the first time.
(Like starting a company.)
I’ve read books on leadership. I’ve been semi-consensually subjected to management training, I’ve had coaches, I’ve tried therapy and mentors. Most of this has been impressively (and expensively) unhelpful.
There’s only one thing that has reliably accelerated my development as a leader or manager, and that is forming bonds and swapping stories with my peers.
Stories are power tools.
A story is a tool. The more stories you have about how other people have solved a problem like yours, the more tools you have.
People are very complicated puzzles, and the more tools you have the more likely you are to find a tool that works.
Unlike management books which speak in abstractions and generalities, stories are real and specific. When you have the storyteller in front of you, you can drill down and find out more about how the situation was like your own or not, and what they wish they’d done in retrospect.
Details matter. Context matters.
Sometimes all you really need is a sympathetic ear to listen and make murmuring noises of encouragement while you work it out yourself, out loud. Sometimes they have grappled with a similar situation and can tell you how it all worked out, and what they wish they’d known. Sometimes they will cut you off and tell you to quit feeling sorry for yourself or sabotaging yourself … if you’re lucky.
Peers?? Why not a ‘mentor’?
No insult meant to anyone who gets a lot out of mentoring, but it isn’t really my bag. I’ve always had … let’s say, issues with authority. Which is a nice way of saying “never met a power structure I didn’t simultaneously want to crush and invert”. So I prefer the framing of “peers” over even the relatively tame hierarchy of the mentor-mentee relationship.
I mean, one-way relationships are fucked up. Lots of my peers are more junior than me and some are more senior, yet somehow we all manage to be givers as well as takers. And if you’re both giving support and receiving it, then what the fuck do you need different roles like mentor and mentee for?
I don’t want to be someone’s mentor. I want to be their friend and to sometimes be helpful. I don’t want to be someone’s “mentee” either, that makes me feel like their charity case (ha ha).
But friends and peers? Those just make my life better and awesomer.
From each according to their ability, to each according to their need.
The first year or two of Honeycomb, I had a small list of friends who I got dinner or drinks with once a month, like clockwork. Most of them were founders or execs or had been at one point, so they knew how depressed I was without my needing to say so.
They listened to my stories (even though I was terrible company), and shared plenty of their own. They just kept showing up, reminding me to sleep, asking if they could help, not taking it personally whatever state I was in.
These friendships carried me through some dark times. When Christine’s role required her to level up at leadership skills, I encouraged her to get some peers too. And that’s when I began to realize some of the limitations of the 1×1 model: it’s very time consuming, and doesn’t scale well.
But hey! scaling problems are fun. 😀 I decided to pull together a peer group where people could come together and give and get support all at the same time.
(Actually two groups. One for me, one for Christine, so we could complain about each other in proper peace and privacy.)
Practicing vulnerability, establishing intimacy.
It took some time to assemble the right groups, but then we met weekly for 6 weeks straight, and after that roughly once a month for a year. The six starter weeks were intended to help us practice vulnerability and establish intimacy in a compressed time frame.
Last week was our one-year birthday.
There’s something sterile about management books and leadership material, something that makes it hard for me to emotionally connect my problems to the solutions they preach. I want advice from someone who knows me in all my strengths and weaknesses, who knows what advice I can take and perform authentically and what I can’t. Context matters. Who we are matters.
As word has started to get around about the group, sometimes people ask me about joining or how to form a group of their own. Turns out lots of people are hungry to get better at leadership, and there are precious few resources.
That’s why I decided to write up and publish my notes. Everything I learned along the way about how to run a tech leadership skill swap — the logistics, the facilitation, the homework, the ground rules. Who to invite. Recommended reading lists.
(It’s a little rough, but I positively cannot spare any more time.)
This shit is hard. You need a posse.
Would you like to run a tech leads skill swap? Please tell me if you do, I would love to know! I’m happy to help you get started with a phone call, if you want.
All I ask is that you try to pull together a posse that’s at least 50% women, queers, and other marginalized folks.
Good luck. ~charity
 I take a pretty expansive view of leadership. For example, an intern might exercise leadership in the vaunted area of database backups — just by volunteering to own backups, reliably performing said backups and serving as a point of coordination and education for how we do backups here at $corp.
If you have expertise and people rely on you for it, this is a legit form of influence and power … in other words, that’s leadership.
 A HUGE thanks to Rachel Chalmers for scribing a first draft of these notes, and to Kris for running the other group and contributing the homework sheet and stories of a related group at twitter.
On twitter this week, @srhtcn noted that “Many incidents happen during or right after release” and asked for advice on ways to fix this.
And he’s right! Rolling out new software is the proximate cause of the overwhelming majority of incidents, at companies of all sizes. Upgrading software is both a necessity and a minor insanity, considering how often it breaks things.
It will always carry some risk, because most issues are caused by humans and our pesky need for “improvements”. So what can be done?
It’s not ok for software releases to be scary and hazardous
First of all: If releasing is risky for you, you need to fix that. Make this a priority. Track your failures, practice post mortems, evaluate your on call practices and culture. Know if you’re getting better or worse. This is a project that will take weeks if not months until you can be confident in the results.
You have to fix it though, because these things are self-reinforcing. If shipping changes is scary and fraught, people will do it less and it will get even MORE scary and treacherous.
Likewise, if you turn deploys into a non-cortisol-inducing event and set expectations accordingly, engineers will ship their code more often in smaller diffs, and therefore break the world less.
Fixing deploys isn’t about eliminating errors, it’s about making your pipeline resilient to errors. It’s fundamentally about detecting common failures and recovering from them, without requiring human intervention.
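Here is a minimal sketch of what “recovering without human intervention” can mean at the pipeline level. The deployer command and the canary URL are placeholders for whatever your shop actually uses; the deploy-watch-revert shape of the loop is the point.

package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"time"
)

// deployAndWatch rolls out a build, polls a health endpoint, and reverts
// automatically if the new build looks wrong. The "deployer" command and
// the canary URL are placeholders for whatever your pipeline really uses.
func deployAndWatch(buildID string) error {
	if err := exec.Command("deployer", "rollout", buildID).Run(); err != nil {
		return err
	}
	deadline := time.Now().Add(5 * time.Minute) // the bake window
	for time.Now().Before(deadline) {
		resp, err := http.Get("http://canary.internal/healthz")
		ok := err == nil && resp.StatusCode == http.StatusOK
		if err == nil {
			resp.Body.Close()
		}
		if !ok {
			// Failure is routine, not an emergency: recover first, page second.
			exec.Command("deployer", "rollback", buildID).Run()
			return fmt.Errorf("build %s failed health checks and was rolled back", buildID)
		}
		time.Sleep(15 * time.Second)
	}
	return nil // the build survived its bake window
}

func main() {
	if err := deployAndWatch("build-1234"); err != nil {
		fmt.Println(err)
	}
}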
Value your tools more
As a short-term patch, you should run deploys in the mornings or whenever everyone is around and fresh. Then take a hard look at your deploy pipeline.
In too many organizations, deploy code is a technical backwater, an accumulation of crufty scripts and glue code, forked gems and interns’ earnest attempts to hack up Capistrano. It usually gives off a strong whiff of “sloppily evolved from many 2 am patches with no code review”.
This is insane. Deploy software is the most important software you have. Treat it that way: recruit an owner, allocate real time for development and testing, bake in metrics and track them over time.
If it doesn’t have an owner, it will never improve. And you will need to invest in frequent improvements even after you’re over this first hump.
Signal high organizational value by putting one of your best engineers on it.
Recruit help from the design side of the house as well. The “right” thing to do must be the fastest, easiest thing to do, with friendly prompts and good docs. No “shortcuts” for people to reach for at the worst possible time. You need user research and design here.
Track how often deploys fail and why. Managers should pay close attention to this metric, just like the one for people getting interrupted or woken up, and allocate time to fixing this early whenever it sags. Before it gets bad.
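The smallest possible version of this, assuming nothing about your stack: wrap your deploy in code that emits one structured event per run, so failure rate and duration become graphs instead of guesses. Field names here are illustrative.

package main

import (
	"encoding/json"
	"errors"
	"log"
	"time"
)

// recordDeploy emits one structured event per deploy, so "how often do
// deploys fail, and why" becomes a graph instead of a guess. Send it
// wherever your metrics live; logging it is just the simplest stand-in.
func recordDeploy(buildID string, start time.Time, err error) {
	evt := map[string]interface{}{
		"build_id":   buildID,
		"duration_s": time.Since(start).Seconds(),
		"success":    err == nil,
	}
	if err != nil {
		evt["failure_reason"] = err.Error() // capture the why, not just the count
	}
	b, _ := json.Marshal(evt)
	log.Println(string(b))
}

func main() {
	start := time.Now()
	deployErr := errors.New("migration step timed out") // stand-in outcome
	recordDeploy("build-1234", start, deployErr)
}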
Allocate real time for development, testing, and training — don’t expect the work to get shoved into people’s “spare time” or post mortem cleanup time. Make sure other managers understand the impact of this work and are on board. Make this one of your KPIs.
In other words, make deploy tools a first class citizen of your technical toolset. Make the work prestigious and valued — even aspirational. If you do performance reviews, recognize the impact there.
(Btw, “how we hardened our deploys” is total Velocity-bait (&& other practitioner conferences) as well as being great for recruiting and general visibility in blog post form. People love these stories; there definitely aren’t enough of them.)
Turn software engineers into software owners
The canonical CI/CD advice starts with “ship early, ship often, ship smaller change sets”. That’s great advice: you should definitely do those things. But they are covered plenty elsewhere. What’s software ownership?
Software ownership is the natural end state of DevOps. Software engineers, operations engineers, platform engineers, mobile engineers — everyone who writes code should own the full lifecycle of their software.
Software owners are people who:
Can deploy and roll back their own code
Are able to debug their own issues in prod (via instrumentation, not ssh)
Get paged first when their own code breaks, before whoever happens to be on call
If you’re lacking any one of those three ingredients, you don’t have ownership.
Why ownership? Because software ownership makes for better engineers, better software, and a better experience for customers. It shortens feedback loops and means the person debugging is usually the person with the most context on what has recently changed.
Some engineers might balk at this, but you’ll be doing them a favor. We are all distributed systems engineers now, and distributed systems require a much higher level of operational literacy. May as well start today.
Fail fast, fix fast
This is about shifting your mindset from one of brittleness and a tight grip, to one of flexibility where failures are no big deal because they happen all the time, don’t impact users, and give everyone lots of practice at detecting and recovering from them.
Here are a few of the best practices to adopt as you make this shift.
The engineer who writes the code and merges the PR should also run the deploy
Everyone who writes code must be trained in how to deploy, roll back, and revert to the last known good state (before escalating, if necessary). They should also know the basics of instrumentation, feature flagging, and debugging in prod.
After deploying, you MUST go verify: are your changes behaving as expected? Does anything else look … unexpected? You have the most context on what to expect; just two minutes spent verifying that things look reasonable will catch the overwhelming majority of errors before users even notice. (There’s a sketch of the instrumentation that makes this check possible right after this list.)
Everyone who puts software in production needs to understand and feel responsible for the full lifecycle of their code, not just how it works in their IDE.
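What does that verification look like in practice? One common approach (the BUILD_ID env var and the field names are assumptions, not a prescription) is to stamp every event your service emits with the id of the running build, so you can break down errors and latency by build_id and compare the build rolling out against the last known good one.

package main

import (
	"log"
	"net/http"
	"os"
	"time"
)

// buildID is stamped into the environment by the deploy pipeline; the
// mechanism here (an env var named BUILD_ID) is an assumption.
var buildID = os.Getenv("BUILD_ID")

func handle(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	w.Write([]byte("ok")) // the real work goes here
	// Stamping every event with build_id is what makes the two-minute
	// post-deploy check possible: break down errors and latency by
	// build_id and compare the new build against the last known good one.
	log.Printf("build_id=%s path=%s status=%d duration_ms=%d",
		buildID, r.URL.Path, http.StatusOK, time.Since(start).Milliseconds())
}

func main() {
	http.HandleFunc("/", handle)
	log.Fatal(http.ListenAndServe(":8080", nil))
}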
Baking: it’s not just for cookies
Shipping something to production is a process of incrementally gaining confidence, not a switch you can flip.
You can’t trust code until it’s been in prod a while, until you’ve seen it perform under a wide range of load and concurrency scenarios, in lots of partial failure modes. Only over time can you develop confidence in it not being terrible.
Nothing is production except production. Don’t rely on never failing; expect failure, embrace failure. Practice failure! Build guard rails around your production systems to help you find and fix problems quickly.
The changes you need to make your pipeline more resilient are roughly the same changes you need to safely test in production. These are a few of your guard rails.
Build canaries for your deploy process, so you can promote releases gracefully and automatically to larger subsets of your traffic as you gain confidence in them
Create cohorts. Deploy to internal users first, then any free tier, etc., in order of ascending importance. And don’t jump straight from 10% to 25% to 50% and then 100% — some changes only bite when they saturate backend resources, and the 50%-to-100% jump is the one that will kill you.
Have robots check the health of your software as it rolls out, and decide whether to promote the canary based on what they see (a sketch of this loop follows the list). Over time the robot checks will mature and eventually catch a ton of problems and regressions for you.
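A sketch of that promotion loop, with the traffic-shifting and telemetry plumbing stubbed out since every shop’s is different:

package main

import (
	"fmt"
	"time"
)

// cohorts in ascending order of importance; note the smaller steps near
// the top, since the 50%-to-100% jump is where backend saturation bites.
var cohorts = []string{"internal", "free-tier", "10pct", "25pct", "50pct", "75pct", "100pct"}

// promote walks a release through the cohorts, letting it bake under real
// traffic and gating every step on automated health checks. routeCohort,
// healthy, and rollback are placeholders for your own plumbing; the shape
// of the loop is the point.
func promote(buildID string) error {
	for _, c := range cohorts {
		if err := routeCohort(c, buildID); err != nil {
			return err
		}
		time.Sleep(10 * time.Minute) // bake time before the robot decides
		if !healthy(buildID, c) {
			rollback(buildID)
			return fmt.Errorf("build %s failed in cohort %q; rolled back", buildID, c)
		}
	}
	return nil
}

func routeCohort(cohort, buildID string) error { return nil } // stub: shift traffic
func healthy(buildID, cohort string) bool      { return true } // stub: query telemetry
func rollback(buildID string)                  {}              // stub: revert traffic

func main() {
	fmt.Println(promote("build-1234"))
}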
The quality of code is not knowable before it hits production. You may be able to spot some problems, but you can never guarantee a lack of them. It takes time to bake a new release and gain incremental confidence in new code.
Get someone to own the deploy software
Value the work
Create a culture of software ownership
LOOK at what you’ve done after you do it
Be suspicious of new versions until they prove themselves