But if it feels like a demotion, or it’s too hard to swallow, or if the politics of promotions at this company make you leery: compromise by getting yourself hired as a staff or principal engineer, while being clear with your hiring manager that you plan to spend the first 6+ months slinging diffs. They should be fine with it (delighted, really) since a) ANY staff+ hire is an investment for the long run, b) this is a great way to onboard any staff+ engineer, and c) I don’t believe anybody can actually have staff+ level impact during their first 6-12 months at a company, since so much of the job has to do with people, process, technical context, systems history, etc which accrues over time.
Have fun, and do report back! Tell us how it goes!
P.S.: You don’t say how long it’s been, but I’m operating under the assumption that it’s been 5-10 years since you last worked as an engineer.
P.P.S.: 🚨unsolicited opinion alert🚨 I would personally avoid any role that includes “Architect” in its title (except solutions architects). To me, “software architect” rings of “someone who can no longer write code or perform as a software engineer, who has probably been at the same company for so long that their skills and knowledge now consist entirely of minutiae about that particular company’s technology. likely to be useless and/or helpless at any other company.” I say this with all due apologies to my architect friends, every one of whom is technically dazzling, operationally up-to-date, and has beautiful hair.💆 🥰
We have been interviewing and hiring a pile of engineering directors at Honeycomb lately. In so doing, I’ve had some fascinating conversations with engineering managers who have been trying unsuccessfully to make the leap to director.
Here is a roundup of some of the ideas and advice I shared with them, and the original twitter thread that spawned this post.
What is an engineering director?
Given all the title inflation and general inconsistency out there, it seems worth describing what I have in mind when I say or hear “Engineering Director.”
In a traditional org chart, an engineering manager usually manages about 5-8 engineers, an engineering director manages 2-5 engineering managers, and a VP of engineering manages the directors. (At big companies, you may see managers and directors reporting to other managers and directors, and/or you may find a bunch of ‘title padding’ roles like Senior Manager, Senior Director etc.)
In smaller companies, it’s common to have a “Head of Engineering” (this is an appropriately weaselly title that commands just the right amount of respect while leaving plenty of space to hire additional people below or above them). Or all of engineering might roll up to a director or VP or CTO. It varies a lot.
When it comes to the work a director is expected to do, though, there’s a fair bit of consistency: we expect managers to manage ICs, and directors to manage managers. Directors sit between the line managers and the strategic leadership roles. (More on this later.)
So if you’re an engineering manager, and you want to try being a director, the first thing you’ll want to understand is this: it is generally better to get there by being promoted than by getting offered a director title at a different company.
How to level up
Lots of engineers get tapped by their management to become managers, but not many become directors without a conversation and some intentional growth first. This means that for many of you, trying to become a director may be the first time you have ever consciously solicited a role outside the interview process. This can bring up feelings of awkwardness, even shamefulness or inappropriateness. You’ll just have to push through those.
If you ever want a job in upper leadership, you are going to have to learn how to shamelessly state your career goals. We want people in senior leadership who want to be there and are honing their skills in anticipation of an opportunity. Not “oops, I accidentally a VP.”
It is better to get promoted than hired up a level
There are a few reasons for this. It’s usually easier to get promoted than to get hired straight into a job you’ve never held before (at a org with high standards), and it also tends to be more sustainable/more likely to succeed if you get promoted in as well. Being a director is NOT just being a super-duper manager, it’s a different role and function entirely.
A lot of your ability to be successful as a director (or any kind of manager) comes from knowing the landscape, the product and the people, and having good relationships internally. When you are internally promoted, you already know the company and the people, so you get a leg up towards being successful. Whereas if you’ve just joined the company and are trying to learn the tech, the people, the relationships, and how the company works all at once, on top of trying to perform a new role for the first time.. well, that is a lot to take on at once.
There are exceptions, sure! Oodles of them. But I would frankly look sideways at a place that wanted to hire me as a director if I haven’t been one, or hadn’t at leastmanaged managers before. It’s at least a yellow flag. It tells me they are probably either a) very desperate or b) very sloppy with handing out titles.
If you want a promotion…
The obvious first step involves asking your manager, “what is the skill gap for me between the job I am doing right now and a director role?” Unlike in the movies, promotions don’t usually get surprise-dropped on people’s heads; people are usually cultivated for them. Registering your interest makes it more likely they will consider you, or help you develop skills in that direction as time moves on.
If you have a good manager who believes in you, and the opportunities exist at your company, that might even be all you have to do.(!)
And if so, lucky you. But as for the remaining ~80-90% of us (ha!) … well, we’ll need a bit more hustle.
Take inventory of your opportunities
Lots of companies aren’t large enough to need directors, or growing fast enough to create new opportunities. This can actually be the most challenging part of the equation, because there are generally a lot more managers who want to be directors than there are available openings.
If you do need to find a new job to reach your career goals, I would target fast-growing companies with at least 100 engineers. If you’re evaluating prospective employers based on your chance of advancement, consider the following::
Ask about their policies on internal vs external hires. Do they give preference to existing employees? How do they decide when to recruit vs grow from within?
Ask about the last time that someone was promoted into a similar role.
Tell the recruiter and hiring manager about your career goals. Don’t be shy. “My next career goal is to gain some experience managing managers” is fine. (That shouldn’t be the only reason you’re interested, of course.)
There are no sure bets. But you can do a lot to put yourself in the right place at the right time, signal your interest, and be prepared for the opportunity when it strikes.
a director is not a ‘super-senior manager’
A director is not just a manager on steroids: it is an entirely different job. It helps to have been a good manager before becoming a director, because many management skills will translate, but others will be entirely new to you. Expect this.
How being a good director is different from being a good manager
Let’s look at some of the ways that being a good engineering manager is different than being a good director.
You can be a great EM, beloved by your team, without giving much thought to managing out or up. Directors cannot. If anything, it’s the opposite. You may get away with not coddling your EMs, but you must pull your weight for your peers and upper management.
You can have a bit of a reputation for being stubborn or difficult as an EM, and that can be just fine. But having such a rep will probably sabotage your attempt at being promoted to director.
You can be a powerful technical EM who sometimes jumps in to train engineers, be on call, or course correct technical and architectural decisions. This can even burnish your value and reputation as an EM. But this would all be a solid knock against you as a director.
Managers can get away with being opinionated and attached to technology, to some extent, while directors absolutely must balance lots of different stakeholders to achieve healthy business outcomes.
This difference of perspective is why managers will sometimes sniff about directors having sold out, or being “all about politics.”
(Blaming something on “politics” is usually a way of accidentally confessing that you don’t actually understand the constraints someone is operating under, IMO.)
A director’s job is running the business
Here’s the key fact: ✨directors run the business✨.
Managers should be focused on high-performing engineering teams. VPs should be focused on strategy and the longer term. Directors are the execution machines that knit technology with business objectives. (I like this piece, although the lede is a little buried. Key graf:)
Directors run the business. They are accountable for results. You can’t be bopping in and writing or reviewing code, or tossing off technical opinions. That’s not your job anymore.
Managing managers is a whole new skill set
The distance between managing engineers and managing managers is nearly as vast as the gulf between being an engineer and being a manager.
But it’s sneakier, because you don’t feel out of your depth as much as you did when you became a manager. 😁
As a manager, each of us instinctively draws on our own unique blend of strength and charisma — whatever it is that makes people look up to you and willing to accept your influence. Most of us can’t explain how we do it, because we run on instinct.
But as a director, you have to figure it out. Because you need to be able to debug it when the magic breaks down. You need to help your managers influence and lead using *their* unique strengths. What works for you won’t work for them. You have to learn how to unpack different leadership styles and support them in the way they need.
If you’re working towards a director role:
There are lots of areas where you can improve your director skills and increase your chances of being viewed as director material without any help whatsoever from your manager.
You ✨can not✨ be a blocker
Directors run the business … so you CANNOT be seen as a blocker. People must come to you of their own accord to get shit done and break through the blockers.
If they are going to other people for advice on how to break through YOU, you are not a good candidate for director. Figure out how to fix this before you do anything else.
Demonstrate impact beyond your team(s)
Another way to make yourself an attractive prospect for director is to work on systemic problems, driving impact at the org or company level. You could:
work to substantially increase the diversity of your teams or your candidate pipeline, and offer to work with recruiting and other managers to help them do the same (becoming BFFs with recruiting is often a canny move)
drive some cross-platform initiative to consolidate dozens of snowflake deploy processes and significantly reduce CI/CD build/deploy times, set an internal SLO for artifact build times, or successfully champion auto-deployment
champion an internal tools team with a mandate to increase developer productivity, and quantify the hell out of it
lead a revamp of the new hire onboarding process. Or add training and structure to the interview process and set an SLO of responding to every candidate within one week
I dunno — it all depends on what’s broken at your company. 🙃 Identify something causing widespread pain and frustration at the organizational level and fix it.
Managing ‘up’ is not a ‘nice-to-have’…
If there’s a problem, make sure you are the one to bring it to your manager (and swiftly), along with “Here is the context, here’s where I went wrong, and this is what I’m planning to do about it.” No surprises.
At this point in your career, you should have mastered the art of not being a giant pain in the ass to your manager. Nobody wants a high-maintenance director. Do you reliably make problems go away, or do they boomerang back five times worse after you “fix” them?
…Neither is managing ‘out’
Managing “out” is important too. (Not “managing out”, which means terminating people from the company, but managing “out” as in horizontally, meaning your relationship with your peers.)
What do your peers think of you? Do you invest in those relationships? Do they see you as an ally and a source of wise counsel, or a source of chaos, gossip and instability, or a competitor with turf to protect? If you’re the manager that other managers seek out for a peer check, you might be a good candidate for director.
psst.. People are watching you
One of the most uncomfortable things to internalize if you climb the ladder is how much people will make snap judgments about you based on the tiniest fragments of information about you, and how those judgments may forever color the way they think of you or interact with you.
First impressions might be made by ten minutes together on the same zoom call…a few overheard fragments of people talking about you…even the expressions on your face as they pass you in the hallway. People will extrapolate a lot from a very little, and changing their impression of you later is hard work.
(Yes it’s frustrating, but you can’t really get upset about it, because you and I do it too. It’s part of being human. )
Because of that, you really do have to guard against being too cranky, too tired, or out of spoons. People WILL take it personally. It WILL come back to hurt you.
Remember, you don’t hear most feedback. If you visibly disagree with someone, assume 10x as many silently agree with them. If one person gives you a piece of hard feedback, assume 10x as many will never tell you. Be grateful. The more power you are perceived to have, the less feedback you will ever hear.
You can infer a surprising amount about how good a director candidate may be at their job, simply by listening closely to how they talk about their colleagues. Do they complain about being misunderstood or mistreated, do they minimize the difficulty or quality of others’ work, do they humblebrag, or do they take full responsibility for outcomes? And does their empathy fully extend to their peers in other departments, like sales and marketing?
Does it sound like they enjoy their work, and look forward to beginning it every day? Does it sound like they are all in the same little tugboat, all pulling in the same direction, or is there a baseline disconnect and lack of trust?
Be approachable, be a drama dampener, project warmth. Control your calendar and carve out regular focus time. Guard your energy — never run your engine under 30%, and always leave something in the tank.
There are a lot more great responses and advice in the replies to my thread, btw. Go check them out if you’re interested.. and if you have something to say, contribute!.☺️
 Occasionally, it may work out to your benefit to jump into a new, higher title at a new company. This can happen when someone is already well qualified for the higher role, but is finding it difficult to get promoted (possibly due to insufficient opportunity or systemic biases). Just be aware that the job you were hired into is likely to be one where the titles are meaningless and/or the roles are chaotic. You may want to stay just long enough to get the title, then bounce to a healthier org.
There are two anxieties I hear people express above all the rest.
The first one is something I hear over and over again, particularly from first-time managers as they contemplate the possibility of leaving management and returning to IC (individual contributor) work as an engineer:
“What if I never get another shot at people management?” “Maybe this is the only chance I’ll ever get … and I’m about to give it up??” “Am I going to regret this?”
“Will I ever get another shot at management?”
People decide to go back to engineering for lots of reasons. Maybe they’re burned out, or they work someplace with a poisonous management culture, or they’re having a kid and want to return to a role that feels more comfortable for a while. Or maybe they’ve been managing teams for a few years now, and have decided it’s time to go back to the well and refresh their technical skills in the interest of their long-term employability.
Regardless, these are not typically people who disliked being a manager. Rather they tend to be engineers who really enjoyed people management, and find it bittersweet to give up. Maybe they will miss the strategic elements and roadmap work, but they’re excited to clear their calendar and spend time in flow again, or they will miss having 1x1s but can’t wait to have time to mentor people. Whatever. They want to manage teams again someday, and worry they won’t get another chance.
Their anxiety is understandable! Lots of people feel like they waited a long time to be tapped for management, or like they were passed over again and again. Our cultural scripts about management definitely contribute to this sense of scarcity and diminution of agency (i.e. that management is a promotion, it is bestowed on you by your “superiors” as a reward for your performance, and it is pushy or improper to openly seek the role for yourself).
This anxiety is also, in my experience, ridiculously misplaced. ☺️
Once a manager, marked for life as a manager
You may have struggled to get your first opportunity to manage a team. But it’s a whole different story once you’ve done the job. Now you have the skills and the experience, and people can smellit on you.
I’m not joking. If you’re a good manager it’s actually nearly impossible to hide that you have the skills, because of the way it infuses your work and everything that you do as an IC. You get better at prioritization, more attuned to the needs of the business, and restless about work that doesn’t materially move the business forward. You get better at asking questions about why things need to be done and at communicating with stakeholders. You get better at motivating the people you work with, understanding their motivations and your own, and mediating conflicts or putting a damper on drama between peers. People come to you for advice and may seem to just do what you say, or go where you point.
Senior engineers with management experience are worth their weight in gold. They are valuable contributors and influential teammates. It’s a palpable shift! And every experienced manager in their vicinity will sense it.
So yes, you will be tapped for management again. And again and again and again. You are more likely to spend the rest of your career fending off management “opportunities” with a baseball bat than you are to wither away, pining for another shot.
There is a chronic shortage of good engineering managers, just like there is a chronic shortage of good, empathetic managers in every line of work. The challenge you will face from now on will not be about getting the chance to manage a team, but about being intentional and firm in carving out the time you need to recover and recharge your skills as an engineer.
“Am I too rusty to go back to engineering?”
The second anxiety is in some ways a mirror of the first:
“Can I still perform as an engineer?” “Will anyone hire me for an engineering role?” “Has it been too long, am I too rusty, will I be able to pull my weight?”
This is a more materially valid concern than the first one, in my opinion. Your engineering skills do wither and erode as time goes on. It will take longer and longer to refresh your skills the longer you go without using them. Management skills don’t decay in the same way that technical ones do, nor do they go out of date every few years as languages, frameworks and technologies tend to do.
If you aren’t interested in climbing the ladder and becoming a director or VP — or rather, if you aren’t actively, successfully climbing the ladder — you should have a strategy for keeping your hands-on skills sharp, because your ability to be a strong line manager is grounded in your own engineering skills.
Never, ever accept a managerial role until you are already solidly senior as an engineer. To me this means at least seven years or more writing and shipping code; definitely, absolutely no less than five. It may feel like a compliment when someone offers you the job of manager — hell, take the compliment 🙃 — but they are not doing you any favors when it comes to your career or your ability to be effective.
When you accept your first manager job, I think you should make a commitment to yourself to stick it out for two years. That’s how long it takes to rewire your instincts and synapses, to learn enough that you can tell whether you’re doing a good job or not.
After two or three years of management, it’s still pretty easy to go back to engineering. After five years, it gets progressively harder. But it can be done. And it should be worth it to your employer to invest in keeping you while you refresh your skills over the six months or whatever it may take. Insist on it, if you must. It’s better to refresh your skills while employed, on a system and codebase you’re familiar with, than to find yourself struggling to brush up enough to pass a coding interview.
Engineering fluency == job security
There is one more reason to refresh your engineering skills from time to time, one I don’t often see mentioned, and that is job security and optionality.
The higher you go up the ladder, the more money you will get paid…but the fewer jobs there be, and the fewer still that match your profile.
As a senior software engineer, there are fifteen bajillion job openings for you. Everyone wants to hire you. You can get a new job in a matter of days, no matter how picky you want to be about location, flexibility, technologies, product types, whatever. You’ve reached Peak Hire.
If you are looking for management roles, there will be an order of magnitude fewer opportunities (and more idiosyncratic hiring criteria), but still plenty for the most part. But for every step up the ladder you go, the opportunities drop by another order of magnitude, and the scrutiny becomes much more intense and particular. If you’re looking for VP roles, it may take months to find a place you want to work at, and then they might not choose you. ¯\_(ツ)_/¯
Maintaining your technical chops is a stellar way to hedge against uncertainties and maintain your optionality.
How can you tell the companies who are earnestly trying to improve apart from the ones who sound all polished and healthy from the outside, whilst rotting on the inside?
This seems to be on a lot of minds right now, what with the Great Resignation and all. There are no perfect companies, just like there are no perfect relationships; but there are many questions and techniques you can use to increase your confidence that a particular company is decent and self-aware, one whose quirks and foibles you are compatible with.
There's an asymmetry of information that makes it very challenging to pick between the companies that are earnestly trying to do better and the ones that say the right things but are actively evil.
Interviews are designed to make you feel like you are under inspection, like the interviewer holds all the power. This is an illusion. Your labor is valuable — it is vital — and you should be scrutinizing them every bit as closely as they you. In fact, here is Tip #1:
If they allow you plenty of time to converse with your interviewers throughout the process, great. If they tack on a cursory “any questions for us?” while wrapping up, they don’t think it matters what you think of them. Pull the ripcord.
Collect and practice good interview questions for you to ask potential employers. Write them down — your mind is likely to go blank under stress, and you don’t want to let them off the hook. There is a LOT of signal to be gained by probing down below the surface answers.
Whisper networks and backchannels are incredibly important. It can be especially valuable to talk to someone who has recently left the company: why did they leave? Would they go there again?
Alternately, do you know anyone who has worked for or with their leadership, even if not at that company?
If you know any women or under-represented minorities (URM) who work there, buy them lunch and ask for the unvarnished truth. That’s where you usually turn up the real dirt. 🥂
Diversity, equity and inclusion
Just because a company has a diverse workforce doesn’t necessarily mean it is a healthy place to work. (But it’s fair to give some points up front, because that doesn’t usually happen by accident.)
Do they have a diverse leadership team? A diverse board?
Is their company diverse overall, or are minorities concentrated in a few (lower-paying, high-turnover) departments?
You might not want to write off all the companies that don’t meet points one and two, if for no other reason than it dramatically shrinks your available option pool. If they don’t have a particularly diverse team, and this is something that matters to you, that’s your cue to dig deeper:
Are they bothered by their lack of diversity? What’s the plan? Do they just feel generically sad about it, or have they set specific goals to improve by specific dates? What investments are they making?
Who works on DEI stuff currently? (Answers like “HR and recruiting”, or “we have a woman who’s really good at it” are bad answers.)
Who is accountable for making sure those goals are hit? (The only right answer is “our execs”. Having a “chief diversity officer” is an anti-pattern in my book.)
If the team is all guys, for example, ask if they’ve ever had any women on the team in the past. Did she/they leave? Do they know why?
This is a GREAT one: “As a white man, I’d ask what they’ve done to find qualified women and minorities for the role I’m interviewing for.” (via David Daly) 🔥🔥
What are their values? Do they feel bloodless and ripped from the pages of HBR, or are they unique, lived-in, and give you a glimpse of what the people there care about? Are they mentioned over the course of your interview?
Ask tough questions about the business and try to ascertain whether they are hitting their quarterly goals, how much funding they have in the bank, what the growth curve looks like, what users really think about their product, and what the biggest obstacles to success are.
Companies that are floundering are going to be really stressful places to work, and even if the leadership is decent, they may find themselves backed into making some really tough decisions.
You want to work at a company on a strong growth trajectory for lots of reasons, but a big one is your own growth potential. You will learn the most the fastest at places that are growing fast, and have way more openings for promotions and leadership roles than a slower-growing company.
Are people willing to speak freely about things they’ve tried that have failed, and things that don’t work well currently? Being self-aware and comfortable with visible failure are two of the most important self-correcting mechanisms a company can cultivate.
EVERYBODY thinks they value transparency, so I wouldn’t even bother asking. Instead, ask for specific examples of leadership being forthcoming with bad news to the team, and team members delivering hard feedback or bad news to upper leadership. Transparency shouldn’t be something they’re especially proud of, so much as it is taken for granted. It’s in the air that you breathe.
Planning and the unplanned
Ask about how decisions get made. A chestnut is, “how does work end up on my plate?” — meaning is there a business strategy (owned by whom?), a technical strategy, a product strategy, quarterly KPIs, customer requests, manager delegation, JIRA tickets…? (The most important part may be how similar the answers you get are. 🙃)
How often does work get pre-empted and why? It’s a good thing if product development has to get put on ice once in a while so the team can focus on reliability and maintenance work. It’s a bad thing if they’re expected to stuff reliability work in the cracks around their product development, or if they’re incapable of sticking to a plan.
What does “crunch time” look like? Nearly every company has one from time to time (it might even be a bad sign if it never happens), but this is when you find out your leadership’s true colors.
Do they praise people or call them out to thank them for pulling all nighters and other extremist behaviors? 🚨BZZT🚨
Is it voluntary? Are you trusted to set your own pace, your own limits, or are you pressured to do more? Are people expected to participate to the extent that they are able, and not expected to justify how hard or how much (so long as they communicate their capacity, of course)?
How long did it last, and how often does it happen, and why? It should be rare (1-2x/year at most), involve the whole company, and move the business forward meaningfully
Did they follow through by making sure people took time off afterwards to recover? Not just give permission, but actually make sure the human beings had a chance to refresh themselves? Did leaders set a good example by taking a breather themselves?
Believe it or not, crunch time done correctly can be an enormously exciting, intense, bonding time for a group of people who love what they do, culminating in a surge of collective triumph and celebration, followed by recovery time. If it was done correctly, and you ask about it afterwards, people will 💡light up💡.
Unfortunately, culture can vary widely from function to function, even from manager to manager. Make sure you get a real interview slot with your actual manager — not just a screener or wrap-up call — and as much of the team as possible, too.
Ask your potential manager about the last person they had to let go. Why? What was the process? What was the impact on the team? How did the person feel afterwards?
Who is on call? How often do people get paged outside of hours, and how frequently do they work an incident? (Do managers track this?) Are you expected to keep shipping product during on call weeks, or devote your time to making the system better?
If you had to ship a single line of code to production using the deploy pipeline, how long would that take? Remember, the lower the deploy interval, the happier and more productive you are likely to be as an engineer there. Under 15 minutes is great. Under an hour is tolerable. More than that, proceed with great caution.
The interview itself
Was your interview well-organized and conducted in a timely fashion? Were you given detailed information about what to expect, and were your interviewers well-prepared, and conversational? Were the questions fair, open-ended, and relevant to the job in question?
If they asked you to perform any kind of take-home labor of more than an hour or so, did they compensate you for your time?
Did they get back to you swiftly at each step of the way to let you know where you stand and what comes next?
Did you find the questions interesting and challenging? Do they have a clear idea of what success looks like for you in this role? Did you leave excited and buzzing with ideas about the work you could do together?
This is 👆 definitely more of a “how good is this job” question than “is this a shithole” question, but one of our honeycombers brought it up as an example of how a great interview can make you decide to leave a job , even one you’re perfectly happy with.
The questions they ask you while interviewing you are the questions they ask everyone else. So…did they ask you about your views on diversity and team dynamics while interviewing you? Or is that not part of their filters, only their advertised persona?
Do their employees seem to speak freely on twitter? If you are an agitator of sorts, are there others who agitate about similar issues — either with company support, or at least lack of censorship?
How does the company respond to criticism and feedback? For that matter, how do they treat their competitors? Being competitive is fine, being mean is not.
Get clear on your own expectations. What’s on your wishlist, and what’s make-or-break for you? If something is very important to you, consider telling the hiring manager up front. For example, “These are my expectations for how women are treated. How do you think your company matches up?” Their answers will speak volumes, and so will their comfort level with the question.
If you a join a new company, and two or three weeks in you’re just not feeling it, you’re wondering if you made a mistake — leave. You do not owe them a year of your life. Trust your instincts. Just leave it off your resume entirely and roll the dice again.
Employers are all too accustomed to feeling (and acting) like they hold all the power. They do not. Every tech company is a talent business, which rises and falls on the caliber of the people they can convince to stay. They aren’t doing you a favor by employing you; you are doing THEM a favor by lending them your creativity, labor, and a third of the hours in your day.
Do they deserve it? Will their success make the world a better place? If not, stop supporting them with your work and lend your muscle to a company that deserves it.
In the hottest job market of my lifetime, with millions of opportunities newly open to people who live literally anywhere, you owe it to yourself, your future self, and your family to take a good hard look at where you sit. 🍄 Are you happy? 🍄 Are you compensated well, is your time valued? 🍄 Are you still learning new things and improving your skills every day? 🍄 Is your company still on a growth trajectory? 🍄 Do you trust your leadership and your team, 🍄 do still you believe in the mission, and 🍄 do you think your labor contributes meaningfully to making the world a better place?
If not, consider joining the Great Resignation. I hear they have cookies.
Honeycomb has a reputation for being a very engineering-driven company. No surprise there, since it was founded by two engineers and our mission involves building an engineering product for other engineers.
We are never going to stop being engineering-driven in the sense that we are building for engineers and we always want engineers to have a seat at the table when it comes to what we build, and how, and why. But I am increasingly uncomfortable with the term “engineering-driven” and the asymmetry it implies.
We are less and less engineering-driven nowadays, for entirely good reasons — we want this to be just as much of a design-driven company and a product-driven company, and I would never want sales or marketing to feel like anything other than equal partners in our journey towards revolutionizing the way the world builds and runs code in production.
It is true that most honeycomb employees were engineers for the first few years, and our culture felt very engineer-centric. Other orgs were maybe comprised of a person or two, or had engineers trying to play them on TV, or just felt highly experimental.
But if there is one thing Christine and I were crystal clear on from the beginning, it’s this:
✨WE ARE HERE TO BUILD A BUSINESS✨
Not just the shiniest, most hardcore tech, not just the happiest, most diverse teams. These things matter to us — they matter a lot! But succeeding at business is what gives us the power to change the world in all of these other ways we care about changing it.
If business is booming and people are thirsty for more of whatever it is you’re serving, you pretty much get a blank check for radical experiments in sociotechnical transformation, be that libertarian or communitarian or anything in between.
If you don’t have the business to back it up, you get fuck-all.
Not only do you not get shit, you risk being pointed out as a Cautionary Tale of “what happens if $(thing you deeply care about) comes true.” Sit on THAT for a hot second. 😕
So yes, we cared. Which is not to say we knew how to build a great business. We most certainly did not. But both of us had been through too many startups (Linden Lab, Aardvark, Parse, etc) where the tech was amazing, the people were amazing, the product was amazing … and the business side just did not keep up. Which always leads to the same thing: heartbreak and devastation.
✨IF you can’t make it profitable✨
✨your destiny will inevitably be✨
✨taken out of your hands✨
✨and given to someone else.✨
So Christine and I have been repeating these twin facts back and forth to each other for over six years now:
Honeycomb must succeed as a revenue-generating, eventually profitable business.
We are not business experts. Therefore we have to make Honeycomb a place that explicitly values business expertise, that places it on the same level as engineering expertise.
We have worked hard to get better at understanding the business side (her more than me 🙃) but ultimately, we cannot be the domain experts in marketing or sales (or customer success, support, etc).
What we could do is demonstrate respect for those functions, bake that respect into our culture, and hire and support amazing business talent to run them with us.
On being “engineering-driven”
Self-described “engineering-driven” companies tend to fall into one of two traps. Either they alienate the business side by pinching their nose and holding business development at arm’s length (“aaahhh, i’m just an engineer! I have no interest in or capacity for participating in developing our marketing voice or sales pitch decks, get off me!”), or they act like engineering is a sort of super-skillset that makes you capable of doing everybody else’s job better than they possibly could. As though those other disciplines and skill sets aren’t every bit as deep and creative and challenging in their own right as developing software can be.
For the first few years we really did use engineers for all of those functions. We were trying to figure out how to build and explain and sell something new, which meant working out these things on the ground every day with our users. Engineer to engineer. What resonated? What clicked? What worked?
So we hired just a few engineers who were interested in how the business worked, and who were willing to work like Swiss Army knives across the org. We didn’t yet have a workable plan in place, which is what you need in order to bring domain experts on board, point them in a direction, and trust them to do what they do best in executing that plan.
Like I said, we didn’t know what to do or how to do it. But at least we knew that. Which kept us humble. And translated into a hard, fast rule which we set early on in our hiring process:
We DO NOT hire engineers who talk shit about sales and marketing.
If I was interviewing an engineer and they made any alienating sort of comment whatsoever about their counterparts on the business side, it was an automatic no. Easy out. We had a zero tolerance policy for talking down down about other functions, or joking, even for being unwilling to perform other business functions.
In retrospect, I think this is one of the best decisions we ever made.
Hiring engineers who respected other functions
We leaned hard into hiring engineers who asked curious questions about our business strategy and execution. We pursued engineers who talked about wanting to spend time directly with users, who were intrigued by the idea of writing marketing copy to help explain concepts to engineers, and who were ready, willing and able to go along on sales calls.
Once we finally found product-market fit, about 2-3 years ago, we stopped using engineers to play other roles and started hiring actual professionals in product, design, marketing, sales, customer success, etc to build and staff out their organizations. That was when we first started building the business for the longer term; until we found PMF, our event horizon was never more than 1-3 months ahead of “right now”.
(I’ll never forget going out to coffee with one of our earlier VPs of marketing, shortly after she was hired, and having her ask, in bemusement: “Why are all these engineers just sitting around in the #marketing channel? I’ve never had so many people giving opinions on my work!” 🙃)
Those early Swiss Army knife engineers have since stepped back, gratefully, into roles more centered around engineering. But that early knee-jerk reaction of ours established an important company norm that pays dividends to this day.
Every function of the business is equally challenging, creative, and worthy of respect. None of us are here to peacock; our skill sets serve the primary business goals. We all do our jobs better when we know more about how each other’s functions work.
These days, it’s not just about making sure we hire engineers who treat business counterparts like equals. It’s more about finding ways to stimulate the flow of information cross-functionally, creating a hunger for this information.
Caring about the big picture is a ✨learnable skill✨
You can try to hire people who care about the overall business outcomes, not just their own corner of reality, and we do select for this to some degree — for all roles, not just engineers.
But you can also foster this curiosity and teach people to seek it out. Curiosity begets curiosity, and every single person at Honeycomb is doing something interesting. We all want to succeed and win together, and there’s something infectious and exciting about connecting all the dots that lead to success and reflecting that story back to the rest of the company.
Every time we close a deal, a post gets written up and dropped into the announcements slack by the sales team. Not just who did we close and how much money did we make, but the full story of that customer’s interaction with honeycomb. How did they hear of us? Whose blog posts, training sessions, or office hours did they engage with? Did someone on the telemetry pull a record-fast turnaround on an integration they needed to get going? What pains did we solve effectively for them as a tool, and where were the rough edges that we can improve on in the future?
The story is often half a page long or more, and tags a dozen or more people throughout all parts of the company, showing how everyone’s hard work added up and materially contributed to the final result.
We have orgs take turn presenting in all hands — where they’re at, what they’ve built, and the impact of their contributions, week after week. Whether that’s the design team talking about how they’ve rolled out our new design system and how it is going to help everyone in the company experiment more and move more quickly, or it’s the people team showing how they’ve improved our recruiting, interviewing, and hiring processes to make people feel more seen, welcomed, and appreciated throughout the process.
We expect people to be curious about the rest of the company. We expect honeybees to be interested in, excited about and celebratory of each other’s hard work. And it’s easy to be excited when you see people showing off work that they were excited to do.
We have a weekly Friday “demo day” where people come and show off something they’ve built this week, rapid-fire. Whether it’s connecting to a mysql shell in the terminal to show off our newly consistent permissioning scheme, or product marketing showing off new work on the website. Everybody’s work counts. Everybody wants to see it.
We have a #love channel in slack where you can drop in and tag someone when you’re feeling thankful for how much they just made your day better. We also have a “Gratefuls” section during all hands, where people speak up and give verbal props to coworkers who have really made a difference in their lives at work.
We have always attracted engineers who care about the business, not just the technology and the culture. As a result, we have consistently recruited and retained business leaders who are well above our weight class — our investors still sometimes marvel at the caliber of the business talent we have been able to attract. It is way above the norm for developer tools companies like ours.
“Engineering-driven” can be a mask for “engineering-supremacy”
Because the sad truth is that so many companies who pride themselves on being “engineering-driven” are actually what I would call more “engineering-supremacist”. Ask any top-tier sales or marketing leader out there about their experiences in the tech industry and you’ll hear a painful, rage-inducing list of times they were talked down to by technical founders, had their counsel blown off or overridden, had their plans scrapped and their budgets cut, and every other sort of disrespectful act you can imagine.
(I am aware that the opposite also exists; that there are companies and cultures out there that valorize and glorify sales or marketing while treating engineers like code monkeys and button pushers, but it’s less common around here. In neither direction is this okay.)
This isn’t good for business, and it isn’t good for people.
It is still true that engineering is the most mature and developed organization at the company, because it has been around the longest. But our other orgs are starting to catch up and figure out what it means to “be honeycomby” for them, on their own terms. How do our core values apply to the sales team, the developer evangelists, the marketing folks, the product people? We are starting to see this play out in real time, and it’s fascinating. It is better than forcing all teams to be “engineering-driven”.
Business success is what makes all things possible
We are known for the caliber of our engineering today. But none of that matters a whit if you never hear about us, or can’t buy us in a way that makes sense for you and your team, or if you can’t use the product, or if we don’t keep building the right things, the things you need to modernize your engineering teams and move into the future together.
When looking towards that future, I still want us to be known for our great engineering. But I also want us to be a magnet for great designers who trust that they can come here and be respected, for great product people who know they can come here and do the best work of their life. That won’t happen if we see ourselves as being “driven” by one third of the triad.
Supremacy destroys balance. Always.
And none of this, none of this works unless we have a surging, thriving business to keep the wind in our sails.
We hear this phrase constantly: “I worked at breaking down silos.” “We need to break down silos.” “What did I do in my last role? I broke down silos.”
It sets my fucking teeth on edge.
What is a ‘silo’, anyway? What specifically wasn’t working well, and how did you solve it; or how was it solved, and what was your contribution to the solution? did you just follow orders, or did you personally diagnose the problem, or did some of your suggestions pan out?
Solutions to complex problems rarely work on the first go, so … what else did you try? How did you know it wasn’t working, how did you know when to abandon earlier ideas? It’s fiendishly hard to know whether you’ve given a solution enough time to bake, for people to adjust, so that you can even evaluate whether it works better or worse off than before.
Communication is not magic pixie dust
Breaking down silos is supposed to be about increasing communication, removing barriers and roadblocks to collaboration.
But you can’t just blindly throw “more communication” at your teams. Too much communication can be just as much of a problem and a burden as too little. It can distract, and confuse, and create little eddies of information that is incorrect or harmful.
The quantity of the communication isn’t the issue, so much as the quality. Who is talking to whom, and when, and why? How does information flow throughout your company? Who gets left out? Whose input is sought, and when, and why? How can any given individual figure out who to talk to about any given responsibility?
When someone says they are “breaking down silos”, whether in an interview, a panel, or casual conversation, it tells me jack shit about what they actually did.
cliches are a substitute for critical thinking
It’s just like when people say “it’s a culture problem”, or “fix your culture”, or “everything is about people”. These phrases tell me nothing except that the speaker has gone to a lot of conferences and wants to to sound cool.
If someone says “breaking down silos”, it immediately generates a zillion questions in my mind. I’m curious, because these problems are genuinely hard and people who solve them are incredibly rare.
Unfortunately, the people who use these phrases are almost never the ones who are out there in the muck and grind, struggling to solve real problems.
When asked, people who have done the hard labor of building better organizations with healthy communication flows, less inefficiency, and alignment around a single mission — people who have gotten all the people rowing in the same direction — tend to talk about the work.
People who haven’t, say they were “breaking down silos.”
There exist some wonderful teams out there who have valid, well thought through, legitimate reasons for enforcing “NO FRIDAY DEPLOYS” week in and week out, for not hooking CI/CD up to autodeploy, and for not shipping one person’s changes at a time.
“There is nothing anyone could say or do to convince me that the best thing I can do as a manager to protect their weekends is not blocking Friday deploys. I just feel it in my gut. I just know.” (… I got nuthin)
We’re humans. 💜 We leap to conclusions with the wetware we have doing the best it can based on heuristics that feel objectively true, but are ultimately just emotional reactions based on past lived experience. And then we retroactively enshrine those goofy gut feelings with the language of noble motive and moral values.
“I tell people not to deploy to production … because I care so deeply about my team and their ability to have a quiet weekend.”
Barf. 🙄 That’s just like saying you tell your kid not to brush his teeth at night, because you care SO DEEPLY about him and his ability to go to bed calm and happy.
Once the retcon engine in your brain gets running, it comes up with all sorts of reasons. Plausible-sounding reasons! But every single argument of the items in the list above is materially false.
Deploy myths are never going away for good; they appeal to too many of our cognitive biases. But what if there was one simple thing you could do that would invert many of these cognitive biases and cause people to grapple with the question in a new way? What if you could kickstart a recalculation?
My next post will pick up right here. I’ll tell you all about the One Simple Trick you can do to fix your deploys and set you on the virtuous path of high-performing teams.
Til then, here’s what I’ve previously written on the topic.
Availability bias: The tendency to overestimate the likelihood of events with greater “availability” in memory, which can be influenced by how recent the memories are or how unusual or emotionally charged they may be.
Continued influence effect: The tendency to believe previously learned misinformation even after it has been corrected. Misinformation can still influence inferences one generates after a correction has occurred.
Conservatism bias: The tendency to revise one’s belief insufficiently when presented with new evidence.
Default effect: When given a choice between several options, the tendency to favor the default one.
Dread aversion: Just as losses yield double the emotional impact of gains, dread yields double the emotional impact of savouring
False-uniqueness bias: The tendency of people to see their projects and themselves as more singular than they actually are.
Functional fixedness: Limits a person to using an object only in the way it is traditionally used
Hyperbolic discounting: Discounting is the tendency for people to have a stronger preference for more immediate payoffs relative to later payoffs. Hyperbolic discounting leads to choices that are inconsistent over time – people make choices today that their future selves would prefer not to have made, despite using the same reasoning
IKEA effect: The tendency for people to place a disproportionately high value on objects that they partially assembled themselves, such as furniture from IKEA, regardless of the quality of the end product
Illusory truth effect: A tendency to believe that a statement is true if it is easier to process, or if it has been stated multiple times, regardless of its actual veracity.
Irrational escalation: The phenomenon where people justify increased investment in a decision, based on the cumulative prior investment, despite new evidence suggesting that the decision was probably wrong. Also known as the sunk cost fallacy
Law of the instrument: An over-reliance on a familiar tool or methods, ignoring or under-valuing alternative approaches. “If all you have is a hammer, everything looks like a nail”
Mere exposure effect: The tendency to express undue liking for things merely because of familiarity with them
Negativity bias: Psychological phenomenon by which humans have a greater recall of unpleasant memories compared with positive memories
Non-adaptive choice switching: After experiencing a bad outcome with a decision problem, the tendency to avoid the choice previously made when faced with the same decision problem again, even though the choice was optimal
Omission bias: The tendency to judge harmful actions (commissions) as worse, or less moral, than equally harmful inactions (omissions).
Ostrich effect: Ignoring an obvious (negative) situation
Plan continuation bias: Failure to recognize that the original plan of action is no longer appropriate for a changing situation or for a situation that is different than anticipated
Prevention bias: When investing money to protect against risks, decision makers perceive that a dollar spent on prevention buys more security than a dollar spent on timely detection and response, even when investing in either option is equally effective
Pseudocertainty effect: The tendency to make risk-averse choices if the expected outcome is positive, but make risk-seeking choices to avoid negative outcomes
Salience bias: The tendency to focus on items that are more prominent or emotionally striking and ignore those that are unremarkable, even though this difference is often irrelevant by objective standards
Selective perception bias: The tendency for expectations to affect perception
Status-quo bias: If no special action is taken, the default action that will happen is that the code will go live. You will need an especially compelling reason to override this bias and manually stop the code from going live, as it would by default.
Slow-motion bias: We feel certain that we are more careful and less risky when we slow down. This is precisely the opposite of the real world risk factors for shipping software. Slow is dangerous for software; speed is safety. The more frequently you ship code, the smaller the diffs you ship, the less dangerous each one actually becomes. This is the most powerful and difficult to overcome of all of our biases, because there is no readily available counter-metaphor for us to use. (Riding a bike is the best I’ve come up with. 😔)
Surrogation: Losing sight of the strategic construct that a measure is intended to represent, and subsequently acting as though the measure is the construct of interest
Time-saving bias: Underestimations of the time that could be saved (or lost) when increasing (or decreasing) from a relatively low speed and overestimations of the time that could be saved (or lost) when increasing (or decreasing) from a relatively high speed.
Zero-risk bias: Preference for reducing a small risk to zero over a greater reduction in a larger risk.
I’ve fallen way behind on my blog posts — my goal was to write one per month, and I haven’t published anything since MAY. Egads. So here I am dipping into the drafts archives! This one was written in April of 2016, when I was noodling over my CraftConf 2016 talk on “DevOps for Developers (see slides).”
So I got to the part in my talk where I’m talking about how to interview and hire software engineers who aren’t going to burn the fucking house down, and realized I could spend a solid hour on that question alone. That’s why I decided to turn it into a blog post instead.
Stop telling ops people to code better, start telling SWEs to ops better
Our industry has gotten very good at pressing operations engineers to get better at writing code, writing tests, and software engineering in general these past few years. Which is great! But we have not been nearly so good at pushing software engineers to level up their systems skills. Which is unfortunate, because it is just as important.
Most systems suffer from the syndrome of running too much software. Tossing more software into the heap is as likely to cause more problems as often as it solves them.
We see this play out at companies stacked with good software engineers who have built horrifying spaghetti messes of their infrastructure, and then commence paging themselves to death.
The only way to unwind this is to reset expectations, and make it clear that
you are still responsible for your code after it’s been deployed to production, and
operational excellence is everyone’s job.
Operations is the constellation of tools, practices, policies, habits, and docs around shipping value to users, and every single one of us needs to participate in order to do this swiftly and safely.
Every software engineering interviewing loop should have an ops component.
Nobody interviews candidates for SRE or ops nowadays without asking some coding questions. You don’t have to be the greatest programmer in the world, but you can’t be functionally illiterate. The reverse is less common: asking software engineers basic, stupid questions about the lifecycle of their code, instrumentation best practices, etc.
It’s common practice at lots of companies now to have a software engineer in the loop for hiring SREs to evaluate their coding abilities. It should be just as common to have an ops engineer in the loop for a SWE hire, especially for any SWE who is being considered for a key senior position. Those are the people you most rely on to be mentors and role models for junior hires. All engineers should embrace the ethos of owning their code in production, and nobody should be promoted or hired into a senior role if they don’t.
And yes, that means all engineers!Even your iOS/Android engineers and website developers should be interested in what happens to their code after they hit deploy.They should care about things like instrumentation, and what kind of data they may need later to debug their problems, and how their features may impact other infrastructure components.
You need to balance out your software engineers with engineers who don’t react to every problem by writing more code. You need engineers who write code begrudgingly, as a last resort. You’ll find these priceless gems in ops and SRE.
ops questions for software engineers
The best questions are broad and start off easy, with plenty of reasonable answers and pathways to explore. Even beginners can give a reasonable answer, while experts can go on talking for hours.
For example: give them the specs for a new feature, and ask them to talk through the infrastructure choices and dependencies to support that feature. Do they ask about things like which languages, databases, and frameworks are already supported by the team? Do they understand what kind of monitoring and observability tools to use, do they ask about local instrumentation best practices?
Or design a full deployment pipeline together. Probe what they know about generating artifacts, versioning, rollbacks, branching vs master, canarying, rolling restarts, green/blue deploys, etc. How might they design a deploy tool? Talk through the tradeoffs.
Some other good starting points:
“Tell me about the last time you caused a production outage. What happened, how did you find out, how was it resolved, and what did you learn?”
“What are some of your favorite tools for visibility, instrumentation, and debugging?
“Latency seems to have doubled over the last 6 hours. Where do you start looking, how do you start debugging?”
And this chestnut: “What happens when you type ‘google.com’ into a web browser?” You would be fucking *astonished* how many senior software engineers don’t know a thing about DNS, HTTP, SSL/TLS, cookies, TCP/IP, routing, load balancers, web servers, proxies, and on and on.
Another question I really like is: “what’s your favorite API (or database, or language) and why?” followed up by “… and what are the worst things about it?” (True love doesn’t mean blind worship.)
Remember, you’re exploring someone’s experience and depth here, not giving them a pass-fail quiz. It’s okay if they don’t know it all. You’re also evaluating them on communication skills, which is severely underrated by most people but is actually as a key technical skill.
Signals to look for
You’re not looking for perfection. You are teasing out signals for things like, how will this person perform on a team where software engineers are expected to own their code? How much do they know about the world outside the code they write themselves? Are they curious, eager, and willing to learn, or fearful, incurious and begrudging?
Do they expect networks to be reliable? Do they expect databases to respond, retries to succeed? Are they offended by the idea of being on call? Are they overly clever or do they look to simplify? (God, I hate clever software engineers 🙃.)
It’s valuable to get a feel for an engineer’s operational chops, but let’s be clear, you’re doing this for one big reason: to set expectations. By making ops questions part of the interview, you’re establishing from the start that you run an org where operations is valued, where ownership is non-optional. This is not an ivory tower where software engineers can merrily git push and go home for the day and let other people handle the fallout
It can be toxic when you have an engineer who thinks all ops work is toil and operations engineering is lesser-than. It tends to result in operations work being done very poorly. This is your best chance to let those people self-select out.
You know what, I’m actually feeling uncharacteristically optimistic right now. I’m remembering how controversial some of this stuff was when I first wrote it, five years ago in 2016. Nowadays it just sounds obvious. Like table stakes.
every dashboard is a sunk cost every dashboard is an answer to some long-forgotten question every dashboard is an invitation to pattern-match the past instead of interrogate the present every dashboard gives the illusion of correlation every dashboard dampens your thinking https://t.co/OIEowa1COa
… which stirred up some Feelings for many people. 🙃 So I would like to explain my opinions in more detail.
Static vs dynamic dashboards
First, let’s define the term. When I say “dashboard”, I mean STATIC dashboards, i.e. collections of metrics-based graphs that you cannot click on to dive deeper or break down or pivot. If your dashboard supports this sort of responsive querying and exploration, where you can click on any graph to drill down and slice and dice the data arbitrarily, then breathe easy — that’s not what I’m talking about. Those are great. (I don’t really consider them dashboards, but I have heard a few people refer to them as “dynamic dashboards”.)
Actually, I’m not even “against” static dashboards. Every company has them, including Honeycomb. They’re great for getting a high level sense of system functioning, and tracking important stats over long intervals. They are a good starting point for investigations. Every company should have a small, tractable number of these which are easily accessible and shared by everyone.
Debugging with dashboards: it’s a trap
What dashboards are NOT good at is debugging, or understanding or describing novel system states.
I can hear some of you now: “But I’ve debugged countless super-hard unknown problems using only static dashboards!” Yes, I’m sure you have. If all you have is a hammer, you CAN use it to drive screws into the wall, but that doesn’t mean it’s the best tool. And It takes an extraordinary amount of knowledge and experience to be able to piece together a narrative that translates low-level system statistics into bugs in your software and back. Most software engineers don’t have that kind of systems experience or intuition…and they shouldn’t have to.
Why are dashboards bad for debugging? Think of it this way: every dashboard is an answer to a question someone asked at some point. Your monitoring system is probably littered with dashboards, thousands and thousands of them, most of whose questions have been long forgotten and many of whose source data streams have long since gone silent.
So you come along trying to investigate something, and what do you do? You start skimming through dashboards, eyes scanning furiously, looking for visual patterns — e.g. any spikes that happened around the same time as your incident. That’s not debugging, that’s pattern-matching. That’s … eyeball racing.
if we did math like we do dashboards
Imagine you’re in a math competition, and you get handed a problem to solve. But instead of pulling out your pencil and solving the equation, step by step, you start hollering out guesses.
That’s what flipping through dashboards feels like to me. You’re riffling through a bunch of graphs that were relevant to some long-ago situation, without context or history, without showing their work. Sometimes you’ll spot the exact scenario, and — huzzah! — the number you shout is correct! But when it comes to unknown scenarios, the odds are not in your favor.
Debugging looks and feels very different from flipping through answers. You ask a question, examine the answer, and ask another question based on the result. (“Which endpoints were erroring? Are all of the requests erroring, or only some? What did they have in common?”, etc.)
You methodically put one foot in front of the other, following the trail of bread crumbs, until the data itself leads you to the answer.
The limitations of metrics and dashboards
Unfortunately, you cannot do that with metrics-based dashboards, because you stripped away the connective tissue of the event back when you wrote the metrics out to disk.
If you happened to notice while skimming through dashboards that your 404 errors spiked at 14:03, and your /payment and /import endpoints started erroring at 14.03, and your database started returning a bunch of mysql errors shortly after 14:00, you’ll probably assume that they’re all related and leap to find more evidence that confirms it.
But you cannot actually confirm that those events are the same ones, not with your metrics dashboards. You cannot drill down from errors to endpoints to error strings; for that, you’d need a wide structured data blob per request. Those might in fact be two or three separate outages or anomalies happening at the same time, or just the tip of the iceberg of a much larger event, and your hasty assumptions might extend the outage for much longer than was necessary.
With metrics, you tend to find what you’re looking for. You have no way to correlate attributes between requests or ask “what are all of the dimensions these requests have in common?”, or to flip back and forth and look at the request as a trace. Dashboards can be fairly effective at surfacing the causes of problems you’ve seen before (raise your hand if you’ve ever been in an incident review where one of the follow up tasks was, “create a dashboard that will help us find this next time”), but they’re all but useless for novel problems, your unknown-unknowns.
Other complaints about dashboards:
They tend to have percentiles like 95th, 99th, 99.9th, 99.99th, etc. Which can cover over a multitude of sins. You really want a tool that allows you to see MAX and MIN, and heatmap distributions.
A lot of dashboards end up getting created that are overly specific to the incident you just had — naming specific hosts, etc — which just creates clutter and toil. This is how your dashboards become that graveyard of past outages.
The most useful approach to dashboards is to maintain a small set of them; cull regularly, and think of them as a list of starter queries for your investigations.
“I like to compare the dashboards to the big display in a hospital room: heartbeat, pressure, oxygenation, etc. Those can tell you when a thing is wrong, but the context around the patient chart (and the patient themselves) is what allows interpretation to be effective. If all we have is the display but none of the rest, we’re not getting anywhere close to an accurate picture. The risk with the dashboard is having the metrics but not seeing or knowing about the rest changing.”
Dashboards aren’t universally awful. The overuse of them just encourages sloppy thinking, and static ones make it impossible for you to follow the plot of an outage, or validate your hypotheses. 🤒 There’s too many of them, and not enough shared consensus. (It would help if, like, new dashboards expired within a month if nobody looked at them again.)
If what you have is “nothing”, even shitty dashboards are far better than no dashboards. But shitty dashboards have been the only game in town for far too long. We need more vendors to think about building for queryability, explorability, and the ability to follow a trail of breadcrumbs. Modern systems are going to demand more and more of this approach.
Nothing < Dashboards < a Queryable, Exploratory Interface
If everyone out there who slaps “observability” on their web page also felt the responsibility to add an observability-enabling interface to their tool, one that would let users explore and identify unknown-unknowns, we would all be in a far better place. 🙂
Hi Charity, I really enjoy your writing and a lot of it has directly contributed to me finally deciding to leave a company with a toxic management culture. I’ll also be leaving many great IC friends that will have lost a strong voice.
My exit interview will be next week. Any advice on how honest I should be?
I’ve googled quite a bit but there are only generic “don’t burn bridges” comments. Would love to see something a little more authoritative 🙂
Ew, fuck that. That’s exactly the kind of quivering, self-serving, ass covering advice you’d get from HR. It’s exactly the kind of advice that good people use to perpetuate harm.
I wouldn’t worry about “too much honesty” or personal repercussions or whatnot. I would worry about just one thing: being effective. This is your last chance to do the people you care about a solid, and you don’t want to waste it.
So … ranting about every awful person, boring project and offensive party theme of your tenure: not effective. Ranting about people who were personally irritating but had very limited power: not effective. Talking only in vague, high level abstractions (“toxic culture”), or about things only engineers understand and are bugged by: not effective.
What is effective? Hm, let’s think on this..
Start off with your high level assertion (toxic culture) and methodically assemble a list of stories, incidents, and consequences that support your thesis. Structure-wise, this is a lot like writing a good essay
Tie your critiques to the higher ups who enabled or encouraged the bad behavior, not just the flunkies who carried it out.
Wherever possible, draw a straight line to material consequences — people quitting, customers leaving, your company’s reputation suffering
Keep it crisp. No more than three pages total. Pick your top 1-3 points and drive them home. No detours.
This one sucks, but … if someone was perceived as an underperformer or a problem employee, avoid using them as evidence in support of your argument. It won’t help you or them, it will be used as an excuse to discredit you.
Keep it mostly professional. I am not saying don’t show any anger or strong emotion; it can be a powerful tool; just be careful with it. Get a proofread from someone with upper management experience, ideally with no connection to your work. (Me, if necessary.)
✨Put it in WRITING!✨ Deliver your feedback in person, but hand over a written copy as well. Written words are harder to ignore or distort.
For extra oomph, give a copy to any execs, managers, or high level ICs you trust. Don’t just email it to them, though. Have a face to face conversation where you state your case, and hand them a written copy at the end.
The sad fact is that most exit feedback is dutifully entered in by a low ranking employee who makes a third your salary and has no reason whatsoever to rock the boat, after which it gets tossed in a folder or the trash and is never seen again.
If you want to use your voice on your way out the door, the challenge you face isn’t one of retribution, it’s inertia and apathy. HR doesn’t care about your feedback … but they care if they think their boss saw it and cares about it
And I think you should use your voice! You clearly have some clout, and what’s the point of having power if you won’t extend yourself now and then on behalf of those who don’t?