How Much Should My Observability Stack Cost?

First posted on 2021-08-18 at https://www.honeycomb.io/blog/how-much-should-my-observability-stack-cost

What should one pay for observability? What should your observability stack cost? What should be in your observability stack?

How much observability is enough? How much is too much, or is there such a thing?

Is it better to pay for one product that claims (dubiously) to do everything, or twenty products that are each optimized to do a different part of the problem super well?

It’s almost enough to make a busy engineer say “Screw it, I’m spinning up Nagios”.

(Hey, I said almost.)

All of these service providers can give you sticker shock when you begin investigating them. The biggest reason is always that we aren’t used to considering the price of our own time.  We act like it’s “free” to just take an hour and spin something up … we don’t count the cost of maintenance, context switching, and opportunity costs of not using the time to build something of business value.  Which is both understandable and forgivable, as a starting point.

Considerably less forgivable is the vagueness–and sometimes outright misdirection and scare tactics–some vendors offer around pricing. It’s not ok for a business to optimize for revenue at the expense of user experience. As users, we have the right to demand transparency and accurate information.  As vendors, we have the responsibility to provide it.  Any pricing scheme that doesn’t align with best practices and users’ interests will be a drag on reputation and growth.

The core question, rarely addressed outright, is: how much should you pay? In this post I’ll talk about what your observability costs include, and in the next post, what you should consider including in your “observability stack”.

But I’ll give you the answer to your question right off the bat: you should probably spend 20-30% of infra costs on observability.

O11y spend should be 20-30% of infra spend

Rule of thumb: your observability spend should come to 20-30% of your infra spend. (I’ve seen 10% a few times from reasonable-seeming shops, but they have been edge cases and outliers. I have also seen 50% or more, but again, outliers.)

Full disclosure: this isn’t based on any particular science.  It’s just based on my experience of 15+ years working in operations engineering, talking to other engineers and managers, and a couple of informal Twitter polls to satisfy my own curiosity.

Nevertheless, it’s a pretty solid rule. There are exceptions, but in general, if you’re spending less than 20%, you’re “saving money” at the expense of engineering time, or being silently dragged underwater by a million little time leaks and quality of service issues — which you could eliminate completely with a bit of investment.

Consider the person who told me proudly that his o11y spend was just 1-3%. (He meant the PagerDuty bill and Pingdom checks, actually.) He wasn’t counting the dedicated hardware for their ELK cluster (80k/month), or the 2-3 extra engineers they had to recruit, train and hire (250-300k/year apiece) to run the many open source tools they got for “free”.

And ultimately, it didn’t meet their needs very well. Few people knew how to use it, so they leaned on the “observability team” to craft custom views, write scripts and ETL one-offs, and serve as the institutional hive mind and software usability tutors.  They could have used better tools, ones under active development by large product teams.  They could have used that headcount to create core business value instead.

Engineers cost money

Engineers are expensive. Recruiting them is hard. The good ones are increasingly unwilling to waste time on unnecessary labor. This manager was “saving” maybe a million dollars a year (he mentioned a vendor quote of less than 100k/month)–but spending a couple million more than that in less-visible ways.

Worse, he was driving his engineering org into the ground by wasting so much of their time and energy on non-mission-critical work, inferior tooling, one-offs, frustrating maintenance work, etc, all of which had nothing to do with their core business value.

If you want to know if an org hires and retains good engineers, you could do worse than to ask the question: “What tools do you use, and why?”

  • Good orgs use good tools. They know engineering cycles are their scarcest and most valuable resource, and they want to train maximum firepower on their core business problems.
  • Mediocre orgs use mediocre tools, have no discipline or consistency around adoption and deprecation, and leak lost engineering cycles everywhere.

So back to our rule of thumb: observability amounting to 20-30% of total spend is where most shops should fall. This refers to cloud-native infrastructure, using third-party services to instrument and monitor code, with the basics covered — resource utilization graphs, end to end checks, paging, etc.

So, what do I need in my “observability stack”?

What are the basics? Well, obviously “it depends”. It depends on your requirements, your components, your commitments, your budget, sunk costs and skill sets, your teams, and most expensive of all — customer expectations and the cost of violating them. You should think carefully about these things and try to draw a straight line from the business case to the money you spend (or don’t spend). And don’t forget to factor in those invisible human costs.

 

How Much Should My Observability Stack Cost?

Notes on the Perfidy of Dashboards

The other day I said this on twitter —

… which stirred up some Feelings for many people. 🙃  So I would like to explain my opinions in more detail.

Static vs dynamic dashboards

First, let’s define the term. When I say “dashboard”, I mean STATIC dashboards, i.e. collections of metrics-based graphs that you cannot click on to dive deeper or break down or pivot. If your dashboard supports this sort of responsive querying and exploration, where you can click on any graph to drill down and slice and dice the data arbitrarily, then breathe easy — that’s not what I’m talking about. Those are great. (I don’t really consider them dashboards, but I have heard a few people refer to them as “dynamic dashboards”.)

Actually, I’m not even “against” static dashboards. Every company has them, including Honeycomb. They’re great for getting a high level sense of system functioning, and tracking important stats over long intervals. They are a good starting point for investigations. Every company should have a small, tractable number of these which are easily accessible and shared by everyone.

Debugging with dashboards: it’s a trap

What dashboards are NOT good at is debugging, or understanding or describing novel system states.

I can hear some of you now: “But I’ve debugged countless super-hard unknown problems using only static dashboards!” Yes, I’m sure you have. If all you have is a hammer, you CAN use it to drive screws into the wall, but that doesn’t mean it’s the best tool. And It takes an extraordinary amount of knowledge and experience to be able to piece together a narrative that translates low-level system statistics into bugs in your software and back. Most software engineers don’t have that kind of systems experience or intuition…and they shouldn’t have to.

Why are dashboards bad for debugging? Think of it this way: every dashboard is an answer to a question someone asked at some point. Your monitoring system is probably littered with dashboards, thousands and thousands of them, most of whose questions have been long forgotten and many of whose source data streams have long since gone silent.

So you come along trying to investigate something, and what do you do? You start skimming through dashboards, eyes scanning furiously, looking for visual patterns — e.g. any spikes that happened around the same time as your incident. That’s not debugging, that’s pattern-matching. That’s … eyeball racing.

if we did math like we do dashboards

Imagine you’re in a math competition, and you get handed a problem to solve. But instead of pulling out your pencil and solving the equation, step by step, you start hollering out guesses.

“27!”
“19992.41!”
“1/4325!”

That’s what flipping through dashboards feels like to me. You’re riffling through a bunch of graphs that were relevant to some long-ago situation, without context or history, without showing their work. Sometimes you’ll spot the exact scenario, and — huzzah! — the number you shout is correct! But when it comes to unknown scenarios, the odds are not in your favor.

Debugging looks and feels very different from flipping through answers. You ask a question, examine the answer, and ask another question based on the result. (“Which endpoints were erroring? Are all of the requests erroring, or only some? What did they have in common?”, etc.)

You methodically put one foot in front of the other, following the trail of bread crumbs, until the data itself leads you to the answer.

The limitations of metrics and dashboards

Unfortunately, you cannot do that with metrics-based dashboards, because you stripped away the connective tissue of the event back when you wrote the metrics out to disk.

If you happened to notice while skimming through dashboards that your 404 errors spiked at 14:03, and your /payment and /import endpoints started erroring at 14.03, and your database started returning a bunch of mysql errors shortly after 14:00, you’ll probably assume that they’re all related and leap to find more evidence that confirms it.

But you cannot actually confirm that those events are the same ones, not with your metrics dashboards. You cannot drill down from errors to endpoints to error strings; for that, you’d need a wide structured data blob per request. Those might in fact be two or three separate outages or anomalies happening at the same time, or just the tip of the iceberg of a much larger event, and your hasty assumptions might extend the outage for much longer than was necessary.

With metrics, you tend to find what you’re looking for. You have no way to correlate attributes between requests or ask “what are all of the dimensions these requests have in common?”, or to flip back and forth and look at the request as a trace. Dashboards can be fairly effective at surfacing the causes of problems you’ve seen before (raise your hand if you’ve ever been in an incident review where one of the follow up tasks was, “create a dashboard that will help us find this next time”), but they’re all but useless for novel problems, your unknown-unknowns.

Other complaints about dashboards:

They tend to have percentiles like 95th, 99th, 99.9th, 99.99th, etc. Which can cover over a multitude of sins. You really want a tool that allows you to see MAX and MIN, and heatmap distributions.

A lot of dashboards end up getting created that are overly specific to the incident you just had — naming specific hosts, etc — which just creates clutter and toil. This is how your dashboards become that graveyard of past outages.

The most useful approach to dashboards is to maintain a small set of them; cull regularly, and think of them as a list of starter queries for your investigations.

Fred Hebert has this analogy, which I like:

“I like to compare the dashboards to the big display in a hospital room: heartbeat, pressure, oxygenation, etc. Those can tell you when a thing is wrong, but the context around the patient chart (and the patient themselves) is what allows interpretation to be effective. If all we have is the display but none of the rest, we’re not getting anywhere close to an accurate picture. The risk with the dashboard is having the metrics but not seeing or knowing about the rest changing.”

In conclusion

Dashboards aren’t universally awful. The overuse of them just encourages sloppy thinking, and static ones make it impossible for you to follow the plot of an outage, or validate your hypotheses. 🤒  There’s too many of them, and not enough shared consensus. (It would help if, like, new dashboards expired within a month if nobody looked at them again.)

If what you have is “nothing”, even shitty dashboards are far better than no dashboards. But shitty dashboards have been the only game in town for far too long. We need more vendors to think about building for queryability, explorability, and the ability to follow a trail of breadcrumbs. Modern systems are going to demand more and more of this approach.

Nothing < Dashboards < a Queryable, Exploratory Interface

If everyone out there who slaps “observability” on their web page also felt the responsibility to add an observability-enabling interface to their tool, one that would let users explore and identify unknown-unknowns, we would all be in a far better place. 🙂

 

 

 

 

 

Notes on the Perfidy of Dashboards

Questionable Advice: “What Should I Say In My Exit Interview?”

I recently received this gem of a note::

Hi Charity, I really enjoy your writing and a lot of it has directly contributed to me finally deciding to leave a company with a toxic management culture. I’ll also be leaving many great IC friends that will have lost a strong voice.

My exit interview will be next week. Any advice on how honest I should be?

I’ve googled quite a bit but there are only generic “don’t burn bridges” comments. Would love to see something a little more authoritative 🙂

–Anonymous Reader

Ew, fuck that. That’s exactly the kind of quivering, self-serving, ass covering advice you’d get from HR. It’s exactly the kind of advice that good people use to perpetuate harm.

I wouldn’t worry about “too much honesty” or personal repercussions or whatnot. I would worry about just one thing: being effective. This is your last chance to do the people you care about a solid, and you don’t want to waste it.

So … ranting about every awful person, boring project and offensive party theme of your tenure: not effective. Ranting about people who were personally irritating but had very limited power: not effective. Talking only in vague, high level abstractions (“toxic culture”), or about things only engineers understand and are bugged by: not effective.

What is effective? Hm, let’s think on this..

  • Start off with your high level assertion (toxic culture) and methodically assemble a list of stories, incidents, and consequences that support your thesis. Structure-wise, this is a lot like writing a good essay
  • Tie your critiques to the higher ups who enabled or encouraged the bad behavior, not just the flunkies who carried it out.
  • Wherever possible, draw a straight line to material consequences — people quitting, customers leaving, your company’s reputation suffering
  • Keep it crisp. No more than three pages total. Pick your top 1-3 points and drive them home. No detours.
  • This one sucks, but … if someone was perceived as an underperformer or a problem employee, avoid using them as evidence in support of your argument. It won’t help you or them, it will be used as an excuse to discredit you.
  • Keep it mostly professional. I am not saying don’t show any anger or strong emotion; it can be a powerful tool; just be careful with it. Get a proofread from someone with upper management experience, ideally with no connection to your work. (Me, if necessary.)
  • Put it in WRITING!✨ Deliver your feedback in person, but hand over a written copy as well. Written words are harder to ignore or distort.
  • For extra oomph, give a copy to any execs, managers, or high level ICs you trust. Don’t just email it to them, though. Have a face to face conversation where you state your case, and hand them a written copy at the end.

The sad fact is that most exit feedback is dutifully entered in by a low ranking employee who makes a third your salary and has no reason whatsoever to rock the boat, after which it gets tossed in a folder or the trash and is never seen again.

If you want to use your voice on your way out the door, the challenge you face isn’t one of retribution, it’s inertia and apathy. HR doesn’t care about your feedback … but they care if they think their boss saw it and cares about it

And I think you should use your voice! You clearly have some clout, and what’s the point of having power if you won’t extend yourself now and then on behalf of those who don’t?

Good luck!!

charity

Questionable Advice: “What Should I Say In My Exit Interview?”

Know your “One Job”, continued

Holy macaroni batman, I was not expecting that post to ignite a shitstorm. It felt like a pretty straightforward observation: you were hired to do a job, you should do your best to do that job.

Interestingly, the response was almost universally positive for the first 8 hours or so. But then the tide began to turn as the 🔥hot takes🔥 began rolling in (oh god 🙈) and then began pingponging off each other, competing for and amplifying each other’s outrage. 😕You Had One Job | Know Your Meme

When something touches a nerve like this, it swiftly becomes less about your actual words and more of a public event, or maybe a Teachable Moment. (🤮) Everybody has to weigh in with their commentary, and while it’s not super fun to be on the receiving end, I have mostly learned to distinguish the general pile-on from the few engaging in bad faith. I just keep telling myself that public discourse is how we create shared consensus and shift the Overton window and industry norms. So that’s all fair game, it’s done in good faith and it’s the price of change. I can take a few days of shitty twitter mentions for the team, lol.

The one comment that did worm under my skin was a woman who said she believed my post would give an abusive former boss ammo to use against people like her in the future. And that, more than all the hatertomatoade, is why I wrote this epic followup.

Two responses i’d like to foreground

First, I’d like to link to something Terra said about the importance of ERGs for marginalized workers, especially during times of duress.

Thanks Terra, I stand corrected — I can totally see how that work counts as part of one’s core job, and managers should understand this and define it as such.

I still don’t think it means you promote someone purely on the basis of that work; I don’t see how you can promote someone up an engineering ladder until they can fulfill the technical FAIL Nation - ocd - Vintage FAILs of the Epic Variety - Cheezburgerrequirements of the next level. It could certainly mean expanding the core requirements of your role to include your ERG labor, though perhaps not replacing the technical work entirely (over the long term).

This goes both ways, fwiw. Nobody should be promoted without doing their fair share of the emotional labor and glue work it takes to keep the company going. A successful career in sociotechnical systems means leveling up your social repertoire as well as your technical skills. Your job descriptions and levels must reflect this, spelling out both the social and the technical requirements at each level.

🔥🔥🔥Tip of the Day🔥🔥🔥

If you want to know what your company REALLY cares about, take a management gig for a while and listen closely to the debates over whether or not to promote someone to the post-senior levels, and why or why not. You are LITERALLY witnessing as your organization decides, over and over, what it values the most, what it wants to reward, whose footsteps they want you to follow in, and where to spend its scarcest, most inelastic resource: staff and principal engineering titles. Fascinating shit.

Secondly, please go and read Shelby’s excellent thread, written from the perspective of someone who has been “Susan” for a number of reasons.

 

These situations are complicated, and managers should always, always try to listen and understand the subtleties of each unique situation before coming to some mutual understanding with their team member about what the core responsibilities of the job are. Jobs are living documents, and it doesn’t hurt to revisit them and clarify from time to time. <3

Objection: “Nobody has Just One Job”

Completely true. I was trying to be funny by referring to the “You had one job!” meme. I apologize. Yes, everybody has a basket of responsibilities. I DO think that a well-written job description and clarification on the priorities of the job is a good thing, and will go a long way towards helping you figure out how to spend your time and how to not get burned out.

Also: am I prone to hyperbole? Yes ma’am, sweeping statements are LITERALLY ALL I DO. (Sorry ☺️)

FTR, I don’t think your core responsibilities should be overwhelming, and there should be plenty of time in your normal 40 hour work week to devote to so-called electives and extracurriculars. I’ve said many times that I don’t believe anyone has more than four hours a day of real, focused, challenging engineering labor in them. So maybe 15-20 hours/week of moving the business forward in your areas of ownership?

Everyone should have cycles free for participating in the “squishy” parts of work. It’s an important part of taking a group of random individuals and connecting them with meaning and a sense of mission. That isn’t HR’s job, or the managers or execs’ jobs, that is everyone’s job, the more the better. Everyone should have flexibility, autonomy, and variety in their schedule, and should feel supported in using work hours to support their peers. Nothing I am saying contradicts any of that. But sometimes something’s gotta give, and usually your core responsbilities are not what you want to sacrifice.

Objection: “Interviewing isn’t optional”

Sure. But you weren’t hired to interview. It’s just part of the basket of secondary responsibilities that come along with being a member of the team. If you’re an engineer, you likely weren’t hired to write blog posts or mentor folks — unless you were! were you?? — which makes them similarly in the bucket marked “valuable, but secondary”. Honestly it could be on purpose though, it could just be words in another language. - Imgflip

We aren’t talking about “steady state”, we’re talking about what to do when you aren’t able to fulfill the core functions of your job. This should be a temporary state of emergency, not the status quo. When you’re overwhelmed, it’s totally normal to ask to be excused from the interviewing rotation for a quarter or so, or any of your other secondary responsibilities.

Objection: “How dare you not promote someone who is getting good peer reviews”

I said she was getting some compliments and rave reviews. If you’re getting compliments from HR, marketing, and other people sprinkled across the company, but you aren’t delivering for your own team; if you’re holding back core initiatives for your closest peers, then you aren’t doing your work in the right order.

I’ve seen this happen when an engineer’s public persona becomes more important to them than their actual work. When they get hooked on the public adulation of giving talks and writing posts and going to conferences, meanwhile their output at work drops off a cliff. This isn’t fair to your coworkers. Maybe you don’t want to do this job anymore, maybe you want a job where your core responsibilities are writing and speaking. I don’t know. All I know is that if I’m the manager of that team, my responsibility and loyalty is to the well-functioning of that team, so we need to have a conversation and clarify what’s happening.

Again, all I am saying is that your commitments to your immediate team come first. Not following through affects way more than just you.

Objection: “You are hating on / devaluing glue work”

I really wish I had thought to link to Tanya Reilly’s invaluable material on glue work in my original post, but I didn’t connect those dots, sadly. Yes, it can be hard for engineering managers to recognize or reward glue work, and yes, glue work is an invaluable form of technical leadership. I don’t have a lot to say about this other than I completely agree, and it is not what I am talking about.

Objection: “All these extra-curriculars should count towards promotions”

Well, yes! Duh! I am all for promoting people not based on raw individual coding output, but on overall impact. People who perform a lot of glue work are invaluable to any high functioning team, and people who run internal ERGs, do lots of DEI work, etc — that SHOULD factor into promotions. In my original post I said that sadly, when someone isn’t performing the key parts of their job, you don’t get credit for these wonderful positives — they are not enough on their own (unless of course you have an understanding with your manager that your core responsibilities have changed), and can even be evidence of time mismanagement or an inability to prioritize.

you had one job Keep trying - Paranoia meme | Make a MemeI cannot honestly understand why this is controversial. If you were hired as a database engineer, and you spent the year doing mostly DEI work, how does it make sense to promote you to the next level as a database engineer? That’s not setting you up for success at the next level at ALL. If this was agreed upon by you and your manager, I can see giving you glowing reviews for the period of time spent on DEI work, as long as your dbeng responsibilities were gracefully handed off to someone else (and not just dropped), but not promoting you for it.

However, if you ARE competently performing your core responsibilities as a database engineer, and are performing them at the next level, then all your additional work for the company — on DEI, ERGs, mentoring, interviewing, etc — adds up to a MASSIVELY compelling body of work, and a powerful argument for promotion. It certainly ought to be enough to push you over the edge if you are on the bubble for the next level..

Objection: “You hate DEI work and demean those who do it”

It does not make me anti-DEI work to point out that you were hired to do a certain job, first and foremost. If you want your first-and-foremost job to be DEI work, that’s great! Go get that job! If you want your first-and-foremost job to be engineering, but then not do that job, I … guess I just fail to see how this is a logically defensible position.

As someone else put it: “Your One Job is the cake, the rest is the icing”.//TODO citation You can often negotiate a job that has a LOT of icing — but you should negotiate, no surprises. DEI work is absolutely valuable to companies, because having a diverse workforce is a competitive advantage and increasingly a hiring advantage. But that work isn’t typically budgeted under engineering, it comes from G&A

I can also understand why you might want to keep the (unfortunately higher) engineering salary and the (unfortunately higher) social status you have with an engineering role, even if engineering no longer feeds your soul the way doing diversity work does. I know people in this situation, and it’s tricky. 😕 There is no single right or wrong answer, just a question of whether you can find an employer who is willing to pay for that configuration. But I should think clarity and straightforwardness is more important than ever when your heart’s desire is unconventional.

Objection: “You’re letting managers off the hook. This is entirely a management failure.”

You might be right. This is often a consequence of negligent, unclear, or biased management. But not always. I’ve also seen it happen — close up — with engaged, empathetic, highly skilled You Only Had One Job on Twitter: "Whoever wrote this, probably never saw a sign like this when he/she was a kid #youonlyhadonejob #youonlyhad1job https://t.co/BGLHCWwtGo"managers who were good at setting structure and boundaries, deeply encouraging, gave tons of chances and great constant feedback, tried every which way to make it work … and the employee just wasn’t interested. They volunteered for every social committee and followed every butterfly that fluttered by. They just weren’t into the work.

YES, managers should be keeping close tabs on their reports and giving constant feedback. YES, it’s on the manager to make sure the role is clearly defined.

YES, managers should be clear with employees on what the promotion path is, and YES, no review should ever be a surprise.

YES, if people all over the org are heaping requests or responsibilities on the Susans of the world, it can be difficult to prioritize and it is not fair to expect them to juggle those requests, the manager should help to shelter them from it. Yes yes yes. We are all on the same page.You had one job! – inkbiotic

But I am writing this post, not to managers, but to engineers, people like me, who are confused about why they aren’t getting promoted and others are. I am writing this post because good managers are in scarce supply, and I don’t want your career to be on hold until you happen to get a good one. I am trying to provide a peek behind the curtain into something that frustrates managers and holds a lot of people back, so that you can take matters into your own hand and try to fix it. If your manager hasn’t been clear with you on what your core job is, they suck and I’m sorry.

Is any of this your fault? Maybe, maybe not. But it is something you have some control over. You can at least open the conversation and ask for some clarity.

You shouldn’t HAVE to do their job for them. But why let a shitty manager hurt your career any more than you must?.

Objection: “This is a gendered thing. ‘Steven’ would have been promoted for this, while ‘Susan’ gets scolded.”

110 You had one job.. ideas | you had one job, one job, jobA surprising number of people thought I was writing this as advice specifically to and for women, and got mad about that. Nope, sorry. It was not a gendered thing at all. I’ve seen this happen pretty much equally with men and women. I had planned on writing three or four anecdotes, using multiple fake names and genders, but the first one turned out long so I stopped.

Confession: I put zero thought into constructing a realistic or plausible list of activities “Susan” has at work. This came back to bite me. A LOT of people were doing a super close read of the situation (e.g. “She’s on three ERGs, so she must be a queer woman of color, which means she’s probably the most senior person at the company along all three dimensions of marginalization…”) — which, AUGH — this isn’t real, y’all —

This is Not A Real Situation✨.

— it was MADE UP! Totally! I just listed off the first several things that came to mind that people do at work that aren’t things they specifically got hired to do. That’s it.

I rather regret this. I should have put more time and thought into constructing a scenario Image tagged in you had one job,road stripes,timmy - Imgflipthat was less easy to nitpick. But I didn’t want anyone to see themselves in it, so I didn’t want it to be too recognizable or realistic. I just wanted a placeholder for “Person who does lots of great stuff at work”, and that’s how you got Susan.

TLDR — yep, Susan probably has a tougher go of this than Steven. I would argue that Steven generally wouldn’t be promoted either if he wasn’t adequately performing his core responsibilities, but who knows. This isn’t a real person or scenario. Women and nonbinary folks have it tougher than men. Your point? ¯\_(ツ)_/¯

Objection: “Why do you hate collaboration” and “This doesn’t apply to me because my job isn’t just writing code”

I never meant to imply that your “One Job” was only work you could do heads down on your own. Lots of people’s jobs involve a ton of force multiplying, mentorship, reviewing, writing The best you had one job memes :) Memedroidproposals… typical glue work. If you have a manager that thinks you are only doing your One Job when you are heads down writing code with your headphones on, that’s a Really Bad Manager.

The more senior your role is, the more your One Job involves less “write feature X” and more “use your judgment to help fine tune our sociotechnical systems”. This is natural and good.

It is also MORE of an argument for making sure you are in alignment with others in your org about what is the most important thing for your time and energy to be spent on, not less of one. Communication and persuasion are what upper levels are all about, right?

Objection: // Placeholder

I reserve the right to add to this list of objections after I go catch up on twitter, lol.

In conclusion

Finally, I want to call out a perceptive comment by @codefolio, where he said this:

Ouch. Truth.

This speaks to something I should probably be more explicit about, which is that my writing and my advice generally assumes you are operating in a high-trust environment. That’s the kind of environment I have been tremendously fortunate to operate in for most of my career — an environment where people are flawed but compassionate, skilled, and doing their best for each other, with comparatively low levels of assholery, sociopaths, and other bad actors. If your situation is very unlike mine, I can understand why my advice could seem blinkered, naive, and even harmful. Please use your own judgment.

I only write because I want the best for you — even those of you who very openly and vocally despise me (and yes, there are quite a few of you. Start a club or something). 🥰

All said and done, about 95% of the people who replied said that my post was helpful and clarifying for them, about 3% had very interesting critical takes that I got something valuable from, and only 2% were hurling rotten tomatoes and reading and interpreting my words in ways that felt wildly detached from reality. I try to remember that, when I get discouraged and feel like the whole world hates me and wants me to shut up. 🍅🍅🍅

I am not everyone’s cup of tea, and that’s alright. 💜 My advice is not relevant or helpful to anyone, and that’s okay too. Hopefully I have cleared up enough of the misunderstandings that anyone who is reading my words in good faith now has a clear picture of what I was trying to say. And hopefully it’s helpful to some of you.

PSA: I will be your rubber ducky advice buddy🐥❤️🐣

I have posted this on twitter a few times, but if you are struggling with a career conundrum, sociotechnical growing pains, or management issue, I am generally happy to hop on a phone call over the weekend and talk through it with you. I have benefited SO much from the time and energy of others in the tech industry, it’s nice to give back. (I like giving advice and I think VERY highly of my own opinions, so this also counts as fun for me 😂)

I 💜 talking to women and enbies and queers.🌈 (I love talking to guys too, but if my calendar starts filling up I will rate-limit y’all first to leave space at the front of the line.) If you are a white dude who hasn’t been following me on twitter for at least a year, this offer is not for you, sorry. If you’re a marginalized person in tech, otoh, I don’t care if you use that hellsite^W^Wtwitter or not. And if you hate my blog, you are not gonna like me any better live & unfiltered. 😬

🍃🌸🌼Sign up here🌼🌺🍃
(please read the instructions!)

Things I am generally well-equipped to discuss include startups, observability, databases, leveling and promotion issues, management, the pendulum, senior and staff levels, new manager issues, team dynamics, startup executive teams, and some complex workplace You Had One Job and You So Failed ~ 27 pics | Team Jimmy Joe | Building fails, Construction fails, Architecture failssituations.

Things I am not equipped to help with include how to improve diversity at your company, how to get women to want to work for you, or why only men are applying for your jobs online. I cannot be your personal DEI coach or guide to women in the workplace (there are great consultants out there doing the lord’s work, and you should give them all your money). I can’t find you a new job, help newbies find their first job (sorry 😕), give advice on raising venture money or tell you how to found a company (answer: never found a company, it’s the literal worst). In general I am not very well equipped to discuss early career issues, but if you’re an URM and you’re desperate i’ll give it a shot. I am not a therapist. And if you’re going to ask should you quit your job, I will save you a phone call: yes.

charity.

Know your “One Job”, continued

Know your “One Job” and do it first

Story time.

Susan was hired as a database engineer. Her primary projects, which are supposed to be upgrading/rolling out a major point release and running load tests against various config options and developing a schema management tool, keep slipping. But she is one of HR’s favorite people because she is always available to interview, even at short notice, and gives brilliant, in-depth feedback on candidates.

Susan also runs three employee resource groups, mentors other women in tech both internally and external to the company, and spends a lot of time recruiting candidates to come work here. She never turns down a request to speak at a meetup or conference, and frequently writes blog posts, too. She is extremely responsive on chat, and answers all the questions her coworkers have when they pop into the team slack. Susan has a high profile in the community and her peer reviews are always sprinkled lavishly with compliments and rave reviews from cross-functional coworkers across the company.

Lately, Susan has been getting increasingly exasperated about her level. She is a senior engineer, but the impact of her work is felt all across the company, and many of the things she does are described in higher level brackets. Why doesn’t her manager seem to recognize and acknowledge this?

Actually, Susan’s manager is absolutely right not to promote her. Susan isn’t adequately performing the functions of her job as a database engineer, which is the “One Job” she was hired to do, and which her teammates are all relying on her to do in a timely and high-quality manner.

When someone isn’t meeting the basic expectations for their core responsibilities, it doesn’t matter how many other wonderful things they are doing. In fact, those things can become strikes against them. Why is Susan available for every interview at the drop of a hat? Why is she agreeing to speak at so many meetups and conferences, if she can’t find the time to perform her core responsibilities? Why doesn’t she silence Slack so she can focus for a while? These things that should be wonderful positives are transformed instead into damning evidence of personal time mismanagement or an inability to prioritize.

When you are meeting expectations for your One Job — and you don’t necessarily have to be dazzling, just competent and predictable  — then picking up other work is a sign of initiative and investment. But when you aren’t, you get no credit.

This may sound obvious, but I have seen everyone from super junior to super senior fall into this trap — including myself, at times. When you get overwhelmed, all of your commitments can start to feel like they are of equal weight. But they are not. “You had One Job”, as the kids say, and it comes first.

Extracurriculars can feel like obligations, yet these are qualitatively different from the obligation you have to your core job. If you did only your core responsibilities and none of the extras, your job should not be in any danger. But if you don’t do the core parts, no matter how many extras you do, eventually your job probably will be in danger. Explain to your coworkers that you need to hit the pause button on electives; they’ll understand. They should respect your maturity and foresight.

If you’re feeling underwater, scrutinize what’s on your plate. Which are the parts you were hired to do? the parts that are no one else’s job but yours? Focus on those first. If you need more time, cut down or hit pause on the electives until you’re comfortably on top of things again.

Do you have too many core responsibilities? Those should never add up to 40+ hours of work every week. Everyone needs some flex and variety in their schedule.

If you are having a terrible time summoning the motivation to do the work you were hired to do, or if this is a recurring theme in your life, then maybe you are in the wrong role. Maybe you really want to find a role as a developer advocate instead of a software engineer. Maybe you just aren’t into the work anymore (if you’ve been there a while), or maybe you don’t know how to get started (if you’re new). Maybe it’s even as simple as mentioning it to your manager and reshuffling your responsibilities a bit. But don’t assume the problem will solve itself.

Take these feelings seriously. All of us need to buck up and plow through some work we don’t find engaging from time to time. But it shouldn’t be the norm. In the long run, you’ll be happier and more successful if you are truly engaged by the work you were hired to do, not just by the extracurriculars.

charity.

Know your “One Job” and do it first

How much is your fear of continuous deployment costing you?

Most people aren’t doing true CI/CD. Most teams wait far too long to get their code into prod after writing it. Most painful of all are the teams who have done all the hard parts — wired up continuous integration, achieved test coverage, etc — but still deploy by hand, thus depriving themselves of the payoff for their hard work.

Any time an engineer merges a diff back to main, this ought to trigger a full run of your CI/CD pipeline, culminating with an automatic deploy to production. This should happen once per mergeset, never batching multiple engineers’ diffs in a run, and it should be over and done with in 15 minutes or less with no human intervention.

It’s 2021, and everyone should know this by now.

✨✨15 minutes or bust✨✨

Okay, but what if you don’t? How costly can it be, really?

Let’s do some back of the envelope math. First you’ll need to answer a couple questions about your org and deploy pipeline.

  • How many engineers do you have? ____________
  • How long typically elapses between when someone writes code and that code is live in production? _____________

Let (n) be the number of engineers it takes to efficiently build and run your product, assuming each set of changes will autodeploy individually in <15 min.

  • If changes typically ship on the order of hours, you need 2(n).
  • If changes ship on the order of days, you need 2(2(n)).
  • If changes ship on the order of weeks, you need 2(2(2(n)))
  • If changes ship on the order of months, you need 2(2(2(2(n))))

Your 6 person team with a consistent autodeploy loop would take 24 people to do the same amount of work, if it took days to deploy their changes. Your 10 person team that ships in weeks would need 80 people.

At cost to the company of approx 200k per engineer, that’s $3.6 million in the first example and $14 million in the second example. That’s how much your neglect of internal tools and kneejerk fear of autodeploy might be costing you.

It’s not just about engineers. The more delay you add into the process of building and shipping code, the more pathologies multiply, and you find yourselves needing to spend more and more time addressing those pathologies instead of making forward progress for the business. Longer diffs. Manual deploy processes. Bunching up multiple engineers’ diffs in a single deploy, then spending the rest of the day trying to figure out which one was at fault for the error.

Soon you need an SRE team to handle your reliability issues, build engineering specialists to build internal tools for all these engineers, managers to manage the teams, product folks to own the roadmap and project managers to coordinate all this blocking and waiting on each other…

You could have just fixed your build process. You could have just committed to continuous delivery. You would be moving more swiftly and confidently as a small, killer team than you ever could at your lumbering size.

✨✨15 minutes or bust✨✨

In 2021, how will *you* achieve the dream of CI/CD, and liberate your engineers from the shackles of pointless toil?

P.S. if you want to know my methodology for coming up with this equation, it’s called “pulled out of my ass because it sounded about right, then checked with about a dozen other technical folks to see if it aligned with their experience.”

 

 

How much is your fear of continuous deployment costing you?

Questionable Advice: “How can I sniff out bad managers while interviewing for a job?”

A few weeks ago I got a question from Stephane Bjorne on twitter, about how to screen for bad managers and/or management culture.

This is a great question. I’ve talked a lot about my philosophy for interviews, which boils down to equalizing the power dynamics as much as possible to reflect the reality that the candidate should be interviewing the company just as much as the reverse. You are all highly compensated subject matter experts who can find jobs relatively easily; there’s no reason all the judgment and critique should flow in a single direction.

But HOW? What questions can you ask, to get a feel for whether you will join this team and belatedly discover that you’re expected to be a jira ticket monkey, or that the manager is passive aggressive and won’t advocate for you or take a stand on anything?

Glad you asked.

First of all, backchannel like mad if you can. If you can’t, do ask the same question of multiple people and compare their answers. Pay particular attention to the different answers given by junior vs senior members of the team, and give extra weight to the experiences of any underrepresented minorities

Questions to consider asking the manager:

  • “How did you become a manager? Do you enjoy it?” Trust me, you never want to work for someone who was pushed into management against their will and still seems openly bitter about it.
  • “What do you like about your job?” Red flags include, “I was tired of being out of the loop and left out of decision-making processes.” That could be you next.
  • Is management a promotion here, or a lateral move? Do people ever go back from managing to engineering? Is that considered a viable career path?”
  • “What kind of training do you offer new managers, and what are they evaluated on? What are YOU evaluated on?”
  • “Do you have a job ladder for individual contributors (ICs)? Can I see it? Do the IC levels track management levels all the way to the top, or top out at some point?”
  • “What does your review and promotion process look like?”
  • “Are your levels public or private? What is the distribution of engineers across levels, roughly?” Here you are looking for how high the IC track goes, and how many engineers exist at the upper bound. Distribution should be scant at the upper levels, roughly an order of magnitude less for each level after “senior engineer”. Do the written level descriptions seem reasonable and appealing to you? Do you see yourself in them, and feel like there is a path for growth?
  • Do you have any junior or intermediate engineers, and how many? If not, why the fuck not?” Ask.
  • “How often do you have 1x1s with each of your direct reports? How often are the skip levels?”

And then there is the ur-question, which every one of you should ask in every single interview you ever have:

What is the process by which someone ends up working on a particular project?” In other words, how does work get decided and allocated? Bad signs: they get flustered, don’t have a clear answer, you have no say, there is no product/design process, it’s all done via jira assignments.

Pay just as much attention to their body language and signals here as the answer they give you. Are they telling the truth, or describing their ideal world? Ask what happens when there are problems with the  normal process, or how often it gets circumvented, or how you know if your work aligns with company strategy.

Questions to ask an engineer:

  • “What do you talk about in your 1x1s with your manager?”
  • “How often does your manager have career conversations with you, asking how you want to develop over the next few years?” Ideally at least a couple times a year, but really any answer is fine other than a blank stare.
  • “Does your manager care about you? Has working here moved your career forward? How so?”
  • “Have you ever been surprised by feedback you received in a review from your manager?” You should never be surprised.

Most of all, can you get the manager to tell you “no” on a thing you clearly want during the interview? (Maybe a level, or WFH schedule, or travel, etc.) Or are they squirmy, evasive, or hedging, or make you promises that they’ll look into it later, etc? Anyone who will look you straight in the eye and say “no, and here’s why”, is someone you can probably trust to be straight with you down the line, in good times and bad.

Depending on your level and career goals, it’s a good idea to ask questions to ascertain if the right gap exists for you to fill to reach your goals. Don’t join a team where five other people have the same exact skill sets as you do and are already eyeing the same role you want. (I wrote more on levels here.)

We also got some great contributions from @GergelyOrosz and @JillWohlner:

  • “Ask about specific situations, any half decent manager can BS on hypothetical stuff.”
  • “Who was the last person you promoted? Why/how?”
  • “How do you handle when X complains to you about Y? Can you give an example?”
  • “What was the best team you managed and why?”
  • “What is the biggest challenge the team currently has and why? What are you doing about it? Biggest strength?”
  • “How many people have left the team since you’ve been here?”
  • “Who was the last person to leave your team? Why did they leave?” (These leaving questions are tough to ask but will give you a lot of signal. Especially seeing how the manager frames it — are they playing victim, or owning up to it?)

Anyone who won’t be honest about their real personal weaknesses and struggles, probably isn’t someone you want to report to.

Jill says that she wants to hear from ICs:

  • when their manager has gone to bat for them
  • when their manager gave them hard to hear feedback (if it’s never, it’s a flag — all positive feedback = little growth)
  • when their manager coached them through a tricky situation

and from Managers:

  • when they went to bat for someone on their team
  • a team member’s growth they feel really proud of
  • when they got negative feedback about someone on their team and how it was handled

Obviously, you won’t be able to ask all of these questions in an hourlong slot. So ask a few, and lean in whenever they seem avoidant or uncomfortable — that’s where the juicy stuff comes from. And personally, I wouldn’t consider working somewhere that sees management as a promotion rather than a peer/support role, but that may be a high bar to clear in some parts of the world.

Good luck. Joining a team with a good manager can be one of the best ways to accelerate your career and open the door to opportunities, in part because it ensures healthy team dynamics. Workplace friction, bullying, harassment, passive aggressiveness, etc can be truly terrible, and even in the milder cases it’s still a huge distraction from  your work and a big quality of life issue.

A manager’s One Job is to make sure that shit doesn’t happen. It’s really is worth trying to find a good one you can trust.

Questionable Advice: “How can I sniff out bad managers while interviewing for a job?”

Questionable Advice: Premature Senior Followup Q’s

Some interesting followup questions arose from my recent post on the trap of the premature senior:

I found your blog (from Hacker News, I think) and your “Questionable Advice: The Trap of the Premature Senior” spoke to me, but I was wondering, do you have any followup advice on handling compensation in this situation? Should one be open to making less with this type of move, or is that not actually an issue?

You should ABSOLUTELY be open to making less. Consider it an investment in your long term career. Think of the extra money your company is paying you now as a hostage premium on top of your real market value. Staying isn’t what’s good for you, and that makes it hazard pay.

Your career is the single most valuable asset you have — it’s a multimillion dollar appreciating asset, and you should curate and guide it with an eye toward longevity. The first decade of your career is way, way too early to start optimizing for salary over experience.

This question was very relevant to me right now, as I recently spoke to a recruiter about a position that’s more in line with my career goals but pays less, and her immediate reaction was “don’t go backward on compensation.” But I can see the trade-off value in losing some comp to gain “better” experience.

Yeah, I think that’s terrible advice. 🙂 In general, if the compensation is fair and respectful, I think that is the absolute *worst* reason to make a job decision.

Salary is not a one-way escalator that you hop on after school, and gracefully exit at the peak upon retiring. It’s possible your recruiter’s advice was based on the assumption that your employers are likely to ask for your current salary and base their offers off of it. That used to be common practice, but is less and less so because of the fairness issues involved. Comp should be based on the value of your labor to the company, not your past comp, and in many states it is illegal to ask about your salary.

You may choose to take lower salaries at various times in your career — to work at a nonprofit or gov job, to learn new skills, in exchange for more flexibility or vacation time or titles or stock options, or as a result of moving to a different city or country.

I am not a financial advisor, but I am a big believer in retaining optionality. If the opportunity of your dreams came along with a starting salary of 150k, but every penny of your 170k salary is already committed, that’s a big loss of opportunity for you. This is a strong argument for trying to live well within your means, save religiously, and always have fuck you money in the bank. If you’re young and you get a salary bump, consider automatically diverting the raise into savings. Avoiding lifestyle inflation is the most painless way to save.

We have the unfathomable luxury of being incredibly well compensated for what we do. What is that luxury worth if we don’t use it to liberate ourselves, to facilitate happiness and fulfillment?

Questionable Advice: Premature Senior Followup Q’s

Questionable Advice: “How do I feel worthwhile as a manager when my people are doing all the implementing?”

“How do I feel worthwhile as a manager when my people are doing all the implementing?”

— An Engineering Manager

Hey, real quick: how long have you been managing? If it’s less than two years, honey, the answer is “you don’t.” Your feelings about your performance don’t mean much in a new role. If you think you’re crushing it, you probably aren’t. But hey, if you think you’re screwing it up royally, you probably aren’t that either. ☺️

It took years for you to develop reliable instincts as an engineer, right? Then you switched careers and went right back to beginnerhood. That rarely feels good. So just don’t worry about it. Try not to obsess over how well you’re doing or not doing. Just engage your beginner brain, set phasers to “curiosity!” and actively pursue every learning opportunity for a year or two. Your judgment will improve. Give it time.

But experienced managers still struggle with this too. So if that’s you: let’s talk.

job satisfaction feels different for managers

First, let’s be clear: job satisfaction as a manager, should you find it, will feel very different than it did as an engineer. As an engineer, you get that very tactile sense of merging code, solving puzzles and incrementally pushing the business forward. It has a rhythm and a powerful drip, drip, drip of dopamine, and as a manager you will never ever feel that. Sorry! Some people eventually make peace with this, but many never do. No shame in that.

This is partly a function of time and proximity. Manager successes and failures play out over a much longer period of time than the successes and failures of writing and debugging code, and you can only indirectly trace your impact. It can be hard to draw a straight line from cause to effect. Some of your greatest successes may resonate and compound for years to come, yet the person might not remember, may never even have known how you contributed to their triumph. (Hell, you might not either.)

It is also related to your changing relationship with public credit and attribution. It is extremely poor form for managers to go around taking credit for things, so hopefully your org has a sturdy culture of celebrating the people doing the work and not their manager. But if you are used to receiving that stream of praise and recognition, it can be disorienting and deeply demoralizing when it dries up.

Most managers are unreliable narrators

There seems to be precisely one acceptable answer to the question of what motivates managers: loftily waxing on and on about how they get ALL their joy and fulfillment from empowering others and watching other people succeed without ever personally building anything tangible or receiving ANY of the credit. I call bullshit. (This bugs the ever-loving crap out of me.)

It reminds me of the self-abnegating monologues women are supposed to give about how amazing motherhood is, as they’re covered in vomit and haven’t slept in a week.

There is nothing wrong with wanting credit for your work, and affirmation and validation, and there is nothing inherently noble about not wanting those things. Whatever motivates you, motivates you. What  matters is that you are self-aware about your needs, generous with credit, and conscious of who you lean on to get those needs met. Anyway, lots of people who become managers find themselves suddenly adrift and lacking reliable indicators about their job performance.

Part of becoming an effective manager over time is learning to recognize your own contributions and derive your own inner sense of worth. Nobody wants a needy manager. So here’s where I’d start: by locating your impact in the Really Big Stuff, the small personal moments, and any sort of crisis.

1 💜 The Really Big Stuff.

Are your users happy, and your business growing? Are you setting ambitious strategic goals and hitting them? Are your DORA metrics excellent? Are people happy to join your team and report to you? Are they awesome? Great, then you deserve some portion of the credit for that.

Most big shit is unfortunately only truly visible over much longer timelines, **but!** the longer you are a manager the more sensitive your feelers will become, the more they will pick up on subtle hints that betray deeper concerns. The sideways glance that suggests lack of trust, the offhand comment with an edge that sticks with you — those are fleeting clues which you may then delicately and expertly probe and use to disarm bad situations before they deteriorate or detonate.

(And sure, it can be really lovely to watch someone succeed when you know you had a small part to play. Congrats, you earned your salary. I just find it a little creepy and culty to act like this is what every manager must live for. There is nothing whatsoever wrong with you if living vicariously thru your reports’ successes doesn’t do it for you.)

2 💙 Random little moments.

On the flip side are the small and precious moments: did you just make someone feel supported in taking their mental health days? Did you pick up on someone’s anxiety and take a moment to check in them? Did they leave with a smile? Did you amplify someone’s voice, or help them work through a problem, or argue someone into receiving a well-deserved raise? Wielding your manager powers for good can be so easy and so gratifying.

(Seriously — give yourself a little pat on the back. This is the closest you’re going to get to a compliment most weeks. And nobody else is going to do it for you. 🤣)

3 💚 In a crisis.

Every manager will eventually encounter a crisis, and those are the moments that reveal the most about how well you have done your job. Do you have the credibility to speak for your team? Does your manager reach out for your support? Do your peers take you seriously or confide in your? Will people vouch for you? You’ll find out!

Not to put it too harshly, but in a clutch situation, are you a source of calm or are you often on the list of “situations to be managed”? Do you consistently tamp down drama and lower the stakes and the volume, or do you react in ways that amplify and escalate emotionally-charged situations? Do your feelings become other people’s problems? This reflects your ability to regulate your own feelings and emotional impulses under stress, and most of us quite overestimate our own power to self-regulate.

Go to therapy. Practice that mindfulness shit. Find what works for you, but pay attention to the energy you are contributing to any situation.

“Does it even matter if I come to work or not?”

A friend of mine was recently lamenting that it didn’t feel like it mattered if he came to work or missed all his 1x1s or not. What even was the point of showing up, as a manager?

In a way, he’s right. It shouldn’t matter if you’re out for a day, or a week. No single 1×1 should make or break something major, or you were already on terribly thin ice.

It is impossible to predict what the next crisis will be. All you can do in the meantime is keep your sociotechnical systems humming along and steadily work to improve them. Build good relationships and deepen the web of trust around you. Optimize your systems so that your people can spend as much of their time as possible on satisfying, high impact work. Make sure nobody is running ragged or being taken advantage of. Ensure redundancy and resiliency across all social and technical domains. Never stop learning. Keep your troops shiny.

Managerial value accrues over time. You can’t show up in the middle of a crisis and start fixing trust issues, any more than you can be a good coach who only shows up on game days. Train yourself to rejoice when things go smoothly in your absence (and really mean it).

TLDR

In the end, your worth as a manager is seen in the trail you leave behind. The teams that got buy-in to achieve continuous delivery. The coworkers who fondly remember working together and recruit each other, or follow you from job to job for years. The way they saw you advocate for them. Set the bar high enough that their future managers will be compared to you. 🙃

If you are a good manager, you will rack up moments over the years that mean the world to you, heartwarming and vulnerable moments when people share the impact you’ve had on them. Treasure those like rare, unpredictable treats that they are, but don’t confuse them with fuel. It will never be enough to run on.

(And hey — if you’re just starting out, and all this sounds impossibly long-term? — never underestimate the value of just being fucking kind and generous and a pleasure to interact with. The job isn’t a popularity contest — the day will come when your effectiveness means the right people hating you. But that day is not today. And it is hard to be a good manager unless people genuinely enjoy talking to you and affirmatively want to help you meet your goals.)

charity.

Questionable Advice: “How do I feel worthwhile as a manager when my people are doing all the implementing?”

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.

Over the past couple of weeks I’ve been tweeting a LOT about lead time to deploy: the interval encompassing the time from when the code gets written and when it’s been deployed to production. Also described as “how long it takes you to run CI/CD.”

How important is this?

Fucking central.

Here is a quickie thread from this week, or just go read “Accelerate” like everybody already should have. 🙃

It’s nigh impossible to have a high-performing team with a long lead time, and becomes drastically easier with a dramatically shorter lead time.

🌷 Shorter is always better.
🌻 One mergeset per deploy.
🌹 Deploy should be automatic.

And it should clock in under 15 minutes, all the way from “merging!” to “deployed!”.

Now some people will nod and agree here, and others freak the fuck out. “FIFTEEN MINUTES?” they squall, and begin accusing me of making things up or working for only very small companies. Nope, and nope. There are no magic tricks here, just high standards and good engineering, and the commitment to maintaining your goals quarter by quarter.

If you get CI/CD right, a lot of other critical functions, behaviors, and intuitions are aligned to be comfortably successful and correct with minimal effort. If you get it wrong, you will spend countless cycles chasing pathologies. It’s like choosing to eat your vegetables every day vs choosing a diet of cake and soda for fifty years, then playing whackamole with all the symptoms manifesting on your poor, mouldering body.

Is this ideal achievable for every team, on every stack, product, customer and regulatory environment in the world? No, I’m not being stupid or willfully blind. But I suggest pouring your time and creative energy into figuring out how closely you can approximate the ideal given what you have, instead of compiling all the reasons why you can’t achieve it.

Most of the people who tell me they can’t do this are quite wrong, turns out. And even if you can’t down to 15 minutes, ANY reduction in lead time will pay out massive, compounding, benefits to your team and adjacent teams forever and ever.

So — what was it you said you were working on right now, exactly? that was so important? 🤔

“Cutting my build time by 90%!” — you

Huzzah. 🤠

So let’s get you started! Here, courtesy of my twitterfriends, is a long compiled list of Likely Suspects and CI/CD Offenders, a long list of anti-patterns, and some unresolved personal pain & suffering to hunt down and question when your build gets slow..

✨15 minutes or bust, baby!✨

 
Where it all started: what keeps you from getting under 15 minute CI/CD runs?

Generally good advice.

  • Instrument your build pipeline with spans and traces so you can see where all your time is going. ALWAYS. Instrument.
  • Order tests by time to execute and likelihood of failure.
  • Don’t run all tests, only tests affected by your change
  • Similarly, reduce build scope; if you only change front-end code, only build/test/deploy the front end, and for heaven’s sake don’t fuss with all the static asset generation
  • Don’t hop regions or zones any more than you absolutely must.
  • Prune and expire tests regularly, don’t wait for it to get Really Bad
  • Combine functionality of tests where possible — tests need regular massages and refactors too
  • Pipeline, pipeline, pipeline tests … with care and intention
  • You do not need multiple non-production environment in your CI/CD process. Push your artifacts to S3 and pull them down from production. Fight me on this
  • Pull is preferable to push. (see below)
  • Set a time elapsed target for your team, and give it some maintenance any time it slips by 25%

The usual suspects

  • tests that take several seconds to init
  • setup/teardown of databases (HINT try ramdisks)
  • importing test data, seeding databases, sometimes multiple times
  • rsyncing sequentially
  • rsyncing in parallel, all pulling from a single underprovisioned source
  • long git pulls (eg cloning whole repo each time)
  • CI rot (eg large historical build logs)
  • poor teardown (eg prior stuck builds still running, chewing CPU, or artifacts bloating over time
  • integration tests that spin up entire services (eg elasticsearch)
  • npm install taking 2-3 minutes
  • bundle install taking 5 minutes
  • resource starvation of CI/CD system
  • not using containerized build pipeline
  • …(etc)
 
Continuous deployment to industrial robots in prod?? Props, man.

Not properly separating the streams of “Our Software” (changes constantly) vs “infrastructure” (changes rarely)

  • running cloudformation to setup new load balancers, dbs, etc for an entire acceptance environment
  • docker pulls, image builds, docker pushes container spin up for tests

“Does this really go here?”

  • packaging large build artifacts into different format for distribution
  • slow static source code analysis tools
  • trying to clone production data back to staging, or reset dbs between runs
  • launching temp infra of sibling services for end-to-end tests, running canaries
  • selenium and other UX tests, transpiling and bundling assets

“Have a seat and think about your life choices.”

  • excessive number of dependencies
  • extreme legacy dependencies (things from the 90s)
  • tests with “sleep” in them
  • entirely too large frontends that should be broken up into modules

“We regret to remind you that most AWS calls operate at the pace of ‘Infrastructure’, not ‘Software'”

  • AWS CodeBuild has several minutes of provisioning time before you’re even executing your own code — even a few distinct jobs in a pipeline and you might suffer 15 min of waiting for CodeBuild to do actual work
  • building a new AMI
  • using EBS
  • spinning up EC2 nodes .. sequentially 😱
  • cool it with the AWS calls basically



A few responses were oozing with some unresolved trauma, lol.

Natural Born Opponents: “Just cache it” and “From the top!”

  • builds install correct version of toolchain from scratch each time
  • rebuilding entire project from source every build
  • failure to cache dependencies across runs (eg npm cache not set properly)

“Parallelization: the cause of, and solution to, all CI problems”

  • shared test state, which prevents parallel testing due to flakiness and non-deterministic test results
  • not parallelizing tests
 
 
I have so many questions….

Thanks to @wrd83, @sorenvind, @olitomli, @barney_parker, @dastbe, @myajpitz, @gfodor, @mrz, @rwilcox, @tomaslin, @pwyliu, @runewake2, @pdehlkefor, and many more for their contributions!

P.S. what did I say about instrumenting your build pipeline? For more on honeycomb + instrumentation, see this thread. Our free tier is incredibly generous, btw ☺️

Stay tuned for more long form blog posts on this topic. Coming soon. 🌈

charity

P.S. this blog post is the best thing i’ve ever read about reducing your build time. EVER.

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.