On Friday Deploys: Sometimes that Puppy Needs Murdering (xpost)

(Cross posted from its original source)

‘Tis the season that one of my favorite blog posts gets pulled out and put in rotation, much like “White Christmas” on the radio station. I’m speaking, of course, of “Friday Deploy Freezes are Exactly Like Murdering Puppies” (old link on WP).

This feels like as good a time as any to note that I am not as much of an extremist as people seem to think I am when it comes to Friday deploys, or deploy freezes in general.

(Sometimes I wonder why people think I’m such an extremist, and then I remember that I did write a post about murdering puppies. Ok, ok. Point taken.)

Take this recent thread from LinkedIn, where Michael Davis posted an endorsement of my Puppies article along with his own thoughts on holiday code freezes, followed by a number of smart, thoughtful comments on why this isn’t actually attainable for everyone. Payam Azadi talks about an “icing” and “defrosting” period where you ease into and out of deploy freezes (never heard of this, I like it!), and a few other highly knowledgeable folks chime in with their own war stories and cautionary tales.

It’s a great thread, with lots of great points. I recommend reading it. I agree with all of them!!

For the record, I do not believe that everyone should get rid of deploy freezes, on Fridays or otherwise.

If you do not have the ability to move swiftly with confidence, which in practice means “you can generally find problems in your new code before your customers do”, which generally comes down to the quality and usability of your observability tooling, and your ability to explore high cardinality dimensions in real time (which most teams do not have), then deploy freezes before a holiday or a big event, or hell, even weekends, are probably the sensible thing to do.

If you can’t do the “right” thing, you find a workaround. This is what we do, as engineers and operators.

Deploy freezes are a hack, not a virtue

Look, you know your systems better than I do. If you say you need to freeze deploys, I believe you.

Honestly, I feel like I’ve always been fairly pragmatic about this. The one thing that does get my knickers in a twist is when people adopt a holier-than-thou posture towards their Friday deploy freezes. Like they’re doing it because they Care About People and it’s the Right Thing To Do and some sort of grand moral gesture. Dude, it’s a fucking hack. Just admit it.

It’s the best you can do with the hand you’ve been dealt, and there’s no shame in that! That is ALL I’m saying. Don’t pat yourself on the back; act a little sheepish instead, and I am so with you.

I think we can have nice things

I think there’s a lot of wisdom in saying “hey, it’s the holidays, this is not the time to be rushing new shit out the door absent some specific forcing function, alright?”

My favorite time of year to be at work (back when I worked in an office) was always the holidays. It was so quiet and peaceful, the place was empty, my calendar was clear, and I could switch gears and work on completely different things, out of the critical line of fire. I feel like I often peaked creatively during those last few weeks of the year.

I believe we can have the best of both worlds: a yearly period of peace and stability, with relatively low change rate, and we can evade the high stakes peril of locks and freezes and terrifying January recoveries.

How? Two things.

Don’t freeze deploys. Freeze merges.

To a developer, ideally, the act of merging their changes back to main and those changes being deployed to production should feel like one singular atomic action, the faster the better, the less variance the better. You merge, it goes right out. You don’t want it to go out, you better not merge.

The worst of both worlds is when you let devs keep merging diffs, checking items off their todo lists, closing out tasks, for days or weeks. All these changes build up like a snowdrift over a pile of grenades. You aren’t going to find the grenades til you plow into the snowdrift on January 5th, and then you’ll find them with your face. Congrats!

If you want to freeze deploys, freeze merges. Let people work on other things. I assure you, there is plenty of other valuable work to be done.
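
For the concrete-minded: a merge freeze can literally just be a required CI check that fails inside the freeze window. Here’s a minimal sketch in Python; the dates are examples, and the wiring (a required status check on main, say) is left to whatever CI system you actually run.

```python
from datetime import date

# Sketch of a CI status check that blocks merging (not deploying) during a
# freeze window. Dates are illustrative; wire it in as a required check on main.
FREEZE_START = date(2025, 12, 19)
FREEZE_END = date(2026, 1, 5)

def merge_allowed(today: date) -> bool:
    """Freeze merges, not deploys: inside the window, the gate fails."""
    return not (FREEZE_START <= today <= FREEZE_END)

print(merge_allowed(date(2025, 12, 25)))  # False: mid-freeze, no merging
print(merge_allowed(date(2026, 1, 6)))    # True: defrosted
```

When the gate reopens in January, merges go out one at a time, each one deployed as it lands, instead of the whole snowdrift at once.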

Don’t freeze deploys unless your goal is to test deploy freezes

The second thing is a corollary. Don’t actually freeze deploys, unless your SREs and on call folks are bored and sitting around together, going “wouldn’t this be a great opportunity to test for memory leaks and other systemic issues that we don’t know about due to the frequency and regularity of our deploys?”

If that’s you, godspeed! Park that deploy engine and sit on the hood, let’s see what happens!

People always remember the outages and instability that we trigger with our actions. We tend to forget about the outages and instability we trigger with our inaction. But if you’re used to deploying every day, or many times a day: first, good for you. Second, I bet you a bottle of whiskey that something’s gonna break if you go for two weeks without deploying.

I bet you the good shit. Top shelf. 🥃

This one is so easy to mitigate, too. Just run the deploy process every day or two, but don’t ship new code out.
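
A sketch of that mitigation, with an invented `deploy-tool` command (substitute your own pipeline): a scheduled job reships the exact artifact that is already running, so the deploy path and the process restarts keep getting exercised while no new code goes out.

```python
# Sketch: a scheduled "keepalive" redeploy during a freeze. The command name
# and flags are made up; the point is redeploying the same SHA, zero new code.
def build_redeploy_command(current_sha: str) -> list[str]:
    # Redeploy exactly what is already running in production.
    return ["deploy-tool", "rollout", "--ref", current_sha,
            "--reason", "freeze-keepalive"]

print(" ".join(build_redeploy_command("a1b2c3d")))
```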

Alright. Time for me to go fly to my sister’s house. Happy holidays everyone! May your pagers be silent and your bellies be full, and may no one in your family or friend group mention politics this year!

💜💙💚💛🧡❤️💖
charity

Me and Bubba and Miss Pinky Persnickety

P.S. The title is hyperbole! I was frustrated! I felt like people were intentionally misrepresenting my point and my beliefs, so I leaned into it. Please remember that I grew up on a farm and we ended up eating most of our animals. Possibly I am still adjusting to civilized life. Also, I have two cats and I love them very much and have not eaten either of them yet.

A few other things I’ve written on the topic:


2025 was for AI what 2010 was for cloud (xpost)

The satellite, experimental technology has become the mainstream, foundational tech. (At least in developer tools.) (xposted from new home)

I was at my very first job, Linden Lab, when EC2 and S3 came out in 2006. We were running Second Life out of three datacenters, where we racked and stacked all the servers ourselves. At the time, we were tangling with a slightly embarrassing data problem in that there was no real way for users to delete objects (the Trash folder was just another folder), and by the time we implemented a delete function, our ability to run garbage collection couldn’t keep up with the rate of asset creation. In desperation, we spun up an experimental project to try using S3 as our asset store. Maybe we could make this Amazon’s problem and buy ourselves some time?

Why yes, we could. Other “experimental” projects sprouted up like weeds: rebuilding server images in the cloud, running tests, storing backups, load testing, dev workstations. Everybody had shit they wanted to do that exceeded our supply of datacenter resources.

By 2010, the center of gravity had shifted. Instead of “mainstream engineering” (datacenters) and “experimental” (cloud), there was “mainstream engineering” (cloud) and “legacy, shut it all down” (datacenters).

Why am I talking about the good old days? Because I have a gray beard and I like to stroke it, child. (Rude.)

And also: it was just eight months ago that Fred Hebert and I were delivering the closing keynote at SRECon. The title is “AIOps: Prove It! An Open Letter to Vendors Selling AI for SREs”, which makes it sound like we’re talking to vendors, but we’re not; we’re talking to our fellow SREs, begging them to engage with AI on the grounds that it’s not ALL hype.

We’re saying to a room of professional technological pessimists that AI needs them to engage. That their realism and attention to risk is more important than ever, but in order for their critique to be relevant and accurate and be heard, it has to be grounded in expertise and knowledge. Nobody cares about the person outside taking potshots.

This talk recently came up in conversation, and it made me realize—with a bit of a shock—how far my position has come since then.

That was just eight months ago, and AI still felt like it was somehow separable, or a satellite of tech mainstream. People would gripe about conferences stacking the lineup with AI sessions, and AI getting shoehorned into every keynote.

I get it. I too love to complain about technology, and this is certainly an industry that has seen its share of hype trains: dotcom, cloud, crypto, blockchain, IoT, web3, metaverse, and on and on. I understand why people are cynical—why some are even actively looking for reasons to believe it’s a mirage.

But for me, this year was for AI what 2010 was for the cloud: the year when AI stopped being satellite, experimental tech and started being the mainstream, foundational technology. At least in the world of developer tools.

It doesn’t mean there isn’t a bubble. Of COURSE there’s a fucking bubble. Cloud was a bubble. The internet was a bubble. Every massive new driver of innovation has come with its own frothy hype wave.

But the existence of froth doesn’t disprove the existence of value.

Maybe y’all have already gotten there, and I’m the laggard. 😉 (Hey, it’s an SRE’s job to mind the rear guard.) But I’m here now, and I’m excited. It’s an exciting time to be a builder.


Hello World (xpost from substack)

I recently posted a short note about moving from WordPress to Substack after ten years on WP. A number of people replied, commented, or DM’d me to express their dismay, along the lines of “why are you supporting Nazis?”. A few begged me to reconsider.

So I did. I paused the work I was doing on migration and setup, and I paused the post I was drafting on Substack. I read the LeaveSubstack site, and talked with its author (thank you, Sean 💜). I had a number of conversations with people I consider experts in content creation, and people I consider stakeholders (my coworkers and customers), as well as my own personal Jiminy Cricket, Liz Fong-Jones. I also slept on it.

I’ve decided to stay.

I said I would share my thinking once I made a decision, and it comes down to this: I have a job to do, and I haven’t been doing it.

I have not been doing my job 💔

I’ve gone increasingly dark on social media over the past few years, and while this has been delightful from a personal perspective, I have developed an uncomfortable conviction that I have also been abdicating a core function of my job in doing so.

The world of software is changing—fast. It’s exciting. But it is not enough to have interesting ideas and say them once, or write them down in a single book. You need to be out there mixing it up with the community every day, or at least every week. You need to be experimenting with what language works for people, what lands, what sparks a light in people’s eyes.

You (by which I mean me) also need to be listening more: reading and interacting with other people’s thoughts, volleying back and forth, polishing each other like diamonds.

How many times did we define observability or high cardinality or the sins of aggregation? Cool. How many times have we talked about the ways that AI has made the honeycomb vision technologically realizable for the first time? Uh, thousands of times less.

Write more, engage with mainstream tech

My primary goal is to get back into the mainstream of technical discussion and mix it up a lot more. Unfortunately, to the extent there is a tech mainstream, it still exists on X. I am not ruling out the possibility of returning, but I would strongly prefer not to. I’m going to see if I can do my job by being much more active on LinkedIn and Substack.

My secondary goal is to remove friction and barriers to posting. WordPress just feels so heavyweight. Like I’m trying to craft a web page, not write a quick post. Substack feels more like writing an email. I’ve been trying to make myself post more all year on WP, and it hasn’t happened. I have a lot of shit backed up to talk about, and I think this will help grease the wheels.

There are platforms that are beyond the pale, that exist solely to platform and support Nazis and violent extremists—your Gabs, your Parlers. Substack is very far from being one of those. All of these content platforms exist on some continuum of grey, and governance is hard, hard, hard in an era of mainstreaming right-wing extremism.

Substack may not make all the decisions I would make, but I feel like it is a light dove grey, all things considered.

Some mitigations

I have received some tips and done some research on how to minimize the value of my writing to Substack. Here they are.

  • Substack makes money from paid subscriptions, so I don’t accept money. Ever.
  • I am told that if you use email or RSS, it benefits Substack less than if you use the app. RSS feed here.
  • I will set up an auto-poster from Substack to WordPress (at some point… probably whenever I find the time to fix the url rewriter and change domain pointer)

I hope these will allow conscientious objectors to continue to read and engage with my work, but I also understand if not.

A vegan friend of mine once used an especially vivid metaphor to indignantly tell us why no, he could NOT just pick the meat and dairy off his plate and eat the vegetables and grains left behind (they were not cooked together). He said, “If somebody shit on your plate, would you just pick the shit off and keep eating?”

So. If Substack is the shit on your social media plate, and you feel morally obligated to reject anything that has ever so much as touched the domain, I can respect that.

Everyone has to decide which battles are theirs to fight. This one is not mine.

💜💙💚💛🧡❤️💖,
charity.


Moving from WordPress to Substack

Well, shit.

I wrote my first blog post in this space on December 27th, 2015 — almost exactly a decade ago.

“Hello, world.”

I had just left Facebook, hadn’t yet formally incorporated Honeycomb, and it just felt like it was time, long past time for me to put something up and start writing.

Ten years later, it feels long past time for me to do something else. I despise WP (who doesn’t?), and there’s so much friction in getting a post out that I just don’t do it. Plus, it’s clear that to the extent that there is a vibrant ecosystem of tech writers in longform, it’s on substack. I miss tech twitter, always will. Time to give substack a try.

I’ve been working on the second edition of “Observability Engineering” for much of this year, and I have learned SO MUCH in the writing process. As soon as these rough drafts have all been turned in, I will be streaming my thoughts out via substack. They are burning a hole in my brain the longer I hold them in.

Housekeeping notes:

  • I’ve tried to export the email subscribers and import them into substack, but it’s held up for manual review. I don’t know what that means.
  • I won’t be able to export and bring along the comments you folks have left over the years. I’m sorry. 🙁
  • I am going to leave charity.wtf pointed here for the foreseeable future, even tho I am working to port over the corpus of posts. I don’t want to break anyone’s bookmarks or article links, so I’ll leave it up here until / unless I find a solution.

If you want to go subscribe, I’m at charitydotwtf.substack.com. Here’s the blurb:

Thank you all for a wonderful 10 years together. WordPress may be a piece of shit, but the community I’ve found here has been anything but. I hope to see you on the other side.

💜💙💚💛🧡❤️💖

charity

 


From Cloudwashing to O11ywashing

I was just watching a panel on observability, with a handful of industry executives and experts who shall remain nameless and hopefully duly obscured. Their identities are not the point; the point is that this is a mainstream view among engineering executives, and my head is exploding.

Scene: the moderator asked a fairly banal moderator-esque question about how happy and/or disappointed each exec has been with their observability investments.

One executive said that as far as traditional observability tools are concerned (“are there faults in our systems?”), that stuff “generally works well.”

However, what they really care about is observing the quality of their product from the customer’s perspective. EACH customer’s perspective.

Nines don’t matter if users aren’t happy

“Did you know,” he mused, “that there are LOTS of things that can interrupt service or damage customer experience that won’t impact your nines of availability?”

(I begin screaming helplessly into my monitor.)

“You could have a dependency hiccup,” he continued, oblivious to my distress. “There could be an issue with rendering latency in your mobile app. All kinds of things.”

(I look down and realize that I am literally wearing this shirt.)

He finishes with, “And that is why we have invested in our own custom solution to measure key workflows through startup, payment, and success.”

(I have exploded. Pieces of my head now litter this office while my headless corpse types on and on.)

It’s twenty fucking twenty five. How have we come to this point?

 

Observability is now a billion dollar market for a meaningless term

My friends, I have failed you.

It is hard not to register this as a colossal fucking failure on a personal level when a group of modern, high performing tech execs and experts can all sit around a table nodding their heads at the idea that “traditional observability” is about whether your systems are UP👆 or DOWN👇, and that the idea of observing the quality of service from each customer’s perspective remains unsolved! unexplored! a problem any modern company needs to write custom tooling from scratch to solve. 

This guy is literally describing the original definition of observability, and he doesn’t even know it. He doesn’t know it so hard that he went and built his own thing.

You guys know this, right? When he says “traditional observability tools”, he means monitoring tools. He means the whole three fucking pillars model: metrics, logging, and tracing, all separate things. As he notes, these traditional tools are entirely capable of delivering on basic operational outcomes (are we up, down, happy, sad?). They can DO this. They are VERY GOOD tools if that is your goal.

But they are not capable of solving the problem he wants to solve, because that would require combining app, business, and system telemetry in a unified way. Data that is traceable, but not just tracing. With the ability to slice and dice by any customer ID, site location, device ID, blah blah. Whatever shall we call THAT technological innovation, when someone invents it? Schmobservability, perhaps?

So anyway, “traditional observability” is now part of the mainstream vernacular. Fuck. What are we going to do about it? What CAN be done about it?

From cloudwashing to o11ywashing

I learned a new term yesterday: cloudwashing. I learned this from Rick Clark, who tells a hilarious story about the time IBM got so wound up in the enthusiasm for cloud computing that they reclassified their Z series mainframe as “cloud” back in 2008. 

(Even more hilarious: asking Google about the precipitating event, and following the LLM down a decade-long wormhole of incredibly defensive posturing from the IBM marketing department and their paid foot soldiers in tech media about how this always gets held up as an example of peak cloudwashing but it was NOT AT ALL cloudwashing due to being an extension of the Z/Series Mainframe rather than a REPLACEMENT of the Z/Series Mainframe, and did you know that Mainframes are bigger business and more relevant today than ever before?)

(Sorry, but I lost a whole afternoon to this nonsense, I had to bring you along for the ride.)

Rick says the same thing is happening right now with observability. And of course it is. It’s too big of a problem, with too big a budget: an irresistible target. It’s not just the legacy behemoths anymore. Any vendor that does anything remotely connected to telemetry is busy painting on a fresh coat of o11ywashing. From a marketing perspective, it would be irresponsible not to.

How to push back on *-washing

Anyway, here are the key takeaways from my weekend research into cloudwashing.

  1. This o11ywashing problem isn’t going away. It is only going to get bigger, because the problem keeps getting bigger, because the traditional vendors aren’t solving it, because they can’t solve it.

  2. The Gartners of the world will help users sort this out someday, maybe, but only after we win. We can’t expect them to alienate multibillion dollar companies in the pursuit of technical truth, justice and the American Way. If we ever want to see “Industry Experts” pitching in to help users spot o11ywashing, as they eventually did with cloudwashing (see exhibit A), we first need to win in the market.
    Exhibit A: “How to Spot Cloudwashing”

  3. And (this is the only one that really matters) we have to do a better job of telling this story to engineering executives, not just engineers. Results and outcomes, not data structures and algorithms.

    (I don’t want to make this sound like an epiphany we JUST had…we’ve been working hard on this for a couple years now, and it’s starting to pay off. But it was a powerful confirmation.)

Talking to execs is different than talking to engineers

When Christine and I started Honeycomb, nearly ten years ago, we were innocent, doe-eyed engineers who truly believed on some level that if we just explained the technical details of cardinality and dimensionality clearly and patiently enough to the world, enough times, the consequences to the business would become obvious to everyone involved.

It has now been ten years since I was a hands-on engineer every day (say it again, like pressing on a bruise makes it hurt less), and I would say I’ve been a decently functioning exec for about the last three or four of those years. 

What I’ve learned in that time has actually given me a lot of empathy for the different stresses and pressures that execs are under. 

I wouldn’t say it’s less or more than the stresses of being an SRE on call for some of the world’s biggest databases, but it is a deeply and utterly different kind of stress, the kind of stress less expiable via fine whiskey and poor life choices. (You just wake up in the morning with a hangover, and the existential awareness of your responsibilities looming larger than ever.)

This is a systems problem, not an operational one

There is a lot of noise in the field, and executives are trying to make good decisions that satisfy all parties and constraints amidst the unprecedented stress-panic-opportunity-terror of AI changing everything. That takes storytelling skills and sales discipline on our part, in addition to technical excellence.

Companies are dumping more and more and more money into their so-called observability tools, and not getting any closer to a solution. Nor will they, so long as they keep thinking about observability in terms of operational outcomes (and buying operational tools). Observability is a systems problem. It’s the most powerful lever in your arsenal when it comes to disrupting software doom spirals and turning them into positive feedback loops. Or it should be.

As Fred Hebert might say, it’s great you’re so good at firefighting, but maybe it’s time to go read the city fire codes.

Execs don’t know what they don’t know, because we haven’t been speaking to them. But we’re starting to.

What will be the next term that gets invented and coopted in the search to solve this problem?

Where to start, with a project so big? Google’s AI says that “experts suggest looking for specific features to identify true cloud observability solutions versus cloudwashed (er, o11ywashed) ones.”

I guess this is as good a place to start as any: If your “observability” tooling doesn’t help you understand the quality of your product from the customer’s perspective, EACH customer’s perspective, it isn’t fucking observability.

It’s just monitoring dressed up in marketing dollars.

Call it o11ywashing.


How many pillars of observability can you fit on the head of a pin?

My day started off with an innocent question, from an innocent soul.

“Hey Charity, is profiling a pillar?”

I hadn’t even had my coffee yet.

“Someone was just telling me that profiling is the fourth pillar of observability now. I said I think profiling is a great tool, but I don’t know if it quite rises to the level of pillar. What do you think?”

What….do.. I think.

What I think is, there are no pillars. I think the pillars are a fucking lie, dude. I think the language of pillars does a lot of work to keep good engineers trapped inside a mental model from the 1980s, paying outrageous sums of money for tooling that can’t keep up with the chaos and complexity of modern systems.

Here is a list of things I have recently heard people refer to as the “fourth pillar of observability”:

  • Profiling
  • Tokens (as in LLMs)
  • Errors, exceptions
  • Analytics
  • Cost

Is it a pillar, is it not a pillar? Are they all pillars? How many pillars are there?? How many pillars CAN there be? Gaahhh!

This is not a new argument. Take this ranty little tweet thread of mine from way back in 2018, for starters.

 

Or perhaps you have heard of TEMPLE: Traces, Events, Metrics, Profiles, Logs, and Exceptions?

Or the “braid” of observability data, or “They Aren’t Pillars, They’re Lenses”, or the Lightstep version: “Three Pillars, Zero Answers” (that title is a personal favorite).

Alright, alright. Yes, this has been going on for a long time. I’m older now and I’m tireder now, so here’s how I’ll sum it up.

Pillar is a marketing term.
Signal is a technical term.

So “is profiling a pillar?” is a valid question, but it’s not a technical question. It’s a question about the marketing claims being made by a given company. Some companies are building a profiling product right now, so yes, to them, it is vitally important to establish profiling as a “pillar” of observability, because you can charge a hell of a lot more for a “pillar” than you can charge for a mere “feature”. And more power to them. But it doesn’t mean anything from a technical point of view.

On the other hand, “signal” is absolutely a technical term. The OpenTelemetry Signals documentation, which I consider canon, says that OTel currently supports Traces, Metrics, Logs, and Baggage as signal types, with Events and Profiles at the proposal/development stage. So yes, profiling is a type of signal.

The OTel docs define a telemetry signal as “a type of data transmitted remotely for monitoring and analysis”, and they define a pillar as … oh, they don’t even mention pillars? like at all??

I guess there’s your answer.

And this is probably where I should end my piece. (Why am I still typing…. 🤔)

Pillars vs signals

First of all, I want to stress that it does not bother me when engineers go around talking about pillars. Nobody needs to look at me guiltily and apologize for using the term ‘pillar’ at the bar after a conference because they think I’m mad at them. I am not the language police, it is not my job to go around enforcing correct use of technical terms. (I used to, I know, and I’m sorry! 😆)

When engineers talk about pillars of observability, they’re just talking about signals and signal types, and “pillar” is a perfectly acceptable colloquialism for “signal”.

When a vendor starts talking about pillars, though — as in the example above! — it means they are gearing up to sell you something: another type of signal, siloed off from all the other signals you send them. Your cost multiplier is about to increment again, and then they’re going to start talking about how Important it is that you buy a product for each and every one of the Pillars they happen to have.

As a refresher: there are two basic architecture models used by observability companies, the multiple pillars model and the unified storage model (aka o11y 2.0). The multiple pillars model is to store every type of signal in a different siloed storage location — metrics, logs, traces, profiling, exceptions, etc, everybody gets a database! The unified storage model is to store all signals together in ONE database, preserving context and relationships, so you can treat data like data: slice and dice, zoom in, zoom out, etc.

Most of the industry giants were built using the pillars model, but Honeycomb (and every other observability company founded post-2019) was built using the unified storage model: wide, structured log events on a columnar storage engine with high cardinality support, and so on.
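
To make the two models concrete, here’s a toy wide event: one record carrying what the pillars model would split across a metrics TSDB, a log store, and a trace store. Every field name and value is invented for illustration.

```python
# One wide event: the same request that the pillars model would store three
# (or more) times. All fields and values here are made up.
wide_event = {
    # trace-shaped fields
    "trace_id": "7f2a9c41", "span_id": "03c1aa02", "parent_span_id": None,
    # log-shaped fields
    "timestamp": "2025-12-19T08:14:02Z", "message": "checkout complete",
    # metric-shaped fields
    "duration_ms": 412, "status_code": 200,
    # the high-cardinality dimensions you actually want to slice by
    "customer_id": "cust_8271", "device_id": "ios-17-pro", "build_sha": "a1b2c3d",
}

# Stored once; "metrics", "logs", and "traces" are just projections of it.
pillar_views = {
    "metric": {k: wide_event[k] for k in ("duration_ms", "status_code")},
    "log": {k: wide_event[k] for k in ("timestamp", "message")},
    "trace": {k: wide_event[k] for k in ("trace_id", "span_id", "parent_span_id")},
}
print(pillar_views["metric"])
```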

Bunny-hopping from pillar to pillar

When you use each signal type as a standalone pillar, this leads to an experience I think of as “bunny products” 🐇 where the user is always hopping from pillar to pillar. You see something on your metrics dashboard that looks scary? hop-hop to your logs and try to find it there, using grep and search and matching by timestamps. If you can find the right logs, then you need to trace it, so you hop-hop-hop to your traces and repeat your search there. With profiling as a pillar, maybe you can hop over to that dataset too.🐇🐰

The amount of data duplication involved in this model is mind boggling. You are literally storing the same information in your metrics TSDB as you are in your logs and your traces, just formatted differently. (I never miss an opportunity to link to Jeremy Morrell’s masterful doc on instrumenting your code for wide events, which also happens to illustrate this nicely.) This is insanely expensive. Every request that enters your system gets stored how many times, in how many signals? Count it up; that’s your cost multiplier.

Worse, much of the data that connects each “pillar” exists only in the heads of the most senior engineers, so they can guess or intuit their way around the system, but anyone who relies on actual data is screwed. Some vendors have added an ability to construct little rickety bridges post hoc between pillars, e.g. “this metric is derived from this value in this log line or trace”, but now you’re paying for each of those little bridges in addition to each place you store the data (and it goes without saying, you can only do this for things you can predict or hook up in the first place).

The multiple pillars model (formerly known as observability 1.0) relies on you believing that each signal type must be stored separately and treated differently. That’s what the pillars language is there to reinforce. Is it a Pillar or not?? It doesn’t matter because pillars don’t exist. Just know that if your vendor is calling it a Pillar, you are definitely going to have to Pay for it. 😉

Zooming in and out

But all this data is just.. data. There is no good reason to silo signals off from each other, and lots of good reasons not to. You can derive metrics from rich, structured data blobs, or append your metrics to wide, structured log events. You can add span IDs and visualize them as a trace. The unified storage model (“o11y 2.0”) says you should store your data once, and do all the signal processing in the collection or analysis stages. Like civilized folks.
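
A toy illustration of that “store once, derive signals at analysis time” idea (invented data, deliberately crude aggregations, not any vendor’s actual query engine):

```python
# Wide events stored once; "metrics" are computed at query time,
# sliceable by any dimension. All data here is invented.
events = [
    {"endpoint": "/checkout", "customer_id": "a", "duration_ms": 120, "status": 200},
    {"endpoint": "/checkout", "customer_id": "b", "duration_ms": 900, "status": 500},
    {"endpoint": "/home", "customer_id": "a", "duration_ms": 40, "status": 200},
]

def error_rate(events, endpoint):
    # A counter-style metric, derived on the fly from raw events.
    hits = [e for e in events if e["endpoint"] == endpoint]
    errors = [e for e in hits if e["status"] >= 500]
    return len(errors) / len(hits) if hits else 0.0

def p_latency(events, **dims):
    # A latency metric, sliced by ANY dimension present on the events.
    vals = sorted(e["duration_ms"] for e in events
                  if all(e.get(k) == v for k, v in dims.items()))
    return vals[len(vals) // 2] if vals else None  # crude median

print(error_rate(events, "/checkout"))        # 0.5
print(p_latency(events, customer_id="a"))     # 120
```

Same events, different “signal”: no second database, no cost multiplier, and any field you dumped in is a dimension you can slice by later.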

All along, Anya was right

From the perspective of the developer, not much changes. It just gets easier (a LOT easier), because nobody is harping on you about whether this nit of data should be a metric, a log, a trace, or all of the above, or if it’s low cardinality or high cardinality, or whether the cardinality of the data COULD someday blow up, or whether it’s a counter, a gauge, a heatmap, or some other type of metric, or when the counter is going to get reset, or whether your heatmap buckets are defined at useful intervals, or…or…

Instead, it’s just a blob of json. Structured data.. If you think it might be interesting to you someday, you dump it in, and if not, you don’t. That’s all. Cognitive load drops way down..

On the backend side, we store it once, retaining all the signal type information and connective tissue.

It’s the user interface where things change most dramatically. No more bunny hopping around from pillar to pillar, guessing and copy-pasting IDs and crossing your fingers. Instead, it works more like the zoom function on PDFs or Google maps.

You start with SLOs, maybe, or a familiar-looking metrics dashboard. But instead of hopping, you just.. zoom in. The SLOs and metrics are derived from the data you need to debug with, so you’re just like.. “Ah what’s my SLO violation about? Oh, it’s because of these events.” Want to trace one of them? Just click on it. No hopping, no guessing, no pasting IDs around, no lining up time stamps.

Zoom in, zoom out, it’s all connected. Same fucking data.

“But OpenTelemetry FORCES you to use three pillars”

There’s a misconception out there that OpenTelemetry is very pro-three pillars, and very anti o11y 2.0. This is a) not true and b) actually the opposite. Austin Parker has written a voluminous amount of material explaining that actually, under the hood, OTel treats everything like one big wide structured event log.

As Austin puts it, “OpenTelemetry, fundamentally, unifies telemetry signals through shared, distributed context.” However:

“The project doesn’t require you to do this. Each signal is usable more or less independently of the other. If you want to use OpenTelemetry data to feed a traditional ‘three pillars’ system where your data is stored in different places, with different query semantics, you can. Heck, quite a few very successful observability tools let you do that today!”

“This isn’t just ‘three pillars but with some standards on top,’ it’s a radical departure from the traditional ‘log everything and let god sort it out’ approach that’s driven observability practices over the past couple of decades.”

You can use OTel to reinforce a three pillars mindset, but you don’t have to. Most vendors have chosen to implement three pillarsy crap on top of it, which you can’t really hold OTel responsible for. One[1] might even argue that OTel is doing as much as it can to influence you in the opposite direction, while still meeting Pillaristas where they’re at.

A postscript on profiling

What will profiling mean in a unified storage world? It just means you’ll be able to zoom in to even finer and lower-level resolution, down to syscalls and kernel operations instead of function calls. Like when Google Maps got good enough that you could read license plates instead of just rooftops.

Admittedly, we don’t have profiling yet at Honeycomb. When we did some research into the profiling space, what we learned was that most of the people who think they’re in desperate need of a profiling tool are actually in need of a good tracing tool. Either they didn’t have distributed tracing or their tracing tools just weren’t cutting it, for reasons that are not germane in a Honeycomb tracing world.

We’ll get to profiling, hopefully in the near-ish future, but for the most part, if you don’t need syscall level data, you probably don’t need profiling data either. Just good traces.

Also… I did not make this site or have any say whatsoever in the building of it, but I did sign the manifesto[2] and every day that I remember it exists is a day I delight in the joy and fullness of being alive: kill3pill.com 📈


Hop hop, little friends,
~charity

 

[1] Austin argues this. I’m talking about Austin, if not clear enough.
[2] Thank you, John Gallagher!!

How many pillars of observability can you fit on the head of a pin?

Got opinions on observability? I could use your help (once more, with feeling)

Last month I dropped a desperate little plea for help in this space, asking people to email me any good advice and/or strong opinions they happened to have on the topic of buying software.

I wasn’t really sure what to expect — desperate times, desperate measures — but holy crap, you guys delivered. To the many people who took the time to write up your experiences and expertise for me, and suffer through rounds of questions and drafts: ✨thank you✨. And thank you, too, to those of you who forwarded my queries along to experts in your network and asked for help on my behalf.

I learned a LOT about buying software and managing vendor relationships in the process of writing this. Honestly, this chapter is shaping up to be one of the things I’m most excited about for the second edition of the book.

Why I’m excited about the software buying chapter (& you should be too)

I’m imagining you reading this with a skeptical expression and an arched eyebrow. “Really, Charity…‘how to buy software’ doesn’t exactly suggest peak engineering prowess.”

Au contraire, my friends. I’ve come to believe that vendor engineering is one of the subtlest and most powerful practical applications of deep subject matter expertise, and some of the highest leverage work an engineer can do. How often do you get to make decisions that leverage the labor of hundreds or thousands of engineers per year, for fractions of pennies on the dollar? How many of the decisions you make will have an impact on every single engineer you work with and their ability to do their jobs well, as well as the experience of every single customer?

If you think I’m hyperventilating a bit, nah; this is entry level shit. In the book, I tell the story of the best engineer I ever worked with, and how I watched him alter the trajectory of multiple other companies, none of which he was working for, buying from, or formally connected to in any way — in the space of a few conversations. It upended my entire worldview about what it can look like for an engineer to wield great power.

Doing this stuff well takes both technical depth and technical breadth, in addition to systems thinking and knowledge of the business. It is one of the only ways a staff+ engineer can acquire and develop executive-level communication, strategy, and execution skills while remaining an individual contributor.

I’ve been wanting to write about this for YEARS. Anyway — ergh! — I’m rambling now. That was not what I came here to talk about, I’m just excited. Back to the point.

My second (and final) round of questions

I got so much out of your thoughtful responses that I thought I’d press my luck and put a few more questions out to the universe, before it’s too late.

These questions speak to areas where I worry that my writing may be a little weak or uninformed, or too far away from the world where people are using the “three pillars” model (aka multiple pillars or o11y 1.0) and happy about it. I don’t know many (any??) of those people, which suggests some pretty heavy selection bias.

I don’t expect anyone to answer all the questions; if one or two resonate with you, write about those and ignore the rest. If there’s something I didn’t ask that I should have asked, answer that. Something I’ve written in the past that bugged you that you hope I won’t say again? Tell me! We are almost out of time ⌛ so gimme what you got. 🙌

On migrations:

📈 Have you ever migrated from one observability vendor to another? If so, what did you learn? What was the hardest part, what took you by surprise? What do you wish you could go back in time and tell yourself at the start?

📈 If you ran (or were involved in) a large scale migration or tool change… how did you structure the process? Like, was it team by team, service by service, product by product? Did you have a playbook? What did you do to make it fun or push through organizational inertia? How long did it take?

On managing costs for the traditional three pillars:

📈 For orgs that are using Datadog, Grafana, Chronosphere, or another traditional three pillars architecture.. How would you describe your approach to cutting and controlling costs? Pro tips and/or comprehensive strategy.

📈 Alternately, if there are particular blog posts with advice you have followed and can personally vouch for, would you send me a link?

📈 How do you guide your software engineers on which data to send to which place — metrics, logs, traces, errors/exceptions, profiling, etc? How do you manage cardinality? How do you work to keep the pillars in sync, or are there any particular tips and tricks you have for linking / jumping between the data sources?

📈 How many ongoing engineering cycles does it take to manage and maintain costs, once you’ve gotten them to a sustainable place?

On managing costs at massive scale:

(Especially for people who work at a large enterprise, the kind with multiple business units, but others welcome too!):

  • Do you use tiers of service for managing costs? How do you define those?
  • How do new tools get taken for a spin? (Like, sometimes there is an office of the CTO with carte blanche to try new things and evaluate them for the rest of the org)
  • How do you use telemetry pipelines?

Observability teams (quick poll):

📈 If you have an observability team, how big is it? What part of the org does it report up into? Roughly how many engineers does that team support?

📈 If you don’t have an observability team — and you have more than, say, 300 engineers — who owns observability? Platform? SRE? Other?

A grab bag:

📈 Build vs Buy: If you built your own observability tool(s)…. What were the reasons? What does it do? Would you make the same decision today?

📈 OpenTelemetry: If your team has weighed the pros and cons of adopting OTel and ultimately decided not to, for technical or philosophical reasons (i.e. not just “we’re too busy”) — what are those reasons?

📈 Instrumentation: what do you do to try and remove cognitive overhead for engineers? How much have you been able to make automatic and magical, and where has the magic failed?

📈 Consolidation: I would love to hear any thoughts on tool consolidation vs tool proliferation. Is this primarily driven by execs, or do technical users care too? Is it driven by cost concerns, usability, or something else?

edited on 2025-10-15 to add… oh crap, one last question:

📈 Open source: Are you using open source observability tools, and if so, are these your primary tools or one piece of a comprehensive tooling strategy? If the latter, could you describe that strategy for me?

Send it to me in an email

Please send me your opinions or answers in an email, to my first name at honeycomb dot io, with the subject line “Observability questions”.

If I end up cribbing from your material, is it okay for me to print your name? (As in, “thanks to the people who informed my thinking on this subject, abc xyz etc”). I will not mention your employer or where you work, don’t worry.

If you send it to me more than a week from now, I probably won’t be able to use it. Augh, I wish I had thought of this in JUNE!!! #ragrets

✨THANK YOU✨

I know this is an incredibly time consuming thing to ask of someone, and I can’t express how much I appreciate your help.

P.S. Yes, the title is absolutely a reference to the Buffy musical. Hey, I had to give you guys something fun to read along with my second bleg in less than a month (do people still say “bleg”??).

6 Musical Episodes of TV Shows That Deserve an Encore

P.P.S. Grammar quiz of the day: should my title read “opinions ABOUT observability” or “opinions ON observability” ??

GREAT QUESTION — and, as it turns out, the preposition you choose may reveal more than you realized.

“About” is used to introduce a topic or subject in a broad, vague, or approximate sense, while “on” is used to signal more detailed, specific, formal or serious subject matter (as well as physical objects). “Let’s talk about dinner” vs “she delivered a lecture on why AI is trying to kill babies.”

Or as Xander says, “To read makes our English speaking good.”

The earth is doomed,
~charity


Are you an experienced software buyer? I could use some help.

If it seems like I’ve been relatively quiet lately on social media and my blog, that’s because I have. Liz, Austin, George and I have been busy toiling away on the second edition of “Observability Engineering” ever since April or May. I personally have been trying to spend 75-80% of my time on the book since May.

Have I been successful in that attempt? No. But I’m trying. Progress is being made. Hopefully just a few more weeks of drafting and we’ll be on to edits, and on to your grubby little paws by May-ish.

The world has changed A LOT since we wrote the first edition, in 2019-2022. Do you know, the phrase “observability engineering teams” doesn’t even occur in the first edition of the book? Try and search — it can’t be found! Even the phrase “observability teams” doesn’t pop up til near the end, and when it does, we are referring to those few teams that choose to build their own observability tools from scratch.

These days, observability engineering teams are everywhere. Which is why we are adding a whole new section, a sizable one, called “Observability Governance.” The governance section will have a bunch of chapters on topics like how to staff these teams, where they should fit in the org chart, how to buy good tools, how to integrate them, how to manage costs, how to make the business case up the chain to senior execs, how to manage schemas and semantic conventions at scale, and much much more.

The problem

The problem is, I’ve never really bought software. Not like this. I’ve never even worked at a truly large, software-buying enterprise tech company. So I am not well equipped to give good advice on questions like:

  • How do you shop around for options?
  • What are some signs you may need to suck it up and change vendors?
  • What does a good POC (proof of concept) look like?
  • Who are your stakeholders? What are their concerns?
  • How do you drive consensus when millions of dollars (and the work experience of thousands of engineers) are on the line? What does ‘consensus’ even mean in that context?
  • What are the primary considerations you should take into account when making a decision? What are secondary considerations?

I’m looking for the kind of advice that a principal engineer who has done this many times might give a staff engineer who is doing it for the first time. Or that a VP who’s done this many times might give a director who is doing it for the first time.

Can you help?

This is me wearing Leia buns and projecting a unicorn-shaped rainbow bat signal out into the sky for help. Do you have any advice for me? What guidance would you give to the readers of the second edition of this book?

Please send your advice to me in an email, addressed to my first name at honeycomb dot io, with the subject line: “Buying Software”. Include any relevant context about how large the company or engineering org is, and what your role in purchasing was.

I may respond with more questions, or reply and ask if you are able to talk synchronously. But I will not quote anything you send me without first asking your permission and getting a signed release. I will not mention ANY vendors by name, good or bad.

I am not fishing for honeycomb customers or buyers, I assume most of you haven’t tried honeycomb and don’t care about it and that is fine. This is not a Honeycomb project, this is an O’Reilly writing project. I just want to gather up some good advice on buying software and funnel it back out to good engineers.

Can you help? Your industry needs you! <3

 

 


How We Migrated the Parse API From Ruby to Golang (Resurrected)

I wrote a lot of blog posts over my time at Parse, but they all evaporated after Facebook killed the product. Most of them I didn’t care about (there were, ahem, a lot of “service reliability updates”), but I was mad about losing one specific piece, a deceptively casual retrospective of the grueling, murderous two-year rewrite of our entire API from Ruby on Rails to Golang..

I could have sworn I’d looked for it before, but someone asked me a question about migrations this morning, which spurred me to pull up the Wayback Machine again and dig in harder, and … ✨I FOUND IT!!✨

Honestly, it is entirely possible that if we had not done this rewrite, there might be no Honeycomb. In the early days of the rewrite, we would ship something in Go and the world would break, over and over and over. As I said,

Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.

Rails would accept any old trash, Go would not. Breakage ensues. Tests couldn’t catch what we didn’t know to look for. Eventually we lit upon a workflow where we would split incoming production traffic, run each request against a Go API server and a Ruby API server, each backed by its own set of MongoDB replicas, and diff the responses. This is when we first got turned on to how incredibly powerful Scuba was, in its ability to compare individual responses, field by field, line by line.

Once you’ve used a tool like that, you’re hooked.. you can’t possibly go back to metrics and aggregates. The rest, as they say, is history.

The whole thing is still pretty fun to read, even if I can still smell the blood and viscera a decade later. Enjoy.


“How We Moved Our API From Ruby to Go and Saved Our Sanity”

Originally posted on blog.parse.com on June 10th, 2015.

The first lines of Parse code were written nearly four years ago. In 2011 Parse was a crazy little idea to solve the problem of building mobile apps.

Those first few lines were written in Ruby on Rails.


Ruby on Rails

Ruby let us get the first versions of Parse out the door quickly. It let a small team of engineers iterate on it and add functionality very fast. There was a deep bench of library support, gems, deploy tooling, and best practices available, so we didn’t have to reinvent very many wheels.

We used Unicorn as our HTTP server, Capistrano to deploy code, RVM to manage the environment, and a zillion open source gems to handle things like YAML parsing, oauth, JSON parsing, MongoDB, and MySQL. We also used Chef (which is Ruby-based) to manage our infrastructure, so everything played together nicely. For a while.

The first signs of trouble bubbled up in the deploy process. As our code base grew, it took longer and longer to deploy, and the “graceful” unicorn restarts really weren’t very graceful. So, we monkeypatched rolling deploy groups into Capistrano.

“Monkeypatch” quickly became a key technical term that we learned to associate with our Ruby codebase.

A year and a half in, at the end of 2012, we had 200 API servers running on m1.xlarge instance types with 24 unicorn workers per instance. This was to serve 3000 requests per second for 60,000 mobile apps. It took 20 minutes to do a full deploy or rollback, and we had to do a bunch of complicated load balancer shuffling and pre-warming to prevent the API from being impacted during a deploy.

Then, Parse really started to take off and experience hockey-stick growth.


Problems

When our API traffic and number of apps started growing faster, we started having to rapidly spin up more database machines to handle the new request traffic. That is when the “one process per request” part of the Rails model started to fall apart.

With a typical Ruby on Rails setup, you have a fixed pool of worker processes, and each worker can handle only one request at a time. So any time you have a type of request that is particularly slow, your worker pool can rapidly fill up with that type of request. This happens too fast for things like auto-scaling groups to react. It’s also wasteful because the vast majority of these workers are just waiting on another service. In the beginning, this happened pretty rarely and we could manage the problem by paging a human and doing whatever was necessary to keep the API up. But as we started growing faster and adding more databases and workers, we added more points of failure and more ways for performance to get degraded.

We started looking ahead to when Parse would 10x its size, and realized that the one-process-per-request model just wouldn’t scale. We had to move to an async model that was fundamentally different from the Rails way. Yeah, rewrites are hard, and yeah they always take longer than anyone ever anticipates, but we just didn’t see how we could make the Rails codebase scale while it was tied to one process per request.


What next?

We knew we needed asynchronous operations. We considered a bunch of options:

EventMachine

We already had some of our push notification service using EventMachine, but our experience with it was not great as it scaled. We had constant trouble with accidentally introducing synchronous behavior or parallelism bugs. The vast majority of Ruby gems are not asynchronous, and many are not threadsafe, so it was often hard to find a library that did some common task asynchronously.

JRuby

This might seem like the obvious solution – after all, Java has threads and can handle massive concurrency. Plus it’s Ruby already, right? This is the solution Twitter investigated before settling on Scala. But since JRuby is still basically Ruby, it still has the problem of asynchronous library support. We were concerned about needing a second rewrite later, from JRuby to Java. And literally nobody at all on our backend or ops teams wanted to deal with deploying and tuning the JVM. The groans were audible from outer space.

C++

We had a lot of experienced C++ developers on our team. We also already had some C++ in our stack, in our Cloud Code servers that ran embedded V8. However, C++ didn’t seem like a great choice. Our C++ code was harder to debug and maintain. It seemed clear that C++ development was generally less productive than more modern alternatives. It was missing a lot of library support for things we knew were important to us, like HTTP request handling. Asynchronous operation was possible but often awkward. And nobody really wanted to write a lot of C++ code.

C#

C# was a strong contender. It arguably had the best concurrency model with Async and Await. The real problem was that C# development on Linux always felt like a second-class citizen. Libraries that interoperate with common open source tools are often unavailable in C#, and our toolchain would have to change a lot.

Go

Go and C# both have asynchronous operation built into the language at a low level, making it easy for large groups of people to write asynchronous code. The MongoDB Go driver is probably the best MongoDB driver in existence, and complex interaction with MongoDB is core to Parse. Goroutines were much more lightweight than threads. And frankly we were most excited about writing Go code. We thought it would be a lot easier to recruit great engineers to write Go code than any of the other solid async languages.

In the end, the choice boiled down to C# vs Go, and we chose Go.


Wherein we rewrite the world

We started out rewriting our EventMachine push backend from Ruby to Go. We did some preliminary benchmarking with Go concurrency and found that each network connection ate up only 4kb of RAM. After rewriting the EventMachine push backend to Go we went from 250k connections per node to 1.5 million connections per node without even touching things like kernel tuning. Plus it seemed really fun. So, Go it was.

We rewrote some other minor services and started building new services in Go. The main challenge, though, was to rewrite the core API server that handles requests to api.parse.com while seamlessly maintaining backward compatibility. We rewrote it endpoint by endpoint, using a live shadowing system to avoid impacting production, and monitored the differential metrics to make sure the behaviors matched.

During this time, Parse 10x’d the number of apps on our backend and more than 10x’d our request traffic. We also 10x’d the number of storage systems backed by Ruby. We were chasing a rapidly moving target.

The hardest part of the rewrite was dealing with all the undocumented behaviors and magical mystery bits that you get with Rails middleware. Parse exposes a REST API, and Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.

So we had to port a lot of delightful behavior from the Ruby API to the Go API, to make sure we kept handling the weird requests that Rails handled. Stuff like doubly encoded URLs, weird content-length requirements, bodies in HTTP requests that shouldn’t have bodies, horrible oauth misuse, horrible mis-encoded Unicode.

Our Go code is now peppered with fun, cranky comments like these:

// Note: an unset cache version is treated by ruby as "".
// Because of this, dirtying this isn't as simple as deleting it – we need to
// actually set a new value.

// This byte sequence is what ruby expects.
// yes that's a paren after the second 180, per ruby.

// Inserting and having an op is kinda weird: We already know
// state zero. But ruby supports it, so go does too.

// single geo query, don't do anything. stupid and does not make sense
// but ruby does it. Changing this will break a lot of client tests.
// just be nice and fix it here.

// Ruby sets various defaults directly in the structure and expects them to appear in cache.
// For consistency, we'll do the same thing.

Results

Was the rewrite worth it? Hell yes it was. Our reliability improved by an order of magnitude. More importantly, our API is not getting more and more fragile as we spin up more databases and backing services. Our codebase got cleaned up and we got rid of a ton of magical gems and implicit assumptions. Co-tenancy issues improved for customers across the board. Our ops team stopped getting massively burned out from getting paged and trying to track down and manually remediate Ruby API outages multiple times a week. And needless to say, our customers were happier too.

We now almost never have reliability-impacting events that can be tracked back to the API layer – a massive shift from a year ago. Now when we have timeouts or errors, it’s usually constrained to a single app – because one app is issuing a very inefficient query that causes timeouts or full table scans for their app, or it’s a database-related co-tenancy problem that we can resolve by automatically rebalancing or filtering bad actors.

An asynchronous model had many other benefits. We were also able to instrument everything the API was doing with counters and metrics, because these were no longer blocking operations that interfered with communicating to other services. We could downsize our provisioned API server pool by about 90%. And we were also able to remove silos of isolated Rails API servers from our stack, drastically simplifying our architecture.

As if that weren’t enough, the time it takes to run our full integration test suite dropped from 25 minutes to 2 minutes, and the time to do a full API server deploy with rolling restarts dropped from 30 minutes to 3 minutes. The Go API server restarts gracefully, so no load balancer juggling and prewarming is necessary.

We love Go. We’ve found it really fast to deploy, really easy to instrument, really lightweight and inexpensive in terms of resources. It’s taken a while to get here, but the journey was more than worth it.

Credits/Blames

Credits/Blames go to Shyam Jayaraman for driving the initial decision to use Go, Ittai Golde for shepherding the bulk of the API server rewrite from start to finish, Naitik Shah for writing and open sourcing a ton of libraries and infrastructure underpinning our Go code base, and the rest of the amazing Parse backend SWE team who performed the rewrite.


Thoughts on Motivation and My 40-Year Career

I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption.

I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. After ten years, you’d think this would be getting easier, not harder.

There’s something about putting out such memoiristic material that feels uncomfortably feminine to me. (Wow, ok.) I want to be known for my work, not for having a dramatic personal life. I love my family and don’t want to put them on display for the world to judge. And I never want the people I care about to feel like I am mining their experiences for clicks and content, whether that’s my family or my coworkers.

Many of the writing exercises I’ve been doing lately have ended up pulling on threads from my backstory, and the reason I haven’t published them is because I find myself thinking, “this won’t make any sense to people unless they know where I’m coming from.”

So hey, fuck it, let’s do this.

I went to college at the luckiest time

I left home when I was 15 years old. I left like a bottle rocket taking off – messy, explosive, a trail of destruction in my wake, and with absolutely zero targeting mechanisms.

It tells you a lot about how sheltered I was that the only place I could think of to go was university. I had never watched TV or been to a sports game or listened to popular music. I had never been to a doctor, I was quite unvaccinated.

I grew up in the backwoods of Idaho, the oldest of six, all of us homeschooled. I would go for weeks without seeing anyone other than my family. The only way to pass the time was by reading books or playing piano, so I did quite a lot of both. I called up the University of Idaho, asked for an admissions packet, hand wrote myself a transcript and gave myself all As, drove up and auditioned for the music department, and was offered a partial ride scholarship for classical piano performance.

I told my parents I was leaving, with or without their blessing or financial support. I left with neither.

My timing turned out to be flawless. I arrived on the cusp of the Internet age – they were wiring dorms for ethernet the year I enrolled. Maybe even more important, I arrived in the final, fading glory years of affordable state universities.

I worked multiple minimum wage jobs to put myself through school: day care, front desk, laundry, night audit. It was grueling, round-the-clock labor, but it was possible, if you were stubborn enough. I didn’t have a Social Security number (long story), I wasn’t old enough to take out loans, I couldn’t get financial aid because my parents didn’t file income taxes (again, long story). There was no help coming, I sank or I swam.

I found computers and the Internet around the same time as it dawned on me that everybody who studied music seemed to end up poor as an adult. I grew up too poor to buy canned vegetables or new underwear; we were like an 1800s family, growing our food, making our clothes, hand-me-downs til they fell apart.

Fuck being poor. Fuck it so hard. I was out.

I lost my music scholarship, but I started building websites and running systems for the university, then for local businesses. I dropped out and took a job in San Francisco. I went back, abortively; I dropped out again.

By the time I was 20 I was back in SF for good, making a salary five times what my father had made.

I grew up with a very coherent belief system that did not work for me

A lot of young people who flee their fundamentalist upbringing do so because they were abused and/or lost their faith, usually due to the hypocrisy of their leaders. Not me. I left home still believing the whole package – that evolution was a fraud, that the earth was created in seven days, that woman was created from Adam’s rib to be a submissive helpmate for their husband, that birth control was a sin, that anyone who believed differently was going to hell.

My parents loved us deeply and unshakably, and they were not hypocrites. In the places I grew up, the people who believed in God and went to church and lived a certain way were the ones who had their shit together, and the people who believed differently had broken lives. Reality seemed to confirm the truth of all we were taught, no matter how outlandish it sounds.

So I fully believed it was all true. I also knew it did not work for me. I did not want a small life. I did not want to be the support system behind some godly dude. I wanted power, money, status, fame, autonomy, success. I wanted to leave a crater in the world.

I was not a rebellious child, believe it or not. I loved my parents and wanted to make them proud. But as I entered my teens, I became severely depressed, and turned inward and hurt myself in all the ways young people do.

I left because staying there was killing me, and ultimately, I think my parents let me go because they saw it too.

Running away from things worked until it didn’t

I didn’t know what I wanted out of life other than all of it, right now; and my first decade out on my own was a hoot. It was in my mid-twenties that everything started to fall apart.

I was an earnest kid who liked to study and think about the meaning of life, but when I bolted, I slammed the door to my conscience shut. I knew I was going to hell, but since I couldn’t live the other way, I made the very practical determination, based on actuarial tables, that I could go my own way for a few decades, then repent and clean up my shit before I died. (Judgment Day was one variable that gave me heartburn, since it could come at any time.)

I was not living in accordance with my personal values and ethics, to put it lightly. I compartmentalized; it didn’t bother me, until it did. It started leaking into my dreams every night, and then it took over my waking life. I was hanging on by a thread; something had to give.

My way out, unexpectedly, started with politics. I started mainlining books about politics and economics during the Iraq War, which then expanded to history, biology, philosophy, other religious traditions, and everything else. (You can still find a remnant of my reading list here.)

When I was 13, I had an ecstatic religious experience; I was sitting in church, stewing over going to hell, and was suddenly filled with a glowing sense of warmth and acceptance. It lasted for nearly two weeks, and that’s how I knew I was “saved”.

In my late 20s, after a few years of intense study and research, I had a similar ecstatic experience walking up the stairs from the laundry room. I paused, I thought “maybe there is no God; maybe there is nobody out there judging me; maybe it all makes sense”, and it all clicked into place, and I felt high for days, suffused with peace and joy.

My career didn’t really take off until after that. I always had a job, but I wasn’t thinking about tech after hours. At first I was desperately avoiding my problems and self-medicating, later I became obsessed with finding answers. What did I believe about taxation, public policy, voting systems, the gender binary, health care, the whole messy arc of American history? I was an angry, angry atheist for a while. I filled notebook after notebook with handwritten notes; if I wasn’t working, I was studying.

And then, gradually, I wound down. The intensity, the high, tapered off. I started dating, realized I was poly and queer, and slowly chilled the fuck out. And that’s when I started being able to dedicate the creative, curious parts of my brain to my job in tech.

Why am I telling you all this?

Will Larson has talked a lot about how his underlying motivation is “advancing the industry”. I love that for him. He is such a structured thinker and prolific writer, and the industry needs his help, very badly.

For a while I thought that was my motivation too. And for sure, that’s a big part of it, particularly when it comes to observability and my day job. (Y’all, it does not need to be this hard. Modern observability is the cornerstone and prerequisite for high performing engineering teams, etc etc.)

But when I think about what really gets me activated on a molecular level, it’s a little bit different. It’s about living a meaningful life, and acting with integrity, and building things of enduring value instead of tearing them down.

When I say it that way, it sounds like sitting around on the mountain meditating on the meaning of life, and that is not remotely what I mean. Let me try again.

For me, work has been a source of liberation

It’s very uncool these days to love your job or talk about hard work. But work has always been a source of liberation for me. My work has brought me so much growth and development and community and friendship. It brings meaning to my life, and the joy of creation. I want this for myself. I want this for anyone else who wants it too.

I understand why this particular tide has turned. So many people have had jobs where their employers demanded total commitment, but felt no responsibility to treat them well or fairly in return. So many people have never experienced work as anything but a depersonalizing grind, or an exercise in exploitation, and that is heartbreaking.

I don’t think there’s anything morally superior about people who want their work to be a vehicle for personal growth instead of just a paycheck. I don’t think there’s anything wrong with just wanting a paycheck, or wanting to work the bare minimum to get by. But it’s not what I want for myself, and I don’t think I’m alone in this.

I feel intense satisfaction and a sense of achievement when I look back on my career. On a practical level, I’ve been able to put family members through college, help with down payments, and support artists in my community. All of this would have been virtually unimaginable to me growing up.

I worked a lot harder on the farm than I ever have in front of a keyboard, and got a hell of a lot less for my efforts.

(People who glamorize things like farming, gardening, canning and freezing, taking care of animals, cooking and caretaking, and other forms of manual labor really get under my skin. All of these things make for lovely hobbies, but subsistence labor is neither fun nor meaningful. Trust me on this one.)

My engineer/manager pendulum days

I loved working as an engineer. I loved how fast the industry changes, and how hard you have to scramble to keep up. I loved the steady supply of problems to fix, systems to design, and endless novel catastrophes to debug. The whole Silicon Valley startup ecosystem felt like it could not have been more perfectly engineered to supply steady drips of dopamine to my brain.

I liked working as an engineering manager. Eh, that might be an overstatement. But I have strong opinions and I like being in charge, and I really wanted more access to information and influence over decisions, so I pushed my way into the role more than once.

If Honeycomb hadn’t happened, I am sure I would have bounced back and forth between engineer and manager for the rest of my career. I never dreamed about climbing the ladder or starting a company. My attitude towards middle management could best be described as amiable contempt, and my interest in the business side of things was nonexistent.

I have always despised people who think they’re too good to work for other people, and that describes far too many of the founders I’ve met.

Operating a company draws on a different kind of meaning

I got the chance to start a company in 2016, so I took it, almost on a whim. Since then I have done so many things I never expected to do. I’ve been a founder, CEO, CTO, I’ve raised money, hired and fired other execs, run organizations, crafted strategy, and come to better understand and respect the critical role played by sales, marketing, HR, and other departments. No one is more astonished than I am to find me still here, still doing this.

But there is joy to be found in solving systems problems, even the ones that are less purely technical. There is joy to be found in building a company, or competing in a marketplace.

To be honest, this is not a joy that came to me swiftly or easily. I’ve been doing this for the past 9.5 years, and I’ve been happy doing it for maybe the past 2-3 years. But it has always felt like work worth doing. And ultimately, I think I’m less interested in my own happiness (whatever that means) than I am interested in doing work that feels worth doing.

Work is one of the last remaining places where we are motivated to learn from people we don’t agree with and find common pursuit with people we are ideologically opposed to. I think that’s meaningful. I think it’s worth doing.

Reality doesn’t give a shit about ideology

I am a natural born extremist. But when you’re trying to operate a business and win in the marketplace, ideological certainty crashes hard into the rocks of reality. I actually find this deeply motivating.

I spent years hammering out my own personal ontological beliefs about what is right and just, what makes a life worth living, what responsibilities we have to one another. I didn’t really draw on those beliefs very often as an engineer/manager, at least not consciously. That all changed dramatically after starting a company.

It’s one thing to stand off to the side and critique the way a company is structured and the decisions leaders make about compensation, structure, hiring/firing, etc. But creation is harder than critique (one of my favorite Jeff Gray quotes) — so, so, so much harder. And reality resists easy answers.

Being an adult, to me, has meant making peace with a multiplicity of narratives. The world I was born into had a coherent story and a set of ideals that worked really well for a lot of people, but it was killing me. Not every system works for every person, and that’s okay. That’s life. Startups aren’t for everyone, either.

The struggle is what brings your ideals to life

Almost every decision you make running a company has some ethical dimension. Yet the foremost responsibility you have to your stakeholders, from investors to employees, is to make the business succeed, to win in the marketplace. Over-rotating on ethical repercussions of every move can easily cause you to get swamped in the details and fail at your prime directive.

Sometimes you may have a strongly held belief that some mainstream business practice is awful, so you take a different path, and then you learn the hard way why it is that people don’t take that path. (This has happened to me more times than I can count. 🙈)

Ideals in a vacuum are just not that interesting. If I wrote an essay droning on and on about “leading with integrity”, no one would read it, and nor should they. That’s boring. What’s interesting is trying to win and do hard things, while honoring your ideals.

Shooting for the stars and falling short, innovating, building on the frontier of what’s possible, trying but failing, doing exciting things that exceed your hopes and dreams with a team just as ambitious and driven as you are, while also holding your ideals to heart — that’s fucking exciting. That’s what brings your ideals to life.

We have lived through the golden age of tech

I recognize that I have been profoundly lucky to be employed through the golden age of tech. It’s getting tougher out there to enter the industry, change jobs, or lead with integrity.

It’s a tough time to be alive, in general. There are macro scale political issues that I have no idea how to solve or fix. Wages used to rise in line with productivity, and now they don’t, and haven’t since the mid 70s. Capital is slurping up all the revenue and workers get an ever decreasing share, and I don’t know how to fix that, either.

But I don’t buy the argument that just because something has been touched by capitalism or finance it is therefore irreversibly tainted, or that there is no point in making capitalist institutions better. The founding arguments of capitalism were profoundly moral ones, grounded in a keen understanding of human nature. (Adam Smith’s “Wealth of Nations” gets all the attention, but his other book, “Theory of Moral Sentiments”, is even better, and you can’t read one without the other.)

As a species we are both individualistic and communal, selfish and cooperative, and the miracle of capitalism is how effectively it channels the self-interested side of our nature into the common good.

Late stage capitalism, however, along with regulatory capture, enshittification, and the rest of it, has made the modern world brutally unkind to most people. Tech was, for a shining moment in time, a path out of poverty for smart kids who were willing to work their asses off. It’s been the only reliable growth industry of my lifetime.

It remains, for my money, the best job in the world. Or it can be. It’s collaborative, creative, and fun; we get paid scads of money to sit in front of a computer and solve puzzles all day. So many people seem to be giving up on the idea that work can ever be a place of meaning and collaboration and joy. I think that sucks. It’s too soon to give up! If we prematurely abandon tech to its most exploitative elements, we guarantee its fate.

If you want to change the world, go into business

Once upon a time, if you had strongly held ideals and wanted to change the world, you went into government or nonprofit work.

For better or for worse (okay, mostly worse), we live in an age where corporate power dominates. If you want to change the world, go into business.

The world needs, desperately, people with ethics and ideals who can win at business. We can’t let all the people who care about people go into academia or medicine or low wage service jobs. We can’t leave the ranks of middle and upper management to be filled by sycophants and sociopaths.

There’s nothing sinister about wanting power; what matters is what you do with it. Power, like capitalism, is a tool, and can be bent to powerful ends both good and evil. If you care about people, you should be unashamed about wanting to amass power and climb the ladder.

There are a lot of so-called best practices in this industry that are utterly ineffective (cough, whiteboarding B-trees in an interview setting), yet they got cargo culted and copied around for years. Why? Because the company that originated the practice made a lot of money. This is stupid, but it also presents an opportunity. All you need to do is be a better company, then make a lot of money. 😉

People need institutions

I am a fundamentalist at heart, just like my father. I was born to be a bomb thrower and a contrarian, a thorn in the side of the smug moderate establishment. Unfortunately, I was born in an era where literally everyone is a fucking fundamentalist and the establishment is holding on by a thread.

I’ve come to believe that the most quietly radical, rebellious thing I can possibly do is to be an institutionalist, someone who builds instead of performatively tearing it all down.

People need institutions. We crave the feeling of belonging to something much larger than ourselves. It’s one of the most universal experiences of our species.

One of the reasons modern life feels so fragmented and hard is because so many of our institutions have broken down or betrayed the people they were supposed to serve. So many of the associations that used to frame our lives and identities — church, government, military, etc — have tolerated or covered up so much predatory behavior and corruption, it no longer surprises anyone.

We’ve spent the past few decades ripping down institutions and drifting away from them. But we haven’t stopped wanting them, or needing them.

I hope, perhaps naively, that we are entering into a new era of rebuilding, sadder but wiser. An era of building institutions with accountability and integrity, institutions with enduring value, that we can belong to and take pride in… not because we were coerced or deceived, not because they were the only option, but because they bring us joy and meaning. Because we freely choose them, because they are good for us.

The second half of your career is about purpose

It seems very normal to enter the second half of a 40-year career thinking a lot about meaning and purpose. You spend the first decade or so hoovering up skill sets, the second finding your place and what feeds you, and then, inevitably, you start to think about what it all means and what your legacy will be.

That’s definitely where I’m at, as I think about the second half of my career. I want to take risks. I want to play big and win bigger. I want to show that hard work isn’t just a scam inflicted on those who don’t know any better. If we win, I want the people I work with to earn lifechanging amounts of money, so they can buy homes and send their kids to college. I want to show that work can still be an avenue for liberation and community and personal growth, for those of us who still want that.

I care about this industry and the people in it so much, because it’s been such a gift to me. I want to do what I can to make it a better place for generations to come. I want to build institutions worth belonging to.


This article was originally commissioned by Luca Rossi (paywalled) for refactoring.fm on February 11th, 2025. Luca edited a version of it that emphasized the importance of building “10x engineering teams”. It was later picked up by IEEE Spectrum (!!!), which scrapped most of the teams content and published a different, shorter piece on March 13th.

This is my personal edit. It is not exactly identical to either of the versions that have been publicly released to date. It contains a lot of the source material for the talk I gave last week at #LDX3 in London, “In Praise of ‘Normal’ Engineers” (slides), and a couple weeks ago at CraftConf. 

In Praise of “Normal” Engineers

Most of us have encountered a few engineers who seem practically magician-like, a class apart from the rest of us in their ability to reason about complex mental models, leap to non-obvious yet elegant solutions, or emit waves of high-quality code at unreal velocity.

I have run into any number of these incredible beings over the course of my career. I think this is what explains the curious durability of the “10x engineer” meme. It may be based on flimsy, shoddy research, and the claims people have made to defend it have often been risible (e.g. “10x engineers have dark backgrounds, are rarely seen doing UI work, are poor mentors and interviewers”) or have blatantly doubled down on stereotypes (“we look for young dudes in hoodies who remind us of Mark Zuckerberg”). But damn if it doesn’t resonate with experience. It just feels true.

The problem is not the idea that there are engineers who are 10x as productive as other engineers. I don’t have a problem with this statement; in fact, that much seems self-evidently true. The problems I do have are twofold.

Measuring productivity is fraught and imperfect

First: how are you measuring productivity? I have a problem with the implication that there is One True Metric of productivity that you can standardize and sort people by. Consider, for a moment, the sheer combinatorial magnitude of skills and experiences at play:

  • Are you working on microprocessors, IoT, database internals, web services, user experience, mobile apps, consulting, embedded systems, cryptography, animation, training models for gen AI… what?
  • Are you using golang, python, COBOL, lisp, perl, React, or brainfuck? What version, which libraries, which frameworks, what data models? What other software and build dependencies must you have mastered?
  • What adjacent skills, market segments, or product subject matter expertise are you drawing upon…design, security, compliance, data visualization, marketing, finance, etc?
  • What stage of development? What scale of usage? What matters most — giving good advice in a consultative capacity, prototyping rapidly to find product-market fit, or writing code that is maintainable and performant over many years of amortized maintenance? Or are you writing for the Mars Rover, or shrinkwrapped software you can never change?

Also: people and their skills and abilities are not static. At one point, I was a pretty good DBRE (I even co-wrote the book on it). Maybe I was even a 10x DB engineer then, but certainly not now. I haven’t debugged a query plan in years.

“10x engineer” makes it sound like 10x productivity is an immutable characteristic of a person. But someone who is a 10x engineer in a particular skill set is still going to have infinitely more areas where they are normal or average (or less). I know a lot of world class engineers, but I’ve never met anyone who is 10x better than everyone else across the board, in every situation.

Engineers don’t own software, teams own software

Second, and even more importantly: So what? It doesn’t matter. Individual engineers don’t own software, teams own software. The smallest unit of software ownership and delivery is the engineering team. It doesn’t matter how fast an individual engineer can write software, what matters is how fast the team can collectively write, test, review, ship, maintain, refactor, extend, architect, and revise the software that they own.

Everyone uses the same software delivery pipeline. If it takes the slowest engineer at your company five hours to ship a single line of code, it’s going to take the fastest engineer at your company five hours to ship a single line of code. The time spent writing code is typically dwarfed by the time spent on every other part of the software development lifecycle.

If you have services or software components that are owned by a single engineer, that person is a single point of failure.

I’m not saying this should never happen. It’s quite normal at startups to have individuals owning software, because the biggest existential risk that you face is not moving fast enough, not finding product market fit, and going out of business. But as you start to grow up as a company, as users start to demand more from you, and you start planning for the survival of the company to extend years into the future…ownership needs to get handed over to a team. Individual engineers get sick, go on vacation, and leave the company, and the business has got to be resilient to that.

If teams own software, then the key job of any engineering leader is to craft high-performing engineering teams. If you must 10x something, 10x this. Build 10x engineering teams.

The best engineering orgs are the ones where normal engineers can do great work

When people talk about world-class engineering orgs, they often have in mind teams that are top-heavy with staff and principal engineers, or recruiting heavily from the ranks of ex-FAANG employees or top universities.

But I would argue that a truly great engineering org is one where you don’t HAVE to be one of the “best” or most pedigreed engineers in the world to get shit done and have a lot of impact on the business.

I think it’s actually the other way around. A truly great engineering organization is one where perfectly normal, workaday software engineers, with decent software engineering skills and an ordinary amount of expertise, can consistently move fast, ship code, respond to users, understand the systems they’ve built, and move the business forward a little bit more, day by day, week by week.

Any asshole can build an org where the most experienced, brilliant engineers in the world can build product and make progress. That is not hard. And putting all the spotlight on individual ability has a way of letting your leaders off the hook for doing their jobs. It is a HUGE competitive advantage if you can build sociotechnical systems where less experienced engineers can convert their effort and energy into product and business momentum.

A truly great engineering org also happens to be one that mints world-class software engineers. But we’re getting ahead of ourselves, here.

Let’s talk about “normal” for a moment

A lot of technical people got really attached to our identities as smart kids. The software industry tends to reflect and reinforce this preoccupation at every turn, from Netflix’s “we look for the top 10% of global talent” to Amazon’s talk about “bar-raising” or Coinbase’s recent claim to “hire the top .1%”. (Seriously, guys? Ok, well, Honeycomb is going to hire only the top .00001%!)

In this essay, I would like to challenge us to set that baggage to the side and think about ourselves as normal people.

It can be humbling to think of ourselves as normal people, but most of us are in fact pretty normal people (albeit with many years of highly specialized practice and experience), and there is nothing wrong with that. Even those of us who are certified geniuses on certain criteria are likely quite normal in other ways — kinesthetic, emotional, spatial, musical, linguistic, etc.

Software engineering both selects for and develops certain types of intelligence, particularly around abstract reasoning, but nobody is born a great software engineer. Great engineers are made, not born. I just don’t think there’s a lot more we can get out of thinking of ourselves as a special class of people, compared to the value we can derive from thinking of ourselves collectively as relatively normal people who have practiced a fairly niche craft for a very long time.

Build sociotechnical systems with “normal people” in mind

When it comes to hiring talent and building teams, yes, absolutely, we should focus on identifying the ways people are exceptional and talented and strong. But when it comes to building sociotechnical systems for software delivery, we should focus on all the ways people are normal.

Normal people have cognitive biases — confirmation bias, recency bias, hindsight bias. We work hard, we care, and we do our best; but we also forget things, get impatient, and zone out. Our eyes are inexorably drawn to the color red (unless we are colorblind). We develop habits and ways of doing things, and resist changing them. When we see the same text block repeatedly, we stop reading it.

We are embodied beings who can get overwhelmed and fatigued. If an alert wakes us up at 3 am, we are much more likely to make mistakes while responding to that alert than if we tried to do the same thing at 3pm. Our emotional state can affect the quality of our work. Our relationships impact our ability to get shit done.

When your systems are designed to be used by normal engineers, all that excess brilliance they have can get poured into the product, instead of being wasted on navigating the system itself.

How do you turn normal engineers into 10x engineering teams?

None of this should be terribly surprising; it’s all well known wisdom. In order to build the kind of sociotechnical systems for software delivery that enable normal engineers to move fast, learn continuously, and deliver great results as a team, you should:

Shrink the interval between when you write the code and when the code goes live.

Make it as short as possible; the shorter the better. I’ve written and given talks about this many, many times. The shorter the interval, the lower the cognitive carrying costs. The faster you can iterate, the better. The more of your brain can go into the product instead of the process of building it.

One of the most powerful things you can do is have a deploy cycle short and fast enough that you can ship one commit per deploy. The opposite is what I’ve referred to as the “software engineering death spiral”: when the deploy cycle takes so long that you end up batching together a bunch of engineers’ diffs in every build. The slower it gets, the more you batch up, and the harder it becomes to figure out what happened or roll back. The longer it takes, the more people you need, the higher the coordination costs, and the more slowly everyone moves.

Deploy time is the feedback loop at the heart of the development process. It is almost impossible to overstate the centrality of keeping this short and tight.
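The batching effect above is easy to see with a toy back-of-the-envelope model. The formula and the numbers here are my own illustrative assumptions, not measurements from the original: if commits land at a steady rate, the average number of suspect diffs you have to sift through after a bad deploy grows linearly with the deploy interval.

```python
# Toy model of the "death spiral": longer deploy cycles batch more commits
# per deploy, so every bad deploy has more suspect diffs to bisect.
# All numbers are illustrative assumptions.

def suspects_per_bad_deploy(deploy_minutes: float, commits_per_hour: float) -> float:
    """Average number of commits batched into a single deploy."""
    return max(1.0, commits_per_hour * deploy_minutes / 60.0)

for minutes in (5, 30, 120, 480):
    n = suspects_per_bad_deploy(minutes, commits_per_hour=6)
    print(f"{minutes:>3} min deploy cycle -> ~{n:.0f} commit(s) to bisect per bad deploy")
```

At a five-minute cycle, a failed deploy points at exactly one commit; at an eight-hour cycle with the same commit rate, you are untangling dozens.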

Make it easy and fast to roll back or recover from mistakes.

Developers should be able to deploy their own code, figure out if it’s working as intended or not, and if not, roll forward or back swiftly and easily. No muss, no fuss, no thinking involved.

Make it easy to do the right thing and hard to do the wrong thing.

Wrap designers and design thinking into all the touch points your engineers have with production systems. Use your platform engineering team to think about how to empower people to swiftly make changes and self-serve, but also remember that a lot of times people will be engaging with production late at night or when they’re very stressed, tired, and possibly freaking out. Build guard rails. The fastest way to ship a single line of code should also be the easiest way to ship a single line of code.
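One concrete shape a guard rail can take: a thin deploy wrapper where the easiest command is also the safest one, and rollback is a single obvious verb. This is a minimal sketch under my own assumptions, not the author's tooling; the command names and checks are hypothetical stand-ins.

```python
# Hypothetical sketch of a guard-railed deploy wrapper. The point is the
# shape: gates run before every deploy, and rollback requires no thinking
# at 3 am. Names and checks are illustrative, not a real tool.

def checks_pass() -> bool:
    # Stand-in for real gates: CI is green, feature flags are sane, etc.
    return True

def deploy(version: str, history: list) -> str:
    """Deploy a version only if the gates pass; record it for rollback."""
    if not checks_pass():
        return "blocked: checks failed"
    history.append(version)
    return f"deployed {version}"

def rollback(history: list) -> str:
    """One obvious command that reverts to the previous known-good version."""
    if len(history) < 2:
        return "nothing to roll back to"
    bad = history.pop()
    return f"rolled back {bad} -> {history[-1]}"

history = ["v1"]
print(deploy("v2", history))   # deployed v2
print(rollback(history))       # rolled back v2 -> v1
```

The design choice worth copying is that the safe path and the fast path are the same path; nobody has to remember a second, scarier procedure under stress.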

Invest in instrumentation and observability.

You’ll never know — not really — what the code you wrote does just by reading it. The only way to be sure is by instrumenting your code and watching real users run it in production. Good, friendly sociotechnical systems invest heavily in tools for sense-making.

Being able to visualize your work is what makes engineering abstractions accessible to actual engineers. You shouldn’t have to be a world-class engineer just to debug your own damn code.
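As a sketch of what "instrument your code and watch real users run it" can look like in practice: emit one wide, structured event per unit of work, carrying high-cardinality fields you can slice by later. This is one common pattern, not the author's code; it uses only the Python standard library, and the field names are my own assumptions.

```python
# Minimal "wide event" instrumentation sketch: one structured JSON event
# per request, with duration and high-cardinality fields (like user_id)
# that make production questions answerable later. Field names are
# illustrative assumptions.
import json
import sys
import time

def handle_request(user_id: str, endpoint: str) -> None:
    """Do the work, then emit one wide structured event describing it."""
    start = time.monotonic()
    event = {"endpoint": endpoint, "user_id": user_id}
    try:
        # ... real request handling would go here ...
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
        # One JSON line per request, ready for any event-based tooling.
        print(json.dumps(event), file=sys.stdout)

handle_request("user-42", "/api/checkout")
```

A single event per request, rather than scattered log lines, is what lets you ask "which users, on which endpoints, got slow after this deploy?" without grepping.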

Devote engineering cycles to internal tooling and enablement.

If fast, safe deploys, with guard rails, instrumentation, and highly parallelized test suites are “everybody’s job”, they will end up nobody’s job. Engineering productivity isn’t something you can outsource. Managing the interfaces between your software vendors and your own teams is both a science and an art. Making it look easy and intuitive is really hard. It needs an owner.

Build an inclusive culture.

Growth is the norm, growth is the baseline. People do their best work when they feel a sense of belonging. An inclusive culture is one where everyone feels safe to ask questions, explore, and make mistakes; where everyone is held to the same high standard, and given the support and encouragement they need to achieve their goals.

Diverse teams are resilient teams.

Yeah, a team of super-senior engineers who all share a similar background can move incredibly fast, but a monoculture is fragile. Someone gets sick, someone gets pregnant, you start to grow and you need to integrate people from other backgrounds and the whole team can get derailed — fast.

When your teams are used to operating with a mix of genders, racial backgrounds, identities, age ranges, family statuses, geographical locations, skill sets, etc — when this is just table stakes, standard operating procedure — you’re better equipped to roll with it when life happens.

Assemble engineering teams from a range of levels.

The best engineering teams aren’t top-heavy with staff engineers and principal engineers. The best engineering teams are ones where nobody is running on autopilot, banging out a login page for the 300th time; everyone is working on something that challenges them. Everyone is learning, everyone is teaching, everyone is pushing their own boundaries and growing. All the time.

By the way — all of that work you put into making your systems resilient, well-designed, and humane is the same work you would need to do to help onboard new engineers, develop junior talent, or let engineers move between teams.

It gets used and reused. Over and over and over again.

The only meaningful measure of productivity is impact to the business

The only thing that actually matters when it comes to engineering productivity is whether or not you are moving the business materially forward.

Which means…we can’t do this in a vacuum. The most important question is whether or not we are working on the right thing, which is a problem engineering can’t answer without help from product, design, and the rest of the business.

Software engineering isn’t about writing lots of lines of code, it’s about solving business problems using technology.

Senior and intermediate engineers are actually the workhorses of the industry. They move the business forward, step by step, day by day. They get to put their heads down and crank instead of constantly looking around the org and solving coordination problems. If you have to be a staff+ engineer to move the product forward, something is seriously wrong.

Great engineering orgs mint world-class engineers

A great engineering org is one where you don’t HAVE to be one of the best engineers in the world to have a lot of impact. But — rather ironically — great engineering orgs mint world class engineers like nobody’s business.

The best engineering orgs are not the ones with the smartest, most experienced people in the world, they’re the ones where normal software engineers can consistently make progress, deliver value to users, and move the business forward, day after day.

Places where engineers can get shit done and have a lot of impact are a magnet for top performers. Nothing makes engineers happier than building things, solving problems, making progress.

If you’re lucky enough to have world-class engineers in your org, good for you! Your role as a leader is to leverage their brilliance for the good of your customers and your other engineers, without coming to depend on their brilliance. After all, these people don’t belong to you. They may walk out the door at any moment, and that has to be okay.

These people can be phenomenal assets, assuming they can be team players and keep their egos in check. Which is probably why so many tech companies seem to obsess over identifying and hiring them, especially in Silicon Valley.

But companies categorically overindex on finding these people after they’ve already been minted, which ends up reinforcing and replicating all the prejudices and inequities of the world at large. Talent may be evenly distributed across populations, but opportunity is not.

Don’t hire the “best” people. Hire the right people.

We (by which I mean the entire human race) place too much emphasis on individual agency and characteristics, and not enough on the systems that shape us and inform our behaviors.

I feel like a whole slew of issues (candidates self-selecting out of the interview process, diversity of applicants, etc) would be improved simply by shifting the focus on engineering hiring and interviewing away from this inordinate emphasis on hiring the BEST PEOPLE and realigning around the more reasonable and accurate RIGHT PEOPLE.

It’s a competitive advantage to build an environment where people can be hired for their unique strengths, not their lack of weaknesses; where the emphasis is on composing teams rather than hiring the BEST people; where inclusivity is a given both for ethical reasons and because it raises the bar for performance for everyone. Inclusive culture is what actual meritocracy depends on.

This is the kind of place that engineering talent (and good humans) are drawn to like a moth to a flame. It feels good to ship. It feels good to move the business forward. It feels good to sharpen your skills and improve your craft. It’s the kind of place that people go when they want to become world class engineers. And it’s the kind of place where world class engineers want to stick around, to train up the next generation.

<3, charity

 

In Praise of “Normal” Engineers

On How Long it Takes to Know if a Job is Right for You or Not

A few eagle-eyed readers have noticed that it’s been 4 weeks since my last entry in what I have been thinking of as my “niblet series” — one small piece per week, 1000 words or less, for the next three months.

This is true. However, I did leave myself some wiggle room in my original goal, when I said “weeks when I am not traveling”, knowing I was traveling 6 of the next 7 weeks. I was going to TRY to write something on the weeks I was traveling, but as you can see, I mostly did not succeed. Oh well!

Honestly, I don’t feel bad about it. I’ve written well over 1k words on bsky over the past two weeks in the neverending thread on the costs and tradeoffs of remote work. (A longform piece on the topic is coming soon.) I also wrote a couple of lengthy internal pieces.

This whole experiment was designed to help me unblock my writing process and try out new habits, and I think I’m making progress. I will share what I’m learning at a later date, but for now: onward!

How long does it take to form an impression of a new job?

This week’s niblet was inspired by a conversation I had yesterday with an internet friend. To paraphrase (and lightly anonymize) their question:

“I took a senior management role at this company six months ago. My search for this role was all about values alignment, from company mission to leadership philosophy, and the people here said all the right things in the process. But it’s just not clicking.

It’s only been six months, but it’s starting to feel like it might not work out. How much longer should I give it?”

Zero. You should give it 0 time. You already know, and you’ve known for a long time; it’s not gonna change. I’m sorry. 💔

I’m not saying you should quit tomorrow, a person needs a paycheck, but you should probably start thinking in terms of how to manage the problem and extricate yourself from it, not like you’re waiting to see if it will be a good fit.

Every job I’ve ever had has made a strong first impression

I’ve had…let’s see…about six different employers, over the course of my (post-university) career.

Every job I’ve ever taken, I knew within the first week whether it was right for me or not. That might be overstating things a bit (memory can be like that). But I definitely had a strong visceral reaction to the company within days after starting, and the rest of my tenure played out more or less congruent with that reaction.

The first week at EVERY job is a hot mess of anxiety and nerves and second-guessing yourself and those around you. It’s never warm fuzzies. But at the jobs I ended up loving and staying at long term, the anxiety was like “omg these people are so cool and so great and so fucking competent, I hope I can measure up to their expectations.”

And then there were the jobs where the anxiety I felt was more like a sinking sensation of dread, of “oooohhh god I hope this is a one-off and not the kind of thing I will encounter every day.”

🌸 There was the job where they had an incident on my very first day, and by 7 pm I was like “why isn’t someone telling me I should go home?” There was literally nothing I could do to help, I was still setting up my accounts, yet I had the distinct impression I was expected to stay.

This job turned out to be stereotypically Silicon Valley in the worst ways, hiring young, cheap engineers and glorifying coding all night and sleeping under your desks.

🌼 There was the job where they were walking me through a 50-page Microsoft Word doc on how to manage replication between DB nodes, and I laughed a little, and looked for some rueful shared acknowledgement of how shoddy this was…but I was the only one laughing.

That job turned out to be shoddy, ancient, flaky tech all the way down, with comfortable, long-tenured staff who didn’t know (and did NOT want to hear) how out of date their tech had become.

Over time, I learned to trust that intuition

Around the time I became a solidly senior engineer, I began to reflect on how indelible my early impressions of each job had been, and how reliable those impressions had turned out to be.

To be clear, I don’t regret these jobs. I got to work with some wonderful people, and I got to experience a range of different organizational structures and types. I learned a lot from every single one of my jobs.

Perhaps most of all, I learned how to sniff out particular environments that really do not work for me, and I never made the same mistake twice.

Companies can and do change dramatically. But absent dramatic action, which can be quite painful, they tend to drift along their current trajectory.

This matters even more for managers

This is one of those ways that I think the work of management is different from the work of engineering. As an experienced IC, it’s possible to phone it in and still do a good job. As long as you’re shipping at an acceptable rate, you can check out mentally and emotionally, even work for people or companies you basically despise.

Lots of people do in fact do this. Hell, I’ve done it. You aren’t likely to do the best work of your life under these circumstances, but people have done far worse to put food on the table.

An IC can wall themselves off emotionally and still do acceptable work, but I’m not sure a manager can do the same.

Alignment *is* the job of management

As a manager, you literally represent the company to your team and those around you. You don’t have to agree with every single decision the company makes, but if you find yourself constantly having to explain and justify things the company has done that deeply violate your personal beliefs or ethics, it does you harm.

Some managers respond to a shitty corporate situation by hunkering down and behaving like a shit umbrella; doing whatever they can to protect their people, at the cost of undermining the company itself. I don’t recommend this, either. It’s not healthy to know you walk around every day fucking over one of your primary stakeholders, whether it’s the company OR your teammates.

There are also companies that aren’t actually that bad, but you just aren’t aligned with them. That’s fine. Alignment matters a lot more for managers than for ICs, because alignment is the job.

Management is about crafting and tending to complex sociotechnical systems. No manager can do this alone. Having a healthy, happy team of direct reports is only a fraction of the job description. It’s not enough. You can and should expect more.

What can you learn from the experience?

I asked my friend to think back to the interview process. What were the tells? What do they wish they had known to watch out for?

They thought for a moment, then said:

“Maybe the fact that the entire leadership team had been grown or promoted from within. SOME amount of that is terrific, but ALL of it might be a yellow flag. The result seems to be that everyone else thinks and feels the same way…and I think differently.”

This is SO insightful.

It reminds me of all the conversations Emily and I have had over the years, on how to balance developing talent from within vs bringing in fresh perspectives, people who have already seen what good looks like at the next stage of growth, people who can see around corners and challenge us in different ways.

This is a tough thing to suss out from the outside, especially when the employer is saying all the right things. But having an experience like this can inoculate you against an entire family of related mistakes. My friend will pick up on this kind of insularity from miles away, from now on.

Bad jobs happen. Interviews can only predict so much. A person who has never had a job they disliked is a profoundly lucky person. In the end, sometimes all you can take is the lessons you learned and won’t repeat.

The pig is committed

Have you ever heard the metaphor of the chicken vs the pig? The chicken contributes an egg to breakfast, the pig contributes bacon. The punch line goes something like, “the chicken is involved, but the pig is committed!”

It’s vivid and a bit over the top, but I kept thinking about it while writing this piece. The engineer contributes their labor and output to move the company forward, but the manager contributes their emotional and relational selves — their humanity — to serve the cause.

You only get one career. Who are you going to give your bacon to?


On Pronouns, Policies and Mandates

Hi friends! We’re on week three of my 12-week practice in writing one bite-sized topic per week — scoping it down, writing straight through, trying real hard to avoid over-writing or editing down to a pulp.

Week 1 — “On Writing, Social Media, and Finding the Line of Embarrassment”
Week 2 — “On Dropouts and Bootstraps”

Three points in a row makes a line, and three posts in a row called “On [Something or Other]” is officially a pattern.

It was an accidental repeat last week (move fast and break things! 🙈), but I think I like it, so I’m sticking with it.

Next on the docket: pronouns and mandates

This week I would like to talk about pronouns (as in “my name is Charity, my pronouns are she/her or they/them”) and pronoun mandates, in the context of work.

Here’s where I stand, in brief:

  • Making it safe to disclose the pronouns you use: ✨GOOD✨
  • Normalizing the practice of sharing your pronouns: ✨GOOD✨
  • Mandating that everyone share their pronouns: ✨BAD✨

This includes soft mandates, like a manager or HR asking everyone at work to share their pronouns when introducing themselves, or making pronouns a required field in email signatures or display names.

I absolutely understand that people who do this are acting in good faith, trying to be good allies. But I do not like it. 😡 And I think it can massively backfire!

Here are my reasons.

I resent being forced to pick a side in public

I have my own gender issues, y’all. Am I supposed to claim “she/her” or “they/them”? Ugh, I don’t know. I’ve never felt any affinity with feminine pronouns or identity, but I don’t care enough to correct anyone or assert a preference for they/them. Ultimately, the strongest feeling I have about my gender is apathy/discomfort/irritation. Maybe that will change someday, maybe it won’t, but I resent being forced to pick a side and make some kind of public declaration when I’m just trying to do my goddamn job. My gender doesn’t need to be anyone else’s business.

I totally acknowledge that it is valuable for cis people to help normalize the practice by sharing their pronouns. (It never fails to warm the cockles of my cold black heart when I see a graying straight white dude lead with “My pronouns are he/him” in his bio. Charmed! 😍)

If I worked at a company where this was not commonly done, I would suck it up and take one for the team. But I don’t feel the need, because it is normalized here. We have loads of other queer folks, and my cofounder shares her pronouns. I don’t feel like I’m hurting anyone by not doing it myself.

Priming people with gender cues can be…unwise

One of the engineering managers I work with, Hannah Henderson, once told me that she has always disliked pronoun mandates for a different reason. Research shows that priming someone to think of you as a woman first and foremost generally leads them to think of you as being less technical, less authoritative, even less competent.

Great, just what we need.

What about people who don’t know, or aren’t yet out?

Some people may be in a transitional phase, or may be in the process of coming out as trans or genderqueer or nonbinary, or maybe they don’t know yet. Gender is a deeply personal question, and it’s inappropriate to force people to take a stand or pick a side in public or at work.

If **I** feel this way about pronoun mandates (and keep in mind that I am queer, have lived in San Francisco for 20 years, and am married to a genderqueer trans person), I can’t imagine how offputting and irritating these mandates must be to someone who holds different values, or comes from a different cultural background.

You can’t force someone to be a good ally

As if that wasn’t enough, pronoun mandates also have a flattening effect, eliminating useful signal about who is willing to stand up and identify themselves as someone who is a queer ally, and/or is relatively informed about gender issues.

As a friend commented, when reviewing a draft of this post: “Mandating it means we can’t look around the room and determine who might be friendly or safe, while also escalating resentment that bigots hold towards us.”

A couple months back I wrote a long (LONG) essay detailing my mixed feelings about corporate DEI initiatives. One of the points I was trying to land is how much easier it is to make and enforce rules, if you’re in a position with the power to do so, than to win hearts and minds. Rules always have edge cases and unintended consequences, and the backlash effect is real. People don’t like being told what to do.

Pronoun mandates were at the top of my mind when I wrote that, and I’ve been meaning to follow up and unpack this ever since.

Til next week, when we’ll talk “On something or some other thing”,
~charity💕

(835 words! 🙌)
