How many pillars of observability can you fit on the head of a pin?

My day started off with an innocent question, from an innocent soul.

“Hey Charity, is profiling a pillar?”

I hadn’t even had my coffee yet.

“Someone was just telling me that profiling is the fourth pillar of observability now. I said I think profiling is a great tool, but I don’t know if it quite rises to the level of pillar. What do you think?”

What….do.. I think.

What I think is, there are no pillars. I think the pillars are a fucking lie, dude. I think the language of pillars does a lot of work to keep good engineers trapped inside a mental model from the 1980s, paying outrageous sums of money for tooling that can’t keep up with the chaos and complexity of modern systems.

Here is a list of things I have recently heard people refer to as the “fourth pillar of observability”:

  • Profiling
  • Tokens (as in LLMs)
  • Errors, exceptions
  • Analytics
  • Cost

Is it a pillar, is it not a pillar? Are they all pillars? How many pillars are there?? How many pillars CAN there be? Gaahhh!

This is not a new argument. Take this ranty little tweet thread of mine from way back in 2018, for starters.

 

Or perhaps you have heard of TEMPLE: Traces, Events, Metrics, Profiles, Logs, and Exceptions?

Or the “braid” of observability data, or “They Aren’t Pillars, They’re Lenses”, or the Lightstep version: “Three Pillars, Zero Answers” (that title is a personal favorite).

Alright, alright. Yes, this has been going on for a long time. I’m older now and I’m tireder now, so here’s how I’ll sum it up.

Pillar is a marketing term.
Signal is a technical term.

So “is profiling a pillar?” is a valid question, but it’s not a technical question. It’s a question about the marketing claims being made by a given company. Some companies are building a profiling product right now, so yes, to them, it is vitally important to establish profiling as a “pillar” of observability, because you can charge a hell of a lot more for a “pillar” than you can charge for a mere “feature”. And more power to them. But it doesn’t mean anything from a technical point of view.

On the other hand, “signal” is absolutely a technical term. The OpenTelemetry Signals documentation, which I consider canon, says that OTel currently supports Traces, Metrics, Logs, and Baggage as signal types, with Events and Profiles at the proposal/development stage. So yes, profiling is a type of signal.

The OTel docs define a telemetry signal as “a type of data transmitted remotely for monitoring and analysis”, and they define a pillar as … oh, they don’t even mention pillars? like at all??

I guess there’s your answer.

And this is probably where I should end my piece. (Why am I still typing…. 🤔)

Pillars vs signals

First of all, I want to stress that it does not bother me when engineers go around talking about pillars. Nobody needs to look at me guiltily and apologize for using the term ‘pillar’ atBunnies Addendum (For the Buffy Fans) - En Tequila Es Verdad the bar after a conference because they think I’m mad at them. I am not the language police, it is not my job to go around enforcing correct use of technical terms. (I used to, I know, and I’m sorry! 😆)

When engineers talk about pillars of observability, they’re just talking about signals and signal types, and “pillar” is a perfectly acceptable colloquialism for “signal”.

When a vendor starts talking about pillars, though — as in the example above! — it means they are gearing up to sell you something: another type of signal, siloed off from all the other signals you send them. Your cost multiplier is about to increment again, and then they’re going to start talking about how Important it is that you buy a product for each and every one of the Pillars they happen to have.

As a refresher: there are two basic architecture models used by observability companies, the multiple pillars model and the unified storage model (aka o11y 2.0). The multiple pillars model is to store every type of signal in a different siloed storage location — metrics, logs, traces, profiling, exceptions, etc, everybody gets a database! The unified storage model is to store all signals together in ONE database, preserving context and relationships, so you can treat data like data: slice and dice, zoom in, zoom out, etc.

Most of the industry giants were built using the pillars model, but Honeycomb (and every other observability company founded post-2019) has built using the unified storage model, building wide, structured log events on a columnar storage engine with high cardinality support, and so on.

Bunny-hopping from pillar to pillar

When you use each signal type as a standalone pillar, this leads to an experience I think of as “bunny products” 🐇 where the user is always hopping from pillar to pillar. You see something on your metrics dashboard that looks scary? hop-hop to your logs and try to find it there, using grep and search and matching by timestamps. If you can find the right logs, then you need to trace it, so you hop-hop-hop to your traces and repeat your search there. With profiling as a pillar, maybe you can hop over to that dataset too.🐇🐰

The amount of data duplication involved in this model is mind boggling. You are literally storing the same information in your metrics TSDB as you are in your logs and your traces, just formattedThe 30 Best Bunny Rabbit Memes - Hop to Pop differently. (I never miss an opportunity to link to Jeremy Morrell’s masterful doc on instrumenting your code for wide events, which also happens to illustrate this nicely.) This is insanely expensive. Every request that enters your system gets stored how many times, in how many signals? Count it up; that’s your cost multiplier.

Worse, much of the data that connects each “pillar” exists only in the heads of the most senior engineers, so they can guess or intuit their way around the system, but anyone who relies on actual data is screwed. Some vendors have added an ability to construct little rickety bridges post hoc between pillars, e.g. “this metric is derived from this value in this log line or trace”, but now you’re paying for each of those little bridges in addition to each place you store the data (and it goes without saying, you can only do this for things you can predict or hook up in the first place).

The multiple pillars model (formerly known as observability 1.0) relies on you believing that each signal type must be stored separately and treated differently. That’s what the pillars language is there to reinforce. Is it a Pillar or not?? It doesn’t matter because pillars don’t exist. Just know that if your vendor is calling it a Pillar, you are definitely going to have to Pay for it. 😉

Zooming in and out

But all this data is just.. data. There is no good reason to silo signals off from each other, and lots of good reasons not to. You can derive metrics from rich, structured data blobs, or append your metrics to wide, structured log events. You can add span IDs and visualize them as a trace. The unified storage model (“o11y 2.0”) says you should store your data once, and do all the signal processing in the collection or analysis stages. Like civilized folks.

Anya Bunny Quote - Etsy
All along, Anya was right

From the perspective of the developer, not much changes. It just gets easier (a LOT easier), because nobody is harping on you about whether this nit of data should be a metric, a log, a trace, or all of the above, or if it’s low cardinality or high cardinality, or whether the cardinality of the data COULD someday blow up, or whether it’s a counter, a gauge, a heatmap, or some other type of metric, or when the counter is going to get reset, or whether your heatmap buckets are defined at useful intervals, or…or…

Instead, it’s just a blob of json. Structured data.. If you think it might be interesting to you someday, you dump it in, and if not, you don’t. That’s all. Cognitive load drops way down..

On the backend side, we store it once, retaining all the signal type information and connective tissue.

It’s the user interface where things change most dramatically. No more bunny hopping around from pillar to pillar, guessing and copy-pasting IDs and crossing your fingers. Instead, it works more like the zoom function on PDFs or Google maps.

You start with SLOs, maybe, or a familiar-looking metrics dashboard. But instead of hopping, you just.. zoom in. The SLOs and metrics are derived from the data you need to debug with, so you’re just like.. “Ah what’s my SLO violation about? Oh, it’s because of these events.” Want to trace one of them? Just click on it. No hopping, no guessing, no pasting IDs around, no lining up time stamps.

Zoom in, zoom out, it’s all connected. Same fucking data.

“But OpenTelemetry FORCES you to use three pillars”

There’s a misconception out there that OpenTelemetry is very pro-three pillars, and very anti o11y 2.0. This is a) not true and b) actually the opposite. Austin Parker has written a voluminous amount of material explaining that actually, under the hood, OTel treats everything like one big wide structured event log.

As Austin puts it, “OpenTelemetry, fundamentally, unifies telemetry signals through shared, distributed context.” However:

“The project doesn’t require you to do this. Each signal is usable more or less independently of the other. If you want to use OpenTelemetry data to feed a traditional ‘three pillars’ system where your data is stored in different places, with different query semantics, you can. Heck, quite a few very successful observability tools let you do that today!”

“This isn’t just ‘three pillars but with some standards on top,’ it’s a radical departure from the traditional ‘log everything and let god sort it out’ approach that’s driven observability practices over the past couple of decades.”

You can use OTel to reinforce a three pillars mindset, but you don’t have to. Most vendors have chosen to implement three pillarsy crap on top of it, which you can’t really hold OTel responsible for. One[1] might even argue that OTel is doing as much as it can to influence you in the opposite direction, while still meeting Pillaristas where they’re at.

A postscript on profiling

What will profiling mean in a unified storage world? It just means you’ll be able to zoom in to even finer and lower-level resolution, down to syscalls and kernel operations instead of function calls. Like when Google Maps got good enough that you could read license plates instead of just rooftops.

Admittedly, we don’t have profiling yet at Honeycomb. When we did some research into the profiling space, what we learned was that most of the people who think they’re in desperate need of a profiling tool are actually in need of a good tracing tool. Either they didn’t have distributed tracing or their tracing tools just weren’t cutting it, for reasons that are not germane in a Honeycomb tracing world.

We’ll get to profiling, hopefully in the near-ish future, but for the most part, if you don’t need syscall level data, you probably don’t need profiling data either. Just good traces.

Also… I did not make this site or have any say whatsoever in the building of it, but I did sign the manifesto[2] and every day that I remember it exists is a day I delight in the joy and fullness of being alive: kill3pill.com 📈

Kill Three Pillars

Hop hop, little friends,
~charity

 

[1] Austin argues this. I’m talking about Austin, if not clear enough.
[2] Thank you, John Gallagher!!

How many pillars of observability can you fit on the head of a pin?

Got opinions on observability? I could use your help (once more, with feeling)

Last month I dropped a desperate little plea for help in this space, asking people to email me any good advice and/or strong opinions they happened to have on the topic of buying software.

I wasn’t really sure what to expect — desperate times, desperate measures — but holy crap, you guys delivered. To the many people who took the time to write up your experiences and expertise for me, and suffer through rounds of questions and drafts: ✨thank you✨. And thank you, too, to those of you who forwarded my queries along to experts in your network and asked for help on my behalf.

I learned a LOT about buying software and managing vendor relationships in the process of writing this. Honestly, this chapter is shaping up to be one of the things I’m most excited about for the second edition of the book.

Why I’m excited about the software buying chapter (& you should be too)

I’m imagining you reading this with a skeptical expression and an arched eyebrow. “Really, Charity…‘how to buy software’ doesn’t exactly suggest peak engineering prowess.”

Au contraire, my friends. I’ve come to believe that vendor engineering is one of the subtlest and most powerful practical applications of deep subject matter expertise, and some of the highest leverage work an engineer can do. How often do you get to make decisions that leverage the labor of hundreds or thousands of engineers per year, for fractions of pennies on the dollar? How many of the decisions you make will have an impact on every single engineer you work with and their ability to do their jobs well, as well as the experience of every single customer?

If you think I’m hyperventilating a bit, nah; this is entry level shit. In the book, I tell the story of the best engineer I ever worked with, and how I watched him alter the trajectory of multiple other companies, none of which he was working for, buying from, or formally connected to in any way — in the space of a few conversations. It upended my entire worldview about what it can look like for an engineer to wield great power.

Doing this stuff well takes both technical depth and technical breadth, in addition to systems thinking and knowledge of the business. It is one of the only ways a staff+ engineer can acquire and develop executive-level communication, strategy, and execution skills while remaining an individual contributor.

I’ve been wanting to write about this for YEARS. Anyway — ergh! — I’m rambling now. That was not what I came here to talk about, I’m just excited. Back to the point.

My second (and final) round of questions

I got so much out of your thoughtful responses that I thought I’d press my luck and put a few more questions out to the universe, before it’s too late.

These questions speak to areas where I worry that my writing may be a little weak or uninformed, or too far away from the world where people are using the “three pillars” model (aka multiple pillars or o11y 1.0) and happy about it. I don’t know many (any??) of those people, which suggests some pretty heavy selection bias.

I don’t expect anyone to answer all the questions; if one or two resonate with you, write about those and ignore the rest. If there’s something I didn’t ask that I should have asked, answer that. Something I’ve written in the past that bugged you that you hope I won’t say again? Tell me! We are almost out of time ⌛ so gimme what you got. 🙌

On migrations:

📈 Have you ever migrated from one observability vendor to another? If so, what did you learn? What was the hardest part, what took you by surprise? What do you wish you could go back in time and tell your self at the start?

📈 If you ran (or were involved in) a large scale migration or tool change… how did you structure the process? Like, was it team by team, service by service, product by product? Did you have a playbook? What did you do to make it fun or push through organizational inertia? How long did it take?

On managing costs for the traditional three pillars:

📈 For orgs that are using Datadog, Grafana, Chronosphere, or another traditional three pillars architecture.. How would you describe your approach to cutting and controlling costs? Pro tips and/or comprehensive strategy.

📈 Alternately, if there are particular blog posts with advice you have followed and can personally vouch for, would you send me a link?

📈 How do you guide your software engineers on which data to send to which place — metrics, logs, traces, errors/exceptions, profiling, etc? How do you manage cardinality? How do you work to keep the pillars in sync, or are there any particular tips and tricks you have for linking / jumping between the data sources?

📈 How many ongoing engineering cycles does it take to manage and maintain costs, once you’ve gotten them to a sustainable place?

On managing costs at massive scale:

(Especially for people who work at a large enterprise, the kind with multiple business units, but others welcome too!):

  • Do you use tiers of service for managing costs? How do you define those?
  • How do new tools get taken for a spin? (Like, sometimes there is an office of the CTO with carte blanche to try new things and evaluate them for the rest of the org)
  • How do you use telemetry pipelines?

Observability teams (quick poll):

📈 If you have an observability team, how big is it? What part of the org does it report up into? Roughly how many engineers does that team support?

📈 If you don’t have an observability team — and you have more than, say, 300 engineers — who owns observability? Platform? SRE? Other?

A grab bag:

📈 Build vs Buy: If you built your own observability tool(s)…. What were the reasons? What does it do? Would you make the same decision today?

📈 OpenTelemetry: If your team has weighed the pros and cons of adopting OTel and ultimately decided not to, for technical or philosophical reasons (i.e. not just “we’re too busy”) — what are those reasons?

📈 Instrumentation: what do you do to try and remove cognitive overhead for engineers? How much have you been able to make automatic and magical, and where has the magic failed?

📈 Consolidation: I would love to hear any thoughts on tool consolidation vs tool proliferation. Is this primarily driven by execs, or do technical users care too? Is it driven by cost concerns, usability, or something else?

edited on 2025-10-15 to add… oh crap, one last question:

📈 Open source: Are you using open source observability tools, and if so, are these your primary tools or one piece of a comprehensive tooling strategy? If the latter, could you describe that strategy for me?

Send it to me in an email

Please send me your opinions or answers in an email, to my first name at honeycomb dot io, with the subject line “Observability questions”.

If I end up cribbing from your material, it okay for me to print your name? (As in, “thanks to the people who informed my thinking on this subject, abc xyz etc”). I will not mention your employer or where you work, don’t worry.

If you send it to me more than a week from now, I probably won’t be able to use it. Augh, I wish I had thought of this in JUNE!!! #ragrets

✨THANK YOU✨

I know this is an incredibly time consuming thing to ask of someone, and I can’t express how much I appreciate your help.

P.S. Yes, the title is absolutely a reference to the Buffy musical. Hey, I had to give you guys something fun to read along with my second bleg in less than a month (do people still say “bleg”??).

6 Musical Episodes of TV Shows That Deserve an Encore

P.P.S. Grammar quiz of the day: should my title read “opinions ABOUT observability” or “opinions ON observability” ??

GREAT QUESTION — and, as it turns out, the preposition you choose may reveal more than you realized.

“About” is used to introduce a topic or subject in a broad, vague, or approximate sense, while “on” is used to signal more detailed, specific, formal or serious subject matter (as well as physical objects). “Let’s talk about dinner” vs “she delivered a lecture on why AI is trying to kill babies.”

Or as Xander says, “To read makes our English speaking good.”

The earth is doomed,
~charity

Got opinions on observability? I could use your help (once more, with feeling)

Are you an experienced software buyer? I could use some help.

If it seems like I’ve been relatively quiet lately on social media and my blog, that’s because I have. Liz, Austin, George and I have been busy toiling away on the second edition of “Observability Engineering” ever since April or May. I personally have been trying to spend 75-80% of my time on the book since May.

Have I been successful in that attempt? No. But I’m trying. Progress is being made. Hopefully just a few more weeks of drafting and we’ll be on to edits, and on to your grubby little paws by May-ish.

The world has changed A LOT since we wrote the first edition, in 2019-2022. Do you know, the phrase “observability engineering teams” doesn’t even occur in the first edition of the book? Try and search — it can’t be found! Even the phrase “observability teams” doesn’t pop up til near the end, and when it does, we are referring to those few teams that choose to build their own observability tools from scratch.

These days, observability engineering teams are everywhere. Which is why we are adding a whole new section, a sizable one, called “Observability Governance.” The governance section will have a bunch of chapters on topics like how to staff these teams, where they should fit in the org chart, how to buy good tools, how to integrate them, how to manage costs, how to make the business case up the chain to senior execs, how to manage schemas and semantic conventions at scale, and much much more.

The problem

The problem is, I’ve never really bought software. Not like this. I’ve never even worked at a  truly large, software-buying enterprise tech company. So I am not well equipped to give good advice on questions like:

  • How do you shop around for options?
  • What are some signs you may need to suck it up and change vendors?
  • What does a good POC (proof of concept) look like?
  • Who are your stakeholders? What are their concerns?
  • How do you drive consensus when millions of dollars (and the work experience of thousands of engineers) are on the line? What does ‘consensus’ even mean in that context?
  • What are the primary considerations should you take into account when making a decision? What are secondary considerations?

I’m looking for the kind of advice that a principal engineer who has done this many times might give a staff engineer who is doing it for the first time. Or that a VP who’s done this many times might give a director who is doing it for the first time.

Can you help?

This is me wearing Leia buns and projecting a unicorn-shaped rainbow bat signal out into the sky for help. Do you have any advice for me? What guidance would you give to the readers of the second edition of this book?

Please send your advice to me in an email, addressed to my first name at honeycomb dot io, with the subject line: “Buying Software”. Include any relevant context about how large the company or engineering org is, and what your role in purchasing was.

I may respond with more questions, or reply and ask if you are able to talk synchronously. But I will not quote anything you send me without first asking your permission and getting a signed release. I will not mention ANY vendors by name, good or bad.

I am not fishing for honeycomb customers or buyers, I assume most of you haven’t tried honeycomb and don’t care about it and that is fine. This is not a Honeycomb project, this is an O’Reilly writing project. I just want to gather up some good advice on buying software and funnel it back out to good engineers.

Can you help? Your industry needs you! <3

 

 

Are you an experienced software buyer? I could use some help.

How We Migrated the Parse API From Ruby to Golang (Resurrected)

I wrote a lot of blog posts over my time at Parse, but they all evaporated after Facebook killed the product. Most of them I didn’t care about (there were, ahem, a lot of “service reliability updates”), but I was mad about losing one specific piece, a deceptively casual retrospective of the grueling, murderous two-year rewrite of our entire API from Ruby on Rails to Golang..

I could have sworn I’d looked for it before, but someone asked me a question about migrations this morning, which spurred me to pull up the Wayback Machine again and dig in harder, and … ✨I FOUND IT!!✨

Honestly, it is entirely possible that if we had not done this rewrite, there might be no Honeycomb. In the early days of the rewrite, we would ship something in Go and the world would break, over and over and over. As I said,

Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.

Rails would accept any old trash, Go would not. Breakage ensues. Tests couldn’t catch what we didn’t know to look for. Eventually we lit upon a workflow where we would split incoming production traffic, run each request against a Go API server and a Ruby API server, each backed by its own set of MongoDB replicas, and diff the responses. This is when we first got turned on to how incredibly powerful Scuba was, in its ability to compare individual responses, field by field, line by line.

Once you’ve used a tool like that, you’re hooked.. you can’t possibly go back to metrics and aggregates. The rest, as they say, is history.

The whole thing is still pretty fun to read, even if I can still smell the blood and viscera a decade later. Enjoy.


“How We Moved Our API From Ruby to Go and Saved Our Sanity”

Originally posted on blog.parse.com on June 10th, 2015.

The first lines of Parse code were written nearly four years ago. In 2011 Parse was a crazy little idea to solve the problem of building mobile apps.

Those first few lines were written in Ruby on Rails.


Ruby on Rails

Ruby let us get the first versions of Parse out the door quickly. It let a small team of engineers iterate on it and add functionality very fast. There was a deep bench of library support, gems, deploy tooling, and best practices available, so we didn’t have to reinvent very many wheels.

We used Unicorn as our HTTP server, Capistrano to deploy code, RVM to manage the environment, and a zillion open source gems to handle things like YAML parsing, oauth, JSON parsing, MongoDB, and MySQL. We also used Chef which is Ruby-based to manage our infrastructure so everything played together nicely. For a while.

The first signs of trouble bubbled up in the deploy process. As our code base grew, it took longer and longer to deploy, and the “graceful” unicorn restarts really weren’t very graceful. So, we monkeypatched rolling deploy groups in to Capistrano.

“Monkeypatch” quickly became a key technical term that we learned to associate with our Ruby codebase.

A year and a half in, at the end of 2012, we had 200 API servers running on m1.xlarge instance types with 24 unicorn workers per instance. This was to serve 3000 requests per second for 60,000 mobile apps. It took 20 minutes to do a full deploy or rollback, and we had to do a bunch of complicated load balancer shuffling and pre-warming to prevent the API from being impacted during a deploy.

Then, Parse really started to take off and experience hockey-stick growth.


Problems

When our API traffic and number of apps started growing faster, we started having to rapidly spin up more database machines to handle the new request traffic. That is when the “one process per request” part of the Rails model started to fall apart.

With a typical Ruby on Rails setup, you have a fixed pool of worker processes, and each worker can handle only one request at a time. So any time you have a type of request that is particularly slow, your worker pool can rapidly fill up with that type of request. This happens too fast for things like auto-scaling groups to react. It’s also wasteful because the vast majority of these workers are just waiting on another service. In the beginning, this happened pretty rarely and we could manage the problem by paging a human and doing whatever was necessary to keep the API up. But as we started growing faster and adding more databases and workers, we added more points of failure and more ways for performance to get degraded.

We started looking ahead to when Parse would 10x its size, and realized that the one-process-per-request model just wouldn’t scale. We had to move to an async model that was fundamentally different from the Rails way. Yeah, rewrites are hard, and yeah they always take longer than anyone ever anticipates, but we just didn’t see how we could make the Rails codebase scale while it was tied to one process per request.


What next?

We knew we needed asynchronous operations. We considered a bunch of options:

EventMachine

We already had some of our push notification service using EventMachine, but our experience was not great as it too was scaling. We had constant trouble with accidentally introducing synchronous behavior or parallelism bugs. The vast majority of Ruby gems are not asynchronous, and many are not threadsafe, so it was often hard to find a library that did some common task asynchronously.

JRuby

This might seem like the obvious solution – after all, Java has threads and can handle massive concurrency. Plus it’s Ruby already, right? This is the solution Twitter investigated before settling on Scala. But since JRuby is still basically Ruby, it still has the problem of asynchronous library support. We were concerned about needing a second rewrite later, from JRuby to Java. And literally nobody at all on our backend or ops teams wanted to deal with deploying and tuning the JVM. The groans were audible from outer space.

C++

We had a lot of experienced C++ developers on our team. We also already had some C++ in our stack, in our Cloud Code servers that ran embedded V8. However, C++ didn’t seem like a great choice. Our C++ code was harder to debug and maintain. It seemed clear that C++ development was generally less productive than more modern alternatives. It was missing a lot of library support for things we knew were important to us, like HTTP request handling. Asynchronous operation was possible but often awkward. And nobody really wanted to write a lot of C++ code.

C#

C# was a strong contender. It arguably had the best concurrency model with Async and Await. The real problem was that C# development on Linux always felt like a second-class citizen. Libraries that interoperate with common open source tools are often unavailable on C#, and our toolchain would have to change a lot.

Go

Go and C# both have asynchronous operation built into the language at a low level, making it easy for large groups of people to write asynchronous code. The MongoDB Go driver is probably the best MongoDB driver in existence, and complex interaction with MongoDB is core to Parse. Goroutines were much more lightweight than threads. And frankly we were most excited about writing Go code. We thought it would be a lot easier to recruit great engineers to write Go code than any of the other solid async languages.

In the end, the choice boiled down to C# vs Go, and we chose Go.


Wherein we rewrite the world

We started out rewriting our EventMachine push backend from Ruby to Go. We did some preliminary benchmarking with Go concurrency and found that each network connection ate up only 4kb of RAM. After rewriting the EventMachine push backend to Go we went from 250k connections per node to 1.5 million connections per node without even touching things like kernel tuning. Plus it seemed really fun. So, Go it was.

We rewrote some other minor services and starting building new services in Go. The main challenge, though, was to rewrite the core API server that handles requests to api.parse.com while seamlessly maintaining backward compatibility. We rewrote this endpoint by endpoint, using a live shadowing system to avoid impacting production, and monitored the differential metrics to make sure the behaviors matched.

During this time, Parse 10x’d the number of apps on our backend and more than 10x’d our request traffic. We also 10x’d the number of storage systems backed by Ruby. We were chasing a rapidly moving target.

The hardest part of the rewrite was dealing with all the undocumented behaviors and magical mystery bits that you get with Rails middleware. Parse exposes a REST API, and Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.

So we had to port a lot of delightful behavior from the Ruby API to the Go API, to make sure we kept handling the weird requests that Rails handled. Stuff like doubly encoded URLs, weird content-length requirements, bodies in HTTP requests that shouldn’t have bodies, horrible oauth misuse, horrible mis-encoded Unicode.

Our Go code is now peppered with fun, cranky comments like these:

// Note: an unset cache version is treated by ruby as “”.
// Because of this, dirtying this isn’t as simple as deleting it – we need to
// actually set a new value.

// This byte sequence is what ruby expects.
// yes that’s a paren after the second 180, per ruby.

// Inserting and having an op is kinda weird: We already know
// state zero. But ruby supports it, so go does too.

// single geo query, don’t do anything. stupid and does not make sense
// but ruby does it. Changing this will break a lot of client tests.
// just be nice and fix it here.

// Ruby sets various defaults directly in the structure and expects them to appear in cache.
// For consistency, we’ll do the same thing.

Results

Was the rewrite worth it? Hell yes it was. Our reliability improved by an order of magnitude. More importantly, our API is not getting more and more fragile as we spin up more databases and backing services. Our codebase got cleaned up and we got rid of a ton of magical gems and implicit assumptions. Co-tenancy issues improved for customers across the board. Our ops team stopped getting massively burned out from getting paged and trying to track down and manually remediate Ruby API outages multiple times a week. And needless to say, our customers were happier too.

We now almost never have reliability-impacting events that can be tracked back to the API layer – a massive shift from a year ago. Now when we have timeouts or errors, it’s usually constrained to a single app – because one app is issuing a very inefficient query that causes timeouts or full table scans for their app, or it’s a database-related co-tenancy problem that we can resolve by automatically rebalancing or filtering bad actors.

An asynchronous model had many other benefits. We were also able to instrument everything the API was doing with counters and metrics, because these were no longer blocking operations that interfered with communicating to other services. We could downsize our provisioned API server pool by about 90%. And we were also able to remove silos of isolated Rails API servers from our stack, drastically simplifying our architecture.

As if that weren’t enough, the time it takes to run our full integration test suite dropped from 25 minutes to 2 minutes, and the time to do a full API server deploy with rolling restarts dropped from 30 minutes to 3 minutes. The go API server restarts gracefully so no load balancer juggling and prewarming is necessary.

We love Go. We’ve found it really fast to deploy, really easy to instrument, really lightweight and inexpensive in terms of resources. It’s taken a while to get here, but the journey was more than worth it.

Credits/Blames

Credits/Blames go to Shyam Jayaraman for driving the initial decision to use Go, Ittai Golde for shepherding the bulk of the API server rewrite from start to finish, Naitik Shah for writing and open sourcing a ton of libraries and infrastructure underpinning our Go code base, and the rest of the amazing Parse backend SWE team who performed the rewrite.

How We Migrated the Parse API From Ruby to Golang (Resurrected)

Thoughts on Motivation and My 40-Year Career

I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption.

I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. After ten years, you’d think this would be getting easier, not harder.

There’s something about putting out such memoiristic material that feels uncomfortably feminine to me. (Wow, ok.) I want to be known for my work, not for having a dramatic personal life. I love my family and don’t want to put them on display for the world to judge. And I never want the people I care about to feel like I am mining their experiences for clicks and content, whether that’s my family or my coworkers.

Many of the writing exercises I’ve been doing lately have ended up pulling on threads from my backstory, and the reason I haven’t published them is because I find myself thinking, “this won’t make any sense to people unless they know where I’m coming from.”

So hey, fuck it, let’s do this.

I went to college at the luckiest time

I left home when I was 15 years old. I left like a bottle rocket taking off – messy, explosive, a trail of destruction in my wake, and with absolutely zero targeting mechanisms.

It tells you a lot about how sheltered I was that the only place I could think of to go was university. I had never watched TV or been to a sports game or listened to popular music. I had never been to a doctor, I was quite unvaccinated.

I grew up in the backwoods of Idaho, the oldest of six, all of us homeschooled. I would go for weeks without seeing anyone other than my family. The only way to pass the time was by reading books or playing piano, so I did quite a lot of both. I called up the University of Idaho, asked for an admissions packet, hand wrote myself a transcript and gave myself all As, drove up and auditioned for the music department, and was offered a partial ride scholarship for classical piano performance.

I told my parents I was leaving, with or without their blessing or financial support. I left with neither.

My timing turned out to be flawless. I arrived on the cusp of the Internet age – they were wiring dorms for ethernet the year I enrolled. Maybe even more important, I arrived in the final, fading glory years of affordable state universities.

I worked multiple minimum wage jobs to put myself through school; day care, front desk, laundry, night audit. It was grueling, round the clock labor, but it was possible, if you were stubborn enough. I didn’t have a Social Security number (long story), I wasn’t old enough to take out loans, I couldn’t get financial aid because my parents didn’t file income taxes (again, long story). There was no help coming, I sank or I swam.

I found computers and the Internet around the same time as it dawned on me that everybody who studied music seemed to end up poor as an adult. I grew up too poor to buy canned vegetables or new underwear; we were like an 1800s family, growing our food, making our clothes, hand-me-downs til they fell apart.

Fuck being poor. Fuck it so hard. I was out.

I lost my music scholarship, but I started building websites and running systems for the university, then for local businesses. I dropped out and took a job in San Francisco. I went back, abortively; I dropped out again.

By the time I was 20 I was back in SF for good, making a salary five times what my father had made.

I grew up with a very coherent belief system that did not work for me

A lot of young people who flee their fundamentalist upbringing do so because they were abused and/or lost their faith, usually due to the hypocrisy of their leaders. Not me. I left home still believing the whole package – that evolution was a fraud, that the earth was created in seven days, that woman was created from Adam’s rib to be a submissive helpmate for their husband, that birth control was a sin, that anyone who believed differently was going to hell.

My parents loved us deeply and unshakably, and they were not hypocrites. In the places I grew up, the people who believed in God and went to church and lived a certain way were the ones who had their shit together, and the people who believed differently had broken lives. Reality seemed to confirm the truth of all we were taught, no matter how outlandish it sounds.

So I fully believed it was all true. I also knew it did not work for me. I did not want a small life. I did not want to be the support system behind some godly dude. I wanted power, money, status, fame, autonomy, success. I wanted to leave a crater in the world.

I was not a rebellious child, believe it or not. I loved my parents and wanted to make them proud. But as I entered my teens, I became severely depressed, and turned inward and hurt myself in all the ways young people do.

I left because staying there was killing me, and ultimately, I think my parents let me go because they saw it too.

Running away from things worked until it didn’t

I didn’t know what I wanted out of life other than all of it; right now, and my first decade out on my own was a hoot. It was in my mid twenties that everything started to fall apart.

I was an earnest kid who liked to study and think about the meaning of life, but when I bolted, I slammed the door to my conscience shut. I knew I was going to hell, but since I couldn’t live the other way, I made the very practical determination based on actuarial tables that I could to go my own way for a few decades, then repent and clean up my shit before I died. (Judgment Day was one variable that gave me heartburn, since it could come at any time.)

I was not living in accordance with my personal values and ethics, to put it lightly. I compartmentalized; it didn’t bother me, until it did. It started leaking into my dreams every night, and then it took over my waking life. I was hanging on by a thread; something had to give.

My way out, unexpectedly, started with politics. I started mainlining books about politics and economics during the Iraq War, which then expanded to history, biology, philosophy, other religious traditions, and everything else. (You can still find a remnant of my reading list here.)

When I was 13, I had an ecstatic religious experience; I was sitting in church, stewing over going to hell, and was suddenly filled with a glowing sense of warmth and acceptance. It lasted for nearly two weeks, and that’s how I knew I was “saved”.

In my late 20s, after a few years of intense study and research, I had a similar ecstatic experience walking up the stairs from the laundry room. I paused, I thought “maybe there is no God; maybe there is nobody out there judging me; maybe it all makes sense”, and it all clicked into place, and I felt high for days, suffused with peace and joy.

My career didn’t really take off until after that. I always had a job, but I wasn’t thinking about tech after hours. At first I was desperately avoiding my problems and self-medicating, later I became obsessed with finding answers. What did I believe about taxation, public policy, voting systems, the gender binary, health care, the whole messy arc of American history? I was an angry, angry atheist for a while. I filled notebook after notebook with handwritten notes; if I wasn’t working, I was studying.

And then, gradually, I wound down. The intensity, the high, tapered off. I started dating, realized I was poly and queer, and slowly chilled the fuck out. And that’s when I started being able to dedicate the creative, curious parts of my brain to my job in tech.

Why am I telling you all this?

Will Larson has talked a lot about how his underlying motivation is “advancing the industry”. I love that for him. He is such a structured thinker and prolific writer, and the industry needs his help, very badly.

For a while I thought that was my motivation too. And for sure, that’s a big part of it, particularly when it comes to observability and my day job. (Y’all, it does not need to be this hard. Modern observability is the cornerstone and prerequisite for high performing engineering teams, etc etc.)

But when I think about what really gets me activated on a molecular level, it’s a little bit different. It’s about living a meaningful life, and acting with integrity, and building things of enduring value instead of tearing them down.

When I say it that way, it sounds like sitting around on the mountain meditating on the meaning of life, and that is not remotely what I mean. Let me try again.

For me, work has been a source of liberation

It’s very uncool these days to love your job or talk about hard work. But work has always been a source of liberation for me. My work has brought me so much growth and development and community and friendship. It brings meaning to my life, and the joy of creation. I want this for myself. I want this for anyone else who wants it too.

I understand why this particular tide has turned. So many people have had jobs where their employers demanded total commitment, but felt no responsibility to treat them well or fairly in return. So many people have never experienced work as anything but a depersonalizing grind, or an exercise in exploitation, and that is heartbreaking.

I don’t think there’s anything morally superior about people who want their work to be a vehicle for personal growth instead of just a paycheck. I don’t think there’s anything wrong with just wanting a paycheck, or wanting to work the bare minimum to get by. But it’s not what I want for myself, and I don’t think I’m alone in this.

I feel intense satisfaction and a sense of achievement when I look back on my career. On a practical level, I’ve been able to put family members through college, help with down payments, and support artists in my community. All of this would have been virtually unimaginable to me growing up.

I worked a lot harder on the farm than I ever have in front of a keyboard, and got a hell of a lot less for my efforts.

(People who glamorize things like farming, gardening, canning and freezing, taking care of animals, cooking and caretaking, and other forms of manual labor really get under my skin. All of these things make for lovely hobbies, but subsistence labor is neither fun nor meaningful. Trust me on this one.)

My engineer/manager pendulum days

I loved working as an engineer. I loved how fast the industry changes, and how hard you have to scramble to keep up. I loved the steady supply of problems to fix, systems to design, and endless novel catastrophes to debug. The whole Silicon Valley startup ecosystem felt like it could not have been more perfectly engineered to supply steady drips of dopamine to my brain.

I liked working as an engineering manager. Eh, that might be an overstatement. But I have strong opinions and I like being in charge, and I really wanted more access to information and influence over decisions, so I pushed my way into the role more than once.

If Honeycomb hadn’t happened, I am sure I would have bounced back and forth between engineer and manager for the rest of my career. I never dreamed about climbing the ladder or starting a company. My attitude towards middle management could best be described as amiable contempt, and my interest in the business side of things was nonexistent.

I have always despised people who think they’re too good to work for other people, and that describes far too many of the founders I’ve met.

Operating a company draws on a different kind of meaning

I got the chance to start a company in 2016, so I took it, almost on a whim. Since then I have done so many things I never expected to do. I’ve been a founder, CEO, CTO, I’ve raised money, hired and fired other execs, run organizations, crafted strategy, and come to better understand and respect the critical role played by sales, marketing, HR, and other departments. No one is more astonished than I am to find me still here, still doing this.

But there is joy to be found in solving systems problems, even the ones that are less purely technical. There is joy to be found in building a company, or competing in a marketplace.

To be honest, this is not a joy that came to me swiftly or easily. I’ve been doing this for the past 9.5 years, and I’ve been happy doing it for maybe the past 2-3 years. But it has always felt like work worth doing. And ultimately, I think I’m less interested in my own happiness (whatever that means) than I am interested in doing work that feels worth doing.

Work is one of the last remaining places where we are motivated to learn from people we don’t agree with and find common pursuit with people we are ideologically opposed to. I think that’s meaningful. I think it’s worth doing.

Reality doesn’t give a shit about ideology

I am a natural born extremist. But when you’re trying to operate a business and win in the marketplace, ideological certainty crashes hard into the rocks of reality. I actually find this deeply motivating.

I spent years hammering out my own personal ontological beliefs about what is right and just, what makes a life worth living, what responsibilities we have to each another. I didn’t really draw on those beliefs very often as an engineer/manager, at least not consciously. That all changed dramatically after starting a company.

It’s one thing to stand off to the side and critique the way a company is structured and the decisions leaders make about compensation, structure, hiring/firing, etc. But creation is harder than critique (one of my favorite Jeff Gray quotes) — so, so, so much harder. And reality resists easy answers.

Being an adult, to me, has meant making peace with a multiplicity of narratives. The world I was born into had a coherent story and a set of ideals that worked really well for a lot of people, but it was killing me. Not every system works for every person, and that’s okay. That’s life. Startups aren’t for everyone, either.

The struggle is what brings your ideals to life

Almost every decision you make running a company has some ethical dimension. Yet the foremost responsibility you have to your stakeholders, from investors to employees, is to make the business succeed, to win in the marketplace. Over-rotating on ethical repercussions of every move can easily cause you to get swamped in the details and fail at your prime directive.

Sometimes you may have a strongly held belief that some mainstream business practice is awful, so you take a different path, and then you learn the hard way why it is that people don’t take that path. (This has happened to me more times than I can count. 🙈)

Ideals in a vacuum are just not that interesting. If I wrote an essay droning on and on about “leading with integrity”, no one would read it, and nor should they. That’s boring. What’s interesting is trying to win and do hard things, while honoring your ideals.

Shooting for the stars and falling short, innovating, building on the frontier of what’s possible, trying but failing, doing exciting things that exceed your hopes and dreams with a team just as ambitious and driven as you are, while also holding your ideals to heart — that’s fucking exciting. That’s what brings your ideals to life.

We have lived through the golden age of tech

I recognize that I have been profoundly lucky to be employed through the golden age of tech. It’s getting tougher out there to enter the industry, change jobs, or lead with integrity.

It’s a tough time to be alive, in general. There are macro scale political issues that I have no idea how to solve or fix. Wages used to rise in line with productivity, and now they don’t, and haven’t since the mid 70s. Capital is slurping up all the revenue and workers get an ever decreasing share, and I don’t know how to fix that, either.

But I don’t buy the argument that just because something has been touched by capitalism or finance it is therefore irreversibly tainted, or that there is no point in making capitalist institutions better. The founding arguments of capitalism were profoundly moral ones, grounded in a keen understanding of human nature. (Adam Smith’s “Wealth of Nations” gets all the attention, but his other book, “Theory of Moral Sentiments”, is even better, and you can’t read one without the other.)

As a species we are both individualistic and communal, selfish and cooperative, and the miracle of capitalism is how effectively it channels the self-interested side of our nature into the common good.

Late stage capitalism, however, along with regulatory capture, enshittification, and the rest of it, has made the modern world brutally unkind to most people. Tech was, for a shining moment in time, a path out of poverty for smart kids who were willing to work their asses off. It’s been the only reliable growth industry of my lifetime.

It remains, for my money, the best job in the world. Or it can be. It’s collaborative, creative, and fun; we get paid scads of money to sit in front of a computer and solve puzzles all day. So many people seem to be giving up on the idea that work can ever be a place of meaning and collaboration and joy. I think that sucks. It’s too soon to give up! If we prematurely abandon tech to its most exploitative elements, we guarantee its fate.

If you want to change the world, go into business

Once upon a time, if you had strongly held ideals and wanted to change the world, you went into government or nonprofit work.

For better or for worse (okay, mostly worse), we live in an age where corporate power dominates. If you want to change the world, go into business.

The world needs, desperately, people with ethics and ideals who can win at business. We can’t let all the people who care about people go into academia or medicine or low wage service jobs. We can’t leave the ranks of middle and upper management to be filled by sycophants and sociopaths.

There’s nothing sinister about wanting power; what matters is what you do with it. Power, like capitalism, is a tool, and can be bent to powerful ends both good and evil. If you care about people, you should be unashamed about wanting to amass power and climb the ladder.

There are a lot of so-called best practices in this industry that are utterly ineffective (cough, whiteboarding B-trees in an interview setting), yet they got cargo culted and copied around for years. Why? Because the company that originated the practice made a lot of money. This is stupid, but it also presents an opportunity. All you need to do is be a better company, then make a lot of money. 😉

People need institutions

I am a fundamentalist at heart, just like my father. I was born to be a bomb thrower and a contrarian, a thorn in the side of the smug moderate establishment. Unfortunately, I was born in an era where literally everyone is a fucking fundamentalist and the establishment is holding on by a thread.

I’ve come to believe that the most quietly radical, rebellious thing I can possibly do is to be an institutionalist, someone who builds instead of performatively tearing it all down.

People need institutions. We crave the feeling of belonging to something much larger than ourselves. It’s one of the most universal experiences of our species.

One of the reasons modern life feels so fragmented and hard is because so many of our institutions have broken down or betrayed the people they were supposed to serve. So many of the associations that used to frame our lives and identities — church, government, military, etc — have tolerated or covered up so much predatory behavior and corruption, it no longer surprises anyone.

We’ve spent the past few decades ripping down institutions and drifting away from them. But we haven’t stopped wanting them, or needing them.

I hope, perhaps naively, that we are entering into a new era of rebuilding, sadder but wiser. An era of building institutions with accountability and integrity, institutions with enduring value, that we can belong to and take pride in… not because we were coerced or deceived, not because they were the only option, but because they bring us joy and meaning. Because we freely choose them, because they are good for us.

The second half of your career is about purpose

It seems very normal to enter the second half of your 40 year career thinking a lot about meaning and purpose. You spend the first decade or so hoovering up skill sets, the second finding your place and what feeds you, and then, inevitably, you start to think about what it all means and what your legacy will be.

That’s definitely where I’m at, as I think about the second half of my career. I want to take risks. I want to play big and win bigger. I want to show that hard work isn’t just a scam inflicted on those who don’t know any better. If we win, I want the people I work with to earn lifechanging amounts of money, so they can buy homes and send their kids to college. I want to show that work can still be an avenue for liberation and community and personal growth, for those of us who still want that.

I care about this industry and the people in it so much, because it’s been such a gift to me. I want to do what I can to make it a better place for generations to come. I want to build institutions worth belonging to.

Thoughts on Motivation and My 40-Year Career

In Praise of “Normal” Engineers

This article was originally commissioned by Luca Rossi (paywalled) for refactoring.fm, on February 11th, 2025. Luca edited a version of it that emphasized the importance of building “10x engineering teams” . It was later picked up by IEEE Spectrum (!!!), who scrapped most of the teams content and published a different, shorter piece on March 13th.

This is my personal edit. It is not exactly identical to either of the versions that have been publicly released to date. It contains a lot of the source material for the talk I gave last week at #LDX3 in London, “In Praise of ‘Normal’ Engineers” (slides), and a couple weeks ago at CraftConf. 

In Praise of “Normal” Engineers

Most of us have encountered a few engineers who seem practically magician-like, a class apart from the rest of us in their ability to reason about complex mental models, leap to non-obvious yet elegant solutions, or emit waves of high quality code at unreal velocity.In Praise of "Normal" Engineers

I have run into any number of these incredible beings over the course of my career. I think this is what explains the curious durability of the “10x engineer” meme. It may be based on flimsy, shoddy research, and the claims people have made to defend it have often been risible (e.g. “10x engineers have dark backgrounds, are rarely seen doing UI work, are poor mentors and interviewers”), or blatantly double down on stereotypes (“we look for young dudes in hoodies that remind us of Mark Zuckerberg”). But damn if it doesn’t resonate with experience. It just feels true.

The problem is not the idea that there are engineers who are 10x as productive as other engineers. I don’t have a problem with this statement; in fact, that much seems self-evidently true. The problems I do have are twofold.

Measuring productivity is fraught and imperfect

First: how are you measuring productivity? I have a problem with the implication that there is One True Metric of productivity that you can standardize and sort people by. Consider, for a moment, the sheer combinatorial magnitude of skills and experiences at play:

  • Are you working on microprocessors, IoT, database internals, web services, user experience, mobile apps, consulting, embedded systems, cryptography, animation, training models for gen AI… what?
  • Are you using golang, python, COBOL, lisp, perl, React, or brainfuck? What version, which libraries, which frameworks, what data models? What other software and build dependencies must you have mastered?
  • What adjacent skills, market segments, or product subject matter expertise are you drawing upon…design, security, compliance, data visualization, marketing, finance, etc?
  • What stage of development? What scale of usage? What matters most — giving good advice in a consultative capacity, prototyping rapidly to find product-market fit, or writing code that is maintainable and performant over many years of amortized maintenance? Or are you writing for the Mars Rover, or shrinkwrapped software you can never change?

Also: people and their skills and abilities are not static. At one point, I was a pretty good DBRE (I even co-wrote the book on it). Maybe I was even a 10x DB engineer then, but certainly not now. I haven’t debugged a query plan in years.

“10x engineer” makes it sound like 10x productivity is an immutable characteristic of a person. But someone who is a 10x engineer in a particular skill set is still going to have infinitely more areas where they are normal or average (or less). I know a lot of world class engineers, but I’ve never met anyone who is 10x better than everyone else across the board, in every situation.

Engineers don’t own software, teams own software

Second, and even more importantly: So what? It doesn’t matter. Individual engineers don’t own software, teams own software. The smallest unit of software ownership and delivery is the engineering team. It doesn’t matter how fast an individual engineer can write software, what matters is how fast the team can collectively write, test, review, ship, maintain, refactor, extend, architect, and revise the software that they own.

Everyone uses the same software delivery pipeline. If it takes the slowest engineer at your company five hours to ship a single line of code, it’s going to take the fastest engineer at your company five hours to ship a single line of code. The time spent writing code is typically dwarfed by the time spent on every other part of the software development lifecycle.

If you have services or software components that are owned by a single engineer, that person is a single point of failure.

I’m not saying this should never happen. It’s quite normal at startups to have individuals owning software, because the biggest existential risk that you face is not moving fast enough, not finding product market fit, and going out of business. But as you start to grow up as a company, as users start to demand more from you, and you start planning for the survival of the company to extend years into the future…ownership needs to get handed over to a team. Individual engineers get sick, go on vacation, and leave the company, and the business has got to be resilient to that.

If teams own software, then the key job of any engineering leader is to craft high-performing engineering teams. If you must 10x something, 10x this. Build 10x engineering teams.

The best engineering orgs are the ones where normal engineers can do great work

When people talk about world-class engineering orgs, they often have in mind teams that are top-heavy with staff and principal engineers, or recruiting heavily from the ranks of ex-FAANG employees or top universities.

But I would argue that a truly great engineering org is one where you don’t HAVE to be one of the “best” or most pedigreed engineers in the world to get shit done and have a lot of impact on the business.

I think it’s actually the other way around. A truly great engineering organization is one where perfectly normal, workaday software engineers, with decent software engineering skills and an ordinary amount of expertise, can consistently move fast, ship code, respond to users, understand the systems they’ve built, and move the business forward a little bit more, day by day, week by week.

Any asshole can build an org where the most experienced, brilliant engineers in the world can build product and make progress. That is not hard. And putting all the spotlight on individual ability has a way of letting your leaders off the hook for doing their jobs. It is a HUGE competitive advantage if you can build sociotechnical systems where less experienced engineers can convert their effort and energy into product and business momentum.

A truly great engineering org also happens to be one that mints world-class software engineers. But we’re getting ahead of ourselves, here.

Let’s talk about “normal” for a moment

A lot of technical people got really attached to our identities as smart kids. The software industry tends to reflect and reinforce this preoccupation at every turn, from Netflix’s “we look for the top 10% of global talent” to Amazon’s talk about “bar-raising” or Coinbase’s recent claim to “hire the top .1%”. (Seriously, guys? Ok, well, Honeycomb is going to hire only the top .00001%!)

In this essay, I would like to challenge us to set that baggage to the side and think about ourselves as normal people.

It can be humbling to think of ourselves as normal people, but most of us are in fact pretty normal people (albeit with many years of highly specialized practice and experience), and there is nothing wrong with that. Even those of us who are certified geniuses on certain criteria are likely quite normal in other ways — kinesthetic, emotional, spatial, musical, linguistic, etc.

Software engineering both selects for and develops certain types of intelligence, particularly around abstract reasoning, but nobody is born a great software engineer. Great engineers are made, not born. I just don’t think there’s a lot more we can get out of thinking of ourselves as a special class of people, compared to the value we can derive from thinking of ourselves collectively as relatively normal people who have practiced a fairly niche craft for a very long time.

Build sociotechnical systems with “normal people” in mind

When it comes to hiring talent and building teams, yes, absolutely, we should focus on identifying the ways people are exceptional and talented and strong. But when it comes to building sociotechnical systems for software delivery, we should focus on all the ways people are normal.

Normal people have cognitive biases — confirmation bias, recency bias, hindsight bias. We work hard, we care, and we do our best; but we also forget things, get impatient, and zone out. Our eyes are inexorably drawn to the color red (unless we are colorblind). We develop habits and ways of doing things, and resist changing them. When we see the same text block repeatedly, we stop reading it.

We are embodied beings who can get overwhelmed and fatigued. If an alert wakes us up at 3 am, we are much more likely to make mistakes while responding to that alert than if we tried to do the same thing at 3pm. Our emotional state can affect the quality of our work. Our relationships impact our ability to get shit done.

When your systems are designed to be used by normal engineers, all that excess brilliance they have can get poured into the product itself, instead of wasting it on navigating the system itself.

How do you turn normal engineers into 10x engineering teams?

None of this should be terribly surprising; it’s all well known wisdom. In order to build the kind of sociotechnical systems for software delivery that enable normal engineers to move fast, learn continuously, and deliver great results as a team, you should:

Shrink the interval between when you write the code and when the code goes live.

Make it as short as possible; the shorter the better. I’ve written and given talks about this many, many times. The shorter the interval, the lower the cognitive carrying costs. The faster you can iterate, the better. The more of your brain can go into the product instead of the process of building it.

One of the most powerful things you can do is have a short, fast enough deploy cycle that you can ship one commit per deploy. I’ve referred to this as the “software engineering death spiral” … when the deploy cycle takes so long that you end up batching together a bunch of engineers’ diffs in every build. The slower it gets, the more you batch up, and the harder it becomes to figure out what happened or roll back. The longer it takes, the more people you need, the higher the coordination costs, and the more slowly everyone moves.

Deploy time is the feedback loop at the heart of the development process. It is almost impossible to overstate the centrality of keeping this short and tight.

Make it easy and fast to roll back or recover from mistakes.

Developers should be able to deploy their own code, figure out if it’s working as intended or not, and if not, roll forward or back swiftly and easily. No muss, no fuss, no thinking involved.

Make it easy to do the right thing and hard to do the wrong thing.

Wrap designers and design thinking into all the touch points your engineers have with production systems. Use your platform engineering team to think about how to empower people to swiftly make changes and self-serve, but also remember that a lot of times people will be engaging with production late at night or when they’re very stressed, tired, and possibly freaking out. Build guard rails. The fastest way to ship a single line of code should also be the easiest way to ship a single line of code.

Invest in instrumentation and observability.

You’ll never know — not really — what the code you wrote does just by reading it. The only way to be sure is by instrumenting your code and watching real users run it in production. Good, friendly sociotechnical systems invest heavily in tools for sense-making.

Being able to visualize your work is what makes engineering abstractions accessible to actual engineers. You shouldn’t have to be a world-class engineer just to debug your own damn code.

Devote engineering cycles to internal tooling and enablement.

If fast, safe deploys, with guard rails, instrumentation, and highly parallelized test suites are “everybody’s job”, they will end up nobody’s job. Engineering productivity isn’t something you can outsource. Managing the interfaces between your software vendors and your own teams is both a science and an art. Making it look easy and intuitive is really hard. It needs an owner.

Build an inclusive culture.

Growth is the norm, growth is the baseline. People do their best work when they feel a sense of belonging. An inclusive culture is one where everyone feels safe to ask questions, explore, and make mistakes; where everyone is held to the same high standard, and given the support and encouragement they need to achieve their goals.

Diverse teams are resilient teams.

Yeah, a team of super-senior engineers who all share a similar background can move incredibly fast, but a monoculture is fragile. Someone gets sick, someone gets pregnant, you start to grow and you need to integrate people from other backgrounds and the whole team can get derailed — fast.

When your teams are used to operating with a mix of genders, racial backgrounds, identities, age ranges, family statuses, geographical locations, skill sets, etc — when this is just table stakes, standard operating procedure — you’re better equipped to roll with it when life happens.

Assemble engineering teams from a range of levels.

The best engineering teams aren’t top-heavy with staff engineers and principal engineers. The best engineering teams are ones where nobody is running on autopilot, banging out a login page for the 300th time; everyone is working on something that challenges them and pushes their boundaries. Everyone is learning, everyone is teaching, everyone is pushing their own boundaries and growing. All the time.

By the way — all of that work you put into making your systems resilient, well-designed, and humane is the same work you would need to do to help onboard new engineers, develop junior talent, or let engineers move between teams.

It gets used and reused. Over and over and over again.

The only meaningful measure of productivity is impact to the business

The only thing that actually matters when it comes to engineering productivity is whether or not you are moving the business materially forward.

Which means…we can’t do this in a vacuum. The most important question is whether or not we are working on the right thing, which is a problem engineering can’t answer without help from product, design, and the rest of the business.

Software engineering isn’t about writing lots of lines of code, it’s about solving business problems using technology.

Senior and intermediate engineers are actually the workhorses of the industry. They move the business forward, step by step, day by day. They get to put their heads down and crank instead of constantly looking around the org and solving coordination problems. If you have to be a staff+ engineer to move the product forward, something is seriously wrong.

Great engineering orgs mint world-class engineers

A great engineering org is one where you don’t HAVE to be one of the best engineers in the world to have a lot of impact. But — rather ironically — great engineering orgs mint world class engineers like nobody’s business.

The best engineering orgs are not the ones with the smartest, most experienced people in the world, they’re the ones where normal software engineers can consistently make progress, deliver value to users, and move the business forward, day after day.

Places where engineers can get shit done and have a lot of impact are a magnet for top performers. Nothing makes engineers happier than building things, solving problems, making progress.

If you’re lucky enough to have world-class engineers in your org, good for you! Your role as a leader is to leverage their brilliance for the good of your customers and your other engineers, without coming to depend on their brilliance. After all, these people don’t belong to you. They may walk out the door at any moment, and that has to be okay.

These people can be phenomenal assets, assuming they can be team players and keep their egos in check. Which is probably why so many tech companies seem to obsess over identifying and hiring them, especially in Silicon Valley.

But companies categorically overindex on finding these people after they’ve already been minted, which ends up reinforcing and replicating all the prejudices and inequities of the world at large. Talent may be evenly distributed across populations, but opportunity is not.

Don’t hire the “best” people. Hire the right people.

We (by which I mean the entire human race) place too much emphasis on individual agency and characteristics, and not enough on the systems that shape us and inform our behaviors.

I feel like a whole slew of issues (candidates self-selecting out of the interview process, diversity of applicants, etc) would be improved simply by shifting the focus on engineering hiring and interviewing away from this inordinate emphasis on hiring the BEST PEOPLE and realigning around the more reasonable and accurate RIGHT PEOPLE.

It’s a competitive advantage to build an environment where people can be hired for their unique strengths, not their lack of weaknesses; where the emphasis is on composing teams rather than hiring the BEST people; where inclusivity is a given both for ethical reasons and because it raises the bar for performance for everyone. Inclusive culture is what actual meritocracy depends on.

This is the kind of place that engineering talent (and good humans) are drawn to like a moth to a flame. It feels good to ship. It feels good to move the business forward. It feels good to sharpen your skills and improve your craft. It’s the kind of place that people go when they want to become world class engineers. And it’s the kind of place where world class engineers want to stick around, to train up the next generation.

<3, charity

 

In Praise of “Normal” Engineers

On How Long it Takes to Know if a Job is Right for You or Not

A few eagle-eyed readers have noticed that it’s been 4 weeks since my last entry in what I have been thinking of as my “niblet series” — one small piece per week, 1000 words or less, for the next three months.

This is true. However, I did leave myself some wiggle room in my original goal, when I said “weeks when I am not traveling”, knowing I was traveling 6 of the next 7 weeks. I was going to TRY to write something on the weeks I was traveling, but as you can see, I mostly did not succeed. Oh well!

Honestly, I don’t feel bad about it. I’ve written well over 1k words on bsky over the past two weeks in the neverending thread on the costs and tradeoffs of remote work. (A longform piece on the topic is coming soon.) I also wrote a couple of lengthy internal pieces.

This whole experiment was designed to help me unblock my writing process and try out new habits, and I think I’m making progress. I will share what I’m learning at a later date, but for now: onward!

How long does it take to form an impression of a new job?

This week’s niblet was inspired by a conversation I had yesterday with an internet friend. To paraphrase (and lightly anonymize) their question:

“I took a senior management role at this company six months ago. My search for this role was all about values alignment, from company mission to leadership philosophy, and the people here said all the right things in the process. But it’s just not clicking.

It’s only been six months, but it’s starting to feel like it might not work out. How much longer should I give it?”

Zero. You should give it 0 time. You already know, and you’ve known for a long time; it’s not gonna change. I’m sorry. 💔

I’m not saying you should quit tomorrow, a person needs a paycheck, but you should probably start thinking in terms of how to manage the problem and extricate yourself from it, not like you’re waiting to see if it will be a good fit.

Every job I’ve ever had has made a strong first impression

I’ve had…let’s see…about six different employers, over the course of my (post-university) career.

Every job I’ve ever taken, I knew within the first week whether it was right for me or not. That might be overstating things a bit (memory can be like that). But I definitely had a strong visceral reaction to the company within days after starting, and the rest of my tenure played out more or less congruent with that reaction.Progress not Perfection

The first week at EVERY job is a hot mess of anxiety and nerves and second-guessing yourself and those around you. It’s never warm fuzzies. But at the jobs I ended up loving and staying at long term, the anxiety was like “omg these people are so cool and so great and so fucking competent, I hope I can measure up to their expectations.”

And then there were the jobs where the anxiety I felt was more like a sinking sensation of dread, of “oooohhh god I hope this is a one-off and not the kind of thing I will encounter every day.”

🌸 There was the job where they had an incident on my very first day, and by 7 pm I was like “why isn’t someone telling me I should go home?” There was literally nothing I could do to help, I was still setting up my accounts, yet I had the distinct impression I was expected to stay.

This job turned out to be stereotypically Silicon Valley in the worst ways, hiring young, cheap engineers and glorifying coding all night and sleeping under your desks.

🌼 There was the job where they were walking me through a 50-page Microsoft Word doc on how to manage replication between DB nodes, and I laughed a little, and looked for some rueful shared acknowledgement of how shoddy this was…but I was the only one laughing.

That job turned out to be shoddy, ancient, flaky tech all the way down, with comfortable, long-tenured staff who didn’t know (and did NOT want to hear) how out of date their tech had become.

Over time, I learned to trust that intuition

Around the time I became a solidly senior engineer, I began to reflect on how indelible my early impressions of each job had been, and how reliable those impressions had turned out to be.Communicate Positive Intent

To be clear, I don’t regret these jobs. I got to work with some wonderful people, and I got to experience a range of different organizational structures and types. I learned a lot from every single one of my jobs.

Perhaps most of all, I learned how to sniff out particular environments that really do not work for me, and I never made the same mistake twice.

Companies can and do change dramatically. But absent dramatic action, which can be quite painful, they tend to drift along their current trajectory.

This matters even more for managers

This is one of those ways that I think the work of management is different from the work of engineering. As an experienced IC, it’s possible to phone it in and still do a good job. As long as you’re shipping at an acceptable rate, you can check out mentally and emotionally, even work for people or companies you basically despise.

Lots of people do in fact do this. Hell, I’ve done it. You aren’t likely to do the best work of your life under these circumstances, but people have done far worse to put food on the table.

An IC can wall themselves off emotionally and still do acceptable work, but I’m not sure a manager can do the same.

Alignment *is* the job of management

As a manager, you literally represent the company to your team and those around you. You don’t have to agree with every single decision the company makes, but if you find yourself constantly having to explain and justify things the company has done that deeply violate your personal beliefs or ethics, it does you harm.

Some managers respond to a shitty corporate situation by hunkering down and behaving like a shit umbrella; doing whatever they can to protect their people, at the cost ofif it hurts...do it more undermining the company itself. I don’t recommend this, either. It’s not healthy to know you walk around every day fucking over one of your primary stakeholders, whether it’s the company OR  your teammates.

There are also companies that aren’t actually that bad, but you just aren’t aligned with them. That’s fine. Alignment matters a lot more for managers than for ICs, because alignment is the job.

Management is about crafting and tending to complex sociotechnical systems. No manager can do this alone. Having a healthy, happy team of direct reports is only a fraction of the job description. It’s not enough. You can and should expect more.

What can you learn from the experience?

I asked my friend to think back to the interview process. What were the tells? What do they wish they had known to watch out for?

They thought for a moment, then said:

“Maybe the fact that the entire leadership team had been grown or promoted from within. SOME amount of that is terrific, but ALL of it might be a yellow flag. The result seems to be that everyone else thinks and feels the same way…and I think differently.”

This is SO insightful.

It reminds me of all the conversations Emily and I have had over the years, on how to balance developing talent from within vs bringing in fresh perspectives, people who have already seen what good looks like at the next stage of growth, people who can see around corners and challenge us in different ways.

This is a tough thing to suss out from the outside, especially when the employer is saying all the right things. But having an experience like this can inoculate you from an entire family of related mistakes. My friend will pick up on this kind of insularity from miles away, from now on.

Bad jobs happen. Interviews can only predict so much. A person who has never had a job they disliked is a profoundly lucky person. In the end, sometimes all you can take is the lessons you learned and won’t repeat.

The pig is committed

Have you ever heard the metaphor of the chicken vs the pig? The chicken contributes an egg to breakfast, the pig contributes bacon. The punch line goes something like, “the chicken is involved, but the pig is committed!”

It’s vivid and a bit over the top, but I kept thinking about it while writing this piece. The engineer contributes their labor and output to move the company forward, but the manager contributes their emotional and relational selves — their humanity — to serve the cause.

You only get one career. Who are you going to give your bacon to?

On How Long it Takes to Know if a Job is Right for You or Not

On Pronouns, Policies and Mandates

Hi friends! We’re on week three of my 12-week practice in writing one bite-sized topic per week — scoping it down, writing straight through, trying real hard to avoid over-writing or editing down to a pulp.

Week 1 — “On Writing, Social Media, and Finding the Line of Embarrassment
Week 2 — “On Dropouts and Bootstraps

Three points in a row makes a line, and three posts in a row called “On [Something or Other]” is officially a pattern.

It was an accidental repeat last week (move fast and break things! 🙈), but I think I like it, so I’m sticking with it.

Next on the docket: pronouns and mandates

This week I would like to talk about pronouns (as in “my name is Charity, my pronouns are she/her or they/them”) and pronoun mandates, in the context of work.

Here’s where I stand, in brief:

  • Making it safe to disclose the pronouns you use: ✨GOOD✨
  • Normalizing the practice of sharing your pronouns: ✨GOOD✨
  • Mandating that everyone share their pronouns: ✨BAD✨

This includes soft mandates, like when a manager or HR asks everyone at work to share their pronouns when introducing themselves, or making pronouns a required field in email signatures or display names.

I absolutely understand that people who do this are acting in good faith, trying to be good allies. But I do not like it. 😡 And I think it can massively backfire!

Here are my reasons.

I resent being forced to pick a side in public

I have my own gender issues, y’all. Am I supposed to claim “she/her” or “they/them”? Ugh, I don’t know. I’ve never felt any affinity with feminine pronouns or identity, but I don’t care enough to correct anyone or assert a preference for they/them. Ultimately, the strongest feeling I have about my gender is apathy/discomfort/irritation. Maybe that will change someday, maybe it won’t, but I resent being forced to pick a side and make some kind of public declaration when I’m just trying to do my goddamn job. My gender doesn’t need to be anyone else’s business.

I totally acknowledge that it is valuable for cis people to help normalize the practice by sharing their pronouns. (It never fails to warm the cockles of my cold black heart when I see a graying straight white dude lead with “My pronouns are he/him” in his bio. Charmed! 😍)

If I worked at a company where this was not commonly done, I would suck it up and take one for the team. But I don’t feel the need, because it is normalized here. We have loads of other queer folks, my cofounder shares her pronouns. I don’t feel like I’m hurting anyone by not doing it myself.

Priming people with gender cues can be…unwise

One of the engineering managers I work with, Hannah Henderson, once told me that she has always disliked pronoun mandates for a different reason. Research shows that priming someone to think of you as a woman first and foremost generally leads them to think of you as being less technical, less authoritative, even less competent.

Great, just what we need.

What about people who don’t know, or aren’t yet out?

Some people may be in a transitional phase, or may be in the process of coming out as trans or genderqueer or nonbinary, or maybe they don’t know yet. Gender is a deeply personal question, and it’s inappropriate to force people to take a stand or pick a side in public or at work.

If **I** feel this way about pronoun mandates (and keep in mind that I am queer, have lived in San Francisco for 20 years, and am married to a genderqueer trans person), I can’t imagine how offputting and irritating these mandates must be to someone who holds different values, or comes from a different cultural background.

You can’t force someone to be a good ally

As if that wasn’t enough, pronoun mandates also have a flattening effect, eliminating useful signal about who is willing to stand up and identify themselves as someone who is a queer ally, and/or is relatively informed about gender issues.

As a friend commented, when reviewing a draft of this post: “Mandating it means we can’t look around the room and determine who might be friendly or safe, while also escalating resentment that bigots hold towards us.”

A couple months back I wrote a long (LONG) essay detailing my mixed feelings about corporate DEI initiatives. One of the points I was trying to land is how much easier it is to make and enforce rules, if you’re in a position with the power to do so, than to win hearts and minds. Rules always have edge cases and unintended consequences, and the backlash effect is real. People don’t like being told what to do.

Pronoun mandates were at the top of my mind when I wrote that, and I’ve been meaning to follow up and unpack this ever since.

Til next week, when we’ll talk “On something or some other thing”,
~charity💕

(835 words! 🙌)

On Pronouns, Policies and Mandates

On Dropouts and Bootstraps

In my early twenties I had a cohort of friends and coworkers, all Silicon Valley engineers, all quite good at their jobs, all college dropouts. We developed a shared conviction that only losers got computer science degrees. This sounds like a joke, or a self-defense mechanism, but it was neither. We were serious.

We held CS grads in contempt, as a class. We privately mocked them. When interviewing candidates, we considered it a knock against someone if they graduated — not an insuperable one by any means, but certainly a yellow flag, something to be probed in the interview process, to ensure they had good judgment and were capable of learning independently and getting shit done, despite all evidence to the contrary.

We didn’t look down on ALL college graduates (that would be unreasonable). If you went to school to study something like civil engineering, or philosophy, or Russian literature, good for you! But computers? Everything in my experience led me to conclude that sitting in a classroom studying computers was a waste of time and money.

I had evidence! I worked my way through school — as the university sysadmin, at a local startup — and I had always learned soooo much more from my work than my classes. The languages and technologies they taught us were consistently years out of date. Classes were slow and plodding. Our professors lectured on and on about “IN-dustry” in a way that made it abundantly clear that they had no recent, relevant experience.

College dropouts: the original bootstrappers

The difference became especially stark after I spent a year working in Silicon Valley. I then returned to school, fully intending to finish and graduate, but I could not focus; I was bored out of my skull.

How could anyone sit through that amount of garbage? Wouldn’t anyone with an ounce of self-respect and intrinsic motivation have gotten up off their butts and learned what they needed to know much faster on their own? For fuck’s sake! just google it!

My friends and I rolled our eyes at each other and sighed over these so-called software engineers with degrees, who apparently needed their learning doled out in small bites and spoon-fed to them, like a child. Who wanted to work with someone with such a high tolerance for toil and bullshit?

Meanwhile we, the superior creatures, had simply figured out whatever the fuck we needed to learn by reading the source code, reading books and manuals, trying things out. We pulled OUR careers up by our own bootstraps, goddammit. Why couldn’t they? What was WRONG with them??

We knew so many deeply mediocre software engineers who had gotten their bachelor’s degree in computer science, and so many exceptional engineers with arts degrees or no degrees, that it started to feel like a rule or something.

Were they cherrypicked examples? Of course they were. That’s how these things work.

People are really, really good at justifying their status

Ever since then, I’ve met wave after wave of people in this industry who are convinced they know how to sift “good” talent from “bad” via easily detected heuristics. They’re mostly bullshit.

Which is not to say that heuristics are never useful, or that any of us can afford to expend infinite amounts of time sifting through prospects on the off chance that we miss a couple quality candidates. They can be useful, and we cannot.

However, I have retained an abiding skepticism of heuristics that serve to reinforce existing power structures, or ones that just so happen to overlap with the background of the holder of said heuristics.

Those of us who work in tech are fabulously fortunate; in terms of satisfying, remunerative career outcomes, we are easily in the top .0001% of all humans who have ever lived. Maybe this is why so many of us seem to have some deep-seated compulsion to prove that we belong here, no really, people like me deserve to be here.

This calls for some humility

If nothing else, I think it calls for some humility. I don’t feel like I “deserve” to be here. I don’t think any of us do. I think I worked really fucking hard and I got really fucking lucky. Both can be true. Some of the smartest kids I grew up with are now pumping gas or dead. Almost none of the people I grew up with ever reached escape velocity and made it out of our small town.

When I stop to think about it, it scares me how lucky I got. How lucky I am to have grown up when I did, to have entered tech when I did, when the barriers to entry were so low and you really could just learn on the job, if you were willing to work your ass off. I left home when I was 15 to go to college, and put myself through largely on minimum wage jobs. Even five years later, I couldn’t have done that.

There was a window of time in the 2000s when tech was an escalator to the middle class for a whole generation of weirdos, dropouts and liberal arts misfits. That window has been closed for a while now. I understand why the window closed, and why it was inevitable (software isn’t a toy anymore), but it’s still.. bittersweet.

I guess I’m just really grateful to be here.

~charity

Experiment update

As I wrote last week, I’m trying to reset my relationship with writing, by publishing one short blog post per week: under 1000 words, minimal editing. And there marks week 2, 942 words.

See you next week.

Week 1 — “On Writing, Social Media, and Finding the Line of Embarrassment

 

 

 

On Dropouts and Bootstraps

On Writing, Social Media, and Finding the Line of Embarrassment

Brace yourself, because I’m about to utter a sequence of words I never thought I would hear myself say:

I really miss posting on Twitter.

I really, really miss it.

It’s funny, because Twitter was never not a trash fire. There was never a time when it felt like we were living through some kind of hallowed golden age of Twitter. I always felt a little embarrassed about the amount of time I spent posting.

Or maybe you only ever really see golden ages in hindsight.

I joined Twitter in 2009, and was an intermittent user for years. But it was after we started working on Honeycomb that Twitter became a lifeline, a job, a huge part of my everyday life.

Without Twitter, there would be no Honeycomb

Every day I would leave the house, look down at my phone, and start pecking out tweets as I walked to work. I turned out these mammoth threads about instrumentation, cardinality, storage engines, etc. Whatever was on my mind that day, it fed into Twitter.

In retrospect, I now realize that I was doing things like “outbounding” and “product marketing” and “category creation”, but at the time it felt more like oxygen.

Working out complex technical concepts in public, in real time, seeing what resonated, batting ideas back and forth with so many other smart, interesting people online…it was heady shit.

In the early days, we actually thought that Honeycomb-style observability (high cardinality, slice-and-dice, explorability, etc) was something only super large, multi-tenant platforms would ever care about or be willing to pay for. It was the conversations we were having on Twitter, the intensity of people’s reactions, that made us realize that no, actually; this was fast becoming an everybody problem.

Twitter was my most reliable source of dopamine

It’s impossible to talk about Twitter’s impact on my life and career without also acknowledging the ways I used it to self-medicate.

My ADHD was unmanaged, unmedicated, and unknown to me in those years. In retrospect, I can see that my only tool as an engineer was hyperfocus, and I rode that horse into the ground. When I unexpectedly became CEO, my job splintered into a million little bite sized chunks of time, and hyperfocus was no longer available to me. The tools I did have were Twitter and sleep deprivation.

Lack of sleep, it turns out, can wind me down and help me focus. If I’ve been awake for over 24 hours, I can buckle down and force myself to grind through things like email, expense reports, or writing marketing copy. Sleep deprivation is not pleasant, it’s actually really fucking painful, but it works. So I did it. From 2016 to 2020, I slept only once every two or three days. (People always think I am exaggerating when I say this, but people closer to me know that this is probably an understatement.)

But Twitter, you dear, dysfunctional hellsite… Twitter could wind me up.

I would go for a walk, pound out a fifty-tweet thread, and arrive at my destination feeling all revved up.

I picked fights, I argued. I was combative and aggressive in public, and I loved it. I regret some of it now; I burned some good relationships, and I burned out my adrenal glands. But I would sit down at my desk feeling high on dopamine, and I could channel that high into focus. It’s the only way I got shit done.

I got my ADHD diagnosis in 2020 (thank the gods). Since then I’ve done medication, coaching, therapy in several modalities, cats… I’ve tried it all, and a lot of it has helped. I sleep every single night now.

That world is gone, and it’s not coming back

The social media landscape has fragmented, and maybe that’s a good thing. There is nothing today that scratches the same itch for me as Twitter did, in its golden years. And maybe I don’t need it in quite the same way as I used to.

Most of the people I used to love talking with on X seem to have abandoned it to the fascists. LinkedIn is performatively corporate and has no soul. I’m still on Bluesky, but it’s a bit of an echo chamber and people mostly talk about politics; that is not what I go to social media for. The noisy, combative tech scene I loved doesn’t really seem to exist anymore.

These days I use social media less than ever, but I am learning that my writing is more important to me than ever. Which is forcing me to reckon with the fact that my writing process may no longer fit or serve the function I need it to.

Most of those epic threads I put so much time and energy into crafting have vanished into the ether. The few that I bothered to convert into essay format are the only ones that have endured.

I’ve been writing in public for ten years now

Do you ever hear yourself say something, causing you to pause, surprised: “I guess that’s a thing I believe”?

A couple months ago, Cynthia Dunlop asked me to share any thoughts I might have on my writing, as part of the promotional tour for “Writing for Developers: Blogs That Get Read” (p.s., great book!). I wrote back:

There are very few things in life that I am prouder of than the body of writing I have developed over the past 10 years.

When I look back over things I have written, I feel like I can see myself growing up, my mental health improving, I’m getting better at taking the long view, being more empathetic, being less reactive… I’ve never graduated from anything in my life, so to me, my writing kind of externalizes the progress I’ve made as a human being. It’s meaningful to me.

Huh. Turns out that’s a thing I believe. 🤔

I wrote my first post on this site in December of 2015. It’s crazy to look back on all the different things I have written about here over the past ten years — book reviews, boba recipes, technology, management, startup life, and more.

Even more mindblowing is when I look at my drafts folder, my notes folders. The hundreds of ideas or pieces I wanted to write about, or started writing about, but never found the time to polish or finish. Whuf.

I need to learn how to write shorter, faster pieces, without the buffer of social media

From 2015 to somewhere in the 2021-2023 timeframe, thoughts and snippets of writing were pouring out of me every day, mostly feeding the Twitter firehose. Only a few of those thoughts ever graduated into blog post form, but those few are the ones that have endured and had the most impact.

Over the past 2-4 years, I’ve been writing less frequently, less consistently, and mostly in blog post form. My posts, meanwhile, have gotten longer and longer. I keep shipping these 5000-9000-word monstrosities (I’m so sorry 🤦). I sometimes wonder who, if anyone, ever reads the whole thing.

The problem is that I keep writing myself into a ditch. I pick up a topic, and start writing, and somehow it metastasizes. It expands to consume all available time and space (and then some). By the time I’ve finished editing it down, weeks if not months have passed, and I have usually grown to loathe the sight of it.

For most of my adult life, I’ve relied on hard deadlines and panic to drive projects to completion, or to determine the scope of a piece. I’ve relied on anger and adrenaline rushes to fuel my creative juices, and due dates and external pressure to get myself over the finish line.

And what does that finish line look like? Running out of time, of course! I know I’m done because I have run out of time to work on it. No wonder scoping is such a problem for me.

A three month experiment in writing bite sized pieces

I need to learn to write in a different way. I need to learn to draft without twitter, scope without deadlines. Over the next five years, I want to get a larger percentage of my thoughts shipped in written form, and I don’t want them to evaporate into the ether of social media. This means I need to make some changes.

  1. write shorter pieces
  2. spend less time writing and editing
  3. find the line of embarrassment, and hug it.

For the next three months, I am going to challenge myself to write one blog post per week (travel weeks exempt). I will try to cap each one under 1000 words (but not obsess over it, because the point is to edit less).

I’m writing this down as a public commitment and accountability mechanism.

So there we go, 1473 words. Just above the line of embarrassment.

See you here next week.

On Writing, Social Media, and Finding the Line of Embarrassment

Another observability 3.0 appears on the horizon

Groan. Well, it’s not like I wasn’t warned. When I first started teasing out the differences between the pillars model and the single unified storage model and applying “2.0” to the latter, Christine was like “so what is going to stop the next vendor from slapping 3.0, 4.0, 5.0 on whatever they’re doing?”

Matt Klein dropped a new blog post last week called “Observability 3.0”, in which he argues that bitdrift’s Capture — a ring buffer storage on mobile devices — deserves that title. This builds on his previous blog posts: “1000x the telemetry at 0.01x the cost”, “Why is observability so expensive?”, and “Reality check: Open Telemetry is not going to solve your observability woes”, wherein he argues that the model of sending your telemetry to a remote aggregator is fundamentally flawed.

I love Matt Klein’s writing — it’s opinionated, passionate, and deeply technical. It’s a joy to read, full of fun, fiery statements about the “logging industrial complex” and backhanded… let’s call them “compliments”… about companies like ours. I’m a fan, truly.

In retrospect, I semi regret the “o11y 2.0” framing

Yeah, it’s cheap and terribly overdone to use semantic versioning as a marketing technique. (It worked for Tim O’Reilly with “Web 2.0”, but Tim O’Reilly is Tim O’Reilly — the exception that proves the rule.) But that’s not actually why I regret it.

I regret it because a bunch of people — vendors mostly, but not entirely — got really bristly about having “1.0” retroactively applied to describe the multiple pillars model. It reads like a subtle diss, or devaluation of their tools.Unstructured Logs Go Here (Trash)

One of the principles I live my life by is that you should generally call people, or groups of people, what they want to be called.

That is why, moving forwards, I am going to mostly avoid referring to the multiple pillars model as “o11y 1.0”, and instead I will call it the … multiple pillars model. And I will refer to the unified storage model as the “unified or consolidated storage model, sometimes called ‘o11y 2.0’”.

(For reference, I’ve previously written about why it’s time to version observability, what the key difference is between o11y 1.0 vs 2.0, and had a fun volley back and forth with Hazel Weakly on versioning observabilities: mine, hers.)

Why do we need new language?

It is clearer than ever that a sea change is underway when it comes to how telemetry gets collected and stored. Here is my evidence (if you have evidence to the contrary or would like to challenge me on this, please reach out — first name at honeycomb dot io, email me!!):

  • Every single observability startup that was founded before 2021, that still exists, was built using the multiple pillars model … storing each type of signal in a different location, with limited correlation ability across data sets. (With one exception: Honeycomb.)
  • Every single observability startup that was founded after 2021, that still exists, was built using the unified storage model, capturing wide, structured log events, stored in a columnar database. (With one exception: Chronosphere.)

The major cost drivers in an o11y 1.0 — oop, sorry, in a “multiple pillars” world, are 1) the number of tools you use, 2) cardinality of your data, and 3) dimensionality of your data — or in other words, the amount of context and detail you store about your data, which is the most valuable part of the data! You get locked in a zero sum game between cost and value.

containers, because the dumpsters could no longer contain the firesThe major cost drivers in a unified storage world, aka “o11y 2.0”, are 1) your traffic, 2) your architecture, and 3) density of your instrumentation. This is important, because it means your cost growth should roughly align with the growth of your business and the value you get out of your telemetry.

This is a pretty huge shift in the way we think about instrumentation of services and levers of cost control, with a lot of downstream implications. If we just say “everything is observability”, it robs engineers of the language they need to make smart decisions about instrumentation, telemetry and tools choices. Language informs thinking and vice versa, and when our cognitive model changes, we need language to follow suit.

(Technically, we started out by defining observability as differentiated from monitoring, but the market has decided that everything is observability, so … we need to find new language, again. 😉)

Can we just … not send all that data?

My favorite of Matt’s blog posts is “Why is observability so expensive?” wherein he recaps the last 30 years of telemetry, gives some context about his work with Envoy and the separation of control planes / data planes, all leading up to this fiery proposition:

“What if by default we never send any telemetry at all?”

As someone who is always rooting for the contrarian underdog, I salute this. 🫡

As someone who has written and operated a ghastly amount of production services, I am not so sure.My life was full of known-unknowns until you came into it

Matt is the cofounder and CTO of Bitdrift, a startup for mobile observability. And in the context of mobile devices and IoT, I think it makes a lot of sense to gather all the data and store it at the origin, and only forward along summary statistics, until or unless that data is requested in fine granularity. Using the ring buffer is a stroke of genius.

Mobile devices are strictly isolated from each other, they are not competing with each other for shared resources, and the debugging model is mostly offline and ad hoc. It happens whenever the mobile developer decides to dig in and start exploring.

It’s less clear to me that this model will ever serve us well in the environment of highly concurrent, massively multi-tenant services, where two of the most important questions are always what is happening right now, and what just changed?

Even the 60-second aggregation window for traditional metrics collectors is a painful amount of lag when the site is down. I can’t imagine waiting to pull all the data in from hundreds or thousands of remote devices just to answer a question. And taking service isolation to such an extreme effectively makes traces impossible.

The hunger for more cost control levers is real

I think there’s a kernel of truth there, which is that the desire to keep a ton of rich telemetry detail about a fast-expanding footprint of data in a central location is not ultimately compatible with what people are willing or able to pay.Slow is the new down

The fatal flaw of the multiple pillars model is that your levers of control consist of deleting your most valuable data: context and detail. The unified storage (o11y 2.0) model advances the state of the art by giving you tools that let you delete your LEAST valuable data, via tail sampling.

In a unified storage model, you should also have to store your data only once, instead of once per tool (Gartner data shows that most of their clients are using 10-20 tools, which is a hell of a cost multiplier.)

But I also think Matt’s right to say that these are only incremental improvements. And the cost levers I see emerging in the market that I’m most excited about are model agnostic.

Telemetry pipelines, tiered storage, data governance

The o11y 2.0 model (with no aggregation, no time bucketing, no indexing jobs) allows teams to get their telemetry faster than ever… but it does this by pushing all aggregation decisions from write time to read time. Instead of making a bunch of decisions at the instrumentation level about how to aggregate and organize your data… you store raw, wide structured event data, and perform ad hoc aggregations at query time.

Test in prod or make your users do it for youMany engineers have argued that this is cost-prohibitive and unsustainable in the long run, and…I think they are probably right. Which is why I am so excited about telemetry pipelines.

Telemetry pipelines are the slider between aggregating metrics at write time (fast, cheap, painfully limited) and shipping all your raw, rich telemetry data off to a vendor, for aggregating at read time.

Sampling, too, has come a long way from its clumsy, kludgey origins. Tail-based sampling is now the norm, where you make decisions about what to retain or not only after the request has completed. The combination of fine-grained sampling + telemetry pipelines + AI is incredibly promising.

I’m not going to keep going into detail here because I’m currently editing down a massive piece on levers of cost control, and I don’t want to duplicate all that work (or piss off my editors). Suffice it to say, there’s a lot of truth to what Matt writes… and also he has a way of skipping over all the details that would complicate or contradict his core thesis, in a way I don’t love. This has made me vow to be more careful in how I represent other vendors’ offerings and beliefs.

Money is not always the most expensive resource

I don’t think we’re going to get to “1000x the telemetry at 0.01x the cost”, as Matt put it, unless we are willing to sacrifice or seriously compromise some of the other things we hold dear, like the ability to debug complex systems in real time.

Gartner recently put out a webinar on controlling observability costs, which I very much appreciated, because it brought some real data to what has been a terribly vibes-based conversation. They pointed out that one of the biggest drivers of o11y costs has been that people get attached to it, and start using it heavily. You can’t claw it back.if it hurts...do it more

I think this is a good thing — a long overdue grappling with the complexity of our systems and the fact that we need to observe it through our tools, not through our mental map or how we remember it looking or behaving, because it is constantly changing out from under us.

I think observability engineering teams are increasingly looking less like ops teams, and more like data governance teams, the purest embodiment of platform engineering goals.

When it comes to developer tooling, cost matters, but it is rarely the most important thing or the most valuable thing. The most important things are workflows and cognitive carrying costs.

Observability is moving towards a data lake model

Whatever you want to call it, whatever numeric label you want to slap on it, I think the industry is clearly moving in the direction of unified storage — a data lake, if you will, where signals are connected to each other, and particular use cases are mostly derived at read time instead of write time. Where you pay to store each request only one time, and there are no dead ends between signals.

Matt wrote another post about how OpenTelemetry wasn’t going to solve the cost crisis inotel me again why i should instrument my code o11y … but I think that misses the purpose of OTel. The point of OTel is to get rid of vendor lock-in, to make it so that o11y vendors compete for your business based on being awesome, instead of impossible to get rid of.

Getting everyone’s data into a structured, predictable format also opens up lots of possibilities for tooling to feel like “magic”, which is exciting. And opens some entirely different avenues for cost controls!

In my head, the longer term goals for observability involve unifying not just data for engineering, but for product analytics, business forecasting, marketing segmentation… There’s so much waste going on all over the org by storing these in siloed locations. It fragments people’s view of the world and reality. As much as I snarked on it at the time, I think Hazel Weakly’s piece on “The future of observability is observability 3.0” was incredibly on target.

One of my guiding principles is that ✨data is made valuable by context.✨ When you store it densely packed together — systems, app, product, marketing, sales — and derive insights from a single source of truth, how much faster might we move? How much value might we unlock?

I think the new few years are going to be pretty exciting.

Another observability 3.0 appears on the horizon

Corporate “DEI” is an imperfect vehicle for deeply meaningful ideals

I have not thought or said much about DEI (Diversity, Equity and Inclusion) over the years. Not because I don’t care about the espoused ideals — I suppose I do, rather a lot — but because corporate DEI efforts have always struck me as ineffective and bland; bolted on at best, if not actively compensating for evil behavior.

I know how crisis PR works. The more I hear a company natter on and on about how much it cares for the environment, loves diversity, values integrity, yada yada, the more I automatically assume they must be covering their ass for some truly heinous shit behind closed doors.

My philosophy has historically been that actions speak louder than words. I would one million times rather do the work, and let my actions speak for themselves, than spend a lot of time yapping about what I’m doing or why.

I also resent being treated like an expert in “diversity stuff”, which I manifestly am not. As a result, I have always shrugged off any idea that I might have some personal responsibility to speak up or defend these programs.

Recent events (the tech backlash, the govt purge) have forced me to sit down and seriously rethink my operating philosophy. It’s one thing to be cranky and take potshots at corporate DEI efforts when they seem ascendant and powerful; it’s another when they are being stamped out and reviled in the public mind.

Actually, my work does not speak for itself

It took all of about thirty seconds to spot my first mistake, which is that no, actually, my work does not and cannot speak for itself. 🤦 No one’s does, really, but especially not when your job literally consists of setting direction and communicating priorities.

Maybe this works ok at a certain scale, when pretty much anyone can still overhear or participate in any topic they care about. But at some point, not speaking up at the company level sends its own message.

If you don’t state what you care about, how are random employees supposed to guess whether the things they value about your culture are the result of hard work and careful planning, or simply…emergent properties? Even more importantly, how are they supposed to know if your failures and shortcomings are due to trying but failing or simply not giving a shit?

These distinctions are not the most important (results will always matter most), but they are probably pretty meaningful to a lot of your employees.

The problem isn’t the fact that companies talk about their values, it’s that they treat it like a branding exercise instead of an accountability mechanism.

Fallacy #1: “DEI is the opposite of excellence or high performance”

There are two big category errors I see out there in the world. To be clear, one is a lot more harmful (and a lot more common, and increasingly ascendant) than the other, but both of these errors do harm.

The first error is what I heard someone call the “seesaw fallacy”: the notion that DEI and high performance are somehow linked in opposition to each other, like a seesaw; getting more of one means getting less of the other.

This is such absolute horseshit. 🙄 It fails basic logic, as well as not remotely comporting with my experience. You can kind of see where they’re coming from, but only by conveniently forgetting that every team and every company is a system.

Nobody is born a great engineer, or a great designer, or a great employee of any type. Great contributors are not born, they are forged — over years upon years of compounding experiences: education, labor, hard work, opportunities and more.

So-called “merit-based” hiring processes act like outputs are the only thing that matter; as though the way people show up on your doorstep is the way they were fated to be and the way they will always be. They don’t see people as inputs to the system — people with potential to grow and develop, people who may have been held back or disregarded in the past, people who will achieve a wide range of divergent outcomes based on the range of different experiences they may have in your system.

Fallacy #2: “DEI is the definition of excellence or high performance”

There is a mirror image error on the other end of the spectrum, though. You sometimes hear DEI advocates talk as though if you juuuust build the most diverse teams and the most inclusive culture, you will magically build better products and achieve overwhelming success in all of your business endeavors.

This is also false. You still have to build the fucking business! Your values and culture need to serve your business and facilitate its continued existence and success.

With the small caveat that … DEI isn’t the way you define excellence unless the way you define excellence is diversity, equity and inclusion, because “excellence” is intrinsically a values statement of what you hold most dear. This definition of excellence would not make sense for a profit-driven company, but valuing diverse teams and an inclusive culture over money and efficiency is a perfectly valid and coherent stance for a person to take, and lots of people do feel this way!

There is no such thing as the “best” or “right” values. Values are a way of navigating territory and creating alignment where there IS no one right answer. People value what they value, and that is their right.

DEI gets caricatured in the media as though the goal of DEI is diverse teams and equitable outcomes. But DEI is better seen as a toolkit. Your company values ought to help you achieve your goals, and your goals as a business usually some texture and nuance beyond just profit. At Honeycomb, for example, we talk about how we can “build a company people are proud to be part of”. DEI can help with this.

Let’s talk about MEI (Merit, Excellence and Intelligence)

Until last month I remained blissfully unaware of MEI, or “Merit, Excellence and Intelligence” (sic), and if you were too until just this moment, I apologize for ruining your party.

This idea that DEI is the opposite of MEI is particularly galling to me. I care a lot about high-performing teams and building an environment where people can do the best work of their lives. That is why I give a shit about building an inclusive culture.

An inclusive culture is one that sets as many people as possible up to soar and succeed, not just the narrow subset of folks who come pre-baked with all of life’s opportunities and advantages. When you get better at supporting folks and building a culture that foregrounds growth and learning, this both raises the bar for outcomes for everyone, and broadens the talent base you can draw from.

Honestly, I can’t think of anything less meritocratic than simply receiving and replicating all of society’s existing biases. Do you have any idea how much talent gets thrown away, in terms of unrealized potential? Let’s take a look at some of those stories from recent history.

If you actually give a shit about merit, you have to care about inclusion

Remember the Susan Fowler blog post that led to Travis Kalanick’s ouster as CEO of Uber in 2017? I suggest going back and skimming that post again, just to remind yourself what an absolutely jaw-dropping barrage of shit she went through, starting with being propositioned for sex by her very own manager on her very first day.

In “What You Do Is Who You Are”, investor Ben Horowitz wrote,”By all accounts Kalanick was furious about the incident, which he saw as a woman being judged on issues other than performance.” He believed that by treating her this way, his employees were failing to live up to their stated values around meritocracy.

I think that’s a flawed (but revealing) response to the situation at hand. Treating this like a question of “merit” suggests that they should be prioritizing the needs of whoever was most valuable to the company. And it kind of seems like that’s exactly what Kalanick’s employees were trying to do.

Susan was brilliant, yes; she was also young (25!) small, quiet, with a soft voice, in a corporate environment that valued aggression and bombast. She was early in her career and comparatively unproven; and when she reported her engineering manager’s persistent sexual advances and retaliatory actions to HR, she was told that HE was the high performer they couldn’t afford to lose.

Ask yourself this: would the manager’s behavior have been any more acceptable if Susan had been a total fuckup, instead of a certifiable genius? (NO. 😡)

Susan’s piece also noted that the percentage of women in Uber’s SRE org dropped from 25% to 3% across that same one year interval. Alarm bells were going off all over the place for an entire year, and nobody gave a shit, because an inclusive culture was nowhere on their radar as a thing that mattered.

There is no rational conversation to be had about merit that does not start with inclusion

You might know (or think you know) who your highest performers are today, but you do not know who will be on that list in six months, one year, five years. Your company is a system, and the environment you build will drive behaviors that help determine who is on that list.

Maybe you have a Susan Fowler type onboarding at your company right now. How confident are you that she will be treated fairly and equitably, that she will feel like she belongs? Do you think she might be underestimated due to her gender or presentation? Do you think she would want to stick around for the long haul? Will she be motivated to do her best work in service of your mission? Why?

Can you say the same about all your employees, not just ones you already know to be certifiable geniuses?

That’s inclusion. That’s how you build a real fucking meritocracy. You start with “do not tolerate the things that kneecap your employees in their pursuit of excellence”, and ESPECIALLY not the things that subject them to the compounding tax of being targeted for who they are. In life as in finance, it’s the compound interest that kills you, more than the occasional expensive purchase.

There’s more to merit and excellence than just inclusion, obviously, but there’s no rational adult conversation to be had about merit or meritocracy that doesn’t start there.

Susan left the tech industry, by the way. She seems to be doing great, of course, but what a loss for us.

If you give a shit about merit, tell me what you are doing to counteract bias

Anyone who talks a big game about merit, but doesn’t grapple with how to identify or counteract the effects of bias in the system, doesn’t really care about merit at all. What they actually want is what Ijeoma Oluo calls “entitlement masquerading as meritocracy” (“Mediocre”).

The “just world fallacy” is one of those cognitive biases that will be with us forever, because we have such a deep craving for narrative coherence. On a personal level, we are embodied beings awash with intrinsic biases; on a societal level, obviously, structural inequities abound. No one is saying we should aim for equality of outcomes, despite what some nutbag MEI advocates seem to think.

But anyone who truly cares about merit should feel compelled to do at least some work to try and lean against the ways our biases cause us to systematically under-value, under-reward, under-recognize, and under-promote some people (and over-value others). Because these effects add up to something cumulatively massive.

In the Amazon book “Working Backwards”, chapter 2, they briefly mention an engineering director who “wanted to increase the gender diversity of their team”, and decided to give every application with a female-gendered name a screening call. The number of women hired into that org “increased dramatically”.

That’s it — that’s the only tweak they made. They didn’t change the interview process, they didn’t “lower the bar”, they didn’t do anything except skip the step where women’s resumes were getting filtered out due to the intrinsic biases of the hiring managers.

There’s no shame in having biases — we all have them. The shame is in making other people pay the price for your unexamined life..

DEI is an imperfect vehicle for deeply meaningful ideals

I am by no means trying to muster a blanket defense of everything that gets lumped under DEI, just to be clear. Some of it is performative, ham-handed, well-intentioned but ineffective, disconnected or a distraction from real problems; diversity theater; a steam valve to vent off any real pressure for change; nitpicky and authoritarian, flirts with thought policing, or just horrendously cringe.

I don’t know how much I really care whether corporate DEI programs live or die, because I never thought they were that effective to start with. Jay Caspian Kang wrote a great piece in the New Yorker that captured my feelings on the matter:

The problem, at a grand scale, is that D.E.I.’s malleability and its ability to survive in pretty much every setting, whether it’s a nearby public school or the C.I.A., means that it has to be generic and ultimately inoffensive, which means that, in the end, D.E.I. didn’t really satisfy anyone.

What it did was provide a safety valve (I am speaking about D.E.I. in the past tense because I do think it will quickly be expunged from the private sector as well) for institutions that were dealing with racial and social-justice problems. If you had a protest on campus over any issue having to do with “diverse students” who wanted “equity,” that now became the provenance of D.E.I. officers who, if they were doing their job correctly, would defuse the situation and find some solution—oftentimes involving a task force—that made the picket line go away.

~Jay Caspian Kang, “What’s the Point of Trump’s War on DEI?”

It’s a symbolic loss of something that was only ever a symbolic gain. Corporate DEI programs as we know them sprung up in the wake of the Black Lives Matter protests of 2020, but I haven’t exactly noticed the world getting substantially more diverse or inclusive since then.

Which is not to say that tech culture has not gotten more diverse or inclusive over the longer arc of my career; it absolutely, definitely has. I began working in tech when I was just a teenager, over 20 years ago, and it is actually hard to convey just how much the world has changed since then.

And not because of corporate DEI policies. So why? Great question. 🙌

Tech culture changed because hearts and minds were changed

I think social media explains a lot about why awareness suddenly exploded in the 2010s. People who might never have intentionally clicked a link about racism or sexism were nevertheless exposed to a lot of compelling stories and arguments, via retweets and stuff ending up in their feed. I know this, because I was one of them.

The 2010s were a ferment of commentary and consciousness-raising in tech. A lot of brave people started speaking up and sharing their experiences with harassment, abuse, employer retaliation, unfair wage practices, blatant discrimination, racism, predators.. you name it. People were comparing notes with each other and realizing how common some of these experiences were, and developing new vocabulary to identify them — “missing stair”, “sandpaper feminism”, etc.

If you were in tech and you were paying attention at all, it got harder and harder to turn a blind eye. People got educated despite themselves, and in the end…many, many hearts and minds were changed.

This is what happened to me. I came from a religious and political background on the far right, but my eyes were opened. The more I looked around, the more evidence I saw in support of the moral and intellectual critiques I was reading online. I began waking up to some of the ways I had personally been complicit in doing harm to others.

The “unofficial affirmative action movement” in tech, circa 2010-2020

And I was not alone. Emily once offhandedly referred to an “unofficial affirmative action movement” in tech, and this really struck a chord with me. I know so many people whose hearts and minds were changed, who then took action.

They worked to diversify their personal networks of friends and acquaintances; to mentor, sponsor, and champion underrepresented folks in their workplaces; to recruit, promote, and refer women and people of color; to invite marginalized folks to speak at their conferences and on their panels; to support codes of conduct and unconscious bias training; and to educate themselves on how to be better allies in general.

All of this was happening for at least a decade leading up to 2020, when BLM shook up the industry and led to the creation of many corporate DEI initiatives. Kang, again:

What happened in many workplaces across the country after 2020 was that the people in charge were either genuinely moved by the Floyd protests or they were scared. Both the inspired and the terrified built out a D.E.I. infrastructure in their workplaces. These new employees would be given titles like chief diversity officer or C.D.O., which made it seem like it was part of the C-suite, and would be given a spot at every table, but much like at Stanford Law, their job was simply to absorb and handle any race stuff that happened.

The pivot from lobbying/persuading from the outside to holding the levers of formal power is a hard, hard one to execute well. History is littered with the shells of social movements that failed to make this leap.

You got here because you persuaded and earned credibility based on your stories and ideals, and now people are handing you the reins to make the rules. What do you do with them? Uh oh.

It’s easier to make rules and enforce them than it is to change hearts and minds

I think this happened to a lot of DEI advocates in the 2020-2024 era, when corporations briefly invested DEI programs and leaders with some amount of real corporate power, or at least the power to make petty rules. And I do not think it served our ideals well.

I just think…there’s only so much you can order people to do, before it backfires on you. Which doesn’t mean that laws and policies are useless; far from it. But they are limited. And they can trigger powerful backlash and resentment when they get overused as a means of policing people’s words and behaviors, especially in ways that seem petty or disconnected from actual impact.

When you lean on authority to drive compliance, you also stop giving people the opportunity to get on board and act from the heart.

MLK actually has a quote on this that I love, where he says “the law cannot make a man love me”:

“It may be true that the law cannot make a man love me, religion and education will have to do that, but it can restrain him from lynching me. And I think that’s pretty important also. And so that while legislation may not change the hearts of men, it does change the habits of men.”

~ Dr. Martin Luther King, Jr.

There are ways that the DEI movement really lost me around the time they got access to formal levers of power. It felt like there was a shift away from vulnerability and persuasion and towards mandates and speech policing.

Instead of taking the time to explain why something mattered, people were simply ordered to conform to an ever-evolving, opaque set of speech patterns as defined by social media. Worse, people sometimes got shamed or shut down for having legitimate questions.

There’s a big difference between saying that “marginalized people shouldn’t have to constantly have to defend their own existence and do the work of educating other people” (hard agree!), and saying that nobody should have to persuade or educate other folks and bring them along.

We do have to persuade, we do have to bring people along with us. We do have to fight for hearts and minds. I think we did a better job of this without the levers of formal power.

Don’t underestimate what a competitive advantage diversity can be

People have long marveled at the incredible amount of world class engineering talent we have always had at Honeycomb — long before we even had any customers, or a product to sell them. How did we manage this? The relative diversity of our teams has always been our chief recruiting asset.

There is a real hunger out there on the part of employees to work at a company that does more than the bare minimum in the realm of ethics. Especially as AI begins chewing away at historically white collar professions, people are desperate for evidence that you can be an ambitious, successful, money-making business that is unabashed about living its values and holding a humane, ethical worldview.

And increasingly, one of the main places people go to look for evidence that your company has ethical standards and takes them seriously is…the diversity of your teams.

Diversity is an imperfect proxy for corporate ethics, but it’s not a crazy one.

The diversity of your teams over the long run rests on your ability to build an inclusive culture and equitable policies. Which depends on your ability to infuse an ethical backbone throughout your entire company; to balance short-term and long-term investments, as you build a company that can win at business without losing its soul.

And I’m not actually talking about junior talent. Competition is so fierce lower on the ladder, those folks will usually take whatever they can get. (💔) I’m talking about senior folks, the kind of people who have their pick of roles, even in a weak job market. You might be shocked how many people out there will walk away from millions/year in comp at Netflix, Meta or Google, in order to work at a company where ethics are front and center, where diversity is table stakes, where their reporting chain and the executive team do not all look alike.

The longer you wait to build in equity and inclusion, the tougher it will be

Founders and execs come up to me rather often and ask what the secret is to hiring so many incredible contributors from underrepresented backgrounds. I answer: “It’s easy!…if you already have a diverse team.”

It is easier to build equitable programs and hire diverse teams early, and not drive yourself into a ditch, than it is to go full tilt with a monoculture and face years of recovery and repair. The longer you wait to do the work, the harder the work is going to be. Don’t put it off.

As I wrote a while back:

“If you don’t spend time, money, attention, or political capital on it, you don’t care about it, by definition. And it is a thousand times worse to claim you value something, and then demonstrate with your actions that you don’t care, than to never claim it in the first place.”

“You must remind yourself as you do, uneasily, queasily, how easily ‘I didn’t have a choice’ can slip from reason to excuse. How quickly ‘this isn’t the right time’ turns into ‘never the right time’. You know this, I know this, and I guarantee you every one of your employees knows this.”

~ Pragmatism, Neutrality and Leadership

It can be a massive competitive advantage if you build a company that knows how to develop a deep bench of talent and set people up for success.

Not only the preexisting elite, the smartest and most advantaged decile of talent — for whom competition will always be cutthroat — but people from broader walks of life.

Winning at business is what earns you the right to make bigger bets and longer-term investments

As the saying goes, “Nobody ever got fired for buying IBM” — and nobody ever had the failure of their startup blamed on the fact that they hired engineers away (or followed management practices) from Google, Netflix or Facebook, regardless of how good or bad those engineers (or practices) may be.

If you want to do something different, you need to succeed. People cargo cult the culture of places that make lots of money.

If you want your values and ideals to spread throughout the industry, the most impactful thing you can possibly do is win.

It’s a reality that when you’re a startup, your resources are scarce, your time horizons are short. You have to make smart decisions about where to invest them. Perfection is the enemy of success. Make good choices, so you can live to fight another day.

But fight another day.

If you don’t give a shit, don’t try and fake it

Finally let me say this: if you don’t give a shit about diversity or inclusion, don’t pretend you give a shit. It isn’t going to fool anyone. (If you “really care” but for some reason DEI loses every single bake-off for resources, friend, you don’t care.)

And honestly, as an employee, I would rather work for a soulless corporation that is honest with itself and its employees about how decisions get made, than for someone who claims to care about the things I value, but whose actions are unpredictable or inconsistent with those values.

Listen.. There is never just one true way to win. There are many paths up the mountain. There are many ways to win. (And there are many, many, many more ways to fail.)

Nothing that got imported or bolted on to your company operating system was ever going to work, anyway. 🤷 If it doesn’t live on in the hearts and minds of the people who are building the strategy and executing on it, they are dead words.

When I look at the long list of companies who say they are rolling back mentions to DEI internally, I don’t get that depressed. I see a long list of companies who never really meant it anyway. I’m glad they decided to stop performing.

You need a set of operating practices and principles that are internally consistent and authentic to who you are. And you need to do the work to bring people along with you, hearts and minds and all.

So if we care about our ideals, let’s go fucking win.

 

 

Corporate “DEI” is an imperfect vehicle for deeply meaningful ideals

On Versioning Observabilities (1.0, 2.0, 3.0…10.0?!?)

Hazel Weakly, you little troublemaker. 

As I whined to Hazel over text, after she sweetly sent me a preview draft of her post: “PLEASE don’t post this! I feel like I spend all my time trying to help bring clarity and context to what’s happening in the market, and this is NOT HELPING. Do you know how hard it is to try and socialize shared language around complex sociotechnical topics? Talking about ‘observability 3.0’ is just going to confuse everyone.”

That’s the problem with the internet, really; the way any asshole can go and name things (she said piteously, self-righteously, and with an astounding lack of self-awareness).

Semantic versioning is cheap and I kind of hate it

I’m complaining, because I feel sorry for myself (and because Hazel is a dear friend and can take it). But honestly, I actually kind of loathe the 1.0 vs 2.0 (or 3.0) framing myself. It’s helpful, it has explanatory power, I’m using it…but you’ll notice we aren’t slapping “Honeycomb is Observability 2.0” banners all over the website or anything.

Semantic versioning is a cheap and horrendously overused framing device in both technology and marketing. And it’s cheap for exactly these reasons…it’s too easy forEvery pillar has a price tag anyone to come along and bump the counter again and say it happens to be because of whatever fucking thing they are doing.

I don’t love it, but I don’t have a better idea. In this case, the o11y 2.0 language describes a real, backwards-incompatible, behavioral and technical generational shift in the industry. This is not a branding exercise in search of technological justification, it’s a technical sea change reaching for clarification in the market.

One of the most exciting things that happened this year is that all the new observability startups have suddenly stopped looking like cheaper Datadogs (three pillars, many sources of truth) and started looking like cheaper Honeycombs (wide, structured log events, single source of truth, OTel-native, usually Clickhouse-based). As an engineer, this is so fucking exciting.

(I should probably allow that these technologies have been available for a long time; adoption has accelerated over the past couple of years in the wake of the ZIRP era, as the exploding cost multiplier of the three pillars model has become unsustainable for more and more teams.)

Some non-controversial “controversial claims”

Firstly, I’m going to make a somewhat controversial claim in that you can get observability 2.0 just fine with “observability 1.0” vendors. The only thing you need from a UX standpoint is the ability to query correlations, which means any temporal data-structure, decorated with metadata, is sufficient.”

This is not controversial at all, in my book. You can get most of the way there, if you have enough time and energy and expertise, with 1.0 tooling. There are exceptions, and it’s really freaking hard. If all you have is aggregate buckets and random exemplars, your ability to slice and dice with precision will be dramatically limited. Hansel and Gretel forgot to instrument their path

This matters a lot, if you’re trying to e.g. break down by any combination of feature flags, build IDs, canaries, user IDs, app IDs, etc in an exploratory, open-ended fashion. As Hazel says, the whole point is to “develop the ability to ask meaningful questions, get useful answers, and act effectively on what you learn.” A-yep.

However, any time your explanation takes more than 30 seconds, you’ve lost your audience. This is at least a three-minute answer. Therefore, I typically tell people they need structured log events.

Observability 2.0” describes a sociotechnical sea change that is already well underway

Let’s stop talking about engineering for a moment, and talk about product marketing.

A key aspect of product marketing is simplification. That’s where the 2.0 language grew out of. About a year ago I started having a series of conversations with CTOs and VPEngs. All of them are like, “we already have observability, how is Honeycomb any different?” And I would launch off into a laundry list of features and capabilities, and a couple minutes later you see their eyes glazing over.Metrics, logs, tracing, drama

You have to have some way of boiling it down and making it pithy and memorable. And any time you do that, you lose some precision. So I actually disagree with very little Hazel has said in this essay. I’ve made most of the same points, in various times and places.

Good product marketing is when you take a strong technical differentiator and try to find evocative, resonant ways of making it click for people. Bad product marketing — and oh my god is there a lot of that — is when you start with the justification and work backwards. Or start with “well we should create our own category” and then try to define and defend one for sales purposes.

Or worst of all — “what our competitors are saying seems to be really working, but building it would take a long time and be very hard, so what if we just say the same words out loud and confuse everyone into buying our shit instead?”

(Ask me how many times this has happened to us, I fucking dare you.)

Understanding your software in the language of your business

Here’s why I really hate the 3.0 framing: I feel like all the critical aspects that I really really care about are already part of 2.0. They have to be. It’s the whole freaking point of the generational change which is already underway. 

We aren’t just changing data structures for the fun of it. The whole point is to be able to ask better questions, as Hazel correctly emphasizes in her piece.

Christine and I recently rewrote our company’s mission and vision. Our new vision states:

Understand your software in the language of your business.

Decades on, the promise of software and the software industry remains unfulfilled. Software engineering teams were supposed to be the innovative core of modern business; instead they are order-takers, cost centers, problem children. Honeycomb is here to shape a future where there is no divide between building software and building a business — a future where software engineers are truly the innovation engine of modern companies.

The beauty of high cardinality, high dimensionality data is that it gives you the power to pack dense quantities of application data, systems data, and business data all into the same blob of context, and then explore all three together.

Austin Parker wrote about this earlier this year (ironically, in response to yet another of Miss Weakly’s articles on observability):

Even if you’ve calculated the cost of downtime, you probably aren’t really thinking about the relationship between telemetry data and business data. Engineering stuff tends to stay in the engineering domain. Here’s some questions that I’d suggest most people can’t answer with their observability programs, but are absolutely fucking fascinating questions (emphasis mine):

 

  • What’s the relationship between system performance and conversions, by funnel stage? Break it down by geo, device, and intent signals.
  • What’s our cost of goods sold per request, per customer, with real-time pricing data of resources?
  • How much does each marginal API request to our enterprise data endpoint cost in terms of availability for lower-tiered customers? Enough to justify automation work?

Every truly interesting question we ask as engineers is some combination or intersection of business data + application data. We do no one any favors by chopping them up and siloing them off into different tools and data stores, for consumption by different teams.

Data lake ✅, query flexibility ✅, non-engineering functions…🚫

Hazel’s three predictions for what she calls “observability 3.0” are as follows:

  • Observability 3.0 backends are going to look a lot like a data lake-house architecture
  • Observability 3.0 will expand query capabilities to the point that it mostly erases the distinction between pay now / pay later, or “write time” vs “read time”
  • Observability 3.0 will, more than anything else, be measured by the value that non-engineering functions in the business are able to get from it

I agree with the first two — in fact, I think that’s exactly the trajectory that we’re on with 2.0. We are moving fast and accelerating in the direction of data lakehouse architectures, and in the direction of fast, flexible, and cheap querying. There’s nothing backwards-incompatible or breaking about these changes from a 2.0 -> 3.0 perspective.

Which brings us to the final one. This is the only place in the whole essay where there may be some actual daylight between where Hazel and I stand, depending on your perspective.

Other business functions already have nice things; we need to get our own house in order

No, I don’t think success will be measured by non-engineering functions’ ability to interrogate our data. I think it’s the opposite. I think it is engineers who need to integrate data about the business into our own telemetry, and get used to using it in our daily lives.

They’ve had nice things on the business side for years — for decades. They were rolling out columnar stores for business intelligence almost 20 years ago! Folks in sales and marketing are used to being able to explore and query their business data with ease. Can you even imagine trying to run a marketing org if you had to pre-define cohorts into static buckets before you even got started?Observability is a property of complex systems

No, in this case it’s actually engineering that are the laggards. It’s a very “the cobbler’s children have no shoes” kind of vibe, that we’re still over here warring over cardinality limits and pre-defined metrics and trying to wrestle them into understanding our massively, sprawlingly complex systems.

So I would flip that entirely around. The success of observability 2.0 will be measured by how well engineering teams can understand their decisions and describe what they do in the language of the business.

Other business functions already have nice tools for business data. What they don’t have — can’t have — is observability that integrates systems and application data in the same place as their business data. Uniting all three sources, that’s on us.

If every company is now a technology company, then technology execs need to sit at the big table

Hazel actually gets at this point towards the end of her essay:

We’ve had multiple decades as an industry to figure out how to deliver meaningful business value in a transparent manner, and if engineering leaders can’t catch up to other C-suites in that department soon, I don’t expect them to stick around another decade

The only member of the C-suite that has no standard template for their role is…CTO. CTOs are all over the freaking map.

Similarly, VPs of Engineering are usually not part of the innermost circles of execs.

Why? Because the point of that inner circle of execs is to co-make and co-own all of the decisions at the highest level about where to invest the company’s resources.

And engineering (and product, and design) usually can’t explain their decisions wellMy life was full of known-unknowns until you came into it enough in terms of the business for them to be co-owned and co-understood by the other members of the exec team. R&D is full of the artistes of the company. We tell you what we think we need to do our jobs, and you either trust us or you don’t.

(This is not a one-way street, of course; the levers of investment into R&D are often opaque, counter-intuitive and poorly understood by the rest of the exec team, and they also have a responsibility to educate themselves well enough to co-own these decisions. I always recommend these folks start by reading “Accelerate”.)

But twenty years of free money has done poorly by us as engineering leaders. The end of the ZIRP era is the best thing that could have happened to us. It’s time to get our house in order and sit at the big table.

“Know your business, run it with data”, as Jeff Gray, our COO, often says.

Which starts with having the right tools.

~charity

On Versioning Observabilities (1.0, 2.0, 3.0…10.0?!?)