I wrote a lot of blog posts over my time at Parse, but they all evaporated after Facebook killed the product. Most of them I didn’t care about (there were, ahem, a lot of “service reliability updates”), but I was mad about losing one specific piece, a deceptively casual retrospective of the grueling, murderous two-year rewrite of our entire API from Ruby on Rails to Golang.
I could have sworn I’d looked for it before, but someone asked me a question about migrations this morning, which spurred me to pull up the Wayback Machine again and dig in harder, and … ✨I FOUND IT!!✨
Honestly, it is entirely possible that if we had not done this rewrite, there might be no Honeycomb. In the early days of the rewrite, we would ship something in Go and the world would break, over and over and over. As I said,
Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.
Rails would accept any old trash, Go would not. Breakage ensues. Tests couldn’t catch what we didn’t know to look for. Eventually we lit upon a workflow where we would split incoming production traffic, run each request against a Go API server and a Ruby API server, each backed by its own set of MongoDB replicas, and diff the responses. This is when we first got turned on to how incredibly powerful Scuba was, in its ability to compare individual responses, field by field, line by line.
Once you’ve used a tool like that, you’re hooked. You can’t possibly go back to metrics and aggregates. The rest, as they say, is history.
The whole thing is still pretty fun to read, even if I can still smell the blood and viscera a decade later. Enjoy.
“How We Moved Our API From Ruby to Go and Saved Our Sanity”
Originally posted on blog.parse.com on June 10th, 2015.
The first lines of Parse code were written nearly four years ago. In 2011 Parse was a crazy little idea to solve the problem of building mobile apps.
Those first few lines were written in Ruby on Rails.
Ruby on Rails
Ruby let us get the first versions of Parse out the door quickly. It let a small team of engineers iterate on it and add functionality very fast. There was a deep bench of library support, gems, deploy tooling, and best practices available, so we didn’t have to reinvent very many wheels.
We used Unicorn as our HTTP server, Capistrano to deploy code, RVM to manage the environment, and a zillion open source gems to handle things like YAML parsing, OAuth, JSON parsing, MongoDB, and MySQL. We also used Chef, which is Ruby-based, to manage our infrastructure, so everything played together nicely. For a while.
The first signs of trouble bubbled up in the deploy process. As our code base grew, it took longer and longer to deploy, and the “graceful” Unicorn restarts really weren’t very graceful. So, we monkeypatched rolling deploy groups into Capistrano.
“Monkeypatch” quickly became a key technical term that we learned to associate with our Ruby codebase.
A year and a half in, at the end of 2012, we had 200 API servers running on m1.xlarge instance types with 24 Unicorn workers per instance. This was to serve 3000 requests per second for 60,000 mobile apps. It took 20 minutes to do a full deploy or rollback, and we had to do a bunch of complicated load balancer shuffling and pre-warming to prevent the API from being impacted during a deploy.
Then, Parse really started to take off and experience hockey-stick growth.
Problems
When our API traffic and number of apps started growing faster, we started having to rapidly spin up more database machines to handle the new request traffic. That is when the “one process per request” part of the Rails model started to fall apart.
With a typical Ruby on Rails setup, you have a fixed pool of worker processes, and each worker can handle only one request at a time. So any time you have a type of request that is particularly slow, your worker pool can rapidly fill up with that type of request. This happens too fast for things like auto-scaling groups to react. It’s also wasteful because the vast majority of these workers are just waiting on another service. In the beginning, this happened pretty rarely and we could manage the problem by paging a human and doing whatever was necessary to keep the API up. But as we started growing faster and adding more databases and workers, we added more points of failure and more ways for performance to get degraded.
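To make that failure mode concrete, here is a minimal sketch, not Parse's code. It models a fixed worker pool as a small deterministic simulation: every request arrives at once and is handed to whichever worker frees up first. The function name `queueDelays` and all the numbers (4 workers, 2000 ms slow requests, one 5 ms fast request) are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// queueDelays models a fixed pool of workers that each handle one request at
// a time. All requests arrive at t=0 in slice order, and each is handed to
// whichever worker becomes idle first. It returns how long each request
// waited before a worker picked it up, in the same units as serviceMs.
func queueDelays(workers int, serviceMs []int) []int {
	freeAt := make([]int, workers) // time at which each worker is next idle
	waits := make([]int, len(serviceMs))
	for i, cost := range serviceMs {
		sort.Ints(freeAt) // freeAt[0] is now the worker that frees up first
		waits[i] = freeAt[0]
		freeAt[0] += cost
	}
	return waits
}

func main() {
	// Hypothetical load: eight slow requests (2000 ms each, mostly spent
	// waiting on another service) followed by one fast 5 ms request.
	reqs := []int{2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 5}

	// Fixed pool of 4 workers: the fast request queues behind the slow ones.
	waits := queueDelays(4, reqs)
	fmt.Println("pool of 4, fast request waited (ms):", waits[len(waits)-1])

	// One worker per request, which is roughly what cheap goroutines give
	// you: nothing has to queue behind the slow requests.
	waits = queueDelays(len(reqs), reqs)
	fmt.Println("one per request, fast request waited (ms):", waits[len(waits)-1])
}
```

In this toy model the 5 ms request waits 4000 ms behind the slow ones with a pool of four, and does not wait at all when every request gets its own worker. Real systems have staggered arrivals and timeouts, so treat this as an illustration of the queueing effect rather than a capacity model.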
We started looking ahead to when Parse would 10x its size, and realized that the one-process-per-request model just wouldn’t scale. We had to move to an async model that was fundamentally different from the Rails way. Yeah, rewrites are hard, and yeah they always take longer than anyone ever anticipates, but we just didn’t see how we could make the Rails codebase scale while it was tied to one process per request.
What next?
We knew we needed asynchronous operations. We considered a bunch of options:
EventMachine
We already had some of our push notification service using EventMachine, but our experience was not great as it, too, was scaling up. We had constant trouble with accidentally introducing synchronous behavior or parallelism bugs. The vast majority of Ruby gems are not asynchronous, and many are not threadsafe, so it was often hard to find a library that did some common task asynchronously.
JRuby
This might seem like the obvious solution – after all, Java has threads and can handle massive concurrency. Plus it’s Ruby already, right? This is the solution Twitter investigated before settling on Scala. But since JRuby is still basically Ruby, it still has the problem of asynchronous library support. We were concerned about needing a second rewrite later, from JRuby to Java. And literally nobody at all on our backend or ops teams wanted to deal with deploying and tuning the JVM. The groans were audible from outer space.
C++
We had a lot of experienced C++ developers on our team. We also already had some C++ in our stack, in our Cloud Code servers that ran embedded V8. However, C++ didn’t seem like a great choice. Our C++ code was harder to debug and maintain. It seemed clear that C++ development was generally less productive than more modern alternatives. It was missing a lot of library support for things we knew were important to us, like HTTP request handling. Asynchronous operation was possible but often awkward. And nobody really wanted to write a lot of C++ code.
C#
C# was a strong contender. It arguably had the best concurrency model with Async and Await. The real problem was that C# development on Linux always felt like a second-class citizen. Libraries that interoperate with common open source tools are often unavailable for C#, and our toolchain would have to change a lot.
Go
Go and C# both have asynchronous operation built into the language at a low level, making it easy for large groups of people to write asynchronous code. The MongoDB Go driver is probably the best MongoDB driver in existence, and complex interaction with MongoDB is core to Parse. Goroutines were much more lightweight than threads. And frankly we were most excited about writing Go code. We thought it would be a lot easier to recruit great engineers to write Go code than any of the other solid async languages.
In the end, the choice boiled down to C# vs Go, and we chose Go.
Wherein we rewrite the world
We started out rewriting our EventMachine push backend from Ruby to Go. We did some preliminary benchmarking with Go concurrency and found that each network connection ate up only 4 KB of RAM. After rewriting the EventMachine push backend to Go we went from 250k connections per node to 1.5 million connections per node without even touching things like kernel tuning. Plus it seemed really fun. So, Go it was.
We rewrote some other minor services and started building new services in Go. The main challenge, though, was to rewrite the core API server that handles requests to api.parse.com while seamlessly maintaining backward compatibility. We rewrote this endpoint by endpoint, using a live shadowing system to avoid impacting production, and monitored the differential metrics to make sure the behaviors matched.
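The shadow-and-compare idea can be sketched in a few lines of Go. This is an illustrative sketch under stated assumptions, not the system Parse ran. The helper names (`diffFields`, `fetch`, `shadowCompare`), the request path, and the JSON payloads are all hypothetical. Two `httptest` servers stand in for the old and new API servers, and only top-level JSON fields are compared.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"reflect"
	"sort"
)

// diffFields decodes two JSON objects and returns the sorted names of the
// top-level fields whose values differ or that exist on only one side.
func diffFields(a, b []byte) ([]string, error) {
	var ma, mb map[string]interface{}
	if err := json.Unmarshal(a, &ma); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(b, &mb); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var diffs []string
	for _, m := range []map[string]interface{}{ma, mb} {
		for k := range m {
			if seen[k] {
				continue
			}
			seen[k] = true
			va, inA := ma[k]
			vb, inB := mb[k]
			if inA != inB || !reflect.DeepEqual(va, vb) {
				diffs = append(diffs, k)
			}
		}
	}
	sort.Strings(diffs)
	return diffs, nil
}

// fetch issues a GET request and returns the response body.
func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// shadowCompare sends the same path to both backends and reports which
// top-level response fields disagree.
func shadowCompare(primaryURL, candidateURL, path string) ([]string, error) {
	a, err := fetch(primaryURL + path)
	if err != nil {
		return nil, err
	}
	b, err := fetch(candidateURL + path)
	if err != nil {
		return nil, err
	}
	return diffFields(a, b)
}

func main() {
	// Stand-ins for the existing server and the rewritten one.
	existing := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"objectId":"abc","score":10,"cacheVersion":""}`)
	}))
	defer existing.Close()
	rewritten := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"objectId":"abc","score":10}`)
	}))
	defer rewritten.Close()

	diffs, err := shadowCompare(existing.URL, rewritten.URL, "/1/classes/GameScore/abc")
	if err != nil {
		panic(err)
	}
	fmt.Println("fields that differ:", diffs)
}
```

A production version would also have to compare status codes and headers, replay request bodies, and keep shadowed writes away from the primary data store. The sketch covers only the comparison step.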
During this time, Parse 10x’d the number of apps on our backend and more than 10x’d our request traffic. We also 10x’d the number of storage systems backed by Ruby. We were chasing a rapidly moving target.
The hardest part of the rewrite was dealing with all the undocumented behaviors and magical mystery bits that you get with Rails middleware. Parse exposes a REST API, and Rails HTTP processing is built on a philosophy of “be liberal in what you accept”. So developers end up inadvertently sending API requests that are undocumented or even non-RFC compliant … but Rails middleware cleans them up and handles it fine.
So we had to port a lot of delightful behavior from the Ruby API to the Go API, to make sure we kept handling the weird requests that Rails handled. Stuff like doubly encoded URLs, weird content-length requirements, bodies in HTTP requests that shouldn’t have bodies, horrible OAuth misuse, horrible mis-encoded Unicode.
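As one example of what this kind of compatibility shim can look like, here is a minimal sketch of tolerating doubly encoded URL components in Go. It is not taken from Parse's codebase. The function name `lenientUnescape` and the pass limit are assumptions made for illustration. The only library call used is the standard `url.PathUnescape`.

```go
package main

import (
	"fmt"
	"net/url"
)

// lenientUnescape percent-decodes s repeatedly, up to maxPasses times, so
// that doubly encoded input such as "a%2520b" ends up as "a b". It stops at
// the first pass that changes nothing or fails to decode, and returns the
// last value that decoded cleanly.
func lenientUnescape(s string, maxPasses int) string {
	for i := 0; i < maxPasses; i++ {
		decoded, err := url.PathUnescape(s)
		if err != nil || decoded == s {
			break
		}
		s = decoded
	}
	return s
}

func main() {
	for _, raw := range []string{"caf%25C3%25A9", "a%2520b", "a%20b", "100%"} {
		fmt.Printf("%q -> %q\n", raw, lenientUnescape(raw, 3))
	}
}
```

The trade-off is deliberate leniency: a client that really meant a literal "%20" and encoded it as "%2520" gets it decoded to a space as well. That is behavior you would adopt only to stay compatible with what existing clients already send, not something a new API should do.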
Our Go code is now peppered with fun, cranky comments like these:
// Note: an unset cache version is treated by ruby as “”.
// Because of this, dirtying this isn’t as simple as deleting it – we need to
// actually set a new value.

// This byte sequence is what ruby expects.

// yes that’s a paren after the second 180, per ruby.

// Inserting and having an op is kinda weird: We already know
// state zero. But ruby supports it, so go does too.

// single geo query, don’t do anything. stupid and does not make sense
// but ruby does it. Changing this will break a lot of client tests.

// just be nice and fix it here.

// Ruby sets various defaults directly in the structure and expects them to appear in cache.
// For consistency, we’ll do the same thing.
Results
Was the rewrite worth it? Hell yes it was. Our reliability improved by an order of magnitude. More importantly, our API is not getting more and more fragile as we spin up more databases and backing services. Our codebase got cleaned up and we got rid of a ton of magical gems and implicit assumptions. Co-tenancy issues improved for customers across the board. Our ops team stopped getting massively burned out from getting paged and trying to track down and manually remediate Ruby API outages multiple times a week. And needless to say, our customers were happier too.
We now almost never have reliability-impacting events that can be tracked back to the API layer – a massive shift from a year ago. Now when we have timeouts or errors, it’s usually constrained to a single app – because one app is issuing a very inefficient query that causes timeouts or full table scans for their app, or it’s a database-related co-tenancy problem that we can resolve by automatically rebalancing or filtering bad actors.
An asynchronous model had many other benefits. We were also able to instrument everything the API was doing with counters and metrics, because these were no longer blocking operations that interfered with communicating to other services. We could downsize our provisioned API server pool by about 90%. And we were also able to remove silos of isolated Rails API servers from our stack, drastically simplifying our architecture.
As if that weren’t enough, the time it takes to run our full integration test suite dropped from 25 minutes to 2 minutes, and the time to do a full API server deploy with rolling restarts dropped from 30 minutes to 3 minutes. The Go API server restarts gracefully so no load balancer juggling and prewarming is necessary.
We love Go. We’ve found it really fast to deploy, really easy to instrument, really lightweight and inexpensive in terms of resources. It’s taken a while to get here, but the journey was more than worth it.
Credits/Blames
Credits/Blames go to Shyam Jayaraman for driving the initial decision to use Go, Ittai Golde for shepherding the bulk of the API server rewrite from start to finish, Naitik Shah for writing and open sourcing a ton of libraries and infrastructure underpinning our Go code base, and the rest of the amazing Parse backend SWE team who performed the rewrite.
I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption.
I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. After ten years, you’d think this would be getting easier, not harder.
There’s something about putting out such memoiristic material that feels uncomfortably feminine to me. (Wow, ok.) I want to be known for my work, not for having a dramatic personal life. I love my family and don’t want to put them on display for the world to judge. And I never want the people I care about to feel like I am mining their experiences for clicks and content, whether that’s my family or my coworkers.
Many of the writing exercises I’ve been doing lately have ended up pulling on threads from my backstory, and the reason I haven’t published them is because I find myself thinking, “this won’t make any sense to people unless they know where I’m coming from.”
So hey, fuck it, let’s do this.
I went to college at the luckiest time
I left home when I was 15 years old. I left like a bottle rocket taking off – messy, explosive, a trail of destruction in my wake, and with absolutely zero targeting mechanisms.
It tells you a lot about how sheltered I was that the only place I could think of to go was university. I had never watched TV or been to a sports game or listened to popular music. I had never been to a doctor, I was quite unvaccinated.
I grew up in the backwoods of Idaho, the oldest of six, all of us homeschooled. I would go for weeks without seeing anyone other than my family. The only way to pass the time was by reading books or playing piano, so I did quite a lot of both. I called up the University of Idaho, asked for an admissions packet, hand wrote myself a transcript and gave myself all As, drove up and auditioned for the music department, and was offered a partial ride scholarship for classical piano performance.
I told my parents I was leaving, with or without their blessing or financial support. I left with neither.
My timing turned out to be flawless. I arrived on the cusp of the Internet age – they were wiring dorms for ethernet the year I enrolled. Maybe even more important, I arrived in the final, fading glory years of affordable state universities.
I worked multiple minimum wage jobs to put myself through school; day care, front desk, laundry, night audit. It was grueling, round the clock labor, but it was possible, if you were stubborn enough. I didn’t have a Social Security number (long story), I wasn’t old enough to take out loans, I couldn’t get financial aid because my parents didn’t file income taxes (again, long story). There was no help coming, I sank or I swam.
I found computers and the Internet around the same time as it dawned on me that everybody who studied music seemed to end up poor as an adult. I grew up too poor to buy canned vegetables or new underwear; we were like an 1800s family, growing our food, making our clothes, hand-me-downs til they fell apart.
Fuck being poor. Fuck it so hard. I was out.
I lost my music scholarship, but I started building websites and running systems for the university, then for local businesses. I dropped out and took a job in San Francisco. I went back, abortively; I dropped out again.
By the time I was 20 I was back in SF for good, making a salary five times what my father had made.
I grew up with a very coherent belief system that did not work for me
A lot of young people who flee their fundamentalist upbringing do so because they were abused and/or lost their faith, usually due to the hypocrisy of their leaders. Not me. I left home still believing the whole package – that evolution was a fraud, that the earth was created in seven days, that woman was created from Adam’s rib to be a submissive helpmate for her husband, that birth control was a sin, that anyone who believed differently was going to hell.
My parents loved us deeply and unshakably, and they were not hypocrites. In the places I grew up, the people who believed in God and went to church and lived a certain way were the ones who had their shit together, and the people who believed differently had broken lives. Reality seemed to confirm the truth of all we were taught, no matter how outlandish it sounds.
So I fully believed it was all true. I also knew it did not work for me. I did not want a small life. I did not want to be the support system behind some godly dude. I wanted power, money, status, fame, autonomy, success. I wanted to leave a crater in the world.
I was not a rebellious child, believe it or not. I loved my parents and wanted to make them proud. But as I entered my teens, I became severely depressed, and turned inward and hurt myself in all the ways young people do.
I left because staying there was killing me, and ultimately, I think my parents let me go because they saw it too.
Running away from things worked until it didn’t
I didn’t know what I wanted out of life other than all of it, right now, and my first decade out on my own was a hoot. It was in my mid-twenties that everything started to fall apart.
I was an earnest kid who liked to study and think about the meaning of life, but when I bolted, I slammed the door to my conscience shut. I knew I was going to hell, but since I couldn’t live the other way, I made the very practical determination based on actuarial tables that I could go my own way for a few decades, then repent and clean up my shit before I died. (Judgment Day was one variable that gave me heartburn, since it could come at any time.)
I was not living in accordance with my personal values and ethics, to put it lightly. I compartmentalized; it didn’t bother me, until it did. It started leaking into my dreams every night, and then it took over my waking life. I was hanging on by a thread; something had to give.
My way out, unexpectedly, started with politics. I started mainlining books about politics and economics during the Iraq War, which then expanded to history, biology, philosophy, other religious traditions, and everything else. (You can still find a remnant of my reading list here.)
When I was 13, I had an ecstatic religious experience; I was sitting in church, stewing over going to hell, and was suddenly filled with a glowing sense of warmth and acceptance. It lasted for nearly two weeks, and that’s how I knew I was “saved”.
In my late 20s, after a few years of intense study and research, I had a similar ecstatic experience walking up the stairs from the laundry room. I paused, I thought “maybe there is no God; maybe there is nobody out there judging me; maybe it all makes sense”, and it all clicked into place, and I felt high for days, suffused with peace and joy.
My career didn’t really take off until after that. I always had a job, but I wasn’t thinking about tech after hours. At first I was desperately avoiding my problems and self-medicating, later I became obsessed with finding answers. What did I believe about taxation, public policy, voting systems, the gender binary, health care, the whole messy arc of American history? I was an angry, angry atheist for a while. I filled notebook after notebook with handwritten notes; if I wasn’t working, I was studying.
And then, gradually, I wound down. The intensity, the high, tapered off. I started dating, realized I was poly and queer, and slowly chilled the fuck out. And that’s when I started being able to dedicate the creative, curious parts of my brain to my job in tech.
Why am I telling you all this?
Will Larson has talked a lot about how his underlying motivation is “advancing the industry”. I love that for him. He is such a structured thinker and prolific writer, and the industry needs his help, very badly.
For a while I thought that was my motivation too. And for sure, that’s a big part of it, particularly when it comes to observability and my day job. (Y’all, it does not need to be this hard. Modern observability is the cornerstone and prerequisite for high performing engineering teams, etc etc.)
But when I think about what really gets me activated on a molecular level, it’s a little bit different. It’s about living a meaningful life, and acting with integrity, and building things of enduring value instead of tearing them down.
When I say it that way, it sounds like sitting around on the mountain meditating on the meaning of life, and that is not remotely what I mean. Let me try again.
For me, work has been a source of liberation
It’s very uncool these days to love your job or talk about hard work. But work has always been a source of liberation for me. My work has brought me so much growth and development and community and friendship. It brings meaning to my life, and the joy of creation. I want this for myself. I want this for anyone else who wants it too.
I understand why this particular tide has turned. So many people have had jobs where their employers demanded total commitment, but felt no responsibility to treat them well or fairly in return. So many people have never experienced work as anything but a depersonalizing grind, or an exercise in exploitation, and that is heartbreaking.
I don’t think there’s anything morally superior about people who want their work to be a vehicle for personal growth instead of just a paycheck. I don’t think there’s anything wrong with just wanting a paycheck, or wanting to work the bare minimum to get by. But it’s not what I want for myself, and I don’t think I’m alone in this.
I feel intense satisfaction and a sense of achievement when I look back on my career. On a practical level, I’ve been able to put family members through college, help with down payments, and support artists in my community. All of this would have been virtually unimaginable to me growing up.
I worked a lot harder on the farm than I ever have in front of a keyboard, and got a hell of a lot less for my efforts.
(People who glamorize things like farming, gardening, canning and freezing, taking care of animals, cooking and caretaking, and other forms of manual labor really get under my skin. All of these things make for lovely hobbies, but subsistence labor is neither fun nor meaningful. Trust me on this one.)
My engineer/manager pendulum days
I loved working as an engineer. I loved how fast the industry changes, and how hard you have to scramble to keep up. I loved the steady supply of problems to fix, systems to design, and endless novel catastrophes to debug. The whole Silicon Valley startup ecosystem felt like it could not have been more perfectly engineered to supply steady drips of dopamine to my brain.
I liked working as an engineering manager. Eh, that might be an overstatement. But I have strong opinions and I like being in charge, and I really wanted more access to information and influence over decisions, so I pushed my way into the role more than once.
If Honeycomb hadn’t happened, I am sure I would have bounced back and forth between engineer and manager for the rest of my career. I never dreamed about climbing the ladder or starting a company. My attitude towards middle management could best be described as amiable contempt, and my interest in the business side of things was nonexistent.
I have always despised people who think they’re too good to work for other people, and that describes far too many of the founders I’ve met.
Operating a company draws on a different kind of meaning
I got the chance to start a company in 2016, so I took it, almost on a whim. Since then I have done so many things I never expected to do. I’ve been a founder, CEO, CTO, I’ve raised money, hired and fired other execs, run organizations, crafted strategy, and come to better understand and respect the critical role played by sales, marketing, HR, and other departments. No one is more astonished than I am to find me still here, still doing this.
But there is joy to be found in solving systems problems, even the ones that are less purely technical. There is joy to be found in building a company, or competing in a marketplace.
To be honest, this is not a joy that came to me swiftly or easily. I’ve been doing this for the past 9.5 years, and I’ve been happy doing it for maybe the past 2-3 years. But it has always felt like work worth doing. And ultimately, I think I’m less interested in my own happiness (whatever that means) than I am interested in doing work that feels worth doing.
Work is one of the last remaining places where we are motivated to learn from people we don’t agree with and find common pursuit with people we are ideologically opposed to. I think that’s meaningful. I think it’s worth doing.
Reality doesn’t give a shit about ideology
I am a natural born extremist. But when you’re trying to operate a business and win in the marketplace, ideological certainty crashes hard into the rocks of reality. I actually find this deeply motivating.
I spent years hammering out my own personal ontological beliefs about what is right and just, what makes a life worth living, what responsibilities we have to one another. I didn’t really draw on those beliefs very often as an engineer/manager, at least not consciously. That all changed dramatically after starting a company.
It’s one thing to stand off to the side and critique the way a company is structured and the decisions leaders make about compensation, structure, hiring/firing, etc. But creation is harder than critique (one of my favorite Jeff Gray quotes) — so, so, so much harder. And reality resists easy answers.
Being an adult, to me, has meant making peace with a multiplicity of narratives. The world I was born into had a coherent story and a set of ideals that worked really well for a lot of people, but it was killing me. Not every system works for every person, and that’s okay. That’s life. Startups aren’t for everyone, either.
The struggle is what brings your ideals to life
Almost every decision you make running a company has some ethical dimension. Yet the foremost responsibility you have to your stakeholders, from investors to employees, is to make the business succeed, to win in the marketplace. Over-rotating on ethical repercussions of every move can easily cause you to get swamped in the details and fail at your prime directive.
Sometimes you may have a strongly held belief that some mainstream business practice is awful, so you take a different path, and then you learn the hard way why it is that people don’t take that path. (This has happened to me more times than I can count. 🙈)
Ideals in a vacuum are just not that interesting. If I wrote an essay droning on and on about “leading with integrity”, no one would read it, and nor should they. That’s boring. What’s interesting is trying to win and do hard things, while honoring your ideals.
Shooting for the stars and falling short, innovating, building on the frontier of what’s possible, trying but failing, doing exciting things that exceed your hopes and dreams with a team just as ambitious and driven as you are, while also holding your ideals to heart — that’s fucking exciting. That’s what brings your ideals to life.
We have lived through the golden age of tech
I recognize that I have been profoundly lucky to be employed through the golden age of tech. It’s getting tougher out there to enter the industry, change jobs, or lead with integrity.
It’s a tough time to be alive, in general. There are macro scale political issues that I have no idea how to solve or fix. Wages used to rise in line with productivity, and now they don’t, and haven’t since the mid 70s. Capital is slurping up all the revenue and workers get an ever decreasing share, and I don’t know how to fix that, either.
But I don’t buy the argument that just because something has been touched by capitalism or finance it is therefore irreversibly tainted, or that there is no point in making capitalist institutions better. The founding arguments of capitalism were profoundly moral ones, grounded in a keen understanding of human nature. (Adam Smith’s “Wealth of Nations” gets all the attention, but his other book, “Theory of Moral Sentiments”, is even better, and you can’t read one without the other.)
As a species we are both individualistic and communal, selfish and cooperative, and the miracle of capitalism is how effectively it channels the self-interested side of our nature into the common good.
Late stage capitalism, however, along with regulatory capture, enshittification, and the rest of it, has made the modern world brutally unkind to most people. Tech was, for a shining moment in time, a path out of poverty for smart kids who were willing to work their asses off. It’s been the only reliable growth industry of my lifetime.
It remains, for my money, the best job in the world. Or it can be. It’s collaborative, creative, and fun; we get paid scads of money to sit in front of a computer and solve puzzles all day. So many people seem to be giving up on the idea that work can ever be a place of meaning and collaboration and joy. I think that sucks. It’s too soon to give up! If we prematurely abandon tech to its most exploitative elements, we guarantee its fate.
If you want to change the world, go into business
Once upon a time, if you had strongly held ideals and wanted to change the world, you went into government or nonprofit work.
For better or for worse (okay, mostly worse), we live in an age where corporate power dominates. If you want to change the world, go into business.
The world needs, desperately, people with ethics and ideals who can win at business. We can’t let all the people who care about people go into academia or medicine or low wage service jobs. We can’t leave the ranks of middle and upper management to be filled by sycophants and sociopaths.
There’s nothing sinister about wanting power; what matters is what you do with it. Power, like capitalism, is a tool, and can be bent to powerful ends both good and evil. If you care about people, you should be unashamed about wanting to amass power and climb the ladder.
There are a lot of so-called best practices in this industry that are utterly ineffective (cough, whiteboarding B-trees in an interview setting), yet they got cargo culted and copied around for years. Why? Because the company that originated the practice made a lot of money. This is stupid, but it also presents an opportunity. All you need to do is be a better company, then make a lot of money. 😉
People need institutions
I am a fundamentalist at heart, just like my father. I was born to be a bomb thrower and a contrarian, a thorn in the side of the smug moderate establishment. Unfortunately, I was born in an era where literally everyone is a fucking fundamentalist and the establishment is holding on by a thread.
I’ve come to believe that the most quietly radical, rebellious thing I can possibly do is to be an institutionalist, someone who builds instead of performatively tearing it all down.
People need institutions. We crave the feeling of belonging to something much larger than ourselves. It’s one of the most universal experiences of our species.
One of the reasons modern life feels so fragmented and hard is because so many of our institutions have broken down or betrayed the people they were supposed to serve. So many of the associations that used to frame our lives and identities — church, government, military, etc — have tolerated or covered up so much predatory behavior and corruption, it no longer surprises anyone.
We’ve spent the past few decades ripping down institutions and drifting away from them. But we haven’t stopped wanting them, or needing them.
I hope, perhaps naively, that we are entering into a new era of rebuilding, sadder but wiser. An era of building institutions with accountability and integrity, institutions with enduring value, that we can belong to and take pride in… not because we were coerced or deceived, not because they were the only option, but because they bring us joy and meaning. Because we freely choose them, because they are good for us.
The second half of your career is about purpose
It seems very normal to enter the second half of your 40-year career thinking a lot about meaning and purpose. You spend the first decade or so hoovering up skill sets, the second finding your place and what feeds you, and then, inevitably, you start to think about what it all means and what your legacy will be.
That’s definitely where I’m at, as I think about the second half of my career. I want to take risks. I want to play big and win bigger. I want to show that hard work isn’t just a scam inflicted on those who don’t know any better. If we win, I want the people I work with to earn life-changing amounts of money, so they can buy homes and send their kids to college. I want to show that work can still be an avenue for liberation and community and personal growth, for those of us who still want that.
I care about this industry and the people in it so much, because it’s been such a gift to me. I want to do what I can to make it a better place for generations to come. I want to build institutions worth belonging to.
This article was originally commissioned by Luca Rossi (paywalled) for refactoring.fm, on February 11th, 2025. Luca edited a version of it that emphasized the importance of building “10x engineering teams”. It was later picked up by IEEE Spectrum (!!!), who scrapped most of the teams content and published a different, shorter piece on March 13th.
This is my personal edit. It is not exactly identical to either of the versions that have been publicly released to date. It contains a lot of the source material for the talk I gave last week at #LDX3 in London, “In Praise of ‘Normal’ Engineers” (slides), and a couple weeks ago at CraftConf.
In Praise of “Normal” Engineers
Most of us have encountered a few engineers who seem practically magician-like, a class apart from the rest of us in their ability to reason about complex mental models, leap to non-obvious yet elegant solutions, or emit waves of high quality code at unreal velocity.
I have run into any number of these incredible beings over the course of my career. I think this is what explains the curious durability of the “10x engineer” meme. It may be based on flimsy, shoddy research, and the claims people have made to defend it have often been risible (e.g. “10x engineers have dark backgrounds, are rarely seen doing UI work, are poor mentors and interviewers”), or blatantly double down on stereotypes (“we look for young dudes in hoodies that remind us of Mark Zuckerberg”). But damn if it doesn’t resonate with experience. It just feels true.
The problem is not the idea that there are engineers who are 10x as productive as other engineers. I don’t have a problem with this statement; in fact, that much seems self-evidently true. The problems I do have are twofold.
Measuring productivity is fraught and imperfect
First: how are you measuring productivity? I have a problem with the implication that there is One True Metric of productivity that you can standardize and sort people by. Consider, for a moment, the sheer combinatorial magnitude of skills and experiences at play:
Are you working on microprocessors, IoT, database internals, web services, user experience, mobile apps, consulting, embedded systems, cryptography, animation, training models for gen AI… what?
Are you using golang, python, COBOL, lisp, perl, React, or brainfuck? What version, which libraries, which frameworks, what data models? What other software and build dependencies must you have mastered?
What adjacent skills, market segments, or product subject matter expertise are you drawing upon…design, security, compliance, data visualization, marketing, finance, etc?
What stage of development? What scale of usage? What matters most — giving good advice in a consultative capacity, prototyping rapidly to find product-market fit, or writing code that is maintainable and performant over many years of amortized maintenance? Or are you writing for the Mars Rover, or shrinkwrapped software you can never change?
Also: people and their skills and abilities are not static. At one point, I was a pretty good DBRE (I even co-wrote the book on it). Maybe I was even a 10x DB engineer then, but certainly not now. I haven’t debugged a query plan in years.
“10x engineer” makes it sound like 10x productivity is an immutable characteristic of a person. But someone who is a 10x engineer in a particular skill set is still going to have infinitely more areas where they are normal or average (or less). I know a lot of world class engineers, but I’ve never met anyone who is 10x better than everyone else across the board, in every situation.
Engineers don’t own software, teams own software
Second, and even more importantly: So what? It doesn’t matter. Individual engineers don’t own software, teams own software. The smallest unit of software ownership and delivery is the engineering team. It doesn’t matter how fast an individual engineer can write software; what matters is how fast the team can collectively write, test, review, ship, maintain, refactor, extend, architect, and revise the software that they own.
Everyone uses the same software delivery pipeline. If it takes the slowest engineer at your company five hours to ship a single line of code, it’s going to take the fastest engineer at your company five hours to ship a single line of code. The time spent writing code is typically dwarfed by the time spent on every other part of the software development lifecycle.
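To put rough numbers on that, here is a minimal sketch of the arithmetic. The five-hour pipeline comes from the example above; the coding times (one hour vs. six minutes) and the function name are made up purely for illustration.

```python
def lead_time_hours(coding_hours: float, pipeline_hours: float) -> float:
    """Total time from starting a change to having it live in production."""
    return coding_hours + pipeline_hours


PIPELINE_HOURS = 5.0  # from the example above: five hours to ship one line

# Hypothetical coding times: a "normal" engineer vs. one who codes 10x faster.
normal = lead_time_hours(coding_hours=1.0, pipeline_hours=PIPELINE_HOURS)
fast = lead_time_hours(coding_hours=0.1, pipeline_hours=PIPELINE_HOURS)

# How much faster does the change actually reach users?
speedup = normal / fast
print(f"normal: {normal:.1f}h, 10x coder: {fast:.1f}h, end-to-end speedup: {speedup:.2f}x")
```

Under these assumed numbers, the engineer who writes code ten times faster ships only about 1.18x faster end to end. The shared pipeline dominates.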
If you have services or software components that are owned by a single engineer, that person is a single point of failure.
I’m not saying this should never happen. It’s quite normal at startups to have individuals owning software, because the biggest existential risk that you face is not moving fast enough, not finding product market fit, and going out of business. But as you start to grow up as a company, as users start to demand more from you, and you start planning for the survival of the company to extend years into the future…ownership needs to get handed over to a team. Individual engineers get sick, go on vacation, and leave the company, and the business has got to be resilient to that.
If teams own software, then the key job of any engineering leader is to craft high-performing engineering teams. If you must 10x something, 10x this. Build 10x engineering teams.
The best engineering orgs are the ones where normal engineers can do great work
When people talk about world-class engineering orgs, they often have in mind teams that are top-heavy with staff and principal engineers, or recruiting heavily from the ranks of ex-FAANG employees or top universities.
But I would argue that a truly great engineering org is one where you don’t HAVE to be one of the “best” or most pedigreed engineers in the world to get shit done and have a lot of impact on the business.
I think it’s actually the other way around. A truly great engineering organization is one where perfectly normal, workaday software engineers, with decent software engineering skills and an ordinary amount of expertise, can consistently move fast, ship code, respond to users, understand the systems they’ve built, and move the business forward a little bit more, day by day, week by week.
Any asshole can build an org where the most experienced, brilliant engineers in the world can build product and make progress. That is not hard. And putting all the spotlight on individual ability has a way of letting your leaders off the hook for doing their jobs. It is a HUGE competitive advantage if you can build sociotechnical systems where less experienced engineers can convert their effort and energy into product and business momentum.
A truly great engineering org also happens to be one that mints world-class software engineers. But we’re getting ahead of ourselves, here.
Let’s talk about “normal” for a moment
A lot of us technical people got really attached to our identities as smart kids. The software industry tends to reflect and reinforce this preoccupation at every turn, from Netflix’s “we look for the top 10% of global talent” to Amazon’s talk about “bar-raising” or Coinbase’s recent claim to “hire the top .1%”. (Seriously, guys? Ok, well, Honeycomb is going to hire only the top .00001%!)
In this essay, I would like to challenge us to set that baggage to the side and think about ourselves as normal people.
It can be humbling to think of ourselves as normal people, but most of us are in fact pretty normal people (albeit with many years of highly specialized practice and experience), and there is nothing wrong with that. Even those of us who are certified geniuses on certain criteria are likely quite normal in other ways — kinesthetic, emotional, spatial, musical, linguistic, etc.
Software engineering both selects for and develops certain types of intelligence, particularly around abstract reasoning, but nobody is born a great software engineer. Great engineers are made, not born. I just don’t think there’s a lot more we can get out of thinking of ourselves as a special class of people, compared to the value we can derive from thinking of ourselves collectively as relatively normal people who have practiced a fairly niche craft for a very long time.
Build sociotechnical systems with “normal people” in mind
When it comes to hiring talent and building teams, yes, absolutely, we should focus on identifying the ways people are exceptional and talented and strong. But when it comes to building sociotechnical systems for software delivery, we should focus on all the ways people are normal.
Normal people have cognitive biases — confirmation bias, recency bias, hindsight bias. We work hard, we care, and we do our best; but we also forget things, get impatient, and zone out. Our eyes are inexorably drawn to the color red (unless we are colorblind). We develop habits and ways of doing things, and resist changing them. When we see the same text block repeatedly, we stop reading it.
We are embodied beings who can get overwhelmed and fatigued. If an alert wakes us up at 3 am, we are much more likely to make mistakes while responding to that alert than if we tried to do the same thing at 3pm. Our emotional state can affect the quality of our work. Our relationships impact our ability to get shit done.
When your systems are designed to be used by normal engineers, all that excess brilliance they have can get poured into the product itself, instead of being wasted on navigating the system.
How do you turn normal engineers into 10x engineering teams?
None of this should be terribly surprising; it’s all well known wisdom. In order to build the kind of sociotechnical systems for software delivery that enable normal engineers to move fast, learn continuously, and deliver great results as a team, you should:
Shrink the interval between when you write the code and when the code goes live.
Make it as short as possible; the shorter the better. I’ve written and given talks about this many, many times. The shorter the interval, the lower the cognitive carrying costs. The faster you can iterate, the better. The more of your brain can go into the product instead of the process of building it.
One of the most powerful things you can do is have a short, fast enough deploy cycle that you can ship one commit per deploy. I’ve referred to the opposite as the “software engineering death spiral” … when the deploy cycle takes so long that you end up batching together a bunch of engineers’ diffs in every build. The slower it gets, the more you batch up, and the harder it becomes to figure out what happened or roll back. The longer it takes, the more people you need, the higher the coordination costs, and the more slowly everyone moves.
Deploy time is the feedback loop at the heart of the development process. It is almost impossible to overstate the centrality of keeping this short and tight.
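The batching effect described above can be sketched as a toy model. Everything here is an assumption for illustration: deploys run back to back, commits land at a steady rate, and a bad commit is found by bisecting the batch with one extra deploy per step. The function names are hypothetical.

```python
import math


def commits_per_deploy(commits_per_hour: float, deploy_hours: float) -> int:
    """Commits that pile up while one deploy runs (deploys assumed back to back)."""
    return max(1, math.ceil(commits_per_hour * deploy_hours))


def bisect_deploys_needed(batch_size: int) -> int:
    """Worst-case extra deploys to isolate one bad commit in a batch by bisection."""
    return math.ceil(math.log2(batch_size)) if batch_size > 1 else 0


COMMITS_PER_HOUR = 4  # hypothetical team-wide merge rate

for deploy_hours in (0.25, 1.0, 3.0):
    batch = commits_per_deploy(COMMITS_PER_HOUR, deploy_hours)
    extra = bisect_deploys_needed(batch)
    print(
        f"deploy takes {deploy_hours}h -> {batch} commit(s) per deploy, "
        f"up to {extra} extra deploy(s) ({extra * deploy_hours}h) to find a bad one"
    )
```

With these made-up numbers, a 15-minute deploy ships one commit at a time, so there is nothing to untangle when it breaks. A 3-hour deploy batches 12 commits, and isolating the bad one can take up to 4 more deploys, or another 12 hours.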
Make it easy and fast to roll back or recover from mistakes.
Developers should be able to deploy their own code, figure out if it’s working as intended or not, and if not, roll forward or back swiftly and easily. No muss, no fuss, no thinking involved.
Make it easy to do the right thing and hard to do the wrong thing.
Wrap designers and design thinking into all the touch points your engineers have with production systems. Use your platform engineering team to think about how to empower people to swiftly make changes and self-serve, but also remember that a lot of times people will be engaging with production late at night or when they’re very stressed, tired, and possibly freaking out. Build guard rails. The fastest way to ship a single line of code should also be the easiest way to ship a single line of code.
Invest in instrumentation and observability.
You’ll never know — not really — what the code you wrote does just by reading it. The only way to be sure is by instrumenting your code and watching real users run it in production. Good, friendly sociotechnical systems invest heavily in tools for sense-making.
Being able to visualize your work is what makes engineering abstractions accessible to actual engineers. You shouldn’t have to be a world-class engineer just to debug your own damn code.
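As a rough illustration of what "instrumenting your code" can look like, here is a minimal sketch using only the Python standard library. The `instrumented` decorator, the `checkout` function, and the event fields are all hypothetical; a real setup would send these events to whatever observability tool you use instead of appending them to a list.

```python
import functools
import json
import time


def instrumented(emit=print):
    """Wrap a function so each call emits one structured JSON event."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            event = {"name": fn.__name__}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                event["ok"] = True
                return result
            except Exception as exc:
                # Record what went wrong, then let the error propagate unchanged.
                event["ok"] = False
                event["error"] = type(exc).__name__
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                emit(json.dumps(event))
        return wrapper
    return decorator


events = []  # stand-in for a real event pipeline


@instrumented(emit=events.append)
def checkout(cart_total: float) -> float:
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return cart_total


checkout(10.0)
try:
    checkout(-1.0)
except ValueError:
    pass

for line in events:
    print(line)
```

Each call produces one event with its name, outcome, and duration, so you can see the failures and the slow paths that real usage hits without guessing from the source.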
Devote engineering cycles to internal tooling and enablement.
If fast, safe deploys, with guard rails, instrumentation, and highly parallelized test suites are “everybody’s job”, they will end up nobody’s job. Engineering productivity isn’t something you can outsource. Managing the interfaces between your software vendors and your own teams is both a science and an art. Making it look easy and intuitive is really hard. It needs an owner.
Build an inclusive culture.
Growth is the norm, growth is the baseline. People do their best work when they feel a sense of belonging. An inclusive culture is one where everyone feels safe to ask questions, explore, and make mistakes; where everyone is held to the same high standard, and given the support and encouragement they need to achieve their goals.
Diverse teams are resilient teams.
Yeah, a team of super-senior engineers who all share a similar background can move incredibly fast, but a monoculture is fragile. Someone gets sick, someone gets pregnant, you start to grow and you need to integrate people from other backgrounds and the whole team can get derailed — fast.
When your teams are used to operating with a mix of genders, racial backgrounds, identities, age ranges, family statuses, geographical locations, skill sets, etc — when this is just table stakes, standard operating procedure — you’re better equipped to roll with it when life happens.
Assemble engineering teams from a range of levels.
The best engineering teams aren’t top-heavy with staff engineers and principal engineers. The best engineering teams are ones where nobody is running on autopilot, banging out a login page for the 300th time; everyone is working on something that challenges them and pushes their boundaries. Everyone is learning, everyone is teaching, everyone is growing. All the time.
By the way — all of that work you put into making your systems resilient, well-designed, and humane is the same work you would need to do to help onboard new engineers, develop junior talent, or let engineers move between teams.
It gets used and reused. Over and over and over again.
The only meaningful measure of productivity is impact to the business
The only thing that actually matters when it comes to engineering productivity is whether or not you are moving the business materially forward.
Which means…we can’t do this in a vacuum. The most important question is whether or not we are working on the right thing, which is a problem engineering can’t answer without help from product, design, and the rest of the business.
Software engineering isn’t about writing lots of lines of code, it’s about solving business problems using technology.
Senior and intermediate engineers are actually the workhorses of the industry. They move the business forward, step by step, day by day. They get to put their heads down and crank instead of constantly looking around the org and solving coordination problems. If you have to be a staff+ engineer to move the product forward, something is seriously wrong.
Great engineering orgs mint world-class engineers
A great engineering org is one where you don’t HAVE to be one of the best engineers in the world to have a lot of impact. But — rather ironically — great engineering orgs mint world class engineers like nobody’s business.
The best engineering orgs are not the ones with the smartest, most experienced people in the world, they’re the ones where normal software engineers can consistently make progress, deliver value to users, and move the business forward, day after day.
Places where engineers can get shit done and have a lot of impact are a magnet for top performers. Nothing makes engineers happier than building things, solving problems, making progress.
If you’re lucky enough to have world-class engineers in your org, good for you! Your role as a leader is to leverage their brilliance for the good of your customers and your other engineers, without coming to depend on their brilliance. After all, these people don’t belong to you. They may walk out the door at any moment, and that has to be okay.
These people can be phenomenal assets, assuming they can be team players and keep their egos in check. Which is probably why so many tech companies seem to obsess over identifying and hiring them, especially in Silicon Valley.
But companies categorically overindex on finding these people after they’ve already been minted, which ends up reinforcing and replicating all the prejudices and inequities of the world at large. Talent may be evenly distributed across populations, but opportunity is not.
Don’t hire the “best” people. Hire the right people.
We (by which I mean the entire human race) place too much emphasis on individual agency and characteristics, and not enough on the systems that shape us and inform our behaviors.
I feel like a whole slew of issues (candidates self-selecting out of the interview process, diversity of applicants, etc) would be improved simply by shifting the focus of engineering hiring and interviewing away from this inordinate emphasis on hiring the BEST PEOPLE and realigning around the more reasonable and accurate RIGHT PEOPLE.
It’s a competitive advantage to build an environment where people can be hired for their unique strengths, not their lack of weaknesses; where the emphasis is on composing teams rather than hiring the BEST people; where inclusivity is a given both for ethical reasons and because it raises the bar for performance for everyone. Inclusive culture is what actual meritocracy depends on.
This is the kind of place that engineering talent (and good humans) are drawn to like moths to a flame. It feels good to ship. It feels good to move the business forward. It feels good to sharpen your skills and improve your craft. It’s the kind of place that people go when they want to become world-class engineers. And it’s the kind of place where world-class engineers want to stick around, to train up the next generation.
A few eagle-eyed readers have noticed that it’s been 4 weeks since my last entry in what I have been thinking of as my “niblet series” — one small piece per week, 1000 words or less, for the next three months.
This is true. However, I did leave myself some wiggle room in my original goal, when I said “weeks when I am not traveling”, knowing I was traveling 6 of the next 7 weeks. I was going to TRY to write something on the weeks I was traveling, but as you can see, I mostly did not succeed. Oh well!
Honestly, I don’t feel bad about it. I’ve written well over 1k words on bsky over the past two weeks in the neverending thread on the costs and tradeoffs of remote work. (A longform piece on the topic is coming soon.) I also wrote a couple of lengthy internal pieces.
This whole experiment was designed to help me unblock my writing process and try out new habits, and I think I’m making progress. I will share what I’m learning at a later date, but for now: onward!
How long does it take to form an impression of a new job?
This week’s niblet was inspired by a conversation I had yesterday with an internet friend. To paraphrase (and lightly anonymize) their question:
“I took a senior management role at this company six months ago. My search for this role was all about values alignment, from company mission to leadership philosophy, and the people here said all the right things in the process. But it’s just not clicking.
It’s only been six months, but it’s starting to feel like it might not work out. How much longer should I give it?”
Zero. You should give it 0 time. You already know, and you’ve known for a long time; it’s not gonna change. I’m sorry. 💔
I’m not saying you should quit tomorrow (a person needs a paycheck), but you should probably start thinking in terms of how to manage the problem and extricate yourself from it, not like you’re waiting to see if it will be a good fit.
Every job I’ve ever had has made a strong first impression
I’ve had…let’s see…about six different employers, over the course of my (post-university) career.
Every job I’ve ever taken, I knew within the first week whether it was right for me or not. That might be overstating things a bit (memory can be like that). But I definitely had a strong visceral reaction to the company within days after starting, and the rest of my tenure played out more or less congruent with that reaction.
The first week at EVERY job is a hot mess of anxiety and nerves and second-guessing yourself and those around you. It’s never warm fuzzies. But at the jobs I ended up loving and staying at long term, the anxiety was like “omg these people are so cool and so great and so fucking competent, I hope I can measure up to their expectations.”
And then there were the jobs where the anxiety I felt was more like a sinking sensation of dread, of “oooohhh god I hope this is a one-off and not the kind of thing I will encounter every day.”
🌸 There was the job where they had an incident on my very first day, and by 7 pm I was like “why isn’t someone telling me I should go home?” There was literally nothing I could do to help, I was still setting up my accounts, yet I had the distinct impression I was expected to stay.
This job turned out to be stereotypically Silicon Valley in the worst ways, hiring young, cheap engineers and glorifying coding all night and sleeping under your desk.
🌼 There was the job where they were walking me through a 50-page Microsoft Word doc on how to manage replication between DB nodes, and I laughed a little, and looked for some rueful shared acknowledgement of how shoddy this was…but I was the only one laughing.
That job turned out to be shoddy, ancient, flaky tech all the way down, with comfortable, long-tenured staff who didn’t know (and did NOT want to hear) how out of date their tech had become.
Over time, I learned to trust that intuition
Around the time I became a solidly senior engineer, I began to reflect on how indelible my early impressions of each job had been, and how reliable those impressions had turned out to be.
To be clear, I don’t regret these jobs. I got to work with some wonderful people, and I got to experience a range of different organizational structures and types. I learned a lot from every single one of my jobs.
Perhaps most of all, I learned how to sniff out particular environments that really do not work for me, and I never made the same mistake twice.
Companies can and do change dramatically. But absent dramatic action, which can be quite painful, they tend to drift along their current trajectory.
This matters even more for managers
This is one of those ways that I think the work of management is different from the work of engineering. As an experienced IC, it’s possible to phone it in and still do a good job. As long as you’re shipping at an acceptable rate, you can check out mentally and emotionally, even work for people or companies you basically despise.
Lots of people do in fact do this. Hell, I’ve done it. You aren’t likely to do the best work of your life under these circumstances, but people have done far worse to put food on the table.
An IC can wall themselves off emotionally and still do acceptable work, but I’m not sure a manager can do the same.
Alignment *is* the job of management
As a manager, you literally represent the company to your team and those around you. You don’t have to agree with every single decision the company makes, but if you find yourself constantly having to explain and justify things the company has done that deeply violate your personal beliefs or ethics, it does you harm.
Some managers respond to a shitty corporate situation by hunkering down and behaving like a shit umbrella; doing whatever they can to protect their people, at the cost of undermining the company itself. I don’t recommend this, either. It’s not healthy to know you walk around every day fucking over one of your primary stakeholders, whether it’s the company OR your teammates.
There are also companies that aren’t actually that bad, but you just aren’t aligned with them. That’s fine. Alignment matters a lot more for managers than for ICs, because alignment is the job.
Management is about crafting and tending to complex sociotechnical systems. No manager can do this alone. Having a healthy, happy team of direct reports is only a fraction of the job description. It’s not enough. You can and should expect more.
What can you learn from the experience?
I asked my friend to think back to the interview process. What were the tells? What do they wish they had known to watch out for?
They thought for a moment, then said:
“Maybe the fact that the entire leadership team had been grown or promoted from within. SOME amount of that is terrific, but ALL of it might be a yellow flag. The result seems to be that everyone else thinks and feels the same way…and I think differently.”
This is SO insightful.
It reminds me of all the conversations Emily and I have had over the years, on how to balance developing talent from within vs bringing in fresh perspectives, people who have already seen what good looks like at the next stage of growth, people who can see around corners and challenge us in different ways.
This is a tough thing to suss out from the outside, especially when the employer is saying all the right things. But having an experience like this can inoculate you against an entire family of related mistakes. My friend will pick up on this kind of insularity from miles away, from now on.
Bad jobs happen. Interviews can only predict so much. A person who has never had a job they disliked is a profoundly lucky person. In the end, sometimes all you can take is the lessons you learned and won’t repeat.
The pig is committed
Have you ever heard the metaphor of the chicken vs the pig? The chicken contributes an egg to breakfast, the pig contributes bacon. The punch line goes something like, “the chicken is involved, but the pig is committed!”
It’s vivid and a bit over the top, but I kept thinking about it while writing this piece. The engineer contributes their labor and output to move the company forward, but the manager contributes their emotional and relational selves — their humanity — to serve the cause.
You only get one career. Who are you going to give your bacon to?
Hi friends! We’re on week three of my 12-week practice in writing one bite-sized topic per week — scoping it down, writing straight through, trying real hard to avoid over-writing or editing down to a pulp.
Three points in a row make a line, and three posts in a row called “On [Something or Other]” is officially a pattern.
It was an accidental repeat last week (move fast and break things! 🙈), but I think I like it, so I’m sticking with it.
Next on the docket: pronouns and mandates
This week I would like to talk about pronouns (as in “my name is Charity, my pronouns are she/her or they/them”) and pronoun mandates, in the context of work.
Here’s where I stand, in brief:
Making it safe to disclose the pronouns you use: ✨GOOD✨
Normalizing the practice of sharing your pronouns: ✨GOOD✨
Mandating that everyone share their pronouns: ✨BAD✨
This includes soft mandates, like when a manager or HR asks everyone at work to share their pronouns when introducing themselves, or making pronouns a required field in email signatures or display names.
I absolutely understand that people who do this are acting in good faith, trying to be good allies. But I do not like it. 😡 And I think it can massively backfire!
Here are my reasons.
I resent being forced to pick a side in public
I have my own gender issues, y’all. Am I supposed to claim “she/her” or “they/them”? Ugh, I don’t know. I’ve never felt any affinity with feminine pronouns or identity, but I don’t care enough to correct anyone or assert a preference for they/them. Ultimately, the strongest feeling I have about my gender is apathy/discomfort/irritation. Maybe that will change someday, maybe it won’t, but I resent being forced to pick a side and make some kind of public declaration when I’m just trying to do my goddamn job. My gender doesn’t need to be anyone else’s business.
I totally acknowledge that it is valuable for cis people to help normalize the practice by sharing their pronouns. (It never fails to warm the cockles of my cold black heart when I see a graying straight white dude lead with “My pronouns are he/him” in his bio. Charmed! 😍)
If I worked at a company where this was not commonly done, I would suck it up and take one for the team. But I don’t feel the need, because it is normalized here. We have loads of other queer folks, my cofounder shares her pronouns. I don’t feel like I’m hurting anyone by not doing it myself.
Priming people with gender cues can be…unwise
One of the engineering managers I work with, Hannah Henderson, once told me that she has always disliked pronoun mandates for a different reason. Research shows that priming someone to think of you as a woman first and foremost generally leads them to think of you as being less technical, less authoritative, even less competent.
Great, just what we need.
What about people who don’t know, or aren’t yet out?
Some people may be in a transitional phase, or may be in the process of coming out as trans or genderqueer or nonbinary, or maybe they don’t know yet. Gender is a deeply personal question, and it’s inappropriate to force people to take a stand or pick a side in public or at work.
If **I** feel this way about pronoun mandates (and keep in mind that I am queer, have lived in San Francisco for 20 years, and am married to a genderqueer trans person), I can’t imagine how offputting and irritating these mandates must be to someone who holds different values, or comes from a different cultural background.
You can’t force someone to be a good ally
As if that wasn’t enough, pronoun mandates also have a flattening effect, eliminating useful signal about who is willing to stand up and identify themselves as someone who is a queer ally, and/or is relatively informed about gender issues.
As a friend commented, when reviewing a draft of this post: “Mandating it means we can’t look around the room and determine who might be friendly or safe, while also escalating resentment that bigots hold towards us.”
A couple months back I wrote a long (LONG) essay detailing my mixed feelings about corporate DEI initiatives. One of the points I was trying to land is how much easier it is to make and enforce rules, if you’re in a position with the power to do so, than to win hearts and minds. Rules always have edge cases and unintended consequences, and the backlash effect is real. People don’t like being told what to do.
Pronoun mandates were at the top of my mind when I wrote that, and I’ve been meaning to follow up and unpack this ever since.
Til next week, when we’ll talk “On something or some other thing”,
~charity💕
In my early twenties I had a cohort of friends and coworkers, all Silicon Valley engineers, all quite good at their jobs, all college dropouts. We developed a shared conviction that only losers got computer science degrees. This sounds like a joke, or a self-defense mechanism, but it was neither. We were serious.
We held CS grads in contempt, as a class. We privately mocked them. When interviewing candidates, we considered it a knock against someone if they graduated — not an insuperable one by any means, but certainly a yellow flag, something to be probed in the interview process, to ensure they had good judgment and were capable of learning independently and getting shit done, despite all evidence to the contrary.
We didn’t look down on ALL college graduates (that would be unreasonable). If you went to school to study something like civil engineering, or philosophy, or Russian literature, good for you! But computers? Everything in my experience led me to conclude that sitting in a classroom studying computers was a waste of time and money.
I had evidence! I worked my way through school — as the university sysadmin, at a local startup — and I had always learned soooo much more from my work than my classes. The languages and technologies they taught us were consistently years out of date. Classes were slow and plodding. Our professors lectured on and on about “IN-dustry” in a way that made it abundantly clear that they had no recent, relevant experience.
College dropouts: the original bootstrappers
The difference became especially stark after I spent a year working in Silicon Valley. I then returned to school, fully intending to finish and graduate, but I could not focus; I was bored out of my skull.
How could anyone sit through that amount of garbage? Wouldn’t anyone with an ounce of self-respect and intrinsic motivation have gotten up off their butts and learned what they needed to know much faster on their own? For fuck’s sake! Just google it!
My friends and I rolled our eyes at each other and sighed over these so-called software engineers with degrees, who apparently needed their learning doled out in small bites and spoon-fed to them, like children. Who wanted to work with someone with such a high tolerance for toil and bullshit?
Meanwhile we, the superior creatures, had simply figured out whatever the fuck we needed to learn by reading the source code, reading books and manuals, trying things out. We pulled OUR careers up by our own bootstraps, goddammit. Why couldn’t they? What was WRONG with them??
We knew so many deeply mediocre software engineers who had gotten their bachelor’s degree in computer science, and so many exceptional engineers with arts degrees or no degrees, that it started to feel like a rule or something.
Were they cherrypicked examples? Of course they were. That’s how these things work.
People are really, really good at justifying their status
Ever since then, I’ve met wave after wave of people in this industry who are convinced they know how to sift “good” talent from “bad” via easily detected heuristics. They’re mostly bullshit.
Which is not to say that heuristics are never useful, or that any of us can afford to expend infinite amounts of time sifting through prospects on the off chance that we miss a couple quality candidates. They can be useful, and we cannot.
However, I have retained an abiding skepticism of heuristics that serve to reinforce existing power structures, or ones that just so happen to overlap with the background of the holder of said heuristics.
Those of us who work in tech are fabulously fortunate; in terms of satisfying, remunerative career outcomes, we are easily in the top .0001% of all humans who have ever lived. Maybe this is why so many of us seem to have some deep-seated compulsion to prove that we belong here, no really, people like me deserve to be here.
This calls for some humility
If nothing else, I think it calls for some humility. I don’t feel like I “deserve” to be here. I don’t think any of us do. I think I worked really fucking hard and I got really fucking lucky. Both can be true. Some of the smartest kids I grew up with are now pumping gas or dead. Almost none of the people I grew up with ever reached escape velocity and made it out of our small town.
When I stop to think about it, it scares me how lucky I got. How lucky I am to have grown up when I did, to have entered tech when I did, when the barriers to entry were so low and you really could just learn on the job, if you were willing to work your ass off. I left home when I was 15 to go to college, and put myself through largely on minimum wage jobs. Even five years later, I couldn’t have done that.
There was a window of time in the 2000s when tech was an escalator to the middle class for a whole generation of weirdos, dropouts and liberal arts misfits. That window has been closed for a while now. I understand why the window closed, and why it was inevitable (software isn’t a toy anymore), but it’s still.. bittersweet.
I guess I’m just really grateful to be here.
~charity
Experiment update
As I wrote last week, I’m trying to reset my relationship with writing, by publishing one short blog post per week: under 1000 words, minimal editing. And that marks week 2: 942 words.
Brace yourself, because I’m about to utter a sequence of words I never thought I would hear myself say:
I really miss posting on Twitter.
I really, really miss it.
It’s funny, because Twitter was never not a trash fire. There was never a time when it felt like we were living through some kind of hallowed golden age of Twitter. I always felt a little embarrassed about the amount of time I spent posting.
Or maybe you only ever really see golden ages in hindsight.
I joined Twitter in 2009, and was an intermittent user for years. But it was after we started working on Honeycomb that Twitter became a lifeline, a job, a huge part of my everyday life.
Without Twitter, there would be no Honeycomb
Every day I would leave the house, look down at my phone, and start pecking out tweets as I walked to work. I turned out these mammoth threads about instrumentation, cardinality, storage engines, etc. Whatever was on my mind that day, it fed into Twitter.
In retrospect, I now realize that I was doing things like “outbounding” and “product marketing” and “category creation”, but at the time it felt more like oxygen.
Working out complex technical concepts in public, in real time, seeing what resonated, batting ideas back and forth with so many other smart, interesting people online…it was heady shit.
In the early days, we actually thought that Honeycomb-style observability (high cardinality, slice-and-dice, explorability, etc) was something only super large, multi-tenant platforms would ever care about or be willing to pay for. It was the conversations we were having on Twitter, the intensity of people’s reactions, that made us realize that no, actually; this was fast becoming an everybody problem.
Twitter was my most reliable source of dopamine
It’s impossible to talk about Twitter’s impact on my life and career without also acknowledging the ways I used it to self-medicate.
My ADHD was unmanaged, unmedicated, and unknown to me in those years. In retrospect, I can see that my only tool as an engineer was hyperfocus, and I rode that horse into the ground. When I unexpectedly became CEO, my job splintered into a million little bite sized chunks of time, and hyperfocus was no longer available to me. The tools I did have were Twitter and sleep deprivation.
Lack of sleep, it turns out, can wind me down and help me focus. If I’ve been awake for over 24 hours, I can buckle down and force myself to grind through things like email, expense reports, or writing marketing copy. Sleep deprivation is not pleasant, it’s actually really fucking painful, but it works. So I did it. From 2016 to 2020, I slept only once every two or three days. (People always think I am exaggerating when I say this, but people closer to me know that this is probably an understatement.)
But Twitter, you dear, dysfunctional hellsite… Twitter could wind me up.
I would go for a walk, pound out a fifty-tweet thread, and arrive at my destination feeling all revved up.
I picked fights, I argued. I was combative and aggressive in public, and I loved it. I regret some of it now; I burned some good relationships, and I burned out my adrenal glands. But I would sit down at my desk feeling high on dopamine, and I could channel that high into focus. It’s the only way I got shit done.
I got my ADHD diagnosis in 2020 (thank the gods). Since then I’ve done medication, coaching, therapy in several modalities, cats… I’ve tried it all, and a lot of it has helped. I sleep every single night now.
Most of the people I used to love talking with on X seem to have abandoned it to the fascists. LinkedIn is performatively corporate and has no soul. I’m still on Bluesky, but it’s a bit of an echo chamber and people mostly talk about politics; that is not what I go to social media for. The noisy, combative tech scene I loved doesn’t really seem to exist anymore.
These days I use social media less than ever, but I am learning that my writing is more important to me than ever. Which is forcing me to reckon with the fact that my writing process may no longer fit or serve the function I need it to.
Most of those epic threads I put so much time and energy into crafting have vanished into the ether. The few that I bothered to convert into essay format are the only ones that have endured.
I’ve been writing in public for ten years now
Do you ever hear yourself say something, causing you to pause, surprised: “I guess that’s a thing I believe”?
There are very few things in life that I am prouder of than the body of writing I have developed over the past 10 years.
When I look back over things I have written, I feel like I can see myself growing up, my mental health improving, I’m getting better at taking the long view, being more empathetic, being less reactive… I’ve never graduated from anything in my life, so to me, my writing kind of externalizes the progress I’ve made as a human being. It’s meaningful to me.
Huh. Turns out that’s a thing I believe. 🤔
I wrote my first post on this site in December of 2015. It’s crazy to look back on all the different things I have written about here over the past ten years — book reviews, boba recipes, technology, management, startup life, and more.
Even more mindblowing is when I look at my drafts folder, my notes folders. The hundreds of ideas or pieces I wanted to write about, or started writing about, but never found the time to polish or finish. Whuf.
I need to learn how to write shorter, faster pieces, without the buffer of social media
From 2015 to somewhere in the 2021-2023 timeframe, thoughts and snippets of writing were pouring out of me every day, mostly feeding the Twitter firehose. Only a few of those thoughts ever graduated into blog post form, but those few are the ones that have endured and had the most impact.
Over the past 2-4 years, I’ve been writing less frequently, less consistently, and mostly in blog post form. My posts, meanwhile, have gotten longer and longer. I keep shipping these 5000-9000-word monstrosities (I’m so sorry 🤦). I sometimes wonder who, if anyone, ever reads the whole thing.
The problem is that I keep writing myself into a ditch. I pick up a topic, and start writing, and somehow it metastasizes. It expands to consume all available time and space (and then some). By the time I’ve finished editing it down, weeks if not months have passed, and I have usually grown to loathe the sight of it.
For most of my adult life, I’ve relied on hard deadlines and panic to drive projects to completion, or to determine the scope of a piece. I’ve relied on anger and adrenaline rushes to fuel my creative juices, and due dates and external pressure to get myself over the finish line.
And what does that finish line look like? Running out of time, of course! I know I’m done because I have run out of time to work on it. No wonder scoping is such a problem for me.
A three month experiment in writing bite sized pieces
I need to learn to write in a different way. I need to learn to draft without Twitter, scope without deadlines. Over the next five years, I want to get a larger percentage of my thoughts shipped in written form, and I don’t want them to evaporate into the ether of social media. This means I need to make some changes.
write shorter pieces
spend less time writing and editing
find the line of embarrassment, and hug it.
For the next three months, I am going to challenge myself to write one blog post per week (travel weeks exempt). I will try to cap each one under 1000 words (but not obsess over it, because the point is to edit less).
I’m writing this down as a public commitment and accountability mechanism.
So there we go, 1473 words. Just above the line of embarrassment.
Groan. Well, it’s not like I wasn’t warned. When I first started teasing out the differences between the pillars model and the single unified storage model and applying “2.0” to the latter, Christine was like “so what is going to stop the next vendor from slapping 3.0, 4.0, 5.0 on whatever they’re doing?”
I love Matt Klein’s writing — it’s opinionated, passionate, and deeply technical. It’s a joy to read, full of fun, fiery statements about the “logging industrial complex” and backhanded… let’s call them “compliments”… about companies like ours. I’m a fan, truly.
In retrospect, I semi regret the “o11y 2.0” framing
Yeah, it’s cheap and terribly overdone to use semantic versioning as a marketing technique. (It worked for Tim O’Reilly with “Web 2.0”, but Tim O’Reilly is Tim O’Reilly — the exception that proves the rule.) But that’s not actually why I regret it.
I regret it because a bunch of people — vendors mostly, but not entirely — got really bristly about having “1.0” retroactively applied to describe the multiple pillars model. It reads like a subtle diss, or devaluation of their tools.
One of the principles I live my life by is that you should generally call people, or groups of people, what they want to be called.
That is why, moving forwards, I am going to mostly avoid referring to the multiple pillars model as “o11y 1.0”, and instead I will call it the … multiple pillars model. And I will refer to the unified storage model as the “unified or consolidated storage model, sometimes called ‘o11y 2.0’”.
It is clearer than ever that a sea change is underway when it comes to how telemetry gets collected and stored. Here is my evidence (if you have evidence to the contrary or would like to challenge me on this, please reach out — first name at honeycomb dot io, email me!!):
Every single observability startup that was founded before 2021, that still exists, was built using the multiple pillars model … storing each type of signal in a different location, with limited correlation ability across data sets. (With one exception: Honeycomb.)
Every single observability startup that was founded after 2021, that still exists, was built using the unified storage model, capturing wide, structured log events, stored in a columnar database. (With one exception: Chronosphere.)
The major cost drivers in an o11y 1.0 — oop, sorry, in a “multiple pillars” world, are 1) the number of tools you use, 2) cardinality of your data, and 3) dimensionality of your data — or in other words, the amount of context and detail you store about your data, which is the most valuable part of the data! You get locked in a zero sum game between cost and value.
The major cost drivers in a unified storage world, aka “o11y 2.0”, are 1) your traffic, 2) your architecture, and 3) density of your instrumentation. This is important, because it means your cost growth should roughly align with the growth of your business and the value you get out of your telemetry.
This is a pretty huge shift in the way we think about instrumentation of services and levers of cost control, with a lot of downstream implications. If we just say “everything is observability”, it robs engineers of the language they need to make smart decisions about instrumentation, telemetry and tools choices. Language informs thinking and vice versa, and when our cognitive model changes, we need language to follow suit.
(Technically, we started out by defining observability as differentiated from monitoring, but the market has decided that everything is observability, so … we need to find new language, again. 😉)
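To put rough numbers on why cardinality and dimensionality drive cost in the multiple pillars world, here is a back-of-the-envelope sketch. It assumes a metrics backend that stores one time series per unique combination of tag values, which is a common design but my assumption here; the tag names and counts are invented, and the result is a worst-case upper bound (every combination actually occurring).

```python
from math import prod


def worst_case_series(tag_cardinalities):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each tag can take."""
    return prod(tag_cardinalities.values())


# Hypothetical tags on a single request-latency metric.
base_tags = {"endpoint": 50, "status_code": 5, "region": 4}

# Add the one dimension you actually want when debugging: which customer?
with_customer = {**base_tags, "customer_id": 10_000}

print(worst_case_series(base_tags))      # 50 * 5 * 4
print(worst_case_series(with_customer))  # the same, times 10,000
```

One extra high-cardinality tag multiplies the bill, which is why the most useful context is usually the first thing to get cut.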
Can we just … not send all that data?
My favorite of Matt’s blog posts is “Why is observability so expensive?” wherein he recaps the last 30 years of telemetry, gives some context about his work with Envoy and the separation of control planes / data planes, all leading up to this fiery proposition:
“What if by default we never send any telemetry at all?”
As someone who is always rooting for the contrarian underdog, I salute this. 🫡
As someone who has written and operated a ghastly amount of production services, I am not so sure.
Matt is the cofounder and CTO of Bitdrift, a startup for mobile observability. And in the context of mobile devices and IoT, I think it makes a lot of sense to gather all the data and store it at the origin, and only forward along summary statistics, until or unless that data is requested in fine granularity. Using the ring buffer is a stroke of genius.
Mobile devices are strictly isolated from each other, they are not competing with each other for shared resources, and the debugging model is mostly offline and ad hoc. It happens whenever the mobile developer decides to dig in and start exploring.
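For anyone who hasn't met a ring buffer before, the general idea looks something like the sketch below. This is my own toy illustration of "store at the origin, forward only summaries until asked", not a description of how Bitdrift actually implements it; the class and method names are hypothetical.

```python
from collections import deque


class LocalTelemetryBuffer:
    """Keeps only the most recent events on the device (illustrative only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full,
        # which is exactly ring-buffer behavior.
        self._events = deque(maxlen=capacity)
        self._total_seen = 0

    def record(self, event):
        self._events.append(event)
        self._total_seen += 1

    def summary(self):
        """Cheap statistics, forwarded by default."""
        return {
            "events_seen": self._total_seen,
            "events_buffered": len(self._events),
        }

    def flush(self):
        """Full detail, sent only when someone explicitly asks for it."""
        events = list(self._events)
        self._events.clear()
        return events
```

The device pays a fixed, bounded amount of memory, and nothing detailed leaves it until a developer decides to go digging.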
It’s less clear to me that this model will ever serve us well in the environment of highly concurrent, massively multi-tenant services, where two of the most important questions are always what is happening right now, and what just changed?
Even the 60-second aggregation window for traditional metrics collectors is a painful amount of lag when the site is down. I can’t imagine waiting to pull all the data in from hundreds or thousands of remote devices just to answer a question. And taking service isolation to such an extreme effectively makes traces impossible.
The hunger for more cost control levers is real
I think there’s a kernel of truth there, which is that the desire to keep a ton of rich telemetry detail about a fast-expanding footprint of data in a central location is not ultimately compatible with what people are willing or able to pay.
The fatal flaw of the multiple pillars model is that your levers of control consist of deleting your most valuable data: context and detail. The unified storage (o11y 2.0) model advances the state of the art by giving you tools that let you delete your LEAST valuable data, via tail sampling.
In a unified storage model, you should also have to store your data only once, instead of once per tool. (Gartner data shows that most of their clients are using 10-20 tools, which is a hell of a cost multiplier.)
But I also think Matt’s right to say that these are only incremental improvements. And the cost levers I see emerging in the market that I’m most excited about are model agnostic.
Telemetry pipelines, tiered storage, data governance
The o11y 2.0 model (with no aggregation, no time bucketing, no indexing jobs) allows teams to get their telemetry faster than ever… but it does this by pushing all aggregation decisions from write time to read time. Instead of making a bunch of decisions at the instrumentation level about how to aggregate and organize your data… you store raw, wide structured event data, and perform ad hoc aggregations at query time.
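If the write time vs. read time distinction feels abstract, here is a minimal sketch of the difference. The event fields (`endpoint`, `status`, `duration_ms`, `customer_id`) and function names are made up for illustration; this is not any vendor's schema or query API.

```python
from collections import defaultdict

# Hypothetical wide, structured events: one dict per request.
EVENTS = [
    {"endpoint": "/login", "status": 200, "duration_ms": 12, "customer_id": "a"},
    {"endpoint": "/login", "status": 500, "duration_ms": 480, "customer_id": "b"},
    {"endpoint": "/export", "status": 200, "duration_ms": 950, "customer_id": "b"},
    {"endpoint": "/export", "status": 200, "duration_ms": 1010, "customer_id": "b"},
]


def aggregate_at_write_time(events):
    """Metrics-style: decide up front to keep only a count per endpoint.
    Everything else about each request is thrown away."""
    counts = defaultdict(int)
    for event in events:
        counts[event["endpoint"]] += 1
    return dict(counts)


def aggregate_at_read_time(events, group_by, field):
    """Event-style: keep the raw events and choose the grouping only
    when you ask the question. Returns the mean of `field` per group."""
    groups = defaultdict(list)
    for event in events:
        groups[event[group_by]].append(event[field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


print(aggregate_at_write_time(EVENTS))
print(aggregate_at_read_time(EVENTS, "customer_id", "duration_ms"))
```

The write-time version can never answer "which customer is slow?" because that dimension was discarded before storage. The read-time version can, but it has to scan raw events every time you ask, which is where the cost concern comes from.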
Many engineers have argued that this is cost-prohibitive and unsustainable in the long run, and…I think they are probably right. Which is why I am so excited about telemetry pipelines.
Telemetry pipelines are the slider between aggregating metrics at write time (fast, cheap, painfully limited) and shipping all your raw, rich telemetry data off to a vendor, for aggregating at read time.
Sampling, too, has come a long way from its clumsy, kludgey origins. Tail-based sampling is now the norm, where you make decisions about what to retain or not only after the request has completed. The combination of fine-grained sampling + telemetry pipelines + AI is incredibly promising.
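A tail-based sampling decision can be sketched in a few lines. The thresholds, field names, and the `keep_trace` function are hypothetical choices of mine, not any particular sampler's rules or API.

```python
import random


def keep_trace(trace, baseline_rate=0.01, slow_ms=1000, rng=random.random):
    """Decide whether to retain a *completed* trace.

    Because the decision is made after the request finishes, it can look
    at the outcome (errors, latency), which a decision made at the start
    of the request cannot.
    """
    if trace["error"]:
        return True  # always keep failures
    if trace["duration_ms"] >= slow_ms:
        return True  # always keep slow requests
    # Fast, successful, boring traffic: keep only a small random fraction.
    return rng() < baseline_rate


print(keep_trace({"error": True, "duration_ms": 5}))       # kept: it failed
print(keep_trace({"error": False, "duration_ms": 2500}))   # kept: it was slow
```

The interesting requests survive at 100%, and the bulk of the savings comes from the traffic nobody was ever going to look at.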
I’m not going to keep going into detail here because I’m currently editing down a massive piece on levers of cost control, and I don’t want to duplicate all that work (or piss off my editors). Suffice it to say, there’s a lot of truth to what Matt writes… and also he has a way of skipping over all the details that would complicate or contradict his core thesis, in a way I don’t love. This has made me vow to be more careful in how I represent other vendors’ offerings and beliefs.
Money is not always the most expensive resource
I don’t think we’re going to get to “1000x the telemetry at 0.01x the cost”, as Matt put it, unless we are willing to sacrifice or seriously compromise some of the other things we hold dear, like the ability to debug complex systems in real time.
Gartner recently put out a webinar on controlling observability costs, which I very much appreciated, because it brought some real data to what has been a terribly vibes-based conversation. They pointed out that one of the biggest drivers of o11y costs has been that people get attached to it, and start using it heavily. You can’t claw it back.
I think this is a good thing — a long overdue grappling with the complexity of our systems and the fact that we need to observe it through our tools, not through our mental map or how we remember it looking or behaving, because it is constantly changing out from under us.
I think observability engineering teams are increasingly looking less like ops teams, and more like data governance teams, the purest embodiment of platform engineering goals.
When it comes to developer tooling, cost matters, but it is rarely the most important thing or the most valuable thing. The most important things are workflows and cognitive carrying costs.
Observability is moving towards a data lake model
Whatever you want to call it, whatever numeric label you want to slap on it, I think the industry is clearly moving in the direction of unified storage — a data lake, if you will, where signals are connected to each other, and particular use cases are mostly derived at read time instead of write time. Where you pay to store each request only one time, and there are no dead ends between signals.
Matt wrote another post about how OpenTelemetry wasn’t going to solve the cost crisis in o11y … but I think that misses the purpose of OTel. The point of OTel is to get rid of vendor lock-in, to make it so that o11y vendors compete for your business based on being awesome, instead of impossible to get rid of.
Getting everyone’s data into a structured, predictable format also opens up lots of possibilities for tooling to feel like “magic”, which is exciting. And opens some entirely different avenues for cost controls!
In my head, the longer term goals for observability involve unifying not just data for engineering, but for product analytics, business forecasting, marketing segmentation… There’s so much waste going on all over the org by storing these in siloed locations. It fragments people’s view of the world and reality. As much as I snarked on it at the time, I think Hazel Weakly’s piece on “The future of observability is observability 3.0” was incredibly on target.
One of my guiding principles is that ✨data is made valuable by context.✨ When you store it densely packed together — systems, app, product, marketing, sales — and derive insights from a single source of truth, how much faster might we move? How much value might we unlock?
I think the next few years are going to be pretty exciting.
I have not thought or said much about DEI (Diversity, Equity and Inclusion) over the years. Not because I don’t care about the espoused ideals — I suppose I do, rather a lot — but because corporate DEI efforts have always struck me as ineffective and bland; bolted on at best, if not actively compensating for evil behavior.
I know how crisis PR works. The more I hear a company natter on and on about how much it cares for the environment, loves diversity, values integrity, yada yada, the more I automatically assume they must be covering their ass for some truly heinous shit behind closed doors.
My philosophy has historically been that actions speak louder than words. I would one million times rather do the work, and let my actions speak for themselves, than spend a lot of time yapping about what I’m doing or why.
I also resent being treated like an expert in “diversity stuff”, which I manifestly am not. As a result, I have always shrugged off any idea that I might have some personal responsibility to speak up or defend these programs.
Recent events (the tech backlash, the govt purge) have forced me to sit down and seriously rethink my operating philosophy. It’s one thing to be cranky and take potshots at corporate DEI efforts when they seem ascendant and powerful; it’s another when they are being stamped out and reviled in the public mind.
Actually, my work does not speak for itself
It took all of about thirty seconds to spot my first mistake, which is that no, actually, my work does not and cannot speak for itself. 🤦 No one’s does, really, but especially not when your job literally consists of setting direction and communicating priorities.
Maybe this works ok at a certain scale, when pretty much anyone can still overhear or participate in any topic they care about. But at some point, not speaking up at the company level sends its own message.
If you don’t state what you care about, how are random employees supposed to guess whether the things they value about your culture are the result of hard work and careful planning, or simply…emergent properties? Even more importantly, how are they supposed to know if your failures and shortcomings are due to trying but failing or simply not giving a shit?
These distinctions are not the most important (results will always matter most), but they are probably pretty meaningful to a lot of your employees.
The problem isn’t the fact that companies talk about their values, it’s that they treat it like a branding exercise instead of an accountability mechanism.
Fallacy #1: “DEI is the opposite of excellence or high performance”
There are two big category errors I see out there in the world. To be clear, one is a lot more harmful (and a lot more common, and increasingly ascendant) than the other, but both of these errors do harm.
The first error is what I heard someone call the “seesaw fallacy”: the notion that DEI and high performance are somehow linked in opposition to each other, like a seesaw; getting more of one means getting less of the other.
This is such absolute horseshit. 🙄 It fails basic logic, as well as not remotely comporting with my experience. You can kind of see where they’re coming from, but only by conveniently forgetting that every team and every company is a system.
Nobody is born a great engineer, or a great designer, or a great employee of any type. Great contributors are not born, they are forged — over years upon years of compounding experiences: education, labor, hard work, opportunities and more.
So-called “merit-based” hiring processes act like outputs are the only thing that matter; as though the way people show up on your doorstep is the way they were fated to be and the way they will always be. They don’t see people as inputs to the system — people with potential to grow and develop, people who may have been held back or disregarded in the past, people who will achieve a wide range of divergent outcomes based on the range of different experiences they may have in your system.
Fallacy #2: “DEI is the definition of excellence or high performance”
There is a mirror image error on the other end of the spectrum, though. You sometimes hear DEI advocates talk as though if you juuuust build the most diverse teams and the most inclusive culture, you will magically build better products and achieve overwhelming success in all of your business endeavors.
This is also false. You still have to build the fucking business! Your values and culture need to serve your business and facilitate its continued existence and success.
With the small caveat that … DEI isn’t the way you define excellence unless the way you define excellence is diversity, equity and inclusion, because “excellence” is intrinsically a values statement of what you hold most dear. This definition of excellence would not make sense for a profit-driven company, but valuing diverse teams and an inclusive culture over money and efficiency is a perfectly valid and coherent stance for a person to take, and lots of people do feel this way!
There is no such thing as the “best” or “right” values. Values are a way of navigating territory and creating alignment where there IS no one right answer. People value what they value, and that is their right.
DEI gets caricatured in the media as though the goal of DEI is diverse teams and equitable outcomes. But DEI is better seen as a toolkit. Your company values ought to help you achieve your goals, and your goals as a business usually have some texture and nuance beyond just profit. At Honeycomb, for example, we talk about how we can “build a company people are proud to be part of”. DEI can help with this.
Let’s talk about MEI (Merit, Excellence and Intelligence)
Until last month I remained blissfully unaware of MEI, or “Merit, Excellence and Intelligence” (sic), and if you were too until just this moment, I apologize for ruining your party.
This idea that DEI is the opposite of MEI is particularly galling to me. I care a lot about high-performing teams and building an environment where people can do the best work of their lives. That is why I give a shit about building an inclusive culture.
An inclusive culture is one that sets as many people as possible up to soar and succeed, not just the narrow subset of folks who come pre-baked with all of life’s opportunities and advantages. When you get better at supporting folks and building a culture that foregrounds growth and learning, this both raises the bar for outcomes for everyone, and broadens the talent base you can draw from.
Honestly, I can’t think of anything less meritocratic than simply receiving and replicating all of society’s existing biases. Do you have any idea how much talent gets thrown away, in terms of unrealized potential? Let’s take a look at some of those stories from recent history.
If you actually give a shit about merit, you have to care about inclusion
Remember the Susan Fowler blog post that led to Travis Kalanick’s ouster as CEO of Uber in 2017? I suggest going back and skimming that post again, just to remind yourself what an absolutely jaw-dropping barrage of shit she went through, starting with being propositioned for sex by her very own manager on her very first day.
In “What You Do Is Who You Are”, investor Ben Horowitz wrote, “By all accounts Kalanick was furious about the incident, which he saw as a woman being judged on issues other than performance.” He believed that by treating her this way, his employees were failing to live up to their stated values around meritocracy.
I think that’s a flawed (but revealing) response to the situation at hand. Treating this like a question of “merit” suggests that they should be prioritizing the needs of whoever was most valuable to the company. And it kind of seems like that’s exactly what Kalanick’s employees were trying to do.
Susan was brilliant, yes; she was also young (25!), small, quiet, with a soft voice, in a corporate environment that valued aggression and bombast. She was early in her career and comparatively unproven; and when she reported her engineering manager’s persistent sexual advances and retaliatory actions to HR, she was told that HE was the high performer they couldn’t afford to lose.
Ask yourself this: would the manager’s behavior have been any more acceptable if Susan had been a total fuckup, instead of a certifiable genius? (NO. 😡)
Susan’s piece also noted that the percentage of women in Uber’s SRE org dropped from 25% to 3% across that same one-year interval. Alarm bells were going off all over the place for an entire year, and nobody gave a shit, because an inclusive culture was nowhere on their radar as a thing that mattered.
There is no rational conversation to be had about merit that does not start with inclusion
You might know (or think you know) who your highest performers are today, but you do not know who will be on that list in six months, one year, five years. Your company is a system, and the environment you build will drive behaviors that help determine who is on that list.
Maybe you have a Susan Fowler type onboarding at your company right now. How confident are you that she will be treated fairly and equitably, that she will feel like she belongs? Do you think she might be underestimated due to her gender or presentation? Do you think she would want to stick around for the long haul? Will she be motivated to do her best work in service of your mission? Why?
Can you say the same about all your employees, not just ones you already know to be certifiable geniuses?
That’s inclusion. That’s how you build a real fucking meritocracy. You start with “do not tolerate the things that kneecap your employees in their pursuit of excellence”, and ESPECIALLY not the things that subject them to the compounding tax of being targeted for who they are. In life as in finance, it’s the compound interest that kills you, more than the occasional expensive purchase.
There’s more to merit and excellence than just inclusion, obviously, but there’s no rational adult conversation to be had about merit or meritocracy that doesn’t start there.
Susan left the tech industry, by the way. She seems to be doing great, of course, but what a loss for us.
If you give a shit about merit, tell me what you are doing to counteract bias
Anyone who talks a big game about merit, but doesn’t grapple with how to identify or counteract the effects of bias in the system, doesn’t really care about merit at all. What they actually want is what Ijeoma Oluo calls “entitlement masquerading as meritocracy” (“Mediocre”).
The “just world fallacy” is one of those cognitive biases that will be with us forever, because we have such a deep craving for narrative coherence. On a personal level, we are embodied beings awash with intrinsic biases; on a societal level, obviously, structural inequities abound. No one is saying we should aim for equality of outcomes, despite what some nutbag MEI advocates seem to think.
But anyone who truly cares about merit should feel compelled to do at least some work to try and lean against the ways our biases cause us to systematically under-value, under-reward, under-recognize, and under-promote some people (and over-value others). Because these effects add up to something cumulatively massive.
In the Amazon book “Working Backwards”, chapter 2, they briefly mention an engineering director who “wanted to increase the gender diversity of their team”, and decided to give every application with a female-gendered name a screening call. The number of women hired into that org “increased dramatically”.
That’s it — that’s the only tweak they made. They didn’t change the interview process, they didn’t “lower the bar”, they didn’t do anything except skip the step where women’s resumes were getting filtered out due to the intrinsic biases of the hiring managers.
There’s no shame in having biases — we all have them. The shame is in making other people pay the price for your unexamined life..
DEI is an imperfect vehicle for deeply meaningful ideals
I am by no means trying to muster a blanket defense of everything that gets lumped under DEI, just to be clear. Some of it is performative, ham-handed, well-intentioned but ineffective, disconnected or a distraction from real problems; diversity theater; a steam valve to vent off any real pressure for change; nitpicky and authoritarian, flirts with thought policing, or just horrendously cringe.
I don’t know how much I really care whether corporate DEI programs live or die, because I never thought they were that effective to start with. Jay Caspian Kang wrote a great piece in the New Yorker that captured my feelings on the matter:
The problem, at a grand scale, is that D.E.I.’s malleability and its ability to survive in pretty much every setting, whether it’s a nearby public school or the C.I.A., means that it has to be generic and ultimately inoffensive, which means that, in the end, D.E.I. didn’t really satisfy anyone.
What it did was provide a safety valve (I am speaking about D.E.I. in the past tense because I do think it will quickly be expunged from the private sector as well) for institutions that were dealing with racial and social-justice problems. If you had a protest on campus over any issue having to do with “diverse students” who wanted “equity,” that now became the provenance of D.E.I. officers who, if they were doing their job correctly, would defuse the situation and find some solution—oftentimes involving a task force—that made the picket line go away.
It’s a symbolic loss of something that was only ever a symbolic gain. Corporate DEI programs as we know them sprang up in the wake of the Black Lives Matter protests of 2020, but I haven’t exactly noticed the world getting substantially more diverse or inclusive since then.
Which is not to say that tech culture has not gotten more diverse or inclusive over the longer arc of my career; it absolutely, definitely has. I began working in tech when I was just a teenager, over 20 years ago, and it is actually hard to convey just how much the world has changed since then.
And not because of corporate DEI policies. So why? Great question. 🙌
Tech culture changed because hearts and minds were changed
I think social media explains a lot about why awareness suddenly exploded in the 2010s. People who might never have intentionally clicked a link about racism or sexism were nevertheless exposed to a lot of compelling stories and arguments, via retweets and stuff ending up in their feed. I know this, because I was one of them.
The 2010s were a ferment of commentary and consciousness-raising in tech. A lot of brave people started speaking up and sharing their experiences with harassment, abuse, employer retaliation, unfair wage practices, blatant discrimination, racism, predators.. you name it. People were comparing notes with each other and realizing how common some of these experiences were, and developing new vocabulary to identify them — “missing stair”, “sandpaper feminism”, etc.
If you were in tech and you were paying attention at all, it got harder and harder to turn a blind eye. People got educated despite themselves, and in the end…many, many hearts and minds were changed.
This is what happened to me. I came from a religious and political background on the far right, but my eyes were opened. The more I looked around, the more evidence I saw in support of the moral and intellectual critiques I was reading online. I began waking up to some of the ways I had personally been complicit in doing harm to others.
The “unofficial affirmative action movement” in tech, circa 2010-2020
And I was not alone. Emily once offhandedly referred to an “unofficial affirmative action movement” in tech, and this really struck a chord with me. I know so many people whose hearts and minds were changed, who then took action.
They worked to diversify their personal networks of friends and acquaintances; to mentor, sponsor, and champion underrepresented folks in their workplaces; to recruit, promote, and refer women and people of color; to invite marginalized folks to speak at their conferences and on their panels; to support codes of conduct and unconscious bias training; and to educate themselves on how to be better allies in general.
All of this was happening for at least a decade leading up to 2020, when BLM shook up the industry and led to the creation of many corporate DEI initiatives. Kang, again:
What happened in many workplaces across the country after 2020 was that the people in charge were either genuinely moved by the Floyd protests or they were scared. Both the inspired and the terrified built out a D.E.I. infrastructure in their workplaces. These new employees would be given titles like chief diversity officer or C.D.O., which made it seem like it was part of the C-suite, and would be given a spot at every table, but much like at Stanford Law, their job was simply to absorb and handle any race stuff that happened.
The pivot from lobbying/persuading from the outside to holding the levers of formal power is a hard, hard one to execute well. History is littered with the shells of social movements that failed to make this leap.
You got here because you persuaded and earned credibility based on your stories and ideals, and now people are handing you the reins to make the rules. What do you do with them? Uh oh.
It’s easier to make rules and enforce them than it is to change hearts and minds
I think this happened to a lot of DEI advocates in the 2020-2024 era, when corporations briefly invested DEI programs and leaders with some amount of real corporate power, or at least the power to make petty rules. And I do not think it served our ideals well.
I just think…there’s only so much you can order people to do, before it backfires on you. Which doesn’t mean that laws and policies are useless; far from it. But they are limited. And they can trigger powerful backlash and resentment when they get overused as a means of policing people’s words and behaviors, especially in ways that seem petty or disconnected from actual impact.
When you lean on authority to drive compliance, you also stop giving people the opportunity to get on board and act from the heart.
MLK actually has a quote on this that I love, where he says “the law cannot make a man love me”:
“It may be true that the law cannot make a man love me, religion and education will have to do that, but it can restrain him from lynching me. And I think that’s pretty important also. And so that while legislation may not change the hearts of men, it does change the habits of men.”
~ Dr. Martin Luther King, Jr.
There are ways that the DEI movement really lost me around the time they got access to formal levers of power. It felt like there was a shift away from vulnerability and persuasion and towards mandates and speech policing.
Instead of taking the time to explain why something mattered, people were simply ordered to conform to an ever-evolving, opaque set of speech patterns as defined by social media. Worse, people sometimes got shamed or shut down for having legitimate questions.
There’s a big difference between saying that “marginalized people shouldn’t constantly have to defend their own existence and do the work of educating other people” (hard agree!), and saying that nobody should have to persuade or educate other folks and bring them along.
We do have to persuade, we do have to bring people along with us. We do have to fight for hearts and minds. I think we did a better job of this without the levers of formal power.
Don’t underestimate what a competitive advantage diversity can be
People have long marveled at the incredible amount of world class engineering talent we have always had at Honeycomb — long before we even had any customers, or a product to sell them. How did we manage this? The relative diversity of our teams has always been our chief recruiting asset.
There is a real hunger out there on the part of employees to work at a company that does more than the bare minimum in the realm of ethics. Especially as AI begins chewing away at historically white collar professions, people are desperate for evidence that you can be an ambitious, successful, money-making business that is unabashed about living its values and holding a humane, ethical worldview.
And increasingly, one of the main places people go to look for evidence that your company has ethical standards and takes them seriously is…the diversity of your teams.
Diversity is an imperfect proxy for corporate ethics, but it’s not a crazy one.
The diversity of your teams over the long run rests on your ability to build an inclusive culture and equitable policies. Which depends on your ability to infuse an ethical backbone throughout your entire company; to balance short-term and long-term investments, as you build a company that can win at business without losing its soul.
And I’m not actually talking about junior talent. Competition is so fierce lower on the ladder, those folks will usually take whatever they can get. (💔) I’m talking about senior folks, the kind of people who have their pick of roles, even in a weak job market. You might be shocked how many people out there will walk away from millions/year in comp at Netflix, Meta or Google, in order to work at a company where ethics are front and center, where diversity is table stakes, where their reporting chain and the executive team do not all look alike.
The longer you wait to build in equity and inclusion, the tougher it will be
Founders and execs come up to me rather often and ask what the secret is to hiring so many incredible contributors from underrepresented backgrounds. I answer: “It’s easy!…if you already have a diverse team.”
It is easier to build equitable programs and hire diverse teams early, and not drive yourself into a ditch, than it is to go full tilt with a monoculture and face years of recovery and repair. The longer you wait to do the work, the harder the work is going to be. Don’t put it off.
As I wrote a while back:
“If you don’t spend time, money, attention, or political capital on it, you don’t care about it, by definition. And it is a thousand times worse to claim you value something, and then demonstrate with your actions that you don’t care, than to never claim it in the first place.”
“You must remind yourself as you do, uneasily, queasily, how easily ‘I didn’t have a choice’ can slip from reason to excuse. How quickly ‘this isn’t the right time’ turns into ‘never the right time’. You know this, I know this, and I guarantee you every one of your employees knows this.”
It can be a massive competitive advantage if you build a company that knows how to develop a deep bench of talent and set people up for success.
Not only the preexisting elite, the smartest and most advantaged decile of talent — for whom competition will always be cutthroat — but people from broader walks of life.
Winning at business is what earns you the right to make bigger bets and longer-term investments
As the saying goes, “Nobody ever got fired for buying IBM” — and nobody ever had the failure of their startup blamed on the fact that they hired engineers away (or followed management practices) from Google, Netflix or Facebook, regardless of how good or bad those engineers (or practices) may be.
If you want to do something different, you need to succeed. People cargo cult the culture of places that make lots of money.
If you want your values and ideals to spread throughout the industry, the most impactful thing you can possibly do is win.
It’s a reality that when you’re a startup, your resources are scarce, your time horizons are short. You have to make smart decisions about where to invest them. Perfection is the enemy of success. Make good choices, so you can live to fight another day.
But fight another day.
If you don’t give a shit, don’t try and fake it
Finally let me say this: if you don’t give a shit about diversity or inclusion, don’t pretend you give a shit. It isn’t going to fool anyone. (If you “really care” but for some reason DEI loses every single bake-off for resources, friend, you don’t care.)
And honestly, as an employee, I would rather work for a soulless corporation that is honest with itself and its employees about how decisions get made, than for someone who claims to care about the things I value, but whose actions are unpredictable or inconsistent with those values.
Listen.. There is never just one true way to win. There are many paths up the mountain. There are many ways to win. (And there are many, many, many more ways to fail.)
Nothing that got imported or bolted on to your company operating system was ever going to work, anyway. 🤷 If it doesn’t live on in the hearts and minds of the people who are building the strategy and executing on it, it’s just dead words.
When I look at the long list of companies who say they are rolling back mentions of DEI internally, I don’t get that depressed. I see a long list of companies who never really meant it anyway. I’m glad they decided to stop performing.
You need a set of operating practices and principles that are internally consistent and authentic to who you are. And you need to do the work to bring people along with you, hearts and minds and all.
So if we care about our ideals, let’s go fucking win.
As I whined to Hazel over text, after she sweetly sent me a preview draft of her post: “PLEASE don’t post this! I feel like I spend all my time trying to help bring clarity and context to what’s happening in the market, and this is NOT HELPING. Do you know how hard it is to try and socialize shared language around complex sociotechnical topics? Talking about ‘observability 3.0’ is just going to confuse everyone.”
That’s the problem with the internet, really; the way any asshole can go and name things (she said piteously, self-righteously, and with an astounding lack of self-awareness).
Semantic versioning is cheap and I kind of hate it
I’m complaining, because I feel sorry for myself (and because Hazel is a dear friend and can take it). But honestly, I actually kind of loathe the 1.0 vs 2.0 (or 3.0) framing myself. It’s helpful, it has explanatory power, I’m using it…but you’ll notice we aren’t slapping “Honeycomb is Observability 2.0” banners all over the website or anything.
Semantic versioning is a cheap and horrendously overused framing device in both technology and marketing. And it’s cheap for exactly these reasons…it’s too easy for anyone to come along and bump the counter again and say it happens to be because of whatever fucking thing they are doing.
I don’t love it, but I don’t have a better idea. In this case, the o11y 2.0 language describes a real, backwards-incompatible, behavioral and technical generational shift in the industry. This is not a branding exercise in search of technological justification, it’s a technical sea change reaching for clarification in the market.
One of the most exciting things that happened this year is that all the new observability startups have suddenly stopped looking like cheaper Datadogs (three pillars, many sources of truth) and started looking like cheaper Honeycombs (wide, structured log events, single source of truth, OTel-native, usually ClickHouse-based). As an engineer, this is so fucking exciting.
(I should probably allow that these technologies have been available for a long time; adoption has accelerated over the past couple of years in the wake of the ZIRP era, as the exploding cost multiplier of the three pillars model has become unsustainable for more and more teams.)
Some non-controversial “controversial claims”
“Firstly, I’m going to make a somewhat controversial claim in that you can get observability 2.0 just fine with “observability 1.0” vendors. The only thing you need from a UX standpoint is the ability to query correlations, which means any temporal data-structure, decorated with metadata, is sufficient.”
This is not controversial at all, in my book. You can get most of the way there, if you have enough time and energy and expertise, with 1.0 tooling. There are exceptions, and it’s really freaking hard. If all you have is aggregate buckets and random exemplars, your ability to slice and dice with precision will be dramatically limited.
This matters a lot, if you’re trying to e.g. break down by any combination of feature flags, build IDs, canaries, user IDs, app IDs, etc in an exploratory, open-ended fashion. As Hazel says, the whole point is to “develop the ability to ask meaningful questions, get useful answers, and act effectively on what you learn.” A-yep.
However, any time your explanation takes more than 30 seconds, you’ve lost your audience. This is at least a three-minute answer. Therefore, I typically tell people they need structured log events.
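For anyone who does want the longer answer, here is a rough sketch of what "slice and dice by any combination" means in practice. It is not Honeycomb's implementation or anyone's API; the field names (`build_id`, `flag_new_cache`, `user_id`, `duration_ms`) and the `break_down` helper are made up for illustration. The only point is that when you keep the raw, wide event around, the grouping dimensions get chosen at query time instead of being baked in when the data is written.

```python
from collections import defaultdict

# Hypothetical wide events: one dict per request. Every field name here is
# an illustrative assumption, not a schema from any real tool.
events = [
    {"build_id": "b101", "flag_new_cache": True,  "user_id": "u1", "duration_ms": 120},
    {"build_id": "b101", "flag_new_cache": False, "user_id": "u2", "duration_ms": 80},
    {"build_id": "b102", "flag_new_cache": True,  "user_id": "u1", "duration_ms": 900},
    {"build_id": "b102", "flag_new_cache": True,  "user_id": "u3", "duration_ms": 1100},
    {"build_id": "b102", "flag_new_cache": False, "user_id": "u2", "duration_ms": 90},
]


def break_down(events, dimensions, measure="duration_ms"):
    """Group raw events by any combination of fields, chosen at query time."""
    groups = defaultdict(list)
    for event in events:
        # The group key is a tuple of whatever dimensions the caller asked for.
        key = tuple(event.get(dim) for dim in dimensions)
        groups[key].append(event[measure])
    return {
        key: {"count": len(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }


# Two different questions against the same events, with no pre-aggregation.
by_build_and_flag = break_down(events, ["build_id", "flag_new_cache"])
by_user = break_down(events, ["user_id"])
```

With pre-aggregated buckets you would have needed to predict the build-and-flag combination before writing the data. Here it is just another list of field names.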
“Observability 2.0” describes a sociotechnical sea change that is already well underway
Let’s stop talking about engineering for a moment, and talk about product marketing.
A key aspect of product marketing is simplification. That’s what the 2.0 language grew out of. About a year ago I started having a series of conversations with CTOs and VPEngs. All of them are like, “we already have observability, how is Honeycomb any different?” And I would launch off into a laundry list of features and capabilities, and a couple of minutes later you’d see their eyes glazing over.
You have to have some way of boiling it down and making it pithy and memorable. And any time you do that, you lose some precision. So I actually disagree with very little Hazel has said in this essay. I’ve made most of the same points, in various times and places.
Good product marketing is when you take a strong technical differentiator and try to find evocative, resonant ways of making it click for people. Bad product marketing — and oh my god is there a lot of that — is when you start with the justification and work backwards. Or start with “well we should create our own category” and then try to define and defend one for sales purposes.
Or worst of all — “what our competitors are saying seems to be really working, but building it would take a long time and be very hard, so what if we just say the same words out loud and confuse everyone into buying our shit instead?”
(Ask me how many times this has happened to us, I fucking dare you.)
Understanding your software in the language of your business
Here’s why I really hate the 3.0 framing: I feel like all the critical aspects that I really really care about are already part of 2.0. They have to be. It’s the whole freaking point of the generational change which is already underway.
We aren’t just changing data structures for the fun of it. The whole point is to be able to ask better questions, as Hazel correctly emphasizes in her piece.
Christine and I recently rewrote our company’s mission and vision. Our new vision states:
Understand your software in the language of your business.
Decades on, the promise of software and the software industry remains unfulfilled. Software engineering teams were supposed to be the innovative core of modern business; instead they are order-takers, cost centers, problem children. Honeycomb is here to shape a future where there is no divide between building software and building a business — a future where software engineers are truly the innovation engine of modern companies.
The beauty of high cardinality, high dimensionality data is that it gives you the power to pack dense quantities of application data, systems data, and business data all into the same blob of context, and then explore all three together.
Even if you’ve calculated the cost of downtime, you probably aren’t really thinking about the relationship between telemetry data and business data. Engineering stuff tends to stay in the engineering domain. Here are some questions that I’d suggest most people can’t answer with their observability programs, but are absolutely fucking fascinating questions (emphasis mine):
What’s the relationship between system performance and conversions, by funnel stage? Break it down by geo, device, and intent signals.
What’s our cost of goods sold per request, per customer, with real-time pricing data of resources?
How much does each marginal API request to our enterprise data endpoint cost in terms of availability for lower-tiered customers? Enough to justify automation work?
Every truly interesting question we ask as engineers is some combination or intersection of business data + application data. We do no one any favors by chopping them up and siloing them off into different tools and data stores, for consumption by different teams.
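To make that less abstract, here is a minimal sketch of the "cost per request, per customer" question, assuming each event already carries business fields (`customer_id`, `plan`) next to systems fields (`cpu_ms`, `egress_kb`). All of those names, and the two unit prices, are invented for the example. The prices are static constants in micro-dollars rather than the real-time pricing feed the question asks about, which is a deliberate simplification.

```python
from collections import defaultdict

# Assumed unit prices in micro-dollars. Purely illustrative numbers,
# standing in for a real-time pricing source.
PRICE_PER_CPU_MS = 3
PRICE_PER_EGRESS_KB = 1

# Hypothetical events that carry business, application, and systems data
# together in one blob of context.
events = [
    {"customer_id": "acme", "plan": "enterprise", "endpoint": "/export", "cpu_ms": 400, "egress_kb": 2000},
    {"customer_id": "acme", "plan": "enterprise", "endpoint": "/export", "cpu_ms": 100, "egress_kb": 500},
    {"customer_id": "bolt", "plan": "free", "endpoint": "/query", "cpu_ms": 50, "egress_kb": 10},
    {"customer_id": "bolt", "plan": "free", "endpoint": "/query", "cpu_ms": 30, "egress_kb": 10},
    {"customer_id": "bolt", "plan": "free", "endpoint": "/query", "cpu_ms": 20, "egress_kb": 0},
]


def request_cost(event):
    """Resource cost of a single request, in micro-dollars."""
    return event["cpu_ms"] * PRICE_PER_CPU_MS + event["egress_kb"] * PRICE_PER_EGRESS_KB


def cost_by(events, dimension):
    """Total cost, request count, and cost per request, grouped by one field."""
    totals = defaultdict(lambda: {"requests": 0, "cost": 0})
    for event in events:
        bucket = totals[event[dimension]]
        bucket["requests"] += 1
        bucket["cost"] += request_cost(event)
    return {
        key: {**vals, "cost_per_request": vals["cost"] / vals["requests"]}
        for key, vals in totals.items()
    }


per_customer = cost_by(events, "customer_id")
per_plan = cost_by(events, "plan")
```

The arithmetic is trivial, and that is the point. The hard part is having the customer, the plan, and the resource usage in the same record so the join never has to happen across tools.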
Data lake ✅, query flexibility ✅, non-engineering functions…🚫
Hazel’s three predictions for what she calls “observability 3.0” are as follows:
Observability 3.0 backends are going to look a lot like a data lake-house architecture
Observability 3.0 will expand query capabilities to the point that it mostly erases the distinction between pay now / pay later, or “write time” vs “read time”
Observability 3.0 will, more than anything else, be measured by the value that non-engineering functions in the business are able to get from it
I agree with the first two — in fact, I think that’s exactly the trajectory that we’re on with 2.0. We are moving fast and accelerating in the direction of data lakehouse architectures, and in the direction of fast, flexible, and cheap querying. There’s nothing backwards-incompatible or breaking about these changes from a 2.0 -> 3.0 perspective.
Which brings us to the final one. This is the only place in the whole essay where there may be some actual daylight between where Hazel and I stand, depending on your perspective.
Other business functions already have nice things; we need to get our own house in order
No, I don’t think success will be measured by non-engineering functions’ ability to interrogate our data. I think it’s the opposite. I think it is engineers who need to integrate data about the business into our own telemetry, and get used to using it in our daily lives.
They’ve had nice things on the business side for years — for decades. They were rolling out columnar stores for business intelligence almost 20 years ago! Folks in sales and marketing are used to being able to explore and query their business data with ease. Can you even imagine trying to run a marketing org if you had to pre-define cohorts into static buckets before you even got started?
No, in this case it’s actually engineering that are the laggards. It’s a very “the cobbler’s children have no shoes” kind of vibe, that we’re still over here warring over cardinality limits and pre-defined metrics and trying to wrestle them into understanding our massively, sprawlingly complex systems.
So I would flip that entirely around. The success of observability 2.0 will be measured by how well engineering teams can understand their decisions and describe what they do in the language of the business.
Other business functions already have nice tools for business data. What they don’t have — can’t have — is observability that integrates systems and application data in the same place as their business data. Uniting all three sources, that’s on us.
If every company is now a technology company, then technology execs need to sit at the big table
Hazel actually gets at this point towards the end of her essay:
We’ve had multiple decades as an industry to figure out how to deliver meaningful business value in a transparent manner, and if engineering leaders can’t catch up to other C-suites in that department soon, I don’t expect them to stick around another decade
The only member of the C-suite that has no standard template for their role is…CTO. CTOs are all over the freaking map.
Similarly, VPs of Engineering are usually not part of the innermost circles of execs.
Why? Because the point of that inner circle of execs is to co-make and co-own all of the decisions at the highest level about where to invest the company’s resources.
And engineering (and product, and design) usually can’t explain their decisions well enough in terms of the business for them to be co-owned and co-understood by the other members of the exec team. R&D is full of the artistes of the company. We tell you what we think we need to do our jobs, and you either trust us or you don’t.
(This is not a one-way street, of course; the levers of investment into R&D are often opaque, counter-intuitive and poorly understood by the rest of the exec team, and they also have a responsibility to educate themselves well enough to co-own these decisions. I always recommend these folks start by reading “Accelerate”.)
But twenty years of free money has done poorly by us as engineering leaders. The end of the ZIRP era is the best thing that could have happened to us. It’s time to get our house in order and sit at the big table.
“Know your business, run it with data”, as Jeff Gray, our COO, often says.
I’ve never been good at “hot takes”. Anyone who knows anything about marketing can tell you that the best time to share your opinion about something is when everyone is all worked up about it. Hot topics drive clicks and eyeballs and attention en masse.
Unfortunately, my internal combustion engine doesn’t run that way. If anything, my fuel runs the other way. If everybody’s already buzzing about something, I feel like chances are, everything that needs to be said is already being said by someone else, so why should I bother?
Earlier this year I started writing a piece on why “hire great people and get out of their way” is such terrible, dangerous, counterproductive advice to give anyone in a leadership role. Then Paul Graham dropped his famous essay on “founder mode”, inspired by a talk given at a YC event by Brian Chesky. PG called it “a talk everyone who was there will remember…Most founders I talked to afterward said it was the best they’d ever heard.” The internet went nuts for it.
What I should have done: put my head down and finished the fucking piece. 🙄
What I actually did: ragetweeted a long thread from bed, read a bunch of other people’s takes, then went “well, all the bases seem to be covered” and lost all interest in finishing.
For the curious, here are the takes I really liked:
https://x.com/ejames_c/status/1830411301413421552, on why Airbnb’s free cash flow margin is due to their prepayment business model and has nothing whatsoever to do with ‘founder mode’, by Cedric Chin
A month and a half later, we all got to see what the fuss was about. Keith Rabois interviewed Brian Chesky at a Khosla Ventures event in NYC and posted the ensuing 45 min video to YouTube, calling it “Founder Mode and the Art of Hiring”.
The gripping tale of Airbnb’s dramatic rise, crash, and rebirth
Chesky starts off by relating a story about how Airbnb in its early years hired way too many people, way too fast, and buckled under all the nasty consequences of hypergrowth. Lack of clarity and direction, excessive coordination costs, lack of focus, layers of bureaucracy that added no value or expertise, empire building, you name it.
So it’s 2019, and it’s just starting to dawn on Brian Chesky that he has this massive clusterfuck on his hands. But Airbnb is barrelling towards an IPO, so he feels like his hands are tied. Then COVID hits. Airbnb loses 80% of its business in 8 weeks, going from “the hottest IPO since Uber” to facing possible bankruptcy and dissolution, practically overnight. You never want to let a crisis go to waste, so Chesky seizes the opportunity to restructure the company and make a bunch of massive changes.
This is a fascinating story, right? It is! Or it should be. A young, first-time founder hits it big with his first startup, barrels through a decade of hypergrowth and free money towards a white hot IPO, then belatedly realizes everything he’s done has resulted in a big, bloated, horrendously inefficient company where nobody can get shit done and all the top talent is leaving. Then comes the pandemic. Holy shit! How will he turn things around??
This is an incredible story. I want to hear this story.
The problem is that he somehow manages to tell it in the most aggravating possible way, where he is a lone hero, buffeted by mediocrity and held back by his own employees at every turn. Actual quote:
“Oh my god, I guess I’m not crazy. I’m just made to believe I’m crazy by my own employees. You’re not crazy. Even though people who work for you tell you you are. You’re not crazy.”
He talks about the people who worked for him in supremely belittling terms — “C players”, “incapable”, “mediocre”, “worst people”. And he takes absolutely zero responsibility for the corporate disaster that developed in slow motion under his watch, while taking ALL the credit for its recovery.
How might another person have told this story?
I mean…if it was me, I might have started off by confessing that “Wow, I did not do a good job as CEO for the first decade of running my company. I over-hired, underspecified the roles, did a terrible job of setting expectations and rewarding the skills and behaviors that really mattered, didn’t know what org charts were for, and in general just completely failed to build a company that valued efficiency, or had any kind of effective strategy or culture of high performance”.
If Brian Chesky had done that I would have been like, “THIS MAN IS A HERO, EVERYONE STOP WHAT YOU ARE DOING AND COME HEAR HIS HARD WON WISDOM”. Instead, the way he tells the story, the problem is always everyone else, and the solution is always more Brian Chesky.
But Brian Chesky created the fucking problems, by being bad at running the business!
There is actually no shame in this! He is right: being a CEO is fucking hard. It does not come naturally. Nobody is born good at it. It takes a lot of hard work and pain and suffering to become someone who is good at running a company. I was CEO of Honeycomb for 3.5 years, and it almost killed me. I never got good at it. I have immense respect for the people who do it well.
But this attitude he has, where the buck stops literally everywhere but him — is one I find so fucking repellent. Ethics aside, I also feel like it constitutes a material risk to any company when the CEO is so lacking in humility and self-awareness. (I can leave room for the possibility that he is actually humble as fuck and he just…chose not to share those reflections with us in this talk. 🤷)
It took me a month to make it through the entire recording
I’ll be honest, I made it about three minutes into the video before I blew my fucking top and closed the tab. It made me so angry. This fucking guy. It pushes all my buttons.
But then I had a few conversations with other founders who did watch the whole thing, people I genuinely respect. I kept hearing there was great advice in the piece, if you can just get past the attitude and total lack of accountability.
It took me over a month to make it through the full thing, in fits and starts, but once I finally did, I had to admit that they were right. There is good advice inside, and there are reasonable principles embedded in this talk. Chesky seems to have successfully turned his company around, after all. That’s a really hard thing to do!
In the end, I forced myself to buckle down and get this piece out because … between PG’s “founder mode” essay and the wide distribution of the Chesky interview, these opinions have already imprinted onto generations of Silicon Valley founders and leaders. They have seeped into the water table, and there’s no going back.
I would PREFER the enduring legacy of both “founder mode” and Brian Chesky’s “The Art of Hiring” to be one that moves the industry forward in material ways, and not one that further entrenches the Silicon Valley cult of the founder, Great Man of History, 10x engineer Lone Ranger superhero John Galt type bullshit that has dogged our heels for decades. And there is some decent material here! We can work with this.
So let’s take the major points he makes, one at a time, and mine them for gold nuggets. Here we go!
The story, in Brian Chesky’s words
My apologies for the extremely long quotes, but I think they set the stage well. (Lightly edited for readability.)
“You know, we were one of the first ‘unicorns’, before that was a term. And it was amazing for a bit, from like 2009-2014. It was awesome. It was fun. It was exciting. And then one day it was horrible. And that day went on for like six years (emphasis mine). And basically what happened was I realized you can kind of be born a good founder…I think I was a pretty good founder the day we started the company…But I’m not sure any of us are born good CEOs.
But the other problem with being a CEO is I think almost all the advice and everything they teach at like Harvard Business School…is wrong. For example, the role of a great leader is to hire great people and empower them to do their job…If you do that, your company will be destroyed.”
I’ve never been to Harvard Business School, but I would be pretty surprised to learn that they don’t cover things like organizational structure, span of control, or operational efficiency.
“We had a company where we were like a matrix organization. And so like we had all these different teams. And by the way, there’s no governor of how many teams there are. So teams can create teams, can create sub teams, can create sub teams, that people can decide how many manager levels they create. Like if you’re not careful people do this. And why do they do this? Because they want to have new teams.”
(The “governor of how many teams there are” is whoever leads your People team or HR, btw, who in turn rolls up to the CEO. Again, org design is a pretty traditional and well-studied aspect of operating a company.)
“So let’s take a marketing or creative department. There’s a team in Airbnb doing graphics and different parts of the site need graphics, advertising needs graphics. And when it was five teams, the five teams would ask the graphics department for graphics and they’d have like five requests. And then pretty soon it’s 20 teams and once it’s 20 teams…they’re like the deli, there’s a line out the block, there’s a multi-month wait. And then what happens is the graphics team, the central service, kind of like gives up and everything seems pointless. And the teams waiting forever give up and they say, ‘give me my own people’. So now they get their own graphics team. So now you have 5 or 10 graphics teams. And you can do the same thing with technology. And product. Oh, you can have 10 data teams that have different metrics and we can go down the list.
So now you have 10 divisions. Now those 10 divisions are wanting to go in different directions. And they have general managers. And GMs are like little Russian babushka dolls. They want to create miniature GMs and miniature-miniature GMs. And so now you don’t have 10 teams, you’ve got actually 100 teams, because you’ve got these little babushkas running around and they’re going in 100 directions with different technology…
You end up with a lot of bureaucracy. You end up with a company where there’s meetings about meetings where metrics and strategic priorities are the only thing that bind the company together. There’s no cohesive product roadmap, everything is a different time horizon. It’s all short term oriented. And the biggest problem of all is a CEO gets separated from their own product.
And I noticed this thing where there was more bureaucracy, there were these divisions, the divisions then they have to advocate for resources. That advocacy creates politics. And then you have a situation where it’s hard to track what everyone’s doing. So you have like this free for all. There’s not a lot of accountability, which leads to complacency. The complacency means that, like the bad people, the good people are indistinguishable. So the good people tend to move on. They say the company’s changed, the company slows down, and one day you wake up.”
Sounds like a mess, all right.
(Chesky’s use of the passive voice here is truly spectacular. Who was in charge for those horrible six years while all this organizational fuckery and uncontrolled sprawl was happening? Oh right, you were.)
To sum up: before the pandemic, Airbnb seems to have had multiple business divisions, each of which had its own GM and a whole ass org structure, with its own engineering, design, marketing teams, etc. This seems wildly weird and inefficient and crazy to me, given that Airbnb only has one product, which is Airbnb? But, they did. So yeah, I am unsurprised that this did not work well.
Which brings us to our first lesson on efficiency.
You should have as few employees as possible
“So what did I do? The first thing I did is I went from a divisional structure to a functional organization. Functional organizations are when you have design and engineering and product management or product marketing and sales. So we went back to a functional organization where our goal was to have as few employees as possible…We said we were the Navy Seals, not the Navy. We want a small, lean, elite, highly skilled team, not a team of kind of mid-level battalion type people. And the reason why is that every person brings with them a communication tax.”
Basically, Brian Chesky is rediscovering this graphic and it’s blowing his mind.
Brooks’ Law
I feel like this should be really fucking obvious, but I guess the legacy of hypergrowth companies proves that it is not: You should ALWAYS have as few employees as possible. Always. Hiring more people should never be the first lever you reach for, it’s what you do after exhausting your other options. Doing great things with a small team is always something to brag about.
(Okay…maybe not ALWAYS-always. There are some business models where your revenue scales linearly along with headcount, but for your average VC-funded technology startup, “we want a small, lean, elite, highly skilled team” is like saying “you should eat vegetables”.)
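If you want to put rough numbers on that “communication tax”: the usual back-of-the-envelope reading of Brooks’ Law counts every pair of people as a potential communication path, so n people have n(n-1)/2 of them. Here is a minimal sketch in Python. The function name and the team sizes are made up for illustration, and it assumes the worst case where everyone might need to talk to everyone, which no real org chart actually requires.

```python
def communication_paths(people: int) -> int:
    """Count distinct pairs among `people` team members: n * (n - 1) / 2."""
    if people < 0:
        raise ValueError("people must be non-negative")
    # Each pair is counted once, so divide the ordered-pair count by two.
    return people * (people - 1) // 2


# Hypothetical team sizes, purely for illustration.
for size in (5, 10, 50, 100):
    print(f"{size:>4} people -> {communication_paths(size):>5} possible pairs")
```

Headcount grows linearly and the number of possible pairs grows roughly with its square, which is the whole argument for staying small in one line of arithmetic.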
Your managers should be subject matter experts
“Oh and by the way, you have leaders that are, quote, managers. I don’t like managers. We don’t have a single manager at Airbnb. And I put that in air quotes. A manager that doesn’t know how to do the job is like a cavalry general that can’t ride a horse. A lot of companies do that. So we only allowed managers that were experts but for a long time we had managers. And one day I woke up and I realized I had 50 year olds, managing 40 year olds, managing 30 year olds, managing interns, doing the job with all these layers that weren’t adding any value.”
The disgust in his voice when he says the word “managers” is palpable. And it’s gross. You can talk about the importance of managers being highly skilled in their domain — and I have, many times! — without treating people with contempt, or disparaging them in public for performing the exact jobs that, again, your own company defined and hired them to do, and they faithfully did, for years.
The moral of the story is valid. The tone is unwarranted and disrespectful (and the whiff of ageism is just the rotten little cherry on top).
As for his claim that “A lot of companies do that” — hire managers that aren’t experts in their field, who just do pure people management — no? Maybe? Not that I’m aware of, not in the past decade. Citation needed.
You don’t manage people, you manage people through the work
“I got rid of all quote managers or they left the company and we said you can only manage the function if you’re an expert. So like the head of design has to actually manage the work first. You don’t manage people. You manage people through the work. I learned this from Jony Ive because most heads of design at most tech companies don’t actually manage design. They manage the people. Jony Ive would say no, my main job is to manage the work and I build a team and we design together. But I’m mostly looking at the work. I’m not like having career conversations all day long. That’s crazy.”
Again, I’m not sure where he gets this idea that at “most tech companies”, the head of design is just like…hired from Starbucks or something for their people management skills? So mystifying.
“The best way to get rid of meetings is to not have so many people”
“The reason there’s too many meetings in a company isn’t because they don’t have no-meeting Wednesdays, it’s because they have too many people. People create meetings, and the best way to get rid of meetings is to not have so many people. There’s no other better way to do that (emphasis mine).”
Um…it might be a mistake to read this too literally, but this is a really stupid thing to say. People do incur coordination costs, but just to be clear, there are lots of ways to get rid of meetings, no matter how many people you do or don’t have, and you should absolutely be investing in some of them in an ongoing way. For example:
Develop a rich written culture and rituals around async work
Make recordings available, use AI transcription and summaries, or take notes and send them around
Use calendar plugins to visualize where your time is going, or even automatically reschedule meetings to compact your calendar and create blocks of focus time (e.g. Clockwise)
Declare calendar bankruptcy for meetings with >3 people every quarter, like Shopify does
Use ‘optional’ invites to be clear whether you’re inviting someone because you need them there vs for awareness purposes, or because you think they might be interested
Simply remind people that they own their calendar, and it’s okay to decline!
Synchronous meetings are one of many, many ways to coordinate between people and groups. There are others. Explore and experiment.
Maybe don’t call your employees “C players”, “incapable people” or “non world class”
“So you end up with this situation where non world class people, you know the old saying ‘A players hire A players, B players hire C players’, I would like to amend it. B players hire LOTS of C players, not just a few but a lot, because those are the kind of people that like building empires. If you can’t capably do your job, you don’t hire people better than you, and a person less capable than you can’t do the job.
So you need three incapable people because one incapable person can’t actually do all the work. But now three incapable people are just going in three different directions, creating all these meetings and all this administrative tax.”
Deep breaths.
Ok. My goal for this piece is NOT to spend the whole time complaining about Brian Chesky and his lack of accountability, empathy, or respect (or as a friend of mine put it: “I am prepared to argue that he has no theory of mind for any actor at the company that is not the CEO. The search for the deep truth can stop, Brian doesn’t actually know what people are.”)
I want to invest my own limited time and energy into plucking out the bits of advice he gives that are solid, practical, and actionable, so I can contextualize and expound upon them.
With that in mind, let’s skip right past the insults and acknowledge the fact that there are real challenges here. It’s extremely difficult to evaluate people who are more skilled than you are in the interview process, and harder still to evaluate those who are skilled in a different domain. Developing these muscles as an organization, figuring out what excellence looks like for each level in each role, maintaining a high bar of quality and employee-role fit…these are investments, and they take time and attention.
Constraints fuel creativity. Constraints also fuel efficiency. One of the biggest pathologies of hypergrowth is that when money is free, and everybody is telling you to go go go, grow grow grow, discipline tends to fly out the window. These things are hard to do well even under the best of circumstances; when everyone’s being given unlimited budgets and told to hire their way out of their backlog, well, can you blame them for doing exactly as they’ve been told?
Pretty shitty to retroactively decide they were all losers, if you ask me.
Great leadership is presence, not absence
“Founder mode at its core, though, is about the single principle to be in the details. Great leadership is presence, not absence. So to go back to my lesson, it is not good for you to hire great people and trust them to do their job. How do you know if they’re doing a good job if you’re not in the details?
You should start in the details. And no one does this (emphasis mine). Everyone hires executives and they let them do their thing, and then they find out a year later, the whole thing has been wrong. They’ve hired people they shouldn’t have hired. Now you got to get in the details. And of course, now their confidence goes down. They always inevitably leave the company. And you should actually start in the details, develop trust, develop muscle memory and then let go. So great leadership is presence not absence.”
A-fucking-men.
…Except for the one small fact that Chesky keeps repeating, “no one does this”. My dude, everyone does this. Nobody just hires an executive and sets them loose and doesn’t look over their shoulder for a year. What the flying fuck? That is lunacy. I love that you are discovering basic leadership principles and it is just fucking flooring you, but have you ever cracked a book about management, or talked to another leader? Ever?
Christine and I learned a long time ago not to tell our execs, “I’m not going to tell you how to run your org.” The goal is to do the work to be in alignment so that you don’t have to tell someone how to run their org, because you have a shared idea of what “great” looks like — and what “good enough” looks like — and you can catch deviations early, while they’re easy enough to fix.
Great leadership is presence, not absence; agreed, absolutely. But what does that mean exactly? Fortunately, he’s about to tell us.
“I review every single thing in the company. If I don’t review it, it doesn’t ship.”
“There was this paradox of CEO involvement. The less involved I got in a project, the more dysfunctional it got; the more dysfunctional it got, the more people assumed the dysfunction came from leadership…And then it would get so screwed up, then I would get involved. So what I ended up doing, I took a playbook of Steve Jobs, Elon Musk does this, Jensen Huang does this, Walt Disney does this, all of them do this. (emphasis mine)
If the CEO is the chief product officer in the company, then you should review all of the work. So I review every single thing in the company. If I don’t review it, it doesn’t ship. I review everything on a cadence…If you’re not actually good at product, you don’t have good judgment and you’re not a super skilled product leader, then maybe you shouldn’t be CEO of the company, I don’t know. So let’s assume you’re actually good at what you do, then I think you should review all the work.”
Whuf.
Let’s back up a second. Brian Chesky has led Airbnb on an incredible journey over the past 17 years — from idea to startup to bloated, sprawling post-unicorn behemoth; through a near-death experience, restructuring and IPO; and emerged on the other side of it all as a public company with a share price of $130. He didn’t do this alone (I really loathe the trope where we treat companies like the extension and embodiment of one man’s will to power), but this also doesn’t happen by accident or happenstance.
He deserves credit for this. It’s more than I’ve done! Who cares what I have to say about any of this, really? I don’t have the same degree of believability as Brian Chesky when it comes to how to build a resilient, enduring, high-quality product company.
So let’s listen to someone who does have believability. Here’s what Reed Hastings says in “No Rules Rules: Netflix and the Culture of Reinvention” (share price: $921):
“There’s a whole mythology about CEOs and other senior leaders who are so involved in the details of the business that their product or service becomes amazing. The legend of Steve Jobs was that his micromanagement made the iPhone a great product…Of course, at most companies, even at those who have leaders who don’t micromanage, employees seek to make the decision the boss is most likely to support.
We don’t emulate those top-down models, because we believe we are fastest and most innovative when employees throughout the company make and own decisions. At Netflix, we strive to develop good decision-making muscles everywhere in our company — and we pride ourselves on how few decisions senior management makes (emphasis mine).”
His co-author, Erin Meyer, chimes in:
“People desire and thrive on jobs that give them control over their own decisions. Since the 1980s, management literature has been filled with instructions for how to delegate more and ‘empower employees to empower themselves’…The more people are given control over their own projects, the more ownership they feel, and the more motivated they are to do their best work.”
OMG, confusing!! Evidently ALL of them do NOT do it. What even IS the moral of the story here?! Well…it’s not a simple one, unfortunately. It turns out that you can’t just copy what Brian Chesky did at Airbnb, or what Reed Hastings did at Netflix, and paste it into your company and expect the same results. Bummer!
There are many paths up the mountain
This is an architecture problem. The Chesky/Airbnb architecture is like a monolith application, or a single-threaded process. Everything goes through the CEO, and that’s how they maintain quality. The Hastings/Netflix architecture is more like a microservices application or a threaded, highly concurrent process.
Either can work. Both have tradeoffs and implications. If you try to import either philosophy wholesale, it will break in unexpected ways; if you try to mix and match, it will probably be an unfettered nightmare.
Your architecture will only work if it solves for your problems, utilizing your resources, values, and contingencies. It needs to be authentic, consistent, and internally coherent. This doesn’t mean you can’t learn anything from either of these companies. You can — I have! But you should probably treat them like reference architectures — just-so stories about how individual cultures have successfully evolved in response to their unique challenges and threats, not recipe books.
And I can tell you right away that as an employee, one of these models looks a whole hell of a lot more appealing than the other.
But wait — it gets worse. 😅
Should the CEO interview every candidate?
“I interviewed the first 400 people and I wish I interviewed longer. Maybe my biggest regret is not interviewing the first thousand. I think you should interview every candidate until the recruiting team stages an intervention. Once they stage an intervention, you should interview for two more years after that until everyone threatens to resign…and then you should step away.”
Well. If this is the kind of company you’re choosing to build, then I suppose you may as well be consistent.
Can you be calibrated as an interviewer on every single opening, for every role? My God, no, not even close.
The thing is…I have talked to so many people who work at companies where the CEO insists on interviewing every candidate. It seems to be a trend that is gaining steam rather than losing steam, much to everyone’s misfortune.
Which means that I have personally heard so many anguished stories from angry, frustrated engineering managers who have had their decisions overturned by arrogant CEOs who lacked the skills to evaluate their candidate’s experience, who were biased in blatant and embarrassing ways, who were so fucking overconfident in their own judgment that their teams are constantly having to compensate and apologize and mop up after them.
Want an example? Sure. I recently heard from a director at a 500-person company who spent six months cultivating and recruiting an exceptional hire with an unusual skill set. The candidate made it through their interview loop with flying colors, only for the CEO to reject them because they had recently had a child and were forthright about the fact that work/life balance was a meaningful consideration for them at this point in time. (The director did their best to do damage control, but even though the CEO ultimately relented, the candidate was no longer willing to leave their job. Can you blame them?!?)
It keeps getting worse! Here comes the low point.
“If they would come work for you, they’re not good enough. They’re only good enough if they come to work for me.”
“Can I give you an example of what I do today that no one else, not no one but maybe 95% of public company CEOs don’t do. I have an executive team, right?…I have like seven execs and 40 or 50 VPs. All the directs to my directs dual report to me. I am the co-hiring manager of all the directs to my directs and so we meet and I often tell my directs, ‘I don’t want somebody that you could hire without me. If they would come to work for you, they’re not good enough. They’re only good enough if they come to work for me. So if you can hire them without my help, they’re not good enough.’”
I just about lost my shit over this. Do you hear yourself, bud?
The irony is…I am actually the world’s hugest proponent of skip level 1x1s. I have two or three half-written blog posts in my drafts folder preaching the value of skip levels. I’ve written MULTIPLE twitter threads over the years, talking about how important it is to build relationships with your manager’s managers and your direct reports’ direct reports.
I’ve said that I think skip levels are like end-to-end health checks. It’s important to open a line of communication and explicitly invite critical feedback and bad news. It’s a way to verify that managers are doing a good job managing their teams. It’s how you help iron out telephone games and ensure packets are being transmitted and received up and down the org chart. They are such a critical contribution to organizational health and clear communications, and not enough places invest in them.
I’m also a big proponent of promoting from within, of hiring ambitious people — all of it.
But this attitude towards hierarchy that locates the CEO at the center of every universe, and ranks people in importance according to their proximity…it’s just gross. It’s an attitude that’s contagious; it spreads, like syphilis. And I do not think it unlocks intrinsic motivation or excellence in most humans. It mostly incentivizes a bunch of maladaptive behaviors like sucking up to the CEO.
UGH. Okay, this is getting really long. I’m going to jump rapid-fire through a few final nuggets.
Executive hiring fails when you hire someone at the wrong stage
“Probably the number one reason executive hiring fails is because you hire somebody at the wrong stage. And they were managing instead of building, and you didn’t know that. And so you brought in a manager who is an expert or not so expert, but comfortable in a highly political bureaucracy. And now they have to do things themselves and they can’t. They also have the wrong stage instinct, right? Maybe a CMO used to run $500 million marketing budgets. Now they have a $50 million or $5 million budget, and they don’t know what to do and they can’t do anything themselves.”
Yes, execs can fail because they are managing instead of building, but they can ALSO fail because they are building instead of managing. I’ve worked with execs who operated like they were effectively the most senior IC in the room, and they had…extreme limitations as leaders, let’s put it that way.
Overall, this is a solid point. Being a CMO that takes a company from $1-10m or $10-50m is a very, very different skill set than taking a company from $50-250m, or through an IPO.
We look for executives who can both scale up and scale down. Scale up: you can speak credibly to the board, at the right level of abstraction vs detail, you can craft strategy, see around corners etc. Scale down: you know what “good” looks like for work all over your organization, you can get down in the weeds to help coach a struggling IC back to victory, you can debug a flailing campaign or workflow. Both matter.
References are critical for building confidence in your hires
“I actually prioritize references over interviewing…Andreessen Horowitz would tell me, you should do 8 hours of reference checks per employee.”
Agreed. I’ve said many times that if I had to choose between interviews or references, I would pick references every time. (Fortunately, you don’t have to pick!)
“Ask them who the best people are. Say, ‘okay, separate from this topic, I just want to know who’s the best person you’ve ever worked with.’ Do they say the person’s name you just talked about?”
This trick doesn’t fool anybody.
“Then you ask questions like, okay, what do I need to watch out for? If I were to hire them? What is the one area of development you would give them?”
This is good advice. You should always probe into people’s weaknesses and areas of development. Everyone has them, there’s no shame in that. Hearing details about where they are weak can give you confidence, and set you up better to support them. It gives you richer insight into them as a person and coworker.
A basket of interviewing tips and tricks
“Interviewing. My first tip is you ask follow up questions. You ask them how to explain how they did something. And the key is to ask two followups. You never want to get the first answer, you always want the third answer.”
Asking follow-up questions is a classic technique, and a good one. But don’t let them dominate the conversation with a narrative. You want to be intentional about pulling on specific threads and making sure they answer what you asked, not pull a politician’s move and give the answer they feel like answering. Does the answer sound canned, or are they thinking on their feet?
“Often there’s too many people interviewing for too short a time, not going deep enough. Your interview panel should be as few people as possible, going as deep as possible…3 or 4 people going really deep is better than 8 or 10 people giving you their first impression…and they’re actually mostly thinking about what this means for them.”
Yeah, so this is an area where my thinking has actually changed a lot over the years. I used to cast a much wider net, like I felt like people ought to get to interview anyone who was being hired over them. I’ve come to realize that having too many veto points in the system is dangerous and doesn’t actually add more value. Yes, people like being offered the opportunity to affirmatively vet someone, but at a certain point you have to prioritize the candidate experience — and trust your team to make good choices.
It’s usually better to have fewer interviewers, but make sure they are all well calibrated for the role, and that there’s a certain amount of coordination between interviewers so everyone is covering different questions/aspects of the role. If you have 8 or 10 interviewers, that is way, way too many.
“Every potential hire is guilty until proven innocent. It is the opposite of our justice system. Most people, when they interview, they look for the absence of weaknesses and that is innocence. The presumption is someone’s good. You should always presume somebody is not good. You need proof. They don’t work for you. So you need evidence to hire them, not evidence to eliminate them as a candidate and almost every company gets this wrong. And what they end up doing is hiring mediocre people with an absence of weaknesses, not people that have a preponderance of evidence of being really good and spike in a few areas.”
Again, there is a solid principle buried deep under all this repugnant bullshit about “mediocre people” and “guilty until proven innocent”. Here’s how I would put it: you want to hire people for their unique strengths, not their lack of weaknesses. If they’re strong where you need them to be strong, it’s okay if they aren’t equally superpowered at everything — that’s why we build teams, to supplement and balance each other out.
In the Honeycomb interview process, we emphasize that we want to see you at your best — please help us do that! If you don’t feel like we’ve seen your strengths, please tell us, so we can fix it.
See, how hard was that? Same point, zero jackassery.
There is no such thing as the ‘best people’
Another way to look at it is the quality of the people. People never hire people better than them. So there might be people that are good at their job, but it’s not enough to be good at your job in most large companies. If you are the best in the world at your job, but you can’t hire really great people, then you’re not going to be the best in the world because your team isn’t really good.
God, he does this over and over again, talking about people like they exist on some index you can stack rank or something.
Here’s one small mental hack that makes a world of difference: remember that you are trying to hire the right people to join your team/org/company.
Not the “best” people.
The right people.
The fact that someone isn’t a superstar employee for this company, this product, this team, at this stage, doesn’t mean they might not be a superstar employee for someone else. And people who aren’t “superstar employees” are still worthy of your respect. Not wanting to work your ass off is a perfectly legitimate life choice and does not make them a lower quality human. Maybe they aren’t the right hire for you, but you don’t have to treat them — or talk about them — like shit.
People who work for big, stable companies are not necessarily bad at their job or incapable of building things. They have a different skill set, they may work at a different tempo, but this doesn’t mean they suck. My god. So fucking condescending.
There’s such a special kind of hubris in these startup kids who are losing tens of millions of dollars a year and looking down their noses at their peers in organizations that are making tens of millions of dollars a year, believing themselves to be categorically better than them just because they can…prototype real fast? Unclear.
Building a world-class team is about more than just hiring
I wrote a piece a few years ago called “The Real 11 Reasons I Don’t Hire You”, where I discussed a few of the many variables that go into deciding who to hire. It’s complicated — it is irreducibly complicated. And it should be.
But it’s also just the beginning. The team, the culture, the sociotechnical systems you hire them into are going to exert a gravitational pull over all of the people you hire. Are you bringing them into an environment that is generative, playful, creative, experimental, intense, competitive, demoralizing, controlling, grinding, aspirational, compliant, hierarchical, passive-aggressive, or aggressive-aggressive? Are standards applied consistently? What behaviors get rewarded or punished, actively or otherwise? Who gets mentored and fast tracked to the top? Who gets the most facetime with the CEO? Is CEO facetime a prized currency? Why? Systems drive behavior.
Sociologists have a term for the cognitive bias that causes us to predictably, consistently over-emphasize individual agency and attributes and underestimate situational factors: the FAE, or Fundamental Attribution Error. This whole interview is sopping with FAE energy.
It’s not as simple as “just hire great people”. You want to hire people who share your values, want to do the job, have the right skills, are motivated, etc, and then the conditions you create for them to work under will either cause them to flourish and feed their creativity and drive, or will crush them and shut them down. The feedback loop runs both ways.
Hypergrowth is hazardous to your company’s health
“In a hypergrowth company, it could even be 50% of your time is hiring.”
Chesky mentions hypergrowth only once and briefly, towards the end, but it’s a vital piece of context if you want to understand the Airbnb story.
As he says, Airbnb was one of the O.G. unicorns — a unicorn before they coined the term ‘unicorn’. It was born in the era of hypergrowth and free money. That’s the only way to make any sense of the fact that a company could pay such comically little attention to efficiency, for so long. (Thirteen years, to be exact.)
When all you have is a hammer, everything looks like a nail. In hypergrowth mode, you solve every problem by throwing more resources at the system. The tools you learn are weird ones, which map awkwardly to the skills you need to run a normal, sustainable company that’s expected to turn a profit. Hypergrowth encourages a raft of bad habits, and attacking every problem by hiring more people is one of them.
This is not good for anyone, except perhaps venture capitalists. The externalities are dreadful. It’s impossible to scale your culture, your practices, your values, or people’s expectations at an equivalent pace. The correction is brutal, when the time finally comes to worry about efficiency — and eventually, everybody needs to worry about efficiency. The higher the ride, the harder the fall. The bill comes due.
The CEO-centric view of the universe
One of my least favorite things about YC is the way it seems to pursue extremely young and inexperienced founders. If you’ve never been a manager, director, VP, staff or principal engineer, it’s a lot easier to look down on those people and disrespect the role they play in the ecosystem.
It looks like Brian Chesky was about 26 years old when he cofounded Airbnb. He has basically been a CEO for his entire career. And this is, I think, a great example of the kind of blinkered perspective you get from someone who has no real idea what it’s like to sit anywhere else on the org chart.
After watching the first 40 minutes of this talk, one might reasonably wonder if Brian Chesky understands that being CEO of a company means being accountable for its outcomes.
What makes all of this extra frustrating is that in the final five minutes, he shows us that he does know this…at least when it comes to board interactions.
“Oftentimes if you take advice from a VC and it doesn’t work and you don’t have traction…You’re still held responsible. So the only thing that matters is you’re successful, not if you listen to them or not. People sometimes forget and they’re like, well, you shouldn’t have listened to me. They don’t say it that way, but that’s kind of the way it happens. So I would just know that, like, you own the outcome no matter what.”
We’ve been talking about observability 2.0 a lot lately; what it means for telemetry and instrumentation, its practices and sociotechnical implications, and the dramatically different shape of its cost model. With all of these details swimming about, I’m afraid we’re already starting to lose sight of what matters.
The distinction between observability 1.0 and observability 2.0 is not a laundry list, it’s not marketing speak, and it’s not that complicated or hard to understand. The distinction is a technical one, and it’s actually quite simple:
Observability 1.0 has three pillars and many sources of truth, scattered across disparate tools and formats.
Observability 2.0 has one source of truth, wide structured log events, from which you can derive all the other data types.
That’s it. That’s what defines each generation, respectively. Everything else is a consequence that flows from this distinction.
Multiple “pillars” are an observability 1.0 phenomenon
We’ve all heard the slogan, “metrics, logs, and traces are the three pillars of observability.” Right?
Well, that’s half true; it’s true of observability 1.0 tools. You might even say that pillars define the observability 1.0 generation. For every request that enters your system, you write logs, increment counters, and maybe trace spans; then you store telemetry in many places. You probably use some subset (or superset) of tools including APM, RUM, unstructured logs, structured logs, infra metrics, tracing tools, profiling tools, product analytics, marketing analytics, dashboards, SLO tools, and more. Under the hood, these are stored in various formats and systems: metrics, unstructured logs (strings), structured logs, time-series databases, columnar databases, and other proprietary storage systems.
Observability 1.0 tools force you to make a ton of decisions at write time about how you and your team would use the data in the future. They silo off different types of data and different kinds of questions into entirely different tools, as many different tools as you have use cases.
Many pillars, many tools.
An observability 2.0 tool does not have pillars.
Your observability 2.0 tool has one unified source of truth
Your observability 2.0 tool stores the telemetry for each request in one place, in one format: arbitrarily-wide structured log events.
These log events are not fired off willy-nilly as the request executes. They are specifically composed to describe all of the context accompanying a unit of work. Some common patterns include canonical logs, organized around each hop of the request; traces and spans, organized around application logic; or traces emitted as pulses for long-running jobs, queues, CI/CD pipelines, etc.
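To make that concrete, here is a minimal sketch of what composing one wide event per unit of work might look like. Everything in it is a stand-in I made up for illustration: the `handle_request` function, the field names, the `"api"` service name, and the placeholder values. It is not how any particular SDK does it.

```python
import json
import time


def handle_request(user_id: str, endpoint: str, build_id: str) -> dict:
    """Do one unit of work and return a single wide event describing it."""
    # Start the event with the context known when the request arrives.
    event = {
        "timestamp": time.time(),
        "service": "api",  # hypothetical service name
        "endpoint": endpoint,
        "user_id": user_id,
        "build_id": build_id,
    }
    start = time.perf_counter()
    try:
        # Placeholder for real work. Keep attaching fields as you learn
        # things, instead of firing off a separate log line for each one.
        event["cache_hit"] = False
        event["db_rows_returned"] = 3
        event["status_code"] = 200
    except Exception as exc:
        event["status_code"] = 500
        event["error"] = repr(exc)
    finally:
        # The duration is recorded last, so the event covers the whole request.
        event["duration_ms"] = (time.perf_counter() - start) * 1000.0
    return event


if __name__ == "__main__":
    # One request in, one arbitrarily-wide structured event out.
    print(json.dumps(handle_request("user-42", "/classes/GameScore", "build-1a2b")))
```

The point of the shape is that the event gets wider as the request executes, and is emitted once, at the end, with all of its context attached.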
Structuring your data in this way preserves as much context and connective tissue as possible about the work being done. Once your data is gathered up this way, you can:
Derive metrics from your log events
Visualize them over time, as a trace
Zoom into individual requests, zoom out to long-term trends
Derive SLOs and aggregates
Collect system, application, product, and business telemetry together
Slice and dice and explore your data in an open-ended way
Swiftly compute outliers and identify correlations
The beauty of observability 2.0 is that it lets you collect your telemetry and store it—once—in a way that preserves all that rich context and relational data, and make decisions at read time about how you want to query and use the data. Store it once, and use it for everything.
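Here is a rough sketch of what "decide at read time" can mean in practice: the same stored events answer a traffic question, an error question, and an SLO question, with none of them declared up front. The sample events, field names, and helper functions are all hypothetical, and a real tool would run these as queries against a columnar store rather than looping over a Python list.

```python
from collections import defaultdict

# Hypothetical stored events: one per request, written once.
events = [
    {"timestamp": 5, "endpoint": "/login", "status_code": 200, "duration_ms": 12.0},
    {"timestamp": 30, "endpoint": "/login", "status_code": 500, "duration_ms": 250.0},
    {"timestamp": 65, "endpoint": "/search", "status_code": 200, "duration_ms": 80.0},
    {"timestamp": 70, "endpoint": "/search", "status_code": 200, "duration_ms": 400.0},
]


def requests_per_minute(events):
    """A metric (request count over time), derived from events at read time."""
    counts = defaultdict(int)
    for e in events:
        counts[e["timestamp"] // 60] += 1
    return dict(counts)


def error_rate(events):
    """An aggregate: the fraction of requests that returned a 5xx."""
    errors = sum(1 for e in events if e["status_code"] >= 500)
    return errors / len(events)


def slo_compliance(events, threshold_ms):
    """An SLO: the fraction of requests that succeeded within the threshold."""
    good = sum(
        1
        for e in events
        if e["status_code"] < 500 and e["duration_ms"] <= threshold_ms
    )
    return good / len(events)


if __name__ == "__main__":
    print(requests_per_minute(events))   # {0: 2, 1: 2}
    print(error_rate(events))            # 0.25
    print(slo_compliance(events, 100))   # 0.5
```

If you decide next month that the SLO threshold should be 200ms instead of 100ms, you change the question, not the data.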
Everything else is a consequence of this differentiator
Yeah, there’s a lot more to observability 2.0 than whether your data is stored in one place or many. Of course there is. But everything else is unlocked and enabled by this one core difference.
Here are some of the other aspects of observability 2.0, many of which have gotten picked up and discussed elsewhere in recent weeks:
Observability 1.0 is how you operate your code; observability 2.0 is about how you develop your code
Observability 1.0 has historically been infra-centric, and often makes do with logs and metrics software already emits, or that can be extracted with third-party tools
Observability 2.0 is oriented around your application code, the software at the core of your business
Observability 1.0 is traditionally focused on MTTR, MTTD, errors, crashes, and downtime
Observability 2.0 includes those things, but it’s about holistically understanding your software and your users—not just when things are broken
To control observability 2.0 costs, you typically reach for tail-based or head-based sampling
Observability 2.0 complements and supercharges the effectiveness of other modern development best practices like feature flags, progressive deployments, and chaos engineering.
Observability 2.0 is so much more effective at enabling and accelerating the entire software development lifecycle because the single source of truth and wide, dense, cardinality-rich data allow you to do things you can’t in an observability 1.0 world: slice and dice on arbitrary high-cardinality dimensions like build_id, feature flags, user_id, etc. to see precisely what is happening as people use your code in production.
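A toy version of that slicing and dicing, assuming events shaped like the ones described above. The `breakdown` helper, the `flag_new_checkout` field, and the sample data are invented for illustration; the thing to notice is that the grouping dimension is chosen when you ask the question, not when the data is written.

```python
from collections import defaultdict

# Hypothetical events carrying high-cardinality fields alongside the basics.
events = [
    {"build_id": "build-1", "user_id": "u1", "flag_new_checkout": False, "status_code": 200, "duration_ms": 20.0},
    {"build_id": "build-1", "user_id": "u2", "flag_new_checkout": False, "status_code": 200, "duration_ms": 25.0},
    {"build_id": "build-2", "user_id": "u1", "flag_new_checkout": True, "status_code": 500, "duration_ms": 900.0},
    {"build_id": "build-2", "user_id": "u3", "flag_new_checkout": True, "status_code": 500, "duration_ms": 850.0},
    {"build_id": "build-2", "user_id": "u2", "flag_new_checkout": False, "status_code": 200, "duration_ms": 22.0},
]


def breakdown(events, dimension, where=None):
    """Group events by any field, optionally filtered, and summarize each group."""
    groups = defaultdict(list)
    for e in events:
        if where is None or where(e):
            groups[e.get(dimension)].append(e)
    return {
        key: {
            "count": len(group),
            "errors": sum(1 for e in group if e["status_code"] >= 500),
            "max_duration_ms": max(e["duration_ms"] for e in group),
        }
        for key, group in groups.items()
    }


if __name__ == "__main__":
    # First question: did something change with the new build?
    print(breakdown(events, "build_id"))
    # Follow-up question: within build-2, is it the feature flag?
    in_build_2 = lambda e: e["build_id"] == "build-2"
    print(breakdown(events, "flag_new_checkout", where=in_build_2))
```

In this made-up data, the first breakdown points at build-2 and the second narrows it to requests with the flag turned on. Neither question had to be anticipated in advance.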
In the same way that whether a database is a document store, a relational database, or a columnar database has an enormous impact on the kinds of workloads it can do, what it excels at and which teams end up using it, the difference between observability 1.0 and 2.0 is a technical distinction that has enduring consequences for how people use it.
These are not hard boundaries; data is data, telemetry is telemetry, and there will always be a certain amount of overlap. You can adopt some of these observability 2.0-ish behaviors (like feature flags) using 1.0 tools, to some extent—and you should try!—but the best you can do with metrics-backed tools will always be percentile aggregates and random exemplars. You need precision tools to unlock the full potential of observability 2.0.
Observability 1.0 is a dinner knife; 2.0 is a scalpel.
Why now? What changed?
If observability 2.0 is so much better, faster, cheaper, simpler, and more powerful, then why has it taken this long to emerge on the landscape?
Observability 2.0-shaped tools (high cardinality, high dimensionality, explorable interfaces, etc.) have actually been de rigueur on the business side of the house for years. You can’t run a business without them! It was close to 20 years ago that columnar stores like Vertica came on the scene for data warehouses. But those tools weren’t built for software engineers, and they were prohibitively expensive at production scale.
FAANG companies have also been using tools like this internally for a very long time. Facebook’s Scuba was famously the inspiration for Honeycomb. However, Scuba ran on giant RAM disks as recently as 2015, which means it was quite an expensive service to run. The falling cost of storage, bandwidth, and compute has made these technologies viable as commodity SaaS platforms, at the same time as the skyrocketing complexity of systems due to microservices and decoupled architecture patterns has made them mandatory.
Three big reasons the rise of observability 2.0 is inevitable
Number one: our systems are exploding in complexity along with power and capabilities. The idea that developing your code and operating your code are two different practices that can be done by two different people is no longer tenable. You can’t operate your code as a black box, you have to instrument it. You also can’t predict how things are going to behave or break, and one of the defining characteristics of observability 1.0 was that you had to make those predictions up front, at write time.
Number two: the cost model of observability 1.0 is brutally unsustainable. Instead of paying to store your data once, you pay to store it again and again and again, in as many different pillars or formats or tools as you have use cases. The post-ZIRP era has cast a harsh focus on a lot of teams’ observability bills—not only the outrageous costs, but also the reality that as costs go up, the value you get out of them is going down.
Yet the cost multiplier angle is in some ways the easiest to fix: you bite the bullet and sacrifice some of your tools. Cardinality is even more costly, and harder to mitigate. You go to bed Friday night with a $150k Datadog bill and wake up Monday morning with a million dollar bill, without changing a single line of code. Many observability engineering teams spend an outright majority of their time just trying to manage the cardinality threshold—enough detail to understand their systems and solve users’ problems, not so much detail that they go bankrupt.
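If you haven’t run into the cardinality threshold yourself, some back-of-the-envelope arithmetic shows why it bites. In a metrics-backed tool, every unique combination of tag values is typically its own time series, so the worst case is the product of the tag cardinalities. The tag names and counts below are made up, and how a given vendor actually counts and bills series varies, so treat this as an upper-bound sketch rather than anyone’s pricing model.

```python
from math import prod


def max_series(tag_cardinalities):
    """Worst-case number of time series for one metric: the product of the
    number of distinct values of each tag attached to it."""
    return prod(tag_cardinalities.values())


if __name__ == "__main__":
    # Hypothetical tags on a single request-latency metric.
    tags = {"endpoint": 50, "status_code": 5, "host": 100}
    print(max_series(tags))  # 25000

    # Someone adds one innocent-looking tag with 10,000 distinct values.
    tags["user_id"] = 10_000
    print(max_series(tags))  # 250000000
```

One new tag took the ceiling from 25 thousand series to 250 million. With wide events, that same `user_id` is just one more field on an event you were already storing.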
And that is the most expensive part of all: engineering cycles. The cost of the time engineers spend laboring below the value line—trying to understand their code, their telemetry, their user behaviors—is astronomical. Poor observability is the dark matter of engineering teams. It’s why everything we do feels so incredibly, grindingly slow, for no apparent reason. Good observability empowers teams to ship swiftly, consistently, and with confidence.
Number three: a critical mass of developers have seen what observability 2.0 can do. Once you’ve tried developing with observability 2.0, you can’t go back. That was what drove Christine and me to start Honeycomb, after we experienced this at Facebook. It’s hard to describe the difference in words, but once you’ve built software with fast feedback loops and real-time, interactive visibility into what your code is doing, you simply won’t go back.
It’s not just Honeycomb; observability 2.0 tools are going mainstream
We’re starting to see a wave of early startups building tools based on these principles. You’re seeing places like Shopify build tools in-house using something like ClickHouse as a backing store. DuckDB is now available in the open-source realm. I expect to see a blossoming of composable solutions in the next year or two, in the vein of ELK stacks for o11y 2.0.
There are still valid reasons to go with a 1.0 vendor. Those tools are more mature, fully featured, and most importantly, they have a more familiar look and feel to engineers who have been working with metrics and logs their whole career. But engineers who have tried observability 2.0 are rarely willing to go back.
Beware observability 2.0 marketing claims
You do have to be a little bit wary here. There are lots of observability 1.0 vendors who talk about having a “unified observability platform” or having all your data in one place. But what they actually mean is that you can pay for all your tools in one unified bill, or present all the different data sources in one unified visualization.
The best of these vendors have built a bunch of elaborate bridges between their different tools and storage systems, so you can predefine connection points between e.g. a particular metric and your logging tool or your tracing tool. This is a massive improvement over having no connection points between datasets, no doubt. But a unified presentation layer is not the same thing as a unified data source.
So if you’re trying to clear a path through all the sales collateral and marketing technobabble, you only need to ask one question: how many times is your data going to be stored?
Recently we learned that Google spent $2.7 billion to re-hire a single AI researcher who had left to start his own company. As Charlie Brown would say: “Good grief.” 🙄
This is an (incredibly!) extreme example. But back in the halcyon days of the zero interest rate phenomenon (ZIRP), smaller versions of this tale played out daily. Many rank-and-file engineers have stories about submitting their resignation, or threatening to quit, and their managers plying them with stock or cash or promotions to stay. This happened so much that it started to seem like the normal thing to do when you wanted a raise or a promotion. Job hopping for better comp also happened, but people quickly figured out that by merely threatening to leave, you could often get the loot without the hassle of having to actually switch jobs.
Many of these stories have been embellished dramatically over time, as real anecdotes fade into legends of the “my friend knows a person who” or “I read it on Blind” varieties, but the lore is based in reality. It really did happen. The legacy of these episodes is…not great.
To be clear, I do not begrudge employees trying to maximize their wages and comp by changing jobs. It’s the gamification and brinksmanship I object to, and all the ways it ends up distorting company culture and values and outcomes. In the overheated ZIRP environment, lots of companies felt like this is what they were forced to do to compete for talent. Maybe so, maybe not. But money is not the only thing people value, which means that this is not the only way to compete for talent.
After all, the hot air of the inflationary ZIRP bidding wars is what led to the post-ZIRP job market collapse. The boom and bust cycle is stressful and counterproductive, and it leads to uneven, disastrously unfair outcomes and an oppositional, extractive mindset on both sides. We can do better. We must do better. Let’s talk about how.
You should stay at your job as long as it fulfills your career priorities
How long should you stay at your job? As long as it’s the best thing you can do for your career, or at least a reasonable, smart career choice, in alignment with your own personal career goals and life priorities.
Maybe this sounds mind-numbingly obvious to you. But far too many people stay far too long at jobs where they aren’t happy, aren’t growing, and aren’t setting their future selves up for success. Hey, I’ve been there…these decisions can be brutal. 💔
Your career is an appreciating, multimillion dollar asset, probably the largest single asset you will ever own. How you define what is best or right for you will inevitably shift over the course of your 40-year career, and that’s fine. This is normal.
But you have to make these decisions based on what is right for you, your career, and your family. Not because, say, you feel responsible for protecting your team from upper management, or you’re afraid of what will happen to the product or the team if you leave, or you feel like you owe them something. Nor should you stay out of fear, whether that be fear of interviews, that this is the best you can do, etc.
Sometimes your top priority might be making the most money, so you can get out of debt. Sometimes it might be a simple, uncomplicated paycheck and low expectations so you can spend a lot of time with your family. Sometimes you may be on a hot streak and raring to go, working like crazy and making a name for yourself in the industry. When in doubt, my advice is to 1) preserve optionality, 2) follow good people and 3) lean into that which energizes you.
The company should employ you as long as it’s a good fit
There are certainly companies where people get fired too quickly or in bad faith. There are also companies where people who are not working out linger on and on and on in the role. It might be tempting to conceive of the latter situation as more worker-friendly, but in all honesty, neither situation is great.
If the wants and needs of the company and the employee are not aligned, you aren’t doing them any favors by dragging it out or keeping them around in a prolonged state of purgatory. If things are decidedly not working out, I promise, they are miserable.
If you are a manager, your number one job is to bring clarity. What are the expectations for the role, what does success look like, what support does the employee need in getting there? When things aren’t going well, your job is to work with them to figure out what is happening, and come up with a plan. Is there a shared understanding of what success looks like in this role? Is it a skills gap, are there relationships that need mending, do they need some time off to deal with personal issues? Are they still interested in the work? Is it still a good fit?
There is an extremely short list of jobs that can only be done by managers, and managing people out (which does sometimes mean firing them, but not always), is at the tippy top of that list. Making sure the right people are on the team is job number one. Figuring this shit out swiftly — we’re talking months, not years — is critical.
Also, none of this happens magically or automatically. This shit is hard. Which is why it is important to invest in these skills and set expectations for your managers.
Your manager should try to make this a great career opportunity for you, for as long as possible
It’s the job of your manager to ensure that this role is a great opportunity for you, for as long as possible. For mid-level engineers this means making sure you are learning and expanding your skill sets, that you have access to mentorship and support systems, that you get to follow your curiosity to some extent and work on things that interest you. For more senior folks, this might mean looking out for opportunities to lead projects or wear new hats.
But that won’t be forever, for anyone — not even your CEO or founders! And that’s okay. This is not a family, it’s a company, and hopefully something of a community.
Sometimes you get an opportunity you can’t refuse. Or life takes you in a different direction. It happens! It is not a tragedy when people leave for a better opportunity or something that excites them.
Real-life example: Paul Osman left Honeycomb because he and his family were moving to NYC and needed a Big Tech salary. He was a wonderful staff engineer (at a time when those were scarce), a high performer, effective across the org, beloved by all; he was even on our board of directors, our first elected employee board member! But when he let us know he was going to leave, we … wished him well. We couldn’t match the salary he needed to pull down; he knew that, we knew that. Nor would it have been fair to all the other staff engineers if we had tried.
Managers need to be actively engaging in career development and planning with their reports. The more you know about someone’s personal values and priorities, the better you can do to try and set them up with opportunities that appeal to them and the trajectory they are on.
Your manager should also be honest if you could find better opportunities elsewhere
It can be hard to admit to your star employee that if you were them, you’d be looking elsewhere for opportunities. Maybe you have an incredible, ambitious senior engineering director who is hungry and chafing to move up, but you don’t expect to see any openings at the VP level over the next year or two. They deserve to know that, I think.
To be clear, you are NOT firing them. Usually, you are holding your breath and praying they will choose to stay. Often they do! Maybe they love their job enough that they’re happy to stick around for another couple years just to see if any openings do arise, or they switch into passive job search mode, taking interesting calls but not actively looking. Maybe you have a conversation about ways they could build their career in other ways, by doing more writing and speaking. Maybe they decide this is a good window of time to have another kid.
But if you can’t honestly look them in the eye and tell them this is the best place for them, given what you know of their ambitions and priorities, you have to say so. It’s on them to decide what to do with that information. But if you want them to trust you when you say this is a great opportunity for their career, you have to be truthful when the opportunity is just not there.
Some amount of employee turnover is natural and healthy
When I worked at Linden Lab in my early twenties, I remember vividly how much pride we took in the fact that people never left. I was there for 4.5 years, and I think we had a single-digit number of departures that entire time. I remember thinking to myself how incredibly special this company must be, because nobody ever wants to leave.
It was a special company. ❤️ But when I look back now, this part makes me cringe. Yep, nobody ever left. No one was ever managed out, even the people who never seemed to do anything but hang out in Second Life or work on whatever the fuck they personally felt like doing. It was a little bit … culty? There were some incredible engineers there, but also a systematic inability to row in the same direction or make a plan and execute on it. In some ways Linden felt more like a social club than a business.
I loved working there, don’t get me wrong, and I learned a lot. But in retrospect, some amount of turnover is good. It’s healthy. It means you have standards for yourselves, and someone is paying attention to whether or not we’re actually making progress and getting shit done, or whether or not the people we need are in the right seats.
Tenure functions somewhat differently at very large companies; it may take years for someone just to come up to speed and learn how to operate within the system, so they do their best to retain people for decades. When you’re a startup in growth mode, though, you become a completely different organism every few years. People who are happy as clams and supremely productive from $0-$1m or 1-50 people may or may not adjust well to the $50m or $200m environment. People who are superstars on one side of the Dunbar number are sometimes bitterly unhappy on the other side.
There is “regrettable” and “non-regrettable” attrition, but the company should be able to go on operating even in the face of “regrettable” departures.
There are, of course, exceptions. So let’s talk about these.
Sometimes people sit in critical roles at critical moments
At any given time, there exists a subset of people who are disproportionately critical to the success of the business at the moment, people whose departure could seriously jeopardize the company’s ability to meet its goals this quarter or even this year. It sucks, but it’s a reality. This happens.
If that’s a very long list of people, however, or if it’s the same people over and over, or if the actual survival of the company would be in jeopardy and not just a subset of your goals, then your leaders are not doing their fucking job.
Part of the job of running a company is developing talent to be successors to key people. Part of their job is to replicate and distribute critical company knowledge and skills. None of us should be irreplaceable — not even the CEO, or CTO, or founders. If the company’s future depends irrevocably on the continued employment of any individual person, the company’s leaders are fucking up, full stop.
There are two types of disproportionately critical employees: superstars and SPOFs
The right time to determine who is in that critical subset is NOT when one of them resigns. You should be asking yourselves somewhat regularly: which people are our superstars, the ones we really, really want to make sure are happy and fulfilled here, and which people are single points of failure, the ones we cannot easily replace or function without?
Note that these are not necessarily the same two lists!
This doesn’t have to be a heavyweight process, but if you are large enough to have a People team or HR team, they should be ensuring that talent reviews and succession planning conversations are happening like clockwork, once per quarter or so.
Your superstars are the people who are standout performers, carrying a ton of load for the company or generating uniquely creative ideas, etc. You should identify these people proactively and make sure they are feeling challenged, supported and valued. What are their values — what lights their fire? Where are they trying to go in their career, in their life? How do they like to receive recognition? How does it manifest when they feel overwhelmed or demotivated?
Managers tend to devote most of their attention to their lowest performers. Be wary of this. Yes, give people the support they need. But the biggest bang for your buck is typically the time you spend on your highest performers. Don’t neglect your superstars just because they are doing well.
Get to know your superstars, and compensate them
And compensate your superstars. Whatever pool of money is set aside for high performers at your company, make sure they get a slice of it — a raise, a bonus, direct equity, etc.
But money isn’t the full story, it’s just the first chapter. This is where you need to dig a little deeper and get to know them better — their values, their love languages, how they like to receive recognition. Make sure other company leaders know who is kicking ass and what kind of opportunities they’d be into.
Being a superstar should earn you more than money; it earns the right to experiment, try a moonshot, be first in line for a lateral role change into another area of interest. Maybe you can line them up with a work coach or continuing education, support them writing or presenting their work at conferences…the list is endless. What do they value? Find out.
It is normal and desirable for your shortlist of superstars to shift over time. If it’s always the same few names on the list, that may reflect a different problem: that you are handing out all of the opportunities to take risks and shine brightly to the same few people, over and over again. It’s your job to cultivate a deep bench of talent, not one or two lead singers with everyone else in the chorus.
Work on a plan to de-risk your SPOFs
And then there are your single points of failure, people who are the only person who knows how to do something, or the only person in a function. In the early days of any startup, you have a ton of these. As you grow, you should steadily pay down this list.
If superstars are the people you want to keep out of joy, SPOFs can be the people you need to keep out of fear. You can’t function without them, even if they’re mediocre contributors. This is bad on several levels.
This is just a risk analysis you need to work through as a leadership team. Have a plan, have backup options, and steer a path out of this state as soon as you can afford to.
I’m not naive. The realities of business are real, and sometimes something takes you by surprise, or you need to try and do a diving save for someone who has just announced they are leaving. But that should not be common. The normal, expected reaction when someone tells you they are leaving should be, “ah, that’s too bad, we’ll miss you! I’m so happy for you and this new opportunity you’re excited about!”
Most jobs will be saved or lost by boring organizational labor, not heroic diving saves.
Here is one important reality that many employees don’t seem to grasp:
The harder your employer is affirmatively working to do right by you, the fewer heroics they will be willing or able to do to retain you. And the harder the company is working to be fair and equitable, the less they will be willing or able to make exceptions to their existing compensation framework.
Here is one good end-to-end test of the system: you should not be able to get a higher salary or a larger stock grant by quitting and getting immediately re-hired. If you can, your company is not doing the work to value the labor of its existing employees by the same yardstick as it values new hires.
A lot of companies fail this test! Because in order for this to be true, your company needs to consistently adhere to pay bands, pegged to market rates, adjusted and reconciled each year. They need to do something like boxcar stock grants. They need to periodically audit their own levels and comp and look for evidence of systematic bias. They REALLY need to not make exceptions to their own god damn rules.
As Emily Nakashima says, “Many companies hemorrhage great employees in underrepresented groups because they do all those things but they fail to bring a DEI lens to them — ‘we have salary bands! we have a fair comp system! we think about ladders and promo paths!’ and then they do zero work to make sure those things are applied equitably to all their employees, including across axes of diversity like race and gender.”
All of these things take organizational willpower, and they are hard. It means a lot of hard conversations. It means saying “no” to people. It’s much easier to give out goodies to the people who complain the loudest or threaten to quit, at least in the short term.
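The quit-and-get-rehired test above can be run as a crude audit. Here is a minimal sketch in Python, assuming you can look up what you would offer a new hire at each level today. The names, levels, and dollar amounts are all made up for illustration, and a real audit would also need to cover equity, location, and so on.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    level: str
    salary: int


def rehire_gaps(employees, new_hire_offer_by_level):
    """Return (name, gap) for each person who would earn more as a new hire.

    new_hire_offer_by_level is a hypothetical mapping of level -> the salary
    you would offer a new hire at that level today.
    """
    gaps = []
    for e in employees:
        offer = new_hire_offer_by_level[e.level]
        if offer > e.salary:
            gaps.append((e.name, offer - e.salary))
    return gaps


# Hypothetical data, for illustration only.
team = [
    Employee("A", "L4", 150_000),
    Employee("B", "L4", 165_000),
    Employee("C", "L5", 190_000),
]
offers = {"L4": 160_000, "L5": 185_000}

print(rehire_gaps(team, offers))  # [('A', 10000)]
```

An empty list means nobody on the team could get a raise by quitting and coming back. Anything else is a list of people you are currently underpaying relative to your own new-hire offers.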
It’s easy to talk about fairness and equity, but it takes a lot of structural labor to walk the walk
A lot of work goes into building and maintaining a system that can pass the sniff test in terms of compensating people fairly and equitably, instead of based on their negotiating skills or how much they made at their previous job.
You need to have a job ladder and levels you believe in, ones that accurately reflect the skills, behaviors and values of your org and have broad buy-in from the team. You need a process for leveling people as new hires and at review time, and for appealing those levels when you get it wrong. You should have salary bands for each level, with compa ratios based on market rates. You should be able to show your work and explain your decisions. (For example, we target the 65th percentile for companies of our size and funding levels, and we pay everyone SF market rates, no matter where they are located in the world.)
This is why review-time calibrations are so important. Calibrations are not about calibrating ICs, they are about calibrating managers. Calibrations are to diminish the inequity that results when one manager has a different understanding of the level an engineer is operating at, so the engineer would receive a different level, band, or rating under a different manager.
Obviously, all of these sociotechnical systems are made and operated by human beings, so there will always be some intrinsic messiness and imprecision. This is why it matters that managers show up with humility and work to get aligned with their peers on what truly matters to the company and the org. This is why it is so important that we show our work and engage with ladders and levels as living documents.
A lot of this labor is invisible to employees, and not especially well understood. I think a critical part of making these systems work is helping employees understand the tradeoffs being made, and how having a consistent leveling system ultimately benefits them, even if they are personally frustrated about not getting promoted this half. Which means every manager needs to be equipped to have these hard conversations with their team.
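If "compa ratio" is a new term: it is usually computed as a person's salary divided by the midpoint of the band for their level, so 1.0 means they are paid exactly at the midpoint. A minimal sketch, where the band numbers and the level name are invented for illustration and are not our actual bands:

```python
def compa_ratio(salary, band_midpoint):
    """Salary as a fraction of the band midpoint (1.0 = exactly at midpoint)."""
    return salary / band_midpoint


def in_band(salary, band_min, band_max):
    """True if the salary falls inside the band, inclusive of both ends."""
    return band_min <= salary <= band_max


# Hypothetical band for one level: (min, midpoint, max).
band_min, band_mid, band_max = 140_000, 160_000, 180_000

salary = 152_000
print(round(compa_ratio(salary, band_mid), 2))  # 0.95
print(in_band(salary, band_min, band_max))      # True
```

Looking at compa ratios across a team, grouped by manager or by demographic, is one simple way to spot the kind of drift that calibrations and comp audits are meant to catch.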
It should be okay to tell your manager you’re thinking about leaving, and talk about your options
HR teams will typically bucket departures into voluntary and involuntary, aka “regrettable” and “nonregrettable”. In reality, almost any time someone leaves their job, it’s some muddled combination of the two.
In the optimal case, voluntary departures are rarely a complete surprise. Surprises suck. They’re hard to plan around, they often leave gaps in coverage or contributions, and they’re a bummer for morale. You should be able to be honest with your manager and tell them if you’re starting to look around, or if you’re finding yourself less happy and motivated these days. However, this requires a lot of trust in the relationship — that the manager won’t retaliate, won’t fire you, etc — and from what I gather, it seems to be fairly uncommon in the wild. 🙁
Employees do not owe their manager a heads up or a conversation in advance, but this is unequivocally the level of relationship trust we should aim for.
Steph Hippo says, “I love being the manager people want to work for, and it took me a while to figure out how to also be the kind of manager people wanted to have ‘fire’ them by helping them move on. I’m really proud of how many people I’ve been able to help move off my team because we found a better fit. Doing this contributes to your reputation as a leader and as an employer. I found it meaningful if someone that moved on from my team did so on good terms, came back to visit, or sent other people to check out our job listings. That’s a sign that you’re parting with folks on good terms.” 💯
Managers can prove themselves worthy of this trust by not reacting, not retaliating, not treating people any differently, not leaping to conclusions, not running ahead and making decisions or commitments ahead of what the employee has stated.
Should you ever try to change someone’s mind about leaving?
Not never…but rarely. You should always try to understand why someone is leaving. Exit interviews are a great tool here, especially in situations where there has been relationship friction. Departures are a trailing indicator, but often a very powerful signal of things managers should be paying attention to, to make things better for those who remain.
If someone has decided to leave, you’re not going to “save” them via bribery alone. I’ve never seen the tactic of throwing money and titles at someone actually get them to stick around in the long run.
However, I have seen departure announcements get turned around when they include some form of development — when you can identify real underlying sources of discontent, and meet them with action.
Another real life example: A couple years ago, Phillip Carter told us he had decided to leave and take another role in the industry. We had some intense conversations about why that was and what was missing, and realized he had been struggling to connect with the reasons behind what we were building, largely because he had never written or supported code in production during his time as a software engineer. He decided not to leave after that, and he is here to this day.
There will be times when someone has decided to leave, and you want to fight for them to stay. In those situations, you need to get really crystal clear with yourself before taking action. What are the underlying risks to the business, and how far are you prepared to go?
On extremely rare occasions, heroic measures may be the lesser of two evils
Sometimes you may have to try for a diving save. That’s just the reality of doing business, esp at startup stages where you have less redundancy, a shorter planning horizon, more overall chaos and a smaller overall operating budget.
Sometimes your goals are at risk, and you feel like you don’t have a choice. But any time you find yourself bargaining or trying to bribe people to stay after they’ve decided to leave, you should take a hard fucking look at yourself and how you got there, and whether or not you can justify your actions.
Exceptions are often the path of least resistance for the manager making the exception in the moment, but they impose a heavy, compounding cost to the business over time. Any time you make an exception to keep someone, you risk breaking your commitments to everyone else. And rumors about exceptions being made will fly fast and furious (sometimes it seems like there is a 10-20x multiplier of rumors to reality). 😣
I will not sit here and tell you no exceptions can ever be made. Systems made of people are systems that are never perfect. Once in a while, making an exception might actually be the way to restore justice to a situation. Other times your ass is well and truly backed into a corner. But exceptions are SO costly to your credibility, you must at least build peer review and consequences for exceptions into the system.
A few checks and balances to consider:
Individual managers should not be able to make an exception without the buy-in of their director, VP, and people team
It should generally trigger some kind of review of the system policy in question, to see if it still serves its purpose
You should be able to look in each other’s eyes and explain your reasoning, and not feel ashamed of it if word gets out
Shit does happen. But if this kind of shit happens on the regular, you can’t blame people for becoming extremely cynical about the way you do business, and you can expect to get way more people trying to game the system to get the same results for themselves.
People should not use threats of leaving to try and effect change or get raises. This should not be an effective tactic — and in order for it not to be an effective tactic, we cannot reward it with results. When you make exceptions, you all but guarantee more people will try this.
People work at jobs for money, but not only money
While writing this piece, a friend told me a story about when he became an engineering manager a decade ago, and soon noticed that his two women engineers were the lowest paid and the lowest leveled people on the team, which didn’t seem to correlate with their actual skills or experience. He asked his own manager what was up with this, and the response he received was: “Well yeah, neither of them has ever been a flight risk.”
This kind of attitude is, to put it politely, a fucking cancer on our industry.
There are two radically different philosophies when it comes to corporate compensation. In the first scenario, you pay people as little as possible, and consider it your job as a manager to extract the most work out of people for the least pay. Information is power, so information asymmetry is endemic in these environments, and people are paid according to their skill at negotiation or brinksmanship. You typically blow your wad trying to compete for the “best” talent in the world.
In the second scenario, you do your best to compensate employees fairly and competitively, balancing their needs and wants against other stakeholders and the overarching mandate for the company as a whole to succeed. You practice transparency and show your work, and actively work to counter systemic biases. You understand you can’t compete for every great hire out there, but you try to equip people with the information they need to evaluate whether or not you are mutually a good fit.
Companies that operate according to the first scenario are so alienating and toxic (and almost certainly illegal, in many cases) that few will openly claim to be this kind of company. Most companies at least pay lip service to equity and fairness. But because everyone is typically mouthing the same kind of things, employees will scrutinize your actions far more than your words, especially when it comes to comp.
At the end of the day, these are jobs. People work at jobs for money, but not only money. I think we would all be better off if we could get better at articulating the tangible and intangible rewards of our labor, treating each other with dignity and honesty, and being straightforward about our needs and wants and goals on both sides, instead of treating comp like some kind of high stakes casino game.