Is It Time To Version Observability? (Signs Point To Yes)

August 7, 2024August 9, 2024 mipsytipsycanonical logs, columnar storage, databases, devops, metrics, monitoring, observability 2.010 Comments

Augh! I am so behind on so much writing, I’m even behind on writing shit that I need to reference in order to write other pieces of writing. Like this one. So we’re just gonna do this quick and dirty on the personal blog, and not bother bringing it up to the editorial standards of…anyone else’s sites. 😬

If you’d rather consume these ideas in other ways:

I gave a keynote at SRECon in March
Here is a slide deck of my slides from CTO Craft Con London in May
A Screaming In The Cloud podcast with Corey Quinn in April
My piece earlier in the year on the Cost Crisis in Observability Tooling touched on some of the concepts too
Matt Sanabria wrote a great piece comparing us and a bunch of other observability vendors in 2024.

What does observability mean? No one knows

In 2016, we first borrowed the term “observability” from the wikipedia entry for control systems observability, where it is a measure of your ability to understand internal system states just by observing its outputs. We (Honeycomb) then spent a couple of years trying to work out how that definition might apply to software systems. Many twitter threads, podcasts, blog posts and lengthy laundry lists of technical criteria emerged from that work, including a whole ass book.

In 2018, Peter Bourgon wrote a blog post proposing that “observability has three pillars: metrics, logs and traces. Ben Sigelman did a masterful job of unpacking why metrics, logs and traces are just telemetry. However, lots of people latched on to the three pillars language: vendors because they (coincidentally!) had metrics products, logging products, and tracing products to sell, engineers because it described their daily reality.

Since then the industry has been stuck in kind of a weird space, where the language used to describe the problems and solutions has evolved, but the solutions themselves are largely the same ones as five years ago, or ten years ago. They’ve improved, of course — massively improved — but structurally they’re variations on the same old pre-aggregated metrics.

It has gotten harder and harder to speak clearly about different philosophical approaches and technical solutions without wading deep into the weeds, where no one but experts should reasonably have to go.

This is what semantic versioning was made for

Look, I am not here to be the language police. I stopped correcting people on twitter back in 2019. We all do observability! One big happy family. 👍

I AM here to help engineers think clearly and crisply about the problems in front of them. So here we go. Let’s call the metrics, logs and traces crowd — the “three pillars” generation of tooling — that’s “Observability 1.0“. Tools like Honeycomb, which are built based on arbitrarily-wide structured log events, a single source of truth — that’s “Observability 2.0“.

Here is the twitter thread where I first teased out the differences between these generations of tooling (all the way back in December, yes, that’s how long I’ve been meaning to write this 😅).

About two months ago I wrote this thread about how we lost the battle to define ✨observability✨ -- to give it a real, specific, falsifiable technical definition, distinct from monitoring or telemetry.

I complained, I argued, I grieved...and now I'm over it. SO over it. 🙄 https://t.co/QH3U0iZZ8G
— Charity Majors (@mipsytipsy) December 22, 2023

This is literally the problem that semantic versioning was designed to solve, by the way. Major version bumps are reserved for backwards-incompatible, breaking changes, and that’s what this is. You cannot simultaneously store your data across both multiple pillars and a single source of truth.

Incompatible. Breaking change. O11y 1.0, meet O11y 2.0.

small technical changes can unlock waves of powerful sociotechnical transformation

There are a LOT of ramifications and consequences that flow from this one small change in how your data gets stored. I don’t have the time or space to go into all of them here, but I will do a quick overview of the most important ones.

The historical analogue that keeps coming to mind for me is virtualization. VMs are old technology, they’ve been around since the 70s. But it wasn’t until the late 90s that VMware productized it, unlocking wave after wave of change, from cloud computing and SaaS to the very DevOps movement itself.

I believe the shift to observability 2.0 holds a similarly massive potential for change, based on what I see happening today, with teams who have already made the leap. Why? In a word, precision. O11y 1.0 can only ever give you aggregates and random exemplars. O11y 2.0, on the other hand, can tell you precisely what happened when you flipped a flag, deployed to a canary, or made any other change in production.

Will these waves of sociotechnical transformation ever be realized? Who knows. The changes that get unlocked will depend to some extent on us (Honeycomb), and to an even greater extent on engineers like you. Anyway, I’ll talk about this more some other time. Right now, I just want to establish a baseline for this vocabulary.

1.0 vs 2.0: How does the data get stored?

1.0💙 O11y 1.0 has many sources of truth, in many different formats. Typically, you end up storing your data across metrics, logs, traces, APM, RUM, profiling, and possibly other tools as well. Some folks even find themselves falling back to B.I. (business intelligence) tools like Tableau in a pinch to understand what’s happening on their systems.

Each of these tools are siloed, with no connective tissue, or only a few, predefined connective links that connect e.g. a specific metric to a specific log line. Aggregation is done at write time, so you have to decide up front which data points to collect and which questions you want to be able to ask. You may find yourself eyeballing graph shapes and assuming they must be the same data, or copy-pasting IDs around from logging to tracing tools and back.

2.0 💚 Data gets stored in arbitrarily-wide structured log events (often called “canonical logs“), often with trace and span IDs appended. You can visualize the events over time as a

trace, or slice and dice your data to zoom in to individual events, or zoom out to a birds-eye view. You can interact with your data by group by, break down, etc.

You aggregate at read time, and preserve raw events for ad hoc querying. Hopefully, you derive your SLO data from the same data you query! Think of it as B.I. for systems/app/business data, all in one place. You can derive metrics, or logs, or traces, but it’s all the same data.

1.0 vs 2.0: on metrics vs logs

1.0 💙 The workhorse of o11y 1.0 is metrics. RUM tools are built on metrics to understand browser user sessions. APM tools are built using metrics to understand application performance. Long ago, the decision was made to use metrics as the source of truth for telemetry because they are cheap and fast, and hardware used to be incredibly expensive.

The more complex our systems get, the worse of a tradeoff this becomes. Metrics are a terrible building block for understanding rich data, because you have to discard all that valuable context at write time, and they don’t support high (or even medium!) cardinality data. All you can do to enrich the data is via tags.

Metrics are a great tool for cheaply summarizing vast quantities of data. They are not equipped to help you introspect or understand complex systems. You will go broke and go mad if you try.

2.0 💚 The building block of o11y 2.0 is wide, structured log events. Logs are infinitely more powerful, useful and cost-effective than metrics are because they preserve context and relationships between data, and data is made valuable by context. Logs also allow you to capture high cardinality data and data relationships/structures, which give you the ability to compute outliers and identify related events.

1.0 vs 2.0: Who uses it, and how?

1.0 💙 Observability 1.0 is predominantly about how you operate your code. It centers around errors, incidents, crashes, bugs, user reports and problems. MTTR, MTTD, and reliability are top concerns.

O11y 1.0 is typically consumed using static dashboards — lots and lots of static dashboards. “Single pane of glass” is often mentioned as a holy grail. It’s easy to find something once you know what you’re looking for, but you need to know to look for it before you can find it.

2.0 💚 If o11y 1.0 is about how you operate your code, o11y 2.0 is about how you develop your code. O11y 2.0 is what underpins the entire software development lifecycle, enabling engineers to connect feedback loops end to end so they get fast feedback on the changes they make, while it’s still fresh in their heads. This is the foundation of your team’s ability to move swiftly, with confidence. It isn’t just about understanding bugs and outages, it’s about proactively understanding your software and how your users are experiencing it.

Thus, o11y 2.0 has a much more exploratory, open-ended interface. Any dashboards should be dynamic, allowing you to drill down into a question or follow a trail of breadcrumbs as part of the debugging/understanding process. The canonical question of o11y 2.0 is “here’s a thing I care about … why do I care about it? What are all of the ways it is different from all the other things I don’t care for?”

When it comes to understanding your software, it’s often harder to identify the question than the answer. Once you know what the question is, you probably know the answer too. With o11y 1.0, it’s very easy to find something once you know what you’re looking for. With o11y 2.0, that constraint is removed.

1.0 vs 2.0: How do you interact with production?

1.0 💙 You deploy your code and wait to get paged. 🤞 Your job is done as a developer when you commit your code and tests pass.

2.0 💚 You practice observability-driven development: as you write your code, you instrument it. You deploy to production, then inspect your code through the lens of the instrumentation you just wrote. Is it behaving the way you expected it to? Does anything else look … weird?

Your job as a developer isn’t done until you know it’s working in production. Deploying to production is the beginning of gaining confidence in your code, not the denouement.

1.0 vs 2.0: How do you debug?

1.0 💙 You flip from dashboard to dashboard, pattern-matching and looking for similar shapes with your eyeballs.

You lean heavily on intuition, educated guesses, past experience, and a mental model of the system. This means that the best debuggers are ALWAYS the engineers who have been there the longest and seen the most.

Your debugging sessions are search-first: you start by searching for something you know should exist.

2.0 💚 You check your instrumentation, or you watch your SLOs. If something looks off, you see what all the mysterious events have in common, or you start forming hypotheses, asking a question, considering the result, and forming another one based on the answer. You interrogate your systems, following the trail of breadcrumbs to the answer, every time.

You don’t have to guess or rely on elaborate, inevitably out-of-date mental models. The data is right there in front of your eyes. The best debuggers are the people who are the most curious.

Your debugging questions are analysis-first: you start with your user’s experience.

1.0 vs 2.0: The cost model

1.0 💙 You pay to store your data again and again and again and again, multiplied by all the different formats and tool types you are paying to store it in. Cost goes up at a multiplier of your traffic increase. I wrote a whole piece earlier this year on the cost crisis in observability tooling, so I won’t go into it in depth here.

As your costs increase, the value you get out of your tools actually decreases.

If you are using metrics-based products, your costs go up based on cardinality. “Custom metrics” is a euphemism for “cardinality”; “100 free custom metrics” actually means “100 free cardinality”, aka unique values.

2.0 💚 You pay to store your data once. As your costs go up, the value you get out goes up too. You have powerful, surgical options for controlling costs via head-based or tail-based dynamic sampling.

You can have infinite cardinality. You are encouraged to pack hundreds or thousands of dimensions in per event, and any or all of those dimensions can be any data type you want. This luxurious approach to cardinality and data is one of the least well understood aspects of the switch from o11y 1.0 to 2.0.

Many observability engineering teams have spent their entire careers massaging cardinality to control costs. What if you just .. didn’t have to do that? What would you do with your lives? If you could just store and query on all the crazy strings you want, forever? 🌈

Metrics are a bridge to our past

Why are observability 1.0 tools so unbelievably, eyebleedingly expensive? As anyone who works with data can tell you, this is always what happens when you use the wrong tool for the job. Once again, metrics are a great tool for summarizing vast quantities of data. When it comes to understanding complex systems, they flail.

I wrote a whole whitepaper earlier this year that did a deep dive into exactly why tools built on top of metrics are so unavoidably costly. If you want the gnarly detail, download that.

The TLDR is this: tools built on metrics — whether RUM, APM, dashboards, etc — are a bridge to our past. If there’s one thing I’m certain of, it’s that tools built on top of wide, structured logs are the bridge to our future.

Wide, structured log events are the bridge to our future

Five years from now, I predict that the center of gravity will have swung dramatically; all modern engineering teams will be powering their telemetry off of tools backed by wide, structured log events, not metrics. It’s getting harder and harder and harder to try and wring relevant insights out of metrics-based observability tools. The end of the ZIRP era is bringing unprecedented cost pressure to bear, and it’s simply a matter of time.

The future belongs to tools built on wide, structured log events — a single source of truth that you can trace over time, or zoom in, zoom out, derive SLOs from, etc.

It’s the only way to understand our systems in all their skyrocketing complexity. This constant dance with cost vs cardinality consumes entire teams worth of engineers and adds zero value. It adds negative value.

And here’s the weirdest part. The main thing holding most teams back psychologically from embracing o11y 2.0 seems to be the entrenched difficulties they have grappling with o11y 1.0, and their sense that they can’t adopt 2.0 until they get a handle on 1.0. Which gets things exactly backwards.

Because observability 2.0 is so much easier, simpler, and more cost effective than 1.0.

observability 1.0 is the hard way

It’s so fucking hard. We’ve been doing it so long that we are blind to just how HARD it is. But trying to teach teams of engineers to wrangle metrics, to squeeze the questions they want to ask into multiple abstract formats scattered across many different tools, with no visibility into what they’re doing until it comes out eventually in form of a giant bill… it’s fucking hard.

Observability 2.0 is so much simpler. You want data, you just toss it in. Format? don’t care. Cardinality? don’t care.

You want to ask the question, you just ask it. Format? don’t care.

Teams are beating themselves up trying to master an archaic, unmasterable set of technical tradeoffs based on data types from the 80s. It’s an unwinnable war. We can’t understand today’s complex systems without context-rich, explorable data.

We need more options for observability 2.0 tooling

My hope is that by sketching out these technical differences between o11y 1.0 and 2.0, we can begin to collect and build up a vendor-neutral library of o11y 2.0 options for folks. The world needs more options for understanding complex systems besides just Honeycomb and Baselime.

The world desperately needs an open source analogue to Honeycomb — something built for wide structured events, stored in a columnar store (or even just Clickhouse), with an interactive interface. Even just a written piece on how you solved it at your company would help move the industry forward.

My other hope is that people will stop building new observability startups built on metrics. Y’all, Datadog and Prometheus are the last, best metrics-backed tools that will ever be built. You can’t catch up to them or beat them at that; no one can. Do something different. Build for the next generation of software problems, not the last generation.

If anyone knows of anything along these lines, please send me links? I will happily collect them and signal boost. Honeycomb is a great, lifechanging tool (and we have a generous free tier, hint hint) but one option does not a movement make.

<3 charity

P.S. Here’s a great piece written by Ivan Burmistrov on his experience using observability 2.0 type tooling at Facebook — namely Scuba, which was the inspiration for Honeycomb. It’s a terrific piece and you should read it.

P.P.S. And if you’re curious, here’s the long twitter thread I wrote in October of 2023 on how we lost the battle to define observability:

So, we lost the battle to define observability. You know it, I know it. Observability was supposed to *mean* something, and in the early days, it did.

"Observability" once meant the kind of exploratory, open ended investigation our systems increasingly demand.
— Charity Majors (@mipsytipsy) October 30, 2023

Pragmatism, Neutrality and Leadership

July 24, 2024July 25, 2024 mipsytipsyculture, leadership, management, neutrality, politics, tech culture2 Comments

Every year or so, some tech CEO does something massively stupid, like declaring “No politics at work!”, or “Trump voters are oppressed and live in fear!”, and we all get a good pained laugh over how out of touch and lacking in self-awareness they are.

We hear a lot about the howlers, and much less about the practical challenges leaders face in trying to create a work environment where people from vastly different backgrounds and belief systems come together in peace to focus on the mission and do good work. Or how that intersects with the deeply polarizing events that now seem to shatter our world every other week — invasions, Supreme Court rulings, elections, school shootings, and the like.

Are we supposed to speak up or stay silent? Share our own beliefs, or take a studiously neutral stance? What do we do if half of the company is numb and reeling with grief, and the other half is bursting with joy? Nothing at all? That feels inhumane. Is the reality that we live in a world where we can only live, work, and interact with people who already agree with us and our political beliefs? God, I hope not. 🙁

This has been on my mind a lot recently. We are 103 days out from a US Presidential election, and it’s going to get worse before it gets better.

So here goes.

Caveats, challenges and cautionary tales

There are some immediate challenges to things I’m trying to say here. A couple:

The term “politics”, much like the term “technical debt”, can mean way too many things. Local, regional or national electoral politics; activities associated with power distribution or resource allocation; influence peddling or status seeking behaviors, putting your needs above the good of the group, and so much more. Therefore I will use the term sparingly, and prefer more specific language where possible.

I don’t often do this, but I am explicitly addressing this piece to other founders and execs. Not because it doesn’t apply to people in other roles; it does. It just got really wordy trying to account for all the possible variations on role, scope and perspective involved.

As a leader, your job is to succeed

This might sound obvious, possibly to the point of idiocy. Yet I think it bears repeating. For all the mountains of forests of trees worth of books that get written every year on leadership, it remains the case that nobody knows what the fuck they’re doing.

“I think great leaders treat money like oxygen: they make sure there is plenty of it, and understand that if you’re talking about it all of the time you’re in deep shit and better take drastic actions to make sure you have enough.” ~ Mark Ferlatte

As a founder or leader of a venture-backed startup or public company, your #1 job is to make the business succeed. Success comes first. It’s Maslow’s hierarchy of needs all over again; you must ensure your company’s continued existence before you earn the right to tinker.

Success in business is what earns you the right to devote more time, attention, and resources to cultural issues, and to experiment with things that matter to you.

One of the most common ways that leaders fail is that they get so bogged down in the daily chaos of running the company, managing a team, raising money, responding to crises and scoring OKRs is that they struggle to keep the focus zeroed in on the most important thing: succeeding at your mission.

Know your mission, craft a strategy, execute

And how do you do that?

Know your mission, craft a strategy, and execute. It’s as simple and straightforward as it is unbelievably difficult and devastatingly complicated.

The system exists to fulfill the mission. I’ve written before about systems thinking in organizations, how hierarchy emerges to benefit the workers, how we look up for purpose and down for function.

Your mission is what brings people together to collectively build something that they could not do as individuals. The more crisp and well articulated your mission, the more employees can tie the work they do back to the mission, the more meaningful their daily work is likely to feel.

Your culture serves the business, not the other way around

A great culture can’t compensate for a weak product that users don’t want. If people want to work at your company more than they want to use your product, that’s a bad sign.

A company culture with tremendous energy, ownership and transparency can be an accelerant to your business, it can grant you unique advantages, and it can help mitigate risks. But it is not why you exist. Your mission is why you exist.

It would be nice to believe that having a warm, supportive culture, with friendly people and four day work weeks, could guarantee success, or at least give you a reliable advantage. Wouldn’t it?

Companies with shitty cultures win all the time

We’ve all watched companies become wildly successful under assholes, while waves of employees leave broken and burned out. I wish this wasn’t true, but it is. People’s lives and careers are just another externality as far as the corporate books are concerned.

Many live through this nightmare and emerge dead set on doing things differently. And so, when they become founders or leaders, they put culture ahead of the business. And then they lose.

Most companies fail, and if you aren’t hungry and zeroed in on the success of your business, your slim chances become even slimmer.

I don’t believe this has to be either/or, cultural success or business success. I think it’s a false dichotomy. I believe that healthy companies can be more successful than shitty ones, all else being equal. Which is why I believe that leaders who care about building a workplace culture rooted in dignity and respect have a responsibility to care even more about success in business. Let’s show these motherfuckers how it’s done. Nothing succeeds like success.

Good culture is rooted in organizational health

Six questions for organizational health, from “The Advantage” by Patrick Lencioni

I feel like a big reason so many leaders get twisted up here is by trying to make employees happy instead of driving organizational health. This is a huge topic, and I won’t go deep on it here, but my understanding of organizational health owes a lot to “The Advantage: Why Organizational Health Trumps Everything Else In Business”, by Patrick Lencioni, with honorable mention going to “ Good Strategy/Bad Strategy”, by Richard Rumelt.

A terrific company culture begins with organizational health: a competent, experienced leadership team that trusts each other, a mission, and a strategy, clarity and good communication. If everyone in the company knows what the most important thing is, and their actions align with that, your company is probably pretty healthy.

People’s feelings matter, and you should treat them with dignity and respect, but you can’t be driven by them. You have to let go of underperformers, deliver hard feedback, set high standards and hold people accountable. A lot of this does not feel good.

You will make mistakes. Things will fail. You will have to spin down teams, or entire orgs. People are going to have huge emotional reactions about your decisions and take things personally. They’ll be angry with you and disagree with your decisions. They will blame you, and maybe they should.

If you do your job well, with some luck, many people will be happy, much of the time. But if your goal is to make people happy, you will fail, and then everyone will be unhappy. Feelings are a trailing indicator and only roughly, occasionally a sign that you are doing a good job.

Survive in the short term, but live your values in the longer term

Most companies have seen times where all of the options seem like bad ones, even a betrayal of their values. There are times that hurt your conscience, or rouse up anger and cynicism in the ranks. Some hypothetical examples:

When you’re doing layoffs to save the company, and realize the list is disproportionately made up of marginalized groups 💔
When you have an all-male exec team, and desperately need a new engineering leader, but all of the qualified candidates in your pipeline are men
When you had to let someone go for cause, and they’re going around publicly lying about what happened but you can’t respond

These things happen. And when they do, you have a legal and ethical responsibility to make the decision that is right for the company, every time.

And yet.

You must remind yourself as you do, uneasily, queasily, how easily “I didn’t have a choice” can slip from reason to excuse. How quickly “this isn’t the right time” turns into “never the right time”. You know this, I know this, and I guarantee you every one of your employees knows this.

Don’t expect them to give you the benefit of the doubt. Why should they? They’ve heard this shit a million times. Don’t get mad, just do your job.

Living your values takes planning and sacrifice

No halfway decent leader spends ALL their time reacting to the burning bushes in front of their faces. Being a leader means planning for the future, so you can do better next time.

So you had to make a tough decision, and the optics (and maybe the reality) of it are terrible. Okay. It happens. Don’t just wince and put it behind you. If you don’t take steps to change things, you’re going to face the same bad choices next time.

What will you do differently?
Why were there no good alternatives?
What will the right time look like? How will you know?
How will you do a better job of recruiting, retaining, or setting them up for success?

If you don’t spend time, money, attention, or political capital on it, you don’t care about it, by definition. And it is a thousand times worse to claim you value something, and then demonstrate with your actions that you don’t care, than to never claim it in the first place.

Your resources are limited, and you must spend them with purpose

As an exec, you get a very limited amount of people’s time and attention — maybe a few minutes per week, or per month. Don’t waste them.

Jess Mink, our director of platform engineering, has a lovely story about this. They work with local search and rescue teams, which are staffed by people all over the political spectrum. The mission is crystal clear; all of them know why they’re there, and they don’t talk about things that aren’t tied to the mission. Yet Jess is giving a talk about pronouns at their next training. Why?

”Because there’s a really crisp, clear mission, I can say, I don’t care what your politics are. I’m not asking you to change your beliefs, but this is the impact of what you’re doing on these people that you’ve said you’re here to help.” ~ Jess Mink

There are a million things in the world you could say or do that would have intrinsic value. Why this thing? You should have a reason, and it should connect to your mission or your strategy for achieving it, or you are just muddying the waters.

Should political speech at work be a free-for-all?

Many leaders have opted to ban political speech at work. What’s the alternative, a free-for-all? Trump gifs and ~~Biden~~ Harris banners and a heated debate on the border in #general?

Please. Nobody wants that. Most folks seem to understand that work Slack is not the place for proselytizing or stirring up shit. There’s an element of good judgment here that extends well beyond political speech to include other disruptive actions such as criticizing religious beliefs, oversharing extremely personal info, posting sexy selfies, or good old verbal diarrhea. These are all, shall we say, “good coaching opportunities”. You don’t have to ban all political speech just to enforce reasonable norms.

In general, people want to work in an environment that is relatively peaceful and neutral-feeling, where people can focus on their work and our shared mission. But people also need spaces to talk about what’s going on in their lives and process their reactions.

At Honeycomb, we prefix all non-work slack channels with #misc. We have #misc-bible-reading-group, #misc-politics, #misc-book-club, #misc-shoes-and-fashion, #misc-so-fuzzy (for pictures of people’s pets).

People don’t join those channels automatically upon being hired — you have to seek them out, and you can leave them just as easily. Nobody has to worry about missing out on critical work conversations co-mingling with off putting political speech. And it’s easy to redirect non-work chatter out of work channels.

The value (and limitations) of neutrality

Neutral spaces are a good thing — a societal necessity. However, it becomes a problem when it fails to honor the paradox of tolerance — that if we tolerate the intolerant, intolerance will ultimately dominate. We cannot be equally tolerant of gay people and people who hate gay people, in other words.

At their worst, statements of neutrality punish the victimized and protect the victimizers. As Yonatan Zunger puts it, in one of my favorite essays of all time, “Tolerance is not a moral absolute; it is a peace treaty.”

But even peace treaties have their limits. Some problems are just fucking hard . As Emily put it,

“What does it mean to feel silence from the majority of your coworkers on a topic that feels like life and death to you? In normal times, silence can seem like a lack of political speech; in extraordinary times, silence speaks volumes. This creates division, even if your coworkers have landed there through ignorance or low awareness.” ~ Emily Nakashima

The hard thing about hard things is that they’re really fucking hard. There is no playbook. I can’t solve them for you here. Every situation is unique, and the details matter — details really matter, in fact. You can only take each situation as it comes with humility, sensitivity, and a willingness to listen.

Good leaders don’t invite unnecessary controversy

If you are a CEO or founder, especially, the things you say will be heard as representing the views of your company. Period. Keep this in mind, and try to be extra respectful and responsible. You don’t want your big mouth to accidentally create a wave of distraction and drama for people throughout your company to have to deal with. Your opinions are more than just your opinions.

If you’re thinking that I’m an odd person to be delivering this particular message, I sheepishly acknowledge the truth of this.

If you work at a company where the CEO and leadership openly espouse a particular set of partisan beliefs, you are inevitably going to feel somewhat othered. You wonder uncomfortably whether or not they are aware you hold different beliefs. If so, will you be promoted, will you be given equal opportunities? Would your leaders like you as much as they like employees who share their political convictions? Would they be as willing to chat with you or hang out with you? Does it matter?

People aren’t wrong to be concerned. There’s scads of research that shows how much we automatically prefer people who are more like us. It’s automatic — it’s natural. That doesn’t mean it’s right. Nor is it inevitable. We have to work harder to give an equal shot to those who aren’t like us, and we should do that.

Good leaders don’t make it all about them

One of the hardest parts about being a good leader is managing your ego, and keeping it from taking center stage or making things worse.

I have done and said a lot of dumb things online, but the worst of them was probably during the Black Lives Matter protests of 2020. I was trying to express my support, so I tweeted something about how actions matter more than words, and that we were trying to help by building a workplace where Black employees could thrive, or something like that. I don’t remember exactly (and the tweets are gone), but it was awful. I made it about us; it was super tone deaf. And I got whaled on, in a way that really threw me for a loop. I tried to apologize and made it worse. Friends blocked me.

It took me a long time to process the experience and come to terms with my mistakes: first, by framing my comments almost like a promo for how great honeycomb was, and second, by reacting so defensively when called out over it.

You don’t need to have a take on everything. And the more you have a track record of taking stances on issues, the more it’s expected of you, and the more dicey it becomes, because even not taking a stance is taking a stance.

Good leaders look for ways to de-escalate

Any time the conversation sails into the terrain of morals and ethics, it’s an automatic escalation. It raises the stakes, it exacerbates differences. It can transform an ordinary, practical matter into the forces of good versus evil in the blink of an eye.

There are bright lines and moral dilemmas in business. (Should you pay women less than men for the same work? No.) But most of our everyday work doesn’t need to be so emotionally fraught.

An example may help here. When you have a geographically distributed company, you have two basic choices when it comes to comp philosophy:

have a single set of comp bands, which apply no matter where you live
peg their salary to their local cost of living

When this question first came up, back in 2019, I came out swinging for the fences on option 1). I treated it like a moral question, a matter of basic human equity. “What kind of company would dare pay you less money based on where you live? What business is it of theirs where you live?” — that sort of thing.

In this, I was hardly alone. A lot of people have really strong feelings about this (I still have some pretty strong feelings about it 😬). But there are also some pretty reasonable arguments for and circumstances in which geo wage arbitrage makes a lot of sense, and can offer more opportunities to more people than you could otherwise afford. It’s not as simple as I made it out to be.

Having taken such a strong stance though, I have definitely made it extremely difficult for our finance team to change that policy, should we ever decide to.

Good leaders turn the volume down. They dampen drama, they don’t amplify it. They don’t ratchet up the stakes or the rhetoric, they look for practical solutions where possible.

Good leaders connect the culture to the mission

I started off as one of those leaders who cared more about culture than the business. In honesty, I assumed we’d fail. I never planned to start a company, it was an impulse decision. I really didn’t think I’d have to be the CEO. I wasn’t equipped for the job; I didn’t even know the difference between sales and marketing. I did however have MANY strong opinions on company culture.

The first few years of Honeycomb, any time I thought of some neat thing to try, I did it. Put an employee on the board? Yes! Run regular ethics discussions? Hell yes! Put together cross-functional teams to discuss company values? Cool!

I don’t regret it, precisely; I think it played a role in instilling a culture of curiosity and ownership. I think it helped us figure out who we were.

But as we grow past 200 people, and as the pace of growth accelerates, I am increasingly aware of the opportunity cost of these experiments. It doesn’t mean we don’t do things like this anymore, but there needs to be a much better reason than “Charity thinks it would be cool.” It needs to add up to something bigger.

Good leaders have conviction, and don’t pretend to give a shit when they don’t

I appreciate it when leaders do real talk about their values and how they make decisions. Too many leaders hide behind the bland slogans of corporate piety, in ways that tell you nothing about how they make decisions or where their priorities lie when the chips are down.

Honestly, I would rather work for someone who holds different values than I do, but who seems honest and consistent and fair-minded in their decision-making, than someone who holds the same values but whose decisions seem impulsive and subjective.

This is a business, not a family. If I believe in the mission, and the leaders and I align on the facts, and I respect their integrity and the way they make decisions, that matters more.

As it turns out, all of this has been said before…by my antagonists?!? Oh dear…

As I was wrapping up this article, I went back and read a few of the pieces written by and about the companies who banned political speech, and my mouth literally dropped open.

You could copy-paste entire sections between my article and theirs, without anyone knowing the difference.

Companies exist for the sake of their mission, check. They don’t have to have a take on everything, check. Your work day shouldn’t consist of arguments over abortion and other hot button topics, check. It IS distracting. It’s NOT why you’re here. Uh…

How can I have written the same fucking article as theirs, and come to such a radically different conclusion?

Or is it that radically different? After all, I’m not out here advocating a free-for-all, or that companies should take a stand on every social issue of the day. I actually pretty much agree with most of the sentiments these founders wrote in their official posts on the matter.

Shit?

I was sitting here having a legit internal crisis, and then I stumbled into some other pieces, where rank-and-file employees were talking about the changes and what led up to them.

Employees say the founders’ memos unfairly depicted their workplace as being riven by partisan politics, when in fact the main source of the discussion had always been Basecamp itself.

“At least in my experience, it has always been centered on what is happening at Basecamp,” said one employee. “What is being done at Basecamp? What is being said at Basecamp? And how it is affecting individuals? It has never been big political discussions, like ‘the postal service should be disbanded,’ or ‘I don’t like Amy Klobuchar.’

The whole article is required reading. It goes on to detail a hair-raising amount of hypocrisy and high-handed behaviors by the Basecamp founders; a bunch of workers who self-organized to improve internal hiring practices and culture, and how they got shut down.

“There’s always been this kind of unwritten rule at Basecamp that the company basically exists for David and Jason’s enjoyment,” one employee told me. “At the end of the day, they are not interested in seeing things in their work timeline that make them uncomfortable, or distracts them from what they’re interested in. And this is the culmination of that.”

Then there was this damning piece from the NYTimes about the appalling way Black employees were treated at Coinbase, and this one, which closes with an anecdote about the Coinbase CEO tweeting out his own (noxious) political views in direct contradiction of his own policies. Oopsie-daisy. 🌼

Are these policies designed to protect the mission, or the CEO?

All of this paints a very different picture. These bans on political speech seem to be less about protecting the commons from wayward employees who won’t stop distracting everyone with hot button political arguments, and more about employees doing their level best to grapple with real tensions and systemic problems at work — problems that their leaders got sick of hearing about and decided to shut down.

There’s a real stench of “politics for me, but not for thee!” in a lot of these cases, which makes it extra galling. At the beginning of this piece, I noted that “politics” is an obscenely broad category — it can mean almost anything. So when the CEO arrogates to himself the right to define it and silence it, it generates a lot of confusion and uncertainty. That’s bad for the mission!

The fact is, this shit is hard. It’s hard to craft a strategy and execute. It’s hard to train managers to have hard conversations with their employees, or gently de-escalate when things get emotionally fraught. It’s hard to reset expectations on how much of a voice employees can expect to have in a given area. It’s hard to know when to take a stand on principle, and back it with your time and treasure, and when to settle or compromise.

But you signed up for this, bro. It’s part of the job, and you’re getting paid a lot of money to do it. You don’t get to just nope out when the going gets rough.

Just because you made a rule that people can’t talk about the hard stuff, doesn’t mean the hard stuff goes away. It mostly just serves to reinforce whatever power structures and inequities already exist in your company. Which means a lot of people will go on doing just fine, while some are totally fucked. You’ve also shut down all of the reasonable routes for people to advocate for change, so good job, you.

You don’t have to agree with them, but you do have to be respectful to your employees

Look, none of us are perfect. That’s why systems need mechanisms for change. Resiliency isn’t about never breaking the system, it’s about knowing your systems will break, and equipping them with the tools to repair.

If you want to lead a company, you have to deal with the people. It comes with the job.

If you want your people to care as much about the mission as you do, to feel personally invested in its success, to devote whole long stretches of their brilliant, creative, busy lives to helping you make that mission come true…you owe them in return.

If a bunch of your employees are waving a flag and urgently saying “we have a problem”, they are very likely doing you a favor. Either way, they deserve to be heard.

You don’t have to do what they want. But you ought to listen to them, and reserve judgment. Open your eyes. Look around. Do some reading. Talk to people. Consider whether you might be missing something. Then make a decision and give an honest answer. They may or may not agree, and they may or may not choose to stay, but that’s what treating them with respect looks like, just like you ask them to treat you, and each other.

To instead say “Sorry, your feedback is a distraction from the mission and will no longer be tolerated” is so unbelievably disrespectful, and wrapping your decision in the noble flag of the mission is dishonest. It’s hard to tell sometime whether people are deluding themselves or only trying to delude other people, but holy shit, what a doozy.

Good leaders know they will make mistakes, and when they do, they own them, apologize properly, and fix them. They do not use their power to silence people and then swagger around like they own the moral high ground.

Generative AI is not going to build your engineering team for you

June 10, 2024December 21, 2024 mipsytipsyartificial intelligence, code generation, generative AI, hiring, junior engineers, llmsLeave a comment

Originally posted on the Stack Overflow blog on June 10th, 2024

When I was 19 years old, I dropped out of college and moved to San Francisco. I had a job offer in hand to be a Unix sysadmin for Taos Consulting. However, before my first day of work I was lured away to a startup in the city, where I worked as a software engineer on mail subsystems.

I never questioned whether or not I could find work. Jobs were plentiful, and more importantly, hiring standards were very low. If you knew how to sling HTML or find your way around a command line, chances were you could find someone to pay you.

Was I some kind of genius, born with my hands on a computer keyboard? Assuredly not. I was homeschooled in the backwoods of Idaho. I didn’t touch a computer until I was sixteen and in college. I escaped to university on a classical performance piano scholarship, which I later traded in for a peripatetic series of nontechnical majors: classical Latin and Greek, musical theory, philosophy. Everything I knew about computers I learned on the job, doing sysadmin work for the university and CS departments.

In retrospect, I was so lucky to enter the industry when I did. It makes me blanch to think of what would have happened if I had come along a few years later. Every one of the ladders my friends and I took into the industry has long since vanished.

To some extent, this is just what happens as an industry matures. The early days of any field are something of a Wild West, where the stakes are low, regulation nonexistent, and standards nascent. If you look at the early history of other industries—medicine, cinema, radio—the similarities are striking.

There is a magical moment with any young technology where the boundaries between roles are porous and opportunity can be seized by anyone who is motivated, curious, and willing to work their asses off.

It never lasts. It can’t; it shouldn’t. The amount of prerequisite knowledge and experience you must have before you can enter the industry swells precipitously. The stakes rise, the magnitude of the mission increases, the cost of mistakes soars. We develop certifications, trainings, standards, legal rites. We wrangle over whether or not software engineers are really engineers.

Nowadays, you wouldn’t want a teenaged dropout like me to roll out of junior year and onto your pager rotation. The prerequisite knowledge you need to enter the industry has grown, the pace is faster, and the stakes are much higher, so you can no longer learn literally everything on the job, as I once did.

However, it’s not like you can learn everything you need to know at college either. A CS degree typically prepares you better for a life of computing research than life as a workaday software engineer. A more practical path into the industry may be a good coding bootcamp, with its emphasis on problem solving and learning a modern toolkit. In either case, you don’t so much learn “how to do the job” as you do “learn enough of the basics to understand and use the tools you need to use to learn the job.”

Software is an apprenticeship industry. You can’t learn to be a software engineer by reading books. You can only learn by doing…and doing, and doing, and doing some more. No matter what your education consists of, most learning happens on the job—period. And it never ends! Learning and teaching are lifelong practices; they have to be, the industry changes so fast.

It takes a solid seven-plus years to forge a competent software engineer. (Or as most job ladders would call it, a “senior software engineer”.) That’s many years of writing, reviewing, and deploying code every day, on a team alongside more experienced engineers. That’s just how long it seems to take.

Here is where I often get some very indignant pushback to my timelines, e.g.:

“Seven years?! Pfft, it took me two years!”

“I was promoted to Senior Software Engineer in less than five years!”

Good for you. True, there is nothing magic about seven years. But it takes time and experience to mature into an experienced engineer, the kind who can anchor a team. More than that, it takes practice.

I think we have come to use “Senior Software Engineer” as shorthand for engineers who can ship code and be a net positive in terms of productivity, and I think that’s a huge mistake. It implies that less senior engineers must be a net negative in terms of productivity, which is untrue. And it elides the real nature of the work of software engineering, of which writing code is only a small part.

To me, being a senior engineer is not primarily a function of your ability to write code. It has far more to do with your ability to understand, maintain, explain, and manage a large body of software in production over time, as well as the ability to translate business needs into technical implementation. So much of the work is around crafting and curating these large, complex sociotechnical systems, and code is just one representation of these systems.

What does it mean to be a senior engineer? It means you have learned how to learn, first and foremost, and how to teach; how to hold these models in your head and reason about them, and how to maintain, extend, and operate these systems over time. It means you have good judgment, and instincts you can trust.

Which brings us to the matter of AI.

It is really, really tough to get your first role as an engineer. I didn’t realize how hard it was until I watched my little sister (new grad, terrific grades, some hands on experience, fiendishly hard worker) struggle for nearly two years to land a real job in her field. That was a few years ago; anecdotally, it seems to have gotten even harder since then.

This past year, I have read a steady drip of articles about entry-level jobs in various industries being replaced by AI. Some of which absolutely have merit. Any job that consists of drudgery such as converting a document from one format to another, reading and summarizing a bunch of text, or replacing one set of icons with another, seems pretty obviously vulnerable. This doesn’t feel all that revolutionary to me, it’s just extending the existing boom in automation to cover textual material as well as mathy stuff.

Recently, however, a number of execs and so-called “thought leaders” in tech seem to have genuinely convinced themselves that generative AI is on the verge of replacing all the work done by junior engineers. I have read so many articles about how junior engineering work is being automated out of existence, or that the need for junior engineers is shriveling up. It has officially driven me bonkers.

All of this bespeaks a deep misunderstanding about what engineers actually do. By not hiring and training up junior engineers, we are cannibalizing our own future. We need to stop doing that.

People act like writing code is the hard part of software. It is not. It never has been, it never will be. Writing code is the easiest part of software engineering, and it’s getting easier by the day. The hard parts are what you do with that code—operating it, understanding it, extending it, and governing it over its entire lifecycle.

A junior engineer begins by learning how to write and debug lines, functions, and snippets of code. As you practice and progress towards being a senior engineer, you learn to compose systems out of software, and guide systems through waves of change and transformation.

Sociotechnical systems consist of software, tools, and people; understanding them requires familiarity with the interplay between software, users, production, infrastructure, and continuous changes over time. These systems are fantastically complex and subject to chaos, nondeterminism and emergent behaviors. If anyone claims to understand the system they are developing and operating, the system is either exceptionally small or (more likely) they don’t know enough to know what they don’t know. Code is easy, in other words, but systems are hard.

The present wave of generative AI tools has done a lot to help us generate lots of code, very fast. The easy parts are becoming even easier, at a truly remarkable pace. But it has not done a thing to aid in the work of managing, understanding, or operating that code. If anything, it has only made the hard jobs harder.

If you read a lot of breathless think pieces, you may have a mental image of software engineers merrily crafting prompts for ChatGPT, or using Copilot to generate reams of code, then committing whatever emerges to GitHub and walking away. That does not resemble our reality.

The right way to think about tools like Copilot is more like a really fancy autocomplete or copy-paste function, or maybe like the unholy love child of Stack Overflow search results plus Google’s “I feel lucky”. You roll the dice, every time.

These tools are at their best when there’s already a parallel in the file, and you want to just copy-paste the thing with slight modifications. Or when you’re writing tests and you have a giant block of fairly repetitive YAML, and it repeats the pattern while inserting the right column and field names, like an automatic template.

However, you cannot trust generated code. I can’t emphasize this enough. AI-generated code always looks quite plausible, but even when it kind of “works”, it’s rarely congruent with your wants and needs. It will happily generate code that doesn’t parse or compile. It will make up variables, method names, function calls; it will hallucinate fields that don’t exist. Generated code will not follow your coding practices or conventions. It is not going to refactor or come up with intelligent abstractions for you. The more important, difficult or meaningful a piece of code is, the less likely you are to generate a usable artifact using AI.

You may save time by not having to type the code in from scratch, but you will need to step through the output line by line, revising as you go, before you can commit your code, let alone ship it to production. In many cases this will take as much or more time as it would take to simply write the code—especially these days, now that autocomplete has gotten so clever and sophisticated. It can be a LOT of work to bring AI-generated code into compliance and coherence with the rest of your codebase. It isn’t always worth the effort, quite frankly.

Generating code that can compile, execute, and pass a test suite isn’t especially hard; the hard part is crafting a code base that many people, teams, and successive generations of teams can navigate, mutate, and reason about for years to come.

So that’s the TLDR: you can generate a lot of code, really fast, but you can’t trust what comes out. At all. However, there are some use cases where generative AI consistently shines.

For example, it’s often easier to ask chatGPT to generate example code using unfamiliar APIs than by reading the API docs—the corpus was trained on repositories where the APIs are being used for real life workloads, after all.

Generative AI is also pretty good at producing code that is annoying or tedious to write, yet tightly scoped and easy to explain. The more predictable a scenario is, the better these tools are at writing the code for you. If what you need is effectively copy-paste with a template—any time you could generate the code you want using sed/awk or vi macros—generative AI is quite good at this.

It’s also very good at writing little functions for you to do things in unfamiliar languages or scenarios. If you have a snippet of Python code and you want the same thing in Java, but you don’t know Java, generative AI has got your back.

Again, remember, the odds are 50/50 that the result is completely made up. You always have to assume the results are incorrect until you can verify it by hand. But these tools can absolutely accelerate your work in countless ways.

One of the engineers I work with, Kent Quirk, describes generative AI as “an excitable junior engineer who types really fast”. I love that quote—it leaves an indelible mental image.

Generative AI is like a junior engineer in that you can’t roll their code off into production. You are responsible for it—legally, ethically, and practically. You still have to take the time to understand it, test it, instrument it, retrofit it stylistically and thematically to fit the rest of your code base, and ensure your teammates can understand and maintain it as well.

The analogy is a decent one, actually, but only if your code is disposable and self-contained, i.e. not meant to be integrated into a larger body of work, or to survive and be read or modified by others.

And hey—there are corners of the industry like this, where most of the code is write-only, throwaway code. There are agencies that spin out dozens of disposable apps per year, each written for a particular launch or marketing event and then left to wither on the vine. But that is not most software. Disposable code is rare; code that needs to work over the long term is the norm. Even when we think a piece of code will be disposable, we are often (urf) wrong.

In that particular sense—generating code that you know is untrustworthy—GenAI is a bit like a junior engineer. But in every other way, the analogy fails. Because adding a person who writes code to your team is nothing like autogenerating code. That code could have come from anywhere—Stack Overflow, Copilot, whatever. You don’t know, and it doesn’t really matter. There’s no feedback loop, no person on the other end trying iteratively to learn and improve, and no impact to your team vibes or culture.

To state the supremely obvious: giving code review feedback to a junior engineer is not like editing generated code. Your effort is worth more when it is invested into someone else’s apprenticeship. It’s an opportunity to pass on the lessons you’ve learned in your own career. Even just the act of framing your feedback to explain and convey your message forces you to think through the problem in a more rigorous way, and has a way of helping you understand the material more deeply.

And adding a junior engineer to your team will immediately change team dynamics. It creates an environment where asking questions is normalized and encouraged, where teaching as well as learning is a constant. We’ll talk more about team dynamics in a moment.

The time you invest into helping a junior engineer level up can pay off remarkably quickly. Time flies. ☺️ When it comes to hiring, we tend to valorize senior engineers almost as much as we underestimate junior engineers. Neither stereotype is helpful.

People seem to think that once you hire a senior engineer, you can drop them onto a team and they will be immediately productive, while hiring a junior engineer will be a tax on team performance forever. Neither are true. Honestly, most of the work that most teams have to do is not that difficult, once it’s been broken down into its constituent parts. There’s plenty of room for lower level engineers to execute and flourish.

The grossly simplified perspective of your accountant goes something like this. “Why should we pay $100k for a junior engineer to slow things down, when we could pay $200k for a senior engineer to speed things up?” It makes no sense!

But you know and I know—every engineer who is paying attention should know—that’s not how engineering works. This is an apprenticeship industry, and productivity is defined by the output and carrying capacity of each team, not each person.

There are lots of ways a person can contribute to the overall velocity of a team, just like there are lots of ways a person can sap the energy out of a team or add friction and drag to everyone around them. These do not always correlate with the person’s level (at least not in the direction people tend to assume), and writing code is only one way.

Furthermore, every engineer you hire requires ramp time and investment before they can contribute. Hiring and training new engineers is a costly endeavor, no matter what level they are. It will take any senior engineer time to build up their mental model of the system, familiarize themselves with the tools and technology, and ramp up to speed. How long? It depends on how clean and organized the codebase is, past experience with your tools and technologies, how good you are at onboarding new engineers, and more, but likely around 6-9 months. They probably won’t reach cruising altitude for about a year.

Yes, the ramp will be longer for a junior engineer, and yes, it will require more investment from the team. But it’s not indefinite. Your junior engineer should be a net positive within roughly the same time frame, six months to a year, and they develop far more rapidly than more senior contributors. (Don’t forget, their contributions may vastly exceed the code they personally write.)

In terms of writing and shipping features, some of the most productive engineers I’ve ever known have been intermediate engineers. Not yet bogged down with all the meetings and curating and mentoring and advising and architecture, their calendars not yet pockmarked with interruptions, they can just build stuff. You see them put their headphones on first thing in the morning, write code all day, and cruise out the door in the evening having made incredible progress.

Intermediate engineers sit in this lovely, temporary state where they have gotten good enough at programming to be very productive, but they are still learning how to build and care for systems. All they do is write code, reams and reams of code.

And they’re energized…engaged. They’re having fun! They aren’t bored with writing a web form or a login page for the 1000th time. Everything is new, interesting, and exciting, which typically means they will do a better job, especially under the light direction of someone more experienced. Having intermediate engineers on a team is amazing. The only way you get them is by hiring junior engineers.

Having junior and intermediate engineers on a team is a shockingly good inoculation against overengineering and premature complexity. They don’t yet know enough about a problem to imagine all the infinite edge cases that need to be solved for. They help keep things simple, which is one of the hardest things to do.

If you ask, nearly everybody will wholeheartedly agree that hiring junior engineers is a good thing…and someone else should do it. This is because the long-term arguments for hiring junior engineers are compelling and fairly well understood.

We need more senior engineers as an industry
Somebody has to train them
Junior engineers are cheaper
They may add some much-needed diversity
They are often very loyal to companies who invest in training them, and will stick around for years instead of job hopping
Did we already mention that somebody needs to do it?

But long-term thinking is not a thing that companies, or capitalism in general, are typically great at. Framed this way, it makes it sound like you hire junior engineers as a selfless act of public service, at great cost to yourself. Companies are much more likely to want to externalize costs like those, which is how we got to where we are now.

However, there are at least as many arguments to be made for hiring junior engineers in the short term—selfish, hard-nosed, profitable reasons for why it benefits the team and the company to do so. You just have to shift your perspective slightly, from individuals to teams, to bring them into focus.

Let’s start here: hiring engineers is not a process of “picking the best person for the job”. Hiring engineers is about composing teams. The smallest unit of software ownership is not the individual, it’s the team. Only teams can own, build, and maintain a corpus of software. It is inherently a collaborative, cooperative activity.

If hiring engineers was about picking the “best people”, it would make sense to hire the most senior, experienced individual you can get for the money you have, because we are using “senior” and “experienced” as a proxy for “productivity”. (Questionable, but let’s not nitpick.) But the productivity of each individual is not what we should be optimizing for. The productivity of the team is all that matters.

And the best teams are always the ones with a diversity of strengths, perspectives, and levels of expertise. A monoculture can be spectacularly successful in the short term—it may even outperform a diverse team. But they do not scale well, and they do not adapt to unfamiliar challenges gracefully. The longer you wait to diversify, the harder it will be.

We need to hire junior engineers, and not just once, but consistently. We need to keep feeding the funnel from the bottom up. Junior engineers only stay junior for a couple years, and intermediate engineers turn into senior engineers. Super-senior engineers are not actually the best people to mentor junior engineers; the most effective mentor is usually someone just one level ahead, who vividly remembers what it was like in your shoes.

A healthy team is an ecosystem. You wouldn’t staff a product engineering team with six DB experts and one mobile developer. Nor should you staff it with six staff+ engineers and one junior developer. A good team is composed of a range of skills and levels.

Have you ever been on a team packed exclusively with staff or principal engineers? It is not fun. That is not a high-functioning team. There is only so much high-level architecture and planning work to go around, there are only so many big decisions that need to be made. These engineers spend most of their time doing work that feels boring and repetitive, so they tend to over-engineer solutions and/or cut corners—sometimes at the same time. They compete for the “fun” stuff and find reasons to pick technical fights with each other. They chronically under-document and under-invest in the work that makes systems simple and tractable.

Teams that only have intermediate engineers (or beginners, or seniors, or whatever) will have different pathologies, but similar problems with contention and blind spots. The work itself has a wide range in complexity and difficulty—from simple, tightly scoped functions to tough, high-stakes architecture decisions. It makes sense for the people doing the work to occupy a similar range.

The best teams are ones where no one is bored, because every single person is working on something that challenges them and pushes their boundaries. The only way you can get this is by having a range of skill levels on the team.

The bottleneck we face now is not our ability to train up new junior engineers and give them skills. Nor is it about juniors learning to hustle harder; I see a lot of solid, well-meaning advice on this topic, but it’s not going to solve the problem. The bottleneck is giving them their first jobs. The bottleneck consists of companies who see them as a cost to externalize, not an investment in their—the company’s—future.

After their first job, an engineer can usually find work. But getting that first job, from what I can see, is murder. It is all but impossible—if you didn’t graduate from a top college, and you aren’t entering the feeder system of Big Tech, then it’s a roll of the dice, a question of luck or who has the best connections. It was rough before the chimera of “Generative AI can replace junior engineers” rose up from the swamp. And now…oof.

Where would you be, if you hadn’t gotten into tech when you did?

I know where I would be, and it is not here.

The internet loves to make fun of Boomers, the generation that famously coasted to college, home ownership, and retirement, then pulled the ladder up after them while mocking younger people as snowflakes. “Ok, Boomer” may be here to stay, but can we try to keep “Ok, Staff Engineer” from becoming a thing?

Lots of people seem to think we don’t need junior engineers, but nobody is arguing that we need fewer senior engineers, or will need fewer senior engineers in the foreseeable future.

I think it’s safe to assume that anything deterministic and automatable will eventually be automated. Software engineering is no different—we are ground zero! Of course we’re always looking for ways to automate and improve efficiency, as we should be.

But large software systems are unpredictable and nondeterministic, with emergent behaviors. The mere existence of users injects chaos into the system. Components can be automated, but complexity can only be managed.

Even if systems could be fully automated and managed by AI, the fact that we cannot understand how AI makes decisions is a huge, possibly insurmountable problem. Running your business on a system that humans can’t debug or understand seems like a risk so existential that no security, legal or finance team would ever sign off on it. Maybe some version of this future will come to pass, but it’s hard to see it from here. I would not bet my career or my company on it happening.

In the meantime, we still need more senior engineers. The only way to grow them is by fixing the funnel.

No. You need to be able to set them up for success. Some factors that disqualify you from hiring junior engineers:

You have less than two years of runway
Your team is constantly in firefighting mode, or you have no slack in your system
You have no experienced managers, or you have bad managers, or no managers at all
You have no product roadmap
Nobody on your team has any interest in being their mentor or point person

The only thing worse than never hiring any junior engineers is hiring them into an awful experience where they can’t learn anything. (I wouldn’t set the bar quite as high as Cindy does in this article; while I understand where she’s coming from, it is so much easier to land your second job than your first job that I think most junior engineers would frankly choose a crappy first job over none at all.)

Being a fully distributed company isn’t a complete dealbreaker, but it does make things even harder. I would counsel junior engineers to seek out office jobs if at all possible. You learn so much faster when you can soak up casual conversations and technical chatter, and you lose that working from home. If you are a remote employer, know that you will need to work harder to compensate for this. I suggest connecting with others who have done this successfully (they exist!) for advice.

I also advise companies not to start by hiring a single junior engineer. If you’re going to hire one, hire two or three. Give them a cohort of peers, so it’s a little less intimidating and isolating.

I have come to believe that the only way this will ever change is if engineers and engineering managers across our industry take up this fight and make it personal.

Most of the places I know that do have a program for hiring and training entry level engineers, have it only because an engineer decided to fight for it. Engineers—sometimes engineering managers—were the ones who made the case and pushed for resources, then designed the program, interviewed and hired the junior engineers, and set them up with mentors. This is not an exotic project, it is well within the capabilities of most motivated, experienced engineers (and good for your career as well).

Finance isn’t going to lobby for this. Execs aren’t likely to step in. The more a person’s role inclines them to treat engineers like fungible resources, the less likely they are to understand why this matters.

AI is not coming to solve all our problems and write all our code for us—and even if it was, it wouldn’t matter. Writing code is but a sliver of what professional software engineers do, and arguably the easiest part. Only we have the context and the credibility to drive the changes we know form the bedrock for great teams and engineering excellence..

Great teams are how great engineers get made. Nobody knows this better than engineers and EMs. It’s time for us to make the case, and make it happen.

The Cost Crisis in Observability Tooling

January 24, 2024December 21, 2024 mipsytipsycost control, metrics, observability 2.0, three pillarsLeave a comment

Originally posted on the Honeycomb blog on January 24th, 2024

The cost of services is on everybody’s mind right now, with interest rates rising, economic growth slowing, and organizational budgets increasingly feeling the pinch. But I hear a special edge in people’s voices when it comes to their observability bill, and I don’t think it’s just about the cost of goods sold. I think it’s because people are beginning to correctly intuit that the value they get out of their tooling has become radically decoupled from the price they are paying.

In the happiest cases, the price you pay for your tools is “merely” rising at a rate several times faster than the value you get out of them. But that’s actually the best case scenario. For an alarming number of people, the value they get actually decreases as their bill goes up.

Observability 1.0 and the cost multiplier effect

Are you familiar with this chestnut?

“Observability has three pillars: metrics, logs, and traces.”

This isn’t exactly true, but it’s definitely true of a particular generation of tools—one might even say definitionally true of a particular generation of tools. Let’s call it “observability 1.0.”

From an evolutionary perspective, you can see how we got here. Everybody has logs… so we spin up a service for log aggregation. But logs are expensive and everybody wants dashboards… so we buy a metrics tool. Software engineers want to instrument their applications… so we buy an APM tool. We start unbundling the monolith into microservices, and pretty soon we can’t understand anything without traces… so we buy a tracing tool. The front-end engineers point out that they need sessions and browser data… so we buy a RUM tool. On and on it goes.

Logs, metrics, traces, APM, RUM. You’re now paying to store telemetry five different ways, in five different places, for every single request. And a 5x multiplier is on the modest side of the spectrum, given how many companies pay for multiple overlapping tools in the same category. You may also also be collecting:

Profiling data
Product analytics
Business intelligence data
Database monitoring/query profiling tools
Mobile app telemetry
Behavioral analytics
Crash reporting
Language-specific profiling data
Stack traces
CloudWatch or hosting provider metrics
…and so on.

So, how many times are you paying to store data about your user requests? What’s your multiplier? (If you have one consolidated vendor bill, this may require looking at your itemized bill.)

There are many types of tools, each gathering slightly different data for a slightly different use case, but underneath the hood there are really only three basic data types: the metric, unstructured logs, and structured logs. Each of these have their own distinctive trade-offs when it comes to how much they cost and how much value you can get out of them.

Metrics

Metrics are the great-granddaddy of telemetry formats; tiny, fast, and cheap. A “metric” consists of a single number, often with tags appended. All of the context of the request gets discarded at write time; each individual metric is emitted separately. This means you can never correlate one metric with another from the same request, or select all the metrics for a given request ID, user, or app ID, or ask arbitrary new questions about your metrics data.

Metrics-based tools include vendors like Datadog and open-source projects like Prometheus. RUM tools are built on top of metrics to understand browser user sessions; APM tools are built on top of metrics to understand application performance.

When you set up a metrics tool, it generally comes prepopulated with a bunch of basic metrics, but the useful ones are typically the custom metrics you emit from your application.

Your metrics bill is usually dominated by the cost of these custom metrics. At minimum, your bill goes up linearly with the number of custom metrics you create. Which is unfortunate, because to restrain your bill from unbounded growth, you have to regularly audit your metrics, do your best to guess which ones are going to be useful in the future, and prune any you think you can afford to go without. Even in the hands of experts, these tools require significant oversight.

Linear cost growth is the goal, but it’s rarely achieved. The cost of each metric varies wildly depending on how you construct it, what the values are, how often it gets hit, etc. I’ve seen a single custom metric cost $30k per month. You probably have dozens of custom metrics per service, and it’s almost impossible to tell how much each of them costs you. Metrics bills tend to be incredibly opaque (possibly by design).

Nobody can understand their software or their systems with a metrics tool alone, because the metric is extremely limited in what it can do. No context, no cardinality, no strings… only basic static dashboards. For richer data, we must turn to logs.

Unstructured logs

You can understand much more about your code with logs than you can with metrics. Logs are typically emitted multiple times throughout the execution of the request, with one or a small number of nouns per log line, plus the request ID. Unstructured logs are still the default, although this is slowly changing.

The cost of unstructured logs is driven by a few things:

Write amplification. If you want to capture lots of rich context about the request, you need to emit a lot of log lines. If you are printing out just 10 log lines per request, per service, and you have half a dozen services, that’s 60 log events for every request.
Noisiness. It’s extremely easy to accidentally blow up your log footprint yet add no value—e.g., by putting a print statement inside a loop instead of outside the loop. Here, the usefulness of the data goes down as the bill shoots up.
Constraints on physical resources. Due to the write amplification of log lines per request, it’s often physically impossible to log everything you want to log for all requests or all users—it would saturate your NIC or disk. Therefore, people tend to use blunt instruments like these to blindly slash the log volume:
- Log levels
- Consistent hashes
- Dumb sample rates

When you emit multiple log lines per request, you end up duplicating a lot of raw data; sometimes over half the bits are consumed by request ID, process ID, timestamp. This can be quite meaningful from a cost perspective.

All of these factors can be annoying. But the worst thing about unstructured logs is that the only thing you can do to query them is full text search. The more data you have, the slower it becomes to search that data, and there’s not much you can do about it.

Searching your logs over any meaningful length of time can take minutes or even hours, which means experimenting and looking around for unknown-unknowns is prohibitively time-consuming. You have to know what to look for in order to find it. Once again, as your logging bill goes up, the value goes down.

Structured logs

Structured logs are gaining adoption across the industry, especially as OpenTelemetry picks up steam. The nice thing about structured logs is that you can actually do things with the data other than slow, dumb string searches. If you’ve structured your data properly, you can perform calculations! Compute percentiles! Generate heatmaps!

Tools built on structured logs are so clearly the future. But just taking your existing logs and adding structure isn’t quite good enough. If all you do is stuff your existing log lines into key/value pairs, the problems of amplification, noisiness, and physical constraints remain unchanged—you can just search more efficiently and do some math with your data.

There are a number of things you can and should do to your structured logs in order to use them more effectively and efficiently. In order of achievability:

Instrument your code using the principles of canonical logs, which collects all the vital characteristics of a request into one wide, dense event. It is difficult to overstate the value of doing this, for reasons of usefulness and usability as well as cost control.
Add trace IDs and span IDs so you can trace your code using the same events instead of having to use an entirely separate tool.
Feed your data into a columnar storage engine so you don’t have to predefine a schema or indexes to decide which dimensions future you can search or compute based on.
Use a storage engine that supports high cardinality, with an explorable interface.

If you go far enough down this path of enriching your structured events, instrumenting your code with the right data, and displaying it in real time, you will reach an entirely different set of capabilities, with a cost model so distinct it can only be described as “observability 2.0.” More on that in a second.

Ballooning costs are baked into observability 1.0

To recap: high costs are baked into the observability 1.0 model. Every pillar has a price.

You have to collect and store your data—and pay to store it—again and again and again, for every single use case. Depending on how many tools you use, your observability bill may be growing at a rate 3x faster than your traffic is growing, or 5x, or 10x, or even more.

It gets worse. As your costs go up, the value you get out of your tools goes down.

Your logs get slower and slower to search.
You have to know what you’re searching for in order to find it.
You have to use blunt force sampling technique to keep log volume from blowing up.
Any time you want to be able to ask a new question, you first have to commit new code and deploy it.
You have to guess which custom metrics you’ll need and which fields to index in advance.
As volume goes up, your ability to find a needle in the haystack—any unknown-unknowns—goes down commensurately.

And nothing connects any of these tools. You cannot correlate a spike in your metrics dashboard with the same requests in your logs, nor can you trace one of the errors. It’s impossible. If your APM and metrics tools report different error rates, you have no way of resolving this confusion. The only thing connecting any of these tools is the intuition and straight-up guesses made by your most senior engineers. Which means that the cognitive costs are immense, and your bus factor risks are very real. The most important connective data in your system—connecting metrics with logs, and logs with traces—exists only in the heads of a few people.

At the same time, the engineering overhead required to manage all these tools (and their bills) rises inexorably. With metrics, an engineer needs to spend time auditing your metrics, tracking people down to fix poorly constructed metrics, and reaping those that are too expensive or don’t get used. With logs, an engineer needs to spend time monitoring the log volume, watching for spammy or duplicate log lines, pruning or consolidating them, choosing and maintaining indexes.

But all this the time spent wrangling observability 1.0 data types isn’t even the costliest part. The most expensive part is the unseen costs inflicted on your engineering organization as development slows down and tech debt piles up, due to low visibility and thus low confidence.

Is there an alternative? Yes.

The cost model of observability 2.0 is very different

Observability 2.0 has no three pillars; it has a single source of truth. Observability 2.0 tools are built on top of arbitrarily-wide structured log events, also known as spans. From these wide, context-rich structured log events you can derive the other data types (metrics, logs, or traces).

Since there is only one data source, you can correlate and cross-correlate to your heart’s content. You can switch fluidly back and forth between slicing and dicing, breaking down or grouping by events, and viewing them as a trace waterfall. You don’t have to worry about cardinality or key space limitations.

You also effectively get infinite custom metrics, since you can append as many as you want to the same events. Not only does your cost not go up linearly as you add more custom metrics, your telemetry just gets richer and more valuable the more key-value pairs you add! Nor are you limited to numbers; you can add any and all types of data, including valuable high-cardinality fields like “App Id” or “Full Name.”

Observability 2.0 has its own amplification factor to consider. As you instrument your code with more spans per request, the number of events you have to send (and pay for) goes up. However, you have some very powerful tools for dealing with this: you can perform dynamic head-based sampling or even tail-based sampling, where you decide whether or not to keep the event after it’s finished, allowing you to capture 100% of slow requests and other outliers.

Engineering time is your most precious resource

But the biggest difference between observability 1.0 and 2.0 won’t show up on any invoice. The difference shows up in your engineering team’s ability to move quickly, with confidence.

Modern software engineering is all about hooking up fast feedback loops. And observability 2.0 tooling is what unlocks the kind of fine-grained, exploratory experience you need in order to accelerate those feedback loops.

Where observability 1.0 is about MTTR, MTTD, reliability, and operating software, observability 2.0 is what underpins the entire software development lifecycle, setting the bar for how swiftly you can build and ship software, find problems, and iterate on them. Observability 2.0 is about being in conversation with your code, understanding each user’s experience, and building the right things.

Observability 2.0 isn’t exactly cheap either, although it is often less expensive. But the key difference between o11y 1.0 and o11y 2.0 has never been that either is cheap; it’s that with observability 2.0, when your bill goes up, the value you derive from your telemetry goes up too. You pay more money, you get more out of your tools. As you should.

Interested in learning more? We’ve written at length about the technical prerequisites for observability with a single source of truth (“observability 2.0” as we’ve called it here). Honeycomb was built to this spec; ServiceNow (formerly Lightstep) and Baselime are other vendors that qualify. Click here to get a Honeycomb demo.

CORRECTION: The original version of this document said that “nothing connects any of these tools.” If you are using a single unified vendor for your metrics, logging, APM, RUM, and tracing tools, this is not strictly true. Vendors like New Relic or Datadog now let you define certain links between your traces and metrics, which allows you to correlate between data types in a few limited, predefined ways. This is better than nothing! But it’s very different from the kind of fluid, open-ended correlation capabilities that we describe with o11y 2.0. With o11y 2.0, you can slice and dice, break down, and group by your complex data sets, then grab a trace that matches any specific set of criteria at any level of granularity. With o11y 1.0, you can define a metric up front, then grab a random exemplar of that metric, and that’s it. All the limitations of metrics still apply; you can’t correlate any metric with any other metric from that request, app, user, etc, and you certainly can’t trace arbitrary criteria. But you’re right, it’s not nothing. 😸

Questionable Advice: “My boss says we don’t need any engineering managers. Is he right?”

January 5, 2024January 10, 2024 mipsytipsyadvice, culture, engineering management, hierarchy, organizations, sociotechnical18 Comments

I recently joined a startup to run an engineering org of about 40 engineers. My title is VP Engineering. However, I have been having lots of ongoing conflict with the CEO (a former engineer) around whether or not I am allowed to have or hire any dedicated engineering managers. Right now, the engineers are clustered into small teams of 3-4, each of which has a lead engineer — someone who leads the group, but whose primary responsibility is still writing code and shipping product.

I have headcount to hire more engineers in the coming year, but no managers. My boss says we are a startup and can’t afford such luxuries. It seems obvious to me that we need engineering managers, but to him, it seems just as obvious that managers are unnecessary overhead and that all hands should be on deck writing code at our stage.

I don’t know how to make that argument. It seems so obvious to me that I actually struggle to put it into words or make the case for why we should hire EMs. Help?

— Unnecessary Overhead(?!?)

Oh boy, there’s a lot to unpack here.

It is unsurprising to me that your CEO does not understand why managers exist, given that he does not seem to understand why organizational structures exist. 🙈 Why is he micromanaging how you are structuring your org or what roles you are allowed to fill? He hired you to do a job, and he’s not letting you do it. He can’t even explain why he isn’t letting you do it. This does not bode well.

But I do think it’s an interesting question. So let’s pretend he isn’t holding your ability to do your damn job hostage until you defend yourself to his satisfaction. 😒

I can think of two ways to make the case for engineering managers: one is rather complicated, from first principles, and the other very simple, but perhaps unsatisfying.

I personally have a … vigorous … knee-jerk response to authority; I hate being told what to do. It’s only recently that I’ve found my way to an understanding of hierarchy that feels healthy and practical, and that was by looking at it through the lens of systems theory.

Why does hierarchy exist in organizations?

It makes sense that hierarchy comes with a lot of baggage. Many of us have had bad experience with managers — indeed, entire organizations — where hierarchy was used as a tool of oppression, where people rose up the leadership ranks by hoarding information and playing dominance games, and decisions got made by pulling rank.

Working at a place like that fucking sucks. Who wants to invest their creativity and life force into a place that feels like a Dilbert cartoon, knowing how little it will be valued or reciprocated, and that it will slowly but surely get crushed out of you?

But hierarchy is not intrinsically authoritarian. Hierarchy did not originate as a political structure that humans invented for controlling and dominating one another, it is in fact a property of self-organizing systems, and it emerges for the benefit of the subsystems. In fact, hierarchy is absolutely critical to the adaptability, resiliency, and scalability of complex systems.

Let’s start with few basic facts about systems, for anyone that may be unfamiliar.

Hierarchy is a property of self-organizing systems

A system is “a network of interdependent components that work together to try to accomplish a common aim” (W. Edward Deming). A pile of sand is not a system, but a car is a system; if you take out its gas tank, the car cannot achieve its aim.

A subsystem is a collection of elements with a smaller aim inside a larger system. There can be many levels of subsystems that operate interdependently. The subsystems always work to support the needs of the larger system; if the subsystem instead optimizes for its own best interests, the whole system can fail (this is where the term “suboptimize” comes from 😄).

A system is self-organizing if it has the ability to make itself more complex, by diversifying, adapting, and improving itself. As systems self-organize and their complexity increases, they tend to generate hierarchy — an arrangement of systems and subsystems. In a stable, resilient and efficient system, subsystems can largely take care of themselves, regulate themselves, and serve the needs of the larger system, while the larger system coordinates between subsystems and helps them perform better.

Hierarchy minimizes the costs of coordination and reduces the amount of information that any given part of the system has to keep track of, preventing information overload. Information transfer and relationships within a subsystem are much more dense and have fewer delays than information transfer or relationships between subsystems.

(This should all sound pretty familiar to any software engineer. Modularization, amirite?? 😍)

Applying this definition, we can say that a manager’s job is to coordinate between teams and help their team perform better.

The false binary of sociotechnical systems

You’ve probably heard this canard: “Engineers do the technical work, managers do the people work.” I hate it. ☺️ I think it misconstrues the fundamental nature of sociotechnical systems. The “socio” and “technical” of sociotechnical systems are not neatly separable, they are interwoven and interdependent. There is actually precious little that is purely technical work or purely people work; there is a metric shitload of glue work that draws upon both skill sets.

Consider a very partial list of tasks done by any functional engineering org, besides writing code:

Recruiting, networking, interviewing, training interviewers, synthesizing feedback, writing job descriptions and career ladders
Project management for each project or commitment, prioritizing backlog, managing stakeholders and resolving conflicts, estimating size and scope, running retrospectives
Running team meetings, having 1x1s, giving continuous growth feedback, writing reviews, representing the team’s needs
Architecture, code review, refactoring; capturing DORA and productivity metrics, managing alert volume to prevent burnout

A lot of this work can be done by engineers, and often is. Every company distributes the load somewhat differently. This is a good thing! You don’t WANT an org where this work is only done by managers. You want individual contributors to help co-create the org and have a stake in how it gets run. Almost all of this work would be done more effectively by someone with an engineering background.

So you can understand why someone might hesitate to spend valuable headcount on engineering managers. Why wouldn’t you want everyone in engineering to be writing and shipping code as their primary job? Isn’t that by definition the best way to maximize productivity?

Ehhh… 😉

Engineering managers are a useful abstraction

In theory, you could make a list of all the tasks that need to be done to coordinate with other teams and have each item be picked up by a different person. In practice, this is impractical because then everybody would need to know about everything. One of the primary benefits of hierarchy, remember, is to reduce information overload. Intra-team communication should be high-bandwidth and fast, inter-team communication should be more sparse.

As the company scales, you can’t expect everybody to know everyone else; we need abstractions in order to function. A manager is the point of contact and representative for their team, and they serve as routers for important information.

Graph theory: for each of n engineers to be connected ("aligned") w each other you need n(n-1)/2 links, i.e. you need 780 interactions just to keep consistency. Without a few engineering managers (and arks) he is allowing velocity to build up tech debt at blazing speed.
— HurtingBrain (@HurtingBrain) January 5, 2024

I sometimes imagine managers as the nervous system of the company body, carrying around messages from one limb to another to coordinate actions. Centralizing many or most of these functions into one person lets you take advantage of specialization, as a manager builds relationships and context and improves at their role, and this massively reduces context switching for everyone else.

Manager calendars vs maker calendars

Engineering labor takes concentration and focus. Context switching is expensive, and too many interrupts can be fatal. Management labor consists of context switching every hour or so, and being available for interruptions throughout the day. These are two very different modes of being, headspaces, and calendar schedules, and do not coexist well.

In general, you want people to be able to spend most of their time working on things that contribute to the success of the outcomes they are directly responsible for. Engineers can only do so much glue work before their calendar turns into Swiss cheese and they can no longer deliver on their commitments. Since managers’ calendars are already Swiss cheese, it’s typically less disruptive for them to take on a larger share of glue labor.

It isn’t up to managers to do all the glue work, but it is a manager’s job to make sure that everything that needs to get done, does gets done. It is a manager’s job to try to line up every engineer with work that is interesting and challenging, but not overwhelming, and to ensure that unpleasant labor gets equitably distributed. It’s also a manager’s job to make sure that if we are asking someone to do a job, they are equipped with the resources they need to succeed at that job. Including time to focus.

Management is a tool for accountability

When you’re an engineer, you are responsible for the software you develop, deploy, and maintain. When you’re a manager, you are responsible for your team and the organization as a whole.

Management is one way of holding people accountable for specific outcomes (building teams with the right skills, relationships, and processes to make good decisions and build value for the company), and equipping them with the resources (budget, tools, headcount) to achieve those outcomes. If you aren’t making building the organization someone’s number one job, it won’t be anyone’s number one job, which means it probably won’t get done very well. And whose responsibility will that be, Mr. CEO?

There’s a real upper limit to what you can reasonably expect tech leads, or engineers, or anyone whose actual job is shipping software to do in their “spare time”. If you’re trying to hold your tech leads responsible for building healthy engineering teams, tools, and processes, you are asking them to do two calendarily incompatible jobs with only one calendar. The likeliest scenario is that they will focus on the outcomes they feel comfortable owning (the technical ones), while you pile up organizational debt in the background.

In natural hierarchies, we look up for purpose and down for function. That, in a nutshell, is the more complicated argument for why we need engineering managers.

Choose Boring technology Culture

The simpler argument is this: most engineering orgs have engineering managers. That’s the default. Lots of people much smarter than you or me have spent lots of time thinking and tinkering with org structures over the years, and this is what we’ve got.

As Dan McKinley famously said, we should “choose boring technology“. Boring doesn’t mean bad, it means the capabilities and failure conditions are well understood. You only ever get a few innovation tokens, so you should spend those wisely on core differentiators that could make or break your business. The same goes for culture. Do you really want to spend one of your tokens on org structure? Why??

For better or for worse, the hierarchical org structure is well understood. There are plenty of people on the job market who are proficient at managing or working with managers, and you can hire them. You can get training, coaching, or read a lot of self-help books. There are various management philosophies you can coalesce around or use to rule people out. On the other hand, the manager-free experiments I’m aware of (e.g. holacracy at Medium and GitHub, or “Choose Your Own Work” at Linden Lab) have all been quietly abandoned or outgrown. Not, in my experience, because leaders went mad for power, but due to chaos, lack of focus, and poor execution.

When there is no explicit structure or hierarchy, the result is not freedom and egalitarianism, it’s “informal, unacknowledged, and unaccountable leadership”, as famously detailed in “The Tyranny of Structureless“. In reality, sadly, these teams tend to be chaotic, fragile, and frustrating. I know! I’m pissed too! 😭

This argument doesn’t necessarily prove your CEO is wrong, but I should think his bar for proof is much higher than yours. “I don’t want any of my engineers to stop writing code” is not an argument. But I’m also feeling like I haven’t quite addressed the core question of productivity, so let’s pick that up again once more.

More lines of code != more productivity

To briefly recap: we were talking about an org with ~40 engineers, broken up into 10 small clusters of 3-4 engineers, each with a tech lead. Your CEO is arguing that you can’t afford to lose any velocity, which he thinks is what would happen if anyone stops writing code full time.

Maybe. But everything I have ever experienced leads me to believe that a fewer number of larger teams, each helmed by an experienced engineering manager, should way outperform this gaggle of tiny groups. It’s not even close. And they can do so in a way that’s more efficient, sustainable, and humane than this scrappy death march.

And systems thinking shows us why! With fewer groups, but larger ones, you have less overall management overhead, and much less of the slow and costly intra-group coordination. You unlock rich, dense knowledge transfer within groups, which gives you more shared coverage of the surface area. With 7-9 engineers per group you can build a real on call rotation, which means fewer heroics and less burnout. The coordination that you do need to do can be more strategic, less tactical, and much more forward-looking.

Would five big teams ship as many lines of code as 10 small teams, even if five engineers become managers and stop writing code? Probably, but who cares? Your customers give zero fucks how many lines of code you write. They care about whether you are building the right things and solving problems that matter to them. What matters is moving the business forward, not churning out code. Don’t forget, the act of churning out code creates costs and externalities in and of itself.

What defines your velocity is that you spend your time on the right things. Learning to make good decisions about what to build is something every organization has to work out for itself, and it is always an ongoing work in progress. Engineering managers don’t do all the work or make all the decisions, but they are absolutely fucking vital, in my experience, to ensuring that work happens and is done well. As I wrote in my last piece, engineering managers are the embodiment of the feedback loops that systems use to learn and improve.

Are managers ever unnecessary overhead?

Sure, absolutely. Management is about coordinating between teams and helping teams run more optimally, so anything that decreases your need for coordination also decreases your need for management. If you are a small company, or if you have really senior folks who are used to working together, you need a lot less coordination. The next most relevant factor is probably the rate of change; if you’re growing fast or have a lot of turnover, or if there’s a lot of time pressure or frequent shifts in strategy, your need for managers goes up. But there are plenty of smaller orgs out there that are doing just fine without a lot of formal management.

Look, I’m not a fan of the word “overhead”, because a) it’s kind of rude and b) people who call managers “overhead” are typically people who disrespect or do not value the craft of management.

But management is, in fact, overhead. 😅 So is a lot of other glue work! By which I mean the work is important, but does not itself move the business forward; we should do as much of it as absolutely necessary and no more. The nature of glue work is such that it too-easily expands to consume all available time and space (and then some). Constraints are good. Feeling a bit underresourced is good, and should be the norm. It is incredibly easy for management to get a bit bloated, and managers can be very loath to acknowledge this, because it’s not like they ever feel any less stressed or stretched.[*]

Management is also very much like operations work in that when it’s being done well, it’s invisible. Evaluating managers can be very hard, especially in the near term, and making decisions about when it’s time to create or pay down organizational debt is a whole nother ball of wax, and way outside the scope of this post.

But yes, managers can absolutely be unnecessary overhead.

However, if you have 40 engineers all reporting to one VP, and nobody else whose number one job is the outcomes related to people, teams and org, I feel pretty safe in saying this is not a risk for you at this time.

<3 charity

[*] In fact, the reverse can be true; bloated management can create MORE work for managers, and they may counterintuitively feel LESS stretched or stressed with a leaner org chart. Bureaucracies do tend to pick up a momentum all their own. Especially when management gets all wrapped up in promotions and egos. Which is yet another good reason to ensure that management is not a promotion or a form of domination.

Becoming An Engineering Manager Can Make You Better At Life And Relationships

December 15, 2023January 10, 2024 mipsytipsyadvice, careers, culture, engineering management, ethics, management31 Comments

Original title: “Why Should You (Or Anyone) Become An Engineering Manager?”

The first piece I ever wrote about engineering management, The Engineer/Manager Pendulum, was written as a love letter to a friend of mine who was unhappy at work. He was an engineering director at a large and fast-growing startup, where he had substantially built out the entire infrastructure org, but he really missed being an engineer and building things. He wasn’t getting a lot of satisfaction out of his work, and he felt like there were other people who might relish the challenge and do it better than he could.

At the same time, it felt like a lot to walk away from! He had spent years building up not only the teams, but also his influence and reputation. He had grown accustomed to being in the room where decisions get made, and didn’t want to give that up or take a big step back in his career. He agonized over this for a long time (and I listened over many whiskeys). 🙂

To me it seemed obvious that his power and influence would only increase if he went back to engineering. You bring your credibility and your relationships along with you, and enthusiasm is contagious. So I wrote the piece with him in mind, but it definitely struck a nerve; it is still the most-read piece I have ever written.

That was in 2017. I’ve written a lot over the years since then about teams and management. For a long time, everything I wrote seemed to come out with a pretty noticeable bias against management, towards engineering:

I go back and read some of those pieces now, and the pervasive anti-manager slant actually makes me a bit uncomfortable, because the environment has changed quite a lot since then.

Miserable managers have miserable reports

When I started writing about engineering management, it seemed like there were a lot of unhappy, resentful, poorly trained managers, lots of whom would prefer to be writing code. Most people made the choice to switch to management for reasons that had nothing to do with the work itself.

Becoming a manager was seen as a promotion
It was the only form of career progression available at many places
Managers made a lot more money
It was the only way to get a seat at the table, or be in the loop
They were tired of taking orders from someone else

But a lot has changed. The emergence of staff+ engineering has been huge (two new books published in the last five years, and at least one conference!). The industry has broadly coalesced around engineering levels and career progression; a parallel technical leadership track is now commonplace.

We’ve become more aware of how fragile command-and-control systems are, and that you want to engage people’s agency and critical thinking skills. You want them to feel ownership over their labor. You can’t build great software on autopilot, or by picking up jira tasks. Our systems are becoming so complex that you need people to be emotionally and mentally engaged, curious, and continuously learning and improving, both as individuals and as teams.

I tell people all the time that you can do most of the "fun" management things (mentoring, coaching, watching people grow, contributing to decision making) as an IC without doing all the terrible parts of management (firing, budgeting, serious HR things).

— Jill Wetzler (@JillWetzler) September 5, 2019

At the same time, our expectations for managers have gone up dramatically. We’ve become more aware of the damage done by shoddy managers, and we increasingly expect managers to be empathetic, supportive, as well as deeply technical. All of this has made the job of engineering manager more challenging.

Ambitious engineers had already begun to drift away from management and towards the role of staff or principal engineer. And then came the pandemic, which caused managers (poorly supported, overwhelmed, squeezed between unrealistic expectations on both sides) to flee the profession in droves.

It’s getting harder to find people who are willing to be managers. On the one hand, it is fucking fantastic that people aren’t being driven into management out of greed, rage, or a lust for power. It is WONDERFUL that people are finding engineering roles where they have autonomy, ownership, and career progression, and where they are recognized and rewarded for their contributions.

On the other hand, engineering managers are incredibly important and we need them. Desperately.

Good engineering managers are force multipliers

A team with a good engineering manager will build circles around a team without one. The larger or more complex the org or the product, and the faster you want to move, the more true this is. Everybody understands the emotional component, that it feels nice to have a competent manager you trust. But these aren’t just squishy feels. This shit translates directly into velocity and quality. The biggest obstacles to engineering productivity are not writing lines of code too slowly or not working long hours, they are:

Working on the wrong thing
Getting bogged down in arguments, or being endlessly indecisive
Waiting on other teams to do their work, waiting on code review
Ramping new engineers, or trying to support unfamiliar code
When people are upset, distracted, or unmotivated
Unfinished migrations, migrations in flight, or having to support multiple systems indefinitely
When production systems are poorly understood and opaque, quality suffers, and firefighting skyrockets
Terrible processes, tools, or calendars that don’t support focus time
People who refuse to talk to each other
Letting bad hires and chronic underperformers stick around indefinitely

Engineers are responsible for delivering products and outcomes, but managers are responsible for the systems and structural support that enables this to happen.

Managers don’t make all the decisions, but they do ensure the decisions get made. They make sure that workstreams are are staffed and resourced sufficiently, that engineers are trained and improving at their craft. They pay attention to the contracts and commitments you have made with other teams, companies or orgs. They advocate for your needs at all levels of the organization. They connect dots and nudge and suggest ideas or solutions, they connect strategy with execution.

Breaking down a complex business problem into a software project that involves the collaboration of multiple teams, and ensuring that every single contributor has work to do that is challenging and pushes their boundaries while not being overwhelming or impossible… is really fucking hard. Even the best leaders don’t get it right every time.

In systems theory, hierarchy emerges for the benefit of the subsystems. Hierarchy exists to coordinate between the subsystems and help them improve their function; it is how systems create resiliency to unknown stressors. Which means that managers are, in a very real way, the embodiment of the feedback loops and meta loops that a system depends on to align itself and all of its parts around a goal, and for the system itself to improve over time.

For some people, that is motivation enough to try being a manager. But not for all (and that’s okay!!). What are some other reasons for going into management?

Why should you (or anyone) be a manager?

I can think of a few good reasons off the top of my head, like…

It gets you closer to how the business operates, and gives you a view into how and why decisions get made that translate eventually into the work you do as an engineer
Which makes the work feel more meaningful and less arbitrary, I think. It connects you to the real value you are creating in the world.
Many people reach a point where they feel a gravitational pull towards mentorship. It’s almost like a biological imperative to replicate yourself and pass on what you have learned to the next generation.
Many people also get to a point where they develop strong convictions about what not to do as a manager. They may feel compelled to use what they’ve learned to build happy teams and propagate better practices through the industry
One way to develop a great staff engineer is to take a great senior engineer and put them through 2-3 years of management experience.

But the main reason I would encourage you to try engineering management is a reason that I’m not sure I’ve ever heard someone articulate up front, which is that…it can make you better at life and relationships, in a huge and meaningful way.

Work is always about two things: what you put out into the world, and who you become while doing it.

I want to stop short of proclaiming that “being a manager will make you a better person!” — because skills are skills, and they can be used for good or ill. But it can.

It’s a lot like choosing to become a parent. You don’t decide to have kids because it sounds like a hoot (I hope); you go into it knowing it will be hard work, but meaningful work. It’s a way of processing and passing on the experiences that have shaped you and who you are. You also take up the mantle understanding that this will change you — it changes who you are as a person, and the relationships you have with others.

From the outside, management looks like making decisions and calling the shots. From the inside, management looks more like becoming intimately acquainted with your own limitations and motivations and those of others, plus a lot of systems thinking.

Yes, management absolutely draws on higher-level skills like strategy and planning, writing reviews, mediating conflict, designing org charts, etc. But being a good manager — showing up for other people and supporting them consistently, day after day — rests on a bedrock of some much more foundational skills.

The kind of skills you learn in therapy, not in classes

Self-regulation. Can you take care of yourself consistently — sleep, eat, leave the house, socialize, balance your moods, moderate your impulses? As an engineer, you can run your tank dry on occasion, but as a manager, that’s malpractice. You always need to have fuel left in the tank, because you don’t know when it will be called upon.

Self-awareness. Identifying your feelings in the heat of the moment, unpacking where they came from, and deciding how to act on them. It’s not about clamping down on your feelings and denying you have them. It’s definitely not about making your feelings into other people’s problems, or letting your reactions create even bigger problems for your future self. It’s about acting in ways that are fueled by your authentic emotional responses, but not ruled by them.

Understanding other people. You learn to read people and their reactions, starting with your direct reports. You build up a mental model for what motivates someone, what moves them, what bothers them, and what will be extremely challenging for them. You must develop a complex topographical map of how much you can trust each person’s judgment, on which aspect, in any given situation.

Setting good boundaries. Where is the line between supporting someone, advocating for them, encouraging them, pushing them … but not propping them up at all costs, or taking responsibility for their success? How is the manager/report relationship different from the coworker relationship, or the friend relationship? How do you navigate the times when you have to hold someone accountable because their work is falling short?

Sensitivity to power dynamics. Do people treat you differently as a manager than they did as a peer? Are there things that used to be okay for you to say or do that now come across as inappropriate or coercive? How does interacting with your reports inform the way you interact with your own manager, or how you understand what they say?

Hard conversations. Telling people things that you know they don’t want to hear, or things that will make them feel afraid, angry, or upset; and then sitting with their reactions, resisting the urge to take it all back and make everything okay.

The art of being on the same side. When you’re giving someone feedback, especially constructive feedback, it’s easy to trigger a defensive response. It’s SO easy for people to feel like you are judging them or criticizing them. But the dynamic you want to foster is one where you are both side by side, shoulder to shoulder, facing the same way, working together. You are giving them feedback because you care; feedback that could help them be even better, if they choose to accept it. You are on their side. They always have agency.

Did you learn these skills growing up? I sure as hell did not.

I grew up in a family that was very nice. It was very kind and loving and peaceful, but we did not tell each other hard things. When I went off to college and started dating, I had no idea how to speak up when something bothered me. There were sentences that lingered on the tip of my tongue for years but were never spoken out loud, over the ebb and fall of entire relationships. And if somebody raises their voice to me in anger, to this day, I crumble.

I also came painfully late to developing a so-called growth mindset. This is super common among “smart kids”, who get so used to perfect scores and praise for high achievements that any feedback or critique feels like failure…and failure feels like the end of the world.

It was only after I became a manager that I began to consciously practice skills like giving feedback, or receiving constructive criticism, or initiating hard conversations. I had to. But once I did, I started getting better.

Turns out, work is actually kind of an ideal sandbox for life skills, because the social contract is more explicit. These are structured relationships, with rules and conventions and expectations, and your purpose for coming together is clear: to succeed at business, to finish a project, to pull a paycheck. Even the element of depersonalization can be useful: it’s not you-you, it’s the professional version of you, performing a professional role. The stakes are lower than they would be with your mother, your partner, or your child.

You don’t have to be a manager to build these skills, of course. But it’s a great opportunity to do so! And there are tools — books and classes, mentors and review cycles. You can ask for feedback from others. Growth and development is expected in this role.

People skills are persistent

The last thing I will say is this — technical skills do decay and become obsolete, particularly language fluency, but people skills do not. Once you have built these muscles, you will carry them with you for life. They will enhance your ability to connect with people and build trust, to listen perceptively and communicate clearly, in both personal and professional relationships.

Whatever you decide to do with your life, these skills increase your optionality and make you more effective.

And that is the reason I think you should consider being an engineering manager. Like I said, I’ve never heard someone cite this as their reason for wanting to become a manager. But if you ask managers why they do it five or ten years later, you hear a version of this over and over again.

One cautionary note

As an engineer, you can work for a company whose leadership team you don’t particularly respect, whose product you don’t especially love, or whose goals you aren’t super aligned with, and it can be “okay.” Not terrific, but not terrible.

As a manager, you can’t. Or you shouldn’t. The conflicts will eat you up inside and/or prevent you from doing excellent work.

Your job consists of representing the leadership team and their decisions, pulling people into alignment with the company’s goals, and thinking about how to better achieve the mission. As far as your team goes, you are the face of The Man. If you can’t do that, you can’t do your job. You don’t get to stand apart from the org and throw rocks, e.g. “they told me I have to tell you this, but I don’t agree with it”. That does nothing but undermine your own position and the company’s. If you’re going to be a manager, choose your company wisely.

charity

P.S. My friend (from the start of the article) went back to being an engineer, despite his trepidation, and never regretted it once. His career has been up and to the right ever since; he went on to start a company. The skills he built as a manager were a huge boost to an already stellar career. 📈

LLMs Demand Observability-Driven Development

September 20, 2023December 21, 2024 mipsytipsyai, artificial intelligence, llms, observability-driven developmentLeave a comment

Originally posted on the Honeycomb blog on September 20th, 2023

Our industry is in the early days of an explosion in software using LLMs, as well as (separately, but relatedly) a revolution in how engineers write and run code, thanks to generative AI.

Many software engineers are encountering LLMs for the very first time, while many ML engineers are being exposed directly to production systems for the very first time. Both types of engineers are finding themselves plunged into a disorienting new world—one where a particular flavor of production problem they may have encountered occasionally in their careers is now front and center.

Namely, that LLMs are black boxes that produce nondeterministic outputs and cannot be debugged or tested using traditional software engineering techniques. Hooking these black boxes up to production introduces reliability and predictability problems that can be terrifying. It’s important to understand this, and why.

100% debuggable? Maybe not

Software is traditionally assumed to be testable, debuggable, and reproducible, depending on the flexibility and maturity of your tooling and the complexity of your code. The original genius of computing was one of constraint; that by radically constraining language and mathematics to a defined set, we could create algorithms that would run over and over and always return the same result. In theory, all software is debuggable. However, there are lots of things that can chip away at that beauteous goal and make your software mathematically less than 100% debuggable, like:

Adding concurrency and parallelism.
Certain types of bugs.
Stacking multiple layers of abstractions (e.g., containers).
Randomness.
Using JavaScript (HA HA).

There is a much longer list of things that make software less than 100% debuggable in practice. Some of these things are related to cost/benefit tradeoffs, but most are about weak telemetry, instrumentation, and tooling.

If you have only instrumented your software with metrics, for example, you have no way of verifying that a spike in api_requests and an identical spike in 503 errors are for the same events (i.e., you are getting a lot of api_requests returning 503) or for a disjoint set of events (the spike in api_requests is causing general congestion causing a spike in 503s across ALL events). It is mathematically impossible; all you can do is guess. But if you have a log line that emits both the request_path and the error_code, and a tool that lets you break down and group by arbitrary dimensions, this would be extremely easy to answer. Or if you emit a lot of events or wide log lines but cannot trace them, or determine what order things executed in, there will be lots of other questions you won’t be able to answer.

There is another category of software errors that are logically possible to debug, but prohibitively expensive in practice. Any time you see a report from a big company that tracked down some obscure error in a kernel or an ethernet device, you’re looking at one of the rare entities with 1) enough traffic for these one in a billion errors to be meaningful, and 2) enough raw engineering power to dedicate to something most of us just have to live with.

But software is typically understandable because we have given it structure and constraints.

IF (); THEN (); ELSE () is testable and reproducible. Natural languages, on the other hand, are infinitely more expressive than programming languages, query languages, or even a UI that users interact with. The most common and repeated patterns may be fairly predictable, but the long tail your users will create is very long, and they expect meaningful results there, as well. For complex reasons that we won’t get into here, LLMs tend to have a lot of randomness in the long tail of possible results.

So with software, if you ask the exact same question, you will always get the exact same answer. With LLMs, you might not.

LLMs are their own beast

Unit testing involves asserting predictable outputs for defined inputs, but this obviously cannot be done with LLMs. Instead, ML teams typically build evaluation systems to evaluate the effectiveness of the model or prompt. However, to get an effective evaluation system bootstrapped in the first place, you need quality data based on real use of an ML model. With software, you typically start with tests and graduate to production. With ML, you have to start with production to generate your tests.

Even bootstrapping with early access programs or limited user testing can be problematic. It might be ok for launching a brand new feature, but it’s not good enough for a real production use case.

Early access programs and user testing often fail to capture the full range of user behavior and potential edge cases that may arise in real-world usage when there are a wide range of users. All these programs do is delay the inevitable failures you’ll encounter when an uncontrolled and unprompted group of end users does things you never expected them to do.

Instead of relying on an elaborate test harness to give you confidence in your software a priori, it’s a better idea to embrace a “ship to learn” mentality and release features earlier, then systematically learn from what is shipped and wrap that back into your evaluation system. And once you have a working evaluation set, you also need to figure out how quickly the result set is changing.

Phillip gives this list of things to be aware of when building with LLMs:

Failure will happen—it’s a question of when, not if.
Users will do things you can’t possibly predict.
You will ship a “bug fix” that breaks something else.
You can’t really write unit tests for this (nor practice TDD).
Latency is often unpredictable.
Early access programs won’t help you.

Sound at all familiar? 😂

Observability-driven development is necessary with LLMs

Over the past decade or so, teams have increasingly come to grips with the reality that the only way to write good software at scale is by looping in production via observability—not by test-driven development, but observability-driven development. This means shipping sooner, observing the results, and wrapping your observations back into the development process.

Modern applications are dramatically more complex than they were a decade ago. As systems get increasingly more complex, and nondeterministic outputs and emergent properties become the norm, the only way to understand them is by instrumenting the code and observing it in production. LLMs are simply on the far end of a spectrum that has become ever more unpredictable and unknowable.

Observability—both as a practice and a set of tools—tames that complexity and allows you to understand and improve your applications. We have written a lot about what differentiates observability from monitoring and logging, but the most important bits are 1) the ability to gather and store telemetry as very wide events, ordered in time as traces, and 2) the ability to break down and group by any arbitrary, high-cardinality dimension. This allows you to explore your data and group by frequency, input, or result.

In the past, we used to warn developers that their software usage patterns were likely to be unpredictable and change over time; now we inform you that if you use LLMs, your data set is going to be unpredictable, and it will absolutely change over time, and you must have a way of gathering, aggregating, and exploring that data without locking it into predefined data structures.

With good observability data, you can use that same data to feed back into your evaluation system and iterate on it in production. The first step is to use this data to evaluate the representativity of your production data set, which you can derive from the quantity and diversity of use cases.

You can make a surprising amount of improvements to an LLM based product without even touching any prompt engineering, simply by examining user interactions, scoring the quality of the response, and acting on the correctable errors (mainly data model mismatches and parsing/validation checks). You can fix or handle for these manually in the code, which will also give you a bunch of test cases that your corrections actually work! These tests will not verify that a particular input always yields a correct final output, but they will verify that a correctable LLM output can indeed be corrected.

You can go a long way in the realm of pure software, without reaching for prompt engineering. But ultimately, the only way to improve LLM-based software is by adjusting the prompt, scoring the quality of the responses (or relying on scores provided by end users), and readjusting accordingly. In other words, improving software that uses LLMs can only be done by observability and experimentation. Tweak the inputs, evaluate the outputs, and every now and again, consider your dataset for representivity drift.

Software engineers who are used to boolean/discrete math and TDD now need to concern themselves with data quality, representivity, and probabilistic systems. ML engineers need to spend more time learning how to develop products and concern themselves with user interactions and business use cases. Everyone needs to think more holistically about business goals and product use cases. There’s no such thing as a LLM that gives good answers that don’t serve the business reason it exists, after all.

So, what do you need to get started with LLMs?

Do you need to hire a bunch of ML experts in order to start shipping LLM software? Not necessarily. You cannot (there aren’t enough of them), you should not (this is something everyone needs to learn), and you don’t want to (these are changes that will make software engineers categorically more effective at their jobs). Obviously you will need ML expertise if your goal is to build something complex or ambitious, but entry-level LLM usage is well within the purview of most software engineers. It is definitely easier for software engineers to dabble in using LLMs than it is for ML engineers to dabble in writing production applications.

But learning to write and maintain software in the manner of LLMs is going to transform your engineers and your engineering organizations. And not a minute too soon.

The hardest part of software has always been running it, maintaining it, and understanding it—in other words, operating it. But this reality has been obscured for many years by the difficulty and complexity of writing software. We can’t help but notice the upfront cost of writing software, while the cost of operating it gets amortized over many years, people, and teams, which is why we have historically paid and valued software engineers who write code more than those who own and operate it. When people talk about the 10x engineer, everyone automatically assumes it means someone who churns out 10x as many lines of code, not someone who can operate 10x as much software.

But generative AI is about to turn all of these assumptions upside down. All of a sudden, writing software is as easy as sneezing. Anyone can use ChatGPT or other tools to generate reams of code in seconds. But understanding it, owning it, operating it, extending and maintaining it… all of these are more challenging than ever, because in the past, the way most of us learned to understand software was by writing it.

What can we possibly do to make sure our code makes sense and works, and is extendable and maintainable (and our code base is consistent and comprehensible) when we didn’t go through the process of writing it? Well, we are in the early days of figuring that out, too. 🙃

If you’re an engineer who cares about your craft: do code reviews. Follow coding standards and conventions. Write (or generate) tests for it. But ultimately, the only way you can know for sure whether or not it works is to ship it to production and watch what happens.

This has always been true, by the way. It’s just more true now.

If you’re an engineer adjusting to the brave new era: take some of that time you used to spend writing lines of code and reinvest it back into understanding, shipping under controlled circumstances, and observing. This means instrumenting your code with intention, and inspecting its output. This means shipping as soon as possible into the production environment. This means using feature flags to decouple deploys from releases and gradually roll new functionality out in a controlled fashion. Invest in these—and other—guardrails to make the process of shipping software more safe, fine-grained, and controlled.

Most of all, it means developing the habit of looking at your code in production, through the lens of your telemetry, and asking yourself: does this do what I expected it to do? Does anything else look weird?

Or maybe I should say “looking at your systems” instead of “looking at your code,” since people might confuse the latter with an admonition to “read the code.” The days when you could predict how your system would behave simply by reading lines of code are long, long gone. Software behaves in unpredictable, emergent ways, and the important part is observing your code as it’s running in production, while users are using it. Code in a buffer can tell you very little.

This future is a breath of fresh air

This, for once, is not a future I am afraid of. It’s a future I cannot wait to see manifest. For years now, I’ve been giving talks on modern best practices for software engineering—developers owning their code in production, testing in production, observability-driven development, continuous delivery in a tight feedback loop, separating deploys from releases using feature flags. No one really disputes that life is better, code is better, and customers are happier when teams adopt these practices. Yet, only 11% of teams can deploy their code in less than a day, according to the DORA report. Only a tiny fraction of teams are operating in the way everybody agrees we all should!

Why? The answers often boil down to organizational roadblocks, absurd security/compliance policies, or lack of buy-in/prioritizing. Saddest of all are the ones who say something like, “our team just isn’t that good” or “our people just aren’t that smart” or “that only works for world-class teams like the Googles of the world.” Completely false. Do you know what’s hard? Trying to build, run, and maintain software on a two month delivery cycle. Running with a tight feedback loop is so much easier.

Just do the thing

So how do teams get over this hump and prove to themselves that they can have nice things? In my experience, only one thing works: when someone joins the team who has seen it work before, has confidence in the team’s abilities, and is empowered to start making progress against those metrics (which they tend to try to do, because people who have tried writing code the modern way become extremely unwilling to go back to the bad old ways).

And why is this relevant?

I hypothesize that over the course of the next decade, developing with LLMs will stop being anything special, and will simply be one skill set of many, alongside mobile development, web development, etc. I bet most engineers will be writing code that interacts with an LLM. I bet it will become not quite as common as databases, but up there. And while they’re doing that, they will have to learn how to develop using short feedback loops, testing in production, observability-driven development, etc. And once they’ve tried it, they too may become extremely unwilling to go back.

In other words, LLMs might ultimately be the Trojan Horse that drags software engineering teams into the modern era of development best practices. (We can hope.)

In short, LLMs demand we modify our behavior and tooling in ways that will benefit even ordinary, deterministic software development. Ultimately, these changes are a gift to us all, and the sooner we embrace them the better off we will be.

How to Communicate When Trust Is Low (Without Digging Yourself Into A Deeper Hole)

August 17, 2023January 10, 2024 mipsytipsyadvice, communication, culture, ethics28 Comments

This is based on an internal quip doc I wrote up about careful communication in the context of rebuilding trust. I got a couple requests to turn it into a blog post for sharing purposes; here you go.🌈✨🥂

In this doc I mention Christine, my wonderful, brilliant cofounder and CEO, and the time (years ago) when our relationship had broken down completely, forcing us to rebuild our trust from the ground up.

(Cofounder relationships can be hard. They are a lot like marriages; in their difficulty and intensity, yes, but also in that when you’re doing it with the right person, it’s all worth it. 💜)

Tips for Careful Communication

When a relationship has very little trust, you tend to interpret everything someone says in the worst possible light, or you may hear hostility, contempt, or dismissiveness where none exists. On the other side of the exchange, the conversation becomes a minefield, where it feels like everything you say gets misinterpreted or turned against you no matter how careful you are trying to be. This can turn into a death spiral of trust where every interaction ends up with each of you hardening against each other a little more and filing away ever more wounds and slights. 💔

Yet you HAVE to communicate in order to work together! You have to be able to ask for things and give feedback.

The way trust gets rebuilt is by ✨small, positive interactions✨. If you’re in a trust hole, you can’t hear them clearly, and they can’t hear you (or your intent) clearly. So you have to bend over backwards to overcommunicate and overcompensate.

There are lots of books out there on how to talk about hard topics. (We actually include a copy of “Crucial Conversations” in every new employee packet.) They are all pretty darn cheesy, but it’s worth reading at least one of them.

I’m not going to try and cover all of that territory. What follows is a very subjective list of tactics that worked for Christine and me when we were digging our way out of a massive trust deficit. Power dynamics can admittedly make things more difficult, but the mechanics are the same.

Acknowledge it is hard beforehand:

“I want to say something, but I am having a hard time with it.”
“I have something to say, but I don’t know how you’ll take it.”
“I need to tell you something and I am anxious about your reaction.”

What this does: forces you to slow down and be intentional about the words you’re going to use. It gives the other person a heads up that this was hard for you to say. Most of all, it shows that you do care about their feelings, and are trying to do your best for them (even if you whiff the landing).

… or check in afterwards.

“I’m not sure how that came across. Is there a better way I could have phrased it?”
“In my head that sounded like a compliment…how did you hear it?”
“Did that sound overly critical? I’m not trying to dwell on the past, but I could use your help in figuring out a better way.”

It’s okay if it’s minutes or hours or days later; if it’s still eating at you, ✨clear the fucking air.✨

Speak tentatively.

“Speak tentatively” is the exact opposite of the advice that people (especially women) tend to get in business. But it’s actually super helpful when the relationship is frayed because you are explicitly allowing that they may have a different perception, and making it safer to share it.

“From my perspective, it looks like these results might be missing some data… do you see the same thing?” opens the door for a friendly conversation based on concrete outcomes, whereas “You’re missing data” might sound accusatory and trigger fear and defensiveness.

Try to sound friendly.

Say “please” and “thank you” a lot. Add buffer words like “Hey there”, or “Good morning!” or “lol”. Even just using 🌱emojis🍃 will soften your response to an almost unsettling degree. This may seem almost insultingly simple, but it works. When trust is low, the lack of frills can easily be read as brusque or rude.

Take a breath.

If you are experiencing a physical panic response (sweating, heart racing, etc), announce that you need a few minutes before responding. Compose yourself. Firing off a reply while you are in fight-or-flight mode reliably leads to unintentional escalations.

If you need to take a few beats to read and process, take the time. But empty silence can also generate anxiety 🙂 so maybe say something to indicate “I’m listening, but I need a minute to absorb what you said”, or “I’m still processing”. (We often use “whoa…” as shorthand for this.)

(Alternately, if you find yourself really pissed, “whoa” becomes a great placeholder for yourself to get yourself under control 😬 before saying something you’ll regret having to deal with later.)

“The story in my head”.

When you are in a state where you are assuming the worst of someone and reading hostile intent into their words or actions, try to check yourself on those assumptions.

Repeat the words or behaviors back to them along with your interpretation, like: “The story in my head is that you asked me to send that status email because you don’t trust me to have done the work, or maybe even gathering evidence that I am not performing for a PIP.” This gives them the opportunity to reply and clarify what they actually meant.

Engineer positive interactions, even if you have to invent them.

Relationship experts say that there’s a magic ratio for happy, healthy relationships, which is at least five positive interactions for every one negative interaction. If you only interact with the people you have difficult relationships when you have something difficult to say, you are always going to dread it. Forever.

It might seem artificial at first, but look for chances to have any sort of positive interactions, and seize them.

Communicate positive intent.

In a low trust environment, you can assume everything you say will be read with a voice that is menacing, dismissive or sneering. It behooves you to pay extra attention to tone and voice, and to add extra words that overcommunicate your intended meaning. A neutral statement like “That number seems low”, or “Why is that number low?” will come out sounding brusque and accusatory, e.g. why isn’t that number growing? it’s your fault, you should know this, I blame you, you’re bad at your job. Not might: it will. Try to immunize your communication from distortion by saying things like,

“Hey, I know this just got dropped in your lap, but do you have any idea why this number is so low?”
“This number seems lower than usual. I’m wondering if it’s due to this other thing we tried. Do you have any better ideas?”
“I know it isn’t exactly in your wheel house, but can you help me understand this?”
“I’m new to this system and still trying to figure out how it works. Should this number be going down like this?”

It may seem excessive and time consuming, but it will save you time and effort overall because you will have fewer miscommunications to debug. ☺️

Give people the opening to do better.

We tend to make up our minds about people very quickly, and see them through that lens from then on. It takes work to open our selves up again.

“Assume positive intent” is a laudable goal, but in practice falls short. If every word someone says sounds accusatory or patronizing to you, what are you supposed to do with that advice? Just pretend you don’t hear it, or tell yourself they mean well? That’s not sustainable; your anger will only build up.

But if you can hold just enough space for the idea that they might mean well, then you can give them the opportunity to clarify (and hopefully use different words next time). Like,

Person A: “Why is that number low?”
Person B: “I’m not sure.”
(pause)
Person B: “…. Hey, sorry to interrupt, but the story in my head is that you think owning that number is part of my job, and now you’re upset with me, or you think I’m incompetent at my job.”
Person A: “OMG no, not at all. I’m just trying to figure out who understands this part of the system, since it seems like none of us do! 😃 Sorry for stressing you out!”

and maybe next time it will start off like…

Person A: “Hey, do you have any idea why this number is low? It’s a mystery … nobody I’ve talked to yet seems to know.” 🙂

Remember the handicaps, value the effort.

Ever meet someone you didn’t like online, and realize they’re terrific in person? Online communication loses sooooo much in transit. Christine and I know each other extremely well, and still sometimes we realize we’re reading way too much into each other’s written words. That’s when we try to remember to move it to “mouth words”, aka zoom or phone. Not as good as in person, but eons better than text.

Once you’ve met someone in person, it’s usually easier to read their written words in their voice, too.

Some people just aren’t great at written communication. Some people have neurodiversities that make it difficult for them to hear tone. Some people have English as a second language. And so on. Do give points for effort; if they’re trying, obviously, they care about your experience.

To the best of your ability, try to resist reading layers of meaning into textual communication; keep it simple, overcommunicate intent, and ask for clarity. And if someone is asking you for clarity, help them do a better job for you.

Helicopter Management and Other Mistakes

June 19, 2023July 14, 2023 mipsytipsybusiness, careers, culture, management4 Comments

You are a freshly minted manager. You come full of rage and frustration at the poor management you’ve endured and witnessed in tech, and you are god damn determined not to repeat all of those mistakes.

You are tired of reporting to a manager who isn’t transparent with you, who hoards critical information and isn’t forthcoming about changes that impact you. You are tired of not being listened to or treated like a cog, so you swear to really listen and take your reports seriously.

You have seen sooooo many managers who failed to develop their people or sponsor them for growth opportunities, who blamed their team and hung them out to dry instead of having their back behind closed doors. Managers who didn’t seem to care about you as people, or who never made it feel safe to say, “I need a mental health day”. Managers who dangled the promise of a promotion, but even though you are doing the work, the recognition never comes.

Fuck that shit. You aren’t going to do ANY of it.

And … you don’t! 🎉

🌸🍃 You make time in your 1x1s to ask about their personal lives and hobbies — you are careful not to pry or be intrusive, just showing that you care. You urge them to take vacation, often. You remind them, firmly, not to be a hero. You model the behavior of taking mental health days to show that not only is it safe, but managers need them too.

🌸🍃 You ask them about their career goals and aspirations. You make it your personal mission to get them promoted, so you frequently check in with them to make sure they’re on the right path. You keep an eye out for things they do that are above and beyond, and for strengths that make them special. They always know you are on their side.

🌸🍃 If you hear about a clash or a conflict between them and another team member, you quickly jump in to figure out what’s going on and make sure it gets resolved, with each person feeling heard before the conflict has time to marinate or fester.

🌸🍃 When reviews comes around, you write warm, passionate essays for each of your direct reports, listing all the things they have done and all the ways they have grown. You go in to manager calibrations fully prepped to advocate for each of your people to get the promotions and rewards they so richly deserve.

🌸🍃 If someone on your team ends up needing more help, whether that’s keeping them productive and on track or helping with prioritization or conflicts… whatever it is, you are there to help turn the situation around. This person was struggling under their old manager, maybe even close to being let go, but under your care they are thriving.

🌸🍃 Nobody ever leaves your team. This is a point of quiet pride for you. People want to transfer to your team, but never from. There may have been a couple close calls, but you are always able to save the day by talking it out with the person and figuring out what they need in order to stay.

🌸🍃 You take pride in your transparency and the democratized ethos of your team, where you collectively determine your priorities and no one feels pressured into doing something they don’t want to do.

Bottom line: you are a GREAT manager.

Right???

After all, your team ranks sky high on every company survey on employee happiness, manager trust, and autonomy and sense of purpose. Your team fucking LOVES YOU. You’re pretty sure they would follow you to your next job, if you left. So maybe you ARE the world’s greatest manager.

Or maybe…you are heavily optimizing for one aspect of the management role, the part where you interface with your direct reports as an ally and coach. You might even be optimizing to the extent that you are neglecting or outright harming other aspects of the job.

Rookie Mistake #1: Only Managing Down

But management means coaching and supporting the people on your team, right? What else is there?

Well.. a lot, actually. Like, the business needs to succeed, for starters. ☺️ And there are a bunch of other relationships that matter besides your own direct reports. A good, strong manager needs to care about:

Goals and planning. Managers are generally responsible for crafting a team roadmap out of the impossible mess of company strategy requirements, requests from other teams, product roadmap commitments, and KTLO (keep the lights on) work. Some companies also use OKRs.
Right-sizing workloads. There is always at least 10x as much work to be done as cycles to do work, which means estimating how much your team can deliver, planning that work, and dealing with the inevitable surprises that come up during execution. How do you balance urgency vs importance? It is YOUR job to make sure your team isn’t overcommitted.
Stakeholder management. Does your team have a reputation for delivering quality work when they say they will, more or less on schedule? Are you a good neighbor to other teams, or do they feel like anything they ask for goes into a black hole? This is largely determined by you.
Managing up. Your manager relies on you to provide enough visibility into your team that they can (at minimum) make good decisions where your team is involved, and help head off any problems or conflicts before they escalate.
Managing up (another sort) is the relationships you build and impressions you leave with your skip-level and other adjacent leaders. You are your team’s representative and ambassador. Leaders form a view of the org based on scraps of data. For the sake of your team: give good scraps.
Managing ~~out~~ horizontally. Building great relationships and a web of mutual support with your peers. Sharing context with each other. Managers are like the nervous system, carrying signal from point to point.
Contributing to the organization your team sits in, and its standards, policies, and structural integrity. This is the one most likely to suffer if a manager is laser focused on their own team. This means things like…applying the job ladder fairly and consistently, without playing favorites. Engaging in a dialogue with the ladder rather than bending the rules or making an exception.

As a manager, you have been granted certain formal powers by the org, to be used for the benefit of the org. This means you have a responsibility to care for the organization, and your team within that context.

You shouldn’t be advocating for the benefit of your team members, you have a greater responsibility to the rules and categories of the system, which you collectively maintain and agree upon. The system can’t survive if every manager is gaming the rules on behalf of their team. The system only works if every manager is playing fair.

As a line manager, the work you do interfacing with your team will likely be 50-75% of your time and energy … and impact. But this ratio changes as you go up the org chart. As a VP? Maybe 10-20% of your energy and time can go to your direct reports.

The higher up the ladder you go, the less important your bedside manner becomes and the more important your strategic direction becomes. You are first and foremost responsible for the company’s success, not for your reports’ feelings and career development.

Rookie mistake #2: Helicopter management

If rookie manager mistake number one is thinking that management consists of coaching and interfacing with your team, mistake number two is closely related. Mistake number two is … overmanaging the team, coddling people, and basically never allowing anyone to fail. I think of it as “helicopter managing”.

Helicopter management consists of overly identifying with your team and their needs and wants, instead of taking a step back and considering them in the full context of the organization, or letting them take risks and stand on their own two feet. You’re their manager, not their babysitter.

I have a personal story to illustrate this.

Once upon a time, many years ago, I had a team member who was energetic, highly talented, and a little high strung. I ended up spending a lot of time managing their relationships with other team members, keeping them on track with their projects, and helping them manage their emotional state in general. They nearly left in a dudgeon one time, and I think most managers would have let them go, but I saved the day and they stayed. I was actually really proud of the fact that I had retained them and kept them high-functioning for years. If you asked me, I would have shelved this under my successes, maybe even “proudest manager moments”.

Years later, though, I look back on this situation through very different eyes. Yes, I retained them at the company / on the team, with decent relationships, and they did a lot of good work! But should I have?

At what cost?

Most weeks, I probably spent 50-75% of my total emotional bandwidth on this one person’s needs. For years. Is this the best thing I could have done for the company with all that time and energy? Probably not!

Was this the best thing I could have done for them? I don’t even think it was that either! All that my coddling ultimately did was teach them the wrong lessons, and prevent them from learning the right ones. It delayed those lessons by a few years, and made learning them all the more painful when they finally came.

There are no clear bright lines here. But it’s worth checking in with yourself from time to time, and asking hard questions.

You spent all that time coaching and doing a diving save to retain that person …. but should you have? Is this really the best place for them to be at this point in their career?
Or let’s say you managed to turn around someone’s performance from failing to succeeding. Great!! But are you confident they are set up to excel, or are they always going to be hovering on the lower bound of acceptable performance? Are you going to be having this same discussion again next quarter?
Consider all that time you spent intimately entwined with every detail of every technical project your entire team was working on, reading every PR and design doc. Should you have? Or did you unintentionally deprive them of some agency, while cheating yourself out of time you should have spent becoming a better leader, strengthening your org, or understanding next year’s challenges?
Are you giving people only positive feedback? This is a common rookie manager mistake, and it often comes from a place of kindness, or overcompensating for overly negative environments. But you are not only cheating your people of opportunities for growth, you are teaching them that growth is something to be feared and avoided.
Or are you cheerleading people so intensely that they come away with a lopsided view of how valuable or advanced their skills are? Are you promoting fast and loose, so they grow to equate promotions with career development? Have you been spoon feeding them growth, or are they developing autonomy over time?

This can be especially unfortunate at higher levels, where autonomy is part of the definition of being a senior+ engineer. You might be stifling them and not allowing them to exercise that agency, or even develop that skill. For senior contributors, autonomy is what they bring. You gotta let them do it.

This shit is challenging. There are no simple answers. The “right” answer is often only obvious in retrospect, months or years later. Everyone needs help sometime, some of us more than others, and that’s okay.

But is it sustainable? What price will you pay?

What I do know is that if you haul someone over the finish line, that is not a success. If you’re going to be having the same hard conversations with them again in one month, three months, six months…that is not success. If they are going to have a rough landing the next time they change teams, that is not success, nor is it in their best interest. And if your team is overly dependent on you, you aren’t actually doing your job.

And honestly? People really WANT to be challenged. They crave it! Or at least the people you want to work with do.

Rookie Mistake #3: Your view of the system is incomplete

I’m only going to touch on this one very briefly; it’s long and complicated, and probably deserves a post of its own.

Systems thinking is a core skill for both managers and engineers. It’s not a skill we are born with; it takes a lot of practice and failure to develop good instincts for debugging complex systems. As an engineering manager, you may have spent 10+ years writing software and learning how computers work, but you have hardly begun to understand how business and organizational systems work.

This explains a lot when it comes to the empathy gap between engineers and management, I believe. 🙃

We spend a lot of time talking about empathy these days — empathy between teams, people, neurotypes; holding space for the fact that nobody is always at their best, etc. Yet engineers can still be incredibly dismissive and judgey towards management actions and organizational decisions.

We see a decision that doesn’t make sense to us, or that we wouldn’t have made, and we write it off as being selfish, uninformed, incompetent, stupid, money-grubbing, bureaucratic, untrustworthy, craven, selling out. Or — maybe worst of all — we shrug and say something cynical about how this kind of thing always happens in business. Or they’re out to get us, or they never listen to us, or it shows how much they don’t give a shit about us..

Far be it to me to excuse corporate venality, or to try and blow smoke up your ass about your leaders’ motives. But in many, many of these situations, this actually represents a failure of systems thinking when it comes to imagining the complex business, corporate, and people systems your leaders are operating in.

When you find yourself thinking things like:

“Why am I hearing this feedback so late, in such a roundabout way? Why didn’t they just come to me right away and tell me directly, and I could have fixed it so much sooner!”
“Why would they hire someone external to fill that role, instead of promoting the person who has been doing the work just fine in the meantime? Typical exec move; they never see the potential in the people they have, they always want to get someone who has already done the job before.”
“Why is our roadmap changing yet again? Why is this getting dropped in our lap? Our director doesn’t seem to know anything about building good software.”
“Why didn’t I get invited to that meeting, when it was about MY budget and MY workload? You can’t even get a seat at the table around here unless you have a director title or report to the CEO.”
“Why is that person being given ANOTHER chance? If they weren’t a straight white guy, they would have been fired a year ago.”

… or anything else that boils down to “other managers are stupid, hypocritical, or bad at their jobs”, stop yourself, and first try to understand “under what circumstances might their action be a reasonable one, or even the right one?”

Approaching people systems problems with curiosity, empathy, and the full awareness that you may never know the entire story (and there may be good reasons for this!) will make you a better coworker and a much more effective leader.

And if you are working as a manager at a company where you have enough evidence to prove that you cannot, should not take such a generous view of your peers, then maybe don’t. Like, if you have a professional responsibility to represent an organization you can not ethically represent… I would suggest not doing that. ¯\_(ツ)_/¯ If you can.

Your View of Your Manager Is Incomplete

One of the most challenging things to deal with, in my (limited) experience as an exec, is when you have a manager who is well-liked by their team but fundamentally ineffective in the role, so you have to replace them. Then you are left with a team left feeling bereft and confused, and you can’t just give them a list of all the ways their manager was actually dropping the ball and not doing their job well, because people deserve some privacy and dignity. You pretty much just have to suck it up, and hope that you have enough trust banked for them not to quit.

It’s entirely possible for a manager to be beloved by their entire team, while having a corrosive effect on the system around them. Sometimes it’s their very willingness to bend the system for their people’s benefit that generates that loyalty.

An unwary manager may create a sort of island within the company, where the team does not feel part of — may even feel separate from, superior to, or suspicious of — the broader org or the company. Team members may feel like “this is the only team I would ever want to belong to at this company”, or “my manager is great, they protect us from all the big company bullshit”, or “it’s us against the world”, or “nobody understands us, except our manager”. These are seductive dynamics to slip into because tribalism is so powerful. The more apart you feel from the company, the more tightly you may bond with each other.

I’m not judging you. I was that manager at Parse, to some extent, after the Facebook acquisition. I did not give a shit about the org, I only cared about my team. So…I get it. I still wouldn’t want me as a manager in my org.

What would you do?

Put yourself in your senior leadership’s shoes. What would YOU do if you had to choose between a good, reliable manager who gets the job done for the org and isn’t particularly beloved by their reports (she’s not awful, the team thinks she’s “fine”), vs a manager who is beloved by their reports and cares about their career growth deeply, but is weak at everything else? What will manager #2 cost the rest of the org, and who will bear those costs?

Don’t make your leadership make that choice. Be a manager who is good to your team, and good to the organization too.

Honestly, it is healthy for a manager to not identify too closely with their team. You should stand with them, but a step or two apart. After all, your job is to be pushing or pulling them in a direction, not just standing still and … marinating.

If you identify with them too closely, it can get very hard to tell your reports hard things. You may empathize with them so much that you put their feelings above the need to get shit done. You can be friends with your team, just like parents can be friends with their children, but the friendship doesn’t come first. Your formal role comes first.

On being a new manager who cares a lot

There’s something really beautiful about the energy and dedication that brand new managers bring to the role. Some of the most spectacular results I’ve ever seen for individual team members have come from teams with first time managers, who are determined and idealistic and pouring their whole heart and soul into the people reporting to them. They haven’t yet learned to pace themselves or to be more well-rounded with their time and energy, and sometimes a person can soak up that attention and turn a failing situation around into a thriving one.

I would never tell a manager that they should care less. Caring for people is the beating heart of this job. It doesn’t matter how efficient and effective you are at delivering feedback, managing people out, and planning roadmaps if you don’t truly give a shit about the people you serve. Even as you rise in the ranks and people-interfacing becomes a smaller % of what makes you good at your job, caring is still absolutely essential.

So I hope the message of this post doesn’t come across as “you think you’re a good manager, but here’s why you actually suck”. ☺️ You got into this role because you cared, and this is valuable. Never lose touch with it.

The message is simply that it took me years and years to learn that there is more to being a great manager than caring about my team. I hope you can learn it faster than I did.

Questionable Advice: “How can I drive change and influence teams…without power?”

June 5, 2023July 14, 2023 mipsytipsyadvice, deploys, devops, engineering, influence, management1 Comment

Last month I got to attend GOTO Chicago and give a talk about continuous deployment and high-performing teams. Honestly I did a terrible job, and I’m not being modest. I had just rolled off a delayed redeye flight; I realized partway through that I had the wrong slides loaded, and my laptop screen was flashing throughout the talk, which was horribly distracting and means I couldn’t read the speaker notes or see which slide was next. 😵 Argh!

Anyway, shit happens. BUT! I got to meet some longstanding online friends and acquaintances (hi JJ, Avdi, Matt!) and got to eat some of Hillel Wayne’s homemade chocolates, and the Q&A session afterwards was actually super fun.

My talk was about what high performing teams look like and why it’s so important to be on one (spoiler: because this is the #1 way to become a radically better engineer!!). Most of the Q&A topics therefore came down to some version of “okay, so how can I help my team get there?” These are GREAT questions, so I thought I’d capture a few of them for posterity.

But first… just a reminder that the actual best way to persuade people to listen to you is to make good decisions and display good judgment. Each of us has an implicit reputation score, which formal power can only overcome to an extent. Even the most junior engineer can work up a respectable reputation over time, and even principal engineers can fritter theirs away by shooting off at the mouth. 🥰

“how can I drive change when I have no power or influence?”

This first question came from someone who had just landed their first real software engineering job (congrats!!!):

“This is my first real job as a software engineer. One other junior person and myself just formed a new team with one super-senior guy who has been there forever. He built the system from scratch and knows everything about it. We keep trying to suggest ideas like the things you talked about in your talk, but he always shoots us down. How can we convince him to give it a shot?”

Well, you probably can’t. ☺️ Which isn’t the end of the world.

If you’re just starting to write software every day, you are facing a healthy learning curve for the next 3-5 years. Your one and only job is to learn and practice as much you possibly can. Pour your heart and soul into basic skills acquisition, because there really are no shortcuts. (Please don’t get hooked on chatGPT!!)

I know that I came down hard in my talk on the idea that great engineers are made by great teams, and that the best thing most people can do for their career is to join a high-performing, fast-moving team. There will come a time where this is true for you too, but by then you will have skills and experience, and it will be much easier for you to find a new job, one with a better culture of learning.

It is hard to land your first job as a software engineer. Few can afford to be picky. But as long as you are a) writing code every day, b) debugging code every day, and c) getting good feedback via code reviews, this job will get you where you need to go. When you’re fluent and starting to mentor others, or getting into higher level architecture work, or when you’re starting to get bored … then it’s time to start looking for roles with better teachers and a more collaborative team, so your growth doesn’t stall. (Please don’t fall into the Trap of the Premature Senior.)

This is an apprenticeship industry. You’re like a med student right now, who is just starting to do rounds under the supervision of an attending physician (your super-senior engineer). You can kinda understand why he isn’t inclined to listen to your opinions on his choice of stethoscope or how he fills out a patient chart. A better teacher would take time to listen and explain, but you already know he isn’t one. 🤷

I only have one piece of advice. If there’s something you want to try, and it involves doing engineering work, consider tinkering around and building it after hours. It’s real hard to say no to someone who cares enough to invest their own time into something.

“how can I drive change when I am a tech lead on a new team?”

“I have the same question! — except I’m a tech lead, so in theory I DO have some power and influence. But I just joined a new team, and I’m wondering what the best way is to introduce changes or roll them out, given that there are soooo many changes I’d like to make.”

(I wrote a somewhat scattered post a few years ago on engineers and influence, or influence without authority, which covers some related territory.)

As a tech lead who is new to a team, busting at the seams with changes I want to make, here’s where I’d start:

Understand why things are the way they are and get to know the personalities on your team a bit before you start pitching changes. (UNLESS they are coming to you with arms outstretched, pleading desperately for changes ~fast~ because everything is on fire and they know they need help. This does happen!)
Spend some time working with the old systems, even if you think you already understand. It’s not enough for you to know; you need to take the team on this journey with you. If you expect your changes to be at all controversial, you need to show that you respect their work and are giving it a chance.
Change one thing at a time, and go for the developer experience wins first. Address things that will visibly pay off for your team in terms of shipping faster, saving time, less frustration. You have no credibility in the beginning, so you want to start racking up wins before you take on the really hard stuff.
Roll up your sleeves. Nothing buys a leader more goodwill than being willing to do the scut work. Got a flaky test suite that everybody has been dreading trying to fix? I smell opportunity…
Pitch it as an experiment. If people aren’t sold on your idea for e.g. code review SLAs, ask if they’d be willing to try it out for three weeks just as an experiment.
Strategically shop it around to the rest of the team, if you sense there will be resistance…

At this point in my answer 👆 I outlined a technique for persuading a team and building support for a plan or an idea, especially when you already know it’s gonna be an uphill battle. Hillel Wayne said I should write it up in a blog post, so here it is! (I’ll do anything for free chocolate 😍)

“How can I get people on board with my controversial plan?”

So you have a great idea, and you’re eager to get started. Awesome!!! You believe it’s going to make people’s lives better, even though you know you are going to have to fight tooth and nail to make it happen.

What NOT to do:

Walk into the team meeting and drop your bomb idea on everyone cold:

“Hey, I think we should stop shipping product changes until we fix our build pipeline to the point where we can auto-deploy each merge set to production, one at a time, in under an hour.” ~ (for example)

…. then spend the rest of the hour grappling with everybody’s thoughts, feelings, and intense emotional reactions, before getting discouraged and slinking away, vowing to never have another idea, ever again.

What to do instead:

Suss out your audience. Who will be there? How are they likely to react? Are any of them likely to feel especially invested in the existing solution, maybe because they built it? Are any of them known for their strong opinions or being combative?

Great!!! Your first move is to have a conversation with each of them. Approach them in the spirit of curiosity, and ask what they think of your idea. Talking with them will also help you hash out the details and figure out if it is actually a good idea or not.

Your goal is to make the rounds, ask for advice, identify any allies, and talk your idea through with anybody who is likely to oppose you…before the meeting where you intend to unveil your plan. So that when that happens, you have:

given people the chance to process their reactions and ask questions in private
ensured that key people will not feel surprised, threatened, or out of the loop
already heard and discussed any objections
ideally, you have earned their support!

Even if you didn’t manage to convince every person, this was still a valuable exercise. By approaching people in advance, you are signaling that you respect them and their voice matters. You are always going to get people’s absolute worst reactions when you spring something on them in a group setting; any anxiety or dismay will be amplified tenfold. By letting them reflect and ask questions in private, you’re giving time for their better selves to emerge.

What to do instead…if you’re a manager:

As an engineer or a tech lead, you sometimes end up out front and visible as the owner of a change you are trying to drive. This is normal. But as a manager, there are far more times when you need to influence the group but not be the leader of the change, or when you need to be wary of sounding like you are telling people what to do. These are just a few of the many reasons it can be highly effective to have other people arguing on your behalf.

In the ideal scenario, particularly on technical topics, you don’t have to push for anything. All you do is pose the question, then sit back and listen as vigorous debate ensues, with key stakeholders and influential engineers arguing for your intended outcome. That’s a good sign that not only are they convinced, they feel ownership over the decision and its execution. This is the goal! 🌈

It’s not just about persuading people to agree with you, either. Instead of having a shitty dynamic where engineers are attached to the old way of doing things and you are “dragging them” into the newer ways against their will, you are inviting them to partner with you. You are offering them the opportunity to lead the team into the brave new world, by getting on board early.

(It probably goes without saying, but always start with the smallest relevant group of stakeholders, and not, say, all of engineering, or a group that has no ownership over the given area. 🙃 And … even this strategy will stop working rather quickly, if your controversial ideas all turn out to be disastrous. 😉)

“How do I know where to even start?!? 😱”

Before I wrap up, I want to circle back to the question from the tech lead about how to drive change on a team when you do have some influence or power. He went on to say (or maybe this was from a third questioner?*):

“There is SO MUCH I’d like to do or change with our culture and our tech stack. Where can I even start??”

Yeah, it can be pretty overwhelming. And there are no universal answers… as you know perfectly well, the answer is always “it depends.” ☺️ But in most cases you can reduce the solution space substantially to one of the two following starting points.

1. Can you understand what’s going on in your systems? If not, start with observability.

It doesn’t have to be elegant or beautiful; grepping through shitty text logs is fine, if it does the trick. But do any of the following make you shudder in recognition?:

If I get paged, I might lose the rest of the afternoon trying to figure out what happened
Our biggest problem is performance and we don’t know where the time is going
We have a lot of flaky, flappy alerts, and unexplained outages that simply resolve themselves without our ever truly understanding what happened.

If you can’t understand what’s going on in your system, you have to start with instrumentation and observability. It’s just too deadly, and too risky, not to. You’re going to waste a ton of time stabbing around in the dark trying to do anything else without visibility. Put your glasses on before you start driving down the freeway, please.

2. Can you build, test and deploy software in under an hour? If not, start with your deploy pipeline.

Specifically, the interval of time between when the code is written and when it’s being used in production. Make it shorter, less flaky, more reliable, more automated. This is the feedback loop at the heart of software engineering, which means that it’s upstream from a whole pile of pathologies and bullshit that creep in as a consequence of long, painful, batched-up deploys.

Here’s a talk I’ve given a few times on why this matters so much:

You pretty much can’t fail with one of those two; your lives will materially improve as you make progress. And the iterative process of doing them will uncover a great deal of shit you should probably know about.

Cheers! 🥂

charity.

* My apologies if I remembered anyone’s question inaccurately!

Ritual Brilliance: How a pair of Shrek ears shaped Linden Lab culture by making failure funny — and safe

May 30, 2023February 19, 2025 mipsytipsyblameless retros, culture, devops, linden lab2 Comments

[Originally posted on the now-defunct “Roadmap: A Magazine About Work” website, on May 30th, 2023. A pretty, nicely-formatted PDF version of this article can be downloaded here. Thanks to Molly McArdle for editing!]

If you talk to former Lindens about the company’s culture—and be careful, because we will do so at length—you will eventually hear about the Shrek ears.

When you saw a new person wearing the Shrek ears, a matted green-felt headband with ogre ears on it, you introduced yourself, congratulated them warmly, and begged to hear the story of how they came to be wearing them. Then you welcomed the new person to the team (“You’re truly one of us now!”) and shared a story about a time when you did something even dumber than they did.

My first job after (dropping out of) college was at Linden Lab, the home of Second Life. I joined in 2004 and stayed for nearly six years, during which the company grew from around 25 nerds in a room to around 400 employees who worked out of offices in Brighton, San Francisco, Menlo Park, and Singapore, or their own homes—wherever they were.

When I think back on that time now, almost two decades later, I’m puzzled by the Shrek ears phenomenon. I wasn’t exactly powerful then, at barely 20 years old. Not only was this my first real job, I was also the first woman engineer, and I made tons of mistakes. Shouldn’t I have found the practice of being systematically singled out and spotlighted for my errors humiliating, shaming, and traumatic?

Yet I remember loving the tradition and participating with joy and vigor. Everyone else seemed to love it, too. The practice spread beyond engineering and out into the rest of the company, not by fiat but because individual people would voluntarily track down the Shrek ears and put them on their own head. (I’m not imagining this, right?)

Step 1, break production; Step 2, put on Shrek ears

Here’s how it worked: The first time an engineer broke production or caused major outage, they would seek out the ears and put them on for the day. The ears weren’t a mark of shame—they were a badge of honor! Everyone breaks production eventually, if they’re working on something meaningful.

If people saw you wearing the ears, they would eagerly ask, “What happened? How did you find the problem? What was the fix?” Then they would regale you with their own stories of breaking production or tell you about the first outage they caused. If the person was self-flagellating or being too hard on themselves, the Shrek ears gave their colleagues an excuse to kindly but firmly correct it on the spot. It was Linden’s way of saying, Hey, we don’t do that here: “You did the reasonable thing! How can we make the system better, so the next person doesn’t stumble into the same trap?”

In those days, Linden was running a massively distributed system across multiple data centers on three continents, and doing so without the help of DevOps, CI/CD, GitHub, virtualization, the cloud, or infrastructure as code. We had an incredibly high-performing operations team, with a thousand-to-one server-to-ops engineer ratio, which was a real achievement in the days when the role required doing everything from racking and stacking boxes in the colocation center to developing your own automation software.

Failures were just fucking inevitable. In a world like that, devoid of the entire toolchain ecosystem we’ve come to rely on, you just had to learn to roll with it, absorb the hits, and keep moving fast. You could only test so much in staging; it was more important to get it out into production and watch it—understand it—there. It was better to invest in swift recovery, graceful degradation, and decoupling services than to focus on trying to prevent anything from going wrong. (Still is, as a matter of fact.)

This might all sound a little overwrought to you—maybe even dangerous or irresponsible. Didn’t we care about quality? Were we bad engineers?

The Shrek ears were “blameless retros” before there were blameless retros

I assure you, we cared. The engineers I worked with at Linden were of at least as high a caliber as the engineers I later worked with at Facebook (and a whole lot more diverse). In this specific place and time, the Shrek ears were what we needed to alleviate paralysis and fear of production, and to encourage the sharing of knowledge—even if anecdotal—about our systems.

In retrospect, the Shrek ears were a brilliant piece of social jujitsu. There was an element of shock value or contrarianism in celebrating outages instead of getting all worked up about them. But the larger purpose of the ears was to reset people’s expectations (especially in the case of new hires) and reprogram them with a different set of values: Linden’s values.

In the years since those early days at Linden, the industry has developed an entire language and set of practices around dealing with the aftermath of incidents: blameless post mortems, retrospectives, and so on. But those tools weren’t available to us at the time. What we did have was the Shrek ears. A couple of times a month something would break, the ears would be claimed, and we would all go around reminding one another that failure is both inevitable and ridiculous, and that no one is going to get mad at you or fire you when it happens.

Failure is always a question of when, not if

It’s important to note that you never saw anyone get teased or shamed for wearing the ears or for breaking production. There was a script to follow, and we all knew it. We learned it from watching others put on the ears, or by donning them ourselves. On a day when the Shrek ears had appeared, people would gather around at lunch or at the bar after work and swap war stories, one-upping one another and laughing uproariously.

Every new engineer was told, “If you never break production, you probably aren’t
doing anything that really matters or taking enough risks.”

It’s also important to emphasize that the ears were opt-in, not opt-out. You didn’t have to do it. And if you did take them, you could expect a wave of sympathy, good humor, and support. It affirmed that you deserved to be here, that you were part of the team.

And though the Shrek ears started in engineering, people in sales, marketing, accounting, and other departments picked them up over the years. It was a process of voluntary adoption, not a top-down policy. Someone would announce in IRC that they were wearing the ears today, and why. The camaraderie and laughter that ensued were infectious—and made it easier and easier over time for people to be transparent about what wasn’t working.

Rituals exist to instill values and train culture

In Rituals for Work, Kursat Ozenc defines rituals as “actions that a person or group does repeatedly, following a similar pattern or script, in which they’ve imbued symbolism and meaning.” Ritual exists to instill a value, create a mindset, or train a reflex.

And this particular ritual was extremely effective at taking lots of scared engineers and teaching them, very quickly:

✨ It is safe to fail✨
✨ Failure is constant✨
✨ Failure is fucking hilarious✨

At Linden, failure was not something to be ashamed of or to hide from your teammates. We understood that it’s not something that happens only to careless or inexperienced people. In fact, the senior people have the funniest fuckups—because what they are trying to do is insanely hard. The Shrek ears taught us that you fail, you laugh, you drink whiskey, you move on.

Other companies had similar rituals around the same time—Etsy famously had the “three-armed sweater,” which they would pass around to whoever had last broken production. But I’ve never again worked at a place where mistakes were discussed as freely and easily across the entire company as they were at Linden Lab. And I think the Shrek ears had a lot to do with that.

Their point was never to single out the person who had made a mistake and humiliate them, but the exact opposite. By putting on the ears, you said not just “Hi, I made a mistake” but also “I’m going to be brave about it, so we can all collectively learn and improve.” It was a ritualized act of bravery rewarded by affirmation, empathy, and acceptance. At Linden, the Shrek ears weren’t just a terrific tool for promoting team coherence and creating a sense of belonging. They also provided structure to help individuals and teams recover from scary events, and even traumas.

In so many ways, Linden Lab was ahead of its time

Linden was an extremely strange workplace when I was there, and it inspired unusually strong devotion, which we self-deprecatingly referred to as “the Kool-Aid.” It can be difficult to convey just how radical and weird it was at the time because the world has changed so much since then, and so many of the company’s “weird” philosophies have since gone mainstream. (Though not all: using “Kool-Aid” as a casual phrase to denote “excessive enthusiasm” or “cult-like devotion” is now recognized by many as being in poor taste. After all, people actually died at the Jonestown massacre.)

In a lot of ways, Linden culture (and Second Life technology) was profoundly, recognizably modern, and similar to the best workplaces of today, 20+ years later.

Philip Rosedale, Linden’s founder and CEO, is an inventor and technologist who believed it was every inch as interesting and important to experiment with company culture as with the virtual worlds we built. Except we did it all from scratch: building the technology and the culture together. And this led us down some weird rabbit holes, such as a cron job that rsynced the entire file system down over thousands of live servers every night. And the Shrek ears.

There was a period when “Choose your own work” was a company core value, and there were effectively no managers. (Not every experiment worked!) We went all-in on a fully distributed company culture at a time when practically no one else had. We ran a massively distributed, high-concurrency virtual world at a time before microservices, sharded databases, config management virtualization, AWS, or SRE and DevOps.

I can understand why people now find this story horrifying

With the distance of time, I get why the Shrek ears might make you recoil. If you think “That sounds awful! What kind of monsters would do that to each other?”—you are far from alone. Any time I mention the story in public, a sizable minority of people are aghast and appalled. Representative quotes include:

“I hope you realize how many people you traumatized by doing this to them.”

“I wonder how many introverted people found this excruciating but were too
afraid to say so.”

“Office bullying is fucked up even with cute Shrek ears.”

Even:

“We heard about the Shrek ears from an engineer we interviewed. He was telling us how great they were, but we were all so horrified that we declined to hire him because of it.”

And they’re right. It sounds awful to us now. It really does! It sounds like we were singling people out for their failures, like a dunce cap. I wouldn’t be surprised to someday learn that, in fact, a small number of people did felt pressured into using the ears, or hated them and were too afraid to say something. But how do we account for the fact that this tradition was so deeply beloved by so many—and that we are still fondly reminiscing about it more than 15 years later? It had a purpose.

Linden Lab was an incredibly progressive company for its time: very anti-hierarchical, very much about empowering people to be creative and independent. It also was by far the most diverse company I’ve ever worked in (other than Honeycomb, which I cofounded and where I’m CTO), with lots of women and genderqueer and trans people and people of color. We were way out on the sensitive branch relative to tech at that time. It’s tough to square this knowledge of what Linden was like as a place with the reactions some people outside the organization have to the Shrek ears.

I think this is, above all, a sign of progress. So many questionable practices that were ordinary back then—like referring to everyone as “guys,” using terms like “master/slave” for replication, or throwing alcohol-sloshed parties—are now rightfully frowned upon. We have become more sensitive to people’s differences and more clued into the power dynamics of the workplace. It’s far from perfect, but it is a lot better.

As a ritual, the Shrek ears were powerful and did the job. They were also fun—proving once again that making something goofy is the best way to make it stick. But I can’t imagine plopping Shrek ears on a new hire who has just broken production in 2023. And honestly, I think that’s probably a good thing. It’s time for new rituals.

Choose Boring Technology Culture

May 1, 2023July 14, 2023 mipsytipsybusiness, culture, startupsLeave a comment

Honeycomb recently announced our $50M Series D funding round. We aren’t the type to hype this a lot; Emily summed it up crisply as, “Living another day on someone else’s money isn’t business success, even though it is a lovely vote of confidence.”

Agreed. The vote of confidence does mean more than usual, given the dire state of VC funding these days, but…raising money is not success. Building a viable, sustainable company is success.

Whenever we are talking to investors, something that inevitably comes up is what a bomb ass team 🌈 we have. They have always been impressed by our ability to recruit and retain marquee names, people we “shouldn’t have been able to get” at our stage; honestly it’s even better than they realize, because we have heavy hitters all up and down the company, most of whom simply aren’t as well known. 😉 (Fame, and this may shock you, is not a function of talent.)

People join Honeycomb for many reasons, but “culture” is one of the most commonly cited. We have never been shy about talking about the ways we think tech culture sucks, or the experiments we are running. But this has given rise to the occasional impression that we are primarily cultural innovators who occasionally write software. We really aren’t.

In fact, I’d say the opposite is true. We try to choose boring culture.

What The Fuck Does “Culture” Even Mean?

Ok, so this is where the problem starts. This is why it grates on my nerves any time someone starts making pronouncements about how “your culture is bad”, “culture is the problem”, “fix your broken culture”… AUUGGGHHH. Those sentences are MEANINGLESS.

What does “culture” even mean?? Let’s consult the interwebs:

Culture: “An umbrella term which encompasses the social behavior, institutions and norms for a group; knowledge, beliefs, arts, laws, customs, capabilities, and habits of those individuals”
Culture: “The shared values, goals, attitudes and practices that characterize an organization; working environment, company policies and employee behavior”
Culture: “Maintain tissue cells, bacteria, etc in conditions suitable for growth”

Well, at least that last one makes sense. 😛 But if culture means everything, then culture means nothing. That’s just not helpful!

Instead, let’s disambiguate company culture into two categories. There is the formal culture of the organization (meetings, mission/vision, management, job ladders, hiring practices, strategy, organizational structure, team dynamics, and so on), and there is the informal culture of the people, the ways that humor, playfulness, and practices manifest in groups and individuals.

Organizational culture is professional, formal, structural, institutional. Managerial responsibilities, promotions, compensation plans, and fiduciary duties are just a few of the .many aspects of organizational culture.

Informal culture is chaotic, joyful, free-spirited, and fun, individualized, inherently anarchic and bottoms-up. It’s things like writing release notes in limerick form, bringing banana bread to work after an outage, long pun threads, slack channels dedicated to pets, competing on the number of employees named “Jess”* vs “Chris”*.

Organizational culture is the cake; informal culture is the frosting. Organizational culture is what leaders are hired to build, informal culture is what bubbles up irrepressibly in the gaps. (I wish I had better names for these!) And when it comes to formal, organizational culture, you don’t want to be in the business of innovating.

Culture Serves The Business

As a leader, you should absolutely care about your culture, but your primary responsibility is the health of the business. The purpose of your culture is to make your business succeed. It does not serve you, and it does not serve the people you care about, to be unclear on this front.

I don’t mean to make it sound like this is simple or easy. It is not. You are dealing with people’s lives and livelihoods, and it is all about tradeoffs. What might be best for an individual in the long run (for example, leaving your company to pursue another opportunity) might harm your business in the near term. Yet you might decide to celebrate them in leaving and not pressure them to stay, because you believe that what’s best for your business in the longer term is for employees to be able to trust their managers when they say, “I believe that working here is the best thing you can do for your career right now.”

The transactional nature of work relationships is how they differ from e.g. family relationships. You can form intense bonds and deep friendships with the people you work with — you may even form bonds that transcend your work relationship — but your relationship at work comes with terms and conditions.

Your company culture can’t be everything to everyone. Nor should you try.

You HAVE to care more about the health of the business than about culture for culture’s own sake. Even if — especially if — you have lots of strong opinions about culture, and there are lots of ways you want to deviate from common wisdom. Doing well at business is what earns you more innovation tokens to invest.

“Choose Boring Technology Culture”

Dan McKinley coined the phrase “choose boring technology” and the concept of innovation tokens nearly a decade ago.

“Boring” should not be conflated with “bad.” There is technology out there that is both boring and bad [2]. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough….The nice thing about boringness (so constrained) is that the capabilities of these things are well understood. But more importantly, their failure modes are well understood. — @mcfunley

The moral of the story is that innovation is costly, so you should choose standard, well-understood, rock-solid technologies insofar as you possibly can. You only get a few innovation tokens to spend, so you should spend them on technologies that can give you a true competitive advantage — not on, like, reinventing memcache for the hell of it.

The same goes for running a business, and the same goes for organizational culture. We have collectively inherited a set of default practices that work pretty well, like the 40 hour work week and having 1x1s with your manager. You CAN choose to do something different, but you should probably have a good reason. To the extent that you can learn from other people’s experience, you probably should, whether in business or in tech; innovation is expensive, and you only get so many tokens. Do you really want to spend one on a radical reinvention of your PTO policy? How does that serve you?

Innovation gets all the headlines, but I would posit that what most companies need is actually much simpler: organizational health.

Great Culture begins with Organizational Health

There’s this book by Patrick Lencione called “The Advantage: How Organizational Health Trumps Everything Else in Business.” (He is best known for writing “Five Dysfunctions of a Team“.) This guy is to organizational health what James Madison was to constitutional government: a very specific kind of genius.

I picked up “The Advantage” in 2020, around the time Honeycomb stopped teetering on the brink of failure, once it became clear we were likely to be around for a while. It made a huge impression on me. He makes the case that most businesses spend a ton of energy on trying to be “smart”, and relatively little on being “healthy”.

Healthy orgs are characterized by minimal politics, minimal confusion, high morale, high productivity, and low turnover. Health begets — and trumps — intelligence.

As Lencione says, an organization that is healthy will inevitably get smarter over time. People in a healthy organization will learn from each other, identify problems, and recover quickly from mistakes. Without politics and confusion, they will cycle through problems and rally much faster than dysfunctional rivals will. And they create an environment in which everyone else can do the same, which creates a multiplier effect.

The healthier an org is, the more of its collective intelligence it is able to tap into and use. Most orgs exploit only a fraction of the knowledge, experience, and intellectual capital available to them, but the healthy ones can tap into almost all of it.

Organizational Health Is Too Boring

No one would disagree with any of this, in principle. ☺️ EVERYBODY wants to work at a place where the mission, vision, and values are clear, meaningful and inspiring; where everyone is rallied around the same winning strategy; and where it’s crystal clear how your role specifically will contribute to that success. Everybody agrees that a healthy organizational culture leads to better outcomes.

So why isn’t every company like that?

Well, it is much easier said than done. ¯\_(ツ)_/¯ It is unglamorous work, difficult to measure, and at the end of the day we are always making risky decisions between conflicting tradeoffs based on partial information. We are imperfect meat sacks who lack self awareness, struggle to understand each other, and get hangry and snap. And the job is never done. You never “get there”, any more than you are ever perfectly healthy with perfect relationships.

But that doesn’t mean we shouldn’t try. We don’t have to be perfect to be a meaningfully better presence in people’s lives. We just have to be healthy enough to achieve our goals.

Nobody Wants An “Exciting” Company Culture

When you tell your partner you had an exciting day at work, do they respond with “uh oh 😬🔥🧯”?

All too often, excitement at work comes from strategic swerves, projects getting canceled, lack of focus, missteps or conflicts, anxiety and passive aggression, outages or downtime, outrageous demands coming from out of left field, or getting information at the last minute that you should have had ages ago. Living on the edge of your seat can be very stimulating! Firefighting is a huge rush, and if you’re part of the essential glue holding this creaky vessel together, you can get hooked on feeling desperately needed.

But this isn’t good for your cortisol levels, and it doesn’t move the company forward. When so much of your energy goes to bailing water and staying afloat, you don’t have much left over for rowing the oars. You want energy going to the oars.

Should work be exciting? ¯\_(ツ)_/¯ It’s not the adjective I would reach for. Emotional rollercoaster rides don’t provide the kind of circumstances that tend to unlock great design or engineering, or collaboration or focus. I would rather reach for words like achievement, fulfillment, pride, comradeship, or the joy of being part of something greater than yourself, not “exciting” or “fun”.

Leaders Worry Too Much About Making Work Fun

As a leader, your job is not to “make work fun”. You are not here to entertain your employees. Your responsibility is to build a formal culture that works, that supports the success of the business.

So what, am I dooming you to a life of bureaucratic beige and meetings without puns? Fuck no.

If formal organizational culture is like the architecture, then informal culture is furnishings, light displays, murals and banners — whatever you do on the inside. You don’t want someone getting overly creative with the load-bearing beams. Save that for when it’s time to paint “Frozen” murals on the walls and hang the matching icicle curtains.

You want formal culture to be boring, stable, reliable, load-bearing…because this creates a safe structure for people to bring the humor, the fun, the joy, the delight, without any fear of building collapse. The company doesn’t have to bring the fun; people bring the fun. Have you met people? People are fucking weirdos. 🥰 If you create an emotionally safe zone and the conditions for success, informal culture will thrive. 🪅People bring the fun🪅 — they always do.

The best informal culture is almost always bottoms up. But managers, execs, HR/People teams, etc can encourage informal culture. One of the most powerful things you can do is just participate. Show up for drinks, play the board games, keep the puns rolling, get silly with your team! Your participation gives people permission and shows that you value their creative cultural labor at work.

Of course there are no bright lines — companies can throw great parties! — but that’s not the job; building a healthy org is the job. Doing that right frees people up to have joy at work. It makes the celebrations that much bigger, the fun that much funnier.

Success Is Rocket Fuel For Fun

Think back to the most corporate fun you’ve ever had at work — the biggest parties, celebrations, blowouts, etc. Were they holiday parties and random occasions, or were they actually linked to great achievements? I bet they were the latter.

You don’t gather at work for the fun of it, you come together to do great things. It stands to reason the peak moments of joy and bonding are fueled by a sense of accomplishment.

Even on a smaller scale, levity and joy are inextricably linked to doing great work and making customers happy. For example, ops/SRE teams are notorious for their gallows humor around outages (ops is ALWAYS the funniest engineering team, in my experience ☺️). But dark humor is only funny when you are also taking your work seriously. Joking about the inevitability of data loss stops being funny real fast if you are actually playing fast and loose with customer backups.

In the absence of success, progress, and high performance, the kind of “frosting” behaviors that bring so much hilarity to work — joking and teasing, puns and stories — actually stop being fun and start making people feel distracted, irritated and on edge. You don’t want to hear a steady stream of jokes from somebody who keeps letting you down.

Side note: unhealthy orgs may have pockets of humor, but it often comes at the expense of other, less prestigious teams. Lots of people may feel too anxious, powerless or threatened to participate. Your experience of whether those companies are “fun” or not is likely to depend heavily on where you sit in the hierarchy. But a healthy org creates the level conditions for humor, playfulness and creativity throughout the org.

Investing Your Innovation Tokens

So yeah. Despite our reputation for cultural innovation, I’d say we’re actually pretty conservative when it comes to operating a company.

Not only are we not revolutionaries, we are actually trying to do as little differently as possible, because innovation is costly!! Instead, we (as a leadership team) are more focused on trying to execute well and improve upon our organizational health. For the past year, we have been laboring especially hard over strategy — the diagnosis, guiding policy, and set of coherent actions we need to win. Our first responsibility is to make the business succeed, after all.

Which brings us back to the topic of innovation tokens.

I started writing down some of the innovation tokens I feel like we’ve spent. But it dawned on me that when I look at most of the cultural experiments we run, and the things we talk about and write about publicly — stuff like the dangers of hierarchy, hiring, interviewing, high-leverage teams, engineering levels, rituals for engineering teams, etc — it doesn’t feel like innovation at all. It’s all just about trying to have a healthier organization. Hierarchy sucks because visible hierarchy has been shown to dampen people’s creativity, motivation and problem-solving skills. Engineering levels are important because they bring clarity. And so on.

What makes something rise to the level of an innovation token is the amount of time we end up asking other people to invest lots of their time into.

Like, we are 1.5 years into a 4 year experiment having an employee on our board of directors. We are about to spin up an internal Advisory Panel to more broadly distribute the impact of our employee board member around the company.
In the past, we have experimented with regular ethics discussion groups.
Last year we did a deep dive into company values with small breakout groups.
Some internal decisions around things like values are handled, not by estaff, but by a group of six people; one employee representative of each org, nominated by their VP; who do a deep dive into the material together and come back with a decision or recommendation.
We are about to start the process of developing our own leadership curriculum. We know that we need to equip our managers with better tools, and culturally indoctrinate new employees, so I am excited to build something with our cultural fingerprints all over it.

We run a lot of experiments around transparency, like, the agenda for exec staff meetings can be viewed by the whole company. After every board meeting, we present the same thing we showed to the board to the whole company during all-hands. We are transparent on salary bands. Stuff like that.

We are far from perfect; we have a long ways to go, and when I look around the org it’s hard not to only see all the work left to be done. But we are a lot healthier and better off than we were a year ago, which was better off than we were two years ago, let alone three.

The Experience Of Making This Will Be With Us Forever

A few months ago I was reading this lengthy profile of Sarah Polley in the New Yorker, as she was doing a bunch of press for her new movie, “Women Talking”. (The movie itself sounds incredibly intense; I am still trying to find time and emotional energy to watch it. Someday!)

One thing she said got lodged in my brain, and I’ve been unable to forget it ever since. She’s talking about the experience of having been a child actor, and how intensely it informs the experience she strives to create for everybody working on the set of one of her movies; where parents get to go home and have dinner with their kids, etc.

[He] told her, “If this film is everything we want it to be, maybe, if we are very lucky, it will affect a few people for a little while, in a way that is out of our control. The only thing that’s certain is that the experience of making it will be with all of us—it will become part of us—forever. So we must try our best to make it a good experience.”

Making a movie that lots of people want to see, one that was a good financial return on investment, buys you the ability to make even more movies, employ more people, take even bigger creative risks. If all you want to do is be a niche indie player, working on a shoestring budget, more power to you. But if you really believe in your ideas, and you want to see them go mainstream … you need mainstream success.

Sarah Polley makes movies. We make developer tools. ☺️ But the same thing is true of working at Honeycomb.

If we are very lucky, and work very hard, our work may help teams build better software and spend fewer, more meaningful hours at work, for a long time to come. I love our mission. But the only certain thing is that the experience of making it will be with all of us, become part of us, forever.

So we should try our best to make it a good experience. ☺️

charity.

Footnotes

(1) Inherited Defaults

How to access these inherited defaults can be a bit more complicated than I make it sound. Working as a manager at Facebook for two years taught me more about these defaults than anything else I’ve ever done in my career. Big companies have had to figure a lot of shit out in order to function at scale, which is why I often advise anyone who plans on starting a company or being a director/VP to do a stint at one. Will Larson’s book “An Elegant Puzzle” does a great job of laying out defaults and best practices for engineering orgs, and his blog has even more useful bits.. Otherwise, you might wanna get yourself an advisor or two with a lot of operator experience, and get used to asking questions like “how does this normally get done?”

(2) Corporate Fun

There’s plenty of stuff in the grey area between formal, organizational culture and informal, individual culture. Companies often stray into fun-like adjacencies like holiday parties, offsites, etc. Fostering a sense of “play” and informality is actually really important for making teams click with each other, and obviously the company should foot the bill if it’s a work function. Just be mindful of what you’re doing and what your goals are when you veer into the rocky shoals of Forced Corporate Fun. 😆

Questionable Advice: “People Used To Take Me Seriously. Then I Became A Software Vendor”

March 29, 2023July 14, 2023 mipsytipsyadvice, culture, startups, vendorsLeave a comment

I recently got a plaintive text message from my magnificent friend Abby Bangser, asking about a conversation we had several years ago:

“Hey, I’ve got a question for you. A long time ago I remember you talking about what an adjustment it was becoming a vendor, how all of a sudden people would just discard your opinion and your expertise without even listening. And that it was SUPER ANNOYING.

I’m now experiencing something similar. Did you ever find any good reading/listening/watching to help you adjust to being on the vendor side without being either a terrible human or constantly disregarded?”

Oh my.. This brings back memories. ☺️🙈

Like Abby, I’ve spent most of my career as an engineer in the trenches. I have also spent a lot of time cheerfully talking smack about software. I’ve never really had anyone question my experience[1] or my authority as an expert, hardened as I was in the flames of a thousand tire fires.

Then I started a software company. And all of a sudden this bullshit starts popping up. Someone brushing me off because I was “selling something”, or dismissing my work like I was fatally compromised. I shrugged it off, but if I stopped to think, it really bothered me. Sometimes I felt like yelling “HEY FUCKERS, I am one of your kind! I’m trying to HELP YOU. Stop making this so hard!” 😡 (And sometimes I actually did yell, lol.)

That’s what I remember complaining to Abby about, five or six years ago. It was all very fresh and raw at the time.

We’ll get to that. First let’s dial the clock back a few more years, so you can fully appreciate the rich irony of my situation. (Or skip the story and jump straight to “Five easy ways to make yourself a vendor worth listening to“.)

The first time I encountered “software for sale”

My earliest interaction with software vendors was at Linden Lab. Like most infrastructure teams, most of the software we used was open source. But somewhere around 2009? 2010? Linden’s data engineering team began auditioning vendors like Splunk, Greenplum, Vertica[2], etc for our data warehouse solution, and I tagged along as the sysinfra/ops delegate.

For two full days we sat around this enormous table as vendor after vendor came by to demo and plump their wares, then opened the floor for questions.

One of the very first sales guys did something that pissed me off me. I don’t remember exactly what happened — maybe he was ignoring my questions or talking down to me. (I’m certain I didn’t come across like a seasoned engineering professional; in my mid twenties, face buried in my laptop, probably wearing pajamas and/or pigtails.) But I do remember becoming very irritated, then settling in to a stance of, shall we say, oppositional defiance.

I peppered every sales team aggressively with questions about the operational burden of running their software, their architectural decisions, and how canned or cherry-picked their demos were. Any time they let slip a sign of weakness or betrayed uncertainty, I bore down harder and twisted the knife. I was a ✨royal asshole✨. My coworkers on the data team found this extremely entertaining, which only egged me on.

What the fuck?? 🫢😧🫠 I’m not usually an asshole to strangers.. where did that come from?

What open source culture taught me about sales

I came from open source, where contempt for software vendors was apparently de rigueur. (is it still this way?? seems like it might have gotten better? 😦) It is fascinating now to look back and realize how much attitude I soaked up before coming face to face with my first software vendor. According to my worldview at the time,

Vendors are liars
They will say anything to get you to buy
Open source software is always the safest and best code
Software written for profit is inherently inferior, and will ultimately be replaced by the inevitable rise of better, faster, more democratic open source solutions
Sales exists to create needs that ought never to have existed, then take you to the cleaners
Engineers who go work for software vendors have either sold out, or they aren’t good enough to hack it writing real (consumer facing) software.

I’m remembering Richard Stallman trailing around behind me, up and down the rows of vendor booths at USENIX in his St IGNUcious robes, silver disk platter halo atop his head, offering (begging?) to lay his hands on my laptop and bless it, to “free it from the demons of proprietary software.” Huh. (Remember THIS song? 🎶 😱)

Given all that, it’s not hugely surprising that my first encounter with software vendors devolved into hostile questioning.

(It’s fun to speculate on the origin of some of these beliefs. Like, I bet 3) and 4) came from working on databases, particularly Oracle and MySQL/Postgres. As for 5) that sounds an awful lot like the beauty industry and other products sold to women. 🤭)

Behind every software vendor lies a recovering open source zealot(???)

I’ve had many, many experiences since then that slowly helped me dismantle this worldview, brick by brick. Working at Facebook made me realize that open source successes like Apache, Haproxy, Nginx etc are exceptions, not the norm; that this model is only viable for certain types of general-purpose infrastructure software; that governance and roadmaps are a huge issue for open source projects too; and that if steady progress is being made, at the end of the day, somewhere somebody is probably paying those developers.

I learned that the overwhelming majority of production-caliber code is written by somebody who was paid to write it — not by volunteers. I learned about coordination costs and overhead, how expensive it is to organize an army of volunteers, and the pains of decentralized quality control. I learned that you really really want the person who wrote the code to stick around and own it for a long time, and not just on alternate weekends when they don’t have the kids (and/or they happen to feel like it).

I learned about game theory, and I learned that sales is about relationships. Yes, there are unscrupulous sellers out there, just like there are shady developers, but good sales people don’t want you to walk away feeling tricked or disappointed any more than you want to be tricked or disappointed. They want to exceed your expectations and deliver more value than expected, so you’ll keep coming back. In game theory terms, it’s a “repeated game”.

I learned SO MUCH from interviewing sales candidates at Honeycomb.[3] Early on, when nobody knew who we were, I began to notice how much our sales candidates were obsessed with value. They were constantly trying to puzzle out out how much value Honeycomb actually brought to the companies we were selling to. I was not used to talking or thinking about software in terms of “value”, and initially I found this incredibly offputting (can you believe it?? 😳).

Sell unto others as you would have them sell unto you

Ultimately, this was the biggest (if dumbest) lesson of all: I learned that good software has tremendous value. It unlocks value and creates value, it pays enormous ongoing dividends in dollars and productivity, and the people who build it, support it, and bring it to market fully deserve to recoup a slice of the value they created for others.

There was a time when I would have bristled indignantly and said, “we didn’t start honeycomb to make money!” I would have said that the reason we built honeycomb because we knew as engineers what a radical shift it had wrought in how we built and understood software, and we didn’t want to live without it, ever again.

But that’s not quite true. Right from the start, Christine and I were intent on building not just great software, but a great software business. It wasn’t personal wealth we were chasing, it was independence and autonomy — the freedom to build and run a company the way we thought it should be run, building software to radically empower other engineers like ourselves.

Guess what you have to do if you care about freedom and autonomy?

Make money. 🙄☺️

I also realized, belatedly, that most people who start software companies do so for the same damn reasons Christine and I did… to solve hard problems, share solutions, and help other engineers like ourselves. If all you want to do is get rich, this is actually a pretty stupid way to do that. Over 90% of startups fail, and even the so-called “success stories” aren’t as predictably lucrative as RSUs. And then there’s the wear and tear on relationships, the loss of social life, the vicissitudes of the financial system, the ever-looming spectre of failure … 👻☠️🪦 Startups are brutal, my friend.

Karma is a bitch

None of these are particularly novel insights, but there was a time when they were definitely news to me. ☺️ It was a pretty big shock to my system when I first became a software vendor and found myself sitting on the other side of the table, the freshly minted target of hostile questioning.

These days I am far less likely to be cited as an objective expert than I used to be. I see people on Hacker News dismissing me with the same scornful wave of the hand as I used to dismiss other vendors. Karma’s a bitch, as they say. What goes around comes around. 🥰

I used to get very bent out of shape by this. “You act like I only care because I’m trying to sell you something,” I would hotly protest, “but it’s exactly the opposite. I built something because I cared.” That may be true, but it doesn’t change the fact that vested interests can create blind spots, ones I might not even be aware of.

And that’s ok! My arguments/my solutions should be sturdy enough to withstand any disclosure of personal interest. ☺️

Some people are jerks; I can’t control that. But there are a few things I can do to acknowledge my biases up front, play fair, and just generally be the kind of vendor that I personally would be happy to work with.

Five easy ways to make yourself a vendor worth listening to

So I gave Abby a short list of a few things I do to try and signal that I am a trustworthy voice, a vendor worth listening to. (What do you think, did I miss anything?)

🌸 Lead with your bias.🌸
I always try to disclose my own vested interest up front, and sometimes I exaggerate for effect: “As a vendor, I’m contractually obligated to say this”, or “Take it for what you will, obviously I have religious convictions here”. Everyone has biases; I prefer to talk to people who are aware of theirs.

🌸 Avoid cheap shots.🌸
Try to engage with the most powerful arguments for your competitors’ solutions. Don’t waste your time against straw men or slam dunks; go up against whatever ideal scenarios or “steel man” arguments they would muster in their own favor. Comparing your strengths vs their strengths results in a way more interesting, relevant and USEFUL discussion for all involved.

🌸 Be your own biggest critic.🌸
Be forthcoming about the flaws of your own solution. People love it when you are unafraid to list your own product’s shortcomings or where the competition shines, or describe the scenarios where other tools are genuinely superior or more cost-effective. It makes you look strong and confident, not weak.

What would you say about your own product as an engineer, or a customer? Say that.

🌸 You can still talk shit about software, just not your competitors‘ software. 🌸
I try not to gratuitously snipe at our competitors. It’s fine to speak at length about technical problems, differentiation and tradeoffs, and to address how specifically your product compares with theirs. But confine your shit talking to categories of software where you don’t have a personal conflict of interest.

Like, I’m not going to get on twitter and take a swipe at a monitoring vendor (anymore 😇), but I might say rude things about a language, a framework, or a database I have no stake in, if I’m feeling punchy. ☺️ (This particular gem of advice comes by way of Adam Jacob.)

🌸 Be generous with your expertise.🌸
If you have spent years going deep on one gnarly problem, you might very well know that problem and its solution space more thoroughly than almost anyone else in the world. Do you know how many people you can help with that kind of mastery?! A few minutes from you could potentially spare someone days or weeks of floundering. This is a gift few can give.

It feels good, and it’s a nice break from battering your head against unsolvable problems. Don’t restrict your help to paying customers, and, obviously, don’t give self-serving advice. Maybe they can’t buy/don’t need your solution today, but maybe someday they will.

In conclusion

There’s a time and place for being oppositional. Sometimes a vendor gets all high on their own supply, or starts making claims that aren’t just an “optimistic” spin on the facts but are provably untrue. If any vendor is operating in poor faith they deserve to to be corrected.

But it’s a shitty, self-limiting stance to take as a default. We are all here to build things, not tear things down. No one builds software alone. The code you write that defines your business is just the wee tippy top of a colossal iceberg of code written by other people — device drivers, libraries, databases, graphics cards, routers, emacs. All of this value was created by other people, yet we collectively benefit.

Think of how many gazillion lines of code are required for you to run just one AWS Lambda function! Think of how much cooperation and trust that represents. And think of all the deals that brokered that trust and established that value, compensating the makers and allowing them to keep building and improving the software we all rely on.

We build software together. Vendors exist to help you. We do what we do best, so you can spend your engineering cycles doing what you do best, working on your core product. Good sales deals don’t leave anyone feeling robbed or cheated, they leave both sides feeling happy and excited to collaborate.[4]

🐝💜Charity.

[1] Yes, I know this experience is far from universal; LOTS of people in tech have not felt like their voices are heard or their expertise acknowledged. This happens disproportionately to women and other under-represented groups, but it also happens to plenty of members of the dominant groups. It’s just a really common thing! However that has not really been my experience — or if it has, I haven’t noticed — nor Abby’s, as far as I’m aware.

[2] My first brush with columnar storage systems! Which is what makes Honeycomb possible today.

[3] I have learned SO MUCH from watching the world class sales professionals we have at Honeycomb. Sales is a tough gig, and doing it well involves many disciplines — empathy, creativity, business acumen, technical expertise, and so much more. Selling to software engineers in particular means you are often dealing with cocky little shits who think they could do your job with a few lines of code. On behalf of my fellow ~~little shits~~ engineers, I am sorry. 🙈

[4] Like our sales team says: “Never do a deal unless you’d do both sides of the deal.” I fucking love that.