First I wrote the wrong book, then I wrote the right book (xpost)

I’m not sure whether to say “thank you” or “HOW COULD YOU DO THIS TO ME”, but this one goes out to all the people who sent me advice on buying software last fall.

This is the second in a two-part episode. The first part ended on a ✨cliffhanger!!!✨ — so if you missed the first episode, catch up here:

Six long weeks of writer’s block

I was merrily cranking away at what I believed to be my last chapter when I asked the internet — YOU guys — for help the first time. “Are you an experienced software buyer? I could use some help,” went up on September 19th, 2025.

The response was overwhelming. I heard from software engineers, SREs, observability leads, CTOs, VPs, distinguished engineers, consultants, even the odd CISO. All these emails and responses and lengthy threads kept me busy for a while, but eventually I had to get back to writing. That’s when I discovered, to my unpleasant surprise, that I couldn’t seem to write anymore.

“Well,” I reasoned, “maybe I’ll just ask the internet for EVEN MORE advice” — and out popped Buffy-themed post number two, on October 13th.

Keep in mind, I thought I would be done by then. November was my stretch deadline, my “just in case, I better leave myself some breathing room” kind of deadline.

As November 1st came and went, my frustration began spiraling out into blind panic. What the hell is going on and why can I not finish this???

In which I finally listen to the advice I asked for

A week before Thanksgiving, I was up late tinkering with Claude. I imported all the emails and advice I had gotten from y’all, and started sorting into themes and picking out key quotes, and that is when it finally hit me: I had written the wrong thing.

No, this deserves a bigger font.

✨I wrote the wrong thing.✨

I wrote the wrong thing, for the wrong people, and none of it was going to move the needle in any meaningful way.

The chapters I had written were full of practical advice for observability engineering teams and platform engineering teams, wrestling with implementation challenges like instrumentation and cost overflows. Practical stuff.

Yes.

The internet was right (this ONE time)

My inbox, on the other hand, was overflowing with stories like these:

  • “Many times [competitive research] is faked. One person has their favorite option and then they do just enough ‘competitive analysis’ to convince the sourcing folks that due diligence was done or to nullify the CIO/CTO/whoever is accepting this on to their budget”
  • “We [the observability team] spent six months exhaustively trialing three different solutions before we made a decision. The CEO of one of the losing vendors called our CEO, and he overruled our decision without even telling us.” (Does your CEO know anything at all about engineering??) “No.”
  • “Our SRE teams have vetoed any attempt to modernize our tool stack. ($Vendor) is part of their identity, and since they would have to help roll out and support any changes, we are stuck living in 2015 apparently forever.” (What does management have to say?) “It’s been twenty years since they touched a line of code.”
  • “We’re weird in that most of the company hates technology and really hates that we have to pay for it since they don’t understand the value it brings to the company. This is intentional ignorance, we make the value props continually and well, we just haven’t succeeded yet….We’re a little obsessed with trying to get champagne quality at Boone’s prices.”
  • “When it comes to dealing with salespeople and the enterprise sales process, the best tip for engineers is to not anthropomorphize sales professionals who are driven by commission. The best ones are like robot lawn mowers dressed in furry unicorn costumes. They may seem cute and nice but they do not care about anything besides closing the next deal….All of the best SaaS companies are full of these friendly fake unicorn zombies who suck cash instead of blood.”

Nearly all of the emails I got were either describing a terminally fucked up buying process from the top down, or the long term consequences of those fucked up decisions.

In other words: I was writing tactical advice for teams who were surviving in a strategic vacuum.

So I threw the whole thing out, and started over from scratch. 😭

Even good teams are struggling right now

As Tolstoy once wrote, “Happy teams are all alike; every fucked up team is fucked up in its own precious way.”

There is an infinity of ways to screw something up. But there is one pattern I see a critical mass of engineering orgs falling into right now, even orgs that are generally quite solid. That is when there is no shared alignment or even shared vocabulary between engineering and other stakeholders (directors, VPs and SVPs, CTO, CIO, principal and distinguished engineers) on some pretty clutch questions. Such as:

  • “What is observability?”
  • “Who needs it?”
  • “What problem are we trying to solve?”

And my favorite: “Is observability still relevant in a post-AI era? Can’t agents do that stuff now?”

Even some generally excellent CTOs[1] have been heard saying things like, “yeah, observability is definitely very important, but all our top priorities are related to AI right now.”

Which gets causality exactly backwards. Because your ability to get any returns on your investments into AI will be limited by how swiftly you can validate your changes and learn from them. Another word for this is “OBSERVABILITY”.

Enough ranting. Want a peek? I’ll share the new table of contents, and a sentence or two about a couple of my own favorite chapters.

Part 6: “Observability Governance” (v2)

The new outline is organized to speak to technical decision-makers, starting at the top and loosely descending. What do CTOs need to know? What do VPs and distinguished engineers need to know? And so on. We start off abstract, and become more concrete.

Since every technical term (e.g. high cardinality, high dimensionality, etc.) has become overloaded and undifferentiated by too much sales and marketing, we mostly avoid them. Instead, we use the language of systems and feedback loops.

Again, we are trying to help your most senior engineers and execs develop a shared understanding of “What problem are we solving?” and “What is our goal?” Technical terms can actually detract and distract from that shared understanding.

  1. An Open Letter to CTOs: Why Organizational Learning Speed is Now Your Biggest Constraint. Organizations used to be limited by the speed of delivery; now they are limited by how swiftly they can validate and understand what they delivered.
  2. Systems Thinking for Software Delivery. Observability is the signal that connects the dots to make a feedback loop; no observability, no loop. What happens to amplifying or balancing loops when that signal is lossy, laggy, or missing?
  3. The Observability Landscape Through a Systems Lens. What feedback loops do developers need, and what feedback loops does ops need? How do these map to the tools on the market?
  4. The Business Case for Observability. Is your observability a cost center or an investment? How should you quantify your RoI?
  5. Diagnosing Your Observability Investment
  6. The Organizational Shift
  7. Build vs Buy (vs Open Source)
  8. The Art and Science of Vendor Partnerships. Internal transformations run on trust and credibility; vendor partnerships run on trust and reciprocity. We’ll talk about both of these, as well as how to run a strong POC.
  9. Instrumentation for Observability Teams
  10. Where to Go From Here

Hey, I have a lot of empathy right now for leaders and execs who feel like they’re behind on everything. I feel it too. Anyone who doesn’t is lying to themselves (or their name is Simon Willison).

But the role observability plays in complex sociotechnical systems is one of those foundational concepts you need to understand. You’re not gonna get this right by accident. You’re not going to win by doing the same thing you were doing five years ago. And if you screw up your observability, you screw up everything downstream of it too.

To those of you who do understand this, and are working hard to drive change in your organizations: I see you. It is hard, often thankless work, but it is work worth doing. If I can ever be of help: reach out.

A longer book, but a better book

The last few chapters are heading into tech review on Friday, February 20th. Finally. The last 3.5 months have been some of the most panicky and stressful of my life. I….just typed several paragraphs about how terrible this has been, and deleted them, because you do not need to listen to me whine. ☺️

Like I said, I have never felt especially proud of the first edition. I am not UN proud, it’s just…eh. I feel differently this time around. I think—I hope—it can be helpful to a lot of different people who are wrestling with adapting to our new AI-native reality, from a lot of different angles.[2]

Thanks, Christine. (Another for the folder marked “NOW YOU TELL ME”)

I am incredibly grateful to my co-authors, collaborators, and our editor, Rita Fernando, without whom I never would have made it through.

But there’s one more group that deserves some credit, and it’s…you guys. I asked for help, and help I got. So many people wrote me such long, thought-provoking emails full of stories, advice and hard-earned wisdom. The better the email, the more I peppered you with followup questions, which is a great way to punish a good deed.

Blame these people

I am a tiny bit torn on whether to say “thank you” or “fuck you”, because my life would have been much nicer if I had stuck to the plan and wrapped in October.

But the following list of people were especially instrumental in forcing me to rethink my approach. It made the book much stronger, so if you catch one of them in the wild, please buy them a stiff drink. (Or buy yourself one, and throw it in their face with my sincere compliments.)

  • Abraham Ingersoll, the aforementioned “odd CISO”, who would be quoted in the book had his advice not been so consistently unprintable by the standards of respectable publications
  • Benjamin Mann of Delivery Hero, who I would work for in a heartbeat, and not just for the way he wields “NOPE” as a term of art
  • Marty Lindsay, who has spent more time explaining POCs and tech evals to me than anyone should have to. (If you need an o11y consultant, Marty should be your very first stop.)
  • Sam Dwyer, whose stories seeded my original plan to write a set of chapters for observability engineering teams. (I hope the replacement plan is useful too!)

Many others sent me terrific advice, and endured multiple rounds of questions and more questions and clarifications on said questions. A few of them:

Matthew Sanabria, Chris Cooney, Glen Mailer, Austin Culbertson, John Scancella, John Doran, Bryan Finster, Hazel Weakly, Chris Ziehr, Thomas Owens, Mike Lee, Jay Gengelbach, Will Hegedus, Natasha Litt, Alonso Suarez, Jason McMunn, Evgeny Rubtsov, George Chamales, Ken Finnegan, Cliff Snyder, Robyn Hirano, Rita Canavarro, Matt Schouten, Shalini Samudri Ananda Rao (Sam).

I am definitely forgetting some names; I will try to update the list as I remember them.

But seriously: thank you, from the bottom of my heart. I loved hearing your stories, your complaints, your arguments about how the world should improve. Your DNA is in this book; I hope it does you justice.

~charity
💜💙💚💛🧡❤️💖

 

[1] It’s ironic (and makes me uncomfortably self-conscious), but some of the worst top-down decision-making processes I have ever seen have come from companies where the CEO and CTO are both former engineers. The confidence they have in their own technical acumen may not be wholly unfounded, but it is often ten or more years out of date. We gotta update those priors, my friends. Stay humble.

[2] On the other hand, as my co-founder, Christine Yen, informed me last week: “Nobody reads books anymore.”

Got opinions on observability? I could use your help (once more, with feeling)

Last month I dropped a desperate little plea for help in this space, asking people to email me any good advice and/or strong opinions they happened to have on the topic of buying software.

I wasn’t really sure what to expect — desperate times, desperate measures — but holy crap, you guys delivered. To the many people who took the time to write up your experiences and expertise for me, and suffer through rounds of questions and drafts: ✨thank you✨. And thank you, too, to those of you who forwarded my queries along to experts in your network and asked for help on my behalf.

I learned a LOT about buying software and managing vendor relationships in the process of writing this. Honestly, this chapter is shaping up to be one of the things I’m most excited about for the second edition of the book.

Why I’m excited about the software buying chapter (& you should be too)

I’m imagining you reading this with a skeptical expression and an arched eyebrow. “Really, Charity…‘how to buy software’ doesn’t exactly suggest peak engineering prowess.”

Au contraire, my friends. I’ve come to believe that vendor engineering is one of the subtlest and most powerful practical applications of deep subject matter expertise, and some of the highest leverage work an engineer can do. How often do you get to make decisions that leverage the labor of hundreds or thousands of engineers per year, for fractions of pennies on the dollar? How many of the decisions you make will have an impact on every single engineer you work with and their ability to do their jobs well, as well as the experience of every single customer?

If you think I’m hyperventilating a bit, nah; this is entry level shit. In the book, I tell the story of the best engineer I ever worked with, and how I watched him alter the trajectory of multiple other companies, none of which he was working for, buying from, or formally connected to in any way — in the space of a few conversations. It upended my entire worldview about what it can look like for an engineer to wield great power.

Doing this stuff well takes both technical depth and technical breadth, in addition to systems thinking and knowledge of the business. It is one of the only ways a staff+ engineer can acquire and develop executive-level communication, strategy, and execution skills while remaining an individual contributor.

I’ve been wanting to write about this for YEARS. Anyway — ergh! — I’m rambling now. That was not what I came here to talk about, I’m just excited. Back to the point.

My second (and final) round of questions

I got so much out of your thoughtful responses that I thought I’d press my luck and put a few more questions out to the universe, before it’s too late.

These questions speak to areas where I worry that my writing may be a little weak or uninformed, or too far away from the world where people are using the “three pillars” model (aka multiple pillars or o11y 1.0) and happy about it. I don’t know many (any??) of those people, which suggests some pretty heavy selection bias.

I don’t expect anyone to answer all the questions; if one or two resonate with you, write about those and ignore the rest. If there’s something I didn’t ask that I should have asked, answer that. Something I’ve written in the past that bugged you that you hope I won’t say again? Tell me! We are almost out of time ⌛ so gimme what you got. 🙌

On migrations:

📈 Have you ever migrated from one observability vendor to another? If so, what did you learn? What was the hardest part, what took you by surprise? What do you wish you could go back in time and tell yourself at the start?

📈 If you ran (or were involved in) a large scale migration or tool change… how did you structure the process? Like, was it team by team, service by service, product by product? Did you have a playbook? What did you do to make it fun or push through organizational inertia? How long did it take?

On managing costs for the traditional three pillars:

📈 For orgs that are using Datadog, Grafana, Chronosphere, or another traditional three pillars architecture: how would you describe your approach to cutting and controlling costs? Pro tips and/or comprehensive strategy.

📈 Alternately, if there are particular blog posts with advice you have followed and can personally vouch for, would you send me a link?

📈 How do you guide your software engineers on which data to send to which place — metrics, logs, traces, errors/exceptions, profiling, etc? How do you manage cardinality? How do you work to keep the pillars in sync, or are there any particular tips and tricks you have for linking / jumping between the data sources?

📈 How many ongoing engineering cycles does it take to manage and maintain costs, once you’ve gotten them to a sustainable place?

On managing costs at massive scale:

(Especially for people who work at a large enterprise, the kind with multiple business units, but others welcome too!):

  • Do you use tiers of service for managing costs? How do you define those?
  • How do new tools get taken for a spin? (Like, sometimes there is an office of the CTO with carte blanche to try new things and evaluate them for the rest of the org)
  • How do you use telemetry pipelines?

Observability teams (quick poll):

📈 If you have an observability team, how big is it? What part of the org does it report up into? Roughly how many engineers does that team support?

📈 If you don’t have an observability team — and you have more than, say, 300 engineers — who owns observability? Platform? SRE? Other?

A grab bag:

📈 Build vs Buy: If you built your own observability tool(s)…. What were the reasons? What does it do? Would you make the same decision today?

📈 OpenTelemetry: If your team has weighed the pros and cons of adopting OTel and ultimately decided not to, for technical or philosophical reasons (i.e. not just “we’re too busy”) — what are those reasons?

📈 Instrumentation: what do you do to try and remove cognitive overhead for engineers? How much have you been able to make automatic and magical, and where has the magic failed?

📈 Consolidation: I would love to hear any thoughts on tool consolidation vs tool proliferation. Is this primarily driven by execs, or do technical users care too? Is it driven by cost concerns, usability, or something else?

edited on 2025-10-15 to add… oh crap, one last question:

📈 Open source: Are you using open source observability tools, and if so, are these your primary tools or one piece of a comprehensive tooling strategy? If the latter, could you describe that strategy for me?

Send it to me in an email

Please send me your opinions or answers in an email, to my first name at honeycomb dot io, with the subject line “Observability questions”.

If I end up cribbing from your material, is it okay for me to print your name? (As in, “thanks to the people who informed my thinking on this subject, abc xyz etc”). I will not mention your employer or where you work, don’t worry.

If you send it to me more than a week from now, I probably won’t be able to use it. Augh, I wish I had thought of this in JUNE!!! #ragrets

✨THANK YOU✨

I know this is an incredibly time consuming thing to ask of someone, and I can’t express how much I appreciate your help.

P.S. Yes, the title is absolutely a reference to the Buffy musical. Hey, I had to give you guys something fun to read along with my second bleg in less than a month (do people still say “bleg”??).

6 Musical Episodes of TV Shows That Deserve an Encore

P.P.S. Grammar quiz of the day: should my title read “opinions ABOUT observability” or “opinions ON observability” ??

GREAT QUESTION — and, as it turns out, the preposition you choose may reveal more than you realized.

“About” is used to introduce a topic or subject in a broad, vague, or approximate sense, while “on” is used to signal more detailed, specific, formal or serious subject matter (as well as physical objects). “Let’s talk about dinner” vs “she delivered a lecture on why AI is trying to kill babies.”

Or as Xander says, “To read makes our English speaking good.”

The earth is doomed,
~charity

Are you an experienced software buyer? I could use some help.

If it seems like I’ve been relatively quiet lately on social media and my blog, that’s because I have. Liz, Austin, George and I have been busy toiling away on the second edition of “Observability Engineering” ever since April or May. I personally have been trying to spend 75-80% of my time on the book since May.

Have I been successful in that attempt? No. But I’m trying. Progress is being made. Hopefully just a few more weeks of drafting and we’ll be on to edits, and on to your grubby little paws by May-ish.

The world has changed A LOT since we wrote the first edition, in 2019-2022. Do you know, the phrase “observability engineering teams” doesn’t even occur in the first edition of the book? Try and search — it can’t be found! Even the phrase “observability teams” doesn’t pop up til near the end, and when it does, we are referring to those few teams that choose to build their own observability tools from scratch.

These days, observability engineering teams are everywhere. Which is why we are adding a whole new section, a sizable one, called “Observability Governance.” The governance section will have a bunch of chapters on topics like how to staff these teams, where they should fit in the org chart, how to buy good tools, how to integrate them, how to manage costs, how to make the business case up the chain to senior execs, how to manage schemas and semantic conventions at scale, and much much more.

The problem

The problem is, I’ve never really bought software. Not like this. I’ve never even worked at a truly large, software-buying enterprise tech company. So I am not well equipped to give good advice on questions like:

  • How do you shop around for options?
  • What are some signs you may need to suck it up and change vendors?
  • What does a good POC (proof of concept) look like?
  • Who are your stakeholders? What are their concerns?
  • How do you drive consensus when millions of dollars (and the work experience of thousands of engineers) are on the line? What does ‘consensus’ even mean in that context?
  • What primary considerations should you take into account when making a decision? What are secondary considerations?

I’m looking for the kind of advice that a principal engineer who has done this many times might give a staff engineer who is doing it for the first time. Or that a VP who’s done this many times might give a director who is doing it for the first time.

Can you help?

This is me wearing Leia buns and projecting a unicorn-shaped rainbow bat signal out into the sky for help. Do you have any advice for me? What guidance would you give to the readers of the second edition of this book?

Please send your advice to me in an email, addressed to my first name at honeycomb dot io, with the subject line: “Buying Software”. Include any relevant context about how large the company or engineering org is, and what your role in purchasing was.

I may respond with more questions, or reply and ask if you are able to talk synchronously. But I will not quote anything you send me without first asking your permission and getting a signed release. I will not mention ANY vendors by name, good or bad.

I am not fishing for Honeycomb customers or buyers. I assume most of you haven’t tried Honeycomb and don’t care about it, and that is fine. This is not a Honeycomb project, this is an O’Reilly writing project. I just want to gather up some good advice on buying software and funnel it back out to good engineers.

Can you help? Your industry needs you! <3