Your Data is Made Powerful By Context (so stop destroying it already) (xpost)

In logs as in life, the relationships are the most important part. AI doesn’t fix this. It makes it worse.

(cross-posted)

After twenty years of devops, most software engineers still treat observability like a fire alarm — something you check when things are already on fire.

Not a feedback loop you use to validate every change after shipping. Not the essential, irreplaceable source of truth on product quality and user experience.

This is not primarily a culture problem, or even a tooling problem. It’s a data problem. The dominant model for telemetry collection stores each type of signal in a different “pillar”, which rips the fabric of relationships apart — irreparably.

Your observability data is self-destructing at write time

The three pillars model works fine for infrastructure[1], but it is catastrophic for software engineering use cases, and will not serve for agentic validation.

But why? It’s a flywheel of compounding factors, not just one thing, but the biggest one by far is this:

✨Data is made powerful by context✨

The more context you collect, the more powerful it becomes

Your data does not become linearly more powerful as you widen the dataset, it becomes exponentially more powerful. Or if you really want to get technical, it becomes combinatorially more powerful as you add more context.

I made a little Netlify app here where you can enter how many attributes you store per log or trace, to see how powerful your dataset is.

  • 4 fields? 6 pairwise combos, 15 possible combinations.
  • 8 fields? 28 pairwise combos, 255 possible combinations.
  • 50 fields? 1.2K pairwise combos, 1.1 quadrillion (2^50) possible combinations, as seen in the screenshot below.
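
If you want to check that arithmetic yourself, here's a small Python sketch. The formulas are standard combinatorics: pairwise combos are C(n, 2), and "possible combinations" counts every non-empty subset of n attribute keys, i.e. 2^n − 1.

```python
from math import comb

def dataset_power(n_fields: int) -> tuple[int, int]:
    """Pairwise combos C(n, 2) and all non-empty attribute
    combinations (2^n - 1) for n attribute keys per event."""
    return comb(n_fields, 2), 2 ** n_fields - 1

for n in (4, 8, 50):
    pairs, combos = dataset_power(n)
    print(f"{n} fields: {pairs} pairwise combos, {combos:,} possible combinations")
```

The 50-field case prints 1,125,899,906,842,623 — the 1.1 quadrillion in the bullet above. Note this counts attribute keys only; accounting for values multiplies the space further.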

When you add another attribute to your structured log events, it doesn’t just give you “one more thing to query”. It gives you new combinations with every other field that already exists.

The wider your data is, the more valuable the data becomes. Click on the image to go futz around with the sliders yourself.

Note that this math is exclusively concerned with attribute keys. Once you account for values, the precision of your tooling goes higher still, especially if you handle high cardinality data.

Data is made valuable by relationships

“Data is made valuable by context” is another way of saying that the relationships between attributes are the most important part of any data set.

This should be intuitively obvious to anyone who uses data. How valuable is the string “Mike Smith”, or “21 years old”? Stripped of context, they hold no value.

By spinning your telemetry out into siloes based on signal type, the three pillars model ends up destroying the most valuable part of your data: its relational seams.

AI-SRE agents don’t seem to like three pillars data

I posted something on LinkedIn yesterday, and got a pile of interesting comments. One came from Kyle Forster, founder of an AI-SRE startup called RunWhen, who linked to an article he wrote called “Do Humans Still Read Logs?”

Humpty Dumpty traced every span, Humpty Dumpty had a great plan.

In his article, he noted that fewer than 30% of their AI SRE’s tool calls went to “traditional observability data”, i.e. metrics, logs, and traces. Instead, they used instrumentation generated by other AI tools to wrap calls and queries. His takeaway:

Good AI reasoning turns out to require far less observability data than most of us thought when it has other options.

My takeaway is slightly different. After all, the agent still needed instrumentation and telemetry in order to evaluate what was happening. That’s still observability, right?

But as Kyle tells it, the agents went searching for a richer signal than the three pillars were giving them. They went back to the source to get the raw, undigested telemetry with all its connective tissue intact. That’s how important it was to them.

Huh.

You can’t put Humpty back together again

I’ve been hearing a lot of “AI solves this”, and “now that we have MCPs, AI can do joins seamlessly across the three pillars”, and “this is a solved problem”.

Mmm. Joins across data siloes can be better than nothing, yes. But they don’t restore the relational seams. They don’t get you back to the mathy good place where every additional attribute makes every other attribute exponentially more valuable. At agentic speed, that reconstruction becomes a bottleneck and a failure surface.

Humpty Dumpty stored all the state, Humpty Dumpty forgot to replicate.

Our entire industry is trying to collectively work out the future of agentic development right now. The hardest and most interesting problems (I think) are around validation. How do we validate a change rate that is 10x, 100x, 1000x greater than before?

I don’t have all the answers, but I do know this: agents are going to need production observability with speed, flexibility, TONS of context, and some kind of ontological grounding via semantic conventions.

In short: agents are going to need precision tools. And context (and cardinality) are what feed precision.

Production is a very noisy place

Production is a noisy, rowdy place of chaos, particularly at scale. If you are trying to do anomaly detection with no a priori knowledge of what to look for, the anomaly has to be fairly large to be detected. (Or else you’re detecting hundreds of “anomalies” all the time.)

But if you do have some knowledge of intent, along with precision tooling, these anomalies can be tracked and validated even when they are exquisitely minute. Like even just a trickle of requests[2] out of tens of millions per second.

Let’s say you work for a global credit card provider. You’re rolling out a code change to partner payments, which are “only” tens of thousands of requests per second — a fraction of your total request volume of tens of millions of req/sec, but an important one.

This is a scary change, no matter how many tests you ran in staging. To test this safely in production, you decide to start by rolling the new build out to a small group of employee test users, and oh, what the hell — you make another feature flag that lets any user opt in, and flip it on for your own account.

You wait a few days. You use your card a few times. It works (thank god).

On Monday morning you pull up your observability data and select all requests containing the new build_id or commit hash, as well as all of the feature flags involved. You break down by endpoint, then start looking at latency, errors, and distribution of request codes for these requests, comparing them to the baseline.

Hm — something doesn’t seem quite right. Your test requests aren’t timing out, but they are taking longer to complete than the baseline set. Not for all requests, but for some.

Further exploration lets you isolate the affected requests to a set with a particular query hash. Oops… how’d that n+1 query slip in undetected?
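
That workflow can be sketched in a few lines of Python over wide structured events. Everything here is illustrative: the field names (build_id, endpoint, duration_ms, query_hash), the build id, and the 1.5x threshold are assumptions, not anyone’s real schema.

```python
from collections import defaultdict
from statistics import median

NEW_BUILD = "build_7f3a"  # hypothetical build_id for the canary rollout

def split_cohorts(events):
    """Partition wide events into the test cohort (new build) and baseline."""
    test = [e for e in events if e.get("build_id") == NEW_BUILD]
    base = [e for e in events if e.get("build_id") != NEW_BUILD]
    return test, base

def latency_by(events, key):
    """Median duration_ms grouped by any attribute (endpoint, query_hash, ...)."""
    groups = defaultdict(list)
    for e in events:
        groups[e[key]].append(e["duration_ms"])
    return {k: median(v) for k, v in groups.items()}

def suspects(test, base, key, threshold=1.5):
    """Attribute values where the test cohort is >= threshold x slower."""
    t, b = latency_by(test, key), latency_by(base, key)
    return {k: (t[k], b[k]) for k in t if k in b and t[k] >= threshold * b[k]}
```

In practice you’d run suspects over endpoint first, then drill into query_hash for the slow endpoint — the same narrowing described above. The point is that this only works because build_id, feature flags, endpoint, and query hash all live on the same event.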

You quickly submit a fix, ship a new build_id, and roll your change out to a larger group: this time, it’s going out to 1% of all users in a particular region.

The anomalous requests may have been only a few dozen per day, spread across many hours, in a system that served literally billions of requests in that time.

Humpty Dumpty: assembled, redeployed, A patchwork of features half-built, half-destroyed. “It’s not what we planned,” said the architect, grim. “But the monster is live — and the monster is him.”

Precision tooling makes them findable. Imprecise tooling makes them unfindable.

How do you expect your agents to validate each change, if the consequences of each change cannot be found?[3]

Well, one might ask, how have we managed so far? The answer is: by using human intuition to bridge the gaps. This will not work for agents. Our wisdom must be encoded into the system, or it does not exist.

Agents need speed, flexibility, context, and precision to validate in prod

In the past, excruciatingly precise staged rollouts like these have been mostly the province of your Googles and Facebooks. Progressive deployments have historically required a lot of tooling and engineering resources.

Agentic workflows are going to make these automated validation techniques much easier and more widely used; at the exact same time, agents developing to spec are going to require a dramatically higher degree of precision and automated validation in production.

It is not just the width of your data that matters when it comes to getting great results from AI. There’s a lot more involved in optimizing data for reasoning, attribution, or anomaly detection. But capturing and preserving relationships is at the heart of all of it.

In this situation, as in so many others, AI is both the sickness and the cure[4]. Better get used to it.


1 — Infrastructure teams use the three pillars for one extremely good reason: they have to operate a lot of code they did not write and cannot change. They have to slurp up whatever metrics or logs the components emit and store them somewhere.

2 — Yes, there are some complications here that I am glossing past, ones that start with ‘s’ and rhyme with “ampling”. However, the rich data + sampling approach to the cost-usability balance is generally satisfied by dropping the least valuable data. The three pillars approach to the cost-usability problem is generally satisfied by dropping the MOST valuable data: cardinality and context.

3 — The needle-in-a-haystack scenario is one visceral illustration of the value of rich context and precision tooling, but there are many others. Another example: wouldn’t it be nice if your agentic task force could check up on any diffs that involve cache key or schema changes, say, once a day for the next 6-12 months? These changes famously take a long time to manifest, by which time everyone has forgotten that they happened.

4 — One sentence I have gotten a ton of mileage out of lately: “AI, much like alcohol, is both the cause of and solution to all of life’s problems.”

My (hypothetical) SRECon26 keynote (xpost)

One year ago, Fred Hebert and I delivered the closing keynote at SRECon25. Looking back on it now, I can hardly connect with how I felt then. Here’s what I’d say one year later.

Crossposted from here. 

Hey, it’s almost time for SRECon 2026! (I can’t go, but YOU really should!)

Which means it was almost a year ago that Fred Hebert and I were up on stage, delivering the closing keynote[1] at SRECon25.

We argued that SREs should get involved and skill up on generative AI tools and techniques, instead of being naysayers and peanut gallerians. You can get a feel for the overall vibe from the description:

It’s easy to be cynical when there’s this much hype and easy money flying around, but generative AI is not a fad; it’s here to stay.

Which means that even operators and cynics — no, especially operators and cynics — need to get off the sidelines and engage with it. How should responsible, forward-looking SREs evaluate the truth claims being made in the market without being reflexively antagonistic?

Yep, that was our big pitch. Don’t be reflexively antagonistic. You should learn AI so that your critiques will land with credibility.

That is not the message I would give today, if I were keynoting SRECon26.

I came out of a hole, and the world had changed

I’ve been in a bit of a hole for the past few months, trying to get the second edition of “Observability Engineering” written and shipped.

Maybe the hole is why this feels so abrupt and discontinuous to me. Or maybe it’s just having such a clear artifact of my views one year ago. I don’t know.

What I do know is that one year ago, I still thought of generative AI as one more really big integration or use case we had to support, whether we liked it or not. Like AI was a slop-happy toddler gone mad in our codebase, and our sworn duty as SREs was to corral and control it, while trying not to be a total dick about it.

Today, it’s very clear to me that the center of gravity has shifted from cloud/automation workflows to AI/generation workflows, and that the agentic revolution has only just begun. That toddler is heading off to school. With a loaded gun.

When the facts change, I change my mind

I don’t know when exactly that bit flipped in my head, I only know that it did. And as soon as it did, I felt like the last person on earth to catch on. I can barely connect with my own views from eleven months ago.

Were my views unreasonably pessimistic? Was I willfully ignoring credible evidence in early 2025?

Hmm, perhaps. But Silicon Valley hype trains have not exactly covered themselves in glory in recent years. VR/AR, crypto/web3/NFTs, wearable tech, the Metaverse, 3D printing, the sharing economy… this is not an illustrious string of wins.[2]

Cloud computing, on the other hand: genuinely huge. So was the Internet. Sometimes the hype train brings you internets, sometimes the hype train brings you tulips.

So no, I don’t think it was obvious in early 2025 that AI generated code would soon grow out of its slop phase. Skepticism was reasonable for a time, and then it was not. I know a lot of technologists who flipped the same bit at some point in 2025.

The keynote I would give today

If I were giving the keynote at SRECon 2026, I would ditch the begrudging stance. I would start by acknowledging that AI is radically changing the way we build software. It’s here, it’s happening, and it is coming for us all.

1 — This is happening

It is very, very hard to adjust to change that is being forced on you. So please don’t wait for it to be forced on you. Swim out to meet it. Find your way in, find something to get excited about.

As Adam Jacob recently advised,

“If you’re an engineer or an operations person, there is only one move. You have to start working in this new way as much as you can. If you can’t do it at work, do it at home. You want to be on the frontier of this change, because the career risk to being a laggard is incredibly high.” — Adam Jacob

This AI shit is not hard. The early days of any technology are the simplest, and this technology more than most. Conquer the brain weasels in your head by learning the truth of this for yourself.

2 — Know thyself

At a time of elevated uncertainty and anxiety, our natural human tendency to drift into confirmation bias and disconfirmation bias is higher than ever. Whatever proof you instinctively seek out, you are guaranteed to find.

The best advice I can give anyone is: know your nature, and lean against it.

  • If you are a reflexive naysayer or a pessimist, know that, and force yourself to find a way in to wonder, surprise and delight.
  • If you are an optimist who gets very excited and tends to assume that everything will improve: know that, and force yourself to mind real cautionary tales.

Try to keep your aperture wide, and remain open to possibilities you find uncomfortable. Curate the ocean you swim in. Puncture your bubble.

3 — Don’t panic

Don’t panic, and don’t give in to despair. The future isn’t written yet, and nobody knows what’s going to happen. I sure as hell don’t. Neither do you.

The fact that AI has radically changed the way we develop software in a very short time, and seems poised to change it much more in the next year or two, is real and undeniable.

This does not mean that everything else predicted by AI optimists will come to pass.

Extraordinary claims still require extraordinary evidence. AGI is, at present, an elaborate thought experiment, one that contradicts all the evidence we currently have about how technological breakthroughs typically yield enormous change in the early days, and then plateau.

We are all technologists now

Here’s another Adam quote I really like:

The bright side is that it’s a technology shift, not a manufacturing shift – meaning you still have to have technologists to do it.

I’ve written a number of blog posts over the years where I have advised people to go into the second half of their career thinking of themselves not as “engineers” or as “managers”, but as “technologists”.[3]

Every great technologist needs an arsenal of skills on top of their technical expertise. They need to understand how to navigate an organization, how to translate between the language of technology and the language of the business; how to wield influence and drive results across team, company, even industry lines.

These remain durable skills, in an era where good code can be generated practically for free.

This is the moment for pragmatists

Many people who love the art and craft of software are struggling in this moment, as the value of that craft is diminishing.

People who take a much more…functional…approach to software seem to be thriving in the present chaos. “Functional” describes most of the SREs I know, including myself.

After all, SREs have always been judged by outcomes — uptime, reliability, whether the thing kept running. An outcome orientation turns out to be excellent preparation for a world where the “how” of software is becoming less important than the what and the whether, across the board.

So maybe the advice we gave at SRECon wasn’t so bad after all. Especially this part:

Which means that even operators and cynics — no, especially operators and cynics — need to get off the sidelines and engage with it.

Who can build better guardrails for AI, than SREs and operators who have spent their entire careers building guardrails for software engineers and customers?

The industry needs us. But not begrudgingly, eyerollingly, pretending to get on board in order to slow things down from the inside. The industry needs our skills to help engineering teams go fast forever.

Don’t sit back and wait for change to reach you. Run towards the waves. It’s nice out here.

1 — Our talk was called “AIOps: Prove it! An Open Letter to Vendors Selling AI for SREs”. In retrospect, this was a terrible title. It was not an open letter to vendors at all; if anything, it was an open letter to SREs. It started out as one topic, but by the time the event rolled around, it had morphed into something entirely different. Ah well.

2 — I am not even listing the kooky religious shit like effective accelerationism, transhumanism, AI “alignment” or the Singularity, all of which has seeped into the water table around these parts.

3 — Omg, I have so many unwritten posts wriggling around in my brain right now on this topic.
