Cross-posted from “Bring Back Ops Pride”
“Operations” is not a dirty word, a synonym for toil, or a title for people who can’t write code. May those who shit on ops get the operational outcomes they deserve.
I was planning to write something else today, but god dammit, I got nerd-sniped.
Last week I published a piece on the Honeycomb blog called “You Had One Job: Why Twenty Years of DevOps Has Failed to Do It.” Allow me to quote myself:
In retrospect, I think the entire DevOps movement was a mighty, twenty year battle to achieve one thing: a single feedback loop connecting devs with prod.
On those grounds, it failed.
Not because software engineers weren’t good at their jobs, or didn’t care enough. It failed because the technology wasn’t good enough. The tools we gave them weren’t designed for this, so using them could easily double, triple, or quadruple the time it took to do their job: writing business logic.
This isn’t true everywhere. Please keep in mind that all data tools are effectively fungible if you can assume an infinite amount of time, money, and engineering skill. You can run production off an Excel spreadsheet if you have to, and some SREs have done so. That doesn’t make it a great solution, the right use of resources, or accessible to the median engineering org.
It’s a fun piece, if I do say so myself, and you should go read it. (Much stick art!)
I posted the link on LinkedIn yesterday morning. Comments, as ever, ensued.
(The internet was a mistake, Virginia.)
“Devs should own everything”
A number of commenters said things like, “devs should own everything”, “make every team responsible for their own devops work”, and my personal favorite:
“I still think the main problem is with the ownership model – the fact that devs don’t own the full system, including infra and ops.”
(Courtesy of Alex Pulver, who has graciously allowed me to quote him here by name, adding that “he stands firmly behind this 😂”.)
As it happens, I have been aggressively advocating for the model I believe my friend is describing here, where software developers are empowered (and expected) to own their code in production, for approximately the past decade. No argument!
But “devs should own the full system, including infra and ops”?
We need to talk.

I do not think ‘ops’ means what you think it means
In software—and only in software—ops has become a dirty word. Nobody wants to claim it. Operations teams got renamed to DevOps teams, SRE, infrastructure, production engineering, or more recently, platform engineering teams. Anything but ops.
Ops means Toil! Hashtag #NoOps!
Number one, this is fucking ridiculous.
What’s wrong with operations? Ops is not a synonym for toil; it literally means “get shit done as efficiently as possible”. Every function has an operational component at scale: business ops, marketing ops, sales ops, product ops, design ops and everything else I could think of to search for, and so far as I can tell, none of them are treated with anything like the disrespect, dismissal and outright contempt that software engineering[1] has chosen to heap upon its operational function.
Number two…what happened?
I can think of a number of contributing factors (APIs and cloud computing, soaring profit margins, etc.), but I can also think of one easy, obvious, mustache-twirling villain, which would make a better story for you AND less work for me. (Root cause analysis wins again!! ✊)
Whose fault is it?
Google. It’s Google’s fault.
I know this, because I asked Google and it told me so.

(What? This is a free substack, not science.)
Here’s what I think happened. I think Google came out swinging, saying traditional operations could not scale and needed to become more like software engineering, and it was exactly the right message at exactly the right time, because cloud computing, APIs, SaaSes and so forth were just coming online and making it possible to manage systems more programmatically.
So far so good. But a crucial distinction got lost in translation, when we started defining this as developers (people who write code: good) vs operators (people who do things by hand: bad), which is what set us on the slippery slope to where we are today, where the entire business-critical function of operations engineering is widely seen as backwards and incompetent.
Dev vs Ops is a separation of concerns
The difference between “dev” and “ops” is not about whether or not you can write code. Dude, it’s 2026: everyone writes software.
The difference between dev and ops is a separation of concerns.

If your concern is building new features and products to attract customers and generate new revenue, then congrats: you’re a dev. (But you knew that.)
If your concern is building core services and protecting their ability to serve customers in the face of any and all threats (including, at the top of the list, your own developers): congratulations slash I’m sorry, but you are, in fact, in ops.

Both of these functional concerns are vital, as in “you literally can’t survive without them”, and complementary. You need product developers to be focused on building features and products, caring deeply about the experience of each user, and looking for ways to add value to the business. You need operations to provide a resilient, scalable, efficient base for those products to run on.
The hardest technical problems are found in ops
Ops is not “toil”. It does not mean “dummies who can’t program good”. Operations engineering is not easier or lesser than writing software to build products and features.
What’s darkly ironic is that, if anything, the opposite is true.
Product engineering is typically much simpler than infrastructure engineering—in part, of course, because one of the key functions of operations is to make it as easy as possible to build and ship products. Operations absorbs the toughest technical problems and provides a surface layer for product development that is simple, reliable, and easy to navigate.
Not because product engineers are dumb or lesser than (let’s not slip into that trap[2] again!), but because cognitive bandwidth is the scarcest resource in any engineering org, and you want as much of that as possible going towards things that move the business materially forward, instead of wrestling with the messy underbelly.
The hardest technical challenges and the long, stubborn tail of intractable problems have always been on the infrastructure side. That’s why we work so hard to try not to have them: to solve them by partnerships, cloud computing, open source, etc. Anything is better than trying to build them again, starting over from scratch. We know the cost of new code in our bones.[3]
As I have said a thousand times: the closer you get to laying bits down on disk, the more conservative (and afraid) you should be.
The closer you get to user interaction, the more okay it is to get experimental, let AI take a shot, YOLO this puppy.
This is as it should be.

Domain level differences
The difference between dev and ops isn’t about writing code or not. But there are differences. In perspective, priorities, and (often) temperament.
I touched on a number of these in the article I just wrote on feedback loops, so I’m not going to repeat myself here.
The biggest difference I did not mention is that they have different relationships with resources and definitions of success.
Infrastructure is a cost center. You aren’t going to make more money if you give ten laptops to everyone in your company, and you aren’t going to make more money by over-spending on infrastructure, either. Great operations engineers and architects never forget that cost is a first class citizen of their engineering decisions.
You can, in theory, make more money by spending more on product engineering. This is what we refer to as an “investment”, although sometimes it seems to mean “engineers who forget their time costs money”.
(Sorry, that was rude.)
What about platform engineering?
“What about platform engineering?” Baby, that’s ops in dress-up.
A bit less flippantly: I like my friend Abby Bangser’s quote: “platforms should encode things that are unique to your business but common to your teams”, and I like Jack Danger’s stick art, and his observation that “The only thing that naturally draws engineers to look at the middle of their system is pure blinding rage.”
What I love about the platform engineering movement is that it has brought design thinking and product development practices to the operational domain.
Yes, we should absolutely be treating our product developers like customers, and thinking critically about the interfaces we give them. Yes, there is a middle layer between infrastructure and product engineering, with patterns and footguns of its very own.
Also yes: from a functional perspective, platform engineering is still ops. (Or at least, more ops than not.)

Does it matter what we call it?
Yeah, I kinda think it does.
All these trendy naming schemes do not change the core value of operations, which is to consolidate and efficiently serve the revenue-generating parts of the function.[4] This is as true in technology as it is in sales or marketing. Running away from the term and denying your purpose muddies the water and causes confusion at the exact point where clarity is most needed.
An engineering team needs to know if they are oriented towards efficiency or investment. It changes how you hire, how you build, how you think about success and measure progress. It changes not only your appetite for risk, but what counts as a risk in the first place. You can’t optimize for both at once.
They also need to know whether they are responsible for the business logic or the platform it runs on.
Why? Because no one can do everything. Telling devs to own their code is one thing. (Great.) Asking them to own their code and the entire technological iceberg beneath it is wholly another. The more surface area you ask someone to master and attend to, the less focus you can expect from them in any given place. Do you want your revenue-generating teams generating revenue, or not?
If you can’t separate these concerns at the moment, maybe that’s something to work towards. Which is going to be hard to do, if we can’t talk about the function of operations without half the room running away and the remaining half squawking “toil!”
Naming is a form of respect
Operational rigor and excellence are not, how shall I say this…not yet something you can take for granted in the tech industry. The most striking thing about the 2025 DORA report was that the majority of companies report that AI is just adding more chaos to a system already defined by chaos. In other words, most companies are bad at ops.
To some extent, this is because the problems are hard. To a larger extent, I think it’s the cause (and result) of our wholesale abandonment of operations as a term of pride.
It’s another fucking feedback loop. Ambitious young engineers get the message that being associated with ops is bad, so they run away from those teams. Managers and execs want to recruit great talent and make jobs sound enticing, so they adopt trendy naming schemes to make it clear this work is not ops.
If you want to do something well, historically speaking, this is not the way. The way to build excellence is to name it for what it is, build communities of practice, raise the bar, and compensate for a job well done.
Or as one prognosticator said, way back in 2016:
I think it’s time to bring back “operations” as a term of pride. As a thing that is valued, and rewarded.
“Operations” comes with baggage, no doubt. But I just don’t think that distance and denial are an effective approach for making something better, let alone trash talking and devaluing the skill sets that you need to deliver quality services.
You don’t make operational outcomes magically better by renaming the team “DevOps” or “SRE” or anything else. You make it better by naming it and claiming it for what it is, and helping everyone understand how their role relates to your operational objectives.
Wow. Truly, I couldn’t have said it better myself.
Footnotes
[1] Not to get all Jungian on you all, but part of me has to wonder if “ops” represents the shadow self to software engineers, the parts of yourself that you hate and despise and are most insecure about (I am weak and bad at coding!! I might be automated out of a job!), and thus need to project onto some externalized other that can be safely loathed from a distance.
[2] This might surprise you youngsters, but there was a time when systems folks were clearly the cool kids and developers were considered rather dim. Devs had to know data structures and algorithms, but sysadmins had to know everything. These things tend to come and go in cycles, so we may as well not shit on each other, eh?
[3] As my friend Peter van Hardenberg likes to say, “The best code is no code at all. The second best code is code someone else writes and maintains for you. The worst code is the code you have to write and maintain yourself.” If it would fit on my knuckles, I would get this in knuckle tatts.
[4] Would it be helpful to acknowledge that IT/ops serves an entirely different function than software engineering operations for production systems? Because it absolutely does.