The Truth About “MEH-TRICS”

First published on 2022-04-13 at https://www.honeycomb.io/blog/truth-about-meh-trics-metrics

A long time ago, in a galaxy far, far away, I said a lot of inflammatory things about metrics.

“Metrics are shit salad.”

“Metrics are simply nerfed dimensions.”

“Metrics suck,” “metrics are legacy,” “metrics and time series aggregates will fucking kneecap you.”

I cannot tell a lie; Twitter will testify that I’ve spent the past six years ragging on metrics. So much so that ever since we launched Honeycomb Metrics last year, our poor solution architects have been encountering skeptics in the field who repeat my quotes back to them and ask, dubiously, whether Honeycomb Metrics are any good or not, and whether we genuinely plan on investing in it or not, given our known anti-metrics sympathies.

That’s a great question. 😊

Metrics aren’t worthless; they’re just limited.

Metrics are a mature technology that’s been around for over 30 years, and they have some real advantages. They’re tiny, fast, and cheap; you can hold a bunch of them in memory as counters, summaries, and gauges. They aggregate well and take up a fixed amount of storage space. The entire monitoring industry is built on top of metrics.

When it comes to workloads like, “How heavy is the write load on my hard drive?” or “What is the temperature or fan status inside my chassis?” or “What is the traffic rate in and out of this interface on my switch?”  metrics are what you should use. In fact, pretty much any time you want to know the health of a system or component in toto, metrics are the right tool.

Because that’s what metrics do best—report statistics in aggregate, from the perspective of any system or component. They can tell you that your Ruby HTTP worker pool is 70% utilized or that your nginx webserver is returning 502s 1% of the time. What they can’t tell you is what this means for any one of your users, applications, delivery vehicles, and so forth.

Until recently, metrics-based tools or logs were the only game in town. People were trying to sell us metrics tools for observability use cases, and that’s what got my goat so badly. If you simply append “… for observability” to each of my inflammatory statements, then I stand by them completely.

“Metrics are shit salad … for observability.

Yup, rings true.

You’re never going to make a metrics tool like Prometheus or Datadog into an observability tool. You’re just not. Observability is about unknown-unknowns, while metrics are a tool for known-unknowns.

If you need a refresher on the differences between observability and monitoring, I’ll refer you to pieces like thisthis, and this. What I want to talk about here is slightly different. In a post-observability world, what is the true and proper place for metrics tooling?

Metrics and observability have different use cases.

Metrics aren’t completely useless, even if you have a robust observability presence. We still use metrics at Honeycomb to this day for certain workloads—and always will because they’re the right tool for the job.

There are two kinds of workloads, roughly speaking: your code—the code you write, review, ship, debug and maintain on a daily basis. And other people’s code—the code you have to run and use in order to support your code. Some examples of the latter might be: Linux, Docker, MySql, Amazon RDS, Kafka, AWS Lambda, GCP gateways, memcache, CI/CD pipelines, Kubernetes, etc.

Your code is your crown jewels, the code you need to survive and succeed as a business. It changes constantly—many times per week, if not per day. You are expected to understand its inner workings intimately, and spend lots of time chasing down bugs or understanding and reproducing behavior. You care about the way it performs and interacts with each and every individual user, with changing infrastructure state, and under a variety of different load conditions.

That is why your code demands observability. In order to understand your software, you must first instrument it, in a way that collects lots of rich context and bundles it up around each event end-to-end. Then you need to stream those events into a tool that lets you slice and dice and trace and explore with support for high-cardinality and high-dimensionality data. That’s the only way you’re going to be able to correlate errors, track down outliers, and reflect each user’s experience.

But what about the rest of the software? You can’t instrument Amazon RDS, and only crazy people would instrument, rebuild, and repackage things like Kafka or Docker or nginx. The whole point of third-party software is that you DON’T USE IT until it’s stable enough to be taken more or less for granted. Sure, you roll updates, but usually on the order of months or years—not every day. You don’t need to be intimately familiar with its inner workings because you aren’t changing it every day. Those aren’t your crown jewels.

You do care about their health though, only differently. You care about whether you need to provision more capacity or not. You care about knowing how hard you’re hammering on the underlying hardware or hypervisor. That’s why metrics and monitoring are the right tools to use for third-party code. They don’t let you peer under the hood in the same way, or slice and dice in the same way, but that’s okay. You shouldn’t have to.

With third-party stuff, you don’t care about the code, you care about the health of the service. In aggregate.

(There are some kinds of in-between software, like databases, where event-level information is super useful for debugging things like slow queries and lock percentages, and you can use various black box techniques to approximate observability without instrumentation. But in general this model holds up quite well.)

In a post-observability world, what are metrics for?

I’ve often pointed out that observability is built on top of arbitrarily wide structured data blobs, and that metrics, logs, and traces can be derived from those blobs while the reverse is not true—you can’t take a bunch of metrics and reformulate a rich event.

And yes, people who have observability typically find themselves using metrics and dashboards less and less. They’re simply not as versatile or useful as events that you can slice and dice and manipulate in infinite ways. And you can derive aggregates and trends from the events you have stored.

But metrics will always be useful for understanding third-party software, from the perspective of the service, cluster, or node. They will always be the right tool for the job when it comes to software interfacing with hardware. And they can be super complementary when you are investigating your code using events and instrumentation.

If you’re an engineer writing and shipping code, you’re never not going to want to know if your change caused memory usage to triple, or CPU utilization to skyrocket, or disk usage or network throughput to saturate. That’s why we built Honeycomb Metrics as an overlay, a way to enhance or validate your understanding of the impact your code changes have had on the underlying system.

Metrics are also valuable as a bridge to the past. People have been instrumenting software for metrics for 30 years—they’re never going away completely, and not everything can or should be reinstrumented with events. Lots of people already have robust monitoring systems that slurp in millions of metrics. Nobody wants to have to redo all that work just because they’re moving to a different tool, so people tend to point their metrics firehose at Honeycomb as a way of getting started as they roll observability out into their code.

The Truth About “MEH-TRICS”

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.

Over the past couple of weeks I’ve been tweeting a LOT about lead time to deploy: the interval encompassing the time from when the code gets written and when it’s been deployed to production. Also described as “how long it takes you to run CI/CD.”

How important is this?

Fucking central.

Here is a quickie thread from this week, or just go read “Accelerate” like everybody already should have. 🙃

It’s nigh impossible to have a high-performing team with a long lead time, and becomes drastically easier with a dramatically shorter lead time.

🌷 Shorter is always better.
🌻 One mergeset per deploy.
🌹 Deploy should be automatic.

And it should clock in under 15 minutes, all the way from “merging!” to “deployed!”.

Now some people will nod and agree here, and others freak the fuck out. “FIFTEEN MINUTES?” they squall, and begin accusing me of making things up or working for only very small companies. Nope, and nope. There are no magic tricks here, just high standards and good engineering, and the commitment to maintaining your goals quarter by quarter.

If you get CI/CD right, a lot of other critical functions, behaviors, and intuitions are aligned to be comfortably successful and correct with minimal effort. If you get it wrong, you will spend countless cycles chasing pathologies. It’s like choosing to eat your vegetables every day vs choosing a diet of cake and soda for fifty years, then playing whackamole with all the symptoms manifesting on your poor, mouldering body.

Is this ideal achievable for every team, on every stack, product, customer and regulatory environment in the world? No, I’m not being stupid or willfully blind. But I suggest pouring your time and creative energy into figuring out how closely you can approximate the ideal given what you have, instead of compiling all the reasons why you can’t achieve it.

Most of the people who tell me they can’t do this are quite wrong, turns out. And even if you can’t down to 15 minutes, ANY reduction in lead time will pay out massive, compounding, benefits to your team and adjacent teams forever and ever.

So — what was it you said you were working on right now, exactly? that was so important? 🤔

“Cutting my build time by 90%!” — you

Huzzah. 🤠

So let’s get you started! Here, courtesy of my twitterfriends, is a long compiled list of Likely Suspects and CI/CD Offenders, a long list of anti-patterns, and some unresolved personal pain & suffering to hunt down and question when your build gets slow..

✨15 minutes or bust, baby!✨

 
Where it all started: what keeps you from getting under 15 minute CI/CD runs?

Generally good advice.

  • Instrument your build pipeline with spans and traces so you can see where all your time is going. ALWAYS. Instrument.
  • Order tests by time to execute and likelihood of failure.
  • Don’t run all tests, only tests affected by your change
  • Similarly, reduce build scope; if you only change front-end code, only build/test/deploy the front end, and for heaven’s sake don’t fuss with all the static asset generation
  • Don’t hop regions or zones any more than you absolutely must.
  • Prune and expire tests regularly, don’t wait for it to get Really Bad
  • Combine functionality of tests where possible — tests need regular massages and refactors too
  • Pipeline, pipeline, pipeline tests … with care and intention
  • You do not need multiple non-production environment in your CI/CD process. Push your artifacts to S3 and pull them down from production. Fight me on this
  • Pull is preferable to push. (see below)
  • Set a time elapsed target for your team, and give it some maintenance any time it slips by 25%

The usual suspects

  • tests that take several seconds to init
  • setup/teardown of databases (HINT try ramdisks)
  • importing test data, seeding databases, sometimes multiple times
  • rsyncing sequentially
  • rsyncing in parallel, all pulling from a single underprovisioned source
  • long git pulls (eg cloning whole repo each time)
  • CI rot (eg large historical build logs)
  • poor teardown (eg prior stuck builds still running, chewing CPU, or artifacts bloating over time
  • integration tests that spin up entire services (eg elasticsearch)
  • npm install taking 2-3 minutes
  • bundle install taking 5 minutes
  • resource starvation of CI/CD system
  • not using containerized build pipeline
  • …(etc)
 
Continuous deployment to industrial robots in prod?? Props, man.

Not properly separating the streams of “Our Software” (changes constantly) vs “infrastructure” (changes rarely)

  • running cloudformation to setup new load balancers, dbs, etc for an entire acceptance environment
  • docker pulls, image builds, docker pushes container spin up for tests

“Does this really go here?”

  • packaging large build artifacts into different format for distribution
  • slow static source code analysis tools
  • trying to clone production data back to staging, or reset dbs between runs
  • launching temp infra of sibling services for end-to-end tests, running canaries
  • selenium and other UX tests, transpiling and bundling assets

“Have a seat and think about your life choices.”

  • excessive number of dependencies
  • extreme legacy dependencies (things from the 90s)
  • tests with “sleep” in them
  • entirely too large frontends that should be broken up into modules

“We regret to remind you that most AWS calls operate at the pace of ‘Infrastructure’, not ‘Software'”

  • AWS CodeBuild has several minutes of provisioning time before you’re even executing your own code — even a few distinct jobs in a pipeline and you might suffer 15 min of waiting for CodeBuild to do actual work
  • building a new AMI
  • using EBS
  • spinning up EC2 nodes .. sequentially 😱
  • cool it with the AWS calls basically



A few responses were oozing with some unresolved trauma, lol.

Natural Born Opponents: “Just cache it” and “From the top!”

  • builds install correct version of toolchain from scratch each time
  • rebuilding entire project from source every build
  • failure to cache dependencies across runs (eg npm cache not set properly)

“Parallelization: the cause of, and solution to, all CI problems”

  • shared test state, which prevents parallel testing due to flakiness and non-deterministic test results
  • not parallelizing tests
 
 
I have so many questions….

Thanks to @wrd83, @sorenvind, @olitomli, @barney_parker, @dastbe, @myajpitz, @gfodor, @mrz, @rwilcox, @tomaslin, @pwyliu, @runewake2, @pdehlkefor, and many more for their contributions!

P.S. what did I say about instrumenting your build pipeline? For more on honeycomb + instrumentation, see this thread. Our free tier is incredibly generous, btw ☺️

Stay tuned for more long form blog posts on this topic. Coming soon. 🌈

charity

P.S. this blog post is the best thing i’ve ever read about reducing your build time. EVER.

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.

How to make boba at home…without ruining any pans, making yourself ill, or ending up with a soggy, blobby mess

Last year I was diagnosed with ADHD, which was a great surprise to me (if no one else). Since then I have been trying to pay attention to things I do that might be, let’s say, outside the norm. One of those things is, apparently, food.

I tend to fixate on one food at a time. When I wake up in the morning, it’s the first and only thing I crave. When I’m hungry, I’m dying for it, and I don’t really experience cravings or desire for other foods, although I will eat them to be polite. The phase tends to last for…six months to two years? and then it shifts to something else.

The target of my appetite has been, at various times in the past: honeycrisp apples with peanut butter (I was DEVASTATED when honeycrisp season ended; other apples weren’t the same), dry cheerios with freeze-dried strawberries, chopped broccoli with sharp cheddar, a cashew chicken dish at a now-defunct Thai restaurant, etc.

One year it was manhattans (makers mark, sweet vermouth and bitters) and I seriously worried I was becoming an alcoholic. 🙈

But since September 19th, 2019, the only thing I have been interested in eating is … boba. Those little brown tapioca balls. I can rattle off to you the top boba places in every city I’ve visited since then (LA has some seriously adventurous ones). And when the world strapped in for quarantine, I was on the verge of panic. What to do??

I finally figured out how to make my own boba. This was NONTRIVIAL. It took the sacrifice of countless pans and far too many nights doubled over with nausea and stomach cramps (read my buying tips, I cannot this stress enough), and months of trial and error. But here is how to get the plump, chewy, slightly sweet boba of your dreams.

(Just the boba. Drinks are up to you. I recommend The Boba Book.)

Buying boba.

Do not buy any boba from China. Do not buy any boba labeled “quick cook”, or boba with instructions that are on the order of 5 minutes. Do not buy any flavored boba. I got violently ill from about half a dozen different brands I ordered randomly off Amazon, all made in China. Some had an odd aftertaste.

Supposedly, the Boba Guys are planning to let us buy the stuff they make domestically in California “soon”. Until then, stick to the stuff that is made of tapioca flour only, and manufactured in Taiwan or The U.S.

Also, the little balls are very fragile and turn to powder in the mail unless they are packed very tightly. This boba, from The Tea Zone is what I buy and recommend buying. Pick up some large diameter straws if you don’t have a stash at home.

Equipment.

You need a big-ass pot of boiling water. The biggest pot you’ve got. I use a big soup pot that holds like 16 or 20 quarts.

IMG_0923
Big Ass Pot

If you only have a few quarts of water, you will ruin pans. The tapioca dust turns to gummy that sticks to the sides and bottom and gets baked on like a motherfucker. You want a ratio of SHIT TONS of water to a handful or two of boba.

Cooking.

Fill it up with water to within an inch or two of the top — Bring it to a fast boil, then put your boba in — a cup or two or three, whatever you think you need. Let it boil for 20-25 minutes… only reduce the heat if you have to to keep it from boiling over.

Uncooked boba will have these little white spots in the middle. Once you see only a few of those in a sea of black pearls, turn off the heat. Let it sit in the hot water for another 20-25 minutes.

IMG_0939
Spot the uncooked boba

Then take the pot to the sink, pour off the excess water, fill it back up with cold water, swoosh it around to rinse; pour, fill, rinse a couple times til the balls are rinsed and lukewarm. You don’t have to drain them dry-dry; leave a small bit of water in the pan.

Flavoring and eating.

Add some sweetener — I like brown sugar, but honey is good too, or molasses and white sugar — and let the balls soak for another 30 minutes so they absorb the flavor. Now they are ready to eat. They will only keep for about a day, and don’t refrigerate them or they get gross.

**If you want the syrupy consistency of the gourmet boba shops, leave a little extra water in there, add the sugar, then simmer on low and STIR CONSTANTLY for 5-10 minutes or until it gets syrupy. I cannot stress this enough: rinse the boba first, and do not stop stirring, if you enjoy your pans and want to use them again

The easiest possible recipe (besides eating from the pot with a spoon) is, fill a glass 1/3 of the way with boba, add milk, add brown sugar simple syrup to taste. Add a couple ice cubes if you like your boba on the firm side. Also, try adding a little bit of rum and Frangelico for your bedtime boba.

Cheers!

IMG_9586
Boba, milk, frangelico

How to make boba at home…without ruining any pans, making yourself ill, or ending up with a soggy, blobby mess

Questionable Advice: Can Engineering Productivity Be Measured?

I follow you on Twitter and read your blog.  I particularly enjoy this post: https://charity.wtf/2019/05/01/friday-deploy-freezes-are-exactly-like-murdering-puppies/ I’m reaching out looking for some guidance.

I work as an engineering manager for a company whose non-technology leadership insists there has to be a way to measure the individual productivity of a software engineer. I have the opposite belief. I don’t believe you can measure the productivity of “professional” careers, or thought workers (ex: how do measure productivity of a doctor, lawyer, or chemist?).

For software engineering in particular, I feel that metrics can be gamed, don’t tell the whole story, or in some cases, are completely arbitrary. Do you measure individual developer productivity? If so, what do you measure, and why do you feel it’s valuable? If you don’t and share similar feelings as mine, how would you recommend I justify that position to non-technology leadership?

Thanks for your time.

Anonymous Engineering Manager

Dear Anon,

Once upon a time I had a job as a sysadmin, 100% remote, where all work was tracked using RT tasks. I soon realized that the owner didn’t have a lot of independent technical judgment, and his main barometer for the caliber of our contributions was the number of tasks we closed each day.

I became a ticket-closing machine. I’d snap up the quick and easy tasks within seconds. I’d pattern match and close in bulk when I found a solution for a group of tasks. I dove deep into the list of stale tickets looking for ones I could close as “did not respond” or “waiting for response”, especially once I realized there was no penalty for closing the same ticket over and over.

My boss worshiped me. I was bored as fuck. Sigh.

I guess what I’m trying to say is, I am fully in your camp. I don’t think you can measure the “productivity” of a creative professional by assigning metrics to their behaviors or process markers, and I think that attempting to derive or inflict such metrics can inflict a lot of damage.

In fact, I would say that to the extent you can reduce a job to a set of metrics, that job can be automated away. Metrics are for easy problems — discrete, self-contained, well-understood problems. The more challenging and novel a problem, the less reliable these metrics will be.

Your execs should fucking well know this: how would THEY like to be evaluated based on, like, how many emails they send in a day? Do they believe that would be good for the business? Or would they object that they are tasked with the holistic success of the org, and that their roles are too complex to reduce to a set of metrics without context?

This actually makes my blood boil. It is condescending as fuck for leadership to treat engineers like task-crunching interchangeable cogs. It reveals a deep misunderstanding of how sociotechnical systems are developed and sustained (plus authoritarian tendencies, and usually a big dollop of personal insecurity).

But what is the alternative?

In my experience, the “right” answer, i.e. the best way to run consistently high-performing teams, involves some combination of the following:

  • Outcome-based management that practices focusing on impact, plus
  • Team level health metrics, combined with
  • Engineering ladder and regular lightweight reviews, and
  • Managers who are well calibrated across the org, and encouraged to interrogate their own biases openly & with curiosity.

The right way to look at performance is at the team level. Individual engineers don’t own or maintain code; teams do. The team is the irreducible unit of ownership. So you need to incentivize people to think about work and spending their time cooperatively, optimizing for what is best for the team.

Some of the hardest and most impactful engineering work will be all but invisible on any set of individual metrics. You want people to trust that their manager will have their backs and value their contributions appropriately at review time, if they simply act in the team’s best interest. You do not want them to waste time gaming the metrics or courting personal political favor.

This is one of the reasons that managers need to be technical — so they can cultivate their own independent judgment, instead of basing reviews on hearsay. Because some resources (i.e. your budget for individual bonuses) are unfortunately zero-sum, and you are always going to rely on the good judgment of your engineering leaders when it comes to evaluating the relative impact of individual contributions.

“I would say that Joe’s contribution this quarter had greater impact than Jane’s. But is that really true? Jane did a LOT of mentoring and other “glue” work, which tends to be under-acknowledged as leadership work, so I just want to make sure I am evaluating this fairly … Does anyone else have a perspective on this? What might I be missing?” — a manager keeping themselves honest in calibrations

I do think every team should be tracking the 4 DORA metrics — time elapsed between merge and deploy, frequency of deploy, time to recover from outages, duration of outages — as well as how often someone is paged outside of business hours. These track pretty closely to engineering productivity and efficiency.

But leadership should do its best to be outcome oriented. The harder the problem, the more senior the contributor, the less business anyone has dictating the details of how or why. Make your agreements, then focus on impact.

This is harder on managers, for sure — it’s easier to count the hours someone spends at their desk or how many lines of code they commit than to develop a nuanced understanding of the quality and timbre of an engineer’s contributions to the product, team and the company over time. It is easier to micromanage the details than to negotiate a mutual understanding of what actually matters, commit to doing your part … and then step away, trusting them to fill in the gaps.

But we should expect this; it’s worth it. It is in those gaps where we feel trusted to act that we find joy and autonomy in our labor, where we do our best work as skilled artisans.

Questionable Advice: Can Engineering Productivity Be Measured?

Quarantine Reading Queue on the “Tiger King” Phenomenon

Last Wednesday I walked into my living room and saw three gay rednecks in hot pink shirts being married as a “throuple” on a TV screen at close range, followed by one of the grooms singing a country song about a woman feeding her husband’s remains to her tigers.

I could not look away.  What the fuck.

If you too have been rubbernecking the Tiger King — at any range — I have a book that will help you make sense of things: “Blood Rites: On The Origins and History of the Passions of War“, by Barbara Ehrenreich[1].  I re-read it last night, and here is my book report.


throuple

 

In Blood Rites, Ehrenreich asks why we sacralize war.  Not why we fight wars, or why we are violent necessarily, but why we are drawn to the idea of war, why we compulsively imbue it with an aura of honor and noble sacrifice.  If you kill one person, you’re a murderer and we shut you out from society; kill ten and you are a monster; but if you kill thousands, or kill on behalf of the state, we give you medals and write books about you.

And it’s not only about scale or being backed by state power.  The calling of war brings out the highest and finest experiences our species can know: it sings of heroism and altruism, of discipline, self-sacrifice, common ground, a life lived well in service; of belonging to something larger than one’s self.  Even if, as generations of weary returning soldiers have told us, it remains the same old butchery on the ground, the near-religious allure of war is never dented for long in the popular imagination.

What the fuck is going on?  bloodrites

Ehrenreich is impatient with the traditional scholarship, which locates the origin of war in some innate human aggression or turf wars over resources.  She is at her dryly funniest when dispatching feminist theories about violence being intrinsically male or “testosterone poisoning”, showing that the bloodthirstiest of the gods have usually been feminine.  (Although there are fascinating symmetries between girls becoming women through menstruation, and boys becoming men through … some form of culturally sanctioned ritual, usually involving bloodshed.)

Rather, she shows that our sacred feelings towards blood shed in war are the direct descendents of our veneration of blood shed in sacrifice — originally towards human sacrifice and other animal sacrifice, in a reenactment of our own ever-so-recent role inversion from prey to predator.  Prehistoric sacrifice was likely a way of exerting control over our environment and reenacting the death that gave us life through food.

In her theory, humans do not go to war because we are natural predators. Just the blink of an eye ago, on an evolutionary scale, humans were not predators by any means: we were prey.  Weak, blind, deaf, slow, clawless and naked; we scrawny, clever little apes we were easy pickings for the many large carnivores who roamed the planet.  We scavenged in the wake of predators and worshiped them as gods.  We are the nouveaux riche of predators, constantly re-asserting our dominance to soothe our insecurities.

We go to war not because we are predators, in other words, but because we are prey — and this makes us very uncomfortable!  War exists as a vestigial relic of when we venerated the shedding of blood and found it holy — as anyone who has ever opened the Old Testament can attest.  It was not until the Axial Age that religions of the world underwent a wholesale makeover into a less bloody, more universalistic set of aspirations.  ashes

When I first read this book, years ago, I remember picking it up with a roll of the eyes.  “Sounds like some overly-metaphorical liberal academic nonsense” or something like that.  But I was hooked within ten pages, my mind racing ahead with even more evidence than she marshals in this lively book.  It shifted the way I saw many things in the world.

Like horror movies, for example.  Or why cannibalism is so taboo.  How Jesus became the Son of God, the Brothers’ Grimm, the sacrament of Communion.  The primal fear of being food still resonates through our culture in so many sublimated ways.

And whether what you’re watching is “Tiger King” or the Tiger-King-watchers, it will make A LOT more sense after reading this book too.

Stay safe and don’t kill each other,

charity

IMG_6288

 

[1]  Ehrenreich is best known for her stunning book on the precariousness of the middle class, “Nickel and Dimed”, where she tried to subsist for a year only on whatever work she could get with a high school education.  Ehrenreich is a journalist, and this is a piece of science journalism, not scientific research; yet it is well-researched and scrupulously cited, and it’s worth noting that she has a PhD in biology and was once a practicing scientist.

 

 

Quarantine Reading Queue on the “Tiger King” Phenomenon

Questionable Advice: “After Being A Manager, Can I Be Happy As A Cog?”

One of my stretch goals for 2019 was to start writing an advice column.  I get a lot of questions about everything under the sun: observability, databases, career advice, management problems, what the best stack is for a startup, how to hire and interview, etc.  And while I enjoy this, having a high opinion of my own opinions and all, it doesn’t scale as well as writing essays.  I do have a (rather all-consuming) day job.

So I’d like to share some of the (edited and lightly anonymized) questions I get asked and some of the answers I have given.  With permission, of course.  And so, with great appreciation to my anonymous correspondent for letting me publish this, here is one.

Hi Charity,

I’ve been in tech for 25 years.  I don’t have a degree, but I worked my way up from menial jobs to engineering, and since then I have worked on some of the biggest sites in the world.  I have been offered a management role many times, but every time I refused.  Until about two years ago, when I said “fuck it, I’m almost 40; why not try.”

I took the job with boundless enthusiasm and motivation, because the team was honestly a mess.  We were building everything on-prem, and ops was constantly bullying developers over their supposed incompetence.  I had gone to conferences, listened to podcasts, and read enough blog posts that my head was full of “DevOps/CloudNative/ServiceOriented//You-build-it-you-run-it/ServantLeaders” idealism.  I knew I couldn’t make it any worse, and thought maybe, just maybe I could even make it better.softwarenegineeroncall_2 (1)

Soon after I took the job, though, there were company-wide layoffs.   It was not done well, and morale was low and sour.  People started leaving  for happier pastures.  But I stayed.  It was an interesting challenge, and I threw my heart and soul into it.

For two years I have stayed and grinded it out: recruiting (oh that is so hard), hiring, and then starting a migration to a cloud provider, and with the help of more and more people on the new team, slowly shifted the mindset of the whole engineering group to embrace devops best practices.  Now service teams own their code in production and are on-call for them, migrate themselves to the cloud with my team supporting them and building tools for them.  It is almost unrecognizable compared to where we were when I began managing.

A beautiful story isn’t it?  I hope you’re still reading.  🙂

Now I have to say that with my schedule full of 1:1s, budgeting, hiring, firing, publishing papers of mission statements and OKRs, shaping the teams, wielding influence, I realized that I enjoyed none of the above.  I read your 17 reasons not to be a manager, and I check so many boxes.  It is a pain in the ass to constantly listen to people’s egos, talk to them and keep everybody aligned (which obviously never happens).  And of course I am being crushed between top-down on-the-spot business decisions and bottom-up frustration of poorly executed engineering work under deadlines.  I am also destroyed by the mistrust and power games I am witnessing (or involved in, sometimes). while I long for collaboration and trust.  And of course when things go well my team gets all the praise, and when things go wrong I take all the blame.  I honestly don’t know how one can survive without the energy provided by praise and a sense of achievement.

All of the above makes me miss being an IC (Individual Contributor), where I could work for 8 hours straight without talking to anyone, build stuff, say what I wanted when I wanted, switch jobs if I wasn’t happy, and basically be a little shit like the ones you mention in your article.

Now you may say it’s obvious: I should find a new IC job in a healthier company.  You even wrote about it.  Going back to IC after two years of management is actually a good move.

But when I think about doing it, I get stuck.  I don’t know if I would be able to do it again, or if I could still enjoy it.  I’ve seen too many things, I’ve tasted what it’s like to be (sometimes) in control, and I did have a big impact on the company’s direction over time.  I like that.  If I went back to being an IC, I would feel small and meaningless, like just another cog in the machine.  And of course, being 40-ish, I will compete with all those 20-something smartasses who were born with kubernetes.

Thank you for reading.  Could you give me your thoughts on this?  In any case, it was good to get it off my chest.

Cheers,

Cog?

Dear Cog?,

Holy shitballs!  What an amazing story!  That is an incredible achievement in just two years, let alone as a rookie manager.  You deserve huge props for having the vision, the courage, and the tenacity to drive such a massive change through.

Of COURSE you’re feeling bored and restless.  You didn’t set out on a glorious quest for a life of updating mission statements and OKRs, balancing budgets, tending to people’s egos and fluffing their feelings, tweaking job descriptions, endless 1x1s and meetings meetings meetings, and the rest of the corporate middle manager’s portfolio.  You wanted something much bigger.  You wanted to change the world.  And you did!

But now you’ve done it.  What’s next?testinprod_3

First of all, YOUR COMPANY SUCKS.  You don’t once mention your leadership — where are they in all this?  If you had a good manager, they would be encouraging you and eagerly lining up a new and bigger role to keep you challenged and engaged at work.  They are not, so they don’t deserve you.  Fuck em.  Please leave.

Another thing I am hearing from you is, you harbor no secret desire to climb the managerial ranks at this time.  You don’t love the daily rhythms of management (believe it or not, some do); you crave novelty and mastery and advancement.  It sounds like you are willing to endure being a manager, so long as that is useful or required in order to tackle bigger and harder problems.  Nothing wrong with that!  But when the music stops, it’s time to move on.  Nobody should be saddled with a manager whose heart isn’t in the work.

You’re at the two year mark.  This is a pivotal moment, because it’s the beginning of the end of the time when you can easily slip back into technical work.  It will get harder and harder over the next 2-3 years, and at some point you will no longer have the option.

Picking up another technical role is the most strategic option, the one that maximizes your future opportunities as a technical leader.  But you do not seem excited by this option; instead you feel many complex and uncomfortable things.  It feels like going backwards.  It feels like losing ground.  It feels like ceding status and power.

“Management isn’t a promotion, it’s a career change.”

But if management is not a promotion, then going back to an engineering role should not feel like a demotion!  What the fuck?!

Imeplusprodt’s one thing to say that.  Whether it’s true or not is another question entirely, a question of policy and org dynamics.  The fact is that in most places, most of the power does go to the managers, and management IS a promotion.  Power flows naturally away from engineers and towards managers unless the org actively and vigorously pushes back on this tendency by explicitly allocating certain powers and responsibilities to other roles.

I’m betting your org doesn’t do this.  So yeah, going back to being an IC WILL be a step down in terms of your power and influence and ability to set the agenda.  That’s going to feel crappy, no question. We humans hate that.

Three points.

      1. You cannot go back to doing exactly what you did before, for the very simple reason that you are not the same person.  You are going to be attuned to power dynamics and ways of influencing that you never were before — and remember, leadership is primarily exercised through influence, not explicit authority.Senior ICs who have been managers are supremely powerful beings, who tend to wield outsize influence.  Smart managers will lean on them extensively for everything from shadow management and mentorship to advice, strategy, etc.  (Dumb managers don’t.  So find a smart manager who isn’t threatened by your experience.)
      2. You’re a short-timer here, remember?  Your company sucks.  You’re just renewing your technical skills and pulling a paycheck while finding a company that will treat you better, that is more aligned with your values.
      3. Lastly (and most importantly), I have a question.  Why did you need to become a manager in order to drive sweeping technical change over the past two years?  WHY couldn’t you have done it as a senior IC?  Shouldn’t technical people be responsible for technical decisions, and people managers responsible for people decisions?
        Could this be your next challenge, or part of it?  Could you go back to being an engineer, equipped with your shiny new powers of influence and mystical aura of recent management experience, and use it to organize the other senior ICs to assert their rightful ownership over technical decisions?  Could you use your newfound clout with leadership and upper management to convince them that this will help them recruit and retain better talent, and is a better way to run a technical org — for everyone?

     

I believe this is a better way, but I have only ever seen these changes happen when agitated for and demanded by the senior ICs.  If the senior ICs don’t assert their leadership, managers are unlikely to give it to them.  If managers try, but senior ICs don’t inhabit their power, eventually the managers just shrug and go back to making all the decisions.  That is why ultimately this is a change that must be driven and owned — at a minimum co-owned — by the senior individual contributors.Shared Joys, Kittens

I hope you can push back against that fear of being small and meaningless as an individual contributor.  The fact that it very often is this way, especially in strongly hierarchical organizations, does not mean that it has to be this way; and in healthy organizations it is not this way.  Command-and-control systems are not conducive to creative flourishing.  We have to fight the baggage of the authoritarian structures we inherited in order to make better ones.

Organizations are created afresh each and every day — not created for us, but by us.  Help create the organization you want to work at, where senior people are respected equally and have domains of ownership whether they manage people or technology.  If your current gig won’t value that labor, find one that will..

They exist.  And they want to hire you.

Lots of companies are DYING to hire this kind of senior IC, someone who is still hands on yet feels responsibility for the team as a whole, who knows the business side, who knows how to mentor and craft a culture and can herd cats when nec

There are companies that know how to use ICs at the strategic level, even executive level.  There are bosses who will see you not as a threat, but as a *huge asset* they can entrust with monumental work.

As a senior contributor who moves fluidly between roles, you are especially well-equipped to help shape a sociotechnical organization.  Could you make it your mission to model the kind of relationship you want to see between management and ICs, whichever side you happen to be on?  We need more people figuring out how to build organizations where management is not a promotion, just a change of career, and where going back and forth carries no baggage about promotions and demotions.  Help us.

And when you figure it out, please don’t keep it to yourself.  Expand your influence and share your findings by writing your experiences in blog posts, in articles, in talks.  Tell stories.  Show people people how much better it is this way.  Be so magnificently effective and mysteriously influential as a senior IC that all the baby engineers you work with want to grow up to be just like you.

Hope this helps.


charity

P.S. — Oh and stop fretting about “competing” with the 20-somethings kuberneteheads, you dork. You have been learning shit your whole career and you’ll learn this shit too.  The tech is the easy part.  The tech will always be the easy part.  🙂

Questionable Advice: “After Being A Manager, Can I Be Happy As A Cog?”

Outsource Your O11y: Now Roll It Out And Keep Them Happy (part 3/3)

This is part three of a three-part series of guest posts:

  1. How To Be A Champion, on how to choose a third-party vendor and champion them successfully to your security team.  (George Chamales)
  2. Get Aligned With Security, how to work with your security team to find the best possible outcome for all sides (Lilly Ryan)
  3. Now Roll It Out And Keep Them Happy, on how to operationalize your service by rolling out the integration and maintaining it — and the relationship with your security team — over the long run (Andy Isaacson)

All this pain will someday be worth it.  🙏❤️  charity + friends


“Now Roll It Out And Keep Them Happy”

This is the third in a series of blog posts; previously we analyzed the security challenges of using a third party service, and we worked together with the security team to build empathy to deliver the project.  You might want to read those first, since we are going to build on a lot of the ideas there to ship and maintain this integration.

Ready for launch

You’ve convinced the security team and other stakeholders, you’ve gotten the integration running, you’re getting promising results from dev-test or staging environments… now it’s time to move from proof-of-concept to full implementation.  Depending on your situation this might be a transition from staging to production, or it might mean increasing a feature flipper flag from 5% to 100%, or it might mean increasing coverage of an integration from one API endpoint to cover your entire developer footprint.

Taking into account Murphy’s Law, we expect that some things will go wrong during the rollout.  Perhaps during coverage, a developer realizes that the schema designed to handle the app’s event mechanism can’t represent a scenario, requiring a redesign or a hacky solution.  Or perhaps the metrics dashboard shows elevated error rates from the API frontend, and while there’s no smoking gun, the ops oncall decides to rollback the integration Just In Case it’s causing the incident.

This gives us another chance to practice empathy — while it’s easy, wearing the champion hat, to dismiss any issues found by looking for someone to blame, ultimately this poisons trust within your organization and will hamper success.  It’s more effective, in the long run (and often even in the short run), to find common ground with your peers in other disciplines and teams, and work through to solutions that satisfy everybody.

Keeping the lights on

In all likelihood as integration succeeds, the team will rapidly develop experts and expertise, as well as idiomatic ways to use the product.  Let the experts surprise you; folks you might not expect can step up when given a chance.  Expertise flourishes when given guidance and goals; as the team becomes comfortable with the integration, explicitly recognize a leader or point person for each vendor relationship.  Having one person explicitly responsible for a relationship lets them pay attention to those vendor emails, updates, and avoid the tragedy of the “but I thought *you* were” commons.  This Integration Lead is also a center of knowledge transfer for your organization — they won’t know everything or help every user come up to speed, but they can help empower the local power users in each team to ramp up their teams on the integration.

As comfort grows you will start to consider ways to change your usage, for example growing into new kinds of data.  This is a good time to revisit that security checklist — does the change increase PII exposure to your vendor?  Would the new data lead to additional requirements such as per-field encryption?  Don’t let these security concerns block you from gaining valuable insight using the new tool, but do take the chance to talk it over with your security experts as appropriate.

Throughout this organic growth, the Integration Lead remains core to managing your changing profile of usage of the vendor they shepherd; as new categories of data are added to the integration, the Lead has responsibility to ensure that the vendor relationship and risk profile are well matched to the needs that the new usage (and presumably, business value) is placing on the relationship.

Documenting the Intergation Lead role and responsibilities is critical. The team should know when to check in, and writing it down helps it happen.  When new code has a security implication, or a new use case potentially amplifies the cost of an integration, bringing the domain expert in will avoid unhappy surprises.  Knowing how to find out who to bring in, and when to bring them in, will keep your team getting the right eyes on their changes.

Security threats and other challenges change over time, too.  Collaborating with your security team so that they know what systems are in use helps your team take note of new information that is relevant to your business. A simple example is noting when your vendors publish a breach announcement, but more complex examples happen too — your vendor transitions cloud providers from AWS to Azure and the security team gets an alert about unexpected data flows from your production cluster; with transparency and trust such events become part of a routine process rather than an emergency.

It’s all operational

Monitoring and alerting is a fact of operations life, and this has to include vendor integrations (even when the vendor integration is a monitoring product.)  All of your operations best practices are needed here — keep your alerts clean and actionable so that you don’t develop pager fatigue, and monitor performance of the integration so that you don’t get blindsided by a creeping latency monster in your APIs.

Authentication and authorization are changing as the threat landscape evolves and industry moves from SMS verification codes to U2F/WebAuthn.  Does your vendor support your SSO integration?  If they can’t support the same SSO that you use everywhere else and can’t add it — or worse, look confused when you mention SSO — that’s probably a sign you should consider a different vendor.

A beautiful sunset

Have a plan beforehand for what needs to be done should you stop using the service.  Got any mobile apps that depend on APIs that will go away or start returning permission errors?  Be sure to test these scenarios ahead of time.

What happens at contract termination to data stored on the service?  Do you need to explicitly delete data when ceasing use?

Do you need to remove integrations from your systems before ending the commercial relationship, or can the technical shutdown and business shutdown run in parallel?

In all likelihood these are contingency plans that will never be needed, and they don’t need to be fully fleshed out to start, but a little bit of forethought can avoid unpleasant surprises.

Year after year

Industry best practice and common sense dictate that you should revisit the security questionnaire annually (if not more frequently). Use this chance to take stock of the last year and check in — are you getting value from the service?  What has changed in your business needs and the competitive landscape? 

It’s entirely possible that a new year brings new challenges, which could make your current vendor even more valuable (time to negotiate a better contract rate!) or could mean you’d do better with a competing service.  Has the vendor gone through any major changes?  They might have new offerings that suit your needs well, or they may have pivoted away from the features you need. 

Check in with your friends on the security team as well; standards evolve, and last year’s sufficient solution might not be good enough for new requirements.

 

Andy thinks out loud about security, society, and the problems with computers on Twitter.


 

❤️ Thanks so much reading, folks.  Please feel free to drop any complaints, comments, or additional tips to us in the comments, or direct them to me on twitter.

Have fun!  Stay (a little bit) Paranoid!!

— charity

Outsource Your O11y: Now Roll It Out And Keep Them Happy (part 3/3)

Outsource Your O11y: Get Aligned With Security (part 2/3)

This is part two of a three-part series of guest posts:

  1. How To Be A Champion, on how to choose a third-party vendor and champion them successfully to your security team.  (George Chamales)
  2. Get Aligned With Security, how to work with your security team to find the best possible outcome for all sides (Lilly Ryan)
  3. Now Roll It Out And Keep Them Happy, on how to operationalize your service by rolling out the integration and maintaining it — and the relationship with your security team — over the long run (Andy Isaacson)

All this pain will someday be worth it.  🙏❤️  charity + friends


“Get Aligned With Security”

by Lilly Ryan

If your team has decided on a third-party service to help you gather data and debug product issues, how do you convince an often overeager internal security team to help you adopt it?

When this service is something that provides a pathway for developers to access production data, as analytics tools often do, making the case for access to that data can screech to a halt at the mention of the word “production”. Progressing past that point will take time, empathy, and consideration.

I have been on both sides of the “adopting a new service” fence: as a developer hoping to introduce something new and useful to our stack, and now as a security professional who spends her days trying to bust holes in other people’s setups. I understand both sides of the sometimes-conflicting needs to both ship software and to keep systems safe.  

This guide has advice to help you solve the immediate problem of choosing and deploying a third-party service with the approval of your security team.  But it also has advice for how to strengthen the working relationship between your security and development teams over the longer term. No two companies are the same, so please adapt these ideas to fit your circumstances.

Understanding the security mindset

The biggest problems in technology are never really about technology, but about people. Seeing your security team as people and understanding where they are coming from will help you to establish empathy with them so that both of you want to help each other get what you want, not block each other.

First, understand where your security team is coming from. Development teams need to build features, improve the product, understand and ship good code. Security teams need to make sure you don’t end up on the cover of the NYT for data breaches, that your business isn’t halted by ransomware, and that you’re not building your product on a vulnerable stack.

This can be an unfamiliar frame of mind for developers.  Software development tends to attract positive-minded people who love creating things and are excited about the possibilities of new technology. Software security tends to attract negative thinkers who are skilled at finding all the flaws in a system.  These are very different mentalities, and the people who occupy them tend to have very different assumptions, vocabularies, and worldviews.   

But if you and your security team can’t share the same worldview, it will be hard to trust each other and come to agreement.  This is where practicing empathy can be helpful.

Before approaching your security team with your request to approve a new vendor, you may want to run some practice exercises for putting yourselves in their shoes and forcing yourselves to deliberately cultivate a negative thinking mindset to experience how they may react — not just in terms of the objective risk to the business, or the compliance headaches it might cause, but also what arguments might resonate with them and what emotional reactions they might have.

My favourite exercise for getting teams to think negatively is what I call the Land Astronaut approach.

The “Land Astronaut” Game

Imagine you are an astronaut on the International Space Station. Literally everything you do in space has death as a highly possible outcome. So astronauts spend a lot of time analysing, re-enacting, and optimizing their reactions to events, until it becomes muscle memory. By expecting and training for failure, astronauts use negative thinking to anticipate and mitigate flaws before they happen. It makes their chances of survival greater and their people ready for any crisis.

Your project may not be as high-stakes as a space mission, and your feet will most likely remain on the ground for the duration of your work, but you can bet your security team is regularly indulging in worst-case astronaut-type thinking. You and your team should try it, too.

The Game:

Pick a service for you and your team to game out.  Schedule an hour, book a room with a whiteboard, put on your Land Astronaut helmets.  Then tell your team to spend half an hour brainstorming about all the terrible things that can happen to that service, or to the rest of your stack when that service is introduced.  Negative thoughts only!

Start brainstorming together. Start out by being as outlandish as possible (what happens if their data centre is suddenly overrun by a stampede of elephants?). Eventually you will find that you’ll tire of the extreme worst case scenarios and come to consider more realistic outcomes — some of which which you may not have thought of outside of the structure of the activity.

After half an hour, or whenever you feel like you’re all done brainstorming, take off your Land Astronaut helmets, sift out the most plausible of the worst case scenarios, and try to come up with answers or strategies that will help you counteract them.  Which risks are plausible enough that you should mitigate them?  Which are you prepared to gamble on never happening?  How will this risk calculus change as your company grows and takes on more exposure?

Doing this with your team will allow you all to practice the negative thinking mindset together and get a feel for how your colleagues in the security team might approach this request. (While this may seem similar to threat modelling exercises you might have done in the past, the focus here is on learning to adopt a security mindset and gaining empathy for this thought process, rather than running through a technical checklist of common areas of concern.)

While you still have your helmets within reach, use your negative thinking mindset to fill out the spreadsheet from the first piece in this series.  This will help you anticipate most of the reasonable objections security might raise, and may help you include useful detail the security team might not have known to ask for.

Once you have prepared your list of answers to George’s worksheet and held a team Land Astronaut session together, you will have come most of the way to getting on board with the way your security team thinks.

Preparing for compromise

You’ve considered your options carefully, you’ve learned how to harness negative thinking to your advantage, and you’re ready to talk to your colleagues in security – but sometimes, even with all of these tools at your disposal, you may not walk away with all of the things you are hoping for.

Being willing to compromise and anticipating some of those compromises before you approach the security team will help you negotiate more successfully.

While your Land Astronaut helmets are still within reach, consider using your negative thinking mindset game to identify areas where you may be asked to compromise. If you’re asking for production access to this new service for observability and debugging purposes, think about what kinds of objections may be raised about this and how you might counter them or accommodate them. Consider continuing the activity with half of the team remaining in the Land Astronaut role while the other half advocates from a positive thinking standpoint. This dynamic will get you having conversations about compromise early on, so that when the security team inevitably raises eyebrows, you are ready with answers.

Be prepared to consider compromises you had not anticipated, and enter into discussions with the security team with as open a mind as possible. Remember the team is balancing priorities of not only your team, but other business and development teams as well.  If you and your security colleagues are doing the hard work to meet each other halfway then you are more likely to arrive at a solution that satisfies both parties.

Working together for the long term

While the previous strategies we’ve covered focus on short-term outcomes, in this continuous-deployment, shift-left world we now live in, the best way to convince your security team of the benefits of a third-party service – or any other decision – is to have them along from day one, as part of the team.

Roles and teams are increasingly fluid and boundary-crossing, yet security remains one of the roles least likely to be considered for inclusion on a software development team. Even in 2019, the task of ensuring that your product and stack are secure and well-defended is often left until the end of the development cycle.  This contributes a great deal to the combative atmosphere that is common.

Bringing security people into the development process much earlier builds rapport and prevents these adversarial, territorial dynamics. Consider working together to build Disaster Recovery plans and coordinating for shared production ownership.

If your organisation isn’t ready for that kind of structural shift, there are other ways to work together more closely with your security colleagues.

Try having members of your team spend a week or two embedded with the security team. You may even consider a rolling exchange – a developer for a security team member – so that developers build the security mindset, and the security team is able to understand the problems your team is facing (and why you are looking at introducing this new service).

At the very least, you should make regular time to meet with the security team, get to know them as people, and avoid springing things on them late in the project when change is hardest.

Riding off together into the sunset…?

If you’ve taken the time to get to know your security team and how they think, you’ll hopefully be able to get what you want from them – or perhaps you’ll understand why their objections were valid, and come up with a better solution that works well for both of you.

Investing in a strong relationship between your development and security teams will rarely lead to the apocalypse. Instead, you’ll end up with a better product, probably some new work friends, and maybe an exciting idea for a boundary-crossing new career in tech.

But this story isn’t over! Once you get the green light from security, you’ll need to think about how to roll your new service out safely, maintain it, and consider its full lifespan within your company.  Which leads us to part three of this series, on rolling it out and maintaining it … both your integration and your relationship with the security team.

 

Lilly Ryan is a pen tester, Python wrangler, and recovering historian from Melbourne. She writes and speaks internationally about ethical software, social identities after death, teamwork, and the telegraph. More recently she has researched the domestic use of arsenic in Victorian England, attempted urban camouflage, reverse engineered APIs, wielded the Oxford comma, and baked a really good lemon shortbread.

Outsource Your O11y: Get Aligned With Security (part 2/3)

Ten Platform Commandments

On Monday I gave a talk at DOES18 called “All the World’s a Platform”, where I talked about a bunch of the lessons learned by using and abusing and running and building platforms at scale.

I promised to do a blog post with the takeaways, so here they are.

Platform Commandment #1: Any time you have to think about one particular user, you have failed in some way.  It doesn’t scale.  Just a few one-offs a day will drag you down and drown your forward momentum.

Corollary: you will always have to do this every day.  Solution: turn one-offs into a support problem, not an engineering problem.

Platform Commandment #2: keep your critical path as small and independent as possible.  Have explicit tiers of importance.  You cannot care about everything equally, sacrifices must be made.

Example: at Parse the core API was tier 1, push was tier 2, website was somewhere down around tier 10.  We always knew what to bring up and care about first.

Platform Commandment #3: It is the job of the platform to protect itself at all costs, including at the expense of your app.

Platform Commandment #4: Remember that your platform is a magical black box to your users.  You can’t expect them to behave reasonably without feedback loops and a rich mental model.  Help them out — esp your super-users.  It will save you time if you can help them help themselves.

Platform Commandment #5: Always expose a visible request id, shard id, uuid, trace id, any other relevant diagnostic information in user-visible errors.  Up to the point where it reveals too much exploitable information about your service, which is probably much farther than you think.  Poorly obfuscated infrastructure decisions are usually less of a threat to your business than befuddled users are.

Platform Commandment #6: Your observability must center your users’ perspective, not your own.  The health of the system doesn’t matter.  The health of every request, and every high-cardinality grouping of requests — those are what matter.

You must be able to care about and inspect the perf and quality from the perspective of every single application and/or user and their users, as richly as though theirs was the *only* application.  In real-time. 

Dashboards are practically useless unless you can drill down into them.  Top-10 lists are useless — your biggest customers may not be your most important customers.

Solution: Invest in tooling (like Honeycomb) that lets you slice and dice on dimensions of arbitrary cardinality, so you can do things like a) break down by one uuid out of millions, b) break down by endpoint, latency percentile, raw query, data store, etc — to see what the experience actually looks like for that user, not for a high level aggregate like a dashboard.

Platform Commandment #7: Use end-to-end checks to traverse all the key code paths and architecture paths.

You will be tempted to disable them because they seem flappy and flaky and need to be fixed.  But this is actually what your users are suffering through every day they use your platform.  Don’t disable them, fix them.

Platform Commandment #8: Invest early in every kind of throttle, blacklist, velvet rope, in-flight rewrite, custom url/error responder, content inspection, etc … both partial and total, for every slice of events or users.  You will need all these fine-grained controls to keep your platform alive for 99.9% of users while you debug the .1% who are outliers and bad actors.

Platform Commandment #9: And use a multi-threaded language ffs.

Platform Commandment #10: USE YOUR OWN PLATFORM.  For work, if possible.  Feel the pain that you inflict on others.

Bonus Commandment: all cotenancy isolation guarantees are bullshit**

**from a perf standpoint, not security

Ten Platform Commandments

Post-mortem: feminist advice meltdown (March 2nd)

Okay!  As of today it’s been one week since I wrote some advice and the internet exploded in my face, so now it’s time to do what I always do: post mortem that shit.

This is going to be long.  I erred by making my first post too short, so I’m going to ship $(allthedetail) this time.  Duly warned.

ragrets

What happened?

Around 8 am on Friday, March 2nd, after pulling an all-nighter, I decided to pound out a quick blog post that has been on my todo list forever: the only advice I feel equipped to give on how to succeed in tech.

My advice, in brief, was this:

  1. as a junior engineer, tough it out.  work hard, learn everything, earn your stripes.
  2. stay technical.  don’t get sucked into an offramp unless you are god damn sure you want out for good.
  3. once you are senior, use your power to advocate for others and fuck that shit up.

Money, power, credibility.  This is the best way I know how to earn these things.  This is what worked for me and most of the senior technical women I know and admire.

First of all: I don’t think there should be anything controversial at all about this advice.  It’s good advice, if a bit bluntly put.  Pick your battles, show strategic impact, leverage your influence into power and use that power to fuck shit up in the manner of your choosing.

The fact is, we are far too chickenshit about telling young women straight up how to succeed at work.  We praise them for all kinds of dumb shit and second shift work and emotional labor that has little if any strategic impact to the bottom line, and wonder why they’re burned out and resentful.

We live in a fallen world.  I didn’t make it this way, I just want to help you level up to be a powerful destroyer being so you can make it better.

So I hit “publish”.

Around 9:30 am, Camille Fournier gave me a bunch of unsolicited criticism. Unfortunately, due to some sour personal history with Camille I was extremely not disposed to receive this from her.  I can be a resentful little shit: as soon as she told me to change it in certain ways,  it was the last fucking thing in the world I was going to do.

For a few hours, all the feedback was good. People liked my advice to stay technical (“god I wish someone had told me that 15 years ago”) and my pointing out the loophole that lets women advocate for each other without being penalized.

A few people nailed what I was trying to say even better than I did:

But by the end of the day I was receiving a steady stream of angry tweets from people I had never heard of, with objections that seemed puzzling and ridiculous to me.

They were acting as though the sum total of my advice had been ordering bullied and abused people to just shut up and tough it out.  Soon I was getting tweets accusing me of trashing all diversity work, trashing all women, only being out for myself and my own career, erasing sexual assault, being insensitive and destructive to people of color, and on and on.feamale

People were subtweeting me like crazy, or DM’ing me telling me how much they liked my piece but were afraid to say so in public. Others were harassing my engineering managers and people who follow me.

I have never received textual scrutiny of this type before, where every single word was turned over and macerated and peered at for evidence of traitorous views.  It sucks.  (And it’s pretty hypocritical, to say the least … some of these same women who were gleefully bashing me for clumsy words remain good friends with men who are actual known harassers and abusers of women.)

Lots of people wanted me to take the post down immediately, or publish a retraction or correction immediately. Some prominent feminists publicly chided me and refused to talk to me until I repented of my sins.  🙄

https://twitter.com/vaurorapub/status/970544808086331392

Let’s be clear. I have no problem admitting my errors and making amends. I do it all the fucking time. But I am disinclined to grovel before a howling mob.  It wasn’t even clear to me what I had done wrong, given all the contradictory noises.

So I decided to wait a week before responding, so I could talk to people and figure out what to take away from the mess.

(Also last week: traveled to multiple continents, flew a few dozen hours, wrote multiple talks, delivered presentations at various conferences and meetups, visited and pitched to potential customers, managed a handful of teams, fit 1x1s in between hops and time zones and you know just tried to do my fucking job while dealing with crazed nuts screaming abuse at me online.)

I had a couple of hard but helpful conversations with people like Alice Goldfuss and Courtney Nash, who took the time to walk me through ways that what I wrote may be misinterpreted or wrongly received. This feedback can mostly be bucketed into the following categories:

  • “Assume the reader knows nothing about you and considers you hostile until proven otherwise.” Well shit, I am not used to writing defensively.  I live my life in high trust, high transparency environments and prefer it that way.
  • Your advice doesn’t apply to $x.”  True!  I didn’t bracket it in layers of padding — “this is just what worked for me”, “may not apply to every situation” — because I thought that was freaking obvious.
  • It sounds like you are shit talking all diversity efforts.” No, but I was waving vaguely in the direction of some very cynical and tired feelings on the subject. I’m pretty over corporate diversity issues and pinkwashing that doesn’t expand opportunity or share power.
  • It sounds like you are shitting on all women.” Oof. This is the one that is really painful, because this is the one I have been working hard on for close to 20 years… and should have seen coming. I did intend to put some space between myself and women in tech, because I don’t exactly identify as a woman.. exactly.  I grew up fundamentalist and misogynist af, and have been working hard to recover from that ever since I left home at 15.
  • Maybe you shouldn’t give advice to women at all.” Courtney challenged me on whether I should speak to women, given my ambivalence wrt my own gender identification. Which is an interesting question that I have pondered a lot.

This was all desperately inevitable and predictable, however, and I made some unforced errors. So let’s talk about what I do and don’t regret about all this, and what I would or would not do differently.

witch

Regrets/No regrets

NO REGRETS: giving the advice. It’s good advice, it needed to be said. I’m tired of seeing women burn themselves out on shitty corporate diversity work that only diverts their energy from amassing real power and strategic impact.  Not sorry.

REGRETS: I was sloppy about waving in the direction of my gender issues. I intended to put some space between myself and “women’s issues”, because I don’t exactly identify as a woman, exactly, and I have always felt uncomfortable in women’s spaces. Given the historic devaluation of women’s spaces and issues, I should have been clearer. I am sorry.

NO REGRETS: I think it’s fine for me to give advice to women if they ask, which they do. After all I was raised as a woman, have always been read and treated as one, and assumed there was no other option for 30+ years. I get to speak.  Not sorry.

SEMI-REGRETS: I still can’t figure how anyone managed to project into my piece that I was slamming all diversity work. I said somewhat colorfully that a lot of the advice didn’t work for me and wasn’t my favorite thing to dwell on, in the same grouchy grumbly tone that I use when bitching about query planners and terraform variable interpolation. I don’t think this would have been a big deal if the frenzy hadn’t gotten whipped up, but if anyone genuinely felt hurt or dismissed by it, I can be sorry for that.

REGRETS: the impact it had on my poor engineering managers and other people who work with me. They are still being asked to denounce me or defend me and their decision to work with me. So deeply not ok. I am sorry — but that’s really on you, internet assholes.

BIGGEST REGRETS: any accidental cover given to misogynists. By far the most annoying thing about the brouhaha has been when men with toxic views compliment me because they think I’m agreeing with them.  I am NOT, so get off me.  Sorry not sorry.

SADDEST REGRETS: my plummeting opinion of the feminist internet trash mob. I am a feminist and damn proud of it, but I am also disgusted by the hyperperformative boundary policing of certain self-proclaimed “tech feminists”. If your great joy in life is roving the interwebs looking for any toes pressing a line so you can rapturously castigate them and shun them until they have licked your boots and begged for forgiveness … if you love performing elaborate outrage rituals and whipping up a frenzy of whispers or a witch hunt… then:

laralittleFuck. The Fuck. Off. You are an embarrassment. This is about your ego, and your manufactured grievance machines are Not Helping.

I honestly thought these feminist pile-on mobs were a right-wing fantasy, and I’m sad that I was wrong. I’m also pretty sad about all the folks who know me and have every reason to know better.  In my world you check in with your friends before leaping to judgment, and you help teach each other when you’re being stupid. A pretty dismal number of people I would have called friends just leapt excitedly into the fray passing judgment.

So now I know more about who my friends are.

richer

In conclusion

Why even stick my neck out? I guessed something might go wrong, I just didn’t know what. So why?

Because I want to help, dammit. The farther I get in my career the more time I spend pondering how to bring others along with me, how to open the gates a little wider.

I’ve gotten to do a few things. I have tried to create an equitable, respectful working environment where everyone can do their best work, with managers who are passionate about diversity and strong where I am weak.

But … I have felt very often alienated by the messaging and attempts to help women.  I can’t be the only one who responds more to a strategic message than an empathetic one, who feels condescended to and patronized by the mainstream corporate efforts.

I can’t be the only one who feels simmering resentment every time I get held up as a successful “woman in tech” (the world’s worst participation trophy). I don’t want a fucking consolation prize. I want to sweep the competition, I want to change the world. I can’t be the only one who hungers for power, money and credibility.

I know I’m not, actually. I know because they are telling me. The response has been at least 100-1 positive in private — from junior women especially — thanking me for being brutally honest and treating them like adults, like equals.  (I’ve been told there are armies of women who feel dreadfully hurt but too afraid to say so.  Pity if true, as they say.)

There has always been tension between the people who see the world as it is and fight to succeed in it, and the people who opt out and refuse to participate because it’s compromised.  The world needs us both.  So shut the fuck up and let the kids pick for themselves.

And maybe stop persecuting the people who stand with you.

charity.

P.S. check out Jen Andre’s eloquent restatement of it all.  it’s so great.

nobodies

Charity todo items

  • If I ever again write anything about women or diversity, have someone I trust proof before publishing
  • Remember how much of my audience doesn’t know shit about me, and won’t or can’t assume the best of my intentions
  • Wrap statements in exception handlers about this being my experience blah blah
  • Try not to let people get under my skin and spark a personal reaction
  • Derive somewhat less pleasure from smacking down assholes on the internet, even when they deserve it. [ASPIRATIONAL] [WONTFIX]
Post-mortem: feminist advice meltdown (March 2nd)