Deploys Are The ✨WRONG✨ Way To Change User Experience

This piece was first published on the honeycomb.io blog on 2023-03-08.

….

I’m no stranger to ranting about deploys. But there’s one thing I haven’t sufficiently ranted about yet, which is this: Deploying software is a terrible, horrible, no good, very bad way to go about the process of changing user-facing code.

It sucks even if you have excellent, fast, fully automated deploys (which most of you do not). Relying on deploys to change user experience is a problem because it fundamentally confuses and scrambles up two very different actions: Deploys and releases.

Deploy

“Deploying” refers to the process of building, testing, and rolling out changes to your production software. Deploying should happen very often, ideally several times a day. Perhaps even triggered every time an engineer lands a change.

Everything we know about building and changing software safely points to the fact that speed is safety and smaller changes make for safer deploys. Every deploy should apply a small diff to your software, and deploys should generally be invisible to users (other than minor bug fixes).

Release

“Releasing” refers to the process of changing user experience in a meaningful way. This might mean anything from adding functionality to adding entire product lines. Most orgs have some concept of above or below the fold where this matters. For example, bug fixes and small requests can ship continuously, but larger changes call for a more involved process that could mean anything from release notes to coordinating a major press release.

A tale of cascading failures

Have you ever experienced anything like this?

Your company has been working on a major new piece of functionality for six months now. You have tested it extensively in staging and dev environments, even running load tests to simulate production. You have a marketing site ready to go live, and embargoed articles on TechCrunch and The New Stack that will be published at 10:00 a.m. PST. All you need to do now is time your deploy so the new product goes live at the same time.

It takes about three hours to do a full build, test, and deploy of your entire system. You’ve deployed as much as possible in advance, and you’ve already built and tested the artifacts, so all you have to do is a streamlined subset of the deploy process in the morning. You’ve gotten it down to just about an hour. You are paranoid, so you decide to start an hour early. So you kick off the deploy script at 8:00 a.m. PST… and sit there biting your nails, waiting for it to finish.

SHIT! 20 minutes through the deploy, there’s a random flaky SSH timeout that causes the whole thing to cancel and roll back. You realize that by running a non-standard subset of the deploy process, some of your error handling got bypassed. You frantically fix it and restart the whole process.

Your software finishes deploying at 9:30 a.m., 30 minutes before the embargoed articles go live. Visitors to your website might be confused in the meantime, but better to finish early than to finish late, right? 😬

Except… as 10:00 a.m. rolls around, and new users excitedly begin hitting your new service, you suddenly find that a path got mistyped, and many requests are returning 500. You hurriedly merge a fix and begin the whole 3-hour long build/test/deploy process from scratch. How embarrassing! 🙈

Deploys are a terrible way to change user experience

The build/release/deploy process generally has a lot of safeguards and checks baked in to make sure it completes correctly. But as a result…

  • It’s slow
  • It’s often flaky
  • It’s unreliable
  • It’s staggered
  • The process itself is untestable
  • It can be nearly impossible to time it right
  • It’s very all or nothing—the norm is to roll back completely upon any error
  • Fixing a single character mistake takes the same amount of time as doubling the feature set!

Changing user-visible behaviors and feature sets using the deploy process is a great way to get egg on your face. Because the process is built for distributing large code distributions or artifacts; user experience gets changed only as a side effect.

So how should you change user experience?

By using feature flags.

Feature flags: the solution to many of life’s software’s problems

You should deploy your code continuously throughout the day or week. But you should wrap any large, user-visible behavior changes behind a feature flag, so you can release that code by flipping a flag.

This enables you to develop safely without worrying about what your users see. It also means that turning a feature on and off no longer requires a diff, a code review, or a deploy. Changing user experience is no longer an engineering task at all.

Deploys are an engineering task, but releases can be done by product managers—even marketing teams. Instead of trying to calculate when to begin deploying by working backwards from 10:00 a.m., you simply flip the switch at 10:00 a.m.

Testing in production, progressive delivery

The benefits of decoupling deploys and releases extend far beyond timely launches. Feature flags are a critical tool for apostles of testing in production (spoiler alert: everybody tests in production, whether they admit it or not; good teams are aware of this and build tools to do it safely). You can use feature flags to do things like:

  • Enable the code for internal users only
  • Show it to a defined subset of alpha testers, or a randomized few
  • Slowly ramp up the percentage of users who see the new code gradually. This is super helpful when you aren’t sure how much load it will place on a backend component
  • Build a new feature, but only turning it on for a couple “early access” customers who are willing to deal with bugs
  • Make a perf improvement that should be bulletproof logically (and invisible to the end user), but safely. Roll it out flagged off, and do progressive delivery starting with users/customers/segments that are low risk if something’s fucked up
  • Doing something timezone-related in a batch process, and testing it out on New Zealand (small audience, timezone far away from your engineers in PST) first

Allowing beta testing, early adoption, etc. is a terrific way to prove out concepts, involve development partners, and have some customers feel special and extra engaged. And feature flags are a veritable Swiss Army Knife for practicing progressive delivery.

It becomes a downright superpower when combined with an observability tool (a real one that supports high cardinality, etc.), because you can:

  • Break down and group by flag name plus build id, user id, app id, etc.
  • Compare performance, behavior, or return code between identical requests with different flags enabled
  • For example, “requests to /export with flag “USE_CACHING” enabled are 3x slower than requests to /export without that flag, and 10% of them now return ‘402’”

It’s hard to emphasize enough just how powerful it is when you have the ability to break down by build ID and feature flag value and see exactly what the difference is between requests where a given flag is enabled vs. requests where it is not.

It’s very challenging to test in production safely without feature flags; the possibilities for doing so with them are endless. Feature flags are a scalpel, where deploys are a chainsaw. Both complement each other, and both have their place.

“But what about long-lived feature branches?”

Long-lived branches are the traditional way that teams develop features, and do so without deploying or releasing code to users. This is a familiar workflow to most developers.

But there is much to be said for continuously deploying code to production, even if you aren’t exposing new surface area to the world. There are lots of subterranean dependencies and interactions that you can test and validate all along.

There’s also something very psychologically different between working with branches. As one of our engineering directors, Jess Mink, says:

There’s something very different, stress and motivation-wise. It’s either, ‘my code is in a branch, or staging env. We’re releasing, I really hope it works, I’ll be up and watching the graphs and ready to respond,’ or ‘oh look! A development customer started using my code. This is so cool! Now we know what to fix, and oh look at the observability. I’ll fix that latency issue now and by the time we scale it up to everyone it’s a super quiet deploy.’

Which brings me to another related point. I know I just said that you should use feature flags for shipping user-facing stuff, but being able to fix things quickly makes you much more willing to ship smaller user-facing fixes. As our designer, Sarah Voegeli, said:

With frequent deploys, we feel a lot better about shipping user-facing changes via deploy (without necessarily needing a feature flag), because we know we can fix small issues and bugs easily in the next one. We’re much more willing to push something out with a deploy if we know we can fix it an hour or two later if there’s an issue.

Everything gets faster, which instills more confidence, which means everything gets faster. It’s an accelerating feedback loop at the heart of your sociotechnical system.

“Great idea, but this sounds like a huge project. Maybe next year.”

I think some people have the idea that this has to be a huge, heavyweight project that involves signing up for a SaaS, forking over a small fortune, and changing everything about the way they build software. While you can do that—and we’re big fans/users of LaunchDarkly in particular—you don’t have to, and you certainly don’t have to start there.

As Mike Terhar from our customer success team says, “When I build them in my own apps, it’s usually just something in a ‘configuration’ database table. You can make a config that can enable/disable, or set a scope by team, user, region, etc.”

You don’t have to get super fancy to decouple deploys from releases. You can start small. Eliminate some pain today.

In conclusion

Decoupling your deploys and releases frees your engineering teams to ship small changes continuously, instead of sitting on branches for a dangerous length of time. It empowers other teams to own their own roadmaps and move at their own pace. It is better, faster, safer, and more reliable than trying to use deploys to manage user-facing changes.

If you don’t have feature flags, you should embrace them. Do it today! 🌈

Deploys Are The ✨WRONG✨ Way To Change User Experience

Software deploys and cognitive biases

There exist some wonderful teams out there who have valid, well thought through, legitimate reasons for enforcing “NO FRIDAY DEPLOYS” week in and week out, for not hooking CI/CD up to autodeploy, and for not shipping one person’s changes at a time.

And then there are the reasons most people have.

Bad decisions, and the biases they came from

 

We’re humans. 💜  We leap to conclusions with the wetware we have doing the best it can based on heuristics that feel objectively true, but are ultimately just emotional reactions based on past lived experience. And then we retroactively enshrine those goofy gut feelings with the language of noble motive and moral values.

“I tell people not to deploy to production … because I care so deeply about my team and their ability to have a quiet weekend.”

Barf. 🙄  That’s just like saying you tell your kid not to brush his teeth at night, because you care SO DEEPLY about him and his ability to go to bed calm and happy.

Once the retcon engine in your brain gets running, it comes up with all sorts of reasons. Plausible-sounding reasons! But every single argument of the items in the list above is materially false.

Deploy myths are never going away for good; they appeal to too many of our cognitive biases. But what if there was one simple thing you could do that would invert many of these cognitive biases and cause people to grapple with the question in a new way? What if you could kickstart a recalculation?

My next post will pick up right here. I’ll tell you all about the One Simple Trick you can do to fix your deploys and set you on the virtuous path of high-performing teams.

Til then, here’s what I’ve previously written on the topic.

 

Footnotes

 

Availability bias: The tendency to overestimate the likelihood of events with greater “availability” in memory, which can be influenced by how recent the memories are or how unusual or emotionally charged they may be.

Continued influence effect: The tendency to believe previously learned misinformation even after it has been corrected. Misinformation can still influence inferences one generates after a correction has occurred.

Conservatism bias: The tendency to revise one’s belief insufficiently when presented with new evidence.

Default effect: When given a choice between several options, the tendency to favor the default one.

Dread aversion: Just as losses yield double the emotional impact of gains, dread yields double the emotional impact of savouring

False-uniqueness bias: The tendency of people to see their projects and themselves as more singular than they actually are.

Functional fixedness: Limits a person to using an object only in the way it is traditionally used

Hyperbolic discounting: Discounting is the tendency for people to have a stronger preference for more immediate payoffs relative to later payoffs. Hyperbolic discounting leads to choices that are inconsistent over time – people make choices today that their future selves would prefer not to have made, despite using the same reasoning

IKEA effect: The tendency for people to place a disproportionately high value on objects that they partially assembled themselves, such as furniture from IKEA, regardless of the quality of the end product

Illusory truth effect: A tendency to believe that a statement is true if it is easier to process, or if it has been stated multiple times, regardless of its actual veracity.

Irrational escalation: The phenomenon where people justify increased investment in a decision, based on the cumulative prior investment, despite new evidence suggesting that the decision was probably wrong. Also known as the sunk cost fallacy

Law of the instrument: An over-reliance on a familiar tool or methods, ignoring or under-valuing alternative approaches. “If all you have is a hammer, everything looks like a nail”

Mere exposure effect: The tendency to express undue liking for things merely because of familiarity with them

Negativity bias: Psychological phenomenon by which humans have a greater recall of unpleasant memories compared with positive memories

Non-adaptive choice switching: After experiencing a bad outcome with a decision problem, the tendency to avoid the choice previously made when faced with the same decision problem again, even though the choice was optimal

Omission bias: The tendency to judge harmful actions (commissions) as worse, or less moral, than equally harmful inactions (omissions).

Ostrich effect: Ignoring an obvious (negative) situation

Plan continuation bias: Failure to recognize that the original plan of action is no longer appropriate for a changing situation or for a situation that is different than anticipated

Prevention bias: When investing money to protect against risks, decision makers perceive that a dollar spent on prevention buys more security than a dollar spent on timely detection and response, even when investing in either option is equally effective

Pseudocertainty effect: The tendency to make risk-averse choices if the expected outcome is positive, but make risk-seeking choices to avoid negative outcomes

Salience bias: The tendency to focus on items that are more prominent or emotionally striking and ignore those that are unremarkable, even though this difference is often irrelevant by objective standards

Selective perception bias: The tendency for expectations to affect perception

Status-quo bias: If no special action is taken, the default action that will happen is that the code will go live. You will need an especially compelling reason to override this bias and manually stop the code from going live, as it would by default.

Slow-motion bias: We feel certain that we are more careful and less risky when we slow down. This is precisely the opposite of the real world risk factors for shipping software. Slow is dangerous for software; speed is safety. The more frequently you ship code, the smaller the diffs you ship, the less dangerous each one actually becomes. This is the most powerful and difficult to overcome of all of our biases, because there is no readily available counter-metaphor for us to use. (Riding a bike is the best I’ve come up with. 😔)

Surrogation: Losing sight of the strategic construct that a measure is intended to represent, and subsequently acting as though the measure is the construct of interest

Time-saving bias: Underestimations of the time that could be saved (or lost) when increasing (or decreasing) from a relatively low speed and overestimations of the time that could be saved (or lost) when increasing (or decreasing) from a relatively high speed.

Zero-risk bias: Preference for reducing a small risk to zero over a greater reduction in a larger risk.

Software deploys and cognitive biases

How much is your fear of continuous deployment costing you?

Most people aren’t doing true CI/CD. Most teams wait far too long to get their code into prod after writing it. Most painful of all are the teams who have done all the hard parts — wired up continuous integration, achieved test coverage, etc — but still deploy by hand, thus depriving themselves of the payoff for their hard work.

Any time an engineer merges a diff back to main, this ought to trigger a full run of your CI/CD pipeline, culminating with an automatic deploy to production. This should happen once per mergeset, never batching multiple engineers’ diffs in a run, and it should be over and done with in 15 minutes or less with no human intervention.

It’s 2021, and everyone should know this by now.

✨✨15 minutes or bust✨✨

Okay, but what if you don’t? How costly can it be, really?

Let’s do some back of the envelope math. First you’ll need to answer a couple questions about your org and deploy pipeline.

  • How many engineers do you have? ____________
  • How long typically elapses between when someone writes code and that code is live in production? _____________

Let (n) be the number of engineers it takes to efficiently build and run your product, assuming each set of changes will autodeploy individually in <15 min.

  • If changes typically ship on the order of hours, you need 2(n).
  • If changes ship on the order of days, you need 2(2(n)).
  • If changes ship on the order of weeks, you need 2(2(2(n)))
  • If changes ship on the order of months, you need 2(2(2(2(n))))

Your 6 person team with a consistent autodeploy loop would take 24 people to do the same amount of work, if it took days to deploy their changes. Your 10 person team that ships in weeks would need 80 people.

At cost to the company of approx 200k per engineer, that’s $3.6 million in the first example and $14 million in the second example. That’s how much your neglect of internal tools and kneejerk fear of autodeploy might be costing you.

It’s not just about engineers. The more delay you add into the process of building and shipping code, the more pathologies multiply, and you find yourselves needing to spend more and more time addressing those pathologies instead of making forward progress for the business. Longer diffs. Manual deploy processes. Bunching up multiple engineers’ diffs in a single deploy, then spending the rest of the day trying to figure out which one was at fault for the error.

Soon you need an SRE team to handle your reliability issues, build engineering specialists to build internal tools for all these engineers, managers to manage the teams, product folks to own the roadmap and project managers to coordinate all this blocking and waiting on each other…

You could have just fixed your build process. You could have just committed to continuous delivery. You would be moving more swiftly and confidently as a small, killer team than you ever could at your lumbering size.

✨✨15 minutes or bust✨✨

In 2021, how will *you* achieve the dream of CI/CD, and liberate your engineers from the shackles of pointless toil?

P.S. if you want to know my methodology for coming up with this equation, it’s called “pulled out of my ass because it sounded about right, then checked with about a dozen other technical folks to see if it aligned with their experience.”

 

 

How much is your fear of continuous deployment costing you?

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.

Over the past couple of weeks I’ve been tweeting a LOT about lead time to deploy: the interval encompassing the time from when the code gets written and when it’s been deployed to production. Also described as “how long it takes you to run CI/CD.”

How important is this?

Fucking central.

Here is a quickie thread from this week, or just go read “Accelerate” like everybody already should have. 🙃

It’s nigh impossible to have a high-performing team with a long lead time, and becomes drastically easier with a dramatically shorter lead time.

🌷 Shorter is always better.
🌻 One mergeset per deploy.
🌹 Deploy should be automatic.

And it should clock in under 15 minutes, all the way from “merging!” to “deployed!”.

Now some people will nod and agree here, and others freak the fuck out. “FIFTEEN MINUTES?” they squall, and begin accusing me of making things up or working for only very small companies. Nope, and nope. There are no magic tricks here, just high standards and good engineering, and the commitment to maintaining your goals quarter by quarter.

If you get CI/CD right, a lot of other critical functions, behaviors, and intuitions are aligned to be comfortably successful and correct with minimal effort. If you get it wrong, you will spend countless cycles chasing pathologies. It’s like choosing to eat your vegetables every day vs choosing a diet of cake and soda for fifty years, then playing whackamole with all the symptoms manifesting on your poor, mouldering body.

Is this ideal achievable for every team, on every stack, product, customer and regulatory environment in the world? No, I’m not being stupid or willfully blind. But I suggest pouring your time and creative energy into figuring out how closely you can approximate the ideal given what you have, instead of compiling all the reasons why you can’t achieve it.

Most of the people who tell me they can’t do this are quite wrong, turns out. And even if you can’t down to 15 minutes, ANY reduction in lead time will pay out massive, compounding, benefits to your team and adjacent teams forever and ever.

So — what was it you said you were working on right now, exactly? that was so important? 🤔

“Cutting my build time by 90%!” — you

Huzzah. 🤠

So let’s get you started! Here, courtesy of my twitterfriends, is a long compiled list of Likely Suspects and CI/CD Offenders, a long list of anti-patterns, and some unresolved personal pain & suffering to hunt down and question when your build gets slow..

✨15 minutes or bust, baby!✨

 
Where it all started: what keeps you from getting under 15 minute CI/CD runs?

Generally good advice.

  • Instrument your build pipeline with spans and traces so you can see where all your time is going. ALWAYS. Instrument.
  • Order tests by time to execute and likelihood of failure.
  • Don’t run all tests, only tests affected by your change
  • Similarly, reduce build scope; if you only change front-end code, only build/test/deploy the front end, and for heaven’s sake don’t fuss with all the static asset generation
  • Don’t hop regions or zones any more than you absolutely must.
  • Prune and expire tests regularly, don’t wait for it to get Really Bad
  • Combine functionality of tests where possible — tests need regular massages and refactors too
  • Pipeline, pipeline, pipeline tests … with care and intention
  • You do not need multiple non-production environment in your CI/CD process. Push your artifacts to S3 and pull them down from production. Fight me on this
  • Pull is preferable to push. (see below)
  • Set a time elapsed target for your team, and give it some maintenance any time it slips by 25%

The usual suspects

  • tests that take several seconds to init
  • setup/teardown of databases (HINT try ramdisks)
  • importing test data, seeding databases, sometimes multiple times
  • rsyncing sequentially
  • rsyncing in parallel, all pulling from a single underprovisioned source
  • long git pulls (eg cloning whole repo each time)
  • CI rot (eg large historical build logs)
  • poor teardown (eg prior stuck builds still running, chewing CPU, or artifacts bloating over time
  • integration tests that spin up entire services (eg elasticsearch)
  • npm install taking 2-3 minutes
  • bundle install taking 5 minutes
  • resource starvation of CI/CD system
  • not using containerized build pipeline
  • …(etc)
 
Continuous deployment to industrial robots in prod?? Props, man.

Not properly separating the streams of “Our Software” (changes constantly) vs “infrastructure” (changes rarely)

  • running cloudformation to setup new load balancers, dbs, etc for an entire acceptance environment
  • docker pulls, image builds, docker pushes container spin up for tests

“Does this really go here?”

  • packaging large build artifacts into different format for distribution
  • slow static source code analysis tools
  • trying to clone production data back to staging, or reset dbs between runs
  • launching temp infra of sibling services for end-to-end tests, running canaries
  • selenium and other UX tests, transpiling and bundling assets

“Have a seat and think about your life choices.”

  • excessive number of dependencies
  • extreme legacy dependencies (things from the 90s)
  • tests with “sleep” in them
  • entirely too large frontends that should be broken up into modules

“We regret to remind you that most AWS calls operate at the pace of ‘Infrastructure’, not ‘Software'”

  • AWS CodeBuild has several minutes of provisioning time before you’re even executing your own code — even a few distinct jobs in a pipeline and you might suffer 15 min of waiting for CodeBuild to do actual work
  • building a new AMI
  • using EBS
  • spinning up EC2 nodes .. sequentially 😱
  • cool it with the AWS calls basically



A few responses were oozing with some unresolved trauma, lol.

Natural Born Opponents: “Just cache it” and “From the top!”

  • builds install correct version of toolchain from scratch each time
  • rebuilding entire project from source every build
  • failure to cache dependencies across runs (eg npm cache not set properly)

“Parallelization: the cause of, and solution to, all CI problems”

  • shared test state, which prevents parallel testing due to flakiness and non-deterministic test results
  • not parallelizing tests
 
 
I have so many questions….

Thanks to @wrd83, @sorenvind, @olitomli, @barney_parker, @dastbe, @myajpitz, @gfodor, @mrz, @rwilcox, @tomaslin, @pwyliu, @runewake2, @pdehlkefor, and many more for their contributions!

P.S. what did I say about instrumenting your build pipeline? For more on honeycomb + instrumentation, see this thread. Our free tier is incredibly generous, btw ☺️

Stay tuned for more long form blog posts on this topic. Coming soon. 🌈

charity

P.S. this blog post is the best thing i’ve ever read about reducing your build time. EVER.

“Why are my tests so slow?” A list of likely suspects, anti-patterns, and unresolved personal trauma.