Last Wednesday I walked into my living room and saw three gay rednecks in hot pink shirts being married as a “throuple” on a TV screen at close range, followed by one of the grooms singing a country song about a woman feeding her husband’s remains to her tigers.
In Blood Rites, Ehrenreich asks why we sacralize war. Not why we fight wars, or why we are violent necessarily, but why we are drawn to the idea of war, why we compulsively imbue it with an aura of honor and noble sacrifice. If you kill one person, you’re a murderer and we shut you out from society; kill ten and you are a monster; but if you kill thousands, or kill on behalf of the state, we give you medals and write books about you.
And it’s not only about scale or being backed by state power. The calling of war brings out the highest and finest experiences our species can know: it sings of heroism and altruism, of discipline, self-sacrifice, common ground, a life lived well in service; of belonging to something larger than one’s self. Even if, as generations of weary returning soldiers have told us, it remains the same old butchery on the ground, the near-religious allure of war is never dented for long in the popular imagination.
What the fuck is going on?
Ehrenreich is impatient with the traditional scholarship, which locates the origin of war in some innate human aggression or turf wars over resources. She is at her dryly funniest when dispatching feminist theories about violence being intrinsically male or “testosterone poisoning”, showing that the bloodthirstiest of the gods have usually been feminine. (Although there are fascinating symmetries between girls becoming women through menstruation, and boys becoming men through … some form of culturally sanctioned ritual, usually involving bloodshed.)
Rather, she shows that our sacred feelings towards blood shed in war are the direct descendants of our veneration of blood shed in sacrifice, originally towards human sacrifice and other animal sacrifice, in a reenactment of our own ever-so-recent role inversion from prey to predator. Prehistoric sacrifice was likely a way of exerting control over our environment and reenacting the death that gave us life through food.
In her theory, humans do not go to war because we are natural predators. Just the blink of an eye ago, on an evolutionary scale, humans were not predators by any means: we were prey. Weak, blind, deaf, slow, clawless and naked; we scrawny, clever little apes were easy pickings for the many large carnivores who roamed the planet. We scavenged in the wake of predators and worshiped them as gods. We are the nouveaux riches of predators, constantly re-asserting our dominance to soothe our insecurities.
We go to war not because we are predators, in other words, but because we are prey — and this makes us very uncomfortable! War exists as a vestigial relic of when we venerated the shedding of blood and found it holy — as anyone who has ever opened the Old Testament can attest. It was not until the Axial Age that religions of the world underwent a wholesale makeover into a less bloody, more universalistic set of aspirations.
When I first read this book, years ago, I remember picking it up with a roll of the eyes. “Sounds like some overly-metaphorical liberal academic nonsense” or something like that. But I was hooked within ten pages, my mind racing ahead with even more evidence than she marshals in this lively book. It shifted the way I saw many things in the world.
Like horror movies, for example. Or why cannibalism is so taboo. How Jesus became the Son of God, the Brothers Grimm, the sacrament of Communion. The primal fear of being food still resonates through our culture in so many sublimated ways.
And whether what you’re watching is “Tiger King” or the Tiger-King-watchers, it will make A LOT more sense after reading this book too.
Stay safe and don’t kill each other,
 Ehrenreich is best known for her stunning book on the precariousness of the middle class, “Nickel and Dimed”, where she tried to subsist for a year only on whatever work she could get with a high school education. Ehrenreich is a journalist, and this is a piece of science journalism, not scientific research; yet it is well-researched and scrupulously cited, and it’s worth noting that she has a PhD in biology and was once a practicing scientist.
One of my stretch goals for 2019 was to start writing an advice column. I get a lot of questions about everything under the sun: observability, databases, career advice, management problems, what the best stack is for a startup, how to hire and interview, etc. And while I enjoy this, having a high opinion of my own opinions and all, it doesn’t scale as well as writing essays. I do have a (rather all-consuming) day job.
So I’d like to share some of the (edited and lightly anonymized) questions I get asked and some of the answers I have given. With permission, of course. And so, with great appreciation to my anonymous correspondent for letting me publish this, here is one.
I’ve been in tech for 25 years. I don’t have a degree, but I worked my way up from menial jobs to engineering, and since then I have worked on some of the biggest sites in the world. I have been offered a management role many times, but every time I refused. Until about two years ago, when I said “fuck it, I’m almost 40; why not try.”
I took the job with boundless enthusiasm and motivation, because the team was honestly a mess. We were building everything on-prem, and ops was constantly bullying developers over their supposed incompetence. I had gone to conferences, listened to podcasts, and read enough blog posts that my head was full of “DevOps/CloudNative/ServiceOriented/You-build-it-you-run-it/ServantLeaders” idealism. I knew I couldn’t make it any worse, and thought maybe, just maybe I could even make it better.
Soon after I took the job, though, there were company-wide layoffs. It was not done well, and morale was low and sour. People started leaving for happier pastures. But I stayed. It was an interesting challenge, and I threw my heart and soul into it.
For two years I have stayed and grinded it out: recruiting (oh that is so hard), hiring, and then starting a migration to a cloud provider, and with the help of more and more people on the new team, slowly shifted the mindset of the whole engineering group to embrace devops best practices. Now service teams own their code in production and are on-call for them, migrate themselves to the cloud with my team supporting them and building tools for them. It is almost unrecognizable compared to where we were when I began managing.
A beautiful story isn’t it? I hope you’re still reading. 🙂
Now I have to say that with my schedule full of 1:1s, budgeting, hiring, firing, publishing papers of mission statements and OKRs, shaping the teams, wielding influence, I realized that I enjoyed none of the above. I read your 17 reasons not to be a manager, and I check so many boxes. It is a pain in the ass to constantly listen to people’s egos, talk to them and keep everybody aligned (which obviously never happens). And of course I am being crushed between top-down on-the-spot business decisions and bottom-up frustration of poorly executed engineering work under deadlines. I am also destroyed by the mistrust and power games I am witnessing (or involved in, sometimes), while I long for collaboration and trust. And of course when things go well my team gets all the praise, and when things go wrong I take all the blame. I honestly don’t know how one can survive without the energy provided by praise and a sense of achievement.
All of the above makes me miss being an IC (Individual Contributor), where I could work for 8 hours straight without talking to anyone, build stuff, say what I wanted when I wanted, switch jobs if I wasn’t happy, and basically be a little shit like the ones you mention in your article.
But when I think about doing it, I get stuck. I don’t know if I would be able to do it again, or if I could still enjoy it. I’ve seen too many things, I’ve tasted what it’s like to be (sometimes) in control, and I did have a big impact on the company’s direction over time. I like that. If I went back to being an IC, I would feel small and meaningless, like just another cog in the machine. And of course, being 40-ish, I will compete with all those 20-something smartasses who were born with kubernetes.
Thank you for reading. Could you give me your thoughts on this? In any case, it was good to get it off my chest.
Holy shitballs! What an amazing story! That is an incredible achievement in just two years, let alone as a rookie manager. You deserve huge props for having the vision, the courage, and the tenacity to drive such a massive change through.
Of COURSE you’re feeling bored and restless. You didn’t set out on a glorious quest for a life of updating mission statements and OKRs, balancing budgets, tending to people’s egos and fluffing their feelings, tweaking job descriptions, endless 1x1s and meetings meetings meetings, and the rest of the corporate middle manager’s portfolio. You wanted something much bigger. You wanted to change the world. And you did!
But now you’ve done it. What’s next?
First of all, YOUR COMPANY SUCKS. You don’t once mention your leadership — where are they in all this? If you had a good manager, they would be encouraging you and eagerly lining up a new and bigger role to keep you challenged and engaged at work. They are not, so they don’t deserve you. Fuck em. Please leave.
Another thing I am hearing from you is, you harbor no secret desire to climb the managerial ranks at this time. You don’t love the daily rhythms of management (believe it or not, some do); you crave novelty and mastery and advancement. It sounds like you are willing to endure being a manager, so long as that is useful or required in order to tackle bigger and harder problems. Nothing wrong with that! But when the music stops, it’s time to move on. Nobody should be saddled with a manager whose heart isn’t in the work.
You’re at the two year mark. This is a pivotal moment, because it’s the beginning of the end of the time when you can easily slip back into technical work. It will get harder and harder over the next 2-3 years, and at some point you will no longer have the option.
Picking up another technical role is the most strategic option, the one that maximizes your future opportunities as a technical leader. But you do not seem excited by this option; instead you feel many complex and uncomfortable things. It feels like going backwards. It feels like losing ground. It feels like ceding status and power.
“Management isn’t a promotion, it’s a career change.”
But if management is not a promotion, then going back to an engineering role should not feel like a demotion! What the fuck?!
It’s one thing to say that. Whether it’s true or not is another question entirely, a question of policy and org dynamics. The fact is that in most places, most of the power does go to the managers, and management IS a promotion. Power flows naturally away from engineers and towards managers unless the org actively and vigorously pushes back on this tendency by explicitly allocating certain powers and responsibilities to other roles.
I’m betting your org doesn’t do this. So yeah, going back to being an IC WILL be a step down in terms of your power and influence and ability to set the agenda. That’s going to feel crappy, no question. We humans hate that.
You cannot go back to doing exactly what you did before, for the very simple reason that you are not the same person. You are going to be attuned to power dynamics and ways of influencing that you never were before. And remember, leadership is primarily exercised through influence, not explicit authority. Senior ICs who have been managers are supremely powerful beings, who tend to wield outsize influence. Smart managers will lean on them extensively for everything from shadow management and mentorship to advice, strategy, etc. (Dumb managers don’t. So find a smart manager who isn’t threatened by your experience.)
You’re a short-timer here, remember? Your company sucks. You’re just renewing your technical skills and pulling a paycheck while finding a company that will treat you better, that is more aligned with your values.
Lastly (and most importantly), I have a question. Why did you need to become a manager in order to drive sweeping technical change over the past two years? WHY couldn’t you have done it as a senior IC? Shouldn’t technical people be responsible for technical decisions, and people managers responsible for people decisions? Could this be your next challenge, or part of it? Could you go back to being an engineer, equipped with your shiny new powers of influence and mystical aura of recent management experience, and use it to organize the other senior ICs to assert their rightful ownership over technical decisions? Could you use your newfound clout with leadership and upper management to convince them that this will help them recruit and retain better talent, and is a better way to run a technical org — for everyone?
I believe this is a better way, but I have only ever seen these changes happen when agitated for and demanded by the senior ICs. If the senior ICs don’t assert their leadership, managers are unlikely to give it to them. If managers try, but senior ICs don’t inhabit their power, eventually the managers just shrug and go back to making all the decisions. That is why ultimately this is a change that must be driven and owned — at a minimum co-owned — by the senior individual contributors.
I hope you can push back against that fear of being small and meaningless as an individual contributor. The fact that it very often is this way, especially in strongly hierarchical organizations, does not mean that it has to be this way; and in healthy organizations it is not this way. Command-and-control systems are not conducive to creative flourishing. We have to fight the baggage of the authoritarian structures we inherited in order to make better ones.
Organizations are created afresh each and every day: not created for us, but by us. Help create the organization you want to work at, where senior people are respected equally and have domains of ownership whether they manage people or technology. If your current gig won’t value that labor, find one that will.
They exist. And they want to hire you.
Lots of companies are DYING to hire this kind of senior IC, someone who is still hands on yet feels responsibility for the team as a whole, who knows the business side, who knows how to mentor and craft a culture and can herd cats when necessary.
There are companies that know how to use ICs at the strategic level, even executive level. There are bosses who will see you not as a threat, but as a *huge asset* they can entrust with monumental work.
As a senior contributor who moves fluidly between roles, you are especially well-equipped to help shape a sociotechnical organization. Could you make it your mission to model the kind of relationship you want to see between management and ICs, whichever side you happen to be on? We need more people figuring out how to build organizations where management is not a promotion, just a change of career, and where going back and forth carries no baggage about promotions and demotions. Help us.
And when you figure it out, please don’t keep it to yourself. Expand your influence and share your findings by writing about your experiences in blog posts, in articles, in talks. Tell stories. Show people how much better it is this way. Be so magnificently effective and mysteriously influential as a senior IC that all the baby engineers you work with want to grow up to be just like you.
Hope this helps.
P.S. Oh, and stop fretting about “competing” with the 20-something kuberneteheads, you dork. You have been learning shit your whole career and you’ll learn this shit too. The tech is the easy part. The tech will always be the easy part. 🙂
You’ve convinced the security team and other stakeholders, you’ve gotten the integration running, you’re getting promising results from dev-test or staging environments… now it’s time to move from proof-of-concept to full implementation. Depending on your situation this might be a transition from staging to production, or it might mean increasing a feature flipper flag from 5% to 100%, or it might mean increasing coverage of an integration from one API endpoint to cover your entire developer footprint.
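To make the "5% to 100%" idea concrete, here is a minimal sketch of a percentage-based flag check. The function name `in_rollout`, the flag name, and the hashing scheme are all hypothetical illustrations, not the API of any particular feature-flag product. The one property worth noticing is that bucketing is deterministic, so a user who is inside the rollout at 5% stays inside as the percentage grows.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashes the flag name and user id together so that
    each flag buckets users independently of every other flag.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits onto one of 10,000 buckets (0.01% granularity).
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100


# Raising the percentage only ever adds users; nobody flips back out.
enabled_at_5 = [u for u in map(str, range(1000)) if in_rollout("new-vendor", u, 5)]
still_enabled = all(in_rollout("new-vendor", u, 50) for u in enabled_at_5)
```

Because the decision depends only on the flag name, the user id, and the percentage, rolling back is just a matter of lowering the number again.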
Taking into account Murphy’s Law, we expect that some things will go wrong during the rollout. Perhaps during coverage, a developer realizes that the schema designed to handle the app’s event mechanism can’t represent a scenario, requiring a redesign or a hacky solution. Or perhaps the metrics dashboard shows elevated error rates from the API frontend, and while there’s no smoking gun, the ops oncall decides to roll back the integration Just In Case it’s causing the incident.
This gives us another chance to practice empathy. While it’s easy, wearing the champion hat, to dismiss any issues found by looking for someone to blame, ultimately this poisons trust within your organization and will hamper success. It’s more effective, in the long run (and often even in the short run), to find common ground with your peers in other disciplines and teams, and work through to solutions that satisfy everybody.
Keeping the lights on
In all likelihood as integration succeeds, the team will rapidly develop experts and expertise, as well as idiomatic ways to use the product. Let the experts surprise you; folks you might not expect can step up when given a chance. Expertise flourishes when given guidance and goals; as the team becomes comfortable with the integration, explicitly recognize a leader or point person for each vendor relationship. Having one person explicitly responsible for a relationship lets them pay attention to those vendor emails and updates, and avoid the tragedy of the “but I thought *you* were” commons. This Integration Lead is also a center of knowledge transfer for your organization. They won’t know everything or help every user come up to speed, but they can help empower the local power users in each team to ramp up their teams on the integration.
As comfort grows you will start to consider ways to change your usage, for example growing into new kinds of data. This is a good time to revisit that security checklist: does the change increase PII exposure to your vendor? Would the new data lead to additional requirements such as per-field encryption? Don’t let these security concerns block you from gaining valuable insight using the new tool, but do take the chance to talk it over with your security experts as appropriate.
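One lightweight way to notice when usage is "growing into new kinds of data" is to keep an explicit list of fields that security has already reviewed and flag anything else before it leaves your systems. The sketch below is an assumption about how a team might do this; `APPROVED_FIELDS`, `split_for_vendor`, and the field names are hypothetical.

```python
# Fields the security team has already reviewed for this vendor (hypothetical).
APPROVED_FIELDS = frozenset({"event_type", "timestamp", "duration_ms"})


def split_for_vendor(event: dict, approved=APPROVED_FIELDS):
    """Split an event into the payload safe to send and the unreviewed field names."""
    outbound = {key: value for key, value in event.items() if key in approved}
    unreviewed = sorted(key for key in event if key not in approved)
    return outbound, unreviewed


payload, needs_review = split_for_vendor(
    {"event_type": "checkout", "duration_ms": 120, "email": "user@example.com"}
)
# payload carries only reviewed fields; needs_review names what to raise with security.
```

Anything that lands in `needs_review` is a prompt for a conversation with the security team, not an automatic rejection.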
Throughout this organic growth, the Integration Lead remains core to managing your changing profile of usage of the vendor they shepherd; as new categories of data are added to the integration, the Lead has responsibility to ensure that the vendor relationship and risk profile are well matched to the needs that the new usage (and presumably, business value) is placing on the relationship.
Documenting the Integration Lead role and responsibilities is critical. The team should know when to check in, and writing it down helps it happen. When new code has a security implication, or a new use case potentially amplifies the cost of an integration, bringing the domain expert in will avoid unhappy surprises. Knowing how to find out who to bring in, and when to bring them in, will keep your team getting the right eyes on their changes.
Security threats and other challenges change over time, too. Collaborating with your security team so that they know what systems are in use helps your team take note of new information that is relevant to your business. A simple example is noting when your vendors publish a breach announcement, but more complex examples happen too: say your vendor transitions cloud providers from AWS to Azure and the security team gets an alert about unexpected data flows from your production cluster. With transparency and trust, such events become part of a routine process rather than an emergency.
It’s all operational
Monitoring and alerting is a fact of operations life, and this has to include vendor integrations (even when the vendor integration is a monitoring product). All of your operations best practices are needed here: keep your alerts clean and actionable so that you don’t develop pager fatigue, and monitor performance of the integration so that you don’t get blindsided by a creeping latency monster in your APIs.
Authentication and authorization are changing as the threat landscape evolves and industry moves from SMS verification codes to U2F/WebAuthn. Does your vendor support your SSO integration? If they can’t support the same SSO that you use everywhere else and can’t add it (or worse, they look confused when you mention SSO), that’s probably a sign you should consider a different vendor.
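As a rough illustration of watching for creeping latency, here is a small sketch that tracks a rolling window of call timings for a vendor integration and signals only when the 95th percentile crosses a threshold. The class name, window size, and threshold are assumptions chosen for the example; in practice you would most likely feed these timings into whatever monitoring system you already run.

```python
from collections import deque
from typing import Optional


class LatencyMonitor:
    """Rolling-window p95 tracker for calls to a vendor integration (illustrative)."""

    def __init__(self, window: int = 100, p95_threshold_ms: float = 500.0):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> Optional[float]:
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        # Nearest-rank percentile, computed with integers to avoid float rounding.
        rank = (95 * len(ordered) + 99) // 100
        return ordered[rank - 1]

    def should_alert(self) -> bool:
        current = self.p95()
        return current is not None and current > self.p95_threshold_ms


monitor = LatencyMonitor(window=10, p95_threshold_ms=200.0)
for _ in range(10):
    monitor.record(100.0)
quiet = monitor.should_alert()   # all samples are fast
monitor.record(900.0)
monitor.record(900.0)
noisy = monitor.should_alert()   # slow calls now dominate the tail
```

Alerting on a percentile over a window, rather than on any single slow call, is one way to keep the alert actionable instead of noisy.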
A beautiful sunset
Have a plan beforehand for what needs to be done should you stop using the service. Got any mobile apps that depend on APIs that will go away or start returning permission errors? Be sure to test these scenarios ahead of time.
What happens at contract termination to data stored on the service? Do you need to explicitly delete data when ceasing use?
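Testing the "API goes away" scenario can be as simple as injecting a fake vendor call that returns a permission error and checking that your app degrades gracefully. The sketch below assumes a vendor call that returns a status code and a body; the function names, the list of statuses, and the `summary` field are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Statuses a retired or revoked vendor API might plausibly return (assumed list).
SHUTDOWN_STATUSES = frozenset({401, 403, 404, 410})


def fetch_vendor_data(call_vendor: Callable[[], Tuple[int, Dict]]) -> Optional[Dict]:
    """Return the vendor payload, or None when the service appears to be gone."""
    try:
        status, body = call_vendor()
    except ConnectionError:
        # The endpoint has disappeared entirely.
        return None
    if status in SHUTDOWN_STATUSES:
        return None
    if status != 200:
        # Anything else is a real error that should stay visible.
        raise RuntimeError(f"unexpected vendor status {status}")
    return body


def render_dashboard(call_vendor: Callable[[], Tuple[int, Dict]]) -> str:
    """Degrade to a placeholder instead of crashing when the vendor is gone."""
    data = fetch_vendor_data(call_vendor)
    if data is None:
        return "insights unavailable"
    return f"insights: {data['summary']}"


# Simulate the sunset scenarios without touching a real service.
healthy = render_dashboard(lambda: (200, {"summary": "ok"}))
revoked = render_dashboard(lambda: (403, {}))
```

Running checks like these ahead of time tells you whether a shutdown would show up as a blank panel or as a crash in an app your users already have installed.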
Do you need to remove integrations from your systems before ending the commercial relationship, or can the technical shutdown and business shutdown run in parallel?
In all likelihood these are contingency plans that will never be needed, and they don’t need to be fully fleshed out to start, but a little bit of forethought can avoid unpleasant surprises.
Year after year
Industry best practice and common sense dictate that you should revisit the security questionnaire annually (if not more frequently). Use this chance to take stock of the last year and check in: are you getting value from the service? What has changed in your business needs and the competitive landscape?
It’s entirely possible that a new year brings new challenges, which could make your current vendor even more valuable (time to negotiate a better contract rate!) or could mean you’d do better with a competing service. Has the vendor gone through any major changes? They might have new offerings that suit your needs well, or they may have pivoted away from the features you need.
Check in with your friends on the security team as well; standards evolve, and last year’s sufficient solution might not be good enough for new requirements.
Andy thinks out loud about security, society, and the problems with computers on Twitter.
❤️ Thanks so much for reading, folks. Please feel free to drop any complaints, comments, or additional tips to us in the comments, or direct them to me on Twitter.
All this pain will someday be worth it. 🙏❤️ charity + friends
“Get Aligned With Security”
by Lilly Ryan
If your team has decided on a third-party service to help you gather data and debug product issues, how do you convince an often overeager internal security team to help you adopt it?
When this service is something that provides a pathway for developers to access production data, as analytics tools often do, making the case for access to that data can screech to a halt at the mention of the word “production”. Progressing past that point will take time, empathy, and consideration.
I have been on both sides of the “adopting a new service” fence: as a developer hoping to introduce something new and useful to our stack, and now as a security professional who spends her days trying to bust holes in other people’s setups. I understand both sides of the sometimes-conflicting needs to both ship software and to keep systems safe.
This guide has advice to help you solve the immediate problem of choosing and deploying a third-party service with the approval of your security team. But it also has advice for how to strengthen the working relationship between your security and development teams over the longer term. No two companies are the same, so please adapt these ideas to fit your circumstances.
Understanding the security mindset
The biggest problems in technology are never really about technology, but about people. Seeing your security team as people and understanding where they are coming from will help you to establish empathy with them so that both of you want to help each other get what you want, not block each other.
First, understand where your security team is coming from. Development teams need to build features, improve the product, understand and ship good code. Security teams need to make sure you don’t end up on the cover of the NYT for data breaches, that your business isn’t halted by ransomware, and that you’re not building your product on a vulnerable stack.
This can be an unfamiliar frame of mind for developers. Software development tends to attract positive-minded people who love creating things and are excited about the possibilities of new technology. Software security tends to attract negative thinkers who are skilled at finding all the flaws in a system. These are very different mentalities, and the people who occupy them tend to have very different assumptions, vocabularies, and worldviews.
But if you and your security team can’t share the same worldview, it will be hard to trust each other and come to agreement. This is where practicing empathy can be helpful.
Before approaching your security team with your request to approve a new vendor, you may want to run some practice exercises for putting yourselves in their shoes and forcing yourselves to deliberately cultivate a negative thinking mindset to experience how they may react — not just in terms of the objective risk to the business, or the compliance headaches it might cause, but also what arguments might resonate with them and what emotional reactions they might have.
My favourite exercise for getting teams to think negatively is what I call the Land Astronaut approach.
The “Land Astronaut” Game
Imagine you are an astronaut on the International Space Station. Literally everything you do in space has death as a highly possible outcome. So astronauts spend a lot of time analysing, re-enacting, and optimizing their reactions to events, until it becomes muscle memory. By expecting and training for failure, astronauts use negative thinking to anticipate and mitigate flaws before they happen. It makes their chances of survival greater and their people ready for any crisis.
Your project may not be as high-stakes as a space mission, and your feet will most likely remain on the ground for the duration of your work, but you can bet your security team is regularly indulging in worst-case astronaut-type thinking. You and your team should try it, too.
Pick a service for you and your team to game out. Schedule an hour, book a room with a whiteboard, put on your Land Astronaut helmets. Then tell your team to spend half an hour brainstorming about all the terrible things that can happen to that service, or to the rest of your stack when that service is introduced. Negative thoughts only!
Start brainstorming together. Start out by being as outlandish as possible (what happens if their data centre is suddenly overrun by a stampede of elephants?). Eventually you will find that you’ll tire of the extreme worst case scenarios and come to consider more realistic outcomes, some of which you may not have thought of outside of the structure of the activity.
After half an hour, or whenever you feel like you’re all done brainstorming, take off your Land Astronaut helmets, sift out the most plausible of the worst case scenarios, and try to come up with answers or strategies that will help you counteract them. Which risks are plausible enough that you should mitigate them? Which are you prepared to gamble on never happening? How will this risk calculus change as your company grows and takes on more exposure?
Doing this with your team will allow you all to practice the negative thinking mindset together and get a feel for how your colleagues in the security team might approach this request. (While this may seem similar to threat modelling exercises you might have done in the past, the focus here is on learning to adopt a security mindset and gaining empathy for this thought process, rather than running through a technical checklist of common areas of concern.)
While you still have your helmets within reach, use your negative thinking mindset to fill out the spreadsheet from the first piece in this series. This will help you anticipate most of the reasonable objections security might raise, and may help you include useful detail the security team might not have known to ask for.
Once you have prepared your list of answers to George’s worksheet and held a team Land Astronaut session together, you will have come most of the way to getting on board with the way your security team thinks.
Preparing for compromise
You’ve considered your options carefully, you’ve learned how to harness negative thinking to your advantage, and you’re ready to talk to your colleagues in security – but sometimes, even with all of these tools at your disposal, you may not walk away with all of the things you are hoping for.
Being willing to compromise and anticipating some of those compromises before you approach the security team will help you negotiate more successfully.
While your Land Astronaut helmets are still within reach, consider using your negative thinking mindset game to identify areas where you may be asked to compromise. If you’re asking for production access to this new service for observability and debugging purposes, think about what kinds of objections may be raised about this and how you might counter them or accommodate them. Consider continuing the activity with half of the team remaining in the Land Astronaut role while the other half advocates from a positive thinking standpoint. This dynamic will get you having conversations about compromise early on, so that when the security team inevitably raises eyebrows, you are ready with answers.
Be prepared to consider compromises you had not anticipated, and enter into discussions with the security team with as open a mind as possible. Remember the team is balancing priorities of not only your team, but other business and development teams as well. If you and your security colleagues are doing the hard work to meet each other halfway then you are more likely to arrive at a solution that satisfies both parties.
Working together for the long term
While the previous strategies we’ve covered focus on short-term outcomes, in this continuous-deployment, shift-left world we now live in, the best way to convince your security team of the benefits of a third-party service – or any other decision – is to have them along from day one, as part of the team.
Roles and teams are increasingly fluid and boundary-crossing, yet security remains one of the roles least likely to be considered for inclusion on a software development team. Even in 2019, the task of ensuring that your product and stack are secure and well-defended is often left until the end of the development cycle. This contributes a great deal to the combative atmosphere that is common.
Bringing security people into the development process much earlier builds rapport and prevents these adversarial, territorial dynamics. Consider working together to build Disaster Recovery plans and coordinating for shared production ownership.
If your organisation isn’t ready for that kind of structural shift, there are other ways to work together more closely with your security colleagues.
Try having members of your team spend a week or two embedded with the security team. You may even consider a rolling exchange – a developer for a security team member – so that developers build the security mindset, and the security team is able to understand the problems your team is facing (and why you are looking at introducing this new service).
At the very least, you should make regular time to meet with the security team, get to know them as people, and avoid springing things on them late in the project when change is hardest.
Riding off together into the sunset…?
If you’ve taken the time to get to know your security team and how they think, you’ll hopefully be able to get what you want from them – or perhaps you’ll understand why their objections were valid, and come up with a better solution that works well for both of you.
Investing in a strong relationship between your development and security teams will rarely lead to the apocalypse. Instead, you’ll end up with a better product, probably some new work friends, and maybe an exciting idea for a boundary-crossing new career in tech.
But this story isn’t over! Once you get the green light from security, you’ll need to think about how to roll your new service out safely, maintain it, and consider its full lifespan within your company. Which leads us to part three of this series, on rolling it out and maintaining it … both your integration and your relationship with the security team.
Lilly Ryan is a pen tester, Python wrangler, and recovering historian from Melbourne. She writes and speaks internationally about ethical software, social identities after death, teamwork, and the telegraph. More recently she has researched the domestic use of arsenic in Victorian England, attempted urban camouflage, reverse engineered APIs, wielded the Oxford comma, and baked a really good lemon shortbread.
On Monday I gave a talk at DOES18 called “All the World’s a Platform”, where I talked about a bunch of the lessons learned by using and abusing and running and building platforms at scale.
I promised to do a blog post with the takeaways, so here they are.
Platform Commandment #1: Any time you have to think about one particular user, you have failed in some way. It doesn’t scale. Just a few one-offs a day will drag you down and drown your forward momentum.
Corollary: you will always have to do this every day. Solution: turn one-offs into a support problem, not an engineering problem.
Platform Commandment #2: keep your critical path as small and independent as possible. Have explicit tiers of importance. You cannot care about everything equally; sacrifices must be made.
Example: at Parse the core API was tier 1, push was tier 2, website was somewhere down around tier 10. We always knew what to bring up and care about first.
Platform Commandment #3: It is the job of the platform to protect itself at all costs, including at the expense of your app.
Platform Commandment #4: Remember that your platform is a magical black box to your users. You can’t expect them to behave reasonably without feedback loops and a rich mental model. Help them out — esp your super-users. It will save you time if you can help them help themselves.
Platform Commandment #5: Always expose a visible request id, shard id, uuid, trace id, any other relevant diagnostic information in user-visible errors. Up to the point where it reveals too much exploitable information about your service, which is probably much farther than you think. Poorly obfuscated infrastructure decisions are usually less of a threat to your business than befuddled users are.
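As a rough illustration of Commandment #5, here is a minimal Python sketch of an error body that carries its own diagnostic ids. The function name `error_response` and the field names are hypothetical, not taken from any real platform; the only idea borrowed from the text is that the user should see an id they can quote back to you.

```python
import json
import uuid


def error_response(status, message, request_id=None, shard_id=None, trace_id=None):
    """Build a JSON error body with diagnostic ids the user can quote to support.

    All field names here are illustrative assumptions.
    """
    body = {
        "status": status,
        "error": message,
        # Reuse the id the service already logs with when there is one;
        # otherwise mint a fresh one so the error is still traceable.
        "request_id": request_id or str(uuid.uuid4()),
    }
    # Only include optional ids that are actually known for this request.
    if shard_id is not None:
        body["shard_id"] = shard_id
    if trace_id is not None:
        body["trace_id"] = trace_id
    return json.dumps(body)
```

A user who pastes `request_id` into a support ticket lets you jump straight to the matching log lines instead of guessing which request they meant.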
Platform Commandment #6: Your observability must center your users’ perspective, not your own. The health of the system doesn’t matter. The health of every request, and every high-cardinality grouping of requests — those are what matter.
You must be able to care about and inspect the perf and quality from the perspective of every single application and/or user and their users, as richly as though theirs was the *only* application. In real-time.
Dashboards are practically useless unless you can drill down into them. Top-10 lists are useless — your biggest customers may not be your most important customers.
Solution: Invest in tooling (like Honeycomb) that lets you slice and dice on dimensions of arbitrary cardinality, so you can do things like a) break down by one uuid out of millions, b) break down by endpoint, latency percentile, raw query, data store, etc — to see what the experience actually looks like for that user, not for a high level aggregate like a dashboard.
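To make the "aggregate vs. one user" point concrete, here is a small Python sketch. It is a toy, not how Honeycomb or any other tool works internally: the event fields (`app_id`, `endpoint`, `duration_ms`) and the helper names are made up for illustration, and the percentile is a simple nearest-rank calculation.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def breakdown(events, key, field="duration_ms", pct=99):
    """Group raw events by an arbitrary dimension and compute a percentile per group."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event[field])
    return {group: percentile(vals, pct) for group, vals in groups.items()}


# Hypothetical raw events: one small app (a2) is having a terrible time.
events = [
    {"app_id": "a1", "endpoint": "/save", "duration_ms": 12},
    {"app_id": "a1", "endpoint": "/query", "duration_ms": 15},
    {"app_id": "a2", "endpoint": "/query", "duration_ms": 900},
    {"app_id": "a1", "endpoint": "/save", "duration_ms": 11},
]

overall_median = percentile([e["duration_ms"] for e in events], 50)
per_app_median = breakdown(events, "app_id", pct=50)
```

Here the overall median is 12 ms, which looks healthy, while the per-app breakdown shows `a2` sitting at 900 ms. The same `breakdown` call works for `endpoint` or any other field, which is the property that matters: the grouping key is chosen at query time, not baked into a dashboard.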
Platform Commandment #7: Use end-to-end checks to traverse all the key code paths and architecture paths.
You will be tempted to disable them because they seem flappy and flaky and need to be fixed. But this is actually what your users are suffering through every day they use your platform. Don’t disable them, fix them.
Platform Commandment #8: Invest early in every kind of throttle, blacklist, velvet rope, in-flight rewrite, custom url/error responder, content inspection, etc … both partial and total, for every slice of events or users. You will need all these fine-grained controls to keep your platform alive for 99.9% of users while you debug the .1% who are outliers and bad actors.
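One of the simplest controls on that list is a per-key throttle with a hard block switch. The sketch below is a generic token bucket in Python, offered as one possible shape rather than what any particular platform runs; the class name `Throttle` and its parameters are assumptions for illustration.

```python
import time


class Throttle:
    """Per-key token bucket plus a hard blocklist (illustrative sketch)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens added per second, per key
        self.burst = burst         # maximum tokens a key can hold
        self.clock = clock         # injectable so tests can control time
        self.blocked = set()       # keys that are denied outright
        self.buckets = {}          # key -> (tokens, last_refill_time)

    def block(self, key):
        """Total throttle: deny everything for this key until unblocked."""
        self.blocked.add(key)

    def unblock(self, key):
        self.blocked.discard(key)

    def allow(self, key):
        """Partial throttle: spend one token if the key has any left."""
        if key in self.blocked:
            return False
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill based on elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[key] = (tokens, now)
            return False
        self.buckets[key] = (tokens - 1, now)
        return True
```

Keying the bucket on an app id, user id, or endpoint means one noisy outlier runs out of tokens while everyone else is unaffected, and `block` gives you the blunt instrument for when you need it mid-incident.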
Platform Commandment #9: And use a multi-threaded language ffs.
Platform Commandment #10: USE YOUR OWN PLATFORM. For work, if possible. Feel the pain that you inflict on others.
Bonus Commandment: all cotenancy isolation guarantees are bullshit**
Okay! As of today it’s been one week since I wrote some advice and the internet exploded in my face, so now it’s time to do what I always do: post mortem that shit.
This is going to be long. I erred by making my first post too short, so I’m going to ship $(allthedetail) this time. Duly warned.
Around 8 am on Friday, March 2nd, after pulling an all-nighter, I decided to pound out a quick blog post that has been on my todo list forever: the only advice I feel equipped to give on how to succeed in tech.
My advice, in brief, was this:
as a junior engineer, tough it out. work hard, learn everything, earn your stripes.
stay technical. don’t get sucked into an offramp unless you are god damn sure you want out for good.
once you are senior, use your power to advocate for others and fuck that shit up.
Money, power, credibility. This is the best way I know how to earn these things. This is what worked for me and most of the senior technical women I know and admire.
First of all: I don’t think there should be anything controversial at all about this advice. It’s good advice, if a bit bluntly put. Pick your battles, show strategic impact, leverage your influence into power and use that power to fuck shit up in the manner of your choosing.
The fact is, we are far too chickenshit about telling young women straight up how to succeed at work. We praise them for all kinds of dumb shit and second shift work and emotional labor that has little if any strategic impact to the bottom line, and wonder why they’re burned out and resentful.
We live in a fallen world. I didn’t make it this way, I just want to help you level up to be a powerful destroyer being so you can make it better.
So I hit “publish”.
Around 9:30 am, Camille Fournier gave me a bunch of unsolicited criticism. Unfortunately, due to some sour personal history with Camille I was extremely not disposed to receive this from her. I can be a resentful little shit: as soon as she told me to change it in certain ways, it was the last fucking thing in the world I was going to do.
For a few hours, all the feedback was good. People liked my advice to stay technical (“god I wish someone had told me that 15 years ago”) and my pointing out the loophole that lets women advocate for each other without being penalized.
A few people nailed what I was trying to say even better than I did:
IME the optimal strategy at an individual/micro level is wildly different or even opposite from optimal strategy at a group/macro level — the politics of the individual being different in kind to the politics of the group
This has been my biggest struggle with some of the talking points - it sounds as if I’m completely deprived of any agency in how my life pans out. Which I find very disturbing - since it then makes me feel like I have no control over anything. That’s very disempowering.
But by the end of the day I was receiving a steady stream of angry tweets from people I had never heard of, with objections that seemed puzzling and ridiculous to me.
They were acting as though the sum total of my advice had been ordering bullied and abused people to just shut up and tough it out. Soon I was getting tweets accusing me of trashing all diversity work, trashing all women, only being out for myself and my own career, erasing sexual assault, being insensitive and destructive to people of color, and on and on.
People were subtweeting me like crazy, or DM’ing me telling me how much they liked my piece but were afraid to say so in public. Others were harassing my engineering managers and people who follow me.
I have never received textual scrutiny of this type before, where every single word was turned over and macerated and peered at for evidence of traitorous views. It sucks. (And it’s pretty hypocritical, to say the least … some of these same women who were gleefully bashing me for clumsy words remain good friends with men who are actual known harassers and abusers of women.)
Lots of people wanted me to take the post down immediately, or publish a retraction or correction immediately. Some prominent feminists publicly chided me and refused to talk to me until I repented of my sins. 🙄
Let’s be clear. I have no problem admitting my errors and making amends. I do it all the fucking time. But I am disinclined to grovel before a howling mob. It wasn’t even clear to me what I had done wrong, given all the contradictory noises.
So I decided to wait a week before responding, so I could talk to people and figure out what to take away from the mess.
(Also last week: traveled to multiple continents, flew a few dozen hours, wrote multiple talks, delivered presentations at various conferences and meetups, visited and pitched to potential customers, managed a handful of teams, fit 1x1s in between hops and time zones and you know just tried to do my fucking job while dealing with crazed nuts screaming abuse at me online.)
I had a couple of hard but helpful conversations with people like Alice Goldfuss and Courtney Nash, who took the time to walk me through ways that what I wrote may be misinterpreted or wrongly received. This feedback can mostly be bucketed into the following categories:
“Assume the reader knows nothing about you and considers you hostile until proven otherwise.” Well shit, I am not used to writing defensively. I live my life in high trust, high transparency environments and prefer it that way.
“Your advice doesn’t apply to $x.” True! I didn’t bracket it in layers of padding — “this is just what worked for me”, “may not apply to every situation” — because I thought that was freaking obvious.
“It sounds like you are shit talking all diversity efforts.” No, but I was waving vaguely in the direction of some very cynical and tired feelings on the subject. I’m pretty over corporate diversity issues and pinkwashing that doesn’t expand opportunity or share power.
“It sounds like you are shitting on all women.” Oof. This is the one that is really painful, because this is the one I have been working hard on for close to 20 years… and should have seen coming. I did intend to put some space between myself and women in tech, because I don’t exactly identify as a woman.. exactly. I grew up fundamentalist and misogynist af, and have been working hard to recover from that ever since I left home at 15.
“Maybe you shouldn’t give advice to women at all.” Courtney challenged me on whether I should speak to women, given my ambivalence wrt my own gender identification. Which is an interesting question that I have pondered a lot.
This was all desperately inevitable and predictable, however, and I made some unforced errors. So let’s talk about what I do and don’t regret about all this, and what I would or would not do differently.
NO REGRETS: giving the advice. It’s good advice, it needed to be said. I’m tired of seeing women burn themselves out on shitty corporate diversity work that only diverts their energy from amassing real power and strategic impact. Not sorry.
REGRETS: I was sloppy about waving in the direction of my gender issues. I intended to put some space between myself and “women’s issues”, because I don’t exactly identify as a woman, exactly, and I have always felt uncomfortable in women’s spaces. Given the historic devaluation of women’s spaces and issues, I should have been clearer. I am sorry.
NO REGRETS: I think it’s fine for me to give advice to women if they ask, which they do. After all I was raised as a woman, have always been read and treated as one, and assumed there was no other option for 30+ years. I get to speak. Not sorry.
SEMI-REGRETS: I still can’t figure how anyone managed to project into my piece that I was slamming all diversity work. I said somewhat colorfully that a lot of the advice didn’t work for me and wasn’t my favorite thing to dwell on, in the same grouchy grumbly tone that I use when bitching about query planners and terraform variable interpolation. I don’t think this would have been a big deal if the frenzy hadn’t gotten whipped up, but if anyone genuinely felt hurt or dismissed by it, I can be sorry for that.
REGRETS: the impact it had on my poor engineering managers and other people who work with me. They are still being asked to denounce me or defend me and their decision to work with me. So deeply not ok. I am sorry — but that’s really on you, internet assholes.
BIGGEST REGRETS: any accidental cover given to misogynists. By far the most annoying thing about the brouhaha has been when men with toxic views compliment me because they think I’m agreeing with them. I am NOT, so get off me. Sorry not sorry.
SADDEST REGRETS: my plummeting opinion of the feminist internet trash mob. I am a feminist and damn proud of it, but I am also disgusted by the hyperperformative boundary policing of certain self-proclaimed “tech feminists”. If your great joy in life is roving the interwebs looking for any toes pressing a line so you can rapturously castigate them and shun them until they have licked your boots and begged for forgiveness … if you love performing elaborate outrage rituals and whipping up a frenzy of whispers or a witch hunt… then:
Fuck. The Fuck. Off. You are an embarrassment. This is about your ego, and your manufactured grievance machines are Not Helping.
I honestly thought these feminist pile-on mobs were a right-wing fantasy, and I’m sad that I was wrong. I’m also pretty sad about all the folks who know me and have every reason to know better. In my world you check in with your friends before leaping to judgment, and you help teach each other when you’re being stupid. A pretty dismal number of people I would have called friends just leapt excitedly into the fray passing judgment.
So now I know more about who my friends are.
Why even stick my neck out? I guessed something might go wrong, I just didn’t know what. So why?
Because I want to help, dammit. The farther I get in my career the more time I spend pondering how to bring others along with me, how to open the gates a little wider.
I’ve gotten to do a few things. I have tried to create an equitable, respectful working environment where everyone can do their best work, with managers who are passionate about diversity and strong where I am weak.
But … I have felt very often alienated by the messaging and attempts to help women. I can’t be the only one who responds more to a strategic message than an empathetic one, who feels condescended to and patronized by the mainstream corporate efforts.
I can’t be the only one who feels simmering resentment every time I get held up as a successful “woman in tech” (the world’s worst participation trophy). I don’t want a fucking consolation prize. I want to sweep the competition, I want to change the world. I can’t be the only one who hungers for power, money and credibility.
I know I’m not, actually. I know because they are telling me. The response has been at least 100-1 positive in private — from junior women especially — thanking me for being brutally honest and treating them like adults, like equals. (I’ve been told there are armies of women who feel dreadfully hurt but too afraid to say so. Pity if true, as they say.)
There has always been tension between the people who see the world as it is and fight to succeed in it, and the people who opt out and refuse to participate because it’s compromised. The world needs us both. So shut the fuck up and let the kids pick for themselves.
And maybe stop persecuting the people who stand with you.
I don’t really do “women stuff” (awkward umbrella term for gender-segregated events and spaces). I don’t really identify with any gender and I find a lot of the advice to be condescending and overly delicate, and it’s just a really boring thing to think and talk about. For me.
But I’m feeling guilty after turning down a bunch of requests to do shit for International Women’s Day next week. So I’m gonna do a thing I’ve been avoiding doing for years, and write down my (deeply problematic but practical) advice.
Toughen up. For your first 10 years or 3 jobs in the industry, you’re a junior contributor. You need them way more than they need you, so suck it up. Try not to dwell on the bullshit. Work hard and level up and always angle for more money and power when you can.
Stay technical. There are a thousand paved ramps out of engineering roles and only a few hard paths back in. Technical excellence is currency in this industry, even more so if your credibility is gonna get challenged again and again. So don’t stop engineering til you’re great at it.
Use your power for good. Once you become a senior contributor — and i’m not talking bullshit titles but real seniority, when people are coming to you for help far more than you go to them — then you can afford to get sensitive. …On behalf of others. The research convincingly shows that women get punished for advocating for themselves, but not for advocating for others. It’s a sweet loophole, use it.
If you feel like table flipping out of tech, just remember the rest of the world is at LEAST as sexist as tech is, but without the money and power and ridiculous life-coddling. Where exactly do you think you’re going to go?
Don’t quit tech: quit your job. There are LOTS of tolerable-to-great companies out there. If you stay and suffer, you’re just rewarding the shitholes with your presence. Don’t reward the shitholes any more than you can help it.
Learn shit, save your money, amass great power. Then use it to fuck shit up.
Last week was the West Coast Velocity conference. I had a terrific time; I think it’s the best Velocity I’ve been to yet. I also slipped in quite late, the evening before last, to catch Gareth’s session on DevOps vs SRE.
And it was worth it! Holy crap, this was such a fun barnburner of a talk, with Gareth schizophrenically arguing both for and against the key premise of the talk, which was about “Google Infrastructure for Everyone Else (GIFEE)” and whether SRE is a) the highest, noblest goal that we should all aspire towards, or b) mostly irrelevant to anyone outside the Google confines.
Which Gareth won? Check out the slides and judge for yourself. 🙃
At some point in his talk, though, Gareth tossed out something like “Charity probably already has a blog post on this drafted up somewhere.” And I suddenly remembered “Fuck! I DO!” it’s been sitting in my Drafts for months god dammit.
So this is actually a thing I dashed off back in April, after CraftConf. Somebody asked me for my opinion on the internet — always a dangerous proposition — and I went off on a bit of a rant about the differences and similarities between DevOps and SRE, as philosophies and practices.
Time passed and I forgot about it, and then decided it was too stale. I mean who really wants to read a rehash of someone’s tweetstorm from two months ago?
Well Gareth, apparently.
SRE vs DevOps: TWO PHILOSOPHIES ENTER, BOTH ARE PHENOMENALLY SUCCESSFUL AND MUTUALLY DUBIOUS OF ONE ANOTHER
It also has some really fucking obnoxious blurbs. Things like “ONLY GOOGLE COULD HAVE DONE THIS”, and a whiff of snobbery throughout the book, as though they actually believe this (which is far worse if true).
You can’t really blame the poor blurb’ers, but you can certainly look askance at a massive systems engineering org when it seems as though they’ve never heard of DevOps, or considered how it relates to SRE practices, and may even be completely unaware of what the rest of the industry has been up to for the past 10-plus years. It’s just a little weird.
So here, for the record, is what I said about it.
1) a lot of the philosophical volleying between devops / SRE comes down to a failure to recognize the overwhelming power of context.
Google is a great company with lots of terrific engineers, but you can only say they are THE BEST at what they do if you’re defining what they do tautologically, i.e. “they are the best at making Google run.” Etsyans are THE BEST at running Etsy, Chefs are THE BEST at building Chef, because … that’s what they do with their lives.
Context is everything here. People who are THE BEST at Googling often flail and flame out in early startups, and vice versa. People who are THE BEST at early-stage startup engineering are rarely as happy or impactful at large, lumbering, more bureaucratic companies like Google. People who can operate equally well and be equally happy at startups and behemoths are fairly rare.
And large companies tend to get snobby and forget this. They stop hiring for unique strengths and start hiring for lack of weaknesses or “Excellence in Whiteboard Coding Techniques,” and congratulate themselves a lot about being The Best. This becomes harmful when it translates into less innovation, abysmal diversity numbers, and a slow but inexorable drift into dinosaurdom.
2) operations engineering is a specialized skill set *at large scale* or *on hard ops problems*. many -- most? companies don't have those.
Everybody thinks their problems are hard, but to a seasoned engineer, most startup problems are not technically all that hard. They’re tedious, and they are infinite, but anyone can figure this shit out. The hard stuff is the rest of it: the feverish pace, the need to reevaluate and reprioritize and reorient constantly, the total responsibility, the terror and uncertainty of trying to find product/market fit and perform ten jobs at once and personally deliver on your promises to your customers.
At a large company, most of the hardest problems are bureaucratic. You have to come to terms with being a very tiny cog in a very large wheel where the org has a huge vested interest in literally making you as replicable and replaceable as possible. The pace is excruciatingly slow if you’re used to a startup. The autonomy is … well, did I mention the politics? If you want autonomy, you have to master the politics.
3) the outcomes associated with operations (reliability, scalability, operability) are the responsibility of *everyone* from support to CEO.
Everyone. Operational excellence is everyone’s job. Dude, if you have a candidate come in and they’re a jerk to your office manager or your cleaning person, don’t fucking hire that person because having jerks on your team is an operational risk (not to mention, you know, like moral issues and stuff).
But the more engineering-focused your role is, the more direct your impact will be on operational outcomes.
4) therefore, the more literate you are with operational skills, the more effective and powerful you can be -- esp as a software engineer.
As a software engineer, developing strong ops chops makes you powerful. It makes you better at debugging and instrumentation, building resiliency and observability into your own systems and interdependent systems, and building systems that other people can come along and understand and maintain long after you’re gone.
As an operations engineer, those skills are already your bread and butter. You can increase your power in other ways, like by leveling up at software engineering skills like test coverage and automation, or DBA stuff like query optimization and storage engine internals, or by helping the other teams around you level up on their skills (communication and persuasion are chronically underrecognized as core operations engineering skills).
5) specialization is not a bad thing. specialization is how we scale and do capitalism! the problem is when this becomes compartmentalizing.
This doesn’t mean that everyone can or should be able to do everything. (I can’t even SAY the words “full stack engineer” without rolling my eyes.) Generalists are awesome! But past a certain inflection point, specialization is the only way an org can scale.
It’s the only way you make room for those engineering archetypes who only want to dive deep, or who really really love refactoring, or who will save the world then disappear for weeks. Those engineers can be incredibly valuable as part of a team … but they are most valuable in a large org where you have enough generalists to keep the oars rowing along in the meantime.
6) so: Google SRE has an incredibly powerful set of best practices, that enable them to run the largest site in the world incredibly well.
So, back to Google. They’ve done, ahem, rather well for themselves. Made shitbuckets of money, pushed the boundaries of tech, service hardly ever goes down. They have operational demands that most of us never have seen and never will, and their engineers are definitely to be applauded for doing a lot of hard technical and cultural labor to get there.
Mostly because it comes off a little tone deaf in places. I’m not personally pissed off by the Google SRE book, actually, just a little bemused at how legitimately unaware they seem to be about … anything else that the industry has been doing over the past 10 years, in terms of cultural transformation, turning sysadmins into better engineers, sharing on-call rotations, developing processes around empathy and cross-functionality, engineering best practices, etc.
If you try and just apply Google SRE principles to your own org according to their prescriptive model, you’re gonna be in for a really, really bad time.
However, it happens that Jen Davis and Katherine Daniels just published a book called Effective DevOps, which covers a lot of the same ground with a much more varied and inclusive approach. And one of the things they return to over and over again is the power of context, and how one-size-fits-all solutions simply don’t exist, just like unisex OSFA t-shirts are a dirty fucking lie.
Google insularity is … a thing. On the one hand it’s great that they’re opening up a bit! On the other hand it’s a little bit like when somebody barges onto a mailing list and starts spouting without skimming any of the archives. And don’t even get me started on what happens when you hire long, longterm ex-Googlers back into the real world.
So, so many of us have had this experience of hiring ex-Googlers who automatically assume that the way Google does a thing is CORRECT, not just contextually appropriate. Not just right for Google, but right for everyone, always. Which is just obviously untrue. But the reassimilation process can be quite long and exhausting when the Kool-Aid is so strong.
8) DevOps as a philosophy is much more sensitive to context than SRE philosophy, because it grew from a broader collaborative base.
Because yeah, this is a conversation and a transformation that the industry has been having for a long time now. Compared with the SRE manifesto, the DevOps philosophy is much more crowd-sourced, more flexible, and adaptable to organizations at all stages of development, with all different requirements and key business differentiators, because it’s benefited from loud, mouthy contributors who haven’t all been working in the same bubble all along.
And it’s like Google isn’t even aware this was happening, which is weird.
9) that's it, basically all i'm saying is "all blanket statements are false" including probably this one 🙂 #devops #sre
Orrrrrr, maybe I’m just a wee bit annoyed that I’ve been drawn into this position of having to defend “DevOps”, after many excellent years spent being grumpy about the word and the 10000010101 ways it is used and abused.
(Tell me again about your “DevOps Engineering Team”, I dare you.)
(^^ thanks to @kellan and others who particularly influenced/clarified my thinking around #8, the crowdsourcing of devops)
P.S. I highly encourage you to go read the epic hours-long rant by @matthiasr that kicked off the whole thing. some of which I definitely endorse and some of which not, but I think we could go drink whiskey and yell about this for a week or two easy breezy <3
So, a few hot takes on that … 1) SRE, as practiced by Google, is really just Ops with a lot of management support
The theme of my talk was basically: what should software engineers know and care about when it comes to operations in a world where we are outsourcing more and more core functionality?
If you care about running a quality service or product, or providing your customers with a reasonable level of service, you have to care about operational concerns like design, resiliency, instrumentation and debuggability. No matter how many abstractions there are between you and the bare metal.
If you chose a provider, you do not get to just point your finger at them in the post mortem and say it’s their fault. You chose them, it’s on you. It’s tacky to blame the software or the service, and besides your customers don’t give a shit whose “fault” it is.
So given an infinite number of things to care about, where do you start?
What is your mission, and what are your differentiators?
The first question must always be: what is your mission? Your mission is not writing software. Your mission is delivering whatever it is your customers are paying you for, and you use software to get there. (Code is kind of a liability so you should write as little of it as necessary. hey!! sounds like a good argument for #serverless!)
Second: what are your core differentiators? What are the things that you are doing that are unique, and difficult to replicate, or the things where you have to actually be world class experts in those things?
Those are the things that you will have the hardest time outsourcing, or that you should think about very carefully before outsourcing.
You can outsource labor, but you can’t outsource caring. And nobody but you is in the position to think about your core differentiators and your product in a holistic way.
If you’re a typical early startup, you’re probably using somewhere between 5 and 20 SaaS products to get rid of some of the crap work and offload it to dedicated teams who can do it better than you can, much more cheaply, so you are freed up to work on your core value proposition.
But you still have to think about things like reliability, your security model, your persistent storage models, your query performance, how all these lovely services talk to each other, how you’re going to debug them, how you’re going to repro when things go wrong, etc. You still own these things, even if you don’t run them.
For example, take AWS Lambda. It’s a pretty great service on many dimensions. It’s an early version of the future. It is also INCREDIBLY irritating and challenging to debug in a practically infinite number of insanity-inducing ways.
** Important side note — I’m talking about actual production systems. Parse, Heroku, Lambda, etc are GREAT for prototyping and can take you a long, long way. Early stage startups SHOULD optimize for agility and rapid developer iteration, not reliability. Thx to @joeemison for reminding me that i left that out of the recap.
Focus on the critical path
Your users don’t care if your internal jenkins builds are broken. They don’t care about a whole lot of things that you have to care about … eventually. They do care a lot if your product isn’t actually functional. Which means you have to think through the behavioral and failure characteristics of the providers you’re relying on in any user visible fashion.
Ask lots of questions if you can. (AWS often won’t tell you much, but smaller providers will.) Find out as much as you can about their cotenancy model (shared hardware or isolation?), their typical performance variance (run your own tests, don’t trust their claims), and the underlying storage systems.
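If you want a starting point for “run your own tests”, here is a minimal sketch in Python. It times a call repeatedly and reports nearest-rank percentiles, since the tail is where a shared platform usually hurts you. The function name `measure_latency` and the `call` argument are placeholders I made up; you would pass in whatever request you actually make to your provider.

```python
import math
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], samples: int = 100) -> Dict[str, float]:
    """Time `call` repeatedly and summarize the spread in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")

    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        call()  # hypothetical: your real request to the provider goes here
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()

    def percentile(pct: int) -> float:
        # Nearest-rank percentile: the smallest sample with at least
        # pct% of all samples at or below it.
        rank = math.ceil(pct * len(timings_ms) / 100)
        return timings_ms[max(rank - 1, 0)]

    return {
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": timings_ms[-1],
    }
```

Run it at different times of day and from the place your users actually are. A single average will hide exactly the variance you are trying to find.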
Think about how you can bake in resiliency from the user’s perspective in ways that don’t rely on provider guarantees. If you’re on mobile, can you provide a reasonable offline experience? Parse did a lot of magic here in the APIs, where it would back off and retry saves if there were any errors.
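To make “back off and retry” concrete, here is a rough sketch of the general pattern: exponential backoff with jitter. This is not Parse’s actual implementation. The name `save_with_backoff`, its parameters, and the defaults are all made up for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def save_with_backoff(
    save: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `save`, retrying failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return save()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Double the ceiling on each failure, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: sleep a random amount up to the ceiling so that many
            # clients recovering together do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In real code you would narrow `retry_on` to errors you know are transient, and only retry saves that are safe to repeat. Retrying a non-idempotent write is a good way to create a different class of bug.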
Can you fail over to another provider if one is down? Is it even worth it at your company’s stage of maturity and engineering resources to invest in this?
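The simplest form of failover is an ordered list of providers, tried one at a time. Here is a minimal sketch under that assumption. The name `call_with_failover` and the provider labels in the usage note are hypothetical, and this ignores the genuinely hard parts, like keeping state consistent between providers.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, call) pair in order; return the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            # Return the provider name too, so you can record who served it.
            return name, call()
        except Exception as exc:
            # Keep every failure so the final error explains what was tried.
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

You might call it as `call_with_failover([("primary", send_via_primary), ("backup", send_via_backup)])`, where both callables are yours to write. Whether the second integration is worth building and keeping warm is exactly the maturity question above.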
How willing are you to be locked into a vendor or provider, and what is the story if you find yourself forced to switch? Or if that service goes away, as so many, many, many of them have done and will do. (RIP, parse.com.)
Listen, outsourcing is awesome. I do it as much as I can. I’m literally helping build a service that provides outsourced metrics, I believe in this version of the future! It’s basically the latest iteration of capitalism in a nutshell: increased complexity –> increased specialization –> you pay other people to do the job better than you –> everybody wins.
But there are tradeoffs, so let’s be real.
The service, if it is smart, will put strong constraints on how you are able to use it, so they are more likely to deliver on their reliability goals. When users have flexibility and options it creates chaos and unreliability. If the platform has to choose between your happiness vs thousands of other customers’ happiness, they will choose the many over the one every time — as they should.
Limits may mysteriously change or be invented as they are discovered, esp with fledgling services. You may be desperate for a particular feature, but you can’t build it. (This is why I went for Kafka over Kinesis.)
You need to think way more carefully and more deeply about visibility and introspection up front than you would if you were running your own services, because you have no ability to log in and use strace or gdb or tail a logfile or run any system profiling commands when things go dark.
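One cheap way to get some of that visibility back is to wrap every call to a service you don’t run and emit a structured event with timing and outcome. A minimal sketch follows. The `instrumented` helper and the field names are my own invention, not any vendor’s API, and the default just prints JSON to stderr where a real setup would ship events somewhere queryable.

```python
import json
import sys
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


def _emit_json(event: Dict[str, Any]) -> None:
    # Assumption: stderr stands in for wherever your events really go.
    print(json.dumps(event, default=str), file=sys.stderr)


@contextmanager
def instrumented(
    operation: str,
    emit: Callable[[Dict[str, Any]], None] = _emit_json,
    **fields: Any,
) -> Iterator[Dict[str, Any]]:
    """Record duration and outcome for one call to an external service."""
    event: Dict[str, Any] = {"operation": operation, **fields}
    start = time.perf_counter()
    try:
        yield event  # the caller can attach more fields while the call runs
        event["ok"] = True
    except Exception as exc:
        event["ok"] = False
        event["error"] = repr(exc)
        raise  # never swallow the failure, only record it
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(event)
```

Usage looks like `with instrumented("save_object", provider="some-baas") as ev:` around the call, adding things like `ev["rows"] = 3` as you learn them. When things go dark, these events are the closest thing to strace you are going to have.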
In the best case, you’re giving up some control and quality in exchange for experts doing the work better than you could for cheaper (e.g. i’m never running a fucking physical data center again, jesus. EC24lyfe). In a common worse case, it’s less reliable than what you would build AND it’s also opaque AND you can’t tell if it’s down for you or for everyone because frankly it’s just massively harder to build a service that works for thousands/millions of use cases than for any one of them individually.
Ohhhh and let’s just briefly talk about state.
The serverless utopia mostly ignores the problems of stateful services. If pressed they will usually say DynamoDB, or Firebase, or RDS or Aurora or something.
Real question how does state get persisted with #serverless?
I understand scale out of stateless servers, but who stores the state?
This is a big, huge, deep, wide lake of crap to wade into, so all I’m going to say is that there is no such thing as having the luxury of not having to understand how your storage systems work. Queries will get slow, and you’ll need to be able to figure out why and fix them. You’ll hit scaling cliffs where suddenly a perfectly-usable app just starts timing everything out because of that extra second of latency coming from …
The hardware underlying your instance will degrade (there’s a server somewhere under all those abstractions, don’t forget). The provider will have mysterious failures. They will be better than you, probably, but less inclined to give you satisfactory progress updates because there are hundreds or thousands or millions of you all clamoring.
The more you understand about your storage system (and the more you stay in the lane of how it was intended to be used), the happier you’ll be.
These trends are both inevitable and, for the most part, very good news for everyone.
Operations engineering is becoming a more fascinating and specialized skill set. The best engineers are flocking to solve category problems — instead of building the same system at company after company, they are building SaaS solutions to solve it for the internet at large. Just look at the massive explosion in operational software offerings over the past 5-6 years.
This means that the era of the in-house dedicated ops team, which serves as an absorbent buffer for all the pain of software development, is mostly on its way out the door. (And good riddance.)
People are waking up to the fact that software quality improves when feedback loops are tighter for software engineers, which means being on call and owning services end to end. The center of gravity is shifting towards engineering teams owning the services they built.
This is awesome! You get to rent engineers from Google, AWS, Pagerduty, Pingdom, Heroku, etc for much cheaper than if you hired them in-house — if you could even get them, which you probably can’t because talent is scarce.
But the flip side of this is that application engineers need to get better at thinking in traditionally operations-oriented ways about reliability, architecture, instrumentation, visibility, security, and storage. Figure out what your core differentiators are, and own the shit out of those.
Nobody but you can care about your mission as much as you can. Own it, do it. Have fun.
I just got back from the very first ever @serverlessconf in NYC. I have a soft spot for well-curated single-track conferences, and the organizers did an incredible job. Major kudos to @iamstan and team for pulling together such a high-caliber mix of attendees as well as presenters.
I’m really honored that they asked me to speak. And I had a lot of fun delivering my talk! But in all honesty, I turned it down a few times — and then agreed, and then backed out, and then agreed again at the very last moment. I just had this feeling like the attendees weren’t going to want to hear what I was gonna say, or like we weren’t gonna be speaking the same language.
Which … turned out to be mmmmostly untrue. To the organizers’ credit, when I expressed this concern to them, they vigorously argued that they wanted me to talk *because* they wanted a heavy dose of real talk in the mix along with all the airy fairy tales of magic and success.
So #serverless is the new cloud or whatever
Hi, I’m grouchy and I work with operations and data and backend stuff. I spent 3.5 years helping Parse grow from a handful of apps to over a million. Literally building serverless before it was cool TYVM.
So when I see kids saying “the future is serverless!” and “#NoOps!” I’m like okay, that’s cute. I’ve lived the other side of this fairytale. I’ve seen what happens when application developers think they don’t have to care about the skills associated with operations engineering. When they forget that no matter how pretty the abstractions are, you’re still dealing with dusty old concepts like “persistent state” and “queries” and “unavailability” and so forth, or when they literally just think they can throw money at a service to make it go faster because that’s totally how services work.
I’m going to split this up into two posts. I’ll write up a recap of my talk in a sec, but first let’s get some things straight. Like words. Like operations.
What is operations?
Let’s talk about what “operations” actually means, in the year 2016, assuming a reasonably high-functioning engineering environment.
At a macro level, operational excellence is not a role, it’s an emergent property. It is how you get shit done.
Operations is the sum of all of the skills, knowledge and values that your company has built up around the practice of shipping and maintaining quality systems and software. It’s your implicit values as well as your explicit values, habits, tribal knowledge, reward systems. Everybody from tech support to product people to CEO participates in your operational outcomes, even though some roles are obviously more specialized than others.
Saying you have an ops team who is solely responsible for reliability is about as silly as saying that “HR defines and owns our company culture!” No. Managers and HR professionals may have particular skills and responsibilities, but culture is an emergent property and everyone contributes (and it only takes a couple bad actors to spoil the bushel).
Thinking about operational quality in terms of “a thing some other team is responsible for” is just generally not associated with great outcomes. It leads to software engineers who are less proficient and less connected to their outcomes, ops teams who get burned out, and an overall lower quality of software and services that get shipped to customers.
These are the specialized skill sets that I associate with really good operations engineers. Do these look optional to you?
It depends on your mission, but usually these are not particularly optional. If you have customers, you need to care about these things. Whether you have a dedicated ops team or not. And you need to care about the tax it imposes on your humans too, especially when it comes to the cognitive overhead of complex systems.
So this is my definition of operations. It doesn’t have to be your definition. But I think it is a valuable framework for helping us reason about shipping quality software and healthy teams. Especially given the often invisible nature of operations labor when it’s done really well. It’s so much easier to notice and reward shipping shiny features than “something didn’t break”.
The inglorious past
Don’t get me wrong — I understand why “operations” has fallen out of favor in a lot of crowds. I get why Google came up with “SRE” to draw a line between what they needed and what the average “sysadmin” was doing 10+ years ago.
Ops culture has a number of well-known and well-documented pathologies: hero/martyr complexes, risk aversion, burnout, etc. I understand why this is off-putting, and we need to fix it.
Also, historically speaking, ops has attracted a greater proportion of nontraditional oddballs who just love debugging and building things — fewer Stanford CS PhDs, more tinkerers and liberal arts majors and college dropouts (hi). And so they got paid less, and had less engineering discipline, and burned themselves out doing too much ad hoc labor.
But — this is no longer our overwhelming reality, and it is certainly not the reality we are hurtling towards. Thanks to the SRE movement, and the parallel and even more powerful & diverse open source DevOps movement, operations engineers are … engineers. Who specialize in infrastructure. And there’s more value than ever in empathy and fluid skill sets, in engineers who are capable of moving between disciplines and translating between specialties. This is where the “full-stack developer” buzzword comes from. It’s annoying, but reflects a real craving for generalist skill sets.
The BOFH stereotype is dead. Some of the most creative cultural and technical changes in the technical landscape are being driven by the teams most identified with operations and developer tooling. The best software engineers I know are the ones who consistently value the impact and lifecycle of the code they ship, and value deployment and instrumentation and observability. In other words, they rock at ops stuff.
The Glorious Future
And so I think it’s time to bring back “operations” as a term of pride. As a thing that is valued, and rewarded. As a thing that every single person in an org understands as being critical to success. Every organization has unique operational needs, and figuring out what they are and delivering on them takes a lot of creativity and ingenuity on both the cultural and technical front.
“Operations” comes with baggage, no doubt. But I just don’t think that distance and denial are an effective approach for making something better, let alone trash talking and devaluing the skill sets that you need to deliver quality services.
You don’t make operational outcomes magically better by renaming the team “DevOps” or “SRE” or anything else. You make it better by naming it and claiming it for what it is, and helping everyone understand how their role relates to your operational objectives.
And now that I have written this blog post I can stop arguing with people who want to talk about “DevOps Engineers” and whether “#NoOps” is a thing and maybe I can even stop trolling them back about the nascent “#NoDevs” movement. (Haha just kidding, that one is too much fun.)
I mean how hard can it be to just glue together APIs that other people have written and support and scale? 🤔 #serverless #NoDevs