Power has a way of flowing towards people managers over time, no matter how many times you repeat “management is not a promotion, it’s a career change.”
It’s natural, like water flowing downhill. Managers are privy to performance reviews and other personal information that they need to do their jobs, and they tend to be more practiced communicators. Managers facilitate a lot of decision-making and routing of people and data and things, and it’s very easy to slip into making all the decisions rather than empowering people to make them. Sometimes you want to just hand out assignments and order everyone to do as told. (er, just me??)
But if you let all the power drift over to the engineering managers, pretty soon it doesn’t look so great to be an engineer. Now you have people becoming managers for all the wrong reasons, or everyone saying they want to be a manager, or engineers just tuning out and turning in their homework (or quitting). We all want autonomy and impact, we all crave a seat at the table. You need to work harder to save those seats for non-managers.
So, in the spirit of the enumerated rights and responsibilities of our musty Constitution, here are some of the commitments we make to our engineers at Honeycomb — and some of the expectations we have for managering and engineering roles. Some of them mirror each other, and others are very different.
(Incidentally, I find it helpful to practice visualizing the org chart hierarchies upside down — placing managers below their teams as support structure rather than perched atop.)
Engineer’s Bill of Rights
You should be free to go heads down and focus, and trust that your manager will tap you when you are needed (or would want to be included).
We will invest in you as a leader, just like we invest in managers. Everybody will have opportunities to develop their leadership and interpersonal skills.
Technical decisions must remain the province of engineers, not managers.
You deserve to know how well you are performing, and to hear it early and often if you aren’t meeting expectations.
On call should not substantially impact your life, sleep, or health (other than carrying your devices around). If it does, we will fix it.
Your code reviews should be turned around in 24 hours or less, under ordinary circumstances.
You should have a career path that challenges you and contributes to your personal life goals, with the coaching and support you need to get there.
You should substantially choose your own work, in consultation with your manager and based on our business goals. This is not a democracy, but you will have a voice in our planning process.
You should be able to do your work whether in or out of the office. When you’re working remotely, your team will loop you in and have your back.
Make forward progress on your projects every week. Be transparent.
Make forward progress on your career every quarter. Push your limits.
Build a relationship of trust and mutual vulnerability with your manager and team, and invest in those relationships.
Know where you stand: how well are you performing, how quickly are you growing?
Develop your technical judgment and leadership skills. Own and be accountable for engineering outcomes. Ask for help when you need it, give help when asked.
Give feedback early and often, receive feedback gracefully. Practice both saying no and hearing no. Let people retract and try again if it doesn’t come out quite right.
Own your time and actively manage your calendar. Spend your attention tokens mindfully.
Recruit and hire and train your team. Foster a sense of solidarity and “teaminess” as well as real emotional safety.
Care for every engineer on your team. Support them in their career trajectory, personal goals, work/life balance, and inter- and intra-team dynamics.
Give feedback early and often. Receive feedback gracefully. Always say the hard things, but say them with love.
Move us relentlessly forward, watching out for overengineering and work that doesn’t contribute to our goals. Ensure redundancy/coverage of critical areas.
Own the quarterly planning process for your team, be accountable for the goals you set. Allocate resources by communicating priorities and recruiting eng leads. Add focus or urgency where needed.
Own your time and attention. Be accessible. Actively manage your calendar. Try not to make your emotions everyone else’s problems (but do lean on your own manager and your peers for support).
Make your own personal growth and self-care a priority. Model the values and traits we want our engineers to pattern themselves after.
I’d love to hear from anyone else who has a list like this.
Okay! As of today it’s been one week since I wrote some advice and the internet exploded in my face, so now it’s time to do what I always do: post mortem that shit.
This is going to be long. I erred by making my first post too short, so I’m going to ship $(allthedetail) this time. Duly warned.
Around 8 am on Friday, March 2nd, after pulling an all-nighter, I decided to pound out a quick blog post that has been on my todo list forever: the only advice I feel equipped to give on how to succeed in tech.
My advice, in brief, was this:
as a junior engineer, tough it out. work hard, learn everything, earn your stripes.
stay technical. don’t get sucked into an offramp unless you are god damn sure you want out for good.
once you are senior, use your power to advocate for others and fuck that shit up.
Money, power, credibility. This is the best way I know how to earn these things. This is what worked for me and most of the senior technical women I know and admire.
First of all: I don’t think there should be anything controversial at all about this advice. It’s good advice, if a bit bluntly put. Pick your battles, show strategic impact, leverage your influence into power and use that power to fuck shit up in the manner of your choosing.
The fact is, we are far too chickenshit about telling young women straight up how to succeed at work. We praise them for all kinds of dumb shit and second shift work and emotional labor that has little if any strategic impact to the bottom line, and wonder why they’re burned out and resentful.
We live in a fallen world. I didn’t make it this way, I just want to help you level up to be a powerful destroyer being so you can make it better.
So I hit “publish”.
Around 9:30 am, Camille Fournier gave me a bunch of unsolicited criticism. Unfortunately, due to some sour personal history with Camille I was extremely not disposed to receive this from her. I can be a resentful little shit: as soon as she told me to change it in certain ways, it was the last fucking thing in the world I was going to do.
For a few hours, all the feedback was good. People liked my advice to stay technical (“god I wish someone had told me that 15 years ago”) and my pointing out the loophole that lets women advocate for each other without being penalized.
A few people nailed what I was trying to say even better than I did:
But by the end of the day I was receiving a steady stream of angry tweets from people I had never heard of, with objections that seemed puzzling and ridiculous to me.
They were acting as though the sum total of my advice had been ordering bullied and abused people to just shut up and tough it out. Soon I was getting tweets accusing me of trashing all diversity work, trashing all women, only being out for myself and my own career, erasing sexual assault, being insensitive and destructive to people of color, and on and on.
People were subtweeting me like crazy, or DM’ing me telling me how much they liked my piece but were afraid to say so in public. Others were harassing my engineering managers and people who follow me.
I have never received textual scrutiny of this type before, where every single word was turned over and macerated and peered at for evidence of traitorous views. It sucks. (And it’s pretty hypocritical, to say the least … some of these same women who were gleefully bashing me for clumsy words remain good friends with men who are actual known harassers and abusers of women.)
Lots of people wanted me to take the post down immediately, or publish a retraction or correction immediately. Some prominent feminists publicly chided me and refused to talk to me until I repented of my sins. 🙄
Let’s be clear. I have no problem admitting my errors and making amends. I do it all the fucking time. But I am disinclined to grovel before a howling mob. It wasn’t even clear to me what I had done wrong, given all the contradictory noises.
So I decided to wait a week before responding, so I could talk to people and figure out what to take away from the mess.
(Also last week: traveled to multiple continents, flew a few dozen hours, wrote multiple talks, delivered presentations at various conferences and meetups, visited and pitched to potential customers, managed a handful of teams, fit 1x1s in between hops and time zones and you know just tried to do my fucking job while dealing with crazed nuts screaming abuse at me online.)
I had a couple of hard but helpful conversations with people like Alice Goldfuss and Courtney Nash, who took the time to walk me through ways that what I wrote may be misinterpreted or wrongly received. This feedback can mostly be bucketed into the following categories:
“Assume the reader knows nothing about you and considers you hostile until proven otherwise.” Well shit, I am not used to writing defensively. I live my life in high trust, high transparency environments and prefer it that way.
“Your advice doesn’t apply to $x.” True! I didn’t bracket it in layers of padding — “this is just what worked for me”, “may not apply to every situation” — because I thought that was freaking obvious.
“It sounds like you are shit talking all diversity efforts.” No, but I was waving vaguely in the direction of some very cynical and tired feelings on the subject. I’m pretty over corporate diversity issues and pinkwashing that doesn’t expand opportunity or share power.
“It sounds like you are shitting on all women.” Oof. This is the one that is really painful, because this is the one I have been working hard on for close to 20 years… and should have seen coming. I did intend to put some space between myself and women in tech, because I don’t exactly identify as a woman.. exactly. I grew up fundamentalist and misogynist af, and have been working hard to recover from that ever since I left home at 15.
“Maybe you shouldn’t give advice to women at all.” Courtney challenged me on whether I should speak to women, given my ambivalence wrt my own gender identification. Which is an interesting question that I have pondered a lot.
This was all desperately inevitable and predictable, however, and I made some unforced errors. So let’s talk about what I do and don’t regret about all this, and what I would or would not do differently.
NO REGRETS: giving the advice. It’s good advice, it needed to be said. I’m tired of seeing women burn themselves out on shitty corporate diversity work that only diverts their energy from amassing real power and strategic impact. Not sorry.
REGRETS: I was sloppy about waving in the direction of my gender issues. I intended to put some space between myself and “women’s issues”, because I don’t exactly identify as a woman, exactly, and I have always felt uncomfortable in women’s spaces. Given the historic devaluation of women’s spaces and issues, I should have been clearer. I am sorry.
NO REGRETS: I think it’s fine for me to give advice to women if they ask, which they do. After all I was raised as a woman, have always been read and treated as one, and assumed there was no other option for 30+ years. I get to speak. Not sorry.
SEMI-REGRETS: I still can’t figure how anyone managed to project into my piece that I was slamming all diversity work. I said somewhat colorfully that a lot of the advice didn’t work for me and wasn’t my favorite thing to dwell on, in the same grouchy grumbly tone that I use when bitching about query planners and terraform variable interpolation. I don’t think this would have been a big deal if the frenzy hadn’t gotten whipped up, but if anyone genuinely felt hurt or dismissed by it, I can be sorry for that.
REGRETS: the impact it had on my poor engineering managers and other people who work with me. They are still being asked to denounce me or defend me and their decision to work with me. So deeply not ok. I am sorry — but that’s really on you, internet assholes.
BIGGEST REGRETS: any accidental cover given to misogynists. By far the most annoying thing about the brouhaha has been when men with toxic views compliment me because they think I’m agreeing with them. I am NOT, so get off me. Sorry not sorry.
SADDEST REGRETS: my plummeting opinion of the feminist internet trash mob. I am a feminist and damn proud of it, but I am also disgusted by the hyperperformative boundary policing of certain self-proclaimed “tech feminists”. If your great joy in life is roving the interwebs looking for any toes pressing a line so you can rapturously castigate them and shun them until they have licked your boots and begged for forgiveness … if you love performing elaborate outrage rituals and whipping up a frenzy of whispers or a witch hunt… then:
Fuck. The Fuck. Off. You are an embarrassment. This is about your ego, and your manufactured grievance machines are Not Helping.
I honestly thought these feminist pile-on mobs were a right-wing fantasy, and I’m sad that I was wrong. I’m also pretty sad about all the folks who know me and have every reason to know better. In my world you check in with your friends before leaping to judgment, and you help teach each other when you’re being stupid. A pretty dismal number of people I would have called friends just leapt excitedly into the fray passing judgment.
So now I know more about who my friends are.
Why even stick my neck out? I guessed something might go wrong, I just didn’t know what. So why?
Because I want to help, dammit. The farther I get in my career the more time I spend pondering how to bring others along with me, how to open the gates a little wider.
I’ve gotten to do a few things. I have tried to create an equitable, respectful working environment where everyone can do their best work, with managers who are passionate about diversity and strong where I am weak.
But … I have felt very often alienated by the messaging and attempts to help women. I can’t be the only one who responds more to a strategic message than an empathetic one, who feels condescended to and patronized by the mainstream corporate efforts.
I can’t be the only one who feels simmering resentment every time I get held up as a successful “woman in tech” (the world’s worst participation trophy). I don’t want a fucking consolation prize. I want to sweep the competition, I want to change the world. I can’t be the only one who hungers for power, money and credibility.
I know I’m not, actually. I know because they are telling me. The response has been at least 100-1 positive in private — from junior women especially — thanking me for being brutally honest and treating them like adults, like equals. (I’ve been told there are armies of women who feel dreadfully hurt but too afraid to say so. Pity if true, as they say.)
There has always been tension between the people who see the world as it is and fight to succeed in it, and the people who opt out and refuse to participate because it’s compromised. The world needs us both. So shut the fuck up and let the kids pick for themselves.
And maybe stop persecuting the people who stand with you.
I don’t really do “women stuff”. I don’t really identify with any gender and I find a lot of the advice to be condescending and overly delicate, and it’s just a really boring thing to think and talk about. For me.
But I’m feeling guilty after turning down a bunch of requests to do shit for International Women’s Day next week. So I’m gonna do a thing I’ve been avoiding doing for years, and write down my (deeply problematic but practical) advice.
Toughen up. For your first 10 years or 3 jobs in the industry, you’re a junior contributor. You need them way more than they need you, so suck it up. Try not to dwell on the bullshit. Work hard and level up and always angle for more money and power when you can.
Stay technical. There are a thousand paved ramps out of engineering roles and only a few hard paths back in. Technical excellence is currency in this industry, even more so if your credibility is gonna get challenged again and again. So don’t stop engineering til you’re great at it.
Use your power for good. Once you become a senior contributor — and i’m not talking bullshit titles but real seniority, when people are coming to you for help far more than you go to them — then you can afford to get sensitive. …On behalf of others. The research convincingly shows that women get punished for advocating for themselves, but not for advocating for others. It’s a sweet loophole, use it.
If you feel like table flipping out of tech, just remember the rest of the world is at LEAST as sexist as tech is, but without the money and power and ridiculous life-coddling. Where exactly do you think you’re going to go?
Don’t quit tech: quit your job. There are LOTS of tolerable-to-great companies out there. If you stay and suffer, you’re just rewarding the shitholes with your presence. Don’t reward the shitholes any more than you can help it.
Learn shit, save your money, amass great power. Then use it to fuck shit up.
I have a very cynical reaction to the word “values”, especially in the context of corporate entities. At best it’s a disingenuous marketing campaign, usually it’s more like a red blaring light shining on their degenerate hypocrisies and weakest aspirations.
But we’ve been doing a lot of hiring. And one day I realized that most of our top candidates were asking the same two questions: 1) how do technical decisions get made, and 2) what are our company values?
After a few rounds of stuttering and sounding like an idiot, I decided it was time to mayyyybe stop sounding like an idiot and come up with an answer.
But it’s hard to do something you don’t believe in, let alone something as cheesy and heart-on-your-sleeve as write a company values statement. So first I had to talk myself into believing it was worth doing. Which went something like this:
Candidates I respect seem to think this thing matters, therefore it must matter to me too.
Well, some are worse than others. The ones I feel cynical about are the worst. (Facebook’s were a running joke because nobody believed them.)
I didn’t hate Linden Lab’s values .. until we stopped believing in them. Hmm, so what was valuable about them?
Well, people used them to help resolve conflicts and make decisions. Nice.
Ok, what else do I hate? Values that are overly broad or include their opposite (“we work hard AND play harder”, “we’re empathetic BUT tough-minded”), are overly generic or too obvious (“no assholes” — duh), are unmemorable laundry lists, or too angelic and earnest (this list is not going to get you laid, capitalist scum).
Ok. So. Our values should be particular (they should not apply just as well to any other company), they should help make decisions and resolve conflict (if they aren’t useful/if we don’t use them then what’s the point), they should be pithy and a bit snarky (an aesthetic choice, just a spoonful of bile to help the medicine go down).
I started jotting down values fodder on my phone, while walking back and forth between home and work, which is how I do all my writing these days. I spent a couple weeks spewing notes out. It was a mess.
Then I sat down with Ginsu, my wizard of a COO, because I knew[*] he had the unique power to sift through my ramblings and craft a pithy message from vast effluvia. After bouncing it around a bit and soliciting everyone’s feedback, we were left with this list, which we quietly back-posted.
What I love about the list is that it is specific, actionable, and truly echoes things we say every day to each other (“Everything is an experiment”, “Fast and mostly-right is better than slow and perfect”, and “We hire adults”), as well as bringing bits of our heritage (“Feedback is a gift” is lifted from Facebook; “Do it with style” comes from Linden Lab).
I like that I overhear people repeating the phrases to each other as they do their work and argue and urge each other on. I even love that there are huge, known flaws with it (“do it with style” notoriously does not scale) because that reminds me this is a living document, that we have committed to its care and feeding and regular revisioning.
I even love that you may read it and think, “This place is not for me.” If a place is for everyone, it is not for anyone in particular. I am okay with being a particular place, for particular people at a particular time in our lives. Specificity elicits passion in a way that generic never can.
Nothing lasts forever. As we close this round of funding and the end of the beginning chapter of this company, it’s nice to take a breath, pause, and put a stamp on it.
Lately I’ve been doing some career counseling for people off Twitter (long story). The central drama for many people goes something like this:
“I’m a senior engineer, but I’m thinking about being a manager. I really like engineering, but I feel like I’m just solving the same problems over and over and it seems like the real problems are people problems. I have to be a manager to get promoted. I hope it isn’t terrible, once I make the switch. I hear it’s terrible.”
I’ve been meaning to write this post for a while. There’s a lot but let’s start with: Fuck the whole idea that only managers get career progression. And fuckkkk the idea you have to choose a “lane” and grow old there. I completely reject this kind of slotting.
“Your advice is bad and you should feel bad”:
The best frontline eng managers in the world are the ones that are never more than 2-3 years removed from hands-on work, full time down in the trenches. The best individual contributors are the ones who have done time in management.
And the best technical leaders in the world are often the ones who do both. Back and forth. Like a pendulum.
I’ve done this a few times myself now; start out as an early or first infra engineering hire, build the stack, then build the team, then manage the team, then … leave and start it all over again. I get antsy, I get restless. I start to feel like I know what I’m doing (… a telltale sign something’s wrong).
It’s a good cycle for people who like early stage companies, or have ADD. But I don’t see people talking about it as a career path. So I’m here to advocate for it, as an intentional and awesome way of life.
There are lots of people who do both well - but serially. Not simultaneously.
(h/t to @sarahmei who was tweetstorming this up at the EXACT SAME TIME as i was writing this. Yes Virginia, internet feminists ARE linked by a mystical hive brain.)
On being a manager (of technical projects)
Promoting managers from within means you get those razor sharp skills from the people who just built the thing. That gives them credibility, while they struggle with their newly achieved incompetence in a different role.
That’s one of the only ways you can achieve the temporary glory of a hybrid manager+tech lead. This is an unstable combination, because your engineering skills and context-sharpness are decaying the longer you do it.
You can only really improve at one of these things at a time: engineering or management. And if you’re a manager, your job is to get better at management. Don’t try to cling to your former glory.
Management is highly interruptive, and great engineering — where you’re learning things — requires blocking out interruptions. You can’t do these two opposite things at once. As a manager, it is your job to be available for your team, to be interrupted. It is your job to choose to hand off the challenging assignments, so that your engineers can get better at engineering.
5. Both code and people require the same thing to thrive: focused, sustained attention. No one does both well.
Conversely: the best tech leads in the world are always the ones who have done time in management. This is not because they’re always the best programmers or debuggers; it’s because they know how to get shit done, which means they know how to communicate and manage other people.
A tech lead is a manager … but their first priority is achieving the task at hand, not grooming and minding the humans who work on it.
They still need the full manager toolset. They’ll need to know how to rally people and teams and motivate them, or how to triage and restart a stalled project that everybody dreads. They still need to connect the dots between business objectives and technical objectives, and break down big objectives into components. They need to be able to size up a junior engineer’s ability and craft a meaningful assignment, one that pushes their boundaries without crushing them … then do the same for another twenty contributors. This is management work, from the slightly shifted perspective of “Get Thing X Done” not “care for these people”.
So these tech leads usually spend more time in meetings than building things, and they will bitch about it but do it anyway, because writing code is not the best use of their time. Tech is the easy part, herding humans is the harder part.
Senior engineers who have both these toolsets are the kind of tech leads you can build an org around, or a company around. They get shit done. And they are rare.
Almost all of them have spent considerable time in management.
We don’t talk about this nearly enough: the immense breadth and strength that accrues to engineers who make a practice of going back and forth.
Being an IC is like reverse-engineering how a company works with very little information. A lot of things seem ridiculous, or pointless or inefficient from the perspective of a leaf node.
Being a manager teaches you how the business works. It also teaches you how people work. You will learn to have uncomfortable conversations. You will learn how to still get good work out of people who are irritated, or resentful, or who hate your guts. You will learn how to resolve conflicts, dear god will you ever learn to resolve conflicts. (Actually you’ll learn to YEARN for conflicts because straightforward conflict is usually better than all the other options.) You’ll go home exhausted every day and unable to articulate anything you actually did. But you did stuff.
You’ll miss the dopamine hit of fixing something or solving something. You’ll miss it desperately.
One last thing about management. There’s a myth that makes it really hard for people to stop managing, even when it makes them and everyone around them miserable. And that’s the idea that management is a promotion.
Management is NOT a promotion.
1. Becoming a manager is not a promotion - it's a lateral move onto a parallel track. You're back at junior level in many key skills.
Seriously, fuck that so hard. It is SUCH an insidious myth, and it leads to so many people managing even though they hate managing and have no business managing, and also starves the senior eng pool of the great mentors and elder wizards we need.
Management is not a promotion, management is a change of profession. And you will be bad at it for a long time after you start doing it. If you don’t think you’re bad at it, you aren’t doing your job.
Managing because it feeds your ego is a terrific way to be sure that your engineers get to report to someone miserable and resentful, someone who should really be writing code or finding something else that brings them joy.
There’s nothing worse than reporting to someone forced into managing. Please don’t be one of the reasons people burn out hard on tech.
It isn’t a promotion, so you don’t have any status to give up. Do it as long as it makes you happy, and the people around you happy. Then stop. Go back to building things. Wait til you get that itch again.
This morning there was yet another comment thread on hacker news about Yet Another outage involving MongoDB and data loss, this time by some company called “CleverTap”.
To summarize: the CleverTap engineering team noticed that the WiredTiger storage engine was faster than MMAPv1 for MongoDB. They decided to … “upgrade the following weekend” (that sentence alone made my eyes bulge).
According to the blog post, they upgraded from 2.6 to 3.0, while simultaneously changing storage engines from MMAPv1 to WiredTiger, while leaving zero secondaries or snapshot nodes with data on MMAPv1. All over the course of 3 days.
(They are also running sharded mongo, with a mere 300 ops/sec on each primary, which RAISES A LOT OF QUESTIONS but I already feel like I’m beating up on these kids so I won’t pursue that.)
(But seriously what the *hell* can you be doing to have such a low request rate, that you need to shard at an infinitesimal volume? Why did you specify it in req/min instead of req/sec? What is the breakdown of reads/writes? What is the lock percentage? What is the avg object size?? Are these like multi-MB documents???? Why did you pause all incoming traffic and process it after the upgrade? If the primary can’t take the extra load, why not rs.syncFrom() a secondary? If that doesn’t work, don’t you have other, bigger problems??)
Most bafflingly of all: why wait only a few minutes after electing a new WiredTiger primary for the first time ever, and then immediately DELETE your only known-good copies of the data on MMAPv1 and re-sync over them with WiredTiger?
Okay. So here’s the thing: you are clearly a team of accidental DBAs. You are operations and software engineers who have found yourselves in charge of the data.
It’s cool. I am too! It’s a really neat and fun place to be in. DBAs and network admins are kind of the last remaining priesthoods in our industry.
There’s a lot of powerful and fun stuff to be done for generalists who pick up specialty knowledge in one of those areas, or specialists (like my neteng friend Leslie) who start bringing their skills back to the generalist side and merging the two.
(Oh Right, We Wrote A Book About This!!!)
My friend Laine and I are writing a book for people on the data side, called “Database Reliability Engineering“, which is aimed at generalist engineers who want to learn how to deal with data responsibly and effectively.
(Actually that’s a good point, I am supposed to be pitching this book! — which is really mostly Laine with a smidgen of me but it’s going to be super awesome. Consider this your sales pitch.)
So first, as an accidental DBA, you should obviously buy this book :). Second: stateful services require a different mindset[*]. It’s cool that you are running your own databases! But reading post mortems like this where the conclusion is “MongoDB sucks” makes me fucking grind my teeth.
Stop treating your databases like stateless services.
There are lots of ways that MongoDB (and every other database on the planet) really sucks. Mongo set themselves up for special rage by overpromising too much early on, and seeming tone deaf to criticism from real database engineers.
But *I* can criticize Mongo all day long. You children on hacker news who have never run it don’t get to. 😛 If you don’t know what the fuck you’re talking about, if you’re cargo culting other people’s years-old complaints, just shut up already.
Managing stateful services like databases means that you need to be more paranoid than you are with stateless services. With stateless services the best practices are to roll early, roll fast, roll often, roll back. When you’re dealing with state, you need to be careful.
With stateful services you can’t play it fast and loose like that. You’re going to have data loss, corruption, unpredictable results, catastrophic failures that you can’t simply roll back from. Data loss can be ruinous to your company. (This can also be true for stateless services that sit close to your data and mutate it a lot.)
But that’s what makes it fun. 🙂
When we were moving from MMAPv1 to RocksDB at Parse, we ran hybrid replica sets for 6-9 months. We were paranoid. It was justified! We spent half a year capturing production workloads and replaying them, electing Rocks primaries and rolling back, and even then keeping snapshots and secondaries of both storage engines for *months*.
This isn’t because MongoDB sucks. It’s the nature of the game, it’s the difference between stateful and stateless services.
Do you know that there was a total query engine rewrite in 2.6? We spent months flushing out tons of crazy bugs. Do you know about the index intersection changes? We helped chase down bugs in those too. (You’re welcome.)
You can’t just go “dudes it’s faster” and jump off a cliff. This shit is basic. Test real production workloads. Have a rollback plan. (Not for *10 days* … try a month or two.)
If CleverTap had run their plan past anyone experienced with data, they would have called out all of those completely predictable failures, and advised them to change it:
Make one change at a time. Do a major version upgrade separately from the storage engine upgrade.
Delay between each change. Two weeks is the absolute minimum; anything less is careless. Let them bake.
Storage engine changes are scary. It takes years to gain confidence in a new way of laying bits down on disk. (Whenever people bitch and moan about mongo, I remind them that I’ve still lost WAY more data to MyISAM, InnoDB, and MySQL overall than Mongo.)
You can run lots and lots of replicas per replica set in Mongo (up to 7 votes, and even more nodes). This is a killer feature. Why didn’t you use it?
Keep backups around for months in the new storage engine *and* the old storage engine, just in case. Have two hidden snapshot nodes. The only cost is in dollars, which is fucking cheap compared to data or engineering time.
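If you want to turn that checklist into something a machine can nag you about, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Step` class, the `review_plan` function, and the change labels are made up, and the 60-day backup threshold is just my reading of “months”. It is not anyone’s real tooling and it does not talk to a database; it only checks a written-down plan against the rules above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

# Thresholds taken from the checklist above. The 60-day figure is an
# assumption standing in for "keep backups around for months".
MIN_BAKE = timedelta(days=14)
MIN_BACKUP_RETENTION = timedelta(days=60)
MIN_HIDDEN_SNAPSHOT_NODES = 2


@dataclass(frozen=True)
class Step:
    """One step of a (hypothetical) upgrade plan."""
    name: str
    start: date
    changes: tuple[str, ...]  # e.g. ("major_version",) or ("storage_engine",)


def review_plan(
    steps: list[Step],
    backup_retention: timedelta,
    hidden_snapshot_nodes: int,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means no rule was broken."""
    problems: list[str] = []
    ordered = sorted(steps, key=lambda s: s.start)

    # Rule: make one change at a time.
    for step in ordered:
        if len(step.changes) != 1:
            problems.append(
                f"{step.name}: bundles {len(step.changes)} changes, make one change at a time"
            )

    # Rule: let each change bake before starting the next one.
    for earlier, later in zip(ordered, ordered[1:]):
        bake = later.start - earlier.start
        if bake < MIN_BAKE:
            problems.append(
                f"{later.name}: only {bake.days} days after {earlier.name}, "
                f"bake for at least {MIN_BAKE.days}"
            )

    # Rule: keep backups around for months, with two hidden snapshot nodes.
    if backup_retention < MIN_BACKUP_RETENTION:
        problems.append(
            f"backups kept {backup_retention.days} days, keep at least {MIN_BACKUP_RETENTION.days}"
        )
    if hidden_snapshot_nodes < MIN_HIDDEN_SNAPSHOT_NODES:
        problems.append(
            f"{hidden_snapshot_nodes} hidden snapshot nodes, run at least {MIN_HIDDEN_SNAPSHOT_NODES}"
        )
    return problems


if __name__ == "__main__":
    plan = [
        Step("upgrade everything", date(2016, 1, 1), ("major_version", "storage_engine")),
        Step("cut over", date(2016, 1, 11), ("promote_primary",)),
    ]
    for problem in review_plan(plan, timedelta(days=10), hidden_snapshot_nodes=0):
        print("-", problem)
```

The point of the sketch is not the code, it is that every rule is cheap to check before you touch production. The expensive part is the stuff a script can’t do for you: replaying real workloads and actually waiting out the bake time.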
If you are a new accidental DBA, you have to make a point of learning things. Go to conferences. Read books. Buy bottles of whiskey for your data friends and pick their brains. Remember that they know things you do not. Don’t blame the vendors when you fucked up.
Network engineering is the same way, but mistakes tend to be a lot less … permanent. You drop some packets.. like grains of sand. ^_^
Remember that you’re in charge of keeping people’s data safe and secure. You have much to learn. Learn it.
And get off my fucking lawn. ❤
Some slides from a couple of relevant talks I’ve given on the subject:
[*] P.S.: “Stop treating your stateful services like stateless services” … this is a fact, but it’s not the aspiration. DB folks should all be leaning in to the model of learning to treat our stateful services like stateless services, with the same casual disregard for individual nodes. This is hard, and it’s going to take some time, but it’s clearly where the world is heading and it’s definitely a good thing. 🙂 The learning goes both ways!
Last week was the West Coast Velocity conference. I had a terrific time, and I think it’s the best Velocity I’ve been to yet. I also slipped in quite late, the evening before last, to catch Gareth’s session on DevOps vs SRE.
And it was worth it! Holy crap, this was such a fun barnburner of a talk, with Gareth schizophrenically arguing both for and against the key premise of the talk, which was about “Google Infrastructure for Everyone Else (GIFEE)” and whether SRE is a) the highest, noblest goal that we should all aspire towards, or b) mostly irrelevant to anyone outside the Google confines.
Which Gareth won? Check out the slides and judge for yourself. 🙃
At some point in his talk, though, Gareth tossed out something like “Charity probably already has a blog post on this drafted up somewhere.” And I suddenly remembered: “Fuck! I DO!” It’s been sitting in my Drafts for months, god dammit.
So this is actually a thing I dashed off back in April, after CraftConf. Somebody asked me for my opinion on the internet — always a dangerous proposition — and I went off on a bit of a rant about the differences and similarities between DevOps and SRE, as philosophies and practices.
Time passed and I forgot about it, and then decided it was too stale. I mean who really wants to read a rehash of someone’s tweetstorm from two months ago?
Well Gareth, apparently.
SRE vs DevOps: TWO PHILOSOPHIES ENTER, BOTH ARE PHENOMENALLY SUCCESSFUL AND MUTUALLY DUBIOUS OF ONE ANOTHER
The Google SRE book also has some really fucking obnoxious blurbs. Things like how “ONLY GOOGLE COULD HAVE DONE THIS”, and a whiff of snobbery throughout the book as though they actually believe this (which is far worse if true).
You can’t really blame the poor blurb’ers, but you can certainly look askance at a massive systems engineering org when it seems as though they’ve never heard of DevOps, or considered how it relates to SRE practices, and may even be completely unaware of what the rest of the industry has been up to for the past 10-plus years. It’s just a little weird.
So here, for the record, is what I said about it.
1) a lot of the philosophical volleying between devops / SRE comes down to a failure to recognize the overwhelming power of context.
Google is a great company with lots of terrific engineers, but you can only say they are THE BEST at what they do if you’re defining what they do tautologically, i.e. “they are the best at making Google run.” Etsyans are THE BEST at running Etsy, Chefs are THE BEST at building Chef, because … that’s what they do with their lives.
Context is everything here. People who are THE BEST at Googling often flail and flame out in early startups, and vice versa. People who are THE BEST at early-stage startup engineering are rarely as happy or impactful at large, lumbering, more bureaucratic companies like Google. People who can operate equally well and be equally happy at startups and behemoths are fairly rare.
And large companies tend to get snobby and forget this. They stop hiring for unique strengths and start hiring for lack of weaknesses or “Excellence in Whiteboard Coding Techniques,” and congratulate themselves a lot about being The Best. This becomes harmful when it translates into less innovation, abysmal diversity numbers, and a slow but inexorable drift into dinosaurdom.
2) operations engineering is a specialized skill set *at large scale* or *on hard ops problems*. many -- most? companies don't have those.
Everybody thinks their problems are hard, but to a seasoned engineer, most startup problems are not technically all that hard. They’re tedious, and they are infinite, but anyone can figure this shit out. The hard stuff is the rest of it: the feverish pace, the need to reevaluate and reprioritize and reorient constantly, the total responsibility, the terror and uncertainty of trying to find product/market fit and perform ten jobs at once and personally deliver on your promises to your customers.
At a large company, most of the hardest problems are bureaucratic. You have to come to terms with being a very tiny cog in a very large wheel where the org has a huge vested interest in literally making you as replicable and replaceable as possible. The pace is excruciatingly slow if you’re used to a startup. The autonomy is … well, did I mention the politics? If you want autonomy, you have to master the politics.
3) the outcomes associated with operations (reliability, scalability, operability) are the responsibility of *everyone* from support to CEO.
Everyone. Operational excellence is everyone’s job. Dude, if you have a candidate come in and they’re a jerk to your office manager or your cleaning person, don’t fucking hire that person because having jerks on your team is an operational risk (not to mention, you know, like moral issues and stuff).
But the more engineering-focused your role is, the more direct your impact will be on operational outcomes.
4) therefore, the more literate you are with operational skills, the more effective and powerful you can be -- esp as a software engineer.
As a software engineer, developing strong ops chops makes you powerful. It makes you better at debugging and instrumentation, building resiliency and observability into your own systems and interdependent systems, and building systems that other people can come along and understand and maintain long after you’re gone.
As an operations engineer, those skills are already your bread and butter. You can increase your power in other ways, like by leveling up at software engineering skills like test coverage and automation, or DBA stuff like query optimization and storage engine internals, or by helping the other teams around you level up on their skills (communication and persuasion are chronically underrecognized as core operations engineering skills).
5) specialization is not a bad thing. specialization is how we scale and do capitalism! the problem is when this becomes compartmentalizing.
This doesn’t mean that everyone can or should be able to do everything. (I can’t even SAY the words “full stack engineer” without rolling my eyes.) Generalists are awesome! But past a certain inflection point, specialization is the only way an org can scale.
It’s the only way you make room for those engineering archetypes who only want to dive deep, or who really really love refactoring, or who will save the world then disappear for weeks. Those engineers can be incredibly valuable as part of a team … but they are most valuable in a large org where you have enough generalists to keep the oars rowing along in the meantime.
6) so: Google SRE has an incredibly powerful set of best practices, that enable them to run the largest site in the world incredibly well.
So, back to Google. They’ve done, ahem, rather well for themselves. Made shitbuckets of money, pushed the boundaries of tech, service hardly ever goes down. They have operational demands that most of us never have seen and never will, and their engineers are definitely to be applauded for doing a lot of hard technical and cultural labor to get there.
Mostly because it comes off a little tone deaf in places. I’m not personally pissed off by the google SRE book, actually, just a little bemused at how legitimately unaware they seem to be about … anything else that the industry has been doing over the past 10 years, in terms of cultural transformation, turning sysadmins into better engineers, sharing on-call rotations, developing processes around empathy and cross-functionality, engineering best practices, etc.
If you try and just apply Google SRE principles to your own org according to their prescriptive model, you’re gonna be in for a really, really bad time.
However, it happens that Jen Davis and Katherine Daniels just published a book called Effective DevOps, which covers a lot of the same ground with a much more varied and inclusive approach. And one of the things they return to over and over again is the power of context, and how one-size-fits-all solutions simply don’t exist, just like unisex OSFA t-shirts are a dirty fucking lie.
Google insularity is … a thing. On the one hand it’s great that they’re opening up a bit! On the other hand it’s a little bit like when somebody barges onto a mailing list and starts spouting without skimming any of the archives. And don’t even get me started on what happens when you hire long, longterm ex-Googlers back into the real world.
So, so many of us have had this experience of hiring ex-Googlers who automatically assume that the way Google does a thing is CORRECT, not just contextually appropriate. Not just right for Google, but right for everyone, always. Which is just obviously untrue. But the reassimilation process can be quite long and exhausting when the Kool-Aid is so strong.
8) DevOps as a philosophy is much more sensitive to context than SRE philosophy, because it grew from a broader collaborative base.
Because yeah, this is a conversation and a transformation that the industry has been having for a long time now. Compared with the SRE manifesto, the DevOps philosophy is much more crowd-sourced, more flexible, and adaptable to organizations at all stages of development, with all different requirements and key business differentiators, because it’s benefited all along from loud, mouthy contributors who aren’t all working in the same bubble.
And it’s like Google isn’t even aware this was happening, which is weird.
9) that's it, basically all i'm saying is "all blanket statements are false" including probably this one 🙂 #devops #sre
Orrrrrr, maybe I’m just a wee bit annoyed that I’ve been drawn into this position of having to defend “DevOps”, after many excellent years spent being grumpy about the word and the 10000010101 ways it is used and abused.
(Tell me again about your “DevOps Engineering Team”, I dare you.)
(^^ thanks to @kellan and others who particularly influenced/clarified my thinking around #8, the crowdsourcing of devops)
P.S. I highly encourage you to go read the epic hours-long rant by @matthiasr that kicked off the whole thing. some of which I definitely endorse and some of which not, but I think we could go drink whiskey and yell about this for a week or two easy breezy ❤
So, a few hot takes on that …
1) SRE, as practiced by Google, is really just Ops with a lot of management support