The Accidental DBA

This morning there was yet another comment thread on hacker news about Yet Another outage involving MongoDB and data loss, this time by some company called “CleverTap”.

Recap

To summarize: the CleverTap engineering team noticed that the WiredTiger storage engine was faster than MMAPv1 for MongoDB.  They decided to … “upgrade the following weekend” (that sentence alone made my eyes bulge).

According to the blog post, they upgraded from 2.6 to 3.0 while simultaneously changing storage engines from MMAPv1 to WiredTiger, and left zero secondaries or snapshot nodes with data on MMAPv1.  All over the course of 3 days.

(They are also running sharded mongo, with a mere 300 ops/sec on each primary, which RAISES A LOT OF QUESTIONS but I already feel like I’m beating up on these kids so I won’t pursue that.)

Questions …

(But seriously what the *hell* can you be doing to have such a low request rate, that you need to shard at an infinitesimal volume?  Why did you specify it in req/min instead of req/sec?  What is the breakdown of reads/writes?  What is the lock percentage?  What is the avg object size??  Are these like multi-MB documents????  Why did you pause all incoming traffic and process it after the upgrade?  If the primary can’t take the extra load, why not rs.syncFrom() a secondary?   If that doesn’t work, don’t you have other, bigger problems??)

Most bafflingly of all: why wait only a few minutes after electing a new WiredTiger primary for the first time ever, and then immediately DELETE your only known-good copies of the data on MMAPv1 and re-sync over them with WiredTiger?

Accidental DBAs

Okay.  So here’s the thing: you are clearly a team of accidental DBAs.  You are operations and software engineers who have found yourselves in charge of the data.

It’s cool.  I am too!  It’s a really neat and fun place to be in.  DBAs and network admins are kind of the last remaining priesthoods in our industry.

There’s a lot of powerful and fun stuff to be done for generalists who pick up specialty knowledge in one of those areas, or specialists (like my neteng friend Leslie) who start bringing their skills back to the generalist side and merging the two.

(Oh Right, We Wrote A Book About This!!!)

My friend Laine and I are writing a book for people on the data side, called “Database Reliability Engineering”, which is aimed at generalist engineers who want to learn how to deal with data responsibly and effectively.

(Actually that’s a good point, I am supposed to be pitching this book, which is really mostly Laine with a smidgen of me but it’s going to be super awesome!  Consider this your sales pitch.)

So first, as an accidental DBA, you should obviously buy this book  :).  Second: stateful services require a different mindset[*].  It’s cool that you are running your own databases!  But reading post mortems like this where the conclusion is “MongoDB sucks” makes me fucking grind my teeth.

Stop treating your databases like stateless services.

There are lots of ways that MongoDB (and every other database on the planet) really sucks.  Mongo set themselves up for special rage by overpromising early on, and by seeming tone-deaf to criticism from real database engineers.

But *I* can criticize Mongo all day long.  You children on hacker news who have never run it don’t get to. 😛  If you don’t know what the fuck you’re talking about, if you’re cargo culting other people’s years-old complaints, just shut up already.

Managing stateful services like databases means that you need to be more paranoid than you are with stateless services.  With stateless services the best practices are to roll early, roll fast, roll often, roll back.  When you’re dealing with state, you need to be careful.

With stateful services you can’t play it fast and loose like that.  You’re going to have data loss, corruption, unpredictable results, catastrophic failures that you can’t simply roll back from.  Data loss can be ruinous to your company.  (This can also be true for stateless services that sit close to your data and mutate it a lot.)

But that’s what makes it fun.  🙂

Be paranoid.

When we were moving from MMAPv1 to RocksDB at Parse, we ran hybrid replica sets for 6-9 months.  We were paranoid.  It was justified!  We spent half a year capturing production workloads and replaying them, electing Rocks primaries and rolling back, and even then keeping snapshots and secondaries of both storage engines for *months*.

This isn’t because MongoDB sucks.  It’s the nature of the game, it’s the difference between stateful and stateless services.

Do you know that there was a total query engine rewrite in 2.6?  We spent months flushing out tons of crazy bugs.  Do you know about the index intersection changes?  We helped chase down bugs in those too.  (You’re welcome.)

You can’t just go “dudes it’s faster” and jump off a cliff.  This shit is basic.  Test real production workloads. Have a rollback plan.  (Not for *10 days* … try a month or two.)
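One way to make “have a rollback plan” concrete is a guard that refuses any destructive step unless enough healthy copies on the old storage engine would survive it.  This is only a sketch: `safeToResync` and the member objects are made up for illustration, and it assumes you have already collected each member’s engine name (for example from `db.serverStatus().storageEngine.name`) and decided for yourself what “healthy” means.

```javascript
// Sketch only: `safeToResync` and the member shape are hypothetical.
// `engine` is assumed to be what db.serverStatus().storageEngine.name
// reported on each member; `healthy` is your own judgment call
// (caught up, serving, verified).
function safeToResync(members, targetHost, oldEngine, minOldCopies) {
  // Count the old-engine copies that would remain after wiping targetHost.
  const remaining = members.filter(
    (m) => m.host !== targetHost && m.engine === oldEngine && m.healthy
  );
  return remaining.length >= minOldCopies;
}

// Made-up replica set: one WiredTiger primary, two MMAPv1 secondaries.
const members = [
  { host: "db1:27017", engine: "wiredTiger", healthy: true },
  { host: "db2:27017", engine: "mmapv1", healthy: true },
  { host: "db3:27017", engine: "mmapv1", healthy: true },
];

// Wiping db3 would leave a single MMAPv1 copy.
console.log(safeToResync(members, "db3:27017", "mmapv1", 2)); // false
console.log(safeToResync(members, "db3:27017", "mmapv1", 1)); // true
```

The floor you pick is a judgment call, but it should never be zero while the new engine is still baking.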

Lessons

If CleverTap had run their plan past anyone experienced with data, they would have called out all of those completely predictable failures, and advised them to change it:

  1. Make one change at a time.  Do a major version upgrade separately from the storage engine upgrade.
  2. Delay between each change.  Two weeks is the absolute minimum; anything less is careless.  Let them bake.
  3. Storage engine changes are scary.  It takes years to gain confidence in a new way of laying bits down on disk.  (Whenever people bitch and moan about mongo, I remind them that I’ve still lost WAY more data to MyISAM, InnoDB, and MySQL overall than Mongo.)
  4. You can run lots and lots of replicas (up to 7 votes per replica set, even more nodes) in each replica set in Mongo.  This is a killer feature.  Why didn’t you use it?
  5. Keep backups around for months in the new storage engine *and* the old storage engine, just in case.  Have two hidden snapshot nodes.  The only cost is in dollars, which is fucking cheap compared to data or engineering time.
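
To illustrate points 4 and 5, here is a rough sketch of what adding two hidden snapshot nodes to a replica set config could look like.  `withHiddenSnapshotNodes` is a hypothetical helper and the hostnames are invented; `priority`, `hidden`, and `votes` are real member fields in a replica set config.  Setting `votes: 0` is an assumption on my part, to keep the extra nodes out of the 7-vote budget.

```javascript
// Sketch only: builds a new replica set config with extra hidden members.
// `withHiddenSnapshotNodes` is a hypothetical helper, not a MongoDB API.
function withHiddenSnapshotNodes(config, hosts) {
  // Pick _id values above any existing member so they never collide.
  let nextId = config.members.reduce((max, m) => Math.max(max, m._id), -1) + 1;
  const extra = hosts.map((host) => ({
    _id: nextId++,
    host: host,
    priority: 0,  // can never be elected primary
    hidden: true, // not advertised to clients
    votes: 0,     // assumption: keep them out of the 7-vote budget
  }));
  // Return a copy instead of mutating the config that was passed in.
  return Object.assign({}, config, { members: config.members.concat(extra) });
}

// Example with made-up hostnames. In the mongo shell the input would come
// from rs.conf() and the result would be handed to rs.reconfig().
const current = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "db3.example.net:27017" },
  ],
};
const proposed = withHiddenSnapshotNodes(current, [
  "snap-mmapv1.example.net:27017",
  "snap-wiredtiger.example.net:27017",
]);
console.log(JSON.stringify(proposed.members.slice(3), null, 2));
```

One hidden node per storage engine means a snapshot of either format is always on hand, whichever way the migration goes.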

If you are a new accidental DBA, you have to make a point of learning things.  Go to conferences.  Read books.  Buy bottles of whiskey for your data friends and pick their brains.  Remember that they know things you do not.  Don’t blame the vendors when you fucked up.

Network engineering is the same way, but mistakes tend to be a lot less … permanent.  You drop some packets..  like grains of sand. ^_^

Remember that you’re in charge of keeping people’s data safe and secure.  You have much to learn.  Learn it.

And get off my fucking lawn.  <3

Some slides from a couple of relevant talks I’ve given on the subject:

 

[*] P.S.:  “Stop treating your stateful services like stateless services” … this is a fact, but it’s not the aspiration.  DB folks should all be leaning in to the model of learning to treat our stateful services like stateless services, with the same casual disregard for individual nodes.  This is hard, and it’s going to take some time, but it’s clearly where the world is heading and it’s definitely a good thing.  🙂  The learning goes both ways!

 


18 thoughts on “The Accidental DBA”

  1. Gary Godfrey says:

    Thanks for this post. You’ve made me realize that I fall into the Accidental DBA camp and probably need to start thinking about things a bit differently. So, when is the Rough Cuts version of your book coming out?

  2. Pramod says:

    What a coincidence! I fall into the “accidental” DBA category. I read the clevertrip link and was confused for opting MongoDB! This clears my mind. Great info.. thanks!

  3. Hey mipsytipsy,

    I am the author of the original blog post – https://blog.clevertap.com/sleepless-nights-with-mongodb-wiredtiger-and-our-return-to-mmapv1/ and I handle the dev ops at CleverTap. I wanted to make a few clarifications about the post.

    Firstly, the original article mentioned 18K operations/minute per node. This was a publishing mistake. We actually do 18K operations/second (the graph on the original post has the numbers as reported by MMS). We’ve posted a couple of updates on our blog, in case you’re interested in following this through.

    We do realise that we didn’t bake the storage engine enough, but lesson learned and we’re wiser now.

    1. Cool, glad to hear it. Couple things:

      “All things being equal, WiredTiger should’ve at least had the stability of MMAPv1 considering that it’s the default storage engine option from Mongo 3.2 onwards”

      Nnnnnnope, that’s not how software works. I agree that’s how we would LIKE it to work, but unfortunately in our world the mmapv1 engine has had what, almost 8 years of use in production hammering out bugs? Storage engines are one of the slowest-maturing categories I can think of, second only perhaps to encryption libraries.

      Also, you’re running it in 3.0, where it definitely isn’t the default, and you don’t get to just wave your hand and say it should be the same or better.

      ” – Each time we restarted the data nodes, WiredTiger would work fine for a couple of hours and then freeze. On digging deeper we noticed that it would freeze when the cache reached 95% capacity. So clearly there’s some cache eviction issue at play here”

      Just guessing here, but it sounds like you left your production system optimized for the old storage engine in ways that compounded the problem. Yep, their shit shouldn’t have crashed! But if you had had a couple of healthy secondaries around running the old stable dataset for a while, it never would have impacted you.

      If you decide to learn just one thing, honestly, I would hope it would be that. 🙂 Next time take advantage of the lovely fact that you can just *add new nodes* without decommissioning the stable ones for a nice, healthy long time. Cheers!

  4. Anthony Atkinson says:

    I know nothing after reading this. Please don’t misunderstand, that’s not a burn (toward you). I think that the fact that I now know that I know nothing was the point, too (I hope). I feel as if I knew things about databases before I read this. Now I’m not so sure. Good job, and as a plus, the pitch for the book subliminally made me make a reminder to order it. Smooth. Great write-up.
