Zero Downtime Just Isn’t Worth It

Nov 2, 2016 | StartUp

When I describe Wrenchmode to developers, often the first reaction I get is: “But zero-downtime isn’t really that hard…” Unfortunately, I think that view misses the point of Wrenchmode. It’s not that zero-downtime is conceptually hard to achieve, just that it takes time, which is your most valuable resource, especially with small teams or just-launched products.

To illustrate why zero-downtime often isn’t worth the effort, let’s look at a well-understood example: Renaming a column in the database.

Zero Downtime Column Rename

The process for renaming a column is pretty straightforward. It’s been covered in many ways in many places, and I’ll be the first to admit that it’s “not hard”. However, it’s tedious and time-consuming, and every step is an opportunity to introduce errors or bugs.

Here’s the basic procedure for renaming name to title in one of your tables:

  1. Add the title column to your database schema.
  2. Update your code to write to both the title and name columns.
  3. Deploy your schema and code.
  4. Write a script or SQL statement that syncs all the name values over to the title column. Make sure you don’t overwrite data that someone is currently editing.
  5. Run your syncing script in production. Chew your fingernails while you pray that it works correctly.
  6. Update your code to read from and write to only the title column.
  7. Deploy your code.
  8. Update your schema to drop the name column.
  9. Deploy your schema updates.
Total downtime: 00:00:00*

* If you get any of those steps wrong, or there’s a bug anywhere, you could have significant downtime while you fix and repair things…or worse, you may lose data that your users have entered. Trust me: Users hate data loss waaaaay more than downtime.

Total coding time: 06:00:00

And even that estimate assumes you’re playing a little fast-and-loose. If you’re properly testing everything, it could take significantly longer.

Total deploys: 3
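
To make step 4 a little more concrete, here is a minimal sketch of what a sync script might look like. It is only an illustration under some loud assumptions: the table is called posts (a made-up name), it has an integer id primary key, and the database is SQLite accessed through Python’s built-in sqlite3 module so the example is self-contained. The script copies name into title only where title is still empty, so anything the dual-writing code from step 2 has already saved is left alone, and it works in small batches so a crash halfway through can be recovered by simply running it again.

```python
import sqlite3


def backfill_title(conn, batch_size=1000):
    """Copy name into title for rows whose title is still NULL.

    Rows that already have a title (for example, written by the
    dual-writing code from step 2) are never touched, so re-running
    the script after a crash is safe. Returns the number of rows copied.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE posts SET title = name "
            "WHERE id IN (SELECT id FROM posts "
            "WHERE title IS NULL AND name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit each batch so progress survives a crash
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Demo against a throwaway in-memory database.
# "posts" and its columns are hypothetical names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO posts (name, title) VALUES (?, ?)",
    [("old a", None), ("old b", None), ("old c", "new c")],
)
conn.commit()

copied = backfill_title(conn, batch_size=1)
print(copied)
```

Even this tiny version has to make decisions (what counts as “not yet synced”, how big a batch is, what happens on a re-run), and each one needs testing against production-like data before you trust it.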

Maintenance Mode Column Rename

Now let’s look at what it would take to just drop into maintenance mode for the rename.

  1. Write a migration that renames the name column to title with an ALTER TABLE statement.
  2. Update your code to change name to title everywhere. Tedious, but not too hard.
  3. Put your website into maintenance mode.
  4. Deploy your code and schema changes.
  5. Bring it out of maintenance mode.
Total downtime: 00:10:00

This assumes that your database is fairly small, perhaps 100k rows.

Total coding time: 00:45:00
Total deploys: 1
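
For comparison, the migration in step 1 can be as small as a single statement. The sketch below again assumes a hypothetical posts table and uses SQLite through Python’s sqlite3 module purely so it can run on its own; SQLite needs version 3.25 or newer for RENAME COLUMN, and the exact ALTER TABLE syntax differs between databases, so check your own database’s documentation.

```python
import sqlite3


def rename_name_to_title(conn):
    """The whole migration: rename the column in place."""
    conn.execute("ALTER TABLE posts RENAME COLUMN name TO title")
    conn.commit()


# Demo against a throwaway in-memory database.
# "posts" is a hypothetical table name for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO posts (name) VALUES ('hello')")
conn.commit()

rename_name_to_title(conn)

# Column names after the migration, read from SQLite's table metadata.
columns = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]
print(columns)
```

There is no copy step and no window in which two columns can disagree, which is where most of the time savings come from.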

Analysis

Dropping into maintenance mode costs you 10 minutes of downtime, but saves you more than 5 hours of developer time (6 hours of coding versus 45 minutes). And that’s a lower bound, because hidden in the zero-downtime cost is the risk of things going wrong. ALTER TABLE statements are rock solid. Hastily written developer scripts to copy name to title are not nearly as reliable. What happens if the script crashes out halfway through? Are you sure it won’t overwrite user input? If you’ve been at this long enough, you know that all sorts of things can go wrong in these simple renames.

Zero Downtime, but at What Cost?

All this is to highlight the fact that zero-downtime, while conceptually easy, requires a lot of work to put into practice. Is it worth it? Well, that depends on your size and userbase. If your company makes $1,000,000 per minute, then heck yes, zero-downtime is worth it.
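
If you want a rough number for “it depends”, here is a back-of-the-envelope sketch. The time figures are the estimates from this post; the developer cost per hour is a made-up input you would replace with your own, and the model deliberately ignores everything except labor versus lost revenue (no risk of data loss, no goodwill), so treat the result as a conversation starter rather than an answer.

```python
from datetime import timedelta

# Estimates taken from the two walkthroughs in this post.
ZERO_DOWNTIME_CODING = timedelta(hours=6)
MAINTENANCE_CODING = timedelta(minutes=45)
MAINTENANCE_DOWNTIME = timedelta(minutes=10)


def developer_time_saved():
    """Coding time saved by choosing maintenance mode."""
    return ZERO_DOWNTIME_CODING - MAINTENANCE_CODING


def break_even_revenue_per_minute(developer_cost_per_hour):
    """Revenue per minute at which the downtime costs as much as
    the developer time that maintenance mode saves.

    developer_cost_per_hour is a hypothetical figure you supply.
    """
    saved_hours = developer_time_saved().total_seconds() / 3600
    downtime_minutes = MAINTENANCE_DOWNTIME.total_seconds() / 60
    return saved_hours * developer_cost_per_hour / downtime_minutes


print(developer_time_saved())
print(break_even_revenue_per_minute(100))  # 100 per hour is an assumption
```

Under these assumptions, if your site earns less per minute than the break-even figure, the maintenance window is the cheaper option on labor alone.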

But if you’re a brand-new startup with only a handful of users, or even a more mature company with a fair amount of traffic, I’d say that the safety of a simple ALTER TABLE statement, combined with the developer time savings, makes zero-downtime a much less useful investment of your time. Sure, it gives you some bragging rights, but the average user really doesn’t care. As long as they know what’s going on and when you’ll be back (*ahem* Check out wrenchmode.com *ahem*), they will wait patiently.

Wrenchmode to the Rescue

So what’s my ulterior motive here? To get you to go install and use Wrenchmode, my handy little (free) tool for easily putting up a beautiful maintenance page! I think it’s a good idea to warn your users about downtime, but there’s no need for it to be ugly or for you to waste your time creating the perfect page. Wrenchmode gets you a maintenance page in about 5 minutes, or even less if you’re on Heroku. So when the time comes, just toss up the maintenance page for ten minutes or so rather than stressing for hours and hours about zero-downtime.