Spikability – An Application’s Ability to Handle Unknown and/or Inconsistent Load

[Image: Neo handling load]

I’ve been in many conversations recently about how an application can handle spiky load, and more importantly, how to handle it without a bunch of resources that sit idle most of the time just to absorb the spikes. There are plenty of applications for which this is a typical use case, for instance:

  • A blogging application may experience spikes whenever a new post goes out.
  • A daily deal site like Groupon or One Kings Lane experiences HUGE spikes whenever a new product is posted. One Kings Lane has the benefit of knowing exactly when it will happen, though: 8am every morning.

Let’s use the daily deal site for our examples because it is a well-known problem for them.

Solution #1: Have More Resources than You’ll Ever Need

One way to solve this problem is to always have excess capacity waiting for the spikes. If you can estimate the maximum amount of simultaneous traffic you’ll see on any given day, then you can just keep enough servers running to handle that maximum load. With this solution, you’ll always want extra capacity above your maximum just in case, because today might be your new maximum.
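To make the waste concrete, here is a back-of-the-envelope capacity calculation. All of the numbers are illustrative assumptions, not measurements from any real site:

```python
import math

# Hypothetical numbers for a daily deal site -- assumptions for illustration.
peak_requests_per_sec = 5000   # estimated worst-case spike
requests_per_server = 300      # what one app server can sustain
headroom = 1.25                # 25% extra, since today might be a new maximum

servers_needed = math.ceil(peak_requests_per_sec * headroom / requests_per_server)
print(servers_needed)  # 21 servers running all day to cover a spike lasting minutes
```

If the baseline load only needs two or three servers, the other eighteen are the idle capacity this post is trying to avoid paying for.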

Solution #2: Disable Features During High Loads

When traffic spikes and your current resources can’t handle it, disable features or fall back to a “lightweight” version of your application. Groupon had to do this and Google suggests it. This works because it effectively reduces your load. But you’ve also stripped away a large portion of your application, so this is not the best option.
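The pattern amounts to a load-based feature toggle. A minimal sketch, where the feature names and the load threshold are hypothetical and the load number would come from your monitoring system:

```python
# Expensive extras we are willing to shed under load; the core purchase
# flow is never in this set. Feature names are made up for illustration.
HEAVY_FEATURES = {"recommendations", "live_comments", "related_deals"}
LOAD_THRESHOLD = 8.0  # 1-minute load average that triggers lightweight mode


def enabled_features(all_features, load_1min):
    """Return the features to render, shedding heavy ones when load is high."""
    if load_1min > LOAD_THRESHOLD:
        return all_features - HEAVY_FEATURES
    return all_features
```

Under normal load everything renders; during a spike the page quietly drops the expensive widgets while checkout keeps working.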

Solution #3: Auto Scaling

When traffic spikes, quickly and automatically launch new servers to handle it. Sounds great! But easier said than done. First off, it takes a lot of effort to set up an auto scaling system. Second, it has to be able to spin up those extra resources very fast; too slow and your system may have already gone down. There’s also a bit of waste until your auto scaler decides it’s time to tear down those extra instances. And if your spikes occur at random times, you end up scaling up and down all the time.
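The heart of any auto scaler is the decision of how many servers should be running right now; the launching and terminating is provider-specific plumbing. A sketch of that decision as a pure function, with the throughput numbers and clamp values invented for illustration:

```python
import math

def desired_servers(current_rps, rps_per_server, min_servers=2, max_servers=50):
    """Decide how many servers should be running for the current request rate.

    Clamped on both ends so a bad metric reading can never scale the
    fleet to zero or to something unaffordable.
    """
    needed = math.ceil(current_rps / rps_per_server)
    return max(min_servers, min(needed, max_servers))
```

An outer loop would call this every minute or so and launch or terminate instances to match; keeping the decision pure makes it trivial to test without touching any cloud API.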

Solution #4: Use Message Queues

Queues are made for dealing with spiky load. When load is light, the queues are always empty and everything works as normal. When load is heavy, the queues start to fill up and you process the load as fast as you can based on the resources you have running. And you can launch more servers to eat away at the queues if they keep growing.
The big benefit here is that you don’t waste resources and money (#1) and you don’t have to disable any features of your site/application (#2).
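A minimal sketch of the pattern using Python’s standard library; a production setup would point the same producer/worker shape at a hosted queue like IronMQ rather than an in-process `queue.Queue`, and `process` is a placeholder for the real work:

```python
import queue
import threading

orders = queue.Queue()   # fills up during a spike, drains afterwards
processed = []

def process(order):
    # Placeholder for the real work (charging the card, recording the sale).
    processed.append(order)  # list.append is atomic in CPython

def worker():
    while True:
        order = orders.get()
        if order is None:        # sentinel tells the worker to shut down
            break
        process(order)
        orders.task_done()

# During a spike, the web tier just enqueues and returns immediately:
for order_id in range(10_000):
    orders.put(order_id)

# A fixed pool of workers drains the backlog at whatever pace the
# hardware allows; if the queue keeps growing, start more workers.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
orders.join()                    # block until every order is processed
for _ in threads:
    orders.put(None)             # one sentinel per worker
for t in threads:
    t.join()
```

The spike never overwhelms anything: the web tier’s only job during the burst is a cheap enqueue, and the workers absorb the backlog afterwards at a steady rate.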


Message queues are a simple and very effective way of dealing with spiky behavior, and there are a lot of message queue options out there to choose from. IronMQ is a cloud-based message queue that you can use with very little effort and that will scale with you. And if you want on-demand workers to work through your queues, take a look at IronWorker for massively scalable task processing.
  • mxx (http://www.blogger.com/profile/10770433415124655073)

    If users expect a result in real time, a message queue is not a solution.
    Additionally, to use your example of a daily deal site: more often than not such sites have a fixed inventory. You can’t have a message queue process your orders; you’ll keep selling those offers while your whole inventory is still sitting in the queue.

  • Seun Osewa (http://www.blogger.com/profile/11257184260827538727)

    I think caching may be the most important strategy for coping with traffic spikes on most websites. Traffic spikes tend to occur when a specific page on your site is linked from a popular site like reddit or the yahoo home page. If caching is implemented correctly, such hot pages will be served entirely from your cache, and your server load won’t be affected much.

  • Travis Reeder (http://www.blogger.com/profile/01398330633165910535)

    Caching can definitely help by essentially reducing load on your app/database, but a cache can’t perform an action. For retrieving data that is used often or takes time/resources to generate, a cache is a wonderful thing. For everything else your app needs to do, you need something else, like a message queue.