How to Reduce a 9 Hour Job into a 9 Minute Job

A common problem developers face is how to run through a large number of tasks in a short amount of time: for instance, sending a nightly notification to all your users, calculating your users’ bills every month, or updating their Facebook graph data. Almost every application needs to do something like this, and it becomes more of a problem as your user base grows.

I was just helping a customer today who was running Heroku workers at full capacity (24 workers), and it was taking ~9 hours per day to go through his database of 200,000 users and send out notifications. A quick calculation explains why:

((200,000 users * 3.5 seconds per task) / 3600 seconds/hour) / 24 worker processes = 8.1 hours

We took it down to less than 9 minutes. Here’s how:

The easy (naive?) way would be to queue up 200,000 tasks on IronWorker. That would have worked, but queuing that many tasks takes a while, and the setup/teardown time of each task is wasteful when the task itself runs for only 3.5 seconds. Instead, we had each task process 100 users, which brings it down to 2,000 tasks. Each task should take approximately 3.5 * 100 = 350 seconds = ~6 minutes to run, and we can run them all at pretty much the same time, but I’ll be conservative and adjust a bit for IronWorker capacity.

((200,000 users * 3.5 seconds) / 3600 seconds/hour) / 2000 worker processes * 1.5 capacity adjustment = 0.146 hours = 8.75 minutes
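Both estimates come from the same formula, so they are easy to check in a few lines of Ruby. This is only a sketch of the arithmetic above; the method name `estimated_hours` and its parameter names are made up for illustration.

```ruby
# Estimated wall-clock hours to process every user when the work is
# spread evenly across a number of parallel worker processes.
# `adjustment` is a fudge factor for imperfect parallelism (1.0 = none).
def estimated_hours(users:, seconds_per_user:, workers:, adjustment: 1.0)
  total_hours = users * seconds_per_user / 3600.0
  total_hours / workers * adjustment
end

# 24 Heroku workers, one user per task.
heroku_hours = estimated_hours(users: 200_000, seconds_per_user: 3.5, workers: 24)

# 2000 tasks running in parallel, with the 1.5 capacity adjustment.
iron_hours = estimated_hours(users: 200_000, seconds_per_user: 3.5,
                             workers: 2000, adjustment: 1.5)

puts format("24 workers: %.1f hours", heroku_hours)
puts format("2000 tasks: %.2f minutes", iron_hours * 60)
```

Plugging in your own user count and per-user run time gives a quick first estimate before you change anything.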

Batching up the tasks into batches of 100 was easy. Here’s sample code:
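A minimal sketch of the batching idea in plain Ruby, not the customer’s actual code: the method name `queue_batches` and the payload shape are hypothetical, and the block passed in stands for whatever call really queues a task. It is not IronWorker’s client API.

```ruby
BATCH_SIZE = 100

# Split user ids into batches and hand each batch to the given block,
# which stands in for the real "queue one task" call (an assumption,
# not IronWorker's client API). Returns the number of tasks queued.
def queue_batches(user_ids, batch_size = BATCH_SIZE)
  count = 0
  user_ids.each_slice(batch_size) do |batch|
    yield({ user_ids: batch }) # payload for one task
    count += 1
  end
  count
end

# Stand-in queue for the example: collect the payloads in an array.
payloads = []
task_count = queue_batches((1..200_001).to_a) { |payload| payloads << payload }

last_size = payloads.last[:user_ids].size
puts "queued #{task_count} tasks; the last task covers #{last_size} user(s)"
```

Because `each_slice` yields a shorter final slice, a leftover tail (here, the 1 extra user out of 200,001) still gets its own task.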

IronWorker gives you super simple access to huge compute power and you don’t even need to think about a server. This customer will never have to worry about scaling this part of his application again.

  • ismael

    By my calculations, this approach saves them about $340 in workers. Is that right?

  • Travis Reeder

    Hi Ismael, could you share your calculation?

    While it’s running, the cost would be the same, but with IronWorker, you would only pay for the time the workers are actually running, no more, no less. With Heroku you’d have to pay for the 9 hours of time and then be sure to turn off all your workers every day after they were done or pay $827 per month to keep them running.

    IronWorker would be ~$10 per nightly run (2000 * 6 / 60 * $0.05)

  • Sunny Gleason

    Are there always a multiple of 100 users? If not, it seems like this code might be missing the part to enqueue the final tail remainder…

  • Travis Reeder

    Hi Sunny,

    100 users in the example above is arbitrary. It’s your code and your choice to do it however you want; you could do 1 user per task or 1000 per task if you wanted.

    The final task would be whatever is left over. If you had 200,001 users, then the final task would just do the work for 1 user. Again, it’s your code and you create the payload for each task so it would be totally up to you.

  • IPDb Developers

    Can you provide some insight as to how to choose batch size for each IronWorker? That is – if my task has a set up time of X and a run time of Y, how many tasks should I send to each IronWorker?

  • Travis Reeder

    We recommend a run time of at least 30 seconds for each task, so that would be a good starting point. Really, the more you can do in a single task, the more efficient it is, because you amortize the setup/teardown time of your worker (loading, making db connections, etc.). But at the same time, there is value in having short, quick tasks. For instance, if a task errors out, it’s easier to debug and retry it. Also, you don’t have to wait forever to ensure that it worked (or didn’t).

    So as a rule of thumb, greater than 30 seconds, but less than 5 minutes is a good place to be.

  • Trevor

    You could use `find_in_batches` and avoid your custom batching:

  • Unknown

    Wait a minute, that is 8.75 minutes per worker process. So if you have 2000 worker processes in parallel, IronWorker will charge you 8.75 * 2000 * 0.075 / 60 = $21.88


  • Travis Reeder

    Hi Unknown, worker hours are billed at $0.05 per hour, not $0.075 (that’s the overage price if you exceed your plan limits), and the actual running time of each task was ~6 minutes (“Each task should take approximately 3.5 * 100 = 350 seconds = ~6 minutes to run”), so it would be:

    2000 * 6 / 60 * $0.05 = $10