Buffer Using MongoDB for Real-Time Analytics Without MapReduce

In a blog post, Tom Moor – co-founder and chief hacker at social media post scheduling app Buffer – describes how his company uses the NoSQL database MongoDB to gain real-time insight into internal analytics without noticeably affecting front-end performance…and without MapReduce.

While Buffer uses common solutions like Kissmetrics and Google Analytics for many metrics, it needs to keep track of custom benchmarks, too, for things like Firefox extension usage or API call endpoints.

Using MongoDB’s atomic ‘$inc’ updates and so-called unsafe saves, Moor writes, Buffer gets lightning-quick access to real-time analytics. The trick is that the driver doesn’t wait for confirmation that a save succeeded before continuing, so recording a statistic doesn’t hold up the request that triggered it. Because statistics are stored in per-minute, hourly or weekly batches, data growth stays limited and reads on the admin side stay fast.
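To make the batching idea concrete, here is a minimal Python sketch of how a counter update might be built. It is not Moor’s code: the function names (`bucket_start`, `build_increment`) and the field names (`metric`, `bucket`, `count`) are hypothetical, chosen only for illustration. The sketch truncates a timestamp to the start of its batch and builds the filter and `$inc` update documents that an upsert would use.

```python
from datetime import datetime, timedelta, timezone


def bucket_start(ts, granularity):
    """Truncate a timestamp to the start of its minute, hour, or week batch (UTC)."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "week":
        # Weeks start on Monday (weekday() == 0).
        monday = ts - timedelta(days=ts.weekday())
        return monday.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity!r}")


def build_increment(metric, ts, granularity="hour", amount=1):
    """Return the (filter, update) pair for an upsert that bumps one counter.

    Field names here are hypothetical, not taken from Buffer's schema.
    """
    filter_doc = {"metric": metric, "bucket": bucket_start(ts, granularity)}
    update_doc = {"$inc": {"count": amount}}  # $inc is applied atomically by the server
    return filter_doc, update_doc


# Example: one Firefox-extension event recorded into its hourly batch.
event_time = datetime(2012, 6, 13, 15, 42, 7, tzinfo=timezone.utc)
filter_doc, update_doc = build_increment("firefox_extension", event_time)

# With a driver such as PyMongo, the pair could be sent as an unacknowledged
# ("unsafe") upsert, assuming a collection object named `stats`:
#   stats.with_options(write_concern=WriteConcern(w=0)).update_one(
#       filter_doc, update_doc, upsert=True)
```

Because every event in the same hour maps to the same filter, the collection gains at most one document per metric per hour, no matter how many events arrive.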

For code snippets and implementation advice, I highly recommend checking out Moor’s blog. But for my money, the real news here is that using MongoDB this way means just “168 documents created per collection, per week for data recorded hourly” (24 hours × 7 days). That’s a small enough data set that Buffer can actually do the analytics in code, without requiring MapReduce or any of the solutions that implement it. The whole idea was inspired by a two-year-old official MongoDB blog post.
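As a rough illustration of why 168 hourly documents can be handled in application code, the Python sketch below rolls one week of hourly counters up into daily totals with a plain loop. The document shape (`bucket`, `count`) and the function name `daily_totals` are assumptions made for this example, and the data is synthetic rather than anything from Buffer.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOURS_PER_WEEK = 24 * 7  # 168 hourly documents per collection, per week


def daily_totals(hourly_docs):
    """Sum hourly counters into per-day totals using an ordinary loop."""
    totals = defaultdict(int)
    for doc in hourly_docs:
        totals[doc["bucket"].date()] += doc["count"]
    return dict(totals)


# Synthetic stand-in for one week of hourly documents read from a collection.
week_start = datetime(2012, 6, 11, tzinfo=timezone.utc)
docs = [
    {"bucket": week_start + timedelta(hours=h), "count": 1}
    for h in range(HOURS_PER_WEEK)
]

per_day = daily_totals(docs)
weekly_total = sum(per_day.values())
```

At this size the whole week fits comfortably in memory, so a simple pass over the documents is enough and no server-side aggregation job is needed.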

The flip side of this bare-bones method is that data visualization (by way of Flot, Moor confirms in the comments) is lacking. Right now, Moor writes, most of their charts are simple line graphs, even in situations where pie or bar charts would be more appropriate.

Of course, MongoDB isn’t the only option for this kind of real-time analytics implementation. We saw plenty of developers doing something similar with HBase at HBaseCon 2012 in May. Others use Redis bitmaps to achieve the same result.