Nodeable made what you might call a major shift in its business model today. The company, which began life as a cloud monitoring service provider for systems administrators, debuted a streaming Big Data analytics service that is based on open source Storm and can be applied to numerous use cases beyond application monitoring.
The new cloud-based service is called StreamReduce and, according to the company, can help clients make sense of (and take action based on) multiple flows of multi-structured data as it hits the system. StreamReduce, and streaming Big Data analytics generally, is a complement to other Big Data approaches, particularly Hadoop, that are not optimized to make sense of data in real time.
Why Do We Need A Dedicated Streaming Big Data Service When We Have Hadoop?
Here’s the crux of the problem. Hadoop has proven itself a reliable platform for crunching and analyzing large volumes of historical, multi-structured data. But the open source Big Data framework was not designed to process and analyze data in real time, and nobody has yet figured out a way to add such capabilities to Hadoop.
Why? Because at its core Hadoop is a batch-and-load-oriented framework. That is, to get data into Hadoop, you gather up all the data you want to crunch and perform a large data dump into HDFS, without the need to normalize or otherwise add a common structure to the data first. You can automate and schedule this process, but at this point in time there’s no practical way to perform streaming analysis via Hadoop.
The two approaches – historical, batch-and-load Big Data analytics and streaming Big Data analytics – have two different goals. The former’s aim is to discover historical patterns and trends in large volumes of multi-structured data that could date back days, weeks or even years. The latter’s goal is to derive as much value as possible from data as it is created, then send it on its way.
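The contrast can be sketched in a few lines of code. This is purely illustrative – not Nodeable’s or Hadoop’s actual implementation – showing the same metric computed both ways: a batch pass over a complete historical dataset versus an incremental update applied to each record as it arrives.

```python
def batch_average(events):
    """Batch style: gather all the data first, then crunch it in one pass."""
    return sum(events) / len(events)

class StreamingAverage:
    """Streaming style: the running answer is updated per event, so a
    result is available the moment each record hits the system."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # current answer, available immediately

events = [3.0, 5.0, 4.0]
stream = StreamingAverage()
running = [stream.update(v) for v in events]  # answers available mid-stream
print(batch_average(events))  # 4.0 -- but only after the full data dump
print(running)                # [3.0, 4.0, 4.0]
```

The batch version gives a single answer only after the entire dataset is loaded; the streaming version trades completeness for an answer on every event, which is the property the article describes.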
Though they have different goals, the two approaches can and should work hand-in-hand. In an ideal scenario, as data is created it passes through a platform like StreamReduce, which performs analytics to detect anomalous or otherwise important events and triggers responsive actions. Once the data passes through the streaming Big Data analytics service, it is sent to a Big Data platform like Hadoop where Data Scientists can pore over the data to uncover further insights.
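That hand-off can be sketched roughly as follows. All the names here (`detect_anomaly`, the alert and archive lists) are made up for illustration; they stand in for a real streaming analytics service and a Hadoop loader, respectively.

```python
def detect_anomaly(event, threshold=100):
    """Stand-in streaming analytic: flag any event over a latency threshold."""
    return event["latency_ms"] > threshold

def process_stream(events, alerts, archive):
    for event in events:             # data is analyzed as it arrives
        if detect_anomaly(event):
            alerts.append(event)     # trigger a responsive action right now
        archive.append(event)        # then send everything on to the batch
                                     # platform for deeper historical analysis

alerts, archive = [], []
stream = [{"host": "web1", "latency_ms": 40},
          {"host": "web2", "latency_ms": 250},
          {"host": "web1", "latency_ms": 35}]
process_stream(stream, alerts, archive)
print(len(alerts), len(archive))  # 1 3
```

The key design point is that the streaming layer acts on each event immediately but discards nothing: every record still lands in the archive for the batch side to mine later.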
Tag Team: Streaming and Historical Big Data Analytics
As a simple example, consider the Twitter fire hose. The fire hose hits the streaming Big Data analytics service in real time, where the data is mined to identify unhappy customers, all within sub-second time frames. This kicks off a series of responses, such as emails offering free services to those ticked-off customers. Once the fire hose passes through the streaming service, some or all of the data is then sent into a queue to be loaded into Hadoop, where at a later date Data Scientists might combine it with other data sources to perform social graph analysis or to correlate social media activity to buying behaviors.
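A toy version of that Twitter scenario might look like the sketch below. A naive keyword filter stands in for real sentiment analysis, and plain lists stand in for the outbound email system and the Hadoop-bound queue; none of this reflects Nodeable’s actual implementation.

```python
UNHAPPY_WORDS = {"awful", "broken", "refund", "cancel"}

def looks_unhappy(tweet):
    """Crude stand-in for sentiment analysis: keyword matching."""
    return any(word in tweet["text"].lower() for word in UNHAPPY_WORDS)

def handle_firehose(tweets, outbox, hadoop_queue):
    for tweet in tweets:
        if looks_unhappy(tweet):
            # responsive action, taken as the tweet streams past
            outbox.append(f"offer free service to @{tweet['user']}")
        hadoop_queue.append(tweet)   # everything queues up for later
                                     # historical analysis in Hadoop

outbox, hadoop_queue = [], []
handle_firehose(
    [{"user": "alice", "text": "This app is awful, I want a refund"},
     {"user": "bob", "text": "Loving the new release"}],
    outbox, hadoop_queue)
print(outbox)            # ['offer free service to @alice']
print(len(hadoop_queue)) # 2
```

Only the unhappy tweet triggers an immediate response, but both tweets flow through to the queue, where the batch side can later join them against purchase histories or the social graph.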
Streaming Big Data Analytics is particularly relevant to industries where the ability to respond faster than the competition – even just a split second faster – can mean the difference between success and failure. Think financial services, energy and utilities, and consumer-facing retail. Streaming Big Data Analytics is also applicable across a number of horizontal use cases, including clickstream analysis, log file analysis and real-time advertising optimization.
Of course, financial services and trading firms have been using complex event processing engines to perform streaming analytics on high velocity but relatively structured data for years. But new approaches were needed to expand streaming analytics to even more data sources and to multi-structured data, which traditional CEP engines are not equipped to handle.
Marz and Storm
That’s why Nathan Marz invented Storm for streaming Big Data Analytics when he worked at BackType. The company was later acquired by Twitter, where Marz put the finishing touches on Storm and open sourced the project. Marz talked about Storm and its evolution on theCUBE at Strata 2012:
Of course, the ideal scenario is one platform that can perform both streaming and historical Big Data Analytics. But we’re not there yet. With the right expertise, however, engineers can architect the two systems – Storm and Hadoop – to work tightly together, as Twitter does. I also expect streaming Big Data analytics vendors like Nodeable and HStreaming to partner closely with Hadoop providers to deliver the two services combined as seamlessly as possible.