Yesterday I wrote about Cascading 2.0, an alternative to MapReduce. The application framework, managed by Concurrent, lets developers build “Cascading” big data apps using high-level scripting languages. The apps are then scheduled to run across a Hadoop cluster.
Also yesterday, HP executives presented their case for integrating Hadoop with Autonomy and HP Vertica, its impressive analytics technology.
In the news from both HP and Concurrent, executives often referred to “aggregation” as a priority in developing big data systems. It’s becoming clear why. Aggregation represents the next phase on the road to data as a service.
HP executives described how customers now talk about “data lakes,” where all data flows for analysis. With Autonomy, the data feeds into its analysis for filtering and is then distributed to a Hadoop cluster.
I asked Autonomy Promote’s chief executive Rafiq Mohammadi how the integration might fit with Cascading 2.0. He said it’s not an either-or situation. It’s simply an aggregation that could be executed through a REST-based API.
“Our entire strategy is to aggregate logic,” he said.
AWS: The Mega Aggregator
The integration of the Autonomy Intelligent Data Operating Layer (IDOL) into Hadoop is similar to the way Amazon Web Services (AWS) aggregates data for customers to shape into apps. That aggregation serves as the value behind any number of data services.
It also accounts for the success AWS has had with customers in the business of data. Customers can program apps through platform-as-a-service (PaaS) and run them on AWS Hadoop clusters. Flightcaster did this and made its name with its accurate flight forecasting. Today, Cascading 2.0 makes it easier to develop apps with aggregated data. Thousands more data services will emerge as automation speeds up access to aggregated data.
Advances in automation and app development for deployment on Hadoop clusters signal the coming trend in data-as-a-service. PaaS environments and big data frameworks will serve as the foundation for automating how applications access aggregated data resources.
It’s inevitable. The analytics tools are getting better and the frameworks are far simpler to set up.
But the next step is aggregation. Once that is achieved, data can be shaped and used for competitive advantage.