I’ve written before that big data necessitates DevOps by creating infrastructural challenges that require improved collaboration between development and operations and increased reliance on automation.
But big data can also enable DevOps teams by providing more actionable intelligence about applications and infrastructure. Here’s a look at a few tools at the intersection of big data and DevOps – tools that make it easier for operations teams to handle rapidly growing data sets, harness operational data to improve efficiency, or both.
Splunk

The founders of Splunk refer to it as an “operational intelligence” tool. One of the primary sources of big data is log data, which is especially useful to developers and operations teams. Since 2002, Splunk has made it easier to search log data archives and make use of the information they contain. But Splunk can extract intelligence from other forms of machine data as well, from Twitter streams to the output of medical monitoring devices. That means Splunk could be useful to DevOps teams both for improving operations and for handling big data sources from outside the IT department.
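To give a feel for the kind of searching Splunk enables, here is a sketch of a query in Splunk’s search language. The index and field names (web_logs, status, host, uri) are hypothetical; they assume fields that have already been extracted from web server logs:

```
index=web_logs status>=500
| stats count by host, uri
| sort -count
```

This would tally server errors by host and URL path and list the noisiest offenders first – the kind of question that is tedious to answer by grepping raw log files across a fleet.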
See also: 8+ Alternatives to Splunk.
Boundary

Boundary is an application monitoring tool designed specifically for the big data era. By focusing on network data, it can spot trouble in an application before it fails. It’s a way of using big data, including data collected from other DevOps tools like Puppet, to manage big data.
See also: Our profile of Boundary.
Amazon Web Services CloudFormation
VMware’s Dave McCrory has been talking lately about the idea of “data gravity” – the idea that as data accumulates in one place, it attracts more data and applications to that place. It’s not a matter of lock-in per se; it’s that it’s easier and faster to manage applications and data stored in the same location than it is to have data and applications spread across multiple locations.
With so much data ending up in Amazon’s cloud, you might want some tools that help automate virtual big data infrastructure on Amazon Web Services. That’s where CloudFormation comes in. CloudFormation is an AWS orchestration tool that allows developers and sysadmins to spin up and manage various services, including Elastic Beanstalk, Relational Database Service and ElastiCache clusters. Although this tool won’t help you with Elastic MapReduce, Amazon’s hosted Hadoop service, if you plan to do big data on the Amazon cloud, you’ll likely find yourself placing other applications in that cloud as well.
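A CloudFormation template is a JSON document describing the resources you want as a single stack. Here is a minimal sketch; the resource names, instance sizes and credentials are illustrative only, and a real template would need security groups and sensible passwords:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Sketch: a cache cluster and a database for a hypothetical app",
  "Resources": {
    "AppCache": {
      "Type": "AWS::ElastiCache::CacheCluster",
      "Properties": {
        "CacheNodeType": "cache.m1.small",
        "Engine": "memcached",
        "NumCacheNodes": "1"
      }
    },
    "AppDatabase": {
      "Type": "AWS::RDS::DBInstance",
      "Properties": {
        "AllocatedStorage": "5",
        "DBInstanceClass": "db.m1.small",
        "Engine": "MySQL",
        "MasterUsername": "admin",
        "MasterUserPassword": "changeme"
      }
    }
  }
}
```

Handing this file to CloudFormation brings the whole stack up together, and deleting the stack tears all of it down – which is the orchestration win over provisioning each service by hand.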
See also: Infrastructure automation tools.
Chef

One of the other major tools in infrastructure automation is Chef, an open source project commercialized by Opscode, the company co-founded by former Amazon.com employee Jesse Robbins.
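Chef expresses configuration as Ruby recipes that declare the desired state of a node. A minimal sketch, using the NTP daemon purely as an example:

```ruby
# Sketch of a Chef recipe: install and run the NTP daemon.
package 'ntp'

# Keep the configuration file under Chef's control and restart
# the service whenever the rendered file changes.
template '/etc/ntp.conf' do
  source 'ntp.conf.erb'
  notifies :restart, 'service[ntp]'
end

# Make sure the service starts at boot and is running now.
service 'ntp' do
  action [:enable, :start]
end
```

Because Chef converges each node toward this declared state on every run, the same recipes work whether you are managing five servers or five hundred – which is what makes it a natural fit for big data clusters.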
At Data Day Austin last February, Opscode technical evangelist Matt Ray gave a presentation on Cluster Chef, now known as Ironfan, a variation on Chef created by the data-as-a-service company Infochimps for managing clusters of servers. It’s the foundation of Infochimps’ big data platform, which can run on premises or in whatever cloud data gravity sucks you into.
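Ironfan’s distinguishing idea is that the cluster, not the individual machine, is the unit you define. The Ruby sketch below is illustrative only; the DSL names used here (cluster, facet, instances, role) are assumptions about Ironfan’s style rather than a verified definition:

```ruby
# Hypothetical sketch of an Ironfan-style cluster definition.
Ironfan.cluster 'hadoop_demo' do
  # One master node carrying the coordinating services.
  facet :master do
    instances 1
    role 'hadoop_namenode'
    role 'hadoop_jobtracker'
  end

  # Five identical worker nodes.
  facet :worker do
    instances 5
    role 'hadoop_datanode'
    role 'hadoop_tasktracker'
  end
end
```

Launching or resizing the cluster then becomes an operation on this one definition rather than on each server in turn.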
Crowbar

Crowbar is an open source project from our sponsors at Dell. It’s also based on Chef and was originally intended for spinning up OpenStack instances. But there are a number of plugins, called “barclamps,” including one for deploying Hadoop that supports the whole BigTop family, including Hive, Pig, ZooKeeper and Sqoop. There’s also one for Cloudera Manager, the proprietary Hadoop management system from Cloudera.
Photo by zzpza