Facebook is such a big fan of the Hadoop Distributed File System (HDFS) that it stores 100 petabytes of data in a single HDFS filesystem – very probably the largest single cluster of its type.
So it’s no wonder that Facebook’s engineering team went out of its way to solve the single-point-of-failure (SPOF) problem that’s characterized Hadoop to date, an approach that Facebook engineer Andrew Ryan explained at this week’s Hadoop Summit (and summarized in a blog post here).
In a standard HDFS deployment, all filesystem metadata requests flow through a single server called the NameNode, while the actual file data is read from and written to a pool of DataNodes. The DataNodes are redundant, so the filesystem can tolerate the failure of any one of them. But if that lone NameNode goes down, HDFS is functionally out of commission, and no applications connected to it will work properly.
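Conceptually, the dependency looks like this. The sketch below is illustrative Python, not the actual HDFS protocol or API – the class and method names are invented for the example – but it captures why a single metadata server is a single point of failure:

```python
class NameNode:
    """Holds all filesystem metadata; in stock HDFS there is exactly one."""
    def __init__(self):
        self.alive = True
        # Maps a file path to the DataNodes holding its blocks (toy data).
        self.metadata = {"/logs/day1": ["datanode-1", "datanode-2"]}

    def lookup(self, path):
        if not self.alive:
            raise ConnectionError("NameNode down: no metadata service")
        return self.metadata[path]


class HDFSClient:
    """Every read must first ask the NameNode where the blocks live."""
    def __init__(self, namenode):
        self.namenode = namenode

    def read(self, path):
        # SPOF: this call has nowhere else to go if the NameNode is dead,
        # no matter how many healthy DataNodes exist.
        datanodes = self.namenode.lookup(path)
        return f"streaming {path} from {datanodes[0]}"


nn = NameNode()
client = HDFSClient(nn)
print(client.read("/logs/day1"))   # works while the NameNode is up

nn.alive = False                   # NameNode failure...
try:
    client.read("/logs/day1")
except ConnectionError as err:
    print(err)                     # ...blocks every client, despite healthy DataNodes
```

DataNode redundancy never enters the picture: the client fails before it ever contacts one.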
For Facebook, a highly available NameNode is highly desirable: Ryan estimates that while such a thing would prevent only about 10 percent of unplanned data-warehouse outages, it could shave off as much as 50 percent of planned downtime by letting the team perform hardware and software maintenance on the cluster without taking the whole thing offline.
Enter AvatarNode, named for the James Cameron movie, which provides a manual failover point for HDFS NameNodes. Ryan’s diagram illustrates the basic gist.
AvatarNode, which Facebook has open sourced back to the community, provides a two-node NameNode setup with manual failover, coordinated with the help of Apache ZooKeeper, the distributed coordination service. AvatarNode is now used in production, handling some of Facebook’s heaviest workloads. You can grab Facebook’s distribution of Hadoop, including AvatarNode, here.
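The essence of the two-node scheme can be sketched as follows. This is a toy illustration of manual failover – not Facebook’s actual AvatarNode code – in which a coordinator object stands in for the role ZooKeeper plays of telling clients which node is currently active; all names are invented for the example:

```python
class Avatar:
    """One of the two NameNode instances (toy stand-in)."""
    def __init__(self, name):
        self.name = name

    def lookup(self, path):
        return f"{self.name} resolves {path}"


class AvatarCoordinator:
    """Stand-in for the ZooKeeper-backed coordination layer: it records
    which avatar is currently active. Illustrative only."""
    def __init__(self, primary, standby):
        self.nodes = {"primary": primary, "standby": standby}
        self.active = "primary"

    def failover(self):
        # Operator-triggered (manual) switch to the other avatar,
        # e.g. ahead of planned maintenance on the active node.
        self.active = "standby" if self.active == "primary" else "primary"

    def active_node(self):
        return self.nodes[self.active]


class FailoverAwareClient:
    """Clients ask the coordinator for the active node on each request,
    so a manual failover is transparent to them."""
    def __init__(self, coordinator):
        self.coordinator = coordinator

    def read(self, path):
        return self.coordinator.active_node().lookup(path)


coord = AvatarCoordinator(Avatar("avatar-0"), Avatar("avatar-1"))
client = FailoverAwareClient(coord)
print(client.read("/logs/day1"))  # served by avatar-0
coord.failover()                  # maintenance window: switch avatars
print(client.read("/logs/day1"))  # same client, now served by avatar-1
```

The point of the design is the middle layer: because clients never hard-code which NameNode they talk to, the active node can be swapped out without restarting them.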
“Moving forward, we’re working to improve AvatarNode further and integrate it with a general high-availability framework that will permit unattended, automated, and safe failover,” as Ryan puts it in that blog entry.
Eliminating the NameNode as an SPOF is a goal shared by many. Cloudera and MapR both include redundant NameNode systems in their Hadoop distributions, and the currently-in-beta Apache Hadoop 2.0 itself also removes the NameNode as a single point of failure with a similar manual-failover approach.
In the meantime, enterprises looking to use Hadoop/HDFS in production might well be able to take advantage of Facebook’s recipe for high availability.