In their ongoing coverage of the 2012 Cassandra Summit in Santa Clara, California, John Furrier and Jeff Kelly spoke with Co-Founder and CTO of DataStax, Jonathan Ellis (see full video below). The panelists discuss Cassandra’s strengths in light of the NoSQL movement, SSD’s increasing momentum, and how Cassandra distinguishes itself from MongoDB and HBase.
Furrier highlights how data solutions have changed over time (noting that he comes from the “old-school” database generation) to address problems around mobile, data and cloud. He asks for Ellis’ opinion on the new key challenge in the marketplace. Ellis suggests the NoSQL movement (not only SQL) is important to this discussion, as it has been “a blessing and a curse” to Cassandra. While “relational databases are a hammer,” Ellis warns, “not every problem is a nail” and “there are other tools that are better at solving specific problems.” Ellis explains that Cassandra is more concerned with scaling applications than with the language you access them with: “We can make trade-offs that are more appropriate for scaling real-time applications that make Cassandra a better fit than MySQL or Oracle.”
Furrier probes further about Cassandra’s strongest assets and how SSD has affected its growth. Ellis says that what Cassandra does better than anyone else is providing support for multiple data centers. Ellis refers to SSD as a “slow revolution,” citing early adopters like Amazon and Microsoft. The advantage Ellis sees in cloud is that one can say: “I’m not going to focus on hiring ops teams and training them, I’m going to focus on my core business and… outsource that infrastructure…I don’t think there’s one right answer for anyone.” Furrier adds that he thinks cloud will be around, but he doesn’t see it as being as “revolutionary as big data.”
Kelly asks what distinguishes NoSQL solutions from one another. Ellis explains that Voldemort is in the same general area as Cassandra, “but there still isn’t that much community around Voldemort.” However, he does see room for diversification with companies like MongoDB, which targets small businesses and hobbyists. Ellis describes the Cassandra community as practical “problem solvers,” concerned more with accessibility than theory.
Furrier also asks about Ellis’ philosophy on using different tools. Ellis says that he is a big fan of the Python language, but that building a database around Python isn’t as effective as building it around Java, given Java’s speed. Ellis says he would trade off exclusivity for performance in such cases.
Furrier also asks about a common challenge facing technology entrepreneurs, in which companies like Twitter and Facebook explode but then have to do some major re-architecting to scale. He asks for Ellis’ insight into the dilemma most successful tech startups face when, soaring in the air, they realize “they have to change the airplane engine at 30,000 feet.” Ellis believes this problem is unavoidable and, perhaps, should not be feared: “Even if I had a crystal ball that says I’m going to be at 100 million users in 5 years, maybe I’d still start off building it on [a language] my team is more familiar with,” and deal with the limitations of that.
Ellis notes that Cassandra’s effectiveness lies, in part, in its leanness. Whereas databases like Oracle let programmers rely on joins, Cassandra does not support them, and Ellis says: “Cassandra forces you to think efficiently; it makes you discard some bad habits.” Furrier likens the way some companies train staff on Oracle before switching over to Cassandra to an old real-estate tactic: “They show you the crappy houses first…play with Oracle and then move to Cassandra, it’s a dream.”
Ellis says a major goal for Cassandra going forward is “better support for large cluster members.” He cites demand for supporting thirty-two cores and eight terabytes, and for being able to scale up in addition to scaling out. In terms of competition, Ellis suggests: “Couch and MongoDB are going after the same market, and Cassandra and HBase are going after the same market.” Furrier describes HBase as a tailored suit: once the design is set, it’s perfect, but no one else can use yours. Ellis says that when it comes to HBase, even if you have the expertise, “it’s not going to make some of the basic limitations go away.” Ellis asks rhetorically: “What’s the point of using a distributed database if you have to go ahead and shard it again afterwards?” He adds: “MongoDB stops being appropriate once you hit datasets that don’t fit in memory…the performance really falls off a cliff.” Taking these and other reasons into account, Ellis believes Cassandra has the edge.
Operationally, over the next year, Cassandra wants to emphasize its ease of use for developers. Ellis describes a new screencast in which a Cassandra engineer explains how to stand up a four-node Cassandra cluster in just two minutes. Ellis concludes: “By having a fully distributed cluster…we want to continue to make lives easier when people build operations on Cassandra.”