Last year Erik Meijer and Gavin Bierman of Microsoft Research published a paper making the case for a common language for NoSQL databases. A year later, there’s been some progress towards this goal, although not much.
UnQL, under development by Couchbase, Microsoft and SQLite, is one of the most ambitious NoSQL query language projects I know of. The project seeks to deliver a universal SQL-like query language for document databases and other relatively well-structured non-relational databases. Here's a lucid explanation of UnQL from The H Online:
Unlike traditional relational databases, a collection may contain differently structured documents. The SQL commands CREATE and DROP TABLE become CREATE and DROP COLLECTION in UnQL. The WHERE clause of queries now refers to a document’s properties which map to the fields of the stored object.
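To make the semantics in that description concrete, here is a minimal Python sketch of a collection that holds differently structured documents and filters them on a document property, the way a UnQL WHERE clause is described as doing. This is not UnQL syntax. `Collection`, `create_collection`, `drop_collection` and `get_path` are hypothetical names for illustration. Treating a document that lacks the property as a non-match, rather than as an error, is an assumption, not something the UnQL description above specifies.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted property path such as "address.city".

    Returns None if any step is missing (so a stored null also reads as missing).
    """
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


class Collection:
    """Hypothetical stand-in for a UnQL collection: a bag of schemaless documents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: List[Document] = []

    def insert(self, doc: Document) -> None:
        self.docs.append(doc)

    def where(self, path: str, predicate: Callable[[Any], bool]) -> List[Document]:
        # Assumption: documents without the property are skipped, not rejected.
        return [d for d in self.docs
                if (v := get_path(d, path)) is not None and predicate(v)]


# Hypothetical registry of collections, playing the role of a database.
database: Dict[str, Collection] = {}


def create_collection(name: str) -> Collection:
    """Rough analogue of UnQL's CREATE COLLECTION (replaces SQL's CREATE TABLE)."""
    database[name] = Collection(name)
    return database[name]


def drop_collection(name: str) -> None:
    """Rough analogue of UnQL's DROP COLLECTION (replaces SQL's DROP TABLE)."""
    database.pop(name, None)


# Usage: three documents with three different shapes share one collection.
users = create_collection("users")
users.insert({"name": "Ann", "age": 34})
users.insert({"name": "Bob", "address": {"city": "Oslo"}})
users.insert({"title": "not a user record at all"})

over_30 = users.where("age", lambda age: age > 30)
in_oslo = users.where("address.city", lambda city: city == "Oslo")
```

Nothing forces the documents to share a schema. A query simply matches the ones that have the property it asks about, which is the key difference from a relational table.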
Couchbase SVP of Products James Phillips says that so far there aren't any partners other than Couchbase, Microsoft and SQLite, but anyone could build support for the language into an open source document database. So even if MongoDB, Apache CouchDB and BigCouch don't officially support UnQL, someone could build an extension that adds UnQL support to them.
Phillips says that the project is currently focused only on document databases, not key-value stores or graph databases. That means other data stores with traction – such as Apache Cassandra, HBase and Riak – won’t be able to easily support UnQL, if at all.
There are a lot of databases that UnQL could eventually be used with, and a lot that it simply couldn't be used with. Graph databases are so fundamentally different that they require a very different approach. Gremlin is a graph traversal language that has become the standard for working with graph databases. It's supported by DEX, InfiniteGraph, Neo4j and many others.
Meanwhile, other projects are building their own custom SQL-like domain-specific languages. Cassandra has CQL (Cassandra Query Language). Hadoop has Pig. HPCC has ECL (Enterprise Control Language). Google has GQL. But each is a bit different. Knowing SQL may help you learn any of these languages (it's less likely to help with Gremlin), but they're not cross-compatible.
In their paper, Meijer and Bierman made the case that Microsoft LINQ (Language-Integrated Query) could be the glue between SQL and "coSQL" (their name for NoSQL). LINQ is a query system for .NET languages that has also been implemented in other languages such as Java. It can be used with Microsoft SQL Server as well as other data sources such as XML documents and Twitter.
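The core idea behind LINQ is that a single set of query operators (Where, Select, OrderBy and so on) can run against any data source that can be enumerated, whether that is a table of rows or an XML document. The sketch below is a small Python analogue of that idea, not LINQ itself and not any LINQ port. The `Query` class and the sample data are hypothetical. The same `oslo_names` query runs unchanged over a list of dictionaries (standing in for rows from a relational table) and over records parsed from XML with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, Iterable, Iterator, List


class Query:
    """LINQ-inspired, lazily evaluated, single-pass query over any iterable."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source

    def where(self, pred: Callable[[Any], bool]) -> "Query":
        return Query(x for x in self._source if pred(x))

    def select(self, fn: Callable[[Any], Any]) -> "Query":
        return Query(fn(x) for x in self._source)

    def order_by(self, key: Callable[[Any], Any]) -> "Query":
        # Sorting has to see every item; a nested generator defers it until iteration.
        def sorted_items() -> Iterator[Any]:
            yield from sorted(self._source, key=key)
        return Query(sorted_items())

    def __iter__(self) -> Iterator[Any]:
        return iter(self._source)

    def to_list(self) -> List[Any]:
        return list(self._source)


def oslo_names(source: Iterable[Dict[str, Any]]) -> List[str]:
    """One query definition, reused against any source of person records."""
    return (Query(source)
            .where(lambda p: p["city"] == "Oslo")
            .select(lambda p: p["name"])
            .order_by(lambda name: name)
            .to_list())


# Source 1: hypothetical rows, as they might come back from a relational table.
rows = [
    {"name": "Cid", "city": "Oslo"},
    {"name": "Bob", "city": "Lund"},
    {"name": "Ann", "city": "Oslo"},
]

# Source 2: hypothetical XML document, flattened into the same record shape.
xml_doc = ET.fromstring(
    "<people><person name='Dee' city='Oslo'/>"
    "<person name='Eve' city='Bergen'/></people>"
)
xml_people = [{"name": e.get("name"), "city": e.get("city")}
              for e in xml_doc.iter("person")]

from_rows = oslo_names(rows)
from_xml = oslo_names(xml_people)
```

The query itself never mentions where its data lives. That separation is what would let one query language span SQL and "coSQL" stores, provided each store exposes its data through the same interface.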
Magnus Mårtensson, author of a .NET client for Neo4j, wrote in 2010 that he had considered building a LINQ connector for Neo4j. There has also been work toward LINQ support for MongoDB, but it doesn't appear that either project has legs at the moment.
LINQ proves that it's possible to create a truly cross-platform query language for databases. But turning it into a standard would require a great degree of openness on Microsoft's part, and acceptance from the community. Instead, we might end up with something LINQ-like. The NoSQL market has become increasingly competitive in the past year and a half, and it's difficult to imagine 10gen and the Apache CouchDB team accepting a standard created by Couchbase and Microsoft. Demand for a common language will have to come from the community, from the ground up.