There are a lot of NoSQL databases out there. We have used or tried many of them, and we love many of the features they offer. However, we also face unique challenges in a highly regulated HCM SaaS business, so we kept looking for the unicorn database that would meet our requirements. Unfortunately, none of the existing solutions fully addresses all of our challenges. Two years ago we asked ourselves whether we could build our own solution. That is how the Unicorn database was born.
Unicorn is built on top of BigTable-like storage engines such as Cassandra, HBase, or Accumulo. With different storage engines, we can achieve different strategies for consistency, replication, etc. Beyond the plain abstraction of the BigTable data model, Unicorn provides an easy-to-use document data model and a MongoDB-like API. Moreover, Unicorn supports directed property multigraphs, and documents can simply be vertices in a graph.
With the built-in document and graph data models, developers can focus on the business logic rather than on tedious key-value pair manipulation. Of course, developers are still free to use key-value pairs directly for flexibility in special cases.
Over the past two years, we have learned a lot and made many improvements. The result is Unicorn 2.0, which we are excited to open source to the community.
Unicorn is implemented in Scala and can be used as a client-side library without overhead. Unicorn also provides a shell for quick access to the database. The code snippets in this document can be run directly in the shell. An HTTP API, in the module Rhino, is also provided for non-Scala users.
The Narwhal module, which is specialized for HBase, adds advanced features such as time travel, rollback, counters, and server-side filters. The user can also export the data to Spark as RDDs for large-scale analytics. These RDDs can be converted to DataFrames or Datasets, which support SQL queries. Unicorn graphs can be analyzed with Spark GraphX too.
It is worth noting that we do not have to define the type or schema of a document, even though Scala is a strongly typed language. In other words, Unicorn’s JSON library offers both the type safety of a strongly typed language and the flexibility of a dynamic language.
We can also query JSON structures with JSONPath expressions, in the same way that XPath expressions are used with an XML document.
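To make the idea of path-based querying concrete, here is a minimal sketch in plain Scala. It is not Unicorn's JSON library and it is not a JSONPath implementation: the types `Json`, `JStr`, `JNum`, `JArr`, `JObj` and the helper `PathLookup.get` are hypothetical names made up for illustration, and the path syntax is a simplified dotted form assumed here (field names for objects, integer indices for arrays).

```scala
import scala.util.Try

// Hypothetical mini JSON model; these are NOT Unicorn's actual types.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JNum(value: Double) extends Json
final case class JArr(items: Vector[Json]) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

object PathLookup {
  /** Resolves a simplified dotted path such as "store.book.0.title".
    * Each segment is a field name (for objects) or an index (for arrays).
    * Returns None as soon as a segment cannot be resolved. */
  def get(root: Json, path: String): Option[Json] =
    path.split('.').foldLeft(Option(root)) { (current, segment) =>
      current.flatMap {
        case JObj(fields) => fields.get(segment)
        case JArr(items)  => Try(segment.toInt).toOption.flatMap(items.lift)
        case _            => None // scalars have no children
      }
    }
}
```

Because the lookup returns an `Option`, a missing field or an out-of-range index yields `None` rather than an exception, which keeps the dynamic feel of a path query inside a typed language.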
With the easy-to-use document model and the data-as-API approach, agile development is not a dream. A document is essentially a JSON object with a unique key. With the document data model, application developers can focus on the business logic while Unicorn efficiently maps documents to key-value pairs in BigTable.
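As a rough illustration of what mapping a document to key-value pairs can look like, the sketch below flattens a nested document into BigTable-style cells. The layout is an assumption made for this example (row key = document key, column qualifier = dotted field path); it is not a description of Unicorn's actual storage format, and `Field`, `Leaf`, `Nested`, `Cell`, and `DocumentFlattening` are hypothetical names.

```scala
// Hypothetical document model and cell layout, for illustration only.
sealed trait Field
final case class Leaf(value: String) extends Field
final case class Nested(fields: Map[String, Field]) extends Field

/** One BigTable-style cell: row key, column qualifier, value. */
final case class Cell(row: String, column: String, value: String)

object DocumentFlattening {
  /** Emits one cell per leaf field. Nested field names are joined with '.'
    * to form the column qualifier; fields are visited in name order so the
    * output is deterministic. */
  def flatten(rowKey: String, doc: Nested, prefix: String = ""): List[Cell] =
    doc.fields.toList.sortBy(_._1).flatMap {
      case (name, Leaf(value))    => List(Cell(rowKey, prefix + name, value))
      case (name, nested: Nested) => flatten(rowKey, nested, prefix + name + ".")
    }
}
```

For a document keyed `joe` with a nested `address` object, this produces cells such as `(joe, address.city, ...)` and `(joe, name, ...)`, which is the kind of bookkeeping the document model hides from the application developer.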
It is easy to insert/upsert a document and get it back with the key.
To update a document, simply pass a JSON object that follows MongoDB’s update syntax.
Multi-tenancy, in which multiple clients share the same database but each should see only its own data, is common in SaaS applications. Unicorn supports multi-tenancy so that each client gets a view of only its own data.
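The semantics of such an update can be sketched in a few lines of plain Scala. This is not Unicorn's update code: `UpdateSketch` and `applyUpdate` are hypothetical, documents are simplified to flat string maps, and only MongoDB's `$set` and `$unset` operators are handled.

```scala
object UpdateSketch {
  /** Simplified document: top-level string fields only. */
  type Doc = Map[String, String]

  /** Applies a MongoDB-style update document.
    * "$set" adds or overwrites the listed fields; "$unset" removes the
    * fields named by its keys (the values are ignored, as in MongoDB).
    * This sketch applies "$set" first and then "$unset"; any other
    * operator is silently ignored. */
  def applyUpdate(doc: Doc, update: Map[String, Doc]): Doc = {
    val afterSet = doc ++ update.getOrElse("$set", Map.empty[String, String])
    afterSet -- update.getOrElse("$unset", Map.empty[String, String]).keys
  }
}
```

For example, applying `Map("$set" -> Map("salary" -> "100000"), "$unset" -> Map("gender" -> ""))` to a document overwrites `salary` and drops `gender` while leaving every other field untouched.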
Because the tenant is “Google” now, the data of tenant “IBM” are not visible.
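The isolation described above can be pictured with a small in-memory sketch. It assumes, purely for illustration, that every document key is scoped by the current tenant; `TenantTable`, `useTenant`, `upsert`, and `get` are hypothetical names and do not reflect how Unicorn implements or exposes multi-tenancy.

```scala
import scala.collection.mutable

/** Hypothetical tenant-aware table: every key is scoped by the current tenant. */
final class TenantTable {
  // Rows are addressed by (tenant, document key), so tenants never collide.
  private val rows = mutable.Map.empty[(String, String), String]
  private var currentTenant: String = ""

  /** Switches the tenant whose data subsequent calls read and write. */
  def useTenant(name: String): Unit = { currentTenant = name }

  /** Inserts or overwrites a document for the current tenant. */
  def upsert(key: String, doc: String): Unit = { rows((currentTenant, key)) = doc }

  /** Looks up a document for the current tenant only. */
  def get(key: String): Option[String] = rows.get((currentTenant, key))
}
```

With this sketch, a document stored while the tenant is "IBM" cannot be read after switching to "Google", and it becomes visible again once the tenant is switched back.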
There are many other features, such as locality, scripting, time travel, and filters, that we cannot cover in this short overview. Please refer to our GitHub project for details. We do, however, want to give you a taste of our Spark and graph features below.
For large scale analytics, we can export documents to Spark as RDD[JsObject].
For analytics, SQL is still the best language. We can easily convert RDD[JsObject] to a strongly typed DataFrame to be analyzed in Spark SQL.
Unicorn supports directed property multigraphs. Documents from different tables can be added as vertices to a multigraph. It is also fine to add vertices that do not correspond to documents. Each relationship/edge has a label and optional data (any valid JsValue, with JsInt(1) as the default). Our running example is the graph of the gods from the Titan graph database.
For graph traversal, we support a Gremlin-like API. For example, we can ask for the names of saturn’s grandchildren.
Beyond simple graph traversal, Unicorn supports DFS, BFS, A* search, Dijkstra's algorithm, etc.
Note that these searches run on a single machine. For very large graphs, it is better to use a distributed graph computing engine such as Spark GraphX.
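The grandchildren query can be sketched without any database at all. The code below is not Unicorn's Gremlin-like API; it is a plain-Scala illustration over a small subset of the graph of the gods, where a `father` edge points from child to father, so finding grandchildren means following incoming `father` edges twice. `GodsTraversal`, `Edge`, `incoming`, and `grandchildren` are hypothetical names.

```scala
object GodsTraversal {
  /** A labeled, directed edge: source --label--> target. */
  final case class Edge(source: String, label: String, target: String)

  // A small subset of the Titan "graph of the gods".
  val edges: List[Edge] = List(
    Edge("jupiter", "father", "saturn"),
    Edge("hercules", "father", "jupiter"),
    Edge("hercules", "mother", "alcmene"),
    Edge("jupiter", "brother", "neptune"),
    Edge("jupiter", "brother", "pluto")
  )

  /** Sources of all edges with the given label that point at any of `targets`. */
  def incoming(targets: Set[String], label: String): Set[String] =
    edges.filter(e => e.label == label && targets(e.target)).map(_.source).toSet

  /** Two hops along incoming "father" edges: children, then their children. */
  def grandchildren(ancestor: String): Set[String] =
    incoming(incoming(Set(ancestor), "father"), "father")
}
```

Calling `GodsTraversal.grandchildren("saturn")` walks from saturn to jupiter and then to hercules.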
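As a reminder of what one of these searches does, here is a minimal, self-contained sketch of Dijkstra's algorithm in plain Scala. It is not Unicorn's implementation: the `ShortestPath` object, the adjacency-list `Graph` type, and the integer edge weights are assumptions chosen to keep the example short.

```scala
import scala.collection.mutable

object ShortestPath {
  /** Adjacency list: vertex -> list of (neighbor, non-negative edge weight). */
  type Graph = Map[String, List[(String, Int)]]

  /** Dijkstra's algorithm: cheapest distance from `source` to every reachable vertex. */
  def dijkstra(graph: Graph, source: String): Map[String, Int] = {
    val dist = mutable.Map(source -> 0)
    // PriorityQueue is a max-heap, so order by negated distance
    // to dequeue the closest unsettled vertex first.
    val byClosest = Ordering.by[(Int, String), Int](entry => -entry._1)
    val queue = mutable.PriorityQueue.empty[(Int, String)](byClosest)
    queue.enqueue((0, source))

    while (queue.nonEmpty) {
      val (d, u) = queue.dequeue()
      if (d <= dist(u)) { // skip stale entries left behind by later improvements
        for ((v, w) <- graph.getOrElse(u, Nil)) {
          val candidate = d + w
          if (candidate < dist.getOrElse(v, Int.MaxValue)) {
            dist(v) = candidate
            queue.enqueue((candidate, v))
          }
        }
      }
    }
    dist.toMap
  }
}
```

The whole frontier and distance table live in the memory of one process, which is why this style of search suits graphs that fit on a single machine.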