Published on Jan 30,2018

There are many NoSQL databases available, and we have used or tried out a good number of them. We love many of the features they offer. However, we also face unique challenges in a highly regulated HCM SaaS business, so we kept looking for the unicorn database that would meet our requirements. Unfortunately, none of the existing solutions fully addresses all of our challenges. Two years ago we asked ourselves whether we could build our own solution, and that is how the Unicorn database was born.

Unicorn is built on top of BigTable-like storage engines such as Cassandra, HBase, or Accumulo. Different storage engines allow different strategies for consistency, replication, and so on. Beyond the plain abstraction of the BigTable data model, Unicorn provides an easy-to-use document data model and a MongoDB-like API. Moreover, Unicorn supports directed property multigraphs, and documents can be vertices in a graph. With the built-in document and graph data models, developers can focus on business logic rather than on tedious key-value pair manipulation. Of course, developers are still free to use key-value pairs directly for flexibility in special cases.

During the past two years, we have learned a lot and made many improvements. The result is Unicorn 2.0, which we are excited to open source to the community.

Unicorn is implemented in Scala and can be used as a client-side library without overhead. Unicorn also provides a shell for quick access to the database. The code snippets in this document can be run directly in the shell. An HTTP API, in the module Rhino, is also provided for non-Scala users.

With the module Narwhal, which is specialized for HBase, advanced features such as time travel, rollback, counters, and server-side filters are available. Users can also export data to Spark as RDDs for large-scale analytics. These RDDs can be converted to DataFrames or Datasets, which support SQL queries. Unicorn graphs can also be analyzed with Spark GraphX.

JSON

To support the document model, Unicorn has a rich JSON library. With it, users can operate on JSON data just as in JavaScript. Moreover, it supports JSONPath for flexibly analyzing, transforming, and selectively extracting data from JSON objects. At the same time, it is type safe and can catch many errors at compile time. Creating a JSON object is as simple as

Capture1

 

You can use the dot notation to access its fields just like in JavaScript:

Capture2

It is worth noting that we didn't define the type or schema of the document, even though Scala is a strongly typed language. In other words, Unicorn's JSON library gives us both the type safety of a strongly typed language and the flexibility of a dynamic language.
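One way such dot notation can coexist with static typing in Scala is the `scala.Dynamic` marker trait, which lets the compiler rewrite access to an undeclared field into a `selectDynamic` call. The following is only a minimal sketch of that mechanism, not Unicorn's JSON library; the types `Js`, `JsObj`, `JsStr`, `JsNum`, and `JsUndefined` are hypothetical toy definitions.

```scala
import scala.language.dynamics

// Hypothetical toy JSON value (an assumption for illustration, not Unicorn's types).
sealed trait Js extends Dynamic {
  // The compiler rewrites `js.field` into `js.selectDynamic("field")`
  // whenever `field` is not a declared member of Js.
  def selectDynamic(field: String): Js = this match {
    case JsObj(fields) => fields.getOrElse(field, JsUndefined)
    case _             => JsUndefined // field access on a non-object
  }
}

case object JsUndefined extends Js
final case class JsStr(value: String) extends Js
final case class JsNum(value: Double) extends Js
final case class JsObj(fields: Map[String, Js]) extends Js
```

With these definitions, a value declared as `val doc: Js = JsObj(Map("store" -> JsObj(Map("owner" -> JsStr("Joe")))))` can be read with `doc.store.owner`, and a missing field yields `JsUndefined` rather than throwing an exception.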

We can also query JSON structures with JSONPath expressions, in the same way that XPath expressions are used with an XML document.

Capture3

Documents

With the easy-to-use document model and the data-as-API approach, agile development is not a dream. A document is essentially a JSON object with a unique key. With the document data model, application developers can focus on business logic while Unicorn efficiently maps documents to key-value pairs in BigTable.
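How a document becomes key-value pairs is not detailed in this overview. As a rough illustration only, assuming the simplest possible layout (one row per document key, one column per top-level field), the mapping might look like the sketch below. `Cell`, `toCells`, and `fromCells` are hypothetical names, not Unicorn's API.

```scala
object DocumentToCells {
  // One BigTable-style cell: row key, column qualifier, value.
  final case class Cell(row: String, column: String, value: String)

  // Flatten the top-level fields of a document into one cell per field,
  // sorted by field name so the output order is deterministic.
  def toCells(key: String, doc: Map[String, Any]): Seq[Cell] =
    doc.toSeq.sortBy(_._1).map { case (field, value) =>
      Cell(key, field, value.toString)
    }

  // Rebuild a (string-valued) document from the cells of one row.
  def fromCells(cells: Seq[Cell]): Map[String, String] =
    cells.map(c => c.column -> c.value).toMap
}
```

A real mapping would also have to handle nested objects, arrays, and typed values, which this sketch deliberately leaves out.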

It is easy to insert or upsert a document and get it back by its key.

Capture4

To update a document, simply pass in a JSON object compatible with MongoDB's API:

Capture5
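For readers unfamiliar with MongoDB-style updates, the update object names operators such as `$set` and `$unset`, each mapped to the fields it affects. The sketch below applies those two operators to a plain Scala `Map`. It illustrates the semantics only; `MongoStyleUpdate.update` is a hypothetical helper, not Unicorn's update call.

```scala
object MongoStyleUpdate {
  type Doc = Map[String, Any]

  // Apply the "$set" and "$unset" operators of an update spec to a document.
  // "$set" adds or overwrites fields; "$unset" removes the named fields
  // (the values given under "$unset" are ignored, as in MongoDB).
  def update(doc: Doc, spec: Map[String, Doc]): Doc = {
    val afterSet = doc ++ spec.getOrElse("$set", Map.empty[String, Any])
    afterSet -- spec.getOrElse("$unset", Map.empty[String, Any]).keys
  }
}
```

For example, applying `Map("$set" -> Map("salary" -> 60000.0), "$unset" -> Map("gender" -> 1))` overwrites `salary` and drops `gender` while leaving every other field untouched.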

In SaaS applications, multi-tenancy, in which multiple clients share the same database but each should see only its own data, is common. Unicorn supports multi-tenancy so that each client sees only the appropriate view of the data.

Capture6

Because the tenant is now “Google”, the data of tenant “IBM” is not visible.

Capture7
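The isolation described above can be pictured as every read and write being scoped by the current tenant. The following is a minimal in-memory sketch under that assumption; `TenantTable` and its members are hypothetical and say nothing about how Unicorn stores tenant data internally.

```scala
import scala.collection.mutable

// Hypothetical in-memory table keyed by (tenant, document key).
final class TenantTable {
  private val rows = mutable.Map.empty[(String, String), String]

  // Current tenant; every read and write below is scoped to it.
  var tenant: String = ""

  def upsert(key: String, doc: String): Unit = rows((tenant, key)) = doc

  // Only documents written under the current tenant are visible.
  def get(key: String): Option[String] = rows.get((tenant, key))
}
```

After writing a document as tenant "IBM" and then switching `tenant` to "Google", `get` on the same key returns `None`; switching back to "IBM" makes the document visible again.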

There are many other features, such as locality, scripting, time travel, and filters, that we cannot cover in this short overview. Please refer to our GitHub project for details. But we do want to give you a taste of our Spark and graph features below.

Spark

For large-scale analytics, we can export documents to Spark as RDD[JsObject].

Capture8

For analytics, SQL is still the best language. We can easily convert RDD[JsObject] to a strongly typed DataFrame to be analyzed in Spark SQL.

Capture9

Graph

Unicorn supports directed property multigraphs. Documents from different tables can be added as vertices to a multigraph. Vertices that do not correspond to any document can also be added. Each relationship/edge has a label and optional data (any valid JsValue; the default is JsInt(1)). In what follows, we create a graph of the gods, an example from the Titan graph database.

Capture10

For graph traversal, we support a Gremlin-like API. The following example shows how to get the names of saturn's grandchildren.

Capture11
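The idea behind this traversal can be sketched in plain Scala, assuming (as in the Titan example) that a `father` edge points from child to parent, so that grandchildren are found by following incoming `father` edges twice. The `Edge` type and the `in` and `grandchildren` helpers are hypothetical and are not Unicorn's Gremlin-like API; only a few edges of the gods graph are included.

```scala
object GodsGraph {
  // A labeled, directed edge. A multigraph may hold several edges
  // between the same pair of vertices, distinguished by label.
  final case class Edge(from: String, label: String, to: String)

  // A small excerpt of the graph of the gods.
  val edges: Seq[Edge] = Seq(
    Edge("jupiter", "father", "saturn"),
    Edge("hercules", "father", "jupiter"),
    Edge("hercules", "mother", "alcmene"),
    Edge("jupiter", "brother", "neptune"),
    Edge("neptune", "brother", "jupiter")
  )

  // One hop backwards: sources of edges with the given label
  // that point into any of the given vertices.
  def in(vertices: Set[String], label: String): Set[String] =
    edges.collect { case Edge(from, `label`, to) if vertices(to) => from }.toSet

  // Two hops back along "father" edges.
  def grandchildren(v: String): Set[String] =
    in(in(Set(v), "father"), "father")
}
```

Here `GodsGraph.grandchildren("saturn")` walks from saturn back to jupiter and then back to hercules.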

Beyond simple graph traversal, Unicorn supports DFS, BFS, A* search, Dijkstra's algorithm, and more.

Capture12
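As a reminder of what such a search computes, here is a compact, self-contained sketch of Dijkstra's algorithm over a weighted adjacency list. It is a generic textbook version, not Unicorn's implementation; the `ShortestPath` object and its `Graph` type are hypothetical, and integer edge weights are assumed.

```scala
import scala.collection.mutable

object ShortestPath {
  // Weighted adjacency list: vertex -> (neighbor, non-negative edge weight).
  type Graph = Map[String, Seq[(String, Int)]]

  // Dijkstra's algorithm: cost of the cheapest path from `source`
  // to every vertex reachable from it.
  def dijkstra(graph: Graph, source: String): Map[String, Int] = {
    val dist = mutable.Map(source -> 0)
    // PriorityQueue is a max-heap, so reverse the ordering to pop
    // the smallest tentative distance first.
    val queue = mutable.PriorityQueue.empty[(Int, String)](
      Ordering.by[(Int, String), Int](_._1).reverse)
    queue.enqueue((0, source))

    while (queue.nonEmpty) {
      val (d, u) = queue.dequeue()
      if (d <= dist(u)) { // skip stale entries left behind by later improvements
        for ((v, w) <- graph.getOrElse(u, Nil)) {
          val alt = d + w
          if (alt < dist.getOrElse(v, Int.MaxValue)) {
            dist(v) = alt
            queue.enqueue((alt, v))
          }
        }
      }
    }
    dist.toMap
  }
}
```

The default edge data of 1 mentioned earlier would make every hop cost the same, in which case the result matches a BFS hop count.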

Note that this search is performed on a single machine. For very large graphs, it is better to use a distributed graph computing engine such as Spark GraphX.

Capture13

About Author
Haifeng Li