Use of Cassandra
Cassandra is simple to maintain, and the administrator typically has a minimal part to play, since most of the work is done automatically. Scaling up or down, by adding or removing a node, is very fast. Cassandra ships with simple tooling that speeds up routine tasks, so there is usually nothing to worry about. Re-syncing and rebalancing of data are carried out automatically. It offers high velocity compared to other NoSQL systems.
It supports wide rows, so columns can be added to a row as needed. It is best suited for cases with few secondary-index needs, which means the data is fully de-normalized: all the information that serves a specific query sits in one single table, avoiding hops across multiple tables to answer a client request. Basically, Cassandra supports workloads that do not need group-by style aggregation. If you need a very simple setup, extremely high velocity of reads and writes, and wide-column storage, Cassandra is a good pick.
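The denormalized, query-first modeling described above can be sketched in plain Python (no real Cassandra driver; the table and column names here are hypothetical): every column a query needs lives in one wide row under a single partition key, so one lookup answers the whole query.

```python
# Toy model of a denormalized wide-row table: partition key -> wide row
# of (clustering key, columns). All names are illustrative.
user_timeline = {
    "alice": [
        ("2024-01-02T10:00", {"post": "hello", "likes": 3}),
        ("2024-01-01T09:00", {"post": "first!", "likes": 7}),
    ],
}

def fetch_timeline(user_id):
    """One partition-key lookup serves the whole query:
    no joins, no hopping across multiple tables."""
    return user_timeline.get(user_id, [])

rows = fetch_timeline("alice")  # the entire timeline comes from one row
```

The design choice this illustrates: you model the table around the query, duplicating data if necessary, rather than normalizing and joining at read time.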
Where not to use Cassandra
There are certain dos and don’ts while dealing with Cassandra. The don’ts are as follows:
- Secondary indexes
- Relational Data
- Primary and financial records
- Stringent security
- Dynamic queries on columns
- Searching column data
- Low Latency
Use of HBase
HBase is heavily optimized for reads, and it is probably one of the best systems for MapReduce-style workloads: it was built from the start to work in connection with Hadoop.
Applications with strict consistency requirements are a good fit, and HBase is well suited to range-based scans. HBase stores data on disk in blocks, so a single range can cover much more data at once; block size is adjustable, so you control how much each block holds. Because data is stored as raw blocks, as long as your range query falls within one block it can be served from a single node, and distribution is per node. That is why it is good for range-based scans.
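The point about blocks can be made concrete with a small sketch (plain Python, not an HBase API): rows are kept sorted and grouped into fixed-size blocks, and a range query whose keys fall inside one block touches only that block.

```python
import bisect

# Rows kept in sorted order and grouped into blocks. BLOCK_SIZE stands in
# for HBase's configurable block size; all names are illustrative.
BLOCK_SIZE = 4
sorted_keys = [f"row{i:03d}" for i in range(12)]
blocks = [sorted_keys[i:i + BLOCK_SIZE]
          for i in range(0, len(sorted_keys), BLOCK_SIZE)]
first_keys = [b[0] for b in blocks]  # first row key of each block

def blocks_touched(start, end):
    """Indices of the blocks a [start, end] range scan must read."""
    first = max(bisect.bisect_right(first_keys, start) - 1, 0)
    last = max(bisect.bisect_right(first_keys, end) - 1, 0)
    return list(range(first, last + 1))

# A range inside one block reads only that block; a range that crosses
# a block boundary reads both.
narrow = blocks_touched("row001", "row003")
wide = blocks_touched("row003", "row005")
```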
HBase works really well with MapReduce applications, because it was always built with MapReduce in mind. MapReduce picks up whole blocks in one shot and runs map functions over each block of data. Instead of searching for and picking up small pieces of data and then running MapReduce on them, you pick up a large chunk and run MapReduce on it, which is much faster. Facebook uses it to manage user statuses, photos, chat messages and so on.
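A minimal sketch (plain Python, not the Hadoop API) of the block-at-a-time pattern described above: the map phase processes a whole block of rows in one shot, and the reduce phase merges results per key across blocks. Data and names are hypothetical.

```python
# Two blocks of (user, message_count) rows. Illustrative data only.
blocks = [
    [("alice", 3), ("bob", 5)],    # block 1
    [("carol", 2), ("alice", 4)],  # block 2
]

def map_block(block):
    """Map phase: emit (key, value) pairs for a whole block at once,
    rather than fetching and processing rows one by one."""
    return [(user, count) for user, count in block]

def reduce_counts(pairs):
    """Reduce phase: sum values per key across all blocks."""
    totals = {}
    for user, count in pairs:
        totals[user] = totals.get(user, 0) + count
    return totals

mapped = [pair for block in blocks for pair in map_block(block)]
totals = reduce_counts(mapped)  # alice's counts merge across blocks
```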
Consistency is another very important factor: HBase provides consistency out of the box thanks to its master-slave model, and scale comes with it. It basically has a controller node and worker nodes, so more worker nodes can be added to scale it comfortably.
Where not to use HBase
Wherever full table scans are required, HBase cannot be used: data is stored in blocks, and once a scan crosses many blocks it becomes painful and performance suffers. The same applies when data has to be aggregated, rolled up, or analyzed across rows; for all these things, HBase shouldn't be used. HBase is right for MapReduce-style jobs where you have a set of data and want to run map or other specific functions over it.
When comparing the two, again, HBase is suitable for MapReduce-style tools and for stronger consistency requirements, for example Facebook Messenger. Wherever there are feeds and travel portals, Cassandra comes into the picture.
Got a question for us? Mention it in the comments section and we will get back to you.