Published on Jul 13, 2018

Cassandra File System

Cassandra is the right choice when you need scalability and high availability without compromising on performance. The Cassandra file system is an HDFS-compatible file system that can replace the standard HDFS layer. By changing the Hadoop configuration, you can expose Cassandra’s file system as HDFS. With this file system, the NameNode and DataNode daemons are no longer needed, because Cassandra takes over their responsibilities.

When the Cassandra file system is exposed as a Hadoop file system, people with a Hadoop background do not need to know Cassandra at all. That does not matter, because they are interacting with the HDFS API. The standard HDFS API functions are applied to the Cassandra file system, which internally takes care of how each call is carried out against Cassandra.
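The idea that client code only sees a file system API, while the backend behind it can be swapped, can be sketched as follows. This is a toy illustration in Python, not the Hadoop or Cassandra API: the `FileSystem` interface, the `InMemoryFileSystem` backend, the `BACKENDS` registry, and the `cfs` scheme name used here are all assumptions made up for the example.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a Hadoop-style file system interface."""

    @abstractmethod
    def write(self, path, data):
        """Store bytes under the given path."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored under the given path."""


class InMemoryFileSystem(FileSystem):
    """Toy backend; a real one would talk to HDFS or to Cassandra."""

    def __init__(self, name):
        self.name = name
        self._files = {}

    def write(self, path, data):
        self._files[path] = data

    def read(self, path):
        return self._files[path]


# Assumption: the backend is chosen by URI scheme, set up once in configuration.
BACKENDS = {
    "hdfs": InMemoryFileSystem("hdfs-standin"),
    "cfs": InMemoryFileSystem("cassandra-standin"),
}


def get_file_system(uri):
    """Pick the backend registered for the URI's scheme."""
    return BACKENDS[urlparse(uri).scheme]


def save_report(uri, text):
    """Client code: identical no matter which backend serves the URI."""
    fs = get_file_system(uri)
    path = urlparse(uri).path
    fs.write(path, text.encode("utf-8"))
    return len(fs.read(path))


if __name__ == "__main__":
    print(save_report("hdfs://cluster/reports/a.txt", "hello"))
    print(save_report("cfs://cluster/reports/a.txt", "hello"))
```

`save_report` never mentions which storage system it is writing to; only the URI, and therefore the configuration, changes.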

It removes many pain points that exist in the Hadoop architecture, most notably the single point of failure. It is horizontally scalable, and its tight integration with the processing layer lets jobs run faster. The output can optionally be written back to Cassandra. Cassandra can also be exposed to handle big files the way Hadoop does: it exposes its own file system in terms of large blocks, much like HDFS.
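To make the "large blocks without a central name node" idea concrete, here is a minimal sketch, assuming a simple hash ring. It is not how the Cassandra file system is actually implemented: the block size, the SHA-256 token function, the node names, and the `Ring` class are illustrative choices. The point is only that every client can compute where a block's replicas live, so no single node has to hold the block map.

```python
import hashlib
from bisect import bisect_right

BLOCK_SIZE = 4  # bytes; deliberately tiny so the demo is easy to follow


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def token(key):
    """Map a string to a position on the ring (stand-in hash function)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Nodes placed on a hash ring; any client can compute block placement."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, block_id):
        """Return the nodes holding a block: the owner plus the next nodes clockwise."""
        size = len(self._ring)
        start = bisect_right(self._tokens, token(block_id)) % size
        count = min(self.replication_factor, size)
        return [self._ring[(start + i) % size][1] for i in range(count)]


def place_file(ring, name, data):
    """Split a file into blocks and compute the replica nodes for each block."""
    placement = {}
    for index, _block in enumerate(split_into_blocks(data)):
        block_id = f"{name}:{index}"
        placement[block_id] = ring.replicas(block_id)
    return placement


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
    for block_id, nodes in place_file(ring, "big.log", b"abcdefghij").items():
        print(block_id, "->", nodes)
```

Because placement is a pure function of the block id and the node list, losing any one node leaves another replica of each block reachable, which is the property the single name node in classic Hadoop cannot offer.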

Characteristics of the Cassandra File System

The Cassandra file system is decentralized. It has no single point of failure, and it replicates data. It is HDFS-compatible and behaves much like HDFS. Another important property of the Cassandra file system is that it can be used for indexing.

Indexing data in HDFS is very difficult because everything is distributed across blocks. In the Cassandra file system, however, the information can be indexed, which gives it a unique advantage.

With an index in the Cassandra file system, the power of Hadoop can be used to traverse the data and scan it selectively, instead of scanning all the data to find the relevant information.
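The difference between scanning everything and scanning with an index can be sketched in a few lines. This is an illustration under simple assumptions, not Cassandra or Hadoop code: blocks are modelled as a dictionary of record lists, and `build_index`, `full_scan`, and `indexed_scan` are hypothetical helper names.

```python
from collections import defaultdict


def build_index(blocks):
    """Map each word to the set of block ids whose records contain it."""
    index = defaultdict(set)
    for block_id, records in blocks.items():
        for record in records:
            for word in record.lower().split():
                index[word].add(block_id)
    return index


def full_scan(blocks, word):
    """Read every block; return matching records and the number of blocks read."""
    hits = [
        record
        for records in blocks.values()
        for record in records
        if word in record.lower().split()
    ]
    return hits, len(blocks)


def indexed_scan(blocks, index, word):
    """Read only the blocks the index points to."""
    candidate_ids = sorted(index.get(word, ()))
    hits = [
        record
        for block_id in candidate_ids
        for record in blocks[block_id]
        if word in record.lower().split()
    ]
    return hits, len(candidate_ids)


if __name__ == "__main__":
    blocks = {
        "b0": ["error disk full", "ok"],
        "b1": ["ok", "ok"],
        "b2": ["warn", "error timeout"],
    }
    index = build_index(blocks)
    print(full_scan(blocks, "error"))            # reads all 3 blocks
    print(indexed_scan(blocks, index, "error"))  # reads only 2 blocks
```

Both calls return the same two matching records; the indexed version simply skips the block that cannot contain a match. On real data sets the saving grows with the share of blocks the index rules out.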

Got a question for us? Mention it in the comments section and we will get back to you.

