Introduction to Cassandra Architecture

Cassandra Architecture

In the RDBMS world, system tables hold the metadata about user tables: when you create a table and name its columns, those definitions are stored in the system tables. Cassandra has an analogous construct, the system keyspace, which stores data about the other keyspaces. It holds metadata about the cluster and the local node, along with some operational data such as hinted handoff information.

The metadata consists of:

The node's token – This identifies the key range the node is responsible for. In a simplified example where a node handles keys 1 to 10, the node is identified by a single token that marks that range.
The cluster name – A node can only join a cluster whose name matches its own. If two nodes are configured with different cluster names, attempts to bring them into the same cluster will fail.
Keyspace and schema definitions to support dynamic loading – Metadata about the keyspaces and their columns, kept so that schemas can be loaded dynamically.
Migration data – Any configuration change made to a keyspace is recorded here. For example, if the replication factor is changed from 1 to 2 and then from 2 to 3, those changes are stored as migration data.
Whether or not the node is bootstrapped – Bootstrapping is the process of bringing a new node into the cluster. A new node starts out unaware of the cluster configuration and does not know what the cluster looks like; at this stage it is sometimes informally called a dumb node. It contacts the seed nodes to learn about the cluster, and once it is bootstrapping it starts copying data from the other nodes.

The system keyspace cannot be modified or edited.

The system keyspace has two column families. The first is the schema column family, which holds the schema definitions for the user keyspaces. The second is the migration column family, which records changes made to a keyspace.

CommitLog, Memtable, SSTable

The CommitLog is a crash-recovery mechanism that supports Cassandra’s durability goals. Cassandra writes to the CommitLog first, before writing to the Memtable. When the number of objects stored in the Memtable reaches a threshold, the contents of the Memtable are flushed to disk in a file called an SSTable.

Each CommitLog maintains an internal bit flag to indicate whether it needs flushing. Once a Memtable is flushed to disk as an SSTable, it is immutable and cannot be changed by the application.
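The write path described above can be sketched as a toy model. This is an illustration only, not Cassandra's implementation: the class name `WritePath`, the `flush_threshold` parameter, and the choice to clear the log on flush (instead of tracking a per-log flag) are all assumptions made for the example.

```python
class WritePath:
    """Toy model of the write path: commit log, then Memtable, then SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record, replayed after a crash
        self.memtable = {}     # in-memory writes that are not yet on disk
        self.sstables = []     # flushed, immutable tables (oldest first)
        self.flush_threshold = flush_threshold  # hypothetical object-count limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then update the Memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the Memtable as a sorted tuple so it cannot be modified later.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed writes no longer need replaying, so drop them.
        self.commit_log.clear()

    def recover(self):
        """Rebuild the Memtable by replaying the commit log."""
        self.memtable = dict(self.commit_log)
```

Writing two keys, discarding the Memtable to simulate a crash, and calling `recover()` restores both keys from the log. A third write then reaches the threshold and produces one immutable SSTable.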

Compaction and Bloom filters

Compaction is the process of freeing up space by merging large accumulated data files. It merges multiple SSTables into one: the keys are merged, the columns are combined, and tombstones (soft deletes) are discarded before a new index is created. Cassandra supports multiple types of compaction:

Read-only compaction – This happens while reading the data.

Major compaction – When a keyspace-level compaction is carried out, all the column families get compacted.
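The merge step can be illustrated with a minimal sketch. It assumes each SSTable is a tuple of `(key, value)` pairs and that a deleted key is marked with a sentinel; the `TOMBSTONE` sentinel and the `compact` function are hypothetical names for this example, and the sketch drops tombstones immediately, whereas a real system may retain them for some time before discarding them.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker


def compact(sstables):
    """Merge SSTables (ordered oldest first) into one, discarding tombstones."""
    merged = {}
    for table in sstables:
        for key, value in table:
            merged[key] = value  # a newer table overwrites an older one
    live = [(k, v) for k, v in merged.items() if v is not TOMBSTONE]
    return tuple(sorted(live, key=lambda pair: pair[0]))


older = (("a", 1), ("b", 2))
newer = (("b", TOMBSTONE), ("c", 3))
result = compact([older, newer])  # "b" was deleted in the newer table
```

Here `result` contains only the keys `a` and `c`: the tombstone in the newer table suppresses the older value of `b`, and the marker itself is removed during the merge.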

Bloom filters are used as a performance booster. They are fast, nondeterministic algorithms for testing whether an element is a member of a set. Because they reside in memory, they act as a special kind of cache that allows quick lookups. A Bloom filter can return a false positive but never a false negative, so it is consulted before accessing the disk: if the filter reports that a key is absent, the disk read can be skipped.
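A minimal Bloom filter sketch shows why false negatives cannot occur. The sizes, the SHA-256 based hashing scheme, and the method names below are assumptions chosen for readability, not what Cassandra uses internally.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

Every added item sets all of its bit positions, so a later lookup for that item always finds them set. An item that was never added can still map onto bits set by other items, which is where false positives come from.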

Tombstone and Snitches

Tombstones are analogous to soft deletes in the traditional RDBMS world. Instead of removing data immediately, Cassandra writes a tombstone: a deletion marker that is required to suppress older data in SSTables until compaction can run.
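How a tombstone suppresses older data at read time can be sketched as follows. The example assumes each SSTable is a plain dictionary and that tables are ordered oldest first; `TOMBSTONE` and `read` are hypothetical names used only for illustration.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker


def read(key, sstables):
    """Return the newest value for key, or None if it is missing or deleted."""
    for table in reversed(sstables):  # check the newest table first
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None


older = {"a": 1, "b": 2}
newer = {"a": TOMBSTONE}  # "a" was deleted after the older table was flushed
```

With both tables present, `read("a", [older, newer])` returns `None` even though the older table still holds a value for `a`, because the marker in the newer table is found first.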


A snitch determines which data centers and racks are written to and read from. There are three types of snitches: simple, dynamic, and rack-inferring.
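As a rough illustration of the rack-inferring idea, the sketch below derives a location from a node's IPv4 address. It assumes the convention that the second octet identifies the data center and the third octet identifies the rack; the function name `infer_location` and the returned dictionary shape are hypothetical.

```python
def infer_location(ip_address):
    """Infer data center and rack from an IPv4 address (assumed convention)."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip_address!r}")
    # Assumption: second octet = data center, third octet = rack.
    return {"datacenter": octets[1], "rack": octets[2]}


location = infer_location("10.20.30.40")
```

Under this convention, the node at `10.20.30.40` is placed in data center `20`, rack `30`, so two nodes that share their second and third octets are treated as being on the same rack.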
