In the RDBMS world, system tables are where the database keeps metadata about its tables. When you create a table and name its columns, those definitions are stored in the system tables. Similarly, Cassandra has a system keyspace that stores data about the other keyspaces. It holds metadata about the cluster along with some operational data: metadata for the local node, as well as hinted handoff information.
Metadata consists of
The system keyspace cannot be modified or edited.
The system keyspace has two column families. The first is the schema column family, which holds the schema definitions of the user keyspaces. The second is the migration column family, which records changes made to a keyspace.
The CommitLog is a crash-recovery mechanism that supports Cassandra’s durability goals. Cassandra writes to the commit log first, before writing to the Memtable. When the number of objects stored in the Memtable reaches a threshold, the contents of the Memtable are flushed to disk in a file called an SSTable.
Each CommitLog maintains an internal bit flag to indicate whether it needs flushing. Once a Memtable has been flushed to disk as an SSTable, it is immutable and cannot be changed by the application.
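The write path described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not Cassandra's actual implementation: the class name `ToyNode`, the `flush_threshold` parameter, and the idea of clearing the whole log on every flush are all hypothetical.

```python
class ToyNode:
    """Toy model of the write path: commit log first, then Memtable, then flush."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: entries per Memtable
        self.commit_log = []  # append-only record of writes not yet flushed
        self.memtable = {}    # in-memory and mutable
        self.sstables = []    # flushed snapshots, never modified afterwards

    def write(self, key, value):
        # Durability first: record the write before touching the Memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the Memtable as a sorted, immutable tuple (the "SSTable").
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed writes no longer need replaying.
        self.commit_log.clear()

    def recover(self):
        # After a crash the Memtable is gone; rebuild it from the commit log.
        self.memtable = dict(self.commit_log)
```

For example, writing two keys, wiping `memtable` to simulate a crash, and calling `recover()` restores both entries, because they were still in the commit log.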
Compaction is the process of freeing up space by merging accumulated data files. It merges several SSTables into one: the keys are merged, the columns are combined, and tombstones (soft deletes) are discarded before a new index is created. Cassandra supports multiple types of compaction:
Read-only compaction – This happens while reading the data.
Major compaction – When compaction is carried out at the keyspace level, all of the keyspace's column families are compacted.
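As a rough illustration of the merge step, the sketch below combines several SSTables into one, keeps the newest cell for each key, and drops tombstones. The dictionary layout, the `TOMBSTONE` sentinel, and the `compact` function are assumptions made for this example. Real Cassandra also keeps tombstones around for a grace period (`gc_grace_seconds`) before discarding them, which this sketch ignores.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker


def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per key.

    Each SSTable is modeled as a dict: key -> (timestamp, value).
    """
    newest = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # Write out keys in sorted order and discard deleted entries.
    return {
        key: cell
        for key, cell in sorted(newest.items())
        if cell[1] is not TOMBSTONE
    }
```

Given an older table containing `alice` and `bob`, and a newer one that updates `alice` and holds a tombstone for `bob`, the compacted result contains only the updated `alice` cell.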
Bloom filters are used as a performance booster. They are fast, probabilistic data structures for testing whether an element is a member of a set. Because they reside in memory, they act as a special kind of cache that allows quick look-ups. They can return false positives but never false negatives, so Cassandra consults them before accessing the disk: if the filter says a key is absent, the disk read can be skipped.
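A minimal Bloom filter can be sketched as follows. The sizes, the use of SHA-256 with a seed prefix, and the names `BloomFilter` and `might_contain` are choices made for this example only and are not how Cassandra implements its filters.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

A read path would call `might_contain(key)` for each SSTable's filter and touch the disk only for those that answer `True`.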
Tombstones are analogous to soft deletes in the traditional RDBMS world. A tombstone is a deletion marker that is required to suppress older data in SSTables until compaction can run.
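To see how a tombstone suppresses older data before compaction has run, consider this small sketch of a read that consults several SSTables. The data layout, the `TOMBSTONE` sentinel, and the `read` function are hypothetical and greatly simplified.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker


def read(key, sstables):
    """Return the live value for key, or None if it is absent or deleted.

    Each SSTable is modeled as a dict: key -> (timestamp, value).
    """
    latest = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (latest is None or cell[0] > latest[0]):
            latest = cell
    # A tombstone that is newer than every value hides the older data.
    if latest is None or latest[1] is TOMBSTONE:
        return None
    return latest[1]
```

If an older SSTable still holds a value for `bob` and a newer one holds a tombstone for `bob`, `read` returns `None` even though the old value is still physically on disk.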
A snitch determines which data centers and racks are written to and read from. There are three types of snitches: the simple snitch, the dynamic snitch and the rack-inferring snitch.
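As an illustration of the rack-inferring approach, the sketch below derives a node's location from its IPv4 address, treating the second octet as the data center and the third octet as the rack. The function name `infer_location` and the returned dictionary shape are assumptions for this example.

```python
import ipaddress


def infer_location(ip):
    """Infer data center and rack from an IPv4 address.

    Second octet -> data center, third octet -> rack.
    """
    octets = ipaddress.IPv4Address(ip).packed  # four raw bytes
    return {"datacenter": str(octets[1]), "rack": str(octets[2])}
```

With this scheme, `10.20.30.40` and `10.20.30.41` land in the same rack, while `10.20.31.40` is in the same data center but a different rack.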