Top Apache Cassandra Interview Questions You Must Prepare For In 2016
According to a 2015 survey by Dice.com, Apache Cassandra ranks No. 2 among the technologies with the highest-paying salaries. The survey puts the average salary for the skill at $147,811, second only to SAP’s HANA (High Performance Analytical Appliance). The 2015 database rankings by DB-Engines show that Apache Cassandra had the highest percentage increase in popularity: it gained 32.2 scoring points and secured third place overall. With a significant skill-demand gap around the world, Apache Cassandra can be the turning point in your career. To help you take advantage of these career opportunities and be better prepared for your Apache Cassandra job interview, we have compiled a list of the top 25 frequently asked Cassandra interview questions. If you have other questions that you think should be included in this list, kindly post them in the comments section.
Apache Cassandra Interview Questions:
Q1: How many types of NoSQL databases are there?
Answer: There are four types of NoSQL databases, namely:
- Document Stores (MongoDB, Couchbase)
- Key-Value Stores (Redis, Voldemort)
- Column Stores (Cassandra)
- Graph Stores (Neo4j, Giraph)
Q2: What do you understand by Commit log in Cassandra?
Answer: Commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
Q3: Define Mem-table in Cassandra.
Answer: It is a memory-resident data structure. After the write is recorded in the commit log, the data is written to the mem-table. The mem-table is an in-memory, write-back cache that holds content in key and column format. The data in the mem-table is sorted by key, and each column family has its own mem-table, from which column data is retrieved by key. It stores writes until it is full and is then flushed to disk.
Q4: What is SSTable?
Answer: SSTable, or ‘Sorted String Table’, is an important data file in Cassandra. Memtables are regularly flushed to disk as SSTables, and SSTables exist for each Cassandra table. Being immutable, SSTables do not allow data items to be added or removed once written. For each SSTable, Cassandra also creates three supporting structures: a partition index, a partition summary and a bloom filter.
Q5: What is bloom filter?
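Answer: A bloom filter is an off-heap, probabilistic data structure used to check whether an SSTable may contain the requested data before performing any disk I/O. It can report that data is definitely not in the SSTable, or that it is possibly there, so occasional false positives are expected but false negatives are not.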
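To illustrate the idea, here is a minimal sketch of a bloom filter in Python. It is a toy model and not Cassandra's implementation: the class name, the bit-array size, the number of hashes and the salted SHA-256 scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """A toy bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits          # hypothetical size, not a Cassandra default
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(partition_key)
```

A read would call `might_contain` first and skip the SSTable entirely whenever it returns False, which is where the saved disk I/O comes from.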
Q6: Establish the difference between a node, cluster & data centres in Cassandra.
Answer: A node is a single machine running Cassandra.
A cluster is a collection of nodes that hold similar types of data grouped together.
Data centres are useful when serving customers in different geographical areas. The nodes of a cluster are grouped into different data centres.
Q7: Define composite type in Cassandra?
Answer: In Cassandra, a composite type lets you define a key or a column name as a concatenation of values of different types. Composite types can be used in two places:
- Row Key
- Column Name
Q8: What is Cassandra Data Model?
Answer: Cassandra Data Model consists of four main components, namely:
- Cluster: These are made up of multiple nodes and keyspaces.
- Keyspace: It is a namespace that groups multiple column families, typically one keyspace per application.
- Column: It consists of a column name, value and timestamp
- Column family: This refers to multiple columns with row key reference.
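The nesting of these components can be pictured with plain Python structures. This is only an illustrative sketch: the keyspace, column family, row key and column values below are made-up names, and real storage is far more involved.

```python
from collections import namedtuple

# A column holds a name, a value and a write timestamp.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# keyspace -> column family -> row key -> {column name: Column}
cluster = {
    "shop": {                      # keyspace (hypothetical name)
        "users": {                 # column family
            "user:1": {            # row key
                "email": Column("email", "a@example.com", 1000),
                "city": Column("city", "Pune", 1000),
            }
        }
    }
}


def read_column(keyspace, column_family, row_key, column_name):
    """Walk the nesting from keyspace down to a single column value."""
    return cluster[keyspace][column_family][row_key][column_name].value
```

For example, `read_column("shop", "users", "user:1", "city")` walks keyspace, column family and row key before returning the column's value.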
Q9: Explain what is a keyspace in Cassandra?
Answer: In Cassandra, a keyspace is a namespace that determines how data is replicated across nodes. A cluster typically contains one keyspace per application.
Q10: Elaborate on CQL?
Answer: A user can access Cassandra through its nodes using the Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers work with CQL either through cqlsh, an interactive prompt, or through separate application language drivers.
Q11: Talk about the concept of tunable consistency in Cassandra.
Answer: Tunable Consistency is a characteristic that makes Cassandra a favored database choice of Developers, Analysts and Big Data Architects. Consistency refers to how up to date and synchronized a row of data is across all of its replicas. Cassandra’s Tunable Consistency allows users to select the consistency level best suited to their use case. It supports two kinds of consistency: Eventual Consistency and Strong Consistency.
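A common rule of thumb is that reads see the latest write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The sketch below checks that rule; the function names are hypothetical helpers, not part of any Cassandra API.

```python
def quorum(replication_factor):
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Reads overlap the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# QUORUM writes with QUORUM reads: 2 + 2 > 3, so the replica sets overlap.
quorum_case = is_strongly_consistent(quorum(rf), quorum(rf), rf)
# ONE write with ONE read: 1 + 1 > 3 is false, so only eventual consistency.
one_case = is_strongly_consistent(1, 1, rf)
```

With a replication factor of 3, quorum reads and writes give strong consistency, while reading and writing at a single replica trades that guarantee for lower latency.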
Q12: What are the three components of Cassandra write?
Answer: The three components are:
- Commitlog write
- Memtable write
- SStable write
Cassandra first writes data to the commit log, then to an in-memory table structure called the memtable, and finally to an SSTable on disk.
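The three steps can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `ToyNode` class, the tiny flush threshold and the in-memory lists standing in for files on disk are not how Cassandra is implemented.

```python
class ToyNode:
    """Toy model of the write path: commit log, memtable, then SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only, used for crash recovery
        self.memtable = {}        # in-memory, holds recent writes
        self.sstables = []        # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log entry first
        self.memtable[key] = value             # 2. then the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. full memtable -> SSTable

    def flush(self):
        # Freeze the memtable as a key-sorted tuple and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode()
node.write("b", 1)
node.write("a", 2)   # reaches the limit and triggers a flush
node.write("c", 3)   # stays in the new memtable
```

After these three writes, the first two keys sit in one sorted, immutable SSTable, while the third is still held in the memtable and the commit log.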
Q13: Explain zero consistency.
Answer: In zero consistency the write operations will be handled in the background, asynchronously. It is the fastest way to write data.
Q14: Mention what are the values stored in the Cassandra Column?
Answer: There are three values in a Cassandra Column. They are:
- Column Name
- Value
- Time Stamp
Q15: What do you understand by Kundera?
Answer: Kundera is an object-relational mapping (ORM) implementation for Cassandra which is written using Java annotations.
Q16: What is the concept of SuperColumn in Cassandra?
Answer: Cassandra SuperColumn is a unique element consisting of similar collections of data. They are actually key-value pairs with values as columns. It is a sorted array of columns, and they follow a hierarchy when in action.
Q17: When do you have to avoid secondary indexes?
Answer: Avoid secondary indexes on columns that contain a high count of unique values (high cardinality), because each query on such an index returns very few results for the work it requires.
Q18: List the steps in which Cassandra writes changed data into commitlog?
Answer: Cassandra appends the changed data to the commitlog, which then acts as a crash-recovery log for that data. A write operation is not considered successful until the changed data has been appended to the commitlog.
Q19: What is the use of “ResultSet execute(Statement statement)” method?
Answer: This method is used to execute a query. It requires a statement object.
Q20: What is Thrift?
Answer: Thrift is the name of the Remote Procedure Call (RPC) client used to communicate with the Cassandra server.
Q21: Explain the two types of compactions in Cassandra.
Answer: Compaction is a maintenance process in Cassandra in which SSTables are reorganized to optimize the data structures on disk. There are two types of compaction in Cassandra:
- Minor compaction: It starts automatically when a new SSTable is created. Here, Cassandra condenses all the equally sized SSTables into one.
- Major compaction: It is triggered manually using nodetool. It compacts all SSTables of a ColumnFamily into one.
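The core of either kind of compaction is a merge that keeps only the newest version of each key. The sketch below shows that merge under simplifying assumptions: each SSTable is modelled as a dict of key to (value, timestamp), the `compact` function is a hypothetical helper, and deletion markers are left out.

```python
def compact(sstables):
    """Merge SSTables, keeping the newest version of each key.

    Each SSTable is a dict of key -> (value, timestamp).
    """
    merged = {}
    for sstable in sstables:
        for key, (value, timestamp) in sstable.items():
            # Last write wins: keep the entry with the highest timestamp.
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    # Return the result sorted by key, as a single new SSTable.
    return dict(sorted(merged.items()))


old = {"a": ("v1", 10), "b": ("v1", 11)}
new = {"a": ("v2", 20), "c": ("v1", 21)}
compacted = compact([old, new])
```

Here the stale version of key "a" is dropped, so the merged table is smaller than the two inputs combined and a later read has one file to consult instead of two.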
Q22: Explain what is Cassandra-Cqlsh?
Answer: Cassandra-Cqlsh is a command-line shell that lets users communicate with the database using CQL. By using Cassandra cqlsh, you can do the following things:
- Define a schema
- Insert data, and
- Execute a query
Q23: What is the use of “void close()” method?
Answer: This method is used to close the current session instance.
Q24: What are the collection data types provided by CQL?
Answer: There are three collection data types:
- List : A list is a collection of one or more ordered elements.
- Map : A map is a collection of key-value pairs.
- Set : A set is a collection of one or more unique elements.
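For readers who think in application code, the three collection types behave much like Python's built-in list, set and dict. The snippet below is only an analogy with made-up values, not CQL syntax.

```python
# Python analogues of the three CQL collection types (values are made up).
emails = ["a@example.com", "b@example.com", "a@example.com"]  # list: ordered, duplicates kept
tags = {"admin", "beta", "admin"}                             # set: duplicates collapse
phones = {"home": "555-0100", "work": "555-0101"}             # map: key -> value
```

The list keeps all three entries in insertion order, the set ends up with two distinct tags, and the map is looked up by key.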
Q25: Describe Replication Factor?
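Answer: Replication Factor is the number of copies of each piece of data that exist in the cluster. When authentication is enabled, it is important to increase the replication factor of the system_auth keyspace so that you can still log into the cluster if a node is down.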
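One way to picture the number of copies is a ring of nodes where each key is stored on its owning node and on the next nodes clockwise. The sketch below is loosely modelled on that simple ring placement; the `replicas_for` helper, the token values and the node names are assumptions for illustration.

```python
def replicas_for(key_token, ring, replication_factor):
    """Pick the owning node and the next RF-1 nodes clockwise on the ring.

    `ring` is a list of (token, node_name) pairs sorted by token.
    """
    # First node whose token is >= the key's token; wrap to index 0 otherwise.
    start = next((i for i, (token, _) in enumerate(ring) if token >= key_token), 0)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
```

With this ring, `replicas_for(30, ring, 3)` places the three copies on node-c, node-d and node-a, so the data stays readable if any one of them is unavailable.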
Got a question for us? Please mention it in the comments section and we will get back to you.