Cassandra and Hadoop - realtime vs batch

Question

What differences in the architecture and implementation of Cassandra and Hadoop make Apache Cassandra suited to real-time transaction processing, while Hadoop is suited to batch-oriented analytical processing?

nitinrawat895 · Answer

Apache Hadoop is a big data analytics framework focused on batch-oriented and near-real-time analytics of historical data. It lets you run analytics over high volumes of historical business data on commodity hardware. Vanilla Hadoop consists of a distributed file system, HDFS (Hadoop Distributed File System), at its core, plus the libraries that support the MapReduce model for writing analysis programs. HDFS is what makes Hadoop scalable: it splits data into blocks and spreads them across the nodes of a multi-node cluster, so MapReduce tasks can run on each DataNode against its local data in parallel. MapReduce is therefore a programming paradigm for processing very large data sets. A job reads through its whole input before producing results, which favors throughput over the latency of any single record.

Apache Cassandra is a highly scalable, eventually consistent (with tunable consistency), distributed, structured key-value store. It is not a conventional relational database; it behaves more like a Hashtable or HashMap that stores key/value pairs. Cassandra does not run on top of HDFS. It has its own storage engine and distributes data across nodes itself, using a peer-to-peer design in which a hash of each row's partition key determines which nodes store it.

Reason: Cassandra's storage engine is modeled on Google's BigTable, which stores key/value pairs in Sorted String Tables (SSTables). An SSTable keeps an index of keys and their offsets in the file, so the value for a key can be read with a single seek to that offset. SSTables are effectively immutable: once a file is written, existing key/value pairs are never modified in place. Changes are written as new entries instead: an update writes the key with a newer value, and a delete writes the key with a tombstone value. As a result the same key can appear more than once, and the index is updated on every update or delete so that the key's offset points to its latest value or tombstone. (In Cassandra and BigTable these newer entries land in newer SSTables, and background compaction later merges them and discards superseded values.) Because writes are sequential appends and a single-key read needs an index lookup and a seek rather than a scan, Cassandra can serve individual reads and writes with low latency, which is what suits it to real-time transaction processing.
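A minimal single-process sketch of the MapReduce model described above, using the classic word-count example. This is plain Python, not Hadoop's Java `Mapper`/`Reducer` API: the input "splits" are hypothetical in-memory lists standing in for HDFS blocks, and a thread pool stands in for map tasks running on separate DataNodes. Note that every record in every split is read before any result is available, which is the batch behaviour the question asks about.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: List[str]) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every line of one split."""
    return [(word, 1) for line in split for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return word, sum(counts)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each split is mapped independently (stand-in for parallel map tasks).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    groups = shuffle(mapped)
    return dict(reduce_phase(w, c) for w, c in sorted(groups.items()))


# Hypothetical input: two "splits", as if HDFS had placed them on two nodes.
splits = [
    ["hadoop stores data in hdfs", "cassandra stores data in sstables"],
    ["hadoop runs batch jobs"],
]
result = word_count(splits)
print(result["hadoop"], result["stores"], result["batch"])  # 2 2 1
```

Because each split is mapped without looking at the others, adding nodes adds map parallelism; the cost is that the job only finishes once the whole input has been processed.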

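To illustrate the SSTable behaviour described in the answer (append-only writes, tombstones for deletes, and an index of key offsets for single-seek reads), here is a toy append-only key/value file. It sketches the answer's simplified model only, not Cassandra's actual storage engine: the class name `AppendOnlyKV`, the record layout, and the tombstone flag are assumptions made for illustration, and real Cassandra also involves a commit log, memtables, multiple immutable SSTables, and compaction.

```python
import io
import struct
from typing import Dict, Optional

# Hypothetical record header: flag byte, key length, value length (big-endian, no padding).
HEADER = struct.Struct(">BII")
LIVE, TOMBSTONE = 0, 1


class AppendOnlyKV:
    """Toy SSTable-like store: records are only appended, never rewritten.

    An in-memory index maps each key to the byte offset of its newest record,
    so reads seek straight to the latest value or tombstone.
    """

    def __init__(self) -> None:
        self._file = io.BytesIO()           # stands in for a data file on disk
        self._index: Dict[bytes, int] = {}  # key -> offset of newest record

    def _append(self, key: bytes, value: bytes, flag: int) -> None:
        self._file.seek(0, io.SEEK_END)     # writes always go to the end
        offset = self._file.tell()
        self._file.write(HEADER.pack(flag, len(key), len(value)))
        self._file.write(key)
        self._file.write(value)
        self._index[key] = offset           # repoint the index at the newest record

    def put(self, key: bytes, value: bytes) -> None:
        self._append(key, value, LIVE)      # an update is just another append

    def delete(self, key: bytes) -> None:
        self._append(key, b"", TOMBSTONE)   # a delete appends a tombstone

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self._index.get(key)
        if offset is None:
            return None
        self._file.seek(offset)             # single seek, no scan
        flag, key_len, val_len = HEADER.unpack(self._file.read(HEADER.size))
        self._file.seek(key_len, io.SEEK_CUR)  # skip the stored key
        value = self._file.read(val_len)
        return None if flag == TOMBSTONE else value

    def record_count(self) -> int:
        """Scan the whole file, counting every record including superseded ones."""
        self._file.seek(0)
        count = 0
        while True:
            header = self._file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            _, key_len, val_len = HEADER.unpack(header)
            self._file.seek(key_len + val_len, io.SEEK_CUR)
            count += 1
        return count


store = AppendOnlyKV()
store.put(b"user:1", b"alice")
store.put(b"user:1", b"alicia")  # update: appended, old record left in place
store.put(b"user:2", b"bob")
store.delete(b"user:2")          # delete: tombstone appended
print(store.get(b"user:1"))      # b'alicia'
print(store.get(b"user:2"))      # None
print(store.record_count())      # 4 (superseded records are still in the file)
```

In this toy, `get` costs one dictionary lookup and one seek no matter how many records the file holds, while `record_count` has to read the whole file. That mirrors the point-lookup versus full-scan difference between the two systems.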