Cassandra and Hadoop - realtime vs batch


What are the differences in the architecture and implementation of Cassandra and Hadoop that make Apache Cassandra suited to real-time transaction processing, whereas Hadoop is suited to batch-oriented analytical processing?

Mar 26, 2018 in Big Data Hadoop by kurt_cobain

Apache Hadoop is a big data analytics framework focused on batch-oriented analytics of historical data. It lets you run analytics on high volumes of historical business data using commodity hardware. Vanilla Hadoop consists of a distributed file system, HDFS (Hadoop Distributed File System), at its core, plus the libraries that support the MapReduce programming model for writing analysis programs. HDFS is what makes Hadoop scalable: it splits files into blocks and distributes them across the nodes of a multi-node cluster, so that MapReduce tasks can run on each node against its local data in parallel. MapReduce, in turn, is the programming paradigm for processing these large data sets: map tasks process each block independently, and reduce tasks combine their results. Because a job reads through whole data sets and launches many tasks, it has high latency, which makes it a good fit for batch processing rather than for serving individual requests.
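The split/map/shuffle/reduce flow can be illustrated with a minimal, single-machine word-count sketch. This is not Hadoop's API: splitting "blocks" by line count, the function names, and the use of threads as stand-ins for data nodes are all illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, lines_per_block=2):
    """Chunk the input, loosely like HDFS chunks a file into blocks (here: by line count)."""
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]


def map_block(block):
    """Map phase: emit (word, 1) for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle phase: group every emitted value by its key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: combine all values for one key."""
    return key, sum(values)


def word_count(lines, lines_per_block=2, workers=4):
    blocks = split_into_blocks(lines, lines_per_block)
    # Hypothetical stand-in: each worker thread plays a node processing its local block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    groups = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in groups.items())


if __name__ == "__main__":
    text = [
        "Hadoop stores data in HDFS",
        "MapReduce processes data in batches",
        "Cassandra serves data in real time",
    ]
    print(word_count(text))
```

Every call to `word_count` touches all of the input, which is why this style of processing is measured in job run time rather than per-request latency.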

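Apache Cassandra is a highly scalable, distributed, structured key-value store with tunable consistency. It is not a conventional relational database; it behaves more like a Hashtable or HashMap that stores key/value pairs, spread across many machines. Cassandra does not run on top of HDFS: each node stores its data on local disk using Cassandra's own storage engine, and the cluster scales by partitioning rows across a peer-to-peer ring of nodes according to the hash of their partition keys.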
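To make the "distributed HashMap" analogy concrete, here is a minimal sketch of how a distributed key-value store can decide which node holds a key, in the spirit of Cassandra's token ring (consistent hashing). It is a simplification under stated assumptions: the class names and node names are hypothetical, MD5 stands in for Cassandra's default Murmur3 partitioner, and replication and virtual nodes are not modeled.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring: a key belongs to the first node token >= the key's token."""

    def __init__(self, nodes):
        # Hypothetical placement: each node sits on the ring at the hash of its name.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is used only as a stable, evenly spread integer hash (assumption).
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's token to the next node, wrapping around."""
        idx = bisect.bisect_left(self._tokens, self._token(key))
        return self._ring[idx % len(self._ring)][1]


class DistributedMap:
    """HashMap-like get/put where each key lives in exactly one node's local dict."""

    def __init__(self, nodes):
        self.ring = TokenRing(nodes)
        self.storage = {n: {} for n in nodes}  # stands in for each node's local disk

    def put(self, key, value):
        self.storage[self.ring.node_for(key)][key] = value

    def get(self, key):
        return self.storage[self.ring.node_for(key)].get(key)


if __name__ == "__main__":
    store = DistributedMap(["node-a", "node-b", "node-c"])
    store.put("user:42", {"name": "Ada"})
    print(store.ring.node_for("user:42"), store.get("user:42"))
```

A single get or put is routed straight to one node by hashing the key, with no scan of the data set, which is the access pattern real-time workloads need.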

Reason: 

Cassandra's storage engine is modeled on Google's BigTable, which stores key/value pairs in Sorted String Tables (SSTables). Each SSTable keeps its keys in sorted order along with an index that maps each key to its offset in the file, so the value for a key can be read with a single seek to that offset. SSTables are immutable: once a file has been written, its existing key/value pairs are never modified. Instead, updates and deletes are written as new entries (a newer value, or a tombstone marking the key as deleted) to a commit log and an in-memory table (memtable), which is later flushed to disk as a new SSTable. The same key can therefore appear in several SSTables; a read returns the most recent value or tombstone, and background compaction periodically merges SSTables and discards superseded values. Because writes are sequential appends and reads need only index lookups and seeks rather than scans, Cassandra can serve individual reads and writes with low latency, which is what makes it suitable for real-time workloads.
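The immutable-table design can be sketched as a toy log-structured store in which updates and deletes never modify an existing table. This is not Cassandra's on-disk format or API: the JSON-lines records, the class names, the flush threshold, and using `None` as the tombstone are all assumptions made for illustration.

```python
import io
import json


class SSTable:
    """Toy immutable sorted table: one JSON record per line plus a key -> byte-offset index."""

    def __init__(self, entries):
        buf = io.BytesIO()
        self._index = {}
        for key, value in sorted(entries.items()):
            self._index[key] = buf.tell()  # byte offset where this key's record starts
            record = {"k": key, "v": value}  # value None is a tombstone (assumption)
            buf.write((json.dumps(record) + "\n").encode("utf-8"))
        self._blob = buf.getvalue()  # bytes objects cannot be modified afterwards

    def lookup(self, key):
        """Return (found, value) using one index lookup and one seek."""
        offset = self._index.get(key)
        if offset is None:
            return False, None
        f = io.BytesIO(self._blob)
        f.seek(offset)
        return True, json.loads(f.readline())["v"]

    def items(self):
        """Yield (key, value) pairs in sorted key order."""
        for line in io.BytesIO(self._blob):
            record = json.loads(line)
            yield record["k"], record["v"]


class LSMStore:
    """Writes go to an in-memory memtable; a full memtable is flushed to a new SSTable."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: a tombstone that shadows older values.
        self.put(key, None)

    def flush(self):
        if self.memtable:
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            found, value = table.lookup(key)
            if found:
                return value  # None here means the key was deleted
        return None

    def compact(self):
        """Merge every SSTable into one, keeping the newest values and dropping tombstones."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer tables overwrite
            merged.update(table.items())
        # Dropping tombstones is safe here only because all SSTables are merged;
        # Cassandra instead keeps them for gc_grace_seconds so replicas see the delete.
        live = {k: v for k, v in merged.items() if v is not None}
        self.sstables = [SSTable(live)] if live else []


if __name__ == "__main__":
    store = LSMStore(memtable_limit=2)
    store.put("user:1", "Ada")
    store.put("user:2", "Linus")  # memtable full: flushed to SSTable #1
    store.put("user:1", "Grace")  # newer value shadows the old one
    store.delete("user:2")        # tombstone; memtable flushed to SSTable #2
    print(store.get("user:1"), store.get("user:2"), len(store.sstables))
    store.compact()
    print(len(store.sstables))
```

Reads check the newest data first, so a tombstone hides older values until compaction removes both.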

answered Mar 26, 2018 by nitinrawat895

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.