Cassandra and Hadoop - realtime vs batch

+1 vote

What differences in the architecture and implementation of Cassandra and Hadoop account for Apache Cassandra being suited to real-time transaction processing, while Hadoop is suited to batch-oriented analytical processing?

Mar 26, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
1,473 views

1 answer to this question.

0 votes

Apache Hadoop is a big data analytics framework focused on near-real-time and batch-oriented analytics of historical data. It helps to run analytics on high volumes of historical business data on commodity hardware. Vanilla Hadoop consists of a distributed file system (DFS), known as HDFS (Hadoop Distributed File System), at the core, together with the libraries that support the MapReduce model for writing analysis programs. The DFS is what enables Hadoop to scale. It takes care of splitting the data and distributing it across the nodes of a multi-node cluster, so that MapReduce can work on the data held by each individual data node, which enables parallelism. MapReduce itself is a programming paradigm for processing large data sets.
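To make the MapReduce paradigm concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and each inner list stands in for one split of data that a real cluster would process on a separate node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split is mapped independently; on a cluster these calls
    # would run in parallel on the nodes that hold the data.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


# Two hypothetical splits, as if stored on two different data nodes.
splits = [["big data big"], ["data on hadoop"]]
counts = word_count(splits)
print(counts)
```

Because the map step only looks at its own split, the work parallelises naturally, but the result is only available once every split has been processed. That is why this model fits batch jobs rather than single-record lookups.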

Apache Cassandra is a highly scalable, distributed, structured key-value store with tunable (by default eventual) consistency. It is not a conventional relational database; it is more like a Hashtable or HashMap that stores key/value pairs. Cassandra does not run on top of HDFS: each node stores its data on its own local disk, and the cluster scales by partitioning keys across its nodes.

Reason: 

Cassandra's storage layer follows the design of Google's BigTable, which uses a Sorted String Table (SSTable) to store key/value pairs. An SSTable maintains an index consisting of each key and the offset of that key in the file, so the value for a key can be read with a single seek to the offset location. An SSTable is effectively immutable: once the file is created, existing key/value pairs are never modified in place, and new key/value pairs are only ever appended (in Cassandra, by flushing in-memory writes to new SSTable files). Updates and deletes are handled the same way: an update appends the key with its newer value, and a delete appends the key with a tombstone value. Duplicate keys are therefore allowed. The index is also updated whenever an update or delete takes place, so that the offset for a key points to its latest value or tombstone.

Because every write is a sequential append rather than an in-place modification, and every read is an index lookup followed by a seek, individual reads and writes stay fast, which is what suits Cassandra to real-time workloads. A Hadoop MapReduce job instead scans large volumes of data in HDFS, which suits batch analytics.
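The append-plus-index idea described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's or BigTable's actual implementation: the class name `AppendOnlyStore`, the record layout, and the in-memory `BytesIO` "file" are all assumptions made for illustration, and real systems add pieces such as in-memory buffering, multiple files, and compaction.

```python
import io
import struct

# Hypothetical record layout: key length, value length, tombstone flag.
HEADER = struct.Struct(">IIB")


class AppendOnlyStore:
    """Toy append-only key/value file with a key -> offset index."""

    def __init__(self):
        self._file = io.BytesIO()   # stands in for a file on disk
        self._index = {}            # key -> offset of its latest record
        self.record_count = 0       # total records ever appended

    def _append(self, key, value, tombstone):
        key_bytes = key.encode("utf-8")
        offset = self._file.seek(0, io.SEEK_END)  # always write at the end
        header = HEADER.pack(len(key_bytes), len(value), int(tombstone))
        self._file.write(header + key_bytes + value)
        self._index[key] = offset   # index now points at the newest record
        self.record_count += 1

    def put(self, key, value):
        """Insert or update: both simply append a new record."""
        self._append(key, value, tombstone=False)

    def delete(self, key):
        """Delete by appending a tombstone record for the key."""
        self._append(key, b"", tombstone=True)

    def get(self, key):
        """Read the latest value with one index lookup and one seek."""
        offset = self._index.get(key)
        if offset is None:
            return None
        self._file.seek(offset)
        key_len, value_len, tombstone = HEADER.unpack(
            self._file.read(HEADER.size)
        )
        if tombstone:
            return None
        self._file.seek(key_len, io.SEEK_CUR)  # skip over the stored key
        return self._file.read(value_len)


store = AppendOnlyStore()
store.put("user:1", b"alice")
store.put("user:1", b"alicia")  # update: a second record for the same key
store.put("user:2", b"bob")
store.delete("user:2")          # delete: a tombstone record is appended

print(store.get("user:1"), store.get("user:2"), store.record_count)
```

Note that the old `alice` and `bob` records are still in the file after the update and the delete; only the index changed. Nothing is rewritten on the write path, so writes cost one sequential append, and a read costs one dictionary lookup plus one seek.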

answered Mar 26, 2018 by nitinrawat895
• 11,380 points
