Apache Hadoop, is a big data analytics framework, focusing on near-time and batch-oriented analytics of historical data. It helps to run analytics on high volumes of historical business data on commodity hardware. The Vanilla hadoop consists of a Distributed File System (DFS) also know as HDFS( Hadoop Distribted File System) at the core and also the libraries to support Map Reduce model to write programs and to do analysis. DFS is what enables Hadoop to be scalable. It takes care of splitting data into multiple nodes in a multi node cluster setup so that Map Reduce can work on individual data nodes thus enabling parallelism. Thus MapReduce is a programming paradigm for processing and handling larger data sets.
Apache Cassandra is a highly scalable, consistent, distributed and structured key-value store. It is not a conventional database but is more like Hashtable or HashMap which stores a key/value pair. Cassandra works on top of HDFS thus makes use of it for scaling.
BigTable makes use of a String Sorted Table (SSTable) inorder to store key/value pairs. It maintains an index which onsists of key and offset in the File for the key which enables reading of value for that key using only a seek to the offset location. SSTable is effectively immutable which means that after creating the File no modifications can be done to the existing key/value pairs but new key/value pairs can be/are appended to the file. Update and Delete of records, update with a newer key/value and deletion with a key and tombstone value are appended to the file. Duplicate keys are allowed in this file for SSTable. The index is also modified whenever update or delete takes place so that the offset for that key, points to the latest value or tombstone value are appended to the file.