OLTP vs OLAP
OLTP (online transaction processing) describes a transactional data store against which users run many online transactions. It is characterized by frequent ad hoc reads and writes happening in real time.
OLAP (online analytical processing) is closer to an offline data store: data is accessed in bulk, in an offline fashion. For example, bulk log files are read and then written back to data files. Common uses of OLAP include log-processing jobs and data mining jobs.
Cassandra leans toward OLTP because it serves real-time workloads, whereas Hadoop leans toward OLAP because it is used for analytics and bulk writes.
Why Integrate OLAP & OLTP?
Suppose you are looking for the cheapest hotel booking price over the next 365 days. You have a huge data set in Cassandra and want recommendations served from the real-time database, for example to run a promotion based on price.
In such a scenario, we have to iterate over all records and run analytics on top of them, which is a huge offline job that has to be kicked off often. This is where Hadoop comes into play, for bulk data crunching.
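To make the "iterate over all records" step concrete, here is a minimal sketch of such a full scan in plain Python. The record layout `(hotel_id, day_offset, price)`, the sample data, and the function name `cheapest_per_hotel` are assumptions made for illustration only; a real job would read the rows in bulk from the data store and run distributed across a cluster.

```python
# Hypothetical records: (hotel_id, day_offset, price).
# In practice these rows would be read in bulk from the data store.
PRICES = [
    ("hotel_a", 0, 120.0),
    ("hotel_a", 1, 95.0),
    ("hotel_b", 0, 80.0),
    ("hotel_b", 1, 110.0),
    ("hotel_b", 400, 10.0),  # outside the 365-day window, ignored
]


def cheapest_per_hotel(records, horizon_days=365):
    """Scan every record once and keep the lowest price seen for each hotel."""
    best = {}
    for hotel_id, day_offset, price in records:
        # Skip anything outside the booking window we care about.
        if not 0 <= day_offset < horizon_days:
            continue
        current = best.get(hotel_id)
        if current is None or price < current[1]:
            best[hotel_id] = (day_offset, price)
    return best
```

Calling `cheapest_per_hotel(PRICES)` returns one `(day_offset, price)` pair per hotel. The point of the sketch is that every record is touched, which is why this kind of work is run as a batch job rather than inside a real-time request.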
A second benefit is that we can run a single cluster and avoid running a separate Hadoop cluster.
A third benefit is that this can reduce operational cost considerably.
Finally, if a user is already well versed in Hadoop ecosystem tools such as Hive and Pig Latin and needs to bring Cassandra data into them, Cassandra can be plugged in as a data source and MapReduce jobs can be run against it.
OLTP and OLAP show noticeably different access patterns. In OLTP, writes are comparatively few; hotel information, for example, changes rarely. Even when prices do change, reads far outnumber writes: there may be 1 write per second while reads run into the hundreds or thousands per second. The write-to-read ratio is therefore around 1:1000.
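As a small worked example of that ratio, the sketch below counts reads and writes in a one-second operation trace. The trace format (a list of `"R"` and `"W"` markers) and the function name `reads_per_write` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def reads_per_write(operations):
    """Return the number of reads per write in a trace of "R"/"W" markers."""
    counts = Counter(operations)
    writes, reads = counts["W"], counts["R"]
    if writes == 0:
        raise ValueError("trace contains no writes, ratio is undefined")
    return reads / writes


# One second of traffic: 1 write and 1000 reads.
one_second_trace = ["W"] + ["R"] * 1000
```

Here `reads_per_write(one_second_trace)` gives `1000.0`, which corresponds to the 1:1000 write-to-read ratio described above.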
Notably, Cassandra fits this model easily, and it also fits models where reads and writes are roughly equal. Moreover, with tunable consistency, the latency gap between eventually consistent and strongly consistent operations is on the order of milliseconds. Thus, Cassandra is a good fit for OLTP.
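The tunable consistency mentioned above comes down to a simple rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below only illustrates that arithmetic; the function names are hypothetical and it does not call any Cassandra driver.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority for the given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3                         # assumed replication factor for the example
q = quorum(RF)                 # 2 replicas out of 3
strong = is_strongly_consistent(q, q, RF)    # quorum reads and quorum writes
eventual = is_strongly_consistent(1, 1, RF)  # single-replica reads and writes
```

With a replication factor of 3, quorum reads and writes overlap (`strong` is `True`), while reading and writing a single replica does not (`eventual` is `False`), which is the eventually consistent end of the dial.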
OLAP workloads follow a different pattern, with many writes happening at once. In OLAP, we dump data in one shot, i.e., all log files are loaded into the data store, and only then do we start processing. The access pattern is exactly the opposite of that of an OLTP application. This is where Hadoop, or MapReduce, is useful.
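The load-everything-then-process pattern can be sketched as a tiny in-memory imitation of the map, shuffle, and reduce phases. This is plain Python, not Hadoop's API, and the log format (`timestamp level message`) and sample lines are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log lines, loaded in one shot before any processing starts.
LOG_LINES = [
    "2024-01-01T00:00:01 INFO booking created",
    "2024-01-01T00:00:02 ERROR payment failed",
    "2024-01-01T00:00:03 INFO booking created",
]


def map_phase(line):
    """Emit (log_level, 1) for each well-formed log line."""
    parts = line.split(maxsplit=2)
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over the whole data set."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Running `run_job(LOG_LINES)` counts log lines per level. In a real cluster the map and reduce functions run in parallel on many machines, but the shape of the computation is the same.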