How do I integrate kdb+ and Hadoop?

0 votes
How can I integrate kdb+ with Hadoop, the way OpenTSDB/MongoDB/Cassandra are integrated?

kdb+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?

Can anyone resolve this query?

Jul 12, 2018 in Big Data Hadoop by shubham
• 7,340 points

2 answers to this question.

0 votes

1. Can Kx work with HDFS?

Yes. However, it is unlikely to be chosen as an approach. The reasons the analytics industry is moving away from HDFS as a construct for analytics apply to Kx as well. Throughput and latency of read/write operations over HDFS are much worse than with embedded storage or a distributed object or file system, even when using the same volume of storage equipment. Some of HDFS's performance degradation for Kx can be slightly mitigated by layering HDFS over traditional file systems such as Lustre, GPFS or the MapR file system. Note that if the HDFS layer is implemented on top of another distributed file system, that opens up the possibility of using the underlying system's (perhaps more efficient) methods to read/write data into Kx, which makes the HDFS layer somewhat unnecessary.

2. Can Kx ingest data directly from HDFS sources? 

Yes. This is a much more likely scenario for a sophisticated user of Kx's kdb+ database. Kdb+ has interfaces for a wide range of ingest sources and languages, including the ability to ingest HDFS files via the Hadoop utilities. For example, the output of `hadoop fs -cat` could be piped into a FIFO and read through q's named-pipe support.
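A minimal sketch of that named-pipe pattern, in Python rather than q so it is easy to try locally. A hard-coded CSV payload and the `/tmp/hdfs_feed` path stand in for the real HDFS stream; in production the producer would be `hadoop fs -cat` and the consumer would be q (e.g. streaming the pipe with `.Q.fps`):

```python
import os
import threading

FIFO = "/tmp/hdfs_feed"  # hypothetical pipe name; q would read this path
if os.path.exists(FIFO):
    os.remove(FIFO)
os.mkfifo(FIFO)

# In production the producer would be the Hadoop CLI, e.g.
#   hadoop fs -cat /data/trades.csv > /tmp/hdfs_feed &
# Here a hard-coded CSV payload stands in for the HDFS stream.
def produce():
    with open(FIFO, "w") as pipe:
        pipe.write("sym,price\nAAPL,170.1\nMSFT,330.5\n")

writer = threading.Thread(target=produce)
writer.start()

# The consumer reads the pipe line by line, much as q's .Q.fps
# would process the FIFO in chunks.
rows = []
with open(FIFO) as pipe:
    for line in pipe:
        rows.append(line.rstrip("\n"))

writer.join()
os.remove(FIFO)
print(rows)  # header line followed by two data lines
```

The point of the FIFO is that neither side needs to materialise the whole HDFS file on local disk; the Hadoop utility and the consumer stream through it concurrently.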

3. What about MapReduce with kdb+?

Use of the MapReduce model is inherent in kdb+. It manifests not only across a distributed, networked architecture but can also efficiently span shared memory when running many threads on one server.
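To illustrate the thread-spanning case, here is a hedged Python sketch of the same pattern: the map step fanned out over worker threads (q would express this with `peach` when started with worker threads, e.g. `q -s 4`), followed by an explicit reduce. The data and the four-way split are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

data = list(range(1, 101))          # stand-in for a partitioned table

def mapper(chunk):
    # per-slice aggregation: the "map" step
    return sum(x * x for x in chunk)

# four interleaved slices, analogous to four q worker threads
chunks = [data[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(mapper, chunks))

total = reduce(lambda a, b: a + b, partials)  # the "reduce" step
print(total)  # sum of squares 1..100 = 338350
```

The same split/aggregate/combine shape applies whether the slices live on threads of one server or on partitions spread across a network of kdb+ processes.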

4. Can Kx work alongside Hive or Spark?

Yes. This is the best use case for Kx/Hadoop interoperation. For example, runtime data generated and stored in HBase or Spark can interoperate with Kx through a number of public interfaces; the operating functions within Kx are a superset of the functions offered in Spark. We envisage an ETL (batch) process extracting data from a Hive or HBase database into kdb+, followed by data analytics in q. Performance and function will depend on the data model and the type of data being transformed.
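A hedged sketch of that ETL step. The hard-coded rows stand in for the Hive extract (in practice they would come from a Hive query via beeline or a JDBC/ODBC client), and the output is a CSV that kdb+ could then load; the table and column names are illustrative:

```python
import csv
import io

# Stand-in for the extract; in practice rows would come from a Hive
# query, not a literal list.
hive_rows = [
    ("AAPL", "2018-07-12T09:30:00", 170.1),
    ("MSFT", "2018-07-12T09:30:01", 330.5),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["sym", "time", "price"])
writer.writerows(hive_rows)
csv_text = buf.getvalue()

# kdb+ side (q), loading the file this produces, keeping time as a
# string column:
#   trades:("S*F";enlist",") 0: `:trades.csv
print(csv_text.splitlines()[0])  # header row: sym,time,price
```

Once the data lands in kdb+, the subsequent analytics are ordinary q queries against a kdb+ table rather than Hive jobs.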

5. Can I port from kdb+ to one of the other toolsets in Hadoop?

Nothing prevents this, but you will almost certainly end up with a slower solution in terms of latency, throughput and query-time metrics. If that is acceptable to the user of the application, it could be considered. Any time-series or similarly structured data could be exported and reimported, but the target system will lack some of the capabilities built into kdb+.

answered Jul 31, 2018 by Anmol
• 1,780 points
0 votes

kdb+ could be installed on every Hadoop node and your Hadoop workers could use it, but you would be relying on a software component that is not distributed by Hadoop itself.

Hadoop and kdb+ are complementary technologies, not rivals. The Hadoop/Spark ecosystem has some excellent graph-theoretic and machine-learning libraries. These are ideal for the slow, batch-style processing that a Spark cluster performs.

With Spark, you have to send a whole Java container over to a remote host, start it up and load it; so if you're going to all that trouble, you'd better do something really compute-intensive with long, difficult lines of control - multi-threaded heuristic work like machine learning. (Once you have the Spark worker nodes running, you can send new data to them.)

q/kdb+, on the other hand, is fast! It starts in a flash, runs all day, and is bomb-proof. Many FinTech firms use q/kdb+ for algorithmic trading. The same five years of data an analyst uses for back-testing with q-sql can be used by a transactional real-time system - the ticker plant and all that.

You would probably run a Spark machine-learning job overnight to analyse some of the data that q/kdb+ has prepared. This would generate metrics for tuning a trading system.

And, as effbaie points out, q/kdb+ has a very simple C interface and is really fast over a network, so you might have a single-site Spark cluster using q as a client component accessing a server-side kdb+ database, rather than all that mucking about with the Hadoop file system.

kdb+ databases have a fixed schema; NoSQL databases don't - the record structure can change.


answered Aug 6, 2018 by Abhi
• 3,720 points
