How do I integrate kdb+ and Hadoop?

Question

How can I integrate kdb+ with Hadoop, like OpenTSDB/MongoDB/Cassandra?

kdb+ works in a distributed architecture and supports MapReduce. Is it an alternative to Hadoop?

Can anyone resolve this query?

Thanks

Anmol · Answer

1. Can Kx work with HDFS?

Yes. However, it is unlikely to be chosen as an approach. The reasons the analytics industry is moving away from HDFS as a construct for analytics apply to Kx too. Read/write operations on HDFS are much less efficient, in both throughput and latency, than with embedded storage or a distributed object or file system, even when using the same volume of storage equipment. Some of the causes of HDFS's performance degradation for Kx can be slightly mitigated by layering a traditional file system under HDFS, such as Lustre, GPFS or the MapR file system. Note that if the HDFS layer sits on top of another distributed file system, Kx could use that file system's own, possibly more efficient, methods to read and write data, which makes the HDFS layer somewhat unnecessary.

2. Can Kx ingest data directly from HDFS sources?

Yes. This is a much more likely scenario for a sophisticated user of Kx’s kdb+ database. kdb+ has interfaces for a wide range of ingest sources and languages, including the ability to ingest from HDFS files via the Hadoop utilities. For example, the output of “hadoop fs” could be piped into a FIFO using the named-pipe support in q.

3. What about MapReduce with kdb+?

Use of the MapReduce model is inherent within kdb+. It can manifest not only across a distributed networked architecture but can also efficiently span shared memory when running many threads on one server.

4. Can Kx work alongside Hive or Spark?

Yes. This is the best use case for Kx/Hadoop interoperation. For example, runtime data being generated and stored in Hive/HBase or Spark can be interoperated with Kx using a number of public interfaces. The operating functions found within Kx are a superset of the functions offered in Spark. We envisage the requirement for an ETL (batch) process extracting data from a Hive or HBase database into kdb+, followed by data analytics in q. The performance and function of this will depend on the data model and the type of data being transformed.

5. Can I port from kdb+ to one of the other toolsets in Hadoop?

Nothing prevents this, but you will almost certainly end up with a slower solution in terms of latency, throughput and query time. If this is acceptable to the user of the application, it could be considered. For any time-series or similarly structured data, the data could be exported and reimported. The target system will lack some of the capabilities built into kdb+.

You can read more about it on the following link: https://bit.ly/2LDMLWC
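To make point 2 a little more concrete, here is a minimal sketch of the "pipe Hadoop output into a FIFO" idea, written in Python rather than q. It assumes a POSIX system and, for real use, that the `hadoop` CLI is on the PATH. The function names `stream_to_fifo` and `hdfs_cat_command` are made up for illustration and are not part of any Kx or Hadoop API.

```python
import os
import subprocess
import tempfile


def hdfs_cat_command(hdfs_path):
    """Build the Hadoop CLI call that prints an HDFS file to stdout."""
    return ["hadoop", "fs", "-cat", hdfs_path]


def stream_to_fifo(command, fifo_path):
    """Run `command` and write its stdout into a named pipe at `fifo_path`.

    The call blocks until a reader (for example a q process) opens the
    pipe and the command has finished. Returns the command's exit code.
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # POSIX only; creates the named pipe
    # Opening a FIFO for writing blocks until some reader opens it.
    with open(fifo_path, "wb") as fifo:
        return subprocess.run(command, stdout=fifo).returncode
```

A call such as `stream_to_fifo(hdfs_cat_command("/data/trades.csv"), "/tmp/trades.fifo")` (both paths are hypothetical) would then feed the file to whatever is reading the pipe; on the kdb+ side, the reader would be a q process using the named-pipe support mentioned above. The data never lands on local disk, which is the main attraction of this approach for a batch load.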

Abhi · Answer

kdb+ could be installed on every Hadoop node and your Hadoop workers could use it, but you would have to use it as a software component that is not distributed by Hadoop.

Hadoop and kdb+ are complementary technologies, not rivals. Spark/Hadoop has some excellent graph-theoretic and machine learning libraries. These are ideal for the very slow processing that a Spark cluster performs.

With Spark, you have to send a whole Java container over to a remote host, start it up and load it; so if you're going to all that trouble, you had better do something really compute-intensive with long, difficult lines of control, multi-threaded heuristic stuff, like machine learning. (Once you have the Spark worker nodes running, you can send new data to them.)

q/kdb+, on the other hand, is fast! Starts in a flash, runs all day. Bomb-proof. And many FinTech firms use q/kdb+ for algorithmic trading. The same 5 years of data an analyst uses for back-testing with ksql can be used by a transactional real-time system: the Ticker Plant and all that.

You would probably run a Spark machine learning job overnight to analyse some of the data that q/kdb+ has prepared. This would generate metrics for tuning a trading system.

And again, effbaie points out that q/kdb+ has a very simple C interface and is really fast over a network, so you might have a single-site Spark cluster using q as a client component accessing a kdb+ database server, rather than all that mucking about with the Hadoop File System.

kdb+ databases have a fixed schema. NoSQL databases don't: the record structure can change.

Credits: https://groups.google.com/forum/#!topic/personal-kdbplus/iEYVJLD-2fY
