Will Hadoop (Sqoop) load Oracle faster than SQL*Loader?

Question

We currently load CDRs into an Oracle warehouse using a combination of bash shell scripts and SQL*Loader running in multiple threads. We hope to offload this process to Hadoop because we expect that data growth from an increasing subscriber base will soon max out the current system. We also want to introduce Hadoop gradually into our data warehouse environment.

Will loading from Hadoop be faster? If so, what is the best set of Hadoop tools for this?

Further info: we receive a continuous stream of pipe-delimited text files through FTP into a folder, add two more fields to each record, load the records into temp tables in Oracle, and run a procedure to load the final table. How would you advise structuring the process flow in terms of tools? For example:

1. Files are sent by FTP to the Linux file system (or is it possible to FTP straight to Hadoop?) and Flume loads them into Hadoop.
2. Fields are added (what would be best for this: Pig, Hive, Spark, or something else?).
3. Files are then loaded into Oracle using Sqoop.
4. The final procedure is called (can Sqoop make an Oracle procedure call? If not, which tool would be best to execute the procedure and help control the whole process?).

Also, how can one control the level of parallelism? Does it equate to the number of mappers running the job?

Frankie · Answer

I had a similar task: exporting data from a Hadoop cluster of fewer than 6 nodes to an Oracle data warehouse. I tested the following:

- Sqoop
- OraOop
- Oracle Loader for Hadoop, from the "Oracle BigData Connectors" suite
- A Hadoop streaming job that uses SQL*Loader as the mapper; in its configuration you can read from stdin using: load data infile "-"

Considering speed alone, the Hadoop streaming job with SQL*Loader as the mapper was the fastest way to transfer the data, but you have to install SQL*Loader on every machine in your cluster. I tried it mostly out of personal curiosity and would not recommend exporting data this way: the logging capabilities are limited, and it is likely to have a bigger impact on your data warehouse's performance.

The winner was Sqoop. It is pretty reliable, it is the import/export tool of the Hadoop ecosystem, and it was the second-fastest solution in my tests (1.5x slower than first place).

Sqoop with OraOop (last updated 2012) was slower than the latest version of Sqoop and requires extra configuration on the cluster.

Finally, the worst time came from Oracle's BigData Connectors. If you have a big cluster (>100 machines), it should not be as bad as the time I obtained. The export was done in two steps. The first step reprocesses the output and converts it to an Oracle format that plays nicely with the data warehouse. The second step transfers the result to the data warehouse. This approach is better if you have a lot of processing power, and it would not impact the data warehouse's performance as much as the other solutions.
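
On the parallelism part of the question: with Sqoop, the degree of parallelism of an export is set by the number of map tasks, via `--num-mappers` (short form `-m`). Below is a minimal Python sketch of how the Sqoop export step could be assembled from a driver script. The host, service name, user, password file, table, and HDFS directory are all hypothetical placeholders, not values from my setup, and the right mapper count depends on how many concurrent sessions your Oracle instance can absorb.

```python
import shlex


def build_sqoop_export(jdbc_url, username, password_file, table,
                       export_dir, num_mappers=4):
    """Return the argv list for a `sqoop export` of pipe-delimited HDFS files."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        # The input files are pipe-delimited, as described in the question.
        "--input-fields-terminated-by", "|",
        # Each mapper opens its own connection to Oracle and writes in parallel.
        "--num-mappers", str(num_mappers),
    ]


# All values below are hypothetical and only illustrate the call.
cmd = build_sqoop_export(
    jdbc_url="jdbc:oracle:thin:@//dbhost.example.com:1521/DWH",
    username="etl_user",
    password_file="/user/etl/.oracle.password",
    table="CDR_STAGE",
    export_dir="/data/cdr/enriched",
    num_mappers=8,
)

# Print a shell-safe version of the command for inspection or logging.
print(shlex.join(cmd))
```

To actually launch the export, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `sqoop` client is installed. Building the command as a list rather than a single string avoids shell-quoting problems with the `|` delimiter.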
