Will hadoop(sqoop) load oracle faster than SQL loader?

0 votes

We presently load CDRs to an oracle warehouse using a combination of bash shell scripts and SQL loader with multiple threads. We are hoping to offload this process to hadoop because we envisage that the increase in data due to increase in subscriber base will soon max out the current system. And we also want to gradually introduce hadoop into our data warehouse environment.

Will loading from hadoop be faster? If so what's is the best set of hadoop tool for this?

Further info: We usually will get contunoius stream of pipe delimited text files through ftp to a folder, add two more fields to each record, load to temp tables in oracle and run a procedure to load to final table. How would u advice the process flow to be in terms of tools to use. For example;

  • files are ftp to the Linux file system (or is possible to ftp straight to hadoop?) and flume loads to Hadoop.

  • fields are added (what will be best to do this? Pig, hive, spark or any other recommendations)

  • files are then loaded to oracle using sqoop

  • the final procedure is called(can sqoop make an oracle procedure call? If not what tool will be best to execute the procedure and help control the whole process ?)

    Also, how can one control the level of parallelism? Does it equate the number of mappers running the job?

Sep 4, 2018 in Big Data Hadoop by Neha
• 6,280 points
296 views

1 answer to this question.

0 votes

Had a similar task of exporting data from a < 6 node Hadoop cluster to an Oracle Datewarehouse. I've tested the following:

  • Sqoop

  • OraOop

  • Oracle Loader for Hadoop from the "Oracle BigData Connectors" suite
  • Hadoop streaming job which uses sqloader as mapper, in its configuration you can read from stdin using: load data infile "-"

Considering just speed, the Hadoop streaming job with sqloader as a mapper was the fastest way to transfer the data, but you have to install sqloader on each machine of your cluster. It was more of a personal curiosity, I would not recommend using this way to export data, the logging capabilities are limited, and should have a bigger impact on your datawarehouse performance.

The winner was Sqoop, it is pretty reliable, it's the import/export tool of the Hadoop ecosystem and was second fastest solution, according to my tests.(1.5x slower than first place)

Sqoop with OraOop (last updated 2012) was slower than the latest version of Sqoop, and requires extra configuration on the cluster.

Finally, the worst time was obtained using Oracle's BigData Connectors, if you have a big cluster(>100 machines) then it should not be as bad as the time I obtained. The export was done in two steps. First step involves reprocessing the output and converting it to an Oracle Format that plays nice with the Datawarehouse. The second step was transferring the result to the Datawarehouse. This approach is better if you have a lot of processing power, and you would not impact the Datawarehouse's performance as much as the other solutions.

answered Sep 4, 2018 by Frankie
• 9,810 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Sqoop vs Oracle Hadoop Connectors

Most of the connectors would have the ...READ MORE

answered May 18, 2018 in Big Data Hadoop by nitinrawat895
• 10,730 points
70 views
+10 votes
11 answers

Hadoop “Unable to load native-hadoop library for your platform” warning

modify the glibc version.CentOS provides safe softwares ...READ MORE

answered Sep 10, 2018 in Big Data Hadoop by bug_seeker
• 15,360 points
13,553 views
0 votes
1 answer

When hadoop-env.sh will be executed in hadoop

Yes you need to put in the ...READ MORE

answered Apr 3, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
114 views
0 votes
1 answer

Will hadoop replace data warehousing?

Mostly the answer is yes hadoop is ...READ MORE

answered Apr 4, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
27 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
3,379 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
407 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,858 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,231 views
+1 vote
1 answer
0 votes
1 answer

What is the prerequisite for BigQuery other than SQL?

If you know your SQL, you should ...READ MORE

answered Aug 23, 2018 in Big Data Hadoop by Frankie
• 9,810 points
36 views