Will Hadoop Sqoop load Oracle faster than SQL*Loader?

0 votes

We presently load CDRs into an Oracle warehouse using a combination of bash shell scripts and SQL*Loader with multiple threads. We are hoping to offload this process to Hadoop because we expect that the growth in data from our increasing subscriber base will soon max out the current system. We also want to gradually introduce Hadoop into our data warehouse environment.

Will loading from Hadoop be faster? If so, what is the best set of Hadoop tools for this?

Further info: We usually receive a continuous stream of pipe-delimited text files via FTP into a folder, add two more fields to each record, load them into temp tables in Oracle, and run a procedure to load into the final table. How would you advise structuring the process flow in terms of tools? For example (a rough sketch in commands follows the list):

  • files are FTP'd to the Linux file system (or is it possible to FTP straight to Hadoop?) and Flume loads them into Hadoop.

  • fields are added (what would be best for this? Pig, Hive, Spark, or any other recommendations?)

  • files are then loaded into Oracle using Sqoop

  • the final procedure is called (can Sqoop make an Oracle procedure call? If not, what tool would be best to execute the procedure and help control the whole process?)

    Also, how can one control the level of parallelism? Does it equate to the number of mappers running the job?
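
Here is a rough sketch of what I'm imagining, as shell commands. All host names, paths, table and procedure names below are made-up placeholders, and I'm not sure sqoop eval is the right way to invoke the procedure:

    # 1) Land the FTP'd files on HDFS (the HDFS NFS gateway is one way to let
    #    clients write straight to HDFS without a local staging directory)
    hdfs dfs -put /landing/cdr_*.txt /data/cdr/raw/

    # 2) Add the two extra fields (Hive shown; Pig or Spark could do the same)
    hive -e "INSERT OVERWRITE DIRECTORY '/data/cdr/enriched'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
             SELECT r.*, 'extra1', 'extra2' FROM cdr_raw r;"

    # 3) Export to the Oracle temp table; -m sets the number of parallel mappers
    sqoop export --connect jdbc:oracle:thin:@dbhost:1521:DWH \
        --username dw_user -P --table CDR_TEMP \
        --export-dir /data/cdr/enriched \
        --input-fields-terminated-by '|' -m 8

    # 4) Call the final-load procedure; sqoop eval runs an arbitrary statement
    sqoop eval --connect jdbc:oracle:thin:@dbhost:1521:DWH \
        --username dw_user -P --query "BEGIN load_final_table; END;"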

Sep 4, 2018 in Big Data Hadoop by Neha
• 6,300 points
1,837 views

1 answer to this question.

0 votes

I had a similar task of exporting data from a small (< 6 node) Hadoop cluster to an Oracle data warehouse. I tested the following:

  • Sqoop

  • OraOop

  • Oracle Loader for Hadoop from the "Oracle BigData Connectors" suite

  • A Hadoop streaming job that uses SQL*Loader (sqlldr) as the mapper; in its control file you can read from stdin using: load data infile "-"

Considering speed alone, the Hadoop streaming job with sqlldr as the mapper was the fastest way to transfer the data, but you have to install SQL*Loader on every machine in your cluster. It was more of a personal curiosity; I would not recommend exporting data this way, since the logging capabilities are limited and it is likely to have a bigger impact on your data warehouse's performance.
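
For reference, a rough sketch of that setup; the table, column names, and connect string below are invented, but the INFILE "-" trick is the key part:

    # cdr.ctl -- SQL*Loader control file; INFILE "-" makes sqlldr read stdin
    LOAD DATA
    INFILE "-"
    APPEND INTO TABLE cdr_temp
    FIELDS TERMINATED BY '|'
    (caller_id, callee_id, call_start, duration)

    # Map-only streaming job: each mapper pipes its input split into sqlldr.
    # sqlldr must be installed on every node; -files ships the control file.
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -D mapreduce.job.reduces=0 \
        -files cdr.ctl \
        -input /data/cdr/enriched \
        -output /tmp/sqlldr_out \
        -mapper "sqlldr userid=dw_user/secret@DWH control=cdr.ctl silent=all"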

The winner was Sqoop. It is reliable, it is the standard import/export tool of the Hadoop ecosystem, and it was the second-fastest solution in my tests (about 1.5x slower than first place).
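
A typical Sqoop export for this case looks roughly like the following (all connection details are placeholders). This also answers the parallelism question above: --num-mappers (-m) sets the number of mappers, and each mapper opens its own connection to Oracle:

    sqoop export \
        --connect jdbc:oracle:thin:@dbhost:1521:DWH \
        --username dw_user --password-file /user/etl/.ora_pass \
        --table CDR_FINAL \
        --export-dir /data/cdr/enriched \
        --input-fields-terminated-by '|' \
        --num-mappers 8 \
        --batch   # use JDBC batched statements for faster inserts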

Sqoop with OraOop (last updated in 2012) was slower than the latest version of plain Sqoop and requires extra configuration on the cluster.

Finally, the worst time came from Oracle's BigData Connectors; if you have a big cluster (> 100 machines), it should not be as bad as the time I got. The export is done in two steps. The first step reprocesses the output, converting it to an Oracle format that plays nicely with the data warehouse. The second step transfers the result to the data warehouse. This approach is better if you have a lot of processing power, and it impacts the data warehouse's performance less than the other solutions do.

answered Sep 4, 2018 by Frankie
• 9,830 points
