Exporting of ORC file to RDBMS using SQOOP

0 votes
I have a file in HDFS with ORC file type. I want to load that ORC file data to RDBMS using SQOOP. Can you provide me commands for exporting Command for ORC file which has HIVE metastore?
Jul 9, 2019 in Big Data Hadoop by Aisha
148 views

1 answer to this question.

0 votes

We have to use Sqoop-HCatalog Integration here. Just type “sqoop export help” in Bash and see what are all the sqoop parameter commands there for the Sqoop Export related to HCatalog. I got the details as below.

HCatalog arguments:

–hcatalog-database <arg> HCatalog database name

–hcatalog-home <hdir> Override $HCAT_HOME

–hcatalog-table <arg> HCatalog table name

–hive-home <dir> Override $HIVE_HOME

–hive-partition-key <partition-key> Sets the partition key to

use when importing to hive

–hive-partition-value <partition-value> Sets the partition value to

use when importing to hive

–map-column-hive <arg> Override mapping for

specific column to hive types.

Now see the sample code snippets below for a very simple Sqoop Export Process.

Step1:

Create a Hive Table in TXT format. You can create this in any format like Avro, RCfile, etc

create table customers_txt (customer_id string, customer_name string, city string) row format delimited fields terminated by ‘,’ stored as textfile;

OK

Time taken: 0.611 seconds

Step 2:

Create a Hive table in ORC format

create table customers(customer_id string, customer_name string, city string) row format delimited fields terminated by ‘,’ stored as orc;

OK

Time taken: 0.594 seconds

Step 3:

Create some test data here

bash-4.1$vi customer.txt

1 giri nellai

2 giri town

3 giri pheonix

4 parthi glendale

5 ram glendale

6 goutam campbell

:wq!

Step 4:

Load test data to a Txt formatted table

hive> load data local inpath ‘/tmp/customer.txt’ into table customers_txt;

Copying data from file:/tmp/customer.txt

Copying file: file:/tmp/customer.txt

Loading data to table default.customers_txt

Table default.customers_txt stats: [numFiles=1, numRows=0, totalSize=92, rawDataSize=0]

OK

Time taken: 1.301 seconds

Verify the data

hive> select * from customers_txt;

OK

1 giri nellai

2 giri town

3 giri pheonix

4 parthi glendale

5 ram glendale

6 goutam campbell

Time taken: 0.456 seconds, Fetched: 6 row(s)

Step 5: Insert the data into ORC table:

insert overwrite table customers select * from customers_txt;

Step 6:

Execute the below Sqoop Export Command

sqoop export –connect jdbc:oracle:thin:@Servername:1521/dbName –username *********** –password ********* –table dbName.CUSTOMERS–hcatalog-table customers

Verify the Sqoop Job Output:

15/09/08 17:02:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438142065989_98389

15/09/08 17:02:27 INFO impl.YarnClientImpl: Submitted application application_1438142065989_98389

15/09/08 17:02:27 INFO mapreduce.Job: The url to track the job: http://n01bdl303.aap.csaa.pri:8088/proxy/application_1438142065989_98389/

15/09/08 17:02:27 INFO mapreduce.Job: Running job: job_1438142065989_98389

15/09/08 17:02:38 INFO mapreduce.Job: Job job_1438142065989_98389 running in uber mode : false

15/09/08 17:02:38 INFO mapreduce.Job: map 0% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: map 100% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: Job job_1438142065989_98389 completed successfully

15/09/08 17:02:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=269257

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=10756

HDFS: Number of bytes written=0

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=5338

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=5338

Total vcore-seconds taken by all map tasks=5338

Total megabyte-seconds taken by all map tasks=24298576

Map-Reduce Framework

Map input records=6

Map output records=6

Input split bytes=10112

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=41

CPU time spent (ms)=1740

Physical memory (bytes) snapshot=337338368

Virtual memory (bytes) snapshot=8627503104

Total committed heap usage (bytes)=2070413312

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Transferred 10.5039 KB in 30.9629 seconds (347.3836 bytes/sec)

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Exported 6 records.
answered Jul 9, 2019 by Rishi

Related Questions In Big Data Hadoop

0 votes
1 answer

How to change file format using Sqoop?

For change in the file format, you ...READ MORE

answered Dec 17, 2018 in Big Data Hadoop by Omkar
• 68,840 points
344 views
0 votes
1 answer

How to print the content of a file in console present in HDFS?

Yes, you can use hdfs dfs command ...READ MORE

answered Apr 19, 2018 in Big Data Hadoop by Shubham
• 13,370 points
791 views
0 votes
1 answer

How to transfer data from Netezza to HDFS using Apache Sqoop?

Remove the --direct option. It gives issue ...READ MORE

answered Apr 23, 2018 in Big Data Hadoop by kurt_cobain
• 9,290 points
394 views
0 votes
1 answer

Why is jar file required to execute a MR code instead of class file?

We use hadoop keyword to invoke the ...READ MORE

answered Apr 24, 2018 in Big Data Hadoop by Shubham
• 13,370 points
73 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,840 points
3,838 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,840 points
521 views
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
20,308 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,290 points
1,436 views
0 votes
1 answer

How to override Mapred home directory using sqoop?

Sqoop provides an option to override the ...READ MORE

answered May 17, 2019 in Big Data Hadoop by Rajini
65 views
0 votes
3 answers

How to change the delimiter in Sqoop?

--fields-terminated-by <char> READ MORE

answered Jun 25, 2019 in Big Data Hadoop by anonymous
2,371 views