Exporting an ORC file to an RDBMS using Sqoop

I have a file in HDFS in ORC format. I want to load the data from that ORC file into an RDBMS using Sqoop. Can you give me the export command for an ORC file that has a Hive metastore table on top of it?
Jul 9 in Big Data Hadoop by Aisha

1 answer to this question.


We have to use the Sqoop-HCatalog integration here. Run sqoop export --help in Bash to see which Sqoop export parameters relate to HCatalog. I got the details below.

HCatalog arguments:

--hcatalog-database <arg>                  HCatalog database name
--hcatalog-home <hdir>                     Override $HCAT_HOME
--hcatalog-table <arg>                     HCatalog table name
--hive-home <dir>                          Override $HIVE_HOME
--hive-partition-key <partition-key>       Sets the partition key to use when importing to Hive
--hive-partition-value <partition-value>   Sets the partition value to use when importing to Hive
--map-column-hive <arg>                    Override mapping for specific columns to Hive types
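For an ORC-backed Hive table, the two arguments you typically need are --hcatalog-table and, if the table is not in the default database, --hcatalog-database. A minimal sketch of how they fit into an export command (the connection string and names mirror the example later in this answer; treat them as placeholders):

sqoop export \
  --connect jdbc:oracle:thin:@Servername:1521/dbName \
  --username <user> \
  --password <password> \
  --table CUSTOMERS \
  --hcatalog-database default \
  --hcatalog-table customers

Because HCatalog reads the table definition from the Hive metastore, Sqoop can decode the ORC files without any file-format or delimiter options.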

Now see the sample code snippets below for a very simple Sqoop Export Process.

Step 1:

Create a Hive table in text format. You could also create this in another format like Avro, RCFile, etc.

create table customers_txt (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 0.611 seconds

Step 2:

Create a Hive table in ORC format

create table customers (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as orc;

OK

Time taken: 0.594 seconds

Step 3:

Create some test data here

bash-4.1$ vi /tmp/customer.txt

1,giri,nellai
2,giri,town
3,giri,phoenix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell

:wq!
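If you prefer not to edit the file interactively, the same data can be written with a heredoc (a minimal sketch; the path matches the load command in Step 4):

cat > /tmp/customer.txt <<'EOF'
1,giri,nellai
2,giri,town
3,giri,phoenix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell
EOF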

Step 4:

Load the test data into the text-formatted table

hive> load data local inpath '/tmp/customer.txt' into table customers_txt;

Copying data from file:/tmp/customer.txt

Copying file: file:/tmp/customer.txt

Loading data to table default.customers_txt

Table default.customers_txt stats: [numFiles=1, numRows=0, totalSize=92, rawDataSize=0]

OK

Time taken: 1.301 seconds

Verify the data

hive> select * from customers_txt;

OK

1 giri nellai

2 giri town

3 giri phoenix

4 parthi glendale

5 ram glendale

6 goutam campbell

Time taken: 0.456 seconds, Fetched: 6 row(s)

Step 5: Insert the data into the ORC table:

insert overwrite table customers select * from customers_txt;
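To confirm the ORC table is populated before exporting, a quick check (the row count should match the six rows loaded above):

hive> select count(*) from customers;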

Step 6:

Execute the Sqoop export command below

sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username *********** --password ********* --table dbName.CUSTOMERS --hcatalog-table customers
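Note that Sqoop export does not create the target table; it must already exist on the Oracle side with columns matching the Hive schema. A minimal sketch of what that DDL might look like (the VARCHAR2 lengths are assumptions, not from the original post):

create table CUSTOMERS (
  customer_id   varchar2(50),
  customer_name varchar2(100),
  city          varchar2(100)
);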

Verify the Sqoop Job Output:

15/09/08 17:02:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438142065989_98389

15/09/08 17:02:27 INFO impl.YarnClientImpl: Submitted application application_1438142065989_98389

15/09/08 17:02:27 INFO mapreduce.Job: The url to track the job: http://n01bdl303.aap.csaa.pri:8088/proxy/application_1438142065989_98389/

15/09/08 17:02:27 INFO mapreduce.Job: Running job: job_1438142065989_98389

15/09/08 17:02:38 INFO mapreduce.Job: Job job_1438142065989_98389 running in uber mode : false

15/09/08 17:02:38 INFO mapreduce.Job: map 0% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: map 100% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: Job job_1438142065989_98389 completed successfully

15/09/08 17:02:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=269257

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=10756

HDFS: Number of bytes written=0

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=5338

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=5338

Total vcore-seconds taken by all map tasks=5338

Total megabyte-seconds taken by all map tasks=24298576

Map-Reduce Framework

Map input records=6

Map output records=6

Input split bytes=10112

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=41

CPU time spent (ms)=1740

Physical memory (bytes) snapshot=337338368

Virtual memory (bytes) snapshot=8627503104

Total committed heap usage (bytes)=2070413312

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Transferred 10.5039 KB in 30.9629 seconds (347.3836 bytes/sec)

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Exported 6 records.
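As a final check, you can count the rows on the Oracle side (a hypothetical query against the target table used above; 6 rows are expected from this run):

select count(*) from dbName.CUSTOMERS;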
answered Jul 9 by Rishi
