Exporting an ORC file to an RDBMS using Sqoop

I have a file in HDFS in ORC format. I want to load the data from that ORC file into an RDBMS using Sqoop. Can you give me the export command for an ORC file that has a Hive metastore table on top of it?
Jul 9 in Big Data Hadoop by Aisha

1 answer to this question.


We have to use the Sqoop-HCatalog integration here. Run sqoop export --help in Bash to see which Sqoop export parameters relate to HCatalog. I got the details below.

HCatalog arguments:

--hcatalog-database <arg>                  HCatalog database name
--hcatalog-home <hdir>                     Override $HCAT_HOME
--hcatalog-table <arg>                     HCatalog table name
--hive-home <dir>                          Override $HIVE_HOME
--hive-partition-key <partition-key>       Sets the partition key to use when importing to Hive
--hive-partition-value <partition-value>   Sets the partition value to use when importing to Hive
--map-column-hive <arg>                    Override mapping for specific columns to Hive types
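For an ORC-backed Hive table, the two arguments you typically need are --hcatalog-table and, if the table is not in the default database, --hcatalog-database. A minimal sketch of how they fit into an export command (the connection string and names mirror the example later in this answer; treat them as placeholders):

sqoop export \
  --connect jdbc:oracle:thin:@Servername:1521/dbName \
  --username <user> \
  --password <password> \
  --table CUSTOMERS \
  --hcatalog-database default \
  --hcatalog-table customers

Because HCatalog reads the table definition from the Hive metastore, Sqoop can decode the ORC files without any file-format or delimiter options.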

Now see the sample code snippets below for a very simple Sqoop Export Process.

Step 1:

Create a Hive table in text format. You could also create this in another format like Avro, RCFile, etc.

create table customers_txt (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as textfile;

OK

Time taken: 0.611 seconds

Step 2:

Create a Hive table in ORC format

create table customers (customer_id string, customer_name string, city string) row format delimited fields terminated by ',' stored as orc;

OK

Time taken: 0.594 seconds

Step 3:

Create some test data here

bash-4.1$ vi /tmp/customer.txt

1,giri,nellai
2,giri,town
3,giri,phoenix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell

:wq!
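If you prefer not to edit the file interactively, the same data can be written with a heredoc (a minimal sketch; the path matches the load command in Step 4):

cat > /tmp/customer.txt <<'EOF'
1,giri,nellai
2,giri,town
3,giri,phoenix
4,parthi,glendale
5,ram,glendale
6,goutam,campbell
EOF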

Step 4:

Load the test data into the text-formatted table

hive> load data local inpath '/tmp/customer.txt' into table customers_txt;

Copying data from file:/tmp/customer.txt

Copying file: file:/tmp/customer.txt

Loading data to table default.customers_txt

Table default.customers_txt stats: [numFiles=1, numRows=0, totalSize=92, rawDataSize=0]

OK

Time taken: 1.301 seconds

Verify the data

hive> select * from customers_txt;

OK

1 giri nellai

2 giri town

3 giri phoenix

4 parthi glendale

5 ram glendale

6 goutam campbell

Time taken: 0.456 seconds, Fetched: 6 row(s)

Step 5: Insert the data into the ORC table:

insert overwrite table customers select * from customers_txt;
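To confirm the ORC table is populated before exporting, a quick check (the row count should match the six rows loaded above):

hive> select count(*) from customers;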

Step 6:

Execute the Sqoop export command below

sqoop export --connect jdbc:oracle:thin:@Servername:1521/dbName --username *********** --password ********* --table dbName.CUSTOMERS --hcatalog-table customers
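Note that Sqoop export does not create the target table; it must already exist on the Oracle side with columns matching the Hive schema. A minimal sketch of what that DDL might look like (the VARCHAR2 lengths are assumptions, not from the original post):

create table CUSTOMERS (
  customer_id   varchar2(50),
  customer_name varchar2(100),
  city          varchar2(100)
);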

Verify the Sqoop Job Output:

15/09/08 17:02:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438142065989_98389

15/09/08 17:02:27 INFO impl.YarnClientImpl: Submitted application application_1438142065989_98389

15/09/08 17:02:27 INFO mapreduce.Job: The url to track the job: http://n01bdl303.aap.csaa.pri:8088/proxy/application_1438142065989_98389/

15/09/08 17:02:27 INFO mapreduce.Job: Running job: job_1438142065989_98389

15/09/08 17:02:38 INFO mapreduce.Job: Job job_1438142065989_98389 running in uber mode : false

15/09/08 17:02:38 INFO mapreduce.Job: map 0% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: map 100% reduce 0%

15/09/08 17:02:46 INFO mapreduce.Job: Job job_1438142065989_98389 completed successfully

15/09/08 17:02:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=269257

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=10756

HDFS: Number of bytes written=0

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=5338

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=5338

Total vcore-seconds taken by all map tasks=5338

Total megabyte-seconds taken by all map tasks=24298576

Map-Reduce Framework

Map input records=6

Map output records=6

Input split bytes=10112

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=41

CPU time spent (ms)=1740

Physical memory (bytes) snapshot=337338368

Virtual memory (bytes) snapshot=8627503104

Total committed heap usage (bytes)=2070413312

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Transferred 10.5039 KB in 30.9629 seconds (347.3836 bytes/sec)

15/09/08 17:02:46 INFO mapreduce.ExportJobBase: Exported 6 records.
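As a final check, you can count the rows on the Oracle side (a hypothetical query against the target table used above; 6 rows are expected from this run):

select count(*) from dbName.CUSTOMERS;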
answered Jul 9 by Rishi
