Big Data and Hadoop (168 Blogs) Become a Certified Professional

Oracle to HDFS using Sqoop

Last updated on May 22,2019 24.4K Views
Awanish
Awanish is a Sr. Research Analyst at Edureka. He has rich expertise... Awanish is a Sr. Research Analyst at Edureka. He has rich expertise in Big Data technologies like Hadoop, Spark, Storm, Kafka, Flink. Awanish also...

1. Download the Oracle Expresss Edition from the below link and unzip it.

https://docs.google.com/a/edureka.in/file/d/0B2-rlCGKD40NNW5BcHZMTkdtcmc/edit

61

2. After extracting you will find a executable file of Oracle XE Edition as in the below image.

62

3. Double click on the OracleXEUniv to install Oracle database on your system and click on Run.

63

4. Click on Next

64

5. Accept the licence agreement and click on next.

65

6. By default oracle will take system as database name. Let us enter password

for this database.

Enter Password –> system

Confirm Password –> system

Click Next:

66

7. Click on Install

6768

8. Click on Finish

69

9. Let us edit the sqlnet.ora file present in the folder

C:oraclexepporacleproduct.2.0serverNETWORKADMIN

When you open it you will find the below content.

70

Edit it as in the Below Screen Shot

71

10. Let us start the SQL Command Line of Oracle.

Goto Start menu -> All Programs -> Oracle Database 10g Express Edition ->

Run SQL Command Line and double click on it.

72

11. You will get the Cli fo Oracle Database.

73

12. Let us connect to the Oracle database.

Username: system

Password: system

74

You will receive the message connected successfully.

13. Let us create a simple table.

Command:

create table emp (id number);

75

14. Let us insert some values in it by using Insert command.

Command:

insert into emp values(2);

76

15.Let us check if the data is inserted into the table by using Select Command.

Command:

select * from emp;

77

16. Let us commit the data.

Command:

commit;

78

17. To import the data from Oracle Database to Sqoop we need to add the

Oracle Connector (ojdbc6_g.jar).

You can download the jar from the below link.

https://docs.google.com/a/edureka.in/file/d/0B2-rlCGKD40Nekw3ZXBRWUU5Y1E/edit

18.Open Cloudera cdh3 and move Oracle connector to Cloudera cdh3 (To Desktop) using FileZilla.

Use the below link to understand how to move a file from Windows to cloudera cdh3 vm.

https://www.edureka.co/blog/transfer-files-windows-cloudera-demo-vm/

19. Once the Oracle connector is present on Cloudera Cdh3 Desktop, move it to the lib folder of

sqoop by executing the below command:

Command:

sudo cp /home/cloudera/Desktop/ojdbc6_g.jar /usr/lib/sqoop/lib/

79

20. Change the directory to Sqoop by executing the below command:

Command:

cd /usr/lib/sqoop/

80

21. Open Command Prompt (CMD) on Windows and check the IPv4 Address

by executing the

below command:

Command:

ipconfig

81

22. Import the data of the table emp present in Oracle database to hdfs by executing the below

command:

Required items for the command:

IPv4 Address – Your IPv4 address. In my case it is 192.168.46.1

Database Name – system

Table Name – emp

Username – system

Password – system

Output Directory – Could be any. I have used sqoopoutput1

Command:

sudo bin/sqoop import –connect jdbc:oracle:thin:system/

system@192.168.46.1:1521:xe

–username system -P –table system.emp –columns “ID” –target-dir /

sqoopoutput1 -m 1

83

23. When the command got executed successfully, you will receive the message

that Retrieved Records as in the below image.

84

24. Open the Browser and go to the below URL:

URL: http://localhost:50070/dfshealth.jsp

Click on Browse the filesystem

86

25. Click on sqoopoutput1 directory

87

26. Click on part-m-00000 file:

88

27.Below is the data that was imported from Oracle database:

89

Congratulations! You have Successfully Imported Data from Oracle Database

to HDFS Using Sqoop..!

Comments
21 Comments
  • chinmaye sri says:

    Simply superb article thank you

  • jyoti says:

    I am getting error like below-

    cloudera@cloudera-vm:~$ sqoop import –connect jdbc:oracle:thin:sys/Welcome123@192.168.41.1:1521:orcl –username tgthr -P –table transactions –columns “tid,accounid,amount” –m 1
    Enter password:
    16/06/27 05:34:51 INFO tool.CodeGenTool: Beginning code generation
    16/06/27 05:34:53 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: Io exception: The Network Adapter could not establish the connection
    16/06/27 05:34:53 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
    java.lang.NullPointerException
    at com.cloudera.sqoop.orm.ClassWriter.generate(ClassWriter.java:935)
    at com.cloudera.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
    at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:337)
    at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
    at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
    at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:218)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:228)
    cloudera@cloudera-vm:~$

    • EdurekaSupport says:

      Hey jyoti, thanks for checking out our blog.
      The error “network adaptor could not establish connection” is coming because of incorrect jdbc url. Jdbc url in your sqoop command should be in this format: jdbc:oracle:thin:@192.168.41.1:1521:orcl
      The connection refused error may also occur by scenarios as far as :
      1.The Oracle service might not be running on the specified host on the given port number.
      2.The firewall in between might restrict the client access to the oracle server through the given port number.Also it will be best to check if the oracle DB listener is started listening properly on port 1521 or not?
      It will be also good to check if the port is accessible fine from the sqoop client host:
      telnet 1521
      telnet 192.168.41.1:1521
      Hope this helps. Cheers!

  • kesava p says:

    how much time taking 1 TB data importing from oracle to HDFS

    • EdurekaSupport says:

      Hey Kesava, thanks for checking out our blog. The estimated time is completely dependent s on the cluster, RAM size and CPU of the system where both the things are installed.
      Hope this helps. Cheers!

  • davy says:

    Hi , I am facing one issue like below.

    sqoop import –connect jdbc:oracle:thin:system/system@192.168.5.1:1521:ORCL –username system -P –table system.emp –columns “ID” –target-dir /sqoopoutput1 -m 1

    java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection

    the only difference is oracle sid , and my database version is

    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – 64bit Production

    PL/SQL Release 11.2.0.1.0 – Production

    CORE 11.2.0.1.0 Production

    TNS for 64-bit Windows: Version 11.2.0.1.0 – Production

    NLSRTL Version 11.2.0.1.0 – Production

    • EdurekaSupport says:

      Hey Davy, thanks for checking out our blog. The connection refused error may occur for scenarios such as:
      1.The Oracle service might not be running on the specified host on the given port number.
      2.The firewall in between might restrict the client access to the oracle server through the given port number.
      So, we suggest that you first confirm the oracle host, port and the firewall restriction in between.
      You can easily check the access by using telnet as below,
      telnet 192.162.2.8 1521
      Check if your oracle service is running and no firewall is blocking your connection.
      Check if oracle is listening in port 1521 or not, if not fix the port issue and then try connecting to your database.
      It will be also good to check if the port is accessible fine from the sqoop client host:
      telnet 1521
      The telnet command it just to verify the connectivity between the host where you are running the “./sqoop import” command to the Oracle Database host ” 1521″.
      Hope this helps. Cheers!

  • Prashant says:

    If I have a client Server Architecture……..
    so how i import data from Oracle Server To HDFS which is installed in client machine………

  • aa says:

    Hi Awanish, thanks for the good post. Can u pls suggest good method to do the activity given below.

    1. Source – Oracle DB (tables) in our local data center
    2. Channel – will be using sqoop
    3. Destination – Amazon AWS cluster

    How do we move the data from Oracle DB which is in our local data center to AWS.

    • Vineet says:

      You need to follow the same steps. In your particular scenario the only difference is that the hdfs is present remotely on AWS servers. In this case you need to use the public ip of the AWS server so that you can connect to it remotely. Rest every step remains the same. I hope this helps!

      • aa says:

        Sure, will try and update the findings. Thank you. Also how do we configure Sqoop on AWS. Do we need to go for Sqoop1 or 2. After installing Sqoop where to update the env varaibles…it will be great if some one can help in this.

  • Shweta says:

    hello
    i am having doubts like

    while creating table in SQL command prompt .. i have not created any database so by default database name will be system ?

    2.. login and password is system.. i have created table with four fields table name is Operator, fiels names are : PID in number, Name in Char(25), Skills in char(25), Operates in number.. now i want to import this to my edureka VM i am getting this error please help me out in solving this error:

    [edureka@localhost ~]$ sqoop import –connect jdbc:oracle:thin:system/system@192.168.93.1:1521:xe –username system -P –table system.Operator –columns “PID” –target-dir /sqoopoutput1 -m 1
    Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Enter password:
    15/05/29 19:28:56 INFO manager.SqlManager: Using default fetchSize of 1000
    15/05/29 19:28:56 INFO tool.CodeGenTool: Beginning code generation
    15/05/29 19:28:57 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-01017: invalid username/password; logon denied

    java.sql.SQLException: ORA-01017: invalid username/password; logon denied

    at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:120)
    at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:118)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:224)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:468)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:410)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:402)
    at oracle.jdbc.driver.T4CTTIoauthenticate.receiveOauth(T4CTTIoauthenticate.java:853)
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:428)
    at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:494)
    at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:199)
    at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:30)
    at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:494)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:215)
    at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:314)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:660)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:683)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
    at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:223)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:347)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1277)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1089)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:396)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    15/05/29 19:28:57 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1095)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:396)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)

    • EdurekaSupport says:

      Hi Shweta,
      1. Yes, you are absolutely right!
      2. Please check again if you have given the password correctly.

  • Narendra says:

    I am getting below error while using sqoop :[edureka@localhost sqoop-1.4.4]$ sudo bin/sqoop import –connect jdbc:oracle:thin:system/evnbevnb@192.168.3.100:1521:xe–username system -P –table system.emp –columns “ID” –target-dir /usr/tmp -m 1
    Error: /usr/lib/hadoop does not exist!
    Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.

    • EdurekaSupport says:

      Hi Narendra, check again the command that you have specified. We suggest you to enter the command manually in case if you were copy pasting it.

      It should contain — before username, columns and target. sudo bin/sqoop import –connect jdbc:oracle:thin:system/evnbevnb@192.168.3.100:1521:xe –username system -P –table system.emp –columns “ID” –target-dir /usr/tmp -m 1

      Hope this helps!

      • Narendra says:

        It solved.Thank you.

  • Kunalkhanna says:

    I am getting Error: JAVA_HOME is not set and could not be found.
    Please help me to get it resolved. Thanks in advance.

    • EdurekaSupport says:

      Hi Kunal, you might be receiving this error as the Java path is not set. To overcome this please set the Java Path as shown below :
      Step 1: Open terminal and execute the command
      command: sudo gedit .bashrc

      Step 2: If it asks
      for password, type – cloudera

      Step 3: Copy the
      below line at the end of this file:
      export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24

      Note: The path is applicable if you are using Cloudera CDH3. If you are using any other cluster then update the path with the location where Java is installed

      Step 4: Save the file, restart the terminal and then run the commands.

      Hope this helps!!

  • Santhosh S says:

    Awesome Awanish!

    • EdurekaSupport says:

      Thanks, Santhosh!!

Join the discussion

Browse Categories

webinar REGISTER FOR FREE WEBINAR
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP

Subscribe to our Newsletter, and get personalized recommendations.