Sqoop creating a new file on every import

Question

I am using the command below to do an incremental import, and it creates a new file every time. For example, part-m-00000 is already present, and even if there are no new records in the table, it still creates a new file, part-m-00001, containing all the records again.

sqoop import --connect jdbc:mysql://mysqldb.edu.cloudlab.com/userDb --username=labuser --password=letmepass -m 1 --table employee -target-dir "/user/hduser/user_Test1" --incremental append --check-column id

I assumed that if the table data has not changed, no new file should be created, and if it has changed, only the new rows should be added.
But in my case, all the rows are imported again into a new file.
Please let me know if I am doing anything wrong.

Omkar · Answer 1 · Jan 8, 2019

No, you are not doing anything wrong; unfortunately, that is the way Sqoop works. Every incremental append saves the data it imports in a new file. Because your command has no --last-value, Sqoop has no lower bound on the check column, so each run imports every row again.

That is how batch processing generally works across Hadoop tools: each run writes new output files.

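If you want the additional part files to contain only new records, add one more parameter, --last-value, to the sqoop command. Set it to the highest value of the check column (id) that has already been imported; Sqoop then imports only rows with a greater id. Please refer to the command below.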

Command: sqoop import --connect jdbc:mysql://mysqldb.edu.cloudlab.com/MonikaDb --username=labuser --password=edureka -m 1 --table employee -target-dir "/user/edureka_425270/Monika_Test1" --incremental append --check-column id --last-value 24
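
If you would rather not update --last-value by hand before each run, a Sqoop saved job may be an option, since it stores the last imported value in its metastore and reuses it on the next execution. The script below is only a rough sketch built on the connection details from the question. The job name employee_incremental, the DRY_RUN switch, and the run helper are made up for illustration, and the starting value of 0 assumes every id is positive. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: JOB_NAME, DRY_RUN and run() are hypothetical names for this example.
JOB_NAME="employee_incremental"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the saved job once. Starting at --last-value 0 assumes all ids are > 0,
# so the first execution imports the whole table.
run sqoop job --create "$JOB_NAME" -- import \
  --connect jdbc:mysql://mysqldb.edu.cloudlab.com/userDb \
  --username=labuser --password=letmepass -m 1 \
  --table employee --target-dir /user/hduser/user_Test1 \
  --incremental append --check-column id --last-value 0

# Execute the job. Rerun only this command for each later import; the saved job
# keeps track of the last imported id between executions.
run sqoop job --exec "$JOB_NAME"
```

Note that saved jobs do not store the password by default, so Sqoop may prompt for it each time the job is executed.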