Sqoop incremental append by date

Question

Suppose I want to do an incremental append based on a date column. How do I specify the date for an incremental lastmodified import?

Omkar · Answer 1 · Feb 20, 2019

Consider a table with 3 records that you have already imported into HDFS using Sqoop:

+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2014-01-25 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
+------+------------+----------+------+------------+

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P

Now the table has additional records, but no updates to the existing records:

+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2014-01-25 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
|  103 | Charlotte  | NC       |    9 | 2013-04-22 |
|  103 | Greenville | SC       |    9 | 2013-05-12 |
|  103 | Atlanta    | GA       |   11 | 2013-08-21 |
+------+------------+----------+------+------------+

Here you should use --incremental append together with --check-column, which specifies the column to be examined when determining which rows to import.

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --check-column rank --incremental append --last-value 7

The command above imports all the new rows, that is, the rows whose rank is greater than the --last-value of 7.
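To see which rows that filter picks up, the comparison can be imitated locally with awk over the sample rows. This is only a sketch of the `rank > last-value` check; it does not call Sqoop, and the pipe-separated layout and variable names are made up for the illustration.

```shell
# Sample rows as pipe-separated text: sid|city|state|rank|rDate
rows='101|Chicago|Illinois|1|2014-01-25
101|Schaumburg|Illinois|3|2014-01-25
101|Columbus|Ohio|7|2014-01-25
103|Charlotte|NC|9|2013-04-22
103|Greenville|SC|9|2013-05-12
103|Atlanta|GA|11|2013-08-21'

last_value=7

# Keep only rows whose rank (field 4) is greater than last_value
picked=$(printf '%s\n' "$rows" | awk -F'|' -v last="$last_value" '$4 + 0 > last + 0')
printf '%s\n' "$picked"

# Highest rank in the table: the natural --last-value for the next run
next_last=$(printf '%s\n' "$rows" | awk -F'|' '$4 + 0 > max { max = $4 + 0 } END { print max }')
echo "next --last-value: $next_last"
```

Only the three 103 rows pass the filter. At the end of a real incremental import, Sqoop prints the value to pass as --last-value next time, so you do not have to compute it yourself.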

Now consider a second case, where existing rows have been updated as well:

+------+------------+----------+------+------------+
| sid  | city       | state    | rank | rDate      |
+------+------------+----------+------+------------+
|  101 | Chicago    | Illinois |    1 | 2015-01-01 |
|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
|  103 | Charlotte  | NC       |    9 | 2013-04-22 |
|  103 | Greenville | SC       |    9 | 2013-05-12 |
|  103 | Atlanta    | GA       |   11 | 2013-08-21 |
|  104 | Dallas     | Texas    |    4 | 2015-02-02 |
|  105 | Phoenix    | Arizona  |   17 | 2015-02-24 |
+------+------------+----------+------+------------+

Here we use --incremental lastmodified with the date column rDate as the --check-column, so that Sqoop fetches the rows added or updated since the date given in --last-value.

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --check-column rDate --incremental lastmodified --last-value 2014-01-25 --target-dir yloc/loc
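If you rerun this regularly, only the date in --last-value changes between runs. Below is a minimal sketch of a wrapper for that, assuming a hypothetical function name `import_since` and a hypothetical `SQOOP` variable (neither is part of Sqoop). It also adds --append: as far as I know, Sqoop 1.4.x asks for either --append or --merge-key in lastmodified mode when the target directory already exists, but treat that as an assumption and check it against your version.

```shell
# Command to invoke; set SQOOP=echo to print the arguments instead of running them
SQOOP="${SQOOP:-sqoop}"

# import_since DATE: hypothetical helper that imports rows changed since DATE.
# Connection details are the ones used in the commands above.
import_since() {
  last_value="$1"
  "$SQOOP" import \
    --connect jdbc:mysql://localhost:3306/ydb \
    --table yloc --username root -P \
    --check-column rDate \
    --incremental lastmodified \
    --last-value "$last_value" \
    --target-dir yloc/loc \
    --append   # assumption: needed when yloc/loc already exists
}

# Usage: import_since 2014-01-25
```

With --append the updated Chicago row lands in HDFS next to its old copy. --merge-key would replace the old copy instead, but it needs a column that uniquely identifies a row, which sid in this sample table does not. Creating the import as a saved job with `sqoop job --create` is another option, since a saved job keeps track of the last value for you.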