Why do we use the --split-by option in Sqoop?

0 votes

Why do we use the --split-by option in Sqoop?

Apr 11, 2019 in Big Data Hadoop by rashmi
9,625 views

2 answers to this question.

0 votes

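--split-by is an argument to sqoop import rather than a command of its own. It names the table column that Sqoop uses to generate splits, that is, to divide the rows among the parallel map tasks that import the data into the cluster. If you leave it out, Sqoop splits on the table's primary key.

Its purpose is import performance: when the split column's values are evenly distributed, each mapper gets a similar share of the rows and the import parallelizes well.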
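To make "generating splits" concrete, here is a rough sketch of the idea in Python. It assumes an integer split column and mirrors the behaviour described in the Sqoop user guide: Sqoop finds the column's minimum and maximum, cuts that range into one piece per mapper, and each mapper imports only the rows in its piece. The function names split_ranges and where_clauses are made up for this illustration; this is not Sqoop's actual code.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut the closed interval [lo, hi] into contiguous (start, end) pieces.

    Simplified model of an integer splitter, written for illustration only.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    # Evenly spaced cut points between lo and hi; the set drops duplicates
    # that appear when the range is narrower than the number of mappers.
    cuts = sorted({lo + (hi - lo) * i // num_mappers
                   for i in range(num_mappers + 1)})
    if len(cuts) == 1:
        # Every row has the same value, so only one split is possible.
        return [(lo, hi)]
    return list(zip(cuts[:-1], cuts[1:]))


def where_clauses(column, lo, hi, num_mappers):
    """Build one WHERE condition per mapper from the split ranges."""
    ranges = split_ranges(lo, hi, num_mappers)
    clauses = []
    for i, (start, end) in enumerate(ranges):
        # Only the last range includes its upper bound, so the row holding
        # the maximum value is imported exactly once.
        upper_op = "<=" if i == len(ranges) - 1 else "<"
        clauses.append(f"{column} >= {start} AND {column} {upper_op} {end}")
    return clauses


if __name__ == "__main__":
    # Hypothetical table whose id column runs from 0 to 1000, four mappers.
    for clause in where_clauses("id", 0, 1000, 4):
        print(clause)
```

With an id column running from 0 to 1000 and four mappers, this prints four conditions covering 0-250, 250-500, 500-750 and 750-1000, one per mapper.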

answered Apr 11, 2019 by Gitika
• 65,850 points
+2 votes
In simple terms:

We specify --split-by when Sqoop'ing a whole table (effectively select * from table). Such an import has no condition that would partition the rows, so Sqoop needs some logic to divide the data and process it on more than one node.

So we pick a column whose values can be divided into ranges (numeric ranges for integers, alphabetical ranges for character columns), so that a huge amount of the data is not concentrated on one mapper/node.

Big data tools are meant precisely to achieve this kind of parallelism.
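A small Python sketch shows why the choice of column matters. It assumes the splits are equal-width ranges between the column's minimum and maximum, and counts how many rows each mapper would receive. The function rows_per_mapper and the sample data are invented for this illustration and only model the idea; they are not taken from Sqoop.

```python
def rows_per_mapper(values, num_mappers):
    """Count how many values fall into each equal-width range.

    Simplified model: the range [min, max] of the split column is cut into
    num_mappers equal pieces and every row goes to the piece containing
    its value.
    """
    lo, hi = min(values), max(values)
    counts = [0] * num_mappers
    for v in values:
        if hi == lo:
            idx = 0  # all values identical: everything lands in one split
        else:
            # Integer arithmetic avoids rounding issues; the maximum value
            # is clamped into the last range.
            idx = min((v - lo) * num_mappers // (hi - lo), num_mappers - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    even = list(range(1, 1001))            # ids 1..1000, evenly spread
    skewed = [1] * 997 + [500, 900, 1000]  # almost every row shares one value

    print(rows_per_mapper(even, 4))    # [250, 250, 250, 250]
    print(rows_per_mapper(skewed, 4))  # [997, 1, 0, 2]
```

In the skewed case one mapper does almost all the work while the others sit idle, which is exactly the concentration on one mapper/node that a well-chosen split column avoids.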
answered Feb 6, 2020 by Ramji Sridaran
