Need help using Joins in Pandas using Python

0 votes

Hello all, 

My problem is specifically with the left outer join. As I understand it, the resulting table should never have more rows than the left table, right?

Here's a scenario:

My left table is 200000 rows and 8 columns.

My right table is 50000 rows and 5 columns.

Note that the left table has a field called id, which matches a corresponding column in the right table called key.

For merging these two, I make use of the following code:

combined = pd.merge(a,b,how='left',left_on='id',right_on='key')

And the combined DataFrame ends up with 250000 rows.

Is there anything that I am doing wrong here?

Jan 24, 2019 in Python by Anirudh
• 2,080 points
448 views

1 answer to this question.

0 votes

Hi, there is one scenario where the number of rows increases: when a key in the left DataFrame matches more than one row in the right DataFrame. Each of those matches produces its own row in the result.

Check out the example below:

In [11]: df = pd.DataFrame([[1, 3], [2, 4]], columns=['A', 'B'])

In [12]: df2 = pd.DataFrame([[1, 5], [1, 6]], columns=['A', 'C'])

In [13]: df.merge(df2, how='left')  # merges on the shared column A
Out[13]: 
   A  B    C
0  1  3  5.0
1  1  3  6.0
2  2  4  NaN
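To see whether this is what happened in the question, one option is to count how often each key appears in the right table before merging. The sketch below uses tiny stand-in tables for the asker's a and b; the columns x and y and all the values are made up for illustration, and only id and key come from the question.

```python
import pandas as pd

# Hypothetical miniature versions of the asker's tables `a` and `b`.
a = pd.DataFrame({"id": [1, 2, 3], "x": ["p", "q", "r"]})
b = pd.DataFrame({"key": [1, 1, 2], "y": [10, 11, 20]})

# Number of rows in `b` for each key value.
key_counts = b["key"].value_counts()

# Keys that occur more than once are the ones that multiply left rows.
repeated_keys = key_counts[key_counts > 1]

# In a left join, each left row produces one row per match,
# or a single row filled with NaN when there is no match at all.
matches = a["id"].map(key_counts).fillna(0).astype(int)
expected_rows = int(matches.clip(lower=1).sum())

combined = pd.merge(a, b, how="left", left_on="id", right_on="key")
print(len(a), expected_rows, len(combined))
```

If repeated_keys is non-empty, a result longer than the left table is expected behaviour rather than a bug in the merge call.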

So, to avoid this, we usually make sure to drop the duplicates in df2 first, using the following piece of code:

In [21]: df2.drop_duplicates(subset=['A'])  # pass keep='last' to keep the last duplicate instead
Out[21]: 
   A  C
0  1  5

In [22]: df.merge(df2.drop_duplicates(subset=['A']), how='left')
Out[22]: 
   A  B    C
0  1  3  5.0
1  2  4  NaN
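As a possible safeguard, merge also accepts a validate argument that raises an error when the right-hand keys are not unique, instead of silently adding rows. This is a minimal sketch that reuses df and df2 from above; the variable names duplicate_keys_detected, deduped and result are only for illustration.

```python
import pandas as pd
from pandas.errors import MergeError

df = pd.DataFrame([[1, 3], [2, 4]], columns=["A", "B"])
df2 = pd.DataFrame([[1, 5], [1, 6]], columns=["A", "C"])

# validate="many_to_one" requires the merge keys in the right table to be unique.
try:
    df.merge(df2, how="left", on="A", validate="many_to_one")
    duplicate_keys_detected = False
except MergeError:
    duplicate_keys_detected = True

# After dropping duplicates (here keeping the last row per key),
# the same check passes and the row count matches the left table.
deduped = df2.drop_duplicates(subset=["A"], keep="last")
result = df.merge(deduped, how="left", on="A", validate="many_to_one")
print(duplicate_keys_detected, len(result))
```

Keep in mind that drop_duplicates throws away the other matching rows, so it only makes sense when one row per key is really what you want.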

Hope this helped!

answered Jan 24, 2019 by Nymeria
• 3,560 points
