Parsing from Excel multisheet file: List comprehension between columns

0 votes

I'm attempting to parse a large Excel file with numerous sheets. Each sheet has a column that contains the following data (three sheets = three columns):

ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com     gggg@aaa.com   aaaa@aaa.com
bbbb@aaa.com     dddd@aaa.com   rrrr@aaa.com
cccc@aaa.com     rrrr@aaa.com
dddd@aaa.com     aaaa@aaa.com
eeee@aaa.com     oooo@aaa.com
ffff@aaa.com
gggg@aaa.com
rrrr@aaa.com
qqqq@aaa.com
oooo@aaa.com

What I'm looking for is a single table that keeps the first column of sheet one, ReceivedEmail, which lists everyone we mass e-mailed. Instead of repeating the emails from the subsequent sheets' first columns, I want to use a list comprehension to check whether each address in ReceivedEmail also appears in OpenedEmail, outputting 1 if it does and 0 otherwise.

Here's what I did so far:

import pandas as pd

xl = pd.ExcelFile(path_to_file)
xl.sheet_names
# ['ReceivedEmail', 'OpenedEmail', 'ClickedURL']

# Read the first sheet and name its only column after the sheet
df = xl.parse(sheet_name=xl.sheet_names[0], header=None)
df.rename(columns={df.columns[0]: xl.sheet_names[0]}, inplace=True)
df.columns[0]
# 'ReceivedEmail'

# Then I created a buffer DataFrame to check the next columns
df_buffer = xl.parse(sheet_name=xl.sheet_names[1], header=None)
df_buffer.rename(columns={df_buffer.columns[0]: xl.sheet_names[1]}, inplace=True)

But then when I run list comprehension like this:

df[df_buffer.columns[0]] = [1 if x in df[df.columns[0]] else 0 for x in df_buffer[df_buffer.columns[0]]]

I get an error:

ValueError: Length of values does not match length of index

Any clue how to solve this error or handle the problem in a smarter way? I am doing this manually to see if it works, and then I could turn it into a loop later, but I am stuck on the error.

End result should be:

ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com         1              1
bbbb@aaa.com         0              0      
cccc@aaa.com         0              0     
dddd@aaa.com         1              0
eeee@aaa.com         0              0    
ffff@aaa.com         0              0  
gggg@aaa.com         1              0
rrrr@aaa.com         1              1
qqqq@aaa.com         0              0
oooo@aaa.com         1              0
Nov 25, 2022 in Others by Kithuzzz
• 38,020 points
574 views

1 answer to this question.

0 votes

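You can use read_excel with the parameter sheet_name=None to return all sheets as a dictionary of DataFrames keyed by sheet name (older pandas versions spelled the parameter sheetname and returned an OrderedDict):

Note:

Each sheet has one column.

import pandas as pd

dfs = pd.read_excel('file.xlsx', sheet_name=None)
print(dfs)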
OrderedDict([('ReceivedEmail',               a
0  aaaa@aaa.com
1  bbbb@aaa.com
2  cccc@aaa.com
3  dddd@aaa.com
4  eeee@aaa.com
5  ffff@aaa.com
6  gggg@aaa.com
7  rrrr@aaa.com
8  qqqq@aaa.com
9  oooo@aaa.com), ('OpenedEmail',               a
0  gggg@aaa.com
1  dddd@aaa.com
2  rrrr@aaa.com
3  aaaa@aaa.com
4  oooo@aaa.com), ('ClickedURL',               a
0  aaaa@aaa.com
1  rrrr@aaa.com)])

Then join the sheets together with concat and restore the sheet order by selecting [cols]. For each column from the second onward, check which values of the first column it contains with isin. Finally, convert the boolean mask to integers.

cols = list(dfs.keys())
df = pd.concat({k: v.iloc[:, 0] for k, v in dfs.items()}, axis=1)[cols]

df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: df.iloc[:, 0].isin(x)).astype(int)
print (df)
  ReceivedEmail OpenedEmail ClickedURL
0  aaaa@aaa.com           1          1
1  bbbb@aaa.com           0          0
2  cccc@aaa.com           0          0
3  dddd@aaa.com           1          0
4  eeee@aaa.com           0          0
5  ffff@aaa.com           0          0
6  gggg@aaa.com           1          0
7  rrrr@aaa.com           1          1
8  qqqq@aaa.com           0          0
9  oooo@aaa.com           1          0
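
As for the ValueError in the question, it most likely comes from iterating over OpenedEmail (5 values) while assigning into a DataFrame with 10 rows. In addition, the in operator on a pandas Series tests the index labels, not the values, so the check would return 0 for every address even if the lengths matched. Below is a minimal sketch of a corrected list comprehension. It assumes the sample data from the question, held in memory instead of read from the Excel file; the names received, opened, clicked, lookup and result are only for illustration.

```python
import pandas as pd

# Sample data standing in for the three sheets (copied from the question).
received = pd.Series([
    "aaaa@aaa.com", "bbbb@aaa.com", "cccc@aaa.com", "dddd@aaa.com",
    "eeee@aaa.com", "ffff@aaa.com", "gggg@aaa.com", "rrrr@aaa.com",
    "qqqq@aaa.com", "oooo@aaa.com",
])
opened = pd.Series([
    "gggg@aaa.com", "dddd@aaa.com", "rrrr@aaa.com",
    "aaaa@aaa.com", "oooo@aaa.com",
])
clicked = pd.Series(["aaaa@aaa.com", "rrrr@aaa.com"])

# `in` on a Series looks at the index labels (0..9 here), not the values.
print("aaaa@aaa.com" in received)         # False
print("aaaa@aaa.com" in received.values)  # True

result = pd.DataFrame({"ReceivedEmail": received})
for name, sheet in {"OpenedEmail": opened, "ClickedURL": clicked}.items():
    # A set gives fast membership tests on the actual values.
    lookup = set(sheet.dropna())
    # Iterate over ReceivedEmail so the list has one entry per row of `result`.
    result[name] = [1 if email in lookup else 0 for email in result["ReceivedEmail"]]

print(result)
```

The loop produces the same 0/1 columns as the isin approach above; isin is usually the more idiomatic choice, while the comprehension stays closer to the original attempt.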
answered Nov 25, 2022 by narikkadan
• 63,720 points
