How to read a DataFrame from an Excel sheet containing multiple tables

0 votes

In an Excel sheet I have several tables, each with a unique set of column names. The illustration below demonstrates how the tables are laid out on the sheet:

If I want to read the summary table into a pandas DataFrame, what options are there other than manually setting the rows and columns to read?

Apr 4 in Others by narikkadan
• 63,160 points
945 views

1 answer to this question.

0 votes

Here is one option with scikit-image (highly inspired by @Roshan's answer) :

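import numpy as np
import pandas as pd

# pip install scikit-image
from skimage.measure import label, regionprops

# Read the sheet without a header so every cell keeps its grid position
df = pd.read_excel("wbook.xlsx", sheet_name="Sheet1", header=None)

# Give each connected block of non-empty cells its own integer label
larr = label(np.array(df.notnull()).astype("int"))

list_dfs = []

for s in regionprops(larr):
    # s.bbox is (min_row, min_col, max_row, max_col); the max bounds are exclusive
    # The first row of each block becomes the header and is then dropped
    sub_df = (df.iloc[s.bbox[0]:s.bbox[2], s.bbox[1]:s.bbox[3]]
                .pipe(lambda df_: df_.rename(columns=df_.iloc[0])
                .drop(df_.index[0])))

    list_dfs.append(sub_df)

# Print each extracted table together with its type
for sub_df in list_dfs:
    print(sub_df, type(sub_df))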

Output :

  col1 col2 # <- first DataFrame
2    1   aa
3    2   bb <class 'pandas.core.frame.DataFrame'>

  col3 col4 # <- second DataFrame
7    3   cc
8    4   dd <class 'pandas.core.frame.DataFrame'>

   col5 col6 # <- third DataFrame
9     5   ee
10    6   ff <class 'pandas.core.frame.DataFrame'>
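
If you would rather not install scikit-image, or just want to see what label and regionprops are doing here, the sketch below reproduces the idea with a plain flood fill. It is not the code from the answer above: find_table_bboxes and the small raw DataFrame are hypothetical names made up for this illustration, and the DataFrame stands in for a sheet read with header=None. Diagonal neighbours are treated as connected, which as far as I know matches label's default for 2-D input.

```python
from collections import deque

import pandas as pd


def find_table_bboxes(mask):
    """Return (row_start, col_start, row_stop, col_stop) for each connected
    block of True cells in a 2-D list of booleans (stop bounds exclusive)."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    bboxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited non-empty cell
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                i, j = queue.popleft()
                r0, c0 = min(r0, i), min(c0, j)
                r1, c1 = max(r1, i), max(c1, j)
                # Visit all 8 neighbours (diagonals included)
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < n_rows and 0 <= nj < n_cols
                                and mask[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
            bboxes.append((r0, c0, r1 + 1, c1 + 1))
    return bboxes


# Hypothetical stand-in for pd.read_excel(..., header=None): two tables
# separated by an empty row and an empty column
raw = pd.DataFrame([
    ["col1", "col2", None, None, None],
    [1, "aa", None, None, None],
    [None, None, None, None, None],
    [None, None, None, "col3", "col4"],
    [None, None, None, 3, "cc"],
])

tables = []
for r0, c0, r1, c1 in find_table_bboxes(raw.notna().values.tolist()):
    block = raw.iloc[r0:r1, c0:c1]
    # Promote the first row of the block to the header and renumber the rows
    table = block.iloc[1:].reset_index(drop=True)
    table.columns = block.iloc[0].tolist()
    tables.append(table)
```

Unlike the output above, the row index is reset here, so every table starts at 0. Both approaches share the same limitation: a fully empty row or column inside a table splits it into two blocks, and two tables that touch (even diagonally) are merged into one.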

Spreadsheet used : [image]

answered Apr 4 by Kithuzzz
• 38,010 points
