Convert PDF to XLS

0 votes

I want to convert PDF files into CSV or XLS. I tried doing this with the Python tabula library:

#!/usr/bin/env python3
import tabula

# Read pdf into list of DataFrame
df = tabula.read_pdf("File1.pdf", pages='all')

# convert PDF into CSV file
tabula.convert_into("File1.pdf", "File1.csv", output_format="csv", pages='all')

# convert all PDFs in a directory
#tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
Dec 15, 2022 in Others by Kithuzzz

1 answer to this question.

0 votes

For the imported data to make sense, tabula's parameters have to be adjusted to the layout of the PDF. The values I gave in the comments were only illustrative. To find the x-coordinates at which each column starts, use either the paid version of Acrobat or some trial and error.
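Because the coordinates are usually found by trial and error, it can help to sanity-check them before each run. The sketch below is only an illustration: `check_layout` is a hypothetical helper, not part of tabula. It assumes the conventions of tabula's `read_pdf`, where `area` is (top, left, bottom, right) and `columns` lists the x-coordinates of the column boundaries, and it assumes that every boundary produces a split.

```python
def check_layout(area, columns):
    """Validate a tabula-style layout (hypothetical helper, not a tabula function).

    area    -- (top, left, bottom, right) in PDF points
    columns -- x-coordinates of the column boundaries, in PDF points
    Returns the number of columns expected in the extracted table.
    """
    top, left, bottom, right = area
    if not (top < bottom and left < right):
        raise ValueError(f"area is not a valid rectangle: {area}")
    if list(columns) != sorted(columns):
        raise ValueError("column boundaries must be in ascending order")
    outside = [x for x in columns if not left < x < right]
    if outside:
        raise ValueError(f"column boundaries outside the area: {outside}")
    # Assumption: n boundaries split the area into n + 1 columns.
    return len(columns) + 1
```

If the returned count does not match the number of column names plus the columns you intend to drop, the boundaries probably need another adjustment.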

So the code would look like this.

Import and setup

import tabula
import pandas as pd
pdf_file='file1.pdf'
column_names=['Product','Batch No','Machin No','Time','Date','Drum/Bag No','Tare Wt.kg','Gross Wt.kg',
              'Net Wt.kg','Blender','Remarks','Operator']
df_results=[] # store results in a list

The pages have to be handled separately because they do not all share the same format. Some cleanup is also needed, such as dropping unneeded columns and discarding the rows after a particular value (see the page 2 processing).
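The "discard the rows after a particular value" step can be isolated as a small function before it appears in the page-specific code below. This is a minimal sketch, not the answer's exact code: `trim_from_marker` is a hypothetical name, and unlike the inline version it returns the table unchanged when the marker is missing instead of raising an error.

```python
import pandas as pd


def trim_from_marker(df, marker, column=0):
    """Return the rows of df that come before the first row where `column` equals `marker`."""
    mask = (df[column] == marker).to_numpy()
    if not mask.any():
        return df  # marker absent: keep the whole table
    # argmax on a boolean array gives the position of the first True value
    return df.iloc[:mask.argmax()]
```

With the data in this answer, the call would be along the lines of `trim_from_marker(df2[0], 'Sta')`.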

# Page 1 processing
try:
    # area is (top, left, bottom, right); columns are the x-coordinates of the column boundaries
    df1 = tabula.read_pdf(pdf_file, pages=1, area=(95, 20, 800, 840),
                          columns=[93, 180, 220, 252, 310, 315, 333, 367,
                                   410, 450, 480, 520],
                          pandas_options={'header': None})
    df1[0] = df1[0].drop(columns=5)  # drop the unneeded column
    df1[0].columns = column_names
    df_results.append(df1[0])
    print(df1[0].head(2))  # preview the first rows
except Exception as e:
    print(f"Page 1 processing failed: {e}")

# Page 2 processing
try:
    df2 = tabula.read_pdf(pdf_file, pages=3, area=(10, 20, 800, 840),
                          columns=[93, 180, 220, 252, 310, 315, 330, 370,
                                   410, 450, 480, 520],
                          pandas_options={'header': None})

    # Keep only the rows above the first row whose first cell is 'Sta'
    row_with_Sta = df2[0][df2[0][0] == 'Sta'].index.tolist()[0]
    df2[0] = df2[0].iloc[:row_with_Sta]
    df2[0] = df2[0].drop(columns=5)  # drop the unneeded column
    df2[0].columns = column_names
    df_results.append(df2[0])
    print(df2[0].head(2))  # preview the first rows
except Exception as e:
    print(f"Page 2 processing failed: {e}")

#result = pd.concat([df1[0],df2[0]]) # concatenate both pages and then write to CSV
if df_results:
    result = pd.concat(df_results)  # concatenate the list of pages and then write to CSV
    result.to_csv("result.csv")
else:
    print("No pages were read, so nothing was written")
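Since the question also asks for XLS output, the final write step could choose the format from the file extension. The sketch below is an assumption-laden illustration: `save_table` is a hypothetical helper, and the `.xlsx` branch assumes the openpyxl package is installed, which pandas uses to write that format. It targets `.xlsx` rather than the legacy `.xls` format, because recent pandas versions can no longer write `.xls`.

```python
from pathlib import Path

import pandas as pd


def save_table(df, path):
    """Write df as CSV or Excel, depending on the extension of `path` (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, index=False)
    elif suffix == ".xlsx":
        # Assumption: openpyxl is installed; pandas needs it to write .xlsx files.
        df.to_excel(path, index=False)
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path
```

With the `result` DataFrame from the code above, `save_table(result, "result.xlsx")` would produce a workbook that Excel can open. Note that this sketch omits the index column, whereas the `to_csv` call above writes it.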


answered Dec 15, 2022 by narikkadan
