UiPath: I want to capture specific data from unstructured scanned PDF invoices and export it to an Excel sheet

Question

We have business invoices in the form of scanned PDFs. The PDFs come from different vendors, so their layouts differ from one another. We want to export all the data, such as Invoice Number, Invoice Date, and item details, in tabular form.
I am using the UiPath RPA tool for this problem.
Thanks,

Ashish Soni

Sir, I have the Blue Prism platform, so it would be helpful if you could find a solution for this problem in Blue Prism. — Subikshaa, Mar 25, 2020
Hey @Subhiksha, if you want to extract data from a PDF file to Excel, then:

map the PDF so that it opens in a browser,

then select the PDF file you need to extract data from,

you can read the data from the PDF after you spy those elements in HTML or region mode. — Mar 26, 2020

Sirajul · Answer 1 · Mar 20, 2020

You could do the following:

Install the UiPath.PDF.Activities package.
Once you install that package, you will see the PDF activities in the Activities pane.
You can use the Read PDF Text or Read PDF With OCR activity for your requirement.
You can then write the result to Excel using the Write Range or Write Cell activity.

For more info refer to https://www.edureka.co/blog/uipath-pdf-data-extraction/

answered Mar 20, 2020 by Sirajul
• 59,230 points

Thanks for the answer. These activities fetch the whole document's data, but my requirement is to fetch specific data whose position in the PDF is not fixed. For example, in one PDF the invoice number is displayed at the top right, while in another it appears in the middle of the document. I guess my problem is a candidate for IntelligentOCR, but I am not sure.

commented Mar 20, 2020 by Ashish
• 200 points
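
One way to picture the "position is not fixed" problem outside UiPath: once the whole page has been read as text, a field can be located by its label rather than by its coordinates. The following is only a minimal Python sketch of that idea, not a UiPath workflow. The label patterns, the function name `extract_fields`, and the sample texts are all assumptions made for illustration; real vendor invoices would need their own label variants.

```python
import re

# Hypothetical label patterns (assumptions): real invoices will need
# vendor-specific variants of these labels and formats.
INVOICE_NO = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    re.IGNORECASE,
)
INVOICE_DATE = re.compile(
    r"(?:invoice\s*)?date\s*[:\-]?\s*(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})",
    re.IGNORECASE,
)


def extract_fields(ocr_text):
    """Find labelled fields anywhere in the text, regardless of position."""
    number = INVOICE_NO.search(ocr_text)
    date = INVOICE_DATE.search(ocr_text)
    return {
        "invoice_number": number.group(1) if number else None,
        "invoice_date": date.group(1) if date else None,
    }


# Made-up sample texts: the same fields appear in different places.
text_a = "ACME Ltd\nInvoice No: INV-1001\nDate: 12/03/2020\nWidget 2 10.00"
text_b = "Other Vendor\nWidget 1 5.00\nInvoice Date - 05-02-2020\nInvoice # 7788"

fields_a = extract_fields(text_a)
fields_b = extract_fields(text_b)
```

The same label-based matching could in principle be applied to the string that Read PDF Text or an OCR activity returns, but whether that is sufficient depends on OCR quality and on how consistently vendors label their fields.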

I guess you should probably use the Get OCR Text activity; with it you will be able to find a specific field value. Have a look at this for more details: https://docs.uipath.com/studio/docs/example-of-using-ocr-and-image-automation

commented Mar 20, 2020 by Sirajul
• 59,230 points

I used Get OCR Text and it successfully fetches a specific value, but how can we fetch a table with multiple rows and columns from a PDF? This activity only lets you select specific text. Suppose one PDF lists only one product and another lists two or more. In that case the script will fail.

PDF #1 has the following table: