Please describe the best method to extract specific text from a PDF and input that text into a C application text field

I would like to know the best method of extracting specific data (text) from a PDF and entering that data into a text field of a tax software application.

A user fills in the PDF form section by section with "answers", e.g. 1/ Question... [Text Answer]

I need to extract answer 1 and enter it into the corresponding text field of the tax software (1/ Answer 1).
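If the whole text of the completed form can be extracted and each answer keeps the layout shown above, the answers can then be split out by question number with a regular expression. The Python sketch below assumes one question per line in the form 1/ Question... [Answer]; that layout, the pattern, the function name `parse_answers` and the sample text are all illustrative assumptions, not taken from any particular RPA tool.

```python
import re

# Assumption: one question per line, "<number>/ <question text> [<answer>]".
# Lines with no bracketed answer (unanswered questions) are skipped.
ANSWER_LINE = re.compile(r"^[ \t]*(\d+)/[^\[\n]*\[([^\]\n]*)\]", re.MULTILINE)


def parse_answers(text: str) -> dict[int, str]:
    """Map each question number to its bracketed answer, with whitespace trimmed."""
    return {int(number): answer.strip() for number, answer in ANSWER_LINE.findall(text)}


if __name__ == "__main__":
    # Hypothetical sample of extracted form text.
    sample = (
        "1/ What is your business code? [D1]\n"
        "2/ Any comments?\n"
        "3/ Total income for the year? [ 45000 ]\n"
    )
    print(parse_answers(sample))  # {1: 'D1', 3: '45000'}
```

Anchoring each match to the start of a line keeps an unanswered question from "borrowing" the answer on the next line.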

Any advice is appreciated.

Sep 14, 2020 in RPA by Chris
• 130 points • 4,738 views

That all works great. I have no issues getting the text, either in bulk or from specific areas of a PDF, into a variable and displaying it in a Message Box, so step 1 is complete.

Step 2 is entering the text into the tax software application. Could you provide a link or reference I could look at, please?

I think I will need to loop the extraction of text from the PDF and the input of text into the tax software until all the answers are filled in.
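The loop itself can be kept separate from the UI automation: walk a fixed mapping from question number to target field, type each extracted answer, and report which questions had no answer. A Python sketch under that assumption; `FIELD_FOR_QUESTION`, the field names and the `type_into` callback are hypothetical placeholders for whatever step the RPA tool actually uses to enter text (such as a Type Into activity).

```python
from typing import Callable

# Hypothetical mapping: question number in the PDF -> field name in the tax software.
FIELD_FOR_QUESTION = {1: "Answer 1", 2: "Answer 2", 3: "Answer 3"}


def fill_all(
    answers: dict[int, str],
    field_for_question: dict[int, str],
    type_into: Callable[[str, str], None],
) -> list[int]:
    """Type every extracted answer into its mapped field, in question order.

    Returns the question numbers that had no extracted answer, so the caller
    can decide whether the form is complete.
    """
    missing = []
    for number in sorted(field_for_question):
        if number not in answers:
            missing.append(number)
            continue
        type_into(field_for_question[number], answers[number])
    return missing


if __name__ == "__main__":
    typed = []
    # Stand-in for the real UI automation step: just record what would be typed.
    missing = fill_all({1: "D1", 3: "45000"}, FIELD_FOR_QUESTION,
                       lambda field, text: typed.append((field, text)))
    print(typed)    # [('Answer 1', 'D1'), ('Answer 3', '45000')]
    print(missing)  # [2]
```

Passing the typing step in as a callback means the loop can be tested without the tax application open.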

Thanks

commented Sep 15, 2020 by Chris
• 130 points

If the PDFs are in a structured format, use PDF integration and object cloning.

commented Sep 15, 2020 by Lokesh chandra

I'm not sure how that would work.

If I have the PDF data (text) I need in String variables, including the Int32 values I convert to String, is it OK to use the "Type Into" activity to select the field in the business application and send it the string variable captured from the PDF?

I have captured this from the PDF.

I used this expression to display it in a Message Box, just for testing:

"D1Code: " + D1Code.ToString + Environment.NewLine +
"D1Total: " + D1Total.ToString + Environment.NewLine +
"D2Total: " + D2Total.ToString + Environment.NewLine +
"D2Comments: " + D2Comments.ToString + Environment.NewLine +
"D3Total: " + D3Total.ToString + Environment.NewLine
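For testing, the same summary can be produced without writing each concatenation by hand: keep the captured values in one name-to-value collection and format them in a loop, which also keeps the label spacing consistent. A Python sketch of the idea; the dictionary and its sample values are hypothetical, only the field names come from the expression above, and "\n" stands in for Environment.NewLine.

```python
def format_summary(fields: dict[str, object]) -> str:
    """Build one "Name: value" line per captured field, in insertion order."""
    # Formatting each value in the f-string plays the role of .ToString in the
    # expression above, so Int32 totals and String comments can be mixed.
    return "\n".join(f"{name}: {value}" for name, value in fields.items())


if __name__ == "__main__":
    # Hypothetical sample values; the names mirror the variables above.
    captured = {"D1Code": "A12", "D1Total": 150, "D2Total": 75,
                "D2Comments": "n/a", "D3Total": 0}
    print(format_summary(captured))
```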

Are there other options?

commented Sep 15, 2020 by Chris
• 130 points

Hey @Chris, did that approach work?

commented Sep 15, 2020 by Sirajul
• 59,190 points

Hi,

It worked in Notepad as a test. However, I went backwards: when trying to grab the specific text in the PDF, I am now getting different errors such as "Legacy Chrome ..." (even if I am using Edge)

and this one when I try to open the PDF and read it using Edge.

I'm not sure how to fix this one. Is it a dependency issue? What is the best option for opening the PDF: should I use Chrome, Edge, or something else? Adobe Reader didn't seem to work either.

Thanks

commented Sep 16, 2020 by Chris
• 130 points

@Chris, could you please post the complete error that you have encountered?

commented Sep 16, 2020 by Sirajul
• 59,190 points

Here are the steps broken down.

Here is a snip of the PDF. The red boxes indicate the areas where users can enter text and which I want to get. I made the PDF myself, so I can change it if that helps.

I have stripped this to the bare bones. Process-Sequence

Activate:

Variables are set as String or Int32 and converted via ToString in the Message Box.

I have used Edge, Chrome, AVG Browser, and Adobe Reader. I can use whichever is recommended.

Result: the Message Box is displayed at the end of the run, but it is either empty or contains the words "Chrome Legacy Window". If I grab the full text, it works and includes the text entered by the users, with no problems. With screen scraping, only OCR with Tesseract seems to work.

I am open to suggestions for the best way to do this, as I have the flexibility to follow best practice for the best results.

Keep in mind that I need to put the results (the variables) into another application after this.

commented Sep 17, 2020 by Chris
• 130 points

You are most probably getting this exception because you used a partial selector. Try using a full selector instead.

Hope this helps!

commented Sep 17, 2020 by Vivek

The selector details didn't make any difference, but I added a hotkey to set the PDF to "Actual Size" before reading the text, and this now works consistently. I can display the text (captured in a variable) in a Message Box or a text file.

Part 2: I can open the business tax application, navigate, and input the text into the correct field, but it seems to lose focus after the first input, and the text also won't save in the application. It is not consistent either: sometimes it inputs the text in the wrong spot.

I am using the "Type Into" activity. This seems to work, but again the UI element seems to lose focus. I have checked and validated the selectors. I have experimented with clicking in the element first and with tabbing through to the correct field, but I get inconsistent results.

Any ideas (there are no errors) to help make this solid so it works 100% of the time?
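One common way to make this kind of input more robust, whichever RPA tool is used, is to verify each field after typing: focus the field, type the text, read the value back, and retry a limited number of times if it does not match. A Python sketch of that control flow, assuming three hypothetical callables that wrap the tool's own click, type and read-text steps (in UiPath these would roughly correspond to the Click, Type Into and Get Text activities); the `FlakyField` class only simulates a field that loses the first input.

```python
import time
from typing import Callable


def type_and_verify(
    focus: Callable[[], None],
    type_text: Callable[[str], None],
    read_back: Callable[[], str],
    text: str,
    attempts: int = 3,
    delay_s: float = 0.5,
) -> bool:
    """Focus a field, type text, and confirm the field now holds that text.

    Retries up to `attempts` times, pausing `delay_s` seconds between tries
    so a slow application can settle. Returns True once the read-back value
    matches, False if every attempt failed.
    """
    for attempt in range(attempts):
        focus()
        type_text(text)
        if read_back() == text:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


class FlakyField:
    """Simulated text field that drops the first `failures` inputs it receives."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.value = ""

    def focus(self) -> None:
        pass

    def type_text(self, text: str) -> None:
        if self.failures > 0:
            self.failures -= 1  # simulate lost focus: the input goes nowhere
            return
        self.value = text  # assume typing replaces the field's existing content

    def read_back(self) -> str:
        return self.value


if __name__ == "__main__":
    field = FlakyField(failures=1)
    ok = type_and_verify(field.focus, field.type_text, field.read_back, "45000", delay_s=0)
    print(ok, field.value)  # True 45000
```

Reading the value back turns a silent focus loss into a False result the workflow can act on (retry, log, or stop) instead of moving on to the next field with the wrong content.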