Decoding issues when reading files of different formats from S3 using boto3

I am trying to read text from files in different formats, such as PDF, DOCX, DOC, and RTF, stored in S3, using boto3.

import boto3

s3 = boto3.client('s3')

bucket = 'my-bucket'
#key = 'file2.doc'
#key =  'file3.docx'
key = 'file1.pdf'  

obj = s3.get_object(Bucket=bucket, Key=key)

file_content = obj['Body'].read().decode('utf-8')
print(file_content)

I am not getting the actual file text. I tried converting the binary content to text and also tried different encodings, but none of them worked. Is there an approach that works for all of these file types?
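
For context on why the decode step fails: PDF, DOCX, and DOC are binary container formats, not encoded text, so no choice of encoding passed to .decode() will yield the document text. One possible direction is to look at the leading bytes of the object and hand it to a format-specific parser. The sketch below is only an illustration under several assumptions: the helper names (detect_format, extract_text, read_s3_text, and so on) are made up for this example, PDF parsing assumes the third-party pypdf package, RTF parsing assumes the third-party striprtf package, and the DOCX path is a simplified standard-library reader that ignores tables, headers, and footnotes. Legacy .doc files are detected but not parsed here, since they generally need an external converter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Leading "magic" bytes that identify each container format.
PDF_MAGIC = b"%PDF"
ZIP_MAGIC = b"PK\x03\x04"        # .docx is a ZIP archive
OLE_MAGIC = b"\xd0\xcf\x11\xe0"  # legacy .doc (OLE2 compound file)
RTF_MAGIC = b"{\\rtf"

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def detect_format(data: bytes) -> str:
    """Guess the file type from its first bytes (hypothetical helper)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    if data.startswith(OLE_MAGIC):
        return "doc"
    if data.startswith(ZIP_MAGIC):
        # Many formats are ZIP archives; only treat it as DOCX if the
        # main document part is present.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "docx" if "word/document.xml" in zf.namelist() else "zip"
    return "unknown"


def docx_to_text(data: bytes) -> str:
    """Simplified DOCX reader: joins the text runs of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def pdf_to_text(data: bytes) -> str:
    """Assumes the third-party 'pypdf' package is installed."""
    from pypdf import PdfReader
    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def rtf_bytes_to_text(data: bytes) -> str:
    """Assumes the third-party 'striprtf' package is installed."""
    from striprtf.striprtf import rtf_to_text
    # RTF markup is mostly 7-bit ASCII; latin-1 maps every byte to a character.
    return rtf_to_text(data.decode("latin-1"))


def extract_text(data: bytes) -> str:
    """Dispatch to a parser based on the detected format."""
    kind = detect_format(data)
    if kind == "pdf":
        return pdf_to_text(data)
    if kind == "docx":
        return docx_to_text(data)
    if kind == "rtf":
        return rtf_bytes_to_text(data)
    if kind == "doc":
        raise NotImplementedError(
            "legacy .doc needs an external converter (e.g. antiword or LibreOffice)"
        )
    if kind == "zip":
        raise ValueError("ZIP archive without word/document.xml")
    # Fall back to treating the bytes as plain text.
    return data.decode("utf-8", errors="replace")


def read_s3_text(bucket: str, key: str) -> str:
    """Download the raw bytes from S3 and extract text without decoding first."""
    import boto3  # imported lazily so the helpers above work without it
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return extract_text(body)
```

With the names from the question, a call would look like read_s3_text('my-bucket', 'file1.pdf'). Detecting by content rather than by the key's extension is a design choice here: it keeps working when an object is stored under a misleading or missing extension.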
May 17, 2024 in Python by anonymous

edited Mar 5 1,186 views