Unzipping a tar.gz file in an AWS S3 bucket and uploading it back to S3 using Lambda

0 votes

I need to unzip 24 tar.gz files that arrive in my S3 bucket and upload the extracted files to another S3 bucket using Lambda or Glue. The solution should be serverless, and the total size of all 24 files will be at most 1 GB. Is there any way I can achieve that? Below is the Lambda function, which uses an S3 event-based trigger to unzip the files, but I am not able to get the result I want.

import boto3
import botocore
import tarfile
from io import BytesIO

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    input_tar_file = s3_client.get_object(Bucket=bucket, Key=key)
    input_tar_content = input_tar_file['Body'].read()
    uncompressed_key = 'result_files/'
    with tarfile.open(fileobj=BytesIO(input_tar_content)) as tar:
        for tar_resource in tar:
            if tar_resource.isfile():
                inner_file_bytes = tar.extractfile(tar_resource).read()
                s3_client.upload_fileobj(BytesIO(bytes_content), Bucket=bucket, Key=uncompressed_key)

It says bytes_content is not defined. Is it possible to use Lambda or Glue to solve this problem? Any help will be much appreciated.

Dec 3, 2020 in AWS by khyati
• 190 points

edited Dec 3, 2020 by MD 20,334 views

1 answer to this question.

+1 vote

Hi @khyati,

You can do this task using Lambda. Go through the AWS Certification material; it may give you an idea.

answered Dec 3, 2020 by MD
• 95,460 points

Thanks. Right now I am also trying to do the same thing by adding an S3 event-based trigger with the Lambda code below.

from io import BytesIO

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    input_tar_file = s3_client.get_object(Bucket=bucket, Key=key)
    input_tar_content = input_tar_file['Body'].read()
    uncompressed_key = 'result_files/'
    with tarfile.open(fileobj=BytesIO(input_tar_content)) as tar:
        for tar_resource in tar:
            if tar_resource.isfile():
                inner_file_bytes = tar.extractfile(tar_resource).read()
                s3_client.upload_fileobj(BytesIO(bytes_content), Bucket=bucket, Key=uncompressed_key)

But the code above is not working as expected. Can you please help with this part?

Hi @khyati,

Can you tell me what it shows in the output terminal?

Hi MD, it is showing that the name bytes_content is not defined. I am not sure what exactly I am missing. Can you please suggest a way to unzip a tar file in S3 using Lambda?

Hi @khyati,

Why are you using the bytes_content variable? Did you define it anywhere in your code?

Okay, thanks MD. Yes, I had not defined it. I am now using inner_file_bytes, which I did define, but the files are not being extracted properly into the destination bucket, even though the Lambda function does not throw any error. Can you suggest whether any changes are required at the code level?
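
On the last point (no error, but the files do not show up properly): judging only from the code posted above, one likely cause is that every archive member is uploaded to the same key, result_files/, so each upload overwrites the previous one. Below is a minimal sketch of one way to give each member its own key. The helper name iter_tar_members and the DEST_BUCKET environment variable (with its placeholder default) are hypothetical and not part of the original post. The sketch also assumes each archive fits in the memory configured for the Lambda function, since it reads the whole object at once as the original code does.

```python
import io
import os
import posixpath
import tarfile
import urllib.parse

# Hypothetical setting: name of the destination bucket, read from the
# Lambda environment. The default value is only a placeholder.
DEST_BUCKET = os.environ.get("DEST_BUCKET", "my-destination-bucket")


def iter_tar_members(tar_bytes, prefix="result_files/"):
    """Yield (s3_key, file_bytes) for each regular file in a tar or tar.gz archive."""
    # mode "r:*" lets tarfile detect the compression (gzip or none) itself
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            data = tar.extractfile(member).read()
            # Build one distinct key per member; with a single fixed key
            # every upload would replace the one before it.
            name = posixpath.normpath(member.name).lstrip("/")
            yield prefix + name, data


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tried without it
    import boto3

    s3_client = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        for out_key, data in iter_tar_members(body):
            s3_client.put_object(Bucket=DEST_BUCKET, Key=out_key, Body=data)
```

If the extracted files are written to the same bucket that triggers the function, it is probably worth restricting the trigger to a prefix or to the .tar.gz suffix so that the uploads do not invoke the function again.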
