Bottleneck when uploading a large number of files to a GCP bucket in a short time

0 votes

I have a GCP bucket and need to upload 10 million files to it (each about 50 KB) within 8 hours or less. I am currently using a Java program (based on Google's reference code). In a test with 1,000 images, it uploads each file in about 300 milliseconds. With multi-threading (20 threads), I reduced the average time per file to 40 milliseconds. I can go up to 60 threads and bring that down to 15-20 milliseconds, but I still face 3 problems:

  1. 20 milliseconds per file isn't fast enough. I need an average of 3 milliseconds per file or less.

  2. It throws a "com.google.cloud.storage.StorageException: Connect timed out" exception when I exceed 25 threads.

  3. Beyond 60 threads, the program doesn't get any faster (I am guessing this is a hardware constraint).
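
The 3-millisecond target in point 1 follows directly from the deadline: 8 hours is 28,800 seconds, so 10 million files leave 2.88 ms per file on average, or roughly 347 files per second. The sketch below works through that arithmetic and adds a rough estimate of how many uploads would need to be in flight at once, assuming (as an illustration only) that each upload keeps taking about 300 ms under load. The class and method names are hypothetical and not part of any Google library.

```java
import java.time.Duration;

/** Back-of-the-envelope numbers for the upload described in the question. */
final class UploadBudget {

    /** Largest average time per file, in milliseconds, that still meets the deadline. */
    static double msPerFileBudget(long fileCount, Duration deadline) {
        return (double) deadline.toMillis() / fileCount;
    }

    /** Sustained upload rate, in files per second, needed to meet the deadline. */
    static double requiredFilesPerSecond(long fileCount, Duration deadline) {
        return (double) fileCount / deadline.getSeconds();
    }

    /**
     * Little's law: uploads in flight = rate (files/s) * latency per upload (s).
     * Assumes the per-upload latency does not grow as concurrency increases,
     * which may not hold in practice.
     */
    static double requiredConcurrency(double filesPerSecond, Duration perUploadLatency) {
        return filesPerSecond * perUploadLatency.toMillis() / 1000.0;
    }

    public static void main(String[] args) {
        long files = 10_000_000L;                   // from the question
        Duration deadline = Duration.ofHours(8);    // from the question
        Duration latency = Duration.ofMillis(300);  // single-threaded figure from the question

        double rate = requiredFilesPerSecond(files, deadline);
        System.out.println("Budget per file (ms): " + msPerFileBudget(files, deadline));
        System.out.println("Required rate (files/s): " + rate);
        System.out.println("Uploads in flight needed: " + requiredConcurrency(rate, latency));
    }
}
```

Under that latency assumption, about 104 uploads would have to be in flight at all times, which is more than the 60 threads where the program stops scaling. That suggests either lowering per-upload latency or spreading the work across several machines.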

Mar 14, 2022 in GCP by Rahul
• 3,380 points
951 views

1 answer to this question.

0 votes
You may be hitting a hotspot on Cloud Storage, which can happen when object names are sequential. Add a hash prefix to each filename, ahead of the sequential part.
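
One way to add such a hash is sketched below. It assumes the prefix is taken from the MD5 digest of the original object name and joined with an underscore. The class name, the separator, and the choice of MD5 are illustrative assumptions, not something the answer specifies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Builds object names that start with a short hash instead of a sequential counter. */
final class HashedObjectName {

    /**
     * Returns "<first prefixHexChars hex digits of MD5(objectName)>_<objectName>".
     * MD5 is used only to scatter names, not for security.
     */
    static String withHashPrefix(String objectName, int prefixHexChars) {
        if (prefixHexChars < 1 || prefixHexChars > 32) {
            throw new IllegalArgumentException("prefixHexChars must be between 1 and 32");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(objectName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.substring(0, prefixHexChars) + "_" + objectName;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sequential names; the prefixed names no longer sort in upload order.
        for (int i = 1; i <= 3; i++) {
            String name = String.format("image_%08d.jpg", i);
            System.out.println(withHashPrefix(name, 6));
        }
    }
}
```

Because the prefix is derived from the name itself, the stored object name can always be recomputed from the original filename, so no lookup table is needed.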

Also change the following parameters in your .boto file (/User/Username/.boto), which is the configuration file read by gsutil:

parallel_composite_upload_threshold = 120

parallel_composite_upload_component_size = 50M
answered Mar 17, 2022 by Korak
• 5,820 points
