How to build Docker images for multiple SageMaker training applications

I have one ML repository that contains 3 training applications, laid out roughly as follows:

root
  |__Dockerfile
  |__requirements.txt (contains **heavy dependencies**, e.g., numpy, sklearn, etc. needed for all 3 apps)
  |__app_0
  |    |__training_0.py
  |    |__Dockerfile0
  |__app_1
  |    |__training_1.py
  |    |__Dockerfile1
  |__app_2
  |    |__training_2.py
  |    |__Dockerfile2
  |__heavy_utils
       |__utils.py

There are two approaches to building app_0, app_1 and app_2.

  1. One container for multiple apps - Build one container using the Dockerfile at the root. The Dockerfile ends with a few COPY commands:
COPY app_0 .
COPY app_1 .
COPY app_2 .
  2. Multiple containers for multiple apps - Build one container per app using the individual Dockerfile$i inside app_$i.

I tried both approaches, and each has pros and cons.

  1. One container for multiple apps

Pros: When uploading the image to AWS ECR, the total size is smaller because all 3 apps share the same dependencies.
Cons: When I plug the container into SageMaker training jobs, SageMaker cannot tell the 3 apps apart, because a Dockerfile can have only one effective ENTRYPOINT.

  2. Multiple containers for multiple apps

Pros: I can give each SageMaker training job its own ECR image with its own ENTRYPOINT.
Cons: The dependencies are duplicated across those ECR images.
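
One possible way around the single-ENTRYPOINT limitation of approach 1 is a small dispatcher script used as the image's only ENTRYPOINT. The sketch below is an assumption, not part of the setup described above: it relies on SageMaker writing the job's hyperparameters to /opt/ml/input/config/hyperparameters.json inside the training container, and it uses a hypothetical hyperparameter named app to pick which training script to run. The script names match the repository layout; the mapping and function names are illustrative only.

```python
import json
import subprocess
import sys
from pathlib import Path

# Location where SageMaker places the job's hyperparameters in the container.
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")

# Hypothetical mapping from the "app" hyperparameter to a training script.
# Assumes the scripts were copied into the image's working directory.
APP_SCRIPTS = {
    "app_0": "training_0.py",
    "app_1": "training_1.py",
    "app_2": "training_2.py",
}


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Return the hyperparameters as a dict, or {} if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def select_script(hyperparameters, key="app"):
    """Map the chosen app name to its training script, or raise ValueError."""
    name = hyperparameters.get(key)
    if name not in APP_SCRIPTS:
        raise ValueError(f"Unknown or missing '{key}' hyperparameter: {name!r}")
    return APP_SCRIPTS[name]


def main():
    """Run the selected training script and return its exit code."""
    script = select_script(load_hyperparameters())
    return subprocess.call([sys.executable, script])


# Only dispatch when the SageMaker config file is actually present.
if __name__ == "__main__" and HYPERPARAMETERS_PATH.exists():
    sys.exit(main())
```

With this in place, each training job would reuse the same ECR image and differ only in the value of the app hyperparameter, for example {"app": "app_1"}.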

I'd like to know:

  1. Which approach is more conventional, or is there a better practice?
  2. Can I specify a custom ENTRYPOINT for a SageMaker training job (as I can for a processing job) after the Docker image has been built? Specifically, I'm using the SageMaker SDK (sagemaker.estimator.Estimator) to build a SageMaker pipeline. AFAIK, the entry_point option only applies to code outside the container, i.e., it runs an external script from a local path or S3, which behaves differently from entrypoint in sagemaker.processing.Processor.
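
Regarding question 2, the CreateTrainingJob API accepts a ContainerEntrypoint field (and ContainerArguments) inside AlgorithmSpecification, which overrides the image's ENTRYPOINT for that job. The sketch below only builds the request dictionary, so it runs without AWS access. The helper name, instance type, volume size and runtime limit are hypothetical placeholders, and it assumes training_$i.py sits in the image's working directory. Recent versions of the SageMaker Python SDK appear to expose the same setting as container_entry_point on Estimator, but that should be checked against the installed version.

```python
def build_training_job_request(app_index, image_uri, role_arn, output_s3_uri, job_name):
    """Build a CreateTrainingJob request that overrides the container entrypoint.

    All resource values below are illustrative placeholders, not recommendations.
    """
    script = f"training_{app_index}.py"  # assumed to be in the image's WORKDIR
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
            # Per-job override of the ENTRYPOINT baked into the image.
            "ContainerEntrypoint": ["python", script],
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Usage sketch (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   request = build_training_job_request(0, image_uri, role_arn, output_s3_uri, "app-0-job")
#   boto3.client("sagemaker").create_training_job(**request)
```

This keeps a single shared image, as in approach 1, while letting each training job start a different script.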
Dec 13, 2022 in AWS by Ashwini