Need help bootstrapping Python module installation on Amazon EMR

Hi all, pretty simple question.

I need to use a Spark cluster launched from the EMR console. The Spark script I will be running has a local dependency on a certain Python package.

What is the easiest way to go about doing this?

All help appreciated.
Feb 11, 2019 in Python by Anirudh
The easiest way to do this is to create a bash script that contains your installation commands.

Then copy the script to S3 and set up a bootstrap action that points to it (this is done from the console).

Consider the following example:

#!/bin/bash -xe

# Python modules that are not in the standard library and not included
# in the Amazon Machine Image (AMI):
sudo pip install -U \
  awscli            \
  boto              \
  ciso8601          \
  ujson

sudo yum install -y python-psycopg2
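
If you prefer the command line to the console, the upload and bootstrap-action step can be sketched with the AWS CLI roughly as follows. The bucket name, script name, cluster name, release label, and instance settings here are placeholders I made up for illustration, so adjust them to your own setup. The function is only defined, not called, so nothing is created until you run it yourself.

```shell
#!/bin/bash
# Sketch only: all names below are hypothetical placeholders.
BUCKET="my-bucket"
SCRIPT="install-python-modules.sh"
S3_PATH="s3://${BUCKET}/bootstrap/${SCRIPT}"

upload_and_launch() {
  # Copy the local bootstrap script to S3.
  aws s3 cp "$SCRIPT" "$S3_PATH"

  # Launch a Spark cluster that runs the script on every node at startup.
  aws emr create-cluster \
    --name "spark-with-python-modules" \
    --release-label emr-5.20.0 \
    --applications Name=Spark \
    --instance-type m4.large \
    --instance-count 3 \
    --use-default-roles \
    --bootstrap-actions Path="$S3_PATH",Name=install-python-modules
}

echo "Bootstrap script will be read from: $S3_PATH"
```

Call upload_and_launch once the AWS CLI is configured with credentials that are allowed to write to the bucket and create EMR clusters.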

Hope this helped!

answered Feb 11, 2019 by Nymeria
I tried, but I got this error:

sudo: easy_install-3.4: command not found
sudo: /usr/local/bin/pip3: command not found

Hello @akash,

First of all, try pip3 instead of pip:

pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

pip3 should be installed automatically together with Python 3.x. The documentation hasn't been updated, so simply replace pip with pip3 in the instructions (when installing Flask, for example).

Now, if this doesn't work, you might have to install pip separately.
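
One way to make a bootstrap script cope with both cases is to check which pip command exists before using it. This is just a sketch, not an official EMR recipe: the function name find_pip3 is my own, and it assumes that either pip3 or "python3 -m pip" is available on the node. Package names are taken from the script arguments, so nothing is installed when it is run without arguments.

```shell
#!/bin/bash
# Print a pip command for Python 3 that exists on this node.
# Returns non-zero if neither pip3 nor "python3 -m pip" is usable.
find_pip3() {
  if command -v pip3 >/dev/null 2>&1; then
    echo "pip3"
  elif command -v python3 >/dev/null 2>&1 \
       && python3 -m pip --version >/dev/null 2>&1; then
    echo "python3 -m pip"
  else
    return 1
  fi
}

# Install only when package names are passed as arguments,
# e.g. through the Args field of the bootstrap action.
if [ "$#" -gt 0 ]; then
  PIP_CMD="$(find_pip3)" || {
    echo "No pip for Python 3 found; install pip first." >&2
    exit 1
  }
  # PIP_CMD is left unquoted on purpose so "python3 -m pip" splits into words.
  sudo $PIP_CMD install -U "$@"
fi
```

For example, running the script with the arguments ujson ciso8601 would install those two packages with whichever pip command was found.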

Hope it helps!!
Thank you!

edited Jul 8, 2019 by Kalgi
