Unable to use ml library in pyspark

>>> from pyspark.ml.feature import Tokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/__init__.py", line 22, in <module>
    from pyspark.ml.base import Estimator, Model, Transformer
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/base.py", line 21, in <module>
    from pyspark.ml.param import Params
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py", line 26, in <module>
    import numpy as np
ImportError: No module named numpy
Jul 30, 2019 in Apache Spark by Yashita

The last line of the traceback shows the cause: pyspark.ml imports numpy, and numpy is not installed for the Python interpreter that PySpark is using. Run the commands below in your terminal to install pip first and then numpy. After that, try importing Tokenizer again.

1. Add the EPEL Repository
Pip is not available in CentOS 7 core repositories. To install pip we need to enable the EPEL repository:

sudo yum install epel-release

2. Install pip
Once the EPEL repository is enabled we can install pip and all of its dependencies with the following command:

sudo yum install python-pip

3. Verify Pip installation
To verify that pip is installed correctly, run the following command, which prints the pip version:

pip --version

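4. Install numpy
With pip available, install the numpy package:

pip install numpy

If this fails with a permissions error, run it with sudo, or use pip install --user numpy to install it for your own user only.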
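
Before retrying the Tokenizer import, it can help to confirm that numpy is visible to the same Python interpreter that PySpark runs. The sketch below is only an illustration: module_status is a hypothetical helper name, not part of PySpark or numpy. It assumes you run it with the interpreter PySpark uses (the one named by PYSPARK_PYTHON, if you have set that variable, otherwise typically the default python on your PATH).

```python
import importlib
import sys


def module_status(name):
    """Return (is_importable, detail) for the named module in this interpreter.

    module_status is a hypothetical helper written for this answer.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Same failure PySpark hits at "import numpy as np".
        return False, str(exc)
    # Built-in modules have no __file__, so fall back to a placeholder.
    return True, getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # Shows which interpreter is being checked.
    print("Interpreter:", sys.executable)
    ok, detail = module_status("numpy")
    print("numpy importable:", ok, "-", detail)
```

If the script reports that numpy is not importable even though pip install numpy succeeded, pip probably installed the package for a different Python installation than the one PySpark uses.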

Hope this helps!

answered Jul 30, 2019 by Karan

