Unable to use ml library in pyspark

>>> from pyspark.ml.feature import Tokenizer
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/__init__.py", line 22, in 
    from pyspark.ml.base import Estimator, Model, Transformer
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/base.py", line 21, in 
    from pyspark.ml.param import Params
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py", line 26, in 
    import numpy as np
ImportError: No module named numpy
Jul 30 in Apache Spark by Yashita

1 answer to this question.


From the error message you have shared, we can see that the failure is caused by a missing numpy package. Follow the commands below in your terminal to first install pip and then numpy; after that, try importing Tokenizer again.

1. Add the EPEL Repository
Pip is not available in the CentOS 7 core repositories. To install pip, we first need to enable the EPEL repository:

sudo yum install epel-release

2. Install pip
Once the EPEL repository is enabled, we can install pip and all of its dependencies with the following command:

sudo yum install python-pip

3. Verify Pip installation
To verify that pip is installed correctly, run the following command, which prints the pip version:

pip --version

After this, run

pip install numpy

to install the numpy package. Make sure this pip belongs to the same Python interpreter that PySpark uses; if in doubt, run python -m pip install numpy with that interpreter.
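Once numpy is installed, the import should succeed. As a quick sanity check before launching PySpark, you can probe for the required modules with importlib — a minimal sketch (the helper name and the module list are just examples, not part of PySpark):

```python
import importlib.util

def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

# pyspark.ml needs numpy available to the same Python that runs the driver
for mod in ("numpy", "pyspark"):
    status = "OK" if module_available(mod) else "MISSING -- install it with pip"
    print("{}: {}".format(mod, status))
```

If numpy shows as available here, from pyspark.ml.feature import Tokenizer should no longer raise ImportError.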

answered Jul 30 by Karan
