Unable to use ml library in pyspark

0 votes
>>> from pyspark.ml.feature import Tokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/__init__.py", line 22, in <module>
    from pyspark.ml.base import Estimator, Model, Transformer
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/base.py", line 21, in <module>
    from pyspark.ml.param import Params
  File "/usr/lib/spark-2.1.1-bin-hadoop2.7/python/pyspark/ml/param/__init__.py", line 26, in <module>
    import numpy as np
ImportError: No module named numpy
Jul 30, 2019 in Apache Spark by Yashita

1 answer to this question.

0 votes

The traceback shows that the import fails because the numpy package is not installed: pyspark/ml/param/__init__.py runs "import numpy as np" as soon as pyspark.ml is loaded. Run the commands below in your terminal to install pip first and then numpy, and after that try importing Tokenizer again.

1. Add the EPEL Repository
Pip is not available in CentOS 7 core repositories. To install pip we need to enable the EPEL repository:

sudo yum install epel-release

2. Install pip
Once the EPEL repository is enabled we can install pip and all of its dependencies with the following command:

sudo yum install python-pip

3. Verify Pip installation
To verify that pip is installed correctly, run the following command, which prints the pip version:

pip --version

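4. Install numpy
With pip in place, install the numpy package. Make sure this pip belongs to the same Python interpreter that PySpark runs, otherwise the import will keep failing:

pip install numpy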
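
To confirm that the installation worked, a small check like the one below can help. This is only a sketch: the helper name check_modules is made up for illustration and is not part of PySpark. It attempts to import each module and reports whether the import succeeded, and it also prints which Python interpreter is running so you can compare it with the one PySpark uses.

```python
import importlib
import sys


def check_modules(names):
    """Return {module name: True/False} depending on whether it imports.

    check_modules is a hypothetical helper written for this answer.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # Covers "No module named ..." for the module or its dependencies.
            results[name] = False
    return results


if __name__ == "__main__":
    # Shows which interpreter is being checked.
    print("Interpreter: %s" % sys.executable)
    for name, ok in sorted(check_modules(["numpy", "pyspark.ml.feature"]).items()):
        print("%s: %s" % (name, "OK" if ok else "MISSING"))
```

Run it with the same interpreter that PySpark uses (for example, the one that the PYSPARK_PYTHON environment variable points to, if you have set it). If numpy is reported as MISSING, pip most likely installed it for a different Python.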

Hope this helps!



answered Jul 30, 2019 by Karan
