TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense NumPy array

+1 vote

First, I want to say that this is my first time trying this. Second, I'm not sure I'm posting this question in the right forum. If not, please excuse me.

I'm trying to use Naive Bayes on my data. Click here to download the dataset.

This is my code till now:

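import pandas as pd
import scipy.sparse
from nltk.corpus import stopwords
from sklearn import naive_bayes
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = pd.read_json('/Users/rokayadarai/Desktop/Coding/DataSets/Hotel_Reviews.json')
# stop words (a, and, the) are not useful
stopset = set(stopwords.words('english'))
# stop_words is passed as a list, which TfidfVectorizer accepts
vectorizer = TfidfVectorizer(use_idf=True, lowercase=True, strip_accents='ascii', stop_words=list(stopset))
y = data['Reviewer_Score']
# the rest of this hstack call is cut off in the original post; it is closed here only so the snippet parses
X = scipy.sparse.hstack([vectorizer.fit_transform(data['Negative_Review'])])
# 515738 observations and 106514 unique words
print(y.shape)
print(X.shape)
# split the data: test_size=0.2 holds out 20% for testing, random_state=123 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
# train naive bayes classifier (this fit call is where the TypeError below is raised)
clf = naive_bayes.GaussianNB()
clf.fit(X_train, y_train)
# evaluate the model with ROC AUC
roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])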

And this is the error I get: 

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense NumPy array.

Could somebody please help me out? I'm stuck. I know I'm doing something wrong, but I can't figure out what, and I can't find anything on the Internet that helps.

Dec 16, 2020 in Machine Learning by anonymous
• 170 points

edited Dec 16, 2020 by MD 6,068 views

1 answer to this question.

0 votes

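The problem is with your X variable, which you pass to the classifier as the feature matrix. Before you train your model, check the type and shape of X, for example with print(type(X)) and print(X.shape). The error message indicates that X is a sparse matrix, while GaussianNB requires a dense array. If possible, paste the output here as well so we can help you with the next steps.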
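For illustration, here is a minimal sketch of that check and of two possible ways around the error. The toy reviews and the binary labels are made up for the example and are not the hotel dataset. The sketch assumes that switching to MultinomialNB, which accepts sparse input, is acceptable for TF-IDF features; calling toarray() is the other route, but it is only practical when the matrix is small.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical toy reviews standing in for the review text column.
reviews = [
    "room was dirty and noisy",
    "staff was rude",
    "great location and friendly staff",
    "lovely breakfast and clean room",
]
labels = np.array([0, 0, 1, 1])  # assumed binary target, not the original Reviewer_Score

# TfidfVectorizer returns a SciPy sparse matrix, not a NumPy array.
X = TfidfVectorizer().fit_transform(reviews)
print(type(X), sparse.issparse(X), X.shape)

# GaussianNB does not accept sparse input, which reproduces the error.
gaussian_error = None
try:
    GaussianNB().fit(X, labels)
except TypeError as exc:
    gaussian_error = str(exc)
    print("GaussianNB failed:", gaussian_error)

# Option 1: MultinomialNB works directly on the sparse matrix.
sparse_clf = MultinomialNB().fit(X, labels)
sparse_pred = sparse_clf.predict(X)

# Option 2: densify first. Fine for this toy matrix, risky for large ones.
dense_clf = GaussianNB().fit(X.toarray(), labels)
dense_pred = dense_clf.predict(X.toarray())

print(sparse_pred, dense_pred)
```

If X really has the shape given in the code comment (515738 rows by 106514 columns), a dense float64 copy would need roughly 440 GB, so the first option, or reducing the number of features before converting, is likely the more realistic choice.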
answered Dec 16, 2020 by MD
• 95,440 points
