Hi guys, I'm trying to use the Naive Bayes algorithm on my dataset, which can be downloaded here: https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe

This is my code:

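```python
from nltk.corpus import stopwords  # needs nltk.download('stopwords') once
import pandas as pd
from sklearn import naive_bayes
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

data = pd.read_csv('Hotel_Reviews.csv')  # file name of the Kaggle download (assumed)

# Stopwords (a, and, the) carry little meaning, so drop them.
# Passed as a list: recent scikit-learn versions reject a set here.
stopset = list(set(stopwords.words('english')))

vectorizer = TfidfVectorizer(use_idf=True, lowercase=True, strip_accents='ascii', stop_words=stopset)

# Merge the Negative_Review and Positive_Review columns into one text column.
# The ' ' separator keeps the last word of one review from fusing with the first word of the other.
data['Reviews'] = data['Negative_Review'] + ' ' + data['Positive_Review']

y = data.Reviewer_Score
X = vectorizer.fit_transform(data.Reviews)

# 515738 observations and 83941 unique words
print(y.shape)
print(X.shape)

# Hold out 20% of the rows for testing; random_state=123 fixes the shuffle,
# so every run produces the same train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Train a multinomial Naive Bayes classifier
classifier = naive_bayes.MultinomialNB()
classifier.fit(X_train, y_train)
```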

But after running it, I keep getting this error on the line `classifier.fit(X_train, y_train)`:

```
ValueError: Unknown label type: (array([ 7.5,  9.2,  9.2, ...,  5.8, 10. ,  9.6]),)
```

Dec 16, 2020
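A minimal reproduction suggests the float labels alone trigger this error. It assumes only NumPy and scikit-learn, and uses a tiny synthetic matrix (hypothetical data) in place of the TF-IDF features: `type_of_target` reports scores with fractional parts as `continuous`, and `MultinomialNB.fit` rejects them with a `ValueError`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.multiclass import type_of_target

# Tiny non-negative feature matrix standing in for the TF-IDF matrix (synthetic data).
X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
# Labels shaped like Reviewer_Score: floats with fractional parts.
y = np.array([7.5, 9.2, 5.8, 10.0])

print(type_of_target(y))  # continuous

try:
    MultinomialNB().fit(X, y)
except ValueError as err:
    # Same kind of error as in the question: the labels are not discrete classes.
    print("ValueError:", err)
```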

## 1 answer to this question.

Hi,

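There is a problem with your steps. Before you go for the model, analyze the dataset: check the format and type of each column, and especially of `X_train` and `y_train`. Your `y_train` is `Reviewer_Score`, which holds continuous float scores such as 7.5 and 9.2. `MultinomialNB` is a classifier and expects discrete class labels, so scikit-learn rejects these values with `Unknown label type`. Convert the scores into classes (for example by rounding or binning them) before fitting, or use a regression model if you need to predict the exact score.

Answered Dec 16, 2020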
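One way to turn the scores into classes is sketched below, assuming NumPy and scikit-learn and using small synthetic stand-ins (hypothetical data) for the TF-IDF matrix and `Reviewer_Score`. Rounding yields one integer class per whole score; binning yields a few coarse classes. The cut points 5 and 8 are arbitrary illustrative choices, not values from the dataset.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-ins for X_train (TF-IDF features) and y_train (Reviewer_Score).
X_train = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0],
                    [0.5, 0.5], [2.0, 0.0], [0.0, 1.5]])
y_train = np.array([7.5, 9.2, 5.8, 10.0, 4.2, 8.8])

# Option 1: round each score to the nearest whole number (NumPy rounds .5 to even).
y_rounded = np.rint(y_train).astype(int)
print(type_of_target(y_rounded))  # multiclass

# Option 2: bin scores into coarse classes with hypothetical cut points 5 and 8:
# 0 -> below 5, 1 -> 5 up to (not including) 8, 2 -> 8 and above.
y_binned = np.digitize(y_train, bins=[5.0, 8.0])

# MultinomialNB now accepts the labels because they are discrete classes.
clf = MultinomialNB().fit(X_train, y_binned)
print(clf.classes_)  # [0 1 2]
print(clf.predict(X_train[:2]))
```

On the real data, apply the same mapping to both `y_train` and `y_test` so the classifier is trained and scored on the same classes. Binning into a few classes also avoids having many rare classes, which rounding can produce.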
