Hi guys, first I wanted to say that this is my first time trying this. Second, I'm not sure I'm posting this question in the right forum. If it's not, please excuse me.

I'm trying to use Naive Bayes on my data. The dataset can be downloaded from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe . Here is my code so far:

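```python
import pandas as pd
from nltk.corpus import stopwords  # requires nltk.download('stopwords') once
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

data = pd.read_json('/Users/rokayadarai/Desktop/Coding/DataSets/Hotel_Reviews.json')
# stopwords are not useful words (like: a, and, the)
# TfidfVectorizer expects a list here; recent scikit-learn versions reject a set
stopset = list(set(stopwords.words('english')))
vectorizer = TfidfVectorizer(use_idf=True, lowercase=True, strip_accents='ascii', stop_words=stopset)
y = data["Reviewer_Score"]
x = vectorizer.fit_transform(['Negative_Review', 'Positive_Review'])
# 515738 observations and 2(?) unique words
print(y.shape)
print(x.shape)
# split the data: test_size=0.2 keeps 20% of the rows for testing;
# random_state=123 makes the split the same on every run
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=123)
```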

When I try to run this, I get the error: `ValueError: Found input variables with inconsistent numbers of samples: [2, 515738]`. Could somebody please help me out? I'm stuck and can't find anything on the Internet that helps.

Dec 15, 2020
edited Dec 15, 2020

## 1 answer to this question.

Hi,

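You are asking your query in the right place. The error comes from the shapes of x and y: `train_test_split` needs the same number of rows (samples) in both. Check `x.shape`. In your code it is (2, 2), because `vectorizer.fit_transform(['Negative_Review', 'Positive_Review'])` vectorizes the two column *names* as two tiny documents instead of the review text stored in those columns, while y has 515738 entries. Pass the review text itself (one document per row) so that x also has 515738 rows.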
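A minimal sketch of why the shapes differ and one way to fix it. The small DataFrame is made up and only mimics the Kaggle column names (`Negative_Review`, `Positive_Review`, `Reviewer_Score`). It uses scikit-learn's built-in `stop_words="english"` instead of NLTK so it runs without a download. Joining the two text columns into one document per review is one choice among several, and the `>= 7` threshold that turns the continuous score into class labels is an arbitrary assumption. Scikit-learn's Naive Bayes classifiers reject continuous targets, so some discretisation like this is needed.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for the Kaggle data: same column names, made-up rows.
data = pd.DataFrame({
    "Negative_Review": ["Room was small", "No Negative", "Noisy street outside",
                        "Breakfast was cold", "No Negative", "Dirty bathroom",
                        "Staff were rude", "No Negative", "Wifi did not work",
                        "Bed was uncomfortable"],
    "Positive_Review": ["Great location", "Lovely staff and clean room", "Nice view",
                        "No Positive", "Excellent breakfast", "No Positive",
                        "Central location", "Perfect stay", "Good price",
                        "Friendly reception"],
    "Reviewer_Score": [7.1, 9.6, 6.3, 4.2, 9.2, 3.8, 5.0, 10.0, 6.7, 7.5],
})

vectorizer = TfidfVectorizer(lowercase=True, strip_accents="ascii", stop_words="english")

# The bug: this vectorizes the two column *names*, so the matrix has 2 rows.
wrong = vectorizer.fit_transform(["Negative_Review", "Positive_Review"])
print(wrong.shape[0])  # 2

# The fix: one document per review, here both text columns joined by a space.
text = data["Negative_Review"] + " " + data["Positive_Review"]
x = vectorizer.fit_transform(text)

# Assumption: bin the continuous score into two classes with an arbitrary threshold.
y = (data["Reviewer_Score"] >= 7).astype(int)

print(x.shape[0], len(y))  # 10 10

# Row counts now match, so the split and the Naive Bayes fit both work.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=123)
model = MultinomialNB().fit(x_train, y_train)
print(model.score(x_test, y_test))
```

With the real file, you would load `data` with `pd.read_json(...)` as in the question and keep the `text = ...` line, so that `x` gets one row per review and matches `y`.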
