Naive Bayes classifier bases decision only on a-priori probabilities

0 votes

I'm trying to classify tweets according to their sentiment into three categories (Buy, Hold, Sell). I'm using R and the package e1071.

I have two data frames: a training set and a set of new tweets whose sentiment needs to be predicted.

trainingset dataframe:

| text | sentiment |
|------|-----------|
| this stock is a good buy | Buy |
| markets crash in tokyo | Sell |
| everybody excited about new products | Hold |

Now I want to train the model using the tweet text trainingset[,2] and the sentiment category trainingset[,4].

classifier<-naiveBayes(trainingset[,2],as.factor(trainingset[,4]), laplace=1)

Looking into the elements of classifier with

classifier$tables$x

I find that the conditional probabilities are calculated. There are different probabilities for every tweet concerning Buy, Hold, and Sell. So far so good.

However when I predict the training set with:

predict(classifier, trainingset[,2], type="raw")

I get a classification that is based only on the a-priori probabilities, which means every tweet is classified as Hold (because "Hold" has the largest share among the sentiment labels in the training set). So every tweet gets the same probabilities for Buy, Hold, and Sell:

| Id | Buy | Hold | Sell |
|----|------|------|------|
| 1 | 0.25 | 0.5 | 0.25 |
| 2 | 0.25 | 0.5 | 0.25 |
| 3 | 0.25 | 0.5 | 0.25 |
| ... | ... | ... | ... |
| N | 0.25 | 0.5 | 0.25 |

Any ideas what I'm doing wrong? Appreciate your help!

Mar 23, 2022 in Machine Learning by Dev

1 answer to this question.

0 votes

You seem to have trained the model with complete tweets as inputs, so each whole phrase is treated as a single categorical value, whereas you presumably want the individual words to be the input features.

This is how naiveBayes and its predict method are used:

## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)

## S3 method for class 'naiveBayes'
predict(object, newdata,
  type = c("class", "raw"), threshold = 0.001, ...)

The Arguments
 x: A numeric matrix, or a data frame of categorical and/or numeric variables.

 y: Class vector.

(Taken from the R documentation of the e1071 package.)
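Since x is expected to be a data frame of categorical (or numeric) variables, one way to feed tweets to naiveBayes is to give every vocabulary word its own yes/no column. The following is a minimal sketch under that assumption, not a definitive implementation. The toy texts come from the example table in the question; tokenize, word_features, and the new sentence being scored are hypothetical names and data introduced here for illustration. The e1071 call only runs if the package is installed.

```r
# Toy data from the question's example table; in the question these would be
# trainingset[, 2] (text) and trainingset[, 4] (sentiment).
texts  <- c("this stock is a good buy",
            "markets crash in tokyo",
            "everybody excited about new products")
labels <- factor(c("Buy", "Sell", "Hold"))

# Hypothetical helper: a deliberately simple tokenizer (lower-case, split on whitespace).
tokenize <- function(s) strsplit(tolower(s), "\\s+")[[1]]

# Vocabulary: every distinct word that occurs in the training texts.
vocab <- sort(unique(unlist(lapply(texts, tokenize))))

# Hypothetical helper: one factor column per vocabulary word,
# "yes" if the text contains the word and "no" otherwise.
word_features <- function(texts, vocab) {
  rows <- lapply(texts, function(s) vocab %in% tokenize(s))
  m <- do.call(rbind, rows)
  colnames(m) <- vocab
  df <- as.data.frame(m)
  df[] <- lapply(df, function(col) {
    factor(ifelse(col, "yes", "no"), levels = c("no", "yes"))
  })
  df
}

train_x <- word_features(texts, vocab)

# Train and predict only if e1071 is available.
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::naiveBayes(train_x, labels, laplace = 1)
  # New data must use the same columns (same vocabulary) as the training data.
  new_x <- word_features("is this a good buy", vocab)
  print(predict(model, new_x, type = "raw"))
}
```

Building the new data with the same helper and vocabulary keeps the column names and factor levels identical to those seen during training, so every feature can contribute to the prediction.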

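To see the problem, train Naive Bayes on whole sentences like this:

library(e1071)  # provides naiveBayes()
x <- c("johny likes almonds", "maria likes dogs and johny")
y <- as.factor(c("good", "bad"))
bayes <- naiveBayes(x, y)

The classifier then treats each complete sentence as a single categorical value: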

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good
 0.5  0.5

Conditional probabilities:
      x
y      johny likes almonds maria likes dogs and johny
  bad                    0                          1
  good                   1                          0

To get a word-level classifier, run it with individual words as inputs:

x <- c("johny", "likes", "almonds", "maria", "likes", "dogs", "and", "johny")
y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
bayes <- naiveBayes(x, y)

The output:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
  bad  good
0.625 0.375

Conditional probabilities:
      x
y        almonds       and      dogs     johny     likes     maria
  bad  0.0000000 0.2000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.3333333 0.0000000 0.0000000 0.3333333 0.3333333 0.0000000
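The numbers in this output can be checked by hand with base R, which may help to see what the classifier stores. This is only a sketch of the unsmoothed case (laplace = 0), using the same x and y as above. The a-priori probabilities are the class shares, and each conditional probability is a word count divided by the number of words in that class.

```r
x <- c("johny", "likes", "almonds", "maria", "likes", "dogs", "and", "johny")
y <- factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))

# A-priori probabilities: share of each class among the training rows.
priors <- prop.table(table(y))

# Conditional probabilities P(word | class): word counts per class,
# normalised so that each row (class) sums to 1.
cond <- prop.table(table(y, x), margin = 1)

print(priors)
print(round(cond, 3))
```

Words that never occur in a class get probability 0 here, which is the situation the laplace argument is meant to soften.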

In my experience R is not especially well suited for processing NLP data in general; Python (or at the very least Java) would be a better choice.

The strsplit function can be used to split a sentence into words:

unlist(strsplit("johny likes almonds", " "))
[1] "johny"   "likes"   "almonds"
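Building on strsplit, labelled sentences can be expanded into the word-level x and y vectors used earlier, so that the labels do not have to be typed out word by word. This is a small sketch under the assumption that every word simply inherits the label of its sentence; the variable names sentences, labels, and words_per_sentence are introduced here for illustration.

```r
sentences <- c("johny likes almonds", "maria likes dogs and johny")
labels    <- c("good", "bad")

# One character vector of words per sentence.
words_per_sentence <- strsplit(sentences, " ")

# Flatten to a single vector of words ...
x <- unlist(words_per_sentence)

# ... and repeat each sentence label once for every word in that sentence.
y <- factor(rep(labels, times = lengths(words_per_sentence)))
```

The resulting x and y match the vectors that were written out by hand for the word-level classifier.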
answered Mar 25, 2022 by Nandini