Naive Bayes classifier bases decision only on a-priori probabilities

0 votes

I'm trying to classify tweets according to their sentiment into three categories (Buy, Hold, Sell). I'm using R and the package e1071.

I have two data frames: one training set and one set of new tweets whose sentiment needs to be predicted.

trainingset data frame:

   text                                 | sentiment
   -------------------------------------+----------
   this stock is a good buy             | Buy
   markets crash in tokyo               | Sell
   everybody excited about new products | Hold

Now I want to train the model using the tweet text trainingset[,2] and the sentiment category trainingset[,4].

classifier <- naiveBayes(trainingset[,2], as.factor(trainingset[,4]), laplace = 1)

Looking into the elements of classifier with

classifier$tables$x

I find that the conditional probabilities have been calculated. There are different probabilities for every tweet for Buy, Hold and Sell. So far so good.
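
For reference, the priors and the full set of conditional tables can be inspected alongside this (using the classifier fitted above):

classifier$apriori  # class counts of the training sentiments, used for the a-priori probabilities
classifier$tables   # one conditional-probability table per input feature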

However when I predict the training set with:

predict(classifier, trainingset[,2], type="raw")

I get a classification that is based only on the a-priori probabilities, which means every tweet is classified as Hold (because "Hold" has the largest share among the sentiments). So every tweet has the same probabilities for Buy, Hold, and Sell:

      Id | Buy  | Hold | Sell
      ---+------+------+-----
       1 | 0.25 | 0.5  | 0.25
       2 | 0.25 | 0.5  | 0.25
       3 | 0.25 | 0.5  | 0.25
      .. | .... | .... | ....
       N | 0.25 | 0.5  | 0.25

Any ideas what I'm doing wrong? Appreciate your help!

Mar 23, 2022 in Machine Learning by Dev

1 answer to this question.

0 votes

You trained the model with complete phrases as inputs, whereas you want to use individual words as input features.

This is how naiveBayes and its predict method are used:

## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)

## S3 method for class 'naiveBayes'
predict(object, newdata,
  type = c("class", "raw"), threshold = 0.001, ...)

The Arguments
 x: A numeric matrix, or a data frame of categorical and/or numeric variables.

 y: Class vector.

(Taken from the R documentation.)

Try training the Naive Bayes classifier like this:

x <- c("johny likes almonds", "maria likes dogs and johny")
y <- as.factor(c("good", "bad")) 
bayes<-naiveBayes( x,y )

The classifier has learned the two complete sentences as the only feature values:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good 
 0.5  0.5 

Conditional probabilities:
      x
y      johny likes almonds maria likes dogs and johny
  bad                    0                          1
  good                   1                          0
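
This is also why the predictions fall back to the priors: any tweet that is not literally one of the training phrases has no matching feature value. A small sketch (the factor levels here are the two training phrases from above):

# A phrase never seen in training becomes NA under the training levels,
# so all conditional terms are skipped and only the a-priori probabilities remain
unseen <- data.frame(x = factor("markets crash in tokyo", levels = sort(unique(x))))
predict(bayes, unseen, type = "raw")   # 0.5 / 0.5, i.e. just the priors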

In order to get a word-level classifier, train it with individual words as inputs:

x <- c("johny", "likes", "almonds", "maria", "likes", "dogs", "and", "johny")
y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
bayes <- naiveBayes(x, y)

The Output

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good 
 0.625 0.375 

Conditional probabilities:
      x
y            and    almonds     dogs     johny     likes     maria
  bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
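
To score a new word with this word-level model (a small usage sketch, not part of the original output): pass it in a column named x, as a factor with the same levels as the training words, so that predict() looks up the correct column of the conditional table.

# "dogs" only appeared in "bad" training examples, so nearly all of the
# probability mass should end up on "bad"
new_word <- data.frame(x = factor("dogs", levels = sort(unique(x))))
predict(bayes, new_word, type = "raw")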

R is not well suited for processing NLP data in general; Python (or at the very least Java) would be a far better choice.

The strsplit function can be used to split a sentence into words.

unlist(strsplit("johny likes almonds", " "))
[1] "johny"   "likes"   "almonds"
answered Mar 25, 2022 by Nandini
