It looks like you trained the model on complete sentences as inputs, whereas you want to use individual words as input features.
This is the usage:

```r
## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)
## S3 method for class 'naiveBayes'
predict(object, newdata,
        type = c("class", "raw"), threshold = 0.001, ...)
```

Arguments:

- `x`: A numeric matrix, or a data frame of categorical and/or numeric variables.
- `y`: Class vector.

(Taken from the R documentation for `naiveBayes` in the e1071 package.)
Suppose you train the naive Bayes classifier like this:

```r
library(e1071)  # provides naiveBayes()

x <- c("johny likes almonds", "maria likes dogs and johny")
y <- as.factor(c("good", "bad"))
bayes <- naiveBayes(x, y)
```
The classifier then knows only these two complete sentences, each treated as a single categorical value:
```
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good
 0.5  0.5

Conditional probabilities:
      x
y      johny likes almonds maria likes dogs and johny
  bad                    0                          1
  good                   1                          0
```
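To see why this happens, note that a character predictor is treated as categorical, so each distinct string becomes one level. The base-R sketch below (no e1071 needed) checks those levels and rebuilds the 0/1 table above with `table()` and `prop.table()`. It assumes, as the printed output suggests, that with `laplace = 0` the conditional probabilities are simply the row-normalised counts of `y` against `x`.

```r
# Training data from the example above: one whole sentence per row
x <- c("johny likes almonds", "maria likes dogs and johny")
y <- factor(c("good", "bad"))

# Each complete sentence is a single category of the predictor
sentence_levels <- levels(factor(x))
print(sentence_levels)

# Individual words never appear as feature values
print("johny" %in% sentence_levels)  # FALSE

# Row-normalised counts, i.e. P(sentence | class); matches the table above
cond <- prop.table(table(y, x), margin = 1)
print(cond)
```

A new sentence, or a single word such as `"johny"`, is simply a value the model has never seen, so it carries no information for prediction.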
To get a word-level classifier, train it with individual words as inputs, labelling each word with the class of the sentence it came from:
```r
x <- c("johny", "likes", "almonds", "maria", "likes", "dogs", "and", "johny")
y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
bayes <- naiveBayes(x, y)
```
The output:

```
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
  bad  good
0.625 0.375

Conditional probabilities:
      x
y            and   almonds      dogs     johny     likes     maria
  bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
```
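These numbers can be checked by hand with base R alone. The sketch below assumes that, for a categorical predictor with `laplace = 0`, the a-priori probabilities are the class frequencies (5 of 8 words are `bad`, 3 of 8 are `good`) and the conditional probabilities are the per-class word frequencies; that reproduces the table above. The last step is an assumption about what a non-zero `laplace` does (add `laplace` to every count and renormalise over the vocabulary), shown only to illustrate how it removes zeros such as P(almonds | bad); check the e1071 source before relying on it.

```r
# Word-level training data from the example above:
# each word carries the label of the sentence it came from
x <- c("johny", "likes", "almonds", "maria", "likes", "dogs", "and", "johny")
y <- factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))

# A-priori probabilities: class frequencies
priors <- prop.table(table(y))
print(priors)  # bad 0.625, good 0.375

# Conditional probabilities P(word | class), no smoothing
counts <- table(y, x)
cond <- prop.table(counts, margin = 1)
print(round(cond, 7))

# Assumed add-laplace smoothing over the 6-word vocabulary
laplace <- 1
smoothed <- (counts + laplace) / (rowSums(counts) + laplace * ncol(counts))
print(round(smoothed, 7))
```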
In general, R is not well suited to NLP work; Python (or at least Java) would be a much better choice.
The strsplit function can be used to split a sentence into words.
```
> unlist(strsplit("johny likes almonds", " "))
[1] "johny"   "likes"   "almonds"
```
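Combining `strsplit` with `rep` lets you build the word-level training vectors from labelled sentences instead of typing them out. `sentences_to_words` below is a hypothetical helper, not part of base R or e1071, and it assumes words are separated by single spaces with no punctuation; real text would need proper tokenisation.

```r
# Hypothetical helper (not part of e1071): expand labelled sentences
# into one training row per word, each word inheriting its sentence label
sentences_to_words <- function(sentences, labels) {
  words <- strsplit(sentences, " ", fixed = TRUE)  # list: one character vector per sentence
  n_words <- lengths(words)                        # number of words in each sentence
  list(
    x = unlist(words),
    y = factor(rep(as.character(labels), times = n_words))
  )
}

train <- sentences_to_words(
  c("johny likes almonds", "maria likes dogs and johny"),
  c("good", "bad")
)
print(train$x)
print(train$y)

# With e1071 installed, the result can be used as in the word-level example:
# library(e1071)
# bayes <- naiveBayes(train$x, train$y)
```

This produces exactly the `x` and `y` vectors of the word-level example, so the classifier output is the same.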