Error saying "vector size cannot be NA" when using R for data mining

Question

I'm using R for data analytics. I connected it to Elasticsearch and retrieved the Shakespeare Complete Works dataset:

library("elastic")
connect()
maxi <- count(index = 'shakespeare')
s <- Search(index = 'shakespeare',size=maxi)

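# Pull the text_entry field out of every returned hit in one pass
# (growing dat with c() inside a loop copies the vector on every iteration,
# and indexing up to maxi fails if fewer hits than maxi were returned)
dat <- unlist(lapply(s$hits$hits, function(hit) hit$`_source`$text_entry))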
rm(s)

Next I want to build a tf-idf matrix, but apparently I can't because it uses too much memory (I have 4 GB of RAM). Here is my code:

library("tm")
myCorpus <- Corpus(VectorSource(dat))
myCorpus <- tm_map(myCorpus, content_transformer(tolower),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumbers),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removePunctuation),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeWords), stopwords("en"),lazy = TRUE)
myTdm <- TermDocumentMatrix(myCorpus,control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))

myCorpus is around 400 Mb.
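For reference, a minimal base-R sketch of the weighting passed to TermDocumentMatrix() above, assuming tm's weightTfIdf with normalize = FALSE multiplies each raw term count by log2(number of documents / number of documents containing the term). The three-term, three-document count matrix is made-up toy data, not taken from the Shakespeare corpus.

```r
# Toy term-document count matrix (hypothetical data): rows are terms, columns are documents
tf <- matrix(c(2, 0, 1,
               1, 1, 1,
               0, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(Terms = c("love", "the", "sword"),
                             Docs  = c("d1", "d2", "d3")))

# Document frequency: in how many documents each term occurs
doc_freq <- rowSums(tf > 0)

# Inverse document frequency with a base-2 logarithm
idf <- log2(ncol(tf) / doc_freq)

# Scale each row (term) by its idf; a vector of length nrow recycles down each column
tfidf <- tf * idf
print(round(tfidf, 3))
```

A term that occurs in every document ("the" in the toy data) gets weight 0, and every weight is a double, so a dense copy of the weighted matrix needs 8 bytes per cell.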

But then I do:

> m <- as.matrix(myTdm)
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
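The warning points at the cause. Converting the sparse term-document matrix with as.matrix() allocates one cell per term-document pair, and, as the error shows, the cell count nr * nc is computed from integer dimensions. Once it exceeds .Machine$integer.max (2^31 - 1), the integer product becomes NA and vector() refuses that size. A base-R sketch with hypothetical dimensions (30,000 terms by 100,000 documents, not measured from this corpus):

```r
# Hypothetical dimensions, for illustration only (not measured from the Shakespeare corpus)
n_terms <- 30000L   # rows of the term-document matrix
n_docs  <- 100000L  # columns: one document per line of text

# Integer multiplication, as in the failing nr * nc: the product exceeds
# .Machine$integer.max, so R returns NA with "NAs produced by integer overflow"
cells_int <- n_terms * n_docs

# The same product in double precision gives the real cell count
cells <- as.numeric(n_terms) * n_docs

# tf-idf weights are doubles, so a dense copy needs 8 bytes per cell
gib_needed <- cells * 8 / 1024^3

cat("integer product:", cells_int, "\n")
cat("cells:", format(cells, big.mark = ",", scientific = FALSE), "\n")
cat("dense matrix size (GiB):", round(gib_needed, 1), "\n")
```

With these assumed dimensions the dense matrix would need about 22 GiB, far beyond 4 GB of RAM even if the size were representable, so the matrix has to shrink (or stay sparse) before any dense conversion.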

Maverick · Answer 1 · Nov 15, 2018

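You can use the removeSparseTerms() function from tm, which removes sparse terms from a document-term or term-document matrix. as.matrix() builds a dense matrix with one cell per term-document pair, and for a corpus this size that cell count overflows R's integer limit; dropping rare terms shrinks the matrix so the conversion can succeed.

Something like this: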

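library("tm")
data("crude")
tdm <- TermDocumentMatrix(crude)
# Keep only terms that occur in more than (1 - 0.2) = 80% of the documents
tdm <- removeSparseTerms(tdm, 0.2)
m <- as.matrix(tdm)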
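The sparse argument is easy to misread. Below is a base-R sketch of the filtering rule removeSparseTerms() is documented to apply (keep a term only if it is absent from fewer than a sparse fraction of documents, i.e. present in more than ncol * (1 - sparse) documents), working on the term indices i that a tm term-document matrix stores in its simple triplet form. The helper keep_terms() and the toy data are hypothetical; this illustrates the rule and is not tm's own code.

```r
# Hypothetical helper: indices of terms that survive a sparsity threshold.
# i = term index of each non-zero cell, n_docs = number of documents.
keep_terms <- function(i, n_terms, n_docs, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  # Each (term, document) pair appears at most once in a triplet matrix,
  # so counting a term's non-zero cells gives its document frequency
  doc_freq <- tabulate(i, nbins = n_terms)
  which(doc_freq > n_docs * (1 - sparse))
}

# Toy data (made up): term index of every non-zero cell, 4 terms across 10 documents.
# Term 1 occurs in 10 documents, term 2 in 5, term 3 in 1, term 4 in 2.
i <- c(rep(1L, 10), rep(2L, 5), 3L, rep(4L, 2))

# sparse = 0.2: keep terms present in more than 8 of 10 documents
print(keep_terms(i, n_terms = 4L, n_docs = 10L, sparse = 0.2))
# sparse = 0.75: keep terms present in more than 2.5 of 10 documents
print(keep_terms(i, n_terms = 4L, n_docs = 10L, sparse = 0.75))
```

A value of 0.2 keeps only terms present in more than 80% of documents. When each document is a single line of a play, most terms appear in only a tiny fraction of documents, so a value close to 1 (for example 0.99) is the more realistic starting point.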


Related Questions In Data Analytics

How to order data frame rows according to vector with specific order using R?

Error saying "EXPR must be a length 1 vector" when trying to use the switch functionality

Error saying "Error in file(out, "wt") : cannot open the connection" when execute help command in r

Error saying "Error: object 'packages' not found" when trying to web scrap using r

How to segment documents into phrases in text mining using R?

Trying to find frequent itemsets of a data set using arules package

Error saying "Error in df$item : object of type 'closure' is not subsettable" when trying to use arules package

Error saying "Error in lapply(rdmTweets, as.data.frame) : object 'rdmTweets' not found"

Error saying "Error: package or namespace load failed for ‘FactoMineR’" when using data mining on RStudio

Error saying "R cannot be resolved" on eclipse while running a simple application
