Error saying "vector size cannot be NA" when using R with data mining

I'm using R for data analytics. I connected it to Elasticsearch and retrieved a dataset of Shakespeare's Complete Works.

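library("elastic")
connect()

# Number of documents in the index, used as the size of a single search
maxi <- count(index = 'shakespeare')
s <- Search(index = 'shakespeare', size = maxi)

# Collect the text_entry field of every hit into one character vector
dat <- s$hits$hits[[1]]$`_source`$text_entry
for (i in 2:maxi) {
  dat <- c(dat, s$hits$hits[[i]]$`_source`$text_entry)
}
rm(s)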

After that I want to build a TF-IDF matrix, but apparently I can't because it uses too much memory (I have 4 GB of RAM). Here is my code:

library("tm")
myCorpus <- Corpus(VectorSource(dat))
myCorpus <- tm_map(myCorpus, content_transformer(tolower), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumbers), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removePunctuation), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeWords), stopwords("en"), lazy = TRUE)
myTdm <- TermDocumentMatrix(
  myCorpus,
  control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE))
)

myCorpus is around 400 MB.

But then I do:

> m <- as.matrix(myTdm)
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow

Nov 15, 2018 in Data Analytics by Ali

1 answer to this question.

You can use the removeSparseTerms function from the tm package. It removes sparse terms from a document-term or term-document matrix, so the matrix is much smaller by the time you call as.matrix() on it.
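
As background on the error itself: the warning says that nr * nc overflowed, meaning the number of rows times the number of columns of the term-document matrix does not fit in a 32-bit R integer, so the product becomes NA. Here is a minimal base-R sketch of that, using hypothetical dimensions (the names nr and nc follow the error message; the values are made up for illustration, so check dim(myTdm) for the real ones):

```r
# Hypothetical dimensions, for illustration only; use dim(myTdm) for the real ones.
nr <- 30000L   # number of terms (rows)
nc <- 110000L  # number of documents (columns)

# Integer multiplication past .Machine$integer.max (2147483647) yields NA
# with the warning "NAs produced by integer overflow".
cells_int <- suppressWarnings(nr * nc)
is.na(cells_int)   # TRUE

# Doing the same arithmetic in double precision avoids the overflow.
cells <- as.numeric(nr) * as.numeric(nc)
cells > .Machine$integer.max   # TRUE

# A dense numeric matrix stores 8 bytes per cell.
dense_gib <- cells * 8 / 1024^3
dense_gib
```

With dimensions in this range, a dense matrix would need far more than 4 GB, so the matrix has to be made smaller or kept in its sparse form.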

Usage of removeSparseTerms looks something like this:

library("tm")
data("crude")                      # sample corpus that ships with tm
tdm <- TermDocumentMatrix(crude)
removeSparseTerms(tdm, 0.2)        # drop terms absent from 20% or more of the documents
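
Applied before the conversion, this could look like the sketch below. It uses the crude sample corpus because the Shakespeare index is not available here. The size check and the variable names small, d and cells are illustrative assumptions, not part of tm, and the 0.2 threshold is simply the one from the example above; a corpus of many short documents would probably need a value much closer to 1.

```r
library("tm")

data("crude")                       # small sample corpus shipped with tm
tdm <- TermDocumentMatrix(crude)

# Keep only terms whose sparsity is below the threshold.
small <- removeSparseTerms(tdm, 0.2)

# Compute the dense cell count in double precision so it cannot overflow.
d <- dim(small)
cells <- as.numeric(d[1]) * as.numeric(d[2])

# Convert only when the dense matrix is representable.
if (cells <= .Machine$integer.max) {
  m <- as.matrix(small)
}
```

Even when the check passes, the dense matrix still has to fit in RAM at 8 bytes per cell, so it is worth looking at cells * 8 before converting.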

answered Nov 15, 2018 by Maverick