I'm new to R. I'm trying to run hclust() on about 50K items: I have 50K rows of data and 10 columns to compare. When I tried to compute the distance matrix, I got: "Cannot allocate vector of 5GB".

Is there a size limit to this? If so, how do I go about clustering something this large?
Jun 26, 2018

## 1 answer to this question.

Classic hierarchical clustering approaches are O(n^3) in runtime and O(n^2) in memory. So yes, they scale very badly to large data sets. Anything that requires materializing the full distance matrix is O(n^2) or worse in both time and memory.
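To put a rough number on the O(n^2) memory cost, here is a back-of-the-envelope sketch. It assumes the object returned by dist() holds only the lower triangle of the matrix as 8-byte doubles; the helper name `dist_bytes` is made up for illustration.

```r
# Rough size of a dist() result: n * (n - 1) / 2 pairwise distances,
# each stored as an 8-byte double (assumption stated above).
dist_bytes <- function(n) n * (n - 1) / 2 * 8

n <- 50000
dist_bytes(n) / 1e9    # size in decimal gigabytes, about 10 for n = 50000
dist_bytes(n) / 2^30   # the same size in GiB

# Halving the number of rows cuts the memory to roughly a quarter.
dist_bytes(n / 2) / dist_bytes(n)
```

This counts only the distances themselves; hclust() typically needs extra working memory on top of that, so the real requirement is higher.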

Note that there are some specializations of hierarchical clustering, such as SLINK (single linkage) and CLINK (complete linkage), that run in O(n^2) and, depending on the implementation, may need only O(n) memory.
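As an illustration of how single linkage can get by with memory linear in n, here is a minimal base-R sketch of the SLINK pointer representation. It assumes Euclidean distance and numeric data; the function name `slink` and the toy data are mine, not part of any package. Distances are recomputed one row at a time, so the full matrix is never stored.

```r
# Minimal SLINK sketch (single linkage, Euclidean distance assumed).
# Returns the pointer representation: point j joins the cluster
# containing pi[j] at height lambda[j]; the last point has lambda = Inf.
slink <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  pi_ptr <- integer(n)
  lambda <- numeric(n)
  m <- numeric(n)                     # distances from point i to earlier points
  for (i in seq_len(n)) {
    pi_ptr[i] <- i
    lambda[i] <- Inf
    prev <- seq_len(i - 1)
    if (i > 1) {
      # One row of distances at a time: memory stays linear in n.
      m[prev] <- sqrt(colSums((t(x[prev, , drop = FALSE]) - x[i, ])^2))
    }
    for (j in prev) {
      k <- pi_ptr[j]
      if (lambda[j] >= m[j]) {
        m[k] <- min(m[k], lambda[j])
        lambda[j] <- m[j]
        pi_ptr[j] <- i
      } else {
        m[k] <- min(m[k], m[j])
      }
    }
    for (j in prev) {
      if (lambda[j] >= lambda[pi_ptr[j]]) pi_ptr[j] <- i
    }
  }
  list(pi = pi_ptr, lambda = lambda)
}

# Toy data small enough that dist() is cheap, so the merge heights
# can be compared against hclust().
set.seed(1)
x <- matrix(rnorm(20 * 10), ncol = 10)
res <- slink(x)
heights <- sort(res$lambda[is.finite(res$lambda)])
all.equal(heights, hclust(dist(x), method = "single")$height)
```

The interpreted loops make this far too slow for 50K rows; it is only meant to show the idea. For real data a compiled implementation is the better route, for example `hclust.vector()` from the fastcluster package if that is available to you.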

You might want to look into more modern clustering algorithms. Anything that runs in O(n log n) or better should work for you. There are plenty of good reasons not to use hierarchical clustering: it is usually rather sensitive to noise (i.e. it doesn't really know what to do with outliers), and the results are hard to interpret for large data sets (dendrograms are nice, but only for small data sets).
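One possible way to act on this in base R is k-means, which never builds the n-by-n distance matrix. This is my choice of example, not an algorithm named above, and the data, the 200 centers and the cut into 5 groups are all made-up values for illustration. If a dendrogram is still wanted, hclust() can be run on the handful of k-means centers instead of on all rows.

```r
set.seed(42)
# Simulated stand-in for the real data: 50,000 rows, 10 numeric columns.
x <- matrix(rnorm(50000 * 10), ncol = 10)

# k-means on the scaled data; cost grows linearly with the number of rows.
# It may warn about non-convergence, in which case raise iter.max.
km <- kmeans(scale(x), centers = 200, iter.max = 50)

# Optional second stage: hierarchical clustering on the 200 centers only,
# which needs a 200-by-200 distance matrix instead of 50,000-by-50,000.
hc <- hclust(dist(km$centers), method = "average")

# Cut the small dendrogram and map each row to its center's group.
groups <- cutree(hc, k = 5)[km$cluster]
table(groups)
```

The number of centers trades detail against cost: more centers preserve more structure for the second stage but make both stages slower.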
