hclust size limit?

I'm new to R and am trying to run hclust() on about 50K items: 50K rows of data with 10 columns to compare. When I try to compute the distance matrix, I get: "Cannot allocate vector of 5GB".

Is there a size limit? If so, how do I cluster a data set this large?
Jul 10 in Python by ana1504.k

1 answer to this question.

Classic hierarchical clustering approaches take O(n^3) time and O(n^2) memory, so yes, they scale very badly to large data sets. Anything that requires materializing the full distance matrix needs O(n^2) memory or worse.
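
To make the O(n^2) memory cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes only the n*(n-1)/2 pairwise distances are stored (one triangle of the matrix, which is how R's dist() stores its result) and that each distance is an 8-byte double. The function name is made up for illustration.

```python
def condensed_distance_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store the n*(n-1)/2 pairwise distances.

    Assumes one triangle of the distance matrix is stored and that
    each distance occupies bytes_per_value bytes (8 for a double).
    """
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs * bytes_per_value


size = condensed_distance_bytes(50_000)
print(f"{size:,} bytes = {size / 2**30:.1f} GiB")
```

Under these assumptions, 50K items come to roughly 10 GB for the distances alone, before the clustering itself allocates anything. Doubling the number of items roughly quadruples that figure.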

Note that there are some specializations of hierarchical clustering, such as SLINK (single linkage) and CLINK (complete linkage), that run in O(n^2) time and, depending on the implementation, may need only O(n) memory.
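
As an illustration of how O(n) memory is possible, below is a minimal Python sketch of the SLINK idea. It is not what R's hclust() does internally. It assumes Euclidean distance and points given as tuples of floats, and the function name slink is chosen here for illustration. Instead of a distance matrix it keeps three arrays of length n and recomputes distances on the fly.

```python
import math


def slink(points):
    """Single-linkage clustering in pointer representation (SLINK).

    Returns (pi, lam): point j joins a cluster containing the
    higher-indexed point pi[j] at height lam[j]. The last point has
    lam == inf. Uses O(n) working memory and O(n^2) distance
    computations; no distance matrix is ever stored.
    """
    n = len(points)
    pi = [0] * n
    lam = [math.inf] * n
    m = [0.0] * n  # scratch: distances from the new point to earlier ones

    for i in range(n):
        # Start the new point as its own cluster.
        pi[i] = i
        lam[i] = math.inf

        # Distances from point i to every earlier point (assumed Euclidean).
        for j in range(i):
            m[j] = math.dist(points[i], points[j])

        # Update merge heights, pushing the larger distance up the chain.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])

        # Relabel pointers whose target now merges no later than they do.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i

    return pi, lam


pi, lam = slink([(0.0,), (1.0,), (10.0,), (11.0,)])
print(pi, lam)
```

In the small example, the finite values in lam are the single-linkage merge heights: the two close pairs merge at distance 1, and the two resulting groups merge at distance 9. The runtime is still quadratic, so 50K items remain slow in pure Python, but the memory no longer grows with n^2.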
answered Jul 10 by SDeb
