hclust size limit

I'm new to R and am trying to run hclust() on about 50K items: 50K rows of data with 10 columns to compare. When I try to compute the distance matrix, I get: "Cannot allocate vector of 5GB".

Is there a size limit for hclust()? If so, how do I go about clustering a data set this large?
Jul 10, 2019 in Python by ana1504.k

1 answer to this question.

Classic hierarchical clustering approaches are O(n^3) in runtime and O(n^2) in memory. So yes, they scale very badly to large data sets. Anything that requires materializing the full distance matrix, as hclust() does, needs at least O(n^2) memory, no matter how many columns the data has.
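To make the O(n^2) memory cost concrete, here is a rough back-of-the-envelope sketch. It is written in Python purely for illustration (the arithmetic is the same in R), and it assumes each distance is stored once as an 8-byte double. The function name `condensed_distance_bytes` is made up for this example.

```python
def condensed_distance_bytes(n_rows: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store every pairwise distance once (lower triangle only).

    Assumption: one 8-byte double per pair; no overhead is counted.
    """
    n_pairs = n_rows * (n_rows - 1) // 2  # number of unordered pairs
    return n_pairs * bytes_per_value


if __name__ == "__main__":
    # The number of columns does not appear anywhere: only the row count matters.
    for n in (5_000, 50_000, 500_000):
        gib = condensed_distance_bytes(n) / 2**30
        print(f"{n:>7} rows -> {gib:,.1f} GiB")
```

Under these assumptions, 50,000 rows need about 9.3 GiB for the distances alone, before the clustering itself allocates anything, and every tenfold increase in rows costs roughly a hundred times more memory.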

Note that there are some specializations of hierarchical clustering, such as SLINK (single linkage) and CLINK (complete linkage), that run in O(n^2) and, depending on the implementation, may also need only O(n) memory.
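As an illustration of how O(n) memory is possible, below is a minimal sketch of the SLINK idea in plain Python. It is not what hclust() does and it is not tuned for speed. It assumes Euclidean distance (via `math.dist`, Python 3.8+) and recomputes distances on the fly instead of storing the matrix. The names `slink` and `merge_heights` are chosen for this example only.

```python
import math
from typing import List, Sequence, Tuple


def slink(points: Sequence[Sequence[float]]) -> Tuple[List[int], List[float]]:
    """Single-linkage clustering in O(n^2) time and O(n) extra memory.

    Returns the "pointer representation" (pi, lam): point j joins the
    cluster containing the higher-indexed point pi[j] at height lam[j].
    The last point keeps lam = inf (it is the root).
    """
    n = len(points)
    pi = [0] * n
    lam = [math.inf] * n
    m = [0.0] * n  # scratch: distances from the newly added point

    for i in range(n):
        pi[i] = i
        lam[i] = math.inf
        # Distances are computed row by row and never kept as a full matrix.
        for j in range(i):
            m[j] = math.dist(points[i], points[j])
        # Update merge heights for all earlier points.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])
        # Re-point anything that now merges with the new point first.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam


def merge_heights(lam: Sequence[float]) -> List[float]:
    """Sorted dendrogram merge heights (the root's infinite entry is dropped)."""
    return sorted(h for h in lam if math.isfinite(h))


if __name__ == "__main__":
    # Toy 1-D data: two tight pairs that are far apart from each other.
    pi, lam = slink([(0.0,), (1.0,), (5.0,), (6.0,)])
    print(pi, merge_heights(lam))
```

Only three length-n arrays are kept, so memory grows linearly with the number of rows. The price is that this gives single linkage only; other linkages need a different algorithm.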
answered Jul 10, 2019 by SDeb
