How can I plot a k-distance graph using Python?

+2 votes
I'm trying to plot the k-distance graph for a given value of min-points. I'm specifically looking for the
knee and the corresponding epsilon value. I've been trying to use sklearn for this, but I can't seem to
find any function that returns these values. Can someone help me?
Apr 10, 2018 in Python by ariaholic
• 7,320 points

2 answers to this question.

+2 votes
Best answer
Hi there. Instead of sklearn, you could use this function to get what you want:

import math

import numpy as np
import pandas as pd


def k_distances(X, n=None, dist_func=None):
    """Return the sorted array of k-distances.

    X - DataFrame or ndarray with one observation per row
    n - number of neighbors included in the returned distances (default: number of attributes + 1)
    dist_func - function that computes the distance between two observations in X (default: Euclidean)
    """
    if isinstance(X, pd.DataFrame):
        X = X.values
    if n is None:
        n = X.shape[1] + 1

    if dist_func is None:
        # Euclidean distance: square root of the sum of squared differences between attributes
        dist_func = lambda x, y: math.sqrt(np.sum(np.power(x - y, 2)))

    m = len(X)
    # All pairwise distances; i and j are the row indices of the two observations
    Distances = pd.DataFrame({
        "i": [i // m for i in range(m * m)],
        "j": [i % m for i in range(m * m)],
        "d": [dist_func(x, y) for x in X for y in X]
    })
    # Per point, sort its distances: position 0 is the point itself, position n its n-th nearest neighbor
    return np.sort([g["d"].sort_values().iloc[n] for _, g in Distances.groupby(by="i")])

X should be a pandas.DataFrame or a numpy.ndarray. n is the number of neighbors in the d-neighborhood; you should know this number. By default it is the number of attributes + 1.

To plot the k-distances in Python, use matplotlib:

import matplotlib.pyplot as plt

d = k_distances(X, n, dist_func)
plt.plot(d)
plt.ylabel("k-distances")
plt.grid(True)
plt.show()
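
The question also asks for the knee and its epsilon, which the plot only shows visually. One simple heuristic (an assumption here, not something this answer or sklearn prescribes) is to take the point on the sorted k-distance curve that lies farthest from the straight line joining its first and last points. The helper name `find_knee` and the sample values are made up for illustration, and the result depends on how the two axes are scaled, so treat it as a starting guess rather than a definitive epsilon.

```python
import math


def find_knee(sorted_distances):
    """Return (index, value) of the point farthest from the chord joining the endpoints.

    Hypothetical helper: a rough heuristic, not part of sklearn or matplotlib.
    """
    n = len(sorted_distances)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    x0, y0 = 0, sorted_distances[0]
    x1, y1 = n - 1, sorted_distances[-1]
    chord_length = math.hypot(x1 - x0, y1 - y0)
    best_index, best_gap = 0, -1.0
    for i, y in enumerate(sorted_distances):
        # Perpendicular distance from the point (i, y) to the chord
        gap = abs((y1 - y0) * (i - x0) - (x1 - x0) * (y - y0)) / chord_length
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, sorted_distances[best_index]


# Made-up sorted k-distances: nearly flat at first, then rising sharply
example = [0.10, 0.11, 0.12, 0.13, 0.15, 0.18, 0.60, 1.40]
knee_index, eps = find_knee(example)
print(knee_index, eps)
```

The same function accepts the sorted array returned by a k-distance routine; the returned value is the candidate epsilon, which is worth checking against the plot.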
answered Apr 10, 2018 by charlie_brown
• 7,710 points

selected Oct 12, 2018 by Omkar
0 votes

You probably want to use the matrix operations provided by numpy to speed up your distance matrix calculation.

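import numpy as np

def k_distances2(x, k):
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(x**2, axis=1)
    p = -2 * x.dot(x.T) + sq[np.newaxis, :] + sq[:, np.newaxis]
    # Clip tiny negative values caused by floating-point rounding before the square root
    p = np.sqrt(np.maximum(p, 0))
    # Sort each row and keep the k smallest distances (column 0 is the point itself)
    p.sort(axis=1)
    p = p[:, :k]
    pm = np.sort(p.flatten())
    return p, pm

m, m2 = k_distances2(X, 2)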
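
The matrix approach rests on the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which lets a single matrix product replace the double loop. As a minimal sketch with made-up points, the block below compares the broadcast version against an explicit loop; `pairwise_fast` and `pairwise_slow` are hypothetical names used only for this check.

```python
import numpy as np


def pairwise_fast(x):
    """Pairwise Euclidean distances using the expanded squared-norm identity."""
    sq = np.sum(x ** 2, axis=1)
    squared = sq[:, np.newaxis] + sq[np.newaxis, :] - 2 * x.dot(x.T)
    # Rounding can leave tiny negative entries, so clip before the square root
    return np.sqrt(np.maximum(squared, 0))


def pairwise_slow(x):
    """Reference implementation: one distance at a time."""
    n = x.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((x[i] - x[j]) ** 2))
    return out


# Made-up points lying on a line, 5 units apart
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
fast = pairwise_fast(points)
slow = pairwise_slow(points)
print(np.allclose(fast, slow))
```

Both functions build the full n-by-n matrix, so memory grows quadratically with the number of observations.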
answered Oct 12, 2018 by findingbugs
• 4,730 points
