How to create a distance vector in PySpark (Euclidean distance)

0 votes

Hi, I want to implement the formula shown in this image in PySpark. Please help me or share the code. Thanks

Oct 16 in Apache Spark by dani
• 160 points
101 views

1 answer to this question.

+1 vote

Hi@dani,

You can find the Euclidean distance using PySpark's ML linalg module, as shown below.

import math
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from pyspark.ml.linalg import Vectors

# input: two vectors of equal length
# output: Euclidean distance between the vectors
euclidean_distance = udf(
    lambda v1, v2: float(math.sqrt(Vectors.squared_distance(v1, v2))),
    DoubleType(),
)

But if you want to use your own module, then I suggest you write the mathematical expression yourself in Python.
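For instance, a minimal sketch of a hand-written version in plain Python (the function name `euclidean_distance` and the sample vectors are my own, not from the original answer):

```python
import math

# Euclidean distance: sqrt(sum((a_i - b_i)^2)) over two
# equal-length sequences of numbers.
def euclidean_distance(v1, v2):
    if len(v1) != len(v2):
        raise ValueError("vectors must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # → 5.0

# To apply it to DataFrame columns, wrap it as a UDF
# (requires an active SparkSession):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# dist_udf = udf(euclidean_distance, DoubleType())
```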

answered Oct 16 by MD
• 79,190 points
Hi, what is the difference between calling a Python function from Spark and executing it, versus converting that function to PySpark code and executing it? Will the run time be different?
For example, should we call Python's KMeans function in Spark, or use PySpark's KMeans library? Thanks

Hi@dani,

You can think of it this way: PySpark makes the operation easier because it is built for Spark. Whereas if you use Python's KMeans function in Spark, then you have to import more modules, perform more operations, and so on.
