How to create a distance vector in PySpark (Euclidean distance)

0 votes

Hi, I want to implement this image in PySpark. Please help me or share the code. Thanks

Oct 16, 2020 in Apache Spark by dani
• 160 points
559 views

1 answer to this question.

+1 vote

Hi@dani,

You can compute the Euclidean distance using Spark's built-in Vectors.sqdist, as shown below. Note that this snippet is written in Scala; sqdist returns the squared distance, so you take its square root.

import scala.math.sqrt
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Input: two vectors of equal length n
// Output: the Euclidean distance between them
val euclideanDistance = udf { (v1: Vector, v2: Vector) =>
    sqrt(Vectors.sqdist(v1, v2))
}

But if you want to use your own module, then I suggest you write the mathematical expression yourself and implement it in Python.

answered Oct 16, 2020 by MD
• 95,140 points
Hi, what's the difference between calling a plain Python function from Spark and executing it, versus converting that function to PySpark code and executing it? Will the run time be different?
For example, should we call the Python kmeans function in Spark, or use the PySpark KMeans library? Thanks

Hi@dani,

You can think of it this way: PySpark's own implementations are built to run distributed on Spark, so they make the operation easier. Whereas if you use a plain Python kmeans function inside Spark, you have to import more modules, perform more operations yourself, etc.
