How to compute the Euclidean distance between vectors in PySpark?

0 votes

Hi, I want to implement this image in PySpark. Please help me or share the code. Thanks

Oct 16, 2020 in Apache Spark by dani
• 160 points
1,283 views

1 answer to this question.

+1 vote

Hi@dani,

You can compute the Euclidean distance using Spark's built-in Vectors.sqdist function. The snippet below shows the Scala version:

import scala.math.sqrt
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Input: two vectors, which must be of equal length
// Output: Euclidean distance between the vectors
val euclideanDistance = udf { (v1: Vector, v2: Vector) =>
  sqrt(Vectors.sqdist(v1, v2))
}

But if you want to implement it yourself instead, you can write the mathematical expression directly in Python.
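For example, a plain-Python version of the formula (a sketch, not tied to any Spark API) could look like this:

```python
import math

def euclidean_distance(v1, v2):
    # Square root of the sum of squared element-wise differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```

This function can also be wrapped in a Spark UDF if you want to apply it to DataFrame columns.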

Hope this helps!


Thanks.

answered Oct 16, 2020 by MD
• 95,320 points
Hi, what is the difference between calling a Python function from Spark and executing it, versus converting that function to PySpark code and executing it? Will the runtime be different?
For example, should we call Python's KMeans function in Spark, or use the PySpark KMeans library? Thanks

Hi@dani,

You can think of it this way: PySpark makes the operation easier because it is built for Spark. Whereas if you use Python's own KMeans function in Spark, you have to import more modules, perform more operations, and so on.
