How to create a Euclidean distance vector in PySpark

0 votes

Hi, I want to implement this image in PySpark. Please help me or show me the code. Thanks.

Oct 16, 2020 in Apache Spark by dani
• 160 points
4,748 views

1 answer to this question.

+1 vote

Hi @dani,

You can compute the Euclidean distance with Spark's built-in Vectors.sqdist, wrapped in a UDF. The example below uses the Scala API.

import scala.math.sqrt
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.functions.udf

// Input: two vectors, which must have the same length n
// Output: the Euclidean distance between the two vectors
val euclideanDistance = udf { (v1: Vector, v2: Vector) =>
    // sqdist returns the squared distance, so take its square root
    sqrt(Vectors.sqdist(v1, v2))
}

But if you want to use your own module, then I suggest you work out the mathematical expression and write it in Python.
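
As a rough sketch of the "write it yourself in Python" route, the function below computes the distance with plain Python. The names `euclidean_distance` and `as_spark_udf` are made up for this illustration, and the UDF wrapper assumes the two DataFrame columns hold arrays of doubles (not `pyspark.ml.linalg` vectors), so adjust it to your schema.

```python
import math


def euclidean_distance(v1, v2):
    """Return the Euclidean distance between two equal-length numeric sequences."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    # Sum the squared element-wise differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def as_spark_udf():
    """Wrap euclidean_distance as a PySpark UDF (hypothetical helper).

    PySpark is imported here, not at module level, so the plain function
    above stays usable without a Spark installation.
    Assumes the input columns are array<double>.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(euclidean_distance, DoubleType())


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

With a DataFrame `df` that has two array columns, say `a` and `b` (assumed names), the wrapper could be applied as `df.withColumn("dist", as_spark_udf()("a", "b"))`.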

Hope this helps!


Thanks.

answered Oct 16, 2020 by MD
• 95,460 points
Hi, what is the difference between calling a plain Python function from Spark and converting that function to PySpark code before running it? Will the run time be different?
For example, we could call a Python k-means function in Spark, or use the PySpark k-means library. Thanks.

Hi @dani,

You can think of it this way: the PySpark library makes the operation easier within Spark, whereas if you use a Python k-means function in Spark, you have to import more modules, perform more operations, and so on.
