Hi @dani,
You can compute the Euclidean distance with Spark's built-in ml.linalg module, as shown below. Note that this snippet is written in Scala, not Python.
import scala.math.sqrt
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Input: two vectors of equal length n
// Output: Euclidean distance between the vectors
val euclideanDistance = udf { (v1: Vector, v2: Vector) =>
  // Vectors.sqdist returns the squared distance, so take its square root
  sqrt(Vectors.sqdist(v1, v2))
}
But if you want to use your own module, I suggest you write out the mathematical expression and implement it in Python.
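As a rough illustration of that second option, here is a minimal sketch in plain Python. The function name euclidean_distance is just a placeholder I picked, and it assumes the vectors are passed as ordinary lists or tuples of numbers rather than Spark vector objects.

```python
import math
from typing import Sequence


def euclidean_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Return the Euclidean distance between two equal-length vectors."""
    # The formula is only defined for vectors of the same length.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    # Sum the squared differences per component, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

To apply this to DataFrame columns you would probably wrap it with pyspark.sql.functions.udf and first convert each Spark vector to a plain list (for example via toArray()). That step is not shown here.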
Hope this helps!
Thanks.