One Hot Encoding in Apache Spark

The following code that I wrote for one-hot encoding in Spark is not working; it gives me errors such as "not found: value encoder". What I want is to import CSV data into a DataFrame, one-hot encode a column, and create a new DataFrame that contains the encoded columns. The column I want to encode is called SignalType. Where in the code below do I specify the column name?

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("myfile.csv")

for (line <- df.head(5)) {
  println(line)
}

import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
val df = spark.createDataFrame(Seq(
  (1, "1"),
  (2, "2"),
  (3, "3"),
  (4, "4"),
)).toDF("categoryIndex1", "categoryIndex2")

val encoder = new OneHotEncoderEstimator()
  .setInputCols(Array("categoryIndex1", "categoryIndex2"))
  .setOutputCols(Array("categoryVec1", "categoryVec2"))

val model = encoder.fit(df)
val encoded = model.transform(df)
encoded.show()
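
A possible fix, offered as a sketch rather than a confirmed answer: the import above brings in OneHotEncoder and StringIndexer but not OneHotEncoderEstimator, so `new OneHotEncoderEstimator()` most likely fails to compile, and the "not found: value encoder" error follows from that. The trailing comma after `(4, "4")` may also be rejected by older Scala versions, and the encoder expects numeric index columns, which is why a StringIndexer usually runs first. The column name goes into setInputCol on the indexer, and the indexed column goes into setInputCols on the encoder. The example assumes Spark 2.3 or 2.4 in spark-shell (in Spark 3.x the class is named OneHotEncoder and has the same setters), assumes SignalType is a string column, and uses a small in-memory DataFrame with made-up values in place of myfile.csv. The names SignalTypeIndex and SignalTypeVec are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{OneHotEncoderEstimator, StringIndexer}

val spark = SparkSession.builder().getOrCreate()

// Stand-in for the CSV data; the values are hypothetical.
// With the real file, use instead:
//   spark.read.option("header", "true").option("inferSchema", "true").csv("myfile.csv")
val raw = spark.createDataFrame(Seq(
  (1, "GSM"),
  (2, "LTE"),
  (3, "GSM"),
  (4, "UMTS")
)).toDF("id", "SignalType")

// Step 1: map each distinct SignalType string to a numeric index.
val indexer = new StringIndexer()
  .setInputCol("SignalType")
  .setOutputCol("SignalTypeIndex")
val indexed = indexer.fit(raw).transform(raw)

// Step 2: one-hot encode the numeric index into a sparse vector column.
val encoder = new OneHotEncoderEstimator()
  .setInputCols(Array("SignalTypeIndex"))
  .setOutputCols(Array("SignalTypeVec"))
val encoded = encoder.fit(indexed).transform(indexed)

// The result keeps the original columns and adds the two new ones.
encoded.show(false)
```

By default the encoder drops the last category, so the vector has one fewer slot than there are distinct SignalType values; calling `.setDropLast(false)` on the encoder keeps all of them.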
