Spark read CSV to create RDD and convert into DataFrame

0 votes

 I am trying to parse a file into an RDD using a case class and then convert the RDD into a DataFrame, but I couldn't do it. I am trying to use spark.read.csv. Please help.

Jan 22, 2019 in Big Data Hadoop by slayer
• 29,350 points
5,566 views

1 answer to this question.

0 votes

You can use a case class with an RDD and then convert it to a DataFrame, as shown in the sketch below.
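Here is a minimal sketch of that approach; the case class Airport, its fields, and the file path airports.csv are assumptions for illustration, so adjust them to match your data:

import org.apache.spark.sql.SparkSession

// Hypothetical case class; adjust the fields to match your CSV columns
case class Airport(airportId: Int, name: String, city: String)

val spark = SparkSession.builder().appName("CsvToDF").getOrCreate()
import spark.implicits._  // brings in the toDF() conversion

// Read the file as an RDD of lines, split each line on commas,
// and map every row into the case class
val airportsRDD = spark.sparkContext
  .textFile("airports.csv")   // assumed path
  .map(_.split(","))
  .map(cols => Airport(cols(0).toInt, cols(1), cols(2)))

// Convert the RDD of case class instances to a DataFrame
val airportsDF = airportsRDD.toDF()
airportsDF.show()

If your file has a header row, filter it out before the map, or use the spark.read.csv approach below instead.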

Alternatively, the common syntax for creating a DataFrame directly from a file is shown below for your reference.

val df = spark.read.option("header","true").option("inferSchema","true").csv("")

This works if you are relying on the inferred schema of the CSV file.

If you don't want to rely on schema inference, you can define a schema explicitly, for example:

import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
val schema = StructType(Array(StructField("AirportID", IntegerType, true)))
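You can then pass that schema to the reader instead of relying on inferSchema; the header option and the file path below are assumptions for illustration:

val df = spark.read
  .option("header", "true")
  .schema(schema)        // use the explicit schema instead of inferring it
  .csv("airports.csv")   // assumed path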
answered Jan 22, 2019 by Omkar
• 69,210 points
