Using spark.read.csv to convert an RDD into a DataFrame

0 votes

I am trying to parse a file into an RDD of a case class and convert the RDD into a DataFrame, but I couldn't get it to work. I am trying to use spark.read.csv. Please help.

Jan 21 in Big Data Hadoop by slayer
• 29,040 points
372 views

1 answer to this question.

0 votes

You can define a case class, parse the file into an RDD of that class, and then convert the RDD to a DataFrame.

Alternatively, the common syntax to create a DataFrame directly from a file, if you are relying on the CSV file's header row and letting Spark infer the schema, is shown below for your reference (note that "inferSchema" must be quoted, and you need to supply the path to your file):

val df = spark.read.option("header", "true").option("inferSchema", "true").csv("path/to/file.csv")

And if you don't want to rely on schema inference, you can define a schema explicitly and pass it via spark.read.schema(schema):

import org.apache.spark.sql.types._

val schema = StructType(Array(StructField("AirportID", IntegerType, true)))
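Putting the two ideas together, here is a minimal sketch of the case-class route: parse each line of the file into an RDD, map it into the case class, and call toDF. The file name airports.csv and the Airport fields are illustrative assumptions, not from your post; adjust them to match your data.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative case class; replace the fields with your file's columns
case class Airport(AirportID: Int, Name: String)

object RddToDfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for .toDF()

    // Assumes a headerless CSV like: 1,Goroka
    val airportsDF = spark.sparkContext
      .textFile("airports.csv")
      .map(_.split(","))
      .map(cols => Airport(cols(0).toInt, cols(1)))
      .toDF()

    airportsDF.printSchema()
    spark.stop()
  }
}
```

If your file has a header line, filter it out before the map, or use the spark.read.csv route instead, which handles headers for you.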
answered Jan 21 by Omkar
• 65,810 points

