Can I read a CSV represented as a string into Apache Spark?

0 votes
I have a CSV file represented as a string. Is there any way to convert this string directly to a dataframe?

Help needed.

Thanks in advance
May 3, 2018 in Apache Spark by Data_Nerd
• 2,340 points
37 views

1 answer to this question.

0 votes

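You can parse the string yourself with the following command. It requires a bit of data cleansing and verification, because splitting on '\n' also breaks quoted fields that contain line breaks. Note that CSVParser is not part of Spark; the snippet assumes a CSV parsing library whose parseLine returns an Option[List[String]].

val mydata: Array[List[String]] = myString.split('\n').flatMap(CSVParser.parseLine(_))

After that, you can convert it to an RDD:

import org.apache.spark.rdd.RDD
val myRDD: RDD[List[String]] = sparkContext.parallelize(mydata)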
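Since the question asks for a DataFrame rather than an RDD, here is a minimal sketch of another route. It assumes Spark 2.2 or later, where DataFrameReader.csv accepts a Dataset[String], and it assumes the first line of the string is a header. The object name CsvStringToDataFrame, the helper fromString, and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

// Hypothetical helper, not part of Spark: turns a CSV string into a DataFrame.
object CsvStringToDataFrame {

  def fromString(spark: SparkSession, csv: String): DataFrame = {
    // One element per line. Assumption: no quoted field contains a line break.
    val lines = spark.createDataset(csv.split("\n").toSeq)(Encoders.STRING)

    // DataFrameReader.csv(Dataset[String]) is available since Spark 2.2.
    spark.read
      .option("header", "true") // assumption: first line holds column names
      .csv(lines)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-string-to-dataframe")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()

    // Sample data, invented for this example.
    val myString = "id,name\n1,alice\n2,bob"

    val df = fromString(spark, myString)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Without further options every column is read as a string; setting the inferSchema option to "true" or passing an explicit schema should give typed columns. Spark's own CSV reader handles the quoting and delimiters here, so no external parser is needed.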

answered May 3, 2018 by kurt_cobain
• 9,260 points

