A DataFrame can be created from an existing RDD. In which of the given cases would you create the DataFrame by inferring the schema using case classes?

A DataFrame can be created from an existing RDD. In which of the given cases would you create the DataFrame from the existing RDD by inferring the schema using case classes?

a)  if your dataset has more than 22 fields
b)  if all your users are going to need the dataset parsed in the same way
c)  if you have two sets of users who will need the text dataset parsed differently
d)  we cannot create a DataFrame from an RDD
Nov 25, 2020 in Apache Spark by ritu
• 960 points
3,951 views

1 answer to this question.

Hi @ritu,

Yes, you can create a DataFrame from an existing RDD, as the example below shows.

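Call createDataFrame on your SparkSession and pass it the RDD:

// rdd is an existing RDD of (String, array of Double) pairs.
// No schema is supplied here, so Spark names the columns _1 and _2.
val dfWithoutSchema = spark.createDataFrame(rdd)
dfWithoutSchema.show()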
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

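Inferring the schema with case classes fixes the column names and types in code, so it fits when every user needs the dataset parsed in the same way. If the dataset has more than 22 fields (the case class limit in older Scala versions such as 2.10), or different users need the text parsed differently, the schema is usually specified programmatically instead. So I think you can go with option B.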
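
To make the case-class route concrete, here is a minimal sketch of inferring a schema by reflection. The `Person` case class, the sample strings, and the `local[*]` master are assumptions made up for illustration; they are not from the question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type: the field names and types become
// the DataFrame's column names and column types.
case class Person(name: String, age: Int)

object CaseClassSchemaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CaseClassSchemaExample")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()

    // Brings the toDF() conversion for RDDs of case classes into scope.
    import spark.implicits._

    // Assumed sample data: one comma-separated record per line of text.
    val lines = spark.sparkContext.parallelize(Seq("Ann,31", "Bob,45"))

    // Every record is parsed the same way into the same case class.
    val people = lines
      .map(_.split(","))
      .map(fields => Person(fields(0), fields(1).trim.toInt))

    // The schema (name: String, age: Int) is inferred from Person.
    val df = people.toDF()
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Because the schema lives in the case class, everyone who uses this code gets the same columns. That is why this approach matches option B and not option C.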

answered Nov 25, 2020 by akhtar
• 38,230 points
