A DataFrame can be created from an existing RDD: in which case would you infer the schema using case classes?

A DataFrame can be created from an existing RDD. In which one of the given cases would you create the DataFrame from the existing RDD by inferring the schema using case classes?

a) If your dataset has more than 22 fields
b) If all your users are going to need the dataset parsed in the same way
c) If you have two sets of users who will need the text dataset parsed differently
d) We cannot create a DataFrame from an RDD
Nov 24, 2020 in Apache Spark by ritu

1 answer to this question.

Hi @ritu,

You can create a DataFrame from an existing RDD with SparkSession.createDataFrame, as in the example below.

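// Assumes `spark` is an existing SparkSession (as in spark-shell)
val rdd = spark.sparkContext.parallelize(Seq(("first", Array(2.0, 1.0, 2.1, 5.4)),
  ("test", Array(1.5, 0.5, 0.9, 3.7)), ("choose", Array(8.0, 2.9, 9.1, 2.5))))
// createDataFrame infers the columns _1 and _2 from the tuple type
val dfWithoutSchema = spark.createDataFrame(rdd)
dfWithoutSchema.show()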
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+
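The example above uses tuples, so Spark names the columns _1 and _2. Since the question is about inferring the schema from case classes, here is a minimal sketch of the same data built from a case class, assuming a local SparkSession. The Measurement class, its field names, the CaseClassSchema object, and the use of Seq[Double] instead of Array[Double] are hypothetical choices for illustration. Spark reads the field names and types through reflection, so the columns come out as name and scores rather than _1 and _2.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type mirroring the rows of the example above. Spark derives
// the column names (name, scores) and types (string, array<double>) by reflection.
case class Measurement(name: String, scores: Seq[Double])

object CaseClassSchema {
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides the encoder and toDF() for an RDD of case classes
    val rdd = spark.sparkContext.parallelize(Seq(
      Measurement("first", Seq(2.0, 1.0, 2.1, 5.4)),
      Measurement("test", Seq(1.5, 0.5, 0.9, 3.7)),
      Measurement("choose", Seq(8.0, 2.9, 9.1, 2.5))))
    rdd.toDF() // spark.createDataFrame(rdd) would infer the same schema
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("case-class-schema").master("local[*]").getOrCreate()
    val df = build(spark)
    df.printSchema() // shows the reflected column names and types
    df.show()
    spark.stop()
  }
}
```

The schema is fixed by the case class at compile time, which is convenient when every consumer reads the records the same way.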

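Inferring the schema by reflection on case classes suits the case where the structure of the records is known in advance and every user needs the dataset parsed in the same way. If two sets of users need the text dataset parsed differently (option c), you would instead specify the schema programmatically with a StructType applied through createDataFrame. Case classes in Scala 2.10 were also limited to 22 fields, so option a does not fit either. So I think you can go with option b.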
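When two groups of users need the same text parsed differently, a fixed case class no longer fits, and the Spark SQL programming guide describes specifying the schema programmatically instead: build an RDD of Rows, define a StructType, and apply it with createDataFrame. Below is a minimal sketch under stated assumptions: the input is comma-separated text, every column is a string, and the ProgrammaticSchema object, its build helper, and the sample lines are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ProgrammaticSchema {
  // Hypothetical helper: the column list is a runtime value (e.g. chosen per user),
  // not something fixed at compile time in a case class.
  def build(spark: SparkSession, lines: Seq[String], fieldNames: Seq[String]): DataFrame = {
    val n = fieldNames.length
    // 1. Describe the schema as a StructType (all columns are strings in this sketch)
    val schema = StructType(fieldNames.map(name => StructField(name, StringType, nullable = true)))
    // 2. Parse each line into a Row holding exactly n values (missing values become null)
    val rowRdd = spark.sparkContext.parallelize(lines).map { line =>
      val parts: Array[String] = line.split(",").map(_.trim).take(n).padTo(n, null: String)
      Row.fromSeq(parts.toSeq)
    }
    // 3. Apply the schema to the RDD[Row]
    spark.createDataFrame(rowRdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("programmatic-schema").master("local[*]").getOrCreate()
    // Hypothetical sample data: name, city, age
    val lines = Seq("Ann, Paris, 34", "Bo, Oslo, 29")
    build(spark, lines, Seq("name")).show()         // one group of users needs only the name
    build(spark, lines, Seq("name", "city")).show() // another group needs name and city
    spark.stop()
  }
}
```

Because the column list is an ordinary runtime value, each group of users can pass its own projection, which is the situation where reflection on a single case class is the wrong tool.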

answered Nov 25, 2020 by akhtar

Related Questions In Apache Spark

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi @Ritu, according to the official documentation of Spark 1.2, ...

answered Nov 23, 2020 in Apache Spark by Gitika

5) Using which one of the given choices would you create an RDD with specific partitioning?

Hi @Ritu, option b for you, as Hash Partitioning ...

answered Nov 23, 2020 in Apache Spark by Gitika

How can I write a text file in HDFS, not from an RDD, in a Spark program?

Yes, you can go ahead and write ...

answered May 29, 2018 in Apache Spark by Shubham

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi @Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...

answered Nov 25, 2020 in Apache Spark by MD

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi @ritu, Spark's internal scheduler may truncate the lineage of the RDD graph if ...

answered Nov 25, 2020 in Apache Spark by akhtar