Why do we use sc.parallelize?

If an RDD is already distributed over the nodes of a cluster and is acted upon in parallel, what is the use of parallelize? Why do we use sc.parallelize?

Jul 11 in Apache Spark by Sumit

Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel.

There are two ways to create an RDD:

The first is to reference an external dataset, for example a file in HDFS or on the local file system:

sc.textFile("/user/edureka_425640/patient_records.txt")
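A slightly fuller sketch of the first approach, assuming spark-core is on the classpath and Spark runs in local mode (no cluster needed). Because the HDFS path above is specific to one user's account, the sketch writes a small temporary file with made-up sample lines and reads that instead. The second argument to `sc.textFile` is `minPartitions`, a hint for how many partitions to split the file into.

```scala
import java.nio.file.Files
import java.util.Arrays
import org.apache.spark.{SparkConf, SparkContext}

object TextFileDemo {
  // Hypothetical helper: writes three made-up lines to a temp file and returns its path.
  def writeSampleFile(): String = {
    val tmp = Files.createTempFile("patient_records", ".txt")
    Files.write(tmp, Arrays.asList("alice,34", "bob,51", "carol,27"))
    tmp.toString
  }

  // Reads `path` into an RDD[String] (one element per line) and counts the lines.
  def countLines(path: String): Long = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("textFile-demo")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile(path, 2) // 2 = minPartitions hint
      lines.count()                    // action: the file is actually read here
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit =
    println(countLines(writeSampleFile())) // 3
}
```

Note that `textFile` itself is lazy; nothing is read until an action such as `count()` runs.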

The second is to parallelize an existing collection in your driver program with sc.parallelize. This is useful for loading data you create in the program itself, which does not have to come from a file or directory:

val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
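To make the split visible, here is a sketch that runs the snippet above in local mode (again assuming spark-core on the classpath) and passes the optional second argument, `numSlices`, to choose the number of partitions explicitly. `glom()` turns each partition into an array so you can see which elements landed where, and `reduce` then sums the elements with one task per partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeDemo {
  // Distributes a driver-side array with sc.parallelize and returns
  // (number of partitions, contents of each partition, sum of the elements).
  def run(data: Array[Int], numSlices: Int): (Int, List[List[Int]], Int) = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("parallelize-demo")
    val sc = new SparkContext(conf)
    try {
      val distData = sc.parallelize(data.toSeq, numSlices) // copied into numSlices partitions
      // glom() turns each partition into an array so the split is visible
      val parts = distData.glom().collect().map(_.toList).toList
      // each partition is reduced in its own task; partial sums are combined on the driver
      val total = distData.reduce(_ + _)
      (distData.getNumPartitions, parts, total)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val data = Array(1, 2, 3, 4, 5) // an ordinary Scala array in driver memory
    val (n, parts, total) = run(data, 2)
    println(n)     // 2
    println(parts) // List(List(1, 2), List(3, 4, 5))
    println(total) // 15
  }
}
```

If `numSlices` is omitted, Spark falls back to a default based on the cluster configuration (in local mode, typically the number of cores).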

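So sc.parallelize is used only for RDD creation: the collection starts out in the driver program's memory and is not distributed at all. parallelize copies its elements into partitions spread across the cluster, and from then on the resulting RDD is operated on in parallel like any other RDD.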
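Conceptually, the splitting happens on the driver before any task runs. The sketch below is a hypothetical, Spark-free helper (not part of Spark's API) that cuts a sequence of length n into `numSlices` contiguous chunks whose sizes differ by at most one. It is modeled loosely on the start/end index arithmetic Spark appears to use for general sequences; Spark's real implementation has extra cases (for example, for ranges) that are omitted here.

```scala
object SliceSketch {
  // Hypothetical helper: chunk i covers indices [i*n/numSlices, (i+1)*n/numSlices).
  def slice[T](data: Seq[T], numSlices: Int): List[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val length = data.length.toLong // Long avoids overflow in i * length
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      data.slice(start, end)
    }.toList
  }

  def main(args: Array[String]): Unit =
    println(slice(List(1, 2, 3, 4, 5), 2)) // List(List(1, 2), List(3, 4, 5))
}
```

Each chunk then becomes one partition, which is why the earlier `sc.parallelize` example with two slices yields `[1, 2]` and `[3, 4, 5]`.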

answered Jul 11 by Suman