How to create paired RDD using subString method in Spark

0 votes
I have a file with and ID and some values then how to create a paired RDD using subString method in Spark?
Aug 2, 2019 in Apache Spark by Riddhi
2,682 views

1 answer to this question.

0 votes

Hi,

If you have a file with id and some value, then you can create paired rdd with id as key and value as other details:

Here is an example of doing that below:

val pRDD2[(Int),(String)]=sc.textFile(“path_of_your_file”)
.keyBy(line=>line.subString(1,5).trim().toInt)
.mapValues(line=>line.subString(10,30).trim())
answered Aug 2, 2019 by Gitika
• 65,890 points

Related Questions In Apache Spark

0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

answered May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,913 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

answered Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,038 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 88,642 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

answered Dec 10, 2018 in Apache Spark by Akshay
61,707 views
+1 vote
2 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
10,979 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,497 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,547 views
0 votes
1 answer

what is Paired RDD and how to create paired RDD in Spark?

Hi, Paired RDD is a distributed collection of ...READ MORE

answered Aug 2, 2019 in Apache Spark by Gitika
• 65,890 points
9,490 views
0 votes
1 answer

How to create RDD from parallelized collection in scala?

Hi, You can check this example in your ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,890 points
1,568 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP