How to create a paired RDD using the substring method in Spark?

I have a file with an ID and some values. How do I create a paired RDD using the substring method in Spark?
Aug 2 in Apache Spark by Riddhi

1 answer to this question.


Hi,

If each line of your file holds an ID and other details at fixed character positions, you can create a paired RDD with the ID as the key and the remaining details as the value.

Here is an example:

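import org.apache.spark.rdd.RDD

// substring is 0-based with an exclusive end index; each line must be at least 30 characters long
val pRDD2: RDD[(Int, String)] = sc.textFile("path_of_your_file")
  .keyBy(line => line.substring(1, 5).trim.toInt)
  .mapValues(line => line.substring(10, 30).trim)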
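
To check the substring offsets before running against a real file, the same extraction can be tried on plain Scala strings without Spark. The sketch below is only an illustration: the object name SubstringPairs, the helper toPair, and the sample records are hypothetical, and the layout (ID at indices 1 to 4, details starting at index 10) is an assumption taken from the offsets in the answer above. It also clamps the end index, so it assumes only that every line has at least 10 characters.

```scala
// Hypothetical fixed-width layout: ID at indices 1-4, details from index 10 onward (0-based).
object SubstringPairs {
  // Turn one fixed-width line into an (id, details) pair.
  def toPair(line: String): (Int, String) = {
    // substring(begin, end) is 0-based and excludes the end index.
    val id = line.substring(1, 5).trim.toInt
    // Clamp the end index so lines shorter than 30 characters do not throw.
    val details = line.substring(10, math.min(30, line.length)).trim
    (id, details)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample records, not data from the original question.
    val lines = Seq(
      " 1001     Alice, Pune",
      " 1002     Bob, Delhi"
    )
    lines.map(toPair).foreach(println)
  }
}
```

Once the offsets look right, the same function could be applied to an RDD with sc.textFile("path_of_your_file").map(SubstringPairs.toPair), which also yields an RDD of (Int, String) pairs.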
answered Aug 2 by Gitika
