How to use RDD filter with other functions?

I am working with Spark RDDs. I know how to filter an RDD, for example val y = rdd.filter(e => e % 2 == 0), but I do not know how to combine filter with another function such as Row.

In val rst = rdd.map(ab => Row(ab.a, ab.b)), I want to filter out ab.b > 0, but I have tried putting filter in several places and none of them work.

Can someone help?
Jul 5, 2018 in Apache Spark by Shubham

2 answers to this question.

I'm not sure what "out" means in "filter out": do you want to keep those entries, or get rid of them? If you want to drop all entries with ab.b > 0, note that RDD has no filterNot method (that one exists only on Scala collections), so negate the predicate inside filter:

val result = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))
If you want to retain only the entries with ab.b > 0, then try

val result = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))
The underscore in _.b > 0 is just shorthand for a named lambda parameter, so the line above is equivalent to

val result = rdd.filter(ab => ab.b > 0).map(ab => Row(ab.a, ab.b))
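
For anyone who wants to try this end to end, here is a minimal sketch that can be compiled as a small application. The case class AB, the sample data, the object name FilterThenMap and the local[*] master are all assumptions made for illustration, since the question does not show how rdd is built.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical record type standing in for the elements of `rdd` in the question.
case class AB(a: String, b: Int)

object FilterThenMap {
  // Returns (rows with b > 0, rows without b > 0) for a small made-up data set.
  def run(): (Seq[Row], Seq[Row]) = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for experimentation
      .appName("filter-then-map")
      .getOrCreate()
    try {
      val rdd = spark.sparkContext.parallelize(
        Seq(AB("x", 1), AB("y", -2), AB("z", 0), AB("w", 5)))

      // Keep only the records with b > 0, then wrap the survivors in Row.
      val kept = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))

      // Drop the records with b > 0 by negating the predicate.
      val dropped = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))

      (kept.collect().toSeq, dropped.collect().toSeq)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (kept, dropped) = run()
    println(s"kept: ${kept.mkString(", ")}")
    println(s"dropped: ${dropped.mkString(", ")}")
  }
}
```

With this sample data, kept holds the rows built from x and w, and dropped holds the rows built from y and z. Filtering before the map keeps the predicate on the typed field ab.b, which is simpler than reading the value back out of a Row afterwards.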

Hope this will help.

answered Jul 5, 2018 by nitinrawat895

val x = sc.parallelize(1 to 10, 2)

// filter operation: keep only the even numbers
val y = x.filter(e => e % 2 == 0)
y.collect
// res0: Array[Int] = Array(2, 4, 6, 8, 10)

// The same filter written with Scala's shorter placeholder syntax
// (named y2 so the snippet also compiles outside the REPL)
val y2 = x.filter(_ % 2 == 0)
y2.collect
// res1: Array[Int] = Array(2, 4, 6, 8, 10)
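
Since the question mentions trying filter in several places, it may help to see that the position of filter in the chain changes the result. The sketch below reuses the 1 to 10 data from above; the object name FilterOrder, the multiply-by-10 step and the local[*] master are assumptions added for illustration.

```scala
import org.apache.spark.sql.SparkSession

object FilterOrder {
  // Returns (filter-then-map result, map-then-filter result).
  def run(): (Seq[Int], Seq[Int]) = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for experimentation
      .appName("filter-order")
      .getOrCreate()
    try {
      val x = spark.sparkContext.parallelize(1 to 10, 2)

      // Filter first: the predicate sees the original numbers.
      val filterThenMap = x.filter(_ % 2 == 0).map(_ * 10).collect().toSeq

      // Map first: the predicate sees the scaled numbers, which are all even,
      // so nothing is removed.
      val mapThenFilter = x.map(_ * 10).filter(_ % 2 == 0).collect().toSeq

      (filterThenMap, mapThenFilter)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (filterThenMap, mapThenFilter) = run()
    println(filterThenMap.mkString(", ")) // 20, 40, 60, 80, 100
    println(mapThenFilter.mkString(", ")) // 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
  }
}
```

The predicate always applies to whatever element type the preceding step produced, so put filter at the point in the chain where the value you want to test is still easy to reach.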

answered Aug 16, 2018 by zombie
