How to use RDD filter with other functions

I am working with Spark RDDs. I know how to filter an RDD, e.g. val y = rdd.filter(e => e % 2 == 0), but I do not know how to combine filter with another function such as Row.

In val rst = rdd.map(ab => Row(ab.a, ab.b)), I want to filter out ab.b > 0. I tried putting filter in several places, but none of them worked.

Can someone help?
Jul 5, 2018 in Apache Spark by Shubham

2 answers to this question.


I'm not sure what you mean by "filter out": do you want to keep those entries, or do you want to get rid of them? Either way, apply the filter to the RDD before the map. Note that an RDD has no filterNot method (that exists only on Scala collections), so to drop all entries with ab.b > 0 you negate the predicate:

val result = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))

If you want to retain only the entries with ab.b > 0, then try

val result = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))

Here _.b > 0 is simply shorthand for ab => ab.b > 0, so the line above is equivalent to

val result = rdd.filter(ab => ab.b > 0).map(ab => Row(ab.a, ab.b))
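
To make this concrete, here is a minimal sketch you can paste into spark-shell or run as a script. The case class AB, its field types, and the sample data are assumptions for illustration only, since the question does not show what the RDD actually contains. It shows the filter placed before the map, the negated predicate, and a variant that filters after the map by reading the field back out of the Row.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row

// Hypothetical element type: the question only tells us there are fields a and b.
case class AB(a: String, b: Int)

// Local context for a standalone run; in spark-shell, getOrCreate reuses the existing one.
val conf = new SparkConf().setAppName("filter-then-map").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Made-up sample data.
val rdd = sc.parallelize(Seq(AB("x", 1), AB("y", 0), AB("z", -3), AB("w", 7)))

// Keep only entries with b > 0: filter on the typed RDD first, then build the Row.
val kept = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b)).collect()

// Drop entries with b > 0: RDD has no filterNot, so negate the predicate.
val dropped = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b)).collect()

// Filtering after the map also works, but the field must be read back by position.
val keptAfterMap = rdd.map(ab => Row(ab.a, ab.b)).filter(_.getInt(1) > 0).collect()
```

Filtering before the map is usually the more readable choice, because the predicate can still use the field name b instead of a positional getInt(1). If you run this outside spark-shell, call sc.stop() when you are done.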

Hope this will help.

answered Jul 5, 2018 by nitinrawat895

// Create an RDD of the numbers 1 to 10, split across 2 partitions
val x = sc.parallelize(1 to 10, 2)

// filter keeps only the elements for which the predicate returns true
val y = x.filter(e => e % 2 == 0)
y.collect
// res0: Array[Int] = Array(2, 4, 6, 8, 10)

// The same filter can be rewritten with Scala's shorter underscore syntax
val z = x.filter(_ % 2 == 0)
z.collect
// res1: Array[Int] = Array(2, 4, 6, 8, 10)

answered Aug 17, 2018 by zombie