How to use RDD filter with other function?

0 votes
I am working on Spark RDD. I know how to filter a RDD like val y = rdd.filter(e => e%2==0), but I do not know how to combine filter with other function like Row.

In val rst = rdd.map(ab => Row(ab.a, ab.b)), I want to filter out ab.b > 0, but I tried put filter at multiple place and they do not work.

Can someone help.
Jul 5, 2018 in Apache Spark by Shubham
• 13,310 points
642 views

2 answers to this question.

0 votes

I'm not sure about the "out" part in "filter out": do you want to keep those entries, or do you want to get rid of them? If you want to drop all entries with ab.b > 0, then you need

val result = rdd.filterNot(_.b > 0).map(ab => Row(ab.a, ab.b))
If you want to retain only the entries with ab.b > 0, then try

val result = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))
The underscore _ is simply the shorter form of

val result = rdd.filter(ab => ab.b > 0).map(ab => Row(ab.a, ab.b))

Hope this will help.

answered Jul 5, 2018 by nitinrawat895
• 10,730 points
0 votes

val x = sc.parallelize(1 to 10, 2)
 
// filter operation 
val y = x.filter(e => e%2==0) 
y.collect
// res0: Array[Int] = Array(2, 4, 6, 8, 10)
 
// RDD y can be re written with shorter syntax in scala as 
val y = x.filter(_ % 2 == 0)
y.collect
// res1: Array[Int] = Array(2, 4, 6, 8, 10)

answered Aug 16, 2018 by zombie
• 3,690 points

Related Questions In Apache Spark

0 votes
1 answer

How to remove the elements with a key present in any other RDD?

Hey, You can use the subtractByKey () function to ...READ MORE

answered Jul 22 in Apache Spark by Gitika
• 25,360 points
117 views
0 votes
1 answer

How to use nested function in Scala?

Hey, With Scala, we can define a Scala ...READ MORE

answered Jul 26 in Apache Spark by Gitika
• 25,360 points
67 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,310 points
2,520 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

answered May 30, 2018 in Apache Spark by nitinrawat895
• 10,730 points
1,574 views
+1 vote
1 answer
0 votes
1 answer

Writing File into HDFS using spark scala

The reason you are not able to ...READ MORE

answered Apr 5, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
6,159 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...READ MORE

answered Apr 19, 2018 in Apache Spark by nitinrawat895
• 10,730 points
1,460 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...READ MORE

answered May 23, 2018 in Apache Spark by nitinrawat895
• 10,730 points
7,602 views
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar 27,160 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...READ MORE

answered May 25, 2018 in Apache Spark by nitinrawat895
• 10,730 points
2,591 views