How to use RDD filter with other functions?

I am working with Spark RDDs. I know how to filter an RDD, as in val y = rdd.filter(e => e%2==0), but I do not know how to combine filter with other functions such as Row.

In val rst = rdd.map(ab => Row(ab.a, ab.b)), I want to filter out ab.b > 0, but I tried putting filter in several places and none of them worked.

Can someone help?
Jul 5, 2018 in Apache Spark by Shubham

2 answers to this question.

I'm not sure about the "out" part in "filter out": do you want to keep those entries, or do you want to get rid of them? If you want to drop all entries with ab.b > 0, then you need

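val result = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b))

(Spark's RDD has no filterNot method, unlike Scala collections, so the predicate is negated inside filter.)
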
If you want to retain only the entries with ab.b > 0, then try

val result = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b))
Here _.b > 0 is simply shorthand for ab => ab.b > 0, so the line above is equivalent to

val result = rdd.filter(ab => ab.b > 0).map(ab => Row(ab.a, ab.b))
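
For anyone who wants to try both variants end to end, here is a minimal sketch. It assumes a local Spark session and a made-up case class AB with fields a and b standing in for the question's elements (the real element type was never shown), so treat the names and sample data as hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical record type standing in for the question's `ab` elements.
case class AB(a: String, b: Int)

// Local session for experimenting; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[1]").appName("filter-then-map").getOrCreate()

// Made-up sample data: one positive, one negative, one zero value of b.
val rdd = spark.sparkContext.parallelize(Seq(AB("x", 3), AB("y", -1), AB("z", 0)))

// Keep only entries with b > 0, then build the Rows.
val kept = rdd.filter(_.b > 0).map(ab => Row(ab.a, ab.b)).collect()

// Drop entries with b > 0 by negating the predicate inside filter.
val dropped = rdd.filter(ab => !(ab.b > 0)).map(ab => Row(ab.a, ab.b)).collect()

kept.foreach(println)
dropped.foreach(println)
```

Putting filter before map keeps the predicate working on the original typed fields, which is usually simpler than reading values back out of a Row.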

Hope this will help.

answered Jul 5, 2018 by nitinrawat895

val x = sc.parallelize(1 to 10, 2)

// filter operation
val y = x.filter(e => e % 2 == 0)
y.collect
// res0: Array[Int] = Array(2, 4, 6, 8, 10)

// The same filter can be written with Scala's shorter placeholder syntax
val y2 = x.filter(_ % 2 == 0)
y2.collect
// res1: Array[Int] = Array(2, 4, 6, 8, 10)
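
To tie this back to the Row case in the question, here is a rough sketch of two other places the filter can go. It assumes a hypothetical case class AB(a, b) and made-up sample data, since the question does not show the real element type. The first variant uses RDD.collect with a partial function, which filters and maps in one step; the second filters after the map, so the predicate has to read the value back out of the Row by position.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical record type; the question's actual element type is not shown.
case class AB(a: String, b: Int)

val spark = SparkSession.builder().master("local[1]").appName("filter-with-row").getOrCreate()
val input = spark.sparkContext.parallelize(Seq(AB("x", 3), AB("y", -1), AB("z", 0)))

// collect with a partial function: elements that match the guard are kept
// and converted to Rows, everything else is dropped.
val viaCollect = input.collect { case ab if ab.b > 0 => Row(ab.a, ab.b) }

// Filtering after the map: the predicate now works on the Row,
// so the Int has to be fetched by its position (index 1).
val viaLateFilter = input.map(ab => Row(ab.a, ab.b)).filter(_.getInt(1) > 0)

viaCollect.collect().foreach(println)
viaLateFilter.collect().foreach(println)
```

Both variants should give the same Rows as filtering first and mapping afterwards; which one reads best is mostly a matter of taste.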

answered Aug 16, 2018 by zombie
