Filter Option or FlatMap in spark

0 votes

This is the code in Spark:

dsPhs.filter(filter(_))
.map(convert)
.coalesce(partitions)
.filter(additionalFilter.IsValid(_))

At convert function I get more complex object - MyObject, so I need to prefilter basic object. I have 3 options:

  1. Make map return option(MyObject) and filter it at additionalFilter
  2. Replace map with flatMap and return empty array when filtered
  3. Use filter before map function, to filter out RawObject, before converting it to MyObject.
Nov 9, 2018 in Apache Spark by Neha
• 6,300 points
2,477 views

1 answer to this question.

0 votes

If, for option 2, you mean have convert return an empty array, there's another option: have convert return an Option[MyObject] and use flatMap instead of map. This has the best of options 1 and 2. Without knowing more about your use case, I can't say for sure whether this is better than option 3, but here are some considerations:

  1. Should convert contain input validation logic? If so, consider modifying it to return an Option.
    • If convert is used, or will be used, in other places, could they benefit from this validation?
    • As a side note, this might be a good time to consider what convert currently does when passed an invalid argument.
  2. Can you easily change convert and its signature? If not, consider using a filter.
answered Nov 9, 2018 by Frankie
• 9,830 points

Related Questions In Apache Spark

0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...READ MORE

answered May 23, 2018 in Apache Spark by nitinrawat895
• 11,380 points
33,851 views
0 votes
1 answer

Which is better in term of speed, Shark or Spark?

Spark is a framework for distributed data ...READ MORE

answered Jun 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
758 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

answered Jun 17, 2019 in Apache Spark by vishal
• 180 points
38,115 views
0 votes
1 answer

What is Map and flatMap in Spark?

Hi, The map is a specific line or ...READ MORE

answered Jul 3, 2019 in Apache Spark by Gitika
• 65,910 points
1,892 views
0 votes
1 answer

What do we exactly mean by “Hadoop” – the definition of Hadoop?

The official definition of Apache Hadoop given ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by Shubham
1,640 views
+1 vote
1 answer
0 votes
3 answers

Can we run Spark without using Hadoop?

No, you can run spark without hadoop. ...READ MORE

answered May 7, 2019 in Big Data Hadoop by pradeep
1,903 views
0 votes
1 answer

Joining Multiple Spark Dataframes

You can run the below code to ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by Bharani
• 4,660 points
2,650 views
0 votes
1 answer

How to get ID of a map task in Spark?

you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
3,104 views
0 votes
1 answer

What is Executor Memory in a Spark application?

Every spark application has same fixed heap ...READ MORE

answered Jan 5, 2019 in Apache Spark by Frankie
• 9,830 points
6,237 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP