Filter, Option, or flatMap in Spark


This is the code in Spark:

dsPhs.filter(filter(_))
  .map(convert)
  .coalesce(partitions)
  .filter(additionalFilter.IsValid(_))

In the convert function I build a more complex object, MyObject, so I need to pre-filter the basic object first. I have three options (sketched in code after the list):

  1. Make map return Option[MyObject] and filter it out in additionalFilter
  2. Replace map with flatMap and return an empty collection when the record is filtered out
  3. Use filter before the map function to filter out RawObject before converting it to MyObject
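For concreteness, here is a minimal, runnable sketch of the three options. RawObject, MyObject, isConvertible, and toMyObject are hypothetical stand-ins for the types and checks in my pipeline, and the data is an RDD to keep encoders out of the picture:

import org.apache.spark.sql.SparkSession

object PrefilterOptions {
  // Hypothetical stand-ins, not the real types from the pipeline above.
  case class RawObject(id: Int, payload: String)
  case class MyObject(id: Int, data: String)

  def isConvertible(r: RawObject): Boolean = r.payload.nonEmpty
  def toMyObject(r: RawObject): MyObject = MyObject(r.id, r.payload.trim)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("prefilter").getOrCreate()
    val raw = spark.sparkContext.parallelize(Seq(RawObject(1, "keep"), RawObject(2, "")))

    // Option 1: map to Option[MyObject], drop the Nones in a later filter
    val opt1 = raw.map(r => if (isConvertible(r)) Some(toMyObject(r)) else None)
      .filter(_.isDefined)
      .map(_.get)

    // Option 2: flatMap, returning an empty collection when the record is filtered out
    val opt2 = raw.flatMap(r => if (isConvertible(r)) Seq(toMyObject(r)) else Seq.empty)

    // Option 3: filter the raw objects first, then convert
    val opt3 = raw.filter(isConvertible).map(toMyObject)

    // All three keep only RawObject(1, "keep"), converted to MyObject.
    assert(opt1.collect().sameElements(opt2.collect()))
    assert(opt2.collect().sameElements(opt3.collect()))
    spark.stop()
  }
}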
Nov 9, 2018 in Apache Spark by Neha

1 answer to this question.


If, for option 2, you mean having convert return an empty array, there's another option: have convert return an Option[MyObject] and use flatMap instead of map. That combines the best of options 1 and 2. Without knowing more about your use case, I can't say for sure whether this is better than option 3, but here are some considerations:
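A sketch of that approach against the pipeline from your question (the body of convert, the isWellFormed check, and the MyObject.from factory are invented for illustration):

// Hypothetical: convert now validates its input and returns Option[MyObject]
def convert(raw: RawObject): Option[MyObject] =
  if (raw.isWellFormed) Some(MyObject.from(raw)) else None

dsPhs.filter(filter(_))
  .flatMap(convert(_)) // Some(x) keeps x; None silently drops the record
  .coalesce(partitions)
  .filter(additionalFilter.IsValid(_))

Note that flatMap accepts the Option directly because Scala implicitly widens Option to an Iterable, so there is no need to wrap the result in a Seq.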

  1. Should convert contain input-validation logic? If so, consider modifying it to return an Option.
    • If convert is used, or will be used, in other places, could they benefit from this validation?
    • As a side note, this might be a good time to consider what convert currently does when passed an invalid argument.
  2. Can you easily change convert and its signature? If not, consider using a filter instead (sketched below).
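If you end up keeping convert as RawObject => MyObject, the filter-first variant (your option 3) is a small change to the original pipeline; isConvertible here is a hypothetical predicate over the raw type:

dsPhs.filter(filter(_))
  .filter(isConvertible(_)) // hypothetical prefilter on the raw objects
  .map(convert)             // convert keeps its RawObject => MyObject signature
  .coalesce(partitions)
  .filter(additionalFilter.IsValid(_))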
answered Nov 9, 2018 by Frankie
