How to groupBy count then filter on count in Scala

Question

I'm using Spark 2.1 and trying to filter on the "count" column produced by groupBy(...).count(). It throws an exception.

Code:

df.groupBy("travel").count()
  .filter("count >= 1000")
  .show()

Error:

java.lang.RuntimeException: [1.15] failure: ``('' expected but `>=' found

count >= 1000

kurt_cobain · Answer 1 · Apr 19, 2018

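I think the exception occurs because count is a keyword.

When you pass a string to filter, Spark parses it as a SQL expression behind the scenes. Since count is also the name of a SQL function, the parser reads it as the start of a function call and expects "(" after it, which is why it complains when it finds ">=" instead.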

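You can either refer to it as a column using the $ sign (this needs import spark.implicits._). Note that only the column name goes inside the quotes; the comparison stays outside:

df.groupBy("travel").count()
  .filter($"count" >= 1000)
  .show()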
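
The column-expression approach can be sketched end to end as follows. This is only an illustration, written script-style for spark-shell: the local SparkSession, the sample data, and the threshold of 2 (instead of 1000) are assumptions made so the snippet is small enough to run on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("filter-on-count").getOrCreate()
import spark.implicits._ // brings $"..." and toDF into scope

// Hypothetical sample data standing in for the asker's df.
val df = Seq("bus", "bus", "bus", "train", "train", "car").toDF("travel")

val counts = df.groupBy("travel").count()

// $"count" is a Column; >= builds the predicate outside the quoted name,
// so nothing is handed to the SQL string parser.
val viaDollar = counts.filter($"count" >= 2)

// Equivalent form that does not rely on spark.implicits._
val viaCol = counts.filter(col("count") >= 2)

viaDollar.show()
viaCol.show()
```

Both filters keep the groups that occur at least twice in the sample data.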

Alternatively, you can rename the column with withColumnRenamed, so that the filter string no longer contains the word count:

df.groupBy("travel").count().withColumnRenamed("count", "x")
  .filter("x >= 1000")
  .show()
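
A variant of the rename idea is to name the count at aggregation time, so that a column called count is never created. The sketch below is an illustration rather than part of the original answer: the local SparkSession, the sample data, the alias n, and the threshold of 2 are all assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("named-count").getOrCreate()
import spark.implicits._ // needed here only for toDF

// Hypothetical sample data standing in for the asker's df.
val df = Seq("bus", "bus", "bus", "train", "train", "car").toDF("travel")

// agg(count("*").as("n")) counts rows per group and names the result "n",
// so the string filter contains no SQL keyword.
val frequent = df.groupBy("travel")
  .agg(count("*").as("n"))
  .filter("n >= 2")

frequent.show()
```

This saves the separate withColumnRenamed step, at the cost of spelling out the aggregation.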

Hope this helps!!

Thank you!
