RDD word count with line numbers

Hi,

Could you please share a PySpark snippet that finds the count of each word along with the list of line numbers on which that word appears?

Example:

The text file contains the following text:

Hello world
Hello world
Hello

Expected output:

Hello 3 [1,2,3]
world 2 [1,2]

Here, "Hello" appears on lines 1, 2, and 3, and "world" appears on lines 1 and 2.
Jul 25, 2019 in Apache Spark by Rishi
1,639 views

1 answer to this question.

df = spark.createDataFrame([("A", 2000), ("A", 2002), ("A", 2007), ("B", 1999), ("B", 2015)], ["Group", "Date"])
df.show()

+-----+----+
|Group|Date|
+-----+----+
|    A|2000|
|    A|2002|
|    A|2007|
|    B|1999|
|    B|2015|
+-----+----+

# Number the rows within each group, ordered by date
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

df.withColumn("row_num", row_number().over(Window.partitionBy("Group").orderBy("Date"))).show()

# Output of the snippet above

+-----+----+-------+
|Group|Date|row_num|
+-----+----+-------+
|    B|1999|      1|
|    B|2015|      2|
|    A|2000|      1|
|    A|2002|      2|
|    A|2007|      3|
+-----+----+-------+

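Once each row has a number, you can collect the numbers for each key into a list, for example with a UDF or with collect_list from pyspark.sql.functions.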

answered Jul 25, 2019 by Siri
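
Since the question asks for an RDD-based word count with line numbers, here is a minimal sketch of one way to do it, not a definitive implementation. It assumes words are separated by whitespace and compared case-insensitively (so "Hello" is reported as "hello"), and that the input is a single text file so that zipWithIndex follows the file's line order. The function names line_to_pairs, summarize, word_index_spark, and word_index_local are made up for this example; sc is assumed to be an existing SparkContext supplied by the caller.

```python
from collections import defaultdict


def line_to_pairs(indexed_line):
    """Map (line_text, zero_based_index) to one (word, line_number) pair per word occurrence."""
    text, index = indexed_line
    line_no = index + 1  # zipWithIndex is 0-based; the question counts lines from 1
    # Assumption: whitespace-separated words, compared case-insensitively.
    return [(word, line_no) for word in text.lower().split()]


def summarize(line_numbers):
    """Reduce the line numbers seen for one word to (occurrence_count, sorted distinct lines)."""
    numbers = list(line_numbers)
    return len(numbers), sorted(set(numbers))


def word_index_spark(sc, path):
    """Spark version: sc is an existing SparkContext, path points to a text file."""
    return (
        sc.textFile(path)
        .zipWithIndex()          # (line, 0-based index)
        .flatMap(line_to_pairs)  # (word, line_number)
        .groupByKey()            # (word, iterable of line numbers)
        .mapValues(summarize)    # (word, (count, [line numbers]))
        .collectAsMap()          # bring the result back to the driver as a dict
    )


def word_index_local(lines):
    """The same steps in plain Python, for checking the helpers without a cluster."""
    grouped = defaultdict(list)
    for indexed_line in zip(lines, range(len(lines))):
        for word, line_no in line_to_pairs(indexed_line):
            grouped[word].append(line_no)
    return {word: summarize(numbers) for word, numbers in grouped.items()}


# Local check with the sample text from the question
print(word_index_local(["Hello world", "Hello world", "Hello"]))
# {'hello': (3, [1, 2, 3]), 'world': (2, [1, 2])}
```

With a running Spark session, word_index_spark(spark.sparkContext, "input.txt") would return the same dictionary for a file with those three lines ("input.txt" is a placeholder path). The count here is the number of occurrences, so a word that appears twice on one line is counted twice but that line is listed only once.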
