RDD word count with line numbers

Hi,

Could you please share a PySpark snippet that finds the count of each word, along with the list of line numbers on which that word appears?

Example:

The text file contains the following text:

Hello world
Hello world
Hello

Expected output:

Hello 3  [1,2,3]
world 2  [1,2]

Here, "Hello" is present on lines 1, 2 and 3, and "world" is present on lines 1 and 2.
Jul 25 in Apache Spark by Rishi

1 answer to this question.

df = spark.createDataFrame([("A", 2000), ("A", 2002), ("A", 2007), ("B", 1999), ("B", 2015)], ["Group", "Date"])

+-----+----+
|Group|Date|
+-----+----+
|    A|2000|
|    A|2002|
|    A|2007|
|    B|1999|
|    B|2015|
+-----+----+


# Number the rows within each group, ordered by Date

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

df.withColumn("row_num", row_number().over(Window.partitionBy("Group").orderBy("Date"))).show()



# Output of the snippet above

+-----+----+-------+
|Group|Date|row_num|
+-----+----+-------+
|    B|1999|      1|
|    B|2015|      2|
|    A|2000|      1|
|    A|2002|      2|
|    A|2007|      3|
+-----+----+-------+

After this, you can write a UDF to collect the row numbers into a list for each group.
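For the RDD case asked about in the question, the same idea (number the lines, then collect the numbers per key) can be sketched roughly as follows. This is only an illustrative sketch, not a tested production job: the helper names `words_with_line_numbers`, `summarize`, `word_index` and `run` are made up for this example, words are split on whitespace and matched case-sensitively, and it assumes that `zipWithIndex()` on an RDD read with `textFile` follows the order of lines in the file.

```python
def words_with_line_numbers(indexed_line):
    """Turn (line_text, zero_based_index) into (word, line_number) pairs."""
    line, index = indexed_line
    # split() with no arguments skips empty strings; line numbers are 1-based.
    return [(word, index + 1) for word in line.split()]


def summarize(line_numbers):
    """Return (total occurrences, sorted distinct line numbers)."""
    numbers = list(line_numbers)
    return len(numbers), sorted(set(numbers))


def word_index(lines_rdd):
    """Build an RDD of (word, (count, [line numbers])) from an RDD of lines."""
    return (
        lines_rdd.zipWithIndex()            # (line, 0-based index)
        .flatMap(words_with_line_numbers)   # (word, line number)
        .groupByKey()                       # (word, iterable of line numbers)
        .mapValues(summarize)               # (word, (count, [line numbers]))
    )


def run(path):
    """Hypothetical driver: 'path' is whatever text file you want to index."""
    # Imported here so the helpers above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("word-line-index").getOrCreate()
    lines = spark.sparkContext.textFile(path)
    for word, (count, line_numbers) in sorted(word_index(lines).collect()):
        print(word, count, line_numbers)
    spark.stop()
```

Calling `run()` on a file containing the three sample lines from the question should print `Hello 3 [1, 2, 3]` and `world 2 [1, 2]`. If "Hello" and "hello" should be counted as the same word, lowercasing each word inside `words_with_line_numbers` would be one way to do it. For very large inputs, `groupByKey` pulls all line numbers for a word onto one executor, so `aggregateByKey` may be a better fit.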

answered Jul 25 by Siri
