How to perform a word count on a dataframe column

I have created a dataframe with two columns, id and text, and I want to perform a word count on the text column. What code can I use to do this in PySpark?

Jan 22, 2019 in Big Data Hadoop by Karan
• 3,185 views


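You can use the code below. It assumes sc is your SparkContext and inputfile is the path to the text you want to count. To count the words in the dataframe column instead, you can build the RDD from the dataframe, for example with df.rdd.map(lambda row: row.text).

# Read the input as an RDD of lines
rdd = sc.textFile(inputfile)

# Split lines into words, count each word, then swap to (count, word) and sort by count
wc = (rdd.flatMap(lambda line: line.split())
         .map(lambda w: (w, 1))
         .reduceByKey(lambda a, b: a + b)
         .map(lambda pair: (pair[1], pair[0]))
         .sortByKey(ascending=True))

# Bring the results back to the driver
output = wc.collect()

for (count, word) in output:
    print("%s %i" % (word, count))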

answered Jan 22, 2019 by Omkar
• 69,180 points
