How to perform a word count on a dataframe column?

0 votes

I have created a dataframe of two columns id and text, I want to perform a wordcount on the text column of the dataframe. What code can I use to do this using PySpark?

Jan 22 in Big Data Hadoop by Karan
29 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

You can use the below code to do this:

rdd=sc.textfile(inputfile)

wc=rdd.flatMap(lambda line: line.split()).map(lambda w : (w,1)).reduceByKey(lambda a,b:a+b).map(lambda (a,b) : (b,a)).sortByKey(ascending=True)

output=sc.collect()


for (count, word) in output:

       print("%s %i",(word, count))
answered Jan 22 by Omkar
• 65,850 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to count lines in a file on hdfs command?

Use the below commands: Total number of files: hadoop ...READ MORE

answered Aug 10, 2018 in Big Data Hadoop by Neha
• 6,140 points
1,308 views
0 votes
1 answer

How to groupBy/count then filter on count in Scala

I think the exception is caused because ...READ MORE

answered Apr 19, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
3,602 views
0 votes
1 answer

How to add column inside a table in Hive?

Hi, Yes, we can add column inside a ...READ MORE

answered May 15 in Big Data Hadoop by Gitika
• 8,100 points
8 views
0 votes
1 answer

How to install Hadoop on Ubuntu?

You can refer to this blog by ...READ MORE

answered Mar 21, 2018 in Big Data Hadoop by nitinrawat895
• 9,030 points
255 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,030 points
1,663 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,030 points
130 views
0 votes
10 answers

hadoop fs -put command?

copy command can be used to copy files ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Sujay
8,063 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
565 views
+1 vote
1 answer

How to count number of rows in alias in PIG?

COUNT is part of pig LOGS= LOAD 'log'; LOGS_GROUP= ...READ MORE

answered Oct 15, 2018 in Big Data Hadoop by Omkar
• 65,850 points
31 views
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 65,850 points
328 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.