What is the easiest way to do text analytics with Hadoop?


I am experimenting with Hadoop and the Hortonworks and Cloudera distributions in order to do some simple text analytics. All the examples I have found on the web so far (e.g. word count) deal with only one column. However, I have many text files to which word count must be applied, and the results must be saved in a spreadsheet, with each file's results in a separate column. So I am wondering what the easiest way is to do text analytics with Hadoop in conjunction with spreadsheets. The functions I need are:

  • transform to lower case
  • filter stopwords
  • transpose results
  • write to Excel

Can this be accomplished easily with Pig, RHadoop, or something else?
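The first two steps (lower-casing and stopword filtering), together with a per-file word count, map naturally onto a MapReduce job keyed by (file, word). The sketch below is a local Python simulation of that job, not a Hadoop Streaming script: `mapper`, `reducer`, `word_count` and the tiny `STOPWORDS` set are hypothetical names chosen for illustration. Under real Hadoop Streaming the mapper and reducer would read stdin and write tab-separated lines to stdout, the framework would do the sort between them, and the current input file name is usually exposed through an environment variable such as `mapreduce_map_input_file` (an assumption worth checking for your distribution).

```python
import re
from itertools import groupby

# Hypothetical, deliberately tiny stopword list; substitute a real one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

# Tokens: runs of lower-case letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(filename, lines):
    """Emit (filename, word, 1) for every token that is not a stopword.

    Covers steps 1 and 2 of the question: lower-case, then filter stopwords.
    """
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            if word not in STOPWORDS:
                yield filename, word, 1


def reducer(sorted_triples):
    """Sum counts per (filename, word).

    The input must already be sorted by (filename, word), which is what
    Hadoop's shuffle/sort phase guarantees before the reduce step.
    """
    for (filename, word), group in groupby(sorted_triples, key=lambda t: (t[0], t[1])):
        yield filename, word, sum(count for _, _, count in group)


def word_count(files):
    """Local stand-in for the MapReduce job: map every file, sort, reduce."""
    mapped = [t for name, lines in files.items() for t in mapper(name, lines)]
    mapped.sort(key=lambda t: (t[0], t[1]))  # simulates the shuffle/sort phase
    return list(reducer(mapped))


if __name__ == "__main__":
    docs = {
        "a.txt": ["The cat and the hat", "A cat"],
        "b.txt": ["Hat of the day"],
    }
    for row in word_count(docs):
        print(*row, sep="\t")
```

With the sample data, `cat` is counted twice for `a.txt`, while `the`, `and`, `a` and `of` are dropped as stopwords. Keeping the file name in the key is what later allows one spreadsheet column per file.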

Nov 22, 2018 in Big Data Hadoop by Neha


Apache Pig provides the CSVExcelStorage class (part of the Piggybank contributions) for loading from and storing into CSV format; it follows the CSV conventions of Excel 2007. Apart from that, I have also experimented with storing the results from Pig in MongoDB and then reading them into R using the rmongodb library.
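CSVExcelStorage covers writing Excel-compatible CSV from Pig. The remaining "transpose" step (one column per input file) can also be done after the (file, word, count) rows have been copied out of HDFS, for example with `hdfs dfs -getmerge`. The following is a minimal Python sketch under that assumption. The word-by-file layout (one row per word, one column per file, 0 where a word is absent) and the names `to_columns` and `write_excel_csv` are illustrative choices, not part of Pig. Output uses the standard `csv` module's `excel` dialect, which quotes fields containing commas or quotes, doubles embedded quotes, and ends lines with CRLF.

```python
import csv
import io
from collections import defaultdict


def to_columns(rows):
    """Pivot (file, word, count) rows into a word-by-file table.

    Each input file becomes its own column; words missing from a file get 0.
    """
    files = sorted({f for f, _, _ in rows})
    table = defaultdict(dict)
    for f, word, count in rows:
        table[word][f] = count
    header = ["word"] + files
    body = [[word] + [table[word].get(f, 0) for f in files] for word in sorted(table)]
    return [header] + body


def write_excel_csv(table, stream):
    """Write the table using the csv module's built-in 'excel' dialect."""
    writer = csv.writer(stream, dialect="excel")
    writer.writerows(table)


if __name__ == "__main__":
    rows = [("a.txt", "cat", 2), ("a.txt", "hat", 1), ("b.txt", "day", 1), ("b.txt", "hat", 1)]
    buf = io.StringIO()
    write_excel_csv(to_columns(rows), buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends. Excel opens the resulting CSV directly; producing a native .xlsx workbook would need a third-party library instead.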
answered Nov 22, 2018 by Frankie
