Word Clouds using TabPy

0 votes

I'd like to write some TabPy logic that counts the frequency of the words in a column and removes stop words, so I can build a word cloud in Tableau.

In Python, I can accomplish this rather easily:

other1_count = other1.answer.str.split(expand=True).stack().value_counts()
other1_count = other1_count.to_frame().reset_index()
other1_count.columns = ['Word', 'Count']

# Remove stop words (`stop` is assumed to be a collection of stop words defined elsewhere)
other1_count = other1_count[~other1_count['Word'].isin(stop)]

But I'm not sure how to do that using TabPy. Anyone familiar with TabPy and how I might be able to get this to work?

Please accept my sincere gratitude.

Apr 8 in Tableau by Vaani
• 7,020 points
196 views

1 answer to this question.

0 votes

I worked on a project in R a while back that did something very similar. Here's a video demonstrating the proof of concept (no audio): https://www.screencast.com/t/xa0yemiDPl

It shows the end result: using Tableau to interactively explore wine descriptions in a word cloud for the selected countries. These were the most important elements:

Tableau connects to the data to be analysed, as well as to a placeholder dataset containing the number of records you expect to get back from your Python/R code (Tableau expects to receive the same number of records back from Python/R as it sends to be processed). This becomes a problem when you send text data and the processing produces a different, possibly much larger, number of records (as in the word cloud example).


The Python/R code connects to your data and returns each word together with its frequency count (what Tableau needs for a word cloud) in a single vector, with the two values joined by a delimiter.


Tableau Calculated Fields are used to split the single vector and parameter actions are used to pick parameter values to pass to the Python/R code.
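
Since the question asks about TabPy rather than R, here is a minimal Python sketch of the second element above: counting words, dropping stop words, and returning "word~count" strings in one list. The function name top_words_delimited and its parameters are hypothetical, and the stop word set is a tiny made-up one. In a TabPy SCRIPT_STR calculation the column values would typically arrive as _arg1 and the script would end by returning the list; the sketch does not handle the record-count constraint described above.

```python
from collections import Counter
import string


def top_words_delimited(texts, stop_words, top_n, delimiter="~"):
    """Return up to top_n 'word<delimiter>count' strings, most frequent first.

    Hypothetical helper: texts is a list of strings (for example the values
    Tableau passes in), stop_words is any collection of lowercase words.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        # lowercase, remove punctuation, split on whitespace
        for word in text.lower().translate(strip_punct).split():
            if word not in stop_words:
                counts[word] += 1
    return [f"{word}{delimiter}{count}" for word, count in counts.most_common(top_n)]


# Example with made-up data
descriptions = ["The wine is dry and crisp.", "A crisp wine", "Crisp finish"]
stop = {"the", "is", "and", "a"}
print(top_words_delimited(descriptions, stop, top_n=2))  # ['crisp~3', 'wine~2']
```

Joining the word and the count into one string keeps the result a single vector, which is the shape a Tableau script calculation can receive.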


Tableau Calculated Field - [R Words+Freq]:

Script_Str('
print("STARTING NEW SCRIPT RUN")
print(Sys.time())
print(.arg2) # grouping
print(.arg1) # selected country


# TEST VARIABLE (non-prod)
.MaxSourceDataRecords = 1000 # -1 to disable

# TABLEAU PARAMETER VARIABLES 
.country = "' + [Country Parameter] + '"
.wordsToReturn = ' + str([Return Top N Words]) + '
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# VARIABLES DERIVED FROM TABLEAU PARAMETER VALUES
.countryUseAll = (.country == "All")
print(.countryUseAll)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#setwd("C:/Users/jbelliveau/....FILL IN HERE...")
.fileIn = ' + [Source Data Path] + '
#.fileOut = "winemag-with-DTM.csv"

#install.packages("wordcloud")
#install.packages("RColorBrewer") # not needed if installed wordcloud package

library(tm)
library(wordcloud)
library(RColorBrewer) # color package (maps or wordclouds)

wineAll = read.csv(.fileIn, stringsAsFactors=FALSE)

# TODO separately... polarity 

# use all the data or just the parameter selected
print(.countryUseAll)

if ( .countryUseAll ) {
  wine = wineAll # "All" selected in Tableau: keep every country
}else{
  wine = wineAll[c(wineAll$country == .country),] # filter down to parameter passed from Tableau
}

# limited data for speed (NOT FOR PRODUCTION)
if( .MaxSourceDataRecords > 0 ){
  print("limiting the number of records to use from input data")
  wine = head(wine, .MaxSourceDataRecords)  
}


corpus = Corpus(VectorSource(wine$description))
corpus = tm_map(corpus, tolower)
#corpus = tm_map(corpus, PlainTextDocument) # https://stackoverflow.com/questions/32523544/how-to-remove-error-in-term-document-matrix-in-r/36161902
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, stopwords("english"))
#length(corpus)

dtm = DocumentTermMatrix(corpus)

#?sample
mysample = dtm # no sampling (used Head on data read... for speed/simplicity on this example)
#mysample <- dtm[sample(1:nrow(dtm), 5000, replace=FALSE),]
#nrow(mysample)
wineSample = as.data.frame(as.matrix(mysample))

# column names (the words)
# use colnames to get a vector of the words
#colnames(wineSample)

# freq of words
# colSums to get the frequency of the words
#wineWordFreq = colSums(wineSample)

# structure in a way Tableau will like it
wordCloudData = data.frame(words=colnames(wineSample), freq=colSums(wineSample))
str(wordCloudData)

# sort by word freq
wordCloudDataSorted = wordCloudData[order(-wordCloudData$freq),]

# join together by ~ for processing once Tableau gets it
wordAndFreq = paste(wordCloudDataSorted[, 1], wordCloudDataSorted[, 2], sep = "~")

#write.table(wordCloudData, .fileOut, sep=",",row.names=FALSE) # if needed for performance refactors

topWords = head(wordAndFreq, .wordsToReturn)
#print(topWords)

return( topWords )

',
Max([Country Parameter])
, MAX([RowNum]) // for testing the grouping being sent to R
)

Tableau Calculated Field for the Word Value:

// grab the first token to the left of ~
Left([R Words+Freq], Find([R Words+Freq],"~") - 1)

Tableau Calculated Field for the Frequency Value:

INT(REPLACE([R Words+Freq],[Word]+"~",""))
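
For readers more at home in Python than in Tableau's calculation language, the two calculated fields above do roughly what this small sketch does to each returned string. The function name split_word_and_freq is hypothetical and is only meant to illustrate the split, not to replace the Tableau fields.

```python
def split_word_and_freq(value, delimiter="~"):
    """Split one 'word~count' string into a (word, count) tuple."""
    # partition splits on the first delimiter only, like Find/Left in the Word field
    word, _, freq = value.partition(delimiter)
    return word, int(freq)


word, freq = split_word_and_freq("crisp~3")
print(word, freq)
```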

If you're not familiar with Tableau, you'll probably want to work with someone who is. They'll be able to assist you with creating calculated fields and connecting Tableau to TabPy.

answered Apr 21 by Neha
• 8,940 points
