
Word Clouds

Word clouds are a cool way to show the main themes in a piece of text: the more often a word appears, the larger it is drawn. I made this word cloud from my master's thesis!

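1. Load the required packages
 # Install once with: install.packages(c("tm", "wordcloud", "RColorBrewer", "SnowballC"))
 library(tm)            # text mining: corpus, cleaning, term-document matrix
 library(wordcloud)     # wordcloud() plotting function
 library(RColorBrewer)  # brewer.pal() color palettes
 library(SnowballC)     # stemming backend; only needed if you add stemDocument()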

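2. Read in the text file
 filePath <- "C:/data/blog/wordcloud/kk.txt"
 text <- readLines(filePath)            # one string per line of the file
 docs <- Corpus(VectorSource(text))     # each line becomes one document
 inspect(docs)                          # print the corpus to check it loaded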
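To see what step 2 produces, here is a minimal base-R sketch that swaps the thesis file for a small temporary file (the file and its lines are made up for illustration). `readLines()` returns one string per line, and `VectorSource()` turns each of those strings into its own document, so blank lines become empty documents.

```r
# Stand-in for the thesis file: a temporary file with three lines, one blank
tmp <- tempfile(fileext = ".txt")
writeLines(c("First paragraph of the thesis.", "", "Second paragraph."), tmp)

text <- readLines(tmp)
print(length(text))   # 3 strings, so Corpus(VectorSource(text)) would hold 3 documents

# Optional: drop blank lines before building the corpus
text <- text[nzchar(trimws(text))]
print(length(text))   # 2
```

Empty documents add only zero counts in step 9, so dropping them mainly tidies up the output of `inspect()`.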

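3. Replace messy characters with spaces
 # Helper that replaces every match of a regex pattern with a space
 toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
 docs <- tm_map(docs, toSpace, "/")
 docs <- tm_map(docs, toSpace, "@")
 docs <- tm_map(docs, toSpace, "\\|")   # "|" is a regex metacharacter, so escape it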
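The `toSpace` transformer is just `gsub()` wrapped by `content_transformer()`. A minimal base-R sketch on a made-up string shows its effect; `to_space` is a hypothetical stand-in for the inner function, not part of tm.

```r
# Hypothetical stand-in for the function inside content_transformer()
to_space <- function(x, pattern) gsub(pattern, " ", x)

s <- "data/text@example|wordcloud"   # made-up sample string
s <- to_space(s, "/")
s <- to_space(s, "@")
s <- to_space(s, "\\|")   # escaped, so it matches a literal pipe character
print(s)                  # "data text example wordcloud"
```

Replacing with a space rather than an empty string keeps words on either side of the symbol from being glued together.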

4. Convert the text to lower case
 docs <- tm_map(docs, content_transformer(tolower))

5. Remove numbers
 docs <- tm_map(docs, removeNumbers)

6. Remove common English stopwords
 docs <- tm_map(docs, removeWords, stopwords("english"))

7. Remove punctuation
 docs <- tm_map(docs, removePunctuation)

8. Eliminate extra whitespace
 docs <- tm_map(docs, stripWhitespace)
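Steps 4 to 8 can be approximated in base R on a single made-up line, which makes it easier to see what each transformation does. This is a sketch, not tm's exact implementation, and `stop_words` is a tiny hypothetical subset of `stopwords("english")`. Removing stopwords before punctuation is a sensible order, because tm's English list contains contractions written with apostrophes (for example "don't").

```r
x <- "In 2016, the Thesis  looked at 3 themes: text, themes and TEXT!"  # made-up line

x <- tolower(x)                                   # step 4: lower case
x <- gsub("[[:digit:]]+", "", x)                  # step 5: remove numbers
stop_words <- c("in", "the", "at", "and")         # hypothetical mini stopword list
pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
x <- gsub(pattern, "", x, perl = TRUE)            # step 6: remove whole-word stopwords
x <- gsub("[[:punct:]]+", "", x)                  # step 7: remove punctuation
x <- gsub("[[:space:]]+", " ", x)                 # step 8: collapse runs of whitespace
x <- trimws(x)                                    # extra step here: drop leading space
print(x)                                          # "thesis looked themes text themes text"
```

The word boundaries (`\\b`) matter: "the" is removed, but "thesis" and "themes" are left intact.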

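9. Build a term-document matrix and count word frequencies
 dtm <- TermDocumentMatrix(docs)              # rows = terms, columns = documents
 m <- as.matrix(dtm)
 v <- sort(rowSums(m), decreasing = TRUE)     # total count of each term
 d <- data.frame(word = names(v), freq = v)
 head(d, 10)                                  # the 10 most frequent words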
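Summing a term-document matrix across its columns gives each word's total count. The base-R sketch below does the same counting with `table()` on two made-up cleaned lines. It also mimics tm's default of discarding terms shorter than three characters (the `wordLengths` control, which defaults to `c(3, Inf)`); treat it as an illustration, not tm's code.

```r
docs_text <- c("thesis themes text themes", "text themes wordcloud")  # made-up lines

words <- unlist(strsplit(docs_text, " ", fixed = TRUE))
words <- words[nchar(words) >= 3]           # mimic the default minimum word length
v <- sort(table(words), decreasing = TRUE)  # most frequent first
d <- data.frame(word = names(v), freq = as.integer(v), stringsAsFactors = FALSE)
print(head(d, 10))
```

For a large corpus, `as.matrix(dtm)` builds a dense matrix that can use a lot of memory; `slam::row_sums(dtm)` sums the sparse matrix directly.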

10. Create the word cloud
 set.seed(1234)  # word placement is random; fix the seed for a reproducible layout
 wordcloud(words = d$word, freq = d$freq, min.freq = 1,
           max.words = 200, random.order = FALSE, rot.per = 0.35,
           colors = brewer.pal(8, "Dark2"))
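In the `wordcloud()` call, `min.freq` drops words that appear fewer times than the threshold, `max.words` caps how many words are drawn (least frequent dropped first), `random.order = FALSE` places words in decreasing frequency so the biggest ones end up near the center, `rot.per = 0.35` draws roughly 35% of the words rotated 90 degrees, and `brewer.pal(8, "Dark2")` supplies eight colors. The sketch below uses a hypothetical helper, `select_words`, to show only which words survive the two filters; it does not reproduce the layout, sizing, or tie-breaking.

```r
# Hypothetical helper: which words survive min.freq and max.words
select_words <- function(d, min.freq = 1, max.words = 200) {
  d <- d[d$freq >= min.freq, ]                 # drop rare words
  d <- d[order(d$freq, decreasing = TRUE), ]   # most frequent first
  head(d, max.words)                           # keep at most max.words
}

d <- data.frame(word = c("text", "themes", "thesis", "data"),
                freq = c(2, 5, 1, 3), stringsAsFactors = FALSE)  # made-up counts
kept <- select_words(d, min.freq = 2, max.words = 2)
print(kept$word)   # "themes" "data"
```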

