RStudio text analytics offers several areas worth examining; one of the core tasks is exploring a text corpus. Here I use a file containing the texts of presidential political speeches to study political communication. The basic code below finds the most frequent word in the corpus, which turns out to be 'will'.
Corpus inspection steps:
Install packages and load the data
Prepare and clean the data
Visualise the corpus
1.
## install packages
install.packages("tm")                  # text mining framework
install.packages("SnowballC")           # word stemming
install.packages("wordcloud")           # word cloud plotting
install.packages("RColorBrewer")        # colour palettes
install.packages("readr")               # CSV import
install.packages("corpus")
install.packages("text2vec")
install.packages("readtext")
install.packages("quanteda")
install.packages("quanteda.textplots")  # provides textplot_wordcloud()
install.packages("quanteda.textstats")
library(readr)
## import the data from a CSV file saved on the desktop
d <- read_csv("~/Desktop/Political_speaches.csv")
## view first 6 rows
head(d)
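Before building the corpus, it is worth confirming what read_csv actually loaded. A minimal optional check, assuming only the data frame d created above (the name of the column holding the speech text will depend on the file):
## quick structural checks on the imported data frame
dim(d)    # number of speeches (rows) and variables (columns)
names(d)  # column names, to locate the one holding the speech text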
2.
## Load Libraries
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
## set the working directory to the folder containing the file, not the file itself
setwd("~/Desktop")
## read the raw text lines
text <- readLines("~/Desktop/Political_speaches.csv")
## Load the data as a corpus
docs <- Corpus(VectorSource(text))
## Define a transformer that replaces a pattern with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
## Convert all text to lower case
docs <- tm_map(docs, content_transformer(tolower))
## Remove punctuation
docs <- tm_map(docs, removePunctuation)
## Remove numbers
docs <- tm_map(docs, removeNumbers)
## Remove English stopwords (the built-in list can be viewed with stopwords("en"))
docs <- tm_map(docs, removeWords, stopwords("english"))
## Remove additional custom words
docs <- tm_map(docs, removeWords, c("big", "small"))
## Reduce words to their stems
docs <- tm_map(docs, stemDocument)
## Replace the "|" separator with a space
docs <- tm_map(docs, toSpace, "\\|")
## Collapse repeated whitespace
docs <- tm_map(docs, stripWhitespace)
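At this point it is easy to sanity-check the cleaning pipeline before moving on. A small optional check using tm's own inspect():
## peek at the first two cleaned documents
inspect(docs[1:2])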
## Build a term-document matrix and rank terms by overall frequency
doc_mat <- TermDocumentMatrix(docs)
m <- as.matrix(doc_mat)
v <- sort(rowSums(m), decreasing = TRUE)
d_Rcran <- data.frame(word = names(v), freq = v)
## show the five most frequent terms
head(d_Rcran, 5)
          word  freq
will      will 11137
state    state  9202
govern  govern  8515
year      year  7240
nation  nation  6660
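The word cloud in step 3 is not the only way to present these frequencies. As an optional companion to the table above, the same d_Rcran data frame can feed a simple bar chart using base R graphics (a sketch; no extra packages required):
## bar chart of the ten most frequent stemmed terms
barplot(d_Rcran$freq[1:10],
        names.arg = d_Rcran$word[1:10],
        las = 2,            # rotate the term labels
        col = "steelblue",
        main = "Top 10 terms in the political speeches",
        ylab = "Frequency")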
3.
## Word distribution
wordcloud(words = d_Rcran$word, freq = d_Rcran$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.0,
          colors = brewer.pal(4, "Set1"))
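The quanteda.textplots package installed in step 1 offers an alternative route to a similar plot. A minimal sketch, assuming the raw text vector from step 2 is reused; the tokenisation options shown are illustrative, not part of the original workflow:
library(quanteda)
library(quanteda.textplots)
## build a quanteda corpus from the raw lines and tokenise it
corp <- corpus(text)
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, pattern = stopwords("en"))
## document-feature matrix, then the word cloud
dfmat <- dfm(toks)
textplot_wordcloud(dfmat, max_words = 100)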