This text analysis uses NLP functions from the R tm package to explore and understand the corpus before implementing an n-gram prediction model.
The corpus is the “English-US” dataset obtained from HC Corpora. See their readme file for details on the corpora available.
See the report here.
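
As a rough illustration of the kind of exploration described above, the following minimal sketch cleans a tiny corpus with tm and counts bigram frequencies. The two sample sentences, the variable names, and the choice of cleaning steps are assumptions made for this example only; they are not taken from the report, and the real analysis would read the English-US files instead. The bigram tokenizer follows the pattern shown in the tm FAQ, using `ngrams()` and `words()` from the NLP package that tm attaches.

```r
library(tm)  # also attaches NLP, which provides ngrams() and words()

# Hypothetical sample text standing in for the English-US corpus files.
sample_lines <- c(
  "The quick brown fox jumps over the lazy dog.",
  "The quick brown fox sleeps."
)

# Build a volatile corpus with one document per line.
corpus <- VCorpus(VectorSource(sample_lines))

# Basic cleaning (an assumed pipeline): lower-case, then strip
# punctuation, digits, and repeated whitespace.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stripWhitespace)

# Tokenizer that returns every pair of adjacent words as one term.
bigram_tokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "),
         use.names = FALSE)
}

# Term-document matrix over bigrams, then total counts per bigram.
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = bigram_tokenizer))
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

print(head(freq))
```

Changing the `2` in `ngrams(words(x), 2)` to `1` or `3` would give unigram or trigram counts in the same way. On a corpus of realistic size, `as.matrix(tdm)` can be memory-hungry, so sampling the input first is a common workaround.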