Practicum: Tokenization, lemmatization, frequency and correlation analysis
In-Class Presentations: Student 5 and Student 6
Description: This session focuses on the basic text-processing techniques that underlie nearly all modes of textual analysis. Topics include stemming, lemmatization, semantic reduction, naïve Bayesian classification, and word frequency analysis. We will discuss the hardware- and software-based biases that computers bring to these tasks and how those biases affect, direct, expand, and/or limit the ways human scholars engage with text as data.
- Tasman, P. “Literary Data Processing.” IBM Journal of Research and Development 1.3 (1957): 249-56. [UCD electronic access: http://harvest.lib.ucdavis.edu/F/?func=direct&doc_number=003134410&local_base=UCD01PUB].
- Tanya Clement, “Text Analysis, Data Mining, and Visualizations in Literary Scholarship,” MLA Commons [https://dlsanthology.commons.mla.org/text-analysis-data-mining-and-visualizations-in-literary-scholarship/].
- Stanford NLP Group, “Stemming and Lemmatization” [http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html].
- Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. An Introduction to Information Retrieval. Cambridge: Cambridge UP, 2009. Web. [http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf].
- DeltaDNA, “Text Mining in R Tutorial: Term Frequency & Word Clouds” [https://deltadna.com/blog/text-mining-in-r-for-term-frequency/].
- Jockers, Matthew Lee. Text Analysis with R for Students of Literature. Cham: Springer-Verlag, 2014. Chapters 2, 3, and 4. [UC Davis Only: http://harvest.lib.ucdavis.edu/F/?func=direct&doc_number=003745646&local_base=UCD01PUB].
- Graham Williams, “Hands-On Data Science with R: Text Mining,” November 5, 2014 [http://handsondatascience.com/TextMiningO.pdf].
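As a preview of the practicum, the following is a minimal sketch of tokenization, stemming, and word frequency counting. It is written in Python as an assumption for illustration (several of the readings above work in R instead), and the function names and sample sentence are hypothetical. The stemmer is a deliberately crude suffix stripper, not the Porter algorithm described in the Stanford reading; it is included only to show how a reduction rule changes what gets counted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def crude_stem(token):
    """Toy suffix stripper for illustration only; not the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Strip the first matching suffix, but keep at least three letters.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(text, stem=False):
    """Count how often each token (or each stem, if stem=True) occurs."""
    tokens = tokenize(text)
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)


# Hypothetical sample sentence, not drawn from any course text.
sample = "The whale surfaced. The whales were hunted, and the hunting continued."

raw = term_frequencies(sample)
stemmed = term_frequencies(sample, stem=True)

print(raw.most_common(3))
print(stemmed.most_common(3))
```

Without stemming, "whale" and "whales" are counted as separate terms; with it they merge, as do "hunted" and "hunting". The same rule also turns "surfaced" into "surfac", which is not a word. Small design decisions like this are one concrete form of the software-based bias the session discusses.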