Practicum: Data Scraping and Cleaning
In-Class Presentations: Student 3 and Student 4
Description: This unit focuses on corpus creation through data scraping and cleaning. We will learn how to scrape content from the web and prepare it for analysis. Discussion will center on the difference between information and data, and on the process of transforming one into the other.
- Christof Schöch, “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities 2.3, 2013 [http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/].
- Matt Kirschenbaum, “The Remaking of Reading: Data Mining and the Digital Humanities” [http://www.csee.umbc.edu/~hillol/NGDM07/abstracts/talks/MKirschenbaum.pdf].
- Deborah Nolan and Duncan Temple Lang, XML and Web Technologies for Data Sciences with R. New York: Springer, 2014. Chapters 1 and 4 [UCD electronic access: http://harvest.lib.ucdavis.edu/F/?func=direct&doc_number=003745611&local_base=UCD01PUB].
- Deborah Nolan and Duncan Temple Lang, web companion to XML and Web Technologies for Data Sciences with R [http://rxmlwebtech.org/].
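As a rough illustration of the scrape-then-clean workflow named in the description, the sketch below turns a small piece of HTML into a list of word tokens. It is only one possible approach, not the method used in the readings (which work in R). It uses Python's standard library, and the sample page, the TextExtractor class, and the html_to_text and tokenize helpers are all hypothetical names made up for this example. No network request is made; the "scraped" page is an inline string.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


# Hypothetical stand-in for a page fetched from the web.
SAMPLE_PAGE = (
    "<html><head><style>p {color: red;}</style></head>\n"
    "<body><h1>Moby-Dick</h1><p>Call me   Ishmael.</p>"
    "<script>var x = 1;</script></body></html>"
)

text = html_to_text(SAMPLE_PAGE)
tokens = tokenize(text)
print(text)
print(tokens)
```

Even this small example involves interpretive choices: the tokenizer splits "Moby-Dick" into two tokens and discards punctuation and case. Decisions of this kind are part of what turns information into data.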