Text Cleaning
- RegExr, an online tool to learn, build, and test regular expressions
- Regex lessons
- Regex cheat sheet
- Ted Underwood, DataMunging: example Python scripts for cleaning and normalizing OCR text
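
As a rough illustration of the kind of regex-based cleanup these resources cover, here is a minimal Python sketch. It is not taken from any of the tools or scripts listed above; the function name `clean_ocr` and the two rules it applies (rejoining words split across line breaks, collapsing whitespace) are assumptions chosen only for the example.

```python
import re

def clean_ocr(text):
    """Hypothetical helper: apply two simple regex cleanup rules to OCR text."""
    # Rejoin words hyphenated across a line break: "jum-\nped" becomes "jumped".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

sample = "The quick  brown fox jum-\nped over\nthe   lazy dog."
print(clean_ocr(sample))
```

Note that the first rule also merges genuine compounds that happen to break at a line end (for example "well-" followed by "known"), so real OCR cleanup usually checks the rejoined word against a dictionary before accepting it.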
Topic Modeling
- Megan R. Brett, Topic Modeling: A Basic Introduction
- Ted Underwood, Topic Modeling Made Just Simple Enough
- Scott Weingart, Topic Modeling for Humanists: A Guided Tour, Topic Modeling and Text Analysis
- MALLET (developed by Andrew McCallum, with topic modeling tools by David Mimno)
- David M. Blei et al. (2003), Latent Dirichlet Allocation