Topic Modeling Activity
Topic modeling in-browser with jsLDA
Time to experiment! Download these texts and upload them to jsLDA:
- Frankenstein; Or, The Modern Prometheus by Mary Wollstonecraft Shelley (Retrieved from Project Gutenberg and formatted for jsLDA)
- U.S. Presidents’ Inaugural Speeches (Retrieved from DH Toychest and formatted for jsLDA: all 57 inaugural speeches, from Washington through Obama, collected from the American Presidency Project with the assistance of project co-director John T. Woolley and assembled as individual plain-text files by Alan Liu)
- Lyrical Ballads, 1798 by William Wordsworth with Samuel Taylor Coleridge (Retrieved from DH Toychest and formatted for jsLDA; assembled by Alan Liu from Project Gutenberg)
Add a stopword list: NLTK’s list of English stopwords (the same list is also available to download as a plain-text file: stopwords)
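jsLDA applies the stopword list for you, but it can help to see what that step does: every word on the list is simply dropped before topics are estimated. The sketch below is only an illustration, not jsLDA's own code. It assumes a stopword file with one word per line and uses a deliberately simple tokenizer (runs of lowercase letters and apostrophes), which is not necessarily how jsLDA splits words. `SAMPLE_STOPWORDS` is a tiny hypothetical sample, not the NLTK list.

```python
import re


def load_stoplist(path: str) -> set[str]:
    """Read a stopword file with one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop any stopwords."""
    # Simplified tokenizer (an assumption): runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stopwords]


# A tiny hypothetical sample; NLTK's English list is far longer.
SAMPLE_STOPWORDS = {"i", "the", "of", "and", "a", "to", "my", "was", "in"}
```

For example, `remove_stopwords("I beheld the wretch, the miserable monster whom I had created.", SAMPLE_STOPWORDS)` returns `['beheld', 'wretch', 'miserable', 'monster', 'whom', 'had', 'created']`. With the full NLTK list, "whom" and "had" would be removed too, which is a reminder that the choice of list changes what the model gets to see.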
You can also try it with your own corpus. Just make sure your text file is formatted correctly:
- One document per line, with each document consisting of
[doc ID] [tab] [label] [tab] [text...]
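If your own corpus is a folder of separate plain-text files, a short script can combine them into the one-document-per-line layout above. This is a minimal sketch under stated assumptions: documents are numbered 1, 2, 3, ... in file-name order, and each file's name (without the `.txt` extension) is used as its label. Both choices are hypothetical, so you might prefer a year or an author as the label. Tabs and line breaks inside a document are collapsed to single spaces, because they would otherwise break the tab-separated, one-line-per-document format.

```python
from pathlib import Path


def to_jslda_line(doc_id: str, label: str, text: str) -> str:
    """Return one jsLDA input line: [doc ID] [tab] [label] [tab] [text...]."""
    # A tab or newline inside any field would break the one-document-per-line,
    # tab-separated layout, so collapse every whitespace run to a single space.
    fields = [" ".join(str(field).split()) for field in (doc_id, label, text)]
    return "\t".join(fields)


def folder_to_jslda(folder: str, out_path: str) -> int:
    """Write each .txt file in `folder` as one line of a jsLDA input file.

    Assumption: documents are numbered in file-name order and the file name
    (without extension) serves as the label. Returns the number of documents.
    """
    files = sorted(Path(folder).glob("*.txt"))
    with open(out_path, "w", encoding="utf-8") as out:
        for number, path in enumerate(files, start=1):
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(to_jslda_line(str(number), path.stem, text) + "\n")
    return len(files)
```

For example, `folder_to_jslda("my_corpus", "my_corpus_jslda.txt")` (both paths hypothetical) writes one line per `.txt` file in `my_corpus`. Save the output file outside that folder so a later run does not pick it up as one of the documents.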
Don’t be afraid to fail or get bad results. Topic modeling is exploratory, and sometimes you have to play around with it before you know what settings work best for your project.
Reflection Time
- What insights, if any, does topic modeling provide?
- What are the values/limitations of ‘not reading’ this way?
- How might the judgment calls you have to make (choosing stopwords, number of topics produced, scope of collection) affect your use of your results as evidence?