Tools

Topic modeling can be done with a variety of tools, including libraries for Python and R. Here, we’ll discuss a program called MALLET and the statistical model it uses.


MALLET

Under the Hood: Latent Dirichlet Allocation

MALLET uses an algorithm called Latent Dirichlet Allocation (LDA) that works in this way:

List of Topics

…and a data file containing the percentage of each topic’s presence in each of your documents:

Percentage of Each Topic
David M. Blei, Probabilistic Topic Models
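To make the two outputs described above more concrete (a list of topics, and each topic’s share of each document), here is a minimal sketch in Python. It is not MALLET’s code: it is a toy collapsed Gibbs sampler, one common way of fitting LDA, written only for illustration. The function name fit_lda, the tiny example corpus, and the values chosen for num_topics, alpha, beta, and iterations are all assumptions made for this example.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only, not MALLET's code).

    docs is a list of documents, each a list of word tokens.
    Returns (topics, shares): the top words per topic, and each
    document's estimated proportion of every topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables the sampler keeps up to date.
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Weight = (how much doc d likes topic t) * (how much topic t likes word w).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Output 1: the most frequent words assigned to each topic.
    topics = [
        [w for w, count in topic_word[k].most_common() if count > 0][:top_n]
        for k in range(num_topics)
    ]
    # Output 2: each document's (smoothed) proportion of every topic.
    shares = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topics, shares


# A made-up four-document corpus, already split into words.
docs = [
    "whale ship sea captain whale sea".split(),
    "ship sea voyage captain harpoon".split(),
    "garden flower rose spring garden".split(),
    "flower rose bloom spring garden".split(),
]

topics, shares = fit_lda(docs, num_topics=2)

for k, words in enumerate(topics):
    print(f"Topic {k}: {' '.join(words)}")
for d, row in enumerate(shares):
    print(f"Document {d}: " + ", ".join(f"{p:.0%}" for p in row))
```

Each row of shares adds up to 100%, which is why the second output can be read as the percentage of each topic in each document. Real tools such as MALLET work with far larger corpora, remove stopwords, and use much more efficient implementations, so treat this only as a way to see what kind of information the output files contain.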

Still Confused? Want to Know Specifics?

  • Matt Jockers’s Topic Modeling “Fable” (LDA Buffet) is a great non-technical introduction to how LDA works.

  • David Blei developed LDA together with Andrew Ng and Michael Jordan. Check out his article Probabilistic Topic Models in Communications of the ACM for further information about the model and the algorithms used to fit it.


Modifying Your Output


Other Tools