Latent Dirichlet Allocation for Topic Modeling by Lindsay Wang '24 & Jenny Tian '24

Wed, May 1st, 2024, 1:10 pm - 1:50 pm

Latent Dirichlet Allocation for Topic Modeling by Lindsay Wang ’24 & Jenny Tian ’24, Wednesday May 1, 1:10 – 1:50pm, North Science Building 015, Wachenheim, Statistics Colloquium

Abstract:

In our everyday lives, we encounter large corpora of text made up of hundreds or thousands of documents and millions of words. How can we determine what all those documents contain? Topic modeling is an unsupervised learning technique for extracting the “latent” (i.e., unobserved) topics from a collection of documents. One of the primary models used for topic modeling is Latent Dirichlet Allocation (LDA), a Bayesian generative probabilistic model that groups words and documents into a predefined number k of topics. After reviewing how LDA generates topic and word distributions and how collapsed Gibbs sampling performs LDA inference, we consider a case study: topic modeling a corpus of texts written by winners of the Nobel Prize in Literature.
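
For readers unfamiliar with collapsed Gibbs sampling, the following is a minimal sketch of how it can be applied to LDA. It is not the speakers' implementation. The function name `lda_collapsed_gibbs`, the symmetric hyperparameters `alpha` and `beta`, their default values, and the toy corpus of integer word ids are all assumptions made for illustration. Each token is repeatedly reassigned to a topic with probability proportional to (count of that topic in the document + alpha) × (count of that word in the topic + beta) / (total tokens in the topic + V × beta), where V is the vocabulary size.

```python
import random

def lda_collapsed_gibbs(docs, vocab_size, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    n_dk = [[0] * k for _ in docs]               # tokens in doc d assigned to each topic
    n_kw = [[0] * vocab_size for _ in range(k)]  # times each word is assigned to each topic
    n_k = [0] * k                                # total tokens assigned to each topic
    z = []                                       # current topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(k)
            z_d.append(t)
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                t = z[d][i]
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1

                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]

                # Record the newly sampled assignment.
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    return n_dk, n_kw

# Hypothetical toy corpus: four short documents over a six-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 2, 1, 0], [5, 3, 4, 4]]
doc_topics, topic_words = lda_collapsed_gibbs(docs, vocab_size=6, k=2)
```

The returned count tables can be smoothed with `alpha` and `beta` and normalized to estimate each document's topic proportions and each topic's word distribution. A real corpus would also need tokenization and a vocabulary mapping, which this sketch leaves out.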
