Data Thinning to Avoid Double Dipping by Anna Neufeld, Williams College

Wed, September 25th, 2024
1:00 pm - 1:50 pm

Data Thinning to Avoid Double Dipping by Anna Neufeld, Williams College, 1:00 – 1:50pm, Wednesday September 25, North Science Building 015, Wachenheim, Statistics Colloquium

We refer to the practice of using the same data to fit and evaluate a model as double dipping. Problems arise when standard statistical procedures are applied in settings that involve double dipping. To circumvent the challenges associated with double dipping, one approach is to fit a model on one dataset, and then validate the model on another independent dataset. When we only have access to one dataset, we typically accomplish this via sample splitting. Unfortunately, in many unsupervised problems, sample splitting does not allow us to avoid double dipping. In this talk, we propose data thinning: a very general alternative to sample splitting that can be used to avoid double dipping in both supervised and unsupervised settings.
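As a rough illustration of the idea, here is a minimal sketch of one well-known special case, assuming the data are Poisson counts. This is an assumption made for the example: the abstract does not say which distributions or constructions the talk covers. If X ~ Poisson(λ) and X_train | X ~ Binomial(X, ε), then X_train and X_test = X - X_train are independent Poisson variables with means ελ and (1 - ε)λ. The function name `thin_poisson` and the parameter `eps` are hypothetical, chosen only for this sketch.

```python
import numpy as np


def thin_poisson(x, eps=0.5, rng=None):
    """Split Poisson counts into two parts that sum to the original counts.

    Hypothetical helper for illustration only. The two parts are independent
    only under the assumption that each entry of x is Poisson distributed.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    # Each observed count is divided by a binomial draw with probability eps.
    x_train = rng.binomial(x, eps)
    x_test = x - x_train
    return x_train, x_test


# Simulated counts stand in for a real dataset (assumption: Poisson with mean 5).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=100_000)
x_train, x_test = thin_poisson(counts, eps=0.5, rng=rng)
```

Unlike sample splitting, every observation appears in both parts, so a model such as a clustering could be fit on `x_train` and then evaluated on `x_test` for the same observations.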