
We discuss how augmenting structured data with features extracted from unstructured sources, such as text or images, affects statistical analysis. The central idea is that while machine-learning-based feature extraction can introduce bias, it also yields a substantial reduction in variance thanks to the richer information available. We explore whether this variance reduction is large enough to offset the introduced bias, potentially improving control of the False Discovery Rate and increasing statistical power (i.e., reducing Type II errors). Several statistical frameworks, including Prediction-Powered Inference (PPI), Recalibrated PPI (RePPI), and MARS (Missing At Random Structured Data), are presented as methods that enable valid and efficient inference despite the complexities of using ML-derived features from unstructured data.
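To make the bias-correction idea concrete, here is a minimal sketch of the basic PPI mean estimator: predictions on a large unlabeled set supply the variance reduction, while a "rectifier" computed on a small labeled set removes the model's bias. The function name and the simulated data below are illustrative assumptions, not material from the episode.

```python
import numpy as np

def ppi_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Basic prediction-powered point estimate and standard error for E[Y]."""
    n, N = len(y_labeled), len(preds_unlabeled)
    rectifier = y_labeled - preds_labeled               # model error measured on labeled data
    theta = preds_unlabeled.mean() + rectifier.mean()   # predictions debiased by the rectifier
    se = np.sqrt(preds_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    return theta, se

# Illustrative simulation: a biased "feature extractor" plus a small labeled subsample.
rng = np.random.default_rng(0)
y_all = rng.normal(loc=2.0, scale=1.0, size=10_000)                # true outcomes, mean 2.0
preds_all = y_all + 0.3 + rng.normal(scale=0.5, size=y_all.size)   # biased ML predictions
labeled = rng.choice(y_all.size, size=200, replace=False)

theta, se = ppi_mean(y_all[labeled], preds_all[labeled], preds_all)
print(f"PPI estimate: {theta:.3f} +/- {1.96 * se:.3f}")            # close to 2.0, bias removed
```

The standard error combines the (small) variance of the predictions over the large unlabeled set with the variance of the rectifier over the small labeled set, which is the trade-off the episode discusses: the richer feature set shrinks the first term, and the labeled-data correction keeps the estimate unbiased.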