


This paper introduces a computational toolkit designed to correct errors in demand estimation when using unstructured data, such as images or text, to represent products. Because researchers often use machine learning embeddings as proxies for true product attributes, these approximations can introduce statistical bias that leads to inaccurate predictions of consumer behavior. The authors propose a bias-correction method and diagnostic tests to ensure that these proxies adequately capture the dimensions of differentiation driving market choices. Their approach is efficient and lightweight, integrating easily into standard workflows without requiring extensive re-computation or optimization. Simulations and empirical applications demonstrate that this method significantly improves the accuracy of counterfactual predictions, such as how consumers react when a product is removed from the market. Ultimately, the research provides a rigorous framework for economists to use high-dimensional, complex data while maintaining the integrity of their structural models.
By Enoch H. Kang