


Modern AI theory often struggles to explain why certain datasets enable better out-of-distribution generalization than others, as classical information theory fails to account for computational constraints. This research introduces epiplexity, a new metric that quantifies the structural, learnable information an observer can extract within a limited time budget. Unlike standard entropy, which treats random noise and complex patterns similarly, epiplexity distinguishes reusable structure from inherent randomness. By applying this lens, the authors resolve three paradoxes regarding data ordering, deterministic transformations, and the limits of distribution matching. Their findings demonstrate that language data possesses significantly higher epiplexity than image data, explaining its superior transferability. Ultimately, the paper suggests that data selection should prioritize high-epiplexity sources to maximize the acquisition of functional, intelligent behaviors.
By Enoch H. Kang