Monday's CONSTRUCT follows a practical tension: model capability is moving, but the systems around the model now decide whether that capability becomes usable work.
- Google DeepMind and Kaggle's agentic evaluation talk anchors the episode's argument that benchmark creation has to move from a small research circle into ordinary developer practice.
- Tren Griffin's Microsoft and GitHub Copilot post gives the enterprise version of the same issue: companies don't just buy a model, they buy the harness where feedback and spending show up.
- Two Minute Papers' Demis Hassabis interview summary supplies the science platform frame, where many specialized models become a drug-discovery system rather than one magic model.
- The llama.cpp CUDA Walsh-Hadamard pull request shows the other end of progress: a small kernel-level gain can change local inference economics when it lands in common tooling.
- Ivan Fioravanti's MLX DeepSeek V4 Flash post points at the pressure to make large models fit on consumer Apple hardware with custom quantization.
- Viv's note on the Hugging Face agent vocabulary write-up closes the loop: people can't operate shared systems if they don't agree on what an agent, harness, environment, and evaluation mean.