Today in arXiv AI

Cognition, Contracts, and Compression


Generated with Google NotebookLM.

Episode Description:
In this episode, we explore 10 new papers advancing our understanding of how LLMs think, how agents can be trusted, and how systems can scale more efficiently:

  • What LLMs really "know" – UCCT proposes a formal theory of cognition in LLMs, arguing intelligence is emergent and context-triggered, not intrinsic.

  • Rethinking RAG – CoCoA and CoCoA-zero show how multi-agent collaboration improves the synergy between a model's parametric knowledge and retrieved context.

  • Efficiency, by design – Efficient Agents sheds light on cost/performance trade-offs in agent systems, while Blueprint First separates logic from generation to enable deterministic workflows.

  • Contrastive learning, upgraded – Context-Adaptive Multi-Prompt Embedding improves vision-language alignment with adaptive token prompts and diversity constraints.

  • Inference-time teaming – CTTS scales up LLM performance via collective test-time scaling, using reward model ensembles and agent collaboration.

  • At the edge – A new adaptive agent placement and migration framework uses LLMs and ant colony optimization to meet real-time edge constraints.

  • Smarter chains of thought – A step entropy metric lets LLMs prune redundant reasoning steps during inference, improving cost-efficiency without sacrificing accuracy (a minimal sketch follows this list).

  • Quantization, vision-style – VLMQ brings post-training quantization to Vision-Language Models, optimizing for both modality balance and efficiency.

  • Reliable by contract – A Design-by-Contract–inspired layer enables neurosymbolic agents to enforce input-output constraints, offering a formal basis for agent safety (a sketch follows the episode summary below).
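
As a taste of the step-entropy idea, here is a minimal Python sketch. It assumes step entropy is the mean token-level entropy within a reasoning step and that low-entropy steps (where the model is highly confident, so the step carries little new information) are the redundant ones to drop; the data format, function names, and threshold are illustrative, not the paper's actual interface.

```python
import math

def step_entropy(token_distributions):
    """Mean token-level Shannon entropy over one reasoning step.

    token_distributions: one dict per generated token in the step,
    mapping candidate tokens to log-probabilities (illustrative format).
    """
    entropies = []
    for dist in token_distributions:
        # H = -sum_x p(x) * log p(x), with p(x) = exp(log p(x))
        entropies.append(-sum(math.exp(lp) * lp for lp in dist.values()))
    return sum(entropies) / len(entropies)

def prune_redundant_steps(steps, threshold=0.5):
    """Drop reasoning steps whose entropy falls below the threshold.

    steps: list of (step_text, token_distributions) pairs. Steps the
    model is most certain about are treated as redundant and removed;
    the threshold here is a placeholder, not a tuned value.
    """
    return [text for text, dists in steps if step_entropy(dists) >= threshold]
```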

From the nature of LLM cognition to practical methods for verifiable, scalable deployment, this episode highlights where theory meets engineering, and where structure enhances trust.
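
To make the contract idea concrete, here is a minimal Python sketch of the Design-by-Contract pattern applied to an agent call. Everything in it (the decorator, the toy agent, the JSON postcondition) is a hypothetical illustration of the enforcement pattern, not the paper's actual layer.

```python
import json
from typing import Callable

class ContractViolation(Exception):
    """Raised when an agent call breaks a pre- or postcondition."""

def is_json(text: str) -> bool:
    """Postcondition helper: the agent's reply must parse as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def with_contract(pre: Callable[[str], bool],
                  post: Callable[[str, str], bool]):
    """Wrap an agent function with input/output contract checks."""
    def decorate(agent_fn: Callable[[str], str]) -> Callable[[str], str]:
        def wrapped(prompt: str) -> str:
            if not pre(prompt):
                raise ContractViolation("precondition failed on input")
            output = agent_fn(prompt)
            if not post(prompt, output):
                raise ContractViolation("postcondition failed on output")
            return output
        return wrapped
    return decorate

@with_contract(pre=lambda p: bool(p.strip()),
               post=lambda p, out: is_json(out))
def toy_agent(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns well-formed JSON.
    return '{"answer": "stub"}'
```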


Sources:

  • The Unified Cognitive Consciousness Theory for Language Models (UCCT)

  • CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy

  • Efficient Agents: Building Effective Agents While Reducing Cost

  • Blueprint First, Model Second: A Framework for Deterministic LLM Workflow

  • Context-Adaptive Multi-Prompt LLM Embedding for Vision-Language Alignment

  • CTTS: Collective Test-Time Scaling

  • Adaptive AI Agent Placement and Migration in Edge Intelligence Systems

  • Compressing Chain-of-Thought in LLMs via Step Entropy

  • VLMQ: Efficient Post-Training Quantization for Vision-Language Models

  • A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design


Today in arXiv AI, by Scot Bearss