
Generated with Google NotebookLM.
In this episode, we dive into the cutting edge of Large Language Models (LLMs)—their promise, their pitfalls, and the novel techniques reshaping how they’re used in the wild.
We unpack a wide spectrum of advancements:
Privacy Risks in Fine-Tuning: How LoRA-adapted models are vulnerable to membership inference attacks, and what defenses like dropout and differential privacy can do about it.
Auto-Grading Physics Exams: Meet AlphaPhysics, a hybrid system that uses LLMs, computer algebra, and term rewriting to accurately grade even complex equations.
LLMs as Critics: Explore CLEAR, a tool that uses LLMs themselves to identify recurring reasoning failures in math and retrieval-augmented generation (RAG).
Reasoning in Arabic Tables: Enter AraTable, a benchmark for Arabic tabular understanding, with an Assisted Self-Deliberation (ASD) framework pushing multilingual evaluation forward.
Smarter Code Reviews: Discover how symbolic reasoning enhances LLMs' ability to flag subtle code defects beyond current semantic techniques.
Interpretability Research: Learn how Sparse Autoencoders and wMPPC are being used to analyze how language and vision models share core internal concepts.
AI Safety at Scale: Dive into SafeWork-R1, a safety-aligned model trained using the SafeLadder framework and multiple ethical verifiers to steer behavior.
Domain-Specific Data Synthesis: See how AQuilt generates instruction-tuning data for legal and medical domains through embedded logic and self-inspection—cutting cost while raising relevance.
If you're tracking where AI is going next, this episode is your briefing on the research shaping the next generation of intelligent systems.
Sources:
https://arxiv.org/pdf/2507.18584v1.pdf
https://arxiv.org/pdf/2507.18576v1.pdf
https://arxiv.org/pdf/2507.18512v1.pdf
https://arxiv.org/pdf/2507.18476v1.pdf
https://arxiv.org/pdf/2507.18442v1.pdf
https://arxiv.org/pdf/2507.18392v1.pdf
https://arxiv.org/pdf/2507.18391v1.pdf
https://arxiv.org/pdf/2507.18337v1.pdf
https://arxiv.org/pdf/2507.18302v1.pdf