Humans On The Loop

Design for Provably Safe AI with Evan Miyazono




This week’s guest is my friend Evan Miyazono, CEO and Director of Atlas Computing — a tech non-profit committed not to the false god of perfect alignment but to the plausible strategy of provable safety. Focusing on community building, cybersecurity, and biosecurity, Evan and his colleagues are working to advance a new AI architecture that constrains and formally specifies AI outputs, with reviewable intermediary results. They collaborate across sectors to promote this radically different, more empirical approach to applied machine intelligence.

After completing his PhD in Applied Physics at Caltech, Evan led research at Protocol Labs, where he created their research grants program and led the special projects team behind Hypercerts, Funding the Commons, gov4git, and key parts of Discourse Graphs and the initial Open Agency Architecture proposal.

In our conversation we talk about a wide swath of topics, including regulatory scaling problems, specifying formal organizational charters, the spectre of opacity, and the quantification of trust — all, in some sense, interdisciplinary matters of “game design” in our entanglement with magical technologies and fundamental uncertainty.

If you enjoy this conversation, join the Wisdom x Technology Discord server and consider becoming a member for access to our study groups, community calls, and complete archives.

Founding members also get access to the entire twenty hours of lecture and discussion from my recent course, How to Live in the Future.

Links

• Hire me for speaking or consulting
• Explore the Humans On The Loop archives
• Dig into nine years of mind-expanding podcasts
• Browse the books we discuss on the show at Bookshop.org
• Explore the interactive knowledge garden grown from over 250 episodes

Discussed

• Atlas Computing Summary Slides
• Atlas Computing Institute Talks (YouTube Playlist)
• A Toolchain for AI-Assisted Code Specification, Synthesis and Verification
• Also, a relevant paper from Max Tegmark: Provably safe systems: the only path to controllable AGI

Mentioned

• Gregory Bateson
• David Dalrymple
• K. Allado-McDowell
• Terence McKenna
• Yuval Noah Harari
• Cosma Shalizi
• Henry Farrell
• Hakim Bey
• Natalie Depraz
• Francisco Varela
• Pierre Vermersch
• Plurality Institute
• Puja Ohlhaver
• Sean Esbjörn-Hargens
• Alfred North Whitehead
• De Kai

Primer Riff

Are we doing AI alignment wrong? Game designers Forrest Imel and Gavin Valentine define games as having meaningful decisions, uncertain outcomes, and measurable feedback. If any one of these breaks, the game breaks. And we can think about tech ethics through this lens as well. Much of tech discourse is about how one or more of these dimensions has broken the “game” of life on Earth — the removal of meaningful decisions, the mathematical guarantee of self-termination through unsustainable practices, and/or the decoupling of feedback loops.

AI alignment approaches tend to converge on restoring meaningful decisions by getting rid of uncertainty, but it’s a lost cause. It’s futile to encode our values into systems we can’t understand. To the extent that machines think, they think very differently than we do, and characteristically “interpret” our requests in ways that reveal the assumptions we are used to making based on shared context and understanding with other people.

We may not know how a black box AI model arrives at its outputs, but we can evaluate those outputs…and we can segment processes like this so that there are more points at which to review them. One of this show’s major premises is that the design and use of AI systems is something like spellcraft — a domain where precision matters because the smallest deviation from a precise encoding of intent can backfire.

Magic isn’t science, inasmuch as, for spellcraft, mechanistic understanding is frankly beside the point. Whatever you may think of it, spellcraft evolved as a practical approach to operating in a mysterious cosmos. Westernized Modernity dismisses magic because Enlightenment-era thinking is predicated on the knowability of nature and the conceit that everything can and will eventually bend to principled, rigorous investigation. But this confused accounting just reshuffled its ineradicable remainder of fundamental uncertainty back into a stubbornly persistent Real that continues to exist in excess of language, mathematics, and mechanistic frameworks. Economies, AI, and living systems guarantee uncertain outcomes — and in accepting this, we have to re-engage with magic in the form of our machines. The more alike they become, the more mystery and open-ended co-improvisation loom over any goal of final knowledge and control.

In a 2016 essay, Danny Hillis called this The Age of Entanglement. It is a time that calls for an evolutionary approach to technology. Tinkering and re-evaluating, we find ourselves one turn up the helix, where quantitative precision helps us reckon with the new built wilderness of technology. When we cannot fully explain the inner workings of large language models, we have to step back and ask:

What are our values, and how do we translate them into measurable outputs?

How can we break down the wicked problem of AI controllability into chunks on which it’s possible to operate?

How can adaptive oversight and steering fit with existing governance processes?

In other words, how can we properly task the humanities with helping us identify “meaningful decisions” and the sciences with providing “measurable feedback”? Giving science the job of solving uncertainty or defining our values ensures we’ll get as close as we can to certitude about outcomes we definitely don’t want. But if we think like game designers, then interdisciplinary collaboration can help us safely handle the immense power we’ve created and keep the game going.



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit michaelgarfield.substack.com/subscribe