“Building AIs that do human-like philosophy” by Joe Carlsmith


Audio version (read by the author) here, or search for "Joe Carlsmith Audio" in your podcast app.

This is the ninth essay in a series I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, plus a bit more about the series as a whole.

1. Introduction

At this point in the series, I've outlined most of my current picture of what it would look like to build a mature science of AI alignment. But I left off one particular topic that I think is worth discussing on its own: namely, the importance of building AIs that do what I'll call "human-like philosophy."

I want to discuss this topic on its own because I think that the discourse about AI alignment is often haunted by some sense that AI alignment is not, merely, a "scientific" problem. Rather: that it's also, in part, a philosophical (and perhaps especially, an ethical) problem; that it's hard, at least in part, because philosophy is hard; and that solving it is likely to require some very sophisticated [...]

---

Outline:

(00:33) 1. Introduction

(04:21) 2. Philosophy as a tool for out-of-distribution generalization

(10:55) 3. Some limits to the importance of philosophy to AI alignment

(17:55) 4. When is philosophy existential?

(22:18) 5. The challenge of human-like philosophy

(22:29) 5.1. The relationship between human-like philosophy and human-like motivations

(27:27) 5.2. How hard is human-like philosophy itself?

(28:08) 5.2.1. Capability

(29:35) 5.2.2. Disposition

(33:41) 6. What does working on this look like?

The original text contained 9 footnotes, which were omitted from this narration.

---

First published: January 29th, 2026

Source: https://www.lesswrong.com/posts/zFZHHnLez6k8ykxpu/building-ais-that-do-human-like-philosophy

---

Narrated by TYPE III AUDIO.
