November 19, 2025

“Victor Taelin’s notes on Gemini 3” by Gunnar_Zarncke

6 minutes

Victor Taelin of Higher Order Company keeps a set of very hard computer science problems that LLMs have most likely never seen before, and he evaluated Gemini 3 on them. Here is his tweet, reproduced almost in full.

Short Version

First of all: you've all seen the benchmarks, so I don't think you need me to judge this one. Still, based on my tests, this is as real as it gets, and I want to talk about it. This model outperforms GPT-5 Pro, Gemini 2.5 Deep Think, and everything else, on my hardest problems, by far.

It is the new SOTA at:
→ debugging complex compiler bugs
→ refactoring files without logical mistakes
→ solving difficult λ-calculus problems
→ ASCII art (it is almost decent now!)
→ Competitive Gen 3 OU (won't elaborate 😭)

It is still an LLM, though. It has similar failure modes, and is worse than Sonnet / GPT-5 in some scenarios.

It seems very bad at:
→ inferring intent
→ not going overboard
→ one-shot vibe coding
→ creative writing
→ health questions

Also, I suspect this checkpoint isn't the best Google has.

Now, on to a complete, manually typed Gemini 3 overview.

Long Version

1. Vibe [...]

---

Outline:

(00:25) Short Version

(01:53) Long Version

---

First published:

November 18th, 2025

Source:

https://www.lesswrong.com/posts/N7oRkcz3PrNQSNyw9/victor-taelin-s-notes-on-gemini-3

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)

By LessWrong
