
Sign up to save your podcasts
Or


In this episode I sit down with Amir to get tactical about running local AI models as part of a daily workflow. We center on GLM 5.2 from ZAI, how it stacks up against frontier models like Opus 4.8, and how a fusion approach lets you sequence a heavy thinking model with a lighter execution model for the best output at the lowest cost. Amir walks through setup in Cursor and Codex via OpenRouter, shares real token-cost math, and demos GLM 5.2 refining a live app. By the end you will know how to start today, where local models shine, and how model chaining keeps spend in check.
Timestamps
00:00 – Intro
02:09 – GLM 5.2 and Z AI
04:01 – Specs: 1M context and Terminal Bench 2.1
05:22 – Making sense of benchmark scores
06:42 – Setup in Cursor or Codex with OpenRouter
10:18 – Local model upside: buy a machine, run tasks
11:42 – Token cost: 44 cents versus $2.38
13:36 – Future-proofing with an upfront hardware bet & The Uber subsidy analogy
16:49 – Model chaining and the vision workaround
19:23 – Token maxing vs routing tasks to the right model
20:54 – Answering the "cost is irrelevant" crowd
21:59 – Closing thoughts
Key Points
The #1 tool to find startup ideas/trends - https://www.ideabrowser.com
LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/
FIND AMIR ON SOCIAL
Humblytics: https://humblytics.com/?via=community
X/Twitter: https://x.com/amirmxt
Youtube: https://www.youtube.com/@amirmxt
By Greg Isenberg4.7
204204 ratings
In this episode I sit down with Amir to get tactical about running local AI models as part of a daily workflow. We center on GLM 5.2 from ZAI, how it stacks up against frontier models like Opus 4.8, and how a fusion approach lets you sequence a heavy thinking model with a lighter execution model for the best output at the lowest cost. Amir walks through setup in Cursor and Codex via OpenRouter, shares real token-cost math, and demos GLM 5.2 refining a live app. By the end you will know how to start today, where local models shine, and how model chaining keeps spend in check.
Timestamps
00:00 – Intro
02:09 – GLM 5.2 and Z AI
04:01 – Specs: 1M context and Terminal Bench 2.1
05:22 – Making sense of benchmark scores
06:42 – Setup in Cursor or Codex with OpenRouter
10:18 – Local model upside: buy a machine, run tasks
11:42 – Token cost: 44 cents versus $2.38
13:36 – Future-proofing with an upfront hardware bet & The Uber subsidy analogy
16:49 – Model chaining and the vision workaround
19:23 – Token maxing vs routing tasks to the right model
20:54 – Answering the "cost is irrelevant" crowd
21:59 – Closing thoughts
Key Points
The #1 tool to find startup ideas/trends - https://www.ideabrowser.com
LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/
FIND AMIR ON SOCIAL
Humblytics: https://humblytics.com/?via=community
X/Twitter: https://x.com/amirmxt
Youtube: https://www.youtube.com/@amirmxt

3,450 Listeners

1,289 Listeners

541 Listeners

1,579 Listeners

1,095 Listeners

1,259 Listeners

2,172 Listeners

229 Listeners

4,468 Listeners

2,659 Listeners

361 Listeners

657 Listeners

268 Listeners

54 Listeners

32 Listeners