


In this episode, James and Frank dive into running AI coding models locally versus in the cloud: BYOK and OpenRouter, VS Code's chat/agent harness, model runners (Ollama, vLLM), and the practicality of running 27B models on an RTX 3090 with 4‑bit quantization. They share hands-on takeaways: how recent engineering (MT/MTPLX) boosts inference to usable token rates, when auto model selection makes sense, the cost and hardware trade‑offs, and why local models can liberate your workflow while still needing smarter, unified tooling.
⭐⭐ Review Us ⭐⭐
Machine transcription available on http://mergeconflict.fm
Support Merge Conflict
By soundbite.fm