Domesticating AI

Your First AI at Home


Listen Later

Domesticating AI — S01E01: Your First AI at Home

Hosts: Miriah Peterson, Matt Sharp, Chris Brousseau

This episode is your practical on-ramp to running AI at home: why inference engines matter, what to install first, and how to make “local AI” feel stable instead of fragile. The hosts start with a hardware + market reality check (tinygrad’s tinybox-style “AI server appliance” idea and the ongoing memory/RAM crunch), then break down what an inference engine actually does, how popular runtimes compare (llama.cpp, vLLM, Ollama, TGI), and a sane starter workflow for getting from “downloaded a model” to “usable local AI.”

  • Inference engines are the “runtime”: model loading, tokenization, KV cache/context handling, and the serving layer.
  • ​Pick your engine based on your goal: tinkering (llama.cpp) vs serving throughput (vLLM/TGI) vs it-just-works packaging (Ollama).
  • ​You don’t need a brand-new rig to start, but RAM/VRAM constraints will shape everything.
  • ​Use leaderboards as a hint, then validate with your own small eval prompts that match your workload.
  • ​If you’re exposing anything beyond your LAN: reverse proxy + TLS + don’t casually open ports.

0:00 Intro + host chaos + what the show is

1:08 News: tinygrad / “AI server appliance” thinking (tinybox vibes)

2:44 News: RAM prices + the memory crunch for builders

8:26 Main: building your first AI at home (why now)

8:49 What is an inference engine?

12:30 Engines compared: llama.cpp vs vLLM vs Ollama vs TGI

15:42 Do you need to buy a new computer? (CPU vs GPU realities)

25:32 Models for home: fit-to-hardware, quantization, context

34:37 Leaderboards vs evals: picking models you can trust

44:00 Community + meetups + where to follow

45:22 Outro — “Keep your AI on a leash”

News / context

  • ​Tom’s Hardware: TinyBox production + multi-GPU appliance concept (Tom's Hardware)
  • ​Reuters: AI-driven memory shortage / supply-chain crunch (Reuters)
  • ​IDC: 2026 device impacts from the memory shortage (IDC)

Inference engines

  • ​llama.cpp (GGML org) (GitHub)
  • ​vLLM OpenAI-compatible server (docs.vllm.ai)
  • ​Ollama docs (quickstart) (Ollama Documentation)
  • ​Hugging Face Text Generation Inference (TGI) (GitHub)
  • Miriah Peterson: Software engineer, Go educator, and community builder focused on production-first AI. Runs SoyPete Tech (streams + writing + open-source).
  • Matt Sharp: AI Engineer/Strategist, co-author of LLMs in Production, MLOps practitioner. Writes The Data Pioneer. (thedatapioneer.substack.com)
  • Chris Brousseau: NLP practitioner, co-author of LLMs in Production, VP of AI at VEOX. You can find him as IMJONEZZ. (veox.ai)
  • SoyPete Tech (YouTube): (youtube.com)
  • SoyPete Tech (Substack): (soypetetech.substack.com)
  • Matt’s Substack (The Data Pioneer): (thedatapioneer.substack.com)
  • Chris on YouTube (IMJONEZZ): (youtube.com)
  • LLMs in Production (book): (Manning Publications)
...more
View all episodesView all episodes
Download on the App Store

Domesticating AIBy SoyPete Tech