January 30, 2026

Your First AI at Home

41 minutes

Domesticating AI — S01E01: Your First AI at Home

Hosts: Miriah Peterson, Matt Sharp, Chris Brousseau

This episode is your practical on-ramp to running AI at home: why inference engines matter, what to install first, and how to make “local AI” feel stable instead of fragile. The hosts start with a hardware + market reality check (tinygrad’s tinybox-style “AI server appliance” idea and the ongoing memory/RAM crunch), then break down what an inference engine actually does, how popular runtimes compare (llama.cpp, vLLM, Ollama, TGI), and a sane starter workflow for getting from “downloaded a model” to “usable local AI.”

Inference engines are the “runtime”: model loading, tokenization, KV cache/context handling, and the serving layer.
Pick your engine based on your goal: tinkering (llama.cpp) vs serving throughput (vLLM/TGI) vs it-just-works packaging (Ollama).
You don’t need a brand-new rig to start, but RAM/VRAM constraints will shape everything.
Use leaderboards as a hint, then validate with your own small eval prompts that match your workload.
If you’re exposing anything beyond your LAN: reverse proxy + TLS + don’t casually open ports.
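
To make the RAM/VRAM point concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the size of the model weights alone from parameter count and bits per weight. The function names and the 20% headroom for KV cache and runtime overhead are assumptions made for illustration, not figures from the episode, and real usage depends on the engine, the context length, and the quantization format.

```python
def estimate_weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


def fits_in_memory(
    params_billions: float,
    bits_per_weight: float,
    available_gib: float,
    headroom_fraction: float = 0.2,  # assumed allowance for KV cache and overhead
) -> bool:
    """Return True if the weights plus the assumed headroom fit in available memory."""
    needed = estimate_weight_memory_gib(params_billions, bits_per_weight)
    return needed * (1 + headroom_fraction) <= available_gib


if __name__ == "__main__":
    # Compare a hypothetical 7B-parameter model at 16-bit and 4-bit precision
    # against a machine with 8 GiB of usable RAM or VRAM.
    for bits in (16, 4):
        size = estimate_weight_memory_gib(7, bits)
        verdict = "fits" if fits_in_memory(7, bits, available_gib=8) else "does not fit"
        print(f"7B @ {bits}-bit: ~{size:.1f} GiB of weights, {verdict} in 8 GiB")
```

Treat the result as a first filter only: a model that passes this check can still run out of memory at long context lengths, so confirm with the engine you actually plan to use.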

0:00 Intro + host chaos + what the show is

1:08 News: tinygrad / “AI server appliance” thinking (tinybox vibes)

2:44 News: RAM prices + the memory crunch for builders

8:26 Main: building your first AI at home (why now)

8:49 What is an inference engine?

12:30 Engines compared: llama.cpp vs vLLM vs Ollama vs TGI

15:42 Do you need to buy a new computer? (CPU vs GPU realities)

25:32 Models for home: fit-to-hardware, quantization, context

34:37 Leaderboards vs evals: picking models you can trust

44:00 Community + meetups + where to follow

45:22 Outro — “Keep your AI on a leash”

News / context

Tom’s Hardware: TinyBox production + multi-GPU appliance concept (Tom's Hardware)
Reuters: AI-driven memory shortage / supply-chain crunch (Reuters)
IDC: 2026 device impacts from the memory shortage (IDC)

Inference engines

llama.cpp (GGML org) (GitHub)
vLLM OpenAI-compatible server (docs.vllm.ai)
Ollama docs (quickstart) (Ollama Documentation)
Hugging Face Text Generation Inference (TGI) (GitHub)

Hosts

Miriah Peterson: Software engineer, Go educator, and community builder focused on production-first AI. Runs SoyPete Tech (streams + writing + open-source).
Matt Sharp: AI Engineer/Strategist, co-author of LLMs in Production, MLOps practitioner. Writes The Data Pioneer. (thedatapioneer.substack.com)
Chris Brousseau: NLP practitioner, co-author of LLMs in Production, VP of AI at VEOX. You can find him as IMJONEZZ. (veox.ai)

Links

SoyPete Tech (YouTube): (youtube.com)
SoyPete Tech (Substack): (soypetetech.substack.com)
Matt’s Substack (The Data Pioneer): (thedatapioneer.substack.com)
Chris on YouTube (IMJONEZZ): (youtube.com)
LLMs in Production (book): (Manning Publications)


By SoyPete Tech
