February 27, 2026

Hardware-First Home AI: Chips, Memory, Backends, and What to Buy

33 minutes

Episode 3 is a hardware-first guide to running AI at home. We break down what CPUs, GPUs, NPUs, and TPUs each do in the inference pipeline, why memory capacity isn’t the same as performance (model loading, KV cache, and MoE), why backends and runtimes are real constraints (CUDA vs ROCm vs Metal/MLX vs CPU), and how to scale from one box to multi-GPU and multi-machine setups.

Keep your AI on a leash.

Links mentioned:

- GPU Glossary (Modal): https://modal.com/gpu-glossary

- CUDA → ROCm headline: https://wccftech.com/the-claude-code-has-managed-to-port-nvidia-cuda-backend-to-rocm-in-just-30-minutes/

- Unsloth PR: https://github.com/unslothai/unsloth/pull/3856
