The Stateless Founder

Build an Offline AI Stack That Works When Your WiFi Doesn't


The Problem: Dead Hours on Travel Days

Every nomad founder knows the scenario: you're on a ferry from Split to Hvar, three hours with no signal, and you have a client summary due by the time you dock. Your laptop sits there useless because every tool in your AI stack needs the internet.

This isn't just a ferry problem - it's airports, rural Airbnbs, trains through the Alps, even café WiFi in Portugal that can't hold a connection to Claude for 40 minutes.

Why This Works Now (And Not Two Years Ago)

Desktop GUIs for Local LLMs

  • LM Studio: Full GUI for Mac/Windows/Linux with model browser, local OpenAI-compatible API
  • GPT4All: Cross-platform desktop app with built-in local API server
  • Ollama: Now ships with Windows GUI (2025), eliminating terminal requirements

Mature Offline Speech-to-Text

  • whisper.cpp: Fully offline transcription, actively maintained under ggml-org
  • faster-whisper: Powers real-time dictation apps shipping updates in 2026

Hardware Trend

  • Gartner projects 55% of PC shipments in 2026 will be AI PCs (~143M units with dedicated neural processing hardware)

The Architecture

Four Core Pieces:

  1. Local LLM Runner: LM Studio, GPT4All, or Ollama
  2. Local Speech-to-Text: whisper.cpp or faster-whisper
  3. SQLite Database: Running in WAL mode as your work queue
  4. Sync Layer: Litestream pushing to S3-compatible storage when online

The Flow:

  1. Drop audio file into inbox folder
  2. File watcher (Watchman/fswatch) detects new file
  3. Worker enqueues transcription job to SQLite
  4. whisper.cpp transcribes fully offline
  5. Summarize and tag jobs auto-chain to the local LLM
  6. Everything stays on laptop until connectivity returns
  7. Litestream syncs database changes to cloud storage
Hardware Requirements

Conservative Baselines:

  • 7-8B models: 8GB RAM minimum (fits in ~5-8GB at 4-bit quantization)
  • 13B models: 16GB RAM minimum, ideally with GPU offload
  • 70B models: 64GB RAM minimum (not practical for travel)
  • LM Studio recommendation: 16GB+ RAM for comfortable local inference

Battery Impact: 4-bit quantization reduces memory bandwidth and power draw per token compared to full-precision models.
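
A quick back-of-the-envelope check for whether a model fits in RAM: weights take roughly parameters × bits-per-weight / 8 bytes, plus runtime overhead. The ~4.5 bits figure approximates common 4-bit GGUF quantizations and the 1GB overhead covers KV cache and buffers; both are assumptions, so treat the result as a ballpark:

```python
def model_memory_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.0):
    """Rough RAM needed to run a quantized model.

    bits_per_weight ~4.5 approximates 4-bit GGUF quants (they carry a little
    per-block metadata); overhead_gb is a guess at KV cache and runtime
    buffers. Ballpark only, not a vendor spec.
    """
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)
```

For a 7B model this lands around 5GB, consistent with the 8GB-minimum baseline above.
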

Run Profiles

Battery Saver Mode

  • Use Case: Boarding in 12 minutes, battery at 40%
  • Config: 7-8B model, 4-bit quantization, CPU only
  • Context Window: 2-4K tokens
  • Tasks: Summaries, tags, short drafts
  • Performance Target: 15+ tokens/second (scale down if lower)

Throughput Mode

  • Use Case: 6-hour train ride with power outlet
  • Config: 13B model, partial GPU offload
  • Context Window: Up to 8K tokens
  • Tasks: Longer drafts, complex summarization
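
The two profiles can be captured as plain data and picked automatically. Model names, layer counts, and the 50% battery threshold below are illustrative placeholders, not recommendations from the article:

```python
# Illustrative run profiles; model names and thresholds are assumptions.
PROFILES = {
    "battery_saver": {
        "model": "llama-3.1-8b-instruct-q4_k_m",  # any 7-8B 4-bit quant
        "gpu_layers": 0,          # CPU only to cap power draw
        "context_tokens": 4096,
        "min_tokens_per_sec": 15,
    },
    "throughput": {
        "model": "llama-2-13b-chat-q4_k_m",       # any 13B quant
        "gpu_layers": 20,         # partial GPU offload
        "context_tokens": 8192,
        "min_tokens_per_sec": 8,
    },
}

def pick_profile(on_battery, battery_pct):
    """Drop to Battery Saver when unplugged or the battery is low."""
    if on_battery or battery_pct < 50:
        return PROFILES["battery_saver"]
    return PROFILES["throughput"]
```

Wiring this into your runner is runner-specific, but keeping the profiles as data means one switch covers model, offload, and context at once.
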
Task Partitioning: Local vs Cloud

Handle Locally

  • Document summarization
  • Content tagging
  • Short draft generation
  • Basic transcription

Defer to Cloud Queue

  • Heavy code generation
  • Image analysis and other vision tasks
  • RAG over large document sets
  • Anything requiring 8K+ context windows

Queue Management: Set priority levels (local tasks at 5, cloud-deferred at 7). The sync worker processes deferred jobs automatically when back online.
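
The routing rule above reduces to a few lines. The task names and the 8K-token cutoff mirror the lists; the function and set names are illustrative:

```python
# Tasks the local 7-13B model handles well, per the partitioning above.
LOCAL_TASKS = {"summarize", "tag", "short_draft", "transcribe"}

def route(task, context_tokens=0):
    """Return (queue, priority): local work at priority 5, cloud-deferred at 7."""
    if task in LOCAL_TASKS and context_tokens <= 8000:
        return ("local", 5)
    return ("cloud", 7)
```

Everything the router defers simply sits in the same SQLite queue with status and priority set; nothing is lost while offline.
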

The Sync Strategy

Conflict Policy for This Use Case:

  • Local work creates new content (summaries, tags, drafts)
  • Not editing shared documents offline
  • Small conflict surface

Conflict Resolution:

  • Tags: Merge as set union, remove duplicates
  • Summaries: Last-writer-wins with timestamp and origin flag (local vs cloud)
  • Idempotency: Hash of file + task type prevents duplicate processing
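
Each of the three policies is nearly a one-liner in Python; tuple shapes and function names here are illustrative:

```python
import hashlib

def merge_tags(local_tags, cloud_tags):
    """Tags merge as a set union; duplicates disappear by construction."""
    return sorted(set(local_tags) | set(cloud_tags))

def resolve_summary(local, cloud):
    """Last-writer-wins: each side is (text, timestamp, origin)."""
    return local if local[1] >= cloud[1] else cloud

def idempotency_key(file_bytes, task_type):
    """Same file + same task => same key, so reprocessing is a no-op."""
    return hashlib.sha256(file_bytes + task_type.encode()).hexdigest()
```
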
Sync Implementation:

  • Litestream pushes WAL changes to S3-compatible storage
  • Runs in Docker container, streams incremental changes when connected
  • Idles gracefully when offline, resumes where it left off
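
A minimal litestream.yml for this setup might look like the following; the bucket name and endpoint are placeholders for your own S3-compatible storage, and credentials come from the standard LITESTREAM environment variables:

```yaml
# litestream.yml - replica settings are placeholders, not from the article
dbs:
  - path: /data/queue.db
    replicas:
      - type: s3
        bucket: my-offline-stack          # your bucket
        path: queue
        endpoint: https://s3.example.com  # any S3-compatible endpoint
```
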
Security and Encryption

Database Encryption:

  • Use SQLCipher (open-source) or SQLite SEE (commercial)
  • Keys from OS keychain, not environment files
  • Full database file encryption at rest

Why This Matters: A stolen laptop with client transcripts is a nightmare scenario.
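
A sketch of the keys-from-keychain rule, assuming the third-party keyring and sqlcipher3 packages (service and account names are placeholders). SQLCipher requires the key pragma to be the first statement on a new connection:

```python
def key_pragma(hex_key: str) -> str:
    """SQLCipher raw-key form: PRAGMA key = "x'<64 hex chars>'".
    Must be the first statement executed on the connection."""
    assert len(hex_key) == 64, "expect a 256-bit key as 64 hex chars"
    return f"PRAGMA key = \"x'{hex_key}'\";"

def open_encrypted(path="queue.db"):
    """Open the encrypted queue; assumes the third-party `keyring` and
    `sqlcipher3` packages, with 'offline-stack' as a placeholder service."""
    import keyring, sqlcipher3            # third-party, not stdlib
    key = keyring.get_password("offline-stack", "db-key")
    db = sqlcipher3.connect(path)
    db.execute(key_pragma(key))           # before any other statement
    return db
```

The point is that the key never touches a dotfile or environment variable; it lives in the OS keychain and exists in memory only while the app runs.
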

Real-World Test Results

Santi's 2-Hour Offline Drill:

  • Input: 10-minute audio recording + two 800-word markdown notes
  • Processing Time: 11 minutes total (4 minutes for transcription on an M2 MacBook Pro)
  • Battery Drain: 6% (82% to 76%)
  • Sync Time: Under 30 seconds when WiFi restored
  • Conflicts: Zero
Implementation Checklist

Weekend Setup

  1. Choose Your Runner: Install LM Studio, GPT4All, or Ollama
  2. Download Models: Start with a 7B model for testing
  3. Set Up the Queue: SQLite database with WAL mode
  4. Install a File Watcher: Watchman (Meta) or fswatch
  5. Configure Sync: Litestream container pointing to your database
  6. Test Offline: Disconnect for 2 hours, drop files, watch the queue

The Drill Protocol

  • Run the offline test on a Saturday, not during a client deadline
  • Disconnect WiFi and Bluetooth; put your phone in airplane mode
  • Drop test files, monitor queue processing
  • Verify sync when connectivity returns
  • Fix issues at your desk, not in panic mode
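
A drill helper can refuse to run while you're still online, then drop the sample file for the watcher to pick up. Hosts, paths, and timings below are placeholders for your own setup:

```python
import pathlib
import shutil
import socket

def offline(host="1.1.1.1", port=53, timeout=1.5):
    """True if we can't reach the internet (the drill requires this)."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return False
    except OSError:
        return True

def run_drill(sample, inbox, wait_secs=600):
    """Drop a sample file in the inbox and leave the watcher/worker to it."""
    assert offline(), "Drill aborted: still online - kill WiFi first"
    inbox = pathlib.Path(inbox)
    inbox.mkdir(parents=True, exist_ok=True)
    shutil.copy(sample, inbox / pathlib.Path(sample).name)
    print(f"Dropped {sample}; watch the queue for ~{wait_secs}s")
```

The guard assertion is the whole point: it forces you to actually disconnect instead of quietly testing against the cloud.
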
Honest Limitations

Performance Trade-offs:

  • Local 7B models are slower than GPT-4 Turbo
  • Summaries are good for travel-day triage, need cleanup later
  • Not final-draft quality output

Battery Constraints:

  • No hard benchmarks for 2026 laptops yet
  • Physics favors it: quantization reduces power per token
  • Your mileage varies by hardware and model choice

Sync Complexity:

  • Manageable for single-operator, new-content creation
  • Would need sophisticated conflict resolution for collaborative editing
  • This is a personal travel-day safety net, not a distributed-team workflow
The Macro Trend

Why Invest the Weekend:

  • Gartner: 55% of 2026 PC shipments will be AI PCs
  • IDC: On-device intelligence mainstreaming at MWC 2026
  • Hardware is getting better at this, not worse
  • The stack you build this weekend gets faster with each hardware upgrade

Resources

Offline-First AI SOP (in show notes): Complete implementation guide with:

  • Architecture diagrams
  • Battery Saver vs Throughput profiles
  • Model selection matrix (RAM/VRAM requirements)
  • SQLite schema and WAL configuration
  • Docker Compose for sync workers
  • Conflict resolution policies
  • Travel Day Mode scripts for Mac/Windows
  • Encryption setup guides
Next Steps

This Week: Pick a Saturday. Install one runner. Download a 7B model. Disconnect WiFi. Drop a file. Watch it process.

That one drill tells you if your hardware can handle it. Once you see it work, you'll never fly without it again.

Stop losing travel days. Build the stack.

The Stateless Founder, by Santi and Kira