Digital Dopamine

Gimme More Gemma 4


Listen Later

One Step Closer to Sustainable AI

Today, we are going to give props where props are due, and that’s with Google’s new Gemma 4 models. The Gemma 4 models are open source with an Apache 2.0 license; Anyone can use this model commercially or personally with no restrictions, allowing folks to build and sell applications, embed the model in their product, or clone and build upon the core model code without needing permission from Google. That’s not the only cool thing about Gemma 4, the models are TINY and pretty damn powerful.

As you can see in the image above, there are 4 models total: E2B, E4B, 26B, and 31B.

E2B & E4B

These models are best suited for mobile and IoT devices. There are no other models available that can run natively on the user’s device without an internet connection and using only the device’s processing power. The immediate benefit of this breakthrough is that populations with little to no internet access can begin using AI for their own learning. For personal use and coding, the intelligence of these 2 models is not strong enough to do any daunting task, coding, or deep thinking, but they are more than capable of providing quick 1-2 liner responses for learning and informational purposes (we’ll get into the exact details of the breakthrough later). This also benefits people who might get lost on long expeditions in unknown & uncivilized territories. For example, one might need to get foraging info to assist their survival tactics. Though they were engineered with a target audience in mind. These models were engineered from the ground up for maximum compute and memory efficiency, and in collaboration with Google Pixel, Qualcomm, and MediaTek, they can run completely offline with near-zero latency on phones, Raspberry Pi, and NVIDIA Jetson Orin Nano.

26B & 31B

Here’s where things get pretty interesting. The 26B & 31B models are aimed at more coding assistants and agentic workflows, research, and enterprise production apps. The 26B works decently on a Mac Mini (M4), and you can easily downgrade to the smaller models for much greater performance, depending on the task and context provided. But for open source models that are so small with an open license, the 26B and 31B swing well above their weight class, outcompeting models with 20× more parameters. By total parameter count, Gemma 4 31B is 24× smaller than GLM-5 and 34× smaller than Kimi-K2.5-Thinking, delivering comparable performance at a fraction of the footprint. The more remarkable story is the 26B MoE: it achieves 97% of the 31B's quality at approximately 8× less compute per inference step, with the 26B MoE reaching 40+ tokens per second locally versus the 31B exceeding 10 tokens per second.

The pure flexibility and power at these model sizes unlock so many possibilities in the open source space, especially for people looking for a free and frictionless way to get into using AI without needing massive compute power. Much better for the environment, too!

Gemma The Winna

Yes, I misspelled “winner” on purpose, relax. Gemma 4 is the clear winner when it comes to open models. At least in my. eyes. There are other open-source and free models, such as Kimi-K2.5 and Qwen 3.6, that beat Gemma 4 in almost every category, but the gap is not large, and with the other models, you pretty much will need a powerhouse of a home setup or the power of an actual enterprise server to run these on. So, for the everyday layman, Gemma 4 should be the model forced upon the people. Yup, I said it. If you aren’t using AI agents or AI in general for coding, deep thinking, research, or science, then you need to be forced to use a model that has very little impact on the environment and your wallet. There’s no reason for anyone to be paying $20/month for high-energy prompts about what to make for dinner or how to talk to women……there are people that do the latter and all I can do is pray for them lol. But below is a nice little graphic of the 4 different model variations, their use cases, and the min amount of compute needed to run (not necessarily smoothly).

There are some cool facts to read into about the technical achievements of Gemma 4 and I urge anyone with a deeper interest in knowing the nitty-gritty to read into their official docs → https://ai.google.dev/gemma/docs/core/model_card_4 as well as peep the video below:

Pairing With OpenCode and Ollama

Now that we have a general sense of Gemma 4 and its capabilities, I wanted to test things out myself with a quick local project that uses the Gemma 4 Model to build out an API for an AI chat. I wanted to include a UI as well for screen recording purposes so I looked into ways to make this happen all without paying a single penny (locally of course). This led me to OpenCode and Ollama.

OpenCode

This is pretty much the open-source and free version of “Claude Code” with the added ability to use any free or paid model for your operations. To be more detailed, OpenCode is an AI-powered terminal coding agent that sits on top of whatever local model you point it at. It reads your codebase, writes and edits files, runs commands, and iterates — all from your terminal. Critically, it’s model-agnostic, so you can point it directly at your Ollama endpoint and it uses Gemma 4 as its brain instead of a paid cloud model. The local setup was very straightforward, and you can get it installed with curl, brew, or an installer:

Curl

curl -fsSL https://opencode.ai/install | bash

Homebrew

brew install anomalyco/tap/opencode

Node (NPM, PNPM, BUN, YARN)

# NPM

npm install -g opencode-ai
# PNPM
pnpm install -g opencode-ai
# Bun
bun install -g opencode-ai
# Yarn
yarn global add opencode-ai

Once installed, you are all set to start using OpenCode as your agent of choice, but you’ll still need a vehicle to download, manage, and serve open-weight models like Gemma 4 on your local machine. That's where Ollama comes in.

Ollama

Ollama is the runtime layer in which you can download and serve open-weight models and API dependent models on your local machine. In our case, it wraps the model in a local REST API (mimicking OpenAI's API format) so any app can talk to it at localhost:11434. With the command below:

# The model i used is a custom 26b model. You can swap that out with gemma4:latest or any other model version

ollama run gemma4:26b-32k

you get your model running. Ollama handles quantization, GPU/CPU offloading, and model versioning under the hood. For our example project, it's the reason we have zero cloud dependency and zero API costs, which is DOPE AF!!

Local AI API Example

Now let’s dive into the example project itself. The goal was to build a Local AI assistant API using only OpenCode and Gemma 4 as the model. That turned into two rounds of attempting that, both having their fair share of difficulties when it came to some of the logic. Now this task was pretty hefty for Gemma 4 in my opinion but I wanted to see how well it would do with creating an AI itself.

The Qwen flop

This one kinda sucked because for the second run, I wanted to use Qwen since it’s supposed to be better when it comes to agentic coding. But long story short, as soon as I started using the qwen3.6:27b, which is supposed to be good on machines with at least 17 GB of RAM. I have 24GB, and while it’s running, my activity monitor says it’s using 22.5 GB of my 24 GB….

Not only is it using more GBs than the model apparently was supposed to use (or maybe it is, and there’s a lack of architectural knowledge on my end), but it just doesn’t work at all. No exaggeration when I say I let it run for about 1 hr on a simple question, and it never even got past the thinking step. Clearly, my Mac Mini was not up for the task with Qwen, but that’s exactly why Gemma 4 is so damn cool. There’s literally a model for every kind of device.

API V1 & V2

So, as I mentioned earlier, I did two rounds of running this, and I will say there were successes and failures in both runs that made them about equal in output quality.

V1

With V1, the UI is where we were lacking. Hell, the UI was the biggest struggle for both runs since it insisted it use Svelete and SvelteKit as the frontend framework 😂. I know most of these models have very little Svelte in their training data, so any chance I get, I make sure to use Svelte and help the AI build their Svelte chops lol. But considering this project is local, I’m not helping the big 3 (OpenAI, Anthropic, X) train their models this time around.

The other issue with this build was that the scaffolding was a bit off, and that caused confusion when giving further directions. However, it seemed to pick up the pieces pretty quickly on the first round, and I got a working API + UI pushed up to GitHub. Below is a quick screenshot of a random question I asked the V1 chatbot:

For additional context here, the first question was asked with nothing in the “System Prompt” section, but for the second question, I added “Make sure your responses are fantasy-themed”, and the response was adjusted accordingly. The System Prompt section itself is unique to this version’s build, as V2 did not feel the need to add that feature 😅. But it gives a bit more control of the output for the user by allowing them to add some constants into their chat flow. The UI uses Svelte and took a couple of prompts to fix the errors present. I was personally still impressed by how well it did with Svelte compared to other big-name models I’ve used in the past.

V2

Now with V2, the first run through of the prompt gave it a good head start with the UI, since there were already structured files in the first version. There were only 2 main issues with the second run through: it initially built the frontend in a “UI” folder instead of a “Frontend” folder, and the button + POST request was broken. The UI design, though, was a bit more compact, and in my opinion, that was a better look. The System Prompt section was also left out of this build, and the chat box is much smaller than V1’s, and I have a title to actually indicate what the tool is.

I asked both versions their own simple questions, and they did a very good job and got a response back within a minute, once the models were warmed up. I personally don’t have a reason to ever use either of these chatbots, but I figured it would be a good test to see if AI can build….AI 😅. In the long run, and while imperfect, it was still a success.

Goodnight Gemma

That about wraps up my little experiment and dive into Google’s Gemma 4 model, and I simply can’t express enough how much I look forward to the continued evolution of Gemma. If you watched the full video, I hope you enjoyed the journey with me building pieces of both versions, even the roadblocks. I’ll be doing more dives into cool tech like this soon, but until next time, stay rooted. Peace ✌🏾.

If you want to keep up with my work or want to connect as peers, check out my social links below and give me a follow!

* 🦋 Bluesky

* 📸 Instagram

* ▶️ Youtube

* 💻 Github

* 👾 Discord



Get full access to Digital Dopamine at digitaldopaminellc.substack.com/subscribe
...more
View all episodesView all episodes
Download on the App Store

Digital DopamineBy Digital Dopamine