Coordinated with Fredrik

NATS as the Nervous System of the Grid



In this special episode of Coordinated with Fredrik, we went deep: not at the strategy layer, not at the founder layer, but at the socket level.

This was a strict engineering teardown of a single question:

Can NATS become the autonomic nervous system of Sourceful Energy?

What follows is the architectural synthesis.

The Real Problem We’re Solving

At Sourceful, we are not just operating a backend.

We are coordinating:

* High-throughput mobile app traffic

* A growing mesh of backend microservices

* Massive telemetry streams from distributed energy assets:

  * Smart meters

  * EV chargers

  * Solar arrays in remote geographies

  * Wind turbines behind unstable uplinks

This is not a traditional cloud-native architecture.

It is cloud + edge + unreliable networks + financial correctness.

That requires more than horizontal scaling. It requires:

* Isolation of failure domains

* Backpressure awareness

* Autonomous routing

* Dynamic topology adaptation

* Edge survivability

This is where NATS becomes interesting.

The 15 MB Binary That Changes the Conversation

NATS runs as a single static Go binary of roughly 15 MB.

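In benchmarks with small payloads, a single node has been reported to handle on the order of 15–18 million messages per second. Real-world throughput depends on message size, fan-out, and hardware.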

That sounds unrealistic until you understand the engineering choices:

Concurrency Model

* Lightweight goroutines

* User-space scheduling

* Massive TCP connection density

Memory Discipline

* Zero-allocation parsing

* Pointer passing instead of object churn

* Minimal garbage collection pauses

Routing Philosophy

* Pure in-memory message routing

* Disk I/O only when explicitly requested

NATS is not a heavyweight enterprise broker.

It is a highly optimized, high-throughput routing engine.

Selfish Optimization: Protect the System First

One of the most controversial ideas in NATS is “selfish optimization.”

If a downstream consumer slows down:

* NATS does not buffer indefinitely

* NATS does not slow producers

* NATS drops the connection

From a traditional enterprise mindset, that sounds aggressive.

But in distributed energy systems, it is correct.

If the router collapses:

* Telemetry stops

* Control signals stop

* Billing APIs stop

* The entire system fails

Protecting the health of the transport layer is non-negotiable.

The whole must survive even if individual services fail.
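
The slow-consumer policy above can be sketched in a few lines of Go. This is a minimal illustration, not the NATS server's actual code: the `router` and `subscriber` types, the `maxPending` limit, and the subscriber names are all assumptions made for the example. The point it shows is the non-blocking send: when a subscriber's bounded buffer is full, the router disconnects that subscriber instead of waiting or slowing the publisher.

```go
package main

import "fmt"

// subscriber is a hypothetical stand-in for one client connection.
type subscriber struct {
	name    string
	pending chan string // bounded buffer of undelivered messages
}

// router fans messages out to subscribers and never blocks on any of them.
type router struct {
	subs map[string]*subscriber
}

func newRouter() *router {
	return &router{subs: make(map[string]*subscriber)}
}

// add registers a subscriber with room for maxPending undelivered messages.
func (r *router) add(name string, maxPending int) {
	r.subs[name] = &subscriber{name: name, pending: make(chan string, maxPending)}
}

// publish delivers msg to every subscriber that still has buffer space.
// A subscriber whose buffer is full is disconnected instead of being waited on.
// It returns the names of the subscribers dropped by this call.
func (r *router) publish(msg string) []string {
	var dropped []string
	for name, s := range r.subs {
		select {
		case s.pending <- msg: // delivered without blocking
		default: // buffer full: protect the router, drop the consumer
			close(s.pending)
			delete(r.subs, name)
			dropped = append(dropped, name)
		}
	}
	return dropped
}

func main() {
	r := newRouter()
	r.add("dashboard", 2) // never drains its buffer in this example
	for i := 1; i <= 3; i++ {
		dropped := r.publish(fmt.Sprintf("reading-%d", i))
		fmt.Println("publish", i, "dropped:", dropped)
	}
}
```

The publisher's loop never stalls, which is the property the section argues for: one stuck dashboard cannot back up telemetry for everyone else.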

Core NATS vs JetStream

NATS separates transient routing from durability.

Core NATS

* In-memory

* Fire-and-forget

* Ultra-low latency

* No persistence

If no subscriber exists, the message is dropped.

Use this for:

* Real-time telemetry

* State queries

* Fast internal RPC

JetStream

* Durable streams

* Raft-based replication

* Replayable consumers

* At-least-once delivery, with exactly-once semantics available through publish-side deduplication and consumer acknowledgements

Use this for:

* Billing events

* Immutable records

* Financial correctness

The key principle:

Persistence is opt-in.

You only pay for disk I/O when the workload requires it.
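
To make the split concrete, here is a toy model in Go. It assumes nothing about the real server internals: `core`, `stream`, `add`, and `replay` are hypothetical names, and the "durable" log is just a slice in memory. What it illustrates is the behavioral contract: a core publish with no subscribers simply vanishes, while a stream keeps every message, can replay from a sequence number, and ignores a retried publish that carries the same message ID (JetStream does this with the `Nats-Msg-Id` header inside a deduplication window).

```go
package main

import "fmt"

// core models fire-and-forget delivery: no subscriber, no delivery, no storage.
type core struct {
	subs map[string][]func(string)
}

// publish returns how many subscribers received the message.
func (c *core) publish(subject, data string) int {
	for _, h := range c.subs[subject] {
		h(data)
	}
	return len(c.subs[subject])
}

// stream models a replayable log with ID-based deduplication.
type stream struct {
	msgs []string
	seen map[string]uint64 // message ID -> sequence already assigned
}

func newStream() *stream { return &stream{seen: make(map[string]uint64)} }

// add stores data once per msgID and returns its 1-based sequence.
// A retry with the same ID returns the original sequence and dup == true.
func (s *stream) add(msgID, data string) (seq uint64, dup bool) {
	if prev, ok := s.seen[msgID]; ok {
		return prev, true
	}
	s.msgs = append(s.msgs, data)
	seq = uint64(len(s.msgs))
	s.seen[msgID] = seq
	return seq, false
}

// replay returns every stored message from sequence start onward.
func (s *stream) replay(start uint64) []string {
	if start < 1 || start > uint64(len(s.msgs)) {
		return nil
	}
	return s.msgs[start-1:]
}

func main() {
	c := &core{subs: map[string][]func(string){}}
	fmt.Println("core deliveries:", c.publish("telemetry.meter1", "230V"))

	s := newStream()
	s.add("invoice-1001", "charge 12.50 EUR")
	s.add("invoice-1001", "charge 12.50 EUR") // producer retry, ignored
	s.add("invoice-1002", "charge 3.20 EUR")
	fmt.Println("billing log:", s.replay(1))
}
```

For billing, the deduplication step is the part that matters: a producer that retries after a timeout does not create a second charge.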

Raft Without the Bottleneck

Many distributed systems funnel coordination through a single global consensus group.

JetStream does something different:

* One meta-consensus group for cluster metadata

* Independent Raft groups per stream

* Even per consumer

If you run 5,000 streams, you run 5,000 independent consensus groups.

Why does that not collapse under overhead?

Because:

* Each Raft group runs as a lightweight goroutine

* Heartbeats are batched

* Streams are isolated

A spike in one stream does not block the others.

This is horizontal scalability at the consensus layer.

Subject-Based Routing Instead of IP-Based Thinking

NATS routes by subject strings, not by IP addresses.

Example:

telemetry.eu.germany.meter80492

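Routing is powered by an optimized subject trie that is walked token by token, with a cache for recent matches.

This means:

* No regex matching

* No linear scans

* Lookup cost that depends mainly on the number of tokens in the subject, not on the total number of subscriptions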

Subject hierarchies become your semantic network.

Developers stop thinking about:

* Hostnames

* Ports

* DNS

* Reverse proxies

They express interest in data. The infrastructure routes it.
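
A rough sketch of token-based subject matching, written for these notes rather than taken from the NATS source. The `node` type and the subscriber names are hypothetical, and the sketch assumes published subjects are literal (no wildcards). It follows the documented wildcard rules: `*` matches exactly one token and `>` matches one or more trailing tokens.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// node is one level of a token-based subject trie.
type node struct {
	children map[string]*node
	subs     []string // subscribers whose pattern ends at this node
}

func newNode() *node { return &node{children: make(map[string]*node)} }

// insert registers a subscriber under a pattern such as "telemetry.eu.*.*".
func (n *node) insert(pattern, sub string) {
	cur := n
	for _, tok := range strings.Split(pattern, ".") {
		next, ok := cur.children[tok]
		if !ok {
			next = newNode()
			cur.children[tok] = next
		}
		cur = next
	}
	cur.subs = append(cur.subs, sub)
}

// match collects subscribers whose pattern matches the literal subject.
// Results are sorted only to make the output stable.
func (n *node) match(subject string) []string {
	var out []string
	n.walk(strings.Split(subject, "."), &out)
	sort.Strings(out)
	return out
}

// walk descends one token at a time and only visits branches that can match.
func (n *node) walk(tokens []string, out *[]string) {
	if len(tokens) == 0 {
		*out = append(*out, n.subs...)
		return
	}
	if tail, ok := n.children[">"]; ok { // ">" consumes all remaining tokens
		*out = append(*out, tail.subs...)
	}
	if child, ok := n.children[tokens[0]]; ok { // exact token
		child.walk(tokens[1:], out)
	}
	if wild, ok := n.children["*"]; ok { // "*" consumes exactly one token
		wild.walk(tokens[1:], out)
	}
}

func main() {
	root := newNode()
	root.insert("telemetry.eu.germany.meter80492", "meter-dashboard")
	root.insert("telemetry.eu.*.*", "eu-aggregator")
	root.insert("telemetry.>", "archiver")
	root.insert("billing.>", "invoicing")
	fmt.Println(root.match("telemetry.eu.germany.meter80492"))
}
```

The `billing` branch is never visited for a telemetry subject, which is the practical meaning of "no linear scans": work is bounded by the depth of the subject, not by how many subscriptions exist elsewhere in the tree.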

Request-Reply Without HTTP

NATS supports request-reply patterns without point-to-point connections.

Mechanically:

* The requester generates a temporary reply subject

* Publishes a message including that subject

* A service processes and replies to that subject

* The first response wins

To developers, it feels synchronous.

Under the hood, it is fully asynchronous and multiplexed.

Queue groups provide built-in distributed load balancing.

For microservice-to-microservice communication, this can remove the need for internal service meshes and east-west load balancers.

Public ingress still requires API gateways. Internal routing becomes dramatically simpler.
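
The request-reply mechanics listed above can be imitated with an in-process bus. This is a sketch under assumptions, not the NATS client API: `bus`, `msg`, `request`, and `defaultTimeout` are hypothetical names, and subjects are matched exactly. The `_INBOX.` prefix mirrors the naming convention NATS uses for temporary reply subjects. Queue groups are not modeled here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

const defaultTimeout = time.Second

// msg carries a payload plus the subject the responder should answer on.
type msg struct {
	data  string
	reply string
}

// bus is a hypothetical in-process stand-in for a NATS connection.
type bus struct {
	mu     sync.Mutex
	subs   map[string][]func(msg)
	nextID int
}

func newBus() *bus { return &bus{subs: make(map[string][]func(msg))} }

func (b *bus) subscribe(subject string, h func(msg)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[subject] = append(b.subs[subject], h)
}

func (b *bus) unsubscribe(subject string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.subs, subject)
}

// publish hands the message to each handler on its own goroutine.
func (b *bus) publish(subject string, m msg) {
	b.mu.Lock()
	handlers := append([]func(msg){}, b.subs[subject]...)
	b.mu.Unlock()
	for _, h := range handlers {
		go h(m)
	}
}

// request publishes with a unique reply subject and returns the first answer.
func (b *bus) request(subject, data string, timeout time.Duration) (string, error) {
	b.mu.Lock()
	b.nextID++
	inbox := fmt.Sprintf("_INBOX.%d", b.nextID)
	b.mu.Unlock()

	first := make(chan string, 1)
	b.subscribe(inbox, func(m msg) {
		select {
		case first <- m.data: // first response wins
		default: // later responses are discarded
		}
	})
	defer b.unsubscribe(inbox)

	b.publish(subject, msg{data: data, reply: inbox})
	select {
	case r := <-first:
		return r, nil
	case <-time.After(timeout):
		return "", errors.New("request timed out")
	}
}

func main() {
	b := newBus()
	b.subscribe("meter.state.get", func(m msg) {
		b.publish(m.reply, msg{data: "state of " + m.data + ": online"})
	})
	reply, err := b.request("meter.state.get", "meter80492", defaultTimeout)
	fmt.Println(reply, err)
}
```

The caller sees a blocking function call, but nothing in the exchange names a host or a port. The responder only knows the reply subject it was handed.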

Global Scaling with Superclusters

Inside a region, NATS uses a full mesh cluster.

Across regions, it uses superclusters connected by gateways.

Gateways operate in interest-only mode.

If Europe is not subscribing to US telemetry:

* No bytes cross the Atlantic.

The moment interest appears:

* Flow begins automatically.

This prevents blind data mirroring and reduces egress costs dramatically.
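
A simplified model of interest-based forwarding, to show why no bytes leave the region until someone asks for them. The `gateway` type and its methods are hypothetical, interest is tracked per exact subject (no wildcards), and the real gateway protocol has more states than this sketch captures.

```go
package main

import "fmt"

// gateway is a hypothetical model of one inter-region link.
type gateway struct {
	remoteInterest map[string]int // subject -> number of remote subscriptions
	bytesSent      int
}

func newGateway() *gateway { return &gateway{remoteInterest: make(map[string]int)} }

// addInterest is called when the remote region reports a new subscription.
func (g *gateway) addInterest(subject string) { g.remoteInterest[subject]++ }

// removeInterest is called when a remote subscription goes away.
func (g *gateway) removeInterest(subject string) {
	if g.remoteInterest[subject] > 1 {
		g.remoteInterest[subject]--
		return
	}
	delete(g.remoteInterest, subject)
}

// forward sends the payload across the link only if the remote side
// has registered interest in the subject. It reports whether it sent.
func (g *gateway) forward(subject string, payload []byte) bool {
	if g.remoteInterest[subject] == 0 {
		return false
	}
	g.bytesSent += len(payload)
	return true
}

func main() {
	g := newGateway()
	subject := "telemetry.us.texas.turbine7"
	reading := []byte(`{"kw":4.2}`)

	sent := g.forward(subject, reading)
	fmt.Println("before interest: sent =", sent, "bytes =", g.bytesSent)

	g.addInterest(subject) // a European service subscribes
	sent = g.forward(subject, reading)
	fmt.Println("after interest: sent =", sent, "bytes =", g.bytesSent)
}
```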

Leaf Nodes: Edge Autonomy

Leaf nodes are where NATS becomes transformative for energy infrastructure.

A leaf node:

* Runs locally on edge hardware

* Initiates an outbound TLS connection to the core

* Requires no inbound firewall rules

* Multiplexes all traffic over a single connection

If connectivity drops:

* Local JetStream buffers telemetry

* Local control systems continue functioning

* No data is lost, as long as the outage fits within the local stream's storage limits

When connectivity restores:

* The stream synchronizes automatically

* Consumers resume from correct offsets

This enables:

Autonomous edge during disconnection. Seamless federation when connected.

For EV chargers, solar arrays, and wind turbines, this is critical.

Decentralized Security at Scale

Traditional brokers rely on centralized authentication.

That becomes a bottleneck at scale.

NATS uses:

* Ed25519 key pairs (NKeys)

* JWT-based trust hierarchy

* Operator → Account → User model

Authentication becomes pure cryptographic verification.

No database lookups. No external latency. No central auth bottleneck.

Permissions are embedded in JWT claims:

* Publish rights

* Subscribe rights

* Data limits

Revocation can be pushed in real time without cluster restarts.

For enterprises tied to Okta or LDAP, auth callouts bridge existing identity systems into decentralized JWT issuance.

This allows compliance without sacrificing performance.
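
To show what "pure cryptographic verification" means, here is a stripped-down Operator → Account → User chain using Go's standard `crypto/ed25519` package. It is a sketch with loud assumptions: the `claim` struct, `sign`, and `verifyChain` are invented for this example, and real NATS credentials are NKey-encoded keys and JWTs with structured claims, expiry, and revocation, none of which is modeled here.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// claim is a hypothetical, heavily simplified stand-in for a signed JWT.
type claim struct {
	subject   ed25519.PublicKey // key being vouched for
	payload   string            // e.g. permissions; free-form in this sketch
	signature []byte            // issuer's signature over subject + payload
}

// body is the byte string that gets signed and verified.
func body(subject ed25519.PublicKey, payload string) []byte {
	return append(append([]byte{}, subject...), payload...)
}

// sign lets an issuer vouch for a subject key and its payload.
func sign(issuer ed25519.PrivateKey, subject ed25519.PublicKey, payload string) claim {
	return claim{subject: subject, payload: payload, signature: ed25519.Sign(issuer, body(subject, payload))}
}

// signedBy checks the claim against the issuer's public key.
func (c claim) signedBy(issuer ed25519.PublicKey) bool {
	return ed25519.Verify(issuer, body(c.subject, c.payload), c.signature)
}

// verifyChain trusts only the operator public key and checks
// operator -> account -> user using signatures alone.
func verifyChain(operator ed25519.PublicKey, account, user claim) bool {
	return account.signedBy(operator) && user.signedBy(account.subject)
}

func mustKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	opPub, opPriv := mustKey()
	accPub, accPriv := mustKey()
	userPub, _ := mustKey()

	account := sign(opPriv, accPub, "account: telemetry")
	user := sign(accPriv, userPub, "pub: telemetry.eu.>")
	fmt.Println("valid chain:", verifyChain(opPub, account, user))

	user.payload = "pub: >" // tampering with permissions breaks the signature
	fmt.Println("tampered chain:", verifyChain(opPub, account, user))
}
```

The verifier holds one public key and touches no database, which is why this model scales with CPU rather than with the latency of an identity service.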

Kafka, RabbitMQ, MQTT — Where NATS Fits

Kafka

Designed for:

* Durable append-only logs

* Analytics pipelines

* Data lakes

Strength:

* Historical retention

Tradeoff:

* Partition-bound scaling

* Consumer rebalancing pauses

* Operational overhead

NATS:

* Dynamic routing

* Elastic worker scaling

* Lower latency for microservices

RabbitMQ

Designed for:

* Complex exchange-based routing

Tradeoff:

* Higher operational fragility under partitions

* Cluster state complexity

NATS:

* Simpler subject routing

* Gossip for cluster sync

* Raft-backed durability

MQTT

Best for:

* Constrained IoT devices

NATS does not replace MQTT.

It embeds an MQTT broker.

MQTT topics are mapped directly to NATS subjects internally.

This creates a unified backbone:

* Edge devices speak MQTT

* Backend services speak NATS

* No external translation layer required
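
The basic shape of the topic-to-subject mapping is easy to sketch. This version is deliberately narrow and is not the server's implementation: `topicToSubject` is a hypothetical helper that only handles plain topics, swapping `/` for `.`, `+` for `*`, and `#` for `>`. It rejects topics with empty levels, dots, or spaces, which the real server treats with extra rules, and it ignores the fact that an MQTT `#` also matches the parent level while a NATS `>` needs at least one more token.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// topicToSubject converts a simple MQTT topic or topic filter to NATS subject syntax.
// Simplification: topics with empty levels, dots, or spaces are rejected here.
func topicToSubject(topic string) (string, error) {
	if topic == "" || strings.ContainsAny(topic, ". ") {
		return "", errors.New("topic not supported by this sketch")
	}
	levels := strings.Split(topic, "/")
	for i, level := range levels {
		switch level {
		case "":
			return "", errors.New("empty topic level not supported by this sketch")
		case "+":
			levels[i] = "*" // single-level wildcard
		case "#":
			if i != len(levels)-1 {
				return "", errors.New("'#' must be the last level")
			}
			levels[i] = ">" // multi-level wildcard
		}
	}
	return strings.Join(levels, "."), nil
}

func main() {
	for _, topic := range []string{
		"telemetry/eu/germany/meter80492",
		"telemetry/eu/+/#",
	} {
		subject, err := topicToSubject(topic)
		fmt.Println(topic, "->", subject, err)
	}
}
```

With a mapping like this, a backend service subscribed to `telemetry.eu.>` sees readings from a meter that only speaks MQTT, with no bridge process in between.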

The Paradigm Shift

For decades, distributed systems have been built around:

* IP addresses

* DNS names

* Blocking HTTP calls

* Explicit service discovery

NATS introduces a different idea:

Express interest in a semantic subject. Let an autonomic system route it dynamically.

In a world of:

* Real-time AI inference

* Autonomous energy assets

* Fluid containerized workloads

* Distributed edge computing

IP-based thinking becomes friction.

Subject-based thinking becomes leverage.

What This Means for Sourceful

Adopting NATS is not swapping a queue.

It is:

* Flattening internal service meshes

* Eliminating east-west load balancers

* Moving complexity into autonomic infrastructure

* Enabling edge-first resilience

* Protecting system health by design

* Running a global coordination backbone on a single optimized binary

The goal is operational simplicity.

Push complexity into the transport layer. Free engineers to focus on energy optimization logic.

If we get this right:

The infrastructure becomes invisible.

And the grid becomes programmable.



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit frahlg.substack.com

Coordinated with Fredrik, by Fredrik Ahlgren