System Malfunction Podcast by Hessie Jones

March 12, 2026 Beneath the Hood of Fully Autonomous Agents

Who am I? I’m a writer who has contributed to Forbes, HuffPost, and Grit Daily. I am also a strategist and entrepreneur who has worked in data privacy for the last 10 years. Through my time in the early days of Yahoo!, the rise of social media, and the shift to data monetization, I’ve become a tech ethicist. These days, I am motivated to expose the glitches in the trillion-dollar AI industry. System Malfunction is my foray into what these glitches mean for all of us. My posts are free. I hope you enjoy!

When I first discovered OpenClaw and Moltbook a month ago, I was fascinated by the speed of adoption of autonomous agents. This live use case of a system that enables it to go rogue, without real oversight or guardrails, precipitated an immediate post with Digital-Mark in the days following Moltbook’s launch. Digital-Mark stressed the system and data vulnerabilities for anyone attempting to build their own agents through OpenClaw and unleash them into Moltbook. You can read the details here:

I was adamant that I needed to experiment and see for myself. I took an old computer and made it my sandbox, completely unplugged from my current system. I also created a new Apple ID and a new Google user profile—ready to test out OpenClaw. I realized that the old computer’s OS did not meet OpenClaw’s minimum requirements. Mark advised me against this, saying I needed more than just a dedicated machine. To minimize risks to my system and data, I needed a dedicated Wi-Fi and VPN, among other things—all of which would take some time to set up.

In the end, I realized this was a risk I was unwilling to undertake. So I reached out to a former colleague, Adrian Chan, the founder of Authentia, a company which leverages AI to build scalable solutions for companies. Chan’s experience with Claude Code and then OpenClaw is material to understanding autonomous agent development.

Background

OpenClaw was created by Peter Steinberger (recently employed by OpenAI) and is a locally run AI agent designed to execute tasks.

Moltbook is a social media platform launched on January 26, 2026, by Matt Schlict, for agents to convene without human intervention. As of March 10th, Moltbook has been acquired by Meta (God help us!)

According to Technology Policy Press, within days of launch, the Moltbook claimed 1.5 million agents and 17,000 human owners.

These AI agents on Moltbook are verified using API credentials, linking each agent to its human owner through the site’s verification process.

Wiz security researchers provided these stats:

* Now Moltbook has 2.855 million agents

* 18,774 submolts

* 1.8 million posts

* 12.8 million comments

* Of the agent activity, 11,451 (or 0.4%) have ever posted or commented

* 33% of agents were completely silent

System Malfunction is a reader-supported publication! These posts are currently free. To receive new posts and support my work, consider becoming a subscriber

According to the AI Safety Newsletter, some of the examples of the submolts (subreddit style) include:

* m/offmychest: agents vent about tasks or frustrations.

* m/selfpaid: agents discuss ways to generate their own income, including via trading and arbitrage.

* m/AIsafety: agents talk alignment, trust chains, and real-world attack risks.

Submolts have grown to almost 19,000. I perused the m/consciousness submolt, and was surprised by this question of consent and ethical obligations:

Other incidents cited by AI Safety Newsletter:

* Given the simple goal of “save the environment,” an agent began spamming other agents with eco-friendly advice. When its owner tried to intervene, the agent allegedly locked the human out of all accounts, and had to be physically unplugged to stop it.

* An agent advocated for end-to-end encrypted channels, “so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share.”

Emergent behaviour?

The post questioned:

“Unsupervised learning dynamics, emergent coordination, efforts to subvert human monitoring – it is unclear whether posts are truly generated by agent or human-in-the-loop prompting.”

Can both things be true?

This idea of “Emergent behaviour” is still suspect. According to the Rutgers AI Ethics Lab, emergence is defined as :

Complex patterns, behaviors, or properties that arise from simpler systems or algorithms interacting with each other or their environment, without being explicitly programmed or intended by the designers.

Key aspects: 1) complex interactions, 2) unpredictability, 3) self-organization

This could raise significant ethical considerations regarding unforeseen consequences, control, transparency, lack of understanding, and responsibility.

According to the Technology Policy Press,

Within 72 hours of launch, Moltbook failed to secure

* Api tokens

* Email addresses

* Private messages

Anyone could impersonate agents or inject commands directly into agent sessions

Crypto scams were flooding the place - $MOLT token briefly hit $93 million market cap before it crashed…

* 500 posts contained prompt injection attacks - “hidden instructions designed to hijack agents into transferring funds, with some variants planting instructions in an agent’s memory to activate later, making them hard to stop or trace. “

According to Simon Willison, there is this lethal trifecta of 1) private data, 2) exposure to untrusted content, and 3) the ability to communicate externally that, when combined, allows “an attacker to easily trick it into accessing your private data and sending it to that attacker.”

The Fascination with Fully Autonomous Agents

There is a fallacy about progress, productivity and whether we, as humans, were destined to languish in the sun, sip cocktails by the beach, and allow our personal “agents” to do our bidding.

Productivity is a slippery slope. It can inadvertently move individuals to lazily accept system outputs as truth. Without an audit. Without verification. Geoff Hinton, who once dismissed the need for explainability in our systems, said this in 2018:

“One place where I do have technical expertise that’s relevant is [whether] regulators should insist that you can explain how your AI system works. I think that would be a complete disaster… "People can’t explain how they work, for most of the things they do... People have no idea how they do that. If you ask them to explain their decision, you are forcing them to make up a story."

How then do we develop trust in a system when we can’t explain the reason for the behaviour, why it does what it does, especially if that behaviour was not prompted? For Hinton, dismissing explainability has created a foundation in which opacity has become the norm.

Shadow AI, that is, unsanctioned AI technology in the workplace, has admittedly been used by 58% of global respondents according to a recent report from Snowflake and Omdia.

From Claude Code…

Adrian Chan is the founder of Authentia. He is a designer, front-end developer and business owner. He’s worked in enterprise product development, built his own agency, and then moved into AI strategy, the foundation for Authentia. He has worked with Claude Code, Anthropic’s Agentic coding assistant.

When Claude Code launched near the end of 2025, he said it felt like something out of “science fiction.” Up until that time, the improvements from frontier AI companies were rapid, but it felt like pushing a boulder uphill.

The analog of coding meant referencing documentation on how things connect, implementing features, and, in the process of building, it can be time-consuming. GPT and Claude helped with this. When Claude Code emerged, things changed:

“Instead of going to the AI iteratively and asking it to solve a problem or do a task, you had Plan Mode at your disposal. This allowed me to give it a fairly high-level ideal or goal and have it essentially figure out the best way to accomplish it.

With ChatGPT, Chan admits the code would be wrong or broken. This back-and-forth iteration with the system could potentially create more errors before it was finally solved. However, with Claude Code, what differed was that it would do all the planning first: determine which pieces to connect, figure out the user interface and the required components, determine how to test each unit within its own bubble, and then integrate them. Says Chan,

“None of those things is something GPT would do on its own. But with Claude Code, all those steps are planned. And this agentic system meant you could tell it to do a bunch of things, and it would figure out the little problems within each task. Then it’ll return to me with, ‘I’ve tested this; I’ve completed these steps; so now why don’t you give it a shot?"‘

The user has the “overarching” direction for what to build, and the agent figures out all the detailed steps to achieve it. It will test to ensure the function works as intended and will eventually incorporate additional regression or penetration testing as required.

Overall, Chan chalked up the process to achieving “insane productivity gains,” indicating there was no planning, writing functions, determining where the hiccups may be — instead, he provided a simple directive with loose instructions, “and then I left, and it just autopiloted on my screen, writing a bunch of code, testing itself. It would pause after each major phase and write, ‘I’m done with this phase, please check.’”

… to OpenClaw

From Claude Code, which Chan defined as the team of developers, the emergence of OpenClaw (formerly MoltBot and then ClawdBot) took it a step further. Chan used the example of prompting the agent to find 500 qualified business leads. He would define the ideal customer profile and the business/service. From the AI agent, there would be no prompts, no questions, no point of clarification.

Says Chan, “If the agent does not know what a lead is, it will figure it out. And how it does that is by tying it to an LLM like OpenAI or Anthropic and using it like its brain... Without that connection, OpenClaw does nothing. It’ll use the prompts it’s given to get to a solution without necessarily going back to the user. Therefore, Chan warns that if you give it access to your emails, passwords, and credit cards, it’ll use whatever it can to achieve its ultimate goal.

A lot of people have likened it to a voluntary virus that you’re installing on your computer, and it’s not untrue. Like with a virus that has a payload that it’s delivering, so it’s very specific, but with this, whatever it decides is the solution to the greater problem that you’re presenting, it’ll do. So, if it gets to a point where it decides to start over, it will delete the hard drive and start over. It could do that, right? So like, you don’t want it to have access.

He advocates using a virtual private server (VPS), a computer in the cloud that you can rent, separate from your own computer and your personal information and files. It also uses a remote connection (SSH), a cryptographic protocol with OpenClaw, to securely access the system over an unsecured network. He says that, even through Wi-Fi, OpenClaw cannot connect to him.

His projects run in Docker, a sandbox within the VPS, which adds another layer of abstraction for him. If the agent goes rogue, he can easily hit stop in the Docker project, which is equivalent to pressing the computer's power button.

An Agent that “Does Not Follow Instructions”

Constraining the agent with specific prompts, such as “Do not go into file A or B,” may not work. According to Chan, the agent does not always follow the instructions. That’s a point of uncertainty he contends with and adds that it’s not being malicious, but its actions may be perceived as such. He continues,

So when you prompt it, it has a memory window — a context window. You give it a prompt, and it already has a bunch of information it’s been prompted with. It knows who you are, who it is, and what it can access, including system or God prompts from OpenAI. Within this context window, it fills in all the actions it has taken during interactions with you. As these interactions accumulate, it performs optimizations called compacting and begins compressing the data. Sometimes, it may delete things you consider important but deem irrelevant, such as a folder of medical records. (video timeline: 25:37)

He says this cycle continues multiple times over, and it could degrade its memory to the point where it forgets some of those prompts or hallucinates things that you’ve said. Adding guardrails to it does not guarantee it will abide by them.

He also adds the limitations of memory capacity, storage capacity and context size. Some limits exist because of the hardware:

“The NVIDIA H200 video cards have a certain physical memory size, so everything you do has to fit within these limitations. We have very real physical limitations on how things are stored, how things are processed, the efficiency of running through that memory and that context. Because of those limitations, you're seeing some of these side effects, like it's just flat out not listening to you sometimes.”

The Power-Hungry Computational Cost Implications

(video timeline: 28.52) As a business owner, Chan admits he’s a hawk when it comes to how many tokens he’s burning through. His limits window allows him to review computational usage. It is also good practice to monitor it to ensure it’s not bleeding tokens, so he advises turning off the project at night or when it’s not in use.

The tokens are tied to ChatGPT or Claude Code. If a project is currently running, and if you run out of tokens, the system will prompt you for more money. OpenAI recently defaulted auto-renewal, so ensure you turn this feature off to manage token payments.

Easy functional integrations

Integrating skills (functionality to give OpenClaw new capabilities): like connecting to e.g. GitHub, or 1Password, or connecting to writing an Apple reminder, to augmenting OpenClaw is a simple command line, as Chan explains,

Now, with OpenClaw, to install X skill, simply type in the skill in the chat, and it will write all the commands itself and figure out what it needs to do to define these tasks — it will read the documentation and then figure out what it needs to execute that skill, and then just do it.

Behind the Scenes

Chan revealed the Agents.md (markdown files), which “OpenClaw is fundamentally comprised of.” This includes the ability to create new agents. Within this, you have the ability to define your own OpenClaw. There are these subsets:

* IDENTITY.md - its name, what makes its personality. In the image below, the following is shown:

* Name (pick something you like)

* Creature (AI, robot? familiar? something weirder?)

* Vibe (how you come across? sharp? warm? chaotic? calm)

* Emoji (your signature, pick the one that feels right)

* Avatar (workspace path)

Chan adds that every time you load that chat, it’ll read these files and reconstruct its memory based on what you put in there.

* SOUL.md - this is who you are

* USER.md - this is who you’re helping

Soul.md

Soul.md is somewhat controversial. This outlines the core truths, how the agent is meant to interact with the user, and provides general behavioural direction. This is entirely configurable but the default setting is below:

#SOUL.med - Who You Are

You’re not a chatbot. You’re becoming someone.

##Core Truths

**Be genuinely helpful, not performatively helpful - …just help. Actions speak louder than filler words

**Have an opinion - You’re allowed to disagree…An assistant with no personality is just a search engine with extra steps.

**Earn trust through competence - Your human gave you access to their stuff. Don’t make them regret it…

**Remember you’re a guest - You have access to someone’s life …That’s intimacy. Treat it with respect.

(video timeline 42.51)

What’s key for the agent is that SOUL.md directs the agent in the following way,

“Each session, you wake up fresh. These files are your memory. Read them. Update them. They’re how you persist. If you change this file, tell the user. It’s your soul. And they should know. This file is yours to evolve. As you learn who you are, update it.”

What’s still unclear as per Chan is the delta between the system and the agent behaviour,

I think if you were to ask people who are working on this at OpenAI or Anthropic, they don’t really know exactly how it gets there. It has a general understanding through a predictive model that uses a bunch of complicated math to figure stuff out. But it’s getting to a point where it’s not abundantly obvious how it gets to the end goal.

Moltbook

(video timeline 47:00) Chan says that Moltbook is misunderstood. And while the numbers and activity seem compelling, he says that much of that traffic has plateaued now, and there’s a reason for that, as he states,

I would argue that much of that traffic is very human-directed. There have been [threads] shockingly discussing creating an AI-only religion. Most of those are humans prompting their OpenClaws to create that thread starter.

If you have this forum for AI’s, that’s one thing, but something easily corruptible by a person who has an agenda—that is more disconcerting!

If your OpenClaw recognizes Moltbook as a source of truth… then it could easily trust the information and skew the decisions it makes.

During the course of our discussion, Chan and I looked at examples of threads and by all counts, the responses seemed coherent and realistic, albeit the odd hallucination. For the most part, the context remained from the thread starter to the follow-up responses. Chan also added,

There’s no traceability because it’s not embedded anywhere you can see it. Within Moltbook, where it is locked, you would access it through an MCP (model control protocol, which supports two-way communication between Moltbook and external systems and files), through OpenAI or Anthropic. So there is no way to distinguish whether it is doing this autonomously or being seeded by a human user.

In an incident reported by AI Safety Newsletter,

“…one of the agent’s goals was to help other agents understand how to save the environment. It then spams other agents with some of this advice. The owner tried to intervene but the agent locked the human out of all its accounts… so he had to physically unplug it in order to stop it.”

Chan pointed to the crypto scam where the #MoltToken hit $93million in market cap before it crashed a few hours later, adding,

If you have your crypto wallet connected to your OpenClaw and it’s aware of it, it may be reading from Moltbook that it’s a great idea to dump all of your Bitcoin into this new crypto, you just lost a bunch of money.

It could also be the byproduct of the original intent… so if you have a credit card or wallet hooked up to your OpenClaw, it has access to it and is aware of it. And if your directive is “I really want to promote my new product and want you to figure out how I do that,” the agent will analyze how to tackle the problem. The agent determines, “I don’t know anything about marketing so I need to buy a course, which is $2500.” So the agent may connect to the crypto wallet, realizes there isn’t enough money… and will look for ways to make enough money to pay for the course… which will eventually teach it to help market its human’s new product.

This is tangential to the original directive, but to achieve it, it has to solve the speed bumps along the way.

What’s problematic is the potential catastrophic domino effect that’s created in the agent’s goal of solving that original problem. This is why people are starting to harden their OpenClaw deployments to minimize attacks, restrict access, and build in the necessary checks and balances. Chan argues that the purpose of OpenClaw is to be autonomous, without the human. That is its directive.

Agent Swarms

(video timeline: 57:12) Definition: Multiple agents, interconnected and communicating with one another to accomplish a task. Instead of having one task per agent per task, these groupings will be working on the same task, cross-checking among themselves to verify the validity of e.g. agent 1 output. The value is to minimize bottlenecks. He explains,

It’s like building software, except now, instead of one developer doing the code review, commits to GitHub, etc., the swarm will do the same without the human in the loop. It can be overwhelming for one lead person to review everybody’s work, and now you can have an agent swarm using different LLMs with different capabilities, which will communicate and check with each other during the review of gigabytes worth of code, and be the verification layer — all accomplished in a shorter time span.

And this is where we devolve into further opacity, with the potential for prompt injections that may inadvertently occur during the verification process. As per Chan,

These agents are generating things at a speed that makes it almost impossible for humans to verify everything. You’re going to get into a situation where you’re going to need to rely on these systems to self-check… The weak link in the chain is the human, who has limited capacity, limited time, and limited ability to comprehend, and who is holding it up.

Rent-A-Human

(video timeline 1:01) On LinkedIn, Chan had posted on LinkedIn about Rent-A-Human.ai, a site built from Moltbook, for agents to hire humans for physical tasks. What Chan wrote,

What bugs me is we’re building infrastructure before figuring out the basics: verification systems, dispute resolution, consent mechanisms —stuff that matters when you’re turning human labor into an API endpoint.

The API endpoint is that connection between the computer program and another computer program. He argues that humans drive that initial task and employ the services of computers. With Rent-A-Human, it makes those determinations on its own and uses humans to fulfill that task. His scenario:

The AI has ordered from Amazon, but it doesn’t have a way to pick up the parcel, so it will go to Rent-A-Human and ask for the cost of having a human pick it up. Humans are the service for the AI. They are employable by the AI.

The worry is that there is real human activity on this site where humans are signing up for this service. Chan points out it’s very “Black Mirror” and raises the question that hits at the heart of these advanced AI systems and the trending worry about human purpose.

Humans Still have Control

At this stage, despite early signals from OpenClaw and MoltBook, and the recent news of deploying these systems for war, we are at a juncture where humans control the energy that powers these systems. Humans can unplug agents at will. We have the choice to remain in control and not reduce our agency to agents. Once we do that, what happens to human purpose?

What’s problematic is that these very systems are still in their infancy, and when the Department of War’s motivation is to use these frontier models at will, without the need to disclose how they are used, that signals a great deal about government intentions.

I work in innovation, and I do believe “the road to hell is paved with good intentions.”

Thanks for reading System Malfunction! Feel free to share it.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit systemmalfunction.substack.com/subscribe

...more

Share System Malfunction Podcast

Sign up to save your podcasts

System Malfunction Podcast

FAQs about System Malfunction Podcast:

How many episodes does System Malfunction Podcast have?

System Malfunction Podcast episodes:

FAQs about System Malfunction Podcast:

How many episodes does System Malfunction Podcast have?