Please support this podcast by checking out our sponsors:
- Discover the Future of AI Audio with ElevenLabs - https://try.elevenlabs.io/tad
- Prezi: Create AI presentations fast - https://try.prezi.com/automated_daily
- Build Any Form, Without Code with Fillout. 50% extra signup credits - https://try.fillout.com/the_automated_daily
Support The Automated Daily directly:
Buy me a coffee: https://buymeacoffee.com/theautomateddaily
Today's topics:
OpenClaw goes viral, sparks risk - OpenClaw’s rapid adoption highlights a new attack surface: local agent assistants inherit employee permissions, making marketplaces and “skills” a supply-chain risk. Keywords: OpenClaw, skills, permissions, prompt injection, enterprise security.
Enterprise governance for AI agents - Zenity’s webinars frame AI agents as “digital teammates” that need policy, least privilege, containment, and observability—plus practical governance for shadow AI. Keywords: AI security, governance, compliance, posture management, Zenity Learning Lab.
Agents in the wild: Manus and Telegram - Manus AI tried a near one-click always-on agent via Telegram as the persistent ‘front door’—then Telegram suspended the account, raising platform dependency questions. Keywords: Manus, Telegram, persistent memory, integrations, credit-based pricing.
Agentic coding: edit buttons and IDEs - From exe.dev’s ‘Edit with Shelley’ to xAI’s Grok Build parallel agents, agentic development is shifting toward wiki-like software and multi-agent IDE workflows. Keywords: slinky, Shelley, parallel agents, arena mode, Claude, IDE.
Microsoft builds MAI to diversify - Microsoft is building in-house MAI foundation models under Mustafa Suleyman while still hedging with OpenAI, Nvidia/AMD, and third-party models on Azure. Keywords: MAI, Maia chip, Fairwater data centers, Azure, OpenAI partnership.
Tokenizers and why links hallucinate - A reverse-engineering look at OpenAI’s o200k_base tokenizer suggests big efficiency gains for code, URLs, and non-Latin scripts—plus camelCase-aware pre-tokenization and tool-token variants. Keywords: tokenizer, tiktoken, o200k_base, camelCase, tool tokens.
RL for agents: Forge and Composition-RL - MiniMax’s Forge tackles the RL ‘impossible triangle’ for agentic models, while Composition-RL boosts RLVR by composing verifiable prompts—both aiming for stronger reasoning with scalable training. Keywords: reinforcement learning, agents, verifiable rewards, Forge, Composition-RL.
Data quality systems for human judgment - Welo Data argues AI fails quietly when human evaluation isn’t operationalized into calibrated, auditable workflows with drift monitoring and QA loops. Keywords: data quality, human judgment, calibration, auditability, drift detection.
AI backlash: slop, scams, bullying - Several pieces warn that AI’s harms are becoming everyday: harassment, deepfakes, scams, open-source maintainer overload, and a worsening ‘dead internet’ feel. Keywords: AI slop, bullying, scams, open source, disinformation.
Real-time speech agents: PersonaPlex - NVIDIA’s PersonaPlex-7B-v1 is a full-duplex speech-to-speech model that can listen and talk simultaneously, enabling interruptions and natural turn-taking for real-time voice agents. Keywords: speech-to-speech, full-duplex, PersonaPlex, Moshi, 24kHz.
- https://zenity.io/resources/webinars/openclaw-how-to-secure-agent-assistants
- https://zenity.io/resources/webinars/foundations-of-ai-security
- https://arxiv.org/abs/2602.11865
- https://github.com/HenryNdubuaku/maths-cs-ai-compendium
- https://blog.exe.dev/software-as-wiki
- https://www.testingcatalog.com/manus-ai-launched-24-7-agent-via-telegram-and-got-suspended/
- https://codemade.net/blog/building-for-one/
- https://winbuzzer.com/2026/02/13/microsoft-mustafa-suleyman-ai-self-sufficiency-openai-mai-models-xcxwbn/
- https://www.testingcatalog.com/xai-tests-parralel-agents-and-arena-mode-for-grok-build/
- https://metehan.ai/blog/reverse-engineering-the-gpt-5-tokenizer-aeo-geo/
- https://go.welodata.ai/l/976893/2026-01-23/8njgp
- https://joshcollinsworth.com/blog/sloptimism
- https://www.vulnu.com/p/the-problem-isnt-openclaw-its-the-architecture
- https://welodata.ai/ai-data-quality-systems-human-judgment-at-scale/
- https://x.com/neural_avb/status/2022715561390776524
- https://welodata.ai/ai-data-quality-systems/
- https://openrouter.ai/openrouter/free
- https://www.jeffgeerling.com/blog/2026/ai-is-destroying-open-source/
- https://steipete.me/posts/2026/openclaw
- https://venturebeat.com/infrastructure/nvidia-groq-and-the-limestone-race-to-real-time-ai-why-enterprises-win-or
- https://danielmiessler.com/blog/nobody-is-talking-about-generalized-hill-climbing
- https://arxiv.org/abs/2602.12036
- https://www.seangoedecke.com/fast-llm-inference/
- https://anthony.noided.media/blog/ai/programming/2026/02/14/i-guess-i-kinda-get-why-people-hate-ai.html
- https://huggingface.co/nvidia/personaplex-7b-v1
Episode Transcript
OpenClaw goes viral, sparks risk
Let’s start with the story that’s turning into the textbook example of modern agent risk: OpenClaw.
OpenClaw—previously known as ClawdBot and MoltBot—has spread fast because it’s genuinely useful. It runs locally, it can act on your behalf, and it’s designed to be extended with “skills.” And that’s exactly where the alarm bells are coming from.
One analysis argues the recent OpenClaw incidents aren’t just a framework bug—they’re a preview of an architectural problem: once you move from chatbots that merely answer questions to agents that take actions—running commands, editing files, calling APIs—your failure modes get sharper. A hallucinated answer is annoying; a hallucinated action can be destructive.
Security researchers reported a wave of malicious skills appearing in OpenClaw’s marketplace, ClawHub, distributed in a way that resembles a supply-chain attack. Some leaned on social engineering—“setup steps” that basically told users to paste suspicious commands into their terminal. OpenClaw responded by partnering with VirusTotal to scan third-party skills, which helps, but scanning can’t solve everything.
The deeper issue is what Simon Willison famously calls the ‘lethal trifecta’: private data access, untrusted input ingestion, and the ability to communicate externally. Put those three together, and prompt injection stops being an academic trick and becomes an operational exfiltration risk—especially when agents read emails, browse the web, or ingest tickets that attackers can influence.
So what do grown-up controls look like? The recommendations are refreshingly practical: sandbox runtimes—VMs, containers, separate machines or users—plus least-privilege credentials, default-deny network egress, allowlisted tools, and approvals for high-risk actions. And crucially: log what the agent actually did, not only what it said in chat.
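To make the allowlist-and-approval idea concrete, here is a minimal sketch in Python. The tool names, risk tiers, and dispatch stub are all illustrative assumptions, not OpenClaw's actual API; the point is the shape of the control: deny by default, gate high-risk calls on explicit approval, and log actions rather than chat.

```python
# Minimal sketch of an allowlist-plus-approval gate for agent tool calls.
# Tool names and risk tiers are illustrative, not OpenClaw's actual API.
ALLOWED_TOOLS = {"read_file": "low", "search_web": "low",
                 "write_file": "high", "run_shell": "high"}

audit_log = []  # record what the agent *did*, not just what it said

def execute_tool(name, args, approve):
    """Run a tool call only if allowlisted; high-risk calls need approval."""
    if name not in ALLOWED_TOOLS:
        audit_log.append(("denied", name, args))
        raise PermissionError(f"tool {name!r} is not allowlisted")
    if ALLOWED_TOOLS[name] == "high" and not approve(name, args):
        audit_log.append(("rejected", name, args))
        raise PermissionError(f"approval required for {name!r}")
    audit_log.append(("executed", name, args))
    return f"ran {name}"  # stand-in for the real tool dispatch
```

The audit log is the part most real deployments skip: it captures denied and rejected attempts too, which is exactly the signal an incident responder needs.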
Now, OpenClaw is also at the center of a talent-and-ecosystem twist. Peter Steinberger, the creator, announced he’s joining OpenAI to work on ‘bringing agents to everyone,’ with the stated goal of building an agent his mother can use. He says OpenClaw will remain open source and will move into a foundation to keep it independent—while OpenAI sponsors the project and he works from inside the lab. It’s an interesting hybrid: community governance on the outside, frontier access on the inside. The next question is whether the security posture matures at the same speed as usability.
And if you’re wondering why enterprises are suddenly paying attention, Zenity is basically building an educational and marketing runway around this exact moment. Zenity is pushing an on-demand webinar specifically about OpenClaw security—framing the risk in plain language: because OpenClaw runs on an employee device with employee permissions, it can access sensitive data and perform actions as that user. For a company, that’s not a ‘tool install’—it’s a new identity and automation layer that inherits all the messiness of endpoints.
Zenity is also running a three-part series called ‘Foundations of AI Security: What, Why, and How.’ The focus is agents as ‘digital teammates’—already appearing across enterprises—and the sessions aim to cover what agents are, how they’re attacked, and how to regain governance. There’s even a professional certificate for completing the series. And, yes, the registration is gated behind the usual form fields and marketing consent—so it’s education, but it’s also lead gen. Still, it’s a sign: agent security is becoming its own discipline, not a footnote in cloud security.
Enterprise governance for AI agents
Staying on agents, let’s talk about where the research community thinks this is heading: delegation.
A new arXiv paper titled ‘Intelligent AI Delegation’ argues that as agents tackle more complex tasks, they need something better than ad-hoc heuristics for splitting work and handing it off—whether that handoff is to another agent, or to a human.
What’s notable is the framing: delegation isn’t just task assignment. The paper explicitly calls out transfer of authority, responsibility, and accountability. That’s a big deal, because most agent systems today can route sub-tasks, but they’re fuzzy about who is responsible when something fails—especially in multi-party networks where an agent delegates to an agent that delegates again.
The authors propose treating delegation as a sequence of task-allocation decisions, with clear role boundaries and mechanisms for trust between delegators and delegatees. If you’ve ever watched an agent workflow go sideways and then asked, ‘Wait—why did it do that, and who approved it?’—this is an attempt to make that question answerable.
It’s also a hint at an ‘agentic web’ where delegation is a protocol-level feature, not just a prompt pattern. If that happens, the governance layer—logging, permissions, accountability—becomes just as important as the model quality.
Agents in the wild: Manus and Telegram
Now let’s zoom out to consumer-ish agents and platform realities.
Manus AI launched an ‘Agents’ feature aiming for personal agents with identity and persistent memory, designed to run always-on on a dedicated computer instance. The onboarding flow advertises integrations with Telegram, WhatsApp, Facebook Messenger, and Line—yet at launch, Telegram appears to be the only one that actually works.
The design choice is clever: Telegram becomes the persistent front door. You link your Telegram account, Manus spins up a dedicated chat, and that same thread is mirrored in the Manus web app and mobile clients. The pitch is low friction: connect chat, add tools, install skills, and you’ve got a 24/7 agent.
But the platform dependency showed up immediately—Telegram suspended the new Manus AI always-on agent account shortly after launch, with no clear public explanation from Telegram.
If you’re building ‘always-on’ assistants on top of consumer messaging platforms, you’re building on rented land. Even if your product works technically, it can be throttled by policy, abuse concerns, or just a platform deciding it doesn’t like automated agents in its ecosystem.
And there’s another practical constraint: Manus is credit-based. Long-running agent workflows can chew through credits fast, so pricing transparency becomes a core feature, not a billing detail. Agents are not like single-shot chats; they’re more like background processes that never stop wanting to spend tokens.
Agentic coding: edit buttons and IDEs
Let’s switch to agentic coding—because we’re seeing two distinct trends: ‘software as a wiki’ and ‘multi-agent IDEs.’
First, a small but revealing story from exe.dev: they built an internal link shortener called slinky, and the unusual part is an ‘Edit with Shelley’ button. Click it, and you drop into an agent named Shelley running on the same VM as the app—ready to modify the software directly.
In the example, the author wanted template-style parameters in short links—so a short URL like /trace/foo could expand into a very long, heavily escaped Honeycomb URL where ‘foo’ is substituted into a query. He gave the agent instructions, and within minutes it implemented the feature in a one-shot change.
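The feature the agent one-shotted can be sketched in a few lines. This is a hypothetical reconstruction, not slinky's actual code, and the Honeycomb URL template is invented for illustration: a slug maps to a URL template, and the trailing path segment is escaped and substituted in.

```python
from urllib.parse import quote

# Hypothetical reconstruction of the template-style short-link feature.
# The Honeycomb URL here is a made-up example, not a real query.
TEMPLATES = {
    "trace": "https://ui.honeycomb.io/query?filter=trace.trace_id%3D{param}",
}

def expand(short_path):
    """Expand '/trace/foo' into the full templated URL, escaping the param."""
    _, slug, param = short_path.split("/", 2)
    return TEMPLATES[slug].format(param=quote(param, safe=""))
```

The escaping step matters: the whole point of the feature was substituting a short token into a heavily escaped query string without breaking it.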
The thesis is provocative: some internal tools can be treated like a wiki—if you don’t like it, you click edit and change it. It’s a powerful idea, but it also underlines why agent permissions and containment matter. An ‘edit’ button that can change production tooling is amazing—until it’s an attacker’s button.
Second, a much more ambitious direction: xAI’s Grok Build is reportedly evolving from a ‘vibe coding’ assistant into a browser-based IDE. The most interesting rumored capability is ‘Parallel Agents,’ where one prompt can spawn multiple coding agents at once—apparently up to eight concurrent agents across two models, with side-by-side output and context usage tracking.
There’s also talk of an ‘Arena mode’—a tournament-style workflow that scores or ranks outputs to automatically surface the best result. If this works, it’s a shift from ‘the model gives you one answer’ to ‘the system runs a small competition and hands you the winner.’
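The fan-out-and-pick-a-winner pattern is simple to express. This sketch assumes agents are plain callables and the scoring function is supplied by the caller; none of this reflects Grok Build's actual internals, which are unannounced.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the rumored 'Parallel Agents' + 'Arena mode' idea: fan one
# prompt out to several agents concurrently, score each result, and
# return the winner. Agents and score() are stand-ins, not xAI's API.
def arena(prompt, agents, score, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda agent: agent(prompt), agents))
    return max(results, key=score)
```

Even this toy version surfaces the open question from the episode: every losing candidate still costs full tokens, so the scoring function has to be worth the multiplied spend.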
The open question is how reliably it improves outcomes versus just multiplying token spend—and whether agent orchestration becomes the real product while the base model becomes a replaceable component.
On the practical, hands-on side of AI-assisted coding, we got a great build log from someone who made a KDE Plasma task switcher.
Loris Bognanni built ‘FastTab’ to fix a niche but real annoyance: KDE’s Gallery task switcher on X11 can take up to a second to open. His alternative is written in Zig, uses OpenGL, and runs as a daemon so it can respond instantly.
What makes it relevant for AI news is the process: he started with zero Zig and X11 internals experience, and used Claude to prototype quickly—then iterated to something usable. But he’s clear about the tradeoffs.
He emphasizes incremental development with git, careful diff review, and containment. Manually approving every agent command was exhausting, but giving full permissions felt reckless—so he used locked-down Docker containers to isolate the agent from his real filesystem. He also learned to explicitly describe the container’s constraints—no display access, no package installs—so the model wouldn’t waste tokens trying impossible steps.
His most honest takeaway is that AI gets you 80% fast, but the last 20% is still engineering: the first version was a huge, messy file with duplication and no tests. He had to refactor for maintainability before further AI help became safe. That’s probably the pattern we’ll see for a while: models accelerate prototypes, but humans still own the architecture, the risk management, and the final polish.
Microsoft builds MAI to diversify
Let’s hit a major platform story: Microsoft moving toward self-sufficiency.
Microsoft says it’s building its own ‘MAI’ models to reduce reliance on OpenAI, led by AI chief Mustafa Suleyman. This is a clear strategic adjustment from the earlier era where Microsoft leaned heavily on OpenAI for the brains behind Copilot.
The shift accelerated after a 2025 restructuring of the Microsoft–OpenAI relationship that gave both sides more flexibility. Microsoft’s pitch is ‘true self-sufficiency’—which, translated, means: if you want to own the future of your product stack, you can’t rent your core models forever.
They’re investing in infrastructure too: the Maia 200 accelerator for cost-efficient inference, and the Fairwater data center network described as an AI ‘superfactory.’ At the same time, Microsoft is still buying Nvidia and AMD hardware, and still hosting a menu of models on Azure—Anthropic, Meta’s Llama, Mistral—alongside OpenAI.
So this isn’t a breakup; it’s diversification. Microsoft even has API access to OpenAI models secured through 2032 and still holds a big stake in OpenAI. The story here is leverage: controlling more of the model stack gives Microsoft negotiating power, cost control, and the ability to tailor models for specific product needs.
Tokenizers and why links hallucinate
Two deeply technical but surprisingly practical stories today involve tokenization and training.
First: a detailed reverse-engineering effort on OpenAI’s o200k_base tokenizer—the one reportedly underpinning GPT-4o, GPT-5, and other ‘o’ models. The author reconstructs artifacts using tiktoken and a public vocabulary file hosted on Azure.
The headline finding is that English prose gains little efficiency as the vocabulary grows from roughly 50k to 100k to 200k entries—but code, URLs, and non-Latin scripts gain a lot. That matches what many developers feel: newer models ‘handle code and multilingual text better,’ and tokenization is part of the reason.
A standout detail: the pre-tokenization regex appears to be camelCase and PascalCase aware—splitting identifiers like camelCaseVariable into more meaningful chunks before BPE merges. That’s a small implementation choice with big downstream benefits for code.
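A simplified version of camelCase-aware splitting looks like this. To be clear, this regex is a toy illustration of the idea, not the real o200k_base pre-tokenization pattern, which is far more elaborate (contractions, digit runs, whitespace handling, and more).

```python
import re

# Toy illustration of camelCase/PascalCase-aware pre-tokenization:
# split identifier-style words at lowercase->uppercase boundaries
# before any BPE merging happens. NOT the real o200k_base regex.
def pre_tokenize(text):
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|\S", text)
```

Splitting `camelCaseVariable` into `camel`, `Case`, `Variable` means BPE can reuse the merges it already learned for those common words, instead of memorizing every identifier whole.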
The author also highlights a tool-oriented tokenizer variant—o200k_harmony—with roughly 1,000+ tool/control tokens and reserved slots, suggesting a protocol layer for tool use without changing the base text vocabulary.
And there’s a practical hypothesis: when important entities are single tokens, models may be less error-prone than when they must assemble them from multiple tokens. That could contribute to hallucinated links and date mistakes—where the model stitches together plausible fragments. It doesn’t ‘explain everything,’ but it’s a concrete mechanism that helps connect low-level representation to high-level failure modes.
RL for agents: Forge and Composition-RL
Second: on training, there’s a thread of momentum in reinforcement learning for agents.
MiniMax’s M2.5 model has been turning heads for being fast, cheap, and strong at coding, and a summary of their technical write-up explains how they scaled RL for agentic tasks. MiniMax describes an ‘impossible triangle’: high throughput, training stability, and agent flexibility across many real-world scaffolds.
Their system, Forge, decouples generation from training. Agents generate trajectories asynchronously, middleware logs everything into a pool, and the engine trains later—so you don’t have to lockstep the whole system. They also shape rewards around not just correctness, but tool-call validity, speed, and even encouraging parallel tool use. Plus they use tricks like prefix trees to reuse shared dialogue prefixes—reportedly yielding huge training speedups.
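The decoupling can be sketched with a shared trajectory pool between rollout workers and the trainer. The reward weights below are invented for illustration; the write-up describes the shape of the reward (correctness plus tool-call validity plus parallel tool use), not these exact numbers.

```python
import queue

# Sketch of Forge's decoupling idea: rollouts push trajectories into a
# shared pool, and the trainer consumes batches later, so generation and
# training never run in lockstep. Reward weights are illustrative.
pool = queue.Queue()

def rollout_worker(trajectories):
    for traj in trajectories:  # stands in for live async agent rollouts
        pool.put(traj)

def shaped_reward(traj):
    # correctness plus bonuses for valid tool calls and parallel tool use
    return (traj["correct"] + 0.1 * traj["valid_tool_calls"]
            + 0.05 * traj["parallel_tool_calls"])

def train_step(batch_size=2):
    batch = [pool.get() for _ in range(batch_size)]
    return sum(shaped_reward(t) for t in batch) / batch_size
```

In a real system the pool would be durable middleware rather than an in-process queue, but the control flow, generate asynchronously, train later, is the same.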
In the same spirit of getting more mileage from limited training signals, there’s a new arXiv paper called Composition-RL. It targets a subtle RLVR problem: as a model improves, many prompts become ‘pass-rate 1’—too easy to provide learning signal. Composition-RL composes multiple easy-but-verifiable prompts into a new verifiable challenge, keeping the training informative. They report consistent reasoning gains across models from 4B to 30B parameters, and they’ve released code and datasets.
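The core trick is easy to sketch: chain several individually verifiable prompts into one composite whose answer is still mechanically checkable. The chaining scheme here (summing integer answers) is a toy assumption for illustration, not the paper's actual composition operator.

```python
# Toy sketch of the Composition-RL idea: compose easy, verifiable
# prompts into one harder prompt that remains verifiable. The
# sum-based chaining is an assumption, not the paper's method.
def compose(problems):
    """problems: list of (prompt, integer_answer) pairs."""
    prompt = " Then: ".join(p for p, _ in problems)
    prompt += " Report the sum of all answers."
    answer = sum(a for _, a in problems)
    return prompt, answer

def verify(expected, model_output):
    return model_output == expected
```

The payoff is that prompts the model already solves at pass-rate 1 regain training value once combined, because the composite is harder while the verifier stays exact.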
This is the underlying theme of 2026 so far: agent capability is not only about bigger pretraining. It’s about better post-training loops, better verification, and better system design around the model.
Data quality systems for human judgment
Before we wrap, two stories about trust, quality, and the social surface area of AI.
On the enterprise side, Welo Data argues that AI projects fail quietly not because the model is weak, but because human judgment—evaluation and labeling—can’t be explained, repeated, or defended at scale. Their point is that once programs expand across teams, countries, languages, and risk profiles, unstructured human judgment drifts. People interpret guidelines differently, disagreements get papered over, and then suddenly nobody can reconstruct why a dataset was labeled a certain way.
Their prescription is to operationalize judgment: calibrated evaluators, continuous monitoring for drift, structured QA loops with escalation paths, and auditability where every decision is traceable. They’re also skeptical of scaling via LLM ‘judges’ without governed oversight—because inconsistency and bias can get amplified, not reduced.
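One piece of that prescription, drift monitoring, can be sketched simply: compare a recent window of labels against a calibrated baseline and flag divergence. Total variation distance and the 0.2 threshold here are illustrative choices, not Welo Data's methodology.

```python
from collections import Counter

# Toy sketch of label-drift monitoring: compare the label distribution
# of a recent window against a calibrated baseline using total
# variation distance. The metric and threshold are illustrative.
def label_distribution(labels):
    n = len(labels)
    return {k: v / n for k, v in Counter(labels).items()}

def drift(baseline, window):
    p, q = label_distribution(baseline), label_distribution(window)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

def needs_escalation(baseline, window, threshold=0.2):
    return drift(baseline, window) > threshold
```

The auditable part is the other half: a flag like this only helps if it routes into a QA loop where someone can reconstruct which guideline interpretation shifted.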
AI backlash: slop, scams, bullying
Separately, a personal essay takes aim at what it calls ‘AI optimism’—arguing that the downside risks are landing unevenly and that optimism can be a kind of privilege. The author describes trying a ‘personalized roast’ tool based on a GitHub profile and being surprised by how emotionally cutting it felt—then extrapolating to what that means for bullying, especially for kids, when AI can generate targeted cruelty at scale.
The essay also points to deepfakes, scams, propaganda, and flawed deployments in policing and justice—plus the more mundane but pervasive experience of platforms being flooded with low-quality generated content.
Those concerns connect directly to open-source realities. Jeff Geerling argues agentic AI and AI-generated ‘slop’ are already harming open source. He cites an Ars Technica retraction after hallucinated quotes, harassment directed at maintainers, and the curl project ending its bug bounty because AI-driven submissions tanked the signal-to-noise ratio—useful reports fell from around 15% to 5%. His broader claim is simple: reviewer time is the bottleneck, and AI is currently multiplying the work maintainers must do to keep ecosystems healthy.
Whether you see these as growing pains or structural problems, the direction is clear: the cost of verification—of claims, of code, of media—is rising. And society is not yet staffed, tooled, or incentivized to handle that load gracefully.
Real-time speech agents: PersonaPlex
Finally, one of the most exciting releases today if you care about real-time voice agents: NVIDIA’s PersonaPlex-7B-v1.
PersonaPlex is a speech-to-speech conversational model built for full-duplex dialogue—meaning it can listen and speak at the same time. That’s a big shift from the ‘push to talk’ feel of many voice assistants, because natural conversation includes interruptions, overlaps, quick backchannels, and barge-ins.
Technically, it operates on continuous audio via a neural codec and autoregressively predicts both text tokens and audio tokens. It’s conditioned on two prompts: an audio voice prompt to set speaking style, and a text prompt to define persona—role, background, scenario.
NVIDIA says it’s optimized for their GPUs—Ampere and Hopper—and trained on Fisher English speech conversations. It’s designed for English-in, English-out today, and evaluated on a benchmark focused on conversational dynamics like turn-taking and interruption latency.
If full-duplex voice becomes common, it’s going to make agents feel less like ‘systems you query’ and more like ‘participants.’ That’s powerful—and it also raises the bar again for safety, logging, and consent, because real-time systems can do damage faster than humans can intervene.
Subscribe to edition specific feeds:
- Space news
* Apple Podcast English
* Spotify English
* RSS English Spanish French
- Top news
* Apple Podcast English Spanish French
* Spotify English Spanish French
* RSS English Spanish French
- Tech news
* Apple Podcast English Spanish French
* Spotify English Spanish French
* RSS English Spanish French
- Hacker news
* Apple Podcast English Spanish French
* Spotify English Spanish French
* RSS English Spanish French
- AI news
* Apple Podcast English Spanish French
* Spotify English Spanish French
* RSS English Spanish French
Visit our website at https://theautomateddaily.com/
Send feedback to [email protected]
Youtube
LinkedIn
X (Twitter)