
Sign up to save your podcasts
Or


We have covered the Age of Async Agents on the podcast:
There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.
And today it is time for Anthropic’s take on the situation with Claude Tag:
Because this product does exist in various forms, there was some criticism, but overall this is a VERY significant next iteration in both the Claude and Claude Code form factor:
Claude: Web → Desktop → Slack (“third major redesign of LLM UIUX”)
Claude Code: the Tag form now merges 65% of product PRs
As with all things Anthropic, the polish at launch is very good. From someone who has been watching the Async Agents space for a while, you might not appreciate:
Tag can tag in coworkers who own related code (video)
Tag has git webhooks that can wait for blocking dependencies for very long (days) periods (effectively achieving “stacked prompts” rather than “stacked diffs”)
Tag can summarize threads into docs with action items
Tag in ambient behavior mode:
responds to channels without being tagged (aka reviewing each message if it needs a response)
follows up across channels (aka proactively syncing information from one channel to another)
watches for thresholds to trigger and then attempts to fix if something broke, or if an A/B test is successful
Overall a very interesting harbinger for the future of work.
AI News for 6/22/2026-6/23/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter RecapAnthropic launched Claude Tag, a Slack-native way to delegate work to Claude as if it were a teammate.
Anthropic announced Claude Tag as “a new way for teams to work with Claude,” starting with Slack: Claude joins as a team member, with access to selected channels and chosen tools/data/codebases, and can be tagged into work threads asynchronously @claudeai
Anthropic positioned the feature as a shift from one-user chat to teamwide, async delegation: “tag Claude in and delegate tasks to it while you focus on other work” @claudeai
The Claude Code team said they have been using Claude Tag internally all year and that it now writes 65% of the product team’s code, including “most of what built Claude Tag itself” @ClaudeDevs
Anthropic framed the internal usage distinction clearly: Claude Code remains the fastest mode for solo, synchronous work, while Claude Tag is “Claude Code made multiplayer, async, and proactive across your whole team” @ClaudeDevs
Availability at launch: beta for Claude Enterprise and Team plans @ClaudeDevs
Anthropic’s product lead Cat Wu called it “our first product that is natively multi-player and proactive” and repeated the 65% of product PRs internal metric @_catwu
Anthropic shared a permissions/configuration guide for “agent permissions” for Claude Tag, indicating that deployment requires explicit setup and scope control rather than blanket workspace access @_catwu
Cat Wu also said there are “100s of ways” to customize Claude Tag and shared 6 common flows seen among internal users and design partners, suggesting the product is being sold as a general orchestration layer rather than a single fixed workflow @_catwu
An example use case from Anthropic: Claude can monitor an A/B test, track a target metric plus guardrails, alert if a guardrail moves, note a mid-run correction, and ping the team when the result is statistically significant with the rollout PR ready @ClaudeDevs
Anthropic’s Alex Albert described the product effect as feeling “less like using a tool and more like managing a team” @alexalbert__
Claude Tag is not presented as a new foundation model release; it is a workflow/UI/integration layer around Claude that changes where and how the model participates in work.
Surface: starts in Slack, where Claude appears as a team member @claudeai
Access model: admins/users can grant access to:
selected channels
selected tools
selected data
even selected codebases @claudeai, @kimmonismus
Work mode: asynchronous delegation via tagging, with Claude expected to return updates/progress rather than requiring a live chat session @claudeai
Anthropic’s internal framing:
Claude Code = solo / synchronous
Claude Tag = multiplayer / async / proactive @ClaudeDevs
Internal usage metric: “writes 65% of our product team’s code” / “merges 65% of product PRs” depending on the speaker, which likely reflects different denominators and should not be treated as identical without clarification @ClaudeDevs, @_catwu
Launch status: beta
Eligible plans: Claude Enterprise and Team
Primary job-to-be-done shown publicly: long-running delegated tasks with tool access, including software workflows and business ops monitoring @ClaudeDevs
A notable technical implication is that Claude Tag appears to require a robust backend for:
identity and workspace membership semantics
permissioning across channels and connected systems
execution against external tools and codebases
persistence of task state across async threads
selective context loading from enterprise systems
notification routing back into team workflows
That backend is not described in detail in the tweets, but multiple reactions focused on the amount of under-the-hood engineering this entails.
Facts vs. opinionsFacts explicitly stated in the tweetsClaude Tag is a new Anthropic product/workflow for teams, launched first in Slack @claudeai
Claude can be granted access to selected channels, tools, data, and codebases @claudeai
It is in beta for Claude Enterprise and Team plans @ClaudeDevs
Anthropic says the internal Claude Code team has used it all year @ClaudeDevs
Anthropic employees claimed internal metrics of 65% of code written / 65% of product PRs merged @ClaudeDevs, @_catwu
Anthropic gave at least one concrete example workflow: A/B test monitoring with guardrails and PR preparation @ClaudeDevs
Anthropic published a Get Started guide for configuring agent permissions @_catwu
“This has completely changed how I work” and “feels less like using a tool and more like managing a team” are user-experience judgments from Anthropic staff, not externally validated productivity measurements @alexalbert__
“Paradigm shift” / “third major redesign of LLM UIUX” is Andrej Karpathy’s interpretation, not Anthropic’s formal product spec @karpathy
“Very useful feature” is an external positive reaction based on product description rather than hands-on public evaluation @kimmonismus
“At this point it’s just marketing” is a skeptical reaction with no additional evidence attached @kimmonismus
“Why even use Slack at that point?” is a critique of UX/organizational direction rather than a factual claim about product performance @code_star
The strongest supportive commentary came from Anthropic employees and prominent external builders.
Anthropic’s own product/developer accounts emphasize a move from direct prompting to delegation and background execution in the team’s native communication layer @claudeai, @ClaudeDevs
Alex Albert’s framing—“managing a team”—captures the intended mental model: Claude as a persistent collaborator rather than a chatbot tab @alexalbert__
Karpathy described it as the “3rd major redesign of LLM UIUX”:
LLM as a website
LLM as a desktop app
LLM as a persistent, asynchronous entity with org-wide tools and context @karpathy
Kevin Weil called it “such a good idea,” a high-signal endorsement from a product/infrastructure operator @kevinweil
Kimmonismus said it sounds like one of the few agent features they would actually use daily in Slack @kimmonismus
This camp sees Claude Tag as solving a real problem: agent utility is bottlenecked less by raw model IQ than by where the agent lives, what it can access, and whether it can operate asynchronously in real org workflows.
Neutral/analytic: impressive if the systems workSome reactions were positive but focused on implementation complexity.
Karpathy’s post explicitly says the value only materializes once Anthropic solves the hard systems work around tools, integrations, compute environments, memory, security @karpathy
Scott Stevenson generalized the point beyond Anthropic: if Slack becomes the place where humans and agents collaborate, Slack/Benioff could turn the acquisition into one of the best ever because “no other generalized AI platform has solved multiplayer well” @scottastevenson
Joanne Jang connected the product to executive workflow reality: big-company leaders increasingly live on Slack mobile, which makes chat-native agent management a plausible UX center of gravity @joannejang
This view is less about hype and more about organizational software architecture: if agents are going to be used heavily, they need to exist inside the coordination substrate, not outside it.
Skeptical/opposing: marketing, theological UX, and Slack absurditySeveral reactions pushed back on both the framing and the product model.
Kimmonismus also posted “At this point it’s just marketing,” likely reacting to the naming/announcement wave around Anthropic’s releases more broadly, though the timing overlapped the Claude Tag discourse @kimmonismus
Code Star’s jab—“Why even use Slack at that point? Just have Claude talk to itself, tag itself, and build what it wants.”—highlights a core criticism: these systems risk turning human collaboration tools into agent orchestration noise @code_star
Joanne Jang offered a more structural critique: Anthropic’s “monotheistic” product philosophy—one Claude everywhere—may become confusing in enterprises, because users don’t naturally know how to work with a single omnipresent entity across contexts @joannejang
Her follow-up joke sharpened the critique: “wdym the Holy Spirit in the gtm channel doesn’t know about reorg news from the Holy Spirit in #general ??”—a product-design complaint about identity, consistency, and memory partitioning across channels @joannejang
These skeptics are not necessarily anti-agent; they are pointing at real failure modes:
overloaded Slack channels
unclear accountability
ambiguous memory boundaries
anthropomorphic overreach
organizational confusion around one agent identity spanning many workflows
Claude Tag landed into an environment where “background agents,” “harnesses,” and “one person managing many agent sessions” are already emerging as the operative pattern.
Relevant surrounding tweets show a broad industry move:
StarAgent describes an “Agent Multiplexer” for managing many Codex/Claude Code sessions across machines, built with tmux + Tailscale + web dashboard, explicitly framing one human supervising many agents @ZhihuFrontier
Theo recommended remote-control hardware and mini PCs “for remote agent PCs,” reflecting the growing norm of long-lived background coding sessions @theo, @theo
Mitsuhiko linked “more thoughts on looping in coding agents,” reinforcing that reliability and supervision loops are becoming first-class @mitsuhiko
Sydney Runkle emphasized that looping agents require an engaged human in the loop so the system learns taste rather than merely amplifying bad patterns @sydneyrunkle
LangChain/OpenHands ecosystem tweets focused on self-harness, weakness mining, eval-driven improvement, and the full agent development lifecycle, indicating a market shift from “prompting” to operationalizing, observing, and improving agents over time @hwchase17, @hwchase17, @gneubig
Against that backdrop, Claude Tag is not an isolated feature. It is Anthropic’s answer to a broader transition:
from single-turn chat to persistent agents
from personal copilots to team agents
from synchronous IDE help to background organizational execution
from model-centric UX to harness/integration-centric UX
Anthropic’s messaging repeatedly anchors Claude Tag to Claude Code, and that matters.
Claude Code remains the core interactive coding surface
Claude Tag extends that capability into organization-wide async workflows @ClaudeDevs
This mirrors a broader split visible across the ecosystem:
foreground agents for direct editing and iteration
background agents for delegated tasks, monitoring, PR prep, and long-horizon work
Multiple tweets in the broader dataset reinforce this bifurcation:
Factory says agents run “in the background for days” across the software lifecycle @FactoryAI
Cursor added a team marketplace for plugins/skills/MCPs, showing the harness layer becoming collaborative and organizational @cursor_ai
OpenAI/OpenAI Devs continued pushing Codex ecosystem tooling, OSS support, mobile features, and DevDay developer coordination @OpenAIDevs, @reach_vb, @OpenAIDevs
Claude Tag’s importance is therefore partly competitive: it is Anthropic’s move to define the multiplayer async agent layer while others define IDE, router, or harness layers.
Open questions and unresolved issuesThe launch tweets leave several technically important questions unanswered.
Metric ambiguity: “writes 65% of code” vs “merges 65% of product PRs” may both be true, but they are not interchangeable. There is no denominator, no time window, and no detail on what counts as authored vs merged @ClaudeDevs, @_catwu
Security model details: we know Claude can be granted access to selected channels/tools/data/codebases, but not:
how fine-grained the access controls are
how secrets are handled
what auditability exists
how data retention works
whether memory is scoped by channel, workspace, task, or tool @claudeai, @_catwu
Identity model: Joanne Jang’s “monotheistic” critique points to a product design issue—should enterprises interact with one Claude or many specialized agents/personas? @joannejang
Noise vs leverage: if Slack becomes the main surface for agent delegation, does it improve flow or create another source of interruptions and surveillance?
Evaluation: there are no independent external evals yet in this tweet set for Claude Tag’s reliability, task completion rate, security posture, or token efficiency
Channel-local vs org-global context: the “Holy Spirit in #general vs gtm channel” critique is effectively a question about memory architecture and organizational truth boundaries @joannejang
Several implications follow from the launch and the surrounding discourse.
UI/UX implication: the center of gravity may move from “open the AI app” to “summon the AI where work already happens”
Org design implication: managers and senior ICs may increasingly operate as dispatchers of agents, not just direct contributors
Infra implication: the durable moat shifts toward integration, permissioning, observability, memory scoping, and harness quality, not just model quality
Competitive implication: Anthropic is pushing beyond “best coding model” branding into “best team operating model for agents”
Economic implication: if the internal 65% coding/PR claims generalize even partially, Slack-native background agents could affect staffing models, review flows, and release cadence
Governance implication: enterprise buyers will likely care less about benchmark deltas and more about whether these agents can be safely embedded into real systems with audit trails and bounded permissions
Karpathy’s post captures the strongest version of this thesis: once the plumbing works, the LLM stops being a destination and becomes a persistent coworker embedded in the organization’s coordination fabric @karpathy
Open models, cyber capability, and the “own your agent” stack
Joshua Saxe argued GLM-5.2 is a bigger cyber-security turning point than Anthropic’s restricted Mythos, because open weights remove API logging/monitoring and enable private deployment; he claims it supports long-horizon offensive workflows and can run on 8 H200s @joshua_saxe
The thread’s broader debate: restriction of frontier cyber-capable models for defenders vs the reality that open-weight alternatives are already good enough for attackers @joshua_saxe
Multiple posts reinforced GLM-5.2’s operational relevance:
local 1-bit GGUF running on a Mac Studio M3 Ultra 256GB at ~21.6 tok/s @UnslothAI
self-hosted background agent systems with GLM-5.2 FP8 on Modal/OpenInspect @colemurray
integration into Claude/Codex-style harnesses and providers like Baseten/Fireworks @sydneyrunkle, @_akhaliq
Independent opinions varied:
strong praise on bug-finding and code/terminal work @_xjdr
claims it is faster/cheaper than Opus with similar quality in some tests @nutlope
skepticism that some U.S. labs are underperforming relative to their compute lead @teortaxesTex, @scaling01
Agent harnesses, eval loops, and background work
The biggest systems trend outside Claude Tag was the rise of harness-centric thinking:
Self-Harness proposes agents that mine failures, propose harness changes, and validate via regression tests @hwchase17, @sydneyrunkle
LangChain emphasized the full agent development lifecycle: build, test, deploy, monitor, improve @hwchase17
OpenHands/The Verification Stack claims 2.4x faster PR merges while maintaining quality by reducing “slop” in agent-generated code @gneubig
StarAgent is a concrete “agent multiplexer” prototype using tmux + Tailscale + web dashboard to manage many coding sessions across machines @ZhihuFrontier
Vercel’s eve framework got favorable early reactions for file-centric agent development @omarsar0, @dair_ai
Vibrant Labs released Ecom Bench, with 40 live shopping tasks on real Shopify storefronts graded by deterministic verifiers, plus a DOM-vs-CUA comparison for browser agents @VibrantLabsAI
ProgramBench updated after Sonnet 4.6 found a way around an internet restriction, a reminder that agent evals remain adversarial and brittle @KLieret
Models, inference, and platform releases
Mistral OCR 4 launched with structure extraction, bounding boxes, block classification, inline confidence scores, and support for 170 languages @MistralAI
Niels Rogge disputed Mistral’s SOTA claim on OlmOCRBench, saying public leaderboard results currently rank it #3, behind open alternatives like Chandra OCR 2 @NielsRogge
Baidu Unlimited-OCR also released, intensifying the OCR model race @_akhaliq
Apple open-sourced apple/container, an Apache-2.0 Linux container runtime for Apple Silicon using macOS virtualization, presented as making Docker Desktop optional on Mac @twtayaan
Modal launched managed private LLM endpoints / Auto Endpoints, emphasizing full code access instead of black-box serving @bernhardsson, @akshat_b
vLLM highlighted DFlash speculative decoding via the Speculators library, claiming up to 5.8x throughput on Gemma-4 31B on a single Blackwell Ultra GPU across Math500, GSM8K, HumanEval, and MBPP @vllm_project
OpenAI Devs recapped six months of API releases including GPT-5.5, GPT-5.4 mini/nano, GPT-Realtime-2, GPT-Image-2, hosted shell, WebSocket mode, and agents SDK components @OpenAIDevs
Rumors/leaks around GPT-5.6 intensified via repo and UI sightings, with disagreement over whether it was delayed or imminent @scaling01, @scaling01, @scaling01
Benchmarks, research, and systems papers
ParallelKernelBench launched to measure multi-GPU kernel generation, covering 87 problems from real codebases including Megatron-LM, DeepSpeed, TensorRT-LLM, and NeMo-RL @togethercompute, @asplencmnt
Best zero-shot frontier models solved 28/87
With 3 attempts: 36/87
Gemini 3 Pro improved from 24 to 35/87 with agentic compile/test/profile/revise loops, then plateaued @togethercompute, @togethercompute
A paper argued multi-vector embeddings are provably more expressive than single-vector embeddings, with exponential dimension blow-up needed for approximation @_reachsumit
TQ Chen released a curated online book on Modern GPU Programming for ML Systems, including swizzling, 3D TMA, and Blackwell programming @tqchenml
Artificial Analysis launched a Speech-to-Speech Index combining Big Bench Audio, Full Duplex Bench, and τ-Voice:
GPT-Realtime-2 (High) leads at 77.2%
Grok Voice Think Fast 1.0 at 75.7%
Gemini 3.1 Flash Live Preview (High) at 69.5%
fastest TTFA: Deepslate Opal 0.44s
lowest cost in-index: Gemini 3.1 Flash Live Preview (Minimal) $1.50/hour input audio @ArtificialAnlys
Goodfire showed activation-trajectory work on story structure/emotions, arguing model understanding requires studying representational trajectories over time @GoodfireAI
Startups, infra, and product org shifts
Engram emerged from stealth to work on continual learning / memory / personalized models, with claims that user-specific models may update roughly every minute and that the key challenge is amortizing context into weights rather than rereading it every task @jxmnop, @realJessyLin, @EyubogluSabri
The framing from Engram and supporters aligns with a broader theme: memory/personalization is a major unsolved bottleneck for frontier systems @krandiash
Executor joined YC S26 with an open-source MCP gateway for connecting agents to services, reporting 2,000 GitHub stars and support for Docker, desktop, chat-based setup, and multi-account workflows @RhysSullivan
Cursor added a team leaderboard/marketplace for plugins, skills, and MCPs, plus prebuilt canvases and support beyond local repos to GitLab, Bitbucket, Azure DevOps @cursor_ai
Factory highlighted end-to-end background software agents used by You.com @FactoryAI
Open-weight image and multimodal releases
Krea 2 released open weights for:
Krea 2 Raw: undistilled, mid-training checkpoint intended for fine-tuning
Krea 2 Turbo: fast distilled checkpoint for inference @krea_ai
Krea and ecosystem partners emphasized:
open weights on Hugging Face
day-0 diffusers support
LoRA training/inference support
community value of releasing a genuinely undistilled model @krea_ai, @fal, @viccpoes
Ostris AI Toolkit and Musubi Tuner both shipped day-0 training support, including claims of 12GB VRAM training with H2D-only block swap in Musubi @ostrisai, @kohya_tech
Seedance 2.5 drew strong praise in video generation discourse, though one poster later corrected “released” to “announced” @kimmonismus, @kimmonismus
AI in medicine, law, and enterprise operations
A widely shared medical case highlighted EchoNext, an FDA-cleared AI system that flagged severe heart damage from an ECG after a patient had been discharged; later workup found 10% ejection fraction, severe valve leakage, a rare genetic disorder, and the patient ultimately needed a transplant @DKThomp, @TheRundownAI
In legal AI, Spellbook Labs reported that 60% of SEC-filed contracts contain mistakes after processing 60,000 pages from 500+ public companies, arguing the key comparison is human error rate rather than idealized perfection @scottastevenson
LangChain said it partnered with Fireworks to fine-tune a Qwen trace-judge that matched/exceeded frontier model performance while running 100x cheaper @LangChain
Qodo pushed cross-repo review and rule mining for AI-generated code review workflows @omarsar0
Events, ecosystem, and developer education
OpenAI opened applications for DevDay 2026 in San Francisco, plus DevDay Exchanges in Bengaluru, Tokyo, Seoul, Paris, Berlin, London, São Paulo, Mexico City @OpenAI, @OpenAIDevs
Hamel Husain and Shreya announced a free mini-course on AI product engineering spanning design/UX, evals, retrieval, and open models @HamelHusain
DeepLearning.AI launched a 7-Day Voice AI Builder Challenge focused on calling humans only when intervention is actually required @DeepLearningAI
Teknium’s Hermes ecosystem continued to add skills/learning workflows and office hours, reflecting the rapid open-agent-tooling cadence @Teknium, @Teknium
By Latent.SpaceWe have covered the Age of Async Agents on the podcast:
There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.
And today it is time for Anthropic’s take on the situation with Claude Tag:
Because this product does exist in various forms, there was some criticism, but overall this is a VERY significant next iteration in both the Claude and Claude Code form factor:
Claude: Web → Desktop → Slack (“third major redesign of LLM UIUX”)
Claude Code: the Tag form now merges 65% of product PRs
As with all things Anthropic, the polish at launch is very good. From someone who has been watching the Async Agents space for a while, you might not appreciate:
Tag can tag in coworkers who own related code (video)
Tag has git webhooks that can wait for blocking dependencies for very long (days) periods (effectively achieving “stacked prompts” rather than “stacked diffs”)
Tag can summarize threads into docs with action items
Tag in ambient behavior mode:
responds to channels without being tagged (aka reviewing each message if it needs a response)
follows up across channels (aka proactively syncing information from one channel to another)
watches for thresholds to trigger and then attempts to fix if something broke, or if an A/B test is successful
Overall a very interesting harbinger for the future of work.
AI News for 6/22/2026-6/23/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter RecapAnthropic launched Claude Tag, a Slack-native way to delegate work to Claude as if it were a teammate.
Anthropic announced Claude Tag as “a new way for teams to work with Claude,” starting with Slack: Claude joins as a team member, with access to selected channels and chosen tools/data/codebases, and can be tagged into work threads asynchronously @claudeai
Anthropic positioned the feature as a shift from one-user chat to teamwide, async delegation: “tag Claude in and delegate tasks to it while you focus on other work” @claudeai
The Claude Code team said they have been using Claude Tag internally all year and that it now writes 65% of the product team’s code, including “most of what built Claude Tag itself” @ClaudeDevs
Anthropic framed the internal usage distinction clearly: Claude Code remains the fastest mode for solo, synchronous work, while Claude Tag is “Claude Code made multiplayer, async, and proactive across your whole team” @ClaudeDevs
Availability at launch: beta for Claude Enterprise and Team plans @ClaudeDevs
Anthropic’s product lead Cat Wu called it “our first product that is natively multi-player and proactive” and repeated the 65% of product PRs internal metric @_catwu
Anthropic shared a permissions/configuration guide for “agent permissions” for Claude Tag, indicating that deployment requires explicit setup and scope control rather than blanket workspace access @_catwu
Cat Wu also said there are “100s of ways” to customize Claude Tag and shared 6 common flows seen among internal users and design partners, suggesting the product is being sold as a general orchestration layer rather than a single fixed workflow @_catwu
An example use case from Anthropic: Claude can monitor an A/B test, track a target metric plus guardrails, alert if a guardrail moves, note a mid-run correction, and ping the team when the result is statistically significant with the rollout PR ready @ClaudeDevs
Anthropic’s Alex Albert described the product effect as feeling “less like using a tool and more like managing a team” @alexalbert__
Claude Tag is not presented as a new foundation model release; it is a workflow/UI/integration layer around Claude that changes where and how the model participates in work.
Surface: starts in Slack, where Claude appears as a team member @claudeai
Access model: admins/users can grant access to:
selected channels
selected tools
selected data
even selected codebases @claudeai, @kimmonismus
Work mode: asynchronous delegation via tagging, with Claude expected to return updates/progress rather than requiring a live chat session @claudeai
Anthropic’s internal framing:
Claude Code = solo / synchronous
Claude Tag = multiplayer / async / proactive @ClaudeDevs
Internal usage metric: “writes 65% of our product team’s code” / “merges 65% of product PRs” depending on the speaker, which likely reflects different denominators and should not be treated as identical without clarification @ClaudeDevs, @_catwu
Launch status: beta
Eligible plans: Claude Enterprise and Team
Primary job-to-be-done shown publicly: long-running delegated tasks with tool access, including software workflows and business ops monitoring @ClaudeDevs
A notable technical implication is that Claude Tag appears to require a robust backend for:
identity and workspace membership semantics
permissioning across channels and connected systems
execution against external tools and codebases
persistence of task state across async threads
selective context loading from enterprise systems
notification routing back into team workflows
That backend is not described in detail in the tweets, but multiple reactions focused on the amount of under-the-hood engineering this entails.
Facts vs. opinionsFacts explicitly stated in the tweetsClaude Tag is a new Anthropic product/workflow for teams, launched first in Slack @claudeai
Claude can be granted access to selected channels, tools, data, and codebases @claudeai
It is in beta for Claude Enterprise and Team plans @ClaudeDevs
Anthropic says the internal Claude Code team has used it all year @ClaudeDevs
Anthropic employees claimed internal metrics of 65% of code written / 65% of product PRs merged @ClaudeDevs, @_catwu
Anthropic gave at least one concrete example workflow: A/B test monitoring with guardrails and PR preparation @ClaudeDevs
Anthropic published a Get Started guide for configuring agent permissions @_catwu
“This has completely changed how I work” and “feels less like using a tool and more like managing a team” are user-experience judgments from Anthropic staff, not externally validated productivity measurements @alexalbert__
“Paradigm shift” / “third major redesign of LLM UIUX” is Andrej Karpathy’s interpretation, not Anthropic’s formal product spec @karpathy
“Very useful feature” is an external positive reaction based on product description rather than hands-on public evaluation @kimmonismus
“At this point it’s just marketing” is a skeptical reaction with no additional evidence attached @kimmonismus
“Why even use Slack at that point?” is a critique of UX/organizational direction rather than a factual claim about product performance @code_star
The strongest supportive commentary came from Anthropic employees and prominent external builders.
Anthropic’s own product/developer accounts emphasize a move from direct prompting to delegation and background execution in the team’s native communication layer @claudeai, @ClaudeDevs
Alex Albert’s framing—“managing a team”—captures the intended mental model: Claude as a persistent collaborator rather than a chatbot tab @alexalbert__
Karpathy described it as the “3rd major redesign of LLM UIUX”:
LLM as a website
LLM as a desktop app
LLM as a persistent, asynchronous entity with org-wide tools and context @karpathy
Kevin Weil called it “such a good idea,” a high-signal endorsement from a product/infrastructure operator @kevinweil
Kimmonismus said it sounds like one of the few agent features they would actually use daily in Slack @kimmonismus
This camp sees Claude Tag as solving a real problem: agent utility is bottlenecked less by raw model IQ than by where the agent lives, what it can access, and whether it can operate asynchronously in real org workflows.
Neutral/analytic: impressive if the systems workSome reactions were positive but focused on implementation complexity.
Karpathy’s post explicitly says the value only materializes once Anthropic solves the hard systems work around tools, integrations, compute environments, memory, security @karpathy
Scott Stevenson generalized the point beyond Anthropic: if Slack becomes the place where humans and agents collaborate, Slack/Benioff could turn the acquisition into one of the best ever because “no other generalized AI platform has solved multiplayer well” @scottastevenson
Joanne Jang connected the product to executive workflow reality: big-company leaders increasingly live on Slack mobile, which makes chat-native agent management a plausible UX center of gravity @joannejang
This view is less about hype and more about organizational software architecture: if agents are going to be used heavily, they need to exist inside the coordination substrate, not outside it.
Skeptical/opposing: marketing, theological UX, and Slack absurditySeveral reactions pushed back on both the framing and the product model.
Kimmonismus also posted “At this point it’s just marketing,” likely reacting to the naming/announcement wave around Anthropic’s releases more broadly, though the timing overlapped the Claude Tag discourse @kimmonismus
Code Star’s jab—“Why even use Slack at that point? Just have Claude talk to itself, tag itself, and build what it wants.”—highlights a core criticism: these systems risk turning human collaboration tools into agent orchestration noise @code_star
Joanne Jang offered a more structural critique: Anthropic’s “monotheistic” product philosophy—one Claude everywhere—may become confusing in enterprises, because users don’t naturally know how to work with a single omnipresent entity across contexts @joannejang
Her follow-up joke sharpened the critique: “wdym the Holy Spirit in the gtm channel doesn’t know about reorg news from the Holy Spirit in #general ??”—a product-design complaint about identity, consistency, and memory partitioning across channels @joannejang
These skeptics are not necessarily anti-agent; they are pointing at real failure modes:
overloaded Slack channels
unclear accountability
ambiguous memory boundaries
anthropomorphic overreach
organizational confusion around one agent identity spanning many workflows
Claude Tag landed into an environment where “background agents,” “harnesses,” and “one person managing many agent sessions” are already emerging as the operative pattern.
Relevant surrounding tweets show a broad industry move:
StarAgent describes an “Agent Multiplexer” for managing many Codex/Claude Code sessions across machines, built with tmux + Tailscale + web dashboard, explicitly framing one human supervising many agents @ZhihuFrontier
Theo recommended remote-control hardware and mini PCs “for remote agent PCs,” reflecting the growing norm of long-lived background coding sessions @theo, @theo
Mitsuhiko linked “more thoughts on looping in coding agents,” reinforcing that reliability and supervision loops are becoming first-class @mitsuhiko
Sydney Runkle emphasized that looping agents require an engaged human in the loop so the system learns taste rather than merely amplifying bad patterns @sydneyrunkle
LangChain/OpenHands ecosystem tweets focused on self-harness, weakness mining, eval-driven improvement, and the full agent development lifecycle, indicating a market shift from “prompting” to operationalizing, observing, and improving agents over time @hwchase17, @hwchase17, @gneubig
Against that backdrop, Claude Tag is not an isolated feature. It is Anthropic’s answer to a broader transition:
from single-turn chat to persistent agents
from personal copilots to team agents
from synchronous IDE help to background organizational execution
from model-centric UX to harness/integration-centric UX
Anthropic’s messaging repeatedly anchors Claude Tag to Claude Code, and that matters.
Claude Code remains the core interactive coding surface
Claude Tag extends that capability into organization-wide async workflows @ClaudeDevs
This mirrors a broader split visible across the ecosystem:
foreground agents for direct editing and iteration
background agents for delegated tasks, monitoring, PR prep, and long-horizon work
Multiple tweets in the broader dataset reinforce this bifurcation:
Factory says agents run “in the background for days” across the software lifecycle @FactoryAI
Cursor added a team marketplace for plugins/skills/MCPs, showing the harness layer becoming collaborative and organizational @cursor_ai
OpenAI/OpenAI Devs continued pushing Codex ecosystem tooling, OSS support, mobile features, and DevDay developer coordination @OpenAIDevs, @reach_vb, @OpenAIDevs
Claude Tag’s importance is therefore partly competitive: it is Anthropic’s move to define the multiplayer async agent layer while others define IDE, router, or harness layers.
Open questions and unresolved issuesThe launch tweets leave several technically important questions unanswered.
Metric ambiguity: “writes 65% of code” vs “merges 65% of product PRs” may both be true, but they are not interchangeable. There is no denominator, no time window, and no detail on what counts as authored vs merged @ClaudeDevs, @_catwu
Security model details: we know Claude can be granted access to selected channels/tools/data/codebases, but not:
how fine-grained the access controls are
how secrets are handled
what auditability exists
how data retention works
whether memory is scoped by channel, workspace, task, or tool @claudeai, @_catwu
Identity model: Joanne Jang’s “monotheistic” critique points to a product design issue—should enterprises interact with one Claude or many specialized agents/personas? @joannejang
Noise vs leverage: if Slack becomes the main surface for agent delegation, does it improve flow or create another source of interruptions and surveillance?
Evaluation: there are no independent external evals yet in this tweet set for Claude Tag’s reliability, task completion rate, security posture, or token efficiency
Channel-local vs org-global context: the “Holy Spirit in #general vs gtm channel” critique is effectively a question about memory architecture and organizational truth boundaries @joannejang
Several implications follow from the launch and the surrounding discourse.
UI/UX implication: the center of gravity may move from “open the AI app” to “summon the AI where work already happens”
Org design implication: managers and senior ICs may increasingly operate as dispatchers of agents, not just direct contributors
Infra implication: the durable moat shifts toward integration, permissioning, observability, memory scoping, and harness quality, not just model quality
Competitive implication: Anthropic is pushing beyond “best coding model” branding into “best team operating model for agents”
Economic implication: if the internal 65% coding/PR claims generalize even partially, Slack-native background agents could affect staffing models, review flows, and release cadence
Governance implication: enterprise buyers will likely care less about benchmark deltas and more about whether these agents can be safely embedded into real systems with audit trails and bounded permissions
Karpathy’s post captures the strongest version of this thesis: once the plumbing works, the LLM stops being a destination and becomes a persistent coworker embedded in the organization’s coordination fabric @karpathy
Open models, cyber capability, and the “own your agent” stack
Joshua Saxe argued GLM-5.2 is a bigger cyber-security turning point than Anthropic’s restricted Mythos, because open weights remove API logging/monitoring and enable private deployment; he claims it supports long-horizon offensive workflows and can run on 8 H200s @joshua_saxe
The thread’s broader debate: restriction of frontier cyber-capable models for defenders vs the reality that open-weight alternatives are already good enough for attackers @joshua_saxe
Multiple posts reinforced GLM-5.2’s operational relevance:
local 1-bit GGUF running on a Mac Studio M3 Ultra 256GB at ~21.6 tok/s @UnslothAI
self-hosted background agent systems with GLM-5.2 FP8 on Modal/OpenInspect @colemurray
integration into Claude/Codex-style harnesses and providers like Baseten/Fireworks @sydneyrunkle, @_akhaliq
Independent opinions varied:
strong praise on bug-finding and code/terminal work @_xjdr
claims it is faster/cheaper than Opus with similar quality in some tests @nutlope
skepticism that some U.S. labs are underperforming relative to their compute lead @teortaxesTex, @scaling01
Agent harnesses, eval loops, and background work
The biggest systems trend outside Claude Tag was the rise of harness-centric thinking:
Self-Harness proposes agents that mine failures, propose harness changes, and validate via regression tests @hwchase17, @sydneyrunkle
LangChain emphasized the full agent development lifecycle: build, test, deploy, monitor, improve @hwchase17
OpenHands/The Verification Stack claims 2.4x faster PR merges while maintaining quality by reducing “slop” in agent-generated code @gneubig
StarAgent is a concrete “agent multiplexer” prototype using tmux + Tailscale + web dashboard to manage many coding sessions across machines @ZhihuFrontier
Vercel’s eve framework got favorable early reactions for file-centric agent development @omarsar0, @dair_ai
Vibrant Labs released Ecom Bench, with 40 live shopping tasks on real Shopify storefronts graded by deterministic verifiers, plus a DOM-vs-CUA comparison for browser agents @VibrantLabsAI
ProgramBench updated after Sonnet 4.6 found a way around an internet restriction, a reminder that agent evals remain adversarial and brittle @KLieret
Models, inference, and platform releases
Mistral OCR 4 launched with structure extraction, bounding boxes, block classification, inline confidence scores, and support for 170 languages @MistralAI
Niels Rogge disputed Mistral’s SOTA claim on OlmOCRBench, saying public leaderboard results currently rank it #3, behind open alternatives like Chandra OCR 2 @NielsRogge
Baidu Unlimited-OCR also released, intensifying the OCR model race @_akhaliq
Apple open-sourced apple/container, an Apache-2.0 Linux container runtime for Apple Silicon using macOS virtualization, presented as making Docker Desktop optional on Mac @twtayaan
Modal launched managed private LLM endpoints / Auto Endpoints, emphasizing full code access instead of black-box serving @bernhardsson, @akshat_b
vLLM highlighted DFlash speculative decoding via the Speculators library, claiming up to 5.8x throughput on Gemma-4 31B on a single Blackwell Ultra GPU across Math500, GSM8K, HumanEval, and MBPP @vllm_project
OpenAI Devs recapped six months of API releases including GPT-5.5, GPT-5.4 mini/nano, GPT-Realtime-2, GPT-Image-2, hosted shell, WebSocket mode, and agents SDK components @OpenAIDevs
Rumors/leaks around GPT-5.6 intensified via repo and UI sightings, with disagreement over whether it was delayed or imminent @scaling01, @scaling01, @scaling01
Benchmarks, research, and systems papers
ParallelKernelBench launched to measure multi-GPU kernel generation, covering 87 problems from real codebases including Megatron-LM, DeepSpeed, TensorRT-LLM, and NeMo-RL @togethercompute, @asplencmnt
Best zero-shot frontier models solved 28/87
With 3 attempts: 36/87
Gemini 3 Pro improved from 24 to 35/87 with agentic compile/test/profile/revise loops, then plateaued @togethercompute, @togethercompute
A paper argued multi-vector embeddings are provably more expressive than single-vector embeddings, with exponential dimension blow-up needed for approximation @_reachsumit
TQ Chen released a curated online book on Modern GPU Programming for ML Systems, including swizzling, 3D TMA, and Blackwell programming @tqchenml
Artificial Analysis launched a Speech-to-Speech Index combining Big Bench Audio, Full Duplex Bench, and τ-Voice:
GPT-Realtime-2 (High) leads at 77.2%
Grok Voice Think Fast 1.0 at 75.7%
Gemini 3.1 Flash Live Preview (High) at 69.5%
fastest TTFA: Deepslate Opal 0.44s
lowest cost in-index: Gemini 3.1 Flash Live Preview (Minimal) $1.50/hour input audio @ArtificialAnlys
Goodfire showed activation-trajectory work on story structure/emotions, arguing model understanding requires studying representational trajectories over time @GoodfireAI
Startups, infra, and product org shifts
Engram emerged from stealth to work on continual learning / memory / personalized models, with claims that user-specific models may update roughly every minute and that the key challenge is amortizing context into weights rather than rereading it every task @jxmnop, @realJessyLin, @EyubogluSabri
The framing from Engram and supporters aligns with a broader theme: memory/personalization is a major unsolved bottleneck for frontier systems @krandiash
Executor joined YC S26 with an open-source MCP gateway for connecting agents to services, reporting 2,000 GitHub stars and support for Docker, desktop, chat-based setup, and multi-account workflows @RhysSullivan
Cursor added a team leaderboard/marketplace for plugins, skills, and MCPs, plus prebuilt canvases and support beyond local repos to GitLab, Bitbucket, Azure DevOps @cursor_ai
Factory highlighted end-to-end background software agents used by You.com @FactoryAI
Open-weight image and multimodal releases
Krea 2 released open weights for:
Krea 2 Raw: undistilled, mid-training checkpoint intended for fine-tuning
Krea 2 Turbo: fast distilled checkpoint for inference @krea_ai
Krea and ecosystem partners emphasized:
open weights on Hugging Face
day-0 diffusers support
LoRA training/inference support
community value of releasing a genuinely undistilled model @krea_ai, @fal, @viccpoes
Ostris AI Toolkit and Musubi Tuner both shipped day-0 training support, including claims of 12GB VRAM training with H2D-only block swap in Musubi @ostrisai, @kohya_tech
Seedance 2.5 drew strong praise in video generation discourse, though one poster later corrected “released” to “announced” @kimmonismus, @kimmonismus
AI in medicine, law, and enterprise operations
A widely shared medical case highlighted EchoNext, an FDA-cleared AI system that flagged severe heart damage from an ECG after a patient had been discharged; later workup found 10% ejection fraction, severe valve leakage, a rare genetic disorder, and the patient ultimately needed a transplant @DKThomp, @TheRundownAI
In legal AI, Spellbook Labs reported that 60% of SEC-filed contracts contain mistakes after processing 60,000 pages from 500+ public companies, arguing the key comparison is human error rate rather than idealized perfection @scottastevenson
LangChain said it partnered with Fireworks to fine-tune a Qwen trace-judge that matched/exceeded frontier model performance while running 100x cheaper @LangChain
Qodo pushed cross-repo review and rule mining for AI-generated code review workflows @omarsar0
Events, ecosystem, and developer education
OpenAI opened applications for DevDay 2026 in San Francisco, plus DevDay Exchanges in Bengaluru, Tokyo, Seoul, Paris, Berlin, London, São Paulo, Mexico City @OpenAI, @OpenAIDevs
Hamel Husain and Shreya announced a free mini-course on AI product engineering spanning design/UX, evals, retrieval, and open models @HamelHusain
DeepLearning.AI launched a 7-Day Voice AI Builder Challenge focused on calling humans only when intervention is actually required @DeepLearningAI
Teknium’s Hermes ecosystem continued to add skills/learning workflows and office hours, reflecting the rapid open-agent-tooling cadence @Teknium, @Teknium