UX - The User Experience Podcast

UX and AI Digest Episode 2


Evaluating AI Agents, Claude's Computer Access & Prompt-Only Enterprise Software

🔬 EVA: A Framework for Evaluating Voice Agents

  • I hadn't realised we lacked a proper evaluation framework for voice agents; this one from Hugging Face caught my attention
  • What I like: it combines two dimensions I've always thought should go together, accuracy (task completion, faithfulness, speech fidelity) and experience (conciseness, conversational flow, turn-taking)
  • My question: is the "experience" side actually measured with real end users, or just by the designers?
  • This connects to a three-step evaluation model I keep coming back to: define your ingredients, evaluate internally, then validate with users, and compare the gap
  • I'll dedicate a full episode to this, but the short version is: if you want to elicit trust or satisfaction, you need to know which product attributes actually produce those outcomes
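
The three-step model above (define your ingredients, evaluate internally, validate with users, then compare the gap) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EVA's actual API: the attribute names are taken from the two dimensions listed above, while the 0-1 scale, the scores, and the `gap_report` and `largest_gaps` helpers are hypothetical.

```python
# Hypothetical sketch of the "compare the gap" step.
# Attribute names come from the bullets above; every score is a made-up
# placeholder on an assumed 0-1 scale, not real EVA output.

ACCURACY = ["task_completion", "faithfulness", "speech_fidelity"]
EXPERIENCE = ["conciseness", "conversational_flow", "turn_taking"]


def gap_report(internal, users):
    """Return {attribute: user score - internal score} for attributes rated by both."""
    return {
        name: round(users[name] - internal[name], 2)
        for name in ACCURACY + EXPERIENCE
        if name in internal and name in users
    }


def largest_gaps(report, n=3):
    """Return the n attributes where the two views disagree most."""
    return sorted(report.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


# Step 1: define your ingredients (the attribute lists above).
# Step 2: evaluate internally (placeholder numbers).
internal_scores = {"task_completion": 0.9, "conciseness": 0.8, "turn_taking": 0.7}
# Step 3: validate with users (placeholder numbers).
user_scores = {"task_completion": 0.85, "conciseness": 0.5, "turn_taking": 0.75}

for name, gap in largest_gaps(gap_report(internal_scores, user_scores)):
    print(f"{name}: {gap:+.2f}")
```

In this sketch, a large negative gap means the team rated an attribute higher than users did, which is probably the first place to look when trust or satisfaction falls short.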

🤖 Claude + Cowork: AI With Access to Your Computer

  • Cowork now lets you authorise Claude to access your files and folders so it can act on your behalf even when you're away
  • I'm genuinely torn: amazed by the technology, but uncomfortable with the direction
  • My concern isn't the capability itself, it's the pattern: LLMs arrive, and suddenly we open the gates to everything (recording, transcription, computer access) as if these things naturally belong together
  • My rule of thumb: always assume your data is being used to improve the product; if you have doubts, assume yes
  • I'd love to see more push for private, self-hosted LLMs, but the honest tension is that commercial ones will keep winning on convenience because they have more data to train on
  • It's not even apples to apples, and that's what makes this hard

🖥️ Aragon: What If Enterprise Software Was Just a Prompt?

  • Startup Aragon raised $12M at a $100M valuation to replace enterprise tools like Salesforce, Jira, and Tableau with a single LLM interface
  • Their thesis: buttons and menus are dead, future business is done by prompt
  • My honest reaction: I get why this is being explored. We're mapping the edges of a new territory and seeing what sticks
  • But one modality for everything? I'm not convinced. When I was building my own website, I actually wanted both: LLM for generation, drag-and-drop for fine-tuning. That product barely exists yet
  • Users have 10+ years of muscle memory with their tools; strip that away and you're not simplifying, you're adding friction
  • Nielsen's heuristics exist for a reason: people need control, exit doors, and multiple ways to accomplish a task


UX - The User Experience Podcast, by Jeremy