Learning GenAI via SOTA Papers

EP124: FRIDAY the AI that runs your computer



The paper introduces OS-Copilot, a framework for building generalist computer agents that interact with an entire operating system (Linux and macOS) rather than a single, narrow application. The framework gives agents a unified interface to diverse OS elements, including the web, code terminals, files, multimedia, and third-party applications.

Leveraging this framework, the authors developed FRIDAY, a self-improving embodied AI assistant designed to automate general computer tasks. FRIDAY operates using a three-part architecture:

  • A directed acyclic graph-based planner that decomposes user requests and allows for the parallel execution of independent subtasks to save time.
  • A configurator modeled after human memory, with declarative, procedural, and working memory components, that retrieves relevant tools and knowledge.
  • An actor that proposes and executes actions within the OS, utilizing a self-criticism module to evaluate success, fix execution errors, and save newly generated tools for future use.
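The planner's idea of running independent subtasks in parallel can be sketched with the Python standard library. This is only an illustration under assumptions, not FRIDAY's actual code: the function `run_plan`, the subtask names, and the placeholder actions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_plan(dependencies, actions, max_workers=4):
    """Run subtasks in dependency order; independent subtasks run together.

    dependencies: maps each subtask name to the set of subtasks it waits on.
    actions: maps each subtask name to a zero-argument callable.
    Returns (results by subtask, list of batches that were run in parallel).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    results, batches = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every subtask whose prerequisites have all finished.
            ready = sorted(sorter.get_ready())
            batches.append(ready)
            futures = {name: pool.submit(actions[name]) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results, batches


# Hypothetical plan: two independent subtasks feed a merge step.
plan = {
    "download_report": set(),
    "read_spreadsheet": set(),
    "merge_data": {"download_report", "read_spreadsheet"},
    "write_summary": {"merge_data"},
}
# Placeholder actions; a real actor would execute code in the OS here.
actions = {name: (lambda name=name: f"{name}: ok") for name in plan}

results, batches = run_plan(plan, actions)
for batch in batches:
    print(batch)
```

In this sketch the first batch holds the two subtasks with no prerequisites, so they are submitted to the thread pool together, while the merge and summary steps each wait for the batch before them.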

A key innovation of FRIDAY is its self-directed learning capability, which allows it to master unfamiliar applications with minimal human supervision. When faced with a new application like Excel or PowerPoint, FRIDAY autonomously proposes a curriculum of tasks ranging from easy to challenging, solves them through trial and error, and accumulates reusable Python tools in its memory repository.
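The self-directed learning loop described here can be sketched roughly as follows. It is a toy under stated assumptions: `Task`, `ToolRepository`, `self_directed_learning`, and the deterministic `stub_solver` are hypothetical stand-ins, and in the real system a language model would propose the curriculum and write the tool code.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    difficulty: int  # 1 = easy, larger = harder


@dataclass
class ToolRepository:
    tools: dict = field(default_factory=dict)  # tool name -> Python source

    def save(self, name, source):
        self.tools[name] = source


def self_directed_learning(curriculum, solve, repo, max_attempts=3):
    """Work through tasks from easy to hard, retrying each on failure.

    solve(task, attempt, feedback, repo) returns (ok, tool_name, source, feedback).
    Tools from successful attempts are stored in repo for later reuse.
    """
    solved = []
    for task in sorted(curriculum, key=lambda t: t.difficulty):
        feedback = None
        for attempt in range(1, max_attempts + 1):
            ok, tool_name, source, feedback = solve(task, attempt, feedback, repo)
            if ok:
                repo.save(tool_name, source)
                solved.append(task.description)
                break
    return solved


def stub_solver(task, attempt, feedback, repo):
    # Stand-in for trial and error: harder tasks need more attempts.
    ok = attempt >= task.difficulty
    name = task.description.replace(" ", "_")
    source = f"def {name}():\n    pass\n"
    return ok, name, source, None if ok else "execution error"


# Hypothetical curriculum for a spreadsheet application.
curriculum = [
    Task("merge cells", 2),
    Task("create sheet", 1),
    Task("build pivot table", 4),
]
repo = ToolRepository()
solved = self_directed_learning(curriculum, stub_solver, repo)
print(solved, sorted(repo.tools))
```

With the stub solver, the hardest task exhausts its three attempts and leaves no tool behind, while the two easier tasks each add a reusable tool to the repository.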

In performance evaluations, FRIDAY achieved a 40.86% success rate on level-1 tasks of the GAIA general AI benchmark, marking a 35% relative improvement over previous state-of-the-art methods like GPT-4 Plugins. It also successfully solved highly complex level-3 tasks that were previously unsolvable by any other evaluated system. Furthermore, on spreadsheet manipulation tasks, FRIDAY achieved a 60% success rate purely through self-directed learning, outperforming baseline models specifically engineered for spreadsheet control.
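To make "35% relative improvement" concrete, the short calculation below works backward from the two figures quoted above. The baseline it prints is only the value implied by those figures, not a number taken from the paper, and the helper names are illustrative.

```python
def relative_improvement(new, old):
    """Fractional gain of new over old, e.g. 0.35 for a 35% relative gain."""
    return (new - old) / old


def implied_baseline(new, rel_gain):
    """Baseline score that would make new a rel_gain relative improvement."""
    return new / (1 + rel_gain)


friday_level1 = 40.86  # success rate (%) quoted above
rel_gain = 0.35        # relative improvement quoted above

baseline = implied_baseline(friday_level1, rel_gain)
print(f"implied baseline: {baseline:.2f}%")
print(f"absolute gain: {friday_level1 - baseline:.2f} percentage points")
```

A relative improvement is a ratio of success rates, so a 35% relative gain here corresponds to a much smaller gap when measured in absolute percentage points.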


By Yun Wu