


The paper introduces MagicGUI-RMS, a multi-agent reward modeling framework designed to create self-evolving graphical user interface (GUI) agents. It addresses the limitations of existing agents—such as their reliance on manual annotations and static rule-based systems—by providing a scalable method for automated trajectory evaluation and feedback.
The system's core architecture consists of two primary components:
To support these models, the authors developed a structured data construction pipeline that automatically generates diverse training samples through techniques like trajectory perturbation and rule-based verification. Additionally, an automated data-reflux mechanism enables continuous self-improvement by feeding high-quality, verified trajectories back into the agent’s training set.
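As a rough illustration of the ideas in the paragraph above, the following Python sketch shows one way trajectory perturbation, rule-based verification, and data reflux could fit together. It is not the authors' implementation: the `Step` type, the action vocabulary, the `perturb`, `passes_rules`, and `reflux` helpers, the scoring function, and the 0.8 threshold are all hypothetical names and values chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical action name, e.g. "tap" or "type"
    target: str  # hypothetical UI element identifier


# Assumed action vocabulary; the paper's actual action space may differ.
VALID_ACTIONS = {"tap", "type", "swipe", "back"}


def perturb(trajectory, rng):
    """Copy a trajectory and swap one step's action for a different valid one.

    The result can serve as a negative training sample.
    """
    i = rng.randrange(len(trajectory))
    wrong = rng.choice(sorted(VALID_ACTIONS - {trajectory[i].action}))
    out = list(trajectory)
    out[i] = Step(wrong, trajectory[i].target)
    return out


def passes_rules(trajectory):
    """Rule-based check: every step uses a known action and names a target."""
    return all(s.action in VALID_ACTIONS and s.target for s in trajectory)


def reflux(candidates, score_fn, training_set, threshold=0.8):
    """Append rule-verified, high-scoring trajectories to the training set."""
    kept = [t for t in candidates if passes_rules(t) and score_fn(t) >= threshold]
    training_set.extend(kept)
    return kept


# Usage: a toy reward function stands in for a learned reward model.
good = [Step("tap", "login_button"), Step("type", "username_field")]
bad = perturb(good, random.Random(0))
training_set = []
reflux([good, bad], lambda t: 1.0 if t == good else 0.0, training_set)
```

In this toy run only the unperturbed trajectory is fed back into `training_set`, because the perturbed copy passes the rule check but receives a low score from the stand-in reward function.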
Experimental results demonstrate that MagicGUI-RMS significantly enhances agent performance, achieving substantial gains in task accuracy and robustness. Notably, the system outperformed several strong baselines, including GPT-4o, particularly in complex and out-of-distribution GUI tasks.
By Yun Wu