


HuggingGPT is a collaborative system that uses a Large Language Model (LLM), such as ChatGPT, as a central controller to manage and integrate expert AI models from machine learning communities like Hugging Face. The paper addresses the difficulty current LLMs have with complex, multimodal information (such as vision and speech) by using language as a generic interface between the LLM and external expert models.
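The system operates as an autonomous agent through a four-stage workflow: task planning (the LLM breaks the user request into subtasks), model selection (it picks an expert model for each subtask based on model descriptions), task execution (the selected models run and return their outputs), and response generation (the LLM summarizes the results into a final answer).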
By combining the reasoning and planning capabilities of LLMs with the specialized expertise of multimodal models, HuggingGPT can autonomously tackle a wide range of sophisticated tasks across language, vision, and speech domains, paving a new pathway toward artificial general intelligence.
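The controller workflow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the stage names (task planning, model selection, task execution, response generation) follow the HuggingGPT paper, but every function name, the `MODEL_REGISTRY` table, the `<resource-N>` placeholder convention, and the stub "models" are hypothetical stand-ins. A real system would prompt an LLM in stages 1, 2, and 4 and call hosted expert models in stage 3.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    task_id: int
    kind: str                      # e.g. "image-to-text"
    args: dict                     # inputs; may contain "<resource-N>" placeholders
    deps: list = field(default_factory=list)  # ids of tasks this one depends on


# Hypothetical registry: task kind -> candidate (name, callable) expert models.
# The lambdas are stubs standing in for real vision/speech models.
MODEL_REGISTRY = {
    "image-to-text": [("stub-captioner", lambda a: f"a caption for {a['image']}")],
    "text-to-speech": [("stub-tts", lambda a: f"audio({a['text']})")],
}


def plan_tasks(request: str) -> list:
    """Stage 1, task planning. A real controller would ask the LLM to parse
    the request; this stub returns a fixed two-step plan."""
    return [
        Task(0, "image-to-text", {"image": "photo.jpg"}),
        Task(1, "text-to-speech", {"text": "<resource-0>"}, deps=[0]),
    ]


def select_model(task: Task):
    """Stage 2, model selection. Assumption: take the first registered model;
    the paper instead lets the LLM choose from model descriptions."""
    return MODEL_REGISTRY[task.kind][0]


def execute(tasks: list) -> dict:
    """Stage 3, task execution. Assumes tasks are listed in dependency order
    and substitutes earlier outputs for "<resource-N>" placeholders."""
    results = {}
    for task in tasks:
        resolved = {}
        for key, value in task.args.items():
            for dep in task.deps:
                if isinstance(value, str):
                    value = value.replace(f"<resource-{dep}>", results[dep])
            resolved[key] = value
        _name, model = select_model(task)
        results[task.task_id] = model(resolved)
    return results


def generate_response(request: str, results: dict) -> str:
    """Stage 4, response generation. A real controller would have the LLM
    summarize the results; this stub just lists them."""
    lines = [f"Request: {request}"]
    lines += [f"Task {tid}: {out}" for tid, out in sorted(results.items())]
    return "\n".join(lines)


def run_pipeline(request: str) -> str:
    tasks = plan_tasks(request)
    results = execute(tasks)
    return generate_response(request, results)


if __name__ == "__main__":
    print(run_pipeline("Describe photo.jpg and read the description aloud"))
```

The point of the sketch is the data flow: the plan is a list of typed tasks with explicit dependencies, so the output of one expert model (the caption) can be passed as input to the next (speech synthesis) using plain text as the interface.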
By Yun Wu