Artificial Discourse

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models



The paper "Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models" details the development of Molmo, a new family of vision-language models (VLMs). Molmo is notable for its open-weight, open-data approach: the model's weights, training data, and code are publicly available. This contrasts with the prevailing trend of proprietary VLMs, which are kept closed. Molmo achieves state-of-the-art performance using PixMo, a novel image-captioning dataset collected from human annotators who gave speech-based descriptions. This approach avoids reliance on synthetic data generated by proprietary systems, so performant VLMs can be built without distilling closed models. The authors highlight Molmo's potential for various tasks, including question answering and image-based navigation.


Artificial Discourse, by Kenpachi