
Pixtral 12B is a 12-billion-parameter multimodal language model trained to understand both images and text. It uses a novel vision encoder, trained from scratch, that allows it to process images at their native resolution and aspect ratio. Pixtral outperforms comparable open-source models on multimodal benchmarks, including a new benchmark called MM-MT-Bench. This podcast also discusses the importance of standardised evaluation protocols for multimodal language models: the Pixtral paper's authors highlight problems with existing benchmarks and metrics and propose solutions to improve how these models are evaluated.
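As a rough illustration of what native-resolution processing means in practice, the sketch below (not the authors' code; the 16-pixel patch size and the truncation-based handling of ragged edges are assumptions) splits an image into a variable-length sequence of patch tokens whose count depends on the image's own height and width, instead of resizing every input to a fixed square.

```python
import numpy as np

PATCH = 16  # assumed patch edge length in pixels; the real encoder's value may differ

def patchify(image: np.ndarray) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patch tokens.

    H and W are truncated to multiples of PATCH for simplicity; a real
    encoder would pad or otherwise handle the remainder.
    """
    h, w, c = image.shape
    h, w = h - h % PATCH, w - w % PATCH
    image = image[:h, :w]
    patches = (
        image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
        .transpose(0, 2, 1, 3, 4)          # group pixels by patch
        .reshape(-1, PATCH * PATCH * c)    # one row per patch token
    )
    return patches

# A 512x768 image yields (512/16) * (768/16) = 32 * 48 = 1536 tokens,
# while a 256x256 image yields only 16 * 16 = 256 -- the token count,
# and hence the aspect ratio, follows the image rather than a fixed resize.
print(patchify(np.zeros((512, 768, 3))).shape)  # (1536, 768)
```

The design point this illustrates is that the sequence length varies per image, so tall, wide, or high-resolution inputs keep their proportions rather than being distorted into a square.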