November 06, 2024

Meet Leopard: The AI That Excels in Multi-Image, Text-Rich Tasks

9 minutes

In this episode, Robert and Haley explore a recent breakthrough in multimodal AI: Leopard, a new model developed to tackle complex, text-rich image tasks. Designed by researchers from the University of Notre Dame, Tencent AI Seattle Lab, and UIUC, Leopard is built to understand and reason across multiple text-heavy images, like presentation slides, web snapshots, and scanned documents.

Join us as we break down how Leopard’s adaptive high-resolution multi-image encoding and pixel shuffling set it apart from earlier models. Leopard is designed to preserve high-resolution detail even when it handles several images at once, which suits it to real-world uses like analyzing multi-page reports, data charts, and visual presentations. We discuss:

Leopard’s Unique Dataset: A tailored instruction-tuning dataset of over a million data points.
Dynamic Encoding: How Leopard keeps crucial details while managing multiple images at once.
Performance Gains: Over 9% improvement on benchmarks like SlideVQA and Multi-page DocVQA.

Get ready to dive into how this model reshapes the landscape for AI in business, education, and research. Leopard just might be the game-changer multimodal AI has been waiting for!

By Robert Loft and Haley Hanson
