Papers Read on AI

Improved Baselines with Visual Instruction Tuning



Large multimodal models (LMMs) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response-formatting prompts, we establish stronger baselines that achieve state-of-the-art results across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data samples and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
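
To make the "MLP projection" connector concrete, here is a minimal pure-Python sketch of the idea: each patch feature from the vision encoder is mapped into the language model's embedding space by a small MLP. The abstract does not give the details, so everything specific here is an assumption made for illustration: the two-layer depth, the GELU activation, the patch size of 14 used in the token-count helper, the toy dimensions, and the names `MLPProjector` and `num_patch_tokens`.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU activation (assumed; the abstract only says 'MLP')."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Random weights (out_dim rows of in_dim entries) and a zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    """Affine map: one dot product per output row, plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


class MLPProjector:
    """Hypothetical two-layer connector: vision features -> LLM embedding space."""

    def __init__(self, vision_dim: int, hidden_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(vision_dim, hidden_dim, rng)
        self.w2, self.b2 = init_linear(hidden_dim, llm_dim, rng)

    def __call__(self, patch_features):
        # Project every patch feature independently; the token count is unchanged.
        projected = []
        for feat in patch_features:
            hidden = [gelu(h) for h in linear(feat, self.w1, self.b1)]
            projected.append(linear(hidden, self.w2, self.b2))
        return projected


# Toy usage: 4 patch features of width 8 mapped to width 12.
projector = MLPProjector(vision_dim=8, hidden_dim=16, llm_dim=12)
tokens = projector([[0.1] * 8 for _ in range(4)])

# With an assumed patch size of 14, a 336px image gives a 24 x 24 patch grid.
patch_count = num_patch_tokens(336, 14)
```

The point of the sketch is only that the connector is a per-token feed-forward map: it changes the feature width so image tokens can sit alongside text embeddings, and it leaves the number of tokens as produced by the vision encoder.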

2023: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee



https://arxiv.org/pdf/2310.03744v1.pdf

Papers Read on AI by Rob


