December 08, 2025

303-TITAN: Multimodal Whole-Slide Pathology Foundation Model

14 minutes

The paper describes the development and evaluation of TITAN (Transformer-based pathology Image and Text Alignment Network), a multimodal foundation model for computational pathology. TITAN is pretrained on over 335,000 whole-slide images (WSIs) in stages: vision-only pretraining first, then vision-language alignment with both synthetic fine-grained captions and clinical pathology reports. The model uses a Vision Transformer architecture with ALiBi (Attention with Linear Biases) positional encoding to handle the scale of gigapixel WSIs. Across cancer subtyping, molecular classification, and survival prediction, TITAN consistently outperforms prior slide foundation models, often by significant margins, including in data-limited settings such as rare cancer retrieval. Its multimodal training also enables zero-shot vision-language classification, accurate cross-modal retrieval between slides and reports, and automatic generation of clinical descriptions. Overall, the paper shows that TITAN produces general-purpose slide representations that can be applied to clinical tasks without task-specific fine-tuning.
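For listeners unfamiliar with ALiBi, here is a minimal sketch of the idea: instead of adding position embeddings to the inputs, a distance-proportional penalty is added to the attention logits, so far-apart tokens attend to each other less. The head slopes follow the original ALiBi recipe. Everything else is an assumption made for illustration: the function names `alibi_slopes` and `alibi_bias_2d` are hypothetical, and the symmetric 2D variant based on Euclidean distance between patch grid positions is one plausible way to adapt the idea to slides, not a confirmed reproduction of TITAN's code.

```python
import math

def alibi_slopes(num_heads):
    """Per-head slopes from the original ALiBi recipe: 2^(-8*h/num_heads) for h = 1..num_heads."""
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]

def alibi_bias_2d(coords, slope):
    """Bias added to attention logits: -slope * Euclidean distance between grid positions.

    Assumption: a symmetric 2D variant for illustration; the original ALiBi is 1D.
    """
    return [[-slope * math.dist(a, b) for b in coords] for a in coords]

# Toy 2x2 grid of patch features, positions given as (row, col).
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
slopes = alibi_slopes(8)
bias = alibi_bias_2d(coords, slopes[0])

for row in bias:
    print([round(v, 3) for v in row])
```

Because the bias depends only on relative distance, the same rule applies to grids larger than those seen in training, which is the usual motivation for ALiBi when input sizes vary widely.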

References:

Ding T, Wagner S J, Song A H, et al. A multimodal whole-slide foundation model for pathology[J]. Nature Medicine, 2025: 1-13.

Paper Talk

By 淼淼Elva