https://arxiv.org/abs/2504.20734
This paper introduces **UniversalRAG**, a framework that extends Retrieval-Augmented Generation (RAG) to knowledge drawn from **multiple corpora spanning diverse modalities and granularities**, moving beyond traditional text-only RAG systems. It addresses the **modality gap** that arises when diverse data are forced into a single unified representation space, where retrieval tends to be biased toward items in the query's own modality regardless of relevance. To avoid this, UniversalRAG proposes a **modality-aware routing mechanism** that dynamically selects the most appropriate corpus for a given query, and it further refines retrieval by considering **different granularity levels** within each modality, such as paragraphs versus documents for text and clips versus full videos for video content. Experimental results across multiple benchmarks show that UniversalRAG **outperforms both modality-specific and unified baselines** by adaptively accessing the most relevant knowledge source for a wide range of queries.
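To make the routing idea concrete, below is a minimal Python sketch of modality- and granularity-aware retrieval. The route labels, the `Retriever` wrapper, and the `classify` callable are all hypothetical names for illustration, not the paper's API; the paper's router could be instantiated as either a prompted LLM or a small trained classifier, and either can stand in for `classify` here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical route labels: one corpus per modality/granularity pair
# (e.g., text at paragraph vs. document level, video at clip vs. full level).
ROUTES = [
    "text_paragraph", "text_document",
    "image",
    "video_clip", "video_full",
]

@dataclass
class Retriever:
    """Wraps one corpus at one modality/granularity level (assumed interface)."""
    name: str
    search: Callable[[str, int], List[str]]  # (query, k) -> retrieved items

def route_query(query: str, classify: Callable[[str], str]) -> str:
    """Map a query to one corpus route.

    `classify` stands in for the paper's router (a prompted LLM or a
    trained classifier); the fallback route below is an assumption,
    not specified by the paper.
    """
    route = classify(query)
    return route if route in ROUTES else "text_paragraph"

def universal_rag_retrieve(
    query: str,
    retrievers: Dict[str, Retriever],
    classify: Callable[[str], str],
    k: int = 5,
) -> List[str]:
    """Retrieve only from the routed corpus, so each query is matched
    within its own modality rather than in one unified embedding space."""
    route = route_query(query, classify)
    return retrievers[route].search(query, k)
```

Because each corpus keeps its own representation space, the router sidesteps the modality gap: a question about a specific scene can be sent to the video-clip corpus while a broad factual question goes to text documents, with the per-modality retrievers left unchanged.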