Modular RAG: Optimizing LLMs for Indigenous Knowledge Preservation
This research paper explores a Retrieval-Augmented Generation (RAG)
framework for large language models (LLMs). The study uses a novel
dataset of interviews with Amazon rainforest natives and biologists to
assess how different RAG components (base language models such as
GPT and PaLM, and similarity scoring algorithms) affect performance. The
modular RAG design allows components to be swapped interchangeably,
enabling the investigation of various configurations. Results show that
model performance varies with the combination of components and with
whether contextual data is included; in particular, models perform best
when paired with similarity scores from their native platforms. The
findings suggest that RAG offers a more efficient alternative to
traditional LLM fine-tuning, with implications for both LLM development
and the preservation of indigenous knowledge.
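
The paper does not reproduce its implementation, but the modular design it describes can be sketched as a small interface in which the embedder, generator, and similarity scorer are interchangeable. The Python sketch below is illustrative only: `ModularRAG`, `Embedder`, `Generator`, and `cosine_similarity` are hypothetical names, cosine similarity is just one possible scorer, and the actual components, prompts, and retrieval depth used in the study are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical component interfaces; not taken from the paper.
Embedder = Callable[[str], List[float]]   # text -> embedding vector
Generator = Callable[[str], str]          # prompt -> completion

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """One possible similarity score; other scorers can be swapped in."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

@dataclass
class ModularRAG:
    embed: Embedder        # e.g., an embedding endpoint from the model's native platform
    generate: Generator    # e.g., a GPT or PaLM completion endpoint
    similarity: Callable[[List[float], List[float]], float] = cosine_similarity
    corpus: List[str] = field(default_factory=list)  # interview transcripts or other context passages

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        """Rank corpus passages by similarity to the query embedding.
        (A real system would precompute and cache document embeddings.)"""
        q_vec = self.embed(query)
        ranked = sorted(
            self.corpus,
            key=lambda doc: self.similarity(q_vec, self.embed(doc)),
            reverse=True,
        )
        return ranked[:k]

    def answer(self, query: str) -> str:
        """Prepend retrieved contextual data to the prompt before generation."""
        context = "\n".join(self.retrieve(query))
        return self.generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

# Usage sketch (assumed callables, not from the paper):
# rag = ModularRAG(embed=my_embedder, generate=my_llm, corpus=interview_passages)
# print(rag.answer("..."))
```

Swapping `embed`, `generate`, or `similarity` lets each configuration be benchmarked in isolation, which is the kind of component-level comparison the paper reports.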
Ref: https://www.researchgate.net/publication/378449219_A_Retrieval-Augmented_Generation_Based_Large_Language_Model_Benchmarked_On_a_Novel_Dataset