This paper surveys Retrieval-Augmented Generation (RAG) systems for Large Language Models (LLMs), classifying RAG tasks into four levels of complexity based on the type of external data required and the task's primary focus: explicit fact queries, implicit fact queries, interpretable rationale queries, and hidden rationale queries. Each level presents distinct challenges, and the survey examines techniques such as prompt tuning, Chain-of-Thought (CoT) prompting, and fine-tuning for addressing them. It also discusses three main forms of integrating external data into LLMs (as context, via a small model, or via fine-tuning), highlighting their strengths, limitations, and suitable applications. Ultimately, the paper aims to offer a comprehensive understanding of the data requirements and key bottlenecks in building LLM applications, providing guidance for developing such applications systematically.
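The four-level taxonomy can be sketched as a simple enumeration. The example queries and the pairing of each level with a technique family below are illustrative assumptions for exposition, not taken verbatim from the paper:

```python
from enum import Enum


class QueryLevel(Enum):
    """Four complexity levels of RAG tasks, per the survey's taxonomy."""
    EXPLICIT_FACT = 1             # answer is stated directly in retrieved data
    IMPLICIT_FACT = 2             # answer requires combining several retrieved facts
    INTERPRETABLE_RATIONALE = 3   # answer needs domain rationale provided in the data
    HIDDEN_RATIONALE = 4          # rationale must be inferred; it is not written down


# Hypothetical example queries for each level (illustrative only).
EXAMPLES = {
    QueryLevel.EXPLICIT_FACT: "When was company X founded?",
    QueryLevel.IMPLICIT_FACT: "Which of our products grew fastest last quarter?",
    QueryLevel.INTERPRETABLE_RATIONALE: "Is this claim covered under policy Y?",
    QueryLevel.HIDDEN_RATIONALE: "How should we respond to this customer complaint?",
}


def suggested_technique(level: QueryLevel) -> str:
    """Rough pairing of query level to a technique family discussed in the
    survey; the exact mapping here is an assumption, not a prescription."""
    return {
        QueryLevel.EXPLICIT_FACT: "basic RAG retrieval with prompting",
        QueryLevel.IMPLICIT_FACT: "multi-hop retrieval with CoT prompting",
        QueryLevel.INTERPRETABLE_RATIONALE: "prompt tuning guided by the given rationale",
        QueryLevel.HIDDEN_RATIONALE: "fine-tuning to internalize the hidden rationale",
    }[level]
```

A router like `suggested_technique` makes the survey's central point concrete: harder query levels call for progressively deeper integration of external data, from plain in-context retrieval up to fine-tuning.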