This research paper introduces REPOCOD, a new benchmark for evaluating large language models (LLMs) on code generation tasks that require knowledge of an entire software project. The benchmark consists of 980 code generation problems collected from 11 popular open-source Python repositories. REPOCOD is designed to be more challenging than existing benchmarks: instead of isolated, self-contained functions, it asks LLMs to generate complex functions whose correct implementation often depends on code spread across multiple files in the project. The authors evaluate a range of LLMs on REPOCOD and find that even the most advanced models struggle to produce correct solutions, with the largest drop in accuracy on functions that require repository-level context. These results highlight the need for further development of LLMs and retrieval techniques that can handle the complexity of real-world code generation within the context of full software projects.
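
To make the evaluation setting concrete, the sketch below shows the general shape of a repository-level code-generation evaluation loop: build a prompt from a target function's signature, docstring, and retrieved cross-file context, splice the model's candidate into the repository, and score it by running the project's own tests. This is a minimal, hypothetical illustration; the field names (`signature`, `target_file`, `test_command`, etc.), the `generate_with_llm` placeholder, and the JSONL task format are assumptions for exposition, not REPOCOD's actual schema or harness.

```python
import json
import subprocess
from pathlib import Path


def generate_with_llm(prompt: str) -> str:
    """Placeholder for a call to any code LLM; should return a candidate function implementation."""
    raise NotImplementedError("plug in your model or API client here")


def evaluate_task(task: dict, repo_root: Path) -> bool:
    """Generate the target function and check it against the project's own test cases."""
    # Prompt = function signature + docstring + (optionally) retrieved cross-file
    # context such as callers, related classes, or helper definitions.
    prompt = "\n".join([task["signature"], task["docstring"], task.get("context", "")])
    candidate = generate_with_llm(prompt)

    # Splice the candidate implementation into the target file in place of the
    # original (held-out) function body.
    target = repo_root / task["target_file"]
    source = target.read_text()
    target.write_text(source.replace(task["original_body"], candidate))

    # Run the repository's relevant tests; a task counts as solved only if they pass.
    result = subprocess.run(task["test_command"].split(), cwd=repo_root, capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical tasks.jsonl with one task record per line.
    tasks = [json.loads(line) for line in Path("tasks.jsonl").read_text().splitlines()]
    passed = sum(evaluate_task(t, Path(t["repo_root"])) for t in tasks)
    print(f"pass rate: {passed}/{len(tasks)}")
```

Test-based scoring of this kind (rather than textual similarity to a reference solution) is what makes repository-level benchmarks demanding: a candidate only counts as correct if it integrates with the rest of the project well enough to pass the project's existing test suite.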