Stop Worrying and Love the Large Context Window
The potential for Large Language Models (LLMs) to enhance developer productivity is significant, but practical application often hits a snag. Developers frequently find themselves manually feeding code snippets into chat interfaces, hoping the AI understands the broader project context. This approach can be inefficient for complex applications.
Coding involves understanding intricate dependencies, project history, and the overall structure. Standard LLM context windows, while growing, can still be insufficient for the scale of many enterprise codebases. Manually curating the relevant context is time-consuming and prone to missing critical information or misunderstanding interfaces.
To address this, tools are emerging to help provide LLMs with more comprehensive context. Two such tools that can work together are Repomix and Gemini's advanced models.
Repomix (https://repomix.com/) is a tool designed to scan an entire repository. It analyzes the structure, dependencies, and code, then packages this information into a single XML file, typically named `repomix-output.xml`. This file acts as a detailed blueprint of the project, intended to be digestible by an LLM.
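To give a feel for what the model actually receives, here is an abbreviated, illustrative sketch of the packed file. The exact tags and preamble depend on the Repomix version and settings; the general shape is a summary, a directory tree, and then the full contents of every included file (the paths below reuse examples that appear later in this article):

```xml
<!-- Illustrative sketch only; element names may differ between Repomix versions. -->
<file_summary>
  A merged, single-file representation of the repository, intended for AI consumption.
</file_summary>

<directory_structure>
src/
  services/UserService.java
  repositories/UserRepository.java
</directory_structure>

<files>
  <file path="src/services/UserService.java">
    ...full, unmodified file contents...
  </file>
  <file path="src/repositories/UserRepository.java">
    ...full, unmodified file contents...
  </file>
</files>
```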
Google's Gemini models, particularly the more capable versions, offer large context windows. For many real-world projects, a window that size can ingest the entire Repomix output in one pass, giving the model a far broader understanding of the codebase than hand-picked snippets ever could.
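Whether a particular project actually fits is easy to sanity-check before uploading. Repomix typically prints its own pack summary (including an estimated token count) when it finishes; failing that, a rough shell heuristic, assuming roughly four characters per token, gives a ballpark figure:

```bash
# Ballpark token estimate: byte count of the packed file divided by ~4.
# This is a rule-of-thumb approximation, not the model's actual tokenizer.
wc -c repomix-output.xml | awk '{ printf "~%d tokens (approx.)\n", $1 / 4 }'
```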
Using these tools together generally involves the following workflow:
- Generate Comprehensive Context: Navigate to your project's root directory in your terminal. Run the Repomix command, for example `npx -y repomix` (or just `repomix` if you have installed it globally). This process scans your project and creates the `repomix-output.xml` file; a sketch with a few optional flags follows this list.
- Engage Gemini: Open your preferred interface for interacting with an advanced Gemini model.
- Provide the Blueprint: Upload the generated `repomix-output.xml` file to Gemini.
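Regarding the first step, the default invocation is usually enough, but a couple of options can keep the packed file focused on what matters. The flags below are taken from Repomix's documented options, but treat them as assumptions about your installed version and confirm with `npx repomix --help`:

```bash
# Default: pack the whole repository into repomix-output.xml.
npx -y repomix

# A more selective variant (assumed flags; verify against your version's --help):
# include only source folders, skip logs and build output.
npx -y repomix --include "src/**" --ignore "**/*.log,dist/**" --output repomix-output.xml
```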
Initiating the Conversation
After uploading the `repomix-output.xml` file, it's helpful to provide Gemini with clear initial instructions. Paste the following prompt into the chat interface and send it. This sets the stage for Gemini to process the context file and understand how you want to interact regarding code changes, including the critical instruction to output each file modification into its own document.
You are assisting with modifications to a software codebase. The entire repository context is provided within the attached Repomix file. This file contains a consolidated representation of the codebase, including directory structure and file contents.
I will provide specific instructions for changes I want to make in subsequent messages.
**CRITICAL INSTRUCTION FOR OUTPUT FORMAT:**
When you generate code modifications for existing files OR create new files based on my requests, you **MUST** output the **complete content** of **each** modified or new file into its **own, separate document** within the Gemini Canvas workspace. Do not combine multiple files into a single document. This separation is essential for my workflow.
For now, please carefully review the provided Repomix file to understand the current state of the codebase. Acknowledge when you are ready for my change requests.
Wait for Gemini to acknowledge it has processed the file and is ready.
- Utilize Canvas Mode: It is highly recommended to switch Gemini to Canvas mode (the workspace view with the document editor) for interactions at this level of context. Canvas mode is generally better equipped than the standard chat interface to handle multi-file awareness and make use of the structural information in the Repomix output. Once in Canvas mode, you can provide your specific change requests.
With this setup, your requests to the LLM can become more specific and context-aware, leveraging the model's understanding of the entire codebase provided by Repomix. For instance, you could ask Gemini to perform tasks like:
- "Refactor the
getUserData
method insrc/services/UserService.java
to use the updatedUserRepository
interface defined insrc/repositories/UserRepository.java
." - "Identify potential race conditions related to the
sharedResource
variable across all files in thecom.example.concurrency
package." - "Implement comprehensive Javadoc documentation for the public methods within the
PaymentProcessor
class, adhering to the style guide implicitly present in other service classes." - "Create a new Angular component
OrderSummaryComponent
insrc/app/components/orders
, mirroring the structure and data binding patterns ofProductDetailComponent
found insrc/app/components/products
."
Gemini, operating in Canvas mode with the Repomix context and the initial instruction about file separation, has a much better sense of where files live and how different parts of the code connect, which tends to yield more relevant, actionable suggestions, each delivered as its own document.
For those looking to streamline the process further, a little automation can remove the repetitive steps of generating the context file and then locating it for upload. As an example concept, you could add a script to your `package.json` in a Node.js project that runs Repomix and then places a reference to the generated output file on the clipboard (this specific example uses macOS's `osascript`):
```jsonc
// In package.json
"scripts": {
  // ... other scripts
  "repomix:context": "npx -y repomix@latest && osascript -e \"set the clipboard to (POSIX file \\\"$(realpath repomix-output.xml)\\\")\""
}
```
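The `osascript` call above places the file itself on the macOS clipboard, ready for the upload step. If you only need the path as text, or you are working on Linux, a sketch of equivalent scripts might look like this (assuming `pbcopy` on macOS and a separately installed `xclip` on Linux; the script names are illustrative):

```jsonc
// In package.json — illustrative cross-platform variants
"scripts": {
  // macOS: copy the absolute path of the output file as plain text
  "repomix:context:path": "npx -y repomix@latest && realpath repomix-output.xml | pbcopy",
  // Linux (X11): same idea via xclip
  "repomix:context:linux": "npx -y repomix@latest && realpath repomix-output.xml | xclip -selection clipboard"
}
```

Either way, the context file (or its path) is on the clipboard the moment Repomix finishes, so providing the blueprint to Gemini becomes a single paste.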