Ragex - Hybrid Retrieval-Augmented Generation for Multi-Language Codebases

Ragex is an MCP (Model Context Protocol) server that analyzes codebases using compiler output and language-native tools to build comprehensive knowledge graphs. It enables natural language querying of code structure, relationships, and semantics.
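Ragex's actual pipeline isn't shown in this thread, but the idea of turning an AST into a queryable knowledge graph can be sketched in a few lines. This is a toy illustration using Python's standard `ast` module (the sample source and function names are made up, not from Ragex): index every function definition and the calls between them, so a retriever can answer "what does X depend on?" without shipping whole files.

```python
import ast

# Hypothetical sample module to index (not from Ragex).
SOURCE = """
def fetch_user(db, user_id):
    return db.get(user_id)

def greet(db, user_id):
    user = fetch_user(db, user_id)
    return f"Hello, {user}!"
"""

def build_graph(source):
    """Map each top-level function to the set of plain-name calls inside it."""
    tree = ast.parse(source)
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return graph

graph = build_graph(SOURCE)
print(graph["greet"])  # {'fetch_user'}
```

A real indexer would of course also record modules, types, and cross-file references, but even this tiny graph shows why an AST-level view is a compact handle on "code structure and relationships".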

I admit I finally discovered that T9 autocompletion on steroids (aka LLMs) might be somewhat helpful in the daily development process. MCP servers put the last piece of this puzzle into place for me. Assistants can indeed be helpful, provided they are fed the proper context rather than the whole project codebase or the like.

I felt like the R in the RAG acronym could have been done better, though. So I created Ragex, an MCP server that provides context retrieved directly from the AST.

I don’t use assistants per se, but LunarVim is my code editor of choice, so I have benefited from Ragex myself, using it as a co-pilot alongside the language server. Some tasks (like atomic refactoring) it seems to handle better.

The README of the project contains more details. Enjoy!

3 Likes

Many such capitulations lately.

Last year there was a lot of talk about how RAG would be killed by large context models, but it doesn’t seem to have happened. Instead it seems like a lot of prompting discourse has been replaced with context management discourse.

Empirically, how many lines of code do you find you’re able to shove into current models before they start to break down? And has that number gone up over the last couple of generations?

I don’t know about them breaking down, but Gemini Pro starts showing lazy tendencies as the chat nears the 1M-token context window. It starts hypothesizing about code it readily has, given to it in the initial prompt, and I have to keep reminding it that it already has the code and should just check it.

2 Likes

As far as I can tell, this is a key mistake. RAG built upon an AST can keep relevant context across billions of tokens, in contrast to “large-context” models, which still have to hold the context on their side.

That’s exactly the reason Ragex was born. I felt like I hit the ceiling too fast, precisely because the context window cannot grow infinitely, no matter what providers say. I’ve tried Ragex on the Elixir codebase and it more or less works without any degradation, because an AST is far denser than plain text.

1 Like

I have kinda insider info (please don’t quote me) that G actually lied about 1M. They started to accept 1M tokens, which is far from having learned to use 1M tokens.

That’s the problem. Having a (somehow squeezed) AST instead of raw code reduces the number of tokens drastically.
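The exact savings depend on the codebase and on how the AST is squeezed (Ragex’s actual compression is not shown here), but a toy comparison makes the point: keep only names and argument lists, drop the bodies. The sample functions below are invented for illustration, and whitespace-split word count is only a crude stand-in for a real tokenizer.

```python
import ast

# Hypothetical sample module (not from any real project).
SOURCE = """
def transfer(ledger, src, dst, amount):
    if ledger[src] < amount:
        raise ValueError("insufficient funds")
    ledger[src] -= amount
    ledger[dst] += amount
    return ledger

def balance(ledger, account):
    return ledger.get(account, 0)
"""

def squeeze(source):
    """Reduce a module to a compact outline: function names and arguments only."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"{node.name}({args})")
    return "\n".join(lines)

outline = squeeze(SOURCE)
# Crude proxy for token count: whitespace-separated words.
print(len(SOURCE.split()), "->", len(outline.split()))
```

Even on this tiny example the outline is several times smaller than the source, and the gap only widens on real code with long bodies, comments, and string literals.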

My understanding is that large context models are not natively trained on their full context window (as this would be prohibitive). Instead the models are trained at a much smaller context size and then the window is extended in post-training (I have little understanding of how). There is a resulting performance degradation once you eclipse the “true” context size of the model, but large context is still better than nothing.

I have no idea how much of this has changed over the past year, though. Everything is moving so fast.

AFAIU (I am not a model engineer either), simply feeding the model larger and larger contexts is a road to nowhere because, you know, it’s easy to memorize Ode to a Nightingale, but it’s next to impossible to memorize Hamlet from the first line to the last. No matter how large the context window is, one still needs to comprehend the meaning of it.

That being said, I came to the conclusion that the more concise we can make the context without losing its significance, the better off we are. That’s why I am a believer in local RAG for contexts, and that’s why I started Ragex in the first place.

1 Like

I think you are anthropomorphizing here. Not only is it trivial for a computer to memorize Hamlet, but there are human savants who can do so with ease as well. It seems to be more a matter of evolutionary pressure: it was not evolutionarily useful for the average human to memorize Hamlet. And even this is a relative matter; a monkey would have great difficulty memorizing the poem.

But this discussion is somewhat moot in that it is far too expensive for models to actually operate natively with 1M context windows. My understanding is that large context models use tricks internally which, with some hand waving, are conceptually similar to RAG in that they compress the internal representation of the tokens into a latent vector. In other words, the RAG is coming from inside the model.

The only question is, empirically, which works better. I can’t speak to this because I have not been vibecoding, but it sounds like you (and others) are saying that the large context models don’t work well enough at this time.

I imagine there is also a cost factor, as input tokens are not free. But I’m sure the providers would love if you would send them more!

2 Likes

I don’t do vibecoding either; I do vibe-documenting, vibe-testing, and vibe-keeping-track-of-changes for vibe-committing.

Well, I am pretty sure I (the client side) know better than the server side what is significant. They surely do a kind of RAG on their side, but they have no idea what’s important in this pile of text and what’s not. I do.

1 Like