HexDocs MCP - Semantic Search for Hex Documentation, Right in Your Editor ✨

:wave: Elixir community! I’m excited to announce HexDocs MCP. The project is an MCP server that I developed to improve my workflow, and I hope it helps you too!

Summary

HexDocs MCP offers semantic search capabilities for Hex package documentation—specifically designed for AI applications. It has two main components:

  1. Elixir Package: Downloads, processes, and generates embeddings from Hex package documentation.

  2. TypeScript Server: Implements the Model Context Protocol (MCP) to provide a searchable interface to the embeddings.

This project aims to assist developers by giving AI assistants like Cursor, Claude Desktop, Windsurf, etc. better context when working with Elixir code. When your AI assistant needs details about a specific Hex package function or module, HexDocs MCP retrieves the most relevant documentation snippets using vector embedding search.
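To give a rough idea of what the retrieval step does, here's a minimal sketch (not the project's actual code) of ranking stored chunks by cosine similarity against a query embedding:

```elixir
# Minimal sketch of the retrieval idea: rank stored chunks by cosine
# similarity against a query embedding. The %{text:, embedding:} shape
# is illustrative, not the library's real schema.
defmodule SearchSketch do
  def top_chunks(query_embedding, chunks, top_k \\ 5) do
    chunks
    |> Enum.map(fn %{text: text, embedding: emb} ->
      {cosine_similarity(query_embedding, emb), text}
    end)
    |> Enum.sort_by(fn {score, _text} -> score end, :desc)
    |> Enum.take(top_k)
  end

  defp cosine_similarity(a, b) do
    dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    dot / (norm.(a) * norm.(b))
  end
end
```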

Features

• Provides a `mix hex.docs.mcp fetch` task that wraps `mix hex.docs fetch` to download and process Hex documentation
• Generates embeddings using Ollama (with nomic-embed-text as the default; see the sketch below)
• Works with MCP-compatible clients
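For the embedding step, the call to a local Ollama instance looks roughly like this. This is a sketch using the Req HTTP client and Ollama's /api/embeddings endpoint, not HexDocs MCP's actual implementation:

```elixir
# Rough sketch of embedding a chunk of documentation via a local Ollama
# instance. Assumes Ollama is running on its default port and that the Req
# HTTP client is available; this is not the library's actual code.
defmodule EmbeddingSketch do
  @ollama_url "http://localhost:11434/api/embeddings"

  def embed(text, model \\ "nomic-embed-text") do
    %{"embedding" => embedding} =
      Req.post!(@ollama_url, json: %{model: model, prompt: text}).body

    embedding
  end
end
```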

Why HexDocs MCP?

This project is a follow-up to a previous post. I really like Ash, but wow—AI is still pretty bad at writing code for our ecosystem. Although it’s improved over the last few months, AI still struggles in many areas. Plus, the MCP protocol has taken off more than I expected, so I felt it was time to put something out there that we can learn from, improve, and iterate on.

Future potential

  • Completely remove JavaScript from the project - I only added it because it currently has the best MCP support
  • Use Bumblebee instead of Ollama - I wasn’t sure how to use Bumblebee in this context. Could we remove the Ollama requirement?
  • Other ideas?

Acknowledgement

Big shoutout to @mjrusso for laying the groundwork with the hex2text project!


This looks great, thanks for building and contributing this. MCP is a very (very!) new thing to me, and I’m behind the curve on AI-assisted development.

Do you (or anyone else) have any recommendations on how to write and structure library documentation to improve the quality of the embeddings that are derived from docs? I’m not sure I even phrased the question correctly but I’m optimistic you know what I mean!

I haven’t personally hopped on the AI editor trend, but a lot of people have been asking for stuff like this, so nice work :slight_smile:

I think you’re looking for this. I wonder, if you had used your tool to find this documentation, could it have improved itself?

Hey, great question! I think I get what you’re asking—are you thinking specifically about HexDocs MCP or just more generally? There are a lot of small optimizations the MCP server could make that would help, but they’d require a decent amount of work. :sweat_smile:

At a high level, I see this in two parts: the chunking mechanism and the search/re-ranking process.

For chunking, my current approach is pretty naive. I’m using TextChunker, which works with Markdown. That said, it’d be nice to have a smarter, more dynamic way to split content—maybe even something tailored specifically to HexDocs? I’m not totally sure what the best solution is here, but I definitely think there’s room for improvement. I’m still pretty new to all of this, so I haven’t nailed down a solid direction yet.
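To make the current approach concrete, the splitting looks roughly like this. The option names and the chunk struct’s text field are my reading of the text_chunker package’s docs, so treat them as assumptions rather than HexDocs MCP’s exact code:

```elixir
# Illustrative TextChunker usage for Markdown docs. The file path is made up,
# and the options shown (chunk_size, chunk_overlap, format) reflect the
# text_chunker package's documented API as I understand it.
markdown = File.read!("docs/Enum.md")

chunks = TextChunker.split(markdown, chunk_size: 1_000, chunk_overlap: 200, format: :markdown)

Enum.each(chunks, fn chunk ->
  # each chunk carries the extracted text, which is what gets embedded
  IO.puts(String.slice(chunk.text, 0, 80))
end)
```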

As for re-ranking, I think one of the most impactful things we could do is give RAG systems better signals about which embeddings should carry more weight. For example, if libraries included an llms.txt file—similar to what LangGraph suggests here—then tooling could prioritize content referenced in that file, treating it as more central or authoritative.
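As a sketch of what that weighting could look like: a hypothetical re-ranking pass that boosts scored chunks whose source page shows up in a package’s llms.txt. The chunk.source field and the 1.25 boost factor are made up for illustration; nothing like this exists in HexDocs MCP today:

```elixir
# Hypothetical re-ranking step: boost chunks whose source page is listed in
# the package's llms.txt. The chunk.source field and the boost factor are
# illustrative only.
defmodule RerankSketch do
  def rerank(scored_chunks, llms_txt_entries) do
    scored_chunks
    |> Enum.map(fn {score, chunk} ->
      boost = if chunk.source in llms_txt_entries, do: 1.25, else: 1.0
      {score * boost, chunk}
    end)
    |> Enum.sort_by(fn {score, _chunk} -> score end, :desc)
  end
end
```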

The tricky part, though, is how to maintain the llms.txt file as a library author. How do we decide what goes in it? How do we keep it up to date over time? I don’t have a good answer for that yet.

Hey, I actually considered this! I just wasn’t sure how to use Bumblebee alongside the stdio MCP server, which is written in JavaScript. I think it might be possible to drop the Ollama dependency if I used a native JavaScript library for embeddings—I just wasn’t sure which one to use. Ollama was super easy to get going since it works well with both Elixir and JavaScript, so I went with that for now. If you or someone else could provide some direction here, I’d be open to a better solution!


@bradley, this is sweet, thanks for building this!

A few misc comments:

  • Bumblebee really only makes sense to use if you’re willing to port the server to Elixir. Note that @thmsmlr has a (STDIO transport) MCP server implementation as part of Livebook Tools: livebook_tools/lib/livebook_tools/mcp_server.ex at master · thmsmlr/livebook_tools · GitHub

  • This is a bigger discussion, and I’m not going to do it justice in this quick comment, but something to consider: using RAG for search, but not for context. RAG picks out relevant chunks and thus identifies relevant modules/docs, but separate LLM call(s) are made (with the entire contents of that module’s documentation, and a summary of the user’s ask), returning new LLM-summarized text that can then be passed off to the coding agent. See ReAG: Reasoning-Augmented Generation (Superagent) for a better explanation (although that’s not exactly what I’m proposing, but it’s along the same lines).

My preferred tools don’t yet support MCP, so I haven’t played around too much, but the above is an approach I’ve been planning on playing around with.

Thanks for the excellent read! TIL ReAG.

One decision I made when designing this library was to limit the number of retrieval results to avoid overwhelming the LLM’s context window. However, after reflecting on your points, I’m considering an alternative approach: introducing an intermediate filtering step after the embedding retrieval but before delivering the final results.

Specifically, the idea would be to retrieve a broader set of results initially, then use the LLM itself to iterate through these and apply an isIrrelevant: true filter, discarding entries that aren’t contextually relevant. Although this would likely increase response latency, I believe it could significantly enhance the relevance and overall quality of the results.
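In pseudo-ish Elixir, the flow I have in mind is something like this. Both vector_search and llm_flags_irrelevant? are stand-ins for the real retrieval and LLM calls; none of this exists in the library yet:

```elixir
# Sketch of the proposed intermediate step: over-retrieve, have an LLM flag
# irrelevant chunks, then return a trimmed set. vector_search/2 and
# llm_flags_irrelevant?/2 are placeholders for the real calls.
defmodule FilterSketch do
  def search(query, vector_search, llm_flags_irrelevant?, opts \\ []) do
    candidates = Keyword.get(opts, :candidates, 30)
    final = Keyword.get(opts, :final, 5)

    query
    |> vector_search.(candidates)
    |> Enum.reject(fn chunk -> llm_flags_irrelevant?.(query, chunk) end)
    |> Enum.take(final)
  end
end
```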

What do you think of this approach?

One concern I still have relates to the chunking mechanism itself—since valuable context can be lost within individual chunks, it would be ideal to remove the need for chunking altogether, as the blog post mentions.

Yes, that’s the general idea. I would experiment with keeping the chunking (but strictly for search), and then sending each document that the chunk was excerpted from, in its entirety, and having the LLM provide back the relevant subset of context. (Which may need to happen across parallel calls, depending on the number and size of documents.)
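Roughly, something along these lines. summarize_with_llm/2 and the source_document field are placeholders; this is a sketch of the idea, not working code from any of these projects:

```elixir
# Sketch of the ReAG-style step: use chunk search only to identify the source
# documents, then send each full document plus the user's ask to an LLM in
# parallel, collecting the summarized context.
defmodule ReagSketch do
  def gather_context(user_ask, relevant_chunks, summarize_with_llm) do
    relevant_chunks
    |> Enum.map(& &1.source_document)
    |> Enum.uniq()
    |> Task.async_stream(
      fn doc_path -> summarize_with_llm.(user_ask, File.read!(doc_path)) end,
      max_concurrency: 4,
      timeout: :timer.minutes(2)
    )
    |> Enum.map(fn {:ok, summary} -> summary end)
  end
end
```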

The challenge with the isIrrelevant approach is not so much the flagging of what’s relevant or not, but rather that the initial vector similarity search is probably not going to pick the best “chunks”; i.e., you’ll get results that are similar to the user’s query, but not the best ones for actually solving their problem.

But yes, the cost is a lot more latency (and tokens). Might be worth exposing a “fast” tool that exclusively does RAG, and another tool that takes this alternative approach.

(One other thought: I don’t think MCP supports returning another tool call, or prompt(s) from a tool call, but if it did that could be a neat way to implement ReAG, as it gives more guidance/control back to the MCP client.)
