HexDocs MCP - Semantic Search for Hex Documentation, Right in Your Editor ✨

:wave: Elixir community! I’m excited to announce HexDocs MCP. The project is an MCP server that I developed to improve my workflow, and I hope it helps you too!

Summary

HexDocs MCP offers semantic search capabilities for Hex package documentation—specifically designed for AI applications. It has two main components:

  1. Elixir Package: Downloads, processes, and generates embeddings from Hex package documentation.

  2. TypeScript Server: Implements the Model Context Protocol (MCP) to provide a searchable interface to the embeddings.

This project aims to assist developers by giving AI assistants like Cursor, Claude Desktop, Windsurf, etc. better context when working with Elixir code. When your AI assistant needs details about a specific Hex package function or module, HexDocs MCP retrieves the most relevant documentation snippets using vector embedding search.
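At a high level, the retrieval step boils down to embedding the query and ranking the stored chunk embeddings by similarity. Here is a minimal sketch of that ranking in Elixir, assuming hypothetical module and field names and that the query embedding has already been produced (by Ollama in this project's case); it is illustrative, not the project's actual code:

```elixir
defmodule DocsSearch do
  @moduledoc """
  Naive vector search over pre-computed chunk embeddings.
  Hypothetical sketch, not HexDocs MCP's actual implementation.
  """

  # chunks :: [%{text: String.t(), embedding: [float()]}]
  def top_matches(query_embedding, chunks, k \\ 5) do
    chunks
    |> Enum.map(fn chunk -> {cosine_similarity(query_embedding, chunk.embedding), chunk.text} end)
    |> Enum.sort_by(fn {score, _text} -> score end, :desc)
    |> Enum.take(k)
  end

  defp cosine_similarity(a, b) do
    dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    dot / (norm(a) * norm(b))
  end

  defp norm(v), do: :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end))
end
```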

Features

• Provides a wrapper around mix hex.docs fetch (exposed as mix hex.docs.mcp fetch) to download and process Hex documentation
• Generates embeddings using Ollama (with nomic-embed-text as the default)
• Works with MCP-compatible clients

Why HexDocs MCP?

This project is a follow-up to a previous post. I really like Ash, but wow—AI is still pretty bad at writing code for our ecosystem. Although it’s improved over the last few months, AI still struggles in many areas. Plus, the MCP protocol has taken off more than I expected, so I felt it was time to put something out there that we can learn from, improve, and iterate on.

Future potential

  • Completely remove JavaScript from the project - I added it because JavaScript currently has the most MCP support
  • Use Bumblebee instead of Ollama - I wasn’t sure how to use Bumblebee in this context. Could we remove the Ollama requirement?
  • Other ideas?

Acknowledgement

Big shoutout to @mjrusso for laying the groundwork with the hex2text project!

17 Likes

This looks great, thanks for building and contributing it. MCP is a very (very!) new thing to me, and I’m behind the curve on AI-assisted development.

Do you (or anyone else) have any recommendations on how to write and structure library documentation to improve the quality of the embeddings that are derived from docs? I’m not sure I even phrased the question correctly but I’m optimistic you know what I mean!

1 Like

I haven’t personally hopped on the AI editor trend, but a lot of people have been asking for stuff like this, so nice work :slight_smile:

I think you’re looking for this. I wonder, if you had used your tool to find this documentation, could it have improved itself?

Hey, great question! I think I get what you’re asking—are you thinking specifically about HexDocs MCP or just more generally? There are a lot of small optimizations the MCP server could make that would help, but they’d require a decent amount of work. :sweat_smile:

At a high level, I see this in two parts: the chunking mechanism and the search/re-ranking process.

For chunking, my current approach is pretty naive. I’m using TextChunker, which works with Markdown. That said, it’d be nice to have a smarter, more dynamic way to split content—maybe even something tailored specifically to HexDocs? I’m not totally sure what the best solution is here, but I definitely think there’s room for improvement. I’m still pretty new to all of this, so I haven’t nailed down a solid direction yet.
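To make “tailored to HexDocs” a bit more concrete, one naive direction would be splitting on section headings so that each chunk (and therefore each embedding) maps to a single documented module or function. A toy sketch, purely illustrative and not what the project does today:

```elixir
defmodule HeadingChunker do
  @moduledoc """
  Illustrative only: splits markdown on level-1/2 headings so each chunk
  covers a single doc section. Real HexDocs pages would need more care.
  """

  def chunk(markdown) do
    markdown
    # ~R performs no interpolation, so #{1,2} is a plain regex quantifier here
    |> String.split(~R/^#{1,2}\s/m)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end
```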

As for re-ranking, I think one of the most impactful things we could do is give RAG systems better signals about which embeddings should carry more weight. For example, if libraries included an llms.txt file—similar to what LangGraph suggests here—then tooling could prioritize content referenced in that file, treating it as more central or authoritative.
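As a toy example of the kind of weighting I mean (hypothetical field names; nothing like this exists in HexDocs MCP yet), chunks whose source file is listed in the package’s llms.txt could simply get their similarity score boosted before the final ranking:

```elixir
defmodule LlmsTxtBoost do
  # Purely hypothetical: boost chunks whose source file appears in llms.txt.
  # scored_chunks :: [{score :: float(), chunk :: %{source_path: String.t(), ...}}]
  def rerank(scored_chunks, llms_txt_paths, boost \\ 1.25) do
    scored_chunks
    |> Enum.map(fn {score, chunk} ->
      if chunk.source_path in llms_txt_paths, do: {score * boost, chunk}, else: {score, chunk}
    end)
    |> Enum.sort_by(fn {score, _chunk} -> score end, :desc)
  end
end
```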

The tricky part, though, is how to maintain the llms.txt file as a library author. How do we decide what goes in it? How do we keep it up to date over time? I don’t have a good answer for that yet.

Hey, I actually considered this! I just wasn’t sure how to use Bumblebee alongside the stdio MCP server, which is written in JavaScript. I think it might be possible to drop the Ollama dependency if I used a native JavaScript library for embeddings—I just wasn’t sure which one to use. Ollama was super easy to get going since it works well with both Elixir and JavaScript, so I went with that for now. If you or someone could provide some direction here, I’d be open to a better solution!
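For anyone curious, here is roughly what the Bumblebee route might look like on the Elixir side. The model is just a stand-in (I haven’t verified nomic-embed-text under Bumblebee), and the open question above still stands: how to expose this to the JavaScript stdio server.

```elixir
# Rough sketch of Bumblebee-based embeddings; example model, not a tested setup.
repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

serving = Bumblebee.Text.text_embedding(model_info, tokenizer, embedding_processor: :l2_norm)

# Returns a map with the embedding tensor for the given text.
%{embedding: embedding} = Nx.Serving.run(serving, "How do I start a GenServer?")
```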

1 Like

@bradley, this is sweet, thanks for building this!

A few misc comments:

  • Bumblebee really only makes sense to use if you’re willing to port the server to Elixir. Note that @thmsmlr has a (STDIO transport) MCP server implementation as part of Livebook Tools: livebook_tools/lib/livebook_tools/mcp_server.ex at master · thmsmlr/livebook_tools · GitHub

  • This is a bigger discussion, and I’m not going to do it justice in this quick comment, but something to consider: using RAG for search, but not for context. RAG picks out relevant chunks and thus identifies relevant modules/docs, but separate LLM call(s) are made (with the entire contents of that module’s documentation, and a summary of the user’s ask), returning new LLM-summarized text that can then be passed off to the coding agent. See ReAG: Reasoning-Augmented Generation  - Superagent for a better explanation (although that’s not exactly what I’m proposing, but it’s along the same lines).

My preferred tools don’t yet support MCP, so I haven’t played around too much, but the above is an approach I’ve been planning on playing around with.

1 Like

Thanks for the excellent read! TIL ReAG.

One decision I made when designing this library was to limit the number of retrieval results to avoid overwhelming the LLM’s context window. However, after reflecting on your points, I’m considering an alternative approach: introducing an intermediate filtering step after the embedding retrieval but before delivering the final results.

Specifically, the idea would be to retrieve a broader set of results initially, then use the LLM itself to iterate through these and apply an isIrrelevant: true filter, discarding entries that aren’t contextually relevant. Although this would likely increase response latency, I believe it could significantly enhance the relevance and overall quality of the results.
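Roughly, that filtering step could be shaped like this (hypothetical names, with the actual LLM call stubbed out):

```elixir
defmodule RelevanceFilter do
  @moduledoc """
  Hypothetical sketch of the intermediate filtering step described above.
  """

  # `irrelevant?/2` stands in for a real LLM call that answers whether a
  # chunk is irrelevant to the query (the isIrrelevant: true idea).
  def filter(query, candidate_chunks, opts \\ []) do
    keep = Keyword.get(opts, :keep, 5)

    candidate_chunks
    |> Enum.reject(fn chunk -> irrelevant?(query, chunk) end)
    |> Enum.take(keep)
  end

  defp irrelevant?(_query, _chunk) do
    # Placeholder: a real implementation would prompt the model per chunk
    # (or in batches) and parse a boolean out of the response.
    false
  end
end
```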

What do you think of this approach?

One concern I still have relates to the chunking mechanism itself—since valuable context could be lost within individual chunks, it would be ideal to remove the need for chunking altogether, as the blog post mentions.

1 Like

Yes, that’s the general idea. I would experiment with keeping the chunking (but strictly for search), and then sending each document that the chunk was excerpted from in its entirety and having the LLM provide back the relevant subset of context. (Which may need to happen across parallel calls, depending on the number and size of documents.)

The challenge with the isIrrelevant approach is not so much the flagging of what’s relevant or not, but rather that the initial vector similarity search is probably not going to pick the best “chunks”; i.e., you’ll get results that are similar to the user’s query, but not the best ones for actually solving their problem.

But yes, the cost is a lot more latency (and tokens). Might be worth exposing a “fast” tool that exclusively does RAG, and another tool that takes this alternative approach.

(One other thought: I don’t think MCP supports returning another tool call, or prompt(s) from a tool call, but if it did that could be a neat way to implement ReAG, as it gives more guidance/control back to the MCP client.)

1 Like

I was thinking specifically about what documentation should be written to maximise the chances of a developer getting a relevant and useful result from MCP-based searches.

For example, most of my libs have reasonable API documentation. But they definitely lack tutorial-type documentation. If the trend is towards more AI-assisted help then perhaps I need to be paying more immediate attention to different types of documentation.

2 Likes

Offhand, these are my thoughts, but I’m curious if anyone else agrees/disagrees.

Order of importance:

  1. Force runtime errors/warnings to surface at compile time wherever possible
  2. Minimize ambiguity via types, be it via behaviours, specs, etc.
  3. If none of this is possible, provide examples via documentation/doctests (see the sketch below)
  4. Add an llms.txt as mentioned above so devs could use mcpdoc or similar
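On point 3, doctests are nice because the example is both documentation and a test, so it can’t silently rot. A minimal, generic illustration (not taken from any particular library):

```elixir
defmodule MyLib.Slug do
  @doc """
  Turns a title into a URL-safe slug.

  ## Examples

      iex> MyLib.Slug.slugify("Hello, Elixir World!")
      "hello-elixir-world"
  """
  def slugify(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9\s-]/, "")
    |> String.split()
    |> Enum.join("-")
  end
end
```

Adding doctest MyLib.Slug to the test suite is what keeps an example like this verified.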
2 Likes

Hey all! I’ve put out v0.2.0-rc.1 but am waiting to release it until I’m more confident in how well it works. Is anyone willing to try it out/give me feedback? I would be eternally grateful!

The main updates are:

  • Added a fetch command to allow the MCP server to auto-fetch docs
  • You no longer need to add hexdocs_mcp to your mix.exs. Reasons for this were:
    1. To reduce toil and avoid having to keep dependencies up to date for a non-critical tool
    2. I like to switch between various projects, and having to remember to install it in every project just to run mix hex.docs.mcp fetch ... isn’t ideal for my workflow
    3. To make installation as easy as possible for folks. Theoretically you don’t even need to have elixir installed to use the mcp server, though I haven’t tested this.

You can see all of the details and a migration guide in the release notes.

I will admit, it feels odd that the primary entrypoint to this project is a Node app, but I think it’s right for now. Once the MCP ecosystem evolves and is more stable, I think migrating to an Elixir MCP server would make a lot of sense. I’m open to ideas here if you have any!

2 Likes

I went ahead and pushed v0.2.0. It’s the same as the rc version I mention above. Check out the changelog/readme for details!

2 Likes

I just pushed v0.3.0 with a few minor improvements:

  • You can now search across all hexdocs you’ve already fetched without having to specify a package name. I personally am very excited about this one.
  • I added the --project flag to the mix cli to make it possible to fetch all package docs from a given mix.exs. I’m considering adding it to the MCP server but it’s so slow that the experience via MCP fetch is pretty bad for large projects.
1 Like

Thank you for your work on this.

Stupid question - I have added hexdocs-mcp to my Claude desktop client. It shows as running in Claude. How do I get Claude to actually pull information from a library using hexdocs-mcp? Right now it does a web search if I ask it about a specific Phoenix library.

1 Like

There is a tooling guide for the Claude Code CLI; I read it yesterday. You need to add the MCP server to the CLI tool as well.

Hey, great question! I don’t use the Claude desktop app, so I’m not sure if this issue is specific to that. When I use Claude for code, I often prompt it with “look it up on HexDocs” if the model keeps reverting to web search. That usually does the trick. LMK if that works for you or not!

Thanks for the quick response.

Using “fetch documentation for ash authentication in hexdocs” got it going.

I had an error after that, and I did have a question as to whether I need to install Ollama locally (or whether the tool’s download did that).

Yes, unfortunately, you do need to install Ollama locally. I’m planning to migrate the whole project to Bumblebee eventually, which would eliminate the need for an external dependency, but I’ve run into issues getting it to work on my Mac and I’m hesitant to switch since I’m not confident it’s fully cross-platform compatible yet. If anyone has any experience here, some guidance would be much appreciated.

That isn’t a problem! I will do that and report any learnings I have along the way if it is helpful. Thanks again!

1 Like

I installed Ollama, ran ollama pull nomic-embed-text, then ollama serve, and everything worked fine!

1 Like

I just skimmed the code, but Ollama is primarily making embeddings, right? What model are you using? I wasn’t able to easily find it using GitHub on my phone.