Hi! Over the last few weeks, we have been working on HexDocs Search, a cross-package function/type/callback documentation search interface containing data for (almost) all packages currently on hexdocs.pm.
To give it a try, visit hexdocs.pandosearch.com and just start typing.
We are Pandosearch, a Dutch company offering search engines for websites and web applications. We have been working with Elixir since 2018 and love the language and its ecosystem.
We regularly try to give back by contributing to Elixir open source projects, but felt we could do more.
Seeing a community need for cross-package HexDocs search, we thought: this is our call!
HexDocs Search currently contains around 433K functions, types and callbacks. Most HexDocs packages are included, but not all. See “Known limitations” for more details on this.
We collect data by crawling hexdocs.pm twice a day. We use the root sitemap.xml index file on hexdocs.pm as a starting point and use all the URLs found there.
lastmod values from the XML sitemaps are used to ensure we only recrawl changed pages. This enables us to do a full recrawl in less than ten minutes. This strategy also has the added benefit of avoiding massive request loads on hexdocs.pm.
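To make the lastmod strategy a bit more concrete, here is a minimal Python sketch of the general idea. It is not our actual crawler code: the function names, the in-memory `seen` dictionary, and the sample package URLs are all hypothetical and only there for illustration. It assumes the standard sitemaps.org XML namespace and treats entries without a lastmod value as always changed.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmods(xml_text):
    """Return {loc: lastmod} for every <sitemap> or <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.findall("sm:sitemap", NS) + root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


def changed_locations(xml_text, seen):
    """Compare a sitemap against lastmod values stored by the previous crawl.

    Returns (changed, current): the locations that need a recrawl, and the
    fresh {loc: lastmod} mapping to store for the next run.
    """
    current = parse_lastmods(xml_text)
    changed = [
        loc
        for loc, lastmod in current.items()
        # No lastmod at all: we cannot tell, so recrawl to be safe.
        if not lastmod or seen.get(loc) != lastmod
    ]
    return changed, current


# Hypothetical sitemap index; package names and URL layout are made up.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://hexdocs.pm/example_a/sitemap.xml</loc>
    <lastmod>2024-01-02T00:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://hexdocs.pm/example_b/sitemap.xml</loc>
    <lastmod>2024-01-05T00:00:00Z</lastmod>
  </sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    # State remembered from an earlier crawl: example_b has changed since then.
    seen = {
        "https://hexdocs.pm/example_a/sitemap.xml": "2024-01-02T00:00:00Z",
        "https://hexdocs.pm/example_b/sitemap.xml": "2024-01-01T00:00:00Z",
    }
    changed, seen = changed_locations(SAMPLE_INDEX, seen)
    print(changed)
```

Because only entries whose lastmod differs from the stored value are fetched again, an unchanged package costs no extra requests beyond reading the sitemap itself.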
The current project status is probably best described as “beta”. Response times should be quick and the project should generally be up 24/7, but we are still actively working on some details, so a bit of downtime is to be expected every once in a while.
Our first goal after posting this announcement is to gather feedback from the community. This means we need you, as you are probably an Elixir developer reading this!
Any feedback is greatly appreciated. If possible, just use this forum thread, but you can also send a private message if deemed more appropriate. We’ll get back to you as quickly as possible.
A few days before posting this announcement, we offered @josevalim early access to gather some initial feedback. The reason for contacting him was this recently added GitHub issue, which outlines a set of features similar to what our project aims to offer.
We are currently in the process of determining if (parts of) HexDocs Search could be integrated into ExDoc/HexDocs for cross-package search. As the outcome of this process is still highly uncertain, José encouraged us to wait no longer and go live with what we have right now. In other words: here we are :).
For future features, we are very much open to suggestions, as our main goal is to maximize value to the community. As such, we deliberately do not have a fixed feature roadmap right now.
We deliberately decided to limit our scope to function/type/callback documentation for now. This means no module documentation, guides and/or other ExDoc-generated HTML pages are included in the results yet.
An important reason for this is that in contrast to single-package search, including these content types for all of HexDocs is quite challenging in terms of correctly ranking and identifying search results.
In addition, due to the more free-form nature of ExDoc guide pages and module documentation, there aren’t any hard guarantees about the exact structure of the HTML output, especially as Markdown also allows raw HTML to be used. This gives us some additional challenges when crawling and rendering these types of content.
Due to ExDoc HTML changes over the years, searchable content is limited to HTML generated by ExDoc versions starting from v0.20.0 (released April 2, 2019). If your package is missing, update your ExDoc dependency version before releasing a new package version. Our crawler should pick up the newly generated documentation within 24 hours.
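As an illustration of the version cutoff, here is a small Python sketch of a "v0.20.0 or newer" check. How the ExDoc version of a page is determined is left out here; the function names are hypothetical, and this is a sketch of the comparison only, not our implementation. The main point is that versions need to be compared numerically, since a plain string comparison would rank "0.9.9" above "0.20.0".

```python
import re

# Minimum ExDoc version whose HTML output is searchable (from the text above).
MIN_EXDOC = (0, 20, 0)


def parse_version(text):
    """Turn 'v0.20.0' or '0.20.0' into a tuple of integers.

    Any pre-release or build suffix after the patch number is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_searchable(exdoc_version):
    """Return True when docs generated by this ExDoc version meet the cutoff."""
    return parse_version(exdoc_version) >= MIN_EXDOC


if __name__ == "__main__":
    for version in ["v0.19.3", "0.9.9", "v0.20.0", "0.31.1"]:
        print(version, is_searchable(version))
```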
Also due to ExDoc HTML/CSS changes over the years, documentation HTML is loaded in an <iframe> using the original hexdocs.pm CSS. In most cases, this ensures styling is the same as on hexdocs.pm, but some edge cases may have some minor rendering issues.
Dark and light mode are supported, but only based on system preferences. Manually selected theme preferences set on hexdocs.pm cannot be respected due to localStorage being scoped to a domain.
That’s it for now. Looking forward to your feedback!