Tidewave has just been announced by José Valim

TL;DR:

Here, for discussion, are some speculative notions sparked by recent work by José Valim, Chris McCord, et al.

Background

José has been shepherding Elixir’s set-theoretic type system for a few years now and things appear to be proceeding quite smoothly (yay!). The current work concentrates on detecting type conflicts in code and warning programmers about them.

Unlike Dialyzer, which works best when code carries typespecs, the new facility requires no declarations; it analyzes the code itself. Although it attempts to handle all legal code, the type system project may issue deprecations as needed to handle corner cases.

More recently, José has announced Tidewave, an Elixir-friendly, open source LLM (etc.) agent. Because Tidewave uses the Model Context Protocol (MCP), it can interoperate (to varying degrees ;-}) with a variety of LLMs, text editors, etc. And, of course, it has access to the entire Elixir / Erlang toolbox (e.g., BEAM, Ecto, Phoenix, …).

Chris McCord has also been working on similar tooling. However, his approach uses containerization (e.g., on Fly.io), rather than the locally hosted approach taken by José. No matter; it’s all good…

Discussion

My basic notion is that there is an opportunity to integrate these (and other) efforts, collecting configuration and operational data and making it available to programmers in a more useful way.

Tidewave, LLMs, and data

Tidewave leans heavily on LLMs, so let’s begin there. Although an LLM can digest many sorts of data, not all data is created equal (GIGO). Ideally, one would give it clean, correct, well-structured data that contains summary, temporal, and other information.

Data sources

There are various ways that the compilation process and/or the BEAM could supply information for use by the LLM. For example:

  • The new type system could provide type information for every data instance of interest.
  • Information on libraries could be harvested from HexDocs, Hex, etc.
  • Source code and configuration files could be harvested by ElixirLS.
  • Compiled code (e.g., Elixir AST, Erlang Abstract Format) could be processed to create a graph of possible call-trees.
  • At runtime, the BEAM could track significant events (e.g., message handling, process creation). Similarly, the Phoenix Telemetry facility could supply useful operational data.
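To make the call-tree item a bit more concrete, here is a rough sketch of the kind of structure I have in mind. It is written in Python purely for illustration. The (caller, callee) pairs, the module names, and the helpers `build_call_graph` and `reachable` are all made up; nothing here reflects what the compiler, ElixirLS, or Tidewave actually emits.

```python
from collections import defaultdict, deque

# Hypothetical (caller, callee) pairs, standing in for whatever a pass
# over the compiled code might extract. These names are invented.
CALLS = [
    ("MyApp.Web.index/2", "MyApp.Accounts.list_users/0"),
    ("MyApp.Accounts.list_users/0", "MyApp.Repo.all/1"),
    ("MyApp.Web.show/2", "MyApp.Accounts.get_user/1"),
    ("MyApp.Accounts.get_user/1", "MyApp.Repo.get/2"),
]


def build_call_graph(calls):
    """Map each caller to the set of functions it may call directly."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return graph


def reachable(graph, start):
    """Return every function that may be reached from `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # .get() avoids creating empty entries for leaf functions.
        for callee in graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


if __name__ == "__main__":
    graph = build_call_graph(CALLS)
    for name in sorted(reachable(graph, "MyApp.Web.index/2")):
        print(name)
```

An LLM handed the output of something like `reachable` would at least know which functions could be involved in a given request, rather than having to guess from source text.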

Data Storage

For best results, a mechanism will be needed to store the collected data. Graph representations are my preference, because of their extreme flexibility and generality. Graph databases (e.g., ArangoDB and Neo4j) provide both convenience and high performance.

Both the ArangoDB and Neo4j folks are hard at work on integration with LLMs, etc. However, other database approaches (e.g., relational) can generally be coerced into handling graphs (YMMV), and common formats (e.g., JSON) can be used for data exchange. So, graph databases aren’t the only option.
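As a small illustration of the "relational can be coerced" point, here is a sketch that keeps a graph in two SQLite tables, walks it with a recursive query, and dumps it as JSON. Again, this is Python for illustration only. The table layout, the "spawned" edge label, and the helper names are my own assumptions, not anything Tidewave or a graph database prescribes.

```python
import json
import sqlite3


def make_store():
    """Create an in-memory store with one table for nodes and one for edges."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT)")
    db.execute("CREATE TABLE edges (src TEXT, dst TEXT, label TEXT)")
    return db


def add_node(db, node_id, kind):
    db.execute("INSERT INTO nodes VALUES (?, ?)", (node_id, kind))


def add_edge(db, src, dst, label):
    db.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, dst, label))


def descendants(db, root, label):
    """Follow edges with the given label from `root`, transitively."""
    rows = db.execute(
        """
        WITH RECURSIVE walk(id) AS (
            SELECT dst FROM edges WHERE src = ? AND label = ?
            UNION
            SELECT e.dst FROM edges e JOIN walk w ON e.src = w.id
            WHERE e.label = ?
        )
        SELECT id FROM walk ORDER BY id
        """,
        (root, label, label),
    )
    return [row[0] for row in rows]


def export_json(db):
    """Serialize the whole graph as JSON for exchange with other tools."""
    nodes = [
        {"id": node_id, "kind": kind}
        for node_id, kind in db.execute("SELECT id, kind FROM nodes ORDER BY id")
    ]
    edges = [
        {"src": src, "dst": dst, "label": label}
        for src, dst, label in db.execute(
            "SELECT src, dst, label FROM edges ORDER BY src, dst"
        )
    ]
    return json.dumps({"nodes": nodes, "edges": edges})


if __name__ == "__main__":
    db = make_store()
    for node_id in ("sup", "worker_a", "worker_b", "task_1"):
        add_node(db, node_id, "process")
    add_edge(db, "sup", "worker_a", "spawned")
    add_edge(db, "sup", "worker_b", "spawned")
    add_edge(db, "worker_a", "task_1", "spawned")
    print(descendants(db, "sup", "spawned"))
    print(export_json(db))
```

This works, but every new kind of traversal means another hand-written recursive query, which is a large part of why I lean toward a real graph database.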

Use Cases

Given this combination of static and dynamic information sources, Tidewave could describe a running Elixir system in fairly arbitrary and detailed ways. For example:

  • Diagram the active processes, showing ancestry (as in Observer), message traffic (both potential and active), etc.
  • Report on processes receiving unmatched message formats.
  • Report on data type mismatches in handling received messages.
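For instance, the "unmatched message formats" report could boil down to comparing what each process is known to handle (static information) against what it was actually sent (runtime information). A toy sketch, in Python for illustration; the process names, message tags, and the `unmatched_messages` helper are invented, and real data would presumably come from the sources listed earlier.

```python
from collections import Counter

# Hypothetical static data: message tags each process is known to handle.
HANDLED = {
    "Cart.Server": {"add_item", "remove_item"},
    "Mailer.Worker": {"deliver"},
}

# Hypothetical runtime data: (receiving process, message tag) observations.
OBSERVED = [
    ("Cart.Server", "add_item"),
    ("Cart.Server", "checkout"),
    ("Cart.Server", "checkout"),
    ("Mailer.Worker", "deliver"),
    ("Mailer.Worker", "retry"),
]


def unmatched_messages(handled, observed):
    """Count observed messages that no known clause of the receiver handles."""
    report = Counter()
    for process, tag in observed:
        # A process missing from `handled` is treated as handling nothing.
        if tag not in handled.get(process, set()):
            report[(process, tag)] += 1
    return dict(report)


if __name__ == "__main__":
    for (process, tag), count in sorted(unmatched_messages(HANDLED, OBSERVED).items()):
        print(f"{process} received unhandled message {tag!r} {count} time(s)")
```

The interesting part is not the code, of course, but having both halves of the data available in one place so that a question like this can be asked at all.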

Here’s hoping that this sort of data collection and integration will start showing up soon. (ducks)

-r
