Working on lib_elixir - Elixir core modules as a library

zachallaun · August 12, 2024, 9:11pm

Hmm, just writing the abstract forms to a file is an interesting option that I hadn’t thought of. I’d like to think through the Pros/Cons compared to namespacing the abstract forms during compilation.

Pros:

Faster compilation because namespacing happens ahead of time.
Multiple libraries can depend on and use the same lib_elixir instead of having to re-namespace per library.
Easier setup, just add dependency like :lib_elixir_1_17_2 and then use LibElixir_1_17_2.Code (etc.) instead of having to set up custom config, pre-download, and namespace the abstract forms. (Though that would only happen once.)

Cons:

More difficult maintainership, or at least more complex setup to get automations in place to distribute multiple lib_elixir versions, stay up-to-date with releases, etc.
Possibly misusing the hex package registry by encoding the Elixir version in the package name. :lib_elixir_1_17_2 instead of :lib_elixir. (This is necessary for different libraries to depend on different versions.)
Doesn’t support building arbitrary ref, e.g. whatever the latest dev build is.

Things I missed?

mhanberg · August 13, 2024, 1:20am

Easier setup, just add dependency like :lib_elixir_1_17_2 and then use LibElixir_1_17_2.Code (etc.) instead of having to set up custom config, pre-download, and namespace the abstract forms. (Though that would only happen once.)

My suggestion is that a single package would contain all the different versions of Elixir, so lib_elixir would have, say, LibElixir.V1_17 and LibElixir.V1_18 in it, and you wouldn’t need to publish a new package for each version.

More difficult maintainership, or at least more complex setup to get automations in place to distribute multiple lib_elixir versions, stay up-to-date with releases, etc.

I don’t personally think this would be that difficult; there are only about 6 releases of Elixir per year.

Possibly misusing the hex package registry by encoding the Elixir version in the package name. :lib_elixir_1_17_2 instead of :lib_elixir. (This is necessary for different libraries to depend on different versions.)

Related to my comment above, I don’t think it would be necessary. And to be completely clear, what I mean is that the single package would contain the complete namespaced code for each version of Elixir that is included.

Doesn’t support building arbitrary ref, e.g. whatever the latest dev build is.

I think that the use case for this would be limited, but it could potentially still be covered by the “inside your project” method originally described. Or potentially by a mix task that vendors the abstract forms into the user’s project the same way we would for the library.

zachallaun · August 13, 2024, 1:40am

Got it – that could work.

The use-case that comes to mind is test-driving APIs prior to release, but I agree with you that we could likely include both of these mechanisms for selecting a version.

One thing I forgot to mention: there is at least one incompatibility, which I think was introduced in 1.14, that means the bytecode generated for a lib_elixir 1.14+ can’t be run on Elixir versions earlier than 1.14, and vice versa. I think it is possible to address that with another transform to the abstract code.

That can only happen once we know what version of Elixir we’re running, which I was thinking was a point in favor of delaying namespacing, but if it’s really just the one thing, we could bundle both versions of the bytecode and the Mix compiler could “install” the correct ones once it knows what version is running. I’ll need to experiment a bit more.
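As a rough sketch of the "install the correct ones" step, the choice could come down to a single version check at compile time. The module name and the `:pre_1_14` / `:post_1_14` labels are hypothetical, and the 1.14 boundary is only the guess stated above, not something verified here.

```elixir
defmodule BytecodeVariant do
  # Picks which of two hypothetical bundled bytecode variants matches the
  # Elixir version that is running. The 1.14 cutoff is an assumption.
  def pick(version \\ System.version()) do
    if Version.match?(version, ">= 1.14.0") do
      :post_1_14
    else
      :pre_1_14
    end
  end
end

# With no argument, the currently running Elixir version is used.
variant = BytecodeVariant.pick()
```

Only `System.version/0` and `Version.match?/2` are real APIs here; how a Mix compiler would actually copy the chosen files into place is left open.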

zachallaun · August 16, 2024, 3:30pm

I’ve made some progress on this, but there are a few things that I wanted to get feedback on. First, a bit of context:

Namespacing currently works by starting with a few user-specified modules like [Code, Macro, ...] and then recursively namespacing those modules and any modules they reference. You’ll get MyNamespace.Code, but you’ll also get MyNamespace.Code.Fragment because Code.Fragment is referenced by Code, and you’ll also get anything that Code.Fragment references, etc.
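To make the recursion concrete, here is a minimal sketch of that traversal over a hand-written reference graph. The `NamespaceClosure` module and the `refs` map are illustrative only; this is not how lib_elixir discovers references, which it would have to read from the abstract forms.

```elixir
defmodule NamespaceClosure do
  # Returns every module reachable from `roots` through `refs`, a map from
  # a module to the list of modules it references.
  def reachable(roots, refs), do: walk(roots, refs, MapSet.new())

  defp walk([], _refs, seen), do: seen

  defp walk([mod | rest], refs, seen) do
    if MapSet.member?(seen, mod) do
      # Already visited: skip it so reference cycles terminate.
      walk(rest, refs, seen)
    else
      walk(Map.get(refs, mod, []) ++ rest, refs, MapSet.put(seen, mod))
    end
  end
end

# Illustrative graph, not the real reference data.
refs = %{
  Code => [Code.Fragment],
  Code.Fragment => [Code],
  Macro => []
}

namespaced =
  [Code, Macro]
  |> NamespaceClosure.reachable(refs)
  |> Enum.map(&Module.concat(MyNamespace, &1))
  |> Enum.sort()
```

Even though only `Code` and `Macro` were requested, `MyNamespace.Code.Fragment` ends up in the result because `Code` references it.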

This can be problematic. As a concrete example, when you parse access syntax like foo[:bar], you get:

{{:., [line: 1], [Access, :get]}, [line: 1], [{:foo, [line: 1], nil}, :bar]}

and various tools like Sourceror will look for that [Access, :get] to differentiate between that and Access.get(foo, :bar), which parses differently.

Without intervention, however, lib_elixir will implicitly namespace Access, so you’d get something like:

{{:., [line: 1], [MyNamespace.Access, :get]}, [line: 1], [{:foo, [line: 1], nil}, :bar]}

This is problematic because this AST will be misunderstood by other libraries. To address this, lib_elixir has a hard-coded list of excluded modules, including things like Kernel, Access, etc.
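A small sketch of the kind of check a downstream tool might perform shows why the rewritten atom breaks things. `AccessSyntax` is a made-up module for illustration, not Sourceror's actual code; the matches ignore metadata, since its contents vary between Elixir versions.

```elixir
defmodule AccessSyntax do
  # foo[:bar] parses to a dot call carrying the bare atom Access.
  def kind({{:., _, [Access, :get]}, _, [_container, _key]}), do: :bracket_access

  # Access.get(foo, :bar) carries an :__aliases__ node instead.
  def kind({{:., _, [{:__aliases__, _, [:Access]}, :get]}, _, [_container, _key]}),
    do: :explicit_call

  def kind(_other), do: :other
end

bracket = "foo[:bar]" |> Code.string_to_quoted!() |> AccessSyntax.kind()
explicit = "Access.get(foo, :bar)" |> Code.string_to_quoted!() |> AccessSyntax.kind()

# The AST a namespaced parser would produce if Access were renamed.
renamed_ast =
  {{:., [line: 1], [MyNamespace.Access, :get]}, [line: 1], [{:foo, [line: 1], nil}, :bar]}

renamed = AccessSyntax.kind(renamed_ast)
```

The first two inputs are classified as expected, while the renamed AST falls through to `:other`, which is the misunderstanding described above.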

Given the above, there are certain modules that must be excluded from namespacing, either because module names must be preserved, or for other reasons: String, for instance, is excluded because later versions fail to compile on earlier Elixirs/OTPs.
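In its simplest form, the exclusion rule could look like the sketch below. `NamespaceSketch` and its three-entry list are illustrative only; the real hard-coded list in lib_elixir is longer and is not reproduced here.

```elixir
defmodule NamespaceSketch do
  # Illustrative subset of excluded modules, taken from the examples above.
  @excluded [Kernel, Access, String]

  # Leaves excluded modules untouched and prefixes everything else.
  def rename(module, namespace) when is_atom(module) do
    if module in @excluded do
      module
    else
      Module.concat(namespace, module)
    end
  end
end

kept = NamespaceSketch.rename(Access, MyNamespace)
moved = NamespaceSketch.rename(Code.Fragment, MyNamespace)
```

Here `kept` is still `Access`, while `moved` becomes `MyNamespace.Code.Fragment`.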

So I have some thoughts and questions on what and what not to include:

While this idea was introduced as lib_elixir, all of the current intended use-cases make use of only three root modules and their descendants: Code, Macro, and Module. (Though @dorgan has a use-case for including Mix.Tasks.Format as well.)

As a simplification, should this project explicitly target those modules and be called something like lib_code instead? Are there other modules that people may want to use?
Alternatively, should it be the library author’s responsibility to explicitly exclude everything they want excluded? E.g. Access would be namespaced unless you excluded it. Perhaps the only modules that are excluded automatically are those that cannot be included for compatibility reasons, like String?
Should transitive dependencies be accessible directly within the namespace? For instance, if you include Code, Code.Fragment could be namespaced as MyNamespace.DEP.Code.Fragment to prevent its accidental use. This also means the Access example above could result in something like {:., [], [MyNamespace.DEP.Access, :get]} depending on the decisions from the above two questions.

My personal opinion is that the library should take a strong stance on what’s included and excluded, perhaps with no configuration at all:
- Rename to lib_code.
- Only include modules related to metaprogramming, like Code, Macro, Module, :elixir_tokenizer as it’s used by a number of projects, maybe Mix.Tasks.Format, and their descendants.
- Exclude everything not strictly necessary, like Kernel, Access, String, etc.
- Munge the names of transitive dependencies that must be included but shouldn’t be used, like Mix.Task (if Mix.Tasks.Format is included), namespacing it as MyNamespace.DEP.Mix.Task or similar.
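
The naming rule in that last point could be sketched roughly as follows. `DepNaming`, the `DEP` segment, and the explicit `roots` and `excluded` arguments are placeholders for illustration; whether descendants such as Code.Fragment count as roots or as dependencies is one of the open questions above, so this sketch treats only the listed roots as public.

```elixir
defmodule DepNaming do
  # Decides the final name of `module`:
  #   - excluded modules keep their original name,
  #   - explicitly requested roots go directly under the namespace,
  #   - everything else is tucked under <namespace>.DEP.
  def target(module, namespace, roots, excluded) do
    cond do
      module in excluded -> module
      module in roots -> Module.concat(namespace, module)
      true -> Module.concat([namespace, DEP, module])
    end
  end
end

roots = [Code, Macro, Module, Mix.Tasks.Format]
excluded = [Kernel, Access, String]

public = DepNaming.target(Mix.Tasks.Format, MyNamespace, roots, excluded)
hidden = DepNaming.target(Mix.Task, MyNamespace, roots, excluded)
```

With these inputs, `public` is `MyNamespace.Mix.Tasks.Format` and `hidden` is `MyNamespace.DEP.Mix.Task`, so the transitive dependency is still available to the namespaced code but is clearly marked as not for direct use.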

I’d love to hear thoughts on the above from folks!