CodeGen - simple, succinct and customizable code generation for your libraries

CodeGen

(experimental!) code here: GitHub - tmbb/ex_code_gen: Flexible code generation for Elixir

Suppose you want your library to generate some code in the user’s module. That is pretty common in Elixir, and it’s usually done through the use XXX macro. That approach is flexible and succinct, in that it doesn’t add any source code to the user’s file. At the other extreme, you have code generators (normally invoked as mix package.gen.something ...). These generators dump literal source code into the middle of your project, often adding hundreds of lines, but they let you customize those lines as much as you want.

However, up until now there hasn’t been a simple way of combining the two approaches above. Ideally, one would use a module but retain the ability to add the literal source code of several functions so that they can be customized. This (experimental!) package was prompted by the discussion here: https://elixirforum.com/t/phoenix-1-7-feels-a-little-bit-locked-in-with-tailwindcss/54468/43

I’m tagging @josevalim and @thiagomajesk because of that discussion.

Let’s look at an example. First, let’s define a code template, which is simply a module that defines a __code_gen__(options) function (not a macro; there’s no need for it to be one!) which returns the AST of the code we want to inject into the module. The example looks long, but that’s mainly because it’s heavily commented; the code actually injected is very simple:

defmodule CodeGenTemplate do
  def __code_gen__(opts) do
    c1 = Keyword.get(opts, :c1, 1)
    c2 = Keyword.get(opts, :c2, 2)
    c3 = Keyword.get(opts, :c3, 3)

    quote do
      # Define some code inside a block, which we'll be able
      # to dump into our own module if we want
      CodeGen.block "f1/1" do
        @constant1 unquote(c1)

        def f1(x) do
          x + @constant1
        end
      end

      # Another code block
      CodeGen.block "f2/1" do
        def f2(x), do: x + unquote(c2)
      end

      # This code block is more complex and has comments
      CodeGen.block "f3/1" do
        # Comments are completely removed by Elixir's parser.
        # There is no idiomatic way of preserving comments while
        # still allowing easy AST manipulation.
        #
        # The workaround is to use these custom attributes
        # which are completely removed from the AST and replaced
        # by comments when you dump the source code in the module
        @comment__ "This is a comment outside the function"
        @comment__ "This is a another comment outside the function"
        @comment__ "Yet another comment"
        @newline_after_comment__
        def f3(x) do
          @comment__ "this is a comment inside the function"
          x + unquote(c3)
        end
      end

      # You can define functions outside a code block
      # if you don't want the user to be able to redefine the function.
      def another_function() do
        # ...
      end

      # We need to mark some functions as overridable so that we can actually
      # dump their source code into the module and things will work.
      # It's too hard for CodeGen to understand which functions are being generated
      # and generate this list on its own.
      defoverridable f1: 1, f2: 1, f3: 1
    end
  end
end

Now, we can use that code in a module:

defmodule CodeGenExample do
  use CodeGen,
    # The `:module` is our template, which must define a `__code_gen__(options)` function
    module: CodeGenTemplate,
    # Use the options to customize the generated code
    options: [
      c1: 7
    ]
end

Looking at the example module, we see that the code above is quite similar to a normal use CodeGenTemplate invocation, except that it is more explicit about the fact that we’re using the CodeGen module, which supports additional features.
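To make the mechanism concrete, here is a minimal sketch of what a use-style wrapper around a __code_gen__/1 template could look like. It is not CodeGen’s actual implementation: MiniCodeGen, MiniTemplate and MiniExample are hypothetical names, named blocks are ignored entirely, and the options are assumed to be literals. It only shows the core idea: at compile time, expand the :module alias, call its plain __code_gen__/1 function with the options, and inject the AST it returns.

```elixir
defmodule MiniTemplate do
  # A plain function (not a macro) that returns the AST to inject.
  def __code_gen__(opts) do
    c1 = Keyword.get(opts, :c1, 1)

    quote do
      @constant1 unquote(c1)
      def f1(x), do: x + @constant1
      defoverridable f1: 1
    end
  end
end

defmodule MiniCodeGen do
  # Hypothetical stand-in for `use CodeGen, module: ..., options: ...`.
  defmacro __using__(args) do
    # `args` arrives as AST, so expand the alias to get the real module atom.
    module = args |> Keyword.fetch!(:module) |> Macro.expand(__CALLER__)
    # Assumption: options are literals (atoms, numbers, strings, lists),
    # which are their own AST and can be passed through unchanged.
    options = Keyword.get(args, :options, [])
    module.__code_gen__(options)
  end
end

defmodule MiniExample do
  use MiniCodeGen, module: MiniTemplate, options: [c1: 7]
end
```

With this sketch, MiniExample.f1(1) returns 8. Note that later in this thread the author reports that, for compile-time dependency reasons, the real template hook ended up needing to be a macro.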

Now, suppose we want to customize the f1/1 function. The natural way to do it is to edit the source code directly, but the problem with macros that generate code is that there is no source code for us to edit! This is where the special features of the CodeGen module become useful. Remember that we have defined a number of named blocks. First, we can query the module to see which block names are available (of course, the author of the template module should make them clear in the documentation, but querying the block names is always helpful):

iex> CodeGen.block_names(CodeGenExample)
["f1/1", "f2/1", "f3/1"]
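How block_names/1 finds these names isn’t shown here, but one plausible approach is to walk the template’s quoted output and collect the string argument of every CodeGen.block call. The sketch below assumes blocks are always written literally as CodeGen.block "name" do ... end (not through an alias or import); BlockNames.collect/1 is a hypothetical helper. Because quote never resolves the call, this works even without a CodeGen module loaded.

```elixir
defmodule BlockNames do
  # Collect the names of all `CodeGen.block "name" do ... end` calls
  # found in a quoted expression, in the order they appear.
  def collect(ast) do
    {_ast, names} =
      Macro.prewalk(ast, [], fn
        {{:., _, [{:__aliases__, _, [:CodeGen]}, :block]}, _, [name | _]} = node, acc
        when is_binary(name) ->
          {node, [name | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(names)
  end
end

ast =
  quote do
    CodeGen.block "f1/1" do
      def f1(x), do: x + 1
    end

    CodeGen.block "f2/1" do
      def f2(x), do: x + 2
    end

    # Not inside a block, so it is not reported.
    def another_function, do: :ok
  end

BlockNames.collect(ast)
```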

Nice! But we’d like to be able to actually see the blocks’ contents, so that we know what we’ll be including in advance. That is also easy:

iex> CodeGen.show_blocks(CodeGenExample)
┌────────────────────────────────────────────────────────
│ Block: f1/1
├───────────────────────────────────────────────────────
│ @constant1 7
│ def f1(x) do
│   x + @constant1
│ end
└───────────────────────────────────────────────────────
┌────────────────────────────────────────────────────────
│ Block: f2/1
├───────────────────────────────────────────────────────
│ def f2(x) do
│   x + 2
│ end
└───────────────────────────────────────────────────────
┌────────────────────────────────────────────────────────
│ Block: f3/1
├───────────────────────────────────────────────────────
│ # This is a comment outside the function
│ # This is a another comment outside the function
│ # Yet another comment
│
│ def f3(x) do
│   # this is a comment inside the function
│   x + 3
│ end
└───────────────────────────────────────────────────────

:ok
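Rendering a block like this largely comes down to Macro.to_string/1, which formats the AST in recent Elixir versions. Since the parser drops real comments, the @comment__ placeholders can then be turned into # comments textually. This is a rough sketch of that idea, not CodeGen’s code: BlockRenderer.render/1 is hypothetical, and the regex assumes comment strings contain no embedded double quotes.

```elixir
defmodule BlockRenderer do
  # Turn a block body back into source code, replacing `@comment__ "..."`
  # placeholders with real comments and dropping `@newline_after_comment__`
  # markers (which leaves an empty line behind).
  def render(block_ast) do
    block_ast
    |> Macro.to_string()
    # Accept both `@comment__ "text"` and `@comment__("text")` renderings.
    |> then(&Regex.replace(~r/@comment__\(?"([^"]*)"\)?/, &1, "# \\1"))
    |> then(&Regex.replace(~r/@newline_after_comment__/, &1, ""))
  end
end
```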

Now we know which code there is in each block. Suppose we want to dump the contents of block f1/1 into our own file. We just need to do:

iex> CodeGen.dump_source(CodeGenExample, "f1/1")
* injecting test/fixtures/immutable/code_gen_example.ex
:ok
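The “injecting” step can be as simple as splicing the rendered code into the file just before the module’s final end. Below is a string-only sketch of such a helper. The name inject_before_final_end/2 matches the function mentioned later in this thread, but this body is an assumption rather than CodeGen’s implementation, and it assumes the file holds a single top-level module.

```elixir
defmodule Injector do
  # Insert `code` (indented two spaces) right before the last `end` in `contents`.
  # Returns {:ok, new_contents}, or :error if the file does not end with `end`.
  # Naive: it does not check that the trailing `end` is a standalone token.
  def inject_before_final_end(contents, code) do
    trimmed = String.trim_trailing(contents)

    if String.ends_with?(trimmed, "end") do
      head = binary_part(trimmed, 0, byte_size(trimmed) - byte_size("end"))

      indented =
        code
        |> String.trim_trailing()
        |> String.split("\n")
        |> Enum.map_join("\n", fn
          "" -> ""
          line -> "  " <> line
        end)

      # Keep a blank line between the existing body and the injected code.
      {:ok, head <> "\n" <> indented <> "\nend\n"}
    else
      :error
    end
  end
end
```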

The file contents have been replaced by:

defmodule CodeGenExample do
  use CodeGen,
    module: CodeGenTemplate,
    options: [
      c1: 7
    ]

  @constant1 7
  def f1(x) do
    x + @constant1
  end
end

You can now customize the f1/1 function at will, while not having your code polluted by the code of the other functions you don’t need.
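This works because the template lists f1/1 in defoverridable: when the module later defines f1/1 itself (for instance with the dumped and edited code), Elixir discards the injected definition instead of treating the new code as extra clauses of it. A self-contained sketch using a plain use macro (OverridableTemplate and CustomizedExample are hypothetical names):

```elixir
defmodule OverridableTemplate do
  defmacro __using__(_opts) do
    quote do
      @constant1 7
      def f1(x), do: x + @constant1
      def f2(x), do: x + 2
      # Without this, a user-defined f1/1 would just add clauses to the injected one.
      defoverridable f1: 1, f2: 1
    end
  end
end

defmodule CustomizedExample do
  use OverridableTemplate

  # "Dumped" and edited version of f1/1: it replaces the injected definition.
  def f1(x), do: x * @constant1

  # An override can still call the injected version through super/1.
  def f2(x), do: super(x) * 10
end
```

Here CustomizedExample.f1(2) returns 14 and CustomizedExample.f2(1) returns 30.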

Applications

This CodeGen module is useful in any situation where you want to put some code in a module without adding literal code to the file, but in which you think you might have to customize some of the functions by editing the source code.

I can think of some uses for this:

Library behaviours, such as GenServer

Let’s say that instead of writing use GenServer you could write use CodeGen, module: GenServer. That way the GenServer module could define all callbacks inside the module, but you could do things such as CodeGen.dump_source(MyGenServer, "handle_info/2") to get a skeleton implementation which you can edit.

Phoenix CoreComponents

Phoenix CoreComponents are meant to be customized by the user. However, the truth is that the default Phoenix generators dump A LOT of source code into the default project, some of which the user doesn’t care about, at least at the beginning. Some people (myself included) have complained about it here: https://elixirforum.com/t/phoenix-1-7-feels-a-little-bit-locked-in-with-tailwindcss/54468/43. One of the suggestions made by @josevalim is that people should publish CoreComponents files customized to certain CSS frameworks (such as Bulma, Bootstrap, etc.). With CodeGen, one can publish a package that provides a custom CoreComponents file, which can be used like this:

defmodule MyAppWeb.CoreComponents do
  use CodeGen,
    # The (foreign) module providing the core components
    module: Bootstrap5CoreComponents,
    # Module-level customizations
    options: [
      horizontal_forms_by_default?: true,
      label_width: 3,
      input_width: 9,
      # ...
    ]
end

If the user wants to customize something like an input/1 component, then it’s simple to just start iex and do:

iex> CodeGen.dump_source(MyAppWeb.CoreComponents, "input/1")
* injecting test/fixtures/immutable/code_gen_example.ex
:ok

The code above would insert the code for the input/1 component without polluting the source with other components which the user doesn’t need to modify.

Inclusion in Elixir’s Standard Library

I’ve looked into many languages which provide interesting facilities for code generation. The main ones are variants of Lisp in one way or another, but there are non-S-expression-based languages with such capabilities: OCaml (through a more or less complex build step), Haskell (which seems to provide very useful metaprogramming facilities, but which I’ve never tried, due to the general difficulty of doing most things in Haskell as well as a fear of lazy evaluation) and Rust (which is kinda too low-level for me). Other languages, like C, have preprocessor-based metaprogramming.

Elixir has very impressive metaprogramming capabilities, which set it apart from most other languages I’ve tried; they are at the level of Lisp (and certainly more ergonomic than those of all the non-Lisp languages I’ve tried). However, the most unique feature of Elixir development I’ve found is the emphasis on generators that produce actual code which can be customized by the user. Phoenix’s generators, in particular, are a marvel in terms of generalizability and implementation (I know this because I maintain parallel generators for my Mandarin admin package, and such generators are actually quite hard to implement, maintain and test).

I think that the functionality I’ve built with CodeGen bridges metaprogramming facilities (succinct, safe, non-customizable) and code generators (verbose, less safe since we never know what the user will do to “our” code, very customizable). This is an idea I’ve had for a long time, but until now I’d never got around to implementing it. The implementation is in a single (very short) file.

Given the emphasis given in Elixir to both metaprogramming and code generation, I’d like to encourage experimentation with these kinds of ideas (not necessarily with CodeGen, but with other similar tools one can develop) so that this could be stabilized and maybe included in the Elixir standard library.


I like where you’re going with this, and how you said the definition does not need a macro, but you go on to use quote and CodeGen.block "f1/1" do which doesn’t feel very idiomatic. For simplicity and ease of code reuse, have you considered defining the template module with regular functions, eg:

defmodule CodeGenTemplate do
  @c1 1

  def f1(x, c1 \\ @c1) do
    x + c1
  end
end

So then use CodeGen can take care of adding defdelegate to your module for each function in the template that isn’t also defined in your module, and the show_blocks and dump_source functions could do something like this to read/copy the source:

[edit] full code here: module_extend.ex · GitHub
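The delegation half of this suggestion can be sketched with @before_compile: once the module body is done, generate a defdelegate for every public function of the template that the module did not define itself. This is one possible reading of the proposal, not the linked module_extend.ex code; DelegateMissing, DelegTemplate and DelegatingUser are hypothetical names.

```elixir
defmodule DelegateMissing do
  # `use DelegateMissing, module: Template` delegates every public function
  # of Template that the using module does not define itself.
  defmacro __using__(opts) do
    template = opts |> Keyword.fetch!(:module) |> Macro.expand(__CALLER__)

    quote do
      @delegate_template unquote(template)
      @before_compile DelegateMissing
    end
  end

  defmacro __before_compile__(env) do
    template = Module.get_attribute(env.module, :delegate_template)

    delegates =
      for {name, arity} <- template.__info__(:functions),
          not Module.defines?(env.module, {name, arity}) do
        args = Macro.generate_arguments(arity, __MODULE__)

        quote do
          defdelegate unquote(name)(unquote_splicing(args)), to: unquote(template)
        end
      end

    {:__block__, [], delegates}
  end
end

defmodule DelegTemplate do
  def f1(x), do: x + 1
  def f2(x), do: x + 2
end

defmodule DelegatingUser do
  use DelegateMissing, module: DelegTemplate

  # Defined locally, so no delegate is generated for f2/1.
  def f2(x), do: x * 100
end
```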

quote/unquote is still the most idiomatic way of generating dynamic AST. Your proposal of having a “raw module” doesn’t allow the template author to generate an AST that depends on dynamic options. For example, if you want to provide a Bootstrap5Components module, you might want to allow the user to choose whether forms are horizontal or vertical by default. How does your proposal deal with such dynamic options?

Oh, and here’s some more code I had lying around from experimenting with the defdelegate part of this. Plus, it defines defoverridable, which means you can use super.

[edit] full code here: module_extend.ex · GitHub

That’s part of being more idiomatic too; see, for example, the official advice about not using config for libraries but instead passing options as function arguments. I think this code suggests a middle-ground approach (you could pass the option as a function argument in any case, and redefine the default in your module if you’ve dumped the source):

defmodule CodeGenTemplate do
  @c1 1

  def f1(x, c1 \\ @c1) do
    x + c1
  end
end

If I remember correctly, the advice against config options was specifically against setting options using config.exs/Application.put_env/3, because those are global to your application. Generating customized modules according to “config options” is completely different, because nothing is global: you simply define a module customized for your purposes, and different parts of the application can generate different modules. I believe that generating different code according to different options is an important feature, and it drives most of my architecture.

The idea is precisely to avoid having to pass optional arguments, and even to be able to customize the default value of the parameter! This is exactly the case with my Phoenix CoreComponents example. One may want to specify a default width for form fields (i.e., set a default value dynamically at compile time) and allow the user to override it on a case-by-case basis (by making it an optional parameter of the component).

I think we have pretty different opinions on the fundamental principles of code organization and on the proper role of dynamic AST generation, and that’s OK, but I can’t see how your (admittedly more elegant) proposal can support the dynamic AST generation I need.

In your example, isn’t that what Phoenix.Component’s attr is for? E.g. <.input width={9} /> so you can change the options when using a component?

No worries, it’s nice to see someone else playing with these kind of things as well!


Not exactly. I want to be able to generate the following AST:

attr :label_width, :integer, default: unquote(default_label_width)
attr :input_width, :integer, default: unquote(default_input_width)

def input(assigns) do
  # ...
end

So that I can do something like this:

defmodule MyAppWeb.CoreComponents do
  use CodeGen,
    module: Bootstrap5Components,
    options: [
      default_label_width: 3,
      default_input_width: 9
    ]
end

but such that in another project (or the same project) I can have components with different parameters.
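The point about compile-time options can be illustrated without Phoenix. In the hypothetical sketch below, FormTemplate.__code_gen__/1 bakes the chosen defaults into the generated input/1, while callers can still override them per call. It uses a plain __using__ instead of CodeGen, and a map stands in for real Phoenix assigns and attr declarations; ProjectA.Components and ProjectB.Components are made-up modules built from the same template with different defaults.

```elixir
defmodule FormTemplate do
  # Options are read at compile time and become literal defaults in the AST.
  def __code_gen__(opts) do
    label_width = Keyword.get(opts, :default_label_width, 2)
    input_width = Keyword.get(opts, :default_input_width, 10)

    quote do
      def input(assigns) do
        # Per-call values in `assigns` win over the compile-time defaults.
        %{
          label_width: Map.get(assigns, :label_width, unquote(label_width)),
          input_width: Map.get(assigns, :input_width, unquote(input_width))
        }
      end
    end
  end

  # Assumes literal options, which are their own AST.
  defmacro __using__(opts), do: __code_gen__(opts)
end

defmodule ProjectA.Components do
  use FormTemplate, default_label_width: 3, default_input_width: 9
end

defmodule ProjectB.Components do
  use FormTemplate, default_label_width: 4, default_input_width: 8
end
```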

@tmbb makes sense, best of luck!

For anyone else following along, I’ve slightly updated the previously posted code (notably to obtain the last line number of a function definition along with the first, so we can copy an entire function’s definition block out of a code file). Combined with the inject_before_final_end function from CodeGen (thanks @tmbb! how are you planning the library?), this means we can inject/dump a function from a template module into our code.

[edit] full code here: module_extend.ex · GitHub
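Finding both the first and the last line of a definition can be done with the parser’s token metadata: Code.string_to_quoted/2 with token_metadata: true adds an :end entry to calls that use do/end blocks. The sketch below is not the linked module_extend.ex code; FunctionLines.find/3 is a hypothetical helper, and for the one-line “, do:” form it simply assumes the definition fits on its first line.

```elixir
defmodule FunctionLines do
  # Returns [{first_line, last_line}] for each def/defp of name/arity in `source`.
  def find(source, name, arity) do
    # token_metadata: true adds `:end` metadata to calls with do/end blocks.
    {:ok, ast} = Code.string_to_quoted(source, token_metadata: true)

    {_ast, ranges} =
      Macro.prewalk(ast, [], fn
        {kind, meta, [head | _]} = node, acc when kind in [:def, :defp] ->
          if name_arity(head) == {name, arity} do
            {node, [{meta[:line], last_line(meta)} | acc]}
          else
            {node, acc}
          end

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(ranges)
  end

  # Handles guarded heads, heads with arguments and zero-arity heads.
  defp name_arity({:when, _, [head | _]}), do: name_arity(head)
  defp name_arity({name, _, args}) when is_atom(name) and is_list(args), do: {name, length(args)}
  defp name_arity({name, _, ctx}) when is_atom(name) and is_atom(ctx), do: {name, 0}
  defp name_arity(_), do: nil

  defp last_line(meta) do
    case Keyword.get(meta, :end) do
      # `, do:` keyword form: assume a single line.
      nil -> meta[:line]
      end_meta -> end_meta[:line]
    end
  end
end
```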

While your idea is not bad, since this can be achieved with functions only, I don’t think the complexity you are introducing is worth the result. The first rule of metaprogramming is to avoid the use of macros if the same result can be achieved with functions, especially when it comes to DSLs with new concepts.

I share @mayel’s opinion regarding the syntax: if you really want to abstract function generation, you need to get rid of the macro syntax. Otherwise, I will just use classic metaprogramming, as I already know how those concepts work and scale, and I don’t have to think about the magic you are doing under the hood.

Having said all of this, this might be a good tool not for end developers, but as a building block for libraries that can further abstract away the implementation details.


What is it that you say can be achieved with functions only? The end result of using CodeGen is to add functions to the current module without the user having to spell them out (just like use GenServer does, for example). That has to be done with metaprogramming, AFAIK.

Are you saying that we should delegate to the template module instead of adding the function definitions to the user module?

In my case, having a component library was the use case that inspired this library. In such a library, you might have fancier_input(assigns) which depends on input(assigns). If you delegate to another module, you’ll have something like:

defdelegate fancier_input(assigns), to: SomeLibraryModule
defdelegate input(assigns), to: SomeLibraryModule

This means that if you change your own implementation of input(assigns), those changes will not be reflected in fancier_input(assigns). And in this case I’d argue you want these changes to be reflected. The goal is not to be maximally functionally pure; the goal is to have generators that work on demand by “generating” only the functions you want to customize.
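A small, Phoenix-free sketch of this argument (all module names are hypothetical, and strings stand in for rendered HTML): with defdelegate, fancier_input/1 keeps calling the library’s own input/1, whereas with injected code the local call to input/1 resolves to the user’s override.

```elixir
defmodule SomeLibraryModule do
  def input(_assigns), do: "<input>"
  def fancier_input(assigns), do: "<div>" <> input(assigns) <> "</div>"
end

# Delegation: the override of input/1 is invisible to fancier_input/1.
defmodule DelegatingComponents do
  defdelegate fancier_input(assigns), to: SomeLibraryModule
  def input(_assigns), do: "<custom-input>"
end

# Injection: both functions live in the user's module, so the local call
# inside fancier_input/1 picks up the overridden input/1.
defmodule InjectingTemplate do
  defmacro __using__(_opts) do
    quote do
      def input(_assigns), do: "<input>"
      def fancier_input(assigns), do: "<div>" <> input(assigns) <> "</div>"
      defoverridable input: 1
    end
  end
end

defmodule InjectingComponents do
  use InjectingTemplate
  def input(_assigns), do: "<custom-input>"
end
```

DelegatingComponents.fancier_input(%{}) still returns "<div><input></div>", while InjectingComponents.fancier_input(%{}) returns "<div><custom-input></div>".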

By “macro syntax” you mean the quote block or the CodeGen.block "name" do ... end blocks inside the quote block?

In case it’s not clear, I’m not doing much more magic than what a normal use SomeModule macro would do. CodeGen is basically a fancier version of the use ... macro which supports (optionally) adding some code to your file. Normal use ... invocations (such as the ones you find in Phoenix or Plug, for example), already perform a huge amount of magic behind the scenes to make things work.

This is a fair point! I’m the first to agree that CodeGen, despite being a simple library, might be quite a concept shift from what is usually done in Elixir. Currently, everything that generates code in Elixir is either a mix generator or a macro. CodeGen aims to combine the best of both approaches, but it hasn’t been validated yet. I’m mainly putting it out there as an experiment to see what people think.

Could you expand on this last point, please? I’m not sure I understand. Do you think that end users should never have to call use CodeGen, ... themselves?

I really like your approach, and it’s certainly an interesting option if you don’t need to generate AST dynamically. The best part is that it handles comments correctly, without the need for my @comment__ "..." hacks.

OK, I see it now: basically, the main point of this library is to be able to generate and inspect the source code of the injected blocks, which does bring some order to code generation. My bad, I thought the main scope of the library was parametric function generation by passing the options: keyword.

Do you think that end users should never have to call use CodeGen, ... themselves?

While introducing such a tool into existing code-generating libraries (especially Phoenix) would be great, I don’t think code generation should be used directly by someone building, for example, a Phoenix application, because, as tempting as it might sound, it introduces a lot of complexity.

In my vision, I would love to use this library something like this:


defmacro __using__(opts) do
  quote do
    @blockdoc """
    This block is responsible for injecting the child specification;
    read more about child specifications here.
    Override this function for your custom child specification.
    """
    CodeGen.block "child_spec" do
      def child_spec(init_arg) do
        default = %{
          id: __MODULE__,
          start: {__MODULE__, :start_link, [init_arg]}
        }

        Supervisor.child_spec(default, unquote(Macro.escape(opts)))
      end

      defoverridable child_spec: 1
    end
  end
end

In general, now that I think about it, this does seem like a symptom of a larger problem, because what your library provides is basically public code generation interfaces. Maybe AST manipulation/processing and code generation should somehow be separated, because at the moment it’s one big pile where you can do whatever comes to mind.

I don’t understand the difference between this and what I already have. Is it the fact that the module defines a “normal” __using__ macro instead of the weird __code_gen__ function which is called by the use CodeGen macro?

There is no difference; I just showed an example where your library would come in really handy, and added a @blockdoc that would help users understand why they would want to override that piece of injected code. I think documentation is as important as the feature itself.

The @blockdoc seems like a good idea for documentation purposes, but I don’t know how to implement it… Maybe it would be better to document the blocks outside the inner quoted expression? It’s hard to get the @blockdoc from inside the quoted expression.

I guess the best way to document the blocks is probably inside the @moduledoc of the template module, but I’m not sure. Maybe there’s a more structured way of doing it which I’m not seeing.
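On the @blockdoc question: if the doc attribute sits directly before its block at the top level of the quoted expression, it can be recovered by walking that top-level sequence and pairing each @blockdoc with the next CodeGen.block. A rough sketch under that assumption (@blockdoc is only a proposal in this thread, and BlockDocs.collect/1 is a hypothetical helper):

```elixir
defmodule BlockDocs do
  # Map each block name to the @blockdoc string written just before it (or nil).
  def collect({:__block__, _, exprs}), do: collect_exprs(exprs, nil, %{})
  def collect(expr), do: collect_exprs([expr], nil, %{})

  defp collect_exprs([], _pending_doc, acc), do: acc

  # Remember the doc until the next block is reached.
  defp collect_exprs([{:@, _, [{:blockdoc, _, [doc]}]} | rest], _pending_doc, acc)
       when is_binary(doc),
       do: collect_exprs(rest, doc, acc)

  defp collect_exprs(
         [{{:., _, [{:__aliases__, _, [:CodeGen]}, :block]}, _, [name | _]} | rest],
         doc,
         acc
       )
       when is_binary(name),
       do: collect_exprs(rest, nil, Map.put(acc, name, doc))

  # Anything else (plain defs, defoverridable, ...) is skipped.
  defp collect_exprs([_other | rest], pending_doc, acc),
    do: collect_exprs(rest, pending_doc, acc)
end
```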

It turns out that, in order to deal with compile-time dependencies properly, __code_gen__(options) must be a macro and not a function.