Elixir_of_ocaml

So something I’ve been working on in my random little bits of free time since last year is an OCaml-to-Elixir transpiler. However, it’s dead. It works via a compiler plugin, and for some reason OCaml is removing compiler plugins in the next major version update, so the current code-base is basically dead. It will need to be rewritten as a standalone compiler using OCaml’s compiler-libs, so that will take some time to figure out and then actually ‘do’.

Has anyone started a standalone OCaml-to-Elixir compiler, so I don’t duplicate effort or anything?

My now-dead version worked pretty well and handled a lot of constructs. I never got around to doing auto-uncurrying, but it’s not hard to do (OCaml makes it really easy, it would probably take an hour to do and test), and overall I think the code it output was pretty readable.
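
For what it’s worth, here is a minimal sketch of what an uncurry pass could look like. It is not the code from the plugin: the `expr` type and the `collect_params` / `collect_args` names are made up for illustration and stand in for the compiler’s typed tree.

```ocaml
(* Hypothetical miniature AST; the real pass would walk the typed tree. *)
type expr =
  | Var of string
  | Fun of string * expr   (* fun x -> body, exactly one parameter *)
  | App of expr * expr     (* f x, exactly one argument *)

(* fun a -> fun b -> body  becomes  (["a"; "b"], body) *)
let rec collect_params = function
  | Fun (param, body) ->
    let params, inner = collect_params body in
    (param :: params, inner)
  | other -> ([], other)

(* ((f a) b)  becomes  (f, [a; b]) *)
let collect_args expr =
  let rec go acc = function
    | App (fn, arg) -> go (arg :: acc) fn
    | head -> (head, acc)
  in
  go [] expr
```

A real pass would presumably also compare the number of collected arguments with the arity of the callee, so that partial applications stay curried.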

Given this OCaml file I typed up right quick:

╰─➤  cat test/test_file.ml
let _ = 42

let a = 42

let b c d =
  let `Blah (e, f) = `Blah (c + d, 1) in
  let `Blorp g = `Blorp (e + f) in
  g + e

let c 21 = 42

let d a b = a +. b +. 6.28

let e b = if b then 1 else 2

let log b s = if b then print_endline s

let test () =
  let 42 = a in
  let 7 = b 1 2 in
  let 42 = c 21 in
  let 9.28 = d 1.0 2.0 in
  let 1 = e true in
  let () = log true "this is logged" in
  ()

Then compiling it with the elixir_of_ocaml plugin:

╰─➤  time ocamlopt -plugin ../elixir_of_ocaml.native.cmxs test_file.ml  
File "test_file.ml", line 10, characters 6-13:
10 | let c 21 = 42
           ^^^^^^^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0
File "test_file.ml", line 19, characters 6-8:
19 |   let 42 = a in
           ^^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0
File "test_file.ml", line 20, characters 6-7:
20 |   let 7 = b 1 2 in
           ^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0
File "test_file.ml", line 21, characters 6-8:
21 |   let 42 = c 21 in
           ^^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0
File "test_file.ml", line 22, characters 6-10:
22 |   let 9.28 = d 1.0 2.0 in
           ^^^^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0.
File "test_file.ml", line 23, characters 6-7:
23 |   let 1 = e true in
           ^
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
0
defmodule TestFile do
  _ = 42

  def a, do: 42

  def b(c) do
    fn(d) ->
      {:Blah, {e, f}} = {:Blah, {Kernel.+(c, d), 1}}
      {:Blorp, g} = {:Blorp, Kernel.+(e, f)}
      Kernel.+(g, e)
    end
  end

  def c(21) do
    42
  end

  def d(a) do
    fn(b) ->
      Kernel.+(Kernel.+(a, b), 6.28)
    end
  end

  def e(b) do
    if b do
      1
    else
      2
    end
  end

  def log(b) do
    fn(s) ->
      if b do
        IO.puts(s)
      end
    end
  end

  def test({}) do
    42 = a
    7 = b(1, 2)
    42 = c(21)
    9.28 = d(1.0, 2.0)
    1 = e(true)
    log(true, "this is logged")
    {}
  end
end

ocamlopt -plugin ../elixir_of_ocaml.native.cmxs test_file.ml  0.01s user 0.00s system 76% cpu 0.016 total

I just have it dump to stdout, which makes it easy to have, for example, a mix task take it in, format it with Elixir’s formatter, then write it out. And as you can see, with OCaml lexing it, typing it, running a couple of minor optimization passes, and doing the to-Elixir conversion, this file took ~0.016 seconds from a completely cold run. Technically I could get a custom driver a little faster, so that will be a nice thing about the rewrite. I really should write that uncurry pass even if this code is dead… >.>

Why OCaml to Elixir instead of OCaml to Erlang? Is it so that you can recycle Elixir’s docstrings? I think that to preserve the docs in an OCaml to Erlang compiler you’d have to generate the BEAM file from OCaml (so that you could dump your docs in the doc chunk there)… And doing that would probably require compiling to BEAM bytecode directly.

Despite the way I’ve seen you praise OCaml’s design and modularity, the whole “plugin” stuff has always seemed a little hacky to me.

No, going to Erlang or so wouldn’t be hard at all; this has just been a play project for a while now and I wanted to see if I could get it to work with Elixir macros, for use within things like a Phoenix Router and so forth.

That sounds excessively painful. ^.^;

Eh, it really is… The compiler is modular enough that it is easy to make a program that just calls the compiler steps directly to do what you need; that just takes a lot more scaffolding and work than a plugin though, especially when all I need to do is read the post-processed typed tree and then stop.

A typed web framework on the BEAM would be really cool! I wonder how feasible that is. Making it typed would probably require you to parametrize everything based on the type of the request. It would be a giant record type that would contain everything Phoenix provides plus user-specific stuff; probably different request types could be specialized to different pipelines. Maybe this could be made easier by something like row types.

Shouldn’t need to parametrize such things I wouldn’t think? Just need to type the interfaces properly.

Which OCaml has, but I’m not really sure it is needed. Row-typed objects are easily the least used part of OCaml and the sole part that most people wouldn’t give even half a thought to removing, lol.

Suppose you have a controller module. A function in the controller would have the type: create: request -> response. But because each user may want to include somewhat different information in the request, you might want something like: create: 'a request -> response.

But because you have multiple chained functions in the pipeline (if you want to follow Plug’s model), you’d have something like: create: pipeline_layer_x -> pipeline_layer_y. That seems hard to model in a static type system.

Actually that’s a perfect use-case for either static or even polymorphic variants. I don’t see any reason not to use static ones though. You could also add a user-specified map storage for any user type; keying it on the type as well isn’t hard.
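
To make the chained-layer idea concrete, here is a rough sketch of how such layers could be given static types. It uses a plain type parameter for the user data rather than variants, and every name in it (`request`, `layer`, `>>>`, `authenticate`, `count_visit`) is an assumption for illustration, not anything from Plug or Phoenix.

```ocaml
(* Hypothetical request carrying user-defined data of type 'a. *)
type 'a request = { path : string; assigns : 'a }
type response = { status : int; body : string }
type user = { name : string }

(* A layer may change the type of the data it passes down the pipeline. *)
type ('a, 'b) layer = 'a request -> 'b request

(* Compose two layers; the compiler checks that the middle types agree. *)
let ( >>> ) (f : ('a, 'b) layer) (g : ('b, 'c) layer) : ('a, 'c) layer =
  fun req -> g (f req)

let authenticate : (unit, user) layer =
  fun req -> { path = req.path; assigns = { name = "demo" } }

let count_visit : (user, user * int) layer =
  fun req -> { path = req.path; assigns = (req.assigns, 1) }

(* The controller only accepts requests that went through both layers. *)
let create (req : (user * int) request) : response =
  let user, visits = req.assigns in
  { status = 200;
    body = Printf.sprintf "%s saw %s %d time(s)" user.name req.path visits }

let handle (req : unit request) : response =
  create ((authenticate >>> count_visit) req)
```

With this shape, calling `create` on a request that skipped `authenticate` simply does not type-check.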


Just for the larf of it, I ran elixir_of_ocaml.ml through itself (never done that before actually, it’s huge… in filesize that is, processing time was still <0.2s), it’s interesting to see how everything converted, like some snippets:

A simple function dispatch on the second argument:

  def elixir_of_asttype_constant(depth) do
    fn
      {:Const_int, i} ->
        Stdlib.string_of_int(i)

      {:Const_char, c} ->
        Stdlib.String.make(1, c)

      {:Const_string, s, so} ->
        ("\"" <> (Stdlib.String.escaped(s) <> ("\"" <> Stdlib.Option.value(so, ""))))

      {:Const_float, s} ->
        s

      {:Const_int32, i32} ->
        Stdlib.Int32.to_string(i32)

      {:Const_int64, i64} ->
        Stdlib.Int64.to_string(i64)

      {:Const_nativeint, i} ->
        Stdlib.Nativeint.to_string(i)
    end
  end

How about the function that generates a case clause, which itself contains a match that gets converted into the very kind of case the function is generating:

  def elixir_of_case(depth) do
    fn
      %{c_lhs: c_lhs, c_guard: c_guard, c_rhs: c_rhs} ->
        pattern = elixir_of_pattern(depth, c_lhs)
        guard = case c_guard do
          :None ->
            ""
          {:Some, expr} ->
            (" when " <> elixir_of_expression((depth + 1), expr))
         end
        body = elixir_of_expression(depth, c_rhs)
        (pattern <> (guard <> (" ->\n" <> (indent(depth) <> body))))
    end
  end

And yes, with a simple switch it can compile records as either maps or tuples, here it is as a tuple (definitely shows you the form a lot better!):

  def elixir_of_case(depth) do
    fn
      {_, c_lhs, c_guard, c_rhs} ->
        pattern = elixir_of_pattern(depth, c_lhs)
        guard = case c_guard do
          :None ->
            ""
          {:Some, expr} ->
            (" when " <> elixir_of_expression((depth + 1), expr))
         end
        body = elixir_of_expression(depth, c_rhs)
        (pattern <> (guard <> (" ->\n" <> (indent(depth) <> body))))
    end
  end
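
As an illustration of that switch (a sketch only, not the plugin’s code), the two pattern shapes above could be produced like this. `record_repr` and `record_pattern` are hypothetical names, and the leading `_` slot in the tuple form is simply copied from the output above; what it holds is not stated here.

```ocaml
(* Hypothetical switch between the two record representations. *)
type record_repr = As_map | As_tuple

(* Emit an Elixir pattern that binds every field to a variable of the same name. *)
let record_pattern repr fields =
  match repr with
  | As_map ->
    let pairs = List.map (fun field -> field ^ ": " ^ field) fields in
    "%{" ^ String.concat ", " pairs ^ "}"
  | As_tuple ->
    (* First slot is left as a wildcard, mirroring the generated output. *)
    "{" ^ String.concat ", " ("_" :: fields) ^ "}"
```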

Although seeing some ‘larger’ records in use as tuples is more interesting:

  def elixir_of_cases(depth) do
    fn
      arg ->
        fn
          trimmable ->
            fn
              [] ->
                "nil"

              [{_, {_, :Tpat_any, _, _, _, _, _}, :None, c_rhs} | []] ->
                prebody = if trimmable do
                  ""
                else
                  (arg <> ("\n" <> indent(depth)))
                end
                (prebody <> elixir_of_expression(depth, c_rhs))

              [{_, {_, {:Tpat_var, _, _}, _, _, _, _, _}, :None, c_rhs} | []] ->
                prebody = if trimmable do
                  ""
                else
                  (arg <> ("\n" <> indent(depth)))
                end
                (prebody <> elixir_of_expression(depth, c_rhs))

              [{_, {_, {:Tpat_construct, _, {_, "()", _, _, _, 0, _, _, _, _, _, _, _, _, _}, []}, _, _, _, _, _}, :None, c_rhs} | []] ->
                prebody = if trimmable do
                  ""
                else
                  (arg <> ("\n" <> indent(depth)))
                end
                (prebody <> elixir_of_expression(depth, c_rhs))

              cases ->
                ndepth = (depth + 1)
                cases = Kernel.|>(Kernel.|>(cases, Stdlib.List.map(elixir_of_case((ndepth + 1)))), Stdlib.String.concat(("\n" <> indent(ndepth))))
                ("case " <> (arg <> (" do\n" <> (indent(ndepth) <> (cases <> ("\n" <> (indent(depth) <> " end")))))))
            end
        end
    end
  end

I really need to run this through the Elixir formatter… As you can see here, the BEAM doesn’t support ‘or’ patterns in its pattern matching like OCaml does, so I have to break those up into multiple otherwise identical case heads.
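
Roughly, the splitting could look like the sketch below. This is a toy `pat` type made up for the example, not the compiler’s pattern type: an or-pattern is expanded into the list of its alternatives, and every alternative gets a copy of the same body.

```ocaml
(* Hypothetical miniature pattern language. *)
type pat =
  | Any
  | Var of string
  | Const of int
  | Tuple of pat list
  | Or of pat * pat

(* Expand a pattern into the list of or-free patterns it stands for. *)
let rec expand = function
  | Or (left, right) -> expand left @ expand right
  | Tuple pats ->
    (* Cartesian product of the alternatives of each component. *)
    List.fold_right
      (fun pat acc ->
         List.concat
           (List.map
              (fun alt -> List.map (fun rest -> alt :: rest) acc)
              (expand pat)))
      pats [ [] ]
    |> List.map (fun alts -> Tuple alts)
  | pat -> [ pat ]

(* One source case becomes several case heads sharing the same body. *)
let expand_case (pat, body) =
  List.map (fun alt -> (alt, body)) (expand pat)
```

OCaml already requires both sides of an or-pattern to bind the same variables, so duplicating the body is safe.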

Doesn’t come across quite as clean in Elixir, mostly because it still needs uncurrying and smarter formatting (eh, no real need for the latter though; running it through Elixir’s formatter was always the plan).

Do you have the code somewhere public?

Right now it’s all tangled up in my sandbox projects. I meant to rip it out into its own thing once I got time to actually flesh it out and move it into multiple files instead of one giant file. But with the plugin system dying, not sure it’s worth the work. ^.^;

I guess I could post that massive single file somewhere; it’s quite the opposite of pretty though, and I’d need to make a new makefile for it or so (or set up a dune project for it)…

That single file might be enough for me to decide whether I can help you port the code into the new compiler-lib thing…

From what I’ve been looking at, it would be easier to fork the compiler if I want to keep the same processing. I was planning on having such a thing compile to both Elixir code and a native OCaml BEAM node in time, which isn’t nearly as easy to do by just writing a new compiler driver. Plus they are actually removing the internal hooks, not just the plugin, hence why forking would probably be easier…

Well, you can always fork, but then you won’t be able to use the awesome magic juice of algebraic effects when they land.

How are you handling algebraic datatypes with normal static constructors (Foo instead of `Foo)? Do you optimize them away?

I don’t feel qualified enough to write OCaml-to-Elixir transpiler but thought of it several times.

I am making baby steps in learning OCaml (way too busy and fighting for my health to have enough free time!) and I have to say I quite like the syntax, more and more, as I go further. Still haven’t discovered much of that static typing goodness but even the little I’ve seen impressed me a lot.

OCaml has an amazing compiler from what I can tell. IMO the community should definitely invest in more transpilers.

Not really sure how you would have BEAM processes in the source OCaml code though…

Polymorphic variants like `Foo become a plain atom like :Foo, and static variants like Foo become an Elixir scoped atom like Foo (i.e. :"Elixir.Foo"). The reasoning was to treat polymorphic/open variants like plain atoms and static variants like closed module-level types, which made for an easy distinction. Honestly they could both just be plain atoms and it would work fine, since the typing system in OCaml ensures a mixup is impossible (but it made for easier inspecting!).
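
A tiny sketch of that mapping, following only the description in this post (the snippets earlier in the thread print static constructors with a leading colon, so the real emitter evidently has more to it). `ctor_kind`, `atom_of_constructor` and `quoted_atom` are made-up names.

```ocaml
(* Hypothetical tag saying which kind of OCaml constructor we are emitting. *)
type ctor_kind = Polymorphic | Static

(* Source text of the Elixir atom, as it would appear in generated code. *)
let atom_of_constructor kind name =
  match kind with
  | Polymorphic -> ":" ^ name   (* `Foo  ->  :Foo *)
  | Static -> name              (* Foo   ->  Foo, an Elixir alias *)

(* The same atom written out in fully quoted form. *)
let quoted_atom kind name =
  match kind with
  | Polymorphic -> ":" ^ name
  | Static -> ":\"Elixir." ^ name ^ "\""
```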

Easier than I thought, just a whole lot of stuff to go through, and a lot of it is still stubs in mine. ^.^;

I really like the language, including the syntax (even with as ‘old’ as it is), it’s always clear as to what is happening (and use ocp-indent to make the lines indented properly too!).

It is sooo easy to work with, I really enjoy it! I’m so sad the plugin system is going away, but I might just have to deal with making a standalone app instead, that will require two compilations though, one with mine and one with the normal compiler then, which I really don’t like, but eh…

Already considered that long ago, just a black-boxed messaging type that has to be ‘matched’ on like match msg with ... (or even match receive () with ...) and a module OTP/BEAM/Whatever type to match on. That would fit the beam ecosystem best. Trying to type the actual mailboxes themselves is an impossible errand if you intend for it to work with elixir/erlang/lfe/etc code too, so you need that segmentation point, which thankfully is already like how erlang/elixir works anyway (have to match out the messages!).
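
As a sketch of that segmentation point (nothing here exists in the plugin; `term`, `command`, `decode` and `handle` are all hypothetical), an incoming message stays an untyped term until it is matched into a typed value, with a catch-all for anything sent by other BEAM code:

```ocaml
(* Hypothetical untyped view of whatever lands in a BEAM mailbox. *)
type term =
  | Atom of string
  | Int of int
  | Str of string
  | Tuple of term list

(* The typed messages this particular process understands. *)
type command =
  | Ping
  | Add of int * int
  | Unknown of term   (* anything else stays a black box *)

(* The only place where untyped messages become typed ones. *)
let decode = function
  | Atom "ping" -> Ping
  | Tuple [ Atom "add"; Int a; Int b ] -> Add (a, b)
  | other -> Unknown other

let handle msg =
  match decode msg with
  | Ping -> "pong"
  | Add (a, b) -> string_of_int (a + b)
  | Unknown _ -> "ignored"
```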

EDIT: And as for the variant/atom things, I had it ready (though not built yet) to take normal OCaml attributes to define how a type should be represented in code. That’s why I had things like records able to be tuples or maps based on a switch; I intended for that switch to eventually be an attribute so it could be customized on a per-type basis. Same for a lot of things with variants and so forth, so you could do things like have Some thing and None map to Erlang’s {:ok, value} and :error, or even to Elixir’s value and nil, and similar things with the Result type too, among any other custom type.
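
Something along these lines, as a sketch under my own assumptions: the `option_repr` type and the two emit functions are hypothetical, and how an attribute would actually select the representation is left out entirely.

```ocaml
(* Hypothetical per-type choice of how an option value is represented. *)
type option_repr =
  | Tagged     (* {:Some, value} / :None, as in the output shown earlier *)
  | Ok_error   (* {:ok, value}   / :error *)
  | Nilable    (* value          / nil *)

(* [value] is the already generated Elixir source for the payload. *)
let emit_some repr value =
  match repr with
  | Tagged -> "{:Some, " ^ value ^ "}"
  | Ok_error -> "{:ok, " ^ value ^ "}"
  | Nilable -> value

let emit_none = function
  | Tagged -> ":None"
  | Ok_error -> ":error"
  | Nilable -> "nil"
```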

This sounds like you’re creating another Bucklescript project for OCaml. Did you know about Rehp which is a project that forks js_of_ocaml and makes the backend pluggable?

Has anyone ever seen this: https://github.com/purerl/purescript?

Purescript is basically a version of Haskell with better typeclasses and some other things. OCaml’s lack of typeclasses bothers me a lot because without typeclasses the usual comparison operators (=, >=, etc) need to be implemented using compiler magic.
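
To illustrate the point, here is a small sketch contrasting the built-in structural `compare` with the explicit, module-based alternative that plays the role of a hand-passed type class instance. The `ORD` signature and the `Max` functor are names invented for this example.

```ocaml
(* Built-in: one polymorphic comparison that works on any first-order type. *)
let sort_anything items = List.sort compare items

(* Explicit: the comparison is supplied as a module argument. *)
module type ORD = sig
  type t
  val compare : t -> t -> int
end

module Max (O : ORD) = struct
  let max a b = if O.compare a b >= 0 then a else b
end

(* The standard Int and String modules already match ORD. *)
module IntMax = Max (Int)
module StringMax = Max (String)
```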

Because it’s a reimplementation of Haskell, it doesn’t have some of OCaml’s features, like first class modules, so it might be a tough choice.

Purescript is not really a re-implementation of Haskell in any reasonable sense. It’s Haskell without GHC or crazy extensions, coupled with strict evaluation, sort of making it JS written in a cleaner Haskell syntax. purerl is an alternate backend for the language (I don’t think it’s used a lot), with JS being the primary target (used a lot more).

Typeclasses as a concept are the same, but they’re more granular and the hierarchy is very well-organized, so it feels very clean to work with them. One more notable feature is built-in row polymorphism, so it becomes very simple to work with data from JS-land.

IMHO, it’s a small and pleasant language to work with. The community is a bit on the smaller side, so you do end up having to write bindings for many common libraries you might want to use.

Oh, making a backend for OCaml is easy; unfortunately it’s so low level, after so many optimization passes, that the type information is just outright gone by then. Even something like `Blah turns into just an integral number and something like Vwoop 42 turns into a 2-box structure (called a block in it). I was trying to get readable output code, which is just a bit more difficult to do. ^.^;
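
You can peek at that representation from inside OCaml with the standard `Obj` module. This is only a sketch for inspection (the `t` type and `describe` are made up here), showing a constant constructor as an immediate integer and constructors with an argument as blocks:

```ocaml
type t = Plain | Vwoop of int

(* Describe how a value is laid out at runtime. *)
let describe (value : Obj.t) =
  if Obj.is_int value then "immediate integer"
  else
    Printf.sprintf "block, tag %d, %d field(s)" (Obj.tag value) (Obj.size value)

let () =
  print_endline (describe (Obj.repr `Blah));         (* constant polymorphic variant *)
  print_endline (describe (Obj.repr Plain));         (* constant static constructor *)
  print_endline (describe (Obj.repr (Vwoop 42)));    (* static constructor with argument *)
  print_endline (describe (Obj.repr (`Vwoop 42)))    (* polymorphic variant with argument *)
```

Nothing in those runtime values says which type or constructor name they came from, which is why readable output has to be generated earlier, from the typed tree.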

Yep, been watching it for years, very slow going. Also compiled rather slowly on any moderately complex code, which OCaml does not suffer from.

+1

I really really dislike row polymorphism; it’s so easy to accidentally slip in the wrong types, and they are much much harder to optimize at the back-end level (not an issue when compiling to JavaScript of course, but ‘in general’).