Decompile BEAM files to Elixir source code

Is it possible to decompile BEAM files that contain the :debug_info chunk back to actual Elixir source code? This is the data I get from :debug_info:

iex(2)> Secret |> :code.which |> :beam_lib.chunks([:debug_info]) |> elem(1) |> elem(1) |> Keyword.get(:debug_info) |> elem(2) |> elem(1) |> Map.get(:definitions)
[{{:hello, 0}, :def, [line: 26], [{[line: 26], [], [], :world}]}]

And this is the actual Elixir AST:

iex(3)> quote do
...(3)> defmodule Secret do
...(3)>   def hello, do: :world
...(3)> end
...(3)> end
{:defmodule, [context: Elixir, import: Kernel],
 [
   {:__aliases__, [alias: false], [:Secret]},
   [
     do: {:def, [context: Elixir, import: Kernel],
      [{:hello, [context: Elixir], Elixir}, [do: :world]]}
   ]
 ]}

If it’s possible to transform :debug_info into an AST, please explain how. Thanks!

What would the use case for this be? Maybe there is a simpler way to achieve what you want.

You might find this helpful


The idea is to create an analysis tool that works with the AST of an Elixir program.

The problem is that the transformation from Elixir to BEAM bytecode loses information. A simple example:

a = 1
a = 2

will result in bytecode that looks somewhat like

a = 1
a1 = 2

Rebinding variables isn’t possible in bytecode, so Elixir fakes it by introducing a new variable for each rebinding. There are plenty of such examples. Decompiling BEAM bytecode to Erlang is fairly straightforward and precise, but getting back to the original Elixir AST is, I think, almost impossible. The simpler solution would be to just stash the AST in a BEAM segment (the format is pretty extensible, so while I’m not sure whether this is already happening somewhere, I’m sure it is not too hard to do).
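The renaming can be observed by printing the Erlang abstract code that is stored alongside a module's debug info. This is only a minimal sketch, assuming the module was compiled with debug info; `Rebind` is a made-up module for illustration, and the generated variable names may differ between Elixir versions.

```elixir
# Keep debug info in compiled modules (assumed here; it is normally the default).
Code.put_compiler_option(:debug_info, true)

# defmodule returns {:module, name, beam_binary, result}.
{:module, Rebind, beam, _} =
  defmodule Rebind do
    def run do
      a = 1
      a = a + 1
      a
    end
  end

# :beam_lib accepts either a file path or the .beam binary itself.
{:ok, {Rebind, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

# Pretty-print every form as Erlang source.
erlang_source =
  forms
  |> Enum.map(&:erl_pp.form/1)
  |> IO.chardata_to_string()

IO.puts(erlang_source)
```

In the printed `run/0` function the two bindings of `a` should show up as two distinct Erlang variables, which is exactly the information loss described above.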


At the time you wrote this, it seems that this was already possible with the Dbgi chunk. You can see the pull request for it here, and get some more background from this 2017 José video:

But I may be misunderstanding something, because compilers and ASTs are not my thing ;)
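For reference, the Dbgi chunk can be read with pattern matching instead of a chain of `elem/2` calls. The sketch below is hedged: it assumes a module compiled by Elixir with debug info enabled, and it relies on the `debug_info/4` callback of the backend module stored in the chunk. `Secret` is defined inline only so that the snippet is self-contained.

```elixir
# Keep debug info in compiled modules (assumed here; it is normally the default).
Code.put_compiler_option(:debug_info, true)

# defmodule returns the compiled .beam binary as its third element.
{:module, Secret, beam, _} =
  defmodule Secret do
    def hello, do: :world
  end

# The chunk holds the backend module plus opaque data owned by that backend.
{:ok, {Secret, [debug_info: {:debug_info_v1, backend, data}]}} =
  :beam_lib.chunks(beam, [:debug_info])

# Ask the backend for the Elixir-level view: a map that includes :definitions.
{:ok, %{definitions: definitions}} =
  backend.debug_info(:elixir_v1, Secret, data, [])

IO.inspect(definitions, label: "definitions")
```

Each entry has the shape `{{name, arity}, kind, meta, clauses}`, and each clause is `{meta, args, guards, body}`, where the body is already-expanded Elixir AST rather than the code as originally written.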


My project works with a lot of code but not all code:



I have also done some experiments with the BEAM file. See the little beam_file project.

With BeamFile you can generate byte code, Erlang code, and Elixir code (with the limitations discussed here) from a BEAM file.

Example:

defmodule Example.Math do
  @moduledoc "Math is Fun"

  def add(number_a, number_b), do: number_a + number_b

  def odd_or_even(a) do
    if rem(a, 2) == 0 do
      :even
    else
      :odd
    end
  end
end

iex> {:ok, code} = BeamFile.elixir_code(Example.Math)
iex> IO.puts(code)
defmodule Elixir.Example.Math do
  @moduledoc """
  Math is Fun
  """

  def add(number_a, number_b) do
    :erlang.+(number_a, number_b)
  end

  def odd_or_even(a) do
    case(:erlang.==(:erlang.rem(a, 2), 0)) do
      false ->
        :odd

      true ->
        :even
    end
  end
end
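
To give a rough idea of how expanded definitions can be turned back into printable Elixir, here is a small sketch. It is not necessarily how BeamFile works internally: it assumes the `:definitions` list from the module's debug info, and `to_def` is a hypothetical helper that wraps each clause back into a `def` call before handing it to `Macro.to_string/1`.

```elixir
# Keep debug info in compiled modules (assumed here; it is normally the default).
Code.put_compiler_option(:debug_info, true)

{:module, Example.Math, beam, _} =
  defmodule Example.Math do
    def add(number_a, number_b), do: number_a + number_b
  end

{:ok, {_, [debug_info: {:debug_info_v1, backend, data}]}} =
  :beam_lib.chunks(beam, [:debug_info])

{:ok, %{definitions: definitions}} =
  backend.debug_info(:elixir_v1, Example.Math, data, [])

# Hypothetical helper: rebuild the AST of one def/defp clause.
to_def = fn kind, name, {_meta, args, guards, body} ->
  head = {name, [], args}

  head =
    case guards do
      [] -> head
      # Several guards are chained as `head when g1 when g2`.
      _ -> {:when, [], [head, Enum.reduce(Enum.reverse(guards), &{:when, [], [&1, &2]})]}
    end

  {kind, [], [head, [do: body]]}
end

code =
  for {{name, _arity}, kind, _meta, clauses} <- definitions,
      clause <- clauses do
    Macro.to_string(to_def.(kind, name, clause))
  end
  |> Enum.join("\n\n")

IO.puts(code)
```

Because the stored bodies are already expanded, operators come back as `:erlang` calls and macros such as `if` come back as `case`, as in the BeamFile output above.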

Hello dear friends, I hope you can help me.
I have some .beam files that I want to decompile into .erl files (source code).
How can I do this? Any help would be appreciated.

It depends on whether the beam file includes the AST or not. Try this:

I believe there are some mechanisms in the compiler to turn the BEAM bytecode into Core Erlang, if you just want to deduce the logic the way a decompiler would.
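One route to Core Erlang that I am aware of goes through the debug info rather than the bytecode itself: the backend stored in the Dbgi chunk can be asked for a `:core_v1` view. Treat the following as a hedged sketch. It assumes an Elixir-compiled module with debug info, and `CoreDemo` is a made-up module for illustration.

```elixir
# Keep debug info in compiled modules (assumed here; it is normally the default).
Code.put_compiler_option(:debug_info, true)

{:module, CoreDemo, beam, _} =
  defmodule CoreDemo do
    def add(a, b), do: a + b
  end

{:ok, {CoreDemo, [debug_info: {:debug_info_v1, backend, data}]}} =
  :beam_lib.chunks(beam, [:debug_info])

# Ask the backend for Core Erlang and pretty-print it with :core_pp.
core_source =
  case backend.debug_info(:core_v1, CoreDemo, data, []) do
    {:ok, core} -> core |> :core_pp.format() |> IO.chardata_to_string()
    {:error, reason} -> raise "no Core Erlang available: #{inspect(reason)}"
  end

IO.puts(core_source)
```

If the beam file was compiled without debug info, the backend call returns an error instead, and only the bytecode itself (for example a disassembly) is left to work with.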


Is it possible to get this SSA IR of an Elixir module?

It’s possible to generate it for an Elixir file using @compile (tip from this issue):

# in foo.ex
defmodule Foo do
  @compile :dprecg

  def foo(a, b) do
    a + b
  end
end

then compiling with elixir foo.ex complains:

$ elixir foo.ex
** (CompileError) foo.ex: could not compile module Foo. We expected the compiler to return a .beam binary but got something else. This usually happens because ERL_COMPILER_OPTIONS or @compile was set to change the compilation outcome in a way that is incompatible with Elixir

but does produce a foo.ex.precodegen file:

# foo.ex.precodegen
module 'Elixir.Foo'.
exports [{'__info__',1},{foo,2},{module_info,0},{module_info,1}].
attributes [].

%% Counter = 16
function `'Elixir.Foo'`:`'__info__'`(x0/_0) {
  %% _0: 0..1 0..2 10..11 14..15
0:
  [1] switch x0/_0, ^3, [
    { `attributes`, ^13 },
    { `compile`, ^13 },
    { `deprecated`, ^8 },
    { `exports_md5`, ^10 },
    { `functions`, ^9 },
    { `macros`, ^8 },
    { `md5`, ^13 },
    { `module`, ^7 }
  ]

7:
  [3] ret `'Elixir.Foo'`

9:
  [5] ret `[{foo,2}]`

10:
  [7] ret `<<63,179,243,139,216,207,23,209,142,119,33,235,17,74,57,60>>`

8:
  [9] ret `[]`

13:
  %% @ssa_ret:6: 11..13
  [11] x0/@ssa_ret:6 = call (`erlang`:`get_module_info`/2), `'Elixir.Foo'`, x0/_0
  [13] ret x0/@ssa_ret:6

3:
  %% @ssa_ret:14: 15..17
  [15] x0/@ssa_ret:14 = match_fail `function_clause`, x0/_0
  [17] ret x0/@ssa_ret:14
}

%% foo.ex:4
%% Counter = 4
function `'Elixir.Foo'`:`foo`(x0/_0, x1/_1) {
  %% _0: 0..1
  %% _1: 0..1
0:
  %% foo.ex:5
  %% _2: 1..7
  [1] x0/_2 = bif:'+' x0/_0, x1/_1

  %% @ssa_bool: 3..5
  [3] z0/@ssa_bool = succeeded x0/_2
  [5] br z0/@ssa_bool, ^3, ^1

3:
  [7] ret x0/_2

1:
  %% @ssa_ret: 9..11
  [9] x0/@ssa_ret = call (`erlang`:`error`/1), `badarg`
  [11] ret x0/@ssa_ret
}

%% Counter = 4
function `'Elixir.Foo'`:`module_info`() {
0:
  %% @ssa_ret:3: 1..3
  [1] x0/@ssa_ret:3 = call (`erlang`:`get_module_info`/1), `'Elixir.Foo'`
  [3] ret x0/@ssa_ret:3
}

%% Counter = 4
function `'Elixir.Foo'`:`module_info`(x0/_0) {
  %% _0: 0..1
0:
  %% @ssa_ret:3: 1..3
  [1] x0/@ssa_ret:3 = call (`erlang`:`get_module_info`/2), `'Elixir.Foo'`, x0/_0
  [3] ret x0/@ssa_ret:3
}

Of course, but the further you get away from something similar to Erlang, the harder it is to produce Elixir code that makes sense. That is why I chose to translate the Erlang AST to Elixir, and even then there are some things that are hard to create nice code from.

Agreed. What I want to explore is representing the Erlang SSA IR in MLIR, so that I can lower it to LLVM and from there to anything LLVM supports.