First impressions of Elixir's core

josevalim · January 20, 2019, 2:27pm

Easily in which sense? Have you tried to move them out? It is not just a matter of moving a file from a directory to the other.

For example, could inspect be a separate library? Sure, it could. But if we do so, how would Elixir core, which cannot depend on any external package, do something as simple as:

def some_function(other) do
  raise ArgumentError, "expected an integer, got #{inspect(other)}"
end

The best we would be able to do is say “expected an integer, got something else” or we would have to reimplement the inspect functionality by hand.
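To make that concrete, here is a minimal sketch of what inspect gives you in an error message. The module name `InspectSketch` and the function `double/1` are made up for illustration and are not part of Elixir:

```elixir
defmodule InspectSketch do
  # Hypothetical function: doubles integers, rejects everything else.
  def double(n) when is_integer(n), do: n * 2

  def double(other) do
    # inspect/1 turns any term into a readable string for the message.
    raise ArgumentError, "expected an integer, got #{inspect(other)}"
  end
end

# Capture the message instead of crashing, so we can look at it.
message =
  try do
    InspectSketch.double(%{a: [1, "two"]})
  rescue
    e in ArgumentError -> Exception.message(e)
  end

IO.puts(message)
```

Without inspect in core, the interpolated part would have to be dropped or formatted by hand for every kind of term.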

What if you move a GenServer out? Then how would we implement the functionality in Elixir that relies on GenServers today?

So the answer why a lot of those things are in core is because building a reasonably useful application without them, including Elixir itself, would be very hard. Removing them is not easy because it means you would have to implement some good chunks of them to implement Elixir’s core itself. As others said, without ex_unit we couldn’t even write tests for core.

So what about mix and eex? They are in Elixir because although they are not needed in Elixir itself, they are needed for Elixir developers. You want to start a new project? Well, you need mix. You want to compile a project? Well, you need mix as well. But they are not required, to the point that they are typically stripped out when running your software in production. In fact, if you want to ditch Mix and maintain your whole project with a Makefile, you can do it (and that’s how we did it before Mix).

This leaves us with Logger and IEx. We probably wouldn’t have a Logger in Elixir if Erlang/OTP didn’t have a Logger. But Erlang/OTP does ship with a Logger, as part of its kernel/stdlib, and if we didn’t have a Logger in Elixir, then it means you would get error reports and log messages with everything as Erlang terms. So Logger is needed at least as a translation layer. Same thing for IEx. A shell comes as part of Erlang/OTP’s kernel/stdlib, so providing a shell that understands Elixir is quite reasonable and relatively low effort. If we didn’t have one in Erlang/OTP, maybe we wouldn’t have one in Elixir either (although I would consider IEx to be an essential development tool).

So you can think of this as an onion. Inside core, we have a bunch of things that are essential for writing software and often they are dependent on each other. For example, if Enum depends on GenServer and GenServer depends on Enum, how can you compile them in the first place? This is common in all programming languages that are mostly implemented in terms of themselves, which is why such languages need to find a way to bootstrap themselves. Some languages even require a previous version of the language in order to compile the new version. If you don’t want this, then the alternative is to write your core in another language.

Then on top of this core, we have another layer with applications which are necessary for developers to bootstrap their own applications and run them in production with good user experience. That implies a build tool, interactive elixir, properly formatted log messages, etc. At this point, it is much easier to organize those parts into applications because the “core” is ready. We can use almost all conveniences any Elixir developer would, except for the build tool itself.

Then the third layer is the one provided by Hex (which is not part of core) and the community.

If you really want to take the idea of a minimal core to the extreme, you can try this: in your next project, you can only use Kernel.SpecialForms. Nothing more. You will see that, in order to get anything meaningful done, you will have to reimplement a good chunk of Elixir’s core.
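A rough sketch of what a step in that direction looks like. It is not the full experiment, since def and defp are themselves Kernel macros rather than special forms, so they are kept here. The module name `MinimalCore` is made up for illustration:

```elixir
defmodule MinimalCore do
  # Drop every auto-imported Kernel function and macro except the two
  # needed to define functions at all.
  import Kernel, only: [def: 2, defp: 2]

  # No Enum, no `+` operator, no `if`: summing a list now means
  # hand-written recursion and a direct call into Erlang for addition.
  def sum(list), do: sum(list, 0)

  defp sum([], acc), do: acc
  defp sum([head | tail], acc), do: sum(tail, :erlang.+(acc, head))
end

IO.inspect(MinimalCore.sum([1, 2, 3]))
```

Even a one-line Enum.sum/1 call turns into a small module once the conveniences are gone.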

hauleth · January 20, 2019, 11:13pm

TBH the only thing I would try to change in Elixir 2.0 (besides @doc becoming a plain attribute with accumulate: true instead of getting special handling) is to thin out Kernel.SpecialForms, especially moving for and with out of there and providing some replacement that does not need to be “special”.

josevalim · January 20, 2019, 11:34pm

So every time this discussion comes up I have to remind everyone that for cannot be implemented using regular constructs because it relies on some VM optimizations that only work if we emit the proper Erlang abstract code. with also has some particular scoping rules that would be hard to implement as efficiently without being a special form but I want to submit a PR to Erlang/OTP for that.
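For readers unfamiliar with the scoping rule in question, a small sketch (variable names are arbitrary): a variable bound inside with does not leak out of it.

```elixir
opts = %{width: 10}
width = nil

# `width` is rebound to 10 inside the `with`, but only inside it.
doubled =
  with {:ok, width} <- Map.fetch(opts, :width) do
    width * 2
  end

# Outside the `with`, the outer binding is untouched.
IO.inspect({doubled, width})
```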

asummers · January 21, 2019, 2:05pm

To that point, are there any other Erlang/OTP features on your wish list currently?

josevalim · January 21, 2019, 2:31pm

I wouldn’t call them features per-se. Just compiler improvements that could benefit everyone running on the BEAM.

OvermindDL1 · January 22, 2019, 6:16pm

Hmm? My cond library (talked about somewhere on these forums) reimplemented for quite well and outperforms it on benchmarks (since it can accept some optional typing information), all as just a normal elixir module and macros.

josevalim · January 22, 2019, 7:23pm

How well does it perform for binary generators and using the equivalent of into: ""? Because that’s the part that relies on the VM instructions for pre-allocated binaries.
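For reference, this is the shape of comprehension being asked about: a binary generator on the left and into: "" to build a binary result. A minimal sketch with made-up input:

```elixir
pixels = <<10, 20, 30, 40, 50, 60>>

# Read three bytes at a time and write them back in reverse order,
# collecting straight into a binary instead of a list.
swapped =
  for <<r::8, g::8, b::8 <- pixels>>, into: "" do
    <<b::8, g::8, r::8>>
  end

IO.inspect(swapped)
```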

OvermindDL1 · January 22, 2019, 8:37pm

Hmm, well mine does some tricks thanks to knowing the typing information but if Elixir’s for uses VM specific instructions that are not exposed to Elixir then I’m unsure how it could be faster, thus given this benchmark:

defmodule Helpers do
  use ExCore.Comprehension

  def elixir_2(b) do
    for\
      <<red::8, green::8, blue::8 <- b>>,
      red = div(red, 2),
      green = div(green, 2),
      blue = div(blue, 2),
      into: "",
      do: <<blue::8, green::8, red::8>>
  end

  def ex_core_2(b) do
    comp do
      <<red::8, green::8, blue::8>> <- binary b
      red = div(red, 2)
      green = div(green, 2)
      blue = div(blue, 2)
      <<blue::8, green::8, red::8>> -> ""
    end
  end

end

inputs = %{
  "Bin - 256 - into \"\" /2 and swap" => {List.to_string(:lists.seq(0, 255)), &Helpers.elixir_2/1, &Helpers.ex_core_2/1},
}

# This tests that both the Elixir and the ExCore versions have the same output for the same input
Enum.each(inputs, fn {desc, {input, elx, exc}} ->
  x = elx.(input)
  c = exc.(input)
  if x !== c, do: throw {:mismatch_bench_test, desc, input, x, c}
end)

actions = %{
  "Elixir.for"  => fn {input, elx, _core} -> elx.(input) end,
  "ExCore.comp" => fn {input, _elx, core} -> core.(input) end,
}


Benchee.run actions, inputs: inputs, time: 2, warmup: 1, print: %{fast_warning: false}

And it is fast enough that the error bound could be large, but the test result is:

╰─➤  mix bench comprehension
Compiling 1 file (.ex)
Operating System: Linux
Number of Available Cores: 6
Available memory: 16.430136 GB
Elixir 1.6.6
Erlang 21.2.2
Benchmark suite executing with the following configuration:
warmup: 1.00 s
time: 2.00 s
parallel: 1
inputs: Bin - 256 - into "" /2 and swap
Estimated total run time: 6.00 s



Benchmarking with input Bin - 256 - into "" /2 and swap:
Benchmarking Elixir.for...
Benchmarking ExCore.comp...

##### With input Bin - 256 - into "" /2 and swap #####
Name                  ips        average  deviation         median
ExCore.comp       61.85 K       16.17 μs    ±92.81%       15.00 μs
Elixir.for        55.93 K       17.88 μs    ±77.44%       17.00 μs

Comparison: 
ExCore.comp       61.85 K
Elixir.for        55.93 K - 1.11x slower

Mine is a bit over-careful about what it accepts, but any code it does accept it compiles fully and knows the types of as resolved (consequently it’s not quite as any-code-accepting as for is, but close), otherwise it will throw at compile-time. I haven’t got around to finishing this library (it has a number of parts, not just comp, I should push the latest version to github as it’s a bit out of date at this point… plus comp is the best working part of it regardless), but I’m thinking I really should find the time sometime.

Oh, also this is running on Elixir 1.6.6 on OTP 21. Elixir 1.7.0 had a backwards incompatible change from 1.6.6 in macro syntax (I reported it, I was told it was undocumented syntax and thus subject to change, though I contest that the great majority of the language syntax is undocumented anyway so that doesn’t bode well for many things potentially used…) that doesn’t let it work on newer versions (I also need to come up with a new syntax for what it broke, blah).

EDIT: And the benchmark timings hold the same even for significantly larger time runs.

josevalim · January 22, 2019, 9:01pm

Can you please try with this comprehension:

  for\
      <<red::8, green::8, blue::8 <- b>>,
      into: "",
      do: <<div(blue, 2)::8, div(green, 2)::8, div(red, 2)::8>>

We treat those assignments as filters and I assume yours does not, which may cause differences. Just for a more apples to apples comparison. Thanks!
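A small sketch of what “treated as filters” means in practice (the values are made up): any clause in for that is not a generator is checked for truthiness, so an assignment that evaluates to nil or false drops the element.

```elixir
values = [1, nil, 3]

# `bound = v` is not a generator, so `for` treats it as a filter:
# when it evaluates to nil or false the element is skipped.
kept = for v <- values, bound = v, do: bound

IO.inspect(kept)
```

So each `red = div(red, 2)` style clause in the earlier benchmark also pays for a truthiness check per element.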

josevalim · January 22, 2019, 9:06pm

Please let me know if there is anything you believe is missing from the Syntax Reference page. We do lack a more formal definition of the grammar but saying “the great majority of the language syntax is undocumented” is not correct as we do cover many syntax rules and each individual token quite well.

OvermindDL1 · January 22, 2019, 9:31pm

Ah yes indeedy! Filters in mine are explicit via ... <- filter ....

This looks much more like the direction I’d expect it to take then!

╰─➤  mix bench comprehension
Operating System: Linux
Number of Available Cores: 6
Available memory: 16.430136 GB
Elixir 1.6.6
Erlang 21.2.2
Benchmark suite executing with the following configuration:
warmup: 1.00 s
time: 2.00 s
parallel: 1
inputs: Bin - 256 - into "" /2 and swap
Estimated total run time: 6.00 s



Benchmarking with input Bin - 256 - into "" /2 and swap:
Benchmarking Elixir.for...
Benchmarking ExCore.comp...

##### With input Bin - 256 - into "" /2 and swap #####
Name                  ips        average  deviation         median
Elixir.for        81.01 K       12.34 μs   ±136.09%       12.00 μs
ExCore.comp       64.51 K       15.50 μs    ±51.76%       15.00 μs

Comparison: 
Elixir.for        81.01 K
ExCore.comp       64.51 K - 1.26x slower

Not as bad as I’d expect overall. Any other optimizations I can try? Mine handily beats for with lists, maps, and custom types, so for winning because of inaccessible code on binaries is good. ^.^

Technically for is type aware on binary patterns, just not any other types, it should expand that capability via optional declarations.

I do mean Spec when I say documentation. With a formal Spec then any and all syntax should be documented fully and any differences to the Spec, whether missing or extra, should outright be a bug (directed fuzzers are awesome for this!). Having forms that were working then suddenly break in a backwards incompatible way is very irritating… ^.^;

EDIT: Instead of having magical opcode access internally, why not expose them to the language itself via some function-like construct?

josevalim · January 22, 2019, 9:44pm

Agreed. I would also try larger binaries as benchmark input as that may make a bigger difference as you force new reallocations to happen.

I remember it used to make a big difference back when we implemented it but I don’t quite remember the exact scenarios as it was a long time ago (before 1.0).

Regarding optimizations you can try, you can compile your binary comprehension into for, which is the same reason why we compile them to Erlang. But that doesn’t help solve the original problem which is not having them be a special form in the first place.
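One way to read “compile your binary comprehension into for” is a macro whose expansion is a plain for. A rough sketch, where `CompSketch` and `swap_rgb/1` are hypothetical names and have nothing to do with how ExCore is actually written:

```elixir
defmodule CompSketch do
  # Hypothetical macro: expands at compile time into a regular `for`
  # with a binary generator, so the caller gets whatever code the
  # compiler emits for the special form.
  defmacro swap_rgb(binary) do
    quote do
      for <<r::8, g::8, b::8 <- unquote(binary)>>, into: "" do
        <<b::8, g::8, r::8>>
      end
    end
  end
end

require CompSketch

result = CompSketch.swap_rgb(<<1, 2, 3, 4, 5, 6>>)
IO.inspect(result)
```

The macro is ordinary Elixir, but it only works because for already exists underneath it as a special form, which is the original problem.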

OvermindDL1 · January 22, 2019, 9:54pm

I added this input:

"Bin - 256*256 - into \"\" /2 and swap" => {List.to_string(Enum.intersperse(:lists.seq(0, 255), :lists.seq(0, 255))), &Helpers.elixir_2/1, &Helpers.ex_core_2/1},

And it is indeed slower still, 1.81 times slower!

##### With input Bin - 256 - into "" /2 and swap #####
Name                  ips        average  deviation         median
Elixir.for        80.71 K       12.39 μs   ±130.62%       12.00 μs
ExCore.comp       64.44 K       15.52 μs   ±130.52%       15.00 μs

Comparison: 
Elixir.for        80.71 K
ExCore.comp       64.44 K - 1.25x slower

##### With input Bin - 256*256 - into "" /2 and swap #####
Name                  ips        average  deviation         median
Elixir.for         316.70        3.16 ms     ±3.95%        3.12 ms
ExCore.comp        175.04        5.71 ms    ±17.26%        5.47 ms

Comparison: 
Elixir.for         316.70
ExCore.comp        175.04 - 1.81x slower

That is indeed quite a big difference, getting close to half the speed! ^.^

Hehe, I guess I could special case binary comprehensions to generate for instead, but that seems kind of cheating, this is my playground library for recreating Elixir’s standard library but faster and more following the usual styles (so my version of Enum is more following the Categorical Rules for example, and it’s faster). ^.^

EDIT: I guess ‘technically’ only binary comprehensions would have to be special forms so far. ^.^

eksperimental · January 23, 2019, 2:31am

Of all 5 applications, the one that I think is used the least is EEx, so I went ahead and tried to remove it. It is only used by the mix new command, but its use goes back to

$ git log -S "EEx" -p lib/mix/lib/mix/generator.ex

commit 72dd2507b0e094db6c40a729005ccb25e7133d36
Author: José Valim <jose.valim@...>
Date:   Wed Jul 25 10:34:44 2012 +0200

    Initial work on mix new

Pretty much the same use we still have today in the current code.

+  @doc """
+  Embed a template given by `contents` into the current module.
+
+  It will define a private function with the `name` followed by
+  `_template` that expects assigns as arguments.
+
+  This function can be invoked passing no argument or passing
+  a keywords list. Each key in the keyword list can be accessed
+  in the template using the `@` macro.
+
+  For more information, check `EEx.SmartEngine`.
+  """
+  defmacro embed_template(name, contents) do
+    quote do
+      require EEx
+      EEx.function_from_string :defp, :"#{unquote(name)}_template", "<% @_abc  %>" <> unquote(contents), [:assigns]
+    end
+  end

I didn’t know EEx was used to render the templates that are generated when we run mix new.
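For anyone curious how that works, here is a minimal sketch of the same mechanism outside Mix. The module `TemplateSketch`, the function name `readme_template`, and the template text are made up for illustration; `EEx.function_from_string` is the macro used in the diff above:

```elixir
defmodule TemplateSketch do
  require EEx

  # Compiles the template into a function at compile time.
  # `@app` inside the template reads the :app key from `assigns`.
  EEx.function_from_string(:def, :readme_template, "# <%= @app %>\n", [:assigns])
end

rendered = TemplateSketch.readme_template(app: "my_app")
IO.write(rendered)
```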