Gleam, a statically typed language for the Erlang VM

But are they still tuples behind the scenes?

They are maps, to enable polymorphism at runtime :slight_smile:

I’ll hop on IRC and we can have a chat

I hate implicit in Scala … This is was drove me out this language.
The implicit give you a lot of power but from other side is low level tool and can you drove crazy.
The implicit in Scala will give you only mess and pain … :slight_smile:

@mkunikow If Gleam gains ad-hoc polymorphism through implicit arguments we will be careful to learn from any shortcomings of Scala’s implicits.

Two small updates:

We now have an IRC channel, #gleam-lang on freenode.

Secondly I’ve been interviewed by the folks at a This Is Not a Monad Tutorial. Read it here -> https://notamonadtutorial.com/an-interview-with-the-creator-of-gleam-an-ml-like-language-for-the-erlang-vm-with-a-compiler-e94775f60dc7

9 Likes

Hmm, I find this odd. The usual syntax for ML’y languages (with some OCaml specific in terms of row-records and so forth) is:

  • Tuples:
    • Values: (), (1, 2.3) (in actuality the parenthesis are often optional, so 1, 2.3 is equal to (1, 2.3))
    • Types: unit, integer * float
  • Records:
    • Values: {}, {a=42, b=6.28}
    • Types: {}, {a : integer, b : float}
  • Row-Records:
    • Values: object end, object let a = 42 let b = 6.28 end
    • Types: <>, <a : integer, b : float>
  • List:
    • Values: [], [1; 2; 3]
    • Types: 'a list, integer list
  • Arrays:
    • Values: [||], [|1; 2; 3|]
    • Types: 'a array, integer array
  • A lot more like Variants, Polymorphic Variants, GADT’s, et…

Now I’m not actually that much of a fan of the above declarations, honestly I’d have a type system syntax be based more like this:

  • Tuples: Internally this is implemented as just a record with names like _0, _1, etc. I still doubt whether this should be a base type or just a built-in type.
    • Values: (), (1, 2.3), 1, 2.3
    • Types: 'a tuple, (integer, float) tuple
  • Records: This is a base system type, although you could implement it as a GADT inside a list, I think this is worthy as a base system type.
    • Values: {}, {a : 42, b : 6.28}
    • Types: {}, {a : integer, b : float}
  • Row-Records: Instead of being a base type this could potentially by build on maps, even though internally they really should not be implemented as a map… Keeping <> as the delimeters might be useful but there are such better things to do with </> I think…
    • Values: {||}, {|a : 42, b : 6.28|}
    • Types: {||}, {|a : integer, b : float|}
  • List: Having lists use ; as the separator is great when you can do tuples like , without parenthesis, but if tuples have mandated parenthesis then just return list separators to ,. This is not a base type as it would also be implemented as a record (cons cell). Technically it doesn’t even need a built in special syntax, could just use function calls for it with perhaps a macro layer on top to add its special syntax?
    • Values: [], [1, 2, 3]
    • Types: 'a list, integer list
  • Arrays: Honestly I’m torn on this, it could be a built-in type (maybe even a base type) for efficiency reasons, but it really doesn’t map quite perfectly on to all likely backends, so keeping it as a built-in but not a base type might be best… I’m actually not sure it should even have it’s own syntax and should rather just be function calls.
    • Values: [||], [|1, 2, 3|]
    • Types: 'a array, integer array

In general, I would recommend not having {...} be both tuples and records. Although tuples are very often implemented on top of records in a type system (and records usually compile down to tuples at lower levels), they are different concepts and empty versions of each have different purposes.

Honestly you don’t even need to have tuples be a built in type, you could even easily make a user defineable operator that just builds tuples into a record, something like:

let (,) a0 a1 = {_0 = a0, _1 = a1}
let (,,) a0 a1 a2 = {_0 = a0, _1 = a1, _2 = a2}
let (,,,) a0 a1 a2 a3 = {_0 = a0, _1 = a1, _2 = a2, _3 = a3}
...etc...

And in fact this is exactly how some languages define their tuples (why do you think OCaml has something like 1, 2, 3 === (1, 2, 3), the parenthesis is just a state separator, not part of the tuple syntax). And if you have something like an implicit module system or typeclasses along with record concatenation then you could implement every possible tuple on-demand via like two implicit modules.

As a workaround for now though you could follow most ML languages and call your empty tuple the unit type?

Ah, row-typing records, I’m guessing these are implemented as a map instead of lowering to a tuple when compiling to the BEAM or do you compile in efficient dispatchers instead? Although row-typing can be compiled down to maps, that is by far from the most efficient format.

Honestly I’d just remove tuples as a base type, just use the comma operator like most languages with tuples do that compile to a static record (which would lower down to a tuple on the BEAM anyway). Or if you really really want a built-in type for tuples then use parenthesis and not curly brackets.

And I really wouldn’t conflate the syntax for static records and row typed records either, you could go the OCaml route and use {...} for static records and <...> for row-typed records for example (it uses different operators to access them too, staticrecord.field for a static record and rowtyped#field for a row typed record, which enforces the type of which it can be).

Using different delimiters for all 3 would fix that too. They are different constructs with different memory representations and different uses, thus they should not have the same syntax.

Well in most functional languages a function call only takes 1 argument, ever, so something like blah(1, 2, 3) in an ML language is quite literally calling blah with 1 argument, which is the tuple of 1, 2, 3 (which the compiler lowers down to efficient calling styles regardless) so that would be entirely fine if you follow the same pattern (where the function name has its first gleam argument ‘spread’ across the application argument on the BEAM, can be made perfectly efficient and perfectly unambiguous, especially as I would not recommend auto-currying, although auto-partial-application is fine, just don’t auto-curry as it’s not great on the BEAM, ML languages don’t even auto-curry either much as most people think otherwise because of the syntax pretends it does as the pattern of calling a function with a single tuple is the most common ML pattern, OCaml is actually a bit of an oddity here but it optimized the more simple style to work as efficiently as the calling with a tuple).

Honestly I’d fall back to OCaml’s syntax when it is questionable as it has been extremely well researched and chosen to be unambiguous, easy to parse by both machines and humans, and extremely efficient.

Unsure about gleam, but in ML languages blah(1, 2) is no different than blah (1, 2) as is the same in many languages, Elixir is an oddity here. ^.^

I’m not sure that’s generally considered polymorphic in the typing-verse, rather that seems like it is just normal row typing?

Hmm? :slight_smile:

OCaml works around that by only allowing that syntax in the types, the runnable code section uses words of object ... end.

In typing systems records field names are part of their specification, so the names have to match and the types with the names have to match, the position of the fields don’t actually matter.

Yeah that’s why I’m more partial to : instead of = for that, although I think = is still unambiguous anyway as in records to the left of the first instance of a = is the name and on the right side is the pattern.

They are exceptionally more common in functional languages as placeholders or tags. Though if a tuple just compiles to named static records then an empty tuple and an empty record are identical anyway so {} is unambiguous and only has a single value anyway.

Records should be a base type, binaries should be a base type (or rather maybe arrays should be a base type and a binary is just a utf8 array or so). Tuples, lists, maps, etc… should not be base types but rather are built on top of the base types as most functional languages do the same.

Then they aren’t static records, they are row-typed records. You really exceptionally should add static records as well, not just for certain efficiency reasons but also because it constrains the type. In general row-typed records should be extremely seldomly used as they allow one to write code that can more easily logically break where static records are far more hardy in program design.

Uh, isn’t that was a single head variant is? In OCaml this:

type cartesian = Cartesian float * float

Quite literally compiles to the same assembly as just:

float * float

Just it is more type safe. A single head enum is only information for the type system, it should not add anything to the generated code at all and in doing so is generally considered a bug in many languages.

In a normal/nominal HM style typing system static records compile down to what is essentially a tuple (or struct in C parlance). Row records compile down to a dispatch table with implicit witnesses being passed with it (essentially it creates a hidden 2-field static record that one of the fields is the internal ‘static’ record type of the row record, and the other field points to a dispatch table, basically a Lens, for getting/updating the internal record, where this Lens is updated base on the function that it is being passed in to and the types that it expects the row typed record to have, or you can implement it inefficiently by just using a map (which uses both more value space and more execution time but a slight reduction in generated code due to no Lenses).

Implicit parameters in Scala are signficantly more powerful than, say, implicit modules though, and far harder to reason about than implicit modules. Think of implicit modules like typeclasses, not like scala implicits.

That’s rather pretty horrifying for a record type, that means it won’t be O(1)… >.>

There is no reason to require this. The compiler is the one that decides which types are dispatched over polymorphically and which are not.

1 Like

From the link:

I’d like to enhance how atoms are represented at type level. Currently we can say “this value is an atom”, but that’s about it. It would be more useful if we could say “this value is the atom ‘ok’ or the atom ‘error’”, or “this function can takes the atom ‘up’ or the atom ‘down’, but no other atom”. This could also be extended to create polymorphic enum variants too, though I’m unsure whether it makes sense to have those as well as Gleam’s existing pre-declared enums.

So… OCaml’s polymorphic variants? ^.^

Essentially:

OCamlElixir
`Blah :Blah
`Blah 42 {:Blah, 42}
`Blah(1, 2.3, "4") {:Blah, 1, 2.3, "4"}

Which is very similar to how OCaml already compiles it. For example the polymorphic variant `Blah gets compiled in assembly to the integer 737303889, just as atoms get interned into into the atom table on the beam, and the polymorphic variant `Blah 42 gets compiled into assembly of the equivalent of a C struct of struct { int _tag = 737303889; int _0 = 42; }.

Being polymorphic variants you can constrain the ‘values’ of the polymorphic variants accepted in a variety of areas, like the closed type of [> `Blah | `Blorp of int ] means that it accepts those two polymorphic variants, or any smaller version of it (meaning you can pass in just `Blah by itself just fine). The open type of [> `Blah | `Blorp of int ] means it accepts any combination of the ones specified (like the closed type), or it accepts anything else (it may just not do anything with them). These two types were auto-detected via these:

# let a = function
| `Blah -> 1
| `Blorp a -> a
;;
val a : [< `Blah | `Blorp of int ] -> int = <fun>

# let b = function
| `Blah -> 1
| `Blorp a -> a
| _ -> -1
;;
val b : [> `Blah | `Blorp of int ] -> int = <fun>

And thus you can see why it deduced that. :slight_smile:

There is no reason to require this. The compiler is the one that decides which types are dispatched over polymorphically and which are not.

Exactly this. There is no reason for the default record type to not be a static record (compiling down to a beam tuple). For example OCaml has modules, module interfaces, and first class modules (all of which are also on the beam in some form) and it just builds a dispatch table to dispatch a non-exact interface to a more open module, like row-types work (or in erlang parlance it apply’s, so when compiling to the beam it becomes just a normal call on a module binding, so no extra work), and Functor Modules in OCaml are just modules that take an extra argument to all their dispatch calls (I.E. what tuple calls did on the BEAM before they were broken out, so you’d have to do generate it manually if you want Functor’s, not that it’s hard, just more ugly to read the generated code since it ‘infects’ all non-static first class module calls).

Hmm, I find this odd. The usual syntax for ML’y languages (with some OCaml specific in terms of row-records and so forth) is:

It’s not a goal to have ML inspired syntax, so yes, will be quite odd by contrast! I’m hoping it will feel more familiar to people who use C style languages such as Rust or Javascript.

Ah, row-typing records, I’m guessing these are implemented as a map instead of lowering to a tuple when compiling to the BEAM or do you compile in efficient dispatchers instead?

Yes, records are row typed and compile to maps. If O(1) access is required then they should be avoided in favour of enums or tuples.

They are different constructs with different memory representations and different uses, thus they should not have the same syntax.

I agree! A better syntax for tuples has yet to be decided upon. Today I’ve been leaning towards #(1, 2, 3)

Well in most functional languages a function call only takes 1 argument,

I’m not sure I believe that most functional languages do this! Perhaps most ML languages. :wink:

Honestly I’d fall back to OCaml’s syntax when it is questionable as it has been extremely well researched and chosen to be unambiguous, easy to parse by both machines and humans, and extremely efficient.

I can think of a few instances in which OCaml syntax is ambiguous, and while personally I quite like the OCaml syntax I do think it is telling that ReasonML has managed to gather quite a following purely by removing it.
I intend to study and take much from OCaml’s type system, though don’t plan to study the syntax.

Unsure about gleam, but in ML languages blah(1, 2) is no different than blah (1, 2) as is the same in many languages, Elixir is an oddity here. ^.^

Gleam is not like ML here, it is closer to Erlang, Scala, etc.

I’m not sure that’s generally considered polymorphic in the typing-verse, rather that seems like it is just normal row typing?

I’m not an expert here, but the papers I’ve read have called it polymorphism. Wikipedia seems to agree. Row polymorphism - Wikipedia

Either way, we are using row types :slight_smile:

Without this feature implementing behaviours would not be possible, so it’s extremely useful.

Hmm? :slight_smile:

Modules are row typed and so a behaviour is type sig that specifies a module with at least X Y Z functions.

In general row-typed records should be extremely seldomly used as they allow one to write code that can more easily logically break where static records are far more hardy in program design.

As an Elm / Purescript programmer I’ve not had this experience. Could you detail these situations?

Just it is more type safe. A single head enum is only information for the type system, it should not add anything to the generated code at all and in doing so is generally considered a bug in many languages.

In Gleam enums will always be compiled the same way, regardless of whether they have 1 variant or more. This is largely to enable predictable interop with the wider ecosystem, which is one of the primary project goals.

The more efficient type wrapper you’ve described will be a different syntax.

That’s rather pretty horrifying for a record type, that means it won’t be O(1)… >.>

To me this is a problem with the connotations of the name. If people think records are O(1) (which OCaml and Erlang programmers likely will) should we pick a different name?

My problem is that “struct” means O(1) to a greater number of programmers and “object” has OOP connotations to yet again a greater number of people. :frowning:

There is no reason to require this. The compiler is the one that decides which types are dispatched over polymorphically and which are not.

@Qqwy Gleam records are to be easily usable from Erlang and Elixir and so we cannot rely on compiler optimisation for this.

So… OCaml’s polymorphic variants? ^.^

Yes, completely! :slight_smile:

As I see it, you have three choices you can base your records off of:

  1. Maps (just like Elixir structs). This has as advantage that the data-structures are self-documenting: The names of the fields are in the map after all. The disadvantage is that they are relatively slow.
  2. Tuples (just like Erlang records). This has the advantage that it is relatively fast, but the disadvantage that the fields are indicated by position instead of field name, which is not self-documenting.
  3. References. This has the property that they are not easily introspectable/usable from Elixir/Erlang from the get-go; they can only be used by calling the appropriate NIFs that might turn their contents into other Elixir/Erlang types. However, they are probably (depending on how you write your NIFs) very fast and allow you to be less coupled to how Elixir/Erlang structure things, as well as do more data-hiding. Depending on what you want to do, this can be an advantage or a disadvantage.

In all three of these cases, the code that does polymorphic dispatch based on them will be similar.
Especially because in a more ML-like language Sum-types are much more common than in plain Erlang/Elixir, I personally would expect performance to play at least some role in making the decision of which one you’d want to use.

Actually, you could even make this configurable (although that might require a bit more work up-front of course).


In any case, let it be known that I am very excited for Gleam! :smiley:

I doubt self-documenting is the most important quality here. I think this method may be necessary to have polymorphic record matches implemented by the beam at runtime. If I want to match any record containing element foo, and records are implemented as tuples, how are match clauses generated?

1 Like

Well, you can do what Erlang does for their Records. It might even be possible to use Erlang’s or Elixir’s glue code to work with them.

That said, I might have misunderstood what kind of records you were talking about: If you only want to use them as map-like records with named fields, using maps as underlying storage (or just plain using maps) would make sense.

However, if you want to base all user-defined sum and/or product types off of them (and I thought this is what you were trying to do, since you were talking about polymorphism), this is where I would advise you to use tuples as underlying datastructure because of their extra performance.

Well a Rust tuple would be instanced like (1u8, 2.3f32, "hello") so that follows the ML pattern. ^.^

So… they aren’t records then? o.O

You really should call them maps or so instead. The different performance characteristics could easily confuse someone who might suddenly wonder why their O(1) algorithm is running in O(log(n)) time.

Well most languages from OCaml to Rust to a whole lot more just use (1, 2.3, "4"), so… :slight_smile:

Even beyond ML languages, conceptually most any language that is based on HM typing use just single-arg functions because they map to the typing system. ML languages (OCaml being the outlier in that this is not the common pattern for it due to its better optimization engine compared to other ML languages) use the pattern of a tuple being that single argument. :slight_smile:

Hmm? Do tell? It’s syntax is very specifically designed to be entirely unambiguous and trivial to parse, you know the base type of anything from the syntax. ReasonML is a javascript-like syntax layer on OCaml, it is not type knowledgeable or anything of the sort, it is a trivial string transformer (which would not be possible if there were ambiguous syntax without actually typing the code).

You are missing a lot then by not looking at ML languages, there is a reason they are taught in schools and it’s not just the typing system (considering some dynamically typed ML’s are also taught routinely). :slight_smile:

This isn’t just an ML thing, the spacing doesn’t matter in everything from C to javascript to Scala to Python to etc… This is an extremely usual thing. It is very much Elixir being the oddity. ^.^

Row polymorphism yes, I.E. polymorphic over the order of field entries, but it is not a polymorphic type, it is still very static.

I thought it had the characteristics of a map type though, since you said above that they will compile down to a beam map? Those are more costly than row interfaces (which have a higher access cost than a static array but is a flat constant cost, I.E. O(1), not O(log(n)) that a map is, of which even that constant cost is often cheaper than even a single entry map lookup).

That is primarily to make it easier for javascript programmers, but in general using row typed records means that records become extremely weakly named. Even OCaml’s row records can have named interfaces (which is also unusual for row records, but even in OCaml that is optional, although using them allows OCaml to generate more efficient code), but Elm’s and TypeScript’s do not, they mostly just pray that the, for example, V8 JIT engine’s optimizer will properly see that the shape of the resultant structures is the same so it can optimize them into a static structure, though if you pass differently shaped records into the same function then that breaks down and you lose efficiency. Static records mean you always gain that optimization in JS regardless, and on something like the beam that doesn’t have a shape optimization engine means that you gain direct access O(1) speed, where with row typing the V8 engine can ‘usually’ remove its cost (depending on its access patterns), on the BEAM you will always incur that speed hit if you implement it as a map, and even if you optimize it to actual indirect calls instead of a map on the beam then the beam won’t optimize that out either where an optimizing compiler like LLVM could optimize it entirely to direct calls in the great majority of cases if not all depending on if you wish to optimize for code speed or code size.

The usability issues of row typing on the other hand are a whole other bag of worms, it effectively lets you treat a ‘thing’ like it is duck typed, this means it lets you pass a value into a function that the function might not actually know how to operate on. The trivial example would be something like:

length {x=x, y=y} = sqrt(x + y)

Then you could pass something like length {x=1, y=2} in to it and it works fine, but it also lets you pass something like length {x=1, y=2, z=3} in to it and get the wrong answer, and yet it compiles fine. And yes this specific example was taken from a bug report I saw on an Elm library back when I was still using Elm.

And yes, even though OCaml has row typing almost none of the community uses it. You will be hard pressed to find any library that even touches it. In general using row typing when static typing is possible is near universally a sign of bad code.

Now there are good uses of row typing, such as witnesses if you don’t have a good first class module system, which OCaml has, hence why row typing isn’t used in OCaml; OCaml’s Module System has all of the advantages of row typing while enforcing names, which lets it generate better code, and able to hold non-value data, like being able to pass in previously unknown types, which row typing does not support.

Interop in what way? It still compiles the same way regardless. Like in OCaml something like:

type stuff =
| Blah of int
| Vwoop of float
| Breep of string

match thing with
| Blah i -> string_of_int i
| Vwoop f -> string_of_float f
| Breep s -> s

Would compile to assembly of something like this in Elixir:

case thing do
  i when is_integer(i) -> Integer.to_string(i)
  f when is_float(f) -> Float.to_string(f)
  s when is_binary(s) -> s
end

Since OCaml’s values are tagged with their type internally (tagged bits, so things like an integer are constrained to 31 bits instead of 32 before they become boxed, and everything else is boxed via a tagged bit system).

However, that is just the assembly backend. The Javascript backend actually boxes everything manually into a javascript shape of the form where the object has a field called tag and the indexes in the array are the field entries, so the above would generate this in javascript:

  switch (thing.tag | 0) {
    case 0 : 
        return String(thing[0]);
    case 1 : 
        return Pervasives.string_of_float(thing[0]);
    case 2 : 
        return thing[0];
    
  }

And ‘that’ style converted to Elixir would be:

case elem(thing, 0) do
  :blah -> Integer.to_string(elem(thing, 1))
  :vwoop -> Float.to_string(elem(thing, 1))
  :breep -> elem(thing, 1)
end

Or as more traditional elixir code:

case thing do
  {:blah, i} -> Integer.to_string(i)
  {:vwoop, f} -> Float.to_string(f)
  {:breep, s} -> s
end

Hmm, I wonder which is faster actually… Let’s test!!!
Given this benchmark file of case_match_tuple_bench.exs:

defmodule CaseMatchTupleBench do
  @classifiers [2, 20]
  def classifiers(), do: @classifiers

  def time(_), do: 2

  def inputs(_cla),
    do: %{
      "First" => {:First, 1},
      "Last" => {:Last, 1},
    }

  def setup(_cla), do: nil

  def teardown(_cla, _setup), do: nil

  for cla <- @classifiers do
    def actions(unquote(cla), _setup),
      do: %{
        "elem calls" => fn inp ->
          case elem(inp, 0) do
            unquote(List.flatten([
              quote(do: (:First -> elem(unquote(Macro.var(:inp, nil)), 1))),
              Enum.map(2..cla, fn v -> quote(do: (unquote(:"#{v}") -> -1)) end),
              quote(do: (:Last -> elem(unquote(Macro.var(:inp, nil)), 1))),
            ]))
          end
        end,
        "tuple matchers" => fn inp ->
          case inp do
            unquote(List.flatten([
              quote(do: ({:First, i} -> i)),
              Enum.map(2..cla, fn v -> quote(do: ({unquote(:"#{v}"), x} -> x)) end),
              quote(do: ({:Last, i} -> i)),
            ]))
          end
        end,
      }
  end
end

It benchmarks with:

╰─➤  mix bench case_match_tuple

Benchmarking Classifier: 2
==========================

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.8.1
Erlang 21.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: First, Last
Estimated total run time: 24 s


Benchmarking elem calls with input First...
Benchmarking elem calls with input Last...
Benchmarking tuple matchers with input First...
Benchmarking tuple matchers with input Last...

##### With input First #####
Name                     ips        average  deviation         median         99th %
tuple matchers       20.40 M      0.0490 μs    ±10.60%      0.0460 μs      0.0650 μs
elem calls           15.73 M      0.0636 μs   ±107.11%      0.0600 μs       0.130 μs

Comparison: 
tuple matchers       20.40 M
elem calls           15.73 M - 1.30x slower

Memory usage statistics:

Name              Memory usage
tuple matchers           136 B
elem calls               136 B - 1.00x memory usage

**All measurements for memory usage were the same**

##### With input Last #####
Name                     ips        average  deviation         median         99th %
tuple matchers       19.63 M      0.0509 μs     ±8.80%      0.0480 μs      0.0680 μs
elem calls           15.39 M      0.0650 μs    ±13.85%      0.0670 μs      0.0850 μs

Comparison: 
tuple matchers       19.63 M
elem calls           15.39 M - 1.28x slower

Memory usage statistics:

Name              Memory usage
tuple matchers           136 B
elem calls               136 B - 1.00x memory usage

**All measurements for memory usage were the same**

Benchmarking Classifier: 20
===========================

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.8.1
Erlang 21.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: First, Last
Estimated total run time: 24 s


Benchmarking elem calls with input First...
Benchmarking elem calls with input Last...
Benchmarking tuple matchers with input First...
Benchmarking tuple matchers with input Last...

##### With input First #####
Name                     ips        average  deviation         median         99th %
tuple matchers       17.33 M      0.0577 μs     ±8.68%      0.0540 μs      0.0820 μs
elem calls           14.65 M      0.0683 μs    ±43.35%      0.0700 μs       0.150 μs

Comparison: 
tuple matchers       17.33 M
elem calls           14.65 M - 1.18x slower

Memory usage statistics:

Name              Memory usage
tuple matchers           136 B
elem calls               136 B - 1.00x memory usage

**All measurements for memory usage were the same**

##### With input Last #####
Name                     ips        average  deviation         median         99th %
tuple matchers       17.92 M      0.0558 μs    ±11.24%      0.0520 μs      0.0731 μs
elem calls           14.70 M      0.0680 μs    ±78.15%      0.0700 μs       0.140 μs

Comparison: 
tuple matchers       17.92 M
elem calls           14.70 M - 1.22x slower

Memory usage statistics:

Name              Memory usage
tuple matchers           136 B
elem calls               136 B - 1.00x memory usage

**All measurements for memory usage were the same**

So in short, the results are:

  • When the number of case branches are low (3), and you are successfully matching the first branch then the elem call version is 1.3x slower.
    • When matching the last (second) head then it is 1.28x slower.
  • When the number of case branches are high (22), and you are successfully matching the first branch then the elem call version is 1.18x slower.
    • When matching the last (23rd) head then it is 1.22x slower.

So matching the tuple in the matcher is always faster assuming my benchmark code is right. ^.^

Either way, back to method of expose of heads.

So how will gleam handle these different cases:

  1. The erlang/elixir API returns either a {:ok, value} or :error?
  2. The erlang/elixir API returns either a {:ok, value} or {:error, reason}?
  3. The erlang/elixir API returns either a value or :nil?
  4. The erlang/elixir API returns either a value or %SomeException{...}?

For note, if ocaml gets a beam backend then all of the above could be typed properly via methods like this by assuming the generation structure that I described above (these are all error handling patterns that I see most excessively in Elixir):

(* 1. *)
type 'value basic_result =
| Ok of 'value
| Error

(* 2. This is just the normal result type in OCaml actually *)
type ('value, 'reason) result =
| Ok of 'value
| Error of 'reason

(* 3. None should really compile to erlang's `undefined` but going elixir'y here, so `nil` *)
type 'value option =
| Some of 'value [@beam.unwrapped] [@beam.guard "_0 != nil"]
| None [@beam.literal "nil"]

(* 4. *)
type ('value, 'exc) exc_option =
| Some of 'value [@beam.unwrapped] [@beam.guard "not is_map(_0) orelse map_get('__exception__', _0) !== true"]
| Exception of 'exc [@beam.unwrapped] [@beam.guard "is_map(_0) andalso map_get('__exception__', _0) === true"]

If gleam wants to interoperate with foreign interfaces (as OCaml is designed to support) then it will at the very least need to support all the above Elixir error handling patterns in a generic way.

Yeah it absolutely should have a different name. I find Elixir’s “struct” type horribly horribly misnamed as well. object is entirely fine I’d think and it does follow type based naming patterns for row typed things (which C++/Java’s style of OOP ‘technically’ is, with more fluff). If you don’t want object then perhaps just something like rowstruct, although if you are intent on using a map underneath instead of a proper row typed struct pattern then why not just call it a ‘typed map’ or something, since it is just a map with some statically typed required fields instead of unbounded field types that a normal map type would have?

Erlang and Elixir will be able to easily use them regardless the format Glean puts out, the issue is having gleam being able to write out to the specific types that erlang/elixir API’s expect as well as being able to consume them back, and with erlang/elixir being dynamically typed this is extremely varied, like at the most basic a function could return either an integer or a string as its return types, or some only differentiating factor within a deep matched value (all of which OCaml’s compiler and syntax supports just fine for foreign interop), it will need to be able to support those. You could potentially use a nameless variant type (an unnamed sum type), which wouldn’t be hard to compile down on the beam, but you’ll start to have major issues when you want to compile down to something else, like LLVM. Most languages don’t contain anonymous sum types because without universally tagged value types, and OCaml only tags in some cases but not all, then compiling becomes ambiguous (this is the union type in C for example) and there become ways to break the type system. OCaml supports such patterns by telling the backend via attributes (the [@...] stuff above) what it should do (and attributes unknown to a backend are ignored). Most OCaml attributes are generally only put in by library authors for various reasons but there are useful ones in end-programmer code too (like very efficient json generators around types and so forth!). ^.^

So… they aren’t records then? o.O

They are records in the same way that Elm and Purescript’s records are records. Names are hard!

Hmm? Do tell? It’s syntax is very specifically designed to be entirely unambiguous

Actually I think after reflecting on what you’ve said about tuples and multiple arguments I’m incorrect here. I didn’t understand how they worked :slight_smile:

Interop in what way? It still compiles the same way regardless.

In the way of Erlang and Elixir programmers being able to easily tell what the underlying data structure will be so they can construct it on their side.

Gleam will not do any performance optimisations as OCaml does.

So how will gleam handle these different cases:

Gleam will not handle these in any special way. The standard library strongly encourages using {ok, V} | {error, E}, and any other pattern will need to be handled explicitly by the programmer, likely by defining enums as you have in the OCaml example.

rowstruct
typedmap

They’re not structs and Gleam already has a typed Map. I’m leaning towards object at the moment as it’s probably the least overloaded for BEAM programmers.

1 Like

Yes, that’s right!

The sum types (enums) will be based off of tagged tuples.

1 Like

Both of which compile to javascript, and their records compile to class objects, which then the JIT optimizes since they tend to have uniform shapes, that’s not the case in the beam world, or really anywhere else for that matter. ^.^;

Hmm? I’m curious what was thought before (initial thoughts are important as they tend to be first what someone notices about something!).

There’s a huge difference between developing an API and expecting it to be consumed by the foreign interface, and using an API that is on the other side of the foreign interface, so it will need to handle those if it expects to be able to consume erlang/elixir API’s.

I mean I typed distinct fields in a map rather than the map itself being typed (which would of course just be the normal map type). ^.^

1 Like

Yeah, we’ll have the same performance penalty as Elixir structs here.

X of int * int
X of (int * int)

Both can be written as X(1, 1), though one is a single item tuple? I’m unsure now. My OCaml isn’t very strong :slight_smile:

Ah but beam/erlang records compile down to structs, so if someone is using erlang and they go to gleam they would probably expect a gleam record == erlang record. 6.^

This is one of those efficiency quirks of OCaml! ^.^

Specifically, a variant head is multi-arg, which allows a variant to be tightly packed, the type form of this looks like a normal tuple, but this is a holdover from SML I think it was. The type of a head with an explicit single-arg tuple has the parenthesis, however this also means that the head becomes just 2 elements, the tag and the tuple, then of course the tuple can have many elements, this makes it more efficient to pass around the tuple of data as a single reference, but makes it less efficient to decompose the head directly, so OCaml added both forms. Now the quirk of syntax is that the variant head is special cased to take multi-arguments and holds it as a literal tuple internally until it performs the type check and figures out whether to separate it into arguments or make it into a normal boxed tuple, that is why the parenthesis works for each and indeed this part is considered a bit of a wart in the syntax, and it would be easy to fix however it would not be backwards compatible and backwards compatibility is of prime concern to them, so this old holdover from pre-ocaml days still persists. :slight_smile:

For note, there are 3 ‘types’ of head, they can be mixed and match as you want they just change various access and performance characteristics. These are:

  • The packed argument head Blah of int * int * int, this would basically map in Elixir to a {:blah, 1, 2, 3}. This of course supports embedded tuples just fine like in Blah of (int * int) * int, which in Elixir would be {:blah, {1, 2}, 3}.
  • The embedded tuple head ``Blah of (int * int * int), this would basically map in Elixir to a {:blah, {1, 2, 3}}`,
  • The packed record head Blah of {x: int, y: int}, which would basically map in Elixir to a {:blah, 1, 2}, and yes that means that you can’t pull a record out wholesale, it is still efficiently packed into a single object with little to no wrapping, basically like an Erlang record. This means you can’t do something like match blah with | Blah r -> r as there is no record object, it’s packed/lowered into the head itself for efficiency, rather you have to access it via parts or match it out directly like match blah with | Blah r -> r.x or match blah with | Blah {x; _} -> x (and yes OCaml supports punning records in all places, the _ in the matcher in a record means ignore all other fields).

To do an embedded record argument you have to do it in two steps (as records themselves have to be ‘named’):

type my_record = {x: int, y: int}
type my_variant = Vector2 of my_record

The above is what OCaml wants to enforce with tuples too, but it wouldn’t be backwards compatible so it is not done.

Saying that JavaScript object field access is O(1) is a huge simplification :stuck_out_tongue:

Most JavaScript engines employ very complex techniques, but the default implementation of field access is just a linear search over the fields of the object - the runtime/JIT can later optimise it to a more efficient implementation, including C-like struct access, but if they fail to optimise, they fall back to this linear search implementation. Exactly the same techniques could be available to BEAM if it ever gets a JIT.

Constraining the design by the current implementation is a bit backwards - it’s the design that has to be right and the implementation’s task is to make that design efficient, not the other way round. If maps provide useful abstractions and easy implementation - use maps! The runtime can always be optimised if it turns out to be too slow. That is also how Erlang was designed - first get the design right, then figure out how to make it fast enough.

5 Likes

Only while the shape is consistent and monomorphic as mentioned prior. ^.^

When it is not then it calls back to at best a virtual dispatch and at work a map lookup through potentially multiple boxes, not as pretty then, lol.

Thankfully languages like OCaml/ReasonML, Elm, Purescript, etc… all try to emphasis forcing using a single shape in the calls. :slight_smile:

As I recall reading on V8, even in the worse case it is only a hash lookup, but that lookup is via hashing the boxed pointer to the field value, so it is a lot of indirections to go through on top of hashing itself, but still better than pure linear (which really only matters on huge calls), and at those times the runtime will often optimize it to a lower level faster hashmap.

Very true, which is why I focused on row-typed records being bad conceptually to fit data in as it allows types to change in surprising ways while still allowing a successful compile of potentially very bad code. :slight_smile:

A couple of tiny updates as we approach Gleam’s first release :tada:

The Gleam compiler has learnt how to to suggest an alternative when it encounters a type or name it doesn’t recognise, to help with pesky typos :slight_smile:

The record type has been renamed to “map”, which I think should be less confusing in the presence of Erlang records and their non-O(1) access cost.

The final few items of work before v0.1 can be tracked here → Issues · gleam-lang/gleam · GitHub

6 Likes

Gleam has just got its first release, v0.1.0! Read more here: Hello, Gleam! – lpil.uk

I am very happy to finally be here :slight_smile:

21 Likes

Congratulations! Great progress. Will try it out on an upcoming project (some basic astronomy stuff)

3 Likes