FunLand: Algebraic("Container") Data Types for Elixir - Prerelease

Qqwy · July 22, 2016, 9:24pm

Because of popular demand, and because I’ll probably be busy the next couple of days, so it would need to wait a lot longer if I didn’t publish it now, here it is:

FunLand: This is a package that adds a couple behavours to your Elixir application, which you can use to define Algebraic Data Types.

What exactly are Algebraic Data Types?

They are basically containers for simpler types, in all kinds and shapes.

Some common examples are:

Lists (as use already every day in Elixir)
Tuples
Trees
Maybe, which either contains a single value or nothing. This allows for propagation of failures in more complex operations.
Writers, which allow you to keep track of something (such as a log) in the background while passing it through multiple operations that work on simple values.

Why are Algebraic Data Types useful?

Algebraic Data Types are useful in the same way that using loops is useful: They let you re-use a set of operations you already had on a much larger set of inputs/problems.

For instance, Mappable.map lets you re-use any function that works on a single simple type, to tranform the contents of a collection of things of that type:

Mappable.map(input, fn x -> x*2 end) would transform the list [1,2,3] into the list [2,4,6], the tuple {3,1,4} into {6,2,8}, Maybe.just(6) into Maybe.just(12), etc.

This pre-release of FunLand is mainly because I would like some feedback, and to find out if the design choices I have made so far are sound. To implement Abstract Data Types turned out to be a larger endeavour than I had expected. I hope that FunLand will be able to explain to newcomers how ADTs work and why they are useful, and make it easy for people to define their own.

also, Pull Requests are very welcome!

Sincerely,

~Wiebe-Marten/Qqwy

OvermindDL1 · July 22, 2016, 9:35pm

Been waiting for this. ^.^

EDIT: Going to add an Either/Result that has a left or right type? Usually used in the same vein as Maybe but usually as return values to hold either a success value and result or an error value and reason (equiv to the usual {:ok, something}/{:error, reason}) with the usual map and such helpers on them as well?

Qqwy · July 22, 2016, 9:45pm

Yes, as you can see on the Roadmap (but please do tell if it isn’t clear enough), Either/Result is on there.

OvermindDL1 · July 22, 2016, 9:52pm

Ah, I did see it but I glossed over it as it looked like a function that took two maybe’s and returned the first that was set. Either/Result should not be two things with possible states but rather two possible things. The difference between Either(Maybe a, Maybe b) and Either(a or b) I guess?

Qqwy · July 23, 2016, 12:37pm

@OvermindDL1: You’re right. I dug a bit deeper into the source of Haskell et al, and now I finally understand what Either actually is. It is similar to Maybe, with the difference that the fallback(error, null, etc) value is not static ‘nothing’, but could be anything you like, so you have more information on e.g. at what place something went away from the happy path.

I have released version 0.6 which includes Either. I am not entirely certain about the name Either, though, since I find it somewhat unclear from the name that ‘left’ is thought of as the fallback answer.

I believe that Scala uses ‘Option’ and some other languages use ‘Result’, but these also seem somewhat vague. I am still looking for a better name.

NobbZ · July 23, 2016, 3:15pm

I can’t speak for Scala, but for Rust I can tell that Result<A, B> is similar to Haskells Either a b, while Rusts Option<A> is similar to Haskells Maybe a.

Also even if often done like this, I wouldn’t say that Either is to signal value or error, but signals the possible outcome of a computation. Consider \n -> if n >= 0 then Right n else Left (-n) (Haskell). This example is somewhat constructed but shows that Either is not always about an error.

Thats the reason why there is (in Rust at least) often an additional Error<A>/Error a which does alias to some according Either String a.

Also, even if it is common and idiomatic to use Right for success, this is not necessarily true for every language! Idris does use Either the other way round and uses Left for success (its not common though to use Maybe or Either though).

Qqwy · July 23, 2016, 3:38pm

@NobbZ thank you very much! I am relatively new to algebraic data types myself, and FunLand is created mostly to make it clear to newcomers what these things are. Feedback like this is greatly appreciated.

OvermindDL1 · July 23, 2016, 4:29pm

Yeah I’ve used both Result and Maybe in the same language, where Maybe is basically a 2-element union (actual union types are better, and if the language has actual union types like elm then there is no maybe), and Result is basically the same thing but its two types are usually Ok (right) and Err (left).

NobbZ · July 23, 2016, 4:47pm

You describe ADTs as Containers for some other types, thats not the only truth. ADTs are much more complex.

When introducing ADTs I’d do roughly the following order:

Simple enumerations as in data Bool = False | True or data Fruits = Apple | Peach | Orange.
Record/Struct Like as in data Person = Person String Date.
Replacement for tagged unions as in data NPC = Monster String Int | Merchant String [(Item, Int)].
Introduce type-variables as in data Maybe a = Just a | Nothing and explain that this is an extension to the former ones.

ADT itself are not necessary to define classes (Haskell term) or interfaces (Idris term) since they are very similar to what the OO-World does call an interface ever since.

As you can see, ADT is more or less an abstraction of what the C world does know as 3 separate concepts: enum, struct, and (tagged) union.

NobbZ · July 23, 2016, 4:55pm

Can you please elaborate what you mean by “union types are better”?

Also when I compile my stuff to something baremetal, I’d be glad if Maybe would get compiled to some pointer type, where Nothing is represented as NULL. This is called a “zero-cost abstraction”, a buzzword that has been more or less introduced by the Mozilla Foundation and Rust

OvermindDL1 · July 23, 2016, 5:26pm

Well in Elm parlance (since that is what I’ve been doing a lot lately and its syntax is stuck in my head), a Either can only represent two values, which might be fine, however left/right is… not descriptive either (Elm has Maybe on javascript represented as null or the value). Compared to a Union type:

type MyWellTypedMaybe
  = ImAString String
  | ImAnInt Int

Then just use it as ImAString "test" or ImAnInt 42. That is far more descriptive than something like Left "test" or Right 42. You could implement Either in the union types, or you could implement Result (and in fact Elm does implement Result like this):

type Result err suc
  = Err err
  | Ok suc

And you use it like:

myFunc : (Result String Int) -> blah
myFunc res = do stuff

myFunc Ok 42
-- or
myFunc Err "a string"
-- or assign to variables or whatever
let
  a = Ok 42
in
  myFunc a

Proper Tagged Union Types can build up any of those others with ease. I really wish Elixir had a form of Tagged Union types built in, could be done as something like:

defunion UnionName do
  MyType {s:string(), i:int()}
  AnotherType i:int(), when i>3
  MoreType something:map()
end

So imagine defunion being a macro, takes a union name and a do body, the body defines a set of types that the union could be along with names, typespecs, and an optional when clause of what they store. Internally it would just be represented as a normal erlang tagged tuple, so doing something like UnionName.MyType("string", 42) would just return {UnionName.MyType, {"string", 42}} or so. The constructors verify the types are correct and the constraints if any provided. You could then do something like a special unioncase or so:

s = UnionName.AnotherType(42)
a = unioncase UnionName, s, do
  MyType t -> IO.inspect {:mytype, t}
  AnotherType i ->IO.inspect {:anothertype, i}
  MoreType m ->IO.inspect {:moretype, m}
end

And so a would be {:anothertype, 42} in this case. However the important bit would be that unioncase would require the union, access it at compile-time to find out what makes it up, and make sure that all cases are covered, so something like this:

s = UnionName.AnotherType(42)
a = unioncase UnionName, s, do
  MyType t -> IO.inspect {:mytype, t}
  AnotherType i ->IO.inspect {:anothertype, i}
end

Should cause a compile-time warning/error about not all union cases are covered, missing the ‘MoreType’ case, or something like:

s = UnionName.AnotherType(42)
a = unioncase UnionName, s, do
  MyType t -> IO.inspect {:mytype, t}
  AnotherType i, when i>100 ->IO.inspect {:anothertype, i}
  MoreType m ->IO.inspect {:moretype, m}
end

And this would cause a warning/error about something like not all union cases are covered, AnotherType is not covered from i>=3 and i<=100 or so. Of course a _ -> would cover everything.

/me is a huge proponant of good type systems, the only real thing that Erlang is missing for me, the number of times the lack of a type system bites my butt is far far far more than it would take me time to fix the issues to start with by a type system yelling at me.

Qqwy · July 23, 2016, 5:51pm

@NobbZ You are right. Naming Algebraic Data Types ‘containers’ does not cover all of them, but it seemed like a simple word to use for people who had never heard of them before.

A possibly better alternative, that is still understandable for outsiders, might be Composite Types.

I like your introduction, and it actually hadn’t occured to me before that enumerated types are a kind of ADT as well.

NobbZ · July 23, 2016, 6:36pm

Compared to a Union type:

type MyWellTypedMaybe
  = ImAString String
  | ImAnInt Int

This is a specialised version of Either a b, namely Either String Int. In a language like elm, which does not know about classes or interfaces, it makes absolutely sense to use specialised versions.

But for Haskell or Idris or even Elixir + fun_land where you can define some kind of interface around some type like Foldable, Monad and whatnot, it’s much easier for a programmer to have that generic Either a b and reuse its instances.

Personally, I really like some of the pre 0.17 concepts of elm, but the absence of classes/interfaces was a showstopper for me. And now with 0.17 where they are moving away from FRP, I am not sure how this will continue and I will continue my search for some good JS replacement…

Also the thing you are calling union-type here is called sum-type in Haskell, since it can represent as many values as Int + String could. On the opposite we have product-types (type Person = Person String Int) which can represent as many values as String * Int could.

OvermindDL1 · July 23, 2016, 6:45pm

In Elm you could implement Either like:

type Either left right
  = Left left
  | Right right

Then you could just do Either String Int or whatever. I am actually really big on the lack of classes/interfaces myself, those are often just a poorly made crutch for lack of proper function passing/currying/closures, which Elm has fine.

(EDIT: And I really really like exceptionally really hate a generic Either, I’ve never seen a case where a properly named enum/union type is not better. In the example supplied of Foldable and Monad a Result would be more descriptive as Ok/Error make sense, Left/Right does not. Naming is important.)

Eh not entirely, a union in Elm could also be something like this:

type Msg
    = Mdl Material.Msg
    | InfoConnect Int
    | RoomList_Init Int RoomDefs
    | SelectRoom Int
    | Room_Connect Int
    | Room_Info Int {}
    | RoomMsg_Init Int RoomMessages
    | RoomMsg_Add Int RoomMessages
    | RoomMsg_SyncState Int RoomSyncState
    | RoomMsg_SyncJoin Int RoomSyncEvent
    | RoomMsg_SyncLeave Int RoomSyncEvent
    | ClickUser String
    | MsgInput_Change String
    | MsgInput_Send
    | MsgInput_ToggleMultiline

It is not a sum type, it is a tagged union (in Erlang/Elixir parlance it is a 2-tuple of {:tagname, data}). You can have empty data tags like MsgInput_Send and MsgInput_ToggleMultiline above. You can have types with duplicate data types like ClickUser String and MsgInput_Change String. The identifier after the | is a type name, you can even make things that take that given type instead of the whole union, the things after it up to the next | are the data, if any, that the type holds.

But yeah, classes/interfaces missing are a good thing.

Qqwy · July 23, 2016, 7:05pm

I dont understand of how the genetic Either example you gave for Elm is different from a type constructor in e.g. Haskell. Could you elaborate more?

Qqwy · July 23, 2016, 7:10pm

I completely agree that a bad generalisation is worse than no generalisation, but on the other hand, good generalisation can prevent you a lot of work. The only ‘problem’ I see, is that the more abstract you try to go, the harder it gets to name things well.

OvermindDL1 · July 23, 2016, 7:19pm

It’s not, that’s the point.

Elm types can take arguments so you can build up more complex types where-ever you need. I’ve even found cases where it is useful to extend types, which in Elm you can do like:

-- Convoluted example just to show off syntax
type alias Pos2 baseType valType
  = { baseType
    | x : valType
    , y : valType
    }

type alias Pos3 valType
  = Pos2 { z : valType } valType

type alias Pos3Float
  = Pos3 Float

-- All of these are equal because I used alias above, alias means unnamed type, purely structural
Pos2 {z : Float} Float
Pos3 Float
Pos3Float

-- This structure would satisfy them all
{ x: 3.14
, y: 4.25
, z: 5.36
}

NobbZ · July 23, 2016, 7:20pm

Which is perfectly fine a sum-type.

type Msg
    = Mdl Material.Msg  -- as many values as Material.Msg has
    | InfoConnect Int -- as many values as Int has
    | RoomList_Init Int RoomDefs -- as many values as Int * RoomDefs has
    | SelectRoom Int -- as many values as Int has
    | Room_Connect Int -- as many values as Int has
    | Room_Info Int {} -- as many values as Int * {} has (and {} is HUGE!)
    | RoomMsg_Init Int RoomMessages -- as many values as Int * RoomMessages has
    | RoomMsg_Add Int RoomMessages -- as many values as Int * RoomMessages has
    | RoomMsg_SyncState Int RoomSyncState -- as many values as Int * RoomSyncSte has
    | RoomMsg_SyncJoin Int RoomSyncEvent -- as many values as Int * RoomSyncEvent has
    | RoomMsg_SyncLeave Int RoomSyncEvent -- as many values as Int * RoomSyncEvent has
    | ClickUser String -- as many values as String has
    | MsgInput_Change String -- as many values as String has
    | MsgInput_Send -- exactly 1 value
    | MsgInput_ToggleMultiline  -- exactly 1 value

The type Msg itself has as many values as the sum of all tags have.

If something is not a sumtype, it has to be a product type.

NB: Because we can “sum” and “multiply” the cardinality of types (sets) we named it algebraic data types

OvermindDL1 · July 23, 2016, 7:22pm

Well in Elm the {} is the empty record, it will store nothing.

NobbZ · July 23, 2016, 7:27pm

This example is merely syntax, and I am not really sure if I like it beeing able to inject additional fields into a type like that… I could go back to ruby and monkey patch there…

And {} is similar to haskells () then? OK, I thought it was like “any record”, similar to map/0-type in Elixir.