Why do we still need immutability when memory is not shared?

Aetherus · July 29, 2020, 6:15am

If the purpose of immutability is for sharing things among multiple processes without locking, why do we need immutability at all since Erlang/Elixir processes don’t share memory?

hubertlepicki · July 29, 2020, 6:39am

Yes, it’s not just concurrent code that can have unexpected side-effects with mutable variables.

Consider closures / anonymous functions and their scope. I wrote a blog post about it early on, when I was getting into Elixir: https://www.amberbit.com/blog/2015/6/14/closures-elixir-vs-ruby-vs-javascript/

This is particularly an issue when you define an anonymous function that references some value from outside of its definition, and that value changes unexpectedly later: after the function has been declared but before it is called, or between executions of the function.
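A minimal Python sketch of that surprise (all names here are made up for illustration). The function looks up `prefix` when it is called, not when it is defined, so a later rebinding changes its result. In Elixir the closure would keep the value it saw at definition time.

```python
prefix = "Hello"

def greet(name):
    # `prefix` is looked up at call time, so later rebinding is visible here.
    return f"{prefix}, {name}!"

before = greet("Ada")   # uses "Hello"
prefix = "Goodbye"      # rebinding after the function was defined
after = greet("Ada")    # now uses "Goodbye"

def make_pinned_greeter(pinned_prefix):
    # The value is passed in, so each greeter keeps its own copy of the binding.
    def greet_pinned(name):
        return f"{pinned_prefix}, {name}!"
    return greet_pinned

prefix = "Hello"
greet_pinned = make_pinned_greeter(prefix)
prefix = "Goodbye"
pinned = greet_pinned("Ada")  # still uses "Hello"
```

Passing the value in explicitly is one way to get the "captured at definition" behaviour in a language that does not give it to you by default.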

jwarlander · July 29, 2020, 7:03am

Immutability is also at the core of many optimizations that allow Elixir / Erlang to handle e.g. lists efficiently… The following blog post discusses some of that:

Jacek · July 29, 2020, 7:10am

There are even simpler cases than the ones pointed out by Hubert where you can’t be sure about values of variables you defined. Let’s have a look at a simple JS piece of code:

let a = [1, 2, 3];
foo(a);

Can you be sure about the content of a after foo was called? Of course, you can argue that you’d never write a function which modifies its arguments but can you be sure that a library you use doesn’t do anything like that?
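The same situation, sketched in Python rather than JS. Here `foo` is a hypothetical stand-in for a library function that sorts its argument in place as a side effect; `foo_pure` is an assumed alternative that builds a new value instead.

```python
def foo(items):
    # Hypothetical library function: returns the largest element,
    # but sorts the caller's list in place along the way.
    items.sort(reverse=True)
    return items[0]

def foo_pure(items):
    # Same result, but works on a new sorted list and leaves the input alone.
    return sorted(items, reverse=True)[0]

a = [1, 2, 3]
largest = foo(a)       # the caller's list has been reordered behind its back

b = (1, 2, 3)          # a tuple cannot be modified in place
largest_b = foo_pure(b)

try:
    foo(b)             # tuples have no .sort(), so the mutation attempt fails
    mutation_rejected = False
except AttributeError:
    mutation_rejected = True
```

With an immutable value the question "what does `b` contain after the call?" has only one possible answer.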

mitizhi · July 29, 2020, 7:19am

By immutability I assume you mean that Erlang/Elixir do not allow values stored in memory to change, that there are no global or module-scoped variables, and that every variable is either defined within function scope or comes into the function as an argument. Erlang goes even further by disallowing reassignment to a variable (inside functions), though this is not a deep difference in principle.

Without mutable memory, you can write and understand code naturally in isolation. Things are more explicit as a result. You do not need to track state changes; everything is just passed from one function to another, each function doing its own little thing. As a result, code is easier to reason about, which leads to easier maintainability. The forced discipline also makes it easier to write correct code.

mindok · July 29, 2020, 7:23am

Python’s mutable default arguments take this to a whole different level of surprise…

From https://docs.python-guide.org/writing/gotchas/

def append_to(element, to=[]):
    to.append(element)
    return to

my_list = append_to(12)
print(my_list) #Result: [12]

my_other_list = append_to(42)
print(my_other_list) #Result: [12, 42]
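For completeness: the default list is created once, when the function is defined, and then reused on every call. The usual workaround is a `None` sentinel, roughly like this:

```python
def append_to(element, to=None):
    # Create a fresh list on every call that does not pass `to`,
    # instead of reusing one list created at definition time.
    if to is None:
        to = []
    to.append(element)
    return to

my_list = append_to(12)
print(my_list)        # [12]

my_other_list = append_to(42)
print(my_other_list)  # [42]
```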

Aetherus · July 29, 2020, 9:35am

I didn’t know Python is that warty.
Immutability certainly eliminates that kind of problem.
I just wonder whether it would be better if Erlang/Elixir had a way to define mutable data structures, so that certain algorithms (like sorting) could be implemented in pure Erlang/Elixir without NIFs, and to remove the mutability after that.

hauleth · July 29, 2020, 11:56am

You can “hack around it” with the process dictionary in many cases, but in general the answer is: no, not really. Especially as a lot of algorithms (to be exact, all of them, by the Church-Turing thesis) can be written in an immutable manner. Not all will be performant enough, but in such cases you would use a NIF anyway (for example for linear algebra). Sorting can be done efficiently with immutable structures as well (merge sort), and most other cases can be improved either by clever TCO or by memoization.
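To show the shape of a merge sort that never modifies a value in place, here is a small Python sketch on tuples. It is only an illustration: Python tuples lack the cheap head/tail split of Erlang lists and Python has no TCO, so this version copies a lot and is only suitable for small inputs.

```python
def merge(left, right):
    """Merge two sorted tuples into a new sorted tuple."""
    if not left:
        return right
    if not right:
        return left
    if left[0] <= right[0]:
        # Taking from `left` on ties keeps the sort stable.
        return (left[0],) + merge(left[1:], right)
    return (right[0],) + merge(left, right[1:])

def merge_sort(items):
    """Return a new sorted tuple; `items` itself is never changed."""
    if len(items) <= 1:
        return items
    middle = len(items) // 2
    return merge(merge_sort(items[:middle]), merge_sort(items[middle:]))

data = (5, 2, 9, 1, 5, 6)
result = merge_sort(data)
```

Every step builds a new tuple from existing ones, so `data` is still intact after sorting.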

Qqwy · July 29, 2020, 1:14pm

Immutability is a sane default to make sure code does not do unexpected things when interacting with other code. Not sharing data structures between processes is another such sane default.
(I actually think immutability might have been part of Erlang back during its initial proof-of-concept version when it was implemented as a ‘Prolog library’. Prolog shares a similar approach to immutability, since it is a declarative language.)

However, there definitely are possibilities to ‘opt out’ for both of these. For the obvious reasons that have been mentioned by other people earlier in this topic, these are only there as a ‘last resort’ when e.g. you find out that you really do have a performance problem by benchmarking your code in practice.

Use ETS to circumvent the lack of memory sharing. More specifically, ETS tables (in certain configurations) allow multiple processes (not only the creator process) to read from (and possibly even write to) the table. In certain cases this is useful if you have a process that turns out to have become a bottleneck. This is probably the most common one of the four techniques listed here.
Use persistent_term when you have data that needs to be read very frequently but written infrequently. This fully allows all processes to read from the same memory.
You can read/write to the process dictionary at any time, as @hauleth already mentioned, somewhat going against the ‘immutability’ idea. (Alternatively, one might consider all normal Elixir/Erlang code to occur within a ‘state’ monad. In other words: immutability is not fully broken.) This is mostly useful for e.g. tracing and other ‘temporary’ assignments that should not interrupt the normal flow of the code.
Write NIFs that use ‘resource objects’. Native Implemented Functions are able to break many of the rules Elixir/Erlang themselves provide. Resources are a datatype that the BEAM supports which essentially is ‘just’ a pointer that you can pass back to a NIF at a later time. This allows you to use mutable data structures inside Erlang/Elixir without any problems… except that they will behave exactly as surprisingly (cf. the earlier post by @mindok about Python) as one would expect.

lud · July 29, 2020, 1:45pm

You mean that a NIF can change the value pointed to by a pointer that is accessible from Erlang code?

Qqwy · July 29, 2020, 1:54pm

Correct. A ‘resource’ is a tiny wrapper around a pointer to a chunk of raw memory in RAM. A NIF can read from and write to that memory (as well as reallocate it, i.e. grow or shrink it) as much as it pleases. See the section of the Erlang guide about Resource objects for more information.

rvirding · July 29, 2020, 2:20pm

Note that in the definition of Erlang all data structures are immutable, but it does not mention anything about sharing data between processes or not. All it says is that processes are isolated. This, together with immutability, means that one process can never affect the data in another process. So the not sharing we have now is actually an implementation detail, and there have been a number of experimental implementations which have had a single heap containing all data. From the Erlang POV you don’t notice the difference.

And immutability is very nice and makes understanding what is going on much easier. Things just don’t happen to change while you are working on them. It is also the core of most functional languages.

semmitmondo · August 4, 2020, 8:01am

I asked myself the same question some time ago, and one answer that has not been mentioned yet is the particular GC that the BEAM uses. The GC can be more efficient if older data structures simply cannot reference newer ones, and immutability guarantees that. (It also makes reasoning about your code (inside a process, without concurrency) simpler. And this is how FP is traditionally done. Mutable data structures are evil, and every time you think you need one, it’s always possible to work around it with immutable ones, even inside the same process. If you are allowed to use another process for the workaround, it becomes even simpler.)
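A toy Python model of the "older data cannot reference newer data" property. This is not how the BEAM collector is implemented; `Cons`, `cons`, `born` and the serial counter are all hypothetical names used only to make allocation order visible. The point is that an immutable cell can only refer to values that already existed when it was built.

```python
from dataclasses import dataclass, FrozenInstanceError
from itertools import count
from typing import Optional

_serial = count()  # hypothetical allocation clock: smaller number = older cell

@dataclass(frozen=True)
class Cons:
    head: int
    tail: Optional["Cons"]
    born: int

def cons(head, tail):
    # The tail has to exist before the new cell can be created.
    return Cons(head, tail, next(_serial))

def references_point_backwards(cell):
    # Walk the list and check that every cell only refers to an older cell.
    while cell is not None and cell.tail is not None:
        if cell.tail.born >= cell.born:
            return False
        cell = cell.tail
    return True

old = cons(3, None)
mid = cons(2, old)
new = cons(1, mid)

try:
    old.tail = new  # an old-to-new reference would need mutation
    rejected = False
except FrozenInstanceError:
    rejected = True
```

Because references only ever point from newer to older, a collector that looks at the young data does not have to search the old data for pointers into it, which is the efficiency gain described above.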

hauleth · August 4, 2020, 9:22am

I would not say that. Mutable shareable data structures are evil, that is true though.