Storing logic / functions

WuhKuh · June 7, 2017, 3:17pm

Good afternoon,

Currently I’m trying to save and load ‘objects’ with custom functions. For example, imagine the periodic table with different bonding mechanisms for every element. During runtime, I want to be able to add, edit and remove these elements and their bonding mechanisms (which are represented as functions).

It is important to store the current configuration on exit/crash, so something that persists these elements with their functions is required. At first sight it seems I have to pass the elements and their respective bonding mechanisms as arguments to a function that handles interacting elements.

The problem: I don’t know how to (let’s say best practice to) store and load these elements. Can anyone help me in the right direction?

idi527 · June 7, 2017, 4:28pm

Where do you keep the state currently?

If in processes (like genservers), are these processes supervised?

If so, you can create a public ETS table linked to the supervisor process

:ets.new(:state_keeper, [:public, :named_table])

in supervisor’s init/1 and store the state of crashed processes there (in their terminate/2 callbacks if they are genservers).

You can also persist the contents of ETS to disk from time to time using something like https://github.com/michalmuskala/persistent_ets.

WuhKuh · June 7, 2017, 4:48pm

Yes, I’m keeping a GenServer that stores the state under the first Supervisor in the tree. I’m more worried that there could be an event that shuts the computer down which hosts the application, for example power outage. That’s the reason I’d like to store the state to disk.

I’m planning to store the state to disk whenever something gets changed, which shouldn’t be too often. I think DETS would be preferable over persistent_ets (after taking a look at the repo), as it keeps the persisted copy up to date on every change rather than only at intervals.
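A minimal sketch of the "write on every change" idea with DETS, assuming plain data is stored per element. The module name ElementStore, the table name and the file path are made up for illustration; only the :dets calls themselves come from OTP.

```elixir
defmodule ElementStore do
  # Hypothetical wrapper; the table name :elements is an assumption.
  @table :elements

  def open(path) do
    # :dets takes the file name as a charlist
    {:ok, @table} = :dets.open_file(@table, file: String.to_charlist(path), type: :set)
    :ok
  end

  def put(symbol, data) do
    :ok = :dets.insert(@table, {symbol, data})
    # Ask DETS to flush the change to disk right away
    :dets.sync(@table)
  end

  def get(symbol) do
    case :dets.lookup(@table, symbol) do
      [{^symbol, data}] -> {:ok, data}
      [] -> :error
    end
  end

  def delete(symbol) do
    :ok = :dets.delete(@table, symbol)
    :dets.sync(@table)
  end

  def close, do: :dets.close(@table)
end
```

Calling put/2 or delete/2 from the state GenServer on each change would keep the file current without relying on terminate/2.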

idi527 · June 7, 2017, 5:11pm

So you will store every change in state to disk? Then you can dismiss my previous message, since the terminate/2 callback won’t be called in the case of a power outage.

WuhKuh · June 7, 2017, 5:31pm

It did guide me to the solution I was looking for: On change, I wanted to store functions with the element to disk, so don’t worry.
This erlang question confirms that DETS/ETS/Mnesia should be able to store functions.

I asked this question because I had too much focus on storing this into a file format I was used to, like JSON due to my background in Javascript. This got me into the XY problem which only led me further away from the solution.

peerreynders · June 7, 2017, 6:19pm

I would reconsider this approach. Functions can reference code (modules) that may have existed when the function was stored but that may not exist when it is later retrieved. To me it makes more sense to store the data necessary to “create the function” (which is typically information that is stored within its closure) when it is later needed - data is usually more easily migrated.
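To make "store the data, create the function later" concrete, here is a rough sketch. The Bonding module, the spec keys and the returned tuples are invented for illustration and say nothing about real chemistry.

```elixir
defmodule Bonding do
  # Hypothetical example: spec shapes and results are assumptions.

  # Turn stored, plain data into a function only when it is needed.
  def build(%{kind: :share, electrons: n}) do
    fn a, b -> {:covalent, a, b, n} end
  end

  def build(%{kind: :transfer, electrons: n}) do
    fn a, b -> {:ionic, a, b, n} end
  end
end

# This map is what would be written to disk, not the function itself.
spec = %{kind: :share, electrons: 2}

bond = Bonding.build(spec)
result = bond.(:h, :o)
```

Since only the map is persisted, Bonding can be renamed or rewritten later and old records still load; at worst the stored maps need a migration.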

terminate is also not invoked when a process which is not trapping exits receives an exit signal other than :normal. Processes that do trap exits a) have to actually decide to terminate upon receiving an :EXIT message and b) don’t get to call terminate when they are the target of a :kill exit signal.

Furthermore from a design standpoint the process state could already be compromised by the time terminate runs. So the process should send off its state for “backup” any time it updates it AND it is certain that the state is in a sane and consistent state. Programming Elixir 1.3 demonstrates this approach in Chapter 18 Supervisors where the Stash worker is actually higher up in the supervision tree than the Sequence worker.
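A stripped-down sketch of that stash idea, not the book's code. The names Stash and Elements are assumptions, and the supervision tree is left out; in a real application Stash would be started before, and supervised above, the worker.

```elixir
defmodule Stash do
  # Hypothetical backup process that only holds the last known good state.
  use Agent

  def start_link(initial), do: Agent.start_link(fn -> initial end, name: __MODULE__)
  def get, do: Agent.get(__MODULE__, & &1)
  def put(state), do: Agent.update(__MODULE__, fn _ -> state end)
end

defmodule Elements do
  # Hypothetical worker that owns the element map.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def put(symbol, spec), do: GenServer.call(__MODULE__, {:put, symbol, spec})
  def all, do: GenServer.call(__MODULE__, :all)

  @impl true
  def init(_), do: {:ok, Stash.get()}   # recover whatever was stashed last

  @impl true
  def handle_call({:put, symbol, spec}, _from, state) do
    new_state = Map.put(state, symbol, spec)
    # Back up right after a successful update, while the state is known to be sane
    Stash.put(new_state)
    {:reply, :ok, new_state}
  end

  def handle_call(:all, _from, state), do: {:reply, state, state}
end
```

If Elements dies, even from a :kill, a restarted instance picks the state up again in init/1, so nothing depends on terminate/2 running.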

WuhKuh · June 7, 2017, 6:57pm

This was my initial approach, but there are so many variables to keep track of that it didn’t seem easy to implement. Moreover, if I want to add an element with a nonexisting property, I would have to make code changes. This is why I consider storing a function.

Eventually, I want to be able to write these functions from within a browser and send them to the machine. I’m currently thinking of doing this using the following approach:

{function, _} = Code.eval_string("&#{input}")

This will be stored together with the element in DETS.
Whenever bond (a generic function) is called, it will look up the elements involved and call their corresponding functions with the necessary arguments.

I understand that this could be a big security hole, so the necessary security measures will be put in place.

Also, I don’t exactly see how the code is going to reference missing modules.
I’m still a rookie, so feel free to prove me wrong.

peerreynders · June 8, 2017, 5:40pm

Take a closer look at the reference you posted yourself - in particular this. Because anonymous functions are attached to the module that created them, changing the name of the creating module after the anonymous function has already been stored will make the function crash upon execution later, once:

the old module has been purged from memory
and the corresponding old beam file has been deleted

The same thing will happen if you rename or remove modules with functions that the anonymous function explicitly references. So storing anonymous functions outside of code will be a major impediment to refactoring.

Proper security isn’t about convenience but about constraining everything to just the most minimal variances necessary within which the job can still be accomplished (of course that requires you to be very specific about what you want to accomplish). The approach you are proposing is tantamount to leaving the floodgates open to flood the city and relying on pumps to drain the flood water (or at least building some ridiculously expensive “flood-locks”).

What do you mean by “property”? Typically “properties” are simply represented as key-value pairs and Elixir has Map as its de facto key-value store - both keys and data can be added, updated and deleted at run-time. It is struct which requires code changes.
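For instance, with a plain map a new "property" needs no code change at all. The keys and values below are only illustrative.

```elixir
# Hypothetical element record kept as a plain map
hydrogen = %{symbol: :h, valence: 1}

# Add a key that did not exist before
hydrogen = Map.put(hydrogen, :electronegativity, 2.2)

# Update an existing key
hydrogen = Map.update!(hydrogen, :valence, fn v -> v + 1 end)

# Remove a key again
hydrogen = Map.delete(hydrogen, :electronegativity)
```

A struct would reject the unknown key until its defstruct was changed and the module recompiled.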

I suspect that your solution approach is still constrained by a paradigm that is different from the one that Elixir operates in (José Valim: Beyond Functional Programming with Elixir and Erlang).

WuhKuh · June 9, 2017, 6:21pm

This actually is the problem I’m facing currently. Making the switch from Javascript is harder than it seems. I’ve tried two courses on Udemy which are quite scarce in resources (no OTP coverage). Elixir Sips seems to do a better job, which I’m following at the moment.

I mean that a bonding mechanism can have fundamental differences. This is why I’d like to be able to write custom functions for each element. Your initially proposed solution was to assign variables which differ with each element.
I’d use this solution if it wouldn’t create complex code, which is poorly maintainable if you have to account for a huge amount of variables. Not to mention variables which should change over time.

I agree with your points made.

However, I’d still like to have a functionality that gives me the flexibility to add new functions to the server running the code (assuming this program won’t be run locally). This is why I thought of a management system exposed to the local network equipped with the necessary authorization layer, just like I can access the command prompt of my firewall whenever I’m logged in as an admin.

I’ve read that remark from the source, but I don’t understand how it would affect the code. I can’t see the point of failure yet… Could you explain it for me when I’m using the following process?

I will have a module (a GenServer) that’ll be responsible for maintaining the state and updating the database.
When using the approach from earlier, a call from the webserver (a GenServer) gets passed to the state GenServer, saying it wants to add an element with a function.
The GenServer will add this element and function to its current state and to the database.
A call from the webserver: remove this element!
The GenServer will remove this element and function from both the state and the database.

The creating module will be either the webserver handling the call or the GenServer storing the call, depending on who uses eval_string right? This can only go wrong whenever one of those two crashes, if I understand correctly.

It would seem logical to me to use the state GenServer to do this, as it’ll load the functions from DETS/Mnesia as well.

peerreynders · June 10, 2017, 8:38am

The creating module in the case of Code.eval_string/3 will always be :erl_eval. So while there is little danger of it going away, there could be some other interesting challenges lurking, in particular:

During evaluation of a function, no calls can be made to local functions. An undefined function error would be generated. However, the optional argument LocalFunctionHandler can be used to define a function that is called when there is a call to a local function.

and

The optional argument NonLocalFunctionHandler can be used to define a function that is called in the following cases:

A functional object (fun) is called.

A built-in function is called.
A function is called using the M:F syntax, where M and F are atoms or expressions.
An operator Op/A is called (this is handled as a call to function erlang:Op/A).

Now Code.eval_string/3 seems to handle this via the :functions option, but essentially it seems to boil down to the fact that any and all functions referenced within the string may have to be made available through the :functions option (or at least the modules need to be listed with the :requires option).
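A quick way to check the creating module yourself, as a sketch. The input string stands in for whatever the browser would send; what fun_info reports may vary between Elixir/OTP versions, so no output is shown for it.

```elixir
# Hypothetical text received from the browser
input = "(&1 * 2 + &2)"

# Same idea as the Code.eval_string("&#{input}") line earlier in the thread
{function, _bindings} = Code.eval_string("&" <> input)

sum = function.(1, 2)

# Ask the runtime which module the evaluated fun is attached to
creating_module = :erlang.fun_info(function, :module)
IO.inspect({sum, creating_module})
```

A body that calls one of your own local functions is where the LocalFunctionHandler remark quoted above starts to matter.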

So the question becomes if you have already done a mockup of a scenario involving a reasonably complex “storable” function and got it to work - because it may not be as “easy” to generate these functions as you imagine - unless these functions don’t reference any other functions of note.

The issue is that you are planning to run potentially untrusted code - the necessary quarantine measures will likely erode most of the initially perceived “flexibility”. “Specification data” can be designed to be “sanitize-able” and “validate-able”; assessing the threat posed by unsigned/untrusted code is in an entirely different league.

Like isn’t need. And your initial complaint was that there were “so many variables to keep track of” - and from Elixir v1.2 maps are capable of holding millions of keys efficiently. And while it is beneficial to delay some design decisions as long as possible, it is also necessary and beneficial to choose the optimal constraints - without constraints there is only uncertainty because there is nothing you can rule out, safely ignore or disregard.

As I mentioned here OTP requires that you first learn about sequential Elixir and then concurrent Elixir - so it’s not surprising that entry level Elixir courses don’t even touch on OTP. Also video courses are only of limited value unless they contain a significant non-video exercise and “more substantial project” component.

For example Functional and Concurrent Programming in Erlang cites 5 hours per week for 3 weeks (each) which is enough to watch the lectures but to complete the exercises and projects it essentially turns into a part if not full-time job. And the first three weeks only cover pattern-matching, body and tail recursion, lists, and higher order functions - but completion of the exercises and project work helps tremendously with the formation of the “functional mindset”. And by the end of the second 3 weeks there is finally a single gen_server exercise (a third OTP oriented follow-up course is said to be in development).

OvermindDL1 · June 12, 2017, 6:31pm

Let me demonstrate why storing anon functions into persistent storage, say via DETS, is bad (you can store in ETS fine, just don’t persist it out and don’t hot-reload code):

iex(1)> defmodule Testering do
...(1)>   def get, do: fn v -> v end
...(1)> end
{:module, Testering,
 <<70, 79, 82, 49, 0, 0, 4, 220, 66, 69, 65, 77, 69, 120, 68, 99, 0, 0, 0, 126,
   131, 104, 2, 100, 0, 14, 101, 108, 105, 120, 105, 114, 95, 100, 111, 99, 115,
   95, 118, 49, 108, 0, 0, 0, 4, 104, 2, ...>>, {:get, 0}}
iex(2)> id = Testering.get()
#Function<0.108102960/1 in Testering.get/0>
iex(3)> b = :erlang.term_to_binary(id)
<<131, 112, 0, 0, 0, 81, 1, 206, 48, 166, 31, 173, 185, 46, 112, 247, 209, 191,
  102, 180, 238, 231, 231, 0, 0, 0, 0, 0, 0, 0, 0, 100, 0, 16, 69, 108, 105,
  120, 105, 114, 46, 84, 101, 115, 116, 101, 114, 105, 110, 103, ...>>
iex(4)> inspect(b, limit: :infinity)
"<<131, 112, 0, 0, 0, 81, 1, 206, 48, 166, 31, 173, 185, 46, 112, 247, 209, 191, 102, 180, 238, 231, 231, 0, 0, 0, 0, 0, 0, 0, 0, 100, 0, 16, 69, 108, 105, 120, 105, 114, 46, 84, 101, 115, 116, 101, 114, 105, 110, 103, 97, 0, 98, 6, 113, 133, 48, 103, 100, 0, 13, 110, 111, 110, 111, 100, 101, 64, 110, 111, 104, 111, 115, 116, 0, 0, 0, 80, 0, 0, 0, 0, 0>>"

So on expressions 1 and 2 I define a function in a module and return a new anonymous function from it. It is given the randomly auto-generated name of #Function<0.108102960/1 in Testering.get/0>, and for all intents and purposes you can consider that as its name (internally it is more of a unique reference that gets even more complex if it is a closure).

On expression 3 I convert that anonymous function to a binary (and on expression 4 I print it out in full), which is precisely what ETS and DETS and mnesia and others do, so this is what is actually stored.

Now, the 131 at the start is basically the ‘version’ of the term format, so we can ignore it.

Next the 112 means that it is a NEW_FUN_EXT, which is defined as (we really really need a table syntax plugin for this Discourse install…):

1    4     1      16    4      4        N1      N2        N3       N4   N5
112  Size  Arity  Uniq  Index  NumFree  Module  OldIndex  OldUniq  Pid  Free Vars

So if we decompose that with Erlang’s bit syntax:

iex(12)> <<131, 112, size::size(32), arity::size(8), uniq::size(128), index::size(32), numFree::size(32), rest::bitstring>> = b
<<131, 112, 0, 0, 0, 81, 1, 206, 48, 166, 31, 173, 185, 46, 112, 247, 209, 191,
  102, 180, 238, 231, 231, 0, 0, 0, 0, 0, 0, 0, 0, 100, 0, 16, 69, 108, 105,
  120, 105, 114, 46, 84, 101, 115, 116, 101, 114, 105, 110, 103, ...>>
iex(13)> {size, arity, uniq, index, numFree}
{81, 1, 274073566770734362952310774876330715111, 0, 0}

So the size is the size in bytes of this function structure (the part after the <<131,112>>), and it matches: the binary is 83 bytes long and 83 - 2 = 81. The arity is the number of arguments the function takes, 1 in this case, which is correct. The uniq is the MD5 hash of the start of the significant part of the BEAM compiled file of the module where this anonymous function is defined, to ensure corruption does not happen, and if this does not match the module that is loaded in memory then calling the anonymous function fails. Thus you can store and run an anonymous function only as long as the module that it was defined in does not change: no hot code swap, not even a recompile, nothing at all. Continuing though, the index is the index into the function table of the module (so if you recompile, an anonymous function can change position). And lastly the numFree is the number of free variables it is closing around (this is a plain function, not a closure, so it is 0 here).

The internal system stores it similarly as well, thus calling an anonymous function that was defined in a module that has been recompiled or hot loaded or anything at all will fail.
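The decoded fields can be compared with what the runtime itself reports through :erlang.fun_info/2. This is a sketch under the assumption that the fun is still encoded as NEW_FUN_EXT (tag 112) on your OTP version; FunProbe is a made-up module name.

```elixir
defmodule FunProbe do
  # Hypothetical module, same shape as Testering above
  def get, do: fn v -> v end
end

fun = FunProbe.get()
binary = :erlang.term_to_binary(fun)

# Header of NEW_FUN_EXT: version, tag, Size, Arity, Uniq, Index, NumFree
<<131, 112, size::32, arity::8, uniq::binary-size(16), index::32, num_free::32,
  _rest::binary>> = binary

{:env, free_vars} = :erlang.fun_info(fun, :env)

checks = [
  size: size == byte_size(binary) - 2,
  arity: {:arity, arity} == :erlang.fun_info(fun, :arity),
  uniq: {:new_uniq, uniq} == :erlang.fun_info(fun, :new_uniq),
  index: {:new_index, index} == :erlang.fun_info(fun, :new_index),
  num_free: num_free == length(free_vars)
]

IO.inspect(checks)
```

Redefining FunProbe with a different body and running the same lines again should give a different uniq, which is the value an old stored binary would no longer match.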

A work-around (still not recommended) is to store the bytecode (if it is not a closure) or the source code and execute it directly, as detailed earlier in this thread, though that is not efficient by any stretch.