Problem with dependencies

joeerl · August 24, 2018, 11:03am

What’s wrong with dependencies?

I have a problem with dependencies.

Here’s what happened:

I decided I wanted to do some Elixir programming – I’ve neglected this for a while but wanted to do some experiments. There’s a particular problem I want to solve.
I download the latest and greatest Elixir - I want to be up-to-date.
I Google a bit and find that “my problem” has been attacked before – indeed I find a GitHub project that has tackled this.
I download the project.
I do mix deps.get etc.
The program does not work!

I’m pretty sure that I’m not alone in this – on the project in question somebody asked “are you still maintaining this?” (the last update was a year or so ago) - the answer was “yes, possibly, if anybody is interested”.

At this point I assume that the project did actually work correctly when last committed to GitHub.

So something is wrong but what?

Looking at the dependencies they say things like:

{:package, "~> 2.0", only: :dev},

Which means that the ‘package’ version should be >= 2.0.0 and < 3.0.0.
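That reading of the requirement can be checked with Elixir's built-in Version module. This is only a minimal sketch; the version numbers below are illustrative and not taken from the project in question.

```elixir
# "~> 2.0" accepts any 2.x.y release, but rejects 3.0.0 and anything below 2.0.0.
requirement = "~> 2.0"

for version <- ["1.9.9", "2.0.0", "2.0.2", "2.7.1", "3.0.0"] do
  IO.puts("#{version}: #{Version.match?(version, requirement)}")
end
```

Only the three 2.x.y versions print true, so a project tested against 2.0.2 will also accept a much later 2.7.1 without complaint.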

Now the author has presumably tested the system with a specific version (say 2.0.2) and just assumes that their program will work with any later 2.X.Y release.

But obviously something somewhere is wrong since the program manifestly does not work - I’d really like to know exactly which versions the program has been tested with - and I’d like to be able to download these versions and exactly reproduce the behavior of the program. As far as I can see this is not possible.

What I’d like is the possibility to exactly describe the state of the system “as tested and known to work” - and to be able to reproduce this.

Describing a version by a number like 2.0.2 is in my mind crazy - I’d prefer the system to be described by a list of the SHA1 checksums of all the modules needed to reproduce the behavior of the program.
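To make the checksum idea concrete, here is a rough sketch of fingerprinting compiled modules. ModuleFingerprint is a hypothetical helper, not part of Mix or Hex; it relies only on :code.get_object_code/1 and :crypto.hash/2 from Erlang/OTP, and assumes the crypto application is available.

```elixir
# Hypothetical helper: map a module to the SHA-1 of its compiled object code.
defmodule ModuleFingerprint do
  def sha1(module) do
    case :code.get_object_code(module) do
      {^module, beam, _filename} ->
        {:ok, Base.encode16(:crypto.hash(:sha, beam), case: :lower)}

      :error ->
        # No .beam file found on the code path (e.g. a module defined at runtime).
        {:error, :no_object_code}
    end
  end
end

for module <- [Enum, :lists] do
  IO.inspect({module, ModuleFingerprint.sha1(module)})
end
```

Note that this hashes the compiled artifact, so the checksum can change with the compiler version as well as with the source; a real scheme would have to decide which of the two it wants to pin.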

How can we then make programs that use inconsistent sets of module versions?

There are two ways:

a) - a bit of fancy name munging (replacing module names by SHA1 checksums)

b) - running code in different nodes and using pure messaging to separate them

b) seems to be simpler - and has the advantage of being truly OO and really separating concerns. a) is trickier and needs a lot of thought to get right (I have some ideas here).

While on the subject of dependencies I guess I should say that I’m not an enthusiastic fan of these.

On the + side, we can move quickly and achieve lots by using other people’s code.

On the - side any errors in the dependencies will creep into our code.

Personally I cut-and-paste the code I need from the dependencies into my code and stare hard at it (which is fine for trivial code, but does not work for complex things)

(aside)

When Robert and I wrote the original system we were paranoid about not having dependencies in the code that booted the system.

reverse/2 and foldl/3 are defined in lists.erl but we cut and pasted the code for these into ring0.erl in order to reduce the dependencies.

The system has been refactored many times since this - but you can still see traces in the erts/preloaded/src code - for example in prim_zip.erl you’ll find the code for foldl (renamed to lists_foldl).

Why did we do this? - because we cannot be sure that autoloading works early in the life of the system - also lists.erl has many functions that we do not need - and might be buggy in a later version.
(/aside)

One more passing thought.

I’ve written a garbage collector for Erlang dependencies - basically it takes a root set of function calls and traces everything that can be called, renaming appropriately and putting everything into a single module.

I’ve often been quoted as saying that the problem with OO inheritance is that when you wanted a banana you got not only the banana but the gorilla that was holding the banana and the entire ****ing jungle.

I think the same is true of dependencies - you might add a dependency because you want to reuse a single function in the dependency. The problem here is that you get all the other functions in the module, and (recursively) any dependencies that these modules might need, even though the code you want to call will NEVER call this code.

I have on occasion downloaded the odd Node program. The other day one of these stopped working, so I checked what dependencies had been downloaded as a side effect of installing the program I did want to use. It was horrific: hundreds of programs had been downloaded as sub-dependencies and I haven’t a clue what they all do.

Statically garbage collecting the code that actually gets called would be a great way of reducing the dependencies.

Actually garbage collecting code is no more than automating what I have always done manually.

The way I work is that when I find some code in some module that does what I want I cut-and-paste into my module and edit out all the bits I don’t need. I’m basically a manual garbage collector.

This is why I’m a slow programmer - move slowly, understand what you do and don’t make any errors - the consequences of this are “slow progress” BUT most of the code I wrote 25 years ago still works without any changes - oh and I avoid NIFs and write in pure simple Erlang.

NIFs are for performance - but if you want fast code wait a few years - Erlang performance has improved by c. 10^6 over 20 years - and it’s not due to smarter code (hardware improvements way outstrip software).

This is of course almost the opposite of “shipping buggy code and getting early to market” - shipping correct beautiful code and being early to market seems impossible - many iterative cycles are needed to shake out a correct design.

Back on topic

I think it would be possible to fix up the system as follows:

We could snapshot the state of the system, by which I mean find all the modules that are loaded. This is easy: run your program once, then call :code.all_loaded and gather all the modules that are needed. Then garbage collect them from some root set, i.e. just figure out what actually gets called and not the code that is just there “by accident” and never gets called.
Write stub code to interface the required function calls in the root set through a message passing API.
Dump the snapshot in a single file.
At run time recreate the environment of the dumped program and set it running.
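As a rough illustration of the first step only, the loaded modules can be recorded like this. The sketch does not attempt the garbage collection, the message-passing stubs, or the restore step, and the "dump" here holds just module names and locations, not the code itself.

```elixir
# :code.all_loaded/0 returns {module, location} pairs, where location is the
# path of the .beam file or an atom such as :preloaded.
snapshot = Enum.sort(:code.all_loaded())

# Serialize the list so it could be stored alongside the project.
dump = :erlang.term_to_binary(snapshot)

IO.puts("#{length(snapshot)} modules loaded, #{byte_size(dump)} bytes of snapshot")
```

Because modules are loaded on demand, the snapshot is only meaningful after the program has exercised the code paths it needs, which is why the program has to be run once first.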

This is not easy BTW - but would be very beneficial - it would be really nice if the programs we write today ran in a few hundred years’ time without change.

So in a thousand years time - somebody can say

> mix deps.get
> mix compile
> mix run

and the program will work just like it did 1000 years ago.

stefanchrobot · August 24, 2018, 11:36am

That would be a reasonable assumption if :package followed semantic versioning which it clearly does not. I’d consider this a bug in the :package. On the other hand, I’d personally pin the specific version in the dependency.

Lucky you! For some, “move fast and break things” might be the only way to have a chance at working with BEAM.

NobbZ · August 24, 2018, 3:40pm

I only skimmed your proposal, but it seems as if you want to re-implement nix on BEAM?

AstonJ · August 24, 2018, 4:15pm

This sounds a bit like an extended version of the mix.lock file?

Where basically a snapshot of all dependencies in the last working state is listed - meaning you should be able to reproduce that working state at any time in the future.

From the Elixir site:

You will notice that when you add a dependency to your project, Mix generates a mix.lock file that guarantees repeatable builds. The lock file must be checked in to your version control system, to guarantee that everyone who uses the project will use the same dependency versions as you.

Maybe something like an elixir.lock file could go that bit further?

(Apologies if I have misunderstood you)

dimitarvp · August 25, 2018, 1:38am

Looking at Elixir’s changelogs and inferring the general direction from them and by the core team’s posts, I would say they want to make Elixir even more explicit. I believe this thread is partially related to your valid misgivings:

Trouble with most open-source is that people start off doing it enthusiastically and it either (a) achieves the original goal of the author, or (b) they get burned out and/or swamped with other duties and can no longer maintain the package.

Elixir does not introduce breaking changes as far as I am aware but bugs in older versions existed and some libs / apps might have worked because of them and not in spite of them. Somebody once said: “Computer bugs are like dogs: they like lying on their bellies in the company of their kin.”

I agree and have been waiting on a programming language tool to introduce something as strict. We absolutely need builds reproducible to the last byte. The closest you can get with mix and hex is to do this (taken from here):

{:foobar, git: "https://github.com/elixir-lang/foobar.git", tag: "0.1"}

If the author has good version control hygiene they will have stable tags and as we know, Git tags are basically pointers to a commit hash.

It is high time that all languages have this. Given an import ThisModule, only: [function1: 1, function2: 2] statement, I believe the compiler should go a few hundred extra miles to make damn sure to only include those functions and everything they depend on directly, in the resulting .beam file. Not so easy with runtime dynamic programming though (say you are passing a module as a function parameter for example, and that parameter depends on user input which cannot be predicted beforehand). Or this can be an extra mode for the mix xref command which is mighty useful and helped me refactor with great success several times now.

Many of us want that but the dependency in question might work perfectly under Elixir 1.4.0 and break under Elixir 1.7.2. Having a singular deployable application entity (with sub-applications and libraries bundled in) whose separate parts depend on different versions of the language/runtime is a problem that nobody wants to tackle, and rightfully so.

LostKobrakai · August 25, 2018, 8:28am

dimitarvp:

I agree and have been waiting on a programming language tool to introduce something as strict. We absolutely need builds reproducible to the last byte. The closest you can get with mix and hex is to do this (taken from here):
{:foobar, git: "https://github.com/elixir-lang/foobar.git", tag: "0.1"}
If the author has good version control hygiene they will have stable tags and as we know, Git tags are basically pointers to a commit hash.

You can even have {:foobar, git: "https://github.com/elixir-lang/foobar.git", ref: "925d0cc8"} if you don’t want to depend on tags sticking to a specific commit.
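For comparison, here are the pinning options mentioned in this thread side by side. This is only a sketch: :foobar and its URL are the placeholder from the posts above, and the version 0.1.0 is a made-up example.

```elixir
# Three increasingly strict ways to declare the same (placeholder) dependency.
# Only one of them would appear in a real deps list in mix.exs.

# Any release >= 0.1.0 and < 1.0.0
loose = {:foobar, "~> 0.1"}

# Exactly one published version (0.1.0 is a hypothetical version number)
exact = {:foobar, "== 0.1.0"}

# A specific commit, independent of tags or published versions
by_ref = {:foobar, git: "https://github.com/elixir-lang/foobar.git", ref: "925d0cc8"}

IO.inspect([loose, exact, by_ref])
```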

OvermindDL1 · August 25, 2018, 3:15pm

Have you used OCaml or its ilk? Rust is less pretty by far but it’s another of those “if it compiles, it generally works” languages.

Yeah this is precisely the case here.

Don’t the Elixir lock files handle this overall case though?