What’s wrong with dependencies?
I have a problem with dependencies.
Here’s what happened:
- I decided I wanted to do some Elixir programming. I’ve neglected this for a while but wanted to do some experiments; there’s a particular problem I want to solve.
- I downloaded the latest and greatest Elixir (I want to be up to date).
- I Googled a bit and found that “my problem” had been attacked before; indeed, I found a GitHub project that had tackled it.
- I downloaded the project.
- I ran mix deps.get etc.
- The program did not work!
I’m pretty sure that I’m not alone in this. On the project in question somebody asked “are you still maintaining this?” (the last update was a year or so ago), and the answer was “yes, possibly, if anybody is interested”.
At this point I assume that the project did actually work correctly when it was last committed to GitHub.
So something is wrong, but what?
Looking at the dependencies, I see entries like:
{:package, "~> 2.0", only: :dev},
This means that the ‘package’ version must be >= 2.0.0 and < 3.0.0.
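Elixir’s standard Version module evaluates the same requirement strings that mix uses, so it can show how much a range like this admits. A quick check with made-up version numbers, comparing ~> 2.0 with the narrower ~> 2.0.2 (which means >= 2.0.2 and < 2.1.0):

```elixir
# Which versions does each requirement admit? Elixir's Version module
# evaluates the same requirement syntax that mix uses for deps.
for v <- ["2.0.2", "2.0.9", "2.7.1", "3.0.0"] do
  IO.puts("#{v}  ~> 2.0: #{Version.match?(v, "~> 2.0")}  ~> 2.0.2: #{Version.match?(v, "~> 2.0.2")}")
end

# An exact requirement admits exactly one version.
IO.puts(Version.match?("2.0.2", "== 2.0.2"))
IO.puts(Version.match?("2.0.3", "== 2.0.2"))
```

So 2.7.1 satisfies the original requirement but not the narrower one, and 3.0.0 satisfies neither. For what it’s worth, mix also records the exact resolved version of each package in mix.lock; if a project commits that file, mix deps.get fetches those versions rather than re-resolving the ranges.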
Now, the author has presumably tested the system with a specific version (say 2.0.2) and simply assumes that the program will also work with any later 2.x release.
But obviously something somewhere is wrong, since the program manifestly does not work. I’d really like to know exactly which versions the program has been tested with, and I’d like to be able to download these versions and exactly reproduce the behavior of the program. As far as I can see, this is not possible.
What I’d like is the ability to describe exactly the state of the system “as tested and known to work”, and to be able to reproduce it.
Describing a version by a number like 2.0.2 is, in my mind, crazy. I’d prefer the system to be described by a list of the SHA1 checksums of all the modules needed to reproduce the behavior of the program.
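A minimal sketch of such a description, assuming the modules’ compiled .beam files are on the code path: fingerprint each module by the SHA1 of its object code. The module name Manifest and its functions are hypothetical; :code.get_object_code/1 and :crypto.hash/2 are standard OTP calls.

```elixir
defmodule Manifest do
  # Hypothetical helper: the SHA1 of a module's compiled .beam binary,
  # as a lowercase hex string, or :error if no object code is found.
  def sha1(module) do
    case :code.get_object_code(module) do
      {^module, binary, _filename} ->
        {:ok, :crypto.hash(:sha, binary) |> Base.encode16(case: :lower)}

      :error ->
        :error
    end
  end

  # Map of module => checksum for the given modules; modules whose
  # object code cannot be found are silently skipped.
  def build(modules) do
    for m <- modules, {:ok, digest} <- [sha1(m)], into: %{}, do: {m, digest}
  end
end

Manifest.build([:lists, Enum, String]) |> IO.inspect()
```

To record the state of a running program, the module list could come from `for {m, _file} <- :code.all_loaded(), do: m`; two runs would then be “the same system” only if their manifests are equal.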
How, then, can we make programs that use mutually inconsistent sets of module versions (for example, two parts of a program that need different versions of the same module)?
There are two ways:
a) a bit of fancy name munging (replacing module names with SHA1 checksums)
b) running code in different nodes and using pure messaging to separate them
b) seems to be simpler, and has the advantage of being truly OO and really separating concerns. a) is trickier and needs a lot of thought to get right (I have some ideas here).
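A minimal single-node sketch of option b), assuming a process boundary stands in for a node boundary: the dependency is owned by a process and reached only through messages, never by calling its modules directly. The module name Isolated, its functions and the message format are hypothetical, and the “dependency” is modeled as a map from names to functions.

```elixir
defmodule Isolated do
  # Start a process that owns the "dependency". In a multi-node setup the
  # same loop could be started elsewhere with Node.spawn/2.
  def start(api) when is_map(api) do
    spawn(fn -> loop(api) end)
  end

  defp loop(api) do
    receive do
      {:call, from, ref, name, args} ->
        reply =
          case Map.fetch(api, name) do
            {:ok, fun} -> {:ok, apply(fun, args)}
            :error -> {:error, :undefined}
          end

        send(from, {ref, reply})
        loop(api)
    end
  end

  # Synchronous request/response over plain messages.
  def call(pid, name, args, timeout \\ 5_000) do
    ref = make_ref()
    send(pid, {:call, self(), ref, name, args})

    receive do
      {^ref, reply} -> reply
    after
      timeout -> {:error, :timeout}
    end
  end
end

dep = Isolated.start(%{reverse: &Enum.reverse/1, sum: &Enum.sum/1})
IO.inspect(Isolated.call(dep, :reverse, [[1, 2, 3]]))  # {:ok, [3, 2, 1]}
IO.inspect(Isolated.call(dep, :missing, []))           # {:error, :undefined}
```

The caller only depends on the message protocol, so the code behind it could be any version. In this sketch a function that raises crashes the owning process, and the caller then sees {:error, :timeout}.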
While on the subject of dependencies, I guess I should say that I’m not an enthusiastic fan of them.
On the + side, we can move quickly and achieve lots by using other people’s code.
On the - side, any errors in the dependencies will creep into our code.
Personally, I cut and paste the code I need from the dependencies into my code and stare hard at it (which is fine for trivial code but does not work for complex things).
(aside)
When Robert and I wrote the original system, we were paranoid about not having dependencies in the code that booted the system.
reverse/2 and foldl/3 are defined in lists.erl, but we cut and pasted the code for these into ring0.erl in order to reduce the dependencies.
The system has been refactored many times since then, but you can still see traces in the erts/preloaded/src code: for example, in prim_zip.erl you’ll find the code for foldl (renamed to lists_foldl).
Why did we do this? Because we could not be sure that autoloading works early in the life of the system. Also, lists.erl has many functions that we do not need, and it might be buggy in a later version.
(/aside)
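The copy-instead-of-depend trick from the aside looks like this in Elixir, as a hedged sketch: the module keeps private copies of foldl and reverse instead of calling :lists or Enum. The names lists_foldl and lists_reverse echo prim_zip.erl; the module name Boot and its public functions are made up for illustration.

```elixir
defmodule Boot do
  # Private copies of foldl and reverse, so the functions below call
  # nothing in :lists or Enum.
  def sum(list), do: lists_foldl(fn x, acc -> x + acc end, 0, list)
  def reverse(list), do: lists_reverse(list, [])

  # Same argument order as :lists.foldl(fun, acc0, list).
  defp lists_foldl(f, acc, [h | t]), do: lists_foldl(f, f.(h, acc), t)
  defp lists_foldl(_f, acc, []), do: acc

  # Same semantics as :lists.reverse(list, tail).
  defp lists_reverse([h | t], acc), do: lists_reverse(t, [h | acc])
  defp lists_reverse([], acc), do: acc
end

IO.inspect(Boot.sum([1, 2, 3]))      # 6
IO.inspect(Boot.reverse([1, 2, 3]))  # [3, 2, 1]
```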
One more passing thought.
I’ve written a garbage collector for Erlang dependencies. Basically, it takes a root set of function calls, traces everything that can be called from them, renames things appropriately, and puts everything into a single module.
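The heart of such a collector is a reachability walk, much like the mark phase of a mark-and-sweep garbage collector. This sketch assumes the call graph is already available as a map from {module, function, arity} to the MFAs each function calls; the graph here is hand-written, the module name CodeGC is hypothetical, and the renaming and merging into one module are left out.

```elixir
defmodule CodeGC do
  # Return the set of MFAs reachable from the roots. Anything not in the
  # result is dead code from the roots' point of view.
  def reachable(graph, roots) do
    walk(roots, graph, MapSet.new())
  end

  defp walk([], _graph, seen), do: seen

  defp walk([mfa | rest], graph, seen) do
    if MapSet.member?(seen, mfa) do
      # Already marked: this also makes cycles in the graph harmless.
      walk(rest, graph, seen)
    else
      callees = Map.get(graph, mfa, [])
      walk(callees ++ rest, graph, MapSet.put(seen, mfa))
    end
  end
end

graph = %{
  {:app, :main, 0} => [{:util, :trim, 1}],
  {:util, :trim, 1} => [{:util, :strip_left, 1}],
  {:util, :strip_left, 1} => [],
  {:util, :pretty_print, 2} => [{:jungle, :gorilla, 0}],
  {:jungle, :gorilla, 0} => []
}

CodeGC.reachable(graph, [{:app, :main, 0}]) |> Enum.sort() |> IO.inspect()
# [{:app, :main, 0}, {:util, :strip_left, 1}, {:util, :trim, 1}]
```

The banana (util’s trim) is kept; pretty_print and the gorilla it drags in are not.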
I’ve often been quoted as saying that the problem with OO inheritance is that when you wanted a banana you got not only the banana but the gorilla that was holding the banana and the entire ****ing jungle.
I think the same is true of dependencies: you might add a dependency because you want to reuse a single function in it. The problem is that you get all the other functions in the module, and (recursively) any dependencies that those modules might need, even though the code you want to call will NEVER call that code.
I have on occasion downloaded the odd Node program. The other day one of these stopped working, and I checked what dependencies had been downloaded as a side effect of installing the program I did want to use. It was horrific: hundreds of packages had been downloaded as sub-dependencies, and I haven’t a clue what they all do.
Statically garbage collecting the code, so that only what actually gets called is kept, would be a great way of reducing the dependencies.
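One source of real edges for such a static walk is the compiled code itself. This sketch, with the hypothetical module name ModuleDeps, reads the imports chunk of a .beam file via :beam_lib.chunks/2. That chunk lists the external {module, function, arity} calls a module makes, so it gives a module-level approximation: calls within the module are not listed, and dynamic calls through apply/3 show up only as :erlang.apply/3.

```elixir
defmodule ModuleDeps do
  # External MFAs called by a module, read from its .beam imports chunk.
  # Raises a MatchError if the module's object code is not on the path.
  def imports(module) do
    {^module, binary, _file} = :code.get_object_code(module)
    {:ok, {^module, [imports: imports]}} = :beam_lib.chunks(binary, [:imports])
    imports
  end

  # The modules this module calls directly, sorted and deduplicated.
  def direct_deps(module) do
    module
    |> imports()
    |> Enum.map(fn {m, _f, _a} -> m end)
    |> Enum.uniq()
    |> Enum.sort()
  end
end

IO.inspect(ModuleDeps.direct_deps(:lists))
```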
Actually, garbage collecting code is no more than automating what I have always done manually.
The way I work is that when I find some code in some module that does what I want, I cut and paste it into my module and edit out all the bits I don’t need. I’m basically a manual garbage collector.
This is why I’m a slow programmer: move slowly, understand what you do, and don’t make any errors. The consequence of this is “slow progress”, BUT most of the code I wrote 25 years ago still works without any changes. Oh, and I avoid NIFs and write in pure, simple Erlang.
NIFs are for performance, but if you want fast code, wait a few years: Erlang performance has improved by a factor of c. 10^6 over 20 years, and it’s not due to smarter code (hardware improvements way outstrip software ones).
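Taking the c. 10^6 figure at face value, it corresponds to an average speed-up of roughly a factor of two per year:

```latex
\left(10^{6}\right)^{1/20} = 10^{6/20} = 10^{0.3} \approx 1.995 \approx 2
```

That is about twenty successive doublings, since 2^20 = 1,048,576, which is approximately 10^6.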
This is, of course, almost the opposite of “shipping buggy code and getting to market early”. Shipping correct, beautiful code and being early to market seems impossible: many iterative cycles are needed to shake out a correct design.
Back on topic
I think it would be possible to fix up the system as follows:
- Snapshot the state of the system, by which I mean find all the modules that are loaded. This is easy: run your program once, then call :code.all_loaded() to gather all the modules that are needed, then garbage collect them from some root set, i.e. just figure out what actually gets called and drop the code that is just there “by accident” and never gets called.
- Write stub code to interface the required function calls in the root set through a message-passing API.
- Dump the snapshot into a single file.
- At run time, recreate the environment of the dumped program and set it running.
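The dump and restore steps can be sketched with standard OTP calls, assuming the module list has already been garbage collected and that each module’s .beam is on the code path. The module name Snapshot, its functions and the file format (a term_to_binary of {module, filename, binary} tuples) are hypothetical.

```elixir
defmodule Snapshot do
  # Store the object code of the given modules in one file. Modules
  # whose object code cannot be found are skipped; returns the count.
  def dump(modules, path) do
    entries =
      for m <- modules, {_mod, bin, file} <- [:code.get_object_code(m)] do
        {m, file, bin}
      end

    File.write!(path, :erlang.term_to_binary(entries))
    length(entries)
  end

  # Load every module from the snapshot file. Returns one result per
  # module: {:module, m} on success or {:error, reason} otherwise.
  def restore(path) do
    path
    |> File.read!()
    |> :erlang.binary_to_term()
    |> Enum.map(fn {m, file, bin} -> :code.load_binary(m, file, bin) end)
  end
end
```

A run might be `Snapshot.dump(modules, "app.snapshot")` in the tested system and `Snapshot.restore("app.snapshot")` in a fresh VM. OTP treats the kernel, stdlib and compiler directories as sticky, so restoring those modules over the running ones is expected to be refused; a real tool would need its own boot strategy for them.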
This is not easy, BTW, but it would be very beneficial: it would be really nice if the programs we write today ran in a few hundred years’ time without change.
So in a thousand years’ time, somebody can say:
> mix deps.get
> mix compile
> mix run
and the program will work just like it did 1000 years ago.