I’m working with a team that is running (relatively) old versions of Elixir and Erlang. Last month I added dialyzer, via dialyxir (can never spell that), to their build. Today we are running into an issue:
:dialyzer.run error: Old PLT file /apps/[redacted]/priv/plts/application.plt
The build is on Azure. The PLTs are cached using Azure Pipelines caching (here, if you want to know, but I wouldn’t if I could avoid it).
The dialyxir config is:
defp dialyzer do
  [
    plt_add_apps: [:mix, :ex_insights],
    plt_file: {:no_warn, "priv/plts/application.plt"},
    ignore_warnings: "dialyzer.ignore_warnings"
  ]
end
The PLTs are excluded from source control.
Versions:
elixir 1.14.2-otp-25
erlang 25.3.2.6
I know I can fix the issue by invalidating the cache, but I would like to understand how it happened in the first place. The only thing I’ve been able to find is this somewhat cryptic Erlang mailing list Q&A: Old PLT file?
The check is happening here, and it looks like there is some discrepancy between an internally held hash in the PLT and the file contents.
NVM, my bad, I only noticed the mailing list link.
My assumption is that the PLT format is somehow versioned and you upgraded the dialyzer version along the way. I would say that if invalidating the cache fixes the situation, then this issue most probably comes down to some weird initial setup.
Yeah, the link was pretty subtle. It looks like the PLT is an Erlang record, stored as a binary term, which contains a version and an MD5 hash; if the stored hash does not agree with a recomputed hash, then it is an “old” PLT. I don’t get it. ¯\(°_o)/¯ ☕️
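If you want to poke at this yourself, the record is easy to decode. A minimal sketch, assuming the :file_plt record name and field order I found while reading the dialyzer source (both may differ between OTP releases, so treat the pattern match as an assumption):

# Decode the on-disk PLT and inspect the stored version and MD5s.
# NOTE: the :file_plt layout below is an assumption from reading
# dialyzer's source; it can change between OTP releases.
plt =
  "priv/plts/application.plt"
  |> File.read!()
  |> :erlang.binary_to_term()

# The record decodes as a plain tuple tagged with the record name.
[:file_plt, version, file_md5_list | _rest] = Tuple.to_list(plt)

IO.inspect(version, label: "stored PLT version")
IO.inspect(length(file_md5_list), label: "files with stored MD5s")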
Still not solved, but I have not spent a ton of time trying to dig into the Erlang dialyzer code. Some more context though.
This is not a single PLT file somehow becoming corrupt. Azure Pipelines being Azure Pipelines, caching is isolated to a specific branch, and this error was occurring across multiple PR builds. The main build, which uses a different cache key, also got the error. Each PLT became “old” on the 7th of January.
As mentioned, we’re using dialyxir. Briefly looking at the output, it checks that the PLT is up to date before running dialyzer with checking disabled.
So the output looks like:
Checking PLT...
[(redacted the list of applications), ...]
PLT is up to date!
ignore_warnings: dialyzer.ignore_warnings
Starting Dialyzer
[
  check_plt: false,
  init_plt: '(redacted)/priv/plts/application.plt',
  files: [(redacted the list of files)],
  warnings: [:unknown]
]
:dialyzer.run error: Old PLT file (redacted)/priv/plts/application.plt
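For reference, those Starting Dialyzer options correspond roughly to a call like this (a sketch reconstructed from the log above, with a hypothetical beam path; the real plumbing is inside dialyxir’s Mix task):

# Roughly the underlying call dialyxir makes with the logged options.
# The beam path below is hypothetical.
files = ["_build/dev/lib/my_app/ebin/Elixir.MyApp.beam"]

:dialyzer.run(
  check_plt: false,
  init_plt: String.to_charlist("priv/plts/application.plt"),
  files: Enum.map(files, &String.to_charlist/1),
  warnings: [:unknown]
)

Note that even with check_plt: false, loading init_plt still appears to trip the internal version/MD5 check described above, which is where the “Old PLT file” error comes from.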
Also, I have checked, and at that time:
there was no change to any dependencies
no change to the dialyzer config
no change to Elixir and Erlang versions
Even if there were, I don’t see how it could have affected multiple pipelines given the cache isolation.
We have the same issue in a project using Elixir 1.18.3 and GitHub Actions. It has been happening for quite a while now. It seems completely random. Once this error appears, I have to clear the cache to fix it.
This is how I’m saving the PLT cache; I’m wondering if there is something else that should be considered for the cache key.
That is generally not a great caching strategy, as you will only get a new cache when your dependencies change.
I would say to use the run id (or whatever it is called) to update your cache after each pipeline run; this will ensure that you are up to date when it comes to source file changes. This will speed up your pipelines substantially.
Another point is that an immutable cache is one of the worst systems I’ve encountered yet, so to be on the safe side when the cache inevitably breaks, keep a prefix in your cache key that can be manually changed, such as v1-. If you have the option of removing old caches, that is even better.
I think that if I have to interact with GA again, I will commit the cardinal sin and use JS to write a mutable caching action that works just like GitLab’s. I worked with that one in production for many years and it’s vastly superior in every respect.
Can you elaborate on the use of the runner ID? I’m under the impression the runner ID will be different for every run in GA (at least for GitHub-hosted runners). Are you suggesting I update the PLT cache after each run even if no dependencies changed?
Imagine a project where you added 200 source files but never changed a dependency. This is a very common case for more mature projects, where adding or updating dependencies is not something you do often.
With the cache strategy that relies on ${{ hashFiles('**/mix.lock') }}, you will have to compile/generate artifacts for all those source files at every pipeline run, and considering there are usually at least 4-5 runs for each commit, the penalty for a stale cache will cost you a lot of time.
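To make the difference concrete, here is a sketch of the two keying strategies, written in Elixir just to show the logic (in a real workflow this lives in the cache action’s key/restore-keys inputs; GITHUB_RUN_ID is GHA’s real run identifier, and the plt-v1- prefix is the manually bumpable one mentioned above):

# Strategy 1: key changes only when mix.lock changes, so the cached
# PLT is reused, but never refreshed, until a dependency changes.
lock_hash =
  :crypto.hash(:sha256, File.read!("mix.lock"))
  |> Base.encode16(case: :lower)

key_lockfile_only = "plt-v1-#{lock_hash}"

# Strategy 2: a unique key per run, restored by prefix, so every run
# saves a fresh cache and the next run starts from the newest one.
run_id = System.get_env("GITHUB_RUN_ID") || "local"
key_per_run = "plt-v1-#{lock_hash}-#{run_id}"
restore_prefix = "plt-v1-#{lock_hash}-"

IO.inspect({key_lockfile_only, key_per_run, restore_prefix})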
I’ve not got to the bottom of the “Old PLT file” issue, and have not seen it again (yet) in our pipelines.
I’m not clear on the run-id suggestion for invalidating the PLT file cache, but then again I’m not currently using GitHub Actions. My experience is that the big time sink when building the PLTs is the initial run, and not a lot happens to the PLTs even after lots of source file changes.
Caching PLTs based on mix.lock works fine in my experience. (Except when it doesn’t - see above).
If you are paying for the service, which a lot of companies that depend on GitHub do, the difference between a pipeline running 2 minutes and 4 minutes becomes a valid concern.
I wonder if we can ultimately just standardize this and refer to it via an identifier (I think GHA allows this, but since I’ve always avoided it like the plague, I wouldn’t know for sure).
You could make an action that does that; however, in order to make it flexible you need a lot of parameters, which ultimately makes it less readable compared to copy-pasting the original steps.
I might also be paranoid, but I think the notion of an immutable cache ultimately works to the detriment of the end user: its obscurity and complexity only lead to pipelines running longer, so the billing can be higher. And the supposed value an immutable cache brings, i.e. a cache that can be served from CDNs, is useless; ideally the cache should be local to the machine that runs the jobs so it’s fast to restore.
I always laughed when it took almost 30 seconds to restore a cache in CircleCI; what is the point of having it if you can compile the project from scratch faster?
Could be, I wouldn’t know; admittedly I’ve always stayed away. But to me it seems there shouldn’t be too many knobs to turn: whether or not to run Dialyzer is actually the only one I can think of right now.
But it wouldn’t hurt if we had a few strongly opinionated actions, right?
Yeah, same. I watched the live output of the actions on production deployment days and was confused, but then realized that most of these workers have network-mounted disks, which of course defeats like 80% of the whole thing…
I think the effort would be better spent on making a mutable, local-first cache system; otherwise all of these solutions are just hacks around the main problem.
If only these goddamn actions weren’t written in JS… also, BTW, you cannot write tests for them. Amazing.
That’s not happening because, again, the virtual instances where the actions are run often use network-mounted disks. They don’t even have local disks.