Gradualizer vs Dialyzer - main differences?

Can anyone tell me the status and prospects for type checking of message contents? I’ve only used Dialyzer in the context of a smallish Phoenix project and my impression is that it doesn’t even try to cover inter-process messages (but I could certainly be wrong – clues?).

While I’m at it, what about coverage of Nx-based data structures, etc? And a pony…

-r


Afaik Dialyzer doesn’t do that, as it was never aimed at it.

You could in theory have typed structs and then pattern match on them in handle_info (or any of its friends) when receiving a message, but if your messages are dynamic you are in for a ride, and I am not sure there are safety nets for that.
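For what it’s worth, here’s a minimal sketch of that pattern-matching approach. MyApp.Event and MyApp.Worker are made-up names for the example; the point is only that the match (and the resulting crash), not Dialyzer, is what catches a bad message:

```elixir
defmodule MyApp.Event do
  # A "typed" struct: the @type spec is what Dialyzer can reason about.
  defstruct [:id, :payload]
  @type t :: %__MODULE__{id: integer(), payload: map()}
end

defmodule MyApp.Worker do
  use GenServer
  require Logger

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: {:ok, %{}}

  # Pattern matching gives us a crash (and a supervisor restart) when an
  # unexpected message shape arrives, but Dialyzer itself will not verify
  # that senders actually send a %MyApp.Event{} here.
  @impl true
  def handle_info(%MyApp.Event{} = event, state) do
    Logger.info("got event #{inspect(event)}")
    {:noreply, state}
  end

  def handle_info(other, state) do
    # Dynamic or unexpected messages end up here instead of crashing.
    Logger.warning("unexpected message: #{inspect(other)}")
    {:noreply, state}
  end
end
```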

Gradualizer is still in an experimental phase and I do not personally recommend it. As bad as you might consider Dialyzer, it is still several times better than any of its competitors, hands down.

From what I’ve read in the community on that topic, it’s less of a “doesn’t try” out of not wanting to handle it, and more of a “doesn’t try” because the problem is not easily solvable, especially given the (lack of) constraints present on the Erlang VM. Features like hot code upgrades in particular really limit what Dialyzer can confidently tell you about message sending being incorrectly coded.

At runtime, you can automatically validate whether the messaged struct matches its type with the Domo library.
The lib flawlessly generates the appropriate pattern matches automatically and adds a validating ensure_type/1 function and a new/1 constructor function to the struct at compile time.
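Roughly, that looks like the sketch below. The Order struct is just an example, and the exact argument shape and return values of the generated new/1 and ensure_type/1 vary between Domo versions, so treat the details as assumptions rather than the definitive API:

```elixir
defmodule Order do
  use Domo

  defstruct [:id, :quantity, note: ""]

  @type t :: %__MODULE__{id: pos_integer(), quantity: pos_integer(), note: String.t()}
end

# The generated constructor validates the data against @type t();
# the non-bang variants are assumed here to return ok/error tuples,
# while new!/1 and ensure_type!/1 raise instead.
{:ok, order} = Order.new(id: 1, quantity: 5)
{:error, _reason} = Order.new(id: 1, quantity: -5)

# Re-validating later, e.g. after receiving the struct as a message:
{:ok, _} = Order.ensure_type(order)
```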


Thanks for bringing Domo to my attention. It looks like a very powerful (and nicely documented!) suite of tools. However, I suspect that runtime performance concerns may scare off some possible users. There’s also a bit of a conflict between the “check everything” and “let it crash” mantras.

So, I’ve been speculating about ways to employ Domo on a dynamic basis (e.g., controlling its activity via supervision trees). Although the following notes are pure science fiction, it might be possible to make something similar work.

Let’s assume that Domo has been compiled into a set of modules, but its runtime checking has not been activated. At some point, a process crashes and the appropriate supervision tree is called in to repair the mess.

At this point, we can “activate the immune system” of each of the restarted processes. We give each one a numeric value which, if non-zero, causes it to turn on Domo’s checking. Each time the process calls itself, the value gets decremented (stopping at zero). So, if this was just a random data glitch, the checking will go away.

However, if some other process is sending bad data on a repetitive basis, Domo will be primed to catch, analyze, and report the problem. And, as part of this activity, the initial counter value(s) can be restored or even increased.
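To make the hand-waving slightly more concrete, here is a pure sketch of that counter idea as a GenServer. The check_budget name, the @initial_budget value, and MyStruct (standing in for any Domo-backed struct) are all invented for the example:

```elixir
defmodule MyApp.Guarded do
  use GenServer
  require Logger

  # How many messages to validate after a restart before the "immune
  # system" goes dormant again (a hypothetical knob).
  @initial_budget 100

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # In this sketch, a restart always comes back with checking armed.
    {:ok, %{check_budget: @initial_budget}}
  end

  @impl true
  def handle_info(%MyStruct{} = msg, %{check_budget: budget} = state) when budget > 0 do
    case MyStruct.ensure_type(msg) do
      {:ok, _} ->
        # Looks like a one-off glitch; let the budget run down to zero.
        do_work(msg)
        {:noreply, %{state | check_budget: budget - 1}}

      {:error, reason} ->
        # Repetitive bad data: report it and re-arm the counter.
        Logger.error("bad message received: #{inspect(reason)}")
        {:noreply, %{state | check_budget: @initial_budget}}
    end
  end

  def handle_info(msg, state) do
    # Budget exhausted (or a non-struct message): dormant, fast path.
    do_work(msg)
    {:noreply, state}
  end

  defp do_work(_msg), do: :ok
end
```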

It might even be possible to use some sort of graph-based tracking and analysis to get at the root cause(s) of the problem. For example, Domo could alert the process that sent the bad data, saying “you seem to have a problem; please turn on your Domo checks”.

OK, that’s enough late-night speculation for this tired camper; let me know if anything here seems worth considering…


Domo is aligned with “let it crash” quite well :slightly_smiling_face: A library user can select check-point locations and call MyStruct.ensure_type!(my_struct) to get a crash if the struct’s data is invalid. These places can be the boundaries between modules or processes where, for example, the mapping between data structures occurs before passing these structs deeper.

So it’s an explicit tradeoff. A call to ensure_type(!)/1 trades some CPU cycles for knowing whether the operation on the struct was correct. And these call sites can be chosen wisely depending on the problem.
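A small illustration of such a check-point at a process boundary; UserProfile is assumed to be a Domo-backed struct defined elsewhere, the module and worker names are made up, and ensure_type!/1 is assumed to return the struct on success (raising otherwise):

```elixir
defmodule MyApp.ProfileBoundary do
  # External params are mapped into a %UserProfile{} and validated right
  # before the struct crosses a process boundary, so a crash happens here
  # rather than deep inside the receiving process.
  def push_update(params) do
    profile =
      params
      |> to_user_profile()
      |> UserProfile.ensure_type!()

    GenServer.cast(MyApp.Accounts.Worker, {:update_profile, profile})
  end

  defp to_user_profile(%{"name" => name, "age" => age}) do
    %UserProfile{name: name, age: age}
  end
end
```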

In a large set of communicating processes, it may be impossible to predict where “the problem” may occur. Indeed, problems may arise almost spontaneously, because of changes in the input data, etc.

Placing checks strategically at key boundaries makes sense, but it still costs cycles at runtime. This is why I like the idea of putting some inactive checks in place, to be activated if and when problems are detected. A bit of macro magic could also be used to make this explicit and visible, yet largely unobtrusive in the code…
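Something like this hypothetical macro is what I have in mind; the :runtime_checks flag and the module name are my inventions, and dispatching via __struct__ assumes the value is a Domo-backed struct:

```elixir
defmodule MyApp.LazyCheck do
  # Expands to an ensure_type!/1 call that only runs when a runtime flag
  # is set, so the check sites stay visible in the code while costing
  # almost nothing when dormant.
  defmacro maybe_ensure_type!(struct_expr) do
    quote do
      value = unquote(struct_expr)

      if Application.get_env(:my_app, :runtime_checks, false) do
        value.__struct__.ensure_type!(value)
      else
        value
      end
    end
  end
end

# Usage at a boundary:
#
#     import MyApp.LazyCheck
#     profile |> maybe_ensure_type!() |> send_downstream()
#
# Checks get switched on with Application.put_env(:my_app, :runtime_checks, true),
# e.g. from a supervisor restart hook or a remote console.
```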

Are you responsible for microseconds-latency-sensitive apps?

No. Happily, I’m not responsible for any apps at the moment. :slight_smile: Also, please bear in mind that this is all exploratory armchair speculation on my part.

That said, I was talking in the context of someone scattering deep structure and pre-condition checks throughout a system. Any single check might be pretty fast, but running enough of them could bog things down noticeably.

So, having selected checks “activate” only when faults are detected might be a way to gather useful data while minimizing the impact on overall system latency and throughput.


I don’t disagree, but I think people fear this more than they should. I’ve worked for financial companies a few times and yeah, there you should be afraid of lag spikes reaching 10 milliseconds; something most apps wouldn’t even notice.

Everywhere else I ever worked, though? Meh. Nobody bats an eye if 1 out of 20 requests takes even 3 seconds. Nobody cares. And to this day Rails developers insist that ActiveRecord (the ORM of Rails) being responsible for a 100-150 ms delay per web request (admittedly only if it runs 3-10 DB queries, of course) is small and unimportant. Meanwhile, a comparable Phoenix endpoint returns in its entirety in 7 ms at most.

When it comes to such an impressively fast dynamic language environment as the BEAM VM, I don’t view a minuscule delay of 2-10 extra ms as consequential. Anything that helps you avoid bugs, not leak people’s private data, and not lose money should be counted as a win, even if it comes at the expense of performance.

Another example: Rust’s compiler is slow but it does eliminate several entire classes of bugs by the mere virtue of your program compiling.

IMO we need more such tech in our line of work, including in the Elixir ecosystem. It’s amazing how much traffic a mere modern i3 mini-computer with 32GB RAM and a SATA III SSD can serve; we should focus on correctness, because buying 20% more hardware capacity is a rounding error at most companies.
