Pattern matching functions and dialyzer

Background

I am trying to use dialyzer to validate the type specs in my code, but I am not sure how best to use it with functions that are split into several pattern-matching clauses.

Code

Let’s focus on this code segment from Designing Elixir Systems with OTP (type specs added by me):

# Validator.check returns :ok if the 1st parameter is true and returns the second otherwise.
defp validate_generator({name, generator}) when is_atom(name) and is_list(generator), do:
  Validator.check(generator != [], {:error, "can't be empty"})

defp validate_generator({name, generator}) when is_atom(name) and is_function(generator, 0), do:
  :ok

defp validate_generator(_generate), do:
  {:error, "must be a string to list or function pair"}

This code receives a tuple of an atom and either a list or a zero-arity function, and validates that the parameters have the correct format.

Dialyzing

I see 2 main ways of writing specs for this code. The first is:

@spec validate_generator(any) :: :ok | {:error, String.t}
defp validate_generator({name, generator}) when is_atom(name) and is_list(generator), do:
  Validator.check(generator != [], {:error, "can't be empty"})

defp validate_generator({name, generator}) when is_atom(name) and is_function(generator, 0), do:
  :ok

defp validate_generator(_generate), do:
  {:error, "must be a string to list or function pair"}

This approach works, but I find the spec too general. Although it is true that validate_generator can receive anything as a parameter, anything other than the two expected tuple shapes always ends up in the 3rd clause, which returns an error.

The spec also tells you nothing about what the other two clauses expect.

The other way of dialyzing this code would be the following:

@spec validate_generator({atom, list}) :: :ok | {:error, String.t}
defp validate_generator({name, generator}) when is_atom(name) and is_list(generator), do:
  Validator.check(generator != [], {:error, "can't be empty"})

@spec validate_generator({atom, (-> any)}) :: :ok
defp validate_generator({name, generator}) when is_atom(name) and is_function(generator, 0), do:
  :ok

@spec validate_generator(any) :: {:error, String.t}
defp validate_generator(_generate), do:
  {:error, "must be a string to list or function pair"}

This alternative has the advantage of making clear which parameter types each clause expects, although it is a bit more verbose.

Questions

I would prefer the second way; however, I am not sure whether having 3 spec definitions for a single function would cause issues with dialyzer.

1. Wouldn’t the second approach confuse dialyzer (because I am defining a spec for the same function 3 times)?
2. If the second approach has serious drawbacks, is there a way of improving the spec of the first approach?

No, you can’t do that: the specs overlap (the any clause also covers both tuple shapes), and Dialyzer can’t deal with overlapping specs.

The issue is that it is a single function. The fact that it is split into multiple clauses is an implementation detail that has nothing to do with the function’s actual type. So if you want tighter typing you need to define separate functions with separate types. But the moment you delegate from the multi-clause function to the specific functions, you lose the direct correlation between the input types and the output types on the multi-clause function, because its type is the sum (or union) of those inputs and outputs. Try as you might:

@spec validate_generator(any) :: :ok | {:error, String.t}

is the effective type of that function. What you are trying to do is tie a specific type to part of a function’s implementation - functions and types simply don’t work that way.
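To make the delegation point concrete, here is a minimal sketch, not the book’s code. The module name and the helpers validate_list/1 and validate_function/1 are hypothetical, the functions are public only so they can be called from outside, and the empty-list check is written as a pattern match instead of Validator.check. Each helper gets a narrow spec, yet the dispatcher’s spec is still the union:

```elixir
defmodule GeneratorValidation do
  # Hypothetical names throughout; only validate_generator/1 comes from the question.
  @type result :: :ok | {:error, String.t()}

  # Narrow spec: only {atom, list} pairs are expected here.
  @spec validate_list({atom, list}) :: result
  def validate_list({_name, []}), do: {:error, "can't be empty"}
  def validate_list({_name, generator}) when is_list(generator), do: :ok

  # Narrow spec: only {atom, zero-arity function} pairs are expected here.
  @spec validate_function({atom, (-> any)}) :: :ok
  def validate_function({_name, generator}) when is_function(generator, 0), do: :ok

  # The dispatcher accepts anything, so its type is the union of all outcomes.
  @spec validate_generator(any) :: result
  def validate_generator({name, generator} = pair)
      when is_atom(name) and is_list(generator),
      do: validate_list(pair)

  def validate_generator({name, generator} = pair)
      when is_atom(name) and is_function(generator, 0),
      do: validate_function(pair)

  def validate_generator(_other),
    do: {:error, "must be a string to list or function pair"}
end
```

The helpers document what each shape needs, but a caller of validate_generator/1 still only learns any in, :ok or an error tuple out.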

This is one of the reasons why I’ve come around to seeing the mainstream way of typing:

function_name(parameter_name : parameter_type) : return_type

as conflating types with implementation - as in many cases the parameter names are coupled to the implementation. I find myself often in the situation where I know the type of the function before I have good names for all the parameters (and perhaps even the function).

It’s usually helpful to write down the type of new functions first;


I might go one step further and suggest that the last function head should not exist at all. It is a defensive definition that may be self-defeating, and the spec problem is symptomatic of that. Then you can define a spec with ( {…} | {…} ) that can actually check the type usage.
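A rough sketch of what that could look like, assuming the two tuple shapes from the question are the only valid inputs. The module name is hypothetical, the function is public only so it can be called from outside, and the empty-list check uses a plain if instead of Validator.check:

```elixir
defmodule StrictGenerator do
  # Hypothetical module name. The catch-all clause is deliberately absent.
  @spec validate_generator({atom, list} | {atom, (-> any)}) ::
          :ok | {:error, String.t()}
  def validate_generator({name, generator})
      when is_atom(name) and is_list(generator) do
    if generator == [], do: {:error, "can't be empty"}, else: :ok
  end

  def validate_generator({name, generator})
      when is_atom(name) and is_function(generator, 0),
      do: :ok
end
```

With no clause for other input, a bad call raises FunctionClauseError at runtime, and the narrower spec gives dialyzer something concrete to compare call sites against.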

I’m of a different opinion.

Validation of inputs and handling of errors should happen at the points of entry/output of the system, and this function seems like exactly that: logic to validate input. At some place there needs to be code handling input, which determines whether it conforms to expectations or not. And it’s not unreasonable to expect such code to handle any() input.

If this is part of the business domain then it’s a different picture. There, code should assume it is supplied what it expects and blow up if that’s not the case.


This specific code bit is part of the boundary and it’s the code that validates input. According to my understanding of your comments the first option is the only option available as the second one doesn’t even work.

@peerreynders

Could you convert my example to the following form?

function_name(parameter_name : parameter_type) : return_type

I am not sure what you mean by this as I have not yet seen Elixir code written this way. Is this something used for documentation or can you actually write the functions like this?

Yes. I completely agree :smiley:

This is an example of forms used in other languages.
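For comparison, the closest Elixir gets to that inline form is naming the arguments inside the @spec itself. A small sketch, assuming a hypothetical module name and a public function so it can be called from outside; the generator label is documentation only and is separate from the variable names used in the clauses:

```elixir
defmodule NamedSpec do
  # Hypothetical module name. `generator ::` only labels the argument in the spec.
  @spec validate_generator(generator :: any) :: :ok | {:error, String.t()}
  def validate_generator({name, gen}) when is_atom(name) and is_function(gen, 0),
    do: :ok

  def validate_generator(_other),
    do: {:error, "must be a string to list or function pair"}
end
```

The spec still sits on its own line above the function, so the type stays independent of how the clauses are written.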


Correct.

There was a discussion about typing style preferences.

I used to be in the inline typing camp - but at that time I was only concerned about the types of the parameters and really had no awareness of a function’s type. There is such a thing as a function pointer but at the time the typing syntax seemed as mystical as a rune spell.

With JavaScript I discovered using “functions as values” and that is when function types became important - at that point the parameter types were simply derivative of the function type they were part of.