Best way to pass many arguments to a function?

Looking for some advice on how a seasoned Elixir developer might go about implementing the creation of a map/struct that is dependent on several other maps.

Let’s say we’re modelling something like a Freight Shipping Order, aka “Bill of Lading”. Generally speaking, these have a lot of fields. In our case, we need to make several API calls to just get the right information to assemble one.

Consider this pseudocode:

def fetch_freight_order(id) do
  # These all just return a map, constructed from a JSON response.
  shipping_customer = call_api(...)
  receiving_customer = call_api(...)
  line_items = call_api(...)
  freight_company = call_api(...)

  # ... and maybe 4-5 more calls to an API we don't control ...

  # To assemble a freight_order, we need all of the parts above.
  freight_order = new_freight_order(
    shipping_customer,
    receiving_customer,
    line_items,
    freight_company
    # ... 4-5 more arguments
  )
end

Option 1
Would Elixir developers pass each argument individually, or would they be more inclined to pass a keyword list, a map, or use some other pattern?

# Easy to get confused at the call site about the order these are all in.
def new_freight_order(
      shipping_customer,
      receiving_customer,
      line_items,
      freight_company
      # ... 4-5 more arguments
    ) do
  %{
    # Populate shipping_customer fields
    shipping_name: shipping_customer["Name"],
    shipping_street: shipping_customer["Street"],

    # Populate receiving_customer fields
    receiving_name: receiving_customer["Name"],
    receiving_street: receiving_customer["Street"]

    # ... maybe 50+ more fields ...
  }
end

Option 2
If we pass a single map, then we can pattern match in the function’s signature but it gets rather unwieldy pretty quickly and the call site also gets a lot busier:

def new_freight_order(%{
      shipping_customer: shipping_customer,
      receiving_customer: receiving_customer,
      line_items: line_items
      # a bunch more matches...
    }) do
  # Merge into a single freight_order
end

Option 3
Or perhaps use a separate pattern match per piece:

def new_freight_order(freight_order, %{shipping_customer: shipping_customer}) do 
  # Merge in just the shipping fields
end

def new_freight_order(freight_order, %{receiving_customer: receiving_customer}) do 
  # Merge in just the receiving fields
end

Option 4
Alternatively, would anyone create separate functions for each section?

def populate_shipping_customer(freight_order, shipping_customer) do 
end

def populate_receiving_customer(freight_order, receiving_customer) do 
end

I appreciate that style is subjective, but as new Elixir devs, we’re curious what patterns might be preferred over others. This use-case comes up quite a bit for us. A lot of the data we’re working with in our Elixir app (and showing to the user) requires a considerable number of discrete API calls just to gather the required data before we can assemble it into some sort of local representation.

Thanks in advance.


The rule of thumb is not to have a lot of arguments to a function (for me personally, 4 is the upper limit). The most dangerous thing is, as you mentioned, that if you mess up the order you can be in trouble.

In this case, going with a map argument is the way to go; however, I would advise against Option 2, as it is generally discouraged to have a match where you literally match on all the fields.

What you can do instead is use the . map operator, for example freight_order.shipping_customer; this way you will get an exception if that key is missing.
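
For example, a minimal sketch of the difference (the map contents here are made up):

order = %{shipping_customer: %{"Name" => "ACME"}}

order[:line_items]       # => nil (bracket access fails silently)
order.line_items         # ** (KeyError) key :line_items not found
order.shipping_customer  # => %{"Name" => "ACME"}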

If you want the structure of the map to be checked before runtime, add Typespecs and run Dialyzer over them.
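
A rough sketch of what that could look like (the field types are assumptions):

@type freight_order_parts :: %{
        shipping_customer: map(),
        receiving_customer: map(),
        line_items: [map()],
        freight_company: map()
      }

@spec new_freight_order(freight_order_parts()) :: map()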


A couple of recommendations:

Because APIs are volatile (and perhaps you want to give your users a message about failures rather than LetItCrash™), you should have your API calls return error tuples and combine them with with.

with {:ok, shipping_customer} <- call_api(...),
     {:ok, receiving_customer} <- call_api(...),
     {:ok, line_items} <- call_api(...),
     {:ok, freight_company} <- call_api(...) do
  # Put them all together
else
  {:error, api_error} ->
    # Handle error
end

In terms of responses, I would have a mapper to rename keys and then you could even use Ecto to validate the final result (via an embedded schema).

def map_keys(shipping_customer, receiving_customer, line_items, freight_company) do
  %{
    shipping_name: shipping_customer["Name"],
    shipping_street: shipping_customer["Street"],
    receiving_name: receiving_customer["Name"],
    receiving_street: receiving_customer["Street"],
    # ...
  }
end

Here I actually wouldn’t use . access (even though it’s generally a good suggestion), to keep things simple and leave the validation to the final schema.

Then you could make an Ecto schema for the final structure and validate it:

defmodule Thing do
  use Ecto.Schema
  import Ecto.Changeset

  embedded_schema do
    field :shipping_name, :string
    field :shipping_street, :string
    # ...
  end

  def build(attrs) do
    %__MODULE__{}
    |> change(attrs)
    |> validate_shipping()
    |> validate_receiving()
    # |> ... more validations ...
    |> apply_action(:update)
  end
end

Then the whole with would look like:

with {:ok, shipping_customer} <- call_api(...),
     {:ok, receiving_customer} <- call_api(...),
     {:ok, line_items} <- call_api(...),
     {:ok, freight_company} <- call_api(...),
     mapped_attrs = map_keys(shipping_customer, receiving_customer, line_items, freight_company),
     {:ok, thing} <- Thing.build(mapped_attrs) do
  # Do stuff with `thing`!
else
  {:error, %Ecto.Changeset{} = changeset} ->
    # Handle data error

  {:error, %SomeHttpLibError{} = api_error} ->
    # Handle network error
end

In terms of number of params I definitely like to keep them low, but sometimes rules need to be broken! Not sure how many API calls you need to make, though, so as always, YMMV.


I do not have a hard limit on the number of arguments to a function, though I do agree that too many is a code smell. I use keyword lists for options only. I’d also argue against making up a map just to reduce the argument count. Use structs, or even Ecto. Yes, Ecto schemas can be used without a database.

Using Ecto for validation of incoming data was good advice, but using Ecto for arguments is certainly overkill; let’s not turn this topic into a discussion about types.

I was suggesting using structs instead of maps for arguments. Ecto might be useful to construct the structs, because there could be a need to validate the data from the (3rd-party) API calls, something like:

%FreightOrder{}
|> FreightOrder.changeset(%{
  shipping_customer: shipping_customer,
  receiving_customer: receiving_customer,
  line_items: line_items,
  ...
})
|> submit!()

EDIT: seems like @sodapopcan already suggested that.


Structs are a great idea; I use them all the time but totally forgot about them here!

Speaking about this question in general: it was not about validating data but about passing arguments around, so a map is a perfect wrapper for this kind of business, as are structs. There are cases where structs are better than maps and vice versa, but for a beginner, I think presenting Ecto structs as a good general pattern for writing code is misleading.

If the suggestion is about trying to clone strict typing in Elixir, then that is a 100% anti-pattern in itself.

For a bit more context, many of our API calls return pretty large JSON responses. For now, we’ve tended to just work with string-based keys because we’re worried about the atom limit, though we haven’t truly stress-tested that.

The example above would tend to produce a more deterministic set of keys for each API call, but we definitely have other parts of the codebase where the JSON response is much less deterministic.

You should definitely use string keys for JSON responses, AFAIC, pretty much exactly for the reason you stated. I also find it’s a good signal meaning “untrusted.”

The idea of using Ecto schemas is to bring that under control: convert the JSON into a known shape before it’s fed through the rest of your code. So you could end up with code something like shipping_name: response["shipping_customer_name"] || response["shipping_name"] || response["some_other_key"], or whatever logic you need. I made the mistake of only showing an example of validation. Validation would only fail if something crucial is missing; otherwise, use the changeset to wrangle keys into something known, even if some values are blank.

Of course, if you are blindly picking off keys and doing something with them without ever knowing what they are called, then this isn’t possible. It does depend on the use-case.

One definite advantage a struct has over a plain map is that you define what should and should not be in the structure. That gives you much better control over the data. And since structs are implemented using maps, they are just as efficient as using maps directly.
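
A minimal sketch (the Address struct and its fields are hypothetical):

defmodule Address do
  # Only these fields can ever exist on the struct.
  defstruct [:street, :city, :postal_code]
end

%Address{street: "1 Main St"}
# => %Address{street: "1 Main St", city: nil, postal_code: nil}

%Address{stret: "1 Main St"}
# raises: unknown key :stret for struct Address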


Agreed, but defining structs with 2-4 keys can be too verbose in some cases. I’m not saying I’m against it; however, using structs excessively can lead to obscure code that is hard to read. After all, code clarity is as important as compile-time guarantees.

I agree that around 4 or more function arguments, things start to become dicey.

I also agree that, depending on the situation, Ecto changesets might feel like overkill.

Here are a couple of options.

Keyword.validate! is a built-in way to ensure proper key names, and you can even set defaults.
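
A quick sketch (the option names are made up):

opts = Keyword.validate!([port: 4000], host: "localhost", port: 80)
# => keyword list with :host defaulted, e.g. [port: 4000, host: "localhost"]

Keyword.validate!([prot: 4000], host: "localhost", port: 80)
# ** (ArgumentError) unknown keys [:prot] ...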

For doing a bit more validation, a good old recursive function that parses the keyword list’s key/value pairs is often enough. Or do it inline and even convert to a map right away (to later use the assertive map.field syntax):

# Any unexpected key or badly-typed value raises a FunctionClauseError.
Map.new(options, fn
  {:host, host} when is_binary(host) -> {:host, host}
  {:port, port} when is_integer(port) -> {:port, port}
end)

Beyond that, NimbleOptions is a great option, as it’s super easy to define basic validations and defaults, and it can even generate docs for you.
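
For example (the schema keys are made up; see the NimbleOptions docs for the full DSL):

schema = [
  host: [type: :string, default: "localhost"],
  port: [type: :integer, required: true]
]

NimbleOptions.validate([port: 4000], schema)
# => {:ok, [port: 4000, host: "localhost"]}

NimbleOptions.validate([port: "4000"], schema)
# => {:error, %NimbleOptions.ValidationError{...}}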


We’re able to use structs in some places where we have deterministic fields for a given response. An address is a good example.

But we find ourselves “having” to use normal maps with string keys more often than not because of the amount of JSON data we work with that isn’t easily defined ahead of time. (This is equally problematic in something like TypeScript.)

An example might be running a report that aggregates a bunch of values for every SKU in a system, with the results being returned in the format:

{
  "sku_001": { "title": "value", "color": "value", ... },
  "sku_002": { "title": "value", "price": "value", ... },
  "sku_003": { "width": "value", "height": "value", ... }
}

We also do a lot of layout in HEEx templates where we don’t know the “keys” ahead of time, and structs don’t allow fetching a field based on a key string.
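
(Structs don’t implement the Access behaviour, so even bracket access is out; a minimal sketch with a hypothetical Address struct:)

address = %Address{street: "1 Main St"}

address[:street]
# ** (UndefinedFunctionError) Address does not implement the Access behaviour

Map.get(address, :street)
# => "1 Main St" (works, but only with the atom key)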


I guess you store by key so you can do fast lookups?

If you were able to store them as structs, then you could still do lookups by string with Map.get/fetch, and it’s fine to convert the string to an atom there since the atom already exists. I’m getting a bit too much into your business, though. And ya, agreed, this is a place where dynamic languages are nice. I once worked with data like this where we were getting all this random JSON data, and I almost thought it would be a good use-case for Explorer. I didn’t last long enough to validate that, though. We were rife with nil errors, which is where my suggestions come from.
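
Something like this hypothetical helper (the name is made up):

# Fetch a struct field by a user-supplied string key.
# String.to_existing_atom/1 raises for atoms that were never defined,
# so it can't grow the atom table.
def fetch_field(struct, key) when is_binary(key) do
  Map.fetch(struct, String.to_existing_atom(key))
rescue
  ArgumentError -> :error
end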

In that case, you should use recursive parsing as @wojtekmach suggested, because it’s indifferent to the order of keys, and if you have all keys covered in the separate function clauses, then you’ll always arrive at the right parsed data.
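
A sketch against the SKU report above (the clauses and keys are illustrative):

# Each known attribute gets its own clause; an unexpected key raises
# a FunctionClauseError immediately instead of producing bad data later.
def parse_attrs([{"title", v} | rest]), do: [{:title, v} | parse_attrs(rest)]
def parse_attrs([{"color", v} | rest]), do: [{:color, v} | parse_attrs(rest)]
def parse_attrs([{"price", v} | rest]), do: [{:price, v} | parse_attrs(rest)]
def parse_attrs([]), do: []

# Usage: sku_attrs |> Map.to_list() |> parse_attrs() |> Map.new()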

Or (something I would do because I am paranoid and always will be when it comes to programming) create a JSON Schema and then use it to validate your assembled records. Or, even better, create several JSON Schemas, one for each of the responses you’re getting (you mentioned that the final record requires assembling results from several different 3rd-party API calls).
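
For instance, with the ExJsonSchema library (the schema here is a made-up fragment):

schema =
  %{
    "type" => "object",
    "required" => ["Name", "Street"],
    "properties" => %{
      "Name" => %{"type" => "string"},
      "Street" => %{"type" => "string"}
    }
  }
  |> ExJsonSchema.Schema.resolve()

ExJsonSchema.Validator.validate(schema, %{"Name" => "ACME", "Street" => "1 Main St"})
# => :ok (or {:error, reasons} when the data violates the schema)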

…Or, if it’s not a JSON Schema (it can sometimes be an outsized amount of work), at least have a bunch of functions like ApiFoo.new_response, ApiBar.new_response, ApiBaz.new_response, etc. that crash and burn if an expectation about the shape of the data, the type of a field, or the contents of a value is violated. Though you have to make sure you have a proper APM (monitoring) system in place so you can iterate quickly: as you discover the actual rules of the response data, you’ll have to introduce new constraints and relax old ones.
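
A sketch of such a function (the module name and expected fields are made up):

defmodule ApiFoo do
  # Assertive constructor: the pattern match and guards crash loudly
  # if the response shape or the field types are violated.
  def new_response(%{"Name" => name, "Street" => street})
      when is_binary(name) and is_binary(street) do
    %{name: name, street: street}
  end
end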

I plan on doing the last option in my greenfield project, which relies on an external API that lacks proper OpenAPI / Swagger docs and depends strictly on institutional knowledge, for the same reason: I am afraid writing my own JSON Schema(s) might take too much time.