Validate_inclusion in db data, thoughts on refactoring

Athunnea · May 26, 2021, 1:19pm

Hi there!

I have an Ecto changeset that uses validate_inclusion against quite a few lists of values (country codes, packaging types, categories, etc.). Some of these lists live in the database. Querying the db from inside the changeset would be inefficient for batch inserts/updates.

I have been preloading these lists and passing them to the changeset, so there is one db call per batch of updates/inserts instead of one per row. Plus it feels more “pure” in terms of functions.

Now it is becoming cumbersome. I’m considering switching to fetching the lists directly from the changeset, using cachex as a proxy.

Current code:

def create_batch(list_of_params) do
   # One db call per batch, shared by every row
   lists_of_values = App.Lists.get_lists()

   Enum.map(list_of_params, fn params ->
      changeset(struct, params, lists_of_values) |> Repo.insert()
   end)
end

def changeset(struct, params, lists_of_values) do
   struct
   ...
   |> validate_inclusion(:field, lists_of_values[:package_codes])
end

Potential refactoring

def create_batch(list_of_params) do
   Enum.map(list_of_params, fn params ->
      changeset(struct, params) |> Repo.insert()
   end)
end

def changeset(struct, params) do
   struct
   ...
   # The list comes from the cache, so no extra argument is needed
   |> validate_inclusion(:field, App.CachedLists.get(:package_codes))
end
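
As a rough illustration of the cached-lookup idea, here is a minimal sketch of what a module like App.CachedLists might look like. It uses a plain Agent from the standard library instead of cachex so that it runs without dependencies. The module name CachedListsSketch, the loader function, and invalidate/1 are assumptions for illustration and are not part of the original code.

```elixir
defmodule CachedListsSketch do
  # Hypothetical stand-in for App.CachedLists.
  # An Agent holds the loader function and a map of list name => values.
  use Agent

  # `loader` is a 1-arity function that fetches a list by name
  # (in a real app this would be a Repo query).
  def start_link(loader) when is_function(loader, 1) do
    Agent.start_link(fn -> %{loader: loader, lists: %{}} end, name: __MODULE__)
  end

  # Returns the cached list, calling the loader only on the first request for `name`.
  def get(name) do
    Agent.get_and_update(__MODULE__, fn %{loader: loader, lists: lists} = state ->
      case lists do
        %{^name => values} ->
          {values, state}

        _ ->
          values = loader.(name)
          {values, %{state | lists: Map.put(lists, name, values)}}
      end
    end)
  end

  # Drops a cached list so that the next get/1 reloads it,
  # e.g. after the underlying table changes.
  def invalidate(name) do
    Agent.update(__MODULE__, fn state ->
      %{state | lists: Map.delete(state.lists, name)}
    end)
  end
end
```

With this in place the changeset line would read validate_inclusion(:field, CachedListsSketch.get(:package_codes)). Two downsides are visible even in the sketch: every read goes through a single process, and cached values stay stale until invalidate/1 is called. A cachex-backed version would likely ease the first but still needs an answer for the second.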

I have some concerns about potential downsides and would appreciate any advice.

Keep sending lists of values to changeset
Request lists of values directly from changeset


dimitarvp · May 26, 2021, 3:43pm

I voted for your first option because I really don’t understand how it is cumbersome?

Athunnea · May 26, 2021, 7:54pm

Why it becomes cumbersome over time:

When multiple functions call the changeset, each of them has to deal with passing these validation lists.
When several functions pass the values through, it gets harder to track what’s going on, e.g. import() |> create_batch() |> create() |> changeset(). You see that validation uses a list or a map of lists, and then you have to trace it backwards to find its origin, instead of following a direct call to a list service with cached values.
It’s more complicated to add new lists because they are kept far away from where they are actually used.
Some lists could have business logic attached, where some values are available only in certain cases.
In general, the changeset function shouldn’t have to care that some lists arrive as function inputs purely for performance reasons.

Does it make sense?

dimitarvp · May 26, 2021, 8:16pm

Not really but then again I am not working on the project every day after all.

The only thing I could glean from your explanation is that you need higher-level functions that compose the lower-level functions and give them some nice descriptive names.
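
One possible reading of that suggestion, sketched under assumptions: keep the changeset taking the lists explicitly, and add a higher-level function that loads them once and hands back a ready-made builder, so callers in between never pass the lists around. The module PackagesSketch, the hard-coded load_lists/0, and the schemaless field types are all hypothetical stand-ins so that the example runs without a database.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule PackagesSketch do
  import Ecto.Changeset

  # Schemaless field types, so the sketch needs no schema module or database.
  @types %{package_code: :string, country: :string}

  # Hypothetical replacement for App.Lists.get_lists/0 (a db query in the real app).
  def load_lists do
    %{package_codes: ["BOX", "BAG"], countries: ["DE", "FR"]}
  end

  # Lower-level function: still takes the lists as an explicit argument.
  def changeset(params, lists) do
    {%{}, @types}
    |> cast(params, Map.keys(@types))
    |> validate_inclusion(:package_code, lists.package_codes)
    |> validate_inclusion(:country, lists.countries)
  end

  # Higher-level function: captures the lists once and returns a
  # 1-arity function that builds a changeset from params.
  def changeset_builder(lists \\ load_lists()) do
    fn params -> changeset(params, lists) end
  end

  # Batch entry point: one load per batch, no list arguments for callers.
  # In the real app each changeset would then go to Repo.insert/1.
  def validate_batch(list_of_params) do
    Enum.map(list_of_params, changeset_builder())
  end
end
```

Callers only see validate_batch/1 or changeset_builder/0, while the origin of the lists stays in one named place.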

ityonemo · May 26, 2021, 8:33pm

Option 3: refactor into an association?
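
A sketch of what the association route could look like, assuming the package codes live in their own table and the items table has a foreign key on package_type_id. The schema names, table names, and fields below are hypothetical. With this layout no list is loaded at all: the database rejects unknown ids on insert, and assoc_constraint/3 turns that database error into a changeset error.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule AssocSketch.PackageType do
  use Ecto.Schema

  # Hypothetical lookup table that replaces the in-memory list of package codes.
  schema "package_types" do
    field :code, :string
  end
end

defmodule AssocSketch.Item do
  use Ecto.Schema
  import Ecto.Changeset

  schema "items" do
    field :name, :string
    belongs_to :package_type, AssocSketch.PackageType
  end

  def changeset(item, params) do
    item
    |> cast(params, [:name, :package_type_id])
    |> validate_required([:name, :package_type_id])
    # Registers the foreign key constraint; it is only checked by the
    # database when the changeset is passed to Repo.insert/1 or Repo.update/1.
    |> assoc_constraint(:package_type)
  end
end
```

The trade-off is that an invalid value is only reported once the row reaches the database, not when the changeset is built.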