Ecto: pre-populating a table. Is it a good idea to load the data as part of the migration that creates the table?

I’d like a recommendation please. Are there any best practices for populating tables of data used by an app?

I am adding PostGIS support to an Elixir app, and I’d like to load a database of US zip codes and lat/longs. Is it best to either:

a) pre-load this data from a file as part of the migration that creates the table,
or
b) load the data from a file as a separate operation independent of the migration?

If you recommend option (b), how do you recommend I do it, and when should I trigger the load of the data?


Well, probably not the response you want, but it’s worth mentioning that if a table has a fixed set of rows that will only be changed by you and never by the user, maybe it shouldn’t be a table at all.

If the only reason for it to be a table is to get some DB-level consistency check, I would just create a database enum for the keys and load the file at compile time to get the extra info.
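A minimal sketch of that compile-time approach, assuming a hypothetical `priv/data/zip_codes.csv` file with a header line and `zip,lat,lng` rows:

```elixir
defmodule MyApp.ZipCodes do
  # Recompile this module whenever the data file changes
  @external_resource path = "priv/data/zip_codes.csv"

  # One function clause per row: the same trick Elixir's own
  # Unicode module uses to turn a data file into fast lookups
  for line <- path |> File.stream!() |> Stream.drop(1) do
    [zip, lat, lng] = line |> String.trim() |> String.split(",")

    def lookup(unquote(zip)),
      do: {unquote(String.to_float(lat)), unquote(String.to_float(lng))}
  end

  # Fallback clause for unknown zip codes
  def lookup(_zip), do: nil
end
```

`MyApp.ZipCodes.lookup("90210")` then returns the coordinates (or `nil`) with no process or DB round-trip at all.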

All that said, if you still think it should be a table, and the rows should be loaded for every instance of your service (dev, test, and prod), I would put it in a migration.
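If you go that route, a minimal sketch of such a migration (the `priv/repo/data/zip_codes.csv` path and CSV layout are assumptions):

```elixir
defmodule MyApp.Repo.Migrations.CreateZipCodes do
  use Ecto.Migration

  def up do
    create table(:zip_codes, primary_key: false) do
      add :zip, :string, primary_key: true
      add :lat, :float
      add :lng, :float
    end

    # Ecto queues DDL; flush/0 executes it now, so the table
    # exists before the inserts below run
    flush()

    "priv/repo/data/zip_codes.csv"
    |> File.stream!()
    |> Stream.drop(1)
    |> Stream.map(fn line ->
      [zip, lat, lng] = line |> String.trim() |> String.split(",")
      %{zip: zip, lat: String.to_float(lat), lng: String.to_float(lng)}
    end)
    |> Stream.chunk_every(1_000)
    |> Enum.each(fn batch -> repo().insert_all("zip_codes", batch) end)
  end

  def down do
    drop table(:zip_codes)
  end
end
```

Note that `insert_all/2` bypasses changesets, which is what you want for bulk-loading trusted reference data.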


I have about 60K rows of data, and it seems to me that having this in a database table would make it easier and faster to interact with.

Well, not precisely: storing it in an external DB means you serialize it every time you convert from Erlang terms to DB data and vice versa. You might be better served by storing it in an ETS table or, depending on the complexity of your use cases and the amount of data, even by using the file at compile time to generate code, as is done for the Unicode tables in Elixir: elixir/unicode.ex at master · elixir-lang/elixir · GitHub

EDIT: now if you know these 60K rows amount to a lot of data, and you are concerned about memory usage, then you have a stronger argument for a DB table.
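For the ETS route, a minimal sketch (same hypothetical CSV file) is a GenServer that owns the table while callers read it directly:

```elixir
defmodule MyApp.ZipCodeStore do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Reads go straight to ETS; no message to the GenServer
  def lookup(zip) do
    case :ets.lookup(:zip_codes, zip) do
      [{^zip, lat, lng}] -> {:ok, {lat, lng}}
      [] -> :error
    end
  end

  @impl true
  def init(_opts) do
    table = :ets.new(:zip_codes, [:set, :named_table, :protected, read_concurrency: true])

    "priv/data/zip_codes.csv"
    |> File.stream!()
    |> Stream.drop(1)
    |> Enum.each(fn line ->
      [zip, lat, lng] = line |> String.trim() |> String.split(",")
      :ets.insert(table, {zip, String.to_float(lat), String.to_float(lng)})
    end)

    {:ok, table}
  end
end
```

The whole dataset lives in memory here, which is exactly the trade-off the EDIT above is pointing at.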


I’m not experienced enough to chime in on which approach is better.

However, if you decide to go with option (a) and get an error mentioning undefined_table or relation xyz does not exist, try adding a call to flush() between your table set-up code and your population code.
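To illustrate, the pattern that fix produces looks roughly like this (table and column are placeholders):

```elixir
def up do
  create table(:zip_codes) do
    add :zip, :string
  end

  # create/2 is queued; flush/0 runs the queued DDL immediately,
  # so the table exists before the direct repo() call below
  flush()

  repo().insert_all("zip_codes", [%{zip: "90210"}])
end
```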


For Glific (which we run as a SaaS), we keep the table creation and updates as part of the migrations.

Any data changes we do as part of seeds. We started off using philcolumns (a Hex package), but it proved a bit cumbersome for managing multiple organizations in the SaaS, so now we write our own seeder functions and call them directly from a remote console during deployment.
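A minimal sketch of that kind of seeder function (module name, file path, and the `on_conflict` choice are assumptions, not Glific’s actual code):

```elixir
defmodule MyApp.Seeds.ZipCodes do
  alias MyApp.Repo

  # on_conflict: :nothing makes the seed idempotent, so it is
  # safe to re-run from a remote console after a deployment
  def seed(path \\ "priv/repo/data/zip_codes.csv") do
    path
    |> File.stream!()
    |> Stream.drop(1)
    |> Stream.map(fn line ->
      [zip, lat, lng] = line |> String.trim() |> String.split(",")
      %{zip: zip, lat: String.to_float(lat), lng: String.to_float(lng)}
    end)
    |> Stream.chunk_every(1_000)
    |> Enum.each(&Repo.insert_all("zip_codes", &1, on_conflict: :nothing))
  end
end
```

From the remote console it is then just `MyApp.Seeds.ZipCodes.seed()`.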
