I have a data warehousing project where I need to store graphs of structs, imported from hundreds of different data sources, in a single table. I know the typical way to do that (short of creating hundreds of associated tables) is to use Ecto's embedded schemas and add a payload field of type `:map` to the schema — in PostgreSQL this is backed by JSONB under the hood and serialized/deserialized automatically.
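For context, here's the conventional shape I mean (module and table names are just placeholders):

```elixir
defmodule MyApp.Import do
  use Ecto.Schema

  # A :map field is stored as JSONB by the Postgres adapter, and Ecto
  # handles encoding/decoding to plain Elixir maps automatically.
  schema "imports" do
    field :payload, :map
    timestamps()
  end
end
```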
Since I never need PG to query the internals of the stored graphs, and just want to load the structs at runtime for rendering, I was thinking of using a `bytea` field and a custom Ecto type to persist my arbitrary Elixir structs with compression (these highly nested structs can be huge), serializing with `:erlang.term_to_binary/2` and deserializing with `Plug.Crypto.non_executable_binary_to_term/2` when reading. I can also de-dupe by storing a SHA-256 of the payload as the primary key of this massive table, since many of these imported data sources change very slowly and I will have to pull periodic updates from them.
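The custom type I have in mind would look roughly like this (a sketch with my own module name, assuming `plug_crypto` is a dependency):

```elixir
defmodule MyApp.ETFTerm do
  @moduledoc """
  Custom Ecto type that stores any Elixir term in a bytea column as
  compressed External Term Format and safely decodes it on load.
  """
  use Ecto.Type

  @impl true
  def type, do: :binary

  @impl true
  def cast(term), do: {:ok, term}

  @impl true
  def dump(term) do
    # compressed: 0..9 gzip-compresses the encoded term; worth it for
    # large, highly nested structs
    {:ok, :erlang.term_to_binary(term, compressed: 9)}
  end

  @impl true
  def load(bin) when is_binary(bin) do
    # With [:safe], decoding refuses atoms that don't already exist and
    # rejects any term containing functions
    {:ok, Plug.Crypto.non_executable_binary_to_term(bin, [:safe])}
  rescue
    ArgumentError -> :error
  end
end
```

Then the schema would just declare `field :payload, MyApp.ETFTerm` and everything else works like a normal field.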
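The content-addressing part is simple: hash the serialized blob, not the struct itself, so the key is deterministic for identical payloads. A minimal sketch (my own module name):

```elixir
defmodule MyApp.Dedup do
  @doc """
  Content-address for a graph: SHA-256 of its compressed ETF encoding.
  Identical terms always produce identical keys, so re-imported,
  unchanged payloads collide on the primary key.
  """
  def key(term) do
    term
    |> :erlang.term_to_binary(compressed: 9)
    |> then(&:crypto.hash(:sha256, &1))
  end
end
```

On insert, `Repo.insert(changeset, on_conflict: :nothing, conflict_target: :id)` would then make re-imports of unchanged sources no-ops at the database level.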
Of course I plan to benchmark how much performance and storage usage differ vs JSON, but I don't see why I should convert to JSON and back when the data stays in Elixir for its entire lifetime and is never sent to JS or other external systems. The only thing that seems a bit scary when upgrading OTP versions is https://bugs.erlang.org/browse/ERL-431, but I guess I can always deserialize, re-serialize, and recompute the new hash if/when the ETF encoding changes in the future.
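To make that re-encoding path concrete, I'm considering prefixing each stored blob with a one-byte format version, so an encoding change can be detected per row and old rows re-encoded (and re-hashed) lazily. A sketch, using the built-in `[:safe]` option of `:erlang.binary_to_term/2` for brevity:

```elixir
defmodule MyApp.Blob do
  @moduledoc "Version-tagged ETF blobs, so a future encoding change is detectable."
  @fmt_version 1

  # Prefix the compressed ETF payload with the current format byte.
  def encode(term),
    do: <<@fmt_version, :erlang.term_to_binary(term, compressed: 9)::binary>>

  # Only the current version matches; a row carrying an older version
  # byte falls through, signalling it needs re-encoding and a new hash.
  def decode(<<@fmt_version, etf::binary>>),
    do: :erlang.binary_to_term(etf, [:safe])
end
```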
Is this an anti-pattern? I don't see much evidence of this approach being used in the community. I've found several threads about `term_to_binary` on this forum, but none quite matches my idea.