Changing module of saved binary data

I run a system where some structs are saved as binaries with :erlang.term_to_binary and later deserialized with :erlang.binary_to_term. The problem is that if I alter the struct, it no longer matches the saved binary, so the data is not deserialized correctly.

I have a test where I add a key after saving some data, and, of course, the deserialized data is missing that key. I was hoping it would just give the new key the value nil.

Has anyone solved this problem?

Don't serialize the struct itself; serialize just the map, without the __struct__ key.

Roughly:

```elixir
data
|> Map.from_struct()
|> :erlang.term_to_binary()
```

For deserializing, something like the following should do:

```elixir
binary
|> :erlang.binary_to_term()
|> then(&struct(YourStruct, &1))
```

This should work, as drafted in the following iex session:

```
iex(1)> defmodule S do defstruct ~w[a b c d e f g]a end
iex(2)> struct(S, %{a: 1})
%S{a: 1, b: nil, c: nil, d: nil, e: nil, f: nil, g: nil}
```
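A slightly fuller, self-contained sketch of the same round trip, assuming Elixir 1.12 or later (then/2 was added in 1.12). OldUser and User are hypothetical stand-ins for the same struct before and after a field was added, since one module can't hold both definitions at once; Codec is likewise just an illustrative name.

```elixir
# Hypothetical "before" shape of the struct, as it was when the data was saved.
defmodule OldUser do
  defstruct [:name]
end

# Hypothetical "after" shape: an :email field has since been added.
defmodule User do
  defstruct [:name, :email]
end

defmodule Codec do
  # Encode a plain map, so the binary carries no struct name.
  def dump(data) do
    data |> Map.from_struct() |> :erlang.term_to_binary()
  end

  # Decode and rebuild: struct/2 fills missing keys with their defaults
  # and silently drops keys the target struct no longer defines.
  def load(binary, module) do
    binary |> :erlang.binary_to_term() |> then(&struct(module, &1))
  end
end

saved = Codec.dump(%OldUser{name: "ada"})

# The old binary loads into the new struct; the added key gets its default, nil.
%User{name: "ada", email: nil} = Codec.load(saved, User)
```

Because struct/2 discards unknown keys, the same code also tolerates fields that were removed from the struct, not just ones that were added.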

On top of that, though, I suggest settling on a stable struct definition soon, or versioning your data format and using explicit conversion functions, so-called “upcasters”.
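One possible reading of the upcaster suggestion, as a hedged sketch: keep a version field with a default value in the struct, and upgrade an old decoded map one version at a time before handing it to struct/2. Account, its fields, the :mail to :email rename, and the version history are all hypothetical; is_map_key/2 in guards needs Elixir 1.10+, then/2 needs 1.12+.

```elixir
defmodule Account do
  # Bump this (it is also the struct default) whenever the stored shape changes.
  @current_version 3
  defstruct version: @current_version, name: nil, email: nil, active: true

  def dump(%__MODULE__{} = account) do
    account |> Map.from_struct() |> :erlang.term_to_binary()
  end

  def load(binary) do
    binary
    |> :erlang.binary_to_term()
    |> upcast()
    |> then(&struct(__MODULE__, &1))
  end

  # Data written before versioning existed has no :version key: treat it as v1.
  defp upcast(map) when not is_map_key(map, :version), do: upcast(Map.put(map, :version, 1))

  # v1 -> v2: the :mail key was renamed to :email.
  defp upcast(%{version: 1} = map) do
    {mail, rest} = Map.pop(map, :mail)
    rest |> Map.put(:email, mail) |> Map.put(:version, 2) |> upcast()
  end

  # v2 -> v3: :active was added; records saved before that are treated as
  # inactive, even though new structs default to active: true.
  defp upcast(%{version: 2} = map) do
    map |> Map.put(:active, false) |> Map.put(:version, 3) |> upcast()
  end

  # Already at the current version: nothing left to do.
  defp upcast(%{version: @current_version} = map), do: map
end
```

Each clause only knows how to get from one version to the next, so a future version 4 means one new clause plus bumping @current_version. The v2 -> v3 step also shows why explicit conversion can beat relying on struct defaults: old records get a different value than new ones.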

Yeah, I will probably end up with explicit conversion. I was just hoping I would not need to rewrite that part of the system, as it is quite large…

The part that serializes and deserializes structs is large? Why so? Can’t you just insert Map.from_struct in two places as @NobbZ recommended, and be done with it?

If you’re storing long-lived term data (as opposed to caching, etc., where old versions can be discarded), here are two approaches that can help:

  • explicit migration when changing the underlying struct. You’d write a one-off task to load each term, adjust its structure, then save it back
  • explicit versioning in the data. Add a version field to the struct with a default value, and bump it every time you add or remove fields. The code that calls binary_to_term will need to check the version and handle converting the old shape into the new shape
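The first bullet, a one-off migration, could look roughly like the sketch below. Storage is simulated with an in-memory map of id => binary; Migration.migrate_all/2 and the added email key are hypothetical, and a real task would read from and write back to whatever actually holds the binaries (files, a database column, ETS, and so on).

```elixir
defmodule Migration do
  # One-off task: decode every stored term, reshape it, and re-encode it.
  # `store` is a map of id => binary standing in for the real storage;
  # `transform` turns an old-shape term into the new shape.
  def migrate_all(store, transform) when is_map(store) and is_function(transform, 1) do
    Map.new(store, fn {id, binary} ->
      new_term =
        binary
        |> :erlang.binary_to_term()
        |> transform.()

      {id, :erlang.term_to_binary(new_term)}
    end)
  end
end

store = %{1 => :erlang.term_to_binary(%{name: "ada"})}

# Hypothetical change: every record gains an :email key, defaulting to nil.
migrated = Migration.migrate_all(store, &Map.put_new(&1, :email, nil))

# The re-encoded record now carries the new key.
%{name: "ada", email: nil} = :erlang.binary_to_term(migrated[1])
```

Map.put_new/3 leaves records that already have the key untouched, which keeps the migration safe to re-run.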

IMO, what should drive the choice between the two is how often “old” data is read: if it’s used constantly, updating it once in a batch (the first approach) will be faster; but if it’s mostly historical and only read occasionally, converting it on demand (the second approach) avoids computations that might never be needed.

You could probably even get fancy and use the built-in @vsn module attribute.
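A hedged sketch of that idea: @vsn is a reserved module attribute holding the module's version, and it can be read inside the module body like any other attribute, so it can double as a version stamp on every binary the module writes. It is persisted in the compiled module, so Doc.module_info(:attributes) should list it as well. Doc and its fields are hypothetical, and the error branch is where an upcaster would plug in.

```elixir
defmodule Doc do
  # @vsn is a reserved attribute (the module version); here it doubles as
  # the data-format version stamped into every binary this module writes.
  @vsn 2
  defstruct [:title, :body]

  def dump(%__MODULE__{} = doc) do
    :erlang.term_to_binary({@vsn, Map.from_struct(doc)})
  end

  def load(binary) do
    case :erlang.binary_to_term(binary) do
      # Written by the current version: rebuild the struct directly.
      {@vsn, map} -> {:ok, struct(__MODULE__, map)}
      # Written by another version: an upcaster would go here.
      {other_vsn, _map} -> {:error, {:unsupported_vsn, other_vsn}}
    end
  end
end
```

One design caveat: @vsn also feeds OTP's hot-code-upgrade machinery, so reusing it as a data-format version ties the two concerns together; a dedicated attribute such as a hypothetical @format_version keeps them separate.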
