:erlang.term_to_binary and postgresql => invalid byte sequence for encoding "UTF8"

After reading this:

I tried to use term_to_binary together with postgresql:

Migration:

[...]
      add :owner, :text
      add :version, :text
      add :state, :bytea
[...]

Schema:

[...]
     field :state, :binary
[...]

Code:

update_or_insert(%State{owner: owner, state: :erlang.term_to_binary(state)})

  def update_or_insert(state) do
    state
    |> Repo.insert(
      on_conflict: {:replace_all_except, [:id, :owner, :created_at]},
      conflict_target: [:owner],
      set: [updated_at: DateTime.utc_now()]
    )
  end

Error:

07:11:53.186 [error] GenServer #PID<0.15770.0> terminating
** (Postgrex.Error) ERROR 22021 (character_not_in_repertoire) invalid byte sequence for encoding "UTF8": 0x83

How to solve this?
Change encoding for the resulting binary?
Leave term_to_binary alone and use JSON encoding/decoding?

I was also looking at the docs for erlang term_to_binary but there are only compression levels mentioned

https://www.erlang.org/doc/man/erlang#term_to_binary-1

term_to_binary/1,2 produces a binary in the external term format, not a UTF-8-encoded binary, so this seems like a problem with the schema (by the looks of it on the database server side). Try looking into why field :state, :binary implies UTF-8 encoding.

1 Like

My guess is that its resulting from an encoding error in either the owner or version fields since the Postgres text type maps to Elixir String.t type - which is UTF-8. Therefore one possible cause is that that either the Postgres database or the database connection string are set to LATIN-9 or some other non-UTF-8 encoding.

The bytea type maps to Elixir/erlang binary as you have configured.

3 Likes

I still mainly do Erlang courses but I play with Elixir as well. :smile:

Yes, the ETF is an internal format and is definitely not UTF-8 encoded.

Thanks for chiming in @jhogberg and @kip!

This was interesting to solve.

The DB is set to UTF8:
2023-11-22 08_41_28-sudo -Hiu postgres psql

While I was trying to narrow down the problem I got a different error message:

09:44:01.258 [error] GenServer #PID<0.7186.0> terminating
** (Postgrex.Error) ERROR 22023 (invalid_parameter_value) payload string too long
    (ecto_sql 3.11.0) lib/ecto/adapters/sql.ex:1054: Ecto.Adapters.SQL.raise_sql_call_error/1

Which leads to the real problem
The table has a trigger attached - and that trigger is converting the updated/inserted table row to JSON to send a message using pg_notify => trying to convert the binary data…

And I was able to verify it as soon as I had a look at the postgresql log files

Never thought of this :crazy_face: