Converting hex chars to unicode equivalent

kip · May 7, 2024, 9:29am

\xf4 is a byte escape which Elixir understands, but the byte it produces in this example is not valid UTF-8, which is the encoding Elixir expects for strings. This kind of representation shows up in languages that only support ASCII but need to represent text in other encodings.
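To make that concrete, here is a small sketch you could paste into iex. The variable name raw is just for illustration; the calls are plain String and :binary functions.

```elixir
# The escape \xf4 puts the single byte 0xF4 (244) into the binary.
raw = "Bient\xf4t"

# 0xF4 on its own is not a complete UTF-8 sequence, so the string is invalid.
String.valid?(raw)
# → false

# Looking at the raw bytes shows the lone 244 between "Bient" and "t".
:binary.bin_to_list(raw)
# → [66, 105, 101, 110, 116, 244, 116]

# The UTF-8 encoding of "ô" (code point U+00F4) is two bytes, not one.
"ô" == <<0xC3, 0xB4>>
# → true

# Encoding the integer 0xF4 as a UTF-8 code point gives the expected character.
<<0xF4::utf8>> == "ô"
# → true
```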

Something like this might help:

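defmodule XDecode do
  # Re-encodes every byte that is not valid UTF-8 as the code point
  # with the same value (in effect treating those bytes as Latin-1).
  def decode(string \\ "Bient\xf4t") do
    string
    |> String.chunk(:valid)
    |> Enum.map_join(&decode_chunk/1)
  end

  # Valid UTF-8 chunks pass through unchanged; invalid chunks are
  # converted byte by byte. Checking each chunk (rather than assuming
  # the first one is valid) also handles input that starts with \x bytes.
  defp decode_chunk(chunk) do
    if String.valid?(chunk) do
      chunk
    else
      chunk |> :binary.bin_to_list() |> List.to_string()
    end
  end
end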

As you would expect, this will only work if each hex byte is in fact the Unicode code point of the intended character (as in Latin-1, where \xf4 is ô, U+00F4). Otherwise, as @hauleth says, you need a more generalised converter.
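For the Latin-1 case specifically, Erlang's :unicode.characters_to_binary/2 can do the conversion. The sketch below assumes the whole input is Latin-1 whenever it is not already valid UTF-8; the module name Latin1Sketch and the function to_utf8/1 are made up for this example. The last call shows where this stops working: a byte that means something else in another encoding (0x80 is € in Windows-1252) simply becomes the code point of the same value.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper: leaves valid UTF-8 alone, otherwise assumes the
  # whole binary is Latin-1 and re-encodes it as UTF-8.
  def to_utf8(binary) when is_binary(binary) do
    if String.valid?(binary) do
      binary
    else
      :unicode.characters_to_binary(binary, :latin1)
    end
  end
end

Latin1Sketch.to_utf8("Bient\xf4t")
# → "Bientôt"

# Already valid UTF-8 is returned unchanged.
Latin1Sketch.to_utf8("Bientôt")
# → "Bientôt"

# Byte 0x80 becomes the control character U+0080, not "€".
Latin1Sketch.to_utf8(<<0x80>>) == <<0xC2, 0x80>>
# → true
```

For anything other than Latin-1 you would need a lookup table for the source encoding or a library that provides one.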