Jason complains when two emojis go together

Hi,

When a text column of my Phoenix/LiveView app includes ":christmas_tree::mrs_claus:t4:", Jason complains: invalid byte [1].

Each emoji individually is ok (:christmas_tree: or :mrs_claus:t4:). If they are separated by text it is also fine (:christmas_tree:whatever​:mrs_claus:t4:), but when they are together (:christmas_tree::mrs_claus:) Jason fails. If I manually reload the page, in my browser I see :christmas_tree::mrs_claus:οΏ½ β€” but it’s still failing and LiveView reloading continuously.

By the way, String.valid?/1 says that it is ok.

Any idea what might be wrong? Thanks!

[1]

[error] Ranch listener NotesclubWeb.Endpoint.HTTP had connection process started with :cowboy_clear:start_link/4 at #PID<0.5920.0> exit with reason: {%Jason.EncodeError{message: "invalid byte 0xF0 in <<46, 46, 46, 35, 32, 65, 100, 118, 101, 110, 116, 32, 79, 102, 32, 67, 111, 100, 101, 32, 50, 48, 50, 49, 32, 240, 159, 142, 132, 240, 159, 164, 182, 240, 46, 46, 46>>"}, [{Jason, :encode_to_iodata!, 2, [file: 'lib/jason.ex', line: 213, error_info: %{module: Exception}]}, {Phoenix.Socket.V2.JSONSerializer, :encode!, 1, [file: 'lib/phoenix/socket/serializers/v2_json_serializer.ex', line: 70]}, {Phoenix.Socket, :encode_reply, 2, [file: 'lib/phoenix/socket.ex', line: 690]}, {Phoenix.Socket, :handle_in, 4, [file: 'lib/phoenix/socket.ex', line: 602]}, {Phoenix.Endpoint.Cowboy2Handler, :websocket_handle, 2, [file: 'lib/phoenix/endpoint/cowboy2_handler.ex', line: 145]}, {:cowboy_websocket, :handler_call, 6, [file: '/Users/hec/code/e/notesclub/deps/cowboy/src/cowboy_websocket.erl', line: 528]}, {:cowboy_http, :loop, 1, [file: '/Users/hec/code/e/notesclub/deps/cowboy/src/cowboy_http.erl', line: 257]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}

:christmas_tree::mrs_claus:t4: and :christmas_tree::mrs_claus: are different:

iex(1)> String.codepoints("🎄🤶🏽")
["🎄", "🤶", "🏽"]
iex(2)> String.codepoints("🎄🤶")
["🎄", "🤶"]

However, I don't have any problem decoding either of them with Jason, so you must have a problem with how you are encoding unicode somewhere. It looks like something is getting the encoding of the U+1F3FD skin-tone modifier wrong.

iex(2)> Jason.decode!(~S{["🎄🤶🏽"]})
["🎄🤶🏽"]
iex(3)> Jason.decode!(~S{["🎄🤶"]})
["🎄🤶"]
iex(4)> Jason.encode(["🎄🤶🏽"])
{:ok, "[\"🎄🤶🏽\"]"}
iex(5)> Jason.encode(["🎄🤶"])
{:ok, "[\"🎄🤶\"]"}
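The binary from the error message can also be checked directly. This is a minimal sketch, assuming the bytes between "2021 " and the trailing "..." in the log are copied as-is (the variable names are only illustrative). Both emoji are complete, but only the first byte (0xF0) of the four-byte skin-tone modifier is present:

```elixir
# Bytes copied from the Jason.EncodeError message, after "2021 " and before "...".
tree = <<240, 159, 142, 132>>    # U+1F384, complete
claus = <<240, 159, 164, 182>>   # U+1F936, complete
truncated = tree <> claus <> <<240>>

# The full UTF-8 encoding of the U+1F3FD skin-tone modifier is four bytes.
modifier = <<0x1F3FD::utf8>>
IO.inspect(:binary.bin_to_list(modifier))               # [240, 159, 143, 189]

IO.inspect(String.valid?(truncated))                    # false
IO.inspect(String.valid?(tree <> claus <> modifier))    # true
```

So the string was most likely cut in the middle of the modifier before it reached Jason, rather than being mis-encoded from the start.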

Also: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software


Thank you @adamu for the detailed message and the link.

The problem was in a regex that extracted the match's surroundings and broke the UTF-8 characters.
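For anyone hitting the same error, here is a rough sketch of how this can happen. The actual regex is not shown in this thread, so the pattern below is hypothetical: it takes the match plus up to 10 characters of context. Without the `u` modifier, Elixir regexes work on bytes, so the context window can end in the middle of a multi-byte character:

```elixir
text = "Advent Of Code 2021 🎄🤶🏽"

# Hypothetical pattern: the match plus up to 10 "characters" of context.
# Without the `u` modifier, `.` matches one byte, so the 10-byte window
# ends right after the first byte of the skin-tone modifier.
[bytewise] = Regex.run(~r/2021.{0,10}/, text)
IO.inspect(String.valid?(bytewise))   # false

# With `u`, `.` matches whole codepoints and the result stays valid UTF-8.
[unicode] = Regex.run(~r/2021.{0,10}/u, text)
IO.inspect(String.valid?(unicode))    # true
```

Note that `u` counts codepoints, not graphemes, so a narrow window could still separate a skin-tone modifier from its base emoji, although the result would remain valid UTF-8. If the surroundings should be cut on visible characters, String.slice/3 works on graphemes.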
