Doerge
(Syntax Error) invalid or reserved Unicode code point
Hey everybody,
I get a response from a (pretty funky) API and am trying to decode it with Poison.
It fails with something like this:
{:error,
 %Poison.ParseError{
   data: " { \"config\": ...",
   skip: 86293,
   value: "\\ud83d"
 }}
Toying around with it in IEx also doesn’t work:
iex(1)> tmp = "\ud83d"
** (SyntaxError) iex:1:8: invalid or reserved Unicode code point \u{d83d}. Syntax error after: \u
I’m not really sure how to proceed from here.
How can I strip the invalid code points from my string before decoding it? It’s OK for me to just drop the invalid ones, but I would like to keep the valid ones.
Marked As Solved
al2o3cr
\ud83d is a UTF-16 high surrogate, one half of a “surrogate pair”. It’s normally followed by a low surrogate (\uDC00 to \uDFFF) so that the two together represent a character above \uFFFF in UTF-16 systems. If there’s one in the text by itself (a “lone surrogate”), it doesn’t stand for any character, so that string can’t be represented in UTF-8 at all and is invalid.
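To make the pairing concrete, here is a small sketch of how a high and a low surrogate combine into a single code point. The module name SurrogateMath is made up for illustration; the arithmetic is the standard UTF-16 decoding formula.

```elixir
defmodule SurrogateMath do
  # Hypothetical helper: combines a high surrogate (0xD800..0xDBFF) and a
  # low surrogate (0xDC00..0xDFFF) into the code point they encode.
  def combine(high, low) when high in 0xD800..0xDBFF and low in 0xDC00..0xDFFF do
    # Each surrogate carries 10 bits; the pair is offset by 0x10000.
    0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
  end
end

codepoint = SurrogateMath.combine(0xD83D, 0xDE04)
IO.inspect(codepoint)          # 128516, i.e. U+1F604
IO.puts(<<codepoint::utf8>>)   # 😄

# The three bytes a lone \uD83D would need are rejected as UTF-8:
IO.inspect(String.valid?(<<0xED, 0xA0, 0xBD>>))   # false
```

The when guard means combine/2 raises a FunctionClauseError for anything that is not a proper high/low pair, which mirrors why a parser has nothing sensible to do with a lone \ud83d.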
Jason can parse these (when they are paired):
iex> {:ok, decoded} = Jason.decode("{\"foo\":\"\\uD83D\\uDE04\"}")
{:ok, %{"foo" => "😄"}}
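But by default it doesn’t produce them when encoding; the emoji is emitted as raw UTF-8 instead (the escape: :unicode_safe option switches to \u escapes):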
iex> {:ok, encoded} = Jason.encode(decoded)
{:ok, "{\"foo\":\"😄\"}"}
iex> String.to_charlist(encoded)
[123, 34, 102, 111, 111, 34, 58, 34, 128516, 34, 125]
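For the original question of dropping lone surrogates before decoding, one option is to filter the raw JSON text with a regex. This is only a minimal sketch: the module and function names are made up for illustration (they are not part of Poison or Jason), and it assumes the surrogates only ever show up as \uXXXX escapes in the response body.

```elixir
defmodule SurrogateCleaner do
  # Hypothetical helper, not part of Poison or Jason.
  # Alternatives are tried left to right at each position:
  #   1. an escaped backslash, consumed so a following "ud83d" is left alone
  #   2. a high surrogate escape directly followed by a low surrogate escape
  #   3. any other surrogate escape, which must therefore be a lone one
  defp pattern do
    ~r/\\\\|\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2}|\\u[dD][89a-fA-F][0-9a-fA-F]{2}/
  end

  def strip_lone_surrogates(json) when is_binary(json) do
    Regex.replace(pattern(), json, fn match ->
      # Only alternative 3 yields a 6-byte match (\uXXXX); drop it, keep the rest.
      if byte_size(match) == 6, do: "", else: match
    end)
  end
end

raw = ~S|{"ok":"\uD83D\uDE04","bad":"x\ud83dy"}|
IO.puts(SurrogateCleaner.strip_lone_surrogates(raw))
# {"ok":"\uD83D\uDE04","bad":"xy"}
```

The cleaned string can then be handed to the JSON decoder as usual. Replacing the lone escape with \uFFFD instead of an empty string would be the alternative if you want to keep a visible marker where something was dropped.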
Also Liked
kip
Emoji are definitely part of the Unicode specification.
Doerge
Thanks for explaining, and pointing me to Jason! It works perfectly!








