Hello!
We are working with responses from an API that contain HTML scraped from websites. Some of these responses contain Unicode code points that Jason is unable to decode. We have a Python fallback script that runs whenever this happens, and it can JSON-decode these large strings just fine. I’ve also tried Jiffy and Poison, but they fail at the same code point as Jason.
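Looking at the payload, I suspect the culprit is the `\udc51` escape, which is an unpaired UTF-16 low surrogate. A quick repro in Python (since that’s the fallback that works for us) seems to confirm that Python’s `json` accepts the lone surrogate, while the resulting string is not encodable as strict UTF-8, which is roughly what Jason/Jiffy/Poison enforce:

```python
import json

# Python's json module accepts a lone (unpaired) surrogate escape...
s = json.loads('"\\udc51"')
print(len(s))  # 1: a single lone-surrogate code point, U+DC51

# ...but the resulting string cannot be encoded as strict UTF-8,
# which is what the strict Erlang/Elixir decoders appear to require.
try:
    s.encode("utf-8")
except UnicodeEncodeError as e:
    print("not strict UTF-8:", e.reason)
```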
I’ve also tried several Elixir/Erlang String and Unicode functions to filter out anything that isn’t valid UTF-8, but the offending code points slip through and the whole string is reported as valid UTF-8.
Below is a snippet of the response we are trying to decode. It is only a very small part of the HTML we occasionally get back; I’ve included just the portion that causes the decoding problem. Any help is appreciated!
{"server": "Microsoft-IIS/10.0", "headers_hash": 1111111111, "host": "127.0.0.1", "html": "\ufffdPNG\r\n\u001a\n\u0000\u0000\u0000\rIHDR\u0000\u0000\u0003\u0000\u0000\u0000\u0002f\b\u0006\u0000\u0000\u0000\ufffd[\ufffd}\u0000\u0000\u0000\u0001sRGB\u0000\ufffd\ufffd\u001c\ufffd\u0000\u0000\u0000\u0004gAMA\u0000\u0000\ufffd\ufffd\u000b\ufffda\u0005\u0000\u0000\u0000\tpHYs\u0000\u0000\u000e\ufffd\u0000\u0000\u000e\ufffd\u0001\ufffd\u0007R\ufffd\ufffd\u04b6\u03c6\ufffd\udc51\ufffd}6\ufffdc\ufffd'\u0005\ufffd\ufffd\ufffd/"}
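For reference, a minimal sketch of the kind of Python fallback we run today (the function name is just illustrative): decode leniently, then scrub any unpaired surrogates so the resulting strings are valid UTF-8 again. Ideally I’d like to do something equivalent on the Elixir side instead of shelling out:

```python
import json

def decode_lenient(raw: str) -> dict:
    # Python's json.loads tolerates lone surrogate escapes such as \udc51.
    data = json.loads(raw)
    # Re-encode each string with errors="replace" so any unpaired
    # surrogate becomes '?' and the value is valid UTF-8 again.
    return {
        k: (v.encode("utf-8", "replace").decode("utf-8")
            if isinstance(v, str) else v)
        for k, v in data.items()
    }

print(decode_lenient('{"html": "ok\\udc51ok"}'))  # {'html': 'ok?ok'}
```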