Make sure that your database is encoding-aware and not just treating text as an opaque binary blob.
Then write a program that fixes all the malformed data you have according to the policy you have decided to go with.
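Something like this, as a rough sketch only (the partial CP437 map, the `Cp437Repair` name, and the whole “transcode from CP437, otherwise use the replacement character” policy are just placeholders for whatever policy you actually pick):

```elixir
defmodule Cp437Repair do
  # Hypothetical partial CP437 -> Unicode map; a real repair pass would
  # need the full code page (0xFA, for example, is the CP437 middle dot).
  @cp437 %{0xFA => "·"}

  # Split the binary into valid/invalid UTF-8 chunks and transcode only
  # the invalid ones, so data that is already good passes through untouched.
  def repair(binary) when is_binary(binary) do
    binary
    |> String.chunk(:valid)
    |> Enum.map_join(fn chunk ->
      if String.valid?(chunk), do: chunk, else: transcode(chunk)
    end)
  end

  defp transcode(chunk) do
    # Map each byte; fall back to U+FFFD so nothing is silently dropped.
    for <<byte <- chunk>>, into: "", do: Map.get(@cp437, byte, "�")
  end
end

Cp437Repair.repair("caf" <> <<0xFA>>)
#=> "caf·"
```

Run it once over the affected rows and you only have to solve the problem in one place.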
Keeping malformed data in the DB and reinventing fixes in each consumer of the database is what gave us all those question marks and backslashes in the early 2000s: so many PHP folks tried to fix data before pushing it to the DB, another set of developers tried to fix it when pulling it out, and the result was “double-fixed” data that was often even worse, because the round trip of fixing started over with every edit.
On top of that, developers just didn’t understand all the levels of escaping…
FWIW, the byte 0xFA represents a “middle dot” character ( · ) on the old DOS code page 437.
Untangling old data like this is exceptionally difficult in general; I once had to migrate a database that had been storing whatever native character set was sent by clients (due to an old IE6 bug) and thus had everything from Arabic to Cyrillic ISO-8859 codepages mixed together. We had to hand-code a translation function from “customer country” to “likely character set” to deal with all that mess…
String.chunk/2 returns a list of binaries (the “valid” and “invalid” chunks), but this code passes them BOTH along to be serialized to JSON, hence the error.
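For illustration, a minimal sketch of keeping only the valid chunks (which, as the next reply points out, is probably not what you want for a hash value anyway):

```elixir
bad = "abc" <> <<0xFA>> <> "def"

bad
|> String.chunk(:valid)          # ["abc", <<0xFA>>, "def"]
|> Enum.filter(&String.valid?/1) # keep only the chunks that are valid UTF-8
|> Enum.join()
#=> "abcdef"
```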
Dropping the characters wouldn’t be desirable anyway: since you mention the value is a SHA-256 hash, that’d be like using scissors to trim a big floppy disk so it fits in a smaller drive.
Values in JSON strings need to be valid UTF-8; if you’re trying to pass arbitrary binary data, encode it first, for example as hexadecimal (34f89e4c23a0d) or Base64. Elixir supports both through the Base module in the standard library.
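For example, a quick sketch (Jason is assumed as the JSON encoder here, and the hashed payload is made up):

```elixir
digest = :crypto.hash(:sha256, "whatever you are hashing")

# Hex: longer, but human-readable and easy to compare by eye.
Jason.encode!(%{checksum: Base.encode16(digest, case: :lower)})

# Base64: more compact, still plain ASCII and therefore always valid UTF-8.
Jason.encode!(%{checksum: Base.encode64(digest)})
```

On the other side you get the raw bytes back with Base.decode16!/2 (passing case: :lower) or Base.decode64!/1.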