Is it possible to view extended ASCII characters in IEx?
I got this string:
<<83, 97, 105, 110, 116, 32, 65, 110, 110, 101, 146, 115, 32, 67, 97, 116, 104, 111, 108, 105, 99, 32, 83, 99, 104, 111, 111, 108, 32, 73, 114, 97, 119, 111, 59, 59>>
It seems only ASCII 146 (Æ) prevents it from displaying.
Is there any reason we don’t display ASCII above 127 in IEx?
This is not properly encoded UTF-8. If you know the input encoding, you can use a conversion library to convert from one encoding to another.
Also, please remember that anything >= 128 is not ASCII; ASCII has only 7 bits. There are some 8-bit encodings that are sometimes referred to as 8-bit ASCII or extended ASCII, but that’s not their “true” name.
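As a quick check, something like the following lists every byte that falls outside 7-bit ASCII. This is only a sketch: `raw` is the binary from the question, and `non_ascii` is a name picked for illustration.

```elixir
# The binary from the question, byte for byte.
raw =
  <<83, 97, 105, 110, 116, 32, 65, 110, 110, 101, 146, 115, 32, 67, 97, 116,
    104, 111, 108, 105, 99, 32, 83, 99, 104, 111, 111, 108, 32, 73, 114, 97,
    119, 111, 59, 59>>

# Walk the binary one byte at a time and keep anything above 127,
# i.e. anything that does not fit in 7 bits.
non_ascii = for <<byte <- raw>>, byte > 127, do: byte

IO.inspect(non_ascii, label: "non-ASCII bytes")
# => [146]
```

If that list comes back empty, the binary is plain ASCII and therefore also valid UTF-8.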
I don’t know the source encoding, but I was just going with “see these characters in the data source, but not on IEx”, and it looked like 146 was the only offending char, and char 146 looks … printable.
This depends entirely on the encoding. To print in IEx you need to have the string UTF-8 encoded, which may mean converting from Latin-1 or whatever the source encoding is.
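For the Latin-1 case, a conversion can be sketched with Erlang's `:unicode` module, which ships with OTP. Note the assumption: nothing in this thread confirms the data really is Latin-1, and `Latin1Sketch` is a hypothetical module name used only for this example.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper: treat every input byte as one Latin-1 code point
  # and re-encode the result as UTF-8.
  def to_utf8(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1, :utf8)
  end
end

# "Anne", then the problem byte 146, then "s".
raw = <<65, 110, 110, 101, 146, 115>>
converted = Latin1Sketch.to_utf8(raw)

IO.inspect(String.valid?(raw), label: "raw is valid UTF-8")
# => false
IO.inspect(String.valid?(converted), label: "converted is valid UTF-8")
# => true
```

The result is valid UTF-8, but whether it shows the character you expect still depends on the source really being Latin-1. In Latin-1, byte 146 is a control character rather than a visible glyph, so if the output looks wrong, the input is probably in some other 8-bit encoding.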
So how do you deal with an UNKNOWN encoding upfront?
I’m reading from JBASE, and I have no idea what encoding they use. Displaying in IEx is not really a requirement, it’s just a safety check; I’m basically going from
The thing is, I’m losing characters if I filter like so:
def ascii(v) do
  if String.printable?(v) do
    # Keep bytes 32..126 plus 252, 253 and 254; every other byte is dropped.
    for <<c <- v>>, c in 32..126 || c in [252, 253, 254], into: "", do: <<c>>
  end
end
I do not know what JBASE is, but XML has the strict requirement to be encoded in UTF-8 unless specified otherwise.
So you need to know your input encoding to be able to convert to your output encoding.
If your input viewer does know the encoding and is not just doing wild guesses or a default fallback, then it is probably Codepage 437 encoded.
If, though, it’s just a default fallback, then the actual byte could represent anything…
Perhaps take a look at your rendered string and decide on what makes sense…
You don’t. You can try guessing of course, but ultimately if you don’t know the encoding then you have no idea what characters are represented by values 128-255.
At this point the safe bet is to stick with ASCII: 32..126.
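If you go the ASCII-only route, one variation is to mark the bytes you cannot interpret instead of silently dropping them, so the loss stays visible in the output. A minimal sketch, where the module name `AsciiOnly` and the `"?"` placeholder are made up for the example:

```elixir
defmodule AsciiOnly do
  # Hypothetical helper: keep printable ASCII (32..126) as-is and replace
  # every other byte with a placeholder string.
  def sanitize(bytes, placeholder \\ "?") when is_binary(bytes) do
    for <<c <- bytes>>, into: "" do
      if c in 32..126, do: <<c>>, else: placeholder
    end
  end
end

IO.inspect(AsciiOnly.sanitize(<<65, 110, 110, 101, 146, 115>>))
# => "Anne?s"
```

Passing `""` as the placeholder gives you the plain drop-everything-else behaviour.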