How to view extended ASCII (> 127) characters in IEx?

Is it possible to view extended ASCII characters in IEx?

I got this string:

<<83, 97, 105, 110, 116, 32, 65, 110, 110, 101, 146, 115, 32, 67, 97, 116, 104, 111, 108, 105, 99, 32, 83, 99, 104, 111, 111, 108, 32, 73, 114, 97, 119, 111, 59, 59>>

It seems that only byte 146 (Æ) prevents it from being displayed as a string.

Is there any reason IEx doesn't display characters above ASCII 127?

This is not properly encoded UTF-8. If you know the input encoding, you can use a conversion library to convert from that encoding to UTF-8.


edit

Also, please remember that anything >= 128 is not ASCII; ASCII has only 7 bits. Some 8-bit encodings are sometimes referred to as 8-bit ASCII or extended ASCII, but that's not their “true” name.
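A quick way to check this in IEx, as a minimal sketch that uses a shortened copy of the binary from the question (the variable names are only illustrative):

```elixir
# Shortened version of the binary from the question: "Saint Anne", byte 146, "s".
bytes = <<83, 97, 105, 110, 116, 32, 65, 110, 110, 101, 146, 115>>

# Byte 146 (0x92) is a UTF-8 continuation byte with no lead byte before it,
# so the binary as a whole is not valid UTF-8.
valid = String.valid?(bytes)
# => false

# IEx falls back to the <<...>> notation when a binary is not a printable string.
printable = String.printable?(bytes)
# => false

# Without the offending byte, everything is 7-bit ASCII and prints normally.
ascii_printable = String.printable?("Saint Annes")
# => true
```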


I don’t know the source encoding, but I was just going with

“I can see these characters in the data source, but not in IEx”

and it looked like 146 was the only offending character, and character 146 looks … printable.

This depends entirely on the encoding. To print in IEx, the string has to be UTF-8 encoded, which may mean converting from Latin-1 or whatever the source encoding is.
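As a hedged sketch of such a conversion, Erlang's `:unicode.characters_to_binary/3` can re-encode Latin-1 bytes as UTF-8. Latin-1 is only an assumption here; nothing in the thread confirms it is the real source encoding:

```elixir
# "Anne", byte 146, "s" taken from the binary in the question.
bytes = <<65, 110, 110, 101, 146, 115>>

# Assumption: the input is Latin-1 (ISO 8859-1). Every byte then maps to the
# Unicode code point with the same number, and is re-encoded as UTF-8.
utf8 = :unicode.characters_to_binary(bytes, :latin1, :utf8)
# => <<65, 110, 110, 101, 194, 146, 115>>

# The result is valid UTF-8, although U+0092 is a control character in
# Latin-1, so it may still not be the character the source system meant.
valid = String.valid?(utf8)
# => true
```

If the real encoding is something else, the conversion still succeeds but silently yields the wrong character, which is why the input encoding matters.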

So how do you deal with an UNKNOWN encoding upfront?

I’m reading from JBASE, and I have no idea what encoding they use. Displaying in IEx is not really a requirement, but it's just a safety check; I’m basically going from JBASE to XML.

The thing is, I’m losing characters if I filter like so:

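  # Returns v unchanged when it is already printable UTF-8; otherwise keeps
  # only printable ASCII bytes plus the raw bytes 252, 253 and 254.
  def ascii(v) do
    if String.printable?(v) do
      v
    else
      for <<c <- v>>, c in 32..126 or c in [252, 253, 254], into: "", do: <<c>>
    end
  end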

I do not know what JBASE is, but XML has the strict requirement to be encoded in UTF-8 unless specified otherwise.

So you need to know your input encoding, to be able to convert to your output encoding.


edit

If your input viewer does know the encoding and is not just doing wild guesses or a default fallback, then it is probably Codepage 437 encoded.

If it is just a default fallback, though, then the actual byte could represent anything.

Perhaps take a look at your rendered string and decide on what makes sense…
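One way to do that comparison, as a rough sketch: render the byte under a few candidate encodings and see which reading looks plausible. The module name and the tiny mapping table are hypothetical and cover byte 146 only; Windows-1252 is included purely as a second guess for comparison, not because the thread confirms it:

```elixir
defmodule GuessEncoding do
  # Hypothetical, hand-written mappings for byte 146 only:
  # Codepage 437 maps it to "Æ", Windows-1252 maps it to "’".
  @byte_146 %{cp437: "Æ", windows_1252: "’"}

  # Renders `bytes` as UTF-8 under the given candidate encoding.
  # Any other byte >= 128 becomes "?" because this sketch has no mapping for it.
  def render(bytes, encoding) when is_binary(bytes) do
    replacement = Map.fetch!(@byte_146, encoding)

    for <<c <- bytes>>, into: "" do
      cond do
        c < 128 -> <<c>>
        c == 146 -> replacement
        true -> "?"
      end
    end
  end
end

sample = <<83, 97, 105, 110, 116, 32, 65, 110, 110, 101, 146, 115>>

as_cp437 = GuessEncoding.render(sample, :cp437)
# => "Saint AnneÆs"

as_windows_1252 = GuessEncoding.render(sample, :windows_1252)
# => "Saint Anne’s"
```

Whichever rendering reads like real text is the better guess, but it remains a guess.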


You don’t. You can try guessing of course, but ultimately if you don’t know the encoding then you have no idea what characters are represented by values 128-255.


At this point the safe bet is to stick with ASCII: 32..126.
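A minimal sketch of that fallback is below. The module name is made up, and replacing unknown bytes with "?" instead of dropping them is an assumption, chosen so that the lost characters stay visible in the output:

```elixir
defmodule AsciiOnly do
  # Keeps printable ASCII bytes (32..126) and replaces every other byte,
  # including tabs and newlines, with "?".
  def sanitize(bin) when is_binary(bin) do
    for <<c <- bin>>, into: "" do
      if c in 32..126, do: <<c>>, else: "?"
    end
  end
end

cleaned = AsciiOnly.sanitize(<<65, 110, 110, 101, 146, 115>>)
# => "Anne?s"
```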