Strange behavior with strings

Is this the expected behavior for strings?

def rec_text do
  # content of rec.txt
  s = "ALTITUDE ENTERPRISES INT’L"
  a = File.read!("rec.txt")

  {a, s}
end
# Output
> X.rec_text
{"ALTITUDE ENTERPRISES INTÔÇÖL", "ALTITUDE ENTERPRISES INTÔÇÖL"}

# pasted directly into IEx it outputs correctly as expected
> "ALTITUDE ENTERPRISES INT’L"
"ALTITUDE ENTERPRISES INT'L"
>

No clue why this is happening.

The string s should be output correctly; maybe your terminal is not using the UTF-8 code page? Are you on Windows? As for a, it depends on the encoding of the file.

I’m on Windows 10, I’m using Windows Terminal + conEmu.

The troubling bit is that when I pass the same value to ODBC I get an error. So it’s not just the screen output, it’s the actual value.

What is the encoding of the source-file?

It’s a plain text file; I did not specify any encoding.

Hi - looks like something in your toolchain is tripping you up by interpreting the text file through code page 858 (see "Code page 858" on Wikipedia).

The quotation mark ( ’ ) in your string is encoded in your file as hex e2 80 99 (its UTF-8 encoding), which read through code page 858 gives ÔÇÖ
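To see the mismatch described above in Elixir, here is a minimal sketch. The cp858_excerpt map is a hand-written, three-entry excerpt of the code page 858 table made up for this illustration; it is not a library API and covers only the bytes in question.

```elixir
# UTF-8 encoding of the right single quotation mark (U+2019)
quote_bytes = <<0xE2, 0x80, 0x99>>

# Hypothetical, partial lookup table: only the three code page 858
# entries needed for this example
cp858_excerpt = %{0xE2 => "Ô", 0x80 => "Ç", 0x99 => "Ö"}

# Read each byte on its own, the way a single-byte code page would
as_cp858 =
  quote_bytes
  |> :binary.bin_to_list()
  |> Enum.map_join(&Map.fetch!(cp858_excerpt, &1))

IO.inspect(quote_bytes == "\u2019")  # => true
IO.inspect(as_cp858)                 # => "ÔÇÖ"
```

The bytes are the same in both readings; only the table used to interpret them differs.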

Try to get your toolchain to respect UTF-8 - it displays correctly that way… you have my deepest sympathy - I have never touched Windows 10 so can’t be of help there…

R.
Fridrik


@fsa from what you are saying, it is expected that the console display will be wrong, but the underlying string value in Elixir should not change.

That’s the part that worries me.
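One way to check the underlying value independently of what the console shows is to look at the raw bytes. A minimal sketch, assuming the string from the question; it is written with the \u2019 escape so that the encoding of the source file itself cannot interfere:

```elixir
# The string from the question, with the quote given as an escape
s = "ALTITUDE ENTERPRISES INT\u2019L"

# Hex dump of the bytes actually held in the binary
hex = Base.encode16(s)
IO.puts(hex)

# A UTF-8 encoded ’ followed by L shows up as E2 80 99 4C
IO.inspect(String.ends_with?(hex, "E280994C"))  # => true

# String.valid?/1 reports whether the binary is well-formed UTF-8
IO.inspect(String.valid?(s))                    # => true
```

Running the same Base.encode16/1 call on the result of File.read!("rec.txt") would show whether the file holds these bytes or something else.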

wow, how did you solve this case, detective? Have you memorized all code pages?

When you encode some text into a binary you have to use some encoding. For example you could have this codepage:

A = 01
B = 02
C = 03

So if you encode the text “ABBA” you’d get the binary 01 02 02 01.

If you read this file, you need to know the encoding to decode it. If you use another code page, e.g. this one:

1 = 01
2 = 02
3 = 03

this would read as 1221 - not what you want at all.
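The toy example above can be sketched in Elixir. The two maps below are the made-up code pages from this post, and decode is a hypothetical helper written only for this illustration:

```elixir
# The two toy code pages from the post (not real encodings)
letters = %{1 => "A", 2 => "B", 3 => "C"}
digits = %{1 => "1", 2 => "2", 3 => "3"}

# Invert the letters table so it can be used for encoding
letters_encode = Map.new(letters, fn {byte, char} -> {char, byte} end)

# Encode "ABBA" with the letters code page
encoded =
  "ABBA"
  |> String.graphemes()
  |> Enum.map(&Map.fetch!(letters_encode, &1))
  |> :binary.list_to_bin()

# Hypothetical helper: decode a binary one byte at a time with a given table
decode = fn binary, table ->
  binary
  |> :binary.bin_to_list()
  |> Enum.map_join(&Map.fetch!(table, &1))
end

IO.inspect(encoded)                    # => <<1, 2, 2, 1>>
IO.inspect(decode.(encoded, letters))  # => "ABBA"
IO.inspect(decode.(encoded, digits))   # => "1221"
```

The binary never changes between the two decode calls; only the table used to read it does.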


Hi again - I’m not on Windows, and have never seen or used conEmu, but there seems to be a setting on the "ConEmu | Settings › Environment" page that talks about utf8 … maybe play around with that?

If the file is utf8 encoded, and your display environment respects utf8, then every piece of the toolchain between the dead file on disk and the display must respect utf8 too - if any of the pieces in your toolchain starts to chop up the content of the file according to a different encoding, you will be tripped up by surprise…

R.
Fridrik.

yes, I have Chcp utf-8 set

Hi again - have you tried to google for conEmu? I am a little lost on this end, as my familiarity with the toolchain you are using is nonexistent :slight_smile:

ConEmu | Unicode Support seems to have relevant info…
you should maybe also check the console font you are using…
you could try to run chcp 65001 before you start iex…

… pester some devs running on windows(10)…!!! :stuck_out_tongue_winking_eye:

Be brave and explore… just remember to note down what works, and what not… and take backups… :slight_smile:

R.
Fridrik.


This worked, thanks so much!

Yay, success!!! :slight_smile:
R.
Fridrik.

Hi - hah, no - just brutal trial and error - Synalyze It! is a relatively quick way…


Glad it works now. But shouldn’t IEx somehow communicate the encoding it wants to the shell?