Applying :crypto and Jason.encode on Unicode characters

I spent quite a bit of time pondering this, as I am unable to get output identical to that of the PHP function I'm trying to duplicate:

plaintext = ["<foo>","'bar'","\"baz\"","&blong&"] |> Jason.encode!
key = "secreted"
iv = "PlayerAZ"
:crypto.crypto_one_time(:des_cbc, key, iv, plaintext, [{:encrypt, :true},{:padding, :pkcs_padding}] ) |> Base.encode64

PHP:

<?php
        $plaintext = json_encode(array('<foo>',"'bar'",'"baz"','&blong&'));
        $key = 'secreted';
        $vector = 'PlayerAZ';
        $encryptedString = openssl_encrypt($plaintext, 'des-cbc', $key, OPENSSL_RAW_DATA, $vector);
        echo 'Encrypted: '.$encryptedString."\n";
        echo ''.$plaintext."\n";
        
        $encryptedStringBase64 = base64_encode($encryptedString);
        echo 'Encrypted + base 64:'.$encryptedStringBase64."\n";

        echo 'Decrypted:'.openssl_decrypt(base64_decode($encryptedStringBase64),'des-cbc',  $key, OPENSSL_RAW_DATA, $vector)."\n";

Sandbox: https://wtools.io/php-sandbox/b8Xw
In both cases the output is the same: "rQnm2tysllww+bZLSwOa9UQUc0EEH3sNlu84k5o33VgbmbnHnIZx0g=="

The problem appears when I change the plaintext variable to include a Unicode character, "\xc3\xa9", as follows:
array('<foo>',"'bar'",'"baz"','&blong&', "\xc3\xa9") (PHP)
["<foo>","'bar'","\"baz\"","&blong&","\xc3\xa9"] (Elixir)

Both function outputs diverge and no longer agree.

May I know what I am doing wrong? Thanks.

Just curious, why do you want to do that? The point of encryption is to decrypt, so shouldn’t you check the decrypted data instead of the encrypted data?

If you just want a checksum, there are much easier ways. But then the JSON format is not unique; there could be insignificant spaces. So two equivalent JSON documents may not match in checksum.

Is it possible that PHP is sending the Unicode string for encrypting as-is, "\xc3\xa9", and Elixir is interpreting it as "é" first?

Alternatively the unicode representation is being changed somehow depending on the language:

iex(4)> String.codepoints("\u00e9")
["é"]
iex(5)> String.codepoints("\xc3\xa9")
["é"]

I need to duplicate the PHP encryption function in its entirety and get perfectly identical output, as I am passing an encrypted string to external software written in PHP.

Hey, that's a possibility. Hmm, so how do I do the same in Elixir, passing it as-is without modification? But part of me still thinks that even if Elixir modifies the Unicode map, it's still the same character, right?

Those are the million dollar questions :money_with_wings:

There is a difference between the raw string "\xc3\xa9" and "é": one is interpreted as Unicode and the other is like the charlist [?\, ?x, ?c, ?3, …etc].
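The raw-versus-interpreted distinction can be sketched in Elixir with the ~S sigil, which disables escape processing. This is only an illustration of the idea; in PHP the analogous split is between single-quoted strings (no \x escapes) and double-quoted strings (escapes processed), and the PHP snippet in the question uses double quotes.

```elixir
interpreted = "\xc3\xa9"   # escapes are processed: the UTF-8 bytes of é
raw = ~S(\xc3\xa9)         # ~S keeps the backslashes: eight ASCII characters

IO.inspect(byte_size(interpreted))
# prints: 2
IO.inspect(String.length(interpreted))
# prints: 1

IO.inspect(byte_size(raw))
# prints: 8
IO.inspect(String.graphemes(raw))
# prints: ["\\", "x", "c", "3", "\\", "x", "a", "9"]
```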

You can normalize the unicode string in one of 2 ways: NFC or NFD:

iex(3)> :unicode.characters_to_nfc_list("\u00e9")  
[233]
iex(4)> :unicode.characters_to_nfd_list("\u00e9")
[101, 769]
iex(5)> :unicode.characters_to_nfd_list("\xc3\xa9")
[101, 769]
iex(6)> :unicode.characters_to_nfc_list("\xc3\xa9")
[233]

Trying out the Unicode version in that PHP sandbox shows that PHP encodes é as \u00e9 in JSON.

Jason (and the JSON standard) permits Unicode characters in strings, so they pass the character through unescaped.

You can tell Jason to escape everything but ASCII characters by passing the escape: :unicode_safe option to Jason.encode!
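A rough sketch of how that option could slot into the original pipeline follows. It assumes Jason is pulled in with Mix.install (which needs network access) and that the local OpenSSL build still offers DES; the variable names are illustrative. It deliberately prints both JSON strings so they can be compared byte for byte with PHP's json_encode output before encrypting, since any remaining difference in the JSON text (for example the letter case of the hex digits in the \u escapes) would still change the ciphertext.

```elixir
# Assumption: running as a standalone script; in a Mix project Jason would be a normal dependency.
Mix.install([{:jason, "~> 1.4"}])

items = ["<foo>", "'bar'", "\"baz\"", "&blong&", "\xc3\xa9"]

# Default behaviour: non-ASCII characters pass through as raw UTF-8.
default_json = Jason.encode!(items)

# With :unicode_safe, non-ASCII characters are written as \uXXXX escapes.
escaped_json = Jason.encode!(items, escape: :unicode_safe)

IO.puts(default_json)
IO.puts(escaped_json)

key = "secreted"
iv = "PlayerAZ"

# DES may be missing from newer OpenSSL builds, so check before encrypting.
encrypted =
  if :des_cbc in :crypto.supports(:ciphers) do
    :crypto.crypto_one_time(:des_cbc, key, iv, escaped_json, [
      {:encrypt, true},
      {:padding, :pkcs_padding}
    ])
    |> Base.encode64()
  else
    nil
  end

IO.inspect(encrypted)
```

If the Base64 output still differs from PHP's, diffing the printed `escaped_json` against the `$plaintext` echoed by the PHP script is probably the quickest way to find the remaining mismatch.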
