Writing unicode strings as a series of hex values

I recently needed to store non-Latin Unicode characters as constants in a function. The characters I wanted were the “box-drawing characters”. Essentially, I wanted a function that converts an atom to one of these characters. Below is the only way I could think of to do it, but I really don’t like the List.to_string()-ness of it. Is there a more idiomatic way to do this?


def direction_to_string(direction) do
  case direction do
    :ground -> "."
    :animal -> "S"
    {:n, :s} -> List.to_string([0x2551])
    {:e, :w} -> List.to_string([0x2550])
    {:n, :e} -> List.to_string([0x255A])
    {:n, :w} -> List.to_string([0x255D])
    {:s, :w} -> List.to_string([0x2557])
    {:s, :e} -> List.to_string([0x2554])
    _ -> ""
  end
end

I tried using bitstrings, like <<0x2557::16>>, but when I IO.puts this in Livebook I don’t get the output I expect.

Can’t you just do this? (works for me)

defmodule Test do
  def direction_to_string(direction) do
    case direction do
      :ground -> "."
      :animal -> "S"
      {:n, :s} -> "║"
      {:e, :w} -> "═"
      {:n, :e} -> "╚"
      {:n, :w} -> "╝"
      {:s, :w} -> "╗"
      {:s, :e} -> "╔"
      _ -> ""
    end
  end
end

IO.inspect(Test.direction_to_string({:n, :w}))

Edit: Provided your editor supports it, you should be able to use pretty much any printable unicode character in a string…

This is a good idea… I don’t know why I didn’t think to just do this. I guess I typically avoid having Unicode directly in source for fear that some editor may not respond well to it. But it’s 2023, so that fear is probably unfounded. I do like having it in hex because I can edit it without having to copy/paste Unicode chars or resort to a character map or other weird OS tools.

I hear you on the editor thing. A few community libraries have emojis etc. embedded in them, so I’m less concerned these days. If you want to maintain your allergy, you can also use the Unicode escape (see the String docs, Elixir v1.16.0-rc.0), e.g. "\u255A"
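For what it’s worth, here is a rough sketch of how the original function might look with \u escapes, so the source stays ASCII-only and the hex values stay editable. The module name BoxChars is just a placeholder I made up; the clauses mirror the ones from the question.

```elixir
defmodule BoxChars do
  # Hypothetical module name; the clauses mirror the function from the question.
  def direction_to_string(direction) do
    case direction do
      :ground -> "."
      :animal -> "S"
      # \uXXXX inside a double-quoted string is the code point written in hex.
      {:n, :s} -> "\u2551"
      {:e, :w} -> "\u2550"
      {:n, :e} -> "\u255A"
      {:n, :w} -> "\u255D"
      {:s, :w} -> "\u2557"
      {:s, :e} -> "\u2554"
      _ -> ""
    end
  end
end

# Prints the ╚ corner character.
IO.puts(BoxChars.direction_to_string({:n, :e}))
```

The escapes are resolved at compile time, so the result should be the same binaries that the List.to_string/1 version produces.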

A size specifier like ::16 means that the given number is represented with exactly two bytes. The UTF-8 representation of ╗ is three bytes:

iex(6)> <<0xE29597::24>>                       
"╗"

Or you can use the utf8 type directly:

iex(1)> <<0x2557::utf8>>
"╗"
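To make the difference concrete, here is a small sketch comparing the two specifiers byte by byte. The variable names are only for illustration.

```elixir
# ::16 stores the integer as two raw bytes, 0x25 and 0x57. Those happen to be
# the ASCII characters "%" and "W", not a UTF-8 encoding of U+2557.
two_bytes = <<0x2557::16>>
IO.inspect(two_bytes == <<0x25, 0x57>>)  # true
IO.inspect(two_bytes == "%W")            # true

# ::utf8 treats the integer as a code point and encodes it as UTF-8,
# which for U+2557 is the three bytes E2 95 97.
encoded = <<0x2557::utf8>>
IO.inspect(byte_size(encoded))                   # 3
IO.inspect(encoded == <<0xE2, 0x95, 0x97>>)      # true
IO.inspect(encoded == List.to_string([0x2557]))  # true
```

So the ::16 version is still a valid string, just not the one intended, which is presumably why IO.puts shows something unexpected rather than raising an error.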