nikody

Bitstring with codepoints of non-ASCII characters

I stumbled upon something that baffles me while working with PDF files. The problem arose due to the non-ASCII characters âãÏÓ on the second line of the PDFs.

I started with a base64 encoded PDF whose first two lines when decoded with Base.decode64! result in: <<37, 80, 68, 70, 45, 49, 46, 53, 13, 10, 37, 226, 227, 207, 211, 13, 10>> with some carriage returns and newlines.

This is not a valid string however, as 226, 227, 207, 211 are the utf-8 codepoints for “âãÏÓ”, but not its bitstring, which is <<195, 162, 195, 163, 195, 143, 195, 147>>.

If we substitute this in the original string so that it becomes <<37, 80, 68, 70, 45, 49, 46, 53, 13, 10, 37, 195, 162, 195, 163, 195, 143, 195, 147, 13, 10>> it is now valid.
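The difference between the two binaries above can be checked in IEx. This is a minimal sketch; the variable names `raw` and `reencoded` are chosen here for illustration and do not come from the original post.

```elixir
# The decoded PDF header bytes as quoted above.
raw = <<37, 80, 68, 70, 45, 49, 46, 53, 13, 10, 37, 226, 227, 207, 211, 13, 10>>

# The same header with each high byte replaced by its two-byte UTF-8 encoding.
reencoded =
  <<37, 80, 68, 70, 45, 49, 46, 53, 13, 10, 37, 195, 162, 195, 163, 195, 143, 195, 147, 13, 10>>

# 226 would start a three-byte UTF-8 sequence, but 227 is not a
# continuation byte, so the raw binary is not a valid string.
IO.inspect(String.valid?(raw))        # false
IO.inspect(String.valid?(reencoded))  # true

# The re-encoded binary is the string "%PDF-1.5", CRLF, "%" plus the
# four characters U+00E2, U+00E3, U+00CF, U+00D3, CRLF.
IO.inspect(reencoded == "%PDF-1.5\r\n%\u00E2\u00E3\u00CF\u00D3\r\n")  # true
```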

I also noticed that <<226 :: utf8>> returns "â", but <<226, 277 :: utf8>> returns <<226, 196, 149>>.

The PDF is naturally a binary, but why is this "âãÏÓ" part not decoded with its bitstring representation, but with its codepoints instead? I understand that this might sound stupid if the answer is obvious, but I still find it strange. Does it have something to do with the fact that each of these characters uses two bytes, so that when there is more than one of them they cannot be meaningfully distinguished?

And as a sidenote, does saving a PDF binary with these codepoints in it work because they are part of a comment (with a %) in the PDF structure and hence the line they are on is completely ignored?

5 comments

#unicode #strings #pdf #bitstring


2020-03-05 09:51:56 UTC

Marked As Solved

lucaong

I am not a PDF expert, but I think that PDF generally does not use UTF-8 encoding, but rather some single-byte encoding or a built-in font encoding. For example, if the Latin 1/ISO 8859-1 encoding is used, the characters âãÏÓ would each be encoded by one byte corresponding to their code point.

Note that there is nothing like UTF-8 code points. There are Unicode code points, and UTF-8 is a possible encoding for them. So 226 is the Unicode code point for â, which is encoded in UTF-8 as the binary <<195, 162>>. In another encoding it can be different, for example in Latin 1/ISO 8859-1 (a single-byte encoding) it is encoded as the binary <<226>>.

In short, your PDF is not encoded in UTF-8, so it’s normal that you won’t get UTF-8 bitstrings when looking at it in binary form.
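To illustrate the relation between the two encodings, here is a small sketch that assumes, as guessed in this thread, that the four bytes are read as Latin 1. It uses `:unicode.characters_to_binary/3` from the Erlang standard library; the variable names are illustrative only.

```elixir
# The four bytes from the PDF's second line, one byte per character in Latin 1.
latin1_bytes = <<226, 227, 207, 211>>

# Interpret the bytes as Latin 1 and re-encode the same code points as UTF-8.
utf8 = :unicode.characters_to_binary(latin1_bytes, :latin1, :utf8)

IO.inspect(utf8, binaries: :as_binaries)
# <<195, 162, 195, 163, 195, 143, 195, 147>>

# Converting back to Latin 1 recovers the original single bytes.
back = :unicode.characters_to_binary(utf8, :utf8, :latin1)

IO.inspect(back, binaries: :as_binaries)
# <<226, 227, 207, 211>>
```

The code points are the same in both cases; only the byte representation differs.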

Post #3

Also Liked

NobbZ

How have they been initially written to the PDF?

Perhaps it’s the source encoding that skews you here?

A quick glance makes me assume that the source was Latin-1/ISO 8859-1 encoded.

Post #2

lucaong

That’s because <<226 :: utf8>> gives you the UTF-8 encoded binary corresponding to the code point 226, which is "â" (or, equivalently, the bitstring <<195, 162>>). The meaning of the expression <<226, 277 :: utf8>> is instead: the byte <<226>>, followed by the bytes corresponding to the code point 277 encoded in UTF-8, which are <<196, 149>>.

The expression <<226 :: utf8, 277 :: utf8>> is probably what you meant to do, and returns, as expected, "âĕ" (or, equivalently, <<195, 162, 196, 149>>).
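The per-segment behaviour of the `utf8` modifier can be seen side by side in the sketch below. It only uses the values already discussed in this thread; the names `codepoint` and `rest` are illustrative.

```elixir
# Without a modifier, an integer segment is a single raw byte.
IO.inspect(<<226>>, binaries: :as_binaries)
# <<226>>

# With ::utf8, the integer is treated as a code point and UTF-8 encoded.
IO.inspect(<<226::utf8>>, binaries: :as_binaries)
# <<195, 162>>

# The modifier applies to one segment only: here just 277 is encoded.
IO.inspect(<<226, 277::utf8>>, binaries: :as_binaries)
# <<226, 196, 149>>

# Both segments encoded as code points.
IO.inspect(<<226::utf8, 277::utf8>>, binaries: :as_binaries)
# <<195, 162, 196, 149>>

# Pattern matching with ::utf8 decodes the first code point back out.
<<codepoint::utf8, rest::binary>> = <<195, 162, 196, 149>>
IO.inspect({codepoint, rest}, binaries: :as_binaries)
# {226, <<196, 149>>}
```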

Post #5

NobbZ

Encodings in PDF are tricky.

Textual segments in the PDF are 7-bit ASCII as far as I remember; binary segments may contain arbitrary data.

Text as seen in the rendered PDF does not necessarily exist like that in the PDF, but only as a binary segment listing glyphs to use from another segment. In such a scenario the byte 5 can represent an A while the byte 6 represents the letter ē.
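As a rough illustration of that scenario, the sketch below maps bytes to characters through a lookup table. The table `glyph_table` and the bytes in `stream_bytes` are entirely hypothetical and do not reflect how any real PDF font or library stores this information.

```elixir
# Hypothetical per-font table: byte value => character it stands for.
# "\u0113" is the letter e with macron.
glyph_table = %{5 => "A", 6 => "\u0113"}

# Hypothetical bytes as they might appear in a binary segment using that font.
stream_bytes = <<5, 6, 5>>

# Look up each byte in the table and join the resulting characters.
text =
  stream_bytes
  |> :binary.bin_to_list()
  |> Enum.map_join(&Map.fetch!(glyph_table, &1))

IO.inspect(text == "A\u0113A")  # true
```

The bytes 5 and 6 have no relation to the code points of the characters they stand for, which is why inspecting such a segment as a UTF-8 string is not meaningful.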

Post #4
