Creating a BEP3 compatible percent encoder (URI encoder)

I would like to announce torrents to my opentracker using an Elixir program. I have a test qbittorrent and opentracker running and I’ve identified how qbittorrent sends its announces to opentracker.

METHOD: GET
URL: http://localhost:8000/announce?info_hash=%ac%c3%b2%e43%d7%c7GZ%bbYA%b5h%1c%b7%a1%ea%26%e2&peer_id=-qB5020-r3FSX0qNU6Oo&port=4993&uploaded=0&downloaded=0&left=0&corrupt=0&key=F7F879A3&event=started&numwant=200&compact=1&no_peer_id=1&supportcrypto=1&redundant=0
HEADERS:
Accept-Encoding: gzip
Connection: close
Host: localhost:8000
User-Agent: qBittorrent/5.0.2

I would like to build this same GET request into my own code. My confusion comes from the percent encoded info_hash. Neither URI.encode/2 nor URI.encode_www_form/1 encode the URI the same way that qbittorrent does it.

Let me show you what I’ve tried so far.


# We start with the `Info hash v1` of the torrent, as copied from qBittorrent.
info_hash_v1 = "acc3b2e433d7c7475abb5941b5681cb7a1ea26e2"

# Next we decode the hexadecimal representation into binary.
binary = Base.decode16!(info_hash_v1, case: :lower)
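
For reference, here is a quick sketch of what the two standard functions return for this hash. As far as I can tell, URI.encode/1 leaves the reserved `&` (byte 0x26) untouched, which would split the query string, while URI.encode_www_form/1 differs from qBittorrent's output only in the case of the hex digits.

```elixir
binary = Base.decode16!("acc3b2e433d7c7475abb5941b5681cb7a1ea26e2", case: :lower)

# The default predicate keeps reserved characters, so byte 0x26 stays a literal "&".
IO.puts(URI.encode(binary))
# %AC%C3%B2%E43%D7%C7GZ%BBYA%B5h%1C%B7%A1%EA&%E2

# Escapes everything except unreserved characters, using uppercase hex digits.
IO.puts(URI.encode_www_form(binary))
# %AC%C3%B2%E43%D7%C7GZ%BBYA%B5h%1C%B7%A1%EA%26%E2
```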

I’m not sure what comes next. Digging through Elixir URI source code, I can see the method of percent encoding that creates a binary in almost the right format.

The problem here is that the hex/1 function only outputs 16 possible values: A-F and 0-9. There is no lowercase! The way qbittorrent does its percent encoding, the output has lowercase too: a-f, A-F, 0-9.

I read qbittorrent source code to see how they’re percent encoding, but I couldn’t find the code that does that. I found some info_hash references though. I can barely read C++, but I think the url encoding might be abstracted away in a request library.

I also looked through transmission-qt code. That code is Greek to me, but I was able to find their percent encoder implementation.

It seems similar to what I read in the Elixir URI source code about unreserved and unescaped characters. See RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
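
To make the "unreserved" idea concrete, here is a small sketch using URI.char_unreserved?/1 from Elixir's standard library. RFC 3986 lists the unreserved characters as letters, digits, `-`, `.`, `_` and `~`; the sample characters below are just ones I picked for illustration.

```elixir
# Print whether each sample character counts as unreserved under RFC 3986.
for char <- ~c"aZ9-._~!&%" do
  IO.puts("#{<<char>>}  unreserved? #{URI.char_unreserved?(char)}")
end

# Letters, digits and - . _ ~ print true; the characters ! & % print false.
```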

I bounced some ideas off of ChatGPT, borrowed code from Elixir URI, and I got a bep3_encode/1 function put together.


  @doc """
  Encodes `string` as BEP3's weird URL encoded string.

  ## Example

      iex> bep3_encode("a88fda5954e89178c372716a6a78b8180ed4dad3")
      "%A8%8F%DAYT%E8%91x%C3rqjjx%B8%18%0E%D4%DA%D3"

  """
  @spec bep3_encode(binary) :: binary
  def bep3_encode(string) when is_binary(string) do
    string = Base.decode16!(string, case: :lower)
    URI.encode(string, &URI.char_unreserved?/1)
  end

Given an Info hash v1 input of acc3b2e433d7c7475abb5941b5681cb7a1ea26e2, the expected output should be as follows.

%ac%c3%b2%e43%d7%c7GZ%bbYA%b5h%1c%b7%a1%ea%26%e2

However, I haven’t figured out how to preserve the case of the various letters. I assume I need to have exactly the same case-sensitive output as qBittorrent because an ASCII A is not the same as a.
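
A note on that assumption, as a hedged sketch: literal characters such as `G` and `g` really are different bytes, but RFC 3986 (section 2.1) treats uppercase and lowercase hex digits inside a percent escape as equivalent. Elixir's URI.decode/1 follows that rule, so both spellings decode to the same 20 bytes. Whether opentracker is equally lenient is an assumption that is not tested here.

```elixir
lower = "%ac%c3%b2%e43%d7%c7GZ%bbYA%b5h%1c%b7%a1%ea%26%e2"
upper = "%AC%C3%B2%E43%D7%C7GZ%BBYA%B5h%1C%B7%A1%EA%26%E2"

# The raw 20-byte info hash both strings are supposed to represent.
raw = Base.decode16!("acc3b2e433d7c7475abb5941b5681cb7a1ea26e2", case: :lower)

IO.inspect(URI.decode(lower) == raw)
# true
IO.inspect(URI.decode(upper) == raw)
# true
```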

It seems like the way qbittorrent does it, and I apologize for not having the language to communicate this… hex values that can be displayed as ASCII are output as ASCII. Otherwise, the hex representation is displayed.

I wrote this out to make a visual comparison of expected input and output values, because this is the only way I could think of understanding what is happening during the encoding.

 a8   8f   da   59   54   e8   91   78   c3   72   71   6a   6a   78   b8   18   0e   d4   da   d3
%A8  %8F  %DA   Y    T   %E8  %91   x   %C3   r    q    j    j    x   %B8  %18  %0E  %D4  %DA  %D3

On the second line (expected output), see the ASCII Y? That matches up with hex 59. The three values before it could not be displayed as ASCII, so their hex values were used instead.
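
The same comparison can be generated instead of written out by hand. This is only a sketch: `ByteTable` and `rows/1` are names I made up, and it uses URI.char_unreserved?/1 as the "can be left as ASCII" test, which is an assumption about the rule.

```elixir
defmodule ByteTable do
  # Hypothetical helper: pairs each byte's hex form with its percent-encoded form.
  def rows(hex_string) do
    hex_string
    |> Base.decode16!(case: :lower)
    |> :binary.bin_to_list()
    |> Enum.map(fn byte ->
      {Base.encode16(<<byte>>, case: :lower), URI.encode(<<byte>>, &URI.char_unreserved?/1)}
    end)
  end
end

IO.inspect(ByteTable.rows("a88fda5954"))
# [{"a8", "%A8"}, {"8f", "%8F"}, {"da", "%DA"}, {"59", "Y"}, {"54", "T"}]
```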

Later on, we can see lowercase x, r, q, j, j, x. From what I can tell, Elixir’s built-in URI.encode/1 can’t do this, because like I mentioned earlier, that can only output A-F,0-9. No lowercase!

By the way, https://www.asciitohex.com/ has been very helpful.

Anyway, I am very confused. I’m going to rest now and pick this up in the morning.

This is not true (URI.encode/1 can output lowercase letters), as trying it will demonstrate:

iex(1)> s = <<0xA8, 0x59, 0x72>>
<<168, 89, 114>>

iex(2)> URI.encode(s)
"%A8Yr"

I stand corrected; you’re right. URI.encode/1 will return letters in both uppercase and lowercase.

I set up some unit tests to make sure my encode function gives the same output as what qBittorrent does.


  # test/bep_encode_test.exs

  test "tails-amd64-6.10-img 07 b4 51 63 36 e4 af e9 23 2c 73 bc 31 26 42 59 0a 7d 7e 95" do
    actual = BepEncode.bep_encode("07b4516336e4afe9232c73bc312642590a7d7e95")
    expected = "%07%b4Qc6%e4%af%e9%23%2cs%bc1%26BY%0a%7d~%95"
    assert actual === expected
  end

  test "linuxmint-22-mate-64bit.iso e0 a4 05 8e 40 7d dd ad 1c ac f8 c9 ce db 0b 27 21 c0 7f 92" do
    actual = BepEncode.bep_encode("e0a4058e407dddad1cacf8c9cedb0b2721c07f92")
    expected = "%e0%a4%05%8e%40%7d%dd%ad%1c%ac%f8%c9%ce%db%0b%27!%c0%7f%92"
    assert actual === expected
  end

  test "debian-12.8.0-amd64-DVD-1.iso 56 3e 72 81 c0 00 e1 80 91 e5 c0 d3 9d 09 8c ff 13 5d ab 26" do
    actual = BepEncode.bep_encode("563e7281c000e18091e5c0d39d098cff135dab26")
    expected = "V%3er%81%c0%00%e1%80%91%e5%c0%d3%9d%09%8c%ff%13%5d%ab%26"
    assert actual === expected
  end

Looks like exclamation marks are allowed in BEP3 percent encoding. Custom URI.encode/2 predicate it is! I like how flexible it is because of that predicate arg.

  # lib/bep_encode.ex

  def char_allowed?(character) do
    character in ?0..?9 or character in ?a..?z or character in ?A..?Z or character in ~c"~_-.!"
  end


  @spec bep_encode(binary()) :: binary()
  def bep_encode(string) when is_binary(string) do
    string = Base.decode16!(string, case: :lower)
    URI.encode(string, &char_allowed?/1)
  end

Running tests, it looks like we’re almost there. Length is good, allowable special characters are left alone, but still there’s an issue of case in some of the hex values.


  3) test linuxmint-22-mate-64bit.iso e0 a4 05 8e 40 7d dd ad 1c ac f8 c9 ce db 0b 27 21 c0 7f 92 (BepEncodeTest)
     test/bep_encode_test.exs:19
     Assertion with === failed
     code:  assert actual === expected
     left:  "%E0%A4%05%8E%40%7D%DD%AD%1C%AC%F8%C9%CE%DB%0B%27!%C0%7F%92"
     right: "%e0%a4%05%8e%40%7d%dd%ad%1c%ac%f8%c9%ce%db%0b%27!%c0%7f%92"
     stacktrace:
       test/bep_encode_test.exs:22: (test)



  4) test tails-amd64-6.10-img 07 b4 51 63 36 e4 af e9 23 2c 73 bc 31 26 42 59 0a 7d 7e 95 (BepEncodeTest)
     test/bep_encode_test.exs:13
     Assertion with === failed
     code:  assert actual === expected
     left:  "%07%B4Qc6%E4%AF%E9%23%2Cs%BC1%26BY%0A%7D~%95"
     right: "%07%b4Qc6%e4%af%e9%23%2cs%bc1%26BY%0a%7d~%95"
     stacktrace:
       test/bep_encode_test.exs:16: (test)



  5) test debian-12.8.0-amd64-DVD-1.iso 56 3e 72 81 c0 00 e1 80 91 e5 c0 d3 9d 09 8c ff 13 5d ab 26 (BepEncodeTest)
     test/bep_encode_test.exs:25
     Assertion with === failed
     code:  assert actual === expected
     left:  "V%3Er%81%C0%00%E1%80%91%E5%C0%D3%9D%09%8C%FF%13%5D%AB%26"
     right: "V%3er%81%c0%00%e1%80%91%e5%c0%d3%9d%09%8c%ff%13%5d%ab%26"
     stacktrace:
       test/bep_encode_test.exs:28: (test)

Not sure why this is happening, but I’ll keep plugging along.
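
A likely explanation is that URI.encode/2 always builds its escapes from uppercase hex digits, while the expected strings copied from qBittorrent use lowercase. One possible workaround, sketched here under that assumption, keeps URI.encode/2 and lowercases only the `%XX` escapes afterwards. The module and function names are hypothetical.

```elixir
defmodule LowercaseEscapes do
  # Hypothetical helper: encode with URI.encode/2, then lowercase each %XX escape.
  def encode(hex_string) when is_binary(hex_string) do
    hex_string
    |> Base.decode16!(case: :lower)
    |> URI.encode(&allowed?/1)
    |> downcase_escapes()
  end

  # Every "%" in the encoded string starts an escape, because a literal "%"
  # is itself escaped as "%25". Literal letters outside escapes are untouched.
  def downcase_escapes(encoded) do
    Regex.replace(~r/%[0-9A-F]{2}/, encoded, &String.downcase/1)
  end

  defp allowed?(c) do
    c in ?0..?9 or c in ?a..?z or c in ?A..?Z or c in ~c"~_-.!"
  end
end

IO.puts(LowercaseEscapes.encode("e0a4058e407dddad1cacf8c9cedb0b2721c07f92"))
# %e0%a4%05%8e%40%7d%dd%ad%1c%ac%f8%c9%ce%db%0b%27!%c0%7f%92
```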

Eureka!

defmodule App.BittorrentUrlEncoder do
  @moduledoc """
  URL encoding for Bittorrent Info hash v1. Designed to be compatible with qBittorrent's percent encoding.
  """

  import Bitwise

  @doc """
  Encodes `string` as a Bittorrent-flavored percent-encoded string.

  ## Example

      iex> encode("a88fda5954e89178c372716a6a78b8180ed4dad3")
      "%a8%8f%daYT%e8%91x%c3rqjjx%b8%18%0e%d4%da%d3"

  """
  @spec encode(binary()) :: binary()
  def encode(hex_string) when is_binary(hex_string) do
    hex_string
    |> Base.decode16!(case: :lower) # Decode from hex to raw bytes
    |> encode_bytes()
  end

  defp encode_bytes(<<>>), do: ""

  defp encode_bytes(<<byte, rest::binary>>) do
    percent_encode(byte) <> encode_bytes(rest)
  end

  defp percent_encode(byte) when byte in ?0..?9 or byte in ?a..?z or byte in ?A..?Z or byte in ~c"~_-.!" do
    <<byte>>
  end

  defp percent_encode(byte) do
    "%" <> <<hex(bsr(byte, 4)), hex(band(byte, 15))>>
  end

  defp hex(n) when n <= 9, do: n + ?0
  defp hex(n), do: n + ?a - 10
end

The secret sauce was changing ?A to ?a in the hex/1 function, so the escapes come out in lowercase.

-defp hex(n), do: n + ?A - 10
+defp hex(n), do: n + ?a - 10
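
To tie this back to the original goal, here is a minimal sketch of building the announce URL. `AnnounceUrl`, `build/3` and the parameter list are illustrative assumptions, and the module carries its own compact copy of the encoder so it runs standalone. The info_hash is spliced in by hand because URI.encode_query/1 would escape the `%` signs a second time.

```elixir
defmodule AnnounceUrl do
  # Compact variant of the encoder, using a binary comprehension.
  def encode_info_hash(hex_string) do
    for <<byte <- Base.decode16!(hex_string, case: :lower)>>, into: "" do
      if byte in ?0..?9 or byte in ?a..?z or byte in ?A..?Z or byte in ~c"~_-.!" do
        <<byte>>
      else
        "%" <> Base.encode16(<<byte>>, case: :lower)
      end
    end
  end

  # Hypothetical builder: info_hash first, then the remaining query parameters.
  def build(base_url, info_hash_hex, params) do
    "#{base_url}?info_hash=#{encode_info_hash(info_hash_hex)}&#{URI.encode_query(params)}"
  end
end

url =
  AnnounceUrl.build(
    "http://localhost:8000/announce",
    "acc3b2e433d7c7475abb5941b5681cb7a1ea26e2",
    peer_id: "-qB5020-r3FSX0qNU6Oo",
    port: 4993,
    uploaded: 0,
    downloaded: 0,
    left: 0,
    event: "started",
    compact: 1
  )

IO.puts(url)
# http://localhost:8000/announce?info_hash=%ac%c3%b2%e43%d7%c7GZ%bbYA%b5h%1c%b7%a1%ea%26%e2&peer_id=-qB5020-r3FSX0qNU6Oo&port=4993&uploaded=0&downloaded=0&left=0&event=started&compact=1
```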