Generating alphanumeric strings for permalinks

I want to create random alphanumeric permalinks that I can be sure are always unique. Is there a library in Elixir for doing something like this, or does anyone know how it could be implemented?

3 Likes

Perhaps something like this? Not sure that it has a guarantee for uniqueness but the likelihood is low. You’d probably have to put in a database and add a unique constraint (and regenerate when it fails) in order to give it a guarantee.

3 Likes

IMHO, if you want perfect uniqueness, you have to sacrifice randomness. I’ve written a module for myself, it generates alphanumeric ID’s based on a 48-bit timestamp, a 32-bit node name hash code, and a 16-bit sequence number.

defmodule CUID do
  @moduledoc """
  CUID is for **Cluster-Unique ID**. It's shorter than UUID, case sensitive, and URL-safe.

  The generated ID is guaranteed to be unique among the BEAM cluster
  if there are no more than 65536 nodes in the cluster,
  and the request rate is lower than 256 req/(node * millisec).

  ## Examples

      iex> CUID.start_link(CUID)
      iex> CUID.gen(CUID)
      "AWAAtXL1doYA"

  """

  use Agent
  use Bitwise

  @doc """
  Start a `CUID` process linked to the current process.
  """
  def start_link(name) do
    case Agent.start_link(fn -> 0 end, name: name) do
      {:error, {:already_started, _}} -> :ignore
      other -> other
    end
  end

  @doc """
  Generate a CUID.
  """
  def gen(server) do
    now = System.os_time(:millisecond)  # 48 bits
    node_name_hash = Node.self() |> Atom.to_charlist() |> Enum.reduce(0, fn(char, hash) ->
      ((hash <<< 5) - hash + char) &&& 0xFFFF
    end)  # 32 bits
    n = Agent.get_and_update(server, fn n -> {n, (n + 1) &&& 0xFF} end)  # 16 bits
    <<now::48, node_name_hash::16, n::8>> |> Base.url_encode64(padding: false)
  end
end
5 Likes

Thanks, the output here is what I’m looking for. Randomness isn’t really important, just need it to keep generating alphanumeric strings that are unique from each other. I’ll give this a shot!

2 Likes

The standard library is the gift that keeps giving! Isn’t Elixir the greatest?

Base.url_encode64/2

5 Likes

Yeah, I usually do something like

@spec permalink(integer) :: binary
def permalink(bytes_count) do
  bytes_count
  |> :crypto.strong_rand_bytes()
  |> Base.url_encode64(padding: false)
end
permalink(15)
#=> "tWW9pID24daLP_Q4emyd"
9 Likes

I’ve found http://hashids.org/ nice, and they have an Elixir implementation. The nice thing is that you don’t need to store the slug separately, you can compute the ID back from your slug (and as long as you keep the salt secret, other people can’t).

4 Likes

:laughing: I realize this is a product of math to get this number but isn’t it then theoretically unlimited? I don’t think the erlang VM can cluster with anywhere close to this number of nodes…

1 Like

I haven’t tested the limit of BEAM because I don’t have the environment :grin:

I haven’t either but I’ve heard it doesn’t scale past 75-100 nodes in a cluster due to too much network traffic; packets start getting dropped and it appears nodes are down when they’re not actually… I’ve heard they’re working on changing the protocol to something that does scale better in future versions of the EVM

Check out nanoid: https://github.com/ai/nanoid

There is an elixir library: https://github.com/railsmechanic/nanoid

(I haven’t used any of them though)

1 Like

A pseudo-random alphanumeric string guaranteed to be unique? You’re describing an UUID4.
This is the Elixir library: https://hexdocs.pm/uuid/UUID.html

1 Like

Actually wouldn’t they be more looking for a UUID1 or UUID2? UUID4 can have collisions, rare though they may be, unlike UUID1/UUID2.

2 Likes

You control the character set used, number of lifetime identifiers, and the chance of collision. Much more space efficient than UUIDs. Perfect for the permalink use case.

1 Like

You are minimizing ‘readable’ characters for maximal storage in the smallest visible area? You might get a kick out of these links. ^.^

3 Likes

Do you need your identifiers to be random, or just unguessable? (I.e. if an external user has identifier X, they cannot generate a valid identifier X+1)?

If the latter, you could consider using Hashids which, given a secret salt, give you a bidirectional mapping between integers (e.g. sequential database IDs) and alphanumeric strings (e.g. x8GB7y) which only you can generate or reverse (because you have the secret salt).