Phx.gen.auth - Why does hashing the tokens prevent timing attacks?

Regarding the phx.gen.auth generator: https://github.com/aaronrenner/phx_gen_auth

I’m trying to understand why, for many of the user tokens (such as reset_password ones), a hashed version of the token is stored instead of the original token.

The only hint I’ve been able to find is the “original design spec”, which states:

The hashing helps protect against timing attacks.

Can someone explain in clear terms how a timing attack would work, if they weren’t hashed? Would it be something like this?

  1. Attacker attempts to reset a password using a random token. Observes that the request takes N milliseconds.
  2. Attacker keeps trying random tokens. Once they find one that worked, it takes a statistically significantly different amount of time.

It sounds like the hashing step perhaps makes the request take an order of magnitude longer, such that it’s much more difficult to be able to tell the difference between valid and invalid tokens.

Thanks to anyone who can enlighten me!

1 Like

I wrote a little about it: https://til.simplebet.io/posts/ki5natqkn3-mitigating-timing-attacks
Hashing the attempted password is going to result in a quasi-random string that has no perceptible relation to the actual password. Therefore, missing the password by 1 character, vs 10 characters does not result in a calculable difference in timing.
Doing a left-to-right comparison would fail at different times no matter how close or far the attempted password was.

5 Likes

Actually it’s not about trying random tokens, but methodically trying with all letters (a-zA-Z) or whatever the allowed symbols are as the first letter, and measure.

Let’s say the token is "dG90b3RpdGl0dXR1dG91dG91", trying with the following strings:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
baaaaaaaaaaaaaaaaaaaaaaaaaaaaa
caaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaa
eaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
faaaaaaaaaaaaaaaaaaaaaaaaaaaaa
...
Yaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Zaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

you’ll notice that it’s a bit longer for "daaaaaaaaaaaaaaaaaaaaaaaaaaaaa" if string comparison is used, simply because only in this case there’ll be a second comparison (the first letter being equal, the string comparison needs to check for the second letter before returning false).

Then repeat with "daaaaaaa...", "dbaaaaaa...", "dcaaaaaa..."

Sounds impractical but it is not:

A worst-case scenario for guessing an HMAC would require 20×256×n20 \times 256 \times n20×256×n measurements, where nnn is the number of measurements required to pin down a single byte. So—around 5,000,000 requests. You could do that in less than a week at a barely-perceptible 10 req/s.

Storing hash, you can still guess a hash using this attack but this will be useless because what you need is not the hash, but the original message (or any valid preimage), which mean you need to crack the hashing algorithm.

You can use Plug.Crypto.secure_compare/2 in Elixir to avoid timing attacks (it compares string in constant time) but my guess is that you cannot do it in database, which is why hashes are stored instead.

4 Likes

A timing attack is basically an attacker trying out requests with different inputs. The attacker then tries to observe differences in how long requests take. Those differences can come into existance by different code paths in the request handling having (non tiny) different computational complexity. E.g. for a login one might fetch the user and only if one is found the login password would be hashed for comparison with the db stored value. If no user was found no hashing would happen. The hashing vs. no hashing path likely take noticeable different times. A timing attack could therefore aggregate a list of emails registered to your service, just by observing response times. Long duration -> user likely exists; Short duration -> user likely doesn’t exist.

Storing the hashed values in the db for tokens likely (I haven’t looked at the code) means on requests, where the user input includes the token, no amount of hashing needs to happen anymore. A single string comparison values is not computationally complex enough to be observable in timing attacks or at least it’s masked by changing latencies in various places between the computation and the attacker.

6 Likes

Not sure to understand here. If the user sends the bare token (not hashed) and the token is stored in its hash form in the DB, then hashing the bare token before comparing it in the DB is necessary. As far as I know hashing is constant-time.

Trying to find an old article that explains it all.

A single string comparison values is not computationally complex enough to be observable in timing attacks or at least it’s masked by changing latencies in various places between the computation and the attacker.

Actually it is observable. Adding some jitters, either unintentionally or intentionally (:timer.sleep(:rand.uniform(100)) for instance) doesn’t work because statistically, on the long run, the same amount of time is added to each request, and the one with the correct “first letter” will still be a bit longer and observable.

1 Like

That’s also fine. It’s not a matter of hashing vs. not hashing for every request. It’s a matter of hashing vs. not hashing depending on the requests.

Thanks for the input everyone :slightly_smiling_face: @tangui, your input especially made it click.

I understood the essence of a timing attack, but it hadn’t clicked that with a bare token, they can systematically guess permutations per your example. Whereas with storing the hash, their input would be converted to something totally different performing the string comparison, e.g. in the database.


Do I understand correctly, however, that if the attacker knows which hashing function you’re using, they could still perform a sort of timing attack if they had a rainbow table of all possible inputs, and their hashes? (Which you should assume they do; security by obscurity doesn’t hold up in practice.)

Thought experiment: In the phx.gen.auth implementation, the tokens are generated from :crypto.strong_rand_bytes(32), which means there are 256^32 possible tokens. Which, according to really rough calculations, you’d need on the order of= 4 * 10^60 exabytes of disk space just to hold all of those tokens base64 encoded.

So I guess, nevermind. :slightly_smiling_face:

Thanks again everyone!

1 Like

Nope as hashes are usually salted. The attacker would need the salt as well.

2 Likes

Not sure it is necessary here: I’ve found like @paulstatezny that you need 1e77 bits to store the rainbow table. The universe contains between 10^78 and 10^82 atoms, so in the worst case you need 10% of all available atoms to build the rainbow table.

Rainbow tables are particularly efficient when dealing with user passwords, because they have low entropy (= low level of randomness), narrowing the space search.

On the contrary, tokens used for access control are randomly generated.

Even though you are right, that it might be hard to impossible to store the rainbow at all, personally I consider it easier to imagine that just the second ingridient “salt” is missing, rather than the imagination that there might not be enough atoms in our sight to be able to store the data at all.

I can’t even barely imagine a single atom, as each of my teachers presented me a different model, and no one was able to tell how they really look like. Atoms are quite to abstract for me…

1 Like

…as far as we know. :stuck_out_tongue: Dark matter is still unaccounted for.

But on topic, maybe the quantum computers would make most of these encryption and defense techniques irrelevant. Hence some people are working on techniques to mitigate those attacks as well.

1 Like

There’s another good reason mentioned in the comments in the token schema:

In particular, the part about “to avoid reconstruction”. Storing un-hashed tokens in the database means that an attacker with a read-only DB leak can instantly use any live tokens - storing only hashes means only the recipients of those tokens can use them (barring attacks that reverse the hash). Pretty much the same reason we hash passwords, etc when storing them.

3 Likes