I’m trying to understand why, for many of the user tokens (such as reset_password ones), a hashed version of the token is stored instead of the original token.
The only hint I’ve been able to find is the “original design spec”, which states:
The hashing helps protect against timing attacks.
Can someone explain in clear terms how a timing attack would work if the tokens weren’t hashed? Would it be something like this?
Attacker attempts to reset a password using a random token. Observes that the request takes N milliseconds.
Attacker keeps trying random tokens. Once they find one that worked, it takes a statistically significantly different amount of time.
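If it helps, this is roughly what I imagine the measurement side looks like; purely a hypothetical sketch, where try_reset and the endpoint are made up:

```elixir
defmodule Probe do
  # Time the same guess many times and average, so network noise
  # washes out and a systematic difference of microseconds shows up.
  def measure(guess, samples \\ 200) do
    times =
      for _ <- 1..samples do
        {us, _result} = :timer.tc(fn -> try_reset(guess) end)
        us
      end

    Enum.sum(times) / samples
  end

  # Stand-in for an HTTP call to the reset endpoint, e.g.
  # Req.post!("https://example.com/reset_password", json: %{token: token})
  defp try_reset(token), do: token
end
```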
It sounds like the hashing step perhaps makes the request take an order of magnitude longer, such that it’s much harder to tell the difference between valid and invalid tokens.
I wrote a little about it: https://til.simplebet.io/posts/ki5natqkn3-mitigating-timing-attacks
Hashing the attempted password is going to result in a quasi-random string that has no perceptible relation to the actual password. Therefore, missing the password by 1 character vs. 10 characters does not result in a measurable difference in timing.
A left-to-right comparison of the hashes would then fail at positions that have nothing to do with how close or far the attempted password was.
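You can see this quickly in IEx: two inputs differing by a single character produce completely unrelated digests, so comparison time against the stored hash carries no information about closeness:

```elixir
# One character of difference in the guess flips roughly half the
# output bits (the avalanche effect), so the hex strings share no
# meaningful prefix.
:crypto.hash(:sha256, "daaaaaaaaaaaaaaaaaaaaaaa") |> Base.encode16()
:crypto.hash(:sha256, "dbaaaaaaaaaaaaaaaaaaaaaa") |> Base.encode16()
```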
Actually it’s not about trying random tokens, but about methodically trying every allowed symbol (a-zA-Z, or whatever the alphabet is) as the first letter, and measuring.
Let’s say the token is "dG90b3RpdGl0dXR1dG91dG91". Trying the following strings:

"aaaaaaaa...", "baaaaaaa...", "caaaaaaa...", "daaaaaaa..."

you’ll notice that the request takes a bit longer for "daaaaaaa..." if a plain string comparison is used, simply because only in this case there’ll be a second character comparison (the first letter being equal, the string comparison needs to check the second letter before returning false).
Then repeat with "daaaaaaa...", "dbaaaaaa...", "dcaaaaaa..."…
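To make the leak concrete, here is a sketch of the kind of short-circuiting comparison that causes it (hypothetical code, not what any particular library ships):

```elixir
defmodule NaiveCompare do
  # Returns as soon as a byte differs, so the running time leaks
  # how many leading bytes of the guess matched the real token.
  def equal?(<<a, rest_a::binary>>, <<b, rest_b::binary>>) do
    if a == b, do: equal?(rest_a, rest_b), else: false
  end

  def equal?(<<>>, <<>>), do: true
  def equal?(_, _), do: false
end

# Against "dG90...", the guess "daaa..." survives one more recursion
# step than "aaaa..." does, and that is the difference being measured.
```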
A worst-case scenario for guessing an HMAC would require 20 × 256 × n measurements, where n is the number of measurements required to pin down a single byte (n ≈ 1000 is a reasonable ballpark). So: around 5,000,000 requests. You could do that in less than a week at a barely-perceptible 10 req/s.
If you store the hash, an attacker can still guess the hash using this attack, but it will be useless: what they need is not the hash but the original message (or any valid preimage), which means they’d have to break the hashing algorithm.
You can use Plug.Crypto.secure_compare/2 in Elixir to avoid timing attacks (it compares strings in constant time), but my guess is that you cannot do that inside the database, which is why hashes are stored instead.
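For example (tokens made up, both the same length):

```elixir
stored_token = "dG90b3RpdGl0dXR1dG91dG91"
user_input = "daaaaaaaaaaaaaaaaaaaaaaa"

# secure_compare inspects every byte pair instead of returning at the
# first mismatch, so the timing doesn't depend on where they differ.
Plug.Crypto.secure_compare(user_input, stored_token)
#=> false
```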
A timing attack is basically an attacker trying out requests with different inputs and observing differences in how long those requests take. Those differences come into existence when different code paths in the request handling have (non-trivially) different computational cost.

E.g. for a login, one might fetch the user and only hash the supplied password for comparison with the db-stored value if a user was found. If no user was found, no hashing would happen. The hashing vs. no-hashing paths likely take noticeably different times. A timing attack could therefore assemble a list of emails registered to your service just by observing response times: long duration -> user likely exists; short duration -> user likely doesn’t exist.
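As a sketch of the usual mitigation (assuming bcrypt_elixir; Accounts.get_user_by_email/1 is a stand-in for however you load users), you do the hashing work on both paths:

```elixir
defmodule Auth do
  def authenticate(email, password) do
    case Accounts.get_user_by_email(email) do
      nil ->
        # Burn the same bcrypt work a real check would, so "no such
        # user" and "wrong password" take about the same time.
        Bcrypt.no_user_verify()
        {:error, :invalid_credentials}

      user ->
        if Bcrypt.verify_pass(password, user.hashed_password),
          do: {:ok, user},
          else: {:error, :invalid_credentials}
    end
  end
end
```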
Storing the hashed values in the db for tokens likely (I haven’t looked at the code) means that on requests where the user input includes the token, no hashing needs to happen anymore. A single string comparison is not computationally complex enough to be observable in timing attacks, or at least it’s masked by the varying latencies in the various places between the computation and the attacker.
Not sure I understand here. If the user sends the bare token (not hashed) and the token is stored in its hashed form in the DB, then hashing the bare token before comparing it against the DB is necessary. As far as I know hashing is constant-time anyway.
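I’d expect the flow to look roughly like this (schema and field names approximate, though phx.gen.auth does something very similar):

```elixir
# Hash the bare token from the request, then look the digest up.
# The DB only ever sees SHA-256 digests, and since the attacker can't
# control how their input maps to a digest, the byte-by-byte lookup
# comparison no longer leaks anything useful.
hashed = :crypto.hash(:sha256, user_supplied_token)
Repo.get_by(UserToken, token: hashed, context: "reset_password")
```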
Trying to find an old article that explains it all.
“A single string comparison is not computationally complex enough to be observable in timing attacks, or at least it’s masked by the varying latencies in the various places between the computation and the attacker.”
Actually it is observable. Adding some jitter, either unintentionally or intentionally (:timer.sleep(:rand.uniform(100)) for instance), doesn’t work, because statistically, in the long run, the same average amount of time is added to each request, and the one with the correct “first letter” will still be a bit longer, and observably so.
Thanks for the input, everyone. @tangui, your input especially made it click.
I understood the essence of a timing attack, but it hadn’t clicked that with a bare token they can systematically guess it character by character, per your example. Whereas with storing the hash, their input would be converted to something totally different before the string comparison is performed, e.g. in the database.
Do I understand correctly, however, that if the attacker knows which hashing function you’re using (which you should assume they do; security by obscurity doesn’t hold up in practice), they could still perform a sort of timing attack if they had a rainbow table of all possible inputs and their hashes?
Thought experiment: in the phx.gen.auth implementation, the tokens are generated from :crypto.strong_rand_bytes(32), which means there are 256^32 possible tokens. According to really rough calculations, you’d need on the order of 4 × 10^60 exabytes of disk space just to hold all of those tokens base64-encoded.
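Roughly, as a back-of-envelope (assuming each token is stored as its 44-character Base64 string):

```elixir
tokens = Integer.pow(256, 32)               # ≈ 1.16e77 possible tokens
bytes = tokens * 44                         # 44 Base64 chars ≈ 44 bytes each
exabytes = div(bytes, Integer.pow(10, 18))
# ≈ 5e60 exabytes, the same order of magnitude as above
```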
Not sure it is even necessary here: like @paulstatezny, I’ve found that you’d need on the order of 1e77 bits to store the rainbow table. The universe contains between 10^78 and 10^82 atoms, so in the worst case you’d need 10% of all available atoms to build the rainbow table.
Rainbow tables are particularly efficient when dealing with user passwords, because passwords have low entropy (= a low level of randomness), which narrows the search space. Tokens used for access control, by contrast, are randomly generated.
Even though you are right that it might be hard to impossible to store the rainbow table at all, personally I find it easier to imagine that just the second ingredient, the “salt”, is missing, rather than to imagine that there might not be enough atoms in our sight to store the data at all.
I can barely imagine a single atom, as each of my teachers presented me with a different model, and no one was able to say what they really look like. Atoms are just too abstract for me…
…as far as we know. Dark matter is still unaccounted for.
But on topic: maybe quantum computers will make most of these encryption and defense techniques irrelevant. Hence some people are working on techniques to mitigate those attacks as well.
There’s another good reason mentioned in the comments in the token schema:
In particular, the part about storing hashes “to avoid reconstruction”. Storing un-hashed tokens in the database means that an attacker with a read-only DB leak can instantly use any live tokens; storing only hashes means only the recipients of those tokens can use them (barring attacks that reverse the hash). It’s pretty much the same reason we hash passwords etc. when storing them.
I apologize for the necro-bump on this stale thread, but it’s the first to appear on a search engine for “timing attacks elixir”.
If you want to thoroughly avoid the possibility of sophisticated timing attacks (particularly for tokens that have longer lifespans), you may consider a split token as explained here. This is a token that consists of a selector and a verifier.
If you query the token row by the selector, you can then compare the “verifier” (the hashed part of the token) against the input in constant time using Plug.Crypto.secure_compare/2, as suggested in a previous post.
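For illustration, a sketch of that flow; the module, field, and helper names here are invented:

```elixir
defmodule SplitToken do
  # The full "selector.verifier" string goes to the user; only the
  # selector and a hash of the verifier are stored server-side.
  def generate do
    selector = Base.url_encode64(:crypto.strong_rand_bytes(16), padding: false)
    verifier = Base.url_encode64(:crypto.strong_rand_bytes(32), padding: false)
    {selector <> "." <> verifier, selector, :crypto.hash(:sha256, verifier)}
  end

  # Look the row up by selector (an ordinary DB query; any timing it
  # leaks concerns only the non-authenticating selector), then check
  # the verifier hash in constant time.
  def valid?(token, fetch_row) do
    with [selector, verifier] <- String.split(token, ".", parts: 2),
         %{hashed_verifier: stored} <- fetch_row.(selector) do
      Plug.Crypto.secure_compare(:crypto.hash(:sha256, verifier), stored)
    else
      _ -> false
    end
  end
end
```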
Whilst some hashes are weak and could be broken by quantum computing, the ones we use, like SHA-256, are still believed to resist quantum computing, because the underlying approach is computationally irreducible and basically requires brute force.
There may be some speedup using Grover’s algorithm (quadratic at best), but nothing close to the gains quantum computing offers in solving the discrete logarithm problem.
It is important, when comparing the hash of the password provided with the one in the database, to ensure it is done without any short-circuit evaluation, i.e. compare all bytes of both hashes and don’t stop prematurely.
This applies equally to pure random tokens.
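A minimal sketch of what such a comparison looks like (essentially the idea behind Plug.Crypto.secure_compare/2):

```elixir
defmodule ConstantTime do
  import Bitwise

  # XOR every byte pair and OR the differences together: the loop
  # always covers the full length, so the timing doesn't reveal
  # where (or whether) a mismatch occurred.
  def equal?(a, b) when byte_size(a) == byte_size(b) do
    diff =
      :binary.bin_to_list(a)
      |> Enum.zip(:binary.bin_to_list(b))
      |> Enum.reduce(0, fn {x, y}, acc -> bor(acc, bxor(x, y)) end)

    diff == 0
  end

  def equal?(_, _), do: false
end
```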
So the problem is not really related to hashes, but to how to compare challenges without any distinguishability between the “no match” cases. That makes the comparison strong, though physical side channels may still exist and be observable.
Well, with all that in mind, the safe bet is that we all have hardware backdoors (Intel’s ME and AMD’s PSP), so nobody is likely to be brute-forcing super-hard-to-brute-force hashes. If somebody needs something from our machines, they’ll just send a few magic packets.