I’m trying to “securely store” phone numbers in a database. I need them for users of my app to find other users of the app who are in their contact list. I’ve been thinking about using bcrypt on phone numbers but with the same salt for all of them, so that I would only store the hash and not the phone number itself. Would that add any “security” or not? How fast would an intruder compute hashes for every possible phone number? A few days?
If I use a random salt for every phone number, then I don’t know how to quickly match them with their owners …
Bcrypt is quite slow (by design). On my machine, hashing 100 strings took about 26 seconds (using 1 core only), so for 9-digits phone number (1 000 000 000 combination) it would be 260 000 000 seconds. 8 years, not so bad.
In your case, I don’t see downsides of using the same salt. Attacker would know that this contact of the user is the number of the different user, but, you are linking them anyway.
When doing this you lose the main advantage of bcrypt over other hashing algorithms: Its adaptive ability to increase the strength of the hashing over time. If you need to look up a phone number in a list of hashed phone numbers, the only approach you can take is to hash it, and then compare the hash to your list of hashes. But if there are different ways of hashing it, of which the salt is one parameter, but the number of iterations to be used is another, then you have a lot more possibilities to exhaust.
The idea of storing only hashes of something so you do not need to store the real value is a very good approach to improve privacy of your users and security of their information. However, maybe using a different cryptographically secure hashing algorithm, such as Keccak (SHA-3) or Whirlpool.
You’re dealing with a hard problem. Signal had a nice piece on their solution in Wired awhile back. Ultimately there’s no truly safe way to do what you’re trying to do efficiently. So really your calculus should be around risk—what’s the impact of a compromise, what form(s) is such a compromise likely to take, and what is the business impact in real terms? In other words, can you get away with good security on storage of the addressbook and informed consent of the part of the user? Signal’s approach is basically, “Fine, don’t trust us, trust Intel.” That may or may not work for you.
So, philosophy of risk aside, what you’re doing isn’t Safe, but its probably “safe enough” if you’re not pretending to your users that its cryptographically secure.
I’ve got another idea, what if I keep a set of ~100 salts, and choose the salt to hash the phone number with :erlang.phash2(phone_number, 100)? Then when I get the contact list from a user, I compute :erlang.phash2 for each number to choose the right salt, then compute bcrypt hash of each phone number with the choosen salt and try to find them in the database.
Do you think it makes sense to do that?
The problem now is that this approach makes it possible to narrow down the set of possible phone numbers (just by looking at the salt) for bcrypt hash … What if I don’t concat the salt to bcrypt hash and store EK00IR9w0LvFqc75u1mWtoQC/zke/au instead of $2b$12$lUsh1d4kvioZbxOIZ3Xkg.EK00IR9w0LvFqc75u1mWtoQC/zke/au
You’re still going to be dealing with the same problem, and your solutions are basically relying on security through obscurity (assuming the attacker doesn’t realize that the salt is constant or predictable.) You can’t assume this and achieve security. Assume instead that the client is compromised and that your attacker has the source code or a decompiled version of it, and the persistence to reverse-engineer your scheme. If they come to the conclusion that you’re just using some kind of complicated function to choose a salt from a set of possible salts, its then a trivial matter to create a rainbow of possible outputs for each potential input—and that’s assuming they don’t bother figuring out which salt a given input is going to use.
Again, you’re dealing with a hard problem. To connect users efficiently, you have to be a trusted intermediary; you can’t be “trusted” and also provide provable zero-knowledge of the sensitive data; and you can’t trust clients not to be adversarial.
I think I can already assume that since it’s open sourced, except for salts.
its then a trivial matter to create a rainbow of possible outputs for each potential input
That’s exactly what I want to try and make more difficult. I thought that using more salts (rather than just one) would help me with that. I’m also considering switching to argon2 since it’s more memory intensive than bcrypt.
Right now I’m storing the salts in memory (compile them into a module). Don’t know where I should store them “permanently”. Maybe in hashicorp’s vault or something like that.
Yeah, I can see what you’re going for, but realize for any constant number of salts X, whether X is 1 or 100 is somewhat immaterial. You’re just moving the goalposts a little bit on what to compute. In the worst case, where your selection function is known by the attacker (and in an open-source product, you can assume it will be) then you’ve reduced X in all cases to 1, since the attacker can predict which salt needs computing by running the selection function on the input they’re attacking. I’m not a cryptographer by trade, so if there’s a zero-knowledge protocol I don’t know about here then someone please correct me, but as far as I know there’s no getting around the math. You can trust an intermediary to do the comparison of addressbooks efficiently; you can have no trusted intermediaries and push the comparison to the addressbook entries (but you still leak knowledge that way—you have to trust one or the other of the entries to make the comparison); or you can leak no information and require both clients to know each other and coordinate addressbook entries.