Hashing identity proofs
Posted on 2022-09-22
Introduction to claims and proofs
As explained in the Keyoxide documentation on identity claims and proofs, an identity claim is a link from your Keyoxide profile to an online account on some website or app, and an identity proof is the link from that account back to your Keyoxide profile.
When the claim links to the proof and the proof links to the claim, we call this bidirectional linking; this is what Keyoxide uses to verify online identities.
Concealing the proof
It makes sense for your Keyoxide profile to link to all these accounts. After all, that is the reason to use Keyoxide in the first place: linking all your accounts together.
But sometimes, it might make sense not to have your account link back to your Keyoxide profile, especially if you are an activist or are otherwise prone to online harassment. If ill-intentioned people find one of your accounts with a proof, it takes little effort to follow that proof to your Keyoxide profile and, from there, to all your other accounts.
Let us find a solution for this situation.
Keyoxide still needs a functional proof to verify your online identity, but that proof shouldn't reveal your Keyoxide profile.
Introduction to cryptographic hash functions
I know, it sounds complicated. But at a basic level, it really isn't.
A cryptographic hash function is a mathematical algorithm that maps data of an arbitrary size to a bit array of a fixed size.
Hmmm, ok. Let's look at some examples using the md5 algorithm.
There are many md5 hash generators online, and they will all give the same results.
So I gave an md5 hash algorithm 123 as input and it returned a piece of text of 32 characters: 202cb962ac59075b964b07152d234b70. Another example:
Input: What a day, what a lovely day!
The hash is different but still 32 characters long.
Input: It was a bright cold day in April, and the clocks were striking thirteen.
The input is much longer now and still the hash is only 32 characters long!
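The examples above are easy to reproduce locally instead of with an online generator. Here is a minimal Python sketch using the standard library's `hashlib.md5`; it prints each input's hash and its length, which is always 32 hexadecimal characters (128 bits), whatever the input length.

```python
import hashlib

# The three example inputs from this post
inputs = [
    "123",
    "What a day, what a lovely day!",
    "It was a bright cold day in April, and the clocks were striking thirteen.",
]

for text in inputs:
    # md5 always produces 16 bytes (128 bits), shown as 32 hex characters
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    print(len(digest), digest, repr(text))
```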
So, a hash is like a fingerprint of the input, no matter how long it is. It's also unique to each input. Well, not exactly, but it's extremely difficult to find two different inputs that have the same hash.
More importantly, hash functions are one-way functions: if you know the input, it's very easy to calculate the hash. If you only know the hash, there is no practical way to calculate the input, other than guessing inputs and hashing each guess!
md5 is unsafe to use for sensitive data: it is broken, and finding two inputs with the same md5 hash is nowadays easy. I used it here to illustrate how hash functions work, but no more md5 after this point!
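To make the one-way property concrete: computing a hash is a single function call, but the only general way back is to guess inputs and hash each guess. The Python sketch below recovers 123 from its md5 hash, which only works because it assumes the input is a number between 0 and 999; for a 40-character key fingerprint, blindly enumerating every possibility is far out of reach.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    """Forward direction: hashing is cheap and direct."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def guess_input(target_hash: str, max_number: int = 999) -> Optional[str]:
    """Backward direction: there is no inverse function, so try every
    candidate input (here, the numbers 0..max_number) and compare hashes."""
    for n in range(max_number + 1):
        candidate = str(n)
        if md5_hex(candidate) == target_hash:
            return candidate
    return None  # the input was outside the guessed range


target = md5_hex("123")
print(guess_input(target))  # 123
```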
Applying hash functions to identity proofs
So, how does this help us?
Let's look at an example: keyoxide.org/3637202523e7c1309ab79e99ef2dc5827b445f4b.
It contains a single identity claim: the doip.rocks website.
To verify this identity claim, Keyoxide fetches some data associated with this website (in this case, its DNS records). Keyoxide then searches this data and considers the identity claim verified if it finds the following piece of text:
If someone accidentally finds this proof, they can then find your Keyoxide profile. This may not be an issue for some, but it could be for others! So, let's conceal the proof using a hash function.
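The plain-text check described above is simple string matching, which is exactly why a visible proof is so easy to trace back. The Python sketch below assumes the DNS TXT records have already been fetched into a list of strings, and it uses a hypothetical proof format (`openpgp4fpr:` followed by the fingerprint) purely for illustration; Keyoxide's actual proof format and lookup logic may differ.

```python
from typing import Iterable


def expected_proof(fingerprint: str) -> str:
    # Hypothetical proof format, assumed for illustration only
    return f"openpgp4fpr:{fingerprint.lower()}"


def claim_verified(txt_records: Iterable[str], fingerprint: str) -> bool:
    """Return True if any fetched record contains the expected proof text."""
    proof = expected_proof(fingerprint)
    return any(proof in record.lower() for record in txt_records)


fingerprint = "3637202523e7c1309ab79e99ef2dc5827b445f4b"
records = [
    "v=spf1 -all",  # unrelated record, ignored
    "openpgp4fpr:3637202523e7c1309ab79e99ef2dc5827b445f4b",
]
print(claim_verified(records, fingerprint))  # True
```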
Go to the Keyoxide bcrypt tool and give it the following input:
The bcrypt algorithm is quite different from md5. Every time you generate a hash, you get a different one, because bcrypt mixes a random salt into each hash and stores that salt inside the result. It is also much more secure.
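How can a hash that changes every time still be verified? The salt travels with the hash, so a verifier re-runs the hash with the stored salt and compares. Python's standard library has no bcrypt, so this sketch swaps in `hashlib.pbkdf2_hmac` with a made-up `salt$hash` output format purely to show the idea; it is not bcrypt, and its output is not a valid bcrypt hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # work factor; bcrypt's cost factor plays a similar role


def salted_hash(proof: str) -> str:
    """Hash with a fresh random salt; the salt is stored alongside the digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", proof.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"  # hypothetical format, not bcrypt's


def verify(proof: str, stored: str) -> bool:
    """Re-hash the candidate proof with the stored salt and compare."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", proof.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


proof = "some proof text"  # placeholder input
h1, h2 = salted_hash(proof), salted_hash(proof)
print(h1 != h2)                              # True: new salt, new hash
print(verify(proof, h1), verify(proof, h2))  # True True
```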
Here is one of the valid hashes the tool will generate:
Now, I can use this as proof and Keyoxide will consider the identity claim verified. And if someone finds this proof by accident, there is no practical way of figuring out which Keyoxide profile it belongs to!
Beware, not a silver bullet!
It is important to note that hashing proofs is NOT infallible. Yes, it conceals the fingerprint, but if someone already knows (or suspects) your Keyoxide profile, they can confirm the link by checking the hash against your fingerprint, just as Keyoxide does, or expose the concealed fingerprint through other channels.
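Here is why suspecting the profile is enough: the salt is stored in the hash, so anyone can run the same check Keyoxide runs. This Python sketch again uses `hashlib.pbkdf2_hmac` with a hypothetical `salt$hash` format as a stand-in for bcrypt or argon2, and the same assumed `openpgp4fpr:` proof format; the second fingerprint in the suspect list is a made-up placeholder.

```python
import hashlib
import hmac
import os
from typing import Iterable, Optional

ITERATIONS = 100_000


def _derive(proof: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", proof.encode("utf-8"), salt, ITERATIONS)


def conceal(proof: str) -> str:
    """Stand-in for bcrypt/argon2: random salt plus slow hash, stored together."""
    salt = os.urandom(16)
    return f"{salt.hex()}${_derive(proof, salt).hex()}"


def unmask(concealed: str, suspects: Iterable[str]) -> Optional[str]:
    """What an observer can do: test each suspected fingerprint against the hash."""
    salt_hex, digest_hex = concealed.split("$")
    salt = bytes.fromhex(salt_hex)
    for fpr in suspects:
        proof = f"openpgp4fpr:{fpr}"  # hypothetical proof format
        if hmac.compare_digest(_derive(proof, salt).hex(), digest_hex):
            return fpr
    return None


mine = "3637202523e7c1309ab79e99ef2dc5827b445f4b"
concealed = conceal(f"openpgp4fpr:{mine}")
suspects = ["0" * 40, mine]  # the observer already suspects this profile
print(unmask(concealed, suspects) == mine)  # True
```

The slow hash makes each guess costly, which is what makes blind enumeration impractical, but it does little against a short list of suspects.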
Keyoxide and hash functions
Keyoxide currently supports the bcrypt and argon2 hash functions.
The default parameters on these online tools are good enough for this purpose. You may opt to increase the cost factor, but don't go too high: a higher cost makes the hash take longer to verify and may result in a failed verification! For example, for bcrypt, don't pick a cost factor above 12.
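To see why the cost factor should stay modest: bcrypt's cost is the base-2 logarithm of the number of key-expansion rounds, so every increment doubles the time needed to verify the hash. A quick Python calculation, using cost 10 as an arbitrary baseline:

```python
def relative_work(cost: int, baseline: int = 10) -> float:
    """bcrypt's cost factor is a base-2 logarithm: each +1 doubles the work."""
    return 2.0 ** (cost - baseline)


for cost in range(10, 15):
    print(cost, f"{relative_work(cost):g}x")  # 1x, 2x, 4x, 8x, 16x
```

Going from cost 12 to cost 14 therefore makes every verification four times slower.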
I hope this new feature will benefit those seeking privacy protection. As always, the communication channels are open for suggestions on improving the hashing or supporting more algorithms.
Until next time, Yarmo