Enter The Bitcoin, Part One: Hashes Explained
This morning, we published a guest post by Pete Dushenski about Bitcoin. Tomorrow, we'll have the conclusion to that.
In the meantime, I want to briefly discuss "hashes". In my experience, the inability to understand hashes is what keeps a lot of people from getting a grip on modern technologies like encryption, security, and cryptocurrency.
What follows is an explanation of hashes that I absolutely guarantee you will understand, even if you've called tech support in the past because your computer was on but your monitor was off --- Hi, Dad!
The simplest definition of a "hash" is a one-way equation. We've all done things in the past that are easy to do but tough to undo. Imagine, for example, that you have that new Porsche 911 Lego set sitting in front of you, completely assembled. You could take it apart down to the 2,000-plus individual parts in maybe five minutes, but reassembling it could take you days. Reassembling it without instructions could take you months of continuous effort. That's effectively a "one-way" process.
Now we are going to make a very simple hash together. Again, don't be worried --- this is not hard for anybody over the age of five.
We will start with our rules for the hash. I'm going to put them down below.
For letters between A and E, substitute the number 1.
For letters between F and K, substitute the number 2.
For letters between L and P, substitute the number 3.
For letters between Q and T, substitute the number 4.
For letters between U and Z, substitute the number 5.
Then string those digits together into one number and square it. That's our hash.
Let's take the word
DONOR
D = 1, O = 3, N = 3, O = 3, R = 4
Put the digits side by side and you get 13,334. 13,334 squared is 177795556. Congratulations! You've made a hash.
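If you'd rather let a computer do the arithmetic, here is a minimal Python sketch of the same rules. The names GROUPS, DIGIT_FOR and toy_hash are my own inventions for illustration, and the sketch assumes the word contains only the plain letters A through Z.

```python
# The five substitution rules, written as "these letters become this digit".
GROUPS = {"ABCDE": "1", "FGHIJK": "2", "LMNOP": "3", "QRST": "4", "UVWXYZ": "5"}

# Flatten the rules into a per-letter lookup, e.g. DIGIT_FOR["D"] == "1".
DIGIT_FOR = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def toy_hash(word):
    """Swap each letter for its digit, join the digits, and square the result."""
    digits = "".join(DIGIT_FOR[letter] for letter in word.upper())
    return int(digits) ** 2


print(toy_hash("DONOR"))  # 177795556
```

Anything that isn't a letter from A to Z raises a KeyError, which is fine for a toy.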
Now let's say that your brother sends you a letter. It says, "I am having a friend visit you tomorrow. He will give you a password. Run it through the hash and if it equals 177795556 then it is really my friend." When the friend arrives, he tells you "The password is DONOR." You run it through the hash, and it returns 177795556. So you know he's legit.
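The check you just did by hand can be sketched in a few lines of Python. Everything named here (toy_hash, EXPECTED_HASH, is_legit) is made up for this example. Notice that the letter only ever carries the hash, never the password itself.

```python
# Same toy rules as before: each group of letters becomes one digit.
GROUPS = {"ABCDE": "1", "FGHIJK": "2", "LMNOP": "3", "QRST": "4", "UVWXYZ": "5"}
DIGIT_FOR = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def toy_hash(word):
    """Swap letters for digits, join them into one number, and square it."""
    return int("".join(DIGIT_FOR[letter] for letter in word.upper())) ** 2


EXPECTED_HASH = 177795556  # the number from your brother's letter


def is_legit(spoken_password):
    """Hash whatever the visitor says and compare it to the number in the letter."""
    return toy_hash(spoken_password) == EXPECTED_HASH


print(is_legit("DONOR"))    # True
print(is_legit("PORSCHE"))  # False
```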
Now here's the tricky part. Let's say that somebody intercepts the letter between you and your brother. He reads it, then he seals it back up and sends it the rest of the way to you. Uh-oh. He knows that the secret hash is 177795556. Can he get back to the word DONOR from the number 177795556, if he knows all the rules?
He cannot.
Here's why.
Look at the word
COLOR
C = 1, O = 3, L = 3, O = 3, R = 4
13,334 squared is 177795556. That's the same!
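There is no way to tell if the originally hashed word was DONOR or COLOR. It could also be CONOR, or it could be nonsense words like BONOR (hee hee) or ALPOS, or DLMMQ. In fact, there are 2,500 different five-letter "words" that hash out to 177795556 using our equation: five possible letters for each of the first four positions, and four for the last. Your enemy would have a 1-in-2,500 chance of guessing which word you started with. But the right password, DONOR, will always hash out correctly. (In our toy hash, any of those look-alike words would pass the check too, which is one reason real hashes are built so that look-alikes are practically impossible to find.)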
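Here is a sketch of what the eavesdropper could actually do with the intercepted number. Squaring a whole number is easy to undo with a square root, so the "one-way" part of our toy hash comes entirely from the letter rules, where several letters collapse into the same digit. The function name candidate_words is hypothetical, and the sketch assumes the hash is a positive whole number.

```python
import itertools
import math

# The rules read backwards: each digit could have been any of these letters.
LETTERS_FOR = {"1": "ABCDE", "2": "FGHIJK", "3": "LMNOP", "4": "QRST", "5": "UVWXYZ"}


def candidate_words(hash_value):
    """List every letter string that the toy rules turn into hash_value."""
    root = math.isqrt(hash_value)
    if root * root != hash_value:
        return []  # not a perfect square, so no word hashes to it
    try:
        choices = [LETTERS_FOR[digit] for digit in str(root)]
    except KeyError:
        return []  # contains a digit that no letter maps to
    return ["".join(combo) for combo in itertools.product(*choices)]


words = candidate_words(177795556)
print(len(words))                          # how many look-alikes there are
print("DONOR" in words, "COLOR" in words)  # True True
```

The eavesdropper ends up with the full list of look-alikes, but nothing in the number says which one your brother picked.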
This is how your computer stores your password, by the way. And the early hash methods used by computers weren't that much more complicated than the one I've made up above, which means that if you had the hash and you knew the equation you could probably guess a password in a few million "guesses". That's how people used to "crack" Unix and Windows servers. They would log on, find the "hash file" that had all the passwords, then they would have their own computers run random words through the hash equation until they matched the passwords in the hash file.
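To make the cracking idea concrete, here is a small sketch that uses our toy hash in place of a real one. The "hash file", the user names, and the guess list are all invented for the example. Real cracking tools work with real hash functions and far longer word lists.

```python
# Same toy rules as before: each group of letters becomes one digit.
GROUPS = {"ABCDE": "1", "FGHIJK": "2", "LMNOP": "3", "QRST": "4", "UVWXYZ": "5"}
DIGIT_FOR = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def toy_hash(word):
    """Swap letters for digits, join them into one number, and square it."""
    return int("".join(DIGIT_FOR[letter] for letter in word.upper())) ** 2


# A made-up hash file: it stores each user's hash, not the password.
hash_file = {"alice": toy_hash("SECRET"), "bob": toy_hash("ZEBRA")}

# A made-up list of common passwords to try first.
guesses = ["LOVE", "SECRET", "DONOR", "MONKEY"]


def crack(hash_file, guesses):
    """Hash each guess and report which users' stored hashes it matches."""
    cracked = {}
    for guess in guesses:
        guess_hash = toy_hash(guess)
        for user, stored_hash in hash_file.items():
            if stored_hash == guess_hash and user not in cracked:
                cracked[user] = guess
    return cracked


print(crack(hash_file, guesses))  # {'alice': 'SECRET'}
```

Bob's password survives only because ZEBRA, or a look-alike of it, isn't on the guess list.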
Things are a little tougher nowadays, but not that much so. It helps that most people use one of about 300 common passwords ('love', 'secret', and common names), so the password-cracking programs always try those first.
That's one use of hashes: to store secret information in "one-way" format.
Another use: to find out if a file has changed.
Alright, here's our next example. Your brother sends you a letter. It says, "I've written an article and the first word has the hash value 177795556. You can download it from my website." So you download it. The first sentence of the article is: "Donor receipts are up this year by 20%." You run the first word through the hasher and it comes out correct. You delete the article after you read it.
A few weeks later, you want to read it again, so you download the article again. But in the meantime, some bad dude has hacked the document and changed it. The first sentence now reads "It was a dark and stormy night." You run "It" through the hasher and it does NOT come up with 177795556.
This document is not genuine.
In the real world, of course, the hashes are much more complicated and they can be run against a whole document. So you might get a hash that looks like this:
cf23df2207d99a74fbe169e3eba035e633b65d94
and changing even one word could make it
8ad3499ef0ba82e44a3b10cab2ed357a8b11ea4b
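You can try this yourself with Python's built-in hashlib module. I'm using SHA-1 here only because it produces 40-character values like the two above. Which algorithm produced those two particular values is an assumption on my part, and the function name fingerprint and the "tampered" sentence are made up for the example.

```python
import hashlib


def fingerprint(text):
    """Return the SHA-1 hash of the whole text as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


original = "Donor receipts are up this year by 20%."
tampered = "Donor receipts are down this year by 20%."

print(fingerprint(original))
print(fingerprint(tampered))
print(fingerprint(original) == fingerprint(tampered))  # False
```

Changing one word gives a completely different value. SHA-1 is no longer recommended where security matters, and swapping in hashlib.sha256 is a one-word change.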
This is some neat stuff, isn't it? You can verify everything from passwords to the whole text of Moby-Dick using very simple tools that are publicly available. In a way, hashes are a direct replacement of the "one-time pads" used by spies back in the Cold War days.
But there's one problem.
If you have enough time and computing power, you can break the hash by guessing inputs until one matches, and so learn the original text or input.
There are a lot of old secret files out there that were protected using hashes and ciphers from the Seventies. Back then, the idea of trying a few billion or trillion guesses seemed impossible. Today's computers can do it in seconds. The same goes for encryption keys: if you sent someone in Switzerland an encrypted banking message in 1998 using the 56-bit encryption that was the international standard at the time... Surprise! Somebody can break it in under a day now.
The modern crypto hashes are complex enough that solving them would take until the heat death of the universe.
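A back-of-the-envelope calculation shows the size of the gap. The guessing rate below is a number I made up for illustration, not a measurement of any real machine, and I'm assuming a modern hash with a 256-bit output, which is the output size of SHA-256.

```python
GUESSES_PER_SECOND = 10**12  # assumed rate, purely for illustration
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_YEAR = SECONDS_PER_HOUR * 24 * 365


def seconds_to_try_everything(bits):
    """Seconds needed to try all 2**bits possibilities at the assumed rate."""
    return 2**bits / GUESSES_PER_SECOND


hours_for_56_bits = seconds_to_try_everything(56) / SECONDS_PER_HOUR
years_for_256_bits = seconds_to_try_everything(256) / SECONDS_PER_YEAR

print(hours_for_56_bits)   # about 20 hours
print(years_for_256_bits)  # about 3.7e57 years
```

At that assumed rate, every 56-bit possibility falls in under a day, while a 256-bit search takes a number of years with 58 digits in it.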
Bitcoin uses hashes in a couple of different ways. If you want a more technical explanation, you can click here. If you want a usable explanation, then check back tomorrow for Pete's conclusion article!