From https://xkcd.com/538/

When you register for websites or online services, you have to set a password so that you can log in again in the future. Your username and password need to be stored in a database so that when you ask to log in, the server can verify your details are correct and allow you access.

Let's look at the basic way of doing this (btw, the WRONG way) and then work our way up to how most websites should be storing your password.

Version 1 - Plain-text

Joe has registered on my website and I have chosen to store his password in "plain-text". This means I store his password exactly as he typed it, with no extra protection. So in my database I store:

Username: Joe

Email: [email protected]

Password: 12345

Yes, it's a bad password. But you'd be surprised how many people use that one. (see top passwords on Gawker leak: http://blogs.wsj.com/digits/2010/12/13/the-top-50-gawker-media-passwords/)

Now when Joe tries to log in to my website, I look at the password he gave me and compare it to the one in my database. Let's say Joe gives me his password "12345" - Hurrah! It matches! I can let him log in and access my lovely website.
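To make the plain-text check concrete, here is a minimal sketch. The `users` dictionary stands in for the database, and the function name `login_plaintext` and the email value are placeholders I've made up for illustration, not anything a real site uses.

```python
# Hypothetical in-memory "database"; a real site would use a database server.
# The email address is a placeholder.
users = {
    "Joe": {"email": "joe@example.com", "password": "12345"},
}


def login_plaintext(username: str, password: str) -> bool:
    """Return True if the supplied password equals the stored plain-text one."""
    record = users.get(username)
    if record is None:
        return False  # unknown user
    return record["password"] == password


print(login_plaintext("Joe", "12345"))    # True
print(login_plaintext("Joe", "letmein"))  # False
```

Notice that anyone who can read `users` can read Joe's password directly.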

What are the problems with this? First, anybody running the website can easily look into the database and read the passwords of all their users. Ideally, not even the website's admins should be able to learn your password. Second, all the security depends on the database. If somebody manages to break into the website, they may be able to break into the database and download all the usernames, emails and passwords.

We need a better form of security.

Version 2 - Password Hashing

Now we are going to secure our passwords with something called "hashing". We use a mathematical procedure called a "hash function" to turn your password into a piece of nonsensical data. There are many different hash functions we could use, but ideally they need to have these properties:

  • One-way only
    • This means if we take a password and run it through a hash function, we cannot reverse the process: you can't take the password hash, run it through a modified version of the hash function and get the original password back.
    • The function relies on some complex mathematics so that nobody knows a practical way of reversing it. (Strictly speaking, reversing it is computationally infeasible rather than provably impossible.)
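  • No (practical) collisions
    • We don't want two passwords resulting in the same password hash. For example, if "12345" and "password" resulted in the same password hash, people would be able to log in with either of these passwords. Because a hash has a fixed length, collisions must exist in theory; a good hash function makes them practically impossible to find.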
    • This will make more sense after an example.

So, for this example we're going to use a famous hash function called MD5. (Collisions have actually been found in MD5, and there are better functions available now, but for this example we'll use a popular one.)

When Joe registers, instead of storing his password in plain-text, we store the result of the hash function.

Username: Joe

Email: [email protected]

Password: 827ccb0eea8a706c4c34a16891f84e7b

You can see that the hash of "12345" is a long piece of text that is impossible to understand.
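If you want to reproduce that hash yourself, here is a small sketch using Python's standard `hashlib` module. The helper name `md5_hex` is my own; MD5 is used only because it is the function in this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("12345"))  # 827ccb0eea8a706c4c34a16891f84e7b

# The output always has the same length, whatever the input length.
print(len(md5_hex("a much longer password than Joe's")))  # 32
```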

Now, when Joe tries to log in, we take his password. We run the hash function on the password he gave us and we compare the two hashes instead. If he gives us "12345", we will run it through the hash function, check the resulting password hash and if it matches the hash we have in the database - Hurrah! We have logged Joe into the site again.
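The login step just described might look roughly like this. It's a sketch, not a production design: the `users` dictionary and the names `md5_hex` and `login_hashed` are assumptions for illustration. The stored value is the hash from the record above.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical in-memory "database": only the hash is stored, never the password.
users = {
    "Joe": {"password_hash": "827ccb0eea8a706c4c34a16891f84e7b"},
}


def login_hashed(username: str, password: str) -> bool:
    """Hash the supplied password and compare it with the stored hash."""
    record = users.get(username)
    if record is None:
        return False  # unknown user
    candidate = md5_hex(password)
    # compare_digest takes the same time wherever the first mismatch is.
    return hmac.compare_digest(candidate, record["password_hash"])


print(login_hashed("Joe", "12345"))  # True
print(login_hashed("Joe", "54321"))  # False
```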

But is this really safe enough?

Note that this time, we never store the plain password. So an admin can't look through the database and read everyone's passwords. But there is still a flaw in this system. What if we built a massive database of every single possible combination of letters, numbers and symbols (up to some length), ran the same MD5 hash function over every possibility and saved the results? It would take a very long time to calculate, but people have done exactly this. They have created databases where you can type in a password hash, and it will search through their massive databases trying to find the password that originally created it.
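Here is a toy version of such a lookup database. The five-word list is a stand-in I made up for the massive precomputed tables described above. The idea is the same: hash every candidate once, then look up leaked hashes instantly.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Tiny illustrative candidate list; real tables cover billions of passwords.
common_passwords = ["password", "123456", "12345", "qwerty", "letmein"]

# Precompute once: map each hash back to the password that produced it.
lookup_table = {md5_hex(p): p for p in common_passwords}

# Later, a hash leaked from some website's database can be looked up directly.
leaked_hash = "827ccb0eea8a706c4c34a16891f84e7b"
print(lookup_table.get(leaked_hash))  # 12345
```

The expensive hashing work is done once, ahead of time, and then reused against every website that stores plain MD5 hashes.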

This is the problem of everybody using the same hash functions. But there are very few available that are secure and strong enough.

However, there is a solution to this problem too.

Version 3 - Salted hashes

Salting is almost exactly the same as password hashing, but with one minor difference. We add a new piece of data to each user in our database. For this example, I'm going to generate a random piece of text for Joe using a random text generator.

For Joe, we generated a random piece of text "b5h64h0c78FbXWJHKl7DDKKE35d6SO". We shall call this his "password salt". We store this alongside his username and email address in the database.

Now, instead of storing the hash of only his password, we add our salt to his password first. Instead of performing the hash function on "12345", we perform the hash function on "12345b5h64h0c78FbXWJHKl7DDKKE35d6SO". Notice it starts with Joe's normal password, but we add our salt onto the end. This gives us a new password hash to store.

Username: Joe

Email: [email protected]

Password: f88378f45a99a13be6f42cefbd80e976

Salt: b5h64h0c78FbXWJHKl7DDKKE35d6SO
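Registration and login with a salt could be sketched like this. The function names, the 30-character salt length (matching Joe's salt) and the use of Python's `secrets` module as the "random text generator" are my assumptions. MD5 is kept only to stay consistent with the example.

```python
import hashlib
import hmac
import secrets
import string

SALT_ALPHABET = string.ascii_letters + string.digits


def generate_salt(length: int = 30) -> str:
    """Return a random salt drawn from letters and digits."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def salted_md5_hex(password: str, salt: str) -> str:
    """Hash the password with the salt appended, as in the example above."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


def register(username: str, password: str) -> dict:
    """Build the record to store: username, salt and salted hash."""
    salt = generate_salt()
    return {
        "username": username,
        "salt": salt,
        "password_hash": salted_md5_hex(password, salt),
    }


def login_salted(record: dict, password: str) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    candidate = salted_md5_hex(password, record["salt"])
    return hmac.compare_digest(candidate, record["password_hash"])


joe = register("Joe", "12345")
print(login_salted(joe, "12345"))  # True
print(login_salted(joe, "54321"))  # False
```

The salt is stored in the clear next to the hash. It does not need to be secret, only different for every user.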

So now, we have effectively made Joe's password very long. It would take way too long for somebody to pre-calculate every single possibility up to the length of a 35-character password, which is what the salt forces them to do. This is why it's vital that websites give each user their own salt: since every user has a completely different salt, a pre-calculated database built for one salt is useless against everyone else, and building one that covers every possible salt would take centuries of computation. (An attacker who steals the database also gets the salts and can still guess weak passwords like "12345" one user at a time, so salting is no substitute for a good password.)
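A quick sketch of why per-user salts defeat precomputed databases: two users with the same password end up with different hashes, and neither hash appears in a table built from unsalted passwords. "Ann" is a hypothetical second user, and the short word list again stands in for a real lookup database.

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Toy precomputed table of unsalted hashes (illustrative only).
lookup_table = {md5_hex(p): p for p in ["password", "123456", "12345", "qwerty"]}

# Each user gets their own random salt (30 hexadecimal characters here).
joe_salt = secrets.token_hex(15)
ann_salt = secrets.token_hex(15)  # "Ann" is a made-up second user

# Both users chose the same weak password.
joe_hash = md5_hex("12345" + joe_salt)
ann_hash = md5_hex("12345" + ann_salt)

print(lookup_table.get(md5_hex("12345")))  # 12345 (the unsalted hash is found)
print(joe_hash == ann_hash)                # expected: False
print(lookup_table.get(joe_hash))          # expected: None
```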

Recently, Gawker (a website network including Fleshbot, Deadspin, Lifehacker, Gizmodo, io9, Kotaku, Jalopnik and Jezebel) was hacked and its database was compromised. They did not use password salting. Millions of passwords were instantly looked up in large password hash databases. It's hard to know how many other websites out there don't salt their password hashes.

We've skimmed over a lot of password security, but I thought it would be helpful to explain the essentials of how your data is secured. After the recent media hype over hacked systems, people suddenly seem to care about their online information. Just wait until people get into your Facebook. If you don't want people to know about it, don't put it online.

How can we improve this further? Look up Two-Factor Authentication. Banks (and recently Gmail: http://googleblog.blogspot.com/2011/02/advanced-sign-in-security-for-your.html) implement it, and it will keep your account significantly more secure: http://en.wikipedia.org/wiki/Two-factor_authentication

Password Hashing - How to make it not suck. A basic guide.