Hash (cryptography)
In cryptography, a hash or message digest is a fixed-size value which can be calculated from an input text of any size up to some large limit. Although hashes are built on cryptographic principles, they are used quite differently from ciphers that protect the full text of a communication, whether two-way or one-way. The primary applications of hashes and message digests are as means of error detection, source authentication, or data integrity protection.
Applications
Hashes provide various kinds of authentication service, not the secrecy that other cryptographic primitives (block ciphers, stream ciphers and public key techniques) provide.
Used alone, an unkeyed hash can provide error-checking. The sender calculates a hash and stores or transmits it with the document. The receiver calculates a new hash from the document he receives, or the reader calculates one for the document he pulls from the archive. He then compares his new hash with the one the sender calculated; if they match, it is overwhelmingly likely that the document has been transmitted or read without error.
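The check described above can be sketched in a few lines of Python. This is only an illustration: the choice of SHA-256, the sample text and the helper name digest are assumptions made for the example, not part of any standard procedure.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string (illustrative helper)."""
    return hashlib.sha256(data).hexdigest()


# Sender side: compute the hash and store or transmit it with the document.
document = b"The quick brown fox jumps over the lazy dog"
sent_hash = digest(document)

# Receiver side: recompute the hash over whatever actually arrived.
received_ok = document
received_damaged = b"The quick brown fox jumps over the lazy cog"  # one letter changed

print("intact copy matches: ", digest(received_ok) == sent_hash)
print("damaged copy matches:", digest(received_damaged) == sent_hash)
```

Any accidental change to the document, even a single letter, gives a different hash with overwhelming probability, so the comparison fails for the damaged copy.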
That technique handles noisy lines or "bit rot" in an archive, but it is useless against an adversary who intentionally changes the data. The enemy simply calculates a new hash for his changed version and stores or transmits that instead of the original hash. To block this takes a keyed hash, a hashed message authentication code or HMAC. Sender and receiver share a secret key; the sender hashes using both the key and the document data, and the receiver verifies using both. Lacking the key, the enemy cannot alter the document undetected.
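A minimal sketch of the keyed case, using the hmac module from the Python standard library. The key, the messages and the helper names make_tag and verify are made up for illustration, and SHA-256 is an assumed choice of underlying hash.

```python
import hashlib
import hmac


def make_tag(key: bytes, document: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the document (illustrative helper)."""
    return hmac.new(key, document, hashlib.sha256).digest()


def verify(key: bytes, document: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, document), tag)


shared_key = b"illustrative shared secret"  # known only to sender and receiver
document = b"Pay Alice 10 dollars"
tag = make_tag(shared_key, document)

# The enemy changes the document. Without the key, the best he can do is
# keep the old tag or substitute an unkeyed hash of his own version.
forged_document = b"Pay Mallory 1000 dollars"
forged_tag = hashlib.sha256(forged_document).digest()

print(verify(shared_key, document, tag))
print(verify(shared_key, forged_document, tag))
print(verify(shared_key, forged_document, forged_tag))
```

Only the first check succeeds. The comparison uses hmac.compare_digest rather than == so that the time taken does not reveal how many leading bytes of a forged tag were correct.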
Hashes are also an essential component of digital signature algorithms, along with public key encryption. A signature is essentially a hash encrypted with the signer's private key. To verify a signature, decrypt it with the signer's public key and check that the decrypted hash matches one for the received document.
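The hash-then-sign idea can be illustrated with a toy example. Everything below is an assumption made for the sketch: the key is a textbook RSA key with a 12-bit modulus, the hash is reduced modulo that tiny modulus, and there is no padding. Real signature schemes use keys of thousands of bits and a standardized padding method, so this shows only the shape of the computation.

```python
import hashlib

# Toy RSA key, far too small for real use (assumed values for illustration).
N = 3233   # modulus, 61 * 53
E = 17     # public exponent
D = 2753   # private exponent; E * D leaves remainder 1 modulo 3120 = 60 * 52


def toy_hash(document: bytes) -> int:
    """SHA-256 of the document, reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """'Encrypt' the hash with the private exponent."""
    return pow(toy_hash(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public exponent and compare hashes."""
    return pow(signature, E, N) == toy_hash(document)


document = b"I owe Bob 5 dollars"
signature = sign(document)

print(verify(document, signature))            # genuine signature verifies
print(verify(document, (signature + 1) % N))  # altered signature is rejected
# An altered document is also rejected, except for accidental matches,
# which are possible only because this modulus is so small.
print(verify(b"I owe Bob 500 dollars", signature))
```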
Hashes are also commonly used as a mixing operation in random number generators.
Design considerations
The main design requirements for a hash are that it be difficult for an enemy to:
- find two inputs that hash to the same result (collision resistance)
- given a hash, find an input that gives that result (pre-image resistance)
- given an input, find another input that hashes to the same result (second pre-image resistance)
An ideal hash resists all of these.
MD4 and descendants
MD4
Message Digest algorithm number 4 was from Ron Rivest. It is no longer used, replaced by its descendants. A specification is in RFC 1320.
MD5
MD5 was Rivest's version of an enhanced MD4. Like MD4, it gives a 128-bit hash. RFC 1321 gives a specification and RFC 1810 a performance analysis.
SHA
There is a whole family of SHA hashes, all designed by the NSA. The original SHA was essentially an improved MD4, with two major changes. It increased the hash size from 128 to 160 bits, using five 32-bit words of internal state instead of four. Also, there is an expansion step which spreads each 16-word input block out to 80 words. One of those words is then mixed in at each round of the hash. The original SHA was not much used and was quickly replaced by SHA-1.
SHA-1
SHA-1 is a slightly modified SHA, also giving a 160-bit hash. It adds a one-bit rotation to the expansion step. The NSA have never explained why they felt this change was necessary; presumably it protects against some attack which they do not wish to reveal.
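The difference between the two expansion steps can be sketched as follows. This is an illustration of the word expansion only, not a full hash: the function names and the sample block are made up for the example, and the recurrence is the one given in the SHA-1 specification, with the rotation switched off to model the original SHA.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def expand(block_words, rotate: bool):
    """Expand a 16-word block to 80 words.

    rotate=False models the original SHA; rotate=True adds the one-bit
    rotation that distinguishes SHA-1.
    """
    if len(block_words) != 16:
        raise ValueError("a block is exactly sixteen 32-bit words")
    w = list(block_words)
    for t in range(16, 80):
        x = w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]
        w.append(rotl32(x, 1) if rotate else x)
    return w


block = list(range(1, 17))  # arbitrary sample block, words 1..16
sha0_words = expand(block, rotate=False)
sha1_words = expand(block, rotate=True)

print(len(sha0_words), len(sha1_words))
print(sha0_words[16], sha1_words[16])  # first expanded word in each variant
```

Without the rotation, each bit position of the expanded words depends only on the same bit position of the input words. The rotation makes bit positions interact with each other during expansion.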
A specification is in RFC 3174. The US government standard is FIPS 180-1.
SHA-1 is in very wide use. For example, it is used in protocols such as PGP and IPsec and in random number generators such as Intel's hardware generator and the software random device in Linux.
SHA-2
SHA-2 is a family of hashes standardized by the US National Institute of Standards and Technology, NIST. The standard is FIPS 180-2. The design is based on SHA.
There are four new hashes in the standard (SHA-1 is retained as well), named by their hash size: SHA-224, SHA-256, SHA-384 and SHA-512. Because of the birthday attack, when a hash is used with a block cipher, the hash size should be twice the key length of the cipher. SHA-256, 384 and 512 are therefore intended to be used with AES-128, 192 and 256 respectively. SHA-224 is for use with Triple DES, which has only 112-bit strength.
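The birthday attack behind this sizing rule can be demonstrated on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits, an assumption made purely so the search finishes quickly, and looks for two different inputs with the same truncated hash. By the birthday bound, a collision on an n-bit hash is expected after roughly 2^(n/2) attempts, here on the order of 2^12 rather than 2^24.

```python
import hashlib


def truncated_hash(data: bytes, nbytes: int) -> bytes:
    """First nbytes of SHA-256: a deliberately weak hash for the demonstration."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int):
    """Hash successive counter strings until two share a truncated hash.

    Returns the two colliding inputs and the number of inputs hashed.
    """
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, nbytes)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1


first, second, tries = find_collision(3)  # 3 bytes = 24 bits
print(first, second, tries)
```

The same reasoning is why a 256-bit hash offers about 128 bits of collision resistance, matching a 128-bit cipher key.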
In internal structure, the four SHA-2 hashes are identical except the 384-bit and 512-bit versions use 64-bit variables while the 256-bit and 224-bit versions use 32-bit variables. SHA-384 is identical to SHA-512 except it starts with different constants and truncates the output to 384 bits. SHA-224 has the same relation to SHA-256.
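These relationships can be observed with Python's hashlib, as a small sketch. The message is arbitrary. Because the truncated variants start from different constants, a SHA-224 hash is not simply the first 224 bits of the SHA-256 hash of the same input, and likewise for SHA-384 and SHA-512.

```python
import hashlib

message = b"abc"  # arbitrary sample input

# Hash size and input block size, in bits, for each SHA-2 variant.
for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, message)
    print(name, h.digest_size * 8, h.block_size * 8)

sha224 = hashlib.sha224(message).digest()
sha256 = hashlib.sha256(message).digest()
sha384 = hashlib.sha384(message).digest()
sha512 = hashlib.sha512(message).digest()

# Truncating the longer hash does not reproduce the shorter one.
print(sha224 == sha256[:28])
print(sha384 == sha512[:48])
```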
As of late 2008, no attacks are known against the SHA-2 group of algorithms, but attacks have been found against MD4, MD5 and SHA-1, so there is some cause for worry that eventually SHA-2 might fall. Playing it safe, NIST are therefore now working on an Advanced Hash Standard, also known as SHA-3, which could replace SHA-2 if that should become necessary.
RIPE-MD
This was a European standard.
Other 20th century hashes
Tiger
Whirlpool
The Advanced Hash Standard
In 2005, the US National Institute of Standards and Technology (NIST) began the process of defining a new hash standard, SHA-3 or the Advanced Hash Standard or just AHS. There is a NIST page with details and links.
The overall process and methodology are similar to what they did for the AES contest, which chose a new cipher standard that became the Advanced Encryption Standard. Starting in 2005, they sponsored two public workshops to discuss the state of the hashing art, then issued a draft requirements document and invited public comment. After revising the requirements, they issued a call for submissions in November 2007. The deadline for submissions was October 31, 2008.
As of early November 2008, the deadline has passed and NIST have received 64 entries. They are going through them to see which ones actually meet all submission criteria. Once that is done, those "complete and proper" submissions will become the first round candidates and all their design documents will be public on the NIST site. Meanwhile, there are at least two other sites with partial lists and links to design documents, the SHA-3 Lounge and the SHA-3 Zoo.
There will be more conferences, then a narrowing of the field to a group of finalists, more analysis and another conference, then a final selection. Target date for completion of the process and release of the new standard is 2012.
Skein
From Bruce Schneier and others: [1]
MD6
From a team led by Ron Rivest.
CubeHash
From Dan Bernstein, [2]
Essence
From Jason Worth Martin [3]
Sgàil
From Peter Maxwell [4]
EnRUPT
From Sean O'Neil [5]
NaSha
From Smile Markovski and Aleksandra Mileva [6]
Maraca
From Robert Jenkins [7]