A one-way hash is conceptually similar to parity. A calculation is performed on the data before the data is transmitted or stored, and the result of the calculation is stored with the data. Before the data is used again, the calculation is performed again and the new result is compared with the stored result. If the results match, then the data has not changed, and thus the integrity of the data has been preserved.
Hashing algorithms are similar to encryption algorithms, but with one very distinct difference. While a secret key is used to make the results of an encryption algorithm unpredictable, a hashing algorithm has to be 100% predictable. If it were not predictable then it would not be repeatable, and thus its results would be unverifiable.
A hashing algorithm functions by accepting the document as an input value. The algorithm then processes the document over many iterations, and on each iteration discards data from the document. The result is a fixed-length string that is directly based on the document. Hash algorithms are very sensitive to change: changing a single bit in the source document will, on average, change about 50% of the bits in the hash. Also, and this is one of the few places the word impossible will be used, it is impossible to derive the document from the hash. This is because the hash is an incomplete document; pieces of data are missing.
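The three properties described above (predictable, fixed length, sensitive to change) can be observed directly. The following is a minimal sketch using Python's standard hashlib module with SHA-1; the helper names sha1_bits and differing_bits and the sample message are illustrative assumptions, not part of any standard.

```python
import hashlib


def sha1_bits(data: bytes) -> str:
    """Return the SHA-1 digest of data as a string of 160 '0'/'1' characters."""
    digest = hashlib.sha1(data).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-1 digests of a and b differ."""
    return sum(x != y for x, y in zip(sha1_bits(a), sha1_bits(b)))


# Hypothetical sample document.
original = b"Pay Alice 100 dollars"
# Flip the lowest bit of the first byte: 'P' (0x50) becomes 'Q' (0x51).
altered = bytes([original[0] ^ 0x01]) + original[1:]

# Predictable: hashing the same input twice gives the same result.
assert sha1_bits(original) == sha1_bits(original)
# Fixed length: 160 bits whether the input is empty or very large.
assert len(sha1_bits(b"")) == len(sha1_bits(original * 1000)) == 160

print(differing_bits(original, altered), "of 160 bits changed")
```

The printed count varies with the input, but for a one-bit change it should land somewhere near 80, which is half of the 160-bit digest.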
The two most common hashing algorithms are MD5 (Message Digest 5), designed by Ronald Rivest of RSA, and SHA-1 (Secure Hash Algorithm 1), designed by the NSA. MD5 produces a 128-bit hash from any data stream, while SHA-1 produces a 160-bit hash. All else being equal, a longer hash is more resistant to attack, and thus SHA-1 is the more commonly used of the two.
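The two output sizes can be checked with a short sketch, again assuming Python's standard hashlib module. The helper name digest_bits is hypothetical.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Return the output size, in bits, of the named hashlib algorithm."""
    # digest_size is reported in bytes, so multiply by 8.
    return hashlib.new(algorithm, b"any data stream").digest_size * 8


for name in ("md5", "sha1"):
    print(name, digest_bits(name), "bits")
```

This prints 128 bits for md5 and 160 bits for sha1, no matter how long the input is.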
The process of hashing can be represented as follows.
Hash{Data} = y
The data is then transmitted as Data+y.
When the data is received, the value of y is extracted and the integrity is checked as follows.
If Hash{Data} = y then data is valid
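The exchange above can be sketched in a few lines. This is an illustration only: it assumes SHA-1 as the hash and assumes that Data+y means the digest is simply appended to the end of the data. The function names package and verify are hypothetical.

```python
import hashlib
from typing import Tuple

# Length of y in bytes (20 for SHA-1), needed to split Data+y apart again.
DIGEST_LEN = hashlib.sha1().digest_size


def package(data: bytes) -> bytes:
    """Sender: compute y = Hash{Data} and return Data+y."""
    y = hashlib.sha1(data).digest()
    return data + y


def verify(packet: bytes) -> Tuple[bytes, bool]:
    """Receiver: extract y, recompute Hash{Data}, and compare the two."""
    data, y = packet[:-DIGEST_LEN], packet[-DIGEST_LEN:]
    return data, hashlib.sha1(data).digest() == y


packet = package(b"Pay Alice 100 dollars")
data, valid = verify(packet)
print(data, valid)

# Accidental corruption of the data alone is detected.
corrupted = b"Pay Alice 900 dollars" + packet[-DIGEST_LEN:]
print(verify(corrupted)[1])
```

The first check reports the data as valid; the second reports it as invalid, because the recomputed hash no longer matches the stored y.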
Hashing is subject to one major security weakness. It is possible for a user in the middle to intercept the Data+y packet, change the data, recalculate the value of y, and then transmit the modified packet to the intended recipient. The recipient has no way of validating that the data received is in fact the data sent. For hashing to be an effective data integrity technology, the hash itself must be protected.
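The weakness can be demonstrated with a small sketch. As before, it assumes SHA-1 and a Data+y packet formed by appending the digest to the data. The names package, verify, and intercept_and_modify are hypothetical, as is the sample message.

```python
import hashlib
from typing import Tuple

DIGEST_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def package(data: bytes) -> bytes:
    """Build Data+y, where y = Hash{Data}."""
    return data + hashlib.sha1(data).digest()


def verify(packet: bytes) -> Tuple[bytes, bool]:
    """Split Data+y and check that Hash{Data} equals y."""
    data, y = packet[:-DIGEST_LEN], packet[-DIGEST_LEN:]
    return data, hashlib.sha1(data).digest() == y


def intercept_and_modify(packet: bytes) -> bytes:
    """User in the middle: change the data, then recalculate y to match."""
    data = packet[:-DIGEST_LEN]
    forged = data.replace(b"100", b"900")
    # No secret is needed to hash, so anyone can produce a valid y.
    return package(forged)


sent = package(b"Pay Alice 100 dollars")
received = intercept_and_modify(sent)
data, valid = verify(received)
print(data, valid)
```

The recipient's check succeeds even though the data was altered in transit, which is why an unprotected hash guards only against accidental change and not against deliberate tampering.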