Title:

RSA Data Security, Inc. MD5 Message Digest Algorithm

Language:

C

Author:

Philip J. Erdelsky

Date:

August 8, 2002

Usage:

Public domain; no restrictions on use

Portability:

Any C environment, also compiles as C++

Keywords:

MD5, cryptography

Abstract:

A C package to calculate the RSA Data Security, Inc. MD5 message digest.

Source code:

md5.txt

Introduction

The code in this package is taken mainly from Ronald Rivest's original 1992 memorandum (see www.ietf.org/rfc/rfc1321.txt). It computes the 128-bit RSA Data Security, Inc. MD5 message digest of a byte stream. Although the MD5 algorithm defines a digest for bit streams of any size, this package handles only streams of 8-bit bytes.

Although most of the code in Rivest's original memo has been retained, it has been cleaned up quite a bit:

The modified code compiles as either C or C++; the original compiled only as C.
The modified code has prototypes and "const" pointers to force extra error checking.
The code has been indented in a uniform way.

How to Use the Package

The package resides in the files md5.h and md5.c. The file md5.txt contains both files in text form.

Use a text editor to separate the files. Include the file md5.h in any source module that calls on the package. Compile the file md5.c and link it to the rest of your application.

If the package is used on a little-endian machine, such as the Pentium family, define the preprocessor symbol LITTLE_ENDIAN. MD5 treats its input as little-endian 32-bit words, so the package can then skip the byte-swapping that big-endian machines require.

To compute the message digest of a byte stream, first define a context buffer and open it:

     MD5Open(pmd5);

     MD5 *pmd5;                pointer to MD5 context buffer

Then feed the byte stream, a block at a time, to the MD5Digest() function:

     MD5Digest(pmd5, buffer, buflen);

     MD5 *pmd5;                pointer to MD5 context buffer

     const void *buffer;       pointer to first byte of block

     unsigned buflen;          number of bytes in block

Finally, close the context buffer and obtain the digest:

     MD5Close(pmd5, digest);

     MD5 *pmd5;                pointer to MD5 context buffer

     unsigned char digest[16]; receives the 16-byte digest

For example, the following code computes the digest of a file:

     #include <stdio.h>
     #include "md5.h"
     ...

     MD5 md5;
     FILE *fp = fopen(filespecs, "rb");   /* binary mode: no newline translation */
     char buffer[64];
     unsigned char digest[16];
     size_t length;
     if (fp != NULL)
     {
       MD5Open(&md5);
       while ((length = fread(buffer, 1, sizeof(buffer), fp)) > 0)
         MD5Digest(&md5, buffer, (unsigned)length);
       MD5Close(&md5, digest);
       fclose(fp);
     }

Although MD5Digest() can handle blocks of any size (even zero), it operates internally on 64-byte blocks, so it is most efficient when given blocks whose sizes are exact multiples of 64 bytes.

Applications of Message Digests

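Two different byte streams can obviously give the same digest, since there are only 2^128 possible digests, but when this package was written it was considered so difficult to find two such streams that the digest could be treated as unique for all practical purposes. Practical methods for finding MD5 collisions have been published since 2004, however, so MD5 should not be relied on where collision resistance matters, as in digital signatures.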

One use of message digests is in digital signatures. Public key cryptography is used to obtain a pair of keys, one public and one private. A digest of the message is computed, and the private key is used to encrypt the digest. The encrypted digest is the signature.

The signature can be verified by using the public key to decrypt it and comparing the result to a digest computed from the message. If they match, the message must have been signed by someone having access to the private key.

Of course, it would be possible to dispense with the digest and sign the entire message piecemeal. However, this is seldom done because public key encryption is much slower than the computation of a message digest.

Message digests can also be used to establish the creation date of a message without revealing its contents. The creator computes a message digest and files the digest with a reliable registrar, who notes the filing date. If the creator later reveals the message, anyone can show that it was created on or before the filing date by recomputing the message digest and comparing it to the filed message digest.

This technique offers reasonable security in most cases because it is virtually impossible to recover the message from the digest, unless it is very short or almost all of it is already known. Even in these special cases the message can be hidden by simply padding it with random gibberish.