Linked-List Memory Sort

Title:

Language:

Author:

Philip J. Erdelsky

Date:

February 23, 1996

Usage:

Public domain; no restrictions on use

Portability:

Any C compiler

Keywords:

Linked List, Sort

Abstract:

A C function to sort a linked list in memory, using an arbitrary comparison function and a tape-merge algorithm, which is O(n log n) in all cases. The function is non-recursive and re-entrant, and changes only the links.

Linked lists and sort routines are very common in programming.

The function sort_linked_list() will sort virtually any kind of singly-linked list, using a comparison function supplied by the calling program. It has the following advantages over qsort():

It sorts elements of variable size.
It requires O(n log n) comparisons, regardless of the initial order. The time to sort a list is predictable and consistent. This performance is known to be optimal for general sorts.
The algorithm is iterative, not recursive, so there is no problem of stack overflow. The stack requirement is predictable, consistent and small.
It sorts the linked list by changing the pointers, not by moving the elements themselves.
It can sort a list too large to fit into an array.

The function sorts only singly linked lists. If a list is doubly linked, the backward pointers can be restored after the sort by a few lines of code.

Each element of a linked list to be sorted must contain, as its first members, one or more pointers. One of the pointers, which must be in the same relative position in each element, is a pointer to the next element. This pointer is NULL in the last element.

The index is the position of this pointer in each element. It is 0 for the first pointer, 1 for the second pointer, etc.

Let compare() be a comparison function that compares two elements as follows:

     n = compare(p, q, pointer);

     void *p, *q;    pointers to two list elements

     void *pointer;  user-defined pointer passed to compare() by
                     linked_list_sort()

     int n;          result of comparing *p and *q
                       >0 if *p is to be after *q in sorted order
                       <0 if *p is to be before *q in sorted order
                        0 if the order of *p and *q is irrelevant

Let "first" be a pointer to the first element of the list. Then the following function call sorts the list and returns a pointer to the first element of the sorted list:

     first_sorted =
       sort_linked_list(first, index, compare, pointer, pcount);

The fourth argument (pointer) is passed to compare() without change. The example given here makes no use of the pointer. However, it can be an invaluable feature if two or more comparison methods share a substantial amount of code and differ only in one or more parameter values.

The last argument (pcount) is of type (unsigned long *). If it is not NULL, then *pcount is set equal to the number of records in the list.

It is permissible to sort an empty list. If first == NULL, the returned value will also be NULL.

For example, an element may contain both a name and a number:

    struct element
    {
      struct element *next_in_alphabetical_order;
      struct element *next_in_numerical_order;
      char name[9];
      int number;
      /* other members, if any */
    };

Initially, the two pointers in each element are identical, and the list is in arbitrary order, where "first" is a pointer to the first element. The following two function calls sort the list twice:

    first_in_alphabetical_order =
      sort_linked_list(first, 0, compare_names, NULL, NULL);
    first_in_numerical_order =
      sort_linked_list(first, 1, compare_numbers, NULL, NULL);

Here are the comparison functions:

    int compare_names(struct element *p, struct element *q,
      void *pointer)
    {
      return strcmp(p->name, q->name);
    }

    int compare_numbers(struct element *p, struct element *q,
      void *pointer)
    {
      return p->number > q->number ? 1 : -1;
      /* NOTE: return p->number - q->number will suffice if there is
      no danger of numeric overflow */
    }

A previous version of this sort, which was published in a technical journal, required the forward link to be at the beginning of each element. While this made the sort somewhat more efficient, it also made it hard to use with multiply linked lists.

The algorithm is fairly simple. The linked list (called a "tape" by analogy to old magnetic tapes) is first divided into two tapes of the same or nearly the same size. This is done by "writing" elements to the two tapes alternately.

The two tapes are then merged, element by element, into sorted blocks containing two elements each. The blocks are written alternately to two other tapes.

The two tapes are then merged, block by block, into sorted blocks of four elements each, and the blocks are written alternately to two other tapes.

The process continues, doubling the block size each time, until all elements are in a single sorted block on one tape. The function returns a pointer to the first element of that tape.

Each merge requires O(n) comparisons, where n is the number of elements. The number of merges is O(log n). Hence the entire sort takes O(n log n) comparisons.

The sort_linked_list() function is reentrant if the comparison function is reentrant.

C aficionados will be glad to know that this algorithm cannot be coded easily in Ada or Pascal because it relies on typecasts.

Source code in text format:

llmsort.c