Determinants

Philip J. Erdelsky

March 23, 2009

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

1. Definition

Let A be an n ⨯ n matrix. Usually the entries in A are taken from a field, but many of the results also apply to matrices whose elements are taken from more general structures, such as commutative rings.

Let the element in the i-th row and the j-th column of A be represented in the usual way by a_i,j. The determinant of A, which is usually written as det(A) or ∣A∣, is the sum of all n! products of the form

     n
s_p  ∏ a_i,p(i) ,
    i=1

where p is a permutation of {1,2,3,...,n} and s_p is +1 if p is even and -1 if p is odd.

Each product contains exactly one element from each row and exactly one element from each column. The sum contains all such products.

For small matrices, we can compute the determinant directly:

n det(A)

1 a₁₁

2 a₁₁a₂₂ - a₂₁ a₁₂

3 a₁₁ a₂₂ a₃₃ - a₁₁ a₂₃ a₃₂ + a₁₂ a₂₃ a₃₁ - a₁₂ a₂₁ a₃₃ + a₁₃ a₂₁ a₃₂ - a₁₃ a₂₂ a₃₁

n	det(A)
1	a₁₁
2	a₁₁a₂₂ - a₂₁ a₁₂
3	a₁₁ a₂₂ a₃₃ - a₁₁ a₂₃ a₃₂ + a₁₂ a₂₃ a₃₁ - a₁₂ a₂₁ a₃₃ + a₁₃ a₂₁ a₃₂ - a₁₃ a₂₂ a₃₁

For every permutation p of {1,2,3,...,n},

1 + 2 + ... + n = p(1) + p(2) + ... + p(n)

because the right side contains the same addends, possibly in a different order. Hence if p(i) > i for some i, then p(j) < j for some j≠i. Hence if a product in the definition of det(A) contains an element above the main diagonal, it also contains an element below the main diagonal.

If a matrix is either upper triangular (a_i,j=0 when i>j), lower triangular (a_i,j=0 when i<j) or diagonal (a_i,j=0 when i≠j), then the determinant reduces to a single term corresponding to the identity permutation, because every other term contains a zero factor:

det(A) = a_1,1 a_2,2 ... a_n,n when A is triangular or diagonal.

In particular, the determinant of the identity matrix is 1 and the determinant of the zero matrix is 0.

If a matrix contains a row of all zeros, or a column of all zeros, its determinant is zero, because every product in its definition must contain a zero factor.

2. Basic Properties

Theorem 2.1. The determinant of a matrix is equal to the determinant of its transpose.

Proof. The determinant of A^T contains the following products:

    n
s_p  ∏ a_p(i),i ,
   i=1

Now let q be the inverse of the permutation p and use it to rearrange the factors:

    n
s_p  ∏ a_i,q(i) ,
   i=1

Since the parity of q is the same as the parity of p, s_q = s_p, so this becomes

    n
s_q  ∏ a_i,q(i) ,
   i=1

Hence the definition of det(A^T) contains the same terms as the definition of det(A), possibly in a different order. █

Therefore, many results regarding the rows of a determinant also apply, mutatis mutandis, to its columns.

Theorem 2.2. The determinant of a matrix is a linear function of each of its rows (or columns).

Proof. Multiplication of a row by a constant factor introduces the same factor into every product in the definition of the determinant, so it also multiplies the determinant by the same factor.

Consider the first row. If A and B differ only in their first row, then the determinants of A and B can be written as sums of the following products, respectively:

             n
s_p  [a_1,p(1) ∏ a_i,p(i)]
            i=2

             n
s_p  [b_1,p(1) ∏ a_i,p(i)]
            i=2

Then we can add the sums term by term to get the corresponding term in the sum of the two determinants:

                       n
s_p  [(a_1,p(1)+b_1,p(1)) ∏ a_i,p(i)]
                      i=2

This is the determinant of the matrix obtained by adding the first row of A to the first row of B.

The proof for other rows is similar. █

Theorem 2.3. If any two rows (or columns) of a matrix are interchanged, the sign of its determinant is changed.

Proof. This changes the parity of each permutation in the definition of the determinant, so it changes the sign of each term, and hence the sign of the determinant. █

Theorem 2.4. If any two rows (or columns) of a matrix are proportional, its determinant is zero.

Proof. This is obvious if the proportionality factor is zero. In other cases, multiply one row by a constant factor so it becomes equal to the other. This multiplies the determinant by the same nonzero factor. The resulting determinant has two equal rows, so its determinant is unchanged if the rows are interchanged, but by Theorem 2.2 the sign of its determinant is changed. Hence its determinant, and also the determinant of the original matrix, must be zero. █

3. Elementary Row and Column Operations

Recall that multiplying one row of a matrix by a constant and adding the result to another row is called an elementary row operation. The first row is left unchanged and the second one is replaced by the result. In the following example, the first row is multiplied by 2 and added to the second row:

        |  1  2  3  |     |  1  2  3  |
        |  4  1  0  |     |  6  5  6  |
        |  7  8  9  |     |  7  8  9  |

We notice that the two matrices have the same determinant:

1*1*9 - 1*0*8 + 2*0*7 - 2*4*9 + 3*4*8 - 3*1*7 = 9-0+0-72+96-21 = 12
1*5*9 - 1*6*8 + 2*6*7 - 2*6*9 + 3*6*8 - 3*5*7 = 45-48+84-108+144-105 = 12

An analogous operation on columns is called an elementary column operation.

Theorem 3.1. An elementary row (or column) operation on a matrix does not change its determinant.

Proof. By Theorem 2.2 the determinant is a linear function of the row changed by the operation. Moreover, the operation adds the corresponding rows of two matrices, the original matrix and another with two proportional rows, whose determinant is zero. Hence the determinant is unchanged. █

This argument may be easier to follow by examining the example given above:

           |  1  2  3  |         |  1  2  3  |        |  1  2  3  |
       det(|  6  5  6  |)  = det(|  4  1  0  |) + det(|  2  4  6  |),
           |  7  8  9  |         |  7  8  9  |        |  7  8  9  |

The last determinant is zero because the first two rows are proportional.

Theorem 3.2. The determinant of a singular matrix is zero.

Proof. If a matrix is singular, then one of its rows is a linear combination of the others. By applying repeated elementary row operations we can make this row zero without changing the determinant. The determinant of the result is zero, and so was the determinant of the original matrix. █

4. Computing Determinants by Elimination

Elementary row operations can be used to compute determinants. But first, we need a specific result.

Lemma 4.1. If every element of the last column of an n ⨯ n matrix A is zero except a_nn, then its determinant is the product of a_nn and the determinant of the (n-1) ⨯ (n-1) submatrix consisting of the first n-1 rows and columns of A.

Proof. If we delete all products in the definition of det(A) that contain any of the zero elements, the remaining products all contain a_nn, and the associated permutations all keep n fixed. Hence each associated permutation is also a permutation of {1,2,...,n-1}, and its parity is the same as the permutation of {1,2,...,n}. We can factor out a_nn, and what remains is the definition of the determinant of the submatrix. █

The way to compute a determinant can be described recursively.

If n=1, the determinant of the matrix is its sole element.
If n>1,
- If the last column is all zeros, the matrix is singular and its determinant is zero.
- If the last column is not all zeros
  - If a_nn = 0, interchange two rows so it is nonzero.
  - Perform elementary row operations, adding appropriate multiples of the last row to every other row, so that all entries in the last column, except a_nn, become zero.
  - Use the procedure recursively to find the determinant of the (n-1) ⨯ (n-1) submatrix consisting of the first n-1 rows and columns of A.
  - Use Lemma 4.1 to find the determinant of A.
  - If rows of A had to be interchanged, change the sign of the result.

5. The Determinant of a Product

The following elegant result is rather hard to prove.

Theorem 5.1. If A and B are n ⨯ n matrices, then det(AB)=det(A)det(B).

Proof. For particular kinds of matrices, this result easy to prove. We will show that det(AB)=det(A)det(B), first for special matrices, and then for general matrices.

If A is singular, then det(A)=0. Since AB is also singular, det(AB)=0, which establishes the result if A is singular.

Now let A be a diagonal matrix. Then AB is simply the matrix B with each row multiplied by the diagonal element of A in the same row. By Theorem 2.2, det(AB) is the product of det(B) and all the diagonal elements of A. Since det(A) is the product of the diagonal elements of A, this establishes the result when A is diagonal.

Now consider the elementary row operation which multiplies the r-th row of B by c and adds it to the s-th row. It is easy to show, directly from the definition of matrix multiplication, that the result is AB, where A is the matrix with 1's along its principal diagonal, a_sr = c, and zeros elsewhere. Moreover, det(A) = 1 because it is a triangular matrix with all diagonal elements equal to 1. By Theorem 3.1, det(AB) = det(B). Hence det(AB) = det(A)det(B), which establishes the result when A is a matrix of this kind, which we shall call an elementary row operation matrix.

The inverse of an elementary row operation is an elementary row operation with the scalar -c.

Now suppose the matrix A is obtained by interchanging two rows of the identity matrix. Then det(A) = -1. Also, AB is the matrix B with the same two rows interchanged, so det(AB) = -det(B). This establishes the result when A is a matrix of this kind, which we shall call an interchange matrix.

An interchange matrix is its own inverse.

Now any nonsingular square matrix can be converted to a diagonal matrix by some sequence of elementary row operations and row interchanges as follows. This process is vacuous for 1 ⨯ 1 matrices; for larger matrices we use induction on the matrix size.

Since the matrix is nonsingular, the first column contains at least one nonzero element. Exchange two rows, if necessary, to bring a nonzero element to the upper left corner.

Subtract appropriate multiples of the first row from each other row to make all elements of the first row zero, except the element in the upper left corner.

The submatrix obtained by deleting the first row and the first column must be nonsingular. If its rows were dependent, then the last rows of the whole matrix would be dependent in the same way.

Apply the same procedure to the submatrix to diagonalize it. This involves elementary row operations and row exchanges which do not affect the first row.

Finally, subtract appropriate multiples of every other row from the first row to make all elements zero, except the element in the upper left corner.

The resulting matrix is diagonal.

We are now ready to prove that det(AB) = det(A)det(B) for a general nonsingular matrix A.

There is some sequence of elementary row operations and row interchanges which makes A into a diagonal matrix. We can perform these operations in reverse to convert the diagonal matrix back to A. Let R₁, R₂, R₃,..., R_m be the matrices of these reverse operations. Then

R₁ R₂ R₃ ... R_m D = A,

where D is a diagonal matrix. Repeated application of the special-case results shows that

det(R₁) det(R₂) det(R₃) ... det(R_m) det(D) = det(A).

Similarly,

R₁ R₂ R₃ ... R_m D B = AB,

and

det(R₁) det(R₂) det(R₃) ... det(R_m) det(D) det(B) = det(AB),

Hence det(A)det(B) = det(AB), which establishes the result when A is any nonsingular matrix. █

The result can be extended to any finite number of matrices; e.g., det(ABC) = det(A)det(B)det(C).

The result is also valid for matrices over any commutative ring, although this proof depends on an elimination process, which is not always possible in commutative rings. In this case, consider det(AB) - det(A)det(B) as a function of 2 N ² independent variables, which are the entries in A and B. It is a polynomial whose value is zero for all values of the independent variables in any field. For infinite fields, such as the field of real numbers, this is possible only if all coefficients are zero when like terms are combined. Hence the polynomial is also identically zero in any commutative ring.

Theorem 5.2. The determinant of a nonsingular matrix is nonzero, and det(A^-1) = 1/det(A).

Proof. If A is nonsingular, then det(A^-1)det(A) = det(A^-1A) = det(I) = 1. █

Similar matrices have the same determinant, because det(Q^-1AQ) = det(Q^-1) det(A) det(Q) = (1/det(Q)) det(A) det(Q) = det(A). Since two distinct matrices are similar if, and only if, they are matrices of the same linear transformation with respect to different bases, it makes sense to talk of the determinant of a linear transformation, without regard to any particular basis or matrix.

Theorem 5.3. The determinant of the direct sum of two or more square matrices is the product of their determinants:

det(A₁ ⊕ A₂ ⊕ ... ⊕ A_m) = det(A₁) det(A₂) ... det(A_m).

(The matrices need not be the same size.)

Proof. It is fairly easy to show, from the definition, that the result holds for two matrices when one of them is the identity:

det(A ⊕ I) = det(I ⊕ A) = det(A).

The partition theorem can be used to multiply two partitioned matrices:

A ⊕ B = (A ⊕ I_B) (I_A ⊕ B),

where I_A and I_B are identity matrices of the same sizes as A and B, respectively.

Then

det(A ⊕ B) = det(A ⊕ I_B) det(I_A ⊕ B) = det(A) det(B).

The extension to three or more matrices is straightforward. █

6. Cofactors and Minors

In the definition of det(A), where A is an n ⨯ n matrix, let us factor out the elements of the i-th row:

det(A) = A_i1 a_i1 + A_i2 a_i2 + ... + A_in a_in (6.1.1)

The coefficient A_ij is called the cofactor of a_ij.

Since each product in the definition of det(A) contains exactly one element from each row and column, the cofactor A_ij contains no elements from the i-th row or the j-th column. In fact, it is the determinant of a matrix obtained from A by replacing a_ij by 1 and all other elements of the i-th row and j-th column by zeros.

Theorem 6.1. The cofactor A_ij is the product of (-1)^i+j and the determinant of the matrix M_ij obtained from A by deleting the i-th row and j-th column and pushing the remaining elements together to form an (n-1) ⨯ (n-1) matrix.

Proof. This is easiest to see when the element is a₁₁. In this case, M₁₁ consists of the last n-1 rows and the last n-1 columns, and it is not disturbed when a₁₁ is replaced by 1 and other elements in the first row and column are replaced by zeros.

The determinant of the whole matrix is clearly the same as the determinant of M₁₁, which proves the result for this case.

If the element a_ij is elsewhere in the matrix, then we can bring it to the first row without changing the relative positions of the other rows, by i row exchanges. First we exchange the i-th and the (i-1)-th row. Then we exchange the (i-1)-th and (i-2)-th row, and so on until we exchange the first and second rows.

Simlarly, we can perform j adjacent column exchanges to bring the element to the upper left corner.

All these exchanges have multiplied the determinant by (-1)^i+j. The resulting determinant is obviously the determinant of M_ij, whose rows and columns are now together. This proves the result in the general case. █

A matrix formed by deleting some rows and columns of A, such as M_i,j, is called a minor of A.

This theorem can be used recursively with (6.1.1) to compute determinants, but it is impractical for larger matrices because the number of operations increases exponentially with n.

As with most results for determinants, there is a similar result for the j-th column:

det(A) = A_1j a_1j + A_2j a_2j + ... + A_nj a_nj (6.1.2)

Now consider an expression similar to (6.1.1) with elements from the i-th row and cofactors from the k-th row. If i≠k, this is equivalent to (6.1.1) for a matrix with two equal rows. By Theorem 2.4, its determinant is zero. Hence

A_k1 a_i1 + A_k2 a_i2 + ... + A_kn a_in = 1 if i=k, 0 otherwise.

Let C be a matrix of the cofactors of A; i.e., c_ij = A_ij. If A is nonsingular, the product A det(A) C^T = I. Hence (1/det(A)) C^T is the inverse of A.

If the elements of A are taken from a commutative ring, then A has an inverse if, and only if, det(A) has an inverse.

7. Cramer's Rule

Originally, determinants were applied, not to matrices as such, but to systems of n linear equations in n unknowns:

a₁₁ x₁ + a₁₂ x₂ + ... + a_1n x_n = b₁
a₂₁ x₁ + a₂₂ x₂ + ... + a_2n x_n = b₂
* * *
a_n1 x₁ + a_n2 x₂ + ... + a_nn x_n = b_n

This can be written in matrix form as Ax = b, where x and b are column vectors and A is the matrix of coefficients.

Clearly, if A is nonsingular, the system has a unique solution x = A^-1b = (1/det(A))C^Tb, where C is the matrix of cofactors of A. Hence

x_i = (b₁ A_1i + b₂ A_2i + ...+ b_n A_ni) / det(A).

The sum is the cofactor expansion of the determinant of a matrix obtained from A by replacing the i-th column with b. The result can be expressed rather elegantly by Cramer's Rule:

Theorem 7.1. If the determinant D of the matrix of coefficients of a system of n linear equations in n unknowns in a field is nonzero, then the system has a unique solution, and the value of the i-th unknown is the product of (1/D) and the determinant of the matrix obtained by replacing the coefficients of the i-th unknown by the constants.

Of course, the same result applies to a commutative ring, except that in this case the determinant of the coefficients must have an inverse.

Because of the large number of operations required, Cramer's rule is a practical way to solve linear equations only when the number of equations is small.