LOGO

Eigenvalues and Eigenvectors

Philip J. Erdelsky

May 2, 2009

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

1. Definitions

Let A be a square matrix. An eigenvalue of A is a scalar λ such that

Ax = λx
for some nonzero column vector x. The vector x is called an eigenvector corresponding to the eigenvalue.

The above equation can also be written:

(A - λI)x = 0,
where I is the identity matrix (of the same size as A).

The alternative form of the equation is satisfied if, and only if, the matrix A - λI is singular.

This gives us alternative definitions of the eigenvalue. An eigenvalue of the matrix A is a root of the polynomial det(A - zI), which is called the characteristic polynomial of the matrix, or a solution of the characteristic equation det(A - zI) = 0. The eigenvalue itself is also called a characteristic root of the matrix.

2. Basic Properties

A matrix over an algebraically complete field, such as the field of complex numbers, always has at least one eigenvalue. Since the characteristic polynomial of an n⨯ n matrix is of degree n, the matrix cannot have more than n eigenvalues.

Even if a field is not algebraically complete, the eigenvalues of a matrix exist in the splitting field of its characteristic polynomial.

Theorem 2.1. A square matrix has the same characteristic polynomial and the same eigenvalues as its transpose.

Proof. The characteristic polynomials of A and AT are equal for all values of the independent variable:

det(A - zI) = det((A - zI)T) = det(AT - zIT) = det(AT - zI).

In an infinite field, such as the field of real numbers, this is sufficient to show that the polynomials are identical; i.e., they have equal coefficients. However, each coefficient is itself a polynomial in the entries in A and AT. They can be equal for all values of the entries only if they have equal coefficients. Hence the two polynomials are identical in any field.

Identical characteristic polynomials yield identical eigenvalues. █

Theorem 2.2. Similar matrices have the same characteristic polynomial and the same eigenvalues.

Proof. If Q is nonsingular, then det(Q-1) det(Q) = 1, and the characteristic polynomials of A and Q-1AQ are equal for all values of the independent variable:

det(A - zI) = det(Q-1) det(A - zI) det(Q) = det(Q-1 (A - zI) Q) = det(Q-1AQ - zQ-1IQ)) = det(Q-1AQ - zI).

The rest of the argument is the same as for Theorem 2.1, except that not all values of the coefficients of Q will make Q nonsingular. However, all values within some interval will, so the argument is still valid. █

Since similar matrices can be interpreted as matrices of the same linear transformation with respect to different bases, it makes sense to define the characteristic polynomial and eigenvalues of a linear transformation of a finite-dimensional vector space, without regard to any particular basis.

Theorem 2.3. The matrices AB and BA have the same characteristic polynomial and the same eigenvalues.

Proof. If A is nonsingular, then AB and BA are similar:

BA = A-1ABA,

and the desired result follows from Theorem 2.2. The extension to all A uses a similar argument. █

If λ is an eigenvalue of the matrix A, then the set {x | Ax = λx} is a nontrivial subspace, which is called the eigenspace corresponding to λ.

Every matrix has at least one eigenvalue (perhaps in an extension field), but since the characteristic polynomial may have multiple roots, the number of distinct eigenvalues of an n⨯ n matrix may be less than n. It can never be greater than n.

Every eigenvalue λ of a matrix A has an algebraic multiplicity, which is the number of times z-λ appears in a complete factorization of the characteristic polynomial. It also has a geometric multiplicity, which is the dimension of its eigenspace. The algebraic multiplicities of all the eigenvalues of an n⨯ n matrix always add up to n. The geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity. The proof of this fact will come later. It may be less, as the following simple example:

     | 0 1 |
     | 0 0 |
The characteristic polynomial is z 2. The only eigenvalue is 0, and its algebraic multiplicity is 2. However, all eigenvectors are nonzero scalar multiples of (1,0) T, so its geometric multiplicity is only 1.

The characteristic polynomial can be factored into linear factors (in its splitting field, if necessary):

det(A - zI) = (-1) n (z-λ1) (z-λ2) ... (z-λn),
where λ1, λ2 ..., λn are the eigenvalues.

Substitution of z = 0 shows that the determinant is the product of the eigenvalues:

det(A) = λ1 λ2 ... λn.

The coefficient of z n-1 in the determinant is the sum of the diagonal entries in the matrix. The coefficient of z n-1 in the product of linear factors is the sum of the eigenvalues. Hence:

a11 + a22 + ... + ann = λ1 + λ2 + ... + λn.

The sum of the diagonal entries in a square matrix is called the trace of the matrix. Since it is the sum of the eigenvalues, it is the same for similar matrices, and hence it can be considered a property of a linear transformation.

The characteristic polynomial of a direct sum of matrices is the product of their characteristic polynomials:

A = B1B2 ⊕ ... ⊕ Bm,
det(AI) = det(B1I) det(B2I) ... det(BmI).

Its eigenvalues are clearly those of the constituent matrices.

3. Triangular and Diagonal Matrices

If the square matrix A is triangular, then A - zI is also triangular, and its determinant is the product of its diagonal elements:

det(A - zI) = (a11 - z) (a22 - z) ... (ann - z).

This makes the characteristic equation particularly easy to solve. The eigenvalues of A are its diagonal elements.

Theorem 3.1. Every square matrix is similar (over the splitting field of its characteristic polynomial) to an upper triangular matrix.

Proof. We use induction on the size of the matrix. For a 1⨯ 1 matrix the result is trivial.

Now assume that every (n-1)⨯ (n-1) matrix is similar to an upper triangular matrix, and let A be n⨯ n.

Let λ be an eigenvalue of A and let r1 be a corresponding eigenvector. Find additional column vectors so that r1, r2, ..., rn are linearly independent. Let these column vectors be the columns of the matrix R.

Now let e1 be the n⨯ 1 column vector with 1 as its first element and zeros elsewhere. Then it is easily shown that for any n⨯ n matrix S, the product Se1 is the first column of S.

In particular, Re1 = r1 and R -1r1 = e1.

Then the first column of R -1AR is R -1ARe1, which is easily shown to be λe1. Hence R -1AR can be partitioned as follows:

     | λ v |
     | 0 B |

Here v is 1⨯ (n-1), B is (n-1)⨯ (n-1), and 0 consists of zeros.

By inductive hypothesis B is similar to S -1BS, which is upper triangular. Hence

     | 1 0    | | λ v | | 1 0 |
     | 0 S -1 | | 0 B | | 0 S |,
can be shown by the partition theorem to be equal to
     | λ vS     |
     | 0 S -1BS |,
which is upper triangular and similar to A. █

Of course, a similar argument shows that every square matrix is similar to a lower triangular matrix.

Theorem 3.2. Eigenvectors corresponding to distinct eigenvalues are linearly independent.

Proof. We use induction on the number m of eigenvectors (which may be less than the size of the matrix). For m=1 the assertion is trivial.

Now assume that x1, x2, ..., xm are eigenvectors corresponding to the distinct eigenvalues λ1, λ2, ..., λm and let

c1x1 + c2x2 + ... + cm-1xm-1 + cmxm = 0.

Multiply both sides by the matrix A - λmI to obtain:

c11m)x1 + c22m)x2 + ... + cm-1m-1m)xm-1 = 0.

By inductive hypothesis, ckkm) = 0 for every 0 ≤ k ≤ m-1. Since λk≠λm, it follows that ck = 0 in every case. Therefore, cmxm = 0. Hence cm must also be zero, and the eigenvectors are linearly independent. █

Corollary 3.3. A matrix with distinct eigenvalues is similar (over the splitting field of its characteristic polynomial) to a diagonal matrix whose diagonal elements are the eigenvalues.

4. Matrix Polynomials

For a square matrix A, we can define matrix powers as repeated multiplication of A or A -1 (if A is nonsingular):

A m = AA ... A (m times),
A 0 = I,
A -m = A -1A -1 ... A -1 (m times),
where I is the identity matrix of the same size as A. The usual power laws hold:
A r+s = A rA s,
A rs = (A r) s
Although matrix multiplication is not commutative in general, powers of the same matrix do commute.

Let p(z) = cmz m + cm-1z m-1 + ... + c1z + c0 be a polynomial and let A be a square matrix, both over the same field. Although a polynomial is usually evaluated only for scalars, it is possible to substitute the matrix A for z (and c0I for the constant term) to obtain:

p(A) = cmA m + cm-1A m-1 + ... + c1A + c0I.

Matrix polynomials have many of the properties of polynomials in general. Multiplication of polynomials in the same matrix is commutative.

Since similarity preserves matrix operations, application of the same polynomial to similar matrices produces similar results:

p(R -1AR) = R -1p(A)R.

Theorem 4.1. For every square matrix A, there is a unique monic polynomial m(z) (over the same field) of minimal degree for which m(A) = 0. It divides every polynomial p(z) (over the same field or an extension field) for which p(A) = 0.

Proof. If A is the zero matrix, then clearly m(z) = z. (The argument below is valid in this case, but it is hard to follow.)

In other cases, the matrices I, A, A 2, A 3, ... are vectors in a finite-dimensional vector space (with the usual definitions of addition and scalar multiplication), so there is a minimum value of k such that I, A, A 2, ..., A k are linearly dependent. Then

p(A) = ckA k + ck-1A k-1 + ... + c1A + c0I = 0,
where not all coefficients are zero, and in particular ck must be nonzero, since otherwise k would not be minimal. Then m(z) = z k + (ck-1/ck)z k-1 + ... + (c1/ck)z + c0/ck.

Now let p(z) be any polynomial with p(A) = 0. By the division algorithm, p(z) = m(z)q(z) + r(z), where r(z) is zero or of lower degree than m(z). But substitution of A for z shows that r(A) = 0, which is impossible for a nonzero r(z) because m(z) is the nonzero polynomial of lowest degree with this property. Hence r(z) is the zero polynomial and m(z) divides p(z).

The uniqueness of m(z) follows from the fact that monic polynomials of the same degree which divide each other must be identical. █

The polynomial m(z) is called the minimum polynomial of the matrix A. It is easily seen that similar matrices have the same minimum polynomial. Hence it makes sense to define the minimum polynomial of a linear transformation of a finite-dimensional vector space, without regard to the basis.

The following generalization of Theorem 4.1 shows that there is a minimum polynomnial for every nontrivial subspace, relative to a given square matrix.

Theorem 4.2. For every n⨯ n matrix A and every nontrivial subspace S of the vector space of n-dimensional column vectors (over the same field), there is a unique monic polynomial mS(z) (over the same field) of minimal degree such that mS(A)x = 0 for every x ∈ S. It divides every polynomial p(z) (over the same field or an extension field) such that p(A)x = 0 for every x ∈ S. Moreover, if S is a subspace of T, then mS(z) divides mT(z).

Proof. Clearly, such a minimum polynomial exists, because the minimum polynomial of the matrix has the desired property, although its degree may not be minimal.

Now let p(z) be any polynomial with p(A)x = 0 for every x ∈ S (which includes mT(z)). By the division algorithm, p(z) = mS(z)q(z) + r(z), where r(z) is zero or of lower degree than mS(z). But substitution of A for z shows that r(A)x = 0 for every x ∈ S, which is impossible for a nonzero r(z) because mS(z) is the nonzero polynomial of lowest degree with this property. Hence r(z) is the zero polynomial and mS(z) divides p(z). █

The minimum polynomial of the whole space is also the minimum polynomial of the matrix, because p(A)x = 0 for all x if and only if p(A)x = 0. The minimum polynomial of the space of all scalar multiples of a nonzero vector x is often called the minimum polynomial of x.

Of course, the minimum polynomial is always defined with respect to a matrix (or its associated linear transformation). Since most discussions involving minimum polynomals refer to only a single matrix, it is usually understood.

The following theorem is called the Cayley-Hamilton Theorem.

Theorem 4.3. Let p(z) be the characteristic polynomial of a square matrix A. Then p(A) = 0, and the minimum polynomial of A divides p(z).

Proof. We first show that the theorem holds for any diagonal matrix. Operations on diagonal matrices are especially simple. The sum and product of two diagonal matrices are obtained by adding and multiplying corresponding diagonal entries, respectively. Hence p(A) is the diagonal matrix whose i-th diagonal entry is p(aii). But aii is an eigenvalue of A, so p(aii) = 0. Hence p(A) = 0.

Now the theorem also holds for any matrix similar to a diagonal matrix, because p(R -1AR) = R -1p(A)R and similar matrices have the same characteristic polynomial. In particular, the theorem holds for any matrix with distinct eigenvalues.

If every square matrix had distinct eigenvalues, the proof would end here.

We can use a continuity argument to extend the theorem to complex matrices that do not have distinct eigenvalues. Although the matrix A may not be similar to a diagonal matrix, it is similar to an upper triangular matrix T. The eigenvalues of this matrix appear along its main diagonal, as is the case with any upper triangular matrix. It is easy to construct a sequence T1, T2, T3 , ... of upper triangular matrices with distinct eigenvalues whose limit is T. If pk(z) is the characteristic polynomial of Tk, then pk(Tk) = 0. Then limk⟶ ∞ pk(Tk) = p(T)= 0. Since A is similar to T, p(A) = 0.

Finally, we can extend the result to matrices over any field. In the complex case, each entry in p(A) is a polynomial in the entries in A which takes on the value zero for all values of the independent variables. This is possible only if all coefficients are zero, when like terms are combined. Hence the polynomial evaluates to zero in any field. (In fact, it evaluates to zero in any commutative ring.) █

5. The First Matrix Decomposition Theorem

It has been shown that any matrix with distinct eigenvalues is similar to a diagonal matrix. A matrix with repeated eigenvalues may or may not be similar to a diagonal matrix, but it is similar to a matrix which is nearly diagonal, according to the two decomposition theorems.

Theorem 5.1. A square matrix is similar (over the splitting field of its characteristic polynomial) to a direct sum of matrices, each of which has only a single eigenvalue, which is an eigenvalue of the original matrix, and whose size is the algebraic multiplicity of the eigenvalue.

Proof. The proof is by induction on the number of distinct eigenvalues. If there is only one, the theorem is trivially true.

Let λ be an eigenvalue of the n⨯ n matrix A of algebraic multiplicity k, and let λ1, λ2, ..., λn-k be its other eigenvalues. Then its characteristic polynomial can be written as p(z) = q(z)r(z), where q(z) = (z-λ) k, and r(z) = (z-λ1)(z-λ2)...(z-λn-k).

Now let x be any vector in the null space of q(A), i.e. q(A)x = 0. Since multiplication of polynomials in the same matrix is commutative, q(A)(Ax) = Aq(A)x = 0, so Ax is also in the null space. Hence the null space of q(A) is invariant under the linear transformation whose matrix is A.

Similarly, the null space of r(A) is also invariant under the linear transformation whose matrix is A.

Since q(z) and r(z) are relatively prime, by the Euclidean algorithm there are matrices s(z) and s(z) such that s(z)q(z) + t(z)r(z) = 1 and s(A)q(A) + t(A)r(A) = I, and consequently s(A)q(A)x + t(A)r(A)x = x for any vector x.

Now s(A)q(A)x is in the null space of r(A) because r(A)s(A)q(A)x = s(A)q(A)r(A)x = s(A)p(A)x and p(A) = 0 by the Cayley-Hamilton Theorem.

Similarly, t(A)r(A)x is in the null space of q(A).

Hence an arbitrary vector x can be expressed as a sum of elements of the two null spaces.

Moreover, the only common element of the null spaces is zero, because q(A)x = 0 and r(A)x = 0 imply that x = 0.

Therefore, the entire vector space is the direct sum of the two null spaces, each of which is an invariant subspace under the linear transformation whose matrix is A.

It follows that A is similar to the direct sum AqAr of the matrices of the transformation restricted to the null spaces of q(A) and r(A), respectively, and the characteristic polynomial of A is the product of the characteristic polynomials of Aq and Ar.

Now let ω be an eigenvalue of Aq and let x be the corresponding eigenvector (expressed in the original basis). Then 0 = q(A)x = (AI) kx = (ω-λ) kx, which implies that ω = λ. A similar argument shows that λ cannot be an eigenvalue of Ar.

Hence the characteristic polynomials of Aq and Ar are q(z) and r(z), respectively, and Aq has the required properties.

To complete the proof, we apply the inductive hypothesis to Ar. █

6. Decomposition of Nilpotent Matrices

A nilpotent matrix is a square matrix with some power equal to the zero matrix; i.e.; the square matrix A is nilpotent if and only if A m = 0 for some nonnegative integer m. It is easily shown that all eigenvalues of A must be zero. Conversely, if all eigenvalues of a matrix are zero, the Cayley-Hamilton Theorem shows that the matrix is nilpotent. The linear transformation associated with a nilpotent matrix is also said to be nilpotent.

Most of the concepts used in this section are also defined for matrices that are not nilpotent, but we will not need the more general definitions.

The minimum polynomial of a nilpotent matrix is always a power of the matrix, and so are the minimum polynomials of nontrivial subspaces and nonzero vectors. Therefore, we shall in most cases use the degree of the minimum polynomial (abbreviated DMP), to refer to the mimimum polynomial of a nilpotent matrix, the linear transformation associated with it, or a nontrivial subspace or nonzero vector.

Let m be the DMP of the vector x with respect to the nilpotent matrix A. Then A mx = 0 but A m-1x0.

We first show that the vectors x, Ax, A 2x, ..., A m-1x. are linearly independent. To this end, let c0x + c1Ax + c2A 2x + ... + cm-1A m-1x = 0, If we multiply both sides by A m-1, we obtain c0A m-1x = 0. Hence c0 = 0. Now we multiply both sides by A m-2 to obtain c1A m-1x = 0. Hence Hence c1 = 0. Continuing in this way, we show that all coefficients must be zero.

The m-dimensional subspace spanned by x, Ax, A 2x, ..., A m-1x is an invariant subspace. It is called a cyclic invariant subspace of A. Notice that its DMP and dimension are always equal.

Lemma 6.1. Given a nilpotent linear transformation of a finite-dimensional vector space, there is always a cyclic invariant subspace whose DMP is the same as that of the transformation.

Proof. The DMP of the transformation is always the maximum DMP of any of the vectors in a basis for the vector space. Since the basis is finite, it is equal to the DMP of at least one basis vector. █

Lemma 6.2. If e and f are two nonnegative integers whose sum is less than or equal to the dimension of a cyclic invariant subspace for the nilpotent matrix A and A es = 0 for a vector s in the subspace, then there is a vector r in the subspace for which A fr = s.

Proof. Let x, Ax, A 2x, ..., A m-1x be a basis for the subspace. Then

s = c0x + c1Ax + c2A 2x + ... + cm-1A m-1x,
A es = c0A ex + c1A e+1x + c2A e+2x + ... + cm-1-eA m-1x = 0,
which implies that c0 = c1 = c2 = ... = cm-1-e = 0. Hence
s = cm-eA m-ex + cm-e+1A m-e+1x + cm-e+2A m-e+2x + ... + cm-1A m-1x,
and
y = cm-eA m-e-fx + cm-e+1A m-e-f+1x + cm-e+2A m-e-f+2x + ... + cm-1A m-f-1x.

A generalization of the following theorem is often called the second decomposition theorem.

Theorem 6.3. Given a nilpotent linear transformation of a finite-dimensional vector space, the space is a direct sum of cyclic invariant subspaces of the transformation.

Proof. The proof is by induction on the dimension n of the vector space. It is trivial for n = 1 (and also fairly easy to prove for n = 2).

When n > 1, let A be the matrix of the transformation (relative to any basis), and let m be its DMP.

By Lemma 6.1, there is a cyclic invariant subspace S whose DMP is m. If m = n, S is the whole space and the proof is complete.

If m < n, consider the relation between vectors defined by x ~ y if x-y ∈ S. It is easily shown that it is an equivalence relation. Moreover, vector operations on equivalent vectors give equivalent results; i.e, if x ~ x', y ~ y' and c is any scalar, then x + y ~ x' + y' and cx ~ cx'.

Now consider the set T * of equivalence classes. For every vector x, let x * be the class containing it. The vector x is called a representative of the class. It is not unique; clearly (x+s) * = x * for any s ∈ S.

Then it is easily shown that the following definitions make T * a vector space (over the same field as A):

x * + y * = (x + y) *,
cx * = (cx) *.

Let u1, u2, ..., um be any basis for S and let v1 *, v2 *, ..., vk * be any basis for T *. It is easily shown that u1, u2, ..., um, v1, v2, ..., vk is a basis for the whole n-dimensional vector space. Moreover, each vector vj can be replaced by any other representive of the same class, and the same assertions will still hold. Moreover, the dimension of T * is n - m.

Since S is an invariant subspace with respect to A, we can define a linear transformation of T * by

A *(x *) = (Ax) *.

This transformation is obviously nilpotent. Let p be its DMP.

Now A m = 0, which is equivalent to A mx = 0 for every vector x. Then clearly (A *) mx * = 0 * for every equivalence class x *, and p ≤ m.

Since the dimension of T * is less than n, the inductive hypothesis states that it is direct sum of cyclic invariant subspaces. The DMP of every one is less than or equal to p, which is less than or equal to m.

Consider one such subspace whose basis is h *, (Ah) *, (A 2h) *, ..., (A r-1h) *, where r ≤ m.

Now (A rh) * = 0 *, which is equivalent to A rh ∈ S.

Now multiply by A m-r (this is where r ≤ m is used) to obtain A m-rA rh = A mh. Since m is the DMP of A, A mh = 0.

By Lemma 6.2, there must be a vector t ∈ S such that A rt = A rh.

Let g = h - t. Then the basis for the cyclic invariant subspace of T * can be rewritten as g *, (Ag) *, (A 2g) *, ..., (A r-1g) *, where A rg = 0.

Then g, Ag, A 2g, ..., A r-1g are a basis for a cyclic invariant subspace of the whole space.

This can be done for every cyclic invariant subspace of T *. The resulting subspaces, together with S, yield the required decomposition. █

The matrix of a nilpotent transformation over a cyclic invariant subspace relative to the basis x, Ax, A 2x, ..., A m-1x is particularly simple. It has ones just above the main diagonal and zeros elsewhere, as in this four-dimensional example:

     | 0 1 0 0 |
     | 0 0 1 0 |
     | 0 0 0 1 |
     | 0 0 0 0 |

The matrix of a nilpotent linear transformation relative to a basis consisting of the combined bases of its cyclic invariant subspaces is a direct sum of such matrices. Therefore, we have proven:

Theorem 6.4. A nilpotent matrix is similar to a direct sum of matrices, each of which has ones just above the main diagonal and zeros elsewhere.

7. The Jordan Canonical Form

A Jordan block is an upper triangular matrix with a single eigenvalue, which appears on the main diagonal, ones above the main diagonal, and zeros elsewhere:

     | λ 1 0 0 ... 0 0 |
     | 0 λ 1 0 ... 0 0 |
     | 0 0 λ 1 ... 0 0 |
     | 0 0 0 λ ... 1 0 |
            * * *
     | 0 0 0 0 ... λ 1 |
     | 0 0 0 0 ... 0 λ |

Theorem 7.1. Every square matrix is similar (over the spitting field of its characteristic polynomial) to a direct sum of Jordan blocks.

Proof. By Theorem 5.1, a matrix is similar to a direct sum of matrices, each with a single eigenvalue:

Q -1AQ = A1A2 ⊕ ... ⊕ Am.

Let λk be the eigenvalue of Ak. Then Ak - λkI is nilpotent, so by Theorem 6.4 it is similar to a direct sum of matrices, each of which has ones above its main diagonal and zeros elsewhere:

Qk -1(Ak - λkI)Qk = Bk1Bk2Bk3 ⊕ ...

Hence

Qk -1AkQk = (Bk1Bk2Bk3 ⊕ ... ) + λkI,
which is a direct sum of Jordan blocks. Hence
(Q1 -1Q2 -1 ⊕ ... ⊕ Qm -1) Q -1AQ (Q1Q2 ⊕ ... ⊕ Qm)
has the required form. █

The direct sum of Jordan blocks is called the Jordan canonical form, or the Jordan normal form, of a matrix. It can be shown that the Jordan canonical form of a matrix is unique, except for the order of the blocks. Moreover, it is essentially the same for similar matrices, so it can be considered a property of the associated linear transformation.

The basis which gives rise to the Jordan canonical form is not unique, even if rearrangements of the basis vectors are allowed. For example, an identity matrix is its own Jordan canonical form, regardless of the basis used.