## Eigenvalues and Eigenvectors

Philip J. Erdelsky

May 2, 2009

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

Let *A* be a square matrix.
An __eigenvalue__ of *A* is a scalar *λ* such that

Ax = λx

for some nonzero column vector *x*, which is called an __eigenvector__ corresponding to *λ*.

The above equation can also be written

(A - λI)x = 0,

where *I* is the identity matrix of the same size as *A*.

The alternative form of the equation is satisfied if, and only if,
the matrix * A - λI* is
singular.

This gives us an alternative definition of an eigenvalue: an eigenvalue of the matrix *A* is a root of the polynomial

det(A - zI),

which is called the __characteristic polynomial__ of *A*.

A matrix over an algebraically complete field, such as the field of
complex numbers, always has at least one eigenvalue. Since the
characteristic polynomial of an
*n⨯ n* matrix is
of degree *n*, the matrix cannot have more
than *n* eigenvalues.

Even if a field is not algebraically complete, the eigenvalues of a matrix exist in the splitting field of its characteristic polynomial.

**Theorem 2.1.** *
A square matrix has the same characteristic
polynomial and the same eigenvalues as its transpose.*

*Proof.* The characteristic polynomials of *A* and *A^{T}* are equal because

det(A - zI) = det((A - zI)^{T}) = det(A^{T} - zI^{T}) = det(A^{T} - zI).

In an infinite field, such as the field of real numbers, this is
sufficient to show that the polynomials are identical; i.e., they
have equal coefficients. However, each coefficient is itself
a polynomial, with integer coefficients, in the entries of *A*. Since these
polynomial identities hold over the infinite field of rational numbers, they
hold formally, and hence over every field, finite or infinite.
Identical characteristic polynomials yield identical eigenvalues. █
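As a quick numerical illustration of Theorem 2.1, here is a sketch in Python (the matrix is an arbitrary example; the code uses the explicit 2⨯2 formula det(A - zI) = z² - (a + d)z + (ad - bc) rather than any library routine):

```python
# Characteristic polynomial of a 2x2 matrix A = [[a, b], [c, d]]:
# det(A - zI) = z^2 - (a + d) z + (a d - b c).
def char_poly_2x2(A):
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)  # coefficients of z^2, z, 1

A = [[2, 7], [1, 3]]
A_T = [[2, 1], [7, 3]]  # transpose of A
assert char_poly_2x2(A) == char_poly_2x2(A_T)  # Theorem 2.1
```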

**Theorem 2.2.** *
Similar matrices have the same characteristic
polynomial and the same eigenvalues.*

*Proof.* If * Q*
is nonsingular, then

det(A - zI) = det(Q^{-1}) det(A - zI) det(Q) = det(Q^{-1}(A - zI)Q) = det(Q^{-1}AQ - zQ^{-1}IQ) = det(Q^{-1}AQ - zI).

The rest of the argument is the same as for Theorem 2.1, except that not all values
of the entries of *Q* will make *Q* nonsingular; the identity holds for every
choice that does, which is all that is required. █

Since similar matrices can be interpreted as matrices of the same linear transformation with respect to different bases, it makes sense to define the characteristic polynomial and eigenvalues of a linear transformation of a finite-dimensional vector space, without regard to any particular basis.

**Theorem 2.3.** *
The matrices AB and BA have the same characteristic
polynomial and the same eigenvalues.*

*Proof.* If *A* is nonsingular, then

BA = A^{-1}(AB)A,

so *BA* is similar to *AB*, and the desired result follows from Theorem 2.2. The extension to
all *A*, including singular matrices, uses a polynomial-identity argument like that of Theorem 2.1.
█
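Theorem 2.3 can be spot-checked numerically; the sketch below (Python, with a hand-rolled 2⨯2 product) deliberately uses a singular *A*, the case not covered by the similarity argument alone:

```python
def matmul_2x2(X, Y):
    # Ordinary 2x2 matrix product.
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)]
            for i in range(2)]

def char_poly_2x2(M):
    # Coefficients of det(M - zI) = z^2 - (trace) z + det.
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

A = [[0, 1], [0, 0]]  # singular matrix
B = [[1, 2], [3, 4]]
assert char_poly_2x2(matmul_2x2(A, B)) == char_poly_2x2(matmul_2x2(B, A))
```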

If *λ* is an eigenvalue
of the matrix *A*, then the set of all vectors *x* satisfying *Ax = λx*
(the eigenvectors corresponding to *λ*, together with the zero vector)
is a subspace, called the __eigenspace__ of *λ*.

Every matrix has at least one eigenvalue (perhaps in an extension field),
but since the characteristic polynomial may have multiple roots, the
number of distinct eigenvalues of an *n⨯ n*
matrix may be less than *n*. It can never be
greater than *n*.

Every eigenvalue *λ* of a
matrix *A* has an __algebraic multiplicity__, which is its multiplicity
as a root of the characteristic polynomial. The number of linearly independent
eigenvectors corresponding to *λ* may be smaller than its algebraic multiplicity.
Consider, for example, the matrix

| 0 1 |
| 0 0 |

The characteristic polynomial is *z^{2}*, so 0 is an eigenvalue of algebraic
multiplicity 2; but every eigenvector is a multiple of the first unit column
vector, so there is only one linearly independent eigenvector.

The characteristic polynomial can be factored into linear factors (in its splitting field, if necessary):

det(A - zI) = (-1)^{ n}(z-λ_{1}) (z-λ_{2}) ... (z-λ_{n}),

where *λ_{1}, λ_{2}, ..., λ_{n}* are the eigenvalues, each repeated according to its algebraic multiplicity.

Substitution of *z = 0* shows that the determinant is
the product of the eigenvalues:

det(A) = λ_{1}λ_{2}... λ_{n}.

The coefficient of *z^{ n-1}* in det(A - zI) is
*(-1)^{ n-1}* times the sum of the diagonal entries of the matrix.
The coefficient of *z^{ n-1}* in the factored form is
*(-1)^{ n-1}(λ_{1} + λ_{2} + ... + λ_{n})*.
Hence the sum of the diagonal entries is the sum of the eigenvalues:

a_{11}+ a_{22}+ ... + a_{nn}= λ_{1}+ λ_{2}+ ... + λ_{n}.

The sum of the diagonal entries in a square matrix is called the __trace__
of the matrix. Since it is the sum of the eigenvalues, it is the same for
similar matrices, and hence it can be considered a property of a linear
transformation.
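Both identities are easy to check on a small example. The sketch below (Python; the matrix and its eigenpairs are chosen by hand for illustration) verifies that the trace is the sum, and the determinant the product, of the eigenvalues:

```python
def matvec(A, x):
    # Product of a 2x2 matrix with a column vector.
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

A = [[2, 1], [1, 2]]
# Check the eigenpairs directly: A(1,-1) = 1*(1,-1) and A(1,1) = 3*(1,1).
assert matvec(A, [1, -1]) == [1, -1]   # eigenvalue 1
assert matvec(A, [1, 1]) == [3, 3]     # eigenvalue 3

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert trace == 1 + 3  # sum of the eigenvalues
assert det == 1 * 3    # product of the eigenvalues
```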

The characteristic polynomial of a direct sum of matrices is the product of their characteristic polynomials: if

A = B_{1} ⊕ B_{2} ⊕ ... ⊕ B_{m},

then

det(A - zI) = det(B_{1} - zI) det(B_{2} - zI) ... det(B_{m} - zI).

Its eigenvalues are clearly those of the constituent matrices.

If the square matrix * A* is triangular, then

det(A- zI) = (a_{11}- z) (a_{22}- z) ... (a_{nn}- z).

This makes the characteristic equation particularly easy to solve. The
eigenvalues of * A* are its diagonal elements.
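This identity can also be spot-checked numerically (a Python sketch; the 3⨯3 determinant is computed by cofactor expansion, and the triangular matrix is an arbitrary choice):

```python
# For a triangular matrix, det(A - zI) is the product of the
# diagonal entries minus z; we check the identity at several values of z.
A = [[2, 5, 1],
     [0, 3, 4],
     [0, 0, 7]]   # upper triangular; eigenvalues 2, 3, 7

def det3(M):
    # Cofactor expansion along the first row.
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

for z in range(-3, 4):
    AzI = [[A[i][j] - (z if i == j else 0) for j in range(3)] for i in range(3)]
    assert det3(AzI) == (2 - z) * (3 - z) * (7 - z)
```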

**Theorem 3.1.** *Every square matrix is similar (over the splitting
field of its characteristic polynomial) to an upper triangular matrix.
*

*Proof.* We use induction on the size of the matrix.
For a *1⨯ 1* matrix the result
is trivial.

Now assume that every
*(n-1)⨯ (n-1)* matrix is similar
to an upper triangular matrix, and let *A* be an *n⨯ n* matrix.

Let *λ* be an eigenvalue of
*A* (in the splitting field of its characteristic polynomial) and let
*x* be a corresponding eigenvector.

Now let *e_{1}* be the first unit column vector, and let *R* be any
nonsingular matrix whose first column *r_{1}* is the eigenvector *x*.
In particular,
*Re_{1} = r_{1}*
and hence *R^{ -1}x = e_{1}*, so

R^{ -1}ARe_{1}=R^{ -1}Ax= λR^{ -1}x= λe_{1}.

Then the first column of
*R^{ -1}AR* is *λe_{1}*, and the matrix can be partitioned as

| λ v |
| 0 B |

Here *v* is a row vector with *n-1* entries, *0* is a zero column
vector, and *B* is an *(n-1)⨯ (n-1)* matrix.

By inductive hypothesis *B* is similar
to an upper triangular matrix; i.e., *S^{ -1}BS* is upper triangular
for some nonsingular matrix *S*. Then the product

| 1 0      | | λ v | | 1 0 |
| 0 S^{-1} | | 0 B | | 0 S |

can be shown by the partition theorem to be equal to

| λ vS       |
| 0 S^{-1}BS |,

which is upper triangular and similar to *A*. █

Of course, a similar argument shows that every square matrix is similar to a lower triangular matrix.

**Theorem 3.2.** *Eigenvectors corresponding to distinct
eigenvalues are linearly independent.
*

*Proof.* We use induction on the number
*m* of eigenvectors (which may be less than the
size of the matrix). For *m=1* the assertion is
trivial.

Now assume that *x_{1}, x_{2}, ...,
x_{m}* are eigenvectors
corresponding to the distinct eigenvalues
*λ_{1}, λ_{2}, ..., λ_{m}*, that the theorem holds for any *m-1* of them,
and that

c_{1}x_{1}+ c_{2}x_{2}+ ... + c_{m-1}x_{m-1}+ c_{m}x_{m}=0.

Multiply both sides by the matrix
* A - λ_{m}I*
to obtain:

c_{1}(λ_{1}-λ_{m})x_{1}+ c_{2}(λ_{2}-λ_{m})x_{2}+ ... + c_{m-1}(λ_{m-1}-λ_{m})x_{m-1}=0.

By inductive hypothesis,
*c_{k}(λ_{k}-λ_{m}) = 0*
for every *k < m*. Since the eigenvalues are distinct,
*λ_{k}-λ_{m} ≠ 0*, and hence *c_{k} = 0* for every *k < m*.
The original equation then reduces to *c_{m}x_{m} = 0*, and since
*x_{m}* is nonzero, *c_{m} = 0*, too. Hence the eigenvectors are
linearly independent. █

**Corollary 3.3.** *A matrix with distinct eigenvalues is
similar (over the splitting
field of its characteristic polynomial) to a diagonal matrix whose
diagonal elements are the eigenvalues.
*

For a square matrix *A*, we can define matrix
powers as repeated
multiplication of the matrix by itself:

A^{ m}=AA...A(m times),

where *m* is a positive integer. We also define

A^{ 0}=I,

and, when *A* is nonsingular,

A^{ -m}=A^{ -1}A^{ -1}...A^{ -1}(m times).

The usual laws of exponents hold:

A^{ r+s}=A^{ r}A^{ s},

A^{ rs}= (A^{ r})^{ s}.

Although matrix multiplication is not commutative in general, powers of the same matrix do commute.

Let *p(z) = c _{m}z^{ m} +
c_{m-1}z^{ m-1} + ... + c_{1}z +
c_{0}
* be a polynomial and let *A* be a square matrix. Then *p(A)* is defined as

p(A) = c_{m}A^{ m}+ c_{m-1}A^{ m-1}+ ... + c_{1}A+ c_{0}I.

Matrix polynomials have many of the properties of polynomials in general. Multiplication of polynomials in the same matrix is commutative.

Since similarity preserves matrix operations, application of the same polynomial to similar matrices produces similar results:

p(R^{ -1}AR) =R^{ -1}p(A)R.

**Theorem 4.1.** *For every square matrix A,
there is a unique monic polynomial m(z) (over the same field)
of minimal degree
for which m(A) = 0. It divides every polynomial
p(z) (over the same field or an extension field)
for which p(A) = 0.
*

*Proof.* If
*A* is the zero matrix, then clearly *m(z) = z*.

In other cases, the matrices *I,
A, A^{ 2}, A^{ 3}, ...* are
vectors in a finite-dimensional vector space (with the usual definitions of
addition and scalar multiplication), so there is a minimum value of *k*
for which they are linearly dependent; i.e.,

c_{k}A^{ k}+ c_{k-1}A^{ k-1}+ ... + c_{1}A+ c_{0}I=0,

where not all the coefficients are zero, and in particular *c_{k} ≠ 0*, by
the minimality of *k*. Dividing through by *c_{k}* gives a monic polynomial
*m(z)* of degree *k* with *m(A) = 0*.

Now let *p(z)* be any polynomial with
*p(A) = 0*.
By the division algorithm, *p(z) = q(z)m(z) + r(z)*, where *r(z)* is
either zero or of lower degree than *m(z)*. Then
*r(A) = p(A) - q(A)m(A) = 0*; if *r(z)* were nonzero, this would
contradict the minimality of the degree of *m(z)*. Hence *r(z) = 0*
and *m(z)* divides *p(z)*.

The uniqueness of *m(z)* follows from the
fact that monic polynomials of the same degree which divide
each other must be identical.
█

The polynomial *m(z)* is called
the __minimum polynomial__ of the matrix * A*.
It is easily seen that similar matrices have the same minimum polynomial.
Hence it makes sense to define the minimum polynomial of a linear
transformation of a finite-dimensional vector space, without regard
to the basis.
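For instance, a matrix with a repeated eigenvalue can have a minimum polynomial of lower degree than its characteristic polynomial. The sketch below (Python; the diagonal 3⨯3 example is chosen by hand) checks that *(z - 1)(z - 2)* annihilates diag(1, 1, 2), even though the characteristic polynomial has degree 3:

```python
# For A = diag(1, 1, 2), the characteristic polynomial is (1-z)^2 (2-z),
# but the minimum polynomial is (z - 1)(z - 2): the product
# (A - I)(A - 2I) is zero, while neither factor alone is.
def matmul3(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def diag(*entries):
    return [[entries[i] if i == j else 0 for j in range(3)] for i in range(3)]

A = diag(1, 1, 2)
zero = diag(0, 0, 0)
A_minus_I = diag(0, 0, 1)       # A - 1*I
A_minus_2I = diag(-1, -1, 0)    # A - 2*I
assert matmul3(A_minus_I, A_minus_2I) == zero    # m(A) = 0 for m(z) = (z-1)(z-2)
assert A_minus_I != zero and A_minus_2I != zero  # neither factor alone annihilates A
```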

The following generalization of Theorem 4.1 shows that there is a minimum polynomial for every nontrivial subspace, relative to a given square matrix.

**Theorem 4.2.** *For every
n⨯ n matrix A and every nontrivial subspace S of the
vector space of n-dimensional column vectors (over the same field),
there is a unique monic polynomial m_{S}(z) (over the same field)
of minimal degree
such that m_{S}(A)x = 0 for every
x ∈ S.
It divides every polynomial
p(z) (over the same field or an extension field)
such that p(A)x = 0 for every
x ∈ S. Moreover, if S is a subspace of T,
then m_{S}(z) divides m_{T}(z).
*

*Proof.* Clearly, such a minimum polynomial exists,
because the minimum polynomial of the matrix has the desired property,
although its degree may not be minimal.

Now let *p(z)* be any polynomial with
*p(A)x = 0* for every *x ∈ S*. The divisibility argument is the same
as for Theorem 4.1. Finally, if *S* is a subspace of *T*, then
*m_{T}(A)x = 0* for every *x ∈ T*, and in particular for every
*x ∈ S*; hence *m_{S}(z)* divides *m_{T}(z)*. █

The minimum polynomial of the whole space is also the minimum polynomial
of the matrix, because *p(A)x = 0*
for all *x* if, and only if, *p(A) = 0*.

Of course, the minimum polynomial is always defined with respect to a matrix (or its associated linear transformation). Since most discussions involving minimum polynomials refer to only a single matrix, the matrix is usually understood.

The following theorem is called the __Cayley-Hamilton Theorem__.

**Theorem 4.3.** *Let p(z) be the characteristic
polynomial of a square matrix A. Then p(A) = 0,
and the minimum polynomial of A divides p(z).
*

*Proof.* We first show that the theorem
holds for any diagonal matrix. Operations on diagonal matrices
are especially simple. The sum and product of two diagonal matrices are
obtained by adding and multiplying corresponding diagonal entries,
respectively. Hence *p(A)* is the
diagonal matrix whose *k*-th diagonal entry is *p(a_{kk})*. Each diagonal
entry *a_{kk}* is an eigenvalue of the matrix, hence a root of its
characteristic polynomial, so every diagonal entry of *p(A)* is zero
and *p(A) = 0*.

Now the theorem also holds for any matrix similar to a diagonal
matrix, because *
p( R^{ -1}AR) =
R^{ -1}p(A)R
* and similar matrices have the same characteristic
polynomial. In particular, the theorem holds for any matrix
with distinct eigenvalues.

If every square matrix had distinct eigenvalues, the proof would end here.

We can use a continuity argument to extend the theorem to complex
matrices that do not have distinct eigenvalues. Although the
matrix *A* may not be similar to a diagonal matrix,
it is similar to an upper triangular matrix by Theorem 3.1, and an
arbitrarily small perturbation of the diagonal entries produces an upper
triangular matrix with distinct eigenvalues. Hence *A* is the limit of
a sequence of matrices *A_{k}* with distinct eigenvalues. If *p_{k}(z)*
is the characteristic polynomial of *A_{k}*, then *p_{k}(A_{k}) = 0*
for every *k*; since the coefficients of *p_{k}(z)* and the entries of
*A_{k}* converge to those of *p(z)* and *A*, passing to the limit
gives *p(A) = 0*.
Finally, we can extend the result to matrices over any field. In the complex
case, each entry in *p(A)* is a
polynomial, with integer coefficients, in the entries of *A*. Since these
polynomials vanish for all complex values of the entries, they are identically
zero; hence *p(A) = 0* over any field. █
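For a 2⨯2 matrix the theorem can be verified directly: the characteristic polynomial is *z² - (trace)z + det*, so the theorem asserts *A² - (trace)A + (det)I = 0*. A Python sketch, with an arbitrarily chosen matrix:

```python
def matmul2(X, Y):
    # Ordinary 2x2 matrix product.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
tr = A[0][0] + A[1][1]                   # trace = 5
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # determinant = -2
A2 = matmul2(A, A)

# p(A) = A^2 - tr*A + det*I, entry by entry.
p_of_A = [[A2[i][j] - tr * A[i][j] + (det if i == j else 0) for j in range(2)]
          for i in range(2)]
assert p_of_A == [[0, 0], [0, 0]]        # Cayley-Hamilton
```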

It has been shown that any matrix with distinct eigenvalues is similar to a diagonal matrix. A matrix with repeated eigenvalues may or may not be similar to a diagonal matrix, but it is similar to a matrix which is nearly diagonal, according to the two decomposition theorems.

**Theorem 5.1.** *A square matrix is similar (over
the splitting field of its characteristic polynomial) to a direct sum
of matrices, each of which has only a single eigenvalue, which is
an eigenvalue of the original matrix, and whose
size is the algebraic multiplicity of the eigenvalue.
*

*Proof.* The proof is by induction on the number
of distinct eigenvalues. If there is only one, the theorem is trivially
true.

Let *λ* be an eigenvalue
of the *n⨯ n*
matrix *A* of algebraic multiplicity *k*, and factor the characteristic
polynomial as *p(z) = q(z)r(z)*, where *q(z) = (z - λ)^{ k}* and
*r(λ) ≠ 0*.

Now let *x* be any vector in the
null space of *q(A)*. Then *q(A)Ax = Aq(A)x = 0*, so the null space
of *q(A)* is invariant under the linear transformation whose matrix
is *A*.

Similarly, the null space of *r(A)*
is also invariant under the linear transformation whose matrix is *A*.

Since *q(z)* and *r(z)* are
relatively prime, by the Euclidean algorithm there are polynomials
*s(z)* and *t(z)* such that
*s(z)q(z) + t(z)r(z) = 1* and
*s(A)q(A) + t(A)r(A) =
I*, and consequently

x= s(A)q(A)x+ t(A)r(A)x

for every vector *x*.

Now *s(A)q(A)x* is in the null
space of *r(A)*, because
*r(A)s(A)q(A)x = s(A)q(A)r(A)x = s(A)p(A)x = 0*
by the Cayley-Hamilton Theorem.

Similarly, *t(A)r(A)x* is in the null
space of *q(A)*.

Hence an arbitrary vector * x* can be expressed
as a sum of elements of the two null spaces.

Moreover, the only common element of the two null spaces is zero, because
if *q(A)x = 0* and *r(A)x = 0*, then
*x = s(A)q(A)x + t(A)r(A)x = 0*.

Therefore, the entire vector space is the direct sum of the two
null spaces, each of which is an invariant subspace
under the linear transformation whose matrix is
* A*.

It follows that *A* is similar to the
direct sum *A_{q} ⊕ A_{r}*, where *A_{q}* and *A_{r}* are matrices
of the restrictions of the linear transformation to the null spaces of
*q(A)* and *r(A)*, respectively.

Now let *ω* be an eigenvalue of
*A_{q}*
and let *y* be a corresponding eigenvector. Then
*0 = q(A_{q})y = q(ω)y*, so *q(ω) = 0* and *ω = λ*. Hence *λ*
is the only eigenvalue of *A_{q}*, and a similar argument shows that
*λ* is not an eigenvalue of *A_{r}*.

Hence all *k* factors of *z - λ* in the characteristic polynomial of
*A* belong to the characteristic polynomial of *A_{q}*, and the size
of *A_{q}* is *k*, the algebraic multiplicity of *λ*.

To complete the proof, we apply the inductive hypothesis to
* A_{r}*.
█

A __nilpotent matrix__ is a square matrix with some power equal to
the zero matrix; i.e., the square matrix
*A* is nilpotent if and only if *A^{ m} = 0* for some positive
integer *m*.

Most of the concepts used in this section are also defined for matrices that are not nilpotent, but we will not need the more general definitions.

The minimum polynomial of a nilpotent matrix is always a power of *z*, and so are the minimum polynomials of nontrivial subspaces and nonzero vectors. Therefore, we shall in most cases use the degree of the minimum polynomial (abbreviated DMP) to refer to the minimum polynomial of a nilpotent matrix, the linear transformation associated with it, or a nontrivial subspace or nonzero vector.
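As a concrete example (a Python sketch; the 3⨯3 shift matrix is a standard illustration), the matrix with ones just above the diagonal has DMP 3, and for x = (0, 0, 1) the vectors x, Ax, A²x are the three unit vectors, hence linearly independent:

```python
def matvec(A, x):
    # Product of a 3x3 matrix with a column vector.
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]    # nilpotent: A^3 = 0 but A^2 != 0
x = [0, 0, 1]
Ax = matvec(A, x)      # (0, 1, 0)
A2x = matvec(A, Ax)    # (1, 0, 0)
A3x = matvec(A, A2x)   # (0, 0, 0): the DMP of x is 3
assert Ax == [0, 1, 0] and A2x == [1, 0, 0] and A3x == [0, 0, 0]
```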

Let *m* be the DMP of the vector
*x* with respect to the nilpotent
matrix *A*; i.e., *A^{ m}x = 0* but *A^{ m-1}x ≠ 0*.

We first show that the vectors
*x,
Ax,
A^{ 2}x, ...,
A^{ m-1}x*
are linearly independent. To this end, let

c_{0}x+ c_{1}Ax+ c_{2}A^{ 2}x+ ... + c_{m-1}A^{ m-1}x=0.

If the coefficients are not all zero, let *c_{k}* be the first nonzero
coefficient. Multiplying both sides by *A^{ m-1-k}* annihilates every
term except *c_{k}A^{ m-1}x*, so *c_{k}A^{ m-1}x = 0*; since
*A^{ m-1}x ≠ 0*, this implies *c_{k} = 0*, a contradiction.

The *m*-dimensional subspace spanned by
*x,
Ax,
A^{ 2}x, ...,
A^{ m-1}x*
is an invariant subspace. It is called
a __cyclic invariant subspace__, and its DMP is *m*.

**Lemma 6.1.** *Given a nilpotent linear transformation
of a finite-dimensional vector space,
there is always a cyclic invariant subspace whose DMP is the
same as that of the transformation.
*

*Proof.* The DMP of the transformation is
the maximum DMP of the vectors in any basis for the vector space.
Since the basis is finite, this maximum is attained by at least one basis
vector, and the cyclic invariant subspace generated by that vector has
the required DMP.
█

**Lemma 6.2.** *
If e and f are two nonnegative integers
whose sum is less than or equal to the dimension of a
cyclic invariant subspace for the nilpotent matrix A and
A^{ e}s = 0 for a vector s in the
subspace, then there is a vector r
in the subspace for which A^{ f}r = s.
*

*Proof.* Let
* x,
Ax,
A^{ 2}x, ...,
A^{ m-1}x*
be a basis for the subspace, where *m ≥ e + f*. Then

s= c_{0}x+ c_{1}Ax+ c_{2}A^{ 2}x+ ... + c_{m-1}A^{ m-1}x,

which implies that

A^{ e}s= c_{0}A^{ e}x+ c_{1}A^{ e+1}x+ c_{2}A^{ e+2}x+ ... + c_{m-1-e}A^{ m-1}x=0.

Since the basis vectors are linearly independent, *c_{0} = c_{1} = ... = c_{m-1-e} = 0*, and

s= c_{m-e}A^{ m-e}x+ c_{m-e+1}A^{ m-e+1}x+ c_{m-e+2}A^{ m-e+2}x+ ... + c_{m-1}A^{ m-1}x.

Hence *A^{ f}r = s* for

r= c_{m-e}A^{ m-e-f}x+ c_{m-e+1}A^{ m-e-f+1}x+ c_{m-e+2}A^{ m-e-f+2}x+ ... + c_{m-1}A^{ m-f-1}x. █

A generalization of the following theorem is often called the second decomposition theorem.

**Theorem 6.3.** *Given a nilpotent linear transformation
of a finite-dimensional vector space, the space is a direct sum of
cyclic invariant subspaces of the transformation.
*

*Proof.* The proof is by induction on
the dimension *n*
of the vector space. It is trivial for
*n = 1* (and also fairly easy
to prove for *n = 2*).

When *n > 1*, let *A*
be the matrix of the transformation (relative to any basis),
and let *m* be its DMP.

By Lemma 6.1, there is a cyclic invariant subspace
*S* whose
DMP is *m*. If *m = n*,
*S* is the whole space and the proof is
complete.

If *m < n*, consider the relation
between vectors defined by *x ~ y* if *x - y ∈ S*. It is easily
shown to be an equivalence relation.

Now consider the set *T^{ *}* of equivalence classes.
For every vector *x*, let *x^{ *}* denote the equivalence class
containing *x*.

Then it is easily shown that the following definitions
make *T^{ *}* a vector space (over the same field
as the original space):

x^{ *}+y^{ *}= (x+y)^{ *},

cx^{ *}= (cx)^{ *}.

Let *u_{1}, u_{2}, ...,
u_{m}* be any basis for *S*, and extend it to a basis
*u_{1}, ..., u_{m}, w_{1}, ..., w_{n-m}* for the whole space. Then
*w_{1}^{ *}, ..., w_{n-m}^{ *}* form a basis for *T^{ *}*, so the
dimension of *T^{ *}* is *n - m*.

Since *S* is an invariant subspace with
respect to *A*, we can define a linear
transformation *A^{ *}* of *T^{ *}* by

A^{ *}(x^{ *}) = (Ax)^{ *}.

This transformation is obviously nilpotent. Let *p*
be its DMP.

Now *A^{ m} = 0*, which
implies that *(A^{ *})^{ m}x^{ *} = (A^{ m}x)^{ *} = 0^{ *}*
for every *x^{ *}*; hence *p ≤ m*.

Since the dimension of *T^{ *}* is less than *n*, the inductive
hypothesis applies, and *T^{ *}* is a direct sum of cyclic invariant
subspaces of *A^{ *}*.

Consider one such subspace whose basis is
*h^{ *},
(Ah)^{ *},
(A^{ 2}h)^{ *}, ...,
(A^{ r-1}h)^{ *}*, where *r* is its DMP. Note that *r ≤ p ≤ m*.

Now *(A^{ r}h)^{ *} = 0^{ *}*,
which is equivalent to *A^{ r}h ∈ S*.

Now multiply by *A^{ m-r}* (this is
where *r ≤ m* is needed) to obtain *A^{ m-r}(A^{ r}h) = A^{ m}h = 0*.

By Lemma 6.2, applied with *s = A^{ r}h*, *e = m - r* and
*f = r*, there must be a
vector *t ∈ S* such that *A^{ r}t = A^{ r}h*.

Let *g = h - t*. Since *t ∈ S*, *g^{ *} = h^{ *}*, so
the basis for the cyclic invariant subspace of *T^{ *}* can also
be written *g^{ *}, (Ag)^{ *}, ..., (A^{ r-1}g)^{ *}*.
Moreover, *A^{ r}g = A^{ r}h - A^{ r}t = 0*.
Then * g,
Ag,
A^{ 2}g, ...,
A^{ r-1}g* are a basis for a
cyclic invariant subspace of the whole space.

This can be done for every cyclic invariant subspace of
*T^{ *}*. The resulting
subspaces, together with *S*, express the whole space as a direct
sum of cyclic invariant subspaces. █

The matrix of a nilpotent transformation over a cyclic invariant subspace
relative to the basis
* x,
Ax,
A^{ 2}x, ...,
A^{ m-1}x*
is particularly simple. It has ones just above the main diagonal
and zeros elsewhere, as in this four-dimensional example:

| 0 1 0 0 |
| 0 0 1 0 |
| 0 0 0 1 |
| 0 0 0 0 |

The matrix of a nilpotent linear transformation relative to a basis consisting of the combined bases of its cyclic invariant subspaces is a direct sum of such matrices. Therefore, we have proven:

**Theorem 6.4.** *A nilpotent matrix is similar
to a direct sum of matrices, each of which has ones just above the
main diagonal and zeros elsewhere.
*

A __Jordan block__ is an upper triangular matrix with a single eigenvalue,
which appears on the main diagonal, ones above the main diagonal,
and zeros elsewhere:

| λ 1 0 0 ... 0 0 |
| 0 λ 1 0 ... 0 0 |
| 0 0 λ 1 ... 0 0 |
| 0 0 0 λ ... 0 0 |
  .  .  .
| 0 0 0 0 ... λ 1 |
| 0 0 0 0 ... 0 λ |
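A Jordan block *J* of size *n* with eigenvalue *λ* satisfies *(J - λI)^{ n} = 0* while *(J - λI)^{ n-1} ≠ 0*, since *J - λI* is the nilpotent matrix of Theorem 6.4. A Python sketch (block size and eigenvalue are arbitrary choices):

```python
def jordan_block(lam, n):
    # lam on the diagonal, ones just above it, zeros elsewhere.
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n, lam = 4, 5
J = jordan_block(lam, n)
N = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
for _ in range(n - 1):
    P = matmul(P, N)                              # P = N^(n-1)
assert any(any(row) for row in P)                 # N^(n-1) != 0
assert not any(any(row) for row in matmul(P, N))  # N^n = 0
```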

**Theorem 7.1.** *Every square matrix is similar
(over the splitting field of its characteristic polynomial) to a direct
sum of Jordan blocks.
*

*Proof.* By Theorem 5.1, a matrix is similar
to a direct sum of matrices, each with a single eigenvalue:

Q^{ -1}AQ=A_{1}⊕A_{2}⊕ ... ⊕A_{m}.

Let *λ_{k}* be
the eigenvalue of *A_{k}*. Then *A_{k} - λ_{k}I* has 0 as its only
eigenvalue, so its characteristic polynomial is a power of *-z* and,
by the Cayley-Hamilton Theorem, the matrix is nilpotent. By Theorem 6.4,
there is a nonsingular matrix *Q_{k}* such that

Q_{k}^{ -1}(A_{k}- λ_{k}I)Q_{k}=B_{k1}⊕B_{k2}⊕B_{k3}⊕ ... ,

where each *B_{kj}* has ones just above the main diagonal and zeros
elsewhere.

Hence

Q_{k}^{ -1}A_{k}Q_{k}= (B_{k1}⊕B_{k2}⊕B_{k3}⊕ ... ) + λ_{k}I,

which is a direct sum of Jordan blocks. Hence

(Q_{1}^{ -1}⊕Q_{2}^{ -1}⊕ ... ⊕Q_{m}^{ -1})Q^{ -1}AQ(Q_{1}⊕Q_{2}⊕ ... ⊕Q_{m})

has the required form. █

The direct sum of Jordan blocks is called the __Jordan canonical form__,
or the __Jordan normal form__, of a matrix. It can be shown that the
Jordan canonical form of a matrix is unique, except for the order
of the blocks. Moreover, it is essentially the same for similar matrices, so
it can be considered a property of the associated linear transformation.

The basis which gives rise to the Jordan canonical form is not unique, even if rearrangements of the basis vectors are allowed. For example, an identity matrix is its own Jordan canonical form, regardless of the basis used.