
Section 5.4 The spectral theorem

Among the many takeaways from Section 4.5 is the simple fact that not all matrices are diagonalizable. In principle Theorem 4.5.13 gives a complete answer to the question of diagonalizability in terms of eigenspaces. However, you should not be misled by the artificially simple examples treated in Section 4.5. In practice even the determination (or approximation) of the distinct eigenvalues of an $n\times n$ matrix poses a very challenging computational problem as $n$ gets large. As such the general question of whether a matrix is diagonalizable remains an intractable one. This makes the main result of this section all the more welcome: all symmetric matrices are diagonalizable! This surprising fact is a consequence of the spectral theorem for self-adjoint operators: a result which itself fits into a larger suite of spectral theorems that treat the diagonalizability of various families of linear transformations of inner product spaces (both finite and infinite dimensional).

Subsection 5.4.1 Self-adjoint operators

Though we are mainly interested in the diagonalizability of symmetric matrices, our arguments are made more elegant by abstracting somewhat to the realm of linear transformations of inner product spaces. In this setting the appropriate analogue of a symmetric matrix is a self-adjoint linear transformation.

Definition 5.4.1. Self-adjoint operators.

Let $(V,\langle\,,\,\rangle)$ be a finite-dimensional inner product space. A linear transformation $T\colon V\rightarrow V$ is called a self-adjoint operator if
\[
\langle T(v),w\rangle=\langle v,T(w)\rangle \tag{5.4.1}
\]
for all $v,w\in V$.
The next theorem makes explicit the connection between self-adjoint operators and symmetric matrices.
Let $B=(v_1,v_2,\dots,v_n)$. We have
\[
A=\begin{bmatrix}
\vert & \vert & & \vert\\
[T(v_1)]_B & [T(v_2)]_B & \cdots & [T(v_n)]_B\\
\vert & \vert & & \vert
\end{bmatrix}.
\]
Furthermore, since $B$ is orthonormal, the $i$-th entry of $[T(v_j)]_B$ is computed as $\langle T(v_j),v_i\rangle$ (5.2.8). Thus $A=[a_{ij}]$, where
\[
a_{ij}=\langle T(v_j),v_i\rangle.
\]
It follows that
\begin{align*}
A \text{ symmetric} &\iff a_{ij}=a_{ji} \text{ for all } 1\leq i,j\leq n\\
&\iff \langle T(v_j),v_i\rangle=\langle T(v_i),v_j\rangle \text{ for all } 1\leq i,j\leq n\\
&\iff \langle T(v_j),v_i\rangle=\langle v_j,T(v_i)\rangle \text{ for all } 1\leq i,j\leq n && \text{(5.1.1, ii)}.
\end{align*}
The last equality in this chain of equivalences states that T satisfies property (5.4.1) for all elements of B. Not surprisingly, this is equivalent to T satisfying the property for all elements in V. (See Exercise 5.4.3.10.) We conclude that A is symmetric if and only if T is self-adjoint.
Since $A=[T_A]_B$, where $B$ is the standard ordered basis of $\mathbb{R}^n$, and since $B$ is orthonormal with respect to the dot product, it follows from Theorem 5.4.2 that statements (1) and (2) are equivalent. Statements (2) and (3) are equivalent since by definition $T_A(x)=Ax$ for all $x\in\mathbb{R}^n$.
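The dot product identity of Corollary 5.4.3 is easy to test numerically. Here is a minimal sketch, assuming NumPy is available; the matrix and vectors are chosen purely for illustration.

  import numpy as np

  # An arbitrary symmetric matrix, chosen only for illustration.
  A = np.array([[2.0, -1.0, 0.0],
                [-1.0, 3.0, 4.0],
                [0.0, 4.0, 1.0]])

  rng = np.random.default_rng(0)
  x = rng.standard_normal(3)
  y = rng.standard_normal(3)

  # For a symmetric matrix, (Ax) . y and x . (Ay) agree up to rounding error.
  print(np.dot(A @ x, y))
  print(np.dot(x, A @ y))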
The next result, impressive in its own right, is also key to the induction argument we will use to prove Theorem 5.4.8. A proper proof would require a careful treatment of complex vector spaces: a topic which lies just outside the scope of this text. The “proof sketch” we provide can easily be upgraded to a complete argument simply by justifying a few statements about $\mathbb{C}^n$ and its standard inner product.
Pick an orthonormal ordered basis $B$ of $V$, and let $A=[T]_B$. By Theorem 5.4.2, $A$ is symmetric. To prove that all roots of the characteristic polynomial $p(t)=\det(tI-A)$ are real, we make a slight detour into complex vector spaces. The set
\[
\mathbb{C}^n=\{(z_1,z_2,\dots,z_n) : z_i\in\mathbb{C} \text{ for all } 1\leq i\leq n\}
\]
of all complex $n$-tuples, together with the operations
\[
(z_1,z_2,\dots,z_n)+(w_1,w_2,\dots,w_n)=(z_1+w_1,z_2+w_2,\dots,z_n+w_n)
\]
and
\[
\alpha(z_1,z_2,\dots,z_n)=(\alpha z_1,\alpha z_2,\dots,\alpha z_n),
\]
where $\alpha\in\mathbb{C}$, forms what is called a vector space over $\mathbb{C}$. This means that $V=\mathbb{C}^n$ satisfies the strengthened axioms of Definition 3.1.1 obtained by replacing every mention of a scalar $c\in\mathbb{R}$ with a scalar $\alpha\in\mathbb{C}$. Additionally, the vector space $\mathbb{C}^n$ has the structure of a complex inner product defined as
\[
\langle (z_1,z_2,\dots,z_n),(w_1,w_2,\dots,w_n)\rangle=z_1\overline{w_1}+z_2\overline{w_2}+\cdots+z_n\overline{w_n},
\]
where $\overline{w_i}$ denotes the complex conjugate of $w_i$ for each $i$. Essentially all of our theory of real vector spaces can be “transported” to complex vector spaces, including definitions and results about eigenvectors and inner products. The rest of this argument makes use of this principle by citing without proof some of these properties, and this is why it has been downgraded to a “proof sketch”.
We now return to $A$ and its characteristic polynomial $p(x)$. Recall that we want to show that all roots of $p(x)$ are real. Let $\lambda\in\mathbb{C}$ be a root of $p(x)$. The complex theory of eigenvectors implies that there is a nonzero vector $z\in\mathbb{C}^n$ satisfying $Az=\lambda z$. On the one hand, we have
\[
\langle Az,z\rangle=\langle \lambda z,z\rangle=\lambda\langle z,z\rangle
\]
using properties of our complex inner product. On the other hand, since $A^T=A$ it is easy to see that Corollary 5.4.3 extends to our complex inner product: i.e.,
\[
\langle Az,w\rangle=\langle z,Aw\rangle
\]
for all $z,w\in\mathbb{C}^n$. Thus
\[
\langle Az,z\rangle=\langle z,Az\rangle=\langle z,\lambda z\rangle=\overline{\lambda}\langle z,z\rangle.
\]
(In the last equality we use the fact that our complex inner product satisfies $\langle z,\alpha w\rangle=\overline{\alpha}\langle z,w\rangle$ for any $\alpha\in\mathbb{C}$ and vectors $z,w\in\mathbb{C}^n$.) It follows that
\[
\lambda\langle z,z\rangle=\overline{\lambda}\langle z,z\rangle.
\]
Since $z\neq 0$, we have $\langle z,z\rangle\neq 0$ (another property of our complex inner product), and thus $\lambda=\overline{\lambda}$. Since a complex number $z=a+bi$ satisfies $z=\overline{z}$ if and only if $b=0$ if and only if $z$ is real, we conclude that $\lambda$ is a real number, as claimed.
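As a numerical sanity check of Theorem 5.4.4 (assuming NumPy is available), we can feed a random symmetric matrix to a general eigenvalue routine that allows complex output and observe that the computed eigenvalues have negligible imaginary parts. This is only an illustration, not a substitute for the argument above.

  import numpy as np

  rng = np.random.default_rng(1)
  M = rng.standard_normal((5, 5))
  A = (M + M.T) / 2                      # symmetrize a random matrix

  # eigvals makes no symmetry assumption and may return complex numbers,
  # yet for the symmetric matrix A the imaginary parts are (numerically) zero.
  eigenvalues = np.linalg.eigvals(A)
  print(np.max(np.abs(np.imag(eigenvalues))))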
The corollary follows from Theorem 5.4.4 and the fact that the eigenvalues of T are the real roots of its characteristic polynomial (4.4.25).
From Theorem 5.4.4 and Corollary 5.4.3 it follows that the characteristic polynomial of any symmetric matrix must factor as a product of linear terms over $\mathbb{R}$, as illustrated by the next two examples.

Example 5.4.6. Symmetric 2×2 matrices.

Verify that the characteristic polynomial of any symmetric $2\times 2$ matrix factors into linear terms over $\mathbb{R}$.
Solution.
Given a symmetric $2\times 2$ matrix
\[
A=\begin{bmatrix} a & b\\ b & c\end{bmatrix},
\]
we have
\[
p(x)=\det(xI-A)=x^2-(a+c)x+(ac-b^2).
\]
Using the quadratic formula and some algebra, we see that the roots of $p(x)$ are given by
\[
\frac{(a+c)\pm\sqrt{(a+c)^2-4ac+4b^2}}{2}=\frac{(a+c)\pm\sqrt{(a-c)^2+4b^2}}{2}.
\]
Since $(a-c)^2+4b^2\geq 0$, we see that both these roots are real. Thus $p(x)=(x-\lambda_1)(x-\lambda_2)$, where $\lambda_1,\lambda_2\in\mathbb{R}$.
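For readers who like to double-check such algebra with a computer, here is a short symbolic verification, assuming SymPy is available; it confirms that the discriminant of $p(x)$ equals $(a-c)^2+4b^2$.

  import sympy as sp

  a, b, c, x = sp.symbols('a b c x', real=True)
  A = sp.Matrix([[a, b], [b, c]])
  p = sp.expand((x * sp.eye(2) - A).det())          # x**2 - (a + c)*x + a*c - b**2
  disc = sp.discriminant(p, x)                      # (a + c)**2 - 4*(a*c - b**2)

  # The discriminant simplifies to (a - c)^2 + 4 b^2 >= 0, so both roots are real.
  print(sp.simplify(disc - ((a - c)**2 + 4*b**2)))  # 0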

Example 5.4.7. Symmetric 4×4 matrix.

Verify that the characteristic polynomial of the symmetric matrix
\[
A=\begin{bmatrix}
6 & 2 & 4 & 0\\
2 & 6 & 0 & 4\\
4 & 0 & 6 & 2\\
0 & 4 & 2 & 6
\end{bmatrix}
\]
factors into linear terms over $\mathbb{R}$.
Solution.
The characteristic polynomial of $A$ is $p(x)=x^4-112x^2+2560$. We can use the quadratic formula to solve $p(x)=0$ for $u=x^2$, yielding
\[
u=\frac{112\pm\sqrt{(112)^2-4(2560)}}{2}=56\pm 24.
\]
We conclude that $x^2=32$ or $x^2=80$, and thus $x=\pm 4\sqrt{2}$ or $x=\pm 4\sqrt{5}$. It follows that
\[
p(x)=(x-4\sqrt{2}\,)(x+4\sqrt{2}\,)(x-4\sqrt{5}\,)(x+4\sqrt{5}\,).
\]
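A quick numerical cross-check of this factorization, assuming NumPy is available: the roots of $x^4-112x^2+2560$ agree with $\pm 4\sqrt{2}\approx\pm 5.66$ and $\pm 4\sqrt{5}\approx\pm 8.94$.

  import numpy as np

  # Coefficients of p(x) = x^4 - 112 x^2 + 2560, highest degree first.
  roots = np.sort(np.roots([1, 0, -112, 0, 2560]))
  print(roots)   # approximately [-8.944, -5.657, 5.657, 8.944]
  print(np.sort([4*np.sqrt(2), -4*np.sqrt(2), 4*np.sqrt(5), -4*np.sqrt(5)]))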

Subsection 5.4.2 The spectral theorem for self-adjoint operators

Our version of the spectral theorem concerns self-adjoint linear transformations on a finite-dimensional inner product space. It tells us two remarkable things: (a) every such linear transformation has an eigenbasis (and hence is diagonalizable); and furthermore, (b) the eigenbasis can be chosen to be orthogonal, or even orthonormal.
We will prove the cycle of implications (1)$\implies$(2)$\implies$(3)$\implies$(4)$\implies$(1).
Assume $T$ is self-adjoint. First we show that eigenvectors with distinct eigenvalues are orthogonal. To this end, suppose we have $T(v)=\lambda v$ and $T(v')=\lambda' v'$, where $\lambda\neq\lambda'$. Using the definition of self-adjoint, we have
\begin{align*}
\langle T(v),v'\rangle=\langle v,T(v')\rangle &\implies \langle \lambda v,v'\rangle=\langle v,\lambda' v'\rangle\\
&\implies \lambda\langle v,v'\rangle=\lambda'\langle v,v'\rangle\\
&\implies \langle v,v'\rangle=0 && (\lambda\neq\lambda').
\end{align*}
We now prove by induction on $\dim V$ that if $T$ is self-adjoint, then $T$ is diagonalizable. The base case $\dim V=1$ is trivial. Assume the result is true for any $n$-dimensional inner product space, and suppose $\dim V=n+1$. By Corollary 5.4.5 there is a nonzero $v\in V$ with $T(v)=\lambda v$. Let $W=\operatorname{span}\{v\}$. Since $\dim W=1$, we have $\dim W^\perp=\dim V-1=n$. The following two facts are crucial for the rest of the argument and are left as an exercise (5.4.3.11).
  1. For all $v\in W^\perp$ we have $T(v)\in W^\perp$, and thus by restricting $T$ to $W^\perp$ we get a linear transformation $T|_{W^\perp}\colon W^\perp\rightarrow W^\perp$.
  2. The restriction $T|_{W^\perp}$ is self-adjoint, considered as a linear transformation of the inner product space $W^\perp$. Here the inner product on the subspace $W^\perp$ is inherited from $(V,\langle\,,\,\rangle)$ by restriction.
Now since $\dim W^\perp=n$ and $T|_{W^\perp}$ is self-adjoint, we may assume by induction that $T|_{W^\perp}$ has an eigenbasis $B'=(v_1,v_2,\dots,v_n)$. We claim that $B=(v,v_1,v_2,\dots,v_n)$ is an eigenbasis of $V$. Since by definition $T|_{W^\perp}(w)=T(w)$ for all $w\in W^\perp$, we see that the vectors $v_i$ are also eigenvectors of $T$, and thus that $B$ consists of eigenvectors. To show $B$ is a basis it is enough to prove linear independence, since $\dim V=n+1$. Suppose we have
\[
cv+c_1v_1+\cdots+c_nv_n=0
\]
for scalars $c,c_i\in\mathbb{R}$. Taking the inner product with $v$, we have
\begin{align*}
cv+c_1v_1+\cdots+c_nv_n=0 &\implies \langle v, cv+c_1v_1+\cdots+c_nv_n\rangle=\langle v,0\rangle\\
&\implies c\langle v,v\rangle+\sum_{i=1}^n c_i\langle v,v_i\rangle=0\\
&\implies c\langle v,v\rangle=0 && (\langle v,v_i\rangle=0)\\
&\implies c=0 && (\langle v,v\rangle\neq 0).
\end{align*}
It follows that we have
\[
c_1v_1+\cdots+c_nv_n=0,
\]
and thus $c_i=0$ for all $1\leq i\leq n$, since $B'$ is linearly independent. Having proved that $B$ is an eigenbasis, we conclude that $T$ is diagonalizable.
Let $\lambda_1,\lambda_2,\dots,\lambda_r$ be the distinct eigenvalues of $T$. Since $T$ is assumed to be diagonalizable, following Procedure 4.5.14 we can create an eigenbasis $B$ by picking bases $B_i$ of each eigenspace $W_{\lambda_i}$ and combining them. We can always choose these bases so that each $B_i$ is orthogonal. When we do so, the assembled $B$ will be orthogonal as a whole. Indeed, given any two elements $v,v'$ of $B$, if both vectors are elements of $B_i$ for some $i$, then they are orthogonal by design; furthermore, if $v$ is an element of basis $B_i$ and $v'$ is an element of basis $B_j$ with $i\neq j$, then they are eigenvectors with distinct eigenvalues, and hence orthogonal by assumption!
This is easy to see since an orthonormal eigenbasis can be obtained from an orthogonal eigenbasis by scaling each element by the reciprocal of its norm.
Assume $B$ is an orthonormal eigenbasis of $T$. Since $B$ is an eigenbasis, $[T]_B$ is a diagonal matrix, and hence symmetric. Since $B$ is orthonormal, we conclude from Theorem 5.4.2 that $T$ is self-adjoint.
An operator that admits an orthogonal (and hence an orthonormal) eigenbasis is called orthogonally diagonalizable.

Definition 5.4.9. Orthogonally diagonalizable.

Let $V$ be a finite-dimensional inner product space. A linear transformation $T\colon V\rightarrow V$ is orthogonally diagonalizable if there exists an orthogonal (equivalently, an orthonormal) eigenbasis of $T$.
This new language affords us a more succinct articulation of Theorem 5.4.8: to be self-adjoint is to be orthogonally diagonalizable. Think of this as a sort of “diagonalizable+” condition.
As an immediate consequence of Theorem 5.4.8, we have the following result about symmetric matrices.
By Corollary 5.4.3 we have $A$ symmetric if and only if $T_A$ is self-adjoint with respect to the dot product. Statements (1)-(3) are seen to be equivalent by applying Theorem 5.4.8 to $T_A$ (with respect to the dot product). Let $B$ be the standard basis of $\mathbb{R}^n$. We see that (4) is equivalent to (3) by observing that a basis $B'$ is an orthonormal eigenbasis of $T_A$ if and only if the matrix $Q$ obtained by placing the elements of $B'$ as columns is orthogonal and diagonalizes $A$.
The process of finding matrices Q and D satisfying (5.4.3) is called orthogonal diagonalization. A close look at the proof of Theorem 5.4.8 gives rise to the following orthogonal diagonalization method for matrices.
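In computational practice this entire process is packaged in library routines for symmetric matrices, such as NumPy's eigh, which returns an orthonormal eigenbasis directly. The following minimal sketch assumes NumPy is available; the matrix is illustrative only.

  import numpy as np

  A = np.array([[2.0, 1.0, 1.0],
                [1.0, 2.0, 1.0],
                [1.0, 1.0, 2.0]])        # symmetric, hence orthogonally diagonalizable

  eigenvalues, Q = np.linalg.eigh(A)     # columns of Q form an orthonormal eigenbasis
  D = np.diag(eigenvalues)

  print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
  print(np.allclose(Q.T @ A @ Q, D))       # True: D = Q^T A Q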

Example 5.4.13. Orthogonal diagonalization.

The symmetric matrix
\[
A=\frac{1}{3}\begin{bmatrix}
-1 & 2 & 2\\
2 & -1 & 2\\
2 & 2 & -1
\end{bmatrix}
\]
has characteristic polynomial $p(x)=x^3+x^2-x-1$. Find an orthogonal matrix $Q$ and diagonal matrix $D$ such that $D=Q^TAQ$.
Solution.
First we factor $p(x)$. Looking at the constant term we see that the only possible integer roots of $p(x)$ are $\pm 1$. It is easily verified that $p(1)=0$, and polynomial division yields the factorization $p(x)=(x-1)(x^2+2x+1)$. Further factorization of $x^2+2x+1$ gives us $p(x)=(x-1)(x+1)^2$.
Next we compute orthonormal bases of the eigenspaces $W_1$ and $W_{-1}$, yielding
\begin{align*}
B_1 &= \left(\tfrac{1}{\sqrt{3}}(1,1,1)\right) & B_2 &= \left(\tfrac{1}{\sqrt{2}}(1,-1,0),\ \tfrac{1}{\sqrt{6}}(1,1,-2)\right).
\end{align*}
Assembling these basis elements into the orthogonal matrix
\[
Q=\begin{bmatrix}
1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6}\\
1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6}\\
1/\sqrt{3} & 0 & -2/\sqrt{6}
\end{bmatrix},
\]
we conclude that $D=Q^{-1}AQ=Q^TAQ$, where
\[
D=\begin{bmatrix}
1 & 0 & 0\\
0 & -1 & 0\\
0 & 0 & -1
\end{bmatrix}.
\]
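Assuming NumPy is available, the conclusion of this example is easy to verify numerically with the matrices $A$, $Q$, and $D$ above.

  import numpy as np

  A = np.array([[-1.0, 2.0, 2.0],
                [ 2.0,-1.0, 2.0],
                [ 2.0, 2.0,-1.0]]) / 3
  Q = np.column_stack([np.array([1.0, 1.0, 1.0]) / np.sqrt(3),    # spans W_1
                       np.array([1.0,-1.0, 0.0]) / np.sqrt(2),    # first basis vector of W_{-1}
                       np.array([1.0, 1.0,-2.0]) / np.sqrt(6)])   # second basis vector of W_{-1}
  D = np.diag([1.0, -1.0, -1.0])

  print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
  print(np.allclose(Q.T @ A @ Q, D))       # True: D = Q^T A Q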
Observe that the two eigenspaces $W_1$ and $W_{-1}$ of the matrix $A$ in Example 5.4.13 are orthogonal to one another, as predicted by the spectral theorem. Indeed, $W_1$ is the line passing through the origin with direction vector $n=(1,1,1)$, and $W_{-1}$ is its orthogonal complement, the plane passing through the origin with normal vector $n$. Figure 5.4.14 depicts the orthogonal configuration of the eigenspaces of this example. This is an excellent illustration of what makes the diagonalizability of symmetric matrices (and self-adjoint operators) special. Keep it in mind!
Figure 5.4.14. Eigenspaces of a symmetric matrix are orthogonal
Do not overlook the reverse implication of equivalence (5.4.2). As the next example illustrates, we can show an operator is self-adjoint by examining the geometry of its eigenspaces.

Example 5.4.15. Orthogonal projections are self-adjoint.

Let $(V,\langle\,,\,\rangle)$ be a finite-dimensional inner product space, let $W$ be a subspace of $V$, and let $T=\operatorname{proj}_W$ be orthogonal projection onto $W$. Prove that $T$ is self-adjoint.
Solution.
By Theorem 5.4.8 it suffices to show that $T$ is orthogonally diagonalizable. According to Exercise 5.3.6.20 we have
\begin{align*}
v\in W &\iff T(v)=v\\
v\in W^\perp &\iff T(v)=0.
\end{align*}
Equivalently, $W=W_1$ and $W^\perp=W_0$ are the 1- and 0-eigenspaces of $T$, respectively. Since $\dim W+\dim W^\perp=\dim V$ we conclude that $T$ is diagonalizable. Since clearly $W$ and $W^\perp$ are orthogonal, we conclude that $T$ is in fact orthogonally diagonalizable, hence self-adjoint.
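To see this concretely in $\mathbb{R}^n$ with the dot product, recall (or take on faith for this illustration) that the standard matrix of $\operatorname{proj}_W$ can be computed as $P=B(B^TB)^{-1}B^T$, where the columns of $B$ form a basis of $W$; this matrix is symmetric, matching the conclusion above. A minimal sketch, assuming NumPy is available:

  import numpy as np

  # Columns of B span a two-dimensional subspace W of R^3 (chosen for illustration).
  B = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 2.0]])

  # One standard formula for the matrix of orthogonal projection onto W = col(B).
  P = B @ np.linalg.inv(B.T @ B) @ B.T

  print(np.allclose(P, P.T))      # True: P is symmetric, so proj_W is self-adjoint
  print(np.allclose(P @ P, P))    # True: projecting twice equals projecting once
  print(np.allclose(P @ B, B))    # True: vectors already in W are left fixed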

Exercises 5.4.3 Exercises

WeBWorK Exercises

1.
Let A=[31261236666]. Find an orthogonal matrix P with rational entries and a diagonal matrix D such that D=PTAP.
P= (3 × 3 array), D= (3 × 3 array)

Exercise Group.

Orthogonally diagonalize the given symmetric matrix $A$ following Procedure 5.4.12: i.e., find a diagonal matrix $D$ and orthogonal matrix $Q$ satisfying $D=Q^TAQ$.
4.
\[
A=\begin{bmatrix}
1 & 1 & 2\\
1 & 1 & 2\\
2 & 2 & 2
\end{bmatrix}
\]

10.

Let $(V,\langle\,,\,\rangle)$ be a finite-dimensional inner product space, let $T\colon V\rightarrow V$ be a linear transformation, and let $B=(v_1,v_2,\dots,v_n)$ be an ordered basis of $V$. Prove: $T$ is self-adjoint if and only if
\[
\langle T(v_i),v_j\rangle=\langle v_i,T(v_j)\rangle
\]
for all $1\leq i,j\leq n$. In other words, to prove $T$ is self-adjoint it suffices to show property (5.4.1) holds for all elements of a basis of $V$.

11.

Let $(V,\langle\,,\,\rangle)$ be a finite-dimensional inner product space, let $T\colon V\rightarrow V$ be a self-adjoint operator, and let $W$ be a subspace of $V$ satisfying $T(w)\in W$ for all $w\in W$.
  1. Prove: if $v\in W^\perp$, then $T(v)\in W^\perp$.
  2. By (a), restricting $T$ to $W^\perp$ defines a linear transformation
    \begin{align*}
    T|_{W^\perp}\colon W^\perp &\rightarrow W^\perp\\
    v &\mapsto T(v).
    \end{align*}
    Prove that $T|_{W^\perp}$ is self-adjoint. Here the inner product on the subspace $W^\perp$ is inherited from $(V,\langle\,,\,\rangle)$ by restriction.

12.

Assume $A\in M_{nn}$ is symmetric and orthogonal. Prove that the characteristic polynomial of $A$ factors as $p(x)=(x-1)^r(x+1)^s$ for some nonnegative integers $r,s$. In particular, the eigenvalues of $A$ are among $1$ and $-1$.

Exercise Group.

Let $C\subseteq\mathbb{R}^2$ be a conic curve defined by a quadratic equation of the form
\[
C\colon ax^2+bxy+cy^2=d \tag{5.4.4}
\]
where $a,b,c\in\mathbb{R}$ are fixed constants. You may have learned that $C$ can be rotated to a conic $C'$ with a “standard equation” of the form $ex^2+fy^2=d$. In the following exercises we will see why this is true.
13.
Find a symmetric matrix $A\in M_{22}$ satisfying the following property: $\mathbf{x}=(x,y)$ satisfies (5.4.4) if and only if
\[
\mathbf{x}\cdot(A\mathbf{x})=\mathbf{x}^TA\mathbf{x}=d. \tag{5.4.5}
\]
(Here we conflate the $1\times 1$ matrix $[d]$ with the scalar $d\in\mathbb{R}$.)
15.
Show that $\mathbf{x}$ satisfies (5.4.5) if and only if $\mathbf{x}'=Q^{-1}\mathbf{x}=Q^T\mathbf{x}$ satisfies
\[
ex'^2+fy'^2=d. \tag{5.4.6}
\]
16.
Explain why we can conclude that there is a rotation that maps the conic $C$ with equation (5.4.4) to the conic $C'$ with “standard equation” (5.4.6).
17.
Let $C\subseteq\mathbb{R}^2$ be the conic curve with equation
\[
x^2+4xy+y^2=1.
\]
  1. Find an angle $\theta$ and constants $a,b\in\mathbb{R}$ such that the rotation $\rho_\theta$ maps $C$ to a conic $C'$ with defining equation
    \[
    ax^2+by^2=1.
    \]
  2. First graph $C'$, and then graph $C$ using the result of (a). What type of conics (parabolas, ellipses, hyperbolas) are $C$ and $C'$?