A trick we learn early on in physics, specifically in dynamics problems in $\mathbb{R}^2$ or $\mathbb{R}^3$, is to pick a convenient axis and then decompose any relevant vectors (force, acceleration, velocity, position, etc.) into a sum of two components: one that points along the chosen axis, and one that points perpendicularly to it. As we will see in this section, this technique can be vastly generalized. Namely, instead of $\mathbb{R}^2$ or $\mathbb{R}^3$ we can take any inner product space $V$; and instead of a chosen axis in $V$, we can choose any finite-dimensional subspace $W\subseteq V$; then any $v\in V$ can be decomposed in the form
$$v = w + w^\perp,$$
where $w\in W$ and $w^\perp$ is a vector orthogonal to $W$, in a sense we will make precise below. Just as in our toy physics example, this manner of decomposing vectors helps simplify computations in problems where the subspace chosen is of central importance.
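To make the idea concrete, here is a minimal numerical sketch (in Python, with made-up vectors) of splitting a vector into a component along a chosen axis and a component perpendicular to it.

```python
import numpy as np

# A minimal sketch of the "pick an axis and decompose" trick.
# The force vector and axis direction below are made-up values for illustration.
force = np.array([3.0, 4.0])          # some vector we want to decompose
axis = np.array([1.0, 1.0])           # direction of the chosen axis

# Component of `force` along the axis (orthogonal projection onto the axis) ...
parallel = (force @ axis) / (axis @ axis) * axis
# ... and the component perpendicular to the axis.
perpendicular = force - parallel

print(parallel, perpendicular)
print(np.isclose(perpendicular @ axis, 0.0))   # True: the two pieces are orthogonal
```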
According to Definition 5.3.1, to verify that a vector $v$ lies in $W^\perp$, we must show that $\langle v, w\rangle = 0$ for all $w\in W$. The “for all” quantifier here can potentially make this an onerous task: there are in principle infinitely many elements $w\in W$ to check! In the special case where $W$ has a finite spanning set, so that $W = \operatorname{span}\{w_1, w_2, \dots, w_r\}$ for some vectors $w_i$, deciding whether $v\in W^\perp$ reduces to checking whether $\langle v, w_i\rangle = 0$ for all $1\leq i\leq r$. In other words, we have
$$v\in W^\perp \iff \langle v, w_i\rangle = 0 \text{ for all } 1\leq i\leq r.$$
The forward implication of this equivalence is clear: if $v$ is orthogonal to all elements of $W$, then clearly it is orthogonal to each $w_i$. The reverse implication is left as an exercise. (See Exercise 5.3.6.15.)
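Computationally, the remark says that membership in $W^\perp$ can be tested against a finite spanning set. A small sketch, with made-up vectors in $\mathbb{R}^3$ and the dot product:

```python
import numpy as np

# To decide whether v lies in the orthogonal complement of W = span{w1, w2},
# it suffices (by the remark above) to check v . wi == 0 for each spanning vector.
# The vectors below are made up for illustration.
w1 = np.array([1.0, 2.0, 0.0])
w2 = np.array([0.0, 1.0, -1.0])
v  = np.array([2.0, -1.0, -1.0])

in_W_perp = all(np.isclose(v @ w, 0.0) for w in (w1, w2))
print(in_W_perp)   # True for this particular choice of v
```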
Consider the inner product space together with the dot product. Let : the line with equation . Compute and identify it as a familiar geometric object in .
Letting , we see that if and only if , if and only if . Thus is the line with equation . Observe that the lines and are indeed perpendicular to one another. (Graph them!)
Consider the inner product space together with the dot product. Let be the plane with equation . Compute and identify this as a familiar geometric object in .
It follows that is the set of vectors satisfying the linear system
Solving this system using Gaussian elimination we conclude that
,
which we recognize as the line passing through the origin with direction vector . This is none other than the normal line to the plane passing through the origin.
Consider the inner product space with the dot product. Let , the line passing through the origin with direction vector . The orthogonal complement is the set of vectors orthogonal to . Using the definition of dot product, this is the set of solutions to the equation
The notion of orthogonal complement gives us a more conceptual way of understanding the relationship between the various fundamental spaces of a matrix.
Using the dot product method of matrix multiplication, we see that a vector $x$ lies in $\operatorname{null}A$ if and only if $r\cdot x = 0$ for each row $r$ of $A$, if and only if $x\cdot w = 0$ for all $w\in\operatorname{row}A$ (see Remark 5.3.2), if and only if $x\in(\operatorname{row}A)^\perp$. This shows $\operatorname{null}A = (\operatorname{row}A)^\perp$.
We can use Corollary 5.3.13 to conclude that $(\operatorname{null}A)^\perp = ((\operatorname{row}A)^\perp)^\perp = \operatorname{row}A$. Alternatively, and more directly, the argument above shows that every row of $A$ is orthogonal to every element of $\operatorname{null}A$, proving $\operatorname{row}A\subseteq(\operatorname{null}A)^\perp$. Next, by the rank-nullity theorem we have $\dim\operatorname{null}A = n - \operatorname{rank}A$; and by Theorem 5.3.5 we have $\dim(\operatorname{null}A)^\perp = n - \dim\operatorname{null}A$. It follows that $\dim(\operatorname{null}A)^\perp = \operatorname{rank}A = \dim\operatorname{row}A$. Since $\operatorname{row}A\subseteq(\operatorname{null}A)^\perp$ and $\dim\operatorname{row}A = \dim(\operatorname{null}A)^\perp$, we conclude by Corollary 3.7.13 that $\operatorname{row}A = (\operatorname{null}A)^\perp$.
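A quick numerical illustration of the orthogonality between the row space and the null space (and, applying the same statement to $A^T$, between the column space and $\operatorname{null}A^T$), using a made-up matrix and a small SVD-based null-space helper written for this sketch:

```python
import numpy as np

# Numerical illustration (with a made-up matrix) of the orthogonality relations
# null(A) = (row A)^perp  and  null(A^T) = (col A)^perp.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so null(A) is a plane in R^3

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) for the null space of M, via the SVD."""
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

N = null_space(A)                 # basis vectors for null(A)
# Every row of A is orthogonal to every null-space vector:
print(np.allclose(A @ N, 0))      # True

NT = null_space(A.T)              # basis for null(A^T)
# Every column of A is orthogonal to every vector in null(A^T):
print(np.allclose(A.T @ NT, 0))   # True
```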
Understanding the orthogonal relationship between and allows us in many cases to quickly determine/visualize the one from the other. As an example, consider . Looking at the columns, we see easily that , which implies that . Since is an element of and , we must have , a line. By orthogonality, we conclude that
Furthermore, the pair $(w, w^\perp)$ is unique in the following sense: if we have $v = u + u^\perp$ for some $u\in W$ and $u^\perp\in W^\perp$, then $u = w$ and $u^\perp = w^\perp$. Accordingly, the vector equation (5.3.1) is called the orthogonal decomposition of $v$ with respect to $W$; and the vector $w$ is called the orthogonal projection of $v$ onto $W$, denoted $\operatorname{proj}_W(v)$.
We first show that the vectors $w$ and $w^\perp$ defined in (5.3.4) satisfy the conditions in (5.3.1). It is clear that the $w$ defined in (5.3.4) is an element of $W$, since it is a linear combination of the $e_i$. Furthermore, since $w^\perp$ is defined as $v - w$, we see easily that our choice satisfies
$$w + w^\perp = w + (v - w) = v.$$
It remains only to show that $w^\perp\in W^\perp$. Since $\{e_1, e_2, \dots, e_r\}$ is a basis of $W$, it suffices to show that $\langle w^\perp, e_j\rangle = 0$ for all $1\leq j\leq r$. We compute:
$$\langle w^\perp, e_j\rangle = \langle v - \operatorname{proj}_W(v), e_j\rangle = \langle v, e_j\rangle - \frac{\langle v, e_j\rangle}{\langle e_j, e_j\rangle}\langle e_j, e_j\rangle = 0,$$
where the second equality uses the orthogonality of the $e_i$. Since this holds for each $j$, we have $w^\perp\in W^\perp$, as desired.
Having shown that a decomposition of $v$ of the form (5.3.1) exists, we now show it is unique in the sense specified. Suppose we have
$$v = w + w^\perp = u + u^\perp,$$
where $w, u\in W$ and $w^\perp, u^\perp\in W^\perp$. Rearranging, we see that
$$w - u = u^\perp - w^\perp.$$
We now claim that $w - u = 0$, in which case $w = u$ and $w^\perp = u^\perp$, as desired. To see why the claim is true, consider the vector $z = w - u = u^\perp - w^\perp$. Since $w, u\in W$, and $W$ is a subspace, we have $z\in W$. On the other hand, since $w^\perp, u^\perp\in W^\perp$, and $W^\perp$ is a subspace, we have $z\in W^\perp$. Thus $z\in W\cap W^\perp$. Since $W\cap W^\perp = \{0\}$ (Theorem 5.3.5), we conclude $z = 0$, as claimed.
At this point we have proved both (1) and (2), and it remains only to show that (5.3.3) holds for all $w'\in W$. To this end we compute:
$$\lVert v - w'\rVert^2 = \lVert (v - \operatorname{proj}_W(v)) + (\operatorname{proj}_W(v) - w')\rVert^2 = \lVert v - \operatorname{proj}_W(v)\rVert^2 + \lVert \operatorname{proj}_W(v) - w'\rVert^2 \geq \lVert v - \operatorname{proj}_W(v)\rVert^2,$$
where the second equality follows from Exercise 5.2.4.18, since $v - \operatorname{proj}_W(v)\in W^\perp$ and $\operatorname{proj}_W(v) - w'\in W$. This shows $\lVert v - w'\rVert^2 \geq \lVert v - \operatorname{proj}_W(v)\rVert^2$. Taking square-roots now proves the desired inequality.
The formula (5.3.2) is very convenient for computing an orthogonal projection $\operatorname{proj}_W(v)$, but mark well this important detail: to apply the formula we must first provide an orthogonal basis of $W$. Thus unless one is provided, our first step in an orthogonal projection computation is to produce an orthogonal basis of $W$. In some simple cases (e.g., when $W$ is 1- or 2-dimensional) this can be done by inspection. Otherwise, we use the Gram-Schmidt procedure.
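Here is a sketch of that two-step recipe for the dot product on $\mathbb{R}^n$: a small Gram-Schmidt helper followed by formula (5.3.2). The basis of $W$ and the vector being projected are made up for illustration.

```python
import numpy as np

def gram_schmidt(basis):
    """Return an orthogonal basis spanning the same subspace as `basis`."""
    ortho = []
    for b in basis:
        v = b.astype(float)
        for e in ortho:
            v = v - (v @ e) / (e @ e) * e   # subtract the component along e
        ortho.append(v)
    return ortho

def proj(v, ortho_basis):
    """Orthogonal projection of v onto span(ortho_basis), via formula (5.3.2)."""
    return sum((v @ e) / (e @ e) * e for e in ortho_basis)

# Made-up basis of a plane W in R^3 and a made-up vector v:
basis = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
E = gram_schmidt(basis)
v = np.array([1.0, 2.0, 3.0])
w = proj(v, E)
print(w, v - w)                                        # w lies in W, v - w is orthogonal to W
print([np.isclose((v - w) @ b, 0.0) for b in basis])   # [True, True]
```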
According to Remark 5.3.10 our first step is to produce an orthogonal basis of . We do so by inspection. Since , we simply need to find two solutions to that are orthogonal to one another: e.g., and . Thus we choose as our orthogonal basis, and our computations become a matter of applying (5.3.2), which in this case becomes
proj.
Now compute:
projprojproj.
The last two computations might give you pause. Why do we have proj and proj? The answer is that is already an element of , so it stands to reason that its projection is itself; and is already orthogonal to (it is a scalar multiple of ), so it stands to reason that its projection is equal to . See Exercise 5.3.6.20 for a rigorous proof of these claims.
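A quick numerical check of these two facts, using a made-up plane $W$ in $\mathbb{R}^3$: a vector already in $W$ projects to itself, and a vector orthogonal to $W$ projects to the zero vector.

```python
import numpy as np

def proj(v, ortho_basis):
    # Formula (5.3.2), assuming ortho_basis is an orthogonal basis.
    return sum((v @ e) / (e @ e) * e for e in ortho_basis)

# A made-up orthogonal basis of a plane W in R^3, and a normal direction:
e1 = np.array([1.0, 1.0, 0.0])
e2 = np.array([1.0, -1.0, 1.0])
n  = np.cross(e1, e2)                 # orthogonal to W

w_vec = 2 * e1 - 3 * e2               # an element of W
print(np.allclose(proj(w_vec, [e1, e2]), w_vec))   # True: proj fixes vectors in W
print(np.allclose(proj(n, [e1, e2]), 0))           # True: proj sends vectors in W-perp to 0
```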
For this subsection we will always work within Euclidean space: i.e., $\mathbb{R}^n$ with the dot product. In applications we often want to compute the projection of a point onto a line (in $\mathbb{R}^2$ or $\mathbb{R}^3$) or plane (in $\mathbb{R}^3$). According to Corollary 5.3.14 the operation of projecting onto any subspace $W\subseteq\mathbb{R}^n$ is in fact a linear transformation $\operatorname{proj}_W\colon\mathbb{R}^n\to\mathbb{R}^n$. By Corollary 3.6.18 we have $\operatorname{proj}_W = T_A$ for some $n\times n$ matrix $A$, where
$$A = \begin{bmatrix} \operatorname{proj}_W(e_1) & \operatorname{proj}_W(e_2) & \cdots & \operatorname{proj}_W(e_n)\end{bmatrix}$$
and $e_1, e_2, \dots, e_n$ is the standard basis of $\mathbb{R}^n$.
Lastly, (5.3.2) gives us an easy formula for computing $\operatorname{proj}_W(x)$ for all $x\in\mathbb{R}^n$, once we have selected an orthogonal basis of $W$. As a result we can easily derive matrix formulas for projection onto any subspace $W$ of any Euclidean space $\mathbb{R}^n$. We illustrate this with some examples in $\mathbb{R}^2$ and $\mathbb{R}^3$ below.
Any plane $W\subseteq\mathbb{R}^3$ passing through the origin can be described as the solution set of an equation $ax + by + cz = 0$. Equivalently, $W$ is the set of all $x\in\mathbb{R}^3$ satisfying $n\cdot x = 0$: i.e., $W = L^\perp$, where $L = \operatorname{span}\{n\}$ and $n = (a,b,c)$. Consider the orthogonal decomposition of $x$ with respect to the line $L$:
$$x = \operatorname{proj}_L(x) + \bigl(x - \operatorname{proj}_L(x)\bigr),$$
where $\operatorname{proj}_L(x)\in L$ and $x - \operatorname{proj}_L(x)\in L^\perp = W$.
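The following sketch assembles the resulting projection matrices for a made-up normal vector $n$: the matrix $nn^T/(n\cdot n)$ projects onto the line $L$, and subtracting it from the identity projects onto the plane $W$.

```python
import numpy as np

# Made-up normal vector for a plane W through the origin in R^3.
n = np.array([1.0, 2.0, 2.0])
P_line  = np.outer(n, n) / (n @ n)       # matrix of proj onto the normal line L
P_plane = np.eye(3) - P_line             # matrix of proj onto the plane W

x = np.array([3.0, 0.0, 1.0])            # an arbitrary test vector
print(P_plane @ x)                                 # lies in W ...
print(np.isclose((P_plane @ x) @ n, 0.0))          # ... i.e., is orthogonal to n: True
print(np.allclose(P_plane @ P_plane, P_plane))     # sanity check: projections are idempotent
```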
What is the relationship between and ? Theorem 5.3.9 tells us that is the “best” trigonometric polynomial approximation of of degree at most in the following sense: given any other trigonometric polynomial , we have
However, linear algebra does not tell us just how good this approximation is. This question, among others, is tackled by another mathematical theory: Fourier analysis. There we learn that the trigonometric polynomial approximations get arbitrarily close to $f$ as we let the degree increase. More precisely, letting $f_n$ be the orthogonal projection of $f$ onto the space of trigonometric polynomials of degree at most $n$, we have $\lVert f - f_n\rVert \to 0$ as $n\to\infty$.
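The following sketch illustrates this numerically for a made-up function on $[-\pi,\pi]$, approximating the integral inner product on a fine grid and applying formula (5.3.2) to the orthogonal family $1, \cos kx, \sin kx$.

```python
import numpy as np

# Project a made-up function f onto the trigonometric polynomials of degree <= n,
# using the inner product <g, h> = integral of g*h over [-pi, pi], approximated on a grid.
x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
f = x**2                                  # hypothetical function to approximate

def inner(g, h):
    return np.sum(g * h) * dx             # crude numerical integral

def trig_projection(f_vals, n):
    basis = [np.ones_like(x)]             # orthogonal basis: 1, cos kx, sin kx
    for k in range(1, n + 1):
        basis += [np.cos(k * x), np.sin(k * x)]
    return sum(inner(f_vals, e) / inner(e, e) * e for e in basis)   # formula (5.3.2)

for n in (1, 3, 10):
    p = trig_projection(f, n)
    print(n, np.sqrt(inner(f - p, f - p)))   # the error shrinks as n grows
```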
In statistics we often wish to approximate a scatter plot of points $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$ with a line $y = mx + b$ that “best fits” the data. “Finding” this line amounts to finding the appropriate slope $m$ and $y$-intercept $b$: i.e., in this setup, the points $(x_i, y_i)$ are given, and $m$ and $b$ are the unknowns we wish to find. For the line to perfectly fit the data, we would want
$$y_i = mx_i + b \quad\text{for all } 1\leq i\leq n,$$
i.e., we would want a solution to the matrix equation
$$Ax = y, \tag{5.3.5}$$
where
$$A = \begin{bmatrix} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_n & 1\end{bmatrix}, \quad x = \begin{bmatrix} m\\ b\end{bmatrix}, \quad y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}.$$
Of course in most situations the provided points do not lie on a line, and thus there is no solution to the given matrix equation $Ax = y$. When this is the case we can use the theory of orthogonal projection to find what is called a least-squares solution, which we now describe in detail.
When $y\notin\operatorname{col}A$, and hence (5.3.5) does not have a solution, the least-squares method proceeds by replacing $y$ with the element of $\operatorname{col}A$ closest to it: that is, with its orthogonal projection onto $\operatorname{col}A$. Let $\hat{y} = \operatorname{proj}_{\operatorname{col}A}(y)$, where the orthogonal projection is taken with respect to the dot product on $\mathbb{R}^n$, and consider the adjusted matrix equation
$$Ax = \hat{y}. \tag{5.3.6}$$
By definition of $\hat{y}$, we have $\hat{y}\in\operatorname{col}A$, and thus there is a solution $\hat{x}$ to (5.3.6). We call $\hat{x}$ a least-squares solution to (5.3.5). Observe that $\hat{x}$ does not necessarily satisfy $A\hat{x} = y$; rather, it satisfies $A\hat{x} = \hat{y}$. What makes this a “least-squares” solution is that $A\hat{x} = \hat{y}$ is the element of $\operatorname{col}A$ closest to $y$. With respect to the dot product, this means that a least-squares solution minimizes the quantity
$$\lVert y - Ax\rVert = \sqrt{(y_1 - (mx_1 + b))^2 + (y_2 - (mx_2 + b))^2 + \cdots + (y_n - (mx_n + b))^2}.$$
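As a sketch of this recipe on made-up data points (not those of the examples below), one can build $A$ and $y$ and let a standard least-squares routine carry out the projection and solve:

```python
import numpy as np

# Made-up scatter-plot data for the model y = m*x + b.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 2.0, 4.0])

A = np.column_stack([xs, np.ones_like(xs)])   # columns correspond to m and b

# np.linalg.lstsq returns a least-squares solution x_hat directly:
x_hat, *_ = np.linalg.lstsq(A, ys, rcond=None)
m, b = x_hat
y_hat = A @ x_hat                             # the projection of y onto col(A)

print(m, b)
print(np.linalg.norm(ys - y_hat))             # the minimized error ||y - A x_hat||
```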
Suppose we wish to find a line that best fits (in the least-square sense) the following data points: . Following the discussion above, we seek a solution to the matrix equation , where
Using Gaussian elimination, we see easily that this equation has no solution: equivalently, . Accordingly, we compute proj and find a solution to . Conveniently, the set is already an orthogonal basis of , allowing us to use (5.3.2):
Figure 5.3.21 helps us give a graphical interpretation of how the line best approximates the points .
Figure 5.3.21. Least-squares visualization.
Let $y$ be the vector of given $y$-values of the points, and let $\hat{y}$ be the orthogonal projection of $y$ onto $\operatorname{col}A$. In the graph the indicated values denote the vertical differences between the data points and our fitted line. The projection $\hat{y}$ makes this error as small as possible, in the following sense: if we draw any other line and compute the corresponding differences at the $x$-values $-3$, $1$, and $2$, then the sum of the squares of those differences can only be larger.
To compute a least-squares solution to $Ax = y$ we must first compute the orthogonal projection $\hat{y}$ of $y$ onto $\operatorname{col}A$; and this in turn requires first producing an orthogonal basis of $\operatorname{col}A$, which may require using the Gram-Schmidt procedure. The following result bypasses these potentially onerous steps by characterizing a least-squares solution to $Ax = y$ as a solution to the matrix equation
$$A^TA\,x = A^Ty.$$
Consider again the matrix equation $Ax = y$ from Example 5.3.19. According to Theorem 5.3.22 the least-squares solution can be found by solving the equation $A^TAx = A^Ty$ for $x$. We compute
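As a sketch of the normal-equations method (on made-up data, not the matrix of Example 5.3.19):

```python
import numpy as np

# Solve the normal equations A^T A x = A^T y for a made-up data set.
xs = np.array([-3.0, 1.0, 2.0])              # x-values as in the figure discussion
ys = np.array([1.0, 2.0, 4.0])               # hypothetical y-values
A = np.column_stack([xs, np.ones_like(xs)])

x_hat = np.linalg.solve(A.T @ A, A.T @ ys)   # least-squares slope and intercept
print(x_hat)

# Same fitted values as projecting y onto col(A) first:
print(np.allclose(A @ x_hat, A @ np.linalg.lstsq(A, ys, rcond=None)[0]))
```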
Solution: (a) The idea is to find a convenient point lying in the given plane. For example, let us pick , and let be the vector from to . Then the distance from to the plane is the absolute value of the scalar projection of onto the normal vector
So, we get that the distance is .
(b) To find an equation of the plane in question, we can work with the point and the normal vector given by the cross product of the vector from to , and the vector from to . It turns out that
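A sketch of both computations with made-up points $q_0, q_1, q_2$ and $p$ (these names and values are assumptions of this sketch, not taken from the exercise):

```python
import numpy as np

# (a) distance from a point p to the plane through q0, q1, q2, via a scalar
#     projection onto a normal vector; (b) the normal vector via a cross product.
q0 = np.array([0.0, 0.0, 1.0])
q1 = np.array([1.0, 0.0, 2.0])
q2 = np.array([0.0, 2.0, 0.0])
p  = np.array([3.0, 1.0, 4.0])                # point whose distance we want

n = np.cross(q1 - q0, q2 - q0)                # (b) normal to the plane
v = p - q0                                    # vector from a point of the plane to p
distance = abs(v @ n) / np.linalg.norm(n)     # (a) |scalar projection of v onto n|
print(n, distance)
```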
Recall that the trace of a square matrix is the sum of its diagonal entries. Let with inner product . (You may take for granted that this operation is indeed an inner product on .) Define .
Compute an orthogonal basis for . You can do this either by inspection (the space is manageable), or by starting with any basis of and applying the Gram-Schmidt procedure.
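A sketch of the Gram-Schmidt route, assuming the inner product is $\langle A, B\rangle = \operatorname{tr}(A^TB)$ and using a made-up pair of $2\times 2$ matrices rather than the $W$ of the exercise:

```python
import numpy as np

def tr_inner(A, B):
    """Trace inner product <A, B> = trace(A^T B)."""
    return np.trace(A.T @ B)

def gram_schmidt(mats):
    ortho = []
    for M in mats:
        E = M.astype(float)
        for F in ortho:
            E = E - tr_inner(E, F) / tr_inner(F, F) * F
        ortho.append(E)
    return ortho

basis = [np.array([[1.0, 1.0], [0.0, 0.0]]),
         np.array([[1.0, 0.0], [0.0, 1.0]])]
E1, E2 = gram_schmidt(basis)
print(E1, E2, tr_inner(E1, E2))               # last value is (numerically) 0
```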
Let with the integral inner product, and let . Find the function of the form that “best approximates” in terms of this inner product: i.e., find the function of this form that minimizes .
We consider the problem of fitting a collection of data points with a quadratic curve of the form $y = ax^2 + bx + c$. Thus we are given some collection of points $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$, and we seek parameters $a, b, c$ for which the graph of $f(x) = ax^2 + bx + c$ “best fits” the points in some way.
Show, using linear algebra, that if we are given any three points $(x_1,y_1), (x_2,y_2), (x_3,y_3)$, where the $x$-coordinates are all distinct, then there is a unique choice of $a, b, c$ such that the corresponding quadratic function agrees precisely with the data. In other words, given just about any three points in the plane, there is a unique quadratic curve connecting them.
Graph the function you found, along with the points . (You may want to use technology.) Use your graph to explain precisely in what sense “best fits” the data.
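A sketch of how one might carry out both steps numerically, with made-up data points (the exercise's actual points are not used here): a square solve for the interpolating quadratic through three points, and the normal equations for a least-squares quadratic fit to more points.

```python
import numpy as np

def quad_design(xs):
    # Columns correspond to the parameters a, b, c in y = a*x^2 + b*x + c.
    return np.column_stack([xs**2, xs, np.ones_like(xs)])

# Three made-up points with distinct x-coordinates: the design matrix is square
# and invertible, so the interpolating quadratic is unique.
xs3, ys3 = np.array([-1.0, 0.0, 2.0]), np.array([1.0, 0.0, 4.0])
a, b, c = np.linalg.solve(quad_design(xs3), ys3)
print(a, b, c)

# More than three made-up points: generically no exact solution, so take the
# least-squares fit via the normal equations A^T A x = A^T y.
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = np.array([4.2, 0.8, 0.1, 1.1, 3.9])
A = quad_design(xs)
a, b, c = np.linalg.solve(A.T @ A, A.T @ ys)
print(a, b, c)
```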