Matrices played a small supporting role in our discussion of linear systems in Chapter 1. In this chapter we bring them to center stage and give them a full-blown treatment as independent mathematical objects in their own right.
Like any mathematical entity worth its salt, matrices can be employed in a vast multitude of ways. As such it is important to allow matrices to transcend their humble beginnings in this course as boiled down systems of linear equations. We record this observation as another principle.
Mantra 2.1.1. Matrix mantra.
A matrix is a matrix is a matrix.
Not every matrix should be thought of as an augmented matrix associated to a linear system.
Subsection 2.1.1 The basics
Definition 2.1.2. Matrix.
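A (real) matrix is a rectangular array of real numbers
\begin{equation*}
A=\begin{bmatrix}a_{11}\amp a_{12}\amp \cdots \amp a_{1n}\\ a_{21}\amp a_{22}\amp \cdots \amp a_{2n}\\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ a_{m1}\amp a_{m2}\amp \cdots \amp a_{mn}\end{bmatrix}\text{.}\tag{2.1.1}
\end{equation*}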
The number \(a_{ij}\) located in the \(i\)-th row and \(j\)-th column of \(A\) is called the \((i,j)\)-entry (or \(ij\)-th entry) of \(A\text{.}\)
A matrix with \(m\) rows and \(n\) columns is said to have size (or dimension) \(m\times n\text{.}\)
We will typically use capital letters near the beginning of the alphabet (e.g. \(A, B,C, D\text{,}\) etc.) to denote matrices.
The displayed matrix in (2.1.1) is costly both in the space it takes up on the page and in the time it takes to write down or typeset. Accordingly, we introduce two somewhat complementary forms of notation to help describe matrices.
Definition 2.1.3. Matrix notation.
Matrix-building notation
The notation \([a_{ij}]_{m\times n}\) denotes the \(m\times n\) matrix whose \(ij\)-th entry (\(i\)-th row, \(j\)-th column) is \(a_{ij}\text{.}\) When there is no danger of confusion, this notation is often shortened to \([a_{ij}]\text{.}\)
Matrix entry notation
Given a matrix \(A\text{,}\) the notation \([A]_{ij}\) denotes the \(ij\)-th entry of \(A\text{.}\)
Thus if \(A=[a_{ij}]_{m\times n}\text{,}\) then \([A]_{ij}=a_{ij}\) for all \(1\leq i\leq m\) and \(1\leq j\leq n\text{.}\)
Remark 2.1.4.
The matrix-building notation is often used simply to give names to the entries of an arbitrary matrix. However, it can also be used to describe a matrix whose \(ij\)-th entry is given by a specified rule or formula.
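For example, let \(A=[a_{ij}]_{2\times 3}\text{,}\) where \(a_{ij}=(i-j)j\text{.}\) This is the \(2\times 3\) matrix whose \(ij\)-th entry is \((i-j)j\text{.}\) Thus
\begin{equation*}
A=\begin{bmatrix}(1-1)\cdot 1\amp (1-2)\cdot 2\amp (1-3)\cdot 3\\ (2-1)\cdot 1\amp (2-2)\cdot 2\amp (2-3)\cdot 3\end{bmatrix}=\begin{bmatrix}0\amp -2\amp -6\\ 1\amp 0\amp -3\end{bmatrix}\text{.}
\end{equation*}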
In this example we have \([A]_{23}=-3\) and \([A]_{ii}=0\) for \(i=1,2\text{.}\)
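As a quick check of this example, the following plain-Python sketch builds a matrix from an entry formula. The helper name `build_matrix` is hypothetical, and storing a matrix as a list of rows is an assumption of this sketch rather than Sage's own matrix type.

```python
def build_matrix(m, n, rule):
    """Return the m x n matrix (a list of rows) whose (i, j)-entry is rule(i, j).

    The indices i and j passed to rule are 1-based, as in the notation [a_ij].
    """
    return [[rule(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]


# The matrix A = [a_ij] of size 2 x 3 with a_ij = (i - j) j.
A = build_matrix(2, 3, lambda i, j: (i - j) * j)
print(A)  # [[0, -2, -6], [1, 0, -3]]
```

Note that Python lists are indexed from 0, so the entry \([A]_{23}\) is stored at `A[1][2]`.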
In everyday language the notion of equality is taken as self-evident. Two things are equal if they are the same. What more is there to say? In mathematics, each time we introduce a new type of mathematical object (e.g., sets, functions, \(n\)-tuples, etc.) we need to spell out exactly what we mean for two things to be considered equal. We do so now with matrices.
Definition 2.1.5. Matrix equality.
Let \(A\) and \(B\) be matrices of dimension \(m\times n\) and \(m'\times n'\text{,}\) respectively. The two matrices are equal if
\(m=m'\) and \(n=n'\text{;}\)
\([A]_{ij}=[B]_{ij}\) for all \(1\leq i\leq m\) and \(1\leq j\leq n\text{.}\)
In other words, we have \(A=B\) if and only if \(A\) and \(B\) have the same shape, and each entry of \(A\) is equal to the corresponding entry of \(B\text{.}\)
A \(1\times 4\) matrix \(A\) and a \(4\times 1\) matrix \(B\) are not equal to one another, even when they have the same entries appearing in roughly the same order. In this case equality does not hold as \(A\) and \(B\) have different shapes: \(A\) is \(1\times 4\text{,}\) and \(B\) is \(4\times 1\text{.}\)
The matrices \(A=\begin{bmatrix}1\amp 2 \\3\amp 4 \end{bmatrix}\) and \(B=\begin{bmatrix}1\amp 2\\ 5\amp 4\end{bmatrix}\) have the same dimension, but are not equal since \([A]_{21}=3\ne 5=[B]_{21}\text{.}\)
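The two conditions of Definition 2.1.5 can be sketched in plain Python as follows. The function names `shape` and `matrices_equal` are hypothetical, matrices are assumed to be non-empty lists of rows, and the entries of the row and column matrices are chosen purely for illustration.

```python
def shape(M):
    """Return (number of rows, number of columns) of a non-empty list of rows."""
    return (len(M), len(M[0]))


def matrices_equal(A, B):
    """Test equality as in Definition 2.1.5: same shape first, then same entries."""
    if shape(A) != shape(B):
        return False
    m, n = shape(A)
    return all(A[i][j] == B[i][j] for i in range(m) for j in range(n))


row = [[1, 2, 3, 4]]        # a 1 x 4 matrix (illustrative entries)
col = [[1], [2], [3], [4]]  # a 4 x 1 matrix with the same entries
print(matrices_equal(row, col))  # False: the shapes differ

A = [[1, 2], [3, 4]]
B = [[1, 2], [5, 4]]
print(matrices_equal(A, B))  # False: the (2,1)-entries are 3 and 5
```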
Definition 2.1.7. Square matrices, row vectors, column vectors, zero matrices.
A matrix \(A\) is square if its dimension is \(n\times n\text{.}\) The diagonal of a square matrix \(A=[a_{ij}]_{n\times n}\) consists of the entries \(a_{ii}\) for \(1\leq i\leq n\text{.}\)
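A \(1\times n\) matrix
\begin{equation*}
\bolda=\begin{bmatrix}a_1\amp a_2\amp \cdots \amp a_n\end{bmatrix}
\end{equation*}
is called a row vector. An \(n\times 1\) matrix
\begin{equation*}
\boldb=\begin{bmatrix}b_1\\ b_2\\ \vdots \\ b_n\end{bmatrix}
\end{equation*}
is called a column vector. The \(i\)-th entry of a column vector \(\boldb\) is denoted \([\boldb]_i\text{.}\)
The \(m\times n\) zero matrix, denoted \(\boldzero_{m\times n}\text{,}\) is the matrix of that dimension, all of whose entries are zero: i.e., \([\boldzero_{m\times n}]_{ij}=0\) for all \(1\leq i\leq m\) and \(1\leq j\leq n\text{.}\)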
When the actual dimension is not significant, we will often drop the subscript and write simply \(\boldzero\) for a zero matrix of suitable dimension.
Remark 2.1.8. Matrices as collections of columns/rows.
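Let \(A\) be an \(m\times n\) matrix. We will often think of \(A\) as a collection of columns, in which case we write
\begin{equation*}
A=\begin{bmatrix}\vert \amp \vert \amp \amp \vert \\ \boldc_1\amp \boldc_2\amp \cdots \amp \boldc_n\\ \vert \amp \vert \amp \amp \vert \end{bmatrix}\text{,}\tag{2.1.2}
\end{equation*}
where \(\boldc_j\) is the \(j\)-th column of \(A\text{.}\) Similarly, we will often think of \(A\) as a collection of rows, in which case we write
\begin{equation*}
A=\begin{bmatrix}-\boldr_1-\\ -\boldr_2-\\ \vdots \\ -\boldr_m-\end{bmatrix}\text{,}\tag{2.1.3}
\end{equation*}
where \(\boldr_i\) is the \(i\)-th row of \(A\text{.}\)
The vertical and horizontal lines in (2.1.2) and (2.1.3) are used to emphasize that the \(\boldc_j\) are column vectors and the \(\boldr_i\) are row vectors.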
Sage syntax for accessing specific entries of a matrix is similar in spirit to our matrix entry notation. However, as with all things Python, we always count from 0. Thus if a matrix is assigned to the variable A in Sage, A[i,j] is its \((i+1,j+1)\)-th entry.
Prescribed subsets of matrix entries are obtained via slicing methods: for example, A[a:b, c:d] returns the collection of entries \([A]_{ij}\) with \(a+1\leq i\lt b+1\) and \(c+1\leq j\lt d+1\text{,}\) arranged as a matrix.
Leaving the left or right side of : blank in this notation removes the corresponding bound (left or right) from the index in question. Thus A[2, :] returns the third row of \(A\text{,}\) and A[1:, 3] returns the portion of the fourth column of \(A\) beginning with its second entry.
Alternatively, we can obtain a list of all rows or columns of \(A\) using the methods rows() and columns().
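The shift between 0-based and 1-based indices is easy to get wrong, so here is a minimal plain-Python sketch of the same bookkeeping. It uses nested lists rather than Sage matrices, so the comma syntax A[i,j] is not available; the helpers `entry` and `submatrix` are hypothetical names introduced only for this illustration.

```python
# Each entry [A]_ij is written as the two-digit number "ij" to make positions visible.
A = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34]]


def entry(A, i, j):
    """Return [A]_ij using the 1-based indices of the text."""
    return A[i - 1][j - 1]


def submatrix(A, a, b, c, d):
    """Mimic the slice A[a:b, c:d]: rows a+1..b and columns c+1..d in 1-based terms."""
    return [row[c:d] for row in A[a:b]]


print(entry(A, 2, 3))              # 23
print(submatrix(A, 0, 2, 1, 3))    # [[12, 13], [22, 23]]
print(A[2])                        # third row: [31, 32, 33, 34]
print([row[3] for row in A[1:]])   # fourth column from its second entry: [24, 34]
```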
Use the empty cell below to try out some of these commands.
Subsection 2.1.2 Addition, subtraction and scalar multiplication
We now lay out the various algebraic operations we will use to combine and transform matrices; we refer to the use of these operations loosely as matrix arithmetic. Some of these operations resemble familiar operations from real arithmetic in terms of their notation and definition. Do not be lulled into complacency! These are new operations defined for a new class of mathematical objects, and must be treated carefully. In particular, pay close attention to (a) exactly what type of mathematical objects serve as inputs for each operation (the ingredients of the operation), and (b) what type of mathematical object is outputted.
Definition 2.1.9. Matrix addition and subtraction.
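Matrix addition is the operation defined as follows: given two \(m\times n\) matrices \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{m\times n}\text{,}\) we define their sum to be the matrix
\begin{equation*}
A+B=[a_{ij}+b_{ij}]_{m\times n}\text{.}
\end{equation*}
In other words, \(A+B\) is the \(m\times n\) matrix satisfying
\begin{equation*}
[A+B]_{ij}=[A]_{ij}+[B]_{ij}=a_{ij}+b_{ij}
\end{equation*}
for all \(1\leq i\leq m\) and \(1\leq j\leq n\text{.}\)
Matrix subtraction is the operation defined as follows: given two \(m\times n\) matrices \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{m\times n}\text{,}\) we define their difference to be the matrix
\begin{equation*}
A-B=[a_{ij}-b_{ij}]_{m\times n}\text{.}
\end{equation*}
In other words, \(A-B\) is the \(m\times n\) matrix satisfying
\begin{equation*}
[A-B]_{ij}=[A]_{ij}-[B]_{ij}=a_{ij}-b_{ij}
\end{equation*}
for all \(1\leq i\leq m\) and \(1\leq j\leq n\text{.}\)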
Remark 2.1.10.
Observe that matrix addition/subtraction is not defined for every pair of matrices. The ingredients of matrix addition (or subtraction) are two matrices of the same dimension, and the output is a third matrix of this common dimension.
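To make the ingredients and output of these operations concrete, here is a minimal plain-Python sketch. The names `same_shape`, `mat_add` and `mat_sub` are hypothetical, matrices are assumed to be non-empty lists of rows, and the example matrices are chosen only for illustration.

```python
def same_shape(A, B):
    """True when A and B have the same number of rows and of columns."""
    return len(A) == len(B) and len(A[0]) == len(B[0])


def mat_add(A, B):
    """Entrywise sum; only defined when A and B have the same dimension."""
    if not same_shape(A, B):
        raise ValueError("A + B requires matrices of the same dimension")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def mat_sub(A, B):
    """Entrywise difference; only defined when A and B have the same dimension."""
    if not same_shape(A, B):
        raise ValueError("A - B requires matrices of the same dimension")
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 5]]
print(mat_add(A, B))  # [[1, 3], [2, 9]]
print(mat_sub(A, B))  # [[1, 1], [4, -1]]
```

Calling `mat_add` on a \(1\times 2\) and a \(2\times 1\) matrix raises a `ValueError` in this sketch, mirroring the fact that the sum is undefined.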
Definition 2.1.11. Scalar multiplication of matrices.
Given any matrix \(A=[a_{ij}]_{m\times n}\) and any constant \(c\in \R\text{,}\) we define
\begin{equation*}
cA=[ca_{ij}]_{m\times n}\text{.}
\end{equation*}
In other words, \(cA\) is the \(m\times n\) matrix obtained by “scaling” each of the entries of \(A\) by the constant \(c\text{.}\)
We call \(cA\) a scalar multiple of \(A\text{.}\) Furthermore, to help distinguish between matrices and real numbers, we will refer to elements of \(\R\) as scalars.
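A minimal plain-Python sketch of scalar multiplication follows. The name `scalar_mult` is hypothetical and the matrix, stored as a list of rows, is chosen only for illustration.

```python
def scalar_mult(c, A):
    """Return cA: every entry of the matrix A scaled by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, -2], [0, 4]]
print(scalar_mult(3, A))  # [[3, -6], [0, 12]]
```

The inputs are of two different types (a number and a matrix), and the output has the same dimension as the input matrix.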
Remark 2.1.12.
Whereas matrix addition and subtraction closely resemble corresponding operations involving real numbers, there is no obvious real arithmetic analogue to matrix scalar multiplication. In particular, notice how matrix scalar multiplication is a sort of hybrid operation that combines mathematical objects of two very different natures: a real number (or scalar) on the one hand, and a matrix on the other.
We call the result of applying a sequence of matrix additions and scalar multiplications a linear combination of matrices.
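Definition 2.1.13. Linear combination of matrices.
Given matrices \(A_1,A_2,\dots, A_r\) of the same dimension, and scalars \(c_1,c_2, \dots ,c_r\text{,}\) the expression
\begin{equation*}
c_1A_1+c_2A_2+\cdots +c_rA_r
\end{equation*}
is called a linear combination of the matrices \(A_1,A_2,\dots, A_r\text{.}\) The scalars \(c_i\) are called the coefficients of the linear combination.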
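To express \(B\) as a linear combination \(B=aA_1+bA_2+cA_3\text{,}\) we compare the entries on both sides. Using the definition of matrix equality (Definition 2.1.5), we get the system of equations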
\begin{equation*}
\begin{linsys}{3} a \amp +\amp b \amp + \amp c \amp = \amp 3\\
a \amp-\amp b\amp +\amp c\amp =\amp -3\\
a \amp \amp \amp -\amp 2c\amp =\amp 3
\end{linsys}\text{.}
\end{equation*}
Using Gaussian elimination we find that there is a unique solution to this system: namely, \((a,b,c)=(1,3,-1)\text{.}\) We conclude that \(B=A_1+3A_2+(-1)A_3=A_1+3A_2-A_3\text{.}\)
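The claimed solution can be substituted back into the three equations as a sanity check. The plain-Python sketch below does only that; it confirms that \((a,b,c)=(1,3,-1)\) satisfies the system, and does not by itself show that the solution is unique. The variable names are hypothetical.

```python
# Each equation is stored as (coefficients of a, b, c; right-hand side).
system = [
    ((1, 1, 1), 3),    # a + b + c = 3
    ((1, -1, 1), -3),  # a - b + c = -3
    ((1, 0, -2), 3),   # a     - 2c = 3
]
solution = (1, 3, -1)

# Evaluate the left-hand side of each equation at the proposed solution.
checks = [sum(k * x for k, x in zip(coeffs, solution)) == rhs
          for coeffs, rhs in system]
print(checks)  # [True, True, True]
```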
Remark 2.1.16.
Let \(A_1, A_2,\dots, A_r\) be \(m\times n\) matrices. An easy induction argument on \(r\) shows that for any scalars \(c_1,c_2,\dots, c_r\) we have
\begin{equation*}
[c_1A_1+c_2A_2+\cdots +c_rA_r]_{ij}=c_1[A_1]_{ij}+c_2[A_2]_{ij}+\cdots +c_r[A_r]_{ij}
\end{equation*}
for all \(1\leq i\leq m\text{,}\) \(1\leq j\leq n\text{.}\) (See Exercise 2.1.6.11.)
Subsection 2.1.3 Matrix multiplication
So how do we define the product of two matrices? Looking at the previous operations, you might have guessed that we should define the product of two \(m\times n\) matrices by taking the product of their corresponding entries. Not so!
Definition 2.1.17. Matrix multiplication.
Matrix multiplication is the operation defined as follows: given an \(m\times n\) matrix \(A=[a_{ij}]_{m\times n}\) and an \(n\times r\) matrix \(B=[b_{ij}]_{n\times r}\text{,}\) we define their product to be the \(m\times r\) matrix \(AB\) whose \(ij\)-th entry is given by the formula
\begin{equation*}
[AB]_{ij}=a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots +a_{in}b_{nj}=\sum_{k=1}^na_{ik}b_{kj}
\end{equation*}
for all \(1\leq i\leq m\) and \(1\leq j\leq r\text{.}\)
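The entry formula in Definition 2.1.17 translates almost directly into code. The following plain-Python sketch assumes matrices are stored as non-empty lists of rows; the name `mat_mul` and the example matrices are hypothetical choices made for illustration.

```python
def mat_mul(A, B):
    """Return AB, where [AB]_ij is the sum over k of a_ik * b_kj."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("AB requires (columns of A) == (rows of B)")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]     # 3 x 2
print(mat_mul(A, B))  # [[7, -1], [16, -1]]
```

For instance, the \((1,1)\)-entry is \(1\cdot 1+2\cdot 0+3\cdot 2=7\text{.}\)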
Remark 2.1.19. Size and matrix multiplication.
Observe how, like addition, matrix multiplication is not defined for every pair of matrices: there must be a certain agreement in their dimensions.
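In more detail, for the product of \(A_{m\times n}\) and \(B_{p\times r}\) to be defined, we need \(n=p\text{.}\) In other words we need the “inner” dimensions of \(A\) and \(B\) to be equal:
\begin{equation*}
\underset{m\times \boxed{n}}{A}\hspace{5pt}\underset{\boxed{p}\times r}{B}\text{,}\qquad n=p\text{.}
\end{equation*}
If this condition is met, the dimension of the resulting matrix \(AB\) is determined by the “outer” dimensions of \(A\) and \(B\text{.}\) Schematically, you can think of the inner dimensions as being “canceled out”:
\begin{equation*}
\underset{m\times n}{A}\hspace{5pt}\underset{n\times r}{B}=\underset{m\times r}{AB}\text{.}
\end{equation*}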
Since the “inner dimensions” of \(A\) and \(B\) agree, we can form the product matrix \(C=AB\text{,}\) which has dimension \(2\times 2\text{.}\) Let \(c_{ij}=[C]_{ij}\) for all \(1\leq i,j\leq 2\text{.}\) Using Definition 2.1.17, we compute
The formula for the \(ij\)-th entry of a matrix product \(AB\) can be succinctly described as the dot product of the \(i\)-th row of \(A\) with the \(j\)-th column of \(B\text{.}\) You may have already met the dot product in the special case of \(2\)- and \(3\)-tuples; the definition generalizes easily to \(n\)-tuples for any positive integer \(n\text{.}\) We will have a lot more to say about the dot product and related operations in Chapter 5. For now we will provide an official definition so that we can conveniently describe matrix multiplication in terms of dot products.
Definition 2.1.21. Dot product.
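Given \(n\)-tuples \(\boldx=(x_1,x_2,\dots, x_n)\) and \(\boldy=(y_1,y_2,\dots, y_n)\text{,}\) their dot product, denoted \(\boldx\cdot \boldy\text{,}\) is defined as
\begin{equation*}
\boldx\cdot \boldy=x_1y_1+x_2y_2+\cdots +x_ny_n=\sum_{k=1}^nx_ky_k\text{.}
\end{equation*}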
Theorem 2.1.22. Dot product and matrix multiplication.
Let \(A\) be an \(m\times n\) matrix, and let \(B\) be an \(n\times r\) matrix. For all \(1\leq i\leq m\text{,}\) let \(\boldr_i\) be the \(i\)-th row of \(A\text{;}\) and for all \(1\leq j\leq r\) let \(\boldc_j\) be the \(j\)-th column of \(B\text{.}\) For all \(1\leq i\leq m, 1\leq j\leq r\text{,}\) we have
\begin{equation*}
[AB]_{ij}=\boldr_i\cdot \boldc_j\text{,}
\end{equation*}
where \(\boldr_i\) and \(\boldc_j\) are treated as \(n\)-tuples. In other words, the \(ij\)-th entry of \(AB\) is the dot product of the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\text{.}\)
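The statement of Theorem 2.1.22 can be sketched in plain Python by computing each entry of the product as a dot product of a row with a column. The names `dot` and `mat_mul_dot` are hypothetical, matrices are assumed to be non-empty lists of rows, and the example matrices are chosen only for illustration.

```python
def dot(x, y):
    """Dot product of two n-tuples of the same length."""
    if len(x) != len(y):
        raise ValueError("dot product requires tuples of equal length")
    return sum(a * b for a, b in zip(x, y))


def mat_mul_dot(A, B):
    """Return AB, computing [AB]_ij as (i-th row of A) . (j-th column of B)."""
    cols_B = list(zip(*B))  # the columns of B, each as a tuple
    return [[dot(row, col) for col in cols_B] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]     # 3 x 2
print(dot((1, 2, 3), (4, 5, 6)))  # 32
print(mat_mul_dot(A, B))          # [[7, -1], [16, -1]]
```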
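Fix a pair \((i,j)\) with \(1\leq i\leq m\) and \(1\leq j\leq r\text{.}\) Writing \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{n\times r}\text{,}\) the \(i\)-th row of \(A\) and \(j\)-th column of \(B\text{,}\) considered as \(n\)-tuples, are given as
\begin{equation*}
\boldr_i=(a_{i1},a_{i2},\dots, a_{in})\text{,}\qquad \boldc_j=(b_{1j},b_{2j},\dots, b_{nj})\text{.}
\end{equation*}
By Definition 2.1.21 and Definition 2.1.17 we then have
\begin{equation*}
\boldr_i\cdot \boldc_j=a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots +a_{in}b_{nj}=[AB]_{ij}\text{,}
\end{equation*}
as claimed.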
The definition of a matrix product \(AB\) is undoubtedly more complicated than you expected, and seems to come completely out of the blue. All of this will make more sense once we begin thinking of matrices \(A\) as defining certain functions \(T_A\text{.}\) Our formula for the entries of \(AB\) is chosen precisely so that this new matrix corresponds to the composition of the functions \(T_A\) and \(T_B\text{:}\) i.e. so that
\begin{equation*}
T_{AB}=T_A\circ T_B\text{.}
\end{equation*}
(See Theorem 3.2.32.) Under this interpretation, the ponderous restriction on the dimensions of the ingredient matrices ensures that the two functions \(T_A\) and \(T_B\) can be composed.
We use + and * for matrix addition and multiplication.
As evidence of Sage’s flexibility, the same symbol * is also used for scalar multiplication.
Edit the cell below to practice these operations.
Subsection 2.1.4 Alternative methods of multiplication
In addition to the given definition of matrix multiplication, we will make heavy use of two further ways of computing matrix products, called the column and row methods of matrix multiplication.
Theorem 2.1.24. Column method of matrix multiplication.
Let \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{n\times r}\text{.}\) The column method of matrix multiplication computes \(AB\) using the two steps below.
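Step 1
Let \(\boldb_j\) be the \(j\)-th column of \(B\text{,}\) considered as a column vector. Then \(AB=C\text{,}\) where
\begin{equation*}
C=\begin{bmatrix}\vert \amp \vert \amp \amp \vert \\ A\boldb_1\amp A\boldb_2\amp \cdots \amp A\boldb_r\\ \vert \amp \vert \amp \amp \vert \end{bmatrix}\text{.}
\end{equation*}
In other words, the \(j\)-th column of \(AB\) is \(A\boldb_j\text{.}\)
Step 2
Let \(\boldb=\begin{bmatrix}b_1\\ b_2\\ \vdots \\ b_n\end{bmatrix}\) be an \(n\times 1\) column vector, and let \(\bolda_1,\bolda_2,\dots, \bolda_n\) be the columns of \(A\text{.}\) Then \(A\boldb=\boldc\text{,}\) where
\begin{equation*}
\boldc=b_1\bolda_1+b_2\bolda_2+\cdots +b_n\bolda_n\text{.}
\end{equation*}
In other words, \(A\boldb\) is the linear combination of the columns of \(A\) with coefficients drawn from \(\boldb\text{.}\)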
First we show \(AB\) and \(C\) have the same size. By definition of matrix multiplication, \(AB\) is \(m\times r\text{.}\) By construction \(C\) has \(r\) columns and its \(j\)-th column is \(A\boldb_j\text{.}\) Since \(A\) and \(\boldb_j\) have size \(m\times n\) and \(n\times 1\text{,}\) respectively, \(A\boldb_j\) has size \(m\times 1\text{.}\) Thus each of the \(r\) columns of \(C\) is an \(m\times 1\) column vector. It follows that \(C\) is \(m\times r\text{,}\) as desired.
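Next we show that \([AB]_{ij}=[C]_{ij}\) for all \(1\leq i\leq m\text{,}\) \(1\leq j\leq r\text{.}\) Since the \(ij\)-th entry of \(C\) is the \(i\)-th entry of the \(j\)-th column of \(C\text{,}\) we have
\begin{equation*}
[C]_{ij}=[A\boldb_j]_i=\sum_{k=1}^na_{ik}[\boldb_j]_k=\sum_{k=1}^na_{ik}b_{kj}=[AB]_{ij}\text{.}
\end{equation*}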
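The usual argument shows that both \(A\boldb\) and \(\boldc\) are \(m\times 1\) column vectors. It remains only to show that the \(i\)-th entry \([A\boldb]_i\) of the column \(A\boldb\) is equal to the \(i\)-th entry \([\boldc]_i\) of \(\boldc\) for all \(1\leq i\leq m\text{.}\) Write \(b_k=[\boldb]_k\text{,}\) and let \(\bolda_k\) be the \(k\)-th column of \(A\text{,}\) so that \([\bolda_k]_i=a_{ik}\text{.}\) For any such \(i\) we have
\begin{equation*}
[\boldc]_i=[b_1\bolda_1+b_2\bolda_2+\cdots +b_n\bolda_n]_i=b_1a_{i1}+b_2a_{i2}+\cdots +b_na_{in}=\sum_{k=1}^na_{ik}b_k=[A\boldb]_i\text{.}
\end{equation*}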
Theorem 2.1.24 amounts to a two-step process for computing an arbitrary matrix product \(AB\text{.}\)
The first statement (Step 1) tells us that the \(j\)-th column of the matrix \(AB\) can be obtained by computing the product \(A\,\boldb_j\) of \(A\) with the \(j\)-th column of \(B\text{.}\)
The second statement (Step 2) tells us that each product \(A\,\boldb_j\) can itself be computed as a certain linear combination of the columns of \(A\) with coefficients drawn from \(\boldb_j\text{.}\)
A similar remark applies to computing matrix products using the row method, as described below in Theorem 2.1.26.
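The two steps of the column method can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Sage code: matrices are non-empty lists of rows, a column vector is passed as a flat list of its entries, and the names `mat_vec_by_columns` and `mat_mul_column_method` are hypothetical.

```python
def mat_vec_by_columns(A, b):
    """Step 2: compute A b as b_1*(column 1 of A) + ... + b_n*(column n of A)."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):          # run through the columns of A
        for i in range(m):      # add b_j times the j-th column, entry by entry
            result[i] += b[j] * A[i][j]
    return result


def mat_mul_column_method(A, B):
    """Step 1: the j-th column of AB is A times the j-th column of B."""
    cols_B = [list(col) for col in zip(*B)]
    cols_AB = [mat_vec_by_columns(A, b) for b in cols_B]
    return [list(row) for row in zip(*cols_AB)]  # reassemble columns into rows


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]     # 3 x 2
print(mat_vec_by_columns(A, [1, 0, 2]))  # [7, 16], the first column of AB
print(mat_mul_column_method(A, B))       # [[7, -1], [16, -1]]
```

Here the first column of \(AB\) is \(1\cdot (1,4)+0\cdot (2,5)+2\cdot (3,6)=(7,16)\text{,}\) written as tuples for brevity.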
Theorem 2.1.26. Row method of matrix multiplication.
Let \(A=[a_{ij}]_{m\times n}\) and \(B=[b_{ij}]_{n\times r}\text{.}\) The row method of matrix multiplication computes \(AB\) using the two steps below.
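Step 1
Let \(\bolda_i\) be the \(i\)-th row of \(A\text{.}\) Then
\begin{equation*}
AB=\begin{bmatrix}-\bolda_1B-\\ -\bolda_2B-\\ \vdots \\ -\bolda_mB-\end{bmatrix}\text{.}
\end{equation*}
In other words, the \(i\)-th row of \(AB\) is \(\bolda_iB\text{.}\)
Step 2
Let \(\bolda=\begin{bmatrix}a_1\amp a_2\amp \cdots \amp a_n\end{bmatrix}\) be a \(1\times n\) row vector, and let \(\boldb_1,\boldb_2,\dots, \boldb_n\) be the rows of \(B\text{.}\) Then
\begin{equation*}
\bolda B=a_1\boldb_1+a_2\boldb_2+\cdots +a_n\boldb_n\text{.}
\end{equation*}
In other words, \(\bolda B\) is the linear combination of the rows of \(B\) with coefficients drawn from \(\bolda\text{.}\)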
Let’s verify the validity of the column and row methods using Sage in some specific examples. Below we generate random integer matrices \(A\) and \(B\) of dimension \(3\times 5\) and \(5\times 4\text{,}\) respectively, and compute their product \(C=AB\text{.}\)
Let’s check that the \(j\)-th column of \(C\) is equal to the product of \(A\) with the \(j\)-th column of \(B\text{.}\)
Alternatively, we can visually confirm these equalities using the display of \(C\) in the first cell above. Observe that the result of A*colsB[i] is displayed by Sage as a tuple, though technically for us this is a column vector.
Next, let’s verify that the result of multiplying \(A\) and the \(j\)-th column of \(B\) is the corresponding linear combination of the columns of \(A\) given by the coefficients of this column.
Now use the Sage cells below to demonstrate the validity of the row method for the product \(C=AB\text{.}\) Simply modify the code in the two cells above to reflect the row method, as opposed to the column method.
Video example of matrix multiplication.
Subsection 2.1.5 Transpose of a matrix
We end this section with one last operation, matrix transposition. We will not make much use of this operation until later, but this is as good a place as any to introduce it.
Definition 2.1.29. Matrix transposition.
Given an \(m\times n\) matrix \(A=[a_{ij}]\text{,}\) its transpose \(A^T\) is the matrix whose \(ij\)-th entry is the \(ji\)-th entry of \(A\text{.}\) In other words, \(A^T\) is the \(n\times m\) matrix satisfying \([A^T]_{ij}=[A]_{ji}\) for all \(1\leq i\leq n\) and \(1\leq j\leq m\text{.}\)
Remark 2.1.30.
Given a matrix \(A\) we can give a column- or row-based description of \(A^T\) as follows:
\(A^T\) is the matrix whose \(i\)-th row is the \(i\)-th column of \(A\text{.}\)
\(A^T\) is the matrix whose \(j\)-th column is the \(j\)-th row of \(A\text{.}\)
Example 2.1.31. Transpose.
Let \(A=\begin{bmatrix}1\amp 2\amp 3\\4\amp 5\amp 6 \end{bmatrix}\text{;}\) then \(A^T=\begin{bmatrix}1\amp 4\\2\amp 5\\3\amp 6 \end{bmatrix}\text{.}\)
Let \(B=\begin{bmatrix}1\\0\\3 \end{bmatrix}\text{;}\) then \(B^T=\begin{bmatrix}1\amp 0\amp 3 \end{bmatrix}\text{.}\)
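The defining identity \([A^T]_{ij}=[A]_{ji}\) can be sketched directly in plain Python. The function below is a hypothetical stand-in written for nested lists, not Sage's own transpose() method, and it reproduces the two transposes of this example.

```python
def transpose(A):
    """Return the n x m matrix whose (i, j)-entry is the (j, i)-entry of the m x n matrix A."""
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1], [0], [3]]
print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
print(transpose(B))  # [[1, 0, 3]]
```

Applying `transpose` twice returns the original matrix, as the definition suggests.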
Matrix transposition is implemented in Sage as the transpose() method. In the cell below we (a) choose random integers \(1\leq m,n\leq 6\text{,}\) (b) choose a random \(m\times n\) matrix \(A\) with integer entries, and (c) compute the transpose of \(A\text{.}\)
As usual, experiment with the Sage cell below.
Exercises 2.1.6 Exercises
WeBWork Exercises
1.
Enter T or F depending on whether the statement is true or false. (You must enter T or F -- True and False will not work.)
If A has dimensions \(m \times n\) and B has dimensions \(n \times r\text{,}\) then AB has dimensions \(m \times r\text{.}\)
If A has dimensions \(5 \times 4\) and B has dimensions \(4 \times 3\text{,}\) then the 3rd row, 4th column entry of AB is obtained by multiplying the 3rd column of A by the 4th row of B.
2.
Matrix Products: Consider the matrices
\begin{equation*}
A = \begin{pmatrix}5\amp 4\amp 7\\6\amp 5\amp 4\end{pmatrix},
B = \begin{pmatrix}2\amp 1\amp 3\amp 3\\2\amp 6\amp 9\amp 6\\
9\amp 7\amp 2\amp 8\end{pmatrix},\textrm{ and }
C = \begin{pmatrix}7\amp 5\\1\amp 8\\2\amp 5\\2\amp 3\end{pmatrix}
\end{equation*}
Of the possible matrix products \(ABC, ACB, BAC, BCA, CAB, CBA\text{,}\)
For each part below write down the most general \(3\times 3\) matrix \(A=[a_{ij}]\) satisfying the given condition (use letter names \(a,b,c\text{,}\) etc. for entries).
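Note that the rows of \(A\) are all identical, and equal to \(\begin{bmatrix}1 \amp -1 \amp 1 \amp -1 \amp 1 \end{bmatrix}\text{.}\) From the row method it follows that each row of \(AB\) is given by
\begin{equation*}
\begin{bmatrix}1 \amp -1 \amp 1 \amp -1 \amp 1 \end{bmatrix}B\text{.}
\end{equation*}
Thus the rows of \(AB\) are all identical, and the row method computes the product above by taking the corresponding alternating sum of the rows \(\boldb_1,\boldb_2,\dots, \boldb_5\) of \(B\text{:}\)
\begin{equation*}
\begin{bmatrix}1 \amp -1 \amp 1 \amp -1 \amp 1 \end{bmatrix}B=\boldb_1-\boldb_2+\boldb_3-\boldb_4+\boldb_5\text{.}
\end{equation*}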
Thus \(AB\) is the \(5\times 4\) matrix, all of whose rows are \(\begin{bmatrix}2\amp 2\amp -1\amp 4 \end{bmatrix}\text{.}\)
10.
Each of the \(3\times 3\) matrices \(B_i\) below performs a specific row operation when multiplying a \(3\times n\) matrix \(A=\begin{bmatrix}-\boldr_1-\\ -\boldr_2-\\ -\boldr_3- \end{bmatrix}\) on the left; i.e., the matrix \(B_iA\) is the result of performing a certain row operation on the matrix \(A\text{.}\) Use the row method of matrix multiplication to decide what row operation each \(B_i\) performs.
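Let \(r\geq 2\) be an integer. Prove, by induction on \(r\text{,}\) that for any \(m\times n\) matrices \(A_1, A_2,\dots, A_r\) and scalars \(c_1,c_2,\dots, c_r\text{,}\) we have
\begin{equation*}
[c_1A_1+c_2A_2+\cdots +c_rA_r]_{ij}=c_1[A_1]_{ij}+c_2[A_2]_{ij}+\cdots +c_r[A_r]_{ij}
\end{equation*}
for all \(1\leq i\leq m\text{,}\) \(1\leq j\leq n\text{.}\)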