Undergraduate Texts in Mathematics

Editors
S. Axler   F.W. Gehring   K.A. Ribet

Springer
New York Berlin Heidelberg Barcelona Hong Kong London Milan Paris Singapore Tokyo
Sheldon Axler

Linear Algebra Done Right

Second Edition

Springer

Sheldon Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA

Editorial Board

S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA

F.W. Gehring
Mathematics Department
East Hall
University of Michigan
Ann Arbor, MI 48109-1109
USA

K.A. Ribet
Mathematics Department
University of California at Berkeley
Berkeley, CA 94720-3840
USA

Mathematics Subject Classification (1991): 15-01

Library of Congress Cataloging-in-Publication Data

Axler, Sheldon Jay
    Linear algebra done right / Sheldon Axler. - 2nd ed.
        p. cm. - (Undergraduate texts in mathematics)
    Includes index.
    ISBN 0-387-98259-0 (alk. paper). ISBN 0-387-98258-2 (pbk.: alk. paper)
    1. Algebra, Linear. I. Title. II. Series.
    QA184.A96 1997
    512'.5-dc20    97-16664

© 1997, 1996 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

ISBN 0-387-98259-0 (hardcover)    ISBN 0-387-98258-2 (softcover)

SPIN 10629393    SPIN 10794473

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Contents

Preface to the Instructor   ix

Preface to the Student   xiii

Acknowledgments   xv

Chapter 1
Vector Spaces   1
    Complex Numbers   2
    Definition of Vector Space   4
    Properties of Vector Spaces   11
    Subspaces   13
    Sums and Direct Sums   14
    Exercises   19

Chapter 2
Finite-Dimensional Vector Spaces   21
    Span and Linear Independence   22
    Bases   27
    Dimension   31
    Exercises   35

Chapter 3
Linear Maps   37
    Definitions and Examples   38
    Null Spaces and Ranges   41
    The Matrix of a Linear Map   48
    Invertibility   53
    Exercises   59

Chapter 4
Polynomials   63
    Degree   64
    Complex Coefficients   67
    Real Coefficients   69
    Exercises   73

Chapter 5
Eigenvalues and Eigenvectors   75
    Invariant Subspaces   76
    Polynomials Applied to Operators   80
    Upper-Triangular Matrices   81
    Diagonal Matrices   87
    Invariant Subspaces on Real Vector Spaces   91
    Exercises   94

Chapter 6
Inner-Product Spaces   97
    Inner Products   98
    Norms   102
    Orthonormal Bases   106
    Orthogonal Projections and Minimization Problems   111
    Linear Functionals and Adjoints   117
    Exercises   122

Chapter 7
Operators on Inner-Product Spaces   127
    Self-Adjoint and Normal Operators   128
    The Spectral Theorem   132
    Normal Operators on Real Inner-Product Spaces   138
    Positive Operators   144
    Isometries   147
    Polar and Singular-Value Decompositions   152
    Exercises   158

Chapter 8
Operators on Complex Vector Spaces   163
    Generalized Eigenvectors   164
    The Characteristic Polynomial   168
    Decomposition of an Operator   173
    Square Roots   177
    The Minimal Polynomial   179
    Jordan Form   183
    Exercises   188

Chapter 9
Operators on Real Vector Spaces   193
    Eigenvalues of Square Matrices   194
    Block Upper-Triangular Matrices   195
    The Characteristic Polynomial   198
    Exercises   210

Chapter 10
Trace and Determinant   213
    Change of Basis   214
    Trace   216
    Determinant of an Operator   222
    Determinant of a Matrix   225
    Volume   236
    Exercises   244

Symbol Index   247

Index   249
Preface to the Instructor
|
||
You are probably about to teach a course that will give students their second exposure to linear algebra. During their first brush with the subject, your students probably worked with Euclidean spaces and matrices. In contrast, this course will emphasize abstract vector spaces and linear maps.
|
||
The audacious title of this book deserves an explanation. Almost all linear algebra books use determinants to prove that every linear operator on a finite-dimensional complex vector space has an eigenvalue. Determinants are difficult, nonintuitive, and often defined without motivation. To prove the theorem about existence of eigenvalues on complex vector spaces, most books must define determinants, prove that a linear map is not invertible if and only if its determinant equals 0, and then define the characteristic polynomial. This tortuous (torturous?) path gives students little feeling for why eigenvalues must exist.
|
||
In contrast, the simple determinant-free proofs presented here offer more insight. Once determinants have been banished to the end of the book, a new route opens to the main goal of linear algebra— understanding the structure of linear operators.
|
||
This book starts at the beginning of the subject, with no prerequisites other than the usual demand for suitable mathematical maturity. Even if your students have already seen some of the material in the first few chapters, they may be unaccustomed to working exercises of the type presented here, most of which require an understanding of proofs.
|
||
• Vector spaces are defined in Chapter 1, and their basic properties are developed.
|
||
• Linear independence, span, basis, and dimension are defined in Chapter 2, which presents the basic theory of finite-dimensional vector spaces.
• Linear maps are introduced in Chapter 3. The key result here is that for a linear map T , the dimension of the null space of T plus the dimension of the range of T equals the dimension of the domain of T .
|
||
|
||
• The part of the theory of polynomials that will be needed to understand linear operators is presented in Chapter 4. If you take class time going through the proofs in this chapter (which contains no linear algebra), then you probably will not have time to cover some important aspects of linear algebra. Your students will already be familiar with the theorems about polynomials in this chapter, so you can ask them to read the statements of the results but not the proofs. The curious students will read some of the proofs anyway, which is why they are included in the text.
|
||
|
||
• The idea of studying a linear operator by restricting it to small subspaces leads in Chapter 5 to eigenvectors. The highlight of the chapter is a simple proof that on complex vector spaces, eigenvalues always exist. This result is then used to show that each linear operator on a complex vector space has an upper-triangular matrix with respect to some basis. Similar techniques are used to show that every linear operator on a real vector space has an invariant subspace of dimension 1 or 2. This result is used to prove that every linear operator on an odd-dimensional real vector space has an eigenvalue. All this is done without defining determinants or characteristic polynomials!
|
||
|
||
• Inner-product spaces are defined in Chapter 6, and their basic properties are developed along with standard tools such as orthonormal bases, the Gram-Schmidt procedure, and adjoints. This chapter also shows how orthogonal projections can be used to solve certain minimization problems.
|
||
|
||
• The spectral theorem, which characterizes the linear operators for which there exists an orthonormal basis consisting of eigenvectors, is the highlight of Chapter 7. The work in earlier chapters pays off here with especially simple proofs. This chapter also deals with positive operators, linear isometries, the polar decomposition, and the singular-value decomposition.
|
||
• The minimal polynomial, characteristic polynomial, and generalized eigenvectors are introduced in Chapter 8. The main achievement of this chapter is the description of a linear operator on a complex vector space in terms of its generalized eigenvectors. This description enables one to prove almost all the results usually proved using Jordan form. For example, these tools are used to prove that every invertible linear operator on a complex vector space has a square root. The chapter concludes with a proof that every linear operator on a complex vector space can be put into Jordan form.
|
||
• Linear operators on real vector spaces occupy center stage in Chapter 9. Here two-dimensional invariant subspaces make up for the possible lack of eigenvalues, leading to results analogous to those obtained on complex vector spaces.
|
||
• The trace and determinant are defined in Chapter 10 in terms of the characteristic polynomial (defined earlier without determinants). On complex vector spaces, these definitions can be restated: the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues (both counting multiplicity). These easy-to-remember definitions would not be possible with the traditional approach to eigenvalues because that method uses determinants to prove that eigenvalues exist. The standard theorems about determinants now become much clearer. The polar decomposition and the characterization of self-adjoint operators are used to derive the change of variables formula for multivariable integrals in a fashion that makes the appearance of the determinant there seem natural.
|
||
This book usually develops linear algebra simultaneously for real and complex vector spaces by letting F denote either the real or the complex numbers. Abstract fields could be used instead, but to do so would introduce extra abstraction without leading to any new linear algebra. Another reason for restricting attention to the real and complex numbers is that polynomials can then be thought of as genuine functions instead of the more formal objects needed for polynomials with coefficients in finite fields. Finally, even if the beginning part of the theory were developed with arbitrary fields, inner-product spaces would push consideration back to just real and complex vector spaces.
|
||
Even in a book as short as this one, you cannot expect to cover everything. Going through the first eight chapters is an ambitious goal for a one-semester course. If you must reach Chapter 10, then I suggest covering Chapters 1, 2, and 4 quickly (students may have seen this material in earlier courses) and skipping Chapter 9 (in which case you should discuss trace and determinants only on complex vector spaces).
|
||
A goal more important than teaching any particular set of theorems is to develop in students the ability to understand and manipulate the objects of linear algebra. Mathematics can be learned only by doing; fortunately, linear algebra has many good homework problems. When teaching this course, I usually assign two or three of the exercises each class, due the next class. Going over the homework might take up a third or even half of a typical class.
|
||
A solutions manual for all the exercises is available (without charge) only to instructors who are using this book as a textbook. To obtain the solutions manual, instructors should send an e-mail request to me (or contact Springer if I am no longer around).
|
||
Please check my web site for a list of errata (which I hope will be empty or almost empty) and other information about this book.
|
||
I would greatly appreciate hearing about any errors in this book, even minor ones. I welcome your suggestions for improvements, even tiny ones. Please feel free to contact me.
|
||
Have fun!
|
||
Sheldon Axler Mathematics Department San Francisco State University San Francisco, CA 94132, USA
|
||
e-mail: axler@math.sfsu.edu www home page: http://math.sfsu.edu/axler
|
||
|
||
Preface to the Student
|
||
You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, we will focus on abstract vector spaces and linear maps. These terms will be defined later, so don’t worry if you don’t know what they mean. This book starts from the beginning of the subject, assuming no knowledge of linear algebra. The key point is that you are about to immerse yourself in serious mathematics, with an emphasis on your attaining a deep understanding of the definitions, theorems, and proofs.
|
||
You cannot expect to read mathematics the way you read a novel. If you zip through a page in less than an hour, you are probably going too fast. When you encounter the phrase “as you should verify”, you should indeed do the verification, which will usually require some writing on your part. When steps are left out, you need to supply the missing pieces. You should ponder and internalize each definition. For each theorem, you should seek examples to show why each hypothesis is necessary.
|
||
Please check my web site for a list of errata (which I hope will be empty or almost empty) and other information about this book.
|
||
I would greatly appreciate hearing about any errors in this book, even minor ones. I welcome your suggestions for improvements, even tiny ones.
|
||
Have fun!
|
||
Sheldon Axler Mathematics Department San Francisco State University San Francisco, CA 94132, USA e-mail: axler@math.sfsu.edu www home page: http://math.sfsu.edu/axler
Acknowledgments
|
||
I owe a huge intellectual debt to the many mathematicians who created linear algebra during the last two centuries. In writing this book I tried to think about the best way to present linear algebra and to prove its theorems, without regard to the standard methods and proofs used in most textbooks. Thus I did not consult other books while writing this one, though the memory of many books I had studied in the past surely influenced me. Most of the results in this book belong to the common heritage of mathematics. A special case of a theorem may first have been proved in antiquity (which for linear algebra means the nineteenth century), then slowly sharpened and improved over decades by many mathematicians. Bestowing proper credit on all the contributors would be a difficult task that I have not undertaken. In no case should the reader assume that any theorem presented here represents my original contribution.
|
||
Many people helped make this a better book. For useful suggestions and corrections, I am grateful to William Arveson (for suggesting the proof of 5.13), Marilyn Brouwer, William Brown, Robert Burckel, Paul Cohn, James Dudziak, David Feldman (for suggesting the proof of 8.40), Pamela Gorkin, Aram Harrow, Pan Fong Ho, Dan Kalman, Robert Kantrowitz, Ramana Kappagantu, Mizan Khan, Mikael Lindström, Jacob Plotkin, Elena Poletaeva, Mihaela Poplicher, Richard Potter, Wade Ramey, Marian Robbins, Jonathan Rosenberg, Joan Stamm, Thomas Starbird, Jay Valanju, and Thomas von Foerster.
|
||
Finally, I thank Springer for providing me with help when I needed it and for allowing me the freedom to make the final decisions about the content and appearance of this book.
|
||
|
||
|
||
Chapter 1
|
||
Vector Spaces
|
||
Linear algebra is the study of linear maps on finite-dimensional vector spaces. Eventually we will learn what all these terms mean. In this chapter we will define vector spaces and discuss their elementary properties.
|
||
In some areas of mathematics, including linear algebra, better theorems and more insight emerge if complex numbers are investigated along with real numbers. Thus we begin by introducing the complex numbers and their basic properties.
Complex Numbers
|
||
|
||
The symbol i was first used to denote √−1 by the Swiss mathematician Leonhard Euler in 1777.
|
||
|
||
You should already be familiar with the basic properties of the set R of real numbers. Complex numbers were invented so that we can take square roots of negative numbers. The key idea is to assume we have a square root of −1, denoted i, and manipulate it using the usual rules of arithmetic. Formally, a complex number is an ordered pair (a, b), where a, b ∈ R, but we will write this as a + bi. The set of all complex numbers is denoted by C:
|
||
C = {a + bi : a, b ∈ R}.
|
||
If a ∈ R, we identify a + 0i with the real number a. Thus we can think of R as a subset of C.
|
||
Addition and multiplication on C are defined by
|
||
(a + bi) + (c + di) = (a + c) + (b + d)i, (a + bi)(c + di) = (ac − bd) + (ad + bc)i;
|
||
here a, b, c, d ∈ R. Using multiplication as defined above, you should verify that i2 = −1. Do not memorize the formula for the product of two complex numbers; you can always rederive it by recalling that i2 = −1 and then using the usual rules of arithmetic.
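The arithmetic above is easy to check mechanically. Below is a minimal sketch (not part of the original text; the function names are illustrative only) that encodes a + bi as the pair (a, b) and implements the two definitions, so you can confirm that i2 = −1 really does fall out of the multiplication rule.

    # Sketch only: represent a + bi as the pair (a, b) and use the definitions above.
    def add(w, z):
        (a, b), (c, d) = w, z
        return (a + c, b + d)              # (a + bi) + (c + di) = (a + c) + (b + d)i

    def mul(w, z):
        (a, b), (c, d) = w, z
        return (a*c - b*d, a*d + b*c)      # (a + bi)(c + di) = (ac - bd) + (ad + bc)i

    i = (0, 1)
    print(mul(i, i))                       # (-1, 0), i.e. i^2 = -1
    print(mul((1, 2), (3, 4)))             # (-5, 10), i.e. (1 + 2i)(3 + 4i) = -5 + 10i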
|
||
You should verify, using the familiar properties of the real numbers, that addition and multiplication on C satisfy the following properties:
|
||
commutativity w + z = z + w and wz = zw for all w, z ∈ C;
|
||
associativity (z1 + z2) + z3 = z1 + (z2 + z3) and (z1z2)z3 = z1(z2z3) for all z1, z2, z3 ∈ C;
|
||
identities z + 0 = z and z1 = z for all z ∈ C;
|
||
additive inverse for every z ∈ C, there exists a unique w ∈ C such that z + w = 0;
|
||
multiplicative inverse for every z ∈ C with z ≠ 0, there exists a unique w ∈ C such that zw = 1;
|
||
distributive property λ(w + z) = λw + λz for all λ, w, z ∈ C.
|
||
For z ∈ C, we let −z denote the additive inverse of z. Thus −z is the unique complex number such that
|
||
z + (−z) = 0.
|
||
|
||
Subtraction on C is defined by
|
||
w − z = w + (−z)
|
||
for w, z ∈ C. For z ∈ C with z ≠ 0, we let 1/z denote the multiplicative inverse
|
||
of z. Thus 1/z is the unique complex number such that
|
||
z(1/z) = 1.
|
||
|
||
Division on C is defined by
|
||
w/z = w(1/z)
|
||
for w, z ∈ C with z ≠ 0. So that we can conveniently make definitions and prove theorems
|
||
that apply to both real and complex numbers, we adopt the following notation:
|
||
Throughout this book, F stands for either R or C.
|
||
Thus if we prove a theorem involving F, we will know that it holds when F is replaced with R and when F is replaced with C. Elements of F are called scalars. The word “scalar”, which means number, is often used when we want to emphasize that an object is a number, as opposed to a vector (vectors will be defined soon).
|
||
For z ∈ F and m a positive integer, we define zm to denote the product of z with itself m times:
|
||
zm = z · · · · · z    (m times).
|
||
Clearly (zm)n = zmn and (wz)m = wmzm for all w, z ∈ F and all positive integers m, n.
|
||
|
||
The letter F is used because R and C are examples of what are called fields. In this book we will not need to deal with fields other than R or C. Many of the definitions, theorems, and proofs in linear algebra that work for both R and C also work without change if an arbitrary field replaces R or C.
|
||
Definition of Vector Space
|
||
|
||
Before defining what a vector space is, let’s look at two important examples. The vector space R2, which you can think of as a plane, consists of all ordered pairs of real numbers:
|
||
R2 = {(x, y) : x, y ∈ R}.
|
||
|
||
The vector space R3, which you can think of as ordinary space, consists of all ordered triples of real numbers:
|
||
R3 = {(x, y, z) : x, y, z ∈ R}.
|
||
|
||
Many mathematicians call a list of length n an n-tuple.
|
||
|
||
To generalize R2 and R3 to higher dimensions, we first need to discuss the concept of lists. Suppose n is a nonnegative integer. A list of length n is an ordered collection of n objects (which might be numbers, other lists, or more abstract entities) separated by commas and surrounded by parentheses. A list of length n looks like this:
|
||
(x1, . . . , xn).
|
||
Thus a list of length 2 is an ordered pair and a list of length 3 is an ordered triple. For j ∈ {1, . . . , n}, we say that xj is the jth coordinate of the list above. Thus x1 is called the first coordinate, x2 is called the second coordinate, and so on.
|
||
Sometimes we will use the word list without specifying its length. Remember, however, that by definition each list has a finite length that is a nonnegative integer, so that an object that looks like
|
||
|
||
(x1, x2, . . . ),
|
||
|
||
which might be said to have infinite length, is not a list. A list of length
|
||
0 looks like this: (). We consider such an object to be a list so that
|
||
some of our theorems will not have trivial exceptions.
|
||
Two lists are equal if and only if they have the same length and
|
||
the same coordinates in the same order. In other words, (x1, . . . , xm) equals (y1, . . . , yn) if and only if m = n and x1 = y1, . . . , xm = ym.
|
||
Lists differ from sets in two ways: in lists, order matters and repeti-
|
||
tions are allowed, whereas in sets, order and repetitions are irrelevant. For example, the lists (3, 5) and (5, 3) are not equal, but the sets {3, 5} and {5, 3} are equal. The lists (4, 4) and (4, 4, 4) are not equal (they
do not have the same length), though the sets {4, 4} and {4, 4, 4} both equal the set {4}.
|
||
To define the higher-dimensional analogues of R2 and R3, we will
|
||
simply replace R with F (which equals R or C) and replace the 2 or 3
|
||
with an arbitrary positive integer. Specifically, fix a positive integer n for the rest of this section. We define Fn to be the set of all lists of
|
||
length n consisting of elements of F:
|
||
|
||
Fn = {(x1, . . . , xn) : xj ∈ F for j = 1, . . . , n}.
|
||
|
||
For example, if F = R and n equals 2 or 3, then this definition of Fn agrees with our previous notions of R2 and R3. As another example, C4 is the set of all lists of four complex numbers:
|
||
|
||
C4 = {(z1, z2, z3, z4) : z1, z2, z3, z4 ∈ C}.
|
||
|
||
If n ≥ 4, we cannot easily visualize Rn as a physical object. The same problem arises if we work with complex numbers: C1 can be thought of as a plane, but for n ≥ 2, the human brain cannot provide geometric models of Cn. However, even if n is large, we can perform algebraic manipulations in Fn as easily as in R2 or R3. For example, addition is defined on Fn by adding corresponding coordinates:
|
||
|
||
1.1        (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn).
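As a concrete illustration (a sketch, not from the book; the helper name is made up), coordinatewise addition as in 1.1 is one line of code once an element of Fn is stored as a tuple of numbers:

    # Sketch only: elements of F^n as tuples; addition is coordinatewise (see 1.1).
    def vadd(x, y):
        assert len(x) == len(y)            # both lists must have the same length n
        return tuple(xj + yj for xj, yj in zip(x, y))

    print(vadd((1, 2, 3), (10, 20, 30)))   # (11, 22, 33)
    print(vadd((1 + 2j, 0), (3j, 5)))      # the same definition works over C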
|
||
|
||
Often the mathematics of Fn becomes cleaner if we use a single entity to denote a list of n numbers, without explicitly writing the coordinates. Thus the commutative property of addition on Fn should be expressed as
|
||
x+y =y+x
|
||
for all x, y ∈ Fn, rather than the more cumbersome
|
||
|
||
(x1, . . . , xn) + (y1, . . . , yn) = (y1, . . . , yn) + (x1, . . . , xn)
|
||
|
||
for all x1, . . . , xn, y1, . . . , yn ∈ F (even though the latter formulation is needed to prove commutativity). If a single letter is used to denote an element of Fn, then the same letter, with appropriate subscripts,
|
||
is often used when coordinates must be displayed. For example, if x ∈ Fn, then letting x equal (x1, . . . , xn) is good notation. Even better, work with just x and avoid explicit coordinates, if possible.
|
||
|
||
For an amusing account of how R3 would be perceived by a creature living in R2, read Flatland: A Romance of Many Dimensions, by Edwin A. Abbott. This novel, published in 1884, can help creatures living in three-dimensional space, such as ourselves, imagine a physical space of four or more dimensions.
|
||
We let 0 denote the list of length n all of whose coordinates are 0: 0 = (0, . . . , 0).
|
||
|
||
Note that we are using the symbol 0 in two different ways—on the left side of the equation above, 0 denotes a list of length n, whereas on the right side, each 0 denotes a number. This potentially confusing practice actually causes no problems because the context always makes clear what is intended. For example, consider the statement that 0 is an additive identity for Fn:
|
||
x+0=x
|
||
for all x ∈ Fn. Here 0 must be a list because we have not defined the sum of an element of Fn (namely, x) and the number 0.
|
||
A picture can often aid our intuition. We will draw pictures depicting R2 because we can easily sketch this space on two-dimensional surfaces such as paper and blackboards. A typical element of R2 is a point x = (x1, x2). Sometimes we think of x not as a point but as an arrow starting at the origin and ending at (x1, x2), as in the picture below. When we think of x as an arrow, we refer to it as a vector .
|
||
[Figure: Elements of R2 can be thought of as points or as vectors.]
|
||
The coordinate axes and the explicit coordinates unnecessarily clutter the picture above, and often you will gain better understanding by dispensing with them and just thinking of the vector, as in the next picture.
|
||
[Figure: A vector]
|
||
Whenever we use pictures in R2 or use the somewhat vague language of points and vectors, remember that these are just aids to our understanding, not substitutes for the actual mathematics that we will develop. Though we cannot draw good pictures in high-dimensional spaces, the elements of these spaces are as rigorously defined as elements of R2. For example, (2, −3, 17, π, √2) is an element of R5, and we may casually refer to it as a point in R5 or a vector in R5 without worrying about whether the geometry of R5 has any physical meaning.
|
||
Recall that we defined the sum of two elements of Fn to be the element of Fn obtained by adding corresponding coordinates; see 1.1. In the special case of R2, addition has a simple geometric interpretation. Suppose we have two vectors x and y in R2 that we want to add, as in the left side of the picture below. Move the vector y parallel to itself so that its initial point coincides with the end point of the vector x. The sum x + y then equals the vector whose initial point equals the initial point of x and whose end point equals the end point of the moved vector y, as in the right side of the picture below.
|
||
Mathematical models of the economy often have thousands of variables, say x1, . . . , x5000, which means that we must operate in R5000. Such a space cannot be dealt with geometrically, but the algebraic approach works well. That's why our subject is called linear algebra.

[Figure: The sum of two vectors]
|
||
Our treatment of the vector y in the picture above illustrates a standard philosophy when we think of vectors in R2 as arrows: we can move an arrow parallel to itself (not changing its length or direction) and still think of it as the same vector.
|
||
In scalar multiplication, we multiply together a scalar and a vector, getting a vector. You may be familiar with the dot product in R2 or R3, in which we multiply together two vectors and obtain a scalar. Generalizations of the dot product will become important when we study inner products in Chapter 6. You may also be familiar with the cross product in R3, in which we multiply together two vectors and obtain another vector. No useful generalization of this type of multiplication exists in higher dimensions.
|
||
|
||
Having dealt with addition in Fn, we now turn to multiplication. We could define a multiplication on Fn in a similar fashion, starting with two elements of Fn and getting another element of Fn by multiplying corresponding coordinates. Experience shows that this definition is not useful for our purposes. Another type of multiplication, called scalar
|
||
|
||
multiplication, will be central to our subject. Specifically, we need to
|
||
|
||
define what it means to multiply an element of Fn by an element of F.
|
||
|
||
We make the obvious definition, performing the multiplication in each
|
||
|
||
coordinate:
|
||
|
||
a(x1, . . . , xn) = (ax1, . . . , axn);
|
||
|
||
here a ∈ F and (x1, . . . , xn) ∈ Fn. Scalar multiplication has a nice geometric interpretation in R2. If
|
||
a is a positive number and x is a vector in R2, then ax is the vector
|
||
|
||
that points in the same direction as x and whose length is a times the
|
||
|
||
length of x. In other words, to get ax, we shrink or stretch x by a factor of a, depending upon whether a < 1 or a > 1. The next picture illustrates this point.
|
||
[Figure: Multiplication by positive scalars]
|
||
If a is a negative number and x is a vector in R2, then ax is the vector that points in the opposite direction as x and whose length is |a| times the length of x, as illustrated in the next picture.
|
||
[Figure: Multiplication by negative scalars]
|
||
|
||
The motivation for the definition of a vector space comes from the important properties possessed by addition and scalar multiplication on Fn. Specifically, addition on Fn is commutative and associative and has an identity, namely, 0. Every element has an additive inverse. Scalar multiplication on Fn is associative, and scalar multiplication by 1 acts as a multiplicative identity should. Finally, addition and scalar multiplication on Fn are connected by distributive properties.
|
||
We will define a vector space to be a set V along with an addition and a scalar multiplication on V that satisfy the properties discussed in the previous paragraph. By an addition on V we mean a function that assigns an element u + v ∈ V to each pair of elements u, v ∈ V . By a scalar multiplication on V we mean a function that assigns an element av ∈ V to each a ∈ F and each v ∈ V .
|
||
Now we are ready to give the formal definition of a vector space. A vector space is a set V along with an addition on V and a scalar multiplication on V such that the following properties hold:
|
||
commutativity u + v = v + u for all u, v ∈ V ;
|
||
associativity (u + v) + w = u + (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and all a, b ∈ F;
|
||
additive identity there exists an element 0 ∈ V such that v + 0 = v for all v ∈ V ;
|
||
additive inverse for every v ∈ V , there exists w ∈ V such that v + w = 0;
|
||
multiplicative identity 1v = v for all v ∈ V ;
|
||
distributive properties a(u + v) = au + av and (a + b)u = au + bu for all a, b ∈ F and all u, v ∈ V .
|
||
The scalar multiplication in a vector space depends upon F. Thus when we need to be precise, we will say that V is a vector space over F instead of saying simply that V is a vector space. For example, Rn is a vector space over R, and Cn is a vector space over C. Frequently, a vector space over R is called a real vector space and a vector space over
C is called a complex vector space. Usually the choice of F is either obvious from the context or irrelevant, and thus we often assume that F is lurking in the background without specifically mentioning it.

The simplest vector space contains only one point. In other words, {0} is a vector space, though not a very interesting one.

Though Fn is our crucial example of a vector space, not all vector spaces consist of lists. For example, the elements of P(F) consist of functions on F, not lists. In general, a vector space is an abstract entity whose elements might be lists, functions, or weird objects.
|
||
Elements of a vector space are called vectors or points. This geometric language sometimes aids our intuition.
|
||
Not surprisingly, Fn is a vector space over F, as you should verify. Of course, this example motivated our definition of vector space.
|
||
For another example, consider F∞, which is defined to be the set of all sequences of elements of F:
|
||
F∞ = {(x1, x2, . . . ) : xj ∈ F for j = 1, 2, . . . }.
|
||
Addition and scalar multiplication on F∞ are defined as expected:
|
||
(x1, x2, . . . ) + (y1, y2, . . . ) = (x1 + y1, x2 + y2, . . . ), a(x1, x2, . . . ) = (ax1, ax2, . . . ).
|
||
With these definitions, F∞ becomes a vector space over F, as you should verify. The additive identity in this vector space is the sequence consisting of all 0’s.
|
||
Our next example of a vector space involves polynomials. A function p : F → F is called a polynomial with coefficients in F if there exist a0, . . . , am ∈ F such that
|
||
p(z) = a0 + a1z + a2z2 + · · · + amzm
|
||
for all z ∈ F. We define P(F) to be the set of all polynomials with coefficients in F. Addition on P(F) is defined as you would expect: if p, q ∈ P(F), then p + q is the polynomial defined by
|
||
(p + q)(z) = p(z) + q(z)
|
||
for z ∈ F. For example, if p is the polynomial defined by p(z) = 2z +z3 and q is the polynomial defined by q(z) = 7 + 4z, then p + q is the polynomial defined by (p + q)(z) = 7 + 6z + z3. Scalar multiplication on P(F) also has the obvious definition: if a ∈ F and p ∈ P(F), then ap is the polynomial defined by
|
||
(ap)(z) = ap(z)
|
||
for z ∈ F. With these definitions of addition and scalar multiplication, P(F) is a vector space, as you should verify. The additive identity in this vector space is the polynomial all of whose coefficients equal 0.
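A concrete way to experiment with P(F) (a sketch, not from the book; the coefficient-list representation and the function names are illustrative assumptions) is to store a polynomial by its coefficients a0, a1, . . . , am; addition and scalar multiplication then act directly on the coefficient lists, reproducing the example (p + q)(z) = 7 + 6z + z3 above.

    # Sketch only: p(z) = a0 + a1 z + ... + am z^m stored as the list [a0, a1, ..., am].
    def poly_add(p, q):
        n = max(len(p), len(q))
        p = p + [0] * (n - len(p))         # pad the shorter list with zero coefficients
        q = q + [0] * (n - len(q))
        return [pj + qj for pj, qj in zip(p, q)]

    def poly_scale(a, p):
        return [a * pj for pj in p]

    p = [0, 2, 0, 1]                       # p(z) = 2z + z^3
    q = [7, 4]                             # q(z) = 7 + 4z
    print(poly_add(p, q))                  # [7, 6, 0, 1], i.e. (p + q)(z) = 7 + 6z + z^3
    print(poly_scale(3, q))                # [21, 12], i.e. (3q)(z) = 21 + 12z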
|
||
Soon we will see further examples of vector spaces, but first we need to develop some of the elementary properties of vector spaces.
|
||
Properties of Vector Spaces
|
||
|
||
The definition of a vector space requires that it have an additive identity. The proposition below states that this identity is unique.
|
||
|
||
1.2 Proposition: A vector space has a unique additive identity.
|
||
|
||
Proof: Suppose 0 and 0′ are both additive identities for some vector space V . Then

0′ = 0′ + 0 = 0,

where the first equality holds because 0 is an additive identity and the second equality holds because 0′ is an additive identity. Thus 0′ = 0, proving that V has only one additive identity.
|
||
Each element v in a vector space has an additive inverse, an element w in the vector space such that v + w = 0. The next proposition shows that each element in a vector space has only one additive inverse.
|
||
|
||
The symbol ∎ means "end of the proof".
|
||
|
||
1.3 Proposition: Every element in a vector space has a unique additive inverse.
|
||
Proof: Suppose V is a vector space. Let v ∈ V . Suppose that w and w′ are additive inverses of v. Then

w = w + 0 = w + (v + w′) = (w + v) + w′ = 0 + w′ = w′.

Thus w = w′, as desired.
|
||
Because additive inverses are unique, we can let −v denote the additive inverse of a vector v. We define w − v to mean w + (−v).
|
||
Almost all the results in this book will involve some vector space. To avoid being distracted by having to restate frequently something such as “Assume that V is a vector space”, we now make the necessary declaration once and for all:
|
||
Let’s agree that for the rest of the book V will denote a vector space over F.
|
||
Note that 1.4 and 1.5 assert something about scalar multiplication and the additive identity of V . The only part of the definition of a vector space that connects scalar multiplication and vector addition is the distributive property. Thus the distributive property must be used in the proofs.
|
||
|
||
Because of associativity, we can dispense with parentheses when dealing with additions involving more than two elements in a vector space. For example, we can write u+v+w without parentheses because the two possible interpretations of that expression, namely, (u+v)+w and u + (v + w), are equal. We first use this familiar convention of not using parentheses in the next proof. In the next proposition, 0 denotes a scalar (the number 0 ∈ F) on the left side of the equation and a vector (the additive identity of V ) on the right side of the equation.
|
||
1.4 Proposition: 0v = 0 for every v ∈ V .
|
||
Proof: For v ∈ V , we have 0v = (0 + 0)v = 0v + 0v.
|
||
Adding the additive inverse of 0v to both sides of the equation above gives 0 = 0v, as desired.
|
||
In the next proposition, 0 denotes the additive identity of V . Though their proofs are similar, 1.4 and 1.5 are not identical. More precisely, 1.4 states that the product of the scalar 0 and any vector equals the vector 0, whereas 1.5 states that the product of any scalar and the vector 0 equals the vector 0.
|
||
1.5 Proposition: a0 = 0 for every a ∈ F.
|
||
Proof: For a ∈ F, we have a0 = a(0 + 0) = a0 + a0.
|
||
Adding the additive inverse of a0 to both sides of the equation above gives 0 = a0, as desired.
|
||
Now we show that if an element of V is multiplied by the scalar −1, then the result is the additive inverse of the element of V .
|
||
1.6 Proposition: (−1)v = −v for every v ∈ V .
|
||
Proof: For v ∈ V , we have v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.
|
||
This equation says that (−1)v, when added to v, gives 0. Thus (−1)v must be the additive inverse of v, as desired.
|
||
Subspaces
|
||
|
||
A subset U of V is called a subspace of V if U is also a vector space (using the same addition and scalar multiplication as on V ). For example,
|
||
{(x1, x2, 0) : x1, x2 ∈ F}
|
||
is a subspace of F3. If U is a subset of V , then to check that U is a subspace of V we
|
||
need only check that U satisfies the following:
|
||
|
||
additive identity 0∈U
|
||
|
||
closed under addition u, v ∈ U implies u + v ∈ U;
|
||
|
||
closed under scalar multiplication a ∈ F and u ∈ U implies au ∈ U.
|
||
|
||
The first condition insures that the additive identity of V is in U. The
|
||
|
||
second condition insures that addition makes sense on U . The third
|
||
|
||
condition insures that scalar multiplication makes sense on U . To show
|
||
|
||
that U is a vector space, the other parts of the definition of a vector
|
||
|
||
space do not need to be checked because they are automatically satis-
|
||
|
||
fied. For example, the associative and commutative properties of addi-
|
||
|
||
tion automatically hold on U because they hold on the larger space V . As another example, if the third condition above holds and u ∈ U , then −u (which equals (−1)u by 1.6) is also in U, and hence every element
|
||
|
||
of U has an additive inverse in U.
|
||
|
||
The three conditions above usually enable us to determine quickly whether a given subset of V is a subspace of V . For example, if b ∈ F,
|
||
|
||
then
|
||
|
||
{(x1, x2, x3, x4) ∈ F4 : x3 = 5x4 + b}
|
||
|
||
is a subspace of F4 if and only if b = 0, as you should verify. As another
|
||
|
||
example, you should verify that
|
||
|
||
{p ∈ P(F) : p(3) = 0}
|
||
|
||
is a subspace of P(F). The subspaces of R2 are precisely {0}, R2, and all lines in R2 through
|
||
the origin. The subspaces of R3 are precisely {0}, R3, all lines in R3 through the origin, and all planes in R3 through the origin. To prove that all these objects are indeed subspaces is easy—the hard part is to show that they are the only subspaces of R2 or R3. That task will be easier after we introduce some additional tools in the next chapter.

Some mathematicians use the term linear subspace, which means the same as subspace.

Clearly {0} is the smallest subspace of V and V itself is the largest subspace of V . The empty set is not a subspace of V because a subspace must be a vector space and a vector space must contain at least one element, namely, an additive identity.
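The following small numerical sanity check (a sketch, not part of the text; the sampling approach and names are my own) tests the three subspace conditions on the set {(x1, x2, x3, x4) ∈ F4 : x3 = 5x4 + b} discussed above, and it flags a failure as soon as b ≠ 0, starting with the additive-identity condition.

    # Sketch only: spot-check the three subspace conditions on a few sample vectors.
    def in_U(v, b):
        x1, x2, x3, x4 = v
        return x3 == 5 * x4 + b

    def looks_like_subspace(b):
        zero_ok = in_U((0, 0, 0, 0), b)                            # additive identity
        u, v = (1, 2, 5 + b, 1), (0, 1, 10 + b, 2)                 # two elements of the set
        sum_ok = in_U(tuple(ui + vi for ui, vi in zip(u, v)), b)   # closed under addition?
        scale_ok = in_U(tuple(3 * ui for ui in u), b)              # closed under scalar mult.?
        return zero_ok and sum_ok and scale_ok

    print(looks_like_subspace(0))   # True:  all three conditions hold when b = 0
    print(looks_like_subspace(7))   # False: already 0 is not in the set when b = 7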
|
||
|
||
Sums and Direct Sums
|
||
|
||
When dealing with vector spaces, we are usually interested only in subspaces, as opposed to arbitrary subsets. The union of subspaces is rarely a subspace (see Exercise 9 in this chapter), which is why we usually work with sums rather than unions.
|
||
|
||
In later chapters, we will find that the notions of vector space sums and direct sums are useful. We define these concepts here.
|
||
Suppose U1, . . . , Um are subspaces of V . The sum of U1, . . . , Um, denoted U1 + · · · + Um, is defined to be the set of all possible sums of elements of U1, . . . , Um. More precisely,
|
||
U1 + · · · + Um = {u1 + · · · + um : u1 ∈ U1, . . . , um ∈ Um}.
|
||
You should verify that if U1, . . . , Um are subspaces of V , then the sum U1 + · · · + Um is a subspace of V .
|
||
Let’s look at some examples of sums of subspaces. Suppose U is the set of all elements of F3 whose second and third coordinates equal 0, and W is the set of all elements of F3 whose first and third coordinates equal 0:
|
||
U = {(x, 0, 0) ∈ F3 : x ∈ F} and W = {(0, y, 0) ∈ F3 : y ∈ F}.
|
||
|
||
Sums of subspaces in the theory of vector spaces are analogous to unions of subsets in set theory. Given two subspaces of a vector space, the smallest subspace containing them is their sum. Analogously, given two subsets of a set, the smallest subset containing them is their union.
|
||
|
||
Then

1.7        U + W = {(x, y, 0) : x, y ∈ F},
|
||
|
||
as you should verify. As another example, suppose U is as above and W is the set of all
|
||
elements of F3 whose first and second coordinates equal each other and whose third coordinate equals 0:
|
||
W = {(y, y, 0) ∈ F3 : y ∈ F}.
|
||
Then U + W is also given by 1.7, as you should verify. Suppose U1, . . . , Um are subspaces of V . Clearly U1, . . . , Um are all
|
||
contained in U1 + · · · + Um (to see this, consider sums u1 + · · · + um where all except one of the u’s are 0). Conversely, any subspace of V containing U1, . . . , Um must contain U1 + · · · + Um (because subspaces
must contain all finite sums of their elements). Thus U1 + · · · + Um is the smallest subspace of V containing U1, . . . , Um.
|
||
Suppose U1, . . . , Um are subspaces of V such that V = U1 +· · ·+Um. Thus every element of V can be written in the form
|
||
u1 + · · · + um,
|
||
where each uj ∈ Uj. We will be especially interested in cases where each vector in V can be uniquely represented in the form above. This situation is so important that we give it a special name: direct sum. Specifically, we say that V is the direct sum of subspaces U1, . . . , Um, written V = U1 ⊕ · · · ⊕ Um, if each element of V can be written uniquely as a sum u1 + · · · + um, where each uj ∈ Uj.
|
||
Let’s look at some examples of direct sums. Suppose U is the subspace of F3 consisting of those vectors whose last coordinate equals 0, and W is the subspace of F3 consisting of those vectors whose first two coordinates equal 0:
|
||
U = {(x, y, 0) ∈ F3 : x, y ∈ F} and W = {(0, 0, z) ∈ F3 : z ∈ F}.
|
||
Then F3 = U ⊕ W , as you should verify. As another example, suppose Uj is the subspace of Fn consisting
|
||
of those vectors whose coordinates are all 0, except possibly in the jth slot (for example, U2 = {(0, x, 0, . . . , 0) ∈ Fn : x ∈ F}). Then
|
||
Fn = U1 ⊕ · · · ⊕ Un,
|
||
as you should verify. As a final example, consider the vector space P(F) of all polynomials
|
||
with coefficients in F. Let Ue denote the subspace of P(F) consisting of all polynomials p of the form
|
||
p(z) = a0 + a2z2 + · · · + a2mz2m,
|
||
and let Uo denote the subspace of P(F) consisting of all polynomials p of the form
|
||
p(z) = a1z + a3z3 + · · · + a2m+1z2m+1;
|
||
here m is a nonnegative integer and a0, . . . , a2m+1 ∈ F (the notations Ue and Uo should remind you of even and odd powers of z). You should verify that
|
||
P(F) = Ue ⊕ Uo.

The symbol ⊕, consisting of a plus sign inside a circle, is used to denote direct sums as a reminder that we are dealing with a special type of sum of subspaces—each element in the direct sum can be represented only one way as a sum of elements from the specified subspaces.
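The decomposition P(F) = Ue ⊕ Uo can be made concrete (a sketch, not from the book; the coefficient-list convention is an assumption used only for illustration): splitting a coefficient list into its even-indexed and odd-indexed parts produces the unique way of writing p as an element of Ue plus an element of Uo.

    # Sketch only: split p(z) = a0 + a1 z + ... + am z^m, stored as [a0, ..., am],
    # into its even part (in Ue) and its odd part (in Uo).
    def even_odd_parts(p):
        even = [aj if j % 2 == 0 else 0 for j, aj in enumerate(p)]
        odd  = [aj if j % 2 == 1 else 0 for j, aj in enumerate(p)]
        return even, odd

    p = [5, -1, 0, 2, 7]                   # p(z) = 5 - z + 2z^3 + 7z^4
    even, odd = even_odd_parts(p)
    print(even)                            # [5, 0, 0, 0, 7]   (element of Ue)
    print(odd)                             # [0, -1, 0, 2, 0]  (element of Uo)
    print([e + o for e, o in zip(even, odd)] == p)   # True: the two parts sum back to p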
|
||
Sometimes nonexamples add to our understanding as much as examples. Consider the following three subspaces of F3:
|
||
U1 = {(x, y, 0) ∈ F3 : x, y ∈ F}; U2 = {(0, 0, z) ∈ F3 : z ∈ F}; U3 = {(0, y, y) ∈ F3 : y ∈ F}.
|
||
Clearly F3 = U1 + U2 + U3 because an arbitrary vector (x, y, z) ∈ F3 can be written as
|
||
(x, y, z) = (x, y, 0) + (0, 0, z) + (0, 0, 0),
|
||
where the first vector on the right side is in U1, the second vector is in U2, and the third vector is in U3. However, F3 does not equal the direct sum of U1, U2, U3 because the vector (0, 0, 0) can be written in two different ways as a sum u1+u2+u3, with each uj ∈ Uj. Specifically, we have
|
||
(0, 0, 0) = (0, 1, 0) + (0, 0, 1) + (0, −1, −1)
|
||
and, of course,
|
||
(0, 0, 0) = (0, 0, 0) + (0, 0, 0) + (0, 0, 0),
|
||
where the first vector on the right side of each equation above is in U1, the second vector is in U2, and the third vector is in U3.
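A two-line check (a sketch, not part of the text) confirms that the first sum above really does collapse to (0, 0, 0), so 0 has two different representations and the sum U1 + U2 + U3 is not direct.

    # Sketch only: the three terms below add up to (0, 0, 0).
    terms = [(0, 1, 0), (0, 0, 1), (0, -1, -1)]
    print(tuple(sum(column) for column in zip(*terms)))   # (0, 0, 0)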
|
||
In the example above, we showed that something is not a direct sum by showing that 0 does not have a unique representation as a sum of appropriate vectors. The definition of direct sum requires that every vector in the space have a unique representation as an appropriate sum. Suppose we have a collection of subspaces whose sum equals the whole space. The next proposition shows that when deciding whether this collection of subspaces is a direct sum, we need only consider whether 0 can be uniquely written as an appropriate sum.
|
||
|
||
1.8 Proposition: Suppose that U1, . . . , Un are subspaces of V . Then V = U1 ⊕ · · · ⊕ Un if and only if both the following conditions hold:
|
||
(a) V = U1 + · · · + Un; (b) the only way to write 0 as a sum u1 + · · · + un, where each
|
||
uj ∈ Uj, is by taking all the uj’s equal to 0.
|
||
Proof: First suppose that V = U1 ⊕ · · · ⊕ Un. Clearly (a) holds (because of how sum and direct sum are defined). To prove (b), suppose that u1 ∈ U1, . . . , un ∈ Un and
|
||
0 = u1 + · · · + un.
|
||
Then each uj must be 0 (this follows from the uniqueness part of the definition of direct sum because 0 = 0+· · ·+0 and 0 ∈ U1, . . . , 0 ∈ Un), proving (b).
|
||
Now suppose that (a) and (b) hold. Let v ∈ V . By (a), we can write
|
||
v = u1 + · · · + un
|
||
for some u1 ∈ U1, . . . , un ∈ Un. To show that this representation is unique, suppose that we also have
|
||
v = v1 + · · · + vn,
|
||
where v1 ∈ U1, . . . , vn ∈ Un. Subtracting these two equations, we have
|
||
0 = (u1 − v1) + · · · + (un − vn).
|
||
Clearly u1 − v1 ∈ U1, . . . , un − vn ∈ Un, so the equation above and (b) imply that each uj − vj = 0. Thus u1 = v1, . . . , un = vn, as desired.
|
||
|
||
The next proposition gives a simple condition for testing which pairs of subspaces give a direct sum. Note that this proposition deals only with the case of two subspaces. When asking about a possible direct sum with more than two subspaces, it is not enough to test that any two of the subspaces intersect only at 0. To see this, consider the nonexample presented just before 1.8. In that nonexample, we had F3 = U1 + U2 + U3, but F3 did not equal the direct sum of U1, U2, U3. However, in that nonexample, we have U1∩U2 = U1∩U3 = U2∩U3 = {0} (as you should verify). The next proposition shows that with just two subspaces we get a nice necessary and sufficient condition for a direct sum.
|
||
Sums of subspaces are analogous to unions of subsets. Similarly, direct sums of subspaces are analogous to disjoint unions of subsets. No two subspaces of a vector space can be disjoint because both must contain 0. So disjointness is replaced, at least in the case of two subspaces, with the requirement that the intersection equals {0}.

1.9 Proposition: Suppose that U and W are subspaces of V . Then V = U ⊕ W if and only if V = U + W and U ∩ W = {0}.

Proof: First suppose that V = U ⊕ W . Then V = U + W (by the definition of direct sum). Also, if v ∈ U ∩ W , then 0 = v + (−v), where
v ∈ U and −v ∈ W . By the unique representation of 0 as the sum of a vector in U and a vector in W , we must have v = 0. Thus U ∩ W = {0}, completing the proof in one direction.
|
||
To prove the other direction, now suppose that V = U + W and U ∩ W = {0}. To prove that V = U ⊕ W , suppose that
|
||
0 = u + w,
|
||
where u ∈ U and w ∈ W . To complete the proof, we need only show that u = w = 0 (by 1.8). The equation above implies that u = −w ∈ W . Thus u ∈ U ∩ W , and hence u = 0. This, along with equation above, implies that w = 0, completing the proof.
|
||
Exercises
|
||
|
||
1. Suppose a and b are real numbers, not both 0. Find real numbers
|
||
c and d such that 1/(a + bi) = c + di.
|
||
|
||
2. Show that (−1 + √3 i)/2 is a cube root of 1 (meaning that its cube equals 1).
|
||
|
||
3. Prove that −(−v) = v for every v ∈ V .
|
||
|
||
4. Prove that if a ∈ F, v ∈ V , and av = 0, then a = 0 or v = 0.
|
||
|
||
5. For each of the following subsets of F3, determine whether it is a subspace of F3:
|
||
|
||
(a) {(x1, x2, x3) ∈ F3 : x1 + 2x2 + 3x3 = 0}; (b) {(x1, x2, x3) ∈ F3 : x1 + 2x2 + 3x3 = 4}; (c) {(x1, x2, x3) ∈ F3 : x1x2x3 = 0}; (d) {(x1, x2, x3) ∈ F3 : x1 = 5x3}.
|
||
|
||
6. Give an example of a nonempty subset U of R2 such that U is closed under addition and under taking additive inverses (meaning −u ∈ U whenever u ∈ U ), but U is not a subspace of R2.
|
||
7. Give an example of a nonempty subset U of R2 such that U is closed under scalar multiplication, but U is not a subspace of R2.
|
||
|
||
8. Prove that the intersection of any collection of subspaces of V is a subspace of V .
|
||
|
||
9. Prove that the union of two subspaces of V is a subspace of V if and only if one of the subspaces is contained in the other.
|
||
10. Suppose that U is a subspace of V . What is U + U ?
|
||
|
||
11. Is the operation of addition on the subspaces of V commutative?
|
||
Associative? (In other words, if U1, U2, U3 are subspaces of V , is U1 + U2 = U2 + U1? Is (U1 + U2) + U3 = U1 + (U2 + U3)?)
|
||
|
||
12. Does the operation of addition on the subspaces of V have an additive identity? Which subspaces have additive inverses?
|
||
|
||
13. Prove or give a counterexample: if U1, U2, W are subspaces of V such that U1 + W = U2 + W ,
|
||
then U1 = U2.
|
||
|
||
14. Suppose U is the subspace of P(F) consisting of all polynomials
|
||
|
||
p of the form
|
||
|
||
p(z) = az2 + bz5,
|
||
|
||
where a, b ∈ F. Find a subspace W of P(F) such that P(F) = U ⊕W.
|
||
|
||
15. Prove or give a counterexample: if U1, U2, W are subspaces of V such that V = U1 ⊕ W and V = U2 ⊕ W ,
|
||
then U1 = U2.
|
||
|
||
Chapter 2
|
||
Finite-Dimensional Vector Spaces
|
||
In the last chapter we learned about vector spaces. Linear algebra focuses not on arbitrary vector spaces, but on finite-dimensional vector spaces, which we introduce in this chapter. Here we will deal with the key concepts associated with these spaces: span, linear independence, basis, and dimension.
|
||
Let’s review our standing assumptions: Recall that F denotes R or C.
|
||
Recall also that V is a vector space over F.
|
||
Span and Linear Independence
|
||
|
||
Some mathematicians use the term linear span, which means the same as span. Recall that by definition every list has finite length.
|
||
|
||
A linear combination of a list (v1, . . . , vm) of vectors in V is a vector of the form
|
||
|
||
2.1
|
||
|
||
a1v1 + · · · + amvm,
|
||
|
||
where a1, . . . , am ∈ F. The set of all linear combinations of (v1, . . . , vm) is called the span of (v1, . . . , vm), denoted span(v1, . . . , vm). In other words,
|
||
|
||
span(v1, . . . , vm) = {a1v1 + · · · + amvm : a1, . . . , am ∈ F}.
|
||
As an example of these concepts, suppose V = F3. The vector (7, 2, 9) is a linear combination of (2, 1, 3), (1, 0, 1) because
|
||
|
||
(7, 2, 9) = 2(2, 1, 3) + 3(1, 0, 1).
|
||
|
||
Thus (7, 2, 9) ∈ span((2, 1, 3), (1, 0, 1)). You should verify that the span of any list of vectors in V is a subspace of V . To be consistent, we declare that the span of the empty list () equals {0} (recall that the empty set is not a subspace of V ).
|
||
If (v1, . . . , vm) is a list of vectors in V , then each vj is a linear combination of (v1, . . . , vm) (to show this, set aj = 1 and let the other a’s in 2.1 equal 0). Thus span(v1, . . . , vm) contains each vj. Conversely, because subspaces are closed under scalar multiplication and addition, every subspace of V containing each vj must contain span(v1, . . . , vm). Thus the span of a list of vectors in V is the smallest subspace of V containing all the vectors in the list.
|
||
If span(v1, . . . , vm) equals V , we say that (v1, . . . , vm) spans V . A vector space is called finite dimensional if some list of vectors in it spans the space. For example, Fn is finite dimensional because
|
||
|
||
(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)
|
||
|
||
spans Fn, as you should verify.
|
||
Before giving the next example of a finite-dimensional vector space, we need to define the degree of a polynomial. A polynomial p ∈ P(F) is said to have degree m if there exist scalars a0, a1, . . . , am ∈ F with am ≠ 0 such that
|
||
|
||
2.2
|
||
|
||
p(z) = a0 + a1z + · · · + amzm
|
||
|
||
for all z ∈ F. The polynomial that is identically 0 is said to have degree −∞.
|
||
For m a nonnegative integer, let Pm(F) denote the set of all polynomials with coefficients in F and degree at most m. You should verify that Pm(F) is a subspace of P(F); hence Pm(F) is a vector space. This vector space is finite dimensional because it is spanned by the list (1, z, . . . , zm); here we are slightly abusing notation by letting zk denote a function (so z is a dummy variable).
|
||
A vector space that is not finite dimensional is called infinite dimensional. For example, P(F) is infinite dimensional. To prove this, consider any list of elements of P(F). Let m denote the highest degree of any of the polynomials in the list under consideration (recall that by definition a list has finite length). Then every polynomial in the span of this list must have degree at most m. Thus our list cannot span P(F). Because no list spans P(F), this vector space is infinite dimensional.
|
||
The vector space F∞, consisting of all sequences of elements of F, is also infinite dimensional, though this is a bit harder to prove. You should be able to give a proof by using some of the tools we will soon develop.
|
||
Suppose v1, . . . , vm ∈ V and v ∈ span(v1, . . . , vm). By the definition of span, there exist a1, . . . , am ∈ F such that
|
||
v = a1v1 + · · · + amvm.
|
||
Consider the question of whether the choice of a’s in the equation above is unique. Suppose aˆ1, . . . , aˆm is another set of scalars such that
|
||
v = aˆ1v1 + · · · + aˆmvm.
|
||
Subtracting the last two equations, we have
|
||
0 = (a1 − aˆ1)v1 + · · · + (am − aˆm)vm.
|
||
Thus we have written 0 as a linear combination of (v1, . . . , vm). If the only way to do this is the obvious way (using 0 for all scalars), then each aj − aˆj equals 0, which means that each aj equals aˆj (and thus the choice of a’s was indeed unique). This situation is so important that we give it a special name—linear independence—which we now define.
|
||
A list (v1, . . . , vm) of vectors in V is called linearly independent if the only choice of a1, . . . , am ∈ F that makes a1v1 + · · · + amvm equal 0 is a1 = · · · = am = 0. For example,
|
||
|
||
Infinite-dimensional vector spaces, which we will not mention much anymore, are the center of attention in the branch of mathematics called functional analysis. Functional analysis uses tools from both analysis and algebra.
|
||
|
||
Most linear algebra texts define linearly independent sets instead of linearly independent lists. With that definition, the set {(0, 1), (0, 1), (1, 0)} is linearly independent in F2 because it equals the set {(0, 1), (1, 0)}. With our definition, the list (0, 1), (0, 1), (1, 0) is not linearly independent (because 1 times the first vector plus −1 times the second vector plus 0 times the third vector equals 0). By dealing with lists instead of sets, we will avoid some problems associated with the usual approach.
|
||
|
||
(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
|
||
|
||
is linearly independent in F4, as you should verify. The reasoning in the previous paragraph shows that (v1, . . . , vm) is linearly independent if and only if each vector in span(v1, . . . , vm) has only one representation as a linear combination of (v1, . . . , vm).
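As a computational aside (not from the text): for vectors in Fn, linear independence can be tested numerically, since a list is linearly independent exactly when the matrix whose columns are those vectors has rank equal to the length of the list. The sketch below assumes Python with NumPy; the function name is ours, and the rank test is subject to the usual floating-point tolerance.

import numpy as np

def is_linearly_independent(vectors):
    # The list is linearly independent exactly when the matrix whose
    # columns are the given vectors has rank equal to the list's length.
    A = np.column_stack([np.asarray(v, dtype=float) for v in vectors])
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]))  # True
print(is_linearly_independent([(0, 1), (0, 1), (1, 0)]))                    # False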
|
||
For another example of a linearly independent list, fix a nonnegative integer m. Then (1, z, . . . , zm) is linearly independent in P(F). To verify this, suppose that a0, a1, . . . , am ∈ F are such that
|
||
|
||
2.3
|
||
|
||
a0 + a1z + · · · + amzm = 0
|
||
|
||
for every z ∈ F. If at least one of the coefficients a0, a1, . . . , am were nonzero, then 2.3 could be satisfied by at most m distinct values of z (if you are unfamiliar with this fact, just believe it for now; we will prove it in Chapter 4); this contradiction shows that all the coefficients in 2.3 equal 0. Hence (1, z, . . . , zm) is linearly independent, as claimed.
|
||
A list of vectors in V is called linearly dependent if it is not linearly independent. In other words, a list (v1, . . . , vm) of vectors in V is linearly dependent if there exist a1, . . . , am ∈ F, not all 0, such that a1v1 + · · · + amvm = 0. For example, (2, 3, 1), (1, −1, 2), (7, 3, 8) is linearly dependent in F3 because
|
||
|
||
2(2, 3, 1) + 3(1, −1, 2) + (−1)(7, 3, 8) = (0, 0, 0).
|
||
|
||
As another example, any list of vectors containing the 0 vector is linearly dependent (why?).
|
||
You should verify that a list (v) of length 1 is linearly independent if and only if v ≠ 0. You should also verify that a list of length 2 is linearly independent if and only if neither vector is a scalar multiple of the other. Caution: a list of length three or more may be linearly dependent even though no vector in the list is a scalar multiple of any other vector in the list, as shown by the example in the previous paragraph.
|
||
If some vectors are removed from a linearly independent list, the remaining list is also linearly independent, as you should verify. To allow this to remain true even if we remove all the vectors, we declare the empty list () to be linearly independent.
|
||
The lemma below will often be useful. It states that given a linearly dependent list of vectors, with the first vector not zero, one of the vectors is in the span of the previous ones and furthermore we can throw out that vector without changing the span of the original list.
|
||
|
||
2.4 Linear Dependence Lemma: If (v1, . . . , vm) is linearly dependent in V and v1 ≠ 0, then there exists j ∈ {2, . . . , m} such that the following hold:
|
||
(a) vj ∈ span(v1, . . . , vj−1);
|
||
(b) if the jth term is removed from (v1, . . . , vm), the span of the remaining list equals span(v1, . . . , vm).
|
||
|
||
Proof: Suppose (v1, . . . , vm) is linearly dependent in V and v1 ≠ 0. Then there exist a1, . . . , am ∈ F, not all 0, such that
|
||
|
||
a1v1 + · · · + amvm = 0.
|
||
|
||
Not all of a2, a3, . . . , am can be 0 (because v1 ≠ 0). Let j be the largest element of {2, . . . , m} such that aj ≠ 0. Then
|
||
|
||
2.5    vj = −(a1/aj)v1 − · · · − (aj−1/aj)vj−1,

proving (a).
|
||
To prove (b), suppose that u ∈ span(v1, . . . , vm). Then there exist c1, . . . , cm ∈ F such that
|
||
|
||
u = c1v1 + · · · + cmvm.
|
||
|
||
In the equation above, we can replace vj with the right side of 2.5, which shows that u is in the span of the list obtained by removing the jth term from (v1, . . . , vm). Thus (b) holds.
|
||
|
||
Now we come to a key result. It says that linearly independent lists are never longer than spanning lists.
|
||
|
||
2.6 Theorem: In a finite-dimensional vector space, the length of every linearly independent list of vectors is less than or equal to the length of every spanning list of vectors.
|
||
Proof: Suppose that (u1, . . . , um) is linearly independent in V and that (w1, . . . , wn) spans V . We need to prove that m ≤ n. We do so through the multistep process described below; note that in each step we add one of the u’s and remove one of the w’s.
|
||
|
||
Suppose that for each positive integer m, there exists a linearly independent list of m vectors in V . Then this theorem implies that V is infinite dimensional.
|
||
|
||
Step 1 The list (w1, . . . , wn) spans V , and thus adjoining any vector to it produces a linearly dependent list. In particular, the list
|
||
(u1, w1, . . . , wn)
|
||
is linearly dependent. Thus by the linear dependence lemma (2.4), we can remove one of the w’s so that the list B (of length n) consisting of u1 and the remaining w’s spans V .
|
||
Step j The list B (of length n) from step j −1 spans V , and thus adjoining any vector to it produces a linearly dependent list. In particular, the list of length (n + 1) obtained by adjoining uj to B, placing it just after u1, . . . , uj−1, is linearly dependent. By the linear dependence lemma (2.4), one of the vectors in this list is in the span of the previous ones, and because (u1, . . . , uj) is linearly independent, this vector must be one of the w’s, not one of the u’s. We can remove that w from B so that the new list B (of length n) consisting of u1, . . . , uj and the remaining w’s spans V .
|
||
After step m, we have added all the u’s and the process stops. If at any step we added a u and had no more w’s to remove, then we would have a contradiction. Thus there must be at least as many w’s as u’s.
|
||
Our intuition tells us that any vector space contained in a finite-dimensional vector space should also be finite dimensional. We now prove that this intuition is correct.
|
||
2.7 Proposition: Every subspace of a finite-dimensional vector space is finite dimensional.
|
||
Proof: Suppose V is finite dimensional and U is a subspace of V . We need to prove that U is finite dimensional. We do this through the following multistep construction.
|
||
Step 1 If U = {0}, then U is finite dimensional and we are done. If U ≠ {0}, then choose a nonzero vector v1 ∈ U .
|
||
Step j If U = span(v1, . . . , vj−1), then U is finite dimensional and we are done. If U ≠ span(v1, . . . , vj−1), then choose a vector vj ∈ U such that

vj ∉ span(v1, . . . , vj−1).
|
||
After each step, as long as the process continues, we have constructed a list of vectors such that no vector in this list is in the span of the previous vectors. Thus after each step we have constructed a linearly independent list, by the linear dependence lemma (2.4). This linearly independent list cannot be longer than any spanning list of V (by 2.6), and thus the process must eventually terminate, which means that U is finite dimensional.
|
||
|
||
Bases
|
||
|
||
A basis of V is a list of vectors in V that is linearly independent and spans V . For example,
|
||
(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)
|
||
is a basis of Fn, called the standard basis of Fn. In addition to the standard basis, Fn has many other bases. For example, (1, 2), (3, 5) is a basis of F2. The list (1, 2) is linearly independent but is not a basis of F2 because it does not span F2. The list (1, 2), (3, 5), (4, 7) spans F2 but is not a basis because it is not linearly independent. As another example, (1, z, . . . , zm) is a basis of Pm(F).
|
||
The next proposition helps explain why bases are useful.
|
||
|
||
2.8 Proposition: A list (v1, . . . , vn) of vectors in V is a basis of V if and only if every v ∈ V can be written uniquely in the form
|
||
|
||
2.9
|
||
|
||
v = a1v1 + · · · + anvn,
|
||
|
||
where a1, . . . , an ∈ F.
|
||
|
||
Proof: First suppose that (v1, . . . , vn) is a basis of V . Let v ∈ V . Because (v1, . . . , vn) spans V , there exist a1, . . . , an ∈ F such that 2.9 holds. To show that the representation in 2.9 is unique, suppose that b1, . . . , bn are scalars so that we also have
|
||
v = b1v1 + · · · + bnvn.
|
||
|
||
This proof is essentially a repetition of the ideas that led us to the definition of linear independence.
|
||
|
||
Subtracting the last equation from 2.9, we get
|
||
0 = (a1 − b1)v1 + · · · + (an − bn)vn.
|
||
This implies that each aj − bj = 0 (because (v1, . . . , vn) is linearly independent) and hence a1 = b1, . . . , an = bn. We have the desired uniqueness, completing the proof in one direction.
|
||
For the other direction, suppose that every v ∈ V can be written uniquely in the form given by 2.9. Clearly this implies that (v1, . . . , vn) spans V . To show that (v1, . . . , vn) is linearly independent, suppose that a1, . . . , an ∈ F are such that
|
||
0 = a1v1 + · · · + anvn.
|
||
The uniqueness of the representation 2.9 (with v = 0) implies that a1 = · · · = an = 0. Thus (v1, . . . , vn) is linearly independent and hence is a basis of V .
|
||
A spanning list in a vector space may not be a basis because it is not linearly independent. Our next result says that given any spanning list, some of the vectors in it can be discarded so that the remaining list is linearly independent and still spans the vector space.
|
||
2.10 Theorem: Every spanning list in a vector space can be reduced to a basis of the vector space.
|
||
Proof: Suppose (v1, . . . , vn) spans V . We want to remove some of the vectors from (v1, . . . , vn) so that the remaining vectors form a basis of V . We do this through the multistep process described below. Start with B = (v1, . . . , vn).
|
||
Step 1 If v1 = 0, delete v1 from B. If v1 ≠ 0, leave B unchanged.
|
||
Step j If vj is in span(v1, . . . , vj−1), delete vj from B. If vj is not in span(v1, . . . , vj−1), leave B unchanged.
|
||
Stop the process after step n, getting a list B. This list B spans V because our original list spanned V and we have discarded only vectors that were already in the span of the previous vectors. The process
|
||
|
||
ensures that no vector in B is in the span of the previous ones. Thus B is linearly independent, by the linear dependence lemma (2.4). Hence B is a basis of V .
|
||
|
||
Consider the list
|
||
(1, 2), (3, 6), (4, 7), (5, 9) ,
|
||
which spans F2. To make sure that you understand the last proof, you should verify that the process in the proof produces (1, 2), (4, 7) , a basis of F2, when applied to the list above.
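As an illustration outside the text, the discarding procedure in the proof of 2.10 can be carried out numerically, using a rank test for the membership check "vj is in the span of the previously kept vectors". The sketch assumes Python with NumPy and the function name is ours; applied to the list above it keeps (1, 2) and (4, 7), as claimed.

import numpy as np

def reduce_to_basis(vectors):
    kept = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        if not kept:
            if np.any(v != 0):      # step 1: discard v1 only if it is 0
                kept.append(v)
            continue
        # keep v only if adjoining it raises the rank, i.e. v is not
        # already in the span of the vectors kept so far
        if np.linalg.matrix_rank(np.column_stack(kept + [v])) > len(kept):
            kept.append(v)
    return kept

print(reduce_to_basis([(1, 2), (3, 6), (4, 7), (5, 9)]))   # keeps (1, 2) and (4, 7)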
|
||
Our next result, an easy corollary of the last theorem, tells us that every finite-dimensional vector space has a basis.
|
||
|
||
2.11 Corollary: Every finite-dimensional vector space has a basis.
|
||
|
||
Proof: By definition, a finite-dimensional vector space has a spanning list. The previous theorem tells us that any spanning list can be reduced to a basis.
|
||
|
||
We have crafted our definitions so that the finite-dimensional vector space {0} is not a counterexample to the corollary above. In particular, the empty list () is a basis of the vector space {0} because this list has been defined to be linearly independent and to have span {0}.
|
||
Our next theorem is in some sense a dual of 2.10, which said that every spanning list can be reduced to a basis. Now we show that given any linearly independent list, we can adjoin some additional vectors so that the extended list is still linearly independent but also spans the space.
|
||
|
||
2.12 Theorem: Every linearly independent list of vectors in a finite-dimensional vector space can be extended to a basis of the vector space.
|
||
Proof: Suppose V is finite dimensional and (v1, . . . , vm) is linearly independent in V . We want to extend (v1, . . . , vm) to a basis of V . We do this through the multistep process described below. First we let (w1, . . . , wn) be any list of vectors in V that spans V .
|
||
Step 1 If w1 is in the span of (v1, . . . , vm), let B = (v1, . . . , vm). If w1 is not in the span of (v1, . . . , vm), let B = (v1, . . . , vm, w1).
|
||
|
||
This theorem can be used to give another proof of the previous corollary. Specifically, suppose V is finite dimensional. This theorem implies that the empty list () can be extended to a basis of V . In particular, V has a basis.
|
||
|
||
Step j If wj is in the span of B, leave B unchanged. If wj is not in the span of B, extend B by adjoining wj to it.
|
||
After each step, B is still linearly independent because otherwise the linear dependence lemma (2.4) would give a contradiction (recall that (v1, . . . , vm) is linearly independent and any wj that is adjoined to B is not in the span of the previous vectors in B). After step n, the span of B includes all the w’s. Thus the B obtained after step n spans V and hence is a basis of V .
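Outside the text, the same idea gives a small numerical version of the extension procedure in the proof of 2.12: walk through any spanning list and adjoin each vector that is not in the span of the list built so far. The sketch assumes Python with NumPy; the function name is ours.

import numpy as np

def extend_to_basis(independent, spanning):
    B = [np.asarray(v, dtype=float) for v in independent]
    for w in spanning:
        w = np.asarray(w, dtype=float)
        # adjoin w exactly when it is not in the span of the current list
        if np.linalg.matrix_rank(np.column_stack(B + [w])) > len(B):
            B.append(w)
    return B

# extend the linearly independent list ((1, 1, 0)) to a basis of R^3
print(len(extend_to_basis([(1, 1, 0)], [(1, 0, 0), (0, 1, 0), (0, 0, 1)])))   # 3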
|
||
|
||
As a nice application of the theorem above, we now show that every subspace of a finite-dimensional vector space can be paired with another subspace to form a direct sum of the whole space.
|
||
|
||
Using the same basic ideas but considerably more advanced tools, this proposition can be proved without the hypothesis that V is finite dimensional.
|
||
|
||
2.13 Proposition: Suppose V is finite dimensional and U is a subspace of V . Then there is a subspace W of V such that V = U ⊕ W .
|
||
|
||
Proof: Because V is finite dimensional, so is U (see 2.7). Thus
|
||
there is a basis (u1, . . . , um) of U (see 2.11). Of course (u1, . . . , um) is a linearly independent list of vectors in V , and thus it can be extended to a basis (u1, . . . , um, w1, . . . , wn) of V (see 2.12). Let W = span(w1, . . . , wn).
|
||
To prove that V = U ⊕ W , we need to show that
|
||
|
||
V = U + W and U ∩ W = {0};
|
||
|
||
see 1.9. To prove the first equation, suppose that v ∈ V . Then,
|
||
because the list (u1, . . . , um, w1, . . . , wn) spans V , there exist scalars a1, . . . , am, b1, . . . , bn ∈ F such that
|
||
|
||
v = a1u1 + · · · + amum + b1w1 + · · · + bnwn,

where we write u = a1u1 + · · · + amum and w = b1w1 + · · · + bnwn.
In other words, we have v = u+w, where u ∈ U and w ∈ W are defined as above. Thus v ∈ U + W , completing the proof that V = U + W .
|
||
To show that U ∩ W = {0}, suppose v ∈ U ∩ W . Then there exist scalars a1, . . . , am, b1, . . . , bn ∈ F such that
|
||
|
||
v = a1u1 + · · · + amum = b1w1 + · · · + bnwn.
|
||
|
||
Thus
|
||
|
||
a1u1 + · · · + amum − b1w1 − · · · − bnwn = 0.
|
||
Because (u1, . . . , um, w1, . . . , wn) is linearly independent, this implies that a1 = · · · = am = b1 = · · · = bn = 0. Thus v = 0, completing the proof that U ∩ W = {0}.
|
||
|
||
Dimension
|
||
Though we have been discussing finite-dimensional vector spaces, we have not yet defined the dimension of such an object. How should dimension be defined? A reasonable definition should force the dimension of Fn to equal n. Notice that the basis
|
||
(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)
|
||
has length n. Thus we are tempted to define the dimension as the length of a basis. However, a finite-dimensional vector space in general has many different bases, and our attempted definition makes sense only if all bases in a given vector space have the same length. Fortunately that turns out to be the case, as we now show.
|
||
2.14 Theorem: Any two bases of a finite-dimensional vector space have the same length.
|
||
Proof: Suppose V is finite dimensional. Let B1 and B2 be any two bases of V . Then B1 is linearly independent in V and B2 spans V , so the length of B1 is at most the length of B2 (by 2.6). Interchanging the roles of B1 and B2, we also see that the length of B2 is at most the length of B1. Thus the length of B1 must equal the length of B2, as desired.
|
||
Now that we know that any two bases of a finite-dimensional vector space have the same length, we can formally define the dimension of such spaces. The dimension of a finite-dimensional vector space is defined to be the length of any basis of the vector space. The dimension of V (if V is finite dimensional) is denoted by dim V . As examples, note that dim Fn = n and dim Pm(F) = m + 1.
|
||
Every subspace of a finite-dimensional vector space is finite dimensional (by 2.7) and so has a dimension. The next result gives the expected inequality about the dimension of a subspace.
|
||
|
||
2.15 Proposition: If V is finite dimensional and U is a subspace of V , then dim U ≤ dim V .
|
||
|
||
Proof: Suppose that V is finite dimensional and U is a subspace of V . Any basis of U is a linearly independent list of vectors in V and thus can be extended to a basis of V (by 2.12). Hence the length of a basis of U is less than or equal to the length of a basis of V .
|
||
|
||
The real vector space R2 has dimension 2; the complex vector space C has dimension 1. As sets, R2 can be identified with C (and addition is the same on both spaces, as is scalar multiplication by real numbers). Thus when we talk about the dimension of a vector space, the role played by the choice of F cannot be neglected.
|
||
|
||
To check that a list of vectors in V is a basis of V , we must, according to the definition, show that the list in question satisfies two properties: it must be linearly independent and it must span V . The next two results show that if the list in question has the right length, then we need only check that it satisfies one of the required two properties. We begin by proving that every spanning list with the right length is a basis.
|
||
2.16 Proposition: If V is finite dimensional, then every spanning list of vectors in V with length dim V is a basis of V .
|
||
Proof: Suppose dim V = n and (v1, . . . , vn) spans V . The list (v1, . . . , vn) can be reduced to a basis of V (by 2.10). However, every basis of V has length n, so in this case the reduction must be the trivial one, meaning that no elements are deleted from (v1, . . . , vn). In other words, (v1, . . . , vn) is a basis of V , as desired.
|
||
|
||
Now we prove that linear independence alone is enough to ensure that a list with the right length is a basis.
|
||
|
||
2.17 Proposition: If V is finite dimensional, then every linearly independent list of vectors in V with length dim V is a basis of V .
|
||
Proof: Suppose dim V = n and (v1, . . . , vn) is linearly independent in V . The list (v1, . . . , vn) can be extended to a basis of V (by 2.12). However, every basis of V has length n, so in this case the extension must be the trivial one, meaning that no elements are adjoined to (v1, . . . , vn). In other words, (v1, . . . , vn) is a basis of V , as desired.
|
||
|
||
As an example of how the last proposition can be applied, consider the list (5, 7), (4, 3) . This list of two vectors in F2 is obviously linearly independent (because neither vector is a scalar multiple of the other).
|
||
|
||
Because F2 has dimension 2, the last proposition implies that this linearly independent list of length 2 is a basis of F2 (we do not need to bother checking that it spans F2).
|
||
The next theorem gives a formula for the dimension of the sum of two subspaces of a finite-dimensional vector space.
|
||
|
||
2.18 Theorem: If U1 and U2 are subspaces of a finite-dimensional vector space, then
|
||
dim(U1 + U2) = dim U1 + dim U2 − dim(U1 ∩ U2).
|
||
Proof: Let (u1, . . . , um) be a basis of U1 ∩ U2; thus dim(U1 ∩ U2) = m. Because (u1, . . . , um) is a basis of U1 ∩ U2, it is linearly independent in U1 and hence can be extended to a basis (u1, . . . , um, v1, . . . , vj) of U1 (by 2.12). Thus dim U1 = m + j. Also extend (u1, . . . , um) to a basis (u1, . . . , um, w1, . . . , wk) of U2; thus dim U2 = m + k.
|
||
We will show that (u1, . . . , um, v1, . . . , vj, w1, . . . , wk) is a basis of U1 + U2. This will complete the proof because then we will have
|
||
dim(U1 + U2) = m + j + k = (m + j) + (m + k) − m = dim U1 + dim U2 − dim(U1 ∩ U2).
|
||
|
||
This formula for the dimension of the sum of two subspaces is analogous to a familiar counting formula: the number of elements in the union of two finite sets equals the number of elements in the first set, plus the number of elements in the second set, minus the number of elements in the intersection of the two sets.
|
||
|
||
Clearly span(u1, . . . , um, v1, . . . , vj, w1, . . . , wk) contains U1 and U2 and hence contains U1 + U2. So to show that this list is a basis of U1 + U2 we need only show that it is linearly independent. To prove
|
||
this, suppose
|
||
|
||
a1u1 + · · · + amum + b1v1 + · · · + bj vj + c1w1 + · · · + ckwk = 0,
|
||
|
||
where all the a’s, b’s, and c’s are scalars. We need to prove that all the a’s, b’s, and c’s equal 0. The equation above can be rewritten as
|
||
|
||
c1w1 + · · · + ckwk = −a1u1 − · · · − amum − b1v1 − · · · − bj vj ,
|
||
which shows that c1w1 + · · · + ckwk ∈ U1. All the w’s are in U2, so this implies that c1w1 + · · · + ckwk ∈ U1 ∩ U2. Because (u1, . . . , um) is a basis of U1 ∩ U2, we can write
|
||
c1w1 + · · · + ckwk = d1u1 + · · · + dmum
|
||
|
||
for some choice of scalars d1, . . . , dm. But (u1, . . . , um, w1, . . . , wk) is linearly independent, so the last equation implies that all the c’s (and d’s) equal 0. Thus our original equation involving the a’s, b’s, and c’s becomes
|
||
a1u1 + · · · + amum + b1v1 + · · · + bjvj = 0.
|
||
This equation implies that all the a’s and b’s are 0 because the list (u1, . . . , um, v1, . . . , vj) is linearly independent. We now know that all the a’s, b’s, and c’s equal 0, as desired.
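As a numerical sanity check (not part of the proof), the formula can be tested on subspaces of Rn given by spanning matrices: dim U1, dim U2, and dim(U1 + U2) are column ranks, and a basis of U1 ∩ U2 can be computed independently from the null space of the block matrix [B1  −B2]. The sketch assumes Python with NumPy; the helper names are ours, and rank decisions use a floating-point tolerance.

import numpy as np

def null_space(A, tol=1e-10):
    # columns form an (orthonormal) basis of {x : A x = 0}, via the SVD
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:].T

def check_dim_formula(B1, B2):
    # columns of B1 span U1, columns of B2 span U2 (subspaces of R^n)
    dim_U1 = np.linalg.matrix_rank(B1)
    dim_U2 = np.linalg.matrix_rank(B2)
    dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))
    # U1 intersect U2 = {B1 x : B1 x = B2 y for some y}
    N = null_space(np.hstack([B1, -B2]))
    inter = B1 @ N[:B1.shape[1], :]
    dim_inter = np.linalg.matrix_rank(inter) if inter.size else 0
    return dim_sum == dim_U1 + dim_U2 - dim_inter

rng = np.random.default_rng(0)
print(check_dim_formula(rng.standard_normal((5, 3)), rng.standard_normal((5, 4))))  # True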
|
||
|
||
The next proposition shows that dimension meshes well with direct sums. This result will be useful in later chapters.
|
||
|
||
Recall that direct sum is analogous to disjoint union. Thus 2.19 is analogous to the statement that if a finite set B is written as A1 ∪ · · · ∪ Am and the sum of the number of elements in the A's equals the number of elements in B, then the union is a disjoint union.
|
||
|
||
2.19 Proposition: Suppose V is finite dimensional and U1, . . . , Um are subspaces of V such that
|
||
|
||
2.20    V = U1 + · · · + Um

and

2.21    dim V = dim U1 + · · · + dim Um.

Then V = U1 ⊕ · · · ⊕ Um.
|
||
|
||
Proof: Choose a basis for each Uj. Put these bases together in one list, forming a list that spans V (by 2.20) and has length dim V (by 2.21). Thus this list is a basis of V (by 2.16), and in particular it is linearly independent.
|
||
Now suppose that u1 ∈ U1, . . . , um ∈ Um are such that
|
||
0 = u1 + · · · + um.
|
||
We can write each uj as a linear combination of the basis vectors (chosen above) of Uj. Substituting these linear combinations into the expression above, we have written 0 as a linear combination of the basis of V constructed above. Thus all the scalars used in this linear combination must be 0. Thus each uj = 0, which proves that V = U1 ⊕ · · · ⊕ Um (by 1.8).
|
||
|
||
Exercises
|
||
|
||
1. Prove that if (v1, . . . , vn) spans V , then so does the list (v1 − v2, v2 − v3, . . . , vn−1 − vn, vn)
|
||
obtained by subtracting from each vector (except the last one) the following vector.
|
||
2. Prove that if (v1, . . . , vn) is linearly independent in V , then so is the list (v1 − v2, v2 − v3, . . . , vn−1 − vn, vn) obtained by subtracting from each vector (except the last one) the following vector.
|
||
3. Suppose (v1, . . . , vn) is linearly independent in V and w ∈ V . Prove that if (v1 + w, . . . , vn + w) is linearly dependent, then w ∈ span(v1, . . . , vn).
|
||
4. Suppose m is a positive integer. Is the set consisting of 0 and all polynomials with coefficients in F and with degree equal to m a subspace of P(F)?
|
||
5. Prove that F∞ is infinite dimensional.
|
||
6. Prove that the real vector space consisting of all continuous real-valued functions on the interval [0, 1] is infinite dimensional.
|
||
7. Prove that V is infinite dimensional if and only if there is a sequence v1, v2, . . . of vectors in V such that (v1, . . . , vn) is linearly independent for every positive integer n.
|
||
8. Let U be the subspace of R5 defined by
|
||
U = {(x1, x2, x3, x4, x5) ∈ R5 : x1 = 3x2 and x3 = 7x4}.
|
||
Find a basis of U .
|
||
9. Prove or disprove: there exists a basis (p0, p1, p2, p3) of P3(F) such that none of the polynomials p0, p1, p2, p3 has degree 2.
|
||
10. Suppose that V is finite dimensional, with dim V = n. Prove that there exist one-dimensional subspaces U1, . . . , Un of V such that V = U1 ⊕ · · · ⊕ Un.
|
||
|
||
11. Suppose that V is finite dimensional and U is a subspace of V such that dim U = dim V . Prove that U = V .
|
||
12. Suppose that p0, p1, . . . , pm are polynomials in Pm(F) such that pj(2) = 0 for each j. Prove that (p0, p1, . . . , pm) is not linearly independent in Pm(F).
|
||
13. Suppose U and W are subspaces of R8 such that dim U = 3, dim W = 5, and U + W = R8. Prove that U ∩ W = {0}.
|
||
14. Suppose that U and W are both five-dimensional subspaces of R9. Prove that U ∩ W = {0}.
|
||
15. You might guess, by analogy with the formula for the number of elements in the union of three subsets of a finite set, that if U1, U2, U3 are subspaces of a finite-dimensional vector space, then
|
||
dim(U1 + U2 + U3) = dim U1 + dim U2 + dim U3 − dim(U1 ∩ U2) − dim(U1 ∩ U3) − dim(U2 ∩ U3) + dim(U1 ∩ U2 ∩ U3).
|
||
Prove this or give a counterexample.
|
||
16. Prove that if V is finite dimensional and U1, . . . , Um are subspaces of V , then
|
||
dim(U1 + · · · + Um) ≤ dim U1 + · · · + dim Um.
|
||
|
||
17. Suppose V is finite dimensional. Prove that if U1, . . . , Um are subspaces of V such that V = U1 ⊕ · · · ⊕ Um, then
|
||
dim V = dim U1 + · · · + dim Um.
|
||
This exercise deepens the analogy between direct sums of subspaces and disjoint unions of subsets. Specifically, compare this exercise to the following obvious statement: if a finite set is written as a disjoint union of subsets, then the number of elements in the set equals the sum of the number of elements in the disjoint subsets.
|
||
|
||
Chapter 3
|
||
Linear Maps
|
||
So far our attention has focused on vector spaces. No one gets excited about vector spaces. The interesting part of linear algebra is the subject to which we now turn—linear maps.
|
||
Let’s review our standing assumptions: Recall that F denotes R or C.
|
||
Recall also that V is a vector space over F. In this chapter we will frequently need another vector space in addition to V . We will call this additional vector space W :
|
||
Let’s agree that for the rest of this chapter W will denote a vector space over F.
|
||
Definitions and Examples
|
||
|
||
Some mathematicians use the term linear
|
||
transformation, which means the same as linear map.
|
||
|
||
A linear map from V to W is a function T : V → W with the following properties:
|
||
|
||
additivity T (u + v) = T u + T v for all u, v ∈ V ;
|
||
|
||
homogeneity T (av) = a(T v) for all a ∈ F and all v ∈ V .
|
||
|
||
Note that for linear maps we often use the notation T v as well as the more standard functional notation T (v).
|
||
The set of all linear maps from V to W is denoted L(V , W ). Let’s look at some examples of linear maps. Make sure you verify that each of the functions defined below is indeed a linear map:
|
||
|
||
zero
|
||
|
||
In addition to its other uses, we let the symbol 0 denote the function that takes each element of some vector space to the additive identity of another vector space. To be specific, 0 ∈ L(V , W ) is defined by
|
||
|
||
0v = 0.
|
||
|
||
Note that the 0 on the left side of the equation above is a function from V to W , whereas the 0 on the right side is the additive identity in W . As usual, the context should allow you to distinguish between the many uses of the symbol 0.
|
||
|
||
identity
|
||
|
||
The identity map, denoted I, is the function on some vector space that takes each element to itself. To be specific, I ∈ L(V , V ) is
|
||
|
||
defined by
|
||
|
||
Iv = v.
|
||
|
||
differentiation Define T ∈ L(P(R), P(R)) by
|
||
T p = p′.
|
||
|
||
The assertion that this function is a linear map is another way of stating a basic result about differentiation: (f + g)′ = f ′ + g′ and (af )′ = af ′ whenever f , g are differentiable and a is a constant.
|
||
|
||
integration Define T ∈ L(P(R), R) by
|
||
T p = ∫₀¹ p(x) dx.
|
||
The assertion that this function is linear is another way of stating a basic result about integration: the integral of the sum of two functions equals the sum of the integrals, and the integral of a constant times a function equals the constant times the integral of the function.
|
||
multiplication by x2 Define T ∈ L(P(R), P(R)) by
|
||
(T p)(x) = x2p(x)
|
||
for x ∈ R.
|
||
backward shift Recall that F∞ denotes the vector space of all sequences of elements of F. Define T ∈ L(F∞, F∞) by
|
||
T (x1, x2, x3, . . . ) = (x2, x3, . . . ).
|
||
from Fn to Fm Define T ∈ L(R3, R2) by
|
||
T (x, y, z) = (2x − y + 3z, 7x + 5y − 6z).
|
||
More generally, let m and n be positive integers, let aj,k ∈ F for j = 1, . . . , m and k = 1, . . . , n, and define T ∈ L(Fn, Fm) by
|
||
T (x1, . . . , xn) = (a1,1x1+· · ·+a1,nxn, . . . , am,1x1+· · ·+am,nxn).
|
||
Later we will see that every linear map from Fn to Fm is of this form.
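As an aside (not from the text), this general map is exactly multiplication by the m-by-n array of the aj,k, so both defining properties can be checked numerically. The sketch below uses the concrete T : R3 → R2 above and assumes Python with NumPy.

import numpy as np

a = np.array([[2.0, -1.0,  3.0],
              [7.0,  5.0, -6.0]])     # coefficients of the example T : R^3 -> R^2

def T(x):
    # T(x1, ..., xn) = (a_{1,1}x1 + ... + a_{1,n}xn, ..., a_{m,1}x1 + ... + a_{m,n}xn)
    return a @ np.asarray(x, dtype=float)

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.0, 4.0])
c = 2.5
print(np.allclose(T(u + v), T(u) + T(v)))   # additivity
print(np.allclose(T(c * u), c * T(u)))      # homogeneity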
|
||
Suppose (v1, . . . , vn) is a basis of V and T : V → W is linear. If v ∈ V , then we can write v in the form
|
||
v = a1v1 + · · · + anvn.
|
||
The linearity of T implies that
|
||
|
||
Though linear maps are pervasive throughout mathematics, they are not as ubiquitous as imagined by some confused students who seem to think that cos is a linear map from R to R when they write “identities” such as cos 2x = 2 cos x and cos(x + y) = cos x + cos y.
|
||
|
||
T v = a1T v1 + · · · + anT vn.
|
||
|
||
In particular, the values of T v1, . . . , T vn determine the values of T on arbitrary vectors in V .
|
||
Linear maps can be constructed that take on arbitrary values on a
|
||
basis. Specifically, given a basis (v1, . . . , vn) of V and any choice of vectors w1, . . . , wn ∈ W , we can construct a linear map T : V → W such that T vj = wj for j = 1, . . . , n. There is no choice of how to do this—we must define T by
|
||
|
||
T (a1v1 + · · · + anvn) = a1w1 + · · · + anwn,
|
||
|
||
where a1, . . . , an are arbitrary elements of F. Because (v1, . . . , vn) is a basis of V , the equation above does indeed define a function T from V to W . You should verify that the function T defined above is linear and that T vj = wj for j = 1, . . . , n.
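Outside the text, the construction just described is easy to mimic numerically for V = Fn: to evaluate T at v, find the coordinates a1, . . . , an of v with respect to the basis by solving a linear system, and then form a1w1 + · · · + anwn. The sketch assumes Python with NumPy; the function name is ours.

import numpy as np

def make_linear_map(basis_vectors, target_vectors):
    B = np.column_stack([np.asarray(v, dtype=float) for v in basis_vectors])   # invertible, since a basis
    W = np.column_stack([np.asarray(w, dtype=float) for w in target_vectors])
    def T(v):
        a = np.linalg.solve(B, np.asarray(v, dtype=float))   # coordinates of v in the basis
        return W @ a                                          # a1*w1 + ... + an*wn
    return T

T = make_linear_map([(1, 0), (1, 1)], [(0, 0, 1), (2, 3, 4)])
print(T((1, 1)))    # the second basis vector is sent to (2, 3, 4)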
|
||
Now we will make L(V , W ) into a vector space by defining addition and scalar multiplication on it. For S, T ∈ L(V , W ), define a function S + T ∈ L(V , W ) in the usual manner of adding functions:
|
||
|
||
(S + T )v = Sv + T v
|
||
|
||
for v ∈ V . You should verify that S + T is indeed a linear map from V to W whenever S, T ∈ L(V , W ). For a ∈ F and T ∈ L(V , W ), define a function aT ∈ L(V , W ) in the usual manner of multiplying a function
|
||
|
||
by a scalar:
|
||
|
||
(aT )v = a(T v)
|
||
|
||
for v ∈ V . You should verify that aT is indeed a linear map from V to W whenever a ∈ F and T ∈ L(V , W ). With the operations we have just defined, L(V , W ) becomes a vector space (as you should verify). Note that the additive identity of L(V , W ) is the zero linear map defined
|
||
earlier in this section.
|
||
Usually it makes no sense to multiply together two elements of a
|
||
vector space, but for some pairs of linear maps a useful product exists.
|
||
We will need a third vector space, so suppose U is a vector space over F. If T ∈ L(U, V ) and S ∈ L(V , W ), then we define ST ∈ L(U , W ) by
|
||
|
||
(ST )(v) = S(T v)
|
||
|
||
for v ∈ U. In other words, ST is just the usual composition S ◦T of two functions, but when both functions are linear, most mathematicians
|
||
|
||
write ST instead of S ◦ T . You should verify that ST is indeed a linear map from U to W whenever T ∈ L(U, V ) and S ∈ L(V , W ). Note that ST is defined only when T maps into the domain of S. We often call ST the product of S and T . You should verify that it has most of the usual properties expected of a product:
|
||
|
||
associativity (T1T2)T3 = T1(T2T3) whenever T1, T2, and T3 are linear maps such that the products make sense (meaning that T3 must map into the domain of T2, and T2 must map into the domain of T1).
|
||
identity T I = T and IT = T whenever T ∈ L(V , W ) (note that in the first equation I is the identity map on V , and in the second equation I is the identity map on W ).
|
||
distributive properties (S1 + S2)T = S1T + S2T and S(T1 + T2) = ST1 + ST2 whenever T , T1, T2 ∈ L(U , V ) and S, S1, S2 ∈ L(V , W ).
|
||
Multiplication of linear maps is not commutative. In other words, it is not necessarily true that ST = T S, even if both sides of the equation make sense. For example, if T ∈ L(P(R), P(R)) is the differentiation map defined earlier in this section and S ∈ L(P(R), P(R)) is the multiplication by x2 map defined earlier in this section, then
|
||
((ST )p)(x) = x2p′(x) but ((T S)p)(x) = x2p′(x) + 2xp(x).
|
||
In other words, multiplying by x2 and then differentiating is not the same as differentiating and then multiplying by x2.
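The failure of commutativity is easy to see numerically as well. Outside the text, here is a small pure-Python sketch in which a polynomial is stored as its list of coefficients [a0, a1, a2, . . . ]; the helper names are ours. For p(x) = x it produces x2 for ST p and 3x2 for T Sp, matching the two formulas above.

def differentiate(p):                  # the map T: p -> p'
    return [k * p[k] for k in range(1, len(p))] or [0]

def times_x_squared(p):                # the map S: p -> x^2 * p
    return [0, 0] + list(p)

p = [0, 1]                             # p(x) = x
print(times_x_squared(differentiate(p)))   # ST p = x^2    -> [0, 0, 1]
print(differentiate(times_x_squared(p)))   # T S p = 3 x^2 -> [0, 0, 3]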
|
||
|
||
Null Spaces and Ranges
|
||
|
||
For T ∈ L(V , W ), the null space of T , denoted null T , is the subset of V consisting of those vectors that T maps to 0:
|
||
null T = {v ∈ V : T v = 0}.
|
||
|
||
Some mathematicians use the term kernel instead of null space.
|
||
|
||
Let’s look at a few examples from the previous section. In the differentiation example, we defined T ∈ L(P(R), P(R)) by T p = p′. The
|
||
|
||
only functions whose derivative equals the zero function are the constant functions, so in this case the null space of T equals the set of constant functions.
|
||
In the multiplication by x2 example, we defined T ∈ L(P(R), P(R)) by (T p)(x) = x2p(x). The only polynomial p such that x2p(x) = 0 for all x ∈ R is the 0 polynomial. Thus in this case we have
|
||
null T = {0}.
|
||
In the backward shift example, we defined T ∈ L(F∞, F∞) by
|
||
T (x1, x2, x3, . . . ) = (x2, x3, . . . ).
|
||
Clearly T (x1, x2, x3, . . . ) equals 0 if and only if x2, x3, . . . are all 0. Thus in this case we have
|
||
null T = {(a, 0, 0, . . . ) : a ∈ F}.
|
||
The next proposition shows that the null space of any linear map is a subspace of the domain. In particular, 0 is in the null space of every linear map.
|
||
3.1 Proposition: If T ∈ L(V , W ), then null T is a subspace of V .
|
||
Proof: Suppose T ∈ L(V , W ). By additivity, we have
|
||
T (0) = T (0 + 0) = T (0) + T (0),
|
||
which implies that T (0) = 0. Thus 0 ∈ null T . If u, v ∈ null T , then
|
||
T (u + v) = T u + T v = 0 + 0 = 0,
|
||
and hence u + v ∈ null T . Thus null T is closed under addition. If u ∈ null T and a ∈ F, then
|
||
T (au) = aT u = a0 = 0,
|
||
and hence au ∈ null T . Thus null T is closed under scalar multiplication.
|
||
We have shown that null T contains 0 and is closed under addition and scalar multiplication. Thus null T is a subspace of V .
|
||
|
||
A linear map T : V → W is called injective if whenever u, v ∈ V and T u = T v, we have u = v. The next proposition says that we can check whether a linear map is injective by checking whether 0 is the only vector that gets mapped to 0. As a simple application of this proposition, we see that of the three linear maps whose null spaces we computed earlier in this section (differentiation, multiplication by x2, and backward shift), only multiplication by x2 is injective.
|
||
|
||
Many mathematicians use the term one-to-one, which means the same as injective.
|
||
|
||
3.2 Proposition: Let T ∈ L(V , W ). Then T is injective if and only if null T = {0}.
|
||
|
||
Proof: First suppose that T is injective. We want to prove that null T = {0}. We already know that {0} ⊂ null T (by 3.1). To prove the inclusion in the other direction, suppose v ∈ null T . Then
|
||
T (v) = 0 = T (0).
|
||
Because T is injective, the equation above implies that v = 0. Thus null T = {0}, as desired.
|
||
To prove the implication in the other direction, now suppose that null T = {0}. We want to prove that T is injective. To do this, suppose u, v ∈ V and T u = T v. Then
|
||
0 = T u − T v = T (u − v).
|
||
Thus u − v is in null T , which equals {0}. Hence u − v = 0, which implies that u = v. Hence T is injective, as desired.
|
||
|
||
For T ∈ L(V , W ), the range of T , denoted range T , is the subset of W consisting of those vectors that are of the form T v for some v ∈ V :
|
||
range T = {T v : v ∈ V }.
|
||
For example, if T ∈ L(P(R), P(R)) is the differentiation map defined by T p = p′, then range T = P(R) because for every polynomial q ∈ P(R) there exists a polynomial p ∈ P(R) such that p′ = q.
|
||
As another example, if T ∈ L(P(R), P(R)) is the linear map of multiplication by x2 defined by (T p)(x) = x2p(x), then the range of T is the set of polynomials of the form a2x2 + · · · + amxm, where a2, . . . , am ∈ R.
|
||
The next proposition shows that the range of any linear map is a subspace of the target space.
|
||
|
||
Some mathematicians use the word image, which means the same as range.
|
||
|
||
3.3 Proposition: If T ∈ L(V , W ), then range T is a subspace of W .
|
||
|
||
Proof: Suppose T ∈ L(V , W ). Then T (0) = 0 (by 3.1), which implies that 0 ∈ range T .
|
||
If w1, w2 ∈ range T , then there exist v1, v2 ∈ V such that T v1 = w1 and T v2 = w2. Thus
|
||
T (v1 + v2) = T v1 + T v2 = w1 + w2,
|
||
and hence w1+w2 ∈ range T . Thus range T is closed under addition. If w ∈ range T and a ∈ F, then there exists v ∈ V such that T v = w.
|
||
Thus T (av) = aT v = aw,
|
||
and hence aw ∈ range T . Thus range T is closed under scalar multiplication.
|
||
We have shown that range T contains 0 and is closed under addition and scalar multiplication. Thus range T is a subspace of W .
|
||
|
||
Many mathematicians use the term onto,
|
||
which means the same as surjective.
|
||
|
||
A linear map T : V → W is called surjective if its range equals W . For example, the differentiation map T ∈ L(P(R), P(R)) defined by T p = p′ is surjective because its range equals P(R). As another example, the linear map T ∈ L(P(R), P(R)) defined by (T p)(x) = x2p(x) is not surjective because its range does not equal P(R). As a final example, you should verify that the backward shift T ∈ L(F∞, F∞) defined by
|
||
T (x1, x2, x3, . . . ) = (x2, x3, . . . )
|
||
is surjective.
|
||
Whether a linear map is surjective can depend upon what we are thinking of as the target space. For example, fix a positive integer m. The differentiation map T ∈ L(Pm(R), Pm(R)) defined by T p = p′ is not surjective because the polynomial xm is not in the range of T . However, the differentiation map T ∈ L(Pm(R), Pm−1(R)) defined by T p = p′ is surjective because its range equals Pm−1(R), which is now the target space.

The next theorem, which is the key result in this chapter, states that the dimension of the null space plus the dimension of the range of a linear map on a finite-dimensional vector space equals the dimension of the domain.
|
||
|
||
3.4 Theorem: If V is finite dimensional and T ∈ L(V , W ), then range T is a finite-dimensional subspace of W and
|
||
dim V = dim null T + dim range T .
|
||
|
||
Proof: Suppose that V is a finite-dimensional vector space and T ∈ L(V , W ). Let (u1, . . . , um) be a basis of null T ; thus dim null T = m. The linearly independent list (u1, . . . , um) can be extended to a basis (u1, . . . , um, w1, . . . , wn) of V (by 2.12). Thus dim V = m + n, and to complete the proof, we need only show that range T is finite dimensional and dim range T = n. We will do this by proving that (T w1, . . . , T wn) is a basis of range T .
|
||
Let v ∈ V . Because (u1, . . . , um, w1, . . . , wn) spans V , we can write
|
||
v = a1u1 + · · · + amum + b1w1 + · · · + bnwn,
|
||
|
||
where the a’s and b’s are in F. Applying T to both sides of this equation, we get
|
||
T v = b1T w1 + · · · + bnT wn,
|
||
where the terms of the form T uj disappeared because each uj ∈ null T . The last equation implies that (T w1, . . . , T wn) spans range T . In particular, range T is finite dimensional.
|
||
To show that (T w1, . . . , T wn) is linearly independent, suppose that c1, . . . , cn ∈ F and
|
||
c1T w1 + · · · + cnT wn = 0.
|
||
|
||
Then
|
||
|
||
T (c1w1 + · · · + cnwn) = 0,
|
||
|
||
and hence
|
||
|
||
c1w1 + · · · + cnwn ∈ null T .
|
||
|
||
Because (u1, . . . , um) spans null T , we can write
|
||
|
||
c1w1 + · · · + cnwn = d1u1 + · · · + dmum,
|
||
|
||
where the d’s are in F. This equation implies that all the c’s (and d’s) are 0 (because (u1, . . . , um, w1, . . . , wn) is linearly independent). Thus (T w1, . . . , T wn) is linearly independent and hence is a basis for range T , as desired.
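As a numerical illustration (not part of the proof), for a map T x = Ax given by an m-by-n matrix A, dim range T is the rank of A and a basis of null T can be read off from the SVD; the two dimensions then add up to n. The sketch assumes Python with NumPy, with a floating-point tolerance for the rank decision.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 7))          # a map T : R^7 -> R^4

_, s, vh = np.linalg.svd(A)
dim_range = int(np.sum(s > 1e-10))       # rank of A = dim range T
null_basis = vh[dim_range:].T            # columns x with A x = 0 (up to rounding)
dim_null = null_basis.shape[1]
print(dim_null + dim_range == A.shape[1])    # True: 3 + 4 == 7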
|
||
|
||
Now we can show that no linear map from a finite-dimensional vector space to a “smaller” vector space can be injective, where “smaller” is measured by dimension.
|
||
3.5 Corollary: If V and W are finite-dimensional vector spaces such that dim V > dim W , then no linear map from V to W is injective.
|
||
Proof: Suppose V and W are finite-dimensional vector spaces such that dim V > dim W . Let T ∈ L(V , W ). Then
|
||
dim null T = dim V − dim range T ≥ dim V − dim W > 0,
|
||
where the equality above comes from 3.4. We have just shown that dim null T > 0. This means that null T must contain vectors other than 0. Thus T is not injective (by 3.2).
|
||
The next corollary, which is in some sense dual to the previous corollary, shows that no linear map from a finite-dimensional vector space to a “bigger” vector space can be surjective, where “bigger” is measured by dimension.
|
||
3.6 Corollary: If V and W are finite-dimensional vector spaces such that dim V < dim W , then no linear map from V to W is surjective.
|
||
Proof: Suppose V and W are finite-dimensional vector spaces such that dim V < dim W . Let T ∈ L(V , W ). Then
|
||
dim range T = dim V − dim null T ≤ dim V < dim W ,
|
||
where the equality above comes from 3.4. We have just shown that dim range T < dim W . This means that range T cannot equal W . Thus T is not surjective.
|
||
The last two corollaries have important consequences in the theory of linear equations. To see this, fix positive integers m and n, and let aj,k ∈ F for j = 1, . . . , m and k = 1, . . . , n. Define T : Fn → Fm by
|
||
|
||
T (x1, . . . , xn) = ( ∑_{k=1}^{n} a1,k xk , . . . , ∑_{k=1}^{n} am,k xk ).
|
||
|
||
Now consider the equation T x = 0 (where x ∈ Fn and the 0 here is
|
||
|
||
the additive identity in Fm, namely, the list of length m consisting of all 0’s). Letting x = (x1, . . . , xn), we can rewrite the equation T x = 0
|
||
|
||
as a system of homogeneous equations:
|
||
|
||
∑_{k=1}^{n} a1,k xk = 0
        ⋮
∑_{k=1}^{n} am,k xk = 0.
|
||
|
||
We think of the a’s as known; we are interested in solutions for the variables x1, . . . , xn. Thus we have m equations and n variables. Obviously x1 = · · · = xn = 0 is a solution; the key question here is whether any other solutions exist. In other words, we want to know if null T is strictly bigger than {0}. This happens precisely when T is not injective (by 3.2). From 3.5 we see that T is not injective if n > m. Conclusion: a homogeneous system of linear equations in which there are more variables than equations must have nonzero solutions.
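Outside the text, a nonzero solution of such a wide homogeneous system can be produced explicitly, for example from the SVD: any right singular vector beyond the rank lies in the null space. The sketch assumes Python with NumPy.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))     # 3 equations, 5 variables: n > m

_, _, vh = np.linalg.svd(A)
x = vh[-1]                          # a right singular vector outside the row space
print(np.allclose(A @ x, 0))        # True: a nonzero solution of A x = 0
print(np.linalg.norm(x) > 0)        # True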
|
||
With T as in the previous paragraph, now consider the equation T x = c, where c = (c1, . . . , cm) ∈ Fm. We can rewrite the equation T x = c as a system of inhomogeneous equations:
|
||
|
||
∑_{k=1}^{n} a1,k xk = c1
        ⋮
∑_{k=1}^{n} am,k xk = cm.
|
||
|
||
As before, we think of the a’s as known. The key question here is whether for every choice of the constant terms c1, . . . , cm ∈ F, there exists at least one solution for the variables x1, . . . , xn. In other words, we want to know whether range T equals Fm. From 3.6 we see that T is not surjective if n < m. Conclusion: an inhomogeneous system of linear equations in which there are more equations than variables has no solution for some choice of the constant terms.
|
||
|
||
Homogeneous, in this context, means that the constant term on the right side of each equation equals 0.
|
||
These results about homogeneous systems with more variables than equations and inhomogeneous systems with more equations than variables are often proved using Gaussian elimination. The abstract approach taken here leads to cleaner proofs.
|
||
|
||
The Matrix of a Linear Map
|
||
|
||
We have seen that if (v1, . . . , vn) is a basis of V and T : V → W is linear, then the values of T v1, . . . , T vn determine the values of T on arbitrary vectors in V . In this section we will see how matrices are used as an efficient method of recording the values of the T vj’s in terms of a basis of W .
|
||
Let m and n denote positive integers. An m-by-n matrix is a rectangular array with m rows and n columns that looks like this:
|
||
|
||
|
||
|
||
|
||
|
||
3.7
⎡ a1,1   · · ·   a1,n ⎤
⎢   ⋮               ⋮  ⎥
⎣ am,1   · · ·   am,n ⎦
|
||
|
||
Note that the first index refers to the row number and the second index refers to the column number. Thus a3,2 refers to the entry in the third row, second column of the matrix above. We will usually consider matrices whose entries are elements of F.

Let T ∈ L(V , W ). Suppose that (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W . For each k = 1, . . . , n, we can write T vk uniquely as a linear combination of the w’s:
|
||
|
||
3.8
|
||
|
||
T vk = a1,kw1 + · · · + am,kwm,
|
||
|
||
where aj,k ∈ F for j = 1, . . . , m. The scalars aj,k completely determine the linear map T because a linear map is determined by its values on a basis. The m-by-n matrix 3.7 formed by the a’s is called the matrix of T with respect to the bases (v1, . . . , vn) and (w1, . . . , wm); we denote it by
|
||
M(T , (v1, . . . , vn), (w1, . . . , wm)).

If the bases (v1, . . . , vn) and (w1, . . . , wm) are clear from the context (for example, if only one set of bases is in sight), we write just M(T ) instead of M(T , (v1, . . . , vn), (w1, . . . , wm)).
|
||
As an aid to remembering how M(T ) is constructed from T , you
|
||
might write the basis vectors v1, . . . , vn for the domain across the top and the basis vectors w1, . . . , wm for the target space along the left, as follows:
|
||
|
||
          v1   · · ·   vk   · · ·   vn
w1   ⎡                a1,k                ⎤
 ⋮    ⎢                  ⋮                 ⎥
wm   ⎣                am,k                ⎦
|
||
|
||
Note that in the matrix above only the kth column is displayed (and thus the second index of each displayed a is k). The kth column of M(T )
|
||
consists of the scalars needed to write T vk as a linear combination of the w’s. Thus the picture above should remind you that T vk is retrieved from the matrix M(T ) by multiplying each entry in the kth column by
|
||
the corresponding w from the left column, and then adding up the
|
||
resulting vectors.
|
||
If T is a linear map from Fn to Fm, then unless stated otherwise you
|
||
should assume that the bases in question are the standard ones (where the kth basis vector is 1 in the kth slot and 0 in all the other slots). If you think of elements of Fm as columns of m numbers, then you can think of the kth column of M(T ) as T applied to the kth basis vector. For example, if T ∈ L(F2, F3) is defined by
|
||
|
||
With respect to any choice of bases, the matrix of the 0 linear map (the linear map that takes every vector to 0) consists of all 0’s.
|
||
|
||
T (x, y) = (x + 3y, 2x + 5y, 7x + 9y),
|
||
|
||
then T (1, 0) = (1, 2, 7) and T (0, 1) = (3, 5, 9), so the matrix of T (with respect to the standard bases) is the 3-by-2 matrix
|
||
|
||
|
||
|
||
|
||
|
||
⎡ 1  3 ⎤
⎢ 2  5 ⎥ .
⎣ 7  9 ⎦
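As an aside (not from the text), this recipe, in which the kth column of M(T ) is T applied to the kth standard basis vector, is exactly how one builds the matrix numerically. The sketch assumes Python with NumPy and uses the example T above.

import numpy as np

def T(x, y):
    return np.array([x + 3*y, 2*x + 5*y, 7*x + 9*y], dtype=float)

# column k of M(T) is T applied to the k-th standard basis vector
M = np.column_stack([T(1, 0), T(0, 1)])
print(M)
# [[1. 3.]
#  [2. 5.]
#  [7. 9.]]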
|
||
|
||
Suppose we have bases (v1, . . . , vn) of V and (w1, . . . , wm) of W . Thus for each linear map from V to W , we can talk about its matrix (with respect to these bases, of course). Is the matrix of the sum of two linear maps equal to the sum of the matrices of the two maps?
|
||
Right now this question does not make sense because, though we have defined the sum of two linear maps, we have not defined the sum of two matrices. Fortunately the obvious definition of the sum of two matrices has the right properties. Specifically, we define addition of matrices of the same size by adding corresponding entries in the matrices:
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
⎡ a1,1  · · ·  a1,n ⎤   ⎡ b1,1  · · ·  b1,n ⎤   ⎡ a1,1 + b1,1  · · ·  a1,n + b1,n ⎤
⎢   ⋮            ⋮   ⎥ + ⎢   ⋮            ⋮   ⎥ = ⎢       ⋮                  ⋮       ⎥ .
⎣ am,1  · · ·  am,n ⎦   ⎣ bm,1  · · ·  bm,n ⎦   ⎣ am,1 + bm,1  · · ·  am,n + bm,n ⎦
|
||
|
||
You should verify that with this definition of matrix addition,
3.9        M(T + S) = M(T ) + M(S)
whenever T , S ∈ L(V , W ).
Still assuming that we have some bases in mind, is the matrix of a scalar times a linear map equal to the scalar times the matrix of the linear map? Again the question does not make sense because we have not defined scalar multiplication on matrices. Fortunately the obvious definition again has the right properties. Specifically, we define the product of a scalar and a matrix by multiplying each entry in the matrix by the scalar:
      [ a1,1  . . .  a1,n ]   [ ca1,1  . . .  ca1,n ]
    c [  ...           ... ] = [   ...            ...  ]
      [ am,1  . . .  am,n ]   [ cam,1  . . .  cam,n ] .
You should verify that with this definition of scalar multiplication on matrices,
3.10        M(cT ) = cM(T )
whenever c ∈ F and T ∈ L(V , W ). Because addition and scalar multiplication have now been defined
for matrices, you should not be surprised that a vector space is about to appear. We need only a bit of notation so that this new vector space has a name. The set of all m-by-n matrices with entries in F is denoted by Mat(m, n, F). You should verify that with addition and scalar multiplication defined as above, Mat(m, n, F) is a vector space. Note that the additive identity in Mat(m, n, F) is the m-by-n matrix all of whose entries equal 0.
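Equations 3.9 and 3.10 are easy to check on concrete maps. Here is a small sketch (not from the text); the maps T and S and the helper matrix_of are illustrative choices, with matrix_of building a matrix column by column from the standard basis as described earlier.

    # Check M(T + S) = M(T) + M(S) and M(cT) = c M(T) for two maps F^2 -> F^3.
    import numpy as np

    def matrix_of(linear_map, n):
        # column k of the matrix is the map applied to the kth standard basis vector
        return np.column_stack([linear_map(*np.eye(n)[k]) for k in range(n)])

    T = lambda x, y: np.array([x + 3*y, 2*x + 5*y, 7*x + 9*y])
    S = lambda x, y: np.array([y, x, x + y])

    sum_map = lambda x, y: T(x, y) + S(x, y)      # the linear map T + S
    scaled_map = lambda x, y: 4 * T(x, y)         # the linear map 4T

    assert np.array_equal(matrix_of(sum_map, 2), matrix_of(T, 2) + matrix_of(S, 2))
    assert np.array_equal(matrix_of(scaled_map, 2), 4 * matrix_of(T, 2))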
Suppose (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W . Suppose also that we have another vector space U and that (u1, . . . , up)
is a basis of U. Consider linear maps S : U → V and T : V → W . The composition T S is a linear map from U to W . How can M(T S) be computed from M(T ) and M(S)? The nicest solution to this question
would be to have the following pretty relationship:
3.11        M(T S) = M(T )M(S).
So far, however, the right side of this equation does not make sense because we have not yet defined the product of two matrices. We will choose a definition of matrix multiplication that forces the equation above to hold. Let’s see how to do this.

Let
             [ a1,1  . . .  a1,n ]                [ b1,1  . . .  b1,p ]
    M(T ) =  [  ...           ... ]   and   M(S) = [  ...           ... ] .
             [ am,1  . . .  am,n ]                [ bn,1  . . .  bn,p ]
For k ∈ {1, . . . , p}, we have
    T Suk = T ( Σ_{r=1}^{n} br,k vr )
          = Σ_{r=1}^{n} br,k T vr
          = Σ_{r=1}^{n} br,k Σ_{j=1}^{m} aj,r wj
          = Σ_{j=1}^{m} ( Σ_{r=1}^{n} aj,r br,k ) wj .
Thus M(T S) is the m-by-p matrix whose entry in row j, column k, equals

    Σ_{r=1}^{n} aj,r br,k .
Now it’s clear how to define matrix multiplication so that 3.11 holds.
Namely, if A is an m-by-n matrix with entries aj,k and B is an n-by-p matrix with entries bj,k, then AB is defined to be the m-by-p matrix whose entry in row j, column k, equals

    Σ_{r=1}^{n} aj,r br,k .

In other words, the entry in row j, column k, of AB is computed by taking row j of A and column k of B, multiplying together corresponding entries, and then summing. Note that we define the product of two matrices only when the number of columns of the first matrix equals the number of rows of the second matrix.

You probably learned this definition of matrix multiplication in an earlier course, although you may not have seen this motivation for it. You should find an example to show that matrix multiplication is not commutative. In other words, AB is not necessarily equal to BA, even when both are defined.

As an example of matrix multiplication, here we multiply together a 3-by-2 matrix and a 2-by-4 matrix, obtaining a 3-by-4 matrix:

    [ 1  2 ]                       [ 10   7   4   1 ]
    [ 3  4 ]  [ 6  5  4   3 ]   =  [ 26  19  12   5 ]
    [ 5  6 ]  [ 2  1  0  −1 ]      [ 42  31  20   9 ] .
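As a quick numerical sanity check of the product just displayed, here is a one-line verification sketch (not from the text) using NumPy’s matrix product operator.

    import numpy as np

    A = np.array([[1, 2], [3, 4], [5, 6]])          # 3-by-2
    B = np.array([[6, 5, 4, 3], [2, 1, 0, -1]])     # 2-by-4
    print(A @ B)    # [[10  7  4  1], [26 19 12  5], [42 31 20  9]]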
Suppose (v1, . . . , vn) is a basis of V . If v ∈ V , then there exist unique scalars b1, . . . , bn such that
3.12        v = b1v1 + · · · + bnvn.
The matrix of v, denoted M(v), is the n-by-1 matrix defined by
3.13
            [ b1 ]
    M(v) =  [ ... ]
            [ bn ] .
Usually the basis is obvious from the context, but when the basis needs to be displayed explicitly use the notation M(v, (v1, . . . , vn)) instead of M(v).
For example, the matrix of a vector x ∈ Fn with respect to the standard basis is obtained by writing the coordinates of x as the entries in an n-by-1 matrix. In other words, if x = (x1, . . . , xn) ∈ Fn, then

            [ x1 ]
    M(x) =  [ ... ]
            [ xn ] .
The next proposition shows how the notions of the matrix of a linear
map, the matrix of a vector, and matrix multiplication fit together. In this proposition M(T v) is the matrix of the vector T v with respect to the basis (w1, . . . , wm) and M(v) is the matrix of the vector v with respect to the basis (v1, . . . , vn), whereas M(T ) is the matrix of the linear map T with respect to the bases (v1, . . . , vn) and (w1, . . . , wm).
3.14 Proposition: Suppose T ∈ L(V , W ) and (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W . Then
M(T v) = M(T )M(v)
for every v ∈ V .
Proof: Let

3.15
             [ a1,1  . . .  a1,n ]
    M(T ) =  [  ...           ... ]
             [ am,1  . . .  am,n ] .
This means, we recall, that
3.16        T vk = Σ_{j=1}^{m} aj,k wj
for each k. Let v be an arbitrary vector in V , which we can write in the form 3.12. Thus M(v) is given by 3.13. Now
    T v = b1T v1 + · · · + bnT vn
        = b1 Σ_{j=1}^{m} aj,1 wj + · · · + bn Σ_{j=1}^{m} aj,n wj
        = Σ_{j=1}^{m} (aj,1b1 + · · · + aj,nbn) wj ,
where the first equality comes from 3.12 and the second equality comes from 3.16. The last equation shows that M(T v), the m-by-1 matrix of
the vector T v with respect to the basis (w1, . . . , wm), is given by the equation

              [ a1,1b1 + · · · + a1,nbn ]
    M(T v) =  [           ...           ]
              [ am,1b1 + · · · + am,nbn ] .
This formula, along with the formulas 3.15 and 3.13 and the definition of matrix multiplication, shows that M(T v) = M(T )M(v).
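Proposition 3.14 is easy to spot-check numerically: with the standard bases, M(v) is just v written as a column, and applying T and then taking coordinates gives the same column as the matrix product. A brief sketch (not from the text), reusing the map T from the earlier example:

    import numpy as np

    M_T = np.array([[1, 3], [2, 5], [7, 9]])    # matrix of T(x, y) = (x+3y, 2x+5y, 7x+9y)
    T = lambda x, y: np.array([x + 3*y, 2*x + 5*y, 7*x + 9*y])

    v = np.array([2, -1])                       # an arbitrary vector, read as the column M(v)
    assert np.array_equal(T(*v), M_T @ v)       # M(Tv) = M(T) M(v)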
Invertibility
A linear map T ∈ L(V , W ) is called invertible if there exists a linear map S ∈ L(W , V ) such that ST equals the identity map on V and T S equals the identity map on W . A linear map S ∈ L(W , V ) satisfying ST = I and T S = I is called an inverse of T (note that the first I is the identity map on V and the second I is the identity map on W ).

If S and S′ are inverses of T , then

    S = SI = S(T S′) = (ST )S′ = IS′ = S′,

so S = S′. In other words, if T is invertible, then it has a unique inverse, which we denote by T −1. Rephrasing all this once more, if T ∈ L(V , W ) is invertible, then T −1 is the unique element of L(W , V ) such that T −1T = I and T T −1 = I. The following proposition characterizes the invertible linear maps.
3.17 Proposition: A linear map is invertible if and only if it is injective and surjective.
Proof: Suppose T ∈ L(V , W ). We need to show that T is invertible if and only if it is injective and surjective.
First suppose that T is invertible. To show that T is injective, suppose that u, v ∈ V and T u = T v. Then
u = T −1(T u) = T −1(T v) = v,
so u = v. Hence T is injective. We are still assuming that T is invertible. Now we want to prove
that T is surjective. To do this, let w ∈ W . Then w = T (T −1w), which shows that w is in the range of T . Thus range T = W , and hence T is surjective, completing this direction of the proof.
Now suppose that T is injective and surjective. We want to prove that T is invertible. For each w ∈ W , define Sw to be the unique element of V such that T (Sw) = w (the existence and uniqueness of such an element follow from the surjectivity and injectivity of T ). Clearly T S equals the identity map on W . To prove that ST equals the identity map on V , let v ∈ V . Then
T (ST v) = (T S)(T v) = I(T v) = T v.
This equation implies that ST v = v (because T is injective), and thus ST equals the identity map on V . To complete the proof, we need to show that S is linear. To do this, let w1, w2 ∈ W . Then
T (Sw1 + Sw2) = T (Sw1) + T (Sw2) = w1 + w2.
Thus Sw1 + Sw2 is the unique element of V that T maps to w1 + w2. By the definition of S, this implies that S(w1 + w2) = Sw1 + Sw2. Hence S satisfies the additive property required for linearity. The proof of homogeneity is similar. Specifically, if w ∈ W and a ∈ F, then
T (aSw) = aT (Sw) = aw.
Thus aSw is the unique element of V that T maps to aw. By the definition of S, this implies that S(aw) = aSw. Hence S is linear, as desired.
Two vector spaces are called isomorphic if there is an invertible linear map from one vector space onto the other one. As abstract vector spaces, two isomorphic spaces have the same properties. From this viewpoint, you can think of an invertible linear map as a relabeling of the elements of a vector space.
If two vector spaces are isomorphic and one of them is finite dimensional, then so is the other one. To see this, suppose that V and W are isomorphic and that T ∈ L(V , W ) is an invertible linear map. If V is finite dimensional, then so is W (by 3.4). The same reasoning, with T replaced with T −1 ∈ L(W , V ), shows that if W is finite dimensional, then so is V . Actually much more is true, as the following theorem shows.
The Greek word isos means equal; the Greek word morph means shape. Thus isomorphic literally means equal shape.
3.18 Theorem: Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.
Proof: First suppose V and W are isomorphic finite-dimensional vector spaces. Thus there exists an invertible linear map T from V onto W . Because T is invertible, we have null T = {0} and range T = W . Thus dim null T = 0 and dim range T = dim W . The formula
dim V = dim null T + dim range T
(see 3.4) thus becomes the equation dim V = dim W , completing the proof in one direction.
To prove the other direction, suppose V and W are finite-dimensional vector spaces with the same dimension. Let (v1, . . . , vn) be a basis of V and (w1, . . . , wn) be a basis of W . Let T be the linear map from V to W defined by
T (a1v1 + · · · + anvn) = a1w1 + · · · + anwn.
Then T is surjective because (w1, . . . , wn) spans W , and T is injective because (w1, . . . , wn) is linearly independent. Because T is injective and
surjective, it is invertible (see 3.17), and hence V and W are isomorphic, as desired.
Because every finite-dimensional vector space is isomorphic to some Fn, why bother with abstract vector spaces? To answer this question, note that an investigation of Fn would soon lead to vector spaces that do not equal Fn. For example, we would encounter the null space and range of linear maps, the set of matrices Mat(n, n, F), and the polynomials Pn(F). Though each of these vector spaces is isomorphic to some Fm, thinking of them that way often adds complexity but no new insight.
The last theorem implies that every finite-dimensional vector space is isomorphic to some Fn. Specifically, if V is a finite-dimensional vector space and dim V = n, then V and Fn are isomorphic.
If (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W , then for each T ∈ L(V , W ), we have a matrix M(T ) ∈ Mat(m, n, F). In other words, once bases have been fixed for V and W , M becomes a function from L(V , W ) to Mat(m, n, F). Notice that 3.9 and 3.10 show that M is
a linear map. This linear map is actually invertible, as we now show.
3.19 Proposition: Suppose that (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W . Then M is an invertible linear map between L(V , W ) and Mat(m, n, F).
Proof: We have already noted that M is linear, so we need only prove that M is injective and surjective (by 3.17). Both are easy. Let’s begin with injectivity. If T ∈ L(V , W ) and M(T ) = 0, then T vk = 0 for k = 1, . . . , n. Because (v1, . . . , vn) is a basis of V , this implies that T = 0. Thus M is injective (by 3.2).
To prove that M is surjective, let
        [ a1,1  . . .  a1,n ]
    A = [  ...           ... ]
        [ am,1  . . .  am,n ]
be a matrix in Mat(m, n, F). Let T be the linear map from V to W such
that

    T vk = Σ_{j=1}^{m} aj,k wj
for k = 1, . . . , n. Obviously M(T ) equals A, and so the range of M
equals Mat(m, n, F), as desired.
An obvious basis of Mat(m, n, F) consists of those m-by-n matrices that have 0 in all entries except for a 1 in one entry. There are mn such matrices, so the dimension of Mat(m, n, F) equals mn.
Now we can determine the dimension of the vector space of linear maps from one finite-dimensional vector space to another.
3.20 Proposition: If V and W are finite dimensional, then L(V , W ) is finite dimensional and
dim L(V , W ) = (dim V )(dim W ).
Proof: This follows from the equation dim Mat(m, n, F) = mn, 3.18, and 3.19.
A linear map from a vector space to itself is called an operator . If we want to specify the vector space, we say that a linear map T : V → V is an operator on V . Because we are so often interested in linear maps from a vector space into itself, we use the notation L(V ) to denote the set of all operators on V . In other words, L(V ) = L(V , V ).
Recall from 3.17 that a linear map is invertible if it is injective and surjective. For a linear map of a vector space into itself, you might wonder whether injectivity alone, or surjectivity alone, is enough to imply invertibility. On infinite-dimensional vector spaces neither condition alone implies invertibility. We can see this from some examples we have already considered. The multiplication by x2 operator (from P(R) to itself) is injective but not surjective. The backward shift (from F∞ to itself) is surjective but not injective. In view of these examples, the next theorem is remarkable—it states that for maps from a finitedimensional vector space to itself, either injectivity or surjectivity alone implies the other condition.
3.21 Theorem: Suppose V is finite dimensional. If T ∈ L(V ), then the following are equivalent:
(a) T is invertible;
(b) T is injective;
(c) T is surjective.
Proof: Suppose T ∈ L(V ). Clearly (a) implies (b). Now suppose (b) holds, so that T is injective. Thus null T = {0} (by 3.2). From 3.4 we have
dim range T = dim V − dim null T = dim V ,
which implies that range T equals V (see Exercise 11 in Chapter 2). Thus T is surjective. Hence (b) implies (c).
The deepest and most important parts of linear algebra, as well as most of the rest of this book, deal with operators.
Now suppose (c) holds, so that T is surjective. Thus range T = V . From 3.4 we have
dim null T = dim V − dim range T = 0,
which implies that null T equals {0}. Thus T is injective (by 3.2), and so T is invertible (we already knew that T was surjective). Hence (c) implies (a), completing the proof.
Exercises
|
||
|
||
1. Show that every linear map from a one-dimensional vector space
|
||
to itself is multiplication by some scalar. More precisely, prove that if dim V = 1 and T ∈ L(V , V ), then there exists a ∈ F such that T v = av for all v ∈ V .
|
||
|
||
2. Give an example of a function f : R2 → R such that
    f (av) = af (v)

for all a ∈ R and all v ∈ R2 but f is not linear.

Exercise 2 shows that homogeneity alone is not enough to imply that a function is a linear map. Additivity alone is also not enough to imply that a function is a linear map, although the proof of this involves advanced tools that are beyond the scope of this book.

3. Suppose that V is finite dimensional. Prove that any linear map on a subspace of V can be extended to a linear map on V . In other words, show that if U is a subspace of V and S ∈ L(U , W ), then there exists T ∈ L(V , W ) such that T u = Su for all u ∈ U .

4. Suppose that T is a linear map from V to F. Prove that if u ∈ V is not in null T , then

    V = null T ⊕ {au : a ∈ F}.
5. Suppose that T ∈ L(V , W ) is injective and (v1, . . . , vn) is linearly independent in V . Prove that (T v1, . . . , T vn) is linearly independent in W .
|
||
6. Prove that if S1, . . . , Sn are injective linear maps such that S1 . . . Sn makes sense, then S1 . . . Sn is injective.
|
||
7. Prove that if (v1, . . . , vn) spans V and T ∈ L(V , W ) is surjective, then (T v1, . . . , T vn) spans W .
|
||
8. Suppose that V is finite dimensional and that T ∈ L(V , W ). Prove that there exists a subspace U of V such that U ∩ null T = {0} and range T = {T u : u ∈ U}.
|
||
9. Prove that if T is a linear map from F4 to F2 such that
|
||
null T = {(x1, x2, x3, x4) ∈ F4 : x1 = 5x2 and x3 = 7x4},
|
||
then T is surjective.
10. Prove that there does not exist a linear map from F5 to F2 whose null space equals
|
||
{(x1, x2, x3, x4, x5) ∈ F5 : x1 = 3x2 and x3 = x4 = x5}.
|
||
|
||
11. Prove that if there exists a linear map on V whose null space and range are both finite dimensional, then V is finite dimensional.
|
||
12. Suppose that V and W are both finite dimensional. Prove that there exists a surjective linear map from V onto W if and only if dim W ≤ dim V .
|
||
13. Suppose that V and W are finite dimensional and that U is a subspace of V . Prove that there exists T ∈ L(V , W ) such that null T = U if and only if dim U ≥ dim V − dim W .
|
||
14. Suppose that W is finite dimensional and T ∈ L(V , W ). Prove that T is injective if and only if there exists S ∈ L(W , V ) such that ST is the identity map on V .
|
||
15. Suppose that V is finite dimensional and T ∈ L(V , W ). Prove that T is surjective if and only if there exists S ∈ L(W , V ) such that T S is the identity map on W .
|
||
16. Suppose that U and V are finite-dimensional vector spaces and that S ∈ L(V , W ), T ∈ L(U , V ). Prove that
|
||
dim null ST ≤ dim null S + dim null T .
|
||
|
||
17. Prove that the distributive property holds for matrix addition and matrix multiplication. In other words, suppose A, B, and C are matrices whose sizes are such that A(B + C) makes sense. Prove that AB + AC makes sense and that A(B + C) = AB + AC.
|
||
18. Prove that matrix multiplication is associative. In other words, suppose A, B, and C are matrices whose sizes are such that (AB)C makes sense. Prove that A(BC) makes sense and that (AB)C = A(BC).
19. Suppose T ∈ L(Fn, Fm) and that
             [ a1,1  . . .  a1,n ]
    M(T ) =  [  ...           ... ] ,
             [ am,1  . . .  am,n ]

where we are using the standard bases. Prove that

    T (x1, . . . , xn) = (a1,1x1 + · · · + a1,nxn, . . . , am,1x1 + · · · + am,nxn)

for every (x1, . . . , xn) ∈ Fn. This exercise shows that T has the form promised on page 39.
|
||
|
||
20. Suppose (v1, . . . , vn) is a basis of V . Prove that the function T : V → Mat(n, 1, F) defined by
|
||
T v = M(v)
|
||
is an invertible linear map of V onto Mat(n, 1, F); here M(v) is the matrix of v ∈ V with respect to the basis (v1, . . . , vn).
|
||
|
||
21. Prove that every linear map from Mat(n, 1, F) to Mat(m, 1, F) is given by a matrix multiplication. In other words, prove that if T ∈ L(Mat(n, 1, F), Mat(m, 1, F)), then there exists an m-by-n matrix A such that T B = AB for every B ∈ Mat(n, 1, F).
|
||
22. Suppose that V is finite dimensional and S, T ∈ L(V ). Prove that ST is invertible if and only if both S and T are invertible.
|
||
23. Suppose that V is finite dimensional and S, T ∈ L(V ). Prove that ST = I if and only if T S = I.
|
||
24. Suppose that V is finite dimensional and T ∈ L(V ). Prove that T is a scalar multiple of the identity if and only if ST = T S for every S ∈ L(V ).
|
||
|
||
25. Prove that if V is finite dimensional with dim V > 1, then the set of noninvertible operators on V is not a subspace of L(V ).
26. Suppose n is a positive integer and ai,j ∈ F for i, j = 1, . . . , n. Prove that the following are equivalent:
|
||
(a) The trivial solution x1 = · · · = xn = 0 is the only solution to the homogeneous system of equations
    Σ_{k=1}^{n} a1,k xk = 0
            ...
    Σ_{k=1}^{n} an,k xk = 0.
(b) For every c1, . . . , cn ∈ F, there exists a solution to the system of equations
    Σ_{k=1}^{n} a1,k xk = c1
            ...
    Σ_{k=1}^{n} an,k xk = cn.
Note that here we have the same number of equations as variables.
|
||
|
||
Chapter 4
|
||
Polynomials
|
||
This short chapter contains no linear algebra. It does contain the background material on polynomials that we will need in our study of linear maps from a vector space to itself. Many of the results in this chapter will already be familiar to you from other courses; they are included here for completeness. Because this chapter is not about linear algebra, your instructor may go through it rapidly. You may not be asked to scrutinize all the proofs. Make sure, however, that you at least read and understand the statements of all the results in this chapter—they will be used in the rest of the book.
|
||
Recall that F denotes R or C.
Degree
|
||
|
||
When necessary, use the obvious arithmetic with −∞. For example,
|
||
−∞ < m and −∞ + m = −∞ for every integer m. The 0 polynomial is declared to have degree −∞ so that exceptions are not needed for various reasonable results. For example, the degree of pq equals the degree of p plus the degree of q
|
||
even if p = 0.
|
||
|
||
Recall that a function p : F → F is called a polynomial with coefficients in F if there exist a0, . . . , am ∈ F such that
|
||
p(z) = a0 + a1z + a2z2 + · · · + amzm
|
||
for all z ∈ F. If p can be written in the form above with am ≠ 0, then we say that p has degree m. If all the coefficients a0, . . . , am equal 0, then we say that p has degree −∞. For all we know at this stage, a polynomial may have more than one degree because we have not yet proved that the coefficients in the equation above are uniquely determined by the function p.
|
||
Recall that P(F) denotes the vector space of all polynomials with coefficients in F and that Pm(F) is the subspace of P(F) consisting of the polynomials with coefficients in F and degree at most m. A number λ ∈ F is called a root of a polynomial p ∈ P(F) if
|
||
p(λ) = 0.
|
||
|
||
Roots play a crucial role in the study of polynomials. We begin by
|
||
showing that λ is a root of p if and only if p is a polynomial multiple of z − λ.
|
||
|
||
4.1 Proposition: Suppose p ∈ P(F) is a polynomial with degree m ≥ 1. Let λ ∈ F. Then λ is a root of p if and only if there is a polynomial q ∈ P(F) with degree m − 1 such that
4.2        p(z) = (z − λ)q(z)
for all z ∈ F.
|
||
|
||
Proof: One direction is obvious. Namely, suppose there is a polynomial q ∈ P(F) such that 4.2 holds. Then
|
||
p(λ) = (λ − λ)q(λ) = 0,
|
||
and hence λ is a root of p, as desired. To prove the other direction, suppose that λ ∈ F is a root of p. Let
|
||
a0, . . . , am ∈ F be such that am ≠ 0 and
|
||
p(z) = a0 + a1z + a2z2 + · · · + amzm
for all z ∈ F. Because p(λ) = 0, we have 0 = a0 + a1λ + a2λ2 + · · · + amλm.
|
||
Subtracting the last two equations, we get p(z) = a1(z − λ) + a2(z2 − λ2) + · · · + am(zm − λm)
|
||
for all z ∈ F. For each j = 2, . . . , m, we can write zj − λj = (z − λ)qj−1(z)
|
||
for all z ∈ F, where qj−1 is a polynomial with degree j − 1 (specifically, take qj−1(z) = zj−1 + zj−2λ + · · · + zλj−2 + λj−1). Thus
|
||
    p(z) = (z − λ) q(z),    where q(z) = a1 + a2q2(z) + · · · + amqm−1(z),
for all z ∈ F. Clearly q is a polynomial with degree m − 1, as desired.
|
||
|
||
Now we can prove that polynomials do not have too many roots.
|
||
4.3 Corollary: Suppose p ∈ P(F) is a polynomial with degree m ≥ 0. Then p has at most m distinct roots in F.
|
||
Proof: If m = 0, then p(z) = a0 ≠ 0 and so p has no roots. If m = 1, then p(z) = a0 + a1z, with a1 ≠ 0, and p has exactly one root, namely, −a0/a1. Now suppose m > 1. We use induction on m, assuming that every polynomial with degree m − 1 has at most m − 1 distinct roots. If p has no roots in F, then we are done. If p has a root λ ∈ F, then by 4.1 there is a polynomial q with degree m − 1 such that
|
||
p(z) = (z − λ)q(z)
|
||
for all z ∈ F. The equation above shows that if p(z) = 0, then either z = λ or q(z) = 0. In other words, the roots of p consist of λ and the roots of q. By our induction hypothesis, q has at most m − 1 distinct roots in F. Thus p has at most m distinct roots in F.
|
||
|
||
The next result states that if a polynomial is identically 0, then all its coefficients must be 0.
4.4 Corollary: Suppose a0, . . . , am ∈ F. If a0 + a1z + a2z2 + · · · + amzm = 0
|
||
for all z ∈ F, then a0 = · · · = am = 0.
|
||
Proof: Suppose a0 +a1z +a2z2 +· · ·+amzm equals 0 for all z ∈ F. By 4.3, no nonnegative integer can be the degree of this polynomial. Thus all the coefficients equal 0.
|
||
|
||
The corollary above implies that (1, z, . . . , zm) is linearly independent in P(F) for every nonnegative integer m. We had noted this earlier (in Chapter 2), but now we have a complete proof. This linear independence implies that each polynomial can be represented in only one way as a linear combination of functions of the form zj. In particular, the degree of a polynomial is unique.
|
||
If p and q are nonnegative integers, with p ≠ 0, then there exist nonnegative integers s and r such that
|
||
q = sp + r.
|
||
|
||
and r < p. Think of dividing q by p, getting s with remainder r . Our next task is to prove an analogous result for polynomials.
|
||
Let deg p denote the degree of a polynomial p. The next result is often called the division algorithm, though as stated here it is not really an algorithm, just a useful lemma.
|
||
|
||
Think of 4.6 as giving the remainder r when
|
||
q is divided by p.
|
||
|
||
4.5 Division Algorithm: Suppose p, q ∈ P(F), with p ≠ 0. Then there exist polynomials s, r ∈ P(F) such that
4.6        q = sp + r
and deg r < deg p.
|
||
|
||
Proof: Choose s ∈ P(F) such that q − sp has degree as small as possible. Let r = q − sp. Thus 4.6 holds, and all that remains is to show that deg r < deg p. Suppose that deg r ≥ deg p. If c ∈ F and j is a nonnegative integer, then
|
||
q − (s + czj)p = r − czjp.
|
||
|
||
Choose j and c so that the polynomial on the right side of this equation has degree less than deg r (specifically, take j = deg r − deg p and then
choose c so that the coefficients of zdeg r in r and in czjp are equal). This contradicts our choice of s as the polynomial that produces the smallest degree for expressions of the form q − sp, completing the proof.
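Numerically, the s and r of the division algorithm are produced by ordinary polynomial long division. A small sketch (not from the text) using NumPy’s polynomial class, assuming NumPy is available; the particular p and q are arbitrary illustrative choices.

    from numpy.polynomial import Polynomial
    import numpy as np

    q = Polynomial([1, 0, 2, 0, 1])      # q(z) = 1 + 2 z^2 + z^4
    p = Polynomial([1, 1])               # p(z) = 1 + z
    s, r = divmod(q, p)                  # polynomial long division
    assert np.allclose((s * p + r).coef, q.coef)   # q = s p + r
    assert r.degree() < p.degree()                 # deg r < deg p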
Complex Coefficients
|
||
|
||
So far we have been handling polynomials with complex coefficients and polynomials with real coefficients simultaneously through our convention that F denotes R or C. Now we will see some differences between these two cases. In this section we treat polynomials with complex coefficients. In the next section we will use our results about polynomials with complex coefficients to prove corresponding results for polynomials with real coefficients.
|
||
Though this chapter contains no linear algebra, the results so far have nonetheless been proved using algebra. The next result, though called the fundamental theorem of algebra, requires analysis for its proof. The short proof presented here uses tools from complex analysis. If you have not had a course in complex analysis, this proof will almost certainly be meaningless to you. In that case, just accept the fundamental theorem of algebra as something that we need to use but whose proof requires more advanced tools that you may learn in later courses.
|
||
|
||
4.7 Fundamental Theorem of Algebra: Every nonconstant polynomial with complex coefficients has a root.
|
||
Proof: Let p be a nonconstant polynomial with complex coefficients. Suppose that p has no roots. Then 1/p is an analytic function on C. Furthermore, |p(z)| → ∞ as |z| → ∞, which implies that 1/p → 0 as |z| → ∞. Thus 1/p is a bounded analytic function on C. By Liouville’s theorem, any such function must be constant. But if 1/p is constant, then p is constant, contradicting our assumption that p is nonconstant.
|
||
The fundamental theorem of algebra leads to the following factorization result for polynomials with complex coefficients. Note that in this factorization, the numbers λ1, . . . , λm are precisely the roots of p, for these are the only values of z for which the right side of 4.9 equals 0.
|
||
|
||
This is an existence theorem. The quadratic formula gives the roots explicitly for polynomials of degree 2. Similar but more complicated formulas exist for polynomials of degree 3 and 4. No such formulas exist for polynomials of degree 5 and above.
4.8 Corollary: If p ∈ P(C) is a nonconstant polynomial, then p has a unique factorization (except for the order of the factors) of the form
4.9        p(z) = c(z − λ1) . . . (z − λm),
where c, λ1, . . . , λm ∈ C.
|
||
|
||
Proof: Let p ∈ P(C) and let m denote the degree of p. We will use induction on m. If m = 1, then clearly the desired factorization exists
|
||
and is unique. So assume that m > 1 and that the desired factorization exists and is unique for all polynomials of degree m − 1.
|
||
First we will show that the desired factorization of p exists. By the
|
||
fundamental theorem of algebra (4.7), p has a root λ. By 4.1, there is a polynomial q with degree m − 1 such that
|
||
|
||
p(z) = (z − λ)q(z)
|
||
|
||
for all z ∈ C. Our induction hypothesis implies that q has the desired factorization, which when plugged into the equation above gives the desired factorization of p.
|
||
Now we turn to the question of uniqueness. Clearly c is uniquely determined by 4.9—it must equal the coefficient of zm in p. So we need only show that except for the order, there is only one way to choose λ1, . . . , λm. If
|
||
|
||
(z − λ1) . . . (z − λm) = (z − τ1) . . . (z − τm)
|
||
for all z ∈ C, then because the left side of the equation above equals 0 when z = λ1, one of the τ’s on the right side must equal λ1. Relabeling, we can assume that τ1 = λ1. Now for z = λ1, we can divide both sides of the equation above by z − λ1, getting
|
||
(z − λ2) . . . (z − λm) = (z − τ2) . . . (z − τm)
|
||
for all z ∈ C except possibly z = λ1. Actually the equation above must hold for all z ∈ C because otherwise by subtracting the right side from the left side we would get a nonzero polynomial that has infinitely many roots. The equation above and our induction hypothesis imply that except for the order, the λ’s are the same as the τ’s, completing the proof of the uniqueness.
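For a concrete polynomial, the factorization 4.9 can be computed numerically from the roots. A sketch (not from the text) using numpy.roots; the sample polynomial is an arbitrary illustrative choice.

    import numpy as np

    # p(z) = 2 z^3 - 2 z, whose factorization is 2 (z - 0)(z - 1)(z + 1)
    coeffs = [2, 0, -2, 0]                 # highest-degree coefficient first
    roots = np.roots(coeffs)               # approximately -1, 0, 1
    c = coeffs[0]
    # reconstruct p at a test point from c and the roots
    z = 0.37
    assert np.isclose(np.polyval(coeffs, z), c * np.prod([z - r for r in roots]))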
Real Coefficients
|
||
|
||
Before discussing polynomials with real coefficients, we need to learn a bit more about the complex numbers.
|
||
Suppose z = a + bi, where a and b are real numbers. Then a is called the real part of z, denoted Re z, and b is called the imaginary part of z, denoted Im z. Thus for every complex number z, we have
|
||
z = Re z + (Im z)i.
|
||
The complex conjugate of z ∈ C, denoted z¯, is defined by
|
||
z¯ = Re z − (Im z)i.
|
||
For example, the complex conjugate of 2 + 3i equals 2 − 3i. The absolute value of a complex number z, denoted |z|, is defined
|
||
by |z| = √((Re z)2 + (Im z)2).
|
||
For example, |1 + 2i| = √5. Note that |z| is always a nonnegative number.
|
||
You should verify that the real and imaginary parts, absolute value, and complex conjugate have the following properties:
|
||
|
||
Note that z = z¯ if and only if z is a real number.
|
||
|
||
additivity of real part Re(w + z) = Re w + Re z for all w, z ∈ C;
|
||
|
||
additivity of imaginary part Im(w + z) = Im w + Im z for all w, z ∈ C;
|
||
|
||
sum of z and z¯ z + z¯ = 2 Re z for all z ∈ C;
|
||
|
||
difference of z and z¯ z − z¯ = 2(Im z)i for all z ∈ C;
|
||
|
||
product of z and z¯ zz¯ = |z|2 for all z ∈ C;
|
||
|
||
additivity of complex conjugate (w + z)¯ = w¯ + z¯ for all w, z ∈ C;
|
||
|
||
multiplicativity of complex conjugate (wz)¯ = w¯ z¯ for all w, z ∈ C;
conjugate of conjugate (z¯)¯ = z for all z ∈ C;
|
||
multiplicativity of absolute value |wz| = |w| |z| for all w, z ∈ C.
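These identities are easy to spot-check numerically before verifying them algebraically; a throwaway sketch (not from the text), with the two sample numbers chosen arbitrarily:

    import math

    w, z = 2 + 3j, -1 + 4j
    assert (w + z).conjugate() == w.conjugate() + z.conjugate()    # additivity of conjugate
    assert (w * z).conjugate() == w.conjugate() * z.conjugate()    # multiplicativity of conjugate
    assert z + z.conjugate() == 2 * z.real                         # z + z¯ = 2 Re z
    assert math.isclose((z * z.conjugate()).real, abs(z) ** 2)     # z z¯ = |z|^2
    assert math.isclose(abs(w * z), abs(w) * abs(z))               # |wz| = |w| |z|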
|
||
|
||
In the next result, we need to think of a polynomial with real coefficients as an element of P(C). This makes sense because every real
|
||
number is also a complex number.
|
||
|
||
A polynomial with real coefficients may have no real roots. For example, the polynomial 1 + x2 has no real roots. The failure of the fundamental theorem of algebra for R accounts for the differences between operators on real and
|
||
complex vector spaces, as we will see in later chapters.
|
||
|
||
4.10 Proposition: Suppose p is a polynomial with real coefficients. If λ ∈ C is a root of p, then so is λ¯.
|
||
|
||
Proof: Let
|
||
|
||
p(z) = a0 + a1z + · · · + amzm,
|
||
|
||
where a0, . . . , am are real numbers. Suppose λ ∈ C is a root of p. Then
|
||
|
||
a0 + a1λ + · · · + amλm = 0.
|
||
|
||
Take the complex conjugate of both sides of this equation, obtaining
|
||
|
||
a0 + a1λ¯ + · · · + amλ¯m = 0,
|
||
|
||
where we have used some of the basic properties of complex conjugation listed earlier. The equation above shows that λ¯ is a root of p.
|
||
|
||
We want to prove a factorization theorem for polynomials with real coefficients. To do this, we begin by characterizing the polynomials with real coefficients and degree 2 that can be written as the product of two polynomials with real coefficients and degree 1.
|
||
|
||
Think about the connection between the
|
||
quadratic formula and this proposition.
|
||
|
||
4.11 Proposition: Let α, β ∈ R. Then there is a polynomial factorization of the form
4.12        x2 + αx + β = (x − λ1)(x − λ2),
with λ1, λ2 ∈ R, if and only if α2 ≥ 4β.
|
||
|
||
Proof: Notice that
4.13        x2 + αx + β = (x + α/2)2 + (β − α2/4).
First suppose that α2 < 4β. Then clearly the right side of the
|
||
|
||
equation above is positive for every x ∈ R, and hence the polynomial
|
||
|
||
x2 + αx + β has no real roots. Thus no factorization of the form 4.12,
|
||
|
||
with λ1, λ2 ∈ R, can exist.
|
||
|
||
Conversely, now suppose that α2 ≥ 4β. Thus there is a real number
c such that c2 = α2/4 − β. From 4.13, we have

    x2 + αx + β = (x + α/2)2 − c2 = (x + α/2 + c)(x + α/2 − c),
which gives the desired factorization.
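The proof is constructive: when α2 ≥ 4β the two roots are −α/2 ± c with c = √(α2/4 − β). A quick sketch (not from the text); the function name and the sample values of α and β are arbitrary illustrative choices.

    import math

    def real_factorization(alpha, beta):
        # return (l1, l2) with x^2 + alpha x + beta = (x - l1)(x - l2), or None if none exists
        if alpha ** 2 < 4 * beta:
            return None                      # no factorization over R (4.11)
        c = math.sqrt(alpha ** 2 / 4 - beta)
        return (-alpha / 2 + c, -alpha / 2 - c)

    print(real_factorization(-5, 6))   # (3.0, 2.0), since x^2 - 5x + 6 = (x - 3)(x - 2)
    print(real_factorization(0, 1))    # None, since x^2 + 1 has no real roots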
|
||
|
||
In the following theorem, each term of the form x2 + αjx + βj, with αj2 < 4βj, cannot be factored into the product of two polynomials with real coefficients and degree 1 (by 4.11). Note that in the factorization below, the numbers λ1, . . . , λm are precisely the real roots of p, for these are the only real values of x for which the right side of the equation below equals 0.
|
||
|
||
4.14 Theorem: If p ∈ P(R) is a nonconstant polynomial, then p has a unique factorization (except for the order of the factors) of the form
|
||
p(x) = c(x − λ1) . . . (x − λm)(x2 + α1x + β1) . . . (x2 + αM x + βM ),
|
||
where c, λ1, . . . , λm ∈ R and (α1, β1), . . . , (αM , βM ) ∈ R2 with αj2 < 4βj for each j.
|
||
|
||
Here either m or M may equal 0.
|
||
|
||
Proof: Let p ∈ P(R) be a nonconstant polynomial. We can think of p as an element of P(C) (because every real number is a complex number). The idea of the proof is to use the factorization 4.8 of p as a polynomial with complex coefficients. Complex but nonreal roots of p come in pairs; see 4.10. Thus if the factorization of p as an element of P(C) includes terms of the form (x − λ) with λ a nonreal complex number, then (x − λ¯) is also a term in the factorization. Combining these two terms, we get a quadratic term of the required form.
|
||
The idea sketched in the paragraph above almost provides a proof of the existence of our desired factorization. However, we need to be careful about one point. Suppose λ is a nonreal complex number
and (x − λ) is a term in the factorization of p as an element of P(C). We are guaranteed by 4.10 that (x − λ¯) also appears as a term in the factorization, but 4.10 does not state that these two factors appear the same number of times, as needed to make the idea above work. However, all is well. We can write

    p(x) = (x − λ)(x − λ¯)q(x) = (x2 − 2(Re λ)x + |λ|2) q(x)

for some polynomial q ∈ P(C) with degree two less than the degree of p. Here we are not dividing by 0 because the roots of x2 − 2(Re λ)x + |λ|2 are λ and λ¯, neither of which is real. If we can prove that q has real coefficients, then, by using induction on the degree of p, we can conclude that (x − λ) appears in the factorization of p exactly as many times as (x − λ¯).
|
||
|
||
To prove that q has real coefficients, we solve the equation above
|
||
|
||
for q, getting
    q(x) = p(x) / (x2 − 2(Re λ)x + |λ|2)
for all x ∈ R. The equation above implies that q(x) ∈ R for all x ∈ R.
|
||
|
||
Writing
|
||
|
||
q(x) = a0 + a1x + · · · + an−2xn−2,
|
||
|
||
where a0, . . . , an−2 ∈ C, we thus have
|
||
|
||
0 = Im q(x) = (Im a0) + (Im a1)x + · · · + (Im an−2)xn−2
|
||
|
||
for all x ∈ R. This implies that Im a0, . . . , Im an−2 all equal 0 (by 4.4). Thus all the coefficients of q are real, as desired, and hence the desired
|
||
factorization exists.
|
||
Now we turn to the question of uniqueness of our factorization. A factor of p of the form x2 +αx +β with α2 < 4β can be uniquely written as (x − λ)(x − λ¯) with λ ∈ C. A moment’s thought shows that two different factorizations of p as an element of P(R) would lead to two different factorizations of p as an element of P(C), contradicting 4.8.
Exercises
|
||
1. Suppose m and n are positive integers with m ≤ n. Prove that there exists a polynomial p ∈ Pn(F) with exactly m distinct roots.
|
||
2. Suppose that z1, . . . , zm+1 are distinct elements of F and that w1, . . . , wm+1 ∈ F. Prove that there exists a unique polynomial p ∈ Pm(F) such that p(zj) = wj
|
||
for j = 1, . . . , m + 1.
|
||
3. Prove that if p, q ∈ P(F), with p ≠ 0, then there exist unique polynomials s, r ∈ P(F) such that
|
||
q = sp + r
|
||
|
||
and deg r < deg p. In other words, add a uniqueness statement to the division algorithm (4.5).
|
||
4. Suppose p ∈ P(C) has degree m. Prove that p has m distinct roots if and only if p and its derivative p′ have no roots in common.
|
||
5. Prove that every polynomial with odd degree and real coefficients has a real root.
|
||
|
||
Chapter 5
|
||
Eigenvalues and Eigenvectors
|
||
In Chapter 3 we studied linear maps from one vector space to another vector space. Now we begin our investigation of linear maps from a vector space to itself. Their study constitutes the deepest and most important part of linear algebra. Most of the key results in this area do not hold for infinite-dimensional vector spaces, so we work only on finite-dimensional vector spaces. To avoid trivialities we also want to eliminate the vector space {0} from consideration. Thus we make the following assumption:
|
||
Recall that F denotes R or C. Let’s agree that for the rest of the book V will denote a finite-dimensional, nonzero vector space over F.
Invariant Subspaces
|
||
|
||
The most famous unsolved problem in functional analysis is
|
||
called the invariant subspace problem. It
|
||
deals with invariant subspaces of operators on infinite-dimensional
|
||
vector spaces.
|
||
|
||
In this chapter we develop the tools that will help us understand the
|
||
structure of operators. Recall that an operator is a linear map from a
|
||
vector space to itself. Recall also that we denote the set of operators on V by L(V ); in other words, L(V ) = L(V , V ).
|
||
Let’s see how we might better understand what an operator looks like. Suppose T ∈ L(V ). If we have a direct sum decomposition
5.1        V = U1 ⊕ · · · ⊕ Um,
where each Uj is a proper subspace of V , then to understand the behavior of T , we need only understand the behavior of each T |Uj ; here T |Uj denotes the restriction of T to the smaller domain Uj. Dealing with T |Uj should be easier than dealing with T because Uj is a smaller vector space than V . However, if we intend to apply tools useful in the
|
||
study of operators (such as taking powers), then we have a problem: T |Uj may not map Uj into itself; in other words, T |Uj may not be an operator on Uj. Thus we are led to consider only decompositions of
|
||
the form 5.1 where T maps each Uj into itself.
|
||
The notion of a subspace that gets mapped into itself is sufficiently important to deserve a name. Thus, for T ∈ L(V ) and U a subspace of V , we say that U is invariant under T if u ∈ U implies T u ∈ U. In other words, U is invariant under T if T |U is an operator on U . For example, if T is the operator of differentiation on P7(R), then P4(R) (which is a subspace of P7(R)) is invariant under T because the deriva-
|
||
tive of any polynomial of degree at most 4 is also a polynomial with
|
||
degree at most 4.
|
||
Let’s look at some easy examples of invariant subspaces. Suppose T ∈ L(V ). Clearly {0} is invariant under T . Also, the whole space V is
|
||
obviously invariant under T . Must T have any invariant subspaces other than {0} and V ? Later we will see that this question has an affirmative
|
||
answer for operators on complex vector spaces with dimension greater
|
||
than 1 and also for operators on real vector spaces with dimension
|
||
greater than 2. If T ∈ L(V ), then null T is invariant under T (proof: if u ∈ null T ,
|
||
then T u = 0, and hence T u ∈ null T ). Also, range T is invariant under T (proof: if u ∈ range T , then T u is also in range T , by the definition of
|
||
range). Although null T and range T are invariant under T , they do not
|
||
necessarily provide easy answers to the question about the existence
of invariant subspaces other than {0} and V because null T may equal {0} and range T may equal V (this happens when T is invertible).
|
||
We will return later to a deeper study of invariant subspaces. Now we turn to an investigation of the simplest possible nontrivial invariant subspaces—invariant subspaces with dimension 1.
|
||
How does an operator behave on an invariant subspace of dimension 1? Subspaces of V of dimension 1 are easy to describe. Take any nonzero vector u ∈ V and let U equal the set of all scalar multiples of u:
5.2        U = {au : a ∈ F}.
Then U is a one-dimensional subspace of V , and every one-dimensional subspace of V is of this form. If u ∈ V and the subspace U defined by 5.2 is invariant under T ∈ L(V ), then T u must be in U , and hence there must be a scalar λ ∈ F such that T u = λu. Conversely, if u is a nonzero vector in V such that T u = λu for some λ ∈ F, then the
subspace U defined by 5.2 is a one-dimensional subspace of V invariant under T . (These subspaces are loosely connected to the subject of Herbert Marcuse’s well-known book One-Dimensional Man.)

The equation

5.3        T u = λu,
which we have just seen is intimately connected with one-dimensional
|
||
invariant subspaces, is important enough that the vectors u and scalars λ satisfying it are given special names. Specifically, a scalar λ ∈ F is called an eigenvalue of T ∈ L(V ) if there exists a nonzero vector u ∈ V such that T u = λu. We must require u to be nonzero because with u = 0 every scalar λ ∈ F satisfies 5.3. The comments above show
|
||
that T has a one-dimensional invariant subspace if and only if T has
|
||
an eigenvalue. The equation T u = λu is equivalent to (T − λI)u = 0, so λ is an
|
||
eigenvalue of T if and only if T − λI is not injective. By 3.21, λ is an eigenvalue of T if and only if T − λI is not invertible, and this happens if and only if T − λI is not surjective.
|
||
Suppose T ∈ L(V ) and λ ∈ F is an eigenvalue of T . A vector u ∈ V is called an eigenvector of T (corresponding to λ) if T u = λu. Because 5.3 is equivalent to (T − λI)u = 0, we see that the set of eigenvectors of T corresponding to λ equals null(T − λI). In particular, the set of
|
||
eigenvectors of T corresponding to λ is a subspace of V .
|
||
|
||
The regrettable word eigenvalue is half-German, half-English. The German adjective eigen means own in the sense of characterizing some intrinsic property. Some mathematicians use the term characteristic value instead of eigenvalue.
Some texts define eigenvectors as we have, except that 0 is declared not to be an eigenvector. With the definition used here, the set of eigenvectors corresponding to a fixed eigenvalue is a
|
||
subspace.
|
||
|
||
Let’s look at some examples of eigenvalues and eigenvectors. If a ∈ F, then aI has only one eigenvalue, namely, a, and every vector is
|
||
an eigenvector for this eigenvalue. For a more complicated example, consider the operator T ∈ L(F2)
|
||
defined by
5.4        T (w, z) = (−z, w).
If F = R, then this operator has a nice geometric interpretation: T is just a counterclockwise rotation by 90◦ about the origin in R2. An
|
||
operator has an eigenvalue if and only if there exists a nonzero vector
|
||
in its domain that gets sent by the operator to a scalar multiple of itself. The rotation of a nonzero vector in R2 obviously never equals a scalar multiple of itself. Conclusion: if F = R, the operator T defined by 5.4 has no eigenvalues. However, if F = C, the story changes. To find
|
||
eigenvalues of T , we must find the scalars λ such that
|
||
|
||
T (w, z) = λ(w, z)
|
||
|
||
has some solution other than w = z = 0. For T defined by 5.4, the equation above is equivalent to the simultaneous equations
5.5        −z = λw,    w = λz.
Substituting the value for w given by the second equation into the first equation gives
|
||
−z = λ2z.
|
||
Now z cannot equal 0 (otherwise 5.5 implies that w = 0; we are looking for solutions to 5.5 where (w, z) is not the 0 vector), so the equation above leads to the equation
|
||
−1 = λ2.
|
||
The solutions to this equation are λ = i or λ = −i. You should be able to verify easily that i and −i are eigenvalues of T . Indeed, the eigenvectors corresponding to the eigenvalue i are the vectors of the form (w, −wi), with w ∈ C, and the eigenvectors corresponding to the eigenvalue −i are the vectors of the form (w, wi), with w ∈ C.
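This computation is easy to reproduce numerically. A sketch (not from the text): over C, the matrix of T with respect to the standard basis is the rotation matrix below, and NumPy recovers the eigenvalues i and −i (possibly in the other order).

    import numpy as np

    T = np.array([[0.0, -1.0],
                  [1.0,  0.0]])         # matrix of T(w, z) = (-z, w) in the standard basis
    eigenvalues, eigenvectors = np.linalg.eig(T)
    print(eigenvalues)                   # approximately [0.+1.j  0.-1.j]
    # over R no eigenvalue exists; numerically this shows up as the eigenvalues being nonreal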
|
||
Now we show that nonzero eigenvectors corresponding to distinct eigenvalues are linearly independent.
5.6 Theorem: Let T ∈ L(V ). Suppose λ1, . . . , λm are distinct eigenvalues of T and v1, . . . , vm are corresponding nonzero eigenvectors. Then (v1, . . . , vm) is linearly independent.
|
||
|
||
Proof: Suppose (v1, . . . , vm) is linearly dependent. Let k be the smallest positive integer such that
5.7        vk ∈ span(v1, . . . , vk−1);
the existence of k with this property follows from the linear dependence lemma (2.4). Thus there exist a1, . . . , ak−1 ∈ F such that
5.8        vk = a1v1 + · · · + ak−1vk−1.
Apply T to both sides of this equation, getting
|
||
|
||
λkvk = a1λ1v1 + · · · + ak−1λk−1vk−1.
|
||
|
||
Multiply both sides of 5.8 by λk and then subtract the equation above, getting
|
||
0 = a1(λk − λ1)v1 + · · · + ak−1(λk − λk−1)vk−1.
|
||
Because we chose k to be the smallest positive integer satisfying 5.7, (v1, . . . , vk−1) is linearly independent. Thus the equation above implies that all the a’s are 0 (recall that λk is not equal to any of λ1, . . . , λk−1). However, this means that vk equals 0 (see 5.8), contradicting our hypothesis that all the v’s are nonzero. Therefore our assumption that (v1, . . . , vm) is linearly dependent must have been false.
|
||
|
||
The corollary below states that an operator cannot have more distinct eigenvalues than the dimension of the vector space on which it acts.
|
||
|
||
5.9 Corollary: Each operator on V has at most dim V distinct eigenvalues.
|
||
Proof: Let T ∈ L(V ). Suppose that λ1, . . . , λm are distinct eigenvalues of T . Let v1, . . . , vm be corresponding nonzero eigenvectors. The last theorem implies that (v1, . . . , vm) is linearly independent. Thus m ≤ dim V (see 2.6), as desired.
Polynomials Applied to Operators
|
||
|
||
The main reason that a richer theory exists for operators (which map a vector space into itself) than for linear maps is that operators can be raised to powers. In this section we define that notion and the key concept of applying a polynomial to an operator.
|
||
If T ∈ L(V ), then T T makes sense and is also in L(V ). We usually write T 2 instead of T T . More generally, if m is a positive integer, then T m is defined by
|
||
    T m = T · · · T    (m times).
For convenience we define T 0 to be the identity operator I on V . Recall from Chapter 3 that if T is an invertible operator, then the
|
||
inverse of T is denoted by T −1. If m is a positive integer, then we define T −m to be (T −1)m.
|
||
You should verify that if T is an operator, then
|
||
T mT n = T m+n and (T m)n = T mn,
|
||
where m and n are allowed to be arbitrary integers if T is invertible and nonnegative integers if T is not invertible.
|
||
If T ∈ L(V ) and p ∈ P(F) is a polynomial given by
|
||
p(z) = a0 + a1z + a2z2 + · · · + amzm
|
||
for z ∈ F, then p(T ) is the operator defined by
|
||
p(T ) = a0I + a1T + a2T 2 + · · · + amT m.
|
||
For example, if p is the polynomial defined by p(z) = z2 for z ∈ F, then p(T ) = T 2. This is a new use of the symbol p because we are applying it to operators, not just elements of F. If we fix an operator T ∈ L(V ), then the function from P(F) to L(V ) given by p ↦ p(T ) is linear, as you should verify.
|
||
If p and q are polynomials with coefficients in F, then pq is the polynomial defined by
|
||
(pq)(z) = p(z)q(z)
|
||
for z ∈ F. You should verify that we have the following nice multiplicative property: if T ∈ L(V ), then
(pq)(T ) = p(T )q(T )
|
||
|
||
for all polynomials p and q with coefficients in F. Note that any two polynomials in T commute, meaning that p(T )q(T ) = q(T )p(T ), be-
|
||
|
||
cause
|
||
|
||
p(T )q(T ) = (pq)(T ) = (qp)(T ) = q(T )p(T ).
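Applying a polynomial to an operator is easy to experiment with on matrices. A sketch (not from the text): apply_poly below is an illustrative helper, and the final check is the identity (pq)(T ) = p(T )q(T ) for one arbitrary choice of p, q, and T .

    import numpy as np

    def apply_poly(coeffs, T):
        # p(T) = a0 I + a1 T + a2 T^2 + ..., with coeffs = [a0, a1, a2, ...]
        result = np.zeros_like(T, dtype=float)
        power = np.eye(T.shape[0])
        for a in coeffs:
            result = result + a * power
            power = power @ T
        return result

    T = np.array([[2.0, 1.0], [0.0, 3.0]])
    p, q = [1, 2], [0, 0, 1]                      # p(z) = 1 + 2z,  q(z) = z^2
    pq = [0, 0, 1, 2]                             # (pq)(z) = z^2 + 2 z^3
    assert np.allclose(apply_poly(pq, T), apply_poly(p, T) @ apply_poly(q, T))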
|
||
|
||
Upper-Triangular Matrices
|
||
|
||
Now we come to one of the central results about operators on complex vector spaces.
|
||
|
||
5.10 Theorem: Every operator on a finite-dimensional, nonzero, complex vector space has an eigenvalue.
|
||
Proof: Suppose V is a complex vector space with dimension n > 0 and T ∈ L(V ). Choose v ∈ V with v ≠ 0. Then
|
||
(v, T v, T 2v, . . . , T nv)
|
||
cannot be linearly independent because V has dimension n and we have n + 1 vectors. Thus there exist complex numbers a0, . . . , an, not all 0, such that
|
||
0 = a0v + a1T v + · · · + anT nv. Let m be the largest index such that am ≠ 0. Because v ≠ 0, the coefficients a1, . . . , am cannot all be 0, so 0 < m ≤ n. Make the a’s the coefficients of a polynomial, which can be written in factored form (see 4.8) as
|
||
a0 + a1z + · · · + anzn = c(z − λ1) . . . (z − λm),
|
||
where c is a nonzero complex number, each λj ∈ C, and the equation holds for all z ∈ C. We then have
|
||
0 = a0v + a1T v + · · · + anT nv = (a0I + a1T + · · · + anT n)v = c(T − λ1I) . . . (T − λmI)v,
|
||
which means that T − λjI is not injective for at least one j. In other words, T has an eigenvalue.
|
||
|
||
Compare the simple proof of this theorem given here with the standard proof using determinants. With the standard proof, first the difficult concept of determinants must be defined, then an operator with 0 determinant must be shown to be not invertible, then the characteristic polynomial needs to be defined, and by the time the proof of this theorem is reached, no insight remains about why it is true.
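The argument in the proof is completely constructive, and it is instructive to run it on a random matrix. The sketch below (not from the text) finds a dependence among v, T v, . . . , T nv, factors the resulting polynomial with numpy.roots, and checks that one of the factors T − λI is singular; the variable names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    # the columns v, Tv, ..., T^n v are n+1 vectors in C^n, hence linearly dependent
    powers = np.column_stack([np.linalg.matrix_power(T, k) @ v for k in range(n + 1)])
    _, _, Vh = np.linalg.svd(powers)
    a = Vh[-1].conj()                     # a0, ..., an with a0 v + a1 Tv + ... + an T^n v = 0
    roots = np.roots(a[::-1])             # roots of a0 + a1 z + ... + an z^n
    # at least one T - lambda I must be (numerically) singular, i.e. some root is an eigenvalue
    smallest = min(np.linalg.svd(T - lam * np.eye(n), compute_uv=False)[-1] for lam in roots)
    print(smallest)                       # close to 0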
The kth column of the matrix is formed from the coefficients used to
|
||
write T vk as a linear combination of the v’s.
|
||
We often use ∗ to denote matrix entries that we do not know
|
||
about or that are irrelevant to the questions being
|
||
discussed.
|
||
|
||
Recall that in Chapter 3 we discussed the matrix of a linear map from one vector space to another vector space. This matrix depended on a choice of a basis for each of the two vector spaces. Now that we are studying operators, which map a vector space to itself, we need only one basis. In addition, now our matrices will be square arrays, rather than the more general rectangular arrays that we considered earlier. Specifically, let T ∈ L(V ). Suppose (v1, . . . , vn) is a basis of V . For each k = 1, . . . , n, we can write
|
||
|
||
T vk = a1,kv1 + · · · + an,kvn,
|
||
|
||
where aj,k ∈ F for j = 1, . . . , n. The n-by-n matrix
5.11
    [ a1,1  . . .  a1,n ]
    [  ...           ... ]
    [ an,1  . . .  an,n ]
is called the matrix of T with respect to the basis (v1, . . . , vn); we denote it by M(T , (v1, . . . , vn)) or just by M(T ) if the basis (v1, . . . , vn)
|
||
is clear from the context (for example, if only one basis is in sight). If T is an operator on Fn and no basis is specified, you should assume
|
||
that the basis in question is the standard one (where the jth basis vector is 1 in the jth slot and 0 in all the other slots). You can then think of the jth column of M(T ) as T applied to the jth basis vector.
|
||
|
||
A central goal of linear algebra is to show that given an operator T ∈ L(V ), there exists a basis of V with respect to which T has a
|
||
|
||
reasonably simple matrix. To make this vague formulation (“reasonably
|
||
|
||
simple” is not precise language) a bit more concrete, we might try to make M(T ) have many 0’s.
|
||
|
||
If V is a complex vector space, then we already know enough to
|
||
|
||
show that there is a basis of V with respect to which the matrix of T
|
||
|
||
has 0’s everywhere in the first column, except possibly the first entry.
|
||
|
||
In other words, there is a basis of V with respect to which the matrix
|
||
|
||
of T looks like
    [ λ        ∗ ]
    [ 0          ]
    [ ...        ]
    [ 0          ] ;
here the ∗ denotes the entries in all the columns other than the first column. To prove this, let λ be an eigenvalue of T (one exists by 5.10) and let v be a corresponding nonzero eigenvector. Extend (v) to a basis of V . Then the matrix of T with respect to this basis has the form above. Soon we will see that we can choose a basis of V with respect to which the matrix of T has even more 0’s.
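For instance, carrying out this construction on a sample operator (an example supplied here, not from the text): the operator T(w, z) = (−z, w) on C² has eigenvalue i with eigenvector (1, −i), and extending (1, −i) to the basis ((1, −i), (0, 1)) gives a matrix of the promised form.

% Matrix of the illustrative operator T(w, z) = (-z, w) with respect to the
% basis ((1, -i), (0, 1)) obtained by extending the eigenvector (1, -i):
%   T(1, -i) = (i, 1)  = i (1, -i)              -> first column (i, 0);
%   T(0, 1)  = (-1, 0) = -(1, -i) - i (0, 1)    -> second column (-1, -i).
\[
  \mathcal{M}\bigl(T, ((1, -i), (0, 1))\bigr) =
  \begin{pmatrix}
    i & -1 \\
    0 & -i
  \end{pmatrix},
\]
% whose first column has the eigenvalue i at the top and 0 below it.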
The diagonal of a square matrix consists of the entries along the straight line from the upper left corner to the bottom right corner. For example, the diagonal of the matrix 5.11 consists of the entries a1,1, a2,2, . . . , an,n.

A matrix is called upper triangular if all the entries below the diagonal equal 0. For example, the 4-by-4 matrix
             [ 6  2  7  5 ]
             [ 0  6  1  3 ]
             [ 0  0  7  9 ]
             [ 0  0  0  8 ]
is upper triangular. Typically we represent an upper-triangular matrix
in the form

             [ λ1          ∗ ]
             [    .          ]
             [      .        ]
             [        .      ]
             [ 0          λn ] ;
the 0 in the matrix above indicates that all entries below the diagonal in this n-by-n matrix equal 0. Upper-triangular matrices can be considered reasonably simple—for n large, an n-by-n upper-triangular matrix has almost half its entries equal to 0.
The following proposition demonstrates a useful connection between upper-triangular matrices and invariant subspaces.
5.12 Proposition: Suppose T ∈ L(V ) and (v1, . . . , vn) is a basis of V . Then the following are equivalent:
(a) the matrix of T with respect to (v1, . . . , vn) is upper triangular;

(b) T vk ∈ span(v1, . . . , vk) for each k = 1, . . . , n;

(c) span(v1, . . . , vk) is invariant under T for each k = 1, . . . , n.
Proof: The equivalence of (a) and (b) follows easily from the definitions and a moment’s thought. Obviously (c) implies (b). Thus to complete the proof, we need only prove that (b) implies (c). So suppose that (b) holds. Fix k ∈ {1, . . . , n}. From (b), we know that
T v1 ∈ span(v1) ⊂ span(v1, . . . , vk);
T v2 ∈ span(v1, v2) ⊂ span(v1, . . . , vk);
...
T vk ∈ span(v1, . . . , vk).
Thus if v is a linear combination of (v1, . . . , vk), then
T v ∈ span(v1, . . . , vk).
In other words, span(v1, . . . , vk) is invariant under T , completing the proof.
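To make the equivalence concrete, here is a quick check on a sample matrix (an example supplied here, not from the text): suppose T ∈ L(F³) has the upper-triangular matrix below with respect to a basis (v1, v2, v3).

% Checking 5.12 on an illustrative 3-by-3 upper-triangular matrix (example chosen here).
\[
  \mathcal{M}\bigl(T, (v_1, v_2, v_3)\bigr) =
  \begin{pmatrix}
    1 & 4 & 2 \\
    0 & 3 & 5 \\
    0 & 0 & 6
  \end{pmatrix}.
\]
% Reading off the columns:
%   T v_1 = v_1                       lies in span(v_1),
%   T v_2 = 4 v_1 + 3 v_2             lies in span(v_1, v_2),
%   T v_3 = 2 v_1 + 5 v_2 + 6 v_3     lies in span(v_1, v_2, v_3),
% so span(v_1), span(v_1, v_2), and span(v_1, v_2, v_3) are all invariant
% under T, as (b) and (c) assert.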
Now we can show that for each operator on a complex vector space, there is a basis of the vector space with respect to which the matrix of the operator has only 0’s below the diagonal. In Chapter 8 we will improve even this result.
This theorem does not hold on real vector spaces because the first vector in a basis with respect to which an operator has an upper-triangular matrix must be an eigenvector of the operator. Thus if an operator on a real vector space has no eigenvalues (we have seen an example on R²), then there is no basis with respect to which the operator has an upper-triangular matrix.
5.13 Theorem: Suppose V is a complex vector space and T ∈ L(V ). Then T has an upper-triangular matrix with respect to some basis of V .
Proof: We will use induction on the dimension of V . Clearly the desired result holds if dim V = 1.
Suppose now that dim V > 1 and the desired result holds for all complex vector spaces whose dimension is less than the dimension of V . Let λ be any eigenvalue of T (5.10 guarantees that T has an eigenvalue). Let

U = range(T − λI).

Because T − λI is not surjective (see 3.21), dim U < dim V . Furthermore, U is invariant under T . To prove this, suppose u ∈ U . Then

T u = (T − λI)u + λu.

Obviously (T − λI)u ∈ U (from the definition of U ) and λu ∈ U . Thus the equation above shows that T u ∈ U . Hence U is invariant under T , as claimed.

Thus T |U is an operator on U . By our induction hypothesis, there is a basis (u1, . . . , um) of U with respect to which T |U has an upper-triangular matrix. Thus for each j we have (using 5.12)

5.14        T uj = (T |U )(uj) ∈ span(u1, . . . , uj).
Extend (u1, . . . , um) to a basis (u1, . . . , um, v1, . . . , vn) of V . For each k, we have
T vk = (T − λI)vk + λvk.
The definition of U shows that (T − λI)vk ∈ U = span(u1, . . . , um). Thus the equation above shows that
5.15        T vk ∈ span(u1, . . . , um, v1, . . . , vk).
From 5.14 and 5.15, we conclude (using 5.12) that T has an upper-triangular matrix with respect to the basis (u1, . . . , um, v1, . . . , vn).
How does one determine from looking at the matrix of an operator whether the operator is invertible? If we are fortunate enough to have a basis with respect to which the matrix of the operator is upper triangular, then this problem becomes easy, as the following proposition shows.
5.16 Proposition: Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some basis of V . Then T is invertible if and only if all the entries on the diagonal of that upper-triangular matrix are nonzero.
Proof: Suppose (v1, . . . , vn) is a basis of V with respect to which T has an upper-triangular matrix
5.17

    M(T , (v1, . . . , vn)) =
             [ λ1          ∗  ]
             [     λ2          ]
             [        .        ]
             [          .      ]
             [ 0           λn  ] .
We need to prove that T is not invertible if and only if one of the λk’s equals 0.
First we will prove that if one of the λk’s equals 0, then T is not invertible. If λ1 = 0, then T v1 = 0 (from 5.17) and hence T is not invertible, as desired. So suppose that 1 < k ≤ n and λk = 0. Then,
as can be seen from 5.17, T maps each of the vectors v1, . . . , vk−1 into span(v1, . . . , vk−1). Because λk = 0, the matrix representation 5.17 also implies that T vk ∈ span(v1, . . . , vk−1). Thus we can define a linear map
S : span(v1, . . . , vk) → span(v1, . . . , vk−1)
by Sv = T v for v ∈ span(v1, . . . , vk). In other words, S is just T restricted to span(v1, . . . , vk).
Note that span(v1, . . . , vk) has dimension k and span(v1, . . . , vk−1) has dimension k − 1 (because (v1, . . . , vn) is linearly independent). Because span(v1, . . . , vk) has a larger dimension than span(v1, . . . , vk−1), no linear map from span(v1, . . . , vk) to span(v1, . . . , vk−1) is injective (see 3.5). Thus there exists a nonzero vector v ∈ span(v1, . . . , vk) such that Sv = 0. Hence T v = 0, and thus T is not invertible, as desired.
To prove the other direction, now suppose that T is not invertible. Thus T is not injective (see 3.21), and hence there exists a nonzero vector v ∈ V such that T v = 0. Because (v1, . . . , vn) is a basis of V , we can write
v = a1v1 + · · · + akvk,
where a1, . . . , ak ∈ F and ak ≠ 0 (represent v as a linear combination of (v1, . . . , vn) and then choose k to be the largest index with a nonzero coefficient). Thus
0 = T v
  = T (a1v1 + · · · + akvk)
  = (a1T v1 + · · · + ak−1T vk−1) + akT vk.
The last term in parentheses is in span(v1, . . . , vk−1) (because of the upper-triangular form of 5.17). Thus the last equation shows that akT vk ∈ span(v1, . . . , vk−1). Multiplying by 1/ak, which is allowed because ak ≠ 0, we conclude that T vk ∈ span(v1, . . . , vk−1). Thus when T vk is written as a linear combination of the basis (v1, . . . , vn), the coefficient of vk will be 0. In other words, λk in 5.17 must be 0, completing the proof.
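For a concrete instance of the first direction, consider the sample matrix below (an example supplied here, not from the text), whose second diagonal entry is 0.

% Illustrative upper-triangular matrix with a 0 on the diagonal, so T is not invertible.
\[
  \mathcal{M}\bigl(T, (v_1, v_2, v_3)\bigr) =
  \begin{pmatrix}
    1 & 2 & 3 \\
    0 & 0 & 4 \\
    0 & 0 & 5
  \end{pmatrix}.
\]
% Here T v_1 = v_1 and T v_2 = 2 v_1, so T maps the 2-dimensional space
% span(v_1, v_2) into the 1-dimensional space span(v_1), as in the proof.
% Concretely, T(2 v_1 - v_2) = 2 v_1 - 2 v_1 = 0, so 2 v_1 - v_2 is a nonzero
% vector in the null space of T, and T is not invertible.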
Powerful numeric techniques exist for finding good approximations to the eigenvalues of an operator from its matrix.
Unfortunately no method exists for exactly computing the eigenvalues of a typical operator from its matrix (with respect to an arbitrary basis). However, if we are fortunate enough to find a basis with respect to which the matrix of the operator is upper triangular, then the problem of computing the eigenvalues becomes trivial, as the following proposition shows.
5.18 Proposition: Suppose T ∈ L(V ) has an upper-triangular matrix with respect to some basis of V . Then the eigenvalues of T consist precisely of the entries on the diagonal of that upper-triangular matrix.
Proof: Suppose (v1, . . . , vn) is a basis of V with respect to which
T has an upper-triangular matrix
    M(T , (v1, . . . , vn)) =
             [ λ1          ∗  ]
             [     λ2          ]
             [        .        ]
             [          .      ]
             [ 0           λn  ] .
Let λ ∈ F. Then
    M(T − λI, (v1, . . . , vn)) =
             [ λ1 − λ                ∗  ]
             [         λ2 − λ            ]
             [                 .         ]
             [                   .       ]
             [ 0                 λn − λ  ] .
Hence T − λI is not invertible if and only if λ equals one of the λj’s (see 5.16). In other words, λ is an eigenvalue of T if and only if λ equals one of the λj’s, as desired.
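As an illustration, return to the sample operator used earlier for the matrix of an operator (an example supplied in this discussion, not by the text): T(x, y) = (2x + y, 3y), whose matrix with respect to the standard basis is upper triangular with diagonal entries 2 and 3.

% Eigenvalues of the illustrative operator T(x, y) = (2x + y, 3y): by 5.18 they
% are exactly the diagonal entries of its upper-triangular matrix.
\[
  \mathcal{M}(T) =
  \begin{pmatrix}
    2 & 1 \\
    0 & 3
  \end{pmatrix},
  \qquad
  T(1, 0) = 2\,(1, 0),
  \qquad
  T(1, 1) = (3, 3) = 3\,(1, 1).
\]
% The eigenvalues are 2 and 3, precisely the diagonal entries.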
Diagonal Matrices
A diagonal matrix is a square matrix that is 0 everywhere except
possibly along the diagonal. For example,
             [ 8  0  0 ]
             [ 0  2  0 ]
             [ 0  0  5 ]
is a diagonal matrix. Obviously every diagonal matrix is upper triangular, although in general a diagonal matrix has many more 0’s than an upper-triangular matrix.
An operator T ∈ L(V ) has a diagonal matrix
             [ λ1          0 ]
             [    .          ]
             [      .        ]
             [        .      ]
             [ 0          λn ]
with respect to a basis (v1, . . . , vn) of V if and only if

T v1 = λ1v1
...
T vn = λnvn;