COMPUTATIONAL SCIENCE AND
ENGINEERING
GILBERT STRANG
Massachusetts Institute of Technology
WELLESLEY-CAMBRIDGE PRESS Box 812060 Wellesley MA 02482
Computational Science and Engineering Copyright ©2007 by Gilbert Strang ISBN-10 0-9614088-1-2 ISBN-13 978-0-9614088-1-7
All rights reserved. No part of this work may be reproduced or stored or transmitted by any means, including photocopying, without written permission from Wellesley-Cambridge Press. Translation in any language is strictly prohibited - authorized translations are arranged.
Other texts from Wellesley-Cambridge Press
Introduction to Linear Algebra, Gilbert Strang
ISBN-10 0-9614088-9-8
ISBN-13 978-0-9614088-9-3.
Wavelets and Filter Banks, Gilbert Strang and Truong Nguyen
ISBN-10 0-9614088-7-1
ISBN-13 978-0-9614088-7-9.
Linear Algebra, Geodesy, and GPS, Gilbert Strang and Kai Borre
ISBN-10 0-9614088-6-3
ISBN-13 978-0-9614088-6-2.
Introduction to Applied Mathematics, Gilbert Strang
ISBN-10 0-9614088-0-4
ISBN-13 978-0-9614088-0-0.
An Analysis of the Finite Element Method, Gilbert Strang and George Fix
ISBN-10 0-9802327-0-8
ISBN-13 978-0-9802327-0-7.
Calculus, Gilbert Strang ISBN-10 0-9614088-2-0
ISBN-13 978-0-9614088-2-4.
Wellesley-Cambridge Press Box 812060 Wellesley MA 02482 USA www.wellesleycambridge.com
gs@math.mit.edu math.mit.edu/~gs phone (781) 431-8488 fax (617) 253-4358
LaTeX text preparation by Valutone Solutions, www.valutone.com.
LaTeX assembly and book design by Brett Coonley, Massachusetts Institute of Technology.
MATLAB® is a registered trademark of The MathWorks, Inc.
Course materials including syllabus and MATLAB codes and exams are available on the computational science and engineering web site: math.mit.edu/cse. Problem solutions will also be on this cse site, with further examples.
Videotaped lectures of the CSE courses 18.085 and 18.086 (which now use this book) are available on the course web sites: math.mit.edu/18085 and math.mit.edu/18086.
Computational Science and Engineering is also included in OpenCourseWare ocw.mit.edu.
TABLE OF CONTENTS
1 Applied Linear Algebra
1.1 Four Special Matrices
1.2 Differences, Derivatives, Boundary Conditions
1.3 Elimination Leads to K = LDL^T
1.4 Inverses and Delta Functions
1.5 Eigenvalues and Eigenvectors
1.6 Positive Definite Matrices
1.7 Numerical Linear Algebra: LU, QR, SVD
1.8 Best Basis from the SVD
2 A Framework for Applied Mathematics
2.1 Equilibrium and the Stiffness Matrix
2.2 Oscillation by Newton's Law
2.3 Least Squares for Rectangular Matrices
2.4 Graph Models and Kirchhoff's Laws
2.5 Networks and Transfer Functions
2.6 Nonlinear Problems
2.7 Structures in Equilibrium
2.8 Covariances and Recursive Least Squares
*2.9 Graph Cuts and Gene Clustering
3 Boundary Value Problems
3.1 Differential Equations and Finite Elements
3.2 Cubic Splines and Fourth-Order Equations
3.3 Gradient and Divergence
3.4 Laplace's Equation
3.5 Finite Differences and Fast Poisson Solvers
3.6 The Finite Element Method
3.7 Elasticity and Solid Mechanics
4 Fourier Series and Integrals
4.1 Fourier Series for Periodic Functions
4.2 Chebyshev, Legendre, and Bessel
4.3 Discrete Fourier Transform and the FFT
4.4 Convolution and Signal Processing
4.5 Fourier Integrals
4.6 Deconvolution and Integral Equations
4.7 Wavelets and Signal Processing
5 Analytic Functions
5.1 Taylor Series and Complex Integration
5.2 Famous Functions and Great Theorems
5.3 The Laplace Transform and z-Transform
5.4 Spectral Methods of Exponential Accuracy
6 Initial Value Problems
6.1 Introduction
6.2 Finite Difference Methods
6.3 Accuracy and Stability for u_t = c u_x
6.4 Wave Equations and Staggered Leapfrog
6.5 Diffusion, Convection, and Finance
6.6 Nonlinear Flow and Conservation Laws
6.7 Fluid Flow and Navier-Stokes
6.8 Level Sets and Fast Marching
7 Solving Large Systems
7.1 Elimination with Reordering
7.2 Iterative Methods
7.3 Multigrid Methods
7.4 Krylov Subspaces and Conjugate Gradients
8 Optimization and Minimum Principles
8.1 Two Fundamental Examples
8.2 Regularized Least Squares
8.3 Calculus of Variations
8.4 Errors in Projections and Eigenvalues
8.5 The Saddle Point Stokes Problem
8.6 Linear Programming and Duality
8.7 Adjoint Methods in Design
Linear Algebra in a Nutshell
Sampling and Aliasing
Computational Science and Engineering
Bibliography
Index
TEACHING AND LEARNING FROM THE BOOK
I hope that mathematics and also engineering departments will approve of this textbook. It developed from teaching the MIT course 18.085 for thirty years. I thank thousands of engineering and science students for learning this subject with me. I certainly do not teach every single topic! Here is my outline:
1. Applied linear algebra (its importance is now recognized)
2. Applied differential equations (with boundary values and initial values)
3. Fourier series including the Discrete Fourier Transform and convolution.
You will have support from the book and the cse website (and the author). Please select the sections appropriate for the course and the class. What I hope is that this book will serve as a basic text for all mathematicians and engineers and scientists, to explain the core ideas of applied mathematics and scientific computing. The subject is beautiful, it is coherent, and it has moved a long way.
The course text in earlier years was my book Introduction to Applied Mathematics (Wellesley-Cambridge Press). That text contains very substantial material that is not in this book, and vice versa. What naturally happened, from lectures and exams and homeworks and projects over all those years, was a clearer focus on how applied and engineering mathematics could be presented. This new book is the result.
This whole book aims to bring ideas and algorithms together. I am convinced that they must be taught and learned in the same course. The algorithm clarifies the idea. The old method, separation of responsibilities, no longer works:
Not perfect:  Mathematics courses teach analytical techniques. Engineering courses work on real problems.
Even within computational science there is a separation we don't need:
Not efficient:  Mathematics courses analyze numerical algorithms. Engineering and computer science implement the software.
I believe it is time to teach and learn the reality of computational science and engineering. I hope this book helps to move that beautiful subject forward. Thank you for reading it.
Gilbert Strang is in the Department of Mathematics at MIT. His textbooks have transformed the teaching of linear algebra into a more useful course for many students. His lectures are on the OpenCourseWare website at ocw.mit.edu, where 18.06 is the most frequently visited of 1700 courses. The next course 18.085 evolved in a natural way to become Computational Science and Engineering, and led to this textbook.
Awards have come for research and teaching and mathematical exposition:
Von Neumann Medal in Computational Mechanics Teaching Prizes from the MIT School of Science Henrici Prize for Applied Analysis Haimo Prize for Distinguished Teaching, Mathematical Association of America Su Buchin Prize, International Congress of Industrial and Applied Mathematics
Gilbert Strang served as President of SIAM (1999-2000) and as chair of the U.S. National Committee on Mathematics.
Earlier books presented the finite element method and the theory of wavelets and the mathematics of GPS. On those topics George Fix and Truong Nguyen and Kai Borre were valuable coauthors. The textbooks Introduction to Linear Algebra and Linear Algebra and Its Applications are widely adopted by mathematics and engineering departments. With one exception (LAA), all books are published by Wellesley-Cambridge Press. They are available also through SIAM.
The present book developed step by step-text first, then problems, MATLAB codes, and video lectures. The response from students has been wonderful. This development will continue on the website math.mit.edu/cse (also /18085 and /18086). Problem solutions will be on that cse site, with further examples.
The crucial need for today's students and readers is to move forward from the older "formula-based" emphasis toward a solution-based course. Solving problems is the heart of modern engineering mathematics and scientific computing.
THE COVER OF THE BOOK
Lois Sellers and Gail Corbett created the cover from the "circles" of Section 2.2. The solution to aim for is a true circle, but Euler's method takes discrete steps. When those steps spiral out, they produce the beautiful background on the cover (not the best circle). The spirals and circles and meshes, plus microarrays and the Gibbs phenomenon, are serious parts of Computational Science and Engineering.
It was the inspiration of the cover artists to highlight the three letters CSE. Those letters have come to identify an exciting direction for applied mathematics. I hope the book and the cover from birchdesignassociates.com and the evolving website math.mit.edu/cse will give you ideas to work with, and pleasure too.
ACKNOWLEDGEMENTS
I have had wonderful help with this book. For a long time we were a team of two: Brett Coonley prepared hundreds of LaTeX pages. The book would not exist without his steady support. Then new help came from four directions:
1. Per-Olof Persson and Nick Trefethen and Benjamin Seibold and Aslan Kasimov brought the computational part of the book to life. The text explains scientific computing, and their codes do it.
2. The typesetting was completed by www.valutone.com (highly recommended!).
3. Jim Collins and Tim Gardner and Mike Driscoll gave advice on mathematical biology (including the gene microarray on the back cover). From biomechanics to heart rhythms to gene expression, we want and need computational biology. It became clear that clustering is a crucial algorithm in bioinformatics, and far beyond. Des Higham and Inderjit Dhillon and Jon Kleinberg generously helped me to develop the newest section *2.9 on Graph Cuts and Gene Clustering.
4. A host of applied mathematicians and engineers told me what to write.
The words came from teaching thousands of students over 40 happy years. The structure of a textbook emerges safely but slowly; it can't be rushed. For ideas of all kinds, I owe thanks to so many (plus Oxford and the Singapore-MIT Alliance):
Stephen Boyd, Bill Briggs, Yeunwoo Cho, Daniel Cremers, Tim Davis, Sohan Dharmaraja, Alan Edelman, Lotti Ekert, Bob Fourer, Michael Friedlander, Mike Giles (especially), Gene Golub, Nick Gould, Mike Heath, David Hibbitt, Nick Higham, Steven Johnson, David Keyes, Brian Kulis, Ruitian Lang, Jorg Liesen, Ross Lippert, Konstantin Lurie, Bill Morton, Jean-Christophe Nave, Jaime Peraire, Raj Rao, John Reid, Naoki Saito, Mike Saunders, Jos Stam, Vasily Strela, Jared Tanner, Kim Chuan Toh, Alar Toomre, Andy Wathen (especially), Andre Weideman, Chris Wiggins, Karen Willcox, and (on a memorable day at Hong Kong airport) Ding-Xuan Zhou.
May I dedicate this book to my family and friends. They make life beautiful.
Gilbert Strang
INTRODUCTION
When you study a subject as large as applied mathematics, or teach it, or write about it, you first need to organize it. There has to be a pattern and a structure. Then the reader (and the author!) can fit the pieces together. Let me try to separate this subject into manageable pieces, and propose a structure for this book and this course.
A first step is to see two parts-modeling and solving. Those are reflected in the contents of this book. Applied mathematics identifies the key quantities in the problem, and connects them by differential equations or matrix equations. Those equations are the starting point for scientific computing. In an extreme form, modeling begins with a problem and computing begins with a matrix.
A few more words about those two parts. "Applied mathematics" traditionally includes a study of special functions. These have enormous power and importance (sometimes a complete analysis has to wait for a more advanced course). Also traditionally, "scientific computing" includes a numerical analysis of the algorithm-to test its accuracy and stability. Our focus stays on the basic problems that everybody meets:
A. Constructing the equations of equilibrium and of motion (balance equations)
B. Solving steady state and time-dependent matrix and differential equations.
Most scientists and engineers, by the nature of our minds and our jobs, will concentrate more heavily on one side or the other. We model the problem, or we use algorithms like the FFT and software like MATLAB to solve it. It is terrific to do both. Doing the whole job from start to finish has become possible, because of fast hardware and professionally written software. So we teach both parts.
The complete effort now defines Computational Science and Engineering. New departments are springing up with that name. This is really a text for the basic course in that great (and quickly growing) subject of CSE.
Four Simplifications
We all learn by example. One goal in writing this book and teaching this course is to provide specific examples from many areas of engineering and science. The first section of the first chapter starts with four very particular matrices. Those matrices appear over and over in computational science. The underlying model has been made linear, and discrete, and one-dimensional, with constant coefficients.
I see those as the great simplifications which make it possible to understand applied mathematics. Let me focus on these four steps:
1. Nonlinear becomes linear
2. Continuous becomes discrete
3. Multidimensional becomes one-dimensional
4. Variable coefficients become constants.
I don't know if "becomes" is the right word. We can't change the reality of nature. But we do begin to understand the real problem by solving a simpler problem. This is illustrated by Einstein and Newton, the two greatest physicists of all time. Einstein's equations of relativity are not linear (and we are still trying to solve them). Newton linearized the geometry of space (and this book works with F = ma). His linear equation came 250 years before Einstein connected a nonlinearly to m.
Those four great simplifications are fundamental to the organization of this book.
Chapter 1 includes all four, by working with the special matrices K, T, B, and C.
Here are K and C:

Stiffness matrix   K4 = [  2 -1  0  0
                          -1  2 -1  0
                           0 -1  2 -1
                           0  0 -1  2 ]

Circulant matrix   C4 = [  2 -1  0 -1
                          -1  2 -1  0
                           0 -1  2 -1
                          -1  0 -1  2 ]
This -1, 2, -1 pattern shows constant coefficients in a one-dimensional problem. Being matrices, K and C are already linear and discrete. The difference is in the boundary conditions, which are always crucial. K is "chopped off" at both ends, while C is cyclic or circular or "periodic." (An interval wraps around into a circle, because of -1 in the corners.) The Fourier transform is perfect for C.
Chapter 1 will find K^-1, and the triangular factors in K = LU, and the eigenvalues of K and C. Then Chapter 2 can solve equilibrium problems Ku = f (steady state equations) and initial-value problems Mu'' + Ku = f (time-dependent equations).
If you get to know this remarkable matrix K, and apply good software when it becomes large (and later multidimensional), you have made a terrific start. K is a positive definite second difference matrix, with beautiful properties.
1. Nonlinear becomes linear Chapter 2 models a series of important scientific and engineering and economic problems. In each model, the "physical law" is taken to be linear:
(a) Hooke's Law in mechanics: Displacement is proportional to force
(b) Ohm's Law in networks: Current is proportional to voltage difference
(c) Scaling law in economics: Output is proportional to input
(d) Linear regression in statistics: A straight line or a hyperplane can fit the data.
None of those laws is actually true. They are all approximations (no apology for that, false laws can be extremely useful). The truth is that a spring behaves almost linearly until the applied force is very large. Then the spring stretches easily. A resistor is also close to linear-but the highly nonlinear transistor has revolutionized electronics. Economies of scale destroy the linearity of input-output laws (and a price-sales law). We work with linear models as long as we can-but eventually we can't.
That was not a complete list of applications-this book gives more. Biology and medicine are rich in the nonlinearities that make our bodies work. So are engineering and chemistry and materials science, and also financial mathematics. Linearization is the fundamental idea of calculus-a curve is known by its tangent lines. Newton's method solves a nonlinear equation by a series of linear equations. No wonder that I find linear algebra everywhere.
Let me note that "physical nonlinearity" is easier than "geometric nonlinearity." In the bending of a beam, we replace the true but awful curvature formula
u''/(1 + (u')^2)^{3/2} by a simple u''. That succeeds when u' is small - typical for many problems. In other cases we can't linearize. If Boeing had assumed ideal flow and ignored the Navier-Stokes equations, the 777 would never fly.
2. Continuous becomes discrete Chapter 3 introduces differential equations.
The leading example is Laplace's equation ∂^2u/∂x^2 + ∂^2u/∂y^2 = 0, when the magic of complex variables produces a complete family of particular solutions. The solutions come in pairs from (x + iy)^n and r^n e^{inθ}. We call the pairs u and s:
u(x,y):  x,  x^2 - y^2, ...        s(x,y):  y,  2xy, ...
u(r,θ):  r cos θ,  r^2 cos 2θ, ... s(r,θ):  r sin θ,  r^2 sin 2θ, ...
Laplace's equation shows (in action) the gradient and divergence and curl. But real applications solve a discrete form of the differential equation. That innocent sentence contains two essential tasks: to discretize the continuous equation into
Ku = f, and to solve for u. Those steps are at the center of scientific computing,
and this book concentrates on two methods for each of them:
Continuous to discrete (Chapter 3):   1. The finite element method   2. Finite difference methods
Solving discrete Ku = f (Chapter 7):  1. Direct elimination          2. Iterations with preconditioning
The matrix K can be very large (and very sparse). A good solution algorithm is usually a better investment than a supercomputer. Multigrid is quite remarkable.
Chapter 6 turns to initial-value problems, first for wave and heat equations
(convection + diffusion). Waves allow shocks, diffusion makes the solution smooth.
These are at the center of scientific computing. The diffusion equation has become
CHAPTER 1 APPLIED LINEAR ALGEBRA
1.1 FOUR SPECIAL MATRICES
An m by n matrix has m rows and n columns and mn entries. We operate on those rows and columns to solve linear systems Ax = b and eigenvalue problems Ax = λx. From inputs A and b (and from software like MATLAB) we get outputs x and λ. A fast stable algorithm is extremely important, and this book includes fast algorithms.
One purpose of matrices is to store information, but another viewpoint is more important for applied mathematics. Often we see the matrix as an "operator." A acts on vectors x to produce Ax. The components of x have a meaning - displacements or pressures or voltages or prices or concentrations. The operator A also has a meaning - in this chapter A takes differences. Then Ax represents pressure differences or voltage drops or price differentials.
Before we turn the problem over to the machine-and also after, when we interpret A\b or eig(A)-it is the meaning we want, as well as the numbers.
This book begins with four special families of matrices - simple and useful, absolutely basic. We look first at the properties of these particular matrices Kn, Cn, Tn, and Bn. (Some properties are obvious, others are hidden.) It is terrific to practice linear algebra by working with genuinely important matrices. Here are K2, K3, K4 in the first family, with -1 and 2 and -1 down the diagonals:

K2 = [  2 -1
       -1  2 ]

K3 = [  2 -1  0
       -1  2 -1
        0 -1  2 ]

K4 = [  2 -1  0  0
       -1  2 -1  0
        0 -1  2 -1
        0  0 -1  2 ]
What is significant about K2 and K3 and K4, and eventually the n by n matrix Kn? I will give six answers in the same order that my class gave them - starting with four properties of the K's that you can see immediately.
1. These matrices are symmetric. The entry in row i, column j also appears
in row j, column i. Thus K_ij = K_ji, on opposite sides of the main diagonal.
Symmetry can be expressed by transposing the whole matrix at once: K = K^T.
2. The matrices Kn are sparse. Most of their entries are zero when n gets large.
K_1000 has a million entries, but only 1000 + 999 + 999 are nonzero.
3. The nonzeros lie in a "band" around the main diagonal, so each Kn is banded. The band has only three diagonals, so these matrices are tridiagonal.
Because K is a tridiagonal matrix, Ku = f can be quickly solved. If the unknown
vector u has a thousand components, we can find them in a few thousand steps
(which take a small fraction of a second). For a full matrix of order n = 1000, solving Ku = f would take hundreds of millions of steps. Of course we have to ask if the
linear equations have a solution in the first place. That question is coming soon.
4. The matrices have constant diagonals. Right away that property wakes up Fourier. It signifies that something is not changing when we move in space or time. The problem is shift-invariant or time-invariant. Coefficients are constant. The tridiagonal matrix is entirely determined by the three numbers -1, 2, -1. These are actually "second difference matrices" but my class never says that.
The whole world of Fourier transforms is linked to constant-diagonal matrices. In
signal processing, the matrix D = K/4 is a "highpass filter." Du picks out the rapidly
varying (high frequency) part of a vector u. It gives a convolution with ¼(-1, 2, -1). We use these words to call attention to the Fourier part (Chapter 4) of this book.
Mathematicians call K a Toeplitz matrix, and MATLAB uses that name:
The command K = toeplitz([2 -1 zeros(1, 2)]) constructs K4 from row 1.
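A quick check of that command (a small sketch, not one of the book's codes):

K4 = toeplitz([2 -1 0 0])    % symmetric Toeplitz matrix built from its first row
% K4 =
%      2    -1     0     0
%     -1     2    -1     0
%      0    -1     2    -1
%      0     0    -1     2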
Actually, Fourier will be happier if we make two small changes in Kn. Insert -1 in the southwest and northeast corners. This completes two diagonals (which circle around). All four diagonals of C4 wrap around in this "periodic matrix" or "cyclic convolution" or circulant matrix:
Circulant matrix   C4 = [  2 -1  0 -1
                          -1  2 -1  0
                           0 -1  2 -1
                          -1  0 -1  2 ]  =  toeplitz([2 -1 0 -1]).
This matrix is singular. It is not invertible. Its determinant is zero. Rather than
computing that determinant, it is much better to identify a nonzero vector u that
solves C4u = 0. (If C4 had an inverse, the only solution to C4u = 0 would be the zero vector. We could multiply by C4^-1 to find u = 0.) For this matrix, the column vector u of all ones (printed as u = (1, 1, 1, 1) with commas) solves C4u = 0.
The columns of C add to the zero column. This vector u = ones(4, 1) is in the
nullspace of C4 . The nullspace contains all solutions to Cu= 0.
Whenever the entries along every row of a matrix add to zero, the matrix is certainly singular. The same all-ones vector u is responsible. Matrix multiplication
Cu adds the column vectors and produces zero. The constant vector u = (1, 1, 1, 1)
or u = (c, c, c, c) in the nullspace is like the constant C when we integrate a function.
In calculus, this "arbitrary constant" is not knowable from the derivative. In linear
algebra, the constant in u = (c, c, c, c) is not knowable from Cu= 0.
5. All the matrices K = Kn are invertible. They are not singular, like Cn. There is a square matrix K^-1 such that K^-1 K = I = identity matrix. And if a square matrix has an inverse on the left, then also K K^-1 = I. This "inverse matrix" is also symmetric when K is symmetric. But K^-1 is not sparse.
Invertibility is not easy to decide from a quick look at a matrix. Theoretically, one test is to compute the determinant. There is an inverse except when det K = 0, because the formula for K^-1 includes a division by det K. But computing the determinant is almost never done in practice! It is a poor way to find u = K^-1 f. What we actually do is to go ahead with the elimination steps that solve Ku = f. Those steps simplify the matrix, to make it triangular. The nonzero pivots on the main diagonal of the triangular matrix show that the original K is invertible. (Important: We don't want or need K^-1 to find u = K^-1 f. The inverse would be a full matrix, with all positive entries. All we compute is the solution vector u.)
6. The symmetric matrices Kn are positive definite. Those words might be new. One goal of Chapter 1 is to explain what this crucial property means (K4 has it, C4 doesn't). Allow me to start by contrasting positive definiteness with invertibility, using the words "pivots" and "eigenvalues" that will soon be familiar. Please notice the Appendix that summarizes linear algebra.
(Pivots) An invertible matrix has n nonzero pivots. A positive definite symmetric matrix has n positive pivots.
(Eigenvalues) An invertible matrix has n nonzero eigenvalues. A positive definite symmetric matrix has n positive eigenvalues.
Positive pivots and eigenvalues are tests for positive definiteness, and C4 fails those tests because it is singular. Actually C4 has three positive pivots and eigenvalues, so it almost passes. But its fourth eigenvalue is zero (the matrix is
singular). Since no eigenvalue is negative (λ ≥ 0), C4 is positive semidefinite.
The pivots appear on the main diagonal in Section 1.3, when solving Ku = f by elimination. The eigenvalues arise in Kx = λx. There is also a determinant test for positive definiteness (not just det K > 0). The proper definition of a symmetric
positive definite matrix (it is connected to positive energy) will come in Section 1.6.
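A quick numerical look at those tests (a small sketch, not from the book's codes, using the 4 by 4 matrices above):

K = toeplitz([2 -1 0 0]);              % K4
C = K; C(1,4) = -1; C(4,1) = -1;       % C4 with the corner -1's
eig(K)     % four positive eigenvalues: K4 is positive definite
eig(C)     % three positive eigenvalues and one zero: C4 is only semidefinite
% chol(K) succeeds (all pivots positive); chol(C) fails because C4 is singular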
Changing Kn to Tn
After Kn and Cn, there are two more families of matrices that you need to know. They are symmetric and tridiagonal like the family Kn. But the (1, 1) entry in Tn is changed from 2 to 1:
Tn(1,1) = 1, for example   T3 = [ 1 -1 0 ; -1 2 -1 ; 0 -1 2 ]    (1)
That top row (T stands for top) represents a new boundary condition, whose meaning we will soon understand. Right now we use T3 as a perfect example of elimination. Row operations produce zeros below the diagonal, and the pivots are circled as they are found. Two elimination steps reduce T to the upper triangular U.
Step 1. Add row 1 to row 2, which leaves zeros below the first pivot.
Step 2. Add the new row 2 to row 3, which produces U.

T = [ 1 -1 0 ; -1 2 -1 ; 0 -1 2 ]  -->  [ 1 -1 0 ; 0 1 -1 ; 0 -1 2 ]  -->  [ 1 -1 0 ; 0 1 -1 ; 0 0 1 ] = U.
All three pivots of T equal 1. We can apply the test for invertibility (three nonzero pivots). T3 also passes the test for positive definiteness (three positive pivots). In fact every Tn in this family is positive definite, with all its pivots equal to 1.
That matrix U has an inverse (which is automatically upper triangular). The exceptional fact for this particular U^-1 is that all upper triangular entries are 1's:

U^-1 = [ 1 -1 0 ; 0 1 -1 ; 0 0 1 ]^-1 = [ 1 1 1 ; 0 1 1 ; 0 0 1 ] = triu(ones(3)).    (2)
This says that the inverse of a 3 by 3 "difference matrix" is a 3 by 3 "sum matrix." This neat inverse of U will lead us to the inverse of T in Problem 2. The product U^-1 U is the identity matrix I. U takes differences, and U^-1 takes sums. Taking differences and then sums will recover the original vector (u1, u2, u3):

U^-1 (U u) = [ 1 1 1 ; 0 1 1 ; 0 0 1 ] [ u1 - u2 ; u2 - u3 ; u3 ] = [ u1 ; u2 ; u3 ].
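A short numerical check (a sketch of my own, using an arbitrary vector):

U = eye(3) - diag(ones(2,1), 1);    % difference matrix with 1's and -1's
S = triu(ones(3));                  % sum matrix, which should equal inv(U)
u = [5; 7; 2];
d = U*u;                            % differences (u1-u2, u2-u3, u3)
S*d                                 % sums of differences recover (5, 7, 2)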
Changing Tn to Bn
The fourth family Bn has the last entry also changed from 2 to 1. The new boundary condition is being applied at both ends (B stands for both). These matrices Bn are symmetric and tridiagonal, but you will quickly see that they are not invertible. The Bn are positive semidefinite but not positive definite:
Bn(n,n) = 1:    B2 = [ 1 -1 ; -1 1 ]    (3)
Again, elimination brings out the properties of the matrix. The first n - 1 pivots will all equal 1, because those rows are not changed from Tn. But the change from 2 to 1 in the last entry of B produces a change from 1 to 0 in the last entry of U:

B = [ 1 -1 0 ; -1 2 -1 ; 0 -1 1 ]  -->  [ 1 -1 0 ; 0 1 -1 ; 0 -1 1 ]  -->  [ 1 -1 0 ; 0 1 -1 ; 0 0 0 ] = U.    (4)
There are only two pivots. (A pivot must be nonzero.) The last matrix U is certainly not invertible. Its determinant is zero, because its third row is all zeros. The constant vector (1, 1, 1) is in the nullspace of U, and therefore it is in the nullspace of B: B(1, 1, 1) = (0, 0, 0).
The whole point of elimination was to simplify a linear system like Bu = 0, without
changing the solutions. In this case we could have recognized non-invertibility in the matrix B, because each row adds to zero. Then the sum of its three columns is the zero column. This is what we see when B multiplies the vector (1, 1, 1).
Let me summarize this section in four lines (all these matrices are symmetric):
Kn and Tn are invertible and (more than that) positive definite.
Cn and Bn are singular and (more than that) positive semidefinite.
The nullspaces of Cn and Bn contain all the constant vectors
u = (c, c, ... , c). Their columns are dependent.
The nullspaces of Kn and Tn contain only the zero vector
u = (0, 0, ... , 0). Their columns are independent.
Matrices in MATLAB
It is natural to choose MATLAB for linear algebra, but the reader may select another system. (Octave is very close, and free. Mathematica and Maple are good for symbolic calculation, LAPACK provides excellent codes at no cost in netlib, and there are many other linear algebra packages.) We will construct matrices and operate on them in the convenient language that MATLAB provides.
Our first step is to construct the matrices Kn. For n = 3, we can enter the 3 by 3 matrix a row at a time, inside brackets. Rows are separated by a semicolon
K = [ 2 -1 0; -1 2 -1; 0 -1 2 ]
For large matrices this is too slow. We can build K8 from "eye" and "ones":
eye(8) = 8 by 8 identity matrix    ones(7,1) = column vector of seven 1's
The diagonal part is 2*eye(8). The symbol * means multiplication! The -1's above the diagonal of K8 have the vector -ones(7,1) along diagonal 1 of the matrix E:
Superdiagonal of -1's
E = -diag(ones(7,1), 1)
The -1's below the diagonal of K8 lie on the diagonal numbered -1. For those we could change the last argument in E from 1 to -1. Or we can simply transpose E, using the all-important symbol E' for E^T. Then K comes from its three diagonals:
Tridiagonal matrix K8
K = 2*eye(8) + E + E'
Note: The zeroth diagonal (main diagonal) is the default with no second argument, so eye(8)= diag(ones(8,1)). And then diag(eye(8)) = ones(8, 1).
The constant diagonals make K a Toeplitz matrix. The toeplitz command produces K, when each diagonal is determined by a single number 2 or -1 or 0. Use the zeros vector for the 6 zeros in the first row of K8:
Symmetric Toeplitz    row1 = [2 -1 zeros(1,6)];  K = toeplitz(row1)
For an unsymmetric constant-diagonal matrix, use toeplitz(col1, row1). Taking col1 = [1 -1 0 0] and row1 = [1 0 0] gives a 4 by 3 backward difference matrix. It has two nonzero diagonals, 1's and -1's.
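Those constructions can be checked against each other; a small sketch (the names E, K1, K2, A are mine):

E  = -diag(ones(7,1), 1);               % -1's on the superdiagonal
K1 = 2*eye(8) + E + E';                 % K8 from its three diagonals
K2 = toeplitz([2 -1 zeros(1,6)]);       % K8 from its first row
isequal(K1, K2)                         % returns 1: the two constructions agree
A  = toeplitz([1 -1 0 0], [1 0 0])      % the 4 by 3 backward difference matrix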
To construct the matrices T and B and C from K, just change entries as in the last three lines of this M-file that we have named KTBC.m. Its input is the size n, its output is four matrices of that size. The semicolons suppress display of K, T, B, C:
function [K,T,B,C] = KTBC(n)
% Create the four special matrices assuming n>1
K = toeplitz([2 -1 zeros(1,n-2)]);
T = K; T(1,1) = 1;
B = K; B(1,1) = 1; B(n,n) = 1;
C = K; C(1,n) = -1; C(n,1) = -1;
If we happened to want their determinants (we shouldn't!), then with n = 8
[ det(K) det(T) det(B) det(C) ] produces the output [ 9 1 0 0]
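Assuming KTBC.m is on the path, a short usage sketch reproduces that output:

n = 8;
[K, T, B, C] = KTBC(n);                  % build the four n by n matrices
dets = [det(K) det(T) det(B) det(C)]
% essentially [9 1 0 0]: det Kn = n+1, det Tn = 1, and B, C are singular
% (their determinants are zero up to roundoff)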
One more point. MATLAB could not store Kn as a dense matrix for n = 10,000. The 10^8 entries need about 800 megabytes unless we recognize K as sparse. The code sparseKTBC.m on the course website avoids storing (and operating on) all the zeros. It has K, T, B, or C and n as its first two arguments. The third argument is 1 for sparse, 0 for dense (default 0 when only two arguments are given).
The input to Sparse MATLAB includes the locations of all nonzero entries. The command A =sparse(i,j,s,m,n) creates an m by n sparse matrix from the vectors i, j, s that list all positions i, j of nonzero entries s. Elimination by lu(A) may produce additional nonzeros (called fill-in) which the software will correctly identify. In the normal "full" option, zeros are processed like all other numbers.
It is best to create the list of triplets i, j, s and then call sparse. Insertions A(i, j) =
s or A(i, j) = A(i, j) + s are more expensive. We return to this point in Section 3.6.
The sparse KTBC code on the website uses spdiags to enter the three diagonals. Here is the toeplitz way to form K8, all made sparse by its sparse vector start:
vsp = sparse([2 -1 zeros(1,6)])   % please look at each output
Ksp = toeplitz(vsp)               % sparse format gives the nonzero positions and entries
bsp = Ksp(:,2)                    % colon keeps all rows of column 2, so bsp = column 2 of Ksp
usp = Ksp\bsp                     % zeros in Ksp and bsp are not processed, solution: usp(2) = 1
uuu = full(usp)                   % return from sparse format to the full uuu = [0 1 0 0 0 0 0 0]
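The spdiags route mentioned above might look like this sketch (my own code, not the website's):

n = 8;
e = ones(n, 1);
Ksp = spdiags([-e 2*e -e], -1:1, n, n);   % sparse Kn from its three diagonals
nnz(Ksp)                                  % 3n - 2 nonzeros are stored
full(Ksp(1:4, 1:4))                       % look at the top corner as a dense block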
Note The open source language Python is also very attractive and convenient.
The next sections will use all four matrices in these basic tasks of linear algebra:
(1.2) The finite difference matrices K, T, B, C include boundary conditions
(1.3) Elimination produces pivots in D and triangular factors in LDL^T
(1.4) Point loads produce inverse matrices K^-1 and T^-1
(1.5) The eigenvalues and eigenvectors of K, T, B, C involve sines and cosines.
You will see K\f in 1.2, lu(K) in 1.3, inv(K) in 1.4, eig(K) in 1.5, and chol(K) in 1.6.
I very much hope that you will come to know and like these special matrices.
WORKED EXAMPLES
1.1 A  Bu = f and Cu = f might be solvable even though B and C are singular!
Show that every vector f = Bu has f1 + f2 + ··· + fn = 0. Physical meaning: the external forces balance. Linear algebra meaning: Bu = f is solvable when f is perpendicular to the all-ones column vector e = (1, 1, 1, 1, ...) = ones(n, 1).
Solution  Bu is a vector of "differences" of u's. Those differences always add to zero:
All terms cancel in (u1 - u2) + (-u1 + 2u2 - u3) + (-u2 + 2u3 - u4) + (-u3 + u4) = 0.
The dot product with e = (1, 1, 1, 1) is that sum:

Dot product    f^T e = f1 + f2 + f3 + f4 = 0    (f' * e in MATLAB).

A second explanation for f^T e = 0 starts from the fact that Be = 0. The all-ones vector e is in the nullspace of B. Transposing f = Bu gives f^T = u^T B^T, since the transpose of a product has the individual transposes in reverse order. This matrix B is symmetric so B^T = B. Then f^T e = u^T B e = u^T 0 = 0.
Conclusion  Bu = f is only solvable when f is perpendicular to the all-ones vector e. (The same is true for Cu = f. Again the differences cancel out.) The external forces balance when the f's add to zero. The command B\f will produce Inf because B is square and singular, but the "pseudoinverse" u = pinv(B) * f will succeed. (Or add a zero row to B and f before the command B\f, to make the system rectangular.)
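A numerical sketch of that conclusion (my own check, with a 4 by 4 example):

n = 4;
B = toeplitz([2 -1 0 0]); B(1,1) = 1; B(n,n) = 1;   % free-free matrix B4
u = [3; 1; 4; 1];
f = B*u;                 % a right side that is automatically balanced
sum(f)                   % the components of f = Bu add to zero
w = pinv(B)*f;           % pseudoinverse solution of Bu = f
norm(B*w - f)            % small residual: the singular system is consistent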
1.1 B The "fixed-free" matrix H changes the last entry of K from 2 to 1. Connect H to the "free-fixed" T (first entry = 1) by using the reverse identity matrix J:
H = [ 2 -1 0 ; -1 2 -1 ; 0 -1 1 ]  comes from JTJ via the reverse identity  J = [ 0 0 1 ; 0 1 0 ; 1 0 0 ]
Chapter 2 shows how T comes from a tower structure (free at the top). H comes from a hanging structure (free at the bottom). Two MATLAB constructions are
H = toeplitz([2 -1 0]); H(3,3) = 1 or J= fliplr(eye(3)); H=J*T*J
Solution  JT reverses the rows of T. Then JTJ reverses the columns to give H:

JT = [ 0 -1 2 ; -1 2 -1 ; 1 -1 0 ]  (rows reversed),    (JT)J = [ 2 -1 0 ; -1 2 -1 ; 0 -1 1 ] = H  (columns reversed too).

We could reverse columns first by TJ. Then J(TJ) would be the same matrix H as (JT)J. The parentheses never matter in (AB)C = A(BC)!
Any permutation matrix like J has the rows of the identity matrix I in some order. There are six 3 by 3 permutation matrices because there are six orders for the numbers 1, 2, 3. The inverse of every permutation matrix is its transpose. This particular J is symmetric, so it has J = J^T = J^-1 as you can easily check (J^2 = I).
With back= 3:-1:1, reordering to JTJ is H = T(back, back) in MATLAB.
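A two-line check of those constructions (a small sketch of my own):

T = toeplitz([2 -1 0]); T(1,1) = 1;    % free-fixed T3
J = fliplr(eye(3));                    % reverse identity
H1 = J*T*J;                            % reverse rows and columns of T
back = 3:-1:1;  H2 = T(back, back);    % the same reordering by indexing
isequal(H1, H2)                        % returns 1: both give the fixed-free H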
Problem Set 1.1
Problems 1-4 are about T^-1 and Problems 5-8 are about K^-1.
1  The inverses of T3 and T4 (with T11 = 1 in the top corner) are

   T3^-1 = [ 3 2 1 ; 2 2 1 ; 1 1 1 ]   and   T4^-1 = [ 4 3 2 1 ; 3 3 2 1 ; 2 2 2 1 ; 1 1 1 1 ].

   Guess T5^-1 and multiply by T5. Find a simple formula for the entries of Tn^-1 on and below the diagonal (i >= j), and then on and above the diagonal (i <= j).
2  Compute T3^-1 in three steps, using U and U^-1 in equation (2):
   1. Check that T3 = U^T U, where U has 1's on the main diagonal and -1's along the diagonal above. Its transpose U^T is lower triangular.
   2. Check that U U^-1 = I when U^-1 has 1's on and above the main diagonal.
   3. Invert U^T U to find T3^-1 = (U^-1)(U^-1)^T. Inverses come in reverse order!
3  The difference matrix U = U5 in MATLAB is eye(5) - diag(ones(4,1),1). Construct the sum matrix S from triu(ones(5)). (This keeps the upper triangular part of the 5 by 5 all-ones matrix.) Multiply U * S to verify that S = U^-1.
4  For every n, Sn = Un^-1 is upper triangular with ones on and above the diagonal. For n = 4 check that S S^T produces the matrix T4^-1 predicted in Problem 1. Why is S S^T certain to be a symmetric matrix?
5  The inverses of K3 and K4 (please also invert K2) have the fraction 1/det in front. First guess the determinant of K = K5. Then compute det(K) and inv(K) and det(K)*inv(K) - any software is allowed.
6  (Challenge problem) Find a formula for the i, j entry of K4^-1 below the diagonal (i >= j). Those entries grow linearly along every row and up every column.
(Section 1.4 will come back to these important inverses.) Problem 7 below is developed in the Worked Example of Section 1.4.
7  A column u times a row v^T is a rank-one matrix uv^T. All columns are multiples of u, and all rows are multiples of v^T. T4^-1 - K4^-1 has rank 1:

   T4^-1 - K4^-1 = (1/5) [ 16 12 8 4 ; 12 9 6 3 ; 8 6 4 2 ; 4 3 2 1 ] = (1/5) [ 4 ; 3 ; 2 ; 1 ] [ 4 3 2 1 ].

   Write K3 - T3 in this special form uv^T. Predict a similar formula for T3^-1 - K3^-1.
8  (a) Based on Problem 7, predict the i, j entry of T5^-1 - K5^-1 below the diagonal.
   (b) Subtract this from your answer to Problem 1 (the formula for T5^-1 when i >= j). This gives the not-so-simple formula for K5^-1.
9  Following Example 1.1 A with C instead of B, show that e = (1, 1, 1, 1) is perpendicular to each column of C4. Solve Cu = f = (1, -1, 1, -1) with the singular matrix C by u = pinv(C) * f. Try u = C\e and C\f, before and after adding a fifth equation 0 = 0.
10  The "hanging matrix" H in Worked Example 1.1 B changes the last entry of K3 to H33 = 1. Find the inverse matrix from H^-1 = J T^-1 J. Find the inverse also from H = U U^T (check upper times lower triangular!) and H^-1 = (U^-1)^T U^-1.
11 Suppose U is any upper triangular matrix and J is the reverse identity matrix in 1.1 B. Then JU is a "southeast matrix". What geographies are UJ and JU J? By experiment, a southeast matrix times a northwest matrix is __ .
12 Carry out elimination on the 4 by 4 circulant matrix C4 to reach an upper
triangular U (or try [L, U] = lu(C) in MATLAB). Two points to notice: The
last entry of U is __ because C is singular. The last column of U has new nonzeros. Explain why this "fill-in" happens.
13 By hand, can you factor the circulant C4 (with three nonzero diagonals, allowing wraparound) into circulants L times U (with two nonzero diagonals, allowing wraparound so not truly triangular)?
14 Gradually reduce the diagonal 2, 2, 2 in the matrix K 3 until you reach a singular matrix M. This happens when the diagonal entries reach _ _ . Check the determinant as you go, and find a nonzero vector that solves Mu= 0.
Questions 15-21 bring out important facts about matrix multiplication.
15  How many individual multiplications to create Ax and A^2 and AB?
    A (n by n) times x (n by 1)      A (m by n) times B (n by p) = AB (m by p)
16  You can multiply Ax by rows (the usual way) or by columns (more important). Do this multiplication both ways:

    By rows      [ 2 3 ; 4 5 ] [ 1 ; 2 ] = [ inner product using row 1 ; inner product using row 2 ]
    By columns   [ 2 3 ; 4 5 ] [ 1 ; 2 ] = 1 [ 2 ; 4 ] + 2 [ 3 ; 5 ] = [ combination of columns ]
17  The product Ax is a linear combination of the columns of A. The equations Ax = b have a solution vector x exactly when b is a __ of the columns.
    Give an example in which b is not in the column space of A. There is no solution to Ax = b, because b is not a combination of the columns of A.
18 Compute C = AB by multiplying the matrix A times each column of B:
Thus, A * B(:,j) = C(:,j).
19  You can also compute AB by multiplying each row of A times B:

    [ 2 3 ; 4 5 ] [ 1 2 ; 2 4 ] = [ 2*row 1 + 3*row 2 ; 4*row 1 + 5*row 2 ] = [ 8 16 ; * * ].
    A solution to Bx = 0 is also a solution to (AB)x = 0. Why? From (AB)x = A(Bx) = A(0) = 0.
20  The four ways to find AB give numbers, columns, rows, and matrices:

    1  (rows of A) times (columns of B)     C(i,j) = A(i,:) * B(:,j)
    2  A times (columns of B)               C(:,j) = A * B(:,j)
    3  (rows of A) times B                  C(i,:) = A(i,:) * B
    4  (columns of A) times (rows of B)     for k = 1:n, C = C + A(:,k)*B(k,:); end

    Finish these 8 multiplications for columns times rows. How many for n by n?
21  Which one of these equations is true for all n by n matrices A and B?
    AB = BA      (AB)A = A(BA)      (AB)B = B(BA)      (AB)^2 = A^2 B^2.
22  Use n = 1000; e = ones(n,1); K = spdiags([-e, 2*e, -e], -1:1, n, n); to enter K1000 as a sparse matrix. Solve the sparse equation Ku = e by u = K\e. Plot the solution by plot(u).
23 Create 4-component vectors u, v, w and enter A= spdiags([u, v, w], -1: 1, 4, 4). Which components of u and ware left out from the -1 and 1 diagonals of A?
24  Build the sparse identity matrix I = sparse(i, j, s, 100, 100) by creating vectors i, j, s of positions i, j with nonzero entries s. (You could use a for loop.) In this case speye(100) is quicker. Notice that sparse(eye(10000)) would be a disaster, since there isn't room to store eye(10000) before making it sparse.
25  The only solution to Ku = 0 or Tu = 0 is u = 0, so K and T are invertible. For proof, suppose u_i is the largest component of u. If -u_{i-1} + 2u_i - u_{i+1} is zero, this forces u_{i-1} = u_i = u_{i+1}. Then the next equations force every u_j = u_i. At the end, when the boundary is reached, -u_{n-1} + 2u_n only gives zero if u = 0.
    Why does this "diagonally dominant" argument fail for B and C?
26 For which vectors v is toeplitz(v) a circulant matrix (cyclic diagonals)?
27  (Important) Show that the 3 by 3 matrix K comes from A0 A0^T:

    A0 = [ -1 1 0 0 ; 0 -1 1 0 ; 0 0 -1 1 ]   is a "difference matrix"

    Which column of A0 would you remove to produce A1 with T = A1 A1^T? Which column would you remove next to produce A2 with B = A2 A2^T? The difference matrices A0, A1, A2 have 0, 1, 2 boundary conditions. So do the "second differences" K, T, and B.
1.2 DIFFERENCES, DERIVATIVES, BOUNDARY CONDITIONS
This important section connects difference equations to differential equations. A typical row in our matrices has the entries -1, 2, -1. We want to see how those numbers are producing a second difference (or more exactly, minus a second difference). The second difference gives a natural approximation to the second derivative. The matrices Kn and Cn and Tn and Bn are all involved in approximating the equation
-d^2u/dx^2 = f(x)    with boundary conditions at x = 0 and x = 1.    (1)
Notice that the variable is x and not t. This is a boundary-value problem and not an
initial-value problem. There are boundary conditions at x = 0 and x = 1, not initial
conditions at t = 0. Those conditions are reflected in the first and last rows of the
matrix. They decide whether we have Kn or Cn or Tn or Bn.
We will go from first differences to second differences. All four matrices have the special form A^T A (matrix times transpose). Those matrices A^T and A produce first differences, and A^T A produces second differences. So this section has two parts:
I. Differences replace derivatives (and we estimate the error).
II. We solve -d^2u/dx^2 = 1 and then -Δ^2u/(Δx)^2 = 1 using the matrices K and T.
Part I: Finite Differences
How can we approximate du/dx, the slope of a function u(x)? The function might be known, like u(x) = x^2. The function might be unknown, inside a differential equation. We are allowed to use values u(x) and u(x + h) and u(x - h), but the stepsize h = Δx is fixed. We have to work with Δu/Δx without taking the limit as Δx --> 0. So we have "finite differences" where calculus has derivatives.
Three different possibilities for Δu are basic and useful. We can choose a forward difference or a backward difference or a centered difference. Calculus textbooks typically take Δu = u(x + Δx) - u(x), going forward to x + Δx. I will use u(x) = x^2 to test the accuracy of all three differences. The derivative of x^2 is 2x, and that forward difference Δ_+ is usually not the best! Here are Δ_+, Δ_-, and Δ_0:
Forward difference    [u(x+h) - u(x)] / h        The test gives  [(x+h)^2 - x^2] / h = 2x + h
Backward difference   [u(x) - u(x-h)] / h        The test gives  [x^2 - (x-h)^2] / h = 2x - h
Centered difference   [u(x+h) - u(x-h)] / 2h     The test gives  [(x+h)^2 - (x-h)^2] / 2h = 2x

For u = x^2, the centered difference is the winner. It gives the exact derivative 2x, while forward and backward miss by h. Notice the division by 2h (not h).
Centered is generally more accurate than one-sided, when h = Δx is small. The reason is in the Taylor series approximation of u(x + h) and u(x - h). These first few
terms are always the key to understanding the accuracy of finite differences:
Forward    u(x + h) = u(x) + hu'(x) + ½h^2 u''(x) + (1/6)h^3 u'''(x) + ···    (2)
Backward   u(x - h) = u(x) - hu'(x) + ½h^2 u''(x) - (1/6)h^3 u'''(x) + ···    (3)
Subtract u(x) from each side and divide by h. The forward difference is first order accurate because the leading error ½hu''(x) involves the first power of h:

One-sided is first order    [u(x + h) - u(x)] / h = u'(x) + ½ h u''(x) + ···    (4)

The backward difference is also first order accurate, and its leading error is -½hu''(x). For u(x) = x^2, when u''(x) = 2 and u''' = 0, the error ½hu'' is exactly h.
For the centered difference, subtract (3) from (2). Then u(x) cancels and also ½h^2u''(x) cancels (this gives extra accuracy). Dividing by 2h leaves an h^2 error:

Centered is second order    [u(x + h) - u(x - h)] / 2h = u'(x) + (1/6) h^2 u'''(x) + ···    (5)
The centered error is O(h^2) where the one-sided errors were O(h), a significant change. If h = 1/10 we are comparing a 1% error to a 10% error.
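A quick experiment makes those rates visible; a sketch of my own, using the smooth function u(x) = exp(x) at x = 1 where u'(1) = e:

x = 1; exact = exp(1);
for h = [0.1 0.01 0.001]
    forward  = (exp(x+h) - exp(x))/h;          % one-sided, error O(h)
    centered = (exp(x+h) - exp(x-h))/(2*h);    % centered, error O(h^2)
    fprintf('h = %g   forward error = %.2e   centered error = %.2e\n', ...
            h, abs(forward-exact), abs(centered-exact))
end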
The matrix for centered differences is antisymmetric (like the first derivative):

Centered difference matrix, Δ_0^T = -Δ_0:  rows with -1, 0, 1 applied to (..., u_{i-1}, u_i, u_{i+1}, u_{i+2}, ...) give (..., u_{i+1} - u_{i-1}, u_{i+2} - u_i, ...).

Transposing Δ_0 reverses -1 and 1. The transpose of the forward difference matrix Δ_+ would be -(backward difference) = -Δ_-. Centered difference quotients Δ_0 u/2h are the average of forward and backward (Figure 1.1 shows u(x) = x^3).
[Figure 1.1: The curve u = x^3 with forward, backward, and centered differences at x = 1 (step h = 1). Δ_+u = 8 - 1 = 7h,  Δ_-u = 1 - 0 = 1h,  Δ_0u = 8 - 0 = 4(2h); notice 4 = ½(1 + 7). Δ_+/h and Δ_-/h and Δ_0/2h approximate u' = 3x^2 = 3 by 7, 1, 4 at x = 1. The second difference Δ^2u = 8 - 2(1) + 0 is exactly u'' = 6x = 6 with step h = 1.]
Second Differences from First Differences
We can propose a basic problem in scientific computing. Find a finite difference approximation to this linear second order differential equation:
-d^2u/dx^2 = f(x)    with the boundary conditions u(0) = 0 and u(1) = 0.    (6)
The derivative of the derivative is the second derivative. In symbols d/dx(du/dx) is d^2u/dx^2. It is natural that the first difference of the first difference should be the second difference. Watch how a second difference Δ_-Δ_+u is centered around point i:

Difference of difference    Δ_-(Δ_+u)_i = (u_{i+1} - u_i) - (u_i - u_{i-1}) = u_{i+1} - 2u_i + u_{i-1}    (7)

Those numbers 1, -2, 1 appear on the middle rows of our matrices K and T and B and C (with signs reversed). The denominator is h^2 = (Δx)^2 under this second difference. Notice the right positions for the superscripts 2, before u and after x:

Second difference    d^2u/dx^2  ≈  Δ^2u/(Δx)^2 = [u(x + Δx) - 2u(x) + u(x - Δx)] / (Δx)^2    (8)
What is the accuracy of this approximation? For u(x + h) we use equation (2), and for u(x - h) we use (3). The terms with h and h^3 cancel out:

Δ^2u(x) = u(x + h) - 2u(x) + u(x - h) = h^2 u''(x) + c h^4 u''''(x) + ···    (9)

Dividing by h^2, Δ^2u/(Δx)^2 has second order accuracy (error c h^2 u''''). We get that extra order because Δ^2 is centered. The important tests are on u(x) = x^2 and u(x) = x^3. The second difference divided by (Δx)^2 gives the correct second derivative:
Perfection for u = x^2    [(x + h)^2 - 2x^2 + (x - h)^2] / h^2 = 2    (10)
An equation d^2u/dx^2 = constant and its difference approximation Δ^2u/(Δx)^2 = constant will have the same solutions. Unless the boundary conditions get in the way...
The Important Multiplications
You will like what comes now. The second difference matrix (with those diagonals 1, -2, 1) will multiply the most important vectors I can think of. To avoid any problems at boundaries, I am only looking now at the internal rows-which are the same for K, T, B, C. These multiplications are a beautiful key to the whole chapter.
Δ^2(Squares) = 2·(Ones)      Δ^2(Ramp) = Delta      Δ^2(Sines) = λ·(Sines)    (11)
Here are column vectors whose second differences are special:

Constant      (1, 1, ..., 1)                 ones(n,1)
Linear        (1, 2, ..., n)                 (1:n)'           (in MATLAB notation)
Squares       (1^2, 2^2, ..., n^2)           (1:n)'.^2
Delta at k    (0, ..., 0, 1, 0, ..., 0)      [zeros(k-1,1) ; 1 ; zeros(n-k,1)]
Step at k     (0, ..., 0, 1, 1, ..., 1)      [zeros(k-1,1) ; ones(n-k+1,1)]
Ramp at k     (0, ..., 0, 0, 1, ..., n-k)    [zeros(k-1,1) ; (0:n-k)']
Sines         (sin t, ..., sin nt)           sin((1:n)'*t)
Cosines       (cos t, ..., cos nt)           cos((1:n)'*t)
Exponentials  (e^{it}, ..., e^{int})         exp((1:n)'*i*t)
Now come the multiplications in each group. The second difference of each vector is analogous (and sometimes equal!) to a second derivative.
I. For constant and linear vectors, the second differences are zero:

Δ^2(constant):  each 1, -2, 1 row gives 1 - 2 + 1 = 0, so the rows applied to (1, 1, 1, ...) give (0, 0, ...)
Δ^2(linear):    each row gives (i-1) - 2i + (i+1) = 0, so the rows applied to (1, 2, 3, ...) give (0, 0, ...)    (12)

For squares, the second differences are constant (the second derivative of x^2 is 2). This is truly important: Matrix multiplication confirms equation (8).

Δ^2(squares):  (i-1)^2 - 2i^2 + (i+1)^2 = 2, so the rows applied to (1, 4, 9, 16, ...) give (2, 2, ...)    (13)

Then Ku = ones for u = -(squares)/2. Below come boundary conditions.
II. Second differences of the ramp vector produce the delta vector:

Δ^2(ramp at k) = (delta at k):  every interior row gives 0 except row k, where 0 - 2(0) + 1 = 1.    (14)

Section 1.4 will solve Ku = δ with boundary conditions included. You will see how each position of the "1" in delta produces a column u in K^-1 or T^-1.
For functions: The second derivative of a ramp max(x, 0) is a delta function.
III. Second differences of the sine and cosine and exponential produce 2 cos t - 2 times those vectors. (Second derivatives of sin xt and cos xt and e^{ixt} produce -t^2 times the functions.) In Section 1.5, sines or cosines or exponentials will be eigenvectors of K, T, B, C with the right boundary conditions.
Δ^2(sines)         the 1, -2, 1 rows applied to (sin t, sin 2t, sin 3t, sin 4t, ...) give (2 cos t - 2)(sin t, sin 2t, sin 3t, sin 4t, ...)    (15)
Δ^2(cosines)       the 1, -2, 1 rows applied to (cos t, cos 2t, cos 3t, cos 4t, ...) give (2 cos t - 2)(cos t, cos 2t, cos 3t, cos 4t, ...)    (16)
Δ^2(exponentials)  the 1, -2, 1 rows applied to (e^{it}, e^{2it}, e^{3it}, e^{4it}, ...) give (2 cos t - 2)(e^{it}, e^{2it}, e^{3it}, e^{4it}, ...)    (17)
The eigenvalue 2 cos t - 2 is easiest to see for the exponential in (17). It is exactly e^{it} - 2 + e^{-it}, which factors out in the matrix multiplication. Then (16) and (15), cosine and sine, are the real and imaginary parts of (17). Soon t will be θ.
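These multiplications are easy to confirm numerically; a sketch of my own, using only the interior rows of K (with the sign reversed):

n = 8; t = 0.7;
K = toeplitz([2 -1 zeros(1, n-2)]);
D2 = -K(2:n-1, :);                        % interior second difference rows (1, -2, 1)
squares = ((1:n).^2)';
ramp    = [zeros(3,1); (0:n-4)'];         % ramp with its kink at position k = 4
sines   = sin((1:n)'*t);
D2*squares                                % = 2*ones: second difference of i^2 is 2
D2*ramp                                   % = delta: a single 1 appears at the kink
D2*sines - (2*cos(t)-2)*sines(2:n-1)      % ~ zero: sines act as eigenvectors inside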
Part II: Finite Difference Equations
We have an approximation Δ^2u/(Δx)^2 to the second derivative d^2u/dx^2. So we can quickly create a discrete form of -d^2u/dx^2 = f(x). Divide the interval [0, 1] into equal pieces of length h = Δx. If that meshlength is h = 1/(n+1), then n + 1 short subintervals will meet at x = h, x = 2h, ..., x = nh. The extreme endpoints are x = 0 and x = (n + 1)h = 1. The goal is to compute approximations u1, ..., un to the true values u(h), ..., u(nh) at those n meshpoints inside the [0, 1] interval.
[Figure 1.2: The discrete unknowns u1, ..., un at the meshpoints h, 2h, ..., nh approximate the true u(h), ..., u(nh). The endpoint values u0 and u_{n+1} are known from the boundary conditions.]
Certainly -d^2/dx^2 is replaced by our -1, 2, -1 matrix, divided by h^2 and with the minus sign built in. What to do on the right side? The source term f(x) might be a smooth distributed load or a concentrated point load. If f(x) is smooth as in sin 2πx, the first possibility is to use its values f_i at the meshpoints x = iΔx.
Finite difference equation    [-u_{i+1} + 2u_i - u_{i-1}] / h^2 = f_i = f(ih),  i = 1, ..., n    (18)
The first equation (i = 1) involves u0. The last equation (i = n) involves u_{n+1}. The boundary conditions given at x = 0 and x = 1 will determine what to do. We now solve the key examples with fixed ends u(0) = u0 = 0 and u(1) = u_{n+1} = 0.
Example 1  Solve the differential and difference equations with constant force f(x) = 1:

-d^2u/dx^2 = 1    with  u(0) = 0 (fixed end)  and  u(1) = 0    (19)

[-u_{i+1} + 2u_i - u_{i-1}] / h^2 = 1    with  u0 = 0  and  u_{n+1} = 0    (20)
Solution  For every linear equation, the complete solution has two parts. One "particular solution" is added to any solution with zero on the right side (no force):

Complete solution    u_complete = u_particular + u_nullspace    (21)

This is where linearity is so valuable. Every solution to Lu = 0 can be added to one particular solution of Lu = f. Then by linearity L(u_part + u_null) = f + 0.
Particular solution    -d^2u/dx^2 = 1  is solved by  u_part(x) = -½x^2
Nullspace solution     -d^2u/dx^2 = 0  is solved by  u_null(x) = Cx + D.
The complete solution is u(x) = -½x^2 + Cx + D. The boundary conditions will tell us the constants C and D in the nullspace part. Substitute x = 0 and x = 1:

Boundary condition at x = 0    u(0) = 0  gives  D = 0
Boundary condition at x = 1    u(1) = 0  gives  C = ½
Solution                       u(x) = ½(x - x^2)
In Figure 1.3, the finite difference solution agrees with this u(x) at the meshpoints. This is special: (19) and (20) have the same solution (a parabola). A second difference of u_i = i^2h^2 gives exactly the correct second derivative of u = x^2. The second difference of a linear u_i = ih matches the second derivative (zero) of u = x:
[(i+1)h - 2ih + (i-1)h] / h^2 = 0    matches    d^2(x)/dx^2 = 0.    (22)
The combination of quadratic i^2h^2 and linear ih (particular and nullspace solutions) is exactly right. It solves the equation and satisfies the boundary conditions. We can
[Figure 1.3: The parabola u(x) = ½(x - x^2) solves -d^2u/dx^2 = 1; the values u_i = ½(ih - i^2h^2) solve -Δ^2u/(Δx)^2 = ones. Finite differences give an exact match of u(x) and u_i for the special case -u'' = 1 with u0 = u_{n+1} = 0. The discrete values lie right on the parabola.]
set x = ih in the true solution u(x) = ½(x - x^2) to find the correct u_i.

Finite difference solution    u_i = ½(ih - i^2h^2)  has  u_{n+1} = ½(1 - 1^2) = 0.
It is unusual to have this perfect agreement between u_i and the exact u(ih). It is also unusual that no matrices were displayed. When 4h = 1 and f = 1, the matrix is K3/h^2 = 16K3. Then ih = ¼, ½, ¾ leads to u_i = 3/32, 4/32, 3/32:

Ku = f    16 [ 2 -1 0 ; -1 2 -1 ; 0 -1 2 ] [ 3/32 ; 4/32 ; 3/32 ] = [ 1 ; 1 ; 1 ].    (23)

The -1's in columns 0 and 4 were safely chopped off because u0 = u4 = 0.
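The same exact agreement holds for any n; a sketch of the discrete solve (my code, not the book's):

n = 7; h = 1/(n+1);
K = toeplitz([2 -1 zeros(1, n-2)]);
u = (K/h^2) \ ones(n, 1);            % discrete solution of -u'' = 1 with fixed ends
x = (1:n)'*h;
max(abs(u - (x - x.^2)/2))           % zero up to roundoff: the values lie on the parabola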
A Different Boundary Condition
Chapter 2 will give a host of physical examples leading to these differential and difference equations. Right now we stay focused on the boundary condition at x = 0, changing from zero height to zero slope:
-d^2u/dx^2 = f(x)    with  du/dx(0) = 0 (free end)  and  u(1) = 0.    (24)

With this new boundary condition, the difference equation no longer wants u0 = 0. Instead we could set the first difference to zero: u1 - u0 = 0 means zero slope in the first small interval. With u0 = u1, the second difference -u0 + 2u1 - u2 in row 1 reduces to u1 - u2. The new boundary condition changes Kn to Tn.
Example 2  Solve the differential and difference equations starting from zero slope:

Free-fixed    -d^2u/dx^2 = 1    with  du/dx(0) = 0  and  u(1) = 0    (25)

[-u_{i+1} + 2u_i - u_{i-1}] / h^2 = 1    with  (u1 - u0)/h = 0  and  u_{n+1} = 0    (26)
[Figure 1.4: Free end at x = 0, fixed end at x = 1. The continuous solution has u(0) = ½; the discrete solution has u0 = u1 = 6/16. u_i is below the true u(x) = ½(1 - x^2) by an error ½h(1 - x).]
Solution  u(x) = -½x^2 + Cx + D is still the complete solution to -u'' = 1. But the new boundary condition changes the constants to C = 0 and D = ½:

du/dx = 0 at x = 0 gives C = 0. Then u = 0 at x = 1 gives D = ½.

The free-fixed solution is u(x) = ½(1 - x^2).
Figure 1.4 shows this parabola. Example 1 was a symmetric parabola, but now
the discrete boundary condition u1 = u0 is not exactly satisfied by u(x). So the
finite difference u/s show small errors. We expect an O(h) correction because of the
forward difference (u1 - u0 )/h. For n = 3 and h = ¼,
gives
Figure 1.4 shows that solution (not very accurate, with three meshpoints). The discrete points will lie much nearer the parabola for large n. The error is h(l - x)/2.
For completeness we can go ahead to solve Tnu = h2 ones(n, 1) for every n:
1 0 Tn = (backward)(-forward) = [ -1 1 0
-1
The inverses of those first difference matrices are sum matrices (triangles of l's). The inverse of Tn is an upper triangle times a lower triangle:
1.2 Differences, Derivatives, Boundary Conditions 21
For n = 3 we recover 1 + 2 + 3 = 6 and 2 + 3 = 5, which appeared in (27). There is a formula for those sums in (30) and it gives the approximation U(
Discrete solution
(31)
This answer has Un+I = 0 when i = n+ 1. And u0 = u1 so the boundary conditions are satisfied. That starting value u0 = ½nh is below the correct value u(0) = ½= ½(n+ 1)h only by ½h. This ½his the first-order error caused by replacing the zero slope at x = 0 by the one-sided condition u1 = ua.
The worked example removes that O(h) error by centering the boundary condition.
MATLAB Experiment
The function u(x) = cos(1rx/2) satisfies free-fixed boundary conditions u'(0) = 0 and u(l) = 0. It solves the equation -u" = f = (1r/2)2cos(1rx/2). How close to u are the solutions U and V of the finite difference equations TnU = f and Tn+l V = g?
h = 1/(n+l); u = cos(pi*(l:n) '*h/2); c = (pi/2)1'2; f = C*U; % Usual matrix T
U = h*h*T\f;
% Solution u1, ... , Un with one-sided condition u0 = u1
e = 1 - U(l)
% First-order error at x = 0
g = [c/2;f]; T = ... ; % Create Tn+I as in equation (34) below. Note g(l) = /(0)/2
V = h*h*T\g;
% Solution u0 , . .. , Un with centered condition u_ 1 = u1
E = 1 - V(l)
% Second-order error from centering at x = 0
Take n = 3, 7, 15 and test T\f with appropriate T and f. Somehow the right mesh has (n + ½)h = 1, so the boundary point with u' = 0 is halfway between meshpoints. You should find e proportional to h and E proportional to h2. A big difference.
ci WORKED EXAMPLES a
1.2 A Is there a way to avoid this O(h) error from the one-sided boundary condition u1 = u0? Constructing a more accurate difference equation is a perfect example of numerical analysis. This crucial decision comes between the modeling step (by a differential equation) and the computing step (solving the discrete equation).
Solution The natural idea is a centered difference (u1 - u_1)/2h = 0. This copies the true u'(0) = 0 with second order accuracy. It introduces a new unknown u_1, so extend the difference equation to x = 0. Eliminating u_1 leaves size(T)= n+l:
22 Chapter 1 Applied Linear Algebra
Centering the boundary condition multiplies /(0) by ½- Try n = 3 and h = ¼:
l [ l gives
uUo1 [ U2
_!_ 87.50 . (33)
16 6.0
U3
3.5
Those numbers ui are exactly equal to the true u(x) = ½(1 - x2) at the nodes. We
are back to perfect agreement with the parabola in Figure 1.4. For a varying load
f(x) and a non-parabolic solution to -u" = f(x), the centered discrete equation will
have second-order errors O(h2).
Problem 21 shows a very direct approach to u0 - u1 = ½h2/(0).
1.2 B When we multiply matrices, the backward ,6._ times the forward .6.+ gives 1 and -2 and 1 on the interior rows:
We didn't get K 3, for two reasons. First, the signs are still reversed. And the first
l corner entry is -1 instead of -2. The boundary rows give us T3, because .6._(.6.+u)
sets to zero the first value .6.+u = (u1 - u0)/h (not the value of u itself!). -1 1 0 +- .6.2u boundary row with u0 = u1
-T3 = [ 1 -2 1 +- .6.2u typical row u2 - 2u1 + u0
(35)
0 1 -2 +- .6.2u boundary row with u4 = 0
The boundary condition at the top is zero slope. The second difference u2 - 2u1+ u0 becomes u2 - u1 when u0 = u1. We will come back to this, because in my experience
99% of the difficulties with differential equations occur at the boundary.
u{O) = O, u{l) = 0 u'(O) = O, u'{l) = 0 u'(O) = O, u{l) = 0
u(O) = u{l), u'{O) = u'{l)
K has Uo = Un+l = 0 B has Uo = U1, Un= Un+l
T has Uo = u1, Un+l = 0 C has Uo = Un, U1 = Un+l
An infinite tridiagonal matrix, with no boundary, maintains 1, -2, 1 down its infinitely long diagonals. Chopping off the infinite matrix would be the same as pretending that Uo and Un+1 are both zero. That leaves Kn, which has 2's in the corners.
1.2 Differences, Derivatives, Boundary Conditions 23
Problem Set 1.2
1 What are the second derivative u"(x) and the second difference !).2Un? Use 6(:z:).
-2A
u(x) = { Ax ifx:S0
Un= {An ifn:S0
-A
0
Bx if X ~ 0
Bn if n ~ 0
B
2B
u(x) and U are piecewise linear with a corner at 0.
2 Solve the differential equation -u"(x) = 8(x) with u(-2) = 0 and u(3) = 0.
The pieces u = A(x + 2) and u = B(x - 3) meet at x = 0. Show that the
vector U = (u(-l),u(0),u(l),u(2)) solves the corresponding matrix problem
KU= F = (0, 1,0,0).
Problems 3-12 are about the "local accuracy" of finite differences.
3 The h2 term in the error for a centered difference (u(x + h) - u(x - h))/2h is ¼h2u"'(x). Test by computing that difference for u(x) = x3 and x4 .
4 Verify that the inverse of the backward difference matrix/)._ in (28) is the sum
matrix in (29). But the centered difference matrix 1).0 = (!).+ + !)._)/2 might not be invertible! Solve !).0u = 0 for n = 3 and n = 5.
5 In the Taylor series (2), find the number a in the next term ah4u""(x) by testing
u(x) = x4 at x = 0.
6 For u(x) = x4, compute the second derivative and second difference !)..2uj(!)..x) 2 .
From the answers, predict c in the leading error in equation (9).
7 Four samples of u can give fourth-order accuracy for du/dx at the center:
-u2 + 8u1 - 8u_1 + u_2 _ du bh4 d5u
12h
- dx + dx5 + •••
l. Check that this is correct for u = l and u = x2 and u = x4 .
2. Expand u2, u1, u_1, u_2 as in equation (2). Combine the four Taylor series to discover the coefficient b in the h4 leading error term.
8 Question Why didn't I square the centered difference for a good !)..2?
Answer A centered difference of a centered difference stretches too far:
/)..0 /).0
+ Un+2 - 2Un Un-2
2h 2h Un=
(2h) 2
The second difference matrix now has 1, O, -2, O, 1 on a typical row. The accuracy is no better and we have trouble with Un+2 at the boundaries.
Can you construct a fourth-order accurate centered difference for d2u/dx2 , choosing the right coefficients to multiply u2, u1, u0 , u_1, U-2?
24 Chapter 1 Applied Linear Algebra
9 Show that the fourth difference /:).4uj(!:).x) 4 with coefficients 1, -4, 6, -4, 1 ap-
proximates d4u / dx4 by testing on u = x, x2, x 3, and x4:
-u2 -- 4-u1-+-6(Du-o-x)--44-u_1-+-u_-2 = -dd4xu4 + (which leading error?).
10 Multiply the first difference matrices in the order !:).+!:).-, instead of!:)._!:).+ in
equation (27). Which boundary row, first or last, corresponds to the boundary
condition u = 0? Where is the approximation to u' = 0?
11 Suppose we want a one-sided approximation to ~~ with second order accuracy:
ru(x) + su(x - !:).x) + tu(x - 2D.x) _ du
/:).x
- dx
c
ior
_
u -
1 ,
x,
2
x
.
Substitute u = 1, x, x2 to find and solve three equations for r, s, t. The corresponding difference matrix will be lower triangular. The formula is "causal."
12 Equation (7) shows the "first difference of the first difference." Why is the left side within O(h2) of¾ [u:+½ - u:_½]? Why is this within O(h2) of u~?
Problems 13-19 solve differential equations to test global accuracy.
13 Graph the free-fixed solution uo, ... , Us with n = 7 in Figure 1.4, in place of
the existing graph with n = 3. You can use formula (30) or solve the 7 by 7
system. The O(h) error should be cut in half, from h = ¼to ½-
14 (a) Solve -u" = 12x2 with free-fixed conditions u'(0) = 0 and u(l) = 0. The complete solution involves integrating f(x) = 12x2 twice, plus Cx + D.
(b) With h = n~l and n = 3, 7, 15, compute the discrete u1, ... , Un using Tn:
ui+1 - 2hu2, + u,-1 = 3(i.h)2 w.ith u0 = 0 and Un+1 = 0 .
Compare ui with the exact answer at the center point x = ih = ½. Is the
error proportional to h or h2?
15 Plot the u = cos 41rx for 0 ::; x ::; 1 and the discrete values u, = cos 41rih at the meshpoints x = ih = n~1 . For small n those values will not catch the
oscillations of cos 1rx. How large is a good n? How many mesh points per
oscillation?
16 Solve -u" = cos 41rx with fixed-fixed conditions u(0) = u(l) = 0. Use K 4 and
Ks to compute u1, ... , Un and plot on the same graph with u(x):
ui+l -
2u, + u,_1 _
h2
-
cos
4 .h 7fZ
with
Uo = Un+l = 0.
1.2 Differences, Derivatives, Boundary Conditions 25
17 Test the differences b.0u = (ui+l - Ui-i) and b.2u = ui+1 - + 2ui ui-1 on u(x) = eax. Factor out eax (this is why exponentials are so useful). Expand
eat:,,x = l + ab.x + (ab.x) 2/2 + • • • to find the leading error terms.
18 Write a finite difference approximation (using K) with n = 4 unknowns to
::~ = x with boundary conditions u(O) = 0 and u(l) = 0.
Solve for u 1, u 2, u 3, u4 . Compare them to the true solution. 19 Construct a centered difference approximation using K/h2 and b.0/2h to
- d2u + du = l with u(O) = 0 and u(l) = 0. dx2 dx
Separately use a forward difference b..+U/h for du/dx. Notice b.0=(b.++b.._)/2.
Solve for the centered u and uncentered U with h = 1/5. The true u(x) is the particular solution u = x plus any A+Bex. Which A and B satisfy the boundary
conditions ? How close are u and U to u(x) ?
20 The transpose of the centered difference b.0 is -b..0 (antisymmetric). That is like the minus sign in integration by parts, when f(x)g(x) drops to zero at ±oo:
Integration by parts
l oo f(x)-ddg dx = - loo -ddf g(x) dx.
-oo
X
-oo X
00
00
-oo
-00
E E Hint: Change i + 1 to i in /i9i+l, and change i - 1 to i in /i9i-1·
21 Use the expansion u(h) = u(O)+hu'(O)+½h2u"(O)+ •• with zero slope u'(O) = 0
and -u" = f (x) to derive the top boundary equation u 0 - u 1 = ½h2 f (0).
This factor ½removes the O(h) error from Figure 1.4: good.
26 Chapter 1 Applied Linear Algebra
1.3 ELIMINATION LEADS TO K = LDLT
This book has two themes-how to understand equations, and how to solve them.
This section is about solving a system of n linear equations Ku = f. Our method will
be Gaussian elimination (not determinants and not Cramer's Rule!). All software packages use elimination on positive definite systems of all sizes. MATLAB uses
u = K\f (known as backslash), and [L, U] = lu(K) for triangular factors of K.
The symmetric factorization K = LDLT takes two extra steps beyond the
solution itself. First, elimination factors K into LU: lower triangular L times upper
triangular U. Second, the symmetry of K leads to U = DLT. The steps from K to U
and back to K are by lower triangular matrices-rows operating on lower rows.
K = LU and K = LDLT are the right "matrix ways" to understand elimination.
The pivots go into D. This is the most frequently used algorithm in scientific computing (billions of dollars per year) so it belongs in this book. If you met LU and LDLT earlier in a linear algebra course, I hope you find this a good review.
Our first example is the 3 by 3 matrix K = K 3. It contains the nine coefficients (two of them are zero) in the linear equations Ku = f. The vector on the right side is not so important now, and we choose f = (4,0,0).
Ku=f
2u1 - U2
= 4
+ = -U1 2u2 - U3 0
+ = - U2 2U3 0
The first step is to eliminate u1 from the second equation. Multiply equation 1 by
½and add to equation 2. The new matrix has a zero in the 2, 1 position-where
u1 is eliminated. I have circled the first two pivots:
l [ l [00-(1D-1o
UU12
o -1 2 ¾
[ l Jhi+½!i is
h
2u1 - U2
= 4
23 U2 - U3 = 2
+ = - U2 2U3 0
The next step looks at the 2 by 2 system in the last two equations. The pivot d2 = ~
is circled. To eliminate u2 from the third equation, add ~ of the second equation.
Then the matrix has a zero in the 3, 2 position. It is now the upper triangular U.
!, 1, The three pivots 2,
are on its diagonal:
2u1 - U2
= 4
lS
23 U2 - U3 = 2
34 U3 = 34
Forward elimination is complete. Note that all pivots and multipliers were decided
by K, not f. The right side changed from f = (4, 0, 0) into the new c = (4, 2, !), and
back substitution can begin. Triangular systems are quick to solve (n2 operations).
1.3 Elimination Leads to K = LDLT 27
Solution by back substitution. The last equation gives u3 = 1. Substituting into the second equation gives ~u2 - 1 = 2. Therefore U2 = 2. Substituting into the first equation gives 2u1 - 2 = 4. Therefore u 1 = 3 and the system is solved.
The solution vector is u = (3, 2, 1). When we multiply the columns of K by those
three numbers, they add up to the vector f. I always think of a matrix-vector multiplication Ku as a combination of the columns of K. Please look:
Combine columns for Ku
That sum is f = (4, 0, 0). Solving a system Ku= f is exactly the same as finding
a combination of the columns of K that produces the vector f. This is important. The solution u expresses f as the "right combination" of the columns (with coefficients 3, 2, 1). For a singular matrix there might be no combination that
produces f, or there might be infinitely many combinations.
Our matrix K is invertible. When we divide Ku = (4, 0, 0) by 4, the right side
becomes (1, 0, 0) which is the first column of I. So we are looking at the first column
of K K-1 = I. We must be seeing the first column of K- 1. After dividing the previous
u = (3, 2, 1) by 4, the first column of K-1 must be ¾, ¾, ¼:
-! l[ !: :] [ ::] [-i :~ Column 1
of inverse
~
= I
(3)
If we really want K- 1, its columns come from Ku= columns of I. So K- 1 = K\I.
Note about the multipliers: When we know the pivot in row j, and we know the entry to be eliminated in row i, the multiplier eii is their ratio:
M uIti.p11. er
= 0
tij
-en-tr-y t-o .el-im-in-at-e -(in-ro-w-i)
(4)
pivot
(in row j)
The convention is to subtract (not add) eii times one equation from another equation. The multiplier €21 at our first step was-½ (the ratio of -1 to 2). That step added ½ of row 1 to row 2, which is the same as subtracting -½ (row 1) from row 2.
Subtract .f.ij times the pivot row j from row i. Then the i, j entry is 0.
- l The 3, 1 entry in the lower left corner of K was already zero. So there was nothing
to eliminate and that multiplier was €31 = 0. The last multiplier was €32 =
Elimination Produces K = LU
Now put those multipliers €21, €31 , €32 into a lower triangular matrix L, with ones on the diagonal. L records the steps of elimination by storing the multipliers. The
28 Chapter 1 Applied Linear Algebra
upper triangular U records the final result, and here is the connection K = LU:
K=LU
l [ l [ -! l . i J [ -12 -12 -10 =
-½1
o o
O
O2 -1 o
(5)
0 -1 2
0 --3 1 0 0 -3
The short and important and beautiful statement of Gaussian elimination is that
K = LU. Please multiply those two matrices L and U.
The lower triangular matrix L times the upper triangular matrix U recovers the original matrix K. I think of it this way: L reverses the elimination steps. This takes U back to K. LU is the "matrix form" of elimination and we have to emphasize it.
Suppose forward elimination uses the multipliers in L to change the rows of K into the rows of U (upper triangular). Then K is factored into L times U.
Elimination is a two-step process, going forward (down) and then backward (up). Forward uses L, backward uses U. Forward elimination reached a new right side
c. (The elimination steps are really multiplying by L-1 to solve Le = f.) Back substitution on Uu = c leads to the solution u. Then c = L-1f and u = u-1c combine into u = u-1L- 1f which is the correct u = K-1f.
Go back to the example and check that Le = f produces the right vector e:
Le= f
e1 = 4
e2 = 2 e3 = 34
as in (1).
By keeping the right side up to date in elimination, we were solving Le = f. Forward elimination changed f into c. Then back substitution quickly finds u = (3, 2, 1).
You might notice that we are not using "inverse matrices" anywhere in this computation. The inverse of K is not needed. Good software for linear algebra (the LAPACK library is in the public domain) separates Gaussian elimination into a factoring step that works on K, and a solving step that works on f:
Step 1. Factor K into LU
Step 2. Solve Ku = f for u
[L, U] = lu(K) in MATLAB
Le = f forward for c, then Uu = e backward
The first step factors K into triangular matrices L times U. The solution step computes e (forward elimination) and then u (back substitution). MATLAB should almost never be asked for an inverse matrix. Use the backslash command K\J to compute u, and not the inverse command inv(K) * f:
Step 1 + 2 Solve Ku= f by u = K\J (Backslash notices symmetry).
1.3 Elimination Leads to K = LDLT 29
The reason for two subroutines in LAPACK is to avoid repeating the same steps on K when there is a new vector f*. It is quite common (and desirable) to have several right sides with the same K. Then we Factor only once; it is the expensive part. The quick subroutine Solve finds the solutions u, u*, ... without computing K-1. For multiple f's, put them in the columns of a matrix F. Then use K\f.
Singular Systems
Back substitution is fast because U is triangular. It generally fails if a zero appears in the pivot. Forward elimination also fails, because a zero entry can't remove a nonzero entry below it. The official definition requires that pivots are never zero. If we meet a zero in the pivot position, we can exchange rows-hoping to move a nonzero entry up into the pivot. An invertible matrix has a full set of pivots.
When the column has all zeros in the pivot position and below, this is our signal that the matrix is singular. It has no inverse. An example is C.
l Example 1 Add -l's in the corners to get the circulant C. The first pivot is d1 = 2
with multipliers f21 = €31 = -½- The second pivot is d2 = But there is no third pivot:
In the language of linear algebra, the rows of C are linearly dependent. Elimination found a combination of those rows (it was their sum) that produced the last row of all zeros in U. With only two pivots, C is singular.
Example 2 Suppose a zero appears in the second pivot position but there is a nonzero below it. Then a row exchange produces the second pivot and elimination can continue. This example is not singular, even with the zero appearing in the 2, 2 position:
o~l leads to [~1 ~1
. Exchange rows to U =
Exchange rows on the right side of the equations too! The pivots become all ones, and elimination succeeds. The original matrix is invertible but not positive definite. (Its determinant is minus the product of pivots, so -1, because of the row exchange.)
The exercises show how a permutation matrix P carries out this row exchange. The triangular Land U are now the factors of PA (so that PA= LU). The original A had no LU factorization, even though it was invertible. After the row exchange, PA has its rows in the right order for LU. We summarize the three possibilities:
30 Chapter 1 Applied Linear Algebra
Elimination on an n by n matrix A may or may not require row exchanges:
No row exchanges to get n pivots: A is invertible and A= LU. Row exchanges by P to get n pivots: A is invertible and PA= LU. No way to find n pivots: A is singular. There is no inverse matrix A - 1 .
Positive definite matrices are recognized by the fact that they are symmetric and they need no row exchanges and all pivots are positive. We are still waiting for the meaning of this property-elimination gives a way to test for it.
Symmetry Converts K = LU to K = LDLT
The factorization K = LU comes directly from elimination-which produces U by
the multipliers in L. This is extremely valuable, but something good was lost. The
original K was symmetric, but L and U are not symmetric:
l [-½ ~ l [ -i -! l Symmetry
is lost
K = [-~ :~ -~ =
2
0 1 2
O --3 1
=LU. -3
The lower factor L has ones on the diagonal. The upper factor U has the pivots.
f This is unsymmetric, but the symmetry is easy to recover. Just separate the pivots
into a diagonal matrix D, by dividing the rows of U by the pivots 2, ~, and
Symmetry is recovered
(6)
Now we have it. The pivot matrix D is in the middle. The matrix on the left is still L. The matrix on the right is the transpose of L:
The symmetric factorization of a symmetric matrix is K = LDLT .
This triple factorization preserves the symmetry. That is important and needs to be highlighted. It applies to LDLT and to every other "symmetric product" ATCA.
The product LDLT is automatically a symmetric matrix, if D is diagonal. More than that, ATCA is automatically symmetric if C is symmetric. The factor A is not necessarily square and C is not necessarily diagonal.
The reason for symmetry comes directly from matrix multiplication. The transpose of any product AB is equal to BTAT. The individual transposes come in the opposite order, and that is just what we want:
This is LDLT again.
1.3 Elimination Leads to K = LDLT 31
(LT)T is the same as L. Also DT = D (diagonal matrices are symmetric). The displayed line says that the transpose of LDLT is LDLT. This is symmetry.
The same reasoning applies to ATCA. Its transpose is ATCT(AT)T. If C is symmetric (C = CT), then this is ATCA again. Notice the special case when the matrix in the middle is the identity matrix C = I:
For any rectangular matrix A, the product ATA is square and symmetric.
We will meet these products ATA and ATCA many times. By assuming a little more about A and C, the product will be not only symmetric but positive definite.
The Determinant of Kn
f ! !. Elimination on K begins with the three pivots and and This pattern continues.
The ith pivot is i1l · The last pivot is n!l · The product of the pivots is the
determinant, and cancelling fractions produces the answer n + 1:
Determinant of Kn
(7)
The reason is that determinants always multiply: (det K) = (det L)(det U). The
triangular matrix L has l's on its diagonal, so det L = 1. The triangular matrix U has
the pivots on its diagonal, so det U = product of pivots = n + 1. The LU factorization
not only solves Ku = f, it is the quick way to compute the determinant.
There is a similar pattern for the multipliers that appear in elimination:
Multipliers
n-1
fn,n-1 = - -n- •
(8)
All other multipliers are zero. This is the crucial fact about elimination on a tridiagonal matrix, that L and U are bidiagonal. If a row of K starts with p zeros (no elimination needed there), then that row of L also starts with p zeros. If a column of K starts with q zeros, then that column of U starts with q zeros. Zeros inside the band can unfortunately be "filled in" by elimination. This leads to the fundamental problem of reordering the rows and columns to make the p's and q's as large as possible. For our tridiagonal matrices the ordering is already perfect.
You may not need a proof that the pivots i-J:- l and the multipliers - i-: 1 are correct.
i
For completeness, here is row i of L multiplying columns i -
1,
i,
and
i
i
+
1
of
U:
l ~ [ _i•i1 1 ] [ i-~i-111
_
o
1
=[-12-l]=rowiofK.
The Thomas algorithm in Example 4 will solve tridiagonal systems in 8n steps.
32 Chapter 1 Applied Linear Algebra
Positive Pivots and Positive Determinants
I will repeat one note about positive definiteness (the matrix must be symmetric to start with). It is positive definite if all n pivots are positive. We need n nonzero pivots for invertibility, and we need n positive pivots (without row exchanges) for positive definiteness.
For 2 by 2 matrices [b~], the first pivot is a. The only multiplier is f21 = b/a.
Subtracting b/a times row 1 from row 2 puts the number c - (b2/a) into the second pivot. This is the same as (ac - b2 )/a. Please notice Land LT in K = LDLT:
2 by 2 factors
(9)
These pivots are positive when a> 0 and ac - b2 > 0. This is the 2 by 2 test:
[ ~ ~ ] is positive definite if and only if a > 0 and ac - b2 > 0 .
Out of four examples, the last three fail the test:
[~ ~ ] [! :] [~ ~]
pos def pos semidef indef
[
-2 -3
-3] -8
neg def
The matrix with b = 4 is singular (pivot missing) and positive semidefinite. The
matrix with b = 6 has ac - b2 = -20. That matrix is indefinite (pivots +2 and
-10). The last matrix has a< 0. It is negative definite even though its determinant
is positive.
Example 3 K3 shows the key link between pivots and upper left determinants:
[IiJ. ][~ -~ [tJ [~ -~ l l 0 1 2 =
0 -- 1
-
l ·
1
3
3
J, 1- The upper left determinants of K are 2, 3, 4. The pivots are their ratios 2,
All
upper left determinants are positive exactly when all pivots are positive.
Operation Counts
The factors L and U are bidiagonal when K = LU is tridiagonal. Then the work of elimination is proportional to n (a few operations per row) . This is very different from the number of additions plus multiplications to factor a full matrix. The leading terms are }n3 with symmetry and Jn3 in general. For n = 1000, we are comparing thousands of operations (quick) against hundreds of millions.
1.3 Elimination Leads to K = LDLT 33
Between those extremes (tridiagonal versus full) are band matrices. There might be w nonzero diagonals above and also below the main diagonal. Each row operation needs a division for the multiplier, and w multiplications and additions. With w
entries below each pivot, that makes 2w2 + w to clear out each column. With n
columns the overall count grows like 2w2n, still only linear in n.
On the right side vector f, forward elimination and back substitution use w multiply-adds per row, plus one division. A full matrix needs n2 multiply-adds on
the right side, [(n-1) + (n-2) + .. ·+ 1] forward and [1 + 2 + .. ·+ (n-1)] backward.
in This is still much less than 3 total operations on the left side. Here is a table:
Operation Count (Multiplies+ adds) Factor: Find L and U Solve: Forward and back on/
Full
~~ 32 n3
2n2
Banded Tridiagonal
2w2n+ wn
3n
4wn+n
5n
Example 4 The Thomas algorithm solves tridiagonal Au = f in 8n floating-point
operations. A has b1, ... , bn on the diagonal with a2, ... , an below and c1, ... , Cn-1 above.
Exchange equations i and i + 1 if lbil < lai+l I at step i. With no exchanges:
for i from 1 to n - 1
ci - c;/b; Ii - /;/bi bi+1 - bi+l - ai+1 ci /i+1 - /i+1 - ai+di
end forward loop Un f-- fn/bn for i from n - 1 to 1
Ui - Ii - CiUi+l end backward loop
Example 5 Test the commands [L, U, P] = lu(A) and P * A - *L U on A1, A 2 , A3 .
H:~ -il !~] !:] Ai-
A, - [~
A, - [~
For A1 , the permutation matrix P exchanges which rows? Always PA= LU. For A2 , MATLAB exchanges rows to achieve the largest pivots column by column. For A3 , which is not positive definite, rows are still exchanged: P -=I= I and U -=I= DLT.
Problem Set 1.3
1 Extend equation (5) into a 4 by 4 factorization K 4 = L4D4LJ. What is the
determinant of K4?
2
1. Find the inverses of the 3 by 3 matrices Land D and LT in equation (5).
2. Write a formula for the ith pivot of K.
3. Check that the i,j entry of L41 is j/i (on and below the diagonal) by multiplying L4L41 or L41L4.
34 Chapter 1 Applied Linear Algebra
3
1. Enter the matrix K5 by the MATLAB command toeplitz([2 -1 0 0 0]).
2. Compute the determinant and the inverse by det(K) and inv(K). For a neater answer compute the determinant times the inverse.
3. Find the L, D, U factors of K 5 and verify that the i,j entry of L-1 is j/i.
! ! ~]. 4 The vector of pivots for K 4 is d = [t
This is d = (2:5)./(1:4), using
MATLAB's counting vector i : j = (i,i + 1, ... ,j). The extra. makes the
division act a component at a time. Find f, in the MATLAB expression for
L = eye(4) - diag(f, -1) and multiply L * diag(d) * L' to recover K 4 .
5 If A has pivots 2, 7, 6 with no row exchanges, what are the pivots for the upper left 2 by 2 submatrix B (without row 3 and column 3)? Explain why.
6 How many entries can you choose freely in a 5 by 5 symmetric matrix K? How many can you choose in a 5 by 5 diagonal matrix D and lower triangular L (with ones on its diagonal)?
7 Suppose A is rectangular (m by n) and C is symmetric (m by m).
1. Transpose ATCA to show its symmetry. What shape is this matrix? 2. Show why ATA has no negative numbers on its diagonal.
8 Factor these symmetric matrices into A= LDLT with the pivots in D:
! ;] ! ~] A = [
and A = [
and
A =
21 [0
211
o;]
9 The Cholesky command A= chol(K) produces an upper triangular A with K = ATA. The square roots of the pivots from D are now included on the diagonal of A (so Cholesky fails unless K = KT and the pivots are positive).
Try the chol command on K 3 , T3 , B3 , and B3 + eps * eye(3).
10 The all-ones matrix ones(4) is positive semidefinite. Find all its pivots (zero not allowed). Find its determinant and try eig(ones(4)). Factor it into a 4 by 1 matrix L times a 1 by 4 matrix LT.
11 The matrix K = ones(4) + eye(4)/100 has all l's off the diagonal, and 1.01
down the main diagonal. Is it positive definite? Find the pivots by lu(K) and eigenvalues by eig(K). Also find its LDLT factorization and inv(K).
12 The matrix K =pasca1(4) contains the numbers from the Pascal triangle (tilted to fit symmetrically into K). Multiply its pivots to find its determinant. Factor K into LLT where the lower triangular L also contains the Pascal triangle!
13 The Fibonacci matrix [t A] is indefinite. Find its pivots. Factor it into LDLT.
Multiply (1, 0) by this matrix 5 times, to see the first 6 Fibonacci numbers.
1.3 Elimination Leads to K = LDLT 35
14 If A = LU, solve by hand the equation Ax = f without ever finding A itself.
Solve Le = f and then Ux = c (then LUx = Le is the desired equation Ax = !) .
Le = f is forward elimination and Ux = c is back substitution:
ol U=
[2
8 3
~
15 From the multiplication LS show that
is the inverse of S = [-t'2~ 1 ] ·
-t'31 0 1
S subtracts multiples of row 1 from lower rows. L adds them back.
16 Unlike the previous exercise, which eliminated only one column, show that
is not the inverse of S = [-t'2~ 1 ] ·
-t'31 -t'32 1
Write Las L1L2 to find the correct inverse L-1 = £"21£":11 (notice the order):
and
17 By trial and error, find examples of 2 by 2 matrices such that
1. LU f= UL 2. A2 = -1, with real entries in A 3. B2 = 0, with no zeros in B
4. CD= -DC, not allowing CD= 0
18 Write down a 3 by 3 matrix with row 1 - 2 * row 2 + row 3 = 0 and find a
similar dependence of the columns-a combination of columns that gives zero.
19 Draw these equations in their row form (two intersecting lines) and find the solution (x, y). Then draw their column form by adding two vectors:
20 True or false: Every matrix A can be factored into a lower triangular L times an upper triangular U, with nonzero diagonals. Find Land U when possible:
!] When is A = [~ = LU?
36 Chapter 1 Applied Linear Algebra
1.4 INVERSES AND DELTA FUNCTIONS
We are comparing matrix equations with differential equations. One is Ku = f, the other is -u" = J(x). The solutions are vectors u and functions u(x). This comparison is quite remarkable when special vectors f and functions J(x) are the forcing terms on the right side. With a uniform load J(x) = constant, both solutions are parabolas (Section 1.2). Now comes the opposite choice with f = point load:
In the matrix equation, take f = Dj = jth column of the identity matrix.
In the differential equation, take J(x) = 8(x - a) = delta function at x = a.
The delta function may be partly or completely new to you. It is zero except at one point. The function 8(x - a) represents a "spike" or a "point load" or an "impulse"
concentrated at the single point x = a. The solution u(x) or u(x, a), where a gives
the placement of the load, is the Green's function. When we know the Green's
function for all point loads 8(x - a), we can solve -u" = J(x) for any load f(x).
In the matrix equation Ku= 8j, the right side is column j of I. The solution is u = column j of K- 1. We are solving KK- 1 = I, column by column. So we are finding the inverse matrix, which is the "discrete Green's function." Like x and a, the discrete (K-1)ij locates the solution at point i from a load at point j.
The amazing fact is that the entries of K-1 and r-1 fall exactly on the solutions
u(x) to the continuous problems. The figures show this, and so will the text.
u(x)
slope drops from ½to -½
slope drops from Oto -1 free
fixed
u(O) = 0
1
2
u(l) = 0
u 1(0) = 0
1
2
u(l) = 0
r Figure 1.5: Middle columns of h K51 and h 5- 1 lie on solutions to -u" = 8(x - ½)-
Concentrated Load
Figure 1.5 shows the form of u(x), when the load is at the halfway point x = ½-
Away from this load, our equation is u" = 0 and its solution is u = straight line. The problem is to match the two lines (before and after ½) with the point load.
Example 1 Solve -u" = point load with fixed-fixed and free-fixed endpoints:
-
d2u
-dx2 =
f (x) =
8(x -
-1 ) 2
with
{
fixed: u(O) = 0 free: u'(O) = 0
and and
fixed: u(l) = 0 fixed: u(l) = 0
1.4 Inverses and Delta Functions 37
Solution In the fixed-fixed problem, the up and down lines must start and end at u = 0.
At the load point x = ½, the function u(x) is continuous and the lines meet. The slope
drops by 1 because the delta function has "area= 1". To see the drop in slope, integrate
both sides of -u" = 8 across x = ½:
= right d2u
I.right
I. - - dx
8(x -
1 - ) dx
left
dx 2
left
2
gives
_ (du) + (du) = 1. (l)
dx right
dx left
½ -½- The fixed-fixed case has u;eft = and u~ight =
The fixed-free case has u;eft = 0 and
u~ight = -1. In every case the slope drops by 1 at the unit load.
These solutions u(x) are ramp functions, with a corner. In the rest of the
section we move the load to x = a and compute the new ramps. (The fixed-fixed
ramp will have slopes 1 - a and -a, always dropping by 1.) And we will find discrete
r- ramps for the columns of the inverse matrices K-1 and 1. The entries will increase
linearly, up to the diagonal of K-1, then go linearly down to the end of the column.
It is remarkable to be finding exact solutions and exact inverses. We take this chance to do it. These problems are exceptionally simple and important, so why not?
Example 2 Move the point load to x = a. Everywhere else u" = 0, so the solution
is u = Ax + B up to the load. It changes to u = Cx + D beyond that point. Four
equations (two at boundaries, two at x = a) determine those constants A, B, C, D:
Boundary Conditions
Jump/No Jump Conditions at x = a
fixed u(O) = 0 :
B
0
fixed u(l) = 0 : C + D = 0
No jump in u: Aa+B
Ca+D
Drop by 1 in u' :
A = C+l
Substitute B = 0 and D = -C into the first equation on the right:
Aa + 0 = Ca - C and A = C + 1 give slopes A = 1 - a and C = -a. (2)
Then D = -C = a produces the solution in Figure 1.6. The ramp is u = (1- a)x going up and u = a(l - x) going down. On the right we show a column of K- 1 , computed in
equation (10): linear up and down.
u(x)
Column 5 of K61
u=a(l-x)
2-5
7
=Cx+D
0
a--§7. 1
1 2 3 4 5 6
Row i
Figure 1.6: Response at x to a load at a= ¥ (fixed-fixed boundary). For the matrix
K61, the entries in each column go linearly up and down like the true u(x).
38 Chapter 1 Applied Linear Algebra
Delta Function and Green's Function
We solve -u" = 8(x - a) again, by a slightly different method (same answer). A particular solution is a ramp. Then we add all solutions Cx+D to u" = 0. Example 2
used the boundary conditions first, and now we use them last.
You must recognize that c5(x) and c5(x - a) are not true functions! They are zero
except at one point x = 0 or x = a where the function is "infinite"-too vague. The
spike is "infinitely tall and infinitesimally thin." One definition is to say that the
integral of c5(x) is the unit step function S (x) in Figure 1.7. "The area is 1 under the
spike at x = 0." No true function could achieve that, but c5(x) is extremely useful.
The standard ramp function is R = 0 up to the corner at x = 0 and then R = x.
Its slope dR/dx is a step function. Its second derivative is d2R/dx2 = 8(x).
_ c 8(x)=ddSx
S ( x )d= x d R - L -
0 Delta function c5(x)
X
0 Step function S(x)
X
0
Ramp function R(x)
Figure 1.7: The integral of the delta function is the step function, so c5(x) = dS/dx. The integral of the step S(x) is the ramp R(x), so c5(x) = d2R/dx2 .
Now shift the three graphs by a. The shifted ramp R(x - a) is O then x - a. This has first derivative S(x - a) and second derivative c5(x - a). In words, the first
derivative jumps by 1 at x = a so the second derivative is a delta function.
Since our equation -d2u/dx2 = 8(x - a) has a minus sign, we want the slope to drop by 1. The descending ramp - R(x - a) is a particular solution to -u" = c5(x - a).
Main point We have u" = 0 except at x = a. So u is a straight line on the left and right of a. The slope of this ramp drops by l at a, as required by -u" = 8(x - a). The downward ramp -R(x - a) is one particular solution, and we can add Cx + D.
The two constants C and D came from two integrations.
The complete solution (particular + nullspace) is a family of ramps:
Complete solution
-
d2u dx2
= 8(x -
a)
is solved by
u(x) = -R(x - a)+ Cx + D.
(3)
The constants C and D are determined by the boundary conditions.
u(0) = -R(0 - a)+ C • 0 + D = 0. Therefore D must be zero.
From u(l) = 0 we learn that the other constant (in Cx) is C = l - a:
u(l) = -R(l - a)+ C • l + D = a - l + C = 0. Therefore C = l - a.
So the ramp increases with slope 1 - a until x = a. Then it decreases to u(l) = 0. When we substitute R = 0 followed by R = x - a, we find the two parts:
1.4 Inverses and Delta Functions 39
FIXED
{ (1 - a)x for x::; a
ENDS u(x) = -R(x - a)+ (l - a)x = (1 - x)a for x 2': a
(4)
The slope of u(x) starts at 1 - a and drops by 1 to -a. This unit drop in the slope means a delta function for -d2u/dx2 , as required. The first part (1 - a)x gives u(0) = 0, the second part (1 - x)a gives u(l) = 0.
Please notice the symmetry between x and a in the two parts! Those are like i
and j in the symmetric matrix (K-1)ij = (K-1)ji· The response at x to a load
at a equals the response at a to a load at x. It is the "Green's function."
Free-Fixed Boundary Conditions
When the end x = 0 becomes free, that boundary condition changes to u'(0) = 0.
This leads to C = 0 in the complete solution u(x) = -R(x - a)+ Cx + D:
Set x = 0: u'(0) = 0 + C + 0. Therefore C must be zero.
Then u(l) = 0 yields the other constant D = 1 - a:
Set x = 1: u(l) = -R(l - a)+ D = a - 1 + D = 0. Therefore D = 1 - a.
The solution is a constant D (zero slope) up to the load at x = a. Then the slope
drops to -1 (descending ramp). The two-part formula for u(x) is
u=l-a
FREEFIXED
u(x) = {
1 1 -
a x
for for
x::; a
x 2': a
(5)
0
a--3 ~ 1
Free-Free: There is no solution for f = c5(x - a) when both ends are free. If we require u'(0) = 0 and also u'(l) = 0, those give conditions on C and D that cannot
be met. A ramp can't have zero slope on both sides (and support the load). In the
same way, the matrix B is singular and BB- 1 = I has no solution.
J The free-free problem does have a solution when J(x) dx = 0. The example
in problem 7 is J(x) = c5 (x - ½) - c5 (x - i)- The problem is still singular and it
has infinitely many solutions (any constant can be added to u(x), without changing
u 1(0) = 0 and u'(l) = 0).
J Integrating -u 11 = J(x) from Oto 1 gives this requirement J(x) dx = 0. The
integral of -u" is u 1(0) - u 1(1), and free-free boundary conditions make that zero.
In the matrix case, add then equations Bu= f to get O=Ji+···+ fn as the test.
40 Chapter 1 Applied Linear Algebra
Discrete Vectors: Load and Step and Ramp
Those solutions u(x) in (4) and (5) are the Green's functions G(x,a) for fixed ends
r- and free-fixed ends. They will correspond to K-1 and 1 in the matrix equations.
The matrices have second differences in place of second derivatives. It is a pleasure to see how difference equations imitate differential equations. The
crucial equation becomes /:).2R = 8. This copies R"(x) = 8(x):
The delta vector 8 has one nonzero component 80 = 1:
The step vector S has components Si= 0 or 1:
The ramp vector R has components Ri = 0 or i:
8 = (... , 0, 0, 1, 0, 0, .. .
S = (... ,0, 0, 1, 1, 1, .. . R = (... ,0, 0, 0, 1, 2, .. .
These vectors are all centered at i = 0. Notice that !:)._S = 8 but !:).+R = S. We need a backward !:)._ and a forward!:).+ to get a centered second difference /:).2 =
!:)._!:).+· Then 1:).2R = !:)._S = 8. Matrix multiplication shows this clearly:
.6.2 (ramp)
(6)
The ramp vector R is piecewise linear. At the center point, the second difference
jumps to R1 - 2Ro + R_1 = 1. At all other points (where the delta vector is zero) the ramp solves 1:).2 R = 0. Thus 1:).2R = 8 copies R"(x) = 8(x).
0
-o-----o--+-o-- 2 -1 0 1 2
s
0 0 0
---o---o-+----+ -2 -1 0 1 2
0
R
0
-0------0------ 2 -1 0 1 2
Figure 1.8: The delta vector 8 and step vector S and ramp vector R. The key relations
= = = are t5 -6._S (backward) and S -6.+R (forward) and t5 .6.2 R (centered).
The solutions to d2u/dx2 = 0 are linear functions Cx + D. The solutions to I:).2u = 0 are "linear vectors" with u, = Ci+ D. The equation ui+1 - 2u, + u,_1 = 0 is satisfied by constant vectors and linear vectors, since (i + 1) - 2i + (i - 1) = 0. The complete solution to !:)..2u = 8 is + Uparticular Unullspace· Thus Ui = R +Ci+ D.
I want to emphasize that this is unusually perfect. The discrete Ri +Ci+ D is an
exact copy of the continuous solution u(x) = R(x) + Cx + D. We can solve /:). 2u = 8
by sampling the ramp u(x) at equally spaced points, without any error.
1.4 Inverses and Delta Functions 41
The Discrete Equations Ku= ,Si and Tu= ,Si
In the differential equation, the point load and step and ramp moved to x = a. In the difference equation, the load moves to component j. The right side Dj has components Di-j, zero except when i = j. Then the shifted step and shifted ramp have components Si-j and R-j, also centered at j.
The fixed-ends difference equation from -u"(x) = 8(x - a) is now -b.2u = Dj:
-
A2
u
ui
=
-ui+I + 2ui -
ui-I
=
{l0iiff ii=i- jj
w.ith uo = 0 and Un+1 = 0.
(7)
The left side is exactly the matrix-vector multiplication Knu. The minus sign in -b.2
changes the rows 1, -2, 1 to their positive definite form -1, 2, -1. On the right side,
the shifted delta vector is the jth column of the identity matrix. When the load is at
meshpoint j = 2, the equation is column 2 of K K-1 = I:
l[~: l [ l - [-i -~ -~ ~ n=4
j=2
0 -1 2 -1
U3
~
0
j =2
(8)
0 0 -1 2 U4
0
When the right sides are the four columns of I (with j = 1, 2, 3, 4) the solutions are the four columns of Ki 1. This inverse matrix is the discrete Green's function.
What is the solution vector u? A particular solution is the descending ramp
- R-j, shifted by j and sign-reversed. The complete solution includes Ci+ D, which solves b.2u = 0. Thus ui = -R-j +Ci+ D. The constants C and Dare determined by the two boundary conditions u0 = 0 and Un+1 = 0:
u0 = -Ro-j + C • 0 + D = 0.
Un+I = -Rn+1-j + C(n + 1) + 0 = 0.
Therefore D must be zero
(9)
n¼r +1 •
Therefore C = nn+? = 1 -
(10)
Those results are parallel to D = 0 and C = 1 - a in the differential equation. The
tilted ramp u = -R + Ci in Figure 1.9 increases linearly from u0 = 0. Its peak is at
the position j of the point load, and the ramp descends linearly to Un+I = 0:
FIXED
. { (n;~~j) i for i :::; j
ENDS
Ui = -R-j +Ci= (n+l-i) .
. .
n+l J for z?::. J
(11)
Those are the entries of K;;1 (asked for in earlier problem sets). Above the diagonal, for i :::; j, the ramp is zero and ui = Ci. Below the diagonal, we can just exchange i and j, since we know that K;;1 is symmetric. These formulas for the vector u are exactly parallel to (1-a)x and (1-x)a in equation (4) for the fixed-ends continuous problem.
Figure 1.9 shows a typical case with n = 4 and the load at j = 2. The formulas
i, !, }) . i in (11) give u = (~,
The numbers go linearly up to (on the main diagonal
of K:41). Then 4/5 and 2/5 go linearly back down. The matrix equation (8) shows
that this vector u should be the second column of Ki 1, and it is.
42 Chapter 1 Applied Linear Algebra
0 1 2 3 4 5
f = load at j = 2 = column 2 of J
u = response to load = column 2 of K- 1
K_1
4
=
_!:
5
[
3 4 2
6 3 4
4 2 6
2l l 3
1 2 3 4
Figure 1.9: K 4u = 82 has the point load in position j = 2. The equation is column 2
of K4 K 41 = I. The solution u is column 2 of K41.
The free-fixed discrete equation Tu = f can also have / = OJ = point load at j:
Discrete -,!:),.2u, = 8,-J with u1 - u0 = 0 (zero slope) and Un+1 = 0.
(12)
The solution is still a ramp ui = - R,,_i + Ci + D with corner at j. But the constants C and D have new values because of the new boundary condition u 1 = u0 :
u1 - uo = 0 + C + 0 = 0 + Un+l = -Rn+l-J D = 0
so the first constant is C = 0
(13)
so the second constant is D = n + 1 - j. (14)
Those are completely analogous to C = 0 and D = 1 - a in the continuous problem
above. The solution equals D up to the point load at position j. Then the ramp
descends to reach Un+I = 0 at the other boundary. The two-part formula -R,,-J +D, before and after the point load at i = j, is
FREEFIXED
u, =
-R,,-J + (n + 1- J.) =
{
nn
++1l
-
j i
for i:::; j for i ~ j
(15)
r- The two parts are above and below the diagonal in the matrix 1. The point loads
at j = 1, 2, 3, ... lead to columns 1, 2, 3, ... and you seen+ 1 - 1 in the corner:
~ ' i 0 1 2 3 4 5
!= load at j = 2
column 2 of J
U= response to load
column 2 of r-1
T4-
1
_ -
r- Figure 1.10: T4u = 82 is column 2 of TT- 1 = I, sou= column 2 of 1.
r- This 1 is the matrix that came in Section 1.2 by inverting T = UTU. Each r- column of 1 is constant down to the main diagonal and then linear, just like
u(x) = 1 - a followed by u(x) = 1 - x in the free-fixed Green's function u(x, a).
1.4 Inverses and Delta Functions 43 Green's Function and the Inverse Matrix
If we can solve for point loads, we can solve for any loads. In the matrix case this is immediate (and worth seeing). Any vector f is a combination of n point loads:
(16)
The inverse matrix multiplies each column to combine three point load solutions:
Matrix multiplication u = K-1f is perfectly chosen to combine those columns.
In the continuous case, the combination gives an integral not a sum. The load J(x) is an integral of point loads f(a)8(x - a). The solution u(x) is an integral over all a of responses u(x, a) to those loads at each point a:
1 1 -u" = J(x) = 1J(a)8(x - a)da is solved by u(x) = 1J(a)u(x, a)da. (18)
The Green's function u(x, a) corresponds to "row x and column a" of a continuous K-1. We will see it again. To summarize, we repeat formulas (4) and (5) for u(x, a):
FIXED u _ { (1- a)x for x:::; a ENDS - (1 - x)a for x 2: a
FREE
{ 1 - a for x < a
FIXED u = 1 - x for x ~ a
(19)
If we sample the fixed-ends solution at x = ntl, when the load is at a= n{l, then
we (almost!) have the i,j entry of K;;1. The only difference between equations (11)
and (19) is an extra factor of n + 1 = 1/f::J..x. The exact analogy would be this:
(8) - d~xu2 = 8(x) corresponds to (f::JK..x) 2 U = f::J..x •
(20)
We divide K by h2 = (b.x) 2 to approximate the second derivative. We divide 8 by h = f::J..x because the area should be 1. Each component of 8 corresponds to a little x-interval of length f::J..x, so area = 1 requires height = 1/ l::J..x. Then our u is U/ l::J..x.
Cl WORKED EXAMPLES Cl
r- 1.4 The "Woodbury-Sherman-Morrison formula" will find K-1 from 1. This for-
mula gives the rank-one change in the inverse, when the matrix has a rank-one change
in K = T - uvT. In this example the change is only +1 in the 1, 1 entry, coming from T11 = 1 + K11 . The column vectors are v = (1, 0, ... 0) = -u.
44 Chapter 1 Applied Linear Algebra
Here is one of the most useful formulas in linear algebra (it extends to T- U VT):
Woodbury-Sherman-Morrison
Inverse of K = T - uvT
(21)
The proof multiplies the right side by T - uvT, and simplifies to I.
Problem 1.1.7 displays r-1 - K-1 when the vectors have length n = 4:
vTT- 1 = rowlofT- 1 = [4 3 2 1] 1-vTT-1u=1+4=5.
For any n, K-1 comes from the simpler r- 1 by subtracting wTwj(n+l) with w =n:-1:1
Problem Set 1.4
1 For -u" = 8(x - a), the solution must be linear on each side of the load. What four conditions determine A, B, C, D if u(0) = 2 and u(l) = 0?
u(x) =Ax+ B for O:::; x:::; a and u(x) = Cx + D for a:::; x:::; 1.
2 Change Problem 1 to the free-fixed case u'(0) = 0 and u(l) = 4. Find and solve the four equations for A, B, C, D.
l 3 Suppose there are two unit loads, at the points a = ½and b = Solve the
fixed-fixed problem in two ways: First combine the two single-load solutions. The other way is to find six conditions for A, B, C, D, E, F:
u(x) =Ax+ B for x:::; 31,
Cx+D
1
2
for3-<- x<-3-,
Ex + F
for
x
?:
2
3
.
4 Solve the equation -d2u/dx2 = 8(x - a) with fixed-free boundary conditions u(0) = 0 and u'(l) = 0. Draw the graphs of u(x) and u'(x).
5 Show that the same equation with free-free conditions u'(0) = 0 and u'(l) = 0 has no solution. The equations for C and D cannot be solved. This corresponds to the singular matrix Bn (with 1, 1 and n, n entries both changed to 1).
6 Show that -u" = 8(x - a) with periodic conditions u(0) = u(l) and u'(0) = u'(l) cannot be solved. Again the requirements on C and D cannot be met. This corresponds to the singular circulant matrix Cn (with 1, n and n, 1 entries changed to -1).
7 A difference of point loads, J(x) = 8(x - ½) - 8(x - i), does allow a freefree solution to -u" = f. Find infinitely many solutions with u'(0) = 0 and
u'(l) = 0.
8 The difference f(x) = 8(x - ½) - 8(x - i) has zero total load, and -u" = J(x)
can also be solved with periodic boundary conditions. Find a particular solution Upart(x) and then the complete solution Upart + Unull·
1.4 Inverses and Delta Functions 45
9 The distributed load f(x) = 1 is the integral of loads o(x - a) at all points
x = a. The free-fixed solution u(x) = ½(1- x2) from Section 1.3 should then be
the integral of the point-load solutions (1 - x for a :s; x, and 1 - a for a~ x):
1 u(x) =
x
11
(1-x) da+ (1-a)
da
=
12
x2
(1-x)x+(l--)-(x--)
=
1 --
-1x2.
YES!
0
X
2
2 22
Check the fixed-fixed case u(x) = f0x(l - x)ada + J:(1 - a)xda = __ .
10 If you add together the columns of K-1 (or r-1), you get a "discrete parabola"
that solves the equation Ku = f (or Tu = f) with what vector f? Do this
addition for K41 in Figure 1.9 and r4- 1 in Figure 1.10.
Problems 11-15 are about delta functions and their integrals and derivatives
11 The integral of o(x) is the step function S(x). The integral of S(x) is the ramp R(x). Find and graph the next two integrals: the quadratic spline Q(x) and the cubic spline C(x). Which derivatives of C(x) are continuous at x = 0?
12 The cubic spline C(x) solves the fourth-order equation u'111 = o(x). What is the complete solution u(x) with four arbitrary constants? Choose those constants so that u(l) = u"(l) = u(-1) = u"(-1) = 0. This gives the bending of a uniform simply supported beam under a point load.
13 The defining property of the delta function o(x) is that
1: o(x) g(x) dx = g(0) for every smooth function g(x).
How does this give "area= l" under o(x)? What is Jo(x - 3) g(x) dx?
14 The function o(x) is a "weak limit" of very high, very thin square waves SW:
1: SW(x) = 2~ for !xi :s; h has
SW(x) g(x) dx -t g(0) as h -t 0.
For a constant g(x) = 1 and every g(x) = xn, show that J SW(x)g(x) dx -t
g(0). We use the word "weak" because the rule depends on test functions g(x).
15 The derivative of o(x) is the doublet o'(x). Integrate by parts to compute
1: -1: g(x) o'(x) dx =
(?) o(x) dx = (??) for smooth g(x).
46 Chapter 1 Applied Linear Algebra
1.5 EIGENVALUES AND EIGENVECTORS
This section begins with Ax= AX. That is the equation for an eigenvector x and its
eigenvalue A. We can solve Ax= AX for small matrices, starting from det(A-AI) = 0.
Possibly you know this already (it would be a horrible method for large matrices). There is no "elimination" that leads in a finite time to the exact A and x. Since A
multiplies x, the equation Ax = Ax is not linear.
One great success of numerical linear algebra is the development of fast and stable algorithms to compute eigenvalues (especially when A is symmetric). The command eig(A) in MATLAB produces n numbers A1, ... , An and not a formula. But this chapter is dealing with special matrices ! For those we will find A and x exactly.
Part I: Applying eigenvalues to diagonalize A and solve u 1 = Au.
Part II: The eigenvalues of Kn, Tn, Bn, Cn are all A= 2 - 2 cos 0.
The two parts may need two lectures. The table at the end reports all we know about A and x for important classes of matrices. The first big application of eigenvalues is to Newton's Law Mu"+ Ku= 0 in Section 2.2.
Part I: Ax = .Xx and Akx = _xkx and Diagonalizing A
Almost every vector changes direction when it is multiplied by A. Certain exceptional vectors x lie along the same line as Ax. Those are the eigenvectors. For an eigenvector, Ax is a number ..X times the original x.
The eigenvalue A tells whether the special vector x is stretched or shrunk or reversed or left unchanged, when it is multiplied by A. We may find A= 2 (stretching)
or A = ½(shrinking) or A = -1 (reversing) or A = 1 (steady state, because Ax = x is
unchanged). We may also find A= 0. If the nullspace contains nonzero vectors, they have Ax= Ox. So the nullspace contains eigenvectors corresponding to A= 0.
For our special matrices, we will guess x and then discover A. For matrices in general, we find A first. To separate A from x, start by rewriting the basic equation:
Ax= .Xx means that (A - ..XI)x = 0.
The matrix A-AI must be singular. Its determinant must be zero. The eigenvector x will be in the nullspace of A- >.I. The first step is to recognize that A is an eigenvalue exactly when the shifted matrix A - >.I is not invertible:
The number ..X is an eigenvalue of A if and only if det(A - AI) = 0.
This "characteristic equation" det(A - AI) = 0 involves only A, not x. The
determinant of A - >.I is a polynomial in A of degree n. By the Fundamental Theorem of Algebra, this polynomial must haven roots A1, ... , An. Some of those eigenvalues may be repeated and some may be complex-those cases can give us a little trouble.
1.5 Eigenvalues and Eigenvectors 47
Example 1 Start with the special 2 by 2 matrix K = [2 -1; -1 2]. Estimate K 100 .
Step 1 Subtract >. from the diagonal to get
K
_
'I "
=
[2 - >.
-1
-1 ] 2->..
Step 2 Take the determinant of this matrix. That is (2 - >.) 2 - 1 and we simplify:
det(K->.I)=
1
2_-
> 1
.
2-_1>. I =>.2 -4>.+3.
Step 3 Factoring into >. - 1 times >. - 3, the roots are 1 and 3:
).2 - 4>. + 3 = 0 yields the eigenvalues >.1 = 1 and >.2 = 3.
Now find the eigenvectors by solving (K - >.I)x = 0 separately for each >.:
K-1=
[
1 -1
-lJ 1
K-31=
[
-1 -1
-lJ -1
leads to leads to
As expected, K -I and K -31 are singular. Each nullspace produces a line of eigenvectors. We chose x1 and x2 to have nice components 1 and -1, but any multiples c1x1 and c2x2 (other than zero) would have been equally good as eigenvectors. The MATLAB choice is c1 = c2 = 1/../2, because then the eigenvectors have length 1 (unit vectors).
These eigenvectors of K are special (since K is). If I graph the functions sin 1rx
i and sin 21rx, their samples at the two mesh points x = ½and are the eigenvectors in
Figure 1.11. (The functions sink1rx will soon lead us to the eigenvectors of Kn.)
K 100 will grow like 3 100 because A-ma:i: = 3.
An exact formula for 2K100 is
~ U 1100 [
from ). = 1
+ 3 100
[
-
1 1
-l1J from>.= 3
sin 1rx has samples (sin i, sin 2;) = c(l, 1)
1
3 sin 21rx has samples
(sin 2;, sin 4;) = c(l, -1)
~ Figure 1. 11: The eigenvectors of [ _ - ~] lie on the graphs of sin 1rx and sin 21rx.
48 Chapter 1 Applied Linear Algebra
Example 2
l Here is a 3 by 3 singular example, the circulant matrix C = 0 3: 2 _ A -1 -1 and C-Al= [ -1 2-A -1 . -1 -1 2-A
A little patience (3 by 3 is already requiring work) produces the determinant and its
factors:
det(C - AI)= -A3 + 6A2 - 9A = -A(A - 3)2 .
This third degree polynomial has three roots. The eigenvalues are A1 = 0 (singular
matrix) and A2 = 3 and A3 = 3 (repeated root!). The all-ones vector x 1 = (1, 1, 1) is
in the nullspace of C, so it is an eigenvector for A1 = 0. We hope for two independent
eigenvectors corresponding to the repeated eigenvalue A2 = A3 = 3:
[=~ =~ =~] C - 3/ =
has rank 1 (doubly singular).
-1 -1 -1
Elimination will zero out its last two rows. The three equations in (C - 3I)x = 0 are
all the same equation -x1 - x2 - x3 = 0, with a whole plane of solutions. They are all eigenvectors for A = 3. Allow me to make this choice of eigenvectors x 2 and x3 from that plane of solutions to Cx = 3x:
With this choice, the x's are orthonormal (orthogonal unit vectors). Every symmetric matrix has a full set of n perpendicular unit eigenvectors.
For an n by n matrix, the determinant of A - Al will start with (-Ar. The rest of this polynomial takes more work to compute. Galois proved that an algebraic
formula for the roots A1, ... , An is impossible for n > 4. (He got killed in a duel, but
not about this.) That is why the eigenvalue problem needs its own special algorithms, which do not begin with the determinant of A - Al.
The eigenvalue problem is harder than Ax= b, but there is partial good news. Two coefficients in the polynomial are easy to compute, and they give direct information about the product and sum of the roots A1 , ... , An.
The product of then eigenvalues equals the determinant of A. This is the constant term in det(A - AI):
= Determinant Product of .X's
(2)
The sum of the n eigenvalues equals the sum of the n diagonal entries. The trace is the coefficient of (-Ar-1 in det(A - AI).
= Trace
Sum of .X's
A1 + A2 + · · · + An= an + a22 + · · · + ann = sum down diagonal of A.
(3)
1.5 Eigenvalues and Eigenvectors 49
Those checks are very useful, especially the trace. They appear in Problems 20 and 21. They don't remove all the pain of computing det(A - >-.I) and its factors. But when the computation is wrong, they generally tell us so. In our examples,
.X = 1,3 K =
has trace 2 + 2 = 1 + 3 = 4. det(K) = 1 • 3.
.X = 0,3,3 C=
has trace 2 + 2 + 2 = 0 + 3 + 3 = 6. det(C) = 0
Let me note three important facts about the eigenvalue problem Ax= >-.x.
1. If A is triangular then its eigenvalues lie along its main diagonal.
The determinant of [4 ~ ).. 2 ~ )..] is (4 - >-.) (2 - >-.), so >-. = 4 and >-. = 2.
2. The eigenvalues of A 2 are .X~, ... , .X!. The eigenvalues of A-1 are l/.X1, ... , 1/.Xn.
Multiply Ax = >-.x by A. Multiply Ax= >-.x by A-1.
Then A 2x = >-.Ax = >-.2x. Then x = >-.A- 1x and A-1x = ½x.
Eigenvectors of A are also eigenvectors of A2 and A- 1 (and any function of A).
3. Eigenvalues of A + B and AB are not known from eigenvalues of A and B.
rn ~] A =
~ ~ ~ and B = [ ~] yield A + B = [ ~] and AB = [ ~] .
A and B have zero eigenvalues (triangular matrices with zeros on the diagonal). But the eigenvalues of A+ Bare 1 and -1. And AB has eigenvalues 1 and 0.
In the special case when AB = BA, these commuting matrices A and B will share eigenvectors: Ax= >-.x and Bx= >-.*x for the same eigenvector x. Then we do have (A+ B)x = (>-. + >-.*)x and ABx = >-.>-.*x. Eigenvalues of A and B can now be added and multiplied. (With B = A, the eigenvalues of A+ A and A2 are>-.+>-. and >-.2 .)
Example 3 A Markov matrix has no negative entries and each column adds to 1 (some authors work with row vectors and then each row adds to one):
Markov matrix
A= [·8 .3] .2 .7
has >-. = 1 and .5 .
Every Markov matrix has>-.= 1 as an eigenvalue (A- I has dependent rows). When the trace is .8 + .7 = 1.5, the other eigenvalue must be >-. = .5. The determinant of A must be (>-.1)(>-.2) = .5 and it is. The eigenvectors are (.6, .4) and (-1, 1).
50 Chapter 1 Applied Linear Algebra
Eigshow
A MATLAB demo (just type eigshow) displays the eigenvalue problem for a 2 by 2
matrix. Figure 1.12 starts with the vector x = (l, 0). The mouse makes this vector
move around the unit circle. At the same time the screen shows Ax, in color and also moving. Possibly Ax is ahead of x. Possibly Ax is behind x. Sometimes Ax is parallel to x. At that parallel moment, Ax equals >.x.
.! A = [:~ :~] has eigenvalues ~:
y
is behind y = [~]
Not eigenvectors
[:~] is ahead of x = [~]
X
\
I
/
,._ ___ ,,,circle of x's
Figure 1.12: Eigshow for the Markov matrix: x1 and x2 line up with Ax1 and Ax2 .
The eigenvalue >. is the length of Ax, when it is parallel to the unit eigenvector x. On web.mit.edu/18.06 we added a voice explanation of what can happen. The choices for A illustrate three possibilities, 0 or 1 or 2 real eigenvectors:
1. There may be no real eigenvectors. Ax stays behind or ahead of x. This means the eigenvalues and eigenvectors are complex (as for a rotation matrix).
2. There may be only one line of eigenvectors (unusual). The moving directions
Ax and x meet but don't cross. This can happen only when >.1 = >.2.
3. There are two independent eigenvectors. This is typical! Ax crosses x at the first eigenvector x1, and it crosses back at the second eigenvector x2 (also at -x1 and -x2). The figure on the right shows those crossing directions: x is parallel to Ax. These eigenvectors are not perpendicular because A is not symmetric.
The Powers of a Matrix
Linear equations Ax = b come from steady state problems. Eigenvalues have their
greatest importance in dynamic problems. The solution is changing with timegrowing or decaying or oscillating or approaching a steady state. We cannot use elimination (which changes the eigenvalues). But the eigenvalues and eigenvectors tell us everything.
1.5 Eigenvalues and Eigenvectors 51
Example 4 The two components of u(t) are the US populations east and west of the
Mississippi at time t. Every year, 180 of the eastern population stays east and 120 moves
west. At the same time i7o of the western population stays west and 130 moves east:
u(t+l) = Au(t)
t] . [east
west
at at
ti_me time
t t
+ +
1 1
]
=
[·8 .2
.3] [east at ti_me .7 west at time t
Start with a million people in the east at t = 0. After one year (multiply by A), the
numbers are 800,000 and 200,000. Nobody is created or destroyed because the columns add to 1. Populations stay positive because a Markov matrix has no negative entries.
The initial u = [1,000, 000 0] combines the eigenvectors [ 600, 000 400,000] and
[400,000 -400, 000 ].
After 100 steps the populations are almost at steady state because (½) 100 is so small:
Steady state plus transient
u(lOO)
=
[
600,000] 400,000
2 + ( 1) lOO
[ 400,000] -400,000
You can see the steady state directly from the powers A, A2, A3 , and A100 :
A= [·8 .3] A2 = [.70 .45] A3 = [.650 .525] A 100 = [.6000 .6000]
.2 .7
.30 .55
.350 .475
.4000 .4000
Three steps find uk = A ku0 from the eigenvalues and eigenvectors of A.
Step 1. Write uo as a combination of the eigenvectors uo = a1x1 + ···+ anXn.
Step 2. Multiply each number ai by (>.i/-
Step 3. Recombine the eigenvectors into uk = a1(>.1/x1 + ···+ an(>.n/Xn.
In matrix language this is exactly uk = SAks- 1u0 . S has the eigenvectors of
A in its columns. The diagonal matrix A contains the eigenvalues:
Step 1.
[x,= Wdte Uo
Step 2.
SAka which is
= uk SAkS-1u 0 .
52 Chapter 1 Applied Linear Algebra
Step 2 is the fastest-just n multiplications by Af. Step 1 solves a linear system to analyze u 0 into eigenvectors. Step 3 multiplies by S to reconstruct the answer uk.
This process occurs over and over in applied mathematics. We see the same steps
next for du/dt = Au, and again in Section 3.5 for A-1. All of Fourier series and signal
processing depends on using the eigenvectors in exactly this way (the Fast Fourier Transform makes it quick). Example 4 carried out the steps in a specific case.
Diagonalizing a Matrix
For an eigenvector x, multiplication by A just multiplies by a number: Ax = AX.
All the n by n difficulties are swept away. Instead of an interconnected system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy.
The matrix A turns into a diagonal matrix A when we use the eigenvectors properly. This is the matrix form of our key idea. Here is the one essential computation.
Suppose the n by n matrix A has n linearly independent eigenvectors x1 , ... , Xn-
s- Those are the columns of an eigenvector matrix S. Then 1AS= A is diagonal:
Diagonalization
s-1AS = A = [Ai . .
] = eigen~lue matrix
(4)
An
We use capital lambda for the eigenvalue matrix, with the small A's on its diagonal.
Proof Multiply A times its eigenvectors x1, ... , Xn, which are the columns of S. The first column of AS is Ax1. That is A1x1:
A times S
The trick is to split this matrix AS into S times A:
Keep those matrices in the right order! Then A1 multiples the first column x1, as shown. We can write the diagonalization AS= SA in two good ways:
AS= SA is s- 1AS = A or A= SAs- 1.
(5)
The matrix S has an inverse, because its columns (the eigenvectors of A) were assumed to be independent. Without n independent eigenvectors, we cannot diagonalize A. With no repeated eigenvalues, it is automatic that A has n independent eigenvectors.
1.5 Eigenvalues and Eigenvectors 53
Application to Vector Differential Equations
~f A single differential equation = ay has the general solution y(t) = Ceat. The
initial value y(0) determines C. The solution y(0)eat decays if a < 0 and it grows if
a> 0. Decay is stability, growth is instability. When a is a complex number, its real
part determines the growth or decay. The imaginary part gives an oscillating factor eiwt = cos wt+ i sin wt.
Now consider two coupled equations, which give one vector equation:
du
-=Au dt
dy/dt = 2y- z dz/dt = -y + 2z
or
The solution will still involve exponentials e>-.t. But we no longer have a single growth rate as in eat. There are two eigenvalues >. = 1 and >. = 3 of this matrix A = K 2 . The solution has two exponentials et and e3t. They multiply x = (1, 1) and (1, -1).
The neat way to find solutions is by the eigenvectors. Pure solutions e-Xtx are eigenvectors that grow according to their own eigenvalue 1 or 3. We combine them:
is
[
y(t)] z(t)
=
[Get+ De3t] Get - De3t •
(6)
This is the complete solution. Its two constants (C and D) are determined by two
initial values y(0) and z(0). Check first that each part e>-.tx solves ~~ = Au:
= Each eigenvector u(t) e.\tx
yields
!~ = >.e>-.tx = Ae>-.tx =Au.
(7)
The number e>-.t just multiplies all components of the eigenvector x. This is the point of eigenvectors, they grow by themselves at their own rate >.. Then the complete solution u(t) in (6) combines the pure modes Cetx1 and De3tx2 . The three steps
for powers apply here too: Expand u{O) = Sa, multiply each a3 by e-X;t, recombine into u(t) = Se.xts-1u(O).
Example 5 Suppose the initial values are y(0) = 7 and z(0) = 3. This determines the constants C and D. At the starting time t = 0, the growth factors e>-.t are both one:
We solved two equations to find C = 5 and D = 2. Then u(t) = + 5etx1 2e3tx2 solves
the whole problem. The solution is a combination of slow growth and fast growth. Over
a long time the faster e3t will dominate, and the solution will line up with x 2 .
+ Section 2.2 will explain the key equation Mu" Ku = 0 in much more detail.
Newton's law involves acceleration (second derivative instead of first derivative). We might have two masses connected by springs. They can oscillate together, as in the first eigenvector (1, 1). Or they can be completely out of phase and move in opposite directions, as in the second eigenvector (1, -1). The eigenvectors give the pure motions eiwtx, or "normal modes." The initial conditions produce a mixture.
54 Chapter 1 Applied Linear Algebra
Symmetric Matrices and Orthonormal Eigenvectors
Our special matrices Kn and Tn and Bn and Cn are all symmetric. When A is a symmetric matrix, its eigenvectors are perpendicular (and the >.'s are real):
Symmetric matrices have real eigenvalues and orthonormal eigenvectors.
The columns of Sare these orthonormal eigenvectors q1 , ... , qn. We use q instead of x for orthonormal vectors, and we use Q instead of S for the eigenvector matrix with those columns. Orthonormal vectors are perpendicular unit vectors:
when i -/- j (orthogonal vectors)
when i = j (orthonormal vectors)
(8)
The matrix Q is easy to work with because QTQ = I. The transpose is the inverse! This repeats in matrix language that the columns of Q are orthonormal. QTQ = I
contains all those inner products q;qi that equal Oor 1:
Orthogonal Matrix
(9)
For two orthonormal columns in three-dimensional space, Q is 3 by 2. In this rectangular case, we still have QTQ = I but we don't have QQT = I.
For our full set of eigenvectors, Q is square and QT is Q-1. The diagonalization
of a real symmetric matrix has S = Q and s-1 = QT:
Symmetric diagonalization A= SAs- 1 = QAQT with QT= Q-1 .
(10)
Notice how QAQT is automatically symmetric (like LDLT). These factorizations
perfectly reflect the symmetry of A. The eigenvalues >.1, ... , An are the "spectrum"
of the matrix, and A= QAQT is the spectral theorem or principal axis theorem.
Part 11: Eigenvectors for Derivatives and Differences
A main theme of this textbook is the analogy between discrete and continuous problems (matrix equations and differential equations). Our special matrices all give sec-
ond differences, and we go first to the differential equation -y 11 = >.y. The eigen-
functions y(x) are cosines and sines:
d2y
- - = ..Xy(x) is solved by y = coswx and y = sinwx with ..X = w2 . (11)
d:z:2
Allowing all frequencies w, we have too many eigenfunctions. The boundary conditions will pick out the frequencies w and choose cosines or sines.
1.5 Eigenvalues and Eigenvectors 55
For y(O) = 0 and y(l) = 0, the fixed-fixed eigenfunctions are y(a::) = sin k1rx.
The boundary condition y(O) = 0 reduces to sin0 = 0, good. The condition y(l) = 0 reduces to sin br = 0. The sine comes back to zero at 1r and 21r and every integer multiple of 1r. So k = l, 2, 3, ... (since k = 0 only produces sin Ox = 0 which is useless). Substitute y(x) = sink1rx into (11) to find the eigenvalues>.= k21r2:
- ddx22 (sm• k1rx ) = k21r2 sm• k1rx so ,,.. = k21r2 = { 1r2 , 41r2 , 91r2 , . . .} .
(12)
We will make a similar guess (discrete sines) for the discrete eigenvectors of Kn.
Changing the boundary conditions gives new eigenfunctions and eigenvalues. The equation -y 11 = >.y is still solved by sines and cosines. Instead of y = sin k1rx which is zero at both endpoints, here are the eigenfunctions Yk(x) and their eigenvalues >.k for free-free (zero slope) and periodic and free-fixed conditions:
Analog of Bn y 1(0) = 0 and y 1(l) = 0 y(x) = cosk1rx
Analog of Cn y(O) = y(l), y 1(0) = y 1(l) y(x) = sin21rka::,cos21rka:: ). = 4k 21r2 Analog of Tn y 1(0) = 0 and y(l) = 0 y(x) = cos {k+½)1ra:: >. = {k+½) 21r2
Remember that Bn and Cn are singular matrices (>. = 0 is an eigenvalue). Their continuous analogs also have>.= 0, with cos Ox= 1 as the eigenfunction (set k = 0). This constant eigenfunction y(x) = 1 is like the constant vector y = (l, 1, ... , 1).
The free-fixed eigenfunctions cos(k + ½)1rx start with zero slope because sin 0 = 0. They end with zero height because cos(k + ½)1r = 0. So y 1(0) = 0 and y(l) = 0. The
matrix eigenvectors will now use these same sines and cosines (but >. is different).
Eigenvectors of Kn: Discrete Sines
Now come eigenvectors for the -1, 2, -1 matrices. They are discrete sines and cosines-try them and they work. For all the middle rows, sinj0 and cosj0 are still
successful for -Yi-I + 2yi - Yi+I = >.yi, with eigenvalues .X = 2 - 2 cos 0 ~ 0:
-l{sin(j_ cos(J -
1)0} 1)0
+
2{sinj_0}cosJ0
1{sin(j_ cos(J
+ +
1)0} 1)0
=
(2 _ 2 cos 0) {sinf0}· cosJ0
(l3)
These are the imaginary and real parts of a simpler identity:
The boundary rows decide 0 and everything! For Kn the angles are 0 = k1r / (n + l).
The first eigenvector y1 will sample the first eigenfunction y(x) = sin 1rx at the n meshpoints with h = n~I:
= First eigenvector Discrete sine y1 = (sin 1rh, sin 21rh, ... , sin n1rh)
(14)
56 Chapter 1 Applied Linear Algebra
The jth component is sin ; ;1 . It is zero for j = 0 and j = n + 1, as desired. The
angle is 0 = 1rh = n:l. The lowest eigenvalue is 2 - 2 cos 0 ~ 02:
~ First eigenvalue of Kn A1 = 2 - 2cos1rh = 2 - 2 ( 1 - 1r:h2 + • • •) 1r2h2 . (15)
Compare with the first eigenvalue A = 1r2 of the differential equation (when y(x) = sin 1rx and -y 11 = 1r2y). Divide K by h2 = (box)2 to match differences with derivatives. The eigenvalues of Kare also divided by h2:
A1(K)/h2 ~ 1r2h2/h2 is close to the first eigenvalue A1 = 1r2 in (12).
The other continuous eigenfunctions are sin 21rx, sin 31rx, and generally sin k1rx. It is neat for the kth discrete eigenvector to sample sin k1rx again at x = h, ... , nh:
= Eigenvectors Discrete sines
Yk = (sin k1rh, ... , sin nk1rh)
(16)
All eigenvalues of Kn
Ak = 2 - 2 cos k1rh, k = 1, ... , n. (17)
The sum A1 +•••+An must be 2n, because that is the sum of the 2's on the diagonal (the trace). The product of the A's must be n+l. This probably needs an expert (not the author). For K 2 and K 3 , Figure 1.13 shows the eigenvalues (symmetric around 2).
Eigenvalues 2 - 2 cos 0 of K3
1) ..X1 = 2 - 2 (
= 2 - v'2
..X2 = 2 - 2(0) = 2
-1) ..Xa = 2 - 2 (
= 2 + v'2
Trace A1 + A2 + A3 = 6
Determinant A1A2A3 = 4
For B4 include also Ao= 0
0 1r I 4 21r / 4 31r / 4 1r
Figure 1.13: Eigenvalues* of Kn-I interlace eigenvalues•= 2 - 2cos nk.;1 of Kn.
Orthogonality The eigenvectors of a symmetric matrix are perpendicular. That
is confirmed for the eigenvectors (1, 1) and (1, -1) in the 2 by 2 case. The three
eigenvectors (when n = 3 and n + l = 4) are the columns of this sine matrix:
Discrete
. 7r
S1Il4
sr.n 427r sr.n 437r
1
1
v'2 1 v'2
Sine
DST=
sr.n 427r
sr•n 447r
sin B1r
4
1 0 -1
(18)
Transform
s•m 437r
sm• 461r
sin 91r
4
1
v'2
-1
1
v'2
1.5 Eigenvalues and Eigenvectors 57
The columns of Sare orthogonal vectors, of length ../2. If we divide all components by ../2, the three eigenvectors become orthonormal. Their components lie on sine curves. The DST matrix becomes an orthogonal Q = DST/../2, with Q-1 = QT.
Section 3.5 uses the DST matrix in a Fast Poisson Solver for a two-dimensional
difference equation (K2D)U = F. The columns of the matrix are displayed there for n = 5. The code will give a fast sine transform based on the FFT.
, ,sin31rx
\ - Eigenvectors of Ka
.. ,..._°o'-
o
cos Ox
=
1
o
,
,.
' lie along sine curves
\
\ ',
\
\
,'ti cos 21rx
0
1\ \
3
4 \ \I 4
I
\ (
/
Eigenvectors of Ba lie along cosines ---+
I
0
1\1
1
' i i \
,'o, 11 1
i3 i ' t i i5
I
1
\ I\ /
"' -o sin 21rx
' \
/ 'o COS7rX
'r
... -
Figure 1.14: Three discrete eigenvectors fall on three continuous eigenfunctions.
Eigenvectors of Bn: Discrete Cosines
The matrices Bn correspond to zero slope at both ends. Remarkably, Bn has the same
n - 1 eigenvalues as Kn-I plus the additional eigenvalue >. = 0. (B is singular with
(1, ... , 1) in its nullspace, because its first and last rows contain +1 and -1.) Thus B3 has eigenvalues 0, 1, 3 and trace 4, agreeing with 1 + 2 + 1 on its diagonal:
Eigenvalues of Bn
>. = 2 -
k1r 2 cos-,
k = 0, ... , n - 1.
(19)
n
Eigenvectors of B sample cosk1rx at then midpoints x = (j - ½)/n in Figure 1.14,
where eigenvectors of K sample the sines at the meshpoints x = j / (n + 1):
Eigenvectors of Bn
Yk = ( co21s-kn1-r' co32sk-n1-r'···, cos ( n - -21) -kn1r) •
(20)
Since the cosine is even, those vectors have zero slope at the ends:
and
Notice that k = 0 gives the all-ones eigenvector y0 = (1, 1, ... , 1) which has eigenvalue >. = 0. This is the DC vector with zero frequency. Starting the count at zero is the
useful convention in electrical engineering and signal processing.
58 Chapter 1 Applied Linear Algebra
These eigenvectors of Bn give the Discrete Cosine Transform. Here is the
cosine matrix for n = 3, with the unnormalized eigenvectors of B3 in its columns:
Discrete Cosine
l [ cosO cosl12!3: cos2l 231r
1
OCT =
[ cos 0
cos
~
2
13 c
cos
~
2
231r
=
1
l l2 V'30 l2 0 -1
(21)
Transform
cos 0
cos Q 1!: 2 3
cos
§_ 2
231r
1 -½\1'3 ½
= Eigenvectors of Cn: Powers of w e 21ri/n
After sines from Kn and cosines from Bn, we come to the eigenvectors of Cn. These are both sines and cosines. Equivalently, they are complex exponentials. They are even more important than the sine and cosine transforms, because now the eigenvectors give the Discrete Fourier Transform.
You can't have better eigenvectors than that. Every circulant matrix shares these
eigenvectors, as we see in Chapter 4 on Fourier transforms. A circulant matrix is a
"periodic matrix." It has constant diagonals with wrap-around (the -l's below the
l main diagonal of Cn wrap around to the -1 in the upper right corner). Our goal is
to find the eigenvalues and eigenvectors of the matrices Cn, like C4:
2 -1 0 -1
Circulant matrix (periodic)
C4 =
[
-1 0
2 -1
-1 2
0 -1
-1 0 -1 2
This symmetric matrix has real orthogonal eigenvectors (discrete sines and cosines). They have full cycles like sin 2k1Tx, not half cycles like sin k1Tx. But the numbering gets awkward when cosines start at k = 0 and sines start at k = l. It is better to work with complex exponentials ei9 . The kth eigenvector of Cn comes from sampling Yk(x) = ei21rkx at then meshpoints which are now x = j/n:
jth component of Yk ei21rk(j/n) = wjk where w = e21ri/n = nth root of 1. (22)
That special number w = e 27r:i/n is the key to the Discrete Fourier Transform. Its angle is 211"/n, which is an nth part of the whole way around the unit circle. The powers of w cycle around the circle and come back town= l:
Eigenvectors of Cn Yk-- (1 ' wk ' w2k , ••• , w<n-l)k)
(23)
Eigenvalues of Cn
.Xk = 2 - wk - w-k = 2 - 2 cos 27n1:k
(24)
The numbering is k = 0, l, ... , n - l. The eigenvector with k = 0 is the constant
y0 = (1, 1, ... , 1). The choice k = n would give the same (1, 1, ... , l)-nothing new,
just aliasing! The lowest eigenvalue 2 - 2 cos 0 is >.0 = 0. The Cn are singular.
1.5 Eigenvalues and Eigenvectors 59
The eigenvector with k = l is y1 = (1, w, ... , wn-l). Those components are
the n roots of 1. Figure 1.15 shows the unit circle r = 1 in the complex plane,
with then= 4 numbers 1, i, i2, i3 equally spaced around the circle on the left. Those
numbers are e0, e2 e4n:i/4, e6 4 and their fourth powers are 1.
n:i/4,
n:i/
l [ l [ l Cy1 =
-12 [ 0
_-121
-10 2
-_101
i1i2
. ·3 = (2 - z - z )
i1i2
. (25)
-1 0 -1 2 f
f
For any n, the top row gives 2 - w - wn-l = 2 - w - w. Notice that wn-l is also the complex conjugate w = e-2n:i/n = l/w, because one more factor w will reach 1.
Eigenvalue of C >.1 = 2 - w - w = 2 - e2n:i/n - e-2n:i/n = 2 - 2 cos 27r .
(26)
n
w2 = -1
w = e21ri/4 = i
n=4
w4 = 1
w3 = -i
Figure 1.15: The solutions to z4 = 1 are 1,i,i2,i3 The 8th roots are powers of e2 8
.
n:i/
.
Now we know the first eigenvectors y0 =
(1,l,1,1) and y1 =
(1,i,i2,i3 )
of 0 4.
The eigenvalues are Oand 2. To finish, we need the eigenvectors y2 = (1, i2, i4, i6) and
y3 =
(1,i3 ,i6,i9 ).
Their eigenvalues are 4 and 2, which are 2-2cos1r and 2-2cos 3 ;.
Then the sum of the eigenvalues is O+ 2 + 4 + 2 = 8, agreeing with the sum of the
diagonal entries (the trace 2 + 2 + 2 + 2) of this matrix 0 4.
The Fourier Matrix
As always, the eigenvectors go into the columns of a matrix. Instead of the sine or
cosine matrix, these eigenvectors of Cn give the Fourier matrix Fn. We have the
[ i l DFT instead of the DST or DCT. For n = 4 the columns of F4 are Yo, Y1, Y2, y : 3
Fourier matrix F 4 Eigenvectors of C4
F4 =
1 1 i i2 i2 i4 ii6;
i3 i6 ig
= = (Fn)jk wik e2-rrijk/n.
60 Chapter 1 Applied Linear Algebra
The columns of the Fourier matrix are orthogonal! The inner product of two complex
vectors requires that we take the complex conjugate of one of them (by convention
l- - the first one). Otherwise we would have y'[y3 = 1 + 1 + 1 + 1 = 4. But y1 is truly
orthogonal to y3 because the correct product uses the conjugate y1:
Complex
[1 -i (-i)2 (-i)3] [ i\
inner
·6 - 1 - 1 + 1 - 1 - 0 . (27)
product
i
ig
Similarly YfY1 = 4 gives the correct length IIY1II = 2 (not YfY1 = 0). The matrix
FT F of all the column inner products is 4/. Orthogonality of columns reveals p-1:
Orthogonal
-T
F 4F4
=
4/
so that
F4-1
=
4lp
T
4
=
i•nverse of
p
.
(28)
Always F!Fn = nl. The inverse of Fn is F:/n. We could divide Fn by ,/n,
which normalizes it to Un = Fn/ ..fii,. This normalized Fourier matrix is unitary:
Orthonormal
(29)
A unitary matrix has UT U = I and orthonormal columns. It is the complex analog of a real orthogonal Q (which has QTQ = I). The Fourier matrix is the most important complex matrix ever seen. Fn and F,;;1 produce the Discrete Fourier Transform.
Problem Set 1.5
The first nine problems are about the matrices Kn, Tn, Bn, Cn.
1 The 2 by 2 matrix K2 in Example 1 has eigenvalues 1 and 3 in A. Its unit eigenvectors q1 and q2 are the columns of Q. Multiply QAQT to recover K2 .
2 When you multiply the eigenvector y = (sin 1rh, sin 21rh, ... ) by K, the first row will produce a multiple of sin 1rh. Find that multiplier .X by a double-angle formula for sin 21rh:
(Ky )i = 2 sin 7rh - l sin 21rh = .X sin 7rh Then .X =
3 In MATLAB, construct K = K 5 and then its eigenvalues bye= eig(K). That
column should be (2 - v13, 2 - 1, 2 - 0, 2 + 1, 2 + v'3). Verify that e agrees with
2 * ones(5, 1) - 2 * cos([l : 5] * pi/6)'.
4 Continue 3 to find an eigenvector matrix Q by [Q, E] = eig(K). The Discrete Sine Transform DST = Q * diag([ -1 -1 1 -1 1]) starts each column with a positive entry. The matrix J K = [1 : 5 ]' * [1 : 5] has entries j times k.
Verify that DST agrees with sin(JK*pi/6)/sqrt(3), and test DSTT = DST-1.
1.5 Eigenvalues and Eigenvectors 61
5 Construct B = BB and [Q, E] = eig(B) with B(l, 1) = 1 and B(6, 6) = 1. Verify
that E = diag(e) with eigenvalues 2 *ones(l, 6) - 2 *cos([O: 5] * pi/6) in e. How
do you adjust Q to produce the (highly important) Discrete Cosine Transform with entries OCT= cos([.5: 5.5 ]' * [O: 5] * pi/6)/sqrt(3)?
6 The free-fixed matrix T = TB has T(l, 1) = 1. Check that its eigenvalues are
2-2 cos [(k - ½)11"/6.5]. The matrix cos([.5: 5.5 ]' * [.5: 5.5] * pi/6.5)/sqrt(3.25) should contain its unit eigenvectors. Compute Q' * Q and Q' * T * Q.
7 The columns of the Fourier matrix F4 are eigenvectors of the circulant matrix C = C4 . But [Q, E] = eig(C) does not produce Q = F4 . What combinations of
the columns of Q give the columns of F4 ? Notice the double eigenvalue in E.
8 Show that then eigenvalues 2 - 2 cos ;:_;1 of Kn add to the trace 2 + · · · + 2. 9 K3 and B4 have the same nonzero eigenvalues because they come from the same
4x3 backward difference LL. Show that K 3 = LLT ,6._ and B4 = .6._.6._T_ The
eigenvalues of K 3 are the squared singular values a2 of .6._ in 1.7.
Problems 10-23 are about diagonalizing A by its eigenvectors in S.
s- 10 Factor these two matrices into A= SAs- 1. Check that A2 = SA2 1:
and
11 If A= SAs- 1 then A-1 = ( )( )( ). The eigenvectors of A3 are (the same columns of S)(different vectors).
12 If A has >.1 = 2 with eigenvector x 1 = [i] and >.2 = 5 with x2 = [½] , use
SAs-1 to find A. No other matrix has the same >.'sand x's.
13 Suppose A = SAs-1. What is the eigenvalue matrix for A + 21? What is the eigenvector matrix? Check that A+ 21 = ( )( )( )-1.
14 If the columns of S (n eigenvectors of A) are linearly independent, then
(a) A is invertible (b) A is diagonalizable (c) S is invertible
15 The matrix A = [gA] is not diagonalizable because the rank of A - 31 is __ .
A only has one line of eigenvector. Which entries could you change to make A diagonalizable, with two eigenvectors?
16 Ak = SAks-1 approaches the zero matrix as k --too if and only if every >. has
absolute value less than
. Which of these matrices has Ak --t O?
A1 = [·.46 ..64]
and A2 = [·.61 ..69]
and
62 Chapter 1 Applied Linear Algebra
17 Find A and S to diagonalize A1 in Problem 16. What is A 110u0 for these u0?
18 Diagonalize A and compute SAks- 1 to prove this formula for Ak:
has
19 Diagonalize B and compute SAks- 1 to show how Bk involves 3k and 2k:
has
20 Suppose that A = SAs-1. Take determinants to prove that det A = >.1>.2• • •An = product of .X's. This quick proof only works when A is __ .
21 Show that trace GH = trace HG, by adding the diagonal entries of GH and HG:
and
Choose G = S and H = As-1. Then S As- 1 = A has the same trace as
AS-1S = A, so the trace is the sum of the eigenvalues.
22 Substitute A = SAs-1 into the product (A - >.1/)(A - >.21) ···(A - >.nl) and explain why (A - >.1/) ···(A - An/) produces the zero matrix. We are substi-
tuting A for>. in the polynomial p(>.) = det(A - >.I). The Cayley-Hamilton Theorem says that p(A) = zero matrix (true even if A is not diagonalizable).
Problems 23-26 solve first-order systems u 1 = Au by using Ax = >.x.
23 Find .X's and x's so that u = e>--tx solves
What combination u = c1e>--1tx1 + c2e>--2tx2 starts from u(O) = (5, -2)? 24 Find A to change the scalar equation y" = 5y' + 4y into a vector equation for
u = (y, y'). What are the eigenvalues of A? Find >.1 and >.2 also by substituting y = e>--t into y" = 5y' + 4y.
ddut = [yy"'] = [
1.5 Eigenvalues and Eigenvectors 63
25 The rabbit and wolf populations show fast growth of rabbits (from 6r) but loss to wolves (from -2w). Find A and its eigenvalues and eigenvectors:
-dr =6r-2w dt
and
ddwt = 2r+w.
If r(0) = w(0) = 30 what are the populations at time t? After a long time, is
the ratio of rabbits to wolves 1 to 2 or is it 2 to 1?
26 Substitute y = e>-.t into y" = 6y' - 9y to show that >. = 3 is a repeated root.
This is trouble; we need a second solution after e3t. The matrix equation is
Show that this matrix has >. = 3, 3 and only one line of eigenvectors. Trouble here too. Show that the second solution is y = te3t.
27 Explain why A and AT have the same eigenvalues. Show that >. = 1 is always an eigenvalue when A is a Markov matrix, because each row of AT adds to 1 and the vector __ is an eigenvector of AT.
28 Find the eigenvalues and unit eigenvectors of A and T, and check the trace:
A= [ 11 01 0ll 1 0 0
T = [ -11 -l2J •
29 Here is a quick "proof" that the eigenvalues of all real matrices are real:
is real.
Find the flaw in this reasoning-a hidden assumption that is not justified.
30 Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers can be eigenvalues of these matrices?
31 To find the eigenfunction y(x) = sin brx, we could put y = e= in the differential equation -u" = >.u. Then -a2eax = >.e= gives a = i,/>.. or a = -i,/>... The complete solution y(x) = Ceiv'>-x + De-iv'>-x has C + D = 0 because y(0) = 0.
That simplifies y(x) to a sine function:
y(x) = C(eiv'>-x - e-iv'>-x) = 2iC sin ..f>..x.
y(l) = 0 yields sin '15,. = 0. Then '15,. must be a multiple of k1r, and >. = k21r2 as before. Repeat these steps for y'(0) = y'(l) = 0 and also y'(0) = y(l) = 0.
64 Chapter 1 Applied Linear Algebra
32 Suppose eigshow follows x and Ax for these six matrices. How many real eigenvectors? When does Ax go around in the opposite direction from x?
33 Scarymatlab shows what can happen when roundoff destroys symmetry:
A=[lllll; 1:5]'; B=A'*A; P=A* inv(B)*A'; [Q, E]= eig(P);
Bis exactly symmetric. The projection P should be symmetric, but isn't. From Q' * Q show that two eigenvectors of P fail badly to have inner product 0.
a WORKED EXAMPLE a
The eigenvalue problem -u 11 + x2u = >.u for the Schrodinger equation is im-
portant in physics (the harmonic oscillator). The exact eigenvalues are the odd numbers >. = 1, 3, 5, .... This is a beautiful example for numerical experiment. One new point is that for computations, the infinite interval (-oo, oo) is reduced to - L ::=; x ::=; L. The eigenfunctions decay so quickly, like e-x2!2 , that the matrix K could be replaced by B (maybe even by the circulant C). Try harmonic(lO, 10, 8) and (10, 20, 8) and (5, 10, 8) to see how the error in>.= 1 depends on hand L.
function harmonic(L,n,k)
h=l/n; N=2 * n * L+ 1;
K= toeplitz([2-1 zeros(l,N-2)]);
H=K/h /\ 2 + diag((-L:h:L). /\ 2);
[V,F]= eig(H);
E=diag(F); E=E(l:k)
j=l:k; plotU,E);
% positive integers L, n, k % N points in interval [-£, L] % second difference matrix % diagonal matrix from x I\ 2 % trideig is faster for large N
% first k eigenvalues (near 2n + 1)
% choose sparse K and diag if needed
A tridiagonal eigenvalue code trideig is on math.mit.edu/~persson.
The exact eigenfunctions Un = Hn(x)e-x2 / 2 come from a classical method: Put
u(x) = (LajxJ)e-x2 / 2 into the equation -u" + x2u = (2n + l)u and match up each
power of x. Then aH2 comes from a1 (even powers stay separate from odd powers):
The coefficients are connected by (j + l)(j + 2)aH2 = -2(n - j)a1 .
At n = j the right side is zero, so a1+2 = 0 and the power series stops (good thing).
Otherwise the series would produce a solution u(x) that blows up at infinity. (The
cutoff explains why >. = 2n + 1 is an eigenvalue.) I am happy with this chance to
show a success for the power series method, which is not truly a popular part of computational science and engineering.
The functions Hn(x) turn out to be Hermite polynomials. The eigenvalues in
physical units are E = (n + ½)nw. That is the quantization condition that picks
out discrete energy states for this quantum oscillator.
1.5 Eigenvalues and Eigenvectors 65
The hydrogen atom is a stiffer numerical test because e-x2 / 2 disappears. You can
see the difference in experiments with -u" + (l(l + 1)/2x2 - l/x)u = AU on the radial line 0:::; x < oo. Niels Bohr discovered that An= c/n2, which Griffiths [69] highlights
as "the most important formula in all of quantum mechanics. Bohr obtained it in 1913 by a serendipitous mixture of inapplicable classical physics and premature quantum theory... "
Now we know that Schrodinger's equation and its eigenvalues hold the key.
Properties of eigenvalues and eigenvectors
Matrix
Symmetric: AT = A Orthogonal: QT = Q-1 Skew-symmetric: AT = -A
Complex Hermitian: AT = A
Positive Definite: x T Ax > 0
Eigenvalues all A are real
all IAI = 1
all A are imaginary
all A are real
all A > 0
Eigenvectors
orthogonal x;Xj = 0
x; orthogonal x j = 0 x; orthogonal x j = 0 x; orthogonal x j = 0
orthogonal
Markov: mij > 0, L~=l mij = 1 Similar: B = M-1AM
Projection: P = P 2 = pT
Amax = 1 A(B) = A(A)
A= l; 0
steady state x > 0 x(B) = M-1x(A)
column space; nullspace
Reflection: I - 2uuT
A= -1; 1, .. , 1
u;u_1_
Rank One: uvT
A=vTu; 0, .. ,0
u; Vl_
Inverse: A-1 Shift: A+ cl Stable Powers: An--+ 0
1/A(A)
A(A) + c
all IAI < 1
eigenvectors of A eigenvectors of A
Stable Exponential: eAt --+ O
Cyclic: P(l, .. , n) = (2, .. , n, 1)
Toeplitz: -1, 2, -1 on diagonals Diagonalizable: SAs-1
all Re A<0
Ak = e21rik/n
Ak = 2 - 2cos EnI+..l.
diagonal of A
Xk = (1, Ak, ... , A~-l)
= Xk
(S•ill
k1r
n+l,
Slll
2k1r
n+l,
)
•••
columns of S are independent
Symmetric: QAQT
Jordan: J = M- 1AM
diagonal of A (real) columns of Qare orthonormal
diagonal of J
each block gives x = (0, .. , 1, .. , 0)
SVD: A= U~VT
singular values in ~ eigenvectors of ATA, AAT in V, U
66 Chapter 1 Applied Linear Algebra
1.6 POSITIVE DEFINITE MATRICES
This section focuses on the meaning of "positive definite." Those words apply to square symmetric matrices with especially nice properties. They are summarized at the end of the section, and I believe we need three basic facts in order to go forward:
1. Every K = ATA is symmetric and positive definite (or at least semidefinite).
2. If K 1 and K 2 are positive definite matrices then so is K 1 + K2 .
3. All pivots and all eigenvalues of a positive definite matrix are positive.
The pivots and eigenvalues have been emphasized. But those don't give the best
approach to facts 1 and 2. When we add K 1 + K2, it is not easy to follow the pivots
or eigenvalues in the sum. When we multiply ATA (and later ATCA), why can't the pivots be negative? The key is in the energy ½uTKu. We really need an energy-based definition of positive definiteness, from which facts 1, 2, 3 will be clear.
Out of that definition will come the test for a function P(u) to have a minimum. Start with a point where all partial derivatives 8P/8u1 , 8P/8u2 , ... , 8P/8un are zero. This point is a minimum (not a maximum or saddle point) if the matrix of second derivatives is positive definite. The discussion yields an algorithm that actually finds this minimum point. When P(u) is a quadratic function (involving only ½Kiiul and KijUiUj and fiui) that minimum has a neat and important form:
The minimum of P(u) = ½uTKu - uTf is Pmin = -½ITK- 1f when Ku= f.
Examples and Energy-based Definition
Three example matrices ½K, B, Mare displayed below, to show the difference between definite and semidefinite and indefinite. The off-diagonal entries in these examples get larger at each step. You will see how the "energy" goes from positive (for K) to possibly zero (for B) to possibly negative (for M).
Definite
[
}
-2
-½ ]
1
Ui - U1U2 + U~
Always positive
Semidefinite
B = [ -! -! ]
Ui - 2U1U2 + U~
Positive or zero
Indefinite
M=
[
1 -3
-3] 1
Ui - 6U1U2 + U~
Positive or negative
Below the three matrices you will see something extra. The matrices are multiplied
on the left by the row vector uT = [u1 u2 ] and on the right by the column vector u.
The results uT ( ½K) u and uT Bu and uTMu are printed under the matrices.
With zeros off the diagonal, I is positive definite (pivots and eigenvalues all 1). When the off-diagonals reach -½, the matrix ½K is still positive definite. At -1
1.6 Positive Definite Matrices 67
we hit the semidefinite matrix B (singular matrix). The matrix M with -3 off the diagonal is very indefinite (pivots and eigenvalues of both signs). It is the size of those off-diagonal numbers-½, -1, -3 that is important, not the minus signs.
Quadratics These pure quadratics like ur - U1U2 + U§ contain only second degree
terms. The simplest positive definite example would be ur + u§, coming from the
identity matrix I. This is positive except at u1 = u2 = 0. Every pure quadratic function
comes from a symmetric matrix. When the matrix is S, the function is uT Su.
When S has an entry b on both sides of its main diagonal, those entries combine into 2b in the function. Here is the multiplication uT Su, when a typical 2 by 2 symmetric matrix S produces aui and 2bu1u2 and cu§:
Quadratic Function
Notice how a and con the diagonal multiply Ui and U§. The two b's multiply u 1u2.
The numbers a, b, c will decide whether uT Su is always positive (except at u = 0).
This positivity of uT Su is the requirement for S to be a "positive definite matrix."
Definition
The symmetric matrix S is positive definite when
u T Su > 0 for every vector u except u = 0.
The graph of uT Su goes upward from zero. There is a minimum point at u = 0.
Figure 1.16a shows Ui - u1u2 + U§, from S = ½K. Its graph is like a bowl.
This definition makes it easy to see why the sum K 1 + K2 stays positive definite (Fact 2 above). We are adding positive energies so the sum is positive. We don't need to know the pivots or eigenvalues. The sum of uT K 1u and uT K 2u is uT(K1 + K 2 )u.
If the two pieces are positive whenever u i= 0, the sum is positive too. Short proof!
In the indefinite case, the graph of uTMu goes up and down from the origin in Figure 1.16c. There is no minimum or maximum, and the surface has a "saddle
point." If we take u1 = 1 and u2 = 1 then uTMu= -4. If u1 = 1 and u2 = 10 then
uT Mu= +41. The semidefinite matrix B has uT Bu = (u1 - u2) 2. This is positive for most u, but it is zero along the line u1 = u2.
Figure 1.16: Positive definite, semidefinite, and indefinite: Bowl, trough, and saddle.
68 Chapter 1 Applied Linear Algebra
Sums of Squares
To confirm that M is indefinite, we found a vector with uTMu > 0 and another vector with uT Mu < 0. The matrix K needs more thought. How do we show that uT Ku stays positive? We cannot substitute every u1, u2 and it would not be enough to test only a few vectors. We need an expression that is automatically positive, like
uTu = ur + U§. The key is to write UT Ku as a sum of squares:
UT Ku= 2ur - 2u1U2 + 2u~ = Ui + (u1 - u2) 2 + u~ (three squares) (2)
The right side cannot be negative. It cannot be zero, except if u1 = 0 and u2 = 0. So
this sum of squares proves that K is a positive definite matrix.
We could achieve the same result with two squares instead of three:
uTKu = 2u12 - 2u1u2 + 2u22 = 2(u1 - 21u2 )2 + 23u22 (two squares)
(3)
J What I notice about this sum of squares is that the coefficients 2 and are the pivots
of K. And the number -½ inside the first square is the multiplier €21 , in K = LDLT:
-i ] ~ Two squares K = [ -~ -; ] = [ -½ 1 ] [ 2 ] [ 1
= LDLT. (4)
The sum of three squares in (2) is connected to a factorization K = ATA, in which A
~ l has three rows instead of two. The three rows give the squares in ur + (u2-u1)2+u§:
Three squares
K = [ -12
-12 ] = [ 01
-11
-10 ]
[-~ 0
-1
= ATA.
(5)
Probably there could be a factorization K = ATA with four squares in the sum and
four rows in the matrix A. What happens if there is only one square in the sum?
Semidefinite UT Bu= ur - 2u1U2 + U§ = (u1 - u2) 2 (only one square). (6)
The right side can never be negative. But that single term (u 1 - u2) 2 could be zero! A sum of less than n squares will mean that an n by n matrix is only semidefinite.
The indefinite example uTMu is a difference of squares (mixed signs):
uTMu = Ui - 6u1u2 + u~ = (u1 - 3u2) 2 - 8u~ (square minus square). (7)
Again the pivots 1 and -8 multiply the squares. Inside is the number €21 = -3 from elimination. The difference of squares is coming from M = LDLT, but the diagonal
pivot matrix D is no longer all positive and M is indefinite:
Indefinite
M = [ 1 -3 ] = [ 1 ] [ 1 ] [ 1 -3 ] = LDLT.
-3 1
-3 1
-8
1
(8)
The next page moves to the matrix form uTATAu for a sum of squares.
1.6 Positive Definite Matrices 69
Positive Definiteness from ATA, ATCA, LDLT, and QAQT
Now comes the key point. Those 2 by 2 examples suggest what happens for n by n positive definite matrices. K might come as AT times A, for some rectangular
matrix A. Or elimination factors K into LDLT and D > 0 gives positive definiteness.
Eigenvalues and eigenvectors factor K into QAQT, and the eigenvalue test is A> 0. The matrix theory only needs a few sentences. In linear algebra, "simple is good."
K =ATA is symmetric positive definite if and only if A has independent columns.
This means that the only solution to Au = 0 is the zero vector u = 0. If there are nonzero
solutions to Au= 0, then ATA is positive semidefinite.
We now show that uTKu 2:: 0, when K is ATA. Just move the parentheses!
Basic trick for ATA
(9)
This is the length squared of Au. So ATA is at least semidefinite.
When A has independent columns, Au = 0 only happens when u = 0. The only vector in the nullspace is the zero vector. For all other vectors uT (AT A)u = 11 Au II 2
is positive. So ATA is positive definite, using the energy-based definition uTKu > 0.
Example 1 If A has more columns than rows, then those columns are not independent.
With dependent columns ATA is only semidefinite. This example (the free-free matrix
B 3 ) has three columns and two rows in A, so dependent columns:
l [ l 3 columns of A add to zero
3 columns of AT A add to zero
[
-
1 0
_ o1
1
[ -1 1 0 ] _ 0 -1 1 -
_ 1 -12 _ o1
0 _1 1 •
This is the semidefinite case. If Au = 0 then certainly ATAu = 0. The rank of ATA always equals the rank of A (its rank here is only r = 2). The energy uT Bu is
(u2 - u1) 2 + (u3 - u2 )2, with only two squares but n = 3.
It is a short step from ATA to positive definiteness of the triple products ATCA and LDLT and QAQT. The middle matrices C and D and A are easily included.
The matrix K = ATCA is symmetric positive definite, provided A has independent
columns and the middle matrix C is symmetric positive definite.
To check positive energy in ATCA, use the same idea of moving the parentheses:
Same trick
(10)
If u is not zero then Au is not zero (because A has independent columns). Then
(Au?C(Au) is positive because C is positive definite. So uTKu> 0: positive definite.
C = cT in the middle could be the pivot matrix Dor the eigenvalue matrix A.
70 Chapter 1 Applied Linear Algebra
If a symmetric K has a full set of positive pivots, it is positive definite.
Reason: The diagonal pivot matrix Din LDLT is positive definite. LT has independent columns (l's on the diagonal and invertible). This is the special case of ATCA
with C = D and A= LT. Pivots in D multiply squares in £Tu to give uTKu:
(11)
The pivots are those factors a and c - ~. This is called "completing the square."
If a symmetric K has all positive eigenvalues in A, it is positive definite.
Reason: Use K = QAQT. The diagonal eigenvalue matrix A is positive definite.
The orthogonal matrix is invertible (Q-1 is QT). Then the triple product QAQT is positive definite. The eigenvalues in A multiply the squares in QTu:
(12)
The eigenvalues are 3 and 1. The unit eigenvectors are (1, -1)/v'2 and (1, 1)/v'2.
If there were negative pivots or negative eigenvalues, we would have a difference
of squares. The matrix would be indefinite. Since Ku = >.u leads to uT Ku = >.uTu, positive energy uT Ku requires positive eigenvalues >..
Review and Summary A symmetric matrix K is positive definite if it passes any
of these five tests (then it passes them all). I will apply each test to the 3 by 3 second
difference matrix K = toeplitz([2 - 1 0]).
1. All pivots are positive
J, ! K = LDLT with pivots 2,
2. Upper left determinants > 0 K has determinants 2, 3, 4
3. All eigenvalues are positive K = QAQT with >. = 2, 2 + y'2, 2 - y'2
4. uT Ku> 0 if u -=I= 0
uTKu= 2(u1 - ½u2) 2 + J(u2 - ~u3) 2 + !u32
5. K = ATA, indep. columns A can be the Cholesky factor chol(K)
That Cholesky factorization chooses the square upper triangular A = ,./J5LT. The
l command chol will fail unless K is positive definite, with positive pivots:
Square
[ 1.4142
[1.4142 -0.7071
l
ATA=K K= -0.7071 1.2247
1.2247 -0.8165
A=chol(K)
-0.8165 1.1547
1.1547
1.6 Positive Definite Matrices 71
Minimum Problems in n Dimensions
Minimum problems appear everywhere in applied mathematics. Very often, ½uT Ku is the "internal energy" in the system. This energy should be positive, so K is naturally positive definite. The subject of optimization deals with minimization, to produce the best design or the most efficient schedule at the lowest cost. But the cost function P(u) is not a pure quadratic uT Ku with minimum at 0.
A key step moves the minimum away from the origin by including a linear term
-uTJ. When K = K 2 , the optimization problem is to minimize P(u):
Total energy P(u) = ~uTKu - uT f = (u~ - u1u2 + u~) - uif1 - ud2. (13)
The partial derivatives (gradient of P) with respect to u 1 and u2 must both be zero:
= Calculus gives Ku f
8P/8u1 = 2u1 - u2 - Ji= 0
8P/8u2 = -u1 + 2u2 - h = 0
(14)
In all cases the partial derivatives of P(u) are zero when Ku = f. This is truly
a minimum point (the graph goes up) when K is positive definite. We substitute
u = K-1f into P(u) to find the minimum value of P:
P(u) is never below that value Pmin· For every P(u) the difference is ~ 0:
P(u) - P(K-1J) = ½uTKu - uT f - (-½JTK-1J)
= ½(u- K- 1J)TK(u- K- 1J) ~ 0.
(16)
The last result is never negative, because it has the form ½vT K v. That result is zero
only when the vector v = u - K-1f is zero (which means u = K-1!). So at every point except u = K-1f, the value P(u) is above the minimum value Pmin·
Shifted bowl
72 Chapter 1 Applied Linear Algebra
Test for a Minimum: Positive Definite Second Derivatives
Suppose P(u1, ... ,un) is not a quadratic function. Then its derivatives won't be linear functions. But to minimize P(u) we still look for points where all the first
derivatives (the partial derivatives) are zero:
1st derivative vector
is gradient 8P/8u
(17)
If those n first derivatives are all zero at the point u* = (uf, ... , u!), how do we
know whether P(u) has a minimum (not a maximum or saddle point) at u*?
To confirm a minimum we look at the second derivatives. Remember the rule for
an ordinary function y(x) at a point where dy/dx = 0. This point is a minimum if
d2y/dx2 > 0. The graph curves upward. Then-dimensional version of d2y/dx2 is the symmetric "Hessian" matrix H of second derivatives:
2nd derivative matrix
(18)
The Taylor series for P(u), when u is near u*, starts with these three terms (constant, linear from gradient, and quadratic from Hessian):
Ta~lor series
P(u) = P(u*) + (u* -
u?{a)Pu
(u*)
+
!(u* 2
-
u)TH(u*)(u* -
u) + · · ·
(19)
Suppose the gradient vector 8P/ au of the first derivatives is zero at u *, as in (17).
So the linear term is gone and the second derivatives are in control. If H is positive definite at u*, then (u* - u )TH(u* - u) carries the function upward as we leave u*.
A positive definite H(u*) produces a minimum at u = u*.
Our quadratic functions were P(u) = ½uTKu-uTf. The second derivative matrix was H = K, the same at every point. For non-quadratic functions, H changes from
point to point, and we might have several local minima or local maxima. The decision
depends on Hat every point u* where the first derivatives are zero.
Here is an example with one local minimum at (0, 0), even though the overall minimum is -oo. The function includes a fourth power uf.
Example 2 P(u) = 2ur + 3u~ - uf has zero derivatives at (uf, un = (0, 0).
• • Second derivatives
H = [ 02 P8I2aPu/2aauur1
82P0/82uP1I0aUu~2
]
=
[
4 -
12ur 0
0 ] 6
At the point (0, 0), H is certainly positive definite. So this is a local minimum.
There are two other points where both first derivatives 4u1 - 4uf and 6u2 are zero.
Those points are u* = (1, 0) and u* = (-1, 0). The second derivatives are -8 and 6 at
both of those points, so H is indefinite. The graph of P(u) will look like a bowl around
(0, 0), but (1, 0) and (-1, 0) are saddle points. MATLAB could draw y = P(u 1, u2 ).
1.6 Positive Definite Matrices 73
Newton's Method for Minimization
This section may have seemed less "applied" than the rest of the book. Maybe so, but minimization is a problem with a million applications. And we need an algorithm to minimize P(u), especially when this function is not a quadratic. We have to expect an iterative method, starting from an initial guess u0 and improving it to u1 and u2 (approaching the true minimizer u* if the algorithm is successful).
The natural idea is to use the first and second derivatives of P(u) at the current point. Suppose we have reached ui with coordinates ui, ... ,u~. We need a rule to choose the next point ui+l_ Close to ui, the function P(u) is approximated by cutting off the Taylor series, as in (19). Newton will minimize PcutotT(u).
Pcutoff is a quadratic function. Instead of K it has the second derivative H. Both
8P/8u and Hare evaluated at the current point u = ui (this is the expensive part of
the algorithm). The minimum of Pcutoff is the next guess ui+1.
= Newton's method to solve 8P/8u 0
(21)
For quadratics, one step gives the minimizer u 1 = K- 1f. Now 8P/au and H are
changing as we move to u1 and ui and ui+1. If ui exactly hits u* (not too likely) then
8P/8u will be zero. So ui+1 - ui = 0 and we don't move away from perfection.
Section 2.6 will return to this algorithm. We propose examples in the Problem Set below, and add one comment here. The full Newton step to ui+1 may be too bold, when the true minimizer u* is far away. The terms we cut off could be too large. In
that case we shorten the Newton step ui+1 - ui, for safety, by a factor c < 1.
Problem Set 1.6
1 Express uTTu as a combination of u~, u1u2 , and u~ for the free-fixed matrix
T
=
[
1 -1
-1 ] 2 •
Write the answer as a sum of two squares to prove positive definiteness.
2 Express uTKu = 4u12 + 16u1u2 + 26ui as a sum of two squares. Then find chol(K) = /DLT.
74 Chapter 1 Applied Linear Algebra
3 A different A produces the circulant second-difference matrix C = ATA:
gives
How can you tell from A that C = ATA is only semidefinite? Which vectors
solve Au= 0 and therefore Cu= 0? Note that chol(C) will fail.
4 Confirm that the circulant C = ATA above is semidefinite by the pivot test.
Write uT Cu as a sum of two squares with the pivots as coefficients. (The eigenvalues 0, 3, 3 give another proof that C is semidefinite.)
5 uTCu 2: 0 means that ur + u~ + u~ 2: U1U2 + U2U3 + U3U1 for any U1' U2, U3. A
more unusual way to check this is by the Schwarz inequality lvTwl ::; llvll llwll:
lu1U2 + U2U3 + U3U1I ::; Jur + u~ + u~ Ju~+ u~ + ur,
Which u's give equality? Check that uT Cu = 0 for those u.
6 For what range of numbers b is this matrix positive definite?
!]· K=[~
There are two borderline values of b when K is only semidefinite. In those cases
write uT Ku with only one square. Find the pivots if b = 5. 7 Is K = ATA or M = BTB positive definite (independent columns in A or B)?
We know that uTMu= (Bu)T(Bu) = (u1+4u2)2 + (2u1+ 5u2)2 + (3u1+6u2 )2.
Show how the three squares for uTKu= (Au)T(Au) collapse into one square.
Problems 8-16 are about tests for positive definiteness.
8 Which of A1, A2 , A3 , A4 has two positive eigenvalues? Use the tests a > 0 and ac > b2 , don't compute the >. 's. Find a vector u so that uT A1u < 0. A3 = [ 101 10100]
9 For which numbers b and c are these matrices positive definite?
and
With the pivots in D and multiplier in L, factor each A into LDLT.
1.6 Positive Definite Matrices 75
10 Show that f(x, y) = x2 + 4xy + 3y2 does not have a minimum at (0, 0) even
though it has positive coefficients. Write f as a difference of squares and find a point (x, y) where f is negative.
11 The function f(x, y) = 2xy certainly has a saddle point and not a minimum at
(0, 0). What symmetric matrix S produces this f? What are its eigenvalues? 12 Test the columns of A to see if ATA will be positive definite in each case:
[i ~] and A=
13 Find the 3 by 3 matrix S and its pivots, rank, eigenvalues, and determinant:
14 Which 3 by 3 symmetric matrices S produce these functions f = xT Sx? Why
is the first matrix positive definite but not the second one?
(a) f = 2(x~ + x~ + x~ - X1X2 - x2x3)
= (b) f 2(x~ + x~ + x~ - X1X2 - X1X3 - X2X3).
15 For what numbers c and d are A and B positive definite? Test the three upper left determinants (1 by 1, 2 by 2, 3 by 3) of each matrix:
A= [ C1 c1 1ll
and
1 1 C
16 If A is positive definite then A-1 is positive definite. Best proof: The eigenvalues of A-1 are positive because __ . Second proof (only quick for 2 by 2):
The entries of A-1 = ac _1 b2 [ -bc -ba] pass the determinant tests __ .
17 A positive definite matrix cannot have a zero (or even worse, a negative number)
on its diagonal. Show that this matrix fails to have uT Au > 0.
[ul
U2 U3] [ 4~
~1
~ll
1 [u~: ]
is not positive when (u1, u2, u3) = ( , , ).
18 A diagonal entry aii of a symmetric matrix cannot be smaller than all the >.'s. If it were, then A - ajjl would have __ eigenvalues and would be positive definite. But A - ajjl has a zero on the main diagonal.
76 Chapter 1 Applied Linear Algebra
19 If all>.> 0, show that uTKu > 0 for every u i- 0, not just the eigenvectors xi.
Write u as a combination of eigenvectors. Why are all "cross terms" xrXj = 0?
20
Without multiplying A =
[
c?s0 sm0
- sin0] [2 cos0 0
OJ [ c?s0 5 - sm0
sin0] find cos0 '
(a) the determinant of A (b) the eigenvalues of A (c) the eigenvectors of A (d) a reason why A is symmetric positive definite.
21 For fi(x, y) = ¼x4 +x2y+y2 and h(x, y) = x3 +xy-x find the second derivative (Hessian) matrices H1 and H2:
H1 is positive definite so Ji is concave up(= convex). Find the minimum point of Ji and the saddle point of h (look where first derivatives are zero).
22 The graph of z = x 2 + y2 is a bowl opening upward. The graph of z = x 2 - y2
is a saddle. The graph of z = -x2 - y2 is a bowl opening downward. What is
a test on a, b, c for z = ax2 + 2bxy + cy2 to have a saddle at (0, 0)?
23 Which values of c give a bowl and which give a saddle point for the graph of
z = 4x2 + 12xy + cy2 ? Describe this graph at the borderline value of c.
24 Here is another way to work with the quadratic function P(u). Check that
The last term -½JTK-1f is Pmin· The other (long) term on the right side is always __ . When u = K-1f, this long term is zero so P = Pmin.
25 Find the first derivatives inf= 8P/8u and the second derivatives in the matrix H for P(u) = ui+u~-c(ui+u~)4 . Start Newton's iteration (21) at u0 = (1,0). Which values of c give a next vector u1 that is closer to the local minimum at u* = (0, 0)? Why is (0, 0) not a global minimum?
26 Guess the smallest 2, 2 block that makes [c-1 A; AT __ ] semidefinite.
! 1] 27 If Hand Kare positive definite, explain why M = [
is positive definite
11] but N = [
is not. Connect the pivots and eigenvalues of M and N
to the pivots and eigenvalues of H and K. How is chol(M) constructed from chol(H) and chol(K)?
1.6 Positive Definite Matrices 77
28 This "KKT matrix" has eigenvalues >.1 = 1, >.2 = 2, >.3 = -1: Saddle point.
Put its unit eigenvectors inside the squares and >. = 1, 2, -1 outside: Verify Wi + w~ - 2uw1 + 2uw2 = 1( __ )2 + 2( __ )2-1( __ )2.
The first parentheses contain (w1- w2)/ v2 from the eigenvector (1, -1, 0) / ../2.
We are using QAQT instead of LDLT. Still two squares minus one square. 29 (Important) Find the three pivots of that indefinite KKT matrix. Verify that
the product of pivots equals the product of eigenvalues (this also equals the determinant). Now put the pivots outside the squares:
1.7 NUMERICAL LINEAR ALGEBRA: LU, QR, SVD
Applied mathematics starts from a problem and builds an equation to describe it. Scientific computing aims to solve that equation. Numerical linear algebra displays this "build up, break down" process in its clearest form, with matrix models:
Ku=J or Kx=>.x or Mu"+Ku=O.
Often the computations break K into simpler pieces. The properties of K are crucial: symmetric or not, banded or not, sparse or not, well conditioned or not. Numerical linear algebra can deal with a large class of matrices in a uniform way, without adjusting to every detail of the model. The algorithm becomes clearest when we see it as a factorization into triangular matrices or orthogonal matrices or very sparse matrices. We will summarize those factorizations quickly, for future use.
This chapter began with the special matrices K, T, B, C and their properties. We needed something to work with! Now we pull together the factorizations you need for more general matrices. They lead to "norms" and "condition numbers" of any A. In my experience, applications of rectangular matrices constantly lead to AT and ATA.
Three Essential Factorizations
I will use the neutral letter A for the matrix we start with. It may be rectangular. If A
has independent columns, then K = ATA is symmetric positive definite. Sometimes
we operate directly with A (better conditioned and more sparse) and sometimes with K (symmetric and more beautiful).
Here are the three essential factorizations, A = LU and A = QR and A = UΣVT:
(1) Elimination reduces A to U by row operations using multipliers in L:
A = LU = lower triangular times upper triangular
(2) Orthogonalization changes the columns of A to orthonormal columns in Q:
A = QR = orthonormal columns times upper triangular
(3) Singular Value Decomposition sees every A as (rotation)(stretch)(rotation):
A = UΣVT = orthonormal columns × singular values × orthonormal rows
As soon as I see that last line, I think of more to say. In the SVD, the orthonormal columns in U and V are the left and right singular vectors (eigenvectors of AAT and
ATA). Then AV = UΣ is like the usual diagonalization AS = SΛ by eigenvectors, but with two matrices U and V. We only have U = V when AAT = ATA.
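All three factorizations are one command away in MATLAB, so a quick side-by-side check is easy; the small matrix below is arbitrary, chosen only for illustration:

A = [1 2; 3 3];                 % any small matrix will do for this check
[L,U,P] = lu(A);                % elimination with row exchanges: P*A = L*U
[Q,R] = qr(A);                  % orthogonalization: A = Q*R
[U2,S,V] = svd(A);              % singular value decomposition: A = U2*S*V'
norm(P*A - L*U), norm(A - Q*R), norm(A - U2*S*V')   % each near machine precision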
For a positive definite matrix K, everything comes together: U is Q and VT is QT.
The diagonal matrix Σ is Λ (singular values are eigenvalues). Then K = QΛQT. The columns of Q are the principal axes = eigenvectors = singular vectors. Matrices
with orthonormal columns play a central role in computations. Start there.
Orthogonal Matrices
The vectors q1, q2 , ... , qn are orthonormal if all their inner products are 0 or 1:
qiTqj = 0 if i ≠ j (orthogonality)    and    qiTqi = 1 (normalization to unit vectors)    (1)

Those dot products are beautifully summarized by the matrix multiplication QTQ = I.
If Q is square, we call it an orthogonal matrix. QTQ = I tells us immediately that
o The inverse of an orthogonal matrix is its transpose: Q-1 = QT.
o Multiplying a vector by Q doesn't change its length: ||Qx|| = ||x||.

Length (soon called norm) is preserved because ||Qx||² = xTQTQx = xTx = ||x||².
This doesn't require a square matrix: QTQ = I for rectangular matrices too. But a two-sided inverse Q-1 = QT (so that QQT is also I) does require that Q is square.
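Both facts are easy to confirm numerically. In this sketch the rectangular matrix is random; only its shape matters:

A = rand(5,3);                  % any 5 by 3 matrix with independent columns
[Q,R] = qr(A,0);                % economy QR gives a 5 by 3 Q with Q'*Q = I
x = rand(3,1);
norm(Q'*Q - eye(3))             % near zero: the columns are orthonormal
norm(Q*x) - norm(x)             % near zero: multiplying by Q preserves length
norm(Q*Q' - eye(5))             % NOT small: QQ' is only a projection, Q is not square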
Here are three quick examples of Q: permutations, rotations, reflections.
Example 1 Every permutation matrix P has the same rows as I, but probably in a different order. P has a single 1 in every row and in every column. Multiplying Px puts the components of x in that row order. Reordering doesn't change the length. All n by
n permutation matrices (there are n! of them) have P-1 = PT.

The 1's in PT hit the 1's in P to give PTP = I. Here is a 3 by 3 example of Px:

Px = [0 1 0; 0 0 1; 1 0 0] [x1; x2; x3] = [x2; x3; x1]    and    PTP = I    (3)
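In MATLAB a permutation matrix can be built by reordering the rows of I. A small check (with an arbitrary 4 by 4 cyclic permutation) confirms that PTP = I:

P = eye(4); P = P([2 3 4 1], :);    % a 4 by 4 cyclic permutation of the rows of I
x = (1:4)';
P*x                                  % the components of x, reordered
norm(P'*P - eye(4))                  % zero: P is an orthogonal matrix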
Example 2 Rotation changes the direction of vectors. It doesn't change lengths. Every
vector just turns:
Rotation matrix in the 1-3 plane    Q = [cos θ 0 -sin θ; 0 1 0; sin θ 0 cos θ]
Every orthogonal matrix Q with determinant 1 is a product of plane rotations.
Example 3 The reflection H takes every v to its image Hv on the other side of a plane mirror. The unit vector u (perpendicular to the mirror) is reversed into Hu = -u:

Reflection matrix    u = (cos θ, 0, sin θ)    H = I - 2uuT = [-cos 2θ 0 -sin 2θ; 0 1 0; -sin 2θ 0 cos 2θ]    (4)
This "Householder reflection" has determinant -1. Both rotations and reflections have
orthonormal columns, and (I - 2uuT)u = u - 2u guarantees that Hu= -u. Modern
orthogonalization uses reflections to create the Q in A= QR.
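A reflection is easy to build and test straight from its formula H = I - 2uuT; the angle below is arbitrary:

theta = pi/5;                        % any angle; u is the unit normal to the mirror
u = [cos(theta); 0; sin(theta)];
H = eye(3) - 2*(u*u');               % Householder reflection I - 2uu'
norm(H*u + u)                        % near zero: Hu = -u
norm(H'*H - eye(3))                  % near zero: H has orthonormal columns
det(H)                               % -1 for a reflection (+1 would be a rotation)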
Orthogonalization A= QR
We are given an m by n matrix A with linearly independent columns a1, ..., an. Its rank is n. Those n columns are a basis for the column space of A, but not necessarily a good basis. All computations are improved by switching from the ai to orthonormal vectors q1, ..., qn. There are two important ways to go from A to Q.

1. The Gram-Schmidt algorithm gives a simple construction of the q's from the a's. First, q1 is the unit vector a1/||a1||. In reverse, a1 = r11q1 with r11 = ||a1||. Second, subtract from a2 its component in the q1 direction (the Gram-Schmidt idea). That vector B = a2 - (q1Ta2)q1 is orthogonal to q1. Normalize B to q2 = B/||B||. At every step, subtract from ak its components in the settled directions q1, ..., qk-1, and normalize to find the next unit vector qk.

Gram-Schmidt    [a1 a2] = [q1 q2] [r11 r12; 0 r22]    (m by n)(n by n)    (5)
2. The Householder algorithm uses reflection matrices I - 2uuT. Column by column, it produces zeros in R. In this method, Q is square and R is rectangular:
Householder qr(A)    [a1 a2] = [q1 q2 q3] [r11 r12; 0 r22; 0 0]    (m by m)(m by n)    (6)
The vector q3 comes for free! It is orthogonal to a1, a2 and also to q1, q2. This method is MATLAB's choice for qr because it is more stable than Gram-Schmidt and gives extra information. Since q3 multiplies the zero row, it has no effect on A= QR. Use qr(A, 0) to return to the "economy size" in (5).
Section 2.3 will give full explanations and example codes for both methods. Most
linear algebra courses emphasize Gram-Schmidt, which gives an orthonormal basis q1, ... , qr for the column space of A. Householder is now the method of choice, completing to an orthonormal basis q1, ... , qm for the whole space Rm.
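Section 2.3 gives the book's codes for both methods. As a placeholder, here is a minimal classical Gram-Schmidt sketch written from the description above (it is not the book's code); it should agree with qr(A, 0) up to the signs of the columns:

function [Q,R] = gs(A)               % minimal Gram-Schmidt sketch
  [m,n] = size(A); Q = zeros(m,n); R = zeros(n,n);
  for k = 1:n
    v = A(:,k);
    for j = 1:k-1
      R(j,k) = Q(:,j)'*A(:,k);       % component of a_k in the settled direction q_j
      v = v - R(j,k)*Q(:,j);         % subtract that component
    end
    R(k,k) = norm(v);                % length of what is left
    Q(:,k) = v/R(k,k);               % normalize to get the next unit vector q_k
  end
end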
Numerically, the great virtue of Q is its stability. When you multiply by Q, overflow and underflow will not happen. All formulas involving ATA become simpler, since QTQ = I. A square system Qx = b will be perfectly conditioned, because ||x|| = ||b|| and an error Δb produces an error Δx of the same size:

Q(x + Δx) = b + Δb    gives    Q(Δx) = Δb    and    ||Δx|| = ||Δb||    (7)
Singular Value Decomposition
This section now concentrates on the SVD, which reaches a diagonal matrix Σ.

Since diagonalization involves eigenvalues, the matrices from A = QR will not do the job. Most square matrices A are diagonalized by their eigenvectors x1, ..., xn. If x is a combination c1x1 + ··· + cnxn, then A multiplies each xi by λi. In matrix language this is Ax = SΛS-1x. Usually, the eigenvector matrix S is not orthogonal. Eigenvectors only meet at right angles when A is special (for example symmetric). If we want to diagonalize an ordinary A by orthogonal matrices, we need two different Q's. They are generally called U and V, so A = UΣVT.

What is this diagonal matrix Σ? It now contains singular values σi instead of eigenvalues λi. To understand those σi, the key is always the same: Look at ATA.
Find V and Σ    ATA = (UΣVT)T(UΣVT) = VΣT(UTU)ΣVT    (8)

Removing UTU = I leaves V(ΣTΣ)VT. This is exactly like K = QΛQT, but it applies to K = ATA. The diagonal matrix ΣTΣ contains the numbers σi², and those are the positive eigenvalues of ATA. The orthonormal eigenvectors of ATA are in V.
In the end we want AV = UΣ. So we must choose ui = Avi/σi. These ui are orthonormal eigenvectors of AAT. At this point we have the "reduced" SVD, with v1, ..., vr and u1, ..., ur as perfect bases for the row space and column space of A. The rank r is the dimension of these spaces, and svd(A, 0) gives this form:

Reduced SVD    A = UΣVT = [u1 ··· ur] diag(σ1, ..., σr) [v1T; ··· ; vrT], with U m by r, Σ r by r, VT r by n, from ui = Avi/σi    (9)

To complete the v's, add any orthonormal basis vr+1, ..., vn for the nullspace of A. To complete the u's, add any orthonormal basis ur+1, ..., um for the nullspace of AT. To complete Σ to an m by n matrix, add zeros for svd(A) and the unreduced form:
Full SVD    A = UΣVT = [u1 ··· ur ··· um] Σ [v1T; ··· ; vrT; ··· ; vnT], with U m by m, VT n by n, and Σ m by n holding σ1, ..., σr on its diagonal and zeros elsewhere    (10)
Normally we number the ui, σi, vi so that σ1 ≥ σ2 ≥ ··· ≥ σr > 0. Then the SVD has the wonderful property of splitting any matrix A into rank-one pieces ordered by their size:

A = u1σ1v1T + u2σ2v2T + ··· + urσrvrT    (11)

The first piece u1σ1v1T is described by only m + n + 1 numbers, not mn. Often a few pieces contain almost all the information in A (in a stable form). This isn't a fast method for image compression because computing the SVD involves eigenvalues. (Filters are faster.) The SVD is the centerpiece of matrix approximation.
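Keeping only the k largest pieces gives the closest rank-k matrix to A, and the error of that truncation (in this section's matrix norm) is the first singular value left out. A small experiment shows this; the matrix is arbitrary:

A = magic(6); k = 2;                      % an arbitrary 6 by 6 matrix, keep 2 pieces
[U,S,V] = svd(A);
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';       % sum of the first k pieces u_i sigma_i v_i'
[norm(A - Ak), S(k+1,k+1)]                % the two numbers agree: error = sigma_(k+1)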
The right and left singular vectors vi and ui are the Karhunen-Loeve bases in engineering. A symmetric positive definite K has vi = ui: one basis.
I think of the SVD as the final step in the Fundamental Theorem of Linear Algebra. First come the dimensions of the four subspaces. Then their orthogonality. Then the orthonormal bases u1, ..., um and v1, ..., vn which diagonalize A.

SVD    Avj = σjuj for j ≤ r    Avj = 0 for j > r    ATuj = σjvj for j ≤ r    ATuj = 0 for j > r    (12)
Figure 1.18: U and V are rotations and reflections. Σ stretches by σ1, ..., σr.
These ui = Avi/σi are orthonormal eigenvectors of AAT. Start from ATAvi = σi²vi:

Multiply by viT:  viTATAvi = σi²viTvi  says that  ||Avi|| = σi  so  ||ui|| = 1
Multiply by vjT:  vjTATAvi = σi²vjTvi  says that  (Avj) · (Avi) = 0  so  ujTui = 0
Multiply by A:   AATAvi = σi²Avi  says that  AATui = σi²ui
Here is a homemade code to create the SVD. It follows the steps above, based primarily on eig(A'*A). The faster and more stable codes in LAPACK work directly with A. Ultimately, stability may require that very small singular values are replaced by σ = 0. The SVD identifies the dangers in Ax = b (near 0 in A, very large in x).
% input A, output orthogonal U, V and diagonal sigma with A = U*sigma*V'
[m,n] = size(A); r = rank(A); [V,squares] = eig(A'*A);  % n by n matrices
sing = sqrt(squares(1:r,1:r));              % r by r, singular values > 0 on diagonal
sigma = zeros(m,n); sigma(1:r,1:r) = sing;  % m by n singular value matrix
u = A*V(:,1:r)*inv(sing);                   % first r columns of U (singular vectors)
[U,R] = qr(u); U(:,1:r) = u;                % qr command completes u to an m by m U
A - U*sigma*V'                              % test for zero m by n matrix (could print its norm)
Example 4 Find the SVD for the singular matrix A = [1 1; 7 7].

Solution A has rank one, so there is one singular value. First comes ATA:

ATA = [50 50; 50 50] has λ = 100 and 0, with eigenvectors [v1 v2] = (1/√2)[1 1; 1 -1].

The singular value is σ1 = √100 = 10. Then u1 = Av1/10 = (1, 7)/√50. Add in u2 = (-7, 1)/√50:

A = UΣVT = (1/√50)[1 -7; 7 1] [10 0; 0 0] (1/√2)[1 1; -1 1].
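MATLAB's svd confirms these numbers (the signs of the singular vectors may come out reversed):

A = [1 1; 7 7];
[U,S,V] = svd(A);
diag(S)'                           % singular values 10 and 0
U(:,1)*sqrt(50), V(:,1)*sqrt(2)    % proportional to (1, 7) and (1, 1), up to sign
norm(A - U*S*V')                   % near zero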
Example 5 Find the SVD of the n + 1 by n backward difference matrix Δ−.

Solution With diagonal 1's and subdiagonal -1's in Δ−, the products Δ−TΔ− and Δ−Δ−T are Kn and Bn+1. When (n + 1)h = π, Kn has eigenvalues λ = σ² = 2 - 2 cos kh and eigenvectors vk = (sin kh, ..., sin nkh). Bn+1 has the same eigenvalues (plus λn+1 = 0) and its eigenvectors are uk = (cos ½kh, ..., cos(n + ½)kh) in U.

Those eigenvectors vk and uk fill the DST and DCT matrices. Normalized to unit length, these are the columns of V and U. The SVD is Δ− = (DCT)Σ(DST). The equation Δ−vk = σkuk says that the first differences of sine vectors are cosine vectors.
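A numerical check of Example 5, building the backward difference matrix explicitly, confirms that its singular values are √(2 - 2 cos kh):

n = 5; h = pi/(n+1);
E = [eye(n); zeros(1,n)] - [zeros(1,n); eye(n)];   % (n+1) by n backward difference
s = svd(E);                                        % singular values, largest first
check = sqrt(2 - 2*cos((n:-1:1)*h))';              % sqrt of the eigenvalues of K_n
norm(s - check)                                    % near zero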
Section 1.8 will apply the SVD to Principal Component Analysis and to Model Reduction. The goal is to find a small part of the data and the model (starting with u1 and v1) that carries the important information.
The Pseudoinverse
By choosing good bases, A multiplies vi in the row space to give σiui in the column space. A-1 must do the opposite! If Av = σu then A-1u = v/σ. The singular values of A-1 are 1/σ, just as the eigenvalues of A-1 are 1/λ. The bases are reversed. The u's are in the row space of A-1, the v's are in the column space.

Until this moment we would have added "if A-1 exists." Now we don't. A matrix that multiplies ui to produce vi/σi does exist. It is the pseudoinverse A+ = pinv(A). The vectors u1, ..., ur in the column space of A go back to the row space. The other vectors ur+1, ..., um are sent to zero. When we know what happens to each basis vector ui, we know A+. The pseudoinverse has the same rank r as A.

In the pseudoinverse Σ+ of the diagonal matrix Σ, each σ is replaced by σ-1. The product Σ+Σ is as near to the identity as we can get. So are AA+ and A+A:

AA+ = projection matrix onto the column space of A
A+A = projection matrix onto the row space of A
Example 6 Find the pseudoinverse A+ of the same rank one matrix A = [1 1; 7 7].

Solution Since A has σ1 = 10, the pseudoinverse A+ = pinv(A) has 1/10.

A+ = VΣ+UT = (1/√2)[1 -1; 1 1] [1/10 0; 0 0] (1/√50)[1 7; -7 1] = (1/100)[1 7; 1 7].
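The built-in pinv reproduces this answer:

A = [1 1; 7 7];
pinv(A)                              % equals [1 7; 1 7]/100
norm(pinv(A) - [1 7; 1 7]/100)       % near zero
norm(A*pinv(A)*A - A)                % one defining property: A*pinv(A)*A = A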
The pseudoinverse of a rank-one matrix A = σuvT is A+ = vuT/σ, also rank-one. Always A+b is in the row space of A (a combination of the basis v1, ..., vr).
With n > m, Ax = b is solvable when b is in the column space of A. Then A+b is
the shortest solution because it has no nullspace component, while A\b is a different
"sparse solution" with n - m zero components.
Condition Numbers and Norms
The condition number of a positive definite matrix is c(K) = λmax/λmin. This ratio measures the "sensitivity" of the linear system Ku = f. Suppose f changes by Δf because of roundoff or measurement error. Our goal is to estimate Δu (the change in the solution). If we are serious about scientific computing, we have to control errors.

Subtract Ku = f from K(u + Δu) = f + Δf. The error equation is K(Δu) = Δf. Since K is positive definite, λmin gives a reliable bound on Δu:

Error bound    K(Δu) = Δf means Δu = K-1(Δf). Then ||Δu|| ≤ ||Δf||/λmin(K)    (14)
The top eigenvalue of K-1 is 1/λmin(K). Then Δu is largest in the direction of that eigenvector. The eigenvalue λmin indicates how close K is to a singular matrix (but eigenvalues are not reliable for an unsymmetric matrix A). That single number λmin has two serious drawbacks in measuring the sensitivity of Ku = f or Ax = b.

First, if we multiply K by 1000, then u and Δu are divided by 1000. That rescaling (to make K less singular and λmin larger) cannot change the reality of the problem. The relative error ||Δu||/||u|| stays the same, since 1000/1000 = 1. It is the relative changes in u and f that we should compare. Here is the key for positive definite K:

Dividing ||Δu|| ≤ ||Δf||/λmin(K) by ||u|| ≥ ||f||/λmax(K) gives ||Δu||/||u|| ≤ (λmax(K)/λmin(K)) ||Δf||/||f||

In words: Δu is largest when Δf is an eigenvector for λmin. The true solution u is smallest when f is an eigenvector for λmax. The ratio λmax/λmin produces the condition number c(K), the maximum "blowup factor" in the relative error.

Condition number for positive definite K    c(K) = λmax(K)/λmin(K)
When A is not symmetric, the inequality ||Ax|| ≤ λmax(A)||x|| can be false (see Figure 1.19). Other vectors can blow up more than eigenvectors. A triangular matrix with 1's on the diagonal might look perfectly conditioned, since λmax = λmin = 1. We need a norm ||A|| to measure the size of every A, and λmax won't work.

DEFINITIONS The norm ||A|| is the maximum of the ratio ||Ax||/||x||. The condition number of A is ||A|| times ||A-1||.

Norm    ||A|| = max(||Ax||/||x||) over x ≠ 0        Condition number    c(A) = ||A|| ||A-1||    (15)
Figure 1.19: The norms of A and A-1 come from the longest and shortest Ax. The matrix is A = [1 1; 1 0] with ATA = [2 1; 1 1] and det ATA = 1; the unit circle is mapped to the ellipse of all Ax. Then ||A||² = λmax(ATA) ≈ 2.6, 1/||A-1||² = λmin(ATA) ≈ 1/2.6, ||A|| = (1 + √5)/2, and c(A) = ||A|| ||A-1|| ≈ 2.6.
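Taking A = [1 1; 1 0] as the matrix behind Figure 1.19 (an assumption consistent with the numbers above and with Problem 6), MATLAB reproduces those values:

A = [1 1; 1 0];
norm(A)                          % (1 + sqrt(5))/2, about 1.618
eig(A'*A)'                       % about 0.382 and 2.618
cond(A)                          % norm(A)*norm(inv(A)), about 2.618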
||Ax||/||x|| is never larger than ||A|| (its maximum), so always ||Ax|| ≤ ||A|| ||x||. For all matrices and vectors, the number ||A|| meets these requirements:

||Ax|| ≤ ||A|| ||x||    and    ||AB|| ≤ ||A|| ||B||    and    ||A + B|| ≤ ||A|| + ||B||    (16)

The norm of 1000A will be 1000||A||. But 1000A has the same condition number as A. For a positive definite matrix, the largest eigenvalue is the norm: ||K|| = λmax(K). Reason: The orthogonal matrices in K = QΛQT leave lengths unchanged. So ||K|| = ||Λ|| = λmax. Similarly ||K-1|| = 1/λmin(K). Then c(K) = λmax/λmin is correct.
A very unsymmetric example has λmax = 0, but the norm is ||A|| = 2:

A = [0 2; 0 0] has Ax = [0 2; 0 0] [0; 1] = [2; 0] and the ratio is ||Ax||/||x|| = 2/1.

This unsymmetric A leads to the symmetric ATA = [0 0; 0 4]. The largest eigenvalue is σ1² = 4. Its square root is the norm: ||A|| = 2 = largest singular value. This singular value √λmax(ATA) is generally larger than λmax(A). Here is the great formula for ||A||² all on one line:

Norm    ||A||² = max(||Ax||²/||x||²) = max(xTATAx/xTx) = λmax(ATA) = σ1²    (17)
The norm of A-1 is 1/σmin, generally larger than 1/λmin. The product is c(A):

Condition number    c(A) = ||A|| ||A-1|| = σmax/σmin    (18)

Here is one comment: σmin tells us the distance from an invertible A to the nearest singular matrix. When σmin changes to zero inside Σ, it is multiplied by U and VT (orthogonal, preserving norms). So the norm of that smallest change in A is σmin.
Example 7 For this 2 by 2 matrix A, the inverse just changes 7 to -7. Notice that 7² + 1² = 50. The condition number c(A) = ||A|| ||A-1|| is at least √50 √50 = 50:

Ax = [1 7; 0 1] [0; 1] = [7; 1]  has  ||Ax||/||x|| = √50/1,  so  ||A|| ≥ √50
A-1x = [1 -7; 0 1] [0; 1] = [-7; 1]  has  ||A-1x||/||x|| = √50/1,  so  ||A-1|| ≥ √50

Suppose we intend to solve Ax = b = [7; 1]. The solution is x = [0; 1]. Move the right side by Δb = [0; .1]. Then x moves by Δx = [-.7; .1], since A(Δx) = Δb. The relative change in x is 50 times the relative change in b:

||Δx||/||x|| = (.1)√50    is 50 times greater than    ||Δb||/||b|| = (.1)/√50
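These numbers are quick to verify, taking A = [1 7; 0 1] as reconstructed above:

A = [1 7; 0 1]; b = [7; 1]; db = [0; .1];
x = A\b                                      % [0; 1]
dx = A\db                                    % [-0.7; 0.1]
(norm(dx)/norm(x))/(norm(db)/norm(b))        % the blowup factor 50
cond(A)                                      % about 51, so 50 is nearly the worst case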
Example 8 The eigenvalues of the -1, 2, -1 matrix Kn are λ = 2 - 2 cos(kπ/(n+1)). Then k = 1 and k = n give λmin and λmax. The condition number of Kn grows like n²:

c(Kn) = λmax/λmin ≈ 4/(π/(n+1))² = 4(n+1)²/π²    (19)

λmax is nearly 2 - 2 cos π = 4, at the top of Figure 1.13. The smallest eigenvalue uses cos θ ≈ 1 - ½θ² from calculus, which is the same as 2 - 2 cos θ ≈ θ² = (π/(n+1))².

A rough rule for Ax = b is that the computer loses about log c decimals to roundoff error. MATLAB gives a warning when the condition number is large (c is not calculated exactly, the eigenvalues of ATA would take too long). It is normal for c(K) to be of order 1/(Δx)² in approximating a second-order differential equation, agreeing with n² in (19). Fourth order problems have λmax/λmin ≈ C/(Δx)⁴.
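A short loop with the built-in cond shows the n² growth predicted by (19):

for n = [10 20 40]
  K = toeplitz([2 -1 zeros(1,n-2)]);               % the -1, 2, -1 matrix K_n
  fprintf('n = %3d  cond(K) = %8.1f  4(n+1)^2/pi^2 = %8.1f\n', ...
          n, cond(K), 4*(n+1)^2/pi^2)
end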
Row Exchanges in PA = LU
Our problems might be ill conditioned or well conditioned. We can't necessarily control c(A), but we don't want to make the condition worse by a bad algorithm. Since elimination is the most frequently used algorithm in scientific computing, a lot of effort has been concentrated on doing it right. Often we reorder the rows of A.
The main point is that small pivots are dangerous. To find the numbers that multiply rows, we divide by the pivots. Small pivots mean large multipliers in L. Then L (and probably U) are more ill-conditioned than A. The simplest cure is to exchange rows by P, bringing the largest possible entry up into the pivot.
The command lu(A) does this "partial pivoting" for A = [1 2; 3 3]. The first pivot changes from 1 to 3. Partial pivoting avoids multipliers in L larger than 1:

[L, U, P] = lu(A)    gives    P = [0 1; 1 0],  L = [1 0; 1/3 1],  U = [3 3; 0 1]

The product of pivots is -det A = +3 since P exchanged the rows of A.
A positive definite matrix K has no need for row exchanges. Its factorization into K = LDLT can be rewritten as K = L√D√DLT (named after Cholesky). In this form we are seeing K = ATA with A = √DLT. Then we know from (17) that λmax(K) = ||K|| = (σmax(A))² and λmin(K) = (σmin(A))². Elimination to A = chol(K) does absolutely no harm to the condition number of a positive definite K = ATA:

A = chol(K)    c(K) = λmax(K)/λmin(K) = (σmax(A)/σmin(A))² = (c(A))²    (20)
Usually elimination into PA = LU makes c(L) c(U) larger than the original c(A). That price is often remarkably low, a fact that we don't fully understand.
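Equation (20) is easy to check numerically; any positive definite K will do:

K = toeplitz([2 -1 0 0]);        % a 4 by 4 positive definite example
A = chol(K);                     % upper triangular with K = A'*A
[cond(K), cond(A)^2]             % the two numbers agree: c(K) = c(A)^2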
The next chapters build models for important applications. Discrete problems lead to matrices A and AT and ATA in Chapter 2. A differential equation produces many discrete equations, as we choose finite differences or finite elements or spectral methods or a Fourier transform, or any other option in scientific computing. All these options replace calculus, one way or another, by linear algebra.
Problem Set 1.7
Problems 1-5 are about orthogonal matrices with QTQ = I.
1 Are these pairs of vectors orthonormal or only orthogonal or only independent?
(c) [cos θ; sin θ] and [-sin θ; cos θ].
Change the second vector when necessary to produce orthonormal vectors.
2 Give an example of each of the following:
(a) A matrix Q that has orthonormal columns but QQT ≠ I. (b) Two orthogonal vectors that are not linearly independent.
(c) An orthonormal basis for R4, where every component is ½ or -½.
3 If Q1 and Q2 are orthogonal matrices, show that their product Q1Q2 is also an orthogonal matrix. (Use QTQ = I.)
4 Orthonormal vectors are automatically linearly independent. Two proofs:
(a) Vector proof: When c1q1 + c2q2 + c3q3 = 0, what dot product leads to c1 = 0? Similarly c2 = 0 and c3 = 0. Thus the q's are independent.
(b) Matrix proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular, you can use QT but not Q-1.
5 If a1, a2, a3 is a basis for R3, any vector b can be written as

b = x1a1 + x2a2 + x3a3    or    [a1 a2 a3] [x1; x2; x3] = b.

(a) Suppose the a's are orthonormal. Show that x1 = a1Tb. (b) Suppose the a's are orthogonal. Show that x1 = a1Tb/a1Ta1.
(c) If the a's are independent, x1 is the first component of __ times b.
Problems 6-14 and 31 are about norms and condition numbers.
6 Figure 1.18 displays any matrix A as rotation times stretching times rotation:
A = UΣVT = [cos α -sin α; sin α cos α] [σ1 0; 0 σ2] [cos θ sin θ; -sin θ cos θ]    (21)

The count of four parameters α, σ1, σ2, θ agrees with the count of four entries a11, a12, a21, a22. When A is symmetric and a12 = a21, the count drops to three because α = θ and we only need one Q. The determinant of A in (21) is σ1σ2. For det A < 0, add a reflection. In Figure 1.19, verify λmax(ATA) = ½(3 + √5) and its square root ||A|| = ½(1 + √5).
7 Find by hand the norms λmax and condition numbers λmax/λmin of these positive definite matrices:
8 Compute the norms and condition numbers from the square roots of λ(ATA):
9 Explain these two inequalities from the definitions of the norms ||A|| and ||B||: ||ABx|| ≤ ||A|| ||Bx|| ≤ ||A|| ||B|| ||x||. From the ratio that gives ||AB||, deduce that ||AB|| ≤ ||A|| ||B||. This fact is
the key to using matrix norms.
10 Use ||AB|| ≤ ||A|| ||B|| to prove that the condition number of any matrix A is at least 1. Show that an orthogonal Q has c(Q) = 1.
11 If λ is any eigenvalue of A, explain why |λ| ≤ ||A||. Start from Ax = λx.

12 The "spectral radius" ρ(A) = |λmax| is the largest absolute value of the eigenvalues. Show with 2 by 2 examples that ρ(A + B) ≤ ρ(A) + ρ(B) and ρ(AB) ≤ ρ(A)ρ(B) can both be false. The spectral radius is not acceptable as a norm.
13 Estimate the condition number of the ill-conditioned matrix A = [1 1; 1 1.0001].

14 The "ℓ1 norm" and the "ℓ∞ norm" of x = (x1, ..., xn) are ||x||1 = |x1| + ··· + |xn| and ||x||∞ = max |xi|. Compute the norms ||x|| and ||x||1 and ||x||∞ of these two vectors in R5:

x = (1, 1, 1, 1, 1)    x = (.1, .7, .3, .4, .5).