COMPUTATIONAL SCIENCE AND ENGINEERING
GILBERT STRANG
Massachusetts Institute of Technology

WELLESLEY-CAMBRIDGE PRESS
Box 812060, Wellesley MA 02482

Computational Science and Engineering
Copyright ©2007 by Gilbert Strang
ISBN-10 0-9614088-1-2    ISBN-13 978-0-9614088-1-7

All rights reserved. No part of this work may be reproduced or stored or transmitted by any means, including photocopying, without written permission from Wellesley-Cambridge Press. Translation in any language is strictly prohibited - authorized translations are arranged.

Other texts from Wellesley-Cambridge Press

Introduction to Linear Algebra, Gilbert Strang
ISBN-10 0-9614088-9-8    ISBN-13 978-0-9614088-9-3.

Wavelets and Filter Banks, Gilbert Strang and Truong Nguyen
ISBN-10 0-9614088-7-1    ISBN-13 978-0-9614088-7-9.

Linear Algebra, Geodesy, and GPS, Gilbert Strang and Kai Borre
ISBN-10 0-9614088-6-3    ISBN-13 978-0-9614088-6-2.

Introduction to Applied Mathematics, Gilbert Strang
ISBN-10 0-9614088-0-4    ISBN-13 978-0-9614088-0-0.

An Analysis of the Finite Element Method, Gilbert Strang and George Fix
ISBN-10 0-9802327-0-8    ISBN-13 978-0-9802327-0-7.

Calculus, Gilbert Strang
ISBN-10 0-9614088-2-0    ISBN-13 978-0-9614088-2-4.

Wellesley-Cambridge Press
Box 812060, Wellesley MA 02482 USA
www.wellesleycambridge.com
gs@math.mit.edu    math.mit.edu/~gs
phone (781) 431-8488    fax (617) 253-4358

LaTeX text preparation by Valutone Solutions, www.valutone.com. LaTeX assembly and book design by Brett Coonley, Massachusetts Institute of Technology.

MATLAB® is a registered trademark of The MathWorks, Inc.

Course materials including syllabus and MATLAB codes and exams are available on the computational science and engineering web site: math.mit.edu/cse. Problem solutions will also be on this cse site, with further examples. Videotaped lectures of the CSE courses 18.085 and 18.086 (which now use this book) are available on the course web sites: math.mit.edu/18085 and math.mit.edu/18086. Computational Science and Engineering is also included in OpenCourseWare, ocw.mit.edu.
TABLE OF CONTENTS

1 Applied Linear Algebra  1
  1.1 Four Special Matrices  1
  1.2 Differences, Derivatives, Boundary Conditions  13
  1.3 Elimination Leads to K = LDL^T  26
  1.4 Inverses and Delta Functions  36
  1.5 Eigenvalues and Eigenvectors  46
  1.6 Positive Definite Matrices  66
  1.7 Numerical Linear Algebra: LU, QR, SVD  78
  1.8 Best Basis from the SVD  92

2 A Framework for Applied Mathematics  98
  2.1 Equilibrium and the Stiffness Matrix  98
  2.2 Oscillation by Newton's Law  111
  2.3 Least Squares for Rectangular Matrices  128
  2.4 Graph Models and Kirchhoff's Laws  142
  2.5 Networks and Transfer Functions  156
  2.6 Nonlinear Problems  171
  2.7 Structures in Equilibrium  185
  2.8 Covariances and Recursive Least Squares  200
  *2.9 Graph Cuts and Gene Clustering  217

3 Boundary Value Problems  229
  3.1 Differential Equations and Finite Elements  229
  3.2 Cubic Splines and Fourth-Order Equations  245
  3.3 Gradient and Divergence  255
  3.4 Laplace's Equation  269
  3.5 Finite Differences and Fast Poisson Solvers  283
  3.6 The Finite Element Method  293
  3.7 Elasticity and Solid Mechanics  310

4 Fourier Series and Integrals  317
  4.1 Fourier Series for Periodic Functions  317
  4.2 Chebyshev, Legendre, and Bessel  334
  4.3 Discrete Fourier Transform and the FFT  346
  4.4 Convolution and Signal Processing  356
  4.5 Fourier Integrals  367
  4.6 Deconvolution and Integral Equations  381
  4.7 Wavelets and Signal Processing  388

5 Analytic Functions  403
  5.1 Taylor Series and Complex Integration  403
  5.2 Famous Functions and Great Theorems  419
  5.3 The Laplace Transform and z-Transform  426
  5.4 Spectral Methods of Exponential Accuracy  440

6 Initial Value Problems  456
  6.1 Introduction  456
  6.2 Finite Difference Methods  461
  6.3 Accuracy and Stability for u_t = c u_x  472
  6.4 Wave Equations and Staggered Leapfrog  485
  6.5 Diffusion, Convection, and Finance  500
  6.6 Nonlinear Flow and Conservation Laws  517
  6.7 Fluid Flow and Navier-Stokes  533
  6.8 Level Sets and Fast Marching  547

7 Solving Large Systems  551
  7.1 Elimination with Reordering  551
  7.2 Iterative Methods  563
  7.3 Multigrid Methods  571
  7.4 Krylov Subspaces and Conjugate Gradients  586

8 Optimization and Minimum Principles  598
  8.1 Two Fundamental Examples  598
  8.2 Regularized Least Squares  613
  8.3 Calculus of Variations  627
  8.4 Errors in Projections and Eigenvalues  646
  8.5 The Saddle Point Stokes Problem  652
  8.6 Linear Programming and Duality  661
  8.7 Adjoint Methods in Design  678

Linear Algebra in a Nutshell  685
Sampling and Aliasing  691
Computational Science and Engineering  694
Bibliography  698
Index  704

TEACHING AND LEARNING FROM THE BOOK

I hope that mathematics and also engineering departments will approve of this textbook. It developed from teaching the MIT course 18.085 for thirty years. I thank thousands of engineering and science students for learning this subject with me. I certainly do not teach every single topic! Here is my outline:

1. Applied linear algebra (its importance is now recognized)
2. Applied differential equations (with boundary values and initial values)
3. Fourier series including the Discrete Fourier Transform and convolution.

You will have support from the book and the cse website (and the author). Please select the sections appropriate for the course and the class. What I hope is that this book will serve as a basic text for all mathematicians and engineers and scientists, to explain the core ideas of applied mathematics and scientific computing. The subject is beautiful, it is coherent, and it has moved a long way.
The course text in earlier years was my book Introduction to Applied Mathematics (Wellesley-Cambridge Press). That text contains very substantial material that is not in this book, and vice versa. What naturally happened, from lectures and exams and homeworks and projects over all those years, was a clearer focus on how applied and engineering mathematics could be presented. This new book is the result.

This whole book aims to bring ideas and algorithms together. I am convinced that they must be taught and learned in the same course. The algorithm clarifies the idea. The old method, separation of responsibilities, no longer works:

Not perfect      Mathematics courses teach analytical techniques
                 Engineering courses work on real problems

Even within computational science there is a separation we don't need:

Not efficient    Mathematics courses analyze numerical algorithms
                 Engineering and computer science implement the software

I believe it is time to teach and learn the reality of computational science and engineering. I hope this book helps to move that beautiful subject forward. Thank you for reading it.

Gilbert Strang is in the Department of Mathematics at MIT. His textbooks have transformed the teaching of linear algebra into a more useful course for many students. His lectures are on the OpenCourseWare website at ocw.mit.edu, where 18.06 is the most frequently visited of 1700 courses. The next course 18.085 evolved in a natural way to become Computational Science and Engineering, and led to this textbook. Awards have come for research and teaching and mathematical exposition:

Von Neumann Medal in Computational Mechanics
Teaching Prizes from the MIT School of Science
Henrici Prize for Applied Analysis
Haimo Prize for Distinguished Teaching, Mathematical Association of America
Su Buchin Prize, International Congress of Industrial and Applied Mathematics

Gilbert Strang served as President of SIAM (1999-2000) and as chair of the U.S. National Committee on Mathematics. Earlier books presented the finite element method and the theory of wavelets and the mathematics of GPS. On those topics George Fix and Truong Nguyen and Kai Borre were valuable coauthors. The textbooks Introduction to Linear Algebra and Linear Algebra and Its Applications are widely adopted by mathematics and engineering departments. With one exception (LAA), all books are published by Wellesley-Cambridge Press. They are available also through SIAM.

The present book developed step by step: text first, then problems, MATLAB codes, and video lectures. The response from students has been wonderful. This development will continue on the website math.mit.edu/cse (also /18085 and /18086). Problem solutions will be on that cse site, with further examples. The crucial need for today's students and readers is to move forward from the older "formula-based" emphasis toward a solution-based course. Solving problems is the heart of modern engineering mathematics and scientific computing.

THE COVER OF THE BOOK

Lois Sellers and Gail Corbett created the cover from the "circles" of Section 2.2. The solution to aim for is a true circle, but Euler's method takes discrete steps. When those steps spiral out, they produce the beautiful background on the cover (not the best circle). The spirals and circles and meshes, plus microarrays and the Gibbs phenomenon, are serious parts of Computational Science and Engineering. It was the inspiration of the cover artists to highlight the three letters CSE.
Those letters have come to identify an exciting direction for applied mathematics. I hope the book and the cover from birchdesignassociates.com and the evolving website math.mit.edu/cse will give you ideas to work with, and pleasure too.

ACKNOWLEDGEMENTS

I have had wonderful help with this book. For a long time we were a team of two: Brett Coonley prepared hundreds of LaTeX pages. The book would not exist without his steady support. Then new help came from four directions:

1. Per-Olof Persson and Nick Trefethen and Benjamin Seibold and Aslan Kasimov brought the computational part of the book to life. The text explains scientific computing, and their codes do it.

2. The typesetting was completed by www.valutone.com (highly recommended!).

3. Jim Collins and Tim Gardner and Mike Driscoll gave advice on mathematical biology (including the gene microarray on the back cover). From biomechanics to heart rhythms to gene expression, we want and need computational biology. It became clear that clustering is a crucial algorithm in bioinformatics, and far beyond. Des Higham and Inderjit Dhillon and Jon Kleinberg generously helped me to develop the newest section *2.9 on Graph Cuts and Gene Clustering.

4. A host of applied mathematicians and engineers told me what to write. The words came from teaching thousands of students over 40 happy years. The structure of a textbook emerges safely but slowly, it can't be rushed. For ideas of all kinds, I owe thanks to so many (plus Oxford and the Singapore-MIT Alliance): Stephen Boyd, Bill Briggs, Yeunwoo Cho, Daniel Cremers, Tim Davis, Sohan Dharmaraja, Alan Edelman, Lotti Ekert, Bob Fourer, Michael Friedlander, Mike Giles (especially), Gene Golub, Nick Gould, Mike Heath, David Hibbitt, Nick Higham, Steven Johnson, David Keyes, Brian Kulis, Ruitian Lang, Jörg Liesen, Ross Lippert, Konstantin Lurie, Bill Morton, Jean-Christophe Nave, Jaime Peraire, Raj Rao, John Reid, Naoki Saito, Mike Saunders, Jos Stam, Vasily Strela, Jared Tanner, Kim Chuan Toh, Alar Toomre, Andy Wathen (especially), Andre Weideman, Chris Wiggins, Karen Willcox, and (on a memorable day at Hong Kong airport) Ding-Xuan Zhou.

May I dedicate this book to my family and friends. They make life beautiful.

Gilbert Strang

INTRODUCTION

When you study a subject as large as applied mathematics, or teach it, or write about it, you first need to organize it. There has to be a pattern and a structure. Then the reader (and the author!) can fit the pieces together. Let me try to separate this subject into manageable pieces, and propose a structure for this book and this course.

A first step is to see two parts: modeling and solving. Those are reflected in the contents of this book. Applied mathematics identifies the key quantities in the problem, and connects them by differential equations or matrix equations. Those equations are the starting point for scientific computing. In an extreme form, modeling begins with a problem and computing begins with a matrix.

A few more words about those two parts. "Applied mathematics" traditionally includes a study of special functions. These have enormous power and importance (sometimes a complete analysis has to wait for a more advanced course). Also traditionally, "scientific computing" includes a numerical analysis of the algorithm, to test its accuracy and stability. Our focus stays on the basic problems that everybody meets:

A. Constructing the equations of equilibrium and of motion (balance equations)

B.
Solving steady state and time-dependent matrix and differential equations.

Most scientists and engineers, by the nature of our minds and our jobs, will concentrate more heavily on one side or the other. We model the problem, or we use algorithms like the FFT and software like MATLAB to solve it. It is terrific to do both. Doing the whole job from start to finish has become possible, because of fast hardware and professionally written software. So we teach both parts. The complete effort now defines Computational Science and Engineering. New departments are springing up with that name. This is really a text for the basic course in that great (and quickly growing) subject of CSE.

Four Simplifications

We all learn by example. One goal in writing this book and teaching this course is to provide specific examples from many areas of engineering and science. The first section of the first chapter starts with four very particular matrices. Those matrices appear over and over in computational science. The underlying model has been made linear, and discrete, and one-dimensional, with constant coefficients. I see those as the great simplifications which make it possible to understand applied mathematics. Let me focus on these four steps:

1. Nonlinear becomes linear
2. Continuous becomes discrete
3. Multidimensional becomes one-dimensional
4. Variable coefficients become constants.

I don't know if "becomes" is the right word. We can't change the reality of nature. But we do begin to understand the real problem by solving a simpler problem. This is illustrated by Einstein and Newton, the two greatest physicists of all time. Einstein's equations of relativity are not linear (and we are still trying to solve them). Newton linearized the geometry of space (and this book works with F = ma). His linear equation came 250 years before Einstein connected e nonlinearly to m.

Those four great simplifications are fundamental to the organization of this book. Chapter 1 includes all four, by working with the special matrices K, T, B, and C. Here are K and C:

Stiffness        [  2 -1  0  0 ]        Circulant        [  2 -1  0 -1 ]
matrix       K = [ -1  2 -1  0 ]        matrix       C = [ -1  2 -1  0 ]
                 [  0 -1  2 -1 ]                         [  0 -1  2 -1 ]
                 [  0  0 -1  2 ]                         [ -1  0 -1  2 ]

This -1, 2, -1 pattern shows constant coefficients in a one-dimensional problem. Being matrices, K and C are already linear and discrete. The difference is in the boundary conditions, which are always crucial. K is "chopped off" at both ends, while C is cyclic or circular or "periodic." (An interval wraps around into a circle, because of -1 in the corners.) The Fourier transform is perfect for C.

Chapter 1 will find K⁻¹, and the triangular factors in K = LU, and the eigenvalues of K and C. Then Chapter 2 can solve equilibrium problems Ku = f (steady state equations) and initial-value problems Mu'' + Ku = f (time-dependent equations). If you get to know this remarkable matrix K, and apply good software when it becomes large (and later multidimensional), you have made a terrific start. K is a positive definite second difference matrix, with beautiful properties.

1. Nonlinear becomes linear

Chapter 2 models a series of important scientific and engineering and economic problems. In each model, the "physical law" is taken to be linear:

(a) Hooke's Law in mechanics: Displacement is proportional to force
(b) Ohm's Law in networks: Current is proportional to voltage difference
(c) Scaling law in economics: Output is proportional to input
(d) Linear regression in statistics: A straight line or a hyperplane can fit the data.
None of those laws is actually true. They are all approximations (no apology for that, false laws can be extremely useful). The truth is that a spring behaves almost linearly until the applied force is very large. Then the spring stretches easily. A resistor is also close to linear, but the highly nonlinear transistor has revolutionized electronics. Economies of scale destroy the linearity of input-output laws (and a price-sales law). We work with linear models as long as we can, but eventually we can't.

That was not a complete list of applications; this book gives more. Biology and medicine are rich in the nonlinearities that make our bodies work. So are engineering and chemistry and materials science, and also financial mathematics. Linearization is the fundamental idea of calculus: a curve is known by its tangent lines. Newton's method solves a nonlinear equation by a series of linear equations. No wonder that I find linear algebra everywhere.

Let me note that "physical nonlinearity" is easier than "geometric nonlinearity." In the bending of a beam, we replace the true but awful curvature formula u''/(1 + (u')²)^(3/2) by a simple u''. That succeeds when u' is small, which is typical for many problems. In other cases we can't linearize. If Boeing had assumed ideal flow and ignored the Navier-Stokes equations, the 777 would never fly.

2. Continuous becomes discrete

Chapter 3 introduces differential equations. The leading example is Laplace's equation ∂²u/∂x² + ∂²u/∂y² = 0, when the magic of complex variables produces a complete family of particular solutions. The solutions come in pairs from (x + iy)^n and r^n e^(inθ). We call the pairs u and s:

u(x, y):   x        x² - y²        u(r, θ):   r cos θ     r² cos 2θ
s(x, y):   y        2xy            s(r, θ):   r sin θ     r² sin 2θ

Laplace's equation shows (in action) the gradient and divergence and curl. But real applications solve a discrete form of the differential equation. That innocent sentence contains two essential tasks: to discretize the continuous equation into Ku = f, and to solve for u. Those steps are at the center of scientific computing, and this book concentrates on two methods for each of them:

Continuous to discrete (Chapter 3)        Solving discrete Ku = f (Chapter 7)
1. The finite element method              1. Direct elimination
2. Finite difference methods              2. Iterations with preconditioning

The matrix K can be very large (and very sparse). A good solution algorithm is usually a better investment than a supercomputer. Multigrid is quite remarkable.

Chapter 6 turns to initial-value problems, first for wave and heat equations (convection + diffusion). Waves allow shocks, diffusion makes the solution smooth. These are at the center of scientific computing. The diffusion equation has become

CHAPTER 1  APPLIED LINEAR ALGEBRA

1.1 FOUR SPECIAL MATRICES

An m by n matrix has m rows and n columns and mn entries. We operate on those rows and columns to solve linear systems Ax = b and eigenvalue problems Ax = λx. From inputs A and b (and from software like MATLAB) we get outputs x and λ. A fast stable algorithm is extremely important, and this book includes fast algorithms.

One purpose of matrices is to store information, but another viewpoint is more important for applied mathematics. Often we see the matrix as an "operator." A acts on vectors x to produce Ax. The components of x have a meaning: displacements or pressures or voltages or prices or concentrations. The operator A also has a meaning: in this chapter A takes differences. Then Ax represents pressure differences or voltage drops or price differentials.
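A tiny MATLAB sketch (not from the text; the vector x is made-up data, say voltages at four nodes) shows a difference matrix acting as an operator:

A = [-1 1 0 0; 0 -1 1 0; 0 0 -1 1];   % each row of A takes a difference of neighboring entries
x = [3; 5; 4; 7];                     % sample values at four nodes
Ax = A*x                              % Ax = (2; -1; 3), the "drops" between neighboring nodes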
Before we turn the problem over to the machine-and also after, when we interpret A\b or eig(A)-it is the meaning we want, as well as the numbers.

This book begins with four special families of matrices-simple and useful, absolutely basic. We look first at the properties of these particular matrices Kn, Cn, Tn, and Bn. (Some properties are obvious, others are hidden.) It is terrific to practice linear algebra by working with genuinely important matrices. Here are K2, K3, K4 in the first family, with -1 and 2 and -1 down the diagonals:

K2 = [  2 -1 ]     K3 = [  2 -1  0 ]     K4 = [  2 -1  0  0 ]
     [ -1  2 ]          [ -1  2 -1 ]          [ -1  2 -1  0 ]
                        [  0 -1  2 ]          [  0 -1  2 -1 ]
                                              [  0  0 -1  2 ]

What is significant about K2 and K3 and K4, and eventually the n by n matrix Kn? I will give six answers in the same order that my class gave them, starting with four properties of the K's that you can see immediately.

1. These matrices are symmetric. The entry in row i, column j also appears in row j, column i. Thus Kij = Kji, on opposite sides of the main diagonal. Symmetry can be expressed by transposing the whole matrix at once: K = K^T.

2. The matrices Kn are sparse. Most of their entries are zero when n gets large. K1000 has a million entries, but only 1000 + 999 + 999 are nonzero.

3. The nonzeros lie in a "band" around the main diagonal, so each Kn is banded. The band has only three diagonals, so these matrices are tridiagonal. Because K is a tridiagonal matrix, Ku = f can be quickly solved. If the unknown vector u has a thousand components, we can find them in a few thousand steps (which take a small fraction of a second). For a full matrix of order n = 1000, solving Ku = f would take hundreds of millions of steps. Of course we have to ask if the linear equations have a solution in the first place. That question is coming soon.

4. The matrices have constant diagonals. Right away that property wakes up Fourier. It signifies that something is not changing when we move in space or time. The problem is shift-invariant or time-invariant. Coefficients are constant. The tridiagonal matrix is entirely determined by the three numbers -1, 2, -1. These are actually "second difference matrices" but my class never says that. The whole world of Fourier transforms is linked to constant-diagonal matrices. In signal processing, the matrix D = K/4 is a "highpass filter." Du picks out the rapidly varying (high frequency) part of a vector u. It gives a convolution with ¼(-1, 2, -1). We use these words to call attention to the Fourier part (Chapter 4) of this book. Mathematicians call K a Toeplitz matrix, and MATLAB uses that name:

The command K = toeplitz([2 -1 zeros(1, 2)]) constructs K4 from row 1.

Actually, Fourier will be happier if we make two small changes in Kn. Insert -1 in the southwest and northeast corners. This completes two diagonals (which circle around). All four diagonals of C4 wrap around in this "periodic matrix" or "cyclic convolution" or circulant matrix:

Circulant matrix     C4 = [  2 -1  0 -1 ]  =  toeplitz([2 -1 0 -1]).
                          [ -1  2 -1  0 ]
                          [  0 -1  2 -1 ]
                          [ -1  0 -1  2 ]

This matrix is singular. It is not invertible. Its determinant is zero. Rather than computing that determinant, it is much better to identify a nonzero vector u that solves C4u = 0. (If C4 had an inverse, the only solution to C4u = 0 would be the zero vector. We could multiply by C4⁻¹ to find u = 0.) For this matrix, the column vector u of all ones (printed as u = (1, 1, 1, 1) with commas) solves C4u = 0. The columns of C add to the zero column. This vector u = ones(4, 1) is in the nullspace of C4.
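A few MATLAB lines confirm these properties (a quick sketch, not part of the original text):

K = toeplitz([2 -1 zeros(1, 2)]);   % K4 from its first row
isequal(K, K')                      % 1: K equals its transpose, so K is symmetric
nnz(K)                              % only 10 of the 16 entries are nonzero (tridiagonal band)
C = toeplitz([2 -1 0 -1]);          % insert -1 in the corners: the circulant C4
C*ones(4, 1)                        % the zero vector: the all-ones u is in the nullspace of C
rank(C)                             % 3, one short of 4, so C4 is singular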
The nullspace contains all solutions to Cu = 0. Whenever the entries along every row of a matrix add to zero, the matrix is certainly singular. The same all-ones vector u is responsible. Matrix multiplication Cu adds the column vectors and produces zero. The constant vector u = (1, 1, 1, 1) or u = (c, c, c, c) in the nullspace is like the constant C when we integrate a function. In calculus, this "arbitrary constant" is not knowable from the derivative. In linear algebra, the constant in u = (c, c, c, c) is not knowable from Cu = 0.

5. All the matrices K = Kn are invertible. They are not singular, like Cn. There is a square matrix K⁻¹ such that K⁻¹K = I = identity matrix. And if a square matrix has an inverse on the left, then also KK⁻¹ = I. This "inverse matrix" is also symmetric when K is symmetric. But K⁻¹ is not sparse. Invertibility is not easy to decide from a quick look at a matrix. Theoretically, one test is to compute the determinant. There is an inverse except when the determinant is zero.

6. The matrices Kn are symmetric positive definite (their pivots and eigenvalues are positive). Since the eigenvalues of C4 are only ≥ 0, C4 is positive semidefinite. The pivots appear on the main diagonal in Section 1.3, when solving Ku = f by elimination. The eigenvalues arise in Kx = λx. There is also a determinant test for positive definiteness (not just det ≠ 0). The proper definition of a symmetric positive definite matrix (it is connected to positive energy) will come in Section 1.6.

Changing Kn to Tn

After Kn and Cn, there are two more families of matrices that you need to know. They are symmetric and tridiagonal like the family Kn. But the (1, 1) entry in Tn is changed from 2 to 1:

Tn(1, 1) = 1     T3 = [  1 -1  0 ]        (1)
                      [ -1  2 -1 ]
                      [  0 -1  2 ]

That top row (T stands for top) represents a new boundary condition, whose meaning we will soon understand. Right now we use T3 as a perfect example of elimination. Row operations produce zeros below the diagonal, and the pivots are circled as they are found. Two elimination steps reduce T to the upper triangular U.

Step 1. Add row 1 to row 2, which leaves zeros below the first pivot.
Step 2. Add the new row 2 to row 3, which produces U.

T = [  1 -1  0 ]        [ 1 -1  0 ]        [ 1 -1  0 ]
    [ -1  2 -1 ]   →    [ 0  1 -1 ]   →    [ 0  1 -1 ]   =   U.
    [  0 -1  2 ]        [ 0 -1  2 ]        [ 0  0  1 ]

All three pivots of T equal 1. We can apply the test for invertibility (three nonzero pivots). T3 also passes the test for positive definiteness (three positive pivots). In fact every Tn in this family is positive definite, with all its pivots equal to 1.

That matrix U has an inverse (which is automatically upper triangular). The exceptional fact for this particular U⁻¹ is that all upper triangular entries are 1's:

U⁻¹ = [ 1 -1  0 ]⁻¹     [ 1  1  1 ]
      [ 0  1 -1 ]    =  [ 0  1  1 ]  =  triu(ones(3)).        (2)
      [ 0  0  1 ]       [ 0  0  1 ]

This says that the inverse of a 3 by 3 "difference matrix" is a 3 by 3 "sum matrix." This neat inverse of U will lead us to the inverse of T in Problem 2. The product U⁻¹U is the identity matrix I. U takes differences, and U⁻¹ takes sums. Taking differences and then sums will recover the original vector (u1, u2, u3).

Changing Tn to Bn

The fourth family Bn has the last entry also changed from 2 to 1. The new boundary condition is being applied at both ends (B stands for both). These matrices Bn are symmetric and tridiagonal, but you will quickly see that they are not invertible. The Bn are positive semidefinite but not positive definite:

Bn(n, n) = 1     B2 = [  1 -1 ]        (3)
                      [ -1  1 ]

Again, elimination brings out the properties of the matrix. The first n-1 pivots will all equal 1, because those rows are not changed from Tn.
But the change from 2 to 1 in the last entry of B produces a change from 1 to 0 in the last entry of U:

B = [  1 -1  0 ]        [ 1 -1  0 ]        [ 1 -1  0 ]
    [ -1  2 -1 ]   →    [ 0  1 -1 ]   →    [ 0  1 -1 ]   =   U.        (4)
    [  0 -1  1 ]        [ 0 -1  1 ]        [ 0  0  0 ]

There are only two pivots. (A pivot must be nonzero.) The last matrix U is certainly not invertible. Its determinant is zero, because its third row is all zeros. The constant vector (1, 1, 1) is in the nullspace of U, and therefore it is in the nullspace of B.

The whole point of elimination was to simplify a linear system like Bu = 0, without changing the solutions. In this case we could have recognized non-invertibility in the matrix B, because each row adds to zero. Then the sum of its three columns is the zero column. This is what we see when B multiplies the vector (1, 1, 1).

Let me summarize this section in four lines (all these matrices are symmetric):

Kn and Tn are invertible and (more than that) positive definite.
Cn and Bn are singular and (more than that) positive semidefinite.
The nullspaces of Cn and Bn contain all the constant vectors u = (c, c, ..., c). Their columns are dependent.
The nullspaces of Kn and Tn contain only the zero vector u = (0, 0, ..., 0). Their columns are independent.

Matrices in MATLAB

It is natural to choose MATLAB for linear algebra, but the reader may select another system. (Octave is very close, and free. Mathematica and Maple are good for symbolic calculation, LAPACK provides excellent codes at no cost in netlib, and there are many other linear algebra packages.) We will construct matrices and operate on them in the convenient language that MATLAB provides. Our first step is to construct the matrices Kn. For n = 3, we can enter the 3 by 3 matrix a row at a time, inside brackets. Rows are separated by a semicolon:

K = [2 -1 0; -1 2 -1; 0 -1 2]

For large matrices this is too slow. We can build K8 from "eye" and "ones":

eye(8) = 8 by 8 identity matrix        ones(7, 1) = column vector of seven 1's

The diagonal part is 2*eye(8). The symbol * means multiplication! The -1's above the diagonal of K8 have the vector -ones(7, 1) along diagonal 1 of the matrix E:

Superdiagonal of -1's        E = -diag(ones(7, 1), 1)

The -1's below the diagonal of K8 lie on the diagonal numbered -1. For those we could change the last argument in E from 1 to -1. Or we can simply transpose E, using the all-important symbol E' for E^T. Then K comes from its three diagonals:

Tridiagonal matrix K8        K = 2*eye(8) + E + E'

Note: The zeroth diagonal (main diagonal) is the default with no second argument, so eye(8) = diag(ones(8,1)). And then diag(eye(8)) = ones(8, 1). The constant diagonals make K a Toeplitz matrix. The toeplitz command produces K, when each diagonal is determined by a single number 2 or -1 or 0. Use the zeros vector for the 6 zeros in the first row of K8:

Symmetric Toeplitz        row1 = [2 -1 zeros(1,6)];  K = toeplitz(row1)

For an unsymmetric constant-diagonal matrix, use toeplitz(col1, row1). Taking col1 = [1 -1 0 0] and row1 = [1 0 0] gives a 4 by 3 backward difference matrix. It has two nonzero diagonals, 1's and -1's.

To construct the matrices T and B and C from K, just change entries as in the last three lines of this M-file that we have named KTBC.m. Its input is the size n, its output is four matrices of that size.
The semicolons suppress display of K, T, B, C:

function [K,T,B,C] = KTBC(n)
% Create the four special matrices assuming n>1
K = toeplitz([2 -1 zeros(1,n-2)]);
T = K; T(1,1) = 1;
B = K; B(1,1) = 1; B(n,n) = 1;
C = K; C(1,n) = -1; C(n,1) = -1;

If we happened to want their determinants (we shouldn't!), then with n = 8

[det(K) det(T) det(B) det(C)]    produces the output    [9 1 0 0]

One more point. MATLAB could not store Kn as a dense matrix for n = 10,000. The 10^8 entries need about 800 megabytes unless we recognize K as sparse. The code sparseKTBC.m on the course website avoids storing (and operating on) all the zeros. It has K, T, B, or C and n as its first two arguments. The third argument is 1 for sparse, 0 for dense (the default is 0 when no third argument is given).

The input to sparse MATLAB includes the locations of all nonzero entries. The command A = sparse(i,j,s,m,n) creates an m by n sparse matrix from the vectors i, j, s that list all positions i, j of nonzero entries s. Elimination by lu(A) may produce additional nonzeros (called fill-in) which the software will correctly identify. In the normal "full" option, zeros are processed like all other numbers. It is best to create the list of triplets i, j, s and then call sparse. Insertions A(i, j) = s or A(i, j) = A(i, j) + s are more expensive. We return to this point in Section 3.6. The sparse KTBC code on the website uses spdiags to enter the three diagonals. Here is the toeplitz way to form K8, all made sparse by its sparse vector start:

vsp = sparse([2 -1 zeros(1, 6)])  % please look at each output
Ksp = toeplitz(vsp)               % sparse format gives the nonzero positions and entries
bsp = Ksp(:, 2)                   % colon keeps all rows of column 2, so bsp = column 2 of Ksp
usp = Ksp\bsp                     % zeros in Ksp and bsp are not processed, solution: usp(2) = 1
uuu = full(usp)                   % return from sparse format to the full uuu = [0 1 0 0 0 0 0 0]

Note  The open source language Python is also very attractive and convenient. The next sections will use all four matrices in these basic tasks of linear algebra:

(1.2) The finite difference matrices K, T, B, C include boundary conditions
(1.3) Elimination produces pivots in D and triangular factors in LDL^T
(1.4) Point loads produce inverse matrices K⁻¹ and T⁻¹
(1.5) The eigenvalues and eigenvectors of K, T, B, C involve sines and cosines.

You will see K\f in 1.2, lu(K) in 1.3, inv(K) in 1.4, eig(K) in 1.5, and chol(K) in 1.6. I very much hope that you will come to know and like these special matrices.

WORKED EXAMPLES

1.1 A  Bu = f and Cu = f might be solvable even though B and C are singular! Show that every vector f = Bu has f1 + f2 + ··· + fn = 0. Physical meaning: the external forces balance. Linear algebra meaning: Bu = f is solvable when f is perpendicular to the all-ones column vector e = (1, 1, 1, 1, ...) = ones(n, 1).

Solution  Bu is a vector of "differences" of u's. Those differences always add to zero:

All terms cancel in (u1 - u2) + (-u1 + 2u2 - u3) + (-u2 + 2u3 - u4) + (-u3 + u4) = 0.

The dot product with e = (1, 1, 1, 1) is that sum f^T e = f1 + f2 + f3 + f4 = 0. Dot product (f' * e in MATLAB). A second explanation for f^T e = 0 starts from the fact that Be = 0. The all-ones vector e is in the nullspace of B. Transposing f = Bu gives f^T = u^T B^T, since the transpose of a product has the individual transposes in reverse order. This matrix B is symmetric so B^T = B. Then f^T e = u^T B e = u^T (0) = 0.

Conclusion  Bu = f is only solvable when f is perpendicular to the all-ones vector e.
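A quick numerical check of 1.1 A (a sketch only; the test vector u is arbitrary random data):

n = 5;
[K,T,B,C] = KTBC(n);    % the function defined above
u = randn(n,1);         % any vector u at all
f = B*u;                % then f is a vector of differences
sum(f)                  % essentially zero (roundoff size): f is perpendicular to e = ones(n,1)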
(The same is true for Cu = f. Again the differences cancel out.) The external forces balance when the f's add to zero. The command B\f will produce Inf because B is square and singular, but the "pseudoinverse" u = pinv(B) * f will succeed. (Or add a zero row to B and f before the command B\f, to make the system rectangular.)

1.1 B  The "fixed-free" matrix H changes the last entry of K from 2 to 1. Connect H to the "free-fixed" T (first entry = 1) by using the reverse identity matrix J:

H = [  2 -1  0 ]   comes from JTJ via the   J = [ 0  0  1 ]
    [ -1  2 -1 ]   reverse identity J           [ 0  1  0 ]
    [  0 -1  1 ]                                [ 1  0  0 ]

Chapter 2 shows how T comes from a tower structure (free at the top). H comes from a hanging structure (free at the bottom). Two MATLAB constructions are

H = toeplitz([2 -1 0]); H(3,3) = 1     or     J = fliplr(eye(3)); H = J*T*J

Solution  JT reverses the rows of T. Then (JT)J reverses the columns to give H:

JT = [  0 -1  2 ]   (rows reversed)     (JT)J = [  2 -1  0 ]   (columns too)
     [ -1  2 -1 ]                               [ -1  2 -1 ]
     [  1 -1  0 ]                               [  0 -1  1 ]  =  H.

We could reverse columns first by TJ. Then J(TJ) would be the same matrix H as (JT)J. The parentheses never matter in (AB)C = A(BC)!

Any permutation matrix like J has the rows of the identity matrix I in some order. There are six 3 by 3 permutation matrices because there are six orders for the numbers 1, 2, 3. The inverse of every permutation matrix is its transpose. This particular J is symmetric, so it has J = J^T = J⁻¹ as you can easily check. With back = 3:-1:1, reordering to JTJ is H = T(back, back) in MATLAB.

Problem Set 1.1

Problems 1-4 are about T⁻¹ and Problems 5-8 are about K⁻¹.

1  The inverses of T3 and T4 (with T11 = 1 in the top corner) are

T3⁻¹ = [ 3  2  1 ]     T4⁻¹ = [ 4  3  2  1 ]
       [ 2  2  1 ]            [ 3  3  2  1 ]
       [ 1  1  1 ]            [ 2  2  2  1 ]
                              [ 1  1  1  1 ]

Guess T5⁻¹ and multiply by T5. Find a simple formula for the entries of T5⁻¹ on and below the diagonal (i ≥ j), and then on and above the diagonal (i ≤ j).

2  Compute T3⁻¹ in three steps, using U and U⁻¹ in equation (2):
1. Check that T3 = U^T U, where U has 1's on the main diagonal and -1's along the diagonal above. Its transpose U^T is lower triangular.
2. Check that UU⁻¹ = I when U⁻¹ has 1's on and above the main diagonal.
3. Invert U^T U to find T3⁻¹ = (U⁻¹)(U⁻¹)^T. Inverses come in reverse order!

3  The difference matrix U = U5 in MATLAB is eye(5) - diag(ones(4,1),1). Construct the sum matrix S from triu(ones(5)). (This keeps the upper triangular part of the 5 by 5 all-ones matrix.) Multiply U * S to verify that S = U⁻¹.

4  For every n, Sn = Un⁻¹ is upper triangular with ones on and above the diagonal. For n = 4 check that SS^T produces the matrix T4⁻¹ predicted in Problem 1. Why is SS^T certain to be a symmetric matrix?

5  The inverses of K3 and K4 (please also invert K2) have the fraction 1/det in front. First guess the determinant of K = K5. Then compute det(K) and inv(K) and det(K)*inv(K) (any software is allowed).

6  (Challenge problem) Find a formula for the i, j entry of K4⁻¹ below the diagonal (i ≥ j). Those entries grow linearly along every row and up every column. (Section 1.4 will come back to these important inverses.) Problem 7 below is developed in the Worked Example of Section 1.4.

7  A column u times a row v^T is a rank-one matrix uv^T. All columns are multiples of u, and all rows are multiples of v^T. T4⁻¹ - K4⁻¹ has rank 1:

T4⁻¹ - K4⁻¹ = (1/5) [ 16 12  8  4 ]  =  (1/5) [ 4 ]
                    [ 12  9  6  3 ]           [ 3 ]  [4 3 2 1]
                    [  8  6  4  2 ]           [ 2 ]
                    [  4  3  2  1 ]           [ 1 ]

Write K3 - T3 in this special form uv^T. Predict a similar formula for T3⁻¹ - K3⁻¹.

8  (a) Based on Problem 7, predict the i, j entry of T5⁻¹ - K5⁻¹ below the diagonal.
(b) Subtract this from your answer to Problem 1 (the formula for T5⁻¹ when i ≥ j). This gives the not-so-simple formula for K5⁻¹.
9  Following Example 1.1 A with C instead of B, show that e = (1, 1, 1, 1) is perpendicular to each column of C4. Solve Cu = f = (1, -1, 1, -1) with the singular matrix C by u = pinv(C) * f. Try u = C\e and C\f, before and after adding a fifth equation 0 = 0.

10  The "hanging matrix" H in Worked Example 1.1 B changes the last entry of K3 to H33 = 1. Find the inverse matrix from H⁻¹ = J T⁻¹ J. Find the inverse also from H = UU^T (check upper times lower triangular!) and H⁻¹ = (U⁻¹)^T U⁻¹.

11  Suppose U is any upper triangular matrix and J is the reverse identity matrix in 1.1 B. Then JU is a "southeast matrix". What geographies are UJ and JUJ? By experiment, a southeast matrix times a northwest matrix is __.

12  Carry out elimination on the 4 by 4 circulant matrix C4 to reach an upper triangular U (or try [L, U] = lu(C) in MATLAB). Two points to notice: The last entry of U is __ because C is singular. The last column of U has new nonzeros. Explain why this "fill-in" happens.

13  By hand, can you factor the circulant C4 (with three nonzero diagonals, allowing wraparound) into circulants L times U (with two nonzero diagonals, allowing wraparound so not truly triangular)?

14  Gradually reduce the diagonal 2, 2, 2 in the matrix K3 until you reach a singular matrix M. This happens when the diagonal entries reach __. Check the determinant as you go, and find a nonzero vector that solves Mu = 0.

Questions 15-21 bring out important facts about matrix multiplication.

15  How many individual multiplications to create Ax and A² and AB?
(A is n by n and x is n by 1;  A² is A times A;  A is m by n and B is n by p, so AB is m by p.)

16  You can multiply Ax by rows (the usual way) or by columns (more important). Do this multiplication both ways:

By rows        [ 2 3 ] [ 1 ]  =  [ inner product using row 1 ]
               [ 4 5 ] [ 2 ]     [ inner product using row 2 ]

By columns     [ 2 3 ] [ 1 ]  =  1 [ 2 ]  +  2 [ 3 ]  =  [ combination ]
               [ 4 5 ] [ 2 ]        [ 4 ]       [ 5 ]     [ of columns  ]

17  The product Ax is a linear combination of the columns of A. The equations Ax = b have a solution vector x exactly when b is a __ of the columns. Give an example in which b is not in the column space of A. There is no solution to Ax = b, because b is not a combination of the columns of A.

18  Compute C = AB by multiplying the matrix A times each column of B. Thus A * B(:,j) = C(:,j).

19  You can also compute AB by multiplying each row of A times B:

[ 2 3 ] [ 1 2 ]  =  [ 2*(row 1) + 3*(row 2) ]  =  [ 8 16 ]
[ 4 5 ] [ 2 4 ]     [ 4*(row 1) + 5*(row 2) ]     [ *  * ]

A solution to Bx = 0 is also a solution to (AB)x = 0. Why? From (AB)x = A(Bx) = A(0) = 0.

20  The four ways to find AB give numbers, columns, rows, and matrices:

1  (rows of A) times (columns of B)        C(i,j) = A(i,:) * B(:,j)
2  A times (columns of B)                  C(:,j) = A * B(:,j)
3  (rows of A) times B                     C(i,:) = A(i,:) * B
4  (columns of A) times (rows of B)        for k = 1:n, C = C + A(:,k) * B(k,:); end

Finish these 8 multiplications for columns times rows. How many for n by n?

21  Which one of these equations is true for all n by n matrices A and B?

AB = BA        (AB)A = A(BA)        (AB)B = B(BA)        (AB)² = A²B².

22  Use n = 1000; e = ones(n,1); K = spdiags([-e, 2*e, -e], -1:1, n, n); to enter K1000 as a sparse matrix. Solve the sparse equation Ku = e by u = K\e. Plot the solution by plot(u).

23  Create 4-component vectors u, v, w and enter A = spdiags([u, v, w], -1:1, 4, 4). Which components of u and w are left out from the -1 and 1 diagonals of A?

24  Build the sparse identity matrix I = sparse(i, j, s, 100, 100) by creating vectors i, j, s of positions i, j with nonzero entries s. (You could use a for loop.) In this case speye(100) is quicker.
Notice that sparse(eye(10000)) would be a disaster, since there isn't room to store eye(10000) before making it sparse.

25  The only solution to Ku = 0 or Tu = 0 is u = 0, so K and T are invertible. For proof, suppose u_i is the largest component of u. If -u_{i-1} + 2u_i - u_{i+1} is zero, this forces u_{i-1} = u_i = u_{i+1}. Then the next equations force every u_j = u_i. At the end, when the boundary is reached, -u_{n-1} + 2u_n only gives zero if u = 0. Why does this "diagonally dominant" argument fail for B and C?

26  For which vectors v is toeplitz(v) a circulant matrix (cyclic diagonals)?

27  (Important) Show that the 3 by 3 matrix K comes from A0^T A0:

A0 = [  1  0  0 ]
     [ -1  1  0 ]    is a "difference matrix"
     [  0 -1  1 ]
     [  0  0 -1 ]

Which row of A0 would you remove to produce A1 with T = A1^T A1? Which row would you remove next to produce A2 with B = A2^T A2? The difference matrices A0, A1, A2 build in 2, 1, 0 boundary conditions. So do the "second differences" K, T, and B.

1.2 DIFFERENCES, DERIVATIVES, BOUNDARY CONDITIONS

This important section connects difference equations to differential equations. A typical row in our matrices has the entries -1, 2, -1. We want to see how those numbers are producing a second difference (or more exactly, minus a second difference). The second difference gives a natural approximation to the second derivative. The matrices Kn and Cn and Tn and Bn are all involved in approximating the equation

-d²u/dx² = f(x)    with boundary conditions at x = 0 and x = 1.        (1)

Notice that the variable is x and not t. This is a boundary-value problem and not an initial-value problem. There are boundary conditions at x = 0 and x = 1, not initial conditions at t = 0. Those conditions are reflected in the first and last rows of the matrix. They decide whether we have Kn or Cn or Tn or Bn.

We will go from first differences to second differences. All four matrices have the special form A^T A (matrix times transpose). Those matrices A^T and A produce first differences, and A^T A produces second differences. So this section has two parts:

I. Differences replace derivatives (and we estimate the error).
II. We solve -d²u/dx² = 1 and then -Δ²u/(Δx)² = 1 using the matrices K and T.

Part I: Finite Differences

How can we approximate du/dx, the slope of a function u(x)? The function might be known, like u(x) = x². The function might be unknown, inside a differential equation. We are allowed to use values u(x) and u(x + h) and u(x - h), but the stepsize h = Δx is fixed. We have to work with Δu/Δx without taking the limit as Δx → 0. So we have "finite differences" where calculus has derivatives.

Three different possibilities for Δu are basic and useful. We can choose a forward difference or a backward difference or a centered difference. Calculus textbooks typically take Δu = u(x + Δx) - u(x), going forward to x + Δx. I will use u(x) = x² to test the accuracy of all three differences. The derivative of x² is 2x, and that forward difference Δ₊ is usually not the best! Here are Δ₊, Δ₋, and Δ₀:

Forward difference     (u(x+h) - u(x)) / h        The test gives  ((x+h)² - x²) / h = 2x + h
Backward difference    (u(x) - u(x-h)) / h        The test gives  (x² - (x-h)²) / h = 2x - h
Centered difference    (u(x+h) - u(x-h)) / 2h     The test gives  ((x+h)² - (x-h)²) / 2h = 2x

For u = x², the centered difference is the winner. It gives the exact derivative 2x, while forward and backward miss by h. Notice the division by 2h (not h).
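The same test is easy to run in MATLAB (a small sketch, not from the text; the point x and stepsize h are sample choices):

u = @(x) x.^2;                          % the test function used above
x = 1;  h = 0.1;                        % any point and any stepsize
forward  = (u(x+h) - u(x)) / h          % gives 2x + h = 2.1
backward = (u(x) - u(x-h)) / h          % gives 2x - h = 1.9
centered = (u(x+h) - u(x-h)) / (2*h)    % gives exactly 2x = 2.0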
Centered is generally more accurate than one-sided, when h = Δx is small. The reason is in the Taylor series approximation of u(x + h) and u(x - h). These first few terms are always the key to understanding the accuracy of finite differences:

Forward     u(x + h) = u(x) + hu'(x) + ½h²u''(x) + ⅙h³u'''(x) + ···        (2)
Backward    u(x - h) = u(x) - hu'(x) + ½h²u''(x) - ⅙h³u'''(x) + ···        (3)

Subtract u(x) from each side and divide by h. The forward difference is first order accurate because the leading error ½hu''(x) involves the first power of h:

One-sided is first order        (u(x + h) - u(x)) / h = u'(x) + ½hu''(x) + ···        (4)

The backward difference is also first order accurate, and its leading error is -½hu''(x). For u(x) = x², when u''(x) = 2 and u''' = 0, the error ½hu'' is exactly h.

For the centered difference, subtract (3) from (2). Then u(x) cancels and also ½h²u''(x) cancels (this gives extra accuracy). Dividing by 2h leaves an h² error:

Centered is second order        (u(x + h) - u(x - h)) / 2h = u'(x) + ⅙h²u'''(x) + ···        (5)

The centered error is O(h²) where the one-sided errors were O(h), a significant change. If h = 1/10 we are comparing a 1% error to a 10% error. The matrix for centered differences is antisymmetric (like the first derivative):

Centered difference matrix   Δ₀^T = -Δ₀     [ -1  0  1  0 ] [ u_{i-1} ]     [ u_{i+1} - u_{i-1} ]
                                            [  0 -1  0  1 ] [ u_i     ]  =  [ u_{i+2} - u_i     ]
                                                            [ u_{i+1} ]
                                                            [ u_{i+2} ]

Transposing Δ₀ reverses -1 and 1. The transpose of the forward difference matrix Δ₊ would be -(backward difference) = -Δ₋. Centered difference quotients Δ₀u/2h are the average of forward and backward (Figure 1.1 shows u(x) = x³).

Figure 1.1: Δ₊/h and Δ₋/h and Δ₀/2h approximate u' = 3x² = 3 by 7, 1, 4 at x = 1. With step h = 1 and u = x³: Δ₊u = 8 - 1 = 7(h), Δ₋u = 1 - 0 = 1(h), Δ₀u = 8 - 0 = 4(2h); notice 4 = ½(1 + 7). The second difference Δ²u = 8 - 2(1) + 0 is exactly u'' = 6x = 6 with step h = 1.

Second Differences from First Differences

We can propose a basic problem in scientific computing. Find a finite difference approximation to this linear second order differential equation:

-d²u/dx² = f(x)    with the boundary conditions u(0) = 0 and u(1) = 0.        (6)

The derivative of the derivative is the second derivative. In symbols d/dx(du/dx) is d²u/dx². It is natural that the first difference of the first difference should be the second difference. Watch how a second difference Δ₋Δ₊u is centered around point i:

Difference of difference    Δ₋(Δ₊u_i) = (u_{i+1} - u_i) - (u_i - u_{i-1}) = u_{i+1} - 2u_i + u_{i-1}        (7)

Those numbers 1, -2, 1 appear on the middle rows of our matrices K and T and B and C (with signs reversed). The denominator is h² = (Δx)² under this second difference. Notice the right positions for the superscripts 2, before u and after x:

Second difference        d²u/dx² ≈ Δ²u/(Δx)² = (u(x + Δx) - 2u(x) + u(x - Δx)) / (Δx)²        (8)

What is the accuracy of this approximation? For u(x + h) we use equation (2), and for u(x - h) we use (3). The terms with h and h³ cancel out:

Δ²u(x) = u(x + h) - 2u(x) + u(x - h) = h²u''(x) + ch⁴u''''(x) + ···        (9)

Dividing by h², Δ²u/(Δx)² has second order accuracy (error ch²u''''). We get that extra order because Δ² is centered. The important tests are on u(x) = x² and u(x) = x³. The second difference divided by (Δx)² gives the correct second derivative:

Perfection for u = x²        ((x + h)² - 2x² + (x - h)²) / h² = 2.        (10)

An equation d²u/dx² = constant and its difference approximation Δ²u/(Δx)² = constant will have the same solutions. Unless the boundary conditions get in the way...

The Important Multiplications

You will like what comes now. The second difference matrix (with those diagonals 1, -2, 1) will multiply the most important vectors I can think of. To avoid any problems at boundaries, I am only looking now at the internal rows, which are the same for K, T, B, C. These multiplications are a beautiful key to the whole chapter.

Δ²(Squares) = 2·(Ones)        Δ²(Ramp) = Delta        Δ²(Sines) = λ·(Sines)        (11)
Unless the boundary conditions get in the way... The Important Multiplications You will like what comes now. The second difference matrix (with those diagonals 1, -2, 1) will multiply the most important vectors I can think of. To avoid any problems at boundaries, I am only looking now at the internal rows-which are the same for K, T, B, C. These multiplications are a beautiful key to the whole chapter. .6.2 (Squares) =2·(0nes) .6.2 (Ramp) =Delta .6.2 (Sines) =.X • (Sines) (11) 16 Chapter 1 Applied Linear Algebra Here are column vectors whose second differences are special: Constant Linear Squares (1, 1, ... , 1) (1, 2, ... , n) (1 2 ,22 , ... ,n2 ) ones(n,1) (1:n)' (in MATLAB notation) (1:n)'."2 Delta at k Step at k Ramp at k (0, 0, 1, 0, ... , 0) (0, 0, 1, 1, ... , 1) (0, 0, 0, 1, ... , n - k) [zeros(k-1,1) ; 1 ; zeros(n-k,1)] [zeros(k-1,1) ; ones(n-k+l,1)] [zeros(k-1,1) ; 0:(n-k)'] Sines Cosines Exponentials (sin t, ... , sin nt) (cost, ... ,cosnt) (eit ' ... ' eint) sin((l:n)'*t) cos( (1:n)'*t) exp((l:n)'*i*t) Now come the multiplications in each group. The second difference of each vector is analogous (and sometimes equal!) to a second derivative. I. For constant and linear vectors, the second differences are zero: .6.2 (constant) .6.2 (linear) .Jrn [~] [·.i -2 1 1 -2 (12) [~] [ .i -2 1 1 -2 ..Jm For squares, the second differences are constant (the second derivative of x 2 is 2). This is truly important: Matrix multiplication confirms equation (8) . .6.2 (squares) (13) Then Ku= ones for u = -(squares)/2. Below come boundary conditions. II. Second differences of the ramp vector produce the delta vector: (14) Section 1.4 will solve Ku= 8 with boundary conditions included. You will see how each position of the "1" in delta produces a column u in K-1 or r-1. For functions: The second derivative of a ramp max(x, 0) is a delta function. 1.2 Differences, Derivatives, Boundary Conditions 17 III. Second differences of the sine and cosine and exponential produce 2 cost - 2 times those vectors. (Second derivatives of sin xt and cos xt and eixt produce -t2 times the functions.) In Section 1.5, sines or cosines or exponentials will be eigenvectors of K, T, B, C with the right boundary conditions. a 2 (sines) a 2 (cosines) l l -2 1 ['; 1 -2 [ .J l l ·; -2 1 J ~! 1 -2 [ ssiinn2t t sin3t ssiinn2t t =(2cost-2) [ sin 3t (15) sin4t sin4t ccooss2t t [ cos3t = (2 cost - 2) [ c~~o:st (16) cos4t cos4t a'(exponent;als) [. ·; -2 1 1 -2 l [ l l e2itit 1 e3~t ee2itit =(2cost-2) [ e3it (17) . . . e4it e4it The eigenvalue 2 cost - 2 is easiest to see for the exponential in (17). It is exactly eit - 2 + e-it, which factors out in the matrix multiplication. Then (16) and (15), cosine and sine, are the real and imaginary parts of (17). Soon twill be 0. Part 11: Finite Difference Equations We have an approximation /:). 2uj(!:).x)2 to the second derivative d2u/dx2. So we can quickly create a discrete form of -d2u/dx2 = f(x). Divide the interval [O, 1] into equal pieces of length h = /:).x. If that meshlength is h = n~l, then n + 1 short subintervals will meet at x = h, x = 2h, ... , x = nh. The extreme endpoints are x = 0 and x = (n + l)h = 1. The goal is to compute approximations u1, ... , Un to the true values u( h), ... , u(nh) at those n meshpoints inside the [O, 1] interval. [I] Ulliillowns u- U2 0 u(x) Uo and Un+l known from boundary conditions 0 h 2h nh (n+l)h = 1 Figure 1.2: The discrete unknowns u1, ... , Un approximate the true u(h), ... , u(nh). 
Certainly -d2/dx2 is replaced by our -1, 2, -1 matrix, divided by h2 and with the minus sign built in. What to do on the right side? The source term f(x) might be a smooth distributed load or a concentrated point load. If f (x) is smooth as in sin 21rx, the first possibility is to use its values /i at the meshpoints x = ifl.x. 18 Chapter 1 Applied Linear Algebra Finite difference equation (18) The first equation (i = 1) involves u0. The last equation (i = n) involves Un+1• The boundary conditions given at x = 0 and x = 1 will determine what to do. We now solve the key examples with fixed ends u(0) = u0 = 0 and u(l) = Un+1 = 0. = Example 1 Solve the differential and difference equations with constant force J(x) 1: - d2u dx2 = 1 with u(0) = 0 (fixed end) and u(l) = 0 (19) + -U·+l i = 2U· - U· I i i- 1 with u0 = 0 and Un+I = 0 (20) h2 Solution For every linear equation, the complete solution has two parts. One "particular solution" is added to any solution with zero on the right side (no force): Complete solution = + Ucomplete Uparticular Unullspace • (21) This is where linearity is so valuable. Every solution to Lu = 0 can be added to one particular solution of Lu = f. Then by linearity L(Upart + Unun) = f + 0. Particular solution - d2u dx2 = 1 is solved by Upart(x) = -½x2 Nullspace solution - d2u dx2 = 0 is solved by Unun(x) = Cx + D. The complete solution is u(x) = -½x2 + Cx + D. The boundary conditions will tell us the constants C and Din the nullspace part. Substitute x = 0 and x = 1: Boundary condition at x = 0 u(0) = 0 gives D = 0 Boundary condition at x = 1 u(l) = 0 gives C = ½ Solution In Figure 1.3, the finite difference solution agrees with this u(x) at the meshpoints. This is special: (19) and (20) have the same solution (a parabola). A second difference of ui = i2h2 gives exactly the correct second derivative of u = x2. The second difference of a linear ui = ih matches the second derivative (zero) of u = x: (i+l)h-2ih+(i-l)h_ h d2 ( )-0 h2 - 0 mate es dx2 x - . (22) The combination of quadratic i2h2 and linear ih (particular and nullspace solutions) is exactly right. It solves the equation and satisfies the boundary conditions. We can 1.2 Differences, Derivatives, Boundary Conditions 19 4 32 3 32 u(x) = ½(x - x 2) Ui = ½(ih - i2h2) d2u Solution to - dx2 = 1 b,,.2u Solution to - b.x2 = ones 0 h X 2h 3h 4h = 1 Figure 1.3: Finite differences give an exact match of u(x) and ui for the special case -u" = 1 with uo = Un+1 = 0. The discrete values lie right on the parabola. set x = ih in the true solution u(x) = ½(x - x2) to find the correct ui. 2 2 Finite difference solution Ui = 1(i.h - i·2h2) has Un+1 = 1(1 - 12) = 0. It is unusual to have this perfect agreement between ui and the exact u(ih). It is also unusual that no matrices were displayed. When 4h = 1 and/= 1, the matrix is k K 3/h2 = 16K3 . Then ih = ¼,¾,¾leads to ui = ~' 3~, Ku=f l [ l [ l 16 [ -12 -12 -1o 34//3322 = 11 . (23) 0 -1 2 3/32 1 The -1 's in columns 0 and 4 were safely chopped off because u 0 = u4 = 0. A Different Boundary Condition Chapter 2 will give a host of physical examples leading to these differential and difference equations. Right now we stay focused on the boundary condition at x = 0, changing from zero height to zero slope: d2u . du - dx2 = f(x) with dx (0) = 0 (free end) and u(l) = 0. (24) With this new boundary condition, the difference equation no longer wants u 0 = 0. Instead we could set the first difference to zero: u 1 - u0 = 0 means zero slope in the first small interval. 
With u0 = u 1, the second difference -u0 + 2u1 - u2 in row 1 reduces to u1 - u2 . The new boundary condition changes Kn to Tn. Example 2 Solve the differential and difference equations starting from zero slope: Free-fixed d2u du - - = 1 with dx(0) = 0 and u(l) = 0 (25) dx2 + -U·'+1 h22U'· - U•· -1 = 1 with U1 -h Uo = 0 and Un+I = 0 (26) 20 Chapter 1 Applied Linear Algebra Continuous solution u(O) = -& Discrete solution = = u0 u1 166 5 16 Free end x f-----+--+--+-----tt-¼ Fixed end 0 h 2h 3h 4h = 1 Figure 1.4: ui is below the true u(x) = ½(1 - x2 ) by an error ½h(l - x). Solution u(x) = -½x2 + Cx +Dis still the complete solution to -u" = 1. But the new boundary condition changes the constants to C = 0 and D = ½: :~ = 0 at x = 0 gives C = 0. Then u = 0 at x = 1 gives D = ~- The free-fixed solution is u(x) = ½(1 - x 2 ). Figure 1.4 shows this parabola. Example 1 was a symmetric parabola, but now the discrete boundary condition u1 = u0 is not exactly satisfied by u(x). So the finite difference u/s show small errors. We expect an O(h) correction because of the forward difference (u1 - u0 )/h. For n = 3 and h = ¼, gives Figure 1.4 shows that solution (not very accurate, with three meshpoints). The discrete points will lie much nearer the parabola for large n. The error is h(l - x)/2. For completeness we can go ahead to solve Tnu = h2 ones(n, 1) for every n: 1 0 Tn = (backward)(-forward) = [ -1 1 0 -1 The inverses of those first difference matrices are sum matrices (triangles of l's). The inverse of Tn is an upper triangle times a lower triangle: 1.2 Differences, Derivatives, Boundary Conditions 21 For n = 3 we recover 1 + 2 + 3 = 6 and 2 + 3 = 5, which appeared in (27). There is a formula for those sums in (30) and it gives the approximation U( Discrete solution (31) This answer has Un+I = 0 when i = n+ 1. And u0 = u1 so the boundary conditions are satisfied. That starting value u0 = ½nh is below the correct value u(0) = ½= ½(n+ 1)h only by ½h. This ½his the first-order error caused by replacing the zero slope at x = 0 by the one-sided condition u1 = ua. The worked example removes that O(h) error by centering the boundary condition. MATLAB Experiment The function u(x) = cos(1rx/2) satisfies free-fixed boundary conditions u'(0) = 0 and u(l) = 0. It solves the equation -u" = f = (1r/2)2cos(1rx/2). How close to u are the solutions U and V of the finite difference equations TnU = f and Tn+l V = g? h = 1/(n+l); u = cos(pi*(l:n) '*h/2); c = (pi/2)1'2; f = C*U; % Usual matrix T U = h*h*T\f; % Solution u1, ... , Un with one-sided condition u0 = u1 e = 1 - U(l) % First-order error at x = 0 g = [c/2;f]; T = ... ; % Create Tn+I as in equation (34) below. Note g(l) = /(0)/2 V = h*h*T\g; % Solution u0 , . .. , Un with centered condition u_ 1 = u1 E = 1 - V(l) % Second-order error from centering at x = 0 Take n = 3, 7, 15 and test T\f with appropriate T and f. Somehow the right mesh has (n + ½)h = 1, so the boundary point with u' = 0 is halfway between meshpoints. You should find e proportional to h and E proportional to h2. A big difference. ci WORKED EXAMPLES a 1.2 A Is there a way to avoid this O(h) error from the one-sided boundary condition u1 = u0? Constructing a more accurate difference equation is a perfect example of numerical analysis. This crucial decision comes between the modeling step (by a differential equation) and the computing step (solving the discrete equation). Solution The natural idea is a centered difference (u1 - u_1)/2h = 0. This copies the true u'(0) = 0 with second order accuracy. 
WORKED EXAMPLES

1.2 A  Is there a way to avoid this O(h) error from the one-sided boundary condition u1 = u0? Constructing a more accurate difference equation is a perfect example of numerical analysis. This crucial decision comes between the modeling step (by a differential equation) and the computing step (solving the discrete equation).

Solution  The natural idea is a centered difference (u1 - u_{-1})/2h = 0. This copies the true u'(0) = 0 with second-order accuracy. It introduces a new unknown u_{-1}, so extend the difference equation to x = 0. Eliminating u_{-1} leaves a matrix T of size n + 1, with first row [1 -1 0 ...] acting on (u0, u1, ..., un). Centering the boundary condition multiplies f(0) by ½. Try n = 3 and h = ¼:

[1 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2][u0; u1; u2; u3] = h²[1/2; 1; 1; 1]  gives  u = (1/16)(8, 7.5, 6.0, 3.5).    (33)

Those numbers u_i are exactly equal to the true u(x) = ½(1 - x²) at the nodes. We are back to perfect agreement with the parabola in Figure 1.4. For a varying load f(x) and a non-parabolic solution to -u'' = f(x), the centered discrete equation will have second-order errors O(h²). Problem 21 shows a very direct approach to u0 - u1 = ½h²f(0).

1.2 B  When we multiply matrices, the backward Δ₋ times the forward Δ₊ gives 1 and -2 and 1 on the interior rows:

Δ₋Δ₊ = [1 0 0; -1 1 0; 0 -1 1][-1 1 0; 0 -1 1; 0 0 -1] = [-1 1 0; 1 -2 1; 0 1 -2].    (34)

We didn't get K3, for two reasons. First, the signs are still reversed. And the first corner entry is -1 instead of -2. The boundary rows give us T3, because Δ₋(Δ₊u) sets to zero the first value Δ₊u = (u1 - u0)/h (not the value of u itself!).

-T3 = [-1 1 0;   <- boundary row with u0 = u1
        1 -2 1;   <- typical row u2 - 2u1 + u0
        0 1 -2]   <- boundary row with u4 = 0    (35)

The boundary condition at the top is zero slope. The second difference u2 - 2u1 + u0 becomes u2 - u1 when u0 = u1. We will come back to this, because in my experience 99% of the difficulties with differential equations occur at the boundary.

u(0) = 0, u(1) = 0      K has u0 = u_{n+1} = 0
u'(0) = 0, u'(1) = 0    B has u0 = u1, un = u_{n+1}
u'(0) = 0, u(1) = 0     T has u0 = u1, u_{n+1} = 0
u(0) = u(1), u'(0) = u'(1)   C has u0 = un, u1 = u_{n+1}

An infinite tridiagonal matrix, with no boundary, maintains 1, -2, 1 down its infinitely long diagonals. Chopping off the infinite matrix would be the same as pretending that u0 and u_{n+1} are both zero. That leaves Kn, which has 2's in the corners.

Problem Set 1.2

1  What are the second derivative u''(x) and the second difference Δ²U_n? Use δ(x).
u(x) = Ax if x ≤ 0, Bx if x ≥ 0;   U_n = An if n ≤ 0, Bn if n ≥ 0.
u(x) and U are piecewise linear with a corner at 0.

2  Solve the differential equation -u''(x) = δ(x) with u(-2) = 0 and u(3) = 0. The pieces u = A(x + 2) and u = B(x - 3) meet at x = 0. Show that the vector U = (u(-1), u(0), u(1), u(2)) solves the corresponding matrix problem KU = F = (0, 1, 0, 0).

Problems 3-12 are about the "local accuracy" of finite differences.

3  The h² term in the error for a centered difference (u(x + h) - u(x - h))/2h is (1/6)h²u'''(x). Test by computing that difference for u(x) = x³ and x⁴.

4  Verify that the inverse of the backward difference matrix Δ₋ in (28) is the sum matrix in (29). But the centered difference matrix Δ₀ = (Δ₊ + Δ₋)/2 might not be invertible! Solve Δ₀u = 0 for n = 3 and n = 5.

5  In the Taylor series (2), find the number a in the next term a h⁴ u''''(x) by testing u(x) = x⁴ at x = 0.

6  For u(x) = x⁴, compute the second derivative and second difference Δ²u/(Δx)². From the answers, predict c in the leading error in equation (9).

7  Four samples of u can give fourth-order accuracy for du/dx at the center:
(-u2 + 8u1 - 8u_{-1} + u_{-2})/12h = du/dx + b h⁴ d⁵u/dx⁵ + ···
1. Check that this is correct for u = 1 and u = x² and u = x⁴.
2. Expand u2, u1, u_{-1}, u_{-2} as in equation (2). Combine the four Taylor series to discover the coefficient b in the h⁴ leading error term.

8  Question  Why didn't I square the centered difference for a good Δ²?
Answer  A centered difference of a centered difference stretches too far:

Δ₀(Δ₀u) = (u_{n+2} - 2u_n + u_{n-2})/(2h)².

The second difference matrix now has 1, 0, -2, 0, 1 on a typical row. The accuracy is no better and we have trouble with u_{n+2} at the boundaries. Can you construct a fourth-order accurate centered difference for d²u/dx², choosing the right coefficients to multiply u2, u1, u0, u_{-1}, u_{-2}?

9  Show that the fourth difference Δ⁴u/(Δx)⁴ with coefficients 1, -4, 6, -4, 1 approximates d⁴u/dx⁴ by testing on u = x, x², x³, and x⁴:
(u2 - 4u1 + 6u0 - 4u_{-1} + u_{-2})/(Δx)⁴ = d⁴u/dx⁴ + (which leading error?).

10  Multiply the first difference matrices in the order Δ₊Δ₋, instead of Δ₋Δ₊ in equation (27). Which boundary row, first or last, corresponds to the boundary condition u = 0? Where is the approximation to u' = 0?

11  Suppose we want a one-sided approximation to du/dx with second order accuracy:
[r u(x) + s u(x - Δx) + t u(x - 2Δx)]/Δx = du/dx + O((Δx)²) for u = 1, x, x².
Substitute u = 1, x, x² to find and solve three equations for r, s, t. The corresponding difference matrix will be lower triangular. The formula is "causal."

12  Equation (7) shows the "first difference of the first difference." Why is the left side within O(h²) of (1/h)[u'_{i+½} - u'_{i-½}]? Why is this within O(h²) of u''_i?

Problems 13-19 solve differential equations to test global accuracy.

13  Graph the free-fixed solution u0, ..., u8 with n = 7 in Figure 1.4, in place of the existing graph with n = 3. You can use formula (30) or solve the 7 by 7 system. The O(h) error should be cut in half, from h = ¼ to ⅛.

14  (a) Solve -u'' = 12x² with free-fixed conditions u'(0) = 0 and u(1) = 0. The complete solution involves integrating f(x) = 12x² twice, plus Cx + D.
(b) With h = 1/(n + 1) and n = 3, 7, 15, compute the discrete u1, ..., un using Tn:
(-u_{i+1} + 2u_i - u_{i-1})/h² = 12(ih)²  with  u0 = u1  and  u_{n+1} = 0.
Compare u_i with the exact answer at the center point x = ih = ½. Is the error proportional to h or h²?

15  Plot u = cos 4πx for 0 ≤ x ≤ 1 and the discrete values u_i = cos 4πih at the meshpoints x = ih = i/(n + 1). For small n those values will not catch the oscillations of cos 4πx. How large is a good n? How many meshpoints per oscillation?

16  Solve -u'' = cos 4πx with fixed-fixed conditions u(0) = u(1) = 0. Use K4 and K8 to compute u1, ..., un and plot on the same graph with u(x):
(-u_{i+1} + 2u_i - u_{i-1})/h² = cos 4πih  with  u0 = u_{n+1} = 0.

17  Test the differences Δ₀u = u_{i+1} - u_{i-1} and Δ²u = u_{i+1} - 2u_i + u_{i-1} on u(x) = e^{ax}. Factor out e^{ax} (this is why exponentials are so useful). Expand e^{aΔx} = 1 + aΔx + (aΔx)²/2 + ··· to find the leading error terms.

18  Write a finite difference approximation (using K) with n = 4 unknowns to -d²u/dx² = x with boundary conditions u(0) = 0 and u(1) = 0. Solve for u1, u2, u3, u4. Compare them to the true solution.

19  Construct a centered difference approximation using K/h² and Δ₀/2h to
-d²u/dx² + du/dx = 1  with  u(0) = 0  and  u(1) = 0.
Separately use a forward difference Δ₊U/h for du/dx. Notice Δ₀ = (Δ₊ + Δ₋)/2. Solve for the centered u and uncentered U with h = 1/5. The true u(x) is the particular solution u = x plus any A + Be^x. Which A and B satisfy the boundary conditions? How close are u and U to u(x)?

20  The transpose of the centered difference Δ₀ is -Δ₀ (antisymmetric).
That is like the minus sign in integration by parts, when f(x)g(x) drops to zero at ±∞:

Integration by parts   ∫_{-∞}^{∞} f(x)(dg/dx) dx = -∫_{-∞}^{∞} (df/dx) g(x) dx.

Show the discrete analog  Σ f_i (Δ₀g)_i = -Σ (Δ₀f)_i g_i.  Hint: Change i + 1 to i in f_i g_{i+1}, and change i - 1 to i in f_i g_{i-1}.

21  Use the expansion u(h) = u(0) + hu'(0) + ½h²u''(0) + ··· with zero slope u'(0) = 0 and -u'' = f(x) to derive the top boundary equation u0 - u1 = ½h²f(0). This factor ½ removes the O(h) error from Figure 1.4: good.

1.3 ELIMINATION LEADS TO K = LDLᵀ

This book has two themes—how to understand equations, and how to solve them. This section is about solving a system of n linear equations Ku = f. Our method will be Gaussian elimination (not determinants and not Cramer's Rule!). All software packages use elimination on positive definite systems of all sizes. MATLAB uses u = K\f (known as backslash), and [L, U] = lu(K) for triangular factors of K.

The symmetric factorization K = LDLᵀ takes two extra steps beyond the solution itself. First, elimination factors K into LU: lower triangular L times upper triangular U. Second, the symmetry of K leads to U = DLᵀ. The steps from K to U and back to K are by lower triangular matrices—rows operating on lower rows. K = LU and K = LDLᵀ are the right "matrix ways" to understand elimination. The pivots go into D. This is the most frequently used algorithm in scientific computing (billions of dollars per year) so it belongs in this book. If you met LU and LDLᵀ earlier in a linear algebra course, I hope you find this a good review.

Our first example is the 3 by 3 matrix K = K3. It contains the nine coefficients (two of them are zero) in the linear equations Ku = f. The vector on the right side is not so important now, and we choose f = (4, 0, 0).

Ku = f    2u1 - u2 = 4,   -u1 + 2u2 - u3 = 0,   -u2 + 2u3 = 0.    (1)

The first step is to eliminate u1 from the second equation. Multiply equation 1 by ½ and add to equation 2. The new matrix has a zero in the 2,1 position—where u1 is eliminated. The first two pivots are 2 and 3/2:

Row 2 + ½(row 1)    2u1 - u2 = 4,   (3/2)u2 - u3 = 2,   -u2 + 2u3 = 0.

The next step looks at the 2 by 2 system in the last two equations. The pivot is d2 = 3/2. To eliminate u2 from the third equation, add 2/3 of the second equation. Then the matrix has a zero in the 3,2 position. It is now the upper triangular U. The three pivots 2, 3/2, 4/3 are on its diagonal:

2u1 - u2 = 4,   (3/2)u2 - u3 = 2,   (4/3)u3 = 4/3.

Forward elimination is complete. Note that all pivots and multipliers were decided by K, not f. The right side changed from f = (4, 0, 0) into the new c = (4, 2, 4/3), and back substitution can begin. Triangular systems are quick to solve (n² operations).

Solution by back substitution. The last equation gives u3 = 1. Substituting into the second equation gives (3/2)u2 - 1 = 2. Therefore u2 = 2. Substituting into the first equation gives 2u1 - 2 = 4. Therefore u1 = 3 and the system is solved. The solution vector is u = (3, 2, 1).

When we multiply the columns of K by those three numbers, they add up to the vector f. I always think of a matrix-vector multiplication Ku as a combination of the columns of K. Please look:

Combine columns for Ku   3[2; -1; 0] + 2[-1; 2; -1] + 1[0; -1; 2] = [4; 0; 0].

That sum is f = (4, 0, 0). Solving a system Ku = f is exactly the same as finding a combination of the columns of K that produces the vector f. This is important. The solution u expresses f as the "right combination" of the columns (with coefficients 3, 2, 1).
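As a quick check of this elimination example, backslash reproduces u = (3, 2, 1) and the columns of K recombine into f (a minimal MATLAB sketch; the variable names are mine):

K = [2 -1 0; -1 2 -1; 0 -1 2];   % K3 from equation (1)
f = [4; 0; 0];
u = K\f                                       % elimination + back substitution: u = (3, 2, 1)
K(:,1)*u(1) + K(:,2)*u(2) + K(:,3)*u(3)       % the combination of columns recovers f = (4, 0, 0)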
For a singular matrix there might be no combination that produces f, or there might be infinitely many combinations. Our matrix K is invertible.

When we divide Ku = (4, 0, 0) by 4, the right side becomes (1, 0, 0), which is the first column of I. So we are looking at the first column of KK⁻¹ = I. We must be seeing the first column of K⁻¹. After dividing the previous u = (3, 2, 1) by 4, the first column of K⁻¹ must be (3/4, 1/2, 1/4):

Column 1 of inverse   [2 -1 0; -1 2 -1; 0 -1 2][3/4; 2/4; 1/4] = [1; 0; 0].    (3)

If we really want K⁻¹, its columns come from Ku = columns of I. So K⁻¹ = K\I.

Note about the multipliers: When we know the pivot in row j, and we know the entry to be eliminated in row i, the multiplier ℓij is their ratio:

Multiplier   ℓij = (entry to eliminate, in row i)/(pivot, in row j).    (4)

The convention is to subtract (not add) ℓij times one equation from another equation. The multiplier ℓ21 at our first step was -½ (the ratio of -1 to 2). That step added ½ of row 1 to row 2, which is the same as subtracting -½(row 1) from row 2.

Subtract ℓij times the pivot row j from row i. Then the i, j entry is 0.

The 3,1 entry in the lower left corner of K was already zero. So there was nothing to eliminate and that multiplier was ℓ31 = 0. The last multiplier was ℓ32 = -2/3.

Elimination Produces K = LU

Now put those multipliers ℓ21, ℓ31, ℓ32 into a lower triangular matrix L, with ones on the diagonal. L records the steps of elimination by storing the multipliers. The upper triangular U records the final result, and here is the connection K = LU:

K = LU   [2 -1 0; -1 2 -1; 0 -1 2] = [1 0 0; -1/2 1 0; 0 -2/3 1][2 -1 0; 0 3/2 -1; 0 0 4/3].    (5)

The short and important and beautiful statement of Gaussian elimination is that K = LU. Please multiply those two matrices L and U. The lower triangular matrix L times the upper triangular matrix U recovers the original matrix K. I think of it this way: L reverses the elimination steps. This takes U back to K. LU is the "matrix form" of elimination and we have to emphasize it.

Suppose forward elimination uses the multipliers in L to change the rows of K into the rows of U (upper triangular). Then K is factored into L times U.

Elimination is a two-step process, going forward (down) and then backward (up). Forward uses L, backward uses U. Forward elimination reached a new right side c. (The elimination steps are really multiplying by L⁻¹ to solve Lc = f.) Back substitution on Uu = c leads to the solution u. Then c = L⁻¹f and u = U⁻¹c combine into u = U⁻¹L⁻¹f, which is the correct u = K⁻¹f.

Go back to the example and check that Lc = f produces the right vector c:

Lc = f   gives c1 = 4, c2 = 2, c3 = 4/3, exactly the right side that forward elimination produced.

By keeping the right side up to date in elimination, we were solving Lc = f. Forward elimination changed f into c. Then back substitution quickly finds u = (3, 2, 1).

You might notice that we are not using "inverse matrices" anywhere in this computation. The inverse of K is not needed. Good software for linear algebra (the LAPACK library is in the public domain) separates Gaussian elimination into a factoring step that works on K, and a solving step that works on f:

Step 1. Factor K into LU        [L, U] = lu(K) in MATLAB
Step 2. Solve Ku = f for u      Lc = f forward for c, then Uu = c backward

The first step factors K into triangular matrices L times U. The solution step computes c (forward elimination) and then u (back substitution). MATLAB should almost never be asked for an inverse matrix. Use the backslash command K\f to compute u, and not the inverse command inv(K)*f:

Steps 1 + 2   Solve Ku = f by u = K\f   (Backslash notices symmetry).
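A minimal MATLAB sketch of that Factor/Solve split (my variable names; for this K no row exchanges occur, so lu returns truly triangular factors):

K = [2 -1 0; -1 2 -1; 0 -1 2]; f = [4; 0; 0];
[L, U] = lu(K);      % factor once (the expensive part)
c = L\f;             % forward elimination:  c = (4, 2, 4/3)
u = U\c              % back substitution:    u = (3, 2, 1)
f2 = [0; 0; 4]; u2 = U\(L\f2)    % a second right side reuses the same factors

The point of the split is exactly this reuse: the factors serve every new right side without repeating the elimination on K.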
The reason for two subroutines in LAPACK is to avoid repeating the same steps on K when there is a new vector f*. It is quite common (and desirable) to have several right sides with the same K. Then we Factor only once; it is the expensive part. The quick subroutine Solve finds the solutions u, u*, ... without computing K⁻¹. For multiple f's, put them in the columns of a matrix F. Then use K\F.

Singular Systems

Back substitution is fast because U is triangular. It generally fails if a zero appears in the pivot. Forward elimination also fails, because a zero entry can't remove a nonzero entry below it. The official definition requires that pivots are never zero. If we meet a zero in the pivot position, we can exchange rows—hoping to move a nonzero entry up into the pivot. An invertible matrix has a full set of pivots. When the column has all zeros in the pivot position and below, this is our signal that the matrix is singular. It has no inverse. An example is C3.

Example 1  Add -1's in the corners to get the circulant C3. The first pivot is d1 = 2 with multipliers ℓ21 = ℓ31 = -½. The second pivot is d2 = 3/2. But there is no third pivot:

C3 = [2 -1 -1; -1 2 -1; -1 -1 2]  eliminates to  [2 -1 -1; 0 3/2 -3/2; 0 -3/2 3/2]  and then  U = [2 -1 -1; 0 3/2 -3/2; 0 0 0].

In the language of linear algebra, the rows of C are linearly dependent. Elimination found a combination of those rows (it was their sum) that produced the last row of all zeros in U. With only two pivots, C is singular.

Example 2  Suppose a zero appears in the second pivot position but there is a nonzero below it. Then a row exchange produces the second pivot and elimination can continue. That matrix is not singular, even with the zero appearing in the 2,2 position. Exchange rows 2 and 3, and exchange rows on the right side of the equations too! The pivots become all ones, and elimination succeeds. The original matrix is invertible but not positive definite. (Its determinant is minus the product of the pivots, so -1, because of the row exchange.)

The exercises show how a permutation matrix P carries out this row exchange. The triangular L and U are now the factors of PA (so that PA = LU). The original A had no LU factorization, even though it was invertible. After the row exchange, PA has its rows in the right order for LU. We summarize the three possibilities:

Elimination on an n by n matrix A may or may not require row exchanges:
No row exchanges to get n pivots: A is invertible and A = LU.
Row exchanges by P to get n pivots: A is invertible and PA = LU.
No way to find n pivots: A is singular. There is no inverse matrix A⁻¹.

Positive definite matrices are recognized by the fact that they are symmetric and they need no row exchanges and all pivots are positive. We are still waiting for the meaning of this property—elimination gives a way to test for it.

Symmetry Converts K = LU to K = LDLᵀ

The factorization K = LU comes directly from elimination—which produces U by the multipliers in L. This is extremely valuable, but something good was lost. The original K was symmetric, but L and U are not symmetric:

Symmetry is lost   K = [2 -1 0; -1 2 -1; 0 -1 2] = [1 0 0; -1/2 1 0; 0 -2/3 1][2 -1 0; 0 3/2 -1; 0 0 4/3] = LU.

The lower factor L has ones on the diagonal. The upper factor U has the pivots. This is unsymmetric, but the symmetry is easy to recover.
Just separate the pivots into a diagonal matrix D, by dividing the rows of U by the pivots 2, 3/2, and 4/3:

Symmetry is recovered   K = [1 0 0; -1/2 1 0; 0 -2/3 1][2 0 0; 0 3/2 0; 0 0 4/3][1 -1/2 0; 0 1 -2/3; 0 0 1] = LDLᵀ.    (6)

Now we have it. The pivot matrix D is in the middle. The matrix on the left is still L. The matrix on the right is the transpose of L:

The symmetric factorization of a symmetric matrix is K = LDLᵀ.

This triple factorization preserves the symmetry. That is important and needs to be highlighted. It applies to LDLᵀ and to every other "symmetric product" AᵀCA. The product LDLᵀ is automatically a symmetric matrix, if D is diagonal. More than that, AᵀCA is automatically symmetric if C is symmetric. The factor A is not necessarily square and C is not necessarily diagonal.

The reason for symmetry comes directly from matrix multiplication. The transpose of any product AB is equal to BᵀAᵀ. The individual transposes come in the opposite order, and that is just what we want:

The transpose of LDLᵀ is (Lᵀ)ᵀDᵀLᵀ.  This is LDLᵀ again.

(Lᵀ)ᵀ is the same as L. Also Dᵀ = D (diagonal matrices are symmetric). The displayed line says that the transpose of LDLᵀ is LDLᵀ. This is symmetry. The same reasoning applies to AᵀCA. Its transpose is AᵀCᵀ(Aᵀ)ᵀ. If C is symmetric (C = Cᵀ), then this is AᵀCA again. Notice the special case when the matrix in the middle is the identity matrix C = I:

For any rectangular matrix A, the product AᵀA is square and symmetric.

We will meet these products AᵀA and AᵀCA many times. By assuming a little more about A and C, the product will be not only symmetric but positive definite.

The Determinant of Kn

Elimination on K begins with the three pivots 2 and 3/2 and 4/3. This pattern continues. The ith pivot is (i + 1)/i. The last pivot is (n + 1)/n. The product of the pivots is the determinant, and cancelling fractions produces the answer n + 1:

Determinant of Kn   det Kn = (2/1)(3/2)(4/3)···((n + 1)/n) = n + 1.    (7)

The reason is that determinants always multiply: (det K) = (det L)(det U). The triangular matrix L has 1's on its diagonal, so det L = 1. The triangular matrix U has the pivots on its diagonal, so det U = product of pivots = n + 1. The LU factorization not only solves Ku = f, it is the quick way to compute the determinant. There is a similar pattern for the multipliers that appear in elimination:

Multipliers   ℓ21 = -1/2,  ℓ32 = -2/3,  ...,  ℓn,n-1 = -(n - 1)/n.    (8)

All other multipliers are zero. This is the crucial fact about elimination on a tridiagonal matrix, that L and U are bidiagonal. If a row of K starts with p zeros (no elimination needed there), then that row of L also starts with p zeros. If a column of K starts with q zeros, then that column of U starts with q zeros. Zeros inside the band can unfortunately be "filled in" by elimination. This leads to the fundamental problem of reordering the rows and columns to make the p's and q's as large as possible. For our tridiagonal matrices the ordering is already perfect.

You may not need a proof that the pivots (i + 1)/i and the multipliers -(i - 1)/i are correct. For completeness, here is row i of L multiplying columns i - 1, i, and i + 1 of U:

[-(i-1)/i  1] [i/(i-1)  -1   0;  0  (i+1)/i  -1] = [-1  2  -1] = row i of K.

The Thomas algorithm in Example 4 will solve tridiagonal systems in 8n steps.
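One way to see K = LDLᵀ numerically (a small MATLAB sketch with my own variable names; D is pulled out of lu's upper factor):

K = [2 -1 0; -1 2 -1; 0 -1 2];
[L, U] = lu(K);          % no row exchanges are needed for this K
D = diag(diag(U));       % pivots 2, 3/2, 4/3 on the diagonal
Lt = D\U;                % dividing the rows of U by the pivots leaves L'
norm(K - L*D*Lt)         % zero up to roundoff: K = L*D*L'
prod(diag(D))            % product of pivots = det(K) = n + 1 = 4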
Positive Pivots and Positive Determinants

I will repeat one note about positive definiteness (the matrix must be symmetric to start with). It is positive definite if all n pivots are positive. We need n nonzero pivots for invertibility, and we need n positive pivots (without row exchanges) for positive definiteness. For 2 by 2 matrices [a b; b c], the first pivot is a. The only multiplier is ℓ21 = b/a. Subtracting b/a times row 1 from row 2 puts the number c - (b²/a) into the second pivot. This is the same as (ac - b²)/a. Please notice L and Lᵀ in K = LDLᵀ:

2 by 2 factors   [a b; b c] = [1 0; b/a 1][a 0; 0 (ac - b²)/a][1 b/a; 0 1].    (9)

These pivots are positive when a > 0 and ac - b² > 0. This is the 2 by 2 test:

[a b; b c] is positive definite if and only if a > 0 and ac - b² > 0.

Out of four examples, the last three fail the test:

[2 3; 3 8] pos def    [2 4; 4 8] pos semidef    [2 6; 6 8] indef    [-2 -3; -3 -8] neg def

The matrix with b = 4 is singular (pivot missing) and positive semidefinite. The matrix with b = 6 has ac - b² = -20. That matrix is indefinite (pivots +2 and -10). The last matrix has a < 0. It is negative definite even though its determinant is positive.

Example 3  K3 shows the key link between pivots and upper left determinants:

[2 -1 0; -1 2 -1; 0 -1 2] = [1 0 0; -1/2 1 0; 0 -2/3 1][2 -1 0; 0 3/2 -1; 0 0 4/3].

The upper left determinants of K are 2, 3, 4. The pivots are their ratios 2, 3/2, 4/3. All upper left determinants are positive exactly when all pivots are positive.

Operation Counts

The factors L and U are bidiagonal when K = LU is tridiagonal. Then the work of elimination is proportional to n (a few operations per row). This is very different from the number of additions plus multiplications to factor a full matrix. The leading terms are (1/3)n³ with symmetry and (2/3)n³ in general. For n = 1000, we are comparing thousands of operations (quick) against hundreds of millions.

Between those extremes (tridiagonal versus full) are band matrices. There might be w nonzero diagonals above and also below the main diagonal. Each row operation needs a division for the multiplier, and w multiplications and additions. With w entries below each pivot, that makes 2w² + w to clear out each column. With n columns the overall count grows like 2w²n, still only linear in n.

On the right side vector f, forward elimination and back substitution use w multiply-adds per row, plus one division. A full matrix needs n² multiply-adds on the right side, [(n-1) + (n-2) + ··· + 1] forward and [1 + 2 + ··· + (n-1)] backward. This is still much less than the (2/3)n³ total operations on the left side. Here is a table:

Operation count (multiplies + adds)       Full        Banded        Tridiagonal
Factor: find L and U                      (2/3)n³     2w²n + wn     3n
Solve: forward and back on f              2n²         4wn + n       5n

Example 4  The Thomas algorithm solves tridiagonal Au = f in 8n floating-point operations. A has b1, ..., bn on the diagonal with a2, ..., an below and c1, ..., c_{n-1} above. Exchange equations i and i + 1 if |b_i| < |a_{i+1}| at step i. With no exchanges:

for i from 1 to n - 1
    c_i <- c_i/b_i
    f_i <- f_i/b_i
    b_{i+1} <- b_{i+1} - a_{i+1} c_i
    f_{i+1} <- f_{i+1} - a_{i+1} f_i
end forward loop

u_n <- f_n/b_n
for i from n - 1 down to 1
    u_i <- f_i - c_i u_{i+1}
end backward loop

Example 5  Test the commands [L, U, P] = lu(A) and P*A - L*U on three matrices A1, A2, A3. For A1, the permutation matrix P exchanges which rows? Always PA = LU. For A2, MATLAB exchanges rows to achieve the largest pivots column by column. For A3, which is not positive definite, rows are still exchanged: P ≠ I and U ≠ DLᵀ.
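A direct MATLAB transcription of the Thomas algorithm in Example 4 (a sketch assuming no row exchanges are needed; the function name and signature are mine):

function u = thomas(a, b, c, f)
% Solve a tridiagonal system: b(1..n) on the diagonal, a(2..n) below, c(1..n-1) above.
n = length(b);
for i = 1:n-1                       % forward loop
    c(i) = c(i)/b(i);  f(i) = f(i)/b(i);
    b(i+1) = b(i+1) - a(i+1)*c(i);
    f(i+1) = f(i+1) - a(i+1)*f(i);
end
u = zeros(n,1);
u(n) = f(n)/b(n);
for i = n-1:-1:1                    % backward loop
    u(i) = f(i) - c(i)*u(i+1);
end

Called as thomas([0 -1 -1], [2 2 2], [-1 -1], [4 0 0]) it reproduces u = (3, 2, 1) for K3 and f = (4, 0, 0); the entry a(1) is never used.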
Problem Set 1.3

1  Extend equation (5) into a 4 by 4 factorization K4 = L4 D4 L4ᵀ. What is the determinant of K4?

2  1. Find the inverses of the 3 by 3 matrices L and D and Lᵀ in equation (5).
   2. Write a formula for the ith pivot of K.
   3. Check that the i,j entry of L4⁻¹ is j/i (on and below the diagonal) by multiplying L4 L4⁻¹ or L4⁻¹ L4.

3  1. Enter the matrix K5 by the MATLAB command toeplitz([2 -1 0 0 0]).
   2. Compute the determinant and the inverse by det(K) and inv(K). For a neater answer compute the determinant times the inverse.
   3. Find the L, D, U factors of K5 and verify that the i,j entry of L⁻¹ is j/i.

4  The vector of pivots for K4 is d = (2/1, 3/2, 4/3, 5/4). This is d = (2:5)./(1:4), using MATLAB's counting vector i:j = (i, i+1, ..., j). The extra dot makes the division act a component at a time. Find ℓ in the MATLAB expression for L = eye(4) - diag(ℓ, -1) and multiply L*diag(d)*L' to recover K4.

5  If A has pivots 2, 7, 6 with no row exchanges, what are the pivots for the upper left 2 by 2 submatrix B (without row 3 and column 3)? Explain why.

6  How many entries can you choose freely in a 5 by 5 symmetric matrix K? How many can you choose in a 5 by 5 diagonal matrix D and lower triangular L (with ones on its diagonal)?

7  Suppose A is rectangular (m by n) and C is symmetric (m by m).
   1. Transpose AᵀCA to show its symmetry. What shape is this matrix?
   2. Show why AᵀA has no negative numbers on its diagonal.

8  Factor these symmetric matrices into A = LDLᵀ with the pivots in D:

A = [1 3; 3 2]   and   A = [1 b; b c]   and   A = [2 -1 0; -1 2 -1; 0 -1 2]

9  The Cholesky command A = chol(K) produces an upper triangular A with K = AᵀA. The square roots of the pivots from D are now included on the diagonal of A (so Cholesky fails unless K = Kᵀ and the pivots are positive). Try the chol command on K3, T3, B3, and B3 + eps*eye(3).

10  The all-ones matrix ones(4) is positive semidefinite. Find all its pivots (zero not allowed). Find its determinant and try eig(ones(4)). Factor it into a 4 by 1 matrix L times a 1 by 4 matrix Lᵀ.

11  The matrix K = ones(4) + eye(4)/100 has all 1's off the diagonal, and 1.01 down the main diagonal. Is it positive definite? Find the pivots by lu(K) and eigenvalues by eig(K). Also find its LDLᵀ factorization and inv(K).

12  The matrix K = pascal(4) contains the numbers from the Pascal triangle (tilted to fit symmetrically into K). Multiply its pivots to find its determinant. Factor K into LLᵀ where the lower triangular L also contains the Pascal triangle!

13  The Fibonacci matrix [1 1; 1 0] is indefinite. Find its pivots. Factor it into LDLᵀ. Multiply (1, 0) by this matrix 5 times, to see the first 6 Fibonacci numbers.

14  If A = LU, solve by hand the equation Ax = f without ever finding A itself. Solve Lc = f and then Ux = c (then LUx = Lc is the desired equation Ax = f). Lc = f is forward elimination and Ux = c is back substitution.

15  From the multiplication LS show that L = [1 0 0; ℓ21 1 0; ℓ31 0 1] is the inverse of S = [1 0 0; -ℓ21 1 0; -ℓ31 0 1]. S subtracts multiples of row 1 from lower rows. L adds them back.

16  Unlike the previous exercise, which eliminated only one column, show that L = [1 0 0; ℓ21 1 0; ℓ31 ℓ32 1] is not the inverse of S = [1 0 0; -ℓ21 1 0; -ℓ31 -ℓ32 1]. Write L as L1 L2 to find the correct inverse L⁻¹ = L2⁻¹ L1⁻¹ (notice the order):

L1 = [1 0 0; ℓ21 1 0; ℓ31 0 1]   and   L2 = [1 0 0; 0 1 0; 0 ℓ32 1].

17  By trial and error, find examples of 2 by 2 matrices such that
   1. LU ≠ UL
   2. A² = -I, with real entries in A
   3. B² = 0, with no zeros in B
   4. CD = -DC, not allowing CD = 0

18  Write down a 3 by 3 matrix with row 1 - 2(row 2) + row 3 = 0 and find a similar dependence of the columns—a combination of columns that gives zero.

19  Draw these equations in their row form (two intersecting lines) and find the solution (x, y). Then draw their column form by adding two vectors.

20  True or false: Every matrix A can be factored into a lower triangular L times an upper triangular U, with nonzero diagonals.
Find L and U when possible. When is A = LU?

1.4 INVERSES AND DELTA FUNCTIONS

We are comparing matrix equations with differential equations. One is Ku = f, the other is -u'' = f(x). The solutions are vectors u and functions u(x). This comparison is quite remarkable when special vectors f and functions f(x) are the forcing terms on the right side. With a uniform load f(x) = constant, both solutions are parabolas (Section 1.2). Now comes the opposite choice with f = point load:

In the matrix equation, take f = δj = jth column of the identity matrix.
In the differential equation, take f(x) = δ(x - a) = delta function at x = a.

The delta function may be partly or completely new to you. It is zero except at one point. The function δ(x - a) represents a "spike" or a "point load" or an "impulse" concentrated at the single point x = a. The solution u(x) or u(x, a), where a gives the placement of the load, is the Green's function. When we know the Green's function for all point loads δ(x - a), we can solve -u'' = f(x) for any load f(x).

In the matrix equation Ku = δj, the right side is column j of I. The solution is u = column j of K⁻¹. We are solving KK⁻¹ = I, column by column. So we are finding the inverse matrix, which is the "discrete Green's function." Like x and a, the discrete (K⁻¹)ij locates the solution at point i from a load at point j.

The amazing fact is that the entries of K⁻¹ and T⁻¹ fall exactly on the solutions u(x) to the continuous problems. The figures show this, and so will the text.

Figure 1.5: Middle columns of h K5⁻¹ and h T5⁻¹ lie on solutions to -u'' = δ(x - ½). Fixed-fixed (u(0) = u(1) = 0): the slope drops from ½ to -½. Free-fixed (u'(0) = 0, u(1) = 0): the slope drops from 0 to -1.

Concentrated Load

Figure 1.5 shows the form of u(x), when the load is at the halfway point x = ½. Away from this load, our equation is u'' = 0 and its solution is u = straight line. The problem is to match the two lines (before and after ½) with the point load.

Example 1  Solve -u'' = point load with fixed-fixed and free-fixed endpoints:

-d²u/dx² = f(x) = δ(x - ½)  with  {fixed: u(0) = 0 or free: u'(0) = 0}  and  fixed: u(1) = 0.

Solution  In the fixed-fixed problem, the up and down lines must start and end at u = 0. At the load point x = ½, the function u(x) is continuous and the lines meet. The slope drops by 1 because the delta function has "area = 1". To see the drop in slope, integrate both sides of -u'' = δ across x = ½:

-∫ (d²u/dx²) dx = ∫ δ(x - ½) dx  across the load  gives  -(du/dx)_right + (du/dx)_left = 1.    (1)

The fixed-fixed case has u'_left = ½ and u'_right = -½. The free-fixed case has u'_left = 0 and u'_right = -1. In every case the slope drops by 1 at the unit load.

These solutions u(x) are ramp functions, with a corner. In the rest of the section we move the load to x = a and compute the new ramps. (The fixed-fixed ramp will have slopes 1 - a and -a, always dropping by 1.) And we will find discrete ramps for the columns of the inverse matrices K⁻¹ and T⁻¹. The entries will increase linearly, up to the diagonal of K⁻¹, then go linearly down to the end of the column. It is remarkable to be finding exact solutions and exact inverses. We take this chance to do it. These problems are exceptionally simple and important, so why not?

Example 2  Move the point load to x = a. Everywhere else u'' = 0, so the solution is u = Ax + B up to the load. It changes to u = Cx + D beyond that point. Four equations (two at the boundaries, two at x = a) determine those constants A, B, C, D:

Boundary conditions:   fixed u(0) = 0: B = 0.   fixed u(1) = 0: C + D = 0.
Jump/no-jump conditions at x = a:   No jump in u: Aa + B = Ca + D.   Drop by 1 in u': A = C + 1.

Substitute B = 0 and D = -C into the first equation on the right:

Aa + 0 = Ca - C  and  A = C + 1  give slopes  A = 1 - a  and  C = -a.    (2)

Then D = -C = a produces the solution in Figure 1.6. The ramp is u = (1 - a)x going up and u = a(1 - x) going down. On the right we show a column of K⁻¹, computed in equation (10): linear up and down.

Figure 1.6: Response at x to a load at a = 5/7 (fixed-fixed boundary). For the matrix K6⁻¹, the entries in each column go linearly up and down like the true u(x).

Delta Function and Green's Function

We solve -u'' = δ(x - a) again, by a slightly different method (same answer). A particular solution is a ramp. Then we add all solutions Cx + D to u'' = 0. Example 2 used the boundary conditions first, and now we use them last.

You must recognize that δ(x) and δ(x - a) are not true functions! They are zero except at one point x = 0 or x = a where the function is "infinite"—too vague. The spike is "infinitely tall and infinitesimally thin." One definition is to say that the integral of δ(x) is the unit step function S(x) in Figure 1.7. "The area is 1 under the spike at x = 0." No true function could achieve that, but δ(x) is extremely useful.

The standard ramp function is R = 0 up to the corner at x = 0 and then R = x. Its slope dR/dx is a step function. Its second derivative is d²R/dx² = δ(x).

Figure 1.7: The integral of the delta function is the step function, so δ(x) = dS/dx. The integral of the step S(x) is the ramp R(x), so δ(x) = d²R/dx².

Now shift the three graphs by a. The shifted ramp R(x - a) is 0 then x - a. This has first derivative S(x - a) and second derivative δ(x - a). In words, the first derivative jumps by 1 at x = a so the second derivative is a delta function. Since our equation -d²u/dx² = δ(x - a) has a minus sign, we want the slope to drop by 1. The descending ramp -R(x - a) is a particular solution to -u'' = δ(x - a).

Main point  We have u'' = 0 except at x = a. So u is a straight line on the left and right of a. The slope of this ramp drops by 1 at a, as required by -u'' = δ(x - a). The downward ramp -R(x - a) is one particular solution, and we can add Cx + D. The two constants C and D came from two integrations. The complete solution (particular + nullspace) is a family of ramps:

Complete solution   -d²u/dx² = δ(x - a)  is solved by  u(x) = -R(x - a) + Cx + D.    (3)

The constants C and D are determined by the boundary conditions. u(0) = -R(0 - a) + C·0 + D = 0. Therefore D must be zero. From u(1) = 0 we learn that the other constant (in Cx) is C = 1 - a:

u(1) = -R(1 - a) + C·1 + D = a - 1 + C = 0.  Therefore C = 1 - a.

So the ramp increases with slope 1 - a until x = a. Then it decreases to u(1) = 0. When we substitute R = 0 followed by R = x - a, we find the two parts:

FIXED ENDS   u(x) = -R(x - a) + (1 - a)x = {(1 - a)x for x ≤ a;  (1 - x)a for x ≥ a}    (4)

The slope of u(x) starts at 1 - a and drops by 1 to -a. This unit drop in the slope means a delta function for -d²u/dx², as required. The first part (1 - a)x gives u(0) = 0, the second part (1 - x)a gives u(1) = 0. Please notice the symmetry between x and a in the two parts! Those are like i and j in the symmetric matrix (K⁻¹)ij = (K⁻¹)ji. The response at x to a load at a equals the response at a to a load at x. It is the "Green's function."

Free-Fixed Boundary Conditions

When the end x = 0 becomes free, that boundary condition changes to u'(0) = 0. This leads to C = 0 in the complete solution u(x) = -R(x - a) + Cx + D:

Set x = 0:   u'(0) = 0 + C + 0.  Therefore C must be zero.

Then u(1) = 0 yields the other constant D = 1 - a:

Set x = 1:   u(1) = -R(1 - a) + D = a - 1 + D = 0.  Therefore D = 1 - a.

The solution is a constant D (zero slope) up to the load at x = a. Then the slope drops to -1 (descending ramp). The two-part formula for u(x) is

FREE-FIXED   u(x) = {1 - a for x ≤ a;  1 - x for x ≥ a}    (5)

Free-Free: There is no solution for f = δ(x - a) when both ends are free. If we require u'(0) = 0 and also u'(1) = 0, those give conditions on C and D that cannot be met. A ramp can't have zero slope on both sides (and support the load). In the same way, the matrix B is singular and BB⁻¹ = I has no solution.

The free-free problem does have a solution when ∫ f(x) dx = 0. The example in Problem 7 is f(x) = δ(x - ⅓) - δ(x - ⅔). The problem is still singular and it has infinitely many solutions (any constant can be added to u(x), without changing u'(0) = 0 and u'(1) = 0).

Integrating -u'' = f(x) from 0 to 1 gives this requirement ∫ f(x) dx = 0. The integral of -u'' is u'(0) - u'(1), and free-free boundary conditions make that zero. In the matrix case, add the n equations Bu = f to get 0 = f1 + ··· + fn as the test.

Discrete Vectors: Load and Step and Ramp

Those solutions u(x) in (4) and (5) are the Green's functions G(x, a) for fixed ends and free-fixed ends. They will correspond to K⁻¹ and T⁻¹ in the matrix equations. The matrices have second differences in place of second derivatives. It is a pleasure to see how difference equations imitate differential equations. The crucial equation becomes Δ²R = δ. This copies R''(x) = δ(x):

The delta vector δ has one nonzero component δ0 = 1:   δ = (..., 0, 0, 1, 0, 0, ...)
The step vector S has components S_i = 0 or 1:          S = (..., 0, 0, 1, 1, 1, ...)
The ramp vector R has components R_i = 0 or i:          R = (..., 0, 0, 0, 1, 2, ...)

These vectors are all centered at i = 0. Notice that Δ₋S = δ but Δ₊R = S. We need a backward Δ₋ and a forward Δ₊ to get a centered second difference Δ² = Δ₋Δ₊. Then Δ²R = Δ₋S = δ. Matrix multiplication shows this clearly:

Δ²(ramp)   R_{i+1} - 2R_i + R_{i-1} = δ_i.    (6)

The ramp vector R is piecewise linear. At the center point, the second difference jumps to R1 - 2R0 + R_{-1} = 1. At all other points (where the delta vector is zero) the ramp solves Δ²R = 0. Thus Δ²R = δ copies R''(x) = δ(x).

Figure 1.8: The delta vector δ and step vector S and ramp vector R. The key relations are δ = Δ₋S (backward) and S = Δ₊R (forward) and δ = Δ²R (centered).

The solutions to d²u/dx² = 0 are linear functions Cx + D. The solutions to Δ²u = 0 are "linear vectors" with u_i = Ci + D. The equation u_{i+1} - 2u_i + u_{i-1} = 0 is satisfied by constant vectors and linear vectors, since (i + 1) - 2i + (i - 1) = 0. The complete solution to Δ²u = δ is u_particular + u_nullspace. Thus u_i = R_i + Ci + D.

I want to emphasize that this is unusually perfect. The discrete R_i + Ci + D is an exact copy of the continuous solution u(x) = R(x) + Cx + D. We can solve Δ²u = δ by sampling the ramp u(x) at equally spaced points, without any error.

The Discrete Equations Ku = δj and Tu = δj

In the differential equation, the point load and step and ramp moved to x = a. In the difference equation, the load moves to component j. The right side δj has components δ_{i-j}, zero except when i = j. Then the shifted step and shifted ramp have components S_{i-j} and R_{i-j}, also centered at j. The fixed-ends difference equation from -u''(x) = δ(x - a) is now -Δ²u = δj:

-Δ²u_i = -u_{i+1} + 2u_i - u_{i-1} = {1 if i = j; 0 if i ≠ j}   with u0 = 0 and u_{n+1} = 0.    (7)

The left side is exactly the matrix-vector multiplication Kn u. The minus sign in -Δ² changes the rows 1, -2, 1 to their positive definite form -1, 2, -1. On the right side, the shifted delta vector is the jth column of the identity matrix. When the load is at meshpoint j = 2, the equation is column 2 of KK⁻¹ = I:

n = 4, j = 2   [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2][u1; u2; u3; u4] = [0; 1; 0; 0].    (8)

When the right sides are the four columns of I (with j = 1, 2, 3, 4) the solutions are the four columns of K4⁻¹. This inverse matrix is the discrete Green's function.

What is the solution vector u? A particular solution is the descending ramp -R_{i-j}, shifted by j and sign-reversed. The complete solution includes Ci + D, which solves Δ²u = 0. Thus u_i = -R_{i-j} + Ci + D. The constants C and D are determined by the two boundary conditions u0 = 0 and u_{n+1} = 0:

u0 = -R_{0-j} + C·0 + D = 0.  Therefore D must be zero.    (9)
u_{n+1} = -R_{n+1-j} + C(n + 1) + 0 = 0.  Therefore C = (n + 1 - j)/(n + 1) = 1 - j/(n + 1).    (10)

Those results are parallel to D = 0 and C = 1 - a in the differential equation. The tilted ramp u = -R + Ci in Figure 1.9 increases linearly from u0 = 0. Its peak is at the position j of the point load, and the ramp descends linearly to u_{n+1} = 0:

FIXED ENDS   u_i = -R_{i-j} + Ci = {(n + 1 - j)i/(n + 1) for i ≤ j;  (n + 1 - i)j/(n + 1) for i ≥ j}    (11)

Those are the entries of Kn⁻¹ (asked for in earlier problem sets). Above the diagonal, for i ≤ j, the ramp is zero and u_i = Ci. Below the diagonal, we can just exchange i and j, since we know that Kn⁻¹ is symmetric. These formulas for the vector u are exactly parallel to (1 - a)x and (1 - x)a in equation (4) for the fixed-ends continuous problem.

Figure 1.9 shows a typical case with n = 4 and the load at j = 2. The formulas in (11) give u = (3/5, 6/5, 4/5, 2/5). The numbers go linearly up to 6/5 (on the main diagonal of K4⁻¹). Then 4/5 and 2/5 go linearly back down. The matrix equation (8) shows that this vector u should be the second column of K4⁻¹, and it is.

Figure 1.9: K4 u = δ2 has the point load in position j = 2 (f = column 2 of I). The equation is column 2 of K4 K4⁻¹ = I. The solution u is column 2 of K4⁻¹ = (1/5)[4 3 2 1; 3 6 4 2; 2 4 6 3; 1 2 3 4].

The free-fixed discrete equation Tu = f can also have f = δj = point load at j:

Discrete   -Δ²u_i = δ_{i-j}   with u1 - u0 = 0 (zero slope) and u_{n+1} = 0.    (12)

The solution is still a ramp u_i = -R_{i-j} + Ci + D with corner at j. But the constants C and D have new values because of the new boundary condition u1 = u0:

u1 - u0 = 0 + C + 0 = 0   so the first constant is C = 0    (13)
u_{n+1} = -R_{n+1-j} + D = 0   so the second constant is D = n + 1 - j.    (14)

Those are completely analogous to C = 0 and D = 1 - a in the continuous problem above. The solution equals D up to the point load at position j. Then the ramp descends to reach u_{n+1} = 0 at the other boundary. The two-part formula -R_{i-j} + D, before and after the point load at i = j, is

FREE-FIXED   u_i = -R_{i-j} + (n + 1 - j) = {n + 1 - j for i ≤ j;  n + 1 - i for i ≥ j}    (15)

The two parts are above and below the diagonal in the matrix T⁻¹. The point loads at j = 1, 2, 3, ... lead to columns 1, 2, 3, ... and you see n + 1 - 1 = n in the corner:

Figure 1.10: T4 u = δ2 is column 2 of TT⁻¹ = I, so u = column 2 of T4⁻¹ = [4 3 2 1; 3 3 2 1; 2 2 2 1; 1 1 1 1].

This T⁻¹ is the matrix that came in Section 1.2 by inverting T = UᵀU. Each column of T⁻¹ is constant down to the main diagonal and then linear, just like u(x) = 1 - a followed by u(x) = 1 - x in the free-fixed Green's function u(x, a).

Green's Function and the Inverse Matrix

If we can solve for point loads, we can solve for any loads. In the matrix case this is immediate (and worth seeing). Any vector f is a combination of n point loads:

f = f1 δ1 + ··· + fn δn.    (16)

The inverse matrix multiplies each column to combine those point-load solutions:

u = K⁻¹f = f1(column 1 of K⁻¹) + ··· + fn(column n of K⁻¹).    (17)

Matrix multiplication u = K⁻¹f is perfectly chosen to combine those columns. In the continuous case, the combination gives an integral not a sum. The load f(x) is an integral of point loads f(a)δ(x - a). The solution u(x) is an integral over all a of responses u(x, a) to those loads at each point a:

-u'' = f(x) = ∫₀¹ f(a)δ(x - a) da   is solved by   u(x) = ∫₀¹ f(a)u(x, a) da.    (18)

The Green's function u(x, a) corresponds to "row x and column a" of a continuous K⁻¹. We will see it again. To summarize, we repeat formulas (4) and (5) for u(x, a):

FIXED ENDS  u = {(1 - a)x for x ≤ a; (1 - x)a for x ≥ a}     FREE-FIXED  u = {1 - a for x ≤ a; 1 - x for x ≥ a}    (19)

If we sample the fixed-ends solution at x = i/(n + 1), when the load is at a = j/(n + 1), then we (almost!) have the i,j entry of K⁻¹. The only difference between equations (11) and (19) is an extra factor of n + 1 = 1/Δx. The exact analogy would be this:

-d²u/dx² = δ(x)   corresponds to   (K/(Δx)²) U = δ/Δx.    (20)

We divide K by h² = (Δx)² to approximate the second derivative. We divide δ by h = Δx because the area should be 1. Each component of δ corresponds to a little x-interval of length Δx, so area = 1 requires height = 1/Δx. Then our u is U/Δx.

WORKED EXAMPLES

1.4  The "Woodbury-Sherman-Morrison formula" will find K⁻¹ from T⁻¹. This formula gives the rank-one change in the inverse, when the matrix has a rank-one change in K = T - uvᵀ. In this example the change is only +1 in the 1,1 entry, coming from K11 = 2 = T11 + 1. The column vectors are v = (1, 0, ..., 0) = -u.

Here is one of the most useful formulas in linear algebra (it extends to T - UVᵀ):

Woodbury-Sherman-Morrison   Inverse of K = T - uvᵀ:   K⁻¹ = T⁻¹ + (T⁻¹u)(vᵀT⁻¹)/(1 - vᵀT⁻¹u).    (21)

The proof multiplies the right side by T - uvᵀ, and simplifies to I. Problem 1.1.7 displays T⁻¹ - K⁻¹ when the vectors have length n = 4:

vᵀT⁻¹ = row 1 of T⁻¹ = [4 3 2 1]     1 - vᵀT⁻¹u = 1 + 4 = 5.

For any n, K⁻¹ comes from the simpler T⁻¹ by subtracting wᵀw/(n + 1), where w is the row vector n:-1:1 = (n, ..., 2, 1).

Problem Set 1.4

1  For -u'' = δ(x - a), the solution must be linear on each side of the load. What four conditions determine A, B, C, D if u(0) = 2 and u(1) = 0?
u(x) = Ax + B for 0 ≤ x ≤ a   and   u(x) = Cx + D for a ≤ x ≤ 1.

2  Change Problem 1 to the free-fixed case u'(0) = 0 and u(1) = 4. Find and solve the four equations for A, B, C, D.

3  Suppose there are two unit loads, at the points a = ⅓ and b = ⅔. Solve the fixed-fixed problem in two ways: First combine the two single-load solutions. The other way is to find six conditions for A, B, C, D, E, F:
u(x) = Ax + B for x ≤ ⅓,   Cx + D for ⅓ ≤ x ≤ ⅔,   Ex + F for x ≥ ⅔.

4  Solve the equation -d²u/dx² = δ(x - a) with fixed-free boundary conditions u(0) = 0 and u'(1) = 0. Draw the graphs of u(x) and u'(x).

5  Show that the same equation with free-free conditions u'(0) = 0 and u'(1) = 0 has no solution. The equations for C and D cannot be solved. This corresponds to the singular matrix Bn (with 1,1 and n,n entries both changed to 1).

6  Show that -u'' = δ(x - a) with periodic conditions u(0) = u(1) and u'(0) = u'(1) cannot be solved. Again the requirements on C and D cannot be met. This corresponds to the singular circulant matrix Cn (with 1,n and n,1 entries changed to -1).

7  A difference of point loads, f(x) = δ(x - ⅓) - δ(x - ⅔), does allow a free-free solution to -u'' = f. Find infinitely many solutions with u'(0) = 0 and u'(1) = 0.

8  The difference f(x) = δ(x - ⅓) - δ(x - ⅔) has zero total load, and -u'' = f(x) can also be solved with periodic boundary conditions. Find a particular solution u_part(x) and then the complete solution u_part + u_null.

9  The distributed load f(x) = 1 is the integral of loads δ(x - a) at all points x = a. The free-fixed solution u(x) = ½(1 - x²) from Section 1.2 should then be the integral of the point-load solutions (1 - x for a ≤ x, and 1 - a for a ≥ x):

u(x) = ∫₀ˣ (1 - x) da + ∫ₓ¹ (1 - a) da = (1 - x)x + (1 - ½) - (x - x²/2) = ½ - ½x².  YES!

Check the fixed-fixed case u(x) = ∫₀ˣ (1 - x)a da + ∫ₓ¹ (1 - a)x da = ____.

10  If you add together the columns of K⁻¹ (or T⁻¹), you get a "discrete parabola" that solves the equation Ku = f (or Tu = f) with what vector f? Do this addition for K4⁻¹ in Figure 1.9 and T4⁻¹ in Figure 1.10.

Problems 11-15 are about delta functions and their integrals and derivatives.

11  The integral of δ(x) is the step function S(x). The integral of S(x) is the ramp R(x). Find and graph the next two integrals: the quadratic spline Q(x) and the cubic spline C(x). Which derivatives of C(x) are continuous at x = 0?

12  The cubic spline C(x) solves the fourth-order equation u'''' = δ(x). What is the complete solution u(x) with four arbitrary constants? Choose those constants so that u(1) = u''(1) = u(-1) = u''(-1) = 0. This gives the bending of a uniform simply supported beam under a point load.

13  The defining property of the delta function δ(x) is that ∫_{-∞}^{∞} δ(x) g(x) dx = g(0) for every smooth function g(x). How does this give "area = 1" under δ(x)? What is ∫ δ(x - 3) g(x) dx?

14  The function δ(x) is a "weak limit" of very high, very thin square waves SW:
SW(x) = 1/2h for |x| ≤ h   has   ∫_{-∞}^{∞} SW(x) g(x) dx → g(0) as h → 0.
For a constant g(x) = 1 and every g(x) = xⁿ, show that ∫ SW(x)g(x) dx → g(0). We use the word "weak" because the rule depends on test functions g(x).

15  The derivative of δ(x) is the doublet δ'(x). Integrate by parts to compute
∫_{-∞}^{∞} g(x) δ'(x) dx = ∫_{-∞}^{∞} (?) δ(x) dx = (??)   for smooth g(x).

1.5 EIGENVALUES AND EIGENVECTORS

This section begins with Ax = λx. That is the equation for an eigenvector x and its eigenvalue λ. We can solve Ax = λx for small matrices, starting from det(A - λI) = 0. Possibly you know this already (it would be a horrible method for large matrices). There is no "elimination" that leads in a finite time to the exact λ and x.
Since A multiplies x, the equation Ax = λx is not linear. One great success of numerical linear algebra is the development of fast and stable algorithms to compute eigenvalues (especially when A is symmetric). The command eig(A) in MATLAB produces n numbers λ1, ..., λn and not a formula. But this chapter is dealing with special matrices! For those we will find λ and x exactly.

Part I: Applying eigenvalues to diagonalize A and solve u' = Au.
Part II: The eigenvalues of Kn, Tn, Bn, Cn are all λ = 2 - 2 cos θ.

The two parts may need two lectures. The table at the end reports all we know about λ and x for important classes of matrices. The first big application of eigenvalues is to Newton's Law Mu'' + Ku = 0 in Section 2.2.

Part I: Ax = λx and Aᵏx = λᵏx and Diagonalizing A

Almost every vector changes direction when it is multiplied by A. Certain exceptional vectors x lie along the same line as Ax. Those are the eigenvectors. For an eigenvector, Ax is a number λ times the original x.

The eigenvalue λ tells whether the special vector x is stretched or shrunk or reversed or left unchanged, when it is multiplied by A. We may find λ = 2 (stretching) or λ = ½ (shrinking) or λ = -1 (reversing) or λ = 1 (steady state, because Ax = x is unchanged). We may also find λ = 0. If the nullspace contains nonzero vectors, they have Ax = 0x. So the nullspace contains eigenvectors corresponding to λ = 0.

For our special matrices, we will guess x and then discover λ. For matrices in general, we find λ first. To separate λ from x, start by rewriting the basic equation: Ax = λx means that (A - λI)x = 0. The matrix A - λI must be singular. Its determinant must be zero. The eigenvector x will be in the nullspace of A - λI. The first step is to recognize that λ is an eigenvalue exactly when the shifted matrix A - λI is not invertible:

The number λ is an eigenvalue of A if and only if det(A - λI) = 0.

This "characteristic equation" det(A - λI) = 0 involves only λ, not x. The determinant of A - λI is a polynomial in λ of degree n. By the Fundamental Theorem of Algebra, this polynomial must have n roots λ1, ..., λn. Some of those eigenvalues may be repeated and some may be complex—those cases can give us a little trouble.

Example 1  Start with the special 2 by 2 matrix K = [2 -1; -1 2]. Estimate K¹⁰⁰.

Step 1  Subtract λ from the diagonal to get K - λI = [2-λ, -1; -1, 2-λ].

Step 2  Take the determinant of this matrix. That is (2 - λ)² - 1 and we simplify:
det(K - λI) = (2 - λ)² - 1 = λ² - 4λ + 3.

Step 3  Factoring into λ - 1 times λ - 3, the roots are 1 and 3:
λ² - 4λ + 3 = 0 yields the eigenvalues λ1 = 1 and λ2 = 3.

Now find the eigenvectors by solving (K - λI)x = 0 separately for each λ:

K - I = [1 -1; -1 1] leads to x1 = (1, 1).
K - 3I = [-1 -1; -1 -1] leads to x2 = (1, -1).

As expected, K - I and K - 3I are singular. Each nullspace produces a line of eigenvectors. We chose x1 and x2 to have nice components 1 and -1, but any multiples c1x1 and c2x2 (other than zero) would have been equally good as eigenvectors. The MATLAB choice is c1 = c2 = 1/√2, because then the eigenvectors have length 1 (unit vectors).

These eigenvectors of K are special (since K is). If I graph the functions sin πx and sin 2πx, their samples at the two meshpoints x = ⅓ and ⅔ are the eigenvectors in Figure 1.11. (The functions sin kπx will soon lead us to the eigenvectors of Kn.)

K¹⁰⁰ will grow like 3¹⁰⁰ because λmax = 3. An exact formula for 2K¹⁰⁰ is

2K¹⁰⁰ = 1¹⁰⁰ [1 1; 1 1]  (from λ = 1)  +  3¹⁰⁰ [1 -1; -1 1]  (from λ = 3).
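Example 1 is easy to confirm numerically (a small MATLAB sketch; the variable names are mine):

K = [2 -1; -1 2];
[Q, Lambda] = eig(K)          % eigenvalues 1 and 3; columns of Q are unit eigenvectors (up to sign)
E = (1^100*[1 1; 1 1] + 3^100*[1 -1; -1 1])/2;
norm(K^100 - E)/norm(E)       % relative difference near machine precision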
Figure 1.11: The eigenvectors of [2 -1; -1 2] lie on the graphs of sin πx and sin 2πx: sin πx has samples (sin π/3, sin 2π/3) = c(1, 1) and sin 2πx has samples (sin 2π/3, sin 4π/3) = c(1, -1).

Example 2  Here is a 3 by 3 singular example, the circulant matrix C = C3:

C = [2 -1 -1; -1 2 -1; -1 -1 2]   and   C - λI = [2-λ -1 -1; -1 2-λ -1; -1 -1 2-λ].

A little patience (3 by 3 is already requiring work) produces the determinant and its factors:

det(C - λI) = -λ³ + 6λ² - 9λ = -λ(λ - 3)².

This third degree polynomial has three roots. The eigenvalues are λ1 = 0 (singular matrix) and λ2 = 3 and λ3 = 3 (repeated root!). The all-ones vector x1 = (1, 1, 1) is in the nullspace of C, so it is an eigenvector for λ1 = 0. We hope for two independent eigenvectors corresponding to the repeated eigenvalue λ2 = λ3 = 3:

C - 3I = [-1 -1 -1; -1 -1 -1; -1 -1 -1]  has rank 1 (doubly singular).

Elimination will zero out its last two rows. The three equations in (C - 3I)x = 0 are all the same equation -x1 - x2 - x3 = 0, with a whole plane of solutions. They are all eigenvectors for λ = 3. Allow me to make this choice of eigenvectors x2 and x3 from that plane of solutions to Cx = 3x:

x2 = (1, -1, 0)/√2   and   x3 = (1, 1, -2)/√6.

With this choice, the x's are orthonormal (orthogonal unit vectors). Every symmetric matrix has a full set of n perpendicular unit eigenvectors.

For an n by n matrix, the determinant of A - λI will start with (-λ)ⁿ. The rest of this polynomial takes more work to compute. Galois proved that an algebraic formula for the roots λ1, ..., λn is impossible for n > 4. (He got killed in a duel, but not about this.) That is why the eigenvalue problem needs its own special algorithms, which do not begin with the determinant of A - λI.

The eigenvalue problem is harder than Ax = b, but there is partial good news. Two coefficients in the polynomial are easy to compute, and they give direct information about the product and sum of the roots λ1, ..., λn.

The product of the n eigenvalues equals the determinant of A. This is the constant term in det(A - λI):

Determinant = Product of λ's   λ1 λ2 ··· λn = det A.    (2)

The sum of the n eigenvalues equals the sum of the n diagonal entries. The trace is the coefficient of (-λ)ⁿ⁻¹ in det(A - λI):

Trace = Sum of λ's   λ1 + λ2 + ··· + λn = a11 + a22 + ··· + ann = sum down the diagonal of A.    (3)

Those checks are very useful, especially the trace. They appear in Problems 20 and 21. They don't remove all the pain of computing det(A - λI) and its factors. But when the computation is wrong, they generally tell us so. In our examples,
In the special case when AB = BA, these commuting matrices A and B will share eigenvectors: Ax= >-.x and Bx= >-.*x for the same eigenvector x. Then we do have (A+ B)x = (>-. + >-.*)x and ABx = >-.>-.*x. Eigenvalues of A and B can now be added and multiplied. (With B = A, the eigenvalues of A+ A and A2 are>-.+>-. and >-.2 .) Example 3 A Markov matrix has no negative entries and each column adds to 1 (some authors work with row vectors and then each row adds to one): Markov matrix A= [·8 .3] .2 .7 has >-. = 1 and .5 . Every Markov matrix has>-.= 1 as an eigenvalue (A- I has dependent rows). When the trace is .8 + .7 = 1.5, the other eigenvalue must be >-. = .5. The determinant of A must be (>-.1)(>-.2) = .5 and it is. The eigenvectors are (.6, .4) and (-1, 1). 50 Chapter 1 Applied Linear Algebra Eigshow A MATLAB demo (just type eigshow) displays the eigenvalue problem for a 2 by 2 matrix. Figure 1.12 starts with the vector x = (l, 0). The mouse makes this vector move around the unit circle. At the same time the screen shows Ax, in color and also moving. Possibly Ax is ahead of x. Possibly Ax is behind x. Sometimes Ax is parallel to x. At that parallel moment, Ax equals >.x. .! A = [:~ :~] has eigenvalues ~: y is behind y = [~] Not eigenvectors [:~] is ahead of x = [~] X \ I / ,._ ___ ,,,circle of x's Figure 1.12: Eigshow for the Markov matrix: x1 and x2 line up with Ax1 and Ax2 . The eigenvalue >. is the length of Ax, when it is parallel to the unit eigenvector x. On web.mit.edu/18.06 we added a voice explanation of what can happen. The choices for A illustrate three possibilities, 0 or 1 or 2 real eigenvectors: 1. There may be no real eigenvectors. Ax stays behind or ahead of x. This means the eigenvalues and eigenvectors are complex (as for a rotation matrix). 2. There may be only one line of eigenvectors (unusual). The moving directions Ax and x meet but don't cross. This can happen only when >.1 = >.2. 3. There are two independent eigenvectors. This is typical! Ax crosses x at the first eigenvector x1, and it crosses back at the second eigenvector x2 (also at -x1 and -x2). The figure on the right shows those crossing directions: x is parallel to Ax. These eigenvectors are not perpendicular because A is not symmetric. The Powers of a Matrix Linear equations Ax = b come from steady state problems. Eigenvalues have their greatest importance in dynamic problems. The solution is changing with timegrowing or decaying or oscillating or approaching a steady state. We cannot use elimination (which changes the eigenvalues). But the eigenvalues and eigenvectors tell us everything. 1.5 Eigenvalues and Eigenvectors 51 Example 4 The two components of u(t) are the US populations east and west of the Mississippi at time t. Every year, 180 of the eastern population stays east and 120 moves west. At the same time i7o of the western population stays west and 130 moves east: u(t+l) = Au(t) t] . [east west at at ti_me time t t + + 1 1 ] = [·8 .2 .3] [east at ti_me .7 west at time t Start with a million people in the east at t = 0. After one year (multiply by A), the numbers are 800,000 and 200,000. Nobody is created or destroyed because the columns add to 1. Populations stay positive because a Markov matrix has no negative entries. The initial u = [1,000, 000 0] combines the eigenvectors [ 600, 000 400,000] and [400,000 -400, 000 ]. 
After 100 steps the populations are almost at steady state because (½) 100 is so small: Steady state plus transient u(lOO) = [ 600,000] 400,000 2 + ( 1) lOO [ 400,000] -400,000 • You can see the steady state directly from the powers A, A2, A3 , and A100 : A= [·8 .3] A2 = [.70 .45] A3 = [.650 .525] A 100 = [.6000 .6000] .2 .7 .30 .55 .350 .475 .4000 .4000 Three steps find uk = A ku0 from the eigenvalues and eigenvectors of A. Step 1. Write uo as a combination of the eigenvectors uo = a1x1 + ···+ anXn. Step 2. Multiply each number ai by (>.i/- Step 3. Recombine the eigenvectors into uk = a1(>.1/x1 + ···+ an(>.n/Xn. In matrix language this is exactly uk = SAks- 1u0 . S has the eigenvectors of A in its columns. The diagonal matrix A contains the eigenvalues: Step 1. [x,= Wdte Uo Step 2. SAka which is = uk SAkS-1u 0 . 52 Chapter 1 Applied Linear Algebra Step 2 is the fastest-just n multiplications by Af. Step 1 solves a linear system to analyze u 0 into eigenvectors. Step 3 multiplies by S to reconstruct the answer uk. This process occurs over and over in applied mathematics. We see the same steps next for du/dt = Au, and again in Section 3.5 for A-1. All of Fourier series and signal processing depends on using the eigenvectors in exactly this way (the Fast Fourier Transform makes it quick). Example 4 carried out the steps in a specific case. Diagonalizing a Matrix For an eigenvector x, multiplication by A just multiplies by a number: Ax = AX. All the n by n difficulties are swept away. Instead of an interconnected system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy. The matrix A turns into a diagonal matrix A when we use the eigenvectors properly. This is the matrix form of our key idea. Here is the one essential computation. Suppose the n by n matrix A has n linearly independent eigenvectors x1 , ... , Xn- s- Those are the columns of an eigenvector matrix S. Then 1AS= A is diagonal: Diagonalization s-1AS = A = [Ai . . ] = eigen~lue matrix (4) An We use capital lambda for the eigenvalue matrix, with the small A's on its diagonal. Proof Multiply A times its eigenvectors x1, ... , Xn, which are the columns of S. The first column of AS is Ax1. That is A1x1: A times S The trick is to split this matrix AS into S times A: Keep those matrices in the right order! Then A1 multiples the first column x1, as shown. We can write the diagonalization AS= SA in two good ways: AS= SA is s- 1AS = A or A= SAs- 1. (5) The matrix S has an inverse, because its columns (the eigenvectors of A) were assumed to be independent. Without n independent eigenvectors, we cannot diagonalize A. With no repeated eigenvalues, it is automatic that A has n independent eigenvectors. 1.5 Eigenvalues and Eigenvectors 53 Application to Vector Differential Equations ~f A single differential equation = ay has the general solution y(t) = Ceat. The initial value y(0) determines C. The solution y(0)eat decays if a < 0 and it grows if a> 0. Decay is stability, growth is instability. When a is a complex number, its real part determines the growth or decay. The imaginary part gives an oscillating factor eiwt = cos wt+ i sin wt. Now consider two coupled equations, which give one vector equation: du -=Au dt dy/dt = 2y- z dz/dt = -y + 2z or The solution will still involve exponentials e>-.t. But we no longer have a single growth rate as in eat. There are two eigenvalues >. = 1 and >. = 3 of this matrix A = K 2 . 
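A quick numerical check (a sketch of mine; the starting vector and time are arbitrary choices) confirms λ = 1, 3 and shows that recombining the eigenvector solutions e^(λt)x reproduces the matrix exponential:

A = [2 -1; -1 2];                        % the matrix in du/dt = Au
[S,Lambda] = eig(A)                      % lambda = 1, 3 with eigenvectors along (1,1), (1,-1)
u0 = [7; 3];  t = 1;                     % an arbitrary start and time
a = S\u0;                                % expand u0 in the eigenvectors
u_eig = S*diag(exp(diag(Lambda)*t))*a;   % multiply each piece by e^(lambda t), recombine
norm(u_eig - expm(A*t)*u0)               % essentially zero: agrees with expm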
The solution has two exponentials et and e3t. They multiply x = (1, 1) and (1, -1). The neat way to find solutions is by the eigenvectors. Pure solutions e-Xtx are eigenvectors that grow according to their own eigenvalue 1 or 3. We combine them: is [ y(t)] z(t) = [Get+ De3t] Get - De3t • (6) This is the complete solution. Its two constants (C and D) are determined by two initial values y(0) and z(0). Check first that each part e>-.tx solves ~~ = Au: = Each eigenvector u(t) e.\tx yields !~ = >.e>-.tx = Ae>-.tx =Au. (7) The number e>-.t just multiplies all components of the eigenvector x. This is the point of eigenvectors, they grow by themselves at their own rate >.. Then the complete solution u(t) in (6) combines the pure modes Cetx1 and De3tx2 . The three steps for powers apply here too: Expand u{O) = Sa, multiply each a3 by e-X;t, recombine into u(t) = Se.xts-1u(O). Example 5 Suppose the initial values are y(0) = 7 and z(0) = 3. This determines the constants C and D. At the starting time t = 0, the growth factors e>-.t are both one: We solved two equations to find C = 5 and D = 2. Then u(t) = + 5etx1 2e3tx2 solves the whole problem. The solution is a combination of slow growth and fast growth. Over a long time the faster e3t will dominate, and the solution will line up with x 2 . + Section 2.2 will explain the key equation Mu" Ku = 0 in much more detail. Newton's law involves acceleration (second derivative instead of first derivative). We might have two masses connected by springs. They can oscillate together, as in the first eigenvector (1, 1). Or they can be completely out of phase and move in opposite directions, as in the second eigenvector (1, -1). The eigenvectors give the pure motions eiwtx, or "normal modes." The initial conditions produce a mixture. 54 Chapter 1 Applied Linear Algebra Symmetric Matrices and Orthonormal Eigenvectors Our special matrices Kn and Tn and Bn and Cn are all symmetric. When A is a symmetric matrix, its eigenvectors are perpendicular (and the >.'s are real): Symmetric matrices have real eigenvalues and orthonormal eigenvectors. The columns of Sare these orthonormal eigenvectors q1 , ... , qn. We use q instead of x for orthonormal vectors, and we use Q instead of S for the eigenvector matrix with those columns. Orthonormal vectors are perpendicular unit vectors: when i -/- j (orthogonal vectors) when i = j (orthonormal vectors) (8) The matrix Q is easy to work with because QTQ = I. The transpose is the inverse! This repeats in matrix language that the columns of Q are orthonormal. QTQ = I contains all those inner products q;qi that equal Oor 1: Orthogonal Matrix (9) For two orthonormal columns in three-dimensional space, Q is 3 by 2. In this rectangular case, we still have QTQ = I but we don't have QQT = I. For our full set of eigenvectors, Q is square and QT is Q-1. The diagonalization of a real symmetric matrix has S = Q and s-1 = QT: Symmetric diagonalization A= SAs- 1 = QAQT with QT= Q-1 . (10) Notice how QAQT is automatically symmetric (like LDLT). These factorizations perfectly reflect the symmetry of A. The eigenvalues >.1, ... , An are the "spectrum" of the matrix, and A= QAQT is the spectral theorem or principal axis theorem. Part 11: Eigenvectors for Derivatives and Differences A main theme of this textbook is the analogy between discrete and continuous problems (matrix equations and differential equations). Our special matrices all give sec- ond differences, and we go first to the differential equation -y 11 = >.y. 
The eigen- functions y(x) are cosines and sines: d2y - - = ..Xy(x) is solved by y = coswx and y = sinwx with ..X = w2 . (11) d:z:2 Allowing all frequencies w, we have too many eigenfunctions. The boundary conditions will pick out the frequencies w and choose cosines or sines. 1.5 Eigenvalues and Eigenvectors 55 For y(O) = 0 and y(l) = 0, the fixed-fixed eigenfunctions are y(a::) = sin k1rx. The boundary condition y(O) = 0 reduces to sin0 = 0, good. The condition y(l) = 0 reduces to sin br = 0. The sine comes back to zero at 1r and 21r and every integer multiple of 1r. So k = l, 2, 3, ... (since k = 0 only produces sin Ox = 0 which is useless). Substitute y(x) = sink1rx into (11) to find the eigenvalues>.= k21r2: - ddx22 (sm• k1rx ) = k21r2 sm• k1rx so ,,.. = k21r2 = { 1r2 , 41r2 , 91r2 , . . .} . (12) We will make a similar guess (discrete sines) for the discrete eigenvectors of Kn. Changing the boundary conditions gives new eigenfunctions and eigenvalues. The equation -y 11 = >.y is still solved by sines and cosines. Instead of y = sin k1rx which is zero at both endpoints, here are the eigenfunctions Yk(x) and their eigenvalues >.k for free-free (zero slope) and periodic and free-fixed conditions: Analog of Bn y 1(0) = 0 and y 1(l) = 0 y(x) = cosk1rx Analog of Cn y(O) = y(l), y 1(0) = y 1(l) y(x) = sin21rka::,cos21rka:: ). = 4k 21r2 Analog of Tn y 1(0) = 0 and y(l) = 0 y(x) = cos {k+½)1ra:: >. = {k+½) 21r2 Remember that Bn and Cn are singular matrices (>. = 0 is an eigenvalue). Their continuous analogs also have>.= 0, with cos Ox= 1 as the eigenfunction (set k = 0). This constant eigenfunction y(x) = 1 is like the constant vector y = (l, 1, ... , 1). The free-fixed eigenfunctions cos(k + ½)1rx start with zero slope because sin 0 = 0. They end with zero height because cos(k + ½)1r = 0. So y 1(0) = 0 and y(l) = 0. The matrix eigenvectors will now use these same sines and cosines (but >. is different). Eigenvectors of Kn: Discrete Sines Now come eigenvectors for the -1, 2, -1 matrices. They are discrete sines and cosines-try them and they work. For all the middle rows, sinj0 and cosj0 are still successful for -Yi-I + 2yi - Yi+I = >.yi, with eigenvalues .X = 2 - 2 cos 0 ~ 0: -l{sin(j_ cos(J - 1)0} 1)0 + 2{sinj_0}cosJ0 1{sin(j_ cos(J + + 1)0} 1)0 = (2 _ 2 cos 0) {sinf0}· cosJ0 (l3) These are the imaginary and real parts of a simpler identity: The boundary rows decide 0 and everything! For Kn the angles are 0 = k1r / (n + l). The first eigenvector y1 will sample the first eigenfunction y(x) = sin 1rx at the n meshpoints with h = n~I: = First eigenvector Discrete sine y1 = (sin 1rh, sin 21rh, ... , sin n1rh) (14) 56 Chapter 1 Applied Linear Algebra The jth component is sin ; ;1 . It is zero for j = 0 and j = n + 1, as desired. The angle is 0 = 1rh = n:l. The lowest eigenvalue is 2 - 2 cos 0 ~ 02: ~ First eigenvalue of Kn A1 = 2 - 2cos1rh = 2 - 2 ( 1 - 1r:h2 + • • •) 1r2h2 . (15) Compare with the first eigenvalue A = 1r2 of the differential equation (when y(x) = sin 1rx and -y 11 = 1r2y). Divide K by h2 = (box)2 to match differences with derivatives. The eigenvalues of Kare also divided by h2: A1(K)/h2 ~ 1r2h2/h2 is close to the first eigenvalue A1 = 1r2 in (12). The other continuous eigenfunctions are sin 21rx, sin 31rx, and generally sin k1rx. It is neat for the kth discrete eigenvector to sample sin k1rx again at x = h, ... , nh: = Eigenvectors Discrete sines Yk = (sin k1rh, ... , sin nk1rh) (16) All eigenvalues of Kn Ak = 2 - 2 cos k1rh, k = 1, ... , n. 
(17) The sum A1 +•••+An must be 2n, because that is the sum of the 2's on the diagonal (the trace). The product of the A's must be n+l. This probably needs an expert (not the author). For K 2 and K 3 , Figure 1.13 shows the eigenvalues (symmetric around 2). Eigenvalues 2 - 2 cos 0 of K3 1) ..X1 = 2 - 2 ( = 2 - v'2 ..X2 = 2 - 2(0) = 2 -1) ..Xa = 2 - 2 ( = 2 + v'2 Trace A1 + A2 + A3 = 6 Determinant A1A2A3 = 4 For B4 include also Ao= 0 0 1r I 4 21r / 4 31r / 4 1r Figure 1.13: Eigenvalues* of Kn-I interlace eigenvalues•= 2 - 2cos nk.;1 of Kn. Orthogonality The eigenvectors of a symmetric matrix are perpendicular. That is confirmed for the eigenvectors (1, 1) and (1, -1) in the 2 by 2 case. The three eigenvectors (when n = 3 and n + l = 4) are the columns of this sine matrix: Discrete . 7r S1Il4 sr.n 427r sr.n 437r 1 1 v'2 1 v'2 Sine DST= sr.n 427r sr•n 447r sin B1r 4 1 0 -1 (18) Transform s•m 437r sm• 461r sin 91r 4 1 v'2 -1 1 v'2 1.5 Eigenvalues and Eigenvectors 57 The columns of Sare orthogonal vectors, of length ../2. If we divide all components by ../2, the three eigenvectors become orthonormal. Their components lie on sine curves. The DST matrix becomes an orthogonal Q = DST/../2, with Q-1 = QT. Section 3.5 uses the DST matrix in a Fast Poisson Solver for a two-dimensional difference equation (K2D)U = F. The columns of the matrix are displayed there for n = 5. The code will give a fast sine transform based on the FFT. , ,sin31rx \ - Eigenvectors of Ka .. ,..._°o'- o cos Ox = 1 o , ,. ' lie along sine curves \ \ ', \ \ ,'ti cos 21rx 0 1\ \ 3 4 \ \I 4 I \ ( / Eigenvectors of Ba lie along cosines ---+ I 0 1\1 1 ' i i \ ,'o, 11 1 i3 i ' t i i5 I 1 \ I\ / "' -o sin 21rx ' \ / 'o COS7rX 'r ... - Figure 1.14: Three discrete eigenvectors fall on three continuous eigenfunctions. Eigenvectors of Bn: Discrete Cosines The matrices Bn correspond to zero slope at both ends. Remarkably, Bn has the same n - 1 eigenvalues as Kn-I plus the additional eigenvalue >. = 0. (B is singular with (1, ... , 1) in its nullspace, because its first and last rows contain +1 and -1.) Thus B3 has eigenvalues 0, 1, 3 and trace 4, agreeing with 1 + 2 + 1 on its diagonal: Eigenvalues of Bn >. = 2 - k1r 2 cos-, k = 0, ... , n - 1. (19) n Eigenvectors of B sample cosk1rx at then midpoints x = (j - ½)/n in Figure 1.14, where eigenvectors of K sample the sines at the meshpoints x = j / (n + 1): Eigenvectors of Bn Yk = ( co21s-kn1-r' co32sk-n1-r'···, cos ( n - -21) -kn1r) • (20) Since the cosine is even, those vectors have zero slope at the ends: and Notice that k = 0 gives the all-ones eigenvector y0 = (1, 1, ... , 1) which has eigenvalue >. = 0. This is the DC vector with zero frequency. Starting the count at zero is the useful convention in electrical engineering and signal processing. 58 Chapter 1 Applied Linear Algebra These eigenvectors of Bn give the Discrete Cosine Transform. Here is the cosine matrix for n = 3, with the unnormalized eigenvectors of B3 in its columns: Discrete Cosine l [ cosO cosl12!3: cos2l 231r 1 OCT = [ cos 0 cos ~ 2 13 c cos ~ 2 231r = 1 l l2 V'30 l2 0 -1 (21) Transform cos 0 cos Q 1!: 2 3 cos §_ 2 231r 1 -½\1'3 ½ = Eigenvectors of Cn: Powers of w e 21ri/n After sines from Kn and cosines from Bn, we come to the eigenvectors of Cn. These are both sines and cosines. Equivalently, they are complex exponentials. They are even more important than the sine and cosine transforms, because now the eigenvectors give the Discrete Fourier Transform. You can't have better eigenvectors than that. 
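Before constructing them for every n, a quick preview in MATLAB (my sketch, with n = 4) checks that sampling a complex exponential really does give an eigenvector of C4:

n = 4;
C = toeplitz([2 -1 0 -1]);               % C4: the -1's wrap around to the corners
w = exp(2i*pi/n);                        % w = e^(2*pi*i/n), an nth root of 1
y1 = w.^((0:n-1)');                      % the sampled exponential (1, w, w^2, w^3)
C*y1 - (2 - w - w^3)*y1                  % essentially zero: y1 is an eigenvector of C4
eig(C)'                                  % the four eigenvalues 0, 2, 2, 4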
Every circulant matrix shares these eigenvectors, as we see in Chapter 4 on Fourier transforms. A circulant matrix is a "periodic matrix." It has constant diagonals with wrap-around (the -l's below the l main diagonal of Cn wrap around to the -1 in the upper right corner). Our goal is to find the eigenvalues and eigenvectors of the matrices Cn, like C4: 2 -1 0 -1 Circulant matrix (periodic) C4 = [ -1 0 2 -1 -1 2 0 -1 • -1 0 -1 2 This symmetric matrix has real orthogonal eigenvectors (discrete sines and cosines). They have full cycles like sin 2k1Tx, not half cycles like sin k1Tx. But the numbering gets awkward when cosines start at k = 0 and sines start at k = l. It is better to work with complex exponentials ei9 . The kth eigenvector of Cn comes from sampling Yk(x) = ei21rkx at then meshpoints which are now x = j/n: jth component of Yk ei21rk(j/n) = wjk where w = e21ri/n = nth root of 1. (22) That special number w = e 27r:i/n is the key to the Discrete Fourier Transform. Its angle is 211"/n, which is an nth part of the whole way around the unit circle. The powers of w cycle around the circle and come back town= l: Eigenvectors of Cn Yk-- (1 ' wk ' w2k , ••• , w.0 = 0. The Cn are singular. 1.5 Eigenvalues and Eigenvectors 59 The eigenvector with k = l is y1 = (1, w, ... , wn-l). Those components are the n roots of 1. Figure 1.15 shows the unit circle r = 1 in the complex plane, with then= 4 numbers 1, i, i2, i3 equally spaced around the circle on the left. Those numbers are e0, e2 e4n:i/4, e6 4 and their fourth powers are 1. n:i/4, n:i/ l [ l [ l Cy1 = -12 [ 0 _-121 -10 2 -_101 i1i2 . ·3 = (2 - z - z ) i1i2 . (25) -1 0 -1 2 f f For any n, the top row gives 2 - w - wn-l = 2 - w - w. Notice that wn-l is also the complex conjugate w = e-2n:i/n = l/w, because one more factor w will reach 1. Eigenvalue of C >.1 = 2 - w - w = 2 - e2n:i/n - e-2n:i/n = 2 - 2 cos 27r . (26) n w2 = -1 w = e21ri/4 = i n=4 w4 = 1 w3 = -i Figure 1.15: The solutions to z4 = 1 are 1,i,i2,i3 The 8th roots are powers of e2 8 . n:i/ . Now we know the first eigenvectors y0 = (1,l,1,1) and y1 = (1,i,i2,i3 ) of 0 4. The eigenvalues are Oand 2. To finish, we need the eigenvectors y2 = (1, i2, i4, i6) and y3 = (1,i3 ,i6,i9 ). Their eigenvalues are 4 and 2, which are 2-2cos1r and 2-2cos 3 ;. Then the sum of the eigenvalues is O+ 2 + 4 + 2 = 8, agreeing with the sum of the diagonal entries (the trace 2 + 2 + 2 + 2) of this matrix 0 4. The Fourier Matrix As always, the eigenvectors go into the columns of a matrix. Instead of the sine or cosine matrix, these eigenvectors of Cn give the Fourier matrix Fn. We have the [ i l DFT instead of the DST or DCT. For n = 4 the columns of F4 are Yo, Y1, Y2, y : 3 Fourier matrix F 4 Eigenvectors of C4 F4 = 1 1 i i2 i2 i4 ii6; i3 i6 ig = = (Fn)jk wik e2-rrijk/n. 60 Chapter 1 Applied Linear Algebra The columns of the Fourier matrix are orthogonal! The inner product of two complex vectors requires that we take the complex conjugate of one of them (by convention l- - the first one). Otherwise we would have y'[y3 = 1 + 1 + 1 + 1 = 4. But y1 is truly orthogonal to y3 because the correct product uses the conjugate y1: Complex [1 -i (-i)2 (-i)3] [ i\ inner ·6 - 1 - 1 + 1 - 1 - 0 . (27) product i ig Similarly YfY1 = 4 gives the correct length IIY1II = 2 (not YfY1 = 0). The matrix FT F of all the column inner products is 4/. Orthogonality of columns reveals p-1: Orthogonal -T F 4F4 = 4/ so that F4-1 = 4lp T 4 = i•nverse of p . (28) Always F!Fn = nl. The inverse of Fn is F:/n. 
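A two-line check in MATLAB (a small sketch; remember that ' is the conjugate transpose) confirms that rule for F4:

n = 4;  w = exp(2i*pi/n);
F = w.^((0:n-1)'*(0:n-1));       % Fourier matrix with entries w^(jk), j,k = 0,...,3
norm(F'*F - n*eye(n))            % essentially zero: F'*F = 4I
norm(inv(F) - F'/n)              % essentially zero: the inverse of F4 is F4'/4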
We could divide Fn by ,/n, which normalizes it to Un = Fn/ ..fii,. This normalized Fourier matrix is unitary: Orthonormal (29) A unitary matrix has UT U = I and orthonormal columns. It is the complex analog of a real orthogonal Q (which has QTQ = I). The Fourier matrix is the most important complex matrix ever seen. Fn and F,;;1 produce the Discrete Fourier Transform. Problem Set 1.5 The first nine problems are about the matrices Kn, Tn, Bn, Cn. 1 The 2 by 2 matrix K2 in Example 1 has eigenvalues 1 and 3 in A. Its unit eigenvectors q1 and q2 are the columns of Q. Multiply QAQT to recover K2 . 2 When you multiply the eigenvector y = (sin 1rh, sin 21rh, ... ) by K, the first row will produce a multiple of sin 1rh. Find that multiplier .X by a double-angle formula for sin 21rh: (Ky )i = 2 sin 7rh - l sin 21rh = .X sin 7rh Then .X = 3 In MATLAB, construct K = K 5 and then its eigenvalues bye= eig(K). That column should be (2 - v13, 2 - 1, 2 - 0, 2 + 1, 2 + v'3). Verify that e agrees with 2 * ones(5, 1) - 2 * cos([l : 5] * pi/6)'. 4 Continue 3 to find an eigenvector matrix Q by [Q, E] = eig(K). The Discrete Sine Transform DST = Q * diag([ -1 -1 1 -1 1]) starts each column with a positive entry. The matrix J K = [1 : 5 ]' * [1 : 5] has entries j times k. Verify that DST agrees with sin(JK*pi/6)/sqrt(3), and test DSTT = DST-1. 1.5 Eigenvalues and Eigenvectors 61 5 Construct B = BB and [Q, E] = eig(B) with B(l, 1) = 1 and B(6, 6) = 1. Verify that E = diag(e) with eigenvalues 2 *ones(l, 6) - 2 *cos([O: 5] * pi/6) in e. How do you adjust Q to produce the (highly important) Discrete Cosine Transform with entries OCT= cos([.5: 5.5 ]' * [O: 5] * pi/6)/sqrt(3)? 6 The free-fixed matrix T = TB has T(l, 1) = 1. Check that its eigenvalues are 2-2 cos [(k - ½)11"/6.5]. The matrix cos([.5: 5.5 ]' * [.5: 5.5] * pi/6.5)/sqrt(3.25) should contain its unit eigenvectors. Compute Q' * Q and Q' * T * Q. 7 The columns of the Fourier matrix F4 are eigenvectors of the circulant matrix C = C4 . But [Q, E] = eig(C) does not produce Q = F4 . What combinations of the columns of Q give the columns of F4 ? Notice the double eigenvalue in E. 8 Show that then eigenvalues 2 - 2 cos ;:_;1 of Kn add to the trace 2 + · · · + 2. 9 K3 and B4 have the same nonzero eigenvalues because they come from the same 4x3 backward difference LL. Show that K 3 = LLT ,6._ and B4 = .6._.6._T_ The eigenvalues of K 3 are the squared singular values a2 of .6._ in 1.7. Problems 10-23 are about diagonalizing A by its eigenvectors in S. s- 10 Factor these two matrices into A= SAs- 1. Check that A2 = SA2 1: and 11 If A= SAs- 1 then A-1 = ( )( )( ). The eigenvectors of A3 are (the same columns of S)(different vectors). 12 If A has >.1 = 2 with eigenvector x 1 = [i] and >.2 = 5 with x2 = [½] , use SAs-1 to find A. No other matrix has the same >.'sand x's. 13 Suppose A = SAs-1. What is the eigenvalue matrix for A + 21? What is the eigenvector matrix? Check that A+ 21 = ( )( )( )-1. 14 If the columns of S (n eigenvectors of A) are linearly independent, then (a) A is invertible (b) A is diagonalizable (c) S is invertible 15 The matrix A = [gA] is not diagonalizable because the rank of A - 31 is __ . A only has one line of eigenvector. Which entries could you change to make A diagonalizable, with two eigenvectors? 16 Ak = SAks-1 approaches the zero matrix as k --too if and only if every >. has absolute value less than . Which of these matrices has Ak --t O? 
A1 = [·.46 ..64] and A2 = [·.61 ..69] and 62 Chapter 1 Applied Linear Algebra 17 Find A and S to diagonalize A1 in Problem 16. What is A 110u0 for these u0? 18 Diagonalize A and compute SAks- 1 to prove this formula for Ak: has 19 Diagonalize B and compute SAks- 1 to show how Bk involves 3k and 2k: has 20 Suppose that A = SAs-1. Take determinants to prove that det A = >.1>.2• • •An = product of .X's. This quick proof only works when A is __ . 21 Show that trace GH = trace HG, by adding the diagonal entries of GH and HG: and Choose G = S and H = As-1. Then S As- 1 = A has the same trace as AS-1S = A, so the trace is the sum of the eigenvalues. 22 Substitute A = SAs-1 into the product (A - >.1/)(A - >.21) ···(A - >.nl) and explain why (A - >.1/) ···(A - An/) produces the zero matrix. We are substi- tuting A for>. in the polynomial p(>.) = det(A - >.I). The Cayley-Hamilton Theorem says that p(A) = zero matrix (true even if A is not diagonalizable). Problems 23-26 solve first-order systems u 1 = Au by using Ax = >.x. 23 Find .X's and x's so that u = e>--tx solves What combination u = c1e>--1tx1 + c2e>--2tx2 starts from u(O) = (5, -2)? 24 Find A to change the scalar equation y" = 5y' + 4y into a vector equation for u = (y, y'). What are the eigenvalues of A? Find >.1 and >.2 also by substituting y = e>--t into y" = 5y' + 4y. ddut = [yy"'] = [ 1.5 Eigenvalues and Eigenvectors 63 25 The rabbit and wolf populations show fast growth of rabbits (from 6r) but loss to wolves (from -2w). Find A and its eigenvalues and eigenvectors: -dr =6r-2w dt and ddwt = 2r+w. If r(0) = w(0) = 30 what are the populations at time t? After a long time, is the ratio of rabbits to wolves 1 to 2 or is it 2 to 1? 26 Substitute y = e>-.t into y" = 6y' - 9y to show that >. = 3 is a repeated root. This is trouble; we need a second solution after e3t. The matrix equation is Show that this matrix has >. = 3, 3 and only one line of eigenvectors. Trouble here too. Show that the second solution is y = te3t. 27 Explain why A and AT have the same eigenvalues. Show that >. = 1 is always an eigenvalue when A is a Markov matrix, because each row of AT adds to 1 and the vector __ is an eigenvector of AT. 28 Find the eigenvalues and unit eigenvectors of A and T, and check the trace: A= [ 11 01 0ll 1 0 0 T = [ -11 -l2J • 29 Here is a quick "proof" that the eigenvalues of all real matrices are real: is real. Find the flaw in this reasoning-a hidden assumption that is not justified. 30 Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers can be eigenvalues of these matrices? 31 To find the eigenfunction y(x) = sin brx, we could put y = e= in the differential equation -u" = >.u. Then -a2eax = >.e= gives a = i,/>.. or a = -i,/>... The complete solution y(x) = Ceiv'>-x + De-iv'>-x has C + D = 0 because y(0) = 0. That simplifies y(x) to a sine function: y(x) = C(eiv'>-x - e-iv'>-x) = 2iC sin ..f>..x. y(l) = 0 yields sin '15,. = 0. Then '15,. must be a multiple of k1r, and >. = k21r2 as before. Repeat these steps for y'(0) = y'(l) = 0 and also y'(0) = y(l) = 0. 64 Chapter 1 Applied Linear Algebra 32 Suppose eigshow follows x and Ax for these six matrices. How many real eigenvectors? When does Ax go around in the opposite direction from x? 33 Scarymatlab shows what can happen when roundoff destroys symmetry: A=[lllll; 1:5]'; B=A'*A; P=A* inv(B)*A'; [Q, E]= eig(P); Bis exactly symmetric. The projection P should be symmetric, but isn't. From Q' * Q show that two eigenvectors of P fail badly to have inner product 0. 
a WORKED EXAMPLE a The eigenvalue problem -u 11 + x2u = >.u for the Schrodinger equation is im- portant in physics (the harmonic oscillator). The exact eigenvalues are the odd numbers >. = 1, 3, 5, .... This is a beautiful example for numerical experiment. One new point is that for computations, the infinite interval (-oo, oo) is reduced to - L ::=; x ::=; L. The eigenfunctions decay so quickly, like e-x2!2 , that the matrix K could be replaced by B (maybe even by the circulant C). Try harmonic(lO, 10, 8) and (10, 20, 8) and (5, 10, 8) to see how the error in>.= 1 depends on hand L. function harmonic(L,n,k) h=l/n; N=2 * n * L+ 1; K= toeplitz([2-1 zeros(l,N-2)]); H=K/h /\ 2 + diag((-L:h:L). /\ 2); [V,F]= eig(H); E=diag(F); E=E(l:k) j=l:k; plotU,E); % positive integers L, n, k % N points in interval [-£, L] % second difference matrix % diagonal matrix from x I\ 2 % trideig is faster for large N % first k eigenvalues (near 2n + 1) % choose sparse K and diag if needed A tridiagonal eigenvalue code trideig is on math.mit.edu/~persson. The exact eigenfunctions Un = Hn(x)e-x2 / 2 come from a classical method: Put u(x) = (LajxJ)e-x2 / 2 into the equation -u" + x2u = (2n + l)u and match up each power of x. Then aH2 comes from a1 (even powers stay separate from odd powers): The coefficients are connected by (j + l)(j + 2)aH2 = -2(n - j)a1 . At n = j the right side is zero, so a1+2 = 0 and the power series stops (good thing). Otherwise the series would produce a solution u(x) that blows up at infinity. (The cutoff explains why >. = 2n + 1 is an eigenvalue.) I am happy with this chance to show a success for the power series method, which is not truly a popular part of computational science and engineering. The functions Hn(x) turn out to be Hermite polynomials. The eigenvalues in physical units are E = (n + ½)nw. That is the quantization condition that picks out discrete energy states for this quantum oscillator. 1.5 Eigenvalues and Eigenvectors 65 The hydrogen atom is a stiffer numerical test because e-x2 / 2 disappears. You can see the difference in experiments with -u" + (l(l + 1)/2x2 - l/x)u = AU on the radial line 0:::; x < oo. Niels Bohr discovered that An= c/n2, which Griffiths [69] highlights as "the most important formula in all of quantum mechanics. Bohr obtained it in 1913 by a serendipitous mixture of inapplicable classical physics and premature quantum theory... " Now we know that Schrodinger's equation and its eigenvalues hold the key. Properties of eigenvalues and eigenvectors Matrix Symmetric: AT = A Orthogonal: QT = Q-1 Skew-symmetric: AT = -A Complex Hermitian: AT = A Positive Definite: x T Ax > 0 Eigenvalues all A are real all IAI = 1 all A are imaginary all A are real all A > 0 Eigenvectors orthogonal x;Xj = 0 x; orthogonal x j = 0 x; orthogonal x j = 0 x; orthogonal x j = 0 orthogonal Markov: mij > 0, L~=l mij = 1 Similar: B = M-1AM Projection: P = P 2 = pT Amax = 1 A(B) = A(A) A= l; 0 steady state x > 0 x(B) = M-1x(A) column space; nullspace Reflection: I - 2uuT A= -1; 1, .. , 1 u;u_1_ Rank One: uvT A=vTu; 0, .. ,0 u; Vl_ Inverse: A-1 Shift: A+ cl Stable Powers: An--+ 0 1/A(A) A(A) + c all IAI < 1 eigenvectors of A eigenvectors of A Stable Exponential: eAt --+ O Cyclic: P(l, .. , n) = (2, .. , n, 1) Toeplitz: -1, 2, -1 on diagonals Diagonalizable: SAs-1 all Re A<0 Ak = e21rik/n Ak = 2 - 2cos EnI+..l. diagonal of A Xk = (1, Ak, ... 
, A~-l) = Xk (S•ill k1r n+l, • Slll 2k1r n+l, ) ••• columns of S are independent Symmetric: QAQT Jordan: J = M- 1AM diagonal of A (real) columns of Qare orthonormal diagonal of J each block gives x = (0, .. , 1, .. , 0) SVD: A= U~VT singular values in ~ eigenvectors of ATA, AAT in V, U 66 Chapter 1 Applied Linear Algebra 1.6 POSITIVE DEFINITE MATRICES This section focuses on the meaning of "positive definite." Those words apply to square symmetric matrices with especially nice properties. They are summarized at the end of the section, and I believe we need three basic facts in order to go forward: 1. Every K = ATA is symmetric and positive definite (or at least semidefinite). 2. If K 1 and K 2 are positive definite matrices then so is K 1 + K2 . 3. All pivots and all eigenvalues of a positive definite matrix are positive. The pivots and eigenvalues have been emphasized. But those don't give the best approach to facts 1 and 2. When we add K 1 + K2, it is not easy to follow the pivots or eigenvalues in the sum. When we multiply ATA (and later ATCA), why can't the pivots be negative? The key is in the energy ½uTKu. We really need an energy-based definition of positive definiteness, from which facts 1, 2, 3 will be clear. Out of that definition will come the test for a function P(u) to have a minimum. Start with a point where all partial derivatives 8P/8u1 , 8P/8u2 , ... , 8P/8un are zero. This point is a minimum (not a maximum or saddle point) if the matrix of second derivatives is positive definite. The discussion yields an algorithm that actually finds this minimum point. When P(u) is a quadratic function (involving only ½Kiiul and KijUiUj and fiui) that minimum has a neat and important form: The minimum of P(u) = ½uTKu - uTf is Pmin = -½ITK- 1f when Ku= f. Examples and Energy-based Definition Three example matrices ½K, B, Mare displayed below, to show the difference between definite and semidefinite and indefinite. The off-diagonal entries in these examples get larger at each step. You will see how the "energy" goes from positive (for K) to possibly zero (for B) to possibly negative (for M). Definite [ } -2 -½ ] 1 Ui - U1U2 + U~ Always positive Semidefinite B = [ -! -! ] Ui - 2U1U2 + U~ Positive or zero Indefinite M= [ 1 -3 -3] 1 Ui - 6U1U2 + U~ Positive or negative Below the three matrices you will see something extra. The matrices are multiplied on the left by the row vector uT = [u1 u2 ] and on the right by the column vector u. The results uT ( ½K) u and uT Bu and uTMu are printed under the matrices. With zeros off the diagonal, I is positive definite (pivots and eigenvalues all 1). When the off-diagonals reach -½, the matrix ½K is still positive definite. At -1 1.6 Positive Definite Matrices 67 we hit the semidefinite matrix B (singular matrix). The matrix M with -3 off the diagonal is very indefinite (pivots and eigenvalues of both signs). It is the size of those off-diagonal numbers-½, -1, -3 that is important, not the minus signs. Quadratics These pure quadratics like ur - U1U2 + U§ contain only second degree terms. The simplest positive definite example would be ur + u§, coming from the identity matrix I. This is positive except at u1 = u2 = 0. Every pure quadratic function comes from a symmetric matrix. When the matrix is S, the function is uT Su. When S has an entry b on both sides of its main diagonal, those entries combine into 2b in the function. 
Here is the multiplication uT Su, when a typical 2 by 2 symmetric matrix S produces aui and 2bu1u2 and cu§: Quadratic Function Notice how a and con the diagonal multiply Ui and U§. The two b's multiply u 1u2. The numbers a, b, c will decide whether uT Su is always positive (except at u = 0). This positivity of uT Su is the requirement for S to be a "positive definite matrix." Definition The symmetric matrix S is positive definite when u T Su > 0 for every vector u except u = 0. The graph of uT Su goes upward from zero. There is a minimum point at u = 0. Figure 1.16a shows Ui - u1u2 + U§, from S = ½K. Its graph is like a bowl. This definition makes it easy to see why the sum K 1 + K2 stays positive definite (Fact 2 above). We are adding positive energies so the sum is positive. We don't need to know the pivots or eigenvalues. The sum of uT K 1u and uT K 2u is uT(K1 + K 2 )u. If the two pieces are positive whenever u i= 0, the sum is positive too. Short proof! In the indefinite case, the graph of uTMu goes up and down from the origin in Figure 1.16c. There is no minimum or maximum, and the surface has a "saddle point." If we take u1 = 1 and u2 = 1 then uTMu= -4. If u1 = 1 and u2 = 10 then uT Mu= +41. The semidefinite matrix B has uT Bu = (u1 - u2) 2. This is positive for most u, but it is zero along the line u1 = u2. Figure 1.16: Positive definite, semidefinite, and indefinite: Bowl, trough, and saddle. 68 Chapter 1 Applied Linear Algebra Sums of Squares To confirm that M is indefinite, we found a vector with uTMu > 0 and another vector with uT Mu < 0. The matrix K needs more thought. How do we show that uT Ku stays positive? We cannot substitute every u1, u2 and it would not be enough to test only a few vectors. We need an expression that is automatically positive, like uTu = ur + U§. The key is to write UT Ku as a sum of squares: UT Ku= 2ur - 2u1U2 + 2u~ = Ui + (u1 - u2) 2 + u~ (three squares) (2) The right side cannot be negative. It cannot be zero, except if u1 = 0 and u2 = 0. So this sum of squares proves that K is a positive definite matrix. We could achieve the same result with two squares instead of three: uTKu = 2u12 - 2u1u2 + 2u22 = 2(u1 - 21u2 )2 + 23u22 (two squares) (3) J What I notice about this sum of squares is that the coefficients 2 and are the pivots of K. And the number -½ inside the first square is the multiplier €21 , in K = LDLT: -i ] ~ Two squares K = [ -~ -; ] = [ -½ 1 ] [ 2 ] [ 1 = LDLT. (4) The sum of three squares in (2) is connected to a factorization K = ATA, in which A ~ l has three rows instead of two. The three rows give the squares in ur + (u2-u1)2+u§: Three squares K = [ -12 -12 ] = [ 01 -11 -10 ] [-~ 0 -1 = ATA. (5) Probably there could be a factorization K = ATA with four squares in the sum and four rows in the matrix A. What happens if there is only one square in the sum? Semidefinite UT Bu= ur - 2u1U2 + U§ = (u1 - u2) 2 (only one square). (6) The right side can never be negative. But that single term (u 1 - u2) 2 could be zero! A sum of less than n squares will mean that an n by n matrix is only semidefinite. The indefinite example uTMu is a difference of squares (mixed signs): uTMu = Ui - 6u1u2 + u~ = (u1 - 3u2) 2 - 8u~ (square minus square). (7) Again the pivots 1 and -8 multiply the squares. Inside is the number €21 = -3 from elimination. The difference of squares is coming from M = LDLT, but the diagonal pivot matrix D is no longer all positive and M is indefinite: Indefinite M = [ 1 -3 ] = [ 1 ] [ 1 ] [ 1 -3 ] = LDLT. 
-3 1 -3 1 -8 1 (8) The next page moves to the matrix form uTATAu for a sum of squares. 1.6 Positive Definite Matrices 69 Positive Definiteness from ATA, ATCA, LDLT, and QAQT Now comes the key point. Those 2 by 2 examples suggest what happens for n by n positive definite matrices. K might come as AT times A, for some rectangular matrix A. Or elimination factors K into LDLT and D > 0 gives positive definiteness. Eigenvalues and eigenvectors factor K into QAQT, and the eigenvalue test is A> 0. The matrix theory only needs a few sentences. In linear algebra, "simple is good." K =ATA is symmetric positive definite if and only if A has independent columns. This means that the only solution to Au = 0 is the zero vector u = 0. If there are nonzero solutions to Au= 0, then ATA is positive semidefinite. We now show that uTKu 2:: 0, when K is ATA. Just move the parentheses! Basic trick for ATA (9) This is the length squared of Au. So ATA is at least semidefinite. When A has independent columns, Au = 0 only happens when u = 0. The only vector in the nullspace is the zero vector. For all other vectors uT (AT A)u = 11 Au II 2 is positive. So ATA is positive definite, using the energy-based definition uTKu > 0. Example 1 If A has more columns than rows, then those columns are not independent. With dependent columns ATA is only semidefinite. This example (the free-free matrix B 3 ) has three columns and two rows in A, so dependent columns: l [ l 3 columns of A add to zero 3 columns of AT A add to zero [ - 1 0 _ o1 1 [ -1 1 0 ] _ 0 -1 1 - _ 1 -12 _ o1 0 _1 1 • This is the semidefinite case. If Au = 0 then certainly ATAu = 0. The rank of ATA always equals the rank of A (its rank here is only r = 2). The energy uT Bu is (u2 - u1) 2 + (u3 - u2 )2, with only two squares but n = 3. It is a short step from ATA to positive definiteness of the triple products ATCA and LDLT and QAQT. The middle matrices C and D and A are easily included. The matrix K = ATCA is symmetric positive definite, provided A has independent columns and the middle matrix C is symmetric positive definite. To check positive energy in ATCA, use the same idea of moving the parentheses: Same trick (10) If u is not zero then Au is not zero (because A has independent columns). Then (Au?C(Au) is positive because C is positive definite. So uTKu> 0: positive definite. C = cT in the middle could be the pivot matrix Dor the eigenvalue matrix A. 70 Chapter 1 Applied Linear Algebra If a symmetric K has a full set of positive pivots, it is positive definite. Reason: The diagonal pivot matrix Din LDLT is positive definite. LT has independent columns (l's on the diagonal and invertible). This is the special case of ATCA with C = D and A= LT. Pivots in D multiply squares in £Tu to give uTKu: (11) The pivots are those factors a and c - ~. This is called "completing the square." If a symmetric K has all positive eigenvalues in A, it is positive definite. Reason: Use K = QAQT. The diagonal eigenvalue matrix A is positive definite. The orthogonal matrix is invertible (Q-1 is QT). Then the triple product QAQT is positive definite. The eigenvalues in A multiply the squares in QTu: (12) The eigenvalues are 3 and 1. The unit eigenvectors are (1, -1)/v'2 and (1, 1)/v'2. If there were negative pivots or negative eigenvalues, we would have a difference of squares. The matrix would be indefinite. Since Ku = >.u leads to uT Ku = >.uTu, positive energy uT Ku requires positive eigenvalues >.. 
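A small numerical illustration (a sketch of mine; the matrices A and C here are my own small choices) shows the energy identity u'(A'A)u = ||Au||^2 and the positive definiteness of A'CA:

A = [1 0; -1 1; 0 -1];           % 3 by 2 with independent columns
K = A'*A                         % gives the familiar [2 -1; -1 2]
u = randn(2,1);
[u'*K*u  norm(A*u)^2]            % the same positive number twice (zero only if u = 0)
C = diag([1 2 3]);               % a positive definite middle matrix
eig(A'*C*A)'                     % both eigenvalues positive: A'*C*A is positive definite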
Review and Summary A symmetric matrix K is positive definite if it passes any of these five tests (then it passes them all). I will apply each test to the 3 by 3 second difference matrix K = toeplitz([2 - 1 0]). 1. All pivots are positive J, ! K = LDLT with pivots 2, 2. Upper left determinants > 0 K has determinants 2, 3, 4 3. All eigenvalues are positive K = QAQT with >. = 2, 2 + y'2, 2 - y'2 4. uT Ku> 0 if u -=I= 0 uTKu= 2(u1 - ½u2) 2 + J(u2 - ~u3) 2 + !u32 5. K = ATA, indep. columns A can be the Cholesky factor chol(K) That Cholesky factorization chooses the square upper triangular A = ,./J5LT. The l command chol will fail unless K is positive definite, with positive pivots: Square [ 1.4142 [1.4142 -0.7071 l ATA=K K= -0.7071 1.2247 1.2247 -0.8165 A=chol(K) -0.8165 1.1547 1.1547 1.6 Positive Definite Matrices 71 Minimum Problems in n Dimensions Minimum problems appear everywhere in applied mathematics. Very often, ½uT Ku is the "internal energy" in the system. This energy should be positive, so K is naturally positive definite. The subject of optimization deals with minimization, to produce the best design or the most efficient schedule at the lowest cost. But the cost function P(u) is not a pure quadratic uT Ku with minimum at 0. A key step moves the minimum away from the origin by including a linear term -uTJ. When K = K 2 , the optimization problem is to minimize P(u): Total energy P(u) = ~uTKu - uT f = (u~ - u1u2 + u~) - uif1 - ud2. (13) The partial derivatives (gradient of P) with respect to u 1 and u2 must both be zero: = Calculus gives Ku f 8P/8u1 = 2u1 - u2 - Ji= 0 8P/8u2 = -u1 + 2u2 - h = 0 (14) In all cases the partial derivatives of P(u) are zero when Ku = f. This is truly a minimum point (the graph goes up) when K is positive definite. We substitute u = K-1f into P(u) to find the minimum value of P: P(u) is never below that value Pmin· For every P(u) the difference is ~ 0: P(u) - P(K-1J) = ½uTKu - uT f - (-½JTK-1J) = ½(u- K- 1J)TK(u- K- 1J) ~ 0. (16) The last result is never negative, because it has the form ½vT K v. That result is zero only when the vector v = u - K-1f is zero (which means u = K-1!). So at every point except u = K-1f, the value P(u) is above the minimum value Pmin· Shifted bowl 72 Chapter 1 Applied Linear Algebra Test for a Minimum: Positive Definite Second Derivatives Suppose P(u1, ... ,un) is not a quadratic function. Then its derivatives won't be linear functions. But to minimize P(u) we still look for points where all the first derivatives (the partial derivatives) are zero: 1st derivative vector is gradient 8P/8u (17) If those n first derivatives are all zero at the point u* = (uf, ... , u!), how do we know whether P(u) has a minimum (not a maximum or saddle point) at u*? To confirm a minimum we look at the second derivatives. Remember the rule for an ordinary function y(x) at a point where dy/dx = 0. This point is a minimum if d2y/dx2 > 0. The graph curves upward. Then-dimensional version of d2y/dx2 is the symmetric "Hessian" matrix H of second derivatives: 2nd derivative matrix (18) The Taylor series for P(u), when u is near u*, starts with these three terms (constant, linear from gradient, and quadratic from Hessian): Ta~lor series P(u) = P(u*) + (u* - u?{a)Pu (u*) + !(u* 2 - u)TH(u*)(u* - u) + · · · (19) Suppose the gradient vector 8P/ au of the first derivatives is zero at u *, as in (17). So the linear term is gone and the second derivatives are in control. 
If H is positive definite at u*, then (u* - u )TH(u* - u) carries the function upward as we leave u*. A positive definite H(u*) produces a minimum at u = u*. Our quadratic functions were P(u) = ½uTKu-uTf. The second derivative matrix was H = K, the same at every point. For non-quadratic functions, H changes from point to point, and we might have several local minima or local maxima. The decision depends on Hat every point u* where the first derivatives are zero. Here is an example with one local minimum at (0, 0), even though the overall minimum is -oo. The function includes a fourth power uf. Example 2 P(u) = 2ur + 3u~ - uf has zero derivatives at (uf, un = (0, 0). • • Second derivatives H = [ 02 P8I2aPu/2aauur1 82P0/82uP1I0aUu~2 ] = [ 4 - 12ur 0 0 ] 6 At the point (0, 0), H is certainly positive definite. So this is a local minimum. There are two other points where both first derivatives 4u1 - 4uf and 6u2 are zero. Those points are u* = (1, 0) and u* = (-1, 0). The second derivatives are -8 and 6 at both of those points, so H is indefinite. The graph of P(u) will look like a bowl around (0, 0), but (1, 0) and (-1, 0) are saddle points. MATLAB could draw y = P(u 1, u2 ). 1.6 Positive Definite Matrices 73 Newton's Method for Minimization This section may have seemed less "applied" than the rest of the book. Maybe so, but minimization is a problem with a million applications. And we need an algorithm to minimize P(u), especially when this function is not a quadratic. We have to expect an iterative method, starting from an initial guess u0 and improving it to u1 and u2 (approaching the true minimizer u* if the algorithm is successful). The natural idea is to use the first and second derivatives of P(u) at the current point. Suppose we have reached ui with coordinates ui, ... ,u~. We need a rule to choose the next point ui+l_ Close to ui, the function P(u) is approximated by cutting off the Taylor series, as in (19). Newton will minimize PcutotT(u). Pcutoff is a quadratic function. Instead of K it has the second derivative H. Both 8P/8u and Hare evaluated at the current point u = ui (this is the expensive part of the algorithm). The minimum of Pcutoff is the next guess ui+1. = Newton's method to solve 8P/8u 0 (21) For quadratics, one step gives the minimizer u 1 = K- 1f. Now 8P/au and H are changing as we move to u1 and ui and ui+1. If ui exactly hits u* (not too likely) then 8P/8u will be zero. So ui+1 - ui = 0 and we don't move away from perfection. Section 2.6 will return to this algorithm. We propose examples in the Problem Set below, and add one comment here. The full Newton step to ui+1 may be too bold, when the true minimizer u* is far away. The terms we cut off could be too large. In that case we shorten the Newton step ui+1 - ui, for safety, by a factor c < 1. Problem Set 1.6 1 Express uTTu as a combination of u~, u1u2 , and u~ for the free-fixed matrix T = [ 1 -1 -1 ] 2 • Write the answer as a sum of two squares to prove positive definiteness. 2 Express uTKu = 4u12 + 16u1u2 + 26ui as a sum of two squares. Then find chol(K) = /DLT. 74 Chapter 1 Applied Linear Algebra 3 A different A produces the circulant second-difference matrix C = ATA: gives How can you tell from A that C = ATA is only semidefinite? Which vectors solve Au= 0 and therefore Cu= 0? Note that chol(C) will fail. 4 Confirm that the circulant C = ATA above is semidefinite by the pivot test. Write uT Cu as a sum of two squares with the pivots as coefficients. 
(The eigenvalues 0, 3, 3 give another proof that C is semidefinite.) 5 uTCu 2: 0 means that ur + u~ + u~ 2: U1U2 + U2U3 + U3U1 for any U1' U2, U3. A more unusual way to check this is by the Schwarz inequality lvTwl ::; llvll llwll: lu1U2 + U2U3 + U3U1I ::; Jur + u~ + u~ Ju~+ u~ + ur, Which u's give equality? Check that uT Cu = 0 for those u. 6 For what range of numbers b is this matrix positive definite? !]· K=[~ There are two borderline values of b when K is only semidefinite. In those cases write uT Ku with only one square. Find the pivots if b = 5. 7 Is K = ATA or M = BTB positive definite (independent columns in A or B)? We know that uTMu= (Bu)T(Bu) = (u1+4u2)2 + (2u1+ 5u2)2 + (3u1+6u2 )2. Show how the three squares for uTKu= (Au)T(Au) collapse into one square. Problems 8-16 are about tests for positive definiteness. 8 Which of A1, A2 , A3 , A4 has two positive eigenvalues? Use the tests a > 0 and ac > b2 , don't compute the >. 's. Find a vector u so that uT A1u < 0. A3 = [ 101 10100] 9 For which numbers b and c are these matrices positive definite? and With the pivots in D and multiplier in L, factor each A into LDLT. 1.6 Positive Definite Matrices 75 10 Show that f(x, y) = x2 + 4xy + 3y2 does not have a minimum at (0, 0) even though it has positive coefficients. Write f as a difference of squares and find a point (x, y) where f is negative. 11 The function f(x, y) = 2xy certainly has a saddle point and not a minimum at (0, 0). What symmetric matrix S produces this f? What are its eigenvalues? 12 Test the columns of A to see if ATA will be positive definite in each case: [i ~] and A= 13 Find the 3 by 3 matrix S and its pivots, rank, eigenvalues, and determinant: 14 Which 3 by 3 symmetric matrices S produce these functions f = xT Sx? Why is the first matrix positive definite but not the second one? (a) f = 2(x~ + x~ + x~ - X1X2 - x2x3) = (b) f 2(x~ + x~ + x~ - X1X2 - X1X3 - X2X3). 15 For what numbers c and d are A and B positive definite? Test the three upper left determinants (1 by 1, 2 by 2, 3 by 3) of each matrix: A= [ C1 c1 1ll and 1 1 C 16 If A is positive definite then A-1 is positive definite. Best proof: The eigenvalues of A-1 are positive because __ . Second proof (only quick for 2 by 2): The entries of A-1 = ac _1 b2 [ -bc -ba] pass the determinant tests __ . 17 A positive definite matrix cannot have a zero (or even worse, a negative number) on its diagonal. Show that this matrix fails to have uT Au > 0. [ul U2 U3] [ 4~ ~1 ~ll 1 [u~: ] is not positive when (u1, u2, u3) = ( , , ). 18 A diagonal entry aii of a symmetric matrix cannot be smaller than all the >.'s. If it were, then A - ajjl would have __ eigenvalues and would be positive definite. But A - ajjl has a zero on the main diagonal. 76 Chapter 1 Applied Linear Algebra 19 If all>.> 0, show that uTKu > 0 for every u i- 0, not just the eigenvectors xi. Write u as a combination of eigenvectors. Why are all "cross terms" xrXj = 0? 20 Without multiplying A = [ c?s0 sm0 - sin0] [2 cos0 0 OJ [ c?s0 5 - sm0 sin0] find cos0 ' (a) the determinant of A (b) the eigenvalues of A (c) the eigenvectors of A (d) a reason why A is symmetric positive definite. 21 For fi(x, y) = ¼x4 +x2y+y2 and h(x, y) = x3 +xy-x find the second derivative (Hessian) matrices H1 and H2: H1 is positive definite so Ji is concave up(= convex). Find the minimum point of Ji and the saddle point of h (look where first derivatives are zero). 22 The graph of z = x 2 + y2 is a bowl opening upward. The graph of z = x 2 - y2 is a saddle. 
The graph of z = -x2 - y2 is a bowl opening downward. What is a test on a, b, c for z = ax2 + 2bxy + cy2 to have a saddle at (0, 0)? 23 Which values of c give a bowl and which give a saddle point for the graph of z = 4x2 + 12xy + cy2 ? Describe this graph at the borderline value of c. 24 Here is another way to work with the quadratic function P(u). Check that The last term -½JTK-1f is Pmin· The other (long) term on the right side is always __ . When u = K-1f, this long term is zero so P = Pmin. 25 Find the first derivatives inf= 8P/8u and the second derivatives in the matrix H for P(u) = ui+u~-c(ui+u~)4 . Start Newton's iteration (21) at u0 = (1,0). Which values of c give a next vector u1 that is closer to the local minimum at u* = (0, 0)? Why is (0, 0) not a global minimum? 26 Guess the smallest 2, 2 block that makes [c-1 A; AT __ ] semidefinite. ! 1] 27 If Hand Kare positive definite, explain why M = [ is positive definite 11] but N = [ is not. Connect the pivots and eigenvalues of M and N to the pivots and eigenvalues of H and K. How is chol(M) constructed from chol(H) and chol(K)? 1.6 Positive Definite Matrices 77 28 This "KKT matrix" has eigenvalues >.1 = 1, >.2 = 2, >.3 = -1: Saddle point. Put its unit eigenvectors inside the squares and >. = 1, 2, -1 outside: Verify Wi + w~ - 2uw1 + 2uw2 = 1( __ )2 + 2( __ )2-1( __ )2. The first parentheses contain (w1- w2)/ v2 from the eigenvector (1, -1, 0) / ../2. We are using QAQT instead of LDLT. Still two squares minus one square. 29 (Important) Find the three pivots of that indefinite KKT matrix. Verify that the product of pivots equals the product of eigenvalues (this also equals the determinant). Now put the pivots outside the squares: 1.7 NUMERICAL LINEAR ALGEBRA: LU, QR, SVD Applied mathematics starts from a problem and builds an equation to describe it. Scientific computing aims to solve that equation. Numerical linear algebra displays this "build up, break down" process in its clearest form, with matrix models: Ku=J or Kx=>.x or Mu"+Ku=O. Often the computations break K into simpler pieces. The properties of K are crucial: symmetric or not, banded or not, sparse or not, well conditioned or not. Numerical linear algebra can deal with a large class of matrices in a uniform way, without adjusting to every detail of the model. The algorithm becomes clearest when we see it as a factorization into triangular matrices or orthogonal matrices or very sparse matrices. We will summarize those factorizations quickly, for future use. This chapter began with the special matrices K, T, B, C and their properties. We needed something to work with! Now we pull together the factorizations you need for more general matrices. They lead to "norms" and "condition numbers" of any A. In my experience, applications of rectangular matrices constantly lead to AT and ATA. Three Essential Factorizations I will use the neutral letter A for the matrix we start with. It may be rectangular. If A has independent columns, then K = ATA is symmetric positive definite. Sometimes we operate directly with A (better conditioned and more sparse) and sometimes with K (symmetric and more beautiful). 
Here are the three essential factorizations, A= LU and A= QR and A= UEVT: (1) Elimination reduces A to U by row operations using multipliers in L: A = LU = lower triangular times upper triangular (2) Orthogonalization changes the columns of A to orthonormal columns in Q: A = QR = orthonormal columns times upper triangular (3) Singular Value Decomposition sees every A as (rotation)(stretch)(rotation): A = U:EVT = orthonormal columns X singular values X orthonormal rows As soon as I see that last line, I think of more to say. In the SVD, the orthonormal columns in U and V are the left and right singular vectors (eigenvectors of AAT and ATA). Then AV = UE is like the usual diagonalization AS = SA by eigenvectors, but with two matrices U and V. We only have U = V when AAT = ATA. 1.7 Numerical Linear Algebra: LU, QR, SVD 79 For a positive definite matrix K, everything comes together: U is Q and VT is QT. The diagonal matrix Eis A (singular values are eigenvalues). Then K = QAQT. The columns of Q are the principal axes = eigenvectors = singular vectors. Matrices with orthonormal columns play a central role in computations. Start there. Orthogonal Matrices The vectors q1, q2 , ... , qn are orthonormal if all their inner products are 0 or 1: q;qj = 0 if i-/- j (orthogonality) T . = 1 (normalization q, q, to unit vectors) (1) Those dot products are beautifully summarized by the matrix multiplication QTQ = /: If Q is square, we call it an orthogonal matrix. QTQ = I tells us immediately that o The inverse of an orthogonal matrix is its transpose: Q-1 = QT. o Multiplying a vector by Q doesn't change its length: IIQxll = llxll- Length (soon called norm) is preserved because 11Qxll 2 = xTQTQx = xTx = llxll 2. This doesn't require a square matrix: QTQ = I for rectangular matrices too. But a two-sided inverse Q-1 = QT (so that QQT is also I) does require that Q is square. Here are three quick examples of Q: permutations, rotations, reflections. Example 1 Every permutation matrix P has the same rows as I, but probably in a different order. P has a single 1 in every row and in every column. Multiplying Px puts the components of x in that row order. Reordering doesn't change the length. All n by n permutation matrices (there are n! of them) have p-1 = pT_ The l's in pT hit the l's in P to give pTP = I. Here is a 3 by 3 example of Px: l [ l ~ ~ ~ ~ ~ ~ pTp = [ = J (3) 010 100 Example 2 Rotation changes the direction of vectors. It doesn't change lengths. Every vector just turns: Rotation matrix in the 1-3 plane l ~ Q = [ co; 0 - st 0 sin0 0 cos0 Every orthogonal matrix Q with determinant 1 is a product of plane rotations. 80 Chapter 1 Applied Linear Algebra l Example 3 The reflection H takes every v to its image Hv on the other side of a plane mirror. The unit vector u (perpendicular to the mirror) is reversed into Hu= -u: Reflection matrix u = (cos0,O,sin0) - cos 20 0 - sin 20 H = I - 2uuT = [ 0 1 0 - sin 20 0 cos 20 (4) This "Householder reflection" has determinant -1. Both rotations and reflections have orthonormal columns, and (I - 2uuT)u = u - 2u guarantees that Hu= -u. Modern orthogonalization uses reflections to create the Q in A= QR. Orthogonalization A= QR We are given an m by n matrix A with linearly independent columns a 1, ... , an. Its rank is n. Those n columns are a basis for the column space of A, but not necessarily a good basis. All computations are improved by switching from the a, to orthonormal vectors q1, ... , qn. There are two important ways to go from A to Q. 1. 
The Gram-Schmidt algorithm gives a simple construction of the q's from the a's. First, q1 is the unit vector ai/lla1II- In reverse, a1 = r11q1 with r11 = llaill- Second, subtract from a2 its component in the q1 direction (the Gram-Schmidt idea). That vector B = a2 - (q!a2)q1 is orthogonal to q1. Normalize B to q2 = B/IIBII- At every step, subtract from ak its components in the settled directions q1, ... , qk-I, and normalize to find the next unit vector qk. Gram-Schmidt (m by n)(n by n) (5) 2. The Householder algorithm uses reflection matrices I - 2uuT. Column by column, it produces zeros in R. In this method, Q is square and R is rectangular: Householder qr(A) (m by m)(m by n) (6) The vector q3 comes for free! It is orthogonal to a1, a2 and also to q1, q2. This method is MATLAB's choice for qr because it is more stable than Gram-Schmidt and gives extra information. Since q3 multiplies the zero row, it has no effect on A= QR. Use qr(A, 0) to return to the "economy size" in (5). Section 2.3 will give full explanations and example codes for both methods. Most linear algebra courses emphasize Gram-Schmidt, which gives an orthonormal basis q1, ... , qr for the column space of A. Householder is now the method of choice, completing to an orthonormal basis q1, ... , qm for the whole space Rm. 1.7 Numerical Linear Algebra: LU, QR, SVD 81 Numerically, the great virtue of Q is its stability. When you multiply by Q, overflow and underflow will not happen. All formulas involving ATA become simpler, since QTQ = I. A square system Qx = b will be perfectly conditioned, because llxll = llbll and an error !lb produces an error Llx of the same size: gives and llllxll = llllbll- (7) Singular Value Decomposition This section now concentrates on the SVD, which reaches a diagonal matrix E. Since diagonalization involves eigenvalues, the matrices from A = QR will not do the job. Most square matrices A are diagonalized by their eigenvectors x1, ... , Xn, If xis a combination c1x1 + · · · + CnXn, then A multiplies each Xi by >.i. In matrix language this is Ax = SAS- 1x. Usually, the eigenvector matrix S is not orthogonal. Eigenvectors only meet at right angles when A is special (for example symmetric). If we want to diagonalize an ordinary A by orthogonal matrices, we need two different Q's. They are generally called U and V, so A = UI::VT. What is this diagonal matrix E? It now contains singular values ai instead of eigenvalues Ai. To understand those ai, the key is always the same: Look at ATA. Find V and I:: Removing UTU = I leaves V(ETE)VT. This is exactly like K = QAQT, but it applies to K = ATA. The diagonal matrix ETE contains the numbers a;, and those are the positive eigenvalues of ATA. The orthonormal eigenvectors of AT A are in V. In the end we want AV= UE. So we must choose Ui = Avdui, These ui are orthonormal eigenvectors of AAT. At this point we have the "reduced" SVD, with v1, ... , Vr and u1, ... , Ur as perfect bases for the column space and row space of A. The rank r is the dimension of these spaces, and svd(A, 0) gives this form: = A UmxrErxr V,.~n Reduced SVD = from ui Avdui [u, U1 ···Ur J[ lv! (9) VrT To complete the v's, add any orthonormal basis Vr+1, ... , Vn for the nullspace of A. To complete the u's, add any orthonormal basis Ur+1, ... , Um for the nullspace of AT. 
Singular Value Decomposition

This section now concentrates on the SVD, which reaches a diagonal matrix Σ. Since diagonalization involves eigenvalues, the matrices from A = QR will not do the job. Most square matrices A are diagonalized by their eigenvectors x1, ..., xn. If x is a combination c1x1 + ··· + cnxn, then A multiplies each xᵢ by λᵢ. In matrix language this is Ax = SΛS⁻¹x. Usually, the eigenvector matrix S is not orthogonal. Eigenvectors only meet at right angles when A is special (for example symmetric).

If we want to diagonalize an ordinary A by orthogonal matrices, we need two different Q's. They are generally called U and V, so A = UΣVᵀ. What is this diagonal matrix Σ? It now contains singular values σᵢ instead of eigenvalues λᵢ. To understand those σᵢ, the key is always the same: Look at AᵀA.

    Find V and Σ    AᵀA = (UΣVᵀ)ᵀ(UΣVᵀ) = V(ΣᵀΣ)Vᵀ    (8)

Removing UᵀU = I leaves V(ΣᵀΣ)Vᵀ. This is exactly like K = QΛQᵀ, but it applies to K = AᵀA. The diagonal matrix ΣᵀΣ contains the numbers σᵢ², and those are the positive eigenvalues of AᵀA. The orthonormal eigenvectors of AᵀA are in V.

In the end we want AV = UΣ. So we must choose uᵢ = Avᵢ/σᵢ. These uᵢ are orthonormal eigenvectors of AAᵀ. At this point we have the "reduced" SVD, with v1, ..., vr and u1, ..., ur as perfect bases for the column space and row space of A. The rank r is the dimension of these spaces, and svd(A, 0) gives this form:

    Reduced SVD    A = (Um×r)(Σr×r)(Vᵀr×n) = [u1 ··· ur] [σ1, ..., σr on the diagonal] [v1ᵀ; ...; vrᵀ]    from uᵢ = Avᵢ/σᵢ    (9)

To complete the v's, add any orthonormal basis vr+1, ..., vn for the nullspace of A. To complete the u's, add any orthonormal basis ur+1, ..., um for the nullspace of Aᵀ. To complete Σ to an m by n matrix, add zeros for svd(A) and the unreduced form:

    Full SVD    A = (Um×m)(Σm×n)(Vᵀn×n) = [u1 ··· ur ··· um] [σ1, ..., σr on the diagonal, zeros elsewhere] [v1ᵀ; ...; vrᵀ; ...; vnᵀ]    (10)

Normally we number the uᵢ, σᵢ, vᵢ so that σ1 ≥ σ2 ≥ ··· ≥ σr > 0. Then the SVD has the wonderful property of splitting any matrix A into rank-one pieces ordered by their size:

    A = UΣVᵀ = u1σ1v1ᵀ + u2σ2v2ᵀ + ··· + urσrvrᵀ.    (11)

The first piece u1σ1v1ᵀ is described by only m + n + 1 numbers, not mn. Often a few pieces contain almost all the information in A (in a stable form). This isn't a fast method for image compression because computing the SVD involves eigenvalues. (Filters are faster.) The SVD is the centerpiece of matrix approximation. The right and left singular vectors vᵢ and uᵢ are the Karhunen-Loève bases in engineering. A symmetric positive definite K has vᵢ = uᵢ: one basis.

I think of the SVD as the final step in the Fundamental Theorem of Linear Algebra. First come the dimensions of the four subspaces. Then their orthogonality. Then the orthonormal bases u1, ..., um and v1, ..., vn which diagonalize A:

    SVD    Avⱼ = σⱼuⱼ for j ≤ r    Avⱼ = 0 for j > r
           Aᵀuⱼ = σⱼvⱼ for j ≤ r    Aᵀuⱼ = 0 for j > r    (12)

Figure 1.18: U and V are rotations and reflections. Σ stretches by σ1, ..., σr.

These uᵢ = Avᵢ/σᵢ are orthonormal eigenvectors of AAᵀ. Start from AᵀAvᵢ = σᵢ²vᵢ:

    Multiply by vᵢᵀ:   vᵢᵀAᵀAvᵢ = σᵢ²vᵢᵀvᵢ   says that ||Avᵢ|| = σᵢ, so ||uᵢ|| = 1
    Multiply by vⱼᵀ:   vⱼᵀAᵀAvᵢ = σᵢ²vⱼᵀvᵢ   says that (Avⱼ) · (Avᵢ) = 0, so uⱼᵀuᵢ = 0
    Multiply by A:    AAᵀAvᵢ = σᵢ²Avᵢ    says that AAᵀuᵢ = σᵢ²uᵢ

Here is a homemade code to create the SVD. It follows the steps above, based primarily on eig(A'*A). The faster and more stable codes in LAPACK work directly with A. Ultimately, stability may require that very small singular values are replaced by σ = 0. The SVD identifies the dangers in Ax = b (near 0 in A, very large in x).

% input A, output orthogonal U, V and diagonal sigma with A = U*sigma*V'
[m,n] = size(A); r = rank(A);
[V,squares] = eig(A'*A);                     % n by n matrices
sing = sqrt(squares(1:r,1:r));               % r by r, singular values > 0 on diagonal
sigma = zeros(m,n); sigma(1:r,1:r) = sing;   % m by n singular value matrix
u = A*V(:,1:r)*inv(sing);                    % first r columns of U (singular vectors)
[U,R] = qr(u); U(:,1:r) = u;                 % qr command completes u to an m by m U
A - U*sigma*V';                              % test for zero m by n matrix (could print its norm)

Example 4  Find the SVD of the singular matrix A = [1 1; 7 7].

Solution  A has rank one, so there is one singular value. First comes AᵀA:

    AᵀA = [50 50; 50 50] has λ = 100 and 0, with eigenvectors [v1 v2] = (1/√2)[1 −1; 1 1].

The singular value is σ1 = √100 = 10. Then u1 = Av1/10 = (1, 7)/√50. Add in u2 = (−7, 1)/√50:

    A = UΣVᵀ = (1/√50)[1 −7; 7 1] [10 0; 0 0] (1/√2)[1 1; −1 1].

Example 5  Find the SVD of the n+1 by n backward difference matrix Δ₋.

Solution  With diagonal 1's and subdiagonal −1's in Δ₋, the products Δ₋ᵀΔ₋ and Δ₋Δ₋ᵀ are Kn and Bn+1. When (n+1)h = π, Kn has eigenvalues λ = σ² = 2 − 2 cos kh and eigenvectors vk = (sin kh, ..., sin nkh). Bn+1 has the same eigenvalues (plus λn+1 = 0) and its eigenvectors are uk = (cos ½kh, ..., cos(n+½)kh) in U. Those eigenvectors vk and uk fill the DST and DCT matrices. Normalized to unit length, these are the columns of V and U. The SVD is Δ₋ = (DCT)Σ(DST). The equation Δ₋vk = σkuk says that the first differences of sine vectors are cosine vectors.

Section 1.8 will apply the SVD to Principal Component Analysis and to Model Reduction. The goal is to find a small part of the data and the model (starting with u1 and v1) that carries the important information.
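One caution about the homemade code above, as a hedged sketch (not one of the book's cse codes): eig(A'*A) does not promise any ordering of its eigenvalues, so it is safer to sort the positive ones to the front before taking the first r columns. Tested on the matrix of Example 4:

% sort the eigenvalues of A'*A in decreasing order before building the SVD
A = [1 1; 7 7];                           % the rank-one matrix of Example 4
[m,n] = size(A); r = rank(A);
[V,D] = eig(A'*A);
[lambda,idx] = sort(diag(D),'descend');   % largest eigenvalues first
V = V(:,idx);                             % reorder the eigenvectors to match
sing = diag(sqrt(lambda(1:r)));           % r by r diagonal of singular values
sigma = zeros(m,n); sigma(1:r,1:r) = sing;
u = A*V(:,1:r)/sing;                      % u_i = A*v_i/sigma_i
[U,R] = qr(u); U(:,1:r) = u;              % complete u to a square orthogonal U
norm(A - U*sigma*V')                      % near 0
svd(A)'                                   % MATLAB's singular values: 10 and 0

The built-in svd(A) remains the stable choice; this sketch only mirrors the eigenvalue route described above.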
The Pseudoinverse

By choosing good bases, A multiplies vᵢ in the row space to give σᵢuᵢ in the column space. A⁻¹ must do the opposite! If Av = σu then A⁻¹u = v/σ. The singular values of A⁻¹ are 1/σ, just as the eigenvalues of A⁻¹ are 1/λ. The bases are reversed. The u's are in the row space of A⁻¹, the v's are in the column space.

Until this moment we would have added "if A⁻¹ exists." Now we don't. A matrix that multiplies uᵢ to produce vᵢ/σᵢ does exist. It is the pseudoinverse A⁺ = pinv(A). The vectors u1, ..., ur in the column space of A go back to the row space. The other vectors ur+1, ..., um are sent to zero. When we know what happens to each basis vector uᵢ, we know A⁺. The pseudoinverse has the same rank r as A.

In the pseudoinverse Σ⁺ of the diagonal matrix Σ, each σ is replaced by σ⁻¹. The product Σ⁺Σ is as near to the identity as we can get. So are AA⁺ and A⁺A:

    AA⁺ = projection matrix onto the column space of A
    A⁺A = projection matrix onto the row space of A

Example 6  Find the pseudoinverse A⁺ of the same rank-one matrix A = [1 1; 7 7].

Solution  Since A has σ1 = 10, the pseudoinverse A⁺ = pinv(A) has 1/10:

    A⁺ = VΣ⁺Uᵀ = (1/√2)[1 −1; 1 1] [1/10 0; 0 0] (1/√50)[1 7; −7 1] = (1/100)[1 7; 1 7].

The pseudoinverse of a rank-one matrix A = σuvᵀ is A⁺ = vuᵀ/σ, also rank-one. Always A⁺b is in the row space of A (a combination of the basis v1, ..., vr). With n > m, Ax = b is solvable when b is in the column space of A. Then A⁺b is the shortest solution because it has no nullspace component, while A\b is a different "sparse solution" with n − m zero components.
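Example 6 and the two projection properties are easy to confirm with MATLAB's built-in pinv. This is a quick sketch, not one of the book's codes:

A = [1 1; 7 7];
Aplus = pinv(A)              % should equal (1/100)*[1 7; 1 7]
Pcol = A*Aplus;              % projection onto the column space of A
Prow = Aplus*A;              % projection onto the row space of A
norm(Pcol*Pcol - Pcol)       % near 0: a projection satisfies P^2 = P
norm(Prow*Prow - Prow)       % near 0
b = [1; 7];                  % b lies in the column space of A
x = Aplus*b                  % the shortest solution of Ax = b, here (.5, .5)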
Condition Numbers and Norms

The condition number of a positive definite matrix is c(K) = λmax/λmin. This ratio measures the "sensitivity" of the linear system Ku = f. Suppose f changes by Δf because of roundoff or measurement error. Our goal is to estimate Δu (the change in the solution). If we are serious about scientific computing, we have to control errors.

Subtract Ku = f from K(u + Δu) = f + Δf. The error equation is K(Δu) = Δf. Since K is positive definite, λmin gives a reliable bound on Δu:

    Error bound    K(Δu) = Δf means Δu = K⁻¹(Δf). Then ||Δu|| ≤ ||Δf||/λmin(K).    (14)

The top eigenvalue of K⁻¹ is 1/λmin(K). Then Δu is largest in the direction of that eigenvector. The eigenvalue λmin indicates how close K is to a singular matrix (but eigenvalues are not reliable for an unsymmetric matrix A).

That single number λmin has two serious drawbacks in measuring the sensitivity of Ku = f or Ax = b. First, if we multiply K by 1000, then u and Δu are divided by 1000. That rescaling (to make K less singular and λmin larger) cannot change the reality of the problem. The relative error ||Δu||/||u|| stays the same, since 1000/1000 = 1. It is the relative changes in u and f that we should compare. Here is the key for positive definite K:

    Dividing ||Δu|| ≤ ||Δf||/λmin(K) by ||u|| ≥ ||f||/λmax(K) gives ||Δu||/||u|| ≤ (λmax(K)/λmin(K)) ||Δf||/||f||.

In words: Δu is largest when Δf is an eigenvector for λmin. The true solution u is smallest when f is an eigenvector for λmax. The ratio λmax/λmin produces the condition number c(K), the maximum "blowup factor" in the relative error.

    Condition number for positive definite K    c(K) = λmax(K)/λmin(K)

When A is not symmetric, the inequality ||Ax|| ≤ λmax(A)||x|| can be false (see Figure 1.19). Other vectors can blow up more than eigenvectors. A triangular matrix with 1's on the diagonal might look perfectly conditioned, since λmax = λmin = 1. We need a norm ||A|| to measure the size of every A, and λmax won't work.

DEFINITIONS  The norm ||A|| is the maximum of the ratio ||Ax||/||x||. The condition number of A is ||A|| times ||A⁻¹||:

    Norm    ||A|| = max over x ≠ 0 of ||Ax||/||x||        Condition number    c(A) = ||A|| ||A⁻¹||    (15)

Figure 1.19: The norms of A and A⁻¹ come from the longest and shortest Ax. For the matrix A = [1 1; 0 1] the figure shows the ellipse of all Ax, with AᵀA = [1 1; 1 2], det AᵀA = 1, ||A||² = λmax(AᵀA) ≈ 2.6, 1/||A⁻¹||² = λmin(AᵀA) ≈ 1/2.6, ||A|| = (1 + √5)/2, and c(A) = ||A|| ||A⁻¹|| ≈ 2.6.

||Ax||/||x|| is never larger than ||A|| (its maximum), so always ||Ax|| ≤ ||A|| ||x||. For all matrices and vectors, the number ||A|| meets these requirements:

    ||Ax|| ≤ ||A|| ||x||    and    ||AB|| ≤ ||A|| ||B||    and    ||A + B|| ≤ ||A|| + ||B||.    (16)

The norm of 1000A will be 1000||A||. But 1000A has the same condition number as A.

For a positive definite matrix, the largest eigenvalue is the norm: ||K|| = λmax(K). Reason: The orthogonal matrices in K = QΛQᵀ leave lengths unchanged. So ||K|| = ||Λ|| = λmax. Similarly ||K⁻¹|| = 1/λmin(K). Then c(K) = λmax/λmin is correct.

A very unsymmetric example has λmax = 0, but the norm is ||A|| = 2:

    A = [0 2; 0 0] has Ax = [2x2; 0], and the ratio is ||Ax||/||x|| = 2|x2|/√(x1² + x2²) ≤ 2.

This unsymmetric A leads to the symmetric AᵀA = [0 0; 0 4]. The largest eigenvalue is σ1² = 4. Its square root is the norm: ||A|| = 2 = largest singular value. This singular value √λmax(AᵀA) is generally larger than λmax(A). Here is the great formula for ||A||², all on one line:

    Norm    ||A||² = max ||Ax||²/||x||² = max xᵀAᵀAx/xᵀx = λmax(AᵀA) = σmax².    (17)

The norm of A⁻¹ is 1/σmin, generally larger than 1/λmin. The product is c(A):

    Condition number    c(A) = ||A|| ||A⁻¹|| = σmax/σmin.    (18)

Here is one comment: σmin tells us the distance from an invertible A to the nearest singular matrix. When σmin changes to zero inside Σ, it is multiplied by U and Vᵀ (orthogonal, preserving norms). So the norm of that smallest change in A is σmin.

Example 7  For this 2 by 2 matrix A, the inverse just changes 7 to −7. Notice that 7² + 1² = 50. The condition number c(A) = ||A|| ||A⁻¹|| is at least √50 · √50 = 50:

    Ax = [1 0; 7 1] [1; 0] = [1; 7]    has ||Ax||/||x|| = √50/1,    so ||A|| ≥ √50
    A⁻¹x = [1 0; −7 1] [1; 0] = [1; −7]    has ||A⁻¹x||/||x|| = √50/1,    so ||A⁻¹|| ≥ √50

Suppose we intend to solve Ax = b = [1; 7]. The solution is x = [1; 0]. Move the right side by Δb = [.1; 0]. Then x moves by Δx = [.1; −.7], since A(Δx) = Δb. The relative change in x is 50 times the relative change in b:

    ||Δx||/||x|| = (.1)√50    is 50 times greater than    ||Δb||/||b|| = (.1)/√50.

Example 8  The eigenvalues of the −1, 2, −1 matrix Kn are λ = 2 − 2 cos(kπ/(n+1)). Then k = 1 and k = n give λmin and λmax. The condition number of Kn grows like n²: λmax is nearly 2 − 2 cos π = 4, at the top of Figure 1.13. The smallest eigenvalue uses cos θ ≈ 1 − ½θ² from calculus, which is the same as 2 − 2 cos θ ≈ θ² = (π/(n+1))². Then

    c(Kn) = λmax(Kn)/λmin(Kn) ≈ 4/(π/(n+1))² ≈ 4n²/π².    (19)

A rough rule for Ax = b is that the computer loses about log c decimals to roundoff error. MATLAB gives a warning when the condition number is large (c is not calculated exactly, the eigenvalues of AᵀA would take too long). It is normal for c(K) to be of order 1/(Δx)² in approximating a second-order differential equation, agreeing with n² in (19). Fourth-order problems have λmax/λmin ≈ C/(Δx)⁴.
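Example 7 is easy to reproduce, since MATLAB's norm(A) and cond(A) are computed from singular values exactly as in (17)-(18). A quick sketch (not one of the book's codes):

A = [1 0; 7 1];
norm(A), norm(inv(A))        % both a little above sqrt(50)
cond(A)                      % sigma_max/sigma_min, just above 50
b = [1; 7];  x = A\b;        % x = (1, 0)
db = [.1; 0]; dx = A\db;     % dx = (.1, -.7)
(norm(dx)/norm(x)) / (norm(db)/norm(b))   % relative blowup factor: 50

The blowup factor 50 is slightly below cond(A) because x and Δb are not exactly the extreme singular vectors.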
Row Exchanges in PA = LU

Our problems might be ill conditioned or well conditioned. We can't necessarily control c(A), but we don't want to make the condition worse by a bad algorithm. Since elimination is the most frequently used algorithm in scientific computing, a lot of effort has been concentrated on doing it right. Often we reorder the rows of A.

The main point is that small pivots are dangerous. To find the numbers that multiply rows, we divide by the pivots. Small pivots mean large multipliers in L. Then L (and probably U) are more ill-conditioned than A. The simplest cure is to exchange rows by P, bringing the largest possible entry up into the pivot. The command lu(A) does this "partial pivoting" for A = [1 2; 3 3]. The first pivot changes from 1 to 3. Partial pivoting avoids multipliers in L larger than 1:

    [L, U, P] = lu(A)    P = [0 1; 1 0]    L = [1 0; 1/3 1]    U = [3 3; 0 1]    so that PA = LU.

The product of pivots is −det A = +3, since P exchanged the rows of A.

A positive definite matrix K has no need for row exchanges. Its factorization into K = LDLᵀ can be rewritten as K = (L√D)(√D Lᵀ), named after Cholesky. In this form we are seeing K = AᵀA with A = √D Lᵀ. Then we know from (17) that λmax(K) = ||K|| = (σmax(A))² and λmin(K) = (σmin(A))². Elimination to A = chol(K) does absolutely no harm to the condition number of a positive definite K = AᵀA:

    A = chol(K)    c(K) = λmax(K)/λmin(K) = (σmax(A)/σmin(A))² = (c(A))².    (20)

Usually elimination into PA = LU makes c(L)c(U) larger than the original c(A). That price is often remarkably low, a fact that we don't fully understand.
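Here is a short MATLAB sketch (an assumed example, not from the book's codes) of the two factorizations just discussed. MATLAB's chol returns an upper triangular R with K = RᵀR, so R plays the role of A = √D Lᵀ in (20):

A = [1 2; 3 3];
[L, U, P] = lu(A)            % partial pivoting: P*A = L*U, first pivot is 3
prod(diag(U))                % product of pivots = +3 = -det(A)
K = [2 -1; -1 2];            % a positive definite example
R = chol(K);                 % K = R'*R, no row exchanges needed
cond(K) - cond(R)^2          % near 0: equation (20) says c(K) = c(R)^2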
The next chapters build models for important applications. Discrete problems lead to matrices A and Aᵀ and AᵀA in Chapter 2. A differential equation produces many discrete equations, as we choose finite differences or finite elements or spectral methods or a Fourier transform, or any other option in scientific computing. All these options replace calculus, one way or another, by linear algebra.

Problem Set 1.7

Problems 1-5 are about orthogonal matrices with QᵀQ = I.

1  Are these pairs of vectors orthonormal or only orthogonal or only independent?

   (c) (cos θ, sin θ) and (−sin θ, cos θ).

   Change the second vector when necessary to produce orthonormal vectors.

2  Give an example of each of the following:

   (a) A matrix Q that has orthonormal columns but QQᵀ ≠ I.
   (b) Two orthogonal vectors that are not linearly independent.
   (c) An orthonormal basis for R⁴, where every component is ½ or −½.

3  If Q1 and Q2 are orthogonal matrices, show that their product Q1Q2 is also an orthogonal matrix. (Use QᵀQ = I.)

4  Orthonormal vectors are automatically linearly independent. Two proofs:

   (a) Vector proof: When c1q1 + c2q2 + c3q3 = 0, what dot product leads to c1 = 0? Similarly c2 = 0 and c3 = 0. Thus the q's are independent.
   (b) Matrix proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular, you can use Qᵀ but not Q⁻¹.

5  If a1, a2, a3 is a basis for R³, any vector b can be written as b = x1a1 + x2a2 + x3a3, or Ax = b with A = [a1 a2 a3].

   (a) Suppose the a's are orthonormal. Show that x1 = a1ᵀb.
   (b) Suppose the a's are orthogonal. Show that x1 = a1ᵀb/a1ᵀa1.
   (c) If the a's are independent, x1 is the first component of ___ times b.

Problems 6-14 and 31 are about norms and condition numbers.

6  Figure 1.18 displays any matrix A as rotation times stretching times rotation:

   A = UΣVᵀ = [cos α  −sin α; sin α  cos α] [σ1 0; 0 σ2] [cos θ  sin θ; −sin θ  cos θ]    (21)

   The count of four parameters α, σ1, σ2, θ agrees with the count of four entries a11, a12, a21, a22. When A is symmetric and a12 = a21, the count drops to three because α = θ and we only need one Q. The determinant of A in (21) is σ1σ2. For det A < 0, add a reflection. In Figure 1.19, verify λmax(AᵀA) = ½(3 + √5) and its square root ||A|| = ½(1 + √5).

7  Find by hand the norms λmax and condition numbers λmax/λmin of these positive definite matrices:

8  Compute the norms and condition numbers from the square roots of λ(AᵀA):

9  Explain these two inequalities from the definitions of the norms ||A|| and ||B||:

   ||ABx|| ≤ ||A|| ||Bx|| ≤ ||A|| ||B|| ||x||.

   From the ratio that gives ||AB||, deduce that ||AB|| ≤ ||A|| ||B||. This fact is the key to using matrix norms.

10  Use ||AB|| ≤ ||A|| ||B|| to prove that the condition number of any matrix A is at least 1. Show that an orthogonal Q has c(Q) = 1.

11  If λ is any eigenvalue of A, explain why |λ| ≤ ||A||. Start from Ax = λx.

12  The "spectral radius" ρ(A) = |λmax| is the largest absolute value of the eigenvalues. Show with 2 by 2 examples that ρ(A + B) ≤ ρ(A) + ρ(B) and ρ(AB) ≤ ρ(A)ρ(B) can both be false. The spectral radius is not acceptable as a norm.

13  Estimate the condition number of the ill-conditioned matrix A = [1 1; 1 1.0001].

14  The "ℓ¹ norm" and the "ℓ∞ norm" of x = (x1, ..., xn) are

    ||x||₁ = |x1| + ··· + |xn|    and    ||x||∞ = max |xᵢ|.

    Compute the norms ||x|| and ||x||₁ and ||x||∞ of these two vectors in R⁵:

    x = (1, 1, 1, 1, 1)    and    x = (.1, .7, .3, .4, .5).