ANALYSIS ON Analysis on Manifolds James R. Munkres Massachusetts Institute of Technology Cambridge, Massachusetts ADDISON-WESLEY PUBLISHING COMPANY The Advanced Book Program Redwood City, California • Menlo Park, California • Reading, Massachusetts New York • Don Mills, Ontario • Wokingham, United Kingdom • Amsterdam Bonn• Sydney •Singapore• Tokyo• Madrid• San Juan Publisher: Allan M. Wylde Production Manager: Jan V. Benes Marketing Manager: Laura Likely Electronic Composition: Peter Vacek Cover Design: Iva Frank Library of Congress Cataloging-in-Publication Data Munkres, James R., 1930- Analysis on manifolds/James R. Munkres. p. cm. Includes bibliographical references. 1. Mathematical analysis. 2. Manifolds (Mathematics) QA300.M75 1990 516.3'6'20-dc20 91-39786 ISBN 0-201-51035-9 CIP This book was prepared using the '!EX typesetting language. Copyright ©1991 by Addison-Wesley Publishing Company, The Advanced Book Program, 350 Bridge Parkway, Suite 209, Redwood City, CA 94065 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada. ABCDEFGHIJ-MA-943210 Preface This book is intended as a text for a course in analysis, at the senior or first-year graduate level. A year-long course in real analysis is an essential part of the preparation of any potential mathematician. For the first half of such a course, there is substantial agreement as to what the syllabus should be. Standard topics include: sequence and series, the topology of metric spaces, and the derivative and the Riemannian integral for functions of a single variable. There are a number of excellent texts for such a course, including books by Apostol [A], Rudin [Ru], Goldberg [Go], and Royden (Ro], among others. There is no such universal agreement as to what the syllabus of the second half of such a course should be. Part of the problem is that there are simply too many topics that belong in such a course for one to be able to treat them all within the confines of a single semester, at more than a superficial level. At M.I.T., we have dealt with the problem by offering two independent second-term courses in analysis. One of these deals with the derivative and the Riemannian integral for functions of several variables, followed by a treatment of differential forms and a proof of Stokes' theorem for manifolds in euclidean space. The present book has resulted from my years of teaching this course~ The other deals with the Lebesgue integral in euclidean space and its applications to Fourier analysis. Prequisites As indicated, we assume the reader has completed a one-term course in analysis that included a study of metric spaces and of functions of a single variable. We also assume the reader has some background in linear algebra, including vector spaces and linear transformations, matrix algebra, and determinants. The first chapter of the book is devoted to reviewing the basic results from linear algebra and analysis that we shall need. Results that are truly basic are V vi Preface stated without proof, but proofs are provided for those that are sometimes omitted in a first course. The student may determine from a perusal of this chapter whether his or her background is sufficient for the rest of the book. 
How much time the instructor will wish to spend on this chapter will depend on the experience and preparation of the students. I usually assign Sections 1 and 3 as reading material, and discuss the remainder in class. How the book is organized The main part of the book falls into two parts. The first, consisting of Chapter 2 through 4, covers material that is fairly standard: derivatives, the inverse function theorem, the Riemann integral, and the change of variables theorem for multiple integrals. The second part of the book is a bit more sophisticated. It introduces manifolds and differential forms in Rn, providing the framework for proofs of the n-dimensional version of Stokes' theorem and of the Poincare lemma. A final chapter is devoted to a discussion of abstract manifolds; it is intended as a transition to more advanced texts on the subject. The dependence among the chapters of the book is expressed in the following diagram: Chapter 1 Chapter 2 Chapter 3 The Algebra and Topology of Rn ! Differentiation ! Integration Chapter 4 ChLge of Variables Chapter 5 Mlifolds Chapter 7 Chapter 6 Differential Forms I Stokes' Theorem Chapter 8 Closed Forms and Exact Forms Chapter 9 Epilogue-Life Outside nn Preface VII Certain sections of the books are marked with an asterisk; these sections may be omitted without loss of continuity. Similarly, certain theorems that may be omitted are marked with asterisks. When I use the book in our undergraduate analysis sequence, I usually omit Chapter 8, and assign Chapter 9 as reading. With graduate students, it should be possible to cover the entire book. At the end of each section is a set of exercises. Some are computational in nature; students find it illuminating to know that one can compute the volume of a five-dimensional ball, even if the practical applications are limited! Other exercises are theoretical in nature, requiring that the student analyze carefully the theorems and proofs of the preceding section. The more difficult exercises are marked with asterisks, but none is unreasonably hard. Acknowledgements Two pioneering works in this subject demonstrated that such topics as manifolds and differential forms could be discussed with undergraduates. One is the set of notes used at Princeton c. 1960, written by Nickerson, Spencer, and Steenrod [N-S-S]. The second is the book by Spivak [S]. Our indebtedness to these sources is obvious. A more recent book on these topics is the one by Guillemin and Pollack [G-P]. A number of texts treat this material at a more advanced level. They include books by Boothby [B], Abraham, Mardsen, and Raitu [A-M-R], Berger and Gostiaux [B-G], and Fleming [F]. Any of them would be suitable reading for the student who wishes to pursue these topics further. I am indebted to Sigurdur Helgason and Andrew Browder for helpful comments. To Ms. Viola Wiley go my thanks for typing the original set of lecture notes on which the book is based. Finally, thanks is due to my students at M.I.T., who endured my struggles with this material, as I tried to learn how to make it understandable (and palatable) to them! J.R.M. Contents PREFACE V CHAPTER 1 The Algebra and Topology of Rn 1 §1. Review of Linear Algebra 1 §2. Matrix Inversion and Determinants 11 §3. Review of Topology in Rn 25 §4. Compact Subspaces and Connected Subspaces of Rn 32 CHAPTER 2 Differentiation §5. Derivative 41 §6. Continuously Differentiable Functions 49 §7. The Chain Rule 56 §8. The Inverse Function Theorem 63 *§9. 
The Implicit Function Theorem 71

CHAPTER 3 Integration 81
§10. The Integral over a Rectangle 81
§11. Existence of the Integral 91
§12. Evaluation of the Integral 98
§13. The Integral over a Bounded Set 104
§14. Rectifiable Sets 112
§15. Improper Integrals 121

CHAPTER 4 Change of Variables 135
§16. Partitions of Unity 136
§17. The Change of Variables Theorem 144
§18. Diffeomorphisms in Rn 152
§19. Proof of the Change of Variables Theorem 160
§20. Applications of Change of Variables 169

CHAPTER 5 Manifolds 179
§21. The Volume of a Parallelepiped 178
§22. The Volume of a Parametrized-Manifold 186
§23. Manifolds in Rn 194
§24. The Boundary of a Manifold 201
§25. Integrating a Scalar Function over a Manifold 207

CHAPTER 6 Differential Forms 219
§26. Multilinear Algebra 220
§27. Alternating Tensors 226
§28. The Wedge Product 236
§29. Tangent Vectors and Differential Forms 244
§30. The Differential Operator 252
*§31. Application to Vector and Scalar Fields 262
§32. The Action of a Differentiable Map 267

CHAPTER 7 Stokes' Theorem 275
§33. Integrating Forms over Parametrized-Manifolds 275
§34. Orientable Manifolds 281
§35. Integrating Forms over Oriented Manifolds 293
*§36. A Geometric Interpretation of Forms and Integrals 297
§37. The Generalized Stokes' Theorem 301
*§38. Applications to Vector Analysis 310

CHAPTER 8 Closed Forms and Exact Forms 323
§39. The Poincare Lemma 324
§40. The deRham Groups of Punctured Euclidean Space 334

CHAPTER 9 Epilogue-Life Outside Rn 345
§41. Differentiable Manifolds and Riemannian Manifolds 345

BIBLIOGRAPHY 359

Analysis on Manifolds

The Algebra and Topology of Rn

§1. REVIEW OF LINEAR ALGEBRA

Vector spaces

Suppose one is given a set V of objects, called vectors. And suppose there is given an operation called vector addition, such that the sum of the vectors x and y is a vector denoted x + y. Finally, suppose there is given an operation called scalar multiplication, such that the product of the scalar (i.e., real number) c and the vector x is a vector denoted cx. The set V, together with these two operations, is called a vector space (or linear space) if the following properties hold for all vectors x, y, z and all scalars c, d:
(1) x + y = y + x.
(2) x + (y + z) = (x + y) + z.
(3) There is a unique vector 0 such that x + 0 = x for all x.
(4) x + (−1)x = 0.
(5) 1x = x.
(6) c(dx) = (cd)x.
(7) (c + d)x = cx + dx.
(8) c(x + y) = cx + cy.

One example of a vector space is the set Rn of all n-tuples of real numbers, with component-wise addition and multiplication by scalars. That is, if x = (x1, ..., xn) and y = (y1, ..., yn), then

x + y = (x1 + y1, ..., xn + yn)   and   cx = (cx1, ..., cxn).

The vector space properties are easy to check.

If V is a vector space, then a subset W of V is called a linear subspace (or simply, a subspace) of V if for every pair x, y of elements of W and every scalar c, the vectors x + y and cx belong to W. In this case, W itself satisfies properties (1)-(8) if we use the operations that W inherits from V, so that W is a vector space in its own right.

In the first part of this book, Rn and its subspaces are the only vector spaces with which we shall be concerned. In later chapters we shall deal with more general vector spaces.

Let V be a vector space. A set a1, ..., am of vectors in V is said to span V if to each x in V, there corresponds at least one m-tuple of scalars c1, ..., cm such that
x = c1a1 + · · · + cmam.

In this case, we say that x can be written as a linear combination of the vectors a1, ..., am.

The set a1, ..., am of vectors is said to be independent if to each x in V there corresponds at most one m-tuple of scalars c1, ..., cm such that

x = c1a1 + · · · + cmam.

Equivalently, {a1, ..., am} is independent if to the zero vector 0 there corresponds only one m-tuple of scalars d1, ..., dm such that

0 = d1a1 + · · · + dmam,

namely the scalars d1 = d2 = · · · = dm = 0.

If the set of vectors a1, ..., am both spans V and is independent, it is said to be a basis for V.

One has the following result:

Theorem 1.1. Suppose V has a basis consisting of m vectors. Then any set of vectors that spans V has at least m vectors, and any set of vectors of V that is independent has at most m vectors. In particular, any basis for V has exactly m vectors. □

If V has a basis consisting of m vectors, we say that m is the dimension of V. We make the convention that the vector space consisting of the zero vector alone has dimension zero.

It is easy to see that Rn has dimension n. (Surprise!) The following set of vectors is called the standard basis for Rn:

e1 = (1, 0, 0, ..., 0),
e2 = (0, 1, 0, ..., 0),
   ...
en = (0, 0, 0, ..., 1).

The vector space Rn has many other bases, but any basis for Rn must consist of precisely n vectors.

One can extend the definitions of spanning, independence, and basis to allow for infinite sets of vectors; then it is possible for a vector space to have an infinite basis. (See the exercises.) However, we shall not be concerned with this situation.

Because Rn has a finite basis, so does every subspace of Rn. This fact is a consequence of the following theorem:

Theorem 1.2. Let V be a vector space of dimension m. If W is a linear subspace of V (different from V), then W has dimension less than m. Furthermore, any basis a1, ..., ak for W may be extended to a basis a1, ..., ak, ak+1, ..., am for V. □

Inner products

If V is a vector space, an inner product on V is a function assigning, to each pair x, y of vectors of V, a real number denoted ⟨x, y⟩, such that the following properties hold for all x, y, z in V and all scalars c:
(1) ⟨x, y⟩ = ⟨y, x⟩.
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(3) ⟨cx, y⟩ = c⟨x, y⟩ = ⟨x, cy⟩.
(4) ⟨x, x⟩ > 0 if x ≠ 0.

A vector space V together with an inner product on V is called an inner product space.

A given vector space may have many different inner products. One particularly useful inner product on Rn is defined as follows: If x = (x1, ..., xn) and y = (y1, ..., yn), we define

⟨x, y⟩ = x1y1 + · · · + xnyn.

The properties of an inner product are easy to verify. This is the inner product we shall commonly use in Rn. It is sometimes called the dot product; we denote it by ⟨x, y⟩ rather than x · y to avoid confusion with the matrix product, which we shall define shortly.

If V is an inner product space, one defines the length (or norm) of a vector of V by the equation

‖x‖ = ⟨x, x⟩^(1/2).

The norm function has the following properties:
(1) ‖x‖ > 0 if x ≠ 0.
(2) ‖cx‖ = |c| ‖x‖.
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The third of these properties is the only one whose proof requires some work; it is called the triangle inequality. (See the exercises.) An equivalent form of this inequality, which we shall frequently find useful, is the inequality

(3′) ‖x − y‖ ≥ ‖x‖ − ‖y‖.

Any function from V to the reals R that satisfies properties (1)-(3) just listed is called a norm on V.
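To see why (3′) is equivalent to the triangle inequality, here is a one-line check (an editorial sketch; the book itself leaves the proof of the triangle inequality to the exercises):

```latex
% (3) implies (3'): write x = (x - y) + y and apply the triangle inequality.
\|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\|
\quad\Longrightarrow\quad
\|x - y\| \ge \|x\| - \|y\|.
% Conversely, applying (3') to the pair x + y and y recovers (3).
```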
The length function derived from an inner product is one example of a norm, but there are other norms that are not derived from inner products. On Rn, for example, one has not only the familiar norm derived from the dot product, which is called the euclidean norm, but one has also the sup norm, which is defined by the equation

|x| = max{|x1|, ..., |xn|}.

The sup norm is often more convenient to use than the euclidean norm. We note that these two norms on Rn satisfy the inequalities

|x| ≤ ‖x‖ ≤ √n |x|.

Matrices

A matrix A is a rectangular array of numbers. The general number appearing in the array is called an entry of A. If the array has n rows and m columns, we say that A has size n by m, or that A is "an n by m matrix." We usually denote the entry of A appearing in the ith row and jth column by a_ij; we call i the row index and j the column index of this entry.

If A and B are matrices of size n by m, with general entries a_ij and b_ij, respectively, we define A + B to be the n by m matrix whose general entry is a_ij + b_ij, and we define cA to be the n by m matrix whose general entry is c a_ij. With these operations, the set of all n by m matrices is a vector space; the eight vector space properties are easy to verify. This fact is hardly surprising, for an n by m matrix is very much like an nm-tuple; the only difference is that the numbers are written in a rectangular array instead of a linear array.

The set of matrices has, however, an additional operation, called matrix multiplication. If A is a matrix of size n by m, and if B is a matrix of size m by p, then the product A·B is defined to be the matrix C of size n by p whose general entry c_ij is given by the equation

c_ij = Σ (k = 1 to m) a_ik b_kj.

This product operation satisfies the following properties, which are straightforward to verify:
(1) A·(B·C) = (A·B)·C.
(2) A·(B + C) = A·B + A·C.
(3) (A + B)·C = A·C + B·C.
(4) (cA)·B = c(A·B) = A·(cB).
(5) For each k, there is a k by k matrix I_k such that if A is any n by m matrix,

I_n·A = A   and   A·I_m = A.

In each of these statements, we assume that the matrices involved are of appropriate sizes, so that the indicated operations may be performed.

The matrix I_k is the matrix of size k by k whose general entry δ_ij is defined as follows: δ_ij = 0 if i ≠ j, and δ_ij = 1 if i = j. The matrix I_k is called the identity matrix of size k by k; it has the form

[ 1 0 · · · 0 ]
[ 0 1 · · · 0 ]
[ · ·       · ]
[ 0 0 · · · 1 ],

with entries of 1 on the "main diagonal" and entries of 0 elsewhere.

We extend to matrices the sup norm defined for n-tuples. That is, if A is a matrix of size n by m with general entry a_ij, we define

|A| = max{|a_ij| ; i = 1, ..., n and j = 1, ..., m}.

The three properties of a norm are immediate, as is the following useful result:

Theorem 1.3. If A has size n by m, and B has size m by p, then

|A·B| ≤ m |A| |B|. □

Linear transformations

If V and W are vector spaces, a function T : V → W is called a linear transformation if it satisfies the following properties, for all x, y in V and all scalars c:
(1) T(x + y) = T(x) + T(y).
(2) T(cx) = cT(x).
If, in addition, T carries V onto W in a one-to-one fashion, then T is called a linear isomorphism.

One checks readily that if T : V → W is a linear transformation, and if S : W → X is a linear transformation, then the composite S ∘ T : V → X is a linear transformation. Furthermore, if T : V → W is a linear isomorphism, then T⁻¹ : W → V is also a linear isomorphism.
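Before turning to how linear transformations are represented by matrices, note that Theorem 1.3 is easy to test numerically. The sketch below is an editorial illustration only (NumPy, the random sizes, and the tolerance are my own choices, not part of the text); it draws random matrices and checks the bound in the sup norm.

```python
# Spot-check of Theorem 1.3: |A.B| <= m |A| |B| in the sup (max-entry) norm.
# Editorial illustration; NumPy and the random sizes are assumptions of this sketch.
import numpy as np

def sup_norm(M):
    """Sup norm of a matrix: the largest absolute value of an entry."""
    return np.abs(M).max()

rng = np.random.default_rng(0)
for _ in range(1000):
    n, m, p = rng.integers(1, 6, size=3)            # random sizes
    A = rng.uniform(-10, 10, size=(n, m))           # random n-by-m matrix
    B = rng.uniform(-10, 10, size=(m, p))           # random m-by-p matrix
    assert sup_norm(A @ B) <= m * sup_norm(A) * sup_norm(B) + 1e-9
print("Theorem 1.3 held on all random samples.")
```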
A linear transformation is uniquely determined by its values on basis elements, and these values may be specified arbitrarily. That is the substance of the following theorem: Theorem 1.4. Let V be a vector space with basis a1, ... , a,,.. Let W be a vector space. Given any m vectors b 1, ... , bm in W, there is exactly one linear transformation T : V --+ W such that, for all z, = T(ai) bi. □ In the special case where V and W are "tuple spaces" such as nm and R", matrix notation gives us a convenient way of specifying a linear transformation, as we now show. First we discuss row matrices and column matrices. A matrix of size 1 by n is called a row matrix; the set of all such matrices bears an obvious resemblance to Rn. Indeed, under the one-to-one correspondence the vector space operations also correspond. Thus this correspondence is a linear isomorphism. Similarly, a matrix of size n by 1 is called a column matrix; the set of all such matrices also bears an obvious resemblance to Rn. Indeed, the correspondence is a linear isomorphism. The second of these isomorphisms is particularly useful when studying linear transformations. Suppose for the moment that we represent elements §1. Review of Linear Algebra 7 of Rm and Rn by column matrices rather than by tuples. If A is a fixed n by m matrix, let us define a function T : Rm ~ Rn by the equation T(x) = A ·x. The properties of matrix product imply immediately that T is a linear trans- formation. In fact, every linear transformation of Rm to Rn has this form. The proof = is easy. Given T, let bi, ... , bm be the vectors of Rnsuch that T(e;) h;. = Then let A be the n by m matrix A [b1 • •• bm] with successive columns b 1, ... , bm. Since the identity matrix has columns e1, ... , em, the equation A· Im= A implies that A· e; = h; for all j. Then A· e; = T(e;) for all j; it follows from the preceding theorem that A• x = T(x) for all x. The convenience of this notation leads us to make the following convention: Convention. Throughout, we shall represent the elements of Rn by column matrices, unless we specifically state otherwise. Rank of a matrix Given a matrix A of size n by m, there are several important linear spaces associated with A. One is the space spanned by the columns of A, looked at as column matrices (equivalently, as elements of Rn). This space is called the column space of A, and its dimension is called the column rank of A. Because the column space of A is spanned by m vectors, its dimension can be no larger than m; because it is a subspace of Rn, its dimension can be no larger than n. Similarly, the space spanned by the rows of A, looked at as row matrices (or as elements of Rm) is called the row space of A, and its dimension is called the row rank of A. The following theorem is of fundamental importance: Theorem 1.5. For any matrix A, the row rank of A equals the column rank of A. □ Once one has this theorem, one can speak merely of the rank of a matrix A, by which one means the number that equals both the row rank of A and the column rank of A. The rank of a matrix A is an important number associated with A. One cannot in general determine what this number is by inspection. However, there is a relatively simple procedure called Gauss-Jordan reduction that can be used for finding the rank of a matrix. (It is used for other purposes as well.) We assume you have seen it before, so we merely review its major features here. 
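Since only the major features of Gauss-Jordan reduction are reviewed here, a small computational sketch may help fix ideas. The following is an editorial illustration, not part of the text: it computes the rank of a matrix using only the elementary row operations described next, with exact rational arithmetic to avoid floating-point pitfalls.

```python
# A minimal Gauss-Jordan style row reduction for computing the rank of a matrix.
# Editorial sketch; the function name and pivoting strategy are my own choices.
from fractions import Fraction

def rank(rows):
    """Return the rank of a matrix given as a list of lists of numbers."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    m = len(A[0]) if n else 0
    r = 0                      # index of the next pivot row
    for j in range(m):         # work column by column
        # find a row at or below r with a non-zero entry in column j
        pivot = next((i for i in range(r, n) if A[i][j] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]        # row exchange (operation of type 1)
        for i in range(n):
            if i != r and A[i][j] != 0:
                c = A[i][j] / A[r][j]
                # replace row i by itself minus c times row r (operation of type 2)
                A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# Example: a 3-by-3 matrix whose third row is the sum of the first two has rank 2.
print(rank([[1, 2, 3], [0, 1, 1], [1, 3, 4]]))   # prints 2
```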
8 The Algebra and Topology of Rn Chapter 1 One considers certain operations, called elementary row operations, that are applied to a matrix A to obtain a new matrix B of the same size. They are the following: (1) Exchange rows i1 and i2 of A (where i1 f:. i2). (2) Replace row i1 of A by itself plus the scalar c times row i2 (where i1 j i2). (3) Multiply row i of A by the non-zero scalar A. Each of these operations is invertible; in fact, the inverse of an elementary operation is an elementary operation of the same type, as you can check. One has the following result: Theorem 1.6. If B is the matrix obtained by applying an elemen- tary row operation to A, then rank B = rank A. □ Gauss-Jordan reduction is the process of applying elementary operations to A to reduce it to a special form called echelon form (or stairstep form), for which the rank is obvious. An example of a matrix in this form is the following: @ * * *** B= @ * * ** 0 0@ ** 0 0 0 0 00 Here the entries beneath the "stairsteps" are 0; the entries marked * may be zero or non-zero, and the "corner entries," marked @, are non-zero. (The corner entries are sometimes called "pivots.") One in fact needs only operations of types (1) and (2) to reduce A to echelon form. Now it is easy to see that, for a matrix B in echelon form, the non-zero rows are independent. It follows that they form a basis for the row space of B, so the rank of B equals the number of its non-zero rows. For some purposes it is convenient to reduce B to an even more spe- cial form, called reduced echelon form. Using elementary operations of type (2), one can make all the entries lying directly above each of the corner entries into O's. Then by using operations of type (3), one can make all the corner entries into 1's. The reduced echelon form of the matrix B considered previously has the form: 1 0 * 0 * * C= o'71 * o * * 0 0 011 * * 0 0 0 0 0 0 §1. Review of Linear Algebra 9 It is even easier to see that, for the matrix C, its rank equals the number of its non-zero rows. Transpose of a matrix Given a matrix A of size n by m, we define the transpose of A to be the matrix D of size m by n whose general entry in row i and column j is defined by the equation di;= a;i- The matrix Dis often denoted Atr_ The following properties of the transpose operation are readily verified: (1) (Atryr = A. = + (2) (A+ B)tr Atr Btr. = (a) (A. C)tr Ctr. Atr. (4) rank Atr = rank A. The first three follow by direct computation, and the last from the fact that the row rank of Atr is obviously the same as the column rank of A. EXERCISES = 1. Let V be a vector space with inner product (x, y} and norm llxll (x, x}1/2. (a) Prove the Cauchy-Schwarz inequality (x, y} $ llxll IIYII• [Hint: = If x, y -=/:- 0, set c = 1/llxll and d 1/IIYII and use the fact that llcx ± dyll 2: O.] (b) Prove that llx + YII $ llxll + IIYII • [Hint: Compute (x + Y, x + y) and apply (a).] (c) Prove that llx - YII 2: IJxll - IIYll- 2. If A is an n by m matrix and Bis an m by p matrix, show that IA· Bl$ mlAI IBI. 3. Show that the sup norm on R2 is not derived from an inner product on R2 . [Hint: Suppose (x, y) is an inner product on R2 (not the dot product) = having the property that lxl (x, y)112 . Compute (x ± y, x ± y} and = = apply to the case x e1 and y e2.] = = 4. (a) If x (X1, X2) and y (Y1, Y2), show that the function [ 2 - 1] [Y1] -1 1 Y2 is an inner product on R2 . *(b) Show that the function (x, y) = [x1 x2] [ ab be] [YY12] is an inner product on R2 if and only if b2 - ac < 0 and a > 0. 
10 The Algebra and Topology of Rn Chapter 1 *5. Let V be a vector space; let {aa} be a set of vectors of V, as a ranges over some index set J (which may be infinite). We say that the set {aa} spans V if every vector x in V can be written as a finite linear combination of vectors from this set. The set {a 0 } is independent if the scalars are uniquely determined by x. The set {aa} is a basis for V if it both spans V and is independent. (a) Check that the set R"'of all "infinite-tuples" of real numbers is a vector space under component-wise addition and scalar multiplication. = (b) Let R00 denote the subset of R"' consisting of all x (.r1, X2, ...) = such that x, 0 for all but finitely many values of i. Show R00 is a subspace of R"'; find a basis for R00 . (c) Let :F be the set of all real-valued functions/: [a, b] - R. Show that :F is a vector space if addition and scalar multiplication are defined in the natural way: (! + g)(x) = f (x) + g(x), (cf)(x) = cf(x). (d) Let :Fs be the subset of :F consisting of all bounded functions. Let :F1 consist of all integrable functions. Let :Fe consist of all continuous functions. Let :Fo consist of all continuously differentiable functions. Let :Fp consist of all polynomial functions. Show that each of these is a subspace of the preceding one, and find a basis for :Fp. There is a theorem to the effect that every vector space has a basis. The proof is non-constructive. No one has ever exhibited specific bases for the vector spaces R"', :F, :Fe, :Fi, :Fe, :Fo. (e) Show that the integral operator and the differentiation operator, (IJ)(x) = /.:1: f (t) dt and (Df)(x) = /'(x), are linear transformations. What are possible domains and ranges of these transformations, among those listed in (d)? Matrix Inversion and Determinants 11 §2. MATRIX INVERSION AND DETERMINANTS We now treat several further aspects of linear algebra. They are the following: elementary matrices, matrix inversion, and determinants. Proofs are included, in case some of these results are new to you. Elementary matrices Definition. An elementary matrix of size n by n is the matrix obtained by applying one of the elementary row operations to the identity ma- trix In. The elementary matrices are of three basic types, depending on which of the three operations is used. The elementary matrix corresponding to the first elementary operation has the form 1 1 0 1 1 0 1 1 The elementary matrix corresponding to the second elementary row operation has the form 1 1 1 E'= 0 C 1 1 1 . . row i2 12 The Algebra and Topology of Rn Chapter 1 And the elementary matrix corresponding to the third elementary row operation has the form 1 1 E" = , row t. One has the following basic result: 1 1 Theorem 2.1. Let A be an n by m matrix. Any elementary row operation on A may be carried out by premultiplying A by the corresponding elementary matrix. Proof. One proceeds by direct computation. The effect of multiplying A on the left by the matrix E is to interchange rows i1 and i2 of A. Similarly, multiplying A by E' has the effect of replacing row i1 by itself plus c times row i2. And multiplying A by E" has the effect of multiplying row i by .A. D We will use this result later on when we prove the change of variables theorem for a multiple integral, as well as in the present section. The inverse of a matrix Definition. Let A be a matrix of size n by m; let B and C be matrices = of size m by n. We say that B is a left inverse for A if B •A Im, and we = say that C is a right inverse for A if A · C In. Theorem 2.2. 
If A has both a left inverse B and a right inverse C, then they are unique and equal. Proof. Equality follows from the computation If B1 is another left inverse for A, we apply this same computation with B1 replacing B. We conclude that C = B 1; thus B1 and B are equal. Hence B is unique. A similar computation shows that C is unique. D Matrix Inversion and Determinants 13 Definition. If A has both a right inverse and a left inverse, then A is said to be invertible. The unique matrix that is both a right inverse and a left inverse for A is called the inverse of A, and is denoted A- 1 . A necessary and sufficient condition for A to be invertible is that A be square and of maximal rank. That is the substance of the following two theorems: Theorem 2.3. then Let A be a matrix of size n by m. If A is invertible, n =m = mnk A. Proof. Step 1. ,ve show that for any k by n matrix D, rank (D · A) s rank A. The proof is easy. If R is a row matrix of size 1 by n, then R • A is a row matrix that equals a linear combination of the rows of A, so it is an element of the row space of A. The rows of D • A are obtained by multiplying the rows of D by A. Therefore each row of D · A is an element of the row space of A. Thus the row space of D · A is contained in the row space of A and our inequality follows. Step 2. We show that if A has a left inverse B, then the rank of A equals the number of columns of A. = = s The equation Im B · A implies by Step 1 that m rank (B · A) rank A. On the other hand, the row space of A is a subspace of m-tuple space, so that rank A < m. Step 3. We prove the theorem. Let B be the inverse of A. The fact that B is a left inverse for A implies by Step 2 that rank A = m. The fact that B is a right inverse for A implies that whence by Step 2, rank A= n. □ We prove the converse of this theorem in a slightly strengthened version: Theorem 2.4. Let A be a matrix of size n by m. Suppose n =m = rank A. Then A is invertible; and furthermore, A equals a product of elementary matrices. 14 The Algebra and Topology of Rn Chapter 1 Proof. Step 1. We note first that every elementary matrix is invert- ible, and that its inverse is an elementary matrix. This follows from the fact that elementary operations are invertible. Alternatively, you can check di- rectly that the matrix E corresponding to an operation of the first type is its own inverse, that an inverse for E' can be obtained by replacing c by -c in the formula for E', and that an inverse for E" can be obtained by replacing A by 1/ A in the formula for E". Step 2. We prove the theorem. Let A be an n by n matrix of rank n. Let us reduce A to reduced echelon form C by applying elementary row operations. Because C is square and its rank equals the number of its rows, C must equal the identity matrix In. It follows from Theorem 2.1 that there is a sequence E1, ... , E1i: of elementary matrices such that If we multiply both sides of this equation on the left by E;1, then by E;!1 , and so on, we obtain the equation A -- E-11 . E-21 ••• E-k1'. thus A equals a product of elementary matrices. Direct computation shows that the matrix is both a right and a left inverse for A. □ One very useful consequence of this theorem is the following: Theorem 2.5. If A is a square matrix and if B is a left inverse for A, then B is also a right inverse for A. Proof. Since A has a left inverse, Step 2 of the proof of Theorem 2.3 implies that the rank of A equals the number of columns of A. 
Since A is square, this is the same as the number of rows of A, so the preceding theorem implies that A has an inverse. By Theorem 2.2, this inverse must be B. □

An n by n matrix A is said to be singular if rank A < n; otherwise, it is said to be non-singular. The theorems just proved imply that A is invertible if and only if A is non-singular.

Determinants

The determinant is a function that assigns, to each square matrix A, a number called the determinant of A and denoted det A.

§3. REVIEW OF TOPOLOGY IN Rn

Metric spaces

If X is a set, a metric on X is a function d : X × X → R such that the following properties hold for all x, y, z in X:
(1) d(x, y) = d(y, x).
(2) d(x, y) ≥ 0, and equality holds if and only if x = y.
(3) d(x, z) ≤ d(x, y) + d(y, z).

A metric space is a set X together with a specific metric on X. We often suppress mention of the metric, and speak simply of "the metric space X."

If X is a metric space with metric d, and if Y is a subset of X, then the restriction of d to the set Y × Y is a metric on Y; thus Y is a metric space in its own right. It is called a subspace of X.

For example, Rn has the metrics

d(x, y) = ‖x − y‖   and   d(x, y) = |x − y|;

they are called the euclidean metric and the sup metric, respectively. It follows immediately from the properties of a norm that they are metrics. For many purposes, these two metrics on Rn are equivalent, as we shall see.

We shall in this book be concerned only with the metric space Rn and its subspaces, except for the expository final section, in which we deal with general metric spaces. The space Rn is commonly called n-dimensional euclidean space.

If X is a metric space with metric d, then given x0 ∈ X and given ε > 0, the set

U(x0; ε) = {x | d(x, x0) < ε}

is called the ε-neighborhood of x0, or the ε-neighborhood centered at x0. A subset U of X is said to be open in X if for each x0 ∈ U there is a corresponding ε > 0 such that U(x0; ε) is contained in U. A subset C of X is said to be closed in X if its complement X − C is open in X. It follows from the triangle inequality that an ε-neighborhood is itself an open set. If U is any open set containing x0, we commonly refer to U simply as a neighborhood of x0.

Theorem 3.1. Let (X, d) be a metric space. Then finite intersections and arbitrary unions of open sets of X are open in X. Similarly, finite unions and arbitrary intersections of closed sets of X are closed in X. □

Theorem 3.2. Let X be a metric space; let Y be a subspace. A subset A of Y is open in Y if and only if it has the form

A = U ∩ Y,

where U is open in X. Similarly, a subset A of Y is closed in Y if and only if it has the form

A = C ∩ Y,

where C is closed in X. □

It follows that if A is open in Y and Y is open in X, then A is open in X. Similarly, if A is closed in Y and Y is closed in X, then A is closed in X.

If X is a metric space, a point x0 of X is said to be a limit point of the subset A of X if every ε-neighborhood of x0 intersects A in at least one point different from x0. An equivalent condition is to require that every neighborhood of x0 contain infinitely many points of A.

Theorem 3.3. If A is a subset of X, then the set Ā consisting of A and all its limit points is a closed set of X. A subset of X is closed if and only if it contains all its limit points. □

The set Ā is called the closure of A.

In Rn, the ε-neighborhoods in our two standard metrics are given special names. If a ∈ Rn, the ε-neighborhood of a in the euclidean metric is called the open ball of radius ε centered at a, and denoted B(a; ε). The ε-neighborhood of a in the sup metric is called the open cube of radius ε centered at a, and denoted C(a; ε).
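As a concrete illustration of the two metrics (an editorial example, not in the text), take n = 2, x = (0, 0), and y = (3, 4):

```latex
\|x - y\| = \sqrt{3^2 + 4^2} = 5,
\qquad
|x - y| = \max\{\,3,\,4\,\} = 4 .
```

Correspondingly, the ε-neighborhood of the origin is the open disk B(0; ε) in the euclidean metric and the open square C(0; ε) = (−ε, ε) × (−ε, ε) in the sup metric.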
The inequalities Ix [ < II x II < y'n Ix I lead to the following inclusions: vn B(a; £) C C(a; E) C B(a; £). These inclusions in turn imply the following: §3. Review of Topology in Rn 27 Theorem 3.4. If X is a subspace of Rn, the collection of open sets of X is the same whether one uses the euclidean metric or the sup metric on X. The same is true for the collection of closed sets of X. □ In general, any property of a metric space X that depends only on the collection of open sets of X, rather than on the specific metric involved, is called a topological property of X. Limits, continuity, and compactness are examples of such, as we shall see. Limits and Continuity Let X and Y be metric spaces, with metrics dx and dy, respectively. We say that a function f: X -+ Y is continuous at the point Xo of X if for each open set V of Y containing f(xo), there is an open set U of X containing Xo such that f (U) C V. \,Ve say f is continuous if it is continuous at each point x 0 of.\'". Continuity off is equivalent to the requirement that for each open set V of Y, the set is open in X, or alternatively, the requirement that for each closed set D of Y, the set f- 1(D) is closed in X. Continuity may be formulated in a way that involves the metrics specif- ically. The function f is continuous at x 0 if and only if the following holds: For each f > O, there is a corresponding 8 > 0 such that dy(f(x), f(xo)) < f whenever dx(x, xo) < .i. This is the classical ('f-D formulation of continuity." Note that given Xo E X it may happen that for some 8 > 0, the 8- neighborhood of Xo consists of the point Xo alone. In that case, x 0 is called an isolated point of X, and any function f: X-+ Y is automatically continuous at xo! A constant function from X to Y is continuous, and so is the identity function ix: X-+ X. So are restrictions and composites of continuous functions: Theore1n 3.5. (a) Let xo E A, where A is a subspace of X. If f : X -+ Y is continuous at x0 , then the restricted Junction f IA: A -+ Y is continuous at x 0 . {b) Let f: X-+ Y and g: Y-+ Z. If f is continuous at x0 and g is continuous at Yo = f(x 0 ), then go f: X-+ Z is continuous at x 0 • □ Theorem 3.6. the form (a) Let X be a metric space. Let f: X-+ nn have f(x) = (f1(x), ... ,fn(x)). 28 The Algebra and Topology of R" Chapter 1 Then J is continuous at x0 if and only if each function Ji :X --+ R is continuous at x 0 . The functions Ji are called the component functions off. (b) Let J,g: X-+ R be continuous at xo. Then J + g and f - g and J ·g are continuous at xo; and f/g is continuous at Xo if g(x0 ) # 0. = (c) The projection function 'Tri :R" -+ R given by 1r,(x) Xi is con- tinuous. □ These theorems imply that functions formed from the familiar real-valued continuous functions of calculus, using algebraic operations and composites, are continuous in R". For instance, since one knows that the functions ex and sin x are continuous in R, it follows that such a function as f(s, t, u, v) = (sin(s + t))/euv is continuous in R4 . Now we define the notion of limit. Let X be a metric space. Let ACX and let J: A-+ Y. Let Xo be a limit point of the domain A of J. (The point Xo may or may not belong to A.) We say that f(x) approaches Yo as x approaches Xo if for each open set V of Y containing y0 , there is an open set U of X containing Xo such that f(x) E V whenever x is in Un A and x # Xo. 
This statement is expressed symbolically in the form f(x)-+ Yo as x-+ Xo- We also say in this situation that the limit of f(x), as x approaches Xo, is Yo- This statement is expressed symbolically by the equation lim f(x) = Yo• x-xo Note that the requii-ement that x 0 be a limit point of A guarantees that there exist points x different from x0 belonging to the set Un A. We do not attempt to define the limit off if x0 is not a limit point of the domain of J. Note also that the value off at Xo (provided f is even defined at xo) is not involved in the definition of the limit. The notion of limit can be formulated in a way that involves the metrics specifically. One shows readily that f(x) approaches Yo as x approaches Xo if and only if the following condition holds: For each € > 0, there is a corresponding 6 > 0 such that o. dy(f(x), y0 ) < € whenever x E A and O< dx(x, x0) < There is a direct relation between limits and continuity; it is the following: Review of Topology in R" 29 Theorem 3. 7. Let f: X --+ Y. If x 0 is an isolated point of X, then f is continuous at x 0 • Otherwise, f is continuous at x 0 if and only if f (X) --+ f (XO) as X --+ XO , □ Most of the theorems dealing with continuity have counterparts that deal with limits: Thcoren1 3.8. (a) Let A C X; let f: A --+ R" have the form f(x) =(f1(x), ... , fn(x)). Let a= (a1, ... , an)- Then f(x) --+ a as x--+ Xo if and only if fi(x)--+ ai as x --+ xo, for each i. (b) Let f,g: A--+ R. If f(x) --+ a and g(x)--+ b as x --+ xo, then as X --+ Xo, J(x) + g(x)--+ a+ b, J(x) - g(x)--+ a - b, f(x) •g(x)--+ a• b; also, f(x)/g(x)--+ a/b if b # 0. D Interior and Exterior The following concepts make sense in an arbitrary metric space. Since we shall use them only for R", we define them only in that case. Definition. Let A be a subset of Rn. The interior of A, as a subset of Rn, is defined to be the union of all open sets of R" that are contained in A; it is denoted Int A. The exterior of A is defined to be the union of all open sets of R" that are disjoint from A; it is denoted Ext A. The boundary of A consists of those points of Rn that belong neither to Int A nor to Ext A; it is denoted Bd A. A point x is in Bd A if and only if every open set containing x intersects both A and the complement R" - A of A. The space R" is the union of the disjoint sets Int A, Ext A, and Bd A; the first two are open in nn and the third is closed in Rn. For example, suppose Q is the rectangle consisting of all points x of R" such that ai < Xi < bi for all i. You can check that 30 The Algebra and Topology of Rn Chapter 1 We often call Int Q an open rectangle. Furthermore, Ext Q = R" - Q and Bd Q = Q - Int Q. An open cube is a special case of an open rectangle; indeed, The corresponding (closed) rectangle is often called a closed cube, or simply a cube, centered at a. EXERCISES Throughout, let X be a metric space with metric d. 1. Show that U(x0 ; t:) is an open set. 2. Let Y C X. Give an example where A is open in Y but not open in X. Give an example where A is closed in Y but not closed in X. 3. Let ACX. Show that if C is a closed set of X and C contains A, then C contains A. 4. (a) Show that if Q is a rectangle, then Q equals the closure of Int Q. (b) If Dis a closed set, what is the relation in general between the set D and the closure of Int D? (c) If U is an open set, what is the relation in general between the set U and the interior of U? 5. Let /: X - Y. 
Show that / is continuous if and only if for each x EX there is a neighborhood U of x such that / IU is continuous. 6. Let X = AU B, where A and B are subspaces of X. Let /: X - Y; suppose that the restricted functions /IA:A-Y and /IB:B-Y are continuous. Show that if both A and B are closed in X, then / is continuous. 7. Finding the limit of a composite function go f is easy if both / and g are continuous; see Theorem 3.5. Otherwise, it can be a bit tricky: Let/ :X - Y and g: Y - Z. Let Xo be a limit point of X and let Yo be a limit point of Y. See Figure 3.1. Consider the following three conditions: (i) / (x) - Yo as x - Xo. (ii) g(y) - Zo as y - Yo, (iii) g(f (x)) - zo as x - Xo. (a) Give an example where (i) and (ii) hold, but (iii) does not. = (b) Show that if (i) and (ii) hold and if g(y0 ) zo, then (iii) holds. Review of Topology in R" 31 = 8. Let f: R - R be defined by setting f(x) sin z if z is rational, and /(x) = 0 otherwise. At what points is/ continuous? 9. If we denote the general point of R2 by (z, y), determine Int A, Ext A, a.nd Bd A for the subset A of R2 specified by each of the following conditions: = (a) x 0. (e) x and y are rational. (b) 0 $ X < 1. (f) 0 < x2 + y2 < 1. (c) 0 :5 x < 1 and O:5 y < 1. (g) y < x2 • (d) xis rational and y > 0. (h) y :5 x 2 . I g • --· y • Zo Yo y Figure 3.1 32 The Algebra and Topology of Rn Chapter 1 §4. COMPACT SUBSPACES ANO CONNECTED SUBSPACES OF R" An important class of subspaces of Rn is the class of compact spaces. We shall use the basic properties of such spaces constantly. The properties we shall need are summarized in the theorems of this section. Proofs are included, since some of these results you may not have seen before. A second useful class of spaces is the class of connected spaces; we summarize here those few properties we shall need. We do not attempt to deal here with compactness and connectedness in arbitrary metric spaces, but comment that many of our proofs do hold in that more general situation. Compact spaces Definition. Let X be a subspace of Rn. A covering of Xis a collection of subsets of R" whose union contains X; if each of the subsets is open in Rn, it is called an open covering of X. The space X is said to be compact if every open covering of X contains a finite subcollection that also forms an open covering of X. While this definition of compactness involves open sets of R", it can be reformulated in a manner that involves only open sets of the space X: Theorem 4.1. A subspace X of Rn is compact if and only if for every collection of sets open in X whose union is X, there is a finite subcollection whose union equals X. Proof. Suppose X is compact. Let {A0} be a collection of sets open in X whose union is X. Choose, for each a, an open set Ua of R" such = that A 0 U0 n .X. Since X is compact, some finite subcollection of the = collection {Ua} covers X, say for a a 1, ... , O'k. Then the sets A0 , for a = O::i, ... , ak, have X as their union. The proof of the con verse is similar. D The following result is always proved in a first course in analysis, so the proof will be omitted here: Theorem 4.2. The subspace [a, b] of R is compact. □ Definition. A subspace X of Rn is said to be bounded if there is an M such that lxl < l.1 for all x EX. We shall eventually show that a subspace of Rn is compact if and only if it is closed and bounded. Half of that theorem is easy; we prove it now: §4. Compact Subspaces and Connected Subspaces of Rn 33 Theorem 4.3. If X is a compact subspace of Rn, then X is closed and bounded. 
Proof. Step 1. We show that X is bounded. For each positive integer N, let UN denote the open cube UN= C(O;N). Then UN is an open set; and U1 C U2 C • • •; and the sets UN cover all of R" (so in particular they cover X). Some finite subcollection also covers X, say for N =Ni, ... ,N1;. If M is the largest of the numbers N 1, ... , N 1;, then X is contained in UM; thus X is bounded. Step 2. We show that X is closed by showing that the complement of X is open. Let a be a point of R" not in X; we find an £-neighborhood of a that lies in the complement of X. For each positive integer N, consider the cube CN={x;Jx-al <1/N}. Then C1 :) C2 :) • • •, and the intersection of the sets CN consists of the point a alone. Let VN be the complement of CN; then VN is an open set; and V1 C V2 C • • •; and the sets VN cover all of R"except for the point a (so they cover X). Some finite subcollection covers X, say for N = N1, ... , N1;. If M is the largest of the numbers Ni, ... , Nk, then X is contained in VM. Then the set CM is disjoint from X, so that in particular the open cube C(a; 1/M) lies in the complement of X. See Figure 4.1. D X Figure 4.1 Corollary 4.4. Let X be a compact subspace of R. Then X has a largest element and a smallest element. 34 The Algebra and Topology of Rn Chapter 1 Proof. Since X is bounded, it has a greatest lower bound and a least upper bound. Since X is closed, these elements must belong to X. □ Here is a basic (and familiar) result that is used constantly: Theorem 4.5 {Extre1ne-value theorem). Let X be a compact subspace of Rm. If f : X - Rn is continuous, then f (X) is a compact subspace of Rn. In particular, if : X - R is continuous, then has a maximum value and a minimum value. Proof. Let {Va} be a collection of open sets of Rn that covers f(X). The sets f- 1(Va) form an open covering of X. Hence some finitely many of them cover X, say for a= a1, ... ,ak. Then the sets Va for a= 01, ... ,ak cover f (X). Thus f (X) is compact. Now if : X --+ R is continuous, (X) is compact, so it has a largest element and a smallest element. These are the maximum and minimum values of . □ Now we prove a result that may not be so familiar. Definition. Let ..Y be a subset of nn. Given f > 0, the union of the sets B(a; f), as a ranges over all points of X, is called the €-neighborhood of X in the euclidean metric. Similarly, the union of the sets C(a; f) is called the €-neighborhood of X in the sup metric. Theoren1 4.6 {The €-neighborhood theorem). Let X be a com- pact subspace of R"; let U be an open set of R"containing X. Then there is an l > 0 such that the €-neighborhood of X (in either metric) is contained in U. Proof. The €-neighborhood of X in the euclidean metric is contained in the €-neighborhood of X in the sup metric. Therefore it suffices to deal only with the latter case. Step 1. Let C be a fixed subset of R". For each x E R", we define d(x, C) = inf {Ix - c I; c E C}. We call d(x,C) the distance from x to C. We show it is continuous as a function of x: Let c EC; let x, y ER". The triangle inequality implies that d(x,C)- lx-yl < Ix-cl- lx-yl < ly-cl, §4. Compact Subspaces and Connected Subspaces of Rn 35 This inequality holds for all c E C; therefore d(x,C)- lx-yl 0 for all x EX. For if x EX, then some c5-neighborhood of xis contained in U, whence J(x) > c5. Because X is compact, f has a minimum value €. Because f takes on only positive values, this minimum value is positive. Then the €-neighborhood of X is contained in U. 
□ This theorem does not hold without some hypothesis on the set X. If X is the x-axis in R2, for example, and U is the open set then there is no € such that the €-neighborhood of X is contained in U. See Figure 4.2. Figure 4.2 Here is another familiar result. Theorem 4.7 (Uniform continuity). Let X be a compact subspace of Rm; let f : X -+ Rn be continuous. Given € > 0, there is a c5 > 0 such that whenever x, y E X, Ix - y I < c5 implies I/(x) - /(y) I < €. 36 The Algebra and Topology of nn Chapter 1 This result also holds if one uses the euclidean metric instead of the sup metric. The condition stated in the conclusion of the theorem is called the condition of uniform continuity. Proof. Consider the subspace X X X of nm X nm; and within this, consider the space .6. = { (x, x) Ix EX}, which is called the diagonal of X x X. The diagonal is a compact subspace of R2m, since it is the image of the compact space X under the continuous map f(x) = (x, x). We prove the theorem first for the euclidean metric. Consider the function g : X x X ~ R defined by the equation g(x, y) = II f (x) - f (y) II• Then consider the set of points (x, y) of X x X for which g(x, y) < €. Because g is continuous, this set is an open set of X x X. Also, it contains the diagonal .6., since g(x, x) = 0. Therefore, it equals the intersection with X x X of an open set U of Rm x Rm that contains .6.. See Figure 4.3. (x,y) .,...,__,_-(y' y) X X Figure 4.3 Compactness of .6. implies that for some 6, the 6-neighborhood of .6. is contained in U. This is the fJ required by our theorem. For if x, y E X with llx-yll <6,then II (x, Y) - (y, Y) II = ll (x - Y, 0) II = II x - Y II < c5, so that (x,y) belongs to the 6-neighborhood of the diagonal .6.. Then (x,y) belongs to U, so that g(x, y) < €, as desired. Compact Subspaces and Connected Subspaces of Rn 37 The corresponding result for the sup metric can be derived by a similar proof, or simply by noting that if Ix-y I < 8/ fa, then II x -y II < 8, whence I/(x) - /(y) I < II f(x) - f (y) II < €. □ To complete our characterization of the compact subspaces of Rn, we need the following lemma: Lemma 4.8. The rectangle = Q [a1, b1] X • • • X [an, bn] in Rn is compact. = Proof. We proceed by induction on n. The lemma is true for n I; we suppose it true for n - 1 and prove it true for n. We can write where X is a rectangle in Rn- 1. Then X is compact by the induction hypothesis. Let A be an open covering of Q. Step 1. We show that given t E [an, bn], there is an € > 0 such that the set + X X (t - €, t €) can be covered by finitely many elements of A. The set X x t is a compact subspace of Rn, for it is the image of X under = the continuous map / : X _,. Rn given by f (x) (x, t). Therefore it may be covered by finitely many elements of A, say by A1, ... , Ak. Let U be the union of these sets; then U is open and contains Xx t. See Figure 4.4. u t Xx t X Figure 4.4 Because X x t is compact, there is an € > 0 such that the €-neighborhood of Xx tis contained in U. Then in particular, the set Xx (t - €, t + €) is contained in U, and hence is covered by A1, ... , A1:. 38 The Algebra and Topology of Rn Chapter 1 Step 2. By the result of Step 1, we may for each t E [an, bn] choose an open interval V, about t, such that the set X x V, can be covered by finitely many elements of the collection A. Now the open intervals ¼ in R cover the interval [an, bn]; hence finitely many of them cover this interval, say for t = t1, ... , tm. 
Then Q = X x (an, bn] is contained in the union of the sets X x Vi = for t t1, ... , tm; since each of these sets can be covered by finitely many elements of A, so may Q be covered. D Theorem 4.9. If X is a closed and bounded subspace of R", then X is compact. Proof. Let A be a collection of open sets that covers X. Let us adjoin to this collection the single set R" - X, which is open in Rn because X is closed. Then we have an open covering of all of R". Because X is bounded, we can choose a rectangle Q that contains X; our collection then in particular covers Q. Since Q is compact, some finite subcollection covers Q. If this finite sub collection contains the set R" - X, we discard it from the collection. We then have a finite sub collection of the collection A; it may not cover all of Q, but it certainly covers X, since the set R" - X we discarded contains no point of X. □ All the theorems of this section hold if Rn and nm are replaced by arbitrary metric spaces, except for the theorem just proved. That theorem does not hold in an arbitrary metric space; see the exercises. Connected spaces If X is a metric space, then X is said to be connected if X cannot be written as the union of two disjoint non-empty sets A and B, each of which is open in X. The following theorem is always proved in a first course in analysis, so the proof will be omitted here: Theoren1 4.10. The closed interval (a, b] of R" is connected. □ The basic fact about connected spaces that we shall use is the following: Theorein 4.11 (Inter1nediate.value theorem). Let X be connected. If f : X --+ Y is continuous, then f(X) is a connected subspace of Y. = In particular, if r. Then A and B are open in R; if the set f (X) does not contain r, then f (X) is the union of the disjoint sets f(X) n A and f(X) n B, each of which is open in f (X). This contradicts connectedness off(X). □ If a and b are points of Rn, then the line segment joining a and b is = defined to be the set of all points x of the form x a+ t(b - a), where 0 :s; t < 1. Any line segment is connected, for it is the image of the interval [O, 1) under the continuous map t -- a+ t(b - a). A subset A of R" is said to be convex if for every pair a,b of points of A, the line segment joining a and b is contained in A. Any convex subset A of Rn is automatically connected: For if A is the union of the disjoint sets U and V, each of which is open in A, we need merely choose a in U and b in V, and note that if L is the line segment joining a and b, then the sets Un L and V n L are disjoint, non-empty, and oper. in L. It follows that in R" all open balls and open cubes and rectangles are connected. (See the exercises.) EXERCISES 1. Let R+ denote the set of positive real numbers. = (a) Show that the continuous function f : R+ --+- R given by f (x) 1/(l+x) is bounded but has neither a maximum value nor a minimum value. (b) Show that the continuous function g : R+ - R given by g(x) = sin( 1/ x) is bounded but does not satisfy the condition of uniform continuity on R+. 2. Let X denote the subset (-1, 1) X 0 of R2 , and let U be the open ball B(O; 1) in R2 , which contains X. Show there is no£ > 0 such that the £-neighborhood of X in R" is contained in U. = 3. Let RO() be the set of all "infinite-tuples" x (x1, X2, ... ) of real numbers that end in an infinite string of O's. (See the exercises of § 1.) Define an inner product on RO() by the rule (x, y) :;:; Ex,y,. (This is a finite sum, since all but finitely many terms vanish.) 
Let ‖x − y‖ be the corresponding metric on R∞. Define

e_i = (0, ..., 0, 1, 0, ...),

where 1 appears in the ith place. Then the e_i form a basis for R∞. Let X be the set of all the points e_i. Show that X is closed, bounded, and non-compact.

4. (a) Show that open balls and open cubes in Rn are convex.
(b) Show that (open and closed) rectangles in Rn are convex.

Differentiation

In this chapter, we consider functions mapping Rm into Rn, and we define what we mean by the derivative of such a function. Much of our discussion will simply generalize facts that are already familiar to you from calculus.

The two major results of this chapter are the inverse function theorem, which gives conditions under which a differentiable function from Rn to Rn has a differentiable inverse, and the implicit function theorem, which provides the theoretical underpinning for the technique of implicit differentiation as studied in calculus.

Recall that we write the elements of Rm and Rn as column matrices unless specifically stated otherwise.

§5. THE DERIVATIVE

First, let us recall how the derivative of a real-valued function of a real variable is defined.

Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of the point a. We define the derivative of φ at a by the equation

φ′(a) = lim (t→0) [φ(a + t) − φ(a)] / t,

provided the limit exists. In this case, we say that φ is differentiable at a. The following facts are an immediate consequence:
(1) Differentiable functions are continuous.
(2) Composites of differentiable functions are differentiable.

We seek now to define the derivative of a function f mapping a subset of Rm into Rn. We cannot simply replace a and t in the definition just given by points of Rm, for we cannot divide a point of Rn by a point of Rm if m > 1! Here is a first attempt at a definition:

Definition. Let A ⊂ Rm; let f : A → Rn. Suppose A contains a neighborhood of a. Given u ∈ Rm with u ≠ 0, define

f′(a; u) = lim (t→0) [f(a + tu) − f(a)] / t,

provided the limit exists. This limit depends both on a and on u; it is called the directional derivative of f at a with respect to the vector u. (In calculus, one usually requires u to be a unit vector, but that is not necessary.)

EXAMPLE 1. Let f : R2 → R be given by the equation

f(x) = x1 x2.

The directional derivative of f at a = (a1, a2) with respect to the vector u = (1, 0) is

f′(a; u) = lim (t→0) [(a1 + t)a2 − a1a2] / t = a2.

With respect to the vector v = (1, 2), the directional derivative is

f′(a; v) = lim (t→0) [(a1 + t)(a2 + 2t) − a1a2] / t = 2a1 + a2.

It is tempting to believe that the "directional derivative" is the appropriate generalization of the notion of "derivative," and to say that f is differentiable at a if f′(a; u) exists for every u ≠ 0. This would not, however, be a very useful definition of differentiability. It would not follow, for instance, that differentiability implies continuity. (See Example 3 following.) Nor would it follow that composites of differentiable functions are differentiable. (See the exercises of §7.) So we seek something stronger.

In order to motivate our eventual definition, let us reformulate the definition of differentiability in the single-variable case as follows: Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of a. We say that φ is differentiable at a if there is a number λ such that

[φ(a + t) − φ(a) − λt] / t → 0 as t → 0.
It is tempting to believe that the "directional derivative" is the appropriate generalization of the notion of "derivative," and to say that f is differentiable at a if f'(a; u) exists for every u ≠ 0. This would not, however, be a very useful definition of differentiability. It would not follow, for instance, that differentiability implies continuity. (See Example 3 following.) Nor would it follow that composites of differentiable functions are differentiable. (See the exercises of §7.) So we seek something stronger.

In order to motivate our eventual definition, let us reformulate the definition of differentiability in the single-variable case as follows: Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of a. We say that φ is differentiable at a if there is a number λ such that

[φ(a + t) − φ(a) − λt] / t → 0  as  t → 0.

The number λ, which is unique, is called the derivative of φ at a, and denoted φ'(a).

This formulation of the definition makes explicit the fact that if φ is differentiable, then the linear function λt is a good approximation to the "increment function" φ(a + t) − φ(a); we often call λt the "first-order approximation" or the "linear approximation" to the increment function.

Let us generalize this version of the definition. If A ⊂ R^m and if f : A → R^n, what might we mean by a "first-order" or "linear" approximation to the increment function f(a + h) − f(a)? The natural thing to do is to take a function that is linear in the sense of linear algebra. This idea leads to the following definition:

Definition. Let A ⊂ R^m; let f : A → R^n. Suppose A contains a neighborhood of a. We say that f is differentiable at a if there is an n by m matrix B such that

[f(a + h) − f(a) − B·h] / |h| → 0  as  h → 0.

The matrix B, which is unique, is called the derivative of f at a; it is denoted Df(a).

Note that the quotient of which we are taking the limit is defined for h in some deleted neighborhood of 0, since the domain of f contains a neighborhood of a. Use of the sup norm in the denominator is not essential; one obtains an equivalent definition if one replaces |h| by ‖h‖.

It is easy to see that B is unique. Suppose C is another matrix satisfying this condition. Subtracting, we have

(C − B)·h / |h| → 0

as h → 0. Let u be a fixed vector; set h = tu; let t → 0. It follows that (C − B)·u = 0. Since u is arbitrary, C = B.

EXAMPLE 2. Let f : R^m → R^n be defined by the equation

f(x) = B·x + b,

where B is an n by m matrix, and b ∈ R^n. Then f is differentiable and Df(x) = B. Indeed, since

f(a + h) − f(a) = B·h,

the quotient used in defining the derivative vanishes identically.

We now show that this definition is stronger than the tentative one we gave earlier, and that it is indeed a "suitable" definition of differentiability. Specifically, we verify the following facts, in this section and those following:
(1) Differentiable functions are continuous.
(2) Composites of differentiable functions are differentiable.
(3) Differentiability of f at a implies the existence of all the directional derivatives of f at a.
We also show how to compute the derivative when it exists.

Theorem 5.1. Let A ⊂ R^m; let f : A → R^n. If f is differentiable at a, then all the directional derivatives of f at a exist, and

f'(a; u) = Df(a)·u.

Proof. Let B = Df(a). Set h = tu in the definition of differentiability, where t ≠ 0. Then by hypothesis,

(*)  [f(a + tu) − f(a) − B·tu] / |tu| → 0

as t → 0. If t approaches 0 through positive values, we multiply (*) by |u| to conclude that

[f(a + tu) − f(a)] / t − B·u → 0

as t → 0, as desired. If t approaches 0 through negative values, we multiply (*) by −|u| to reach the same conclusion. Thus f'(a; u) = B·u. □

EXAMPLE 3. Define f : R^2 → R by setting f(0) = 0 and

f(x, y) = x^2 y / (x^4 + y^2)  if (x, y) ≠ 0.

We show all directional derivatives of f exist at 0, but that f is not differentiable at 0. Let u ≠ 0. Then if u = (h, k),

[f(0 + tu) − f(0)] / t = [(th)^2 (tk) / ((th)^4 + (tk)^2)] · (1/t) = h^2 k / (t^2 h^4 + k^2),

so that

f'(0; u) = h^2 / k  if k ≠ 0,   and   f'(0; u) = 0  if k = 0.

Thus f'(0; u) exists for all u ≠ 0. However, the function f is not differentiable at 0. For if g : R^2 → R is a function that is differentiable at 0, then Dg(0) is a 1 by 2 matrix of the form [a b], and

g'(0; u) = ah + bk,

which is a linear function of u. But f'(0; u) is not a linear function of u.

The function f is particularly interesting. It is differentiable (and hence continuous) on each straight line through the origin. (In fact, on the straight line y = mx, it has the value mx/(m^2 + x^2).) But f is not differentiable at the origin; in fact, f is not even continuous at the origin! For f has value 0 at the origin, while arbitrarily near the origin are points of the form (t, t^2), at which f has value 1/2. See Figure 5.1.

Figure 5.1
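One can see the behavior described in Example 3 numerically. The sketch below is only an illustration (the sample values of m and t are my own choices): along lines through the origin the values of f tend to 0, while along the parabola y = x^2 they are identically 1/2.

```python
def f(x, y):
    """The function of Example 3: f(0,0) = 0, else x^2 * y / (x^4 + y^2)."""
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**4 + y**2)

# Along any line y = m*x through the origin the values tend to f(0,0) = 0 ...
for m in [0.5, 1.0, 3.0]:
    print([f(t, m * t) for t in [0.1, 0.01, 0.001]])

# ... but along the parabola y = x^2 the value is identically 1/2,
# so f is not continuous (hence not differentiable) at the origin.
print([f(t, t**2) for t in [0.1, 0.01, 0.001]])   # [0.5, 0.5, 0.5]
```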
Theorem 5.2. Let A ⊂ R^m; let f : A → R^n. If f is differentiable at a, then f is continuous at a.

Proof. Let B = Df(a). For h near 0 but different from 0, write

f(a + h) − f(a) = B·h + |h| · [f(a + h) − f(a) − B·h] / |h|.

By hypothesis, the expression in brackets approaches 0 as h approaches 0. Then, by our basic theorems on limits,

lim_{h→0} [f(a + h) − f(a)] = 0.

Thus f is continuous at a. □

We shall deal with composites of differentiable functions in §7.

Now we show how to calculate Df(a), provided it exists. We first introduce the notion of the "partial derivatives" of a real-valued function.

Definition. Let A ⊂ R^m; let f : A → R. We define the jth partial derivative of f at a to be the directional derivative of f at a with respect to the vector e_j, provided this derivative exists; and we denote it by D_j f(a). That is,

D_j f(a) = lim_{t→0} [f(a + t e_j) − f(a)] / t.

Partial derivatives are usually easy to calculate. Indeed, if we set

φ(t) = f(a_1, ..., a_{j−1}, t, a_{j+1}, ..., a_m),

then the jth partial derivative of f at a equals, by definition, simply the ordinary derivative of the function φ at the point t = a_j. Thus the partial derivative D_j f can be calculated by treating x_1, ..., x_{j−1}, x_{j+1}, ..., x_m as constants, and differentiating the resulting function with respect to x_j, using the familiar differentiation rules for functions of a single variable.

We begin by calculating the derivative Df in the case where f is a real-valued function.

Theorem 5.3. Let A ⊂ R^m; let f : A → R. If f is differentiable at a, then

Df(a) = [D_1 f(a)  D_2 f(a)  ···  D_m f(a)].

That is, if Df(a) exists, it is the row matrix whose entries are the partial derivatives of f at a.

Proof. By hypothesis, Df(a) exists and is a matrix of size 1 by m. Let

Df(a) = [λ_1  λ_2  ···  λ_m].

It follows (using Theorem 5.1) that

D_j f(a) = f'(a; e_j) = Df(a)·e_j = λ_j.  □

We generalize this theorem as follows:

Theorem 5.4. Let A ⊂ R^m; let f : A → R^n. Suppose A contains a neighborhood of a. Let f_i : A → R be the ith component function of f, so that f(x) is the column matrix whose entries are f_1(x), ..., f_n(x).
(a) The function f is differentiable at a if and only if each component function f_i is differentiable at a.
(b) If f is differentiable at a, then its derivative is the n by m matrix whose ith row is the derivative of the function f_i.

This theorem tells us that Df(a) is the matrix whose successive rows are Df_1(a), ..., Df_n(a); that is, Df(a) is the matrix whose entry in row i and column j is D_j f_i(a).

Proof. Let B be an arbitrary n by m matrix. Consider the function

F(h) = [f(a + h) − f(a) − B·h] / |h|,

which is defined for 0 < |h| < ε (for some ε). Now F(h) is a column matrix of size n by 1. Its ith entry satisfies the equation

F_i(h) = [f_i(a + h) − f_i(a) − (row i of B)·h] / |h|.

Let h approach 0. Then the matrix F(h) approaches 0 if and only if each of its entries approaches 0. Hence if B is a matrix for which F(h) → 0, then the ith row of B is a matrix for which F_i(h) → 0. And conversely. The theorem follows. □

Let A ⊂ R^m and f : A → R^n. If the partial derivatives of the component functions f_i of f exist at a, then one can form the matrix that has D_j f_i(a) as its entry in row i and column j. This matrix is called the Jacobian matrix of f. If f is differentiable at a, this matrix equals Df(a).
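The entries of the Jacobian matrix can be approximated directly from the definition of the partial derivatives, one column at a time. The sketch below is only an illustration of that definition (the helper name, step size, and sample function are my own choices); of course, as the next paragraph points out, the mere existence of these entries does not by itself guarantee that f is differentiable.

```python
import numpy as np

def jacobian_fd(f, a, t=1e-6):
    """Approximate the Jacobian matrix of f at a: column j holds the difference
    quotients [f(a + t*e_j) - f(a)] / t, i.e. estimates of D_j f_i(a)."""
    a = np.asarray(a, dtype=float)
    fa = np.asarray(f(a), dtype=float)
    J = np.empty((fa.size, a.size))
    for j in range(a.size):
        e_j = np.zeros_like(a)
        e_j[j] = 1.0
        J[:, j] = (np.asarray(f(a + t * e_j)) - fa) / t
    return J

# f(x, y) = (x*y, x + y^2); its Jacobian at (x, y) is [[y, x], [1, 2y]].
f = lambda p: np.array([p[0] * p[1], p[0] + p[1]**2])
print(jacobian_fd(f, [2.0, 3.0]))   # approximately [[3, 2], [1, 6]]
```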
However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist, without it following that f is differentiable at a. (See Example 3 preceding.) This fact leaves us in something of a quandary. We have no convenient way at present for determining whether or not a function is differentiable (other than going back to the definition). We know that such familiar functions as sin(xy) and + xy 2 ze:cy have partial derivatives, for that fact is a consequence of familiar theorems from single-variable analysis. But we do not know they are differentiable. We shall deal with this problem in the next section. 48 Differentiation Chapter 2 = = REMARK. If m 1 or n 1, our definition of the derivative is simply a reformulation, in matrix notation, of concepts familiar from calculus. For instance, if/ : R1 --+ R3 is a differentiable function, its derivative is the column matrix r/{(t)] D f (t) = /Ht) . n(t) In calculus, f is often interpreted as a parametrized-curve, and the vector is called the velocity vector of the curve. (Of course, in calculus one is apt to k use i,J, and for the unit basis vectors in R3 rather than e1 ,e2 , and e3 .) For another example, consider a differentiable function g : R3 --+ R1 . Its derivative is the row matrix and the directional derivative equals the matrix product Dg(x)-u. In calculus, the function g is often interpreted as a scalar field, and the vector field is called the gradient of g. (It is often denoted by the symbol ~g.) The directional derivative of g with respect to u is written in calculus as the dot product of the vectors grad g and u. Note that vector notation is adequate for dealing with the derivative of / when either the domain or the range of f has dimension 1. For a general function / : Rm -+ Rn, matrix notation is needed. EXERCISES 1. Let AC Rm; let f: A - Rn. Show that if f'(a; u) exists, then /'(a; cu) exists and equals cf'(a; u). 2. Let f : R2 --+ R be defined by setting / (0) = 0 and f(x,y) = xy/(x2 + y2) if (x,y) =I- 0. (a) For which vectors u =:/- 0 does f'(0; u) exist? Evaluate it when it exists. (b) Do Di/ and D2/ exist at 0? (c) Is / differenti able at 0? (d) Is/ continuous at 0? §6. Continuously Differentiable Functions 49 3. Repeat Exercise 2 for the function f defined by setting / (0) = 0 and = f(x, y) x 2 y2 /(:r:2y2 + (y - x)2) if (x, y) =fi 0. = 4. Repeat Exercise 2 for the function / defined by setting /(0) 0 and 5. Repeat Exercise 2 for the function f(x,y)=lxl+IYI• 6. Repeat Exercise 2 for the function = 7. Repeat Exercise 2 for the function / defined by setting /(0) 0 and f(x, y) = x Iy 1/(x2 + 11)112 if (x, y) =fi 0. §6. CONTINUOUSLY DIFFERENTIABLE FUNCTIONS In this section, we obtain a useful criterion for differentiability. We know that mere existence of the partial derivatives does not imply differentiability. If, however, we impose the (comparatively mild) additional condition that these partial derivatives be continuous, then differentiability is assured. We begin by recalling the mean-value theorem of single-variable analysis: Theorem 6.1 (Mean-value theorem). If: [a, b] -+ R is continu- ous at each point of the closed interval [a, b], and differentiable at each point of the open interval (a,b), then there exists a point c of (a,b) such that (b) - (a)= ¢/(c)(b- a). □ In practice, we most often apply this theorem when is differentiable on an open interval containing [a,b]. In this case, of course, is continuous on [a,b]. 50 Differentiation Chapter 2 Theorem 6.2. Let A be open in Rm. 
Suppose that the partial derivatives D;fi(x) of the component functions off exist at each point x of A and are continuous on A. Then f is differentiable at each point of A. A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class C 1, on A. Proof In view of Theorem 5.4, it suffices to prove that each component function of f is differentiable. Therefore we may restrict ourselves to the case of a real-valued function f : A -+ R. Let a be a point of A. We are given that, for some€, the partial derivatives D; f(x) exist and are continuous for Ix - al < €. We wish to show that f is differentiable at a. Step 1. Let h be a point of Rm with O < lhl < ~; let h1, ... , hm be the components of h. Consider the following sequence of points of Rm: Po= a, P1=a+h1e1, P2 =a+ h1e1 + h2e2, The points Pi all belong to the (closed) cube C of radius Ih I centered at a. = Figure 6.1 illustrates the case where m 3 and all hi are positive. Figure 6.1 Since we are concerned with the differentiability of f, we shall need to deal with the difference f(a + h) - f(a). We begin by writing it in the form L m /(a+ h) - f(a) = [f(p;) - /(P;-i)]. j=l §6. Continuously Differentiable Functions 51 Consider the general term of this summation. Let j be fixed, and define (t) = f (P;-1 + te; ). Assume h; f. 0 for the moment. As t ranges over the closed interval I with end points O and h;, the point P;-i + te; ranges over the line segment from P;-1 to P;i this line segment lies in C, and hence in A. Thus is defined for t in an open interval about I. Ast varies, only the j th component of the point P;-i +te; varies. Hence because D;f exists at each point of A, the function is differentiable on an open interval containing I. Applying the mean-value theorem to , we conclude that (h;) - (0) = '(c; )h; for some point c; between O and h;. (This argument applies whether h; is positive or negative.) We can rewrite this equation in the form • where q; is the point P;-1 + c;e; of the line segment from P;-1 to P;, which lies in C. = We derived (**) under the assumption that h; -1- 0. If h; 0, then (**) holds automatically, for any point q; of C. Using (**), we rewrite (*) in the form L m /(a +h)- /(a)= D;/(q;)h;, j=l where each point Qj lies in the cube C of radius lhl centered at a. Step 2. We prove the theorem. Let B be the matrix B = [D1/(a) ••• Dm/(a)]. Then L m B -h = D;f(a)h;. j=l Using(***), we have t /(a+h)- /(a)- B •h = [D;/(q;)- D;f(a)]h;; lhl j=l lhl then we let h -+ 0. Since Q; lies in the cube C of radius lhl centered at a, we have q,; -+ a. Since the partials of / are continuous at a, the factors in 52 Differentiation Chapter 2 brackets all go to zero. The factors h; /lhl are of course bounded in absolute value by 1. Hence the entire expression goes to zero, as desired. D One effect of this theorem is to reassure us that the functions familiar to us from calculus are in fact differentiable. We know how to compute the partial derivatives of such functions as sin(xy) and xy 2 + zexy, and we know that these partials are continuous. Therefore these functions are differentiable. In practice, we usually deal only with functions that are of class C1. While it is interesting to know there are functions that are differentiable but not of class C 1 , such functions occur rarely enough that we need not be concerned with them. Suppose f is a function mapping an open set A of Rm into Rn, and suppose the partial derivatives Di/i of the component functions of/ exist on A. 
These then are functions from A to R, and we may consider their partial derivatives, which have the form Dk(D; Ji) and are called the second-order partial derivatives of /. Similarly, one defines the third-order partial derivatives of the functions fi, or more generally the partial derivatives of order r for arbitrary r. If the partial derivatives of the functions /i of order less than or equal to r are continuous on A, we say / is of class er on A. Then the function / is of class er on A if and only if each function D;/i is of class cr-1 on A. We say f is of class C00 on A if the partials of the functions /, of all orders are continuous on A. As you may recall, for most functions the "mixed" partial derivatives are equal. This result in fact holds under the hypothesis that the function / is of class C2 , as we now show. Theorem 6.3. Let A be open in Rm; let f : A-+ R be a Junction of class C2 . Then for each a E A, Proof Since one calculates the partial derivatives in question by letting all variables other than Xk and Xj remain constant, it suffices to consider the case where / is a function merely of two variables. So we assume that A is open in R2 , and that / : A -+ R2 is of class C2 . Step 1. We first prove a certain "second-order" mean-value theorem for /. Let Q = [a, a+ h] x [b, b + k] §6. Continuously Differentiable Functions 53 be a rectangle contained in A. Define >.(h,k) = f(a,b)- /(a+ h, b) - f(a,b + k) + f(a + h, b+ k). Then >. is the sum, with appropriate signs, of the values of / at the four vertices of Q. See Figure 6.2. We show that there are points p and q of Q such that >.(h,k) = D2D1/(p) •hk, and >.(h,k) = D1D2/(q) •hk. b+k b T I I I + I l a s Figure 6.2 a+h By symmetry, it suffices to prove the first of these equations. To begin, we define (a+ h)- (a)= >.(h, k), as you can check. Because D 1/ exists in A, the function is differentiable in an open interval containing [a, a + h]. The mean-value theorem implies that (a + h) - '(so) •h for some So between a and a+ h. This equation can be rewritten in the form Now So is fixed, and we consider the function D1/(so, t). Because D2D1f exists in A, this function is differentiable for t in an open interval about [b, b+ k]. We apply the mean-value theorem once more to conclude that 54 Differentiation Chapter 2 for some to between b and b + k. Combining (*) and (**) gives our desired result. Step 2. We prove the theorem. Given the point a = (a,b) of A and given t > 0, let Q, be the rectangle Qt= [a,a + t] x [b,b + t]. If t is sufficiently small, Qt is contained in A; then Step 1 implies that for some point Pt in Qt. If we let t --+ 0, then Pt --+ a. Because D2D1f is continuous, it follows that A similar argument, using the other equation from Step 1, implies that The theorem follows. D EXERCISES = 1. Show that the function f (x, y) lxyl is differentiable at O, but is not of class C1 in any neighborhood of O. 2. Define / : R -+ R by setting /(0) = 0, and f (t) = t2 sin{l/t) if t-::/- 0. (a) Show/ is differentiable at 0, and calculate /'{0). (b) Calculate J1 (t) if t -::/- 0. (c) Show /' is not continuous at 0. (d) Conclude that / is differentiable on R but not of class C1 on R. 3. Show that the proof of Theorem 6.2 goes through if we assume merely that the partials D, f exist in a neighborhood of a and are continuous at a. 4. Show that if AC Rm and / : A - R, and if the partials Djf exist and are bounded in a neighborhood of a, then / is continuous at a. 5. Let f : R2 __. R2 be defined by the equation = f(r,0) (rcos0, rsin0). 
It is called the polar coordinate transformation. Continuously Differentiable Functions 55 (a) Calculate D f and det D f. (b) Sketch the image under / of the set S = [1, 2] x [O, 1r]. [Hint: Find the images under / of the line segments that bound S.] 6. Repeat Exercise 5 for the function f : R2 --+ R2 given by f(x,y) = (x2 - y2 , 2xy). Take S to be the set S = {(x,y) lx2 + y2 :S a2 and x ~ 0 and y ~ O}. [Hint: = Parametrize part of the boundary of S by setting x a cost and y = a sin t; find the image of this curve. Proceed similarly for the rest of the boundary of S.] We remark that if one identifies the complex numbers C with R2 in the usual way, then f is just the function f(z) = z 2 . 7. Repeat Exercise 5 for the function f : R2 --+ R2 given by f (x, y) = (ex cosy, ex sin y). Take S to be the set S = [O, I] x [O, 1r]. We remark that if one identifies C with R2 as usual, then f is the function f (z) = ez. 8. Repeat Exercise 5 for the function f : R3 --+ R3 given by f(p,,0) = (pcos0sin¢, psin0sincp, pcoscp). It is called the spherical coordinate transformation. Take S to be the set S = [1,2] X (0,71"/2] X (0,71"/2]. 9. Let g : R -+ R be a function of class C 2 . Show that l1. m h-o g(a+h}-2gh(2a)+g(a-h) _ - g"(a) . [Hint: Consider Step 1 of Theorem 6.3 in the case f(x, y) = g(x + y).] *10. Define f : R2 -+ R by setting f (0) = 0, and f(x,y) = xy(x2 -y2 )/(x2 + y2 ) if (x,y) =I- O. (a) Show D1f and D2/ exist at 0. (b) Calculate D1/ and D2f at (x, y) =I- 0. (c) Show/ is of class C 1 on R2 . [Hint: Show D1f(x, y) equals the prod- uct of y and a bounded function, and D2/(x,y) equals the product of x and a bounded function.] (d) Show that D2D1f and D1D2/ exist at 0, but are not equal there. 56 Differentiation §7. THE CHAIN RULE Chapter 2 In this section we show that the composite of two differentiable functions is differentiable, and we derive a formula for its derivative. This formula is commonly called the "chain rule." Theorem 7.1. Let Ac Rm; let B c Rn. Let / : A --+ Rn and g : B --+ RP' with f (A) C B. Suppose f (a) :::: b. If J is differentiable at a, and if g is differentiable at b, then the composite function go f is differentiable at a. Furthermore, D(g o f)(a) = Dg(b). D /(a), where the indicated product is matrix multiplication. Although this version of the chain rule may look a bit strange, it is really just the familiar chain rule of calculus in a new guise. You can convince yourself of this fact by writing the formula out in terms of partial derivatives. We shall return to this matter later. Proof. For convenience, let x denote the general point of Rm, and let y denote the general point of Rn. By hypothesis, g is defined in a neighborhood of b; choose f so that g(y) is defined for IY - bl < f. Similarly, since f is defined in a neighborhood of a and is continuous at a, we can choose 6 so that /(x) is defined and satisfies the condition 1/(x) - bl < f, for Ix - al < 6. Then the composite function (go f)(x) = g(/(x)) is defined for Ix - al < b. See Figure 7.1. g •c Figure 7.1 z ERP §7. The Chain Rule 57 Step 1. Throughout, let .6.(h) denote the function = .6.(h) /(a+ h) - / (a), which is defined for lhl < 6. First, we show that the quotient l.6.(h)l/lhJ is bounded for h in some deleted neighborhood of 0. For this purpose, let us introduce the function F(h) defined by setting F(O) = 0 and F(h) = [11.(h) -,ita) •h] for O< ihi < 6. Because / is differentiable at a, the function F is continuous at 0. 
Furthermore, one has the equation .6.(h) = DJ (a) •h + lhlF(h) = for O< lhl < 6, and also for h 0 (trivially). The triangle inequality implies that 1.6.(h)l < mlD f (a)I lh[ + lhl IF(h)I. Now IF(h)I is bounded for h in a neighborhood of O; in fact, it approaches 0 as h approaches 0. Therefore 1.6.(h)I / Jhl is bounded on a deleted neighborhood of 0. Step 2. We repeat the construction of Step 1 for the function g. We = define a function G(k) by setting G(O) 0 and = G(k) g(b + k) - g(b) - Dg(b) •k lkl for O< lkl < f. Because g is differentiable at b, the function G is continuous at 0. Further- more, for lkl < f, G satisfies the equation = g(b + k) - g(b) Dg(b). k + lklG(k). Step 3. We prove the theorem. Let b be any point of Rm with jhl < 6. Then l.6.(h)I < f, so we may substitute .6.(h) fork in formula (**). After this substitution, b + k becomes = = b + .6.(h) /(a)+ .6.(h) /(a+ h), so formula (**) takes the form = g(f(a + b)) - g(/(a)) Dg(b) •.6.(h) + l.6.(h)IG{.6.(h)). 58 Differentiation Chapter 2 Now we use (*) to rewrite this equation in the form 1 lh/ [g(f(a + h)) - g(/(a)) - Dg(b) • D f(a). h] = Dg(b) •F(h) + 1h111 ~(h)IG(~(h)). This equation holds for O < lhl < b. In order to show that go f is differentiable at a with derivative Dg(b) • DJ(a), it suffices to show that the right side of this equation goes to zero as h approaches 0. The matrix Dg(b) is constant, while F(h) --+ 0 as h --+ 0 (because F is continuous at O and vanishes there). The factor G(~(h)) also approaches zero as h --+ O; for it is the composite of two functions G and ~, both of which are continuous at O and vanish there. Finally, l~(h)I / lhl is bounded in a deleted neighborhood of O, by Step 1. The theorem follows. D Here is an immediate consequence: Corollary 7.2. Let A be open in Rm; let B be open in R". Let f : A --+ R" and g : B - RP, with f(A) CB. If f and g are of class er, so is the composite function go f. Proof. The chain rule gives us the formula D(g o f)(x) = Dg(f(x)) · DJ(x), which holds for x E A. Suppose first that / and g are of class C 1 . Then the entries of Dg are continuous real-valued functions defined on B; because f is continuous on A, the composite function Dg (f (x)) is also continuous on A. Similarly, the entries of the matrix D f (x) are continuous on A. Because the entries of the matrix product are algebraic functions of the entries of the matrices involved, the entries of the product Dg (J(x)} · D J(x) are also continuous on A. Then go J is of class C 1 on A. To prove the general case, we proceed by induction. Suppose the theorem is true for functions of class er- 1. Let f and g be of class Cr. Then the entries of Dg are real-valued functions of class cr-l on B. Now f is of class cr-l on A (being in fact of class Cr); hence the induction hypothesis implies that the function D;gi(f(x)), which is a composite of two functions of class cr- l, is of class cr- l. Since the entries of the matrix fl j (X) are also of class cr-l on A by hypothesis, the entries of the product Dg(f(x)) ·DJ(x) are of class cr- 1 on A. Hence go f is of class Cr on A, as desired. §7. The Chain Rule 59 er er The theorem follows for r finite. If now / and g are of class C00 , then they are of class for every r, whence 9 0 / is also of class for every r. □ As another application of the chain rule, we generalize the mean-value theorem of single-variable analysis to real-valued functions defined in Rm. We will use this theorem in the next section. Theorem 7.3 (Mean-value theorem). Let A be open in Rm; let f : A ...... 
R be differentiable on A. If A contains the line segment with = end points a and a+ h, then there is a point c a+ t0h with O < t0 < 1 of this line segment such that = /(a+ h)- f(a) D/(c) •h. = Proof. Set (t) f(a + th); then is defined fort in an open interval about [O, 1]. Being the composite of differentiable functions, is differentiable; its derivative is given by the formula '(t) = D f (a+ th)• h. The ordinary mean-value theorem implies that (1) - (O) ='(to)· 1 for some to with O< to < 1. This equation can be rewritten in the form f (a+ h) - f (a) = D f (a+ t0h) · h. □ As yet another application of the chain rule, we consider the problem of differentiating an inverse function. Recall the situation that occurs in single-variable analysis. Suppose (x) is differentiable on an open interval, with '(x) > 0 on that interval. Then is strictly increasing and has an inverse function 'Ip, which is defined by letting 1/J(y) be that unique number x such that (x) = y. The function 'l/J is in fact differentiable, and its derivative satisfies the equation = tf/(y) 1/'(x), = where y (x). There is a similar formula for differentiating the inverse of a function / of several variables. In the present section, we do not consider the question whether the function f has an inverse, or whether that inverse is differentiable. We consider only the problem of finding the derivative of the inverse function. 60 Differentiation Chapter 2 = Theorem 7.4. Let A be open in Rn; let f : A --+ Rn; let f (a) b. = Suppose that g maps a neighborhood of b into Rn, that g(b) a, and = g(f(x)) X for all x in a neighborhood of a. If f is differentiable at a and if g is differentiable at b, then Proof. Let i : Rn --+ Rn be the identity function; its derivative is the identity matrix In. We are given that g(f (x)) = i(x) for all x in a neighborhood of a. The chain rule implies that Dg(b) - D /(a)= In. Thus Dg(b) is the inverse matrix to D f (a) (see Theorem 2.5). D The preceding theorem implies that if a differentiable function / is to have a differentiable inverse, it is necessary that the matrix D f be non-singular. It is a somewhat surprising fact that this condition is also sufficient for a function f of class C 1 to have an inverse, at least locally. We shall prove this fact in the next section. REMARK. Let us make a comment on notation. The usefulness of well-chosen notation can hardly be overemphasized. Arguments that are obscure, and formulas that are complicated, sometimes become beautifully simple once the proper notation is chosen. Our use of matrix notation for the derivative is a case in point. The formulas for the derivatives of a composite function and an inverse function could hardly be simpler. Nevertheless, a word may be in order for those who rememher the notation used in calculus for partial derivatives, and the version of the chain rule proved there. In advanced mathematics, it is usual to use either the functional notation ¢' or the operator notation D¢ for the derivative of a real-valued function of a real variable. (D¢ denotes a 1 by 1 matrix in this case!) In calculus, however, another notation is common. One often denotes the derivative ¢'(x) by the symbol d¢/dx, or, introducing the "variable" y by setting y = cp(x), by the symbol dy/ dx. This notation was introduced by Leibnitz, one of the originators of calculus. It comes from the time when the focus of every physical and mathematical problem was on the variables involved, and when functions as such were hardly even thought about. §7. 
The Chain Rule 61 The Leibnitz notation has some familiar virtues. For one thing, it makes the chain rule easy to remember. Given functions¢: R - R and tp: R-;, R, the derivative of the composite function tp o ¢ is given by the formula D(l/} o ¢)(x) = D¢(¢(x)) • D¢(x). If we introduce variables by setting y = ¢( x) and z = t/J(y), then the derivative of the composite function z = t/J(¢(x)) can be expressed in the Leibnitz notation by the formula dz dz dy dx = dy. dx. The latter formula is easy to remember because it looks like the formula for multiplying fractions! However, this notation has its ambiguities. The letter "z," when it appears on the left side of this equation, denotes one function (a function of x); and when it appears on the right side, it denotes a different function (a function of y). This can lead to difficulties when it comes to computing higher derivatives unless one is very careful. The formula for the derivative of an inverse function is also easy to remember. If y = ¢(x) has the inverse function x = l/J(y), then the derivative of tp is expressed in Leibnitz notation by the equation 1 dx/dy = dy/dx' which looks like the formula for the reciprocal of a fraction! The Leibnitz notation can easily be extended to functions of several vari- ables. If A C Rm and f : A - R, we often set Y = f (x) = f (x1, ... , Xm), and denote the partial derivative Di/ by one of the symbols of or OXi The Leibnitz notation is not nearly as convenient in this situation. Consider the chain rule, for example. If f •. Rm ~ ~ R" and g: Rn - R, then the composite function F;;;; go f maps Rm into R, and its derivative is given by the formula = DF(x) Dg(f(x)) • Df(x), 62 Differentiation which can be written out in the form Chapter 2 Dm-~~(x)] . Dm/n(x) The formula for the Ph partial derivative of F is thus given by the equa- tion L n DiF(x) = D,.g(f(x)) Difk(x). k=l If we shift to "variable" notation by setting y = /(x) and z = g(y ), this equation becomes this is probably the version of the chain rule you learned in calculus. Only familiarity would suggest that it is easier to remember than (*)! Certainly one cannot obtain the formula for {)zj OXj by a simple-minded multiplication of fractions, as in the single-variable case. The formula for the derivative of an inverse function is even more troublesome. Suppose f : R2 - R2 is differentiable and has a differentiable inverse function g. The derivative of g is given by the formula = Dg(y) [D/(x))-1 . where y = / (x). In Leibnitz notation, this formula takes the form = 8x1/oy2] [{)yifox1 oy1/8x2]-l 8x2/oy2 oy2/8x1 {)y2/ox2 Recalling the formula for the inverse of a matrix, we see that the partial derivative OXi/Oyj is about as far from being the reciprocal of the partial derivative oy,/OXi as one could imagine! §8. The Inverse Function Theorem 63 EXERCISES 1. Let f : R3 - R2 satisfy the conditions /(0) = (1, 2) and Df(O) = [ 1 2 3] . 0 0 1 Let g : R2 - R2 be defined by the equation g(x, y) = (x + 2y + I, 3xy). Find D(g o /)(0). 2. Let f : R2 - R3 and g: R3 - R2 be given by the equations f (x) = (€ 2 x 1 +x2 , 3x2 - cos X1, Xi+ X2 + 2), g(y) = (3y1 + 2yz + Yi, yt - y3 + 1). (a) If F(x) = g(/(x)), find DF(O). [Hint: Don't compute F explicitly.] (b) If G(y) = f (g(y)), find DG(O). 3. Let f : R3 - R and g : R2 - R be differentiable. Let F : R2 - R be defined by the equation F(x, y) = f (x, y, g(x, y)). (a) Find DF in terms of the partials off and g. (b) If F(x, y) = 0 for all (x, y), find D1g and D2g in terms of the partials off. 4. 
Let g: R2 - R2 be defined by the equation g(x, y) = (x,y + x 2). Let = / : R2 - R be the function defined in Example 3 of§ 5. Let h fog. Show that the directional derivatives of f and g exist everywhere, but that there is au =j:. 0 for which h'(O; u) does not exist. §8. THE INVERSE FUNCTION THEOREM Let A be open in Rn; let f : A -+ Rn be of class C 1. We know that for f to have a differentiable inverse, it is necessary that the derivative Df (x) of/ be non-singular. We now prove that this condition is also sufficient for / to have a differentiable inverse, at least locally. This result is called the inverse Junction theorem. We begin by showing that non-singularity of D f implies that / is locally one-to-one. 64 Differentiation Chapter 2 Lemma 8.1. Let A be open in nn ,· let f: A - Rn be of class C 1 . If D/(a) is non-singular, then there exists an o > 0 such that the inequality lf(xo) - f(xi)I > alxo - xii holds for all x0,x1 in some open cube C(a; €) centered at a. It follows that f is one-to-one on this open cube. = Proof. Let E D f(a); then E is non-singular. We first consider the linear transformation that maps x to E •x. We compute = lxo - x1 I IE-1 • (E •xo - E •xi)! < njE- 1 1 • JE •xo - E-xij. = If we set 2o I/nlE- 1 I, then for all xo, x1 in Rn, Now we prove the lemma. Consider the function = H(x) f(x)-E ·x. = Then DH(x) Df(x)-E, so thatDH(a) = 0. Because H isofclassC1, we can choose€> 0 so that JDH(x)I < a/n for x in the open cube C = C(a; €). The mean-value theorem, applied to the ith component function of H, tells us that, given xo, x1 E C, there is a c E C such that Then for xo, x1 EC, we have olxo - x1 I> IH(xo) - H(x1)I =I/(xo) - E •Xo - f (xi) + E •x1 I >IE· x1 - E •xol - lf(x1) - /(xo)I ~ 20:lx1 - xol - lf(x1) - f(xo)I- The lemma follows. D Now we show that non-singularity of D f, in the case where f is one-to- one, implies that the inverse function is differentiable. §8. The Inverse Function Theorem 65 Theorem 8.2. Let A be open in Rn; let f : A -+ Rn be of class er; let B = f(A). If f is one-to-one on A and if Df(x) is non-singular for x E A, then the set B is open in Rn and the inverse function g : B -+ A is of class er. Proof. Step 1. We prove the following elementary result: If : A -+ R = is differentiable and if has a local minimum at x 0 E A, then D has a local minimum at x0 means that ef>(x0) for all x in a neighborhood of x 0 . Then given u f. 0, ef>(xo + tu) - ef>(xo) > 0 for all sufficiently small values oft. Therefore = r/>'(xo; u) lim [ef>(xo + tu) - r/>(xo)]/t t-+0 is non-negative if t approaches O through positive values, and is non-positive = if t approaches O through negative values. It follows that ef>'(x0 ; u) 0. In = = particular, D;r/>(xo) 0 for all j, so that Def>(xo) 0. Step 2. We show that the set B is open in Rn. Given b E B, we show B contains some open ball B(b; 6) about b. We begin by choosing a rectangle Q lying in A whose interior contains = the point a J- 1(b) of A. The set Bd Q is compact, being closed and bounded in Rn. Then the set /(Bd Q) is also compact, and thus is closed and bounded in Rn. Because f is one-to-one, f(Bd Q) is disjoint from b; because f(Bd Q) is closed, we can choose 6 > 0 so that the ball B(b; 26) is disjoint = from f(Bd Q). Given c E B(b; 6) we show that c f(x) for some x E Q; it = then follows that the set f (A) B contains each point of B(b; 6), as desired. See Figure 8.1. 
-I - Figure 8.1 /(Bd Q) 66 Differentiation Chapter 2 Given c E B(b; «5), consider the real-valued function (x) = 11/(x) - cll2, which is of class er. Because Q is compact, this function has a minimum value on Q; suppose that this minimum value occurs at the point x of Q. We = show that /(x) c. Now the value of at the point a is (a) = 11/(a) - cll 2 = llb - cll2 < «52 • Hence the minimum value of on Q must be less than «52 . It follows that this minimum value cannot occur on Bd Q, for if x E Bd Q, the point f (x) lies outside the ball B(b; 2«5), so that 11/(x) - ell > «5. Thus the minimum value of occurs at a point x of Int Q. Because x is interior to Q, it follows that has a local minimum at x; then by Step 1, the derivative of vanishes at x. Since = Ln (x) (fk(x) - c1:)2, n = L D;(x) 2(/1:(x)- c1:)D;Jk(x). k=l = The equation D(x) 0 can be written in matrix form as = (/n(x) - Cn)] •D/(x) 0. Now D f(x) is non-singular, by hypothesis. Multiplying both sides of this = equation on the right by the inverse of D /(x), we see that f(x) - c O, as desired. Step 3. The function / : A --+ B is one-to-one by hypothesis; let g : B--+ A be the inverse function. We show g is continuous. Continuity of g is equivalent to the statement that for each open set U of = = A, the set V g- 1 (U) is open in B. But V f(U); and Step 2, applied to the set U, which is open in A and hence open in Rn, tells us that V is open in Rn and hence open in B. See Figure 8.2. §8. The Inverse Function Theorem 67 I g Figure 8.2 It is an interesting fa.ct that the results of Steps 2 and 3 hold without assuming that D f (x) is non-singular, or even that / is di:fferentiable. If A is open in Rn and / : A-+ Rn is continuous and one-to-one, then it is true that /(A) is open in Rn and the inverse function g is continuous. This result is known as the Brouwer theorem on invariance of domain. Its proof requires the tools of algebraic topology and is quite difficult. We have proved the differentiable version of this theorem. Step 4. Given b E B, we show that g is differentiable at b. -r~7)- Let a be the point g(b), and let E = D f (a). We show that the function G(k) = [g(b + k) E-•. k], which is defined _for k in a deleted neighborhood of 0, approaches 0 as k approaches 0. Then g is differentiable at b with derivative E- 1. Let us define d(k) = g(b + k) - g(b) for k near 0. We first show that there is an f > 0 such that ld(k)l/lkl is bounded for O < lkl < f. (This would follow from differentiability of g, but that is what we are trying to prove!) By the preceding lemma, there is a neighborhood C of a and an a > 0 such that lf(xo) - f(x1)I > alxo - x1 I for x0 ,x1 EC. Now /(C) is a neighborhood of b, by Step 2; choose f so that = h+k is in /(C) whenever lkl < f. Then for lkl < f, we can set xo g(b + k) and x 1 = g(b) and rewrite the preceding inequality in the form [(b + k) - bl> alg(b + k) - g(b)I, 68 Differentiation Chapter 2 which implies that 1/a> l~(k)l/lkl, as desired. Now we show that G(k) ---. 0 as k ---. 0. Let 0 < lkl < f. We have = G(k) t.(k) lkf-' •k by definition, = -E-1 . [k - E •~(k)] l~(k)I l~(k)I lkl • (Here we use the fact that ~(k) # 0 for k # 0, which follows from the fact that g is one-to-one.) Now E- 1 is constant, and l~(k)I/ lkl is bounded. It remains to show that the expression in brackets goes to zero. We have = b + k = f (g(b + k)) = f(g(b) + 6.(k)) f (a+ ~(k)). Thus the expression in brackets equals f (a+ ~(k)) - /(a) - E · ~(k) l~(k)I Let k ---. 0. Then ~(k) ---. 0 as well, because g is continuous. 
Since f is differentiable at a with derivative E, this expression goes to zero, as desired. Step 5. Finally, we show the inverse function g is of class Cr. Because g is differentiable, Theorem 7.4 applies to show that its derivative is given by the formula Dg(y) = [D/(g(y))J- 1 , for y E B. The function Dg thus equals the composite of three functions: B-.!....+ A !!.L GL(n) ~ GL(n), where GL(n) is the set of non-singular n by n matrices, and I is the function that maps each non-singular matrix to its inverse. Now the function I is given by a specific formula involving determinants. In fact, the entries of l(C) are rational functions of the entries of C; as such, they are C00 functions of the entries of C. We proceed by induction on r. Suppose f is of class C1 . Then D f is continuous. Because g and I are also continuous (indeed, g is differentiable and I is of class C00 ), the composite function, which equals Dg, is also continuous. Hence g is of class C 1 . Suppose the theorem holds for functions of class cr- 1. Let f be of class er. Then in particular f is of class cr- l, so that (by the induction hypothesis), the inverse function g is of class cr-l _ Furthermore, the function D f is of class cr-l. VVe invoke Corollary 7.2 to conclude that the composite function, which equals Dg, is of class er- 1. Then g is of class er. □ Finally, we prove the inverse function theorem. §8. The Inverse Function Theorem 69 Theorem 8.3 (The inverse function theorem). Let A be open in R"; let f : A --+ R" be of class er. If Df(x) is non-singular at the point a of A, there is a neighoorhood U of the point a such that f carries U in a one-to-one fashion onto an open set V of R" and the inverse function is of class er. Proof. By Lemma 8.1, there is a neighborhood U0 of a on which f is one-to-one. Because det Df(x) is a continuous function ofx, and det Df(a) f; 0, there is a neighborhood U1 of a such that det D f (x) f; 0 on U1 . If U equals the intersection of Uo and U1, then the hypotheses of the preceding theorem are satisfied for / : U --+ R". The theorem follows. D This theorem is the strongest one that can be proved in general. While the non-singularity of D f on A implies that / is locally one-to-one at each point of A, it does not imply that f is one-to-one on all of A. Consider the following example: EXAMPLE 1. Let / : R2 -+ R2 be defined by the equation J (r, 9) = (r cos 9, r sin 9). Then Df(r,9)= [ cos 9 -rsin 9] , sin 9 r cos (J so that det Df(r,9) = r. Let A be the open set (0, 1) x (O,b) in the (r,8) plane. Then DJ is non- singular at each point of A. However, / is one-to-one on A only if b < 2,r. See Figures 8.3 and 8.4. 8 1 ----I--- y Figure 8.3 70 Differentiation () 1 ---f -- Chapter 2 y Figure B.4 EXERCISES 1. Let f : R2 - R2 be defined by the equation J(x,y) = (x2 -'Jt,2xy). (a) Show that f is one-to-one on the set A consisting of all (x, y) with = = x > 0. [Hint: If f(x,y) f(a,b), then 11/(x,y)II 11/(a,b)II-] = (b) What is the set B f(A)? (c) If g is the inverse function, find Dg(O, 1). 2. Let / : R2 - R2 be defined by the equation = f(x,y) (excosy,exsiny). (a) Show that / is one-to-one on the set A consisting of all (x, y) with 0 < y < 2,r. [Hint: See the hint in the preceding exercise.] (b) What is the set B = f (A)? (c) If g is the inverse function, find Dg(O, 1). = 3. Let / : Rn - Rn be given by the equation / (x) llxll2 • x. Show that / is of class C00 and that / carries the unit ball B(O; 1) onto itself in a one-to-one fashion. 
Show, however, that the inverse function is not differentiable at 0. 4. Let g ; R2 - R2 be given by the equation Let / : R2 - R3 be given by the equation f(x, y) = (3x - y2 , 2x + y, xy + y3). §9. The Implicit Function Theorem 71 (a) Show that there is a neighborhood of (0, 1) that g carries in a oneto-one fashion onto a neighborhood of (2, 0). (b) Find D(f o g-1 ) at {2, 0). 5. Let A be open in Rn; let /: A-+ Rn be of class Cr; assume D/(x) is non-singular for x E A. Show that even if/ is not one-to-one on A, the set B = /(A) is open in Rn. *§9. THE IMPLICIT FUNCTION THEOREM The topic of implicit differentiation is one that is probably familiar to you from calculus. Here is a typical problem: "Assume that the equation x3 y + 2e~Y = 0 determines y as a differentiable function of x. Find dy/ dx ." One solves this calculus problem by "looking at y as a function of x," and differentiating with respect to x. One obtains the equation which one solves for dy/dx. The derivative dy/dx is of course expressed in terms of x and the unknown function y. The case of an arbitrary function f is handled similarly. Supposing that the equation f(x, y) = 0 determines y as a differentiable function of x, say y = g(x), the equation J(x,g(x)) = 0 is an identity. One applies the chain rule to calculate of/ox+ (of /oy)g'(x) = 0, so that , 8//ox 9 (x) = - 8 f /8y ' where the partial derivatives are evaluated at the point (x,g(x)). Note that the solution involves a hypothesis not given in the statement of the problem. In order to find g'( x ), it is necessary to assume that Of/8y is non-~ero at the point in question. It in fact turns out that the non-vanishing of 8 J/ 8y is also sufficient to justify the assumptions we made in solving the problem. That is, if the function f(x,y) has the property that 8f/8y "IO at a point (a,b) that is a solution of the equation f(x,y) = 0, then this equation does determine y as a function of x, for x near a, and this function of x is differentiable. 72 Differentiation Chapter 2 This result is a special case of a theorem called the implicit function theorem, which we prove in this section. The general case of the implicit function theorem involves a system of equations rather than a single equation. One seeks to solve this system for some of the unknowns in terms of the others. Specifically, suppose that J : Rk+n -+ Rn is a function of class C 1 . Then the vector equation is equivalent to a system of n scalar equations ink+ n unknowns. One would expect to be able to assign arbitrary values to k of the unknowns and to solve for the remaining unknowns in terms of these. One would also expect that the resulting functions would be differentiable, and that one could by implicit differentiation find their derivatives. There are two separate problems here. The first is the problem of finding the derivatives of these implicitly defined functions, assuming they exist; the solution to this problem generalizes the computation of g'(x) just given. The second involves showing that (under suitable conditions) the implicitly defined functions exist and are differentiable. In order to state our results in a convenient form, we introduce a new notation for the matrix D f and its submatrices: Definition. Let A be open in Rm; let f : A-+ Rn be differentiable. Let f1, ... , fn be the component functions off. We sometimes use the notation for the derivative off. On occasion we shorten this to the notation DJ= 8J/8x. 
More generally, we shall use the notation 8(fi1,•",fi,,) 8(xii,··•,x;,) to denote the k by f matrix that consists of the entries of D f lying in rows ii, ... , i1: and columns Ji, ... ,Jt- The general entry of this matrix, in row p and column q, is the partial derivative 8fip/8x;q• Now we deal with the problem of finding the derivative of an implicitly defined function, assuming it exists and is differentiable. For simplicity, we shall assume that we have solved a system of n equations ink+ n unknowns for the last n unknowns in terms of the first k unknowns. The Implicit Function Theorem 73 Theorem 9.1. Let A be open in Rk+n; let f : A -+ Rn be differentiable. Write fin the form f(x,y), for x E Rk and y E Rn; then DJ has the form l· D f = [8f / 8x 8f / 8y Suppose there is a differentiable function g : B -+ Rn defined on an open set B in Rk, such that f (x,g(x)) =0 for all x EB. Then for x EB, 8f Bx (x,g(x)) + 8f By (x,g(x)) • Dg(x) = 0. This equation implies that if the n by n matrix 8 f / 8y is non-singular at the point (x, g(x)), then Dg(x) = - [8Byf i-l (x, g(x)) • 8Bxf (x, g(x )) . = = Note that in the case n k 1, this is the same formula for the derivative that was derived earlier; the matrices involved are 1 by 1 matrices in that case. Proof. Given g, let us define h : B-+ Rk+n by the equation h(x) = (x, g(x)). The hypotheses of the theorem imply that the composite function JI (X) = f (h(X)) = f (X' g(X)) is defined and equals zero for all x E B. The chain rule then implies that = = o DH(x) D f (h(x)) · Dh(x) :t l· = [:~ (h(x)) (h(x)) [n:(x)] = :: (h(x)) + :~ (h(x)) •Dg(x), as desired. D The preceding theorem tells us that in order to compute Dg, we must assume that the matrix 8 f / 8y is non-singular. Now we prove that the non- singularity of 8 f / 8y suffices to guarantee that the function g exists and is differentiable. 74 Differentiation Chapter 2 Theorem 9.2 (Implicit function theorem). Let A be open in Rk+n; let f : A -+ R" be of class er. Write f in the form f (x, y)' for x E Rk and y E Rn. Suppose that (a, b) is a point of A such that f(a, b) =0 and det 8f By (a, b) # 0. Then there is a neighborhood B of a in Rk and a unique continuous = function g: B-+ Rn such that g(a) b and = f (x,g(x)) 0 for all x E B. The function g is in fact of class er. Proof. We construct a function F to which we can apply the inverse function theorem. Define F : A ____,. Rk+n by the equation F(x,y) = (x,f(x,y)). Then F maps the open set A of Rk+n into Rk x Rn = Rk+n. Furthermore, DF = [&!i&x &f~&J • Computing 0 (by continuity of g and go). Since B is connected, the latter set must be empty. D In our proof of the implicit function theorem, there was of course nothing special about solving for the last n coordinates; that choice was made simply for convenience. The same argument applies to the problem of solving for any n coordinates in terms of the others. 76 Differentiation Chapter 2 For example, suppose A is open in R5 and f : A ----► R2 is a function = of class er. Suppose one wishes to "solve" the equation f (x, y, z, u, v) 0 for the two unknowns y and u in terms of the other three. In this case, the = implicit function theorem tells us that if a is a point of A such that f (a) 0 and 8/ det 8(y,u) (a) -I 0, = then one can solve for y and u locally near that point, say y ( x, z, v) and = u 1/J(x, z, v). Furthermore, the derivatives of and 1/; satisfy the formula 1- l =- 8(, 1P) 8(x, z, v) [ 81 l [ 81 8(y, u) • 8(x, z, v) • EXAMPLE 1. Let /: R2 ---.. R be given by the equation f(x,y) = x2 + y2 - 5. 
= = Then the point (x, y) (1, 2) satisfies the equation /(x, y) 0. Both {)J /8x and {Jf / {)y are non-zero at (1,2), so we can solve this equation locally for either variable in terms of the other. In particular, we can solve for yin terms of x, obtaining the function y = g(x) = [5 - ;z;2]112 . Note that this solution is not unique in a neighborhood of x = 1 unless we specify that g is continuous. For instance, the function for X ~ 1, for z < 1 satisfies the same conditions, but is not continuous. See Figure 9.2. 1 Figure 9.2 §9. The Implicit Function Theorem 77 = EXAMPLE 2. Let/ be the function of Example 1. The point (x, y) (v's, 0) also satisfies the equation / (x, y) = 0. The derivative lJf / IJy vanishes at (-Is, 0), so we do not expect to be able to solve for y in terms of x near this point. And, in fact, there is no neighborhood B of -Is on which we can solve for yin terms of x. See Figure 9.3. (v's, 0) Figure 9.3 EXAMPLE 3. Let / : R2 - R be given by the equation = Then (0,0) is a solution of the equation f(x, y) 0. Because IJJ /IJy vanishes at (0,0), we do not expect to be able to solve this equation for yin terms of x near (0,0). But in fact, we can; and furthermore, the solution is unique! = However, the function we obtain is not differentiable at x 0. See Figure 9.4. Figure 9.4 EXAMPLE 4. Let / : R2 - R be given by the equation = J(x,y) y2 - x 4 • = Then (0,0) is a solution of the equation f(x, y) 0. Because IJ/ /IJy vanishes at {0,0), we do not expect to be able to solve for yin terms of x near (0,0). In 78 Differentiation Chapter 2 fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique. Figure 9.5 = Now the point (1,2) also satisfies the equation f(x, y) 0. Because {)J /{)y is non-zero at (1,2), one can solve this equation for y as a continuous = function of x in a neighborhood of x 1. See Figure 9.5. One can in fact express y as a continuous function of x on a larger neighborhood than the one pictured, but if the neighborhood is large enough that it contains 0, then the solution is not unique on that larger neighborhood. EXERCISES 1. Let / : R3 - R2 be of class C 1 ; write/ in the form f(x, Y1, Y2). Assume that /(3, -1, 2) = 0 and D/(3, -1, 2) = 1 [1 2 -1 (a) Show there is a function g : B - R2 of class C1 defined on an open set B in R such that = I (X' 91 (X), g2 ( X)) 0 for x E B, and g(3) = (-1, 2). (b) Find Dg(3). = (c) Discuss the problem of solving the equation / (x, Y1, Y2) 0 for an arbitrary pair of the unknowns in terms of the third, near the point (3, -1, 2). 2. Given/: R5 - R2, of class C1 . Let a= (1,2,-1,3,0); suppose that /(a)= 0 and 3 1 -1 DJ(a) = [: 01 2 §9. The Implicit Function Theorem 79 (a) Show there is a function g : B - R2 of class C1 defined on an open set B of R3 such that = for x (x1, z2, xa) EB, and g(l, 3, 0) = (2, -1). (b) Find Dg(l, 3, 0). = (c) Discuss the problem of solving the equation / (x) 0 for an arbitrary pair of the unknowns in terms of the others, near the point a. 2- = 3. Let / : R R be of class C1, with /(2, -1) -1. Set G(x,y,u) = f(x,y) +u2, = H(x,y,u) ux+ 3y3 +u3 • The equations G(x, y, u) = 0 and H(x, y, u) = 0 have the solution = (x, y, u) (2, -1, 1). = (a) What conditions on DJ ensure that there are C1 functions z g(y) = and u h(y) defined on an open set in R that satisfy both equations, such that g(-1) = 2 and h(-1) = 1? = (b) Under the conditions of (a), and assuming that D/(2, -1) [1 -3], find g'(-1) and h'(-1). 4. 
Let F : R2 - R be of class C2 I with F(O, 0) = 0 and DF(0, 0) = (2 3). Let G : R3 - R be defined by the equation = G(x,y,z) F(x+2y+3z -I,x3 +y2- z2 ). = = (a) Note that G(-2, 3, -1) F(0, 0) O. Show that one can solve = = the equation G(x, y, z) 0 for z, say z g(x, y), for (z, y) in a = neighborhood B of (-2, 3), such that g(-2, 3) -1. (b) Find Dg(-2, 3). = = = *(c) If D1D1F 3 and D1D2F -1 and D2D2F 5 at (0,0), find D2D1g(-2, 3). 5. Let /, g : R3 - R be functions of class C1 . "In general," one expects = = that each of the equations /(x, y, z) 0 and g(x, y, z) 0 represents a smooth surface in R3, and that their intersection is a smooth curve. Show that if (x0 , y0 , zo) satisfies both equations, and if IJ(f, g)/8(x, y, z) has rank 2 at (xo, Yo, zo), then near (xo, Yo, zo), one can solve these equations for two of x, y, z in terms of the third, thus representing the solution set locally as a parametrized curve. 6. Let/: Rk+n - R" be of class C 1 ; suppose that /(a)= 0 and that D/(a) has rank n. Show that if c is a point of R" sufficiently close to O, then = the equation / (x) c has a solution. Integration In this chapter, we define the integral of real-valued function of several real variables, and derive its properties. The integral we study is called Riemann integral; it is a direct generalization of the integral usually studied in a first course in single-variable analysis. §10. THE INTEGRAL OVER A RECTANGLE We begin by defining the volume of a rectangle. Let be a rectangle in R". Each of the intervals [ai, bi] is called a component interval of Q. The maximum of the numbers b1 - a1, ... , bn - an is called the width of Q. Their product is called the volwne of Q. = In the case n l, the volume and the width of the (I-dimensional) rectangle [a,bJ are the same, namely, the number b- a. This number is also called the length of [a,b]. Definition. Given a closed interval [a, b] of R, a partition of [a, bJ is a finite collection P of points of [a, b] that includes the points a and b. We 81 82 Integration Chapter 3 usually index the elements of P in increasing order, for notational convenience, as a = to < ti < · · · < t" = b; each of the intervals [ti-l, ti], for i = 1, ... , k, is called a subinterval determined by P, of the interval [a, b]. More generally, given a rectangle in Rn, a partition P of Q is an n-tuple (P1, ... , Pn) such that P; is a partition of [a;, b;] for each j. If for each j, I; is one of the subintervals determined by P; of the interval [a;, b;], then the rectangle is called a subrectangle determined by P, of the rectangle Q. The maxi- mum width of these subrectangles is called the mesh of P. Definition. Let Q be a rectangle in Rn; let f : Q -+ R; assume / is bounded. Let P be a partition of Q. For each subrectangle R determined by P, let mn(/) = inf{/(x) Ix E R}, Mn(/) = sup{/(x) Ix E R}. We define the lower sum and the upper sum, respectively, of /, determined by P, by the equations L L(f, P) = mn(f) •v(R), n L U(f,P) = Mn(/)• v(R), n where the summations extend over all subrectangles R determined by P. Let P = (P1, ... , Pn) be a partition of the rectangle Q. If P" is a partition of Q obtained from P by adjoining additional points to some or all of the partitions P1, ... , Pn, then P" is called a refinement of P. Given two partitions P and P' = (P{, ... , P~) of Q, the partition is a refinement of both P and P'; it is called their common refinement. 
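These definitions are easy to experiment with numerically when the infimum and supremum over each subrectangle are easy to locate. The sketch below is only an illustration, not part of the original text; the helper names and the sample function are my own, and it assumes f is increasing in each variable, so that m_R(f) and M_R(f) occur at the lower-left and upper-right corners of each subrectangle R.

```python
import itertools
import numpy as np

def lower_upper_sums(f, partitions):
    """Lower and upper sums L(f,P), U(f,P) over the rectangle determined by
    `partitions` (a list of 1-d partitions), for an f that is increasing in
    each variable, so inf/sup on each subrectangle occur at its corners."""
    L = U = 0.0
    # Each subrectangle is a product of consecutive subintervals.
    for idx in itertools.product(*(range(len(p) - 1) for p in partitions)):
        lo = [partitions[j][i] for j, i in enumerate(idx)]
        hi = [partitions[j][i + 1] for j, i in enumerate(idx)]
        vol = np.prod([h - l for l, h in zip(lo, hi)])
        L += f(lo) * vol     # m_R(f) * v(R)
        U += f(hi) * vol     # M_R(f) * v(R)
    return L, U

# f(x, y) = x + y on Q = [0,1] x [0,2], with four subintervals in each factor.
f = lambda p: p[0] + p[1]
P = [np.linspace(0, 1, 5), np.linspace(0, 2, 5)]
print(lower_upper_sums(f, P))   # (2.25, 3.75), bracketing the integral 3
```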
Passing from P to a refinement of P of course affects lower sums and upper sums; in fact, it tends to increase the lower sums and decrease the upper sums. That is the substance of the following lemma: §10. The Integral Over a Rectangle 83 Lemma 10.1. Let P be a partition of the rectangle Q; let f ; Q -+ R be a bounded function. If P" is a refinement of P, then L(f, P) < L(f, P") and U(f, P") < U(f, P). Proof Let Q be the rectangle = Q [a1,b1] X • • • X [an,bn]• It suffices to prove the lemma when P" is obtained by adjoining a single additional point to the partition of one of the component intervals of Q. Suppose, to be definite, that P is the partition (P 1, ... , Pn) and that P" is obtained by adjoining the point q to the partition P1. Further, suppose that P1 consists of the points and that q lies interior to the subinterval [ti-i, ti]We first compare the lower sums L(f, P) and L(f, P"). Most of the subrectangles determined by Pare also subrectangles determined by P". An exception occurs for a subrectangle determined by P of the form Rs =[ti-1, ti] x S (where S is one of the subrectangles of [a2 , b2] x ••• x [an, bn] determined by (P2, ... , Pn)), The term involving the subrectangle Rs disappears from the lower sum and is replaced by the terms involving the two subrectangles which are determined by P". See Figure 10.1. s q Figure 10.1 84 Integration Chapter 3 Now since mn5 (f) < f(x) for each x E R~ and for each x E Ri, it follows that = Because v(Rs) v(R's) + v(R~) by direct computation, we have Since this inequality holds for each subrectangle of the form Rs, it follows that L(f, P) < L(f, P"), as desired. A similar argument applies to show that U(f,P) > U(f,P"). □ Now we explore the relation between upper sums and lower sums. We have the following result: Lemma 10.2. Let Q be a rectangle; let f : Q ---+ R be a bounded function. If P and P' are any two partitions of Q, then L(f, P) < U(f, P'). Proof. In the case where P = P', the result is obvious: For any sub- rectangle R determined by P, we have mn(f) < Mn(f). Multiplying by v(R) and summing gives the desired inequality. In general, given partitions P and P' of Q, let P" be their common refinement. Using the preceding lemma, we conclude that L(f, P) < L(f, P") < U(f, P") < U(f, P'). □ Now (finally) we define the integral. Definition. Let Q be a rectangle; let f: Q -+ R be a bounded function. As P ranges over all partitions of Q, define = f J sup {L(f, P)} h p and }fqf = inf {U(f,P)}. p §10. The Integral Over a Rectangle 85 These numbers are called the lower integral and upper integral, respec- tively, of / over Q. They exist because the numbers L(f, P) are bounded above by U(f, P') where P' is any fixed partition of Q; and the numbers U(f, P) are bounded below by L(f, P'). If the upper and lower integrals off over Q are equal, we say / is integrable over Q, and we define the inte- gral of/ over Q to equal the common value of the upper and lower integrals. We denote the integral of/ over Q by either of the symbols I. or /(x). xeQ EXAMPLE 1. Let / : [a, bJ -+ R be a non-negative bounded function. If P = is a partition of I [a, b), then L(f, P) equals the total area of a bunch of rectangles inscribed in the region between the graph of/ and the x-axis, and U(f, P) equals the total area of a bunch of rectangles circumscribed a.bout this region. See Figure 10.2. 
L(f,P) U(f,p) a b a b Figure 10.2 The lower integral represents the so-called "inner area" of this region, computed by approximating the region by inscribed rectangles, while the upper integral represents the so-called "outer area," computed by approximating the region by circumscribed rectangles. If the "inner" and "outer" areas are equal, then / is integrable. Similarly, if Q is a. rectangle in R2 and / : Q -+ R is non-negative and bounded, one can picture L(f, P) as the total volume of a bunch of boxes inscribed in the region between the graph of/ and the zy-plane, and U(/, P) 86 Integration Chapter 3 as the total volume of a bunch of boxes circumscribed about this region. See Figure 10.3. Figure 10.3 = = EXAMPLE 2. Let I [D, 1]. Let / : I - R he defined by setting f (x) 0 if = xis rational, and f (x) 1 if x is irrational. We show that / is not integrable over/. Let P be a partition of I. If R is any subinterval determined by P, then mR(/) = 0 and MR(!) = 1, since R contains both rational and irrational numbers. Then L(f, P) = L O• v(R) = D, R and U(f, P) = L 1 • v(R) = I. R Since P is arbitrary, it follows that the lower integral of / over I equals 0, and the upper integral equals 1. Thus/ is not integrable over I. A condition that is often useful for showing that a given function is integrable is the following: Theorem 10.3 (The Rien1ann condition). Let Q be a rectangle; let f : Q - R be a bounded function. Then equality holds if and only if given £ > 0, there exists a corresponding partition P of Q Jor which U(f, P) - L(f, P) < L §10. The Integral Over a Rectangle 87 Proof. Let P' be a fixed partition of Q. It follows from the fact that L(f, P) < U(f, P') for every partition P of Q, that 1f < U(f, P'). Now we use the fact that P' is arbitrary to conclude that Suppose now that the upper and lower integrals are equal. Choose a partition P so that L(f, P) is within £/2 of the integral Jq f, and a partition P' so that U(f, P') is within f./2 of the integral Jq /. Let P" be their common refinement. Since k L(f' P) < L(f, P") < f < u(I, P") < u(f, P'), the lower and upper sums for f determined by P" are within f. of each other. Conversely, suppose the upper and lower integrals are not equal. Let Let P be any partition of Q. Then hence the upper and lower sums for f determined by P are at least f. apart. Thus the Riemann condition does not hold. □ Here is an easy application of this theorem. = Theorem 10.4. Every constant function f(x) c is integrable. Indeed, if Q is a rectangle and if P is a partition of Q, then where the summation extends over all subrectangles determined by P.