Analysis on Manifolds

James R. Munkres
Massachusetts Institute of Technology, Cambridge, Massachusetts

ADDISON-WESLEY PUBLISHING COMPANY
The Advanced Book Program
Redwood City, California • Menlo Park, California • Reading, Massachusetts
New York • Don Mills, Ontario • Wokingham, United Kingdom • Amsterdam
Bonn • Sydney • Singapore • Tokyo • Madrid • San Juan

Publisher: Allan M. Wylde
Production Manager: Jan V. Benes
Marketing Manager: Laura Likely
Electronic Composition: Peter Vacek
Cover Design: Iva Frank
|
|
|
|
Library of Congress Cataloging-in-Publication Data

Munkres, James R., 1930-
    Analysis on manifolds / James R. Munkres.
        p. cm.
    Includes bibliographical references.
    1. Mathematical analysis.  2. Manifolds (Mathematics)
    QA300.M75 1990
    516.3'6'20-dc20          91-39786
    ISBN 0-201-51035-9       CIP

This book was prepared using the TeX typesetting language.

Copyright ©1991 by Addison-Wesley Publishing Company, The Advanced Book Program, 350 Bridge Parkway, Suite 209, Redwood City, CA 94065

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

ABCDEFGHIJ-MA-943210
|
|
|
|
Preface
|
|
This book is intended as a text for a course in analysis, at the senior or first-year graduate level.

A year-long course in real analysis is an essential part of the preparation of any potential mathematician. For the first half of such a course, there is substantial agreement as to what the syllabus should be. Standard topics include: sequences and series, the topology of metric spaces, and the derivative and the Riemann integral for functions of a single variable. There are a number of excellent texts for such a course, including books by Apostol [A], Rudin [Ru], Goldberg [Go], and Royden [Ro], among others.

There is no such universal agreement as to what the syllabus of the second half of such a course should be. Part of the problem is that there are simply too many topics that belong in such a course for one to be able to treat them all within the confines of a single semester, at more than a superficial level.

At M.I.T., we have dealt with the problem by offering two independent second-term courses in analysis. One of these deals with the derivative and the Riemann integral for functions of several variables, followed by a treatment of differential forms and a proof of Stokes' theorem for manifolds in euclidean space. The present book has resulted from my years of teaching this course. The other deals with the Lebesgue integral in euclidean space and its applications to Fourier analysis.
|
|
Prerequisites
|
|
As indicated, we assume the reader has completed a one-term course in analysis that included a study of metric spaces and of functions of a single variable. We also assume the reader has some background in linear algebra, including vector spaces and linear transformations, matrix algebra, and determinants.

The first chapter of the book is devoted to reviewing the basic results from linear algebra and analysis that we shall need. Results that are truly basic are stated without proof, but proofs are provided for those that are sometimes omitted in a first course. The student may determine from a perusal of this chapter whether his or her background is sufficient for the rest of the book.

How much time the instructor will wish to spend on this chapter will depend on the experience and preparation of the students. I usually assign Sections 1 and 3 as reading material, and discuss the remainder in class.
|
|
How the book is organized
|
|
The main part of the book falls into two parts. The first, consisting of Chapters 2 through 4, covers material that is fairly standard: derivatives, the inverse function theorem, the Riemann integral, and the change of variables theorem for multiple integrals. The second part of the book is a bit more sophisticated. It introduces manifolds and differential forms in Rn, providing the framework for proofs of the n-dimensional version of Stokes' theorem and of the Poincaré lemma.
|
|
A final chapter is devoted to a discussion of abstract manifolds; it is intended as a transition to more advanced texts on the subject.
|
|
The dependence among the chapters of the book is expressed in the following diagram:

    Chapter 1  The Algebra and Topology of Rn
        |
    Chapter 2  Differentiation
        |
    Chapter 3  Integration
        |
    Chapter 4  Change of Variables
        |
    Chapter 5  Manifolds
        |
    Chapter 6  Differential Forms
        |
    Chapter 7  Stokes' Theorem
        |
    Chapter 8  Closed Forms and Exact Forms
        |
    Chapter 9  Epilogue-Life Outside Rn

Certain sections of the book are marked with an asterisk; these sections may be omitted without loss of continuity. Similarly, certain theorems that may be omitted are marked with asterisks. When I use the book in our undergraduate analysis sequence, I usually omit Chapter 8, and assign Chapter 9 as reading. With graduate students, it should be possible to cover the entire book.
|
|
At the end of each section is a set of exercises. Some are computational in nature; students find it illuminating to know that one can compute the volume of a five-dimensional ball, even if the practical applications are limited! Other exercises are theoretical in nature, requiring that the student analyze carefully the theorems and proofs of the preceding section. The more difficult exercises are marked with asterisks, but none is unreasonably hard.
|
|
Acknowledgements
|
|
Two pioneering works in this subject demonstrated that such topics as manifolds and differential forms could be discussed with undergraduates. One is the set of notes used at Princeton c. 1960, written by Nickerson, Spencer, and Steenrod [N-S-S]. The second is the book by Spivak [S]. Our indebtedness to these sources is obvious. A more recent book on these topics is the one by Guillemin and Pollack [G-P]. A number of texts treat this material at a more advanced level. They include books by Boothby [B], Abraham, Marsden, and Ratiu [A-M-R], Berger and Gostiaux [B-G], and Fleming [F]. Any of them would be suitable reading for the student who wishes to pursue these topics further.
|
|
I am indebted to Sigurdur Helgason and Andrew Browder for helpful comments. To Ms. Viola Wiley go my thanks for typing the original set of lecture notes on which the book is based. Finally, thanks is due to my students at M.I.T., who endured my struggles with this material, as I tried to learn how to make it understandable (and palatable) to them!
|
|
J.R.M.
|
|
|
|
Contents

PREFACE

CHAPTER 1  The Algebra and Topology of Rn
    §1. Review of Linear Algebra
    §2. Matrix Inversion and Determinants
    §3. Review of Topology in Rn
    §4. Compact Subspaces and Connected Subspaces of Rn

CHAPTER 2  Differentiation
    §5. The Derivative
    §6. Continuously Differentiable Functions
    §7. The Chain Rule
    §8. The Inverse Function Theorem
    *§9. The Implicit Function Theorem

CHAPTER 3  Integration
    §10. The Integral over a Rectangle
    §11. Existence of the Integral
    §12. Evaluation of the Integral
    §13. The Integral over a Bounded Set
    §14. Rectifiable Sets
    §15. Improper Integrals

CHAPTER 4  Change of Variables
    §16. Partitions of Unity
    §17. The Change of Variables Theorem
    §18. Diffeomorphisms in Rn
    §19. Proof of the Change of Variables Theorem
    §20. Applications of Change of Variables

CHAPTER 5  Manifolds
    §21. The Volume of a Parallelepiped
    §22. The Volume of a Parametrized-Manifold
    §23. Manifolds in Rn
    §24. The Boundary of a Manifold
    §25. Integrating a Scalar Function over a Manifold

CHAPTER 6  Differential Forms
    §26. Multilinear Algebra
    §27. Alternating Tensors
    §28. The Wedge Product
    §29. Tangent Vectors and Differential Forms
    §30. The Differential Operator
    *§31. Application to Vector and Scalar Fields
    §32. The Action of a Differentiable Map

CHAPTER 7  Stokes' Theorem
    §33. Integrating Forms over Parametrized-Manifolds
    §34. Orientable Manifolds
    §35. Integrating Forms over Oriented Manifolds
    *§36. A Geometric Interpretation of Forms and Integrals
    §37. The Generalized Stokes' Theorem
    *§38. Applications to Vector Analysis

CHAPTER 8  Closed Forms and Exact Forms
    §39. The Poincaré Lemma
    §40. The deRham Groups of Punctured Euclidean Space

CHAPTER 9  Epilogue-Life Outside Rn
    §41. Differentiable Manifolds and Riemannian Manifolds

BIBLIOGRAPHY
|
|
|
|
The Algebra and Topology of Rn
|
|
§1. REVIEW OF LINEAR ALGEBRA
|
|
Vector spaces
|
|
Suppose one is given a set V of objects, called vectors. And suppose there is given an operation called vector addition, such that the sum of the vectors x and y is a vector denoted x + y. Finally, suppose there is given an operation called scalar multiplication, such that the product of the scalar (i.e., real number) c and the vector x is a vector denoted cx.

The set V, together with these two operations, is called a vector space (or linear space) if the following properties hold for all vectors x, y, z and all scalars c, d:

    (1) x + y = y + x.
    (2) x + (y + z) = (x + y) + z.
    (3) There is a unique vector 0 such that x + 0 = x for all x.
    (4) x + (-1)x = 0.
    (5) 1x = x.
    (6) c(dx) = (cd)x.
    (7) (c + d)x = cx + dx.
    (8) c(x + y) = cx + cy.
One example of a vector space is the set Rn of all n-tuples of real numbers, with component-wise addition and multiplication by scalars. That is, if x = (x1, ..., xn) and y = (y1, ..., yn), then

    x + y = (x1 + y1, ..., xn + yn),
    cx = (cx1, ..., cxn).

The vector space properties are easy to check.
|
|
|
|
If V is a vector space, then a subset W of V is called a linear subspace (or simply, a subspace) of V if for every pair x, y of elements of W and every scalar c, the vectors x + y and cx belong to W. In this case, W itself satisfies properties (1)-(8) if we use the operations that W inherits from V, so that W is a vector space in its own right.

In the first part of this book, Rn and its subspaces are the only vector spaces with which we shall be concerned. In later chapters we shall deal with more general vector spaces.
|
|
|
|
Let V be a vector space. A set a1, ..., am of vectors in V is said to span V if to each x in V, there corresponds at least one m-tuple of scalars c1, ..., cm such that

    x = c1 a1 + ··· + cm am.

In this case, we say that x can be written as a linear combination of the vectors a1, ..., am.

The set a1, ..., am of vectors is said to be independent if to each x in V there corresponds at most one m-tuple of scalars c1, ..., cm such that

    x = c1 a1 + ··· + cm am.

Equivalently, {a1, ..., am} is independent if to the zero vector 0 there corresponds only one m-tuple of scalars d1, ..., dm such that

    0 = d1 a1 + ··· + dm am,

namely the scalars d1 = d2 = ··· = dm = 0.

If the set of vectors a1, ..., am both spans V and is independent, it is said to be a basis for V.
|
|
One has the following result:
|
|
Theorem 1.1. Suppose V has a basis consisting of m vectors. Then any set of vectors that spans V has at least m vectors, and any set of vectors of V that is independent has at most m vectors. In particular, any basis for V has exactly m vectors. □
|
|
If V has a basis consisting of m vectors, we say that m is the dimension of V. We make the convention that the vector space consisting of the zero vector alone has dimension zero.
It is easy to see that Rn has dimension n. (Surprise!) The following set of vectors is called the standard basis for Rn:

    e1 = (1, 0, 0, ..., 0),
    e2 = (0, 1, 0, ..., 0),
        ...
    en = (0, 0, 0, ..., 1).

The vector space Rn has many other bases, but any basis for Rn must consist of precisely n vectors.
|
|
One can extend the definitions of spanning, independence, and basis to allow for infinite sets of vectors; then it is possible for a vector space to have an infinite basis. (See the exercises.) However, we shall not be concerned with this situation.

Because Rn has a finite basis, so does every subspace of Rn. This fact is a consequence of the following theorem:
|
|
Theorem 1.2. Let V be a vector space of dimension m. If W is a linear subspace of V (different from V), then W has dimension less than m. Furthermore, any basis a1, ..., ak for W may be extended to a basis a1, ..., ak, ak+1, ..., am for V. □
|
|
Inner products
|
|
If V is a vector space, an inner product on V is a function assigning, to each pair x, y of vectors of V, a real number denoted (x, y), such that the following properties hold for all x, y, z in V and all scalars c:

    (1) (x, y) = (y, x).
    (2) (x + y, z) = (x, z) + (y, z).
    (3) (cx, y) = c(x, y) = (x, cy).
    (4) (x, x) > 0 if x ≠ 0.
|
|
A vector space V together with an inner product on V is called an inner product space.

A given vector space may have many different inner products. One particularly useful inner product on Rn is defined as follows: If x = (x1, ..., xn) and y = (y1, ..., yn), we define

    (x, y) = x1 y1 + x2 y2 + ··· + xn yn.

The properties of an inner product are easy to verify. This is the inner product we shall commonly use in Rn. It is sometimes called the dot product; we denote it by (x, y) rather than x · y to avoid confusion with the matrix product, which we shall define shortly.
If V is an inner product space, one defines the length (or norm) of a vector of V by the equation

    ||x|| = (x, x)^(1/2).

The norm function has the following properties:

    (1) ||x|| > 0 if x ≠ 0.
    (2) ||cx|| = |c| ||x||.
    (3) ||x + y|| ≤ ||x|| + ||y||.

The third of these properties is the only one whose proof requires some work; it is called the triangle inequality. (See the exercises.) An equivalent form of this inequality, which we shall frequently find useful, is the inequality

    (3') ||x - y|| ≥ ||x|| - ||y||.

Any function from V to the reals R that satisfies properties (1)-(3) just listed is called a norm on V. The length function derived from an inner product is one example of a norm, but there are other norms that are not derived from inner products. On Rn, for example, one has not only the familiar norm derived from the dot product, which is called the euclidean norm, but one has also the sup norm, which is defined by the equation

    |x| = max{|x1|, ..., |xn|}.

The sup norm is often more convenient to use than the euclidean norm. We note that these two norms on Rn satisfy the inequalities

    |x| ≤ ||x|| ≤ √n |x|.
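As a quick numerical illustration (not part of the text), the following Python sketch checks these two inequalities on randomly generated vectors; it assumes numpy is available, and the helper names are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    def euclidean_norm(x):
        # ||x|| = (x, x)^(1/2), the norm derived from the dot product
        return np.sqrt(np.dot(x, x))

    def sup_norm(x):
        # |x| = max{|x1|, ..., |xn|}
        return np.max(np.abs(x))

    for _ in range(1000):
        n = rng.integers(1, 10)
        x = rng.normal(size=n)
        assert sup_norm(x) <= euclidean_norm(x) + 1e-12
        assert euclidean_norm(x) <= np.sqrt(n) * sup_norm(x) + 1e-12
    print("|x| <= ||x|| <= sqrt(n)|x| held on all samples")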
|
|
Matrices
|
|
A matrix A is a rectangular array of numbers. The general number appearing in the array is called an entry of A. If the array has n rows and m columns, we say that A has size n by m, or that A is "an n by m matrix." We usually denote the entry of A appearing in the ith row and jth column by aij; we call i the row index and j the column index of this entry.

If A and B are matrices of size n by m, with general entries aij and bij, respectively, we define A + B to be the n by m matrix whose general entry is aij + bij, and we define cA to be the n by m matrix whose general entry is caij. With these operations, the set of all n by m matrices is a vector space; the eight vector space properties are easy to verify. This fact is hardly surprising, for an n by m matrix is very much like an nm-tuple; the only difference is that the numbers are written in a rectangular array instead of a linear array.
The set of matrices has, however, an additional operation, called matrix multiplication. If A is a matrix of size n by m, and if B is a matrix of size m by p, then the product A·B is defined to be the matrix C of size n by p whose general entry cij is given by the equation

    cij = ai1 b1j + ai2 b2j + ··· + aim bmj.
|
|
|
|
This product operation satisfies the following properties, which are straightforward to verify:

    (1) A·(B·C) = (A·B)·C.
    (2) A·(B + C) = A·B + A·C.
    (3) (A + B)·C = A·C + B·C.
    (4) (cA)·B = c(A·B) = A·(cB).
    (5) For each k, there is a k by k matrix Ik such that if A is any n by m matrix,

        In·A = A   and   A·Im = A.

In each of these statements, we assume that the matrices involved are of appropriate sizes, so that the indicated operations may be performed.
|
|
The matrix Ik is the matrix of size k by k whose general entry δij is defined as follows: δij = 0 if i ≠ j, and δij = 1 if i = j. The matrix Ik is called the identity matrix of size k by k; it has the form

    Ik = [ 1  0  ...  0 ]
         [ 0  1  ...  0 ]
         [ .  .       . ]
         [ 0  0  ...  1 ]

with entries of 1 on the "main diagonal" and entries of 0 elsewhere.
|
|
We extend to matrices the sup norm defined for n-tuples. That is, if A is a matrix of size n by m with general entry aij, we define

    |A| = max{ |aij| : i = 1, ..., n and j = 1, ..., m }.

The three properties of a norm are immediate, as is the following useful result:

Theorem 1.3. If A has size n by m, and B has size m by p, then

    |A·B| ≤ m |A| |B|.  □
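As a sanity check (our own, not from the text), the inequality of Theorem 1.3 can be tested on random matrices; the sketch below assumes numpy.

    import numpy as np

    rng = np.random.default_rng(1)

    def sup_norm(A):
        # |A| = max over all entries of |aij|
        return np.max(np.abs(A))

    for _ in range(1000):
        n, m, p = rng.integers(1, 6, size=3)
        A = rng.normal(size=(n, m))
        B = rng.normal(size=(m, p))
        assert sup_norm(A @ B) <= m * sup_norm(A) * sup_norm(B) + 1e-12
    print("|A·B| <= m|A||B| held on all samples")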
Linear transformations
|
|
If V and W are vector spaces, a function T : V → W is called a linear transformation if it satisfies the following properties, for all x, y in V and all scalars c:

    (1) T(x + y) = T(x) + T(y).
    (2) T(cx) = cT(x).

If, in addition, T carries V onto W in a one-to-one fashion, then T is called a linear isomorphism.

One checks readily that if T : V → W is a linear transformation, and if S : W → X is a linear transformation, then the composite S ∘ T : V → X is a linear transformation. Furthermore, if T : V → W is a linear isomorphism, then T^(-1) : W → V is also a linear isomorphism.

A linear transformation is uniquely determined by its values on basis elements, and these values may be specified arbitrarily. That is the substance of the following theorem:
|
|
|
|
Theorem 1.4. Let V be a vector space with basis a1, ..., am. Let W be a vector space. Given any m vectors b1, ..., bm in W, there is exactly one linear transformation T : V → W such that, for all i,

    T(ai) = bi.  □
|
|
|
|
In the special case where V and W are "tuple spaces" such as Rm and Rn, matrix notation gives us a convenient way of specifying a linear transformation, as we now show.

First we discuss row matrices and column matrices. A matrix of size 1 by n is called a row matrix; the set of all such matrices bears an obvious resemblance to Rn. Indeed, under the one-to-one correspondence

    (x1, ..., xn)  ↔  [x1  x2  ...  xn],

the vector space operations also correspond. Thus this correspondence is a linear isomorphism. Similarly, a matrix of size n by 1 is called a column matrix; the set of all such matrices also bears an obvious resemblance to Rn. Indeed, the correspondence

    (x1, ..., xn)  ↔  [x1]
                      [..]
                      [xn]

is a linear isomorphism.

The second of these isomorphisms is particularly useful when studying linear transformations. Suppose for the moment that we represent elements of Rm and Rn by column matrices rather than by tuples. If A is a fixed n by m matrix, let us define a function T : Rm → Rn by the equation

    T(x) = A·x.
|
|
The properties of matrix product imply immediately that T is a linear transformation.

In fact, every linear transformation of Rm to Rn has this form. The proof is easy. Given T, let b1, ..., bm be the vectors of Rn such that T(ej) = bj. Then let A be the n by m matrix A = [b1 ··· bm] with successive columns b1, ..., bm. Since the identity matrix has columns e1, ..., em, the equation A·Im = A implies that A·ej = bj for all j. Then A·ej = T(ej) for all j; it follows from the preceding theorem that A·x = T(x) for all x.

The convenience of this notation leads us to make the following convention:

Convention. Throughout, we shall represent the elements of Rn by column matrices, unless we specifically state otherwise.
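To make the construction in the preceding proof concrete, here is a small Python sketch (our own illustration, assuming numpy): given a linear map T on Rm, its matrix is assembled column by column from the values T(ej), and the resulting A satisfies A·x = T(x).

    import numpy as np

    def matrix_of(T, m):
        # Column j of A is T(e_j), where e_j is the j-th standard basis vector of R^m.
        return np.column_stack([T(np.eye(m)[:, j]) for j in range(m)])

    # A sample linear map T : R^3 -> R^2 (chosen arbitrarily for illustration).
    def T(x):
        return np.array([2*x[0] - x[2], x[1] + 4*x[2]])

    A = matrix_of(T, 3)
    x = np.array([1.0, -2.0, 5.0])
    assert np.allclose(A @ x, T(x))   # A·x = T(x), as the theorem asserts
    print(A)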
|
|
|
|
Rank of a matrix
|
|
Given a matrix A of size n by m, there are several important linear spaces associated with A. One is the space spanned by the columns of A, looked at as column matrices (equivalently, as elements of Rn). This space is called the column space of A, and its dimension is called the column rank of A. Because the column space of A is spanned by m vectors, its dimension can be no larger than m; because it is a subspace of Rn, its dimension can be no larger than n.

Similarly, the space spanned by the rows of A, looked at as row matrices (or as elements of Rm), is called the row space of A, and its dimension is called the row rank of A.
|
|
The following theorem is of fundamental importance:
|
|
|
|
Theorem 1.5. For any matrix A, the row rank of A equals the column rank of A. □
|
|
|
|
Once one has this theorem, one can speak merely of the rank of a matrix A, by which one means the number that equals both the row rank of A and the column rank of A.
|
|
The rank of a matrix A is an important number associated with A. One cannot in general determine what this number is by inspection. However, there is a relatively simple procedure called Gauss-Jordan reduction that can be used for finding the rank of a matrix. (It is used for other purposes as well.) We assume you have seen it before, so we merely review its major features here.
One considers certain operations, called elementary row operations, that are applied to a matrix A to obtain a new matrix B of the same size. They are the following:

    (1) Exchange rows i1 and i2 of A (where i1 ≠ i2).
    (2) Replace row i1 of A by itself plus the scalar c times row i2 (where i1 ≠ i2).
    (3) Multiply row i of A by the non-zero scalar λ.

Each of these operations is invertible; in fact, the inverse of an elementary operation is an elementary operation of the same type, as you can check. One has the following result:

Theorem 1.6. If B is the matrix obtained by applying an elementary row operation to A, then

    rank B = rank A.  □
|
|
|
|
Gauss-Jordan reduction is the process of applying elementary operations to A to reduce it to a special form called echelon form (or stairstep form), for which the rank is obvious. An example of a matrix in this form is the following:

        [ @  *  *  *  *  * ]
    B = [ 0  @  *  *  *  * ]
        [ 0  0  0  @  *  * ]
        [ 0  0  0  0  0  0 ]

Here the entries beneath the "stairsteps" are 0; the entries marked * may be zero or non-zero, and the "corner entries," marked @, are non-zero. (The corner entries are sometimes called "pivots.") One in fact needs only operations of types (1) and (2) to reduce A to echelon form.

Now it is easy to see that, for a matrix B in echelon form, the non-zero rows are independent. It follows that they form a basis for the row space of B, so the rank of B equals the number of its non-zero rows.
|
|
For some purposes it is convenient to reduce B to an even more special form, called reduced echelon form. Using elementary operations of type (2), one can make all the entries lying directly above each of the corner entries into 0's. Then by using operations of type (3), one can make all the corner entries into 1's. The reduced echelon form of the matrix B considered previously has the form:

        [ 1  0  *  0  *  * ]
    C = [ 0  1  *  0  *  * ]
        [ 0  0  0  1  *  * ]
        [ 0  0  0  0  0  0 ]

It is even easier to see that, for the matrix C, its rank equals the number of its non-zero rows.
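The following Python sketch (ours, for illustration only, assuming numpy) carries out the reduction to echelon form with operations of types (1) and (2) and reads off the rank as the number of non-zero rows; with exact arithmetic this is precisely the procedure described above, though in floating point one must compare against a small tolerance.

    import numpy as np

    def rank_by_row_reduction(A, tol=1e-10):
        B = np.array(A, dtype=float)
        n, m = B.shape
        row = 0
        for col in range(m):
            # Find a pivot (corner entry) in this column at or below the current row.
            pivot = next((r for r in range(row, n) if abs(B[r, col]) > tol), None)
            if pivot is None:
                continue
            B[[row, pivot]] = B[[pivot, row]]                   # type (1): exchange rows
            for r in range(row + 1, n):
                B[r] -= (B[r, col] / B[row, col]) * B[row]      # type (2): add a multiple of a row
            row += 1
        return row   # number of non-zero rows of the echelon form

    A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
    print(rank_by_row_reduction(A))   # 2, since the second row is twice the first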
|
|
|
|
Transpose of a matrix
|
|
Given a matrix A of size n by m, we define the transpose of A to be the matrix D of size m by n whose general entry in row i and column j is defined by the equation dij = aji. The matrix D is often denoted A^tr.

The following properties of the transpose operation are readily verified:

    (1) (A^tr)^tr = A.
    (2) (A + B)^tr = A^tr + B^tr.
    (3) (A·C)^tr = C^tr · A^tr.
    (4) rank A^tr = rank A.

The first three follow by direct computation, and the last from the fact that the row rank of A^tr is obviously the same as the column rank of A.
|
|
|
|
EXERCISES
|
|
1. Let V be a vector space with inner product (x, y) and norm ||x|| = (x, x)^(1/2).
   (a) Prove the Cauchy-Schwarz inequality (x, y) ≤ ||x|| ||y||. [Hint: If x, y ≠ 0, set c = 1/||x|| and d = 1/||y|| and use the fact that ||cx ± dy|| ≥ 0.]
   (b) Prove that ||x + y|| ≤ ||x|| + ||y||. [Hint: Compute (x + y, x + y) and apply (a).]
   (c) Prove that ||x - y|| ≥ ||x|| - ||y||.

2. If A is an n by m matrix and B is an m by p matrix, show that

        |A·B| ≤ m |A| |B|.

3. Show that the sup norm on R2 is not derived from an inner product on R2. [Hint: Suppose (x, y) is an inner product on R2 (not the dot product) having the property that |x| = (x, x)^(1/2). Compute (x ± y, x ± y) and apply to the case x = e1 and y = e2.]

4. (a) If x = (x1, x2) and y = (y1, y2), show that the function

        (x, y) = [x1  x2] [ 2  -1] [y1]
                          [-1   1] [y2]

      is an inner product on R2.
   *(b) Show that the function

        (x, y) = [x1  x2] [a  b] [y1]
                          [b  c] [y2]

      is an inner product on R2 if and only if b^2 - ac < 0 and a > 0.

*5. Let V be a vector space; let {aα} be a set of vectors of V, as α ranges over some index set J (which may be infinite). We say that the set {aα} spans V if every vector x in V can be written as a finite linear combination of vectors from this set. The set {aα} is independent if the scalars are uniquely determined by x. The set {aα} is a basis for V if it both spans V and is independent.
   (a) Check that the set Rω of all "infinite-tuples" of real numbers

        x = (x1, x2, ...)

      is a vector space under component-wise addition and scalar multiplication.
   (b) Let R∞ denote the subset of Rω consisting of all x = (x1, x2, ...) such that xi = 0 for all but finitely many values of i. Show R∞ is a subspace of Rω; find a basis for R∞.
   (c) Let F be the set of all real-valued functions f : [a, b] → R. Show that F is a vector space if addition and scalar multiplication are defined in the natural way:

        (f + g)(x) = f(x) + g(x),
        (cf)(x) = cf(x).

   (d) Let F_B be the subset of F consisting of all bounded functions. Let F_I consist of all integrable functions. Let F_C consist of all continuous functions. Let F_D consist of all continuously differentiable functions. Let F_P consist of all polynomial functions. Show that each of these is a subspace of the preceding one, and find a basis for F_P.
       There is a theorem to the effect that every vector space has a basis. The proof is non-constructive. No one has ever exhibited specific bases for the vector spaces Rω, F, F_B, F_I, F_C, F_D.
   (e) Show that the integral operator and the differentiation operator,

        (If)(x) = ∫_a^x f(t) dt   and   (Df)(x) = f'(x),

      are linear transformations. What are possible domains and ranges of these transformations, among those listed in (d)?
|
|
|
§2. MATRIX INVERSION AND DETERMINANTS
|
|
|
|
We now treat several further aspects of linear algebra. They are the following: elementary matrices, matrix inversion, and determinants. Proofs are included, in case some of these results are new to you.
|
|
|
|
Elementary matrices
|
|
Definition. An elementary matrix of size n by n is the matrix obtained by applying one of the elementary row operations to the identity ma-
|
|
trix In.
|
|
|
|
The elementary matrices are of three basic types, depending on which of the three operations is used. The elementary matrix E corresponding to the first elementary operation is obtained from the identity matrix by exchanging its rows i1 and i2.

The elementary matrix E' corresponding to the second elementary row operation is obtained from the identity matrix by inserting the entry c in row i1 and column i2, keeping the entries 1 on the main diagonal.

And the elementary matrix E'' corresponding to the third elementary row operation is obtained from the identity matrix by replacing the diagonal entry 1 in row i by the scalar λ.

One has the following basic result:
Theorem 2.1. Let A be an n by m matrix. Any elementary row operation on A may be carried out by premultiplying A by the corresponding elementary matrix.
|
|
|
|
Proof. One proceeds by direct computation. The effect of multiplying A on the left by the matrix E is to interchange rows i1 and i2 of A. Similarly, multiplying A by E' has the effect of replacing row i1 by itself plus c times row i2. And multiplying A by E'' has the effect of multiplying row i by λ. □
|
|
|
|
We will use this result later on when we prove the change of variables theorem for a multiple integral, as well as in the present section.
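Here is a small numerical illustration of Theorem 2.1 (our own sketch, assuming numpy): each elementary matrix is built by applying the corresponding row operation to the identity, and premultiplying by it performs that operation on A.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

    I = np.eye(3)

    # Type (1): exchange rows 0 and 2 of the identity.
    E = I[[2, 1, 0]]
    assert np.array_equal(E @ A, A[[2, 1, 0]])

    # Type (2): replace row 0 by itself plus c times row 1.
    c = 5.0
    E_prime = I.copy(); E_prime[0, 1] = c
    expected = A.copy(); expected[0] += c * A[1]
    assert np.array_equal(E_prime @ A, expected)

    # Type (3): multiply row 1 by a non-zero scalar lam.
    lam = -3.0
    E_dprime = I.copy(); E_dprime[1, 1] = lam
    expected = A.copy(); expected[1] *= lam
    assert np.array_equal(E_dprime @ A, expected)
    print("premultiplication by E, E', E'' performs the row operations")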
|
|
The inverse of a matrix
|
|
Definition. Let A be a matrix of size n by m; let B and C be matrices of size m by n. We say that B is a left inverse for A if B·A = Im, and we say that C is a right inverse for A if A·C = In.
|
|
Theorem 2.2. If A has both a left inverse B and a right inverse C, then they are unique and equal.
|
|
Proof. Equality follows from the computation

    B = B·In = B·(A·C) = (B·A)·C = Im·C = C.

If B1 is another left inverse for A, we apply this same computation with B1 replacing B. We conclude that C = B1; thus B1 and B are equal. Hence B is unique. A similar computation shows that C is unique. □
Definition. If A has both a right inverse and a left inverse, then A is said to be invertible. The unique matrix that is both a right inverse and a left inverse for A is called the inverse of A, and is denoted A^(-1).
|
|
|
|
A necessary and sufficient condition for A to be invertible is that A be square and of maximal rank. That is the substance of the following two theorems:
|
|
|
|
Theorem 2.3. Let A be a matrix of size n by m. If A is invertible, then

    n = m = rank A.
|
|
|
|
Proof. Step 1. We show that for any k by n matrix D,

    rank (D·A) ≤ rank A.

The proof is easy. If R is a row matrix of size 1 by n, then R·A is a row matrix that equals a linear combination of the rows of A, so it is an element of the row space of A. The rows of D·A are obtained by multiplying the rows of D by A. Therefore each row of D·A is an element of the row space of A. Thus the row space of D·A is contained in the row space of A, and our inequality follows.

Step 2. We show that if A has a left inverse B, then the rank of A equals the number of columns of A.

The equation Im = B·A implies by Step 1 that m = rank (B·A) ≤ rank A. On the other hand, the row space of A is a subspace of m-tuple space, so that rank A ≤ m.

Step 3. We prove the theorem. Let B be the inverse of A. The fact that B is a left inverse for A implies by Step 2 that rank A = m. The fact that B is a right inverse for A implies that

    B^tr · A^tr = (A·B)^tr = In,

so that B^tr is a left inverse for A^tr; whence by Step 2, rank A = rank A^tr = n. □
|
|
|
|
We prove the converse of this theorem in a slightly strengthened version:
|
|
|
|
Theorem 2.4. Let A be a matrix of size n by m. Suppose

    n = m = rank A.

Then A is invertible; and furthermore, A equals a product of elementary matrices.
Proof. Step 1. We note first that every elementary matrix is invertible, and that its inverse is an elementary matrix. This follows from the fact that elementary operations are invertible. Alternatively, you can check directly that the matrix E corresponding to an operation of the first type is its own inverse, that an inverse for E' can be obtained by replacing c by -c in the formula for E', and that an inverse for E'' can be obtained by replacing λ by 1/λ in the formula for E''.

Step 2. We prove the theorem. Let A be an n by n matrix of rank n. Let us reduce A to reduced echelon form C by applying elementary row operations. Because C is square and its rank equals the number of its rows, C must equal the identity matrix In. It follows from Theorem 2.1 that there is a sequence E1, ..., Ek of elementary matrices such that

    Ek ··· E2·E1·A = In.

If we multiply both sides of this equation on the left by Ek^(-1), then by E(k-1)^(-1), and so on, we obtain the equation

    A = E1^(-1)·E2^(-1) ··· Ek^(-1);

thus A equals a product of elementary matrices. Direct computation shows that the matrix

    B = Ek ··· E2·E1

is both a right and a left inverse for A. □
|
|
One very useful consequence of this theorem is the following:
|
|
Theorem 2.5. If A is a square matrix and if B is a left inverse for A, then B is also a right inverse for A.
|
|
Proof. Since A has a left inverse, Step 2 of the proof of Theorem 2.3 implies that the rank of A equals the number of columns of A. Since A is square, this is the same as the number of rows of A, so the preceding theorem implies that A has an inverse. By Theorem 2.2, this inverse must be B. □
|
|
An n by n matrix A is said to be singular if rank A < n; otherwise, it is said to be non-singular. The theorems just proved imply that A is invertible if and only if A is non-singular.
|
|
Determinants
|
|
The determinant is a function that assigns, to each square matrix A, a number called the determinant of A and denoted det A.
The notation |A| is often used for the determinant of A, but we are using this notation to denote the sup norm of A. So we shall use "det A" to denote the determinant instead.

In this section, we state three axioms for the determinant function, and we assume the existence of a function satisfying these axioms. The actual construction of the general determinant function will be postponed to a later chapter.
|
|
|
|
Definition. A function that assigns, to each n by n matrix A, a real number denoted det A, is called a determinant function if it satisfies the following axioms:

    (1) If B is the matrix obtained by exchanging any two rows of A, then det B = -det A.
    (2) Given i, the function det A is linear as a function of the ith row alone.
    (3) det In = 1.

Condition (2) can be formulated as follows: Let i be fixed. Given an n-tuple x, let Ai(x) denote the matrix obtained from A by replacing the ith row by x. Then condition (2) states that

    det Ai(ax + by) = a det Ai(x) + b det Ai(y).

These three axioms characterize the determinant function uniquely, as we shall see.
|
|
|
|
EXAMPLE 1. In low dimensions, it is easy to construct the determinant function. For 1 by 1 matrices, the function

    det [a] = a

will do. For 2 by 2 matrices, the function

    det [ a  b ]  =  ad - bc
        [ c  d ]

suffices. And for 3 by 3 matrices, the function

    det [ a  b  c ]
        [ d  e  f ]  =  aei + bfg + cdh - ceg - bdi - afh
        [ g  h  i ]

will do, as you can readily check. For matrices of larger size, the definition is more complicated. For example, the expression for the determinant of a 4 by 4 matrix involves 24 terms; and for a 5 by 5 matrix, it involves 120 terms! Obviously, a less direct approach is needed. We shall return to this matter in Chapter 6.
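A quick check of these low-dimensional formulas against a library determinant (our own illustration, assuming numpy):

    import numpy as np

    def det2(A):
        # det of a 2 by 2 matrix: ad - bc
        (a, b), (c, d) = A
        return a*d - b*c

    def det3(A):
        # det of a 3 by 3 matrix: the six-term formula of Example 1
        (a, b, c), (d, e, f), (g, h, i) = A
        return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

    rng = np.random.default_rng(2)
    for _ in range(100):
        A2 = rng.normal(size=(2, 2))
        A3 = rng.normal(size=(3, 3))
        assert np.isclose(det2(A2), np.linalg.det(A2))
        assert np.isclose(det3(A3), np.linalg.det(A3))
    print("formulas agree with numpy.linalg.det")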
|
|
Using the axioms, one can determine how the elementary row operations affect the value of the determinant. One has the following result:
Theorem 2.6. Let A be an n by n matrix.
    (a) If E is the elementary matrix corresponding to the operation that exchanges rows i1 and i2, then det(E·A) = -det A.
    (b) If E' is the elementary matrix corresponding to the operation that replaces row i1 of A by itself plus c times row i2, then det(E'·A) = det A.
    (c) If E'' is the elementary matrix corresponding to the operation that multiplies row i of A by the non-zero scalar λ, then det(E''·A) = λ(det A).
    (d) If A is the identity matrix In, then det A = 1.
|
|
|
|
Proof. Property (a) is a restatement of Axiom 1, and (d) is a restatement of Axiom 3. Property (c) follows directly from linearity (Axiom 2); it states merely that

    det Ai(λx) = λ(det Ai(x)).

Now we verify (b). Note first that if A has two equal rows, then det A = 0. For exchanging these rows does not change the matrix A, but by Axiom 1 it changes the sign of the determinant. Now let E' be the elementary operation that replaces row i = i1 by itself plus c times row i2. Let x equal row i1 and let y equal row i2. We compute

    det(E'·A) = det Ai(x + cy)
              = det Ai(x) + c det Ai(y)
              = det Ai(x),    since Ai(y) has two equal rows,
              = det A,        since Ai(x) = A.  □
|
|
|
|
The four properties of the determinant function stated in this theorem are what one usually uses in practice rather than the axioms themselves. They also characterize the determinant completely, as we shall see.
|
|
One can use these properties to compute the determinants of the elementary matrices. Setting A = In in Theorem 2.6, we have

    det E = -1   and   det E' = 1   and   det E'' = λ.

We shall see later how they can be used to compute the determinant in general. Now we derive the further properties of the determinant function that we shall need.
|
|
Theorem 2.7. Let A be a square matrix. If the rows of A are independent, then det A ≠ 0; if the rows are dependent, then det A = 0. Thus an n by n matrix A has rank n if and only if det A ≠ 0.
|
|
|
|
Proof. First, we note that if the ith row of A is the zero row, then det A = 0. For multiplying row i by 2 leaves A unchanged; on the other hand, it must multiply the value of the determinant by 2.

Second, we note that applying one of the elementary row operations to A does not affect the vanishing or non-vanishing of the determinant, for it alters the value of the determinant by a factor of either -1 or 1 or λ (where λ ≠ 0).

Now by means of elementary row operations, let us reduce A to a matrix B in echelon form. (Elementary operations of types (1) and (2) will suffice.) If the rows of A are dependent, rank A < n; then rank B < n, so that B must have a zero row. Then det B = 0, as just noted; it follows that det A = 0.

If the rows of A are independent, let us reduce B further to reduced echelon form C. Since C is square and has rank n, C must equal the identity matrix In. Then det C ≠ 0; it follows that det A ≠ 0. □
|
|
The proof just given can be refined so as to provide a method for calculating the determinant function:
|
|
Theorem 2.8. Given a square matrix A, let us reduce it to echelon form B by elementary row operations of types (1) and (2). If B has a zero row, then det A = 0. Otherwise, let k be the number of row exchanges involved in the reduction process. Then det A equals (-1)^k times the product of the diagonal entries of B.
|
|
Proof. If B has a zero row, then rank A < n and det A = 0. So suppose that B has no zero row. We know from (a) and (b) of Theorem 2.6 that det A = (-1)^k det B. Furthermore, B must have the form

        [ b11  *    ...  *   ]
    B = [  0   b22  ...  *   ]
        [  .          .  .   ]
        [  0   0    ...  bnn ]

where the diagonal entries are non-zero. It remains to show that

    det B = b11 b22 ··· bnn.

For that purpose, let us apply elementary operations of type (2) to make the entries above the diagonal into zeros. The diagonal entries are unaffected by the process; therefore the resulting matrix has the form

        [ b11  0    ...  0   ]
    C = [  0   b22  ...  0   ]
        [  .          .  .   ]
        [  0   0    ...  bnn ]

Since only operations of type (2) are involved, we have det B = det C. Now let us multiply row 1 of C by 1/b11, row 2 by 1/b22, and so on, obtaining as our end result the identity matrix In. Property (c) of Theorem 2.6 implies that

    det In = (1/b11)(1/b22) ··· (1/bnn) det C,

so that (using property (d))

    det C = b11 b22 ··· bnn,

as desired. □
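Theorem 2.8 is essentially an algorithm. The following Python sketch (ours, assuming numpy, using only type (1) and (2) operations) reduces A to echelon form, counts the row exchanges, and returns (-1)^k times the product of the diagonal entries.

    import numpy as np

    def det_by_reduction(A, tol=1e-12):
        B = np.array(A, dtype=float)
        n = B.shape[0]
        k = 0                                       # number of row exchanges
        for col in range(n):
            pivot = next((r for r in range(col, n) if abs(B[r, col]) > tol), None)
            if pivot is None:
                return 0.0                          # no pivot in this column: rank A < n, so det A = 0
            if pivot != col:
                B[[col, pivot]] = B[[pivot, col]]   # type (1) operation
                k += 1
            for r in range(col + 1, n):
                B[r] -= (B[r, col] / B[col, col]) * B[col]   # type (2) operation
        return (-1) ** k * np.prod(np.diag(B))

    A = [[0, 2, 1], [3, 1, 4], [1, 5, 9]]
    print(det_by_reduction(A), np.linalg.det(A))    # both give the same value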
|
|
Corollary 2.9. The determinant function is uniquely characterized by its three axioms. It is also characterized by the four properties listed in Theorem 2.6.
|
|
Proof. The calculation of det A just given uses only properties (a)-(d) of Theorem 2.6. These in turn follow from the three axioms. □
|
|
Theorem 2.10. Let A and B be n by n matrices. Then

    det(A·B) = (det A)·(det B).
|
|
|
|
Proof. Step 1. The theorem holds when A is an elementary matrix. Indeed:

    det(E·B) = -det B = (det E)(det B),
    det(E'·B) = det B = (det E')(det B),
    det(E''·B) = λ·det B = (det E'')(det B).

Step 2. The theorem holds when rank A = n. For in that case, A is a product of elementary matrices, and one merely applies Step 1 repeatedly. Specifically, if A = E1 ··· Ek, then

    det(A·B) = det(E1 ··· Ek · B)
             = (det E1) det(E2 ··· Ek · B)
             ...
             = (det E1)(det E2) ··· (det Ek)(det B).

This equation holds for all B. In the case B = In, it tells us that

    det A = (det E1)(det E2) ··· (det Ek).

The theorem follows.
Step 3. We complete the proof by showing that the theorem holds if rank A < n. We have in general,

    rank (A·B) = rank (A·B)^tr = rank (B^tr·A^tr) ≤ rank A^tr,

where the inequality follows from Step 1 of Theorem 2.3. Thus if rank A < n, the theorem holds because both sides of the equation vanish. □
|
|
Even in low dimensions, this theorem would be very unpleasant to prove by direct computation. You might try it in the 2 by 2 case!
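If you do try the 2 by 2 case, a symbolic computation makes the bookkeeping painless; here is a sketch of ours using sympy (an assumed dependency).

    import sympy as sp

    a, b, c, d, e, f, g, h = sp.symbols('a b c d e f g h')
    A = sp.Matrix([[a, b], [c, d]])
    B = sp.Matrix([[e, f], [g, h]])

    lhs = (A * B).det()
    rhs = A.det() * B.det()
    print(sp.simplify(lhs - rhs))   # prints 0: det(A·B) = (det A)(det B)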
|
|
Theorem 2.11. det A^tr = det A.
|
|
Proof. Step 1. We show the theorem holds when A is an elementary matrix. Let E, E', and E'' be elementary matrices of the three basic types. Direct inspection shows that E^tr = E and (E'')^tr = E'', so the theorem is trivial in these cases. For the matrix E' of type (2), we note that its transpose is another elementary matrix of type (2), so that both have determinant 1.

Step 2. We verify the theorem when A has rank n. In that case, A is a product of elementary matrices, say

    A = E1·E2 ··· Ek.

Then

    det A^tr = det(Ek^tr ··· E2^tr · E1^tr)
             = (det Ek^tr) ··· (det E2^tr)(det E1^tr)     by Theorem 2.10,
             = (det Ek) ··· (det E2)(det E1)              by Step 1,
             = (det E1)(det E2) ··· (det Ek)
             = det(E1·E2 ··· Ek) = det A.

Step 3. The theorem holds if rank A < n. In this case, rank A^tr < n, so that det A^tr = 0 = det A. □
|
|
|
|
A formula for A^(-1)

We know that A is invertible if and only if det A ≠ 0. Now we derive a formula for A^(-1) that involves determinants explicitly.

Definition. Let A be an n by n matrix. The matrix of size n-1 by n-1 that is obtained from A by deleting the ith row and the jth column of A is called the (i,j)-minor of A. It is denoted Aij. The number

    (-1)^(i+j) det Aij

is called the (i,j)-cofactor of A.
|
|
|
|
Lemma 2.12. Let A be an n by n matrix; let b denote its entry in row i and column j.
    (a) If all the entries in row i other than b vanish, then

        det A = b(-1)^(i+j) det Aij.

    (b) The same equation holds if all the entries in column j other than the entry b vanish.
Proof. Step 1. We verify a special case of the theorem. Let b, a2, ..., an be fixed numbers. Given an n-1 by n-1 matrix D, let A(D) denote the n by n matrix

           [ b  a2 ... an ]
    A(D) = [ 0            ]
           [ .      D     ]
           [ 0            ]

We show that det A(D) = b(det D).

If b = 0, this result is obvious, since in that case rank A(D) < n. So assume b ≠ 0. Define a function f by the equation

    f(D) = (1/b) det A(D).

We show that f satisfies the four properties stated in Theorem 2.6, so that f(D) = det D.

Exchanging two rows of D has the effect of exchanging two rows of A(D), which changes the value of f by a factor -1. Replacing row i1 of D by itself plus c times row i2 of D has the effect of replacing row (i1 + 1) of A(D) by itself plus c times row (i2 + 1) of A(D), which leaves the value of f unchanged. Multiplying row i of D by λ has the effect of multiplying row (i + 1) of A(D) by λ, which changes the value of f by a factor of λ. Finally, if D = In-1, then A(D) is in echelon form, so det A(D) = b·1···1 by Theorem 2.8, and f(D) = 1.
|
|
Step 2. It follows by taking transposes that

        [ b   0 ... 0 ]
    det [ a2          ]  =  b(det D).
        [ .      D    ]
        [ an          ]
Step 3. We prove the theorem. Let A be a matrix satisfying the hypotheses of our theorem. One can by a sequence of i-1 exchanges of adjacent rows bring the ith row of A up to the top of the matrix, without affecting the order of the remaining rows. Then by a sequence of j-1 exchanges of adjacent columns, one can bring the jth column of this matrix to the left edge of the matrix, without affecting the order of the remaining columns. The matrix C that results has the form of one of the matrices considered in Steps 1 and 2. Furthermore, the (1,1)-minor C11 of the matrix C is identical with the (i,j)-minor Aij of the original matrix A.
|
|
Now each row exchange changes the sign of the determinant. So does each column exchange, by Theorem 2.11. Therefore

    det C = (-1)^((i-1)+(j-1)) det A = (-1)^(i+j) det A.

Thus

    det A = (-1)^(i+j) det C
          = (-1)^(i+j) b det C11      by Steps 1 and 2,
          = (-1)^(i+j) b det Aij.  □
|
|
|
|
Theorem 2.13 (Cramer's rule). Let A be an n by n matrix with successive columns a1, ..., an. Let

    x = [x1]          c = [c1]
        [..]   and        [..]
        [xn]              [cn]

be column matrices. If A·x = c, then

    (det A)·xi = det [a1 ··· a(i-1) c a(i+1) ··· an].
Proof. Let e1, ..., en be the standard basis for Rn, where each ei is written as a column matrix. Let C be the matrix

    C = [e1 ··· e(i-1) x e(i+1) ··· en].

The equations A·ej = aj and A·x = c imply that

    A·C = [a1 ··· a(i-1) c a(i+1) ··· an].

By Theorem 2.10,

    (det A)·(det C) = det [a1 ··· a(i-1) c a(i+1) ··· an].
|
|
|
|
Now C agrees with the identity matrix In except in its ith column, which equals x. In particular, the entry xi appears in row i and column i, and all the other entries of row i vanish. Hence by the preceding lemma,

    det C = xi·(-1)^(i+i) det In-1 = xi.

The theorem follows. □
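A small numerical illustration of Cramer's rule (our own sketch, assuming numpy): each xi is obtained by replacing the ith column of A by c, taking the determinant, and dividing by det A.

    import numpy as np

    def cramer_solve(A, c):
        A = np.asarray(A, dtype=float)
        c = np.asarray(c, dtype=float)
        d = np.linalg.det(A)                 # assumed non-zero, i.e. A has rank n
        x = np.empty(len(c))
        for i in range(len(c)):
            Ai = A.copy()
            Ai[:, i] = c                     # replace column i of A by c
            x[i] = np.linalg.det(Ai) / d     # (det A)·xi = det [a1 ... c ... an]
        return x

    A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
    c = [1, 2, 3]
    x = cramer_solve(A, c)
    assert np.allclose(np.array(A) @ x, c)
    print(x)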
|
|
|
|
Here now is the formula we have been seeking:
|
|
|
|
Theorem 2.14. Let A be an n by n matrix of rank n; let B = A^(-1). Then

    bij = (-1)^(i+j) det Aji / det A.

Proof. Let j be fixed throughout this argument. Let

    x = [x1]
        [..]
        [xn]

denote the jth column of the matrix B. The fact that A·B = In implies in particular that A·x = ej. Cramer's rule tells us that

    (det A)·xi = det [a1 ··· a(i-1) ej a(i+1) ··· an].

We conclude from Lemma 2.12 that

    (det A)·xi = 1·(-1)^(i+j) det Aji.

Since xi = bij, our theorem follows. □
|
|
This theorem gives us an algorithm for computing the inverse of a matrix A. One proceeds as follows:

    (1) First, form the matrix whose entry in row i and column j is (-1)^(i+j) det Aij; this matrix is called the matrix of cofactors of A.
    (2) Second, take the transpose of this matrix.
    (3) Third, divide each entry of this matrix by det A.

This algorithm is in fact not very useful for practical purposes; computing determinants is simply too time-consuming. The importance of this formula for A^(-1) is theoretical, as we shall see. If one wishes actually to compute A^(-1), there is an algorithm based on Gauss-Jordan reduction that is more efficient. It is outlined in the exercises.
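For small matrices the three steps can be carried out directly; here is a sketch of ours (assuming numpy) that follows them literally and checks the result against A·B = I.

    import numpy as np

    def minor(A, i, j):
        # Delete row i and column j of A (the (i,j)-minor).
        return np.delete(np.delete(A, i, axis=0), j, axis=1)

    def inverse_by_cofactors(A):
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        # Step (1): the matrix of cofactors.
        cof = np.array([[(-1) ** (i + j) * np.linalg.det(minor(A, i, j))
                         for j in range(n)] for i in range(n)])
        # Steps (2) and (3): transpose, then divide by det A.
        return cof.T / np.linalg.det(A)

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
    B = inverse_by_cofactors(A)
    assert np.allclose(A @ B, np.eye(3))
    print(B)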
|
|
Expansion by cofactors
|
|
We now derive a final formula for evaluating the determinant. This is the one place we actually need the axioms for the determinant function rather than the properties stated in Theorem 2.6.
|
|
Theorem 2.15. Let A be an n by n matrix. Let i be fixed. Then

    det A = Σ (k = 1 to n) (-1)^(i+k) aik · det Aik.
|
|
Here Aik is, as usual, the (i,k)-minor of A. This rule is called the "rule for expansion of the determinant by cofactors of the ith row." There is a similar rule for expansion by cofactors of the jth column, proved by taking transposes.
|
|
Proof. Let Ai(x), as usual, denote the matrix obtained from A by replacing the ith row by the n-tuple x. If e1, ..., en denote the usual basis vectors in Rn (written as row matrices in this case), then the ith row of A can be written in the form

    ai1 e1 + ai2 e2 + ··· + ain en.

Then

    det A = Σ (k = 1 to n) aik · det Ai(ek)               by linearity (Axiom 2),
          = Σ (k = 1 to n) aik (-1)^(i+k) det Aik         by Lemma 2.12.  □
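The expansion rule translates directly into a recursive program; here is a short sketch (ours) that expands along the first row (i = 1).

    def det_by_cofactors(A):
        # A is a list of lists (a square matrix); expand along row i = 1.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for k in range(n):
            # The (1, k)-minor: delete row 1 and column k.
            minor = [row[:k] + row[k+1:] for row in A[1:]]
            total += (-1) ** k * A[0][k] * det_by_cofactors(minor)
        return total

    print(det_by_cofactors([[0, 2, 1], [3, 1, 4], [1, 5, 9]]))   # -32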
EXERCISES

1. Consider the matrix A.
   (a) Find two different left inverses for A.
   (b) Show that A has no right inverse.

2. Let A be an n by m matrix with n ≠ m.
   (a) If rank A = m, show there exists a matrix D that is a product of elementary matrices such that

        D·A = [Im]
              [0 ].

   (b) Show that A has a left inverse if and only if rank A = m.
   (c) Show that A has a right inverse if and only if rank A = n.

3. Verify that the functions defined in Example 1 satisfy the axioms for the determinant function.

4. (a) Let A be an n by n matrix of rank n. By applying elementary row operations to A, one can reduce A to the identity matrix. Show that by applying the same operations, in the same order, to In, one obtains the matrix A^(-1).
   (b) Let

        A = [~  !~]

      Calculate A^(-1) by using the algorithm suggested in (a). [Hint: An easy way to do this is to reduce the 3 by 6 matrix [A I3] to reduced echelon form.]
   (c) Calculate A^(-1) using the formula involving determinants.

5. Let

        A = [a  b]
            [c  d],

   where ad - bc ≠ 0. Find A^(-1).

*6. Prove the following:

   Theorem. Let A be a k by k matrix, let D have size n by n, and let C have size n by k. Then

        det [A  0] = (det A)·(det D).
            [C  D]

   Proof. First show that

        [A  0 ] · [Ik  0] = [A  0].
        [0  In]   [C   D]   [C  D]

   Then use Lemma 2.12.
|
|
|
|
§3. REVIEW OF TOPOLOGY IN Rn
|
|
Metric spaces
|
|
Recall that if A and B are sets, then A × B denotes the set of all ordered pairs (a, b) for which a ∈ A and b ∈ B.

Given a set X, a metric on X is a function d : X × X → R such that the following properties hold for all x, y, z ∈ X:

    (1) d(x, y) = d(y, x).
    (2) d(x, y) ≥ 0, and equality holds if and only if x = y.
    (3) d(x, z) ≤ d(x, y) + d(y, z).
|
|
A metric space is a set X together with a specific metric on X. We often suppress mention of the metric, and speak simply of "the metric space X ."
|
|
If X is a metric space with metric d, and if Y is a subset of X, then the restriction of d to the set Y x Y is a metric on Y; thus Y is a metric space in its own right. It is called a subspace of X.
|
|
For example, Rn has the metrics

    d(x, y) = ||x - y||   and   d(x, y) = |x - y|;

they are called the euclidean metric and the sup metric, respectively. It follows immediately from the properties of a norm that they are metrics. For many purposes, these two metrics on Rn are equivalent, as we shall see.
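A small sketch (ours, assuming numpy) of the two metrics, with a spot-check of the triangle inequality on random points:

    import numpy as np

    def d_euclid(x, y):
        return np.sqrt(np.sum((x - y) ** 2))      # ||x - y||

    def d_sup(x, y):
        return np.max(np.abs(x - y))              # |x - y|

    rng = np.random.default_rng(3)
    for d in (d_euclid, d_sup):
        for _ in range(1000):
            x, y, z = rng.normal(size=(3, 4))
            assert d(x, z) <= d(x, y) + d(y, z) + 1e-12
    print("triangle inequality holds for both metrics on all samples")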
|
|
We shall in this book be concerned only with the metric space Rn and its subspaces, except for the expository final section, in which we deal with general metric spaces. The space Rn is commonly called n-dimensional euclidean space.
|
|
If X is a metric space with metric d, then given x0 ∈ X and given ε > 0, the set

    U(x0; ε) = {x | d(x, x0) < ε}

is called the ε-neighborhood of x0, or the ε-neighborhood centered at x0.

A subset U of X is said to be open in X if for each x0 ∈ U there is a corresponding ε > 0 such that U(x0; ε) is contained in U. A subset C of X is said to be closed in X if its complement X - C is open in X. It follows from the triangle inequality that an ε-neighborhood is itself an open set.

If U is any open set containing x0, we commonly refer to U simply as a neighborhood of x0.
|
|
|
|
Theorem 3.1. Let (X, d) be a metric space. Then finite intersections and arbitrary unions of open sets of X are open in X. Similarly, finite unions and arbitrary intersections of closed sets of X are closed in X. □
|
|
|
|
Theorem 3.2. Let X be a metric space; let Y be a subspace. A subset A of Y is open in Y if and only if it has the form

    A = U ∩ Y,

where U is open in X. Similarly, a subset A of Y is closed in Y if and only if it has the form

    A = C ∩ Y,

where C is closed in X. □
|
|
|
|
It follows that if A is open in Y and Y is open in X, then A is open in X. Similarly, if A is closed in Y and Y is closed in X, then A is closed in X.
|
|
If X is a metric space, a point x0 of X is said to be a limit point of the subset A of X if every ε-neighborhood of x0 intersects A in at least one point different from x0. An equivalent condition is to require that every neighborhood of x0 contain infinitely many points of A.

Theorem 3.3. If A is a subset of X, then the set Ā consisting of A and all its limit points is a closed set of X. A subset of X is closed if and only if it contains all its limit points. □

The set Ā is called the closure of A.
|
|
|
|
In Rn, the ε-neighborhoods in our two standard metrics are given special names. If a ∈ Rn, the ε-neighborhood of a in the euclidean metric is called the open ball of radius ε centered at a, and denoted B(a; ε). The ε-neighborhood of a in the sup metric is called the open cube of radius ε centered at a, and denoted C(a; ε). The inequalities |x| ≤ ||x|| ≤ √n |x| lead to the following inclusions:

    B(a; ε) ⊂ C(a; ε) ⊂ B(a; ε√n).

These inclusions in turn imply the following:
Theorem 3.4. If X is a subspace of Rn, the collection of open sets of X is the same whether one uses the euclidean metric or the sup metric on X. The same is true for the collection of closed sets of X. □
|
|
|
|
In general, any property of a metric space X that depends only on the collection of open sets of X, rather than on the specific metric involved, is called a topological property of X. Limits, continuity, and compactness are examples of such, as we shall see.
|
|
|
|
Limits and Continuity
|
|
Let X and Y be metric spaces, with metrics dX and dY, respectively. We say that a function f : X → Y is continuous at the point x0 of X if for each open set V of Y containing f(x0), there is an open set U of X containing x0 such that f(U) ⊂ V. We say f is continuous if it is continuous at each point x0 of X.

Continuity of f is equivalent to the requirement that for each open set V of Y, the set

    f^(-1)(V) = {x | f(x) ∈ V}

is open in X, or alternatively, the requirement that for each closed set D of Y, the set f^(-1)(D) is closed in X.
|
|
Continuity may be formulated in a way that involves the metrics specifically. The function f is continuous at x0 if and only if the following holds: For each ε > 0, there is a corresponding δ > 0 such that

    dY(f(x), f(x0)) < ε   whenever   dX(x, x0) < δ.

This is the classical "ε-δ formulation of continuity."
|
|
Note that given x0 ∈ X it may happen that for some δ > 0, the δ-neighborhood of x0 consists of the point x0 alone. In that case, x0 is called an isolated point of X, and any function f : X → Y is automatically continuous at x0!

A constant function from X to Y is continuous, and so is the identity function iX : X → X. So are restrictions and composites of continuous functions:
|
|
|
|
Theorem 3.5. (a) Let x0 ∈ A, where A is a subspace of X. If f : X → Y is continuous at x0, then the restricted function f|A : A → Y is continuous at x0.
    (b) Let f : X → Y and g : Y → Z. If f is continuous at x0 and g is continuous at y0 = f(x0), then g ∘ f : X → Z is continuous at x0. □
|
|
|
|
Theorem 3.6. (a) Let X be a metric space. Let f : X → Rn have the form

    f(x) = (f1(x), ..., fn(x)).

Then f is continuous at x0 if and only if each function fi : X → R is continuous at x0. The functions fi are called the component functions of f.
    (b) Let f, g : X → R be continuous at x0. Then f + g and f - g and f·g are continuous at x0; and f/g is continuous at x0 if g(x0) ≠ 0.
    (c) The projection function πi : Rn → R given by πi(x) = xi is continuous. □
|
|
|
|
These theorems imply that functions formed from the familiar real-valued continuous functions of calculus, using algebraic operations and composites, are continuous in Rn. For instance, since one knows that the functions e^x and sin x are continuous in R, it follows that such a function as

f(s, t, u, v) = (sin(s + t))/e^(uv)

is continuous in R4.
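To make the appeal to Theorems 3.5 and 3.6 explicit, here is the decomposition being used; this is an added sketch, not part of the original text.

```latex
% s + t and uv are continuous, being sums and products of the projection
% functions; sin and exp are continuous on R; and e^{uv} is never zero,
% so Theorem 3.6(b) applies to the quotient:
\[
  f(s,t,u,v) = \frac{\sin(s+t)}{e^{uv}},
  \qquad e^{uv} > 0 \ \text{for all } (s,t,u,v)\in\mathbf{R}^4 .
\]
```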
|
|
Now we define the notion of limit. Let X be a metric space. Let A ⊂ X and let f : A → Y. Let x0 be a limit point of the domain A of f. (The point x0 may or may not belong to A.) We say that f(x) approaches y0 as x approaches x0 if for each open set V of Y containing y0, there is an open set U of X containing x0 such that f(x) ∈ V whenever x is in U ∩ A and x ≠ x0. This statement is expressed symbolically in the form

f(x) → y0 as x → x0.

We also say in this situation that the limit of f(x), as x approaches x0, is y0. This statement is expressed symbolically by the equation

lim_{x→x0} f(x) = y0.

Note that the requirement that x0 be a limit point of A guarantees that there exist points x different from x0 belonging to the set U ∩ A. We do not attempt to define the limit of f if x0 is not a limit point of the domain of f.

Note also that the value of f at x0 (provided f is even defined at x0) is not involved in the definition of the limit.

The notion of limit can be formulated in a way that involves the metrics specifically. One shows readily that f(x) approaches y0 as x approaches x0 if and only if the following condition holds: For each ε > 0, there is a corresponding δ > 0 such that

dY(f(x), y0) < ε whenever x ∈ A and 0 < dX(x, x0) < δ.
|
|
There is a direct relation between limits and continuity; it is the following:
|
|
|
|
|
|
Theorem 3.7. Let f : X → Y. If x0 is an isolated point of X, then f is continuous at x0. Otherwise, f is continuous at x0 if and only if f(x) → f(x0) as x → x0. □
|
|
|
|
Most of the theorems dealing with continuity have counterparts that deal with limits:
|
|
|
|
Theorem 3.8. (a) Let A ⊂ X; let f : A → Rn have the form

f(x) = (f1(x), ..., fn(x)).

Let a = (a1, ..., an). Then f(x) → a as x → x0 if and only if fi(x) → ai as x → x0, for each i.

(b) Let f, g : A → R. If f(x) → a and g(x) → b as x → x0, then as x → x0,

f(x) + g(x) → a + b,
f(x) − g(x) → a − b,
f(x)·g(x) → a·b;

also, f(x)/g(x) → a/b if b ≠ 0. □
|
|
|
|
Interior and Exterior
|
|
The following concepts make sense in an arbitrary metric space. Since we shall use them only for Rn, we define them only in that case.
|
|
|
|
Definition. Let A be a subset of Rn. The interior of A, as a subset of Rn, is defined to be the union of all open sets of Rn that are contained in A; it is denoted Int A. The exterior of A is defined to be the union of all open sets of Rn that are disjoint from A; it is denoted Ext A. The boundary of A consists of those points of Rn that belong neither to Int A nor to Ext A; it is denoted Bd A.
|
|
|
|
A point x is in Bd A if and only if every open set containing x intersects both A and the complement Rn − A of A. The space Rn is the union of the disjoint sets Int A, Ext A, and Bd A; the first two are open in Rn and the third is closed in Rn. For example, suppose Q is the rectangle

Q = [a1, b1] × ··· × [an, bn],

consisting of all points x of Rn such that ai ≤ xi ≤ bi for all i. You can check that

Int Q = (a1, b1) × ··· × (an, bn).

We often call Int Q an open rectangle. Furthermore, Ext Q = Rn − Q and Bd Q = Q − Int Q.

An open cube is a special case of an open rectangle; indeed,

C(a; ε) = (a1 − ε, a1 + ε) × ··· × (an − ε, an + ε).

The corresponding (closed) rectangle

[a1 − ε, a1 + ε] × ··· × [an − ε, an + ε]

is often called a closed cube, or simply a cube, centered at a.
|
|
EXERCISES
|
|
Throughout, let X be a metric space with metric d.
1. Show that U(x0; ε) is an open set.
2. Let Y ⊂ X. Give an example where A is open in Y but not open in X. Give an example where A is closed in Y but not closed in X.
3. Let A ⊂ X. Show that if C is a closed set of X and C contains A, then C contains Ā.
|
|
4. (a) Show that if Q is a rectangle, then Q equals the closure of Int Q.
(b) If D is a closed set, what is the relation in general between the set D and the closure of Int D?
(c) If U is an open set, what is the relation in general between the set U and the interior of Ū?
5. Let f : X → Y. Show that f is continuous if and only if for each x ∈ X there is a neighborhood U of x such that f|U is continuous.
|
|
6. Let X = A ∪ B, where A and B are subspaces of X. Let f : X → Y; suppose that the restricted functions

f|A : A → Y and f|B : B → Y

are continuous. Show that if both A and B are closed in X, then f is continuous.
7. Finding the limit of a composite function g∘f is easy if both f and g are continuous; see Theorem 3.5. Otherwise, it can be a bit tricky:
Let f : X → Y and g : Y → Z. Let x0 be a limit point of X and let y0 be a limit point of Y. See Figure 3.1. Consider the following three conditions:
(i) f(x) → y0 as x → x0.
(ii) g(y) → z0 as y → y0.
(iii) g(f(x)) → z0 as x → x0.
(a) Give an example where (i) and (ii) hold, but (iii) does not.
(b) Show that if (i) and (ii) hold and if g(y0) = z0, then (iii) holds.

8. Let f : R → R be defined by setting f(x) = sin x if x is rational, and f(x) = 0 otherwise. At what points is f continuous?
|
|
|
|
9. If we denote the general point of R2 by (x, y), determine Int A, Ext A, and Bd A for the subset A of R2 specified by each of the following conditions:
(a) x = 0.
(b) 0 ≤ x < 1.
(c) 0 ≤ x < 1 and 0 ≤ y < 1.
(d) x is rational and y > 0.
(e) x and y are rational.
(f) 0 < x^2 + y^2 < 1.
(g) y < x^2.
(h) y ≤ x^2.
|
|
|
|
Figure 3.1
|
|
|
|
§4. COMPACT SUBSPACES AND CONNECTED SUBSPACES OF Rn
|
|
|
|
An important class of subspaces of Rn is the class of compact spaces. We shall use the basic properties of such spaces constantly. The properties we shall need are summarized in the theorems of this section. Proofs are included, since some of these results you may not have seen before.
|
|
A second useful class of spaces is the class of connected spaces; we summarize here those few properties we shall need.
|
|
We do not attempt to deal here with compactness and connectedness in arbitrary metric spaces, but comment that many of our proofs do hold in that more general situation.
|
|
Compact spaces
|
|
Definition. Let X be a subspace of Rn. A covering of X is a collection of subsets of Rn whose union contains X; if each of the subsets is open in Rn, it is called an open covering of X. The space X is said to be compact if every open covering of X contains a finite subcollection that also forms an open covering of X.
|
|
While this definition of compactness involves open sets of Rn, it can be reformulated in a manner that involves only open sets of the space X:
|
|
Theorem 4.1. A subspace X of Rn is compact if and only if for every collection of sets open in X whose union is X, there is a finite subcollection whose union equals X.
|
|
Proof. Suppose X is compact. Let {Aα} be a collection of sets open in X whose union is X. Choose, for each α, an open set Uα of Rn such that Aα = Uα ∩ X. Since X is compact, some finite subcollection of the collection {Uα} covers X, say for α = α1, ..., αk. Then the sets Aα, for α = α1, ..., αk, have X as their union.

The proof of the converse is similar. □
|
|
The following result is always proved in a first course in analysis, so the proof will be omitted here:
|
|
Theorem 4.2. The subspace [a, b] of R is compact. □
|
|
Definition. A subspace X of Rn is said to be bounded if there is an M such that |x| ≤ M for all x ∈ X.
|
|
We shall eventually show that a subspace of Rn is compact if and only if it is closed and bounded. Half of that theorem is easy; we prove it now:
|
|
|
|
|
|
|
|
Theorem 4.3. If X is a compact subspace of Rn, then X is closed and bounded.
|
|
Proof. Step 1. We show that X is bounded. For each positive integer N, let UN denote the open cube UN = C(0; N). Then UN is an open set; and U1 ⊂ U2 ⊂ ···; and the sets UN cover all of Rn (so in particular they cover X). Some finite subcollection also covers X, say for N = N1, ..., Nk. If M is the largest of the numbers N1, ..., Nk, then X is contained in UM; thus X is bounded.
|
|
Step 2. We show that X is closed by showing that the complement of X is open. Let a be a point of Rn not in X; we find an ε-neighborhood of a that lies in the complement of X.

For each positive integer N, consider the cube

CN = {x : |x − a| ≤ 1/N}.

Then C1 ⊃ C2 ⊃ ···, and the intersection of the sets CN consists of the point a alone. Let VN be the complement of CN; then VN is an open set; and V1 ⊂ V2 ⊂ ···; and the sets VN cover all of Rn except for the point a (so they cover X). Some finite subcollection covers X, say for N = N1, ..., Nk. If M is the largest of the numbers N1, ..., Nk, then X is contained in VM. Then the set CM is disjoint from X, so that in particular the open cube C(a; 1/M) lies in the complement of X. See Figure 4.1. □
|
|
|
|
Figure 4.1
|
|
Corollary 4.4. Let X be a compact subspace of R. Then X has a largest element and a smallest element.
|
|
|
|
|
|
|
|
Proof. Since X is bounded, it has a greatest lower bound and a least upper bound. Since X is closed, these elements must belong to X. □
|
|
|
|
Here is a basic (and familiar) result that is used constantly:
|
|
|
|
Theorem 4.5 (Extreme-value theorem). Let X be a compact subspace of Rm. If f : X → Rn is continuous, then f(X) is a compact subspace of Rn.

In particular, if φ : X → R is continuous, then φ has a maximum value and a minimum value.
|
|
|
|
Proof. Let {Vα} be a collection of open sets of Rn that covers f(X). The sets f^(-1)(Vα) form an open covering of X. Hence finitely many of them cover X, say for α = α1, ..., αk. Then the sets Vα for α = α1, ..., αk cover f(X). Thus f(X) is compact.

Now if φ : X → R is continuous, φ(X) is compact, so it has a largest element and a smallest element. These are the maximum and minimum values of φ. □
|
|
|
|
Now we prove a result that may not be so familiar.
|
|
Definition. Let X be a subset of Rn. Given ε > 0, the union of the sets B(a; ε), as a ranges over all points of X, is called the ε-neighborhood of X in the euclidean metric. Similarly, the union of the sets C(a; ε) is called the ε-neighborhood of X in the sup metric.
|
|
|
|
Theorem 4.6 (The ε-neighborhood theorem). Let X be a compact subspace of Rn; let U be an open set of Rn containing X. Then there is an ε > 0 such that the ε-neighborhood of X (in either metric) is contained in U.
|
|
|
|
Proof. The ε-neighborhood of X in the euclidean metric is contained in the ε-neighborhood of X in the sup metric. Therefore it suffices to deal only with the latter case.

Step 1. Let C be a fixed subset of Rn. For each x ∈ Rn, we define

d(x, C) = inf{|x − c| : c ∈ C}.

We call d(x, C) the distance from x to C. We show it is continuous as a function of x:

Let c ∈ C; let x, y ∈ Rn. The triangle inequality implies that

d(x, C) − |x − y| ≤ |x − c| − |x − y| ≤ |y − c|.

This inequality holds for all c ∈ C; therefore

d(x, C) − |x − y| ≤ d(y, C),

so that

d(x, C) − d(y, C) ≤ |x − y|.

The same inequality holds if x and y are interchanged; continuity of d(x, C) follows.
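The two displayed inequalities can be summarized in a single estimate; this restatement is added here and shows that d(·, C) is Lipschitz with constant 1, hence (uniformly) continuous.

```latex
% Interchanging x and y in d(x,C) - d(y,C) \le |x - y| gives the two-sided bound
\[
  \bigl|\, d(x,C) - d(y,C) \,\bigr| \;\le\; |x - y|
  \qquad \text{for all } x, y \in \mathbf{R}^n .
\]
```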
|
|
|
|
Step 2. We prove the theorem. Given U, define f : X → R by the equation

f(x) = d(x, Rn − U).

Then f is a continuous function. Furthermore, f(x) > 0 for all x ∈ X. For if x ∈ X, then some δ-neighborhood of x is contained in U, whence f(x) ≥ δ.

Because X is compact, f has a minimum value ε. Because f takes on only positive values, this minimum value is positive. Then the ε-neighborhood of X is contained in U. □
|
|
|
|
This theorem does not hold without some hypothesis on the set X. If X is the x-axis in R2, for example, and U is the open set

then there is no ε such that the ε-neighborhood of X is contained in U. See Figure 4.2.
|
|
|
|
Figure 4.2
|
|
Here is another familiar result.
|
|
Theorem 4.7 (Uniform continuity). Let X be a compact subspace of Rm; let f : X → Rn be continuous. Given ε > 0, there is a δ > 0 such that whenever x, y ∈ X,

|x − y| < δ implies |f(x) − f(y)| < ε.
|
|
|
|
|
|
|
|
This result also holds if one uses the euclidean metric instead of the sup metric.
|
|
|
|
The condition stated in the conclusion of the theorem is called the condition of uniform continuity.
|
|
|
|
Proof. Consider the subspace X × X of Rm × Rm; and within this, consider the space

Δ = {(x, x) : x ∈ X},

which is called the diagonal of X × X. The diagonal is a compact subspace of R2m, since it is the image of the compact space X under the continuous map f(x) = (x, x).

We prove the theorem first for the euclidean metric. Consider the function g : X × X → R defined by the equation

g(x, y) = ||f(x) − f(y)||.

Then consider the set of points (x, y) of X × X for which g(x, y) < ε. Because g is continuous, this set is an open set of X × X. Also, it contains the diagonal Δ, since g(x, x) = 0. Therefore, it equals the intersection with X × X of an open set U of Rm × Rm that contains Δ. See Figure 4.3.
|
|
|
|
Figure 4.3
|
|
Compactness of Δ implies that for some δ, the δ-neighborhood of Δ is contained in U. This is the δ required by our theorem. For if x, y ∈ X with ||x − y|| < δ, then

||(x, y) − (y, y)|| = ||(x − y, 0)|| = ||x − y|| < δ,

so that (x, y) belongs to the δ-neighborhood of the diagonal Δ. Then (x, y) belongs to U, so that g(x, y) < ε, as desired.

The corresponding result for the sup metric can be derived by a similar proof, or simply by noting that if |x − y| < δ/√n, then ||x − y|| < δ, whence |f(x) − f(y)| ≤ ||f(x) − f(y)|| < ε. □
|
|
To complete our characterization of the compact subspaces of Rn, we need the following lemma:
|
|
Lemma 4.8. The rectangle

Q = [a1, b1] × ··· × [an, bn]

in Rn is compact.
|
|
Proof. We proceed by induction on n. The lemma is true for n = 1; we suppose it true for n − 1 and prove it true for n. We can write

Q = X × [an, bn],

where X is a rectangle in R^(n−1). Then X is compact by the induction hypothesis. Let A be an open covering of Q.
|
|
Step 1. We show that given t ∈ [an, bn], there is an ε > 0 such that the set

X × (t − ε, t + ε)

can be covered by finitely many elements of A.

The set X × t is a compact subspace of Rn, for it is the image of X under the continuous map f : X → Rn given by f(x) = (x, t). Therefore it may be covered by finitely many elements of A, say by A1, ..., Ak.

Let U be the union of these sets; then U is open and contains X × t. See Figure 4.4.
|
|
Figure 4.4
|
|
Because X × t is compact, there is an ε > 0 such that the ε-neighborhood of X × t is contained in U. Then in particular, the set X × (t − ε, t + ε) is contained in U, and hence is covered by A1, ..., Ak.

Step 2. By the result of Step 1, we may for each t ∈ [an, bn] choose an open interval Vt about t, such that the set X × Vt can be covered by finitely many elements of the collection A.

Now the open intervals Vt in R cover the interval [an, bn]; hence finitely many of them cover this interval, say for t = t1, ..., tm.

Then Q = X × [an, bn] is contained in the union of the sets X × Vt for t = t1, ..., tm; since each of these sets can be covered by finitely many elements of A, so may Q be covered. □
|
|
|
|
Theorem 4.9. If X is a closed and bounded subspace of Rn, then X is compact.

Proof. Let A be a collection of open sets that covers X. Let us adjoin to this collection the single set Rn − X, which is open in Rn because X is closed. Then we have an open covering of all of Rn. Because X is bounded, we can choose a rectangle Q that contains X; our collection then in particular covers Q.

Since Q is compact, some finite subcollection covers Q. If this finite subcollection contains the set Rn − X, we discard it from the collection. We then have a finite subcollection of the collection A; it may not cover all of Q, but it certainly covers X, since the set Rn − X we discarded contains no point of X. □
|
|
|
All the theorems of this section hold if Rn and Rm are replaced by arbitrary metric spaces, except for the theorem just proved. That theorem does not hold in an arbitrary metric space; see the exercises.
|
|
|
|
Connected spaces
|
|
If X is a metric space, then X is said to be connected if X cannot be written as the union of two disjoint non-empty sets A and B, each of which is open in X.
|
|
The following theorem is always proved in a first course in analysis, so the proof will be omitted here:
|
|
|
|
Theorem 4.10. The closed interval [a, b] of R is connected. □
|
|
|
|
The basic fact about connected spaces that we shall use is the following:
|
|
|
|
Theorem 4.11 (Intermediate-value theorem). Let X be connected. If f : X → Y is continuous, then f(X) is a connected subspace of Y.

In particular, if φ : X → R is continuous and if φ(x0) < r < φ(x1) for some points x0, x1 of X, then φ(x) = r for some point x of X.

Proof. Suppose f(X) = A ∪ B, where A and B are disjoint non-empty sets open in f(X). Then f^(-1)(A) and f^(-1)(B) are disjoint non-empty sets whose union is X, and each is open in X because f is continuous. This contradicts connectedness of X.

Given φ, let A consist of all y in R with y < r, and let B consist of all y with y > r. Then A and B are open in R; if the set φ(X) does not contain r, then φ(X) is the union of the disjoint sets φ(X) ∩ A and φ(X) ∩ B, each of which is non-empty and open in φ(X). This contradicts connectedness of φ(X). □
|
|
|
|
If a and b are points of Rn, then the line segment joining a and b is defined to be the set of all points x of the form x = a + t(b − a), where 0 ≤ t ≤ 1. Any line segment is connected, for it is the image of the interval [0, 1] under the continuous map t → a + t(b − a).

A subset A of Rn is said to be convex if for every pair a, b of points of A, the line segment joining a and b is contained in A. Any convex subset A of Rn is automatically connected: For if A is the union of the disjoint non-empty sets U and V, each of which is open in A, we need merely choose a in U and b in V, and note that if L is the line segment joining a and b, then the sets U ∩ L and V ∩ L are disjoint, non-empty, and open in L.

It follows that in Rn all open balls and open cubes and rectangles are connected. (See the exercises.)
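For concreteness, here is the triangle-inequality computation that shows an open ball B(c; ε) is convex; it is an added sketch and essentially the content of Exercise 4(a) below.

```latex
% Let a, b lie in B(c;\varepsilon) and let x = a + t(b - a) = (1-t)a + tb
% with 0 \le t \le 1.  Then
\[
  \|x - c\| = \|(1-t)(a-c) + t(b-c)\|
    \le (1-t)\|a-c\| + t\|b-c\|
    < (1-t)\varepsilon + t\varepsilon = \varepsilon ,
\]
% so the segment joining a and b stays inside the ball.
```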
|
|
|
|
EXERCISES
|
|
|
|
1. Let R+ denote the set of positive real numbers.
(a) Show that the continuous function f : R+ → R given by f(x) = 1/(1 + x) is bounded but has neither a maximum value nor a minimum value.
(b) Show that the continuous function g : R+ → R given by g(x) = sin(1/x) is bounded but does not satisfy the condition of uniform continuity on R+.
|
|
2. Let X denote the subset (−1, 1) × 0 of R2, and let U be the open ball B(0; 1) in R2, which contains X. Show there is no ε > 0 such that the ε-neighborhood of X in R2 is contained in U.
3. Let R∞ be the set of all "infinite-tuples" x = (x1, x2, ...) of real numbers that end in an infinite string of 0's. (See the exercises of §1.) Define an inner product on R∞ by the rule ⟨x, y⟩ = Σ xiyi. (This is a finite sum, since all but finitely many terms vanish.) Let ||x − y|| be the corresponding metric on R∞. Define

ei = (0, ..., 0, 1, 0, ..., 0, ...),

where 1 appears in the ith place. Then the ei form a basis for R∞. Let X be the set of all the points ei. Show that X is closed, bounded, and non-compact.
4. (a) Show that open balls and open cubes in Rn are convex.
(b) Show that (open and closed) rectangles in Rn are convex.
|
|
|
|
Differentiation
|
|
|
|
In this chapter, we consider functions mapping Rm into Rn, and we define what we mean by the derivative of such a function. Much of our discussion will simply generalize facts that are already familiar to you from calculus.
|
|
The two major results of this chapter are the inverse function theorem, which gives conditions under which a differentiable function from Rn to Rn has a differentiable inverse, and the implicit function theorem, which provides the theoretical underpinning for the technique of implicit differentiation as studied in calculus.
|
|
Recall that we write the elements of Rm and Rn as column matrices unless specifically stated otherwise.
|
|
|
|
§5. THE DERIVATIVE
|
|
|
|
First, let us recall how the derivative of a real-valued function of a real variable is defined.

Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of the point a. We define the derivative of φ at a by the equation

φ′(a) = lim_{t→0} [φ(a + t) − φ(a)]/t,

provided the limit exists. In this case, we say that φ is differentiable at a.
|
|
The following facts are an immediate consequence:
|
|
|
|
(1) Differentiable functions are continuous.
(2) Composites of differentiable functions are differentiable.
We seek now to define the derivative of a function f mapping a subset of Rm into Rn. We cannot simply replace a and t in the definition just given by points of Rm, for we cannot divide a point of Rn by a point of Rm if m > 1!

Here is a first attempt at a definition:
|
|
Definition. Let A ⊂ Rm; let f : A → Rn. Suppose A contains a neighborhood of a. Given u ∈ Rm with u ≠ 0, define

f′(a; u) = lim_{t→0} [f(a + tu) − f(a)]/t,

provided the limit exists. This limit depends both on a and on u; it is called the directional derivative of f at a with respect to the vector u. (In calculus, one usually requires u to be a unit vector, but that is not necessary.)
|
|
|
|
EXAMPLE 1. Let f : R2 → R be given by the equation

f(x1, x2) = x1x2.

The directional derivative of f at a = (a1, a2) with respect to the vector u = (1, 0) is

f′(a; u) = lim_{t→0} [(a1 + t)a2 − a1a2]/t = a2.

With respect to the vector v = (1, 2), the directional derivative is

f′(a; v) = lim_{t→0} [(a1 + t)(a2 + 2t) − a1a2]/t = a2 + 2a1.
|
|
|
It is tempting to believe that the "directional derivative" is the appropri-
|
|
|
|
ate generalization of the notion of "derivative," and to say that f is differen-
|
|
|
|
tiable at a if f′(a; u) exists for every u ≠ 0. This would not, however, be a
|
|
|
|
very useful definition of differentiability. It would not follow, for instance, that
|
|
|
|
differentiability implies continuity. (See Example 3 following.) Nor would it
|
|
|
|
follow that composites of differentiable functions are differentiable. (See the
|
|
|
|
exercises of § 7.) So we seek something stronger.
|
|
|
|
In order to motivate our eventual definition, let us reformulate the defi-
|
|
|
|
nition of differentiability in the single-variable case as follows:
|
|
|
|
Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of a. We say that φ is differentiable at a if there is a number λ such that

[φ(a + t) − φ(a) − λt]/t → 0 as t → 0.

The number λ, which is unique, is called the derivative of φ at a, and denoted φ′(a).

This formulation of the definition makes explicit the fact that if φ is differentiable, then the linear function λt is a good approximation to the "increment function" φ(a + t) − φ(a); we often call λt the "first-order approximation" or the "linear approximation" to the increment function.
|
|
Let us generalize this version of the definition. If A ⊂ Rm and if f : A → Rn, what might we mean by a "first-order" or "linear" approximation to the increment function f(a + h) − f(a)? The natural thing to do is to take a function that is linear in the sense of linear algebra. This idea leads to the following definition:
|
|
|
|
Definition. Let A ⊂ Rm; let f : A → Rn. Suppose A contains a neighborhood of a. We say that f is differentiable at a if there is an n by m matrix B such that

[f(a + h) − f(a) − B·h]/|h| → 0 as h → 0.

The matrix B, which is unique, is called the derivative of f at a; it is denoted Df(a).

Note that the quotient of which we are taking the limit is defined for h in some deleted neighborhood of 0, since the domain of f contains a neighborhood of a. Use of the sup norm in the denominator is not essential; one obtains an equivalent definition if one replaces |h| by ||h||.
|
|
It is easy to see that B is unique. Suppose C is another matrix satisfying this condition. Subtracting, we have

(C − B)·h / |h| → 0

as h → 0. Let u be a fixed vector; set h = tu; let t → 0. It follows that (C − B)·u = 0. Since u is arbitrary, C = B.
|
|
|
|
EXAMPLE 2. Let f : Rm → Rn be defined by the equation

f(x) = B·x + b,

where B is an n by m matrix, and b ∈ Rn. Then f is differentiable and Df(x) = B. Indeed, since

f(a + h) − f(a) = B·h,

the quotient used in defining the derivative vanishes identically.
|
|
|
|
|
|
|
|
We now show that this definition is stronger than the tentative one we gave earlier, and that it is indeed a "suitable" definition of differentiability. Specifically, we verify the following facts, in this section and those following:
|
|
(1) Differentiable functions are continuous.
|
|
(2) Composites of differentiable functions are differentiable.
(3) Differentiability of f at a implies the existence of all the directional derivatives of f at a.
|
|
We also show how to compute the derivative when it exists.
|
|
|
|
Theorem 5.1. Let A ⊂ Rm; let f : A → Rn. If f is differentiable at a, then all the directional derivatives of f at a exist, and

f′(a; u) = Df(a)·u.
|
|
|
|
Proof. Let B = Df(a). Set h = tu in the definition of differentiability, where t ≠ 0. Then by hypothesis,

(*)  [f(a + tu) − f(a) − B·tu]/|tu| → 0

as t → 0. If t approaches 0 through positive values, we multiply (*) by |u| to conclude that

[f(a + tu) − f(a)]/t − B·u → 0

as t → 0, as desired. If t approaches 0 through negative values, we multiply (*) by −|u| to reach the same conclusion. Thus f′(a; u) = B·u. □
|
|
|
|
EXAMPLE 3. Define f : R2 → R by setting f(0) = 0 and

f(x, y) = x^2 y/(x^4 + y^2) if (x, y) ≠ 0.

We show all directional derivatives of f exist at 0, but that f is not differentiable at 0. Let u ≠ 0. Then

[f(0 + tu) − f(0)]/t = [(th)^2 (tk)/((th)^4 + (tk)^2)]·(1/t)  if u = (h, k),

so that

f′(0; u) = h^2/k if k ≠ 0, and f′(0; u) = 0 if k = 0.

Thus f′(0; u) exists for all u ≠ 0. However, the function f is not differentiable at 0. For if g : R2 → R is a function that is differentiable at 0, then Dg(0) is a 1 by 2 matrix of the form [a b], and

g′(0; u) = ah + bk,

which is a linear function of u. But f′(0; u) is not a linear function of u.

The function f is particularly interesting. It is differentiable (and hence continuous) on each straight line through the origin. (In fact, on the straight line y = mx, it has the value mx/(m^2 + x^2).) But f is not differentiable at the origin; in fact, f is not even continuous at the origin! For f has value 0 at the origin, while arbitrarily near the origin are points of the form (t, t^2), at which f has value 1/2. See Figure 5.1.
|
|
|
|
Figure 5.1
|
|
Theorem 5.2. Let A ⊂ Rm; let f : A → Rn. If f is differentiable at a, then f is continuous at a.

Proof. Let B = Df(a). For h near 0 but different from 0, write

f(a + h) − f(a) = |h|·[ (f(a + h) − f(a) − B·h)/|h| ] + B·h.

By hypothesis, the expression in brackets approaches 0 as h approaches 0. Then, by our basic theorems on limits,

lim_{h→0} [f(a + h) − f(a)] = 0.

Thus f is continuous at a. □

We shall deal with composites of differentiable functions in §7.
|
|
Now we show how to calculate Df(a), provided it exists. We first introduce the notion of the "partial derivatives" of a real-valued function.

Definition. Let A ⊂ Rm; let f : A → R. We define the jth partial derivative of f at a to be the directional derivative of f at a with respect to the vector ej, provided this derivative exists; and we denote it by Djf(a). That is,

Djf(a) = lim_{t→0} [f(a + t ej) − f(a)]/t.

Partial derivatives are usually easy to calculate. Indeed, if we set

φ(t) = f(a1, ..., aj−1, t, aj+1, ..., am),

then the jth partial derivative of f at a equals, by definition, simply the ordinary derivative of the function φ at the point t = aj. Thus the partial derivative Djf can be calculated by treating x1, ..., xj−1, xj+1, ..., xm as constants, and differentiating the resulting function with respect to xj, using the familiar differentiation rules for functions of a single variable.
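For instance (an added example, not from the text), for the function f(x, y) = xy^2 + sin(xy) one computes:

```latex
% Hold y fixed and differentiate in x, then hold x fixed and differentiate in y:
\[
  D_1 f(x,y) = y^2 + y\cos(xy),
  \qquad
  D_2 f(x,y) = 2xy + x\cos(xy).
\]
```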
|
|
We begin by calculating the derivative Df in the case where f is a real-valued function.
|
|
Theorem 5.3. Let A ⊂ Rm; let f : A → R. If f is differentiable at a, then

Df(a) = [D1f(a)  D2f(a)  ···  Dmf(a)].

That is, if Df(a) exists, it is the row matrix whose entries are the partial derivatives of f at a.

Proof. By hypothesis, Df(a) exists and is a matrix of size 1 by m. Let

Df(a) = [λ1  λ2  ···  λm].

It follows (using Theorem 5.1) that

Djf(a) = f′(a; ej) = Df(a)·ej = λj. □
We generalize this theorem as follows:
|
|
Theorem 5.4. Let A ⊂ Rm; let f : A → Rn. Suppose A contains a neighborhood of a. Let fi : A → R be the ith component function of f, so that

f(x) = [ f1(x) ]
       [  ...  ]
       [ fn(x) ].

(a) The function f is differentiable at a if and only if each component function fi is differentiable at a.
(b) If f is differentiable at a, then its derivative is the n by m matrix whose ith row is the derivative of the function fi.

This theorem tells us that

Df(a) = [ Df1(a) ]
        [   ...  ]
        [ Dfn(a) ],

so that Df(a) is the matrix whose entry in row i and column j is Djfi(a).
|
|
Proof. Let B be an arbitrary n by m matrix. Consider the function

F(h) = [f(a + h) − f(a) − B·h]/|h|,

which is defined for 0 < |h| < ε (for some ε). Now F(h) is a column matrix of size n by 1. Its ith entry satisfies the equation

Fi(h) = [fi(a + h) − fi(a) − (row i of B)·h]/|h|.

Let h approach 0. Then the matrix F(h) approaches 0 if and only if each of its entries approaches 0. Hence if B is a matrix for which F(h) → 0, then the ith row of B is a matrix for which Fi(h) → 0. And conversely. The theorem follows. □
|
|
|
|
Let A ⊂ Rm and f : A → Rn. If the partial derivatives of the component functions fi of f exist at a, then one can form the matrix that has Djfi(a) as its entry in row i and column j. This matrix is called the Jacobian matrix of f. If f is differentiable at a, this matrix equals Df(a). However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist, without it following that f is differentiable at a. (See Example 3 preceding.)
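As an added illustration, consider the map f : R2 → R3 with component functions f1(x, y) = x^2 y, f2(x, y) = x + y, f3(x, y) = sin y. Its Jacobian matrix is the 3 by 2 matrix

```latex
% Row i holds the partials of f_i; column j differentiates in the j-th variable.
\[
  \begin{bmatrix}
    D_1 f_1 & D_2 f_1 \\
    D_1 f_2 & D_2 f_2 \\
    D_1 f_3 & D_2 f_3
  \end{bmatrix}
  =
  \begin{bmatrix}
    2xy & x^2 \\
    1   & 1   \\
    0   & \cos y
  \end{bmatrix}.
\]
```

Since these partials are continuous, the theorem of the next section will show that this matrix is in fact Df(x, y).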
|
|
This fact leaves us in something of a quandary. We have no convenient way at present for determining whether or not a function is differentiable (other than going back to the definition). We know that such familiar functions as

sin(xy)  and  xy^2 + z e^(xy)

have partial derivatives, for that fact is a consequence of familiar theorems from single-variable analysis. But we do not know they are differentiable.

We shall deal with this problem in the next section.
|
|
|
|
|
|
|
REMARK. If m = 1 or n = 1, our definition of the derivative is simply a reformulation, in matrix notation, of concepts familiar from calculus. For instance, if f : R1 → R3 is a differentiable function, its derivative is the column matrix

Df(t) = [ f1′(t) ]
        [ f2′(t) ]
        [ f3′(t) ].

In calculus, f is often interpreted as a parametrized curve, and the vector

f1′(t) e1 + f2′(t) e2 + f3′(t) e3

is called the velocity vector of the curve. (Of course, in calculus one is apt to use i, j, and k for the unit basis vectors in R3 rather than e1, e2, and e3.)

For another example, consider a differentiable function g : R3 → R1. Its derivative is the row matrix

Dg(x) = [D1g(x)  D2g(x)  D3g(x)],

and the directional derivative equals the matrix product Dg(x)·u. In calculus, the function g is often interpreted as a scalar field, and the vector field

(D1g) e1 + (D2g) e2 + (D3g) e3

is called the gradient of g. (It is often denoted by the symbol ∇g.) The directional derivative of g with respect to u is written in calculus as the dot product of the vectors grad g and u.
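A small added example of this correspondence: for the scalar field g(x, y, z) = x^2 y + z,

```latex
% The derivative (a row matrix), and the directional derivative as a dot product:
\[
  Dg(x,y,z) = [\,2xy \;\; x^2 \;\; 1\,],
  \qquad
  Dg(x,y,z)\cdot u = 2xy\,u_1 + x^2 u_2 + u_3 = (\operatorname{grad} g)\cdot u .
\]
```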
|
|
Note that vector notation is adequate for dealing with the derivative of / when either the domain or the range of f has dimension 1. For a general function / : Rm -+ Rn, matrix notation is needed.
|
|
EXERCISES
|
|
1. Let A ⊂ Rm; let f : A → Rn. Show that if f′(a; u) exists, then f′(a; cu) exists and equals cf′(a; u).
2. Let f : R2 → R be defined by setting f(0) = 0 and

f(x, y) = xy/(x^2 + y^2) if (x, y) ≠ 0.

(a) For which vectors u ≠ 0 does f′(0; u) exist? Evaluate it when it exists.
(b) Do D1f and D2f exist at 0?
(c) Is f differentiable at 0?
(d) Is f continuous at 0?

3. Repeat Exercise 2 for the function f defined by setting f(0) = 0 and

f(x, y) = x^2 y^2/(x^2 y^2 + (y − x)^2) if (x, y) ≠ 0.

4. Repeat Exercise 2 for the function f defined by setting f(0) = 0 and

5. Repeat Exercise 2 for the function f(x, y) = |x| + |y|.
6. Repeat Exercise 2 for the function

7. Repeat Exercise 2 for the function f defined by setting f(0) = 0 and

f(x, y) = x|y|/(x^2 + y^2)^(1/2) if (x, y) ≠ 0.
|
|
|
|
§6. CONTINUOUSLY DIFFERENTIABLE FUNCTIONS
|
|
In this section, we obtain a useful criterion for differentiability. We know that mere existence of the partial derivatives does not imply differentiability. If, however, we impose the (comparatively mild) additional condition that these partial derivatives be continuous, then differentiability is assured.
|
|
We begin by recalling the mean-value theorem of single-variable analysis:
|
|
Theorem 6.1 (Mean-value theorem). If φ : [a, b] → R is continuous at each point of the closed interval [a, b], and differentiable at each point of the open interval (a, b), then there exists a point c of (a, b) such that

φ(b) − φ(a) = φ′(c)(b − a). □

In practice, we most often apply this theorem when φ is differentiable on an open interval containing [a, b]. In this case, of course, φ is continuous on [a, b].
|
|
|
|
|
|
|
|
Theorem 6.2. Let A be open in Rm; let f : A → Rn. Suppose that the partial derivatives Djfi(x) of the component functions of f exist at each point x of A and are continuous on A. Then f is differentiable at each point of A.
|
|
|
|
A function satisfying the hypotheses of this theorem is often said to be
|
|
continuously differentiable, or of class C 1, on A.
|
|
|
|
Proof. In view of Theorem 5.4, it suffices to prove that each component function of f is differentiable. Therefore we may restrict ourselves to the case of a real-valued function f : A → R.

Let a be a point of A. We are given that, for some ε, the partial derivatives Djf(x) exist and are continuous for |x − a| < ε. We wish to show that f is differentiable at a.

Step 1. Let h be a point of Rm with 0 < |h| < ε; let h1, ..., hm be the components of h. Consider the following sequence of points of Rm:

p0 = a,
p1 = a + h1e1,
p2 = a + h1e1 + h2e2,
...
pm = a + h1e1 + ··· + hmem = a + h.

The points pj all belong to the (closed) cube C of radius |h| centered at a. Figure 6.1 illustrates the case where m = 3 and all hi are positive.
|
|
|
|
Figure 6.1
|
|
Since we are concerned with the differentiability of f, we shall need to deal with the difference f(a + h) − f(a). We begin by writing it in the form

(*)  f(a + h) − f(a) = Σ_{j=1}^{m} [f(pj) − f(pj−1)].

Consider the general term of this summation. Let j be fixed, and define

φ(t) = f(pj−1 + t ej).

Assume hj ≠ 0 for the moment. As t ranges over the closed interval I with end points 0 and hj, the point pj−1 + t ej ranges over the line segment from pj−1 to pj; this line segment lies in C, and hence in A. Thus φ is defined for t in an open interval about I.

As t varies, only the jth component of the point pj−1 + t ej varies. Hence because Djf exists at each point of A, the function φ is differentiable on an open interval containing I. Applying the mean-value theorem to φ, we conclude that

φ(hj) − φ(0) = φ′(cj) hj

for some point cj between 0 and hj. (This argument applies whether hj is positive or negative.) We can rewrite this equation in the form

(**)  f(pj) − f(pj−1) = Djf(qj) hj,

where qj is the point pj−1 + cj ej of the line segment from pj−1 to pj, which lies in C.

We derived (**) under the assumption that hj ≠ 0. If hj = 0, then (**) holds automatically, for any point qj of C.
|
|
Using (**), we rewrite (*) in the form

(***)  f(a + h) − f(a) = Σ_{j=1}^{m} Djf(qj) hj,

where each point qj lies in the cube C of radius |h| centered at a.
|
|
Step 2. We prove the theorem. Let B be the matrix

B = [D1f(a)  ···  Dmf(a)].

Then

B·h = Σ_{j=1}^{m} Djf(a) hj.

Using (***), we have

[f(a + h) − f(a) − B·h]/|h| = Σ_{j=1}^{m} [Djf(qj) − Djf(a)]·(hj/|h|);

then we let h → 0. Since qj lies in the cube C of radius |h| centered at a, we have qj → a. Since the partials of f are continuous at a, the factors in brackets all go to zero. The factors hj/|h| are of course bounded in absolute value by 1. Hence the entire expression goes to zero, as desired. □
|
|
|
|
One effect of this theorem is to reassure us that the functions familiar to us from calculus are in fact differentiable. We know how to compute the partial derivatives of such functions as sin(xy) and xy^2 + z e^(xy), and we know that these partials are continuous. Therefore these functions are differentiable.
|
|
In practice, we usually deal only with functions that are of class C1. While
|
|
it is interesting to know there are functions that are differentiable but not of
|
|
class C 1 , such functions occur rarely enough that we need not be concerned
|
|
with them.
|
|
Suppose f is a function mapping an open set A of Rm into Rn, and suppose the partial derivatives Djfi of the component functions of f exist on A. These then are functions from A to R, and we may consider their partial derivatives, which have the form Dk(Djfi) and are called the second-order partial derivatives of f. Similarly, one defines the third-order partial derivatives of the functions fi, or more generally the partial derivatives of order r for arbitrary r.

If the partial derivatives of the functions fi of order less than or equal to r are continuous on A, we say f is of class Cr on A. Then the function f is of class Cr on A if and only if each function Djfi is of class C^(r−1) on A. We say f is of class C∞ on A if the partials of the functions fi of all orders are continuous on A.
|
|
As you may recall, for most functions the "mixed" partial derivatives

DkDjf and DjDkf

are equal. This result in fact holds under the hypothesis that the function f is of class C2, as we now show.
|
|
Theorem 6.3. Let A be open in Rm; let f : A → R be a function of class C2. Then for each a ∈ A,

DkDjf(a) = DjDkf(a).

Proof. Since one calculates the partial derivatives in question by letting all variables other than xk and xj remain constant, it suffices to consider the case where f is a function merely of two variables. So we assume that A is open in R2, and that f : A → R is of class C2.
|
|
Step 1. We first prove a certain "second-order" mean-value theorem for f. Let

Q = [a, a + h] × [b, b + k]

be a rectangle contained in A. Define

λ(h, k) = f(a, b) − f(a + h, b) − f(a, b + k) + f(a + h, b + k).

Then λ is the sum, with appropriate signs, of the values of f at the four vertices of Q. See Figure 6.2. We show that there are points p and q of Q such that

λ(h, k) = D2D1f(p)·hk, and
λ(h, k) = D1D2f(q)·hk.
|
|
|
|
Figure 6.2
|
|
|
|
By symmetry, it suffices to prove the first of these equations. To begin,
|
|
|
|
we define
|
|
|
|
<f,(s) = f(s, b + k) - f(s, b).
|
|
|
|
Then </>(a+ h)- </>(a)= >.(h, k), as you can check. Because D 1/ exists in A,
|
|
the function </> is differentiable in an open interval containing [a, a + h]. The
|
|
mean-value theorem implies that
|
|
|
|
</>(a + h) - <f,(a) = </>'(so) •h
|
|
|
|
for some So between a and a+ h. This equation can be rewritten in the form
|
|
|
|
Now So is fixed, and we consider the function D1/(so, t). Because D2D1f exists in A, this function is differentiable for t in an open interval about
|
|
[b, b+ k]. We apply the mean-value theorem once more to conclude that
|
|
|
|
54 Differentiation
|
|
|
|
Chapter 2
|
|
|
|
for some to between b and b + k. Combining (*) and (**) gives our desired
|
|
result.
|
|
Step 2. We prove the theorem. Given the point a = (a, b) of A and given t > 0, let Qt be the rectangle

Qt = [a, a + t] × [b, b + t].

If t is sufficiently small, Qt is contained in A; then Step 1 implies that

λ(t, t) = D2D1f(pt)·t^2

for some point pt in Qt. If we let t → 0, then pt → a. Because D2D1f is continuous, it follows that

D2D1f(a) = lim_{t→0} λ(t, t)/t^2.

A similar argument, using the other equation from Step 1, implies that

D1D2f(a) = lim_{t→0} λ(t, t)/t^2.

The theorem follows. □
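A routine added check of the theorem on a particular C² function, f(x, y) = x³y² + sin(xy):

```latex
% Differentiating in either order gives the same mixed partial:
\[
  D_1 f = 3x^2y^2 + y\cos(xy), \qquad
  D_2 D_1 f = 6x^2y + \cos(xy) - xy\sin(xy),
\]
\[
  D_2 f = 2x^3y + x\cos(xy), \qquad
  D_1 D_2 f = 6x^2y + \cos(xy) - xy\sin(xy).
\]
```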
|
|
EXERCISES
|
|
1. Show that the function f(x, y) = |xy| is differentiable at 0, but is not of class C1 in any neighborhood of 0.
2. Define f : R → R by setting f(0) = 0, and

f(t) = t^2 sin(1/t) if t ≠ 0.

(a) Show f is differentiable at 0, and calculate f′(0).
(b) Calculate f′(t) if t ≠ 0.
(c) Show f′ is not continuous at 0.
(d) Conclude that f is differentiable on R but not of class C1 on R.
3. Show that the proof of Theorem 6.2 goes through if we assume merely that the partials Djf exist in a neighborhood of a and are continuous at a.
4. Show that if A ⊂ Rm and f : A → R, and if the partials Djf exist and are bounded in a neighborhood of a, then f is continuous at a.
5. Let f : R2 → R2 be defined by the equation

f(r, θ) = (r cos θ, r sin θ).

It is called the polar coordinate transformation.
|
|
|
|
|
|
|
|
(a) Calculate Df and det Df.
(b) Sketch the image under f of the set S = [1, 2] × [0, π]. [Hint: Find the images under f of the line segments that bound S.]
6. Repeat Exercise 5 for the function f : R2 → R2 given by

f(x, y) = (x^2 − y^2, 2xy).

Take S to be the set

S = {(x, y) : x^2 + y^2 ≤ a^2 and x ≥ 0 and y ≥ 0}.

[Hint: Parametrize part of the boundary of S by setting x = a cos t and y = a sin t; find the image of this curve. Proceed similarly for the rest of the boundary of S.]
We remark that if one identifies the complex numbers C with R2 in the usual way, then f is just the function f(z) = z^2.
7. Repeat Exercise 5 for the function f : R2 → R2 given by

f(x, y) = (e^x cos y, e^x sin y).

Take S to be the set S = [0, 1] × [0, π].
We remark that if one identifies C with R2 as usual, then f is the function f(z) = e^z.
|
|
8. Repeat Exercise 5 for the function f : R3 → R3 given by

f(ρ, φ, θ) = (ρ cos θ sin φ, ρ sin θ sin φ, ρ cos φ).

It is called the spherical coordinate transformation. Take S to be the set

S = [1, 2] × [0, π/2] × [0, π/2].
|
|
|
|
9. Let g : R → R be a function of class C2. Show that

lim_{h→0} [g(a + h) − 2g(a) + g(a − h)]/h^2 = g″(a).

[Hint: Consider Step 1 of Theorem 6.3 in the case f(x, y) = g(x + y).]
|
|
|
|
*10. Define f : R2 → R by setting f(0) = 0, and

f(x, y) = xy(x^2 − y^2)/(x^2 + y^2) if (x, y) ≠ 0.

(a) Show D1f and D2f exist at 0.
(b) Calculate D1f and D2f at (x, y) ≠ 0.
(c) Show f is of class C1 on R2. [Hint: Show D1f(x, y) equals the product of y and a bounded function, and D2f(x, y) equals the product of x and a bounded function.]
(d) Show that D2D1f and D1D2f exist at 0, but are not equal there.
|
|
|
|
§7. THE CHAIN RULE
|
|
|
|
In this section we show that the composite of two differentiable functions is differentiable, and we derive a formula for its derivative. This formula is commonly called the "chain rule."
|
|
Theorem 7.1. Let A ⊂ Rm; let B ⊂ Rn. Let

f : A → Rn and g : B → Rp,

with f(A) ⊂ B. Suppose f(a) = b. If f is differentiable at a, and if g is differentiable at b, then the composite function g∘f is differentiable at a. Furthermore,

D(g∘f)(a) = Dg(b)·Df(a),

where the indicated product is matrix multiplication.
|
|
Although this version of the chain rule may look a bit strange, it is really just the familiar chain rule of calculus in a new guise. You can convince yourself of this fact by writing the formula out in terms of partial derivatives. We shall return to this matter later.
|
|
Proof. For convenience, let x denote the general point of Rm, and let y denote the general point of Rn.
|
|
By hypothesis, g is defined in a neighborhood of b; choose ε so that g(y) is defined for |y − b| < ε. Similarly, since f is defined in a neighborhood of a and is continuous at a, we can choose δ so that f(x) is defined and satisfies the condition |f(x) − b| < ε, for |x − a| < δ. Then the composite function (g∘f)(x) = g(f(x)) is defined for |x − a| < δ. See Figure 7.1.
|
|
|
|
Figure 7.1

Step 1. Throughout, let Δ(h) denote the function

Δ(h) = f(a + h) − f(a),

which is defined for |h| < δ. First, we show that the quotient |Δ(h)|/|h| is bounded for h in some deleted neighborhood of 0. For this purpose, let us introduce the function F(h) defined by setting F(0) = 0 and

F(h) = [Δ(h) − Df(a)·h]/|h| for 0 < |h| < δ.

Because f is differentiable at a, the function F is continuous at 0. Furthermore, one has the equation

(*)  Δ(h) = Df(a)·h + |h|F(h)

for 0 < |h| < δ, and also for h = 0 (trivially). The triangle inequality implies that

|Δ(h)| ≤ m|Df(a)| |h| + |h| |F(h)|.

Now |F(h)| is bounded for h in a neighborhood of 0; in fact, it approaches 0 as h approaches 0. Therefore |Δ(h)|/|h| is bounded on a deleted neighborhood of 0.
|
|
Step 2. We repeat the construction of Step 1 for the function g. We define a function G(k) by setting G(0) = 0 and

G(k) = [g(b + k) − g(b) − Dg(b)·k]/|k| for 0 < |k| < ε.

Because g is differentiable at b, the function G is continuous at 0. Furthermore, for |k| < ε, G satisfies the equation

(**)  g(b + k) − g(b) = Dg(b)·k + |k|G(k).
|
|
Step 3. We prove the theorem. Let h be any point of Rm with |h| < δ. Then |Δ(h)| < ε, so we may substitute Δ(h) for k in formula (**). After this substitution, b + k becomes

b + Δ(h) = f(a) + Δ(h) = f(a + h),

so formula (**) takes the form

g(f(a + h)) − g(f(a)) = Dg(b)·Δ(h) + |Δ(h)| G(Δ(h)).

Now we use (*) to rewrite this equation in the form

[g(f(a + h)) − g(f(a)) − Dg(b)·Df(a)·h]/|h| = Dg(b)·F(h) + (|Δ(h)|/|h|)·G(Δ(h)).

This equation holds for 0 < |h| < δ. In order to show that g∘f is differentiable at a with derivative Dg(b)·Df(a), it suffices to show that the right side of this equation goes to zero as h approaches 0.

The matrix Dg(b) is constant, while F(h) → 0 as h → 0 (because F is continuous at 0 and vanishes there). The factor G(Δ(h)) also approaches zero as h → 0; for it is the composite of two functions G and Δ, both of which are continuous at 0 and vanish there. Finally, |Δ(h)|/|h| is bounded in a deleted neighborhood of 0, by Step 1. The theorem follows. □
|
|
|
|
Here is an immediate consequence:
|
|
Corollary 7.2. Let A be open in Rm; let B be open in Rn. Let

f : A → Rn and g : B → Rp,

with f(A) ⊂ B. If f and g are of class Cr, so is the composite function g∘f.
|
Proof. The chain rule gives us the formula

D(g∘f)(x) = Dg(f(x))·Df(x),

which holds for x ∈ A.

Suppose first that f and g are of class C1. Then the entries of Dg are continuous real-valued functions defined on B; because f is continuous on A, the composite function Dg(f(x)) is also continuous on A. Similarly, the entries of the matrix Df(x) are continuous on A. Because the entries of the matrix product are algebraic functions of the entries of the matrices involved, the entries of the product Dg(f(x))·Df(x) are also continuous on A. Then g∘f is of class C1 on A.

To prove the general case, we proceed by induction. Suppose the theorem is true for functions of class C^(r−1). Let f and g be of class Cr. Then the entries of Dg are real-valued functions of class C^(r−1) on B. Now f is of class C^(r−1) on A (being in fact of class Cr); hence the induction hypothesis implies that the function Djgi(f(x)), which is a composite of two functions of class C^(r−1), is of class C^(r−1). Since the entries of the matrix Df(x) are also of class C^(r−1) on A by hypothesis, the entries of the product Dg(f(x))·Df(x) are of class C^(r−1) on A. Hence g∘f is of class Cr on A, as desired.

The theorem follows for r finite. If now f and g are of class C∞, then they are of class Cr for every r, whence g∘f is also of class Cr for every r. □
|
|
|
|
As another application of the chain rule, we generalize the mean-value theorem of single-variable analysis to real-valued functions defined in Rm. We will use this theorem in the next section.
|
|
Theorem 7.3 (Mean-value theorem). Let A be open in Rm; let f : A → R be differentiable on A. If A contains the line segment with end points a and a + h, then there is a point c = a + t0h with 0 < t0 < 1 of this line segment such that

f(a + h) − f(a) = Df(c)·h.
|
|
Proof. Set φ(t) = f(a + th); then φ is defined for t in an open interval about [0, 1]. Being the composite of differentiable functions, φ is differentiable; its derivative is given by the formula

φ′(t) = Df(a + th)·h.

The ordinary mean-value theorem implies that

φ(1) − φ(0) = φ′(t0)·1

for some t0 with 0 < t0 < 1. This equation can be rewritten in the form

f(a + h) − f(a) = Df(a + t0h)·h. □
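A worked instance of the theorem (added for illustration): let f(x, y) = x² + y², a = (0, 0), and h = (1, 1).

```latex
% Here Df(x,y) = [2x  2y], and the point c = a + t_0 h = (t_0, t_0) gives
\[
  f(a+h) - f(a) = 2,
  \qquad
  Df(c)\cdot h = [\,2t_0 \;\; 2t_0\,]\begin{bmatrix}1\\ 1\end{bmatrix} = 4t_0 ,
\]
% so the conclusion holds with t_0 = 1/2, which indeed lies in (0,1).
```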
|
|
|
|
As yet another application of the chain rule, we consider the problem of differentiating an inverse function.
|
|
Recall the situation that occurs in single-variable analysis. Suppose φ(x) is differentiable on an open interval, with φ′(x) > 0 on that interval. Then φ is strictly increasing and has an inverse function ψ, which is defined by letting ψ(y) be that unique number x such that φ(x) = y. The function ψ is in fact differentiable, and its derivative satisfies the equation

ψ′(y) = 1/φ′(x),

where y = φ(x).

There is a similar formula for differentiating the inverse of a function f of several variables. In the present section, we do not consider the question whether the function f has an inverse, or whether that inverse is differentiable. We consider only the problem of finding the derivative of the inverse function.

Theorem 7.4. Let A be open in Rn; let f : A → Rn; let f(a) = b. Suppose that g maps a neighborhood of b into Rn, that g(b) = a, and

g(f(x)) = x

for all x in a neighborhood of a. If f is differentiable at a and if g is differentiable at b, then

Dg(b) = [Df(a)]^(-1).

Proof. Let i : Rn → Rn be the identity function; its derivative is the identity matrix In. We are given that

g(f(x)) = i(x)

for all x in a neighborhood of a. The chain rule implies that

Dg(b)·Df(a) = In.

Thus Dg(b) is the inverse matrix to Df(a) (see Theorem 2.5). □
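The simplest illustration (an added one) comes from Example 2 of §5: let f(x) = B·x + b with B a non-singular n by n matrix, so that g(y) = B⁻¹·(y − b) is its inverse.

```latex
% Both maps are affine, so Example 2 gives their derivatives directly,
% and the conclusion of Theorem 7.4 is visible at every point:
\[
  Df(x) = B, \qquad Dg(y) = B^{-1} = [\,Df(x)\,]^{-1}
  \qquad \text{whenever } y = f(x).
\]
```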
|
|
The preceding theorem implies that if a differentiable function f is to have a differentiable inverse, it is necessary that the matrix Df be non-singular. It is a somewhat surprising fact that this condition is also sufficient for a function f of class C1 to have an inverse, at least locally. We shall prove this
|
|
fact in the next section.
|
|
REMARK. Let us make a comment on notation. The usefulness of well-chosen notation can hardly be overemphasized. Arguments that are obscure, and formulas that are complicated, sometimes become beautifully simple once the proper notation is chosen. Our use of matrix notation for the derivative is a case in point. The formulas for the derivatives of a composite function and an inverse function could hardly be simpler.
|
|
Nevertheless, a word may be in order for those who remember the notation used in calculus for partial derivatives, and the version of the chain rule proved there.

In advanced mathematics, it is usual to use either the functional notation φ′ or the operator notation Dφ for the derivative of a real-valued function of a real variable. (Dφ denotes a 1 by 1 matrix in this case!) In calculus, however, another notation is common. One often denotes the derivative φ′(x) by the symbol dφ/dx, or, introducing the "variable" y by setting y = φ(x), by the symbol dy/dx. This notation was introduced by Leibnitz, one of the originators of calculus. It comes from the time when the focus of every physical and mathematical problem was on the variables involved, and when functions as such were hardly even thought about.

The Leibnitz notation has some familiar virtues. For one thing, it makes the chain rule easy to remember. Given functions φ : R → R and ψ : R → R, the derivative of the composite function ψ∘φ is given by the formula

D(ψ∘φ)(x) = Dψ(φ(x))·Dφ(x).

If we introduce variables by setting y = φ(x) and z = ψ(y), then the derivative of the composite function z = ψ(φ(x)) can be expressed in the Leibnitz notation by the formula

dz/dx = (dz/dy)·(dy/dx).
|
|
|
|
The latter formula is easy to remember because it looks like the formula for multiplying fractions! However, this notation has its ambiguities. The letter "z," when it appears on the left side of this equation, denotes one function (a function of x); and when it appears on the right side, it denotes a different function (a function of y). This can lead to difficulties when it comes to computing higher derivatives unless one is very careful.
|
|
The formula for the derivative of an inverse function is also easy to remember. If y = φ(x) has the inverse function x = ψ(y), then the derivative of ψ is expressed in Leibnitz notation by the equation

dx/dy = 1/(dy/dx),
|
|
|
|
which looks like the formula for the reciprocal of a fraction! The Leibnitz notation can easily be extended to functions of several variables. If A ⊂ R^m and f : A → R, we often set

    y = f(x) = f(x_1, ..., x_m),

and denote the partial derivative D_i f by one of the symbols

    ∂f/∂x_i   or   ∂y/∂x_i.

The Leibnitz notation is not nearly as convenient in this situation. Consider the chain rule, for example. If

    f : R^m → R^n   and   g : R^n → R,

then the composite function F = g ∘ f maps R^m into R, and its derivative is given by the formula

    (*)   DF(x) = Dg(f(x)) · Df(x),
|
|
|
|
which can be written out in the form

    [D_1F(x) ··· D_mF(x)] = [D_1g(f(x)) ··· D_ng(f(x))] · [D_1f_1(x) ··· D_mf_1(x) ; ··· ; D_1f_n(x) ··· D_mf_n(x)].

The formula for the jth partial derivative of F is thus given by the equation

    D_jF(x) = Σ_{k=1}^{n} D_kg(f(x)) · D_jf_k(x).

If we shift to "variable" notation by setting y = f(x) and z = g(y), this equation becomes

    ∂z/∂x_j = Σ_{k=1}^{n} (∂z/∂y_k) · (∂y_k/∂x_j);

this is probably the version of the chain rule you learned in calculus. Only familiarity would suggest that it is easier to remember than (*)! Certainly one cannot obtain the formula for ∂z/∂x_j by a simple-minded multiplication of fractions, as in the single-variable case.
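As a small numerical aside (added here for illustration; the particular f and g are invented and NumPy is assumed), the coordinate form of the chain rule can be checked directly: form the hand-computed matrices Dg(f(x)) and Df(x) for one choice of f : R^2 → R^3 and g : R^3 → R, and compare their product with a finite-difference approximation of DF(x).

    import numpy as np

    def f(x1, x2):
        return np.array([x1 * x2, x1 + x2, np.sin(x1)])

    def g(y1, y2, y3):
        return y1**2 + y2 * y3

    def F(x1, x2):
        return g(*f(x1, x2))

    x = np.array([0.7, -0.4])
    y = f(*x)

    # Dg(f(x)) is the 1 by 3 row [2*y1, y3, y2]; Df(x) is 3 by 2.
    Dg = np.array([[2 * y[0], y[2], y[1]]])
    Df = np.array([[x[1], x[0]],
                   [1.0, 1.0],
                   [np.cos(x[0]), 0.0]])

    # Finite-difference approximation of DF(x) = [D1F(x)  D2F(x)].
    eps = 1e-6
    DF = np.array([[(F(x[0] + eps, x[1]) - F(x[0] - eps, x[1])) / (2 * eps),
                    (F(x[0], x[1] + eps) - F(x[0], x[1] - eps)) / (2 * eps)]])

    print(np.allclose(DF, Dg @ Df, atol=1e-5))   # prints True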
|
|
The formula for the derivative of an inverse function is even more troublesome. Suppose f : R^2 → R^2 is differentiable and has a differentiable inverse function g. The derivative of g is given by the formula

    Dg(y) = [Df(x)]^{-1},

where y = f(x). In Leibnitz notation, this formula takes the form

    [∂x_1/∂y_1  ∂x_1/∂y_2 ; ∂x_2/∂y_1  ∂x_2/∂y_2] = [∂y_1/∂x_1  ∂y_1/∂x_2 ; ∂y_2/∂x_1  ∂y_2/∂x_2]^{-1}.

Recalling the formula for the inverse of a matrix, we see that the partial derivative ∂x_i/∂y_j is about as far from being the reciprocal of the partial derivative ∂y_j/∂x_i as one could imagine!
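To make the last point concrete, here is a two-matrix computation added as an illustration (not in the text), assuming NumPy; the map y = f(x) is invented for the purpose. It prints the (1,1) entry of [Df(x)]^{-1}, namely ∂x_1/∂y_1, next to the reciprocal of the (1,1) entry of Df(x), namely 1/(∂y_1/∂x_1); the two values are visibly different.

    import numpy as np

    # y = f(x) with y1 = x1 + x2**2 and y2 = x1 * x2, at the point (2, 3)
    x1, x2 = 2.0, 3.0
    Df = np.array([[1.0, 2 * x2],     # [dy1/dx1  dy1/dx2]
                   [x2, x1]])         # [dy2/dx1  dy2/dx2]
    Dg = np.linalg.inv(Df)            # entries are the partials dx_i/dy_j

    print(Dg[0, 0])        # dx1/dy1 = -0.125
    print(1 / Df[0, 0])    # 1/(dy1/dx1) = 1.0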
|
|
|
|
|
|
|
|
EXERCISES
|
|
1. Let f : R^3 → R^2 satisfy the conditions f(0) = (1, 2) and

    Df(0) = [1  2  3 ; 0  0  1].

Let g : R^2 → R^2 be defined by the equation

    g(x, y) = (x + 2y + 1, 3xy).

Find D(g ∘ f)(0).

2. Let f : R^2 → R^3 and g : R^3 → R^2 be given by the equations
|
|
    f(x) = (e^{2x_1 + x_2}, 3x_2 - cos x_1, x_1^2 + x_2 + 2),

    g(y) = (3y_1 + 2y_2 + y_3^2, y_1^2 - y_3 + 1).
|
|
(a) If F(x) = g(f(x)), find DF(0). [Hint: Don't compute F explicitly.]
(b) If G(y) = f(g(y)), find DG(0).

3. Let f : R^3 → R and g : R^2 → R be differentiable. Let F : R^2 → R be defined by the equation

    F(x, y) = f(x, y, g(x, y)).

(a) Find DF in terms of the partials of f and g.
(b) If F(x, y) = 0 for all (x, y), find D_1g and D_2g in terms of the partials of f.

4. Let g : R^2 → R^2 be defined by the equation g(x, y) = (x, y + x^2). Let f : R^2 → R be the function defined in Example 3 of §5. Let h = f ∘ g. Show that the directional derivatives of f and g exist everywhere, but that there is a u ≠ 0 for which h'(0; u) does not exist.
|
|
|
|
§8. THE INVERSE FUNCTION THEOREM
|
|
Let A be open in R^n; let f : A → R^n be of class C^1. We know that for f to have a differentiable inverse, it is necessary that the derivative Df(x) of f be non-singular. We now prove that this condition is also sufficient for f to have a differentiable inverse, at least locally. This result is called the inverse function theorem.

We begin by showing that non-singularity of Df implies that f is locally one-to-one.
|
|
|
|
|
|
|
|
Lemma 8.1. Let A be open in R^n; let f : A → R^n be of class C^1. If Df(a) is non-singular, then there exists an α > 0 such that the inequality

    |f(x_0) - f(x_1)| ≥ α|x_0 - x_1|

holds for all x_0, x_1 in some open cube C(a; ε) centered at a. It follows that f is one-to-one on this open cube.
|
|
|
|
Proof. Let E = Df(a); then E is non-singular. We first consider the linear transformation that maps x to E·x. We compute

    |x_0 - x_1| = |E^{-1} · (E·x_0 - E·x_1)|
                ≤ n|E^{-1}| · |E·x_0 - E·x_1|.

If we set 2α = 1/(n|E^{-1}|), then for all x_0, x_1 in R^n,

    |E·x_0 - E·x_1| ≥ 2α|x_0 - x_1|.

Now we prove the lemma. Consider the function

    H(x) = f(x) - E·x.

Then DH(x) = Df(x) - E, so that DH(a) = 0. Because H is of class C^1, we can choose ε > 0 so that |DH(x)| < α/n for x in the open cube C = C(a; ε). The mean-value theorem, applied to the ith component function of H, tells us that, given x_0, x_1 ∈ C, there is a c ∈ C such that

    H_i(x_0) - H_i(x_1) = DH_i(c) · (x_0 - x_1).

Then for x_0, x_1 ∈ C, we have

    α|x_0 - x_1| ≥ |H(x_0) - H(x_1)|
                 = |f(x_0) - E·x_0 - f(x_1) + E·x_1|
                 ≥ |E·x_1 - E·x_0| - |f(x_1) - f(x_0)|
                 ≥ 2α|x_1 - x_0| - |f(x_1) - f(x_0)|.

The lemma follows. □
|
|
Now we show that non-singularity of Df, in the case where f is one-to-one, implies that the inverse function is differentiable.
|
|
|
|
|
|
|
|
Theorem 8.2. Let A be open in R^n; let f : A → R^n be of class C^r; let B = f(A). If f is one-to-one on A and if Df(x) is non-singular for x ∈ A, then the set B is open in R^n and the inverse function g : B → A is of class C^r.
|
|
|
|
Proof. Step 1. We prove the following elementary result: If φ : A → R is differentiable and if φ has a local minimum at x_0 ∈ A, then Dφ(x_0) = 0.

To say that φ has a local minimum at x_0 means that φ(x) ≥ φ(x_0) for all x in a neighborhood of x_0. Then given u ≠ 0,

    φ(x_0 + tu) - φ(x_0) ≥ 0

for all sufficiently small values of t. Therefore

    φ'(x_0; u) = lim_{t→0} [φ(x_0 + tu) - φ(x_0)]/t

is non-negative if t approaches 0 through positive values, and is non-positive if t approaches 0 through negative values. It follows that φ'(x_0; u) = 0. In particular, D_jφ(x_0) = 0 for all j, so that Dφ(x_0) = 0.

Step 2. We show that the set B is open in R^n. Given b ∈ B, we show B contains some open ball B(b; δ) about b.

We begin by choosing a rectangle Q lying in A whose interior contains the point a = f^{-1}(b) of A. The set Bd Q is compact, being closed and bounded in R^n. Then the set f(Bd Q) is also compact, and thus is closed and bounded in R^n. Because f is one-to-one, f(Bd Q) is disjoint from b; because f(Bd Q) is closed, we can choose δ > 0 so that the ball B(b; 2δ) is disjoint from f(Bd Q). Given c ∈ B(b; δ), we show that c = f(x) for some x ∈ Q; it then follows that the set f(A) = B contains each point of B(b; δ), as desired. See Figure 8.1.
|
|
|
|
Figure 8.1
|
|
|
|
|
|
|
|
Given c ∈ B(b; δ), consider the real-valued function

    φ(x) = ||f(x) - c||^2,

which is of class C^r. Because Q is compact, this function has a minimum value on Q; suppose that this minimum value occurs at the point x of Q. We show that f(x) = c.

Now the value of φ at the point a is

    φ(a) = ||f(a) - c||^2 = ||b - c||^2 < δ^2.

Hence the minimum value of φ on Q must be less than δ^2. It follows that this minimum value cannot occur on Bd Q, for if x ∈ Bd Q, the point f(x) lies outside the ball B(b; 2δ), so that ||f(x) - c|| > δ. Thus the minimum value of φ occurs at a point x of Int Q.

Because x is interior to Q, it follows that φ has a local minimum at x; then by Step 1, the derivative of φ vanishes at x. Since

    φ(x) = Σ_{k=1}^{n} (f_k(x) - c_k)^2,

we have

    D_jφ(x) = Σ_{k=1}^{n} 2(f_k(x) - c_k) · D_jf_k(x).

The equation Dφ(x) = 0 can be written in matrix form as

    2[(f_1(x) - c_1)  ···  (f_n(x) - c_n)] · Df(x) = 0.

Now Df(x) is non-singular, by hypothesis. Multiplying both sides of this equation on the right by the inverse of Df(x), we see that f(x) - c = 0, as desired.
|
|
Step 3. The function f : A → B is one-to-one by hypothesis; let g : B → A be the inverse function. We show g is continuous.

Continuity of g is equivalent to the statement that for each open set U of A, the set V = g^{-1}(U) is open in B. But V = f(U); and Step 2, applied to the set U, which is open in A and hence open in R^n, tells us that V is open in R^n and hence open in B. See Figure 8.2.
|
|
|
|
|
|
|
|
Figure 8.2
|
|
It is an interesting fact that the results of Steps 2 and 3 hold without assuming that Df(x) is non-singular, or even that f is differentiable. If A is open in R^n and f : A → R^n is continuous and one-to-one, then it is true that f(A) is open in R^n and the inverse function g is continuous. This result is known as the Brouwer theorem on invariance of domain. Its proof requires the tools of algebraic topology and is quite difficult. We have proved the differentiable version of this theorem.
|
|
|
|
Step 4. Given b ∈ B, we show that g is differentiable at b.

Let a be the point g(b), and let E = Df(a). We show that the function

    G(k) = [g(b + k) - g(b) - E^{-1}·k] / |k|,

which is defined for k in a deleted neighborhood of 0, approaches 0 as k approaches 0. Then g is differentiable at b with derivative E^{-1}.

Let us define

    Δ(k) = g(b + k) - g(b)

for k near 0. We first show that there is an ε > 0 such that |Δ(k)|/|k| is bounded for 0 < |k| < ε. (This would follow from differentiability of g, but that is what we are trying to prove!) By the preceding lemma, there is a neighborhood C of a and an α > 0 such that

    |f(x_0) - f(x_1)| ≥ α|x_0 - x_1|

for x_0, x_1 ∈ C. Now f(C) is a neighborhood of b, by Step 2; choose ε so that b + k is in f(C) whenever |k| < ε. Then for |k| < ε, we can set x_0 = g(b + k) and x_1 = g(b) and rewrite the preceding inequality in the form

    |(b + k) - b| ≥ α|g(b + k) - g(b)|,
|
|
|
|
|
|
|
|
which implies that

    1/α ≥ |Δ(k)|/|k|,

as desired.

Now we show that G(k) → 0 as k → 0. Let 0 < |k| < ε. We have

    G(k) = [Δ(k) - E^{-1}·k] / |k|    by definition,
         = -E^{-1} · [k - E·Δ(k)] / |k|
         = -E^{-1} · [ (k - E·Δ(k)) / |Δ(k)| ] · ( |Δ(k)| / |k| ).

(Here we use the fact that Δ(k) ≠ 0 for k ≠ 0, which follows from the fact that g is one-to-one.) Now E^{-1} is constant, and |Δ(k)|/|k| is bounded. It remains to show that the expression in brackets goes to zero. We have

    b + k = f(g(b + k)) = f(g(b) + Δ(k)) = f(a + Δ(k)).

Thus the expression in brackets equals

    [f(a + Δ(k)) - f(a) - E·Δ(k)] / |Δ(k)|.

Let k → 0. Then Δ(k) → 0 as well, because g is continuous. Since f is differentiable at a with derivative E, this expression goes to zero, as desired.
|
|
|
|
Step 5. Finally, we show the inverse function g is of class C^r. Because g is differentiable, Theorem 7.4 applies to show that its derivative is given by the formula

    Dg(y) = [Df(g(y))]^{-1},

for y ∈ B. The function Dg thus equals the composite of three functions:

    B --g--> A --Df--> GL(n) --I--> GL(n),

where GL(n) is the set of non-singular n by n matrices, and I is the function that maps each non-singular matrix to its inverse. Now the function I is given by a specific formula involving determinants. In fact, the entries of I(C) are rational functions of the entries of C; as such, they are C^∞ functions of the entries of C.

We proceed by induction on r. Suppose f is of class C^1. Then Df is continuous. Because g and I are also continuous (indeed, g is differentiable and I is of class C^∞), the composite function, which equals Dg, is also continuous. Hence g is of class C^1.

Suppose the theorem holds for functions of class C^{r-1}. Let f be of class C^r. Then in particular f is of class C^{r-1}, so that (by the induction hypothesis) the inverse function g is of class C^{r-1}. Furthermore, the function Df is of class C^{r-1}. We invoke Corollary 7.2 to conclude that the composite function, which equals Dg, is of class C^{r-1}. Then g is of class C^r. □
|
|
|
|
Finally, we prove the inverse function theorem.
|
|
|
|
|
|
|
|
Theorem 8.3 (The inverse function theorem). Let A be open in R^n; let f : A → R^n be of class C^r. If Df(x) is non-singular at the point a of A, there is a neighborhood U of the point a such that f carries U in a one-to-one fashion onto an open set V of R^n and the inverse function is of class C^r.

Proof. By Lemma 8.1, there is a neighborhood U_0 of a on which f is one-to-one. Because det Df(x) is a continuous function of x, and det Df(a) ≠ 0, there is a neighborhood U_1 of a such that det Df(x) ≠ 0 on U_1. If U equals the intersection of U_0 and U_1, then the hypotheses of the preceding theorem are satisfied for f : U → R^n. The theorem follows. □

This theorem is the strongest one that can be proved in general. While the non-singularity of Df on A implies that f is locally one-to-one at each point of A, it does not imply that f is one-to-one on all of A. Consider the following example:
|
|
|
|
EXAMPLE 1. Let f : R^2 → R^2 be defined by the equation

    f(r, θ) = (r cos θ, r sin θ).

Then

    Df(r, θ) = [cos θ   -r sin θ ; sin θ   r cos θ],

so that det Df(r, θ) = r.

Let A be the open set (0, 1) × (0, b) in the (r, θ) plane. Then Df is non-singular at each point of A. However, f is one-to-one on A only if b ≤ 2π. See Figures 8.3 and 8.4.

Figure 8.3
|
|
|
|
Figure 8.4
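A two-line check, added here only as an illustration (NumPy assumed), makes the failure of global injectivity concrete: the polar-coordinate map sends (1/2, π/4) and (1/2, π/4 + 2π) to the same point of R^2, so f is not one-to-one on any strip whose θ-width exceeds 2π.

    import numpy as np

    f = lambda r, t: (r * np.cos(t), r * np.sin(t))
    p, q = f(0.5, np.pi / 4), f(0.5, np.pi / 4 + 2 * np.pi)
    print(np.allclose(p, q))   # prints True: two distinct (r, theta) pairs, one image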
|
|
EXERCISES
|
|
1. Let f : R^2 → R^2 be defined by the equation

    f(x, y) = (x^2 - y^2, 2xy).

(a) Show that f is one-to-one on the set A consisting of all (x, y) with x > 0. [Hint: If f(x, y) = f(a, b), then ||f(x, y)|| = ||f(a, b)||.]
(b) What is the set B = f(A)?
(c) If g is the inverse function, find Dg(0, 1).

2. Let f : R^2 → R^2 be defined by the equation

    f(x, y) = (e^x cos y, e^x sin y).

(a) Show that f is one-to-one on the set A consisting of all (x, y) with 0 < y < 2π. [Hint: See the hint in the preceding exercise.]
(b) What is the set B = f(A)?
(c) If g is the inverse function, find Dg(0, 1).

3. Let f : R^n → R^n be given by the equation f(x) = ||x||^2 · x. Show that f is of class C^∞ and that f carries the unit ball B(0; 1) onto itself in a one-to-one fashion. Show, however, that the inverse function is not differentiable at 0.

4. Let g : R^2 → R^2 be given by the equation

Let f : R^2 → R^3 be given by the equation

    f(x, y) = (3x - y^2, 2x + y, xy + y^3).
|
|
|
|
|
|
|
|
(a) Show that there is a neighborhood of (0, 1) that g carries in a one-to-one fashion onto a neighborhood of (2, 0).
(b) Find D(f ∘ g^{-1}) at (2, 0).

5. Let A be open in R^n; let f : A → R^n be of class C^r; assume Df(x) is non-singular for x ∈ A. Show that even if f is not one-to-one on A, the set B = f(A) is open in R^n.
|
|
|
|
*§9. THE IMPLICIT FUNCTION THEOREM
|
|
The topic of implicit differentiation is one that is probably familiar to you from calculus. Here is a typical problem:
|
|
"Assume that the equation x3 y + 2e~Y = 0 determines y as
|
|
a differentiable function of x. Find dy/ dx ."
|
|
One solves this calculus problem by "looking at y as a function of x," and
|
|
differentiating with respect to x. One obtains the equation
|
|
|
|
which one solves for dy/dx. The derivative dy/dx is of course expressed in
|
|
terms of x and the unknown function y.
|
|
The case of an arbitrary function f is handled similarly. Supposing that the equation f(x, y) = 0 determines y as a differentiable function of x, say y = g(x), the equation f(x, g(x)) = 0 is an identity. One applies the chain rule to calculate

    ∂f/∂x + (∂f/∂y)·g'(x) = 0,

so that

    g'(x) = - (∂f/∂x) / (∂f/∂y),

where the partial derivatives are evaluated at the point (x, g(x)). Note that the solution involves a hypothesis not given in the statement of the problem. In order to find g'(x), it is necessary to assume that ∂f/∂y is non-zero at the point in question.
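As a numerical aside (an illustration added here, not part of the text, with NumPy assumed), the formula can be checked on the function f(x, y) = x^2 + y^2 - 5 treated in Example 1 below: near (1, 2) the implicitly defined function is g(x) = (5 - x^2)^{1/2}, and a finite-difference value of g'(1) agrees with -(∂f/∂x)/(∂f/∂y) evaluated at (1, 2).

    import numpy as np

    g = lambda x: np.sqrt(5.0 - x**2)        # the implicit solution near (1, 2)
    fx = lambda x, y: 2 * x                  # df/dx for f(x, y) = x**2 + y**2 - 5
    fy = lambda x, y: 2 * y                  # df/dy

    eps = 1e-6
    gprime = (g(1 + eps) - g(1 - eps)) / (2 * eps)   # finite-difference g'(1)
    formula = -fx(1.0, 2.0) / fy(1.0, 2.0)           # -(df/dx)/(df/dy) = -1/2

    print(round(gprime, 6), formula)   # both are -0.5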
|
|
It in fact turns out that the non-vanishing of ∂f/∂y is also sufficient to justify the assumptions we made in solving the problem. That is, if the function f(x, y) has the property that ∂f/∂y ≠ 0 at a point (a, b) that is a solution of the equation f(x, y) = 0, then this equation does determine y as a function of x, for x near a, and this function of x is differentiable.
|
|
|
|
|
|
|
|
This result is a special case of a theorem called the implicit function
|
|
theorem, which we prove in this section.
|
|
The general case of the implicit function theorem involves a system of
|
|
equations rather than a single equation. One seeks to solve this system for
|
|
some of the unknowns in terms of the others. Specifically, suppose that f : R^{k+n} → R^n is a function of class C^1. Then the vector equation

    f(x) = 0

is equivalent to a system of n scalar equations in k + n unknowns. One would
|
|
expect to be able to assign arbitrary values to k of the unknowns and to solve for the remaining unknowns in terms of these. One would also expect that the resulting functions would be differentiable, and that one could by implicit differentiation find their derivatives.
|
|
There are two separate problems here. The first is the problem of finding the derivatives of these implicitly defined functions, assuming they exist; the solution to this problem generalizes the computation of g'(x) just given. The second involves showing that (under suitable conditions) the implicitly defined functions exist and are differentiable.
|
|
In order to state our results in a convenient form, we introduce a new notation for the matrix Df and its submatrices:

Definition. Let A be open in R^m; let f : A → R^n be differentiable. Let f_1, ..., f_n be the component functions of f. We sometimes use the notation

    Df = ∂(f_1, ..., f_n) / ∂(x_1, ..., x_m)

for the derivative of f. On occasion we shorten this to the notation

    Df = ∂f/∂x.

More generally, we shall use the notation

    ∂(f_{i_1}, ..., f_{i_k}) / ∂(x_{j_1}, ..., x_{j_l})

to denote the k by l matrix that consists of the entries of Df lying in rows i_1, ..., i_k and columns j_1, ..., j_l. The general entry of this matrix, in row p and column q, is the partial derivative ∂f_{i_p}/∂x_{j_q}.
|
|
Now we deal with the problem of finding the derivative of an implicitly defined function, assuming it exists and is differentiable. For simplicity, we shall assume that we have solved a system of n equations in k + n unknowns for the last n unknowns in terms of the first k unknowns.
|
|
|
|
|
|
|
|
Theorem 9.1. Let A be open in R^{k+n}; let f : A → R^n be differentiable. Write f in the form f(x, y), for x ∈ R^k and y ∈ R^n; then Df has the form

    Df = [∂f/∂x   ∂f/∂y].

Suppose there is a differentiable function g : B → R^n defined on an open set B in R^k, such that

    f(x, g(x)) = 0

for all x ∈ B. Then for x ∈ B,

    ∂f/∂x (x, g(x)) + ∂f/∂y (x, g(x)) · Dg(x) = 0.

This equation implies that if the n by n matrix ∂f/∂y is non-singular at the point (x, g(x)), then

    Dg(x) = - [∂f/∂y (x, g(x))]^{-1} · ∂f/∂x (x, g(x)).

Note that in the case n = k = 1, this is the same formula for the derivative that was derived earlier; the matrices involved are 1 by 1 matrices in that case.
|
|
|
|
Proof. Given g, let us define h : B → R^{k+n} by the equation

    h(x) = (x, g(x)).

The hypotheses of the theorem imply that the composite function

    H(x) = f(h(x)) = f(x, g(x))

is defined and equals zero for all x ∈ B. The chain rule then implies that

    0 = DH(x) = Df(h(x)) · Dh(x)
              = [∂f/∂x (h(x))   ∂f/∂y (h(x))] · [I_k ; Dg(x)]
              = ∂f/∂x (h(x)) + ∂f/∂y (h(x)) · Dg(x),

as desired. □

The preceding theorem tells us that in order to compute Dg, we must assume that the matrix ∂f/∂y is non-singular. Now we prove that the non-singularity of ∂f/∂y suffices to guarantee that the function g exists and is differentiable.
|
|
|
|
|
|
|
|
Theorem 9.2 (Implicit function theorem). Let A be open in R^{k+n}; let f : A → R^n be of class C^r. Write f in the form f(x, y), for x ∈ R^k and y ∈ R^n. Suppose that (a, b) is a point of A such that f(a, b) = 0 and

    det ∂f/∂y (a, b) ≠ 0.

Then there is a neighborhood B of a in R^k and a unique continuous function g : B → R^n such that g(a) = b and

    f(x, g(x)) = 0

for all x ∈ B. The function g is in fact of class C^r.
|
|
|
|
Proof. We construct a function F to which we can apply the inverse function theorem. Define F : A → R^{k+n} by the equation

    F(x, y) = (x, f(x, y)).

Then F maps the open set A of R^{k+n} into R^k × R^n = R^{k+n}. Furthermore,

    DF = [I_k   0 ; ∂f/∂x   ∂f/∂y].

Computing det DF by repeated application of Lemma 2.12, we have det DF = det ∂f/∂y. Thus DF is non-singular at the point (a, b).

Now F(a, b) = (a, 0). Applying the inverse function theorem to the map F, we conclude that there exists an open set U × V of R^{k+n} about (a, b) (where U is open in R^k and V is open in R^n) such that:

(1) F maps U × V in a one-to-one fashion onto an open set W in R^{k+n} containing (a, 0).

(2) The inverse function G : W → U × V is of class C^r.

Note that because F(x, y) = (x, f(x, y)), we have

    (x, y) = G(x, f(x, y)).

Thus G preserves the first k coordinates, as F does. Then we can write G in the form

    G(x, z) = (x, h(x, z))

for x ∈ R^k and z ∈ R^n; here h is a function of class C^r mapping W into R^n.

Let B be a connected neighborhood of a in R^k, chosen small enough that B × 0 is contained in W. See Figure 9.1. We prove existence of the function g : B → R^n. If x ∈ B, then (x, 0) ∈ W, so we have:

    G(x, 0) = (x, h(x, 0)),
    (x, 0) = F(x, h(x, 0)) = (x, f(x, h(x, 0))),
    0 = f(x, h(x, 0)).
|
|
|
|
Figure 9.1
|
|
|
|
We set g(x) = h(x, 0) for x ∈ B; then g satisfies the equation f(x, g(x)) = 0, as desired. Furthermore,

    (a, b) = G(a, 0) = (a, h(a, 0));

then b = g(a), as desired.

Now we prove uniqueness of g. Let g_0 : B → R^n be a continuous function satisfying the conditions in the conclusion of our theorem. Then in particular, g_0 agrees with g at the point a. We show that if g_0 agrees with g at the point a_0 ∈ B, then g_0 agrees with g in a neighborhood B_0 of a_0. This is easy. The map g carries a_0 into V. Since g_0 is continuous, there is a neighborhood B_0 of a_0 contained in B such that g_0 also maps B_0 into V. The fact that f(x, g_0(x)) = 0 for x ∈ B_0 implies that

    F(x, g_0(x)) = (x, 0),   so
    (x, g_0(x)) = G(x, 0) = (x, h(x, 0)).

Thus g_0 and g agree on B_0. It follows that g_0 and g agree on all of B: The set of points of B for which |g(x) - g_0(x)| = 0 is open in B (as we just proved), and so is the set of points of B for which |g(x) - g_0(x)| > 0 (by continuity of g and g_0). Since B is connected, the latter set must be empty. □
|
|
In our proof of the implicit function theorem, there was of course nothing special about solving for the last n coordinates; that choice was made simply for convenience. The same argument applies to the problem of solving for any n coordinates in terms of the others.
|
|
|
|
|
|
|
|
For example, suppose A is open in R^5 and f : A → R^2 is a function of class C^r. Suppose one wishes to "solve" the equation f(x, y, z, u, v) = 0 for the two unknowns y and u in terms of the other three. In this case, the implicit function theorem tells us that if a is a point of A such that f(a) = 0 and

    det ∂f/∂(y, u) (a) ≠ 0,

then one can solve for y and u locally near that point, say y = φ(x, z, v) and u = ψ(x, z, v). Furthermore, the derivatives of φ and ψ satisfy the formula

    ∂(φ, ψ)/∂(x, z, v) = - [∂f/∂(y, u)]^{-1} · [∂f/∂(x, z, v)].
|
|
|
|
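The following sketch, added as an illustration (it is not part of the text, the particular f is invented for the purpose, and NumPy is assumed), shows how the matrix formula can be used in practice. For an f : R^{1+2} → R^2 it computes g(x) by Newton's method, approximates Dg(x) by finite differences, and compares the result with -[∂f/∂y]^{-1} · ∂f/∂x.

    import numpy as np

    def f(x, y):                       # f : R x R^2 -> R^2
        y1, y2 = y
        return np.array([y1**3 + y2 - 2.0 * x,
                         y1 + y2**3 - 3.0 * x**2])

    def dfdy(x, y):                    # 2 x 2 matrix df/dy
        y1, y2 = y
        return np.array([[3 * y1**2, 1.0],
                         [1.0, 3 * y2**2]])

    def dfdx(x, y):                    # 2 x 1 matrix df/dx
        return np.array([[-2.0], [-6.0 * x]])

    def g(x, y0=(1.0, 1.0), steps=50):
        # solve f(x, y) = 0 for y by Newton's method, starting from y0
        y = np.array(y0, dtype=float)
        for _ in range(steps):
            y = y - np.linalg.solve(dfdy(x, y), f(x, y))
        return y

    x = 1.0
    y = g(x)
    eps = 1e-6
    Dg_fd = ((g(x + eps) - g(x - eps)) / (2 * eps)).reshape(2, 1)
    Dg_formula = -np.linalg.solve(dfdy(x, y), dfdx(x, y))
    print(np.allclose(Dg_fd, Dg_formula, atol=1e-4))   # prints True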
EXAMPLE 1. Let f : R^2 → R be given by the equation

    f(x, y) = x^2 + y^2 - 5.

Then the point (x, y) = (1, 2) satisfies the equation f(x, y) = 0. Both ∂f/∂x and ∂f/∂y are non-zero at (1, 2), so we can solve this equation locally for either variable in terms of the other. In particular, we can solve for y in terms of x, obtaining the function

    y = g(x) = [5 - x^2]^{1/2}.
|
|
Note that this solution is not unique in a neighborhood of x = 1 unless we specify that g is continuous. For instance, the function defined by

    [5 - x^2]^{1/2}     for x ≥ 1,
    -[5 - x^2]^{1/2}    for x < 1

satisfies the same conditions, but is not continuous. See Figure 9.2.

Figure 9.2
|
|
|
|
|
|
|
|
EXAMPLE 2. Let f be the function of Example 1. The point (x, y) = (√5, 0) also satisfies the equation f(x, y) = 0. The derivative ∂f/∂y vanishes at (√5, 0), so we do not expect to be able to solve for y in terms of x near this point. And, in fact, there is no neighborhood B of √5 on which we can solve for y in terms of x. See Figure 9.3.
|
|
|
|
Figure 9.3

EXAMPLE 3. Let f : R^2 → R be given by the equation

    f(x, y) = y^3 - x.
|
|
Then (0, 0) is a solution of the equation f(x, y) = 0. Because ∂f/∂y vanishes at (0, 0), we do not expect to be able to solve this equation for y in terms of x near (0, 0). But in fact, we can; and furthermore, the solution is unique! However, the function we obtain is not differentiable at x = 0. See Figure 9.4.

Figure 9.4

EXAMPLE 4. Let f : R^2 → R be given by the equation

    f(x, y) = y^2 - x^4.

Then (0, 0) is a solution of the equation f(x, y) = 0. Because ∂f/∂y vanishes at (0, 0), we do not expect to be able to solve for y in terms of x near (0, 0). In
|
|
|
|
|
|
|
|
fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique.
|
|
|
|
Figure 9.5
|
|
|
|
Now the point (1, 1) also satisfies the equation f(x, y) = 0. Because ∂f/∂y is non-zero at (1, 1), one can solve this equation for y as a continuous function of x in a neighborhood of x = 1. See Figure 9.5. One can in fact
|
|
express y as a continuous function of x on a larger neighborhood than the one
|
|
pictured, but if the neighborhood is large enough that it contains 0, then the solution is not unique on that larger neighborhood.
|
|
|
|
EXERCISES
|
|
|
|
1. Let f : R^3 → R^2 be of class C^1; write f in the form f(x, y_1, y_2). Assume that f(3, -1, 2) = 0 and
|
|
|
|
    Df(3, -1, 2) = [1   2   … ; 1   -1   …].
|
|
|
|
(a) Show there is a function g : B → R^2 of class C^1 defined on an open set B in R such that

    f(x, g_1(x), g_2(x)) = 0

for x ∈ B, and g(3) = (-1, 2).
(b) Find Dg(3).
(c) Discuss the problem of solving the equation f(x, y_1, y_2) = 0 for an arbitrary pair of the unknowns in terms of the third, near the point (3, -1, 2).
|
|
2. Given f : R^5 → R^2, of class C^1. Let a = (1, 2, -1, 3, 0); suppose that f(a) = 0 and
|
|
|
|
    Df(a) = […  3  1  -1  … ; …  0  1  2  …].
|
|
|
|
|
|
|
|
(a) Show there is a function g : B → R^2 of class C^1 defined on an open set B of R^3 such that

    f(x_1, g_1(x), g_2(x), x_2, x_3) = 0

for x = (x_1, x_2, x_3) ∈ B, and g(1, 3, 0) = (2, -1).
(b) Find Dg(1, 3, 0).
(c) Discuss the problem of solving the equation f(x) = 0 for an arbitrary pair of the unknowns in terms of the others, near the point a.
|
|
3. Let f : R^2 → R be of class C^1, with f(2, -1) = -1. Set

    G(x, y, u) = f(x, y) + u^2,
    H(x, y, u) = ux + 3y^3 + u^3.

The equations G(x, y, u) = 0 and H(x, y, u) = 0 have the solution (x, y, u) = (2, -1, 1).
(a) What conditions on Df ensure that there are C^1 functions x = g(y) and u = h(y) defined on an open set in R that satisfy both equations, such that g(-1) = 2 and h(-1) = 1?
(b) Under the conditions of (a), and assuming that Df(2, -1) = [1  -3], find g'(-1) and h'(-1).

4. Let F : R^2 → R be of class C^2, with F(0, 0) = 0 and DF(0, 0) = [2  3].
|
|
Let G : R^3 → R be defined by the equation

    G(x, y, z) = F(x + 2y + 3z - 1, x^3 + y^2 - z^2).

(a) Note that G(-2, 3, -1) = F(0, 0) = 0. Show that one can solve the equation G(x, y, z) = 0 for z, say z = g(x, y), for (x, y) in a neighborhood B of (-2, 3), such that g(-2, 3) = -1.
(b) Find Dg(-2, 3).
*(c) If D_1D_1F = 3 and D_1D_2F = -1 and D_2D_2F = 5 at (0, 0), find D_2D_1g(-2, 3).

5. Let f, g : R^3 → R be functions of class C^1. "In general," one expects that each of the equations f(x, y, z) = 0 and g(x, y, z) = 0 represents a smooth surface in R^3, and that their intersection is a smooth curve. Show that if (x_0, y_0, z_0) satisfies both equations, and if ∂(f, g)/∂(x, y, z) has rank 2 at (x_0, y_0, z_0), then near (x_0, y_0, z_0), one can solve these equations for two of x, y, z in terms of the third, thus representing the solution set locally as a parametrized curve.

6. Let f : R^{k+n} → R^n be of class C^1; suppose that f(a) = 0 and that Df(a) has rank n. Show that if c is a point of R^n sufficiently close to 0, then the equation f(x) = c has a solution.
|
|
|
|
Integration
|
|
In this chapter, we define the integral of a real-valued function of several real variables, and derive its properties. The integral we study is called the Riemann integral; it is a direct generalization of the integral usually studied in a first course in single-variable analysis.
|
|
§10. THE INTEGRAL OVER A RECTANGLE
|
|
We begin by defining the volume of a rectangle. Let

    Q = [a_1, b_1] × ··· × [a_n, b_n]

be a rectangle in R^n. Each of the intervals [a_i, b_i] is called a component interval of Q. The maximum of the numbers b_1 - a_1, ..., b_n - a_n is called the width of Q. Their product

    v(Q) = (b_1 - a_1)(b_2 - a_2) ··· (b_n - a_n)

is called the volume of Q.

In the case n = 1, the volume and the width of the (1-dimensional) rectangle [a, b] are the same, namely, the number b - a. This number is also called the length of [a, b].

Definition. Given a closed interval [a, b] of R, a partition of [a, b] is a finite collection P of points of [a, b] that includes the points a and b. We
|
|
|
|
|
|
usually index the elements of P in increasing order, for notational convenience, as

    a = t_0 < t_1 < ··· < t_k = b;

each of the intervals [t_{i-1}, t_i], for i = 1, ..., k, is called a subinterval determined by P, of the interval [a, b]. More generally, given a rectangle

    Q = [a_1, b_1] × ··· × [a_n, b_n]

in R^n, a partition P of Q is an n-tuple (P_1, ..., P_n) such that P_j is a partition of [a_j, b_j] for each j. If for each j, I_j is one of the subintervals determined by P_j of the interval [a_j, b_j], then the rectangle

    R = I_1 × I_2 × ··· × I_n

is called a subrectangle determined by P, of the rectangle Q. The maximum width of these subrectangles is called the mesh of P.

Definition. Let Q be a rectangle in R^n; let f : Q → R; assume f is bounded. Let P be a partition of Q. For each subrectangle R determined by P, let

    m_R(f) = inf{f(x) | x ∈ R},
    M_R(f) = sup{f(x) | x ∈ R}.

We define the lower sum and the upper sum, respectively, of f, determined by P, by the equations

    L(f, P) = Σ_R m_R(f) · v(R),
    U(f, P) = Σ_R M_R(f) · v(R),

where the summations extend over all subrectangles R determined by P.
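As an illustration of the definition (added here; it is not part of the text, and NumPy is assumed), the sketch below computes L(f, P) and U(f, P) for f(x, y) = x + y on the rectangle Q = [0, 1] × [0, 1], using the uniform partition that cuts each component interval into m equal pieces. Since f is increasing in each variable, its inf and sup on a subrectangle occur at the lower-left and upper-right corners.

    import numpy as np

    def lower_upper_sums(m):
        # uniform partition of Q = [0,1] x [0,1] into m*m subrectangles
        t = np.linspace(0.0, 1.0, m + 1)
        v = (1.0 / m) ** 2                      # volume of each subrectangle
        L = U = 0.0
        for i in range(m):
            for j in range(m):
                # f(x, y) = x + y attains its inf at the lower-left corner
                # of the subrectangle and its sup at the upper-right corner
                L += (t[i] + t[j]) * v
                U += (t[i + 1] + t[j + 1]) * v
        return L, U

    for m in (2, 10, 100):
        L, U = lower_upper_sums(m)
        print(m, round(L, 4), round(U, 4))
    # L and U squeeze the integral of x + y over Q, which equals 1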
|
|
Let P = (P_1, ..., P_n) be a partition of the rectangle Q. If P″ is a partition of Q obtained from P by adjoining additional points to some or all of the partitions P_1, ..., P_n, then P″ is called a refinement of P. Given two partitions P and P′ = (P′_1, ..., P′_n) of Q, the partition

    P″ = (P_1 ∪ P′_1, ..., P_n ∪ P′_n)

is a refinement of both P and P′; it is called their common refinement.

Passing from P to a refinement of P of course affects lower sums and upper sums; in fact, it tends to increase the lower sums and decrease the upper sums. That is the substance of the following lemma:
|
|
|
|
|
|
|
|
Lemma 10.1. Let P be a partition of the rectangle Q; let f : Q → R be a bounded function. If P″ is a refinement of P, then

    L(f, P) ≤ L(f, P″)   and   U(f, P″) ≤ U(f, P).

Proof. Let Q be the rectangle

    Q = [a_1, b_1] × ··· × [a_n, b_n].

It suffices to prove the lemma when P″ is obtained by adjoining a single additional point to the partition of one of the component intervals of Q. Suppose, to be definite, that P is the partition (P_1, ..., P_n) and that P″ is obtained by adjoining the point q to the partition P_1. Further, suppose that P_1 consists of the points

    a_1 = t_0 < t_1 < ··· < t_k = b_1,

and that q lies interior to the subinterval [t_{i-1}, t_i].

We first compare the lower sums L(f, P) and L(f, P″). Most of the subrectangles determined by P are also subrectangles determined by P″. An exception occurs for a subrectangle determined by P of the form

    R_S = [t_{i-1}, t_i] × S

(where S is one of the subrectangles of [a_2, b_2] × ··· × [a_n, b_n] determined by (P_2, ..., P_n)). The term involving the subrectangle R_S disappears from the lower sum and is replaced by the terms involving the two subrectangles

    R′_S = [t_{i-1}, q] × S   and   R″_S = [q, t_i] × S,

which are determined by P″. See Figure 10.1.
|
|
|
|
Figure 10.1
|
|
|
|
|
|
|
|
Now since m_{R_S}(f) ≤ f(x) for each x ∈ R′_S and for each x ∈ R″_S, it follows that

    m_{R_S}(f) ≤ m_{R′_S}(f)   and   m_{R_S}(f) ≤ m_{R″_S}(f).

Because v(R_S) = v(R′_S) + v(R″_S) by direct computation, we have

    m_{R_S}(f) · v(R_S) ≤ m_{R′_S}(f) · v(R′_S) + m_{R″_S}(f) · v(R″_S).

Since this inequality holds for each subrectangle of the form R_S, it follows that

    L(f, P) ≤ L(f, P″),

as desired.

A similar argument applies to show that U(f, P) ≥ U(f, P″). □
|
|
|
|
Now we explore the relation between upper sums and lower sums. We have the following result:
|
|
Lemma 10.2. Let Q be a rectangle; let f : Q → R be a bounded function. If P and P′ are any two partitions of Q, then

    L(f, P) ≤ U(f, P′).

Proof. In the case where P = P′, the result is obvious: For any subrectangle R determined by P, we have m_R(f) ≤ M_R(f). Multiplying by v(R) and summing gives the desired inequality.

In general, given partitions P and P′ of Q, let P″ be their common refinement. Using the preceding lemma, we conclude that

    L(f, P) ≤ L(f, P″) ≤ U(f, P″) ≤ U(f, P′). □
|
|
|
|
Now (finally) we define the integral.
|
|
|
|
Definition. Let Q be a rectangle; let f : Q → R be a bounded function. As P ranges over all partitions of Q, define

    _∫_Q f = sup_P {L(f, P)}   and   ‾∫_Q f = inf_P {U(f, P)}.
|
|
|
|
|
|
|
|
These numbers are called the lower integral and upper integral, respectively, of f over Q. They exist because the numbers L(f, P) are bounded above by U(f, P′) where P′ is any fixed partition of Q; and the numbers U(f, P) are bounded below by L(f, P′). If the upper and lower integrals of f over Q are equal, we say f is integrable over Q, and we define the integral of f over Q to equal the common value of the upper and lower integrals. We denote the integral of f over Q by either of the symbols

    ∫_Q f   or   ∫_{x∈Q} f(x).
|
|
|
|
EXAMPLE 1. Let f : [a, b] → R be a non-negative bounded function. If P is a partition of I = [a, b], then L(f, P) equals the total area of a bunch of rectangles inscribed in the region between the graph of f and the x-axis, and U(f, P) equals the total area of a bunch of rectangles circumscribed about this region. See Figure 10.2.
|
|
|
|
Figure 10.2
|
|
|
|
The lower integral represents the so-called "inner area" of this region, computed by approximating the region by inscribed rectangles, while the upper integral represents the so-called "outer area," computed by approximating the region by circumscribed rectangles. If the "inner" and "outer" areas are equal, then f is integrable.

Similarly, if Q is a rectangle in R^2 and f : Q → R is non-negative and bounded, one can picture L(f, P) as the total volume of a bunch of boxes inscribed in the region between the graph of f and the xy-plane, and U(f, P)
|
|
|
|
|
|
|
|
as the total volume of a bunch of boxes circumscribed about this region. See Figure 10.3.
|
|
|
|
Figure 10.3
|
|
EXAMPLE 2. Let I = [0, 1]. Let f : I → R be defined by setting f(x) = 0 if x is rational, and f(x) = 1 if x is irrational. We show that f is not integrable over I.

Let P be a partition of I. If R is any subinterval determined by P, then m_R(f) = 0 and M_R(f) = 1, since R contains both rational and irrational numbers. Then

    L(f, P) = Σ_R 0 · v(R) = 0,

and

    U(f, P) = Σ_R 1 · v(R) = 1.

Since P is arbitrary, it follows that the lower integral of f over I equals 0, and the upper integral equals 1. Thus f is not integrable over I.
|
|
A condition that is often useful for showing that a given function is integrable is the following:
|
|
Theorem 10.3 (The Riemann condition). Let Q be a rectangle; let f : Q → R be a bounded function. Then

    _∫_Q f ≤ ‾∫_Q f;

equality holds if and only if given ε > 0, there exists a corresponding partition P of Q for which

    U(f, P) - L(f, P) < ε.
|
|
|
|
|
|
|
|
Proof. Let P′ be a fixed partition of Q. It follows from the fact that L(f, P) ≤ U(f, P′) for every partition P of Q, that

    _∫_Q f ≤ U(f, P′).

Now we use the fact that P′ is arbitrary to conclude that

    _∫_Q f ≤ ‾∫_Q f.

Suppose now that the upper and lower integrals are equal. Choose a partition P so that L(f, P) is within ε/2 of the integral ∫_Q f, and a partition P′ so that U(f, P′) is within ε/2 of the integral ∫_Q f. Let P″ be their common refinement. Since

    L(f, P) ≤ L(f, P″) ≤ ∫_Q f ≤ U(f, P″) ≤ U(f, P′),

the lower and upper sums for f determined by P″ are within ε of each other.

Conversely, suppose the upper and lower integrals are not equal. Let

    ε = ‾∫_Q f - _∫_Q f > 0.

Let P be any partition of Q. Then

    U(f, P) - L(f, P) ≥ ‾∫_Q f - _∫_Q f = ε;

hence the upper and lower sums for f determined by P are at least ε apart. Thus the Riemann condition does not hold. □
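To see the Riemann condition in action, here is a small sketch added for illustration (not part of the text; NumPy assumed). It evaluates U(f, P) - L(f, P) for f(x, y) = xy on Q = [0, 1] × [0, 1] under uniform partitions of decreasing mesh; because f is continuous, the difference eventually falls below any given ε, so this f satisfies the condition.

    import numpy as np

    def gap(m):
        # U(f,P) - L(f,P) for f(x,y) = x*y on [0,1]^2, uniform partition,
        # using that f is increasing in each variable on this square
        t = np.linspace(0.0, 1.0, m + 1)
        v = (1.0 / m) ** 2
        low = sum(t[i] * t[j] * v for i in range(m) for j in range(m))
        up = sum(t[i + 1] * t[j + 1] * v for i in range(m) for j in range(m))
        return up - low

    eps = 0.01
    m = 1
    while gap(m) >= eps:
        m += 1
    print(m, gap(m))   # a partition fine enough that U(f,P) - L(f,P) < 0.01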
|
|
Here is an easy application of this theorem.
|
|
Theorem 10.4. Every constant function f(x) = c is integrable. Indeed, if Q is a rectangle and if P is a partition of Q, then

    ∫_Q f = c · v(Q) = Σ_R c · v(R),

where the summation extends over all subrectangles determined by P.
|
|
|