ANALYSIS ON Analysis on Manifolds James R. Munkres Massachusetts Institute of Technology Cambridge, Massachusetts ADDISON-WESLEY PUBLISHING COMPANY The Advanced Book Program Redwood City, California • Menlo Park, California • Reading, Massachusetts New York • Don Mills, Ontario • Wokingham, United Kingdom • Amsterdam Bonn• Sydney •Singapore• Tokyo• Madrid• San Juan Publisher: Allan M. Wylde Production Manager: Jan V. Benes Marketing Manager: Laura Likely Electronic Composition: Peter Vacek Cover Design: Iva Frank Library of Congress Cataloging-in-Publication Data Munkres, James R., 1930- Analysis on manifolds/James R. Munkres. p. cm. Includes bibliographical references. 1. Mathematical analysis. 2. Manifolds (Mathematics) QA300.M75 1990 516.3'6'20-dc20 91-39786 ISBN 0-201-51035-9 CIP This book was prepared using the '!EX typesetting language. Copyright ©1991 by Addison-Wesley Publishing Company, The Advanced Book Program, 350 Bridge Parkway, Suite 209, Redwood City, CA 94065 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada. ABCDEFGHIJ-MA-943210 Preface This book is intended as a text for a course in analysis, at the senior or first-year graduate level. A year-long course in real analysis is an essential part of the preparation of any potential mathematician. For the first half of such a course, there is substantial agreement as to what the syllabus should be. Standard topics include: sequence and series, the topology of metric spaces, and the derivative and the Riemannian integral for functions of a single variable. There are a number of excellent texts for such a course, including books by Apostol [A], Rudin [Ru], Goldberg [Go], and Royden (Ro], among others. There is no such universal agreement as to what the syllabus of the second half of such a course should be. Part of the problem is that there are simply too many topics that belong in such a course for one to be able to treat them all within the confines of a single semester, at more than a superficial level. At M.I.T., we have dealt with the problem by offering two independent second-term courses in analysis. One of these deals with the derivative and the Riemannian integral for functions of several variables, followed by a treatment of differential forms and a proof of Stokes' theorem for manifolds in euclidean space. The present book has resulted from my years of teaching this course~ The other deals with the Lebesgue integral in euclidean space and its applications to Fourier analysis. Prequisites As indicated, we assume the reader has completed a one-term course in analysis that included a study of metric spaces and of functions of a single variable. We also assume the reader has some background in linear algebra, including vector spaces and linear transformations, matrix algebra, and determinants. The first chapter of the book is devoted to reviewing the basic results from linear algebra and analysis that we shall need. Results that are truly basic are V vi Preface stated without proof, but proofs are provided for those that are sometimes omitted in a first course. The student may determine from a perusal of this chapter whether his or her background is sufficient for the rest of the book. 
How much time the instructor will wish to spend on this chapter will depend on the experience and preparation of the students. I usually assign Sections 1 and 3 as reading material, and discuss the remainder in class. How the book is organized The main part of the book falls into two parts. The first, consisting of Chapter 2 through 4, covers material that is fairly standard: derivatives, the inverse function theorem, the Riemann integral, and the change of variables theorem for multiple integrals. The second part of the book is a bit more sophisticated. It introduces manifolds and differential forms in Rn, providing the framework for proofs of the n-dimensional version of Stokes' theorem and of the Poincare lemma. A final chapter is devoted to a discussion of abstract manifolds; it is intended as a transition to more advanced texts on the subject. The dependence among the chapters of the book is expressed in the following diagram: Chapter 1 Chapter 2 Chapter 3 The Algebra and Topology of Rn ! Differentiation ! Integration Chapter 4 ChLge of Variables Chapter 5 Mlifolds Chapter 7 Chapter 6 Differential Forms I Stokes' Theorem Chapter 8 Closed Forms and Exact Forms Chapter 9 Epilogue-Life Outside nn Preface VII Certain sections of the books are marked with an asterisk; these sections may be omitted without loss of continuity. Similarly, certain theorems that may be omitted are marked with asterisks. When I use the book in our undergraduate analysis sequence, I usually omit Chapter 8, and assign Chapter 9 as reading. With graduate students, it should be possible to cover the entire book. At the end of each section is a set of exercises. Some are computational in nature; students find it illuminating to know that one can compute the volume of a five-dimensional ball, even if the practical applications are limited! Other exercises are theoretical in nature, requiring that the student analyze carefully the theorems and proofs of the preceding section. The more difficult exercises are marked with asterisks, but none is unreasonably hard. Acknowledgements Two pioneering works in this subject demonstrated that such topics as manifolds and differential forms could be discussed with undergraduates. One is the set of notes used at Princeton c. 1960, written by Nickerson, Spencer, and Steenrod [N-S-S]. The second is the book by Spivak [S]. Our indebtedness to these sources is obvious. A more recent book on these topics is the one by Guillemin and Pollack [G-P]. A number of texts treat this material at a more advanced level. They include books by Boothby [B], Abraham, Mardsen, and Raitu [A-M-R], Berger and Gostiaux [B-G], and Fleming [F]. Any of them would be suitable reading for the student who wishes to pursue these topics further. I am indebted to Sigurdur Helgason and Andrew Browder for helpful comments. To Ms. Viola Wiley go my thanks for typing the original set of lecture notes on which the book is based. Finally, thanks is due to my students at M.I.T., who endured my struggles with this material, as I tried to learn how to make it understandable (and palatable) to them! J.R.M. Contents PREFACE V CHAPTER 1 The Algebra and Topology of Rn 1 §1. Review of Linear Algebra 1 §2. Matrix Inversion and Determinants 11 §3. Review of Topology in Rn 25 §4. Compact Subspaces and Connected Subspaces of Rn 32 CHAPTER 2 Differentiation §5. Derivative 41 §6. Continuously Differentiable Functions 49 §7. The Chain Rule 56 §8. The Inverse Function Theorem 63 *§9. 
The Implicit Function Theorem 71

CHAPTER 3 Integration 81
§10. The Integral over a Rectangle 81
§11. Existence of the Integral 91
§12. Evaluation of the Integral 98
§13. The Integral over a Bounded Set 104
§14. Rectifiable Sets 112
§15. Improper Integrals 121

CHAPTER 4 Change of Variables 135
§16. Partitions of Unity 136
§17. The Change of Variables Theorem 144
§18. Diffeomorphisms in Rn 152
§19. Proof of the Change of Variables Theorem 160
§20. Applications of Change of Variables 169

CHAPTER 5 Manifolds 179
§21. The Volume of a Parallelepiped 178
§22. The Volume of a Parametrized-Manifold 186
§23. Manifolds in Rn 194
§24. The Boundary of a Manifold 201
§25. Integrating a Scalar Function over a Manifold 207

CHAPTER 6 Differential Forms 219
§26. Multilinear Algebra 220
§27. Alternating Tensors 226
§28. The Wedge Product 236
§29. Tangent Vectors and Differential Forms 244
§30. The Differential Operator 252
*§31. Application to Vector and Scalar Fields 262
§32. The Action of a Differentiable Map 267

CHAPTER 7 Stokes' Theorem 275
§33. Integrating Forms over Parametrized-Manifolds 275
§34. Orientable Manifolds 281
§35. Integrating Forms over Oriented Manifolds 293
*§36. A Geometric Interpretation of Forms and Integrals 297
§37. The Generalized Stokes' Theorem 301
*§38. Applications to Vector Analysis 310

CHAPTER 8 Closed Forms and Exact Forms 323
§39. The Poincare Lemma 324
§40. The deRham Groups of Punctured Euclidean Space 334

CHAPTER 9 Epilogue-Life Outside Rn 345
§41. Differentiable Manifolds and Riemannian Manifolds 345

BIBLIOGRAPHY 359

Analysis on Manifolds

The Algebra and Topology of Rn

§1. REVIEW OF LINEAR ALGEBRA

Vector spaces

Suppose one is given a set V of objects, called vectors. And suppose there is given an operation called vector addition, such that the sum of the vectors x and y is a vector denoted x + y. Finally, suppose there is given an operation called scalar multiplication, such that the product of the scalar (i.e., real number) c and the vector x is a vector denoted cx. The set V, together with these two operations, is called a vector space (or linear space) if the following properties hold for all vectors x, y, z and all scalars c, d:
(1) x + y = y + x.
(2) x + (y + z) = (x + y) + z.
(3) There is a unique vector 0 such that x + 0 = x for all x.
(4) x + (−1)x = 0.
(5) 1x = x.
(6) c(dx) = (cd)x.
(7) (c + d)x = cx + dx.
(8) c(x + y) = cx + cy.

One example of a vector space is the set Rn of all n-tuples of real numbers, with component-wise addition and multiplication by scalars. That is, if x = (x1, ..., xn) and y = (y1, ..., yn), then

x + y = (x1 + y1, ..., xn + yn)   and   cx = (cx1, ..., cxn).

The vector space properties are easy to check.

If V is a vector space, then a subset W of V is called a linear subspace (or simply, a subspace) of V if for every pair x, y of elements of W and every scalar c, the vectors x + y and cx belong to W. In this case, W itself satisfies properties (1)-(8) if we use the operations that W inherits from V, so that W is a vector space in its own right.

In the first part of this book, Rn and its subspaces are the only vector spaces with which we shall be concerned. In later chapters we shall deal with more general vector spaces.

Let V be a vector space. A set a1, ..., am of vectors in V is said to span V if to each x in V, there corresponds at least one m-tuple of scalars c1, ..., cm such that
x = c1a1 + · · · + cmam.

In this case, we say that x can be written as a linear combination of the vectors a1, ..., am.

The set a1, ..., am of vectors is said to be independent if to each x in V there corresponds at most one m-tuple of scalars c1, ..., cm such that

x = c1a1 + · · · + cmam.

Equivalently, {a1, ..., am} is independent if to the zero vector 0 there corresponds only one m-tuple of scalars d1, ..., dm such that

0 = d1a1 + · · · + dmam,

namely the scalars d1 = d2 = · · · = dm = 0.

If the set of vectors a1, ..., am both spans V and is independent, it is said to be a basis for V.

One has the following result:

Theorem 1.1. Suppose V has a basis consisting of m vectors. Then any set of vectors that spans V has at least m vectors, and any set of vectors of V that is independent has at most m vectors. In particular, any basis for V has exactly m vectors. □

If V has a basis consisting of m vectors, we say that m is the dimension of V. We make the convention that the vector space consisting of the zero vector alone has dimension zero.

It is easy to see that Rn has dimension n. (Surprise!) The following set of vectors is called the standard basis for Rn:

e1 = (1, 0, 0, ..., 0),
e2 = (0, 1, 0, ..., 0),
   ...
en = (0, 0, 0, ..., 1).

The vector space Rn has many other bases, but any basis for Rn must consist of precisely n vectors.

One can extend the definitions of spanning, independence, and basis to allow for infinite sets of vectors; then it is possible for a vector space to have an infinite basis. (See the exercises.) However, we shall not be concerned with this situation.

Because Rn has a finite basis, so does every subspace of Rn. This fact is a consequence of the following theorem:

Theorem 1.2. Let V be a vector space of dimension m. If W is a linear subspace of V (different from V), then W has dimension less than m. Furthermore, any basis a1, ..., ak for W may be extended to a basis a1, ..., ak, ak+1, ..., am for V. □

Inner products

If V is a vector space, an inner product on V is a function assigning, to each pair x, y of vectors of V, a real number denoted ⟨x, y⟩, such that the following properties hold for all x, y, z in V and all scalars c:
(1) ⟨x, y⟩ = ⟨y, x⟩.
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(3) ⟨cx, y⟩ = c⟨x, y⟩ = ⟨x, cy⟩.
(4) ⟨x, x⟩ > 0 if x ≠ 0.

A vector space V together with an inner product on V is called an inner product space.

A given vector space may have many different inner products. One particularly useful inner product on Rn is defined as follows: If x = (x1, ..., xn) and y = (y1, ..., yn), we define

⟨x, y⟩ = x1y1 + · · · + xnyn.

The properties of an inner product are easy to verify. This is the inner product we shall commonly use in Rn. It is sometimes called the dot product; we denote it by ⟨x, y⟩ rather than x · y to avoid confusion with the matrix product, which we shall define shortly.

If V is an inner product space, one defines the length (or norm) of a vector of V by the equation

‖x‖ = ⟨x, x⟩^(1/2).

The norm function has the following properties:
(1) ‖x‖ > 0 if x ≠ 0.
(2) ‖cx‖ = |c| ‖x‖.
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The third of these properties is the only one whose proof requires some work; it is called the triangle inequality. (See the exercises.) An equivalent form of this inequality, which we shall frequently find useful, is the inequality

(3′) ‖x − y‖ ≥ ‖x‖ − ‖y‖.

Any function from V to the reals R that satisfies properties (1)-(3) just listed is called a norm on V.
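To see why (3′) is equivalent to the triangle inequality, here is a one-line check (an editorial sketch; the book itself leaves the proof of the triangle inequality to the exercises):

```latex
% (3) implies (3'): write x = (x - y) + y and apply the triangle inequality.
\|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\|
\quad\Longrightarrow\quad
\|x - y\| \ge \|x\| - \|y\|.
% Conversely, applying (3') to the pair x + y and y recovers (3).
```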
The length function derived from an inner product is one example of a norm, but there are other norms that are not derived from inner products. On Rn, for example, one has not only the familiar norm derived from the dot product, which is called the euclidean norm, but one has also the sup norm, which is defined by the equation

|x| = max{|x1|, ..., |xn|}.

The sup norm is often more convenient to use than the euclidean norm. We note that these two norms on Rn satisfy the inequalities

|x| ≤ ‖x‖ ≤ √n |x|.

Matrices

A matrix A is a rectangular array of numbers. The general number appearing in the array is called an entry of A. If the array has n rows and m columns, we say that A has size n by m, or that A is "an n by m matrix." We usually denote the entry of A appearing in the ith row and jth column by a_ij; we call i the row index and j the column index of this entry.

If A and B are matrices of size n by m, with general entries a_ij and b_ij, respectively, we define A + B to be the n by m matrix whose general entry is a_ij + b_ij, and we define cA to be the n by m matrix whose general entry is c a_ij. With these operations, the set of all n by m matrices is a vector space; the eight vector space properties are easy to verify. This fact is hardly surprising, for an n by m matrix is very much like an nm-tuple; the only difference is that the numbers are written in a rectangular array instead of a linear array.

The set of matrices has, however, an additional operation, called matrix multiplication. If A is a matrix of size n by m, and if B is a matrix of size m by p, then the product A·B is defined to be the matrix C of size n by p whose general entry c_ij is given by the equation

c_ij = Σ (k = 1 to m) a_ik b_kj.

This product operation satisfies the following properties, which are straightforward to verify:
(1) A·(B·C) = (A·B)·C.
(2) A·(B + C) = A·B + A·C.
(3) (A + B)·C = A·C + B·C.
(4) (cA)·B = c(A·B) = A·(cB).
(5) For each k, there is a k by k matrix I_k such that if A is any n by m matrix,

I_n·A = A   and   A·I_m = A.

In each of these statements, we assume that the matrices involved are of appropriate sizes, so that the indicated operations may be performed.

The matrix I_k is the matrix of size k by k whose general entry δ_ij is defined as follows: δ_ij = 0 if i ≠ j, and δ_ij = 1 if i = j. The matrix I_k is called the identity matrix of size k by k; it has the form

[ 1 0 · · · 0 ]
[ 0 1 · · · 0 ]
[ · ·       · ]
[ 0 0 · · · 1 ],

with entries of 1 on the "main diagonal" and entries of 0 elsewhere.

We extend to matrices the sup norm defined for n-tuples. That is, if A is a matrix of size n by m with general entry a_ij, we define

|A| = max{|a_ij| ; i = 1, ..., n and j = 1, ..., m}.

The three properties of a norm are immediate, as is the following useful result:

Theorem 1.3. If A has size n by m, and B has size m by p, then

|A·B| ≤ m |A| |B|. □

Linear transformations

If V and W are vector spaces, a function T : V → W is called a linear transformation if it satisfies the following properties, for all x, y in V and all scalars c:
(1) T(x + y) = T(x) + T(y).
(2) T(cx) = cT(x).
If, in addition, T carries V onto W in a one-to-one fashion, then T is called a linear isomorphism.

One checks readily that if T : V → W is a linear transformation, and if S : W → X is a linear transformation, then the composite S ∘ T : V → X is a linear transformation. Furthermore, if T : V → W is a linear isomorphism, then T⁻¹ : W → V is also a linear isomorphism.
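Before turning to how linear transformations are represented by matrices, note that Theorem 1.3 is easy to test numerically. The sketch below is an editorial illustration only (NumPy, the random sizes, and the tolerance are my own choices, not part of the text); it draws random matrices and checks the bound in the sup norm.

```python
# Spot-check of Theorem 1.3: |A.B| <= m |A| |B| in the sup (max-entry) norm.
# Editorial illustration; NumPy and the random sizes are assumptions of this sketch.
import numpy as np

def sup_norm(M):
    """Sup norm of a matrix: the largest absolute value of an entry."""
    return np.abs(M).max()

rng = np.random.default_rng(0)
for _ in range(1000):
    n, m, p = rng.integers(1, 6, size=3)            # random sizes
    A = rng.uniform(-10, 10, size=(n, m))           # random n-by-m matrix
    B = rng.uniform(-10, 10, size=(m, p))           # random m-by-p matrix
    assert sup_norm(A @ B) <= m * sup_norm(A) * sup_norm(B) + 1e-9
print("Theorem 1.3 held on all random samples.")
```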
A linear transformation is uniquely determined by its values on basis elements, and these values may be specified arbitrarily. That is the substance of the following theorem: Theorem 1.4. Let V be a vector space with basis a1, ... , a,,.. Let W be a vector space. Given any m vectors b 1, ... , bm in W, there is exactly one linear transformation T : V --+ W such that, for all z, = T(ai) bi. □ In the special case where V and W are "tuple spaces" such as nm and R", matrix notation gives us a convenient way of specifying a linear transformation, as we now show. First we discuss row matrices and column matrices. A matrix of size 1 by n is called a row matrix; the set of all such matrices bears an obvious resemblance to Rn. Indeed, under the one-to-one correspondence the vector space operations also correspond. Thus this correspondence is a linear isomorphism. Similarly, a matrix of size n by 1 is called a column matrix; the set of all such matrices also bears an obvious resemblance to Rn. Indeed, the correspondence is a linear isomorphism. The second of these isomorphisms is particularly useful when studying linear transformations. Suppose for the moment that we represent elements §1. Review of Linear Algebra 7 of Rm and Rn by column matrices rather than by tuples. If A is a fixed n by m matrix, let us define a function T : Rm ~ Rn by the equation T(x) = A ·x. The properties of matrix product imply immediately that T is a linear trans- formation. In fact, every linear transformation of Rm to Rn has this form. The proof = is easy. Given T, let bi, ... , bm be the vectors of Rnsuch that T(e;) h;. = Then let A be the n by m matrix A [b1 • •• bm] with successive columns b 1, ... , bm. Since the identity matrix has columns e1, ... , em, the equation A· Im= A implies that A· e; = h; for all j. Then A· e; = T(e;) for all j; it follows from the preceding theorem that A• x = T(x) for all x. The convenience of this notation leads us to make the following convention: Convention. Throughout, we shall represent the elements of Rn by column matrices, unless we specifically state otherwise. Rank of a matrix Given a matrix A of size n by m, there are several important linear spaces associated with A. One is the space spanned by the columns of A, looked at as column matrices (equivalently, as elements of Rn). This space is called the column space of A, and its dimension is called the column rank of A. Because the column space of A is spanned by m vectors, its dimension can be no larger than m; because it is a subspace of Rn, its dimension can be no larger than n. Similarly, the space spanned by the rows of A, looked at as row matrices (or as elements of Rm) is called the row space of A, and its dimension is called the row rank of A. The following theorem is of fundamental importance: Theorem 1.5. For any matrix A, the row rank of A equals the column rank of A. □ Once one has this theorem, one can speak merely of the rank of a matrix A, by which one means the number that equals both the row rank of A and the column rank of A. The rank of a matrix A is an important number associated with A. One cannot in general determine what this number is by inspection. However, there is a relatively simple procedure called Gauss-Jordan reduction that can be used for finding the rank of a matrix. (It is used for other purposes as well.) We assume you have seen it before, so we merely review its major features here. 
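Since only the major features of Gauss-Jordan reduction are reviewed here, a small computational sketch may help fix ideas. The following is an editorial illustration, not part of the text: it computes the rank of a matrix using only the elementary row operations described next, with exact rational arithmetic to avoid floating-point pitfalls.

```python
# A minimal Gauss-Jordan style row reduction for computing the rank of a matrix.
# Editorial sketch; the function name and pivoting strategy are my own choices.
from fractions import Fraction

def rank(rows):
    """Return the rank of a matrix given as a list of lists of numbers."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    m = len(A[0]) if n else 0
    r = 0                      # index of the next pivot row
    for j in range(m):         # work column by column
        # find a row at or below r with a non-zero entry in column j
        pivot = next((i for i in range(r, n) if A[i][j] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]        # row exchange (operation of type 1)
        for i in range(n):
            if i != r and A[i][j] != 0:
                c = A[i][j] / A[r][j]
                # replace row i by itself minus c times row r (operation of type 2)
                A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# Example: a 3-by-3 matrix whose third row is the sum of the first two has rank 2.
print(rank([[1, 2, 3], [0, 1, 1], [1, 3, 4]]))   # prints 2
```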
8 The Algebra and Topology of Rn Chapter 1 One considers certain operations, called elementary row operations, that are applied to a matrix A to obtain a new matrix B of the same size. They are the following: (1) Exchange rows i1 and i2 of A (where i1 f:. i2). (2) Replace row i1 of A by itself plus the scalar c times row i2 (where i1 j i2). (3) Multiply row i of A by the non-zero scalar A. Each of these operations is invertible; in fact, the inverse of an elementary operation is an elementary operation of the same type, as you can check. One has the following result: Theorem 1.6. If B is the matrix obtained by applying an elemen- tary row operation to A, then rank B = rank A. □ Gauss-Jordan reduction is the process of applying elementary operations to A to reduce it to a special form called echelon form (or stairstep form), for which the rank is obvious. An example of a matrix in this form is the following: @ * * *** B= @ * * ** 0 0@ ** 0 0 0 0 00 Here the entries beneath the "stairsteps" are 0; the entries marked * may be zero or non-zero, and the "corner entries," marked @, are non-zero. (The corner entries are sometimes called "pivots.") One in fact needs only operations of types (1) and (2) to reduce A to echelon form. Now it is easy to see that, for a matrix B in echelon form, the non-zero rows are independent. It follows that they form a basis for the row space of B, so the rank of B equals the number of its non-zero rows. For some purposes it is convenient to reduce B to an even more spe- cial form, called reduced echelon form. Using elementary operations of type (2), one can make all the entries lying directly above each of the corner entries into O's. Then by using operations of type (3), one can make all the corner entries into 1's. The reduced echelon form of the matrix B considered previously has the form: 1 0 * 0 * * C= o'71 * o * * 0 0 011 * * 0 0 0 0 0 0 §1. Review of Linear Algebra 9 It is even easier to see that, for the matrix C, its rank equals the number of its non-zero rows. Transpose of a matrix Given a matrix A of size n by m, we define the transpose of A to be the matrix D of size m by n whose general entry in row i and column j is defined by the equation di;= a;i- The matrix Dis often denoted Atr_ The following properties of the transpose operation are readily verified: (1) (Atryr = A. = + (2) (A+ B)tr Atr Btr. = (a) (A. C)tr Ctr. Atr. (4) rank Atr = rank A. The first three follow by direct computation, and the last from the fact that the row rank of Atr is obviously the same as the column rank of A. EXERCISES = 1. Let V be a vector space with inner product (x, y} and norm llxll (x, x}1/2. (a) Prove the Cauchy-Schwarz inequality (x, y} $ llxll IIYII• [Hint: = If x, y -=/:- 0, set c = 1/llxll and d 1/IIYII and use the fact that llcx ± dyll 2: O.] (b) Prove that llx + YII $ llxll + IIYII • [Hint: Compute (x + Y, x + y) and apply (a).] (c) Prove that llx - YII 2: IJxll - IIYll- 2. If A is an n by m matrix and Bis an m by p matrix, show that IA· Bl$ mlAI IBI. 3. Show that the sup norm on R2 is not derived from an inner product on R2 . [Hint: Suppose (x, y) is an inner product on R2 (not the dot product) = having the property that lxl (x, y)112 . Compute (x ± y, x ± y} and = = apply to the case x e1 and y e2.] = = 4. (a) If x (X1, X2) and y (Y1, Y2), show that the function [ 2 - 1] [Y1] -1 1 Y2 is an inner product on R2 . *(b) Show that the function (x, y) = [x1 x2] [ ab be] [YY12] is an inner product on R2 if and only if b2 - ac < 0 and a > 0. 
10 The Algebra and Topology of Rn Chapter 1 *5. Let V be a vector space; let {aa} be a set of vectors of V, as a ranges over some index set J (which may be infinite). We say that the set {aa} spans V if every vector x in V can be written as a finite linear combination of vectors from this set. The set {a 0 } is independent if the scalars are uniquely determined by x. The set {aa} is a basis for V if it both spans V and is independent. (a) Check that the set R"'of all "infinite-tuples" of real numbers is a vector space under component-wise addition and scalar multiplication. = (b) Let R00 denote the subset of R"' consisting of all x (.r1, X2, ...) = such that x, 0 for all but finitely many values of i. Show R00 is a subspace of R"'; find a basis for R00 . (c) Let :F be the set of all real-valued functions/: [a, b] - R. Show that :F is a vector space if addition and scalar multiplication are defined in the natural way: (! + g)(x) = f (x) + g(x), (cf)(x) = cf(x). (d) Let :Fs be the subset of :F consisting of all bounded functions. Let :F1 consist of all integrable functions. Let :Fe consist of all continuous functions. Let :Fo consist of all continuously differentiable functions. Let :Fp consist of all polynomial functions. Show that each of these is a subspace of the preceding one, and find a basis for :Fp. There is a theorem to the effect that every vector space has a basis. The proof is non-constructive. No one has ever exhibited specific bases for the vector spaces R"', :F, :Fe, :Fi, :Fe, :Fo. (e) Show that the integral operator and the differentiation operator, (IJ)(x) = /.:1: f (t) dt and (Df)(x) = /'(x), are linear transformations. What are possible domains and ranges of these transformations, among those listed in (d)? Matrix Inversion and Determinants 11 §2. MATRIX INVERSION AND DETERMINANTS We now treat several further aspects of linear algebra. They are the following: elementary matrices, matrix inversion, and determinants. Proofs are included, in case some of these results are new to you. Elementary matrices Definition. An elementary matrix of size n by n is the matrix obtained by applying one of the elementary row operations to the identity ma- trix In. The elementary matrices are of three basic types, depending on which of the three operations is used. The elementary matrix corresponding to the first elementary operation has the form 1 1 0 1 1 0 1 1 The elementary matrix corresponding to the second elementary row operation has the form 1 1 1 E'= 0 C 1 1 1 . . row i2 12 The Algebra and Topology of Rn Chapter 1 And the elementary matrix corresponding to the third elementary row operation has the form 1 1 E" = , row t. One has the following basic result: 1 1 Theorem 2.1. Let A be an n by m matrix. Any elementary row operation on A may be carried out by premultiplying A by the corresponding elementary matrix. Proof. One proceeds by direct computation. The effect of multiplying A on the left by the matrix E is to interchange rows i1 and i2 of A. Similarly, multiplying A by E' has the effect of replacing row i1 by itself plus c times row i2. And multiplying A by E" has the effect of multiplying row i by .A. D We will use this result later on when we prove the change of variables theorem for a multiple integral, as well as in the present section. The inverse of a matrix Definition. Let A be a matrix of size n by m; let B and C be matrices = of size m by n. We say that B is a left inverse for A if B •A Im, and we = say that C is a right inverse for A if A · C In. Theorem 2.2. 
If A has both a left inverse B and a right inverse C, then they are unique and equal. Proof. Equality follows from the computation If B1 is another left inverse for A, we apply this same computation with B1 replacing B. We conclude that C = B 1; thus B1 and B are equal. Hence B is unique. A similar computation shows that C is unique. D Matrix Inversion and Determinants 13 Definition. If A has both a right inverse and a left inverse, then A is said to be invertible. The unique matrix that is both a right inverse and a left inverse for A is called the inverse of A, and is denoted A- 1 . A necessary and sufficient condition for A to be invertible is that A be square and of maximal rank. That is the substance of the following two theorems: Theorem 2.3. then Let A be a matrix of size n by m. If A is invertible, n =m = mnk A. Proof. Step 1. ,ve show that for any k by n matrix D, rank (D · A) s rank A. The proof is easy. If R is a row matrix of size 1 by n, then R • A is a row matrix that equals a linear combination of the rows of A, so it is an element of the row space of A. The rows of D • A are obtained by multiplying the rows of D by A. Therefore each row of D · A is an element of the row space of A. Thus the row space of D · A is contained in the row space of A and our inequality follows. Step 2. We show that if A has a left inverse B, then the rank of A equals the number of columns of A. = = s The equation Im B · A implies by Step 1 that m rank (B · A) rank A. On the other hand, the row space of A is a subspace of m-tuple space, so that rank A < m. Step 3. We prove the theorem. Let B be the inverse of A. The fact that B is a left inverse for A implies by Step 2 that rank A = m. The fact that B is a right inverse for A implies that whence by Step 2, rank A= n. □ We prove the converse of this theorem in a slightly strengthened version: Theorem 2.4. Let A be a matrix of size n by m. Suppose n =m = rank A. Then A is invertible; and furthermore, A equals a product of elementary matrices. 14 The Algebra and Topology of Rn Chapter 1 Proof. Step 1. We note first that every elementary matrix is invert- ible, and that its inverse is an elementary matrix. This follows from the fact that elementary operations are invertible. Alternatively, you can check di- rectly that the matrix E corresponding to an operation of the first type is its own inverse, that an inverse for E' can be obtained by replacing c by -c in the formula for E', and that an inverse for E" can be obtained by replacing A by 1/ A in the formula for E". Step 2. We prove the theorem. Let A be an n by n matrix of rank n. Let us reduce A to reduced echelon form C by applying elementary row operations. Because C is square and its rank equals the number of its rows, C must equal the identity matrix In. It follows from Theorem 2.1 that there is a sequence E1, ... , E1i: of elementary matrices such that If we multiply both sides of this equation on the left by E;1, then by E;!1 , and so on, we obtain the equation A -- E-11 . E-21 ••• E-k1'. thus A equals a product of elementary matrices. Direct computation shows that the matrix is both a right and a left inverse for A. □ One very useful consequence of this theorem is the following: Theorem 2.5. If A is a square matrix and if B is a left inverse for A, then B is also a right inverse for A. Proof. Since A has a left inverse, Step 2 of the proof of Theorem 2.3 implies that the rank of A equals the number of columns of A. 
Since A is square, this is the same as the number of rows of A, so the preceding theorem implies that A has an inverse. By Theorem 2.2, this inverse must be B. □

An n by n matrix A is said to be singular if rank A < n; otherwise, it is said to be non-singular. The theorems just proved imply that A is invertible if and only if A is non-singular.

Determinants

The determinant is a function that assigns, to each square matrix A, a number called the determinant of A and denoted det A.

§3. REVIEW OF TOPOLOGY IN Rn

Metric spaces

If X is a set, a metric on X is a function d : X × X → R such that the following properties hold for all x, y, z in X:
(1) d(x, y) = d(y, x).
(2) d(x, y) ≥ 0, and equality holds if and only if x = y.
(3) d(x, z) ≤ d(x, y) + d(y, z).

A metric space is a set X together with a specific metric on X. We often suppress mention of the metric, and speak simply of "the metric space X."

If X is a metric space with metric d, and if Y is a subset of X, then the restriction of d to the set Y × Y is a metric on Y; thus Y is a metric space in its own right. It is called a subspace of X.

For example, Rn has the metrics

d(x, y) = ‖x − y‖   and   d(x, y) = |x − y|;

they are called the euclidean metric and the sup metric, respectively. It follows immediately from the properties of a norm that they are metrics. For many purposes, these two metrics on Rn are equivalent, as we shall see.

We shall in this book be concerned only with the metric space Rn and its subspaces, except for the expository final section, in which we deal with general metric spaces. The space Rn is commonly called n-dimensional euclidean space.

If X is a metric space with metric d, then given x0 ∈ X and given ε > 0, the set

U(x0; ε) = {x | d(x, x0) < ε}

is called the ε-neighborhood of x0, or the ε-neighborhood centered at x0. A subset U of X is said to be open in X if for each x0 ∈ U there is a corresponding ε > 0 such that U(x0; ε) is contained in U. A subset C of X is said to be closed in X if its complement X − C is open in X. It follows from the triangle inequality that an ε-neighborhood is itself an open set. If U is any open set containing x0, we commonly refer to U simply as a neighborhood of x0.

Theorem 3.1. Let (X, d) be a metric space. Then finite intersections and arbitrary unions of open sets of X are open in X. Similarly, finite unions and arbitrary intersections of closed sets of X are closed in X. □

Theorem 3.2. Let X be a metric space; let Y be a subspace. A subset A of Y is open in Y if and only if it has the form

A = U ∩ Y,

where U is open in X. Similarly, a subset A of Y is closed in Y if and only if it has the form

A = C ∩ Y,

where C is closed in X. □

It follows that if A is open in Y and Y is open in X, then A is open in X. Similarly, if A is closed in Y and Y is closed in X, then A is closed in X.

If X is a metric space, a point x0 of X is said to be a limit point of the subset A of X if every ε-neighborhood of x0 intersects A in at least one point different from x0. An equivalent condition is to require that every neighborhood of x0 contain infinitely many points of A.

Theorem 3.3. If A is a subset of X, then the set Ā consisting of A and all its limit points is a closed set of X. A subset of X is closed if and only if it contains all its limit points. □

The set Ā is called the closure of A.

In Rn, the ε-neighborhoods in our two standard metrics are given special names. If a ∈ Rn, the ε-neighborhood of a in the euclidean metric is called the open ball of radius ε centered at a, and denoted B(a; ε). The ε-neighborhood of a in the sup metric is called the open cube of radius ε centered at a, and denoted C(a; ε).
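As a concrete illustration of the two metrics (an editorial example, not in the text), take n = 2, x = (0, 0), and y = (3, 4):

```latex
\|x - y\| = \sqrt{3^2 + 4^2} = 5,
\qquad
|x - y| = \max\{\,3,\,4\,\} = 4 .
```

Correspondingly, the ε-neighborhood of the origin is the open disk B(0; ε) in the euclidean metric and the open square C(0; ε) = (−ε, ε) × (−ε, ε) in the sup metric.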
The inequalities Ix [ < II x II < y'n Ix I lead to the following inclusions: vn B(a; £) C C(a; E) C B(a; £). These inclusions in turn imply the following: §3. Review of Topology in Rn 27 Theorem 3.4. If X is a subspace of Rn, the collection of open sets of X is the same whether one uses the euclidean metric or the sup metric on X. The same is true for the collection of closed sets of X. □ In general, any property of a metric space X that depends only on the collection of open sets of X, rather than on the specific metric involved, is called a topological property of X. Limits, continuity, and compactness are examples of such, as we shall see. Limits and Continuity Let X and Y be metric spaces, with metrics dx and dy, respectively. We say that a function f: X -+ Y is continuous at the point Xo of X if for each open set V of Y containing f(xo), there is an open set U of X containing Xo such that f (U) C V. \,Ve say f is continuous if it is continuous at each point x 0 of.\'". Continuity off is equivalent to the requirement that for each open set V of Y, the set is open in X, or alternatively, the requirement that for each closed set D of Y, the set f- 1(D) is closed in X. Continuity may be formulated in a way that involves the metrics specif- ically. The function f is continuous at x 0 if and only if the following holds: For each f > O, there is a corresponding 8 > 0 such that dy(f(x), f(xo)) < f whenever dx(x, xo) < .i. This is the classical ('f-D formulation of continuity." Note that given Xo E X it may happen that for some 8 > 0, the 8- neighborhood of Xo consists of the point Xo alone. In that case, x 0 is called an isolated point of X, and any function f: X-+ Y is automatically continuous at xo! A constant function from X to Y is continuous, and so is the identity function ix: X-+ X. So are restrictions and composites of continuous functions: Theore1n 3.5. (a) Let xo E A, where A is a subspace of X. If f : X -+ Y is continuous at x0 , then the restricted Junction f IA: A -+ Y is continuous at x 0 . {b) Let f: X-+ Y and g: Y-+ Z. If f is continuous at x0 and g is continuous at Yo = f(x 0 ), then go f: X-+ Z is continuous at x 0 • □ Theorem 3.6. the form (a) Let X be a metric space. Let f: X-+ nn have f(x) = (f1(x), ... ,fn(x)). 28 The Algebra and Topology of R" Chapter 1 Then J is continuous at x0 if and only if each function Ji :X --+ R is continuous at x 0 . The functions Ji are called the component functions off. (b) Let J,g: X-+ R be continuous at xo. Then J + g and f - g and J ·g are continuous at xo; and f/g is continuous at Xo if g(x0 ) # 0. = (c) The projection function 'Tri :R" -+ R given by 1r,(x) Xi is con- tinuous. □ These theorems imply that functions formed from the familiar real-valued continuous functions of calculus, using algebraic operations and composites, are continuous in R". For instance, since one knows that the functions ex and sin x are continuous in R, it follows that such a function as f(s, t, u, v) = (sin(s + t))/euv is continuous in R4 . Now we define the notion of limit. Let X be a metric space. Let ACX and let J: A-+ Y. Let Xo be a limit point of the domain A of J. (The point Xo may or may not belong to A.) We say that f(x) approaches Yo as x approaches Xo if for each open set V of Y containing y0 , there is an open set U of X containing Xo such that f(x) E V whenever x is in Un A and x # Xo. 
This statement is expressed symbolically in the form f(x)-+ Yo as x-+ Xo- We also say in this situation that the limit of f(x), as x approaches Xo, is Yo- This statement is expressed symbolically by the equation lim f(x) = Yo• x-xo Note that the requii-ement that x 0 be a limit point of A guarantees that there exist points x different from x0 belonging to the set Un A. We do not attempt to define the limit off if x0 is not a limit point of the domain of J. Note also that the value off at Xo (provided f is even defined at xo) is not involved in the definition of the limit. The notion of limit can be formulated in a way that involves the metrics specifically. One shows readily that f(x) approaches Yo as x approaches Xo if and only if the following condition holds: For each € > 0, there is a corresponding 6 > 0 such that o. dy(f(x), y0 ) < € whenever x E A and O< dx(x, x0) < There is a direct relation between limits and continuity; it is the following: Review of Topology in R" 29 Theorem 3. 7. Let f: X --+ Y. If x 0 is an isolated point of X, then f is continuous at x 0 • Otherwise, f is continuous at x 0 if and only if f (X) --+ f (XO) as X --+ XO , □ Most of the theorems dealing with continuity have counterparts that deal with limits: Thcoren1 3.8. (a) Let A C X; let f: A --+ R" have the form f(x) =(f1(x), ... , fn(x)). Let a= (a1, ... , an)- Then f(x) --+ a as x--+ Xo if and only if fi(x)--+ ai as x --+ xo, for each i. (b) Let f,g: A--+ R. If f(x) --+ a and g(x)--+ b as x --+ xo, then as X --+ Xo, J(x) + g(x)--+ a+ b, J(x) - g(x)--+ a - b, f(x) •g(x)--+ a• b; also, f(x)/g(x)--+ a/b if b # 0. D Interior and Exterior The following concepts make sense in an arbitrary metric space. Since we shall use them only for R", we define them only in that case. Definition. Let A be a subset of Rn. The interior of A, as a subset of Rn, is defined to be the union of all open sets of R" that are contained in A; it is denoted Int A. The exterior of A is defined to be the union of all open sets of R" that are disjoint from A; it is denoted Ext A. The boundary of A consists of those points of Rn that belong neither to Int A nor to Ext A; it is denoted Bd A. A point x is in Bd A if and only if every open set containing x intersects both A and the complement R" - A of A. The space R" is the union of the disjoint sets Int A, Ext A, and Bd A; the first two are open in nn and the third is closed in Rn. For example, suppose Q is the rectangle consisting of all points x of R" such that ai < Xi < bi for all i. You can check that 30 The Algebra and Topology of Rn Chapter 1 We often call Int Q an open rectangle. Furthermore, Ext Q = R" - Q and Bd Q = Q - Int Q. An open cube is a special case of an open rectangle; indeed, The corresponding (closed) rectangle is often called a closed cube, or simply a cube, centered at a. EXERCISES Throughout, let X be a metric space with metric d. 1. Show that U(x0 ; t:) is an open set. 2. Let Y C X. Give an example where A is open in Y but not open in X. Give an example where A is closed in Y but not closed in X. 3. Let ACX. Show that if C is a closed set of X and C contains A, then C contains A. 4. (a) Show that if Q is a rectangle, then Q equals the closure of Int Q. (b) If Dis a closed set, what is the relation in general between the set D and the closure of Int D? (c) If U is an open set, what is the relation in general between the set U and the interior of U? 5. Let /: X - Y. 
Show that / is continuous if and only if for each x EX there is a neighborhood U of x such that / IU is continuous. 6. Let X = AU B, where A and B are subspaces of X. Let /: X - Y; suppose that the restricted functions /IA:A-Y and /IB:B-Y are continuous. Show that if both A and B are closed in X, then / is continuous. 7. Finding the limit of a composite function go f is easy if both / and g are continuous; see Theorem 3.5. Otherwise, it can be a bit tricky: Let/ :X - Y and g: Y - Z. Let Xo be a limit point of X and let Yo be a limit point of Y. See Figure 3.1. Consider the following three conditions: (i) / (x) - Yo as x - Xo. (ii) g(y) - Zo as y - Yo, (iii) g(f (x)) - zo as x - Xo. (a) Give an example where (i) and (ii) hold, but (iii) does not. = (b) Show that if (i) and (ii) hold and if g(y0 ) zo, then (iii) holds. Review of Topology in R" 31 = 8. Let f: R - R be defined by setting f(x) sin z if z is rational, and /(x) = 0 otherwise. At what points is/ continuous? 9. If we denote the general point of R2 by (z, y), determine Int A, Ext A, a.nd Bd A for the subset A of R2 specified by each of the following conditions: = (a) x 0. (e) x and y are rational. (b) 0 $ X < 1. (f) 0 < x2 + y2 < 1. (c) 0 :5 x < 1 and O:5 y < 1. (g) y < x2 • (d) xis rational and y > 0. (h) y :5 x 2 . I g • --· y • Zo Yo y Figure 3.1 32 The Algebra and Topology of Rn Chapter 1 §4. COMPACT SUBSPACES ANO CONNECTED SUBSPACES OF R" An important class of subspaces of Rn is the class of compact spaces. We shall use the basic properties of such spaces constantly. The properties we shall need are summarized in the theorems of this section. Proofs are included, since some of these results you may not have seen before. A second useful class of spaces is the class of connected spaces; we summarize here those few properties we shall need. We do not attempt to deal here with compactness and connectedness in arbitrary metric spaces, but comment that many of our proofs do hold in that more general situation. Compact spaces Definition. Let X be a subspace of Rn. A covering of Xis a collection of subsets of R" whose union contains X; if each of the subsets is open in Rn, it is called an open covering of X. The space X is said to be compact if every open covering of X contains a finite subcollection that also forms an open covering of X. While this definition of compactness involves open sets of R", it can be reformulated in a manner that involves only open sets of the space X: Theorem 4.1. A subspace X of Rn is compact if and only if for every collection of sets open in X whose union is X, there is a finite subcollection whose union equals X. Proof. Suppose X is compact. Let {A0} be a collection of sets open in X whose union is X. Choose, for each a, an open set Ua of R" such = that A 0 U0 n .X. Since X is compact, some finite subcollection of the = collection {Ua} covers X, say for a a 1, ... , O'k. Then the sets A0 , for a = O::i, ... , ak, have X as their union. The proof of the con verse is similar. D The following result is always proved in a first course in analysis, so the proof will be omitted here: Theorem 4.2. The subspace [a, b] of R is compact. □ Definition. A subspace X of Rn is said to be bounded if there is an M such that lxl < l.1 for all x EX. We shall eventually show that a subspace of Rn is compact if and only if it is closed and bounded. Half of that theorem is easy; we prove it now: §4. Compact Subspaces and Connected Subspaces of Rn 33 Theorem 4.3. If X is a compact subspace of Rn, then X is closed and bounded. 
Proof. Step 1. We show that X is bounded. For each positive integer N, let UN denote the open cube UN= C(O;N). Then UN is an open set; and U1 C U2 C • • •; and the sets UN cover all of R" (so in particular they cover X). Some finite subcollection also covers X, say for N =Ni, ... ,N1;. If M is the largest of the numbers N 1, ... , N 1;, then X is contained in UM; thus X is bounded. Step 2. We show that X is closed by showing that the complement of X is open. Let a be a point of R" not in X; we find an £-neighborhood of a that lies in the complement of X. For each positive integer N, consider the cube CN={x;Jx-al <1/N}. Then C1 :) C2 :) • • •, and the intersection of the sets CN consists of the point a alone. Let VN be the complement of CN; then VN is an open set; and V1 C V2 C • • •; and the sets VN cover all of R"except for the point a (so they cover X). Some finite subcollection covers X, say for N = N1, ... , N1;. If M is the largest of the numbers Ni, ... , Nk, then X is contained in VM. Then the set CM is disjoint from X, so that in particular the open cube C(a; 1/M) lies in the complement of X. See Figure 4.1. D X Figure 4.1 Corollary 4.4. Let X be a compact subspace of R. Then X has a largest element and a smallest element. 34 The Algebra and Topology of Rn Chapter 1 Proof. Since X is bounded, it has a greatest lower bound and a least upper bound. Since X is closed, these elements must belong to X. □ Here is a basic (and familiar) result that is used constantly: Theorem 4.5 {Extre1ne-value theorem). Let X be a compact subspace of Rm. If f : X - Rn is continuous, then f (X) is a compact subspace of Rn. In particular, if : X - R is continuous, then has a maximum value and a minimum value. Proof. Let {Va} be a collection of open sets of Rn that covers f(X). The sets f- 1(Va) form an open covering of X. Hence some finitely many of them cover X, say for a= a1, ... ,ak. Then the sets Va for a= 01, ... ,ak cover f (X). Thus f (X) is compact. Now if : X --+ R is continuous, (X) is compact, so it has a largest element and a smallest element. These are the maximum and minimum values of . □ Now we prove a result that may not be so familiar. Definition. Let ..Y be a subset of nn. Given f > 0, the union of the sets B(a; f), as a ranges over all points of X, is called the €-neighborhood of X in the euclidean metric. Similarly, the union of the sets C(a; f) is called the €-neighborhood of X in the sup metric. Theoren1 4.6 {The €-neighborhood theorem). Let X be a com- pact subspace of R"; let U be an open set of R"containing X. Then there is an l > 0 such that the €-neighborhood of X (in either metric) is contained in U. Proof. The €-neighborhood of X in the euclidean metric is contained in the €-neighborhood of X in the sup metric. Therefore it suffices to deal only with the latter case. Step 1. Let C be a fixed subset of R". For each x E R", we define d(x, C) = inf {Ix - c I; c E C}. We call d(x,C) the distance from x to C. We show it is continuous as a function of x: Let c EC; let x, y ER". The triangle inequality implies that d(x,C)- lx-yl < Ix-cl- lx-yl < ly-cl, §4. Compact Subspaces and Connected Subspaces of Rn 35 This inequality holds for all c E C; therefore d(x,C)- lx-yl 0 for all x EX. For if x EX, then some c5-neighborhood of xis contained in U, whence J(x) > c5. Because X is compact, f has a minimum value €. Because f takes on only positive values, this minimum value is positive. Then the €-neighborhood of X is contained in U. 
□ This theorem does not hold without some hypothesis on the set X. If X is the x-axis in R2, for example, and U is the open set then there is no € such that the €-neighborhood of X is contained in U. See Figure 4.2. Figure 4.2 Here is another familiar result. Theorem 4.7 (Uniform continuity). Let X be a compact subspace of Rm; let f : X -+ Rn be continuous. Given € > 0, there is a c5 > 0 such that whenever x, y E X, Ix - y I < c5 implies I/(x) - /(y) I < €. 36 The Algebra and Topology of nn Chapter 1 This result also holds if one uses the euclidean metric instead of the sup metric. The condition stated in the conclusion of the theorem is called the condition of uniform continuity. Proof. Consider the subspace X X X of nm X nm; and within this, consider the space .6. = { (x, x) Ix EX}, which is called the diagonal of X x X. The diagonal is a compact subspace of R2m, since it is the image of the compact space X under the continuous map f(x) = (x, x). We prove the theorem first for the euclidean metric. Consider the function g : X x X ~ R defined by the equation g(x, y) = II f (x) - f (y) II• Then consider the set of points (x, y) of X x X for which g(x, y) < €. Because g is continuous, this set is an open set of X x X. Also, it contains the diagonal .6., since g(x, x) = 0. Therefore, it equals the intersection with X x X of an open set U of Rm x Rm that contains .6.. See Figure 4.3. (x,y) .,...,__,_-(y' y) X X Figure 4.3 Compactness of .6. implies that for some 6, the 6-neighborhood of .6. is contained in U. This is the fJ required by our theorem. For if x, y E X with llx-yll <6,then II (x, Y) - (y, Y) II = ll (x - Y, 0) II = II x - Y II < c5, so that (x,y) belongs to the 6-neighborhood of the diagonal .6.. Then (x,y) belongs to U, so that g(x, y) < €, as desired. Compact Subspaces and Connected Subspaces of Rn 37 The corresponding result for the sup metric can be derived by a similar proof, or simply by noting that if Ix-y I < 8/ fa, then II x -y II < 8, whence I/(x) - /(y) I < II f(x) - f (y) II < €. □ To complete our characterization of the compact subspaces of Rn, we need the following lemma: Lemma 4.8. The rectangle = Q [a1, b1] X • • • X [an, bn] in Rn is compact. = Proof. We proceed by induction on n. The lemma is true for n I; we suppose it true for n - 1 and prove it true for n. We can write where X is a rectangle in Rn- 1. Then X is compact by the induction hypothesis. Let A be an open covering of Q. Step 1. We show that given t E [an, bn], there is an € > 0 such that the set + X X (t - €, t €) can be covered by finitely many elements of A. The set X x t is a compact subspace of Rn, for it is the image of X under = the continuous map / : X _,. Rn given by f (x) (x, t). Therefore it may be covered by finitely many elements of A, say by A1, ... , Ak. Let U be the union of these sets; then U is open and contains Xx t. See Figure 4.4. u t Xx t X Figure 4.4 Because X x t is compact, there is an € > 0 such that the €-neighborhood of Xx tis contained in U. Then in particular, the set Xx (t - €, t + €) is contained in U, and hence is covered by A1, ... , A1:. 38 The Algebra and Topology of Rn Chapter 1 Step 2. By the result of Step 1, we may for each t E [an, bn] choose an open interval V, about t, such that the set X x V, can be covered by finitely many elements of the collection A. Now the open intervals ¼ in R cover the interval [an, bn]; hence finitely many of them cover this interval, say for t = t1, ... , tm. 
Then Q = X x (an, bn] is contained in the union of the sets X x Vi = for t t1, ... , tm; since each of these sets can be covered by finitely many elements of A, so may Q be covered. D Theorem 4.9. If X is a closed and bounded subspace of R", then X is compact. Proof. Let A be a collection of open sets that covers X. Let us adjoin to this collection the single set R" - X, which is open in Rn because X is closed. Then we have an open covering of all of R". Because X is bounded, we can choose a rectangle Q that contains X; our collection then in particular covers Q. Since Q is compact, some finite subcollection covers Q. If this finite sub collection contains the set R" - X, we discard it from the collection. We then have a finite sub collection of the collection A; it may not cover all of Q, but it certainly covers X, since the set R" - X we discarded contains no point of X. □ All the theorems of this section hold if Rn and nm are replaced by arbitrary metric spaces, except for the theorem just proved. That theorem does not hold in an arbitrary metric space; see the exercises. Connected spaces If X is a metric space, then X is said to be connected if X cannot be written as the union of two disjoint non-empty sets A and B, each of which is open in X. The following theorem is always proved in a first course in analysis, so the proof will be omitted here: Theoren1 4.10. The closed interval (a, b] of R" is connected. □ The basic fact about connected spaces that we shall use is the following: Theorein 4.11 (Inter1nediate.value theorem). Let X be connected. If f : X --+ Y is continuous, then f(X) is a connected subspace of Y. = In particular, if r. Then A and B are open in R; if the set f (X) does not contain r, then f (X) is the union of the disjoint sets f(X) n A and f(X) n B, each of which is open in f (X). This contradicts connectedness off(X). □ If a and b are points of Rn, then the line segment joining a and b is = defined to be the set of all points x of the form x a+ t(b - a), where 0 :s; t < 1. Any line segment is connected, for it is the image of the interval [O, 1) under the continuous map t -- a+ t(b - a). A subset A of R" is said to be convex if for every pair a,b of points of A, the line segment joining a and b is contained in A. Any convex subset A of Rn is automatically connected: For if A is the union of the disjoint sets U and V, each of which is open in A, we need merely choose a in U and b in V, and note that if L is the line segment joining a and b, then the sets Un L and V n L are disjoint, non-empty, and oper. in L. It follows that in R" all open balls and open cubes and rectangles are connected. (See the exercises.) EXERCISES 1. Let R+ denote the set of positive real numbers. = (a) Show that the continuous function f : R+ --+- R given by f (x) 1/(l+x) is bounded but has neither a maximum value nor a minimum value. (b) Show that the continuous function g : R+ - R given by g(x) = sin( 1/ x) is bounded but does not satisfy the condition of uniform continuity on R+. 2. Let X denote the subset (-1, 1) X 0 of R2 , and let U be the open ball B(O; 1) in R2 , which contains X. Show there is no£ > 0 such that the £-neighborhood of X in R" is contained in U. = 3. Let RO() be the set of all "infinite-tuples" x (x1, X2, ... ) of real numbers that end in an infinite string of O's. (See the exercises of § 1.) Define an inner product on RO() by the rule (x, y) :;:; Ex,y,. (This is a finite sum, since all but finitely many terms vanish.) 
Let ‖x − y‖ be the corresponding metric on R∞. Define

e_i = (0, ..., 0, 1, 0, ...),

where 1 appears in the ith place. Then the e_i form a basis for R∞. Let X be the set of all the points e_i. Show that X is closed, bounded, and non-compact.

4. (a) Show that open balls and open cubes in Rn are convex.
(b) Show that (open and closed) rectangles in Rn are convex.

Differentiation

In this chapter, we consider functions mapping Rm into Rn, and we define what we mean by the derivative of such a function. Much of our discussion will simply generalize facts that are already familiar to you from calculus.

The two major results of this chapter are the inverse function theorem, which gives conditions under which a differentiable function from Rn to Rn has a differentiable inverse, and the implicit function theorem, which provides the theoretical underpinning for the technique of implicit differentiation as studied in calculus.

Recall that we write the elements of Rm and Rn as column matrices unless specifically stated otherwise.

§5. THE DERIVATIVE

First, let us recall how the derivative of a real-valued function of a real variable is defined.

Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of the point a. We define the derivative of φ at a by the equation

φ′(a) = lim (t→0) [φ(a + t) − φ(a)] / t,

provided the limit exists. In this case, we say that φ is differentiable at a. The following facts are an immediate consequence:
(1) Differentiable functions are continuous.
(2) Composites of differentiable functions are differentiable.

We seek now to define the derivative of a function f mapping a subset of Rm into Rn. We cannot simply replace a and t in the definition just given by points of Rm, for we cannot divide a point of Rn by a point of Rm if m > 1! Here is a first attempt at a definition:

Definition. Let A ⊂ Rm; let f : A → Rn. Suppose A contains a neighborhood of a. Given u ∈ Rm with u ≠ 0, define

f′(a; u) = lim (t→0) [f(a + tu) − f(a)] / t,

provided the limit exists. This limit depends both on a and on u; it is called the directional derivative of f at a with respect to the vector u. (In calculus, one usually requires u to be a unit vector, but that is not necessary.)

EXAMPLE 1. Let f : R2 → R be given by the equation

f(x) = x1 x2.

The directional derivative of f at a = (a1, a2) with respect to the vector u = (1, 0) is

f′(a; u) = lim (t→0) [(a1 + t)a2 − a1a2] / t = a2.

With respect to the vector v = (1, 2), the directional derivative is

f′(a; v) = lim (t→0) [(a1 + t)(a2 + 2t) − a1a2] / t = 2a1 + a2.

It is tempting to believe that the "directional derivative" is the appropriate generalization of the notion of "derivative," and to say that f is differentiable at a if f′(a; u) exists for every u ≠ 0. This would not, however, be a very useful definition of differentiability. It would not follow, for instance, that differentiability implies continuity. (See Example 3 following.) Nor would it follow that composites of differentiable functions are differentiable. (See the exercises of §7.) So we seek something stronger.

In order to motivate our eventual definition, let us reformulate the definition of differentiability in the single-variable case as follows: Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of a. We say that φ is differentiable at a if there is a number λ such that

[φ(a + t) − φ(a) − λt] / t → 0 as t → 0.
It is tempting to believe that the "directional derivative" is the appropriate generalization of the notion of "derivative," and to say that f is differentiable at a if f'(a; u) exists for every u ≠ 0. This would not, however, be a very useful definition of differentiability. It would not follow, for instance, that differentiability implies continuity. (See Example 3 following.) Nor would it follow that composites of differentiable functions are differentiable. (See the exercises of §7.) So we seek something stronger.

In order to motivate our eventual definition, let us reformulate the definition of differentiability in the single-variable case as follows: Let A be a subset of R; let φ : A → R. Suppose A contains a neighborhood of a. We say that φ is differentiable at a if there is a number λ such that

[φ(a + t) − φ(a) − λt] / t → 0  as  t → 0.

The number λ, which is unique, is called the derivative of φ at a, and denoted φ'(a).

This formulation of the definition makes explicit the fact that if φ is differentiable, then the linear function λt is a good approximation to the "increment function" φ(a + t) − φ(a); we often call λt the "first-order approximation" or the "linear approximation" to the increment function.

Let us generalize this version of the definition. If A ⊂ R^m and if f : A → R^n, what might we mean by a "first-order" or "linear" approximation to the increment function f(a + h) − f(a)? The natural thing to do is to take a function that is linear in the sense of linear algebra. This idea leads to the following definition:

Definition. Let A ⊂ R^m; let f : A → R^n. Suppose A contains a neighborhood of a. We say that f is differentiable at a if there is an n by m matrix B such that

[f(a + h) − f(a) − B·h] / |h| → 0  as  h → 0.

The matrix B, which is unique, is called the derivative of f at a; it is denoted Df(a).

Note that the quotient of which we are taking the limit is defined for h in some deleted neighborhood of 0, since the domain of f contains a neighborhood of a. Use of the sup norm in the denominator is not essential; one obtains an equivalent definition if one replaces |h| by ‖h‖.

It is easy to see that B is unique. Suppose C is another matrix satisfying this condition. Subtracting, we have

(C − B)·h / |h| → 0

as h → 0. Let u be a fixed vector; set h = tu; let t → 0. It follows that (C − B)·u = 0. Since u is arbitrary, C = B.

EXAMPLE 2. Let f : R^m → R^n be defined by the equation

f(x) = B·x + b,

where B is an n by m matrix, and b ∈ R^n. Then f is differentiable and Df(x) = B. Indeed, since

f(a + h) − f(a) = B·h,

the quotient used in defining the derivative vanishes identically.

We now show that this definition is stronger than the tentative one we gave earlier, and that it is indeed a "suitable" definition of differentiability. Specifically, we verify the following facts, in this section and those following:
(1) Differentiable functions are continuous.
(2) Composites of differentiable functions are differentiable.
(3) Differentiability of f at a implies the existence of all the directional derivatives of f at a.
We also show how to compute the derivative when it exists.

Theorem 5.1. Let A ⊂ R^m; let f : A → R^n. If f is differentiable at a, then all the directional derivatives of f at a exist, and

f'(a; u) = Df(a)·u.

Proof. Let B = Df(a). Set h = tu in the definition of differentiability, where t ≠ 0. Then by hypothesis,

(*)  [f(a + tu) − f(a) − B·tu] / |tu| → 0

as t → 0. If t approaches 0 through positive values, we multiply (*) by |u| to conclude that

[f(a + tu) − f(a)] / t − B·u → 0

as t → 0, as desired. If t approaches 0 through negative values, we multiply (*) by −|u| to reach the same conclusion. Thus f'(a; u) = B·u. □

EXAMPLE 3. Define f : R^2 → R by setting f(0) = 0 and

f(x, y) = x^2 y / (x^4 + y^2)  if (x, y) ≠ 0.

We show all directional derivatives of f exist at 0, but that f is not differentiable at 0. Let u ≠ 0. Then if u = (h, k),

[f(0 + tu) − f(0)] / t = [(th)^2 (tk) / ((th)^4 + (tk)^2)] · (1/t) = h^2 k / (t^2 h^4 + k^2),

so that

f'(0; u) = h^2 / k  if k ≠ 0,   and   f'(0; u) = 0  if k = 0.

Thus f'(0; u) exists for all u ≠ 0. However, the function f is not differentiable at 0. For if g : R^2 → R is a function that is differentiable at 0, then Dg(0) is a 1 by 2 matrix of the form [a b], and

g'(0; u) = ah + bk,

which is a linear function of u. But f'(0; u) is not a linear function of u.

The function f is particularly interesting. It is differentiable (and hence continuous) on each straight line through the origin. (In fact, on the straight line y = mx, it has the value mx/(m^2 + x^2).) But f is not differentiable at the origin; in fact, f is not even continuous at the origin! For f has value 0 at the origin, while arbitrarily near the origin are points of the form (t, t^2), at which f has value 1/2. See Figure 5.1.

Figure 5.1
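One can see the behavior described in Example 3 numerically. The sketch below is only an illustration (the sample values of m and t are my own choices): along lines through the origin the values of f tend to 0, while along the parabola y = x^2 they are identically 1/2.

```python
def f(x, y):
    """The function of Example 3: f(0,0) = 0, else x^2 * y / (x^4 + y^2)."""
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**4 + y**2)

# Along any line y = m*x through the origin the values tend to f(0,0) = 0 ...
for m in [0.5, 1.0, 3.0]:
    print([f(t, m * t) for t in [0.1, 0.01, 0.001]])

# ... but along the parabola y = x^2 the value is identically 1/2,
# so f is not continuous (hence not differentiable) at the origin.
print([f(t, t**2) for t in [0.1, 0.01, 0.001]])   # [0.5, 0.5, 0.5]
```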
Theorem 5.2. Let A ⊂ R^m; let f : A → R^n. If f is differentiable at a, then f is continuous at a.

Proof. Let B = Df(a). For h near 0 but different from 0, write

f(a + h) − f(a) = B·h + |h| · [f(a + h) − f(a) − B·h] / |h|.

By hypothesis, the expression in brackets approaches 0 as h approaches 0. Then, by our basic theorems on limits,

lim_{h→0} [f(a + h) − f(a)] = 0.

Thus f is continuous at a. □

We shall deal with composites of differentiable functions in §7.

Now we show how to calculate Df(a), provided it exists. We first introduce the notion of the "partial derivatives" of a real-valued function.

Definition. Let A ⊂ R^m; let f : A → R. We define the jth partial derivative of f at a to be the directional derivative of f at a with respect to the vector e_j, provided this derivative exists; and we denote it by D_j f(a). That is,

D_j f(a) = lim_{t→0} [f(a + t e_j) − f(a)] / t.

Partial derivatives are usually easy to calculate. Indeed, if we set

φ(t) = f(a_1, ..., a_{j−1}, t, a_{j+1}, ..., a_m),

then the jth partial derivative of f at a equals, by definition, simply the ordinary derivative of the function φ at the point t = a_j. Thus the partial derivative D_j f can be calculated by treating x_1, ..., x_{j−1}, x_{j+1}, ..., x_m as constants, and differentiating the resulting function with respect to x_j, using the familiar differentiation rules for functions of a single variable.

We begin by calculating the derivative Df in the case where f is a real-valued function.

Theorem 5.3. Let A ⊂ R^m; let f : A → R. If f is differentiable at a, then

Df(a) = [D_1 f(a)  D_2 f(a)  ···  D_m f(a)].

That is, if Df(a) exists, it is the row matrix whose entries are the partial derivatives of f at a.

Proof. By hypothesis, Df(a) exists and is a matrix of size 1 by m. Let

Df(a) = [λ_1  λ_2  ···  λ_m].

It follows (using Theorem 5.1) that

D_j f(a) = f'(a; e_j) = Df(a)·e_j = λ_j.  □

We generalize this theorem as follows:

Theorem 5.4. Let A ⊂ R^m; let f : A → R^n. Suppose A contains a neighborhood of a. Let f_i : A → R be the ith component function of f, so that f(x) is the column matrix whose entries are f_1(x), ..., f_n(x).
(a) The function f is differentiable at a if and only if each component function f_i is differentiable at a.
(b) If f is differentiable at a, then its derivative is the n by m matrix whose ith row is the derivative of the function f_i.

This theorem tells us that Df(a) is the matrix whose successive rows are Df_1(a), ..., Df_n(a); that is, Df(a) is the matrix whose entry in row i and column j is D_j f_i(a).

Proof. Let B be an arbitrary n by m matrix. Consider the function

F(h) = [f(a + h) − f(a) − B·h] / |h|,

which is defined for 0 < |h| < ε (for some ε). Now F(h) is a column matrix of size n by 1. Its ith entry satisfies the equation

F_i(h) = [f_i(a + h) − f_i(a) − (row i of B)·h] / |h|.

Let h approach 0. Then the matrix F(h) approaches 0 if and only if each of its entries approaches 0. Hence if B is a matrix for which F(h) → 0, then the ith row of B is a matrix for which F_i(h) → 0. And conversely. The theorem follows. □

Let A ⊂ R^m and f : A → R^n. If the partial derivatives of the component functions f_i of f exist at a, then one can form the matrix that has D_j f_i(a) as its entry in row i and column j. This matrix is called the Jacobian matrix of f. If f is differentiable at a, this matrix equals Df(a).
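The entries of the Jacobian matrix can be approximated directly from the definition of the partial derivatives, one column at a time. The sketch below is only an illustration of that definition (the helper name, step size, and sample function are my own choices); of course, as the next paragraph points out, the mere existence of these entries does not by itself guarantee that f is differentiable.

```python
import numpy as np

def jacobian_fd(f, a, t=1e-6):
    """Approximate the Jacobian matrix of f at a: column j holds the difference
    quotients [f(a + t*e_j) - f(a)] / t, i.e. estimates of D_j f_i(a)."""
    a = np.asarray(a, dtype=float)
    fa = np.asarray(f(a), dtype=float)
    J = np.empty((fa.size, a.size))
    for j in range(a.size):
        e_j = np.zeros_like(a)
        e_j[j] = 1.0
        J[:, j] = (np.asarray(f(a + t * e_j)) - fa) / t
    return J

# f(x, y) = (x*y, x + y^2); its Jacobian at (x, y) is [[y, x], [1, 2y]].
f = lambda p: np.array([p[0] * p[1], p[0] + p[1]**2])
print(jacobian_fd(f, [2.0, 3.0]))   # approximately [[3, 2], [1, 6]]
```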
However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist, without it following that f is differentiable at a. (See Example 3 preceding.) This fact leaves us in something of a quandary. We have no convenient way at present for determining whether or not a function is differentiable (other than going back to the definition). We know that such familiar functions as sin(xy) and + xy 2 ze:cy have partial derivatives, for that fact is a consequence of familiar theorems from single-variable analysis. But we do not know they are differentiable. We shall deal with this problem in the next section. 48 Differentiation Chapter 2 = = REMARK. If m 1 or n 1, our definition of the derivative is simply a reformulation, in matrix notation, of concepts familiar from calculus. For instance, if/ : R1 --+ R3 is a differentiable function, its derivative is the column matrix r/{(t)] D f (t) = /Ht) . n(t) In calculus, f is often interpreted as a parametrized-curve, and the vector is called the velocity vector of the curve. (Of course, in calculus one is apt to k use i,J, and for the unit basis vectors in R3 rather than e1 ,e2 , and e3 .) For another example, consider a differentiable function g : R3 --+ R1 . Its derivative is the row matrix and the directional derivative equals the matrix product Dg(x)-u. In calculus, the function g is often interpreted as a scalar field, and the vector field is called the gradient of g. (It is often denoted by the symbol ~g.) The directional derivative of g with respect to u is written in calculus as the dot product of the vectors grad g and u. Note that vector notation is adequate for dealing with the derivative of / when either the domain or the range of f has dimension 1. For a general function / : Rm -+ Rn, matrix notation is needed. EXERCISES 1. Let AC Rm; let f: A - Rn. Show that if f'(a; u) exists, then /'(a; cu) exists and equals cf'(a; u). 2. Let f : R2 --+ R be defined by setting / (0) = 0 and f(x,y) = xy/(x2 + y2) if (x,y) =I- 0. (a) For which vectors u =:/- 0 does f'(0; u) exist? Evaluate it when it exists. (b) Do Di/ and D2/ exist at 0? (c) Is / differenti able at 0? (d) Is/ continuous at 0? §6. Continuously Differentiable Functions 49 3. Repeat Exercise 2 for the function f defined by setting / (0) = 0 and = f(x, y) x 2 y2 /(:r:2y2 + (y - x)2) if (x, y) =fi 0. = 4. Repeat Exercise 2 for the function / defined by setting /(0) 0 and 5. Repeat Exercise 2 for the function f(x,y)=lxl+IYI• 6. Repeat Exercise 2 for the function = 7. Repeat Exercise 2 for the function / defined by setting /(0) 0 and f(x, y) = x Iy 1/(x2 + 11)112 if (x, y) =fi 0. §6. CONTINUOUSLY DIFFERENTIABLE FUNCTIONS In this section, we obtain a useful criterion for differentiability. We know that mere existence of the partial derivatives does not imply differentiability. If, however, we impose the (comparatively mild) additional condition that these partial derivatives be continuous, then differentiability is assured. We begin by recalling the mean-value theorem of single-variable analysis: Theorem 6.1 (Mean-value theorem). If: [a, b] -+ R is continu- ous at each point of the closed interval [a, b], and differentiable at each point of the open interval (a,b), then there exists a point c of (a,b) such that (b) - (a)= ¢/(c)(b- a). □ In practice, we most often apply this theorem when is differentiable on an open interval containing [a,b]. In this case, of course, is continuous on [a,b]. 50 Differentiation Chapter 2 Theorem 6.2. Let A be open in Rm. 
Suppose that the partial derivatives D;fi(x) of the component functions off exist at each point x of A and are continuous on A. Then f is differentiable at each point of A. A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class C 1, on A. Proof In view of Theorem 5.4, it suffices to prove that each component function of f is differentiable. Therefore we may restrict ourselves to the case of a real-valued function f : A -+ R. Let a be a point of A. We are given that, for some€, the partial derivatives D; f(x) exist and are continuous for Ix - al < €. We wish to show that f is differentiable at a. Step 1. Let h be a point of Rm with O < lhl < ~; let h1, ... , hm be the components of h. Consider the following sequence of points of Rm: Po= a, P1=a+h1e1, P2 =a+ h1e1 + h2e2, The points Pi all belong to the (closed) cube C of radius Ih I centered at a. = Figure 6.1 illustrates the case where m 3 and all hi are positive. Figure 6.1 Since we are concerned with the differentiability of f, we shall need to deal with the difference f(a + h) - f(a). We begin by writing it in the form L m /(a+ h) - f(a) = [f(p;) - /(P;-i)]. j=l §6. Continuously Differentiable Functions 51 Consider the general term of this summation. Let j be fixed, and define (t) = f (P;-1 + te; ). Assume h; f. 0 for the moment. As t ranges over the closed interval I with end points O and h;, the point P;-i + te; ranges over the line segment from P;-1 to P;i this line segment lies in C, and hence in A. Thus is defined for t in an open interval about I. Ast varies, only the j th component of the point P;-i +te; varies. Hence because D;f exists at each point of A, the function is differentiable on an open interval containing I. Applying the mean-value theorem to , we conclude that (h;) - (0) = '(c; )h; for some point c; between O and h;. (This argument applies whether h; is positive or negative.) We can rewrite this equation in the form • where q; is the point P;-1 + c;e; of the line segment from P;-1 to P;, which lies in C. = We derived (**) under the assumption that h; -1- 0. If h; 0, then (**) holds automatically, for any point q; of C. Using (**), we rewrite (*) in the form L m /(a +h)- /(a)= D;/(q;)h;, j=l where each point Qj lies in the cube C of radius lhl centered at a. Step 2. We prove the theorem. Let B be the matrix B = [D1/(a) ••• Dm/(a)]. Then L m B -h = D;f(a)h;. j=l Using(***), we have t /(a+h)- /(a)- B •h = [D;/(q;)- D;f(a)]h;; lhl j=l lhl then we let h -+ 0. Since Q; lies in the cube C of radius lhl centered at a, we have q,; -+ a. Since the partials of / are continuous at a, the factors in 52 Differentiation Chapter 2 brackets all go to zero. The factors h; /lhl are of course bounded in absolute value by 1. Hence the entire expression goes to zero, as desired. D One effect of this theorem is to reassure us that the functions familiar to us from calculus are in fact differentiable. We know how to compute the partial derivatives of such functions as sin(xy) and xy 2 + zexy, and we know that these partials are continuous. Therefore these functions are differentiable. In practice, we usually deal only with functions that are of class C1. While it is interesting to know there are functions that are differentiable but not of class C 1 , such functions occur rarely enough that we need not be concerned with them. Suppose f is a function mapping an open set A of Rm into Rn, and suppose the partial derivatives Di/i of the component functions of/ exist on A. 
These then are functions from A to R, and we may consider their partial derivatives, which have the form Dk(D; Ji) and are called the second-order partial derivatives of /. Similarly, one defines the third-order partial derivatives of the functions fi, or more generally the partial derivatives of order r for arbitrary r. If the partial derivatives of the functions /i of order less than or equal to r are continuous on A, we say / is of class er on A. Then the function / is of class er on A if and only if each function D;/i is of class cr-1 on A. We say f is of class C00 on A if the partials of the functions /, of all orders are continuous on A. As you may recall, for most functions the "mixed" partial derivatives are equal. This result in fact holds under the hypothesis that the function / is of class C2 , as we now show. Theorem 6.3. Let A be open in Rm; let f : A-+ R be a Junction of class C2 . Then for each a E A, Proof Since one calculates the partial derivatives in question by letting all variables other than Xk and Xj remain constant, it suffices to consider the case where / is a function merely of two variables. So we assume that A is open in R2 , and that / : A -+ R2 is of class C2 . Step 1. We first prove a certain "second-order" mean-value theorem for /. Let Q = [a, a+ h] x [b, b + k] §6. Continuously Differentiable Functions 53 be a rectangle contained in A. Define >.(h,k) = f(a,b)- /(a+ h, b) - f(a,b + k) + f(a + h, b+ k). Then >. is the sum, with appropriate signs, of the values of / at the four vertices of Q. See Figure 6.2. We show that there are points p and q of Q such that >.(h,k) = D2D1/(p) •hk, and >.(h,k) = D1D2/(q) •hk. b+k b T I I I + I l a s Figure 6.2 a+h By symmetry, it suffices to prove the first of these equations. To begin, we define (a+ h)- (a)= >.(h, k), as you can check. Because D 1/ exists in A, the function is differentiable in an open interval containing [a, a + h]. The mean-value theorem implies that (a + h) - '(so) •h for some So between a and a+ h. This equation can be rewritten in the form Now So is fixed, and we consider the function D1/(so, t). Because D2D1f exists in A, this function is differentiable for t in an open interval about [b, b+ k]. We apply the mean-value theorem once more to conclude that 54 Differentiation Chapter 2 for some to between b and b + k. Combining (*) and (**) gives our desired result. Step 2. We prove the theorem. Given the point a = (a,b) of A and given t > 0, let Q, be the rectangle Qt= [a,a + t] x [b,b + t]. If t is sufficiently small, Qt is contained in A; then Step 1 implies that for some point Pt in Qt. If we let t --+ 0, then Pt --+ a. Because D2D1f is continuous, it follows that A similar argument, using the other equation from Step 1, implies that The theorem follows. D EXERCISES = 1. Show that the function f (x, y) lxyl is differentiable at O, but is not of class C1 in any neighborhood of O. 2. Define / : R -+ R by setting /(0) = 0, and f (t) = t2 sin{l/t) if t-::/- 0. (a) Show/ is differentiable at 0, and calculate /'{0). (b) Calculate J1 (t) if t -::/- 0. (c) Show /' is not continuous at 0. (d) Conclude that / is differentiable on R but not of class C1 on R. 3. Show that the proof of Theorem 6.2 goes through if we assume merely that the partials D, f exist in a neighborhood of a and are continuous at a. 4. Show that if AC Rm and / : A - R, and if the partials Djf exist and are bounded in a neighborhood of a, then / is continuous at a. 5. Let f : R2 __. R2 be defined by the equation = f(r,0) (rcos0, rsin0). 
It is called the polar coordinate transformation. Continuously Differentiable Functions 55 (a) Calculate D f and det D f. (b) Sketch the image under / of the set S = [1, 2] x [O, 1r]. [Hint: Find the images under / of the line segments that bound S.] 6. Repeat Exercise 5 for the function f : R2 --+ R2 given by f(x,y) = (x2 - y2 , 2xy). Take S to be the set S = {(x,y) lx2 + y2 :S a2 and x ~ 0 and y ~ O}. [Hint: = Parametrize part of the boundary of S by setting x a cost and y = a sin t; find the image of this curve. Proceed similarly for the rest of the boundary of S.] We remark that if one identifies the complex numbers C with R2 in the usual way, then f is just the function f(z) = z 2 . 7. Repeat Exercise 5 for the function f : R2 --+ R2 given by f (x, y) = (ex cosy, ex sin y). Take S to be the set S = [O, I] x [O, 1r]. We remark that if one identifies C with R2 as usual, then f is the function f (z) = ez. 8. Repeat Exercise 5 for the function f : R3 --+ R3 given by f(p,,0) = (pcos0sin¢, psin0sincp, pcoscp). It is called the spherical coordinate transformation. Take S to be the set S = [1,2] X (0,71"/2] X (0,71"/2]. 9. Let g : R -+ R be a function of class C 2 . Show that l1. m h-o g(a+h}-2gh(2a)+g(a-h) _ - g"(a) . [Hint: Consider Step 1 of Theorem 6.3 in the case f(x, y) = g(x + y).] *10. Define f : R2 -+ R by setting f (0) = 0, and f(x,y) = xy(x2 -y2 )/(x2 + y2 ) if (x,y) =I- O. (a) Show D1f and D2/ exist at 0. (b) Calculate D1/ and D2f at (x, y) =I- 0. (c) Show/ is of class C 1 on R2 . [Hint: Show D1f(x, y) equals the prod- uct of y and a bounded function, and D2/(x,y) equals the product of x and a bounded function.] (d) Show that D2D1f and D1D2/ exist at 0, but are not equal there. 56 Differentiation §7. THE CHAIN RULE Chapter 2 In this section we show that the composite of two differentiable functions is differentiable, and we derive a formula for its derivative. This formula is commonly called the "chain rule." Theorem 7.1. Let Ac Rm; let B c Rn. Let / : A --+ Rn and g : B --+ RP' with f (A) C B. Suppose f (a) :::: b. If J is differentiable at a, and if g is differentiable at b, then the composite function go f is differentiable at a. Furthermore, D(g o f)(a) = Dg(b). D /(a), where the indicated product is matrix multiplication. Although this version of the chain rule may look a bit strange, it is really just the familiar chain rule of calculus in a new guise. You can convince yourself of this fact by writing the formula out in terms of partial derivatives. We shall return to this matter later. Proof. For convenience, let x denote the general point of Rm, and let y denote the general point of Rn. By hypothesis, g is defined in a neighborhood of b; choose f so that g(y) is defined for IY - bl < f. Similarly, since f is defined in a neighborhood of a and is continuous at a, we can choose 6 so that /(x) is defined and satisfies the condition 1/(x) - bl < f, for Ix - al < 6. Then the composite function (go f)(x) = g(/(x)) is defined for Ix - al < b. See Figure 7.1. g •c Figure 7.1 z ERP §7. The Chain Rule 57 Step 1. Throughout, let .6.(h) denote the function = .6.(h) /(a+ h) - / (a), which is defined for lhl < 6. First, we show that the quotient l.6.(h)l/lhJ is bounded for h in some deleted neighborhood of 0. For this purpose, let us introduce the function F(h) defined by setting F(O) = 0 and F(h) = [11.(h) -,ita) •h] for O< ihi < 6. Because / is differentiable at a, the function F is continuous at 0. 
Furthermore, one has the equation .6.(h) = DJ (a) •h + lhlF(h) = for O< lhl < 6, and also for h 0 (trivially). The triangle inequality implies that 1.6.(h)l < mlD f (a)I lh[ + lhl IF(h)I. Now IF(h)I is bounded for h in a neighborhood of O; in fact, it approaches 0 as h approaches 0. Therefore 1.6.(h)I / Jhl is bounded on a deleted neighborhood of 0. Step 2. We repeat the construction of Step 1 for the function g. We = define a function G(k) by setting G(O) 0 and = G(k) g(b + k) - g(b) - Dg(b) •k lkl for O< lkl < f. Because g is differentiable at b, the function G is continuous at 0. Further- more, for lkl < f, G satisfies the equation = g(b + k) - g(b) Dg(b). k + lklG(k). Step 3. We prove the theorem. Let b be any point of Rm with jhl < 6. Then l.6.(h)I < f, so we may substitute .6.(h) fork in formula (**). After this substitution, b + k becomes = = b + .6.(h) /(a)+ .6.(h) /(a+ h), so formula (**) takes the form = g(f(a + b)) - g(/(a)) Dg(b) •.6.(h) + l.6.(h)IG{.6.(h)). 58 Differentiation Chapter 2 Now we use (*) to rewrite this equation in the form 1 lh/ [g(f(a + h)) - g(/(a)) - Dg(b) • D f(a). h] = Dg(b) •F(h) + 1h111 ~(h)IG(~(h)). This equation holds for O < lhl < b. In order to show that go f is differentiable at a with derivative Dg(b) • DJ(a), it suffices to show that the right side of this equation goes to zero as h approaches 0. The matrix Dg(b) is constant, while F(h) --+ 0 as h --+ 0 (because F is continuous at O and vanishes there). The factor G(~(h)) also approaches zero as h --+ O; for it is the composite of two functions G and ~, both of which are continuous at O and vanish there. Finally, l~(h)I / lhl is bounded in a deleted neighborhood of O, by Step 1. The theorem follows. D Here is an immediate consequence: Corollary 7.2. Let A be open in Rm; let B be open in R". Let f : A --+ R" and g : B - RP, with f(A) CB. If f and g are of class er, so is the composite function go f. Proof. The chain rule gives us the formula D(g o f)(x) = Dg(f(x)) · DJ(x), which holds for x E A. Suppose first that / and g are of class C 1 . Then the entries of Dg are continuous real-valued functions defined on B; because f is continuous on A, the composite function Dg (f (x)) is also continuous on A. Similarly, the entries of the matrix D f (x) are continuous on A. Because the entries of the matrix product are algebraic functions of the entries of the matrices involved, the entries of the product Dg (J(x)} · D J(x) are also continuous on A. Then go J is of class C 1 on A. To prove the general case, we proceed by induction. Suppose the theorem is true for functions of class er- 1. Let f and g be of class Cr. Then the entries of Dg are real-valued functions of class cr-l on B. Now f is of class cr-l on A (being in fact of class Cr); hence the induction hypothesis implies that the function D;gi(f(x)), which is a composite of two functions of class cr- l, is of class cr- l. Since the entries of the matrix fl j (X) are also of class cr-l on A by hypothesis, the entries of the product Dg(f(x)) ·DJ(x) are of class cr- 1 on A. Hence go f is of class Cr on A, as desired. §7. The Chain Rule 59 er er The theorem follows for r finite. If now / and g are of class C00 , then they are of class for every r, whence 9 0 / is also of class for every r. □ As another application of the chain rule, we generalize the mean-value theorem of single-variable analysis to real-valued functions defined in Rm. We will use this theorem in the next section. Theorem 7.3 (Mean-value theorem). Let A be open in Rm; let f : A ...... 
R be differentiable on A. If A contains the line segment with = end points a and a+ h, then there is a point c a+ t0h with O < t0 < 1 of this line segment such that = /(a+ h)- f(a) D/(c) •h. = Proof. Set (t) f(a + th); then is defined fort in an open interval about [O, 1]. Being the composite of differentiable functions, is differentiable; its derivative is given by the formula '(t) = D f (a+ th)• h. The ordinary mean-value theorem implies that (1) - (O) ='(to)· 1 for some to with O< to < 1. This equation can be rewritten in the form f (a+ h) - f (a) = D f (a+ t0h) · h. □ As yet another application of the chain rule, we consider the problem of differentiating an inverse function. Recall the situation that occurs in single-variable analysis. Suppose (x) is differentiable on an open interval, with '(x) > 0 on that interval. Then is strictly increasing and has an inverse function 'Ip, which is defined by letting 1/J(y) be that unique number x such that (x) = y. The function 'l/J is in fact differentiable, and its derivative satisfies the equation = tf/(y) 1/'(x), = where y (x). There is a similar formula for differentiating the inverse of a function / of several variables. In the present section, we do not consider the question whether the function f has an inverse, or whether that inverse is differentiable. We consider only the problem of finding the derivative of the inverse function. 60 Differentiation Chapter 2 = Theorem 7.4. Let A be open in Rn; let f : A --+ Rn; let f (a) b. = Suppose that g maps a neighborhood of b into Rn, that g(b) a, and = g(f(x)) X for all x in a neighborhood of a. If f is differentiable at a and if g is differentiable at b, then Proof. Let i : Rn --+ Rn be the identity function; its derivative is the identity matrix In. We are given that g(f (x)) = i(x) for all x in a neighborhood of a. The chain rule implies that Dg(b) - D /(a)= In. Thus Dg(b) is the inverse matrix to D f (a) (see Theorem 2.5). D The preceding theorem implies that if a differentiable function / is to have a differentiable inverse, it is necessary that the matrix D f be non-singular. It is a somewhat surprising fact that this condition is also sufficient for a function f of class C 1 to have an inverse, at least locally. We shall prove this fact in the next section. REMARK. Let us make a comment on notation. The usefulness of well-chosen notation can hardly be overemphasized. Arguments that are obscure, and formulas that are complicated, sometimes become beautifully simple once the proper notation is chosen. Our use of matrix notation for the derivative is a case in point. The formulas for the derivatives of a composite function and an inverse function could hardly be simpler. Nevertheless, a word may be in order for those who rememher the notation used in calculus for partial derivatives, and the version of the chain rule proved there. In advanced mathematics, it is usual to use either the functional notation ¢' or the operator notation D¢ for the derivative of a real-valued function of a real variable. (D¢ denotes a 1 by 1 matrix in this case!) In calculus, however, another notation is common. One often denotes the derivative ¢'(x) by the symbol d¢/dx, or, introducing the "variable" y by setting y = cp(x), by the symbol dy/ dx. This notation was introduced by Leibnitz, one of the originators of calculus. It comes from the time when the focus of every physical and mathematical problem was on the variables involved, and when functions as such were hardly even thought about. §7. 
The Chain Rule 61 The Leibnitz notation has some familiar virtues. For one thing, it makes the chain rule easy to remember. Given functions¢: R - R and tp: R-;, R, the derivative of the composite function tp o ¢ is given by the formula D(l/} o ¢)(x) = D¢(¢(x)) • D¢(x). If we introduce variables by setting y = ¢( x) and z = t/J(y), then the derivative of the composite function z = t/J(¢(x)) can be expressed in the Leibnitz notation by the formula dz dz dy dx = dy. dx. The latter formula is easy to remember because it looks like the formula for multiplying fractions! However, this notation has its ambiguities. The letter "z," when it appears on the left side of this equation, denotes one function (a function of x); and when it appears on the right side, it denotes a different function (a function of y). This can lead to difficulties when it comes to computing higher derivatives unless one is very careful. The formula for the derivative of an inverse function is also easy to remember. If y = ¢(x) has the inverse function x = l/J(y), then the derivative of tp is expressed in Leibnitz notation by the equation 1 dx/dy = dy/dx' which looks like the formula for the reciprocal of a fraction! The Leibnitz notation can easily be extended to functions of several vari- ables. If A C Rm and f : A - R, we often set Y = f (x) = f (x1, ... , Xm), and denote the partial derivative Di/ by one of the symbols of or OXi The Leibnitz notation is not nearly as convenient in this situation. Consider the chain rule, for example. If f •. Rm ~ ~ R" and g: Rn - R, then the composite function F;;;; go f maps Rm into R, and its derivative is given by the formula = DF(x) Dg(f(x)) • Df(x), 62 Differentiation which can be written out in the form Chapter 2 Dm-~~(x)] . Dm/n(x) The formula for the Ph partial derivative of F is thus given by the equa- tion L n DiF(x) = D,.g(f(x)) Difk(x). k=l If we shift to "variable" notation by setting y = /(x) and z = g(y ), this equation becomes this is probably the version of the chain rule you learned in calculus. Only familiarity would suggest that it is easier to remember than (*)! Certainly one cannot obtain the formula for {)zj OXj by a simple-minded multiplication of fractions, as in the single-variable case. The formula for the derivative of an inverse function is even more troublesome. Suppose f : R2 - R2 is differentiable and has a differentiable inverse function g. The derivative of g is given by the formula = Dg(y) [D/(x))-1 . where y = / (x). In Leibnitz notation, this formula takes the form = 8x1/oy2] [{)yifox1 oy1/8x2]-l 8x2/oy2 oy2/8x1 {)y2/ox2 Recalling the formula for the inverse of a matrix, we see that the partial derivative OXi/Oyj is about as far from being the reciprocal of the partial derivative oy,/OXi as one could imagine! §8. The Inverse Function Theorem 63 EXERCISES 1. Let f : R3 - R2 satisfy the conditions /(0) = (1, 2) and Df(O) = [ 1 2 3] . 0 0 1 Let g : R2 - R2 be defined by the equation g(x, y) = (x + 2y + I, 3xy). Find D(g o /)(0). 2. Let f : R2 - R3 and g: R3 - R2 be given by the equations f (x) = (€ 2 x 1 +x2 , 3x2 - cos X1, Xi+ X2 + 2), g(y) = (3y1 + 2yz + Yi, yt - y3 + 1). (a) If F(x) = g(/(x)), find DF(O). [Hint: Don't compute F explicitly.] (b) If G(y) = f (g(y)), find DG(O). 3. Let f : R3 - R and g : R2 - R be differentiable. Let F : R2 - R be defined by the equation F(x, y) = f (x, y, g(x, y)). (a) Find DF in terms of the partials off and g. (b) If F(x, y) = 0 for all (x, y), find D1g and D2g in terms of the partials off. 4. 
Let g: R2 - R2 be defined by the equation g(x, y) = (x,y + x 2). Let = / : R2 - R be the function defined in Example 3 of§ 5. Let h fog. Show that the directional derivatives of f and g exist everywhere, but that there is au =j:. 0 for which h'(O; u) does not exist. §8. THE INVERSE FUNCTION THEOREM Let A be open in Rn; let f : A -+ Rn be of class C 1. We know that for f to have a differentiable inverse, it is necessary that the derivative Df (x) of/ be non-singular. We now prove that this condition is also sufficient for / to have a differentiable inverse, at least locally. This result is called the inverse Junction theorem. We begin by showing that non-singularity of D f implies that / is locally one-to-one. 64 Differentiation Chapter 2 Lemma 8.1. Let A be open in nn ,· let f: A - Rn be of class C 1 . If D/(a) is non-singular, then there exists an o > 0 such that the inequality lf(xo) - f(xi)I > alxo - xii holds for all x0,x1 in some open cube C(a; €) centered at a. It follows that f is one-to-one on this open cube. = Proof. Let E D f(a); then E is non-singular. We first consider the linear transformation that maps x to E •x. We compute = lxo - x1 I IE-1 • (E •xo - E •xi)! < njE- 1 1 • JE •xo - E-xij. = If we set 2o I/nlE- 1 I, then for all xo, x1 in Rn, Now we prove the lemma. Consider the function = H(x) f(x)-E ·x. = Then DH(x) Df(x)-E, so thatDH(a) = 0. Because H isofclassC1, we can choose€> 0 so that JDH(x)I < a/n for x in the open cube C = C(a; €). The mean-value theorem, applied to the ith component function of H, tells us that, given xo, x1 E C, there is a c E C such that Then for xo, x1 EC, we have olxo - x1 I> IH(xo) - H(x1)I =I/(xo) - E •Xo - f (xi) + E •x1 I >IE· x1 - E •xol - lf(x1) - /(xo)I ~ 20:lx1 - xol - lf(x1) - f(xo)I- The lemma follows. D Now we show that non-singularity of D f, in the case where f is one-to- one, implies that the inverse function is differentiable. §8. The Inverse Function Theorem 65 Theorem 8.2. Let A be open in Rn; let f : A -+ Rn be of class er; let B = f(A). If f is one-to-one on A and if Df(x) is non-singular for x E A, then the set B is open in Rn and the inverse function g : B -+ A is of class er. Proof. Step 1. We prove the following elementary result: If : A -+ R = is differentiable and if has a local minimum at x 0 E A, then D has a local minimum at x0 means that ef>(x0) for all x in a neighborhood of x 0 . Then given u f. 0, ef>(xo + tu) - ef>(xo) > 0 for all sufficiently small values oft. Therefore = r/>'(xo; u) lim [ef>(xo + tu) - r/>(xo)]/t t-+0 is non-negative if t approaches O through positive values, and is non-positive = if t approaches O through negative values. It follows that ef>'(x0 ; u) 0. In = = particular, D;r/>(xo) 0 for all j, so that Def>(xo) 0. Step 2. We show that the set B is open in Rn. Given b E B, we show B contains some open ball B(b; 6) about b. We begin by choosing a rectangle Q lying in A whose interior contains = the point a J- 1(b) of A. The set Bd Q is compact, being closed and bounded in Rn. Then the set /(Bd Q) is also compact, and thus is closed and bounded in Rn. Because f is one-to-one, f(Bd Q) is disjoint from b; because f(Bd Q) is closed, we can choose 6 > 0 so that the ball B(b; 26) is disjoint = from f(Bd Q). Given c E B(b; 6) we show that c f(x) for some x E Q; it = then follows that the set f (A) B contains each point of B(b; 6), as desired. See Figure 8.1. 
-I - Figure 8.1 /(Bd Q) 66 Differentiation Chapter 2 Given c E B(b; «5), consider the real-valued function (x) = 11/(x) - cll2, which is of class er. Because Q is compact, this function has a minimum value on Q; suppose that this minimum value occurs at the point x of Q. We = show that /(x) c. Now the value of at the point a is (a) = 11/(a) - cll 2 = llb - cll2 < «52 • Hence the minimum value of on Q must be less than «52 . It follows that this minimum value cannot occur on Bd Q, for if x E Bd Q, the point f (x) lies outside the ball B(b; 2«5), so that 11/(x) - ell > «5. Thus the minimum value of occurs at a point x of Int Q. Because x is interior to Q, it follows that has a local minimum at x; then by Step 1, the derivative of vanishes at x. Since = Ln (x) (fk(x) - c1:)2, n = L D;(x) 2(/1:(x)- c1:)D;Jk(x). k=l = The equation D(x) 0 can be written in matrix form as = (/n(x) - Cn)] •D/(x) 0. Now D f(x) is non-singular, by hypothesis. Multiplying both sides of this = equation on the right by the inverse of D /(x), we see that f(x) - c O, as desired. Step 3. The function / : A --+ B is one-to-one by hypothesis; let g : B--+ A be the inverse function. We show g is continuous. Continuity of g is equivalent to the statement that for each open set U of = = A, the set V g- 1 (U) is open in B. But V f(U); and Step 2, applied to the set U, which is open in A and hence open in Rn, tells us that V is open in Rn and hence open in B. See Figure 8.2. §8. The Inverse Function Theorem 67 I g Figure 8.2 It is an interesting fa.ct that the results of Steps 2 and 3 hold without assuming that D f (x) is non-singular, or even that / is di:fferentiable. If A is open in Rn and / : A-+ Rn is continuous and one-to-one, then it is true that /(A) is open in Rn and the inverse function g is continuous. This result is known as the Brouwer theorem on invariance of domain. Its proof requires the tools of algebraic topology and is quite difficult. We have proved the differentiable version of this theorem. Step 4. Given b E B, we show that g is differentiable at b. -r~7)- Let a be the point g(b), and let E = D f (a). We show that the function G(k) = [g(b + k) E-•. k], which is defined _for k in a deleted neighborhood of 0, approaches 0 as k approaches 0. Then g is differentiable at b with derivative E- 1. Let us define d(k) = g(b + k) - g(b) for k near 0. We first show that there is an f > 0 such that ld(k)l/lkl is bounded for O < lkl < f. (This would follow from differentiability of g, but that is what we are trying to prove!) By the preceding lemma, there is a neighborhood C of a and an a > 0 such that lf(xo) - f(x1)I > alxo - x1 I for x0 ,x1 EC. Now /(C) is a neighborhood of b, by Step 2; choose f so that = h+k is in /(C) whenever lkl < f. Then for lkl < f, we can set xo g(b + k) and x 1 = g(b) and rewrite the preceding inequality in the form [(b + k) - bl> alg(b + k) - g(b)I, 68 Differentiation Chapter 2 which implies that 1/a> l~(k)l/lkl, as desired. Now we show that G(k) ---. 0 as k ---. 0. Let 0 < lkl < f. We have = G(k) t.(k) lkf-' •k by definition, = -E-1 . [k - E •~(k)] l~(k)I l~(k)I lkl • (Here we use the fact that ~(k) # 0 for k # 0, which follows from the fact that g is one-to-one.) Now E- 1 is constant, and l~(k)I/ lkl is bounded. It remains to show that the expression in brackets goes to zero. We have = b + k = f (g(b + k)) = f(g(b) + 6.(k)) f (a+ ~(k)). Thus the expression in brackets equals f (a+ ~(k)) - /(a) - E · ~(k) l~(k)I Let k ---. 0. Then ~(k) ---. 0 as well, because g is continuous. 
Since f is differentiable at a with derivative E, this expression goes to zero, as desired. Step 5. Finally, we show the inverse function g is of class Cr. Because g is differentiable, Theorem 7.4 applies to show that its derivative is given by the formula Dg(y) = [D/(g(y))J- 1 , for y E B. The function Dg thus equals the composite of three functions: B-.!....+ A !!.L GL(n) ~ GL(n), where GL(n) is the set of non-singular n by n matrices, and I is the function that maps each non-singular matrix to its inverse. Now the function I is given by a specific formula involving determinants. In fact, the entries of l(C) are rational functions of the entries of C; as such, they are C00 functions of the entries of C. We proceed by induction on r. Suppose f is of class C1 . Then D f is continuous. Because g and I are also continuous (indeed, g is differentiable and I is of class C00 ), the composite function, which equals Dg, is also continuous. Hence g is of class C 1 . Suppose the theorem holds for functions of class cr- 1. Let f be of class er. Then in particular f is of class cr- l, so that (by the induction hypothesis), the inverse function g is of class cr-l _ Furthermore, the function D f is of class cr-l. VVe invoke Corollary 7.2 to conclude that the composite function, which equals Dg, is of class er- 1. Then g is of class er. □ Finally, we prove the inverse function theorem. §8. The Inverse Function Theorem 69 Theorem 8.3 (The inverse function theorem). Let A be open in R"; let f : A --+ R" be of class er. If Df(x) is non-singular at the point a of A, there is a neighoorhood U of the point a such that f carries U in a one-to-one fashion onto an open set V of R" and the inverse function is of class er. Proof. By Lemma 8.1, there is a neighborhood U0 of a on which f is one-to-one. Because det Df(x) is a continuous function ofx, and det Df(a) f; 0, there is a neighborhood U1 of a such that det D f (x) f; 0 on U1 . If U equals the intersection of Uo and U1, then the hypotheses of the preceding theorem are satisfied for / : U --+ R". The theorem follows. D This theorem is the strongest one that can be proved in general. While the non-singularity of D f on A implies that / is locally one-to-one at each point of A, it does not imply that f is one-to-one on all of A. Consider the following example: EXAMPLE 1. Let / : R2 -+ R2 be defined by the equation J (r, 9) = (r cos 9, r sin 9). Then Df(r,9)= [ cos 9 -rsin 9] , sin 9 r cos (J so that det Df(r,9) = r. Let A be the open set (0, 1) x (O,b) in the (r,8) plane. Then DJ is non- singular at each point of A. However, / is one-to-one on A only if b < 2,r. See Figures 8.3 and 8.4. 8 1 ----I--- y Figure 8.3 70 Differentiation () 1 ---f -- Chapter 2 y Figure B.4 EXERCISES 1. Let f : R2 - R2 be defined by the equation J(x,y) = (x2 -'Jt,2xy). (a) Show that f is one-to-one on the set A consisting of all (x, y) with = = x > 0. [Hint: If f(x,y) f(a,b), then 11/(x,y)II 11/(a,b)II-] = (b) What is the set B f(A)? (c) If g is the inverse function, find Dg(O, 1). 2. Let / : R2 - R2 be defined by the equation = f(x,y) (excosy,exsiny). (a) Show that / is one-to-one on the set A consisting of all (x, y) with 0 < y < 2,r. [Hint: See the hint in the preceding exercise.] (b) What is the set B = f (A)? (c) If g is the inverse function, find Dg(O, 1). = 3. Let / : Rn - Rn be given by the equation / (x) llxll2 • x. Show that / is of class C00 and that / carries the unit ball B(O; 1) onto itself in a one-to-one fashion. 
Show, however, that the inverse function is not differentiable at 0. 4. Let g ; R2 - R2 be given by the equation Let / : R2 - R3 be given by the equation f(x, y) = (3x - y2 , 2x + y, xy + y3). §9. The Implicit Function Theorem 71 (a) Show that there is a neighborhood of (0, 1) that g carries in a oneto-one fashion onto a neighborhood of (2, 0). (b) Find D(f o g-1 ) at {2, 0). 5. Let A be open in Rn; let /: A-+ Rn be of class Cr; assume D/(x) is non-singular for x E A. Show that even if/ is not one-to-one on A, the set B = /(A) is open in Rn. *§9. THE IMPLICIT FUNCTION THEOREM The topic of implicit differentiation is one that is probably familiar to you from calculus. Here is a typical problem: "Assume that the equation x3 y + 2e~Y = 0 determines y as a differentiable function of x. Find dy/ dx ." One solves this calculus problem by "looking at y as a function of x," and differentiating with respect to x. One obtains the equation which one solves for dy/dx. The derivative dy/dx is of course expressed in terms of x and the unknown function y. The case of an arbitrary function f is handled similarly. Supposing that the equation f(x, y) = 0 determines y as a differentiable function of x, say y = g(x), the equation J(x,g(x)) = 0 is an identity. One applies the chain rule to calculate of/ox+ (of /oy)g'(x) = 0, so that , 8//ox 9 (x) = - 8 f /8y ' where the partial derivatives are evaluated at the point (x,g(x)). Note that the solution involves a hypothesis not given in the statement of the problem. In order to find g'( x ), it is necessary to assume that Of/8y is non-~ero at the point in question. It in fact turns out that the non-vanishing of 8 J/ 8y is also sufficient to justify the assumptions we made in solving the problem. That is, if the function f(x,y) has the property that 8f/8y "IO at a point (a,b) that is a solution of the equation f(x,y) = 0, then this equation does determine y as a function of x, for x near a, and this function of x is differentiable. 72 Differentiation Chapter 2 This result is a special case of a theorem called the implicit function theorem, which we prove in this section. The general case of the implicit function theorem involves a system of equations rather than a single equation. One seeks to solve this system for some of the unknowns in terms of the others. Specifically, suppose that J : Rk+n -+ Rn is a function of class C 1 . Then the vector equation is equivalent to a system of n scalar equations ink+ n unknowns. One would expect to be able to assign arbitrary values to k of the unknowns and to solve for the remaining unknowns in terms of these. One would also expect that the resulting functions would be differentiable, and that one could by implicit differentiation find their derivatives. There are two separate problems here. The first is the problem of finding the derivatives of these implicitly defined functions, assuming they exist; the solution to this problem generalizes the computation of g'(x) just given. The second involves showing that (under suitable conditions) the implicitly defined functions exist and are differentiable. In order to state our results in a convenient form, we introduce a new notation for the matrix D f and its submatrices: Definition. Let A be open in Rm; let f : A-+ Rn be differentiable. Let f1, ... , fn be the component functions off. We sometimes use the notation for the derivative off. On occasion we shorten this to the notation DJ= 8J/8x. 
More generally, we shall use the notation 8(fi1,•",fi,,) 8(xii,··•,x;,) to denote the k by f matrix that consists of the entries of D f lying in rows ii, ... , i1: and columns Ji, ... ,Jt- The general entry of this matrix, in row p and column q, is the partial derivative 8fip/8x;q• Now we deal with the problem of finding the derivative of an implicitly defined function, assuming it exists and is differentiable. For simplicity, we shall assume that we have solved a system of n equations ink+ n unknowns for the last n unknowns in terms of the first k unknowns. The Implicit Function Theorem 73 Theorem 9.1. Let A be open in Rk+n; let f : A -+ Rn be differentiable. Write fin the form f(x,y), for x E Rk and y E Rn; then DJ has the form l· D f = [8f / 8x 8f / 8y Suppose there is a differentiable function g : B -+ Rn defined on an open set B in Rk, such that f (x,g(x)) =0 for all x EB. Then for x EB, 8f Bx (x,g(x)) + 8f By (x,g(x)) • Dg(x) = 0. This equation implies that if the n by n matrix 8 f / 8y is non-singular at the point (x, g(x)), then Dg(x) = - [8Byf i-l (x, g(x)) • 8Bxf (x, g(x )) . = = Note that in the case n k 1, this is the same formula for the derivative that was derived earlier; the matrices involved are 1 by 1 matrices in that case. Proof. Given g, let us define h : B-+ Rk+n by the equation h(x) = (x, g(x)). The hypotheses of the theorem imply that the composite function JI (X) = f (h(X)) = f (X' g(X)) is defined and equals zero for all x E B. The chain rule then implies that = = o DH(x) D f (h(x)) · Dh(x) :t l· = [:~ (h(x)) (h(x)) [n:(x)] = :: (h(x)) + :~ (h(x)) •Dg(x), as desired. D The preceding theorem tells us that in order to compute Dg, we must assume that the matrix 8 f / 8y is non-singular. Now we prove that the non- singularity of 8 f / 8y suffices to guarantee that the function g exists and is differentiable. 74 Differentiation Chapter 2 Theorem 9.2 (Implicit function theorem). Let A be open in Rk+n; let f : A -+ R" be of class er. Write f in the form f (x, y)' for x E Rk and y E Rn. Suppose that (a, b) is a point of A such that f(a, b) =0 and det 8f By (a, b) # 0. Then there is a neighborhood B of a in Rk and a unique continuous = function g: B-+ Rn such that g(a) b and = f (x,g(x)) 0 for all x E B. The function g is in fact of class er. Proof. We construct a function F to which we can apply the inverse function theorem. Define F : A ____,. Rk+n by the equation F(x,y) = (x,f(x,y)). Then F maps the open set A of Rk+n into Rk x Rn = Rk+n. Furthermore, DF = [&!i&x &f~&J • Computing 0 (by continuity of g and go). Since B is connected, the latter set must be empty. D In our proof of the implicit function theorem, there was of course nothing special about solving for the last n coordinates; that choice was made simply for convenience. The same argument applies to the problem of solving for any n coordinates in terms of the others. 76 Differentiation Chapter 2 For example, suppose A is open in R5 and f : A ----► R2 is a function = of class er. Suppose one wishes to "solve" the equation f (x, y, z, u, v) 0 for the two unknowns y and u in terms of the other three. In this case, the = implicit function theorem tells us that if a is a point of A such that f (a) 0 and 8/ det 8(y,u) (a) -I 0, = then one can solve for y and u locally near that point, say y ( x, z, v) and = u 1/J(x, z, v). Furthermore, the derivatives of and 1/; satisfy the formula 1- l =- 8(, 1P) 8(x, z, v) [ 81 l [ 81 8(y, u) • 8(x, z, v) • EXAMPLE 1. Let /: R2 ---.. R be given by the equation f(x,y) = x2 + y2 - 5. 
= = Then the point (x, y) (1, 2) satisfies the equation /(x, y) 0. Both {)J /8x and {Jf / {)y are non-zero at (1,2), so we can solve this equation locally for either variable in terms of the other. In particular, we can solve for yin terms of x, obtaining the function y = g(x) = [5 - ;z;2]112 . Note that this solution is not unique in a neighborhood of x = 1 unless we specify that g is continuous. For instance, the function for X ~ 1, for z < 1 satisfies the same conditions, but is not continuous. See Figure 9.2. 1 Figure 9.2 §9. The Implicit Function Theorem 77 = EXAMPLE 2. Let/ be the function of Example 1. The point (x, y) (v's, 0) also satisfies the equation / (x, y) = 0. The derivative lJf / IJy vanishes at (-Is, 0), so we do not expect to be able to solve for y in terms of x near this point. And, in fact, there is no neighborhood B of -Is on which we can solve for yin terms of x. See Figure 9.3. (v's, 0) Figure 9.3 EXAMPLE 3. Let / : R2 - R be given by the equation = Then (0,0) is a solution of the equation f(x, y) 0. Because IJJ /IJy vanishes at (0,0), we do not expect to be able to solve this equation for yin terms of x near (0,0). But in fact, we can; and furthermore, the solution is unique! = However, the function we obtain is not differentiable at x 0. See Figure 9.4. Figure 9.4 EXAMPLE 4. Let / : R2 - R be given by the equation = J(x,y) y2 - x 4 • = Then (0,0) is a solution of the equation f(x, y) 0. Because IJ/ /IJy vanishes at {0,0), we do not expect to be able to solve for yin terms of x near (0,0). In 78 Differentiation Chapter 2 fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique. Figure 9.5 = Now the point (1,2) also satisfies the equation f(x, y) 0. Because {)J /{)y is non-zero at (1,2), one can solve this equation for y as a continuous = function of x in a neighborhood of x 1. See Figure 9.5. One can in fact express y as a continuous function of x on a larger neighborhood than the one pictured, but if the neighborhood is large enough that it contains 0, then the solution is not unique on that larger neighborhood. EXERCISES 1. Let / : R3 - R2 be of class C 1 ; write/ in the form f(x, Y1, Y2). Assume that /(3, -1, 2) = 0 and D/(3, -1, 2) = 1 [1 2 -1 (a) Show there is a function g : B - R2 of class C1 defined on an open set B in R such that = I (X' 91 (X), g2 ( X)) 0 for x E B, and g(3) = (-1, 2). (b) Find Dg(3). = (c) Discuss the problem of solving the equation / (x, Y1, Y2) 0 for an arbitrary pair of the unknowns in terms of the third, near the point (3, -1, 2). 2. Given/: R5 - R2, of class C1 . Let a= (1,2,-1,3,0); suppose that /(a)= 0 and 3 1 -1 DJ(a) = [: 01 2 §9. The Implicit Function Theorem 79 (a) Show there is a function g : B - R2 of class C1 defined on an open set B of R3 such that = for x (x1, z2, xa) EB, and g(l, 3, 0) = (2, -1). (b) Find Dg(l, 3, 0). = (c) Discuss the problem of solving the equation / (x) 0 for an arbitrary pair of the unknowns in terms of the others, near the point a. 2- = 3. Let / : R R be of class C1, with /(2, -1) -1. Set G(x,y,u) = f(x,y) +u2, = H(x,y,u) ux+ 3y3 +u3 • The equations G(x, y, u) = 0 and H(x, y, u) = 0 have the solution = (x, y, u) (2, -1, 1). = (a) What conditions on DJ ensure that there are C1 functions z g(y) = and u h(y) defined on an open set in R that satisfy both equations, such that g(-1) = 2 and h(-1) = 1? = (b) Under the conditions of (a), and assuming that D/(2, -1) [1 -3], find g'(-1) and h'(-1). 4. 
Let F : R2 - R be of class C2 I with F(O, 0) = 0 and DF(0, 0) = (2 3). Let G : R3 - R be defined by the equation = G(x,y,z) F(x+2y+3z -I,x3 +y2- z2 ). = = (a) Note that G(-2, 3, -1) F(0, 0) O. Show that one can solve = = the equation G(x, y, z) 0 for z, say z g(x, y), for (z, y) in a = neighborhood B of (-2, 3), such that g(-2, 3) -1. (b) Find Dg(-2, 3). = = = *(c) If D1D1F 3 and D1D2F -1 and D2D2F 5 at (0,0), find D2D1g(-2, 3). 5. Let /, g : R3 - R be functions of class C1 . "In general," one expects = = that each of the equations /(x, y, z) 0 and g(x, y, z) 0 represents a smooth surface in R3, and that their intersection is a smooth curve. Show that if (x0 , y0 , zo) satisfies both equations, and if IJ(f, g)/8(x, y, z) has rank 2 at (xo, Yo, zo), then near (xo, Yo, zo), one can solve these equations for two of x, y, z in terms of the third, thus representing the solution set locally as a parametrized curve. 6. Let/: Rk+n - R" be of class C 1 ; suppose that /(a)= 0 and that D/(a) has rank n. Show that if c is a point of R" sufficiently close to O, then = the equation / (x) c has a solution. Integration In this chapter, we define the integral of real-valued function of several real variables, and derive its properties. The integral we study is called Riemann integral; it is a direct generalization of the integral usually studied in a first course in single-variable analysis. §10. THE INTEGRAL OVER A RECTANGLE We begin by defining the volume of a rectangle. Let be a rectangle in R". Each of the intervals [ai, bi] is called a component interval of Q. The maximum of the numbers b1 - a1, ... , bn - an is called the width of Q. Their product is called the volwne of Q. = In the case n l, the volume and the width of the (I-dimensional) rectangle [a,bJ are the same, namely, the number b- a. This number is also called the length of [a,b]. Definition. Given a closed interval [a, b] of R, a partition of [a, bJ is a finite collection P of points of [a, b] that includes the points a and b. We 81 82 Integration Chapter 3 usually index the elements of P in increasing order, for notational convenience, as a = to < ti < · · · < t" = b; each of the intervals [ti-l, ti], for i = 1, ... , k, is called a subinterval determined by P, of the interval [a, b]. More generally, given a rectangle in Rn, a partition P of Q is an n-tuple (P1, ... , Pn) such that P; is a partition of [a;, b;] for each j. If for each j, I; is one of the subintervals determined by P; of the interval [a;, b;], then the rectangle is called a subrectangle determined by P, of the rectangle Q. The maxi- mum width of these subrectangles is called the mesh of P. Definition. Let Q be a rectangle in Rn; let f : Q -+ R; assume / is bounded. Let P be a partition of Q. For each subrectangle R determined by P, let mn(/) = inf{/(x) Ix E R}, Mn(/) = sup{/(x) Ix E R}. We define the lower sum and the upper sum, respectively, of /, determined by P, by the equations L L(f, P) = mn(f) •v(R), n L U(f,P) = Mn(/)• v(R), n where the summations extend over all subrectangles R determined by P. Let P = (P1, ... , Pn) be a partition of the rectangle Q. If P" is a partition of Q obtained from P by adjoining additional points to some or all of the partitions P1, ... , Pn, then P" is called a refinement of P. Given two partitions P and P' = (P{, ... , P~) of Q, the partition is a refinement of both P and P'; it is called their common refinement. 
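These definitions are easy to experiment with numerically when the infimum and supremum over each subrectangle are easy to locate. The sketch below is only an illustration, not part of the original text; the helper names and the sample function are my own, and it assumes f is increasing in each variable, so that m_R(f) and M_R(f) occur at the lower-left and upper-right corners of each subrectangle R.

```python
import itertools
import numpy as np

def lower_upper_sums(f, partitions):
    """Lower and upper sums L(f,P), U(f,P) over the rectangle determined by
    `partitions` (a list of 1-d partitions), for an f that is increasing in
    each variable, so inf/sup on each subrectangle occur at its corners."""
    L = U = 0.0
    # Each subrectangle is a product of consecutive subintervals.
    for idx in itertools.product(*(range(len(p) - 1) for p in partitions)):
        lo = [partitions[j][i] for j, i in enumerate(idx)]
        hi = [partitions[j][i + 1] for j, i in enumerate(idx)]
        vol = np.prod([h - l for l, h in zip(lo, hi)])
        L += f(lo) * vol     # m_R(f) * v(R)
        U += f(hi) * vol     # M_R(f) * v(R)
    return L, U

# f(x, y) = x + y on Q = [0,1] x [0,2], with four subintervals in each factor.
f = lambda p: p[0] + p[1]
P = [np.linspace(0, 1, 5), np.linspace(0, 2, 5)]
print(lower_upper_sums(f, P))   # (2.25, 3.75), bracketing the integral 3
```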
Passing from P to a refinement of P of course affects lower sums and upper sums; in fact, it tends to increase the lower sums and decrease the upper sums. That is the substance of the following lemma: §10. The Integral Over a Rectangle 83 Lemma 10.1. Let P be a partition of the rectangle Q; let f ; Q -+ R be a bounded function. If P" is a refinement of P, then L(f, P) < L(f, P") and U(f, P") < U(f, P). Proof Let Q be the rectangle = Q [a1,b1] X • • • X [an,bn]• It suffices to prove the lemma when P" is obtained by adjoining a single additional point to the partition of one of the component intervals of Q. Suppose, to be definite, that P is the partition (P 1, ... , Pn) and that P" is obtained by adjoining the point q to the partition P1. Further, suppose that P1 consists of the points and that q lies interior to the subinterval [ti-i, ti]We first compare the lower sums L(f, P) and L(f, P"). Most of the subrectangles determined by Pare also subrectangles determined by P". An exception occurs for a subrectangle determined by P of the form Rs =[ti-1, ti] x S (where S is one of the subrectangles of [a2 , b2] x ••• x [an, bn] determined by (P2, ... , Pn)), The term involving the subrectangle Rs disappears from the lower sum and is replaced by the terms involving the two subrectangles which are determined by P". See Figure 10.1. s q Figure 10.1 84 Integration Chapter 3 Now since mn5 (f) < f(x) for each x E R~ and for each x E Ri, it follows that = Because v(Rs) v(R's) + v(R~) by direct computation, we have Since this inequality holds for each subrectangle of the form Rs, it follows that L(f, P) < L(f, P"), as desired. A similar argument applies to show that U(f,P) > U(f,P"). □ Now we explore the relation between upper sums and lower sums. We have the following result: Lemma 10.2. Let Q be a rectangle; let f : Q ---+ R be a bounded function. If P and P' are any two partitions of Q, then L(f, P) < U(f, P'). Proof. In the case where P = P', the result is obvious: For any sub- rectangle R determined by P, we have mn(f) < Mn(f). Multiplying by v(R) and summing gives the desired inequality. In general, given partitions P and P' of Q, let P" be their common refinement. Using the preceding lemma, we conclude that L(f, P) < L(f, P") < U(f, P") < U(f, P'). □ Now (finally) we define the integral. Definition. Let Q be a rectangle; let f: Q -+ R be a bounded function. As P ranges over all partitions of Q, define = f J sup {L(f, P)} h p and }fqf = inf {U(f,P)}. p §10. The Integral Over a Rectangle 85 These numbers are called the lower integral and upper integral, respec- tively, of / over Q. They exist because the numbers L(f, P) are bounded above by U(f, P') where P' is any fixed partition of Q; and the numbers U(f, P) are bounded below by L(f, P'). If the upper and lower integrals off over Q are equal, we say / is integrable over Q, and we define the inte- gral of/ over Q to equal the common value of the upper and lower integrals. We denote the integral of/ over Q by either of the symbols I. or /(x). xeQ EXAMPLE 1. Let / : [a, bJ -+ R be a non-negative bounded function. If P = is a partition of I [a, b), then L(f, P) equals the total area of a bunch of rectangles inscribed in the region between the graph of/ and the x-axis, and U(f, P) equals the total area of a bunch of rectangles circumscribed a.bout this region. See Figure 10.2. 
L(f,P) U(f,p) a b a b Figure 10.2 The lower integral represents the so-called "inner area" of this region, computed by approximating the region by inscribed rectangles, while the upper integral represents the so-called "outer area," computed by approximating the region by circumscribed rectangles. If the "inner" and "outer" areas are equal, then / is integrable. Similarly, if Q is a. rectangle in R2 and / : Q -+ R is non-negative and bounded, one can picture L(f, P) as the total volume of a bunch of boxes inscribed in the region between the graph of/ and the zy-plane, and U(/, P) 86 Integration Chapter 3 as the total volume of a bunch of boxes circumscribed about this region. See Figure 10.3. Figure 10.3 = = EXAMPLE 2. Let I [D, 1]. Let / : I - R he defined by setting f (x) 0 if = xis rational, and f (x) 1 if x is irrational. We show that / is not integrable over/. Let P be a partition of I. If R is any subinterval determined by P, then mR(/) = 0 and MR(!) = 1, since R contains both rational and irrational numbers. Then L(f, P) = L O• v(R) = D, R and U(f, P) = L 1 • v(R) = I. R Since P is arbitrary, it follows that the lower integral of / over I equals 0, and the upper integral equals 1. Thus/ is not integrable over I. A condition that is often useful for showing that a given function is integrable is the following: Theorem 10.3 (The Rien1ann condition). Let Q be a rectangle; let f : Q - R be a bounded function. Then equality holds if and only if given £ > 0, there exists a corresponding partition P of Q Jor which U(f, P) - L(f, P) < L §10. The Integral Over a Rectangle 87 Proof. Let P' be a fixed partition of Q. It follows from the fact that L(f, P) < U(f, P') for every partition P of Q, that 1f < U(f, P'). Now we use the fact that P' is arbitrary to conclude that Suppose now that the upper and lower integrals are equal. Choose a partition P so that L(f, P) is within £/2 of the integral Jq f, and a partition P' so that U(f, P') is within f./2 of the integral Jq /. Let P" be their common refinement. Since k L(f' P) < L(f, P") < f < u(I, P") < u(f, P'), the lower and upper sums for f determined by P" are within f. of each other. Conversely, suppose the upper and lower integrals are not equal. Let Let P be any partition of Q. Then hence the upper and lower sums for f determined by P are at least f. apart. Thus the Riemann condition does not hold. □ Here is an easy application of this theorem. = Theorem 10.4. Every constant function f(x) c is integrable. Indeed, if Q is a rectangle and if P is a partition of Q, then where the summation extends over all subrectangles determined by P.