
SPACETIME AND GEOMETRY
An Introduction to General Relativity
Sean Carroll
University of Chicago
Addison Wesley
San Francisco Boston New York Capetown Hong Kong London Madrid Mexico City Montreal Munich Paris Singapore Sydney Tokyo Toronto

Acquisitions Editor: Adam Black Project Editor: Nancy Benton Text Designer: Leslie Galen Cover Designer: Blakeley Kim Marketing Manager: Christy Lawrence Manufacturing Coordinator: Vivian McDougal Project Coordination and Electronic Page Makeup: Integre Technical Publishing Co., Inc.

Copyright © 2004 Pearson Education, Inc., publishing as Addison Wesley, 1301 Sansome St., San Francisco, CA 94111. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 1900 E. Lake Ave., Glenview, IL 60025.

Carroll, Sean
ISBN 0-8053-8732-3 (hardcover)

2 3 4 5 6 7 8 9 10 -MAL- 06 05 04 03 www.aw.com/aw

"For if each Star is little more a mathematical Point, located upon the Hemisphere of Heaven by Right Ascension and Declination, then all the Stars, taken together, tho' innumerable, must like any other set of points, in turn represent some single gigantick Equation, to the mind of God as straightforward as, say, the Equation of a Sphere,-to us unreadable, incalculable. A lonely, uncompensated, perhaps even impossible Task,-yet some of us must ever be seeking, I suppose."
-Thomas Pynchon, Mason & Dixon

Preface
General relativity is the most beautiful physical theory ever invented. It describes one of the most pervasive features of the world we experience-gravitation-in terms of an elegant mathematical structure-the differential geometry of curved spacetime-leading to unambiguous predictions that have received spectacular experimental confirmation. Consequences of general relativity, from the big bang to black holes, often get young people first interested in physics, and it is an unalloyed joy to finally reach the point in one's studies where these phenomena may be understood at a rigorous quantitative level. If you are contemplating reading this book, that point is here.
In recent decades, general relativity (GR) has become an integral and indispensable part of modern physics. For a long time after it was proposed by Einstein in 1916, GR was counted as a shining achievement that lay somewhat outside the mainstream of interesting research. Increasingly, however, contemporary students in a variety of specialties are finding it necessary to study Einstein's theory. In addition to being an active research area in its own right, GR is part of the standard syllabus for anyone interested in astrophysics, cosmology, string theory, and even particle physics. This is not to slight the more pragmatic uses of GR, including the workings of the Global Positioning System (GPS) satellite network.
There is no shortage of books on GR, and many of them are excellent. Indeed, approximately thirty years ago witnessed the appearance of no fewer than three books in the subject, each of which has become a classic in its own right: those by Weinberg (1972), Misner, Thorne, and Wheeler (1973), and Hawking and Ellis (1975). Each of these books is suffused with a strongly-held point of view advocated by the authors. This has led to a love-hate relationship between these works and their readers; in each case, it takes little effort to find students who will declare them to be the best textbook ever written, or other students who find them completely unpalatable. For the individuals in question, these judgments may very well be correct; there are many different ways to approach this subject.
The present book has a single purpose: to provide a clear introduction to general relativity, suitable for graduate students or advanced undergraduates. I have attempted to include enough material so that almost any one-semester introductory course on GR can find the appropriate subjects covered in the text, but not too much more than that. In particular, I have tried to resist the temptation to write a comprehensive reference book. The only goal of this book is to teach you GR.
An intentional effort has been made to prefer the conventional over the idiosyncratic. If I can be accused of any particular ideological bias, it would be a
tendency to think of general relativity as a field theory, a point of view that helps one to appreciate the connections among GR, particle physics, and string theory. At the same time, there are a number of exciting astrophysical applications of GR (black holes, gravitational lensing, the production and detection of gravitational waves, the early universe, the late universe, the cosmological constant), and I have endeavored to include at least enough background discussion of these issues to prepare students to tackle the current literature.
The primary question facing any introductory treatment of general relativity is the level of mathematical rigor at which to operate. There is no uniquely proper solution, as different students will respond with different levels of understanding and enthusiasm to different approaches. Recognizing this, I have tried to provide something for everyone. I have not shied away from detailed formalism, but have also attempted to include concrete examples and informal discussion of the concepts under consideration. Much of the most mathematical material has been relegated to the Appendices. Some of the material in the Appendices is actually an integral part of the course (for example, the discussion of conformal diagrams), but an individual reader or instructor can decide just when it is appropriate to delve into them; signposts are included in the body of the text.
Surprisingly, there are very few formal prerequisites for learning general relativity; most of the material is developed as we go along. Certainly no prior exposure to Riemannian geometry is assumed, nor would it necessarily be helpful. It would be nice to have already studied some special relativity; although a discussion is included in Chapter 1, its purpose is more to review the basics and introduce some notation, rather than to provide a self-contained introduction. Beyond that, some exposure to electromagnetism, Lagrangian mechanics, and linear algebra might be useful, but the essentials are included here.
The structure of the book should be clear. The first chapter is a review of special relativity and basic tensor algebra, including a brief discussion of classical field theory. The next two chapters introduce manifolds and curvature in some detail; some motivational physics is included, but building a mathematical framework is the primary goal. General relativity proper is introduced in Chapter 4, along with some discussion of alternative theories. The next four chapters discuss the three major applications of GR: black holes (two chapters), perturbation theory and gravitational waves, and cosmology. Each of these subjects has witnessed an explosion of research in recent years, so the discussions here will be necessarily introductory, but I have tried to emphasize issues of relevance to current work. These three applications can be covered in any order, although there are interdependencies highlighted in the text. Discussions of experimental tests are sprinkled through these chapters. Chapter 9 is a brief introduction to quantum field theory in curved spacetime; this is not a necessary part of a first look at GR, but has become increasingly important to work in quantum gravity and cosmology, and therefore deserves some mention. On the other hand, a few topics are scandalously neglected; the initial-value problem and cosmological perturbation theory come to mind, but there are others. Fortunately there is no shortage of other resources. The Appendices serve various purposes: There are discussions of technical points that were avoided in the body of the book, crucial concepts that could have been put in various places, and extra topics that are useful but outside the main development.
Since the goal of the book is pedagogy rather than originality, I have often leaned heavily on other books (listed in the bibliography) when their expositions seemed perfectly sensible to me. When this leaning was especially heavy, I have indicated it in the text itself. It will be clear that a primary resource was the book by Wald (1984), which has become a standard reference in the field; readers of this book will hopefully be well-prepared to jump into the more advanced sections of Wald's book.
This book grew out of a set of lecture notes that were prepared when I taught a course on GR at MIT. These notes are available on the web for free, and will continue to be so; they will be linked to the website listed below. Perhaps a little over half of the material here is contained in the notes, although the advantages of owning the book (several copies, even) should go without saying.
Countless people have contributed greatly both to my own understanding of general relativity and to this book in particular-too many to acknowledge with any hope of completeness. Some people, however, deserve special mention. Ted Pyne learned the subject along with me, taught me a great deal, and collaborated with me the first time we taught a GR course, as a seminar in the astronomy department at Harvard; parts of this book are based on our mutual notes. Nick Warner taught the course at MIT from which I first learned GR, and his lectures were certainly a very heavy influence on what appears here. Neil Cornish was kind enough to provide a wealth of exercises, many of which have been included at the end of each chapter. And among the many people who have read parts of the manuscript and offered suggestions, Sanaz Arkani-Hamed was kind enough to go through the entire thing in great detail.
I would also like to thank everyone who either commented in person or by email on different parts of the book; these include Tigran Aivazian, Teodora Beloreshka, Ed Bertschinger, Patrick Brady, Peter Brown, Jennifer Chen, Michele Ferraz Figueiró, Éanna Flanagan, Jacques Fric, Ygor Geurts, Marco Godina, Monica Guica, Jim Hartle, Tamas Hauer, Daniel Holz, Ted Jacobson, Akash Kansagra, Chuck Keeton, Arthur Kosowsky, Eugene Lim, Jorma Louko, Robert A. McNees, Hayri Mutluay, Simon Ross, Itai Seggev, Robert Wald, and Barton Zwiebach. Apologies are due to anyone I may have neglected to mention. And along the way I was fortunate to be the recipient of wisdom and perspective from numerous people, including Shadi Bartsch, George Field, Deryn Fogg, Ilana Harms, Gretchen Helfrich, Mari Ruti, Maria Spiropulu, Mark Trodden, and of course my family. (This wisdom often came in the form, "What were you thinking?") Finally, I would like to thank the students in my GR classes, on whom the strategies deployed here were first tested, and express my gratitude to my students and collaborators, for excusing my book-related absences when I should have been doing research.
My friends who have written textbooks themselves tell me that the first printing of a book will sometimes contain mistakes. In the unlikely event that this happens here, there will be a list of errata kept at the website for the book:
http://spacetimeandgeometry.net/
The website will also contain other relevant links of interest to readers. During the time I was working on this book, I was supported by the National Science Foundation, the Department of Energy, the Alfred P. Sloan Foundation, and the David and Lucile Packard Foundation.
Sean Carroll Chicago, Illinois June 2003

Contents

1 ■ Special Relativity and Flat Spacetime
1.1 Prelude
1.2 Space and Time, Separately and Together
1.3 Lorentz Transformations
1.4 Vectors
1.5 Dual Vectors (One-Forms)
1.6 Tensors
1.7 Manipulating Tensors
1.8 Maxwell's Equations
1.9 Energy and Momentum
1.10 Classical Field Theory
1.11 Exercises

2 ■ Manifolds
2.1 Gravity as Geometry
2.2 What Is a Manifold?
2.3 Vectors Again
2.4 Tensors Again
2.5 The Metric
2.6 An Expanding Universe
2.7 Causality
2.8 Tensor Densities
2.9 Differential Forms
2.10 Integration
2.11 Exercises

3 ■ Curvature
3.1 Overview
3.2 Covariant Derivatives
3.3 Parallel Transport and Geodesics
3.4 Properties of Geodesics
3.5 The Expanding Universe Revisited
3.6 The Riemann Curvature Tensor
3.7 Properties of the Riemann Tensor
3.8 Symmetries and Killing Vectors
3.9 Maximally Symmetric Spaces
3.10 Geodesic Deviation
3.11 Exercises

4 ■ Gravitation
4.1 Physics in Curved Spacetime
4.2 Einstein's Equation
4.3 Lagrangian Formulation
4.4 Properties of Einstein's Equation
4.5 The Cosmological Constant
4.6 Energy Conditions
4.7 The Equivalence Principle Revisited
4.8 Alternative Theories
4.9 Exercises

5 ■ The Schwarzschild Solution
5.1 The Schwarzschild Metric
5.2 Birkhoff's Theorem
5.3 Singularities
5.4 Geodesics of Schwarzschild
5.5 Experimental Tests
5.6 Schwarzschild Black Holes
5.7 The Maximally Extended Schwarzschild Solution
5.8 Stars and Black Holes
5.9 Exercises

6 ■ More General Black Holes
6.1 The Black Hole Zoo
6.2 Event Horizons
6.3 Killing Horizons
6.4 Mass, Charge, and Spin
6.5 Charged (Reissner-Nordström) Black Holes
6.6 Rotating (Kerr) Black Holes
6.7 The Penrose Process and Black-Hole Thermodynamics
6.8 Exercises

7 ■ Perturbation Theory and Gravitational Radiation
7.1 Linearized Gravity and Gauge Transformations
7.2 Degrees of Freedom
7.3 Newtonian Fields and Photon Trajectories
7.4 Gravitational Wave Solutions
7.5 Production of Gravitational Waves
7.6 Energy Loss Due to Gravitational Radiation
7.7 Detection of Gravitational Waves
7.8 Exercises

8 ■ Cosmology
8.1 Maximally Symmetric Universes
8.2 Robertson-Walker Metrics
8.3 The Friedmann Equation
8.4 Evolution of the Scale Factor
8.5 Redshifts and Distances
8.6 Gravitational Lensing
8.7 Our Universe
8.8 Inflation
8.9 Exercises

9 ■ Quantum Field Theory in Curved Spacetime
9.1 Introduction
9.2 Quantum Mechanics
9.3 Quantum Field Theory in Flat Spacetime
9.4 Quantum Field Theory in Curved Spacetime
9.5 The Unruh Effect
9.6 The Hawking Effect and Black Hole Evaporation

APPENDIXES
A ■ Maps between Manifolds
B ■ Diffeomorphisms and Lie Derivatives
C ■ Submanifolds
D ■ Hypersurfaces
E ■ Stokes's Theorem
F ■ Geodesic Congruences
G ■ Conformal Transformations
H ■ Conformal Diagrams
I ■ The Parallel Propagator
J ■ Noncoordinate Bases

Bibliography

Index

CHAPTER 1

Special Relativity and Flat Spacetime

1.1 ■ PRELUDE

General relativity (GR) is Einstein's theory of space, time, and gravitation. At heart it is a very simple subject (compared, for example, to anything involving quantum mechanics). The essential idea is perfectly straightforward: while most forces of nature are represented by fields defined on spacetime (such as the electromagnetic field, or the short-range fields characteristic of subnuclear forces), gravity is inherent in spacetime itself. In particular, what we experience as "gravity" is a manifestation of the curvature of spacetime.
Our task, then, is clear. We need to understand spacetime, we need to understand curvature, and we need to understand how curvature becomes gravity. Roughly, the first two chapters of this book are devoted to an exploration of spacetime, the third is about curvature, and the fourth explains the relationship between curvature and gravity, before we get into applications of the theory. However, let's indulge ourselves with a short preview of what is to come, which will perhaps motivate the initial steps of our journey.
GR is a theory of gravity, so we can begin by remembering our previous theory of gravity, that of Newton. There are two basic elements: an equation for the gravitational field as influenced by matter, and an equation for the response of matter to this field. The conventional Newtonian statement of these rules is in terms of forces between particles; the force between two objects of masses $M$ and $m$ separated by a vector $\mathbf{r} = r\,\mathbf{e}_{(r)}$ is the famous inverse-square law,

$$\mathbf{F} = \frac{GMm}{r^2}\,\mathbf{e}_{(r)}, \tag{1.1}$$

and this force acts on a particle of mass m to give it an acceleration according to Newton's second law,

$$\mathbf{F} = m\mathbf{a}. \tag{1.2}$$

Equivalently, we could use the language of the gravitational potential $\Phi$; the potential is related to the mass density $\rho$ by Poisson's equation,

$$\nabla^2\Phi = 4\pi G\rho, \tag{1.3}$$

and the acceleration is given by the gradient of the potential,

$$\mathbf{a} = -\nabla\Phi. \tag{1.4}$$


Either (1.1) and (1.2), or (1.3) and (1.4), serve to define Newtonian gravity. To define GR, we need to replace each of them by statements about the curvature of spacetime.
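As a rough numerical illustration of how the two formulations relate, the sketch below differentiates the potential of a point mass and compares the result with the inverse-square law. It assumes the standard sign convention in which the point-mass potential is $\Phi = -GM/r$ and the acceleration is minus the gradient of $\Phi$; the numerical values of `G`, `M`, and `r`, and the function names, are illustrative choices rather than anything taken from the text.

```python
import math

G = 6.674e-11       # Newton's constant in m^3 kg^-1 s^-2 (approximate value)
M = 5.972e24        # illustrative source mass in kg (roughly Earth's mass)

def potential(x, y, z):
    """Point-mass potential Phi = -G M / r (assumed sign convention)."""
    return -G * M / math.sqrt(x * x + y * y + z * z)

def acceleration(x, y, z, h=10.0):
    """Approximate a = -grad(Phi) with central finite differences of step h."""
    ax = -(potential(x + h, y, z) - potential(x - h, y, z)) / (2 * h)
    ay = -(potential(x, y + h, z) - potential(x, y - h, z)) / (2 * h)
    az = -(potential(x, y, z + h) - potential(x, y, z - h)) / (2 * h)
    return ax, ay, az

r = 6.371e6  # illustrative distance from the mass, in meters
ax, ay, az = acceleration(r, 0.0, 0.0)
magnitude = math.sqrt(ax * ax + ay * ay + az * az)
inverse_square = G * M / r**2

print(magnitude, inverse_square)  # the two values should agree closely
print(ax < 0)                     # the acceleration points back toward the mass
```

The finite-difference gradient is only an approximation, so the agreement is to several significant figures rather than exact.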
The hard part is the equation governing the response of spacetime curvature to the presence of matter and energy. We will eventually find what we want in the form of Einstein's equation,

$$R_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu}. \tag{1.5}$$

This looks more forbidding than it should, largely because of those Greek subscripts. In fact this is simply an equation between 4 x 4 matrices, and the subscripts label elements of each matrix. The expression on the left-hand side is a measure of the curvature of spacetime, while the right-hand side measures the energy and momentum of matter, so this equation relates energy to curvature, as promised. But we will defer until later a detailed understanding of the inner workings of Einstein's equation.
The response of matter to spacetime curvature is somewhat easier to grasp: Free particles move along paths of "shortest possible distance," or geodesics. In other words, particles try their best to move on straight lines, but in a curved spacetime there might not be any straight lines (in the sense we are familiar with from Euclidean geometry), so they do the next best thing. Their parameterized paths $x^\mu(\lambda)$ obey the geodesic equation:

$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\rho\sigma}\,\frac{dx^\rho}{d\lambda}\,\frac{dx^\sigma}{d\lambda} = 0. \tag{1.6}$$

At this point you aren't expected to understand (1.6) any more than (1.5); but soon enough it will all make sense.
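For readers who like to see an equation run before it is explained, here is a minimal sketch that integrates the geodesic equation numerically in the simplest possible setting: the flat two-dimensional plane described in polar coordinates $(r, \theta)$. The connection coefficients used, $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$, are the standard ones for this case and are quoted here as an assumption, since the text has not yet derived them. The fourth-order Runge-Kutta integrator and all function names are illustrative choices.

```python
import math

def geodesic_rhs(state):
    """Geodesic equation for the flat plane in polar coordinates.

    state = (r, theta, dr/dlambda, dtheta/dlambda).
    Assumed nonzero connection coefficients:
    Gamma^r_{theta theta} = -r, Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.
    """
    r, theta, dr, dtheta = state
    d2r = r * dtheta * dtheta           # -Gamma^r_{theta theta} (dtheta)^2
    d2theta = -2.0 * dr * dtheta / r    # -2 Gamma^theta_{r theta} dr dtheta
    return (dr, dtheta, d2r, d2theta)

def shifted(state, slope, factor):
    """Return state + factor * slope, component by component."""
    return tuple(s + factor * k for s, k in zip(state, slope))

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(state, h, steps):
    """Return the list of states visited, including the initial one."""
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, h)
        path.append(state)
    return path

# Start at (x, y) = (1, 0), moving in the +y direction with unit speed.
# In polar coordinates: r = 1, theta = 0, dr/dlambda = 0, dtheta/dlambda = 1.
path = integrate((1.0, 0.0, 0.0, 1.0), h=0.001, steps=1000)
xs = [r * math.cos(theta) for r, theta, _, _ in path]
ys = [r * math.sin(theta) for r, theta, _, _ in path]
max_x_drift = max(abs(x - 1.0) for x in xs)

print(max_x_drift)  # should be tiny: the path stays on the line x = 1
print(ys[-1])       # should be close to 1, the parameter distance traveled
```

Although $r$ and $\theta$ both change along the path, converting back to Cartesian coordinates shows an ordinary straight line, which is what a geodesic of flat space should be.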
As we will discuss later, the universal nature of geodesic motion is an extremely profound feature of GR. This universality is the origin of our claim that gravity is not actually a "force," but a feature of spacetime. A charged particle in an electric field feels an acceleration, which deflects it from straight-line motion; in contrast, a particle in a gravitational field moves along a path that is the closest thing there is to a straight line. Such particles do not feel acceleration; they are freely falling. Once we become more familiar with the spirit of GR, it will make perfect sense to think of a ball flying through the air as being more truly "unaccelerated" than one sitting on a table; the one sitting on a table is being deflected away from the geodesic it would like to be on (which is why we feel a force on our feet as we stand on Earth).
The basic concept underlying our description of spacetime curvature will be that of the metric tensor, typically denoted by $g_{\mu\nu}$. The metric encodes the geometry of a space by expressing deviations from Pythagoras's theorem, $(\Delta l)^2 = (\Delta x)^2 + (\Delta y)^2$ (where $\Delta l$ is the distance between two points defined on a Cartesian grid with coordinate separations $\Delta x$ and $\Delta y$). This familiar formula is valid only in conventional Euclidean geometry, where it is implicitly assumed that space is flat. In the presence of curvature our deeply ingrained notions of geometry will begin to fail, and we can characterize the amount of curvature by keeping track of how Pythagoras's relation is altered. This information is contained in the metric tensor. From the metric we will derive the Riemann curvature tensor, used to define Einstein's equation, and also the geodesic equation. Setting up this mathematical apparatus is the subject of the next several chapters.
Despite the need to introduce a certain amount of formalism to discuss curvature in a quantitative way, the essential notion of GR ("gravity is the curvature of spacetime") is quite simple. So why does GR have, at least in some benighted circles, a reputation for difficulty or even abstruseness? Because the elegant truths of Einstein's theory are obscured by the accumulation of certain pre-relativity notions which, although very useful, must first be discarded in order to appreciate the world according to GR. Specifically, we live in a world in which spacetime curvature is very small, and particles are for the most part moving quite slowly compared to the speed of light. Consequently, the mechanics of Galileo and Newton comes very naturally to us, even though it is only an approximation to the deeper story.
So we will set about learning the deeper story by gradually stripping away the layers of useful but misleading Newtonian intuition. The first step, which is the subject of this chapter, will be to explore special relativity (SR), the theory of spacetime in the absence of gravity (curvature). Hopefully this is mostly review, as it will proceed somewhat rapidly. The point will be both to recall what SR is all about, and to introduce tensors and related concepts that will be crucial later on, without the extra complications of curvature on top of everything else. Therefore, for this chapter we will always be working in flat spacetime, and furthermore we will only use inertial (Cartesian-like) coordinates. Needless to say it is possible to do SR in any coordinate system you like, but it turns out that introducing the necessary tools for doing so would take us halfway to curved spaces anyway, so we will put that off for a while.

1.2 ■ SPACE AND TIME, SEPARATELY AND TOGETHER
A purely cold-blooded approach to GR would reverse the order of Chapter 2 (Manifolds) and Chapter 1 (Special Relativity and Flat Spacetime). A manifold is the kind of mathematical structure used to describe spacetime, while special relativity is a model that invokes a particular kind of spacetime (one with no curvature, and hence no gravity). However, if you are reading this book you presumably have at least some familiarity with special relativity (SR), while you may not know anything about manifolds. So our first step will be to explore the relatively familiar territory of SR, taking advantage of this opportunity to introduce concepts and notation that will be crucial to later developments.
Special relativity is a theory of the structure of spacetime, the background on which particles and fields evolve. SR serves as a replacement for Newtonian mechanics, which also is a theory of the structure of spacetime. In either case, we can distinguish between this basic structure and the various dynamical laws governing specific systems: Newtonian gravity is an example of a dynamical system set within the context of Newtonian mechanics, while Maxwell's electromagnetism is a dynamical system operating within the context of special relativity.

FIGURE 1.1 In Newtonian spacetime there is an absolute slicing into distinct copies of space at different moments in time. Particle worldlines are constrained to move forward in time, but can travel through space at any velocity; there is universal agreement on the question of whether two events at different points in space occur at the same moment of time.

Spacetime is a four-dimensional set, with elements labeled by three dimensions of space and one of time. (We'll do a more rigorous job with the definitions in the next chapter.) An individual point in spacetime is called an event. The path of a particle is a curve through spacetime, a parameterized one-dimensional set of events, called the worldline. Such a description applies equally to SR and Newtonian mechanics. In either case, it seems clear that "time" is treated somewhat differently than "space"; in particular, particles always travel forward in time, whereas they are free to move back and forth in space.
There is an important difference, however, between the set of allowed paths that particles can take in SR and those in Newton's theory. In Newtonian mechanics, there is a basic division of spacetime into well-defined slices of "all of space at a fixed moment in time." The notion of simultaneity, when two events occur at the same time, is unambiguously defined. Trajectories of particles will move ever forward in time, but are otherwise unconstrained; in particular, there is no limit on the relative velocity of two such particles.
In SR the situation is dramatically altered: in particular, there is no well-defined notion of two separated events occurring "at the same time." That is not to say that spacetime is completely structureless. Rather, at any event we can define a light cone, which is the locus of paths through spacetime that could conceivably be taken by light rays passing through this event. The absolute division, in Newtonian mechanics, of spacetime into unique slices of space parameterized by time, is replaced by a rule that says that physical particles cannot travel faster than light, and consequently move along paths that always remain inside these light cones.

FIGURE 1.2 In special relativity there is no absolute notion of "all of space at one moment in time." Instead, there is a rule that particles always travel at less than or equal to the speed of light. We can therefore define light cones at every event, which locally describe the set of allowed trajectories. For two events that are outside each other's light cones, there is no universal notion of which event occurred earlier in time.

The absence of a preferred time-slicing in SR is at the heart of why the notion of spacetime is more fundamental in this context than in Newtonian mechanics. Of course we can choose specific coordinate systems in spacetime, and once we do, it makes sense to speak of separated events occurring at the same value of the time coordinate in this particular system; but there will also be other possible coordinates, related to the first by "rotating" space and time into each other. This phenomenon is a natural generalization of rotations in Euclidean geometry, to which we now turn.
Consider a garden-variety two-dimensional plane. It is typically convenient to label the points on such a plane by introducing coordinates, for example by defining orthogonal x and y axes and projecting each point onto these axes in the usual way. However, it is clear that most of the interesting geometrical facts about the plane are independent of our choice of coordinates; there aren't any preferred directions. As a simple example, we can consider the distance between two points, given by
$$(\Delta s)^2 = (\Delta x)^2 + (\Delta y)^2. \tag{1.7}$$
In a different Cartesian coordinate system, defined by x' and y' axes that are rotated with respect to the originals, the formula for the distance is unaltered:
$$(\Delta s)^2 = (\Delta x')^2 + (\Delta y')^2. \tag{1.8}$$
We therefore say that the distance is invariant under such changes of coordinates.
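The invariance is easy to check numerically. The sketch below rotates the coordinate axes by an arbitrary angle and confirms that, although the coordinates of each point change, the squared distance between the two points does not. The specific points, the angle, and the function names are illustrative choices.

```python
import math

def rotate_axes(x, y, angle):
    """Coordinates of the point (x, y) in axes rotated counterclockwise by angle."""
    x_new = x * math.cos(angle) + y * math.sin(angle)
    y_new = -x * math.sin(angle) + y * math.cos(angle)
    return (x_new, y_new)

def squared_distance(p, q):
    """Pythagorean squared distance between two points of the plane."""
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2

p, q = (1.0, 2.0), (4.0, 6.0)
angle = 0.7  # radians; any value works

p_rot = rotate_axes(p[0], p[1], angle)
q_rot = rotate_axes(q[0], q[1], angle)

d2 = squared_distance(p, q)              # 3^2 + 4^2
d2_rot = squared_distance(p_rot, q_rot)  # same quantity in the rotated axes

print(p, p_rot)     # the coordinates differ between the two systems
print(d2, d2_rot)   # the squared distances agree up to rounding
```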

FIGURE 1.3 Two-dimensional Euclidean space, with two different coordinate systems. Notions such as "the distance between two points" are independent of the coordinate system chosen.
This is why it is useful to think of the plane as an intrinsically two-dimensional space, rather than as two fundamentally distinct one-dimensional spaces brought arbitrarily together: Although we use two distinct numbers to label each point, the numbers are not the essence of the geometry, since we can rotate axes into each other while leaving distances unchanged. In Newtonian physics this is not the case with space and time; there is no useful notion of rotating space and time into each other. Rather, the notion of "all of space at a single moment in time" has a meaning independent of coordinates.
SR is a different story. Let us consider coordinates (t, x, y, z) on spacetime, set up in the following way. The spatial coordinates (x, y, z) comprise a standard Cartesian system, constructed for example by welding together rigid rods that meet at right angles. The rods must be moving freely, unaccelerated. The time coordinate is defined by a set of clocks, which are not moving with respect to the spatial coordinates. (Since this is a thought experiment, we can imagine that the rods are infinitely long and there is one clock at every point in space.) The clocks are synchronized in the following sense. Imagine that we send a beam of light from point 1 in space to point 2, in a straight line at a constant velocity c, and then immediately back to 1 (at velocity -c). Then the time on the coordinate clock when the light beam reaches point 2, which we label $t_2$, should be halfway between the time on the coordinate clock when the beam left point 1 ($t_1$) and the time on that same clock when it returned ($t_1'$):

$$t_2 = \frac{1}{2}\left(t_1' + t_1\right). \tag{1.9}$$
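The synchronization rule can be written as a one-line function. In the sketch below, the clock at point 1 records when the light beam leaves and when it returns, and the rule assigns the halfway value to the arrival at point 2. The distance between the two points and the function name are illustrative assumptions; the value used for c is the exact SI definition of the speed of light.

```python
C = 299_792_458.0  # speed of light in meters per second (exact SI value)

def synchronized_reading(t_emit, t_return):
    """Time the clock at point 2 should show on arrival: halfway between
    emission and return, both read on the clock at point 1."""
    return 0.5 * (t_emit + t_return)

distance = 3.0e8                          # illustrative separation in meters
t_emit = 0.0                              # clock 1 reading when the beam leaves
t_return = t_emit + 2.0 * distance / C    # clock 1 reading when it comes back

t_arrival = synchronized_reading(t_emit, t_return)

print(t_arrival)       # reading assigned to the clock at point 2
print(distance / C)    # one-way light travel time, for comparison
```

With the clocks at rest relative to one another, the assigned reading is just the one-way light travel time, as one would hope.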
The coordinate system thus constructed is an inertial frame, or simply "inertial coordinates." These coordinates are the natural generalization to spacetime of Cartesian (orthonormal) coordinates in space. (The reason behind the careful construction is so that we only make comparisons locally; never, for example, comparing two far-away clocks to each other at the same time. This kind of care will be even more necessary once we go to general relativity, where there will not be any way to construct inertial coordinates throughout spacetime.)

FIGURE 1.4 Synchronizing clocks in an inertial coordinate system. The clocks are synchronized if the time $t_2$ is halfway between $t_1$ and $t_1'$ when we bounce a beam of light from point 1 to point 2 and back.

We can construct any number of inertial frames via this procedure, differing from the first one by an offset in initial position and time, angle, and (constant) velocity. In a Newtonian world, the new coordinates (t', x', y', z') would have the
feature that t' = t + constant, independent of spatial coordinates. That is, there
is an absolute notion of "two events occurring simultaneously, that is, at the same time." But in SR this isn't true; in general the three-dimensional "spaces" defined
by t = constant will differ from those defined by t' = constant.
However, we have not descended completely into chaos. Consider, without any motivation for the moment, what we will call the spacetime interval between two events:

$$(\Delta s)^2 = -(c\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \tag{1.10}$$

(Notice that it can be positive, negative, or zero even for two nonidentical points.) Here, c is some fixed conversion factor between space and time, that is, a fixed velocity. As an empirical matter, it turns out that electromagnetic waves propagate in vacuum at this velocity c, which we therefore refer to as "the speed of light." The important thing, however, is not that photons happen to travel at that speed, but that there exists a c such that the spacetime interval is invariant under changes of inertial coordinates. In other words, if we set up a new inertial frame (t', x', y', z'), the interval will be of the same form:
$$(\Delta s)^2 = -(c\Delta t')^2 + (\Delta x')^2 + (\Delta y')^2 + (\Delta z')^2. \tag{1.11}$$
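The invariance of the interval can be checked numerically once a concrete change of inertial frame is chosen. The text has not yet written down the explicit transformation, so the sketch below assumes the standard boost along the x-axis with velocity v, namely $t' = \gamma(t - vx/c^2)$ and $x' = \gamma(x - vt)$ with $\gamma = 1/\sqrt{1 - v^2/c^2}$. The two events, the velocity, and the function names are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in meters per second (exact SI value)

def boost_x(event, v):
    """Assumed standard boost along x with velocity v; event = (t, x, y, z)."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma * (t - v * x / C**2), gamma * (x - v * t), y, z)

def interval_squared(a, b):
    """Spacetime interval -(c dt)^2 + dx^2 + dy^2 + dz^2 between two events."""
    dt, dx, dy, dz = (bj - aj for aj, bj in zip(a, b))
    return -(C * dt) ** 2 + dx**2 + dy**2 + dz**2

# Two events that are simultaneous in the unprimed frame, 10^9 meters apart.
event_a = (0.0, 0.0, 0.0, 0.0)
event_b = (0.0, 1.0e9, 0.0, 0.0)

v = 0.6 * C
a_primed = boost_x(event_a, v)
b_primed = boost_x(event_b, v)

s2 = interval_squared(event_a, event_b)
s2_primed = interval_squared(a_primed, b_primed)
dt_primed = b_primed[0] - a_primed[0]

print(s2, s2_primed)  # the interval agrees in both frames, up to rounding
print(dt_primed)      # nonzero: the events are not simultaneous in the new frame
```

The same pair of events illustrates both halves of the story: the time separation changes from zero to something nonzero, while the interval stays put.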


This is why it makes sense to think of SR as a theory of four-dimensional spacetime, known as Minkowski space. (This is a special case of a four-dimensional manifold, which we will deal with in detail later.) As we shall see, the coordinate transformations that we have implicitly defined do, in a sense, rotate space and time into each other. There is no absolute notion of "simultaneous events"; whether two things occur at the same time depends on the coordinates used. Therefore, the division of Minkowski space into space and time is a choice we make for our own purposes, not something intrinsic to the situation.
Almost all of the "paradoxes" associated with SR result from a stubborn persistence of the Newtonian notions of a unique time coordinate and the existence of "space at a single moment in time." By thinking in terms of spacetime rather than space and time together, these paradoxes tend to disappear.
Let's introduce some convenient notation. Coordinates on spacetime will be denoted by letters with Greek superscript indices running from 0 to 3, with 0 generally denoting the time coordinate. Thus,

    x^0 = ct
    x^1 = x
    x^2 = y
    x^3 = z.    (1.12)

(Don't start thinking of the superscripts as exponents.) Furthermore, for the sake of simplicity we will choose units in which

    c = 1;    (1.13)

we will therefore leave out factors of c in all subsequent formulae. Empirically we know that c is 3 × 10^8 meters per second; thus, we are working in units where 1 second equals 3 × 10^8 meters. Sometimes it will be useful to refer to the space and time components of x^μ separately, so we will use Latin superscripts to stand for the space components alone:

    x^i: \quad x^1 = x, \quad x^2 = y, \quad x^3 = z.    (1.14)

It is also convenient to write the spacetime interval in a more compact form. We therefore introduce a 4 x 4 matrix, the metric, which we write using two lower indices:

    \eta_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (1.15)

(Some references, especially field theory books, define the metric with the opposite sign, so be careful.) We then have the nice formula

    (\Delta s)^2 = \eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu.    (1.16)

1.2 Space and Time, Separately and Together

FIGURE 1.5 A light cone, portrayed on a spacetime diagram. Points that are spacelike-, null-, and timelike-separated from the origin are indicated.

This formula introduces the summation convention, in which indices appearing both as superscripts and subscripts are summed over. We call such labels dummy indices; it is important to remember that they are summed over all possible values, rather than taking any specific one. (It will always turn out to be the case that dummy indices occur strictly in pairs, with one "upstairs" and one "downstairs." More on this later.) The content of (1.16) is therefore exactly the same as (1.10).
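To make the summation convention concrete, here is a minimal Python sketch that evaluates (1.16) as an explicit double sum over the dummy indices and compares it with (1.10) in units where c = 1. The names ETA and interval, and the sample displacement, are illustrative choices rather than anything defined in the text.

```python
# Metric eta_{mu nu} of (1.15), in units where c = 1.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def interval(dx):
    """Return (Delta s)^2 = eta_{mu nu} dx^mu dx^nu, summing over both dummy indices."""
    return sum(ETA[mu][nu] * dx[mu] * dx[nu]
               for mu in range(4) for nu in range(4))


# Sample displacement (Delta t, Delta x, Delta y, Delta z); values are arbitrary.
dx = [5, 1, 2, 3]

# The same quantity written out term by term, as in (1.10).
explicit = -dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + dx[3] ** 2
```

Only the four diagonal terms survive the double sum, which is why the two expressions agree.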
An extremely useful tool is the spacetime diagram, so let's consider Minkowski space from this point of view. We can begin by portraying the initial t and x axes at right angles, and suppressing the y and z axes. ("Right angles" as drawn on a spacetime diagram don't necessarily imply "orthogonal in spacetime," although that turns out to be true for the t and x axes in this case.) It is enlightening to consider the paths corresponding to travel at the speed c = 1, given by x = ±t.
A set of points that are all connected to a single event by straight lines moving at the speed of light is the light cone, since if we imagine including one more spatial coordinate, the two diagonal lines get completed into a cone. Light cones are naturally divided into future and past; all points inside the future and past light cones of a point p are called timelike separated from p, while those outside the light cones are spacelike separated and those on the cones are lightlike or null separated from p. Referring back to (1.10), we see that the interval between timelike separated points is negative, between spacelike separated points is positive, and between null separated points is zero. (The interval is defined to be (Δs)^2, not the square root of this quantity.)
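The sign convention just described can be turned into a small classifier. This is a sketch only: the function names are hypothetical, units with c = 1 are assumed, and integer inputs are used in the examples because an exact comparison with zero is unreliable for floating-point coordinates.

```python
def interval(dt, dx, dy=0, dz=0):
    """(Delta s)^2 from (1.10) with c = 1."""
    return -dt ** 2 + dx ** 2 + dy ** 2 + dz ** 2


def separation(dt, dx, dy=0, dz=0):
    """Classify the separation between two events by the sign of the interval."""
    s2 = interval(dt, dx, dy, dz)
    if s2 < 0:
        return "timelike"    # inside the light cone
    if s2 > 0:
        return "spacelike"   # outside the light cone
    return "null"            # on the light cone


examples = {
    "slow particle": separation(2, 1),
    "far apart": separation(1, 2),
    "light ray": separation(5, 3, 4),
}
```

Note that the third example is null even though the two events are distinct, in line with the remark after (1.10).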
The fact that the interval is negative for a timelike line (on which a slower-than-light particle will actually move) is annoying, so we define the proper time τ to satisfy

    (\Delta\tau)^2 = -(\Delta s)^2 = -\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu.    (1.17)

A crucial feature of the spacetime interval is that the proper time between two events measures the time elapsed as seen by an observer moving on a straight path between the events. This is easily seen in the very special case that the two events have the same spatial coordinates, and are only separated in time; this corresponds to the observer traveling between the events being at rest in the coordinate system used. Then (Δτ)^2 = -η_μν Δx^μ Δx^ν = (Δt)^2, so Δτ = Δt, and of course we defined t as the time measured by a clock located at a fixed spatial position. But the spacetime interval is invariant under changes of inertial frame; the proper time (1.17) between two fixed events will be the same when evaluated in an inertial frame where the observer is moving as it is in the frame where the observer is at rest.
A crucial fact is that, for more general trajectories, the proper time and coordinate time are different (although the proper time is always that measured by the clock carried by an observer along the trajectory). Consider two trajectories between events A and C, one a straight line passing through a halfway point marked B, and another traveled by an observer moving away from A at a constant velocity
v = dx / dt to a point B' and then back at a constant velocity -v to intersect at

FIGURE 1.6 The twin paradox. A traveler on the straight path through spacetime ABC will age more than someone on the nonstraight path AB'C. Since proper time is a measure of distance traveled through spacetime, this should come as no surprise. (The only surprise might be that the straight path is the one of maximum proper time; this can be traced to the minus sign for the timelike component of the metric.)

the event C. Choose inertial coordinates such that the straight trajectory describes a motionless particle, with event A located at coordinates (t, x) = (0, 0) and C located at (Δt, 0). The two paths then describe an isosceles triangle in spacetime; B has coordinates (½Δt, 0) and B' has coordinates (½Δt, Δx), with Δx = ½vΔt. Clearly, Δτ_AB = ½Δt, but

    \Delta\tau_{AB'} = \sqrt{(\tfrac{1}{2}\Delta t)^2 - (\Delta x)^2} = \tfrac{1}{2}\sqrt{1 - v^2}\,\Delta t.    (1.18)

It should be obvious that Δτ_BC = Δτ_AB and Δτ_B'C = Δτ_AB'. Thus, the observer on the straight-line trip from event A to C experiences an elapsed time of Δτ_ABC = Δt, whereas the one who traveled out and returned experiences

    \Delta\tau_{AB'C} = \sqrt{1 - v^2}\,\Delta t < \Delta t.    (1.19)

Even though the two observers begin and end at the same points in spacetime, they have aged different amounts. This is the famous "twin paradox," the unfortunate scene of all sorts of misunderstandings and tortured explanations. The truth is straightforward: a nonstraight path in spacetime has a different interval than a straight path, just as a nonstraight path in space has a different length than a straight one. This isn't as trivial as it sounds, of course; the profound insight is the way in which "elapsed time along a worldline" is related to the interval traversed through spacetime. In a Newtonian world, the coordinate t represents a universal flow of time throughout all of spacetime; in relativity, t is just a convenient coordinate, and the elapsed time depends on the path along which you travel. An

important distinction is that the nonstraight path has a shorter proper time. In space, the shortest distance between two points is a straight line; in spacetime, the longest proper time between two events is a straight trajectory.
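A quick numerical version of the twin calculation may help. The sketch below assumes units with c = 1 and uses made-up values (a coordinate time of 10 between A and C, and a traveler speed of 0.6); the function names are illustrative.

```python
import math


def proper_time_segment(dt, dx):
    """Proper time along a straight timelike segment, from (1.17) with c = 1."""
    return math.sqrt(dt ** 2 - dx ** 2)


def twin_ages(total_t, v):
    """Return (stay-at-home, traveler) elapsed proper times between events A and C."""
    half = total_t / 2
    stay = 2 * proper_time_segment(half, 0.0)          # path A -> B -> C
    travel = 2 * proper_time_segment(half, v * half)   # path A -> B' -> C
    return stay, travel


stay, travel = twin_ages(10.0, 0.6)
```

With these numbers the stay-at-home observer ages 10 units and the traveler ages 8, matching the factor sqrt(1 - v^2) = 0.8 in (1.19).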
Not all trajectories are nice enough to be constructed from pieces of straight lines. In more general circumstances it is useful to introduce the infinitesimal interval, or line element:

    ds^2 = \eta_{\mu\nu}\,dx^\mu\,dx^\nu    (1.20)

for infinitesimal coordinate displacements dx^μ. (We are being quite informal here, but we'll make amends later on.) From this definition it is tempting to take the square root and integrate along a path to obtain a finite interval, but it is somewhat unclear what ∫ √(η_μν dx^μ dx^ν) is supposed to mean. Instead we consider a path through spacetime as a parameterized curve, x^μ(λ). Note that, unlike conventional practice in Newtonian mechanics, the parameter λ is not necessarily identified with the time coordinate. We can then calculate the derivatives dx^μ/dλ, and write the path length along a spacelike curve (one whose infinitesimal intervals are spacelike) as

    \Delta s = \int \sqrt{\eta_{\mu\nu}\,\frac{dx^\mu}{d\lambda}\,\frac{dx^\nu}{d\lambda}}\; d\lambda,    (1.21)

where the integral is taken over the path. For timelike paths we use the proper time

    \Delta\tau = \int \sqrt{-\eta_{\mu\nu}\,\frac{dx^\mu}{d\lambda}\,\frac{dx^\nu}{d\lambda}}\; d\lambda,    (1.22)

which will be positive. (For null paths the interval is simply zero.) Of course we may consider paths that are timelike in some places and spacelike in others, but fortunately it is seldom necessary since the paths of physical particles never change their character (massive particles move on timelike paths, massless particles move on null paths). Once again, Δτ really is the time measured by an observer moving along the trajectory.
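The integral (1.22) can be approximated numerically by chopping a timelike path into many short straight chords and adding up their proper times. The sketch below is one simple way to do that, not a recommended integrator; the circular trajectory, its speed of 0.6, and all names are assumptions chosen for illustration, with c = 1.

```python
import math

ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the metric (1.15), c = 1


def proper_time(path, lam0, lam1, steps=10000):
    """Approximate (1.22) by summing sqrt(-eta dx dx) over short straight chords."""
    total = 0.0
    prev = path(lam0)
    for n in range(1, steps + 1):
        cur = path(lam0 + (lam1 - lam0) * n / steps)
        ds2 = sum(ETA_DIAG[m] * (cur[m] - prev[m]) ** 2 for m in range(4))
        total += math.sqrt(-ds2)  # timelike chords have ds2 < 0
        prev = cur
    return total


def at_rest(lam):
    """Observer sitting at the spatial origin; lambda is the coordinate time."""
    return (lam, 0.0, 0.0, 0.0)


def circling(lam, radius=1.0, omega=0.6):
    """Uniform circular motion with speed radius * omega = 0.6 (accelerated path)."""
    return (lam, radius * math.cos(omega * lam), radius * math.sin(omega * lam), 0.0)


tau_rest = proper_time(at_rest, 0.0, 10.0)
tau_circle = proper_time(circling, 0.0, 10.0)
```

For constant speed v the integrand reduces to sqrt(1 - v^2), so the circling observer should record close to 0.8 × 10 = 8 units of proper time while the observer at rest records 10.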
The notion of acceleration in special relativity has a bad reputation, for no good reason. Of course we were careful, in setting up inertial coordinates, to make sure that particles at rest in such coordinates are unaccelerated. However, once we've set up such coordinates, we are free to consider any sort of trajectories for physical particles, whether accelerated or not. In particular, there is no truth to the rumor that SR is unable to deal with accelerated trajectories, and general relativity must be invoked. General relativity becomes relevant in the presence of gravity, when spacetime becomes curved. Any processes in flat spacetime are described within the context of special relativity; in particular, expressions such as (1.22) are perfectly general.

1.3 ■ LORENTZ TRANSFORMATIONS

We can now consider coordinate transformations in spacetime at a somewhat more abstract level than before. We are interested in a formal description of how to relate the various inertial frames constructed via the procedure outlined above; that is, coordinate systems that leave the interval (1.16) invariant. One simple variety are the translations, which merely shift the coordinates (in space or time):

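    x^\mu \rightarrow x^{\mu'} = \delta^{\mu'}_\mu\,(x^\mu + a^\mu),    (1.23)

where a^μ is a set of four fixed numbers and δ^{μ'}_μ is the four-dimensional version of the traditional Kronecker delta symbol:

    \delta^{\mu'}_\mu = \begin{cases} 1 & \text{when } \mu' = \mu, \\ 0 & \text{when } \mu' \neq \mu. \end{cases}    (1.24)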

Notice that we put the prime on the index, not on the x. The reason for this should become more clear once we start dealing with vectors and tensors; the notation serves to remind us that the geometrical object is the same, but its components are resolved with respect to a different coordinate system. Translations leave the differences Δx^μ unchanged, so it is not remarkable that the interval is unchanged. The other relevant transformations include spatial rotations and offsets by a constant velocity vector, or boosts; these are linear transformations, described by multiplying x^μ by a (spacetime-independent) matrix:

    x^{\mu'} = \Lambda^{\mu'}{}_\nu\, x^\nu,    (1.25)

or, in more conventional matrix notation,

    x' = \Lambda x.    (1.26)

(We will generally use indices, rather than matrix notation, but right now we have an interest in relating our discussion to certain other familiar notions usually described by matrices.) These transformations do not leave the differences Δx^μ unchanged, but multiply them also by the matrix Λ. What kind of matrices will leave the interval invariant? Sticking with the matrix notation, what we would like is

    (\Delta s)^2 = (\Delta x)^T \eta\, (\Delta x) = (\Delta x')^T \eta\, (\Delta x') = (\Delta x)^T \Lambda^T \eta\, \Lambda\, (\Delta x),    (1.27)

and therefore

    \eta = \Lambda^T \eta\, \Lambda,    (1.28)

or

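    \eta_{\rho\sigma} = \Lambda^{\mu'}{}_\rho\, \eta_{\mu'\nu'}\, \Lambda^{\nu'}{}_\sigma = \Lambda^{\mu'}{}_\rho\, \Lambda^{\nu'}{}_\sigma\, \eta_{\mu'\nu'}.    (1.29)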

(In matrix notation the order matters, while in index notation it is irrelevant.) We want to find the matrices Λ^{μ'}_ν such that the components of the matrix η_{μ'ν'} are the same as those of η_ρσ; that is what it means for the interval to be invariant under these transformations.
The matrices that satisfy (1.28) are known as the Lorentz transformations; the set of them forms a group under matrix multiplication, known as the Lorentz group. There is a close analogy between this group and SO(3), the rotation group in three-dimensional space. The rotation group can be thought of as 3 × 3 matrices R that satisfy R^T R = 1, where 1 is the 3 × 3 identity matrix. Such matrices are called orthogonal, and the 3 × 3 ones form the group O(3). This includes not only rotations but also reversals of orientation of the spatial axes (parity transformations). Sometimes we choose to exclude parity transformations by also demanding that the matrices have unit determinant, |R| = 1; such matrices are called special, and the resulting group is SO(3). The orthogonality condition can be made to look more like (1.28) if we write it as

    1 = R^T\, 1\, R.    (1.30)

So the difference between the rotation group O(3) and the Lorentz group is the replacement of 1, a 3 × 3 diagonal matrix with all entries equal to +1, by η, a 4 × 4 diagonal matrix with one entry equal to -1 and the rest equal to +1. The Lorentz group is therefore often referred to as O(3, 1). It includes not only boosts and rotations, but discrete reversals of the time direction as well as parity transformations. As before we can demand that |Λ| = 1, leaving the "proper Lorentz group" SO(3, 1). However, this does not leave us with what we really want, which is the set of continuous Lorentz transformations (those connected smoothly to the identity), since a combination of a time reversal and a parity reversal would have unit determinant. From the (ρ, σ) = (0, 0) component of (1.29) we can easily show that |Λ^{0'}_0| ≥ 1, with negative values corresponding to time reversals. We can therefore demand at last that Λ^{0'}_0 ≥ 1 (in addition to |Λ| = 1), leaving the "proper orthochronous" or "restricted" Lorentz group. Sometimes this is denoted by something like SO(3, 1)^↑, but usually we will not bother to make this distinction explicitly. Note that the 3 × 3 identity matrix is simply the metric for ordinary flat space. Such a metric, in which all of the eigenvalues are positive, is called Euclidean, while those such as (1.15), which feature a single minus sign, are called Lorentzian.
It is straightforward to write down explicit expressions for simple Lorentz transformations. A familiar rotation in the x-y plane is:

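    \Lambda^{\mu'}{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & \sin\theta & 0 \\ 0 & -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (1.31)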

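The rotation angle θ is a periodic variable with period 2π. The boosts may be thought of as "rotations between space and time directions." An example is given by a boost in the x-direction:

    \Lambda^{\mu'}{}_\nu = \begin{pmatrix} \cosh\phi & -\sinh\phi & 0 & 0 \\ -\sinh\phi & \cosh\phi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (1.32)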

The boost parameter φ, unlike the rotation angle, is defined from -∞ to ∞. A general transformation can be obtained by multiplying the individual transformations; the explicit expression for this six-parameter matrix (three boosts, three rotations) is not pretty, or sufficiently useful to bother writing down. In general Lorentz transformations will not commute, so the Lorentz group is nonabelian. The set of both translations and Lorentz transformations is a ten-parameter nonabelian group, the Poincaré group.
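These statements are easy to check numerically. The following sketch builds the rotation (1.31) and the boost (1.32) as plain nested lists, verifies the defining condition (1.28), and illustrates that a rotation and a boost need not commute. The helper names and the particular angles are arbitrary choices for illustration.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation_xy(theta):
    """Rotation in the x-y plane, equation (1.31)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]


def boost_x(phi):
    """Boost in the x-direction, equation (1.32)."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


def preserves_metric(lam):
    """Check the condition (1.28): eta = Lambda^T eta Lambda."""
    return close(matmul(matmul(transpose(lam), ETA), lam), ETA)


R, B = rotation_xy(0.3), boost_x(0.5)
RB, BR = matmul(R, B), matmul(B, R)
```

Two boosts along the same axis do commute, and their boost parameters simply add, which is one reason φ is a convenient parameter.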
You should not be surprised to learn that the boosts correspond to changing coordinates by moving to a frame that travels at a constant velocity, but let's see it more explicitly. (Don't confuse "boosting" with "accelerating." The difference between boosting to a different reference frame and accelerating an object is the same as the difference between rotating to a different coordinate system and setting an object spinning.) For the transformation given by (1.32), the transformed coordinates t' and x' will be given by

    t' = t\cosh\phi - x\sinh\phi
    x' = -t\sinh\phi + x\cosh\phi.    (1.33)

From this we see that the point defined by x' = 0 is moving; it has a velocity

    v = \frac{x}{t} = \frac{\sinh\phi}{\cosh\phi} = \tanh\phi.    (1.34)

To translate into more pedestrian notation, we can replace φ = tanh^{-1} v to obtain

    t' = \gamma\,(t - vx)
    x' = \gamma\,(x - vt),    (1.35)

where γ = 1/√(1 - v^2). So indeed, our abstract approach has recovered the conventional expressions for Lorentz transformations. Applying these formulae leads to time dilation, length contraction, and so forth.
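As a sanity check on the change of parameter, the sketch below applies the hyperbolic form (1.33) with φ = tanh^{-1} v and compares the result with the γ form (1.35). The event coordinates and the speed v = 0.6 are arbitrary illustrative values, and c = 1 throughout.

```python
import math


def boost_tx(t, x, v):
    """Apply (1.33) with phi = artanh(v); returns (t', x')."""
    phi = math.atanh(v)
    return (t * math.cosh(phi) - x * math.sinh(phi),
            -t * math.sinh(phi) + x * math.cosh(phi))


def gamma(v):
    """Lorentz factor 1 / sqrt(1 - v^2)."""
    return 1.0 / math.sqrt(1.0 - v * v)


v = 0.6
t, x = 2.0, 0.5

t_hyp, x_hyp = boost_tx(t, x, v)         # hyperbolic form (1.33)
t_gam = gamma(v) * (t - v * x)           # conventional form (1.35)
x_gam = gamma(v) * (x - v * t)

# One tick of a clock at rest at x = 0, seen from the boosted frame.
tick_t, tick_x = boost_tx(1.0, 0.0, v)
```

The last line is time dilation in miniature: a tick that takes one unit of time in the clock's rest frame takes γ = 1.25 units in the boosted frame.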
It's illuminating to consider Lorentz transformations in the context of spacetime diagrams. According to (1.33), under a boost in the x-t plane the x' axis (t' = 0) is given by t = x tanh φ, while the t' axis (x' = 0) is given by t = x / tanh φ. We therefore see that the space and time axes are rotated into each other, although they scissor together instead of remaining orthogonal in the traditional Euclidean sense. (As we shall see, the axes do in fact remain orthogonal in

FIGURE 1.7 A Lorentz transformation relates the {t', x'} coordinates to the {t, x} coordinates. Note that light cones are unchanged.
the Lorentzian sense; that's the implication of the metric remaining invariant under boosts.) This should come as no surprise, since if spacetime behaved just like a four-dimensional version of space the world would be a very different place. We see quite vividly the distinction between this situation and the Newtonian world; in SR, it is impossible to say (in a coordinate-independent way) whether a point that is spacelike separated from p is in the future of p, the past of p, or "at the same time."
Note also that the paths defined by x' = ±t' are precisely the same as those
defined by x = ±t; these trajectories are left invariant under boosts along the x-
axis. Of course we know that light travels at this speed; we have therefore found that the speed of light is the same in any inertial frame.
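The geometry of the tilted axes and the invariance of the light cone can both be confirmed with a few lines of arithmetic. In this sketch the boost parameter 0.8 and the sample points are arbitrary, and the function name is an illustrative choice.

```python
import math


def boost_tx(t, x, phi):
    """Equation (1.33): coordinates (t', x') of the event (t, x)."""
    return (t * math.cosh(phi) - x * math.sinh(phi),
            -t * math.sinh(phi) + x * math.cosh(phi))


phi = 0.8

# A point on the line t = x tanh(phi) should land on the x' axis (t' = 0).
t_on_xprime_axis, _ = boost_tx(2.0 * math.tanh(phi), 2.0, phi)

# A point on the line t = x / tanh(phi) should land on the t' axis (x' = 0).
_, x_on_tprime_axis = boost_tx(2.0, 2.0 * math.tanh(phi), phi)

# Light rays x = +t and x = -t should remain light rays in the new frame.
right = boost_tx(3.0, 3.0, phi)
left = boost_tx(3.0, -3.0, phi)
```

The light-ray events keep x' = ±t' even though their individual coordinates change, which is the numerical counterpart of the unchanged light cones in Figure 1.7.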
1.4 ■ VECTORS
To probe the structure of Minkowski space in more detail, it is necessary to introduce the concepts of vectors and tensors. We will start with vectors, which should be familiar. Of course, in spacetime vectors are four-dimensional, and are often referred to as four-vectors. This turns out to make quite a bit of difference; for example, there is no such thing as a cross product between two four-vectors.
Beyond the simple fact of dimensionality, the most important thing to emphasize is that each vector is located at a given point in spacetime. You may be used to thinking of vectors as stretching from one point to another in space, and even of "free" vectors that you can slide carelessly from point to point. These are not useful concepts outside the context of flat spaces; once we introduce curvature, we lose the ability to draw preferred curves from one point to another, or to move vectors uniquely around a manifold. Rather, to each point p in spacetime we associate the set of all possible vectors located at that point; this set is known as the tangent space at p, or Tp. The name is inspired by thinking of the set of

vectors attached to a point on a simple curved two-dimensional space as comprising a plane tangent to the point. (This picture relies on an embedding of the manifold and the tangent space in a higher-dimensional external space, which we won't generally have or need.) Inspiration aside, it is important to think of these vectors as being located at a single point, rather than stretching from one point to another (although this won't stop us from drawing them as arrows on spacetime diagrams).
In Chapter 2 we will relate the tangent space at each point to things we can construct from the spacetime itself. For right now, just think of Tp as an abstract vector space for each point in spacetime. A (real) vector space is a collection of objects (vectors) that can be added together and multiplied by real numbers in a linear way. Thus, for any two vectors V and W and real numbers a and b, we have

(a+ b)(V + W) = aV + bV + aW + bW.

(1.36)

Every vector space has an origin, that is, a zero vector that functions as an identity element under vector addition. In many vector spaces there are additional operations such as taking an inner (dot) product, but this is extra structure over and above the elementary concept of a vector space.
A vector is a perfectly well-defined geometric object, as is a vector field, defined as a set of vectors with exactly one at each point in spacetime. [The set of all the tangent spaces of an n-dimensional manifold M can be assembled into a 2n-dimensional manifold called the tangent bundle, T(M). It is a specific example of a "fiber bundle," which is endowed with some extra mathematical structure; we won't need the details for our present purposes.] Nevertheless it is often useful to decompose vectors into components with respect to some set of basis vectors. A basis is any set of vectors which both spans the vector space (any vector is a linear combination of basis vectors) and is linearly independent (no vector in the basis

FIGURE 1.8 A suggestive drawing of the tangent space Tp, the space of all vectors at the point p.

is a linear combination of other basis vectors). For any given vector space, there will be an infinite number of possible bases we could choose, but each basis will consist of the same number of vectors, known as the dimension of the space. (For a tangent space associated with a point in Minkowski space, the dimension is, of course, four.)
Let us imagine that at each tangent space we set up a basis of four vectors ê_(μ), with μ ∈ {0, 1, 2, 3} as usual. In fact let us say that each basis is "adapted to the coordinates x^μ"; that is, the basis vector ê_(1) is what we would normally think of pointing along the x-axis. It is by no means necessary that we choose a basis adapted to any coordinate system at all, although it is often convenient. (As before, we really could be more precise here, but later on we will repeat the discussion at an excruciating level of precision, so some sloppiness now is forgivable.) Then any abstract vector A can be written as a linear combination of basis vectors:

    A = A^\mu\, \hat{e}_{(\mu)}.    (1.37)
The coefficients A^μ are the components of the vector A. More often than not we will forget the basis entirely and refer somewhat loosely to "the vector A^μ," but keep in mind that this is shorthand. The real vector is an abstract geometrical entity, while the components are just the coefficients of the basis vectors in some convenient basis. (Since we will usually suppress the explicit basis vectors, the indices usually will label components of vectors and tensors. This is why there are parentheses around the indices on the basis vectors, to remind us that this is a collection of vectors, not components of a single vector.)
A standard example of a vector in spacetime is the tangent vector to a curve. A parameterized curve or path through spacetime is specified by the coordinates as a function of the parameter, for example, x^μ(λ). The tangent vector V(λ) has components

    V^\mu = \frac{dx^\mu}{d\lambda}.    (1.38)
The entire vector is V = V^μ ê_(μ). Under a Lorentz transformation the coordinates x^μ change according to (1.25), while the parameterization λ is unaltered; we can therefore deduce that the components of the tangent vector must change as

    V^\mu \rightarrow V^{\mu'} = \Lambda^{\mu'}{}_\nu\, V^\nu.    (1.39)
However, the vector V itself (as opposed to its components in some coordinate system) is invariant under Lorentz transformations. We can use this fact to derive the transformation properties of the basis vectors. Let us refer to the set of basis vectors in the transformed coordinate system as ê_(ν'). Since the vector is invariant, we have

    V = V^\mu\, \hat{e}_{(\mu)} = V^{\nu'} \hat{e}_{(\nu')} = \Lambda^{\nu'}{}_\mu\, V^\mu\, \hat{e}_{(\nu')}.    (1.40)

But this relation must hold no matter what the numerical values of the components V^μ are. We can therefore say

    \hat{e}_{(\mu)} = \Lambda^{\nu'}{}_\mu\, \hat{e}_{(\nu')}.    (1.41)

To get the new basis ê_(ν') in terms of the old one ê_(μ), we should multiply by the inverse of the Lorentz transformation Λ^{ν'}_μ. But the inverse of a Lorentz transformation from the unprimed to the primed coordinates is also a Lorentz transformation, this time from the primed to the unprimed systems. We will therefore introduce a somewhat subtle notation, by using the same symbol for both matrices, just with primed and unprimed indices switched. That is, the Lorentz transformation specified by Λ^{μ'}_ν has an inverse transformation written as Λ^ρ_{σ'}. Operationally this implies

    \Lambda^\mu{}_{\nu'}\, \Lambda^{\nu'}{}_\rho = \delta^\mu_\rho, \qquad \Lambda^{\sigma'}{}_\lambda\, \Lambda^\lambda{}_{\tau'} = \delta^{\sigma'}_{\tau'}.    (1.42)

From (1.41) we then obtain the transformation rule for basis vectors:

    \hat{e}_{(\nu')} = \Lambda^\mu{}_{\nu'}\, \hat{e}_{(\mu)}.    (1.43)
Therefore the set of basis vectors transforms via the inverse Lorentz transformation of the coordinates or vector components.
Let's pause a moment to take all this in. We introduced coordinates labeled by upper indices, which transformed in a certain way under Lorentz transformations. We then considered vector components that also were written with upper indices, which made sense since they transformed in the same way as the coordinate functions. (In a fixed coordinate system, each of the four coordinates x^μ can be thought of as a function on spacetime, as can each of the four components of a vector field.) The basis vectors associated with the coordinate system transformed via the inverse matrix, and were labeled by a lower index. This notation ensured that the invariant object constructed by summing over the components and basis vectors was left unchanged by the transformation, just as we would wish. It's probably not giving too much away to say that this will continue to be the case for tensors, which may have multiple indices.
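The bookkeeping can be checked in a two-dimensional (t, x) slice of spacetime. The sketch below transforms the components with a boost, transforms the basis vectors with the inverse boost, and then reassembles the vector to confirm that nothing has changed. The variable names, the boost parameter, and the sample components are assumptions made for illustration; basis vectors are represented by their components in the original frame.

```python
import math


def boost(phi):
    """The (t, x) block of (1.32), with entries Lambda[row][col]."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh], [-sh, ch]]


phi = 0.7
lam = boost(phi)        # Lambda^{mu'}_nu, acts on components
lam_inv = boost(-phi)   # Lambda^mu_{nu'}, the inverse transformation

V = [2.0, 1.0]                          # components V^mu
old_basis = [[1.0, 0.0], [0.0, 1.0]]    # e_(mu), written out in the old frame

# (1.39): V^{mu'} = Lambda^{mu'}_nu V^nu
V_new = [sum(lam[m][n] * V[n] for n in range(2)) for m in range(2)]

# (1.43): e_(nu') = Lambda^mu_{nu'} e_(mu)
new_basis = [[sum(lam_inv[m][n] * old_basis[m][i] for m in range(2))
              for i in range(2)] for n in range(2)]

# Reassemble V = V^{nu'} e_(nu') and express it in the old frame again.
V_rebuilt = [sum(V_new[n] * new_basis[n][i] for n in range(2)) for i in range(2)]
```

The components change and the basis vectors change, but in opposite ways, so the reassembled vector agrees with the one we started from.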

1.5 ■ DUAL VECTORS (ONE-FORMS)

Once we have set up a vector space, we can define another associated vector space (of equal dimension) known as the dual vector space. The dual space is usually denoted by an asterisk, so that the dual space to the tangent space T_p, called the cotangent space, is denoted T*_p. The dual space is the space of all linear maps from the original vector space to the real numbers; in math lingo, if ω ∈ T*_p is a dual vector, then it acts as a map such that

    \omega(aV + bW) = a\,\omega(V) + b\,\omega(W) \in \mathbf{R},    (1.44)

where V, W are vectors and a, b are real numbers. The nice thing about these maps is that they form a vector space themselves; thus, if ω and η are dual vectors, we have

    (a\omega + b\eta)(V) = a\,\omega(V) + b\,\eta(V).    (1.45)

To make this construction somewhat more concrete, we can introduce a set of basis dual vectors θ̂^(ν) by demanding

    \hat{\theta}^{(\nu)}(\hat{e}_{(\mu)}) = \delta^\nu_\mu.    (1.46)

Then every dual vector can be written in terms of its components, which we label with lower indices:

    \omega = \omega_\mu\, \hat{\theta}^{(\mu)}.    (1.47)

Usually, we will simply write ω_μ, in perfect analogy with vectors, to stand for the entire dual vector. In fact, you will sometimes see elements of T_p (what we have called vectors) referred to as contravariant vectors, and elements of T*_p (what we have called dual vectors) referred to as covariant vectors, although in this day and age these terms sound a little dated. If you just refer to ordinary vectors as vectors with upper indices and dual vectors as vectors with lower indices, nobody should be offended. Another name for dual vectors is one-forms, a somewhat mysterious designation that will become clearer in Chapter 2.
The component notation leads to a simple way of writing the action of a dual vector on a vector:

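    \omega(V) = \omega_\mu\, \hat{\theta}^{(\mu)}(V^\nu \hat{e}_{(\nu)}) = \omega_\mu V^\nu\, \hat{\theta}^{(\mu)}(\hat{e}_{(\nu)}) = \omega_\mu V^\nu \delta^\mu_\nu = \omega_\mu V^\mu \in \mathbf{R}.    (1.48)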

This is why it is rarely necessary to write the basis vectors and dual vectors explicitly; the components do all of the work. The form of (1.48) also suggests that we can think of vectors as linear maps on dual vectors, by defining

    V(\omega) \equiv \omega(V) = \omega_\mu V^\mu.    (1.49)

Therefore, the dual space to the dual vector space is the original vector space itself.
Of course in spacetime we will be interested not in a single vector space, but in fields of vectors and dual vectors. [The set of all cotangent spaces over M can be combined into the cotangent bundle, T*(M).] In that case the action of a dual vector field on a vector field is not a single number, but a scalar (function) on spacetime. A scalar is a quantity without indices, which is unchanged under

Lorentz transformations; it is a coordinate-independent map from spacetime to the real numbers.
We can use the same arguments that we earlier used for vectors (that geometrical objects are independent of coordinates, even if their components are not) to derive the transformation properties of dual vectors. The answers are, for the components,

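    \omega_{\mu'} = \Lambda^\nu{}_{\mu'}\, \omega_\nu,    (1.50)

and for basis dual vectors,

    \hat{\theta}^{(\rho')} = \Lambda^{\rho'}{}_\sigma\, \hat{\theta}^{(\sigma)}.    (1.51)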
This is just what we would expect from index placement; the components of a dual vector transform under the inverse transformation of those of a vector. Note that this ensures that the scalar (1.48) is invariant under Lorentz transformations, just as it should be.
In spacetime the simplest example of a dual vector is the gradient of a scalar function, the set of partial derivatives with respect to the spacetime coordinates, which we denote by a lowercase d:

    d\phi = \frac{\partial\phi}{\partial x^\mu}\, \hat{\theta}^{(\mu)}.    (1.52)

The conventional chain rule used to transform partial derivatives amounts in this case to the transformation rule of components of dual vectors:

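    \frac{\partial\phi}{\partial x^{\mu'}} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial\phi}{\partial x^\mu} = \Lambda^\mu{}_{\mu'}\,\frac{\partial\phi}{\partial x^\mu},    (1.53)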

where we have used (1.25) to relate the Lorentz transformation to the coordinates. The fact that the gradient is a dual vector leads to the following shorthand notations for partial derivatives:

    \frac{\partial\phi}{\partial x^\mu} = \partial_\mu\phi = \phi_{,\mu}.    (1.54)

So, x^μ has an upper index, but when it is in the denominator of a derivative it implies a lower index on the resulting object. In this book we will generally use ∂_μ rather than the comma notation. Note that the gradient does in fact act in a natural way on the example we gave above of a vector, the tangent vector to a curve. The result is an ordinary derivative of the function along the curve:

    \partial_\mu\phi\, \frac{\partial x^\mu}{\partial\lambda} = \frac{d\phi}{d\lambda}.    (1.55)
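A small numerical experiment illustrates (1.55). The scalar field φ = t x and the curve x^μ(λ) = (cosh λ, sinh λ) are hypothetical examples chosen so that everything can be written in closed form; the sketch compares the contraction of the gradient with the tangent vector against a finite-difference derivative of φ along the curve.

```python
import math


def scalar_field(t, x):
    """Hypothetical scalar function phi(t, x) = t * x."""
    return t * x


def gradient(t, x):
    """Components (d_t phi, d_x phi) of the gradient of phi."""
    return (x, t)


def curve(lam):
    """Hypothetical curve x^mu(lambda) = (cosh lambda, sinh lambda)."""
    return (math.cosh(lam), math.sinh(lam))


def tangent(lam):
    """Tangent vector components V^mu = dx^mu / dlambda, as in (1.38)."""
    return (math.sinh(lam), math.cosh(lam))


def gradient_on_tangent(lam):
    """Left-hand side of (1.55): d_mu phi times dx^mu / dlambda."""
    g, v = gradient(*curve(lam)), tangent(lam)
    return g[0] * v[0] + g[1] * v[1]


def derivative_along_curve(lam, h=1e-6):
    """Right-hand side of (1.55), estimated by a central difference."""
    return (scalar_field(*curve(lam + h)) - scalar_field(*curve(lam - h))) / (2 * h)


lhs = gradient_on_tangent(0.4)
rhs = derivative_along_curve(0.4)
```

For this particular choice both sides equal cosh(2λ), so the agreement can also be checked against the exact value.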

1.6 ■ TENSORS

A straightforward generalization of vectors and dual vectors is the notion of a tensor. Just as a dual vector is a linear map from vectors to R, a tensor T of type (or rank) (k, l) is a multilinear map from a collection of dual vectors and vectors to R:

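    T : \underbrace{T_p^* \times \cdots \times T_p^*}_{(k\ \text{times})} \times \underbrace{T_p \times \cdots \times T_p}_{(l\ \text{times})} \rightarrow \mathbf{R}.    (1.56)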

Here, "×" denotes the Cartesian product, so that for example T_p × T_p is the space of ordered pairs of vectors. Multilinearity means that the tensor acts linearly in each of its arguments; for instance, for a tensor of type (1, 1), we have

    T(a\omega + b\eta, cV + dW) = ac\,T(\omega, V) + ad\,T(\omega, W) + bc\,T(\eta, V) + bd\,T(\eta, W).    (1.57)

From this point of view, a scalar is a type (0, 0) tensor, a vector is a type (1, 0) tensor, and a dual vector is a type (0, 1) tensor.
The space of all tensors of a fixed type (k, l) forms a vector space; they can be added together and multiplied by real numbers. To construct a basis for this space, we need to define a new operation known as the tensor product, denoted by ⊗. If T is a (k, l) tensor and S is an (m, n) tensor, we define a (k + m, l + n) tensor T ⊗ S by

    T \otimes S(\omega^{(1)}, \ldots, \omega^{(k)}, \ldots, \omega^{(k+m)}, V^{(1)}, \ldots, V^{(l)}, \ldots, V^{(l+n)})
        = T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)})
        \times S(\omega^{(k+1)}, \ldots, \omega^{(k+m)}, V^{(l+1)}, \ldots, V^{(l+n)}).    (1.58)

Note that the ω^(i) and V^(i) are distinct dual vectors and vectors, not components thereof. In other words, first act T on the appropriate set of dual vectors and vectors, and then act S on the remainder, and then multiply the answers. Note that, in general, tensor products do not commute: T ⊗ S ≠ S ⊗ T.
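The simplest case to experiment with is the tensor product of two vectors, which is a (2, 0) tensor with components T^μ S^ν. The sketch below builds those components and evaluates the resulting tensor on a pair of dual vectors as in (1.58); the component values and the function names are arbitrary illustrative choices.

```python
def tensor_product(T, S):
    """Components (T x S)^{mu nu} = T^mu S^nu for two vectors (type (1, 0) tensors)."""
    return [[t * s for s in S] for t in T]


def act(components, omega1, omega2):
    """Evaluate a (2, 0) tensor on two dual vectors: X^{mu nu} omega1_mu omega2_nu."""
    n = len(omega1)
    return sum(components[m][k] * omega1[m] * omega2[k]
               for m in range(n) for k in range(n))


T = [1, 2, 0, 0]
S = [0, 1, 3, 0]
omega1 = [1, 1, 0, 0]
omega2 = [2, 0, 1, 0]

TS = tensor_product(T, S)
ST = tensor_product(S, T)

value_TS = act(TS, omega1, omega2)   # T(omega1) * S(omega2)
value_ST = act(ST, omega1, omega2)   # S(omega1) * T(omega2)
```

Here T ⊗ S returns 3 × 3 = 9 while S ⊗ T returns 1 × 2 = 2 on the same pair of arguments, a concrete instance of the two products being different tensors.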
It is now straightforward to construct a basis for the space of all (k, l) tensors, by taking tensor products of basis vectors and dual vectors; this basis will consist of all tensors of the form

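    \hat{e}_{(\mu_1)} \otimes \cdots \otimes \hat{e}_{(\mu_k)} \otimes \hat{\theta}^{(\nu_1)} \otimes \cdots \otimes \hat{\theta}^{(\nu_l)}.    (1.59)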

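In a four-dimensional spacetime there will be 4^(k+l) basis tensors in all. In component notation we then write our arbitrary tensor as

    T = T^{\mu_1 \cdots \mu_k}{}_{\nu_1 \cdots \nu_l}\, \hat{e}_{(\mu_1)} \otimes \cdots \otimes \hat{e}_{(\mu_k)} \otimes \hat{\theta}^{(\nu_1)} \otimes \cdots \otimes \hat{\theta}^{(\nu_l)}.    (1.60)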

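Alternatively, we could define the components by acting the tensor on basis vectors and dual vectors:
\[ T^{\mu_1 \cdots \mu_k}{}_{\nu_1 \cdots \nu_l} = T(\hat{\theta}^{(\mu_1)}, \ldots, \hat{\theta}^{(\mu_k)}, \hat{e}_{(\nu_1)}, \ldots, \hat{e}_{(\nu_l)}). \tag{1.61} \]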

You can check for yourself, using (1.46) and so forth, that these equations all hang together properly.
As with vectors, we will usually take the shortcut of denoting the tensor $T$ by its components $T^{\mu_1 \cdots \mu_k}{}_{\nu_1 \cdots \nu_l}$. The action of the tensors on a set of vectors and dual vectors follows the pattern established in (1.48):

\[ T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) = T^{\mu_1 \cdots \mu_k}{}_{\nu_1 \cdots \nu_l}\, \omega^{(1)}_{\mu_1} \cdots \omega^{(k)}_{\mu_k} V^{(1)\nu_1} \cdots V^{(l)\nu_l}. \tag{1.62} \]

A (k, l) tensor thus has k upper indices and l lower indices. The order of the indices is obviously important, since the tensor need not act in the same way on its various arguments.
Finally, the transformation of tensor components under Lorentz transformations can be derived by applying what we already know about the transformation of basis vectors and dual vectors. The answer is just what you would expect from index placement,

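\[ T^{\mu'_1 \cdots \mu'_k}{}_{\nu'_1 \cdots \nu'_l} = \Lambda^{\mu'_1}{}_{\mu_1} \cdots \Lambda^{\mu'_k}{}_{\mu_k}\, \Lambda^{\nu_1}{}_{\nu'_1} \cdots \Lambda^{\nu_l}{}_{\nu'_l}\, T^{\mu_1 \cdots \mu_k}{}_{\nu_1 \cdots \nu_l}. \tag{1.63} \]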

Thus, each upper index gets transformed like a vector, and each lower index gets transformed like a dual vector.
Although we have defined tensors as linear maps from sets of dual vectors and vectors to R, there is nothing that forces us to act on a full collection of arguments. Thus, a (1, 1) tensor also acts as a map from vectors to vectors:
\[ T^\mu{}_\nu : V^\nu \to T^\mu{}_\nu V^\nu. \tag{1.64} \]

You can check for yourself that $T^\mu{}_\nu V^\nu$ is a vector (that is, obeys the vector transformation law). Similarly, we can act one tensor on (all or part of) another tensor to obtain a third tensor. For example,
\[ U^\mu{}_\nu = T^{\mu\rho}{}_\sigma S^\sigma{}_{\rho\nu} \tag{1.65} \]

is a perfectly good (1, 1) tensor.
You may be concerned that this introduction to tensors has been somewhat too brief, given the esoteric nature of the material. In fact, the notion of tensors does not require a great deal of effort to master; it's just a matter of keeping the indices straight, and the rules for manipulating them are very natural. Indeed, a number of books like to define tensors as collections of numbers transforming according to (1.63). While this is operationally useful, it tends to obscure the deeper meaning of tensors as geometrical entities with a life independent of any chosen coordinate system.
There is, however, one subtlety that we have glossed over. The notions of dual vectors and tensors and bases and linear maps belong to the realm of linear algebra, and are appropriate whenever we have an abstract vector space at hand. In the case of interest to us we have not just a vector space, but a vector space at each point in spacetime. More often than not we are interested in tensor fields, which can be thought of as tensor-valued functions on spacetime. Fortunately, none of the manipulations we defined above really care whether we are dealing with a single vector space or a collection of vector spaces, one for each event. We will be able to get away with simply calling things functions of $x^\mu$ when appropriate. However, you should keep straight the logical independence of the notions we have introduced and their specific application to spacetime and relativity.
In spacetime, we have already seen some examples of tensors without calling them that. The most familiar example of a (0, 2) tensor is the metric, $\eta_{\mu\nu}$. The action of the metric on two vectors is so useful that it gets its own name, the inner product (or scalar product, or dot product):

\[ \eta(V, W) = \eta_{\mu\nu} V^\mu W^\nu = V \cdot W. \tag{1.66} \]

Just as with the conventional Euclidean dot product, we will refer to two vectors whose inner product vanishes as orthogonal. Since the inner product is a scalar, it is left invariant under Lorentz transformations; therefore, the basis vectors of any Cartesian inertial frame, which are chosen to be orthogonal by definition, are still orthogonal after a Lorentz transformation (despite the "scissoring together" we noticed earlier). The norm of a vector is defined to be the inner product of the vector with itself; unlike in Euclidean space, this number is not positive definite:

\[ \text{if } \eta_{\mu\nu} V^\mu V^\nu \text{ is } \begin{cases} < 0, & V^\mu \text{ is timelike} \\ = 0, & V^\mu \text{ is lightlike or null} \\ > 0, & V^\mu \text{ is spacelike.} \end{cases} \]

(A vector can have zero norm without being the zero vector.) You will notice that the terminology is the same as that which we used earlier to classify the relationship between two points in spacetime; it's no accident, of course, and we will go into more detail later.
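The classification by the sign of the norm can be sketched in a few lines. This assumes the signature used throughout the chapter, with the metric components diag(-1, +1, +1, +1); the function names `inner` and `classify` are hypothetical, and integer components are used so that the comparison with zero is exact.

```python
ETA_DIAG = [-1, 1, 1, 1]  # diagonal entries of eta_{mu nu}, signature (-,+,+,+)


def inner(V, W):
    """Inner product eta_{mu nu} V^mu W^nu for a diagonal metric."""
    return sum(e * v * w for e, v, w in zip(ETA_DIAG, V, W))


def classify(V):
    """Label a vector by the sign of its norm eta_{mu nu} V^mu V^nu."""
    norm = inner(V, V)
    if norm < 0:
        return "timelike"
    if norm == 0:
        return "null"
    return "spacelike"


print(classify([1, 0, 0, 0]))  # points along the time axis
print(classify([1, 1, 0, 0]))  # nonzero vector with zero norm
print(classify([0, 1, 0, 0]))  # points along the x axis
```

The second vector illustrates the parenthetical remark above: its norm vanishes even though the vector itself does not.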
Another tensor is the Kronecker delta $\delta^\mu_\nu$, of type (1, 1). Thought of as a map from vectors to vectors (or one-forms to one-forms), the Kronecker delta is simply the identity map. We follow the example of many other references in placing the upper and lower indices in the same column for this unique tensor; purists might write $\delta^\mu{}_\rho$ or $\delta_\rho{}^\mu$, but these would be numerically identical, and we shouldn't get in trouble being careless in this one instance.
Related to the Kronecker delta and the metric is the inverse metric $\eta^{\mu\nu}$, a type (2, 0) tensor defined (unsurprisingly) as the "inverse" of the metric:

\[ \eta^{\mu\nu}\eta_{\nu\rho} = \eta_{\rho\nu}\eta^{\nu\mu} = \delta^\mu_\rho. \tag{1.67} \]

(It's the inverse metric since, when multiplied by the metric, it yields the identity map.) In fact, as you can check, the inverse metric has exactly the same components as the metric itself. This is only true in flat space in Cartesian coordinates, and will fail to hold in more general situations. There is also the Levi-Civita symbol, a (0, 4) tensor:

\[ \tilde{\epsilon}_{\mu\nu\rho\sigma} = \begin{cases} +1 & \text{if } \mu\nu\rho\sigma \text{ is an even permutation of } 0123 \\ -1 & \text{if } \mu\nu\rho\sigma \text{ is an odd permutation of } 0123 \\ 0 & \text{otherwise.} \end{cases} \tag{1.68} \]

Here, a "permutation of 0123" is an ordering of the numbers 0, 1, 2, 3, which can be obtained by starting with 0123 and exchanging two of the digits; an even permutation is obtained by an even number of such exchanges, and an odd permutation is obtained by an odd number. Thus, for example, $\tilde{\epsilon}_{0321} = -1$. (The tilde on $\tilde{\epsilon}_{\mu\nu\rho\sigma}$, and referring to it as a symbol rather than simply a tensor, derive from the fact that this object is actually not a tensor in more general geometries or coordinates; instead, it is something called a "tensor density." It is straightforward enough to define a related object that is a tensor, which we will denote by $\epsilon_{\mu\nu\rho\sigma}$ and call the "Levi-Civita tensor." See Chapter 2 for a discussion.)
A remarkable property of the above tensors-the metric, the inverse metric, the Kronecker delta, and the Levi-Civita symbol-is that, even though they all transform according to the tensor transformation law (1.63), their components remain unchanged in any inertial coordinate system in flat spacetime. In some sense this makes them nongeneric examples of tensors, since most tensors do not have this property. In fact, these are the only tensors with this property, although we won't prove it. The Kronecker delta is even more unusual, in that it has exactly the same components in any coordinate system in any spacetime. This makes sense from the definition of a tensor as a linear map; the Kronecker tensor can be thought of as the identity map from vectors to vectors (or from dual vectors to dual vectors), which clearly must have the same components regardless of coordinate system. Meanwhile, the metric and its inverse characterize the structure of spacetime, while the Levi-Civita symbol is secretly not a true tensor at all. We shall therefore have to treat these objects more carefully when we drop our assumption of flat spacetime.
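The invariance of the metric components can be checked directly from the transformation law for a (0, 2) tensor. The sketch below assumes a boost along x with velocity v = 3/5 (so that γ = 5/4 is an exact rational number), uses `fractions.Fraction` to avoid rounding, and compares the metric with a hypothetical generic tensor whose only nonzero component is the 00 one. The helper names are illustrative, not taken from the text.

```python
from fractions import Fraction

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def inverse_boost_x(v, gamma):
    """Matrix L[mu][mu'] = Lambda^mu_{mu'} for a boost with velocity v along x."""
    return [[gamma, gamma * v, 0, 0],
            [gamma * v, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def transform_lower2(T, L):
    """T_{mu' nu'} = Lambda^mu_{mu'} Lambda^nu_{nu'} T_{mu nu}."""
    n = len(T)
    return [[sum(L[m][a] * L[k][b] * T[m][k]
                 for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]


v = Fraction(3, 5)
gamma = Fraction(5, 4)  # 1 / sqrt(1 - v**2) for v = 3/5
L = inverse_boost_x(v, gamma)

eta_boosted = transform_lower2(ETA, L)

# A hypothetical generic (0, 2) tensor, for contrast.
generic = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
generic_boosted = transform_lower2(generic, L)

print(eta_boosted == ETA)          # metric components are unchanged
print(generic_boosted == generic)  # a generic tensor's components are not
```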
A more typical example of a tensor is the electromagnetic field strength tensor. We all know that the electromagnetic fields are made up of the electric field vector $E_i$ and the magnetic field vector $B_i$. (Remember that we use Latin indices for spacelike components 1, 2, 3.) Actually these are only "vectors" under rotations in space, not under the full Lorentz group. In fact they are components of a (0, 2) tensor $F_{\mu\nu}$, defined by

\[ F_{\mu\nu} = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix} = -F_{\nu\mu}. \tag{1.69} \]

From this point of view it is easy to transform the electromagnetic fields in one reference frame to those in another, by application of (1.63). The unifying power of the tensor formalism is evident: rather than a collection of two vectors whose relationship and transformation properties are rather mysterious, we have a single tensor field to describe all of electromagnetism. (On the other hand, don't get carried away; sometimes it's more convenient to work in a single coordinate system using the electric and magnetic field vectors.)
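As an illustration of transforming the fields with the tensor law, the following sketch builds $F_{\mu\nu}$ from E and B, applies a boost along x, and reads the fields back off. It assumes the component convention $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$ and a primed frame moving with velocity v = 3/5 along x; the function names and the sample field values are hypothetical.

```python
from fractions import Fraction


def field_strength(E, B):
    """F_{mu nu} with the assumed convention F_{0i} = -E_i, F_{ij} = eps_{ijk} B_k."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0, -Ex, -Ey, -Ez],
            [Ex, 0, Bz, -By],
            [Ey, -Bz, 0, Bx],
            [Ez, By, -Bx, 0]]


def fields(F):
    """Read E and B back off the components of F_{mu nu}."""
    E = (F[1][0], F[2][0], F[3][0])
    B = (F[2][3], F[3][1], F[1][2])
    return E, B


def inverse_boost_x(v, gamma):
    """Matrix L[mu][mu'] = Lambda^mu_{mu'} for a boost with velocity v along x."""
    return [[gamma, gamma * v, 0, 0],
            [gamma * v, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def transform_lower2(T, L):
    """T_{mu' nu'} = Lambda^mu_{mu'} Lambda^nu_{nu'} T_{mu nu}."""
    n = len(T)
    return [[sum(L[m][a] * L[k][b] * T[m][k]
                 for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]


v, gamma = Fraction(3, 5), Fraction(5, 4)  # gamma = 1 / sqrt(1 - v**2)
F = field_strength(E=(0, 1, 0), B=(0, 0, 0))  # pure electric field along y
F_primed = transform_lower2(F, inverse_boost_x(v, gamma))
E_primed, B_primed = fields(F_primed)

print(E_primed)  # E_y is scaled by gamma
print(B_primed)  # a magnetic field along z appears in the moving frame
```

A field that is purely electric in one frame acquires a magnetic part in another, which is the sense in which E and B are not separately vectors under the full Lorentz group.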

1.7 ■ MANIPULATING TENSORS

With these examples in hand we can now be a little more systematic about some properties of tensors. First consider the operation of contraction, which turns a (k, l) tensor into a (k - 1, l - 1) tensor. Contraction proceeds by summing over one upper and one lower index:

\[ S^{\mu\rho}{}_\sigma = T^{\mu\nu\rho}{}_{\sigma\nu}. \tag{1.70} \]

You can check that the result is a well-defined tensor. It is only permissible to contract an upper index with a lower index (as opposed to two indices of the same type); otherwise the result would not be a well-defined tensor. (By well-defined tensor we mean either "transforming according to the tensor transformation law," or "defining a unique multilinear map from a set of vectors and dual vectors to the real numbers"; take your pick.) Note also that the order of the indices matters, so that you can get different tensors by contracting in different ways; thus,

\[ T^{\mu\nu\rho}{}_{\sigma\nu} \neq T^{\mu\rho\nu}{}_{\sigma\nu} \tag{1.71} \]

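in general.
The metric and inverse metric can be used to raise and lower indices on tensors. That is, given a tensor $T^{\alpha\beta}{}_{\gamma\delta}$, we can use the metric to define new tensors, which we choose to denote by the same letter $T$:

\[ \begin{aligned} T^{\alpha\beta\mu}{}_{\delta} &= \eta^{\mu\gamma}\, T^{\alpha\beta}{}_{\gamma\delta}, \\ T_\mu{}^\beta{}_{\gamma\delta} &= \eta_{\mu\alpha}\, T^{\alpha\beta}{}_{\gamma\delta}, \\ T_{\mu\nu}{}^{\rho\sigma} &= \eta_{\mu\alpha}\eta_{\nu\beta}\eta^{\rho\gamma}\eta^{\sigma\delta}\, T^{\alpha\beta}{}_{\gamma\delta}, \end{aligned} \tag{1.72} \]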

and so forth. Notice that raising and lowering does not change the position of an index relative to other indices, and also that free indices (which are not summed over) must be the same on both sides of an equation, while dummy indices (which are summed over) only appear on one side. As an example, we can turn vectors and dual vectors into each other by raising and lowering indices:

\[ V_\mu = \eta_{\mu\nu} V^\nu, \qquad \omega^\mu = \eta^{\mu\nu}\omega_\nu. \tag{1.73} \]

Because the metric and inverse metric are truly inverses of each other, we are free to raise and lower simultaneously a pair of indices being contracted over:

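\[ A^\lambda B_\lambda = \eta^{\lambda\rho} A_\rho\, \eta_{\lambda\sigma} B^\sigma = \delta^\rho_\sigma A_\rho B^\sigma = A_\sigma B^\sigma. \tag{1.74} \]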

The ability to raise and lower indices with a metric explains why the gradient in three-dimensional flat Euclidean space is usually thought of as an ordinary vector, even though we have seen that it arises as a dual vector; in Euclidean space (where the metric is diagonal with all entries +1) a dual vector is turned into a vector with precisely the same components when we raise its index. You may then wonder why we have belabored the distinction at all. One simple reason, of course, is that in a Lorentzian spacetime the components are not equal:
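\[ \omega_\mu = (\omega_0, \omega_1, \omega_2, \omega_3), \qquad \omega^\mu = (-\omega_0, \omega_1, \omega_2, \omega_3). \tag{1.75} \]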
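Raising and lowering with the Minkowski metric amounts to flipping the sign of the time component, which a short sketch makes concrete. It assumes inertial coordinates with metric diag(-1, +1, +1, +1); the function names and the sample components are hypothetical.

```python
ETA_DIAG = [-1, 1, 1, 1]  # eta_{mu nu} and eta^{mu nu} share these diagonal entries


def lower(V):
    """V_mu = eta_{mu nu} V^nu."""
    return [e * v for e, v in zip(ETA_DIAG, V)]


def raise_index(omega):
    """omega^mu = eta^{mu nu} omega_nu."""
    return [e * w for e, w in zip(ETA_DIAG, omega)]


def contract(upper, lowered):
    """Sum over one upper and one lower index."""
    return sum(u * l for u, l in zip(upper, lowered))


omega_lower = [5, 1, 2, 3]
omega_upper = raise_index(omega_lower)
print(omega_upper)  # only the time component changes sign

# Raising one index and lowering the other leaves a contraction unchanged.
A_upper = [2, 0, 1, 4]
B_lower = [3, 1, 0, 2]
print(contract(A_upper, B_lower) == contract(raise_index(B_lower), lower(A_upper)))
```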
In a curved spacetime, where the form of the metric is generally more complicated, the difference is rather more dramatic. But there is a deeper reason, namely that tensors generally have a "natural" definition independent of the metric. Even though we will always have a metric available, it is helpful to be aware of the logical status of each mathematical object we introduce. The gradient, with its action on vectors, is perfectly well-defined regardless of any metric, whereas the "gradient with upper indices" is not. (As an example, we will eventually want to take variations of functionals with respect to the metric, and will therefore have to know exactly how the functional depends on the metric, something that is easily obscured by the index notation.)
Continuing our compilation of tensor jargon, we refer to a tensor as symmetric in any of its indices if it is unchanged under exchange of those indices. Thus, if
\[ S_{\mu\nu\rho} = S_{\nu\mu\rho}, \tag{1.76} \]
we say that $S_{\mu\nu\rho}$ is symmetric in its first two indices, while if
\[ S_{\mu\nu\rho} = S_{\mu\rho\nu} = S_{\rho\mu\nu} = S_{\nu\mu\rho} = S_{\nu\rho\mu} = S_{\rho\nu\mu}, \tag{1.77} \]
we say that $S_{\mu\nu\rho}$ is symmetric in all three of its indices. Similarly, a tensor is antisymmetric (or skew-symmetric) in any of its indices if it changes sign when those indices are exchanged; thus,
\[ A_{\mu\nu\rho} = -A_{\rho\nu\mu} \tag{1.78} \]
means that $A_{\mu\nu\rho}$ is antisymmetric in its first and third indices (or just "antisymmetric in $\mu$ and $\rho$"). If a tensor is (anti-) symmetric in all of its indices, we refer to it as simply (anti-) symmetric (sometimes with the redundant modifier "completely"). As examples, the metric $\eta_{\mu\nu}$ and the inverse metric $\eta^{\mu\nu}$ are symmetric, while the Levi-Civita symbol $\tilde{\epsilon}_{\mu\nu\rho\sigma}$ and the electromagnetic field strength tensor $F_{\mu\nu}$ are antisymmetric. (Check for yourself that if you raise or lower a set of indices that are symmetric or antisymmetric, they remain that way.) Notice that it makes no sense to exchange upper and lower indices with each other, so don't succumb to the temptation to think of the Kronecker delta $\delta^\mu_\nu$ as symmetric. On the other hand, the fact that lowering an index on $\delta^\mu_\nu$ gives a symmetric tensor (in fact, the metric) means that the order of indices doesn't really matter, which is why we don't keep track of index placement for this one tensor.

Given any tensor, we can symmetrize (or antisymmetrize) any number of its upper or lower indices. To symmetrize, we take the sum of all permutations of the relevant indices and divide by the number of terms:

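\[ T_{(\mu_1\mu_2\cdots\mu_n)\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{sum over permutations of indices } \mu_1 \cdots \mu_n\right), \tag{1.79} \]

while antisymmetrization comes from the alternating sum:

\[ T_{[\mu_1\mu_2\cdots\mu_n]\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{alternating sum over permutations of indices } \mu_1 \cdots \mu_n\right). \tag{1.80} \]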

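By "alternating sum" we mean that permutations that are the result of an odd number of exchanges are given a minus sign, thus:

\[ T_{[\mu\nu\rho]\sigma} = \frac{1}{6}\left(T_{\mu\nu\rho\sigma} - T_{\mu\rho\nu\sigma} + T_{\rho\mu\nu\sigma} - T_{\nu\mu\rho\sigma} + T_{\nu\rho\mu\sigma} - T_{\rho\nu\mu\sigma}\right). \tag{1.81} \]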

Notice that round/square brackets denote symmetrization/antisymmetrization. Furthermore, we may sometimes want to (anti-) symmetrize indices that are not next to each other, in which case we use vertical bars to denote indices not included in the sum:

\[ T_{(\mu|\nu|\rho)} = \frac{1}{2}\left(T_{\mu\nu\rho} + T_{\rho\nu\mu}\right). \tag{1.82} \]
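The symmetrization and antisymmetrization rules, including the factor of 1/n!, can be sketched with tensors stored as dictionaries from index tuples to components. This is only an illustration: the storage format, the function names, and the sample components are assumptions, and all indices of the tensor are (anti)symmetrized at once. It also compares the sum of the two parts with the original tensor for two and for three indices.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def parity(perm):
    """+1 for an even permutation of range(n), -1 for an odd one."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def project(T, rank, dim, alternating):
    """(Anti)symmetrize all indices of T, given as {index tuple: integer component}."""
    out = {}
    for idx in product(range(dim), repeat=rank):
        total = 0
        for perm in permutations(range(rank)):
            sign = parity(perm) if alternating else 1
            total += sign * T[tuple(idx[p] for p in perm)]
        out[idx] = Fraction(total, factorial(rank))
    return out


# Two indices: the symmetric and antisymmetric parts add up to the tensor.
T2 = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 3}
sym2 = project(T2, rank=2, dim=2, alternating=False)
anti2 = project(T2, rank=2, dim=2, alternating=True)
recombined2 = {idx: sym2[idx] + anti2[idx] for idx in T2}
print(recombined2 == T2)

# Three indices: a single nonzero component is enough to see the mismatch.
T3 = {idx: 0 for idx in product(range(2), repeat=3)}
T3[(0, 0, 1)] = 1
sym3 = project(T3, rank=3, dim=2, alternating=False)
anti3 = project(T3, rank=3, dim=2, alternating=True)
print(sym3[(0, 0, 1)] + anti3[(0, 0, 1)] == T3[(0, 0, 1)])
```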

If we are contracting over a pair of upper indices that are symmetric on one tensor, only the symmetric part of the lower indices will contribute; thus,

\[ X^{(\mu\nu)} Y_{\mu\nu} = X^{(\mu\nu)} Y_{(\mu\nu)}, \tag{1.83} \]

regardless of the symmetry properties of $Y_{\mu\nu}$. (Analogous statements hold for antisymmetric indices, or if it's the lower indices that are symmetric to start with.) For any two indices, we can decompose a tensor into symmetric and antisymmetric parts,

\[ T_{\mu\nu} = T_{(\mu\nu)} + T_{[\mu\nu]}, \tag{1.84} \]

but this will not in general hold for three or more indices,

\[ T_{\mu\nu\rho} \neq T_{(\mu\nu\rho)} + T_{[\mu\nu\rho]}, \tag{1.85} \]

because there are parts with mixed symmetry that are not specified by either the symmetric or antisymmetric pieces. Finally, some people use a convention in which the factor of $1/n!$ is omitted. The one used here is a good one, since, for example, a symmetric tensor satisfies

\[ S_{\mu_1 \cdots \mu_n} = S_{(\mu_1 \cdots \mu_n)}, \tag{1.86} \]

and likewise for antisymmetric tensors.
For a (1, 1) tensor $X^\mu{}_\nu$, the trace is a scalar, often denoted by leaving off the indices, which is simply the contraction:

\[ X = X^\lambda{}_\lambda. \tag{1.87} \]

If we think of $X^\mu{}_\nu$ as a matrix, this is just the sum of the diagonal components, so it makes sense. However, we will also use trace in the context of a (0, 2) tensor $Y_{\mu\nu}$, in which case it means that we should first raise an index ($Y^\mu{}_\nu = \eta^{\mu\lambda} Y_{\lambda\nu}$) and then contract:

\[ Y = Y^\lambda{}_\lambda = \eta^{\mu\nu} Y_{\mu\nu}. \tag{1.88} \]

(It must be this way, since we cannot sum over two lower indices.) Although this is the sum of the diagonal components of $Y^\mu{}_\nu$, it is certainly not the sum of the diagonal components of $Y_{\mu\nu}$; we had to raise an index, which in general will change the numerical value of the components. For example, you might guess that the trace of the metric is $-1 + 1 + 1 + 1 = 2$, but it's not:

\[ \eta^{\mu\nu}\eta_{\mu\nu} = \delta^\mu_\mu = 4. \tag{1.89} \]

(In $n$ dimensions, $\delta^\mu_\mu = n$.) There is no reason to denote this trace by $g$ (or $\delta$), since it will always be the same number, even after we make the transition to curved spaces where the metric components are more complicated. Note that antisymmetric (0, 2) tensors are always traceless.
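The difference between the trace and the naive sum of diagonal components of a (0, 2) tensor can be seen in a short sketch. It assumes the flat metric diag(-1, +1, +1, +1), whose inverse has the same components; the function names and the sample antisymmetric tensor are hypothetical.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ETA_INV = ETA  # same components in inertial coordinates in flat spacetime


def trace(Y):
    """Trace of a (0, 2) tensor: eta^{mu nu} Y_{mu nu}."""
    return sum(ETA_INV[m][n] * Y[m][n] for m in range(4) for n in range(4))


def diagonal_sum(Y):
    """Naive sum of the diagonal components Y_{mu mu}, which is not the trace."""
    return sum(Y[m][m] for m in range(4))


# A hypothetical antisymmetric (0, 2) tensor.
A = [[0, -1, -2, -3],
     [1, 0, 6, -5],
     [2, -6, 0, 4],
     [3, 5, -4, 0]]

print(diagonal_sum(ETA), trace(ETA))  # the naive guess versus the actual trace
print(trace(A))                       # antisymmetric tensors are traceless
```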
We have been careful so far to distinguish clearly between things that are always true (on a manifold with arbitrary metric) and things that are only true in Minkowski space in inertial coordinates. One of the most important distinctions arises with partial derivatives. If we are working in flat spacetime with inertial coordinates, then the partial derivative of a (k, l) tensor is a (k, l + 1) tensor; that is,

\[ T_\alpha{}^\mu{}_\nu = \partial_\alpha R^\mu{}_\nu \tag{1.90} \]

transforms properly under Lorentz transformations. However, this will no longer be true in more general spacetimes, and we will have to define a covariant derivative to take the place of the partial derivative. Nevertheless, we can still use the fact that partial derivatives give us tensors in this special case, as long as we keep our wits about us. [The one exception to this warning is the partial derivative of a scalar, $\partial_\alpha\phi$, which is a perfectly good tensor (the gradient) in any spacetime.] Of course, if we fix a particular coordinate system, the partial derivative is a perfectly good operator, which we will use all the time; its failure is only that it doesn't transform in the same way as the tensors we will be using (or equivalently, that the map it defines is not coordinate-independent). One of the most useful properties of partial derivatives is that they commute,
\[ \partial_\mu\partial_\nu(\cdots) = \partial_\nu\partial_\mu(\cdots), \tag{1.91} \]
no matter what kind of object is being differentiated.

1.8 ■ MAXWELL'S EQUATIONS

We have now accumulated enough tensor know-how to illustrate some of these concepts using actual physics. Specifically, we will examine Maxwell's equations of electrodynamics. In 19th-century notation, these are

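\[ \begin{aligned} \nabla \times \mathbf{B} - \partial_t \mathbf{E} &= \mathbf{J} \\ \nabla \cdot \mathbf{E} &= \rho \\ \nabla \times \mathbf{E} + \partial_t \mathbf{B} &= 0 \\ \nabla \cdot \mathbf{B} &= 0. \end{aligned} \tag{1.92} \]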

Here, $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic field 3-vectors, $\mathbf{J}$ is the current, $\rho$ is the charge density, and $\nabla\times$ and $\nabla\cdot$ are the conventional curl and divergence. These equations are invariant under Lorentz transformations, of course; that's how the whole business got started. But they don't look obviously invariant; our tensor notation can fix that. Let's begin by writing these equations in component notation,

\[ \begin{aligned} \epsilon^{ijk}\partial_j B_k - \partial_0 E^i &= J^i \\ \partial_i E^i &= J^0 \\ \epsilon^{ijk}\partial_j E_k + \partial_0 B^i &= 0 \\ \partial_i B^i &= 0. \end{aligned} \tag{1.93} \]

In these expressions, spatial indices have been raised and lowered with abandon, without any attempt to keep straight where the metric appears, because $\delta_{ij}$ is the metric on flat 3-space, with $\delta^{ij}$ its inverse (they are equal as matrices). We can therefore raise and lower indices at will, since the components don't change. Meanwhile, the three-dimensional Levi-Civita symbol $\epsilon^{ijk}$ is defined just as the four-dimensional one, although with one fewer index (normalized so that $\epsilon^{123} = \epsilon_{123} = 1$). We have replaced the charge density by $J^0$; this is legitimate because the density and current together form the current 4-vector, $J^\mu = (\rho, J^x, J^y, J^z)$.
From (1.93), and the definition (1.69) of the field strength tensor $F_{\mu\nu}$, it is easy to get a completely tensorial 20th-century version of Maxwell's equations. Begin by noting that we can express the field strength with upper indices as

\[ F^{0i} = E^i, \qquad F^{ij} = \epsilon^{ijk} B_k. \tag{1.94} \]

To check this, note for example that $F^{01} = \eta^{00}\eta^{11}F_{01}$ and $F^{12} = \epsilon^{123}B_3$. Then the first two equations in (1.93) become

\[ \partial_j F^{ij} - \partial_0 F^{0i} = J^i, \qquad \partial_i F^{0i} = J^0. \tag{1.95} \]

Using the antisymmetry of $F^{\mu\nu}$, we see that these may be combined into the single tensor equation

\[ \partial_\mu F^{\nu\mu} = J^\nu. \tag{1.96} \]

A similar line of reasoning, which is left as an exercise, reveals that the third and fourth equations in (1.93) can be written

\[ \partial_{[\mu} F_{\nu\lambda]} = 0. \tag{1.97} \]

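It's simple to verify that the antisymmetry of $F_{\mu\nu}$ implies that (1.97) can be equivalently expressed as

\[ \partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} + \partial_\lambda F_{\mu\nu} = 0. \tag{1.98} \]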
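The verification can be spelled out as a short worked step. The sketch below assumes the antisymmetrization convention with the factor of 1/n! introduced in Section 1.7, and uses only $F_{\mu\nu} = -F_{\nu\mu}$ to pair up the six terms.

```latex
\begin{aligned}
\partial_{[\mu} F_{\nu\lambda]}
 &= \frac{1}{6}\left(\partial_\mu F_{\nu\lambda} - \partial_\mu F_{\lambda\nu}
   + \partial_\nu F_{\lambda\mu} - \partial_\nu F_{\mu\lambda}
   + \partial_\lambda F_{\mu\nu} - \partial_\lambda F_{\nu\mu}\right) \\
 &= \frac{1}{3}\left(\partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu}
   + \partial_\lambda F_{\mu\nu}\right).
\end{aligned}
```

Each odd permutation contributes the same term as the even permutation it is paired with, so the antisymmetrized expression vanishes exactly when the cyclic sum does.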

The four traditional Maxwell equations are thus replaced by two, vividly demonstrating the economy of tensor notation. More importantly, however, both sides of equations (1.96) and (1.97) manifestly transform as tensors; therefore, if they are true in one inertial frame, they must be true in any Lorentz-transformed frame. This is why tensors are so useful in relativity: we often want to express relationships without recourse to any reference frame, and the quantities on each side of an equation must transform in the same way under changes of coordinates. As a matter of jargon, we will sometimes refer to quantities written in terms of tensors as covariant (which has nothing to do with "covariant" as opposed to "contravariant"). Thus, we say that (1.96) and (1.97) together serve as the covariant form of Maxwell's equations, while (1.92) or (1.93) are noncovariant.

1.9 ■ ENERGY AND MOMENTUM
We've now gone over essentially everything there is to know about the care and feeding of tensors. In the next chapter we will look more carefully at the rigorous definitions of manifolds and tensors, but the basic mechanics have been pretty well covered. Before jumping to more abstract mathematics, let's review how physics works in Minkowski spacetime.
Start with the worldline of a single particle. This is specified by a map $\mathbf{R} \to M$, where $M$ is the manifold representing spacetime; we usually think of the path as a parameterized curve $x^\mu(\lambda)$. As mentioned earlier, the tangent vector to this path is $dx^\mu/d\lambda$ (note that it depends on the parameterization). An object of primary interest is the norm of the tangent vector, which serves to characterize the path; if the tangent vector is timelike/null/spacelike at some parameter value $\lambda$, we say that the path is timelike/null/spacelike at that point. This explains why the same words are used to classify vectors in the tangent space and intervals between two points: a straight line connecting, say, two timelike separated points will itself be timelike at every point along the path.
Nevertheless, be aware of the sleight of hand being pulled here. The metric, as a (0, 2) tensor, is a machine that acts on two vectors (or two copies of the same vector) to produce a number. It is therefore very natural to classify tangent vectors according to the sign of their norm. But the interval between two points isn't something quite so natural; it depends on a specific choice of path (a "straight line") that connects the points, and this choice in turn depends on the fact that spacetime is flat (which allows a unique choice of straight line between the points).
Let's move from the consideration of paths in general to the paths of massive particles (which will always be timelike). Since the proper time is measured by a clock traveling on a timelike worldline, it is convenient to use $\tau$ as the parameter along the path. That is, we use (1.22) to compute $\tau(\lambda)$, which (if $\lambda$ is a good parameter in the first place) we can invert to obtain $\lambda(\tau)$, after which we can think of the path as $x^\mu(\tau)$. The tangent vector in this parameterization is known as the four-velocity, $U^\mu$:

\[ U^\mu = \frac{dx^\mu}{d\tau}. \tag{1.99} \]

Since $d\tau^2 = -\eta_{\mu\nu}dx^\mu dx^\nu$, the four-velocity is automatically normalized:

\[ \eta_{\mu\nu} U^\mu U^\nu = -1. \tag{1.100} \]

This absolute normalization is a reflection of the fact that the four-velocity is not a velocity through space, which can of course take on different magnitudes, but a "velocity through spacetime," through which one always travels at the same rate. The norm of the four-velocity will always be negative, since we are only defining it for timelike trajectories. You could define an analogous vector for spacelike paths as well; for null paths the proper time vanishes, so $\tau$ can't be used as a parameter, and you have to be more careful. In the rest frame of a particle, its four-velocity has components $U^\mu = (1, 0, 0, 0)$.
A related vector is the momentum four-vector, defined by
\[ p^\mu = mU^\mu, \tag{1.101} \]

where $m$ is the mass of the particle. The mass is a fixed quantity independent of inertial frame, what you may be used to thinking of as the "rest mass." It turns out to be much more convenient to take this as the mass once and for all, rather than thinking of mass as depending on velocity. The energy of a particle is simply $E = p^0$, the timelike component of its momentum vector. Since it's only one component of a four-vector, it is not invariant under Lorentz transformations; that's to be expected, however, since the energy of a particle at rest is not the same as that of the same particle in motion. In the particle's rest frame we have $p^0 = m$; recalling that we have set $c = 1$, we see that we have found the equation that made Einstein a celebrity, $E = mc^2$. (The field equation of general relativity is actually more fundamental than this one, but $R_{\mu\nu} - \frac{1}{2}Rg_{\mu\nu} = 8\pi G T_{\mu\nu}$ doesn't elicit the visceral reaction that you get from $E = mc^2$.) In a moving frame we can find the components of $p^\mu$ by performing a Lorentz transformation; for a particle moving with three-velocity $v = dx/dt$ along the x axis we have

\[ p^\mu = (\gamma m, v\gamma m, 0, 0), \tag{1.102} \]

where $\gamma = 1/\sqrt{1 - v^2}$. For small $v$, this gives $p^0 = m + \frac{1}{2}mv^2$ (what we usually think of as rest energy plus kinetic energy) and $p^1 = mv$ (what we usually think of as Newtonian momentum). Outside this approximation, we can simply write

\[ p_\mu p^\mu = -m^2, \tag{1.103} \]

or

\[ E = \sqrt{m^2 + \mathbf{p}^2}, \tag{1.104} \]
where $\mathbf{p}^2 = \delta_{ij}p^i p^j$.
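The normalization of the four-velocity, the mass-shell relation, and the low-velocity limit can all be checked numerically. The sketch below assumes units with c = 1, motion along the x axis, and the metric diag(-1, +1, +1, +1); the function names and the sample mass and velocities are hypothetical.

```python
import math


def gamma_factor(v):
    """Lorentz factor 1 / sqrt(1 - v^2) in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def four_velocity(v):
    """U^mu = gamma (1, v, 0, 0) for motion along the x axis."""
    g = gamma_factor(v)
    return [g, g * v, 0.0, 0.0]


def four_momentum(m, v):
    """p^mu = m U^mu."""
    return [m * u for u in four_velocity(v)]


def minkowski_norm(V):
    """eta_{mu nu} V^mu V^nu with signature (-,+,+,+)."""
    return -V[0] ** 2 + V[1] ** 2 + V[2] ** 2 + V[3] ** 2


m, v = 2.0, 0.6
U = four_velocity(v)
p = four_momentum(m, v)
energy = p[0]
spatial_p2 = p[1] ** 2 + p[2] ** 2 + p[3] ** 2

print(minkowski_norm(U))                     # close to -1
print(minkowski_norm(p), -m * m)             # close to each other
print(energy, math.sqrt(m * m + spatial_p2)) # close to each other

# Low-velocity limit for a unit mass: kinetic energy and momentum.
slow = four_momentum(1.0, 1e-3)
kinetic = slow[0] - 1.0
print(kinetic, 0.5 * 1e-3 ** 2)
```

The comparisons are only approximate because of floating-point rounding, and in the last line also because of the neglected terms of order $v^4$.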
The centerpiece of pre-relativity physics is Newton's Second Law, or $\mathbf{f} = m\mathbf{a} = d\mathbf{p}/dt$. An analogous equation should hold in SR, and the requirement that it be tensorial leads us directly to introduce a force four-vector $f^\mu$ satisfying

\[ f^\mu = m\frac{d^2}{d\tau^2}x^\mu(\tau) = \frac{d}{d\tau}p^\mu(\tau). \tag{1.105} \]
The simplest example of a force in Newtonian physics is the force due to gravity. In relativity, however, gravity is not described by a force, but rather by the curvature of spacetime itself. Instead, let us consider electromagnetism. The three-dimensional Lorentz force is given by $\mathbf{f} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$, where $q$ is the charge on the particle. We would like a tensorial generalization of this equation. There turns out to be a unique answer:
\[ f^\mu = q\,F^\mu{}_\lambda U^\lambda. \tag{1.106} \]
You can check for yourself that this reduces to the Newtonian version in the limit of small velocities. Notice how the requirement that the equation be tensorial, which is one way of guaranteeing Lorentz invariance, severely restricts the possible expressions we can get. This is an example of a very general phenomenon, in which a small number of an apparently endless variety of possible physical laws are picked out by the demands of symmetry.
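The reduction to the three-dimensional Lorentz force can be checked numerically. The sketch below rests on stated assumptions: the field-strength components $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$, the metric diag(-1, +1, +1, +1), and the force written as $f^\mu = q\,F^\mu{}_\lambda U^\lambda$ with $F^\mu{}_\lambda = \eta^{\mu\alpha}F_{\alpha\lambda}$. The overall sign of the tensorial expression depends on these conventions, and the function names and sample values are hypothetical.

```python
import math

ETA_DIAG = [-1, 1, 1, 1]  # diagonal of eta^{mu nu}


def field_strength(E, B):
    """F_{mu nu} with the assumed convention F_{0i} = -E_i, F_{ij} = eps_{ijk} B_k."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0, -Ex, -Ey, -Ez],
            [Ex, 0, Bz, -By],
            [Ey, -Bz, 0, Bx],
            [Ez, By, -Bx, 0]]


def four_velocity(v):
    """U^mu = gamma (1, v_x, v_y, v_z) in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]


def four_force(q, U, F):
    """f^mu = q eta^{mu alpha} F_{alpha lambda} U^lambda (assumed index placement)."""
    return [q * ETA_DIAG[m] * sum(F[m][l] * U[l] for l in range(4))
            for m in range(4)]


def cross(a, b):
    """Ordinary three-dimensional cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


q = 2.0
E = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
v = [0.5, 0.0, 0.0]

U = four_velocity(v)
gamma = U[0]
f = four_force(q, U, field_strength(E, B))
three_force = [q * (e + c) for e, c in zip(E, cross(v, B))]

# Spatial part: f^i = gamma * q (E + v x B)^i; gamma -> 1 for small velocities.
print(f[1:], [gamma * c for c in three_force])
# Time part: f^0 = gamma * q E.v, the rate at which the field does work.
print(f[0], gamma * q * sum(e * c for e, c in zip(E, v)))
```

Since $f^i = dp^i/d\tau = \gamma\, dp^i/dt$, the factor of γ cancels and $d\mathbf{p}/dt = q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$ holds at any velocity under these assumptions.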
Although $p^\mu$ provides a complete description of the energy and momentum of an individual particle, we often need to deal with extended systems comprised of huge numbers of particles. Rather than specify the individual momentum vectors of each particle, we instead describe the system as a fluid, that is, a continuum characterized by macroscopic quantities such as density, pressure, entropy, viscosity, and so on. Although such a fluid may be composed of many individual particles with different four-velocities, the fluid itself has an overall four-velocity field. Just think of everyday fluids like air or water, where it makes sense to define a velocity for each individual fluid element even though nearby molecules may have appreciable relative velocities.
A single momentum four-vector field is insufficient to describe the energy and momentum of a fluid; we must go further and define the energy-momentum tensor (sometimes called the stress-energy tensor), $T^{\mu\nu}$. This symmetric (2, 0) tensor tells us all we need to know about the energy-like aspects of a system: energy density, pressure, stress, and so forth. A general definition of $T^{\mu\nu}$ is "the flux of four-momentum $p^\mu$ across a surface of constant $x^\nu$." In fact, this definition is not going to be incredibly useful; in Chapter 4 we will define the energy-momentum tensor in terms of a functional derivative of the action with respect to the metric, which will be a more algorithmic procedure for finding an explicit expression for $T^{\mu\nu}$. But the definition here does afford some physical insight. Consider an infinitesimal element of the fluid in its rest frame, where there are no bulk motions. Then $T^{00}$, the "flux of $p^0$ (energy) in the $x^0$ (time) direction," is simply the rest-frame energy density $\rho$. Similarly, in this frame, $T^{0i} = T^{i0}$ is the momentum density. The spatial components $T^{ij}$ are the momentum flux, or the stress; they represent the forces between neighboring infinitesimal elements of the fluid. Off-diagonal terms in $T^{ij}$ represent shearing terms, such as those due to viscosity. A diagonal term such as $T^{11}$ gives the x-component of the force being exerted (per unit area) by a fluid element in the x-direction; this is what we think of as the x-component of the pressure, $p_x$ (don't confuse it with the momentum). The pressure has three components, given in the fluid rest frame (in inertial coordinates) by

\[ p_i = T^{ii}. \tag{1.107} \]

There is no sum over $i$.
To make this more concrete, let's start with the simple example of dust. (Cosmologists tend to use "matter" as a synonym for dust.) Dust may be defined in flat spacetime as a collection of particles at rest with respect to each other. The four-velocity field $U^\mu(x)$ is clearly going to be the constant four-velocity of the individual particles. Indeed, its components will be the same at each point. Define the number-flux four-vector to be

\[ N^\mu = nU^\mu, \tag{1.108} \]

where $n$ is the number density of the particles as measured in their rest frame. (This doesn't sound coordinate-invariant, but it is; in any frame, the number density that would be measured if you were in the rest frame is a fixed quantity.) Then $N^0$ is the number density of particles as measured in any other frame, while $N^i$ is the flux of particles in the $x^i$ direction. Let's now imagine that each of the particles has the same mass $m$. Then in the rest frame the energy density of the dust is given by

\[ \rho = mn. \tag{1.109} \]

By definition, the energy density completely specifies the dust. But $\rho$ only measures the energy density in the rest frame; what about other frames? We notice that both $n$ and $m$ are 0-components of four-vectors in their rest frame; specifically, $N^\mu = (n, 0, 0, 0)$ and $p^\mu = (m, 0, 0, 0)$. Therefore $\rho$ is the $\mu = 0$, $\nu = 0$ component of the tensor $p \otimes N$ as measured in its rest frame. We are therefore led to define the energy-momentum tensor for dust:

\[ T^{\mu\nu}_{\text{dust}} = p^\mu N^\nu = mnU^\mu U^\nu = \rho U^\mu U^\nu, \tag{1.110} \]

where $\rho$ is defined as the energy density in the rest frame. (Typically you don't just guess energy-momentum tensors by such a procedure, you derive them from equations of motion or an action principle.) Note that the pressure of the dust in any direction is zero; this should not be surprising, since pressure arises from the random motions of particles within the fluid, and we have defined dust to be free of such motions.
Dust is not sufficiently general to describe most of the interesting fluids that appear in general relativity; we only need a slight generalization, however, to arrive at the concept of a perfect fluid. A perfect fluid is one that can be completely specified by two quantities, the rest-frame energy density $\rho$, and an isotropic rest-frame pressure $p$. The single parameter $p$ serves to specify the pressure in every direction. A consequence of isotropy is that $T^{\mu\nu}$ is diagonal in its rest frame: there is no net flux of any component of momentum in an orthogonal direction. Furthermore, the nonzero spacelike components must all be equal, $T^{11} = T^{22} = T^{33}$. The only two independent numbers are therefore the energy density $\rho = T^{00}$ and the pressure $p = T^{ii}$; we don't need a subscript on $p$, since the pressure is equal in every direction. The energy-momentum tensor of a perfect fluid therefore takes the following form in its rest frame:
\[ T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}. \tag{1.111} \]
(Remember that we are in flat spacetime; this will change when curvature is introduced.) We would like, of course, a formula that is good in any frame. For dust we had $T^{\mu\nu} = \rho U^\mu U^\nu$, so we might begin by guessing $(\rho + p)U^\mu U^\nu$, which gives

\[ \begin{pmatrix} \rho + p & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{1.112} \]

This is not a very clever guess, to be honest. But by subtracting this guess from our desired answer, we see that what we need to add is

\[ \begin{pmatrix} -p & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}. \tag{1.113} \]

Fortunately, this has an obvious covariant generalization, namely $p\eta^{\mu\nu}$. Thus, the general form of the energy-momentum tensor for a perfect fluid is

\[ T^{\mu\nu} = (\rho + p)U^\mu U^\nu + p\eta^{\mu\nu}. \tag{1.114} \]

It may seem that the procedure used to arrive at this formula was somewhat arbitrary, but we can have complete confidence in the result. Given that (1.111) should be the form of $T^{\mu\nu}$ in the rest frame, and that (1.114) is a perfectly tensorial expression that reduces to (1.111) in the rest frame, we know that (1.114) must be the right expression in any frame.
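The argument above can be checked numerically. The following is a minimal sketch, assuming a boost along the $x$-axis and illustrative values for $\rho$, $p$, and $v$; the helper names (`perfect_fluid`, `boost_x`, `transform`) are hypothetical and not part of the text. It builds the rest-frame tensor, transforms it with the tensor transformation law, and compares the result with the covariant formula evaluated on the boosted four-velocity.

```python
import math

# Inverse metric eta^{mu nu} with signature (-+++); c = 1 throughout.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def perfect_fluid(rho, p, U):
    """T^{mu nu} = (rho + p) U^mu U^nu + p eta^{mu nu} for four-velocity U."""
    return [[(rho + p) * U[m] * U[n] + p * ETA[m][n] for n in range(4)]
            for m in range(4)]

def boost_x(v):
    """Lorentz boost matrix along x with velocity v."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(L, T):
    """Tensor transformation law: T'^{mu nu} = L^mu_a L^nu_b T^{ab}."""
    return [[sum(L[m][a] * L[n][b] * T[a][b]
                 for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

rho, p, v = 2.0, 0.5, 0.6          # illustrative values only
U_rest = [1.0, 0.0, 0.0, 0.0]      # fluid at rest
T_rest = perfect_fluid(rho, p, U_rest)

L = boost_x(v)
U_boosted = [sum(L[m][a] * U_rest[a] for a in range(4)) for m in range(4)]
T_from_formula = perfect_fluid(rho, p, U_boosted)
T_from_transform = transform(L, T_rest)

# Largest componentwise disagreement between the two routes.
max_diff = max(abs(T_from_formula[m][n] - T_from_transform[m][n])
               for m in range(4) for n in range(4))
```

In the rest frame `T_rest` is diagonal with entries $(\rho, p, p, p)$, and `max_diff` vanishes up to rounding, which is the numerical counterpart of the statement that a tensorial expression correct in one frame is correct in all frames.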
The concept of a perfect fluid is general enough to describe a wide variety of physical forms of matter. To determine the evolution of such a fluid, we specify an equation of state relating the pressure to the energy density, $p = p(\rho)$. Dust is a special case for which $p = 0$, while an isotropic gas of photons has $p = \frac{1}{3}\rho$. A more exotic example is vacuum energy, for which the energy-momentum tensor is proportional to the metric, $T^{\mu\nu} = -\rho_{\rm vac}\eta^{\mu\nu}$. By comparing to (1.114) we find that vacuum energy is a kind of perfect fluid for which $p_{\rm vac} = -\rho_{\rm vac}$. The notion of an energy density in vacuum is completely pointless in special relativity, since in nongravitational physics the absolute value of the energy doesn't matter, only the difference in energy between two states. In general relativity, however, all energy couples to gravity, so the possibility of a nonzero vacuum energy will become an important consideration, which we will discuss more fully in Chapter 4.
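As a small illustration of these equations of state, the sketch below assumes a linear relation $p = w\rho$; the parameter name `w` and the helper names are illustrative choices, not notation from the text. It checks that for $w = -1$ the perfect-fluid tensor is proportional to the metric no matter which four-velocity is used, so vacuum energy picks out no preferred rest frame.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]  # eta^{mu nu}, signature (-+++)

# Assumed linear equations of state p = w * rho for the three cases in the text.
EQUATION_OF_STATE = {"dust": 0.0, "radiation": 1.0 / 3.0, "vacuum": -1.0}

def pressure(rho, kind):
    """Pressure for a given energy density and named equation of state."""
    return EQUATION_OF_STATE[kind] * rho

def perfect_fluid(rho, p, U):
    """T^{mu nu} = (rho + p) U^mu U^nu + p eta^{mu nu}."""
    return [[(rho + p) * U[m] * U[n] + p * ETA[m][n] for n in range(4)]
            for m in range(4)]

rho_vac = 1.5                                # illustrative value
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
U_moving = [gamma, gamma * v, 0.0, 0.0]      # an arbitrary normalized four-velocity

T_vac = perfect_fluid(rho_vac, pressure(rho_vac, "vacuum"), U_moving)

# Deviation from -rho_vac * eta^{mu nu}; the U-dependent term drops out.
vacuum_deviation = max(abs(T_vac[m][n] + rho_vac * ETA[m][n])
                       for m in range(4) for n in range(4))
```

Because $\rho + p = 0$ for the vacuum case, the $U^\mu U^\nu$ term is multiplied by zero and `vacuum_deviation` is zero for any choice of `U_moving`.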
Besides being symmetric, $T^{\mu\nu}$ has the even more important property of being conserved. In this context, conservation is expressed as the vanishing of the "divergence":

$$\partial_\mu T^{\mu\nu} = 0. \tag{1.115}$$

This expression is a set of four equations, one for each value of $\nu$. The equation with $\nu = 0$ corresponds to conservation of energy, while $\partial_\mu T^{\mu k} = 0$ expresses conservation of the $k$th component of the momentum. Let's apply this equation to a perfect fluid, for which we have

$$\partial_\mu T^{\mu\nu} = \partial_\mu(\rho + p)\,U^\mu U^\nu + (\rho + p)\left(U^\nu \partial_\mu U^\mu + U^\mu \partial_\mu U^\nu\right) + \partial^\nu p. \tag{1.116}$$

To analyze what this equation means, it is helpful to consider separately what happens when we project it into pieces along and orthogonal to the four-velocity field $U^\mu$. We first note that the normalization $U_\nu U^\nu = -1$ implies the useful identity

$$U_\nu \partial_\mu U^\nu = \tfrac{1}{2}\partial_\mu(U_\nu U^\nu) = 0. \tag{1.117}$$

To project (1.116) along the four-velocity, simply contract it into $U_\nu$:

$$U_\nu \partial_\mu T^{\mu\nu} = -\partial_\mu(\rho U^\mu) - p\,\partial_\mu U^\mu. \tag{1.118}$$
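For readers who want the intermediate steps, here is one way to reach this result. It is a sketch that starts from the product-rule expansion of $\partial_\mu T^{\mu\nu}$ for $T^{\mu\nu} = (\rho + p)U^\mu U^\nu + p\eta^{\mu\nu}$ and uses only $U_\nu U^\nu = -1$ and $U_\nu \partial_\mu U^\nu = 0$:

```latex
\begin{align*}
U_\nu \partial_\mu T^{\mu\nu}
  &= \partial_\mu(\rho + p)\, U^\mu\, U_\nu U^\nu
   + (\rho + p)\left( U_\nu U^\nu\, \partial_\mu U^\mu
   + U^\mu\, U_\nu \partial_\mu U^\nu \right)
   + U_\nu \partial^\nu p \\
  &= -U^\mu \partial_\mu(\rho + p) - (\rho + p)\, \partial_\mu U^\mu
   + U^\mu \partial_\mu p \\
  &= -U^\mu \partial_\mu \rho - \rho\, \partial_\mu U^\mu - p\, \partial_\mu U^\mu \\
  &= -\partial_\mu(\rho U^\mu) - p\, \partial_\mu U^\mu .
\end{align*}
```

The pressure-gradient terms cancel between the first and last pieces, which is why only $\rho$ appears inside the total divergence.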

Setting this to zero gives the relativistic equation of energy conservation for a perfect fluid. It will look more familiar in the nonrelativistic limit, in which

$$U^\mu = (1, v^i), \qquad |v^i| \ll 1, \qquad p \ll \rho. \tag{1.119}$$

The last condition makes sense, because pressure comes from the random motions of the individual particles, and in this limit these motions (as well as the bulk motion described by $U^\mu$) are taken to be small. So in ordinary nonrelativistic language, (1.118) becomes

$$\partial_t \rho + \nabla \cdot (\rho \mathbf{v}) = 0, \tag{1.120}$$

the continuity equation for the energy density. We next consider the part of (1.116) that is orthogonal to the four-velocity. To project a vector orthogonal to $U^\mu$, we multiply it by the projection tensor

$$P^\sigma{}_\nu = \delta^\sigma_\nu + U^\sigma U_\nu. \tag{1.121}$$

To convince yourself this does the trick, check that if we have a vector $V^\mu_\parallel$ parallel to $U^\mu$, and another vector $W^\mu_\perp$ perpendicular to $U^\mu$, the projection tensor will annihilate the parallel vector and preserve the orthogonal one:

$$P^\sigma{}_\nu V^\nu_\parallel = 0, \qquad P^\sigma{}_\nu W^\nu_\perp = W^\sigma_\perp. \tag{1.122}$$
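The check suggested here is easy to carry out numerically. The sketch below assumes the projector has the form $\delta^\sigma_\nu + U^\sigma U_\nu$ with signature $(-+++)$, and uses an illustrative four-velocity along $x$; the helper names are hypothetical.

```python
import math

ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_{mu nu}, signature (-+++)

def lower(V):
    """Lower an index: V_mu = eta_{mu nu} V^nu."""
    return [ETA_DIAG[m] * V[m] for m in range(4)]

def dot(A, B):
    """Minkowski inner product eta_{mu nu} A^mu B^nu."""
    return sum(ETA_DIAG[m] * A[m] * B[m] for m in range(4))

def projector(U):
    """P^sigma_nu = delta^sigma_nu + U^sigma U_nu."""
    U_low = lower(U)
    return [[(1.0 if s == n else 0.0) + U[s] * U_low[n] for n in range(4)]
            for s in range(4)]

def apply(P, V):
    """Contract the projector with a vector: (P V)^sigma = P^sigma_nu V^nu."""
    return [sum(P[s][n] * V[n] for n in range(4)) for s in range(4)]

v = 0.6
g = 1.0 / math.sqrt(1.0 - v * v)
U = [g, g * v, 0.0, 0.0]                 # normalized: U.U = -1
V_parallel = [3.0 * u for u in U]        # parallel to U
W_perp = [g * v, g, 2.0, -1.0]           # chosen so that U.W = 0

P = projector(U)
projected_parallel = apply(P, V_parallel)
projected_perp = apply(P, W_perp)
```

Here `projected_parallel` is the zero vector and `projected_perp` reproduces `W_perp`, both up to rounding error.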

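Applied to $\partial_\mu T^{\mu\nu}$, we obtain

$$P^\sigma{}_\nu \partial_\mu T^{\mu\nu} = (\rho + p)U^\mu \partial_\mu U^\sigma + \partial^\sigma p + U^\sigma U^\mu \partial_\mu p. \tag{1.123}$$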

In the nonrelativistic limit given by (1.119), setting the spatial components of this expression equal to zero yields

$$\rho\left[\partial_t \mathbf{v} + (\mathbf{v} \cdot \nabla)\mathbf{v}\right] + \nabla p + \mathbf{v}\left(\partial_t p + \mathbf{v} \cdot \nabla p\right) = 0. \tag{1.124}$$

But notice that the last set of terms involves derivatives of $p$ times the three-velocity $\mathbf{v}$, assumed to be small; these will therefore be negligible compared to the $\nabla p$ term, and can be neglected. We are left with

$$\rho\left[\partial_t \mathbf{v} + (\mathbf{v} \cdot \nabla)\mathbf{v}\right] = -\nabla p, \tag{1.125}$$

which is the Euler equation familiar from fluid mechanics.

1.10 ■ CLASSICAL FIELD THEORY

When we make the transition from special relativity to general relativity, the metric $\eta_{\mu\nu}$ will be promoted to a dynamical tensor field, $g_{\mu\nu}(x)$. GR is thus a particular example of a classical field theory; we can build up some feeling for how such theories work by considering classical fields defined on flat spacetime. (We say classical field theory in contrast with quantum field theory, which is quite a different story; we will discuss it briefly in Chapter 9, but it is outside our main area of interest here.)
Let's begin with the familiar example of the classical mechanics of a single particle in one dimension with coordinate $q(t)$. We can derive the equations of motion for such a particle by using the "principle of least action": we search for critical points (as a function of the trajectory) of an action $S$, written as

$$S = \int dt\, L(q, \dot{q}), \tag{1.126}$$

where the function $L(q, \dot{q})$ is the Lagrangian. The Lagrangian in point-particle mechanics is typically of the form

$$L = K - V, \tag{1.127}$$

where $K$ is the kinetic energy and $V$ the potential energy. Following the calculus-of-variations procedure, which is described in any advanced textbook on classical mechanics, we show that critical points of the action [trajectories $q(t)$ for which $S$ remains stationary under small variations] are those that satisfy the Euler-Lagrange equations,

$$\frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = 0. \tag{1.128}$$

For example, $L = \frac{1}{2}\dot{q}^2 - V(q)$ leads to

$$\ddot{q} = -\frac{dV}{dq}. \tag{1.129}$$
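The statement that solutions of the equation of motion are critical points of the action can be illustrated numerically. The following is a minimal sketch, assuming a harmonic potential $V(q) = \frac{1}{2}q^2$ (so that $\ddot{q} = -q$), a midpoint discretization of the action, and a variation that vanishes at the endpoints; all names and numerical values are illustrative choices.

```python
import math

def action(path, dt, V):
    """Discretized S = sum over steps of dt * [ (1/2) qdot^2 - V(q) ],
    with q evaluated at the midpoint of each step."""
    S = 0.0
    for i in range(len(path) - 1):
        qdot = (path[i + 1] - path[i]) / dt
        qmid = 0.5 * (path[i + 1] + path[i])
        S += dt * (0.5 * qdot ** 2 - V(qmid))
    return S

def V(q):
    """Harmonic potential, for which the equation of motion is qddot = -q."""
    return 0.5 * q * q

N = 2000                       # number of time steps
T = 1.0                        # total time interval
dt = T / N
times = [i * dt for i in range(N + 1)]

classical = [math.sin(t) for t in times]            # solves qddot = -q
bump = [math.sin(math.pi * t / T) for t in times]   # vanishes at both endpoints

def delta_S(eps):
    """Change in the action when the classical path is shifted by eps * bump."""
    varied = [q + eps * b for q, b in zip(classical, bump)]
    return action(varied, dt, V) - action(classical, dt, V)

eps = 1e-3
# Antisymmetric part isolates the term linear in eps (zero at a critical point).
first_order = (delta_S(eps) - delta_S(-eps)) / (2 * eps)
# For a purely second-order change, doubling eps quadruples delta_S.
ratio = delta_S(2 * eps) / delta_S(eps)
```

The linear coefficient `first_order` is zero up to discretization error, and `ratio` is close to 4, so the action changes only at second order in the variation around the classical trajectory.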

Field theory is a similar story, except that we replace the single coordinate $q(t)$ by a set of spacetime-dependent fields, $\Phi^i(x^\mu)$, and the action $S$ becomes a functional of these fields. A functional is simply a function of an infinite number of variables, such as the values of a field in some region of spacetime. Functionals are often expressed as integrals. Each $\Phi^i$ is a function on spacetime (at least in some coordinate system), and $i$ is an index labeling our individual fields. For example, in electromagnetism (as we will see below) the fields are the four components of a one-form called the "vector potential," $A_\mu$:

$$\Phi^i = \{A_0, A_1, A_2, A_3\}. \tag{1.130}$$
We're being very lowbrow here, in thinking of a one-form field as four different functions rather than a single tensor object. This point of view makes sense so long as we stick to a fixed coordinate system, and it will make our calculations more straightforward.
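In field theory, the Lagrangian can be expressed as an integral over space of a Lagrange density $\mathcal{L}$, which is a function of the fields $\Phi^i$ and their spacetime derivatives $\partial_\mu \Phi^i$:

$$L = \int d^3x\, \mathcal{L}(\Phi^i, \partial_\mu \Phi^i). \tag{1.131}$$

So the action is

$$S = \int dt\, L = \int d^4x\, \mathcal{L}(\Phi^i, \partial_\mu \Phi^i). \tag{1.132}$$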

The Lagrange density is a Lorentz scalar. We typically just say "Lagrangian" when we mean "Lagrange density." It will most often be convenient to define a field theory by specifying the Lagrange density, from which all of the equations of motion can be readily derived.
We will use "natural units," in which not only $c = 1$ but also $\hbar = k = 1$, where $\hbar = h/2\pi$, $h$ is Planck's constant, and $k$ is Boltzmann's constant. The objection might be raised that we shouldn't involve $\hbar$ in a purely classical discussion; but all we are doing here is choosing units, not determining physics. (The relevance of $\hbar$ would appear if we were to quantize our field theory and obtain particles, but we won't get that far right now.) In natural units we have

$$[\text{energy}] = [\text{mass}] = [(\text{length})^{-1}] = [(\text{time})^{-1}]. \tag{1.133}$$

We will most often use energy or mass as our fundamental unit. Since the action is an integral of $L$ (with units of energy) over time, it is dimensionless:

$$[S] = [E][T] = M^0. \tag{1.134}$$

The volume element has units

$$[d^4x] = M^{-4}, \tag{1.135}$$

so to get a dimensionless action we require that the Lagrange density have units

$$[\mathcal{L}] = M^4. \tag{1.136}$$

The Euler-Lagrange equations come from requiring that the action be unchanged under small variations of the fields,

$$\Phi^i \to \Phi^i + \delta\Phi^i, \tag{1.137}$$

$$\partial_\mu \Phi^i \to \partial_\mu \Phi^i + \delta(\partial_\mu \Phi^i) = \partial_\mu \Phi^i + \partial_\mu(\delta\Phi^i). \tag{1.138}$$

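The expression for the variation in $\partial_\mu \Phi^i$ is simply the derivative of the variation of $\Phi^i$. Since $\delta\Phi^i$ is assumed to be small, we may Taylor-expand the Lagrangian under this variation:

$$\begin{aligned} \mathcal{L}(\Phi^i, \partial_\mu \Phi^i) &\to \mathcal{L}(\Phi^i + \delta\Phi^i, \partial_\mu \Phi^i + \partial_\mu \delta\Phi^i) \\ &= \mathcal{L}(\Phi^i, \partial_\mu \Phi^i) + \frac{\partial \mathcal{L}}{\partial \Phi^i}\delta\Phi^i + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\partial_\mu(\delta\Phi^i). \end{aligned} \tag{1.139}$$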

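Correspondingly, the action goes to $S \to S + \delta S$, with

$$\delta S = \int d^4x \left[\frac{\partial \mathcal{L}}{\partial \Phi^i}\delta\Phi^i + \frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\partial_\mu(\delta\Phi^i)\right]. \tag{1.140}$$

We would like to factor out $\delta\Phi^i$ from the integrand, by integrating the second term by parts:

$$\int d^4x\, \frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\partial_\mu(\delta\Phi^i) = -\int d^4x\, \partial_\mu\!\left(\frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\right)\delta\Phi^i + \int d^4x\, \partial_\mu\!\left(\frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\delta\Phi^i\right). \tag{1.141}$$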

The final term is a total derivative (the integral of something of the form $\partial_\mu V^\mu$) that can be converted to a surface term by Stokes's theorem (the four-dimensional version, that is; see Appendix E for a discussion). Since we are considering variational problems, we can choose to consider variations that vanish at the boundary (along with their derivatives). It is therefore traditional in such contexts to integrate by parts with complete impunity, always ignoring the boundary contributions. (Sometimes this is not okay, as in instanton calculations in Yang-Mills theory.)
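We are therefore left with

$$\delta S = \int d^4x \left[\frac{\partial \mathcal{L}}{\partial \Phi^i} - \partial_\mu\!\left(\frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\right)\right]\delta\Phi^i. \tag{1.142}$$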

The functional derivative $\delta S/\delta\Phi^i$ of a functional $S$ with respect to a function $\Phi^i$ is defined to satisfy

$$\delta S = \int d^4x\, \frac{\delta S}{\delta \Phi^i}\,\delta\Phi^i, \tag{1.143}$$

when such an expression is valid. We can therefore express the notion that $S$ is at a critical point by saying that the functional derivative vanishes. The final equations of motion for our field theory are thus:

$$\frac{\delta S}{\delta \Phi^i} = \frac{\partial \mathcal{L}}{\partial \Phi^i} - \partial_\mu\!\left(\frac{\partial \mathcal{L}}{\partial(\partial_\mu \Phi^i)}\right) = 0. \tag{1.144}$$

These are known as the Euler-Lagrange equations for a field theory in flat spacetime.
The simplest example of a field is a real scalar field:

$$\phi(x^\mu) : (\text{spacetime}) \to \mathbf{R}. \tag{1.145}$$

Slightly more complicated examples would include complex scalar fields, or maps from spacetime to any vector space or even any manifold (sometimes called "nonlinear sigma models"). Upon quantization, excitations of the field are observable as particles. Scalar fields give rise to spinless particles, while vector fields and other tensors give rise to higher-spin particles. If the field were complex instead of real, it would have two degrees of freedom rather than just one, which would be interpreted as a particle and a distinct antiparticle. Real fields are their own antiparticles. An example of a real scalar field would be the neutral $\pi$-meson.
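So let's consider the classical mechanics of a single real scalar field. It will have an energy density that is a local function of spacetime, and includes various contributions:

$$\begin{aligned} \text{kinetic energy:} &\quad \tfrac{1}{2}\dot{\phi}^2 \\ \text{gradient energy:} &\quad \tfrac{1}{2}(\nabla\phi)^2 \\ \text{potential energy:} &\quad V(\phi). \end{aligned} \tag{1.146}$$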

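Actually, although the potential is a Lorentz-invariant function, the kinetic and gradient energies are not by themselves Lorentz-invariant; but we can combine them into a manifestly Lorentz-invariant form:

$$-\tfrac{1}{2}\eta^{\mu\nu}(\partial_\mu \phi)(\partial_\nu \phi) = \tfrac{1}{2}\dot{\phi}^2 - \tfrac{1}{2}(\nabla\phi)^2. \tag{1.147}$$

[The combination $\eta^{\mu\nu}(\partial_\mu \phi)(\partial_\nu \phi)$ is often abbreviated as $(\partial\phi)^2$.] So a reasonable choice of Lagrangian for our single real scalar field, analogous to $L = K - V$ in the point-particle case, would be

$$\mathcal{L} = -\tfrac{1}{2}\eta^{\mu\nu}(\partial_\mu \phi)(\partial_\nu \phi) - V(\phi). \tag{1.148}$$

This generalizes "kinetic minus potential energy" to "kinetic minus gradient minus potential energy density." Note that since $[\mathcal{L}] = M^4$, we must have $[V] = M^4$. Also, since $[\partial_\mu] = [\partial/\partial x^\mu] = M^1$, we have

$$[\phi] = M^1. \tag{1.149}$$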

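For the Lagrangian (1.148) we have

$$\frac{\partial \mathcal{L}}{\partial \phi} = -\frac{dV}{d\phi}, \qquad \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi)} = -\eta^{\mu\nu}\partial_\nu \phi. \tag{1.150}$$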
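The second of these equations is a little tricky, so let's go through it slowly. When differentiating the Lagrangian, the trick is to make sure that the index placement is "compatible" (so that if you have a lower index on the thing being differentiated with respect to, you should have only lower indices when the same kind of object appears in the thing being differentiated), and also that the indices are strictly different. The first of these is already satisfied in our example, since we are differentiating a function of $\partial_\mu \phi$ with respect to $\partial_\mu \phi$. Later on, we will need to be more careful. To fulfill the second, we simply relabel dummy indices:

$$\eta^{\mu\nu}(\partial_\mu \phi)(\partial_\nu \phi) = \eta^{\rho\sigma}(\partial_\rho \phi)(\partial_\sigma \phi). \tag{1.151}$$

Then we can use the general rule, for any object with one index such as $V_\mu$, that

$$\frac{\partial V_\alpha}{\partial V_\beta} = \delta^\beta_\alpha, \tag{1.152}$$

because each component of $V_\alpha$ is treated as a distinct variable. So we have

$$\begin{aligned} \frac{\partial}{\partial(\partial_\mu \phi)}\left[\eta^{\rho\sigma}(\partial_\rho \phi)(\partial_\sigma \phi)\right] &= \eta^{\rho\sigma}\left[\delta^\mu_\rho(\partial_\sigma \phi) + (\partial_\rho \phi)\delta^\mu_\sigma\right] \\ &= \eta^{\mu\sigma}(\partial_\sigma \phi) + \eta^{\rho\mu}(\partial_\rho \phi) \\ &= 2\eta^{\mu\nu}\partial_\nu \phi. \end{aligned} \tag{1.153}$$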

This leads to the second expression in (1.150). Putting (1.150) into (1.144) leads to the equation of motion

$$\Box\phi - \frac{dV}{d\phi} = 0, \tag{1.154}$$

where $\Box = \eta^{\mu\nu}\partial_\mu \partial_\nu$ is known as the d'Alembertian. Note that our metric sign convention $(-+++)$ comes into this equation; with the alternative $(+---)$ convention the sign would have been switched. In flat spacetime (1.154) is equivalent to

$$\ddot{\phi} - \nabla^2\phi + \frac{dV}{d\phi} = 0. \tag{1.155}$$

A popular choice for the potential $V$ is that of a simple harmonic oscillator, $V(\phi) = \frac{1}{2}m^2\phi^2$. The parameter $m$ is called the mass of the field, and you should notice that the units work out correctly. You may be wondering how a field can have mass. When we quantize the field we find that momentum eigenstates are collections of particles, each with mass $m$. At the classical level, we think of "mass" as simply a convenient characterization of the field dynamics. Then our equation of motion is

$$\Box\phi - m^2\phi = 0, \tag{1.156}$$

the famous Klein-Gordon equation. This is a linear differential equation, so the sum of two solutions is a solution; a complete set of solutions (in the form of plane waves) is easy to find, as you can check for yourself.
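The plane-wave check can be done by hand, or numerically as in the sketch below. It assumes one spatial dimension, the $(-+++)$ convention so that $\Box = -\partial_t^2 + \partial_x^2$, and a trial solution $\phi = \cos(kx - \omega t)$; the function names, the sample point, and the values of $m$ and $k$ are illustrative. The residual $\Box\phi - m^2\phi$ is estimated with central finite differences.

```python
import math

def phi(t, x, k, omega):
    """Trial plane wave phi = cos(k x - omega t)."""
    return math.cos(k * x - omega * t)

def klein_gordon_residual(t, x, k, omega, m, h=1e-3):
    """Estimate Box(phi) - m^2 phi at (t, x), with Box = -d^2/dt^2 + d^2/dx^2,
    using second-order central differences with step h."""
    center = phi(t, x, k, omega)
    d2t = (phi(t + h, x, k, omega) - 2 * center + phi(t - h, x, k, omega)) / h ** 2
    d2x = (phi(t, x + h, k, omega) - 2 * center + phi(t, x - h, k, omega)) / h ** 2
    return -d2t + d2x - m ** 2 * center

m, k = 1.0, 2.0
omega_on_shell = math.sqrt(k ** 2 + m ** 2)   # dispersion relation omega^2 = k^2 + m^2

residual_on_shell = klein_gordon_residual(0.3, 0.7, k, omega_on_shell, m)
residual_off_shell = klein_gordon_residual(0.3, 0.7, k, 1.5 * omega_on_shell, m)
```

The residual is zero (to finite-difference accuracy) only when $\omega^2 = k^2 + m^2$; with any other frequency the plane wave fails to solve the equation, as `residual_off_shell` shows.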
A slightly more elaborate example of a field theory is provided by electromagnetism. We mentioned that the relevant field is the vector potential $A_\mu$; the timelike component $A_0$ can be identified with the electrostatic potential $\Phi$, and the spacelike components with the traditional vector potential $\mathbf{A}$ (in terms of which the magnetic field is given by $\mathbf{B} = \nabla \times \mathbf{A}$). The field strength tensor, with components given by (1.69), is related to the vector potential by

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{1.157}$$

From this definition we see that the field strength tensor has the important property of gauge invariance: when we perform a gauge transformation on the vector potential,

$$A_\mu \to A_\mu + \partial_\mu \lambda(x), \tag{1.158}$$

the field strength tensor is left unchanged:

$$F_{\mu\nu} \to F_{\mu\nu} + \partial_\mu \partial_\nu \lambda - \partial_\nu \partial_\mu \lambda = F_{\mu\nu}. \tag{1.159}$$

The last equality follows from the fact that partial derivatives commute, $\partial_\mu \partial_\nu = \partial_\nu \partial_\mu$. Gauge invariance is a symmetry that is fundamental to our understanding of electromagnetism, and all observable quantities must be gauge-invariant. Thus, while the dynamical field of the theory (with respect to which we vary the action to derive equations of motion) is $A_\mu$, physical quantities will generally be expressed in terms of $F_{\mu\nu}$.
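Gauge invariance of the field strength can be seen concretely with finite differences. The sketch below is illustrative only: the potential `A` and gauge function `lam` are arbitrary made-up smooth functions, the helper names are hypothetical, and derivatives are approximated by central differences. It computes $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ before and after shifting $A_\mu$ by $\partial_\mu \lambda$.

```python
import math

H = 1e-3  # finite-difference step

def partial(f, mu, x):
    """Central-difference estimate of d f / d x^mu at the point x."""
    xp, xm = list(x), list(x)
    xp[mu] += H
    xm[mu] -= H
    return (f(xp) - f(xm)) / (2 * H)

def field_strength(A, x):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu for a list A of four component functions."""
    return [[partial(A[n], m, x) - partial(A[m], n, x) for n in range(4)]
            for m in range(4)]

# Arbitrary illustrative potential A_mu(x) and gauge function lambda(x).
A = [lambda x: x[1] * x[2],
     lambda x: math.sin(x[0]) + x[3] ** 2,
     lambda x: x[0] * x[3],
     lambda x: math.cos(x[1])]

def lam(x):
    return math.sin(x[0] * x[1]) + x[2] * x[3] ** 2

def gauge_transformed(A, lam):
    """A_mu -> A_mu + d_mu lambda, built component by component."""
    return [lambda x, mu=mu: A[mu](x) + partial(lam, mu, x) for mu in range(4)]

x0 = [0.1, 0.2, 0.3, 0.4]
F = field_strength(A, x0)
F_gauge = field_strength(gauge_transformed(A, lam), x0)

# Largest change in any component of F under the gauge transformation.
max_change = max(abs(F[m][n] - F_gauge[m][n]) for m in range(4) for n in range(4))
```

The mixed second differences of `lam` are symmetric in their two indices, mirroring the commuting partial derivatives in the analytic argument, so `max_change` is zero up to rounding.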
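We already know that the dynamical equations of electromagnetism are Maxwell's equations, (1.96) and (1.97). Given the definition of the field strength tensor in terms of the vector potential, (1.97) is actually automatic:

$$\partial_{[\mu}F_{\nu\sigma]} = \partial_{[\mu}\partial_\nu A_{\sigma]} - \partial_{[\mu}\partial_\sigma A_{\nu]} = 0, \tag{1.160}$$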

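again because partial derivatives commute. On the other hand, (1.96) is equivalent to Euler-Lagrange equations of the form

$$\frac{\partial \mathcal{L}}{\partial A_\nu} - \partial_\mu\!\left(\frac{\partial \mathcal{L}}{\partial(\partial_\mu A_\nu)}\right) = 0, \tag{1.161}$$

if we presciently choose the Lagrangian to be

$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + A_\mu J^\mu. \tag{1.162}$$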

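For this choice, the first term in the Euler-Lagrange equation is straightforward:

$$\frac{\partial \mathcal{L}}{\partial A_\nu} = J^\nu. \tag{1.163}$$

The second term is trickier. First we write $F_{\mu\nu}F^{\mu\nu}$ as

$$F_{\mu\nu}F^{\mu\nu} = F_{\alpha\beta}F^{\alpha\beta} = \eta^{\alpha\rho}\eta^{\beta\sigma}F_{\alpha\beta}F_{\rho\sigma}. \tag{1.164}$$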

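We want to work with lower indices on $F_{\mu\nu}$, since we are differentiating with respect to $\partial_\mu A_\nu$, which has lower indices. Likewise we change the dummy indices on $F_{\mu\nu}F^{\mu\nu}$, since we want to have different indices on the thing being differentiated and the thing we are differentiating with respect to. Once you get familiar with this stuff it will become second nature and you won't need nearly so many steps. This lets us write

$$\frac{\partial(F_{\alpha\beta}F^{\alpha\beta})}{\partial(\partial_\mu A_\nu)} = \eta^{\alpha\rho}\eta^{\beta\sigma}\left[\frac{\partial F_{\alpha\beta}}{\partial(\partial_\mu A_\nu)}F_{\rho\sigma} + F_{\alpha\beta}\frac{\partial F_{\rho\sigma}}{\partial(\partial_\mu A_\nu)}\right]. \tag{1.165}$$

Then, since $F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha$, we have

$$\frac{\partial F_{\alpha\beta}}{\partial(\partial_\mu A_\nu)} = \delta^\mu_\alpha \delta^\nu_\beta - \delta^\mu_\beta \delta^\nu_\alpha. \tag{1.166}$$

Combining (1.166) with (1.165) yields

$$\begin{aligned} \frac{\partial(F_{\alpha\beta}F^{\alpha\beta})}{\partial(\partial_\mu A_\nu)} &= \eta^{\alpha\rho}\eta^{\beta\sigma}\left[(\delta^\mu_\alpha \delta^\nu_\beta - \delta^\mu_\beta \delta^\nu_\alpha)F_{\rho\sigma} + (\delta^\mu_\rho \delta^\nu_\sigma - \delta^\mu_\sigma \delta^\nu_\rho)F_{\alpha\beta}\right] \\ &= (\eta^{\mu\rho}\eta^{\nu\sigma} - \eta^{\nu\rho}\eta^{\mu\sigma})F_{\rho\sigma} + (\eta^{\alpha\mu}\eta^{\beta\nu} - \eta^{\alpha\nu}\eta^{\beta\mu})F_{\alpha\beta} \\ &= F^{\mu\nu} - F^{\nu\mu} + F^{\mu\nu} - F^{\nu\mu} \\ &= 4F^{\mu\nu}, \end{aligned} \tag{1.167}$$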

so

$$\frac{\partial \mathcal{L}}{\partial(\partial_\mu A_\nu)} = -F^{\mu\nu}. \tag{1.168}$$

Then sticking (1.163) and (1.168) into (1.161) yields precisely (1.96):

$$\partial_\mu F^{\nu\mu} = J^\nu. \tag{1.169}$$

Note that we switched the order of the indices on $F^{\mu\nu}$ in order to save ourselves from an unpleasant minus sign.
You may wonder what the purpose of introducing a Lagrangian formulation is, if we were able to invent the equations of motion before we ever knew the Lagrangian (as Maxwell did for his equations). There are a number of reasons, starting with the basic simplicity of positing a single scalar function of spacetime, the Lagrange density, rather than a number of (perhaps tensor-valued) equations of motion. Another reason is the ease with which symmetries are implemented; demanding that the action be invariant under a symmetry ensures that the dynamics respects the symmetry as well. Finally, as we will see in Chapter 4, the action leads via a direct procedure (involving varying with respect to the metric itself) to a unique energy-momentum tensor. Applying this procedure to (1.148) leads straight to the energy-momentum tensor for a scalar field theory,

$$T^{\mu\nu}_{\rm scalar} = \eta^{\mu\lambda}\eta^{\nu\sigma}\partial_\lambda \phi\, \partial_\sigma \phi - \eta^{\mu\nu}\left[\tfrac{1}{2}\eta^{\lambda\sigma}\partial_\lambda \phi\, \partial_\sigma \phi + V(\phi)\right]. \tag{1.170}$$

Similarly, from (1.162) we can derive the energy-momentum tensor for electromagnetism,

$$T^{\mu\nu}_{\rm EM} = F^{\mu\lambda}F^\nu{}_\lambda - \tfrac{1}{4}\eta^{\mu\nu}F^{\lambda\sigma}F_{\lambda\sigma}. \tag{1.171}$$

Using the appropriate equations of motion, you can show that these energy-momentum tensors are conserved, $\partial_\mu T^{\mu\nu} = 0$ (and will be asked to do so in the Exercises).
The two examples we have considered, scalar field theory and electromagnetism, are paradigms for much of our current understanding of nature. The Standard Model of particle physics consists of three types of fields: gauge fields, Higgs fields, and fermions. The gauge fields describe the "forces" of nature, including the strong and weak nuclear forces in addition to electromagnetism. The gauge fields giving rise to the nuclear forces are described by one-form potentials, just as in electromagnetism; the difference is that they are matrix-valued rather than ordinary one-forms, and the symmetry groups corresponding to gauge transformations are therefore noncommutative (nonabelian) symmetries. The Higgs fields are scalar fields much as we have described, although they are also matrix-valued. The fermions include leptons (such as electrons and neutrinos) and quarks, and are not described by any of the tensor fields we have discussed here, but rather by a different kind of field called a spinor. We won't get around to discussing spinors in this book, but they play a crucial role in particle physics and their coupling to gravity is interesting and subtle. Upon quantization, these fields give rise to particles of different spins; gauge fields are spin-1, scalar fields are spin-0, and the Standard Model fermions are spin-$\frac{1}{2}$.
Before concluding this chapter, let's ask an embarrassingly simple question: Why should we consider one classical field theory rather than some other one? More concretely, let's say that we have discovered some particle in nature, and we know what kind of field we want to use to describe it; how should we pick the Lagrangian for this field? For example, when we wrote down our scalar-field Lagrangian (1.148), why didn't we include a term of the form

$$\mathcal{L}' = \lambda\,\eta^{\mu\nu}(\partial_\mu \phi)(\partial_\nu \phi)\phi^2, \tag{1.172}$$

where $\lambda$ is a coupling constant? Ultimately, of course, we work by trial and error and try to fit the data given to us by experiment. In classical field theory, there's not much more we could do; generally we would start with a simple Lagrangian, and perhaps make it more complicated if the first try failed to agree with the data. But quantum field theory actually provides some simple guidelines, and since we use classical field theory as an approximation to some underlying quantum theory, it makes sense to take advantage of these principles. To make a long story short, quantum field theory allows "virtual" processes at arbitrarily high energies to contribute to what we observe at low energies. Fortunately, the effect of these processes can be summarized in a low-energy effective field theory. In the effective theory, which is what we actually observe, the result of high-energy processes is simply to "renormalize" the coupling constants of our theory. Consider an arbitrary coupling constant, which we can express as a parameter $\mu$ (with dimensions of mass) raised to some power, $\lambda = \mu^q$ (unless $\lambda$ is dimensionless, in which case the discussion becomes more subtle). Very roughly speaking, the effect of high-energy processes will be to make $\mu$ very large. Slightly more specifically, $\mu$ will be pushed up to a scale at which new physics kicks in, whatever that may be. Therefore, potential higher-order terms we might think of adding to a Lagrangian are suppressed, because they are multiplied by coupling constants that are very small. For (1.172), for example, we must have $\lambda = \mu^{-2}$, so $\lambda$ will be tiny (because $\mu$ will be big). Only the lowest-order terms we can put in our Lagrangian will come with dimensionless couplings (or ones with units of mass to a positive power), so we only need bother with those at low energies. This feature of field theory allows for a dramatic simplification in considering all of the models we might want to examine.
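The power counting behind this argument is simple bookkeeping. The sketch below assumes four spacetime dimensions and natural units, with a Lagrange density of dimension $M^4$ and both a derivative and a scalar field of dimension $M^1$. The function and dictionary names are illustrative, and the last entry assumes the extra term has two derivatives and four powers of the field, which is the structure consistent with $\lambda = \mu^{-2}$.

```python
LAGRANGE_DENSITY_DIM = 4   # [Lagrange density] = M^4 in natural units
DERIVATIVE_DIM = 1         # [d_mu] = M^1
SCALAR_FIELD_DIM = 1       # [phi] = M^1

def coupling_mass_dimension(n_derivatives, n_fields):
    """Mass dimension q of a coupling multiplying a term with the given
    number of derivatives and scalar-field factors."""
    return (LAGRANGE_DENSITY_DIM
            - n_derivatives * DERIVATIVE_DIM
            - n_fields * SCALAR_FIELD_DIM)

def classify(q):
    """Describe how a coupling of mass dimension q behaves at low energies."""
    if q > 0:
        return "positive power of mass: keep"
    if q == 0:
        return "dimensionless: keep"
    return "negative power of mass: suppressed by the high scale"

# (number of derivatives, number of fields) for a few candidate terms.
terms = {
    "mass term, phi^2": (0, 2),
    "quartic self-coupling, phi^4": (0, 4),
    "two derivatives and four fields": (2, 4),
}

dims = {name: coupling_mass_dimension(*counts) for name, counts in terms.items()}
verdicts = {name: classify(q) for name, q in dims.items()}
```

The mass term comes out with a coupling of dimension $M^2$ (the usual $m^2$), the quartic coupling is dimensionless, and the two-derivative, four-field term needs a coupling of dimension $M^{-2}$, so only the last is suppressed when $\mu$ is large.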
As mentioned at the beginning of this section, general relativity itself is a classical field theory, in which the dynamical field is the metric tensor. It is nevertheless fair to think of GR as somehow different; for the most part other classical field theories rely on the existence of a pre-existing spacetime geometry, whereas in GR the geometry is determined by the equations of motion. (There are exceptions to this idea, called topological field theories, in which the metric makes no appearance.) Our task in the next few chapters is to explore the nature of curved geometries as characterized by the spacetime metric, before moving in Chapter 4 to putting these notions to work in constructing a theory of gravitation.

1.11 ■ EXERCISES
1. Consider an inertial frame $S$ with coordinates $x^\mu = (t, x, y, z)$, and a frame $S'$ with coordinates $x^{\mu'}$ related to $S$ by a boost with velocity parameter $v$ along the $y$-axis. Imagine we have a wall at rest in $S'$, lying along the line $x' = -y'$. From the point of view of $S$, what is the relationship between the incident angle of a ball hitting the wall (traveling in the $x$-$y$ plane) and the reflected angle? What about the velocity before and after?

2. Imagine that space (not spacetime) is actually a finite box, or in more sophisticated terms, a three-torus, of size $L$. By this we mean that there is a coordinate system $x^\mu = (t, x, y, z)$ such that every point with coordinates $(t, x, y, z)$ is identified with every point with coordinates $(t, x + L, y, z)$, $(t, x, y + L, z)$, and $(t, x, y, z + L)$. Note that the time coordinate is the same. Now consider two observers; observer $A$ is at rest in this coordinate system (constant spatial coordinates), while observer $B$ moves in the $x$-direction with constant velocity $v$. $A$ and $B$ begin at the same event, and while $A$ remains still, $B$ moves once around the universe and comes back to intersect the worldline of $A$ without ever having to accelerate (since the universe is periodic). What are the relative proper times experienced in this interval by $A$ and $B$? Is this consistent with your understanding of Lorentz invariance?
3. Three events, $A$, $B$, $C$, are seen by observer $O$ to occur in the order $ABC$. Another observer, $\bar{O}$, sees the events to occur in the order $CBA$. Is it possible that a third observer sees the events in the order $ACB$? Support your conclusion by drawing a spacetime diagram.
4. Projection effects can trick you into thinking that an astrophysical object is moving "superluminally." Consider a quasar that ejects gas with speed $v$ at an angle $\theta$ with respect to the line-of-sight of the observer. Projected onto the sky, the gas appears to travel perpendicular to the line of sight with angular speed $v_{\rm app}/D$, where $D$ is the distance to the quasar and $v_{\rm app}$ is the apparent speed. Derive an expression for $v_{\rm app}$ in terms of $v$ and $\theta$. Show that, for appropriate values of $v$ and $\theta$, $v_{\rm app}$ can be greater than 1.
5. Particle physicists are so used to setting $c = 1$ that they measure mass in units of energy. In particular, they tend to use electron volts (1 eV $= 1.6 \times 10^{-12}$ erg $= 1.8 \times 10^{-33}$ g), or, more commonly, keV, MeV, and GeV ($10^3$ eV, $10^6$ eV, and $10^9$ eV, respectively). The muon has been measured to have a mass of 0.106 GeV and a rest frame lifetime of $2.19 \times 10^{-6}$ seconds. Imagine that such a muon is moving in the circular storage ring of a particle accelerator, 1 kilometer in diameter, such that the muon's total energy is 1000 GeV. How long would it appear to live from the experimenter's point of view? How many radians would it travel around the ring?
6. In Euclidean three-space, let $p$ be the point with coordinates $(x, y, z) = (1, 0, -1)$. Consider the following curves that pass through $p$:

$$\begin{aligned} x^i(\lambda) &= (\lambda,\ (\lambda - 1)^2,\ -\lambda) \\ x^i(\mu) &= (\cos\mu,\ \sin\mu,\ \mu - 1) \\ x^i(\sigma) &= (\sigma^2,\ \sigma^3 + \sigma^2,\ \sigma). \end{aligned}$$

(a) Calculate the components of the tangent vectors to these curves at $p$ in the coordinate basis $\{\partial_x, \partial_y, \partial_z\}$.
(b) Let $f = x^2 + y^2 - yz$. Calculate $df/d\lambda$, $df/d\mu$ and $df/d\sigma$.

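7. Imagine we have a tensor $X^{\mu\nu}$ and a vector $V^\mu$, with components

$$X^{\mu\nu} = \begin{pmatrix} 2 & 0 & 1 & -1 \\ -1 & 0 & 3 & 2 \\ -1 & 1 & 0 & 0 \\ -2 & 1 & 1 & -2 \end{pmatrix}, \qquad V^\mu = (-1, 2, 0, -2).$$

Find the components of:
(a) $X^\mu{}_\nu$
(b) $X_\mu{}^\nu$
(c) $X^{(\mu\nu)}$
(d) $X_{[\mu\nu]}$
(e) $X^\lambda{}_\lambda$
(f) $V^\mu V_\mu$
(g) $V_\mu X^{\mu\nu}$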
8. If $\partial_\nu T^{\mu\nu} = Q^\mu$, what physically does the spatial vector $Q^i$ represent? Use the dust energy-momentum tensor to make your case.

9. For a system of discrete point particles the energy-momentum tensor takes the form

$$T_{\mu\nu} = \sum_a \frac{p^{(a)}_\mu p^{(a)}_\nu}{p^{0(a)}}\,\delta^{(3)}(\mathbf{x} - \mathbf{x}^{(a)}), \tag{1.173}$$

where the index $a$ labels the different particles. Show that, for a dense collection of particles with isotropically distributed velocities, we can smooth over the individual particle worldlines to obtain the perfect-fluid energy-momentum tensor (1.114).

10. Using the tensor transformation law applied to $F_{\mu\nu}$, show how the electric and magnetic field 3-vectors $\mathbf{E}$ and $\mathbf{B}$ transform under (a) a rotation about the $y$-axis, (b) a boost along the $z$-axis.

11. Verify that (1.98) is indeed equivalent to (1.97), and that they are both equivalent to the last two equations in (1.93).

12. Consider the two field theories we explicitly discussed, Maxwell's electromagnetism (let $J^\mu = 0$) and the scalar field theory defined by (1.148).
(a) Express the components of the energy-momentum tensors of each theory in three-vector notation, using the divergence, gradient, curl, electric, and magnetic fields, and an overdot to denote time derivatives.
(b) Using the equations of motion, verify (in any notation you like) that the energy-momentum tensors are conserved.

13. Consider adding to the Lagrangian for electromagnetism an additional term of the form

$$\mathcal{L}' = \tilde{\epsilon}_{\mu\nu\rho\sigma}F^{\mu\nu}F^{\rho\sigma}.$$

(a) Express $\mathcal{L}'$ in terms of $\mathbf{E}$ and $\mathbf{B}$.
(b) Show that including $\mathcal{L}'$ does not affect Maxwell's equations. Can you think of a deep reason for this?

CHAPTER
2

Manifolds

2.1 ■ GRAVITY AS GEOMETRY
Gravity is special. In the context of general relativity, we ascribe this specialness to the fact that the dynamical field giving rise to gravitation is the metric tensor describing the curvature of spacetime itself, rather than some additional field propagating through spacetime; this was Einstein's profound insight. The physical principle that led him to this idea was the universality of the gravitational interaction, as formalized by the Principle of Equivalence. Let's see how this physical principle leads us to the mathematical strategy of describing gravity as the geometry of a curved manifold.
The Principle of Equivalence comes in a variety of forms, the first of which is the Weak Equivalence Principle, or WEP. The WEP states that the inertial mass and gravitational mass of any object are equal. To see what this means, think about Newtonian mechanics. The Second Law relates the force exerted on an object to the acceleration it undergoes, setting them proportional to each other with the constant of proportionality being the inertial mass $m_i$:

$$\mathbf{F} = m_i \mathbf{a}. \tag{2.1}$$

The inertial mass clearly has a universal character, related to the resistance you feel when you try to push on the object; it takes the same value no matter what kind of force is being exerted. We also have Newton's law of gravitation, which can be thought of as stating that the gravitational force exerted on an object is proportional to the gradient of a scalar field $\Phi$, known as the gravitational potential. The constant of proportionality in this case is called the gravitational mass $m_g$:

$$\mathbf{F}_g = -m_g \nabla\Phi. \tag{2.2}$$

On the face of it, $m_g$ has a very different character than $m_i$; it is a quantity specific to the gravitational force. If you like, $m_g/m_i$ can be thought of as the "gravitational charge" of the body. Nevertheless, Galileo long ago showed (apocryphally by dropping weights off of the Leaning Tower of Pisa, actually by rolling balls down inclined planes) that the response of matter to gravitation is universal: every object falls at the same rate in a gravitational field, independent of the composition of the object. In Newtonian mechanics this translates into the WEP, which is simply

$$m_i = m_g \tag{2.3}$$

for any object. An immediate consequence is that the behavior of freely-falling test particles is universal, independent of their mass (or any other qualities they may have); in fact, we have

$$\mathbf{a} = -\nabla\Phi. \tag{2.4}$$

Experimentally, the independence of the acceleration due to gravity on the composition of the falling object has been verified to extremely high precision by the Eötvös experiment and its modern successors.
This suggests an equivalent formulation of the WEP: there exists a preferred class of trajectories through spacetime, known as inertial (or "freely-falling") trajectories, on which unaccelerated particles travel-where unaccelerated means "subject only to gravity." Clearly this is not true for other forces, such as electromagnetism. In the presence of an electric field, particles with opposite charges will move on quite different trajectories. Every particle, on the other hand, has an identical gravitational charge.
The universality of gravitation, as implied by the WEP, can be stated in another, more popular, form. Imagine that we consider a physicist in a tightly sealed box, unable to observe the outside world, who is doing experiments involving the motion of test particles, for example to measure the local gravitational field. Of course she would obtain different answers if the box were sitting on the moon or on Jupiter than she would on Earth. But the answers would also be different if the box were accelerating at a constant rate; this would change the acceleration of the freely-falling particles with respect to the box. The WEP implies that there is no way to disentangle the effects of a gravitational field from those of being in a uniformly accelerating frame, simply by observing the behavior of freely-falling particles. This follows from the universality of gravitation; in electrodynamics, in contrast, it would be possible to distinguish between uniform acceleration and an electromagnetic field, by observing the behavior of particles with different charges. But with gravity it is impossible, since the "charge" is necessarily proportional to the (inertial) mass.
To be careful, we should limit our claims about the impossibility of distinguishing gravity from uniform acceleration by restricting our attention to "small enough regions of spacetime." If the sealed box were sufficiently big, the gravitational field would change from place to place in an observable way, while the effect of acceleration would always be in the same direction. In a rocket ship or elevator, the particles would always fall straight down. In a very big box in a gravitational field, however, the particles would move toward the center of the Earth, for example, which would be a different direction for widely separated experiments. The WEP can therefore be stated as follows: The motion of freely-falling particles is the same in a gravitational field and a uniformly accelerated frame, in small enough regions of spacetime. In larger regions of spacetime there will be inhomogeneities in the gravitational field, which will lead to tidal forces, which can be detected.
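To get a feeling for what "small enough" means, one can estimate the tidal effect in Newtonian gravity. The sketch below assumes a spherical Earth with approximate values for $GM$ and the radius, and places two test particles side by side at the surface; the function names and the box sizes are illustrative. It also shows the mass dropping out of the acceleration when $m_g = m_i$.

```python
import math

GM_EARTH = 3.986e14  # m^3 / s^2, approximate value of G times the Earth's mass
R_EARTH = 6.371e6    # m, approximate mean radius of the Earth

def gravitational_acceleration(x, y, inertial_mass=1.0, gravitational_mass=None):
    """Acceleration a = F / m_i of a test particle at (x, y), with the Earth's
    center at the origin. If the WEP holds (m_g = m_i), a = -grad(Phi)."""
    if gravitational_mass is None:
        gravitational_mass = inertial_mass  # impose the WEP by default
    r = math.hypot(x, y)
    minus_grad_phi = (-GM_EARTH * x / r ** 3, -GM_EARTH * y / r ** 3)
    scale = gravitational_mass / inertial_mass
    return (scale * minus_grad_phi[0], scale * minus_grad_phi[1])

def tidal_relative_acceleration(width):
    """Horizontal relative acceleration of two particles separated by `width`
    (in meters) at the Earth's surface."""
    left = gravitational_acceleration(-width / 2, R_EARTH)
    right = gravitational_acceleration(width / 2, R_EARTH)
    return right[0] - left[0]

g_surface = GM_EARTH / R_EARTH ** 2

light = gravitational_acceleration(0.0, R_EARTH, inertial_mass=1.0)
heavy = gravitational_acceleration(0.0, R_EARTH, inertial_mass=1000.0)

small_box = tidal_relative_acceleration(10.0)    # a 10 m laboratory
large_box = tidal_relative_acceleration(1.0e6)   # a 1000 km "box"
```

For the 10 m laboratory the two particles approach each other with a relative acceleration of about $g\,d/R$, roughly a millionth of $g$, while for the 1000 km box the effect is of order 1 m/s$^2$ and would be easy to detect.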
After the advent of special relativity, the concept of mass lost some of its uniqueness, as it became clear that mass was simply a manifestation of energy and momentum (as we have seen in Chapter 1). It was therefore natural for Einstein to think about generalizing the WEP to something more inclusive. His idea was simply that there should be no way whatsoever for the physicist in the box to distinguish between uniform acceleration and an external gravitational field, no matter what experiments she did (not only by dropping test particles). This reasonable extrapolation became what is now known as the Einstein Equivalence Principle, or EEP: In small enough regions of spacetime, the laws of physics reduce to those of special relativity; it is impossible to detect the existence of a gravitational field by means of local experiments.
In fact, it is hard to imagine theories that respect the WEP but violate the EEP. Consider a hydrogen atom, a bound state of a proton and an electron. Its mass is actually less than the sum of the masses of the proton and electron considered individually, because there is a negative binding energy-you have to put energy into the atom to separate the proton and electron. According to the WEP, the gravitational mass of the hydrogen atom is therefore less than the sum of the masses of its constituents; the gravitational field couples to electromagnetism (which holds the atom together) in exactly the right way to make the gravitational mass come out right. This means that not only must gravity couple to rest mass universally, but also to all forms of energy and momentum-which is practically the claim of the EEP. It is possible to come up with counterexamples, however; for example, we could imagine a theory of gravity in which freely falling particles began to rotate as they moved through a gravitational field. Then they could fall along the same paths as they would in an accelerated frame (thereby satisfying the WEP), but you could nevertheless detect the existence of the gravitational field (in violation of the EEP). Such theories seem contrived, but there is no law of nature that forbids them.
Sometimes a distinction is drawn between "gravitational laws of physics" and "nongravitational laws of physics," and the EEP is defined to apply only to the latter. Then the Strong Equivalence Principle (SEP) is defined to include all of the laws of physics, gravitational and otherwise. A theory that violated the SEP but not the EEP would be one in which the gravitational binding energy did not contribute equally to the inertial and gravitational mass of a body; thus, for example, test particles with appreciable self-gravity (to the extent that such a concept makes sense) could fall along different trajectories than lighter particles.
It is the EEP that implies (or at least suggests) that we should attribute the action of gravity to the curvature of spacetime. Remember that in special relativity a prominent role is played by inertial frames: while it is not possible to single out some frame of reference as uniquely "at rest," it is possible to single out a family of frames that are "unaccelerated" (inertial). The acceleration of a charged particle in an electromagnetic field is therefore uniquely defined with respect to these frames. The EEP, on the other hand, implies that gravity is inescapable: there is no such thing as a "gravitationally neutral object" with respect to which we can measure the acceleration due to gravity. It follows that the acceleration due to gravity is not something that can be reliably defined, and therefore is of little use.

Instead, it makes more sense to define "unaccelerated" as "freely falling," and that is what we shall do. From here we are led to the idea that gravity is not a "force"-a force is something that leads to acceleration, and our definition of zero acceleration is "moving freely in the presence of whatever gravitational field happens to be around."
This seemingly innocuous step has profound implications for the nature of spacetime. In SR, we have a procedure for starting at some point and constructing an inertial frame that stretches throughout spacetime, by joining together rigid rods and attaching clocks to them. But, again due to inhomogeneities in the gravitational field, this is no longer possible. If we start in some freely-falling state and build a large structure out of rigid rods, at some distance away freely-falling objects will look like they are accelerating with respect to this reference frame, as shown in Figure 2.1. The solution is to retain the notion of inertial frames, but to discard the hope that they can be uniquely extended throughout space and time. Instead we can define locally inertial frames, those that follow the motion of individual freely falling particles in small enough regions of spacetime. (Every time we say "small enough regions," purists should imagine a limiting procedure in which we take the appropriate spacetime volume to zero.) This is the best we can do, but it forces us to give up a good deal. For example, we can no longer speak with confidence about the relative velocity of far-away objects, since the inertial reference frames appropriate to those objects are completely different from those appropriate to us.
Our job as physicists is to construct mathematical models of the world, and then test the predictions of such models against observations and experiments. Following the implications of the universality of gravitation has led us to give up on the idea of expressing gravity as a force propagating through spacetime, and indeed to give up on the idea of global reference frames stretching throughout spacetime. We therefore need to invoke a mathematical framework in which physical theories can be consistent with these conclusions. The solution will be to imagine that spacetime has a curved geometry, and that gravitation is a manifestation of this curvature. The appropriate mathematical structure used to describe curvature is that of a differentiable manifold: essentially, a kind of set that looks locally like flat space, but might have a very different global geometry. (Remember that the EEP can be stated as "the laws of physics reduce to those of special relativity in small regions of spacetime," which matches well with the mathematical notion of a set that locally resembles flat space.)

FIGURE 2.1 Failure of global frames. Since every particle feels the influence of gravity, we define "unaccelerating" as "freely falling." As a consequence, it becomes impossible to define globally inertial coordinate systems by the procedure outlined in Chapter 1, since particles initially at rest will begin to move with respect to such a frame.

FIGURE 2.2 The Doppler shift as measured by two rockets separated by a distance z, each feeling an acceleration a.

FIGURE 2.3 Gravitational redshift on the surface of the Earth, as measured by observers at different elevations.
We cannot prove that gravity should be thought of as the curvature of spacetime; instead we can propose the idea, derive its consequences, and see if the result is a reasonable fit to our experience of the world. Let's set about doing just that.
Consider one of the celebrated predictions of the EEP, the gravitational redshift. Imagine two boxes, a distance $z$ apart, each moving with some constant acceleration $a$ in a region far away from any gravitational fields, as shown in Figure 2.2. At time $t_0$ the trailing box emits a photon of wavelength $\lambda_0$. The boxes remain a constant distance apart, so the photon reaches the leading box after a time $\Delta t = z/c$ in our background reference frame. (We assume $\Delta v/c$ is small, so we only work to first order.) In this time the boxes will have picked up an additional velocity $\Delta v = a\,\Delta t = az/c$. Therefore, the photon reaching the leading box will be redshifted by the conventional Doppler effect, by an amount

$$\frac{\Delta\lambda}{\lambda_0} = \frac{\Delta v}{c} = \frac{az}{c^2}. \tag{2.5}$$

According to the EEP, the same thing should happen in a uniform gravitational field. So we imagine a tower of height $z$ sitting on the surface of a planet, with $a_g$ the strength of the gravitational field (what Newton would have called the "acceleration due to gravity"), as portrayed in Figure 2.3. We imagine that observers in the box at the top of the tower are able to detect photons emitted from the ground, but are otherwise unable to look outside and see that they are sitting on a tower. In other words, they have no way of distinguishing this situation from that of the accelerating rockets. Therefore, the EEP allows us to conclude immediately that a photon emitted from the ground with wavelength $\lambda_0$ will be redshifted by an amount

$$\frac{\Delta\lambda}{\lambda_0} = \frac{a_g z}{c^2}. \tag{2.6}$$

This is the famous gravitational redshift. Notice that it is a direct consequence of the EEP; the details of general relativity were not required.
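To get a feeling for the size of this effect, here is a minimal numerical sketch of the first-order formula (2.6). The function name and the input values (Earth's approximate surface gravity and a tower a few storeys high) are illustrative assumptions, not numbers taken from the text.

```python
# Fractional gravitational redshift to first order, as in Eq. (2.6):
#   delta_lambda / lambda_0 = a_g * z / c**2

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def fractional_redshift(a_g: float, z: float) -> float:
    """Return delta_lambda / lambda_0 for a photon climbing a height z (m)
    in a uniform field of strength a_g (m/s^2).

    Only meaningful in the first-order regime a_g * z << c**2.
    """
    return a_g * z / C**2


# Illustrative inputs (assumptions chosen for this example):
A_G_EARTH = 9.81     # m/s^2, approximate surface gravity of the Earth
TOWER_HEIGHT = 22.5  # m, a tower a few storeys high

shift = fractional_redshift(A_G_EARTH, TOWER_HEIGHT)
print(f"delta_lambda / lambda_0 = {shift:.3e}")
```

The result is a few parts in $10^{15}$, which gives some sense of why detecting the redshift in a terrestrial laboratory demands extremely precise frequency measurements.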
The formula for the redshift is more often stated in terms of the Newtonian potential $\Phi$, where $a_g = \nabla\Phi$. (The sign is changed with respect to the usual convention, since we are thinking of $a_g$ as the acceleration of the reference frame, not of a particle with respect to this reference frame.) A nonconstant gradient of $\Phi$ is like a time-varying acceleration, and the equivalent net velocity is given by integrating over the time between emission and absorption of the photon. We then have

$$\frac{\Delta\lambda}{\lambda_0} = \frac{1}{c}\int \nabla\Phi\, dt = \frac{1}{c^2}\int \partial_z\Phi\, dz = \Delta\Phi, \tag{2.7}$$

where $\Delta\Phi$ is the total change in the gravitational potential, and we have once again set $c = 1$. This simple formula for the gravitational redshift continues to be true in more general circumstances. Of course, by using the Newtonian potential at all, we are restricting our domain of validity to weak gravitational fields.
From the EEP we have argued in favor of a gravitational redshift; we may now use this phenomenon to provide further support for the idea that we should think of spacetime as curved. Consider the same experimental setup that we had before, now portrayed on the spacetime diagram in Figure 2.4. A physicist on the ground emits a beam of light with wavelength $\lambda_0$ from a height $z_0$, which travels to the top of the tower at height $z_1$. The time between when the beginning of any single wavelength of the light is emitted and the end of that same wavelength is emitted is $\Delta t_0 = \lambda_0/c$, and the same time interval for the absorption is $\Delta t_1 = \lambda_1/c$, where time is measured by clocks located at the respective elevations. Since we imagine that the gravitational field is static, the paths through spacetime followed by the leading and trailing edge of the single wave must be precisely congruent. (They are represented by generic curved paths, since we do not pretend that we know just what the paths will be.) Simple geometry seems to imply that the times $\Delta t_0$ and $\Delta t_1$ must be the same. But of course they are not; the gravitational redshift implies that the elevated experimenters observe fewer wavelengths per second, so that $\Delta t_1 > \Delta t_0$. We can interpret this roughly as "the clock on the tower appears to run more quickly." What went wrong? Simple geometry: the spacetime through which the photons traveled was curved.

FIGURE 2.4 Spacetime diagram of the gravitational-redshift experiment portrayed in Figure 2.3. Spacetime paths beginning at different moments are congruent, but the time intervals as measured on the ground and on the tower are different, signaling a breakdown of Euclidean geometry.

We therefore would like to describe spacetime as a kind of mathematical structure that looks locally like Minkowski space, but may possess nontrivial curvature over extended regions. The kind of object that encompasses this notion is that of a manifold. In this chapter we will confine ourselves to understanding the concept of manifolds and the structures we may define on them, leaving the precise characterization of curvature for the next chapter.

2.2 ■ WHAT IS A MANIFOLD?
Manifolds (or differentiable manifolds) are one of the most fundamental concepts in mathematics and physics. We are all used to the properties of n-dimensional Euclidean space, $\mathbf{R}^n$, the set of n-tuples $(x^1, \ldots, x^n)$, often equipped with a flat positive-definite metric with components $\delta_{ij}$. Mathematicians have worked for many years to develop the theory of analysis in $\mathbf{R}^n$: differentiation, integration, properties of functions, and so on. But clearly there are other spaces (spheres, for example) which we intuitively think of as "curved" or perhaps topologically complicated, on which we would like to perform analogous operations.
To address this problem we invent the notion of a manifold, which corresponds to a space that may be curved and have a complicated topology, but in local regions looks just like $\mathbf{R}^n$. Here by "looks like" we do not mean that the metric is the same, but only that more primitive notions like functions and coordinates work in a similar way. The entire manifold is constructed by smoothly sewing together these local regions. A crucial point is that the dimensionality n of the Euclidean spaces being used must be the same in every patch of the manifold; we then say that the manifold is of dimension n. With this approach we can analyze functions on such a space by converting them (locally) to functions in a Euclidean space. Examples of manifolds include:
• $\mathbf{R}^n$ itself, including the line ($\mathbf{R}$), the plane ($\mathbf{R}^2$), and so on. This should be obvious, since $\mathbf{R}^n$ looks like $\mathbf{R}^n$ not only locally but globally.
• The n-sphere, $S^n$. This can be defined as the locus of all points some fixed distance from the origin in $\mathbf{R}^{n+1}$. The circle is of course $S^1$, and the two-sphere $S^2$ is one of the most useful examples of a manifold. (The zero-sphere $S^0$, if you think about it, consists of two points. We say that $S^0$ is a disconnected zero-dimensional manifold.) It's worth emphasizing that the definition of $S^n$ in terms of an embedding in $\mathbf{R}^{n+1}$ is simply a convenient shortcut; all of the manifolds we will discuss may be defined in their own right, without recourse to higher-dimensional flat spaces.
• The n-torus $T^n$ results from taking an n-dimensional cube and identifying opposite sides. The two-torus $T^2$ is a square with opposite sides identified, as shown in Figure 2.5. The surface of a doughnut is a familiar example.
• A Riemann surface of genus g is essentially a two-torus with g holes instead of just one, as shown in Figure 2.6. $S^2$ may be thought of as a Riemann surface of genus zero. In technical terms (not really relevant to our present discussion), every "compact orientable boundaryless" two-dimensional manifold is a Riemann surface of some genus.
• More abstractly, a set of continuous transformations such as rotations in $\mathbf{R}^n$ forms a manifold. Lie groups are manifolds that also have a group structure. So for example SO(2), the set of rotations in two dimensions, is the same manifold as $S^1$ (although in general group manifolds will be more complicated than spheres).
• The direct product of two manifolds is a manifold. That is, given manifolds $M$ and $M'$ of dimension $n$ and $n'$, we can construct a manifold $M \times M'$, of dimension $n + n'$, consisting of ordered pairs $(p, p')$ with $p \in M$ and $p' \in M'$.

FIGURE 2.5 The torus, $T^2$, constructed by identifying opposite sides of a square.

FIGURE 2.6 Riemann surfaces of different genera (plural of "genus"): genus 0, genus 1, and genus 2.
With all of these examples, the notion of a manifold may seem vacuous: what isn't a manifold? Plenty of things are not manifolds, because somewhere they do not look locally like $\mathbf{R}^n$. Examples include a one-dimensional line running into a two-dimensional plane, and two cones stuck together at their vertices, as portrayed in Figure 2.7. More subtle examples are shown in Figure 2.8. Consider for example a single (two-dimensional) cone. There is clearly a sense in which the cone looks locally like $\mathbf{R}^2$; at the same time, there is just as clearly something singular about the vertex of the cone. This is where the word "differentiable" in "differentiable manifold" begins to play a role; as we will see when we develop the formal definition, the cone can be thought of as a type of manifold, but one that is not smooth at its vertex. (Other types of singularities are more severe, and will prevent us from thinking of certain spaces as manifolds, smooth or otherwise.) Another example is a line segment (with endpoints included). This certainly will not fit under the definition of manifolds we will develop, due to the endpoints. Nevertheless, we can extend the definition to include "manifolds with boundary," of which the line segment is a paradigmatic example. A brief discussion of manifolds with boundary is in Appendix D.

FIGURE 2.7 Examples of spaces that are not manifolds: a line ending on a plane, and two cones intersecting at their vertices. In each case there is a point that does not look locally like a Euclidean space of fixed dimension.

FIGURE 2.8 Subtle examples. The single cone can be thought of as a manifold, but not a smooth one, due to the singularity at its origin. A line segment is not a manifold, but may be described by the more general notion of "manifold with boundary."

These subtle cases should convince you of the need for a rigorous definition, which we now begin to construct; our discussion follows that of Wald (1984). The informal idea of a manifold is that of a space consisting of patches that look locally like $\mathbf{R}^n$, and are smoothly sewn together. We therefore need to formalize the notions of "looking locally like $\mathbf{R}^n$" and "smoothly sewn together." We require a number of preliminary definitions, most of which are fairly clear, but it's nice to be complete. The most elementary notion is that of a map between two sets. (We assume you know what a set is, or think you do; we won't need to be too precise.) Given two sets $M$ and $N$, a map $\phi : M \to N$ is a relationship that assigns, to each element of $M$, exactly one element of $N$. A map is therefore just a simple generalization of a function. Given two maps $\phi : A \to B$ and $\psi : B \to C$, we define the composition $\psi \circ \phi : A \to C$ by the operation $(\psi \circ \phi)(a) = \psi(\phi(a))$, as in Figure 2.9. So $a \in A$, $\phi(a) \in B$, and thus $(\psi \circ \phi)(a) \in C$. The order in which the maps are written makes sense, since the one on the right acts first.

A map $\phi$ is called one-to-one (or injective) if each element of $N$ has at most one element of $M$ mapped into it, and onto (or surjective) if each element of $N$ has at least one element of $M$ mapped into it. (If you think about it, better names for "one-to-one" would be "one-from-one" or for that matter "two-to-two.") Consider functions $\phi : \mathbf{R} \to \mathbf{R}$. Then $\phi(x) = e^x$ is one-to-one, but not onto; $\phi(x) = x^3 - x$ is onto, but not one-to-one; $\phi(x) = x^3$ is both; and $\phi(x) = x^2$ is neither, as in Figure 2.10.
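For maps between finite sets these definitions can be checked mechanically. The following is a minimal sketch in which a map is represented as a Python dictionary; the helper names and the small example sets are hypothetical choices made for illustration, not notation from the text.

```python
from typing import Dict, Hashable, Iterable


def is_one_to_one(phi: Dict[Hashable, Hashable]) -> bool:
    """Each element of the target has at most one element mapped into it."""
    images = list(phi.values())
    return len(images) == len(set(images))


def is_onto(phi: Dict[Hashable, Hashable], target: Iterable[Hashable]) -> bool:
    """Each element of the target has at least one element mapped into it."""
    return set(phi.values()) == set(target)


def compose(psi: Dict[Hashable, Hashable],
            phi: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Return psi o phi, defined by (psi o phi)(a) = psi(phi(a)); phi acts first."""
    return {a: psi[phi[a]] for a in phi}


# Hypothetical finite sets A, B, C and maps phi : A -> B, psi : B -> C.
A = {1, 2, 3}
B = {"u", "v", "w", "x"}
C = {"red", "blue"}

phi = {1: "u", 2: "v", 3: "w"}                            # one-to-one, not onto
psi = {"u": "red", "v": "red", "w": "blue", "x": "blue"}  # onto, not one-to-one

psi_after_phi = compose(psi, phi)  # a map A -> C

print(is_one_to_one(phi), is_onto(phi, B))
print(is_one_to_one(psi), is_onto(psi, C))
print(psi_after_phi)
```

Representing a map as a dictionary builds in the requirement that each element of the domain is assigned exactly one element of the target, since a dictionary key can carry only one value.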

FIGURE 2.9 The map $\psi \circ \phi : A \to C$ is formed by composing $\phi : A \to B$ and $\psi : B \to C$.

FIGURE 2.10 Types of maps. Panels: one-to-one, not onto; onto, not one-to-one; both; neither.

FIGURE 2.11 A map $\phi$ and its inverse $\phi^{-1}$.

The set $M$ is known as the domain of the map $\phi$, and the set of points in $N$ that $M$ gets mapped into is called the image of $\phi$. For any subset $U \subset N$, the set of elements of $M$ that get mapped to $U$ is called the preimage of $U$ under $\phi$, or $\phi^{-1}(U)$. A map that is both one-to-one and onto is known as invertible (or bijective). In this case we can define the inverse map $\phi^{-1} : N \to M$ by $(\phi^{-1} \circ \phi)(a) = a$, as in Figure 2.11. Note that the same symbol $\phi^{-1}$ is used for both the preimage and the inverse map, even though the former is always defined and the latter is only defined in some special cases.
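The difference between the preimage (always defined) and the inverse map (defined only for bijections) can be made concrete for finite sets. This is a small sketch with hypothetical helper names; maps are again represented as dictionaries.

```python
from typing import Dict, Hashable, Iterable, Set


def preimage(phi: Dict[Hashable, Hashable], subset: Iterable[Hashable]) -> Set[Hashable]:
    """All elements of the domain that phi sends into `subset`. Always defined."""
    wanted = set(subset)
    return {m for m, n in phi.items() if n in wanted}


def inverse(phi: Dict[Hashable, Hashable], target: Iterable[Hashable]) -> Dict[Hashable, Hashable]:
    """Return the inverse map target -> domain.

    Raises ValueError unless phi is both one-to-one and onto `target`.
    """
    inv: Dict[Hashable, Hashable] = {}
    for m, n in phi.items():
        if n in inv:
            raise ValueError("map is not one-to-one")
        inv[n] = m
    if set(inv) != set(target):
        raise ValueError("map is not onto")
    return inv


# Squaring on a small set of integers: onto {0, 1, 4}, but not one-to-one.
square = {x: x * x for x in (-2, -1, 0, 1, 2)}
print(preimage(square, {1, 4}))  # a set with four elements
print(preimage(square, {3}))     # the empty set; still perfectly well defined

# A bijection between two two-element sets does have an inverse map.
label = {0: "a", 1: "b"}
print(inverse(label, {"a", "b"}))
```

Calling `inverse(square, {0, 1, 4})` raises an error, even though every preimage under `square` exists.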
The notion of continuity of a map is actually a very subtle one, the precise formulation of which we won't need. Instead we will assume you understand the concepts of continuity and differentiability as applied to ordinary functions, maps $\phi : \mathbf{R} \to \mathbf{R}$. It will then be useful to extend these notions to maps between more general Euclidean spaces, $\phi : \mathbf{R}^m \to \mathbf{R}^n$. A map from $\mathbf{R}^m$ to $\mathbf{R}^n$ takes an m-tuple $(x^1, x^2, \ldots, x^m)$ to an n-tuple $(y^1, y^2, \ldots, y^n)$, and can therefore be thought of as a collection of n functions $\phi^i$ of m variables:

$$\begin{aligned}
y^1 &= \phi^1(x^1, x^2, \ldots, x^m) \\
y^2 &= \phi^2(x^1, x^2, \ldots, x^m) \\
&\;\;\vdots \\
y^n &= \phi^n(x^1, x^2, \ldots, x^m).
\end{aligned} \tag{2.8}$$
We will refer to any one of these functions as $C^p$ if its pth derivative exists and is continuous, and refer to the entire map $\phi : \mathbf{R}^m \to \mathbf{R}^n$ as $C^p$ if each of its component functions is at least $C^p$. Thus a $C^0$ map is continuous but not necessarily differentiable, while a $C^\infty$ map is continuous and can be differentiated as many times as you like. Consider for example the function of one variable $\phi(x) = |x^3|$. This function is infinitely differentiable everywhere except at $x = 0$, where it is differentiable twice but not three times; we therefore say that it is $C^2$. $C^\infty$ maps are sometimes called smooth.
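The claim about $|x^3|$ can be probed numerically. Differentiating by hand on each side of the origin gives a second derivative of $6|x|$ (this closed form is worked out here, not quoted from the text). The sketch below first checks that formula against a finite-difference estimate, and then compares the one-sided slopes of the second derivative at $x = 0$; the function names and step sizes are illustrative assumptions.

```python
def phi(x: float) -> float:
    """The example function phi(x) = |x^3|."""
    return abs(x ** 3)


def phi_second(x: float) -> float:
    """Second derivative of phi, derived by hand: 6|x|."""
    return 6.0 * abs(x)


def second_difference(func, x: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / h ** 2


def one_sided_slopes(func, x0: float, h: float):
    """Left and right difference quotients of func at x0."""
    left = (func(x0) - func(x0 - h)) / h
    right = (func(x0 + h) - func(x0)) / h
    return left, right


# Away from the origin the hand-derived formula matches the numerical estimate.
estimate = second_difference(phi, 0.5)

# At the origin the second derivative has a corner: its one-sided slopes disagree,
# so the third derivative does not exist there.
left, right = one_sided_slopes(phi_second, 0.0, 2.0 ** -20)
print(estimate, phi_second(0.5))
print(left, right)
```

The two one-sided slopes have equal magnitude and opposite sign, which is the numerical signature of a function that is $C^2$ but not $C^3$ at that point.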

FIGURE 2.12 An open ball defined in $\mathbf{R}^n$.

We will call two sets $M$ and $N$ diffeomorphic if there exists a $C^\infty$ map $\phi : M \to N$ with a $C^\infty$ inverse $\phi^{-1} : N \to M$; the map $\phi$ is then called a diffeomorphism. This is the best notion we have that two spaces are "the same" as manifolds. For example, when we said that SO(2) was the same manifold as $S^1$, we meant they were diffeomorphic. See Appendix B for more discussion.
These basic definitions may have been familiar to you, even if only vaguely remembered. We will now put them to use in the rigorous definition of a manifold. Unfortunately, a somewhat baroque procedure is required to formalize this relatively intuitive notion. We will first have to define the notion of an open set, on which we can put coordinate systems, and then sew the open sets together in an appropriate way.
We start with the notion of an open ball, which is the set of all points $x$ in $\mathbf{R}^n$ such that $|x - y| < r$ for some fixed $y \in \mathbf{R}^n$ and $r \in \mathbf{R}$, where $|x - y| = \left[\sum_i (x^i - y^i)^2\right]^{1/2}$. Note that this is a strict inequality: the open ball is the interior of an $(n-1)$-sphere of radius $r$ centered at $y$, as shown in Figure 2.12. An open set in $\mathbf{R}^n$ is a set constructed from an arbitrary (maybe infinite) union of open balls. In other words, $V \subset \mathbf{R}^n$ is open if, for any $y \in V$, there is an open ball centered at $y$ that is completely inside $V$. Roughly speaking, an open set is the interior of some $(n - 1)$-dimensional closed surface (or the union of several such interiors).
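The role of the strict inequality is easy to see in a short sketch. The helper names below are hypothetical; the second helper illustrates why an open ball is itself an open set, since around any point inside it a smaller open ball fits.

```python
import math
from typing import Sequence


def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance |x - y| = [sum_i (x^i - y^i)^2]^(1/2)."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of components")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def in_open_ball(x: Sequence[float], center: Sequence[float], r: float) -> bool:
    """True if x lies in the open ball of radius r about center (strict inequality)."""
    return distance(x, center) < r


def nested_radius(x: Sequence[float], center: Sequence[float], r: float) -> float:
    """Radius of an open ball about x that stays inside the ball about center.

    Requires x to lie inside the original ball; the triangle inequality then
    guarantees that every point within this radius of x is also inside it.
    """
    d = distance(x, center)
    if d >= r:
        raise ValueError("x is not inside the open ball")
    return r - d


origin = (0.0, 0.0)
print(in_open_ball((0.5, 0.5), origin, 1.0))   # an interior point
print(in_open_ball((1.0, 0.0), origin, 1.0))   # a boundary point is excluded
print(nested_radius((0.6, 0.0), origin, 1.0))  # room left around an interior point
```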
A chart or coordinate system consists of a subset $U$ of a set $M$, along with a one-to-one map $\phi : U \to \mathbf{R}^n$, such that the image $\phi(U)$ is open in $\mathbf{R}^n$, as in Figure 2.13. (Any map is onto its image, so the map $\phi : U \to \phi(U)$ is invertible if it is one-to-one.) We then can say that $U$ is an open set in $M$. A $C^\infty$ atlas is an indexed collection of charts $\{(U_\alpha, \phi_\alpha)\}$ that satisfies two conditions:

1. The union of the $U_\alpha$ is equal to $M$; that is, the $U_\alpha$ cover $M$.
2. The charts are smoothly sewn together. More precisely, if two charts overlap, $U_\alpha \cap U_\beta \neq \emptyset$, then the map $(\phi_\alpha \circ \phi_\beta^{-1})$ takes points in $\phi_\beta(U_\alpha \cap U_\beta) \subset \mathbf{R}^n$ onto an open set $\phi_\alpha(U_\alpha \cap U_\beta) \subset \mathbf{R}^n$, and all of these maps must be $C^\infty$ where they are defined. This should be clearer from Figure 2.14, adapted from Wald (1984).

So a chart is what we normally think of as a coordinate system on some open set, and an atlas is a system of charts that are smoothly related on their overlaps.
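As a toy illustration of these two conditions, here is a sketch of a two-chart atlas on the unit circle, with points given as pairs $(x, y)$ satisfying $x^2 + y^2 = 1$. The particular charts (two angle coordinates with different excluded points) and all of the names are assumptions made for this example rather than constructions from the text.

```python
import math
from typing import Optional

TWO_PI = 2.0 * math.pi


def chart_a(x: float, y: float) -> Optional[float]:
    """Angle in the open interval (0, 2*pi); undefined (None) at the point (1, 0)."""
    if y == 0.0 and x > 0.0:
        return None
    return math.atan2(y, x) % TWO_PI


def chart_b(x: float, y: float) -> Optional[float]:
    """Angle in the open interval (-pi, pi); undefined (None) at the point (-1, 0)."""
    if y == 0.0 and x < 0.0:
        return None
    return math.atan2(y, x)


def transition_a_to_b(theta_a: float) -> float:
    """chart_b o chart_a^{-1} on the overlap, which excludes theta_a = pi.

    On each of the two pieces of the overlap this is a constant shift,
    so it is as smooth as one could wish.
    """
    if theta_a == math.pi:
        raise ValueError("theta_a = pi labels the point missing from chart_b")
    return theta_a if theta_a < math.pi else theta_a - TWO_PI


# Condition 1: every sampled point of the circle lies in at least one chart.
samples = [(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.5, 4.0, 5.5)]
samples += [(1.0, 0.0), (-1.0, 0.0)]
covered = all(chart_a(x, y) is not None or chart_b(x, y) is not None
              for x, y in samples)

# Condition 2 (spot check): on the overlap the two coordinates are related
# by the transition function.
bottom = (0.0, -1.0)
print(covered)
print(transition_a_to_b(chart_a(*bottom)), chart_b(*bottom))
```

Each chart has an open interval as its image, and each misses exactly one point of the circle, so neither can serve alone.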
At long last, then: a $C^\infty$ n-dimensional manifold (or n-manifold for short) is simply a set $M$ along with a maximal atlas, one that contains every possible compatible chart. We can also replace $C^\infty$ by $C^p$ in all the above definitions. For our purposes the degree of differentiability of a manifold is not crucial; we will always assume that any manifold is as differentiable as necessary for the application under consideration. The requirement that the atlas be maximal is so that two equivalent spaces equipped with different atlases don't count as different manifolds. This definition captures in formal terms our notion of a set that looks locally like $\mathbf{R}^n$. Of course we will rarely have to make use of the full power of the definition, but precision is its own reward.

FIGURE 2.13 A coordinate chart covering an open subset $U$ of $M$.

FIGURE 2.14 Overlapping coordinate charts. These maps are only defined on the shaded regions, and must be smooth there.

One nice thing about our definition is that it does not rely on an embedding of the manifold in some higher-dimensional Euclidean space. In fact, any n-dimensional manifold can be embedded in $\mathbf{R}^{2n}$ (Whitney's embedding theorem), and sometimes we will make use of this fact, such as in our definition of the sphere above. (A Klein bottle is an example of a 2-manifold that cannot be embedded in $\mathbf{R}^3$, although it can be embedded in $\mathbf{R}^4$.) But it is important to recognize that the manifold has an individual existence independent of any embedding. It is not necessary to believe, for example, that four-dimensional spacetime is stuck in some larger space. On the other hand, it might be; we really don't know. Recent advances in string theory have led to the suggestion that our visible universe is actually a "brane" (generalization of "membrane") inside a higher-dimensional space. But as far as classical GR is concerned, the four-dimensional view is perfectly adequate.

Why was it necessary to be so finicky about charts and their overlaps, rather than just covering every manifold with a single chart? Because most manifolds cannot be covered with just one chart. Consider the simplest example, $S^1$. There is a conventional coordinate system, $\theta : S^1 \to \mathbf{R}$, where $\theta = 0$ at the top of the circle and wraps around to $2\pi$. However, in the definition of a chart we have required that the image $\theta(S^1)$ be open in $\mathbf{R}$. If we include either $\theta = 0$ or $\theta = 2\pi$, we have a closed (or half-closed) interval rather than an open one; if we exclude both points, we haven't covered the whole circle. So we need at least two charts, as shown in Figure 2.15.

FIGURE 2.15 Two coordinate charts, which together cover $S^1$.

A somewhat more complicated example is provided by $S^2$, where once again no single chart will cover the manifold. A Mercator projection, traditionally used for world maps, misses both the North and South poles (as well as the International Date Line, which involves the same problem with $\theta$ that we found for $S^1$). Let's take $S^2$ to be the set of points in $\mathbf{R}^3$ defined by $(x^1)^2 + (x^2)^2 + (x^3)^2 = 1$. We can construct a chart from an open set $U_1$, defined to be the sphere minus the north pole, via stereographic projection, illustrated in Figure 2.16. Thus, we draw a straight line from the north pole to the plane defined by $x^3 = -1$, and assign to the point on $S^2$ intercepted by the line the Cartesian coordinates $(y^1, y^2)$ of the appropriate point on the plane. Explicitly, the map is given by

$$\phi_1(x^1, x^2, x^3) \equiv (y^1, y^2) = \left(\frac{2x^1}{1 - x^3},\; \frac{2x^2}{1 - x^3}\right). \tag{2.9}$$

Check this for yourself. Another chart $(U_2, \phi_2)$ is obtained by projecting from the south pole to the plane defined by $x^3 = +1$. The resulting coordinates cover the sphere minus the south pole, and are given by

$$\phi_2(x^1, x^2, x^3) \equiv (z^1, z^2) = \left(\frac{2x^1}{1 + x^3},\; \frac{2x^2}{1 + x^3}\right). \tag{2.10}$$

Together, these two charts cover the entire manifold, and they overlap in the region $-1 < x^3 < +1$. Another thing you can check is that the composition $\phi_2 \circ \phi_1^{-1}$ is given by

$$z^i = \frac{4y^i}{(y^1)^2 + (y^2)^2}, \tag{2.11}$$

and is $C^\infty$ in the region of overlap. As long as we restrict our attention to this region, (2.11) is just what we normally think of as a change of coordinates.

FIGURE 2.16 Defining a stereographic coordinate chart on $S^2$ by projecting from the north pole down to a plane tangent to the south pole. Such a chart covers all of the sphere except for the north pole itself.

FIGURE 2.17 The chain rule relates the partial derivatives of $g \circ f$ to those of $g$ and $f$.
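The change of coordinates between the two stereographic charts on $S^2$ can be spot-checked numerically. The sketch below projects sample points of the unit sphere from the north pole onto the plane $x^3 = -1$ and from the south pole onto the plane $x^3 = +1$, then compares the second set of coordinates with $z^i = 4y^i / [(y^1)^2 + (y^2)^2]$. The function names and the sample points are assumptions made for illustration.

```python
import math


def phi1(x1: float, x2: float, x3: float):
    """Project from the north pole (0, 0, 1) onto the plane x3 = -1."""
    return (2.0 * x1 / (1.0 - x3), 2.0 * x2 / (1.0 - x3))


def phi2(x1: float, x2: float, x3: float):
    """Project from the south pole (0, 0, -1) onto the plane x3 = +1."""
    return (2.0 * x1 / (1.0 + x3), 2.0 * x2 / (1.0 + x3))


def transition(y1: float, y2: float):
    """Claimed form of phi2 o phi1^{-1}: z^i = 4 y^i / ((y^1)^2 + (y^2)^2)."""
    r2 = y1 * y1 + y2 * y2
    return (4.0 * y1 / r2, 4.0 * y2 / r2)


def sphere_point(polar: float, azimuth: float):
    """A point on the unit sphere, from ordinary spherical angles."""
    return (math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar))


# Sample points in the overlap region -1 < x3 < +1 (both poles excluded).
points = [sphere_point(p, a) for p in (0.3, 1.2, 2.0, 2.9) for a in (0.1, 2.0, 4.5)]

max_error = 0.0
for point in points:
    y = phi1(*point)
    z_direct = phi2(*point)
    z_via_y = transition(*y)
    max_error = max(max_error,
                    abs(z_direct[0] - z_via_y[0]),
                    abs(z_direct[1] - z_via_y[1]))

print(f"largest discrepancy: {max_error:.2e}")
```

A numerical check of this kind is no substitute for the algebra, but it catches sign and factor-of-two slips quickly.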
We therefore see the necessity of charts and atlases: Many manifolds cannot be covered with a single coordinate system. Nevertheless, it is often convenient to work with a single chart, and just keep track of the set of points that aren't included.

One piece of conventional calculus that we will need later is the chain rule. Let us imagine that we have maps $f : \mathbf{R}^m \to \mathbf{R}^n$ and $g : \mathbf{R}^n \to \mathbf{R}^l$, and therefore the composition $(g \circ f) : \mathbf{R}^m \to \mathbf{R}^l$, as shown in Figure 2.17. We can label points in each space in terms of components: $x^a$ on $\mathbf{R}^m$, $y^b$ on $\mathbf{R}^n$, and $z^c$ on $\mathbf{R}^l$, where the indices range over the appropriate values. The chain rule relates the partial derivatives of the composition to the partial derivatives of the individual maps:

$$\frac{\partial}{\partial x^a}(g \circ f)^c = \sum_b \frac{\partial f^b}{\partial x^a}\frac{\partial g^c}{\partial y^b}. \tag{2.12}$$

This is usually abbreviated to

$$\frac{\partial}{\partial x^a} = \sum_b \frac{\partial y^b}{\partial x^a}\frac{\partial}{\partial y^b}. \tag{2.13}$$

There is nothing illegal or immoral about using this shorthand form of the chain rule, but you should be able to visualize the maps that underlie the construction. Recall that when $m = n$, the determinant of the matrix $\partial y^b/\partial x^a$ is called the Jacobian of the map, and the map is invertible (at least locally) wherever the Jacobian is nonzero.
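The chain rule for a composition of maps between Euclidean spaces can be verified numerically for a concrete pair of maps. In the sketch below the maps `f` and `g`, the evaluation point, and the finite-difference step are all hypothetical choices; partial derivatives are estimated by central differences, so the two sides agree only up to a small numerical error.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def f(x: Sequence[float]) -> Point:
    """A sample map f : R^2 -> R^2."""
    x1, x2 = x
    return (x1 * x2, x1 + x2 ** 2)


def g(y: Sequence[float]) -> Point:
    """A sample map g : R^2 -> R^1, returned as a 1-tuple."""
    y1, y2 = y
    return (math.sin(y1) + y1 * y2,)


def partial(func: Callable[[Sequence[float]], Point],
            point: Sequence[float], index: int, h: float = 1e-6) -> Point:
    """Central-difference estimate of the partial derivative of every component
    of func with respect to coordinate `index`, evaluated at `point`."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return tuple((u - d) / (2.0 * h) for u, d in zip(func(up), func(down)))


def chain_rule_sides(x: Sequence[float], a: int, c: int):
    """Return both sides of the chain rule for component c and coordinate x^a."""
    lhs = partial(lambda p: g(f(p)), x, a)[c]
    y = f(x)
    rhs = sum(partial(f, x, a)[b] * partial(g, y, b)[c] for b in range(len(y)))
    return lhs, rhs


x0 = (0.7, -1.3)
for a in range(2):
    lhs, rhs = chain_rule_sides(x0, a, 0)
    print(f"a = {a}: lhs = {lhs:.8f}, rhs = {rhs:.8f}")
```

The sum over `b` in `chain_rule_sides` is the sum over the intermediate space, which is easy to forget when using the abbreviated form of the rule.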

2.3 ■ VECTORS AGAIN

Having constructed this groundwork, we can now proceed to introduce various kinds of structure on manifolds. We begin with vectors and tangent spaces. In our discussion of special relativity we were intentionally vague about the definition of vectors and their relationship to the spacetime. One point we stressed was the notion of a tangent space-the set of all vectors at a single point in spacetime. The reason for this emphasis was to remove from your minds the idea that a vector stretches from one point on the manifold to another, but instead is just an object associated with a single point. What is temporarily lost by adopting this view is a way to make sense of statements like "the vector points in the x direction"-if the tangent space is merely an abstract vector space associated with each point, it's hard to know what this should mean. Now it's time to fix the problem.
Let's imagine that we wanted to construct the tangent space at a point $p$ in a manifold $M$, using only things that are intrinsic to $M$ (no embeddings in higher-dimensional spaces). A first guess might be to use our intuitive knowledge that there are objects called "tangent vectors to curves," which belong in the tangent space. We might therefore consider the set of all parameterized curves through $p$, that is, the space of all (nondegenerate) maps $\gamma : \mathbf{R} \to M$, such that $p$ is in the image of $\gamma$. The temptation is to define the tangent space as simply the space of all tangent vectors to these curves at the point $p$. But this is obviously cheating; the tangent space $T_p$ is supposed to be the space of vectors at $p$, and before we have defined this we don't have an independent notion of what "the tangent vector to a curve" is supposed to mean. In some coordinate system $x^\mu$ any curve through $p$ defines an element of $\mathbf{R}^n$ specified by the n real numbers $dx^\mu/d\lambda$ (where $\lambda$ is the parameter along the curve), but this map is clearly coordinate-dependent, which is not what we want.
Nevertheless we are on the right track; we just have to make things independent of coordinates. To this end we define $\mathcal{F}$ to be the space of all smooth functions on $M$ (that is, $C^\infty$ maps $f : M \to \mathbf{R}$). Then we notice that each curve through $p$ defines an operator on this space, the directional derivative, which maps $f \to df/d\lambda$ (at $p$). We will make the following claim: the tangent space $T_p$ can be identified with the space of directional derivative operators along curves through $p$. To establish this idea we must demonstrate two things: first, that the space of directional derivatives is a vector space, and second that it is the vector space we want (it has the same dimensionality as $M$, yields a natural idea of a vector pointing along a certain direction, and so on).
The first claim, that directional derivatives form a vector space, seems straightforward enough. Imagine two operators $d/d\lambda$ and $d/d\eta$ representing derivatives along two curves $x^\mu(\lambda)$ and $x^\mu(\eta)$ through $p$. There is no problem adding these and scaling by real numbers, to obtain a new operator $a(d/d\lambda) + b(d/d\eta)$. It is not immediately obvious, however, that the space closes; in other words, that the resulting operator is itself a derivative operator. A good derivative operator is one that acts linearly on functions, and obeys the conventional Leibniz (product) rule on products of functions. Our new operator is manifestly linear, so we need to


FIGURE 2.18 Partial derivatives define directional derivatives along curves that keep all of the other coordinates constant.

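verify that it obeys the Leibniz rule. We have

\[
\begin{aligned}
\left(a\frac{d}{d\lambda} + b\frac{d}{d\eta}\right)(fg) &= af\frac{dg}{d\lambda} + ag\frac{df}{d\lambda} + bf\frac{dg}{d\eta} + bg\frac{df}{d\eta} \\
&= \left(a\frac{df}{d\lambda} + b\frac{df}{d\eta}\right)g + \left(a\frac{dg}{d\lambda} + b\frac{dg}{d\eta}\right)f.
\end{aligned}
\tag{2.14}
\]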

As we had hoped, the product rule is satisfied, and the set of directional derivatives is therefore a vector space.
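The closure argument can also be checked numerically. The following is only a sketch under stated assumptions: the two curves, the test functions, and the helper names `derivative_along` and `D` are hypothetical choices made for illustration, and derivatives are estimated by central differences rather than computed exactly.

```python
import math

def derivative_along(curve, func, lam=0.0, h=1e-5):
    """Central-difference estimate of d(func o curve)/d(lambda) at parameter lam."""
    return (func(*curve(lam + h)) - func(*curve(lam - h))) / (2 * h)

# Two hypothetical curves in the plane; both pass through p = (1, 2) at parameter 0.
gamma_lam = lambda lam: (1.0 + lam, 2.0 + lam ** 2)
gamma_eta = lambda eta: (math.cos(eta), 2.0 + math.sin(eta))

# Two hypothetical smooth functions and their pointwise product.
f = lambda x, y: x * x * y
g = lambda x, y: math.sin(x) + y
fg = lambda x, y: f(x, y) * g(x, y)

a, b = 2.0, -3.0
p = (1.0, 2.0)

def D(func):
    """The combined operator a d/d(lambda) + b d/d(eta), acting on func at p."""
    return a * derivative_along(gamma_lam, func) + b * derivative_along(gamma_eta, func)

# Leibniz rule for the combined operator: D(fg) = D(f) g(p) + D(g) f(p).
leibniz_gap = abs(D(fg) - (D(f) * g(*p) + D(g) * f(*p)))
```

Up to finite-difference error, `leibniz_gap` should be negligibly small, which is the numerical counterpart of (2.14).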
Is it the vector space that we would like to identify with the tangent space? The easiest way to become convinced is to find a basis for the space. Consider again a coordinate chart with coordinates $x^\mu$. Then there is an obvious set of $n$ directional derivatives at $p$, namely the partial derivatives $\partial_\mu$ at $p$, as shown in Figure 2.18. Note that this is really the definition of the partial derivative with respect to $x^\mu$: the directional derivative along a curve defined by $x^\nu = \text{constant}$ for all $\nu \neq \mu$, parameterized by $x^\mu$ itself. We are now going to claim that the partial derivative operators $\{\partial_\mu\}$ at $p$ form a basis for the tangent space $T_p$. (It follows immediately that $T_p$ is $n$-dimensional, since that is the number of basis vectors.) To see this we will show that any directional derivative can be decomposed into a sum of real numbers times partial derivatives. This will just be the familiar expression for the components of a tangent vector, but it's nice to see it from the big-machinery approach. Consider an $n$-manifold $M$, a coordinate chart $\phi : M \to \mathbf{R}^n$, a curve $\gamma : \mathbf{R} \to M$, and a function $f : M \to \mathbf{R}$. This leads to the tangle of maps shown in Figure 2.19. If $\lambda$ is the parameter along $\gamma$, we want to express the vector/operator $d/d\lambda$ in terms of the partials $\partial_\mu$. Using the chain rule (2.12), we have

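\[
\begin{aligned}
\frac{d}{d\lambda} f &= \frac{d}{d\lambda}(f \circ \gamma) \\
&= \frac{d}{d\lambda}\left[(f \circ \phi^{-1}) \circ (\phi \circ \gamma)\right] \\
&= \frac{d(\phi \circ \gamma)^\mu}{d\lambda}\,\frac{\partial (f \circ \phi^{-1})}{\partial x^\mu} \\
&= \frac{dx^\mu}{d\lambda}\,\partial_\mu f.
\end{aligned}
\tag{2.15}
\]

FIGURE 2.19 Decomposing the tangent vector to a curve $\gamma : \mathbf{R} \to M$ in terms of partial derivatives with respect to coordinates on $M$.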

The first line simply takes the informal expression on the left-hand side and rewrites it as an honest derivative of the function $(f \circ \gamma) : \mathbf{R} \to \mathbf{R}$. The second line just comes from the definition of the inverse map $\phi^{-1}$ (and associativity of the operation of composition). The third line is the formal chain rule (2.12), and the last line is a return to the informal notation of the start. Since the function $f$ was arbitrary, we have

\[
\frac{d}{d\lambda} = \frac{dx^\mu}{d\lambda}\,\partial_\mu.
\tag{2.16}
\]

Thus, the partials $\{\partial_\mu\}$ do indeed represent a good basis for the vector space of directional derivatives, which we can therefore safely identify with the tangent space.
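To make the decomposition concrete, here is a small numerical sketch. The curve, the function, and the evaluation point are hypothetical choices, and the chart is simply taken to be the standard coordinates $(x, y)$ on the plane; all derivatives are central-difference estimates.

```python
import math

def curve(lam):
    """A hypothetical curve in the plane, written in coordinates (x, y)."""
    return (math.cos(lam), lam * lam)

def f(x, y):
    """A hypothetical smooth function on the plane."""
    return x * y + math.exp(x)

lam0, h = 0.7, 1e-5
x0, y0 = curve(lam0)

# Left-hand side: the directional derivative d(f o curve)/d(lambda) at lam0.
lhs = (f(*curve(lam0 + h)) - f(*curve(lam0 - h))) / (2 * h)

# Components dx^mu/d(lambda) of the tangent vector in the coordinate basis.
dx = [(curve(lam0 + h)[i] - curve(lam0 - h)[i]) / (2 * h) for i in range(2)]

# Partial derivatives of f along the coordinate axes at the same point.
df_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
df_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

# Right-hand side: (dx^mu/d(lambda)) * (partial_mu f), summed over mu.
rhs = dx[0] * df_dx + dx[1] * df_dy
```

The two numbers `lhs` and `rhs` should agree up to finite-difference error, which illustrates that the operator $d/d\lambda$ has components $dx^\mu/d\lambda$ in the basis of partial derivatives.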
Of course, the vector represented by $d/d\lambda$ is one we already know; it's the tangent vector to the curve with parameter $\lambda$. Thus (2.16) can be thought of as a restatement of equation (1.38), where we claimed that the components of the tangent vector were simply $dx^\mu/d\lambda$. The only difference is that we are working on an arbitrary manifold, and we have specified our basis vectors to be $\hat{e}_{(\mu)} = \partial_\mu$. This particular basis ($\hat{e}_{(\mu)} = \partial_\mu$) is known as a coordinate basis for $T_p$; it is the formalization of the notion of setting up the basis vectors to point along the coordinate axes. There is no reason why we are limited to coordinate bases when we consider tangent vectors. For example, the coordinate basis vectors are typically not normalized to unity, nor orthogonal to each other, as we shall see


shortly. This is not a situation we can define away; on a curved manifold, a coordinate basis will never be orthonormal throughout a neighborhood of any point where the curvature does not vanish. Of course we can define noncoordinate orthonormal bases, for example by giving their components in a coordinate basis, and sometimes this technique is useful. However, coordinate bases are very simple and natural, and we will use them almost exclusively throughout the book; for a look at orthonormal bases, see Appendix J. (It is standard in the study of vector analysis in three-dimensional Euclidean space to choose orthonormal bases rather than coordinate bases; you should therefore be careful when applying formulae from GR texts to the study of non-Cartesian coordinates in flat space.)
One of the advantages of the rather abstract point of view we have taken toward vectors is that the transformation law under changes of coordinates is immediate. Since the basis vectors are $\hat{e}_{(\mu)} = \partial_\mu$, the basis vectors in some new coordinate system $x^{\mu'}$ are given by the chain rule (2.13) as

\[
\partial_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu.
\tag{2.17}
\]

We can get the transformation law for vector components by the same technique used in flat space, demanding that the vector $V = V^\mu\partial_\mu$ be unchanged by a change of basis. We have

\[
V^\mu\partial_\mu = V^{\mu'}\partial_{\mu'} = V^{\mu'}\frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu,
\tag{2.18}
\]

and hence, since the matrix $\partial x^{\mu'}/\partial x^\mu$ is the inverse of the matrix $\partial x^\mu/\partial x^{\mu'}$,

\[
V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,V^\mu.
\tag{2.19}
\]

Since the basis vectors are usually not written explicitly, the rule (2.19) for transforming components is what we call the "vector transformation law." We notice that it is compatible with the transformation of vector components in special relativity under Lorentz transformations, $V^{\mu'} = \Lambda^{\mu'}{}_{\mu}V^\mu$, since a Lorentz transformation is a special kind of coordinate transformation, with $x^{\mu'} = \Lambda^{\mu'}{}_{\mu}x^\mu$. But (2.19) is much more general, as it encompasses the behavior of vectors under arbitrary changes of coordinates (and therefore bases), not just linear transformations. As usual, we are trying to emphasize a somewhat subtle ontological distinction: in principle, tensor components need not change when we change coordinates; they change when we change the basis in the tangent space, but we have decided to use the coordinates to define our basis. Therefore a change of coordinates induces a change of basis, as indicated in Figure 2.20.
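As an illustration of the vector transformation law for a nonlinear change of coordinates, consider passing from Cartesian coordinates $(x, y)$ on the flat plane to polar coordinates $(r, \theta)$. This is only a sketch; the function name `to_polar_components` and the sample point are hypothetical.

```python
import math

def to_polar_components(x, y, vx, vy):
    """Apply V^{mu'} = (dx^{mu'}/dx^mu) V^mu for (x, y) -> (r, theta).

    The Jacobian entries follow from r = sqrt(x^2 + y^2), theta = atan2(y, x):
    dr/dx = x/r, dr/dy = y/r, dtheta/dx = -y/r^2, dtheta/dy = x/r^2.
    """
    r2 = x * x + y * y
    r = math.sqrt(r2)
    v_r = (x / r) * vx + (y / r) * vy
    v_theta = (-y / r2) * vx + (x / r2) * vy
    return v_r, v_theta

# At the point (1, 1), the Cartesian vector (-1, 1) points along increasing theta.
v_r, v_theta = to_polar_components(1.0, 1.0, -1.0, 1.0)
```

Here the vector with Cartesian components $(-1, 1)$ has polar components $(V^r, V^\theta) = (0, 1)$; that is, it is exactly the coordinate basis vector $\partial_\theta$. Its Cartesian length is $\sqrt{2} = r$ rather than 1, a simple instance of a coordinate basis vector that is not normalized to unity.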

FIGURE 2.20 A change of coordinates $x^\mu \to x^{\mu'}$ induces a change of basis in the tangent space.

Since a vector at a point can be thought of as a directional derivative operator along a path through that point, it should be clear that a vector field defines a map from smooth functions to smooth functions all over the manifold, by taking a derivative at each point. Given two vector fields $X$ and $Y$, we can therefore define their commutator $[X, Y]$ by its action on a function $f(x^\mu)$:

\[
[X, Y](f) = X(Y(f)) - Y(X(f)).
\tag{2.20}
\]

The virtue of the abstract point of view is that, clearly, this operator is independent of coordinates. In fact, the commutator of two vector fields is itself a vector field: if $f$ and $g$ are functions and $a$ and $b$ are real numbers, the commutator is linear,

\[
[X, Y](af + bg) = a[X, Y](f) + b[X, Y](g),
\tag{2.21}
\]

and obeys the Leibniz rule,

\[
[X, Y](fg) = f[X, Y](g) + g[X, Y](f).
\tag{2.22}
\]

Both properties are straightforward to check, which is a useful exercise to do. An equally interesting exercise is to derive an explicit expression for the components of the vector field $[X, Y]^\mu$, which turns out to be

\[
[X, Y]^\mu = X^\lambda\partial_\lambda Y^\mu - Y^\lambda\partial_\lambda X^\mu.
\tag{2.23}
\]

By construction this is a well-defined tensor; but you should be slightly worried by the appearance of the partial derivatives, since partial derivatives of vectors are not well-defined tensors (as we discuss in the next section). Yet another fascinating exercise is to perform explicitly a coordinate transformation on the expression (2.23), to verify that all potentially nontensorial pieces cancel and the result transforms like a vector field. The commutator is a special case of the Lie derivative, discussed in Appendix B; it is sometimes referred to as the Lie bracket. Note that since partials commute, the commutator of the vector fields given by the partial derivatives of coordinate functions, $\{\partial_\mu\}$, always vanishes.
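For readers who want to see where the component expression for the commutator comes from, one way to organize the exercise is sketched below. It assumes only the definition (2.20) and the coordinate-basis expansions $X = X^\mu\partial_\mu$ and $Y = Y^\mu\partial_\mu$; the second-derivative terms cancel because partial derivatives commute.

```latex
\begin{aligned}
[X, Y](f) &= X^\lambda \partial_\lambda \left( Y^\mu \partial_\mu f \right)
           - Y^\lambda \partial_\lambda \left( X^\mu \partial_\mu f \right) \\
          &= \left( X^\lambda \partial_\lambda Y^\mu - Y^\lambda \partial_\lambda X^\mu \right) \partial_\mu f
           + X^\lambda Y^\mu \left( \partial_\lambda \partial_\mu f - \partial_\mu \partial_\lambda f \right) \\
          &= \left( X^\lambda \partial_\lambda Y^\mu - Y^\lambda \partial_\lambda X^\mu \right) \partial_\mu f .
\end{aligned}
```

Since $f$ is arbitrary, the coefficient of $\partial_\mu f$ in the last line is the component $[X, Y]^\mu$. Setting $X$ and $Y$ equal to two coordinate basis vectors makes their components constant, so the commutator vanishes, in agreement with the remark above.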


2.4 ■ TENSORS AGAIN

Having explored the world of vectors, we continue to retrace the steps we took in flat space, and now consider dual vectors (one-forms). Once again the cotangent space $T_p^*$ can be thought of as the set of linear maps $\omega : T_p \to \mathbf{R}$. The canonical example of a one-form is the gradient of a function $f$, denoted $\mathrm{d}f$, as in (1.52). Its action on a vector $d/d\lambda$ is exactly the directional derivative of the function:

\[
\mathrm{d}f\left(\frac{d}{d\lambda}\right) = \frac{df}{d\lambda}.
\tag{2.24}
\]

It's tempting to ask, "why shouldn't the function $f$ itself be considered the one-form, and $df/d\lambda$ its action?" The point is that a one-form, like a vector, exists only at the point it is defined, and does not depend on information at other points on $M$. If you know a function in some neighborhood of a point, you can take its derivative, but not just from knowing its value at the point; the gradient, on the other hand, encodes precisely the information necessary to take the directional derivative along any curve through $p$, fulfilling its role as a dual vector.
You may have noticed that we defined vectors using structures intrinsic to the manifold (directional derivatives along curves), and used that definition to define one-forms in terms of the dual vector space. This might lead to the impression that vectors are somehow more fundamental; in fact, however, we could just as well have begun with an intrinsic definition of one-forms and used that to define vectors as the dual space. Roughly speaking, the space of one-forms at $p$ is equivalent to the space of all functions that vanish at $p$ and have the same second partial derivatives. In fact, doing it that way is more fundamental, if anything, since we can provide intrinsic definitions of all $q$-forms (totally antisymmetric tensors with $q$ lower indices), which we will discuss in Section 2.9 (although we will not delve into the specifics of the intrinsic definitions).
Just as the partial derivatives along coordinate axes provide a natural basis for the tangent space, the gradients of the coordinate functions $x^\mu$ provide a natural basis for the cotangent space. Recall that in flat space we constructed a basis for $T_p^*$ by demanding that $\hat{\theta}^{(\mu)}(\hat{e}_{(\nu)}) = \delta^\mu_\nu$. Continuing the same philosophy on an arbitrary manifold, we find that (2.24) leads to

\[
\mathrm{d}x^\mu(\partial_\nu) = \frac{\partial x^\mu}{\partial x^\nu} = \delta^\mu_\nu.
\tag{2.25}
\]

Therefore the gradients $\{\mathrm{d}x^\mu\}$ are an appropriate set of basis one-forms; an arbitrary one-form is expanded into components as $\omega = \omega_\mu\,\mathrm{d}x^\mu$.
The transformation properties of basis dual vectors and components follow from what is by now the usual procedure. We obtain, for basis one-forms,

\[
\mathrm{d}x^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,\mathrm{d}x^\mu,
\tag{2.26}
\]

and for components,

\[
\omega_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu.
\tag{2.27}
\]

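We will usually write the components $\omega_\mu$ when we speak about a one-form $\omega$.
Just as in flat space, a $(k, l)$ tensor is a multilinear map from a collection of $k$ dual vectors and $l$ vectors to $\mathbf{R}$. Its components in a coordinate basis can be obtained by acting the tensor on basis one-forms and vectors,

\[
T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l} = T(\mathrm{d}x^{\mu_1}, \ldots, \mathrm{d}x^{\mu_k}, \partial_{\nu_1}, \ldots, \partial_{\nu_l}).
\tag{2.28}
\]

This is equivalent to the expansion

\[
T = T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\,\partial_{\mu_1}\otimes\cdots\otimes\partial_{\mu_k}\otimes\mathrm{d}x^{\nu_1}\otimes\cdots\otimes\mathrm{d}x^{\nu_l}.
\tag{2.29}
\]

The transformation law for general tensors follows the same pattern of replacing the Lorentz transformation matrix used in flat space with a matrix representing more general coordinate transformations:

\[
T^{\mu_1'\cdots\mu_k'}{}_{\nu_1'\cdots\nu_l'} = \frac{\partial x^{\mu_1'}}{\partial x^{\mu_1}}\cdots\frac{\partial x^{\mu_k'}}{\partial x^{\mu_k}}\,\frac{\partial x^{\nu_1}}{\partial x^{\nu_1'}}\cdots\frac{\partial x^{\nu_l}}{\partial x^{\nu_l'}}\,T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}.
\tag{2.30}
\]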

This tensor transformation law is straightforward to remember, since there really isn't anything else it could be, given the placement of indices.
Actually, however, it is often easier to transform a tensor by taking the identity of basis vectors and one-forms as partial derivatives and gradients at face value, and simply substituting in the coordinate transformation. As an example, consider a symmetric (0, 2) tensor $S$ on a two-dimensional manifold, whose components in a coordinate system $(x^1 = x, x^2 = y)$ are given by

\[
S_{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & x^2 \end{pmatrix}.
\tag{2.31}
\]

This can be written equivalently as

\[
S = S_{\mu\nu}(\mathrm{d}x^\mu\otimes\mathrm{d}x^\nu) = (\mathrm{d}x)^2 + x^2(\mathrm{d}y)^2,
\tag{2.32}
\]

where in the last expression the tensor product symbols are suppressed for brevity (as will become our custom). Now consider new coordinates

\[
x' = \frac{2x}{y}, \qquad y' = \frac{y}{2}
\tag{2.33}
\]


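(valid, for example, when $x > 0$, $y > 0$). These can be immediately inverted to obtain

\[
x = x'y', \qquad y = 2y'.
\tag{2.34}
\]

Instead of using the tensor transformation law, we can simply use the fact that we know how to take derivatives to express $\mathrm{d}x^\mu$ in terms of $\mathrm{d}x^{\mu'}$. We have

\[
\mathrm{d}x = y'\,\mathrm{d}x' + x'\,\mathrm{d}y', \qquad \mathrm{d}y = 2\,\mathrm{d}y'.
\tag{2.35}
\]

We need only plug these expressions directly into (2.32) to obtain (remembering that tensor products don't commute, so $\mathrm{d}x'\,\mathrm{d}y' \neq \mathrm{d}y'\,\mathrm{d}x'$):

\[
S = (y')^2(\mathrm{d}x')^2 + x'y'(\mathrm{d}x'\,\mathrm{d}y' + \mathrm{d}y'\,\mathrm{d}x') + \left[(x')^2 + 4(x'y')^2\right](\mathrm{d}y')^2,
\tag{2.36}
\]

or

\[
S_{\mu'\nu'} = \begin{pmatrix} (y')^2 & x'y' \\ x'y' & (x')^2 + 4(x'y')^2 \end{pmatrix}.
\tag{2.37}
\]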

Notice that it is still symmetric. We did not use the transformation law (2.30) directly, but doing so would have yielded the same result, as you can check.
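The check suggested here can also be carried out numerically. The sketch below assumes the example tensor $S = (\mathrm{d}x)^2 + x^2(\mathrm{d}y)^2$ and the coordinate change $x = x'y'$, $y = 2y'$ used above; the function names `transformed_S` and `expected_S` are hypothetical, and the tensor transformation law is applied as an explicit double sum over the Jacobian.

```python
def transformed_S(xp, yp):
    """Apply S_{mu'nu'} = (dx^mu/dx^mu') (dx^nu/dx^nu') S_{mu nu} at the point (x', y')."""
    x = xp * yp                          # old coordinate x in terms of the new ones
    S = [[1.0, 0.0], [0.0, x * x]]       # S_{mu nu} in the (x, y) system
    # J[mu][mu'] = dx^mu / dx^mu' for x = x'y', y = 2y'.
    J = [[yp, xp],
         [0.0, 2.0]]
    return [[sum(J[m][a] * J[n][b] * S[m][n] for m in range(2) for n in range(2))
             for b in range(2)]
            for a in range(2)]

def expected_S(xp, yp):
    """Components read off from direct substitution of the one-forms."""
    return [[yp ** 2, xp * yp],
            [xp * yp, xp ** 2 + 4.0 * (xp * yp) ** 2]]

sample = (1.5, 0.5)
via_law = transformed_S(*sample)
via_substitution = expected_S(*sample)
```

At any sample point with $x' > 0$, $y' > 0$ the two matrices should coincide entry by entry, and both are symmetric.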
For the most part the various tensor operations we defined in flat space are unaltered in a more general setting: contraction, symmetrization, and so on. There are three important exceptions: partial derivatives, the metric, and the Levi-Civita tensor. Let's look at the partial derivative first.
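Unfortunately, the partial derivative of a tensor is not, in general, a new tensor. The gradient, which is the partial derivative of a scalar, is an honest (0, 1) tensor, as we have seen. But the partial derivative of higher-rank tensors is not tensorial, as we can see by considering the partial derivative of a one-form, $\partial_\mu W_\nu$, and changing to a new coordinate system:

\[
\begin{aligned}
\frac{\partial}{\partial x^{\mu'}}W_{\nu'} &= \frac{\partial x^\mu}{\partial x^{\mu'}}\frac{\partial}{\partial x^\mu}\left(\frac{\partial x^\nu}{\partial x^{\nu'}}W_\nu\right) \\
&= \frac{\partial x^\mu}{\partial x^{\mu'}}\frac{\partial x^\nu}{\partial x^{\nu'}}\left(\frac{\partial}{\partial x^\mu}W_\nu\right) + W_\nu\frac{\partial x^\mu}{\partial x^{\mu'}}\frac{\partial}{\partial x^\mu}\frac{\partial x^\nu}{\partial x^{\nu'}}.
\end{aligned}
\tag{2.38}
\]

The second term in the last line should not be there if $\partial_\mu W_\nu$ were to transform as a (0, 2) tensor. As you can see, it arises because the derivative of the transformation matrix does not vanish, as it did for Lorentz transformations in flat space.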
Differentiation is obviously an important tool in physics, so we will have to invent new tensorial operations to take the place of the partial derivative. In fact we will invent several: the exterior derivative, the covariant derivative, and the Lie derivative.


2.5 ■ THE METRIC

The metric tensor is such an important object in curved space that it is given a new symbol, $g_{\mu\nu}$ (while $\eta_{\mu\nu}$ is reserved specifically for the Minkowski metric). There are few restrictions on the components of $g_{\mu\nu}$, other than that it be a symmetric (0, 2) tensor. It is usually, though not always, taken to be nondegenerate, meaning that the determinant $g = |g_{\mu\nu}|$ doesn't vanish. This allows us to define the inverse metric $g^{\mu\nu}$ via

\[
g^{\mu\nu}g_{\nu\sigma} = g_{\lambda\sigma}g^{\lambda\mu} = \delta^\mu_\sigma.
\tag{2.39}
\]

The symmetry of $g_{\mu\nu}$ implies that $g^{\mu\nu}$ is also symmetric. Just as in special relativity, the metric and its inverse may be used to raise and lower indices on tensors. You may be familiar with the notion of a "metric" used in the study of topology, where we also demand that the metric be positive-definite (no negative eigenvalues). The metric we use in general relativity cannot be used to define a topology, but it will have other uses.
It will take some time to fully appreciate the role of the metric in all of its glory, but for purposes of inspiration [following Sachs and Wu (1977)] we can list the various uses to which $g_{\mu\nu}$ will be put: (1) the metric supplies a notion of "past" and "future"; (2) the metric allows the computation of path length and proper time; (3) the metric determines the "shortest distance" between two points, and therefore the motion of test particles; (4) the metric replaces the Newtonian gravitational field $\phi$; (5) the metric provides a notion of locally inertial frames and therefore a sense of "no rotation"; (6) the metric determines causality, by defining the speed of light faster than which no signal can travel; (7) the metric replaces the traditional Euclidean three-dimensional dot product of Newtonian mechanics. Obviously these ideas are not all completely independent, but we get some sense of the importance of this tensor.
In our discussion of path lengths in special relativity we (somewhat handwavingly) introduced the line element as $ds^2 = \eta_{\mu\nu}dx^\mu dx^\nu$, which was used to get the length of a path. Of course now that we know that $\mathrm{d}x^\mu$ is really a basis dual vector, it becomes natural to use the terms "metric" and "line element" interchangeably, and write

\[
ds^2 = g_{\mu\nu}\,\mathrm{d}x^\mu\,\mathrm{d}x^\nu.
\tag{2.40}
\]

To be perfectly consistent we should write this as "$g$," and sometimes will, but more often than not $g$ is used for the determinant $|g_{\mu\nu}|$. For example, we know that the Euclidean line element in a three-dimensional space with Cartesian coordinates is

\[
ds^2 = (\mathrm{d}x)^2 + (\mathrm{d}y)^2 + (\mathrm{d}z)^2.
\tag{2.41}
\]

We can now change to any coordinate system we choose. For example, in spherical coordinates we have

\[
x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta,
\tag{2.42}
\]

which leads directly to

\[
ds^2 = \mathrm{d}r^2 + r^2\,\mathrm{d}\theta^2 + r^2\sin^2\theta\,\mathrm{d}\phi^2.
\tag{2.43}
\]

Obviously the components of the metric look different than those in Cartesian coordinates, but all of the properties of the space remain unaltered.
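The passage from Cartesian to spherical components can be checked numerically by pulling the Euclidean metric back through the coordinate map. The following is a sketch only: the function names `cartesian` and `induced_metric` and the sample point are hypothetical, and the Jacobian is estimated by central differences.

```python
import math

def cartesian(r, theta, phi):
    """The map (r, theta, phi) -> (x, y, z) of spherical coordinates."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def induced_metric(q, h=1e-6):
    """g_ab = sum_i (dx^i/dq^a)(dx^i/dq^b), with the Jacobian from central differences."""
    jac = []
    for a in range(3):
        plus, minus = list(q), list(q)
        plus[a] += h
        minus[a] -= h
        xp, xm = cartesian(*plus), cartesian(*minus)
        jac.append([(xp[i] - xm[i]) / (2 * h) for i in range(3)])
    return [[sum(jac[a][i] * jac[b][i] for i in range(3)) for b in range(3)]
            for a in range(3)]

r, theta, phi = 2.0, 0.8, 0.3
g = induced_metric((r, theta, phi))

# Components expected from the spherical line element: diag(1, r^2, r^2 sin^2 theta).
expected = [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]
max_error = max(abs(g[a][b] - expected[a][b]) for a in range(3) for b in range(3))
```

The off-diagonal entries vanish (to numerical precision) because the spherical coordinate basis vectors happen to be orthogonal, but the diagonal entries show that they are not normalized.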
Most references are not sufficiently picky to distinguish between "$dx$," the informal notion of an infinitesimal displacement, and "$\mathrm{d}x$," the rigorous notion of a basis one-form given by the gradient of a coordinate function. (They also tend to neglect the fact that tensor products don't commute, and write expressions like $\mathrm{d}x\,\mathrm{d}y + \mathrm{d}y\,\mathrm{d}x$ as $2\,\mathrm{d}x\,\mathrm{d}y$; it should be clear what is meant from the context.) In fact our notation "$ds^2$" does not refer to the differential of anything, or the square of anything; it's just conventional shorthand for the metric tensor, a multilinear map from two vectors to the real numbers. Thus, we have a set of equivalent expressions for the inner product of two vectors $V^\mu$ and $W^\nu$:

\[
g_{\mu\nu}V^\mu W^\nu = g(V, W) = ds^2(V, W).
\tag{2.44}
\]

Meanwhile, "$(\mathrm{d}x)^2$" refers specifically to the honest (0, 2) tensor $\mathrm{d}x\otimes\mathrm{d}x$.
A good example of a non-Euclidean manifold is the two-sphere, which can be thought of as the locus of points in $\mathbf{R}^3$ at distance 1 from the origin. The metric in the $(\theta, \phi)$ coordinate system can be derived by setting $r = 1$ and $\mathrm{d}r = 0$ in (2.43):

\[
ds^2 = \mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2.
\tag{2.45}
\]

This is completely consistent with the interpretation of $ds$ as an infinitesimal length, as illustrated in Figure 2.21. Anyone paying attention should at this point be asking, "What in the world does it mean to set $\mathrm{d}r = 0$? We know that $\mathrm{d}r$ is a well-defined nonvanishing one-form field." As occasionally happens, we are using sloppy language to motivate a step that is actually quite legitimate; see Appendix A for a discussion of how submanifolds inherit metrics from the spaces in which they are embedded.
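To see the two-sphere metric at work as a measure of length, we can integrate $ds$ along a curve. The sketch below assumes a unit sphere with line element $\mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2$; the function name `sphere_path_length` and the choice of a circle of constant latitude are hypothetical illustrations, and the integral is a simple midpoint rule.

```python
import math

def sphere_path_length(theta, phi, lam0, lam1, steps=2000):
    """Length of the curve (theta(lam), phi(lam)) on the unit two-sphere.

    Integrates sqrt(theta'^2 + sin^2(theta) phi'^2) with the midpoint rule,
    estimating the derivatives by central differences.
    """
    h = (lam1 - lam0) / steps
    eps = 1e-6
    total = 0.0
    for k in range(steps):
        lam = lam0 + (k + 0.5) * h
        dtheta = (theta(lam + eps) - theta(lam - eps)) / (2 * eps)
        dphi = (phi(lam + eps) - phi(lam - eps)) / (2 * eps)
        total += math.sqrt(dtheta ** 2 + math.sin(theta(lam)) ** 2 * dphi ** 2) * h
    return total

# A circle of constant latitude theta0, parameterized by phi = lam.
theta0 = math.pi / 3
circle = sphere_path_length(lambda lam: theta0, lambda lam: lam, 0.0, 2 * math.pi)
```

The result should be $2\pi\sin\theta_0$, the circumference of a circle of radius $\sin\theta_0$ in the embedding space, which is the sense in which the intrinsic metric agrees with the picture of the sphere sitting in $\mathbf{R}^3$.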
As we shall see, the metric tensor contains all the information we need to describe the curvature of the manifold (at least in what is called Riemannian geometry; we will get into some of the subtleties in the next chapter). In Minkowski space we can choose coordinates in which the components of the metric are constant; but it should be clear that the existence of curvature is more subtle than having the metric depend on the coordinates, since in the example above we showed how the metric in flat Euclidean space in spherical coordinates is a function of $r$ and $\theta$. Later, we shall see that constancy of the metric components is sufficient for a space to be flat, and in fact there always exists a coordinate system on any


FIGURE 2.21 The line element on a two-dimensional sphere.

flat space in which the metric is constant. But we might not know how to find such a coordinate system, and there are many ways for a space to deviate from flatness; we will therefore want a more precise characterization of the curvature, which will be introduced later.
A useful characterization of the metric is obtained by putting $g_{\mu\nu}$ into its canonical form. In this form the metric components become

\[
g_{\mu\nu} = \mathrm{diag}(-1, -1, \ldots, -1, +1, +1, \ldots, +1, 0, 0, \ldots, 0),
\tag{2.46}
\]

where "diag" means a diagonal matrix with the given elements. The signature of the metric is the number of both positive and negative eigenvalues; we speak of "a metric with signature minus-plus-plus-plus" for Minkowski space, for example. If any of the eigenvalues are zero, the metric is "degenerate," and the inverse metric will not exist; if the metric is continuous and nondegenerate, its signature will be the same at every point. We will always deal with continuous, nondegenerate metrics. If all of the signs are positive, the metric is called Euclidean or Riemannian (or just positive definite), while if there is a single minus it is called Lorentzian or pseudo-Riemannian, and any metric with some $+1$'s and some $-1$'s is called indefinite. (So the word Euclidean sometimes means that the space is flat, and sometimes doesn't, but it always means that the canonical form is strictly positive; the terminology is unfortunate but standard.) The spacetimes of interest in general relativity have Lorentzian metrics.
We haven't yet demonstrated that it is always possible to put the metric into canonical form. In fact it is always possible to do so at some point $p \in M$, but in general it will only be possible at that single point, not in any neighborhood of $p$. Actually we can do slightly better than this; it turns out that at any point $p$ there exists a coordinate system $x^{\hat\mu}$ in which $g_{\hat\mu\hat\nu}$ takes its canonical form and the first derivatives $\partial_{\hat\sigma}g_{\hat\mu\hat\nu}$ all vanish (while the second derivatives $\partial_{\hat\rho}\partial_{\hat\sigma}g_{\hat\mu\hat\nu}$ cannot be made to all vanish):

\[
g_{\hat\mu\hat\nu}(p) = \eta_{\hat\mu\hat\nu}, \qquad \partial_{\hat\sigma}g_{\hat\mu\hat\nu}(p) = 0.
\tag{2.47}
\]

Such coordinates are known as locally inertial coordinates, and the associated basis vectors constitute a local Lorentz frame; we often put hats on the indices when we are in these special coordinates. Notice that in locally inertial coordinates the metric at p looks like that of flat space to first order. This is the rigorous notion of the idea that "small enough regions of spacetime look like flat (Minkowski) space." Also, there is no difficulty in simultaneously constructing sets of basis vectors at every point in M such that the metric takes its canonical form; the problem is that in general there will not be a coordinate system from which this basis can be derived. Bases of this sort are discussed in Appendix J.
We will delay a discussion of how to construct locally inertial coordinates until Chapter 3. It is useful, however, to see a sketch of a proof of their existence for the specific case of a Lorentzian metric in four dimensions. The idea is to consider the transformation law for the metric

\[
g_{\hat\mu\hat\nu} = \frac{\partial x^\mu}{\partial x^{\hat\mu}}\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,g_{\mu\nu},
\tag{2.48}
\]

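and expand both sides in Taylor series in the sought-after coordinates $x^{\hat\mu}$. The expansion of the old coordinates $x^\mu$ looks like

\[
x^\mu = \left(\frac{\partial x^\mu}{\partial x^{\hat\mu}}\right)_p x^{\hat\mu} + \frac{1}{2}\left(\frac{\partial^2 x^\mu}{\partial x^{\hat\mu_1}\partial x^{\hat\mu_2}}\right)_p x^{\hat\mu_1}x^{\hat\mu_2} + \frac{1}{6}\left(\frac{\partial^3 x^\mu}{\partial x^{\hat\mu_1}\partial x^{\hat\mu_2}\partial x^{\hat\mu_3}}\right)_p x^{\hat\mu_1}x^{\hat\mu_2}x^{\hat\mu_3} + \cdots,
\tag{2.49}
\]

with the other expansions proceeding along the same lines. [For simplicity we have set $x^\mu(p) = x^{\hat\mu}(p) = 0$.] Then, using some extremely schematic notation, the expansion of (2.48) to second order is

\[
\begin{aligned}
(\hat{g})_p + (\hat\partial\hat{g})_p\,\hat{x} + (\hat\partial\hat\partial\hat{g})_p\,\hat{x}\hat{x}
&= \left(\frac{\partial x}{\partial\hat{x}}\frac{\partial x}{\partial\hat{x}}\,g\right)_p
+ \left(\frac{\partial x}{\partial\hat{x}}\frac{\partial^2 x}{\partial\hat{x}\partial\hat{x}}\,g + \frac{\partial x}{\partial\hat{x}}\frac{\partial x}{\partial\hat{x}}\,\hat\partial g\right)_p \hat{x} \\
&\quad + \left(\frac{\partial x}{\partial\hat{x}}\frac{\partial^3 x}{\partial\hat{x}\partial\hat{x}\partial\hat{x}}\,g + \frac{\partial^2 x}{\partial\hat{x}\partial\hat{x}}\frac{\partial^2 x}{\partial\hat{x}\partial\hat{x}}\,g + \frac{\partial x}{\partial\hat{x}}\frac{\partial^2 x}{\partial\hat{x}\partial\hat{x}}\,\hat\partial g + \frac{\partial x}{\partial\hat{x}}\frac{\partial x}{\partial\hat{x}}\,\hat\partial\hat\partial g\right)_p \hat{x}\hat{x}.
\end{aligned}
\tag{2.50}
\]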

We can set terms of equal order in $\hat{x}$ on each side equal to each other. Therefore, the components $g_{\hat\mu\hat\nu}(p)$, 10 numbers in all (to describe a symmetric two-index tensor), are determined by the matrix $(\partial x^\mu/\partial x^{\hat\mu})_p$. This is a $4 \times 4$ matrix with no constraints; thus, we are free to choose 16 numbers. Clearly this is enough freedom to put the 10 numbers of $g_{\hat\mu\hat\nu}(p)$ into canonical form, at least as far as having enough degrees of freedom is concerned. (In fact there are some limitations: if you go through the procedure carefully, you find for example that you cannot change the signature.) The six remaining degrees of freedom can be interpreted as exactly the six parameters of the Lorentz group; we know that these leave the canonical form unchanged. At first order we have the derivatives $\partial_{\hat\sigma}g_{\hat\mu\hat\nu}(p)$, four derivatives of ten components for a total of 40 numbers. But looking at the right-hand side of (2.50) we see that we now have the additional freedom to choose $(\partial^2 x^\mu/\partial x^{\hat\mu_1}\partial x^{\hat\mu_2})_p$. In this set of numbers there are 10 independent choices of the indices $\hat\mu_1$ and $\hat\mu_2$ (it's symmetric, since partial derivatives commute) and four choices of $\mu$, for a total of 40 degrees of freedom. This is precisely the number of choices we need to determine all of the first derivatives of the metric, which we can therefore set to zero. At second order, however, we are concerned with $\partial_{\hat\rho}\partial_{\hat\sigma}g_{\hat\mu\hat\nu}(p)$; this is symmetric in $\hat\rho$ and $\hat\sigma$ as well as $\hat\mu$ and $\hat\nu$, for a total of $10 \times 10 = 100$ numbers. Our ability to make additional choices is contained in $(\partial^3 x^\mu/\partial x^{\hat\mu_1}\partial x^{\hat\mu_2}\partial x^{\hat\mu_3})_p$. This is symmetric in the three lower indices, which gives 20 possibilities, times four for the upper index gives us 80 degrees of freedom, 20 fewer than we require to set the second derivatives of the metric to zero. So in fact we cannot make the second derivatives vanish; the deviation from flatness must therefore be measured by the 20 degrees of freedom representing the second derivatives of the metric tensor field. We will see later how this comes about, when we characterize curvature using the Riemann tensor, which will turn out to have 20 independent components in four dimensions.
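The bookkeeping in this argument is easy to repeat in other dimensions. The following sketch simply tallies the same numbers for an $n$-dimensional manifold; the function name `counting` and the dictionary keys are hypothetical labels, and the only ingredients are the counts of independent components of symmetric index sets.

```python
from math import comb

def counting(n):
    """Tally of constraints and coordinate freedom at each order, in n dimensions."""
    sym = n * (n + 1) // 2                    # components of a symmetric two-index object
    return {
        "metric_at_p": sym,                   # g(p)
        "first_order_freedom": n * n,         # dx/dx-hat at p
        "first_derivs_of_metric": n * sym,    # first derivatives of g at p
        "second_order_freedom": n * sym,      # second derivatives of x w.r.t. x-hat at p
        "second_derivs_of_metric": sym * sym, # second derivatives of g at p
        "third_order_freedom": n * comb(n + 2, 3),  # third derivatives of x w.r.t. x-hat
    }

def leftover_second_derivatives(n):
    """Second derivatives of the metric that cannot be set to zero."""
    c = counting(n)
    return c["second_derivs_of_metric"] - c["third_order_freedom"]

four_dim = counting(4)
residual_lorentz = four_dim["first_order_freedom"] - four_dim["metric_at_p"]
```

For $n = 4$ this reproduces the numbers in the text: 10 and 16 at zeroth order (leaving 6), 40 and 40 at first order, and 100 versus 80 at second order (leaving 20). In general the leftover count is $n^2(n^2 - 1)/12$, which gives 1 for $n = 2$ and 6 for $n = 3$.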
Locally inertial coordinates are unbelievably useful. Best of all, their usefulness does not generally require that we actually do the work of constructing such coordinates (although we will give a recipe for doing so in the next chapter), but simply that we know that they do exist. The usual trick is to take a question of physical interest, answer it in the context of locally inertial coordinates, and then express that answer in a coordinate-independent form. Take a very simple example, featuring an observer with four-velocity $U^\mu$ and a rocket flying past with four-velocity $V^\mu$. What does the observer measure as the ordinary three-velocity of the rocket? In special relativity the answer is straightforward. Work in inertial coordinates (globally, not just locally) such that the observer is in the rest frame and the rocket is moving along the $x$-axis. Then the four-velocity of the observer is $U^\mu = (1, 0, 0, 0)$ and the four-velocity of the rocket is $V^\mu = (\gamma, v\gamma, 0, 0)$, where $v$ is the three-velocity and $\gamma = 1/\sqrt{1 - v^2}$, so that $v = \sqrt{1 - \gamma^{-2}}$. Since we are in flat spacetime (for the moment), we have

\[
\gamma = -\eta_{\mu\nu}U^\mu V^\nu,
\tag{2.51}
\]

since $\eta_{00} = -1$. The flat-spacetime answer would therefore be

\[
v = \sqrt{1 - (\eta_{\mu\nu}U^\mu V^\nu)^{-2}}.
\tag{2.52}
\]


Now we can go back to curved spacetime, where the metric is no longer flat. But at the point where the measurement is being done, we are free to use locally inertial coordinates, in which case the components of $g_{\hat\mu\hat\nu}$ are precisely those of $\eta_{\mu\nu}$. So (2.52) is still true in curved spacetime in this particular coordinate system. But (2.52), with $\eta_{\mu\nu}$ replaced by $g_{\mu\nu}$, is a completely tensorial equation, which doesn't care what coordinate system we are in; therefore it is true in complete generality. This kind of procedure will prove its value over and over.
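The frame-independence of the measured speed can be illustrated in flat spacetime by evaluating the same scalar in two different inertial frames. This is a sketch under stated assumptions: units with $c = 1$, signature minus-plus-plus-plus, and hypothetical helper names `inner`, `measured_speed`, and `boost_x`.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the Minkowski metric

def inner(u, v):
    """eta_{mu nu} u^mu v^nu for four-vectors given as (t, x, y, z)."""
    return sum(ETA[i] * u[i] * v[i] for i in range(4))

def measured_speed(U, V):
    """Three-speed of the rocket (four-velocity V) as seen by the observer (four-velocity U)."""
    gamma = -inner(U, V)
    return math.sqrt(1.0 - gamma ** -2)

def boost_x(w, vec):
    """Components of vec in a frame moving with velocity w along the x-axis."""
    g = 1.0 / math.sqrt(1.0 - w * w)
    t, x, y, z = vec
    return (g * (t - w * x), g * (x - w * t), y, z)

v = 0.6
gam = 1.0 / math.sqrt(1.0 - v * v)
U = (1.0, 0.0, 0.0, 0.0)          # observer at rest
V = (gam, gam * v, 0.0, 0.0)      # rocket moving along x with speed v

speed_rest = measured_speed(U, V)
speed_boosted = measured_speed(boost_x(0.3, U), boost_x(0.3, V))
```

Both evaluations should return $v = 0.6$: the components of $U^\mu$ and $V^\mu$ change under the boost, but the contraction $\eta_{\mu\nu}U^\mu V^\nu$ does not. The same logic, applied with $g_{\mu\nu}$ at the point of measurement, is what carries the formula over to curved spacetime.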

2.6 ■ AN EXPANDING UNIVERSE

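A simple example of a nontrivial Lorentzian geometry is provided by a four-dimensional cosmological spacetime with metric

\[
ds^2 = -\mathrm{d}t^2 + a^2(t)\left[\mathrm{d}x^2 + \mathrm{d}y^2 + \mathrm{d}z^2\right].
\tag{2.53}
\]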

This describes a universe for which "space at a fixed moment of time" is a flat three-dimensional Euclidean space, which is expanding as a function of time. Worldlines that remain at constant spatial coordinates $x^i$ are said to be comoving; similarly, we denote a region of space that expands along with boundaries defined by fixed spatial coordinates as a "comoving volume." Since the metric describes (distance)$^2$, the relative distance between comoving points is growing as $a(t)$ in this spacetime; the function $a$ is called the scale factor. This is a special case of a Robertson-Walker metric, one in which spatial slices are geometrically flat; there are other cases for which spatial slices are curved (as we will discuss in Chapter 8). But our interest right now is not in where this metric came from, but in using it as a playground to illustrate some of the ideas we have developed.
Typical solutions for the scale factor are power laws,

\[
a(t) = t^q, \qquad 0 < q < 1.
\tag{2.54}
\]

Actually there are all sorts of solutions, but these are some particularly simple and relevant ones. A matter-dominated flat universe satisfies $q = \frac{2}{3}$, while a radiation-dominated flat universe satisfies $q = \frac{1}{2}$. An obvious feature is that the scale factor goes to zero as $t \to 0$, and along with it the spatial components of the metric. This is a coordinate-dependent statement, and in principle there might be another coordinate system in which everything looks finite; in this case, however, $t = 0$ represents a true singularity of the geometry (the "Big Bang"), and should be excluded from the manifold. The range of the $t$ coordinate is therefore

\[
0 < t < \infty.
\tag{2.55}
\]

Our spacetime comes to an end at $t = 0$.
Light cones in this curved geometry are defined by null paths, those for which $ds^2 = 0$. We can draw a spacetime diagram by considering null paths for which $y$ and $z$ are held constant; then

\[
0 = -\mathrm{d}t^2 + t^{2q}\,\mathrm{d}x^2,
\tag{2.56}
\]

which implies

\[
\frac{dx}{dt} = \pm t^{-q}.
\tag{2.57}
\]

You might worry that, after all that fuss about $\mathrm{d}x^\mu$ being a basis one-form and not a differential, we have sloppily "divided by $\mathrm{d}t^2$" to go from (2.56) to (2.57). The truth is much more respectable. What we actually did was to take the (0, 2) tensor defined by (2.56), which takes two vectors and returns a real number, and act it on two copies of the vector $V = (dx^\mu/d\lambda)\partial_\mu$, the tangent vector to a curve $x^\mu(\lambda)$. Consider just the $\mathrm{d}t^2$ piece acting on $V$:

\[
\mathrm{d}t^2(V, V) = (\mathrm{d}t\otimes\mathrm{d}t)(V, V) = \mathrm{d}t(V)\cdot\mathrm{d}t(V),
\tag{2.58}
\]

where the notation $\mathrm{d}t(V)$ refers to a real number that we compute as

\[
\begin{aligned}
\mathrm{d}t(V) &= \mathrm{d}t\left(\frac{dx^\mu}{d\lambda}\,\partial_\mu\right) \\
&= \frac{dx^\mu}{d\lambda}\,\mathrm{d}t(\partial_\mu) \\
&= \frac{dx^\mu}{d\lambda}\,\frac{\partial t}{\partial x^\mu} \\
&= \frac{dt}{d\lambda},
\end{aligned}
\tag{2.59}
\]

where in the third line we have invoked (2.25). Following the same procedure with $\mathrm{d}x^2$, we find that (2.56) implies

\[
0 = -\left(\frac{dt}{d\lambda}\right)^2 + t^{2q}\left(\frac{dx}{d\lambda}\right)^2,
\tag{2.60}
\]

from which (2.57) follows via the one-dimensional chain rule,

\[
\frac{dx}{dt} = \frac{dx}{d\lambda}\,\frac{d\lambda}{dt}.
\tag{2.61}
\]

The lesson should be clear: expressions such as (2.56) describe well-defined tensors, but manipulation of the basis one-forms as if they were simply "differentials" does get you the right answer. (At least, most of the time; it's a good idea to keep the more formal definitions in mind.)
We can solve (2.57) to obtain

\[
t = (1 - q)^{1/(1-q)}(\pm x - x_0)^{1/(1-q)},
\tag{2.62}
\]


FIGURE 2.22 Spacetime diagram for a flat Robertson-Walker universe with $a(t) \propto t^q$, for $0 < q < 1$. The dashed line at the bottom of the figure represents the singularity at $t = 0$. Since light cones are tangent to the singularity, the pasts of two points may be nonoverlapping.
where $x_0$ is a constant of integration. These curves define the light cones of our expanding universe, as plotted in Figure 2.22. Since we have assumed $0 < q < 1$, the light cones are tangent to the singularity at $t = 0$. A crucial feature of this geometry is that the light cones of two points need not intersect in the past; this is in contrast to Minkowski space, for which the light cones of any two points always intersect in both the past and future. We say that every event defines a "horizon," outside of which there exist worldlines that can have had no influence on what happens at that event. This is because, since nothing can travel faster than light, each point can only be influenced by events that are either on, or in the interior of, its past light cone (indeed, we refer to the past light cone plus its interior as simply "the past" of an event). Two events outside each other's horizons are said to be "out of causal contact." These notions will be explored more carefully in the next section, as well as in Chapters 4 and 8.
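The horizon can be made quantitative in the $(t, x)$ plane. Integrating $dx/dt = t^{-q}$ from the singularity to time $t$ gives a finite comoving distance $t^{1-q}/(1-q)$ whenever $0 < q < 1$. The sketch below assumes $a(t) = t^q$ and restricts attention to events on the $x$-axis; the function names are hypothetical, and the numerical integral starts at a small positive time to stay away from the singular endpoint.

```python
def horizon_radius(t, q):
    """Comoving distance covered by a light ray between t -> 0 and time t."""
    return t ** (1.0 - q) / (1.0 - q)

def null_ray_numeric(t_start, t_end, q, steps=10000):
    """Midpoint-rule integral of dx/dt = t^(-q) from t_start to t_end."""
    h = (t_end - t_start) / steps
    return sum((t_start + (k + 0.5) * h) ** (-q) * h for k in range(steps))

def pasts_overlap(t, x1, x2, q):
    """Do the pasts of the events (t, x1) and (t, x2) share any point with t > 0?"""
    return abs(x1 - x2) < 2.0 * horizon_radius(t, q)

q = 2.0 / 3.0                      # matter-dominated case
numeric = null_ray_numeric(0.1, 1.0, q)
exact = horizon_radius(1.0, q) - horizon_radius(0.1, q)

near_pair = pasts_overlap(1.0, 0.0, 5.0, q)
far_pair = pasts_overlap(1.0, 0.0, 7.0, q)
```

With $q = 2/3$ and $t = 1$ the horizon radius is 3, so two comoving observers separated by a coordinate distance of 5 have overlapping pasts, while two separated by 7 do not: they are out of causal contact at that time.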
2.7 ■ CAUSALITY
Many physical questions can be cast as an initial-value problem: given the state of a system at some moment in time, what will be the state at some later time? The fact that such questions have definite answers is due to causality, the idea that future events can be understood as consequences of initial conditions plus the laws of physics. Initial-value problems are as common in GR as in Newtonian physics or special relativity; however, the dynamical nature of the spacetime background introduces new ways in which an initial-value formulation could break down. Here we very briefly introduce some of the concepts used in understanding how causality works in GR.

We will look at the problem of evolving matter fields on a fixed background spacetime, rather than the evolution of the metric itself. Our guiding principle will be that no signals can travel faster than the speed of light; therefore information will only flow along timelike or null trajectories (not necessarily geodesics). Since it is sometimes useful to distinguish between purely timelike paths and ones that are merely non-spacelike, we define a causal curve to be one which is timelike or null everywhere. Then, given any subset S of a manifold M, we define the causal future of S, denoted $J^+(S)$, to be the set of points that can be reached from S by following a future-directed causal curve; the chronological future $I^+(S)$ is the set of points that can be reached by following a future-directed timelike curve. Note that a curve of zero length is causal but not timelike; therefore, a point p will always be in its own causal future $J^+(p)$, but not necessarily its own chronological future $I^+(p)$ (although it could be, as we mention below). The causal past $J^-$ and chronological past $I^-$ are defined analogously.
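As a hedged illustration of these definitions, consider the special case of two-dimensional Minkowski space with coordinates (t, x) and units in which c = 1. There the causal future of a point is the closed forward light cone and the chronological future is its open interior, so membership reduces to a sign test on the interval. The function names below are illustrative only.

```python
def interval(p, q):
    """Minkowski interval -dt^2 + dx^2 between events p = (t, x) and q = (t, x)."""
    dt, dx = q[0] - p[0], q[1] - p[1]
    return -dt * dt + dx * dx


def in_causal_future(p, q):
    """q lies in the causal future of p: on or inside the forward light cone.
    The zero-length curve counts, so p is always in its own causal future."""
    return q[0] >= p[0] and interval(p, q) <= 0


def in_chronological_future(p, q):
    """q lies in the chronological future of p: strictly inside the forward cone."""
    return q[0] > p[0] and interval(p, q) < 0
```

With p = (0, 0), the point p itself and the null-separated point (1, 1) are in the causal future but not the chronological future, while (2, 1) is in both.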
A subset $S \subset M$ is called achronal if no two points in S are connected by a timelike curve; for example, any edgeless spacelike hypersurface in Minkowski spacetime is achronal. Given a closed achronal set S, we define the future domain of dependence of S, denoted $D^+(S)$, as the set of all points p such that every past-moving inextendible causal curve through p must intersect S. (Inextendible just means that the curve goes on forever, not ending at some finite point; closed means that the complement of the set is an open set.) Elements of S itself are elements of $D^+(S)$. The past domain of dependence $D^-(S)$ is defined by replacing future with past. Generally speaking, some points in M will be in one of the domains of dependence, and some will be outside; we define the boundary of $D^+(S)$ to be the future Cauchy horizon $H^+(S)$, and likewise the boundary of $D^-(S)$ to be the past Cauchy horizon $H^-(S)$. You can convince yourself that they are both null surfaces. The domains of dependence and Cauchy horizons are illustrated in Figure 2.23, in which S is taken to be a connected subset of an achronal surface $\Sigma$.

FIGURE 2.23 A connected subset S of a spacelike surface $\Sigma$, along with its causal structure. $D^\pm(S)$ denotes the future/past domain of dependence of S, and $H^\pm(S)$ the future/past Cauchy horizon.

FIGURE 2.24 The surface $\Sigma$ is everywhere spacelike but lies in the past of the past light cone of the point p; its domain of dependence is not all of the spacetime.

The usefulness of these definitions should be apparent; if nothing moves faster than light, signals cannot propagate outside the light cone of any point p. Therefore, if every curve that remains inside this light cone must intersect S, then information specified on S should be sufficient to predict what the situation is at p; that is, initial data for matter fields given on S can be used to solve for the value of the fields at p. The set of all points for which we can predict what happens by knowing what happens on S is the union $D(S) = D^+(S) \cup D^-(S)$, called simply the domain of dependence. A closed achronal surface $\Sigma$ is said to be a Cauchy surface if the domain of dependence $D(\Sigma)$ is the entire manifold; from information given on a Cauchy surface, we can predict what happens throughout all of spacetime. If a spacetime has a Cauchy surface (which it may not), it is said to be globally hyperbolic.
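A minimal sketch of a domain of dependence, under simplifying assumptions that are not in the text: two-dimensional Minkowski space with c = 1, and S taken to be the closed segment a ≤ x ≤ b of the surface t = 0. Every inextendible causal curve through (t, x) is trapped between the two null rays through that point, so it must cross S exactly when both rays land inside the segment, which gives a diamond-shaped region.

```python
def in_domain_of_dependence(t: float, x: float, a: float, b: float) -> bool:
    """True if (t, x) lies in the domain of dependence of the segment
    S = {t = 0, a <= x <= b} in 1+1 Minkowski space (c = 1).

    Both null rays through (t, x) reach t = 0 at x - |t| and x + |t|;
    the point belongs to the domain when both endpoints lie in [a, b].
    """
    return abs(t) <= min(x - a, b - x)
```

For S = [-1, 1] the diamond has its tips at (t, x) = (±1, 0); the null edges of the diamond are the Cauchy horizons of S in this example.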
Any set $\Sigma$ that is closed, achronal, and has no edge, is called a partial Cauchy surface. A partial Cauchy surface can fail to be an actual Cauchy surface either through its own fault, or through a fault of the spacetime. One possibility is that we have just chosen a "bad" hypersurface (although it is hard to give a general prescription for when a hypersurface is bad in this sense). Consider Minkowski space, and an edgeless spacelike hypersurface $\Sigma$, which remains to the past of the light cone of some point, as in Figure 2.24. In this case $\Sigma$ is an achronal surface, but it is clear that $D^+(\Sigma)$ ends at the light cone, and we cannot use information on $\Sigma$ to predict what happens throughout Minkowski space. Of course, there are other surfaces we could have picked for which the domain of dependence would have been the entire manifold, so this doesn't worry us too much.
A somewhat more nontrivial way for a Cauchy horizon to arise is through the appearance of closed timelike curves. In Newtonian physics, causality is enforced by the relentless forward march of an absolute notion of time. In special relativity things are even more restrictive; not only must you move forward in time, but the speed of light provides a limit on how swiftly you may move through space (you must stay within your forward light cone). In general relativity it remains true that you must stay within your forward light cone; however, this becomes strictly a local notion, as globally the curvature of spacetime might "tilt" light cones from one place to another. It becomes possible in principle for light cones to be sufficiently distorted that an observer can move on a forward-directed path that is everywhere timelike and yet intersects itself at a point in its "past"; this is a closed timelike curve.
As a simple example, consider a two-dimensional geometry with coordinates {t, x}, such that points with coordinates (t, x) and (t, x + 1) are identified. The topology is thus $\mathbf{R} \times S^1$. We take the metric to be

$$ds^2 = -\cos(\lambda)\,dt^2 - \sin(\lambda)\,[dt\,dx + dx\,dt] + \cos(\lambda)\,dx^2, \tag{2.63}$$

where

$$\lambda = \cot^{-1} t, \tag{2.64}$$


FIGURE 2.25 A cylindrical spacetime with closed timelike curves. The light cones progressively tilt, such that the domain of dependence of the surface $\Sigma$ fills the lower part of the spacetime, but comes to an end when the closed timelike curves come into existence.

FIGURE 2.26 A singularity at p removes any points in its future from the domain of dependence of a surface $\Sigma$ in its past.

which goes from $\lambda(t = -\infty) = 0$ to $\lambda(t = \infty) = \pi$. This metric doesn't represent any special famous solution to general relativity; it was just cooked up to provide an interesting example of closed timelike curves. There is, however, a well-known example known as Misner space, with similar properties. In the spacetime defined by (2.63), the light cones progressively tilt as you go forward in time, as shown in Figure 2.25. For t < 0, the light cones point forward, and causality is maintained. Once t > 0, however, x becomes the timelike coordinate, and it is possible to travel on a timelike trajectory that wraps around the $S^1$ and comes back to itself; this is a closed timelike curve. If we had specified a surface $\Sigma$ to the past of this point, then none of the points in the region containing closed timelike curves are in the domain of dependence of $\Sigma$, since the closed timelike curves themselves do not intersect $\Sigma$. There is thus necessarily a Cauchy horizon at the surface t = 0. This is obviously a worse problem than the previous one, since a well-defined initial value problem does not seem to exist in this spacetime.
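The tilting of the light cones can be checked with a short sketch. The closed circles t = const have tangent vector along x, whose norm is the metric component cos(λ). The code assumes the branch λ(t) = π/2 + arctan(t), an assumption chosen only because it reproduces the limits quoted above (λ going from 0 at t = -∞, through π/2 at t = 0, to π at t = +∞); the function names are illustrative.

```python
import math


def lam(t: float) -> float:
    """Assumed branch of lambda(t): 0 as t -> -inf, pi/2 at t = 0, pi as t -> +inf."""
    return math.pi / 2 + math.atan(t)


def circle_norm(t: float) -> float:
    """Norm g_xx = cos(lambda) of the tangent to the closed circle t = const."""
    return math.cos(lam(t))


def circle_character(t: float, tol: float = 1e-12) -> str:
    """Classify the closed curve t = const as spacelike, null, or timelike."""
    g_xx = circle_norm(t)
    if g_xx > tol:
        return "spacelike"
    if g_xx < -tol:
        return "timelike"
    return "null"
```

Under this assumption the circles are spacelike for t < 0, null at t = 0 (the Cauchy horizon), and timelike for t > 0, where they are closed timelike curves.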
A final example is provided by the existence of singularities, points that are not in the manifold even though they can be reached by traveling along a geodesic for a finite distance. Typically these occur when the curvature becomes infinite at some point; if this happens, the point can no longer be said to be part of the spacetime. Such an occurrence can lead to the emergence of a Cauchy horizon, as depicted in Figure 2.26-a point p, which is in the future of a singularity, cannot be in the domain of dependence of a hypersurface to the past of the singularity, because there will be curves from p that simply end at the singularity.

These obstacles can also arise in the initial value problem for GR, when we try to evolve the metric itself from initial data. However, they are of different degrees of troublesomeness. The possibility of picking a "bad" initial hypersurface does not arise very often, especially since most solutions are found globally (by solving Einstein's equation throughout spacetime). The one situation in which you have to be careful is in numerical solution of Einstein's equation, where a bad choice of hypersurface can lead to numerical difficulties, even if in principle a complete solution exists. Closed timelike curves seem to be something that GR works hard to avoid-there are certainly solutions that contain them, but evolution from generic initial data does not usually produce them. Singularities, on the other hand, are practically unavoidable. The simple fact that the gravitational force is always attractive tends to pull matter together, increasing the curvature, and generally leading to some sort of singularity. Apparently we must learn to live with this, although there is some hope that a well-defined theory of quantum gravity will eliminate (or at least teach us how to deal with) the singularities of classical GR.

2.8 ■ TENSOR DENSITIES

Tensors possess a compelling beauty and simplicity, but there are times when it is useful to consider nontensorial objects. Recall that in Chapter 1 we introduced the completely antisymmetric Levi-Civita symbol, defined as

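$$\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n} = \begin{cases} +1 & \text{if } \mu_1\mu_2\cdots\mu_n \text{ is an even permutation of } 01\cdots(n-1), \\ -1 & \text{if } \mu_1\mu_2\cdots\mu_n \text{ is an odd permutation of } 01\cdots(n-1), \\ \phantom{+}0 & \text{otherwise.} \end{cases} \tag{2.65}$$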

By definition, the Levi-Civita symbol has the components specified above in any coordinate system (at least, in any right-handed coordinate system; switching the handedness multiplies the components of $\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}$ by an overall minus sign). This is called a "symbol," of course, because it is not a tensor; it is defined not to change under coordinate transformations. We were only able to treat it as a tensor in inertial coordinates in flat spacetime, since Lorentz transformations would have left the components invariant anyway. Its behavior can be related to that of an ordinary tensor by first noting that, given any $n \times n$ matrix $M^\mu{}_{\mu'}$, the determinant $|M|$ obeys

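$$\tilde\epsilon_{\mu'_1\mu'_2\cdots\mu'_n}\,|M| = \tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}\,M^{\mu_1}{}_{\mu'_1} M^{\mu_2}{}_{\mu'_2} \cdots M^{\mu_n}{}_{\mu'_n}. \tag{2.66}$$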

This is just a streamlined expression for the determinant of any matrix, completely equivalent to the usual formula in terms of matrices of cofactors. (You can check it for yourself for 2 × 2 or 3 × 3 matrices.) It follows that, setting $M^\mu{}_{\mu'} = \partial x^\mu/\partial x^{\mu'}$, we have

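$$\tilde\epsilon_{\mu'_1\mu'_2\cdots\mu'_n} = \left|\frac{\partial x^{\mu'}}{\partial x^{\mu}}\right| \tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}\, \frac{\partial x^{\mu_1}}{\partial x^{\mu'_1}} \frac{\partial x^{\mu_2}}{\partial x^{\mu'_2}} \cdots \frac{\partial x^{\mu_n}}{\partial x^{\mu'_n}}, \tag{2.67}$$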

where we have also used the facts that the matrix $\partial x^{\mu'}/\partial x^{\mu}$ is the inverse of $\partial x^{\mu}/\partial x^{\mu'}$, and that the determinant of an inverse matrix is the inverse of the determinant, $|M^{-1}| = |M|^{-1}$. So the Levi-Civita symbol transforms in a way close to the tensor transformation law, except for the determinant out front. Objects transforming in this way are known as tensor densities. Another example is given by the determinant of the metric, $g = |g_{\mu\nu}|$. It's easy to check, by taking the determinant of both sides of (2.48), that under a coordinate transformation we get

$$g(x^{\mu'}) = \left|\frac{\partial x^{\mu'}}{\partial x^{\mu}}\right|^{-2} g(x^{\mu}). \tag{2.68}$$

Therefore g is also not a tensor; it transforms in a way similar to the Levi-Civita symbol, except that the Jacobian is raised to the −2 power. The power to which the Jacobian is raised is known as the weight of the tensor density; the Levi-Civita symbol is a density of weight 1, while g is a (scalar) density of weight −2.
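The determinant identity behind these transformation laws, namely that contracting the Levi-Civita symbol with one matrix factor per index reproduces the determinant, can be checked directly. The sketch below is a brute-force sum over permutations in plain Python; it is meant only as a numerical sanity check for small matrices, and the helper names are illustrative.

```python
from itertools import permutations


def perm_sign(perm) -> int:
    """+1 for an even permutation of 0..n-1, -1 for an odd one."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            # Each swap puts the value j into slot j and flips the parity.
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign


def det_levi_civita(M) -> float:
    """Determinant as the sum over permutations of
    sign(mu_1 ... mu_n) * M[mu_1][0] * M[mu_2][1] * ... * M[mu_n][n-1]."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = perm_sign(perm)
        for col, row in enumerate(perm):
            term *= M[row][col]
        total += term
    return total
```

Only permutations contribute because the symbol vanishes whenever two indices coincide; swapping two columns of M flips the sign of the result, as expected of a determinant.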
However, we don't like tensor densities as much as we like tensors. There is a simple way to convert a density into an honest tensor: multiply by $|g|^{w/2}$, where w is the weight of the density (the absolute value signs are there because g < 0 for Lorentzian metrics). The result will transform according to the tensor transformation law. Therefore, for example, we can define the Levi-Civita tensor as

$$\epsilon_{\mu_1\mu_2\cdots\mu_n} = \sqrt{|g|}\;\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}. \tag{2.69}$$

Since this is a real tensor, we can raise indices and so on. Sometimes people define a version of the Levi-Civita symbol with upper indices, $\tilde\epsilon^{\mu_1\mu_2\cdots\mu_n}$, whose components are numerically equal to $\mathrm{sgn}(g)\,\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}$, where sgn(g) is the sign of the metric determinant. This turns out to be a density of weight −1, and is related to the tensor with upper indices (obtained by using $g^{\mu\nu}$ to raise indices on $\epsilon_{\mu_1\mu_2\cdots\mu_n}$) by

$$\epsilon^{\mu_1\mu_2\cdots\mu_n} = \frac{1}{\sqrt{|g|}}\;\tilde\epsilon^{\mu_1\mu_2\cdots\mu_n}. \tag{2.70}$$

Something you often end up doing is contracting p indices on $\epsilon^{\mu_1\mu_2\cdots\mu_n}$ with $\epsilon_{\mu_1\mu_2\cdots\mu_n}$; the result can be expressed in terms of an antisymmetrized product of Kronecker deltas as

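$$\epsilon^{\mu_1\mu_2\cdots\mu_p\alpha_1\cdots\alpha_{n-p}}\,\epsilon_{\mu_1\mu_2\cdots\mu_p\beta_1\cdots\beta_{n-p}} = (-1)^s\, p!\,(n-p)!\;\delta^{[\alpha_1}_{\beta_1} \cdots \delta^{\alpha_{n-p}]}_{\beta_{n-p}}, \tag{2.71}$$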

where s is the number of negative eigenvalues of the metric (for Lorentzian signature with our conventions, s = 1). The most common example is p = n − 1,

for which we have

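$$\epsilon^{\mu_1\mu_2\cdots\mu_{n-1}\alpha}\,\epsilon_{\mu_1\mu_2\cdots\mu_{n-1}\beta} = (-1)^s\,(n-1)!\;\delta^{\alpha}_{\beta}. \tag{2.72}$$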
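These contraction identities are easy to verify by brute force in the simplest setting. The sketch below assumes three-dimensional Euclidean space in Cartesian coordinates, where the metric is the identity, s = 0, and the Levi-Civita symbol and tensor coincide so that index placement is immaterial. It checks the p = 2 case (two contracted indices, giving 2! times a Kronecker delta) and the p = 1 case (one contracted index, giving an antisymmetrized pair of deltas). Function names are illustrative.

```python
from itertools import product


def eps(i: int, j: int, k: int) -> int:
    """Levi-Civita symbol in three dimensions, indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def delta(a: int, b: int) -> int:
    """Kronecker delta."""
    return 1 if a == b else 0


def contract_two(k: int, l: int) -> int:
    """Sum over i, j of eps(i, j, k) * eps(i, j, l)."""
    return sum(eps(i, j, k) * eps(i, j, l) for i, j in product(range(3), repeat=2))


def contract_one(j: int, k: int, l: int, m: int) -> int:
    """Sum over i of eps(i, j, k) * eps(i, l, m)."""
    return sum(eps(i, j, k) * eps(i, l, m) for i in range(3))
```

In this setting contract_two(k, l) equals 2 * delta(k, l), and contract_one(j, k, l, m) equals delta(j, l) * delta(k, m) - delta(j, m) * delta(k, l).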

2.9 ■ DIFFERENTIAL FORMS
Let us now introduce a special class of tensors, known as differential forms (or just forms). A differential p-form is simply a (0, p) tensor that is completely antisymmetric. Thus, scalars are automatically 0-forms, and dual vectors are automatically one-forms (thus explaining this terminology from before). We also have the 4-form $\epsilon_{\mu\nu\rho\sigma}$. The space of all p-forms is denoted $\Lambda^p$, and the space of all p-form fields over a manifold M is denoted $\Lambda^p(M)$. A semi-straightforward exercise in combinatorics reveals that the number of linearly independent p-forms on an n-dimensional vector space is n!/(p!(n − p)!). So at a point on a four-dimensional spacetime there is one linearly independent 0-form, four 1-forms, six 2-forms, four 3-forms, and one 4-form. There are no p-forms for p > n, since all of the components will automatically be zero by antisymmetry.
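The counting can be reproduced by listing the independent components explicitly. Because a p-form is completely antisymmetric, one independent component may be associated with each strictly increasing set of p indices; the sketch below enumerates those index sets and compares their number with the binomial coefficient. The function name is illustrative.

```python
from itertools import combinations
from math import comb


def independent_components(n: int, p: int):
    """Index sets mu_1 < mu_2 < ... < mu_p labelling the independent
    components of a p-form in n dimensions."""
    return list(combinations(range(n), p))


# Number of independent p-forms at a point of a four-dimensional spacetime.
counts = [len(independent_components(4, p)) for p in range(6)]
```

Here counts comes out as [1, 4, 6, 4, 1, 0]; the final zero reflects the absence of p-forms for p > n.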
Why should we care about differential forms? This question is hard to answer without some more work, but the basic idea is that forms can be both differentiated and integrated, without the help of any additional geometric structure. We will glance briefly at both of these operations.
Given a p-form A and a q-form B, we can form a (p + q)-form known as the wedge product $A \wedge B$ by taking the antisymmetrized tensor product:

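$$(A \wedge B)_{\mu_1\cdots\mu_{p+q}} = \frac{(p+q)!}{p!\,q!}\,A_{[\mu_1\cdots\mu_p} B_{\mu_{p+1}\cdots\mu_{p+q}]}. \tag{2.73}$$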

Thus, for example, the wedge product of two 1-forms is

$$(A \wedge B)_{\mu\nu} = 2A_{[\mu}B_{\nu]} = A_\mu B_\nu - A_\nu B_\mu. \tag{2.74}$$

Note that

$$A \wedge B = (-1)^{pq}\, B \wedge A, \tag{2.75}$$

so you can alter the order of a wedge product if you are careful with signs. We are free to suppress indices when using forms, since we know that all of the indices are downstairs and the tensors are completely antisymmetric.
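For two 1-forms the wedge product is just the antisymmetrized outer product, with components A_μ B_ν − A_ν B_μ. A minimal sketch, storing a 1-form as a list of components and the resulting 2-form as a nested list (a representation chosen here purely for illustration):

```python
def wedge(A, B):
    """Components (A ^ B)[mu][nu] = A[mu] * B[nu] - A[nu] * B[mu] for 1-forms A, B."""
    n = len(A)
    return [[A[mu] * B[nu] - A[nu] * B[mu] for nu in range(n)] for mu in range(n)]


A = [1, 2, 0, 0]
B = [0, 1, 3, 0]
AB = wedge(A, B)
BA = wedge(B, A)
```

With p = q = 1 the sign rule for reordering gives a factor of −1, so every entry of AB is minus the corresponding entry of BA, and wedge(A, A) vanishes identically.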


The exterior derivative d allows us to differentiate p-form fields to obtain (p + 1)-form fields. It is defined as an appropriately normalized, antisymmetrized partial derivative:

$$(dA)_{\mu_1\cdots\mu_{p+1}} = (p+1)\,\partial_{[\mu_1} A_{\mu_2\cdots\mu_{p+1}]}. \tag{2.76}$$

The simplest example is the gradient, which is the exterior derivative of a 0-form:

$$(d\phi)_\mu = \partial_\mu \phi. \tag{2.77}$$

Exterior derivatives obey a modified version of the Leibniz rule when applied to the product of a p-form $\omega$ and a q-form $\eta$:

$$d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^p\, \omega \wedge (d\eta). \tag{2.78}$$

You are encouraged to prove this yourself.
The reason why the exterior derivative deserves special attention is that it is a tensor, even in curved spacetimes, unlike its cousin the partial derivative. For p = 1 we can see this from the transformation law for the partial derivative of a one-form, (2.38); the offending nontensorial term can be written

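$$W_\nu\, \frac{\partial x^{\mu}}{\partial x^{\mu'}} \frac{\partial}{\partial x^{\mu}} \frac{\partial x^{\nu}}{\partial x^{\nu'}} = W_\nu\, \frac{\partial^2 x^{\nu}}{\partial x^{\mu'}\,\partial x^{\nu'}}. \tag{2.79}$$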

This expression is symmetric in $\mu'$ and $\nu'$, since partial derivatives commute. But the exterior derivative is defined to be the antisymmetrized partial derivative, so this term vanishes (the antisymmetric part of a symmetric expression is zero). We are then left with the correct tensor transformation law; extension to arbitrary p is straightforward. So the exterior derivative is a legitimate tensor operator; it is not, however, an adequate substitute for the partial derivative, since it is only defined on forms. In the next chapter we will define a covariant derivative, which is closer to what we might think of as the extension of the partial derivative to arbitrary manifolds.
Another interesting fact about exterior differentiation is that, for any form A,

d(dA) = 0,

(2.80)

which is often written $d^2 = 0$. This identity is a consequence of the definition of d and the fact that partial derivatives commute, $\partial_\alpha\partial_\beta = \partial_\beta\partial_\alpha$ (acting on anything). This leads us to the following mathematical aside, just for fun. We define a p-form A to be closed if dA = 0, and exact if A = dB for some (p − 1)-form B. Obviously, all exact forms are closed, but the converse is not necessarily true. On a manifold M, closed p-forms comprise a vector space $Z^p(M)$, and exact forms comprise a vector space $B^p(M)$. Define a new vector space, consisting of elements called cohomology classes, as the closed forms modulo the exact forms:

$$H^p(M) = \frac{Z^p(M)}{B^p(M)}. \tag{2.81}$$

That is, two closed forms [elements of $Z^p(M)$] define the same cohomology class [elements of $H^p(M)$] if they differ by an exact form [an element of $B^p(M)$]. Miraculously, the dimensionality of the cohomology spaces $H^p(M)$ depends only on the topology of the manifold M. Minkowski space is topologically equivalent to $\mathbf{R}^4$, which is uninteresting, so that all of the $H^p(M)$ vanish for p > 0; for p = 0 we have $H^0(M) = \mathbf{R}$. Therefore in Minkowski space all closed forms are exact except for zero-forms; zero-forms can't be exact since there are no (−1)-forms for them to be the exterior derivative of. It is striking that information about the topology can be extracted in this way, which essentially involves the solutions to differential equations.
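A standard illustration of a closed form that is not exact, offered here as a supplementary example rather than something taken from the text, lives on the plane with the origin removed. It borrows the notion of a line integral, and the fact that the integral of an exact one-form df around a closed loop vanishes.

```latex
% Manifold: M = R^2 \ {0}, with Cartesian coordinates (x, y).
\omega = \frac{-y\,dx + x\,dy}{x^2 + y^2},
\qquad
(d\omega)_{xy}
  = \partial_x\!\left(\frac{x}{x^2+y^2}\right)
  - \partial_y\!\left(\frac{-y}{x^2+y^2}\right)
  = \frac{2}{x^2+y^2} - \frac{2(x^2+y^2)}{(x^2+y^2)^2} = 0 ,
\qquad
\oint_{x^2+y^2=1} \omega = 2\pi \neq 0 .
```

Since the loop integral is nonzero, ω cannot be written as df for any function f defined on all of M, so it represents a nontrivial class in $H^1(M)$. The hole at the origin is what makes this possible; on the full plane every closed one-form is exact.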
The final operation on differential forms we will introduce is Hodge duality. We define the Hodge star operator on an n-dimensional manifold as a map from p-forms to (n - p)-forms,

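$$(*A)_{\mu_1\cdots\mu_{n-p}} = \frac{1}{p!}\,\epsilon^{\nu_1\cdots\nu_p}{}_{\mu_1\cdots\mu_{n-p}}\,A_{\nu_1\cdots\nu_p}, \tag{2.82}$$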

mapping A to "A dual." Unlike our other operations on forms, the Hodge dual does depend on the metric of the manifold [which should be obvious, since we had to raise some indices on the Levi-Civita tensor in order to define (2.82)]. Applying the Hodge star twice returns either plus or minus the original form:

$$**A = (-1)^{s+p(n-p)}\,A, \tag{2.83}$$

where s is the number of minus signs in the eigenvalues of the metric.
Two facts on the Hodge dual: First, "duality" in the sense of Hodge is distinct from the relationship between vectors and dual vectors. The idea of "duality" is that of a transformation from one space to another with the property that doing the transformation twice gets you back to the original space. It should be clear that this holds true for both the duality between vectors and one-forms, and the Hodge duality between p-forms and (n − p)-forms. A requirement of dualities between vector spaces is that the original and transformed spaces have the same dimensionality; this is true of the spaces of p- and (n − p)-forms.
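Both statements are easy to tabulate. The sketch below evaluates the sign (−1)^(s + p(n − p)) picked up when the Hodge star is applied twice, and checks that the spaces of p-forms and (n − p)-forms have equal dimension. The function name is illustrative.

```python
from math import comb


def double_hodge_sign(n: int, p: int, s: int) -> int:
    """Sign acquired by a p-form under two applications of the Hodge star,
    on an n-dimensional manifold whose metric has s negative eigenvalues."""
    return (-1) ** (s + p * (n - p))


# Lorentzian signature in four dimensions (s = 1), for p = 0, ..., 4.
lorentzian_signs = [double_hodge_sign(4, p, 1) for p in range(5)]
```

For a two-form in four-dimensional Lorentzian spacetime the sign is −1, while for a one-form in three-dimensional Euclidean space it is +1.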
The second fact concerns differential forms in three-dimensional Euclidean space. The Hodge dual of the wedge product of two 1-forms gives another 1-form:

$$*(U \wedge V)_i = \epsilon_i{}^{jk}\,U_j V_k. \tag{2.84}$$

(All of the prefactors cancel.) Since 1-forms in Euclidean space are just like vectors, we have a map from two vectors to a single vector. You should convince yourself that this is just the conventional cross product, and that the appearance of the Levi-Civita tensor explains why the cross product changes sign under parity (interchange of two coordinates, or equivalently basis vectors). This is why the cross product only exists in three dimensions: only in three dimensions do we have an interesting map from two dual vectors to a third dual vector.
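The identification with the cross product can be checked componentwise. The sketch assumes Cartesian coordinates on three-dimensional Euclidean space, so the Levi-Civita tensor reduces to the symbol and index placement does not matter; the function names are illustrative.

```python
def eps(i: int, j: int, k: int) -> int:
    """Levi-Civita symbol in three dimensions, indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def hodge_of_wedge(U, V):
    """Components eps(i, j, k) * U[j] * V[k], summed over j and k."""
    return [
        sum(eps(i, j, k) * U[j] * V[k] for j in range(3) for k in range(3))
        for i in range(3)
    ]


z_axis = hodge_of_wedge([1, 0, 0], [0, 1, 0])
general = hodge_of_wedge([1, 2, 3], [4, 5, 6])
```

The first call returns [0, 0, 1], the familiar statement that x cross y equals z, and the second returns [-3, 6, -3], matching the usual cross-product formula.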
Electrodynamics provides an especially compelling example of the use of differential forms. From the definition of the exterior derivative, it is clear that equation (1.89) can be concisely expressed as closure of the two-form $F_{\mu\nu}$:

$$dF = 0. \tag{2.85}$$

Does this mean that F is also exact? Yes; as we've noted, Minkowski space is topologically trivial, so all closed forms are exact. There must therefore be a one-form $A_\mu$ such that

$$F = dA. \tag{2.86}$$

This one-form is the familiar vector potential of electromagnetism, with the 0 component given by the scalar potential, $A_0 = \phi$, as we discussed in Chapter 1. Gauge invariance is expressed by the observation that the theory is invariant under $A \to A + d\lambda$ for some scalar (zero-form) $\lambda$, and this is also immediate from the
relation (2.86). The other one of Maxwell's equations, (1.88), can be expressed as an equation between three-forms:

$$d(*F) = *J, \tag{2.87}$$
where the current one-form J is just the current four-vector with index lowered. Filling in the details is left for you, as good practice converting from differential-form notation to ordinary index notation.
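As a short worked check, using only the definition of the exterior derivative and the identity d(dA) = 0, the statements about the potential and gauge invariance can be written out explicitly; the component form follows from applying the antisymmetrized-derivative definition to a one-form.

```latex
% Components of the field strength built from the potential one-form:
F_{\mu\nu} = (dA)_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu .

% Closure follows automatically from d^2 = 0:
dF = d(dA) = 0 .

% Gauge invariance: shifting A by the gradient of a scalar \lambda leaves F unchanged,
A \to A + d\lambda
\quad\Longrightarrow\quad
F \to d(A + d\lambda) = dA + d(d\lambda) = dA = F .
```

In index notation the last line is the statement that $\partial_\mu\partial_\nu\lambda - \partial_\nu\partial_\mu\lambda = 0$ because partial derivatives commute.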
Hodge duality is intimately related to a fascinating feature of certain field theories: duality between strong and weak coupling. It's hard not to notice that the equations (2.85) and (2.87) look very similar. Indeed, if we set $J_\mu = 0$, the equations are invariant under the "duality transformations"

$$F \to *F, \qquad *F \to -F. \tag{2.88}$$
We therefore say that the vacuum Maxwell's equations are duality invariant, while the invariance is spoiled in the presence of charges. We might imagine that magnetic as well as electric monopoles existed in nature; then we could add a magnetic current term $*J_M$ to the right-hand side of (2.85), and the equations would be invariant under duality transformations plus the additional replacement $J \leftrightarrow J_M$. (Of course a nonzero right-hand side to (2.85) is inconsistent with F = dA, so this idea only works if $A_\mu$ is not a fundamental variable.) Dirac considered the idea of magnetic monopoles and showed that a necessary condition for their existence is that the fundamental monopole charge be inversely proportional to the fundamental electric charge. Now, the fundamental electric charge is a small number; electrodynamics is weakly coupled, which is why perturbation theory is so remarkably successful in quantum electrodynamics (QED). But Dirac's condition on magnetic charges implies that a duality transformation takes a theory of weakly coupled electric charges to a theory of strongly coupled magnetic monopoles (and vice-versa). Unfortunately monopoles don't fit easily into ordinary electromagnetism, so these ideas aren't directly applicable; but some sort of duality symmetry may exist in certain theories (such as supersymmetric nonabelian gauge theories). If it did, we would have the opportunity to analyze a theory that looked strongly coupled (and therefore hard to solve) by looking at the weakly coupled dual version; this is exactly what happens in certain theories. The hope is that these techniques will allow us to explore various phenomena that we know exist in strongly coupled quantum field theories, such as confinement of quarks in hadrons.

2.10 ■ INTEGRATION

An important appearance of both tensor densities and differential forms is in integration on manifolds. You have probably been exposed to the fact that in ordinary calculus on $\mathbf{R}^n$ the volume element $d^nx$ picks up a factor of the Jacobian under change of coordinates:

$$d^nx' = \left|\frac{\partial x^{\mu'}}{\partial x^{\mu}}\right| d^nx. \tag{2.89}$$

There is actually a beautiful explanation of this formula from the point of view of differential forms, which arises from the following fact: on an n-dimensional manifold M, the integrand is properly understood as an n-form. In other words, an integral over an n-dimensional region $\Sigma \subset M$ is a map from an n-form field $\omega$ to the real numbers:

$$\int_\Sigma : \omega \to \mathbf{R}. \tag{2.90}$$

FIGURE 2.27 An infinitesimal n-dimensional region, represented as a parallelepiped, is defined by an ordered set of n vectors, shown here as U, V, and W.

Such a statement may seem strange, but it certainly looks familiar in the context of line integrals. In one dimension any one-form can be written $\omega = \omega(x)\,dx$, where the first $\omega$ is a one-form and $\omega(x)$ denotes the (single) component function. And indeed, we write integrals in one dimension as $\int \omega(x)\,dx$; you may be used to thinking of the symbol dx as an infinitesimal distance, but it is more properly a differential form.
To make this more clear, consider more than one dimension. If we are claiming that the integrand is an n-form, we need to explain in what sense it is antisymmetric, and for that matter why it is a (0, n) tensor (a linear map from a set of n vectors to $\mathbf{R}$) at all. We all agree that integrals can be written as $\int f(x)\,d\mu$, where f(x) is a scalar function on the manifold and $d\mu$ is the volume element, or measure. The role of the volume element is to assign to every (infinitesimal) region an (infinitesimal) real number, namely the volume of that region. A nice feature of infinitesimal regions (as opposed to ones of finite size) is that they can be taken to be rectangular parallelepipeds. In the presence of curvature we have no clear sense of what a "rectangular parallelepiped" is supposed to mean, but the effects of curvature can be neglected when we work in infinitesimal regions. Clearly we are not being rigorous here, but our present purpose is exclusively motivational.
As shown in Figure 2.27 (in which we take our manifold to be three-dimensional for purposes of illustration), a parallelepiped is specified by n vectors that define its edges. Our volume element, then, should be a map from n vectors to the real numbers: $d\mu(U, V, W) \in \mathbf{R}$. (Actually it should be a map from infinitesimal vectors to infinitesimal numbers, but such a map also will take finite vectors to finite numbers.) It's also clear that it should be linearly scalable by real numbers; if we change the length of any of the defining vectors, the volume changes accordingly: $d\mu(aU, bV, cW) = abc\,d\mu(U, V, W)$. Linearity with respect to adding vectors is not so obvious, but you can convince yourself by drawing pictures.
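Instead of drawing pictures, the two linearity properties can also be checked numerically in a familiar special case. The sketch assumes flat three-dimensional Euclidean space with Cartesian components, where the signed volume of the parallelepiped spanned by U, V, W is the determinant of the matrix with those vectors as rows. The function name and sample vectors are illustrative.

```python
def volume(U, V, W):
    """Signed volume of the parallelepiped with edges U, V, W
    (Cartesian components in Euclidean 3-space), as a 3 x 3 determinant."""
    return (
        U[0] * (V[1] * W[2] - V[2] * W[1])
        - U[1] * (V[0] * W[2] - V[2] * W[0])
        + U[2] * (V[0] * W[1] - V[1] * W[0])
    )


U, V, W, X = [1, 2, 0], [0, 1, 3], [2, 0, 1], [1, 1, 1]
a, b, c = 2, 3, 5

# Scaling each edge rescales the volume by the product a * b * c.
scaled = volume([a * u for u in U], [b * v for v in V], [c * w for w in W])

# Adding vectors in one slot adds the corresponding volumes.
summed = volume([u + x for u, x in zip(U, X)], V, W)
```

For these sample vectors volume(U, V, W) is 13, the scaled volume is 30 times that, and summed equals volume(U, V, W) + volume(X, V, W).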