
Core Principles of Special and General Relativity

James H. Luscombe

CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742
© 2019 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-138-54294-5 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

Contents

Preface

CHAPTER 1 Relativity: A theory of space, time, and gravity
1.1 THE PRINCIPLE OF RELATIVITY
1.2 THE LAW OF INERTIA: FOUNDATION OF SPECIAL RELATIVITY
1.3 SPACE, TIME, AND SPACETIME
1.4 SPACETIME DIAGRAMS
1.5 RELATIVITY OF CAUSALITY: SPACELIKE AND TIMELIKE
1.6 SEGUE TO GENERAL RELATIVITY: NONINERTIAL FRAMES
1.7 GENERAL RELATIVITY: A THEORY OF GRAVITATION
1.8 HASTA LA VISTA, GRAVITY

CHAPTER 2 Basic special relativity
2.1 COMPARISON OF TIME INTERVALS: THE BONDI K-FACTOR
2.2 TIME DILATION
2.3 VELOCITY ADDITION
2.4 LORENTZ TRANSFORMATION
2.5 LENGTH CONTRACTION
2.6 FOUNDATIONAL EXPERIMENTS

CHAPTER 3 Lorentz transformation, I
3.1 FRAMES IN STANDARD CONFIGURATION
3.2 FRAMES NOT IN STANDARD CONFIGURATION
3.3 TRANSFORMATION OF VELOCITY AND ACCELERATION
3.4 RELATIVISTIC ABERRATION AND DOPPLER EFFECT

CHAPTER 4 Geometry of Lorentz invariance
4.1 LORENTZ TRANSFORMATIONS AS SPACETIME ROTATIONS
4.2 KINEMATIC EFFECTS FROM THE INVARIANT HYPERBOLA
4.3 CLASSIFICATION OF LORENTZ TRANSFORMATIONS
4.4 SPACETIME GEOMETRY AND CAUSALITY

CHAPTER 5 Tensors on flat spaces
5.1 TRANSFORMATION PROPERTIES
5.2 TENSOR DENSITIES, INVARIANT VOLUME ELEMENT
5.3 DERIVATIVES OF TENSORS AND THE FOUR-WAVEVECTOR
5.4 INTERLUDE: DRAW A LINE HERE
5.5 TENSORS AS MULTILINEAR MAPPINGS
5.6 METRIC TENSOR REVISITED
5.7 SYMMETRY OPERATIONS ON TENSORS
5.8 LEVI-CIVITA TENSOR AND DETERMINANTS
5.9 PSEUDOTENSORS
5.10 TOTALLY ANTISYMMETRIC TENSORS
5.11 INTEGRATION ON MINKOWSKI SPACE
5.12 THE GHOST OF TENSORS YET TO COME

CHAPTER 6 Lorentz transformation, II
6.1 DECOMPOSITION INTO ROTATIONS AND BOOSTS
6.2 INFINITESIMAL LORENTZ TRANSFORMATION
6.3 SPINOR REPRESENTATION OF LORENTZ TRANSFORMATIONS
6.4 THOMAS-WIGNER ROTATION

CHAPTER 7 Particle dynamics
7.1 PROPER TIME, FOUR-VELOCITY, AND FOUR-ACCELERATION
7.2 THE ENERGY-MOMENTUM FOUR-VECTOR
7.3 ACTION PRINCIPLE FOR PARTICLES
7.4 KEPLER PROBLEM IN SPECIAL RELATIVITY
7.5 COVARIANT EULER-LAGRANGE EQUATION
7.6 PARTICLE CONSERVATION LAWS
7.7 ENERGY-MOMENTUM CONSERVATION

CHAPTER 8 Covariant electrodynamics
8.1 ELECTROMAGNETISM IN SPACE AND TIME
8.2 SOURCES IN SPACETIME: THE FOUR-CURRENT
8.3 CONSERVATION IN SPACETIME: SPACELIKE HYPERSURFACES
8.4 THE FOUR-POTENTIAL
8.5 MAXWELL EQUATIONS IN COVARIANT FORM: FIELD TENSOR
8.6 LORENTZ TRANSFORMATION OF E AND B FIELDS
8.7 LORENTZ FORCE AS A RELATIVISTIC EFFECT
8.8 INVARIANTS OF THE ELECTROMAGNETIC FIELD
8.9 ACTION PRINCIPLE FOR CHARGED PARTICLES
8.10 GAUGE INVARIANCE AND CHARGE CONSERVATION

CHAPTER 9 Energy-momentum of fields
9.1 SYMMETRIES AND CONSERVATION LAWS
9.2 SPACETIME HOMOGENEITY: ENERGY-MOMENTUM TENSOR
9.3 SPACETIME ISOTROPY: ANGULAR MOMENTUM TENSOR
9.4 SYMMETRIC ENERGY-MOMENTUM TENSOR
9.5 THE ELECTROMAGNETIC FIELD

CHAPTER 10 Relativistic hydrodynamics
10.1 NONRELATIVISTIC HYDRODYNAMICS
10.2 ENERGY-MOMENTUM TENSOR FOR PERFECT FLUIDS
10.3 ENERGY-MOMENTUM CONSERVATION
10.4 PARTICLE NUMBER CONSERVATION
10.5 COVARIANT EQUATION OF MOTION
10.6 LAGRANGIAN DENSITY

CHAPTER 11 Equivalence of local gravity and acceleration
11.1 THE EÖTVÖS EXPERIMENT
11.2 THE EQUIVALENCE PRINCIPLE
11.3 TIDAL FORCES AND REFERENCE FRAMES
11.4 WEAK AND STRONG EQUIVALENCE PRINCIPLES
11.5 SPACETIME IS GLOBALLY CURVED, LOCALLY FLAT
11.6 ENERGY COUPLES TO GRAVITY
11.7 GRAVITY AFFECTS TIME

CHAPTER 12 Acceleration in special relativity
12.1 LINEAR ACCELERATION
12.2 TWIN PARADOX
12.3 ROTATING REFERENCE FRAME
12.4 THE SAGNAC EFFECT
12.5 RELATIVISTIC DESCRIPTION OF SPIN
12.6 COVARIANT SPIN DYNAMICS

CHAPTER 13 Tensors on manifolds
13.1 MANIFOLDS
13.2 VECTOR AND TENSOR FIELDS
13.3 INTEGRAL CURVES, CONGRUENCES, AND FLOWS
13.4 MAPPINGS OF TENSORS
13.5 THE LIE DERIVATIVE
13.6 SUBMANIFOLDS, EMBEDDINGS, AND HYPERSURFACES
13.7 DIFFERENTIAL FORMS AND EXTERIOR DIFFERENTIATION
13.8 INTEGRATION ON MANIFOLDS

CHAPTER 14 Differential geometry
14.1 COVARIANT DIFFERENTIATION
14.2 WHAT DO THE CONNECTION COEFFICIENTS TELL US?
14.3 PARALLEL TRANSPORT AND GEODESIC CURVES
14.4 THE RIEMANN TENSOR
14.5 THE RICCI TENSOR AND SCALAR FIELD
14.6 THE EINSTEIN TENSOR
14.7 ISOMETRIES, KILLING VECTORS, AND CONSERVATION LAWS
14.8 MAXIMALLY SYMMETRIC SPACES

CHAPTER 15 General relativity
15.1 INTRODUCTION
15.2 WEAK, STATIC GRAVITY
15.3 THE EINSTEIN FIELD EQUATION
15.4 LAGRANGIAN FORMULATION
15.5 DUST

CHAPTER 16 The Schwarzschild metric
16.1 STATIC, SPHERICALLY SYMMETRIC SPACETIME METRICS
16.2 RICCI TENSOR FOR THE SCHWARZSCHILD METRIC
16.3 THE VACUUM SOLUTION
16.4 BIRKHOFF’S THEOREM
16.5 SPATIAL GEOMETRY OF THE SCHWARZSCHILD METRIC

CHAPTER 17 Physical effects of Schwarzschild spacetime
17.1 GEODESICS IN SCHWARZSCHILD SPACETIME
17.2 PARTICLE TRAJECTORIES
17.3 RADIAL NULL GEODESICS: KRUSKAL COORDINATES
17.4 GRAVITATIONAL DEFLECTION OF LIGHT
17.5 APSIDAL PRECESSION
17.6 GRAVITATIONAL TIME DELAY
17.7 PARAMETERIZED POST-NEWTONIAN FRAMEWORK
17.8 THE GLOBAL POSITIONING SYSTEM
17.9 SPIN PRECESSION I: GEODETIC EFFECT
17.10 WEIGHT OF AN AT-REST OBSERVER

CHAPTER 18 Linearized gravity
18.1 LINEARIZED FIELD EQUATION
18.2 STATIC SOURCE
18.3 FAR FROM A SLOWLY VARYING SOURCE
18.4 GRAVITOMAGNETISM: STATIONARY SOURCES
18.5 FRAME DRAGGING
18.6 SLOWLY ROTATING SOURCE
18.7 SPIN PRECESSION II: THE LENSE-THIRRING EFFECT
18.8 GRAVITATIONAL WAVES
18.9 ENERGY-MOMENTUM OF GRAVITATION

CHAPTER 19 Relativistic cosmology
19.1 THE COSMOLOGICAL PRINCIPLE
19.2 A COORDINATE SYSTEM FOR COSMOLOGY
19.3 SPACES OF CONSTANT CURVATURE
19.4 FRIEDMANN-ROBERTSON-WALKER SPACETIME
19.5 SPATIAL GEOMETRIES
19.6 THE FRIEDMANN EQUATIONS
19.7 NEWTONIAN COSMOLOGY
19.8 COSMOLOGICAL REDSHIFT
19.9 THE EINSTEIN UNIVERSE
19.10 THE DE SITTER UNIVERSE
19.11 DARK ENERGY
19.12 THE FRIEDMANN EVOLUTION EQUATION

APPENDIX A Invariance of the wave equation
APPENDIX B The Doppler effect
APPENDIX C Topics in linear algebra
APPENDIX D Topics in classical mechanics
APPENDIX E Photon and particle orbits

Bibliography

Index

Preface
The theory of relativity is a core component of physics curricula, yet the level at which it’s taught can differ widely, from minimal coverage of special relativity (SR) in modern physics courses, to treatments using four-vectors in mechanics courses, to covariant treatments of electrodynamics, to graduate courses on general relativity (GR). I have sought to create a text aimed at advanced undergraduate/first-year graduate students, which starts with the foundations of SR and continues through to GR, at roughly the same level of sophistication. What makes that a challenge is the mathematics involved toward the end of the journey. General relativity requires the mathematics of curved spaces, the province of differential geometry. If linear algebra comprises the mathematics of quantum mechanics, differential geometry is the lingua franca of GR, and most physics students learn this branch of mathematics in courses on GR. We start at the beginning, developing the mathematics as required, with the goal of providing in one voice, hopefully in an accessible style, the full picture of the subject. I assume students have had, or are taking, the standard courses in undergraduate physics curricula—analytical mechanics, quantum mechanics, electrodynamics, and mathematical methods—but not dedicated courses in relativity beyond what one encounters in a modern physics course. I assume familiarity with the Michelson-Morley experiment (MM). I do not presuppose a mastery of tensors; we supply a reasonably in-depth treatment of tensors, on flat and curved spaces. There are numerous texts on relativity available, of varying degrees of rigor. I have sought a middle ground between treatments that are qualitative and lacking in mathematical details and works written by experts for experts.
Here are some points of note.
Minus signs: Minus-sign ambiguities arise at several places in relativity. The first is the Lorentz metric. We choose (−+++); this seems best (to me)—it singles out time as the quantity warranting special treatment, so true in relativity, and it leaves alone the Euclidean metric for spatial variables. Students must learn from the outset that relativity mostly is about time. The perennial debate over the Lorentz metric will not be settled here. Another source of minus sign confusion is in the Riemann curvature tensor $R^{\alpha}{}_{\beta\gamma\delta}$; I have put the indices associated with derivatives in the third and fourth places, i.e., $\gamma$ and $\delta$. We take the Ricci tensor as the contraction over the first and third indices of the Riemann tensor, $R_{\mu\nu} = R^{\alpha}{}_{\mu\alpha\nu}$. Finally, the energy-momentum tensor is defined so that $T^{00} \geq 0$.
Notation: An attempt has been made at being consistent. Scalar quantities are indicated in italic font: the speed of light, $c$. Vector quantities are indicated with boldface italic font: force $\boldsymbol{F}$. Tensors considered as geometric objects are indicated with boldface Roman font: $\mathbf{T}$ (this notation doesn’t appear until Chapter 5). Components of tensors are indicated in italic font with indices: $T_{\mu\nu}$. Tensor densities are indicated with Gothic symbols, $\mathfrak{T}$; that notation is sparingly used.
Units: I have kept all the factors of c, G, and ℏ in formulas. There is a certain panache in advanced physics of working in units where c = G = 1, etc. The aim of this practice is to: 1) avoid repetitively writing the same old factors, and 2) gain insight into the geometric meaning of formulas. In a first—and perhaps only—exposure to the subject, I have consistently worked in SI units.
Mathematics: Relativity is a mathematical theory; there’s no way around that. Tensors constitute the very language of relativity: An equation of physics expressed as a relation between tensors, if valid in one reference frame, is valid in all reference frames. Yet the mathematical preparation of students in this area is often insufficient for a study of relativity, and the power of the theory cannot be harnessed without knowledge of its mathematical structure. To fill this gap, roughly 25% of the book is devoted to the mathematics of relativity. Chapter 5 is an introduction to tensors on flat spaces. Most courses will not cover all this material; consider the latter half of Chapter 5 reference material (which is used throughout the book). The first half of Chapter 5 comprises a “tensor starter kit”—a foundation for the use of tensors in SR. For GR, a deeper understanding must be developed. To study GR at anything beyond a superficial level requires a working knowledge of tensor fields on curved spaces, which is developed in Chapters 13 and 14. I considered putting the material in Chapter 13 (manifolds) into an appendix, but decided against: It should be part of the main exposition of the subject. Nevertheless, it could be skipped on a first reading. Chapter 14 (curvature) presumes a familiarity with manifolds, but not all their properties in detail. Consult the latter half of Chapter 5 and Chapter 13 as needed. The mathematics contained in Chapters 5, 13, and 14, if encountered for the first time, would be daunting despite my attempts to guide you through the maze. It takes time to become proficient in the theory of relativity, to learn its methods and scope. Physics students tend to learn mathematics on a “need-to-know” basis, and most learn this material in courses on GR. Physicists often find themselves strangers in a strange land of mathematics.
Organization: Chapter 1 presents an overview of SR and GR. Chapters 2–10 develop nongravitational phenomena (SR), first without, and then with the use of tensors. Chapters 11 and 12 introduce the principle of equivalence (the equivalence of local gravity and acceleration) and the treatment of accelerated motion in SR. Chapters 13 and 14 are where a traditional book on GR would begin. Chapters 15–18 present Einstein’s field equation, the standard first topics in GR, and the extent to which they have been tested, mainly on the scale of the solar system. Chapter 19 concludes with a brief introduction to cosmology. Appendices contain specialized topics.
History: I have reproduced passages from the writings of Newton, Einstein, Minkowski, and others. It’s instructive for students to see how the luminaries of physics have grappled with the very subject they are encountering. No attempt has been made to offer a history of relativity.
Going outside the box: Relativity is foundational to much of physics. The book is offered against the backdrop of the corpus of physical theory, to which the student is assumed to have had exposure. When instructive I point out parallels with other branches of physics; I do not pretend that other parts of physics don’t exist.
Disclaimers: In addition to typos and outright blunders, I welcome comments on what is not clear. Invariably, when delving into a subject with sufficient depth you get “hot” on the material, and many conclusions seem obvious. Later, however, they may not be so obvious. I have attempted to give all the details necessary to derive the important equations. If the presentation seems ploddingly slow at times, I’ve succeeded in bringing you up to speed. It’s all relative!
Acknowledgments: I thank my colleague Brett Borden for being my LaTeX guru and differential geometry sounding board. I thank the editorial staff at CRC Press, in particular Francesca McGowan and Rebecca Davies. I thank Evelyn Helminen for making figures. I thank my family, for they have seen me too often buried in a computer. My wife Lisa I thank for her encouragement and consummate advice on how not to mangle the English language. Finally, to the students of NPS, I have learned from you, more than you know. Try to remember that science is a “work in progress”; more is unknown than known.
James H. Luscombe
Monterey, California

CHAPTER 1 Relativity: A theory of space, time, and gravity
Relativity is a theory of space and time that provides the foundation for much of physics. It applies to any branch of physics that makes use of the four variables x, y, z, t, where x, y, z are independent spatial coordinates and t denotes time.¹ While originating from a reasonable premise (see below), the theory of relativity² implies conceptions of space, time, matter, and motion vastly different from what our everyday experience of the world leads us to formulate. To understand physics in full, as applied to phenomena beyond ordinary experience, one must study relativity (as well as quantum mechanics); our everyday experience is but a special case of all that’s possible in the universe. We’ll see that relativity consists of two theories: the special theory of relativity (SR) and the general theory of relativity (GR).
1.1 THE PRINCIPLE OF RELATIVITY
TO VANQUISH COORDINATES, TRANSCEND THEM
In broadest terms, relativity holds that the universe doesn’t care what systems of coordinates, or reference frames, we use to describe physical phenomena.³ Such a statement hardly sounds revolutionary, yet its implications are far-reaching because in the theory of relativity time is taken as a coordinate in a four-dimensional geometry of space and time, rather than as a parameter, as in pre-relativistic physics.⁴ Coordinates are essential for making measurements and performing calculations, yet they’re not fundamental—they don’t exist in nature—they’re artifacts of our thinking, what we as humans impose on the world. Therein lies the rub. We need coordinates for practical purposes, yet the goal of physics is to formulate laws of nature as manifestations of an objective reality, that which occurs independently of human beings.⁵ The laws of physics should be expressed in a way that’s independent of coordinate system. Relativity is an outgrowth of a single idea, the principle of relativity, that physical laws be independent of the reference frame used to represent them. Relativity is therefore a law about laws.⁶ Albert Einstein said: “. . . time and space are modes by which we think, and not conditions in which we live.”[2, p81] The program of relativity is to express equations of physics in such a way that, if true in one system of space-time coordinates, they are true in any coordinate system, and thereby transcend coordinates. We will travel far in the theory of relativity in pursuit of this goal, which, as we’ll see, is achieved by expressing equations as relations between tensors,⁷ tensors defined on a four-dimensional geometry where time is a dimension.
¹ Isn’t that all of physics? Classical thermodynamics, for example, utilizes variables that characterize the state of thermal equilibrium, which is independent of position and time.
² Referring to relativity as a theory can give the impression that it’s speculative. Relativity has been thoroughly tested and is among the most secure theories in physics. It’s up to us to fit our minds to the Procrustean bed of physics.
³ We use the terms reference frame and coordinate system interchangeably.
⁴ Classical physics refers to non-quantum physics; relativity belongs to classical physics. Pre-relativistic refers to physics developed prior to the advent of relativity, which dates to the year 1905.
⁵ We use the term objective as it’s used in science, to refer to objects that exist, or processes that occur, independently of the presence of human beings. That idea conflicts with the acausality of measurement as taught in quantum mechanics. There is a successful marriage of quantum mechanics with SR (the Dirac equation), but not with GR. Progress has been made in incorporating quantum effects into GR, such as Hawking radiation, but there is not presently a consistent theory of quantum GR, what’s referred to as quantum gravity.
1.2 THE LAW OF INERTIA: FOUNDATION OF SPECIAL RELATIVITY
Motion exists . . . relatively to things that lack it.—Galileo, 1632[3, p116]
Motion is ubiquitous, yet learning to describe it correctly took a long time to achieve. Galileo taught, for the purposes of formulating laws of motion, that states of uniform motion are the same as rest,⁸ when observed from reference frames in which the law of inertia holds, inertial reference frames (IRFs).⁹ There are an unlimited number of possible IRFs, which therefore comprise a class of frames from which to describe motion. Our first order of business is to examine inertia and IRFs, because SR is based on the equivalence of IRFs. That we have singled out a particular type of reference frame is what puts the “special” in SR. There are two aspects to the principle of relativity: The type of phenomena that are the same for observers in equivalent reference frames, and the class of equivalent frames of reference. With SR, Einstein showed that mechanical and electromagnetic phenomena obey the same laws for all inertial observers;¹⁰ with GR, he extended the class of equivalent observers to all observers, wherein he provided an explanatory framework for gravitational phenomena. We must understand how relativity is implemented for IRFs (SR) before tackling arbitrary frames of reference (GR).
1.2.1 Inertia
The property of matter known as inertia, so familiar to us today, had a difficult time in becoming established. Pick up a rock and throw it. What makes it move when it leaves your hand? According to Aristotle, “Everything that is in motion must be moved by something,” an idea seemingly so compelling, it stood for almost 20 centuries.¹¹ Galileo refuted that idea with a simple experiment.¹² Drop a stone from the mast of a ship that’s at rest; note where it lands. Now repeat the experiment on a ship that’s in uniform motion. In the Aristotelian theory, the rock would land at a point displaced from the mast by the distance the ship had moved during the fall.¹³ Galileo maintained there would be no displacement because, first, the rock shares in the motion of the ship,¹⁴ and second, free particles move without movers, that free particles—those with no forces acting on them—once set in motion, maintain that state of motion, termed inertial motion.¹⁵ The rock accelerates under the action of gravity, but maintains its constant motion in the direction of the uniform motion of the ship because there is no force acting in that direction (assuming negligible wind resistance).¹⁶
⁶ The principle of relativity is a different kind of law than other physical principles. It presumes the existence of laws of nature, that there are reproducible manifestations of the workings of nature waiting for us to describe, and that we possess a language rich enough to describe them accurately. That language is mathematics, which physics relies on heavily. It’s remarkable that mathematics, a human invention, applies so well to the description of nature. To quote Eugene Wigner:[1] “. . . the mathematical formulation of the physicist’s often crude experience leads in an uncanny number of cases to an amazingly accurate description of a large class of phenomena. This shows that the mathematical language has more to commend it than being the only language which we can speak; it shows that it is, in a very real sense, the correct language.”
⁷ If you’re uneasy about tensors, don’t worry; students are frequently ill-prepared when it comes to tensors. The mathematics of tensors will be developed as we proceed. Vectors are special cases of tensors.
⁸ Galileo did not explicitly isolate the concept of inertial motion as a general principle, yet it’s quite clear from his writings that he understood it. Even today, students of physics are well advised to read Galileo’s Dialogue.[3]
⁹ There are reference frames in which the law of inertia does not hold, noninertial reference frames—see Section 1.6.
¹⁰ We’ll refer to inertial observers as observers at rest relative to IRFs. The “observer” is essentially the reference frame.
¹¹ Aristotle classified motion as natural and unnatural. Natural motion occurs among the four elements air, earth, fire, and water, which seek to find their natural places, e.g., heavy objects naturally move toward the center of the earth. Natural motion is unforced, not requiring the action of an external agency. Unnatural motion, however, such as horizontal motion on Earth, is forced and requires a mover. What’s the “mover” when the rock leaves your hand? Aristotle argued that air, displaced by the motion of the rock, wraps around the rock and pushes it on. A rock thrown in vacuum would not move!
¹² The history of inertia is more involved than our account here. A succession of investigators in the time between Aristotle and Galileo questioned the Aristotelian theory.
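The two predictions for Galileo's ship experiment can be compared with a little arithmetic. The following is a minimal sketch, assuming a hypothetical mast height of 20 m and ship speed of 5 m/s (neither figure comes from the text) and ignoring air resistance; the function names are illustrative only.

```python
import math

G_ACCEL = 9.81  # m/s^2, gravitational acceleration near Earth's surface


def fall_time(height_m: float) -> float:
    """Time for a dropped object to fall height_m, ignoring air resistance."""
    return math.sqrt(2.0 * height_m / G_ACCEL)


def landing_offset_aristotle(height_m: float, ship_speed: float) -> float:
    """Aristotelian prediction: once released, the rock has no horizontal
    motion, so the ship slides out from under it during the fall."""
    return ship_speed * fall_time(height_m)


def landing_offset_galileo(height_m: float, ship_speed: float) -> float:
    """Galilean prediction: the rock keeps the ship's horizontal velocity,
    so rock and mast cover the same horizontal distance."""
    t = fall_time(height_m)
    rock_x = ship_speed * t  # inertial horizontal motion of the rock
    mast_x = ship_speed * t  # horizontal motion of the foot of the mast
    return rock_x - mast_x


if __name__ == "__main__":
    h, v = 20.0, 5.0  # hypothetical mast height (m) and ship speed (m/s)
    print(f"fall time: {fall_time(h):.2f} s")
    print(f"Aristotle: offset from mast = {landing_offset_aristotle(h, v):.2f} m")
    print(f"Galileo:   offset from mast = {landing_offset_galileo(h, v):.2f} m")
```

The Aristotelian offset grows with the ship's speed, so it would let a sailor measure that speed from inside the ship; the Galilean offset is zero at any speed, which is the statement that uniform motion cannot be detected this way.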
The primary state of motion, that exhibited by free particles, is inertial—in a straight line at constant speed. Free particles of and by themselves cannot change their states of motion. That fact is highly important (essential, actually) for SR and GR. The unfolding of the inertia concept mirrors the historical development of physics, from Aristotle to Einstein, at least as far as our understanding of motion is concerned. Galileo’s experiment with the ship is a variant of an argument used by Aristotle to prove that Earth is immobile: An object projected straight up from the surface of the earth returns to the same place and thus Earth could not have moved in the meantime. Galileo maintained that nothing can be inferred from such an argument about Earth’s motion or rest. What Galileo asserted is that, except in relation to other objects, uniform motion of one’s reference frame cannot be detected—a fundamental tenet of relativity—in this case by mechanical means.¹⁷
Isaac Newton conceived of inertia not just as the property of free objects to maintain states of uniform motion, but also by what he called the inherent force, the property by which matter resists changes in motion: “Inherent force of matter is the power of resisting by which every body, so far as it is able, perseveres in its state either of resting or of moving uniformly straight forward.”[4, p404] Thus there are two aspects of inertia: perseverance and resistance. His definition of inertia should be read together with his first law of motion: “Every body perseveres in its state of being at rest or of moving uniformly straight forward, except insofar as it is compelled to change its state by forces impressed.”[4, p416] Objects move inertially unless prevented from doing so by imposed forces, to which they provide a resistance, the inertial force.¹⁸ The inertial force is the reaction by which objects “push back” against forces attempting to prevent states of inertial motion:
Because of the inertia of matter, every body is only with difficulty put out of its state either of resting or of moving. Consequently, inherent force may also be called by the very significant name of force of inertia. Moreover, a body exerts this force only during a change of its state, caused by another force impressed upon it, and this exercise of force is, depending on the view point,¹⁹ both resistance and impetus:²⁰ resistance insofar as the body, in order to maintain its state, strives against the impressed force, and impetus insofar as the same body, yielding only with difficulty to the force of a resisting obstacle, endeavors to change the state of that obstacle. Resistance is commonly attributed to resting bodies and impetus to moving bodies; but motion and rest . . . are distinguished from each other only by point of view, and bodies commonly regarded as being at rest are not always truly at rest.[4, p404]
¹³ In the Aristotelian theory, once the stone has been released (and no longer has a mover), it can only undergo its “natural” motion toward the center of Earth; where the ship goes after the release of the rock is immaterial.
¹⁴ This point, obvious to us today, was one that Galileo had to take pains to establish, that objects can have a superposition of motions, i.e., velocity is a vector quantity. In the Aristotelian theory, objects not subject to movers can only have their natural motions. That objects can have “two motions” (downwards and sideways) was foreign to the Aristotelian worldview.
¹⁵ Galileo based this conclusion on his experiments with inclined planes: Objects accelerate on planes oriented downward, decelerate on those oriented upwards, and have no acceleration on horizontal planes.
¹⁶ Truth in advertising: A particle dropped from a sufficiently high point would show a displacement from the Coriolis acceleration. By Earth’s rotation, a body dropped from a high elevation has a higher transverse velocity than the ground. Such a displacement actually confirms Galileo’s hypothesis that different types of motion can be imparted to particles.
¹⁷ In the Aristotelian theory, the speed of the ship could be inferred from the displacement of the rock. Perhaps one has ridden in a train through a tunnel (or a submarine), where, if the ride is smooth enough, one doesn’t have a sense of motion. The MM experiment failed to detect uniform motion by electromagnetic means.
¹⁸ We’ll see in GR that your weight is the force which must be supplied to prevent you from continuing in a state of inertial motion. What’s seen as accelerated motion in three dimensions (under the force of gravity) corresponds to a constant state of motion in four-dimensional spacetime (defined on page 5). As shown in GR, gravity is a property of spacetime.
¹⁹ What we refer to as reference frame, Newton called point of view.
²⁰ Impetus is another word for momentum. What we call momentum, Newton called quantity of motion, defined [4, p404] as “the velocity and quantity of matter jointly”; hence momentum p = mv. In SR, momentum is defined as p = mγv (where $\gamma = (1 - v^2/c^2)^{-1/2}$ and c is the speed of light), an alternative “quantity of motion.” For $v \ll c$, $\gamma \approx 1$.
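The footnote on Newton's "quantity of motion" contrasts p = mv with the SR definition p = mγv. A short numerical sketch shows how the two differ as v approaches c. The function names and sample speeds below are illustrative assumptions, not taken from the text; only the formulas for γ and p come from the footnote.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def lorentz_gamma(v: float) -> float:
    """gamma = (1 - v^2/c^2)^(-1/2), defined only for |v| < c."""
    if abs(v) >= C:
        raise ValueError("speed must be less than the speed of light")
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def momentum_newton(m: float, v: float) -> float:
    """Newton's quantity of motion, p = m v."""
    return m * v


def momentum_sr(m: float, v: float) -> float:
    """Momentum as defined in SR, p = m gamma v."""
    return m * lorentz_gamma(v) * v


if __name__ == "__main__":
    mass = 1.0  # kg, hypothetical test mass
    for fraction in (1e-4, 0.1, 0.6, 0.9):  # sample speeds as fractions of c
        v = fraction * C
        ratio = momentum_sr(mass, v) / momentum_newton(mass, v)
        print(f"v = {fraction:g} c  ->  p_SR / p_Newton = {ratio:.6f}")
```

The printed ratio is γ itself, so at everyday speeds the two definitions are numerically indistinguishable, consistent with γ ≈ 1 for v ≪ c.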
1.2.2 Inertial reference frames
In IRFs the law of inertia holds true, that free particles move in straight lines at constant speed. In view of the transition to GR, several issues are exposed by this benign statement.
1. What’s a free particle? The answer is seemingly self-evident: If free particles are unaccelerated, then not-free particles are accelerated, right? Not so fast. Such reasoning doesn’t take into account how acceleration is measured. Not all unaccelerated particles are free, and not all free particles are unaccelerated: It depends on the reference frame. In IRFs, acceleration is caused solely by forces. No force, no acceleration, and forces arise from physical interactions. In noninertial reference frames (see Section 1.6), acceleration can be an artifact of the choice of frame and not necessarily the result of forces. Forces can be identiﬁed from their physical sources. Acceleration—seemingly the quantity most accessible to direct observation—is not unambiguous because to measure it a standard of rest must be speciﬁed. Consider Earth in the gravitational ﬁeld of the sun. In a frame with the sun at rest, Earth’s acceleration is in the direction of the force produced by the sun; Newton’s second law of motion is satisﬁed. In a frame with Earth at rest, however, it is not satisﬁed because Earth’s acceleration is zero. Newton’s second law is not a general law of physics because we’re free to choose reference frames in which it doesn’t work.21 IRFs are frames in which objects with no forces acting on them have no acceleration.
2. What’s a straight line? In a given geometry, the straightest possible line is called a geodesic curve, a concept that we’ll develop. But what specifies the geometry? In GR, the geometry of spacetime22 is not something known a priori, but is instead determined by its energy-momentum content. Spacetime geometry is therefore physical, something that emerges from the distribution of matter-energy-momentum. Spacetime in GR is not something passive and inert; it evolves in response to matter. The version of Newton’s first law that survives to GR is that free particles follow geodesic paths in spacetime, those determined by the distribution of energy-momentum. We return to this idea when we take up GR.
3. What’s constant speed? For speed, we need time. But whose time? Newtonian mechanics utilizes an absolute time that pervades the universe—see page 8. In relativity, time and space do not have separate existences and are reference-frame speciﬁc.
1.2.3 Equivalence of inertial reference frames
Once a frame has been found meeting the criteria for an IRF, any other frame moving relative to it with constant velocity also constitutes an IRF.23 A natural equivalence among IRFs is established by free particles: All inertial observers agree that the trajectories of free particles are described by constant velocity; all agree on the law of inertia. The value of the speed is reference-frame speciﬁc, but all agree on its constancy. Thus, all inertial observers agree on the laws of mechanics: Forces manifest in changes of states of inertial motion. Different inertial observers can observe the same phenomena and describe them by the same laws. Transforming from one set of inertial observers to another does not change the laws—the very heart of the principle of relativity.
21In a sense, that’s the problem GR ﬁxes. 22Spacetime is deﬁned on page 5. Is it obvious what the geometry of spacetime should be? 23The motion of free objects is seen as unaccelerated in both frames.


1.2.4 Coordinate transformations and the principle of covariance
Transformation is central to relativity. Transformations between reference frames are effected mathematically as transformations among the different coordinates assigned to the same event by all the different, yet equivalent inertial observers. An event is a point in space at a point in time. Anything that happens, or has happened or will happen, comprises an event. The totality of all events is a four-dimensional continuum referred to as spacetime (no hyphen). We require that the mathematical form of the laws of physics be unaffected by changes in reference frames, changes in the coordinates assigned to events, a theme that accompanies us from Newtonian mechanics to SR to GR, that the laws of physics be expressed in a way that their form is invariant under progressively more general coordinate transformations. Form invariance of physical laws is called the principle of covariance, the requirement that the equations of physics adhere to the principle of relativity by having the same mathematical form in all reference frames.
Coordinate transformations in SR must be linear. All inertial observers agree that the spacetime trajectories (worldlines) of free particles are straight (see Section 1.4). Coordinate transformations between IRFs must be such as to map straight lines in spacetime onto straight lines so as to preserve the law of inertia. Only homogeneous, linear transformations map straight lines onto straight lines, where both lines pass through the same origin of the coordinate system. We’ll work through some examples to see how inertial frames can differ and yet be equivalent.

1.2.4.1 Boosts
Figure 1.1 shows frames S and S′ with origins displaced by vector R, where the coordinate axes are parallel. We will of course be interested in the case of relative motion where R = R(t) is time dependent, but for now let R be fixed. Any transformation between frames with parallel axes (as in Fig. 1.1) is called a boost.
Figure 1.1 Frames S and S′ in boost configuration: coordinate axes are parallel.
In Fig. 1.1 the same point in space, denoted with an asterisk, is referenced by vectors r and r′, with r′ = r − R (law of vector addition). This simple (linear) coordinate transformation can be “inverted” by interchanging primed and unprimed quantities and letting R → −R, r = r′ + R. That rule will stand us in good stead with linear coordinate transformations: Interchange primed and unprimed quantities and reverse the transformation parameter (velocity, angle, etc.). Suppose S is an IRF, i.e., a frame in which a free particle is unaccelerated, r̈ = 0. By differentiating the transformation equation we conclude that r̈′ = 0. If S is an IRF, so is S′ when it’s connected to S by a displacement. There is no unique origin for IRFs.


1.2.4.2 Rotations
A more complicated example of a linear transformation is a rotation. Figure 1.2 shows frames S and S′ having a common origin but with coordinate axes rigidly rotated relative to each other by a fixed angle φ. How are the coordinates assigned to the same point related? It’s an exercise in trigonometry to show that

\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\equiv R_z(\phi) \begin{pmatrix} x \\ y \end{pmatrix} ,
\tag{1.1}
\]

where we’ve introduced the rotation operator, Rz(φ), which effects a rotation about the z-axis (coming out of the paper, not shown) through an angle φ. The inverse transformation is obtained by interchanging primed and unprimed quantities and by letting φ → −φ. If in S a free particle is observed to be unaccelerated, with ẍ = 0 and ÿ = 0, then because φ is constant, ẍ′ = 0 and ÿ′ = 0. A frame rotated relative to an IRF is also an IRF.24 There is no unique orientation of IRFs. General linear transformations involving both boosts and rotations are covered in Chapter 6.
Figure 1.2 Frames having a common origin with axes rotated through a fixed angle φ.
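The rotation rule can be checked numerically. The following is a minimal sketch, assuming the axes of S′ are rotated counterclockwise by φ relative to those of S; the function names `rotate_axes` and `free_particle`, the angle, and the sample trajectory are illustrative choices, not taken from the text.

```python
import math

def rotate_axes(x, y, phi):
    """Return (x', y') of the point (x, y) in axes rotated by phi about z."""
    x_new = x * math.cos(phi) + y * math.sin(phi)
    y_new = -x * math.sin(phi) + y * math.cos(phi)
    return x_new, y_new

def free_particle(t):
    """Position in S of a free particle: a straight line at constant velocity."""
    return 1.0 + 3.0 * t, -2.0 + 0.5 * t

phi = math.radians(30.0)  # fixed rotation angle (assumed value)

# Inverse transformation: swap primed and unprimed, and let phi -> -phi.
x_rot, y_rot = rotate_axes(2.0, 1.0, phi)
x_back, y_back = rotate_axes(x_rot, y_rot, -phi)

# Second differences at equally spaced times stand in for the acceleration.
p0, p1, p2 = (rotate_axes(*free_particle(t), phi) for t in (0.0, 1.0, 2.0))
accel_rot = (p0[0] - 2.0 * p1[0] + p2[0], p0[1] - 2.0 * p1[1] + p2[1])

print(x_back, y_back)  # recovers (2.0, 1.0) up to rounding
print(accel_rot)       # both components vanish up to rounding
```

Because φ is constant, the rotated coordinates are fixed linear combinations of x and y, so a trajectory that is linear in t in S stays linear in t in S′.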

1.2.4.3 Galilean transformations

Now let R in Fig. 1.1 vary linearly with time, R = vt, where v is a constant vector. Both observers carry identical clocks, which are synchronized when the origins coincide. By “common sense” reasoning, r and r′ are related by r′ = r − vt. Implicit is the assumption that time in S′, t′, is the same as that in S, t′ = t (absolute time, see page 8). This “obvious” assumption was rarely made explicit in pre-relativistic physics. By differentiating the transformation formula, we have the Galilean velocity addition formula25 u′ = u − v, where u ≡ dr/dt and u′ ≡ dr′/dt′. If in S a free particle is described by r̈ = 0, then r̈′ = 0 as well. If S is an IRF, then so is S′ if it’s moving uniformly relative to S. It’s difficult to appreciate at first the deep implications of this result!
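The transformation r′ = r − vt can be written in terms of its vector components:

\[
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix} x \\ y \\ z \end{pmatrix}
- t \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix} .
\tag{1.2}
\]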

Equation (1.2) underscores the pre-relativistic concept that we live in a three-dimensional world with time as a universal parameter (t′ = t). If time is included as a separate dimension, however, r′ = r − vt and t′ = t can be expressed as a linear transformation in four-dimensional spacetime:

\[
\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 \\
-v_x & 1 & 0 & 0 \\
-v_y & 0 & 1 & 0 \\
-v_z & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} .
\qquad \text{(Galilean transformation)}
\tag{1.3}
\]

24The frames S and S′ are related through a fixed angle. A rotating reference frame, with φ = φ(t), is not an IRF.
25How velocities transform between IRFs in SR is treated in the next two chapters.


Let’s get in the habit of listing the time “coordinate” first,26 as in Eq. (1.3). Equation (1.3) is the Galilean transformation (GT), the form of relativity based on everyday experience. Despite its common-sense appeal, the GT does not lead to predictions in agreement with experiment;27 it will be replaced by another linear transformation of spacetime coordinates that does lead to agreement with experiment: the Lorentz transformation (LT).28
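As an illustration of Eq. (1.3), here is a minimal sketch that applies the Galilean matrix to events written as (t, x, y, z), inverts it by letting v → −v, and recovers the velocity addition formula u′ = u − v. The helper names `galilean_matrix` and `transform` and the numerical values are assumptions made for the example.

```python
def galilean_matrix(vx, vy, vz):
    """4x4 matrix of Eq. (1.3), acting on column vectors (t, x, y, z)."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [-vx, 1.0, 0.0, 0.0],
        [-vy, 0.0, 1.0, 0.0],
        [-vz, 0.0, 0.0, 1.0],
    ]

def transform(matrix, event):
    """Matrix-vector product: coordinates of the event in the other frame."""
    return [sum(m * e for m, e in zip(row, event)) for row in matrix]

v = (3.0, 0.0, 0.0)                                  # velocity of S' relative to S (assumed)
to_primed = galilean_matrix(*v)
to_unprimed = galilean_matrix(-v[0], -v[1], -v[2])   # inverse: v -> -v

# Two events on the worldline of a particle moving with u = 5 along x in S.
u = 5.0
event_a = [0.0, 0.0, 0.0, 0.0]
event_b = [2.0, u * 2.0, 0.0, 0.0]

a_primed = transform(to_primed, event_a)
b_primed = transform(to_primed, event_b)

# Velocity in S' from the transformed events; time is unchanged (t' = t).
u_primed = (b_primed[1] - a_primed[1]) / (b_primed[0] - a_primed[0])

print(b_primed)                          # time component is unchanged
print(u_primed)                          # equals u - v along x
print(transform(to_unprimed, b_primed))  # recovers event_b
```

The first row of the matrix never mixes spatial coordinates into the time coordinate, which is the absolute-time assumption made explicit.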

1.2.4.4 Form invariance

The idea of form invariance can be illustrated using the GT, because acceleration is invariant under that transformation: a′ ≡ (d²/dt′²)r′ = (d²/dt²)(r − vt) = (d²/dt²)r = a. Observers in S and S′ agree on the form of Newton’s second law: F′ = ma′ = ma = F, where mass is the same in all IRFs.29 The laws of mechanics are invariant under the GT. What about electromagnetism? Maxwell’s equations predict the existence of electromagnetic waves that propagate with a speed given in terms of electromagnetic parameters, c = 1/√(ε₀µ₀). It’s shown in Appendix A that the wave equation transforms under the GT for frames in relative motion along a common x-axis as:

\[
\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}
= \left(1 - \frac{v^2}{c^2}\right)\frac{\partial^2}{\partial x'^2}
- \frac{1}{c^2}\frac{\partial^2}{\partial t'^2}
+ \frac{2v}{c^2}\frac{\partial^2}{\partial x'\,\partial t'} .
\tag{A.3}
\]
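The steps behind Eq. (A.3) can be sketched with the chain rule. The derivation below is a reconstruction that assumes the GT along a common x-axis, x′ = x − vt and t′ = t; it is not copied from Appendix A.

```latex
\begin{align*}
x' &= x - vt , \qquad t' = t , \\
\frac{\partial}{\partial x}
  &= \frac{\partial x'}{\partial x}\frac{\partial}{\partial x'}
   + \frac{\partial t'}{\partial x}\frac{\partial}{\partial t'}
   = \frac{\partial}{\partial x'} , \\
\frac{\partial}{\partial t}
  &= \frac{\partial x'}{\partial t}\frac{\partial}{\partial x'}
   + \frac{\partial t'}{\partial t}\frac{\partial}{\partial t'}
   = \frac{\partial}{\partial t'} - v\frac{\partial}{\partial x'} , \\
\frac{\partial^2}{\partial t^2}
  &= \frac{\partial^2}{\partial t'^2}
   - 2v\frac{\partial^2}{\partial x'\,\partial t'}
   + v^2\frac{\partial^2}{\partial x'^2} , \\
\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}
  &= \left(1 - \frac{v^2}{c^2}\right)\frac{\partial^2}{\partial x'^2}
   - \frac{1}{c^2}\frac{\partial^2}{\partial t'^2}
   + \frac{2v}{c^2}\frac{\partial^2}{\partial x'\,\partial t'} .
\end{align*}
```

The factor (1 − v²/c²) and the mixed-derivative term are what spoil the form of the wave operator; both reduce to the unprimed form only for v = 0.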

Form invariance therefore does not hold for the wave equation under the GT, implying a crack in the foundation of physics. The inconsistency is that Maxwell’s equations are fundamental laws of physics, yet a prediction of those equations is not invariant under the GT, while the laws of mechanics are. Let’s consider the three possible explanations for this inconsistency:

1. The principle of relativity applies to mechanics, but not to electromagnetism. Maxwell’s equations predict a speed of electromagnetic waves, but don’t specify a reference frame. Perhaps there is only one reference frame in which the speed of light is c? If so, one could detect that frame by electromagnetic means—the MM experiment.

2. The principle of relativity applies to mechanics and electromagnetism but Maxwell’s equations are incorrect. If so, one should ﬁnd discrepancies between the predictions of Maxwell’s equations and experimental results. Such discrepancies have yet to be found.

3. The principle of relativity applies to mechanics and electromagnetism, but Newton’s laws are incorrect. If so, one should find discrepancies between the predictions of Newton’s laws and experimental results, something routinely done at particle accelerators, which produce speeds v close to c. If Newton’s laws are incorrect, so is the GT, and we’re back to square one.

Einstein opted for the third explanation. He asserted that the principle of relativity applies to all of physics, not just to mechanics. He then took that idea to its logical extreme. The speed of light is a law of physics, not merely something that we measure. Einstein took the bold step of asserting that the speed of light is the same for all inertial observers, which experiment has shown to be true!

1.3 SPACE, TIME, AND SPACETIME
1.3.1 Newtonian space and time
Relativity is concerned with space and time and how the two are related through motion. It’s useful to state Newton’s conceptions of space and time, which, while not satisfactory by today’s standards, continue to frame the discussion:[4, p408]
26In pre-relativistic physics, time is a parameter, not a coordinate.
27What’s wrong with common sense? If you had to put your finger on it, it would be the assumption that t′ = t, the notion of absolute simultaneity.
28The properties of Lorentz transformations are developed throughout this book.
29Mass is the same in all IRFs, wherein all observers claim themselves at rest.

• Absolute space, of its own nature without reference to anything external, always remains homogeneous and immovable.
• Absolute, true, and mathematical time, in and of itself and of its own nature, without reference to anything external, ﬂows uniformly and by another name is called duration.
What’s meant by absolute? Einstein gave a good deﬁnition:[5, p55] “. . . absolute means not only physically real, but also independent in its physical properties, having a physical effect, but not itself inﬂuenced by physical conditions.” We’ll use absolute in Einstein’s sense—physically existing, but not inﬂuenced by physical conditions. Newton’s space and time are absolute in that sense: They exist—by deﬁnition—independent of anything else. These notions unravel in relativity. Space and time are not independent of each other, but are two aspects of a single entity: spacetime.
It’s understandable that space would be conceived as absolute. Look out at the night sky. Space appears as a vast, ﬁxed arena containing the objects of the universe. Already we’re up against cosmological questions. Does space exist independently of the objects in the universe (as Newton would have it), passively containing them, or do the properties of space manifest because of the objects in the universe (the picture afforded by GR)? Is the universe separate from the objects it contains? Is it a vast collection of independent objects, or is it a single entity? GR will weigh in on these questions.
1.3.2 Simultaneity: the death knell of absolute time
Snap your fingers. In the Newtonian framework you’ve just specified “now” at every point of the universe, no matter how distant, because time exists independently of space. That notion is indicated in Fig. 1.3. Two points in space having the same time are said to be simultaneous. An instant of time thus determines a three-dimensional surface of simultaneity,30 extending throughout all of space. Simultaneity is therefore absolute in pre-relativistic physics, existing independently of anything else. In relativity, simultaneity is not absolute: two events simultaneous in one IRF are not simultaneous in another. Sit equidistant between two friends, and have them snap their fingers at the same time; you hear both simultaneously. To someone walking past you at a constant rate, however, the same finger-snaps would not be simultaneous.31 The finger-snap would be heard first from the sound source that the walker is moving toward. Whose description of these events is “right”? Relativity shows there is no absolute meaning to the “same time.” Absolute time does not exist; it’s not true that time exists independently of anything else. Time is not a parameter provided by the universe, as it is in pre-relativistic physics; relativity shows that time exists locally, relative to a given reference frame.
Figure 1.3 Surfaces of simultaneity in Newtonian spacetime.
30Actually a three-dimensional hypersurface. Our familiar notion of surface (such as the surface of an apple) is a two-dimensional set of points, or manifold, embedded in three-dimensional space. A hypersurface is an (n − 1)-dimensional manifold embedded in n-dimensional space. Manifolds and hypersurfaces will be systematically introduced in later chapters.
31The relativity of simultaneity is illustrated in Fig. 1.6.

The term relativity is misleading. Relativity does not claim that “everything is relative” (as is sometimes falsely stated), only that some things are relative, such as simultaneity. Relative refers to measurements made relative to a given reference frame, the results of which may not be the same in all reference frames. The purpose of relativity is to discover what is not relative, that what is the same for all observers is a law of physics. Relativity shows that simultaneity is not a law of physics.
1.3.3 Absolute space—is it real?
Absolute space, “homogeneous and immovable,” would be the ultimate reference frame from which it could be decided whether objects are “really” at rest. How would we recognize an object absolutely at rest? The answer is, we can’t.32 Rest cannot be ascertained against a backdrop of “nothingness” (absolute space); there must be other objects around to compare with. Rest exists only in relation to other objects, which can be considered reference frames. The same is true of motion. We cannot perceive motion in itself (relative to absolute space); motion is perceived only in relation to objects, so all motion is relative.33 Nevertheless, suppose a reference frame could exist relative to which all motion is measured, yet which is itself absolutely at rest, and let yourself be at rest in that frame. Someone drifting by in a rocket ship would say you’re in motion! Everything moves with respect to everything else, and every inertial observer claims they are at rest.
Absolute space is thus an empty concept because only relative motion can be observed. Perhaps that’s why it went largely unchallenged in the 200 years between the time of Newton and the late 19th century, because it has no observable consequences.34 The concept of absolute space received support, however, from Maxwellian electrodynamics. Maxwell’s equations predict a speed of electromagnetic waves, but they don’t specify a reference frame. What better evidence for a preferred frame like absolute space? Physicists of the late 19th century inferred there must be only one reference frame in which the speed of light is c (called the ether frame, presumably absolute space). Einstein, however, reached the opposite conclusion: If Maxwell’s equations don’t specify a reference frame, all inertial observers measure the same speed of light.
1.3.4 Spacetime coordinates and notational conventions
In the theory of relativity time is taken as a coordinate in the speciﬁcation of physical phenomena, in addition to spatial coordinates. Ask a friend to meet you for coffee. You must specify a point in space, three coordinates (on the surface of Earth usually two sufﬁce), at a point in time, making four numbers in all. Thus, you’re asking to meet your friend at a speciﬁed spacetime point, i.e., event.
The “gist” of relativity is that different observers assign different coordinates to the same events, underscoring that coordinates are without fundamental significance. Events are physical and exist independently of the coordinates assigned to them.35 The procedure in SR by which coordinates are assigned to events, the coordinatization of spacetime, is discussed below. In GR, the assignment of spacetime coordinates is associated with its mathematical structure as a manifold. In SR, spacetime is flat, while in GR spacetime is curved. Flat geometries can be covered by a single system of coordinates, whereas curved geometries require overlapping coordinate systems. Curved geometries, however, are locally flat: what we learn about coordinatizing spacetime in SR applies to limited regions of spacetime in GR.
To locate an object in three-dimensional space, three numbers, or coordinates, must be specified. In the Cartesian coordinate system, the numbers are traditionally denoted (x, y, z). But there are other coordinate systems, e.g., spherical coordinates, (r, θ, φ). We’ll denote spatial coordinates in a way that doesn’t commit to a particular coordinate system with the notation (x^1, x^2, x^3), or simply x^i, where it’s understood that i = 1, 2, 3. The use of superscripts takes some getting used to, but it’s standard notation in tensor analysis.36 When there’s a possibility for confusion, we’ll denote the square of x as (x)^2 to avoid mistaking it for the coordinate x^2; contrary to what you might think, problems of that sort do not occur often. The time coordinate will be parameterized, for reasons explained in Section 1.4, as x^0 ≡ ct. An event thus has coordinates (x^0, x^1, x^2, x^3). To save writing, spacetime coordinates are conventionally denoted x^µ, where it’s understood that µ = 0, 1, 2, 3. Greek letters denote spacetime coordinates, x^ρ, while Roman letters denote spatial coordinates, x^k. The indices ρ and k are dummy indices having no absolute meaning. Thus,

\[
\sum_{\nu=0}^{3} x^\nu = x^0 + \sum_{j=1}^{3} x^j .
\]

As we’ll see, two types of coordinates arise in non-orthogonal coordinate systems: contravariant, denoted with superscripts, x^µ, and covariant, denoted with subscripts, x_ν. Because GR seeks to work in arbitrary coordinate systems (not necessarily orthogonal), both types of coordinates, x^ν and x_ν, will be used.
32The unobservability of absolute space underscores a lesson from the history of physics: Physics is based on what can be measured. Notions of what might or could exist “anyway,” but that we can’t detect, like absolute space, tend to get excised from physics. “Excess” theoretical structures imply that alternative theories are possible.
33Recall Galileo’s words (page 2): “Motion exists relative to things that lack it”.
34There were objections to absolute space most notably from George Berkeley and Ernst Mach. Berkeley’s 1721 essay On Motion objected to absolute space because it’s not observable; see [6], paragraphs 58, 59, and 64. Mach’s Science of Mechanics [7] (published in 1883) provided the most incisive and influential critique of Newtonian mechanics. Mach contended we’re not allowed to invent concepts like absolute space. In the world we know of, motion is relative. We should not invent concepts that contravene that fact. “No one is competent to predicate things about absolute space and absolute motion; they are things of thought, pure mental concepts, that cannot be produced in experience.”
35Spacetime in SR is absolute: existing, but not influenced by physical conditions.

Sidebar discussion: In 1908 Hermann Minkowski delivered a seminal presentation, Space and Time,37 in which he showed that the results of SR, as derived algebraically by Einstein in 1905, have a natural and intelligible explanation when space and time are conceived geometrically as belonging to a four-dimensional continuum with a non-Euclidean geometry.

The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve as an independent entity.

Many of the terms we use in relativity are due to Minkowski: Proper time, spacelike vector, timelike vector. He didn’t use the term lightcone, but he did speak of “front” and “back” cones, which we will call future and past lightcones. It’s clear that Minkowski had worked out much concerning the geometry of spacetime, what today we call Minkowski space (see Chapter 5). Minkowski died suddenly in 1909 at age 44; one can only wonder what additional contributions he might have made. What we call spacetime, Minkowski called the world: “A point of space at a point of time, that is, a system of values x, y, z, t, I will call a world-point. The multiplicity of all thinkable x, y, z, t systems we will christen the world.” The term worldline is due to Minkowski:

We ﬁx our attention on the substantial point which is at the world-point x, y, z, t, and imagine that we are to recognize this substantial point at any other time. Let the variations dx, dy, dz of the space coordinates of this substantial point correspond to a time element dt. Then we obtain, as an image, so to speak, of the everlasting career of the substantial point, a curve in the world, a worldline, the points of which can be referred unequivocally to the parameter t from −∞ to +∞. The whole universe is seen to resolve itself into similar worldlines, and . . . in my opinion physical laws might ﬁnd their most perfect expression as reciprocal relations between these worldlines.

36To quote O. Veblen (from 1927), [8, p1] “Recent advances in the theory of differential invariants and the wide use of this theory in physical investigations have brought about a rather general acceptance of a particular type of notation, the essential feature of which is the systematic use of subscripts and superscripts . . . .” The use of subscripts and superscripts is not as arbitrary as it might ﬁrst appear; the way the two types of indices are used in calculations is quite logical and consistent.
37Reprinted in The Principle of Relativity [9, p73], an important collection of articles by Einstein, Lorentz, Minkowski, and Weyl. A chance to read the original literature in English translation.


1.4 SPACETIME DIAGRAMS
Comprehending relativity is greatly facilitated through the use of spacetime diagrams, also called Minkowski diagrams, and we’ll use them freely. On such diagrams, time is displayed along the vertical axis, with spatial dimensions displayed on horizontal axes (see Fig. 1.4). It’s simplest to take time as orthogonal38 to the three-dimensional space of spatial variables, as in Fig. 1.3. While we employ an orthogonal spacetime coordinate system, the geometry of spacetime is non-Euclidean (as we’ll show); don’t be fooled into thinking that an orthogonal set of axes implies a Euclidean geometry. Many ingrained habits must be unlearned in “doing” geometry on spacetime diagrams, particularly in calculating distances. Particle A is at rest in the reference frame of Fig. 1.4. The “motion” (history) in spacetime of a stationary object is a line parallel to the time axis. Particle B has constant velocity; its worldline is straight, with speed v = ∆x/∆t = tan θ m s⁻¹. Particle C is accelerating; its worldline is curved. In IRFs the worldlines of free particles are straight.
Figure 1.4 Particle worldlines: A is at rest, B is in uniform motion, and C is accelerated.
Figure 1.5 Lightline (photon worldline) on a spacetime diagram.
Of particular interest are the worldlines of photons; see Fig. 1.5. Using meters and seconds as the units of length and time, the worldline of a photon would be almost parallel to the spatial axis, with θ ≈ π/2. There’s nothing fundamental about units, however; one size doesn’t fit all and it’s common to adopt units that are suited to the problem at hand (e.g., the electron volt). It’s convenient to scale times by 1/c ≈ 3.3 ns m⁻¹, the time for light to travel one meter. With t → t/(1/c) = ct, the worldline of a photon (the lightline) is at the angle π/4 with respect to the ct and x-axes. We’ll draw lightlines at 45° angles relative to the space and time axes.
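The numbers in this paragraph can be reproduced with a few lines. This is a small sketch, assuming that one unit on each axis is drawn with the same length and that θ is measured from the time axis, so tan θ = ∆x/∆t; the variable names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)

# Time for light to travel one meter, in nanoseconds.
ns_per_meter = 1.0e9 / C

# Photon worldline angle from the time axis with axes labeled in seconds and meters.
angle_si = math.atan(C)

# With the time axis rescaled to ct, the photon covers one unit of x per unit of ct.
angle_scaled = math.atan(1.0)

print(ns_per_meter)             # about 3.3 ns per meter
print(math.pi / 2 - angle_si)   # a few nanoradians short of pi/2
print(math.degrees(angle_scaled))
```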
The relativity of simultaneity can be illustrated on a spacetime diagram. Consider a photon source C in a train car39 situated equidistant between detectors A and B; see Fig. 1.6. The source emits photons back to back. In a reference frame at rest with respect to the train car, photons arrive at the detectors simultaneously. In a reference frame at rest relative to the train station, however, which the train is assumed passing through, the photon source and the detectors are in motion with speed v from left to right. The two frames synchronize their identical clocks when the origins of their coordinate systems coincide, whereupon the photons are emitted. Seen from the frame of the station, event A happens before B; the photon first encounters detector A moving toward it. Simultaneity is not absolute: What’s observed as simultaneous in one IRF is not in another.
Figure 1.6 Relativity of simultaneity. Photons are received simultaneously in the frame of the emitter, but not in a frame in which emitter and receiver are moving to the right.
38In rotating reference frames, the orthogonality between space and time breaks down.
39Einstein’s relativity tends to get done in train stations and in elevators.
There’s a fundamental reason to use ct as the temporal coordinate. The fusion of space and time into spacetime requires that spacetime coordinates all have the same dimension. The coordinates of an event in one IRF are, under the LT, a linear combination of the coordinates in another IRF, which can be accomplished only if space and time coordinates have the same dimension. We require a conversion factor between spatial and temporal measures, which must be the same for all IRFs. We’ll show (Chapter 3) that a LT followed by a LT, is itself a LT—what’s required by the principle of relativity that all IRFs be equivalent. Such universality is possible only if the conversion factor is universal. The principle of relativity requires a universal speed. Experiment shows that speed is the speed of light. For frames in relative motion along a common x-axis (see Fig. 3.1), the spacetime coordinates transform under the LT (Eq. (3.17))

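
\[
\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix}
\gamma & -\beta\gamma & 0 & 0 \\
-\beta\gamma & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} ,
\tag{1.4}
\]
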

where γ ≡ (1 − β²)^(−1/2) is the Lorentz factor, with β ≡ v/c. Under the LT, the time coordinate in S′ is a mixture of the time and space coordinates in S.40 For that reason, the time coordinate x^0 = ct must have the dimension of length.
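The train example of Fig. 1.6 can be worked numerically with Eq. (1.4). The sketch below takes the train car as S′ and the station as S, so station coordinates follow from the inverse transformation (β → −β). The function name `boost_ct_x`, the speed β = 0.6, and the half-length of the car are assumptions chosen for illustration.

```python
import math

def boost_ct_x(ct, x, beta):
    """Eq. (1.4) restricted to (ct, x): coordinates in a frame moving at beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

beta = 0.6          # train speed relative to the station, in units of c (assumed)
half_length = 1.0   # source-to-detector distance in the train frame (assumed)

# Train frame S': photons emitted at the origin reach x' = -L and x' = +L at ct' = L.
detect_a_train = (half_length, -half_length)
detect_b_train = (half_length, +half_length)

# Station frame S: inverse transformation, obtained by letting beta -> -beta.
ct_a, x_a = boost_ct_x(*detect_a_train, -beta)
ct_b, x_b = boost_ct_x(*detect_b_train, -beta)

print(ct_a, ct_b)   # detection at A precedes detection at B in the station frame
print(ct_b - ct_a)  # the interval between detections, Delta(ct)
```

In the station frame each detection event still lies on a lightline through the origin (x = −ct for A, x = +ct for B), so both observers agree on the speed of light while disagreeing about simultaneity.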
The worldline of an object at rest in a given IRF is parallel to the time axis in that frame, e.g., worldline A in Fig. 1.4. The worldline of A might just as well be the time axis in that frame, what we’ll assume from now on. Let observer B be at rest relative to A (see Fig. 1.7). At time t₁, A emits a photon toward B that’s reflected by a mirror attached to B, with the return of the photon recorded at time t₂. A concludes that B has the spatial coordinate x_p = c(t₂ − t₁)/2, the distance light travels in half the time between emission and reception, and that the reflection event occurred at time t_p = (t₂ + t₁)/2, the mean of the two times. This procedure is called the radar method of coordinatizing spacetime; it assigns spacetime coordinates to events,

\[
(ct_p, x_p) = \left( \tfrac{1}{2} c\,(t_2 + t_1),\ \tfrac{1}{2} c\,(t_2 - t_1) \right) ,
\tag{1.5}
\]

based on measurements made by A using light signals.41 The radar method builds in the isotropy of the speed of light (established in the MM experiment). The “outbound” speed of light is the same as that for the photon’s return journey, and we are free to orient A and B in any direction.
Figure 1.7 Radar method of assigning coordinates to events.
40It is sometimes said (erroneously) that the GT is the version of the LT for low speeds, v ≪ c, yet that cannot be true: the GT does not mix in the spatial coordinate for the new time coordinate, at any speed.
We can redraw Fig. 1.7 as the left portion of Fig. 1.8. A photon emitted at time t₁ is reflected from spacetime point (t, x) and received at time t₂. We haven’t drawn observer B in Fig. 1.8, whose only role in Fig. 1.7 was to hold a reflector. Using Eq. (1.5), we can solve for t₁ and t₂ in terms of (t, x), t₁ = t − x/c and t₂ = t + x/c; these times are shown in the right portion of Fig. 1.8.
Figure 1.8 Photon emitted at t − x/c is reflected at (t, x) and received at t + x/c.
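The radar method and its inverse can be sketched in a few lines. The function names `radar_coordinates` and `emission_reception_times` and the sample times are assumptions for illustration; the formulas are Eq. (1.5) and the relations t₁ = t − x/c, t₂ = t + x/c quoted above.

```python
C = 299_792_458.0  # speed of light in m/s

def radar_coordinates(t1, t2, c=C):
    """Eq. (1.5): (ct_p, x_p) of the reflection event from emission time t1
    and reception time t2, both read on the observer's own clock."""
    return 0.5 * c * (t2 + t1), 0.5 * c * (t2 - t1)

def emission_reception_times(t, x, c=C):
    """Inverse relations: times at which the photon leaves and returns."""
    return t - x / c, t + x / c

# Photon sent at t1 = 1 s and received back at t2 = 3 s (assumed values).
ct_p, x_p = radar_coordinates(1.0, 3.0)
t1, t2 = emission_reception_times(ct_p / C, x_p)

print(ct_p / C, x_p / C)  # reflection at t = 2 s, one light-second away
print(t1, t2)             # recovers the emission and reception times
```

Only one clock, the observer's own, is read in this procedure, which is why the method does not require comparing times between frames.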

1.5 RELATIVITY OF CAUSALITY: SPACELIKE AND TIMELIKE

While spacetime coordinates are reference-frame dependent, there is an invariant involving the squares of coordinates that’s the same for all inertial observers,42 the spacetime separation

\[
s^2 \equiv -(x^0)^2 + \sum_{i=1}^{3} (x^i)^2 .
\tag{1.6}
\]

For an event with coordinates x^µ in one IRF, the coordinates of the same event in another IRF, x^µ′, are such that43

\[
-(ct')^2 + (x')^2 + (y')^2 + (z')^2 = s^2 = -(ct)^2 + (x)^2 + (y)^2 + (z)^2 .
\tag{1.7}
\]

41Note that the radar method does not call upon us to compare times as measured in different reference frames; it uses measurements made in a single reference frame.
42That there is an invariant quantity among coordinates assigned by different inertial observers to the same event implies that spacetime possesses an intrinsic geometry—the subject matter of the rest of the book.
43Equation (1.7) applies to IRFs having a common spacetime origin. Note the prime placed on the index, x^µ′.


Equation (1.7) can be verified using the special case of a LT given by Eq. (1.4); it’s true, however, for any LT. In Chapter 4 we define a LT as any linear transformation that leaves s² invariant. The spacetime separation is an example of a quantity that’s not relative; it’s observer independent. A way to motivate the invariance of s² is to consider two IRFs in relative motion along their common x-axis. At the instant their origins coincide, a flash of light is emitted. Both see an expanding wavefront that in their coordinates is described by −(ct)² + x² = 0. In whatever way the coordinates transform between IRFs, by the principle of relativity both must conclude that x² − (ct)² = 0 = (x′)² − (ct′)². While a wavefront of light is described by s² = 0, Eq. (1.7) holds for any value of s².
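The verification of Eq. (1.7) for the boost of Eq. (1.4) can be done numerically. A minimal sketch follows; the function names `lorentz_boost_x` and `separation_squared` and the sample events are assumptions made for illustration.

```python
import math

def lorentz_boost_x(event, beta):
    """Apply Eq. (1.4) to an event (ct, x, y, z) for relative speed beta = v/c."""
    ct, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def separation_squared(event):
    """Eq. (1.6): s^2 = -(ct)^2 + x^2 + y^2 + z^2, measured from the origin."""
    ct, x, y, z = event
    return -ct**2 + x**2 + y**2 + z**2

event = (2.0, 5.0, -1.0, 3.0)      # an arbitrary event (assumed values)
wavefront = (4.0, 4.0, 0.0, 0.0)   # a point on the light flash, x = ct

s2_original = separation_squared(event)
s2_boosted = {beta: separation_squared(lorentz_boost_x(event, beta))
              for beta in (0.1, 0.6, 0.99)}
s2_wavefront = separation_squared(lorentz_boost_x(wavefront, 0.6))

print(s2_original)
print(s2_boosted)     # same value for every beta, up to rounding
print(s2_wavefront)   # the wavefront keeps s^2 = 0
```

The individual coordinates change with β, but the combination in Eq. (1.6) does not.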

Figure 1.9 Spacetime separation vector between distinct events. [Diagram: axes x^0 and x^1; events x^µ_(1) and x^µ_(2) joined by the vector ∆x^µ.]

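The separation between events is defined analogously.44 Consider two events in a given IRF that have coordinates x^µ_(1) and x^µ_(2). Define the difference vector45 ∆x^µ (see Fig. 1.9)

\[
\Delta x^\mu \equiv x^\mu_{(2)} - x^\mu_{(1)}
= \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}_{(2)}
- \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}_{(1)}
\equiv \begin{pmatrix} \Delta x^0 \\ \Delta x^1 \\ \Delta x^2 \\ \Delta x^3 \end{pmatrix} .
\]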

The spacetime separation between these events is defined in the same way as in Eq. (1.6):

\[ (\Delta s)^2 = -(\Delta x^0)^2 + \sum_{i=1}^{3} (\Delta x^i)^2 . \tag{1.8} \]

Even though the separation has been defined as the square of the quantity ∆s, the value of (∆s)² can be, depending on the events, positive, zero, or negative. There is the temptation to define ∆s itself as an imaginary quantity when (∆s)² < 0, a temptation we will resist.46
The three possible signs of (∆s)² provide an absolute way of characterizing spacetime separations. Because (∆s)² is an invariant, no LT can change its sign.

(∆s)² is called:
    spacelike if (∆s)² > 0 ;
    lightlike if (∆s)² = 0 ;
    timelike  if (∆s)² < 0 .

Figure 1.10 shows examples of the three types of spacetime separations. Timelike separations do not have to be “above” the lightline, nor spacelike separations “below.” It’s the slope of the lines that counts, not their location in a spacetime diagram.
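The classification can be sketched in a few lines of Python. This is only an illustration of Eq. (1.8) and the sign rule above; the function names `separation_squared` and `classify` are our own, events are assumed to be tuples (t, x, y, z), and units with c = 1 are assumed by default. The exact comparison with zero is adequate for the integer-valued examples here but would need a tolerance for measured data.

```python
def separation_squared(event1, event2, c=1.0):
    """(Δs)^2 = -(cΔt)^2 + Δx^2 + Δy^2 + Δz^2, following Eq. (1.8).

    Each event is a tuple (t, x, y, z); c=1.0 assumes units in which c = 1.
    """
    dt, dx, dy, dz = (b - a for a, b in zip(event1, event2))
    return -(c * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2


def classify(event1, event2, c=1.0):
    """Return 'spacelike', 'lightlike', or 'timelike' from the sign of (Δs)^2."""
    ds2 = separation_squared(event1, event2, c)
    if ds2 > 0:
        return "spacelike"
    if ds2 < 0:
        return "timelike"
    return "lightlike"


# The same coordinate differences give the same answer wherever the pair
# of events sits in the diagram: only the differences enter.
print(classify((0, 0, 0, 0), (1, 2, 0, 0)))
print(classify((5, 5, 0, 0), (6, 7, 0, 0)))
```

Because only coordinate differences enter, translating both events leaves the classification unchanged, which is the sense in which the slope of the line counts and not its location.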

44Because coordinates are defined relative to an origin, s² is the separation between the event with coordinates x^µ and the event at the origin.
45Such a vector is called a four-vector; see Chapter 5.
46One could either work with a Euclidean geometry that allowed pure imaginary distances, or one could work with a non-Euclidean geometry from the outset. The latter is more in keeping with the requirements of GR; we will not venture down the path of “ict,” as was done in the early days of relativity.


Figure 1.10 Timelike, lightlike, and spacelike separations of spacetime events. [Diagram: axes ct and x, with example timelike, lightlike, and spacelike separations.]

We can always find a reference frame in which spacelike-separated events are simultaneous: For ∆t = 0, it is automatically the case that (∆s)² > 0. However, for frames in which (∆s)² > 0, ∆t can be of either sign or zero. Thus, one cannot speak of a causal relation between spacelike-separated events. For A and B spacelike-separated events, one can find frames in which the events occur in either order,47 in which A precedes B or B precedes A. This fact is a major departure from pre-relativistic physics, in which the time order of events is absolute.48 Timelike-separated events, on the other hand, can never be simultaneous: No reference frame can be found for which ∆t = 0 as it would violate (∆s)² < 0. The temporal order in which timelike-separated events occur is therefore absolute because we can’t find a frame in which ∆t vanishes. Only for timelike-separated events can we speak of causality.
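A short numerical sketch makes these statements concrete. It assumes the LT of Eq. (1.4) is the standard boost along the common x-axis, t′ = γ(t − vx/c²), x′ = γ(x − vt), and works in units with c = 1; the function names `boost` and `s2` are ours. Event A is placed at the common origin, so only the coordinates of event B need to be transformed.

```python
import math


def boost(t, x, v, c=1.0):
    """Standard Lorentz boost along x with relative speed v (assumed form of Eq. (1.4))."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_prime = gamma * (t - v * x / c ** 2)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime


def s2(t, x, c=1.0):
    """Spacetime separation from the origin, s^2 = -(ct)^2 + x^2."""
    return -(c * t) ** 2 + x ** 2


# Spacelike pair: A at the origin, B at (t, x) = (1, 2), so s^2 = 3 > 0.
# Timelike pair:  A at the origin, B at (t, x) = (2, 1), so s^2 = -3 < 0.
for label, (t, x) in (("spacelike", (1.0, 2.0)), ("timelike", (2.0, 1.0))):
    for v in (0.0, 0.5, 0.8):
        t_p, x_p = boost(t, x, v)
        print(label, "v =", v, "t' =", t_p, "s^2 =", s2(t_p, x_p))
```

For the spacelike pair the sign of t′ depends on v (it vanishes at v = 0.5 and is negative at v = 0.8), whereas for the timelike pair t′ stays positive for every |v| < c; in both cases s² is unchanged by the boost.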

1.6 SEGUE TO GENERAL RELATIVITY: NONINERTIAL FRAMES
Newton’s laws work in IRFs, which as we have seen, are frames of reference in which Newton’s laws work! What saves us from a circular trap is the ability to identify physical sources of force; only in IRFs is the acceleration of objects solely due to forces—only in IRFs does the Newtonian paradigm apply (F = ma). From the point of view of fundamental physics, Newton’s second law is limited by its specialization to IRFs. GR provides equations of motion valid in arbitrary frames of reference. To what extent do noninertial reference frames ﬁnd use in Newtonian dynamics, despite nominally being excluded from the framework of pre-relativistic mechanics? Such a question might appear off topic, but given that SR is based on the equivalence of IRFs, and that GR frees itself from IRFs, it’s useful to look at pre-relativistic uses of noninertial frames.

1.6.1 Linear acceleration

Referring to Figure 1.1, r = R + r′, where now we allow all quantities to be time dependent. Differentiating twice with respect to absolute time, r̈ = A + r̈′, where A ≡ R̈ is the relative acceleration between frames.49 Let S be an IRF in which the observed acceleration of a particle of mass m is associated with a force, r̈ = F/m. We therefore have an equation similar to Newton’s second law:

\[ \ddot{\mathbf{r}}' = \frac{1}{m}\left(\mathbf{F} - m\mathbf{A}\right) . \tag{1.9} \]

The acceleration observed in the accelerated frame (r̈′) is due to forces (F) and the force-like quantity −mA, termed the fictitious force, so named because, while it has the dimension of force, it is not a force; genuine forces can be traced to physical interactions. Acceleration and force have the same values in all IRFs; they are absolute, observer-independent quantities. In noninertial frames, r̈′ is an apparent acceleration: It’s not absolute, it’s reference-frame dependent; from Eq. (1.9) r̈′ is offset from the acceleration due to forces F/m by the acceleration of the reference frame, A. The acceleration A in the fictitious force is the acceleration of the frame, not that of the particle.

47Demonstrated in Section 2.4.
48For the most part the predictions of SR smoothly go over to those of Newtonian mechanics as v/c → 0. Certain conclusions, however, have no counterpart in pre-relativistic physics, such as the acausality of spacelike-separated events.
49The transformation of acceleration under the LT, which does not assume absolute time, is covered in Chapter 3.

Figure 1.11 shows a noninertial frame N, an elevator accelerating relative to IRF I.50 In I, a free particle has inertial motion, r̈ = 0, whereas in N it has acceleration r̈′ = −A. An observer in N concludes a force produces the observed acceleration, yet there is no force, no physical agency acting on the particle, which is why −mA is called fictitious. When the elevator floor meets the particle, however, the fictitious force becomes real.51 At this point, the elevator prevents the particle from continuing (“persevering”) in its inertial motion. An inertial observer concludes there is a force on the object, F = mA, which follows from Eq. (1.9) with r̈′ = 0. The object resists changes in its inertial state and exerts a force back on the elevator, −mA ≡ F_i (“endeavors to change the state of that obstacle”).52 When observed from a noninertial frame, a particle moving by inertia appears to accelerate in the direction opposite to the acceleration of the frame; we can still apply Newton’s second law in this case by regarding the apparent acceleration as caused by a fictitious force. When, however, the particle is prevented from moving by inertia and made to move with the acceleration of the frame, the particle resists acceleration through a real force, the inertial force.

Figure 1.11 Left: In I, r̈ = 0 (free particle); in N, apparent acceleration r̈′ = −A. Right: In N, r̈′ = 0 (at rest); in I, F = mA. [Diagram: elevator N with acceleration A relative to IRF I, shown with a free particle (left) and after the elevator meets the particle (right).]

1.6.2 Rotating reference frame

Inertial forces arise in rotating reference frames. Consider a frame (x′, y′, z′) rotating at a constant rate Ω relative to an IRF (x, y, z) about the common z, z′ axis. As is well known,53 the acceleration observed in a rotating frame is related to the force F through an equation analogous to Eq. (1.9),

\[ \ddot{\mathbf{r}}' = \frac{1}{m}\left(\mathbf{F} - 2m\,\boldsymbol{\Omega}\times\dot{\mathbf{r}}' - m\,\boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r}')\right) . \tag{1.10} \]

The inertial force F_i = −2mΩ × ṙ′ − mΩ × (Ω × r′) involves the Coriolis force and the centrifugal force. These forces are quite real, as anyone who has ridden a merry-go-round can attest.
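As a rough numerical illustration of the two terms in the inertial force, the sketch below evaluates the Coriolis and centrifugal contributions for given values of Ω, r′, and ṙ′. The helper names `cross` and `inertial_force` and the sample numbers are our own choices, not taken from any library; vectors are plain 3-tuples in SI units.

```python
def cross(a, b):
    """Cross product a × b of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def inertial_force(m, omega, r_rot, v_rot):
    """Return (Coriolis, centrifugal) terms of F_i in the rotating frame.

    Coriolis:    -2 m Ω × ṙ'
    Centrifugal: -m Ω × (Ω × r')
    r_rot and v_rot are the position and velocity measured in the rotating frame.
    """
    coriolis = tuple(-2.0 * m * comp for comp in cross(omega, v_rot))
    centrifugal = tuple(-m * comp for comp in cross(omega, cross(omega, r_rot)))
    return coriolis, centrifugal


# Example (assumed numbers): m = 1 kg, Ω = 2 rad/s about z, particle at
# x' = 3 m moving radially outward at 1 m/s in the rotating frame.
coriolis, centrifugal = inertial_force(1.0, (0.0, 0.0, 2.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print("Coriolis:", coriolis)
print("Centrifugal:", centrifugal)
```

For rotation about z the centrifugal term points radially outward with magnitude mΩ²r′, while the Coriolis term is perpendicular to the velocity measured in the rotating frame.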

50Assume the elevator is sufficiently outside the gravitational field of Earth that gravity can be ignored.
51It’s sometimes said, incorrectly, that any force observed in a noninertial frame is a fictitious force. Forces in noninertial frames are quite real, as anyone who’s ridden in an automobile can attest. “Real fictitious forces” are best given another name, inertial force, because they arise from the inertia of matter.
52We’re referring to Newton’s words cited in Section 1.2.
53Any standard text on classical mechanics will have a derivation of the centrifugal and Coriolis forces.


1.6.3 D’Alembert’s principle
By d’Alembert’s principle,54 Newton’s second law is written in a seemingly trivial way F − ma = 0, equivalently F + F_i = 0, so that an object in motion can be treated as if in static equilibrium between impressed forces F and the inertial force F_i (produced by the mass in response to the changes in inertial motion brought about by F).[10, p88] That a mass in motion can be treated as if at rest underscores the relativity of motion. An object appears at rest in a frame moving with the object (r̈′ = 0), and in such a frame we have from either Eq. (1.9) or Eq. (1.10) equilibrium between impressed and inertial forces. D’Alembert’s principle could be considered a precursor to GR: it gives insight into how an equation of motion might appear in an arbitrary frame of reference.
An example from elementary mechanics illustrates these ideas. The left portion of Fig. 1.12 schematically shows a car undergoing acceleration A as seen from an IRF. Attached to the car is a ball of mass m hanging from a string. The forces “impressed” on the ball are the tension T in the string and its weight W. As shown in the middle portion of Fig. 1.12, these are the forces that cause the ball to undergo the acceleration observed in inertial frame I, T + W = mA. In the usual coordinate system involving horizontal and vertical components, where vertical is defined by the direction of gravity, Newton’s second law separates into two scalar equations T sin θ = mA and T cos θ − mg = 0, from which we find tan θ = A/g and T = m√(g² + A²). In the noninertial frame N of the car, no acceleration is observed and Eq. (1.9) gives the same equation of force balance: 0 = T + W − mA. When the car is not accelerated, the ball hangs “straight down” in the direction of gravity with the tension equal to the weight, T = mg. With the car accelerated, we can view the tension as balancing the resultant of W and the inertial force F_i = −mA, T = −(W + F_i) (right portion of Fig. 1.12). Alternatively, the inertial force is the opposite of (reaction to) the resultant of the physical forces, T and W, F_i = −(T + W).

Figure 1.12 Ball hanging from the ceiling of an accelerating car. Forces as seen from an inertial frame, I, and a noninertial frame, N. [Diagram: left, car with acceleration A and ball of mass m hanging at angle θ; middle, forces T and W on m in frame I; right, forces T, W, and F_i on m in frame N.]
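The two scalar equations can be checked numerically. The sketch below computes θ and T from tan θ = A/g and T = m√(g² + A²) and then verifies both components of the force balance; the function name `hanging_ball`, the default g = 9.81 m/s², and the sample values are assumptions made for illustration.

```python
import math


def hanging_ball(m, A, g=9.81):
    """Angle from vertical (radians) and string tension (newtons) for a ball
    of mass m hanging in a car with horizontal acceleration A.

    Uses tan θ = A/g and T = m sqrt(g^2 + A^2); g defaults to an assumed 9.81 m/s^2.
    """
    theta = math.atan2(A, g)
    tension = m * math.hypot(g, A)
    return theta, tension


m, A = 2.0, 3.0  # assumed sample values: kg and m/s^2
theta, T = hanging_ball(m, A)

# Horizontal component supplies the acceleration; vertical component balances the weight.
print("theta (degrees):", math.degrees(theta))
print("T sin(theta) - m A:", T * math.sin(theta) - m * A)
print("T cos(theta) - m g:", T * math.cos(theta) - m * 9.81)
```

Setting A = 0 recovers θ = 0 and T = mg, the unaccelerated case described above.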

1.7 GENERAL RELATIVITY: A THEORY OF GRAVITATION

1.7.1 Newtonian gravitation—consistent with the theory of relativity?

In elementary physics one first learns about Newton’s law of motion F = ma, which applies for any force F, and, second, Newton’s law of gravitation (an expression for a force law), that masses m1 and m2 at locations r1 and r2 experience an attractive force of magnitude

\[ F = G \frac{m_1 m_2}{|\mathbf{r}_1 - \mathbf{r}_2|^2} , \]

where G is the gravitational constant. Newton’s law of gravity works well in explaining many phenomena, from predicting solar eclipses to sending satellites to distant planets. Despite its successes, however, Newtonian gravitation is not consistent with relativity, for two main reasons.

54D’Alembert’s principle is a formulation of classical mechanics equivalent to Hamilton’s principle (a better-known formulation of classical mechanics). See Appendix D.


1. The locations speciﬁed by r1 and r2 in Newton’s formula are implicitly assumed to occur at the same time. Relativity shows there is no absolute meaning to the “same time.”
2. What mediates gravity? If m1 were to move suddenly, Newton’s formula would have that the force on m2 would change instantly, yet instantaneous interactions are not physical, the bugaboo of action at a distance.55
Newton wrote in 1693:[11, p217] “. . . so that one body may act upon another at a distance through a vacuum without the mediation of anything else, by and through which their action and force may be conveyed from one to another, is to me so great an absurdity, that I believe no man who has in philosophic matters a competent faculty of thinking could ever fall into it.” In 1713 he wrote,[4, p943] “I have not as yet been able to deduce from phenomena the reason for these properties of gravity, and I do not feign hypotheses . . . it is enough that gravity really exists and acts according to the laws that we have set forth, and is sufﬁcient to explain all the motions of the heavenly bodies . . . ”. Newton appeals to pragmatism: Even though he can’t explain the workings of gravity, his law of gravity works and works well and, as he tells it, explains “all” the motions of celestial bodies. Or does it?

1.7.2 Do we need a relativistic theory of gravitation?

Under what conditions do relativistic effects become important in gravitational physics? We know that modifications to Newtonian dynamics manifest as speeds become comparable with the speed of light, v ∼ c. What relativistic effects are specifically associated with gravity? Consider the energy of the gravitational field. In Newtonian theory, the energy stored in the gravitational field of a mass M of radius R with uniform mass density is given by the expression56

\[ E_{\text{grav}} = \frac{3}{5}\frac{GM^2}{R} . \]

Let’s ignore the numerical factor and take as a measure of gravitational energy the term GM²/R. The rest energy is another kind of energy, E_rest = Mc². By forming the ratio E_grav/E_rest we obtain a characteristic dimensionless number specifying the gravitational energy relative to the rest energy,

\[ \frac{E_{\text{grav}}}{E_{\text{rest}}} = \frac{GM}{Rc^2} \equiv \frac{\Phi}{c^2} , \tag{1.11} \]

where Φ is the gravitational potential (the gravitational potential energy per mass), which has the dimension of speed squared.57 Newton’s law of gravity, like Coulomb’s law, is a 1/r² law. Any result obtained in electrostatics has an analog in Newtonian gravity. For future reference, Table 1.1 compares the properties of the Newtonian gravitational field with those of the electrostatic field.

55Newton’s law of gravitation was controversial when it was introduced. Aristotle taught that heavenly objects (stars and planets) by their nature move in circles at constant speed, while on Earth heavy objects move toward the center of Earth. Stones fall, but planets don’t. Descartes, in an attempt to explain planetary orbits, proposed that the sun sets up a whirlpool motion to keep planets moving in circular motion. Kepler (at roughly the same time) discovered that planets move in elliptical orbits, not circular. It’s against this backdrop that Newton’s law of gravity is startling. Newton offered no explanation of how the sun could exert an inﬂuence on Earth over vast distances—action at a distance. He “merely” offered a formula that predicts the motion of objects subject to gravity. With his inverse-square law, Newton could account for Kepler’s three laws of planetary motion; he also showed that Descartes’s whirlpool hypothesis contradicts Kepler’s third law. To illustrate the difﬁculty inherent with action at a distance, what would you think of a theory purporting that radiant energy disappears from the sun and eight minutes later appears on Earth without accounting for how it happens? Newton’s law of gravity is an effective, phenomenological description that provides no explanation for the mechanism of gravity. As we’ll see, GR holds that spacetime itself is the underlying substrate that mass couples to.
56A similar expression holds for the energy stored in the electric ﬁeld associated with a uniform ball of charge, a calculation you’ve probably already done.
57The electrostatic potential is the energy per charge, which is given a special unit: a volt is a joule per coulomb. Gravitational energy per mass (the gravitational potential) has the dimension of speed squared; just think of kinetic energy ∝ mv².


Table 1.1 Comparison of Newtonian gravitation theory with electrostatics.

Property | Newtonian gravitation | Electrostatics
--- | --- | ---
Force between point objects | F_grav = −G (Mm/r²) r̂ | F_elec = Qq/(4πε₀r²) r̂
Field vector of a point source | g = −(GM/r²) r̂ | E = Q/(4πε₀r²) r̂
Gauss’s law | ∇ · g = −4πGρ | ∇ · E = ρ_elec/ε₀
Irrotational field (of point source) | ∇ × g = 0 | ∇ × E = 0
Potential energy of point objects | U(r) = −GMm/r | U_elec(r) = Qq/(4πε₀r)
Potential of a point source | Φ(r) = −GM/r | Φ_elec(r) = Q/(4πε₀r)
Poisson equation | ∇²Φ = 4πGρ | ∇²Φ_elec = −ρ_elec/ε₀
Potential of an extended source | Φ(r) = −G ∫ ρ(r′) d³r′ / |r − r′| | Φ_elec(r) = (1/(4πε₀)) ∫ ρ_elec(r′) d³r′ / |r − r′|
Local energy density | −|g(r)|²/(8πG) | ε₀|E(r)|²/2

A large value of the ratio in Eq. (1.11) (of order unity) would indicate an object for which the gravitational energy is comparable to Mc². The dimensionless quantity in Eq. (1.11) occurs in GR as a measure of the significance of relativistic effects in gravity; be on the lookout for it. While v ≪ c is an indicator that Newtonian dynamics provides an accurate description, Φ ≪ c² is an indicator that Newtonian gravitation should suffice. Numerical values of this ratio are listed in Table 1.2 for various systems.

Table 1.2 Ratio of gravitational to rest-mass energy.

System | GM/(Rc²) ≡ Φ/c² | Comment
--- | --- | ---
Earth | 10⁻⁹ | GPS system inoperable without relativistic corrections
Sun | 10⁻⁶ | Precession of planetary orbits unaccounted for by Newtonian mechanics
Black hole | 0.5 | As relativistic as it gets
Universe | 0.5 | Ditto for the universe!

The gravitational energy of the Earth is seemingly a negligible fraction (10⁻⁹) of its rest energy and thus we would conclude that the Newtonian theory of gravity should suffice. While largely true, there are nonetheless small effects due to time dilation in a gravitational field that must be taken into account if the global positioning system (GPS) is to operate properly. Gravitational time dilation is not the special relativistic time dilation (“moving clocks run slow”), but rather is an effect associated with gravity, that clocks run slower the deeper they are in a gravitational potential well. The GPS system would go wrong in a matter of minutes if relativistic effects were not taken into account. Even for weak gravity there are important effects that Newtonian theory cannot describe.
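The orders of magnitude quoted for Earth and the sun are easy to reproduce. The sketch below evaluates GM/(Rc²) from Eq. (1.11); the function name `gravity_ratio` is ours, and the masses, radii, and constants are rounded reference values that we supply as assumptions (they are not quoted in the text).

```python
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8    # speed of light, m/s (rounded)

# Assumed rounded values: (mass in kg, mean radius in m).
BODIES = {
    "Earth": (5.972e24, 6.371e6),
    "Sun": (1.989e30, 6.957e8),
}


def gravity_ratio(mass_kg, radius_m):
    """Dimensionless ratio GM/(R c^2) of Eq. (1.11)."""
    return G * mass_kg / (radius_m * C ** 2)


for name, (mass, radius) in BODIES.items():
    print(name, gravity_ratio(mass, radius))
```

With these inputs the ratio comes out somewhat below 10⁻⁹ for Earth and a few times 10⁻⁶ for the sun, consistent with the order-of-magnitude entries in the table of ratios.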


For the sun, with E_grav ≈ 10⁻⁶ Mc², the orbit of Mercury precesses at a small but measurable rate that cannot be accounted for in Newtonian mechanics, yet which is explained precisely by GR. The precession of orbits is one of the classic tests of GR.58

Newton’s law of gravity contains no characteristic length scale over which it applies: It’s intended to apply for any distance. GR, however, features an intrinsic length associated with a spherically symmetric mass M, the Schwarzschild radius r_S ≡ 2GM/c². (Remarkably, the Schwarzschild radius can be obtained from Newtonian mechanics as the radius of an object for which the escape velocity v_esc = √(2GM/R) = c.) If M lies within the Schwarzschild radius, then r = r_S defines an event horizon for external observers: Signals emitted from inside it cannot reach outside observers and we have a black hole. Black holes are regions of spacetime from which nothing, not even light, can escape. For black holes, E_grav/E_rest = 1/2. Clearly implicit in the description of a black hole is the prediction that gravity affects the propagation of light. Gravitational lensing, the deflection of light by gravity, is an experimental tool for investigating dark matter, a hypothesized form of matter that, while not luminous, can nevertheless be inferred from its gravitational influences.
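The Newtonian route to the Schwarzschild radius can be checked numerically: setting the escape velocity √(2GM/R) equal to c and solving for R gives r_S = 2GM/c², at which GM/(Rc²) = 1/2. In the sketch below the function names are ours and the solar and terrestrial values are rounded reference numbers supplied as assumptions.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8    # speed of light, m/s (rounded)


def schwarzschild_radius(mass_kg):
    """r_S = 2GM/c^2, in meters."""
    return 2.0 * G * mass_kg / C ** 2


def escape_velocity(mass_kg, radius_m):
    """Newtonian escape velocity sqrt(2GM/R), in m/s."""
    return math.sqrt(2.0 * G * mass_kg / radius_m)


M_SUN = 1.989e30  # kg, assumed rounded value
r_s = schwarzschild_radius(M_SUN)

print("r_S for one solar mass (m):", r_s)
print("v_esc / c at r = r_S:", escape_velocity(M_SUN, r_s) / C)
print("GM/(R c^2) at r = r_S:", G * M_SUN / (r_s * C ** 2))
print("Earth surface escape velocity (m/s):", escape_velocity(5.972e24, 6.371e6))
```

For one solar mass r_S is about 3 km, which gives a sense of how far ordinary stars are from the black-hole regime.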

For the universe, GM/Rc² can be estimated from its mean mass density ρ and size R: M = (4/3)πR³ρ. Let ρ be the critical density obtained from cosmological theory,59 ρ_c ≡ 3H₀²/(8πG) ≈ 10⁻²⁹ g cm⁻³, where H₀ is the Hubble constant. Thus, GM/Rc² = (1/2)(RH₀/c)². Take the size of the universe to be R = ct_H where t_H ≡ H₀⁻¹ is the Hubble time, the approximate age of the universe. With these substitutions, GM/Rc² = 1/2. While one can question any of these assumptions, the larger point is that the universe is “just” as relativistic as a black hole!
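The estimate can be reproduced numerically. The value of the Hubble constant is not given in the text, so the sketch below assumes H₀ = 70 km s⁻¹ Mpc⁻¹ purely for illustration; the function names and the rounded constants are likewise our own.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8            # speed of light, m/s (rounded)
MPC_IN_M = 3.0857e22   # one megaparsec in meters (rounded)
SECONDS_PER_YEAR = 3.156e7

H0 = 70.0e3 / MPC_IN_M  # assumed Hubble constant, 70 km/s/Mpc, converted to s^-1


def critical_density(h0):
    """ρ_c = 3 H0^2 / (8 π G), in kg/m^3."""
    return 3.0 * h0 ** 2 / (8.0 * math.pi * G)


def universe_ratio(h0):
    """GM/(R c^2) with M = (4/3) π R^3 ρ_c and R = c/H0."""
    rho_c = critical_density(h0)
    radius = C / h0
    mass = (4.0 / 3.0) * math.pi * radius ** 3 * rho_c
    return G * mass / (radius * C ** 2)


rho_c_cgs = critical_density(H0) * 1.0e-3  # 1 kg/m^3 = 10^-3 g/cm^3
hubble_time_gyr = 1.0 / H0 / SECONDS_PER_YEAR / 1.0e9

print("critical density (g/cm^3):", rho_c_cgs)
print("Hubble time (Gyr):", hubble_time_gyr)
print("GM/(R c^2):", universe_ratio(H0))
```

The ratio equals 1/2 for any choice of H₀, since H₀ cancels once R = c/H₀ is substituted; only the density and the Hubble time depend on the assumed value.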

Because gravity is always attractive, why doesn’t the universe collapse? Newton concluded that the universe must be infinite in extent to avoid such a collapse. GR, however, predicts an expanding universe! To preclude this possibility,60 Einstein introduced an adjustable constant, the cosmological constant Λ, with the purpose of producing a static, finite-sized universe. It was later shown (in 1922, by Alexander Friedmann) that GR predicts an expanding universe no matter what the value61 of Λ. The “standard model” of cosmology, the Friedmann-Robertson-Walker model, is derived from GR, including Λ, a term now thought to be associated with dark energy, a proposed form of energy that leads to a universe that’s not only expanding, but is accelerating in its expansion. In 1998 an acceleration to the expansion of the universe was discovered, and Λ was invoked as an explanation.62

Thus, astrophysical and cosmological phenomena63 require for their explanation a relativistic theory of gravitation. We need a theoretical framework that can handle arbitrary gravitational fields, from the environment near planets and stars, to that of black holes, and ultimately the universe. GR is a theoretical tool for describing spacetime that incorporates the effects of gravity.

1.7.3 Thinking about relativistic gravity
Can Newtonian gravity be “fixed up” so as to be relativistically correct? The short answer is no. No “tweak” of Newton’s formula, perhaps with factors of γ here and there, has ever been found to work; it takes a major revamping of our concepts of space and time.

58The three classic tests of GR are the precession of orbits, the bending of light by gravity, and the gravitational redshift.
59The mean density of the universe ρ is thought to be quite close to the critical value, ρ_c. Knowledge of ρ is of crucial importance to cosmology, as it determines whether the universe is open or closed. It’s found that ρ/ρ_c = 1.0023 ± 0.005. When contemplating a number like 10⁻²⁹ g cm⁻³, it’s helpful to keep in mind the density of Earth (∼ 5.5 g cm⁻³) or the density of the sun (∼ 1.4 g cm⁻³). The universe as a gravitating system can perhaps be considered another state of matter, that which is governed by an incomprehensibly small density.
60In 1929, it was deduced that the universe is expanding from the redshift in spectral lines observed from distant galaxies. To Einstein in 1917, it was obvious that the universe must be static.
61As shown by Friedmann, Einstein’s static solution of the equations of GR is not stable against small perturbations.
62The 2011 Nobel Prize in Physics was awarded for the discovery of the accelerating expansion of the universe.
63And even the terrestrial GPS system.

It’s instructive to ask, given that action at a distance is a flaw of Newtonian gravitation, how is that problem sidestepped with Coulomb’s law, which has the same structure as Newton’s law of gravity? The force between charges q1 and q2 has magnitude

\[ F = k \frac{q_1 q_2}{|\mathbf{r}_1 - \mathbf{r}_2|^2} , \]

where k is a proportionality factor that depends on the unit of charge adopted. Coulomb’s law suffers from the same disease as Newton’s law: action at a distance and instantaneous interactions. What saves the day is the field concept. Charge q1 sets up a condition in space (the electric field) that q2 interacts with at its location, which can be symbolized: Charge1 ←→ Field ←→ Charge2. We obtain an expression for the static electric field simply by rewriting Coulomb’s law,

\[ F = q_2 \left( k \frac{q_1}{|\mathbf{r}_1 - \mathbf{r}_2|^2} \right) \equiv q_2 E . \]

Now, merely writing F = qE would be a change of variables if we didn’t ascribe physical reality to the ﬁeld. And we do ascribe reality to the ﬁeld because we discover—using Maxwell’s equations—that the electromagnetic ﬁeld is a dynamical quantity that propagates at the speed of light and transports energy and momentum. Through Maxwell’s equations, we discover that the electromagnetic ﬁeld satisﬁes a wave equation. Thus, electromagnetism is not transmitted instantaneously as Coulomb’s law would lead us to suspect, but is instead a propagating ﬁeld at ﬁnite speed. Is the same true of gravity? The concept of a ﬁeld, one that has dynamical properties, answers the problem posed by action at a distance: It’s the ﬁeld that mediates the interaction between particles, and the ﬁeld propagates at ﬁnite speed.
Physics thrives on analogies. The paradigm of propagating ﬁelds leads us to ask: Can we formulate a ﬁeld theory of gravity? Start by rewriting the force law:

\[ F = m_2 \left( G \frac{m_1}{|\mathbf{r}_1 - \mathbf{r}_2|^2} \right) \equiv m_2 g , \]

where g signiﬁes the gravitational ﬁeld. The Newtonian gravitational ﬁeld satisﬁes Gauss’s law ∇·g = −4πGρ, where ρ is the local mass density. Note that the divergence of g is negative—there’s a negative ﬂux of ﬁeld lines through any closed surface; gravity is always attractive. This seems like a promising start, but what are the other “Maxwell equations” for gravity? Recall the crucial discoveries in electromagnetism: Charges in motion (currents) produce magnetic ﬁelds, time-varying electric ﬁelds induce magnetic ﬁelds, and time-varying magnetic ﬁelds induce electric ﬁelds. Are there analogous phenomena in gravity? Does matter in motion lead to new phenomena, akin to a magnetic ﬁeld, that affect the motion of nearby masses?
There are no “Maxwell equations” for the gravitational field that have been discovered through experiments, akin to Faraday induction. Thus there is no way, based on analogies with the electromagnetic field, to develop a field theory of gravity. Yet that’s what GR accomplishes: a relativistic field theory of gravity distinct from the theory of the electromagnetic field. Once the machinery of GR has been developed, we’ll discover analogs between gravity and electromagnetism in limiting cases, that the gravitational field satisfies a set of equations analogous to the equations of electrostatics and magnetostatics. GR predicts frame dragging, a gravitational analog of the Lorentz force in electromagnetism: spacetime is altered by objects in motion, “dragging” nearby objects out of position compared to the predictions of Newtonian physics. While the frame-dragging effect is small, experimental confirmation was reported in 2011. GR also predicts a propagating disturbance in spacetime, gravitational waves, which were first detected in 2015, with the result announced in 2016.64

64The 2017 Nobel Prize in Physics was awarded for the observation of gravitational waves.


1.7.4 How does GR work?

The central content of GR is the Einstein field equation, which schematically has the form

\[ \text{Local curvature of spacetime} = \frac{8\pi G}{c^4}\,(\text{Local energy-momentum density}) . \]

The curvature of spacetime, or equivalently, the geometry of spacetime, is determined by the energy-momentum contained in that spacetime. Spacetime curvature in turn completely determines the trajectories of particles. Mathematically, the Einstein field equation is a relation between second-rank tensor fields65

\[ G_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu} . \tag{1.12} \]

(You’ll know what this all means soon enough: G_µν is the curvature tensor and T_µν is the energy-momentum tensor that describes the density and flux of energy-momentum in spacetime.) Just as Maxwell’s equations relate the electromagnetic field to its sources (charge and current densities), the Einstein equation relates spacetime curvature to its source: energy-momentum density. In Maxwell’s equations, the electromagnetic field is on spacetime; in Einstein’s equation, spacetime itself is the field! Gravity is not a force in the usual sense; gravity is spacetime!

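The spacetime separation, Eq. (1.8), can be written

\[ (\Delta s)^2 = \sum_{\mu=0}^{3} \sum_{\nu=0}^{3} \eta_{\mu\nu}\, \Delta x^\mu \Delta x^\nu , \tag{1.13} \]

where the quantity η_µν is the Lorentz metric tensor

\[ \eta_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{1.14} \]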

The metric tensor66 contains the information required to calculate the separation between spacetime points with coordinate differences ∆x^µ. Note the metric “signature” in Eq. (1.14), the terms on the diagonal, (− + ++). This pattern holds for all inertial observers; the metric tensor in SR is fixed. Because of the minus sign for the time coordinate, the geometry is not Euclidean.67
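The double sum over the metric tensor is straightforward to write out explicitly. The sketch below is a minimal illustration with names of our own choosing (`ETA`, `EUCLID`, `interval`); the coordinate differences are given as (∆x⁰, ∆x¹, ∆x², ∆x³) with ∆x⁰ = c∆t, and a four-dimensional Euclidean metric is included only for contrast.

```python
# Lorentz metric tensor with signature (- + + +).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Four-dimensional Euclidean metric, signature (+ + + +), for comparison only.
EUCLID = [[1 if mu == nu else 0 for nu in range(4)] for mu in range(4)]


def interval(dx, metric=ETA):
    """(Δs)^2 = sum over μ, ν of metric[μ][ν] Δx^μ Δx^ν."""
    return sum(
        metric[mu][nu] * dx[mu] * dx[nu]
        for mu in range(4)
        for nu in range(4)
    )


dx = (1, 2, 0, 0)  # Δx^0 = 1, Δx^1 = 2
print("Lorentz metric:", interval(dx))
print("Euclidean metric:", interval(dx, EUCLID))
```

With the Euclidean metric the result can never be negative; the single minus sign in the Lorentz metric is what permits timelike and lightlike separations.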

The worldlines of truly free particles would be straight throughout all of spacetime. When one contemplates gravitation, one realizes that global inertial frames (holding for all of spacetime) are an idealization: We can’t avoid the rest of the matter of the universe! Force-free motion can therefore have only approximate validity. In GR the separation between spacetime points, which in Eq. (1.13) applies for finite coordinate differences ∆x^µ, is replaced by infinitesimally separated spacetime coordinates dx^µ:

\[ (ds)^2 = \sum_{\mu=0}^{3} \sum_{\nu=0}^{3} g_{\mu\nu}(x)\, dx^\mu dx^\nu , \tag{1.15} \]

where the quantities g_µν(x) are not constant, but vary throughout spacetime: a metric tensor field.68 The curvature tensor G_µν in Eq. (1.12) is, as we’ll see, a complicated expression involving derivatives of the metric tensor field g_µν(x). The Einstein field equation implies a set of ten nonlinear partial differential equations for g_µν(x). Once the tensor components g_µν(x) are known, the equations of motion for particles and photons are known. Particles and photons in free fall (subject to no forces other than gravity) follow geodesic curves, the shortest possible paths in spacetime, determined through a variational principle δ∫ds = 0. Motion, however, determines the energy-momentum tensor T_µν. There is thus a feedback mechanism; see Fig. 1.13. Motion determines the energy-momentum tensor T_µν, which determines the curvature tensor through the Einstein equation, the solution of which determines the metric tensor g_µν, which controls motion. GR explains gravity in terms of a varying metrical relation between neighboring spacetime points, g_µν(x), wherein particles get closer together in the future than they are now. Gravity is a manifestation of the curvature of spacetime, that determined by the distribution of energy-momentum.

Figure 1.13 Motion determines spacetime curvature, which determines motion. [Diagram: cycle linking T_µν, G_µν, and g_µν.]

65The Einstein field equations are a set of 10 equations between the elements of second-rank symmetric tensors in the four-dimensional geometry of spacetime. These equations are variously referred to in both the singular (Einstein’s field equation), because it’s one equation between two tensors, and in the plural (field equations), because there are 10 independent equations between the components of the curvature tensor and the energy-momentum tensor. When you refer to “Einstein’s equation,” people assume you’re referring to E = mc². The Einstein field equations are vastly richer in content than E = mc².
66All things tensor will be explained in upcoming chapters.
67The spacetime geometry of SR is called semi-Euclidean because while not strictly a Euclidean geometry, which would have metric signature (+ + ++), is nevertheless a flat geometry.

1.7.5 Gravity is spacetime
That last statement requires elaboration, which we’ll do in a roundabout way. What do we need to understand GR? For one, we have to enlarge our mathematical toolbox. The mathematics of curvature is the province of differential geometry, the theory of tensor ﬁelds on curved manifolds. The requirement imposed by the principle of relativity, that laws of physics be independent of the reference frame used to represent them, leads to a program, the principle of covariance, of expressing equations of physics as relations among tensors because, if a tensor equation is true in one reference frame, it’s true in all reference frames. Once the mathematical foundation of tensor ﬁelds has been laid, the Einstein ﬁeld equation can be introduced forthwith. It’s important to recognize that a truly new equation of physics cannot be derived from something more fundamental. Once the ﬁeld equation has been written down (however it was conceived), there isn’t a lot of wiggle room: Either its predictions agree with experimental measurements or they don’t, and so far GR has passed every test put to it. Is that all we need, more math, in particular the mathematics of tensors? (That and the physical insight of Albert Einstein.) What about SR? In a sense GR doesn’t require SR, implausible as that might sound. The thesis of GR is that energy-momentum causes spacetime curvature, where spacetime is modeled as a four-dimensional manifold. The surface of Earth is curved yet we know locally it can be approximated as ﬂat. So too with spacetime: Curved spacetime is locally ﬂat. Manifolds are locally ﬂat at any point, where the condition for ﬂatness is that the derivatives of the metric tensor vanish in a neighborhood of that point. That leaves open the question of what the metric tensor should be for small regions of spacetime. Einstein’s answer is that it should correspond to the metric of SR. 
In 1911, in the time between the development of SR and that of GR, Einstein proposed the equivalence principle that the effects of gravitation are eliminated in a reference frame of limited spatial extent that’s freely falling in gravity. In a freely falling frame, objects not subject to forces (other than gravity) remain either at rest or in a state of uniform motion;69 a freely falling
68We study tensor ﬁelds in Chapter 13. 69Einstein called this the happiest thought of his life.


frame is therefore an IRF, where SR holds sway. SR therefore becomes a theoretical “boundary condition” on GR: the theory of spacetime that results for vanishing gravity. GR must give rise to two incompatible limits as shown in Fig. 1.14: SR as G → 0 and Newtonian gravity for v ≪ c. GR thus

Figure 1.14 General relativity supersedes special relativity (the limit G → 0) and Newtonian gravity (the limit v ≪ c).
supersedes both theories.70 It’s only in this sense that GR needs SR; GR is the more comprehensive theory. Even though it serves as but a limiting case of GR, it’s important to develop SR to understand what is discarded from the Newtonian framework. We do this ﬁrst without tensors, and then, once tensors have been introduced, we develop special-relativistic physics in tensor form. Getting back to gravity, freely falling particles move along geodesic curves in four-dimensional spacetime at a constant rate—no acceleration, no force required.71 In three-dimensional, space-only geometry, such particles appear to accelerate, which the Newtonian paradigm associates with a force. It’s from this perspective we say that gravity is not a force in the usual sense but rather is a manifestation of the properties of spacetime. This point of view is fully developed in the book.

1.8 HASTA LA VISTA, GRAVITY
In the next chapter we begin a systematic exposition of SR, ﬁrst without tensors, and then once tensors have been introduced (Chapter 5), we cover special-relativistic physics using tensors, following through with Einstein’s program of the principle of covariance. Only in Chapter 15 is gravity taken up as a manifestation of spacetime curvature. At this point we say hasta la vista gravity, knowing that we’ll catch up with you further on down the trail.

SUMMARY
We have presented an overview of the special and general theories of relativity without delving into speciﬁcs. Many deﬁnitions have been introduced which form the basic vocabulary of the subject.
• The theory of relativity is an outgrowth of a single idea, the principle of relativity, that the laws of physics be expressed in a way that’s independent of the reference frame used to represent them. The principle of covariance is the requirement that equations of physics adhere to the principle of relativity by having the same mathematical form in all reference frames, a goal achieved by expressing equations as relations between tensors deﬁned on a four-dimensional geometry where time is a dimension.
• A primitive concept in relativity is an event—a point in space at a point in time. The totality of all events is a four-dimensional continuum: spacetime. The theory of relativity is the study of the geometry of spacetime, the relation between points of space and points of time—which in broad terms is what physics is about. In SR spacetime is absolute—physically existing, but not inﬂuenced by physical conditions. In GR, spacetime geometry is determined by the distribution of energy and momentum. GR achieves a symmetry between spacetime acting on matter (particles follow geodesic curves established by the curvature of spacetime) and
70Newtonian mechanics is correct for phenomena with v ≪ c and for which Planck’s constant can be ignored. To the diagram in Fig. 1.14 we should add another axis with ħ ≠ 0, a theory that has yet to be formulated.
71See Section 14.3.5.

matter acting on spacetime (curvature determined by energy-momentum through the Einstein field equation). Such a symmetry, which might be expected generally (compare Newton’s third law), implies that spacetime is physical.
• SR is based on the equivalence of IRFs established by free particles; SR is the law of inertia expressed in spacetime. All inertial observers see the worldlines of free particles as straight, and all inertial observers can claim themselves at rest. The geometry of spacetime in SR is ﬂat. The conditions for ﬂatness were not speciﬁed in this chapter, but a hallmark of a ﬂat geometry is that the metric tensor consists of constant elements, as in the Lorentz metric, Eq. (1.14). SR shows that the laws of mechanics and electromagnetism are equivalent for all inertial observers, but not gravitational phenomena, which requires the machinery of GR—the transition from global to local IRFs.

EXERCISES
1.1 Are the events A and B in Fig. 1.6 timelike, lightlike, or spacelike separated? That they are simultaneous in one frame but not in another suggests what type of spacetime separation? What if, in the right portion of Fig. 1.6, the train was traveling from right to left: would event A still occur before B?
Note: The remainder of the exercises for Chapter 1 require no relativity. Work them using nonrelativistic physics.
1.2 Objects O1, O2, separated by a distance L, move along the x-axis with speed v < c. O1 emits a photon toward O2, reﬂecting at event E1, which O1 absorbs at event E2. See Fig. 1.15. Calculate the times t1 and t2 in terms of L, v, and c. It may be helpful to draw the “space only” version of the diagram.

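Figure 1.15 Figure for Exercise 1.2. (Spacetime diagram with axes t and x: events E1 and E2 occur at times t1 and t2; the objects are separated by the distance L.)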

1.3 A river of width L flows with speed vr with respect to its banks. Two swimmers can swim relative to still water at speed c, where c > vr. The swimmers decide to have a contest. One will swim across the river and back. Call the time to accomplish this task T⊥. The other will swim up the river the same distance L and back. Call the time to accomplish this task T∥.

a. Show that

$$T_\parallel = \frac{2L/c}{1-\beta^2} \equiv \frac{2L}{c}\gamma^2, \qquad T_\perp = \frac{2L/c}{\sqrt{1-\beta^2}} \equiv \frac{2L}{c}\gamma, \tag{P1.1}$$

where β ≡ vr/c. Thus, T∥ > T⊥. For the across-the-river swim to arrive at the point on the other bank directly across from the starting point, the swim must be “aimed” upstream at an angle tan θ = vrT⊥/(2L) relative to the line joining the points directly across the river from each other.

b. Show that for small β:

$$T_\parallel - T_\perp = \frac{L}{c}\beta^2 + O(\beta^4). \tag{P1.2}$$
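The two results of Exercise 1.3 can be checked numerically before attempting the algebra. The following is a minimal sketch using only Galilean kinematics; the function name `swim_times` and the numerical values of L, c, and vr are illustrative assumptions, not taken from the text.

```python
import math

def swim_times(L, c, v_r):
    """Round-trip times for the two swimmers of Exercise 1.3 (Galilean kinematics)."""
    # Along the river: ground speed is c - v_r upstream and c + v_r downstream.
    t_parallel = L / (c - v_r) + L / (c + v_r)
    # Across the river: the swimmer aims upstream, so the cross-stream
    # component of the velocity is sqrt(c^2 - v_r^2) on both legs.
    t_perp = 2 * L / math.sqrt(c**2 - v_r**2)
    return t_parallel, t_perp

# Hypothetical values chosen so that beta = v_r / c = 0.1.
L, c, v_r = 100.0, 2.0, 0.2
beta = v_r / c
gamma = 1 / math.sqrt(1 - beta**2)

t_par, t_perp = swim_times(L, c, v_r)
print(t_par, 2 * L / c * gamma**2)       # both sides of Eq. (P1.1), parallel swim
print(t_perp, 2 * L / c * gamma)         # both sides of Eq. (P1.1), perpendicular swim
print(t_par - t_perp, L * beta**2 / c)   # Eq. (P1.2): agree up to O(beta^4)
```

The last line shows only approximate agreement, as expected from the O(β⁴) remainder in Eq. (P1.2).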

1.4 Consider Fig. 1.16. In the same river, a swimmer swims out to a distance L1 and back at a constant angle φ relative to the bank. Call T (φ) the time to accomplish this task.

Figure 1.16 A swimmer swims to a distance L1 in the river, and back, at an angle φ relative to the bank. A second swimmer swims to a distance L2 and back, at an angle φ + π/2. (The river flows with velocity vr along the bank.)

a. Derive a formula for T(φ). It’s instructive to use the Galilean velocity transformation. Let $\mathbf{c}$ denote the velocity of the swimmer in the frame of the water. Relative to still water, the swimmers swim at speed c. The velocity of the swimmer as observed from the bank is $\mathbf{v} = \mathbf{v}_r + \mathbf{c}$, or $\mathbf{c} = \mathbf{v} - \mathbf{v}_r$. “Dot” this vector into itself to find

$$c^2 = v^2 + v_r^2 - 2 v v_r \cos(\mathbf{v}, \mathbf{v}_r),$$

where $(\mathbf{v}, \mathbf{v}_r)$ denotes the angle between the vectors $\mathbf{v}$ and $\mathbf{v}_r$. Show that

$$T(\phi) = \frac{2L_1/c}{1-\beta^2}\sqrt{1 - \beta^2\sin^2\phi}, \tag{P1.3}$$

where β = vr/c.

b. Show that Eq. (P1.3) reduces to Eq. (P1.1) in the appropriate cases.

c. A second swimmer swims out to a distance L2 and back at a constant angle φ + π/2 relative to the bank. Let T (φ + π/2) be the time to accomplish this task. Write down a formula for T (φ + π/2). Take Eq. (P1.3) and let L1 → L2 and φ → φ + π/2.

d. Calculate the difference in time for the swimmers to accomplish their tasks, ∆T(φ) ≡ T(φ + π/2) − T(φ). Show that

$$\Delta T(\phi) = \frac{2L_2/c}{1-\beta^2}\sqrt{1 - \beta^2\cos^2\phi} - \frac{2L_1/c}{1-\beta^2}\sqrt{1 - \beta^2\sin^2\phi}. \tag{P1.4}$$

e. Using Eq. (P1.4), show that for small β

$$\Delta T(\pi/2) - \Delta T(0) = \frac{\beta^2}{c}(L_1 + L_2) + O(\beta^4). \tag{P1.5}$$

f. Now imagine that we continuously change the angle φ in Fig. 1.16. Using Eq. (P1.4), show that, to leading order in small β, the time difference between the two legs changes with the angle according to

$$\frac{d}{d\phi}\Delta T(\phi) = \frac{\beta^2}{c}(L_1 + L_2)\sin 2\phi + O(\beta^4). \tag{P1.6}$$
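Equation (P1.3) can be compared against a direct calculation. The sketch below solves the quadratic relation $c^2 = v^2 + v_r^2 - 2vv_r\cos(\mathbf{v},\mathbf{v}_r)$ for the ground speed on each leg and adds the two travel times. The function names and the sample values are assumptions made for illustration.

```python
import math

def ground_speed(c, v_r, alpha):
    """Positive root of c^2 = v^2 + v_r^2 - 2 v v_r cos(alpha) for the ground speed v."""
    return v_r * math.cos(alpha) + math.sqrt(c**2 - (v_r * math.sin(alpha))**2)

def round_trip_time(L, c, v_r, phi):
    """Out-and-back time along a line at angle phi to the bank."""
    v_out = ground_speed(c, v_r, phi)              # angle phi between v and v_r
    v_back = ground_speed(c, v_r, math.pi - phi)   # direction reversed on the return leg
    return L / v_out + L / v_back

def formula_p1_3(L, c, v_r, phi):
    """Closed form of Eq. (P1.3)."""
    beta = v_r / c
    return (2 * L / c) * math.sqrt(1 - (beta * math.sin(phi))**2) / (1 - beta**2)

# Hypothetical values: L1 = 50, c = 2, v_r = 0.5 (beta = 0.25).
for phi in (0.0, 0.3, 1.0, math.pi / 2):
    print(phi, round_trip_time(50.0, 2.0, 0.5, phi), formula_p1_3(50.0, 2.0, 0.5, phi))
```

At φ = 0 and φ = π/2 the two columns reproduce the parallel and perpendicular times of Eq. (P1.1), which is the content of part b.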

CHAPTER 2
Basic special relativity

The basics of special relativity (SR) are presented in this chapter using spacetime diagrams.
2.1 COMPARISON OF TIME INTERVALS: THE BONDI K-FACTOR

Figure 2.1 Inertial observers A and B emit photons separated by time T. Each sees the other move away at the same speed, v. Photons in the moving frames are received separated in time by kT, where k = k(v).
Let inertial observers A and B in relative motion carry identical clocks, the worldlines of which are shown in Fig. 2.1.1 A sends two ﬂashes of light to B, a time T apart. What time separation does B measure? Not T —the second photon has further to travel. Perhaps if we knew the relative speed between A and B, the time in B could be calculated? That presupposes, however, the Newtonian conception that time is the same everywhere. SR shows that time is speciﬁed by a clock in a reference frame. We know that worldlines of free particles are straight in IRFs, and that spacetime coordinates in different IRFs are related by a linear mapping (Section 1.4). The time difference measured in B must therefore be proportional2 to T , call it kT . The Bondi k-factor [12] is a function of the relative speed between frames, k = k(v). In particular, as v → 0, k(v) → 1. Both observers see the other moving away at the same speed, and thus the k-factor must be the same for both observers (equivalence of IRFs). Photons emitted by B separated by time T are measured by A to have a time separation kT (see Fig. 2.1). The rabbit is in the hat.
A and B synchronize their clocks when their worldlines cross (see Fig. 2.2). After time T , A emits a photon toward B, which is reﬂected back to A. On B’s clock, the photon arrives at time kT .
1The worldline of an observer is the time axis in its reference frame. Imagine yourself holding a clock in a room: You deﬁne the time axis for your reference frame. We don’t show the spatial axes in spacetime diagrams except when necessary. The worldline is the location of x = 0 in that frame.
2This one assumption is really the whole show. The time measurements involved are at the same spatial locations in each frame, ∆x = 0. Thus, ∆t in one frame is linearly related to ∆t in another.

Figure 2.2 Photon emitted at time T, reflected at time kT, and received at time $k^2T$.

A records the arrival of the photon at a time that’s a multiple, k, of the time at which B reflected the photon, kT. Thus, A records the arrival of the photon at time $k^2T$. Using Eq. (1.5), A assigns coordinates to the photon reflection from B:

$$(ct, x) = \left(\tfrac{1}{2}c(k^2T + T),\ \tfrac{1}{2}c(k^2T - T)\right). \tag{2.1}$$

What was never in doubt is that A sees B moving at speed v, which in A’s coordinates is expressed as x = vt. Thus, using Eq. (2.1),

$$\beta = \frac{v}{c} = \frac{x}{ct} = \frac{cT(k^2-1)/2}{cT(k^2+1)/2} = \frac{k^2-1}{k^2+1}. \tag{2.2}$$

Solve Eq. (2.2) for k:

$$k = \sqrt{\frac{1+\beta}{1-\beta}}. \tag{2.3}$$

Voila! We see that k(0) = 1 as expected. There’s a sign convention implicit in Eq. (2.3): β > 0 corresponds to the “receiver” moving away from the photon source. Thus, k > 1 for β > 0. For the source approaching the receiver, let β → −β, 0 < k < 1. From Eq. (2.3), $k(-v) = k^{-1}(v)$.
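The properties of the k-factor quoted above are easy to confirm numerically. This is a minimal sketch of Eqs. (2.2) and (2.3); the function names `k_factor` and `beta_from_k` are chosen here for illustration.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3); beta > 0 means source and receiver separate."""
    return math.sqrt((1 + beta) / (1 - beta))

def beta_from_k(k):
    """Recover the relative speed from the k-factor, Eq. (2.2)."""
    return (k**2 - 1) / (k**2 + 1)

print(k_factor(0.0))                          # no relative motion
print(k_factor(0.6), k_factor(-0.6))          # receding vs. approaching
print(k_factor(0.6) * k_factor(-0.6))         # k(-v) = 1/k(v), so the product is 1
print(beta_from_k(k_factor(0.3)))             # round trip back to beta
```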
The k-factor, which relates time intervals, is the inverse of the relativistic Doppler factor, which relates frequencies3 (derived in Appendix B, Eq. (B.3)). It seems that we’ve arrived at a fundamental result of SR without invoking any relativity! If we examine the argument, however, we see that it uses
the principle of relativity, that all inertial observers can claim themselves at rest, and the isotropy of the speed of light (through the use of the radar method). We motivated the k-factor by appealing to linearity, that all inertial observers see straight worldlines of free particles. The k-factor is thus ﬁrmly rooted in the fundamentals of relativity. As we now show, all the standard results of SR can be derived using the k-factor.
We can see the connection with the Doppler effect by referring to Fig. 2.3. An emitter emits signals regularly with time separation ∆t; it thus emits at the frequency $f_e \equiv (\Delta t)^{-1}$. The receiver receives signals separated by time $\Delta t_{\text{rec}}$; hence the received frequency is $f_{\text{rec}} = (\Delta t_{\text{rec}})^{-1}$. The reception time is related to the emission time through the k-factor, $\Delta t_{\text{rec}} = k\Delta t$. We therefore have the relativistic Doppler effect, in agreement with Eq. (B.3),

$$f_{\text{rec}} = \frac{1}{k} f_e. \tag{2.4}$$

While the receiver is approaching the emitter, k < 1, a blueshift, and after, k > 1, a redshift.
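As a small illustration of Eq. (2.4), the sketch below computes the received frequency for a receding and an approaching receiver. The emitted frequency of 100 (arbitrary units) and the speed β = 0.6 are assumed values, and `received_frequency` is a name introduced here.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3); beta > 0 for a receding receiver."""
    return math.sqrt((1 + beta) / (1 - beta))

def received_frequency(f_emit, beta):
    """Relativistic Doppler effect, Eq. (2.4): f_rec = f_e / k."""
    return f_emit / k_factor(beta)

f_emit = 100.0  # arbitrary units, illustrative
print(received_frequency(f_emit, 0.6))    # receding: lower frequency (redshift)
print(received_frequency(f_emit, -0.6))   # approaching: higher frequency (blueshift)
```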

3The relativistic Doppler effect is the classical Doppler effect combined with time dilation. Time dilation is not something we “ofﬁcially” know about yet; it’s derived in Section 2.2.


Figure 2.3 Doppler effect. Approaching observer receives photons at a blueshifted frequency; receding observer receives photons at a redshifted frequency.

2.2 TIME DILATION
Figure 2.4 shows the worldlines of inertial observers A and B who have synchronized their clocks.

Figure 2.4 Time dilation. Proper time T occurs at time t = γT in reference frame A.

A emits a photon at time t − x/c that’s reflected by B. A assigns coordinates (t, x) to the reflection event. B records the arrival of the photon at time T using its clock. B is at rest relative to the clock; time measured in that frame is the proper time. The time T is related to the time t − x/c through the k-factor:

$$T = k(t - x/c). \tag{2.5}$$

In A, x = vt, implying

$$T = k(t - \beta t) = kt(1 - \beta) = t\sqrt{1 - \beta^2}, \tag{2.6}$$

where we’ve used Eq. (2.3). Equation (2.6) is usually written

$$t = \frac{1}{\sqrt{1 - \beta^2}}\,T = \gamma T. \tag{2.7}$$

Equations (2.6) or (2.7) are referred to as time dilation—“moving clocks run slow.” Suppose B measures T to be one hour. A will measure a time longer than one hour, and conclude that the moving clock runs slower. We show in Chapter 4 that the effect is symmetrical: Both observers claim a moving clock runs slow.
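The step from Eq. (2.5) to Eq. (2.7) relies on the identity k(1 − β) = √(1 − β²). The sketch below checks it numerically for the one-hour example; β = 0.8 is an assumed value and the function names are introduced here for illustration.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3)."""
    return math.sqrt((1 + beta) / (1 - beta))

def gamma(beta):
    """Lorentz factor appearing in Eq. (2.7)."""
    return 1 / math.sqrt(1 - beta**2)

def proper_time_via_k(t, beta):
    """Eq. (2.5) with x = v t, i.e. T = k (t - beta t)."""
    return k_factor(beta) * (t - beta * t)

beta = 0.8                 # assumed relative speed
T = 1.0                    # one hour on B's clock
t = gamma(beta) * T        # time assigned by A, Eq. (2.7)
print(t)                            # longer than one hour
print(proper_time_via_k(t, beta))   # recovers T through the k-factor
```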


Figure 2.5 Left: k-factor relates time for two events (black dots), T = k(t − x/c). Right: Time dilation relates the time coordinates assigned to the same event, t = γT.

Let’s be clear on what times are being compared. The k-factor relates the time coordinates of two distinct events—emission and reception of a photon, shown as black dots in Fig. 2.5. Emission occurs at time t − x/c in A, and reception occurs at event (T, x = 0) in B. The worldline of the clock, B, deﬁnes the line x = 0, just as the time axis in A is the line x = 0. The k-factor relates the reception time in B to the emission time in A, with T = k(t − x/c). The time dilation factor on the other hand relates the time coordinates assigned to the same event by two observers, with t = γT .

2.3 VELOCITY ADDITION
Figure 2.6 shows the worldlines of three inertial observers, A, B, and C, which synchronize their clocks as they pass at the origin. A emits two photons with a time separation T. The two photons arrive in B separated by time $k_{AB}T$, where $k_{AB}$ is the k-factor associated with the relative speed between A and B. The photons arrive in C separated by time $k_{AC}T$. Alternatively, the photons leave frame B separated in time by $k_{AB}T$ and arrive in frame C with time separation $k_{BC}k_{AB}T$. Thus, $k_{AC}T = k_{BC}k_{AB}T$, implying the k-factors satisfy a multiplicative composition law:

$$k_{AC} = k_{AB}k_{BC}. \tag{2.8}$$

Figure 2.6 Composition rule for k-factors.

The k-factor is a proxy for speed: the larger is β, the larger is k. The projection of the interval T (along the time axis in the rest frame) onto the time axes of frames in motion is a measure of speed. The way the geometry of spacetime works, k-factors are multiplicative between frames.


Combining Eq. (2.2) with Eq. (2.8),

$$\beta_{AC} = \frac{k_{AC}^2 - 1}{k_{AC}^2 + 1} = \frac{k_{AB}^2k_{BC}^2 - 1}{k_{AB}^2k_{BC}^2 + 1}. \tag{2.9}$$

Using Eq. (2.3) for each of the k-factors, it follows that (show this)

$$\beta_{AC} = \frac{\beta_{AB} + \beta_{BC}}{1 + \beta_{AB}\beta_{BC}}, \tag{2.10}$$

the Einstein velocity addition formula. For low speeds, $\beta_{AB} \ll 1$, $\beta_{BC} \ll 1$, and $\beta_{AC} \approx \beta_{AB} + \beta_{BC}$, the Galilean velocity addition formula. If we substitute $\beta_{BC} = 1$ in Eq. (2.10), we obtain $\beta_{AC} = 1$ for any $\beta_{AB}$. There is an invariant limiting speed implied by the theory, β = 1.
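The "show this" step can be checked numerically: multiplying k-factors as in Eq. (2.8) and converting back to a speed with Eq. (2.2) should reproduce Eq. (2.10). The sketch below does so for a few assumed speeds; the function names are introduced here for illustration.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3)."""
    return math.sqrt((1 + beta) / (1 - beta))

def add_velocities_via_k(beta_ab, beta_bc):
    """Compose speeds by multiplying k-factors, Eqs. (2.8) and (2.9)."""
    k_ac = k_factor(beta_ab) * k_factor(beta_bc)
    return (k_ac**2 - 1) / (k_ac**2 + 1)

def einstein_addition(beta_ab, beta_bc):
    """Einstein velocity addition formula, Eq. (2.10)."""
    return (beta_ab + beta_bc) / (1 + beta_ab * beta_bc)

for b1, b2 in [(0.5, 0.5), (0.9, 0.9), (-0.95, -0.95), (1e-4, 2e-4)]:
    print(b1, b2, add_velocities_via_k(b1, b2), einstein_addition(b1, b2))
```

The last pair illustrates the low-speed limit, where the result is very nearly the Galilean sum.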

Example. Apply the velocity addition formula to the speed of light in a moving medium. The speed of light in the rest frame of a medium of index of refraction n, is c/n. What is the speed of light u when the medium has speed v relative to the lab frame? Using Eq. (2.10),

$$u = \frac{v + c/n}{1 + v/(cn)} = \frac{c}{n} + v\left(1 - \frac{1}{n^2}\right)\frac{1}{1 + v/(nc)}.$$

The coefficient multiplying v, $(1 - n^{-2})$, the Fresnel drag coefficient, was confirmed in the Fizeau water tube experiment of 1851, where v ≪ c. Relativistic velocity addition thus has an observable consequence at relatively slow speeds, namely the speed of the flow of water in the Fizeau experiment.4
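To get a feel for the size of the effect, the sketch below evaluates the exact expression from Eq. (2.10) next to the leading-order Fresnel result c/n + v(1 − 1/n²). The refractive index 1.33 for water and the flow speed of 7 m/s are illustrative assumptions, not values given in the text.

```python
C = 299_792_458.0  # speed of light in m/s

def light_speed_in_moving_medium(n, v):
    """Lab-frame speed of light from Eq. (2.10), with beta_AB = v/c and beta_BC = 1/n."""
    beta = (v / C + 1.0 / n) / (1.0 + v / (C * n))
    return C * beta

def fresnel_estimate(n, v):
    """Leading-order result c/n + v(1 - 1/n^2), valid for v << c."""
    return C / n + v * (1.0 - 1.0 / n**2)

n_water, v_flow = 1.33, 7.0   # illustrative values, not taken from the text
exact = light_speed_in_moving_medium(n_water, v_flow)
print(exact - C / n_water)             # extra speed picked up from the flow, in m/s
print(v_flow * (1 - 1 / n_water**2))   # Fresnel drag prediction, in m/s
```

The light picks up only a fraction of the flow speed, rather than the full v that Galilean addition would give.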

Example. Particles A and C have velocities $\beta_A = 0.95$ and $\beta_C = -0.95$ relative to a linear accelerator. What is the velocity of C relative to A? It’s helpful to draw a spacetime diagram; see Fig. 2.7. B is a laboratory observer, situated between A and C; the precise location of B is unimportant. Because A and C are both approaching B, we can set $\beta_{AB} = \beta_{BC} = -0.95$ and use Eq. (2.10) to conclude that $\beta_{AC} = -0.9987$. In Fig. 2.7, particle A emits two photons separated by time T. Because A is approaching B, $k_{AB} < 1$; similarly for $k_{BC}$.

Figure 2.7 Spacetime diagram for particles A and C approaching each other.

4The Fizeau experiment is worth learning about—an ingenious interferometric experiment not unlike the MM experiment. It uses a tube constructed so that water can ﬂow in opposite directions, through which beams of light pass in such a way that each beam propagates in the direction of water ﬂow. The light beams are then brought together to interfere, where the change in phase is correlated with the speed of the water. In the Fizeau experiment, the ﬂow of water can simply be turned off, something that Michelson and Morley couldn’t do—turn off the motion of the earth!

2.4 LORENTZ TRANSFORMATION
Figure 2.8 shows the worldlines of observers A and B in relative motion (what we often call frames S and S′), which synchronize their clocks as they pass. Each uses the radar method to assign coordinates to the same event, P: (t, x) and (t′, x′). A photon emitted by A at time t − x/c reflects from event P and is received at time t + x/c. B emits a photon at time t′ − x′/c which reflects from the same event, and is recorded at time t′ + x′/c. Note the symmetry: Both observers claim to be at rest; both emit a photon at the “same time” using their respective coordinates, t − x/c and t′ − x′/c.
Figure 2.8 Inertial observers A and B use the radar method to assign coordinates to the same event, (t, x) and (t′, x′).

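The emission and reception times in the two frames are naturally related through the k-factor:

$$t' - x'/c = k(t - x/c), \qquad t + x/c = k(t' + x'/c). \tag{2.11}$$

Solve Eq. (2.11) for (t′, x′):

$$ct' = \tfrac{1}{2}\left(k^{-1} + k\right)ct + \tfrac{1}{2}\left(k^{-1} - k\right)x, \qquad x' = \tfrac{1}{2}\left(k^{-1} - k\right)ct + \tfrac{1}{2}\left(k^{-1} + k\right)x.$$

Using Eq. (2.3), we have the matrix equation (show this)

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = \gamma\begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix}\begin{pmatrix} ct \\ x \end{pmatrix}, \tag{2.12}$$

the same as Eq. (A.6).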
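The passage from the k-factor form to Eq. (2.12) amounts to the identities ½(k⁻¹ + k) = γ and ½(k⁻¹ − k) = −βγ. The sketch below builds the 2×2 matrix from k and compares it with the γ, β form; function names and the sample event are assumptions for illustration.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3)."""
    return math.sqrt((1 + beta) / (1 - beta))

def lorentz_from_k(beta):
    """Matrix acting on (ct, x), built from the solved form of Eq. (2.11)."""
    k = k_factor(beta)
    a = 0.5 * (1 / k + k)   # equals gamma
    b = 0.5 * (1 / k - k)   # equals -beta * gamma
    return [[a, b], [b, a]]

def transform(beta, ct, x):
    """Apply the transformation to an event (ct, x)."""
    (a, b), (c_, d) = lorentz_from_k(beta)
    return a * ct + b * x, c_ * ct + d * x

beta = 0.6
g = 1 / math.sqrt(1 - beta**2)
print(lorentz_from_k(beta))                    # from the k-factor
print([[g, -beta * g], [-beta * g, g]])        # gamma * [[1, -beta], [-beta, 1]]

ct_p, x_p = transform(beta, 3.0, 1.0)          # hypothetical event (ct, x) = (3, 1)
print(ct_p**2 - x_p**2, 3.0**2 - 1.0**2)       # the interval is unchanged
```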

Location of x′-axis: Lines of simultaneity
With the LT, we can find the location of the x′-axis (the spatial axis of S′) in relation to the space and time axes of S. The t′-axis is the worldline seen in S, x = vt, or ct = x/β. The same result follows from Eq. (2.12) as the locus of points with x′ = 0 (check it!). What about the x′-axis? Answer: The locus of points associated with t′ = 0, which from Eq. (2.12) is ct = βx. Figure 2.9 shows the x′ and t′ axes both situated at the angle φ with respect to the x, t axes, where tan φ = β. As β → 1, φ → π/4. The coordinates assigned to the same event in each reference frame are found by projecting onto the respective space and time axes, as shown. Knowing the x′-axis provides a way to test for the simultaneity of events (in S′): If two events can be connected by a line parallel to the x′-axis, they have the same time in that frame. Lines parallel to the x-axis are lines of simultaneity.5

Figure 2.9 Coordinates assigned to the same event (the asterisk) in reference frames in relative motion with speed β: (t, x) and (t′, x′). The t′ and x′-axes form the same angle φ with respect to the t and x-axes, with tan φ = β.

We began our discussion of spacetime diagrams by agreeing to take the time axis as orthogonal to the space of spatial variables. All IRFs are equivalent, yet the space and time axes of S′ do not appear orthogonal in Fig. 2.9. We don’t know yet how to form the inner product of spacetime vectors. We’ll see that the t′ and x′-axes are indeed orthogonal in S′.

Example. Particle B moves away from A at speed β = 0.25, from left to right. What are the coordinates in the moving frame assigned to an event that in the rest frame occurs at ct = 2.25 and x = 1.5? For β = 0.25, γ = 4/√15 = 1.03. Use the LT, Eq. (2.12):

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = 1.03\begin{pmatrix} 1 & -0.25 \\ -0.25 & 1 \end{pmatrix}\begin{pmatrix} 2.25 \\ 1.5 \end{pmatrix} = \begin{pmatrix} 1.94 \\ 0.97 \end{pmatrix}.$$

These are the coordinates shown in Fig. 2.9. What if the speed is negative (particle moves from right to left)? Figure 2.10 shows the spacetime diagram for β = −0.25.
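The arithmetic of this example can be reproduced in a few lines, and the same function handles the negative-speed case. This is a sketch of Eq. (2.12) written component by component; the function name `lorentz_transform` is introduced here.

```python
import math

def lorentz_transform(beta, ct, x):
    """Eq. (2.12) written out: ct' = gamma (ct - beta x), x' = gamma (x - beta ct)."""
    gamma = 1 / math.sqrt(1 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

ct_p, x_p = lorentz_transform(0.25, 2.25, 1.5)
print(round(ct_p, 2), round(x_p, 2))      # 1.94 0.97, matching the worked example

# Same event seen from a frame moving right to left (beta = -0.25).
print(lorentz_transform(-0.25, 2.25, 1.5))
```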

Figure 2.10 Spacetime diagram for a particle moving with negative velocity, right to left.

5Lines parallel to the t-axis are lines of co-locality; between timelike separated events one can always ﬁnd a frame of reference where the events occur at the same location in space.


Example. Relativity of causality Spacelike-separated events can occur simultaneously or in either time order, depending on the reference frame (mentioned in Section 1.5). The left portion of Fig. 2.11 shows spacelike-separated events A and B as seen from a reference frame moving with speed β = 0.26 relative to the unprimed frame. A precedes B in both frames. In the right portion of Fig. 2.11 the same events occur in the opposite order in a frame moving with speed β = 0.71.

Figure 2.11 Time order of spacelike-separated events is reference-frame dependent.
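The reversal of time order follows from the time row of Eq. (2.12), c∆t′ = γ(c∆t − β∆x). The sketch below uses a hypothetical pair of spacelike-separated events with c∆t = 1 and ∆x = 2 (the coordinates of A and B in Fig. 2.11 are not given in the text), for which the order flips once β exceeds 0.5.

```python
import math

def time_separation(beta, c_dt, dx):
    """c * dt' between two events, from the time row of Eq. (2.12)."""
    gamma = 1 / math.sqrt(1 - beta**2)
    return gamma * (c_dt - beta * dx)

# Hypothetical spacelike-separated events: B occurs at c*dt = 1, dx = 2 relative to A.
c_dt, dx = 1.0, 2.0
for beta in (0.26, 0.5, 0.71):
    print(beta, time_separation(beta, c_dt, dx))
# Positive: A precedes B; zero: simultaneous; negative: B precedes A.
```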

2.5 LENGTH CONTRACTION
We now discuss, from several points of view, length contraction, the converse of time dilation, a phenomenon students (and others) tend to ﬁnd more confusing than time dilation.

2.5.1 Using the k-factor

A rod of rest length D moves along the x-axis with speed v from left to right. Figure 2.12 shows the worldlines of the front and back ends of the moving rod as Bfront and Bback from the perspective of reference frame A. Clocks are synchronized when the front edge of the rod passes the origin, O. You may ﬁnd it helpful to visualize how you would use a radar gun to measure the distance to an approaching rod, traveling straight at you. As we now show, the length of the rod as measured in A is d = D/γ, the phenomenon of length contraction.
A emits a photon at time −d/c (negative time, relative to the origin O), which reflects from the back end of the rod, event E. The reflected photon arrives in A at time d/c. The coordinate of the back end of the rod is, from the radar method, d. In B the emitted photon passes the front end of the rod at point P in Fig. 2.12. It’s as if in frame B a photon was emitted from the front end of the rod toward the back end. Such a photon would have been emitted at time −d/(kc) in the B-frame. We’ve used that $k(-v) = k^{-1}(v)$; the receiver (front end of the rod) is moving toward the source, which corresponds to negative speed. The reflected photon encounters Bfront at point Q in Fig. 2.12, which occurs at time kd/c (use the k-factor together with the time d/c in the A-frame). By the radar method, the coordinates for event E in frame B are

$$(ct', x') = \left(\frac{d}{2}\left(k - k^{-1}\right),\ \frac{d}{2}\left(k + k^{-1}\right)\right) = (d\beta\gamma,\ d\gamma). \tag{2.13}$$

Thus D = γd, or d = D/γ: A rod in motion appears shorter than its length measured in its rest frame. We’ll show in Chapter 4 that the relation is symmetrical: Both observers claim that a rod in motion has a length shorter than its rest length. The line labeled D in Fig. 2.12 is where the x′-axis in B intersects event E (a line of simultaneity). In both reference frames, the spatial coordinates of the two ends of the rod have been obtained at the same time.
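The radar argument leading to Eq. (2.13) can be traced numerically. The sketch below works in units with c = 1, takes d = 10 and β = √3/2 (so γ = 2) as assumed values, and uses helper names introduced here for illustration.

```python
import math

def k_factor(beta):
    """Bondi k-factor, Eq. (2.3)."""
    return math.sqrt((1 + beta) / (1 - beta))

def radar_coordinates(t_emit, t_receive, c=1.0):
    """Radar method: (ct, x) of a reflection event from emission and reception times."""
    ct = 0.5 * c * (t_receive + t_emit)
    x = 0.5 * c * (t_receive - t_emit)
    return ct, x

def back_end_in_rod_frame(d, beta, c=1.0):
    """Event E in frame B: emission at -d/(kc), reception at k d/c, as in the text."""
    k = k_factor(beta)
    return radar_coordinates(-d / (k * c), k * d / c, c)

beta = math.sqrt(3) / 2          # assumed speed, gamma = 2
gamma = 1 / math.sqrt(1 - beta**2)
d = 10.0                         # length measured in A (assumed)
ct_p, x_p = back_end_in_rod_frame(d, beta)
print(ct_p, d * beta * gamma)    # time coordinate of E in B
print(x_p, d * gamma)            # rest length D = gamma * d
```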

Figure 2.12 Length contraction. Rod of rest length D has length D/γ measured in A.

2.5.2 Using the Lorentz transformation

Length contraction can be more readily demonstrated using the Lorentz transformation (LT). Figure 2.13 shows the worldlines of the front and back ends of the moving rod as Bfront and Bback as seen in the frame of observer A. Both observers want to measure the length of the rod, and both are careful to measure the two ends of the rod at the same time in their reference frames.6 But of course, what’s simultaneous in one frame is not in another. Observer B, at rest relative to the rod, measures ∆x′ at t′ = 0 as the rest length of the rod. Observer A records the locations of the two ends of the rod at time t, measuring the length as ∆x. The events used to measure length in frame A are not simultaneous in frame B, and vice versa; see Fig. 2.13. Referring to the events with ∆t = 0, we have from Eq. (2.12),

$$\begin{pmatrix} c\Delta t' \\ \Delta x' \end{pmatrix} = \gamma\begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix}\begin{pmatrix} 0 \\ \Delta x \end{pmatrix}.$$

Thus, ∆x′ = γ∆x (length contraction) and c∆t′ = −βγ∆x (relativity of simultaneity).

2.5.3 Pole and barn
A paradox is a conﬂict between reality and your feeling of what reality ought to be. —Richard Feynman [13, p18-9]
One of the more well-known of the supposed paradoxes associated with SR is the pole and barn problem. In this thought experiment, a runner carries a pole that in its rest frame is 20 m long (a
6To measure the length of a stick moving past you, you wouldn’t ﬁrst measure the location of one end of the stick, and then only an hour later measure the location of the other end.


Figure 2.13 Length contraction ∆x = ∆x′/γ, where ∆x′ is the proper length.

long pole!). The runner is headed toward a barn that in its rest frame is 10 m long. The speed of the runner is such that γ = 2 (β = √3/2). In the frame of the barn, the pole appears 10 m long because of relativistic length contraction. Thus, the pole fits entirely within the barn at one instant of time. In the frame of the pole, however, the barn appears 5 m long, and the pole cannot fit entirely within the barn. Can these descriptions be reconciled?

Figure 2.14 Reference frames of barn and pole on spacetime diagrams (not to scale). Left panel: barn frame (axes ctB, xB), with barn and pole both 10 m long and events A and B. Right panel: pole frame (axes ctP, xP), with the pole 20 m long, the barn 5 m long, and events A′ and B′.

The left portion of Fig. 2.14 shows the events in the reference frame of the barn, with the pole approaching from left to right. We consider the front of the pole and the front of the barn to be on the right, with the back of the pole and the back of the barn to the left. The front of the pole ﬁrst encounters the back of the barn. As the pole passes through the barn there is an instant of time when the pole ﬁts entirely within the barn. These events are labeled in Fig. 2.14 as A (back of the pole encountering the back of the barn) and B (front of the pole encountering the front of the barn).
The same events are shown in the reference frame of the pole in the right portion of Fig. 2.14. We can place the origin of spacetime coordinates wherever we want, but to use the LT formula the origins of the two systems of spacetime coordinates must coincide.7 In both diagrams the origin is the event in which the front of the pole encounters the back of the barn. Events A and B (from the left diagram) are shown in the right diagram as events A′ and B′. The events are the same, but the coordinates assigned to them are different in the two frames.
7Because the Lorentz transformation is a linear, homogeneous coordinate transformation.

In the barn frame, A has coordinates $(ct_B, x_B) = (20/\sqrt{3}, 0)$, the first entry being the time to cover 10 m at speed β = √3/2. In the pole frame, the same event has coordinates

$$\begin{pmatrix} ct_P \\ x_P \end{pmatrix} = \gamma\begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix}\begin{pmatrix} ct_B \\ x_B \end{pmatrix} = \begin{pmatrix} 2 & -\sqrt{3} \\ -\sqrt{3} & 2 \end{pmatrix}\begin{pmatrix} 20/\sqrt{3} \\ 0 \end{pmatrix} = \begin{pmatrix} 40/\sqrt{3} \\ -20 \end{pmatrix}.$$

B has coordinates in the barn frame $(20/\sqrt{3}, 10)$ and coordinates $(10/\sqrt{3}, 0)$ in the pole frame.

Where’s the paradox? Well, there isn’t one, save for our intuition-derived expectation that the pole fitting entirely within the barn should be an objective fact, the same for all observers. That’s the point of the exercise: What’s simultaneous in one frame (events A and B in the barn frame) is not in another (A′ and B′ in the pole frame).
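The coordinates quoted for the pole and barn can be reproduced directly from Eq. (2.12) with γ = 2 and β = √3/2. The sketch below transforms both events from the barn frame to the pole frame; the variable and function names are introduced here for illustration, and lengths are in meters with times expressed as ct.

```python
import math

GAMMA = 2.0
BETA = math.sqrt(3) / 2

def to_pole_frame(ct_b, x_b):
    """Eq. (2.12): barn-frame coordinates (ct_B, x_B) to pole-frame (ct_P, x_P)."""
    return GAMMA * (ct_b - BETA * x_b), GAMMA * (x_b - BETA * ct_b)

# Barn-frame events from the text (origin: front of pole meets back of barn).
event_A = (20 / math.sqrt(3), 0.0)    # back of pole at back of barn
event_B = (20 / math.sqrt(3), 10.0)   # front of pole at front of barn

A_pole = to_pole_frame(*event_A)
B_pole = to_pole_frame(*event_B)
print(A_pole, (40 / math.sqrt(3), -20.0))
print(B_pole, (10 / math.sqrt(3), 0.0))
print(event_A[0] == event_B[0])       # simultaneous in the barn frame
print(B_pole[0] < A_pole[0])          # in the pole frame, B' happens before A'
```

In the pole frame the front of the pole reaches the front of the barn before the back of the pole reaches the back of the barn, so the pole is never entirely inside.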

2.5.4 Length contraction, Minkowski, and the fourth dimension
Objects in motion have a length L contracted in the direction of motion, L ≤ L0, where L0 is the rest length. Does that mean objects in motion shrink? Consider an analogy from crystallography. Through a crystal lattice, various planes may be drawn containing lattice points, lattice planes, labeled by Miller indices8 (hkl), e.g., the (100) plane or the (110) plane—see Fig. 2.15. A plane is

Figure 2.15 Separation between atoms in a crystal is lattice-plane dependent.

a two-dimensional slice of a three-dimensional geometry. Restricting your attention to one of these planes, what is the distance between lattice points? The answer depends on the plane. Does it make sense to ask what is the real distance between atoms? There’s no deﬁnitive answer if all we see is a two-dimensional sampling of the points that have an arrangement in three dimensions.
The same reasoning applies to objects we see at an instant of time in our three-dimensional world, objects that exist in four-dimensional spacetime. Figure 2.16 shows the worldlines of the ends of two identical rods, much as in the previous figures. The points of an extended object (such as a rod), considered in spacetime as a collection of worldlines, comprise a worldtube, shown in crosshatch in Fig. 2.16. In the (t, x) coordinate system the rest length, or proper length, is the line P P . In the (t′, x′) system, attached to an identical rod moving along the x-axis, the rest length is the line Q Q . The line P P , as seen from the (t′, x′) system is shown as P P , while Q Q is shown as QQ in the (t, x) system.9

Figure 2.16 Worldtubes intersecting lines of simultaneity for frames in relative motion.

8This nomenclature is discussed in any book on solid-state physics.
Minkowski argued that the apparent deformation of a moving object can be understood as arising from a three-dimensional slice (surface of simultaneity) of a four-dimensional entity. The length depends on the intersection of the worldtube with an observer’s space—surface of simultaneity. The way the non-Euclidean geometry of spacetime works, the intersection with the rest-space of an observer produces the largest length.10 By considering spacetime as a whole, by taking a geometric point of view, Minkowski found that the perplexing results of SR can be given an intelligible explanation. His most far-reaching conclusion is that observers in relative motion have different spaces as well as times. One must arrive at this conclusion if surfaces of simultaneity (observer-dependent slices) in a four-dimensional spacetime are three-dimensional spaces.
The usual length contraction hypothesis, according to Minkowski
. . . sounds extremely fantastical, for the contraction is not to be looked upon as a consequence of resistances in the ether, or anything of that kind, but simply as a gift from above—as an accompanying circumstance of the circumstance of motion.
Minkowski held that the idea of a four-dimensional world explains the principle of relativity:
. . . the word relativity-postulate . . . seems to me very feeble. Since the postulate comes to mean that only the four-dimensional world of space and time is given by phenomena, but that the projection in space and in time may still be undertaken with a certain degree of freedom, I prefer to call it the postulate of the absolute world (or brieﬂy, the worldpostulate).
We should then have in the world no longer space, but an infinite number of spaces, analogously as there is in three-dimensional space an infinite number of planes. Three-dimensional geometry becomes a chapter in four-dimensional physics.
Basically, because in a four-dimensional world observers in relative motion have their own spaces and times, all inertial observers describe phenomena the same way because all are at rest in their respective frames. Thus, every inertial observer measures the same speed of light using its own rest-space coordinates and time. There cannot be absolute motion in the sense of Newton because there is not just one space and one time.
Minkowski’s explanation of length contraction—the same four-dimensional worldtube intersected by spaces of different observers—makes a compelling case for the reality of four-dimensional spacetime. Einstein did not at ﬁrst embrace Minkowski’s theory, but soon started to make use of tensor methods in spacetime geometry. General relativity (GR) would not be possible without a geometrical perspective on spacetime.
2.5.5 FitzGerald-Lorentz contraction
In 1888 Oliver Heaviside showed (based on the ether model) that the electric field surrounding a spherical charge would cease to have spherical symmetry if the charge was in motion relative to the ether. In the Heaviside model, the longitudinal component of the electric field (in the direction of motion) is affected by motion, but not the transverse components.11 In 1889, G.F. FitzGerald took Heaviside’s result and suggested ad hoc that the shape of an object would be altered in the direction of motion. As is well known, if the length $L$ of the arm in a Michelson interferometer is distorted in the direction of motion such that $L \to L\sqrt{1 - \beta^2}$, it would explain the null result of the MM experiment while preserving the notion of the ether. In 1892, Lorentz independently published the same idea, although Lorentz attempted to work through detailed models of inter-molecular forces that would demonstrate the effect. The idea came to be known as the FitzGerald-Lorentz contraction hypothesis (FL).
9We’re using the notation employed by Minkowski.[9, p78] Why? To encourage you to read the original literature!
10In a crystal, the lattice constant is reported as the shortest distance between atoms.
11This is of course exactly the opposite from SR, where the longitudinal field component is invariant and the transverse components transform between IRFs. See Chapter 8.
Einstein’s hypothesis that the speed of light is the same in all IRFs also accounts for the null result of the MM experiment, without making assumptions about the internal constitution of matter. As we’ve seen, an identical formula $L = L_0\sqrt{1 - \beta^2}$ is derived in SR, and it’s important to understand the difference between relativistic length contraction and the FL contraction. Length contraction in SR is a coordinate effect, the difference in spatial coordinates of something that should be seen in its totality in four-dimensional spacetime. No actual contraction is implied, in distinction to the FL contraction. Asking whether a stick “really” contracts is tantamount to asking whether it’s “really” moving, which can be answered only from absolute space. If SR is correct and there is no ether and length contraction is a “real” contraction, the MM experiment would show a positive result, because the FL contraction would introduce a time difference between the arms of the interferometer. A real length contraction is not compatible with the null result of the MM experiment and isotropy of the speed of light.
2.5.6 Experimental status
There is no direct experimental confirmation of relativistic length contraction. Elementary particles can be made to move rapidly (speeds comparable to c), but their linear dimensions cannot be measured directly, while macroscopic objects, the dimensions of which can be measured, cannot be made to move at relativistic speeds. The predictions of SR all emerge from the principle of relativity, and length contraction is one of its consequences. It’s often said that SR rests on two postulates (the way Einstein presented it): the principle of relativity and the invariance of the speed of light in IRFs. The principle of relativity alone predicts a universal speed, which experiment shows to be the speed of light.12 Time dilation has been confirmed through the measurement of the relativistic Doppler effect (Ives-Stilwell experiment [14]). The MM experiment [15] showed that the speed of light is isotropic, a fact used in the derivation of the radar method. The Kennedy-Thorndike experiment [16] showed that the speed of light is independent of the velocity of the source, implicitly used in the derivation of each of the effects of SR (Doppler effect, time dilation, Lorentz transformation, length contraction). While there is no direct confirmation of length contraction, we show in Chapter 8 that the Lorentz force $q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$ can be derived without approximation as a frame transformation, a derivation that relies heavily on length contraction.
2.5.7 Length contraction in one frame is time dilation in another
Time dilation and length contraction are each a consequence of the relativity of simultaneity. Both effects emerge from a comparison of events measured from reference frames in relative motion. In a frame at rest relative to clocks and rods, measurements taken at the same location (proper time) and at the same time (proper length), are different from measurements made in a frame in which clocks and rods are in motion, the measurements of which occur at different locations (time dilation) and at different times (length contraction). As we now show, what can be interpreted as time dilation in one frame can be interpreted as length contraction in another.
12Thus, Einstein’s second postulate is not strictly necessary. We argue in Chapter 3 that Einstein’s second postulate is the assertion that photons have no rest mass.


Referring to Fig. 2.7, particles A and C are launched at time t = 0 with speeds of 0.95c, a distance L = 3.2 km apart (the length of the Stanford Linear Accelerator). At what time do the particles collide, in the lab frame and in the frame of one of the particles? In lab-frame coordinates (see Fig. 2.17), the particles collide at half their initial separation, tL = L/(2v) = 1.6 km/0.95c = 5.61µs. The proper time, the time that particle A experiences before the collision is, from Eq. (2.7), T = t/γ = 1.75µs, where γ = 3.2 for β = 0.95. Moving clocks run slow.
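The lab-frame numbers in this example can be checked with a few lines of Python. This is only a sketch: the function names and the use of the exact SI value of c are choices made here for illustration and do not come from the text, so the last digit may differ from the rounded values quoted above.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value; an assumption of this sketch)


def lorentz_factor(beta):
    """Return gamma = 1/sqrt(1 - beta^2)."""
    return 1.0 / math.sqrt(1.0 - beta**2)


def collision_times(separation_m, beta):
    """Return (lab-frame collision time, proper time of either particle) in seconds.

    Each particle covers half the initial separation at speed beta*c;
    the proper time follows from time dilation, T = t / gamma.
    """
    t_lab = (separation_m / 2.0) / (beta * C)
    tau = t_lab / lorentz_factor(beta)
    return t_lab, tau


t_lab, tau = collision_times(3200.0, 0.95)
print(f"gamma = {lorentz_factor(0.95):.3f}")
print(f"t_lab = {t_lab * 1e6:.2f} microseconds")
print(f"tau   = {tau * 1e6:.2f} microseconds")
```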
Figure 2.17 In laboratory coordinates, C starts at L and collides with A at L/2.

Let’s calculate that time using length contraction, knowing (page 31) that in the rest frame of A, C approaches with speed βr = 0.9987. The Lorentz factor associated with βr is γr = 19.51. One might think that C “sees” a contracted length L/γr = 3200 m/19.51 = 164 m (for the starting separation L = 3.2 km). A would then suffer a collision after a time L/(γrβrc) = 0.55µs, not the same as our previous calculation of 1.75µs. What’s wrong with this apparently too-facile argument? As is often the case, the problem lies in simultaneity. The starting separation is speciﬁed in the lab frame; only in that frame can we say the particles are 3.2 km apart at t = 0. What length should we use?
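Let’s write down the LT between the lab frame and the IRF associated with A. The laboratory is rushing from right to left in frame A, i.e., with negative velocity. From Eq. (2.12) with β → −β,
\[
\begin{pmatrix} ct_L \\ x_L \end{pmatrix} = \gamma \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix} \begin{pmatrix} ct_A \\ x_A \end{pmatrix} . \tag{2.14}
\]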

The inverse of Eq. (2.14) is found by reversing the speed, β → −β (see Exercise 2.3). Thus, we have the equivalent LT
\[
\begin{pmatrix} ct_A \\ x_A \end{pmatrix} = \gamma \begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix} \begin{pmatrix} ct_L \\ x_L \end{pmatrix} . \tag{2.15}
\]
The location of the $t_L$ axis (in the $(t_A, x_A)$ coordinate system) is found from Eq. (2.14) by setting $x_L = 0$ ($ct_A = -\beta^{-1} x_A$); the location of the $x_L$ axis is found by setting $t_L = 0$ in Eq. (2.14) ($ct_A = -\beta x_A$). These axes are shown in Fig. 2.18.

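In the lab frame, particle C starts at $(ct_L, x_L) = (0, L)$. The coordinates in A associated with that event are found from Eq. (2.15):
\[
\begin{pmatrix} ct_A \\ x_A \end{pmatrix} = \gamma \begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix} \begin{pmatrix} 0 \\ L \end{pmatrix} = \begin{pmatrix} -\beta\gamma L \\ \gamma L \end{pmatrix} .
\]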

In A, C starts at coordinates (ctA = −βγL, xA = γL), i.e., in A the rest length is γL, because that’s what gets contracted to the length L speciﬁed in the lab frame, a frame that’s now in motion with respect to A. The time coordinate cTA = −βγL is the time difference in A between events that are simultaneous in the lab frame.
The worldline of C in Fig. 2.18 connects the event with coordinates $(-\beta\gamma L, \gamma L)$ with the collision with A at $(cT, 0)$. We can compute $T$ from the known relative velocity between A and C:
\[
\beta_r = \frac{\gamma L}{cT + \beta\gamma L} ,
\]
or
\[
cT = \frac{\gamma L}{\beta_r}\left(1 - \beta_r\beta\right) . \tag{2.16}
\]

Figure 2.18 Events of Fig. 2.17 shown in the frame of particle A. Not drawn to scale. (The worldline of C is labeled with slope $= -\beta_r$.)

Using Eq. (2.16), T = 1.75µs, the same as we found in the lab frame using time dilation.

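We can ask, what equivalent length, call it $D$, did particle C traverse at speed $\beta_r c$? That is, express $cT$ in terms of an equivalent length,
\[
cT = \frac{\gamma L}{\beta_r}\left(1 - \beta\beta_r\right) \equiv \frac{D}{\beta_r} .
\]
It is not difficult to show that
\[
1 - \beta\beta_r = \frac{1}{\gamma_r} , \tag{2.17}
\]
where $\gamma_r \equiv (1 - \beta_r^2)^{-1/2}$ (use $\beta_r = 2\beta/(1 + \beta^2)$). Thus, $D = \gamma L/\gamma_r$: particle C sees the rest length $\gamma L$ in the A frame contracted to $\gamma L/\gamma_r$.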
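The A-frame calculation can be checked numerically as follows. This is a sketch under the same inputs as the example (β = 0.95, L = 3.2 km); the variable names are illustrative and not from the text. It evaluates Eq. (2.16), tests the identity Eq. (2.17), and also reproduces the too-facile estimate for comparison.

```python
import math

C = 299_792_458.0  # speed of light in m/s (assumed exact SI value)


def gamma(beta):
    """Lorentz factor for speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta**2)


beta = 0.95
L = 3200.0  # lab-frame separation in metres

beta_r = 2.0 * beta / (1.0 + beta**2)  # speed of C as seen from A
gamma_r = gamma(beta_r)

# Event "C starts" in A's coordinates, from Eq. (2.15) applied to (0, L)
ct_start = -beta * gamma(beta) * L
x_start = gamma(beta) * L

# Eq. (2.16): collision time on A's clock
cT = gamma(beta) * L * (1.0 - beta_r * beta) / beta_r
T = cT / C

# Eq. (2.17): 1 - beta*beta_r should equal 1/gamma_r
identity_gap = (1.0 - beta * beta_r) - 1.0 / gamma_r

# Equivalent length traversed by C at speed beta_r*c
D = gamma(beta) * L / gamma_r

# The naive estimate that contracts the lab-frame length L directly
naive_T = L / (gamma_r * beta_r * C)

print(f"beta_r = {beta_r:.4f}, gamma_r = {gamma_r:.2f}")
print(f"T (Eq. 2.16) = {T * 1e6:.2f} microseconds")
print(f"naive T      = {naive_T * 1e6:.2f} microseconds")
print(f"D = {D:.1f} m, identity gap = {identity_gap:.2e}")
```

The worldline of C runs from `(ct_start, x_start)` to `(cT, 0)`, so `x_start / (cT - ct_start)` recovers `beta_r`, which is the condition used to obtain Eq. (2.16).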

2.6 FOUNDATIONAL EXPERIMENTS

2.6.1 The Michelson-Morley experiment: Isotropy of c

Exercise 1.4 is modeled on the MM experiment. The “swimmers” are beams of light, and the river is the ether, streaming past us in our reference frame at speed $v_r$. Our speed relative to the ether is unknown, but can be estimated to be on the order of the speed of Earth in its orbit around the sun, $v_r \approx 3 \times 10^4$ m s$^{-1}$. (Thus, $\beta_r \approx 10^{-4}$.) Associated with the time difference $\Delta T$ between the arms of the interferometer is a shift of $N = f\Delta T$ fringes, where $f$ is the frequency of the light. Fringes are alternating bands of light and dark seen when the beams of light in the arms of the interferometer are brought together, and correspond to alternating conditions of constructive and destructive interference. A single fringe represents one wavelength of the light source. Associated with the time difference given by Eq. (P1.2), we should expect to see $N = (L/\lambda)\beta^2$ fringe shifts implied by the motion of Earth relative to the ether. The trouble is, we can’t stop the earth to count fringe shifts. Staring through the telescope in the interferometer, there’s no “zero” marking on the fringes, implying that $N = (L/\lambda)\beta^2$ can’t be tested. Michelson came up with the idea of watching the fringe pattern as the apparatus is rotated; in that way one could actually observe the number of fringe shifts $\Delta N$. In rotating the apparatus through $\pi/2$ radians, one should, using Eq. (P1.5), expect to see $\Delta N$ fringe shifts, where
\[
\Delta N = \frac{c}{\lambda}\left(\Delta T(\pi/2) - \Delta T(0)\right) \approx \frac{\beta^2}{\lambda}\left(L_1 + L_2\right) .
\]
As the interferometer is rotated continuously, Eq. (P1.6) gives the expected change in fringe shift per radian
\[
\frac{d}{d\phi}\Delta N(\phi) = \frac{c}{\lambda}\frac{d}{d\phi}\Delta T(\phi) = \beta^2\,\frac{L_1 + L_2}{\lambda}\sin 2\phi + O(\beta^4) . \tag{2.18}
\]

In the MM experiment $L_1 + L_2 \approx 22$ m and the yellow light of sodium was used, $\lambda = 589$ nm. Michelson expected to see 0.4 fringe shift and he estimated he would have been able to observe 0.01 fringe shift. Figure 2.19 shows the fringe-shift measurements from the MM experiment (solid lines), reported as fractions of a wavelength, for various orientations of the interferometer; the upper data were taken at noon, the lower data in the evening.[15] The dashed curve is the result of Eq. (2.18) divided by eight. The number $0.05\lambda$ in Fig. 2.19 represents 0.4 fringe shift (what they expected to observe) divided by eight. Just to be clear: What they expected to observe would have been eight times as large as the dashed curve in Fig. 2.19. The area under the dashed curve, say from N (north) to E (east), is $\int_0^{\pi/2} \sin 2\phi \, d\phi = 1$.

Figure 2.19 Fringe-shift measurements from the MM experiment (solid lines) versus orientation of the interferometer.[15] Dashed curve is Eq. (2.18) divided by eight. Upper data taken at noon, lower data taken in the evening. Reprinted by permission of the American Journal of Science.

The data shown in Fig. 2.19 represent the first experiment in support of SR. Michelson and Morley stated about their results: “It seems fair to conclude from the figure that if there is any displacement due to the relative motion of the earth and the luminiferous ether, this amount cannot be much greater than 0.01 of the distance between the fringes” (which we note is in the noise of their measurements). They concluded: “It appears, from all that precedes, reasonably certain that if there be any relative motion between earth and the luminiferous ether, it must be small . . . .”
The MM experiment can be called the most successful failed experiment.13 The experiment has been repeated with ever-increasing precision, but always with the same negative result. In 1979 a group reported [17] a fringe shift of $(1.5 \pm 2.5) \times 10^{-15}$, consistent with no effect at all. In 2009, a group reported [18] a measurement of the isotropy of the speed of light, $\Delta c/c \sim 10^{-17}$. It appears we’re unable to detect motion relative to the ether, and if it can’t be measured, does it exist?
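The expected size of the effect can be reproduced from the numbers quoted above (β ≈ 10⁻⁴, L₁ + L₂ ≈ 22 m, λ = 589 nm). The sketch below uses illustrative function names that are not from the text. It evaluates the estimate ΔN ≈ β²(L₁ + L₂)/λ and, as a consistency check, integrates the leading term of Eq. (2.18) over a quarter turn with a simple midpoint rule.

```python
import math


def expected_fringe_shift(beta, arm_sum_m, wavelength_m):
    """Delta N ~ beta^2 (L1 + L2) / lambda for a rotation through pi/2."""
    return beta**2 * arm_sum_m / wavelength_m


def fringe_rate(phi, beta, arm_sum_m, wavelength_m):
    """Leading term of Eq. (2.18): d(Delta N)/d(phi) at orientation phi."""
    return beta**2 * arm_sum_m / wavelength_m * math.sin(2.0 * phi)


def integrate_midpoint(func, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of func from a to b."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


BETA = 1e-4           # Earth's orbital speed in units of c, as estimated in the text
ARM_SUM = 22.0        # L1 + L2 in metres
WAVELENGTH = 589e-9   # sodium yellow light in metres

delta_n = expected_fringe_shift(BETA, ARM_SUM, WAVELENGTH)
accumulated = integrate_midpoint(
    lambda phi: fringe_rate(phi, BETA, ARM_SUM, WAVELENGTH), 0.0, math.pi / 2
)

print(f"expected fringe shift  = {delta_n:.3f}")
print(f"integral of Eq. (2.18) = {accumulated:.3f}")
print(f"one eighth of expected = {delta_n / 8:.3f}")
```

Because the integral of sin 2φ from 0 to π/2 equals 1, the accumulated shift over a quarter turn agrees with the direct estimate of roughly 0.4 fringe.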

13For which Albert Michelson was awarded the 1907 Nobel Prize in Physics!


2.6.2 The Kennedy-Thorndike experiment: c ≠ c(v)
The Kennedy-Thorndike (KT) experiment [16] is a modification of the MM experiment where the arms of the interferometer are intentionally made as different as possible.14 Let the longitudinal arm oriented in the direction of the earth’s motion have length $L_L$, and let the transverse arm be oriented perpendicular to the longitudinal arm with length $L_T \neq L_L$.
To analyze the KT experiment, assume 1) the ether frame exists and 2) the FL contraction is real. We know these assumptions can be used to explain the results of the MM experiment; let’s see how we do with the KT experiment. Take the lengths $L_L$ and $L_T$ to be measured in the ether frame.15 The interferometer travels with speed $v$ in the ether frame. We can use Eq. (P1.1) for the round-trip times, except now the longitudinal arm has the contracted length $L_L/\gamma(v)$. The traversal time in the longitudinal direction is then $T_L = 2L_L\gamma(v)/c$, and that for the transverse arm is $T_T = 2L_T\gamma(v)/c$ (no contraction in the transverse direction; see Chapter 3). The number of fringe shifts produced as a consequence of the earth’s motion relative to the ether is
\[
N = f\left(T_L - T_T\right) = \frac{2\Delta L}{c} f\gamma(v) , \tag{2.19}
\]
where $f$ is the frequency of the light and $\Delta L \equiv L_L - L_T$. Equation (2.19) has in common with the MM experiment that it can’t be tested directly because we can’t stop the earth. In the MM experiment the apparatus was rotated to detect a shift in fringe pattern. In the KT experiment, the apparatus was firmly fixed in the laboratory; the “rotation” is provided by Earth itself, either from its daily rotation or its annual orbit around the sun. The fringe shift between Earth having velocity $v$ and $v'$ is, from Eq. (2.19),
\[
\Delta N = \frac{2\Delta L}{c} f\left(\gamma(v') - \gamma(v)\right) . \tag{2.20}
\]

Like the MM experiment, the KT experiment produced a null result within experimental uncertainties, $\Delta N \approx 0$, which from Eq. (2.20) would not seem possible because $v' \neq v$. The experimental finding can be reconciled with the prediction of Eq. (2.20) if we attribute to the ether a new ability, that of altering the frequency of light in a velocity-dependent manner. For Eq. (2.20) to produce $\Delta N = 0$, it must be true that $f'\gamma(v') = f\gamma(v)$, where $f'$ and $f$ are frequencies that have been altered (by motion through the ether) relative to the frequency measured in the ether frame, call it $f_0$. That is, for Eq. (2.20) to produce a null result, it must be the case that $f'\gamma(v') = f\gamma(v) = f_0$.
On the basis of the ether model we require, to explain the MM experiment, the ability of the ether to contract objects in the direction of motion, and, to explain the KT experiment, the ability of the ether to modify the vibrational frequencies of systems in motion, all so that it (the ether) can evade detection! On the other hand, Einstein’s hypothesis, that the speed of light is the same in all IRFs, naturally accounts for the MM experiment because there’s no preferred orientation of IRFs, and the KT experiment because all IRFs in relative uniform motion are equivalent, in addition to SR making numerous other predictions. The MM experiment shows that the round-trip time required for light to cover a distance in free space is independent of direction, i.e., the speed of light is isotropic, while the KT experiment shows that the round-trip time for light to cover a distance is independent of the velocity of the source. Thus, the speed of light is independent of the velocity of the source. The ether model requires length contraction and time dilation for motion with respect to a unique reference frame, the ether, whereas in SR these relations are symmetric between IRFs: All inertial observers see rods in motion contracted and moving clocks run slow—Chapter 4.
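To get a feel for the size of the shift that the ether model with FL contraction predicts in Eq. (2.20), one can evaluate it numerically. The sketch below is illustrative only: the arm-length difference, wavelength, and the two ether-frame speeds are hypothetical values chosen here, not figures from the text, and the light frequency is taken as f = c/λ.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma_minus_one(v):
    """gamma(v) - 1, arranged to avoid cancellation when v << c."""
    beta2 = (v / C) ** 2
    s = math.sqrt(1.0 - beta2)
    return beta2 / (s * (1.0 + s))


def kt_fringe_shift(delta_l, wavelength, v, v_prime):
    """Eq. (2.20) with f = c / wavelength: (2 dL / lambda) (gamma(v') - gamma(v))."""
    return (2.0 * delta_l / wavelength) * (gamma_minus_one(v_prime) - gamma_minus_one(v))


# Hypothetical illustrative values (assumptions of this sketch, not from the text)
DELTA_L = 0.16        # arm-length difference in metres
WAVELENGTH = 546e-9   # wavelength in metres
V = 3.0e4             # speed relative to the ether at one epoch, m/s
V_PRIME = 6.0e4       # speed relative to the ether at another epoch, m/s

shift = kt_fringe_shift(DELTA_L, WAVELENGTH, V, V_PRIME)

# Low-speed approximation: gamma(v') - gamma(v) ~ (v'^2 - v^2) / (2 c^2)
approx = (DELTA_L / WAVELENGTH) * (V_PRIME**2 - V**2) / C**2

print(f"predicted shift (Eq. 2.20) = {shift:.3e} fringes")
print(f"low-speed approximation    = {approx:.3e} fringes")
```

The prediction vanishes only if the two speeds are equal or the arms have equal length, which is why a null result with unequal arms calls for the additional frequency hypothesis described above.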

14The MM experiment was carefully crafted to have equal-length arms.
15Such an assumption renders these quantities unknowable (that’s the problem with absolute space that “makes no impression on our senses”), but we’re assuming the ether frame to exist; no conclusion will depend on the value of $\Delta L$.

SUMMARY
Spacetime diagrams were used to illustrate the basic effects of SR, the “ingredients” used in descriptions of processes in space and time: time dilation, length contraction, simultaneity, the Doppler effect, and the velocity addition formula. These phenomena are interrelated—time dilation in one frame can be explained as length contraction in another. It’s not always clear which effect is most appropriate to use in analyzing a given problem. Can these effects be viewed from a uniﬁed perspective? Yes, and there are two ways of going about that. The ﬁrst is to consider how all the spacetime coordinates change between IRFs; this is accomplished using the LT. With the LT at our disposal, we can focus on the relevant events involved in a given problem. We present a systematic derivation of the LT in the next chapter. The second way is to exploit relativistic invariance, to focus on what does not change between frames, the subject of Chapter 4.

EXERCISES

2.1 Derive Eq. (2.10) from Eq. (2.9) using Eq. (2.3) for each of the k-factors.

2.2 Derive Eq. (2.17). Use that $\beta_r = 2\beta/(1 + \beta^2)$.

2.3 The inverse of Eq. (2.12) is found by setting β → −β. Show that
\[
\gamma^2 \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix} \begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .
\]

2.4 Derive the k-factor from the Lorentz transformation. Referring to Fig. 2.20, a photon is emitted at time $T$ and received in a frame moving away at time $kT$. The Lorentz transformation relates the coordinates assigned to the same event:
\[
\begin{pmatrix} ct \\ x \end{pmatrix} = \gamma \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix} \begin{pmatrix} kcT \\ 0 \end{pmatrix} .
\]
We’ve used the inverse transformation. The time $t$ is the time $T$ plus the time for the photon to travel the distance $x$, $t = T + x/c$. Show that $k$ is given by Eq. (2.3).

Figure 2.20 Derive the k-factor using the Lorentz transformation.

CHAPTER 3
Lorentz transformation, I

We provide a systematic derivation of the Lorentz transformation1 (LT) and examine its kinematical consequences, what can be said without taking into account the causes of motion. (Relativistic dynamics is taken up in Chapter 7.) We derive the LT first for frames in standard configuration (defined below) and then for frames not in standard configuration. Along the way, the velocity addition formula emerges as a bonus.
We make two assumptions about space and time (appropriate for SR), that space is isotropic (all directions are equivalent) and that spacetime is homogeneous (no location or instant of time is preferred).2 These concepts are distinct: Isotropy does not necessarily imply homogeneity, nor does homogeneity necessarily imply isotropy. One could have homogeneous, anisotropic spaces (a crystalline environment, for example, where one direction is preferred over the others), and one could have inhomogeneous, isotropic spaces (all spatial directions equivalent, yet a special location of the origin—a set of concentric spheres about a speciﬁed origin). Remarkably, these two assumptions together with the principle of relativity sufﬁce to determine the LT.

3.1 FRAMES IN STANDARD CONFIGURATION
Let IRFs $S$ and $S'$ be in relative motion with velocity $\mathbf{v}$. Whatever is the direction of $\mathbf{v}$, it is by assumption constant (IRF); let $\mathbf{v}$ define the direction of the x-axis. The observers synchronize their clocks when the origins of their coordinate systems coincide, i.e., where and when they have a common spacetime origin. Frames in standard configuration move along their common x-axis with their y and z-axes parallel, as shown in Fig. 3.1.

Figure 3.1 Frames in standard configuration. $S'$ moves to the right along the common x-axis with y and z-axes parallel.

1The LT is derived in Appendix A as the linear transformation that preserves the form of the wave equation. The same is derived in Section 2.4 using the k-factor method. The more ways you have of looking at something, the better.
2In GR, spacetime is neither homogeneous nor isotropic; the gravitational ﬁeld results from the curvature of spacetime.

3.1.1 General form of the Lorentz transformation
The LT, symbolized L(v), is a linear mapping between the spacetime coordinates assigned to events by different inertial observers. All inertial observers see straight worldlines for free particles, and straight lines are preserved under homogeneous, linear mappings.3 The most general linear homogeneous mapping between four-dimensional spaces has 16 parameters. For frames in standard configuration, that number can be reduced considerably by invoking homogeneity and isotropy. Write L(v) as a 4 × 4 matrix containing four unknown functions of v, α(v), δ(v), γ(v), η(v):
\[
\begin{pmatrix} \theta t' \\ x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix}
\alpha(v) & \delta(v)\theta & 0 & 0 \\
-v\gamma(v)/\theta & \gamma(v) & 0 & 0 \\
0 & 0 & \eta(v) & 0 \\
0 & 0 & 0 & \eta(v)
\end{pmatrix}
\begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix}
\equiv L(v) \begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix} . \tag{3.1}
\]
We’ve introduced in Eq. (3.1) an unknown parameter $\theta$ having the dimension of speed. We argued in Section 1.4 that the principle of relativity requires a universal speed, the same in all IRFs.4 Let $\theta$ represent that speed; experiment will show that $\theta = c$.
We indicate mathematically that $L$ is a mapping from the coordinates of $S$ to those of $S'$ with the notation $L : S \to S'$. That is, $L$ associates an element of $S$ with an element5 of $S'$. Denote the matrix in Eq. (3.1) in block form: $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$. Block $B = 0$ because of isotropy: We’re free to orient the y and z-axes however we choose; the assignment of $x'$ and $t'$ can depend only on the relative speed and not on the orientation of y and z, otherwise clocks situated differently around the x-axis would show different times in violation of the assumption of isotropy. Block $C = 0$ because of homogeneity: The assignment of $y'$ and $z'$ can’t depend on the choice of spacetime origin. Block $D$ is diagonal because frames in standard configuration have parallel y and z-axes. The coefficients $\eta(v)$ are the same for y and z because of isotropy; we’ll show that $\eta(v) = 1$. In block $A$ there are functions $\alpha(v)$ and $\delta(v)$ in the equation for $t'$, but only one independent function $\gamma(v)$ in the equation for $x'$, because the location of $x' = 0$ (in $S$) is described by $x = vt$.

3.1.2 What if $S'$ moves to the left?
We could equally well consider the motion of $S'$ along the negative x-axis (Fig. 3.2), in which case the LT would follow from Eq. (3.1) by letting $t \to t$, $x \to -x$, $y \to -y$, $z \to z$, $t' \to t'$, $x' \to -x'$, $y' \to -y'$, and $z' \to z'$:
\[
\begin{pmatrix} \theta t' \\ -x' \\ -y' \\ z' \end{pmatrix} =
\begin{pmatrix}
\alpha(-v) & \delta(-v)\theta & 0 & 0 \\
v\gamma(-v)/\theta & \gamma(-v) & 0 & 0 \\
0 & 0 & \eta(-v) & 0 \\
0 & 0 & 0 & \eta(-v)
\end{pmatrix}
\begin{pmatrix} \theta t \\ -x \\ -y \\ z \end{pmatrix}
\equiv L(-v) \begin{pmatrix} \theta t \\ -x \\ -y \\ z \end{pmatrix} . \tag{3.2}
\]

Figure 3.2 Motion of $S'$ along negative x-axis of $S$.

3That is, lines that pass through the origin.
4This is because, as we’ll show, LTs have the property that a LT followed by a LT is itself a LT.
5The level of mathematical maturity is only going to increase from here on. Don’t fight it; mathematics is in your future lightcone.

By changing the sign of x but keeping the sense of time unchanged (the “orientation” of time), $v \to -v$. We’ve changed the signs of $y$ and $y'$ for cosmetic purposes: to keep $S$ and $S'$ right-handed systems when we reverse the sense of the $x$ and $x'$-axes. It’s as if we’ve rotated Fig. 3.1 180 degrees about the z-axis in $S$ to produce Fig. 3.2.
Equation (3.2) can be derived from Eq. (3.1) by defining a “reverse” operator,
\[
R \equiv \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} .
\]

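Apply $R$ to Eq. (3.1):
\[
R \begin{pmatrix} \theta t' \\ x' \\ y' \\ z' \end{pmatrix}
= R L(v) \begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix}
= R L(v) R^{-1} R \begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix} . \tag{3.3}
\]
Comparing Eqs. (3.3) and (3.2), we require that $R L(v) R^{-1} = L(-v)$. By working out $R L(v) R^{-1}$ (do it!), we learn that $\alpha(v)$, $\gamma(v)$, and $\eta(v)$ must be even functions, whereas $\delta(v)$ is an odd function. An odd function of $v$ can be written $f_{\mathrm{odd}}(v) = v f_{\mathrm{even}}(v)$. Let’s represent $\delta(v)$ in terms of the even function $\alpha(v)$, $\theta\delta(v) = -v\alpha(v)/f(v)$, where $f(v)$ is an unknown even function having the dimension of speed. The mapping $L(v) : S \to S'$ can now be parameterized
\[
\begin{pmatrix} \theta t' \\ x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix}
\alpha(v) & -v\alpha(v)/f(v) & 0 & 0 \\
-v\gamma(v)/\theta & \gamma(v) & 0 & 0 \\
0 & 0 & \eta(v) & 0 \\
0 & 0 & 0 & \eta(v)
\end{pmatrix}
\begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix}
= L(v) \begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix} . \tag{3.4}
\]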

We still have four functions of v to determine: α(v), f (v), γ(v), η(v).
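The “do it!” for $R L(v) R^{-1}$ can be tried numerically. In the sketch below the entries of Eq. (3.1) are given arbitrary numerical values (all names and numbers are illustrative assumptions, not from the text). Conjugating with $R$ flips the sign of the two off-diagonal entries of block A and leaves everything else alone, which is exactly the matrix obtained from Eq. (3.1) by sending $v \to -v$ and $\delta \to -\delta$ while keeping α, γ, η fixed.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def general_lt(alpha, delta, gam, eta, v, theta):
    """Matrix of Eq. (3.1) for given numerical values of its entries."""
    return [
        [alpha, delta * theta, 0.0, 0.0],
        [-v * gam / theta, gam, 0.0, 0.0],
        [0.0, 0.0, eta, 0.0],
        [0.0, 0.0, 0.0, eta],
    ]


# Reverse operator R = diag(1, -1, -1, 1); note that R is its own inverse
R = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Arbitrary illustrative values for the unknown functions at one speed v
alpha, delta, gam, eta, v, theta = 1.3, 0.7, 1.1, 0.9, 0.4, 2.0

conjugated = matmul(matmul(R, general_lt(alpha, delta, gam, eta, v, theta)), R)

# L(-v) if alpha, gamma, eta are even in v and delta is odd in v
reversed_lt = general_lt(alpha, -delta, gam, eta, -v, theta)

for row in conjugated:
    print(row)
```

Requiring `conjugated` to equal $L(-v)$ for every $v$ is what forces the parity assignments stated above.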

3.1.3 Inverse transformation

If frame $S$ sees $S'$ moving away with speed $v$ to the right, $S'$ sees $S$ moving away with speed $v$ to the left. We’ll call this the inverse transformation. By the principle of relativity, the mapping $S' \to S$ must be of the same form as $L(v)$ in Eq. (3.4), except for $v \to -v$ (see Fig. 3.3):
\[
\begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix}
= L(-v) \begin{pmatrix} \theta t' \\ x' \\ y' \\ z' \end{pmatrix}
= L(-v) L(v) \begin{pmatrix} \theta t \\ x \\ y \\ z \end{pmatrix} , \tag{3.5}
\]

where we’ve used Eq. (3.4). From Eq. (3.5), it must be the case that $L(-v)L(v) = I$, the identity mapping, and hence $L^{-1}(v) = L(-v)$. The inverse LT is the original LT with the sign of the velocity reversed. Working out $L(-v)L(v)$ (do it!), we find
\[
L(-v)L(v) =
\begin{pmatrix}
\alpha\left(\alpha - v^2\gamma/(\theta f)\right) & (\gamma - \alpha)v\alpha/f & 0 & 0 \\
(v\gamma/\theta)(\alpha - \gamma) & \gamma\left(\gamma - v^2\alpha/(\theta f)\right) & 0 & 0 \\
0 & 0 & \eta^2 & 0 \\
0 & 0 & 0 & \eta^2
\end{pmatrix} . \tag{3.6}
\]

Figure 3.3 Inverse transformation: Motion of $S$ along negative $x'$-axis.

For the right side of Eq. (3.6) to be the unit matrix we require $\eta(v) = \pm 1$. Because $\eta(0) = 1$, we have $\eta = 1$. Thus, coordinates transverse to the motion are unaffected. For the off-diagonal terms in Eq. (3.6) to vanish, we require $\alpha(v) = \gamma(v)$, implying that $\gamma(v) = \left[1 - v^2/(\theta f(v))\right]^{-1/2}$. Thus, the LT for frames in standard configuration has the form
\[
L(v) =
\begin{pmatrix}
\gamma(v) & -v\gamma(v)/f(v) & 0 & 0 \\
-v\gamma(v)/\theta & \gamma(v) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} . \tag{3.7}
\]

There’s still f (v) and θ to determine.
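As a quick numerical check of Eq. (3.7), the sketch below multiplies the upper-left 2 × 2 blocks of $L(-v)$ and $L(v)$ and confirms that the product is the identity. The values of θ, f, and v are arbitrary placeholders chosen here (with f held fixed at this speed), since neither θ nor f has been determined at this stage.

```python
import math


def lt_block(v, theta, f):
    """Upper-left 2x2 block of Eq. (3.7), acting on the column (theta*t, x)."""
    g = 1.0 / math.sqrt(1.0 - v**2 / (theta * f))
    return [[g, -v * g / f],
            [-v * g / theta, g]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Arbitrary illustrative constants with the dimension of speed (assumptions)
THETA, F, V = 2.0, 3.0, 0.5

product = matmul2(lt_block(-V, THETA, F), lt_block(V, THETA, F))
for row in product:
    print(row)
```

The diagonal entries reduce to $\gamma^2\left(1 - v^2/(\theta f)\right) = 1$ and the off-diagonal entries cancel, independent of the particular values of θ and f.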

3.1.4 Group property
All IRFs are equivalent. If we transform from $S$ to $S'$, and then from $S'$ to $S''$, the net effect must be the same as a single transformation from $S$ to $S''$; this is the group property. We’ll show that the group property requires $f(v)$ to be a constant; it will also establish the Einstein velocity addition theorem. Using Eq. (3.7), transforming from $S$ to $S'$,
\[
t' = \gamma(v_1)\left[t - \frac{v_1 x}{\theta f(v_1)}\right] \qquad x' = \gamma(v_1)(x - v_1 t) , \tag{3.8}
\]
where $v_1$ is the speed of $S'$ as seen from $S$. Transforming from $S'$ to $S''$,
\[
t'' = \gamma(v_2)\left[t' - \frac{v_2 x'}{\theta f(v_2)}\right] \qquad x'' = \gamma(v_2)(x' - v_2 t') , \tag{3.9}
\]
where $v_2$ is the speed of $S''$ as seen from $S'$. Substitute Eq. (3.8) in Eq. (3.9). We find
\[
\begin{aligned}
t'' &= \gamma(v_1)\gamma(v_2)\left(1 + \frac{v_1 v_2}{\theta f(v_2)}\right)\left[t - \left(1 + \frac{v_1 v_2}{\theta f(v_2)}\right)^{-1}\left(\frac{v_1}{f(v_1)} + \frac{v_2}{f(v_2)}\right)\frac{x}{\theta}\right] \\
x'' &= \gamma(v_1)\gamma(v_2)\left(1 + \frac{v_1 v_2}{\theta f(v_1)}\right)\left[x - \left(1 + \frac{v_1 v_2}{\theta f(v_1)}\right)^{-1}(v_1 + v_2)\,t\right] .
\end{aligned} \tag{3.10}
\]
By the principle of relativity, Eq. (3.10) must be equivalent to a LT from $S$ to $S''$. Equation (3.10) must therefore have the same form as Eq. (3.8) for some speed $w$, the speed of $S''$ as seen from $S$:
\[
t'' = \gamma(w)\left[t - \frac{w x}{\theta f(w)}\right] \qquad x'' = \gamma(w)(x - w t) . \tag{3.11}
\]

The factors multiplying the square brackets in Eq. (3.10) must be identical (so that Eq. (3.10) has the same form as Eq. (3.11)), implying that $f(v_1) = f(v_2)$ or that $f(v)$ is a constant; call it $f$. With $f(v) = f$, Eq. (3.10) simplifies:
\[
\begin{aligned}
t'' &= \gamma(v_1)\gamma(v_2)\left(1 + \frac{v_1 v_2}{\theta f}\right)\left[t - \frac{x}{f\theta}\,\frac{v_1 + v_2}{1 + v_1 v_2/(\theta f)}\right] \\
x'' &= \gamma(v_1)\gamma(v_2)\left(1 + \frac{v_1 v_2}{\theta f}\right)\left[x - \frac{v_1 + v_2}{1 + v_1 v_2/(\theta f)}\,t\right] .
\end{aligned} \tag{3.12}
\]
Comparing Eq. (3.12) with Eq. (3.11) suggests that the compound speed $w$ is given by
\[
w = \frac{v_1 + v_2}{1 + v_1 v_2/(\theta f)} . \tag{3.13}
\]
Equation (3.13) is more than a suggestion, however; it will be a requirement if it can be shown that
\[
\gamma(w) = \gamma(v_1)\gamma(v_2)\left(1 + \frac{v_1 v_2}{\theta f}\right) \tag{3.14}
\]

when w is given by Eq. (3.13). You’re going to show (Exercise 3.1) that Eq. (3.14) is an identity when the compound speed is given by Eq. (3.13) for any θ and f . We therefore have the form of the velocity addition formula and the LT, except for the constants θ and f .
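Before working Exercise 3.1 analytically, the group property can be spot-checked numerically. The sketch below composes two boosts using the 2 × 2 block of Eq. (3.7) with constant f and compares the result with a single boost at the compound speed of Eq. (3.13). The constants and speeds are arbitrary illustrative values, and a numerical check at one point is of course not a proof.

```python
import math


def gamma(v, theta, f):
    """gamma(v) = [1 - v^2/(theta f)]^(-1/2) for constant f."""
    return 1.0 / math.sqrt(1.0 - v**2 / (theta * f))


def lt_block(v, theta, f):
    """Upper-left 2x2 block of Eq. (3.7), acting on the column (theta*t, x)."""
    g = gamma(v, theta, f)
    return [[g, -v * g / f],
            [-v * g / theta, g]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def compose_speed(v1, v2, theta, f):
    """Compound speed of Eq. (3.13)."""
    return (v1 + v2) / (1.0 + v1 * v2 / (theta * f))


# Arbitrary illustrative values (assumptions of this sketch)
THETA, F = 2.0, 3.0
V1, V2 = 0.5, 0.8

w = compose_speed(V1, V2, THETA, F)
two_step = matmul2(lt_block(V2, THETA, F), lt_block(V1, THETA, F))  # S -> S' -> S''
one_step = lt_block(w, THETA, F)                                    # S -> S'' directly

lhs = gamma(w, THETA, F)
rhs = gamma(V1, THETA, F) * gamma(V2, THETA, F) * (1.0 + V1 * V2 / (THETA * F))

print(f"w = {w:.6f}")
print(f"gamma(w) = {lhs:.6f}, right side of Eq. (3.14) = {rhs:.6f}")
```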

3.1.5 Existence of a limiting speed
The velocity addition formula Eq. (3.13) implies the existence of a universal limiting speed, which we denote for now as $\psi$. Let $v_1$ and $v_2$ both be equal to $\psi$. We have from Eq. (3.13)
\[
w = \frac{2\psi}{1 + \psi^2/(\theta f)} .
\]
In order for $w = \psi$, we must have $\psi^2 = \theta f$. If $v_1 = \psi$, Eq. (3.13) (with $\theta f = \psi^2$) implies $w = \psi$ for any $v_2$. If $v_1 = \psi - \mu_1$ and $v_2 = \psi - \mu_2$, with $\mu_1 \geq 0$ and $\mu_2 \geq 0$, Eq. (3.13) implies that $w \leq \psi$ for any $\mu_1$, $\mu_2$, with equality holding for $\mu_1$ or $\mu_2$ equal to zero or both (see Exercise 3.2).
It might seem that Eq. (3.13) implies three universal speeds, θ, f , and ψ. Simplicity emerges if θ = ψ, which, because ψ2 = θf , implies that f = θ. In that case there is a symmetry in the LT—see Eq. (3.7)—the space and time variables transform in an equivalent way.
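The behavior of the limiting speed is easy to see numerically. The following sketch (with ψ set to 1 in arbitrary units and an illustrative function name) repeatedly compounds a speed of 0.9ψ using Eq. (3.13) with θf = ψ²: the result creeps toward ψ but never reaches it, while compounding ψ itself with any other speed returns ψ.

```python
def compose_speed(v1, v2, psi):
    """Eq. (3.13) with theta*f = psi^2."""
    return (v1 + v2) / (1.0 + v1 * v2 / psi**2)


PSI = 1.0  # limiting speed in arbitrary units (an assumption of this sketch)

w = 0.0
history = []
for _ in range(5):
    w = compose_speed(w, 0.9 * PSI, PSI)  # boost again by 0.9 psi
    history.append(w)

for step, speed in enumerate(history, start=1):
    print(f"after {step} boosts: w/psi = {speed / PSI:.10f}")

print("psi composed with 0.3 psi:", compose_speed(PSI, 0.3 * PSI, PSI))
```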

3.1.6 Value of the limiting speed

The value of the limiting speed must be found experimentally. Figure 3.4 shows four data points for measured speeds $\beta$ and kinetic energies $E_k$ of electrons.[19] The solid line represents the prediction of SR and the dashed line is the Newtonian prediction. We’ll show (in Chapter 7) that kinetic energy is related to speed through the relation $E_k = (\gamma - 1)mc^2$, implying that
\[
\beta^2 = 1 - \left(1 + E_k/mc^2\right)^{-2} ,
\]
which is plotted in Fig. 3.4 as the solid curve. For low speeds $\beta^2 \approx 2E_k/mc^2$, which is shown as a dashed line. The data clearly show the existence of a limiting speed, in accord with the predictions of SR and completely at odds with Newtonian mechanics, with the limiting speed equal to the speed of light within experimental accuracy. This experiment was repeated at much higher energies, up to 20 GeV, with the limiting speed found to equal c within 2 parts in $10^7$.[20] Note that for an energy of 20 GeV, the abscissa in Fig. 3.4 would extend to the right by a factor of 4000.
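The two curves of Fig. 3.4 can be tabulated directly from the formulas above. The sketch below uses illustrative function names and takes the electron rest energy as 0.511 MeV (a standard value, not quoted in the text) to estimate how far the abscissa would have to extend at 20 GeV.

```python
def beta2_relativistic(x):
    """beta^2 as a function of x = E_k / (m c^2), from E_k = (gamma - 1) m c^2."""
    return 1.0 - (1.0 + x) ** -2


def beta2_newtonian(x):
    """Newtonian prediction beta^2 = 2 E_k / (m c^2)."""
    return 2.0 * x


print(" E_k/mc^2   relativistic   Newtonian")
for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"{x:9.1f}   {beta2_relativistic(x):12.4f}   {beta2_newtonian(x):9.1f}")

ELECTRON_REST_ENERGY_MEV = 0.511   # standard value, assumed here
ratio_at_20_gev = 20_000.0 / ELECTRON_REST_ENERGY_MEV
axis_extension = ratio_at_20_gev / 10.0   # Fig. 3.4 abscissa ends at E_k/mc^2 = 10
print(f"E_k/mc^2 at 20 GeV is about {ratio_at_20_gev:.0f}, "
      f"i.e. roughly {axis_extension:.0f} times the plotted range")
```

The relativistic values stay below 1 for every energy, while the Newtonian line exceeds 1 as soon as $E_k > mc^2/2$.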
Taking θ = f = c as consistent with experiment, we have the LT from Eq. (3.8)

t = γ(t − vx/c2) x = γ(x − vt) y = y z = z ,

(3.15)

Figure 3.4 Measured speeds [19] (black dots) versus kinetic energy of electrons. [Plot of $\beta^2$ (vertical axis, 0 to 1.5) against $E_k/mc^2$ (horizontal axis, 0 to 10), with curves labeled Newtonian and Relativistic.]

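the same as Eq. (A.6) and Eq. (2.12), while the velocity addition formula from Eq. (3.13),

\[ w = \frac{v_1 + v_2}{1 + v_1 v_2/c^2} , \tag{3.16} \]

is the same as Eq. (2.10). Multiply the time equation in Eq. (3.15) by $c$, and the LT for frames in standard configuration is

\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} . \tag{3.17} \]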

3.1.7 Why c? Do photons have mass?

The question naturally arises why the speed of light is the limiting speed. While there’s no definitive answer, the only particles that travel at the speed of light are those with zero rest mass. In SR the connection between energy and momentum is, as we’ll show, $E^2 = (pc)^2 + (mc^2)^2$. If $m = 0$, then

\[ |\mathbf{p}| = E/c . \qquad (m = 0) \tag{3.18} \]

As we show in Chapter 7, in SR, $\mathbf{p} = \gamma m\mathbf{v}$ and $E = \gamma mc^2$. Eliminating $\gamma m$ between these equations, we have the general formula, valid for any $m$

\[ \mathbf{p} = E\mathbf{v}/c^2 . \qquad (\text{any } m) \tag{3.19} \]

Equation (3.19) is compatible with Eq. (3.18) only if $|\mathbf{v}| = c$ for $m = 0$. Does the photon have a rest mass, $m_\gamma$? Experiment places an upper bound on a possible photon mass,6 $m_\gamma < 10^{-18}$ eV/$c^2$. While this bound is extremely small (24 orders of magnitude smaller than the electron mass, and 18 orders of magnitude smaller than the neutrino mass), if $m_\gamma \neq 0$ the speed of light would not be identical with the limiting speed $\psi$ implied by the LT. Photons have momentum because they have energy, even though they have zero mass. That photons act as particles carrying energy and momentum is verified in Compton scattering experiments.
The LT contains a finite universal limiting speed (the same in all IRFs) which we’ve identified with the speed of light in vacuum. The universality of $c$ follows from the principle of relativity; it does not have to be postulated. If $c \neq \psi$, photons would have a nonzero rest mass, and $c$ would not be universal. Einstein’s postulate of the universality of $c$ is equivalent to the photon having zero mass.

6A good source of information on the properties of elementary particles is the Particle Data Group, maintained online.

3.1.8 Discussion
Let’s take a moment and review the essentials of the derivation just given. A linear mapping between four-dimensional spaces would have 16 parameters in general. For frames in standard configuration, that number reduces to four independent parameters when homogeneity and isotropy of spacetime are assumed, Eq. (3.4). When the principle of relativity is invoked, leading to $L^{-1}(v) = L(-v)$ (the mapping from $S' \to S$ is the same as that from $S \to S'$ with $v \to -v$), the LT takes the form of Eq. (3.7) containing $f(v)$ and $\theta$. Invoking the principle of relativity again, that a LT followed by a LT is itself a LT (all IRFs are equivalent), we find that $f(v) = f$, a constant. Comparison with experiment establishes that the limiting speed $\theta = f = c$, leaving us with the LT in the form of Eq. (3.17). That the LT can be derived under such general assumptions lends considerable support to the correctness of SR. In fact, it might lead one to wonder where all the non-intuitive “weirdness” associated with SR comes from; where did we take a “radical” step? It seems that the radical step, if it can be considered such, is in the inclusion of a separate time for each IRF, that time can no longer be considered absolute, that it too is relative to the frame of reference. Once we sign off on the idea that physics is most naturally viewed from the perspective of four-dimensional spacetime, the rest is the equivalence of IRFs, something that has long been known from the law of inertia. SR is the law of inertia expressed in spacetime.
3.2 FRAMES NOT IN STANDARD CONFIGURATION
Up to now our picture of reference frames in relative motion has been that of Fig. 3.1. Because the relative velocity v is constant (IRFs), frames in standard conﬁguration sufﬁce for many purposes, where the x-axis is aligned with v. There are occasions, however, when we need the LT between reference frames having a more general relationship.
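To derive the LT for a general boost (see Fig. 1.1), where $\mathbf{v}$ is not aligned with a coordinate axis, express the position vector $\mathbf{r}$ as a sum of vectors parallel and perpendicular to $\mathbf{v}$, $\mathbf{r} = \mathbf{r}_\parallel + \mathbf{r}_\perp$ (see Fig. 3.5). The vector $\mathbf{r}_\parallel$ is the projection of $\mathbf{r}$ onto $\mathbf{v}$,

\[ \mathbf{r}_\parallel = (\hat{\mathbf{v}} \cdot \mathbf{r})\,\hat{\mathbf{v}} = (\mathbf{v} \cdot \mathbf{r})\,\frac{\mathbf{v}}{v^2} , \tag{3.20} \]

where $\hat{\mathbf{v}} \equiv \mathbf{v}/v$ is a unit vector. The vector $\mathbf{r}_\perp$ is by definition $\mathbf{r}_\perp = \mathbf{r} - \mathbf{r}_\parallel$. For the components of $\mathbf{r}_\parallel$, the already known LT applies, while the components of $\mathbf{r}_\perp$ are unchanged. From Eq. (3.15),

\[ t' = \gamma\left(t - \mathbf{v} \cdot \mathbf{r}_\parallel/c^2\right) \qquad \mathbf{r}'_\parallel = \gamma(\mathbf{r}_\parallel - \mathbf{v}t) \qquad \mathbf{r}'_\perp = \mathbf{r}_\perp . \tag{3.21} \]

We then have, using Eq. (3.21),

\[ \mathbf{r}' = \mathbf{r}'_\parallel + \mathbf{r}'_\perp = \gamma(\mathbf{r}_\parallel - \mathbf{v}t) + \mathbf{r}_\perp = \gamma(\mathbf{r}_\parallel - \mathbf{v}t) + \mathbf{r} - \mathbf{r}_\parallel = \mathbf{r} + (\gamma - 1)\mathbf{r}_\parallel - \gamma\mathbf{v}t . \tag{3.22} \]

Figure 3.5 Decomposition of $\mathbf{r} = \mathbf{r}_\parallel + \mathbf{r}_\perp$ into vectors parallel and perpendicular to $\mathbf{v}$.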


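Combining Eq. (3.20) with Eqs. (3.21) and (3.22), we have the vector form of the LT,

\[ \begin{aligned} t' &= \gamma\left(t - \mathbf{r} \cdot \mathbf{v}/c^2\right) \\ \mathbf{r}' &= \mathbf{r} + (\gamma - 1)\frac{(\mathbf{r} \cdot \mathbf{v})\mathbf{v}}{v^2} - \gamma\mathbf{v}t = \mathbf{r} + \frac{\gamma^2}{c^2(1 + \gamma)}(\mathbf{r} \cdot \mathbf{v})\mathbf{v} - \gamma\mathbf{v}t \\ &= \gamma(\mathbf{r} - \mathbf{v}t) + \frac{\gamma^2}{c^2(1 + \gamma)}\,\mathbf{v} \times (\mathbf{v} \times \mathbf{r}) , \end{aligned} \tag{3.23} \]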

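where we’ve used $\mathbf{A} \times (\mathbf{B} \times \mathbf{C}) = \mathbf{B}(\mathbf{A} \cdot \mathbf{C}) - \mathbf{C}(\mathbf{A} \cdot \mathbf{B})$ in the last line. Referring all vectors to the $(x, y, z)$ basis, Eq. (3.23) can be expressed as a matrix equation,

\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta_x\gamma & -\beta_y\gamma & -\beta_z\gamma \\ -\beta_x\gamma & 1 + \alpha\beta_x^2 & \alpha\beta_x\beta_y & \alpha\beta_x\beta_z \\ -\beta_y\gamma & \alpha\beta_y\beta_x & 1 + \alpha\beta_y^2 & \alpha\beta_y\beta_z \\ -\beta_z\gamma & \alpha\beta_z\beta_x & \alpha\beta_z\beta_y & 1 + \alpha\beta_z^2 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} , \tag{3.24} \]

where $\alpha \equiv \gamma^2/(1 + \gamma)$. If $\beta_y = \beta_z = 0$ and $\beta_x = \beta$, Eq. (3.24) reduces to Eq. (3.17). Note the symmetry of the matrix in Eq. (3.24), which arises because boosts connect frames having parallel coordinate axes.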
Equation (3.24) is therefore not the most general LT, because it prescribes transformations among a particular class of reference frames—those having parallel coordinate axes. We show in Chapter 6 that an arbitrary LT can be represented as a rotation followed by a boost. Rotations are described by three angles and boosts are described by three velocity components. The most general LT requires six parameters to be completely speciﬁed.
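The general boost matrix of Eq. (3.24) can be built and checked in a few lines. The following is a sketch in plain Python (nested lists rather than a matrix library); the helper names `boost_matrix`, `apply`, and `interval` are illustrative.

```python
import math


def boost_matrix(bx: float, by: float, bz: float) -> list:
    """4x4 boost of Eq. (3.24) for beta = (bx, by, bz), acting on (ct, x, y, z)."""
    b2 = bx * bx + by * by + bz * bz
    if b2 >= 1.0:
        raise ValueError("|beta| must be less than 1")
    gamma = 1.0 / math.sqrt(1.0 - b2)
    alpha = gamma**2 / (1.0 + gamma)
    b = (bx, by, bz)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0] = gamma
    for i in range(3):
        L[0][i + 1] = L[i + 1][0] = -gamma * b[i]
        for j in range(3):
            L[i + 1][j + 1] = (1.0 if i == j else 0.0) + alpha * b[i] * b[j]
    return L


def apply(L: list, event: list) -> list:
    """Matrix-vector product L @ event."""
    return [sum(L[i][j] * event[j] for j in range(4)) for i in range(4)]


def interval(event: list) -> float:
    """Spacetime separation from the origin, -(ct)^2 + x^2 + y^2 + z^2."""
    ct, x, y, z = event
    return -ct**2 + x**2 + y**2 + z**2


L = boost_matrix(0.3, -0.4, 0.5)
event = [2.0, 1.0, -3.0, 0.5]
print(interval(event), interval(apply(L, event)))
```

The two printed separations should agree to rounding error, and calling `boost_matrix(beta, 0, 0)` should reproduce the nonzero entries of Eq. (3.17).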

3.3 TRANSFORMATION OF VELOCITY AND ACCELERATION
The LT is a linear mapping between the coordinates of IRFs in relative motion. Velocity and acceleration involve ratios of differences between space and time coordinates. We can use the LT to “build” the transformation equations for these quantities.7

3.3.1 Velocity transformation

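Let $S'$ move relative to $S$ with constant velocity $\mathbf{v}$. Let $\mathbf{u} = d\mathbf{r}/dt$ be the velocity of a particle as seen in $S$ and let $\mathbf{u}' = d\mathbf{r}'/dt'$ be the velocity of the same particle seen in $S'$. Form the differentials $d\mathbf{r}'$ and $dt'$ from Eq. (3.23) holding $\mathbf{v}$ constant:

\[ \begin{aligned} d\mathbf{r}' &= \gamma(d\mathbf{r} - \mathbf{v}\,dt) + \frac{\gamma^2}{c^2(1 + \gamma)}\,\mathbf{v} \times (\mathbf{v} \times d\mathbf{r}) \\ dt' &= \gamma\left(dt - d\mathbf{r} \cdot \mathbf{v}/c^2\right) = \gamma\,dt\left(1 - \mathbf{u} \cdot \mathbf{v}/c^2\right) . \end{aligned} \tag{3.25} \]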

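Divide $d\mathbf{r}'$ by $dt'$ in Eq. (3.25) to obtain the velocity transformation equation

\[ \mathbf{u}' = \frac{\mathbf{u} - \mathbf{v}}{1 - \mathbf{v} \cdot \mathbf{u}/c^2} + \frac{\gamma}{c^2(1 + \gamma)}\,\frac{\mathbf{v} \times (\mathbf{v} \times \mathbf{u})}{1 - \mathbf{v} \cdot \mathbf{u}/c^2} . \tag{3.26} \]

By decomposing $\mathbf{u} = \mathbf{u}_\parallel + \mathbf{u}_\perp$ into vectors parallel and perpendicular to $\mathbf{v}$, we obtain

\[ \mathbf{u}'_\parallel = \frac{\mathbf{u}_\parallel - \mathbf{v}}{1 - \mathbf{v} \cdot \mathbf{u}/c^2} , \tag{3.27} \]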

7Which is to say, the velocity and acceleration vectors do not transform according to the LT because they are not four-vectors. In Chapter 7 we define velocity and acceleration four-vectors by differentiating the four spacetime coordinates with respect to the proper time; these vectors do transform with the LT.


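while $\mathbf{u}_\perp$ transforms as

\[ \mathbf{u}'_\perp = \frac{\mathbf{u}_\perp}{\gamma\left[1 - \mathbf{v} \cdot \mathbf{u}/c^2\right]} . \tag{3.28} \]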

Whereas the coordinates transverse to $\mathbf{v}$ are left unchanged, $\mathbf{r}'_\perp = \mathbf{r}_\perp$, the same is not true for the transverse velocity components; time transforms between frames, time is not absolute.
The inverse of Eq. (3.26) provides a clean statement of velocity addition in vector form. Switch primes and unprimes, and let $\mathbf{v} \to -\mathbf{v}$:

\[ \mathbf{u} = \frac{\mathbf{v} + \mathbf{u}'}{1 + \mathbf{v} \cdot \mathbf{u}'/c^2} + \frac{\gamma}{c^2(1 + \gamma)}\,\frac{\mathbf{v} \times (\mathbf{v} \times \mathbf{u}')}{1 + \mathbf{v} \cdot \mathbf{u}'/c^2} . \tag{3.29} \]

Equation (3.29) specifies the resultant of adding the velocity of $S'$ relative to $S$, $\mathbf{v}$, to the velocity seen in $S'$, $\mathbf{u}'$.

3.3.2 Non-colinear velocities

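Equation (3.29) differs from Eq. (3.16), which applies for colinear velocities (all in the same line). When $\mathbf{v}$ and $\mathbf{u}'$ are not colinear, new physical effects manifest.8 For non-colinear velocities, there’s an asymmetry in Eq. (3.29): the two velocities do not occur in the formula in a symmetrical manner. We define the direct sum of two velocities, which has ordered “slots” for the vectors being added,

\[ \mathbf{v}_a \oplus \mathbf{v}_b \equiv \frac{\mathbf{v}_a + \mathbf{v}_b}{1 + \mathbf{v}_a \cdot \mathbf{v}_b/c^2} + \frac{\gamma_a}{c^2(1 + \gamma_a)}\,\frac{\mathbf{v}_a \times (\mathbf{v}_a \times \mathbf{v}_b)}{1 + \mathbf{v}_a \cdot \mathbf{v}_b/c^2} , \tag{3.30} \]

where $\gamma_a \equiv 1/\sqrt{1 - v_a^2/c^2}$ is the Lorentz factor associated with $\mathbf{v}_a$. The relativistic addition of velocities is not commutative,

\[ \mathbf{v}_1 \oplus \mathbf{v}_2 \neq \mathbf{v}_2 \oplus \mathbf{v}_1 . \tag{3.31} \]

Only when the velocities are colinear does $\mathbf{v}_1 \oplus \mathbf{v}_2 = \mathbf{v}_2 \oplus \mathbf{v}_1$.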
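The order dependence of Eq. (3.30) is easy to see numerically. Below is a minimal sketch, assuming units with $c = 1$; the names `oplus`, `dot`, and `cross` are illustrative helpers, not notation from the text.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def oplus(va, vb, c=1.0):
    """Ordered velocity sum va (+) vb of Eq. (3.30); va sits in the first slot."""
    gamma_a = 1.0 / math.sqrt(1.0 - dot(va, va) / c**2)
    denom = 1.0 + dot(va, vb) / c**2
    k = gamma_a / (c**2 * (1.0 + gamma_a))
    double_cross = cross(va, cross(va, vb))
    return tuple((va[i] + vb[i] + k * double_cross[i]) / denom for i in range(3))


v1 = (0.6, 0.0, 0.0)
v2 = (0.0, 0.6, 0.0)
print(oplus(v1, v2))  # first slot: v1
print(oplus(v2, v1))  # first slot: v2
```

For these perpendicular velocities the two orderings give different vectors of equal magnitude, so the results differ only in direction.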

3.3.3 Acceleration transformation
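The transformation equation for the acceleration vector $\mathbf{a}' = d\mathbf{u}'/dt'$ can be obtained by differentiating Eq. (3.26) ($\mathbf{v}$ is constant),

\[ d\mathbf{u}' = \frac{1}{\gamma\left(1 - \mathbf{v} \cdot \mathbf{u}/c^2\right)^2}\left[d\mathbf{u} - \frac{\gamma}{c^2(1 + \gamma)}(\mathbf{v} \cdot d\mathbf{u})\,\mathbf{v} + \frac{1}{c^2}\,\mathbf{v} \times (\mathbf{u} \times d\mathbf{u})\right] . \tag{3.32} \]

Divide Eq. (3.32) by $dt'$ in Eq. (3.25) to obtain

\[ \mathbf{a}' = \frac{1}{\gamma^2\left[1 - \mathbf{v} \cdot \mathbf{u}/c^2\right]^3}\left[\mathbf{a} - \frac{\gamma}{c^2(1 + \gamma)}(\mathbf{v} \cdot \mathbf{a})\,\mathbf{v} + \frac{1}{c^2}\,\mathbf{v} \times (\mathbf{u} \times \mathbf{a})\right] . \tag{3.33} \]

While Eq. (3.33) is a complicated expression, it suffices to note that $\mathbf{a}' \neq \mathbf{a}$! That alone tells us that $\mathbf{F} = m\mathbf{a}$ is not consistent with SR.9 We show in Chapter 7 how to “fix up” Newton’s second law to be relativistically correct. Equation (3.33) can be simplified by decomposing $\mathbf{a}$ into vectors parallel and perpendicular to $\mathbf{v}$, $\mathbf{a} = \mathbf{a}_\parallel + \mathbf{a}_\perp$. We find:

\[ \mathbf{a}'_\parallel = \frac{\mathbf{a}_\parallel}{\gamma^3\left[1 - \mathbf{v} \cdot \mathbf{u}/c^2\right]^3} \qquad \mathbf{a}'_\perp = \frac{\mathbf{a}_\perp + \mathbf{v} \times (\mathbf{u} \times \mathbf{a})/c^2}{\gamma^2\left[1 - \mathbf{u} \cdot \mathbf{v}/c^2\right]^3} . \tag{3.34} \]

We’ll use Eq. (3.34) in Chapter 12.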
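As a consistency check, Eq. (3.33) can be compared against a finite-difference derivative of the velocity transformation, Eq. (3.26), divided by $dt'/dt$ from Eq. (3.25). This is a sketch assuming $c = 1$; the function names, the step size `h`, and the sample vectors are illustrative choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def transform_velocity(u, v, c=1.0):
    """u' from Eq. (3.26)."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v) / c**2)
    d = 1.0 - dot(v, u) / c**2
    k = gamma / (c**2 * (1.0 + gamma))
    vvu = cross(v, cross(v, u))
    return tuple((u[i] - v[i] + k * vvu[i]) / d for i in range(3))


def transform_acceleration(a, u, v, c=1.0):
    """a' from Eq. (3.33)."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v) / c**2)
    d = 1.0 - dot(v, u) / c**2
    k = gamma / (c**2 * (1.0 + gamma))
    vua = cross(v, cross(u, a))
    pref = 1.0 / (gamma**2 * d**3)
    return tuple(pref * (a[i] - k * dot(v, a) * v[i] + vua[i] / c**2)
                 for i in range(3))


def numerical_acceleration(a, u, v, h=1e-6, c=1.0):
    """(du'/dt) / (dt'/dt) by central differences, with u(t) = u + a t."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v) / c**2)
    dtp_dt = gamma * (1.0 - dot(u, v) / c**2)  # from Eq. (3.25)
    up = transform_velocity(tuple(u[i] + a[i] * h for i in range(3)), v, c)
    um = transform_velocity(tuple(u[i] - a[i] * h for i in range(3)), v, c)
    return tuple((up[i] - um[i]) / (2.0 * h) / dtp_dt for i in range(3))


v = (0.5, 0.0, 0.0)
u = (0.2, 0.3, -0.1)
a = (1.0, -2.0, 0.5)
print(transform_acceleration(a, u, v))
print(numerical_acceleration(a, u, v))
```

The two printed vectors should agree to within the finite-difference error, and neither equals the input `a`.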

8The prime example is Thomas precession. 9We showed in Chapter 1 that F = ma is invariant under the Galilean transformation.


3.4 RELATIVISTIC ABERRATION AND DOPPLER EFFECT
Let frames $S$ and $S'$ be in standard configuration (see Fig. 3.6). Let an object in $S'$ have velocity $\mathbf{u}'$ in the $x'$-$y'$ plane. Let $\mathbf{u}'$ be oriented to the $x'$-axis at angle $\theta'$ so that $u'_x = u'\cos\theta'$ and $u'_y = u'\sin\theta'$. What is the angle $\theta$ observed in $S$ between the velocity $\mathbf{u}$ and the $x$-axis? That question can be answered using the velocity transformation equations.

Figure 3.6 Relativistic aberration.
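Use the inverse transformations of Eqs. (3.27) and (3.28):

\[ \begin{aligned} u_x &= u\cos\theta = \frac{u'_x + v}{1 + vu'\cos\theta'/c^2} = \frac{u'\cos\theta' + v}{1 + vu'\cos\theta'/c^2} \\ u_y &= u\sin\theta = \frac{u'_y}{\gamma\left(1 + vu'\cos\theta'/c^2\right)} = \frac{u'\sin\theta'}{\gamma\left(1 + vu'\cos\theta'/c^2\right)} . \end{aligned} \tag{3.35} \]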

Divide the two equations in Eq. (3.35),

\[ \frac{u_y}{u_x} = \tan\theta = \frac{u'\sin\theta'}{\gamma(v + u'\cos\theta')} . \tag{3.36} \]

For a light ray in $S'$, set $u' = c$, in which case Eq. (3.36) becomes

\[ \tan\theta = \frac{\sin\theta'}{\gamma(\beta + \cos\theta')} . \tag{3.37} \]

Equation (3.37) is the formula for the relativistic aberration of light. As $\beta \to 1$, the angle $\theta$ gets increasingly compressed into a cone of half-angle $\theta \approx \gamma^{-1}$, a phenomenon known as relativistic beaming. To show this, set $\beta = 1$ in the denominator of Eq. (3.37), in which case we have, approximately,

\[ \tan\theta \approx \frac{1}{\gamma}\tan(\theta'/2) . \tag{3.38} \]

Irrespective of the emission angle $\theta'$ (provided $\theta'$ is not close to $\pi$), $\theta$ is compressed to order $\gamma^{-1}$ for sufficiently large $\gamma$.
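The beaming effect can be illustrated by evaluating Eq. (3.37) for increasing $\gamma$. This is a minimal sketch; the function name `aberration_angle` is illustrative, and `math.atan2` is used here only to place $\theta$ in the correct quadrant when $\beta + \cos\theta'$ is negative.

```python
import math


def aberration_angle(theta_prime: float, beta: float) -> float:
    """Angle theta in S for a light ray emitted at theta_prime in S', Eq. (3.37)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return math.atan2(math.sin(theta_prime),
                      gamma * (beta + math.cos(theta_prime)))


theta_prime = math.pi / 2  # emitted perpendicular to the motion in S'
for gamma in (2.0, 10.0, 100.0):
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    theta = aberration_angle(theta_prime, beta)
    print(f"gamma = {gamma:6.1f}: theta = {theta:.5f} rad, 1/gamma = {1 / gamma:.5f}")
```

For emission at $\theta' = \pi/2$, Eq. (3.38) predicts $\tan\theta \approx 1/\gamma$, and the tabulated angles approach $1/\gamma$ as $\gamma$ grows.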
Referring now to Fig. 3.7, suppose there’s a source of light at rest at the origin of $S'$ that emits signals with frequency $f_e$ (as measured by an observer at rest relative to $S'$). The source emits signals into the direction $\theta'$, measured from the $x'$-axis. Let $E'_1$, $E'_2$ denote the events in $S'$ at which successive light signals are emitted; the first at $t' = 0$ and the second at $\Delta T' \equiv f_e^{-1}$. In $S$, event $E_1$ (corresponding to $E'_1$) occurs at the origin at $t = 0$, for which a light ray is seen to be emitted into the direction $\theta$, measured with respect to the $x$-axis. (The two frames synchronized their clocks as the origin of $S'$ passed the origin of $S$.) Event $E_2$ in $S$ (corresponding to $E'_2$) occurs at time $t_2 = \gamma\Delta T'$ (time dilation) at position $x_2 = \beta\gamma c\Delta T'$, at which another ray of light is seen to be emitted in $S$. (Figure 3.7 is not a spacetime diagram.10) In $S$, both rays are detected at a distant location $P$. The question is, what is the time $\Delta T$ in $S$ between the reception of the two signals? The first arrives at $P$ at time $T_1 = r_1/c$, where $r_1$ is the distance of $P$ to the origin. The second signal arrives at $P$ at time $T_2 = \gamma\Delta T' + r_2/c$, where $r_2$ is the distance from $P$ to the location $x_2$. Thus,

\[ \Delta T = T_2 - T_1 = \gamma\Delta T' + \frac{1}{c}(r_2 - r_1) . \tag{3.39} \]

Assume that $P$ is sufficiently distant that we can approximate $r_2 \approx r_1 - x_2\cos\theta = r_1 - \beta\gamma c\Delta T'\cos\theta$. Thus, from Eq. (3.39),

\[ \Delta T = \gamma\Delta T'(1 - \beta\cos\theta) . \tag{3.40} \]

Figure 3.7 Two photons emitted from a source at rest in $S'$ in direction $\theta'$, observed in $S$.

10There are no spacetime diagrams in Chapter 3, which works with three-vectors, vectors in three spatial dimensions.

Let $f_o \equiv (\Delta T)^{-1}$ denote the frequency observed in $S$; from Eq. (3.40)

\[ f_o = \frac{f_e}{\gamma(1 - \beta\cos\theta)} = \gamma(1 + \beta\cos\theta')\,f_e , \tag{3.41} \]

where the second equality follows from the aberration formula; see Exercise 3.8b. Equation (3.41) is a general expression for the relativistic Doppler effect.
We’ll show in Section 5.3.2 that both Eqs. (3.41) and (3.37) (Doppler shift and aberration) emerge as the result of a single LT involving an appropriately defined four-vector, a vector in spacetime. That is, the Doppler effect (involving time) and aberration (involving spatial directions) are two aspects of the same thing when viewed from the perspective of four-dimensional spacetime.
For $\theta = \pi$ in Eq. (3.41) (radiation emitted against the direction of motion, source receding), we recover our previous result, Eq. (2.4), the longitudinal Doppler effect. For $\theta = \pi/2$ in Eq. (3.41) (radiation received in $S$ orthogonal to the direction of motion), we have the transverse Doppler effect:

\[ f_o = \frac{1}{\gamma}\,f_e . \tag{3.42} \]

The transverse Doppler effect is a direct consequence of the time dilation of a moving clock; there is no analogous effect in pre-relativistic physics. It was ﬁrst measured in 1979.[21]
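The two forms of Eq. (3.41) can be cross-checked using the aberration relation of Exercise 3.8a. The sketch below returns the ratio $f_o/f_e$; the function names are illustrative.

```python
import math


def lorentz_factor(beta: float) -> float:
    return 1.0 / math.sqrt(1.0 - beta**2)


def ratio_from_observed_angle(theta: float, beta: float) -> float:
    """f_o / f_e in terms of the angle theta measured in S, Eq. (3.41)."""
    return 1.0 / (lorentz_factor(beta) * (1.0 - beta * math.cos(theta)))


def ratio_from_emission_angle(theta_prime: float, beta: float) -> float:
    """f_o / f_e in terms of the angle theta' measured in S', Eq. (3.41)."""
    return lorentz_factor(beta) * (1.0 + beta * math.cos(theta_prime))


def observed_cosine(theta_prime: float, beta: float) -> float:
    """cos(theta) from the aberration formula of Exercise 3.8a."""
    c = math.cos(theta_prime)
    return (beta + c) / (1.0 + beta * c)


beta, theta_prime = 0.6, 1.0
theta = math.acos(observed_cosine(theta_prime, beta))
print(ratio_from_observed_angle(theta, beta),
      ratio_from_emission_angle(theta_prime, beta))
print(ratio_from_observed_angle(math.pi / 2, beta), 1.0 / lorentz_factor(beta))
```

The first line should print two matching ratios; the second compares the transverse case $\theta = \pi/2$ with $1/\gamma$ from Eq. (3.42).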

SUMMARY
We derived the LT for frames in standard configuration using the homogeneity and isotropy of spacetime, and the principle of relativity, Eq. (3.17). The theory predicts a limiting speed, which experiment shows is the speed of light. We derived the LT for a general boost (where the velocity does not line up with coordinate axes) in Eq. (3.24). The addition of non-colinear velocities $\mathbf{v}_1$ and $\mathbf{v}_2$ is not commutative, $\mathbf{v}_1 \oplus \mathbf{v}_2 \neq \mathbf{v}_2 \oplus \mathbf{v}_1$.


EXERCISES
3.1 Show that Eq. (3.14) is an identity when the compound speed is given by Eq. (3.13), for any $\theta$ and $f$. Hint: Square Eq. (3.14) first. Don’t forget the definition $\gamma(v) = \left[1 - v^2/(\theta f)\right]^{-1/2}$.

3.2 Show that Eq. (3.13), written in the form $w = \dfrac{v_1 + v_2}{1 + v_1 v_2/\psi^2}$, where $\psi$ is the limiting speed, implies that $w \le \psi$. Hint: Let $v_1 = \psi - \mu_1$ and $v_2 = \psi - \mu_2$, where $\mu_1 \ge 0$ and $\mu_2 \ge 0$.

3.3 Referring to Fig. 1.1, suppose that the vector $\mathbf{R}$ is time dependent, with $\mathbf{R} = \mathbf{v}t + \frac{1}{2}\mathbf{a}t^2$, where $\mathbf{v}$ and $\mathbf{a}$ are constant vectors. The two observers synchronize their clocks as the origins coincide. Suppose $S$ is an IRF. Show that $S'$ is not an IRF if $\mathbf{a} \neq 0$. Assume absolute time.

3.4 Derive Eq. (1.3). Let $\mathbf{r}' = \mathbf{r} - \mathbf{v}t$. Assume absolute time.

3.5 Derive Eq. (1.1). Show that if one frame in Fig 1.2 is an IRF, the other is as well. Thus, there is no unique orientation to IRFs. Note: φ is ﬁxed here. One system is rotated with respect to the other, not rotating.

3.6 Write down the inverse transformation to Eq. (1.1). Show that $R_z(\phi)R_z(-\phi) = I_2$, the $2 \times 2$ identity matrix.

3.7 Show, under the transformation Eq. (1.1), that $(x')^2 + (y')^2 = x^2 + y^2$. That is, the distance to the origin (axis of rotation) is preserved under a rigid rotation of the coordinate axes.

3.8 a. Referring to Eq. (3.37), show that the aberration formula can be written

\[ \cos\theta = \frac{\beta + \cos\theta'}{1 + \beta\cos\theta'} . \]

Hint: $\cos^2\theta = (1 + \tan^2\theta)^{-1}$.
b. Show that $\gamma(1 + \beta\cos\theta') = \left[\gamma(1 - \beta\cos\theta)\right]^{-1}$.

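3.9 Let $\mathbf{A} = A\hat{\mathbf{z}}$ be a velocity vector in a rectangular $(x, y, z)$ coordinate system. Consider the vector $\mathbf{A} + d\mathbf{A}$ where $d\mathbf{A} = dA_\parallel\hat{\mathbf{z}} + dA_\perp\hat{\mathbf{y}}$ is a differential velocity with components parallel and perpendicular to $\mathbf{A}$. Define another differential velocity $d\mathbf{w} \equiv \gamma_A^2\,dA_\parallel\hat{\mathbf{z}} + \gamma_A\,dA_\perp\hat{\mathbf{y}}$, where $\gamma_A \equiv 1/\sqrt{1 - A^2/c^2}$. Using Eq. (3.30) show, to first order in small quantities, that $\mathbf{A} \oplus d\mathbf{w} = \mathbf{A} + d\mathbf{A}$.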

CHAPTER 4
Geometry of Lorentz invariance

What is the essence of SR? If you had to reduce relativity to a one-line description what might it be? The Bondi k-factor was derived in Chapter 2 using the principle of relativity (all inertial observers can claim themselves at rest, isotropy of the speed of light). The standard effects of SR were then derived using the k-factor, including the LT. Can it be said that the k-factor is the essence of SR? In Chapter 3 the LT was derived using the principle of relativity (all IRFs are equivalent), linearity (all inertial observers see straight worldlines for free particles), and isotropy and homogeneity of spacetime. Perhaps the LT is the heart of SR? The invariance of the spacetime separation follows from the LT, but can also be explained using the principle of relativity (all inertial observers claim they are at rest, all measure the same speed of light, Section 1.5). Amongst these interconnected ideas, can one be seen as more fundamental? As we continue in our study of relativity, we will have fewer opportunities to explicitly invoke the principle of relativity, and we’ll rely progressively more on the use of Lorentz invariance. A Lorentz invariant is a quantity that remains unchanged under the LT, what all inertial observers find to be the same. Lorentz invariance brings to the fore Einstein’s program for relativity that what is not relative has objective meaning.
In this chapter we look at the geometry of spacetime implied by the Lorentz invariance of the spacetime separation. We will promote invariance to a more fundamental status than the LT. Instead of saying that the invariant separation follows from the LT, the LT will be deﬁned as any linear transformation that preserves the spacetime separation. If we had to come up with a “tagline” for SR, it might be the physics of the invariant separation in absolute spacetime. Such a description presages that for GR, which might be the physics of dynamic spacetime.

4.1 LORENTZ TRANSFORMATIONS AS SPACETIME ROTATIONS
In this section we show that boosts can be considered rotations in spacetime. To develop that idea we ﬁrst consider rotations in Euclidean space and apply what we learn to LTs.

4.1.1 Active vs. passive transformations
Equation (1.1), which describes how the components of a two-dimensional vector transform upon rigidly rotating the $x$ and $y$-axes counterclockwise through an angle $\phi$ (see Fig. 4.1), can be written

\[ (\mathbf{r})' = R(\phi)\,\mathbf{r} . \tag{4.1} \]

Parentheses have been placed around $\mathbf{r}$ in Eq. (4.1) to indicate that $\mathbf{r}$ is the same vector before and after the transformation: only the coordinates have changed as a result of changing the coordinate system (left portion of Fig. 4.1). A rotation can just as well be seen, however, as a transformation of the vector, changing not the coordinate system but changing $\mathbf{r}$ to a new vector $\mathbf{r}'$,

\[ \mathbf{r}' = R(\phi)\,\mathbf{r} , \tag{4.2} \]

where the components of $\mathbf{r}$ and $\mathbf{r}'$ are expressed with respect to the same coordinate system (right portion of Fig. 4.1). In either case, the coordinates are transformed. The transformation in the form $(\mathbf{r})' = R(\phi)\mathbf{r}$ is a passive transformation, leaving the vector unchanged but changing the coordinate axes, while $\mathbf{r}' = R(\phi)\mathbf{r}$ is an active transformation, changing the vector with respect to the same coordinate system. It’s not necessary to make the notational distinction in Eq. (4.1). Rotating the coordinate axes counterclockwise by the angle $\phi$ is equivalent to rotating the vector $\mathbf{r}$ clockwise through $\phi$. The components of $\mathbf{r}'$ in an active rotation are the same as those of $(\mathbf{r})'$ in a passive rotation except for the sign of the angle: a positive angle in the active transformation is opposite to that for the passive.

Figure 4.1 Left: Passive transformation (same vector $\mathbf{r}$, different coordinate system). Right: Active transformation (different vectors, $\mathbf{r}$, $\mathbf{r}'$, same coordinate system).

4.1.2 Rotational symmetry
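Symmetries have two aspects: a transformation and something invariant under the transformation. For rotations the distance to the rotation axis is preserved: $x'^2 + y'^2 = (x\cos\phi + y\sin\phi)^2 + (-x\sin\phi + y\cos\phi)^2 = x^2 + y^2$. The terms $x^2 + y^2$ can be generated from the inner product $\mathbf{r}^T\mathbf{r}$:

\[ \mathbf{r}^T\mathbf{r} = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2 . \tag{4.3} \]

Rotational invariance can then be expressed as $\mathbf{r}'^T\mathbf{r}' = \mathbf{r}^T\mathbf{r}$. Using Eq. (4.2),

\[ \mathbf{r}^T\mathbf{r} = \mathbf{r}'^T\mathbf{r}' = \mathbf{r}^T R^T(\phi)R(\phi)\,\mathbf{r} , \tag{4.4} \]

where we’ve used for matrices $A$ and $B$, $(AB)^T = B^T A^T$. Rotational symmetry requires of $R$

\[ R^T(\phi)R(\phi) = I , \tag{4.5} \]

or that $R^T(\phi) = R^{-1}(\phi) = R(-\phi)$ (an orthogonal matrix). It can be seen explicitly from Eq. (1.1) that $R^T(\phi) = R(-\phi)$.
The inner product in Euclidean space is defined by $\mathbf{r} \cdot \mathbf{r} \equiv \mathbf{r}^T\mathbf{r} = x^2 + y^2$. Clearly $\mathbf{r} \cdot \mathbf{r} \ge 0$, where equality implies $\mathbf{r} = 0$. The length (norm) of $\mathbf{r}$ is defined as $\|\mathbf{r}\| \equiv \sqrt{\mathbf{r} \cdot \mathbf{r}} = \sqrt{x^2 + y^2}$. From Eq. (4.4) the inner product is invariant under rotations, as is the norm, $\|\mathbf{r}'\| = \|R(\phi)\mathbf{r}\| = \|\mathbf{r}\|$. The invariance of the inner product generalizes to different vectors, $\mathbf{r}'_1 \cdot \mathbf{r}'_2 = \mathbf{r}_1 \cdot \mathbf{r}_2$. The distance from $\mathbf{r}_1$ to $\mathbf{r}_2$ is also preserved, $\|R(\phi)\mathbf{r}_1 - R(\phi)\mathbf{r}_2\| = \|R(\phi)(\mathbf{r}_1 - \mathbf{r}_2)\| = \|\mathbf{r}_1 - \mathbf{r}_2\|$.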


Even though the coordinates change under rotation, the norm and the inner product do not. Coordinates are relative to a coordinate system, but the norm and inner product are absolute quantities— they have the same value in all frames connected by rotation. Because these quantities are absolute, they have geometric meaning. A geometry is characterized by its symmetries, its invariants.
Seen as a passive transformation, the invariance of the inner product under rotations should not come as a surprise: it’s the same vector before and after the transformation. Seen as an active transformation, rotations map circles onto themselves (an invariant circle; see Fig. 4.2). Let $x = \cos\alpha$ and $y = \sin\alpha$ be the coordinates for a point on the unit circle. The rotation matrix $R(\phi)$ maps $(x, y)$ into $(x', y')$, where

\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}\begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix} = \begin{pmatrix} \cos\alpha\cos\phi + \sin\alpha\sin\phi \\ \sin\alpha\cos\phi - \cos\alpha\sin\phi \end{pmatrix} = \begin{pmatrix} \cos(\alpha - \phi) \\ \sin(\alpha - \phi) \end{pmatrix} . \]

The transformed point in the active rotation lies on the unit circle at the angle $\alpha - \phi$. All possible rotations in the Euclidean plane about a fixed axis are represented by the mapping of a point on a circle into another point on the same circle.

Figure 4.2 Invariant circle under active rotations (dashed line).
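The invariant circle can be verified with a short numerical check. This sketch uses the matrix convention of the equation above (first row $\cos\phi$, $\sin\phi$); the helper names are illustrative.

```python
import math


def rotation(phi: float):
    """R(phi) in the convention used above: rows (cos, sin) and (-sin, cos)."""
    c, s = math.cos(phi), math.sin(phi)
    return ((c, s), (-s, c))


def apply(R, r):
    """Matrix-vector product for a 2x2 matrix and a 2-component vector."""
    return (R[0][0] * r[0] + R[0][1] * r[1],
            R[1][0] * r[0] + R[1][1] * r[1])


alpha, phi = 1.1, 0.7
point = (math.cos(alpha), math.sin(alpha))  # a point on the unit circle
image = apply(rotation(phi), point)
print(image, (math.cos(alpha - phi), math.sin(alpha - phi)))
print(math.hypot(*image))  # the norm of the image
```

The image should coincide with the point at angle $\alpha - \phi$, and its norm should remain 1 to rounding error.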

4.1.3 Invariance of the spacetime separation under Lorentz transformations

Under LTs the spacetime separation from the origin is preserved:

\[ -(ct')^2 + x'^2 + y'^2 + z'^2 = -(ct)^2 + x^2 + y^2 + z^2 . \tag{1.7} \]

In analogy with rotations, therefore, boost transformations can be considered rotations in spacetime, even though they can’t be visualized as such. Rotations in Euclidean space are the result of twisting around an axis of rotation. About what axis are we twisting in implementing a LT? It can’t be visualized. If we generalize the concept of rotation to be a mapping affecting pairs of coordinate axes ($x$ and $y$, for example), the LT is a rotation in that sense. In four-dimensional spacetime, there are $\binom{4}{2} = 6$ pairs of axes: three pair the time axis with a spatial axis, and the other three involve space-space pairs of axes.1 We show in Chapter 6 that the most general LT is characterized by six independent parameters pertaining to mappings of the six possible pairs of coordinate axes in four-dimensional spacetime.
Equation (1.7) specifies that if an event has spacetime coordinates $(ct, x, y, z)$ in one frame and $(ct', x', y', z')$ in another IRF, the two sets of coordinates satisfy Eq. (1.7).2 For frames in standard configuration, the transverse coordinates are unaffected by the motion, and in that case $-(ct)^2 + x^2$ is an invariant, one whose geometric meaning is easier to see.

1We’re using the binomial coefficient, “N choose k,” $\binom{N}{k} = N!/(k!(N - k)!)$. In three dimensions there are $\binom{3}{2} = 3$ pairs of coordinate axes, the same number as the dimension of the space, and thus we can associate a three-dimensional vector with a rotation. The case of three dimensions is special, however: $n = 3$ is the non-trivial solution of $\binom{n}{2} = n$. Only in three dimensions can we associate a rotation with a vector in the same space. The axis of rotation in a four-dimensional space would be a vector in a six-dimensional space.

2Such a statement is possible only if the two frames have a common spacetime origin.


The hyperbolic form of the LT (for frames in standard configuration) is, from Eq. (A.5):

\[ \begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}\begin{pmatrix} ct \\ x \end{pmatrix} \equiv L(\theta)\begin{pmatrix} ct \\ x \end{pmatrix} , \tag{4.6} \]

where $\tanh\theta = -\beta$. Using Eq. (4.6),

\[ \begin{aligned} -(ct')^2 + x'^2 &= -(ct\cosh\theta + x\sinh\theta)^2 + (ct\sinh\theta + x\cosh\theta)^2 \\ &= (\cosh^2\theta - \sinh^2\theta)\left(-(ct)^2 + x^2\right) = -(ct)^2 + x^2 . \end{aligned} \tag{4.7} \]

For given coordinates $(ct, x)$, compute the number $k \equiv -(ct)^2 + x^2$, where $k$ can be positive, negative, or zero (contrast with the squared Euclidean distance, which is never negative). Equation (4.7) shows that $-(ct')^2 + x'^2 = k$ for all possible LTs starting with $(ct, x)$. The locus of points such that $-(ct)^2 + x^2 = k$ is a hyperbola; see Fig. 4.3. The active form of the LT maps hyperbolas onto themselves, the invariant hyperbola. The asymptotes are the lightlines $ct = \pm x$.

Figure 4.3 Invariant hyperbola $-(ct)^2 + x^2 = k$. For $k > 0$ or $k < 0$ there are two branches.

Figure 4.4 shows the LT from the passive and active points of view. As a passive transformation the same spacetime point has coordinates in two reference frames (left portion of Fig. 4.4). To understand the active transformation, consider the hyperbola associated with $k > 0$. As shown in the right portion of Fig. 4.4, the spacetime vector $\mathbf{r}$ intersects the hyperbola at point $P$ with coordinates $ct = \sqrt{k}\sinh\alpha$ and $x = \sqrt{k}\cosh\alpha$, where $\alpha$ is a hyperbolic angle, the angle between $\mathbf{r}$ and the $x$-axis.3 The LT maps the point $(ct, x)$ into $(ct', x')$, where

\[ \begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}\begin{pmatrix} \sqrt{k}\sinh\alpha \\ \sqrt{k}\cosh\alpha \end{pmatrix} = \sqrt{k}\begin{pmatrix} \sinh(\alpha + \theta) \\ \cosh(\alpha + \theta) \end{pmatrix} . \]

The transformed coordinates (relative to the same coordinate axes) are to be found on the hyperbola with $ct' = \sqrt{k}\sinh(\alpha + \theta)$ and $x' = \sqrt{k}\cosh(\alpha + \theta)$, i.e., at the hyperbolic angle $\alpha + \theta$, the point $Q$ in Fig. 4.4. A boost is therefore a rotation4 along the invariant hyperbola through the angle $\theta$, where $\tanh\theta = -\beta$.

Figure 4.4 Passive and active forms of the Lorentz transformation. [Left panel: axes $ct$, $x$ and $ct'$, $x'$ inclined at angle $\phi$ with $\tan\phi = \beta$. Right panel: vectors $\mathbf{r}$ and $\mathbf{r}'$ reaching points $P$ and $Q$ on the hyperbola that crosses the $x$-axis at $\sqrt{k}$, with $\tanh\theta = -\beta$.]
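The statements about the invariant hyperbola can be checked numerically from Eq. (4.6). The following is a small sketch; the helper names are illustrative, and the values of $k$, $\alpha$, and $\beta$ are arbitrary test inputs.

```python
import math


def lorentz(theta: float):
    """L(theta) of Eq. (4.6), acting on the column (ct, x)."""
    c, s = math.cosh(theta), math.sinh(theta)
    return ((c, s), (s, c))


def apply(L, r):
    return (L[0][0] * r[0] + L[0][1] * r[1],
            L[1][0] * r[0] + L[1][1] * r[1])


def compose(A, B):
    """Matrix product A @ B for 2x2 matrices."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def separation(r):
    """-(ct)^2 + x^2."""
    return -r[0]**2 + r[1]**2


k, alpha, beta = 2.0, 0.4, 0.6
theta = math.atanh(-beta)  # tanh(theta) = -beta
P = (math.sqrt(k) * math.sinh(alpha), math.sqrt(k) * math.cosh(alpha))
Q = apply(lorentz(theta), P)
print(separation(P), separation(Q))  # both should equal k
print(Q, (math.sqrt(k) * math.sinh(alpha + theta),
          math.sqrt(k) * math.cosh(alpha + theta)))
```

The image point stays on the same hyperbola at hyperbolic angle $\alpha + \theta$. A related check, not stated in the text but following from the addition formulas for $\cosh$ and $\sinh$, is that `compose(lorentz(a), lorentz(b))` equals `lorentz(a + b)`.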

4.2 KINEMATIC EFFECTS FROM THE INVARIANT HYPERBOLA

Figure 4.5 $P$ lies on the hyperbola $-(ct)^2 + x^2 = k$, which intersects the $x$-axis for all IRFs at the same value $x = \sqrt{k}$.

Event $P$ in Fig. 4.5 has coordinates $(ct, x)$ in one frame and $(ct', x')$ in another. $P$ lies on the hyperbola $-(ct)^2 + x^2 = k$ for some $k > 0$. The hyperbola intersects the $x$-axis ($t = 0$) at $x = \sqrt{k}$, and the $x'$-axis ($t' = 0$) also at $x' = \sqrt{k}$ (point $Q$). Because the hyperbola is Lorentz invariant, it intersects the $x'$-axis associated with any LT (starting from $(ct, x)$) at the same value, $x' = \sqrt{k}$.
The invariant hyperbola can be used to illustrate time dilation and length contraction. Referring to Fig. 4.6, IRFs $S$ and $S'$ in relative motion synchronize their clocks when the origins coincide. The worldlines of the clocks consist of the time axes, $t$ and $t'$. Let the clock in $S'$ show one unit of time at event $B'$, which is where the unit hyperbola ($k = -1$) intersects the $t'$-axis. The hyperbola intersects the $t$-axis at event $A$, which is also one unit of time for the clock in $S$. In $S$, event $B$ is simultaneous with $B'$ (draw a line parallel to the $x$-axis). Because $B > A$, $S$ concludes that the moving clock in $S'$ runs slow. As seen from $S'$, however, event $A'$ is simultaneous with $A$ (draw a line parallel to the $x'$-axis). Because $A' > B'$, $S'$ concludes that the clock in $S$ runs slow. Both observers conclude that a clock in motion runs slow.
Referring again to Fig. 4.6, let there be a rigid rod in $S'$, in motion relative to $S$. At the instant the back end of the rod is at the origin of both frames, the front end is at $Q'$ in $S'$, where the unit hyperbola ($k = 1$) intersects the $x'$-axis. ($S'$ measures $Q'$ as the length of the rod at $t' = 0$.) The hyperbola intersects the $x$-axis at $P$, also at one unit of length. The worldline of the front end of the rod intersects the $x$-axis at $Q$, which is the measured length in $S$ (two ends of the rod at the same time, $t = 0$). Because $Q < P$, $S$ concludes that the moving rod is contracted in length. Now let the rod be at rest in $S$, in motion relative to $S'$. $S$ measures the length of the rod as $P$ (both ends at the same time, $t = 0$). The worldline of the front edge of the rod intersects the $x'$-axis at $P'$, which is the length measured by $S'$ (both ends of the rod at the same time, $t' = 0$). Because $P' < Q'$, $S'$ concludes that a moving rod is contracted in length. Both observers conclude that a rod in motion is contracted.

Figure 4.6 Length contraction and time dilation using the invariant hyperbola. [Spacetime diagram with origin $O$; events $A$, $B$ on the $ct$-axis, $B'$, $A'$ on the $ct'$-axis, $Q$, $P$ on the $x$-axis, and $P'$, $Q'$ on the $x'$-axis.]

3Hyperbolic angle is measured in hyperbolic radians, twice the area enclosed by the vector $\mathbf{r}$, the unit hyperbola, and the $x$-axis, similar to a circular radian which is twice the area between $\mathbf{r}$, the unit circle, and the $x$-axis.[22, p444] The range of hyperbolic angle is unlimited, as we see from $\tanh\theta = -\beta$.
4The non-commutativity of velocity addition (Section 3.3) is a consequence of the non-commutativity of rotations in four-dimensional spacetime. As is well known, three-dimensional rotations do not commute.
Length contraction and time dilation are thus symmetric between IRFs. That’s quite possibly the key difference between SR and the ether model (Section 2.6), wherein length contraction and time dilation purportedly occur relative to the ether in order to explain the MM and KT experiments, properties attributed to the ether for the sole purpose of allowing it to evade detection! In SR, length contraction and time dilation are relations between coordinates assigned to events by any two IRFs.

4.3 CLASSIFICATION OF LORENTZ TRANSFORMATIONS

We now define a LT to be any linear transformation that preserves the spacetime separation. To see how that comes about, define a time inversion operator $T$ that maps $t \to -t$ and leaves the spatial coordinates unchanged, a matrix representation of which is:

\[ T = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{4.8} \]

With $T$, the spacetime separation can be generated from $\mathbf{r}^T T\mathbf{r} = -(ct)^2 + x^2 + y^2 + z^2$ (the superscript $T$ denotes the transpose). With the coordinates in another IRF obtained from the LT, $(\mathbf{r})' = L(v)\mathbf{r}$, we have the invariance of the separation expressed in the form $\mathbf{r}'^T T\mathbf{r}' = \mathbf{r}^T T\mathbf{r}$, implying that $\mathbf{r}'^T T\mathbf{r}' = \mathbf{r}^T L^T T L\mathbf{r} = \mathbf{r}^T T\mathbf{r}$. We thus have a requirement on any linear transformation $L$ that preserves the spacetime separation,

\[ L^T T L = T . \tag{4.9} \]

Equation (4.9) is analogous to the orthogonality condition for Euclidean rotations, Eq. (4.5), which could be written $R^T I R = I$ to make it appear like Eq. (4.9). Instead of the operator $T$, however, we could just as well define the parity operator, which inverts the spatial coordinates and leaves time unchanged:

\[ P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} . \tag{4.10} \]

Equation (1.7) would then be generated by the invariance of $\mathbf{r}^T P\mathbf{r}$, implying

\[ L^T P L = P . \tag{4.11} \]

Equations (4.9) and (4.11) each impose a requirement on transformations that preserve the spacetime separation. Which should we use? It’s traditional in the relativity literature to write

\[ L^T \eta L = \eta , \tag{4.12} \]

where η is a diagonal matrix with either (−1, 1, 1, 1) on the diagonal, symbolized diag(−1, 1, 1, 1), or diag(1, −1, −1, −1). Both conventions are in prevalent use. We will use η = diag(−1, 1, 1, 1), Eq. (1.14).5 The matrix η is the Lorentz metric tensor. Any linear transformation satisfying Eq. (4.12) is a LT.
Transformations satisfying Eq. (4.12) have the mathematical property of constituting a group, the Lorentz group. Without venturing unduly into group theory, the four properties a set of elements must have to be a group are easily demonstrated:6
• If $L_1$ and $L_2$ are LTs, so is the composition $L \equiv L_1 L_2$. From Eq. (4.12), $L^T \eta L = L_2^T L_1^T \eta L_1 L_2 = L_2^T \eta L_2 = \eta$. A LT followed by a LT is itself a LT, a manifestation of the principle of relativity: all IRFs are equivalent.7
• The composition law for LTs (matrix multiplication) is associative: (L1L2)L3 = L1(L2L3).
• There exists an identity element L = I, which qualiﬁes as a LT, IηI = η.
• For each $L$ there exists $L^{-1}$ in the group, which is more difficult to prove, although it must be so physically by the principle of relativity. Take the determinant of Eq. (4.12). Using the rules of determinants, including $\det L^T = \det L$, we find $(\det L)^2 = 1$ and hence $\det L = \pm 1$. Because $\det L \neq 0$, $L^{-1}$ exists. To show that it belongs to the group, multiply Eq. (4.12) from the left by $(L^T)^{-1}$ and from the right by $L^{-1}$: $(L^T)^{-1} L^T \eta L L^{-1} = (L^T)^{-1} \eta L^{-1}$, and thus $\eta = (L^{-1})^T \eta L^{-1}$ because for any invertible matrix $(L^T)^{-1} = (L^{-1})^T$. Hence, $L^{-1}$ is a LT.
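The four group properties can be spot-checked numerically. The following is a minimal sketch, not a proof: it assumes the usual standard-configuration form of a boost along x and a rotation about z, and the helper names (`matmul`, `is_lorentz`, `boost_x`, `rotation_z`) are hypothetical, introduced only for illustration.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def close(a, b, tol=1e-12):
    """Element-wise comparison within a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))

def is_lorentz(L):
    """Test Eq. (4.12): L^T eta L = eta."""
    return close(matmul(transpose(L), matmul(ETA, L)), ETA)

def boost_x(beta):
    """Boost along x with speed v = beta*c (assumed standard configuration)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, -g * beta, 0, 0], [-g * beta, g, 0, 0],
            [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_z(angle):
    """Spatial rotation about the z-axis; the time coordinate is untouched."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]

L1, L2 = boost_x(0.6), rotation_z(0.3)
print(is_lorentz(matmul(L1, L2)))                   # closure
print(is_lorentz(IDENTITY))                         # identity element
print(close(matmul(L1, boost_x(-0.6)), IDENTITY))   # L(-v) inverts L(v)
print(is_lorentz(boost_x(-0.6)))                    # the inverse is itself a LT
```

Associativity is inherited from matrix multiplication, so it is not tested separately.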
For later use with tensor analysis, it will be useful to adopt a notation for the elements of the LT matrix, the utility of which will become apparent in the next chapter. Denote the elements of the LT matrix as $L^\mu{}_\nu$, where the top (bottom) index labels rows (columns), and where Greek letters conventionally have the range (0, 1, 2, 3), with 0 labeling the time coordinate (Section 1.3).
It can be shown (Exercise 4.1) that $(L^0{}_0)^2 \geq 1$. A LT therefore has either $L^0{}_0 \geq 1$ (orthochronous) or $L^0{}_0 \leq -1$ (non-orthochronous), and either $\det L = +1$ (proper) or $\det L = -1$ (improper). These constitute four categories of LTs, conventionally denoted as follows.
• $L_+^\uparrow$: $\det L = +1$, $L^0{}_0 \geq 1$, proper, orthochronous LTs. The requirement $\det L = +1$ (proper LTs) excludes the possibility of $P$ or $T$ as LTs. As a result, LTs $\in L_+^\uparrow$ connect smoothly with the identity transformation as the transformation parameter (speed or rotation angle) continuously goes to zero. The requirement $L^0{}_0 \geq 1$ maps positive time onto positive time, “orthochronous.”
• $L_-^\uparrow$: $\det L = -1$, $L^0{}_0 \geq 1$, improper, orthochronous LTs. This class allows for the possibility of $P$ as a LT, but excludes $T$.

5I prefer (−1, 1, 1, 1) because it singles out time for special treatment, and relativity is all about time.
6I say without unduly venturing into group theory because it’s a vast subject. Much as you may try to resist it, “der Gruppenpest” is an essential part of theoretical physics, which, if studied long enough, will entail picking up at least a nodding acquaintance with group theory. Groups are defined in almost any undergraduate book on algebra.
7We showed explicitly in Section 3.1.4 that the group property is satisfied among frames in standard configuration; here we’re establishing it for any linear transformation that satisfies Eq. (4.12).


• $L_-^\downarrow$: $\det L = -1$, $L^0{}_0 \leq -1$, improper, non-orthochronous LTs. This class includes $T$ but excludes $P$.
• $L_+^\downarrow$: $\det L = +1$, $L^0{}_0 \leq -1$, proper, non-orthochronous LTs. This class allows for the combined operation $PT$, inversion of time and space.
Of the four classes, only $L_+^\uparrow$ forms a group on its own (the restricted Lorentz group), because only this class includes the identity transformation. As a result, only LTs $\in L_+^\uparrow$ can be built up out of infinitesimal LTs (Chapter 6). It can be shown that $L \in L_-^\downarrow$ can be written as the product $T L'$, with $L' \in L_+^\uparrow$. Thus we have the mapping $T L_+^\uparrow \to L_-^\downarrow$. Similarly, we have the mappings $P L_+^\uparrow \to L_-^\uparrow$ and $TP L_+^\uparrow \to L_+^\downarrow$. All possible LTs can therefore be obtained from $L \in L_+^\uparrow$ and the discrete transformations $T$ and $P$.8
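The classification can be made concrete with a short sketch that reads off the sign of $\det L$ and of $L^0{}_0$. The function names are hypothetical, and the boost matrix (with $\beta = 0.6$, $\gamma = 1.25$) is an assumed standard-configuration example. Because $(L^0{}_0)^2 \geq 1$ for any LT, testing the sign of $L^0{}_0$ is enough to separate the orthochronous from the non-orthochronous case.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diag(*entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def classify(L):
    """Label a LT by the sign of det L and the sign of its time-time element."""
    proper = "proper" if det(L) > 0 else "improper"
    chronous = "orthochronous" if L[0][0] > 0 else "non-orthochronous"
    return proper + ", " + chronous

T = diag(-1, 1, 1, 1)      # time inversion, Eq. (4.8)
P = diag(1, -1, -1, -1)    # parity, Eq. (4.10)
g, beta = 1.25, 0.6        # gamma for beta = 0.6
B = [[g, -g * beta, 0, 0], [-g * beta, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

for name, L in [("boost", B), ("P", P), ("T", T),
                ("PT", matmul(P, T)), ("T*boost", matmul(T, B))]:
    print(name, "->", classify(L))
```

The last case illustrates the mapping $T L_+^\uparrow \to L_-^\downarrow$ for one particular boost.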

4.4 SPACETIME GEOMETRY AND CAUSALITY

4.4.1 Vector norm

A geometry involves the ability to specify the distance between points, and a natural way to do that
is through the inner product between vectors—which allows one to assign a magnitude to vectors. We deﬁne the inner product between spacetime position vectors9 using the Lorentz metric:

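$$
r \cdot r \equiv r^T \eta r
= \begin{pmatrix} ct & x & y & z \end{pmatrix}
\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} -ct & x & y & z \end{pmatrix}
\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}
= -(ct)^2 + x^2 + y^2 + z^2 . \tag{4.13}
$$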

In this way $r \cdot r$ is a Lorentz invariant: For $r' = L(v) r$,

$$
r' \cdot r' \equiv (L(v) r)^T \eta L(v) r = r^T L^T(v) \eta L(v) r = r^T \eta r \equiv r \cdot r ,
$$

where we’ve used Eq. (4.12). The invariance of the dot product extends to different position vectors, $r_1' \cdot r_2' = r_1 \cdot r_2$. The invariant inner product is highly useful in practice: If its value is known in one IRF, it has the same value in any other frame obtained from the first by a LT. Just as with rotations in Euclidean space, the invariance of the inner product among spacetime vectors is a property of the geometry. While coordinates transform between IRFs, there is an intrinsic property of the spacetime geometry, the inner product, which has been defined to generate the spacetime separation.
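As a numerical illustration of this invariance, the sketch below boosts two arbitrarily chosen position vectors and compares their inner product before and after. The boost formula is the assumed standard-configuration form, and the function names and sample vectors are hypothetical.

```python
import math

def inner(a, b):
    """Inner product with the Lorentz metric diag(-1, 1, 1, 1), Eq. (4.13)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def boost_x(r, beta):
    """Components (ct, x, y, z) in a frame moving along x at speed beta*c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    ct, x, y, z = r
    return (g * (ct - beta * x), g * (x - beta * ct), y, z)

r1 = (5.0, 3.0, 0.0, 0.0)
r2 = (2.0, 1.0, 4.0, -1.0)

for beta in (0.1, 0.6, 0.99):
    p1, p2 = boost_x(r1, beta), boost_x(r2, beta)
    # The two printed inner products should agree up to rounding error.
    print(beta, inner(r1, r2), inner(p1, p2))
```

The individual components change with `beta`, while the inner product stays fixed to within floating-point rounding.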
The norm of a spacetime vector can now be deﬁned,

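$$
\|r\| \equiv
\begin{cases}
\sqrt{r \cdot r} & \text{for } r \text{ spacelike } (r \cdot r > 0) \\
\sqrt{-r \cdot r} & \text{for } r \text{ timelike } (r \cdot r < 0) \\
0 & \text{for } r \text{ lightlike } (r \cdot r = 0)
\end{cases} \tag{4.14}
$$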

A vector with unit norm is a unit vector. Note that in contrast with Euclidean geometry, where r · r = 0 implies r = 0, a spacetime vector can have zero norm and be non-zero. A vector with zero norm is called a null vector. Because the norm has been deﬁned using the inner product, it too is a Lorentz invariant.
Spacetime separations are non-intuitive. The analog of the “3-4-5” right triangle in Euclidean geometry is in spacetime geometry a “3-5-4” triangle (see Fig. 4.7), an instance of time dilation—

8In quantum ﬁeld theory, systems with Lorentz symmetry must also have CPT symmetry—that the physics is unaffected by the combined transformation CPT where C (“charge conjugation”) converts a particle into its antiparticle.
9The symbol r has been used to denote two-dimensional vectors, as in Eq. (4.3), three-dimensional vectors, as in Eq. (3.23), and now as a four-dimensional vector. Soon we’ll refer to a spacetime position vector as a four-vector.

Figure 4.7 Moving clocks run slow: in the non-Euclidean geometry of spacetime. [Spacetime diagram with axes ct and x showing a right triangle with sides labeled 3, 5, and 4.]

the length of the hypotenuse (proper time) is shorter than the base of the triangle (time ascribed to the moving clock). The hypotenuse is a timelike vector with norm $\sqrt{-[-5^2 + 3^2]} = 4$. Does the hypotenuse in Fig. 4.7 look shorter than the base of the triangle? Don’t bring your geometric expectations, based on a lifetime of Euclidean reasoning, to spacetime diagrams. Figure 4.8 shows

Figure 4.8 Timelike unit vectors (left) and spacelike unit vectors (right). [Two panels, each with axes ct and x.]
timelike and spacelike unit vectors: Each vector in Fig. 4.8 connects the origin with a unit hyperbola. These are not all unit vectors in the same frame; each would be a unit vector in some IRF, however.
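The norm of Eq. (4.14) and the “3-5-4” triangle can be reproduced in a few lines. This is a minimal sketch restricted to the (ct, x) plane; the function name `norm` and the hyperbolic parametrization of the unit hyperbolas are illustrative choices, not notation from the text.

```python
import math

def norm(ct, x):
    """Norm of a spacetime vector in the (ct, x) plane, following Eq. (4.14)."""
    s2 = -ct ** 2 + x ** 2
    if s2 > 0:
        return math.sqrt(s2), "spacelike"
    if s2 < 0:
        return math.sqrt(-s2), "timelike"
    return 0.0, "lightlike"

# The hypotenuse of Fig. 4.7: 5 units along ct, 3 units along x.
print(norm(5, 3))

# Points on the unit hyperbolas, parametrized here by a hyperbolic angle.
for theta in (0.0, 0.5, 1.2):
    print(norm(math.cosh(theta), math.sinh(theta)),   # timelike unit vector
          norm(math.sinh(theta), math.cosh(theta)))   # spacelike unit vector
```

Every vector in the loop has norm 1 (up to rounding), even though the arrows drawn in a Euclidean picture have visibly different lengths.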
4.4.2 Orthogonality
Can spacetime vectors be orthogonal? We assumed in setting up spacetime diagrams that the time axis (timelike) is orthogonal to the space axis (spacelike). Vectors can be timelike, lightlike, or spacelike. Is orthogonality possible for each type of vector? For $A$ and $B$ spacetime position vectors, can $A \cdot B = 0$, where the inner product is defined in Eq. (4.13)? Denote the time component of $A$ as $A^0$ and the space components as $\mathbf{A}$ (a notation we use in Chapter 5). Then, using Eq. (4.13), $A \cdot A = -(A^0)^2 + \|\mathbf{A}\|^2$, where here $\|\mathbf{A}\| = \sqrt{\mathbf{A} \cdot \mathbf{A}}$ is the Euclidean norm of the spatial part of the vector. A spacetime vector is spacelike, lightlike, or timelike according to whether $\|\mathbf{A}\| > |A^0|$, $\|\mathbf{A}\| = |A^0|$, or $\|\mathbf{A}\| < |A^0|$. Orthogonality implies that $A^0 B^0 = \mathbf{A} \cdot \mathbf{B}$. Of the three types of vector (timelike, spacelike, lightlike), which can meet this condition?
• Let $A$ be a timelike vector. Vector $B$ can be orthogonal to $A$ only if $B$ is spacelike. Proof: Assume that $A^0 B^0 = \mathbf{A} \cdot \mathbf{B}$ (orthogonality) and $|A^0| > \|\mathbf{A}\|$ (timelike). Then, $|A^0 B^0| = |\mathbf{A} \cdot \mathbf{B}| \leq \|\mathbf{A}\| \, \|\mathbf{B}\|$ (Schwarz inequality). But $|A^0| > \|\mathbf{A}\|$, implying that $|B^0| < \|\mathbf{B}\|$. For orthogonality to hold, $B$ can only be spacelike. Two timelike vectors cannot be orthogonal to each other, and timelike vectors cannot be orthogonal to lightlike vectors. A timelike vector can only be orthogonal to a spacelike vector.
• Let A be a lightlike vector. Vector B can be orthogonal to A if B is spacelike or lightlike.

Proof: From $A^0 B^0 = \mathbf{A} \cdot \mathbf{B}$, $|B^0| = |\mathbf{A} \cdot \mathbf{B}| / |A^0| \leq \|\mathbf{A}\| \, \|\mathbf{B}\| / |A^0|$ (Schwarz). But $|A^0| = \|\mathbf{A}\|$ (lightlike), implying that $\|\mathbf{B}\| \geq |B^0|$. $B$ is either spacelike or lightlike. Lightlike vectors can be orthogonal to lightlike and spacelike vectors; two lightlike vectors can be orthogonal if and only if they’re scalar multiples of each other.
• It is possible for spacelike vectors to be orthogonal.
=⇒ Two spacetime vectors can be orthogonal if at least one of them is spacelike or both are lightlike, in which case they are proportional to each other.
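A few concrete vectors make these rules easy to check. The sketch below uses integer components (ct, x, y, z) chosen for illustration; none of the sample vectors come from the text.

```python
def inner(a, b):
    """Inner product with the Lorentz metric diag(-1, 1, 1, 1)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def kind(a):
    s2 = inner(a, a)
    return "spacelike" if s2 > 0 else "timelike" if s2 < 0 else "lightlike"

A = (2, 1, 0, 0)   # timelike:  -4 + 1 < 0
B = (1, 2, 0, 0)   # spacelike: -1 + 4 > 0
k = (1, 1, 0, 0)   # lightlike
m = (1, 0, 1, 0)   # lightlike, not proportional to k
s = (0, 0, 1, 0)   # spacelike

print(kind(A), kind(B), inner(A, B))       # timelike and spacelike, orthogonal
print(inner(k, (2, 2, 0, 0)))              # proportional lightlike vectors
print(inner(k, m))                         # non-proportional lightlike vectors
print(inner(k, s))                         # lightlike and spacelike, orthogonal
print(inner(A, (3, -1, 0, 0)))             # two timelike vectors
```

Only the pairs allowed by the rule above give a vanishing inner product; the non-proportional lightlike pair and the timelike pair do not.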
4.4.3 Partition of spacetime
Spacetime can be partitioned into five regions (see Fig. 4.9):
$T_+$ ≡ Future timelike, $(\Delta s)^2 < 0$, $t > 0$;
$T_-$ ≡ Past timelike, $(\Delta s)^2 < 0$, $t < 0$;
$S$ ≡ Spacelike, $(\Delta s)^2 > 0$;
$L_+$ ≡ Future lightlike (future light cone), $(\Delta s)^2 = 0$, $t > 0$;
$L_-$ ≡ Past lightlike (past light cone), $(\Delta s)^2 = 0$, $t < 0$.

Figure 4.9 Partition of spacetime into spacelike, lightlike, and timelike regions.

The LT maps each of these regions onto itself. Because $(\Delta s)^2$ is Lorentz invariant, the spacelike region $S$ is mapped onto itself under the LT; likewise with timelike and lightlike regions, where we include future and past sets. The past and future sets, however, are separately preserved under the LT. We prove this for $T_+$: for an event in $T_+$, which has $t > 0$, we must show that $t' > 0$. From Eq. (4.6),

$$
ct' = ct \cosh\theta + x \sinh\theta . \tag{4.15}
$$

For an event in $T_+$, $-ct < x < ct$. Using this inequality together with Eq. (4.15), it can be shown that $t e^{-\theta} < t' < t e^{\theta}$ (for $\theta > 0$; the two bounds trade places for $\theta < 0$), and hence that $t' > 0$ if $t > 0$. The proof for $T_-$ is similar. It’s simple to show that the future and past light cones are separately preserved under the LT.
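The statement that each region is mapped onto itself can be checked on sample events. In this sketch the time transformation is Eq. (4.15) as printed; the companion formula for $x'$ is an assumption (the matching hyperbolic form), and the sample events, the tolerance, and the function names are illustrative.

```python
import math

def region(ct, x, tol=1e-9):
    """Classify an event (other than the origin) in the (ct, x) plane."""
    s2 = -ct ** 2 + x ** 2
    if abs(s2) <= tol:
        return "L+" if ct > 0 else "L-"
    if s2 > 0:
        return "S"
    return "T+" if ct > 0 else "T-"

def boost(ct, x, theta):
    """Eq. (4.15) for ct', with the assumed companion x' = x cosh + ct sinh."""
    c, s = math.cosh(theta), math.sinh(theta)
    return ct * c + x * s, x * c + ct * s

events = {"T+": (2.0, 1.0), "T-": (-2.0, 1.0), "S": (1.0, 2.0),
          "L+": (1.0, 1.0), "L-": (-1.0, 1.0)}

for theta in (-2.0, -0.7, 0.7, 2.0):
    for label, (ct, x) in events.items():
        print(theta, label, "->", region(*boost(ct, x, theta)))

# Bound used in the proof for T+: t e^{-theta} < t' < t e^{theta}, theta > 0.
ct, x, theta = 2.0, 1.0, 0.7
ct_new, _ = boost(ct, x, theta)
print(ct * math.exp(-theta) < ct_new < ct * math.exp(theta))
```

A finite list of events proves nothing in general, but it shows the bookkeeping behind the argument: the sign of $t$ never flips for events inside or on the light cone.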

4.4.4 Temporal order and causality
Events in T+ cannot be simultaneous in any reference frame, and the temporal order in which events occur is the same for all observers (discussed in Section 1.5). Event A in Fig. 4.10 has t > 0 and

Figure 4.10 Absolute past and absolute future. Events in $T_+$ occur after the event at O in any reference frame. Events in $T_-$ occur before O in all reference frames. [Spacetime diagram with axes ct and x: event A lies in the absolute future of the origin O, and event B lies in the absolute past.]
occurs after the event at the origin, O. No orthochronous LT (Section 4.3) can change the time coordinate of A to $t' < 0$. Thus A occurs after O in all IRFs. For this reason, $T_+$ is called the absolute future because events in this region occur after O in all frames. Likewise, $T_-$ is called the absolute past because events in this region occur before O in any frame.
Events on the future light cone, $L_+$, can be influenced by electromagnetic signals from a source at O. Because physical effects cannot propagate faster than light, any effect originating at O can reach only those points inside $T_+$ or on $L_+$. And because the temporal order of events cannot be altered by a LT in this region of spacetime, it’s possible to introduce notions of cause and effect in an absolute sense, independent of reference frame. A causal connection between events can exist only if they are timelike or lightlike separated. Vectors that are either timelike or null are called causal spacetime vectors. At each point in spacetime, there corresponds a light cone with its vertex at that point (see Fig. 4.11). Each event along the worldline of a particle can affect only those events that lie in or on its future light cone, and can be affected only by events in or on its past light cone.10
Figure 4.11 A point in spacetime can influence only those events within its future lightcone.
10In quantum field theory, measurements of a field at the origin and at an event P do not interfere if P is spacelike separated from the origin. Operators must therefore commute for spacelike-separated events.
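The causal rule above translates directly into a test on pairs of events. The following sketch assumes events are given as (ct, x, y, z) tuples; the function names and the sample events are hypothetical.

```python
def interval2(a, b):
    """(Delta s)^2 between events a and b, each given as (ct, x, y, z)."""
    d = [q - p for p, q in zip(a, b)]
    return -d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + d[3] ** 2

def separation(a, b):
    s2 = interval2(a, b)
    return "timelike" if s2 < 0 else "spacelike" if s2 > 0 else "lightlike"

def can_influence(a, b):
    """True if b lies in or on the future light cone of a."""
    return b[0] > a[0] and interval2(a, b) <= 0

O = (0, 0, 0, 0)
A = (3, 1, 0, 0)   # inside the future light cone of O
B = (1, 2, 0, 0)   # spacelike separated from O
C = (2, 2, 0, 0)   # on the future light cone of O

for name, event in [("A", A), ("B", B), ("C", C)]:
    print(name, separation(O, event), can_influence(O, event), can_influence(event, O))
```

Note the asymmetry: O can influence A and C, but neither can influence O, and no causal link in either direction exists between O and B.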


SUMMARY
• A LT is any linear transformation such that $L^T \eta L = \eta$, where $\eta = \mathrm{diag}(-1, 1, 1, 1)$.
• The inner product between spacetime vectors is Lorentz invariant, where $r \cdot r = r^T \eta r$.
• Spacetime can be partitioned into a future timelike region, a past timelike region, a spacelike region, and the future and past light cones. The LT maps each of these regions onto itself.
• There can be a causal connection between events in spacetime, where notions of cause and effect, “earlier” and “later” are independent of reference frame, only if they are timelike or lightlike separated.
• Two spacetime vectors can be orthogonal if one of them is spacelike or both are lightlike.

EXERCISES

4.1 Show that $(L^0{}_0)^2 = 1 + \sum_{i=1}^{3} (L^i{}_0)^2$. Conclude that $(L^0{}_0)^2 \geq 1$. Hint: Use Eq. (4.12).

4.2 Show that $L^{-1} = \eta L^T \eta$ and $L^T = \eta L^{-1} \eta$. Hint: $\eta^2 = I$.

4.3 Assume a 2D LT, where

$$
\eta = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} , \qquad
L = \begin{pmatrix} L^0{}_0 & L^0{}_1 \\ L^1{}_0 & L^1{}_1 \end{pmatrix} .
$$

a. Use the result of Exercise 4.2 to find $L^{-1}$ in terms of the matrix elements of $L$. Because we have generally that $L^{-1}(v) = L(-v)$, use your result for $L^{-1}$ to argue that the off-diagonal terms must be odd functions of $v$, while the diagonal terms are even functions.
b. Use your result for $L^{-1}$ to conclude that $L$ must be a symmetric matrix with $L^1{}_1 = L^0{}_0$. Hint: Compare the results of $L L^{-1} = I$ with $L^T \eta L = \eta$.

4.4 Show that the inner product between two spacetime vectors is invariant under the LT, $r_1' \cdot r_2' = r_1 \cdot r_2$, where the inner product is defined as $r_1 \cdot r_2 = r_1^T \eta r_2$.
4.5 Events A, B, and C have spacetime coordinates (ct, x) of (2, 1), (7, 4), and (5, 6), respectively. For each pair of events, answer the questions: (1) Are the events timelike, spacelike, or lightlike separated?; (2) Is it possible that one of the events could be caused by the other?
4.6 Show:

a. If two lightlike vectors are orthogonal, they are scalar multiples of each other;
b. That the inner product between a lightlike vector and a timelike vector is negative;
c. That the sum of a lightlike vector and a timelike vector is timelike. Use the result of the previous part;
d. For two future-pointing timelike vectors, i.e., spacetime vectors $A$, $B$ with $A^0 > 0$ and $B^0 > 0$, that $A \cdot B < 0$;
e. For $A$ a future-pointing timelike vector and $B$ a past-pointing timelike vector, that $A \cdot B > 0$.

CHAPTER 5
Tensors on flat spaces
Einstein’s program for GR, that the laws of physics be independent of coordinate system, is realized by expressing equations as relations between tensors.1 Tensors are highly useful in SR as well: The time has come to address this important topic.
Tensors are generalizations of vectors. In an $n$-dimensional space, a vector is specified by $n$ numbers. A second-rank tensor requires $n^2$ numbers for its specification; a rank-$r$ tensor requires $n^r$ numbers. Physical quantities exist requiring more than $n$ numbers for their specification, the stress tensor for example. We start by defining vectors in spacetime, and then work our way up to tensors. The traditional way of introducing tensors is through their transformation properties: how the $n^r$ numbers transform between reference frames (Section 5.1). In Section 5.5 we show that tensors are linear relations between scalars, vectors, and even other tensors.
5.1 TRANSFORMATION PROPERTIES
5.1.1 Spacetime position four-vector
Spacetime is modeled as a four-dimensional continuum2 obtained from the concatenation of space ($\mathbb{R}^3$) with time ($\mathbb{R}$), $\mathbb{R}^4 = \mathbb{R}^3 \times \mathbb{R}$. Unadorned $\mathbb{R}^4$, however, cannot support the physics of SR; we require a mathematical model having more structure. Minkowski space (MS) is a four-dimensional vector space (with points in one-to-one correspondence with those of $\mathbb{R}^4$) spanned by one timelike basis vector, $e_t$, and three spacelike basis vectors, $e_x$, $e_y$, $e_z$, where by convention basis vectors are labeled with subscripts.3 While any four linearly independent vectors can constitute a basis (known as a tetrad), in IRFs we require time to be orthogonal to space. Points in MS (events) are located by a position vector (relative to the origin-event4) $r = r^t e_t + r^x e_x + r^y e_y + r^z e_z$, called a four-vector, where by convention coordinates are labeled with superscripts. A change in reference frame is a change of basis vectors (the passive form of the transformation) in such a way that the components of $r$ transform according to the LT.5 In Section 4.4 we were careful, in introducing the inner product, to refer to spacetime position vectors, because so far that’s the only four-vector we have: a position vector for every event. As we develop SR, a succession of four-vectors will be introduced. The edifice of relativity theory is built on four-vectors and the Lorentz invariants that can be constructed from them. With the understanding that additional four-vectors are forthcoming,
1A sizable portion of Einstein’s 1916 article is devoted to tensors in a section, “Mathematical Aids to the Formation of Generally Covariant Equations.”[9, p111].
2Appendix C reviews linear algebra, including the Cartesian product. While the universe is well described as a four-dimensional entity, string theory is a proposed framework for quantum gravity that invokes extra spatial dimensions.
3Every vector space has a basis, with its dimension the maximum number of linearly independent vectors.
4Vector spaces require a zero vector, which we can take as an arbitrarily chosen event for the spacetime origin.
5Minkowski space is an inner-product space endowed with a very specific structure; it’s not simply a vector space. To speak of timelike and spacelike vectors, an inner product must already have been introduced (Section 4.4).

we refer in this chapter to arbitrary four-vectors $A$:

$$
A = A^t e_t + A^x e_x + A^y e_y + A^z e_z . \tag{5.1}
$$

We’ll soon distinguish two types of vectors: contravariant and covariant. Any vector whose provenance can be traced to the position vector (more generally to oriented line elements) is referred to as contravariant. The vector $A$ in Eq. (5.1) has the form of a contravariant vector. Covariant vectors are a geometrically distinct type of vector, related to oriented surface elements.

We adopt a notational convention that allows us to write four-vectors more compactly than Eq. (5.1). We reserve zero to label the time component of four-vectors as well as the associated basis vector, and we use 1, 2, 3 to label spatial components and basis vectors, instead of $x, y, z$, or $r, \theta, \phi$. Thus, Eq. (5.1) can be written $A = A^0 e_0 + A^1 e_1 + A^2 e_2 + A^3 e_3$. This convention enables the use of summation notation: $A = \sum_{\alpha=0}^{3} A^\alpha e_\alpha$. Note the Greek letter $\alpha$ as the summation index. A convention in the theory of relativity is that if the sum runs from 0 to 3, use a Greek letter as the summation index; however, if the sum runs from 1 to 3, use a Roman letter. Thus, $\sum_{\alpha=0}^{3} A^\alpha e_\alpha = A^0 e_0 + \sum_{i=1}^{3} A^i e_i$. Now, having introduced this convention, much of what we cover in this chapter is general tensor analysis pertaining to any space and not specifically to MS. When that’s the case there’s no need to adopt a notation that singles out the timelike dimension. When we deal with relativity, however, we stick to the convention.

We will encounter expressions involving sums over numerous indices, and writing out the summation symbols becomes cumbersome. The Einstein summation convention is that repeated raised and lowered indices imply a sum. Thus, $\sum_{\alpha=0}^{3} A^\alpha e_\alpha \equiv A^\alpha e_\alpha$. Of course, $\alpha$ is a dummy index that has no absolute meaning. The expressions $A^\alpha e_\alpha = A^\beta e_\beta = A^\gamma e_\gamma$ are equivalent and imply the same sum. Remember: The rule is that repeated upper and lower indices imply a summation. Terms such as $A^\alpha e_\beta$ do not imply a sum. I will gradually work in the summation convention to gain practice with it, but after a point I will simply use it without comment.

We’ll use $x^\mu$ to denote the components of the four-position, $x^\mu = (x^0, x^1, x^2, x^3)$. Not just any collection of four numbers constitutes the components of a four-vector.6 For example, can we package the components of the electric field vector $\mathbf{E}$ into a four-vector, finding something suitable to include as the time component? It turns out the answer is No. Likewise there is no four-vector having the components of the orbital angular momentum $\mathbf{L}$ as its spatial part.7
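For readers who think in loops, the index conventions above can be spelled out as explicit sums. This is a minimal sketch: the components `A` and the basis `e` (each basis vector written as a tuple of four numbers) are arbitrary illustrative values, and `contract` is a hypothetical helper name.

```python
def add(u, v):
    return [p + q for p, q in zip(u, v)]

def scale(c, v):
    return [c * p for p in v]

def contract(components, basis, indices):
    """Sum of components[a] * basis[a] over the given index range."""
    total = [0.0, 0.0, 0.0, 0.0]
    for a in indices:
        total = add(total, scale(components[a], basis[a]))
    return total

A = [2.0, 1.0, 0.0, 3.0]                  # components A^alpha
e = [[1.0, 0.0, 0.0, 0.0],                # e_0
     [0.0, 1.0, 0.0, 0.0],                # e_1
     [0.0, 1.0, 1.0, 0.0],                # e_2 (deliberately not orthogonal to e_1)
     [0.0, 0.0, 0.0, 2.0]]                # e_3 (deliberately not a unit vector)

full = contract(A, e, range(0, 4))        # Greek index: sum from 0 to 3
spatial = contract(A, e, range(1, 4))     # Roman index: sum from 1 to 3
print(full)
print(add(scale(A[0], e[0]), spatial))    # A^0 e_0 + sum over i of A^i e_i
```

The name of the loop variable plays the role of the dummy index: renaming it changes nothing, which is the content of $A^\alpha e_\alpha = A^\beta e_\beta$.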

5.1.2 Metric tensor

We deﬁned the inner product between position four-vectors in Eq. (4.13) so as to produce an invariant under the LT, the spacetime separation. Here we generalize the inner product for arbitrary four-vectors in a way that it generates an invariant under any invertible coordinate transformation, which includes the LT. For vectors deﬁned with respect to the same basis, A = Aαeα, B = Bβeβ (summation convention), form the inner product by “dotting” them together,

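$$
\begin{aligned}
A \cdot B &= \left( A^0 e_0 + A^1 e_1 + A^2 e_2 + A^3 e_3 \right) \cdot \left( B^0 e_0 + B^1 e_1 + B^2 e_2 + B^3 e_3 \right) \\
&= A^\alpha B^\beta \left( e_\alpha \cdot e_\beta \right) \equiv \sum_{\alpha=0}^{3} \sum_{\beta=0}^{3} A^\alpha B^\beta \left( e_\alpha \cdot e_\beta \right) ,
\end{aligned} \tag{5.2}
$$

where a double sum is implied by two sets of repeated upper and lower indices. There are 16 terms in Eq. (5.2) when it’s expanded out. We’ve “passed the buck” in defining the inner product between four-vectors to the inner product between basis vectors, $e_\alpha \cdot e_\beta$. We’re going to leave these as unspecified for now and represent them with a new symbol labelled by two indices,

$$
g_{\alpha\beta} \equiv e_\alpha \cdot e_\beta . \tag{5.3}
$$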

6Another way of saying that Minkowski space is not $\mathbb{R}^4$.
7Fear not, however: The vectors $\mathbf{E}$ and $\mathbf{L}$ will find their place as components of tensors.


The 16 quantities $\{g_{\alpha\beta}\}$ are the elements of the metric tensor,8 our first tensor. The metric tensor is one way to define a geometry:9 Geometric properties such as arc length and surface area can be calculated once the metric has been specified. Said differently, each geometry (including spacetime) has its own metric tensor. Combining Eqs. (5.3) and (5.2),

$$
A \cdot B = g_{\alpha\beta} A^\alpha B^\beta \equiv \sum_{\alpha=0}^{3} \sum_{\beta=0}^{3} g_{\alpha\beta} A^\alpha B^\beta .
$$

The components of the metric tensor are symmetric in their indices, $g_{\alpha\beta} = g_{\beta\alpha}$; the metric tensor is always symmetric. For an $n$-dimensional space, a symmetric second-rank tensor has $n(n+1)/2$ independent elements (show this); in MS there are 10 independent elements of the metric tensor.
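The double sum $g_{\alpha\beta} A^\alpha B^\beta$ is easy to write out explicitly. The sketch below is illustrative only: it evaluates the sum for the Lorentz metric and for a hypothetical two-dimensional oblique basis $e_1 = (1, 0)$, $e_2 = (1, 1)$, whose metric has off-diagonal elements, and it counts the independent elements of a symmetric matrix by brute force.

```python
def inner(g, A, B):
    """g_{ab} A^a B^b written as an explicit double sum."""
    n = len(A)
    return sum(g[a][b] * A[a] * B[b] for a in range(n) for b in range(n))

def independent_components(n):
    """Count index pairs (a, b) with a <= b for a symmetric n x n array."""
    return sum(1 for a in range(n) for b in range(a, n))

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(inner(eta, [5, 3, 0, 0], [5, 3, 0, 0]))    # -(5^2) + 3^2

# Oblique basis e_1 = (1, 0), e_2 = (1, 1): g_ij = e_i . e_j
g_oblique = [[1, 1], [1, 2]]
A, B = [1, 2], [3, -1]                           # components A^i, B^i
print(inner(g_oblique, A, B))
# Same vectors in Cartesian components: A = (3, 2), B = (2, -1)
print(3 * 2 + 2 * (-1))

print(independent_components(4))                 # n(n+1)/2 for n = 4
```

The oblique-basis result agrees with the ordinary Cartesian dot product of the same two vectors, which is the point of passing the inner product through $g_{ij}$.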
To calculate the metric tensor, we must understand what’s meant by basis vector in this context. Consider the inﬁnitesimal displacement vector10 in the spherical coordinate system,

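$$
\begin{aligned}
d\mathbf{s} &= dr\,\hat{r} + r\,d\theta\,\hat{\theta} + r \sin\theta\,d\phi\,\hat{\phi} \equiv dr\,e_r + d\theta\,e_\theta + d\phi\,e_\phi \\
&= \sum_i (\text{coordinate differential})^i \times (\text{basis vector})_i \equiv \sum_i dx^i e_i .
\end{aligned} \tag{5.4}
$$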

The basis vector $e_i$ is whatever multiplies the coordinate differential $dx^i$ in the expression for $d\mathbf{s}$. In spherical coordinates, $e_r = \hat{r}$, $e_\theta = r\hat{\theta}$, and $e_\phi = r \sin\theta\,\hat{\phi}$. Basis vectors are not necessarily unit vectors: their magnitude and direction generally vary throughout a coordinate system. The vectors $\{e_i\}$ are tangent to the coordinate curves that pass through a given point and point toward increasing values of the coordinate, the coordinate basis. Figure 5.1 shows coordinate basis vectors $e_r$, $e_\theta$, and $e_\phi$ “attached” to the point $P$. Only one coordinate curve is shown in Fig. 5.1, the portion of a semicircle11 that results for fixed values of $r$ and $\phi$, with $0 \leq \theta \leq \pi/2$. The coordinate curve for the radial coordinate is the ray (for fixed $\theta$ and $\phi$) $0 \leq r < \infty$, while that for the azimuth angle is the circle (for fixed $\theta$ and $r$) $0 \leq \phi < 2\pi$.

Figure 5.1 Coordinate basis vectors at point P in the spherical coordinate system. [Axes x, y, z; angles θ and φ; basis vectors $e_r$, $e_\theta$, $e_\phi$ drawn at P.]

8Actually, Eq. (5.3) specifies the covariant elements of the metric tensor, $g_{\alpha\beta}$. We will shortly introduce $g^{\alpha\beta}$, the contravariant elements of the metric tensor.
9What is a geometry? O. Veblen and J.H.C. Whitehead offered:[23, p17] “. . . a branch of mathematics is called a geometry because the name seems good, on emotional and traditional grounds, to a sufficient number of competent people.”
10The infinitesimal displacement vector $d\mathbf{s}$ is the prototype contravariant vector. Anything called vector (in this case contravariant) must have the properties of the prototype.
11Semicircle because the polar angle has the range $0 \leq \theta \leq \pi$.


We can now calculate the metric tensor for the spherical coordinate system using Eq. (5.3):

$$
[g_{ij}] = \begin{pmatrix} g_{rr} & g_{r\theta} & g_{r\phi} \\ g_{\theta r} & g_{\theta\theta} & g_{\theta\phi} \\ g_{\phi r} & g_{\phi\theta} & g_{\phi\phi} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix} , \tag{5.5}
$$

where $[g_{ij}]$ indicates the tensor components arranged as a matrix. The matrix in Eq. (5.5) is diagonal because the coordinate system is an orthogonal coordinate system, with, for example, $e_r \cdot e_\theta = r\,\hat{r} \cdot \hat{\theta} = 0$. The metric tensor is always diagonal for orthogonal coordinate systems. Using Eqs. (5.4) and (5.5), we have the square of the line element in spherical coordinates:

$$
(ds)^2 \equiv d\mathbf{s} \cdot d\mathbf{s} = g_{ij} dx^i dx^j = g_{rr} (dr)^2 + g_{\theta\theta} (d\theta)^2 + g_{\phi\phi} (d\phi)^2 . \tag{5.6}
$$

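The line element $ds = \sqrt{d\mathbf{s} \cdot d\mathbf{s}}$ represents a physical displacement and must have the dimension of length. The metric tensor supplies the information required to calculate the distance between points, the separation of which is characterized by coordinate differentials. If the coordinates do not have a physical dimension, such as angular coordinates, the metric tensor must carry the information so that $g_{ij} dx^i dx^j$ has the dimension of length squared (see $g_{\theta\theta}$ and $g_{\phi\phi}$ in Eq. (5.5)). We can write $(ds)^2$ in Eq. (5.6) in the following way:

$$
\begin{aligned}
(ds)^2 &= \begin{pmatrix} dr & d\theta & d\phi \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}
\begin{pmatrix} dr \\ d\theta \\ d\phi \end{pmatrix}
= \begin{pmatrix} dr & r^2 d\theta & r^2 \sin^2\theta\, d\phi \end{pmatrix}
\begin{pmatrix} dr \\ d\theta \\ d\phi \end{pmatrix} \\
&= (dr)^2 + r^2 (d\theta)^2 + r^2 \sin^2\theta\, (d\phi)^2 = g_{ij} dx^i dx^j .
\end{aligned} \tag{5.7}
$$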

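Equation (5.7) has the form of Eq. (4.13) except with $\eta$ replaced by $[g_{ij}]$.
Now consider an arbitrary three-dimensional coordinate system where point $P$ is at the intersection of three coordinate curves labeled by $(u, v, w)$ (see Fig. 5.2). For a nearby point $Q$ define the vector $\Delta\mathbf{s} \equiv \overrightarrow{PQ}$; $\Delta\mathbf{s}$ is also the vector $\Delta\mathbf{s} \equiv (\mathbf{r} + \Delta\mathbf{r}) - \mathbf{r}$, where $\mathbf{r} + \Delta\mathbf{r}$ and $\mathbf{r}$ are the position vectors for $Q$ and $P$ relative to the origin (not shown). To first order in small quantities,

$$
d\mathbf{s} = \frac{\partial \mathbf{r}}{\partial u} du + \frac{\partial \mathbf{r}}{\partial v} dv + \frac{\partial \mathbf{r}}{\partial w} dw , \tag{5.8}
$$

where the derivatives (with respect to coordinates) are evaluated at $P$. The derivatives

$$
e_u \equiv \left. \frac{\partial \mathbf{r}}{\partial u} \right|_P \qquad
e_v \equiv \left. \frac{\partial \mathbf{r}}{\partial v} \right|_P \qquad
e_w \equiv \left. \frac{\partial \mathbf{r}}{\partial w} \right|_P \tag{5.9}
$$

Figure 5.2 General (u, v, w) coordinate system.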

form a local basis—an arbitrary ds in the neighborhood of P can be expressed as a linear combination of them—and they’re tangent to the coordinate curves. A local set of basis vectors is determined by the local structure of the coordinate system.


Example. The position vector in spherical coordinates can be written

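$$
\mathbf{r} = r \sin\theta \cos\phi\,\hat{x} + r \sin\theta \sin\phi\,\hat{y} + r \cos\theta\,\hat{z} .
$$

Applying Eq. (5.9), we have the vectors of the coordinate basis

$$
\begin{aligned}
e_r &= \partial\mathbf{r}/\partial r = \sin\theta \cos\phi\,\hat{x} + \sin\theta \sin\phi\,\hat{y} + \cos\theta\,\hat{z} \\
e_\theta &= \partial\mathbf{r}/\partial\theta = r \cos\theta \cos\phi\,\hat{x} + r \cos\theta \sin\phi\,\hat{y} - r \sin\theta\,\hat{z} \\
e_\phi &= \partial\mathbf{r}/\partial\phi = -r \sin\theta \sin\phi\,\hat{x} + r \sin\theta \cos\phi\,\hat{y} .
\end{aligned}
$$

The norms of these vectors are $\|e_r\| = 1$, $\|e_\theta\| = r$, and $\|e_\phi\| = r \sin\theta$ (show this). The magnitude and direction of the basis vectors are not constant. The unit vectors are thus

$$
\begin{aligned}
\hat{e}_r &= \sin\theta \cos\phi\,\hat{x} + \sin\theta \sin\phi\,\hat{y} + \cos\theta\,\hat{z} \\
\hat{e}_\theta &= \cos\theta \cos\phi\,\hat{x} + \cos\theta \sin\phi\,\hat{y} - \sin\theta\,\hat{z} \\
\hat{e}_\phi &= -\sin\phi\,\hat{x} + \cos\phi\,\hat{y} .
\end{aligned}
$$

Clearly, by definition, $e_r = \hat{e}_r$, $e_\theta = r \hat{e}_\theta$, and $e_\phi = r \sin\theta\,\hat{e}_\phi$.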
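The coordinate basis vectors of this example can be fed into $g_{ij} = e_i \cdot e_j$, Eq. (5.3), to recover the spherical metric numerically. This is a sketch under the stated formulas; the sample point $(r, \theta, \phi) = (2, 0.7, 1.1)$ and the function names are arbitrary illustrative choices.

```python
import math

def basis(r, theta, phi):
    """Coordinate basis vectors e_r, e_theta, e_phi in Cartesian components."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e_r = (st * cp, st * sp, ct)
    e_theta = (r * ct * cp, r * ct * sp, -r * st)
    e_phi = (-r * st * sp, r * st * cp, 0.0)
    return [e_r, e_theta, e_phi]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def metric(r, theta, phi):
    """g_ij = e_i . e_j, Eq. (5.3), evaluated at one point."""
    e = basis(r, theta, phi)
    return [[dot(e[i], e[j]) for j in range(3)] for i in range(3)]

r, theta, phi = 2.0, 0.7, 1.1
g = metric(r, theta, phi)
expected = [[1.0, 0.0, 0.0],
            [0.0, r ** 2, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

for row_g, row_e in zip(g, expected):
    print([round(x, 12) for x in row_g], row_e)
```

The diagonal entries are the squared norms $\|e_r\|^2 = 1$, $\|e_\theta\|^2 = r^2$, $\|e_\phi\|^2 = r^2 \sin^2\theta$, and the off-diagonal entries vanish (up to rounding) because the coordinate system is orthogonal.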

What about Minkowski space? For spherical coordinates, the geometry was specified first and then we derived the metric tensor. Physics determines the metric of spacetime for us. From Eq. (4.13), $(ds)^2 = -(dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \equiv g_{\alpha\beta} dx^\alpha dx^\beta$, and thus we have the Lorentz metric tensor, what we previously wrote down in Eq. (1.14):

$$
[g_{\alpha\beta}] = \eta = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{1.14}
$$

Note that the inner product of the time basis vector with itself is $e_0 \cdot e_0 = -1$, which is non-intuitive but consistent with our definition of timelike unit vector (Section 4.4).
In Euclidean geometry $(ds)^2 = g_{ij} dx^i dx^j > 0$ for any sign of the differentials $dx^i$. A metric is said to be positive definite12 if $(ds)^2 > 0$ for all $dx^i$, unless the coordinate differentials vanish, $dx^i = 0$. Said differently, the distance between two points in a Euclidean geometry vanishes only if the points are coincident. In MS, however, $(ds)^2$ can be positive, negative, or zero (spacelike, timelike, lightlike), in which case the metric is said to be indefinite. With an indefinite metric, two points may be at zero distance [$(ds)^2 = 0$] without being coincident ($dx^i \neq 0$). We show in Section 5.6 that an indefinite metric must have a nonzero null vector.

5.1.3 Dual basis, lowering and raising indices
The basis vectors $\{e_j\}_{j=1}^{n}$ for an arbitrary coordinate system will not in general be mutually orthogonal. It’s highly useful nonetheless to have some type of orthogonality relations among basis vectors. To that end, we define another set of vectors, $\{e^i\}_{i=1}^{n}$, the dual basis, labeled with superscripts, that are orthogonal to each of the vectors $\{e_j\}$, such that13

$$
e^i \cdot e_j = \delta^i_j \equiv
\begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
\qquad (i, j = 1, \cdots, n) \tag{5.10}
$$

where we’ve written the Kronecker delta in a new way.14 By definition, $e^2 \cdot e_1 = 0$ and $e^1 \cdot e_2 = 0$, but in general $e_1 \cdot e_2 \neq 0$.
12A test for positive definiteness is provided by Sylvester’s criterion that various determinants (principal minors) associated with $[g_{ij}]$ all be positive.[24, p52] The Lorentz metric fails this test: It’s not positive definite, and in fact is indefinite.

Figure 5.3 shows a non-orthogonal basis $e_1$, $e_2$ for vectors confined to a plane. Any vector in the plane can be expressed as a linear combination $A = A^1 e_1 + A^2 e_2$. The dual basis vectors $e^1$ and $e^2$ are shown, constructed so as to satisfy Eq. (5.10): $e^2 \cdot e_1 = 0$, $e^2 \cdot e_2 = 1$, $e^1 \cdot e_2 = 0$, and $e^1 \cdot e_1 = 1$. The same vector can be expressed as a linear combination of the dual basis vectors: $A = A_1 e^1 + A_2 e^2$, where the components of $A$ in the dual basis are labeled with subscripts.

Figure 5.3 Basis vectors $e_1$, $e_2$, and dual basis vectors $e^1$, $e^2$. [The vector A is shown decomposed as $A^1 e_1 + A^2 e_2$ and as $A_1 e^1 + A_2 e^2$.]
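A two-dimensional example makes the construction concrete. The sketch below assumes a hypothetical non-orthogonal basis $e_1 = (1, 0)$, $e_2 = (1, 1)$ in Cartesian components, builds the dual basis by solving Eq. (5.10) directly, and rebuilds a sample vector from both sets of components. Reading off the components as $A^i = e^i \cdot A$ and $A_i = e_i \cdot A$ anticipates relations derived in the text that follows.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def dual_basis(e1, e2):
    """Solve e^i . e_j = delta^i_j for a basis of the plane."""
    det = e1[0] * e2[1] - e1[1] * e2[0]        # nonzero for a valid basis
    d1 = (e2[1] / det, -e2[0] / det)           # e^1: orthogonal to e_2
    d2 = (-e1[1] / det, e1[0] / det)           # e^2: orthogonal to e_1
    return d1, d2

e1, e2 = (1.0, 0.0), (1.0, 1.0)
d1, d2 = dual_basis(e1, e2)

A = (3.0, 2.0)                                 # a vector in Cartesian components
A_up = (dot(d1, A), dot(d2, A))                # components A^1, A^2
A_lo = (dot(e1, A), dot(e2, A))                # components A_1, A_2

from_basis = tuple(A_up[0] * e1[k] + A_up[1] * e2[k] for k in range(2))
from_dual = tuple(A_lo[0] * d1[k] + A_lo[1] * d2[k] for k in range(2))

print(d1, d2)
print(A_up, A_lo)
print(from_basis, from_dual)                   # both reproduce A
```

The two sets of components differ ($A^i \neq A_i$) because the basis is not orthonormal, yet both expansions describe the same vector.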
We can express a vector in either basis. Using the summation convention,

$$
A = A^i e_i = A_k e^k . \tag{5.11}
$$

There must be a connection between the components $A^i$ and $A_k$ (of the same vector). Take the inner product of both sides of Eq. (5.11) with $e_j$,

$$
e_j \cdot A = A^i e_j \cdot e_i = A^i g_{ji} = A_k e_j \cdot e^k = A_k \delta^k_j = A_j ,
$$

where we’ve used Eqs. (5.3) and (5.10). Thus,

$$
A_j = g_{ji} A^i . \tag{5.12}
$$

13The number of dual basis vectors $\{e^i\}_{i=1}^{n}$ is the same as that for the original basis $\{e_j\}_{j=1}^{n}$; the two sets are isomorphic. In crystallography the dual basis is called the reciprocal basis. For the (generally non-orthogonal) directions of crystal axes $\{e_i\}_{i=1}^{3}$, the dual basis vectors are defined as

$$
e^1 = \frac{e_2 \times e_3}{e_1 \cdot (e_2 \times e_3)} \qquad
e^2 = \frac{e_3 \times e_1}{e_1 \cdot (e_2 \times e_3)} \qquad
e^3 = \frac{e_1 \times e_2}{e_1 \cdot (e_2 \times e_3)} .
$$

These vectors satisfy $e^i \cdot e_j = \delta^i_j$. Could it be said that one cannot understand solid-state physics without first studying GR?
14The dual basis vectors span a logically distinct vector space known as the dual space, which plays a fundamental role in tensor analysis. Appendix C contains an introduction to the dual space. From a set of vectors (in general non-orthogonal), a new, orthonormal set of vectors can always be found (Gram-Schmidt process). The dual basis is not such a set. The dual basis is in general non-orthogonal; the vectors $\{e^i\}$ are orthonormal to every vector in the set $\{e_j\}$, but not amongst themselves. The Gram-Schmidt process is a basis transformation in a given vector space; the dual basis is the basis of another space.


Equation (5.12) is an instance of lowering an index: The (covariant) metric tensor connects the components of a vector in the dual basis $A_j$ with its components in the coordinate basis,15 $A^i$. Now take the inner product between Eq. (5.11) and $e^j$:

$$
e^j \cdot A = A^i e^j \cdot e_i = A^i \delta^j_i = A^j = A_k e^k \cdot e^j , \tag{5.13}
$$

where we’ve used Eq. (5.10). We define the contravariant elements of the metric tensor as (compare with Eq. (5.3))

$$
g^{ij} \equiv e^i \cdot e^j , \tag{5.14}
$$

where $g^{ij} = g^{ji}$. Combining Eqs. (5.14) and (5.13),

$$
A^j = A_k g^{kj} . \tag{5.15}
$$

Equation (5.15) is an instance of raising an index: The contravariant metric tensor connects the components of a vector in the coordinate basis $A^j$ with its components in the dual basis, $A_k$.

Is there a relation between the contravariant and covariant elements of the metric tensor, gij and

gij? Combining Eq. (5.15), the raising of an index, Ai = gijAj, with Eq. (5.12), the lowering of an

index, Aj = gjkAk,

Ai = gij Aj = gij gjkAk .

(5.16)

Equation (5.16) is equivalent to

δik − gij gjk Ak = 0 .

(5.17)

But because the {Ak} are arbitrary,

gij gjk = δik .

(5.18)

The two types of metric tensors are inverses of each other. Using Eq. (5.5) we have for spherical

coordinates,

1 0

0

gij = 0 1/r2

0 .

0 0 1/(r2 sin2 θ)

(5.19)

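The representation of vectors in the coordinate and dual bases provides a convenient expression for the inner product,

$$\boldsymbol{A}\cdot\boldsymbol{B} = A^i\boldsymbol{e}_i\cdot B_k\boldsymbol{e}^k = A^iB_k\,\boldsymbol{e}_i\cdot\boldsymbol{e}^k = A^iB_k\delta^k_i = A^iB_i , \tag{5.20}$$

where we've used Eq. (5.10). Likewise, $A^iB_i = g^{ij}A_j\,g_{ik}B^k = A_jB^k\delta^j_k = A_jB^j$, where we've used Eq. (5.18).16 Note that Eq. (4.13) can be written $\boldsymbol{r}\cdot\boldsymbol{r} = x^\mu x_\mu$.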

We now prove a useful result, that we can form an identity operator out of the basis vectors (summation convention),

$$\mathbf{I} \equiv \boldsymbol{e}_i\boldsymbol{e}^i \equiv \boldsymbol{e}^i\boldsymbol{e}_i , \tag{5.21}$$

where there is no “dot” between the vectors; Eq. (5.21) is an operator.17 In Cartesian coordinates, Eq. (5.21) reads $\mathbf{I} = \boldsymbol{e}_x\boldsymbol{e}_x + \boldsymbol{e}_y\boldsymbol{e}_y + \boldsymbol{e}_z\boldsymbol{e}_z$. Let $\mathbf{I} = \boldsymbol{e}^i\boldsymbol{e}_i$ act on a vector defined first with respect to the dual basis, and then with respect to the coordinate basis,

$$\begin{aligned}
\mathbf{I}\cdot\boldsymbol{A} &= \boldsymbol{e}^i\boldsymbol{e}_i\cdot A_j\boldsymbol{e}^j = \boldsymbol{e}^iA_j\,\boldsymbol{e}_i\cdot\boldsymbol{e}^j = \boldsymbol{e}^iA_j\delta^j_i = \boldsymbol{e}^iA_i = \boldsymbol{A} \\
&= \boldsymbol{e}^i\boldsymbol{e}_i\cdot A^j\boldsymbol{e}_j = \boldsymbol{e}^iA^j\,(\boldsymbol{e}_i\cdot\boldsymbol{e}_j) = \boldsymbol{e}^iA^jg_{ij} = \boldsymbol{e}^iA_i = \boldsymbol{A} ,
\end{aligned}$$

where we've used Eq. (5.10) in the first line and Eq. (5.12) in the second line to lower the index.

15 Note from Eq. (5.12) that all components $A^i$ in the coordinate basis contribute to the components $A_j$ in the dual basis.
16 Such index manipulations are affectionately known as index gymnastics.
17 The juxtaposition of two vectors without a dot or cross between them is called a dyad. A dyadic is a sum of dyads. The dyadic identity operator in Eq. (5.21) is analogous to the completeness relation $I = \sum_n |n\rangle\langle n|$ in Dirac notation.

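Raising and lowering indices applies to basis vectors as well. Expand a dual basis vector in the coordinate basis,

$$\boldsymbol{e}^i = c^{ij}\boldsymbol{e}_j , \tag{5.22}$$

where the $c^{ij}$ are unknown expansion coefficients. Take the inner product between $\boldsymbol{e}^k$ and both sides of Eq. (5.22), $\boldsymbol{e}^k\cdot\boldsymbol{e}^i \equiv g^{ik} = c^{ij}\,\boldsymbol{e}^k\cdot\boldsymbol{e}_j = c^{ij}\delta^k_j = c^{ik}$, where we have used Eq. (5.10). Hence, $c^{ik} = g^{ik}$ and $\boldsymbol{e}^i = g^{ij}\boldsymbol{e}_j$, just like Eq. (5.15). By a similar argument, $\boldsymbol{e}_i = g_{ij}\boldsymbol{e}^j$. We can now establish the identity of the two forms for $\mathbf{I}$ in Eq. (5.21), $\boldsymbol{e}^k\boldsymbol{e}_k = g^{kj}\boldsymbol{e}_j\,g_{kl}\boldsymbol{e}^l = \delta^j_l\,\boldsymbol{e}_j\boldsymbol{e}^l = \boldsymbol{e}_l\boldsymbol{e}^l$.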
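The dyadic identity of Eq. (5.21) and the raising of basis-vector indices can be verified for a concrete non-orthogonal basis. The sketch below is plain Python and uses the cross-product (reciprocal basis) construction quoted in footnote 13; the three basis vectors are illustrative assumptions.

```python
def dot(a, b):
    """Euclidean inner product in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Illustrative non-orthogonal basis (Cartesian components); an assumption of
# this sketch, not a basis used in the text.
e = [(1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (1.0, 1.0, 4.0)]

# Reciprocal-basis construction: e^1 = (e_2 x e_3) / [e_1 . (e_2 x e_3)], etc.
vol = dot(e[0], cross(e[1], e[2]))
e_dual = [tuple(c / vol for c in cross(e[(i + 1) % 3], e[(i + 2) % 3]))
          for i in range(3)]

# Dyadic e_i e^i of Eq. (5.21) as a 3x3 array:
# entry (a, b) = sum_i (e_i)_a (e^i)_b.
identity = [[sum(e[i][a] * e_dual[i][b] for i in range(3)) for b in range(3)]
            for a in range(3)]

# Contravariant metric g^ij = e^i . e^j, Eq. (5.14), and the raised basis
# vectors g^ij e_j, which should reproduce the dual basis e^i.
g_up = [[dot(e_dual[i], e_dual[j]) for j in range(3)] for i in range(3)]
raised = [tuple(sum(g_up[i][j] * e[j][c] for j in range(3)) for c in range(3))
          for i in range(3)]

for row in identity:
    print(row)
```

The printed array is the unit matrix even though neither $\{\boldsymbol{e}_i\}$ nor $\{\boldsymbol{e}^i\}$ is orthonormal; only the pairing of a basis with its dual is needed.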

5.1.4 Coordinate transformations

We now examine how the components of ds change under invertible coordinate transformations. Dry and technical as this material tends to be, it’s highly important for learning about tensors.18
Let there be $n$ independent, analytic functions of the coordinates $x^1, \cdots, x^n$, $y^i(x^1, \cdots, x^n)$ $(i = 1, \cdots, n)$, which we can denote $\{y^i(x^j)\}_{i=1}^n$. A set of functions is independent if the Jacobian determinant (the determinant of the matrix of partial derivatives $\partial y^i/\partial x^j$, the Jacobian matrix) does not vanish identically. The functions $y^i$ then provide another set of coordinates, a new set of numbers to attach to the same point in space,

$$y^i = y^i(x^1, \cdots, x^n) , \qquad i = 1, \cdots, n \tag{5.23}$$

and constitute a transformation of coordinates.19 By assumption (nonvanishing Jacobian determinant), Eq. (5.23) is invertible: $x^j = x^j(y^1, \cdots, y^n)$, $(j = 1, \cdots, n)$.
Consider a point $P$ with coordinates $x^i$ and a neighboring point $Q$ with coordinates $x^i + dx^i$; see Fig. 5.4. The points $(P, Q)$ define the infinitesimal displacement vector $d\boldsymbol{s} \equiv \overrightarrow{PQ}$ with components $dx^i$. Referring to the same points $(P, Q)$ let there be a different coordinate system, $y^j$. In this coordinate system the same vector $d\boldsymbol{s}$ has components $dy^j$. The components of $d\boldsymbol{s}$ in the two systems of coordinates are related by calculus,

$$dy^i = \left.\frac{\partial y^i}{\partial x^j}\right|_P dx^j \equiv A^i_j\,dx^j , \qquad i = 1, \cdots, n \tag{5.24}$$

where the partial derivatives $A^i_j = \partial y^i/\partial x^j|_P$ comprise the elements of the Jacobian matrix associated with the coordinate transformation at point $P$.20 The derivatives exist through the analyticity of the transformation equations, Eq. (5.23). The quantities $A^i_j$ are constant for $Q$ in an infinitesimal neighborhood of $P$. Equation (5.24) then represents a locally linear transformation, even though the transformation equations in (5.23) are not necessarily linear.21

Figure 5.4 Infinitesimally separated points in two coordinate systems. (Point $P$ is labeled $x^i$ and $y^i$; point $Q$ is labeled $x^i + dx^i$ and $y^i + dy^i$.)

18 We should be interested in coordinate transformations for two broad reasons. In SR, a LT is a change in basis vectors in MS. In GR, spacetime cannot be modeled as MS. The curved spacetime of GR requires the mathematical structure of a four-dimensional manifold. A curved manifold cannot be covered by a single coordinate system; it must rely on overlapping “coordinatizations” of spacetime. Overlapping coordinate systems are another way of describing the same spacetime event using different coordinate systems.
19 For the most part we treat coordinate transformations from the passive point of view, where new coordinates are assigned to the same points. Coordinate transformations can, however, be viewed in the active sense where new coordinates refer to different points in the same coordinate system (same basis vectors), in which case the transformation equations determine a mapping between points.
20 In writing the elements of the Jacobian matrix as $A^i_j$, we're using the notation for matrix elements introduced in Section 4.3. The top index labels rows and the bottom index columns.

We adopt a notational device, the Schouten index convention, that simplifies tensor transformation equations.[25] Instead of inventing a different symbol for each new coordinate system ($y$, $x'$, $\bar{x}$, etc), choose $x$ to represent coordinates once and for all. Coordinates in different coordinate systems are distinguished by primes attached to indices. Thus, Eq. (5.24) is written $dx^{i'} = A^{i'}_j\,dx^j$, with $A^{i'}_j \equiv \partial x^{i'}/\partial x^j$; Eq. (5.23) is written $x^{i'} = x^{i'}(x^j)$.
Coordinate differentials in one coordinate system thus determine the coordinate differentials in another coordinate system. The transformation inverse to Eq. (5.24) is

$$dx^i = \left.\frac{\partial x^i}{\partial x^{j'}}\right|_P dx^{j'} \equiv A^i_{j'}\,dx^{j'} , \qquad i = 1, \cdots, n \tag{5.25}$$

where $A^i_{j'}$ denotes the partial derivatives $\{\partial x^i/\partial x^{j'}|_P\}$. Combining Eqs. (5.24) and (5.25), $dx^i = A^i_{j'}\,dx^{j'} = A^i_{j'}A^{j'}_k\,dx^k$, or

$$\left(\delta^i_k - A^i_{j'}A^{j'}_k\right)dx^k = 0 . \tag{5.26}$$

The matrices $A^i_{j'}$ and $A^{j'}_k$ are thus inversely related22

$$A^i_{j'}A^{j'}_k = \delta^i_k . \tag{5.27}$$
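As a concrete instance of Eq. (5.27), take the unprimed coordinates to be Cartesian, $(x^1, x^2) = (x, y)$, and the primed coordinates to be plane polar, $(x^{1'}, x^{2'}) = (r, \theta)$, with $x = r\cos\theta$, $y = r\sin\theta$. This choice is an illustrative assumption rather than an example from the text. The two Jacobian matrices and their product are:

```latex
\[
[A^{i}_{j'}] = \begin{pmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{pmatrix}
= \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},
\qquad
[A^{j'}_{k}] = \begin{pmatrix} \partial r/\partial x & \partial r/\partial y \\ \partial\theta/\partial x & \partial\theta/\partial y \end{pmatrix}
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta/r & \cos\theta/r \end{pmatrix},
\]
\[
A^{i}_{j'}A^{j'}_{k} =
\begin{pmatrix}
\cos^2\theta + \sin^2\theta & \cos\theta\sin\theta - \sin\theta\cos\theta \\
\sin\theta\cos\theta - \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta
\end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = [\delta^{i}_{k}] .
\]
```

Both matrices depend on the point $P$ through $(r, \theta)$, yet their product is the identity at every point where $r \neq 0$.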

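Equation (5.27) is simply the chain rule: $(\partial x^i/\partial x^{j'})(\partial x^{j'}/\partial x^k) = \partial x^i/\partial x^k = \delta^i_k$. Equations (5.24) and (5.25) indicate how the components of $d\boldsymbol{s}$ transform between coordinate systems. How do the basis vectors transform? We now use the key fact that $d\boldsymbol{s}$ is the same when expressed in the two basis sets, $\{\boldsymbol{e}_i\}$ and $\{\boldsymbol{e}_{j'}\}$,23

$$d\boldsymbol{s} = dx^i\boldsymbol{e}_i = dx^{j'}\boldsymbol{e}_{j'} . \tag{5.28}$$

Using a familiar strategy, take the inner product between $d\boldsymbol{s}$ and $\boldsymbol{e}^k$ on both sides of Eq. (5.28),

$$\boldsymbol{e}^k\cdot d\boldsymbol{s} = dx^i\,\boldsymbol{e}^k\cdot\boldsymbol{e}_i = dx^i\delta^k_i = dx^k = dx^{j'}\,\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'} = A^{j'}_l\,dx^l\,\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'} , \tag{5.29}$$

where we've used Eqs. (5.10) and (5.24). Thus,

$$A^{j'}_l\,\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'}\,dx^l = dx^k . \tag{5.30}$$

By the reasoning used in Eqs. (5.17) and (5.26), Eq. (5.30) implies that $A^{j'}_l\,\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'} = \delta^k_l$, in turn implying that $[\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'}]$ is the inverse of the matrix $A^{j'}_l$ (because the inverse of a matrix is unique). Referring to Eq. (5.27), we identify24

$$\boldsymbol{e}^k\cdot\boldsymbol{e}_{j'} = A^k_{j'} = \frac{\partial x^k}{\partial x^{j'}} . \tag{5.31}$$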

21 We're anticipating the possibility of nonlinear coordinate transformations (which we'll need in GR). At point $P$, the derivatives $(\partial y^i/\partial x^j)|_P$ are constant only within a small neighborhood of $P$; a nonlinear transformation can thus be treated as if it's linear within a small region. The LT is strictly linear and the restriction of derivatives to their values at a point is unnecessary.
22 In other notational schemes one must come up with different symbols for the Jacobian matrix and its inverse; e.g., $U$ and $U'$, or $U$ and $U^{-1}$. In either case, one has to remember which matrix applies to which transformation. In the Schouten method there is one symbol with two types of indices, primed and unprimed. Other schemes use the same symbol for the Jacobian matrix and its inverse, but with two ways of writing the indices, $A^i{}_j$ and $A_j{}^i$. Not only does one have to remember which applies to which transformation, such a scheme quickly becomes unintelligible to students at the back of the room.
23 We have chosen once and for all to represent basis vectors with the symbol $\boldsymbol{e}$.
24 Note that because of the prime on the index there is (hopefully) no chance of confusing $A^k_{j'} = \boldsymbol{e}^k\cdot\boldsymbol{e}_{j'}$ (from Eq. (5.31)) with $\boldsymbol{e}^k\cdot\boldsymbol{e}_j = \delta^k_j$ (from Eq. (5.10)). In the Schouten method, the symbol $A^j_k$ is defined as the Kronecker delta, $A^j_k \equiv \delta^j_k$.


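Similarly, in Eq. (5.28) take the inner product between $d\boldsymbol{s}$ and $\boldsymbol{e}^{k'}$, a dual basis vector in the transformed coordinate system,

$$\boldsymbol{e}^{k'}\cdot d\boldsymbol{s} = dx^i\,\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_i = A^i_{l'}\,dx^{l'}\,\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_i = dx^{j'}\,\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_{j'} = dx^{j'}\delta^{k'}_{j'} = dx^{k'} ,$$

where we've used Eq. (5.25) and the analog of Eq. (5.10) in the transformed coordinate system, $\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_{j'} = \delta^{k'}_{j'}$. We conclude that $A^i_{l'}\,\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_i = \delta^{k'}_{l'}$ and hence that

$$\boldsymbol{e}^{k'}\cdot\boldsymbol{e}_i = A^{k'}_i = \frac{\partial x^{k'}}{\partial x^i} . \tag{5.32}$$

Jacobian matrices therefore connect basis vectors in different coordinate systems (as well as coordinate differentials). From Eqs. (5.31), (5.32), and the identity operator, Eq. (5.21), we obtain the transformation equations between basis vectors,

$$\boldsymbol{e}_{i'} = \mathbf{I}\cdot\boldsymbol{e}_{i'} = \boldsymbol{e}_j\,\boldsymbol{e}^j\cdot\boldsymbol{e}_{i'} = \boldsymbol{e}_jA^j_{i'} \qquad \boldsymbol{e}_i = \mathbf{I}\cdot\boldsymbol{e}_i = \boldsymbol{e}_{j'}\,\boldsymbol{e}^{j'}\cdot\boldsymbol{e}_i = \boldsymbol{e}_{j'}A^{j'}_i . \tag{5.33}$$

Likewise, the dual basis vectors transform inversely to Eq. (5.33)

$$\boldsymbol{e}^{i'} = \mathbf{I}\cdot\boldsymbol{e}^{i'} = \boldsymbol{e}^j\,\boldsymbol{e}_j\cdot\boldsymbol{e}^{i'} = A^{i'}_j\boldsymbol{e}^j \qquad \boldsymbol{e}^i = \mathbf{I}\cdot\boldsymbol{e}^i = \boldsymbol{e}^{j'}\,\boldsymbol{e}_{j'}\cdot\boldsymbol{e}^i = A^i_{j'}\boldsymbol{e}^{j'} . \tag{5.34}$$

You will refer to these equations more than once; remember where you put them.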
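Eqs. (5.27), (5.33), and (5.34) can be exercised numerically. The sketch below is plain Python and assumes, purely for illustration, that the unprimed system is Cartesian $(x, y)$ and the primed system is plane polar $(r, \theta)$; the sample point and all variable names are hypothetical choices of this example.

```python
import math

# Sample point P in polar coordinates (arbitrary illustrative values).
r, theta = 2.0, 0.7
c, s = math.cos(theta), math.sin(theta)

# A_fwd[i][j] = A^{i'}_j = d x^{i'} / d x^j, with x^{i'} = (r, theta), x^j = (x, y).
A_fwd = [[c, s],
         [-s / r, c / r]]
# A_inv[i][j] = A^i_{j'} = d x^i / d x^{j'}.
A_inv = [[c, -r * s],
         [s, r * c]]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Cartesian basis vectors; in Cartesian coordinates the dual basis coincides
# with the coordinate basis.
cart = [(1.0, 0.0), (0.0, 1.0)]

# Chain rule, Eq. (5.27): A^i_{j'} A^{j'}_k = delta^i_k.
prod = [[sum(A_inv[i][j] * A_fwd[j][k] for j in range(2)) for k in range(2)]
        for i in range(2)]

# Eq. (5.33): e_{i'} = e_j A^j_{i'}  (new coordinate basis e_r, e_theta).
e_new = [tuple(sum(cart[j][comp] * A_inv[j][i] for j in range(2))
               for comp in range(2)) for i in range(2)]

# Eq. (5.34): e^{i'} = A^{i'}_j e^j  (new dual basis e^r, e^theta).
e_new_dual = [tuple(sum(A_fwd[i][j] * cart[j][comp] for j in range(2))
                    for comp in range(2)) for i in range(2)]

# Duality in the new system and the transformed metric g_{i'j'} = e_{i'} . e_{j'}.
duality = [[dot(e_new_dual[i], e_new[j]) for j in range(2)] for i in range(2)]
g_new = [[dot(e_new[i], e_new[j]) for j in range(2)] for i in range(2)]

print("e_r, e_theta      =", e_new)
print("e^r, e^theta      =", e_new_dual)
print("g_{i'j'}          =", g_new)
```

Under these assumptions the transformed metric comes out as diag$(1, r^2)$, the familiar polar-coordinate result, and the new dual basis satisfies the primed analog of Eq. (5.10).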

5.1.5 Tensor transformation properties: Contravariant, covariant, and all that
5.1.5.1 Scalar fields: $\phi'(\boldsymbol{r}') = \phi(\boldsymbol{r})$
An invariant is a quantity that does not change under coordinate transformations. The simplest type of invariant is a scalar, a number, such as the spacetime separation. A scalar field is a function $\phi(\boldsymbol{r})$ that assigns a number to each point in space. Points are invariant under passive coordinate transformations, which attach different labels (coordinates) to points, but do nothing to the points themselves. Any set of points is therefore invariant, as is any point function. The value of a scalar field is invariant under passive coordinate transformations.25 If a point has coordinates $x^i$ and $x^{j'}$ in two coordinate systems, we require of a scalar field that $\phi(x^i) = \phi(x^i(x^{j'})) \equiv \phi'(x^{j'})$, i.e., the form of the function of the transformed coordinates may change, $\phi'$, but not its value $\phi'(\boldsymbol{r}')$. Scalars do not exhaust the possible types of invariants under coordinate transformations; invariants other than scalars exist.

5.1.5.2 Contravariant tensors
How do the components of vectors other than $d\boldsymbol{s}$ transform under a change of basis? The question can be answered because we know how basis vectors transform. Calculus was used in Eq. (5.25) to specify the transformation property of the components of $d\boldsymbol{s} = dx^i\boldsymbol{e}_i$. We could establish how basis vectors transform (Eqs. (5.33) and (5.34)) by requiring $d\boldsymbol{s}$ to be the same when represented in different bases, $d\boldsymbol{s} = dx^i\boldsymbol{e}_i = dx^{j'}\boldsymbol{e}_{j'}$, Eq. (5.28). That $d\boldsymbol{s}$ is the same in different bases implies that it has an existence independent of coordinate system. A quantity having such a property is said to be a geometric object. The infinitesimal displacement vector $d\boldsymbol{s}$ is the prototype of a class of geometric objects referred to as contravariant vectors.26
A vector $\boldsymbol{T} = T^i\boldsymbol{e}_i$ has contravariant components $T^i$ if they transform as $T^{i'} = A^{i'}_jT^j$, i.e., like Eq. (5.25). In that way $\boldsymbol{T}$ is a geometric object, $\boldsymbol{T} = T^j\boldsymbol{e}_j = T^jA^{i'}_j\boldsymbol{e}_{i'} = T^{i'}\boldsymbol{e}_{i'}$, where we've used Eq. (5.33). The contravariant components transform inversely (“contra”) to the transformation of basis vectors. Any set of $n$ quantities $\{T^i\}$ that transform like the components of $d\boldsymbol{s}$,

$$T^{k'} = A^{k'}_jT^j = \frac{\partial x^{k'}}{\partial x^j}T^j , \tag{5.35}$$

are said to be the contravariant components of a vector. Any mathematical objects that transform like the components of $d\boldsymbol{s}$ form the contravariant components of a vector.

25 The temperature at a point, for example, doesn't care what coordinates you assign to the point.
26 By a contravariant vector, we mean a vector with contravariant components. The same terminology applies to contravariant tensors, covariant tensors, etc.
A set of $n^2$ quantities $\{T^{ij}\}$ are said to be the contravariant components of a second-rank tensor if they transform as

$$T^{i'j'} = A^{i'}_kA^{j'}_lT^{kl} = \frac{\partial x^{i'}}{\partial x^k}\frac{\partial x^{j'}}{\partial x^l}T^{kl} . \tag{5.36}$$

One way to create tensors is by multiplying vector components. For $A^i$ and $B^j$ contravariant vector components, $T^{ij} \equiv A^iB^j$ comprise the components of a second-rank tensor because they automatically transform properly. A set of $n^r$ objects $\{T^{i_1\cdots i_r}\}$ that transform as the product of $r$ contravariant vector components, $T^{k_1'k_2'\cdots k_r'} \equiv A^{k_1'}_{m_1}A^{k_2'}_{m_2}\cdots A^{k_r'}_{m_r}T^{m_1m_2\cdots m_r}$, are the contravariant components of a tensor of rank $r$.

Example. Show that $\{g^{ij}\}$ are contravariant tensor elements. To do so, we must show that they transform properly. Starting from the definition Eq. (5.14), $g^{i'j'} = \boldsymbol{e}^{i'}\cdot\boldsymbol{e}^{j'} = A^{i'}_k\boldsymbol{e}^k\cdot A^{j'}_l\boldsymbol{e}^l = A^{i'}_kA^{j'}_l\,g^{kl}$, where we have used Eq. (5.34), in agreement with Eq. (5.36).
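To see the rule $g^{i'j'} = A^{i'}_kA^{j'}_l\,g^{kl}$ at work, one can start from Cartesian coordinates in the plane, where $g^{kl} = \delta^{kl}$, and transform to plane polar coordinates $(r, \theta)$. The coordinate choice is an assumption made here for illustration; the Jacobian elements are $\partial r/\partial x = \cos\theta$, $\partial r/\partial y = \sin\theta$, $\partial\theta/\partial x = -\sin\theta/r$, $\partial\theta/\partial y = \cos\theta/r$.

```latex
\begin{align*}
g^{rr} &= \left(\frac{\partial r}{\partial x}\right)^2 + \left(\frac{\partial r}{\partial y}\right)^2
        = \cos^2\theta + \sin^2\theta = 1 , \\
g^{\theta\theta} &= \left(\frac{\partial\theta}{\partial x}\right)^2 + \left(\frac{\partial\theta}{\partial y}\right)^2
        = \frac{\sin^2\theta + \cos^2\theta}{r^2} = \frac{1}{r^2} , \\
g^{r\theta} &= \frac{\partial r}{\partial x}\frac{\partial\theta}{\partial x} + \frac{\partial r}{\partial y}\frac{\partial\theta}{\partial y}
        = -\frac{\cos\theta\sin\theta}{r} + \frac{\sin\theta\cos\theta}{r} = 0 .
\end{align*}
```

These are the $r$ and $\theta$ entries of the spherical-coordinate result in Eq. (5.19) restricted to the plane, obtained here purely from the transformation law.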

Transformation relations such as Eq. (5.36) pertain to the components of tensors, but they do not define what a tensor is. Like vectors, tensors are geometric objects. It's common practice to refer to symbols like $T^{ij}$ as “tensors,” but that's not correct. We'll use a special notation to indicate tensors: boldface Roman font for tensors, $\mathbf{T}$, as distinguished from boldface italic font for vectors, $\boldsymbol{A}$. A second-rank tensor $\mathbf{T}$ is a generalization of a vector,27 $\mathbf{T} \equiv T^{ij}\boldsymbol{e}_i\boldsymbol{e}_j$. A second-rank tensor is independent of coordinate system,

$$\begin{aligned}
\mathbf{T} = T^{i'j'}\boldsymbol{e}_{i'}\boldsymbol{e}_{j'} &= A^{i'}_lA^{j'}_mT^{lm}\,A^k_{i'}A^n_{j'}\,\boldsymbol{e}_k\boldsymbol{e}_n = \left(A^k_{i'}A^{i'}_l\right)\left(A^n_{j'}A^{j'}_m\right)T^{lm}\,\boldsymbol{e}_k\boldsymbol{e}_n \\
&= \delta^k_l\delta^n_mT^{lm}\,\boldsymbol{e}_k\boldsymbol{e}_n = T^{kn}\boldsymbol{e}_k\boldsymbol{e}_n = \mathbf{T} ,
\end{aligned}$$

where we've used Eqs. (5.36), (5.33), and (5.27). It's important to distinguish the tensor components $T^{ij}$ from the tensor as a whole, $\mathbf{T}$, just as we distinguish a vector, $\boldsymbol{A}$, from its components, $A^i$. A rank-$r$ tensor is $\mathbf{T} = T^{k_1k_2\cdots k_r}\boldsymbol{e}_{k_1}\boldsymbol{e}_{k_2}\cdots\boldsymbol{e}_{k_r}$. At some point we'll break training and start referring to tensor components as tensors (despite our admonition); continually referring to “the tensor whose components are $T^{ij}$” becomes cumbersome. The distinction between a tensor and its components should be kept in mind nevertheless.

5.1.5.3 Covariant tensors and mixed tensors

The invariance of scalar fields ($\phi'(\boldsymbol{r}') = \phi(\boldsymbol{r})$) allows us to introduce how derivatives transform between coordinate systems. Using Eq. (5.25),

$$\frac{\partial\phi}{\partial x^{j'}} = \frac{\partial x^i}{\partial x^{j'}}\frac{\partial\phi}{\partial x^i} = A^i_{j'}\frac{\partial\phi}{\partial x^i} . \tag{5.37}$$

Again, calculus is used to establish a prototype transformation equation. The form of Eq. (5.37) is the inverse of the form of Eq. (5.35) (note the location of the indices); Eq. (5.37), however, has the same form as Eq. (5.33). Just as the infinitesimal displacement vector $d\boldsymbol{s}$ is the prototype contravariant vector, the gradient of a scalar field $\nabla\phi$ is the prototype of a class of geometric objects called covariant vectors. A vector $\boldsymbol{T} = T_m\boldsymbol{e}^m$ has covariant components $T_i$ if they transform as $T_{n'} = A^m_{n'}T_m$, so that $\boldsymbol{T} = T_m\boldsymbol{e}^m = T_mA^m_{n'}\boldsymbol{e}^{n'} = T_{n'}\boldsymbol{e}^{n'}$, from Eq. (5.34). Any set of $n$ quantities $\{T_i\}$ that transform like

$$T_{j'} = A^i_{j'}T_i = \frac{\partial x^i}{\partial x^{j'}}T_i \tag{5.38}$$

are the covariant components of a vector; they “co-vary” with the basis vectors28 $\boldsymbol{e}_i$. A set of $n^2$ objects $\{T_{ij}\}$ that transform like

$$T_{i'j'} = A^k_{i'}A^l_{j'}T_{kl} \tag{5.39}$$

are covariant components of a second-rank tensor. A second-rank tensor with covariant components, $\mathbf{T} \equiv T_{ij}\boldsymbol{e}^i\boldsymbol{e}^j$, is independent of basis (as can readily be shown). A set of $n^r$ objects $\{T_{k_1\cdots k_r}\}$ that transform like $T_{k_1'\cdots k_r'} = A^{m_1}_{k_1'}\cdots A^{m_r}_{k_r'}T_{m_1\cdots m_r}$ are the covariant components of a tensor of rank $r$, $\mathbf{T} = T_{k_1\cdots k_r}\boldsymbol{e}^{k_1}\cdots\boldsymbol{e}^{k_r}$.

27 The order in which the basis vectors are written is important; in general $T^{ij} \neq T^{ji}$.

We now define mixed tensors. A set of $n^3$ objects $\{T^i{}_{jk}\}$ that transform as

$$T^{i'}{}_{j'k'} = A^{i'}_pA^l_{j'}A^m_{k'}T^p{}_{lm} \tag{5.40}$$

are the components of a third-rank tensor with one contravariant and two covariant indices. Notationally, the upper and lower indices are set apart, $T^i{}_{jk}$. It's good hygiene in writing the components of mixed tensors not to put superscript indices aligned with subscript indices (as in $T^i_{jk}$); adopting this convention helps avoid mistakes.29 The components of a mixed tensor of type $(p, q)$ (having $p$ contravariant indices and $q$ covariant indices) are a set of $n^{p+q}$ objects $\{T^{i_1\cdots i_p}{}_{j_1\cdots j_q}\}$ that transform as

$$T^{k_1'\cdots k_p'}{}_{m_1'\cdots m_q'} = A^{k_1'}_{t_1}\cdots A^{k_p'}_{t_p}\,A^{s_1}_{m_1'}\cdots A^{s_q}_{m_q'}\,T^{t_1\cdots t_p}{}_{s_1\cdots s_q} . \tag{5.41}$$

The tensor of type $(p, q)$ is $\mathbf{T} = T^{k_1\cdots k_p}{}_{m_1\cdots m_q}\,\boldsymbol{e}_{k_1}\cdots\boldsymbol{e}_{k_p}\boldsymbol{e}^{m_1}\cdots\boldsymbol{e}^{m_q}$. A tensor as a geometric object is independent of the basis vectors used to represent it: $\mathbf{T} = T_{ij}\boldsymbol{e}^i\boldsymbol{e}^j = T^i{}_j\,\boldsymbol{e}_i\boldsymbol{e}^j = T_i{}^j\,\boldsymbol{e}^i\boldsymbol{e}_j = T^{ij}\boldsymbol{e}_i\boldsymbol{e}_j$.

Is $\delta^i_j$ an element of a second-rank mixed tensor as the notation suggests? How does it transform? Using Eq. (5.41), $\delta^{i'}_{j'} \equiv A^{i'}_kA^m_{j'}\delta^k_m = A^{i'}_kA^k_{j'} = \delta^{i'}_{j'}$, from Eq. (5.27). The transformation of $\delta^i_j$, $\delta^{i'}_{j'}$, has the value of $\delta^i_j$ in the new frame. The Kronecker delta is a constant tensor, a tensor with elements that are numerically the same in every coordinate system. (The same is not true of $\delta_{ij}$.30) Equation (5.10) defines the elements of the mixed metric tensor, $g^i_j = \boldsymbol{e}^i\cdot\boldsymbol{e}_j = \delta^i_j$.

5.1.5.4 Inner product is a scalar

We now show that the inner product defined by Eq. (5.2) is invariant. Using Eq. (5.20),

$$(\boldsymbol{T}\cdot\boldsymbol{U})' = T_{i'}U^{i'} = A^k_{i'}A^{i'}_j\,T_kU^j = \delta^k_j\,T_kU^j = T_jU^j = \boldsymbol{T}\cdot\boldsymbol{U} , \tag{5.42}$$

where we've used Eqs. (5.35), (5.38), and (5.27). If we know the value of the inner product in one coordinate system, we know it in all coordinate systems. Note that the metric tensor is lurking in Eq. (5.42) from lowering indices: $\boldsymbol{T}\cdot\boldsymbol{U} = g_{\alpha\beta}T^\alpha U^\beta = T_\beta U^\beta$.
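The cancellation in Eq. (5.42) relies on one factor transforming with $A^{i'}_j$ and the other with its inverse $A^k_{i'}$. A small plain-Python sketch makes this concrete; the constant transformation matrix and the component values are hypothetical, chosen as integers so the arithmetic is exact, and the matrix is deliberately non-symmetric so that index placement matters.

```python
# A_fwd[i][j] = A^{i'}_j and A_inv[i][j] = A^i_{j'}; A_inv is the matrix
# inverse of A_fwd (illustrative constant, i.e. linear, transformation).
A_fwd = [[1, 2], [0, 1]]
A_inv = [[1, -2], [0, 1]]

T_down = [3, -1]   # covariant components T_k
U_up = [1, 4]      # contravariant components U^j

# Eq. (5.35): U^{i'} = A^{i'}_j U^j.
U_up_new = [sum(A_fwd[i][j] * U_up[j] for j in range(2)) for i in range(2)]

# Eq. (5.38): T_{i'} = A^k_{i'} T_k (the summed index is the ROW of A_inv).
T_down_new = [sum(A_inv[k][i] * T_down[k] for k in range(2)) for i in range(2)]

inner_old = sum(T_down[k] * U_up[k] for k in range(2))
inner_new = sum(T_down_new[i] * U_up_new[i] for i in range(2))

# For contrast: transforming T_k with the contravariant rule does not
# preserve the sum, so {T_k} treated that way would not give a scalar.
T_wrong = [sum(A_fwd[i][k] * T_down[k] for k in range(2)) for i in range(2)]
inner_wrong = sum(T_wrong[i] * U_up_new[i] for i in range(2))

print(inner_old, inner_new, inner_wrong)
```

With these numbers the correctly transformed components reproduce the original value of $T_kU^k$, while the deliberately mis-transformed ones do not.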

28 A useful mnemonic for the placement of indices is “co goes below.”
29 For example, by writing $g_{\nu\alpha}C^{\mu\alpha\lambda} = C^{\mu}{}_{\nu}{}^{\lambda}$ we know where $\nu$ “comes from.” Had we written $C^{\mu\lambda}_{\nu}$, where would $\nu$ “go” if later we decide to raise the index?
30 The Kronecker symbol was defined in Eq. (5.10) as a mixed tensor. The Kronecker symbol as it's usually written, $\delta_{ij}$, is in general tensor analysis obtained by lowering an index: $\delta_{ij} = g_{ik}\delta^k_j = g_{ij}$.

5.1.6 Tensor contraction and outer product

When a contravariant (upper) index is set equal to a covariant (lower) index and summed over, it reduces a tensor of type $(p, q)$ to one of type $(p - 1, q - 1)$, i.e., it lowers the tensor rank by two, a process called contraction. Consider $T^i{}_j \equiv U^iV_j$, a second-rank tensor formed from the product of vector components $U^i$ and $V_j$. If we set $j = i$ and sum over $i$, $T^i{}_i \equiv U^iV_i$, we form a scalar. The inverse process, of forming the components of higher-rank tensors from products of the components of lower-rank tensors, is called the outer product. The product of the components of a tensor of type $(r, s)$ with the components of a tensor of type $(p, q)$ form the components of a new tensor of type $(r + p, s + q)$. For example, the quantities $T^{ij}{}_k = U^{ij}V_k$ are the components of a third-rank tensor. If we set $j = k$ and sum, we lower the rank by two to form a vector (first-rank tensor), $T^i \equiv U^{ik}V_k$. To prove that $T^i$ is the component of a vector, we must show that it transforms like one,

$$\begin{aligned}
T^{i'} = U^{i'k'}V_{k'} &= A^{i'}_lA^{k'}_mU^{lm}\,A^n_{k'}V_n = A^{i'}_l\left(A^{k'}_mA^n_{k'}\right)U^{lm}V_n = A^{i'}_l\,\delta^n_m\,U^{lm}V_n \\
&= A^{i'}_l\,U^{ln}V_n = A^{i'}_l\,T^l ,
\end{aligned}$$

where we've used Eqs. (5.36), (5.38), and (5.27). The tensor that results from components obtained through outer products is called the tensor product, $\mathbf{C} = \boldsymbol{A}\otimes\boldsymbol{B}$. If $\boldsymbol{A} = \alpha^i\boldsymbol{e}_i$ and $\boldsymbol{B} = \beta^j\boldsymbol{e}_j$, $\mathbf{C} = \alpha^i\beta^j\,\boldsymbol{e}_i\otimes\boldsymbol{e}_j \equiv C^{ij}\,\boldsymbol{e}_i\otimes\boldsymbol{e}_j$. The quantities $\boldsymbol{e}_i\otimes\boldsymbol{e}_j$ are a basis for type $(2, 0)$ tensors formed from the basis vectors $\boldsymbol{e}_i$ for type $(1, 0)$ tensors.
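The statement that contraction commutes with the transformation law can be checked with explicit numbers. The following plain-Python sketch forms the outer product $T^{ij}{}_k = U^{ij}V_k$, contracts it to $T^i = U^{ik}V_k$, and compares "contract, then transform" with "transform, then contract." The transformation matrix and the component values are hypothetical integers chosen for exact arithmetic.

```python
n = 2
# A_fwd[i][j] = A^{i'}_j, A_inv[i][j] = A^i_{j'} (inverse of A_fwd).
A_fwd = [[1, 2], [0, 1]]
A_inv = [[1, -2], [0, 1]]

U = [[1, 2], [3, 4]]   # components U^{ij} of a type (2, 0) tensor
V = [1, -1]            # components V_k of a covariant vector

# Outer product: T^{ij}_k = U^{ij} V_k, a type (2, 1) tensor.
T3 = [[[U[i][j] * V[k] for k in range(n)] for j in range(n)] for i in range(n)]

# Contraction over the second upper index and the lower index: T^i = T^{ik}_k.
T = [sum(T3[i][k][k] for k in range(n)) for i in range(n)]

# Transform the factors: Eq. (5.36) for U^{ij}, Eq. (5.38) for V_k.
U_new = [[sum(A_fwd[i][l] * A_fwd[k][m] * U[l][m]
              for l in range(n) for m in range(n))
          for k in range(n)] for i in range(n)]
V_new = [sum(A_inv[m][k] * V[m] for m in range(n)) for k in range(n)]

# Contract in the new system, and separately transform T^i as a vector.
T_new = [sum(U_new[i][k] * V_new[k] for k in range(n)) for i in range(n)]
T_expected = [sum(A_fwd[i][l] * T[l] for l in range(n)) for i in range(n)]

print(T, T_new, T_expected)
```

The two routes agree, which is the content of the displayed derivation: the contracted pair of Jacobian factors collapses to a Kronecker delta.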

5.1.7 Quotient theorem

The direct test for whether a set of mathematical objects form tensor components is to verify that they transform appropriately. There is an indirect method for checking the tensorial character of a set of quantities known loosely as the quotient theorem, which says that in an equation $UV = T$, if $V$ and $T$ are known to be elements of a tensor, then $U$ is also a tensor element. With the quotient theorem, we use known tensors to ascertain the tensor character of putative tensors.
Suppose $\{T_r\}$ is a set of quantities we wish to test for its tensor character. Let $\{X^r\}$ be the components of a contravariant vector. If the sum $T_rX^r$ is an invariant, then by the quotient theorem, the quantities $T_r$ form the elements of a covariant vector. From the given invariance, we have $T_rX^r = T_{s'}X^{s'}$. We can use the known transformation properties of $X^r$, Eq. (5.35), to write $T_rX^r = T_{s'}A^{s'}_jX^j$, or equivalently $\left(T_j - T_{s'}A^{s'}_j\right)X^j = 0$. Because the $\{X^j\}$ are arbitrary, the terms in parentheses must vanish, establishing $T_j = A^{s'}_jT_{s'}$ as the elements of a covariant vector.
Let's take a more challenging example. Suppose we run into an equation,

$$T^{mn}{}_{kl} = U^mS^n{}_{kl} , \tag{5.43}$$

where it's known that $T$ is a type $(2, 2)$ tensor and $U$ is a vector. By the quotient theorem we're entitled to conclude that $S$ is a tensor of type $(1, 2)$. To show this, introduce contravariant vector components $\{x^i\}$ and covariant vector components $\{y_r\}$. Multiply Eq. (5.43) by $y_my_nx^kx^l$ and contract,

$$T^{mn}{}_{kl}\,y_my_nx^kx^l = U^mS^n{}_{kl}\,y_my_nx^kx^l = \left(U^my_m\right)S^n{}_{kl}\,y_nx^kx^l . \tag{5.44}$$

We take this step so that the left side of Eq. (5.44) is a scalar (because we have contracted all indices); $U^my_m$ is also a scalar. We have therefore established that $S^n{}_{kl}\,y_nx^kx^l$ is a scalar, and thus $S^{n'}{}_{k'l'}\,y_{n'}x^{k'}x^{l'} = S^n{}_{kl}\,y_nx^kx^l$. Now use the transformation properties of $x^i$ and $y_m$, Eqs. (5.35) and (5.38), $S^{n'}{}_{k'l'}\,A^i_{n'}y_i\,A^{k'}_jx^j\,A^{l'}_mx^m = S^i{}_{jm}\,y_ix^jx^m$. Because $x^i$ and $y_j$ are arbitrary, $S^{n'}{}_{k'l'}\,A^i_{n'}A^{k'}_jA^{l'}_m = S^i{}_{jm}$, establishing $S$ as a type $(1, 2)$ tensor.
5.1.8 Geometric interpretation of covariant vectors
We've now met the main players: scalars, contravariant and covariant vectors, and their generalizations as tensors. Contravariant vectors share the attributes of the displacement vector and should simply be called vectors. What then are covariant vectors? Their components transform like those of the gradient vector, which doesn't immediately convey a picture of what they are. For many students a course in relativity is their first introduction to covariant vectors, and one might wonder how important they are, given that one has arrived this far in a scientific education without encountering them.31 Can we provide a geometric interpretation of covariant vectors?
As we now show, covariant vectors represent families of parallel planes.32 Figure 5.5 shows a plane in 3-space. Locate a point on the plane having coordinates $(x^1_0, x^2_0, x^3_0)$ with the fixed vector $\boldsymbol{r}_0$. Let $\boldsymbol{r}$ locate an arbitrary point on the plane with coordinates $(x^1, x^2, x^3)$. Let there be a vector $\boldsymbol{w}$ perpendicular to the plane with components $(w_1, w_2, w_3)$. The vector $\boldsymbol{r} - \boldsymbol{r}_0$ lies in the plane and thus $\boldsymbol{w}\cdot(\boldsymbol{r} - \boldsymbol{r}_0) = 0$. By the quotient theorem, $\boldsymbol{w}$ is a covariant vector. The coordinates $\{x^j\}$ of all points on the plane then satisfy the equation of a plane $w_ix^i = d$, where $d \equiv \boldsymbol{w}\cdot\boldsymbol{r}_0$ is a constant. The intercept $p^i$ of the plane with the $i$th coordinate axis is found by setting all other coordinates $x^j = 0$ $(j \neq i)$, with the result that $p^i = d/w_i$ (no sum over $i$; see Fig. 5.6). For a plane parallel to the first, its coordinates satisfy $w_ix^i = d'$, where $d' \neq d$ is another constant. The intercepts of the parallel plane are given by $q^i \equiv d'/w_i$. Subtracting these equations, $w_i = (d' - d)/(q^i - p^i)$. The components $w_i$ are therefore related to the intercepts made by a pair of parallel planes with the coordinate axes. The direction of $\boldsymbol{w}$ is perpendicular to the planes, and the magnitude is specified by the distance separating the planes in the family, with magnitude inversely proportional to the interplanar separation.33

Figure 5.5 Plane in 3-space. The vector $\boldsymbol{r} - \boldsymbol{r}_0$ lies in the plane.
Figure 5.6 Covariant vectors $\boldsymbol{w}$ represent families of parallel planes.

31 One reason covariant vectors are relatively unfamiliar is that the distinction between contravariant and covariant is unnecessary in orthogonal coordinate systems, and physics is most often done using orthogonal coordinate systems. We're marching towards GR, however, which touts itself as applying to any coordinate system.
32 Planes are two-dimensional structures embedded in three-dimensional space. Planes in higher-dimensional spaces are called hyperplanes: $(n - 1)$-dimensional structures embedded in $n$-dimensional spaces, with $n > 3$.
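The relation between the components $w_i$ and the axis intercepts of two parallel planes is easy to try with numbers. The sketch below is plain Python; the components of $\boldsymbol{w}$, the constants $d$ and $d'$, and the function names are illustrative assumptions, and Cartesian coordinates are assumed so that the length of $\boldsymbol{w}$ is the ordinary Euclidean norm.

```python
import math


def intercepts(w, d):
    """Axis intercepts p^i = d / w_i of the plane w_i x^i = d (no sum on i)."""
    return [d / wi for wi in w]


def plane_spacing(w, d1, d2):
    """Perpendicular distance between the planes w.x = d1 and w.x = d2,
    assuming Cartesian coordinates so that |w| is the Euclidean norm."""
    return abs(d2 - d1) / math.sqrt(sum(wi * wi for wi in w))


w = [1.0, 2.0, 2.0]        # covariant components w_i (illustrative values)
d, d_prime = 2.0, 4.0      # the two planes w_i x^i = d and w_i x^i = d'

p = intercepts(w, d)
q = intercepts(w, d_prime)

# Recover w_i from the intercepts: w_i = (d' - d) / (q^i - p^i).
w_recovered = [(d_prime - d) / (qi - pi) for pi, qi in zip(p, q)]

# Doubling w halves the spacing between the planes labeled d and d'.
spacing = plane_spacing(w, d, d_prime)
spacing_doubled = plane_spacing([2.0 * wi for wi in w], d, d_prime)

print(p, q, w_recovered, spacing, spacing_doubled)
```

A larger $|\boldsymbol{w}|$ packs the planes of the family closer together, which is the sense in which the magnitude is inversely proportional to the interplanar separation.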

The connection with gradients thus becomes apparent. The level set of a function is the locus of points such that $f(x^1, \cdots, x^n) = f_0$, where $f_0$ is a given constant value. As is well known, the gradient of a function is orthogonal to its level set.34 Consider the change in a scalar field $\phi$ over a displacement $d\boldsymbol{s} = du^i\boldsymbol{e}_i$, with

$$d\phi = \frac{\partial\phi}{\partial u^i}\,du^i . \tag{5.45}$$

We can represent $d\phi$ (a scalar) as an inner product between $d\boldsymbol{s}$ and a new vector (the gradient) such that $d\phi \equiv \nabla\phi\cdot d\boldsymbol{s}$. If $d\boldsymbol{s}$ lies within the level set, $d\phi = 0$, implying that $\nabla\phi$ is orthogonal to the level set of $\phi$. By the quotient theorem, $\nabla\phi$ is a covariant vector which we can represent in the dual basis, $\nabla \equiv \boldsymbol{e}^i\nabla_i$,

$$d\phi = \nabla\phi\cdot d\boldsymbol{s} = (\nabla_i\phi)\,\boldsymbol{e}^i\cdot(du^j\boldsymbol{e}_j) = (\nabla_i\phi)\,du^j\delta^i_j = (\nabla_i\phi)\,du^i . \tag{5.46}$$

Comparing Eqs. (5.46) and (5.45), $\nabla_i \equiv \partial/\partial u^i$. Note the location of the indices: A derivative with respect to a contravariant component, $u^i$, is a covariant vector component, $\nabla_i$. To show that $\nabla_i$ is the component of a covariant vector is simple; see Eq. (5.37), $\nabla_{i'} = A^j_{i'}\nabla_j$.
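A short worked case may help fix the index placement. Take the scalar field $\phi = x$ in the plane and compare Cartesian coordinates $(x, y)$ with plane polar coordinates $(r, \theta)$, where $x = r\cos\theta$, $y = r\sin\theta$. The field and the coordinate systems are illustrative assumptions, not an example from the text. Computing $\nabla_{i'} = \partial/\partial u^{i'}$ directly and then through the transformation law $\nabla_{i'} = A^j_{i'}\nabla_j$ gives the same components:

```latex
\begin{align*}
\text{directly:}\quad
&\nabla_r\phi = \frac{\partial (r\cos\theta)}{\partial r} = \cos\theta ,
&&\nabla_\theta\phi = \frac{\partial (r\cos\theta)}{\partial\theta} = -r\sin\theta ; \\
\text{by Eq. (5.37):}\quad
&\nabla_r\phi = \frac{\partial x}{\partial r}\nabla_x\phi + \frac{\partial y}{\partial r}\nabla_y\phi
  = \cos\theta\cdot 1 + \sin\theta\cdot 0 = \cos\theta ,
&&\nabla_\theta\phi = \frac{\partial x}{\partial\theta}\nabla_x\phi + \frac{\partial y}{\partial\theta}\nabla_y\phi
  = -r\sin\theta\cdot 1 + r\cos\theta\cdot 0 = -r\sin\theta .
\end{align*}
```

Raising the index with $g^{\theta\theta} = 1/r^2$ gives the contravariant component $\nabla^\theta\phi = -\sin\theta/r$, which differs from $\nabla_\theta\phi$ because the polar coordinate basis is not normalized.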

Example. The electric field $\boldsymbol{E}$ is a geometric object. It has a natural representation as a covariant vector $E_i = -\nabla_i\phi$ from its role as the gradient of the electrostatic potential $\phi(\boldsymbol{r})$. It's also naturally represented as a contravariant vector from its relation to the Newtonian equation of motion $E^i = (m/q)\,dv^i/dt$. The two quantities are related through the metric tensor, $E_i = g_{ij}E^j$. The distinction is necessary only in non-orthogonal coordinate systems.

Gradients provide a geometric interpretation of the dual basis vectors. Vectors normal to the level set of a function35 $f(u, v, w)$ can be expressed in a basis of normals to coordinate surfaces,

$$\boldsymbol{e}^u \equiv \nabla u \qquad \boldsymbol{e}^v \equiv \nabla v \qquad \boldsymbol{e}^w \equiv \nabla w . \tag{5.47}$$

A coordinate surface is the surface that results by holding one of the coordinates fixed.36 A sphere, for example, results by holding the radial coordinate fixed and letting the coordinates $\theta$, $\phi$ vary; the sphere is the coordinate surface associated with the radial coordinate.37 Figure 5.7 illustrates the distinction between coordinate basis vectors $\boldsymbol{e}_\alpha$ (tangents to coordinate curves) and the dual basis vectors $\boldsymbol{e}^\beta$, orthogonal to coordinate surfaces. The vectors in Eq. (5.47) are dual to the basis vectors $\boldsymbol{e}_i$ in the sense of Eq. (5.10), $(u, v, w \equiv u^1, u^2, u^3)$

$$\boldsymbol{e}^i\cdot\boldsymbol{e}_j = \nabla u^i\cdot\frac{\partial\boldsymbol{r}}{\partial u^j} = \frac{\partial u^i}{\partial x^k}\frac{\partial x^k}{\partial u^j} = \frac{\partial u^i}{\partial u^j} = \delta^i_j . \tag{5.48}$$

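Example. Consider a coordinate system $(u, v, w)$ defined by $x = u + v$, $y = u - v$, and $z = \alpha uv + w$, where $\alpha$ is a constant. These equations can be inverted, with

$$u = \frac{1}{2}(x + y) \qquad v = \frac{1}{2}(x - y) \qquad w = z - \frac{\alpha}{4}\left(x^2 - y^2\right) .$$

The coordinate surfaces for $u = u_0$ and $v = v_0$ are planes, while the surface for $w = w_0$ is a hyperbolic paraboloid. The position vector can be written

$$\boldsymbol{r} = (u + v)\hat{\boldsymbol{x}} + (u - v)\hat{\boldsymbol{y}} + (\alpha uv + w)\hat{\boldsymbol{z}} .$$

Using Eq. (5.9), we find the coordinate basis vectors

$$\boldsymbol{e}_u = \frac{\partial\boldsymbol{r}}{\partial u} = \hat{\boldsymbol{x}} + \hat{\boldsymbol{y}} + \alpha v\hat{\boldsymbol{z}} \qquad \boldsymbol{e}_v = \frac{\partial\boldsymbol{r}}{\partial v} = \hat{\boldsymbol{x}} - \hat{\boldsymbol{y}} + \alpha u\hat{\boldsymbol{z}} \qquad \boldsymbol{e}_w = \frac{\partial\boldsymbol{r}}{\partial w} = \hat{\boldsymbol{z}} .$$

It's easily shown that $\boldsymbol{e}_u\cdot\boldsymbol{e}_v = \alpha^2uv$, $\boldsymbol{e}_u\cdot\boldsymbol{e}_w = \alpha v$, $\boldsymbol{e}_v\cdot\boldsymbol{e}_w = \alpha u$; this is not an orthogonal coordinate system. From Eq. (5.47),

$$\boldsymbol{e}^u = \nabla u = \frac{1}{2}\left(\hat{\boldsymbol{x}} + \hat{\boldsymbol{y}}\right) \qquad \boldsymbol{e}^v = \nabla v = \frac{1}{2}\left(\hat{\boldsymbol{x}} - \hat{\boldsymbol{y}}\right) \qquad \boldsymbol{e}^w = \nabla w = \hat{\boldsymbol{z}} - \frac{\alpha}{2}\left(x\hat{\boldsymbol{x}} - y\hat{\boldsymbol{y}}\right) .$$

It can be verified that $\boldsymbol{e}^u\cdot\boldsymbol{e}_u = \boldsymbol{e}^v\cdot\boldsymbol{e}_v = \boldsymbol{e}^w\cdot\boldsymbol{e}_w = 1$ and $\boldsymbol{e}^u\cdot\boldsymbol{e}_v = \boldsymbol{e}^u\cdot\boldsymbol{e}_w = \boldsymbol{e}^v\cdot\boldsymbol{e}_w = 0$. Equation (5.10) is satisfied.

Figure 5.7 Vectors of the coordinate basis $\boldsymbol{e}_\alpha$ are tangent to coordinate curves, vectors of the dual basis $\boldsymbol{e}^\beta$ are orthogonal to coordinate surfaces.

33 Your inner mathematician would want to know that vectors defined from families of parallel planes can be added to other such vectors to produce new vectors of the same type. They can, as shown in the delightful book by Weinreich.[26]
34 Anyone who's worked with topographic maps knows that a steeper terrain (gradient) is implied by contours of equal elevation spaced closer together.
35 Defined with respect to a general $(u, v, w)$ coordinate system.
36 In an $n$-dimensional space with $n > 3$, $(n - 1)$-dimensional coordinate surfaces, called hypersurfaces, result by holding one of the $n$ coordinates fixed.
37 For a sphere, the unit vector $\hat{\boldsymbol{r}}$ is both tangent to the radial coordinate curve and orthogonal to the radial coordinate surface: The distinction between the two types of vectors is unnecessary in orthogonal coordinate systems.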
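The claims in this example can be spot-checked at a sample point. The sketch below is plain Python; the value of $\alpha$ and the point $(u, v, w)$ are arbitrary illustrative choices, and the basis vectors are entered exactly as given in the example.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Arbitrary sample values (assumptions of this sketch).
alpha = 0.5
u, v, w = 1.5, -0.25, 2.0

# Forward map and the inverse formulas quoted in the example.
x, y, z = u + v, u - v, alpha * u * v + w
u_back = 0.5 * (x + y)
v_back = 0.5 * (x - y)
w_back = z - 0.25 * alpha * (x * x - y * y)

# Coordinate basis e_u, e_v, e_w (tangents to coordinate curves).
e_lower = [(1.0, 1.0, alpha * v),
           (1.0, -1.0, alpha * u),
           (0.0, 0.0, 1.0)]

# Dual basis e^u, e^v, e^w (gradients of u, v, w), Eq. (5.47).
e_upper = [(0.5, 0.5, 0.0),
           (0.5, -0.5, 0.0),
           (-0.5 * alpha * x, 0.5 * alpha * y, 1.0)]

# Duality matrix e^i . e_j and covariant metric g_ij = e_i . e_j.
duality = [[dot(e_upper[i], e_lower[j]) for j in range(3)] for i in range(3)]
g_lower = [[dot(e_lower[i], e_lower[j]) for j in range(3)] for i in range(3)]

print("inverse map:", (u_back, v_back, w_back))
print("e^i . e_j  :", duality)
print("g_ij       :", g_lower)
```

The duality matrix is the unit matrix, while the off-diagonal metric elements $\alpha^2uv$, $\alpha v$, $\alpha u$ are nonzero, confirming that the system is non-orthogonal yet Eq. (5.10) holds.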

5.1.9 Connection with relativity
If a tensor equation is true in one reference frame, it's true in all reference frames. Suppose we have a relation between tensors, valid in one coordinate system, $A_{ij} = B_{ij}$. Write this equation as $D_{ij} = 0$, where $D_{ij} \equiv A_{ij} - B_{ij}$. If $D_{ij} = 0$ in one coordinate system, then $D_{i'j'} = 0$ in any coordinate system, because the tensor transformation equations are linear and homogeneous. Thus, $A_{i'j'} = B_{i'j'}$ in all coordinate systems. While the individual components $A_{ij}$, $B_{ij}$ transform between frames, the form of the equation is the same in all coordinate systems.38 For physical laws to be the same for all observers, they must be formulated in a covariant manner, which is why it's so important to be able to establish whether a given set of objects constitute a tensor.
Let’s pause for a passage from Einstein’s 1916 article on GR. Based on what we’ve covered in this chapter, you should be able to follow what he says:[9, p121]
38Tensor equations are called covariant equations because their form co-varies with transformations between coordinate systems.


Let certain things (“tensors”) be deﬁned with respect to any system of coordinates by a number of functions of the coordinates, called the “components” of the tensor. There are then certain rules by which these components can be calculated for a new system of coordinates, if they are known for the original system of coordinates, and if the transformation connecting the two systems is known. The things hereafter called tensors are further characterized by the fact that the equations of transformation for their components are linear and homogeneous. Accordingly, all the components in the new system vanish, if they all vanish in the original system. If, therefore, a law of nature is expressed by equating all the components of a tensor to zero, it is generally covariant. By examining the laws of the formation of tensors, we acquire the means of formulating generally covariant laws.

We can write the LT in tensor notation as a coordinate transformation in MS:

xµ = Lµν xν .

(5.49)

Regardless of the details of the LT (whether simple, as in Eq. (3.17), or more complicated as in Eq. (3.24)), because the LT is linear, Lµν = ∂xµ /∂xν, the same as Eq. (5.35). Thus, we can use all the apparatus of tensor analysis in SR with the LT as the Jacobian matrix, Aij, and indeed we must use tensor analysis in relativity to formulate covariant equations. The inverse of Eq. (5.49) is xµ = Lµν xν where Lµν is obtained from Lµν by letting β → −β. The analog of Eq. (5.27) is Lµν Lνα = δµα.
By definition the LT satisfies Eq. (4.12), $L^T \eta L = \eta$, a matrix equation. In terms of tensor components (using $(L^T)_{\mu}{}^{\lambda} = L^{\lambda}{}_{\mu}$), Eq. (4.12) is equivalent to $\eta_{\mu\nu} = L^{\kappa}_{\mu} L^{\lambda}_{\nu}\, \eta_{\kappa\lambda}$. The defining requirement of a LT is none other than the transformation equation for the Lorentz metric! The Lorentz metric is the same in all IRFs: The principle of relativity requires the invariance of the spacetime interval $(ds)^2$. The Lorentz metric is thus a constant tensor in MS. If $x^{\mu}$ transforms as in Eq. (5.49), the basis vectors transform inversely, showing that the LT is equivalently a change of basis vectors,

$$\mathbf{e}_{\alpha'} = L^{\beta}_{\alpha'}\, \mathbf{e}_{\beta} . \tag{5.50}$$

The time axis in an IRF is perpendicular to the spatial axes, so that $\eta_{0i} = \mathbf{e}_0 \cdot \mathbf{e}_i = 0$. It would not appear from Fig. 2.9 and similar figures that time is orthogonal to space in the transformed frame. Nevertheless, as we now show, in the transformed frame $\eta_{0'1'} = 0$. Using Eq. (5.50),

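$$\eta_{1'0'} \equiv \mathbf{e}_{1'} \cdot \mathbf{e}_{0'} = L^{\alpha}_{1'} \mathbf{e}_{\alpha} \cdot L^{\beta}_{0'} \mathbf{e}_{\beta} = L^{\alpha}_{1'} L^{\beta}_{0'}\, \eta_{\alpha\beta} = L^{0}_{1'} L^{0}_{0'}\, \eta_{00} + L^{1}_{1'} L^{1}_{0'}\, \eta_{11} + L^{2}_{1'} L^{2}_{0'}\, \eta_{22} + L^{3}_{1'} L^{3}_{0'}\, \eta_{33} , \tag{5.51}$$

where we’ve used that $[\eta]$ is diagonal. Thus, $\eta_{1'0'} = (\beta\gamma)(\gamma)(-1) + (\gamma)(\beta\gamma)(1) = 0$.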
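The statements above can be checked numerically. The following is a minimal sketch, assuming the standard boost along the x-axis (taken here to be the form of Eq. (3.17), which is not reproduced in this section) and the signature in which η00 = −1. The names `boost_x`, `matmul`, `transpose`, and `max_abs_diff` are hypothetical helpers introduced only for illustration.

```python
import math

# Lorentz metric with signature (-, +, +, +), matching eta_00 = -1 in the text.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def boost_x(beta):
    """Matrix L^{mu'}_nu for a boost along x with beta = v/c (assumed standard form)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def max_abs_diff(a, b):
    """Largest elementwise difference between two 4x4 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

beta = 0.6
L = boost_x(beta)        # forward transformation, Eq. (5.49)
L_inv = boost_x(-beta)   # inverse, obtained by beta -> -beta

# L_inv L should be the identity (the analog of Eq. (5.27)).
identity = matmul(L_inv, L)

# Defining requirement of a LT, Eq. (4.12): L^T eta L = eta.
lorentz_condition = matmul(transpose(L), matmul(ETA, L))

# Metric in the transformed basis, eta_{a'b'} = L^mu_{a'} L^nu_{b'} eta_{mu nu},
# built from the inverse matrix as in Eq. (5.50).
eta_primed = matmul(transpose(L_inv), matmul(ETA, L_inv))

print("max |L_inv L - 1|      :", max_abs_diff(identity, [[float(i == j) for j in range(4)] for i in range(4)]))
print("max |L^T eta L - eta|  :", max_abs_diff(lorentz_condition, ETA))
print("eta_{1'0'}             :", eta_primed[1][0])
```

Up to floating-point rounding, all three printed numbers should vanish for any |β| < 1, which is the content of Eqs. (4.12) and (5.51).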

5.2 TENSOR DENSITIES, INVARIANT VOLUME ELEMENT
We now bring onto the stage another member from our cast of mathematical players, densities (the final member of the “fab four” prototypes of physical quantities, in addition to scalars, covariant vectors, and contravariant vectors). Consider the integral of a scalar field, $\int \phi(x)\, d^n x$. Is the integral a scalar? Not in general. While $\phi'(\mathbf{r}') = \phi(\mathbf{r})$ under a coordinate transformation, we have to take into account the transformation of the volume element. Under the change of variables $x^i = x^i(x^{j'})$, the volume element of a multiple integral transforms as

$$d^n x \equiv dx^1 \cdots dx^n = J\, dx^{1'} \cdots dx^{n'} \equiv J\, d^n x' \tag{5.52}$$

(so that the integral transforms as $\int \phi(x^i)\, d^n x = \int \phi(x^i(x^{j'}))\, J\, d^n x' \equiv \int \phi'(x^{j'})\, J\, d^n x'$), where “the Jacobian” $J$ is the determinant of the Jacobian matrix $A^i_{j'}$,

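$$J = \begin{vmatrix} \dfrac{\partial x^1}{\partial x^{1'}} & \cdots & \dfrac{\partial x^1}{\partial x^{n'}} \\ \vdots & & \vdots \\ \dfrac{\partial x^n}{\partial x^{1'}} & \cdots & \dfrac{\partial x^n}{\partial x^{n'}} \end{vmatrix} \equiv \frac{\partial(x^1, \cdots, x^n)}{\partial(x^{1'}, \cdots, x^{n'})} \equiv \left| \frac{\partial x^i}{\partial x^{j'}} \right| = \left| A^i_{j'} \right| . \tag{5.53}$$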

In general we work with oriented volume elements, implying that we don’t take the absolute value of the Jacobian determinant.39
Relative tensors of weight w have components that (by deﬁnition) transform according to the
rules we have developed (such as Eq. (5.41)), with the additional requirement of the Jacobian raised to an integer power, w:40

$$T^{k_1' \cdots k_p'}_{m_1' \cdots m_q'} = J^{w}\, A^{k_1'}_{t_1} \cdots A^{k_p'}_{t_p}\, A^{s_1}_{m_1'} \cdots A^{s_q}_{m_q'}\, T^{t_1 \cdots t_p}_{s_1 \cdots s_q} . \tag{5.54}$$

Linear combinations of tensors of the same weight produce new tensors with that weight. Products of tensors of weights $w_1$ and $w_2$ produce tensors of weight $w_1 + w_2$. Contractions of relative tensors do not change the weight. A tensor equation must be among tensors of the same weight. We require that tensor equations valid in one coordinate system be valid in all others; this property would be lost in an equation among tensors of different weights. Relative tensors with $w = \pm 1$ occur frequently; we’ll call them tensor densities. Tensors that transform with $w = 0$ are called absolute tensors.
The covariant metric tensor is an absolute tensor: From Eq. (5.39),

$$g_{i'j'} = A^{l}_{i'} A^{m}_{j'}\, g_{lm} . \tag{5.55}$$

The determinant of the metric tensor, however, is a relative scalar of weight41 $w = 2$. Let $g$ denote the determinant of the covariant metric tensor (a convention we adhere to). Applying the product rule for determinants to Eq. (5.55),

$$g' = J^2 g , \tag{5.56}$$

where we have used Eq. (5.53). The sign of $g$ is an absolute quantity, invariant under coordinate transformations. Equation (5.56) then provides an alternate expression for the Jacobian, one that separates the contributions from the coordinate systems it connects: $J = \sqrt{g'/g}$. For positive definite metrics, $g > 0$; for the Lorentz metric, $g = -1$. Using Eq. (5.56), $\sqrt{|g'|} = J \sqrt{|g|}$ (for $J > 0$) and thus $\sqrt{|g|}$ is a scalar density (transforms with $w = 1$). Combining $J = \sqrt{g'/g}$ with Eq. (5.52), we have the invariant volume element

$$\sqrt{|g'|}\, dy^1 \cdots dy^n = \sqrt{|g|}\, dx^1 \cdots dx^n . \tag{5.57}$$

Note how Eq. (5.57) has a net weight of $w = 0$: $\sqrt{|g|}\, d^n x$ is an absolute scalar. (Under $x \to y$, $d^n y = J^{-1} d^n x$ from Eq. (5.52).) Thus, the integral of a scalar field $\int \phi\, d^n x$ is not invariant, but $\int \phi \sqrt{|g|}\, d^n x$ is, something we make frequent use of in GR; in SR it’s unnecessary because $\sqrt{|g|} = 1$.
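As a concrete illustration of Eqs. (5.55) to (5.57), the sketch below takes the Euclidean plane with Cartesian coordinates as the unprimed system (so g = 1) and polar coordinates (r, θ) as the primed system. The example is not from the text; the scalar field φ = x² + y², the midpoint-rule integrators, and all function names are assumptions chosen for illustration.

```python
import math

def jacobian_polar(r, theta):
    """A^i_{j'} = dx^i/dx^{j'} for x^i = (x, y) and x^{j'} = (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def metric_polar(r, theta):
    """g_{i'j'} = A^l_{i'} A^m_{j'} g_{lm} with Cartesian g_{lm} = delta_{lm}, Eq. (5.55)."""
    a = jacobian_polar(r, theta)
    return [[sum(a[l][i] * a[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

def phi(x, y):
    """Illustrative scalar field on the plane."""
    return x * x + y * y

def integral_cartesian(n=400):
    """Midpoint-rule integral of phi over the unit disk in Cartesian coordinates."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += phi(x, y) * h * h
    return total

def integral_polar(n=400, with_density=True):
    """Same integral in polar coordinates, with or without the factor sqrt|g'|."""
    dr = 1.0 / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        sqrt_g = math.sqrt(abs(det2(metric_polar(r, 0.0)))) if with_density else 1.0
        # The integrand does not depend on theta, so the theta integral gives 2*pi.
        total += phi(r * math.cos(0.0), r * math.sin(0.0)) * sqrt_g * dr * 2.0 * math.pi
    return total

r0, theta0 = 2.0, 0.7
print("J^2 g :", det2(jacobian_polar(r0, theta0)) ** 2)   # g = 1 in Cartesian coordinates
print("g'    :", det2(metric_polar(r0, theta0)))
print("Cartesian integral      :", integral_cartesian())
print("polar, with sqrt|g'|    :", integral_polar())
print("polar, without sqrt|g'| :", integral_polar(with_density=False))
```

The first two printed values should agree (Eq. (5.56)), and the polar integral should reproduce the Cartesian one only when the factor √|g′| = r is included.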
Substituting $J = \sqrt{g'/g}$ in Eq. (5.54), we find that

$$|g'|^{-w/2}\, T^{k_1' \cdots k_p'}_{m_1' \cdots m_q'} = A^{k_1'}_{t_1} \cdots A^{k_p'}_{t_p}\, A^{s_1}_{m_1'} \cdots A^{s_q}_{m_q'}\, |g|^{-w/2}\, T^{t_1 \cdots t_p}_{s_1 \cdots s_q} .$$

39By not taking the absolute value of the determinant, we allow for the possibility of transformations with J < 0. Transformations for which J < 0 allow us to further classify tensors as pseudotensors, those that transform as tensors when J > 0, but transform with an additional change of sign when J < 0.
40Beware: Relative tensors are also defined with w replaced by −w. I have adopted a definition that leads to w = +2 for the determinant of the covariant elements of the metric tensor.
41The same is true of the determinant of any covariant second-rank absolute tensor.

For a tensor $T$ of weight $w$, $|g|^{-w/2}\, T$ transforms as an absolute tensor. Conversely, an absolute tensor $U$ when multiplied by $|g|^{w/2}$ becomes a tensor of weight $w$. In particular, $\sqrt{|g|}\, U$ is a tensor density.
A notational issue arises if $J = 1$. The Jacobian of proper LTs is unity, for example (Section 4.3). In that case densities “fly under the radar”: Physical quantities that properly are tensor densities nominally transform as absolute tensors when $J = 1$. It’s traditional in the theory of tensor analysis to indicate densities with a special notation, with Gothic letters: $\mathfrak{T}$ instead of $T$. I will use this notation sparingly, but it can come in handy; without it, one has to keep calling attention to the fact that certain symbols represent tensor densities.

5.3 DERIVATIVES OF TENSORS AND THE FOUR-WAVEVECTOR
5.3.1 Derivatives of tensors
Is the derivative of a tensor a tensor? How would we answer such a question? I hope you’re saying, “How does it transform?”. Before delving into that question, we need to establish some notation.
In Section 5.1 we used the gradient of a scalar function to motivate the concept of covariant vector, $\boldsymbol{\nabla} = \mathbf{e}^i \nabla_i$. Because a geometric object is independent of basis, however, we could have declared it to be a contravariant vector, $\boldsymbol{\nabla} = \mathbf{e}_i \nabla^i$; the contravariant components of a vector can always be found from the covariant components by raising the index: $\nabla^j = g^{jk} \nabla_k$. For the gradient as a contravariant vector, the change in a scalar function $d\phi = \boldsymbol{\nabla}\phi \cdot d\mathbf{s}$ would require that we express $d\mathbf{s} = \mathbf{e}^i dx_i$ as a covariant vector, with the result that $d\phi = \nabla^i \phi\, dx_i$, in which case we would conclude that $\nabla^i = \partial/\partial x_i$ (note the placement of the indices). The quantity $\nabla^i$, being the contravariant component of a vector, must transform as such,

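$$\nabla^{i'} = \frac{\partial}{\partial x_{i'}} = \frac{\partial x_k}{\partial x_{i'}} \frac{\partial}{\partial x_k} = \frac{\partial x_k}{\partial x_{i'}} \nabla^k . \tag{5.58}$$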

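By Eq. (5.35), however, Eq. (5.58) should read $\nabla^{i'} = A^{i'}_{k} \nabla^k$. Comparing Eqs. (5.58) and (5.35), we conclude that $A^{i'}_{k} \equiv \partial x^{i'}/\partial x^{k} = \partial x_k/\partial x_{i'}$. Using Eq. (5.27), $A^{k}_{j'} \equiv \partial x^{k}/\partial x^{j'} = \partial x_{j'}/\partial x_k$. Note how the indices work here.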
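We now define the four-gradient, for which we switch to a fairly standard notation. Let $\partial_\mu$ denote the covariant four-vector of partial derivatives $\partial/\partial x^\mu$ (instead of $\nabla_\mu$ which will be used in Chapter 14 for another purpose),42

$$\partial_\mu \equiv \frac{\partial}{\partial x^\mu} = \left( \frac{\partial}{\partial x^0}, \boldsymbol{\nabla} \right) = \left( \frac{\partial}{\partial (ct)}, \boldsymbol{\nabla} \right) = (\partial_0, \boldsymbol{\nabla}) .$$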

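Likewise, let $\partial^\mu$ denote the contravariant version. However, instead of $\partial^\mu = \partial/\partial x_\mu$ (which is correct), use the fact that it can be obtained by raising the index, $\partial^\mu = g^{\mu\nu} \partial_\nu$. Using the Lorentz metric,

$$\partial^\mu = \eta^{\mu\nu} \partial_\nu = \left( -\frac{\partial}{\partial x^0}, \boldsymbol{\nabla} \right) = \left( -\frac{\partial}{\partial (ct)}, \boldsymbol{\nabla} \right) = (-\partial_0, \boldsymbol{\nabla}) = (\partial^0, \boldsymbol{\nabla}) .$$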

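The only effective difference between $\partial_\mu$ and $\partial^\mu$ is in the time component, $\partial^0 = -\partial_0$. The inner product $\partial_\mu \partial^\mu$ generates the wave-equation operator,

$$\partial^\mu \partial_\mu = \partial_\mu \partial^\mu = -\frac{\partial^2}{\partial (x^0)^2} + \nabla^2 = -\frac{1}{c^2} \frac{\partial^2}{\partial t^2} + \nabla^2 . \tag{5.59}$$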
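Equation (5.59) can be illustrated with a small finite-difference sketch. It assumes 1+1 dimensions with coordinates (x⁰, x¹) = (ct, x) and the signature in which η⁰⁰ = −1; the test functions and the helper names `second_partial` and `box` are hypothetical, chosen only for this illustration.

```python
import math

# Inverse Lorentz metric eta^{mu nu} in 1+1 dimensions, signature (-, +).
ETA_INV = [[-1.0, 0.0],
           [0.0, 1.0]]

def second_partial(f, point, mu, nu, h=1e-3):
    """Central-difference estimate of d^2 f / (dx^mu dx^nu) at point = (x0, x1)."""
    def shifted(s_mu, s_nu):
        p = list(point)
        p[mu] += s_mu * h
        p[nu] += s_nu * h
        return f(*p)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)

def box(f, point, h=1e-3):
    """eta^{mu nu} d_mu d_nu f, i.e. -d^2 f/d(x0)^2 + d^2 f/d(x1)^2 as in Eq. (5.59)."""
    return sum(ETA_INV[mu][nu] * second_partial(f, point, mu, nu, h)
               for mu in range(2) for nu in range(2))

def travelling_wave(x0, x1, k=2.0):
    """f(x - ct) with x0 = ct: a solution of the wave equation."""
    return math.sin(k * (x1 - x0))

def static_profile(x0, x1, k=2.0):
    """Time-independent profile: only the Laplacian term contributes."""
    return math.sin(k * x1)

point = (0.3, 0.7)
print("box acting on travelling wave :", box(travelling_wave, point))
print("box acting on static profile  :", box(static_profile, point))
print("expected -k^2 sin(k x)        :", -4.0 * math.sin(2.0 * 0.7))
```

The operator should annihilate any function of x − ct (to within discretization and rounding error), while for the static profile it reduces to the ordinary second spatial derivative.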

42Another prevalent notation for the partial derivative is to write $\partial\phi/\partial x^\mu$ (what we’re calling $\partial_\mu \phi$) as $\phi_{,\mu}$. This taxes everyone’s eyesight. In this notation, $\partial A_\nu/\partial x^\mu \equiv A_{\nu,\mu}$, what we’ll write as $\partial_\mu A_\nu$, which causes less eye strain.