
Albert Einstein (1879-1955)

Oxford University Press, Great Clarendon Street, Oxford OX2 6DP Oxford New York
Athens Auckland Bangkok Bogota Bombay Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul
Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris São Paulo Singapore Taipei Tokyo Toronto Warsaw
and associated companies in Berlin Ibadan
Oxford is a trade mark of Oxford University Press
Published in the United States by Oxford University Press Inc., New York
© Ray d'Inverno, 1992 Reprinted 1993, 1995 (with corrections), 1996, 1998
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of
licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms and in other countries should be sent to the Rights
Department, Oxford University Press, at the address above.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out, or otherwise circulated
without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition
including this condition being imposed on the subsequent purchaser.
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
d'Inverno, R. A.
Introducing Einstein's relativity/R. A. d'Inverno.
Includes bibliographical references and index. 1. Relativity (Physics) 2. Black holes (Astronomy)
3. Gravitation. 4. Cosmology. 5. Calculus of tensors. I. Title.
QC173.55.I58 1992 530.1'1-dc20 91-24894 ISBN 0 19 859653 7 (Hbk) ISBN 0 19 859686 3 (Pbk)
Printed in Malta by Interprint Limited

Contents

Overview

1. The organization of the book
1.1 Notes for the student
1.2 Acknowledgements
1.3 A brief survey of relativity theory
1.4 Notes for the teacher
1.5 A final note for the less able student
Exercises

Part A. Special Relativity

2. The k-calculus
2.1 Model building
2.2 Historical background
2.3 Newtonian framework
2.4 Galilean transformations
2.5 The principle of special relativity
2.6 The constancy of the velocity of light
2.7 The k-factor
2.8 Relative speed of two inertial observers
2.9 Composition law for velocities
2.10 The relativity of simultaneity
2.11 The clock paradox
2.12 The Lorentz transformations
2.13 The four-dimensional world view
Exercises

3. The key attributes of special relativity
3.1 Standard derivation of the Lorentz transformations
3.2 Mathematical properties of Lorentz transformations
3.3 Length contraction
3.4 Time dilation
3.5 Transformation of velocities
3.6 Relationship between space-time diagrams of inertial observers
3.7 Acceleration in special relativity
3.8 Uniform acceleration
3.9 The twin paradox
3.10 The Doppler effect
Exercises

4. The elements of relativistic mechanics
4.1 Newtonian theory
4.2 Isolated systems of particles in Newtonian mechanics
4.3 Relativistic mass
4.4 Relativistic energy
4.5 Photons
Exercises

Part B. The Formalism of Tensors

5. Tensor algebra
5.1 Introduction
5.2 Manifolds and coordinates
5.3 Curves and surfaces
5.4 Transformation of coordinates
5.5 Contravariant tensors
5.6 Covariant and mixed tensors
5.7 Tensor fields
5.8 Elementary operations with tensors
5.9 Index-free interpretation of contravariant vector fields
Exercises

6. Tensor calculus
6.1 Partial derivative of a tensor
6.2 The Lie derivative
6.3 The affine connection and covariant differentiation
6.4 Affine geodesics
6.5 The Riemann tensor
6.6 Geodesic coordinates
6.7 Affine flatness
6.8 The metric
6.9 Metric geodesics
6.10 The metric connection
6.11 Metric flatness
6.12 The curvature tensor
6.13 The Weyl tensor
Exercises

7. Integration, variation, and symmetry
7.1 Tensor densities
7.2 The Levi-Civita alternating symbol
7.3 The metric determinant
7.4 Integrals and Stokes' theorem
7.5 The Euler-Lagrange equations
7.6 The variational method for geodesics
7.7 Isometries
Exercises

Part C. General Relativity

8. Special relativity revisited
8.1 Minkowski space-time
8.2 The null cone
8.3 The Lorentz group
8.4 Proper time
8.5 An axiomatic formulation of special relativity
8.6 A variational principle approach to classical mechanics
8.7 A variational principle approach to relativistic mechanics
8.8 Covariant formulation of relativistic mechanics
Exercises

9. The principles of general relativity
9.1 The role of physical principles
9.2 Mach's principle
9.3 Mass in Newtonian theory
9.4 The principle of equivalence
9.5 The principle of general covariance
9.6 The principle of minimal gravitational coupling
9.7 The correspondence principle
Exercises

10. The field equations of general relativity
10.1 Non-local lift experiments
10.2 The Newtonian equation of deviation
10.3 The equation of geodesic deviation
10.4 The Newtonian correspondence
10.5 The vacuum field equations of general relativity
10.6 The story so far
10.7 The full field equations of general relativity
Exercises

11. General relativity from a variational principle
11.1 The Palatini equation
11.2 Differential constraints on the field equations
11.3 A simple example
11.4 The Einstein Lagrangian
11.5 Indirect derivation of the field equations
11.6 An equivalent Lagrangian
11.7 The Palatini approach
11.8 The full field equations
Exercises

12. The energy-momentum tensor
12.1 Preview
12.2 Incoherent matter
12.3 Perfect fluid
12.4 Maxwell's equations
12.5 Potential formulation of Maxwell's equations
12.6 The Maxwell energy-momentum tensor
12.7 Other energy-momentum tensors
12.8 The dominant energy condition
12.9 The Newtonian limit
12.10 The coupling constant
Exercises

13. The structure of the field equations
13.1 Interpretation of the field equations
13.2 Determinacy, non-linearity, and differentiability
13.3 The cosmological term
13.4 The conservation equations
13.5 The Cauchy problem
13.6 The hole problem
13.7 The equivalence problem
Exercises

14. The Schwarzschild solution
14.1 Stationary solutions
14.2 Hypersurface-orthogonal vector fields
14.3 Static solutions
14.4 Spherically symmetric solutions
14.5 The Schwarzschild solution
14.6 Properties of the Schwarzschild solution
14.7 Isotropic coordinates
Exercises

15. Experimental tests of general relativity
15.1 Introduction
15.2 Classical Kepler motion
15.3 Advance of the perihelion of Mercury
15.4 Bending of light
15.5 Gravitational red shift
15.6 Time delay of light
15.7 The Eötvös experiment
15.8 Solar oblateness
15.9 A chronology of experimental and observational events
15.10 Rubber-sheet geometry
Exercises

Part D. Black Holes

16. Non-rotating black holes
16.1 Characterization of coordinates
16.2 Singularities
16.3 Spatial and space-time diagrams
16.4 Space-time diagram in Schwarzschild coordinates
16.5 A radially infalling particle
16.6 Eddington-Finkelstein coordinates
16.7 Event horizons
16.8 Black holes
16.9 A classical argument
16.10 Tidal forces in a black hole
16.11 Observational evidence for black holes
16.12 Theoretical status of black holes
Exercises

17. Maximal extension and conformal compactification
17.1 Maximal analytic extensions
17.2 The Kruskal solution
17.3 The Einstein-Rosen bridge
17.4 Penrose diagram for Minkowski space-time
17.5 Penrose diagram for the Kruskal solution
Exercises

18. Charged black holes
18.1 The field of a charged mass point
18.2 Intrinsic and coordinate singularities
18.3 Space-time diagram of the Reissner-Nordström solution
18.4 Neutral particles in Reissner-Nordström space-time
18.5 Penrose diagrams of the maximal analytic extensions
Exercises

19. Rotating black holes
19.1 Null tetrads
19.2 The Kerr solution from a complex transformation
19.3 The three main forms of the Kerr solution
19.4 Basic properties of the Kerr solution
19.5 Singularities and horizons
19.6 The principal null congruences
19.7 Eddington-Finkelstein coordinates
19.8 The stationary limit
19.9 Maximal extension for the case a² < m²
19.10 Maximal extension for the case a² > m²
19.11 Rotating black holes
19.12 The singularity theorems
19.13 The Hawking effect
Exercises

Part E. Gravitational Waves

20. Plane gravitational waves
20.1 The linearized field equations
20.2 Gauge transformations
20.3 Linearized plane gravitational waves
20.4 Polarization states
20.5 Exact plane gravitational waves
20.6 Impulsive plane gravitational waves
20.7 Colliding impulsive plane gravitational waves
20.8 Colliding gravitational waves
20.9 Detection of gravitational waves
Exercises

21. Radiation from an isolated source
21.1 Radiating isolated sources
21.2 Characteristic hypersurfaces of Einstein's equations
21.3 Radiation coordinates
21.4 Bondi's radiating metric
21.5 The characteristic initial value problem
21.6 News and mass loss
21.7 The Petrov classification
21.8 The peeling-off theorem
21.9 The optical scalars
Exercises

Part F. Cosmology

22. Relativistic cosmology
22.1 Preview
22.2 Olbers' paradox
22.3 Newtonian cosmology
22.4 The cosmological principle
22.5 Weyl's postulate
22.6 Relativistic cosmology
22.7 Spaces of constant curvature
22.8 The geometry of 3-spaces of constant curvature
22.9 Friedmann's equation
22.10 Propagation of light
22.11 A cosmological definition of distance
22.12 Hubble's law in relativistic cosmology
Exercises

23. Cosmological models
23.1 The flat space models
23.2 Models with vanishing cosmological constant
23.3 Classification of Friedmann models
23.4 The de Sitter model
23.5 The first models
23.6 The time-scale problem
23.7 Later models
23.8 The missing matter problem
23.9 The standard models
23.10 Early epochs of the universe
23.11 Cosmological coincidences
23.12 The steady-state theory
23.13 The event horizon of the de Sitter universe
23.14 Particle and event horizons
23.15 Conformal structure of Robertson-Walker space-times
23.16 Conformal structure of de Sitter space-time
23.17 Inflation
23.18 The anthropic principle
23.19 Conclusion
Exercises

Answers to exercises

Further reading

Selected bibliography

Index

1.1 Notes for the student
There is little doubt that relativity theory captures the imagination. Nor is it surprising: the anti-intuitive properties of special relativity, the bizarre characteristics of black holes, the exciting prospect of gravitational wave detection and with it the advent of gravitational wave astronomy, and the sheer scope and nature of cosmology and its posing of ultimate questions; these and other issues combine to excite the minds of the inquisitive. Yet, if we are to look at these issues meaningfully, then we really require both physical insight and a sound mathematical foundation. The aim of this book is to help provide these.
The book grew out of some notes I wrote in the mid-1970s to accompany a UK course on general relativity. Originally, the course was a third-year undergraduate option aimed at mathematicians and physicists. It subsequently grew to include M.Sc. students and some first-year Ph.D. students. Consequently, the notes, and with them the book, are pitched principally at the undergraduate level, but they contain sufficient depth and coverage to interest many students at the first-year graduate level. To help fulfil this dual purpose, I have indicated the more advanced sections (level-two material) by a grey shaded bar alongside the appropriate section. Level-one material is essential to the understanding of the book, whereas level two is enrichment material included for the more advanced student. To help put a bit more light and shade into the book, the more important equations and results are given in tint panels.
In designing the course, I set myself two main objectives. First of all, I wanted the student to gain insight into, and confidence in handling, the basic equations of the theory. From the mathematical viewpoint, this requires good manipulative ability with tensors. Part B is devoted to developing the necessary expertise in tensors for the rest of the book. It is essentially written as a self-study unit. Students are urged to attempt all the exercises which accompany the various sections. Experience has shown that this is the only real way to be in a position to deal confidently with the ensuing material. From the physical viewpoint, I think the best route to understanding relativity theory is to follow the one taken by Einstein. Thus the second chapter of Part C is devoted to discussing the principles which guided Einstein in his search for a relativistic theory of gravitation. The field equations are approached first from a largely physical viewpoint using these principles and subsequently from a purely mathematical viewpoint using the variational principle approach. After a chapter devoted to investigating the quantity which goes on the 'right-hand side' of the equations, the structure of the equations is discussed as a prelude to solving them in the simplest case. This part of the course ends by considering the experimental status of general relativity. The course originally assumed that the student had some reasonable knowledge of special relativity. In fact, over the years, a growing number of students have taken the course without this background, and so, for completeness, I eventually added Part A. This is designed to provide an introduction to special relativity sufficient for the needs of the rest of the book.
The second main objective of the course was to develop it in such a way that it would be possible to reach three major topics of current interest, namely, black holes, gravitational waves, and cosmology. These topics form the subject matter of Parts D, E, and F respectively.
Each of the chapters is supported by exercises, numbering some 300 in total. The bulk of these are straightforward calculations used to fill in parts omitted in the text. The numbers in parentheses indicate the sections to which the exercises refer. Although the exercises in general are important in aiding understanding, their status is different from those in Part B. I see the exercises in Part B as being absolutely essential for understanding the rest of the book and they should not be omitted. The remaining exercises are desirable. The book is neither exhaustive nor complete, since there are topics in the theory which we do not cover or only meet briefly. However, it is hoped that it provides the student with a sound understanding of the basics of the theory.
A few words of advice if you find studying from a book hard going. Remember that understanding is not an all-or-nothing process. One understands things at deeper and deeper levels, as various connections are made or ideas are seen in different contexts or from a different perspective. So do not simply attempt to study a section by going through it line by line and expect it all to make sense at the first go. It is better to begin by reading through a few sections quickly - skimming - thereby trying to get a general feel for the scope, level, and coverage of the subject matter. A second reading should be more thorough, but should not stop if ideas are met which are not clear straightaway. In a final pass, the sections should be studied in depth with the exercises attempted at the end of each section. However, if you get stuck, do not stop there; press on. You will often find that the penny will drop later, sometimes on its own, or that subsequent work will produce the necessary understanding. Many exercises (and exam questions) are hierarchical in nature. They require you to establish a result at one stage which is then used at a subsequent stage. If you cannot establish the result, then do not give up. Try to use it in the subsequent section. You will often find that this will give you the necessary insight to allow you to go back and establish the earlier result. For most students, frequent study sessions of not too long a duration are more productive than occasional long drawn-out sessions. The best study environment varies greatly from one individual to another. Try experimenting with different environments to find out what is the most effective for you.
As far as initial conditions are concerned, that is, assumptions about your background, it is difficult to be precise, because you can probably get by with much less than the book might seem to indicate (see §1.5). Added to which, there is a big difference between understanding a topic fully and only having some vague acquaintance with it. On the mathematical side, you certainly need to know calculus, up to and including partial differentiation, and solution of simple ordinary differential equations. Basic algebra is assumed and some matrix theory, although you can probably take eigenvalues and diagonalisation on trust. Familiarity with vectors and some exposure to vector fields is assumed. It would also be good to have met the ideas of a vector space and bases. We use Taylor's theorem a lot, but probably knowledge of Maclaurin's will be sufficient. On the physics side, you obviously need to know Newton's laws and Newtonian gravitation. It would be helpful also to know a little about the potential formulation of gravitation (though, again, just the basics will do). The book assumes familiarity with electromagnetism (Maxwell's equations, in particular) and fluid dynamics (the Navier-Stokes equation, in particular), but neither of these is absolutely essential. It would be very helpful to have met some ideas about waves (such as the fundamental relationship c = λν) and the wave equation in particular. In cosmology, it is assumed that you know something about basic astronomy.
Having listed all these topics, then, if you are still unsure about your background, my approach would be to say: try the book and see how you get on; if it gets beyond you (and it is not a level-two section), press on for a bit and, if things do not get any better, then cut out. Hopefully, you may still have learnt a lot, and you can always come back to it when your background is stronger. In fact, it should not require much background to get started, for Part A on special relativity assumes very little. After that you hit Part B, and this is where your motivation will be seriously tested. I hope you make it through because the pickings on the other side are very rich indeed. So, finally, good luck!
1.2 Acknowledgements
Very little of this book is wholly original. When I drew up the notes, I decided from the outset that I would collect together the best approaches to the material which were known to me. Thus, to take an example right from the beginning of the book, I believe that the k-calculus provides the best introduction to special relativity, because it offers insight from the outset through the simple diagrams that can be drawn. Indeed one of the themes of this book is the provision of a large number of illustrative diagrams (over 200 in fact). The visual sense is the most immediate we possess and helps lead directly to a better comprehension. A good subtitle for the book would be 'An approach to relativity theory via space-time pictures'. The k-calculus is an approach developed by H. Bondi from the earlier ideas of A. Milne. My use of it is not surprising since I spent my years as a research student at King's College, London, in the era of Hermann Bondi and Felix Pirani, and many colleagues will detect their influences throughout the book. So the fact is that many of the approaches in the book have been borrowed from one author or another; there is little that I have written completely afresh. My intention has been to organize the material in such a way that it is the more readily accessible to the majority of students.
General relativity has the reputation of being intellectually very demanding. There is the apocryphal story, I think attributed to Sir Arthur Eddington, who, when asked whether he believed it true that only three people in the world understood general relativity, replied, 'Who is the third?'

Indeed, the intellectual leap required by Einstein to move from the special theory to the general theory is, there can be little doubt, one of the greatest in the history of human thought. So it is not surprising that the theory has the reputation it does. However, general relativity has been with us for some three-quarters of a century and our understanding is such that we can now build it up in a series of simple logical steps. This brings the theory within the grasp of most undergraduates equipped with the right background.
Quite clearly, I owe a huge debt to all the authors who have provided the source material for and inspiration of this book. However, I cannot make the proper detailed acknowledgements to all these authors, because some of them are not known even to me, and I would otherwise run the risk of leaving somebody out. Most of the sources can be found in the bibliography given at the end of the book, and some specific references can be found in the section on further reading. I sincerely hope I have not offended anyone (authors or publishers) in adopting this approach. I have written this book in the spirit that any explanation that aids understanding should ultimately reside in the pool of human knowledge and thence in the public domain. None the less, I would like to thank all those who, wittingly or unwittingly, have made this book possible. In particular, I would like to thank my old Oxford tutor, Alan Tayler, since it was largely his backing that led finally to the book being produced. In the process of converting the notes to a book, I have made a number of changes, and have added sections, further exercises, and answers. Consequently this new material, unlike the earlier, has not been vetted by the student body and it seems more than likely that it may contain errors of one sort or another. If this is the case, I hope that it does not detract too much from the book and, of course, I would be delighted to receive corrections from readers. However, I have sought some help and, in this respect, I would particularly like to thank my colleague James Vickers for a critical reading of much of the book.
Having said I do not wish to cite my sources, I now wish to make one important exception. I think it would generally be accepted in the relativity community that the most authoritative text in existence in the field is The large scale structure of space-time by Stephen Hawking and George Ellis (published by Cambridge University Press). Indeed, this has taken on something akin to the status of the Bible in the field. However, it is written at a level which is perhaps too sophisticated for most undergraduates (in parts too sophisticated for most specialists!). When I compiled the notes, I had in mind the aspiration that they might provide a small stepping stone to Hawking and Ellis. In particular, I hoped it might become the next port of call for anyone wishing to pursue their interest further. To that end, and because I cannot improve on it, I have in places included extracts from that source virtually verbatim. I felt that, if students were to consult this text, then the familiarity of some of the material might instil confidence and encourage them to delve deeper. I am hugely indebted to the authors for allowing me to borrow from their superb book.
1.3 A brief survey of relativity theory
It might be useful, before embarking on the course proper, to attempt to give some impression of the areas which come under the umbrella of relativity theory. I have attempted this schematically in Fig. 1.1. This is a rather partial and incomplete view, but should help to convey some idea of our planned route. Most of the topics mentioned are being actively researched today. Of course, they are interrelated in a much more complex way than the diagram suggests.

Fig. 1.1 An individual survey of relativity. [Diagram not reproduced. It links relativity (special and general) to quantum theory, electrodynamics, thermodynamics, kinetic theory, statistical mechanics, differential geometry, differential topology, cosmology, astronomy, and astrophysics, and lists the following areas with their topics.]
Experimental tests: orbits; gravitational waves; black holes; gravitational red shift; radar signals; light bending; gyroscopes.
Exact solutions: classification; equivalence problem; analytic extensions; singularities; cosmic strings; complex techniques; transformation groups; algebraic computing.
Formalisms: tensors; frames; forms; spinors; spin coefficients; twistors.
Gravitational radiation: waves; energy transfer; conservation laws; equations of motion; asymptotic structure of space-time; variational principles; group representations.
Gravitational collapse: black holes; singularity theorems; global techniques; cosmic censorship.
Initial value problem: Hamiltonian formulation; stability theorems; superspace; positive mass theorems; numerical relativity.
Alternative theories: torsion theories; Brans-Dicke; Hoyle-Narlikar; Whitehead; bimetric theories; etc.
Unified field theory: Kaluza-Klein theory.
Quantum gravity: canonical gravity; quantum theory on curved backgrounds; path-integral approach; supergravity; superstrings; etc.
Every few years since 1955 (in fact every three since 1959), the relativity community has come together in an international conference on general relativity and gravitation. The first such conference, held in Berne in 1955, is now referred to as GR0, with the subsequent ones numbered accordingly. The list, to date, of the GR conferences is given in Table 1.1. At these conferences, specialist discussion groups are held covering the whole area of interest. Prior to GR8, a list was published giving some detailed idea of what each discussion group would cover. This is presented below and may be used as an alternative to Fig. 1.1 to give an idea of the topics which comprise the subject.

Table 1.1
GR0   1955  Bern, Switzerland
GR1   1957  Chapel Hill, North Carolina, USA
GR2   1959  Royaumont, France
GR3   1962  Jablonna, Poland
GR4   1965  London, England
GR5   1968  Tbilisi, USSR
GR6   1971  Copenhagen, Denmark
GR7   1974  Tel-Aviv, Israel
GR8   1977  Waterloo, Canada
GR9   1980  Jena, DDR
GR10  1983  Padua, Italy
GR11  1986  Stockholm, Sweden
GR12  1989  Boulder, Colorado, USA

8 I The organization of the book
I. Relativity and astrophysics
Relativistic stars and binaries; pulsars and quasars; gravitational waves and gravitational collapse; black holes; X-ray sources and accretion models.
II. Relativity and classical physics
Equations of motion; conservation laws; kinetic theory; asymptotic flatness and the positivity of energy; Hamiltonian theory, Lagrangians, and field theory; relativistic continuum mechanics, electrodynamics, and thermodynamics.
III. Mathematical relativity
Differential geometry and fibre bundles; the topology of manifolds; applications of complex manifolds; twistors; causal and conformal structures; partial differential equations and exact solutions; stability; geometric singularities and catastrophe theory; spin and torsion: Einstein-Cartan theory.
IV. Relativity and quantum physics
Quantum theory on curved backgrounds; quantum gravity; gravitation and elementary particles; black hole evaporation; quantum cosmology.
V. Cosmology
Galaxy formation; super-clustering; cosmological consequences of spontaneous symmetry breakdown: domain structures; current estimates of cosmological parameters; radio source counts; microwave background; the isotropy of the universe; singularities.
VI. Observational and experimental relativity
Theoretical frameworks and viable theories; tests of relativity; gravitational wave detection; solar oblateness.
VII. Computers in relativity
Numerical methods; solution of field equations; symbolic manipulation systems in general relativity.
1.4 Notes for the teacher
In my twenty years as a university lecturer, I have undergone two major conversions which have profoundly affected the way I teach. These have, in their way, contributed to the existence of this book. The first conversion was to the efficacy of the printed word. I began teaching, probably like most of my colleagues, by giving lectures using the medium of chalk and talk. I soon discovered that this led to something of a conflict in that the main thing that students want from a course (apart from success in the exam) is a good set of lecture notes, whereas what I really wanted was that they should understand the course. The process of trying to give students a good set of lecture notes meant that there was, to me, a lot of time wasted in the process of note taking. I am sure colleagues know the caricature of the conventional lecture: notes are copied from the lecturer's notebook to the student's notebook without their going through the heads of either - a definition which is perhaps too close for comfort. I was converted at an early stage to the desirability of providing students with printed notes. The main advantage is that it frees up the lecture period from the time-consuming process of note copying, and the time released can be used more effectively for developing and explaining the course at a rate which the students are able to cope with. I still find that there is something rather final and definitive about the printed word. This has the effect on me of making me think more carefully about what goes into a course and how best to organize it. Thus, printed notes have the added advantage of making me put more into the preparation of a course than I would have done otherwise. It must be admitted that there are some disadvantages with using printed notes, but this is not the place to elaborate on them.
My second conversion was to the efficacy of self-study. This is a rather elaborate title for the concept of students studying and learning on their own from books or prepared materials. It is still a surprise to me just how little of this actually goes on in certain disciplines. And yet you would think that one of the main objectives of a university education is to teach students how to use books. My experience is that, in mathematics particularly, students find this hard to do. This is not so surprising since it requires high-level skills which many do not come to university equipped with. So one needs a mechanism which encourages students to use books. My first experience was in designing a Keller-type (self-paced) self-study course, where the students study from specially prepared units and are required to pass periodic tests before they move on to new topics. This eventually led me in other courses to use a coursework component counting towards a final assessment as a mechanism for helping to get students to study on their own. I have been involved in a good deal of research into this approach and the most frequent remark students make about coursework is that 'it gets me to work'. The coursework approach was particularly important in the design of the general relativity course for reasons which I believe are worth exploring.
In the mid-1970s, there were very few undergraduate courses in general relativity in existence in the UK. Those that there were usually only got as far as the Schwarzschild solution and then stopped. This was because the bulk of the course was devoted to developing the necessary expertise in tensors and there did not seem to be any short cut. This meant, from the viewpoint of both the student and the teacher, that the course ended just as things were beginning to get really interesting. It was clear to me that what students really wanted to know about most were the topics of black holes, gravitational waves, and cosmology. So, from the outset, the object was to design a course which made this possible. It was achieved by separating out what is Part B of this book as a self-study unit on tensors. The notes were distributed at the beginning of the course and the students were instructed to begin immediately working through the self-study part and attempting all the exercises. The fact that most students put in the bulk of their efforts in their other courses towards the end of these courses helped in this respect, since they were less heavily loaded at the outset. The students were offered some optional tutorials in case they got stuck (as some undertaking individual study for the first time invariably did). The inducement for doing the exercises was that they counted towards the final assessment (by some 35 per cent currently). The deadline for completing the exercises was set for about a third of the way through the course. While the students were busy in their own time working on the tensors, the lecture course began by revising the key ideas in special relativity. The special theory was then formulated in a tensorial way, making use of the new language and so providing some initial motivation. This was followed by a detailed and deliberate development of the principles underlying general relativity. Tensors are then used in earnest for the first time in deriving the equation of geodesic deviation of Chapter 10. It is by about this time that the students have gained considerable expertise in manipulating tensors and the lectures help to provide further motivation and consolidation. This device means that the Schwarzschild solution can be reached by not much more than half-way through the course. Another important advantage of printed lecture notes is that one has much more control over the speed at which the course is delivered, and one can to some extent tune the speed to fit the capabilities of the class.
The Southampton course is some thirty-six lectures in length. In the early years, when the students had a good background in special relativity, I was able to cover all three end topics. Indeed, in the first year of operation, I ended up in the final week by organizing five seminars given by outside speakers which all the students attended and which attempted to show how the work we had covered related to some topics of current research interest. In more recent years, the preparation of the students in special relativity has been more patchy, and so I have taken this more on board and have been somewhat less ambitious. This has usually meant leaving out a topic such as rotating black holes or gravitational radiation. Of course, since these are contained in the notes, the students are able to fill in these gaps if they so choose.
I have been encouraged to write up the notes in book form for a number of reasons. The course has been running for some fifteen years and several hundred students have been through it, so that I have a good deal of consumer experience to draw upon. Not only has the course proved popular, but it seems to have coped surprisingly successfully with students of a wide ability range. This may in part be because I have included many of the more detailed steps in the text itself (and where these have been left out they have often been relegated to straightforward exercises). In fact, the notes are sold to the students to cover the cost of production. It has been gratifying that each year a number of students who are not on the course, sometimes not even in a related discipline, but who have by chance come across the notes, purchase a copy for themselves. Finally, a number of my relativity colleagues both in the UK and abroad have asked for copies and used them in varying degrees in their own courses. So, despite the fact that there are a number of fine texts around in the area, I have agreed to present the notes in book form. I hope you, the teacher, find them a valuable resource in your teaching.
1.5 A final note for the less able student
I was far from being a child prodigy, and yet I learnt relativity at the age of 15! Let me elaborate. As testimony to my intellectual ordinariness, I had left my junior school at the age of 11 having achieved the unremarkable feat of coming 22nd in the class in my final set of examinations. Yet I really did know some relativity four years on - and I don't just mean the special theory, but the general theory (up to and including the Schwarzschild solution and the classical tests). I remember detecting a hint of disbelief when I recounted this to the same Alan Tayler, who was later to become my tutor, in an Oxford entrance interview. He followed up by asking me to define a tensor, and when I rattled off a definition, he seemed somewhat surprised. Indeed, as it turned out, we did not cover very much more than I first knew in the Oxford third year specialist course on general relativity. So how was this possible?
I, too, had heard the story about how only a few people in the world really understood relativity, and it had aroused my curiosity. I went to the local library and, as luck would have it, I pulled out a book entitled Einstein's Theory of Relativity by Lillian Lieber (1949). This is a very bizarre book in appearance. The book is not set out in the usual way but rather as though it were concrete poetry. Moreover, it is interspersed with surrealist drawings by Hugh Lieber involving the symbols from the text (Fig. 1.2). I must confess that at first sight the book looks rather cranky; but it is not. I worked through it, filling in all the details missing from the calculations as I went. What was amazing was that the book did not make too many assumptions about what mathematics the reader needed to know. For example, I had not then met partial differentiation in my school mathematics, and yet there was sufficient coverage in the book for me to cope. It felt almost as if the book had been written just for me. The combination of the intrinsic interest of the material and the success I had in doing the intervening calculations provided sufficient motivation for me to see the enterprise through to the end.
Perhaps, if you consider yourself a less able student, you are a bit daunted by the intellectual challenge that lies ahead. I will not deny that the book includes some very demanding ideas (indeed, I do not understand every facet of all of these ideas myself). But I hope the two facts that the arguments are broken down into small steps and that the calculations are doable will help you on your way. Even if you decide to cut out after Part C, you will have come a long way. Take heart from my little story - I am certain that if you persevere you will consider it worth the effort in the end.

Fig. 1.2. 'The product of two tensors is equal to another' according to Hugh Lieber.

Exercises

1.1 (§1.3) Go to the library and see if you can locate current copies of the following journals:
(i) General Relativity and Gravitation; (ii) Classical and Quantum Gravity; (iii) Journal of Mathematical Physics; (iv) Physical Review D.
See if you can relate any of the articles in them to any of the topics contained in Fig. 1.1.

1.2 Look back through copies of Scientific American for future reference, to see what articles there have been in recent years on relativity theory, especially black holes, gravitational waves, and cosmology.
1.3 Read a biography of Einstein (see Part A of the Selected Bibliography at the end of this book).

2.1 Model building
Before we start, we should be clear what we are about. The essential activity of mathematical physics, or theoretical physics, is that of modelling or model building. The activity consists of constructing a mathematical model which we hope in some way captures the essentials of the phenomena we are investigating. I think we should never fail to be surprised that this turns out to be such a productive activity. After all, the first thing you notice about the world we inhabit is that it is an extremely complex place. The fact that so much of this rich structure can be captured by what are, in essence, a set of simple formulae is to me quite astonishing. Just think how simple Newton's universal law of gravitation is; and yet it encompasses a whole spectrum of phenomena from a falling apple to the shape of a globular cluster of stars. As Einstein said, 'The most incomprehensible thing about the world is that it is comprehensible.'
The very success of the activity of modelling has, throughout the history of science, turned out to be counterproductive. Time and again, the successful model has been confused with the ultimate reality, and this in turn has stultified progress. Newtonian theory provides an outstanding example of this. So successful had it been in explaining a wide range of phenomena, that, after more than two centuries of success, the laws had taken on an absolute character. Thus it was that, when at the end of the nineteenth century it was becoming increasingly clear that something was fundamentally wrong with the current theories, there was considerable reluctance to make any fundamental changes to them. Instead, a number of artificial assumptions were made in an attempt to explain the unexpected phenomena. It eventually required the genius of Einstein to overthrow the prejudices of centuries and demonstrate in a number of simple thought experiments that some of the most cherished assumptions of Newtonian theory were untenable. This he did in a number of brilliant papers written in 1905 proposing a theory which has become known today as the special theory of relativity.
We should perhaps be discouraged from using words like right or wrong when discussing a physical theory. Remembering that the essential activity is model building, a model should then rather be described as good or bad, depending on how well it describes the phenomena it encompasses. Thus, Newtonian theory is an excellent theory for describing a whole range of phenomena. For example, if one is concerned with describing the motion of a car, then the Newtonian framework is likely to be the appropriate one.

16 I The k-calculus

However, it fails to be appropriate if we are interested in very high speeds (comparable with the speed of light) or very intense gravitational fields (such as in the nucleus of a galaxy). To put it another way: together with every theory, there should go its range of validity. Thus, to be more precise, we should say that Newtonian theory is an excellent theory within its range of validity. From this point of view, developing our models of the physical world does not involve us in constantly throwing theories out, since they are perceived to be wrong, or unlearning them, but rather it consists more of a process of refinement in order to increase their range of validity. So the moral of this section is that, for all their remarkable success, one must not confuse theoretical models with the ultimate reality they seek to describe.
2.2 Historical background
In 1865, James Clerk Maxwell put forward the theory of electromagnetism. One of the triumphs of the theory was the discovery that light waves are electromagnetic in character. Since all other known wave phenomena required a material medium in which the oscillations were carried, it was postulated that there existed an all-pervading medium, called the 'luminiferous ether', which carried the oscillations of electromagnetism. It was then anticipated that experiments with light would allow the absolute motion of a body through the ether to be detected. Such hopes were upset by the null result of the famous Michelson-Morley experiment (1881) which attempted to measure the velocity of the earth relative to the ether and found it to be undetectably small. In order to explain this null result, two ad hoc hypotheses were put forward by Lorentz, Fitzgerald, and Poincaré (1895), namely, the contraction of rigid bodies and the slowing down of clocks when moving through the ether. These effects were contained in some simple formulae called the 'Lorentz transformations'. They would affect every apparatus designed to measure the motion relative to the ether so as to neutralize exactly all expected results. Although this theory was consistent with the observations, it had the philosophical defect that its fundamental assumptions were unverifiable.
In fact, the essence of the special theory of relativity is contained in the Lorentz transformations. However, Einstein was able to derive them from two postulates, the first being called the 'principle of special relativity' - a principle which Poincaré had also suggested independently in 1904 - and the second concerning the constancy of the velocity of light. In so doing, he was forced to re-evaluate our ideas of space and time and he demonstrated through a number of simple thought experiments that the source of the limitations of the classical theory lay in the concept of simultaneity. Thus, although in a sense Einstein found nothing new in that he rederived the Lorentz transformations, his derivation was physically meaningful and in the process revealed the inadequacy of some of the fundamental assumptions of classical thought. Herein lies his chief contribution.
2.3 Newtonian framework
We start by outlining the Newtonian framework. An event intuitively means something happening in a fairly limited region of space and for a short duration in time. Mathematically, we idealize this concept to become a point in space and an instant in time. Everything that happens in the universe is an event or collection of events. Consider a train travelling from one station P to another R, leaving at 10 a.m. and arriving at 11 a.m. We can illustrate this in the following way: for simplicity, let us assume that the motion takes place in a straight line (say along the x-axis (Fig. 2.1)); then we can represent the motion by a space-time diagram (Fig. 2.2) in which we plot the position of some fixed point on the train, which we represent by a pointer, against time. The curve in the diagram is called the history or world-line of the pointer. Notice that at Q the train was stationary for a period.

Fig. 2.1 Train travels in straight line.

We shall call individuals equipped with a clock and a measuring rod or ruler observers. Had we looked out of the train window on our journey at a clock in a passing station then we would have expected it to agree with our watch. One of the central assumptions of the Newtonian framework is that two observers will, once they have synchronized their clocks, always agree about the time of an event, irrespective of their relative motion. This implies that for all observers time is an absolute concept. In particular, all observers can agree on an origin of time. In order to fix an event in space, an observer may choose a convenient origin in space together with a set of three Cartesian coordinate axes. We shall refer to an observer's clock, ruler, and coordinate axes as a frame of reference (Fig. 2.3). Then an observer is able to coordinatize events, that is, determine the time t an event occurs and its relative position (x, y, z).
We have set the stage with space and time; they provide the backcloth, but what is the story about? The stuff which provides the events of the universe is matter. For the moment, we shall idealize lumps of matter into objects called bodies. If the body has no physical extent, we refer to it as a particle or point mass. Thus, the role of observers in Newtonian theory is to chart the history of bodies.

Fig. 2.2 Space-time diagram of pointer. [Time t, from 10 to 11, plotted against position x, with the stations P, Q, and R marked.]

Fig. 2.3 Observer's frame of reference.

2.4 Galilean transformations
Now, relativity theory is concerned with the way different observers see the same phenomena. One can ask: are the laws of physics the same for all observers or are there preferred states of motion, preferred reference systems, and so on? Newtonian theory postulates the existence of preferred frames of reference. This is contained essentially in the first law, which we shall call N1 and state in the following form:

N1: Every body continues in its state of rest or of uniform motion in a straight line unless it is compelled to change that state by forces acting on it.

Thus, there exists a privileged set of bodies, namely those not acted on by forces. The frame of reference of a co-moving observer is called an inertial frame (Fig. 2.4). It follows that, once we have found one inertial frame, then all others are at rest or travel with constant velocity relative to it (for otherwise Newton's first law would no longer be true). The transformation which connects one inertial frame with another is called a Galilean transformation. To fix ideas, let us consider two inertial frames called S and S' in standard configuration, that is, with axes parallel and S' moving along S's positive x-axis with constant velocity v (Fig. 2.5). We also assume that the observers synchronize their clocks so that the origins of time are set when the origins of the frames coincide. It follows from Fig. 2.5 that the Galilean transformation connecting the two frames is given by

$$ x' = x - vt, \qquad y' = y, \qquad z' = z, \qquad t' = t. \qquad (2.1) $$

Fig. 2.4 Two observed bodies and their inertial frames.

Fig. 2.5 Two frames in standard configuration at time t.

The last equation provides a manifestation of the assumption of absolute time in Newtonian theory. Now, Newton's laws hold only in inertial frames. From a mathematical viewpoint, this means that Newton's laws must be invariant under a Galilean transformation.
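The invariance claim can be illustrated numerically. The following is a minimal sketch, assuming frames in standard configuration and a particle with constant acceleration along the x-axis; the function names `galilean` and `second_difference` and all numerical values are illustrative choices, not taken from the text. It applies the Galilean transformation to three events on the particle's world-line and compares the acceleration estimated in each frame.

```python
# Sketch: acceleration is unchanged by a Galilean transformation.
# All names and numbers here are illustrative assumptions.

def galilean(event, v):
    """Map an event (t, x, y, z) in S to (t', x', y', z') in S',
    where S' moves along the positive x-axis of S with speed v."""
    t, x, y, z = event
    return (t, x - v * t, y, z)

def second_difference(xs, dt):
    """Estimate acceleration from three equally spaced positions."""
    return (xs[0] - 2.0 * xs[1] + xs[2]) / dt**2

v = 3.0    # speed of S' relative to S (arbitrary units)
a = 2.0    # constant acceleration of the particle as measured in S
dt = 0.5   # time step between the sampled events
times = [0.0, dt, 2.0 * dt]

# World-line of the particle in S: x = a t^2 / 2, y = z = 0.
events_S = [(t, 0.5 * a * t**2, 0.0, 0.0) for t in times]
events_S_prime = [galilean(e, v) for e in events_S]

acc_S = second_difference([e[1] for e in events_S], dt)
acc_S_prime = second_difference([e[1] for e in events_S_prime], dt)

# Both observers agree on the acceleration, and hence on force = mass * acceleration.
print(acc_S, acc_S_prime)
```

Because the term vt is linear in t, it drops out of any second derivative, which is why Newton's second law takes the same form in S and S'.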
2.5 The principle of special relativity
We begin by stating the relativity principle which underpins Newtonian theory.

Restricted principle of special relativity: All inertial observers are equivalent as far as dynamical experiments are concerned.

This means that, if one inertial observer carries out some dynamical experiments and discovers a physical law, then any other inertial observer performing the same experiments must discover the same law. Put another way, these laws must be invariant under a Galilean transformation. That is to say, if the law involves the coordinates x, y, z, t of an inertial observer S, then the law relative to another observer S' will be the same with x, y, z, t replaced by x', y', z', t', respectively. Many fundamental principles of physics are statements of impossibility, and the above statement of the relativity principle is equivalent to the statement of the impossibility of deciding, by performing dynamical experiments, whether a body is absolutely at rest or in uniform motion. In Newtonian theory, we cannot determine the absolute position in space of an event, but only its position relative to some other event. In exactly the same way, uniform velocity has only a relative significance; we can only talk about the velocity of a body relative to some other. Thus, both position and velocity are relative concepts.
Einstein realized that the principle as stated above is empty because there is no such thing as a purely dynamical experiment. Even on a very elementary level, any dynamical experiment we think of performing involves observation, i.e. looking, and looking is a part of optics, not dynamics. In fact, the more one analyses any one experiment, the more it becomes apparent that practically all the branches of physics are involved in the experiment. Thus, Einstein took the logical step of removing the restriction of dynamics in the principle and took the following as his first postulate.

Postulate I. Principle of special relativity: All inertial observers are equivalent.

Hence we see that this principle is in no way a contradiction of Newtonian thought, but rather constitutes its logical completion.
2.6 The constancy of the velocity of light
We previously defined an observer in Newtonian theory as someone equipped with a clock and ruler with which to map the events of the universe. However, the approach of the k-calculus is to dispense with the rigid ruler and use radar methods for measuring distances. (What is rigidity anyway? If a moving frame appears non-rigid in another frame, which, if either, is the rigid one?) Thus, an observer measures the distance of an object by sending out a light signal which is reflected off the object and received back by the observer. The distance is then simply defined as half the time difference between emission and reception. Note that by this method distances are measured in intervals of time, like the light year or the light second ($\sim 10^{10}$ cm).
Why use light? The reason is that we know that the velocity of light is independent of many things. Observations from double stars tell us that the velocity of light in vacuo is independent of the motion of the sources as well as independent of colour, intensity, etc. For, if we suppose that the velocity of light were dependent on the motion of the source relative to an observer (so that if the source was coming towards us the light would be travelling faster and vice versa) then we would no longer see double stars moving in Keplerian orbits (circles, ellipses) about each other: their orbits would appear distorted; yet no such distortion is observed. There are many experiments which confirm this assumption. However, these were not known to Einstein in 1905, who adopted the second postulate purely on heuristic grounds. We state the second postulate in the following form.

Postulate II. Constancy of velocity of light: The velocity of light is the same in all inertial systems.

Or stated another way: there is no overtaking of light by light in empty space. The speed of light is conventionally denoted by c and has the exact numerical value $2.997\,924\,58 \times 10^{8}$ m s$^{-1}$, but in this chapter we shall adopt relativistic units in which c is taken to be unity (i.e. c = 1). Note, in passing, that another reason for using radar methods is that other methods are totally impracticable for large distances. In fact, these days, distances from the Earth to the Moon and Venus can be measured very accurately by bouncing radar signals off them.
2.7 The k-factor
For simplicity, we shall begin by working in two dimensions, one spatial dimension and one time dimension. Thus, we consider a system of observers distributed along a straight line, each equipped with a clock and a flashlight. We plot the events they map in a two-dimensional space-time diagram. Let us assume we have two observers, A at rest and B moving away from A with uniform (constant) speed. Then, in a space-time diagram, the world-line of A will be represented by a vertical straight line and the world-line of B by a straight line at an angle to A's, as shown in Fig. 2.6.
A light signal in the diagram will be denoted by a straight line making an angle ¼π with the axes, because we are taking the speed of light to be 1. Now, suppose A sends out a series of flashes of light to B, where the interval between the flashes is denoted by T according to A's clock. Then it is plausible to assume that the intervals of reception by B's clock are proportional to T, say kT. Moreover, the quantity k, which we call the k-factor, is clearly a characteristic of the motion of B relative to A. We now assume that if A and B are inertial observers, then k is a constant in time. (In fact, there is a hidden assumption here, since how do we know that B's world-line will be a straight line as indicated in the diagram? Strictly speaking, we are assuming that there is a linear relationship between the space and time coordinates of A and B.) Then the principle of special relativity requires that the relationship between A and B must be reciprocal, so that, if B emits two signals with a time lapse of T according to B's clock, then A receives them after a time lapse of kT according to A's clock (Fig. 2.7). Note that, from B's point of view, A is moving away from B with the same relative speed.

Fig. 2.6 The world-lines of observers A and B.

Fig. 2.7 The reciprocal nature of the k-factor.

Observer A assigns coordinates to an event P by bouncing a light signal off it. Thus, if a light signal is sent out at a time $t = t_1$ and received back at a time $t = t_2$ (Fig. 2.8), then, according to our radar definition of distances, the coordinates of P are given by

$$ (t, x) = \left( \tfrac{1}{2}(t_1 + t_2),\ \tfrac{1}{2}(t_2 - t_1) \right), \qquad (2.2) $$

remembering that the velocity of light is 1. We now use the k-factor to develop the k-calculus.

Fig. 2.8 Coordinatizing events.
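The radar definition (2.2) translates directly into a small function. This is a minimal sketch in units with c = 1; the name `radar_coordinates` and the sample times are illustrative assumptions, not notation from the text.

```python
# Sketch of the radar definition (2.2), in units where c = 1.
# The function name and sample values are illustrative assumptions.

def radar_coordinates(t1, t2):
    """Return the coordinates (t, x) that an observer assigns to a
    reflection event, given emission time t1 and reception time t2
    as read on the observer's own clock."""
    if t2 < t1:
        raise ValueError("reception cannot precede emission")
    t = 0.5 * (t1 + t2)   # the event is assigned the midpoint time
    x = 0.5 * (t2 - t1)   # distance is half the round-trip time
    return (t, x)

# A signal emitted at t1 = 2 and received back at t2 = 6 locates the
# event at time 4 and distance 2 (both measured in units of time).
print(radar_coordinates(2.0, 6.0))
```

Only a clock and a light source are needed: no ruler enters the definition, which is the point of the radar method.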

2.8 Relative speed of two inertial observers

Consider the configuration shown in Fig. 2.9 and assume that A and B synchronize their clocks to zero when they cross at event O. After a time T, A sends a signal to B, which is reflected back at event P. From B's point of view, a light signal is sent to A after a time lapse of kT by B's clock. It follows from the definition of the k-factor that A receives this signal after a time lapse of k(kT). Then, using (2.2) with $t_1 = T$ and $t_2 = k^2 T$, we find the coordinates of P according to A's clock are given by

$$ (t, x) = \left( \tfrac{1}{2}(k^2 + 1)T,\ \tfrac{1}{2}(k^2 - 1)T \right). \qquad (2.3) $$

Thus, as T varies, this gives the coordinates of the events which constitute B's world-line. Hence, if v is the velocity of B relative to A, we find

$$ v = \frac{x}{t} = \frac{k^2 - 1}{k^2 + 1}. $$

Solving for k in terms of v, and noting from the diagram that k must be greater than 1 if the observers are separating, we find

$$ k = \left( \frac{1 + v}{1 - v} \right)^{1/2}. \qquad (2.4) $$

We shall see in the next chapter that this is the usual relativistic formula for the radial Doppler shift. If B is moving away from A then k > 1 which represents a 'red' shift, whereas if B is approaching A then k < 1 which represents a 'blue' shift. Note that the transformation $v \to -v$ corresponds to $k \to 1/k$. Moreover,

$$ v = 0 \iff k = 1, $$

as we should expect for observers relatively at rest: once they have synchronized their clocks, the synchronization remains (Fig. 2.10).

Fig. 2.9 Relating the k-factor to the relative speed of separation.

Fig. 2.10 Observers relatively at rest (k = 1).
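The relation between the k-factor and the relative speed can be checked numerically. The sketch below assumes units with c = 1 and uses the relations of this section: the radar coordinates of the reflection event with emission time T and reception time k²T, the ratio v = x/t = (k² - 1)/(k² + 1), and its inverse k = ((1 + v)/(1 - v))^(1/2). The function names and the sample value v = 0.6 are illustrative assumptions.

```python
import math

# Sketch: k-factor versus relative speed, in units where c = 1.
# Function names and sample values are illustrative assumptions.

def k_from_v(v):
    """k-factor for relative velocity v, k = sqrt((1 + v) / (1 - v))."""
    if not -1.0 < v < 1.0:
        raise ValueError("v must satisfy -1 < v < 1")
    return math.sqrt((1.0 + v) / (1.0 - v))

def v_from_k(k):
    """Inverse relation v = (k^2 - 1) / (k^2 + 1), valid for k > 0."""
    if k <= 0.0:
        raise ValueError("k must be positive")
    return (k * k - 1.0) / (k * k + 1.0)

def reflection_event(k, T):
    """Radar coordinates (t, x) of the reflection event P, using
    emission time t1 = T and reception time t2 = k^2 T."""
    t1, t2 = T, k * k * T
    return (0.5 * (t1 + t2), 0.5 * (t2 - t1))

v = 0.6
k = k_from_v(v)                    # k > 1: the observers separate (red shift)
t, x = reflection_event(k, T=1.0)

print(k, x / t)                    # x / t recovers the relative speed v
print(k_from_v(-v) * k)            # v -> -v gives k -> 1/k, so the product is 1
print(k_from_v(0.0))               # observers relatively at rest: k = 1
```

For v = 0.6 the k-factor is 2 up to rounding, so flashes sent one second apart by A arrive two seconds apart by B's clock.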


2.9 Composition law for velocities

Consider the situation in Fig. 2.11, where $k_{AB}$ denotes the k-factor between A and B, with $k_{BC}$ and $k_{AC}$ defined similarly. It follows immediately that

$$ k_{AC} = k_{AB} k_{BC}. \qquad (2.5) $$

Using (2.4), we find the corresponding composition law for velocities:

$$ v_{AC} = \frac{v_{AB} + v_{BC}}{1 + v_{AB} v_{BC}}. \qquad (2.6) $$

Fig. 2.11 Composition of k-factors.
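The intermediate algebra is short enough to write out. As a sketch, square the product rule for k-factors, $k_{AC} = k_{AB} k_{BC}$, insert $k^2 = (1 + v)/(1 - v)$ for each pair of observers, and then use $v = (k^2 - 1)/(k^2 + 1)$ from Section 2.8 to recover the velocity (units with c = 1 throughout):

```latex
\begin{align*}
k_{AC}^2 &= k_{AB}^2 \, k_{BC}^2
          = \frac{(1 + v_{AB})(1 + v_{BC})}{(1 - v_{AB})(1 - v_{BC})}, \\[4pt]
v_{AC} &= \frac{k_{AC}^2 - 1}{k_{AC}^2 + 1}
        = \frac{(1 + v_{AB})(1 + v_{BC}) - (1 - v_{AB})(1 - v_{BC})}
               {(1 + v_{AB})(1 + v_{BC}) + (1 - v_{AB})(1 - v_{BC})} \\[4pt]
       &= \frac{2 v_{AB} + 2 v_{BC}}{2 + 2 v_{AB} v_{BC}}
        = \frac{v_{AB} + v_{BC}}{1 + v_{AB} v_{BC}}.
\end{align*}
```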

This formula has been confirmed by Fizeau's experiment in which the speed of light in a moving fluid is measured and turns out not to be simply the sum of the speed of light and that of the moving fluid but rather obeys the more complicated law (2.6) to higher order. Note that, if $v_{AB}$ and $v_{BC}$ are small compared with the speed of light, i.e.

$$ v_{AB} \ll 1, \qquad v_{BC} \ll 1, $$

then we obtain the classical Newtonian formula

$$ v_{AC} = v_{AB} + v_{BC} $$

to lowest order. Although the composition law for velocities is not simple, the one for k-factors is, and in special relativity it is the k-factors which are the directly measurable quantities. Note also that, formally, if we substitute $v_{BC} = 1$, representing the speed of a light signal relative to B, in (2.6), then the resulting speed of the light signal relative to A is

$$ v_{AC} = \frac{v_{AB} + 1}{1 + v_{AB}} = 1, $$

in agreement with the constancy of the velocity of light postulate.
From the composition law, we can show that, if we add two speeds less than the speed of light, then we again obtain a speed less than the speed of light. This does not mean, as is sometimes stated, that nothing can move faster than the speed of light in special relativity, but rather that the speed of light is a border which cannot be crossed or even reached. More precisely, special relativity allows for the existence of three classes of particles.
1. Particles that move slower than the speed of light are called subluminal particles. They include material particles and elementary particles such as electrons and neutrons.
2. Particles that move with the speed of light are called luminal particles. They include the carrier of the electromagnetic field interaction, the photon, and theoretically the carrier of the gravitational field interaction, called the graviton. These are both particles with zero rest mass (see §4.5). It was thought that neutrinos also had zero rest mass, but more recent evidence suggests they may have a tiny mass.
3. Particles that move faster than the speed of light are called superluminal particles or tachyons. There was some excitement in the 1970s surrounding the possible existence of tachyons, but all attempts to detect them to date have failed. This suggests two likely possibilities: either tachyons do

not exist or, if they do, they do not interact with ordinary matter. This would seem to be just as well, for otherwise they could be used to signal back into the past and so would appear to violate causality. For example, it would be possible theoretically to construct a device which sent out a tachyon at a given time and which would trigger a mechanism in the device to blow it up before the tachyon was sent out!
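Returning to the composition law of this section, the following is a minimal numerical sketch (not part of the original text) that composes two speeds by multiplying k-factors, as in (2.5), and compares the result with the velocity formula (2.6). Relativistic units ($c = 1$) are assumed, and the function names are illustrative only.

```python
import math


def k_from_v(v):
    """k-factor for relative speed v (units with c = 1), equation (2.4)."""
    return math.sqrt((1 + v) / (1 - v))


def v_from_k(k):
    """Relative speed recovered from a k-factor: v = (k^2 - 1)/(k^2 + 1)."""
    return (k * k - 1) / (k * k + 1)


def compose_via_k(v_ab, v_bc):
    """Compose two speeds by multiplying their k-factors, equation (2.5)."""
    return v_from_k(k_from_v(v_ab) * k_from_v(v_bc))


def compose_direct(v_ab, v_bc):
    """Composition law for velocities, equation (2.6)."""
    return (v_ab + v_bc) / (1 + v_ab * v_bc)


if __name__ == "__main__":
    for v_ab, v_bc in [(0.5, 0.5), (0.9, 0.9), (0.001, 0.002)]:
        print(v_ab, v_bc, compose_via_k(v_ab, v_bc), compose_direct(v_ab, v_bc))
```

For small speeds the two results approach the Newtonian sum, while two subluminal speeds always compose to a subluminal speed.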

2.10 The relativity of simultaneity

Consider two events P and Q which take place at the same time, according to A, and also at points equal but opposite distances away. A could establish this by sending out and receiving the light rays as shown in Fig. 2.12 (continuous lines). Suppose now that another inertial observer B meets A at the time these events occur according to A. B also sends out light rays RQU and SPV to illuminate the events, as shown (dashed lines). By symmetry
RU = SV and so these events are equidistant according to B. However, the
signal RQ was sent before the signal SP and so B concludes that the event Q took place well before P. Hence, events that A judges to be simultaneous, B judges not to be simultaneous. Similarly, A maintains that P, O, and Q occurred simultaneously, whereas B maintains that they occurred in the order Q, then O, and then P.
This relativity of simultaneity lies at the very heart of special relativity and resolves many of the paradoxes that the classical theory gives rise to, such as the Michelson-Morley experiment. Einstein realized the crucial role that simultaneity plays in the theory and gave the following simple thought experiment to illustrate its dependence on the observer. Imagine a train travelling along a straight track with velocity v relative to an observer A on the bank of the track. In the train, B is an observer situated at the centre of one of the carriages. We assume that there are two electrical devices on the track which are the length of the carriage apart and equidistant from A. When the carriage containing B goes over these devices, they fire and activate two light sources situated at each end of the carriage (Fig. 2.13). From the configuration, it is clear that A will judge that the two events, when the light sources first switch on, occur simultaneously. However, B is travelling towards the light emanating from light source 2 and away from the light emanating from light source 1. Since the speed of light is a constant, B will see the light from source 2 before seeing the light from source 1, and so will conclude that one light source comes on before the other.

Fig. 2.12 Relativity of simultaneity.

Fig. 2.13 Light signals emanating from the two sources.

Fig. 2.14 Event relationships in special relativity.

We can now classify event relationships in space and time in the following manner. Consider any event O on A's world-line and the four regions, as shown in Fig. 2.14, given by the light rays ending and commencing at O. Then the event E is on the light ray leaving O and so occurs after O. Any other inertial observer agrees on this; that is, no observer sees E illuminated before A sends out the signal from O. The fact that E is illuminated (because A originally sends out a signal at O) subsequent to O is a manifestation of causality: the event O ultimately causes the event E. Similarly, the event F can be reached by an inertial observer travelling from O with finite speed. Again, all inertial observers agree that F occurs after O. Hence all the events in this region are called the absolute future of O. In the same way, any event occurring in the region vertically below takes place in O's absolute past. However, the temporal relationship to O of events in the other two regions, called elsewhere (or sometimes the relative past and relative future), will not be something all observers will agree upon. For example, one class of observers will say that G took place after O, another class before, and a third class will say they took place simultaneously. The light rays entering and leaving O constitute what is called the light cone or null cone at O (the fact that it is a cone will become clearer later when we take all the spatial dimensions into account). Note that the world-line of any inertial observer or material particle passing through O must lie within the light cone at O.

Fig. 2.15 The clock paradox.

Fig. 2.16 Spatial analogue of clock paradox.

2.11 The clock paradox
Consider three inertial observers as shown in Fig. 2.15, with the relative velocity $v_{AC} = -v_{AB}$. Assume that A and B synchronize their clocks at O and that C's clock is synchronized with B's at P. Let B and C meet after a time T according to B, whereupon they emit a light signal to A. According to the k-calculus, A receives the signal at R after a time kT since meeting B. Remembering that C is moving with the opposite velocity to B (so that $k \to k^{-1}$), then A will meet C at Q after a subsequent time lapse of $k^{-1}T$. The total time that A records between events O and Q is therefore $(k + k^{-1})T$. For $k \neq 1$, this is greater than the combined time intervals 2T recorded between events OP and PQ by B and C. But should not the time lapse between the two events agree? This is one form of the so-called clock paradox.
However, it is not really a paradox, but rather what it shows is that in relativity time, like distance, is a route-dependent quantity. The point is that the 2T measurement is made by two inertial observers, not one. Some people have tried to reverse the argument by setting B and C to rest, but this is not possible since they are in relative motion to each other. Another argument says that, when B and C meet, C should take B's clock and use it. But, in this case, the clock would have to be accelerated when being transferred to C and so it is no longer inertial. Again, some opponents of special relativity (e.g. H. Dingle) have argued that the short period of acceleration should not make such a difference, but this is analogous to saying that a journey between two points which is straight nearly all the time is about the same length as one which is wholly straight (as shown), which is absurd (Fig. 2.16). The moral is that in special relativity time is a more difficult concept to work with than the absolute time of Newton.
A more subtle point revolves around the implicit assumption that the clocks of A and B are 'good' clocks, i.e. that the seconds of A's clock are the same as those of B's clock. One suggestion is that A has two clocks and adjusts the tick rate until they are the same and then sends one of them to B at a very slow rate of acceleration. The assumption here is that the very slow rate of acceleration will not affect the tick rate of the clock. However, what is there to say that a clock may not be able to somehow add up the small bits of acceleration and so affect its performance? A more satisfactory approach would be for A and B to use identically constructed atomic clocks (which is after all what physicists use today to measure time). The objection then arises that their construction is based on ideas in quantum physics which is, a priori, outside the scope of special relativity. However, this is a manifestation of a point raised earlier, that virtually any real experiment which one can imagine carrying out involves more than one branch of physics. The whole structure is intertwined in a way which cannot easily be separated.
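As an illustration of the timings in the clock paradox, the sketch below (an addition, assuming relativistic units with $c = 1$ and using illustrative function names) evaluates the total time $(k + k^{-1})T$ recorded by A against the combined time $2T$ recorded by B and C. It also checks the identity $k + k^{-1} = 2/(1 - v^2)^{1/2}$, which follows from the relation (2.4) between $k$ and $v$.

```python
import math


def k_from_v(v):
    """k-factor for relative speed v (units with c = 1), equation (2.4)."""
    return math.sqrt((1 + v) / (1 - v))


def clock_paradox_times(v, T):
    """Return (time recorded by A, combined time recorded by B and C).

    B recedes from A at speed v for a time T on B's clock, then C returns
    towards A at the same speed for a time T on C's clock.
    """
    k = k_from_v(v)
    time_a = (k + 1 / k) * T  # A's clock between events O and Q
    time_bc = 2 * T           # B's clock over OP plus C's clock over PQ
    return time_a, time_bc


if __name__ == "__main__":
    time_a, time_bc = clock_paradox_times(0.6, 1.0)
    print(time_a, time_bc)
```

With $v = 0.6$ the k-factor is 2, so A records $2.5\,T$ while B and C together record $2T$.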

2.12 The Lorentz transformations

We have derived a number of important results in special relativity, which only involve one spatial dimension, by use of the k-calculus. Other results follow essentially from the transformations connecting inertial observers, the famous Lorentz transformations. We shall finally use the k-calculus to derive these transformations.
Let event P have coordinates $(t, x)$ relative to A and $(t', x')$ relative to B (Fig. 2.17). Observer A must send out a light ray at time $t - x$ to illuminate P at time $t$ and also receive the reflected ray back at $t + x$ (check this from (2.2)). The world-line of A is given by $x = 0$, and the origin of A's time coordinate $t$ is arbitrary. Similar remarks apply to B, where we use primed quantities for B's coordinates $(t', x')$. Assuming A and B synchronize their clocks when they meet, then the k-calculus immediately gives

\[ t' - x' = k(t - x), \qquad t + x = k(t' + x'). \tag{2.7} \]

After some rearrangement, and using equation (2.4), we obtain the so-called special Lorentz transformation

\[ t' = \frac{t - vx}{(1 - v^2)^{1/2}}, \qquad x' = \frac{x - vt}{(1 - v^2)^{1/2}}. \tag{2.8} \]

This is also referred to as a boost in the x-direction with speed v, since it takes one from A's coordinates to B's coordinates and B is moving away from A with speed v. Some simple algebra reveals the result (exercise)

\[ t'^2 - x'^2 = t^2 - x^2, \]

showing that the quantity $t^2 - x^2$ is an invariant under a special Lorentz transformation or boost.

Fig. 2.17 Coordinatization of events by inertial observers.
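The rearrangement from the k-calculus relations to the boost can be checked numerically. The following sketch is an addition, assuming relativistic units ($c = 1$), with illustrative function names. It solves the two relations $t' - x' = k(t - x)$ and $t + x = k(t' + x')$ for $(t', x')$ and compares the result with the standard boost formulae.

```python
import math


def lorentz_from_k(t, x, v):
    """Solve the two k-calculus relations for (t', x')."""
    k = math.sqrt((1 + v) / (1 - v))
    diff = k * (t - x)    # t' - x'
    total = (t + x) / k   # t' + x'
    return 0.5 * (total + diff), 0.5 * (total - diff)


def lorentz_direct(t, x, v):
    """Special Lorentz transformation (boost) in units with c = 1."""
    factor = 1 / math.sqrt(1 - v * v)
    return factor * (t - v * x), factor * (x - v * t)


if __name__ == "__main__":
    t, x, v = 1.0, 0.0, 0.6
    print(lorentz_from_k(t, x, v))
    print(lorentz_direct(t, x, v))
```

Both routes give the same primed coordinates, and $t'^2 - x'^2$ equals $t^2 - x^2$ for any event tried.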
To obtain the corresponding formulae in the case of three spatial dimensions we consider Fig. 2.5 with two inertial frames in standard configuration. Now, since by assumption the xz-plane ($y = 0$) of A must coincide with the x'z'-plane ($y' = 0$) of B, then the $y$ and $y'$ coordinates must be connected by a transformation of the form

\[ y = ny', \tag{2.9} \]

because

\[ y = 0 \iff y' = 0. \]

Fig. 2.18 The x- and y-axes reversed in Fig. 2.5.

Fig. 2.19 Figure 2.18 from B's point of view.

We now make the assumption that space is isotropic, that is, it is the same in any direction. We then reverse the direction of the x- and y-axes of A and B and consider the motion from B's point of view (see Figs. 2.18 and 2.19). Clearly, from B's point of view, the roles of A and B have interchanged. Hence, by symmetry, we must have

\[ y' = ny. \tag{2.10} \]

Combining (2.9) and (2.10), we find

\[ n^2 = 1 \;\Rightarrow\; n = \pm 1. \]

The negative sign can be dismissed since, as $v \to 0$, we must have $y' \to y$, in which case $n = 1$. Hence, we find $y' = y$, and a similar argument for $z$ produces $z' = z$.

2.13 The four-dimensional world view
We now compare the special Lorentz transformation of the last section in relativistic units with the Galilean transformation connecting inertial observers in standard configuration (see Table 2.1). In a Galilean transformation, the absolute time coordinate remains invariant. However, in a Lorentz transformation, the time and space coordinates get mixed up (note the symmetry in x and t). In the words of Minkowski, 'Henceforth space by itself, and time by itself are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.'

Table 2.1

Galilean transformation    Lorentz transformation
$t' = t$                   $t' = (t - vx)/(1 - v^2)^{1/2}$
$x' = x - vt$              $x' = (x - vt)/(1 - v^2)^{1/2}$
$y' = y$                   $y' = y$
$z' = z$                   $z' = z$

In the old Newtonian picture, time is split off from three-dimensional Euclidean space. Moreover, since we have an absolute concept of simultaneity, we can consider two simultaneous events with coordinates $(t, x_1, y_1, z_1)$ and $(t, x_2, y_2, z_2)$, and then the square of the Euclidean distance between them,
\[ \sigma^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2, \tag{2.11} \]
is invariant under a Galilean transformation. In the new special relativity picture, time and space merge together into a four-dimensional continuum called space-time. In this picture, the square of the interval between any two events $(t_1, x_1, y_1, z_1)$ and $(t_2, x_2, y_2, z_2)$ is defined by
\[ s^2 = (t_1 - t_2)^2 - (x_1 - x_2)^2 - (y_1 - y_2)^2 - (z_1 - z_2)^2, \tag{2.12} \]
and it is this quantity which is invariant under a Lorentz transformation. Note that we always denote the square of the interval by $s^2$, but the quantity $s$ is only defined if the right-hand side of (2.12) is non-negative. If we consider two events separated infinitesimally, $(t, x, y, z)$ and $(t + dt, x + dx, y + dy, z + dz)$, then this equation becomes
\[ ds^2 = dt^2 - dx^2 - dy^2 - dz^2, \tag{2.13} \]
where all the infinitesimals are squared in (2.13). A four-dimensional space-time continuum in which the above form is invariant is called Minkowski space-time and it provides the background geometry for special relativity.
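The invariance of the interval can be illustrated numerically. The sketch below is an addition, assuming relativistic units ($c = 1$) and frames in standard configuration, with illustrative function names. It boosts two events and compares the squared interval before and after. It also shows that two events which are simultaneous in one frame are not simultaneous in the boosted frame.

```python
import math


def boost(event, v):
    """Special Lorentz transformation of an event (t, x, y, z), with c = 1."""
    t, x, y, z = event
    factor = 1 / math.sqrt(1 - v * v)
    return (factor * (t - v * x), factor * (x - v * t), y, z)


def interval_squared(e1, e2):
    """Square of the interval between two events (time part minus space part)."""
    dt, dx, dy, dz = (a - b for a, b in zip(e1, e2))
    return dt * dt - dx * dx - dy * dy - dz * dz


if __name__ == "__main__":
    e1, e2 = (0.0, 0.0, 0.0, 0.0), (5.0, 3.0, 1.0, 2.0)
    print(interval_squared(e1, e2))
    print(interval_squared(boost(e1, 0.6), boost(e2, 0.6)))
```

Here the squared interval is 11 in both frames, although the individual time and space separations differ.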
So far, we have only met a special Lorentz transformation which connects two inertial frames in standard configuration. A full Lorentz transformation connects two frames in general position (Fig. 2.20). It can be shown that a full Lorentz transformation can be decomposed into an ordinary spatial rotation, followed by a boost, followed by a further ordinary rotation. Physically, the first rotation lines up the x-axis of S with the velocity v of S'. Then a boost in this direction with speed v transforms S to a frame which is at rest relative to S'. A final rotation lines up the coordinate frame with that of S'. The spatial rotations introduce no new physics. The only new physical information arises from the boost and that is why we can, without loss of generality, restrict our attention to a special Lorentz transformation.

Fig. 2.20 Two frames in general position.

Exercises
2.1 (§2.4) Write down the Galilean transformation from observer S to observer S', where S' has velocity $v_1$ relative to S. Find the transformation from S' to S and state in simple terms how the transformations are related. Write down the Galilean transformation from S' to S'', where S'' has velocity $v_2$ relative to S'. Find the transformation from S to S''. Prove that the Galilean transformations form an Abelian (commutative) group.
2.2 (§2.7) Draw the four fundamental k-factor diagrams (see Fig. 2.7) for the cases of two inertial observers A and B approaching and receding with uniform velocity v:
(i) as seen by A; (ii) as seen by B.
2.3 (§2.8) Show that $v \to -v$ corresponds to $k \to k^{-1}$. If $k > 1$ corresponds physically to a red shift of recession, what does $k < 1$ correspond to?
2.4 (§2.9) Show that (2.6) follows from (2.5). Use the composition law for velocities to prove that if $0 < v_{AB} < 1$ and $0 < v_{BC} < 1$, then $0 < v_{AC} < 1$.
2.5 (§2.9) Establish the fact that if $v_{AB}$ and $v_{BC}$ are small compared with the velocity of light, then the composition law for velocities reduces to the standard additive law of Newtonian theory.
2.6 (§2.10) In the event diagram of Fig. 2.14, find a geometrical construction for the world-line of an inertial observer passing through O who considers event G as occurring simultaneously with O. Hence describe the world-lines of inertial observers passing through O who consider G as occurring before or after O.
2.7 (§2.11) Draw Fig. 2.15 from B's point of view. Coordinatize the events O, R, and Q with respect to B and find the times between O and R, and R and Q, and compare them with A's timings.
2.8 (§2.12) Deduce (2.8) from (2.7). Use (2.7) to deduce directly that
\[ t'^2 - x'^2 = t^2 - x^2. \]
Confirm the equality under the transformation formula (2.8).
2.9 (§2.12) In S, two events occur at the origin and a distance X along the x-axis simultaneously at $t = 0$. The time interval between the events in S' is T. Show that the spatial distance between the events in S' is $(X^2 + T^2)^{1/2}$ and determine the relative velocity v of the frames in terms of X and T.
2.10 (§2.13) Show that the interval between two events $(t_1, x_1, y_1, z_1)$ and $(t_2, x_2, y_2, z_2)$ defined by
\[ s^2 = (t_1 - t_2)^2 - (x_1 - x_2)^2 - (y_1 - y_2)^2 - (z_1 - z_2)^2 \]
is invariant under a special Lorentz transformation. Deduce the Minkowski line element (2.13) for infinitesimally separated events. What does $s^2$ become if $t_1 = t_2$, and how is it related to the Euclidean distance $\sigma$ between the two events?

3.1 Standard derivation of the Lorentz transformations
We start this chapter by deriving again the Lorentz transformations, but this time by using a more standard approach. We shall work in non-relativistic units in which the speed of light is denoted by c. We restrict attention to two inertial observers S and S' in standard configuration. As before, we shall show that the Lorentz transformations follow from the two postulates, namely, the principle of special relativity and the constancy of the velocity of light.
Now, by the first postulate, if the observer S sees a free particle, that is, a particle with no forces acting on it, travelling in a straight line with constant velocity, then so will S'. Thus, using vector notation, it follows that under a transformation connecting the two frames
\[ \mathbf{r} = \mathbf{r}_0 + \mathbf{u}t \iff \mathbf{r}' = \mathbf{r}'_0 + \mathbf{u}'t'. \]
Since straight lines get mapped into straight lines, it suggests that the transformation between the frames is linear and so we shall assume that the transformation from S to S' can be written in matrix form

\[ \begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} = L \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}, \tag{3.1} \]

where $L$ is a 4 × 4 matrix of quantities which can only depend on the speed of separation v. Using exactly the same argument as we used at the end of §2.12, the assumption that space is isotropic leads to the transformations of y and z being

\[ y' = y \quad\text{and}\quad z' = z. \tag{3.2} \]

We next use the second postulate. Let us assume that, when the origins of S and S' are coincident, they zero their clocks, i.e. $t = t' = 0$, and emit a flash of light. Then, according to S, the light flash moves out radially from the origin with speed c. The wave front of light will constitute a sphere. If we define the quantity $I$ by

\[ I(t, x, y, z) = x^2 + y^2 + z^2 - c^2 t^2, \]

then the events comprising this sphere must satisfy $I = 0$. By the second

I 30 The key attributes of special relativity

postulate, S' must also see the light move out in a spherical wave front with speed c and satisfy
\[ I' = x'^2 + y'^2 + z'^2 - c^2 t'^2 = 0. \]

Thus it follows that, under a transformation connecting S and S',

\[ I = 0 \iff I' = 0, \tag{3.3} \]

and since the transformation is linear by (3.1), we may conclude

\[ I = nI', \tag{3.4} \]

where n is a quantity which can only depend on v. Using the same argument as we did in §2.12, we can reverse the role of S and S' and so by the relativity principle we must also have

\[ I' = nI. \tag{3.5} \]

Combining the last two equations we find
\[ n^2 = 1 \;\Rightarrow\; n = \pm 1. \]

In the limit as $v \to 0$, the two frames coincide and $I' \to I$, from which we conclude that we must take $n = 1$. Substituting $n = 1$ in (3.4), this becomes
\[ x^2 + y^2 + z^2 - c^2 t^2 = x'^2 + y'^2 + z'^2 - c^2 t'^2, \]
and, using (3.2), this reduces to

\[ x^2 - c^2 t^2 = x'^2 - c^2 t'^2. \tag{3.6} \]

Fig. 3.1 A rotation in (x, T)-space.

We next introduce imaginary time coordinates $T$ and $T'$ defined by

\[ T = ict, \tag{3.7} \]

\[ T' = ict', \tag{3.8} \]

in which case equation (3.6) becomes
\[ x^2 + T^2 = x'^2 + T'^2. \]

In a two-dimensional (x, T)-space, the quantity $x^2 + T^2$ represents the square of the distance of a point P from the origin. This will only remain invariant under a rotation in (x, T)-space (Fig. 3.1). If we denote the angle of rotation by $\theta$, then a rotation is given by

\[ x' = x\cos\theta + T\sin\theta, \tag{3.9} \]

\[ T' = -x\sin\theta + T\cos\theta. \tag{3.10} \]

Now, the origin of S' ($x' = 0$), as seen by S, moves along the positive x-axis of S with speed v and so must satisfy $x = vt$. Thus, we require
\[ x' = 0 \iff x = vt \iff x = vT/ic, \]

using (3.7). Substituting this into (3.9) gives

\[ \tan\theta = iv/c, \tag{3.11} \]

from which we see that the angle $\theta$ is imaginary as well. We can obtain an expression for $\cos\theta$, using

\[ \cos\theta = \frac{1}{\sec\theta} = \frac{1}{(1 + \tan^2\theta)^{1/2}} = \frac{1}{(1 - v^2/c^2)^{1/2}}. \]

If we use the conventional symbol $\beta$ for this last expression, i.e.

\[ \beta \equiv \frac{1}{(1 - v^2/c^2)^{1/2}}, \]

where the symbol $\equiv$ here means 'is defined to be', then (3.9) gives

\[ x' = \cos\theta\,(x + T\tan\theta) = \beta[x + ict(iv/c)] = \beta(x - vt). \]

Similarly, (3.10) gives
\[ T' = ict' = \cos\theta\,(-x\tan\theta + T) = \beta[-x(iv/c) + ict], \]

from which we find

\[ t' = \beta(t - vx/c^2). \]

Thus, collecting the results together, we have rederived the special Lorentz transformation or boost (in non-relativistic units):

\[ t' = \beta(t - vx/c^2), \qquad x' = \beta(x - vt), \qquad y' = y, \qquad z' = z. \tag{3.12} \]

If we put $c = 1$, this takes the same form as we found in §2.13.
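The equivalence between a boost and a rotation through an imaginary angle can be checked with complex arithmetic. The sketch below is an addition with illustrative function names. It builds $T = ict$, rotates $(x, T)$ through the angle $\theta$ with $\tan\theta = iv/c$ using Python's standard `cmath` module, and compares the outcome with the boost formulae $t' = \beta(t - vx/c^2)$, $x' = \beta(x - vt)$.

```python
import cmath
import math


def boost_via_imaginary_rotation(t, x, v, c=1.0):
    """Rotate (x, T) with T = i c t through the angle theta, tan(theta) = i v / c."""
    theta = cmath.atan(1j * v / c)
    big_t = 1j * c * t
    x_new = x * cmath.cos(theta) + big_t * cmath.sin(theta)
    t_new = (-x * cmath.sin(theta) + big_t * cmath.cos(theta)) / (1j * c)
    # The imaginary parts vanish up to rounding error, so keep the real parts.
    return t_new.real, x_new.real


def boost_direct(t, x, v, c=1.0):
    """Boost written with beta = (1 - v^2/c^2)^(-1/2)."""
    beta = 1 / math.sqrt(1 - (v / c) ** 2)
    return beta * (t - v * x / c ** 2), beta * (x - v * t)


if __name__ == "__main__":
    print(boost_via_imaginary_rotation(2.0, 1.0, 0.6))
    print(boost_direct(2.0, 1.0, 0.6))
```

The cosine of the imaginary angle is real and equals $\beta$, which is why the rotated coordinates come out real.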
3.2 Mathematical properties of Lorentz transformations
From the results of the last section, we find the following properties of a special Lorentz transformation or boost.
1. Using the imaginary time coordinate T, a boost along the x-axis of speed v is equivalent to an imaginary rotation in (x, T)-space through an angle $\theta$ given by $\tan\theta = iv/c$.
2. If we consider v to be very small compared with c, for which we use the notation $v \ll c$, and neglect terms of order $v^2/c^2$, then we regain a Galilean transformation
\[ t' = t, \qquad x' = x - vt, \qquad y' = y, \qquad z' = z. \]
We can obtain this result formally by taking the limit $c \to \infty$ in (3.12).
3. If we solve (3.12) for the unprimed coordinates, we get
\[ t = \beta(t' + vx'/c^2), \qquad x = \beta(x' + vt'), \qquad y = y', \qquad z = z'. \]
This can be obtained formally from (3.12) by interchanging primed and unprimed coordinates and replacing v by $-v$. This we should expect from physical reasons, since, if S' moves along the positive x-axis of S with speed v, then S moves along the negative x'-axis of S' with speed v, or, equivalently, S moves along the positive x'-axis of S' with speed $-v$.
4. Special Lorentz transformations form a group:
(a) The identity element is given by v = 0.
(b) The inverse element is given by -v (as in 3 above).

(c) The product of two boosts with velocities $v$ and $v'$ is another boost with velocity $v''$. Since $v$ and $v'$ correspond to rotations in (x, T)-space of $\theta$ and $\theta'$, where

\[ \tan\theta = iv/c \quad\text{and}\quad \tan\theta' = iv'/c, \]

then their resultant is a rotation of $\theta'' = \theta + \theta'$, where

\[ iv''/c = \tan\theta'' = \tan(\theta + \theta') = \frac{\tan\theta + \tan\theta'}{1 - \tan\theta\,\tan\theta'}, \]

from which we find

\[ v'' = \frac{v + v'}{1 + vv'/c^2}. \]

Compare this with equation (2.6) in relativistic units.
(d) Associativity is left as an exercise.
5. The square of the infinitesimal interval between infinitesimally separated events (see (2.13)),

\[ ds^2 = c^2\,dt^2 - dx^2 - dy^2 - dz^2, \]

is invariant under a Lorentz transformation.
We now turn to the key physical attributes of Lorentz transformations. Throughout the remaining sections, we shall assume that S and S' are in standard configuration with non-zero relative velocity v.
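The group properties listed in item 4 above can be illustrated with 2 × 2 matrices acting on the column $(ct, x)$. The sketch below is an addition with illustrative function names. It checks that the product of two boosts is the boost with the composed velocity $v'' = (v + v')/(1 + vv'/c^2)$, and that the boost with velocity $-v$ is the inverse.

```python
import math


def boost_matrix(v, c=1.0):
    """2 x 2 boost matrix acting on the column (ct, x)."""
    beta = 1 / math.sqrt(1 - (v / c) ** 2)
    return [[beta, -beta * v / c], [-beta * v / c, beta]]


def matmul2(p, q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add_velocities(v, v_prime, c=1.0):
    """Resultant velocity of two successive boosts along the same axis."""
    return (v + v_prime) / (1 + v * v_prime / c ** 2)


def close(p, q, tol=1e-12):
    """Entry-by-entry comparison of two 2 x 2 matrices."""
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))


if __name__ == "__main__":
    product = matmul2(boost_matrix(0.3), boost_matrix(0.5))
    print(product)
    print(boost_matrix(add_velocities(0.5, 0.3)))
```

Because boosts along a common axis commute, the order of the two factors does not matter in this one-dimensional setting.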
3.3 Length contraction
Consider a rod fixed in S' with endpoints $x'_A$ and $x'_B$, as shown in Fig. 3.2. In S, the ends have coordinates $x_A$ and $x_B$ (which, of course, vary in time) given by the Lorentz transformations
\[ x'_A = \beta(x_A - vt_A), \qquad x'_B = \beta(x_B - vt_B). \tag{3.14} \]
In order to measure the length of the rod according to S, we have to find the x-coordinates of the end points at the same time according to S. If we denote the rest length, namely, the length in S', by
\[ l_0 = x'_B - x'_A, \]
and the length in S at time $t = t_A = t_B$ by
\[ l = x_B - x_A, \]
then, subtracting the formulae in (3.14), we find the result
\[ l = \beta^{-1} l_0. \]

Fig. 3.2 A rod moving with velocity v relative to S.

Since
\[ |v| < c \iff \beta > 1 \iff l < l_0, \]
the result shows that the length of a body in the direction of its motion with uniform velocity v is reduced by a factor $(1 - v^2/c^2)^{1/2}$. This phenomenon is called length contraction. Clearly, the body will have greatest length in its rest frame, in which case it is called the rest length or proper length. Note also that the length approaches zero as the velocity approaches the velocity of light.
In an attempt to explain the null result of the Michelson-Morley experiment, Fitzgerald had suggested the apparent shortening of a body in motion relative to the ether. This is rather different from the length contraction of special relativity, which is not to be regarded as illusory but is a very real effect. It is closely connected with the relativity of simultaneity and indeed can be deduced as a direct consequence of it. Unlike the Fitzgerald contraction, the effect is relative, i.e. a rod fixed in S appears contracted in S'. Note also that there are no contraction effects in directions transverse to the direction of motion.

3.4 Time dilation

Let a clock fixed at $x' = x'_A$ in S' record two successive events separated by an interval of time $T_0$ (Fig. 3.3). The successive events in S' are $(x'_A, t'_1)$ and $(x'_A, t'_1 + T_0)$, say. Using the Lorentz transformation, we have in S

\[ t_1 = \beta(t'_1 + vx'_A/c^2), \qquad t_2 = \beta(t'_1 + T_0 + vx'_A/c^2). \]

On subtracting, we find the time interval in S defined by

\[ T = t_2 - t_1 \]

is given by

\[ T = \beta T_0. \]

Fig. 3.3 Successive events recorded by a clock fixed in S'.

Thus, moving clocks go slow by a factor $(1 - v^2/c^2)^{-1/2}$. This phenomenon is called time dilation. The fastest rate of a clock is in its rest frame and is called its proper rate. Again, the effect has a reciprocal nature.
Let us now consider an accelerated clock. We define an ideal clock to be one unaffected by its acceleration; in other words, its instantaneous rate depends only on its instantaneous speed v, in accordance with the above phenomenon of time dilation. This is often referred to as the clock hypothesis. The time recorded by an ideal clock is called the proper time $\tau$ (Fig. 3.4). Thus, the proper time of an ideal clock between $t_0$ and $t_1$ is given by

\[ \tau = \int_{t_0}^{t_1} \left(1 - \frac{v^2}{c^2}\right)^{1/2} dt. \]

Fig. 3.4 Proper time recorded by an accelerated clock.

The general question of what constitutes a clock or an ideal clock is a non-trivial one. However, an experiment has been performed where an atomic clock was flown round the world and then compared with an identical clock left back on the ground. The travelling clock was found on return to be running slow by precisely the amount predicted by time dilation. Another instance occurs in the study of cosmic rays. Certain mesons reaching us from the top of the Earth's atmosphere are so short-lived that, even had they been travelling at the speed of light, their travel time in the absence of time dilation would exceed their known proper lifetimes by factors of the order of 10. However, these particles are in fact detected at the Earth's surface because their very high velocities keep them young, as it were. Of course, whether or not time dilation affects the human clock, that is, biological ageing, is still an open question. But the fact that we are ultimately made up of atoms, which do appear to suffer time dilation, would suggest that there is no reason by which we should be an exception.
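The proper time of an ideal clock can be estimated numerically once its speed is known as a function of the coordinate time of S. The sketch below is an addition: the function name, the midpoint rule, and the sample speed profiles are all assumptions chosen for illustration. It integrates $(1 - v^2/c^2)^{1/2}$ over the coordinate time interval.

```python
import math


def proper_time(speed, t0, t1, c=1.0, steps=10000):
    """Midpoint-rule estimate of the proper time of an ideal clock.

    `speed` is a function giving the clock's instantaneous speed at
    coordinate time t in the inertial frame S.
    """
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        v = speed(t)
        total += math.sqrt(1 - (v / c) ** 2) * h
    return total


if __name__ == "__main__":
    # Uniform motion at v = 0.6: proper time is coordinate time divided by 1.25.
    print(proper_time(lambda t: 0.6, 0.0, 10.0))
    # A hypothetical clock whose speed grows linearly from 0 to 0.5.
    print(proper_time(lambda t: 0.05 * t, 0.0, 10.0))
```

For uniform motion the estimate reduces to the time dilation result, and for any moving clock the proper time is less than the coordinate time elapsed in S.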
3.5 Transformation of velocities
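Consider a particle in motion (Fig. 3.5) with its Cartesian components of velocity being

\[ (u_1, u_2, u_3) = \left(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right) \quad\text{in } S \]

and

\[ (u'_1, u'_2, u'_3) = \left(\frac{dx'}{dt'}, \frac{dy'}{dt'}, \frac{dz'}{dt'}\right) \quad\text{in } S'. \]

Taking differentials of a Lorentz transformation

\[ t' = \beta(t - vx/c^2), \qquad x' = \beta(x - vt), \qquad y' = y, \qquad z' = z, \]

we get

\[ dt' = \beta(dt - v\,dx/c^2), \qquad dx' = \beta(dx - v\,dt), \qquad dy' = dy, \qquad dz' = dz, \]

and hence

\[ u'_1 = \frac{dx'}{dt'} = \frac{\beta(dx - v\,dt)}{\beta(dt - v\,dx/c^2)} = \frac{\dfrac{dx}{dt} - v}{1 - \dfrac{1}{c^2}\left(v\dfrac{dx}{dt}\right)} = \frac{u_1 - v}{1 - u_1 v/c^2}, \tag{3.18} \]

\[ u'_2 = \frac{dy'}{dt'} = \frac{dy}{\beta(dt - v\,dx/c^2)} = \frac{\dfrac{dy}{dt}}{\beta\left[1 - \dfrac{1}{c^2}\left(v\dfrac{dx}{dt}\right)\right]} = \frac{u_2}{\beta(1 - u_1 v/c^2)}, \]

\[ u'_3 = \frac{dz'}{dt'} = \frac{dz}{\beta(dt - v\,dx/c^2)} = \frac{\dfrac{dz}{dt}}{\beta\left[1 - \dfrac{1}{c^2}\left(v\dfrac{dx}{dt}\right)\right]} = \frac{u_3}{\beta(1 - u_1 v/c^2)}. \]

Fig. 3.5 Particle in motion relative to S and S'.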

Notice that the velocity components u2 and u3 transverse to the direction of motion of the frame S' are affected by the transformation. This is due to the time difference in the two frames. To obtain the inverse transformations, simply interchange primes and unprimes and replace v by - v.
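The transformation of velocities can be sketched in a few lines. The code below is an addition with illustrative names. It applies the formulae for $(u'_1, u'_2, u'_3)$ derived in this section and shows two features: a light signal keeps speed $c$ in S', and a purely transverse velocity in S acquires both a longitudinal component and a reduced transverse component in S'.

```python
import math


def transform_velocity(u, v, c=1.0):
    """Velocity components in S' of a particle with velocity u = (u1, u2, u3) in S.

    S' moves along the positive x-axis of S with speed v (standard configuration).
    """
    u1, u2, u3 = u
    beta = 1 / math.sqrt(1 - (v / c) ** 2)  # the book's beta, (1 - v^2/c^2)^(-1/2)
    denom = 1 - u1 * v / c ** 2
    return ((u1 - v) / denom, u2 / (beta * denom), u3 / (beta * denom))


def speed(u):
    """Magnitude of a velocity vector."""
    return math.sqrt(sum(component ** 2 for component in u))


if __name__ == "__main__":
    print(transform_velocity((1.0, 0.0, 0.0), 0.6))  # light along x
    print(transform_velocity((0.0, 1.0, 0.0), 0.6))  # light along y
    print(transform_velocity((0.0, 0.5, 0.0), 0.6))  # slow transverse particle
```

Interchanging the roles of the frames and replacing `v` by `-v` gives the inverse transformation, as noted above.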

3.6 Relationship between space-time diagrams of inertial observers

We now show how to relate the space-time diagrams of S and S' (see Fig. 3.6). We start by taking $ct$ and $x$ as the coordinate axes of S, so that a light ray has slope $\tfrac{1}{4}\pi$ (as in relativistic units). Then, to draw the $ct'$- and $x'$-axes of S', we note from the Lorentz transformation equations (3.12)

\[ ct' = 0 \iff ct = (v/c)x, \]

that is, the $x'$-axis, $ct' = 0$, is the straight line $ct = (v/c)x$ with slope $v/c < 1$. Similarly,

\[ x' = 0 \iff ct = (c/v)x, \]

that is, the $ct'$-axis, $x' = 0$, is the straight line $ct = (c/v)x$ with slope $c/v > 1$. The lines parallel to $O(ct')$ are the world-lines of fixed points in S'. The lines parallel to $Ox'$ are the lines connecting points at a fixed time according to S' and are called lines of simultaneity in S'. The coordinates of a general event P are $(ct, x) = (OR, OQ)$ relative to S and $(ct', x') = (OV, OU)$ relative to S'. However, the diagram is somewhat misleading because the length scales along the axes are not the same. To relate them, we draw in the hyperbolae
\[ x^2 - c^2t^2 = x'^2 - c^2t'^2 = \pm 1, \]

as shown in Fig. 3.7. Then, if we first consider the positive sign, setting $ct' = 0$, we get $x' = \pm 1$. It follows that OA is a unit distance on $Ox'$. Similarly, taking the negative sign and setting $x' = 0$ we get $ct' = \pm 1$ and so OB is the unit measure on $Oct'$. Then the coordinates of P in the frame S' are given by

\[ (ct', x') = \left(\frac{OV}{OB}, \frac{OU}{OA}\right). \]

Fig. 3.6 The world-lines in S of the fixed points and simultaneity lines of S'.

Note the following properties from Fig. 3.7.

1. A boost can be thought of as a rotation through an imaginary angle in the (x, T)-plane, where T is imaginary time. We have seen that this is equivalent, in the real (x, ct)-plane, to a skewing of the coordinate axes inwards through the same angle. (This was not appreciated by some past opponents of special relativity, who gave some erroneous counterarguments based on the mistaken idea that a boost could be represented by a real rotation in the (x, ct)-plane.)
2. The hyperbolae are the same for all frames and so we can draw in any number of frames in the same diagram and use the hyperbolae to calibrate them.

Fig. 3.7 Length scales in S and S'.

3. The length contraction and time dilation effects can be read off directly from the diagram. For example, the world-lines of the endpoints of a unit rod OA in S', namely $x' = 0$ and $x' = 1$, cut $Ox$ in less than unit distance. Similarly, world-lines $x = 0$ and $x = 1$ in S cut $Ox'$ inside OA, from which the reciprocal nature of length contraction is evident.
4. Event A has coordinates $(ct', x') = (0, 1)$ relative to S', and hence by a Lorentz transformation coordinates $(ct, x) = (\beta v/c, \beta)$ relative to S. The quantity OA defined by
\[ OA = (c^2t^2 + x^2)^{1/2} = \beta(1 + v^2/c^2)^{1/2} \]
is a measure of the calibration factor

3.7 Acceleration in special relativity
We start with the inverse transformation of (3.18), namely,
+ U'1 V
U1 = 1 + u'1v/c2'

from which we find the differential

du - du'1 - ( u'1 + v ).!:..du' 1 - 1 + u'1v/c 2 (l + u'1v/c2)2 c2 1

1

du'1

= 132 (1 + u~v/c2)2.

Similarly, from the inverse Lorentz transformation
$$t = \beta(t' + x'v/c^2),$$
we find the differential
$$dt = \beta(dt' + dx'\,v/c^2) = \beta(1 + u_1'v/c^2)\,dt'.$$

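Combining these results, we find that the x-component of the acceleration transforms according to
$$\frac{du_1}{dt} = \frac{1}{\beta^3(1 + u_1'v/c^2)^3}\,\frac{du_1'}{dt'}. \qquad (3.21)$$
Similarly, we find
$$\frac{du_2}{dt} = \frac{1}{\beta^2(1 + u_1'v/c^2)^2}\,\frac{du_2'}{dt'} - \frac{v u_2'}{c^2\beta^2(1 + u_1'v/c^2)^3}\,\frac{du_1'}{dt'}, \qquad (3.22)$$
$$\frac{du_3}{dt} = \frac{1}{\beta^2(1 + u_1'v/c^2)^2}\,\frac{du_3'}{dt'} - \frac{v u_3'}{c^2\beta^2(1 + u_1'v/c^2)^3}\,\frac{du_1'}{dt'}. \qquad (3.23)$$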
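The formulae (3.21) and (3.22) can be checked numerically. The following is only a sketch, not part of the original text: it assumes units with c = 1, uses the standard inverse velocity transformation $u_2 = u_2'/[\beta(1 + u_1'v/c^2)]$ for the transverse component, and compares the formulae with a finite-difference estimate of $du/dt$. All function names are hypothetical.

```python
import math

C = 1.0  # units with c = 1 (an assumption made for this illustration)


def beta(v):
    """The book's beta = (1 - v^2/c^2)^(-1/2)."""
    return 1.0 / math.sqrt(1.0 - v * v / C**2)


def accel_x(u1p, a1p, v):
    """du_1/dt from equation (3.21); a1p stands for du'_1/dt'."""
    return a1p / (beta(v) ** 3 * (1.0 + u1p * v / C**2) ** 3)


def accel_y(u1p, u2p, a1p, a2p, v):
    """du_2/dt from equation (3.22); a2p stands for du'_2/dt'."""
    k = 1.0 + u1p * v / C**2
    b2 = beta(v) ** 2
    return a2p / (b2 * k**2) - v * u2p * a1p / (C**2 * b2 * k**3)


def numeric_accel(u1p, u2p, a1p, a2p, v, h=1e-5):
    """Finite-difference estimate of (du_1/dt, du_2/dt) in S."""

    def state(tp):
        # Motion in S' near t' = 0, through x' = 0, with constant acceleration.
        xp = u1p * tp + 0.5 * a1p * tp**2
        w1, w2 = u1p + a1p * tp, u2p + a2p * tp
        k = 1.0 + w1 * v / C**2
        t = beta(v) * (tp + xp * v / C**2)  # inverse Lorentz transformation
        return t, (w1 + v) / k, w2 / (beta(v) * k)  # inverse velocity transformation

    t_a, u1_a, u2_a = state(-h)
    t_b, u1_b, u2_b = state(h)
    return (u1_b - u1_a) / (t_b - t_a), (u2_b - u2_a) / (t_b - t_a)
```

For instance, `numeric_accel(0.3, 0.2, 0.5, -0.4, 0.6)` should agree with `accel_x(0.3, 0.5, 0.6)` and `accel_y(0.3, 0.2, 0.5, -0.4, 0.6)` to within the finite-difference error. Note also that a vanishing primed acceleration gives a vanishing unprimed one.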

The inverse transformations can be found in the usual way. It follows from the transformation formulae that acceleration is not an
invariant in special relativity. However, it is clear from the formulae that acceleration is an absolute quantity, that is, all observers agree whether a body is accelerating or not. Put another way, if the acceleration is zero in one frame, then it is necessarily zero in any other frame. We shall see that this is

Table 3.1

Theory               Position   Velocity   Time       Acceleration
Newtonian            Relative   Relative   Absolute   Absolute
Special relativity   Relative   Relative   Relative   Absolute
General relativity   Relative   Relative   Relative   Relative


no longer the case in general relativity. We summarize the situation in Table 3.1, which indicates why the subject matter of the book is 'relativity' theory.
3.8 Uniform acceleration
The Newtonian definition of a particle moving under uniform acceleration is
$$\frac{du}{dt} = \text{constant}.$$
This turns out to be inappropriate in special relativity since it would imply that $u \to \infty$ as $t \to \infty$, which we know is impossible. We therefore adopt a different definition. Acceleration is said to be uniform in special relativity if it has the same value in any co-moving frame, that is, at each instant, the acceleration in an inertial frame travelling with the same velocity as the particle has the same value. This is analogous to the idea in Newtonian theory of motion under a constant force. For example, a spaceship whose motor is set at a constant emission rate would be uniformly accelerated in this sense.
Taking the velocity of the particle to be $u = u(t)$ relative to an inertial frame S, then at any instant in a co-moving frame S', it follows that $v = u$, the velocity relative to S' is zero, i.e. $u' = 0$, and the acceleration is a constant, $a$ say, i.e. $du'/dt' = a$. Using (3.21), we find
$$\frac{du}{dt} = \frac{1}{\beta^3}\,a = \left(1 - \frac{u^2}{c^2}\right)^{3/2} a.$$
We can solve this differential equation by separating the variables
$$\frac{du}{(1 - u^2/c^2)^{3/2}} = a\,dt$$
and integrating both sides. Assuming that the particle starts from rest at $t = t_0$, we find
$$\frac{u}{(1 - u^2/c^2)^{1/2}} = a(t - t_0).$$

Solving for u, we get
$$u = \frac{dx}{dt} = \frac{a(t - t_0)}{[1 + a^2(t - t_0)^2/c^2]^{1/2}}.$$

Next, integrating with respect to t, and setting $x = x_0$ at $t = t_0$, produces
$$x - x_0 = \frac{c^2}{a}\left\{[1 + a^2(t - t_0)^2/c^2]^{1/2} - 1\right\}.$$
This can be rewritten in the form
$$\frac{(x - x_0 + c^2/a)^2}{(c^2/a)^2} - \frac{(ct - ct_0)^2}{(c^2/a)^2} = 1, \qquad (3.24)$$

Fig. 3.8 Hyperbolic motions.

which is a hyperbola in (x, ct)-space. If, in particular, we take $x_0 - c^2/a = t_0 = 0$, then we obtain a family of hyperbolae for different values of a (Fig. 3.8). These world-lines are known as hyperbolic motions and, as we shall see in Chapter 23, they have significance in cosmology. It can be shown that the radar distance between the world-lines is a constant. Moreover, consider the regions I and II bounded by the light rays passing through O, and a system of particles undergoing hyperbolic motions as shown in Fig. 3.8 (in some cosmological models, the particles would be galaxies). Then, remembering that light rays emanating from any point in the diagram do so at 45°, no particle in region I can communicate with another particle in region II, and vice versa. The light rays are called event horizons and act as barriers beyond which no knowledge can ever be gained. We shall see that event horizons will play an important role later in this book.
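The results of this section lend themselves to a quick numerical check. The sketch below is an illustration only (the function names and the choice of units with c = 1 by default are assumptions): it evaluates the velocity and position of a uniformly accelerated particle and confirms that the world-line satisfies the hyperbola (3.24).

```python
import math


def velocity(t, a, c=1.0, t0=0.0):
    """u(t) for uniform acceleration a, starting from rest at t = t0."""
    s = a * (t - t0)
    return s / math.sqrt(1.0 + s * s / c**2)


def position(t, a, c=1.0, t0=0.0, x0=0.0):
    """x(t) obtained by integrating u(t), with x = x0 at t = t0."""
    s = a * (t - t0)
    return x0 + (c**2 / a) * (math.sqrt(1.0 + s * s / c**2) - 1.0)


def hyperbola_lhs(t, a, c=1.0, t0=0.0, x0=0.0):
    """Left-hand side of (3.24); it should equal 1 along the world-line."""
    k = c**2 / a
    x = position(t, a, c, t0, x0)
    return ((x - x0 + k) ** 2 - (c * t - c * t0) ** 2) / k**2
```

The speed returned by `velocity` approaches c for large t but never reaches it, in contrast with the Newtonian definition, for which u grows without bound.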

3.9 The twin paradox

Fig. 3.9 The twin paradox. (World-line of Ā: uniform acceleration away from the Earth, uniform velocity, uniform reversal of direction.)

Fig. 3.10 Simultaneity lines of Ā on the outward and return journeys.

This is a form of the clock paradox which has caused the most controversy, a controversy which raged on and off for over 50 years. The paradox concerns two twins whom we shall call A and Ā. The twin Ā takes off in a spaceship for a return trip to some distant star. The assumption is that Ā is uniformly accelerated to some given velocity which is retained until the star is reached, whereupon the motion is uniformly reversed, as shown in Fig. 3.9. According to A, Ā's clock records slowly on the outward and return journeys and so, on return, Ā will be younger than A. If the periods of acceleration are negligible compared with the periods of uniform velocity, then could not Ā reverse the argument and conclude that it is A who should appear to be the younger? This is the basis of the paradox.
The resolution rests on the fact that the accelerations, however brief, have immediate and finite effects on Ā but not on A, who remains inertial throughout. One striking way of seeing this effect is to draw in the simultaneity lines of Ā for the periods of uniform velocity, as in Fig. 3.10. Clearly, the period of uniform reversal has a marked effect on the simultaneity lines. Another way of looking at it is to see the effect that the periods of acceleration have on shortening the length of the journey as viewed by Ā. Let us be specific: we assume that the periods of acceleration are $T_1$, $T_2$, and $T_3$, and that, after the period $T_1$, Ā has attained a speed $v = \sqrt{3}c/2$. Then, from Ā's viewpoint, during the period $T_1$, Ā finds that more than half the outward journey has been accomplished, in that Ā has transferred to a frame in which the distance between the Earth and the star is more than halved by length contraction. Thus, Ā accomplishes the outward trip in about half the time which A ascribes to it, and the same applies to the return trip. In fact, we could use the machinery of previous sections to calculate the time elapsed in both the periods of uniform acceleration and uniform velocity, and we would again reach the conclusion that on return Ā will be younger than A. As we have said before, this points out the fact that in special relativity time is a route-dependent quantity. The fact that in Fig. 3.9 Ā's world-line is longer than A's, and yet takes less time to travel, is connected with the Minkowskian metric
$$ds^2 = c^2dt^2 - dx^2 - dy^2 - dz^2$$
and the negative signs which appear in it compared with the positive signs occurring in the usual three-dimensional Euclidean metric.
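The route dependence of time can be illustrated by summing the Minkowskian interval along each twin's world-line. The sketch below is an idealization, not a calculation from the text: it neglects the periods of acceleration, takes one spatial dimension, and uses years and light years so that c = 1. The function name and the sample events are hypothetical.

```python
import math


def proper_time(events, c=1.0):
    """Proper time along a piecewise-straight world-line.

    `events` is a list of (t, x) pairs; each straight segment contributes
    ds/c, where ds^2 = c^2 dt^2 - dx^2 (one spatial dimension only).
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        ds2 = (c * (t1 - t0)) ** 2 - (x1 - x0) ** 2
        total += math.sqrt(ds2) / c
    return total


# Stay-at-home twin: at rest for 4 years.
home = [(0.0, 0.0), (4.0, 0.0)]
# Travelling twin: out and back at v = sqrt(3)/2, turning round after 2 years.
away = [(0.0, 0.0), (2.0, math.sqrt(3.0)), (4.0, 0.0)]
```

With these events the stay-at-home twin ages 4 years and the traveller about 2 years, the factor of one half corresponding to $v = \sqrt{3}c/2$. The longer-looking path in the diagram carries the shorter proper time because of the negative signs in the metric.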

3.10 The Doppler effect

All kinds of waves appear lengthened when the source recedes from the observer: sounds are deepened, light is reddened. Exactly the opposite occurs when the source, instead, approaches the observer. We first of all calculate the classical Doppler effect.
Consider a source of light emitting radiation whose wavelength in its rest frame S' is $\lambda_0$. Consider an observer S relative to whose frame the source is in motion with radial velocity $u_r$. Then, if two successive pulses are emitted at times differing by $dt'$ as measured by S', the distances these pulses have to travel will differ by an amount $u_r\,dt'$ (see Fig. 3.11). Since the pulses travel with speed c, it follows that they arrive at S with a time difference
$$\Delta t = dt' + u_r\,dt'/c,$$
giving
$$\Delta t/dt' = 1 + u_r/c.$$
Now, using the fundamental relationship between wavelength and velocity, set
$$\lambda = c\,\Delta t \quad\text{and}\quad \lambda_0 = c\,dt'.$$
We then obtain the classical Doppler formula
$$\frac{\lambda}{\lambda_0} = 1 + \frac{u_r}{c}. \qquad (3.25)$$


Let us now consider the special relativistic formula. Because of time dilation (see Fig. 3.3), the time interval between successive pulses according to S is $\beta\,dt'$ (Fig. 3.12). Hence, by the same argument, the pulses arrive at S with a time difference
$$\Delta t = \beta\,dt' + u_r\beta\,dt'/c$$

Fig. 3.11 The Doppler effect: (a) first pulse; (b) second pulse.


and so this time we find that the special relativistic Doppler formula is
$$\frac{\lambda}{\lambda_0} = \frac{1 + u_r/c}{(1 - v^2/c^2)^{1/2}}. \qquad (3.26)$$

If the velocity of the source is purely radial, then $u_r = v$ and (3.26) reduces to
$$\frac{\lambda}{\lambda_0} = \left(\frac{1 + v/c}{1 - v/c}\right)^{1/2}. \qquad (3.27)$$

Fig. 3.12 The special relativistic Doppler shift.
This is the radial Doppler shift, and, if we set c = 1, we obtain (2.4), which is the formula for the k-factor. Combining Figs. 2.7 and 3.12, the radial Doppler shift is illustrated in Fig. 3.13, where $dt'$ is replaced by T.
From equation (3.26), we see that there is also a change in wavelength, even when the radial velocity of the source is zero. For example, if the source is moving in a circle about the origin of S with speed v (as measured by an instantaneous co-moving frame), then the transverse Doppler shift is given by
$$\frac{\lambda}{\lambda_0} = \frac{1}{(1 - v^2/c^2)^{1/2}}.$$

Fig. 3.13 The radial Doppler shift k.

This is a purely relativistic effect due to the time dilation of the moving source. Experiments with revolving apparatus using the so-called 'Mössbauer effect' have directly confirmed the transverse Doppler shift in full agreement with the relativistic formula, thus providing another striking verification of the phenomenon of time dilation.
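The Doppler formulae of this section are easily compared numerically. The following sketch (function names are hypothetical, and c = 1 by default is an assumption of convenience) evaluates the classical ratio, the relativistic ratio (3.26), and the purely radial special case.

```python
import math


def classical_ratio(u_r, c=1.0):
    """Classical Doppler ratio lambda/lambda_0 = 1 + u_r/c."""
    return 1.0 + u_r / c


def doppler_ratio(u_r, v, c=1.0):
    """Relativistic ratio (3.26): radial velocity u_r, speed v of the source."""
    return (1.0 + u_r / c) / math.sqrt(1.0 - (v / c) ** 2)


def radial_ratio(v, c=1.0):
    """Purely radial motion, u_r = v: the square-root form of the shift."""
    return math.sqrt((1.0 + v / c) / (1.0 - v / c))
```

Setting `u_r = v` in `doppler_ratio` reproduces `radial_ratio`, while setting `u_r = 0` leaves only the time dilation factor, which is the transverse shift. For a source receding at v = 0.6c the classical ratio is 1.6, whereas the relativistic ratio is 2.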

Exercises
3.1 (§3.1) S and S' are in standard configuration with $v = \alpha c$ ($0 < \alpha < 1$). If a rod at rest in S' makes an angle of 45° with $Ox$ in S and 30° with $O'x'$ in S', then find $\alpha$.
3.2 (§3.1) Note from the previous question that perpendicular lines in one frame need not be perpendicular in another frame. This shows that there is no obvious meaning to the phrase 'two inertial frames are parallel', unless their relative velocity is along a common axis, because the axes of either frame need not appear rectangular in the other. Verify that the Lorentz transformation between frames in standard configuration with relative velocity $\mathbf{v} = (v, 0, 0)$ may be written in vector form
$$\mathbf{r}' = \mathbf{r} + \left(\frac{\mathbf{v}\cdot\mathbf{r}}{v^2}(\beta - 1) - \beta t\right)\mathbf{v}, \qquad t' = \beta\left(t - \frac{\mathbf{v}\cdot\mathbf{r}}{c^2}\right),$$
where $\mathbf{r} = (x, y, z)$. The formulae are said to comprise the 'Lorentz transformation without relative rotation'. Justify this name by showing that the formulae remain valid when the frames are not in standard configuration, but are parallel in the sense that the same rotation must be applied to each frame to bring the two into standard configuration (in which case $\mathbf{v}$ is the velocity of S' relative to S, but $\mathbf{v} = (v, 0, 0)$ no longer applies).
3.3 (§3.1) Prove that the first two equations of the special Lorentz transformation can be written in the form
$$ct' = -x\sinh\phi + ct\cosh\phi, \qquad x' = x\cosh\phi - ct\sinh\phi,$$
where the rapidity $\phi$ is defined by $\phi = \tanh^{-1}(v/c)$. Establish also the following version of these equations:
$$ct' + x' = e^{-\phi}(ct + x), \qquad ct' - x' = e^{\phi}(ct - x),$$
$$e^{2\phi} = (1 + v/c)/(1 - v/c).$$
What relation does $\phi$ have to $\theta$ in equation (3.11)?

3.4 (§3.1) Aberration refers to the fact that the direction of travel of a light ray depends on the motion of the observer. Hence, if a telescope observes a star at an inclination $\theta'$ to the horizontal, then show that classically the 'true' inclination $\theta$ of the star is related to $\theta'$ by
$$\tan\theta' = \frac{\sin\theta}{\cos\theta + v/c},$$
where v is the velocity of the telescope relative to the star. Show that the corresponding relativistic formula is
$$\tan\theta' = \frac{\sin\theta}{\beta(\cos\theta + v/c)}.$$

3.5 (§3.2) Show that special Lorentz transformations are associative, that is, if $O(v)$ represents the transformation from observer S to S', then show that
$$(O(v_1)O(v_2))O(v_3) = O(v_1)(O(v_2)O(v_3)).$$
3.6 (§3.3) An athlete carrying a horizontal 20-ft-long pole runs at a speed v such that $(1 - v^2/c^2)^{-1/2} = 2$ into a 10-ft-long room and closes the door. Explain, in the athlete's frame, in which the room is only 5 ft long, how this is possible. [Hint: no effect travels faster than light.] Show that the minimum length of the room for the performance of this trick is $20/(\sqrt{3} + 2)$ ft. Draw a space-time diagram to indicate what is going on in the rest frame of the athlete.
3.7 (§3.5) A particle has velocity $\mathbf{u} = (u_1, u_2, u_3)$ in S and $\mathbf{u}' = (u_1', u_2', u_3')$ in S'. Prove from the velocity transformation formulae that
$$c^2 - u^2 = \frac{c^2(c^2 - u'^2)(c^2 - v^2)}{(c^2 + u_1'v)^2}.$$
Deduce that, if the speed of a particle is less than c in any one inertial frame, then it is less than c in every inertial frame.
3.8 (§3.7) Check the transformation formulae for the components of acceleration (3.21)-(3.23). Deduce that acceleration is an absolute quantity in special relativity.
3.9 (§3.8) A particle moves from rest at the origin of a frame S along the x-axis, with constant acceleration $\alpha$ (as measured in an instantaneous rest frame). Show that the equation of motion is
$$\alpha x^2 + 2c^2x - \alpha c^2t^2 = 0,$$
and prove that the light signals emitted after time $t = c/\alpha$ at the origin will never reach the receding particle. A standard clock carried along with the particle is set to read zero at the beginning of the motion and reads $\tau$ at time t in S. Using the clock hypothesis, prove the following relationships:
$$\frac{u}{c} = \tanh\frac{\alpha\tau}{c}, \qquad \left(1 - \frac{u^2}{c^2}\right)^{-1/2} = \cosh\frac{\alpha\tau}{c},$$
$$\frac{\alpha t}{c} = \sinh\frac{\alpha\tau}{c}, \qquad x = \frac{c^2}{\alpha}\left(\cosh\frac{\alpha\tau}{c} - 1\right).$$
Show that, if $T^2 \ll c^2/\alpha^2$, then, during an elapsed time T in the inertial system, the particle clock will record approximately the time $T(1 - \alpha^2T^2/6c^2)$.
If $\alpha = 3g$, find the difference in recorded times by the spaceship clock and those of the inertial system
(a) after 1 hour; (b) after 10 days.
3.10 (§3.9) A space traveller A travels through space with uniform acceleration g (to ensure maximum comfort). Find the distance covered in 22 years of A's time. [Hint: using years and light years as time and distance units, respectively, then g = 1.03.] If, on the other hand, A describes a straight double path XYZYX, with acceleration g on XY and ZY, and deceleration on YZ and YX, for 6 years each, then draw a space-time diagram as seen from the Earth and find by how much the Earth would have aged in 24 years of A's time.
3.11 (§3.10) Let the relative velocity between a source of light and an observer be u, and establish the classical Doppler formulae for the frequency shift:
source moving, observer at rest: $\nu = \dfrac{\nu_0}{1 + u/c}$,
observer moving, source at rest: $\nu = (1 - u/c)\nu_0$,
where $\nu_0$ is the frequency in the rest frame of the source. What are the corresponding relativistic results?
3.12 (§3.10) How fast would you need to drive towards a red traffic light for the light to appear green? [Hint: $\lambda_{\text{red}} \approx 7 \times 10^{-5}$ cm, $\lambda_{\text{green}} \approx 5 \times 10^{-5}$ cm.]

4.1 Newtonian theory
Before discussing relativistic mechanics, we shall review some basic ideas of Newtonian theory. We have met Newton's first law in §2.4, and it states that a body not acted upon by a force moves in a straight line with uniform velocity. The second law describes what happens if an object changes its velocity. In this case, something is causing it to change its velocity and this something is called a force. For the moment, let us think of a force as something tangible like a push or a pull. Now, we know from experience that it is more difficult to push a more massive body and get it moving than it is to push a less massive body. This resistance of a body to motion, or rather change in motion, is called its inertia. To every body, we can ascribe, at least at one particular time, a number measuring its inertia, which (again for the moment) we shall call its mass m. If a body is moving with velocity $\mathbf{v}$, we define its linear momentum $\mathbf{p}$ to be the product of its mass and velocity. Then Newton's second law (N2) states that the force acting on a body is equal to the rate of change of linear momentum. The third law (N3) is less general and talks about a restricted class of forces called internal forces, namely, forces acting on a body due to the influence of other bodies in a system. The third law states that the force acting on a body due to the influence of the other bodies, the so-called action, is equal and opposite to the force acting on these other bodies due to the influence of the first body, the so-called reaction. We state the two laws below.
Then, for a body of mass m with a force $\mathbf{F}$ acting on it, Newton's second law states
$$\mathbf{F} = \frac{d\mathbf{p}}{dt} = \frac{d(m\mathbf{v})}{dt}. \qquad (4.1)$$
If, in particular, the mass is a constant, then
$$\mathbf{F} = m\frac{d\mathbf{v}}{dt} = m\mathbf{a}, \qquad (4.2)$$
where $\mathbf{a}$ is the acceleration.
Now, strictly speaking, in Newtonian theory, all observable quantities should be defined in terms of their measurement. We have seen how an observer equipped with a frame of reference, ruler, and clock can map the events of the universe, and hence measure such quantities as position, velocity, and acceleration. However, Newton's laws introduce the new concepts of force and mass, and so we should give a prescription for their measurement. Unfortunately, any experiment designed to measure these quantities involves Newton's laws themselves in its interpretation. Thus, Newtonian mechanics has the rather unexpected property that the operational definitions of force and mass which are required to make the laws physically significant are actually contained in the laws themselves.
To make this more precise, let us discuss how we might use the laws to measure the mass of a body. We consider two bodies isolated from all other influences other than the force acting on one due to the influence of the other and vice versa (Fig. 4.1). Since the masses are assumed to be constant, we have, by Newton's second law in the form (4.2),
$$\mathbf{F}_1 = m_1\mathbf{a}_1 \quad\text{and}\quad \mathbf{F}_2 = m_2\mathbf{a}_2.$$
In addition, by Newton's third law, $\mathbf{F}_1 = -\mathbf{F}_2$. Hence, we have
$$m_1\mathbf{a}_1 = -m_2\mathbf{a}_2. \qquad (4.3)$$

Fig. 4.1 Measuring mass by mutually induced accelerations.

Therefore, if we take one standard body and define it to have unit mass, then we can find the mass of the other body, by using (4.3). We can keep doing this with any other body and in this way we can calibrate masses. In fact, this method is commonly used for comparing the masses of elementary particles. Of course, in practice, we cannot remove all other influences, but it may be possible to keep them almost constant and so neglect them.
We have described how to use Newton's laws to measure mass. How do we measure force? One approach is simply to use Newton's second law, work out ma for a body and then read off from the law the force acting on m. This is consistent, although rather circular, especially since a force has independent properties of its own. For example, Newton has provided us with a way for working out the force in the case of gravitation in his universal law of
gravitation (UG).

If we denote the constant of proportionality by G (with value $6.67 \times 10^{-11}$ in m.k.s. units), the so-called Newtonian constant, then the law is (see Fig. 4.2)
$$\mathbf{F} = -G\frac{m_1m_2}{r^2}\hat{\mathbf{r}}, \qquad (4.4)$$

Fig. 4.2 Newton's universal law of gravitation.

44 I The elements of relativistic mechanics

where a hat denotes a unit vector. There are other force laws which can be stated separately. Again, another independent property which holds for certain forces is contained in Newton's third law. The standard approach to defining force is to consider it as being fundamental, in which case force laws can be stated separately or they can be worked out from other considerations. We postpone a more detailed critique of Newton's laws until Part C of the book.
Special relativity is concerned with the behaviour of material bodies and light rays in the absence of gravitation. So we shall also postpone a detailed consideration of gravitation until we discuss general relativity in Part C of the book. However, since we have stated Newton's universal law of gravitation in (4.4), we should, for completeness, include a statement of Newtonian gravitation for a distribution of matter. A distribution of matter of mass density $\rho = \rho(x, y, z, t)$ gives rise to a gravitational potential $\phi$ which satisfies Poisson's equation
$$\nabla^2\phi = 4\pi G\rho \qquad (4.5)$$
at points inside the distribution, where the Laplacian operator $\nabla^2$ is given in Cartesian coordinates by
$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.$$
At points external to the distribution, this reduces to Laplace's equation
$$\nabla^2\phi = 0. \qquad (4.6)$$

We assume that the reader is familiar with this background to Newtonian theory.
4.2 Isolated systems of particles in Newtonian mechanics
In this section, we shall, for completeness, derive the conservation of linear momentum in Newtonian mechanics for a system of n particles. Let the ith particle have constant mass $m_i$ and position vector $\mathbf{r}_i$ relative to some arbitrary origin. Then the ith particle possesses linear momentum $\mathbf{p}_i$ defined by $\mathbf{p}_i = m_i\dot{\mathbf{r}}_i$, where the dot denotes differentiation with respect to time t. If $\mathbf{F}_i$ is the total force on $m_i$, then, by Newton's second law, we have
$$\mathbf{F}_i = \dot{\mathbf{p}}_i = m_i\ddot{\mathbf{r}}_i. \qquad (4.7)$$
The total force $\mathbf{F}_i$ on the ith particle can be divided into an external force $\mathbf{F}_i^{\text{ext}}$ due to any external fields present and the resultant of the internal forces. We write
$$\mathbf{F}_i = \mathbf{F}_i^{\text{ext}} + \sum_{j=1}^{n}\mathbf{F}_{ij},$$
where $\mathbf{F}_{ij}$ is the force on the ith particle due to the jth particle and where, for convenience, we define $\mathbf{F}_{ii} = 0$. If we sum over i in (4.7), we find
$$\frac{d}{dt}\sum_{i=1}^{n}\mathbf{p}_i = \sum_{i=1}^{n}\frac{d\mathbf{p}_i}{dt} = \sum_{i=1}^{n}\mathbf{F}_i^{\text{ext}} + \sum_{i,j=1}^{n}\mathbf{F}_{ij}.$$
Using Newton's third law, namely, $\mathbf{F}_{ij} = -\mathbf{F}_{ji}$, then the last term is zero and we obtain $\dot{\mathbf{P}} = \mathbf{F}^{\text{ext}}$, where $\mathbf{P} = \sum_{i=1}^{n}\mathbf{p}_i$ is termed the total linear momentum of the system and $\mathbf{F}^{\text{ext}} = \sum_{i=1}^{n}\mathbf{F}_i^{\text{ext}}$ is the total external force on the system. If, in particular, the system of particles is isolated, then
$$\mathbf{F}^{\text{ext}} = 0 \;\Rightarrow\; \mathbf{P} = \mathbf{c},$$
where $\mathbf{c}$ is a constant vector. This leads to the law of the conservation of linear momentum of the system, namely,
$$\mathbf{P}_{\text{initial}} = \mathbf{P}_{\text{final}}. \qquad (4.8)$$
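The cancellation of the internal forces can be seen in a small simulation. The sketch below is illustrative only: it works in one dimension, takes the internal forces to be linear springs between every pair of particles (a force law chosen purely for convenience, not taken from the text), applies no external force, and advances the system with simple Euler steps. All names are hypothetical.

```python
def total_momentum(masses, velocities):
    """P = sum of m_i * v_i (one-dimensional motion)."""
    return sum(m * v for m, v in zip(masses, velocities))


def step(masses, positions, velocities, dt, k=1.0):
    """Advance an isolated system by one Euler step of size dt."""
    n = len(masses)
    forces = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            f_ij = k * (positions[j] - positions[i])  # force on i due to j
            forces[i] += f_ij
            forces[j] -= f_ij  # Newton's third law: F_ji = -F_ij
    new_v = [v + f / m * dt for v, f, m in zip(velocities, forces, masses)]
    new_x = [x + v * dt for x, v in zip(positions, new_v)]
    return new_x, new_v
```

Because every contribution `f_ij` is added to one particle and subtracted from another, the total momentum returned by `total_momentum` is unchanged from step to step (up to rounding), whatever the masses or initial conditions.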


4.3 Relativistic mass

The transition from Newtonian to relativistic mechanics is not, in fact, completely straightforward, because it involves at some point or another the introduction of ad hoc assumptions about the behaviour of particles in relativistic situations. We shall adopt the approach of trying to keep as close to the non-relativistic definition of energy and momentum as we can. This leads to results which in the end must be confronted with experiment. The ultimate justification of the formulae we shall derive resides in the fact that they have been repeatedly confirmed in numerous laboratory experiments in particle physics. We shall only derive them in a simple case and state that the arguments can be extended to a more general situation.
It would seem plausible that, since length and time measurements are dependent on the observer, then mass should also be an observer-dependent quantity. We thus assume that a particle which is moving with a velocity u relative to an inertial observer has a mass, which we shall term its relativistic mass, which is some function of u, that is,

$$m = m(u), \qquad (4.9)$$

where the problem is to find the explicit dependence of m on u. We restrict attention to motion along a straight line and consider the special case of two equal particles colliding inelastically (in which case they stick together), and look at the collision from the point of view of two inertial observers S and S' (see Fig. 4.3). Let one of the particles be at rest in the frame S and the other possess a velocity u before they collide. We then assume that they coalesce and that the combined object moves with velocity U. The masses of the two particles are respectively $m(0)$ and $m(u)$ by (4.9). We denote $m(0)$ by $m_0$ and term it the rest mass of the particle. In addition, we denote the mass of the combined object by $M(U)$. If we take S' to be the centre-of-mass frame, then it should be clear that, relative to S', the two equal particles collide with equal and opposite speeds, leaving the combined object with mass $M_0$ at rest. It follows that S' must have velocity U relative to S.

Fig. 4.3 The inelastic collision in the frames S and S' (before and after the collision in each frame).

We shall assume both conservation of relativistic mass and conservation of linear momentum and see what this leads to. In the frame S, we obtain
$$m(u) + m_0 = M(U), \qquad m(u)u + 0 = M(U)U,$$
from which we get, eliminating $M(U)$,
$$m(u) = m_0\left(\frac{U}{u - U}\right). \qquad (4.10)$$

The left-hand particle has a velocity U relative to S', which in turn has a velocity U relative to S. Hence, using the composition of velocities law, we can compose these two velocities and the resultant velocity must be identical with the velocity u of the left-hand particle in S. Thus, by (2.6) in non-relativistic units,
$$u = \frac{2U}{1 + U^2/c^2}.$$
Solving for U in terms of u, we obtain the quadratic
$$U^2 - \left(\frac{2c^2}{u}\right)U + c^2 = 0,$$
which has roots
$$U = \frac{c^2}{u} \pm \left[\left(\frac{c^2}{u}\right)^2 - c^2\right]^{1/2} = \frac{c^2}{u}\left[1 \pm \left(1 - \frac{u^2}{c^2}\right)^{1/2}\right].$$
In the limit $u \to 0$, this must produce a finite result, so we must take the negative sign (check), and, substituting in (4.10), we find finally
$$m(u) = \gamma m_0, \qquad (4.11)$$
where
$$\gamma = (1 - u^2/c^2)^{-1/2}. \qquad (4.12)$$
This is the basic result which relates the relativistic mass of a moving particle to its rest mass. Note that this is the same in structure as the time dilation formula (3.16), i.e. $T = \beta T_0$, where $\beta = (1 - v^2/c^2)^{-1/2}$, except that time dilation involves the factor $\beta$ which depends on the velocity v of the frame S' relative to S, whereas $\gamma$ depends on the velocity u of the particle relative to S. If we plot m against u, we see that relativistic mass increases without bound as u approaches c (Fig. 4.4).
It is possible to extend the above argument to establish (4.11) in more general situations. However, we emphasize that it is not possible to derive the result a priori, but only with the help of extra assumptions. However it is produced, the only real test of the validity of the result is in the experimental arena and here it has been extensively confirmed.
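The algebra leading from (4.10) to the relativistic mass formula can be checked numerically. The sketch below (hypothetical function names, c = 1 by default) takes the negative root of the quadratic for U, confirms that composing U with itself returns u, and compares the mass ratio from (4.10) with $(1 - u^2/c^2)^{-1/2}$.

```python
import math


def centre_of_mass_speed(u, c=1.0):
    """Negative-sign root of U^2 - (2c^2/u) U + c^2 = 0."""
    return (c**2 / u) * (1.0 - math.sqrt(1.0 - u**2 / c**2))


def compose(U, c=1.0):
    """Relativistic composition of the velocity U with itself."""
    return 2.0 * U / (1.0 + U**2 / c**2)


def mass_ratio_from_collision(u, c=1.0):
    """m(u)/m0 from equation (4.10), with U eliminated in favour of u."""
    U = centre_of_mass_speed(u, c)
    return U / (u - U)


def gamma(u, c=1.0):
    """gamma = (1 - u^2/c^2)^(-1/2)."""
    return 1.0 / math.sqrt(1.0 - u**2 / c**2)
```

For u = 0.8c, the centre-of-mass frame moves at U = 0.5c and both `mass_ratio_from_collision(0.8)` and `gamma(0.8)` give 5/3 to rounding accuracy.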
4.4 Relativistic energy

Fig. 4.4 Relativistic mass as a function of velocity.

Let us expand the expression for the relativistic mass, namely,
$$m(u) = \gamma m_0 = m_0(1 - u^2/c^2)^{-1/2},$$
in the case when the velocity u is small compared with the speed of light c. Then we get
$$m(u) = m_0 + \frac{1}{c^2}\left(\tfrac{1}{2}m_0u^2\right) + O\!\left(\frac{u^4}{c^4}\right), \qquad (4.13)$$
where the final term stands for all terms of order $(u/c)^4$ and higher. If we multiply both sides by $c^2$, then, apart from the constant $m_0c^2$, the right-hand side is to first approximation the classical kinetic energy (k.e.), that is,
$$mc^2 = m_0c^2 + \tfrac{1}{2}m_0u^2 + \cdots \simeq \text{constant} + \text{k.e.} \qquad (4.14)$$
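To see how good the approximation (4.14) is, one can compare $mc^2 - m_0c^2$ with the classical kinetic energy at various speeds. The sketch below is an illustration with hypothetical function names; the relativistic expression is rearranged algebraically so that it does not lose accuracy through cancellation when u is much smaller than c.

```python
import math


def kinetic_energy_relativistic(m0, u, c=1.0):
    """m c^2 - m0 c^2, i.e. m0 c^2 (gamma - 1).

    Uses gamma - 1 = b / (s (1 + s)) with b = u^2/c^2 and s = sqrt(1 - b),
    which avoids subtracting two nearly equal numbers for small u.
    """
    b = (u / c) ** 2
    s = math.sqrt(1.0 - b)
    return m0 * c**2 * b / (s * (1.0 + s))


def kinetic_energy_classical(m0, u):
    """The Newtonian kinetic energy (1/2) m0 u^2."""
    return 0.5 * m0 * u**2
```

At u = 0.01c the two expressions differ by less than one part in ten thousand, the leading correction being the next term of the binomial series, of relative size $\tfrac{3}{4}u^2/c^2$. At u = 0.6c the relativistic value is $0.25\,m_0c^2$ against the classical $0.18\,m_0c^2$.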

We have seen that relativistic mass contains within it the expression for classical kinetic energy. In fact, it can be shown that the conservation of relativistic mass leads to the conservation of kinetic energy in the Newtonian approximation. As a simple example, consider the collision of two particles with rest mass $m_0$ and $\bar{m}_0$, initial velocities $v_1$ and $\bar{v}_1$, and final velocities $v_2$ and $\bar{v}_2$, respectively (Fig. 4.5). Conservation of relativistic mass gives
$$m_0(1 - v_1^2/c^2)^{-1/2} + \bar{m}_0(1 - \bar{v}_1^2/c^2)^{-1/2} = m_0(1 - v_2^2/c^2)^{-1/2} + \bar{m}_0(1 - \bar{v}_2^2/c^2)^{-1/2}. \qquad (4.15)$$
If we now assume that $v_1$, $v_2$, $\bar{v}_1$, and $\bar{v}_2$ are all small compared with c, then we find (exercise) that the leading terms in the expansion of (4.15) give
$$\tfrac{1}{2}m_0v_1^2 + \tfrac{1}{2}\bar{m}_0\bar{v}_1^2 = \tfrac{1}{2}m_0v_2^2 + \tfrac{1}{2}\bar{m}_0\bar{v}_2^2, \qquad (4.16)$$
which is the usual conservation of energy equation. Thus, in this sense, conservation of relativistic mass includes within it conservation of energy. Now, since energy is only defined up to the addition of a constant, the result (4.14) suggests that we regard the energy E of a particle as given by
$$E = mc^2. \qquad (4.17)$$

Fig. 4.5 Two colliding particles (before and after the collision).

This is one of the most famous equations in physics. However, it is not just a mathematical relationship between two different quantities, namely energy and mass, but rather states that energy and mass are equivalent concepts. Because of the arbitrariness in the actual value of E, a better way of stating the relationship is to say that a change in energy is equal to a change in relativistic mass, namely,
$$\Delta E = \Delta m\,c^2.$$

Using conventional units, $c^2$ is a large number and indicates that a small change in mass is equivalent to an enormous change in energy. As is well known, this relationship and the deep implications it carries with it for peace and war, have been amply verified. For obvious reasons, the term $m_0c^2$ is termed the rest energy of the particle. Finally, we point out that conservation of linear momentum, using relativistic mass, leads to the usual conservation law in the Newtonian approximation. For example (exercise), the collision problem considered above leads to the usual conservation of linear momentum equation for slow-moving particles:
$$m_0v_1 + \bar{m}_0\bar{v}_1 = m_0v_2 + \bar{m}_0\bar{v}_2. \qquad (4.18)$$
Extending these ideas to three spatial dimensions, then a particle moving with velocity $\mathbf{u}$ relative to an inertial frame S has relativistic mass m, energy E, and linear momentum $\mathbf{p}$ given by
$$m = \gamma m_0, \qquad E = mc^2, \qquad \mathbf{p} = m\mathbf{u}. \qquad (4.19)$$
Some straightforward algebra (exercise) reveals that
$$(E/c)^2 - p_x^2 - p_y^2 - p_z^2 = (m_0c)^2, \qquad (4.20)$$

where $m_0c$ is an invariant, since it is the same for all inertial observers. If we compare this with the invariant (3.13), i.e.
$$(ct)^2 - x^2 - y^2 - z^2 = s^2,$$
then it suggests that the quantities $(E/c, p_x, p_y, p_z)$ transform under a Lorentz transformation in the same way as the quantities $(ct, x, y, z)$. We shall see in Part C that the language of tensors provides a better framework for discussing transformation laws. For the moment, we shall assume that energy and momentum transform in an identical manner and quote the results. Thus, in a frame S' moving in standard configuration with velocity v relative to S, the transformation equations are (see (3.12))
$$E' = \beta(E - vp_x), \qquad p_x' = \beta(p_x - vE/c^2), \qquad p_y' = p_y, \qquad p_z' = p_z. \qquad (4.21)$$
The inverse transformations are obtained in the usual way, namely, by interchanging primes and unprimes and replacing v by $-v$, which gives
$$E = \beta(E' + vp_x'), \qquad p_x = \beta(p_x' + vE'/c^2), \qquad p_y = p_y', \qquad p_z = p_z'. \qquad (4.22)$$
If, in particular, we take S' to be the instantaneous rest frame of the particle, then $\mathbf{p}' = 0$ and $E' = E_0 = m_0c^2$. Substituting in (4.22), we find
$$E = \beta E' = \frac{m_0c^2}{(1 - v^2/c^2)^{1/2}} = mc^2,$$
where $m = m_0(1 - v^2/c^2)^{-1/2}$ and $\mathbf{p} = (\beta vE'/c^2, 0, 0) = (mv, 0, 0) = m\mathbf{v}$, which are precisely the values of the energy, mass, and momentum arrived at in (4.19) with u replaced by v.
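The invariance of $(E/c)^2 - p^2$ under the transformation of energy and momentum can be checked directly. The sketch below is illustrative: it assumes that $(E/c, p_x, p_y, p_z)$ transforms exactly like $(ct, x, y, z)$ for frames in standard configuration, as stated in the text, and uses c = 1 by default. The function names are hypothetical.

```python
import math


def energy_momentum(m0, u, c=1.0):
    """E = m c^2 and p = m u for a particle with velocity u = (u1, u2, u3)."""
    speed2 = sum(q * q for q in u)
    m = m0 / math.sqrt(1.0 - speed2 / c**2)
    return m * c**2, tuple(m * q for q in u)


def boost(E, p, v, c=1.0):
    """Energy and momentum in S', moving with velocity v along Ox relative to S."""
    px, py, pz = p
    b = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return b * (E - v * px), (b * (px - v * E / c**2), py, pz)


def invariant(E, p, c=1.0):
    """(E/c)^2 - p_x^2 - p_y^2 - p_z^2, which should equal (m0 c)^2."""
    return (E / c) ** 2 - sum(q * q for q in p)
```

For a particle of rest mass 2 the invariant is 4 in every frame, and boosting with v equal to the particle's own velocity returns zero momentum and the rest energy $m_0c^2$.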

4.5 Photons
At the end of the last century, there was considerable conflict between theory and experiment in the investigation of radiation in enclosed volumes. In an attempt to resolve the difficulties, Max Planck proposed that light and other electromagnetic radiation consisted of individual 'packets' of energy, which he called quanta. He suggested that the energy E of each quantum was to depend on its frequency $\nu$, and proposed the simple law, called Planck's hypothesis,
$$E = h\nu, \qquad (4.23)$$
where h is a universal constant known now as Planck's constant. The idea of the quantum was developed further by Einstein, especially in attempting to explain the photoelectric effect. The effect is to do with the ejection of electrons from a metal surface by incident light (especially ultraviolet) and is strongly in support of Planck's quantum hypothesis. Nowadays, the quantum theory is well established and applications of it to explain properties of molecules, atoms, and fundamental particles are at the heart of modern physics. Theories of light now give it a dual wave-particle nature. Some properties, such as diffraction and interference, are wavelike in nature, while the photoelectric effect and other cases of the interaction of light and atoms are best described on a particle basis.
The particle description of light consists in treating it as a stream of quanta called photons. Using equation (4.19) and substituting in the speed of light, $u = c$, we find
$$m_0 = \gamma^{-1}m = (1 - u^2/c^2)^{1/2}\,m = 0, \qquad (4.24)$$
that is, the rest mass of a photon must be zero! This is not so bizarre as it first seems, since no inertial observer ever sees a photon at rest (its speed is always c) and so the rest mass of a photon is merely a notional quantity. If we let $\hat{\mathbf{n}}$ be a unit vector denoting the direction of travel of the photon, then
$$\mathbf{p} = (p_x, p_y, p_z) = p\hat{\mathbf{n}},$$
and equation (4.20) becomes
$$(E/c)^2 - p^2 = 0.$$


Taking square roots (and remembering c and pare positive), we find that the energy E of a photon is related to the magnitude p of its momentum by

E = pc.

(4.25)

Finally, using the energy-mass relationship $E = mc^2$, we find that the relativistic mass of a photon is non-zero and is given by

$$m = p/c. \tag{4.26}$$

Combining these results with Planck's hypothesis, we obtain the following formulae for the energy $E$, relativistic mass $m$, and linear momentum $\mathbf{p}$ of the photon:

$$E = h\nu, \qquad m = h\nu/c^2, \qquad \mathbf{p} = (h\nu/c)\,\hat{\mathbf{n}}.$$

It is gratifying to discover that special relativity, which was born to reconcile conflicts in the kinematical properties of light and matter, also includes their mechanical properties in a single all-inclusive system.
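As a small numerical illustration of the photon relations above, the following sketch evaluates $E = h\nu$, $p = E/c$ from (4.25), and $m = p/c$ from (4.26). The function name `photon` and the sample frequency are illustrative choices, not taken from the text; the constants are the standard SI values.

```python
# Planck's constant and the speed of light in SI units.
H = 6.62607015e-34   # J s
C = 299792458.0      # m / s


def photon(nu):
    """Return (E, m, p) for a photon of frequency nu in Hz.

    Hypothetical helper: it simply chains Planck's hypothesis with
    the relations E = pc (4.25) and m = p/c (4.26).
    """
    energy = H * nu          # E = h nu
    momentum = energy / C    # p = E / c
    mass = momentum / C      # m = p / c
    return energy, mass, momentum


# An illustrative frequency in the visible range.
E, m, p = photon(5.0e14)
print(E, m, p)
```

The relativistic mass that comes out is tiny but non-zero, in line with (4.26), even though the rest mass vanishes.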
We finish this section with an argument which shows that Planck's hypothesis can be derived directly within the framework of special relativity. We have already seen in the last chapter that the radial Doppler effect for a moving source is given by (3.27), namely
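$$\frac{\lambda}{\lambda_0} = \left(\frac{1 + v/c}{1 - v/c}\right)^{1/2},$$

where $\lambda_0$ is the wavelength in the frame of the source and $\lambda$ is the wavelength in the frame of the observer. We write this result, instead, in terms of frequency, using the fundamental relationships $c = \lambda\nu$ and $c = \lambda_0\nu_0$, to obtain

$$\frac{\nu_0}{\nu} = \left(\frac{1 + v/c}{1 - v/c}\right)^{1/2}. \tag{4.28}$$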

Now, suppose that the source emits a light flash of total energy $E_0$. Let us use the equations (4.22) to find the energy received in the frame of the observer $S$. Since, recalling Fig. 3.11, the light flash is travelling along the negative $x$-direction of both frames, the relationship (4.25) leads to the result $p'_x = -E_0/c$, with the other primed components of momentum zero. Substituting in the first equation of (4.22), namely,

$$E = \beta(E' + vp'_x),$$

we get

$$E = \beta(E_0 - vE_0/c),$$

or

$$\frac{E_0}{E} = \left(\frac{1 + v/c}{1 - v/c}\right)^{1/2}. \tag{4.29}$$

Combining this with equation (4.28), we obtain

$$\frac{E_0}{\nu_0} = \frac{E}{\nu}.$$

Since this relationship holds for any pair of inertial observers, it follows that the ratio must be a universal constant, which we call $h$. Thus, we have derived Planck's hypothesis $E = h\nu$.
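The argument above can be checked numerically. The sketch below assumes that $\beta$ denotes the factor $(1 - v^2/c^2)^{-1/2}$, which is what makes (4.29) follow from the energy transformation; the function names and the sample values of $E_0$ and $\nu_0$ are illustrative only.

```python
import math


def doppler_ratio(v_over_c):
    """nu_0 / nu for the radial Doppler effect, equation (4.28)."""
    return math.sqrt((1 + v_over_c) / (1 - v_over_c))


def received_energy(E0, v_over_c):
    """E = beta (E' + v p'_x) with E' = E0 and p'_x = -E0 / c.

    Assumption: beta = (1 - v^2/c^2)^(-1/2). The factor c cancels
    in the product v * p'_x, so only v/c is needed.
    """
    beta = 1.0 / math.sqrt(1 - v_over_c ** 2)
    return beta * (E0 - v_over_c * E0)


E0, nu0 = 2.0, 5.0  # arbitrary units, chosen for illustration
for v_over_c in (0.1, 0.5, 0.9):
    E = received_energy(E0, v_over_c)
    nu = nu0 / doppler_ratio(v_over_c)
    # The two ratios agree for every relative speed tried.
    print(v_over_c, E0 / nu0, E / nu)
```

Energy and frequency are both reduced by the same factor, so their ratio is the same in both frames, which is the content of the derivation.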
We leave our considerations of special relativity at this point and turn our attention to the formalism of tensors. This will enable us to reformulate special relativity in a way which will aid our transition to general relativity, that is, to a theory of gravitation consistent with special relativity.


Exercises

4.1 (§4.1) Discuss the possibility of using force rather than mass as the basic quantity, taking, for example, a standard weight at a given latitude as the unit of force. How should one then define and measure the mass of a body?
4.2 (§4.3) Show that, in the inelastic collision considered in §4.3, the rest mass of the combined object is greater than the sum of the original rest masses. Where does this increase derive from?
4.3 (§4.3) A particle of rest mass $m_0$ and speed $u$ strikes a stationary particle of rest mass $m_0$. If the collision is perfectly inelastic, then find the rest mass of the composite particle.

4.4 (§4.4) (i) Establish the transition from equation (4.15) to (4.16).
(ii) Establish the Newtonian approximation equation (4.18).

4.5 (§4.4) Show that (4.19) leads to (4.20). Deduce (4.21).
4.6 (§4.4) Newton's second law for a particle of relativistic mass $m$ is

$$\mathbf{F} = \frac{d}{dt}(m\mathbf{u}).$$

Define the work done $dE$ in moving the particle from $\mathbf{r}$ to $\mathbf{r} + d\mathbf{r}$. Show that the rate of doing work is given by

$$\frac{dE}{dt} = \frac{d(m\mathbf{u})}{dt}\cdot\mathbf{u}.$$

Use the definition of relativistic mass to obtain the result

$$\frac{dE}{dt} = \frac{m_0}{(1 - u^2/c^2)^{3/2}}\,u\frac{du}{dt}.$$

[Hint: $\mathbf{u}\cdot\dfrac{d\mathbf{u}}{dt} = u\dfrac{du}{dt}$.]

Express this last result in terms of $dm/dt$ and integrate to obtain

$$E = mc^2 + \text{constant}.$$

4.7 (§4.4) Two particles whose rest masses are $m_1$ and $m_2$ move along a straight line with velocities $u_1$ and $u_2$, measured in the same direction. They collide inelastically to form a new particle. Show that the rest mass and velocity of the new particle are $m_3$ and $u_3$, respectively, where

$$m_3^2 = m_1^2 + m_2^2 + 2m_1m_2\gamma_1\gamma_2(1 - u_1u_2/c^2), \qquad u_3 = \frac{m_1\gamma_1u_1 + m_2\gamma_2u_2}{m_1\gamma_1 + m_2\gamma_2},$$

with

$$\gamma_1 = (1 - u_1^2/c^2)^{-1/2}, \qquad \gamma_2 = (1 - u_2^2/c^2)^{-1/2}.$$

4.8 (§4.4) A particle of rest mass $m_0$, energy $e_0$, and momentum $p_0$ suffers a head-on elastic collision (i.e. masses of particles unaltered) with a stationary mass $M$. In the collision, $M$ is knocked straight forward, with energy $E$ and momentum $P$, leaving the first particle with energy $e$ and momentum $p$. Prove that

$$P = \frac{2p_0M(e_0 + Mc^2)}{2Me_0 + M^2c^2 + m_0^2c^2}$$

and

$$p = \frac{p_0(m_0^2c^2 - M^2c^2)}{2Me_0 + M^2c^2 + m_0^2c^2}.$$

What do these formulae become in the classical limit?

4.9 (§4.4) Assume that the formulae (4.19) hold for a tachyon, which travels with speed $v > c$. Taking the energy to be a measurable quantity, then deduce that the rest mass of a tachyon is imaginary and define the real quantity $\mu_0$ by

$$m_0 = i\mu_0.$$

If the tachyon moves along the $x$-axis and if we assume that the $x$-component of the momentum is a real positive quantity, then deduce

$$m = \frac{v}{|v|}\,\alpha\mu_0, \qquad E = mc^2,$$

where $\alpha = (v^2/c^2 - 1)^{-1/2}$. Plot $E/m_0c^2$ against $v/c$ for both tachyons and subluminal particles.

4.10 (§4.5) Two light rays in the $(x, y)$-plane of an inertial observer, making angles $\theta$ and $-\theta$, respectively, with the positive $x$-axis, collide at the origin. What is the velocity $v$ of the inertial observer (travelling in standard configuration) who sees the light rays collide head on?
4.11 (§4.5) An atom of rest mass $m_0$ is at rest in a laboratory and absorbs a photon of frequency $\nu$. Find the velocity and mass of the recoiling particle.
4.12 (§4.5) An atom at rest in a laboratory emits a photon and recoils. If its initial mass is $m_0$ and it loses the rest energy $e$ in the emission, prove that the frequency of the emitted photon is given by

$$\nu = \frac{e}{h}\left(1 - \frac{e}{2m_0c^2}\right).$$

5.1 Introduction
To work effectively in Newtonian theory, one really needs the language of vectors. This language, first of all, is more succinct, since it summarizes a set of three equations in one. Moreover, the formalism of vectors helps to solve certain problems more readily, and, most important of all, the language reveals structure and thereby offers insight. In exactly the same way, in relativity theory, one needs the language of tensors. Again, the language helps to summarize sets of equations succinctly and to solve problems more readily, and it reveals structure in the equations. This part of the book is devoted to learning the formalism of tensors, which is a pre-condition for the rest of the book.
The approach we adopt is to concentrate on the technique of tensors without taking into account the deeper geometrical significance behind the theory. We shall be concerned more with what you do with tensors rather than what tensors actually are. There are two distinct approaches to the teaching of tensors: the abstract or index-free (coordinate-free) approach and the conventional approach based on indices. There has been a move in recent years in some quarters to introduce tensors from the start using the more modern abstract approach (although some have subsequently changed their mind and reverted to the conventional approach). The main advantage of this approach is that it offers deeper geometrical insight. However, it has two disadvantages. First of all, it requires much more of a mathematical background, which in turn takes time to develop. The other disadvantage is that, for all its elegance, when one wants to do a real calculation with tensors, as one frequently needs to, then recourse has to be made to indices. We shall adopt the more conventional index approach, because it will prove faster and more practical. However, we advise those who wish to take their study of the subject further to look at the index-free approach at the first opportunity.
We repeat that the exercises are seen as integral to this part of the book and should not be omitted.
5.2 Manifolds and coordinates
We shall start by working with tensors defined in $n$ dimensions since, and it is part of the power of the formalism, there is little extra effort involved. A tensor is an object defined on a geometric entity called a (differential) manifold. We shall not define a manifold precisely because it would involve us in too much of a digression. But, in simple terms, a manifold is something which 'locally' looks like a bit of $n$-dimensional Euclidean space $\mathbb{R}^n$. For example, compare a 2-sphere $S^2$ with the Euclidean plane $\mathbb{R}^2$. They are clearly different. But a small bit of $S^2$ looks very much like a small bit of $\mathbb{R}^2$ (if we neglect metrical properties). The fact that $S^2$ is 'compact', i.e. in some sense finite, whereas $\mathbb{R}^2$ 'goes off to infinity' is a global property rather than a local property. We shall not say anything precise about global properties (the topology of the manifold), although the issue will surface when we start to look carefully at solutions of Einstein's equations in general relativity.

Fig. 5.1 Plane polar coordinate curves.

We shall simply take an $n$-dimensional manifold $M$ to be a set of points such that each point possesses a set of $n$ coordinates $(x^1, x^2, \ldots, x^n)$, where each coordinate ranges over a subset of the reals, which may, in particular, range from $-\infty$ to $+\infty$. To start off with, we can think of these coordinates as corresponding to distances or angles in Euclidean space. The reason why the coordinates are written as superscripts rather than subscripts will become clear later. Now the key thing about a manifold is that it may not be possible to cover the whole manifold by one non-degenerate coordinate system, namely, one which ascribes a unique set of $n$ coordinate numbers to each point. Sometimes it is simply convenient to use coordinate systems with degenerate points. For example, plane polar coordinates $(R, \phi)$ in the plane have a degeneracy at the origin because $\phi$ is indeterminate there (Fig. 5.1). However, here we could avoid the degeneracy at the origin by using Cartesian coordinates. But in other circumstances we have no choice in the matter. For example, it can be shown that there is no coordinate system which covers the whole of a 2-sphere $S^2$ without degeneracy. The smallest number needed is two, which is shown schematically in Fig. 5.2. We therefore work with coordinate systems which cover only a portion of the manifold and which are called coordinate patches. Figure 5.3 indicates this schematically. A set of coordinate patches which covers the whole manifold is called an atlas. The theory of manifolds tells us how to get from one coordinate patch to another by a coordinate transformation in the overlap region. The behaviour of geometric quantities under coordinate transformations lies at the heart of tensor calculus.

Fig. 5.2 Two non-degenerate coordinate systems covering an $S^2$: the first covers the North Pole, the second covers the South Pole, and the two overlap at the equator.

Fig. 5.3 Overlapping coordinate patches in a manifold $M$.
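To make the idea of an atlas concrete, here is a minimal sketch of two coordinate patches on the unit 2-sphere together with the coordinate transformation in their overlap. It uses stereographic projection, which is an assumption made for illustration and is not the particular pair of patches drawn in Fig. 5.2; all function names are hypothetical.

```python
import math


def chart_from_north(p):
    """Stereographic projection from the North Pole (0, 0, 1).

    Covers every point of the unit sphere except the North Pole.
    """
    x, y, z = p
    return (x / (1 - z), y / (1 - z))


def chart_from_south(p):
    """Stereographic projection from the South Pole (0, 0, -1).

    Covers every point of the unit sphere except the South Pole.
    """
    x, y, z = p
    return (x / (1 + z), y / (1 + z))


def transition(coords):
    """Coordinate transformation between the two patches.

    Valid in the overlap, i.e. away from both poles, where the
    patch coordinates (X, Y) are not both zero.
    """
    X, Y = coords
    r2 = X * X + Y * Y
    return (X / r2, Y / r2)


def sphere_point(theta, phi):
    """Point of the unit sphere given by polar angle and azimuth."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


p = sphere_point(1.0, 0.3)  # a point away from both poles
# Both lines give the same pair of numbers, up to rounding.
print(chart_from_south(p))
print(transition(chart_from_north(p)))
```

Neither patch alone covers the sphere, but together they do, and the transition function is smooth wherever both are defined.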


5.3 Curves and surfaces
Given a manifold, we shall be concerned with points in it and subsets of points which define curves and surfaces of different dimensions. We shall frequently define these curves and surfaces parametrically. Thus (in exactly the same way as is done in Euclidean 2- and 3-space), since a curve has one degree of freedom it depends on one parameter and so we define a curve by the parametric equations

$$x^a = x^a(u) \qquad (a = 1, 2, \ldots, n), \tag{5.1}$$

where $u$ is the parameter and $x^1(u), x^2(u), \ldots, x^n(u)$ denote $n$ functions of $u$. Similarly, since a subspace or surface of $m$ dimensions ($m < n$) has $m$ degrees of freedom, it depends on $m$ parameters and it is given by the parametric equations

$$x^a = x^a(u^1, u^2, \ldots, u^m) \qquad (a = 1, 2, \ldots, n). \tag{5.2}$$

If, in particular, $m = n - 1$, the subspace is called a hypersurface. In this case,

$$x^a = x^a(u^1, u^2, \ldots, u^{n-1}) \qquad (a = 1, 2, \ldots, n)$$

and the $n - 1$ parameters can be eliminated from these $n$ equations to give one equation connecting the coordinates, i.e.

$$f(x^1, x^2, \ldots, x^n) = 0. \tag{5.3}$$

From a different but equivalent point of view, a point in a general position in a manifold has $n$ degrees of freedom. If it is restricted to lie in a hypersurface, an $(n - 1)$-subspace, then its coordinates must satisfy one constraint, namely,

$$f(x^1, x^2, \ldots, x^n) = 0,$$

which is the same as equation (5.3). Similarly, points in an $m$-dimensional subspace ($m < n$) must satisfy $n - m$ constraints

$$\left.\begin{aligned} f^1(x^1, x^2, \ldots, x^n) &= 0, \\ f^2(x^1, x^2, \ldots, x^n) &= 0, \\ &\ \,\vdots \\ f^{n-m}(x^1, x^2, \ldots, x^n) &= 0, \end{aligned}\right\} \tag{5.4}$$

which is an alternative to the parametric representation (5.2).
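The equivalence of the parametric form (5.2) and the constraint form (5.3) can be illustrated for a hypersurface in Euclidean 3-space. The sketch below takes a sphere of radius `A` as the example; the radius, the choice of angular parameters, and the function names are all illustrative assumptions.

```python
import math

A = 2.0  # radius of the sphere (illustrative value)


def sphere_parametric(u1, u2):
    """Parametric form x^a = x^a(u^1, u^2), with n = 3 and m = 2."""
    return (A * math.sin(u1) * math.cos(u2),
            A * math.sin(u1) * math.sin(u2),
            A * math.cos(u1))


def sphere_constraint(x, y, z):
    """Constraint form f(x^1, x^2, x^3); the sphere is f = 0."""
    return x * x + y * y + z * z - A * A


# Every point produced by the parametric form should satisfy the
# single constraint, reflecting n - m = 1.
residuals = [
    sphere_constraint(*sphere_parametric(0.1 * i, 0.2 * j))
    for i in range(1, 10)
    for j in range(10)
]
print(max(abs(r) for r in residuals))
```

The printed residual is at the level of rounding error, as expected when the two parameters have been eliminated correctly.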


5.4 Transformation of coordinates

As we have seen, a point in a manifold can be covered by many different coordinate patches. The essential point about tensor calculus is that when we make a statement about tensors we do not wish it to hold just for one coordinate system but rather for all coordinate systems. Consequently, we need to find out how quantities behave when we go from one coordinate system to another one. We therefore consider the change of coordinates $x^a \to x'^a$ given by the $n$ equations

$$x'^a = f^a(x^1, x^2, \ldots, x^n) \qquad (a = 1, 2, \ldots, n), \tag{5.5}$$

where the $f$'s are single-valued continuous differentiable functions, at least for certain ranges of their arguments. Hence, at this stage, we view a coordinate transformation passively as assigning to a point of the manifold whose old coordinates are $(x^1, x^2, \ldots, x^n)$ the new primed coordinates $(x'^1, x'^2, \ldots, x'^n)$. We can write (5.5) more succinctly as $x'^a = f^a(x)$, where, from now on, lower case Latin indices are assumed to run from 1 to $n$, the dimension of the manifold, and the $f^a$ are all functions of the old unprimed coordinates. Furthermore, we can write the equation more simply still as

$$x'^a = x'^a(x), \tag{5.6}$$

where $x'^a(x)$ denote the $n$ functions $f^a(x)$. Notation plays an important role in tensor calculus, and equation (5.6) is clearly easier to write than equation (5.5).
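We next contemplate differentiating (5.6) with respect to each of the coordinates $x^b$. This produces the $n \times n$ transformation matrix of coefficients:

$$\left[\frac{\partial x'^a}{\partial x^b}\right] = \begin{bmatrix} \dfrac{\partial x'^1}{\partial x^1} & \dfrac{\partial x'^1}{\partial x^2} & \cdots & \dfrac{\partial x'^1}{\partial x^n} \\ \dfrac{\partial x'^2}{\partial x^1} & \dfrac{\partial x'^2}{\partial x^2} & \cdots & \dfrac{\partial x'^2}{\partial x^n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x'^n}{\partial x^1} & \dfrac{\partial x'^n}{\partial x^2} & \cdots & \dfrac{\partial x'^n}{\partial x^n} \end{bmatrix}. \tag{5.7}$$

The determinant $J'$ of this matrix is called the Jacobian of the transformation:

$$J' = \left|\frac{\partial x'^a}{\partial x^b}\right|. \tag{5.8}$$

We shall assume that this is non-zero for some range of the coordinates $x^b$. Then it follows from the implicit function theorem that we can (in principle) solve equation (5.6) for the old coordinates $x^a$ and obtain the inverse transformation equations

$$x^a = x^a(x'). \tag{5.9}$$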

It follows from the product rule for determinants that, if we define the Jacobian of the inverse transformation by

$$J = \left|\frac{\partial x^a}{\partial x'^b}\right|,$$

then $J = 1/J'$. In three dimensions, if the equation of a surface is given by $z = f(x, y)$, then its total differential is defined to be

$$dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

Then, in an exactly analogous manner, starting from (5.6), we define the total differential

$$dx'^a = \frac{\partial x'^a}{\partial x^1}\,dx^1 + \frac{\partial x'^a}{\partial x^2}\,dx^2 + \cdots + \frac{\partial x'^a}{\partial x^n}\,dx^n$$

for each $a$ running from 1 to $n$. We can write this more economically by introducing an explicit summation sign:

$$dx'^a = \sum_{b=1}^{n}\frac{\partial x'^a}{\partial x^b}\,dx^b. \tag{5.10}$$

This can be written more economically still by introducing the Einstein summation convention: whenever a literal index is repeated, it is understood to imply a summation over the index from 1 to $n$, the dimension of the manifold. Hence, we can write (5.10) simply as

$$dx'^a = \frac{\partial x'^a}{\partial x^b}\,dx^b. \tag{5.11}$$

The index $a$ occurring on each side of this equation is said to be free and may take on separately any value from 1 to $n$. The index $b$ on the right-hand side is repeated and hence there is an implied summation from 1 to $n$. A repeated index is called bound or dummy because it can be replaced by any other index not already in use. For example,

$$\frac{\partial x'^a}{\partial x^b}\,dx^b = \frac{\partial x'^a}{\partial x^c}\,dx^c,$$

because $c$ was not already in use in the expression. We define the Kronecker delta $\delta^a_b$ to be a quantity which is either 0 or 1 according to

$$\delta^a_b = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{if } a \neq b. \end{cases} \tag{5.12}$$

It therefore follows directly from the definition of partial differentiation (check) that

$$\frac{\partial x'^a}{\partial x'^b} = \frac{\partial x^a}{\partial x^b} = \delta^a_b. \tag{5.13}$$
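The statements of this section can be tried out on a familiar case. The following sketch takes the change from Cartesian coordinates $(x, y)$ to plane polar coordinates $(R, \phi)$ as the example, writes out both transformation matrices by hand, and checks numerically that $J = 1/J'$ and that the product of the two matrices is the Kronecker delta. The helper names are hypothetical, and the summation convention is spelled out as an explicit sum over the repeated index.

```python
import math


def forward_matrix(x, y):
    """[dx'^a/dx^b] for x'^a = (R, phi): rows R, phi; columns x, y."""
    R = math.hypot(x, y)
    return [[x / R, y / R],
            [-y / R ** 2, x / R ** 2]]


def inverse_matrix(R, phi):
    """[dx^a/dx'^b]: rows x, y; columns R, phi."""
    return [[math.cos(phi), -R * math.sin(phi)],
            [math.sin(phi), R * math.cos(phi)]]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul(a, b):
    """Matrix product; the sum over k is the summed dummy index."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


x, y = 1.2, -0.7                        # a point away from the origin
R, phi = math.hypot(x, y), math.atan2(y, x)

J_prime = det2(forward_matrix(x, y))    # Jacobian J' of (5.8)
J = det2(inverse_matrix(R, phi))        # Jacobian of the inverse
print(J * J_prime)                      # close to 1
print(matmul(forward_matrix(x, y), inverse_matrix(R, phi)))
```

At the origin $R = 0$ the forward matrix is undefined, which is the degeneracy of polar coordinates mentioned in §5.2.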


5.5 Contravariant tensors

The approach we are going to adopt is to define a geometrical quantity in terms of its transformation properties under a coordinate transformation (5.6). We shall start with a prototype and then give the general definition. Consider two neighbouring points in the manifold $P$ and $Q$ with coordinates $x^a$ and $x^a + dx^a$, respectively. The two points define an infinitesimal displacement or infinitesimal vector $\overrightarrow{PQ}$. The vector is not to be regarded as free, but as being attached to the point $P$ (Fig. 5.4). The components of this vector in the $x^a$-coordinate system are $dx^a$. The components in another coordinate system, say the $x'^a$-coordinate system, are $dx'^a$, which are connected to $dx^a$ by (5.11), namely,

$$dx'^a = \frac{\partial x'^a}{\partial x^b}\,dx^b. \tag{5.14}$$

Fig. 5.4 Infinitesimal vector $\overrightarrow{PQ}$ attached to $P$.

The transformation matrix appearing in this equation is to be regarded as being evaluated at the point $P$, i.e. strictly speaking we should write

$$dx'^a = \left[\frac{\partial x'^a}{\partial x^b}\right]_P dx^b, \tag{5.15}$$

but with this understood we shall stick to the notation of (5.14). Thus, $[\partial x'^a/\partial x^b]_P$ consists of an $n \times n$ matrix of real numbers. The transformation is therefore a linear homogeneous transformation. This is our prototype. A contravariant vector or contravariant tensor of rank (order) 1 is a set of quantities, written $X^a$ in the $x^a$-coordinate system, associated with a point $P$, which transforms under a change of coordinates according to

$$X'^a = \frac{\partial x'^a}{\partial x^b}\,X^b, \tag{5.16}$$

where the transformation matrix is evaluated at $P$. The infinitesimal vector $dx^a$ is a special case of (5.16) where the components $X^a$ are infinitesimal. An example of a vector with finite components is provided by the tangent vector $dx^a/du$ to the curve $x^a = x^a(u)$. It is important to distinguish between the actual geometric object like the tangent vector in Fig. 5.5 (depicted by an arrow) and its representation in a particular coordinate system, like the $n$ numbers $[dx^a/du]_P$ in the $x^a$-coordinate system and the (in general) different numbers $[dx'^a/du]_P$ in the $x'^a$-coordinate system.

Fig. 5.5 The tangent vector at two points of a curve $x^a = x^a(u)$.

We now generalize the definition (5.16) to obtain contravariant tensors of higher rank or order. Thus, a contravariant tensor of rank 2 is a set of $n^2$ quantities associated with a point $P$, denoted by $X^{ab}$ in the $x^a$-coordinate system, which transform according to

$$X'^{ab} = \frac{\partial x'^a}{\partial x^c}\frac{\partial x'^b}{\partial x^d}\,X^{cd}. \tag{5.17}$$

The quantities $X'^{ab}$ are the components in the $x'^a$-coordinate system, the transformation matrices are evaluated at $P$, and the law involves two dummy indices $c$ and $d$. An example of such a quantity is provided by the product $Y^aZ^b$, say, of two contravariant vectors $Y^a$ and $Z^a$. The definition of third- and higher-order contravariant tensors proceeds in an analogous manner. An important case is a tensor of zero rank, called a scalar or scalar invariant $\phi$, which transforms according to

$$\phi' = \phi \tag{5.18}$$

at $P$.
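As a hedged illustration of the transformation law (5.16), the sketch below takes the tangent vector to a circle of radius `a` in Cartesian coordinates and transforms its components to plane polar coordinates $(R, \phi)$. The curve, the sample parameter value, and the function names are choices made here for illustration. Since along the circle $R$ is constant and $\phi$ increases at unit rate with the parameter, the transformed components should come out close to $(0, 1)$.

```python
import math


def forward_matrix(x, y):
    """[dx'^a/dx^b] for x'^a = (R, phi), evaluated at the point (x, y)."""
    R = math.hypot(x, y)
    return [[x / R, y / R],
            [-y / R ** 2, x / R ** 2]]


def transform_contravariant(matrix, X):
    """X'^a = (dx'^a/dx^b) X^b, with the sum over b written out."""
    n = len(X)
    return [sum(matrix[a][b] * X[b] for b in range(n)) for a in range(n)]


a, u = 3.0, 0.8                           # radius and parameter value
x, y = a * math.cos(u), a * math.sin(u)   # the point P on the curve
X = [-a * math.sin(u), a * math.cos(u)]   # dx^a/du at P

X_prime = transform_contravariant(forward_matrix(x, y), X)
print(X_prime)   # components (dR/du, dphi/du) at P
```

The same geometric arrow is described by two quite different sets of numbers, which is the distinction between an object and its components drawn in the text.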

5.6 Covariant and mixed tensors

As in the last section, we begin by considering the transformation of a prototype quantity. Let

$$\phi = \phi(x^a) \tag{5.19}$$

be a real-valued function on the manifold, i.e. at every point $P$ in the manifold, $\phi(P)$ produces a real number. We also assume that $\phi$ is continuous and differentiable, so that we can obtain the differential coefficients $\partial\phi/\partial x^a$. Now, remembering from equation (5.9) that $x^a$ can be thought of as a function of $x'^b$, equation (5.19) can be written equivalently as

$$\phi = \phi(x^a(x')).$$

Differentiating this with respect to $x'^b$, using the function of a function rule, we obtain

$$\frac{\partial\phi}{\partial x'^b} = \frac{\partial\phi}{\partial x^a}\frac{\partial x^a}{\partial x'^b}.$$

Then changing the order of the terms, the dummy index, and the free index (from $b$ to $a$) gives

$$\frac{\partial\phi}{\partial x'^a} = \frac{\partial x^b}{\partial x'^a}\frac{\partial\phi}{\partial x^b}. \tag{5.20}$$

This is the prototype equation we are looking for. Notice that it involves the inverse transformation matrix $\partial x^b/\partial x'^a$. Thus, a covariant vector or covariant tensor of rank (order) 1 is a set of quantities, written $X_a$ in the $x^a$-coordinate system, associated with a point $P$, which transforms according to

$$X'_a = \frac{\partial x^b}{\partial x'^a}\,X_b. \tag{5.21}$$

Again, the transformation matrix occurring is assumed to be evaluated at $P$. Similarly, we define a covariant tensor of rank 2 by the transformation law

$$X'_{ab} = \frac{\partial x^c}{\partial x'^a}\frac{\partial x^d}{\partial x'^b}\,X_{cd}, \tag{5.22}$$

and so on for higher-rank tensors. Note the convention that contravariant tensors have raised indices whereas covariant tensors have lowered indices. (The way to remember this is that co goes below.) The fact that the differentials $dx^a$ transform as a contravariant vector explains the convention that the coordinates themselves are written as $x^a$ rather than $x_a$, although note that it is only the differentials and not the coordinates which have tensorial character.

We can go on to define mixed tensors in the obvious way. For example, a mixed tensor of rank 3 (one contravariant rank and two covariant rank) satisfies

$$X'^a{}_{bc} = \frac{\partial x'^a}{\partial x^d}\frac{\partial x^e}{\partial x'^b}\frac{\partial x^f}{\partial x'^c}\,X^d{}_{ef}. \tag{5.23}$$

If a mixed tensor has contravariant rank $p$ and covariant rank $q$, then it is said to have type or valence $(p, q)$.

We now come to the reason why tensors are important in mathematical physics. Let us illustrate the reason by way of an example. Suppose we find in one coordinate system that two tensors, $X_{ab}$ and $Y_{ab}$ say, are equal, i.e.

$$X_{ab} = Y_{ab}. \tag{5.24}$$

Let us multiply both sides by the matrices $\partial x^a/\partial x'^c$ and $\partial x^b/\partial x'^d$ and take the implied summations to get

$$\frac{\partial x^a}{\partial x'^c}\frac{\partial x^b}{\partial x'^d}\,X_{ab} = \frac{\partial x^a}{\partial x'^c}\frac{\partial x^b}{\partial x'^d}\,Y_{ab}.$$

Since $X_{ab}$ and $Y_{ab}$ are both covariant tensors of rank 2 it follows that $X'_{ab} = Y'_{ab}$. In other words, the equation (5.24) holds in any other coordinate system. In short, a tensor equation which holds in one coordinate system necessarily holds in all coordinate systems. Thus, although we introduce coordinate systems for convenience in tackling particular problems, if we work with tensorial equations then they hold in all coordinate systems. Put another way, tensorial equations are coordinate-independent. This is something that the index-free or coordinate-free approach makes clear from the outset.
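The covariant law (5.21) can be tested on its own prototype, the gradient of a scalar field. The sketch below uses the field $\phi = x^2y$, an arbitrary choice made for illustration, computes its gradient in Cartesian coordinates, transforms it to plane polar coordinates with the inverse transformation matrix, and compares the result with the gradient obtained by differentiating $\phi = R^3\cos^2\phi\,\sin\phi$ directly. Function names are hypothetical.

```python
import math


def inverse_matrix(R, phi):
    """[dx^b/dx'^a]: row b labels (x, y), column a labels (R, phi)."""
    return [[math.cos(phi), -R * math.sin(phi)],
            [math.sin(phi), R * math.cos(phi)]]


def transform_covariant(inv, X):
    """X'_a = (dx^b/dx'^a) X_b, with the sum over b written out."""
    n = len(X)
    return [sum(inv[b][a] * X[b] for b in range(n)) for a in range(n)]


def grad_cartesian(x, y):
    """Components d(phi)/dx^a of the sample field phi = x^2 y."""
    return [2 * x * y, x * x]


def grad_polar(R, phi):
    """The same field written as R^3 cos^2(phi) sin(phi), differentiated
    directly with respect to R and phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [3 * R * R * c * c * s, R ** 3 * (c ** 3 - 2 * c * s * s)]


R, phi = 1.7, 0.6
x, y = R * math.cos(phi), R * math.sin(phi)

via_law = transform_covariant(inverse_matrix(R, phi), grad_cartesian(x, y))
direct = grad_polar(R, phi)
print(via_law)
print(direct)   # agrees with the line above up to rounding
```

Note that the index summed over is the row index of the inverse matrix here, whereas in the contravariant law it is the column index of the forward matrix.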

5.7 Tensor fields
In vector analysis, a fixed vector is a vector associated with a point, whereas a vector field defined over a region is an association of a vector to every point in that region. In exactly the same way, a tensor is a set of quantities defined at one point in the manifold. A tensor field defined over some region of the manifold is an association of a tensor of the same valence to every point of the region, i.e.

$$P \to T^{a\cdots}{}_{b\cdots}(P),$$

where $T^{a\cdots}{}_{b\cdots}(P)$ is the value of the tensor at $P$. The tensor field is called continuous or differentiable if its components in all coordinate systems are continuous or differentiable functions of the coordinates. The tensor field is called smooth if its components are differentiable to all orders, which is denoted mathematically by saying that all the components are $C^\infty$. Thus, for example, a contravariant vector field defined over a region is a set of $n$ functions defined over that region, and the vector field is smooth if the functions are all $C^\infty$. The transformation law for a contravariant vector field then becomes

$$X'^a(x') = \left[\frac{\partial x'^a}{\partial x^b}\right]_P X^b(x) \tag{5.25}$$

at each point $P$ in the region, since the old components $X^a$ are functions of the old $x^a$-coordinates and the new components $X'^a$ are functions of the new $x'^a$-coordinates.
As in the case of vectors and vector fields in vector analysis, the distinction between a tensor and a tensor field is not always made completely clear. We shall for the most part be dealing with tensor fields from now on, but to conform with general usage we shall often refer to tensor fields simply as tensors. We will again shorten the transformation law such as (5.25) to the form (5.21) with everything else being implied. If we wish to emphasize that a tensor is a tensor field, we shall write it in functional form, namely, as $T^{a\cdots}{}_{b\cdots}(x)$.
5.8 Elementary operations with tensors
Tensor calculus is concerned with tensorial operations, that is, operations on tensors which result in quantities which are still tensors. A simple way of establishing whether or not a quantity is a tensor is to see how it transforms under a coordinate transformation. For example, we can deduce directly from the transformation law that two tensors of the same type can be added together to give a tensor of the same type, e.g.
(5.26)
The same holds true for subtraction and scalar multiplication.
A covariant tensor of rank 2 is said to be symmetric if $X_{ab} = X_{ba}$, in which case it has only $\tfrac{1}{2}n(n + 1)$ independent components (check this by establishing how many independent components there are of a symmetric matrix of order $n$). Symmetry is a tensorial property. A similar definition holds for a contravariant tensor $X^{ab}$. The tensor $X_{ab}$ is said to be anti-symmetric or skew symmetric if $X_{ab} = -X_{ba}$, which has only $\tfrac{1}{2}n(n - 1)$ independent components; this is again a tensorial property. A notation frequently used to denote the symmetric part of a tensor is

$$X_{(ab)} = \tfrac{1}{2}(X_{ab} + X_{ba})$$

and the anti-symmetric part is

$$X_{[ab]} = \tfrac{1}{2}(X_{ab} - X_{ba}).$$

In general,

$$X_{(a_1a_2\cdots a_r)} = \frac{1}{r!}\,(\text{sum over all permutations of the indices } a_1 \text{ to } a_r)$$

and

$$X_{[a_1a_2\cdots a_r]} = \frac{1}{r!}\,(\text{alternating sum over all permutations of the indices } a_1 \text{ to } a_r).$$

For example, we shall need to make use of the result

$$X_{[abc]} = \tfrac{1}{6}(X_{abc} - X_{acb} + X_{cab} - X_{cba} + X_{bca} - X_{bac}).$$

(A way to remember the above expression is to note that the positive terms are obtained by cycling the indices to the right and the corresponding negative terms by flipping the last two indices.) A totally symmetric tensor is defined to be one equal to its symmetric part, and a totally anti-symmetric tensor is one equal to its anti-symmetric part.
We can multiply two tensors of type $(p_1, q_1)$ and $(p_2, q_2)$ together and obtain a tensor of type $(p_1 + p_2, q_1 + q_2)$, e.g.
(5.30)
In particular, a tensor of type $(p, q)$ when multiplied by a scalar field $\phi$ is again a tensor of type $(p, q)$. Given a tensor of mixed type $(p, q)$, we can form a tensor of type $(p - 1, q - 1)$ by the process of contraction, which simply involves setting a raised and lowered index equal. For example,

$$X^a{}_{bcd} \xrightarrow{\ \text{contraction on } a \text{ and } b\ } X^a{}_{acd} = Y_{cd},$$

i.e. a tensor of type (1, 3) has become a tensor of type (0, 2). Notice that we can contract a tensor by multiplying by the Kronecker tensor $\delta^a_b$, e.g.
(5.31)
In effect, multiplying by $\delta^a_b$ turns the index $b$ into $a$ (or equivalently the index $a$ into $b$).
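The operations of this section are easy to try on the components of a rank 2 tensor at a single point, stored as a nested list. The sketch below is illustrative only: it treats the components as a plain array, so the distinction between raised and lowered indices is not represented, and the function names and sample values are assumptions made here.

```python
def symmetric_part(X):
    """Components of (1/2)(X_ab + X_ba)."""
    n = len(X)
    return [[0.5 * (X[a][b] + X[b][a]) for b in range(n)] for a in range(n)]


def antisymmetric_part(X):
    """Components of (1/2)(X_ab - X_ba)."""
    n = len(X)
    return [[0.5 * (X[a][b] - X[b][a]) for b in range(n)] for a in range(n)]


def full_contraction(X, Y):
    """Sum over both repeated indices of the product X^ab Y_ab."""
    n = len(X)
    return sum(X[a][b] * Y[a][b] for a in range(n) for b in range(n))


def independent_components(n):
    """Counts for symmetric and anti-symmetric rank 2 tensors."""
    return n * (n + 1) // 2, n * (n - 1) // 2


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
S = symmetric_part(X)
A = antisymmetric_part(X)

# The two parts add back up to X, and the contraction of an
# anti-symmetric with a symmetric set of components vanishes.
print([[S[a][b] + A[a][b] for b in range(3)] for a in range(3)])
print(full_contraction(A, S))
print(independent_components(3))
```

The component counts for $n = 3$ add up to $n^2 = 9$, consistent with the split of a general rank 2 tensor into the two parts.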
5.9 Index-free interpretation of contravariant vector fields
As we pointed out in §5.5, we must distinguish between the actual geometric object itself and its components in a particular coordinate system. The important point about tensors is that we want to make statements which are independent of any particular coordinate system being used. This is abundantly clear in the index-free approach to tensors. We shall get a feel for this approach in this section by considering the special case of a contravariant vector field, although similar index-free interpretations can be given for any tensor field. The key idea is to interpret the vector field as an operator which maps real-valued functions into real-valued functions. Thus, if $X$ represents a contravariant vector field, then $X$ operates on any real-valued function $f$ to produce another function $g$, i.e. $Xf = g$. We shall show how actually to compute $Xf$ by introducing a coordinate system. However, as we shall see, we could equally well introduce any other coordinate system, and the computation would lead to the same result.
In the $x^a$-coordinate system, we introduce the notation

$$\partial_a \equiv \frac{\partial}{\partial x^a}$$

and then $X$ is defined as the operator

$$X = X^a\partial_a, \tag{5.32}$$

so that

$$Xf = (X^a\partial_a)f = X^a(\partial_a f) \tag{5.33}$$

for any real-valued function $f$. Let us compute $X$ in some other $x'^a$-coordinate system. We need to use the result (5.13) expressed in the following form: we may take $x^a$ to be a function of $x'^b$ by (5.9) and $x'^b$ to be a function of $x^c$ by (5.6), and so, using the function of a function rule, we find

$$\delta^a_b = \frac{\partial x^a}{\partial x^b} = \frac{\partial}{\partial x^b}\,x^a(x'(x)) = \frac{\partial x^a}{\partial x'^c}\frac{\partial x'^c}{\partial x^b}. \tag{5.34}$$

Then, using the transformation law (5.16) and (5.20) together with the above trick, we get

$$\begin{aligned} X'^a\partial'_a &= X'^a\frac{\partial}{\partial x'^a} \\ &= \frac{\partial x'^a}{\partial x^b}X^b\,\frac{\partial x^c}{\partial x'^a}\frac{\partial}{\partial x^c} \\ &= \frac{\partial x^c}{\partial x'^a}\frac{\partial x'^a}{\partial x^b}\,X^b\frac{\partial}{\partial x^c} \\ &= \delta^c_b\,X^b\frac{\partial}{\partial x^c} \\ &= X^b\frac{\partial}{\partial x^b} \\ &= X^a\frac{\partial}{\partial x^a} \\ &= X^a\partial_a. \end{aligned}$$

Thus the result of operating on $f$ by $X$ will be the same irrespective of the coordinate system employed in (5.32).
In any coordinate system, we may think of the quantities $[\partial/\partial x^a]_P$ as forming a basis for all the vectors at $P$, since any vector at $P$ is, by (5.32), given by

$$X_P = [X^a]_P\left[\frac{\partial}{\partial x^a}\right]_P,$$

that is, a linear combination of the $[\partial/\partial x^a]_P$. The vector space of all the contravariant vectors at $P$ is known as the tangent space at $P$ and is written $T_P(M)$ (Fig. 5.6). In general, the tangent space at any point in a manifold is different from the underlying manifold. For this reason, we need to be careful in representing a finite contravariant vector by an arrow in our figures since, strictly speaking, the arrow lies in the tangent space not the manifold. Two exceptions to this are Euclidean space and Minkowski space-time, where the tangent space at each point coincides with the manifold.

Fig. 5.6 The tangent space $T_P(M)$ of contravariant vectors at $P$.

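Given two vector fields $X$ and $Y$ we can define a new vector field called the commutator or Lie bracket of $X$ and $Y$ by

$$[X, Y] = XY - YX. \tag{5.35}$$

Letting $[X, Y] = Z$ and operating with it on some arbitrary function $f$, we find

$$\begin{aligned} Zf &= [X, Y]f \\ &= (XY - YX)f \\ &= X(Yf) - Y(Xf) \\ &= X(Y^a\partial_a f) - Y(X^a\partial_a f) \\ &= X^b\partial_b(Y^a\partial_a f) - Y^b\partial_b(X^a\partial_a f) \\ &= (X^b\partial_bY^a - Y^b\partial_bX^a)\partial_a f - X^aY^b(\partial_b\partial_a f - \partial_a\partial_b f). \end{aligned}$$

The last term vanishes since we assume commutativity of second mixed partial derivatives, i.e.

$$\partial_a\partial_b = \frac{\partial^2}{\partial x^a\partial x^b} = \frac{\partial^2}{\partial x^b\partial x^a} = \partial_b\partial_a.$$

Since $f$ is arbitrary, we obtain the result

$$[X, Y]^a = Z^a = X^b\partial_bY^a - Y^b\partial_bX^a, \tag{5.36}$$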

from which it clearly follows that the commutator of two vector fields is itself a vector field. It also follows, directly from the definition (5.35), that

$$[X, X] = 0, \tag{5.37}$$

$$[X, Y] = -[Y, X], \tag{5.38}$$

$$[X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0. \tag{5.39}$$

Equation (5.38) shows that the Lie bracket is anti-commutative. The result (5.39) is known as Jacobi's identity. Notice it states that the left-hand side is not just equal to zero but is identically zero. What does this mean? The equation $x^2 - 4 = 0$ is only satisfied by particular values of $x$, namely, $+2$ and $-2$. The identity $x^2 - x^2 = 0$ is satisfied for all values of $x$. But, you may argue, the $x^2$ terms cancel out, and this is precisely the point. An expression is identically zero if, when all the terms are written out fully, they all cancel in pairs.
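The component formula (5.36) lends itself to a quick numerical experiment. The sketch below approximates the partial derivatives by central differences, which is an assumption made for brevity rather than anything prescribed by the text, and applies the formula to two illustrative fields in the plane, $X^a = (y, 0)$ and $Y^a = (0, x)$, for which the bracket works out by hand to $(-x, y)$. All names are hypothetical.

```python
def lie_bracket(X, Y, h=1e-5):
    """Return the field [X, Y]^a = X^b d_b Y^a - Y^b d_b X^a.

    X and Y map a point (list of coordinates) to a list of
    components. Derivatives are central differences with step h.
    """
    def partial(F, point, b):
        # d_b F^a at the given point, for every component a.
        up, down = list(point), list(point)
        up[b] += h
        down[b] -= h
        F_up, F_down = F(up), F(down)
        return [(F_up[a] - F_down[a]) / (2 * h) for a in range(len(point))]

    def Z(point):
        n = len(point)
        Xp, Yp = X(point), Y(point)
        dX = [partial(X, point, b) for b in range(n)]  # dX[b][a] = d_b X^a
        dY = [partial(Y, point, b) for b in range(n)]  # dY[b][a] = d_b Y^a
        return [sum(Xp[b] * dY[b][a] - Yp[b] * dX[b][a] for b in range(n))
                for a in range(n)]

    return Z


def X(p):
    return [p[1], 0.0]      # X^a = (y, 0)


def Y(p):
    return [0.0, p[0]]      # Y^a = (0, x)


point = [0.7, -1.3]
print(lie_bracket(X, Y)(point))   # approximately (-x, y) at the point
print(lie_bracket(Y, X)(point))   # the same with opposite signs
print(lie_bracket(X, X)(point))   # zero
```

The second and third lines illustrate (5.38) and (5.37) respectively; checking Jacobi's identity in the same way would need nested brackets and hence second differences, which this sketch does not attempt.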

Exercises
5.1 (§5.3) In Euclidean 3-space $\mathbb{R}^3$:
(i) Write down the equation of a circle of radius $a$ lying in the $(x, y)$-plane centred at the origin in (a) parametric form (b) constraint form.
(ii) Write down the equation of a hypersurface consisting of a sphere of radius $a$ centred at the origin in (a) parametric form (b) constraint form. Eliminate the parameters in form (a) to obtain form (b).
5.2 (§5.4) Write down the change of coordinates from Cartesian coordinates $(x^a) = (x, y, z)$ to spherical polar coordinates $(x'^a) = (r, \theta, \phi)$ in $\mathbb{R}^3$. Obtain the transformation matrices $[\partial x^a/\partial x'^b]$ and $[\partial x'^a/\partial x^b]$ expressing them both in terms of the primed coordinates. Obtain the Jacobians $J$ and $J'$. Where is $J'$ zero or infinite?
5.3 (§5.4) Show by manipulating the dummy indices that
$$(Z_{abc} + Z_{cab} + Z_{bca})X^aX^bX^c = 3Z_{abc}X^aX^bX^c.$$
5.4 (§5.4) Show that
(i) $\delta^b_aX^a = X^b$, (ii) $\delta^b_aX_b = X_a$, (iii) $\delta^b_a\delta^c_b\delta^d_c = \delta^d_a$.
5.5 (§5.5) If $Y^a$ and $Z^a$ are contravariant vectors, then show that $Y^aZ^b$ is a contravariant tensor of rank 2.
5.6 (§5.5) Write down the change of coordinates from Cartesian coordinates $(x^a) = (x, y)$ to plane polar coordinates $(x'^a) = (R, \phi)$ in $\mathbb{R}^2$ and obtain the transformation matrix $[\partial x'^a/\partial x^b]$ expressed as a function of the primed coordinates. Find the components of the tangent vector to the curve consisting of a circle of radius $a$ centred at the origin with the standard parametrization (see Exercise 5.1 (i)) and use (5.16) to find its components in the primed coordinate system.
5.7 (§5.6) Write down the definition of a tensor of type (2, 1).
5.8 (§5.6) Prove that $\delta^a_b$ has the tensor character indicated. Prove also that $\delta^a_b$ is a constant or numerical tensor, that is, it has the same components in all coordinate systems.
5.9 (§5.6) Show, by differentiating (5.20) with respect to $x'^c$, that $\partial^2\phi/\partial x^a\partial x^b$ is not a tensor.

Exercises I 67
5.10 (§5.8) Show that if y•be and z•be are tensors of the type
indicated then so is their sum and difference.
5.11 (§5.8) (i) Show that the fact that a covariant second rank tensor is symmetric in one coordinate system is a tensorial property.
(ii) If x•b is anti-symmetric and Y.b is symmetric then prove that x•b Y.b = 0.
5.12 (§5.8) Prove that any covariant (or contravariant) tensor of rank 2 can be written as the sum of a symmetric and an anti-symmetric tensor. [Hint: consider the identity x.b = ½(Xab + xb.) + ½(X.b - xb.).] 5.13 (§5.8) If x•b, is a tensor of the type indicated, then
prove that the contracted quantity Y, = x•ac is a covariant
vector.
5.14 (§5.8) Evaluate o: and o:o! in n dimensions.
5.15 (§5.9) Check that the definition of the Lie bracket leads to the results (5.37), (5.38), and (5.39).
5.16 (§5.9) In JR2, let (x") = (x, y) denote Cartesian and (x'•) = (R, <I>) plane polar coordinates (see Exercise 5.6).
(i) If the vector field X has components
x• = (1, 0), then find X'".
(ii) The operator grad can be written in each coordinate system as
i aJ . aJ. aJ ~ aJ
gradf=-1+-1=-R +--, ax 8y 8R 8</> R
where f is an arbitrary function and
i R= cos <J>i + sin <f>j, = - sin<J>i + cos<J>j.
Take the scalp.r product of gradfwith i,j, R, and j in turn to find relationships between
the operators a;ax, a;ay, 8/8R, and afo</>. (iii) Express the vector field X as an operator in each
coordinate system. Use part (ii) to show that these expressions are the same.
(iv) If Yo = (0, 1) and z• = ( -y, x), then find Y'•,
Z'•, Y, and Z. (v) Evaluate all the Lie brackets of X, Y, and Z.

6.1 Partial derivative of a tensor

In the last chapter, we met algebraic operations which are tensorial, that is, which convert tensors into tensors. The operations are addition, subtraction, multiplication, and contraction. The next question which arises is: what differential operations are there that are tensorial? The answer to this turns out to be very much more involved. The first thing we shall see is that partial differentiation of tensors is not tensorial. Different authors denote the partial derivative of a contravariant vector $X^a$ by

$$\partial_b X^a \quad\text{or}\quad \frac{\partial X^a}{\partial x^b} \quad\text{or}\quad X^a{}_{,b} \quad\text{or}\quad X^a{}_{|b}$$

and similarly for higher-rank tensors. We shall use a mixture of the first three notations. (Note that in the literature, the partial derivative of a tensor is often referred to as the ordinary derivative of a tensor, to distinguish it from the tensorial differentiation we shall shortly meet.) Now differentiating (5.16) with respect to $x'^c$, we find

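$$\partial'_c X'^a = \frac{\partial}{\partial x'^c}\left(\frac{\partial x'^a}{\partial x^b}X^b\right) = \frac{\partial x^d}{\partial x'^c}\frac{\partial}{\partial x^d}\left(\frac{\partial x'^a}{\partial x^b}X^b\right) = \frac{\partial x'^a}{\partial x^b}\frac{\partial x^d}{\partial x'^c}\,\partial_d X^b + \frac{\partial^2 x'^a}{\partial x^b\,\partial x^d}\frac{\partial x^d}{\partial x'^c}\,X^b. \qquad (6.1)$$

If the first term on the right-hand side alone were present, then this would be the usual tensor transformation law for a tensor of type (1, 1). However, the presence of the second term prevents $\partial_b X^a$ from behaving like a tensor.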

There is a fundamental reason why this is the case. By definition, the process of differentiation involves comparing a quantity evaluated at two neighbouring points, P and Q say, dividing by some parameter representing the separation of P and Q and then taking the limit as this parameter goes to zero. In the case of a contravariant vector field $X^a$, this would involve computing

$$\lim_{\delta u\to 0}\frac{[X^a]_P - [X^a]_Q}{\delta u}$$

for some appropriate parameter $\delta u$. However, from the transformation law in the form (5.25), we see that

$$X'^a_P = \left[\frac{\partial x'^a}{\partial x^b}\right]_P X^b_P \quad\text{and}\quad X'^a_Q = \left[\frac{\partial x'^a}{\partial x^b}\right]_Q X^b_Q.$$

This involves the transformation matrix evaluated at different points, from which it should be clear that $X^a_P - X^a_Q$ is not a tensor. Similar remarks hold for differentiating tensors in general.
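The failure of partial differentiation to be tensorial can be checked numerically. The following is a minimal sketch, assuming the constant Cartesian field $X^a = (1, 0)$ of Exercise 5.16 and plane polar coordinates as the primed system; the function names and the finite-difference step are illustrative choices, not part of the text. All partial derivatives of the field vanish in Cartesian coordinates, so if they formed a tensor they would have to vanish in polar coordinates as well.

```python
import math

def polar_components(R, phi):
    """X'^a = (dx'^a/dx^b) X^b for the Cartesian field X^a = (1, 0).

    Uses dR/dx = cos(phi) and dphi/dx = -sin(phi)/R.
    """
    return (math.cos(phi), -math.sin(phi) / R)

def partial_derivatives(R, phi, h=1e-6):
    """Central-difference estimate of d'_c X'^a, returned as d[a][c]."""
    d = [[0.0, 0.0], [0.0, 0.0]]
    for c, (dR, dphi) in enumerate([(h, 0.0), (0.0, h)]):
        plus = polar_components(R + dR, phi + dphi)
        minus = polar_components(R - dR, phi - dphi)
        for a in range(2):
            d[a][c] = (plus[a] - minus[a]) / (2 * h)
    return d

# Evaluate at R = 2, phi = pi/6 (an arbitrary sample point).
d = partial_derivatives(2.0, math.pi / 6)
for row in d:
    print(row)
```

The printed components are not all zero, although the Cartesian derivatives are identically zero; a tensor that vanishes in one coordinate system vanishes in all of them, so $\partial_b X^a$ cannot be a tensor.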
It turns out that if we wish to differentiate a tensor in a tensorial manner then we need to introduce some auxiliary field onto the manifold. We shall meet three different types of differentiation. First of all, in the next section, we shall introduce a contravariant vector field onto the manifold and use it to define the Lie derivative. Then we shall introduce a quantity called an affine connection and use it to define covariant differentiation. Finally, we shall introduce a tensor called a metric and from it build a special affine connection, called the metric connection, and again define covariant differentiation but relative to this specific connection.


6.2 The Lie derivative

The argument we present in this section is rather intricate. It rests on the idea of interpreting a coordinate transformation actively as a point transformation, rather than passively as we have done up to now. The important results occur at the end of the section and consist of the formula for the Lie derivative of a general tensor field and the basic properties of Lie differentiation.

We start by considering a congruence of curves defined such that only one curve goes through each point in the manifold. Then, given any one curve of the congruence,

$$x^a = x^a(u),$$

we can use it to define the tangent vector field $dx^a/du$ along the curve. If we do this for every curve in the congruence, then we end up with a vector field $X^a$ (given by $dx^a/du$ at each point) defined over the whole manifold (Fig. 6.1). Conversely, given a non-zero vector field $X^a(x)$ defined over the manifold, then this can be used to define a congruence of curves in the manifold called the orbits or trajectories of $X^a$. The procedure is exactly the same as the way in which a vector field gives rise to field lines or streamlines in vector analysis. These curves are obtained by solving the ordinary differential equations

$$\frac{dx^a}{du} = X^a(x(u)). \qquad (6.2)$$

Fig. 6.1 The tangent vector field resulting from a congruence of curves.

The existence and uniqueness theorem for ordinary differential equations guarantees a solution, at least for some subset of the reals. In what follows, we are really only interested in what happens locally (Fig. 6.2).
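As an illustration of how a vector field generates its orbits, the equations (6.2) can be integrated numerically. This is only a sketch: the field $X^a = (-y, x)$ is borrowed from Exercise 5.16, and the fourth-order Runge-Kutta stepper, the function names, and the step count are assumptions made for the example. The orbits of this particular field are circles about the origin.

```python
import math

def field(x, y):
    """Illustrative vector field X^a = (-y, x) in Cartesian coordinates."""
    return (-y, x)

def orbit(x, y, u_end, steps=1000):
    """Integrate dx^a/du = X^a(x(u)) from u = 0 to u = u_end with RK4."""
    h = u_end / steps
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y

# Starting at (1, 0), a quarter turn along the orbit should arrive near (0, 1).
end = orbit(1.0, 0.0, math.pi / 2)
print(end)
```

The point stays on the unit circle to within the integration error, which is the local congruence through (1, 0) for this field.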
We therefore assume that $X^a$ has been given and we have constructed the local congruence of curves. Suppose we have some tensor field $T^{a\cdots}_{b\cdots}(x)$ which we wish to differentiate using $X^a$. Then the essential idea is to use the congruence of curves to drag the tensor at some point P (i.e. $T^{a\cdots}_{b\cdots}(P)$) along the curve passing through P to some neighbouring point Q, and then compare this 'dragged-along tensor' with the tensor already there (i.e. $T^{a\cdots}_{b\cdots}(Q)$) (Fig. 6.3). Since the dragged-along tensor will be of the same type as the tensor already at Q, we can subtract the two tensors at Q and so define a derivative by some limiting process as Q tends to P. The technique for dragging involves viewing the coordinate transformation from P to Q actively, and applying it to the usual transformation law for tensors. We shall consider the detailed calculation in the case of a contravariant tensor field of rank 2, $T^{ab}(x)$ say.

Fig. 6.2 The local congruence of curves resulting from a vector field.

Fig. 6.3 Using the congruence to compare tensors at neighbouring points (the tensor at P, the dragged-along tensor at Q, and the tensor at Q).
Consider the transformation

$$x^a \to x'^a = x^a + \delta u\,X^a(x), \qquad (6.3)$$

where $\delta u$ is small. This is called a point transformation and is to be regarded actively as sending the point P, with coordinates $x^a$, to the point Q, with coordinates $x^a + \delta u\,X^a(x)$, where the coordinates of each point are given in the same $x^a$-coordinate system, i.e.

$$P \to Q, \qquad x^a \to x^a + \delta u\,X^a(x).$$

The point Q clearly lies on the curve of the congruence through P which $X^a$ generates (Fig. 6.4). Differentiating (6.3), we get

$$\frac{\partial x'^a}{\partial x^b} = \delta^a_b + \delta u\,\partial_b X^a. \qquad (6.4)$$

Next, consider the tensor field $T^{ab}$ at the point P. Then its components at P are $T^{ab}(x)$ and, under the point transformation (6.3), we have the mapping

$$T^{ab}(x) \to T'^{ab}(x'),$$

i.e. the transformation 'drags' the tensor $T^{ab}$ along from P to Q. The components of the dragged-along tensor are given by the usual transformation law for tensors (see (5.25)), and so, using (6.4),

$$T'^{ab}(x') = \frac{\partial x'^a}{\partial x^c}\frac{\partial x'^b}{\partial x^d}T^{cd}(x)$$
$$= (\delta^a_c + \delta u\,\partial_c X^a)(\delta^b_d + \delta u\,\partial_d X^b)\,T^{cd}(x)$$
$$= T^{ab}(x) + [\partial_c X^a\,T^{cb}(x) + \partial_d X^b\,T^{ad}(x)]\,\delta u + O(\delta u^2). \qquad (6.5)$$

Fig. 6.4 The point P transformed to Q in the same $x^a$-coordinate system.

Applying Taylor's theorem to first order, we get

$$T^{ab}(x') = T^{ab}(x^c + \delta u\,X^c(x)) = T^{ab}(x) + \delta u\,X^c\partial_c T^{ab}(x). \qquad (6.6)$$

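We are now in a position to define the Lie derivative of $T^{ab}$ with respect to $X^a$, which is denoted by $L_X T^{ab}$, as

$$L_X T^{ab} = \lim_{\delta u\to 0}\frac{T^{ab}(x') - T'^{ab}(x')}{\delta u}. \qquad (6.7)$$

This involves comparing the tensor $T^{ab}(x')$ already at Q with $T'^{ab}(x')$, the dragged-along tensor at Q. Using (6.5) and (6.6), we find

$$L_X T^{ab} = X^c\partial_c T^{ab} - T^{ac}\partial_c X^b - T^{cb}\partial_c X^a. \qquad (6.8)$$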
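The step from (6.5) and (6.6) to (6.8) can be written out explicitly. The following worked lines are a sketch of the subtraction, using only the expansions already given and keeping terms to first order in $\delta u$:

```latex
\begin{align*}
T^{ab}(x') - T'^{ab}(x')
  &= \bigl[T^{ab}(x) + \delta u\, X^c \partial_c T^{ab}(x)\bigr]
   - \bigl[T^{ab}(x) + (\partial_c X^a\, T^{cb}(x)
       + \partial_d X^b\, T^{ad}(x))\,\delta u\bigr] + O(\delta u^2) \\
  &= \bigl[X^c \partial_c T^{ab} - T^{cb}\,\partial_c X^a
       - T^{ad}\,\partial_d X^b\bigr]\,\delta u + O(\delta u^2).
\end{align*}
```

Dividing by $\delta u$, letting $\delta u \to 0$, and relabelling the dummy index d as c in the last term gives the three terms of the Lie derivative: one transport term and one correction term for each contravariant index.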

It can be shown that it is always possible to introduce a coordinate system such that the curve passing through P is given by $x^1$ varying, with $x^2, x^3, \ldots, x^n$ all constant along the curve, and such that

$$X^a \overset{*}{=} \delta^a_1 = (1, 0, 0, \ldots, 0) \qquad (6.9)$$

along this curve. The notation $\overset{*}{=}$ used in (6.9) means that the equation holds only in a particular coordinate system. Then it follows that

$$X = X^a\partial_a \overset{*}{=} \partial_1,$$

and equation (6.8) reduces to

$$L_X T^{ab} \overset{*}{=} \partial_1 T^{ab}. \qquad (6.10)$$

Thus, in this special coordinate system, Lie differentiation reduces to ordinary differentiation. In fact, one can define Lie differentiation starting from this viewpoint.
We end the section by collecting together some important properties of Lie differentiation with respect to X which follow from its definition.

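1. It is linear; for example

$$L_X(\lambda Y^a + \mu Z^a) = \lambda L_X Y^a + \mu L_X Z^a, \qquad (6.11)$$

where $\lambda$ and $\mu$ are constants. Thus, in particular, the Lie derivative of the sum and difference of two tensors is the sum and difference, respectively, of the Lie derivatives of the two tensors.

2. It is Leibniz; that is, it satisfies the usual product rule for differentiation, for example

$$L_X(Y^a Z_{bc}) = Y^a(L_X Z_{bc}) + (L_X Y^a)Z_{bc}. \qquad (6.12)$$

3. It is type-preserving; that is, the Lie derivative of a tensor of type (p, q) is again a tensor of type (p, q).

4. It commutes with contraction; for example

$$\delta^b_a\,L_X T^a{}_b = L_X T^a{}_a. \qquad (6.13)$$

5. The Lie derivative of a scalar field $\phi$ is given by

$$L_X\phi = X\phi = X^a\partial_a\phi. \qquad (6.14)$$

6. The Lie derivative of a contravariant vector field $Y^a$ is given by the Lie bracket of X and Y, that is,

$$L_X Y^a = [X, Y]^a = X^b\partial_b Y^a - Y^b\partial_b X^a. \qquad (6.15)$$

7. The Lie derivative of a covariant vector field $Y_a$ is given by

$$L_X Y_a = X^b\partial_b Y_a + Y_b\partial_a X^b. \qquad (6.16)$$

8. The Lie derivative of a general tensor field $T^{a\cdots}_{b\cdots}$ is obtained as follows: we first partially differentiate the tensor and contract it with X. We then get an additional term for each index of the form of the last two terms in (6.15) and (6.16), where the corresponding sign is negative for a contravariant index and positive for a covariant index, that is,

$$L_X T^{a\cdots}{}_{b\cdots} = X^c\partial_c T^{a\cdots}{}_{b\cdots} - T^{c\cdots}{}_{b\cdots}\partial_c X^a - \cdots + T^{a\cdots}{}_{c\cdots}\partial_b X^c + \cdots. \qquad (6.17)$$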
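Property 6 lends itself to a quick numerical check. The sketch below assumes the three Cartesian fields of Exercise 5.16, $X^a = (1, 0)$, $Y^a = (0, 1)$ and $Z^a = (-y, x)$, and evaluates the bracket $X^b\partial_b Y^a - Y^b\partial_b X^a$ by central differences; the helper name `lie_bracket` and the step size are illustrative assumptions.

```python
def lie_bracket(X, Y, point, h=1e-5):
    """[X, Y]^a = X^b d_b Y^a - Y^b d_b X^a at `point`, by central differences."""
    n = len(point)

    def d(F, b):
        # Partial derivative of every component of the field F with respect to x^b.
        plus, minus = list(point), list(point)
        plus[b] += h
        minus[b] -= h
        Fp, Fm = F(*plus), F(*minus)
        return [(Fp[a] - Fm[a]) / (2 * h) for a in range(n)]

    Xp, Yp = X(*point), Y(*point)
    dX = [d(X, b) for b in range(n)]
    dY = [d(Y, b) for b in range(n)]
    return [sum(Xp[b] * dY[b][a] - Yp[b] * dX[b][a] for b in range(n))
            for a in range(n)]

def X(x, y):
    return (1.0, 0.0)

def Y(x, y):
    return (0.0, 1.0)

def Z(x, y):
    return (-y, x)

p = (0.7, -1.3)  # arbitrary sample point
print(lie_bracket(X, Y, p))  # components of L_X Y
print(lie_bracket(X, Z, p))  # components of L_X Z
print(lie_bracket(Y, Z, p))  # components of L_Y Z
```

For these fields the brackets work out by hand to $[X, Y] = 0$, $[X, Z] = Y$ and $[Y, Z] = -X$, which is what the finite-difference values reproduce to rounding error.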

Fig. 6.5 The parallel vector $X^a + \bar{\delta}X^a$ at Q.

6.3 The affine connection and covariant differentiation

Consider a contravariant vector field $X^a(x)$ evaluated at a point Q, with coordinates $x^a + \delta x^a$, near to a point P, with coordinates $x^a$. Then, by Taylor's theorem,

$$X^a(x + \delta x) = X^a(x) + \delta x^b\partial_b X^a \qquad (6.18)$$

to first order. If we denote the second term by $\delta X^a(x)$, i.e.

$$\delta X^a(x) = \delta x^b\partial_b X^a = X^a(x + \delta x) - X^a(x), \qquad (6.19)$$

then it is not tensorial since it involves subtracting tensors evaluated at two different points. We are going to define a tensorial derivative by introducing a vector at Q which in some general sense is 'parallel' to $X^a$ at P. Since $x^a + \delta x^a$ is close to $x^a$, we can assume that the parallel vector only differs from $X^a(x)$ by a small amount, which we denote $\bar{\delta}X^a(x)$ (Fig. 6.5). By the same argument as in §6.1 above, $\bar{\delta}X^a(x)$ is not tensorial, but we shall construct it in such a way as to make the difference vector

$$X^a(x) + \delta X^a(x) - [X^a(x) + \bar{\delta}X^a(x)] = \delta X^a(x) - \bar{\delta}X^a(x) \qquad (6.20)$$

tensorial. It is natural to require that $\bar{\delta}X^a(x)$ should vanish whenever $X^a(x)$ or $\delta x^a$ does. Then the simplest definition is to assume that $\bar{\delta}X^a$ is linear in both $X^a$ and $\delta x^a$, which means that there exist multiplicative factors $\Gamma^a_{bc}$ where

$$\bar{\delta}X^a(x) = -\Gamma^a_{bc}(x)\,X^b(x)\,\delta x^c \qquad (6.21)$$

and the minus sign is introduced to agree with convention. We have therefore introduced a set of $n^3$ functions $\Gamma^a_{bc}(x)$ on the manifold, whose transformation properties have yet to be determined. This we do by defining the covariant derivative of $X^a$, written in one of the notations (where we shall use a mixture of the first two)

$$\nabla_c X^a \quad\text{or}\quad X^a{}_{;c} \quad\text{or}\quad X^a{}_{\|c},$$

by the limiting process

$$\nabla_c X^a = \lim_{\delta x^c\to 0}\frac{1}{\delta x^c}\{X^a(x + \delta x) - [X^a(x) + \bar{\delta}X^a(x)]\}.$$

In other words, it is the difference between the vector $X^a(Q)$ and the vector at Q parallel to $X^a(P)$, divided by the coordinate differences, in the limit as these differences tend to zero. Using (6.18) and (6.21), we find

$$\nabla_c X^a = \partial_c X^a + \Gamma^a_{bc}X^b. \qquad (6.22)$$
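The substitution of (6.18) and (6.21) into the limiting process can be spelled out in one line. The following is a sketch of that step, with all quantities evaluated at P and terms kept to first order in $\delta x$:

```latex
\begin{align*}
X^a(x + \delta x) - \bigl[X^a(x) + \bar{\delta}X^a(x)\bigr]
  &= \bigl[X^a + \delta x^c\,\partial_c X^a\bigr]
   - \bigl[X^a - \Gamma^a_{bc} X^b\,\delta x^c\bigr] \\
  &= \bigl(\partial_c X^a + \Gamma^a_{bc} X^b\bigr)\,\delta x^c .
\end{align*}
```

Reading off the coefficient of $\delta x^c$ gives the covariant derivative as the partial derivative plus a term linear in the vector, with the differentiation index appearing second in the lower indices of $\Gamma$.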

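Note that in the formula the differentiation index c comes second in the downstairs indices of $\Gamma$. If we now demand that $\nabla_c X^a$ is a tensor of type (1, 1), then a straightforward calculation (exercise) reveals that $\Gamma^a_{bc}$ must transform according to

$$\Gamma'^a_{bc} = \frac{\partial x'^a}{\partial x^d}\frac{\partial x^e}{\partial x'^b}\frac{\partial x^f}{\partial x'^c}\Gamma^d_{ef} - \frac{\partial x^d}{\partial x'^b}\frac{\partial x^e}{\partial x'^c}\frac{\partial^2 x'^a}{\partial x^d\,\partial x^e} \qquad (6.23)$$

or equivalently (exercise)

$$\Gamma'^a_{bc} = \frac{\partial x'^a}{\partial x^d}\frac{\partial x^e}{\partial x'^b}\frac{\partial x^f}{\partial x'^c}\Gamma^d_{ef} + \frac{\partial x'^a}{\partial x^d}\frac{\partial^2 x^d}{\partial x'^b\,\partial x'^c}. \qquad (6.24)$$

If the second term on the right-hand side were absent, then this would be the usual transformation law for a tensor of type (1, 2). However, the presence of the second term reveals that the transformation law is linear inhomogeneous, and so $\Gamma^a_{bc}$ is not a tensor. Any quantity $\Gamma^a_{bc}$ which transforms according to (6.23) or (6.24) is called an affine connection or sometimes simply a connection or affinity. A manifold with a continuous connection prescribed on it is called an affine manifold. From another point of view, the existence of the inhomogeneous term in the transformation law is not surprising if we are to define a tensorial derivative, since its role is to compensate for the second term which occurs in (6.1).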
We next define the covariant derivative of a scalar field $\phi$ to be the same as its ordinary derivative, i.e.

$$\nabla_a\phi = \partial_a\phi. \qquad (6.25)$$

If we now demand that covariant differentiation satisfies the Leibniz rule, then we find

$$\nabla_c X_a = \partial_c X_a - \Gamma^b_{ac}X_b. \qquad (6.26)$$

Notice again that the differentiation index comes last in the $\Gamma$-term and that this term enters with a minus sign. The name covariant derivative stems from the fact that the derivative of a tensor of type (p, q) is of type (p, q + 1), i.e. it has one extra covariant rank. The expression in the case of a general tensor is (compare and contrast with (6.17))

$$\nabla_c T^{a\cdots}{}_{b\cdots} = \partial_c T^{a\cdots}{}_{b\cdots} + \Gamma^a_{dc}T^{d\cdots}{}_{b\cdots} + \cdots - \Gamma^d_{bc}T^{a\cdots}{}_{d\cdots} - \cdots. \qquad (6.27)$$

It follows directly from the transformation laws that the sum of two connections is not a connection or a tensor. However, the difference of two connections is a tensor of valence (1, 2), because the inhomogeneous term cancels out in the transformation. For the same reason, the anti-symmetric part of a $\Gamma^a_{bc}$, namely,

$$T^a_{bc} = \Gamma^a_{bc} - \Gamma^a_{cb},$$

is a tensor called the torsion tensor. If the torsion tensor vanishes, then the connection is symmetric, i.e.

$$\Gamma^a_{bc} = \Gamma^a_{cb}. \qquad (6.28)$$

From now on, unless we state otherwise, we shall restrict ourselves to symmetric connections, in which case the torsion vanishes. The assumption that the connection is symmetric leads to the following useful result. In the expression for a Lie derivative of a tensor, all occurrences of the partial derivatives may be replaced by covariant derivatives. For example, in the case of a vector (exercise)

$$L_X Y^a = X^b\partial_b Y^a - Y^b\partial_b X^a = X^b\nabla_b Y^a - Y^b\nabla_b X^a. \qquad (6.29)$$
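The reason the replacement works for a vector can be seen in a few lines. The following sketch expands the covariant derivatives using $\nabla_b Y^a = \partial_b Y^a + \Gamma^a_{cb}Y^c$ and relabels dummy indices in the last term:

```latex
\begin{align*}
X^b \nabla_b Y^a - Y^b \nabla_b X^a
  &= X^b\bigl(\partial_b Y^a + \Gamma^a_{cb} Y^c\bigr)
   - Y^b\bigl(\partial_b X^a + \Gamma^a_{cb} X^c\bigr) \\
  &= X^b \partial_b Y^a - Y^b \partial_b X^a
   + \bigl(\Gamma^a_{cb} - \Gamma^a_{bc}\bigr) X^b Y^c .
\end{align*}
```

The bracket in the final term is minus the torsion, so the extra term drops out exactly when the connection is symmetric. For a connection with torsion the two expressions differ by $-T^a_{bc}X^bY^c$.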

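6.4 Affine geodesics

If $T^{a\cdots}_{b\cdots}$ is any tensor, then we introduce the notation

$$\nabla_X T^{a\cdots}_{b\cdots} = X^c\nabla_c T^{a\cdots}_{b\cdots}, \qquad (6.30)$$

that is, $\nabla_X$ of a tensor is its covariant derivative contracted with X. Now in §6.2 we saw that a contravariant vector field X determines a local congruence of curves,

$$x^a = x^a(u),$$

where the tangent vector field to the congruence is

$$\frac{dx^a}{du} = X^a.$$

We next define the absolute derivative of a tensor $T^{a\cdots}_{b\cdots}$ along a curve C of the congruence, written $DT^{a\cdots}_{b\cdots}/Du$, by

$$\frac{D}{Du}\left(T^{a\cdots}_{b\cdots}\right) = \nabla_X T^{a\cdots}_{b\cdots}. \qquad (6.31)$$

The tensor $T^{a\cdots}_{b\cdots}$ is said to be parallely propagated or transported along the curve C if

$$\frac{D}{Du}\left(T^{a\cdots}_{b\cdots}\right) = 0. \qquad (6.32)$$

This is a first-order ordinary differential equation for $T^{a\cdots}_{b\cdots}$, and so given an initial value for $T^{a\cdots}_{b\cdots}$, say $T^{a\cdots}_{b\cdots}(P)$, equation (6.32) determines a tensor along C which is everywhere parallel to $T^{a\cdots}_{b\cdots}(P)$.

Using this notation, an affine geodesic is defined as a privileged curve along which the tangent vector is propagated parallel to itself. In other words, the parallely propagated vector at any point of the curve is parallel, that is, proportional, to the tangent vector at that point:

$$\frac{D}{Du}\left(\frac{dx^a}{du}\right) = \lambda(u)\frac{dx^a}{du}.$$

Using (6.31), the equation for an affine geodesic can be written in the form

$$\nabla_X X^a = \lambda X^a, \qquad (6.33)$$

or equivalently (exercise)

$$\frac{d^2x^a}{du^2} + \Gamma^a_{bc}\frac{dx^b}{du}\frac{dx^c}{du} = \lambda\frac{dx^a}{du}. \qquad (6.34)$$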

The last result is very important and so we shall establish it afresh from first principles using the notation of the last section. Let the neighbouring points P and Q on C be given by $x^a(u)$ and

$$x^a(u + \delta u) = x^a(u) + \frac{dx^a}{du}\delta u$$

to first order in $\delta u$, respectively. Then in the notation of the last section

$$\delta x^a = \frac{dx^a}{du}\delta u. \qquad (6.35)$$

The vector $X^a(x)$ at P is now the tangent vector $(dx^a/du)(u)$. The vector at Q parallel to $dx^a/du$ is, by (6.21) and (6.35),

$$\frac{dx^a}{du} - \Gamma^a_{bc}\frac{dx^b}{du}\frac{dx^c}{du}\delta u.$$

The vector already at Q is

$$\frac{dx^a}{du}(u + \delta u) = \frac{dx^a}{du} + \frac{d^2x^a}{du^2}\delta u$$

to first order in $\delta u$. These last two vectors must be parallel, so we require

$$\frac{dx^a}{du} + \frac{d^2x^a}{du^2}\delta u = [1 + \lambda(u)\delta u]\left(\frac{dx^a}{du} - \Gamma^a_{bc}\frac{dx^b}{du}\frac{dx^c}{du}\delta u\right),$$

where we have written the proportionality factor as $1 + \lambda(u)\delta u$ without loss of generality, since the equation must hold in the limit $\delta u \to 0$. Subtracting $dx^a/du$ from each side, dividing by $\delta u$ and taking the limit as $\delta u$ tends to zero produces the result (6.34). Note that $\Gamma^a_{bc}$ appears in the equation multiplied by the symmetric quantity $(dx^b/du)(dx^c/du)$, and so even if we had not assumed that $\Gamma^a_{bc}$ was symmetric the equation picks out its symmetric part only.
If the curve is parametrized in such a way that $\lambda$ vanishes (that is, by the above, so that the tangent vector is transported into itself), then the parameter is a privileged parameter called an affine parameter, often conventionally denoted by s, and the affine geodesic equation reduces to

$$\frac{D}{Ds}\left(\frac{dx^a}{ds}\right) = 0, \qquad (6.36)$$

or equivalently

$$\frac{d^2x^a}{ds^2} + \Gamma^a_{bc}\frac{dx^b}{ds}\frac{dx^c}{ds} = 0. \qquad (6.37)$$

An affine parameter s is only defined up to an affine transformation (exercise)

$$s \to \alpha s + \beta,$$

where $\alpha$ and $\beta$ are constants. We can use the affine parameter s to define the affine length of the geodesic between two points $P_1$ and $P_2$ by $\int_{P_1}^{P_2} ds$, and so we can compare lengths on the same geodesic. However, we cannot compare lengths on different geodesics (without a metric) because of the arbitrariness in the parameter s. From the existence and uniqueness theorem for ordinary differential equations, it follows that corresponding to every direction at a point there is a unique geodesic passing through the point (Fig. 6.6). Similarly, any point can be joined to any other point, as long as the points are sufficiently 'close', by a unique geodesic. However, in the large, geodesics may focus, that is, meet again (Fig. 6.7).

Fig. 6.6 Two affine geodesics passing through P with given directions.

Fig. 6.7 Two affine geodesics from P refocusing at Q.
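The affinely parametrized geodesic equation is an ordinary differential equation system and can be integrated directly. The sketch below makes an assumption not taken from this section: it uses the flat plane in polar coordinates $(R, \phi)$, whose only non-zero connection components are $\Gamma^R_{\phi\phi} = -R$ and $\Gamma^\phi_{R\phi} = \Gamma^\phi_{\phi R} = 1/R$. The integrator and names are illustrative. In this case the geodesics should be straight lines when mapped back to Cartesian coordinates.

```python
import math

def rhs(state):
    """First-order form of d2x^a/ds2 + Gamma^a_bc (dx^b/ds)(dx^c/ds) = 0.

    state = (R, phi, dR/ds, dphi/ds) for the assumed flat polar connection.
    """
    R, phi, vR, vphi = state
    aR = R * vphi ** 2            # -Gamma^R_{phi phi} (dphi/ds)^2
    aphi = -2.0 * vR * vphi / R   # -2 Gamma^phi_{R phi} (dR/ds)(dphi/ds)
    return (vR, vphi, aR, aphi)

def integrate(state, s_end, steps=2000):
    """Fourth-order Runge-Kutta integration in the affine parameter s."""
    h = s_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)))
        k4 = rhs(tuple(x + h * k for x, k in zip(state, k3)))
        state = tuple(x + h * (a + 2 * b + 2 * c + d) / 6
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

# Start at R = 1, phi = 0, moving in the phi-direction (Cartesian velocity (0, 1)).
R, phi, _, _ = integrate((1.0, 0.0, 0.0, 1.0), 1.0)
x, y = R * math.cos(phi), R * math.sin(phi)
print(x, y)
```

With these initial data the curve is the vertical line x = 1, y = s, so after s = 1 the point lies close to (1, 1). Rescaling s by an affine transformation changes only the speed along the same line.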

6.5 The Riemann tensor

Covariant differentiation, unlike partial differentiation, is not in general commutative. For any tensor $T^{a\cdots}_{b\cdots}$, we define its commutator to be

$$\nabla_c\nabla_d T^{a\cdots}_{b\cdots} - \nabla_d\nabla_c T^{a\cdots}_{b\cdots}.$$

Let us work out the commutator in the case of a vector $X^a$. From (6.22), we see that

$$\nabla_c X^a = \partial_c X^a + \Gamma^a_{bc}X^b.$$

Remembering that this is a tensor of type (1, 1) and using (6.27), we find

$$\nabla_d\nabla_c X^a = \partial_d(\partial_c X^a + \Gamma^a_{bc}X^b) + \Gamma^a_{ed}(\partial_c X^e + \Gamma^e_{bc}X^b) - \Gamma^e_{cd}(\partial_e X^a + \Gamma^a_{be}X^b),$$

with a similar expression for $\nabla_c\nabla_d X^a$, namely,

$$\nabla_c\nabla_d X^a = \partial_c(\partial_d X^a + \Gamma^a_{bd}X^b) + \Gamma^a_{ec}(\partial_d X^e + \Gamma^e_{bd}X^b) - \Gamma^e_{dc}(\partial_e X^a + \Gamma^a_{be}X^b).$$

Subtracting these last two equations and assuming that

$$\partial_d\partial_c X^a = \partial_c\partial_d X^a,$$

we obtain the result

$$\nabla_c\nabla_d X^a - \nabla_d\nabla_c X^a = R^a{}_{bcd}X^b + (\Gamma^e_{cd} - \Gamma^e_{dc})\nabla_e X^a, \qquad (6.38)$$

where $R^a{}_{bcd}$ is defined by

$$R^a{}_{bcd} = \partial_c\Gamma^a_{bd} - \partial_d\Gamma^a_{bc} + \Gamma^e_{bd}\Gamma^a_{ec} - \Gamma^e_{bc}\Gamma^a_{ed}. \qquad (6.39)$$

Moreover, since we are only interested in torsion-free connections, the last term in (6.38) vanishes, namely, using (5.28),

$$\nabla_{[c}\nabla_{d]}X^a = \tfrac{1}{2}R^a{}_{bcd}X^b. \qquad (6.40)$$

Since the left-hand side of (6.40) is a tensor, it follows that $R^a{}_{bcd}$ is a tensor of type (1, 3). It is called the Riemann tensor. It can be shown that, for a symmetric connection, the commutator of any tensor can be expressed in terms of the tensor itself and the Riemann tensor. Thus, the vanishing of the Riemann tensor is a necessary and sufficient condition for the vanishing of the commutator of any tensor. In the section after next, we shall search for a geometrical characterization of the vanishing of the Riemann tensor.
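The definition of the Riemann tensor in terms of the connection and its first derivatives can be evaluated numerically for a given connection. The following is a sketch under stated assumptions: the two sample connections (the flat plane in polar coordinates and the unit 2-sphere in coordinates $(\theta, \phi)$) are supplied by hand rather than taken from this section, the derivatives are central differences, and the index layout `R[a][b][c][d]` for $R^a{}_{bcd}$ is a choice made here.

```python
import math

def riemann(gamma, x, h=1e-5):
    """R^a_{bcd} = d_c G^a_{bd} - d_d G^a_{bc} + G^e_{bd} G^a_{ec} - G^e_{bc} G^a_{ed}.

    gamma(x)[a][b][c] must return the connection component Gamma^a_{bc} at x.
    """
    n = len(x)
    G = gamma(x)

    def dG(c):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        return [[[(Gp[a][b][d] - Gm[a][b][d]) / (2 * h) for d in range(n)]
                 for b in range(n)] for a in range(n)]

    D = [dG(c) for c in range(n)]  # D[c][a][b][d] = d_c Gamma^a_{bd}
    return [[[[D[c][a][b][d] - D[d][a][b][c]
               + sum(G[e][b][d] * G[a][e][c] - G[e][b][c] * G[a][e][d]
                     for e in range(n))
               for d in range(n)] for c in range(n)]
             for b in range(n)] for a in range(n)]

def polar(x):
    """Assumed example: flat plane, coordinates (R, phi)."""
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -x[0]
    G[1][0][1] = G[1][1][0] = 1.0 / x[0]
    return G

def sphere(x):
    """Assumed example: unit 2-sphere, coordinates (theta, phi)."""
    th = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

R_flat = riemann(polar, [2.0, 0.3])
R_sph = riemann(sphere, [1.0, 0.3])
print(R_flat[0][1][0][1], R_sph[0][1][0][1])
```

For the polar connection every component comes out as zero to within the differencing error, even though the connection itself is non-zero. For the sphere the component with indices $(\theta, \phi, \theta, \phi)$ agrees with the hand-computed value $\sin^2\theta$.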

6.6 Geodesic coordinates

We first prove a very useful result. At any point P in a manifold, we can introduce a special coordinate system, called a geodesic coordinate system, in which

$$[\Gamma^a_{bc}]_P \overset{*}{=} 0.$$

We can, without loss of generality, choose P to be at the origin of coordinates $x^a \overset{*}{=} 0$ and consider a transformation to a new coordinate system

$$x^a \to x'^a = x^a + \tfrac{1}{2}Q^a_{bc}x^b x^c, \qquad (6.41)$$

where $Q^a_{bc} = Q^a_{cb}$ are constants to be determined. Differentiating (6.41), we get

$$\frac{\partial x'^a}{\partial x^b} = \delta^a_b + Q^a_{bc}x^c, \qquad \frac{\partial^2 x'^a}{\partial x^b\,\partial x^c} = Q^a_{bc}.$$

Then, since $x^a$ vanishes at P, we have

$$\left[\frac{\partial x'^a}{\partial x^b}\right]_P = \delta^a_b,$$

from which it follows immediately that the inverse matrix

$$\left[\frac{\partial x^a}{\partial x'^b}\right]_P = \delta^a_b.$$

Substituting these results in (6.23), we find

$$[\Gamma'^a_{bc}]_P = [\Gamma^a_{bc}]_P - Q^a_{bc}.$$

Since the connection is symmetric, we can choose the constants so that

$$Q^a_{bc} = [\Gamma^a_{bc}]_P,$$

and hence we obtain the promised result

$$[\Gamma'^a_{bc}]_P \overset{*}{=} 0. \qquad (6.42)$$

Many tensorial equations can be established most easily in geodesic coordinates. Note that, although the connection vanishes at P,

$$[\Gamma^a_{bc,d}]_P \ne 0$$

in general. It can be shown that the result can be extended to obtain a coordinate system in which the connection vanishes along a curve, but not in general over the whole manifold. If, however, there exists a special coordinate system in which the connection vanishes everywhere, then the manifold is called affine flat or simply flat. We shall next see that this is intimately connected with the vanishing of the Riemann tensor.

6.7 Affine flatness

In a general affine manifold, the intuitive concept of parallelism breaks down. For if we parallely transport a vector from one point to another along two different curves we will obtain two different vectors (Fig. 6.8). If, however, we can transport a vector from one point to any other and the resulting vector is independent of the path taken, then the connection is called integrable. Thus, for the usual concept of parallelism to hold, the manifold must possess an integrable connection. We now consider two lemmas which connect together the concepts of affine flatness, integrability, and vanishing Riemann tensor.

Fig. 6.8 Parallel transport round two curves in a general affine manifold.

Lemma: A necessary and sufficient condition for a connection to be integrable is that the Riemann tensor vanishes.

We consider, first, necessity. Since $\Gamma^a_{bc}$ is integrable, we can start with a vector $X^a$ at any point and from it construct a unique vector field $X^a(x)$ over the manifold by parallely propagating $X^a$. The equation for parallely propagating $X^a$ is

$$\frac{DX^a}{Du} = \frac{dx^c}{du}\nabla_c X^a = 0,$$

and, since $dx^c/du$ is arbitrary, it follows that the covariant derivative of $X^a$ vanishes, i.e.

$$\nabla_c X^a = 0. \qquad (6.43)$$

Hence, this equation must possess solutions. A necessary condition for a solution of this first-order partial differential equation is

$$\partial_c\partial_d X^a = \partial_d\partial_c X^a, \qquad (6.44)$$

namely, the second mixed partial derivatives should commute. In the previous section, we met the identity for the commutator of a vector field (6.38), namely

$$\nabla_c\nabla_d X^a - \nabla_d\nabla_c X^a = \partial_c\partial_d X^a - \partial_d\partial_c X^a + R^a{}_{bcd}X^b.$$

The left-hand side of this equation vanishes by construction, that is, by (6.43); hence it follows that (6.44) will hold if and only if

$$R^a{}_{bcd}X^b = 0.$$

Finally, since $X^b$ is arbitrary at every point, a necessary condition for integrability is $R^a{}_{bcd} = 0$ everywhere.
We next prove sufficiency. We start by considering the difference in parallely propagating a vector $X^a$ around an infinitesimal loop connecting $x^a$ to $x^a + \delta x^a + dx^a$, first via $x^a + \delta x^a$ and then via $x^a + dx^a$ (Fig. 6.9). From §6.3, if we parallely transport $X^a$ from $x^a$ to $x^a + \delta x^a$, we obtain the vector

$$X^a(x + \delta x) = X^a(x) + \bar{\delta}X^a(x),$$

where, by (6.21),

$$\bar{\delta}X^a(x) = -\Gamma^a_{bc}(x)X^b(x)\,\delta x^c.$$

Fig. 6.9 Transporting $X^a$ around an infinitesimal loop with corners $x^a$, $x^a + \delta x^a$, $x^a + dx^a$, and $x^a + \delta x^a + dx^a$.

Similarly, if we transport this vector subsequently to $x^a + \delta x^a + dx^a$, we obtain the vector

$$X^a(x + \delta x + dx) = X^a(x + \delta x) + \bar{\delta}X^a(x + \delta x),$$

where, in this case,

$$\bar{\delta}X^a(x + \delta x) = -\Gamma^a_{bc}(x + \delta x)X^b(x + \delta x)\,dx^c.$$

Expanding by Taylor's theorem and using the previous results, we obtain (where everything is assumed evaluated at $x^a$)

$$\bar{\delta}X^a(x + \delta x) = -(\Gamma^a_{bc} + \partial_d\Gamma^a_{bc}\,\delta x^d)(X^b - \Gamma^b_{ef}X^e\,\delta x^f)\,dx^c$$
$$= -\Gamma^a_{bc}X^b\,dx^c - \partial_d\Gamma^a_{bc}X^b\,\delta x^d dx^c + \Gamma^a_{bc}\Gamma^b_{ef}X^e\,\delta x^f dx^c + \partial_d\Gamma^a_{bc}\Gamma^b_{ef}X^e\,\delta x^d\delta x^f dx^c.$$

Neglecting the last term, which is third order, we have

$$X^a(x + \delta x + dx) = X^a - \Gamma^a_{bc}X^b\,\delta x^c - \Gamma^a_{bc}X^b\,dx^c - \partial_d\Gamma^a_{bc}X^b\,\delta x^d dx^c + \Gamma^a_{bc}\Gamma^b_{ef}X^e\,\delta x^f dx^c.$$

To obtain the equivalent result for the path connecting $x^a$ to $x^a + \delta x^a + dx^a$ via $x^a + dx^a$, we simply interchange $\delta x^a$ and $dx^a$ to give

$$X^a(x + dx + \delta x) = X^a - \Gamma^a_{bc}X^b\,dx^c - \Gamma^a_{bc}X^b\,\delta x^c - \partial_d\Gamma^a_{bc}X^b\,dx^d\delta x^c + \Gamma^a_{bc}\Gamma^b_{ef}X^e\,dx^f\delta x^c.$$

Hence, the difference between these two vectors is

$$\Delta X^a = X^a(x + \delta x + dx) - X^a(x + dx + \delta x)$$
$$= (\partial_d\Gamma^a_{bc} - \partial_c\Gamma^a_{bd} + \Gamma^a_{ed}\Gamma^e_{bc} - \Gamma^a_{ec}\Gamma^e_{bd})X^b\,\delta x^c dx^d$$
$$= R^a{}_{bdc}X^b\,\delta x^c dx^d$$
$$= -R^a{}_{bcd}X^b\,\delta x^c dx^d$$

by (6.39) and the fact that the Riemann tensor is anti-symmetric on its last pair of indices (see (6.77)). Thus, the vector $X^a$ will be the same at $x^a + \delta x^a + dx^a$, irrespective of which path is taken, if and only if $R^a{}_{bcd} = 0$. It follows that if the Riemann tensor vanishes then the vector $X^a$ will not change if parallely transported around any infinitesimal closed loop. Using this result and assuming the manifold has no holes (that is, the manifold is simply connected), then we can continuously deform one curve into another by deforming the curves infinitesimally at each stage (Fig. 6.10), which establishes that the connection is integrable (check).

Fig. 6.10 Deforming $C_1$ into $C_2$ (infinitesimally at each stage).
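The two-path construction above can be imitated step by step in code. The sketch below is an illustration under assumptions: it uses the connection of the unit 2-sphere in coordinates $(\theta, \phi)$ as a sample non-flat connection (not taken from this section), applies the first-order transport rule $X^a \to X^a - \Gamma^a_{bc}X^b\,\delta x^c$ once along each leg, and compares the two results at the far corner.

```python
import math

def gamma(x):
    """Assumed example connection: unit 2-sphere, coordinates (theta, phi)."""
    th = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

def transport(X, x, step):
    """One first-order step: X^a -> X^a - Gamma^a_{bc}(x) X^b step^c."""
    G = gamma(x)
    n = len(x)
    new_X = [X[a] - sum(G[a][b][c] * X[b] * step[c]
                        for b in range(n) for c in range(n))
             for a in range(n)]
    new_x = [xi + si for xi, si in zip(x, step)]
    return new_X, new_x

def loop_difference(X, x, delta, d):
    """Vector reached via x + delta minus the vector reached via x + d."""
    X1, x1 = transport(X, x, delta)
    X1, _ = transport(X1, x1, d)
    X2, x2 = transport(X, x, d)
    X2, _ = transport(X2, x2, delta)
    return [a - b for a, b in zip(X1, X2)]

eps = 1e-3
# delta x along theta, dx along phi, starting vector X = (1, 0) at theta = 1.
dX = loop_difference([1.0, 0.0], [1.0, 0.0], [eps, 0.0], [0.0, eps])
print(dX[0], dX[1] / eps ** 2)
```

For this sphere connection a hand calculation gives $R^\phi{}_{\theta\theta\phi} = -1$, so the predicted difference $-R^a{}_{bcd}X^b\,\delta x^c dx^d$ has $\phi$-component $+\epsilon^2$ and vanishing $\theta$-component. The printed ratio is close to 1, with a discrepancy of third order in the step as the derivation anticipates.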
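The second lemma is as follows.

Lemma: A necessary and sufficient condition for a manifold to be affine flat is that the connection is symmetric and integrable.

Sufficiency is established by first choosing n linearly independent vectors

$$X_{\mathbf{i}}{}^a \qquad (\mathbf{i} = 1, 2, \ldots, n)$$

at P, where the bold index $\mathbf{i}$ runs from 1 to n and labels the vectors. Using the integrability assumption we can construct the parallel vector fields $X_{\mathbf{i}}{}^a(x)$ and these will also be linearly independent everywhere. Therefore, at each point P, $X_{\mathbf{i}}{}^a(P)$ is a non-singular matrix of numbers and so we can construct its inverse, denoted by $X^{\mathbf{i}}{}_b$, which must satisfy

$$X^{\mathbf{i}}{}_b X_{\mathbf{i}}{}^a = \delta^a_b, \qquad (6.45)$$

where there is a summation over $\mathbf{i}$. Multiplying the propagation equation

$$\partial_c X_{\mathbf{i}}{}^a + \Gamma^a_{bc}X_{\mathbf{i}}{}^b = 0$$

by $X^{\mathbf{i}}{}_e$ produces

$$\Gamma^a_{ec} = -X^{\mathbf{i}}{}_e\,\partial_c X_{\mathbf{i}}{}^a. \qquad (6.46)$$

Differentiating (6.45), we obtain

$$X_{\mathbf{i}}{}^a\,\partial_c X^{\mathbf{i}}{}_b = -X^{\mathbf{i}}{}_b\,\partial_c X_{\mathbf{i}}{}^a = \Gamma^a_{bc} \qquad (6.47)$$

by (6.46). Using (6.47), we find that

$$X_{\mathbf{i}}{}^a(\partial_c X^{\mathbf{i}}{}_b - \partial_b X^{\mathbf{i}}{}_c) = \Gamma^a_{bc} - \Gamma^a_{cb} = 0,$$

because the connection is symmetric by assumption. Since the determinant of $X_{\mathbf{i}}{}^a$ is non-zero, it follows that the quantity in brackets must vanish, from which we get

$$\partial_c X^{\mathbf{i}}{}_b = \partial_b X^{\mathbf{i}}{}_c.$$

This in turn implies that $X^{\mathbf{i}}{}_b$ must be the gradient of n scalar fields, $f^{\mathbf{i}}(x)$ say, that is,

$$X^{\mathbf{i}}{}_b = \partial_b f^{\mathbf{i}}(x).$$

If we consider the transformation

$$x^a \to x'^a = f^{\mathbf{a}}(x),$$

then

$$\frac{\partial x'^a}{\partial x^b} = X^{\mathbf{a}}{}_b, \qquad (6.48)$$

and so, taking inverses,

$$\frac{\partial x^a}{\partial x'^b} = X_{\mathbf{b}}{}^a. \qquad (6.49)$$

Multiplying (6.23) by $X_{\mathbf{a}}{}^h$ and using (6.48) and (6.49) and then (6.45) and (6.47), we find

$$X_{\mathbf{a}}{}^h\,\Gamma'^a_{bc} = X_{\mathbf{a}}{}^h\left(X^{\mathbf{a}}{}_d X_{\mathbf{b}}{}^e X_{\mathbf{c}}{}^f\,\Gamma^d_{ef} - X_{\mathbf{b}}{}^d X_{\mathbf{c}}{}^e\,\partial_e X^{\mathbf{a}}{}_d\right)$$
$$= \delta^h_d X_{\mathbf{b}}{}^e X_{\mathbf{c}}{}^f\,\Gamma^d_{ef} - X_{\mathbf{b}}{}^d X_{\mathbf{c}}{}^e\,\Gamma^h_{de} = 0.$$

Again, since the determinant of $X_{\mathbf{a}}{}^h$ is non-zero, $\Gamma'^a_{bc}$ vanishes everywhere in this coordinate system and hence the manifold is affine flat. The necessity is straightforward and is left as an exercise.

If we put these two lemmas together, we get the result we have been looking for.

Theorem: A necessary and sufficient condition for a manifold to be affine flat is that the connection is symmetric and the Riemann tensor vanishes.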


6.8 The metric
Any symmetric covariant tensor field of rank 2, say $g_{ab}(x)$, defines a metric. A manifold endowed with a metric is called a Riemannian manifold. A metric can be used to define distances and lengths of vectors. The infinitesimal distance (or interval in relativity), which we call ds, between two neighbouring points $x^a$ and $x^a + dx^a$ is defined by

$$ds^2 = g_{ab}(x)\,dx^a dx^b. \qquad (6.50)$$

Note that this gives the square of the infinitesimal distance, $(ds)^2$, which is conventionally written as $ds^2$. The equation (6.50) is also known as the line element and $g_{ab}$ is also called the metric form or first fundamental form. The square of the length or norm of a contravariant vector $X^a$ is defined by

$$X^2 = g_{ab}(x)\,X^a X^b. \qquad (6.51)$$

The metric is said to be positive definite or negative definite if, for all vectors X, $X^2 > 0$ or $X^2 < 0$, respectively. Otherwise, the metric is called indefinite. The angle between two vectors $X^a$ and $Y^a$ with $X^2 \ne 0$ and $Y^2 \ne 0$ is given by

$$\cos(X, Y) = \frac{g_{ab}X^aY^b}{(|g_{cd}X^cX^d|)^{1/2}\,(|g_{ef}Y^eY^f|)^{1/2}}. \qquad (6.52)$$

In particular, the vectors $X^a$ and $Y^a$ are said to be orthogonal if

$$g_{ab}X^aY^b = 0. \qquad (6.53)$$

If the metric is indefinite (as in relativity theory), then there exist vectors which are orthogonal to themselves called null vectors, i.e.

$$g_{ab}X^aX^b = 0. \qquad (6.54)$$

The determinant of the metric is denoted by

$$g = \det(g_{ab}). \qquad (6.55)$$

The metric is non-singular if $g \ne 0$, in which case the inverse of $g_{ab}$, $g^{ab}$, is given by

$$g_{ab}\,g^{bc} = \delta^c_a. \qquad (6.56)$$

It follows from this definition that $g^{ab}$ is a contravariant tensor of rank 2 and it is called the contravariant metric. We may now use $g_{ab}$ and $g^{ab}$ to lower and raise tensorial indices by defining

$$T^{\cdots}{}_{\cdots a\cdots} = g_{ab}\,T^{\cdots b\cdots}{}_{\cdots} \qquad (6.57)$$

and

$$T^{\cdots a\cdots}{}_{\cdots} = g^{ab}\,T^{\cdots}{}_{\cdots b\cdots}, \qquad (6.58)$$

where we use the same kernel letter for the tensor. Since from now on we shall be working with a manifold endowed with a metric, we shall regard such associated contravariant and covariant tensors as representations of the same geometric object. Thus, in particular, $g_{ab}$, $\delta^a_b$, and $g^{ab}$ may all be thought of as different representations of the same geometric object, the metric g. Since we can raise and lower indices freely with the metric, we must be careful about the order in which we write contravariant and covariant indices. For example, in general, $X_a{}^b$ will be different from $X^b{}_a$.
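The basic metric operations are simple index contractions and can be tried out on small examples. The sketch below assumes a two-dimensional indefinite metric diag(1, -1) and a few hand-picked vectors; none of these sample values come from the text, and the helper names are illustrative.

```python
def inner(g, X, Y):
    """g_ab X^a Y^b for a metric stored as a nested list."""
    n = len(X)
    return sum(g[a][b] * X[a] * Y[b] for a in range(n) for b in range(n))

def lower(g, X):
    """Lower an index: X_a = g_ab X^b."""
    n = len(X)
    return [sum(g[a][b] * X[b] for b in range(n)) for a in range(n)]

def raise_index(g_inv, W):
    """Raise an index: W^a = g^ab W_b."""
    n = len(W)
    return [sum(g_inv[a][b] * W[b] for b in range(n)) for a in range(n)]

# Assumed example: indefinite metric diag(1, -1), which is its own inverse.
eta = [[1.0, 0.0], [0.0, -1.0]]
eta_inv = [[1.0, 0.0], [0.0, -1.0]]

null_vec = [1.0, 1.0]
X = [2.0, 1.0]
Y = [1.0, 2.0]

print(inner(eta, null_vec, null_vec))   # norm squared of a null vector
print(inner(eta, X, X))                 # norm squared of X
print(inner(eta, X, Y))                 # X and Y are orthogonal in this metric
print(raise_index(eta_inv, lower(eta, X)))
```

The vector (1, 1) is orthogonal to itself although it is non-zero, which is only possible because the metric is indefinite. Lowering and then raising an index returns the original components, reflecting the inverse relation between the covariant and contravariant metric.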

6.9 Metric geodesics

Consider the timelike curve C with parametric equation $x^a = x^a(u)$. Dividing equation (6.50) by the square of du we get

$$\left(\frac{ds}{du}\right)^2 = g_{ab}\frac{dx^a}{du}\frac{dx^b}{du}. \qquad (6.59)$$

Then the interval s between two points $P_1$ and $P_2$ on C is given by

$$s = \int_{P_1}^{P_2} ds = \int_{P_1}^{P_2}\frac{ds}{du}\,du = \int_{P_1}^{P_2}\left(g_{ab}\frac{dx^a}{du}\frac{dx^b}{du}\right)^{1/2} du. \qquad (6.60)$$

We define a timelike metric geodesic between any two points $P_1$ and $P_2$ as the privileged curve joining them whose interval is stationary under small variations that vanish at the end points. Hence, the interval may be a maximum, a minimum, or a saddle point. Deriving the geodesic equations involves the calculus of variations and we postpone this to the next chapter. In that chapter, we shall see that the Euler-Lagrange equations result in the second-order differential equations

$$g_{ab}\frac{d^2x^b}{du^2} + \{bc, a\}\frac{dx^b}{du}\frac{dx^c}{du} = \left(\frac{d^2s}{du^2}\Big/\frac{ds}{du}\right)g_{ab}\frac{dx^b}{du}, \qquad (6.61)$$

where the quantities in curly brackets are called the Christoffel symbols of the first kind and are defined in terms of derivatives of the metric by

$$\{ab, c\} = \tfrac{1}{2}(\partial_b g_{ac} + \partial_a g_{bc} - \partial_c g_{ab}). \qquad (6.62)$$

Multiplying through by $g^{ad}$ and using (6.56), we get the equations

$$\frac{d^2x^a}{du^2} + \left\{{a \atop bc}\right\}\frac{dx^b}{du}\frac{dx^c}{du} = \left(\frac{d^2s}{du^2}\Big/\frac{ds}{du}\right)\frac{dx^a}{du}, \qquad (6.63)$$

where $\left\{{a \atop bc}\right\}$ are the Christoffel symbols of the second kind defined by

$$\left\{{a \atop bc}\right\} = g^{ad}\{bc, d\}. \qquad (6.64)$$

In addition, the norm of the tangent vector $dx^a/du$ is given by (6.59). If, in particular, we choose a parameter u which is linearly related to the interval s, that is,

$$u = \alpha s + \beta, \qquad (6.65)$$

where $\alpha$ and $\beta$ are constants, then the right-hand side of (6.63) vanishes. In the special case when u = s, the equations for a metric geodesic become

$$\frac{d^2x^a}{ds^2} + \left\{{a \atop bc}\right\}\frac{dx^b}{ds}\frac{dx^c}{ds} = 0 \qquad (6.66)$$

and

$$g_{ab}\frac{dx^a}{ds}\frac{dx^b}{ds} = 1, \qquad (6.67)$$

where we assume $ds \ne 0$. Apart from trivial sign changes, similar results apply for spacelike geodesics, except that we replace s by $\sigma$, say, where

$$d\sigma^2 = -g_{ab}\,dx^a dx^b.$$

However, in the case of an indefinite metric, there exist geodesics for which the distance between any two points is zero called null geodesics. It can also be shown that these curves can be parametrized by a special parameter u, called an affine parameter, such that their equation does not possess a right-hand side, that is,

$$\frac{d^2x^a}{du^2} + \left\{{a \atop bc}\right\}\frac{dx^b}{du}\frac{dx^c}{du} = 0,$$

where

$$g_{ab}\frac{dx^a}{du}\frac{dx^b}{du} = 0.$$

The last equation follows since the distance between any two points is zero, or equivalently the tangent vector is null. Again, any other affine parameter is related to u by the transformation

$$u \to \alpha u + \beta,$$

where $\alpha$ and $\beta$ are constants.
6.10 The metric connection
In general, ifwe have a manifold endowed with both an affine connection and metric, then it possesses two classes of curves, affine geodesics and metric geodesics, which will be different (Fig. 6.11). However, comparing (6.37) with (6.66), the two classes will coincide if we take

Metric or, using (6.64) and (6.62), if
geodesics

ra
be

=

{ bae

}

(6.70)

Fig. 6.11 Affine and metric geodesics on a manifold.

It follows from the last equation that the connection is necessarily symmetric, i.e.

$$\Gamma^a_{bc} = \Gamma^a_{cb}. \tag{6.72}$$

In fact, if one checks the transformation properties of $\left\{ {a \atop bc} \right\}$ from first principles, it does indeed transform like a connection (exercise). This special connection built out of the metric and its derivatives is called the metric connection. From now on, we shall always work with the metric connection and we shall denote it by $\Gamma^a_{bc}$ rather than $\left\{ {a \atop bc} \right\}$, where $\Gamma^a_{bc}$ is defined by (6.71). This definition leads immediately to the identity (exercise)

$$\nabla_c g_{ab} = 0. \tag{6.73}$$

Conversely, if we require that (6.73) holds for an arbitrary symmetric connection, then it can be deduced (exercise) that the connection is necessarily the metric connection. Thus, we have the following important result.

Theorem: If $\nabla_c g_{ab}$ vanishes for an arbitrary symmetric connection, then the connection is necessarily the metric connection.


In addition, we can show that

$$\nabla_c \delta^a_b = 0 \tag{6.74}$$

and

$$\nabla_c g^{ab} = 0. \tag{6.75}$$
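The metric connection $\Gamma^a_{bc} = \tfrac{1}{2} g^{ad}(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc})$ can be evaluated numerically for a concrete metric. The sketch below is only an illustration: it assumes a diagonal metric (so the inverse is taken element by element), uses central finite differences for the derivatives, and takes Euclidean 3-space in cylindrical polars $(R, \phi, z)$ as the example. All function names are hypothetical.

```python
def metric_cyl(x):
    """Metric g_ab of Euclidean 3-space in cylindrical polars (R, phi, z)."""
    R = x[0]
    return [[1.0, 0.0, 0.0],
            [0.0, R * R, 0.0],
            [0.0, 0.0, 1.0]]


def inverse_diagonal(g):
    """Inverse of a diagonal metric: reciprocal diagonal elements (assumption: g is diagonal)."""
    n = len(g)
    return [[(1.0 / g[a][a] if a == b else 0.0) for b in range(n)] for a in range(n)]


def d_metric(metric, x, h=1e-6):
    """Return dg with dg[d][a][b] ~ partial_d g_ab, using central differences."""
    n = len(x)
    dg = []
    for d in range(n):
        xp, xm = list(x), list(x)
        xp[d] += h
        xm[d] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)])
    return dg


def christoffel(metric, x):
    """Gamma[a][b][c] = 1/2 g^{ad} (d_b g_dc + d_c g_db - d_d g_bc)."""
    n = len(x)
    ginv = inverse_diagonal(metric(x))
    dg = d_metric(metric, x)
    return [[[0.5 * sum(ginv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]


gamma = christoffel(metric_cyl, [2.0, 0.3, 1.0])
print(gamma[0][1][1])  # Gamma^R_{phi phi}, analytically -R
print(gamma[1][0][1])  # Gamma^phi_{R phi}, analytically 1/R
print(gamma[1][1][0])  # equal to the previous value, since the connection is symmetric
```

At $R = 2$ the two non-trivial components come out close to $-2$ and $1/2$, in line with the analytic values $-R$ and $1/R$.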

6.11 Metric flatness
Now at any point P of a manifold, $g_{ab}$ is a symmetric matrix of real numbers. Therefore, by standard matrix theory, there exists a transformation which reduces the matrix to diagonal form with every diagonal term either $+1$ or $-1$. The excess of plus signs over minus signs in this form is called the signature of the metric. Assuming that the metric is continuous over the manifold and non-singular, then it follows that the signature is an invariant. In general, it will not be possible to find a coordinate system in which the metric reduces to this diagonal form everywhere. If, however, there does exist a coordinate system in which the metric reduces to diagonal form with $\pm 1$ diagonal elements everywhere, then the metric is called flat.
How does metric flatness relate to affine flatness in the case we are interested in, that is, when the connection is the metric connection? The answer is contained in the following result.

Theorem: A necessary and sufficient condition for a metric to be flat is that its Riemann tensor vanishes.

Necessity follows from the fact that there exists a coordinate system in which the metric is diagonal with $\pm 1$ diagonal elements. Since the metric is constant everywhere, its partial derivatives vanish and therefore the metric connection $\Gamma^a_{bc}$ vanishes as a consequence of the definition (6.71). Since $\Gamma^a_{bc}$ vanishes everywhere then so must its derivatives. (One way to see this is to recall the definition of partial differentiation which involves subtracting quantities at neighbouring points. If the quantities are always zero, then their difference vanishes, and so does the resulting limit.) The Riemann tensor therefore vanishes by the definition (6.39).
Conversely, if the Riemann tensor vanishes, then by the theorem of §6.7, there exists a special coordinate system in which the connection vanishes everywhere. Since this is the metric connection, by (6.73),

$$\nabla_c g_{ab} = \partial_c g_{ab} - \Gamma^d_{ac} g_{db} - \Gamma^d_{bc} g_{ad} = 0,$$

from which we get

$$\partial_c g_{ab} = \Gamma^d_{ac} g_{db} + \Gamma^d_{bc} g_{ad}, \tag{6.76}$$

and it follows that $\partial_c g_{ab} = 0$. The metric is therefore constant everywhere and hence can be transformed into diagonal form with diagonal elements $\pm 1$.
Note the result (6.76) which expresses the ordinary derivative of the metric in terms of the connection. This equation will prove useful later.
Combining this theorem with the theorem of §6.7, we see that if we use the metric connection then metric flatness coincides with affine flatness.
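The link between flatness and a vanishing Riemann tensor can be illustrated numerically. The sketch below assumes the convention $R^a{}_{bcd} = \partial_c \Gamma^a_{bd} - \partial_d \Gamma^a_{bc} + \Gamma^e_{bd}\Gamma^a_{ec} - \Gamma^e_{bc}\Gamma^a_{ed}$ for the curvature tensor, takes the connection components of two standard two-dimensional metrics as known closed forms, and differentiates them by central differences. The function names are hypothetical.

```python
import math


def gamma_plane_polar(x):
    """Metric connection of ds^2 = dr^2 + r^2 dtheta^2, coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r                      # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^theta_{r theta}
    return G


def gamma_unit_sphere(x):
    """Metric connection of ds^2 = dtheta^2 + sin^2(theta) dphi^2, coordinates (theta, phi)."""
    th = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)              # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)  # Gamma^phi_{theta phi}
    return G


def riemann(gamma, x, h=1e-5):
    """R[a][b][c][d] = d_c G^a_bd - d_d G^a_bc + G^e_bd G^a_ec - G^e_bc G^a_ed."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[k][a][b][c] ~ partial_k Gamma^a_bc
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[a][b][c] - Gm[a][b][c]) / (2 * h) for c in range(n)]
                    for b in range(n)] for a in range(n)])
    return [[[[dG[c][a][b][d] - dG[d][a][b][c]
               + sum(G[e][b][d] * G[a][e][c] - G[e][b][c] * G[a][e][d] for e in range(n))
               for d in range(n)] for c in range(n)] for b in range(n)] for a in range(n)]


flat = riemann(gamma_plane_polar, [2.0, 0.7])
curved = riemann(gamma_unit_sphere, [0.7, 0.2])
max_flat = max(abs(flat[a][b][c][d])
               for a in range(2) for b in range(2) for c in range(2) for d in range(2))
print(max_flat)                                  # numerically zero: the plane is flat
print(curved[0][1][0][1], math.sin(0.7) ** 2)    # R^theta_{phi theta phi} = sin^2(theta)
```

The plane in polar coordinates has a non-vanishing connection but a vanishing Riemann tensor, whereas the unit sphere does not, so no coordinate change can bring the sphere's metric to constant diagonal form.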

6.12 The curvature tensor
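The curvature tensor or Riemann-Christoffel tensor (Riemann tensor for short) is defined by (6.39), namely,

$$R^a{}_{bcd} = \partial_c \Gamma^a_{bd} - \partial_d \Gamma^a_{bc} + \Gamma^e_{bd}\Gamma^a_{ec} - \Gamma^e_{bc}\Gamma^a_{ed},$$

where $\Gamma^a_{bc}$ is the metric connection, which by (6.71) is given as

$$\Gamma^a_{bc} = \tfrac{1}{2} g^{ad}\left(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc}\right).$$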

Thus, $R^a{}_{bcd}$ depends on the metric and its first and second derivatives. It follows immediately from the definition that it is anti-symmetric on its last pair of indices:

$$R^a{}_{bcd} = -R^a{}_{bdc}. \tag{6.77}$$

The fact that the connection is symmetric leads to the identity

$$R^a{}_{bcd} + R^a{}_{dbc} + R^a{}_{cdb} = 0. \tag{6.78}$$

Lowering the first index with the metric, it is easy to establish, for example by using geodesic coordinates, that the lowered tensor is symmetric under interchange of the first and last pair of indices, that is,

$$R_{abcd} = R_{cdab}. \tag{6.79}$$

Combining this with equation (6.77), we see that the lowered tensor is antisymmetric on its first pair of indices as well:

$$R_{abcd} = -R_{bacd}. \tag{6.80}$$

Collecting these symmetries together, we see that the lowered curvature tensor satisfies

$$R_{abcd} = -R_{abdc} = -R_{bacd} = R_{cdab}, \qquad R_{abcd} + R_{adbc} + R_{acdb} = 0. \tag{6.81}$$

These symmetries considerably reduce the number of independent components; in fact, in $n$ dimensions, the number is reduced from $n^4$ to $\tfrac{1}{12} n^2(n^2 - 1)$.
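The count $\tfrac{1}{12}n^2(n^2-1)$ can be cross-checked against a direct counting argument. The sketch below uses a standard argument that is not spelled out in the text and is offered as an assumption: the pair antisymmetries leave $N = \tfrac{1}{2}n(n-1)$ independent index pairs, the pair symmetry makes $R_{abcd}$ a symmetric $N \times N$ array, and the cyclic identity removes one further component for each choice of four distinct indices. Function names are hypothetical.

```python
from math import comb


def count_by_formula(n):
    """Independent components of R_abcd in n dimensions: n^2 (n^2 - 1) / 12."""
    return n * n * (n * n - 1) // 12


def count_by_symmetries(n):
    """Same count, built up from the symmetries (assumed standard counting argument)."""
    pairs = n * (n - 1) // 2               # antisymmetric index pairs [ab]
    symmetric = pairs * (pairs + 1) // 2   # symmetric array on pairs: R_abcd = R_cdab
    cyclic = comb(n, 4)                    # independent constraints from the cyclic identity
    return symmetric - cyclic


for n in range(1, 5):
    print(n, n ** 4, count_by_formula(n), count_by_symmetries(n))
```

For $n = 1, 2, 3, 4$ both counts give 0, 1, 6, and 20 independent components.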
In addition to the algebraic identities, it can be shown, again most easily by using geodesic coordinates, that the curvature tensor satisfies a set of differential identities called the Bianchi identities:

$$\nabla_a R_{debc} + \nabla_c R_{deab} + \nabla_b R_{deca} = 0. \tag{6.82}$$

We can use the curvature tensor to define several other important tensors. The Ricci tensor is defined by the contraction

$$R_{ab} = R^c{}_{acb} = g^{cd} R_{dacb}, \tag{6.83}$$

which by (6.79) is symmetric. A final contraction defines the curvature scalar or Ricci scalar $R$ by

$$R = g^{ab} R_{ab}. \tag{6.84}$$

These two tensors can be used to define the Einstein tensor

$$G_{ab} = R_{ab} - \tfrac{1}{2} g_{ab} R, \tag{6.85}$$

which is also symmetric, and, by (6.82), the Einstein tensor can be shown to satisfy the contracted Bianchi identities

$$\nabla_b G_a{}^b = 0. \tag{6.86}$$

Note that some authors adopt a different sign convention, which leads to the Riemann tensor or the Ricci tensor having the opposite sign to ours.

6.13 The Weyl tensor

We shall mostly be concerned with tensors in four dimensions or less. The algebraic identities (6.81) lead to the following special cases for the curvature tensor:

(1) if $n = 1$, $R_{abcd} = 0$;
(2) if $n = 2$, $R_{abcd}$ has one independent component, essentially $R$;
(3) if $n = 3$, $R_{abcd}$ has six independent components, essentially $R_{ab}$;
(4) if $n = 4$, $R_{abcd}$ has twenty independent components, ten of which are given by $R_{ab}$ and the remaining ten by the Weyl tensor.

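The Weyl tensor or conformal tensor $C_{abcd}$ is defined in $n$ dimensions ($n \geq 3$) by

$$C_{abcd} = R_{abcd} + \frac{1}{n-2}\left(g_{ad}R_{cb} + g_{bc}R_{da} - g_{ac}R_{db} - g_{bd}R_{ca}\right) + \frac{1}{(n-1)(n-2)}\left(g_{ac}g_{db} - g_{ad}g_{cb}\right)R. \tag{6.87}$$

Thus, in four dimensions, this becomes

$$C_{abcd} = R_{abcd} + \tfrac{1}{2}\left(g_{ad}R_{cb} + g_{bc}R_{da} - g_{ac}R_{db} - g_{bd}R_{ca}\right) + \tfrac{1}{6}\left(g_{ac}g_{db} - g_{ad}g_{cb}\right)R. \tag{6.88}$$

It is straightforward to show that the Weyl tensor possesses the same symmetries as the Riemann tensor, namely,

$$C_{abcd} = -C_{abdc} = -C_{bacd} = C_{cdab}, \qquad C_{abcd} + C_{adbc} + C_{acdb} = 0. \tag{6.89}$$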

Combining this result with the previous symmetries, it then follows that the Weyl tensor is trace-free, in other words, it vanishes for any pair of contracted indices. One can think of the Weyl tensor as that part of the curvature tensor for which all contractions vanish.
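The trace-free property can be checked exactly on a small algebraic example. The sketch below makes several assumptions that are not in the text: it works at a single point in four dimensions with $g_{ab} = \mathrm{diag}(+1,-1,-1,-1)$, it manufactures a tensor with the Riemann symmetries from a symmetric matrix $S_{ab}$ via $R_{abcd} = S_{ac}S_{bd} - S_{ad}S_{bc}$ (a purely illustrative choice), and it takes $R_{bd} = g^{ac}R_{abcd}$ for the Ricci contraction. The Weyl tensor is then built as $C_{abcd} = R_{abcd} + \tfrac{1}{n-2}(g_{ad}R_{cb} + g_{bc}R_{da} - g_{ac}R_{db} - g_{bd}R_{ca}) + \tfrac{1}{(n-1)(n-2)}(g_{ac}g_{db} - g_{ad}g_{cb})R$.

```python
from fractions import Fraction
from itertools import product

n = 4
idx = range(n)

# Metric at a point: diag(+1, -1, -1, -1); this matrix is its own inverse.
g = [[Fraction(0)] * n for _ in idx]
for a, sign in enumerate([1, -1, -1, -1]):
    g[a][a] = Fraction(sign)
ginv = g

# Illustrative symmetric matrix S_ab = diag(1, 2, 3, 4) (an arbitrary choice).
S = [[Fraction(0)] * n for _ in idx]
for a in idx:
    S[a][a] = Fraction(a + 1)


def riem(a, b, c, d):
    """A tensor with the algebraic symmetries of R_abcd."""
    return S[a][c] * S[b][d] - S[a][d] * S[b][c]


# Ricci tensor R_bd = g^{ac} R_abcd and Ricci scalar R = g^{bd} R_bd.
ricci = [[sum(ginv[a][c] * riem(a, b, c, d) for a in idx for c in idx) for d in idx]
         for b in idx]
R = sum(ginv[b][d] * ricci[b][d] for b in idx for d in idx)


def weyl(a, b, c, d):
    """Weyl tensor C_abcd assembled from R_abcd, R_ab, R and the metric."""
    return (riem(a, b, c, d)
            + Fraction(1, n - 2) * (g[a][d] * ricci[c][b] + g[b][c] * ricci[d][a]
                                    - g[a][c] * ricci[d][b] - g[b][d] * ricci[c][a])
            + Fraction(1, (n - 1) * (n - 2)) * (g[a][c] * g[d][b] - g[a][d] * g[c][b]) * R)


# Contract the first and third indices: g^{ac} C_abcd should vanish for every b, d.
traces = [sum(ginv[a][c] * weyl(a, b, c, d) for a in idx for c in idx)
          for b, d in product(idx, repeat=2)]
print(all(t == 0 for t in traces))
print(weyl(0, 1, 0, 1))  # a non-zero component, so the example is not trivially zero
```

Because exact rational arithmetic is used, the contraction vanishes identically rather than only to rounding error, while individual components of the Weyl tensor remain non-zero.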
Two metrics $g_{ab}$ and $\bar{g}_{ab}$ are said to be conformally related or conformal to each other if

$$\bar{g}_{ab} = \Omega^2 g_{ab}, \tag{6.90}$$

where $\Omega(x)$ is a non-zero differentiable function. Given a manifold with two metrics defined on it which are conformal, then it is straightforward from (6.51) and (6.52) to show that angles between vectors and ratios of magnitudes of vectors, but not lengths, are the same for each metric. Moreover, the null geodesics of one metric coincide with the null geodesics of the other (exercise). The metrics also possess the same Weyl tensor, i.e.

$$\bar{C}^a{}_{bcd} = C^a{}_{bcd}. \tag{6.91}$$

Any quantity which satisfies a relationship like (6.91) is called conformally invariant ($g_{ab}$, $\Gamma^a_{bc}$, and $R^a{}_{bcd}$ are examples of quantities which are not conformally invariant). A metric is said to be conformally flat if it can be reduced to the form

$$g_{ab} = \Omega^2 \eta_{ab}, \tag{6.92}$$

where $\eta_{ab}$ is a flat metric. We end this section by quoting two results concerning conformally flat metrics.

Theorem: A necessary and sufficient condition for a metric to be conformally flat is that its Weyl tensor vanishes everywhere.

Theorem: Any two-dimensional Riemannian manifold is conformally flat.
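The statement that conformally related metrics share angles and ratios of magnitudes, but not lengths, can be illustrated at a single point. The sketch below assumes a positive definite two-dimensional metric for simplicity, a constant conformal factor $\Omega = 3$ at that point, and the usual expression $\cos\theta = g_{ab}X^aY^b / (|g_{ab}X^aX^b|\,|g_{cd}Y^cY^d|)^{1/2}$ for the angle. All names and numerical values are illustrative.

```python
import math


def inner(g, X, Y):
    """g_ab X^a Y^b."""
    return sum(g[a][b] * X[a] * Y[b] for a in range(len(X)) for b in range(len(Y)))


def length(g, X):
    """Magnitude |g_ab X^a X^b|^(1/2)."""
    return math.sqrt(abs(inner(g, X, X)))


def cos_angle(g, X, Y):
    """Cosine of the angle between X and Y with respect to the metric g."""
    return inner(g, X, Y) / (length(g, X) * length(g, Y))


g = [[2.0, 0.5], [0.5, 1.0]]   # metric components at a point
omega = 3.0                     # value of the conformal factor at that point
g_bar = [[omega ** 2 * g[a][b] for b in range(2)] for a in range(2)]

X, Y = [1.0, 2.0], [-1.0, 0.5]

print(cos_angle(g, X, Y), cos_angle(g_bar, X, Y))                          # equal
print(length(g, X) / length(g, Y), length(g_bar, X) / length(g_bar, Y))   # equal
print(length(g, X), length(g_bar, X))                                     # differ by omega
```

The factor $\Omega^2$ cancels between numerator and denominator in the angle and in the ratio of magnitudes, but it survives in each individual length.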


Exercises
6.1 (§6.2) Prove (6.13) by showing that $L_X \delta^a_b = 0$ in two ways: (i) using (6.17); (ii) from first principles (remembering Exercise 5.8).

6.2 (§6.2) Use (6.17) to find expressions for $L_X Z_{bc}$ and $L_X(Y^a Z_{bc})$. Use these expressions and (6.15) to check the Leibniz property in the form (6.12).

6.3 (§6.3) Establish (6.23) by assuming that the quantity defined by (6.22) has the tensor character indicated. Take the partial derivative of

$$\delta^a_b = \frac{\partial x'^a}{\partial x^d}\frac{\partial x^d}{\partial x'^b}$$

with respect to $x'^c$ to establish the alternative form (6.24).

6.4 (§6.3) Show that covariant differentiation commutes with contraction by checking that $\nabla_c \delta^a_b = 0$.

6.5 (§6.3) Assuming (6.22) and (6.25), apply the Leibniz rule to the covariant derivative of $X_a X^a$, where $X^a$ is arbitrary, to verify (6.26).

6.6 (§6.3) Check (6.29).

6.7 (§6.4) If $X$, $Y$, and $Z$ are vector fields, $f$ and $g$ smooth functions, and $\lambda$ and $\mu$ constants, then show that

(i) $\nabla_X(\lambda Y + \mu Z) = \lambda \nabla_X Y + \mu \nabla_X Z$,
(ii) $\nabla_{fX + gY} Z = f\nabla_X Z + g\nabla_Y Z$,
(iii) $\nabla_X(fY) = (Xf)Y + f\nabla_X Y$.

6.8 (§6.4) Show that (6.33) leads to (6.34).

6.9 (§6.4) If $s$ is an affine parameter, then show that, under the transformation

$$s \to \bar{s} = \bar{s}(s),$$

the parameter $\bar{s}$ will be affine only if $\bar{s} = \alpha s + \beta$, where $\alpha$ and $\beta$ are constants.

6.10 (§6.5) Show that

6.11 (§6.5) Show that

$$\nabla_X(\nabla_Y Z^a) - \nabla_Y(\nabla_X Z^a) - \nabla_{[X,Y]} Z^a = R^a{}_{bcd} Z^b X^c Y^d.$$

6.12 (§6.7) Prove that if a manifold is affine flat then the connection is necessarily integrable and symmetric.

6.13 (§6.8) Show that if $g_{ab}$ is diagonal, i.e. $g_{ab} = 0$ if $a \neq b$, then $g^{ab}$ is diagonal with corresponding reciprocal diagonal elements.

6.14 (§6.8) The line elements of $\mathbb{R}^3$ in Cartesian, cylindrical polar, and spherical polar coordinates are given respectively by

(i) $ds^2 = dx^2 + dy^2 + dz^2$,
(ii) $ds^2 = dR^2 + R^2 d\phi^2 + dz^2$,
(iii) $ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta \, d\phi^2$.

Find $g_{ab}$, $g^{ab}$, and $g$ in each case.

6.15 (§6.8) Express $T_{ab}$ in terms of $T^{cd}$.

6.16 (§6.9) Write down the tensor transformation law of $g_{ab}$. Show directly that

$$\left\{ {a \atop bc} \right\} = \tfrac{1}{2} g^{ad}\left(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc}\right)$$

transforms like a connection.

6.17 (§6.9) Find the geodesic equation for $\mathbb{R}^3$ in cylindrical polars. [Hint: use the results of Exercise 6.14(ii) to compute the metric connection and substitute in (6.68).]

6.18 (§6.9) Consider a 3-space with coordinates $(x^a) = (x, y, z)$ and line element

$$ds^2 = dx^2 + dy^2 - dz^2.$$

Prove that the null geodesics are given by

$$x = lu + l', \quad y = mu + m', \quad z = nu + n',$$

where $u$ is a parameter and $l$, $l'$, $m$, $m'$, $n$, $n'$ are arbitrary constants satisfying $l^2 + m^2 - n^2 = 0$.

6.19 (§6.10) Prove that $\nabla_c g_{ab} = 0$. Deduce that

$$\nabla_b X_a = g_{ac}\nabla_b X^c.$$

6.20 (§6.10) Suppose we have an arbitrary symmetric connection $\Gamma^a_{bc}$ satisfying $\nabla_c g_{ab} = 0$. Deduce that $\Gamma^a_{bc}$ must be the metric connection. [Hint: use the equation to find expressions for $\partial_b g_{dc}$, $\partial_c g_{db}$, and $-\partial_d g_{bc}$, as in (6.76), add the equations together, and multiply by $\tfrac{1}{2} g^{ad}$.]

6.21 (§6.11) The Minkowski line element in Minkowski coordinates

$$(x^a) = (x^0, x^1, x^2, x^3) = (t, x, y, z)$$

is given by

$$ds^2 = dt^2 - dx^2 - dy^2 - dz^2.$$

(i) What is the signature? (ii) Is the metric non-singular? (iii) Is the metric flat?

6.22 (§6.11) The line element of $\mathbb{R}^3$ in a particular coordinate system is

$$ds^2 = (dx^1)^2 + (x^1)^2 (dx^2)^2 + (x^1 \sin x^2)^2 (dx^3)^2.$$

(i) Identify the coordinates. (ii) Is the metric flat?

6.23 (§6.12) Establish the identities (6.78) and (6.79). [Hint: choose an arbitrary point P and introduce geodesic coordinates at P.] Show that (6.78) is equivalent to $R^a{}_{[bcd]} = 0$.

6.24 (§6.12) Establish the identity (6.82). [Hint: use geodesic coordinates.] Show that (6.82) is equivalent to $R_{de[ab;c]} = 0$. Deduce (6.86).

6.25 (§6.12) Show that $G_{ab} = 0$ if and only if $R_{ab} = 0$.

6.26 (§6.13) Establish the identity (6.89). Deduce that the Weyl tensor is trace-free on all pairs of indices.

6.27 (§6.13) Show that angles between vectors and ratios of lengths of vectors, but not lengths, are the same for conformally related metrics.

6.28 (§6.13) Prove that the null geodesics of two conformally related metrics coincide. [Hint: the two classes of geodesics need not both be affinely parametrized.]
6.29 (§6.13) Establish (6.91).
6.30 (§6.13) Establish the theorem that any two-dimensional Riemann manifold is conformally flat in the case of a metric of signature 0, i.e. at any point the metric can be reduced to the diagonal form $(+1, -1)$ say. [Hint: use null curves as coordinate curves, that is, change to new coordinates

$$\lambda = \lambda(x^0, x^1), \quad \nu = \nu(x^0, x^1)$$

satisfying

$$g^{ab}\lambda_{,a}\lambda_{,b} = g^{ab}\nu_{,a}\nu_{,b} = 0$$

and show that the line element reduces to the form

$$ds^2 = e^{2\mu}\, d\lambda\, d\nu$$

and finally introduce new coordinates $\tfrac{1}{2}(\lambda + \nu)$ and $\tfrac{1}{2}(\lambda - \nu)$.]

6.31 This final exercise consists of a long calculation which will be needed later in the book. If we take coordinates

$$(x^a) = (x^0, x^1, x^2, x^3) = (t, r, \theta, \phi),$$

then the four-dimensional spherically symmetric line element is

$$ds^2 = e^{\nu} dt^2 - e^{\lambda} dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta \, d\phi^2,$$

where $\nu = \nu(t, r)$ and $\lambda = \lambda(t, r)$ are arbitrary functions of $t$ and $r$.

(i) Find $g_{ab}$, $g$, and $g^{ab}$ (see Exercise 6.13).
(ii) Use the expressions in (i) to calculate $\Gamma^a_{bc}$. [Hint: remember $\Gamma^a_{bc} = \Gamma^a_{cb}$.]
(iii) Calculate $R_{abcd}$. [Hint: use the symmetry relations (6.81).]
(iv) Calculate $R_{ab}$, $R$, and $G_{ab}$.
(v) Calculate $G^a{}_b$ $(= g^{ac}G_{cb} = G_b{}^a)$.

7.1 Tensor densities
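A tensor density of weight $W$, denoted conventionally by a gothic letter, $\mathfrak{T}^{a\cdots}_{b\cdots}$, transforms like an ordinary tensor, except that in addition the $W$th power of the Jacobian

$$J = \left| \frac{\partial x^a}{\partial x'^b} \right|$$

appears as a factor, i.e.

$$\mathfrak{T}'^{a\cdots}_{b\cdots} = J^W \frac{\partial x'^a}{\partial x^c} \cdots \frac{\partial x^d}{\partial x'^b} \cdots \mathfrak{T}^{c\cdots}_{d\cdots}. \tag{7.1}$$

Then, with certain modifications, we can combine tensor densities in much the same way as we do tensors. One exception, which follows from (7.1), is that the product of two tensor densities of weight $W_1$ and $W_2$ is a tensor density of weight $W_1 + W_2$. There is some arbitrariness in defining the covariant derivative of a tensor density, but we shall adhere to the definition that if $\mathfrak{T}^{a\cdots}_{b\cdots}$ is a tensor density of weight $W$ then

$$\nabla_c \mathfrak{T}^{a\cdots}_{b\cdots} = (\text{the usual expression if } \mathfrak{T}^{a\cdots}_{b\cdots} \text{ were a tensor}) - W\,\Gamma^d_{dc}\,\mathfrak{T}^{a\cdots}_{b\cdots}. \tag{7.2}$$

For example, the covariant derivative of a vector density of weight $W$ is

$$\nabla_c \mathfrak{T}^a = \partial_c \mathfrak{T}^a + \Gamma^a_{bc}\mathfrak{T}^b - W\,\Gamma^b_{bc}\mathfrak{T}^a.$$

In the special case when $W = +1$ and $c = a$, we get the important result (check)

$$\nabla_a \mathfrak{T}^a = \partial_a \mathfrak{T}^a, \tag{7.3}$$

that is, the covariant divergence of a vector density of weight $+1$ is identical to its ordinary divergence. It can be shown that both these quantities are scalar densities of weight $+1$ (exercise).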
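The check requested for the divergence of a vector density of weight $+1$ amounts to a relabelling of dummy indices. A short sketch, assuming the connection is symmetric (as the metric connection is):

```latex
% Set W = +1 and contract c with a in the covariant derivative of a vector density.
\nabla_a \mathfrak{T}^a
  = \partial_a \mathfrak{T}^a + \Gamma^a_{ba}\mathfrak{T}^b - \Gamma^b_{ba}\mathfrak{T}^a
  = \partial_a \mathfrak{T}^a + \left( \Gamma^a_{ba} - \Gamma^a_{ab} \right)\mathfrak{T}^b
  = \partial_a \mathfrak{T}^a .
```

In the second step the dummy indices $a$ and $b$ are interchanged in the last term, and the bracket then vanishes by the symmetry $\Gamma^a_{ba} = \Gamma^a_{ab}$.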

92 I Integration, variation, and symmetry
7.2 The Levi-Civita alternating symbol
We introduce a quantity which is a generalization of the Kronecker delta $\delta^a_b$, but which turns out to be a tensor density. The Levi-Civita alternating symbol $\varepsilon^{abcd}$ is a completely anti-symmetric tensor density of weight $+1$ and contravariant rank 4, whose value in any coordinate system is $+1$ or $-1$ if $abcd$ is an even or odd permutation of 0123, respectively, and zero otherwise. Thus, for example, in four dimensions, if we let the coordinates range from 0 to 3 (as we shall), i.e.

$$(x^a) = (x^0, x^1, x^2, x^3),$$

then some of its values are

$$\varepsilon^{0123} = \varepsilon^{2301} = -\varepsilon^{0132} = -\varepsilon^{0321} = +1$$

and it vanishes whenever an index is repeated, for example $\varepsilon^{0012} = 0$.
Similarly, we can define the covariant version $\varepsilon_{abcd}$, which has weight $-1$. It can be used, in particular, to form the determinant of a second-rank density, i.e.
Assuming this is non-zero, we can then also use it to construct the inverse of a second-rank tensor. The covariant derivatives of both $\varepsilon^{abcd}$ and $\varepsilon_{abcd}$ vanish identically, which from one point of view motivates the definition (7.2).
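We define the generalized Kronecker delta by

$$\delta^{ab}_{cd} = \begin{cases} +1 & \text{for } a \neq b,\ a = c,\ b = d, \\ -1 & \text{for } a \neq b,\ a = d,\ b = c, \\ \phantom{+}0 & \text{otherwise}, \end{cases}$$

and similarly for higher-order tensors. They are constant tensors of the type indicated, and can be defined in terms of the Kronecker delta by the determinant relationships

$$\delta^{ab}_{cd} = \begin{vmatrix} \delta^a_c & \delta^a_d \\ \delta^b_c & \delta^b_d \end{vmatrix}$$

and

$$\delta^{abc}_{def} = \begin{vmatrix} \delta^a_d & \delta^a_e & \delta^a_f \\ \delta^b_d & \delta^b_e & \delta^b_f \\ \delta^c_d & \delta^c_e & \delta^c_f \end{vmatrix},$$

and so forth. In four dimensions they are related to products of the alternating symbols according to

$$\varepsilon^{abcd}\varepsilon_{efgh} = \delta^{abcd}_{efgh}, \quad \varepsilon^{abcd}\varepsilon_{efgd} = \delta^{abc}_{efg}, \quad \varepsilon^{abcd}\varepsilon_{efcd} = 2\delta^{ab}_{ef}, \quad \varepsilon^{abcd}\varepsilon_{ebcd} = 3!\,\delta^a_e, \quad \varepsilon^{abcd}\varepsilon_{abcd} = 4!.$$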
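The values of the alternating symbol and some of its contractions can be verified by brute force. The sketch below assumes that the covariant and contravariant symbols take the same numerical values $\pm 1$ and $0$ in every coordinate system, determines the sign by counting inversions, and represents $\delta^{ab}_{cd}$ by its $2 \times 2$ determinant of ordinary Kronecker deltas. Function names are hypothetical.

```python
from itertools import product
from math import factorial


def levi_civita(*indices):
    """+1 / -1 for even / odd permutations of 0..n-1, and 0 if any index repeats."""
    n = len(indices)
    if sorted(indices) != list(range(n)):
        return 0
    sign = 1
    for i in range(n):
        for j in range(i + 1, n):
            if indices[i] > indices[j]:   # each inversion flips the sign
                sign = -sign
    return sign


def delta(a, b):
    """Ordinary Kronecker delta."""
    return 1 if a == b else 0


def delta2(a, b, c, d):
    """Generalized Kronecker delta delta^{ab}_{cd} as a 2x2 determinant."""
    return delta(a, c) * delta(b, d) - delta(a, d) * delta(b, c)


idx = range(4)

print(levi_civita(0, 1, 2, 3), levi_civita(2, 3, 0, 1),
      levi_civita(0, 1, 3, 2), levi_civita(0, 3, 2, 1), levi_civita(0, 0, 1, 2))

# Full contraction: eps^{abcd} eps_{abcd} = 4!
full = sum(levi_civita(*p) ** 2 for p in product(idx, repeat=4))

# Three contracted indices: eps^{abcd} eps_{ebcd} = 3! delta^a_e
ok_three = all(
    sum(levi_civita(a, b, c, d) * levi_civita(e, b, c, d)
        for b, c, d in product(idx, repeat=3)) == factorial(3) * delta(a, e)
    for a, e in product(idx, repeat=2))

# Two contracted indices: eps^{abcd} eps_{efcd} = 2 delta^{ab}_{ef}
ok_two = all(
    sum(levi_civita(a, b, c, d) * levi_civita(e, f, c, d)
        for c, d in product(idx, repeat=2)) == 2 * delta2(a, b, e, f)
    for a, b, e, f in product(idx, repeat=4))

print(full, ok_three, ok_two)
```

The same inversion-counting function works for any number of indices, so the check extends to other dimensions by changing the index range and the numerical factors.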

7.3 The metric determinant
If we have a Riemannian manifold with metric $g_{ab}$, then it transforms according to

$$g'_{ab} = \frac{\partial x^c}{\partial x'^a}\frac{\partial x^d}{\partial x'^b}\, g_{cd}, \tag{7.4}$$

and so, taking determinants, we have

$$g' = J^2 g.$$

Hence the metric determinant $g$ is a scalar density of weight $+2$. In the later chapters, we shall be working with metrics of negative signature in which case $g$ will be negative, and so we write the last equation in the equivalent form

$$(-g') = J^2(-g).$$

Since all these terms are now positive, we can take square roots, to get

$$(-g')^{1/2} = J(-g)^{1/2},$$

and hence $(-g)^{1/2}$ is a scalar density of weight $+1$. The quantity $(-g)^{1/2}$ plays an important role in integration. Given any tensor $T^{a\cdots}_{b\cdots}$, we can form the product $(-g)^{1/2}\,T^{a\cdots}_{b\cdots}$, which is then a tensor density of weight $+1$. In particular, we can deduce an important result from equation (7.3), namely, for any vector $T^a$,

$$\nabla_a\left[(-g)^{1/2}\,T^a\right] = \partial_a\left[(-g)^{1/2}\,T^a\right]. \tag{7.5}$$
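The relation $g' = J^2 g$ can be checked on a familiar example. The sketch below assumes the positive definite Euclidean plane rather than a metric of negative signature, takes Cartesian coordinates as the unprimed system and plane polars $(r, \theta)$ as the primed system, and estimates the matrix $\partial x^c / \partial x'^a$ by central differences. Function names are hypothetical.

```python
import math


def cartesian_from_polar(xp):
    """Unprimed (Cartesian) coordinates as functions of primed (polar) ones."""
    r, th = xp
    return [r * math.cos(th), r * math.sin(th)]


def jacobian_matrix(f, xp, h=1e-6):
    """M[c][a] ~ partial x^c / partial x'^a, using central differences."""
    n = len(xp)
    cols = []
    for a in range(n):
        up, dn = list(xp), list(xp)
        up[a] += h
        dn[a] -= h
        fu, fd = f(up), f(dn)
        cols.append([(fu[c] - fd[c]) / (2 * h) for c in range(n)])
    return [[cols[a][c] for a in range(n)] for c in range(n)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


g = [[1.0, 0.0], [0.0, 1.0]]     # Cartesian metric, determinant g = 1
point = [2.0, 0.6]               # (r, theta)
M = jacobian_matrix(cartesian_from_polar, point)

# Tensor transformation law: g'_ab = (dx^c/dx'^a)(dx^d/dx'^b) g_cd
g_prime = [[sum(M[c][a] * M[d][b] * g[c][d] for c in range(2) for d in range(2))
            for b in range(2)] for a in range(2)]

J = det2(M)                      # Jacobian |dx^a/dx'^b|, analytically r
print(det2(g_prime), J ** 2 * det2(g))   # both close to r^2
```

Here $J = r$ and $g' = r^2$, so the determinant picks up two powers of the Jacobian, as expected for a scalar density of weight $+2$.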

Now, at any point, the covariant and contravariant metrics are symmetric matrices which are inverse to each other by

$$g_{ab}\,g^{bc} = \delta_a^c.$$

Let us digress for a moment and consider the general case of finding the derivative of a determinant of a matrix whose elements are functions of the coordinates. Consider any square matrix $A = (a_{ij})$. Then its inverse, $(b_{ij})$ say, is defined by

$$(b_{ij}) = \frac{1}{a}\,(A^{ij})', \tag{7.6}$$

where $a$ is the determinant of $A$, $A^{ij}$ is the cofactor of $a_{ij}$, and the prime denotes the transpose. Let us fix $i$, and expand the determinant $a$ by the $i$th row. Then

$$a = \sum_{j=1}^{n} a_{ij}A^{ij},$$

where we have explicitly included the summation sign for clarity. If we partially differentiate both sides with respect to $a_{ij}$, then we get

$$\frac{\partial a}{\partial a_{ij}} = A^{ij}, \tag{7.7}$$

since $a_{ij}$ does not occur in any of the cofactors $A^{ij}$ ($i$ fixed, $j$ runs from 1 to $n$). Repeating the argument for every $i$, as $i$ runs from 1 to $n$, we see that the formula (7.7) is quite general. Let us suppose that the $a_{ij}$ are all functions of the coordinates $x^k$. Then the determinant is a functional of the $a_{ij}$, which in turn are functions of the $x^k$, that is,

$$a = a(a_{ij}(x^k)).$$

Differentiating this partially with respect to $x^k$, using the function of a function rule and equation (7.7), we obtain

$$\frac{\partial a}{\partial x^k} = \frac{\partial a}{\partial a_{ij}}\frac{\partial a_{ij}}{\partial x^k} = A^{ij}\frac{\partial a_{ij}}{\partial x^k} = a\,b_{ji}\frac{\partial a_{ij}}{\partial x^k}$$

by equation (7.6). Applying this result to the metric determinant $g$ and remembering that $g^{ab}$ is symmetric, we get the useful equation

$$\partial_c g = g\,g^{ab}\,\partial_c g_{ab}. \tag{7.8}$$
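The rule for differentiating a determinant can be tested numerically. The sketch below assumes a $3 \times 3$ matrix whose entries depend on a single coordinate $t$ (the entries are arbitrary illustrative choices), computes cofactors by Laplace expansion, and compares $A^{ij}\,\partial a_{ij}/\partial t$ with a central-difference derivative of the determinant. Function names are hypothetical.

```python
def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [[m[r][c] for c in range(len(m)) if c != j] for r in range(len(m)) if r != i]


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cofactor(m, i, j):
    """Cofactor A^{ij} of the element a_ij."""
    return (-1) ** (i + j) * det(minor(m, i, j))


def A(t):
    """Illustrative matrix whose elements are functions of one coordinate t."""
    return [[1.0 + t, t * t, 2.0],
            [t, 3.0, t ** 3],
            [1.0, t, 2.0 + t]]


def dA(t):
    """Element-by-element derivative of A with respect to t."""
    return [[1.0, 2.0 * t, 0.0],
            [1.0, 0.0, 3.0 * t * t],
            [0.0, 1.0, 1.0]]


t, h = 0.5, 1e-6
numeric = (det(A(t + h)) - det(A(t - h))) / (2 * h)   # direct derivative of the determinant
m, dm = A(t), dA(t)
formula = sum(cofactor(m, i, j) * dm[i][j] for i in range(3) for j in range(3))
print(numeric, formula)
```

The two numbers agree to within the finite-difference error. Dividing each cofactor by the determinant would give the inverse elements $b_{ji}$, which is the form used for the metric.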

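We now combine this result with (6.76) (which comes directly from the vanishing of the covariant derivative of the metric) and find

$$\partial_c g = g\,g^{ab}\left(\Gamma^d_{ac}\,g_{db} + \Gamma^d_{bc}\,g_{ad}\right) = g\,\delta^a_d\,\Gamma^d_{ac} + g\,\delta^b_d\,\Gamma^d_{bc} = 2g\,\Gamma^a_{ac}. \tag{7.9}$$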
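The relation $\partial_c g = 2g\,\Gamma^a_{ac}$ can be checked numerically for a concrete metric. The sketch below assumes Euclidean 3-space in spherical polars $(r, \theta, \phi)$, for which $g = r^4 \sin^2\theta$, computes the full metric connection from $\Gamma^a_{bc} = \tfrac{1}{2} g^{ad}(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc})$ by central differences, and only then contracts it. It relies on the metric being diagonal, and the function names are hypothetical.

```python
import math


def metric_sph(x):
    """Metric of Euclidean 3-space in spherical polars (r, theta, phi)."""
    r, th = x[0], x[1]
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(th)) ** 2]]


def det_diag(g):
    """Determinant of a diagonal matrix (assumption: g is diagonal)."""
    d = 1.0
    for a in range(len(g)):
        d *= g[a][a]
    return d


def partial(f, x, c, h=1e-6):
    """Central-difference estimate of the partial derivative of f with respect to x^c."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    return (f(xp) - f(xm)) / (2 * h)


def christoffel(metric, x):
    """Gamma[a][b][c] = 1/2 g^{ad} (d_b g_dc + d_c g_db - d_d g_bc), diagonal metric assumed."""
    n = len(x)
    g = metric(x)
    ginv = [[(1.0 / g[a][a] if a == b else 0.0) for b in range(n)] for a in range(n)]
    # dg[d][a][b] ~ partial_d g_ab
    dg = [[[partial(lambda y: metric(y)[a][b], x, d) for b in range(n)]
           for a in range(n)] for d in range(n)]
    return [[[0.5 * sum(ginv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]


x = [2.0, 0.8, 0.3]
g_det = det_diag(metric_sph(x))
gamma = christoffel(metric_sph, x)

lhs = [partial(lambda y: det_diag(metric_sph(y)), x, c) for c in range(3)]
rhs = [2.0 * g_det * sum(gamma[a][a][c] for a in range(3)) for c in range(3)]
for c in range(3):
    print(lhs[c], rhs[c])
```

Analytically the contracted connection is $\Gamma^a_{ar} = 2/r$, $\Gamma^a_{a\theta} = \cot\theta$, and $\Gamma^a_{a\phi} = 0$, which reproduces the derivatives of $r^4\sin^2\theta$.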

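Let us compute the covariant derivative of $g$ using (7.2). Then, since $g$ is a scalar density of weight $+2$, we have

$$\nabla_c g = \partial_c g - 2g\,\Gamma^a_{ac},$$

and so by equation (7.9) it follows that

$$\nabla_c g = 0. \tag{7.10}$$

This is again intimately connected with the choice of the definition (7.2). Similarly, we find from equation (7.9) that

$$\partial_c(-g)^{1/2} - (-g)^{1/2}\,\Gamma^a_{ac} = 0,$$

that is, by (7.2),

$$\nabla_c(-g)^{1/2} = 0. \tag{7.11}$$

In particular, for any tensor $T^{a\cdots}_{b\cdots}$, this leads to the identity

$$\nabla_c\left[(-g)^{1/2}\,T^{a\cdots}_{b\cdots}\right] = (-g)^{1/2}\left(\nabla_c T^{a\cdots}_{b\cdots}\right), \tag{7.12}$$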