Principles of Quantum Mechanics, Second Edition
R. Shankar
Yale University, New Haven, Connecticut
Springer

Library of Congress Cataloging-in-Publication Data
Shankar, Ramamurti.
Principles of quantum mechanics / R. Shankar. 2nd ed.
p. cm. Includes bibliographical references and index.
ISBN 0-306-44790-8
1. Quantum theory. I. Title.
QC174.12.S52 1994
530.1'2-dc20 94-26837 CIP

ISBN 978-1-4757-0578-2
ISBN 978-1-4757-0576-8 (eBook)
DOI: 10.1007/978-1-4757-0576-8

© 1994, 1980 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.
19 18 (corrected printing, 2008)
springer.com

To My Parents and to Uma, Umesh, Ajeet, Meera, and Maya

Preface to the Second Edition

Over the decade and a half since I wrote the first edition, nothing has altered my belief in the soundness of the overall approach taken here. This is based on the response of teachers, students, and my own occasional rereading of the book. I was generally quite happy with the book, although there were portions where I felt I could have done better and portions which bothered me by their absence. I welcome this opportunity to rectify all that. Apart from small improvements scattered over the text, there are three major changes.
First, I have rewritten a big chunk of the mathematical introduction in Chapter 1. Next, I have added a discussion of time-reversal invariance. I don't know how it got left out the first time; I wish I could go back and change it. The most important change concerns the inclusion of Chapter 21, "Path Integrals: Part II." The first edition already revealed my partiality for this subject by having a chapter devoted to it, which was quite unusual in those days. In this one, I have cast off all restraint and gone all out to discuss many kinds of path integrals and their uses. Whereas in Chapter 8 the path integral recipe was simply given, here I start by deriving it. I derive the configuration space integral (the usual Feynman integral), phase space integral, and (oscillator) coherent state integral. I discuss two applications: the derivation and application of the Berry phase and a study of the lowest Landau level with an eye on the quantum Hall effect. The relevance of these topics is unquestionable. This is followed by a section on imaginary time path integrals, their description of tunneling, instantons, and symmetry breaking, and their relation to classical and quantum statistical mechanics. An introduction is given to the transfer matrix. Then I discuss spin coherent state path integrals and path integrals for fermions. These were thought to be topics too advanced for a book like this, but I believe this is no longer true. These concepts are extensively used and it seemed a good idea to provide the students who had the wisdom to buy this book with a head start. How are instructors to deal with this extra chapter given the time constraints? I suggest omitting some material from the earlier chapters. (No one I know, myself included, covers the whole book while teaching any fixed group of students.) A realistic option is for the instructor to teach part of Chapter 21 and assign the rest as reading material, as topics for take-home exams, term papers, etc.
To ignore it, I think, would be to lose a wonderful opportunity to expose the student to ideas that are central to many current research topics and to deny them the attendant excitement. Since the aim of this chapter is to guide students toward more frontline topics, it is more concise than the rest of the book. Students are also expected to consult the references given at the end of the chapter.

Over the years, I have received some very useful feedback and I thank all those students and teachers who took the time to do so. I thank Howard Haber for a discussion of the Born approximation; Harsh Mathur and Ady Stern for discussions of the Berry phase; Alan Chodos, Steve Girvin, Ilya Gruzberg, Martin Gutzwiller, Ganpathy Murthy, Charlie Sommerfeld, and Senthil Todari for many useful comments on Chapter 21. I am most grateful to Captain Richard F. Malm, U.S.C.G. (Retired), Professor Dr. D. Schlüter of the University of Kiel, and Professor V. Yakovenko of the University of Maryland for detecting numerous errors in the first printing and taking the trouble to bring them to my attention. I thank Amelia McNamara of Plenum for urging me to write this edition and Plenum for its years of friendly and warm cooperation. I thank Ron Johnson, Editor at Springer, for his tireless efforts on behalf of this book, and Chris Bostock, Daniel Keren, and Jimmy Snyder for their generous help in correcting errors in the 14th printing. Finally, I thank my wife Uma for shielding me as usual from real life so I could work on this edition, and my battery of kids (revised and expanded since the previous edition) for continually charging me up.

R. Shankar
New Haven, Connecticut

Preface to the First Edition

Publish and perish - Giordano Bruno

Given the number of books that already exist on the subject of quantum mechanics, one would think that the public needs one more as much as it does, say, the latest version of the Table of Integrals. But this does not deter me (as it didn't my predecessors) from trying to circulate my own version of how it ought to be taught. The approach to be presented here (to be described in a moment) was first tried on a group of Harvard undergraduates in the summer of '76, once again in the summer of '77, and more recently at Yale on undergraduates ('77-'78) and graduates ('78-'79) taking a year-long course on the subject. In all cases the results were very satisfactory in the sense that the students seemed to have learned the subject well and to have enjoyed the presentation. It is, in fact, their enthusiastic response and encouragement that convinced me of the soundness of my approach and impelled me to write this book.

The basic idea is to develop the subject from its postulates, after addressing some indispensable preliminaries. Now, most people would agree that the best way to teach any subject that has reached the point of development where it can be reduced to a few postulates is to start with the latter, for it is this approach that gives students the fullest understanding of the foundations of the theory and how it is to be used. But they would also argue that whereas this is all right in the case of special relativity or mechanics, a typical student about to learn quantum mechanics seldom has any familiarity with the mathematical language in which the postulates are stated. I agree with these people that this problem is real, but I differ in my belief that it should and can be overcome. This book is an attempt at doing just this. It begins with a rather lengthy chapter in which the relevant mathematics of vector spaces is developed from simple ideas on vectors and matrices the student is assumed to know. The level of rigor is what I think is needed to make a practicing quantum mechanic out of the student.
This chapter, which typically takes six to eight lecture hours, is filled with examples from physics to keep students from getting too fidgety while they wait for the "real physics." Since the math introduced has to be taught sooner or later, I prefer sooner to later, for this way the students, when they get to it, can give quantum theory their fullest attention without having to battle with the mathematical theorems at the same time. Also, by segregating the mathematical theorems from the physical postulates, any possible confusion as to which is which is nipped in the bud. This chapter is followed by one on classical mechanics, where the Lagrangian and Hamiltonian formalisms are developed in some depth. It is for the instructor to decide how much of this to cover; the more students know of these matters, the better they will understand the connection between classical and quantum mechanics. Chapter 3 is devoted to a brief study of idealized experiments that betray the inadequacy of classical mechanics and give a glimpse of quantum mechanics. Having trained and motivated the students, I now give them the postulates of quantum mechanics of a single particle in one dimension. I use the word "postulate" here to mean "that which cannot be deduced from pure mathematical or logical reasoning, and given which one can formulate and solve quantum mechanical problems and interpret the results." This is not the sense in which the true axiomatist would use the word. For instance, where the true axiomatist would just postulate that the dynamical variables are given by Hilbert space operators, I would add the operator identifications, i.e., specify the operators that represent coordinate and momentum (from which others can be built).
Likewise, I would not stop with the statement that there is a Hamiltonian operator that governs the time evolution through the equation iℏ ∂|ψ⟩/∂t = H|ψ⟩; I would say that H is obtained from the classical Hamiltonian by substituting for x and p the corresponding operators. While the more general axioms have the virtue of surviving as we progress to systems of more degrees of freedom, with or without classical counterparts, students given just these will not know how to calculate anything such as the spectrum of the oscillator. Now one can, of course, try to "derive" these operator assignments, but to do so one would have to appeal to ideas of a postulatory nature themselves. (The same goes for "deriving" the Schrödinger equation.) As we go along, these postulates are generalized to more degrees of freedom and it is for pedagogical reasons that these generalizations are postponed. Perhaps when students are finished with this book, they can free themselves from the specific operator assignments and think of quantum mechanics as a general mathematical formalism obeying certain postulates (in the strict sense of the term). The postulates in Chapter 4 are followed by a lengthy discussion of the same, with many examples from fictitious Hilbert spaces of three dimensions. Nonetheless, students will find it hard. It is only as they go along and see these postulates used over and over again in the rest of the book, in the setting up of problems and the interpretation of the results, that they will catch on to how the game is played. It is hoped they will be able to do it on their own when they graduate. I think that any attempt to soften this initial blow will be counterproductive in the long run. Chapter 5 deals with standard problems in one dimension. It is worth mentioning that the scattering off a step potential is treated using a wave packet approach.
If the subject seems too hard at this stage, the instructor may decide to return to it after Chapter 7 (oscillator), when students have gained more experience. But I think that sooner or later students must get acquainted with this treatment of scattering. The classical limit is the subject of the next chapter. The harmonic oscillator is discussed in detail in the next. It is the first realistic problem and the instructor may be eager to get to it as soon as possible. If the instructor wants, he or she can discuss the classical limit after discussing the oscillator. We next discuss the path integral formulation due to Feynman. Given the intuitive understanding it provides, and its elegance (not to mention its ability to give the full propagator in just a few minutes for a class of problems), its omission from so many books is hard to understand. While it is admittedly hard to actually evaluate a path integral (one example is provided here), the notion of expressing the propagator as a sum over amplitudes from various paths is rather simple. The importance of this point of view is becoming clearer day by day to workers in statistical mechanics and field theory. I think every effort should be made to include at least the first three (and possibly five) sections of this chapter in the course. The content of the remaining chapters is standard, in the first approximation. The style is of course peculiar to this author, as are the specific topics. For instance, an entire chapter (11) is devoted to symmetries and their consequences. The chapter on the hydrogen atom also contains a section on how to make numerical estimates starting with a few mnemonics. Chapter 15, on addition of angular momenta, also contains a section on how to understand the "accidental" degeneracies in the spectra of hydrogen and the isotropic oscillator. The quantization of the radiation field is discussed in Chapter 18, on time-dependent perturbation theory.
Finally, the treatment of the Dirac equation in the last chapter (20) is intended to show that several things such as electron spin, its magnetic moment, the spin-orbit interaction, etc., which were introduced in an ad hoc fashion in earlier chapters, emerge as a coherent whole from the Dirac equation, and also to give students a glimpse of what lies ahead. This chapter also explains how Feynman resolves the problem of negative-energy solutions (in a way that applies to bosons and fermions).

For Whom Is this Book Intended?

In writing it, I addressed students who are trying to learn the subject by themselves; that is to say, I made it as self-contained as possible, included a lot of exercises and answers to most of them, and discussed several tricky points that trouble students when they learn the subject. But I am aware that in practice it is most likely to be used as a class text. There is enough material here for a full-year graduate course. It is, however, quite easy to adapt it to a year-long undergraduate course. Several sections that may be omitted without loss of continuity are indicated. The sequence of topics may also be changed, as stated earlier in this preface. I thought it best to let the instructor skim through the book and chart the course for his or her class, given their level of preparation and objectives. Of course the book will not be particularly useful if the instructor is not sympathetic to the broad philosophy espoused here, namely, that first comes the mathematical training and then the development of the subject from the postulates. To instructors who feel that this approach is all right in principle but will not work in practice, I reiterate that it has been found to work in practice, not just by me but also by teachers elsewhere. The book may be used by nonphysicists as well. (I have found that it goes well with chemistry majors in my classes.)
Although I wrote it for students with no familiarity with the subject, any previous exposure can only be advantageous. Finally, I invite instructors and students alike to communicate to me any suggestions for improvement, whether they be pedagogical or in reference to errors or misprints.

Acknowledgments

As I look back to see who all made this book possible, my thoughts first turn to my brother R. Rajaraman and friend Rajaram Nityananda, who, around the same time, introduced me to physics in general and quantum mechanics in particular. Next come my students, particularly Doug Stone, but for whose encouragement and enthusiastic response I would not have undertaken this project. I am grateful to Professor Julius Kovacs of Michigan State, whose kind words of encouragement assured me that the book would be as well received by my peers as it was by my students. More recently, I have profited from numerous conversations with my colleagues at Yale, in particular Alan Chodos and Peter Mohr. My special thanks go to Charles Sommerfield, who managed to make time to read the manuscript and made many useful comments and recommendations. The detailed proofreading was done by Tom Moore. I thank you, the reader, in advance, for drawing to my notice any errors that may have slipped past us. The bulk of the manuscript production costs were borne by the J. W. Gibbs fellowship from Yale, which also supported me during the time the book was being written. Ms. Laurie Liptak did a fantastic job of typing the first 18 chapters and Ms. Linda Ford did the same with Chapters 19 and 20. The figures are by Mr. J. Brosious. Mr. R. Badrinath kindly helped with the index.† On the domestic front, encouragement came from my parents, my in-laws, and most important of all from my wife, Uma, who cheerfully donated me to science for a year or so and stood by me throughout.
Little Umesh did his bit by tearing up all my books on the subject, both as a show of support and to create a need for this one.

R. Shankar
New Haven, Connecticut

†It is a pleasure to acknowledge the help of Mr. Richard Hatch, who drew my attention to a number of errors in the first printing.

Prelude

Our description of the physical world is dynamic in nature and undergoes frequent change. At any given time, we summarize our knowledge of natural phenomena by means of certain laws. These laws adequately describe the phenomena studied up to that time, to an accuracy then attainable. As time passes, we enlarge the domain of observation and improve the accuracy of measurement. As we do so, we constantly check to see if the laws continue to be valid. Those laws that do remain valid gain in stature, and those that do not must be abandoned in favor of new ones that do. In this changing picture, the laws of classical mechanics formulated by Galileo, Newton, and later by Euler, Lagrange, Hamilton, Jacobi, and others, remained unaltered for almost three centuries. The expanding domain of classical physics met its first obstacles around the beginning of this century. The obstruction came on two fronts: at large velocities and small (atomic) scales. The problem of large velocities was successfully solved by Einstein, who gave us his relativistic mechanics, while the founders of quantum mechanics (Bohr, Heisenberg, Schrödinger, Dirac, Born, and others) solved the problem of small-scale physics. The union of relativity and quantum mechanics, needed for the description of phenomena involving simultaneously large velocities and small scales, turns out to be very difficult. Although much progress has been made in this subject, called quantum field theory, there remain many open questions to this date. We shall concentrate here on just the small-scale problem, that is to say, on non-relativistic quantum mechanics.
The passage from classical to quantum mechanics has several features that are common to all such transitions in which an old theory gives way to a new one:

(1) There is a domain D_n of phenomena described by the new theory and a subdomain D_o wherein the old theory is reliable (to a given accuracy).
(2) Within the subdomain D_o either theory may be used to make quantitative predictions. It might often be more expedient to employ the old theory.
(3) In addition to numerical accuracy, the new theory often brings about radical conceptual changes. Being of a qualitative nature, these will have a bearing on all of D_n.

For example, in the case of relativity, D_o and D_n represent (macroscopic) phenomena involving small and arbitrary velocities, respectively, the latter, of course, being bounded by the velocity of light. In addition to giving better numerical predictions for high-velocity phenomena, relativity theory also outlaws several cherished notions of the Newtonian scheme, such as absolute time, absolute length, unlimited velocities for particles, etc. In a similar manner, quantum mechanics brings with it not only improved numerical predictions for the microscopic world, but also conceptual changes that rock the very foundations of classical thought. This book introduces you to this subject, starting from its postulates. Between you and the postulates there stand three chapters wherein you will find a summary of the mathematical ideas appearing in the statement of the postulates, a review of classical mechanics, and a brief description of the empirical basis for the quantum theory. In the rest of the book, the postulates are invoked to formulate and solve a variety of quantum mechanical problems. It is hoped that by the time you get to the end of the book, you will be able to do the same yourself.

Note to the Student

Do as many exercises as you can, especially the ones marked * or whose results carry equation numbers.
The answer to each exercise is given either with the exercise or at the end of the book. The first chapter is very important. Do not rush through it. Even if you know the math, read it to get acquainted with the notation. I am not saying it is an easy subject. But I hope this book makes it seem reasonable. Good luck.

Contents

1. Mathematical Introduction
1.1. Linear Vector Spaces: Basics
1.2. Inner Product Spaces
1.3. Dual Spaces and the Dirac Notation
1.4. Subspaces
1.5. Linear Operators
1.6. Matrix Elements of Linear Operators
1.7. Active and Passive Transformations
1.8. The Eigenvalue Problem
1.9. Functions of Operators and Related Concepts
1.10. Generalization to Infinite Dimensions

2. Review of Classical Mechanics
2.1. The Principle of Least Action and Lagrangian Mechanics
2.2. The Electromagnetic Lagrangian
2.3. The Two-Body Problem
2.4. How Smart Is a Particle?
2.5. The Hamiltonian Formalism
2.6. The Electromagnetic Force in the Hamiltonian Scheme
2.7. Cyclic Coordinates, Poisson Brackets, and Canonical Transformations
2.8. Symmetries and Their Consequences

3. All Is Not Well with Classical Mechanics
3.1. Particles and Waves in Classical Physics
3.2. An Experiment with Waves and Particles (Classical)
3.3. The Double-Slit Experiment with Light
3.4. Matter Waves (de Broglie Waves)
3.5. Conclusions

4. The Postulates - a General Discussion
4.1. The Postulates
4.2. Discussion of Postulates I-III
4.3. The Schrödinger Equation (Dotting Your i's and Crossing Your ℏ's)

5. Simple Problems in One Dimension
5.1. The Free Particle
5.2. The Particle in a Box
5.3. The Continuity Equation for Probability
5.4. The Single-Step Potential: A Problem in Scattering
5.5. The Double-Slit Experiment
5.6. Some Theorems

6. The Classical Limit

7. The Harmonic Oscillator
7.1. Why Study the Harmonic Oscillator?
7.2. Review of the Classical Oscillator
7.3. Quantization of the Oscillator (Coordinate Basis)
7.4. The Oscillator in the Energy Basis
7.5. Passage from the Energy Basis to the X Basis

8. The Path Integral Formulation of Quantum Theory
8.1. The Path Integral Recipe
8.2. Analysis of the Recipe
8.3. An Approximation to U(t) for the Free Particle
8.4. Path Integral Evaluation of the Free-Particle Propagator
8.5. Equivalence to the Schrödinger Equation
8.6. Potentials of the Form V = a + bx + cx² + dẋ + exẋ

9. The Heisenberg Uncertainty Relations
9.1. Introduction
9.2. Derivation of the Uncertainty Relations
9.3. The Minimum Uncertainty Packet
9.4. Applications of the Uncertainty Principle
9.5. The Energy-Time Uncertainty Relation

10. Systems with N Degrees of Freedom
10.1. N Particles in One Dimension
10.2. More Particles in More Dimensions
10.3. Identical Particles

11. Symmetries and Their Consequences
11.1. Overview
11.2. Translational Invariance in Quantum Theory
11.3. Time Translational Invariance
11.4. Parity Invariance
11.5. Time-Reversal Symmetry

12. Rotational Invariance and Angular Momentum
12.1. Translations in Two Dimensions
12.2. Rotations in Two Dimensions
12.3. The Eigenvalue Problem of L_z
12.4. Angular Momentum in Three Dimensions
12.5. The Eigenvalue Problem of L² and L_z
12.6. Solution of Rotationally Invariant Problems

13. The Hydrogen Atom
13.1. The Eigenvalue Problem
13.2. The Degeneracy of the Hydrogen Spectrum
13.3. Numerical Estimates and Comparison with Experiment
13.4. Multielectron Atoms and the Periodic Table

14. Spin
14.1. Introduction
14.2. What Is the Nature of Spin?
14.3. Kinematics of Spin
14.4. Spin Dynamics
14.5. Return of Orbital Degrees of Freedom

15. Addition of Angular Momenta
15.1. A Simple Example
15.2. The General Problem
15.3. Irreducible Tensor Operators
15.4. Explanation of Some "Accidental" Degeneracies

16. Variational and WKB Methods
16.1. The Variational Method
16.2. The Wentzel-Kramers-Brillouin Method

17. Time-Independent Perturbation Theory
17.1. The Formalism
17.2. Some Examples
17.3. Degenerate Perturbation Theory

18. Time-Dependent Perturbation Theory
18.1. The Problem
18.2. First-Order Perturbation Theory
18.3. Higher Orders in Perturbation Theory
18.4. A General Discussion of Electromagnetic Interactions
18.5. Interaction of Atoms with Electromagnetic Radiation

19. Scattering Theory
19.1. Introduction
19.2. Recapitulation of One-Dimensional Scattering and Overview
19.3. The Born Approximation (Time-Dependent Description)
19.4. Born Again (The Time-Independent Approximation)
19.5. The Partial Wave Expansion
19.6. Two-Particle Scattering

20. The Dirac Equation
20.1. The Free-Particle Dirac Equation
20.2. Electromagnetic Interaction of the Dirac Particle
20.3. More on Relativistic Quantum Mechanics

21. Path Integrals: Part II
21.1. Derivation of the Path Integral
21.2. Imaginary Time Formalism
21.3. Spin and Fermion Path Integrals
21.4. Summary

Appendix
A.1. Matrix Inversion
A.2. Gaussian Integrals
A.3. Complex Numbers
A.4. The iε Prescription

Answers to Selected Exercises
Table of Constants
Index
1. Mathematical Introduction

The aim of this book is to provide you with an introduction to quantum mechanics, starting from its axioms. It is the aim of this chapter to equip you with the necessary mathematical machinery. All the math you will need is developed here, starting from some basic ideas on vectors and matrices that you are assumed to know. Numerous examples and exercises related to classical mechanics are given, both to provide some relief from the math and to demonstrate the wide applicability of the ideas developed here. The effort you put into this chapter will be well worth your while: not only will it prepare you for this course, but it will also unify many ideas you may have learned piecemeal. To really learn this chapter, you must, as with any other chapter, work out the problems.

1.1. Linear Vector Spaces: Basics

In this section you will be introduced to linear vector spaces. You are surely familiar with the arrows from elementary physics encoding the magnitude and direction of velocity, force, displacement, torque, etc. You know how to add them and multiply them by scalars and the rules obeyed by these operations. For example, you know that scalar multiplication is distributive: the multiple of a sum of two vectors is the sum of the multiples. What we want to do is abstract from this simple case a set of basic features or axioms, and say that any set of objects obeying the same forms a linear vector space. The cleverness lies in deciding which of the properties to keep in the generalization. If you keep too many, there will be no other examples; if you keep too few, there will be no interesting results to develop from the axioms. The following is the list of properties the mathematicians have wisely chosen as requisite for a vector space. As you read them, please compare them to the world of arrows and make sure that these are indeed properties possessed by these familiar vectors.
But note also that conspicuously missing are the requirements that every vector have a magnitude and direction, which was the first and most salient feature drilled into our heads when we first heard about them. So you might think that in dropping this requirement, the baby has been thrown out with the bath water. However, you will have ample time to appreciate the wisdom behind this choice as you go along and see a great unification and synthesis of diverse ideas under the heading of vector spaces. You will see examples of vector spaces that involve entities that you cannot intuitively perceive as having either a magnitude or a direction. While you should be duly impressed with all this, remember that it does not hurt at all to think of these generalizations in terms of arrows and to use the intuition to prove theorems or at the very least anticipate them.

Definition 1. A linear vector space 𝕍 is a collection of objects |1⟩, |2⟩, ..., |V⟩, ..., |W⟩, ..., called vectors, for which there exists

1. A definite rule for forming the vector sum, denoted |V⟩ + |W⟩
2. A definite rule for multiplication by scalars a, b, ..., denoted a|V⟩

with the following features:

• The result of these operations is another element of the space, a feature called closure: |V⟩ + |W⟩ ∈ 𝕍.
• Scalar multiplication is distributive in the vectors: a(|V⟩ + |W⟩) = a|V⟩ + a|W⟩.
• Scalar multiplication is distributive in the scalars: (a + b)|V⟩ = a|V⟩ + b|V⟩.
• Scalar multiplication is associative: a(b|V⟩) = ab|V⟩.
• Addition is commutative: |V⟩ + |W⟩ = |W⟩ + |V⟩.
• Addition is associative: |V⟩ + (|W⟩ + |Z⟩) = (|V⟩ + |W⟩) + |Z⟩.
• There exists a null vector |0⟩ obeying |V⟩ + |0⟩ = |V⟩.
• For every vector |V⟩ there exists an inverse under addition, |−V⟩, such that |V⟩ + |−V⟩ = |0⟩.

There is a good way to remember all of these; do what comes naturally.

Definition 2. The numbers a, b, ... are called the field over which the vector space is defined.
If the field consists of all real numbers, we have a real vector space; if they are complex, we have a complex vector space. The vectors themselves are neither real nor complex; the adjective applies only to the scalars.

Let us note that the above axioms imply

• |0⟩ is unique, i.e., if |0′⟩ has all the properties of |0⟩, then |0⟩ = |0′⟩.
• 0|V⟩ = |0⟩.
• |−V⟩ = −|V⟩.
• |−V⟩ is the unique additive inverse of |V⟩.

The proofs are left to the following exercise. You don't have to know the proofs, but you do have to know the statements.

Exercise 1.1.1. Verify these claims. For the first, consider |0⟩ + |0′⟩ and use the advertised properties of the two null vectors in turn. For the second, start with |0⟩ = (0 + 1)|V⟩ + |−V⟩. For the third, begin with |V⟩ + (−|V⟩) = 0|V⟩ = |0⟩. For the last, let |W⟩ also satisfy |V⟩ + |W⟩ = |0⟩. Since |0⟩ is unique, this means |V⟩ + |W⟩ = |V⟩ + |−V⟩. Take it from here.

Figure 1.1. The rule for vector addition. Note that it obeys axioms (i)-(iii).

Exercise 1.1.2. Consider the set of all entities of the form (a, b, c) where the entries are real numbers. Addition and scalar multiplication are defined as follows:

(a, b, c) + (d, e, f) = (a + d, b + e, c + f)
α(a, b, c) = (αa, αb, αc).

Write down the null vector and inverse of (a, b, c). Show that vectors of the form (a, b, 1) do not form a vector space.

Observe that we are using a new symbol |V⟩ to denote a generic vector. This object is called ket V and this nomenclature is due to Dirac, whose notation will be discussed at some length later. We purposely do not use the symbol V to denote the vectors, as the first step in weaning you away from the limited concept of the vector as an arrow. You are however not discouraged from associating with |V⟩ the arrowlike object till you have seen enough vectors that are not arrows and are ready to drop the crutch. You were asked to verify that the set of arrows qualified as a vector space as you read the axioms.
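The axioms of Definition 1 can also be checked mechanically. The following Python sketch (my own illustration, not part of the text; the function names `add` and `scale` are invented for it) verifies them numerically for the real triples of Exercise 1.1.2:

```python
# Sketch: checking the vector-space axioms for triples (a, b, c)
# of real numbers, with componentwise addition and scaling.

def add(v, w):
    """Vector sum: componentwise addition."""
    return tuple(x + y for x, y in zip(v, w))

def scale(alpha, v):
    """Scalar multiplication: stretch every component by alpha."""
    return tuple(alpha * x for x in v)

null = (0.0, 0.0, 0.0)  # the null vector |0>
v, w, z = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)
a, b = 2.0, -3.0

assert add(v, w) == add(w, v)                                # commutativity
assert add(v, add(w, z)) == add(add(v, w), z)                # associativity
assert add(v, null) == v                                     # null vector
assert add(v, scale(-1.0, v)) == null                        # additive inverse
assert scale(a, add(v, w)) == add(scale(a, v), scale(a, w))  # distributive in vectors
assert scale(a + b, v) == add(scale(a, v), scale(b, v))      # distributive in scalars
assert scale(a, scale(b, v)) == scale(a * b, v)              # associative scaling
```

All assertions pass. Note that the triples (a, b, 1) of the exercise fail closure under this very `add`: the sum of two of them has third entry 2, not 1.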
Here are some of the key ideas you should have gone over. The vector space consists of arrows, typical ones being V and V′. The rule for addition is familiar: take the tail of the second arrow, put it on the tip of the first, and so on, as in Fig. 1.1. Scalar multiplication by a corresponds to stretching the vector by a factor a. This is a real vector space since stretching by a complex number makes no sense. (If a is negative, we interpret it as changing the direction of the arrow as well as rescaling it by |a|.) Since these operations acting on arrows give more arrows, we have closure. Addition and scalar multiplication clearly have all the desired associative and distributive features. The null vector is the arrow of zero length, while the inverse of a vector is the vector reversed in direction. So the set of all arrows qualifies as a vector space. But we cannot tamper with it. For example, the set of all arrows with positive z-components do not form a vector space: there is no inverse.

Note that so far, no reference has been made to magnitude or direction. The point is that while the arrows have these qualities, members of a vector space need not. This statement is pointless unless I can give you examples, so here are two. Consider the set of all 2 × 2 matrices. We know how to add them and multiply them by scalars (multiply all four matrix elements by that scalar). The corresponding rules obey closure, associativity, and distributive requirements. The null matrix has all zeros in it and the inverse under addition of a matrix is the matrix with all elements negated. You must agree that here we have a genuine vector space consisting of things which don't have an obvious length or direction associated with them. When we want to highlight the fact that the matrix M is an element of a vector space, we may want to refer to it as, say, ket number 4 or |4⟩.

As a second example, consider all functions f(x) defined in an interval 0 ≤ x ≤ L.
We define scalar multiplication by a simply as af(x) and addition as pointwise addition: the sum of two functions f and g has the value f(x) + g(x) at the point x. The null function is zero everywhere and the additive inverse of f is −f.

Exercise 1.1.3. Do functions that vanish at the end points x = 0 and x = L form a vector space? How about periodic functions obeying f(0) = f(L)? How about functions that obey f(0) = 4? If the functions do not qualify, list the things that go wrong.

The next concept is that of linear independence of a set of vectors |1⟩, |2⟩, …, |n⟩. First consider a linear relation of the form

∑ᵢ₌₁ⁿ aᵢ|i⟩ = |0⟩    (1.1.1)

We may assume without loss of generality that the left-hand side does not contain any multiple of |0⟩, for if it did, it could be shifted to the right, and combined with the |0⟩ there to give |0⟩ once more. (We are using the fact that any multiple of |0⟩ equals |0⟩.)

Definition 3. The set of vectors is said to be linearly independent if the only such linear relation as Eq. (1.1.1) is the trivial one with all aᵢ = 0. If the set of vectors is not linearly independent, we say they are linearly dependent.

Equation (1.1.1) tells us that it is not possible to write any member of the linearly independent set in terms of the others. On the other hand, if the set of vectors is linearly dependent, such a relation will exist, and it must contain at least two nonzero coefficients. Let us say a₃ ≠ 0. Then we could write

|3⟩ = −∑ᵢ≠₃ (aᵢ/a₃)|i⟩    (1.1.2)

Consider, for example, two nonparallel arrows |1⟩ and |2⟩ in a plane; they form a linearly independent set. Suppose we bring in a third vector |3⟩, also in the plane. If it is parallel to either of the first two, we already have a linearly dependent set. So let us suppose it is not. But even now the three of them are linearly dependent. This is because we can write one of them, say |3⟩, as a linear combination of the other two. To find the combination, draw a line from the tail of |3⟩ in the direction of |1⟩. Next draw a line antiparallel to |2⟩ from the tip of |3⟩.
These lines will intersect since |1⟩ and |2⟩ are not parallel by assumption. The intersection point P will determine how much of |1⟩ and |2⟩ we want: we go from the tail of |3⟩ to P using the appropriate multiple of |1⟩ and go from P to the tip of |3⟩ using the appropriate multiple of |2⟩.

Exercise 1.1.4. Consider three elements from the vector space of real 2 × 2 matrices:

|1⟩ = [0 1; 0 0]    |2⟩ = [1 1; 0 1]    |3⟩ = [−2 −1; 0 −2]

Are they linearly independent? Support your answer with details. (Notice we are calling these matrices vectors and using kets to represent them to emphasize their role as elements of a vector space.)

Exercise 1.1.5. Show that the following row vectors are linearly dependent: (1, 1, 0), (1, 0, 1), and (3, 2, 1). Show the opposite for (1, 1, 0), (1, 0, 1), and (0, 1, 1).

Definition 4. A vector space has dimension n if it can accommodate a maximum of n linearly independent vectors. It will be denoted by 𝕍ⁿ(R) if the field is real and by 𝕍ⁿ(C) if the field is complex.

In view of the earlier discussions, the plane is two-dimensional and the set of all arrows not limited to the plane defines a three-dimensional vector space. How about 2 × 2 matrices? They form a four-dimensional vector space. Here is a proof. The following vectors are linearly independent:

|1⟩ = [1 0; 0 0]    |2⟩ = [0 1; 0 0]    |3⟩ = [0 0; 1 0]    |4⟩ = [0 0; 0 1]

since it is impossible to form linear combinations of any three of them to give the fourth: any three of them will have a zero in the one place where the fourth does not. So the space is at least four-dimensional. Could it be bigger? No, since any arbitrary 2 × 2 matrix can be written in terms of them:

[a b; c d] = a|1⟩ + b|2⟩ + c|3⟩ + d|4⟩

If the scalars a, b, c, d are real, we have a real four-dimensional space; if they are complex, we have a complex four-dimensional space.

Theorem 1. Any vector |V⟩ in an n-dimensional space can be written as a linear combination of n linearly independent vectors |1⟩ … |n⟩.
The proof is as follows: if there were a vector |V⟩ for which this were not possible, it would join the given set of vectors and form a set of n + 1 linearly independent vectors, which is not possible in an n-dimensional space by definition.

Definition 5. A set of n linearly independent vectors in an n-dimensional space is called a basis.

Thus we can write, on the strength of the above,

|V⟩ = ∑ᵢ₌₁ⁿ vᵢ|i⟩    (1.1.3)

where the vectors |i⟩ form a basis.

Definition 6. The coefficients of expansion vᵢ of a vector in terms of a linearly independent basis (|i⟩) are called the components of the vector in that basis.

Theorem 2. The expansion in Eq. (1.1.3) is unique.

Suppose the expansion is not unique. We must then have a second expansion:

|V⟩ = ∑ᵢ₌₁ⁿ vᵢ′|i⟩    (1.1.4)

Subtracting Eq. (1.1.4) from Eq. (1.1.3) (i.e., multiplying the second by the scalar −1 and adding the two equations) we get

|0⟩ = ∑ᵢ (vᵢ − vᵢ′)|i⟩    (1.1.5)

which implies that

vᵢ = vᵢ′    (1.1.6)

since the basis vectors are linearly independent and only a trivial linear relation between them can exist.

Note that given a basis the components are unique, but if we change the basis, the components will change. We refer to |V⟩ as the vector in the abstract, having an existence of its own and satisfying various relations involving other vectors. When we choose a basis the vectors assume concrete forms in terms of their components and the relation between vectors is satisfied by the components. Imagine, for example, three arrows in the plane, A, B, C, satisfying A + B = C according to the laws for adding arrows. So far no basis has been chosen and we do not need a basis to make the statement that the vectors form a closed triangle. Now we choose a basis and write each vector in terms of the components. The components will satisfy Cᵢ = Aᵢ + Bᵢ, i = 1, 2.
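This can be checked numerically. In the sketch below (NumPy; the arrows and the second basis are hypothetical choices of ours), the components of A, B, C are found in two different bases by solving the expansion equations; the components differ from basis to basis, but Cᵢ = Aᵢ + Bᵢ holds in both:

```python
# Sketch: three arrows A + B = C in the plane; components in two bases.
import numpy as np

A = np.array([1.0, 0.0])
B = np.array([0.0, 2.0])
C = A + B

basis1 = np.eye(2)                            # standard basis (columns)
basis2 = np.array([[1.0, 1.0],
                   [1.0, -1.0]])              # another independent pair (columns)

def components(vec, basis):
    # Solve basis @ comps = vec for the expansion coefficients.
    return np.linalg.solve(basis, vec)

A1, B1, C1 = (components(x, basis1) for x in (A, B, C))
A2, B2, C2 = (components(x, basis2) for x in (A, B, C))

assert not np.allclose(A1, A2)     # components change with the basis...
assert np.allclose(C1, A1 + B1)    # ...but C_i = A_i + B_i holds in basis 1
assert np.allclose(C2, A2 + B2)    # ...and in basis 2
```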
If we choose a different basis, the components will change in numerical value, but the relation between them expressing the equality of C to the sum of the other two will still hold between the new set of components.

In the case of nonarrow vectors, adding them in terms of components proceeds as in the elementary case thanks to the axioms. If

|V⟩ = ∑ᵢ vᵢ|i⟩    (1.1.7)

and

|W⟩ = ∑ᵢ wᵢ|i⟩    (1.1.8)

then

|V⟩ + |W⟩ = ∑ᵢ (vᵢ + wᵢ)|i⟩    (1.1.9)

where we have used the axioms to carry out the regrouping of terms. Here is the conclusion: To add two vectors, add their components. There is no reference to taking the tail of one and putting it on the tip of the other, etc., since in general the vectors have no head or tail. Of course, if we are dealing with arrows, we can add them either using the tail and tip routine or by simply adding their components in a basis. In the same way, we have:

a|V⟩ = ∑ᵢ avᵢ|i⟩    (1.1.10)

In other words, To multiply a vector by a scalar, multiply all its components by the scalar.

1.2. Inner Product Spaces

The matrix and function examples must have convinced you that we can have a vector space with no preassigned definition of length or direction for the elements. However, we can make up quantities that have the same properties that the lengths and angles do in the case of arrows. The first step is to define a sensible analog of the dot product, for in the case of arrows, from the dot product

A·B = |A||B| cos θ    (1.2.1)

we can read off the length of, say, A as √(A·A) and the cosine of the angle between two vectors as A·B/|A||B|. Now you might rightfully object: how can you use the dot product to define the length and angles, if the dot product itself requires knowledge of the lengths and angles? The answer is this.

Figure 1.2. Geometrical proof that the dot product obeys axiom (3) for an inner product. The axiom requires that the projections obey Pⱼ + Pₖ = Pⱼₖ.

Recall that the dot product has a second
equivalent expression in terms of the components:

A·B = ∑ᵢ AᵢBᵢ    (1.2.2)

Our goal is to define a similar formula for the general case where we do have the notion of components in a basis. To this end we recall the main features of the above dot product:

1. A·B = B·A (symmetry)
2. A·A ≥ 0, and A·A = 0 iff A = 0 (positive semidefiniteness)
3. A·(bB + cC) = bA·B + cA·C (linearity)

The linearity of the dot product is illustrated in Fig. 1.2.

We want to invent a generalization called the inner product or scalar product between any two vectors |V⟩ and |W⟩. We denote it by the symbol ⟨V|W⟩. It is once again a number (generally complex) dependent on the two vectors. We demand that it obey the following axioms:

• ⟨V|W⟩ = ⟨W|V⟩* (skew-symmetry)
• ⟨V|V⟩ ≥ 0, and ⟨V|V⟩ = 0 iff |V⟩ = |0⟩ (positive semidefiniteness)
• ⟨V|(a|W⟩ + b|Z⟩) ≡ ⟨V|aW + bZ⟩ = a⟨V|W⟩ + b⟨V|Z⟩ (linearity in ket)

Definition 7. A vector space with an inner product is called an inner product space.

Notice that we have not yet given an explicit rule for actually evaluating the scalar product; we are merely demanding that any rule we come up with must have these properties. With a view to finding such a rule, let us familiarize ourselves with the axioms. The first differs from the corresponding one for the dot product and makes the inner product sensitive to the order of the two factors, with the two choices leading to complex conjugates. In a real vector space this axiom states the symmetry of the dot product under exchange of the two vectors. For the present, let us note that this axiom ensures that ⟨V|V⟩ is real. The second axiom says that ⟨V|V⟩ is not just real but also positive semidefinite, vanishing only if the vector itself does. If we are going to define the length of the vector as the square root of its inner product with itself (as in the dot product) this quantity had better be real and positive for all nonzero vectors.
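The axioms can be made concrete with a numerical sketch. Below (NumPy; the sample vectors and scalars are arbitrary choices of ours) the standard component rule ⟨V|W⟩ = ∑ᵢ vᵢ*wᵢ of Eq. (1.2.5) is checked against skew-symmetry, positive semidefiniteness, linearity in the ket, and the antilinearity in the bra that will be derived next:

```python
# Sketch: the inner-product axioms checked for <V|W> = sum_i v_i* w_i on C^2.
import numpy as np

def inner(v, w):
    # Conjugate the components of the FIRST vector (the bra side).
    return np.sum(np.conj(v) * w)

V = np.array([1 + 2j, 3 - 1j])
W = np.array([2 - 1j, 1j])
Z = np.array([0.5j, 4.0])
a, b = 2 - 3j, 1 + 1j

skew_symmetry = np.isclose(inner(V, W), np.conj(inner(W, V)))
positivity = inner(V, V).real > 0 and np.isclose(inner(V, V).imag, 0.0)
linear_ket = np.isclose(inner(V, a * W + b * Z),
                        a * inner(V, W) + b * inner(V, Z))
antilinear_bra = np.isclose(inner(a * W + b * Z, V),
                            np.conj(a) * inner(W, V) + np.conj(b) * inner(Z, V))
```

Dropping the conjugation in `inner` would make `positivity` fail for complex components, which is exactly the point of the first axiom.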
The last axiom expresses the linearity of the inner product when a linear superposition a|W⟩ + b|Z⟩ ≡ |aW + bZ⟩ appears as the second vector in the scalar product. We have discussed its validity for the arrows case (Fig. 1.2).

What if the first factor in the product is a linear superposition, i.e., what is ⟨aW + bZ|V⟩? This is determined by the first axiom:

⟨aW + bZ|V⟩ = ⟨V|aW + bZ⟩*
            = (a⟨V|W⟩ + b⟨V|Z⟩)*
            = a*⟨V|W⟩* + b*⟨V|Z⟩*
            = a*⟨W|V⟩ + b*⟨Z|V⟩    (1.2.3)

which expresses the antilinearity of the inner product with respect to the first factor in the inner product. In other words, the inner product of a linear superposition with another vector is the corresponding superposition of inner products if the superposition occurs in the second factor, while it is the superposition with all coefficients conjugated if the superposition occurs in the first factor. This asymmetry, unfamiliar in real vector spaces, is here to stay and you will get used to it as you go along.

Let us continue with inner products. Even though we are trying to shed the restricted notion of a vector as an arrow and seeking a corresponding generalization of the dot product, we still use some of the same terminology.

Definition 8. We say that two vectors are orthogonal or perpendicular if their inner product vanishes.

Definition 9. We will refer to √⟨V|V⟩ ≡ |V| as the norm or length of the vector. A normalized vector has unit norm.

Definition 10. A set of basis vectors all of unit norm, which are pairwise orthogonal, will be called an orthonormal basis.

We will also frequently refer to the inner or scalar product as the dot product. We are now ready to obtain a concrete formula for the inner product in terms of the components. Given |V⟩ and |W⟩

|V⟩ = ∑ᵢ vᵢ|i⟩
|W⟩ = ∑ⱼ wⱼ|j⟩

we follow the axioms obeyed by the inner product to obtain:

⟨V|W⟩ = ∑ᵢ ∑ⱼ vᵢ*wⱼ⟨i|j⟩    (1.2.4)

To go any further we have to know ⟨i|j⟩, the inner product between basis vectors.
That depends on the details of the basis vectors and all we know for sure is that they are linearly independent. This situation exists for arrows as well. Consider a two-dimensional problem where the basis vectors are two linearly independent but nonperpendicular vectors. If we write all vectors in terms of this basis, the dot product of any two of them will likewise be a double sum with four terms (determined by the four possible dot products between the basis vectors) as well as the vector components. However, if we use an orthonormal basis such as î, ĵ, only diagonal terms like ⟨i|i⟩ will survive and we will get the familiar result A·B = AₓBₓ + A_yB_y, depending only on the components. For the more general nonarrow case, we invoke Theorem 3.

Theorem 3 (Gram-Schmidt). Given a linearly independent basis we can form linear combinations of the basis vectors to obtain an orthonormal basis.

Postponing the proof for a moment, let us assume that the procedure has been implemented and that the current basis is orthonormal:

⟨i|j⟩ = 1 for i = j
⟨i|j⟩ = 0 for i ≠ j

or, compactly, ⟨i|j⟩ = δᵢⱼ, where δᵢⱼ is called the Kronecker delta symbol. Feeding this into Eq. (1.2.4) we find that the double sum collapses to a single one due to the Kronecker delta, to give

⟨V|W⟩ = ∑ᵢ vᵢ*wᵢ    (1.2.5)

This is the form of the inner product we will use from now on. You can now appreciate the first axiom: but for the complex conjugation of the components of the first vector, ⟨V|V⟩ would not even be real, not to mention positive. Within a basis, the bra ⟨V| is represented by the row vector [v₁*, v₂*, …, vₙ*]. There is, however, nothing wrong with the first viewpoint of associating a scalar product with a pair of columns or kets (making no reference to another dual space) and living with the asymmetry between the first and second vector in the inner product (which one to transpose conjugate?). If you found the above discussion heavy going, you can temporarily ignore it.
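The procedure promised by Theorem 3 can be sketched in a few lines of code: subtract from each new vector its projections along the unit vectors already built, then normalize. (The starting basis below is a hypothetical choice of ours; `np.vdot` conjugates its first argument, matching ⟨e|v⟩.)

```python
# Sketch of the Gram-Schmidt procedure of Theorem 3.
import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        for e in basis:
            v = v - np.vdot(e, v) * e        # remove the component along |e>
        basis.append(v / np.linalg.norm(v))  # normalize the remainder
    return basis

# A linearly independent but non-orthogonal starting basis:
raw = [np.array([1.0, 1.0, 0.0]),
       np.array([1.0, 0.0, 1.0]),
       np.array([0.0, 1.0, 1.0])]
ortho = gram_schmidt(raw)

# The resulting inner products are <i|j> = delta_ij:
gram = np.array([[np.vdot(e, f) for f in ortho] for e in ortho])
assert np.allclose(gram, np.eye(3))
```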
The only thing you must remember is that in the case of a general nonarrow vector space:

• Vectors can still be assigned components in some orthonormal basis, just as with arrows, but these may be complex.
• The inner product of any two vectors is given in terms of these components by Eq. (1.2.5). This product obeys all the axioms.

1.3.1. Expansion of Vectors in an Orthonormal Basis

Suppose we wish to expand a vector |V⟩ in an orthonormal basis. To find the components that go into the expansion we proceed as follows. We take the dot product of both sides of the assumed expansion |V⟩ = ∑ᵢ vᵢ|i⟩ with |j⟩ (or the bra ⟨j|):

⟨j|V⟩ = ∑ᵢ vᵢ⟨j|i⟩ = vⱼ

i.e., the jth component of |V⟩ is the inner product of |j⟩ with |V⟩. Thus

|V⟩ = ∑ᵢ |i⟩⟨i|V⟩    (1.3.5)

Let us make sure the basis vectors look as they should. If we set |V⟩ = |j⟩ in Eq. (1.3.5), we find the correct answer: the ith component of the jth basis vector is δᵢⱼ. Thus for example the column representing basis vector number 4 will have a 1 in the 4th row and zero everywhere else. The abstract relation

|V⟩ = ∑ᵢ vᵢ|i⟩    (1.3.6)

becomes in this basis

[v₁, v₂, …, vₙ]ᵀ = v₁[1, 0, …, 0]ᵀ + v₂[0, 1, …, 0]ᵀ + ⋯ + vₙ[0, 0, …, 1]ᵀ    (1.3.7)

1.3.2. Adjoint Operation

We have seen that we may pass from the column representing a ket to the row representing the corresponding bra by the adjoint operation, i.e., transpose conjugation. Let us now ask: if |V⟩ is represented by the column [v₁, v₂, …, vₙ]ᵀ, what represents a|V⟩? It is the column [av₁, av₂, …, avₙ]ᵀ, whose adjoint is the row

[a*v₁*, a*v₂*, …, a*vₙ*] ↔ ⟨V|a*    (1.3.8)

It is customary to write a|V⟩ as |aV⟩ and the corresponding bra as ⟨aV|. What we have found is that

⟨aV| = ⟨V|a*    (1.3.9)

Since the relation between bras and kets is linear we can say that if we have an equation among kets such as

a|V⟩ = b|W⟩ + c|Z⟩ + ⋯    (1.3.10)

this implies another one among the corresponding bras:

⟨V|a* = ⟨W|b* + ⟨Z|c* + ⋯    (1.3.11)

The two equations above are said to be adjoints of each other. Just as any equation involving complex numbers implies another obtained by taking the complex conjugates of both sides, an equation between kets implies another one between bras, and vice versa.
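In a basis, all of this is transpose conjugation of columns and rows. The following sketch (NumPy; sample column of ours) verifies Eq. (1.3.9), ⟨aV| = ⟨V|a*:

```python
# Sketch: a ket as a column, its bra as the conjugate-transposed row,
# and the adjoint of a|V>.
import numpy as np

V = np.array([[1 + 1j],
              [2 - 1j]])          # column representing |V>
bra_V = V.conj().T                # row representing <V|

a = 2 - 3j
ket_aV = a * V                    # column for |aV> = a|V>
bra_aV = ket_aV.conj().T          # row for <aV|

# <aV| = <V| a*, Eq. (1.3.9):
adjoint_rule = np.allclose(bra_aV, bra_V * np.conj(a))
```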
If you think in a basis, you will see that this follows simply from the fact that if two columns are equal, so are their transpose conjugates.

Here is the rule for taking the adjoint:

To take the adjoint of a linear equation relating kets (bras), replace every ket (bra) by its bra (ket) and complex conjugate all coefficients.

We can extend this rule as follows. Suppose we have an expansion for a vector:

|V⟩ = ∑ᵢ₌₁ⁿ vᵢ|i⟩    (1.3.12)

in terms of basis vectors. The adjoint is

⟨V| = ∑ᵢ₌₁ⁿ ⟨i|vᵢ*

Recalling that vᵢ = ⟨i|V⟩ and vᵢ* = ⟨V|i⟩, it follows that

⟨V| = ∑ᵢ₌₁ⁿ ⟨V|i⟩⟨i|

Let us now take up the proof of the Gram-Schmidt theorem (Theorem 3): given a linearly independent basis |I⟩, |II⟩, …, we construct an orthonormal basis |1⟩, |2⟩, …. The first vector of the orthonormal basis is obtained by normalizing |I⟩. Clearly ⟨1|1⟩ = 1. As for the second vector in the basis, consider

|2′⟩ = |II⟩ − |1⟩⟨1|II⟩

which is |II⟩ minus the part pointing along the first unit vector. (Think of the arrow example as you read on.) Not surprisingly it is orthogonal to the latter:

⟨1|2′⟩ = ⟨1|II⟩ − ⟨1|1⟩⟨1|II⟩ = 0

1.5. Linear Operators

An operator Ω is an instruction for transforming any given vector |V⟩ into another vector |V′⟩:

Ω|V⟩ = |V′⟩    (1.5.1)

One says that the operator Ω has transformed the ket |V⟩ into the ket |V′⟩. We will restrict our attention throughout to operators Ω that do not take us out of the vector space, i.e., if |V⟩ is an element of a space 𝕍, so is |V′⟩ = Ω|V⟩. Operators can also act on bras:

⟨V′|Ω = ⟨V″|    (1.5.2)

We will only be concerned with linear operators, i.e., ones that obey the following rules:

Ωα|Vᵢ⟩ = αΩ|Vᵢ⟩    (1.5.3a)
Ω{α|Vᵢ⟩ + β|Vⱼ⟩} = αΩ|Vᵢ⟩ + βΩ|Vⱼ⟩    (1.5.3b)
⟨Vᵢ|αΩ = ⟨Vᵢ|Ωα    (1.5.4a)
(⟨Vᵢ|α + ⟨Vⱼ|β)Ω = α⟨Vᵢ|Ω + β⟨Vⱼ|Ω    (1.5.4b)

The simplest operator is the identity operator, I, which carries the instruction: Leave the vector alone! Thus,

I|V⟩ = |V⟩ for all kets |V⟩    (1.5.5)

and

⟨V|I = ⟨V| for all bras ⟨V|    (1.5.6)

We next pass on to a more interesting operator on 𝕍³(R):

R(½πî) ↔ Rotate vector by ½π about the unit vector î

[More generally, R(θ) stands for a rotation by an angle θ = |θ| about the axis parallel to the unit vector θ̂ = θ/θ.] Let us consider the action of this operator on the three unit vectors î, ĵ, and k̂, which in our notation will be denoted by |1⟩, |2⟩, and |3⟩ (see Fig. 1.3). From the figure it is clear that

R(½πî)|1⟩ = |1⟩    (1.5.7a)
R(½πî)|2⟩ = |3⟩    (1.5.7b)
R(½πî)|3⟩ = −|2⟩    (1.5.7c)

Clearly R(½πî) is linear.
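Anticipating the matrix representation that will be derived in Section 1.6, the action of R(½πî) and its linearity can be checked numerically (NumPy sketch of ours; the |1⟩, |2⟩, |3⟩ basis is represented by the standard columns):

```python
# Sketch: R(pi/2 about x-hat) on the |1>,|2>,|3> basis, Eqs. (1.5.7a)-(1.5.7c).
import numpy as np

R = np.array([[1.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
k1, k2, k3 = np.eye(3)            # columns for |1>, |2>, |3>

fixes_1 = np.allclose(R @ k1, k1)          # R|1> = |1>
maps_2_to_3 = np.allclose(R @ k2, k3)      # R|2> = |3>
maps_3_to_minus_2 = np.allclose(R @ k3, -k2)  # R|3> = -|2>
linear = np.allclose(R @ (k2 + k3), R @ k2 + R @ k3)  # R[|2>+|3>] = R|2>+R|3>
```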
For instance, it is clear from the same figure that R[|2⟩ + |3⟩] = R|2⟩ + R|3⟩. □

The nice feature of linear operators is that once their action on the basis vectors is known, their action on any vector in the space is determined. If

Ω|i⟩ = |i′⟩

for a basis |1⟩, |2⟩, …, |n⟩ in 𝕍ⁿ, then for any |V⟩ = ∑ᵢ vᵢ|i⟩

Ω|V⟩ = Ω ∑ᵢ vᵢ|i⟩ = ∑ᵢ vᵢΩ|i⟩ = ∑ᵢ vᵢ|i′⟩    (1.5.8)

This is the case in the example Ω = R(½πî). If |V⟩ = v₁|1⟩ + v₂|2⟩ + v₃|3⟩ is any vector, then

R|V⟩ = v₁R|1⟩ + v₂R|2⟩ + v₃R|3⟩ = v₁|1⟩ + v₂|3⟩ − v₃|2⟩

The product of two operators stands for the instruction that the instructions corresponding to the two operators be carried out in sequence:

ΛΩ|V⟩ = Λ(Ω|V⟩) = Λ|ΩV⟩    (1.5.9)

where |ΩV⟩ is the ket obtained by the action of Ω on |V⟩. The order of the operators in a product is very important: in general,

ΩΛ − ΛΩ ≡ [Ω, Λ]

called the commutator of Ω and Λ, isn't zero. For example, R(½πî) and R(½πĵ) do not commute, i.e., their commutator is nonzero.

Two useful identities involving commutators are

[Ω, ΛΘ] = Λ[Ω, Θ] + [Ω, Λ]Θ    (1.5.10)
[ΛΩ, Θ] = Λ[Ω, Θ] + [Λ, Θ]Ω    (1.5.11)

Notice that apart from the emphasis on ordering, these rules resemble the chain rule in calculus for the derivative of a product.

The inverse of Ω, denoted by Ω⁻¹, satisfies‡

ΩΩ⁻¹ = Ω⁻¹Ω = I    (1.5.12)

Not every operator has an inverse. The condition for the existence of the inverse is given in Appendix A.1. The operator R(½πî) has an inverse: it is R(−½πî). The inverse of a product of operators is the product of the inverses in reverse:

(ΩΛ)⁻¹ = Λ⁻¹Ω⁻¹    (1.5.13)

for only then do we have

(ΩΛ)(ΩΛ)⁻¹ = (ΩΛ)(Λ⁻¹Ω⁻¹) = ΩΛΛ⁻¹Ω⁻¹ = ΩΩ⁻¹ = I

‡ In 𝕍ⁿ(C) with n finite, ΩΩ⁻¹ = I ⇔ Ω⁻¹Ω = I. Prove this using the ideas introduced toward the end of Theorem A.1.1, Appendix A.1.

1.6. Matrix Elements of Linear Operators

We are now accustomed to the idea of an abstract vector being represented in a basis by an n-tuple of numbers, called its components, in terms of which all vector operations can be carried out.
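Before moving on, the commutator identities (1.5.10)-(1.5.11) and the inverse-of-a-product rule (1.5.13) can be checked numerically on arbitrary matrices (a NumPy sketch of ours; random matrices stand in for Ω, Λ, Θ):

```python
# Sketch: checking Eqs. (1.5.10), (1.5.11), and (1.5.13) on random matrices.
import numpy as np

rng = np.random.default_rng(0)
Om, La, Th = (rng.standard_normal((3, 3)) for _ in range(3))

def comm(a, b):
    return a @ b - b @ a          # the commutator [a, b]

# [Om, La Th] = La [Om, Th] + [Om, La] Th    (1.5.10)
id1 = np.allclose(comm(Om, La @ Th), La @ comm(Om, Th) + comm(Om, La) @ Th)
# [La Om, Th] = La [Om, Th] + [La, Th] Om    (1.5.11)
id2 = np.allclose(comm(La @ Om, Th), La @ comm(Om, Th) + comm(La, Th) @ Om)
# (Om La)^-1 = La^-1 Om^-1                   (1.5.13)
inv_ok = np.allclose(np.linalg.inv(Om @ La),
                     np.linalg.inv(La) @ np.linalg.inv(Om))
```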
We shall now see that in the same manner a linear operator can be represented in a basis by a set of n² numbers, written as an n × n matrix, and called its matrix elements in that basis. Although the matrix elements, just like the vector components, are basis dependent, they facilitate the computation of all basis-independent quantities, by rendering the abstract operator more tangible.

Our starting point is the observation made earlier, that the action of a linear operator is fully specified by its action on the basis vectors. If the basis vectors suffer a change

Ω|i⟩ = |i′⟩

(where |i′⟩ is known), then any vector in this space undergoes a change that is readily calculable:

Ω|V⟩ = Ω ∑ᵢ vᵢ|i⟩ = ∑ᵢ vᵢΩ|i⟩ = ∑ᵢ vᵢ|i′⟩

When we say |i′⟩ is known, we mean that its components in the original basis,

⟨j|i′⟩ = ⟨j|Ω|i⟩ ≡ Ωⱼᵢ    (1.6.1)

are known. The n² numbers Ωⱼᵢ are the matrix elements of Ω in this basis. If

Ω|V⟩ = |V′⟩

then the components of the transformed ket |V′⟩ are expressible in terms of the Ωᵢⱼ and the components of |V⟩:

vᵢ′ = ⟨i|V′⟩ = ⟨i|Ω|V⟩ = ⟨i|Ω(∑ⱼ vⱼ|j⟩) = ∑ⱼ vⱼ⟨i|Ω|j⟩ = ∑ⱼ Ωᵢⱼvⱼ    (1.6.2)

Equation (1.6.2) can be cast in matrix form:

[v₁′]   [⟨1|Ω|1⟩  ⟨1|Ω|2⟩  …  ⟨1|Ω|n⟩] [v₁]
[v₂′] = [⟨2|Ω|1⟩     …             …  ] [v₂]
[ ⋮ ]   [   ⋮                      ⋮  ] [ ⋮]
[vₙ′]   [⟨n|Ω|1⟩     …       ⟨n|Ω|n⟩] [vₙ]    (1.6.3)

A mnemonic: the elements of the first column are simply the components of the first transformed basis vector |1′⟩ = Ω|1⟩ in the given basis. Likewise, the elements of the jth column represent the image of the jth basis vector after Ω acts on it. Convince yourself that the same matrix Ωᵢⱼ acting to the left on the row vector corresponding to any ⟨V′| gives the row vector corresponding to ⟨V″| = ⟨V′|Ω.

Example 1.6.1.
Combining our mnemonic with the fact that the operator R(½πî) has the following effect on the basis vectors:

R(½πî)|1⟩ = |1⟩
R(½πî)|2⟩ = |3⟩
R(½πî)|3⟩ = −|2⟩

we can write down the matrix that represents it in the |1⟩, |2⟩, |3⟩ basis:

R(½πî) ↔ [1 0 0; 0 0 −1; 0 1 0]    (1.6.4)

For instance, the −1 in the third column tells us that R rotates |3⟩ into −|2⟩. One may also ignore the mnemonic altogether and simply use the definition Rᵢⱼ = ⟨i|R|j⟩ to compute the matrix. □

Exercise 1.6.1. An operator Ω is given by the matrix

[0 0 1; 1 0 0; 0 1 0]

What is its action?

Let us now consider certain specific operators and see how they appear in matrix form.

(1) The Identity Operator I.

Iᵢⱼ = ⟨i|I|j⟩ = ⟨i|j⟩ = δᵢⱼ    (1.6.5)

Thus I is represented by a diagonal matrix with 1's along the diagonal. You should verify that our mnemonic gives the same result.

(2) The Projection Operators. Let us first get acquainted with projection operators. Consider the expansion of an arbitrary ket |V⟩ in a basis:

|V⟩ = ∑ᵢ₌₁ⁿ |i⟩⟨i|V⟩

In terms of the objects |i⟩⟨i|, which are linear operators, and which, by definition, act on |V⟩ to give |i⟩⟨i|V⟩, we may write the above as

|V⟩ = (∑ᵢ₌₁ⁿ |i⟩⟨i|)|V⟩    (1.6.6)

Since Eq. (1.6.6) is true for all |V⟩, the object in the brackets must be identified with the identity (operator)

I = ∑ᵢ₌₁ⁿ |i⟩⟨i| = ∑ᵢ₌₁ⁿ ℙᵢ    (1.6.7)

The object ℙᵢ = |i⟩⟨i| is called the projection operator for the ket |i⟩. Equation (1.6.7), which is called the completeness relation, expresses the identity as a sum over projection operators and will be invaluable to us. (If you think that any time spent on the identity, which seems to do nothing, is a waste of time, just wait and see.)

Consider

ℙᵢ|V⟩ = |i⟩⟨i|V⟩ = vᵢ|i⟩    (1.6.8)

Clearly ℙᵢ is linear. Notice that whatever |V⟩ is, ℙᵢ|V⟩ is a multiple of |i⟩ with a coefficient vᵢ which is the component of |V⟩ along |i⟩. Since ℙᵢ projects out the component of any ket |V⟩ along the direction |i⟩, it is called a projection operator. The completeness relation, Eq. (1.6.
7), says that the sum of the projections of a vector along all the n directions equals the vector itself. Projection operators can also act on bras in the same way:

⟨V|ℙᵢ = ⟨V|i⟩⟨i| = vᵢ*⟨i|    (1.6.9)

Projection operators corresponding to the basis vectors obey

ℙᵢℙⱼ = |i⟩⟨i|j⟩⟨j| = δᵢⱼℙⱼ    (1.6.10)

This equation tells us that (1) once ℙᵢ projects out the part of |V⟩ along |i⟩, further applications of ℙᵢ make no difference; and (2) the subsequent application of ℙⱼ (j ≠ i) will result in zero, since a vector entirely along |i⟩ cannot have a projection along a perpendicular direction |j⟩.

Figure 1.4. P_y and Pₓ are polarizers placed in the way of a beam traveling along the z axis. The action of the polarizers on the electric field E obeys the law of combination of projection operators: PᵢPⱼ = δᵢⱼPⱼ.

The following example from optics may throw some light on the discussion. Consider a beam of light traveling along the z axis and polarized in the x-y plane at an angle θ with respect to the y axis (see Fig. 1.4). If a polarizer P_y, that only admits light polarized along the y axis, is placed in the way, the projection E cos θ along the y axis is transmitted. An additional polarizer P_y placed in the way has no further effect on the beam. We may equate the action of the polarizer to that of a projection operator ℙ_y that acts on the electric field vector E. If P_y is followed by a polarizer Pₓ the beam is completely blocked. Thus the polarizers obey the equation PᵢPⱼ = δᵢⱼPⱼ, expected of projection operators.

Let us next turn to the matrix elements of ℙᵢ. There are two approaches. The first one, somewhat indirect, gives us a feeling for what kind of an object |i⟩⟨i| is. The adjoint of a general equation involving kets, bras, operators, and scalars is obtained by reversing the order of all factors and making the substitutions Ω ↔ Ω†, |⟩ ↔ ⟨|, a ↔ a*. (Of course, there is no real need to reverse the location of the scalars a except in the interest of uniformity.)

Hermitian, Anti-Hermitian, and Unitary Operators

We now turn our attention to certain special classes of operators that will play a major role in quantum mechanics.

Definition 13.
An operator Ω is Hermitian if Ω† = Ω.

Definition 14. An operator Ω is anti-Hermitian if Ω† = −Ω.

The adjoint is to an operator what the complex conjugate is to numbers. Hermitian and anti-Hermitian operators are like pure real and pure imaginary numbers. Just as every number may be decomposed into a sum of pure real and pure imaginary parts,

a = (a + a*)/2 + (a − a*)/2

we can decompose every operator into its Hermitian and anti-Hermitian parts:

Ω = (Ω + Ω†)/2 + (Ω − Ω†)/2    (1.6.18)

Exercise 1.6.2.* Given Ω and Λ are Hermitian, what can you say about (1) ΩΛ; (2) ΩΛ + ΛΩ; (3) [Ω, Λ]; and (4) i[Ω, Λ]?

Definition 15. An operator U is unitary if

UU† = I    (1.6.19)

This equation tells us that U and U† are inverses of each other. Consequently, from Eq. (1.5.12),

U†U = I    (1.6.20)

Following the analogy between operators and numbers, unitary operators are like complex numbers of unit modulus, u = e^{iθ}. Just as u*u = 1, so is U†U = I.

Exercise 1.6.3.* Show that a product of unitary operators is unitary.

Theorem 7. Unitary operators preserve the inner product between the vectors they act on.

Proof. Let

|V₁′⟩ = U|V₁⟩ and |V₂′⟩ = U|V₂⟩

Then

⟨V₂′|V₁′⟩ = ⟨UV₂|UV₁⟩ = ⟨V₂|U†U|V₁⟩ = ⟨V₂|V₁⟩    (1.6.21)

(Q.E.D.)

Unitary operators are the generalizations of rotation operators from 𝕍³(R) to 𝕍ⁿ(C), for just like rotation operators in three dimensions, they preserve the lengths of vectors and their dot products. In fact, on a real vector space, the unitarity condition becomes U⁻¹ = Uᵀ (T means transpose), which defines an orthogonal or rotation matrix. [R(½πî) is an example.]

Theorem 8. If one treats the columns of an n × n unitary matrix as components of n vectors, these vectors are orthonormal. In the same way, the rows may be interpreted as components of n orthonormal vectors.

Proof 1. According to our mnemonic, the jth column of the matrix representing U is the image of the jth basis vector after U acts on it. Since U preserves inner products, the rotated set of vectors is also orthonormal. Consider next the rows.
We now use the fact that U† is also a rotation. (How else can it neutralize U to give U†U = I?) Since the rows of U are the columns of U† (but for an overall complex conjugation, which does not affect the question of orthonormality), the result we already have for the columns of a unitary matrix tells us the rows of U are orthonormal.

Proof 2. Since U†U = I,

δᵢⱼ = ⟨i|I|j⟩ = ⟨i|U†U|j⟩ = ∑ₖ ⟨i|U†|k⟩⟨k|U|j⟩ = ∑ₖ U†ᵢₖUₖⱼ = ∑ₖ U*ₖᵢUₖⱼ    (1.6.22)

which proves the theorem for the columns. A similar result for the rows follows if we start with the equation UU† = I. Q.E.D.

Note that U†U = I and UU† = I are not independent conditions.

Exercise 1.6.4.* It is assumed that you know (1) what a determinant is, (2) that det Ωᵀ = det Ω (T denotes transpose), (3) that the determinant of a product of matrices is the product of the determinants. [If you do not, verify these properties for a two-dimensional case with Ω = [α β; γ δ], for which det Ω = αδ − βγ.] Prove that the determinant of a unitary matrix is a complex number of unit modulus.

Exercise 1.6.5.* Verify that R(½πî) is unitary (orthogonal) by examining its matrix.

Exercise 1.6.6. Verify that the following matrices are unitary:

(1/2¹ᐟ²)[1 i; i 1]    (1/2)[1+i 1−i; 1−i 1+i]

Verify that the determinant is of the form e^{iθ} in each case. Are any of the above matrices Hermitian?

1.7. Active and Passive Transformations

Suppose we subject all the vectors |V⟩ in a space to a unitary transformation

|V⟩ → U|V⟩    (1.7.1)

Under this transformation, the matrix elements of any operator Ω are modified as follows:

⟨V′|Ω|V⟩ → ⟨UV′|Ω|UV⟩ = ⟨V′|U†ΩU|V⟩    (1.7.2)

It is clear that the same change would be effected if we left the vectors alone and subjected all operators to the change

Ω → U†ΩU    (1.7.3)

The first case is called an active transformation and the second a passive transformation. The present nomenclature is in reference to the vectors: they are affected in an active transformation and left alone in the passive case.
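The equivalence of Eqs. (1.7.2) and (1.7.3), together with the unitarity of the columns (Theorem 8) and the trace invariance of Exercise 1.7.1, can be checked numerically (a NumPy sketch of ours; a random unitary is built from a QR factorization):

```python
# Sketch: active vs. passive transformations give the same matrix elements.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)                 # a unitary matrix (columns orthonormal)
columns_orthonormal = np.allclose(U.conj().T @ U, np.eye(3))

Om = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
V1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
V2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)

active = np.vdot(U @ V2, Om @ (U @ V1))           # <UV2|Om|UV1>, Eq. (1.7.2)
passive = np.vdot(V2, U.conj().T @ Om @ U @ V1)   # <V2|U'Om U|V1>, Eq. (1.7.3)
same_matrix_element = np.isclose(active, passive)

# Tr(Om) is unchanged by the unitary change of basis (Exercise 1.7.1):
trace_invariant = np.isclose(np.trace(Om), np.trace(U.conj().T @ Om @ U))
cyclic = np.isclose(np.trace(Om @ M), np.trace(M @ Om))   # Tr(AB) = Tr(BA)
```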
The situation is exactly the opposite from the point of view of the operators. Later we will see that the physics in quantum theory lies in the matrix elements of operators, and that active and passive transformations provide us with two equivalent ways of describing the same physical transformation.

Exercise 1.7.1.* The trace of a matrix is defined to be the sum of its diagonal matrix elements:

Tr Ω = ∑ᵢ Ωᵢᵢ

Show that
(1) Tr(ΩΛ) = Tr(ΛΩ)
(2) Tr(ΩΛΘ) = Tr(ΛΘΩ) = Tr(ΘΩΛ) (the permutations are cyclic).
(3) The trace of an operator is unaffected by a unitary change of basis |i⟩ → U|i⟩. [Equivalently, show Tr Ω = Tr(U†ΩU).]

Exercise 1.7.2. Show that the determinant of a matrix is unaffected by a unitary change of basis. [Equivalently, show det Ω = det(U†ΩU).]

1.8. The Eigenvalue Problem

Consider some linear operator Ω acting on an arbitrary nonzero ket |V⟩:

Ω|V⟩ = |V′⟩    (1.8.1)

Unless the operator happens to be a trivial one, such as the identity or its multiple, the ket will suffer a nontrivial change, i.e., |V′⟩ will not be simply related to |V⟩. So much for an arbitrary ket. Each operator, however, has certain kets of its own, called its eigenkets, on which its action is simply that of rescaling:

Ω|V⟩ = ω|V⟩    (1.8.2)

Equation (1.8.2) is an eigenvalue equation: |V⟩ is an eigenket of Ω with eigenvalue ω. In this chapter we will see how, given an operator Ω, one can systematically determine all its eigenvalues and eigenvectors. How such an equation enters physics will be illustrated by a few examples from mechanics at the end of this section, and once we get to quantum mechanics proper, it will be eigen, eigen, eigen all the way.

Example 1.8.1. To illustrate how easy the eigenvalue problem really is, we will begin with a case that will be completely solved: the case Ω = I. Since

I|V⟩ = |V⟩

for all |V⟩, we conclude that (1) the only eigenvalue of I is 1; (2) all vectors are its eigenvectors with this eigenvalue. □

Example 1.8.2.
After this unqualified success, we are encouraged to take on a slightly more difficult case: $\Omega = P_V$, the projection operator associated with a normalized ket $|V\rangle$. Clearly
(1) any ket $\alpha|V\rangle$, parallel to $|V\rangle$, is an eigenket with eigenvalue 1:
$$P_V(\alpha|V\rangle) = \alpha|V\rangle\langle V|V\rangle = \alpha|V\rangle$$
(2) any ket $\alpha|V_\perp\rangle$, perpendicular to $|V\rangle$, is an eigenket with eigenvalue 0:
$$P_V(\alpha|V_\perp\rangle) = \alpha|V\rangle\langle V|V_\perp\rangle = 0 = 0\cdot\alpha|V_\perp\rangle$$
(3) kets that are neither, i.e., of the form $\alpha|V\rangle + \beta|V_\perp\rangle$, are not eigenkets:
$$P_V(\alpha|V\rangle + \beta|V_\perp\rangle) = \alpha|V\rangle \neq \gamma(\alpha|V\rangle + \beta|V_\perp\rangle)$$
Since every ket in the space falls into one of the above classes, we have found all the eigenvalues and eigenvectors. □

Example 1.8.3. Consider now the operator $R(\tfrac{1}{2}\pi \mathbf{i})$. We already know that it has one eigenket, the basis vector $|1\rangle$ along the $x$ axis:
$$R(\tfrac{1}{2}\pi \mathbf{i})|1\rangle = |1\rangle$$
Are there others? Of course, any vector $\alpha|1\rangle$ along the $x$ axis is also unaffected by the $x$ rotation. This is a general feature of the eigenvalue equation and reflects the linearity of the operator: if
$$\Omega|V\rangle = \omega|V\rangle$$
then
$$\Omega\alpha|V\rangle = \alpha\Omega|V\rangle = \alpha\omega|V\rangle = \omega\alpha|V\rangle$$
for any multiple $\alpha$. Since the eigenvalue equation fixes the eigenvector only up to an overall scale factor, we will not treat the multiples of an eigenvector as distinct eigenvectors. With this understanding in mind, let us ask if $R(\tfrac{1}{2}\pi \mathbf{i})$ has any eigenvectors besides $|1\rangle$. Our intuition says no, for any vector not along the $x$ axis necessarily gets rotated by $R(\tfrac{1}{2}\pi \mathbf{i})$ and cannot possibly transform into a multiple of itself. Since every vector is either parallel to $|1\rangle$ or isn't, we have fully solved the eigenvalue problem.

The trouble with this conclusion is that it is wrong! $R(\tfrac{1}{2}\pi \mathbf{i})$ has two other eigenvectors besides $|1\rangle$. But our intuition is not to be blamed, for these vectors are in $\mathbb{V}^3(C)$ and not $\mathbb{V}^3(R)$. It is clear from this example that we need a reliable and systematic method for solving the eigenvalue problem in $\mathbb{V}^n(C)$. We now turn our attention to this very question. □

The Characteristic Equation and the Solution to the Eigenvalue Problem

We begin by rewriting Eq.
(1.8.2) as
$$(\Omega - \omega I)|V\rangle = |0\rangle \qquad (1.8.3)$$
Operating on both sides with $(\Omega - \omega I)^{-1}$, assuming it exists, we get
$$|V\rangle = (\Omega - \omega I)^{-1}|0\rangle \qquad (1.8.4)$$
Now, any finite operator (an operator with finite matrix elements) acting on the null vector can only give us a null vector. It therefore seems that in asking for a nonzero eigenvector $|V\rangle$, we are trying to get something for nothing out of Eq. (1.8.4). This is impossible. It follows that our assumption that the operator $(\Omega - \omega I)^{-1}$ exists (as a finite operator) is false. So we ask when this situation will obtain. Basic matrix theory tells us (see Appendix A.1) that the inverse of any matrix $M$ is given by
$$M^{-1} = \frac{\text{cofactor } M^T}{\det M} \qquad (1.8.5)$$
Now the cofactor of $M$ is finite if $M$ is. Thus what we need is the vanishing of the determinant. The condition for nonzero eigenvectors is therefore
$$\det(\Omega - \omega I) = 0 \qquad (1.8.6)$$
This equation will determine the eigenvalues $\omega$. To find them, we project Eq. (1.8.3) onto a basis; dotting both sides with a basis bra and expanding the determinant condition turns Eq. (1.8.6) into an $n$th-order polynomial equation in $\omega$, the characteristic equation, whose $n$ roots (counted with multiplicity) are the eigenvalues. For the Hermitian operators of greatest interest to us, two results then hold. Theorem 9: the eigenvalues of a Hermitian operator are real. Theorem 10: to every Hermitian operator $\Omega$ there corresponds (at least) one orthonormal basis of eigenvectors, in which $\Omega$ is diagonal with its eigenvalues as the diagonal entries.

To prove Theorem 10, let $\omega_1$ be a root of the characteristic equation; there exists at least one nonzero eigenvector $|\omega_1\rangle$ with this eigenvalue. [If not, Theorem (A.1.1) would imply that $(\Omega - \omega_1 I)$ is invertible.] Consider the subspace $\mathbb{V}^{n-1}_{\perp 1}$ of all vectors orthogonal to $|\omega_1\rangle$. Let us choose as our basis the vector $|\omega_1\rangle$ (normalized to unity) and any $n-1$ orthonormal vectors $\{|V^1_\perp\rangle, |V^2_\perp\rangle, \ldots, |V^{n-1}_\perp\rangle\}$ in $\mathbb{V}^{n-1}_{\perp 1}$. In this basis $\Omega$ has the following form:
$$\Omega \leftrightarrow \begin{bmatrix} \omega_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & \text{(boxed submatrix)} & \\ 0 & & & \end{bmatrix} \qquad (1.8.12)$$
The first column is just the image of $|\omega_1\rangle$ after $\Omega$ has acted on it. Given the first column, the first row follows from the Hermiticity of $\Omega$. The characteristic equation now takes the form
$$(\omega_1 - \omega)\cdot(\text{determinant of boxed submatrix}) = 0$$
$$(\omega_1 - \omega)\sum_{m=0}^{n-1} c_m\omega^m = (\omega_1 - \omega)P^{n-1}(\omega) = 0$$
Now the polynomial $P^{n-1}$ must also generate one root, $\omega_2$, and a normalized eigenvector $|\omega_2\rangle$. Define the subspace $\mathbb{V}^{n-2}_{\perp 1,2}$ of vectors in $\mathbb{V}^{n-1}_{\perp 1}$ orthogonal to $|\omega_2\rangle$ (and automatically to $|\omega_1\rangle$) and repeat the same procedure as before. Finally, the matrix $\Omega$ becomes, in the basis $|\omega_1\rangle, |\omega_2\rangle, \ldots, |\omega_n\rangle$,
$$\Omega \leftrightarrow \begin{bmatrix} \omega_1 & 0 & \cdots & 0 \\ 0 & \omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \omega_n \end{bmatrix}$$
Since every $|\omega_i\rangle$ was chosen from a space that was orthogonal to the previous ones, $|\omega_1\rangle, |\omega_2\rangle, \ldots, |\omega_{i-1}\rangle$, the basis of eigenvectors is orthonormal. (Notice that nowhere did we have to assume that the eigenvalues were all distinct.) Q.E.D.

[The analogy between real numbers and Hermitian operators is further strengthened by the fact that in a certain basis (of eigenvectors) the Hermitian operator can be represented by a matrix with all real elements.]

In stating Theorem 10, it was indicated that there might exist more than one basis of eigenvectors that diagonalizes $\Omega$. This happens if there is any degeneracy. Suppose $\omega_1 = \omega_2 = \omega$. Then we have two orthonormal vectors obeying
$$\Omega|\omega_1\rangle = \omega|\omega_1\rangle, \qquad \Omega|\omega_2\rangle = \omega|\omega_2\rangle$$
It follows that
$$\Omega(\alpha|\omega_1\rangle + \beta|\omega_2\rangle) = \omega(\alpha|\omega_1\rangle + \beta|\omega_2\rangle)$$
for any $\alpha$ and $\beta$. Since the vectors $|\omega_1\rangle$ and $|\omega_2\rangle$ are orthogonal (and hence linearly independent), we find that there is a whole two-dimensional subspace spanned by $|\omega_1\rangle$ and $|\omega_2\rangle$, the elements of which are eigenvectors of $\Omega$ with eigenvalue $\omega$. One refers to this space as an eigenspace of $\Omega$ with eigenvalue $\omega$. Besides the vectors $|\omega_1\rangle$ and $|\omega_2\rangle$, there exists an infinity of orthonormal pairs $|\omega_1'\rangle, |\omega_2'\rangle$, obtained by a rigid rotation of $|\omega_1\rangle, |\omega_2\rangle$, from which we may select any pair in forming the eigenbasis of $\Omega$.

In general, if an eigenvalue occurs $m_i$ times, that is, if the characteristic equation has $m_i$ of its roots equal to some $\omega_i$, there will be an eigenspace $\mathbb{V}^{m_i}_{\omega_i}$ from which we may choose any $m_i$ orthonormal vectors to form the basis referred to in Theorem 10.

In the absence of degeneracy, we can prove Theorems 9 and 10 very easily. Let us begin with two eigenvectors:
$$\Omega|\omega_i\rangle = \omega_i|\omega_i\rangle \qquad (1.8.13a)$$
$$\Omega|\omega_j\rangle = \omega_j|\omega_j\rangle \qquad (1.8.13b)$$
Dotting the first with $\langle\omega_j|$ and the second with $\langle\omega_i|$, we get
$$\langle\omega_j|\Omega|\omega_i\rangle = \omega_i\langle\omega_j|\omega_i\rangle \qquad (1.8.14a)$$
$$\langle\omega_i|\Omega|\omega_j\rangle = \omega_j\langle\omega_i|\omega_j\rangle \qquad (1.8.14b)$$
Taking the adjoint of the last equation and using the Hermitian nature of $\Omega$, we get
$$\langle\omega_j|\Omega|\omega_i\rangle = \omega_j^*\langle\omega_j|\omega_i\rangle$$
Subtracting this equation from Eq. (1.8.14a), we get
$$0 = (\omega_i - \omega_j^*)\langle\omega_j|\omega_i\rangle \qquad (1.8.15)$$
If $i = j$, we get, since $\langle\omega_i|\omega_i\rangle \neq 0$,
$$\omega_i = \omega_i^* \qquad (1.8.16)$$
If $i \neq j$, we get
$$\langle\omega_j|\omega_i\rangle = 0 \qquad (1.8.17)$$
since $\omega_i - \omega_j^* = \omega_i - \omega_j \neq 0$ by assumption.
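Theorems 9 and 10 can be illustrated numerically. The sketch below (not from the text; it assumes NumPy, and the Hermitian matrix is an arbitrary example) checks that the roots of $\det(\Omega - \omega I) = 0$ are real and coincide with the eigenvalues, and that the eigenvectors form an orthonormal basis in which $\Omega$ is diagonal.

```python
import numpy as np

# An arbitrary Hermitian matrix, chosen for illustration only.
Omega = np.array([[2.0, 1.0 + 1j, 0.0],
                  [1.0 - 1j, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
assert np.allclose(Omega, Omega.conj().T)          # Hermitian

# Roots of the characteristic equation det(Omega - w I) = 0 ...
char_roots = np.sort(np.roots(np.poly(Omega)).real)

# ... agree with the eigenvalues, which Theorem 9 says are real.
w, V = np.linalg.eigh(Omega)                       # columns of V are eigenvectors
assert np.allclose(np.sort(w), char_roots)

# Theorem 10: the eigenvectors form an orthonormal basis that diagonalizes Omega.
assert np.allclose(V.conj().T @ V, np.eye(3))
assert np.allclose(V.conj().T @ Omega @ V, np.diag(w))
print("Theorems 9 and 10 verified numerically")
```

Here `np.poly` builds the characteristic polynomial and `np.roots` solves it, mirroring the route through Eq. (1.8.6), while `eigh` plays the role of the constructive proof of Theorem 10.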
That the proof of orthogonality breaks down for $\omega_i = \omega_j$ is not surprising, for two vectors labeled by a degenerate eigenvalue could be any two members of the degenerate eigenspace, which need not be orthogonal. The modification of this proof in the case of degeneracy calls for arguments that are essentially the ones used in proving Theorem 10. The advantage in the way Theorem 10 was proved first is that it suffers no modification in the degenerate case.

Degeneracy

We now address the question of degeneracy as promised earlier. Our general analysis of Theorem 10 showed us that in the face of degeneracy, we have not one, but an infinity of orthonormal eigenbases. Let us see through an example how this variety manifests itself when we look for eigenvectors and how it is to be handled.

Example 1.8.5. Consider an operator $\Omega$ whose matrix elements in some basis are
$$\Omega \leftrightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
The characteristic equation is
$$-\omega(\omega - 2)^2 = 0$$
i.e.,
$$\omega = 0, 2, 2$$
The vector corresponding to $\omega = 0$ is found by the usual means to be
$$|\omega = 0\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
The case $\omega = 2$ leads to the following equations for the components of the eigenvector:
$$-x_1 + x_3 = 0$$
$$0 = 0$$
Now we have just one equation, instead of the two ($n - 1$) we have grown accustomed to! This is a reflection of the degeneracy. For every extra appearance (besides the first) a root makes, it takes away one equation. Thus degeneracy permits us extra degrees of freedom besides the usual one (of normalization). The conditions
$$x_1 = x_3, \qquad x_2 \text{ arbitrary}$$
define an ensemble of vectors that are perpendicular to the first, $|\omega = 0\rangle$, i.e., that lie in a plane perpendicular to $|\omega = 0\rangle$. This is in agreement with our expectation that a twofold degeneracy should lead to a two-dimensional eigenspace. The freedom in $x_2$ (or more precisely, in the ratio $x_2/x_3$) corresponds to the freedom of orientation in this plane.
Let us arbitrarily choose $x_2 = 1$ (and $x_1 = x_3 = 0$), to get a normalized eigenvector corresponding to $\omega = 2$:
$$|\omega = 2, \text{first}\rangle \leftrightarrow \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$
The third vector is now chosen to lie in this plane and to be orthogonal to the second (being in this plane automatically makes it perpendicular to the first, $|\omega = 0\rangle$):
$$|\omega = 2, \text{second}\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$
Clearly each distinct choice of the ratio $x_2/x_3$ gives us a distinct doublet of orthonormal eigenvectors with eigenvalue 2. □

Notice that in the face of degeneracy, $|\omega_i\rangle$ no longer refers to a single ket but to a generic element of the eigenspace $\mathbb{V}^{m_i}_{\omega_i}$. To refer to a particular element, we must use the symbol $|\omega_i, \alpha\rangle$, where $\alpha$ labels the ket within the eigenspace. A natural choice of the label $\alpha$ will be discussed shortly.

We now consider the analogs of Theorems 9 and 10 for unitary operators.

Theorem 11. The eigenvalues of a unitary operator are complex numbers of unit modulus.

Theorem 12. The eigenvectors of a unitary operator are mutually orthogonal. (We assume there is no degeneracy.)

Proof of Both Theorems (assuming no degeneracy). Let
$$U|u_i\rangle = u_i|u_i\rangle \qquad (1.8.18a)$$
and
$$U|u_j\rangle = u_j|u_j\rangle \qquad (1.8.18b)$$
If we take the adjoint of the second equation and dot each side with the corresponding side of the first equation, we get
$$\langle u_j|U^\dagger U|u_i\rangle = u_j^* u_i\langle u_j|u_i\rangle$$
so that
$$\langle u_j|u_i\rangle(1 - u_j^* u_i) = 0 \qquad (1.8.19)$$
If $i = j$, we get, since $\langle u_i|u_i\rangle \neq 0$,
$$u_i^* u_i = 1 \qquad (1.8.20a)$$
while if $i \neq j$,
$$\langle u_j|u_i\rangle = 0 \qquad (1.8.20b)$$
since $|u_i\rangle \neq |u_j\rangle \Rightarrow u_i \neq u_j \Rightarrow u_j^* u_i \neq u_j^* u_j \Rightarrow u_j^* u_i \neq 1$. (Q.E.D.)

If $U$ is degenerate, we can carry out an analysis parallel to that for the Hermitian operator $\Omega$, with just one difference. Whereas in Eq. (1.8.12) the zeros of the first row followed from the zeros of the first column and $\Omega^\dagger = \Omega$, here they follow from the requirement that the sum of the modulus squared of the elements in each row adds up to 1: since $|u_1| = 1$, all the other elements in the first row must vanish.

Diagonalization of Hermitian Matrices

Consider a Hermitian operator $\Omega$ on $\mathbb{V}^n(C)$ represented as a matrix in some orthonormal basis $|1\rangle, \ldots, |i\rangle, \ldots, |n\rangle$. If we trade this basis for the eigenbasis $|\omega_1\rangle, \ldots, |\omega_i\rangle, \ldots$
$\ldots, |\omega_n\rangle$, the matrix representing $\Omega$ will become diagonal. Now the operator $U$ inducing the change of basis
$$|\omega_i\rangle = U|i\rangle \qquad (1.8.21)$$
is clearly unitary, for it "rotates" one orthonormal basis into another. (If you wish, you may apply our mnemonic to $U$ and verify its unitary nature: its columns contain the components of the eigenvectors $|\omega_i\rangle$, which are orthonormal.) This result is often summarized by the statement:

Every Hermitian matrix on $\mathbb{V}^n(C)$ may be diagonalized by a unitary change of basis.

We may restate this result in terms of passive transformations as follows: if $\Omega$ is a Hermitian matrix, there exists a unitary matrix $U$ (built out of the eigenvectors of $\Omega$) such that $U^\dagger\Omega U$ is diagonal. Thus the problem of finding a basis that diagonalizes $\Omega$ is equivalent to solving its eigenvalue problem.

Exercise 1.8.1. (1) Find the eigenvalues and normalized eigenvectors of the matrix
$$\Omega = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 0 \\ 0 & 1 & 4 \end{bmatrix}$$
(2) Is the matrix Hermitian? Are the eigenvectors orthogonal?

Exercise 1.8.2.* Consider the matrix
$$\Omega = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
(1) Is it Hermitian? (2) Find its eigenvalues and eigenvectors. (3) Verify that $U^\dagger\Omega U$ is diagonal, $U$ being the matrix of eigenvectors of $\Omega$.

Exercise 1.8.3.* Consider the Hermitian matrix
$$\Omega = \frac{1}{2}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & -1 \\ 0 & -1 & 3 \end{bmatrix}$$
(1) Show that $\omega_1 = \omega_2 = 1$; $\omega_3 = 2$. (2) Show that $|\omega = 2\rangle$ is any vector of the form
$$\frac{1}{(2a^2)^{1/2}}\begin{bmatrix} 0 \\ a \\ -a \end{bmatrix}$$
(3) Show that the $\omega = 1$ eigenspace contains all vectors of the form
$$\frac{1}{(b^2 + 2c^2)^{1/2}}\begin{bmatrix} b \\ c \\ c \end{bmatrix}$$
either by feeding $\omega = 1$ into the equations or by requiring that the $\omega = 1$ eigenspace be orthogonal to $|\omega = 2\rangle$.

Exercise 1.8.4. An arbitrary $n \times n$ matrix need not have $n$ eigenvectors. Consider as an example
$$\Omega = \begin{bmatrix} 4 & 1 \\ -1 & 2 \end{bmatrix}$$
(1) Show that $\omega_1 = \omega_2 = 3$. (2) By feeding in this value show that we get only one eigenvector of the form
$$\frac{1}{(2a^2)^{1/2}}\begin{bmatrix} a \\ -a \end{bmatrix}$$
We cannot find another one that is linearly independent.

Exercise 1.8.5.* Consider the matrix
$$\Omega = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
(1) Show that it is unitary. (2) Show that its eigenvalues are $e^{i\theta}$ and $e^{-i\theta}$. (3) Find the corresponding eigenvectors; show that they are orthogonal.
(4) Verify that $U^\dagger\Omega U = (\text{diagonal matrix})$, where $U$ is the matrix of eigenvectors of $\Omega$.

Exercise 1.8.6.* (1) We have seen that the determinant of a matrix is unchanged under a unitary change of basis. Argue now that
$$\det\Omega = \text{product of eigenvalues of }\Omega = \prod_{i=1}^{n}\omega_i$$
for a Hermitian or unitary $\Omega$. (2) Using the invariance of the trace under the same transformation, show that
$$\operatorname{Tr}\Omega = \sum_{i=1}^{n}\omega_i$$

Exercise 1.8.7. By using the results on the trace and determinant from the last problem, show that the eigenvalues of the matrix
$$\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$
are 3 and $-1$. Verify this by explicit computation. Note that the Hermitian nature of the matrix is an essential ingredient.

Exercise 1.8.8.* Consider Hermitian matrices $M^1, M^2, M^3, M^4$ that obey
$$M^i M^j + M^j M^i = 2\delta_{ij}I, \qquad i, j = 1, \ldots, 4$$
(1) Show that the eigenvalues of $M^i$ are $\pm 1$. (Hint: go to the eigenbasis of $M^i$, and use the equation for $i = j$.) (2) By considering the relation $M^i M^j = -M^j M^i$ for $i \neq j$, show that the $M^i$ are traceless. [Hint: $\operatorname{Tr}(ACB) = \operatorname{Tr}(CBA)$.] (3) Show that they cannot be odd-dimensional matrices.

Exercise 1.8.9. A collection of masses $m_\alpha$, located at $\mathbf{r}_\alpha$ and rotating with angular velocity $\boldsymbol{\omega}$ around a common axis, has an angular momentum
$$\mathbf{l} = \sum_\alpha m_\alpha\,\mathbf{r}_\alpha \times \mathbf{v}_\alpha$$
where $\mathbf{v}_\alpha = \boldsymbol{\omega} \times \mathbf{r}_\alpha$ is the velocity of $m_\alpha$. By using the identity
$$\mathbf{A} \times (\mathbf{B} \times \mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B})$$
show that each Cartesian component $l_i$ of $\mathbf{l}$ is given by
$$l_i = \sum_j M_{ij}\,\omega_j$$
where
$$M_{ij} = \sum_\alpha m_\alpha\left[r_\alpha^2\delta_{ij} - (r_\alpha)_i(r_\alpha)_j\right]$$
or in Dirac notation
$$|l\rangle = M|\omega\rangle$$
(1) Will the angular momentum and angular velocity always be parallel? (2) Show that the moment of inertia matrix $M_{ij}$ is Hermitian. (3) Argue now that there exist three directions for $\boldsymbol{\omega}$ such that $\mathbf{l}$ and $\boldsymbol{\omega}$ will be parallel. How are these directions to be found? (4) Consider the moment of inertia matrix of a sphere. Due to the complete symmetry of the sphere, it is clear that every direction is its eigendirection for rotation. What does this say about the three eigenvalues of the matrix $M$?

Simultaneous Diagonalization of Two Hermitian Operators

Let us consider next the question of simultaneously diagonalizing two Hermitian operators.
Theorem 13. If $\Omega$ and $\Lambda$ are two commuting Hermitian operators, there exists (at least) a basis of common eigenvectors that diagonalizes them both.

Proof. Consider first the case where at least one of the operators is nondegenerate, i.e., to a given eigenvalue there is just one eigenvector, up to a scale. Let us assume $\Omega$ is nondegenerate. Consider any one of its eigenvectors:
$$\Omega|\omega_i\rangle = \omega_i|\omega_i\rangle$$
Since $[\Lambda, \Omega] = 0$,
$$\Omega\Lambda|\omega_i\rangle = \Lambda\Omega|\omega_i\rangle = \omega_i\Lambda|\omega_i\rangle \qquad (1.8.22)$$
i.e., $\Lambda|\omega_i\rangle$ is an eigenvector of $\Omega$ with eigenvalue $\omega_i$. Since this vector is unique up to a scale,
$$\Lambda|\omega_i\rangle = \lambda_i|\omega_i\rangle \qquad (1.8.23)$$
Thus $|\omega_i\rangle$ is also an eigenvector of $\Lambda$ with eigenvalue $\lambda_i$. Since every eigenvector of $\Omega$ is an eigenvector of $\Lambda$, it is evident that the basis $\{|\omega_i\rangle\}$ will diagonalize both operators. Since $\Omega$ is nondegenerate, there is only one basis with this property.

What if both operators are degenerate? By ordering the basis vectors such that the elements of each eigenspace are adjacent, we can get one of them, say $\Omega$, into the form (Theorem 10)
$$\Omega \leftrightarrow \mathrm{diag}(\underbrace{\omega_1, \ldots, \omega_1}_{m_1}, \underbrace{\omega_2, \ldots, \omega_2}_{m_2}, \ldots)$$
Now this basis is not unique: in every eigenspace $\mathbb{V}^{m_i}_{\omega_i}$ corresponding to the eigenvalue $\omega_i$, there exists an infinity of bases. Let us arbitrarily pick in $\mathbb{V}^{m_i}_{\omega_i}$ a set $|\omega_i, \alpha\rangle$, where the additional label $\alpha$ runs from 1 to $m_i$. How does $\Lambda$ appear in this basis? Although we made no special efforts to get $\Lambda$ into a simple form, it already has a simple form by virtue of the fact that it commutes with $\Omega$. Let us start by mimicking the proof in the nondegenerate case:
$$\Omega\Lambda|\omega_i, \alpha\rangle = \Lambda\Omega|\omega_i, \alpha\rangle = \omega_i\Lambda|\omega_i, \alpha\rangle$$
However, due to the degeneracy of $\Omega$, we can only conclude that $\Lambda|\omega_i, \alpha\rangle$ lies in $\mathbb{V}^{m_i}_{\omega_i}$. Now, since vectors from different eigenspaces are orthogonal [Eq. (1.8.15)],
$$\langle\omega_j, \beta|\Lambda|\omega_i, \alpha\rangle = 0$$
if $|\omega_i, \alpha\rangle$ and $|\omega_j, \beta\rangle$ are basis vectors such that $\omega_i \neq \omega_j$. Consequently, in this basis,
$$\Lambda \leftrightarrow \begin{bmatrix} \Lambda_1 & & \\ & \Lambda_2 & \\ & & \ddots \end{bmatrix}$$
which is called a block diagonal matrix for obvious reasons. The block diagonal form of $\Lambda$ reflects the fact that when $\Lambda$ acts on some element $|\omega_i, \alpha\rangle$ of the eigenspace $\mathbb{V}^{m_i}_{\omega_i}$, it turns it into another element of $\mathbb{V}^{m_i}_{\omega_i}$.
Within each subspace $i$, $\Lambda$ is given by a matrix $\Lambda_i$, which appears as a block in the equation above. Consider the matrix $\Lambda_i$ in $\mathbb{V}^{m_i}_{\omega_i}$. It is Hermitian since $\Lambda$ is. It can obviously be diagonalized by trading the basis $|\omega_i, 1\rangle, |\omega_i, 2\rangle, \ldots, |\omega_i, m_i\rangle$ in $\mathbb{V}^{m_i}_{\omega_i}$ that we started with for the eigenbasis of $\Lambda_i$. Let us make such a change of basis in each eigenspace, thereby rendering $\Lambda$ diagonal. Meanwhile, what of $\Omega$? It remains diagonal, of course, since it is indifferent to the choice of orthonormal basis in each degenerate eigenspace. If the eigenvalues of $\Lambda_i$ are $\lambda_i^{(1)}, \lambda_i^{(2)}, \ldots, \lambda_i^{(m_i)}$, then we end up with
$$\Lambda \leftrightarrow \mathrm{diag}(\lambda_1^{(1)}, \ldots, \lambda_1^{(m_1)}, \lambda_2^{(1)}, \ldots, \lambda_2^{(m_2)}, \ldots)$$
Q.E.D.

If $\Lambda$ is not degenerate within any given subspace, i.e., $\lambda_i^{(k)} \neq \lambda_i^{(l)}$ for any $k \neq l$ and any $i$, the basis we end up with is unique: the freedom $\Omega$ gave us in each eigenspace is fully eliminated by $\Lambda$. The elements of this basis may be named uniquely by the pair of indices $\omega$ and $\lambda$ as $|\omega, \lambda\rangle$, with $\lambda$ playing the role of the extra label $\alpha$. If $\Lambda$ is degenerate within an eigenspace of $\Omega$, if, say, $\lambda_i^{(1)} = \lambda_i^{(2)}$, there is a two-dimensional eigenspace from which we can choose any two orthonormal vectors for the common basis. It is then necessary to bring in a third operator $\Gamma$ that commutes with both $\Omega$ and $\Lambda$ and that is nondegenerate in this subspace. In general, one can always find, for finite $n$, a set of operators $\{\Omega, \Lambda, \Gamma, \ldots\}$ that commute with each other and that nail down a unique, common eigenbasis, the elements of which may be labeled unambiguously as $|\omega, \lambda, \gamma, \ldots\rangle$. In our study of quantum mechanics it will be assumed that such a complete set of commuting operators exists if $n$ is infinite.

Exercise 1.8.10.* By considering the commutator, show that the following Hermitian matrices may be simultaneously diagonalized. Find the eigenvectors common to both and verify that under a unitary transformation to this basis, both matrices are diagonalized:
$$\Omega = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 2 \end{bmatrix}$$
Since $\Omega$ is degenerate and $\Lambda$ is not, you must be prudent in deciding which matrix dictates the choice of basis.
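Theorem 13 can be previewed numerically. In the sketch below (NumPy assumed; the matrix pair is an assumed reconstruction of the garbled exercise matrices, chosen so that they commute, with $\Omega$ degenerate and $\Lambda$ not), the nondegenerate $\Lambda$ dictates the basis, and the same unitary is seen to diagonalize $\Omega$ as well.

```python
import numpy as np

# Assumed pair of commuting Hermitian matrices (Omega degenerate, Lam not).
Omega = np.array([[1.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0]])
Lam = np.array([[2.0, 1.0, 1.0],
                [1.0, 0.0, -1.0],
                [1.0, -1.0, 2.0]])

# They commute, so Theorem 13 guarantees a common eigenbasis.
assert np.allclose(Omega @ Lam, Lam @ Omega)

# Omega is degenerate (eigenvalues 0, 0, 2) while Lam is not, so it is Lam
# that dictates the basis: its eigenvectors are unique up to scale.
wL, V = np.linalg.eigh(Lam)

# The same unitary diagonalizes both operators.
assert np.allclose(V.T @ Lam @ V, np.diag(wL))
D = V.T @ Omega @ V
assert np.allclose(D, np.diag(np.diag(D)))        # off-diagonal parts vanish
assert np.allclose(np.sort(np.diag(D)), [0.0, 0.0, 2.0])
print("common eigenbasis found")
```

Had we diagonalized the degenerate $\Omega$ first, a generic eigensolver could have returned vectors inside the $\omega = 0$ eigenspace that fail to diagonalize $\Lambda$, which is exactly the "prudence" the exercise asks for.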
Example 1.8.6. We will now discuss, in some detail, the complete solution to a problem in mechanics. It is important that you understand this example thoroughly, for it not only illustrates the use of the mathematical techniques developed in this chapter but also contains the main features of the central problem in quantum mechanics.

The mechanical system in question is depicted in Fig. 1.5. The two masses $m$ are coupled to each other and to the walls by springs of force constant $k$. If $x_1$ and $x_2$ measure the displacements of the masses from their equilibrium points, these coordinates obey the following equations, derived through an elementary application of Newton's laws:
$$\ddot{x}_1 = -\frac{2k}{m}x_1 + \frac{k}{m}x_2 \qquad (1.8.24a)$$
$$\ddot{x}_2 = \frac{k}{m}x_1 - \frac{2k}{m}x_2 \qquad (1.8.24b)$$
Figure 1.5. The coupled mass problem. All masses are $m$, all spring constants are $k$, and the displacements of the masses from equilibrium are $x_1$ and $x_2$.

The problem is to find $x_1(t)$ and $x_2(t)$ given the initial-value data, which in this case consist of the initial positions and velocities. If we restrict ourselves to the case of zero initial velocities, our problem is to find $x_1(t)$ and $x_2(t)$, given $x_1(0)$ and $x_2(0)$. In what follows, we will formulate the problem in the language of linear vector spaces and solve it using the machinery developed in this chapter.

As a first step, we rewrite Eq. (1.8.24) in matrix form:
$$\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad (1.8.25a)$$
where the elements of the Hermitian matrix $\Omega_{ij}$ are
$$\Omega_{11} = \Omega_{22} = -2k/m, \qquad \Omega_{12} = \Omega_{21} = k/m \qquad (1.8.25b)$$
We now view $x_1$ and $x_2$ as components of an abstract vector $|x\rangle$, and $\Omega_{ij}$ as the matrix elements of a Hermitian operator $\Omega$. Since the vector $|x\rangle$ has two real components, it is an element of $\mathbb{V}^2(R)$, and $\Omega$ is a Hermitian operator on $\mathbb{V}^2(R)$. The abstract form of Eq. (1.8.25a) is
$$|\ddot{x}(t)\rangle = \Omega|x(t)\rangle \qquad (1.8.26)$$
Equation (1.8.25a) is obtained by projecting Eq.
(1.8.26) on the basis vectors $|1\rangle, |2\rangle$, which have the following physical significance:
$$|1\rangle \leftrightarrow \begin{bmatrix} 1 \\ 0 \end{bmatrix} \leftrightarrow \begin{bmatrix} \text{first mass displaced by unity} \\ \text{second mass undisplaced} \end{bmatrix} \qquad (1.8.27a)$$
$$|2\rangle \leftrightarrow \begin{bmatrix} 0 \\ 1 \end{bmatrix} \leftrightarrow \begin{bmatrix} \text{first mass undisplaced} \\ \text{second mass displaced by unity} \end{bmatrix} \qquad (1.8.27b)$$
An arbitrary state, in which the masses are displaced by $x_1$ and $x_2$, is given in this basis by
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad (1.8.28)$$
The abstract counterpart of the above equation is
$$|x\rangle = x_1|1\rangle + x_2|2\rangle \qquad (1.8.29)$$
It is in this $|1\rangle, |2\rangle$ basis that $\Omega$ is represented by the matrix appearing in Eq. (1.8.25), with elements $-2k/m$, $k/m$, etc.

The basis $|1\rangle, |2\rangle$ is very desirable physically, for the components of $|x\rangle$ in this basis ($x_1$ and $x_2$) have a simple interpretation as displacements of the masses. However, from the standpoint of finding a mathematical solution to the initial-value problem, it is not so desirable, for the components $x_1$ and $x_2$ obey the coupled differential equations (1.8.24a) and (1.8.24b). The coupling is mediated by the off-diagonal matrix elements $\Omega_{12} = \Omega_{21} = k/m$. Having identified the problem with the $|1\rangle, |2\rangle$ basis, we can now see how to get around it: we must switch to a basis in which $\Omega$ is diagonal. The components of $|x\rangle$ in this basis will then obey uncoupled differential equations, which may be readily solved. Having found the solution, we can return to the physically preferable $|1\rangle, |2\rangle$ basis. This, then, is our broad strategy, and we now turn to the details.

From our study of Hermitian operators we know that the basis that diagonalizes $\Omega$ is the basis of its normalized eigenvectors. Let $|I\rangle$ and $|II\rangle$ be its eigenvectors, defined by
$$\Omega|I\rangle = -\omega_I^2|I\rangle \qquad (1.8.30a)$$
$$\Omega|II\rangle = -\omega_{II}^2|II\rangle \qquad (1.8.30b)$$
We are departing here from our usual notation: the eigenvalue of $\Omega$ is written as $-\omega^2$ rather than as $\omega$, in anticipation of the fact that $\Omega$ has eigenvalues of the form $-\omega^2$, with $\omega$ real. We are also using the symbols $|I\rangle$ and $|II\rangle$ to denote what should be called $|-\omega_I^2\rangle$ and $|-\omega_{II}^2\rangle$ in our convention.
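The eigenvalue problem posed in Eq. (1.8.30) is easy to check numerically. The sketch below (not from the text; NumPy assumed, with the illustrative choice $k = m = 1$) confirms that the eigenvalues of $\Omega$ are indeed negative, so the frequencies $\omega_I$, $\omega_{II}$ are real, and that the eigenvectors are the expected symmetric and antisymmetric combinations.

```python
import numpy as np

k = m = 1.0          # illustrative values; the frequencies scale as sqrt(k/m)
Omega = np.array([[-2*k/m,  k/m],
                  [ k/m, -2*k/m]])

# Eigenvalues are -omega^2; Eq. (1.8.30) anticipates omega_I, omega_II real.
lam, V = np.linalg.eigh(Omega)
assert np.all(lam < 0)
freqs = np.sqrt(-lam)                  # [omega_II, omega_I] in ascending-lam order

assert np.allclose(np.sort(freqs), [np.sqrt(k/m), np.sqrt(3*k/m)])

# The normal-mode vectors are (1,1)/sqrt(2) and (1,-1)/sqrt(2), up to sign.
for col in V.T:
    assert np.isclose(abs(col @ np.array([1, 1]) / np.sqrt(2)), 1.0) or \
           np.isclose(abs(col @ np.array([1, -1]) / np.sqrt(2)), 1.0)
print("normal-mode frequencies:", np.sort(freqs))
```

With $k = m = 1$ the two frequencies come out as $1$ and $\sqrt{3}$, matching $(k/m)^{1/2}$ and $(3k/m)^{1/2}$.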
It is a simple exercise (which you should perform) to solve the eigenvalue problem of $\Omega$ in the $|1\rangle, |2\rangle$ basis (in which the matrix elements of $\Omega$ are known) and to obtain
$$\omega_I = \left(\frac{k}{m}\right)^{1/2}, \qquad |I\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad (1.8.31a)$$
$$\omega_{II} = \left(\frac{3k}{m}\right)^{1/2}, \qquad |II\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad (1.8.31b)$$
If we now expand the vector $|x(t)\rangle$ in this new basis as
$$|x(t)\rangle = |I\rangle x_I(t) + |II\rangle x_{II}(t) \qquad (1.8.32)$$
[in analogy with Eq. (1.8.29)], the components $x_I$ and $x_{II}$ will evolve as follows:
$$\begin{bmatrix} \ddot{x}_I \\ \ddot{x}_{II} \end{bmatrix} = \begin{bmatrix} -\omega_I^2 & 0 \\ 0 & -\omega_{II}^2 \end{bmatrix}\begin{bmatrix} x_I \\ x_{II} \end{bmatrix} \qquad (1.8.33)$$
We obtain this equation by rewriting Eq. (1.8.26) in the $|I\rangle, |II\rangle$ basis, in which $\Omega$ has its eigenvalues as the diagonal entries and $|x\rangle$ has components $x_I$ and $x_{II}$. Alternately, we can apply the operator $(d^2/dt^2 - \Omega)$ to both sides of the expansion in Eq. (1.8.32) and get
$$|0\rangle = |I\rangle(\ddot{x}_I + \omega_I^2 x_I) + |II\rangle(\ddot{x}_{II} + \omega_{II}^2 x_{II}) \qquad (1.8.34)$$
Since $|I\rangle$ and $|II\rangle$ are orthogonal, each coefficient is zero. The solution to the decoupled equations
$$\ddot{x}_i = -\omega_i^2 x_i, \qquad i = I, II \qquad (1.8.35)$$
subject to the condition of vanishing initial velocities, is
$$x_i(t) = x_i(0)\cos\omega_i t, \qquad i = I, II \qquad (1.8.36)$$
As anticipated, the components of $|x\rangle$ in the $|I\rangle, |II\rangle$ basis obey decoupled equations that can be readily solved. Feeding Eq. (1.8.36) into Eq. (1.8.32) we get
$$|x(t)\rangle = |I\rangle x_I(0)\cos\omega_I t + |II\rangle x_{II}(0)\cos\omega_{II} t \qquad (1.8.37a)$$
$$= |I\rangle\langle I|x(0)\rangle\cos\omega_I t + |II\rangle\langle II|x(0)\rangle\cos\omega_{II} t \qquad (1.8.37b)$$
Equation (1.8.37) provides the explicit solution to the initial-value problem. It corresponds to the following algorithm for finding $|x(t)\rangle$ given $|x(0)\rangle$.

Step (1). Solve the eigenvalue problem of $\Omega$.

Step (2). Find the coefficients $x_I(0) = \langle I|x(0)\rangle$ and $x_{II}(0) = \langle II|x(0)\rangle$ in the expansion
$$|x(0)\rangle = |I\rangle x_I(0) + |II\rangle x_{II}(0)$$

Step (3). Append to each coefficient $x_i(0)$ ($i = I, II$) a time dependence $\cos\omega_i t$ to get the coefficients in the expansion of $|x(t)\rangle$.

Let me now illustrate this algorithm by solving the following (general) initial-value problem: find the future state of the system given that at $t = 0$ the masses are displaced by $x_1(0)$ and $x_2(0)$.

Step (1). We can ignore this step since the eigenvalue problem has been solved [Eq. (1.8.31)].

Step (2).
$$x_I(0) = \langle I|x(0)\rangle = \frac{1}{2^{1/2}}\,(1 \quad 1)\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \frac{x_1(0) + x_2(0)}{2^{1/2}}$$
$$x_{II}(0) = \langle II|x(0)\rangle = \frac{1}{2^{1/2}}\,(1 \quad -1)\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \frac{x_1(0) - x_2(0)}{2^{1/2}}$$

Step (3).
$$|x(t)\rangle = |I\rangle\,\frac{x_1(0) + x_2(0)}{2^{1/2}}\cos\omega_I t + |II\rangle\,\frac{x_1(0) - x_2(0)}{2^{1/2}}\cos\omega_{II} t$$
The explicit solution above can be made even more explicit by projecting $|x(t)\rangle$ onto the $|1\rangle, |2\rangle$ basis to find $x_1(t)$ and $x_2(t)$, the displacements of the masses. We get (feeding in the explicit formulas for $\omega_I$ and $\omega_{II}$)
$$x_1(t) = \tfrac{1}{2}[x_1(0) + x_2(0)]\cos[(k/m)^{1/2}t] + \tfrac{1}{2}[x_1(0) - x_2(0)]\cos[(3k/m)^{1/2}t] \qquad (1.8.38a)$$
using the fact that $\langle 1|I\rangle = \langle 1|II\rangle = 1/2^{1/2}$. It can likewise be shown that
$$x_2(t) = \tfrac{1}{2}[x_1(0) + x_2(0)]\cos[(k/m)^{1/2}t] - \tfrac{1}{2}[x_1(0) - x_2(0)]\cos[(3k/m)^{1/2}t] \qquad (1.8.38b)$$
We can rewrite Eq. (1.8.38) in matrix form as
$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \cos[(k/m)^{1/2}t] + \cos[(3k/m)^{1/2}t] & \cos[(k/m)^{1/2}t] - \cos[(3k/m)^{1/2}t] \\ \cos[(k/m)^{1/2}t] - \cos[(3k/m)^{1/2}t] & \cos[(k/m)^{1/2}t] + \cos[(3k/m)^{1/2}t] \end{bmatrix}\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} \qquad (1.8.39)$$
This completes our determination of the future state of the system given the initial state.

The Propagator

There are two remarkable features in Eq. (1.8.39):
(1) The final-state vector is obtained from the initial-state vector upon multiplication by a matrix.
(2) This matrix is independent of the initial state.
We call this matrix the propagator. Finding the propagator is tantamount to finding the complete solution to the problem, for given any other initial state with displacements $\bar{x}_1(0)$ and $\bar{x}_2(0)$, we get $\bar{x}_1(t)$ and $\bar{x}_2(t)$ by applying the same matrix to the initial-state vector. We may view Eq. (1.8.39) as the image in the $|1\rangle, |2\rangle$ basis of the abstract relation
$$|x(t)\rangle = U(t)|x(0)\rangle \qquad (1.8.40)$$
By comparing this equation with Eq. (1.8.37b), we find the abstract representation of $U$:
$$U(t) = |I\rangle\langle I|\cos\omega_I t + |II\rangle\langle II|\cos\omega_{II} t \qquad (1.8.41a)$$
$$= \sum_{i=I}^{II} |i\rangle\langle i|\cos\omega_i t \qquad (1.8.41b)$$
You may easily convince yourself that if we take the matrix elements of this operator in the $|1\rangle, |2\rangle$ basis, we regain the matrix appearing in Eq. (1.8.39).
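As a numerical cross-check (a sketch not in the text, assuming NumPy; $k = m = 1$ is an arbitrary illustrative choice), the spectral form (1.8.41a) of the propagator can be compared against the explicit matrix of Eq. (1.8.39), and shown by finite differences to satisfy the equation of motion $|\ddot{x}\rangle = \Omega|x\rangle$.

```python
import numpy as np

k = m = 1.0                                   # illustrative choice
wI, wII = np.sqrt(k/m), np.sqrt(3*k/m)
I_ket  = np.array([1.0,  1.0]) / np.sqrt(2)   # |I>
II_ket = np.array([1.0, -1.0]) / np.sqrt(2)   # |II>

def U(t):
    # Eq. (1.8.41a): U(t) = |I><I| cos(wI t) + |II><II| cos(wII t)
    return (np.outer(I_ket, I_ket) * np.cos(wI*t)
            + np.outer(II_ket, II_ket) * np.cos(wII*t))

t = 0.7
# Compare against the explicit matrix of Eq. (1.8.39).
c1, c2 = np.cos(wI*t), np.cos(wII*t)
explicit = 0.5 * np.array([[c1 + c2, c1 - c2],
                           [c1 - c2, c1 + c2]])
assert np.allclose(U(t), explicit)

# Propagating an arbitrary initial displacement (zero initial velocity) must
# satisfy x'' = Omega x; check the second derivative by finite differences.
Omega = np.array([[-2*k/m, k/m], [k/m, -2*k/m]])
x0 = np.array([0.3, -1.2])
h = 1e-4
xdd = (U(t+h) - 2*U(t) + U(t-h)) @ x0 / h**2
assert np.allclose(xdd, Omega @ (U(t) @ x0), atol=1e-4)
print("propagator matches Eq. (1.8.39) and solves the equation of motion")
```

Note that the same `U(t)` works for every initial state, which is precisely the second "remarkable feature" noted above.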
For example,
$$U_{11} = \langle 1|U|1\rangle = \langle 1|I\rangle\langle I|1\rangle\cos\left[\left(\frac{k}{m}\right)^{1/2}t\right] + \langle 1|II\rangle\langle II|1\rangle\cos\left[\left(\frac{3k}{m}\right)^{1/2}t\right] = \frac{1}{2}\left\{\cos\left[\left(\frac{k}{m}\right)^{1/2}t\right] + \cos\left[\left(\frac{3k}{m}\right)^{1/2}t\right]\right\}$$
Notice that $U(t)$ [Eq. (1.8.41)] is determined completely by the eigenvectors and eigenvalues of $\Omega$. We may then restate our earlier algorithm as follows. To solve the equation
$$|\ddot{x}(t)\rangle = \Omega|x(t)\rangle$$
(1) Solve the eigenvalue problem of $\Omega$.
(2) Construct the propagator $U$ in terms of the eigenvalues and eigenvectors.
(3) $|x(t)\rangle = U(t)|x(0)\rangle$.

The Normal Modes

There are two initial states $|x(0)\rangle$ for which the time evolution is particularly simple. Not surprisingly, these are the eigenkets $|I\rangle$ and $|II\rangle$. Suppose we have $|x(0)\rangle = |I\rangle$. Then the state at time $t$ is
$$|I(t)\rangle = U(t)|I\rangle = (|I\rangle\langle I|\cos\omega_I t + |II\rangle\langle II|\cos\omega_{II} t)|I\rangle = |I\rangle\cos\omega_I t \qquad (1.8.42)$$
Thus the system starting off in $|I\rangle$ is only modified by an overall factor $\cos\omega_I t$. A similar remark holds with $I \to II$. These two modes of vibration, in which all (two) components of a vector oscillate in step, are called normal modes.

The physics of the normal modes is clear in the $|1\rangle, |2\rangle$ basis. In this basis
$$|I\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
and corresponds to a state in which both masses are displaced by equal amounts. The middle spring is then a mere spectator, and each mass oscillates with a frequency $\omega_I = (k/m)^{1/2}$ in response to the end spring nearest to it. Consequently
$$|I(t)\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} \cos[(k/m)^{1/2}t] \\ \cos[(k/m)^{1/2}t] \end{bmatrix}$$
On the other hand, if we start with
$$|II\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
the masses are displaced by equal and opposite amounts. In this case the middle spring is distorted by twice the displacement of each mass. If the masses are displaced by $\Delta$ and $-\Delta$, respectively, each mass feels a restoring force of $3k\Delta$ ($2k\Delta$ from the middle spring and $k\Delta$ from the end spring nearest to it). Since the effective force constant is $k_{\text{eff}} = 3k\Delta/\Delta = 3k$, the vibrational frequency is $(3k/m)^{1/2}$ and
$$|II(t)\rangle \leftrightarrow \frac{1}{2^{1/2}}\begin{bmatrix} \cos[(3k/m)^{1/2}t] \\ -\cos[(3k/m)^{1/2}t] \end{bmatrix}$$
If the system starts off in a linear combination of $|I\rangle$ and $|II\rangle$, it evolves into the corresponding linear combination of the normal modes $|I(t)\rangle$ and $|II(t)\rangle$. This is the content of the propagator equation
$$|x(t)\rangle = U(t)|x(0)\rangle = |I\rangle\langle I|x(0)\rangle\cos\omega_I t + |II\rangle\langle II|x(0)\rangle\cos\omega_{II} t = |I(t)\rangle\langle I|x(0)\rangle + |II(t)\rangle\langle II|x(0)\rangle$$
Another way to see the simple evolution of the initial states $|I\rangle$ and $|II\rangle$ is to determine the matrix representing $U$ in the $|I\rangle, |II\rangle$ basis:
$$U \leftrightarrow \begin{bmatrix} \cos\omega_I t & 0 \\ 0 & \cos\omega_{II} t \end{bmatrix} \quad (|I\rangle, |II\rangle \text{ basis}) \qquad (1.8.43)$$
You should verify this result by taking the appropriate matrix elements of $U(t)$ in Eq. (1.8.41b). Since each column above is the image of the corresponding basis vector ($|I\rangle$ or $|II\rangle$) after the action of $U(t)$ (which is to say, after time evolution), we see that the initial states $|I\rangle$ and $|II\rangle$ evolve simply in time.

The central problem in quantum mechanics is very similar to the simple example that we have just discussed. The state of the system is described in quantum theory by a ket $|\psi\rangle$ which obeys the Schrödinger equation
$$i\hbar|\dot{\psi}\rangle = H|\psi\rangle$$
where $\hbar$ is a constant related to Planck's constant $h$ by $\hbar = h/2\pi$, and $H$ is a Hermitian operator called the Hamiltonian. The problem is to find $|\psi(t)\rangle$ given $|\psi(0)\rangle$. [Since the equation is first order in $t$, no assumptions need be made about $|\dot{\psi}(0)\rangle$, which is determined by the Schrödinger equation to be $(-i/\hbar)H|\psi(0)\rangle$.] In most cases, $H$ is a time-independent operator, and the algorithm one follows in solving this initial-value problem is completely analogous to the one we have just seen:

Step (1). Solve the eigenvalue problem of $H$.
Step (2). Find the propagator $U(t)$ in terms of the eigenvectors and eigenvalues of $H$.
Step (3). $|\psi(t)\rangle = U(t)|\psi(0)\rangle$.

You must of course wait till Chapter 4 to find out the physical interpretation of $|\psi\rangle$, the actual form of the operator $H$, and the precise relation between $U(t)$ and the eigenvalues and eigenvectors of $H$. □

Exercise 1.8.11.
Consider the coupled mass problem discussed above. (1) Given that the initial state is $|1\rangle$, in which the first mass is displaced by unity and the second is left alone, calculate $|x(t)\rangle$ by following the algorithm. (2) Compare your result with that following from Eq. (1.8.39).

Exercise 1.8.12. Consider once again the problem discussed in the previous example. (1) Assuming that
$$|\ddot{x}\rangle = \Omega|x\rangle$$
has a solution
$$|x(t)\rangle = U(t)|x(0)\rangle$$
find the differential equation satisfied by $U(t)$. Use the fact that $|x(0)\rangle$ is arbitrary. (2) Assuming (as is the case) that $\Omega$ and $U$ can be simultaneously diagonalized, solve for the elements of the matrix $U$ in this common basis and regain Eq. (1.8.43). Assume $|\dot{x}(0)\rangle = 0$.

1.9. Functions of Operators and Related Concepts

We have encountered two types of objects that act on vectors: scalars, which commute with each other and with all operators; and operators, which do not generally commute with each other. It is customary to refer to the former as c numbers and the latter as q numbers. Now, we are accustomed to functions of c numbers such as $\sin(x)$, $\log(x)$, etc. We wish to examine the question whether functions of q numbers can be given a sensible meaning. We will restrict ourselves to those functions that can be written as a power series. Consider a series
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \qquad (1.9.1)$$
where $x$ is a c number. We define the same function of an operator or q number to be
$$f(\Omega) = \sum_{n=0}^{\infty} a_n\Omega^n \qquad (1.9.2)$$
This definition makes sense only if the sum converges to a definite limit. To see what this means, consider a common example:
$$e^\Omega = \sum_{n=0}^{\infty}\frac{\Omega^n}{n!} \qquad (1.9.3)$$
Let us restrict ourselves to Hermitian $\Omega$. By going to the eigenbasis of $\Omega$ we can readily perform the sum of Eq. (1.9.3). Since
$$\Omega = \begin{bmatrix} \omega_1 & & \\ & \ddots & \\ & & \omega_n \end{bmatrix} \qquad (1.9.4)$$
and
$$\Omega^m = \begin{bmatrix} \omega_1^m & & \\ & \ddots & \\ & & \omega_n^m \end{bmatrix} \qquad (1.9.5)$$
we have
$$e^\Omega = \begin{bmatrix} \sum_m\omega_1^m/m! & & \\ & \ddots & \\ & & \sum_m\omega_n^m/m! \end{bmatrix} = \begin{bmatrix} e^{\omega_1} & & \\ & \ddots & \\ & & e^{\omega_n} \end{bmatrix} \qquad (1.9.6)$$
Since each sum converges to the familiar limit $e^{\omega_i}$, the operator $e^\Omega$ is indeed well defined by the power series in this basis (and therefore in any other).

Exercise 1.9.1.
* We know that the series
$$f(x) = \sum_{n=0}^{\infty} x^n$$
may be equated to the function $f(x) = (1 - x)^{-1}$ if $|x| < 1$. By going to the eigenbasis, examine when the q number power series
$$f(\Omega) = \sum_{n=0}^{\infty}\Omega^n$$
of a Hermitian operator $\Omega$ may be identified with $(1 - \Omega)^{-1}$.

Exercise 1.9.2.* If $H$ is a Hermitian operator, show that $U = e^{iH}$ is unitary. (Notice the analogy with c numbers: if $\theta$ is real, $u = e^{i\theta}$ is a number of unit modulus.)

Exercise 1.9.3. For the case above, show that $\det U = e^{i\operatorname{Tr} H}$.

Derivatives of Operators with Respect to Parameters

Consider next an operator $\theta(\lambda)$ that depends on a parameter $\lambda$. Its derivative with respect to $\lambda$ is defined to be
$$\frac{d\theta(\lambda)}{d\lambda} = \lim_{\Delta\lambda\to 0}\left[\frac{\theta(\lambda + \Delta\lambda) - \theta(\lambda)}{\Delta\lambda}\right]$$
If $\theta(\lambda)$ is written as a matrix in some basis, then the matrix representing $d\theta(\lambda)/d\lambda$ is obtained by differentiating the matrix elements of $\theta(\lambda)$. A special case of $\theta(\lambda)$ we are interested in is
$$\theta(\lambda) = e^{\lambda\Omega}$$
where $\Omega$ is Hermitian. We can show, by going to the eigenbasis of $\Omega$, that
$$\frac{d\theta(\lambda)}{d\lambda} = \Omega e^{\lambda\Omega} = e^{\lambda\Omega}\Omega = \theta(\lambda)\Omega \qquad (1.9.7)$$
The same result may be obtained, even if $\Omega$ is not Hermitian, by working with the power series, provided it exists:
$$\frac{d}{d\lambda}\sum_{n=0}^{\infty}\frac{\lambda^n\Omega^n}{n!} = \sum_{n=1}^{\infty}\frac{n\lambda^{n-1}\Omega^n}{n!} = \Omega\sum_{n=1}^{\infty}\frac{\lambda^{n-1}\Omega^{n-1}}{(n-1)!} = \Omega\sum_{m=0}^{\infty}\frac{\lambda^m\Omega^m}{m!} = \Omega e^{\lambda\Omega}$$
Conversely, we can say that if we are confronted with the differential Eq. (1.9.7), its solution is given by
$$\theta(\lambda) = c\exp\left(\int_0^\lambda\Omega\,d\lambda'\right) = c\exp(\Omega\lambda)$$
(It is assumed here that the exponential exists.) In the above, $c$ is a constant (operator) of integration. The solution $\theta = e^{\lambda\Omega}$ corresponds to the choice $c = I$.

In all the above operations, we see that $\Omega$ behaves as if it were just a c number. Now, the real difference between c numbers and q numbers is that the latter do not generally commute. However, if only one q number (or powers of it) enters the picture, everything commutes and we can treat them as c numbers. If one remembers this mnemonic, one can save a lot of time.
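The eigenbasis construction of $e^\Omega$ in Eq. (1.9.6) and the derivative formula (1.9.7) lend themselves to a quick numerical check. The sketch below (not part of the text; NumPy assumed, with an arbitrary $2\times 2$ Hermitian $\Omega$) compares the eigenbasis exponential against the defining power series (1.9.3), and verifies Eq. (1.9.7) by finite differences.

```python
import numpy as np

# An arbitrary Hermitian Omega; exponentiate it in its eigenbasis, Eq. (1.9.6).
Omega = np.array([[1.0, 0.5],
                  [0.5, -1.0]])
w, V = np.linalg.eigh(Omega)
exp_eigen = V @ np.diag(np.exp(w)) @ V.T          # e^Omega from eigenvalues

# The defining power series, Eq. (1.9.3), summed until it converges.
exp_series = np.zeros_like(Omega)
term = np.eye(2)
for n in range(1, 60):
    exp_series += term                             # adds Omega^(n-1)/(n-1)!
    term = term @ Omega / n
assert np.allclose(exp_eigen, exp_series)

# Eq. (1.9.7): d/d(lambda) e^(lambda Omega) = Omega e^(lambda Omega).
def theta(lmb):
    return V @ np.diag(np.exp(lmb * w)) @ V.T
h = 1e-6
deriv = (theta(1.0 + h) - theta(1.0 - h)) / (2*h)
assert np.allclose(deriv, Omega @ theta(1.0), atol=1e-5)
print("power series, eigenbasis exponential, and Eq. (1.9.7) all agree")
```

Because only the single q number $\Omega$ appears here, everything commutes, which is why the eigenbasis shortcut and the power series give identical answers; the caution below about multiple q numbers is where this stops working.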
If, on the other hand, more than one q number is involved, the order of the factors is all important. For example, it is true that

$$e^{\alpha\Omega} e^{\beta\Omega} = e^{(\alpha+\beta)\Omega}$$

as may be verified by a power-series expansion, while it is not true that

$$e^{\Omega} e^{\Lambda} = e^{\Omega + \Lambda}$$

or that

$$e^{\Omega} e^{\Lambda} = e^{\Lambda} e^{\Omega}$$

unless $[\Omega, \Lambda] = 0$. Likewise, in differentiating a product, the chain rule is

$$\frac{d}{d\lambda}\left( e^{\lambda\Omega} e^{\lambda\Lambda} \right) = \Omega e^{\lambda\Omega} e^{\lambda\Lambda} + e^{\lambda\Omega} \Lambda e^{\lambda\Lambda} \tag{1.9.8}$$

We are free to move $\Omega$ through $e^{\lambda\Omega}$ and write the first term as $e^{\lambda\Omega}\Omega\, e^{\lambda\Lambda}$, but not as $e^{\lambda\Omega} e^{\lambda\Lambda}\Omega$, unless $[\Omega, \Lambda] = 0$.

1.10. Generalization to Infinite Dimensions

In all of the preceding discussions, the dimensionality $n$ of the space was unspecified but assumed to be some finite number. We now consider the generalization of the preceding concepts to infinite dimensions. Let us begin by getting acquainted with an infinite-dimensional vector. Consider a function defined in some interval, say, $a \le x \le b$. A concrete example is provided by the displacement $f(x, t)$ of a string clamped at $x = 0$ and $x = L$ (Fig. 1.6). Suppose we want to communicate to a person on the moon the string's displacement $f(x)$ at some time $t$. One simple way is to divide the interval $0$–$L$ into 20 equal parts, measure the displacement $f(x_i)$ at the 19 points $x = L/20, 2L/20, \ldots, 19L/20$, and transmit the 19 values on the wireless. Given these $f(x_i)$, our friend on the moon will be able to reconstruct the approximate picture of the string shown in Fig. 1.7. If we wish to be more accurate, we can specify the values of $f(x)$ at a larger number of points. Let us denote by $f_n(x)$ the discrete approximation to $f(x)$ that coincides with it at $n$ points and vanishes in between. Let us now interpret the ordered $n$-tuple $\{f_n(x_1), f_n(x_2), \ldots, f_n(x_n)\}$ as components of a ket $|f_n\rangle$ in a vector space $\mathbb{V}^n(R)$:

$$|f_n\rangle \longleftrightarrow \begin{bmatrix} f_n(x_1) \\ f_n(x_2) \\ \vdots \\ f_n(x_n) \end{bmatrix} \tag{1.10.1}$$

Figure 1.6. The string is clamped at $x = 0$ and $x = L$. It is free to oscillate in the plane of the paper.

Figure 1.7. The string as reconstructed by the person on the moon.
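The finite-dimensional surrogate just described can be played with numerically: sample a string shape at $n$ interior points and treat the samples as the components of a vector. A sketch (the weight $\Delta = L/(n+1)$ attached to the inner-product sum is the standard refinement that makes the sum tend to an integral as $n$ grows; the test shapes are arbitrary choices):

```python
import numpy as np

def inner(f, g, L=1.0, n=2000):
    """Discrete inner product of two sampled functions on [0, L]: sum the
    products of components at the n interior points, weighted by the
    spacing Delta, so the sum approximates the integral of f*g."""
    x = np.linspace(0, L, n + 2)[1:-1]      # the n interior sample points
    delta = L / (n + 1)
    return np.sum(np.conj(f(x)) * g(x)) * delta

L = 1.0
f = lambda x: np.sin(np.pi * x / L)         # clamped: vanishes at x = 0, L
g = lambda x: np.sin(2 * np.pi * x / L)

# These two shapes are orthogonal, and each has norm-squared L/2:
assert abs(inner(f, g)) < 1e-9
assert abs(inner(f, f) - L / 2) < 1e-9
assert abs(inner(g, g) - L / 2) < 1e-9
```

The same sine shapes reappear later as the normal modes of the clamped string, where their orthonormality is exactly the statement checked here.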
The basis vectors in this space are

$$|x_i\rangle \longleftrightarrow \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \leftarrow i\text{th place} \tag{1.10.2}$$

corresponding to the discrete function which is unity at $x = x_i$ and zero elsewhere. The basis vectors satisfy

$$\langle x_i | x_j \rangle = \delta_{ij} \quad \text{(orthogonality)} \tag{1.10.3}$$

$$\sum_{i=1}^{n} |x_i\rangle\langle x_i| = I \quad \text{(completeness)} \tag{1.10.4}$$

Try to imagine a space containing $n$ mutually perpendicular axes, one for each point $x_i$. Along each axis is a unit vector $|x_i\rangle$. The function $f_n(x)$ is represented by a vector whose projection along the $i$th direction is $f_n(x_i)$:

$$|f_n\rangle = \sum_{i=1}^{n} f_n(x_i)\, |x_i\rangle \tag{1.10.5}$$

To every possible discrete approximation $g_n(x)$, $h_n(x)$, etc., there is a corresponding ket $|g_n\rangle$, $|h_n\rangle$, etc., and vice versa. You should convince yourself that if we define vector addition as the addition of the components, and scalar multiplication as the multiplication of each component by the scalar, then the set of all kets representing discrete functions that vanish at $x = 0, L$ and that are specified at $n$ points in between forms a vector space. We next define the inner product in this space:

$$\langle f_n | g_n \rangle = \sum_{i=1}^{n} f_n(x_i)\, g_n(x_i) \tag{1.10.6}$$

Two functions $f_n(x)$ and $g_n(x)$ will be said to be orthogonal if

$$\langle f_n | g_n \rangle = 0 \tag{1.10.7}$$

and the norm of $f_n$ is

$$|f_n| = \left[ \sum_{i=1}^{n} f_n^2(x_i) \right]^{1/2} \tag{1.10.8}$$

If we wish to go beyond the instance of the string and consider complex functions of $x$ as well, in some interval $a \le x \le b$, the only modification we need is in the inner product: the components of the bra are to be conjugated. In the continuum limit, inserting the resolution of identity in the $|x\rangle$ basis into $\langle x|I|f\rangle$ gives

$$\int_a^b \langle x | x' \rangle \langle x' | f \rangle \, dx' = \langle x | I | f \rangle = f(x) \tag{1.10.12}$$

Now, $\langle x | f \rangle$, the projection of $|f\rangle$ along the basis ket $|x\rangle$, is just $f(x)$. Likewise $\langle x' | f \rangle = f(x')$. Let the inner product $\langle x | x' \rangle$ be some unknown function $\delta(x, x')$. Since $\delta(x, x')$ vanishes if $x \ne x'$, we can restrict the integral to an infinitesimal region near $x' = x$ in Eq.
(1.10.12):

$$\int_{x-\varepsilon}^{x+\varepsilon} \delta(x, x')\, f(x') \, dx' = f(x) \tag{1.10.13}$$

In this infinitesimal region, $f(x')$ (for any reasonably smooth $f$) can be approximated by its value at $x' = x$, and pulled out of the integral:

$$f(x) \int_{x-\varepsilon}^{x+\varepsilon} \delta(x, x') \, dx' = f(x) \tag{1.10.14}$$

so that

$$\int_{x-\varepsilon}^{x+\varepsilon} \delta(x, x') \, dx' = 1 \tag{1.10.15}$$

Clearly $\delta(x, x')$ cannot be finite at $x' = x$, for then its integral over an infinitesimal region would also be infinitesimal. In fact $\delta(x, x')$ should be infinite in such a way that its integral is unity. Since $\delta(x, x')$ depends only on the difference $x - x'$, let us write it as $\delta(x - x')$. The "function" $\delta(x - x')$ has the properties

$$\delta(x - x') = 0, \quad x \ne x'; \qquad \int_a^b \delta(x - x') \, dx' = 1$$

Since the kets are in correspondence with the functions, an operator $\Omega$ takes the function $f(x)$ into another, $\tilde{f}(x)$. Now, one operator that does such a thing is the familiar differential operator, which, acting on $f(x)$, gives $\tilde{f}(x) = df(x)/dx$. In the function space we can describe the action of this operator as

$$D|f\rangle = |df/dx\rangle$$

where $|df/dx\rangle$ is the ket corresponding to the function $df/dx$. What are the matrix elements of $D$ in the $|x\rangle$ basis? To find out, we dot both sides of the above equation with $\langle x|$, and insert the resolution of identity at the right place:

$$\int \langle x | D | x' \rangle \langle x' | f \rangle \, dx' = \frac{df}{dx} \tag{1.10.27}$$

Comparing this to Eq. (1.10.21), we deduce that

$$D_{xx'} = \delta'(x - x') = \frac{d}{dx}\,\delta(x - x') \tag{1.10.28}$$

It is worth remembering that $D_{xx'} = \delta'(x - x')$ is to be integrated over the second index ($x'$) and pulls out the derivative of $f$ at the first index ($x$). Some people prefer to integrate $\delta'(x - x')$ over the first index, in which case it pulls out $-df/dx'$. Our convention is more natural if one views $D_{xx'}$ as a matrix acting to the right on the components $f_{x'} = f(x')$ of a vector $|f\rangle$. Thus the familiar differential operator is an infinite-dimensional matrix with the elements given above. Normally one doesn't think of $D$ as a matrix for the following reason. Usually when a matrix acts on a vector, there is a sum over a common index. In fact, Eq.
(1.10.27) contains such a sum over the index $x'$. If, however, we feed into this equation the value of $D_{xx'}$, the delta function renders the integration trivial:

$$\int \delta'(x - x')\, f(x') \, dx' = \frac{d}{dx} \int \delta(x - x')\, f(x') \, dx' = \frac{df}{dx}$$

Thus the action of $D$ is simply to apply $d/dx$ to $f(x)$, with no sum over a common index in sight. Although we too will drop the integral over the common index ultimately, we will continue to use it for a while to remind us that $D$, like all linear operators, is a matrix. Let us now ask if $D$ is Hermitian and examine its eigenvalue problem. If $D$ were Hermitian, we would have $D_{xx'} = D_{x'x}^*$. But this is not the case:

$$D_{xx'} = \delta'(x - x')$$

while

$$D_{x'x}^* = [\delta'(x' - x)]^* = \delta'(x' - x) = -\delta'(x - x')$$

But we can easily convert $D$ to a Hermitian matrix by multiplying it with a pure imaginary number. Consider

$$K = -iD$$

which satisfies

$$K_{x'x}^* = [-i\delta'(x' - x)]^* = +i\delta'(x' - x) = -i\delta'(x - x') = K_{xx'}$$

It turns out that despite the above, the operator $K$ is not guaranteed to be Hermitian, as the following analysis will indicate. Let $|f\rangle$ and $|g\rangle$ be two kets in the function space, whose images in the $X$ basis are two functions $f(x)$ and $g(x)$ in the interval $a$–$b$. If $K$ is Hermitian, it must satisfy $\langle g|Kf\rangle = \langle Kg|f\rangle$, i.e.,

$$\int_a^b g^*(x) \left( -i\,\frac{df}{dx} \right) dx = \left[ \int_a^b f^*(x) \left( -i\,\frac{dg}{dx} \right) dx \right]^*$$

Integrating the left-hand side by parts gives

$$-i\, g^*(x)\, f(x) \Big|_a^b + i \int_a^b \frac{dg^*(x)}{dx}\, f(x) \, dx$$

So $K$ is Hermitian only if the surface term vanishes:

$$-i\, g^*(x)\, f(x) \Big|_a^b = 0 \tag{1.10.29}$$

In contrast to the finite-dimensional case, $K_{xx'} = K_{x'x}^*$ is not a sufficient condition for $K$ to be Hermitian. One also needs to look at the behavior of the functions at the end points $a$ and $b$. Thus $K$ is Hermitian if the space consists of functions that obey Eq. (1.10.29). One set of functions that obey this condition are the possible configurations $f(x)$ of the string clamped at $x = 0, L$, since $f(x)$ vanishes at the end points. But condition (1.10.29) can also be fulfilled in another way.
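Before turning to that other way, the role of the surface term can be seen on a grid. The sketch below (central differences in NumPy; the test functions are arbitrary) checks that $\langle g|K|f\rangle = \langle f|K|g\rangle^*$ when both functions vanish at the end points, and that for functions that do not, the mismatch is exactly the surface term of Eq. (1.10.29):

```python
import numpy as np

L, n = 1.0, 20001
x = np.linspace(0, L, n)
dx = x[1] - x[0]

def matrix_element(a, b):
    """<a|K|b> = integral of a*(x) (-i db/dx) dx, via central differences."""
    return np.sum(np.conj(a) * (-1j) * np.gradient(b, dx)) * dx

# Clamped functions (vanish at x = 0 and x = L): surface term absent.
f = np.sin(np.pi * x / L) * np.exp(x)
g = np.sin(2 * np.pi * x / L) * (1 + x**2)
lhs = matrix_element(g, f)
rhs = np.conj(matrix_element(f, g))
assert np.isclose(lhs, rhs, atol=1e-5)          # K acts Hermitian here

# Functions that do not vanish at the ends: Hermiticity fails, and the
# discrepancy equals the surface term -i g*(x) f(x) evaluated at the ends.
f2 = np.exp(x)
g2 = 1 + x
lhs2 = matrix_element(g2, f2)
rhs2 = np.conj(matrix_element(f2, g2))
surface = -1j * (np.conj(g2[-1]) * f2[-1] - np.conj(g2[0]) * f2[0])
assert not np.isclose(lhs2, rhs2, atol=1e-3)
assert np.isclose(lhs2 - rhs2, surface, atol=1e-3)
```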
Consider functions, in our own three-dimensional space, parametrized by $r$, $\theta$, and $\phi$ ($\phi$ is the angle measured around the $z$ axis). Let us require that these functions be single valued. In particular, if we start at a certain point and go once around the $z$ axis, returning to the original point, the function must take on its original value, i.e.,

$$f(\phi) = f(\phi + 2\pi)$$

In the space of such periodic functions, $K = -i\, d/d\phi$ is a Hermitian operator. The surface term vanishes because the contribution from one extremity cancels that from the other:

$$-i\, g^*(\phi)\, f(\phi) \Big|_0^{2\pi} = -i\,[g^*(2\pi) f(2\pi) - g^*(0) f(0)] = 0$$

In the study of quantum mechanics, we will be interested in functions defined over the full interval $-\infty \le x \le +\infty$. They fall into two classes, those that vanish as $|x| \to \infty$, and those that do not, the latter behaving as $e^{ikx}$, $k$ being a real parameter that labels these functions. It is clear that $K = -i\, d/dx$ is Hermitian when sandwiched between two functions of the first class, or between a function from each class, since in either case the surface term vanishes. When sandwiched between two functions of the second class, the Hermiticity hinges on whether

$$e^{ikx} e^{-ik'x} \Big|_{-\infty}^{\infty} \to 0$$

If $k = k'$, the contribution from one end cancels that from the other. If $k \ne k'$, the answer is unclear since $e^{i(k-k')x}$ oscillates rather than approaching a limit. Let us tentatively treat $K$ as Hermitian on this space and examine its eigenvalue problem:

$$K|k\rangle = k|k\rangle \tag{1.10.30}$$

Following the standard procedure,

$$\langle x | K | k \rangle = k \langle x | k \rangle$$

$$\int \langle x | K | x' \rangle \langle x' | k \rangle \, dx' = k\, \psi_k(x)$$

$$-i\, \frac{d}{dx}\, \psi_k(x) = k\, \psi_k(x) \tag{1.10.31}$$

where by definition $\psi_k(x) = \langle x | k \rangle$. This equation could have been written directly had we made the immediate substitution $K = -i\, d/dx$ in the $X$ basis. From now on we shall resort to this shortcut unless there are good reasons for not doing so. The solution to the above equation is simply

$$\psi_k(x) = A\, e^{ikx} \tag{1.10.32}$$

where $A$, the overall scale, is a free parameter, unspecified by the eigenvalue problem. So the eigenvalue problem of $K$ is fully solved: any real number $k$ is an eigenvalue, and the corresponding eigenfunction is given by $A\, e^{ikx}$.
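The periodic case can be made finite-dimensional and inspected directly: on a grid that wraps around, the discretized $K$ is a Hermitian matrix whose spectrum approximates the allowed wave numbers, with $e^{ik\phi}$ as eigenfunctions. A sketch (the grid size and central-difference stencil are arbitrary choices):

```python
import numpy as np

n = 400
h = 2 * np.pi / n
phi = np.arange(n) * h

# Circulant central-difference matrix for d/dphi; the wrap-around entries
# encode the periodicity f(phi) = f(phi + 2*pi).
D = np.zeros((n, n))
for i in range(n):
    D[i, (i + 1) % n] = 1 / (2 * h)
    D[i, (i - 1) % n] = -1 / (2 * h)
K = -1j * D

assert np.allclose(K, K.conj().T)        # Hermitian once the ends are joined

# Real eigenvalues approximating k = 0, +-1, +-2, ..., and e^{i*phi} is
# (to discretization accuracy) the eigenfunction with k = 1:
eigs = np.linalg.eigvalsh(K)
assert np.min(np.abs(eigs - 1.0)) < 1e-3
psi = np.exp(1j * phi)
assert np.allclose(K @ psi, psi, atol=1e-3)
```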
As usual, the freedom in scale will be used to normalize the solution. We choose $A$ to be $(1/2\pi)^{1/2}$ so that

$$|k\rangle \longleftrightarrow \frac{1}{(2\pi)^{1/2}}\, e^{ikx}$$

and

$$\langle k | k' \rangle = \int_{-\infty}^{\infty} \langle k | x \rangle \langle x | k' \rangle \, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i(k - k')x} \, dx = \delta(k - k') \tag{1.10.33}$$

(Since $\langle k | k \rangle$ is infinite, no choice of $A$ can normalize $|k\rangle$ to unity. The delta function normalization is the natural one when the eigenvalue spectrum is continuous.) The attentive reader may have a question at this point. "Why was it assumed that the eigenvalue $k$ was real? It is clear that the function $A\, e^{ikx}$ with $k = k_1 + ik_2$ also satisfies Eq. (1.10.31)." The answer is, yes, there are eigenfunctions of $K$ with complex eigenvalues. If, however, our space includes such functions, $K$ must be classified a non-Hermitian operator. (The surface term no longer vanishes since $e^{ikx}$ blows up exponentially as $x$ tends to either $+\infty$ or $-\infty$, depending on the sign of the imaginary part $k_2$.) In restricting ourselves to real $k$ we have restricted ourselves to what we will call the physical Hilbert space, which is of interest in quantum mechanics. This space is defined as the space of functions that can be normalized either to unity or to the Dirac delta function, and it plays a central role in quantum mechanics. (We use the qualifier "physical" to distinguish it from the Hilbert space as defined by mathematicians, which contains only proper vectors, i.e., vectors normalizable to unity. The role of the improper vectors in quantum theory will be clear later.) We will assume that the theorem proved for finite dimensions, namely, that the eigenfunctions of a Hermitian operator form a complete basis, holds in the Hilbert space. (The trouble with infinite-dimensional spaces is that even if you have an infinite number of orthonormal eigenvectors, you can never be sure you have them all, since adding or subtracting a few still leaves you with an infinite number of them.)
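The orthonormality (1.10.33) says that passing to the $|k\rangle$ basis preserves inner products. In a discretized setting that passage is the fast Fourier transform, and norm preservation is Parseval's theorem. A sketch (the Gaussian test function and the grid are arbitrary choices):

```python
import numpy as np

n, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
f = np.exp(-x**2)                       # a normalizable member of the space

# Components of |f> in the (discrete) K basis; norm="ortho" makes the
# transform unitary, mirroring the (2*pi)^(-1/2) normalization of <x|k>.
fk = np.fft.fft(f, norm="ortho")

assert np.allclose(np.fft.ifft(fk, norm="ortho"), f)            # round trip
assert np.isclose(np.sum(np.abs(f)**2), np.sum(np.abs(fk)**2))  # Parseval
```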
Since $K$ is a Hermitian operator, functions that were expanded in the $X$ basis with components $f(x) = \langle x | f \rangle$ must also have an expansion in the $K$ basis. To find the components, we start with a ket $|f\rangle$ and do the following:

$$f(k) = \langle k | f \rangle = \int_{-\infty}^{\infty} \langle k | x \rangle \langle x | f \rangle \, dx = \int_{-\infty}^{\infty} \frac{e^{-ikx}}{(2\pi)^{1/2}}\, f(x) \, dx \tag{1.10.34}$$

The passage back to the $X$ basis is done as follows:

$$f(x) = \langle x | f \rangle = \int_{-\infty}^{\infty} \langle x | k \rangle \langle k | f \rangle \, dk = \int_{-\infty}^{\infty} \frac{e^{ikx}}{(2\pi)^{1/2}}\, f(k) \, dk \tag{1.10.35}$$

Thus the familiar Fourier transform is just the passage from one complete basis, $|x\rangle$, to another, $|k\rangle$. Either basis may be used to expand functions that belong to the Hilbert space. The matrix elements of $K$ are trivial in the $K$ basis:

$$\langle k | K | k' \rangle = k' \langle k | k' \rangle = k' \delta(k - k') \tag{1.10.36}$$

Now, we know where the $K$ basis came from: it was generated by the Hermitian operator $K$. Which operator is responsible for the orthonormal $X$ basis? Let us call it the operator $X$. The kets $|x\rangle$ are its eigenvectors with eigenvalue $x$:

$$X|x\rangle = x|x\rangle \tag{1.10.37}$$

Its matrix elements in the $X$ basis are

$$\langle x' | X | x \rangle = x\, \delta(x' - x) \tag{1.10.38}$$

To find its action on functions, let us begin with $X|f\rangle = |\tilde{f}\rangle$ and follow the routine:

$$\langle x | X | f \rangle = \int \langle x | X | x' \rangle \langle x' | f \rangle \, dx' = x f(x) = \langle x | \tilde{f} \rangle = \tilde{f}(x)$$

$$\therefore\quad \tilde{f}(x) = x f(x)$$

Thus the effect of $X$ is to multiply $f(x)$ by $x$. As in the case of the $K$ operator, one generally suppresses the integral over the common index, since it is rendered trivial by the delta function. We can summarize the action of $X$ in Hilbert space‡ as

$$X|f(x)\rangle = |x f(x)\rangle \tag{1.10.39}$$

where as usual $|x f(x)\rangle$ is the ket corresponding to the function $x f(x)$. There is a nice reciprocity between the $X$ and $K$ operators, which manifests itself if we compute the matrix elements of $X$ in the $K$ basis:

$$\langle k | X | k' \rangle = \int_{-\infty}^{\infty} \langle k | x \rangle\, x\, \langle x | k' \rangle \, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} x\, e^{-i(k-k')x}\, dx = +i\, \frac{d}{dk}\, \delta(k - k')\,\text{\dag} \tag{1.10.40}$$

So $X$ acts as $+i\, d/dk$ in the $K$ basis, just as $K$ acts as $-i\, d/dx$ in the $X$ basis. Acting on kets, in the $X$ basis,

$$X|f\rangle \to x f(x), \qquad K|f\rangle \to -i\, \frac{df(x)}{dx}$$

so that

$$XK|f\rangle \to -ix\, \frac{df(x)}{dx}$$

$$KX|f\rangle \to -i\, \frac{d}{dx}\,[x f(x)]$$

$$[X, K]|f\rangle \to -ix\, \frac{df}{dx} + ix\, \frac{df}{dx} + if = if \to iI|f\rangle$$

‡ Hereafter we will omit the qualifier "physical."
† In the last step we have used the fact that $\delta(k' - k) = \delta(k - k')$.
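The commutator computation just carried out can be checked on a grid. In the sketch below (central differences in NumPy; on a finite grid the identity holds only approximately, and only where the test function is well resolved, so the comparison skips the grid edges), $(XK - KX)f = if$ for a smooth localized $f$:

```python
import numpy as np

n = 2000
x = np.linspace(-10, 10, n)
h = x[1] - x[0]
f = np.exp(-x**2)                       # localized, so the edges are harmless

Kf = -1j * np.gradient(f, h)            # K acting in the X basis: -i df/dx
KXf = -1j * np.gradient(x * f, h)       # K acting on the function x f(x)
commutator_f = x * Kf - KXf             # ([X, K] f)(x)

# Matches i f(x) away from the grid edges, as [X, K] = iI requires:
assert np.allclose(commutator_f[5:-5], 1j * f[5:-5], atol=1e-3)
```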
Since $|f\rangle$ is an arbitrary ket, we now have the desired result:

$$[X, K] = iI \tag{1.10.41}$$

This brings us to the end of our discussion on Hilbert space, except for a final example. Although there are many other operators one can study in this space, we restricted ourselves to $X$ and $K$, since almost all the operators we will need for quantum mechanics are functions of $X$ and $P = \hbar K$, where $\hbar$ is a constant to be defined later.

Example 1.10.1: A Normal Mode Problem in Hilbert Space. Consider a string of length $L$ clamped at its two ends $x = 0$ and $L$. The displacement $\psi(x, t)$ obeys the differential equation

$$\frac{\partial^2 \psi}{\partial t^2} = \frac{\partial^2 \psi}{\partial x^2} \tag{1.10.42}$$

Given that at $t = 0$ the displacement is $\psi(x, 0)$ and the velocity $\dot{\psi}(x, 0) = 0$, we wish to determine the time evolution of the string. But for the change in dimensionality, the problem is identical to that of the two coupled masses encountered at the end of Section 1.8 [see Eq. (1.8.26)]. It is recommended that you go over that example once to refresh your memory before proceeding further. We first identify $\psi(x, t)$ as components of a vector $|\psi(t)\rangle$ in a Hilbert space, the elements of which are in correspondence with possible displacements $\psi$, i.e., functions that are continuous in the interval $0 \le x \le L$ and vanish at the end points. You may verify that these functions do form a vector space. The analog of the operator $\Omega$ in Eq. (1.8.26) is the operator $\partial^2/\partial x^2$. We recognize this to be minus the square of the operator $K \to -i\,\partial/\partial x$. Since $K$ acts on a space in which $\psi(0) = \psi(L) = 0$, it is Hermitian, and so is $K^2$. Equation (1.10.42) has the abstract counterpart

$$|\ddot{\psi}(t)\rangle = -K^2 |\psi(t)\rangle \tag{1.10.43}$$

We solve the initial-value problem by following the algorithm developed in Example 1.8.6:

Step (1). Solve the eigenvalue problem of $-K^2$.
Step (2). Construct the propagator $U(t)$ in terms of the eigenvectors and eigenvalues.
Step (3).
$$|\psi(t)\rangle = U(t)|\psi(0)\rangle \tag{1.10.44}$$

The equation to solve is

$$K^2 |\psi_k\rangle = k^2 |\psi_k\rangle \tag{1.10.45}$$

In the $X$ basis, this becomes

$$-\frac{d^2 \psi_k(x)}{dx^2} = k^2\, \psi_k(x) \tag{1.10.46}$$

the general solution to which is

$$\psi_k(x) = A \cos kx + B \sin kx \tag{1.10.47}$$

where $A$ and $B$ are arbitrary. However, not all these solutions lie in the Hilbert space we are considering. We want only those that vanish at $x = 0$ and $x = L$. At $x = 0$ we find

$$\psi_k(0) = 0 = A \tag{1.10.48a}$$

while at $x = L$ we find

$$0 = B \sin kL \tag{1.10.48b}$$

If we do not want a trivial solution ($A = B = 0$) we must demand

$$\sin kL = 0, \qquad kL = m\pi, \quad m = 1, 2, 3, \ldots \tag{1.10.49}$$

We do not consider negative $m$ since it doesn't lead to any further linearly independent solutions [$\sin(-x) = -\sin x$]. The allowed eigenvectors thus form a discrete set labeled by an integer $m$:

$$\psi_m(x) = \left( \frac{2}{L} \right)^{1/2} \sin\left( \frac{m\pi x}{L} \right) \tag{1.10.50}$$

where we have chosen $B = (2/L)^{1/2}$ so that

$$\int_0^L \psi_m(x)\, \psi_{m'}(x) \, dx = \delta_{mm'} \tag{1.10.51}$$

Let us associate with each solution labeled by the integer $m$ an abstract ket $|m\rangle$:

$$|m\rangle \underset{X \text{ basis}}{\longrightarrow} \left( \frac{2}{L} \right)^{1/2} \sin\left( \frac{m\pi x}{L} \right) \tag{1.10.52}$$

If we project $|\psi(t)\rangle$ on the $|m\rangle$ basis, in which $K^2$ is diagonal with eigenvalues $(m\pi/L)^2$, the components $\langle m|\psi(t)\rangle$ will obey the decoupled equations

$$\langle m | \ddot{\psi}(t) \rangle = -\left( \frac{m\pi}{L} \right)^2 \langle m | \psi(t) \rangle, \qquad m = 1, 2, \ldots \tag{1.10.53}$$

in analogy with Eq. (1.8.33). These equations may be readily solved (subject to the condition of vanishing initial velocities) as

$$\langle m | \psi(t) \rangle = \langle m | \psi(0) \rangle \cos\left( \frac{m\pi t}{L} \right) \tag{1.10.54}$$

Consequently

$$|\psi(t)\rangle = \sum_{m=1}^{\infty} |m\rangle \langle m | \psi(t) \rangle = \sum_{m=1}^{\infty} |m\rangle \langle m | \psi(0) \rangle \cos \omega_m t, \qquad \omega_m = \frac{m\pi}{L} \tag{1.10.55}$$

or

$$U(t) = \sum_{m=1}^{\infty} |m\rangle \langle m| \cos \omega_m t, \qquad \omega_m = \frac{m\pi}{L} \tag{1.10.56}$$

The propagator equation $|\psi(t)\rangle = U(t)|\psi(0)\rangle$ becomes in the $|x\rangle$ basis

$$\langle x | \psi(t) \rangle = \psi(x, t) = \langle x | U(t) | \psi(0) \rangle = \int_0^L \langle x | U(t) | x' \rangle \langle x' | \psi(0) \rangle \, dx' \tag{1.10.57}$$

It follows from Eq. (1.10.56) that

$$\langle x | U(t) | x' \rangle = \sum_{m=1}^{\infty} \langle x | m \rangle \langle m | x' \rangle \cos \omega_m t = \sum_{m=1}^{\infty} \left( \frac{2}{L} \right) \sin\left( \frac{m\pi x}{L} \right) \sin\left( \frac{m\pi x'}{L} \right) \cos \omega_m t \tag{1.10.58}$$

Thus, given any $\psi(x', 0)$, we can get $\psi(x, t)$ by performing the integral in Eq. (1.10.57), using $\langle x | U(t) | x' \rangle$ from Eq. (1.10.58). If the propagator language seems too abstract, we can begin with Eq.
(1.10.55). Dotting both sides with $\langle x|$, we get

$$\psi(x, t) = \sum_{m=1}^{\infty} \langle x | m \rangle \langle m | \psi(0) \rangle \cos \omega_m t = \sum_{m=1}^{\infty} \left( \frac{2}{L} \right)^{1/2} \sin\left( \frac{m\pi x}{L} \right) \cos \omega_m t\, \langle m | \psi(0) \rangle \tag{1.10.59}$$

Given $|\psi(0)\rangle$, one must then compute

$$\langle m | \psi(0) \rangle = \int_0^L \left( \frac{2}{L} \right)^{1/2} \sin\left( \frac{m\pi x}{L} \right) \psi(x, 0) \, dx$$

Usually we will find that the coefficients $\langle m | \psi(0) \rangle$ fall rapidly with $m$, so that a few leading terms may suffice to get a good approximation. □

Exercise 1.10.4. A string is displaced as follows at $t = 0$:

$$\psi(x, 0) = \frac{2xh}{L}, \quad 0 \le x \le \frac{L}{2}; \qquad \psi(x, 0) = \frac{2h}{L}\,(L - x), \quad \frac{L}{2} \le x \le L$$

$\delta S^{(1)} = 0$ if we go to any nearby path $x_{cl}(t) + \eta(t)$. We use square brackets to enclose the argument of $S$ to remind us that the function $S$ depends on an entire path or function $x(t)$, and not just the value of $x$ at some time $t$. One calls $S$ a functional, to signify that it is a function of a function. (3) The classical path is one on which $S$ is a minimum. (Actually we will only require that it be an extremum. It is, however, customary to refer to this condition as the principle of least action.) We will now verify that this principle reproduces Newton's Second Law.

The first step is to realize that a functional $S[x(t)]$ is just a function of $n$ variables as $n \to \infty$. In other words, the function $x(t)$ simply specifies an infinite number of values $x(t_i), \ldots, x(t), \ldots, x(t_f)$, one for each instant in time $t$ in the interval $t_i \le t \le t_f$, and $S$ is a function of these variables. To find its minimum we simply generalize the procedure for the finite $n$ case. Let us recall that if $f = f(x_1, \ldots, x_n) = f(\mathbf{x})$, the minimum $\mathbf{x}^0$ is characterized by the fact that if we move away from it by a small amount $\eta$ in any direction, the first-order change $\delta f^{(1)}$ in $f$ vanishes. That is, if we make a Taylor expansion:

$$f(\mathbf{x}^0 + \boldsymbol{\eta}) = f(\mathbf{x}^0) + \sum_{i=1}^{n} \left. \frac{\partial f}{\partial x_i} \right|_{\mathbf{x}^0} \eta_i + \text{higher-order terms in } \boldsymbol{\eta} \tag{2.1.4}$$

then

$$\delta f^{(1)} = \sum_{i=1}^{n} \left. \frac{\partial f}{\partial x_i} \right|_{\mathbf{x}^0} \eta_i = 0 \tag{2.1.5}$$

From this condition we can deduce an equivalent and perhaps more familiar expression of the minimum condition: every first-order partial derivative $\partial f/\partial x_i$ vanishes at $\mathbf{x}^0$. To prove this for, say, $\partial f/\partial x_i$, we simply choose $\boldsymbol{\eta}$ to be along the $i$th direction. Thus
$$\left. \frac{\partial f}{\partial x_i} \right|_{\mathbf{x}^0} = 0, \qquad i = 1, \ldots, n \tag{2.1.6}$$

Let us now mimic this procedure for the action $S$. Let $x_{cl}(t)$ be the path of least action and $x_{cl}(t) + \eta(t)$ a "nearby" path (see Fig. 2.2). The requirement that all paths coincide at $t_i$ and $t_f$ means

$$\eta(t_i) = \eta(t_f) = 0 \tag{2.1.7}$$

Now,

$$S[x_{cl}(t) + \eta(t)] = \int_{t_i}^{t_f} \mathscr{L}\big(x_{cl}(t) + \eta(t),\, \dot{x}_{cl}(t) + \dot{\eta}(t)\big) \, dt$$
$$= \int_{t_i}^{t_f} \left[ \mathscr{L}\big(x_{cl}(t), \dot{x}_{cl}(t)\big) + \left. \frac{\partial \mathscr{L}}{\partial x(t)} \right|_{x_{cl}} \eta(t) + \left. \frac{\partial \mathscr{L}}{\partial \dot{x}(t)} \right|_{x_{cl}} \dot{\eta}(t) + \cdots \right] dt$$
$$= S[x_{cl}(t)] + \delta S^{(1)} + \text{higher-order terms}$$

We set $\delta S^{(1)} = 0$ in analogy with the finite variable case:

$$0 = \delta S^{(1)} = \int_{t_i}^{t_f} \left[ \left. \frac{\partial \mathscr{L}}{\partial x(t)} \right|_{x_{cl}} \eta(t) + \left. \frac{\partial \mathscr{L}}{\partial \dot{x}(t)} \right|_{x_{cl}} \dot{\eta}(t) \right] dt$$

If we integrate the second term by parts, it turns into

$$\left. \frac{\partial \mathscr{L}}{\partial \dot{x}(t)} \right|_{x_{cl}} \eta(t) \Bigg|_{t_i}^{t_f} - \int_{t_i}^{t_f} \frac{d}{dt} \left[ \left. \frac{\partial \mathscr{L}}{\partial \dot{x}(t)} \right|_{x_{cl}} \right] \eta(t) \, dt$$

The first of these terms vanishes due to Eq. (2.1.7), so that

$$0 = \delta S^{(1)} = \int_{t_i}^{t_f} \left[ \frac{\partial \mathscr{L}}{\partial x(t)} - \frac{d}{dt}\, \frac{\partial \mathscr{L}}{\partial \dot{x}(t)} \right]_{x_{cl}} \eta(t) \, dt \tag{2.1.8}$$

Note that the conservation of $p$ is obvious in Eq. (2.1.22). The conservation of $P$ follows from Eq. (2.1.19) only after some manipulations and is practically invisible in Eqs. (2.1.16) and (2.1.17). Both the conserved quantity and its conservation law arise naturally in the Lagrangian scheme. □
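The stationarity argument of Eqs. (2.1.4)-(2.1.8) can be illustrated on a computer by treating the path as a finite list of values, exactly as the text suggests. The sketch below (an assumed example: a harmonic oscillator with $m = \omega = 1$, so $\mathscr{L} = \dot{x}^2/2 - x^2/2$) deforms the classical path by $\varepsilon\eta(t)$, with $\eta$ vanishing at the end points, and checks that the action changes only at second order:

```python
import numpy as np

n = 2001
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]

def action(x):
    """Midpoint discretization of S = integral of (x_dot^2 - x^2)/2 dt."""
    xdot = (x[1:] - x[:-1]) / dt
    xmid = (x[1:] + x[:-1]) / 2
    return np.sum(0.5 * xdot**2 - 0.5 * xmid**2) * dt

x_cl = np.sin(t) / np.sin(1.0)          # classical path with x(0)=0, x(1)=1
eta = t * (1 - t)                       # obeys eta(t_i) = eta(t_f) = 0

# delta S^(1) vanishes on the classical path (central difference in eps):
eps = 1e-4
first_order = (action(x_cl + eps * eta) - action(x_cl - eps * eta)) / (2 * eps)
assert abs(first_order) < 1e-5

# A finite deformation does change S; here t_f - t_i < pi, so the classical
# path is in fact a minimum and the action increases:
assert action(x_cl + 0.5 * eta) > action(x_cl)
```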