THE
PRINCIPLES
OF
QUANTUM MECHANICS
BY
P. A. M. DIRAC
LUCASIAN PROFESSOR OF MATHEMATICS IN THE UNIVERSITY OF CAMBRIDGE
THIRD EDITION
OXFORD AT THE CLARENDON PRESS

Oxford University Press, Amen House, London E.C.4

GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON

BOMBAY CALCUTTA MADRAS CAPE TOWN

Geojafrey c

University

Second edition 1

Reprinted photographically in Great Britain at the University Press, Oxford, 1948, 194Y
from sheets of the third edition

PREFACE TO THIRD EDITION

THE book has again been mostly rewritten to bring in various improvements. The chief of these is the use of the notation of bra and ket vectors, which I have developed since 1939. This notation allows a more direct connexion to be made between the formalism in terms of the abstract quantities corresponding to states and observables and the formalism in terms of representatives; in fact the two formalisms become welded into a single comprehensive scheme. With the help of this notation several of the deductions in the book take a simpler and neater form.
Other substantial alterations include:
(i) A new presentation of the theory of systems with similar particles, based on Fock’s treatment of the theory of radiation adapted to the present notation. This treatment is simpler and more powerful than the one given in earlier editions of the book.

(ii) A further development of quantum electrodynamics, including the theory of the Wentzel field. The theory of the electron in interaction with the electromagnetic field is carried as far as it can be at the present time without getting on to speculative ground.

ST. JOHN'S COLLEGE, CAMBRIDGE
21 April 1947

P. A. M. D.

FROM THE PREFACE TO THE SECOND EDITION

THE book has been mostly rewritten. I have tried by carefully overhauling the method of presentation to give the development of the theory in a rather less abstract form, without making any sacrifices in exactness of expression or in the logical character of the development. This should make the work suitable for a wider circle of readers, although the reader who likes abstractness for its own sake may possibly prefer the style of the first edition.
The main change has been brought about by the use of the word ‘state’ in a three-dimensional non-relativistic sense. It would seem at first sight a pity to build up the theory largely on the basis of non-relativistic concepts. The use of the non-relativistic meaning of ‘state’, however, contributes so essentially to the possibilities of clear exposition as to lead one to suspect that the fundamental ideas of the present quantum mechanics are in need of serious alteration at just this point, and that an improved theory would agree more closely with the development here given than with a development which aims at preserving the relativistic meaning of ‘state’ throughout.

THE INSTITUTE FOR ADVANCED STUDY
PRINCETON
27 November 1934

P. A. M. D.

FROM THE PREFACE TO THE FIRST EDITION

THE methods of progress in theoretical physics have undergone a vast change during the present century. The classical tradition has been to consider the world to be an association of observable objects (particles, fluids, fields, etc.) moving about according to definite laws of force, so that one could form a mental picture in space and time of the whole scheme. This led to a physics whose aim was to make assumptions about the mechanism and forces connecting these observable objects, to account for their behaviour in the simplest possible way. It has become increasingly evident in recent times, however, that nature works on a different plan. Her fundamental laws do not govern the world as it appears in our mental picture in any very direct way, but instead they control a substratum of which we cannot form a mental picture without introducing irrelevancies. The formulation of these laws requires the use of the mathematics of transformations. The important things in the world appear as the invariants (or more generally the nearly invariants, or quantities with simple transformation properties) of these transformations. The things we are immediately aware of are the relations of these nearly invariants to a certain frame of reference, usually one chosen so as to introduce special simplifying features which are unimportant from the point of view of general theory.
The growth of the use of transformation theory, as applied first to relativity and later to the quantum theory, is the essence of the new method in theoretical physics. Further progress lies in the direction of making our equations invariant under wider and still wider transformations. This state of affairs is very satisfactory from a philosophical point of view, as implying an increasing recognition of the part played by the observer in himself introducing the regularities that appear in his observations, and a lack of arbitrariness in the ways of nature, but it makes things less easy for the learner of physics. The new theories, if one looks apart from their mathematical setting, are built up from physical concepts which cannot be explained in terms of things previously known to the student, which cannot even be explained adequately in words at all. Like the fundamental concepts (e.g. proximity, identity) which every one must learn on his arrival into the world, the newer concepts of physics can be mastered only by long familiarity with their properties and uses.
From the mathematical side the approach to the new theories presents no difficulties, as the mathematics required (at any rate that which is required for the development of physics up to the present) is not essentially different from what has been current for a considerable time. Mathematics is the tool specially suited for dealing with abstract concepts of any kind and there is no limit to its power in this field. For this reason a book on the new physics, if not purely descriptive of experimental work, must be essentially mathematical. All the same the mathematics is only a tool and one should learn to hold the physical ideas in one’s mind without reference to the mathematical form. In this book I have tried to keep the physics to the forefront, by beginning with an entirely physical chapter and in the later work examining the physical meaning underlying the formalism wherever possible. The amount of theoretical ground one has to cover before being able to solve problems of real practical value is rather large, but this circumstance is an inevitable consequence of the fundamental part played by transformation theory and is likely to become more pronounced in the theoretical physics of the future.
With regard to the mathematical form in which the theory can be presented, an author must decide at the outset between two methods. There is the symbolic method, which deals directly in an abstract way with the quantities of fundamental importance (the invariants, etc., of the transformations) and there is the method of coordinates or representations, which deals with sets of numbers corresponding to these quantities. The second of these has usually been used for the presentation of quantum mechanics (in fact it has been used practically exclusively with the exception of Weyl’s book Gruppentheorie und Quantenmechanik). It is known under one or other of the two names ‘Wave Mechanics’ and ‘Matrix Mechanics’ according to which physical things receive emphasis in the treatment, the states of a system or its dynamical variables. It has the advantage that the kind of mathematics required is more familiar to the average student, and also it is the historical method.
The symbolic method, however, seems to go more deeply into the nature of things. It enables one to express the physical laws in a neat and concise way, and will probably be increasingly used in the future as it becomes better understood and its own special mathematics gets developed. For this reason I have chosen the symbolic method, introducing the representatives later merely as an aid to practical calculation. This has necessitated a complete break from the historical line of development, but this break is an advantage through enabling the approach to the new ideas to be made as direct as possible.

P. A. M. D.

ST. JOHN'S COLLEGE, CAMBRIDGE
29 May 1930

CONTENTS

I. THE PRINCIPLE OF SUPERPOSITION
1. The Need for a Quantum Theory
2. The Polarization of Photons
3. Interference of Photons
4. Superposition and Indeterminacy
5. Mathematical Formulation of the Principle
6. Bra and Ket Vectors

II. DYNAMICAL VARIABLES AND OBSERVABLES
7. Linear Operators
8. Conjugate Relations
9. Eigenvalues and Eigenvectors
10. Observables
11. Functions of Observables
12. The General Physical Interpretation
13. Commutability and Compatibility

III. REPRESENTATIONS
14. Basic Vectors
15. The δ Function
16. Properties of the Basic Vectors
17. The Representation of Linear Operators
18. Probability Amplitudes
19. Theorems about Functions of Observables
20. Developments in Notation

IV. THE QUANTUM CONDITIONS
21. Poisson Brackets
22. Schrödinger’s Representation
23. The Momentum Representation
24. Heisenberg’s Principle of Uncertainty
25. Displacement Operators
26. Unitary Transformations

V. THE EQUATIONS OF MOTION
27. Schrödinger’s Form for the Equations of Motion
28. Heisenberg’s Form for the Equations of Motion
29. Stationary States
30. The Free Particle
31. The Motion of Wave Packets
32. The Action Principle
33. The Gibbs Ensemble

VI. ELEMENTARY APPLICATIONS
34. The Harmonic Oscillator
35. Angular Momentum
36. Properties of Angular Momentum
37. The Spin of the Electron
38. Motion in a Central Field of Force
39. Energy-levels of the Hydrogen Atom
40. Selection Rules
41. The Zeeman Effect for the Hydrogen Atom

VII. PERTURBATION THEORY
42. General Remarks
43. The Change in the Energy-levels caused by a Perturbation
44. The Perturbation considered as causing Transitions
45. Application to Radiation
46. Transitions caused by a Perturbation Independent of the Time
47. The Anomalous Zeeman Effect

VIII. COLLISION PROBLEMS
48. General Remarks
49. The Scattering Coefficient
50. Solution with the Momentum Representation
51. Dispersive Scattering
52. Resonance Scattering
53. Emission and Absorption

IX. SYSTEMS CONTAINING SEVERAL SIMILAR PARTICLES
54. Symmetrical and Antisymmetrical States
55. Permutations as Dynamical Variables
56. Permutations as Constants of the Motion
57. Determination of the Energy-levels
58. Application to Electrons

X. THEORY OF RADIATION
59. An Assembly of Bosons
60. The Connexion between Bosons and Oscillators
61. Emission and Absorption of Bosons
62. Application to Photons
63. The Interaction Energy between Photons and an Atom
64. Emission, Absorption, and Scattering of Radiation
65. An Assembly of Fermions

XI. RELATIVISTIC THEORY OF THE ELECTRON
66. Relativistic Treatment of a Particle
67. The Wave Equation for the Electron
68. Invariance under a Lorentz Transformation
69. The Motion of a Free Electron
70. Existence of the Spin
71. Transition to Polar Variables
72. The Fine-structure of the Energy-levels of Hydrogen
73. Theory of the Positron

XII. QUANTUM ELECTRODYNAMICS
74. Relativistic Notation
75. The Quantum Conditions for the Field
76. The Hamiltonian for the Field
77. The Supplementary Conditions
78. Classical Electrodynamics in Hamiltonian Form
79. Passage to the Quantum Theory
80. Elimination of the Longitudinal Waves
81. Discussion of the Transverse Waves

INDEX

THE PRINCIPLE OF SUPERPOSITION

1. The need for a quantum theory

CLASSICAL mechanics has been developed continuously from the time of Newton and applied to an ever-widening range of dynamical systems, including the electromagnetic field in interaction with matter. The underlying ideas and the laws governing their application form a simple and elegant scheme, which one would be inclined to think could not be seriously modified without having all its attractive features spoilt. Nevertheless it has been found possible to set up a new scheme, called quantum mechanics, which is more suitable for the description of phenomena on the atomic scale and which is in some respects more elegant and satisfying than the classical scheme. This possibility is due to the changes which the new scheme involves being of a very profound character and not clashing with the features of the classical theory that make it so attractive, as a result of which all these features can be incorporated in the new scheme.
The necessity for a departure from classical mechanics is clearly shown by experimental results. In the first place the forces known in classical electrodynamics are inadequate for the explanation of the remarkable stability of atoms and molecules, which is necessary in order that materials may have any definite physical and chemical properties at all. The introduction of new hypothetical forces will not save the situation, since there exist general principles of classical mechanics, holding for all kinds of forces, leading to results in direct disagreement with observation. For example, if an atomic system has its equilibrium disturbed in any way and is then left alone, it will be set in oscillation and the oscillations will get impressed on the surrounding electromagnetic field, so that their frequencies may be observed with a spectroscope. Now whatever the laws of force governing the equilibrium, one would expect to be able to include the various frequencies in a scheme comprising certain fundamental frequencies and their harmonics. This is not observed to be the case. Instead, there is observed a new and unexpected connexion between the frequencies, called Ritz’s Combination Law of Spectroscopy, according to which all the frequencies can be expressed as differences between certain terms, the number of terms being much less than the number of frequencies. This law is quite unintelligible from the classical standpoint.
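The counting behind the Combination Law may be illustrated by a short Python sketch. It is an illustration only and not part of the original text: the term values are hypothetical (taken as 1/n² in arbitrary units), and the name `combination_frequencies` is an assumption made for this example. It shows merely that a few terms generate a much larger set of frequencies, each a difference between two terms.

```python
from itertools import combinations

def combination_frequencies(terms):
    """Return every frequency obtainable as a difference between two terms."""
    return sorted(abs(a - b) for a, b in combinations(terms, 2))

# Hypothetical term values (assumption): 1/n^2 in arbitrary units, n = 1..6.
terms = [1.0 / n**2 for n in range(1, 7)]
freqs = combination_frequencies(terms)

# n terms give n(n-1)/2 differences, so the terms are far fewer than the frequencies.
print(len(terms), "terms give", len(freqs), "frequencies")  # 6 terms give 15 frequencies
```

A consequence visible in the sketch is that the frequencies are not independent: the sum of the frequencies belonging to the pairs (1, 2) and (2, 3) equals the frequency belonging to the pair (1, 3).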
One might try to get over the difficulty without departing from classical mechanics by assuming each of the spectroscopically observed frequencies to be a fundamental frequency with its own degree of freedom, the laws of force being such that the harmonic vibrations do not occur. Such a theory will not do, however, even apart from the fact that it would give no explanation of the Combination Law, since it would immediately bring one into conflict with the experimental evidence on specific heats. Classical statistical mechanics enables one to establish a general connexion between the total number of degrees of freedom of an assembly of vibrating systems and its specific heat. If one assumes all the spectroscopic frequencies of an atom to correspond to different degrees of freedom, one would get a specific heat for any kind of matter very much greater than the observed value. In fact the observed specific heats at ordinary temperatures are given fairly well by a theory that takes into account merely the motion of each atom as a whole and assigns no internal motion to it at all.
This leads us to a new clash between classical mechanics and the results of experiment. There must certainly be some internal motion in an atom to account for its spectrum, but the internal degrees of freedom, for some classically inexplicable reason, do not contribute to the specific heat. A similar clash is found in connexion with the energy of oscillation of the electromagnetic field in a vacuum. Classical mechanics requires the specific heat corresponding to this energy to be infinite, but it is observed to be quite finite. A general conclusion from experimental results is that oscillations of high frequency do not contribute their classical quota to the specific heat.
As another illustration of the failure of classical mechanics we may consider the behaviour of light. We have, on the one hand, the phenomena of interference and diffraction, which can be explained only on the basis of a wave theory; on the other, phenomena such as photo-electric emission and scattering by free electrons, which show that light is composed of small particles. These particles, which are called photons, have each a definite energy and momentum, depending on the frequency of the light, and appear to have just as real an existence as electrons, or any other particles known in physics. A fraction of a photon is never observed.
Experiments have shown that this anomalous behaviour is not peculiar to light, but is quite general. All material particles have wave properties, which can be exhibited under suitable conditions. We have here a very striking and general example of the breakdown of classical mechanics: not merely an inaccuracy in its laws of motion, but an inadequacy of its concepts to supply us with a description of atomic events.
The necessity to depart from classical ideas when one wishes to account for the ultimate structure of matter may be seen, not only from experimentally established facts, but also from general philosophical grounds. In a classical explanation of the constitution of matter, one would assume it to be made up of a large number of small constituent parts and one would postulate laws for the behaviour of these parts, from which the laws of the matter in bulk could be deduced. This would not complete the explanation, however, since the question of the structure and stability of the constituent parts is left untouched. To go into this question, it becomes necessary to postulate that each constituent part is itself made up of smaller parts, in terms of which its behaviour is to be explained. There is clearly no end to this procedure, so that one can never arrive at the ultimate structure of matter on these lines. So long as big and small are merely relative concepts, it is no help to explain the big in terms of the small. It is therefore necessary to modify classical ideas in such a way as to give an absolute meaning to size.
At this stage it becomes important to remember that science is concerned only with observable things and that we can observe an object only by letting it interact with some outside influence. An act of observation is thus necessarily accompanied by some disturbance of the object observed. We may define an object to be big when the disturbance accompanying our observation of it may be neglected, and small when the disturbance cannot be neglected. This definition is in close agreement with the common meanings of big and small.
It is usually assumed that, by being careful, we may cut down the disturbance accompanying our observation to any desired extent. The concepts of big and small are then purely relative and refer to the gentleness of our means of observation as well as to the object being described. In order to give an absolute meaning to size, such as is required for any theory of the ultimate structure of matter, we have to assume that there is a limit to the fineness of our powers of observation and the smallness of the accompanying disturbance, a limit which is inherent in the nature of things and can never be surpassed by improved technique or increased skill on the part of the observer. If the object under observation is such that the unavoidable limiting disturbance is negligible, then the object is big in the absolute sense and we may apply classical mechanics to it. If, on the other hand, the limiting disturbance is not negligible, then the object is small in the absolute sense and we require a new theory for dealing with it.
A consequence of the preceding discussion is that we must revise our ideas of causality. Causality applies only to a system which is left undisturbed. If a system is small, we cannot observe it without producing a serious disturbance and hence we cannot expect to find any causal connexion between the results of our observations. Causality will still be assumed to apply to undisturbed systems and the equations which will be set up to describe an undisturbed system will be differential equations expressing a causal connexion between conditions at one time and conditions at a later time. These equations will be in close correspondence with the equations of classical mechanics, but they will be connected only indirectly with the results of observations. There is an unavoidable indeterminacy in the calculation of observational results, the theory enabling us to calculate in general only the probability of our obtaining a particular result when we make an observation.

2. The polarization of photons
The discussion in the preceding section about the limit to the gentleness with which observations can be made and the consequent indeterminacy in the results of those observations does not provide any quantitative basis for the building up of quantum mechanics. For this purpose a new set of accurate laws of nature is required. One of the most fundamental and most drastic of these is the Principle of Superposition of States. We shall lead up to a general formulation of this principle through a consideration of some special cases, taking first the example provided by the polarization of light.
It is known experimentally that when plane-polarized light is used for ejecting photo-electrons, there is a preferential direction for the electron emission. Thus the polarization properties of light are closely connected with its corpuscular properties and one must ascribe a polarization to the photons. One must consider, for instance, a beam of light plane-polarized in a certain direction as consisting of photons each of which is plane-polarized in that direction and a beam of circularly polarized light as consisting of photons each circularly polarized. Every photon is in a certain state of polarization, as we shall say. The problem we must now consider is how to fit in these ideas with the known facts about the resolution of light into polarized components and the recombination of these components.
Let us take a definite case. Suppose we have a beam of light passing through a crystal of tourmaline, which has the property of letting through only light plane-polarized perpendicular to its optic axis. Classical electrodynamics tells us what will happen for any given polarization of the incident beam. If this beam is polarized perpendicular to the optic axis, it will all go through the crystal; if parallel to the axis, none of it will go through; while if polarized at an angle α to the axis, a fraction sin²α will go through. How are we to understand these results on a photon basis?
A beam that is plane-polarized in a certain direction is to be pictured as made up of photons each plane-polarized in that direction. This picture leads to no difficulty in the cases when our incident beam is polarized perpendicular or parallel to the optic axis. We merely have to suppose that each photon polarized perpendicular to the axis passes unhindered and unchanged through the crystal, while each photon polarized parallel to the axis is stopped and absorbed. A difficulty arises, however, in the case of the obliquely polarized incident beam. Each of the incident photons is then obliquely polarized and it is not clear what will happen to such a photon when it reaches the tourmaline.
A question about what will happen to a particular photon under certain conditions is not really very precise. To make it precise one must imagine some experiment performed having a bearing on the question and inquire what will be the result of the experiment. Only questions about the results of experiments have a real significance and it is only such questions that theoretical physics has to consider.
In our present example the obvious experiment is to use an incident beam consisting of only a single photon and to observe what appears on the back side of the crystal. According to quantum mechanics the result of this experiment will be that sometimes one will find a whole photon, of energy equal to the energy of the incident photon, on the back side and other times one will find nothing. When one finds a whole photon, it will be polarized perpendicular to the optic axis. One will never find only a part of a photon on the back side. If one repeats the experiment a large number of times, one will find the photon on the back side in a fraction sin²α of the total number of times. Thus we may say that the photon has a probability sin²α of passing through the tourmaline and appearing on the back side polarized perpendicular to the axis and a probability cos²α of being absorbed. These values for the probabilities lead to the correct classical results for an incident beam containing a large number of photons.
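The statistical statement above may be checked numerically. The following Python sketch is an illustration and not part of the original text; the function names, the fixed seed and the number of trials are assumptions made for the example. Each trial sends one photon at the crystal and lets it pass whole with probability sin²α, so that the fraction transmitted over many trials should approach the classical value.

```python
import math
import random

def photon_passes(alpha, rng):
    """One single-photon trial: True if a whole photon appears behind the crystal."""
    return rng.random() < math.sin(alpha) ** 2

def transmitted_fraction(alpha, trials, seed=0):
    """Fraction of `trials` single photons, polarized at angle alpha (radians)
    to the optic axis, that pass through the crystal."""
    rng = random.Random(seed)
    passed = sum(photon_passes(alpha, rng) for _ in range(trials))
    return passed / trials

alpha = math.pi / 6                       # 30 degrees to the optic axis
print(transmitted_fraction(alpha, 100_000), "vs classical", math.sin(alpha) ** 2)
```

No single trial ever yields a fraction of a photon; only the long-run frequency reproduces the classical fraction sin²α.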
In this way we preserve the individuality of the photon in all cases. We are able to do this, however, only because we abandon the determinacy of the classical theory. The result of an experiment is not determined, as it would be according to classical ideas, by the conditions under the control of the experimenter. The most that can be predicted is a set of possible results, with a probability of occurrence for each.
The foregoing discussion about the result of an experiment with a single obliquely polarized photon incident on a crystal of tourmaline answers all that can legitimately be asked about what happens to an obliquely polarized photon when it reaches the tourmaline. Questions about what decides whether the photon is to go through or not and how it changes its direction of polarization when it does go through cannot be investigated by experiment and should be regarded as outside the domain of science. Nevertheless some further description is necessary in order to correlate the results of this experiment with the results of other experiments that might be performed with photons and to fit them all into a general scheme. Such further description should be regarded, not as an attempt to answer questions outside the domain of science, but as an aid to the formulation of rules for expressing concisely the results of large numbers of experiments.
The further description provided by quantum mechanics runs as follows. It is supposed that a photon polarized obliquely to the optic axis may be regarded as being partly in the state of polarization parallel to the axis and partly in the state of polarization perpendicular to the axis. The state of oblique polarization may be considered as the result of some kind of superposition process applied to the two states of parallel and perpendicular polarization. This implies a certain special kind of relationship between the various states of polarization, a relationship similar to that between polarized beams in classical optics, but which is now to be applied, not to beams, but to the states of polarization of one particular photon. This relationship allows any state of polarization to be resolved into, or expressed as a superposition of, any two mutually perpendicular states of polarization.
When we make the photon meet a tourmaline crystal, we are subjecting it to an observation. We are observing whether it is polarized parallel or perpendicular to the optic axis. The effect of making this observation is to force the photon entirely into the state of parallel or entirely into the state of perpendicular polarization. It has to make a sudden jump from being partly in each of these two states to being entirely in one or other of them. Which of the two states it will jump into cannot be predicted, but is governed only by probability laws. If it jumps into the parallel state it gets absorbed and if it jumps into the perpendicular state it passes through the crystal and appears on the other side preserving this state of polarization.
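The resolution of a state into two perpendicular states, and the jump produced by an observation, may be sketched in Python as follows. This is an illustration under stated assumptions and not part of the original text: a plane-polarization state is represented simply by its angle to the optic axis, the components are taken as the cosine and sine of that angle (as for beams in classical optics), and the names `resolve` and `observe` are hypothetical.

```python
import math
import random

def resolve(alpha, basis=0.0):
    """Components of a state at angle alpha along a direction at angle `basis`
    and along the direction perpendicular to it (angles in radians)."""
    return math.cos(alpha - basis), math.sin(alpha - basis)

def observe(alpha, rng):
    """Let a photon polarized at angle alpha to the optic axis meet the crystal.

    Returns (passed, new_alpha): the photon jumps either into the perpendicular
    state (angle pi/2, passes) or into the parallel state (angle 0, absorbed)."""
    _, perpendicular = resolve(alpha)
    if rng.random() < perpendicular ** 2:
        return True, math.pi / 2
    return False, 0.0

rng = random.Random(1)
passed, state = observe(math.pi / 4, rng)
if passed:
    # A photon that has passed is wholly in the perpendicular state,
    # so a second identical crystal lets it through with certainty.
    print(observe(state, rng)[0])
```

The squares of the two components add up to unity for any choice of `basis`, which corresponds to the statement that any state may be resolved into any two mutually perpendicular states.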

3. Interference of photons
In this section we shall deal with another example of superposition. We shall again take photons, but shall be concerned with their position in space and their momentum instead of their polarization. If we are given a beam of roughly monochromatic light, then we know something about the location and momentum of the associated photons. We know that each of them is located somewhere in the region of space through which the beam is passing and has a momentum in the direction of the beam of magnitude given in terms of the frequency of the beam by Einstein’s photo-electric law: momentum equals frequency multiplied by a universal constant. When we have such information about the location and momentum of a photon we shall say that it is in a definite translational state.
We shall discuss the description which quantum mechanics provides of the interference of photons. Let us take a definite experiment demonstrating interference. Suppose we have a beam of light which is passed through some kind of interferometer, so that it gets split up into two components and the two components are subsequently made to interfere. We may, as in the preceding section, take an incident beam consisting of only a single photon and inquire what will happen to it as it goes through the apparatus. This will present to us the difficulty of the conflict between the wave and corpuscular theories of light in an acute form.
Corresponding to the description that we had in the case of the polarization, we must now describe the photon as going partly into each of the two components into which the incident beam is split. The photon is then, as we may say, in a translational state given by the superposition of the two translational states associated with the two components. We are thus led to a generalization of the term ‘translational state’ applied to a photon. For a photon to be in a definite translational state it need not be associated with one single beam of light, but may be associated with two or more beams of light which are the components into which one original beam has been split.† In the accurate mathematical theory each translational state is associated with one of the wave functions of ordinary wave optics, which wave function may describe either a single beam or two or more beams into which one original beam has been split. Translational states are thus superposable in a similar way to wave functions.
† The circumstance that the superposition idea requires us to generalize our original meaning of translational states, but that no corresponding generalization was needed for the states of polarization of the preceding section, is an accidental one with no underlying theoretical significance.

Let us consider now what happens when we determine the energy in one of the components. The result of such a determination must be either the whole photon or nothing at all. Thus the photon must change suddenly from being partly in one beam and partly in the other to being entirely in one of the beams. This sudden change is due to the disturbance in the translational state of the photon which the observation necessarily makes. It is impossible to predict in which of the two beams the photon will be found. Only the probability of either result can be calculated from the previous distribution of the photon over the two beams.
One could carry out the energy measurement without destroying the component beam by, for example, reflecting the beam from a movable mirror and observing the recoil. Our description of the photon allows us to infer that, after such an energy measurement, it would not be possible to bring about any interference effects between the two components. So long as the photon is partly in one beam and partly in the other, interference can occur when the two beams are superposed, but this possibility disappears when the photon is forced entirely into one of the beams by an observation. The other beam then no longer enters into the description of the photon, so that it counts as being entirely in the one beam in the ordinary way for any experiment that may subsequently be performed on it.
On these lines quantum mechanics is able to effect a reconciliation of the wave and corpuscular properties of light. The essential point is the association of each of the translational states of a photon with one of the wave functions of ordinary wave optics. The nature of this association cannot be pictured on a basis of classical mechanics, but is something entirely new. It would be quite wrong to picture the photon and its associated wave as interacting in the way in which particles and waves can interact in classical mechanics. The association can be interpreted only statistically, the wave function giving us information about the probability of our finding the photon in any particular place when we make an observation of where it is.
Some time before the discovery of quantum mechanics people realized that the connexion between light waves and photons must be of a statistical character. What they did not clearly realize, however, was that the wave function gives information about the probability of one photon being in a particular place and not the probable number of photons in that place. The importance of the distinction can be made clear in the following way. Suppose we have a beam of light consisting of a large number of photons split up into two components of equal intensity. On the assumption that the intensity of a beam is connected with the probable number of photons in it, we should have half the total number of photons going into each component. If the two components are now made to interfere, we should require a photon in one component to be able to interfere with one in the other. Sometimes these two photons would have to annihilate one another and other times they would have to produce four photons. This would contradict the conservation of energy. The new theory, which connects the wave function with probabilities for one photon, gets over the difficulty by making each photon go partly into each of the two components. Each photon then interferes only with itself. Interference between two different photons never occurs.
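The statement that each photon interferes only with itself can be illustrated by a small numerical sketch. It is not part of the original text and rests on assumptions: the photon is described by one complex amplitude per beam, the interferometer consists of two idealized lossless 50/50 beam splitters, and the helper names `beam_splitter` and `output_probabilities` as well as the sign convention of the splitter are hypothetical choices made here for illustration.

```python
import cmath
import math

def beam_splitter(a, b):
    """Idealized lossless 50/50 splitter (assumed convention, not from the text):
    mixes the amplitudes of two beams."""
    s = 1 / math.sqrt(2)
    return s * (a + b), s * (a - b)

def output_probabilities(phase):
    """One photon: split, phase shift on the second beam, recombine."""
    a, b = beam_splitter(1 + 0j, 0j)   # the photon goes partly into each beam
    b *= cmath.exp(1j * phase)         # phase difference between the two beams
    out1, out2 = beam_splitter(a, b)   # the two components are made to interfere
    # Assumption: the probability of finding the photon in a beam is the
    # squared modulus of its amplitude in that beam.
    return abs(out1) ** 2, abs(out2) ** 2

for phase in (0.0, math.pi / 2, math.pi):
    p1, p2 = output_probabilities(phase)
    print(f"phase={phase:.3f}  p1={p1:.3f}  p2={p2:.3f}  total={p1 + p2:.3f}")
```

Under these assumptions the two output probabilities vary with the phase difference but always add up to unity, so the interference redistributes the single photon between the outputs without creating or destroying photons.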
The association of particles with waves discussed above is not restricted to the case of light, but is, according to modern theory, of universal applicability. All kinds of particles are associated with waves in this way and conversely all wave motion is associated with particles. Thus all particles can be made to exhibit interference effects and all wave motion has its energy in the form of quanta. The reason why these general phenomena are not more obvious is on account of a law of proportionality between the mass or energy of the particles and the frequency of the waves, the coefficient being such that for waves of familiar frequencies the associated quanta are extremely small, while for particles even as light as electrons the associated wave frequency is so high that it is not easy to demonstrate interference.

4. Superposition and indeterminacy

The reader may possibly feel dissatisfied with the attempt in the two preceding sections to fit in the existence of photons with the classical theory of light. He may argue that a very strange idea has been introduced, the possibility of a photon being partly in each of two states of polarization, or partly in each of two separate beams, but even with the help of this strange idea no satisfying picture of the fundamental single-photon processes has been given. He may say further that this strange idea did not provide any information about experimental results for the experiments discussed, beyond what could have been obtained from an elementary consideration of photons being guided in some vague way by waves. What, then, is the use of the strange idea?

In answer to the first criticism it may be remarked that the main object of physical science is not the provision of pictures, but is the formulation of laws governing phenomena and the application of these laws to the discovery of new phenomena. If a picture exists, so much the better; but whether a picture exists or not is a matter of only secondary importance. In the case of atomic phenomena no picture can be expected to exist in the usual sense of the word ‘picture’, by which is meant a model functioning essentially on classical lines. One may, however, extend the meaning of the word ‘picture’ to include any way of looking at the fundamental laws which makes their self-consistency obvious. With this extension, one may gradually acquire a picture of atomic phenomena by becoming familiar with the laws of the quantum theory.

With regard to the second criticism, it may be remarked that for many simple experiments with light, an elementary theory of waves and photons connected in a vague statistical way would be adequate to account for the results. In the case of such experiments quantum mechanics has no further information to give. In the great majority of experiments, however, the conditions are too complex for an elementary theory of this kind to be applicable and some more elaborate scheme, such as is provided by quantum mechanics, is then needed. The method of description that quantum mechanics gives in the more complex cases is applicable also to the simple cases and although it is then not really necessary for accounting for the experimental results, its study in these simple cases is perhaps a suitable introduction to its study in the general case.
There remains an overall criticism that one may make to the whole scheme, namely, that in departing from the determinacy of the classical theory a great complication is introduced into the description of Nature, which is a highly undesirable feature. This complication is undeniable, but it is offset by a great simplification, provided by the general principle of superposition of states, which we shall now go on to consider. But first it is necessary to make precise the important concept of a ‘state’ of a general atomic system.
Let us take any atomic system, composed of particles or bodies with specified properties (mass, moment of inertia, etc.) interacting according to specified laws of force. There will be various possible motions of the particles or bodies consistent with the laws of force. Each such motion is called a state of the system. According to classical ideas one could specify a state by giving numerical values to all the coordinates and velocities of the various component parts of the system at some instant of time, the whole motion being then completely determined. Now the argument of pp. 3 and 4 shows that we cannot observe a small system with that amount of detail which classical theory supposes. The limitation in the power of observation puts a limitation on the number of data that can be assigned to a state. Thus a state of an atomic system must be specified by fewer or more indefinite data than a complete set of numerical values for all the coordinates and velocities at some instant of time. In the case when the system is just a single photon, a state would be completely specified by a given state of motion in the sense of § 3 together with a given state of polarization in the sense of § 2.
A state of a system may be defined as an undisturbed motion that is restricted by as many conditions or data as are theoretically possible without mutual interference or contradiction. In practice the conditions could be imposed by a suitable preparation of the system, consisting perhaps in passing it through various kinds of sorting apparatus, such as slits and polarimeters, the system being left undisturbed after the preparation. The word ‘state’ may be used to mean either the state at one particular time (after the preparation), or the state throughout the whole of time after the preparation. To distinguish these two meanings, the latter will be called a ‘state of motion’ when there is liable to be ambiguity.
The general principle of superposition of quantum mechanics applies to the states, with either of the above meanings, of any one dynamical system. It requires us to assume that between these states there exist peculiar relationships such that whenever the system is definitely in one state we can consider it as being partly in each of two or more other states. The original state must be regarded as the result of a kind of superposition of the two or more new states, in a way that cannot be conceived on classical ideas. Any state may be considered as the result of a superposition of two or more other states, and indeed in an infinite number of ways. Conversely any two or more states may be superposed to give a new state. The procedure of expressing a state as the result of superposition of a number of other states is a mathematical procedure that is always permissible, independent of any reference to physical conditions, like the procedure of resolving a wave into Fourier components. Whether it is useful in any particular case, though, depends on the special physical conditions of the problem under consideration.
In the two preceding sections examples were given of the superposition principle applied to a system consisting of a single photon. § 2 dealt with states differing only with regard to the polarization and § 3 with states differing only with regard to the motion of the photon as a whole.
The nature of the relationships which the superposition principle requires to exist between the states of any system is of a kind that cannot be explained in terms of familiar physical concepts. One cannot in the classical sense picture a system being partly in each of two states and see the equivalence of this to the system being completely in some other state. There is an entirely new idea involved, to which one must get accustomed and in terms of which one must proceed to build up an exact mathematical theory, without having any detailed classical picture.

When a state is formed by the superposition of two other states, it will have properties that are in some vague way intermediate between those of the two original states and that approach more or less closely to those of either of them according to the greater or less ‘weight’ attached to this state in the superposition process. The new state is completely defined by the two original states when their relative weights in the superposition process are known, together with a certain phase difference, the exact meaning of weights and phases being provided in the general case by the mathematical theory. In the case of the polarization of a photon their meaning is that provided by classical optics, so that, for example, when two perpendicularly plane polarized states are superposed with equal weights, the new state may be circularly polarized in either direction, or linearly polarized at an angle $\frac{1}{4}\pi$, or else elliptically polarized, according to the phase difference.
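The dependence of the resulting polarization on the phase difference can be sketched as a small classifier. This is an illustration only, using the standard result of classical optics for two perpendicular plane-polarized components of equal amplitude; the function name `classify_equal_weight_superposition` and the tolerance are hypothetical choices, and the sense of rotation of the circular case is left unspecified because it depends on a sign convention not fixed in the text.

```python
import math

def classify_equal_weight_superposition(phase_difference, tol=1e-9):
    """Classify the superposition of two perpendicular plane-polarized states
    taken with equal weights and the given phase difference (in radians)."""
    # Reduce the phase difference to the interval [-pi, pi].
    delta = math.remainder(phase_difference, 2 * math.pi)
    if abs(delta) < tol or abs(abs(delta) - math.pi) < tol:
        # Components in phase or in antiphase: plane polarization at pi/4
        # to the two original directions.
        return "linear at pi/4"
    if abs(abs(delta) - math.pi / 2) < tol:
        # Quarter-period phase difference: circular, in one sense or the
        # other according to the sign of delta.
        return "circular"
    return "elliptical"

for delta in (0.0, math.pi / 2, -math.pi / 2, math.pi / 3, math.pi):
    print(f"{delta:+.3f} -> {classify_equal_weight_superposition(delta)}")
```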
The non-classical nature of the superposition process is brought out clearly if we consider the superposition of two states, A and B, such that there exists an observation which, when made on the system in state A, is certain to lead to one particular result, a say, and when made on the system in state B is certain to lead to some different result, b say. What will be the result of the observation when made on the system in the superposed state? The answer is that the result will be sometimes a and sometimes b, according to a probability law depending on the relative weights of A and B in the superposition process. It will never be different from both a and b. The intermediate character of the state formed by superposition thus expresses itself through the probability of a particular result for an observation being intermediate between the corresponding probabilities for the original states,† not through the result itself being intermediate between the corresponding results for the original states.
† The probability of a particular result for the state formed by superposition is not always intermediate between those for the original states in the general case when those for the original states are not zero or unity, so there are restrictions on the ‘intermediateness’ of a state formed by superposition.
In this way we see that such a drastic departure from ordinary ideas as the assumption of superposition relationships between the states is possible only on account of the recognition of the importance of the disturbance accompanying an observation and of the consequent indeterminacy in the result of the observation. When an observation is made on any atomic system that is in a given state, in general the result will not be determinate, i.e., if the experiment is repeated several times under identical conditions several different results may be obtained. It is a law of nature, though, that if the experiment is repeated a large number of times, each particular result will be obtained in a definite fraction of the total number of times, so that there is a definite probability of its being obtained. This probability is what the theory sets out to calculate. Only in special cases when the probability for some result is unity is the result of the experiment determinate.
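The statistical statement above can be illustrated by simulating many repetitions of the observation. The sketch below is not from the original text. It assumes a specific probability law that this passage does not yet state, namely that the probability of result a is the squared modulus of the coefficient of A divided by the sum of the squared moduli of both coefficients; the function name `measure_many` and the seed are likewise hypothetical.

```python
import random

def measure_many(c_a, c_b, trials, seed=0):
    """Simulate repeated observations on identically prepared systems in a
    superposition of A and B with complex coefficients c_a and c_b.
    Every single result is either 'a' or 'b', never anything in between."""
    # Assumed probability law (not derived in this passage):
    # P(a) = |c_a|^2 / (|c_a|^2 + |c_b|^2).
    p_a = abs(c_a) ** 2 / (abs(c_a) ** 2 + abs(c_b) ** 2)
    rng = random.Random(seed)
    results = ["a" if rng.random() < p_a else "b" for _ in range(trials)]
    # Return the assumed probability and the observed fraction of 'a' results.
    return p_a, results.count("a") / trials

p_a, fraction = measure_many(1 + 0j, 1j * 3 ** 0.5, trials=100_000)
print(p_a, fraction)
```

Each individual run is indeterminate, but the observed fraction settles close to the assumed probability as the number of trials grows, which is the sense in which the theory predicts only probabilities.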
The assumption of superposition relationships between the states leads to a mathematical theory in which the equations that define a state are linear in the unknowns. In consequence of this, people have tried to establish analogies with systems in classical mechanics, such as vibrating strings or membranes, which are governed by linear equations and for which, therefore, a superposition principle holds. Such analogies have led to the name ‘Wave Mechanics’ being sometimes given to quantum mechanics. It is important to remember, however, that the superposition that occurs in quantum mechanics is of an essentially different nature from any occurring in the classical theory, as is shown by the fact that the quantum superposition principle demands indeterminacy in the results of observations in order to be capable of a sensible physical interpretation. The analogies are thus liable to be misleading.

5. Mathematical formulation of the principle
A profound change has taken place during the present century in the opinions physicists have held on the mathematical foundations of their subject. Previously they supposed that the principles of Newtonian mechanics would provide the basis for the description of the whole of physical phenomena and that all the theoretical physicist had to do was suitably to develop and apply these principles. With the recognition that there is no logical reason why Newtonian and other classical principles should be valid outside the domains in which they have been experimentally verified has come the realization that departures from these principles are indeed necessary. Such departures find their expression through the introduction of new mathematical formalisms, new schemes of axioms and rules of manipulation, into the methods of theoretical physics.
Quantum mechanics provides a good example of the new ideas. It requires the states of a dynamical system and the dynamical variables to be interconnected in quite strange ways that are unintelligible from the classical standpoint. The states and dynamical variables have to be represented by mathematical quantities of different natures from those ordinarily used in physics. The new scheme becomes a precise physical theory when all the axioms and rules of manipulation governing the mathematical quantities are specified and when in addition certain laws are laid down connecting physical facts with the mathematical formalism, so that from any given physical conditions equations between the mathematical quantities may be inferred and vice versa. In an application of the theory one would be given certain physical information, which one would proceed to express by equations between the mathematical quantities. One would then deduce new equations with the help of the axioms and rules of manipulation and would conclude by interpreting these new equations as physical conditions. The justification for the whole scheme depends, apart from internal consistency, on the agreement of the final results with experiment.
We shall begin to set up the scheme by dealing with the mathematical relations between the states of a dynamical system at one instant of time, which relations will come from the mathematical formulation of the principle of superposition. The superposition process is a kind of additive process and implies that states can in some way be added to give new states. The states must therefore be connected with mathematical quantities of a kind which can be added together to give other quantities of the same kind. The most obvious of such quantities are vectors. Ordinary vectors, existing in a space of a finite number of dimensions, are not sufficiently general for most of the dynamical systems in quantum mechanics. We have to make a generalization to vectors in a space of an infinite number of dimensions, and the mathematical treatment becomes complicated by questions of convergence. For the present, however, we shall deal merely with some general properties of the vectors, properties which can be deduced on the basis of a simple scheme of axioms, and questions of convergence and related topics will not be gone into until the need arises.
It is desirable to have a special name for describing the vectors which are connected with the states of a system in quantum mechanics, whether they are in a space of a finite or an infinite number of dimensions. We shall call them ket vectors, or simply kets, and denote a general one of them by a special symbol $|\,\rangle$. If we want to specify a particular one of them by a label, A say, we insert it in the middle, thus $|A\rangle$. The suitability of this notation will become clear as the scheme is developed.
Ket vectors may be multiplied by complex numbers and may be added together to give other ket vectors, e.g. from two ket vectors $|A\rangle$ and $|B\rangle$ we can form
$$c_1|A\rangle + c_2|B\rangle = |R\rangle, \tag{1}$$
say, where $c_1$ and $c_2$ are any two complex numbers. We may also perform more general linear processes with them, such as adding an infinite sequence of them, and if we have a ket vector $|x\rangle$, depending on and labelled by a parameter $x$ which can take on all values in a certain range, we may integrate it with respect to $x$, to get another ket vector
$$\int |x\rangle\,dx = |Q\rangle$$
say. A ket vector which is expressible linearly in terms of certain others is said to be dependent on them. A set of ket vectors are called independent if no one of them is expressible linearly in terms of the others.
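The linear operations on kets can be tried out in a few lines. The sketch below is an illustration under a simplifying assumption that the text does not make: a ket is represented by a finite tuple of complex components, whereas the general theory needs a space of an infinite number of dimensions. The helper names `scale`, `add` and `superpose` are hypothetical.

```python
def scale(c, ket):
    """Multiply a ket (a tuple of complex components) by the complex number c."""
    return tuple(c * x for x in ket)

def add(ket1, ket2):
    """Add two kets component by component."""
    return tuple(x + y for x, y in zip(ket1, ket2))

def superpose(c1, ket_a, c2, ket_b):
    """Form the linear combination of two kets, as in equation (1)."""
    return add(scale(c1, ket_a), scale(c2, ket_b))

# Two independent kets in an assumed two-dimensional space.
A = (1 + 0j, 0j)
B = (0j, 1 + 0j)

# R is dependent on A and B, since it is expressible linearly in terms of them.
R = superpose(2 + 1j, A, -1j, B)
print(R)
```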

We now assume that each state of a dynamical system at a particular time corresponds to a ket vector, the correspondence being such that if a state results from the superposition of certain other states, its corresponding ket vector is expressible linearly in terms of the corresponding ket vectors of the other states, and conversely. Thus the state R results from a superposition of the states A and B when the corresponding ket vectors are connected by (1).

The above assumption leads to certain properties of the superposition process, properties which are in fact necessary for the word ‘superposition’ to be appropriate. When two or more states are superposed, the order in which they occur in the superposition process is unimportant, so the superposition process is symmetrical between the states that are superposed. Again, we see from equation (1) that (excluding the case when the coefficient $c_1$ or $c_2$ is zero) if the state R can be formed by superposition of the states A and B, then the state A can be formed by superposition of B and R, and B can be formed by superposition of A and R. The superposition relationship is symmetrical between all three states A, B, and R.

A state which results from the superposition of certain other states will be said to be dependent on those states. More generally, a state will be said to be dependent on any set of states, finite or infinite in number, if its corresponding ket vector is dependent on the corresponding ket vectors of the set of states. A set of states will be called independent if no one of them is dependent on the others.
To proceed with the mathematical formulation of the superposition principle we must introduce a further assumption, namely the assumption that by superposing a state with itself we cannot form any new state, but only the original state over again. If the original state corresponds to the ket vector $|A\rangle$, when it is superposed with itself the resulting state will correspond to
$$c_1|A\rangle + c_2|A\rangle = (c_1 + c_2)|A\rangle,$$
where $c_1$ and $c_2$ are numbers. Now we may have $c_1 + c_2 = 0$, in which case the result of the superposition process would be nothing at all, the two components having cancelled each other by an interference effect. Our new assumption requires that, apart from this special case, the resulting state must be the same as the original one, so that $(c_1 + c_2)|A\rangle$ must correspond to the same state that $|A\rangle$ does. Now $c_1 + c_2$ is an arbitrary complex number and hence we can conclude that if the ket vector corresponding to a state is multiplied by any complex number, not zero, the resulting ket vector will correspond to the same state. Thus a state is specified by the direction of a ket vector and any length one may assign to the ket vector is irrelevant. All the states of the dynamical system are in one-one correspondence with all the possible directions for a ket vector, no distinction being made between the directions of the ket vectors $|A\rangle$ and $-|A\rangle$.
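The rule that a state is specified only by the direction of its ket can be expressed as a test of proportionality. This is a sketch under assumptions not in the text: kets are finite tuples of complex components of equal length, the name `same_state` is hypothetical, and the absolute tolerance is an arbitrary choice.

```python
def same_state(ket1, ket2, tol=1e-12):
    """Return True if two non-zero kets differ only by a non-zero complex
    factor, i.e. if they have the same direction."""
    if all(abs(x) < tol for x in ket1) or all(abs(x) < tol for x in ket2):
        raise ValueError("the zero ket corresponds to no state at all")
    # Two vectors are proportional exactly when every 2x2 minor
    # x_i * y_j - x_j * y_i vanishes.
    n = len(ket1)
    return all(
        abs(ket1[i] * ket2[j] - ket1[j] * ket2[i]) < tol
        for i in range(n)
        for j in range(i + 1, n)
    )

A = (1 + 0j, 1j)
print(same_state(A, tuple((2 - 3j) * x for x in A)))  # multiplied by a complex number
print(same_state(A, tuple(-x for x in A)))            # the ket with reversed sign
print(same_state(A, (1 + 0j, 1 + 0j)))                # a different direction
```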

The assumption just made shows up very clearly the fundamental difference between the superposition of the quantum theory and any kind of classical superposition. In the case of a classical system for which a superposition principle holds, for instance a vibrating membrane, when one superposes a state with itself the result is a different state, with a different magnitude of the oscillations. There is no physical characteristic of a quantum state corresponding to the magnitude of the classical oscillations, as distinct from their quality, described by the ratios of the amplitudes at different points of the membrane. Again, while there exists a classical state with zero amplitude of oscillation everywhere, namely the state of rest, there does not exist any corresponding state for a quantum system, the zero ket vector corresponding to no state at all.
Given two states corresponding to the ket vectors $|A\rangle$ and $|B\rangle$, the general state formed by superposing them corresponds to a ket vector $|R\rangle$ which is determined by two complex numbers, namely the coefficients $c_1$ and $c_2$ of equation (1). If these two coefficients are multiplied by the same factor (itself a complex number), the ket vector $|R\rangle$ will get multiplied by this factor and the corresponding state will be unaltered. Thus only the ratio of the two coefficients is effective in determining the state R. Hence this state is determined by one complex number, or by two real parameters. Thus from two given states, a twofold infinity of states may be obtained by superposition.
This result is confirmed by the examples discussed in §§ 2 and 3. In the example of § 2 there are just two independent states of polarization for a photon, which may be taken to be the states of plane polarization parallel and perpendicular to some fixed direction, and from the superposition of these two a twofold infinity of states of polarization can be obtained, namely all the states of elliptic polarization, the general one of which requires two parameters to describe it. Again, in the example of § 3, from the superposition of two given states of motion for a photon a twofold infinity of states of motion may be obtained, the general one of which is described by two parameters, which may be taken to be the ratio of the amplitudes of the two wave functions that are added together and their phase relationship. This confirmation shows the need for allowing complex coefficients in equation (1). If these coefficients were restricted to be real, then, since only their ratio is of importance for determining the direction of the resultant ket vector $|R\rangle$ when $|A\rangle$ and $|B\rangle$ are given, there would be only a simple infinity of states obtainable from the superposition.
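The counting of parameters can be written out explicitly. The following is a sketch, with the symbols $\rho$ and $\delta$ introduced here purely for illustration and with $c_1 \neq 0$ assumed (the excluded case $c_1 = 0$ gives the state B itself):

```latex
c_1|A\rangle + c_2|B\rangle
  = c_1\Bigl(|A\rangle + \frac{c_2}{c_1}\,|B\rangle\Bigr),
\qquad
\frac{c_2}{c_1} = \rho\,e^{i\delta},
\quad \rho \ge 0,\ \delta\ \text{real}.
```

Since the overall factor $c_1$ does not affect the state, the state depends only on the two real parameters $\rho$ and $\delta$, which play the part of the amplitude ratio and the phase relationship mentioned above. With real coefficients only the single real ratio $c_2/c_1$ would remain.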

6. Bra and ket vectors
Whenever we have a set of vectors in any mathematical theory, we can always set up a second set of vectors, which mathematicians call the dual vectors. The procedure will be described for the case when the original vectors are our ket vectors.
Suppose we have a number $\phi$ which is a function of a ket vector $|A\rangle$, i.e. to each ket vector $|A\rangle$ there corresponds one number $\phi$, and suppose further that the function is a linear one, which means that the number corresponding to $|A\rangle + |A'\rangle$ is the sum of the numbers corresponding to $|A\rangle$ and to $|A'\rangle$, and the number corresponding to $c|A\rangle$ is $c$ times the number corresponding to $|A\rangle$, $c$ being any numerical factor. Then the number $\phi$ corresponding to any $|A\rangle$ may be looked upon as the scalar product of that $|A\rangle$ with some new vector, there being one of these new vectors for each linear function of the ket vectors $|A\rangle$. The justification for this way of looking at $\phi$ is that, as will be seen later (see equations (5) and (6)), the new vectors may be added together and may be multiplied by numbers to give other vectors of the same kind. The new vectors are, of course, defined only to the extent that their scalar products with the original ket vectors are given numbers, but this is sufficient for one to be able to build up a mathematical theory about them.
We shall call the new vectors bra vectors, or simply bras, and denote a general one of them by the symbol $\langle\,|$, the mirror image of the symbol for a ket vector. If we want to specify a particular one of them by a label, B say, we write it in the middle, thus $\langle B|$. The scalar product of a bra vector $\langle B|$ and a ket vector $|A\rangle$ will be written $\langle B|A\rangle$, i.e. as a juxtaposition of the symbols for the bra and ket vectors, that for the bra vector being on the left, and the two vertical lines being contracted to one for brevity.
One may look upon the symbols $\langle$ and $\rangle$ as a distinctive kind of brackets. A scalar product $\langle B|A\rangle$ now appears as a complete bracket expression and a bra vector $\langle B|$ or a ket vector $|A\rangle$ as an incomplete bracket expression. We have the rules that any complete bracket expression denotes a number and any incomplete bracket expression denotes a vector, of the bra or ket kind according to whether it contains the first or second part of the brackets.
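The condition that the scalar product of $\langle B|$ and $|A\rangle$ is a linear function of $|A\rangle$ may be expressed symbolically by
$$\langle B|\{|A\rangle + |A'\rangle\} = \langle B|A\rangle + \langle B|A'\rangle, \tag{2}$$
$$\langle B|\{c|A\rangle\} = c\langle B|A\rangle, \tag{3}$$
$c$ being any number.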
A bra vector is considered to be completely defined when its scalar product with every ket vector is given, so that if a bra vector has its scalar product with every ket vector vanishing, the bra vector itself must be considered as vanishing. In symbols,
$$\text{if}\quad \langle P|A\rangle = 0 \ \text{ for all } |A\rangle, \quad\text{then}\quad \langle P| = 0. \tag{4}$$

The sum of two bra vectors $\langle B|$ and $\langle B'|$ is defined by the condition that its scalar product with any ket vector $|A\rangle$ is the sum of the scalar products of $\langle B|$ and $\langle B'|$ with $|A\rangle$,
$$\{\langle B| + \langle B'|\}|A\rangle = \langle B|A\rangle + \langle B'|A\rangle, \tag{5}$$
and the product of a bra vector $\langle B|$ and a number $c$ is defined by the condition that its scalar product with any ket vector $|A\rangle$ is $c$ times the scalar product of $\langle B|$ with $|A\rangle$,
$$\{c\langle B|\}|A\rangle = c\langle B|A\rangle. \tag{6}$$
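The definition of bras purely through their scalar products with kets translates naturally into functions. The sketch below is an illustration, not the text's own construction: it assumes kets are finite tuples of complex components and builds each bra as a linear function from kets to numbers. The names `make_bra`, `add_bras` and `scale_bra` are hypothetical.

```python
def make_bra(coefficients):
    """Return a bra as a linear function of kets: it maps a ket (a tuple of
    complex components) to the number sum(coefficients[i] * ket[i])."""
    def bra(ket):
        return sum(b * k for b, k in zip(coefficients, ket))
    return bra

def add_bras(bra1, bra2):
    """Sum of two bras, defined only through its scalar product with any
    ket, as in equation (5)."""
    return lambda ket: bra1(ket) + bra2(ket)

def scale_bra(c, bra):
    """Product of a number c and a bra, defined only through its scalar
    product with any ket, as in equation (6)."""
    return lambda ket: c * bra(ket)

B = make_bra((1j, 2 + 0j))
B_prime = make_bra((1 + 0j, -1j))
A = (1 + 1j, 3 + 0j)

print(add_bras(B, B_prime)(A), B(A) + B_prime(A))
print(scale_bra(2j, B)(A), 2j * B(A))
```

Note that `add_bras` and `scale_bra` never look inside a bra; they use only its values on kets, which mirrors the remark that bras are defined only to the extent that their scalar products with kets are given.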

Equations (2) and (5) show that products of bra and ket vectors satisfy the distributive axiom of multiplication, and equations (3) and (6) show that multiplication by numerical factors satisfies the usual algebraic axioms.
The bra vectors, as they have been here introduced, are quite a different kind of vector from the kets, and so far there is no connexion between them except for the existence of a scalar product of a bra and a ket. We now make the assumption that there is a one-one correspondence between the bras and the kets, such that the bra corresponding to $|A\rangle + |A'\rangle$ is the sum of the bras corresponding to $|A\rangle$ and to $|A'\rangle$, and the bra corresponding to $c|A\rangle$ is $\bar{c}$ times the bra corresponding to $|A\rangle$, $\bar{c}$ being the conjugate complex number to $c$. We shall use the same label to specify a ket and the corresponding bra. Thus the bra corresponding to $|A\rangle$ will be written $\langle A|$.
The relationship between a ket vector and the corresponding bra makes it reasonable to call one of them the conjugate imaginary of the other. Our bra and ket vectors are complex quantities, since they can be multiplied by complex numbers and are then of the same nature as before, but they are complex quantities of a special kind which cannot be split up into real and pure imaginary parts. The usual method of getting the real part of a complex quantity, by taking half the sum of the quantity itself and its conjugate, cannot be applied since a bra and a ket vector are of different natures and cannot be added together. To call attention to this distinction, we shall use the words ‘conjugate complex’ to refer to numbers and other complex quantities which can be split up into real and pure imaginary parts, and the words ‘conjugate imaginary’ for bra and ket vectors, which cannot. With the former kind of quantity, we shall use the notation of putting a bar over one of them to get the conjugate complex one.

On account of the one-one correspondence between bra vectors and ket vectors, any state of our dynamical system at a particular time may be specified by the direction of a bra vector just as well as by the direction of a ket vector. In fact the whole theory will be symmetrical in its essentials between bras and kets.
Given any two ket vectors $|A\rangle$ and $|B\rangle$, we can construct from them a number $\langle B|A\rangle$ by taking the scalar product of the first with the conjugate imaginary of the second. This number depends linearly on $|A\rangle$ and antilinearly on $|B\rangle$, the antilinear dependence meaning that the number formed from $|B\rangle + |B'\rangle$ is the sum of the numbers formed from $|B\rangle$ and from $|B'\rangle$, and the number formed from $c|B\rangle$ is $\bar{c}$ times the number formed from $|B\rangle$. There is a second way in which we can construct a number which depends linearly on $|A\rangle$ and antilinearly on $|B\rangle$, namely by forming the scalar product of $|B\rangle$ with the conjugate imaginary of $|A\rangle$ and taking the conjugate complex of this scalar product. We assume that these two numbers are always equal, i.e.
$$\langle B|A\rangle = \overline{\langle A|B\rangle}. \tag{7}$$

Putting IB) = IA> here, we find that the number @IA> must be real. We make the further assumption

except when IA) = 0.

<44 > 0,

(8)
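As an illustrative aside, assumptions (7) and (8) can be checked numerically. The sketch below assumes, purely for illustration, that a ket is represented by a finite list of complex components and that the scalar product is the sum of conjugated components of the second ket times components of the first. This representation and the helper name `bracket` are assumptions of the sketch, not part of the abstract scheme.

```python
def bracket(ket_b, ket_a):
    """Return <B|A>: antilinear in the first argument, linear in the second."""
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


ket_a = [1 + 2j, 3 - 1j]
ket_b = [2 - 1j, 1j]
c = 2 + 3j

ba = bracket(ket_b, ket_a)          # <B|A>
ab = bracket(ket_a, ket_b)          # <A|B>
aa = bracket(ket_a, ket_a)          # <A|A>, expected real and positive

# Antilinear dependence on |B>: replacing |B> by c|B> multiplies <B|A> by conj(c).
scaled = bracket([c * x for x in ket_b], ket_a)

print(ba, ab.conjugate(), aa, scaled)
```

With these particular components the products involve only small integers, so the equalities hold exactly rather than merely to rounding accuracy.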

In ordinary space, from any two vectors one can construct a number, their scalar product, which is a real number and is symmetrical between them. In the space of bra vectors or the space of ket vectors, from any two vectors one can again construct a number, the scalar product of one with the conjugate imaginary of the other, but this number is complex and goes over into the conjugate complex number when the two vectors are interchanged. There is thus a kind of perpendicularity in these spaces, which is a generalization of the perpendicularity in ordinary space. We shall call a bra and a ket vector orthogonal if their scalar product is zero, and two bras or two kets will be called orthogonal if the scalar product of one with the conjugate imaginary of the other is zero. Further, we shall say that two states of our dynamical system are orthogonal if the vectors corresponding to these states are orthogonal.
The length of a bra vector $\langle A|$ or of the conjugate imaginary ket vector $|A\rangle$ is defined as the square root of the positive number $\langle A|A\rangle$. When we are given a state and wish to set up a bra or ket vector to correspond to it, only the direction of the vector is given and the vector itself is undetermined to the extent of an arbitrary numerical factor. It is often convenient to choose this numerical factor so that the vector is of length unity. This procedure is called normalization and the vector so chosen is said to be normalized. The vector is not completely determined even then, since one can still multiply it by any number of modulus unity, i.e. any number $e^{i\gamma}$ where $\gamma$ is real, without changing its length. We shall call such a number a phase factor.
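The normalization procedure and the residual freedom of a phase factor can be sketched as follows, again under the illustrative assumption that a ket is a finite list of complex components. The names `length` and `normalize` and the value 0.7 chosen for the real number in the phase factor are hypothetical.

```python
import cmath
import math


def length(ket):
    """Square root of <A|A> for a ket given by its components."""
    return math.sqrt(sum(abs(x) ** 2 for x in ket))


def normalize(ket):
    """Rescale a non-zero ket to length unity."""
    n = length(ket)
    if n == 0:
        raise ValueError("the zero ket has no direction and cannot be normalized")
    return [x / n for x in ket]


ket = [3 + 0j, 4j]
unit = normalize(ket)

# A phase factor exp(i*gamma) with real gamma has modulus unity,
# so multiplying by it leaves the length unchanged.
phase = cmath.exp(1j * 0.7)
rotated = [phase * x for x in unit]

print(length(ket), length(unit), length(rotated))
```

The normalized ket and the ket multiplied by the phase factor have the same direction in the sense of the text and so would correspond to the same state.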
The foregoing assumptions give the complete scheme of relations between the states of a dynamical system at a particular time. The relations appear in mathematical form, but they imply physical conditions, which will lead to results expressible in terms of observations when the theory is developed further. For instance, if two states are orthogonal, it means at present simply a certain equation in our formalism, but this equation implies a definite physical relationship between the states, which further developments of the theory will enable us to interpret in terms of observational results (see the bottom of p. 35).

II
DYNAMICAL VARIABLES AND OBSERVABLES

7. Linear operators

IN the preceding section we considered a number which is a linear function of a ket vector, and this led to the concept of a bra vector. We shall now consider a ket vector which is a linear function of a ket vector, and this will lead to the concept of a linear operator.

Suppose we have a ket $|F\rangle$ which is a function of a ket $|A\rangle$, i.e. to each ket $|A\rangle$ there corresponds one ket $|F\rangle$, and suppose further that the function is a linear one, which means that the $|F\rangle$ corresponding to $|A\rangle + |A'\rangle$ is the sum of the $|F\rangle$'s corresponding to $|A\rangle$ and to $|A'\rangle$, and the $|F\rangle$ corresponding to $c|A\rangle$ is $c$ times the $|F\rangle$ corresponding to $|A\rangle$, $c$ being any numerical factor. Under these conditions, we may look upon the passage from $|A\rangle$ to $|F\rangle$ as the application of a linear operator to $|A\rangle$. Introducing the symbol $\alpha$ for the linear operator, we may write

$$|F\rangle = \alpha|A\rangle,$$

in which the result of $\alpha$ operating on $|A\rangle$ is written like a product of $\alpha$ with $|A\rangle$. We make the rule that in such products the ket vector must always be put on the right of the linear operator. The above conditions of linearity may now be expressed by the equations

$$\alpha\{|A\rangle + |A'\rangle\} = \alpha|A\rangle + \alpha|A'\rangle, \qquad \alpha\{c|A\rangle\} = c\,\alpha|A\rangle. \tag{1}$$

A linear operator is considered to be completely defined when the result of its application to every ket vector is given. Thus a linear operator is to be considered zero if the result of its application to every ket vanishes, and two linear operators are to be considered equal if they produce the same result when applied to every ket.

Linear operators can be added together, the sum of two linear operators being defined to be that linear operator which, operating on any ket, produces the sum of what the two linear operators separately would produce. Thus $\alpha + \beta$ is defined by

$$\{\alpha + \beta\}|A\rangle = \alpha|A\rangle + \beta|A\rangle \tag{2}$$

for any $|A\rangle$. Equation (2) and the first of equations (1) show that products of linear operators with ket vectors satisfy the distributive axiom of multiplication.


Linear operators can also be multiplied together, the product of two linear operators being defined as that linear operator, the application of which to any ket produces the same result as the application of the two linear operators successively. Thus the product $\alpha\beta$ is defined as the linear operator which, operating on any ket $|A\rangle$, changes it into that ket which one would get by operating first on $|A\rangle$ with $\beta$, and then on the result of the first operation with $\alpha$. In symbols

$$\{\alpha\beta\}|A\rangle = \alpha\{\beta|A\rangle\}.$$

This definition appears as the associative axiom of multiplication for the triple product of $\alpha$, $\beta$, and $|A\rangle$, and allows us to write this triple product as $\alpha\beta|A\rangle$ without brackets. However, this triple product is in general not the same as what we should get if we operated on $|A\rangle$ first with $\alpha$ and then with $\beta$, i.e. in general $\alpha\beta|A\rangle$ differs from $\beta\alpha|A\rangle$, so that in general $\alpha\beta$ must differ from $\beta\alpha$. The commutative axiom of multiplication does not hold for linear operators. It may happen as a special case that two linear operators $\xi$ and $\eta$ are such that $\xi\eta$ and $\eta\xi$ are equal. In this case we say that $\xi$ commutes with $\eta$, or that $\xi$ and $\eta$ commute.

By repeated applications of the above processes of adding and multiplying linear operators, one can form sums and products of more than two of them, and one can proceed to build up an algebra with them. In this algebra the commutative axiom of multiplication does not hold, and also the product of two linear operators may vanish without either factor vanishing. But all the other axioms of ordinary algebra, including the associative and distributive axioms of multiplication, are valid, as may easily be verified.
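The two departures from ordinary algebra noted above can be seen in a very small concrete case. The sketch assumes, for illustration only, that kets are lists of two components and linear operators are 2 × 2 arrays acting by the usual row-by-column rule. The particular operators `alpha` and `beta` and the helper names are hypothetical choices.

```python
def matmul(x, y):
    """Product xy of two square arrays: apply y first, then x."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(op, ket):
    """Result of the linear operator op operating on ket."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


alpha = [[0, 1], [0, 0]]
beta = [[0, 0], [1, 0]]
ket = [2, 5]

ab = matmul(alpha, beta)    # alpha beta
ba = matmul(beta, alpha)    # beta alpha, a different operator
aa = matmul(alpha, alpha)   # vanishes although alpha itself does not

# Associative axiom: {alpha beta}|A> equals alpha{beta|A>}.
left = apply(ab, ket)
right = apply(alpha, apply(beta, ket))

print(ab, ba, aa, left, right)
```

Here `alpha` does not commute with `beta`, and the product of `alpha` with itself vanishes without either factor vanishing, while the associative axiom still holds.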
If we take a number $k$ and multiply it into ket vectors, it appears as a linear operator operating on ket vectors, the conditions (1) being fulfilled with $k$ substituted for $\alpha$. A number is thus a special case of a linear operator. It has the property that it commutes with all linear operators and this property distinguishes it from a general linear operator.
So far we have considered linear operators operating only on ket vectors. We can give a meaning to their operating also on bra vectors, in the following way. Take the scalar product of any bra $\langle B|$ with the ket $\alpha|A\rangle$. This scalar product is a number which depends linearly on $|A\rangle$ and therefore, from the definition of bras, it may be considered as the scalar product of $|A\rangle$ with some bra. The bra thus defined depends linearly on $\langle B|$, so we may look upon it as the result of some linear operator applied to $\langle B|$. This linear operator is uniquely determined by the original linear operator $\alpha$ and may reasonably be called the same linear operator operating on a bra. In this way our linear operators are made capable of operating on bra vectors.

A suitable notation to use for the resulting bra when $\alpha$ operates on the bra $\langle B|$ is $\langle B|\alpha$, as in this notation the equation which defines $\langle B|\alpha$ is

$$\{\langle B|\alpha\}|A\rangle = \langle B|\{\alpha|A\rangle\} \tag{3}$$

for any $|A\rangle$, which simply expresses the associative axiom of multiplication for the triple product of $\langle B|$, $\alpha$, and $|A\rangle$. We therefore make the general rule that in a product of a bra and a linear operator, the bra must always be put on the left. We can now write the triple product of $\langle B|$, $\alpha$, and $|A\rangle$ simply as $\langle B|\alpha|A\rangle$ without brackets. It may easily be verified that the distributive axiom of multiplication holds for products of bras and linear operators just as well as for products of linear operators and kets.
There is one further kind of product which has a meaning in our scheme, namely the product of a ket vector and a bra vector with the ket on the left, such as $|A\rangle\langle B|$. To examine this product, let us multiply it into an arbitrary ket $|P\rangle$, putting the ket on the right, and assume the associative axiom of multiplication. The product is then $|A\rangle\langle B|P\rangle$, which is another ket, namely $|A\rangle$ multiplied by the number $\langle B|P\rangle$, and this ket depends linearly on the ket $|P\rangle$. Thus $|A\rangle\langle B|$ appears as a linear operator that can operate on kets. It can also operate on bras, its product with a bra $\langle Q|$ on the left being $\langle Q|A\rangle\langle B|$, which is the number $\langle Q|A\rangle$ times the bra $\langle B|$. The product $|A\rangle\langle B|$ is to be sharply distinguished from the product $\langle B|A\rangle$ of the same factors in the reverse order, the latter product being, of course, a number.

We now have a complete algebraic scheme involving three kinds of quantities, bra vectors, ket vectors, and linear operators. They can be multiplied together in the various ways discussed above, and the associative and distributive axioms of multiplication always hold, but the commutative axiom of multiplication does not hold. In this general scheme we still have the rules of notation of the preceding section, that any complete bracket expression, containing $\langle$ on the left and $\rangle$ on the right, denotes a number, while any incomplete bracket expression, containing only $\langle$ or $\rangle$, denotes a vector.
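The distinction between the operator $|A\rangle\langle B|$ and the number $\langle B|A\rangle$ may be illustrated with a short sketch. It assumes, as an illustration only, that kets are lists of complex components, so that $|A\rangle\langle B|$ becomes an array whose entries are a component of $|A\rangle$ times a conjugated component of $|B\rangle$. The helper names are hypothetical.

```python
def bracket(ket_b, ket_a):
    """The number <B|A>."""
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


def ket_bra(ket_a, ket_b):
    """Array representing |A><B|: entry (i, j) is A_i times conj(B_j)."""
    return [[a * b.conjugate() for b in ket_b] for a in ket_a]


def apply(op, ket):
    """Result of the linear operator op operating on ket."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


ket_a = [1 + 1j, 2 + 0j]
ket_b = [1j, 1 + 0j]
ket_p = [3 + 0j, 1 - 1j]

operator = ket_bra(ket_a, ket_b)               # |A><B|, a linear operator
lhs = apply(operator, ket_p)                   # {|A><B|}|P>
number = bracket(ket_b, ket_p)                 # <B|P>, a number
rhs = [x * number for x in ket_a]              # |A> multiplied by <B|P>

print(lhs, rhs, bracket(ket_b, ket_a))
```

The two ways of evaluating $|A\rangle\langle B|P\rangle$ agree, as the associative axiom requires, whereas the product in the reverse order is a single complex number.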


With regard to the physical significance of the scheme, we have already assumed that the bra vectors and ket vectors, or rather the directions of these vectors, correspond to the states of a dynamical system at a particular time. We now make the further assumption that the linear operators correspond to the dynamical variables at that time. By dynamical variables are meant quantities such as the coordinates and the components of velocity, momentum and angular momentum of particles, and functions of these quantities, in fact the variables in terms of which classical mechanics is built up. The new assumption requires that these quantities shall occur also in quantum mechanics, but with the striking difference that they are now subject to an algebra in which the commutative axiom of multiplication does not hold.
This different algebra for the dynamical variables is one of the most important ways in which quantum mechanics differs from classical mechanics. We shall see later on that, in spite of this fundamental difference, the dynamical variables of quantum mechanics still have many properties in common with their classical counterparts and it will be possible to build up a theory of them closely analogous to the classical theory and forming a beautiful generalization of it.
It is convenient to use the same letter to denote a dynamical variable and the corresponding linear operator. In fact, we may consider a dynamical variable and the corresponding linear operator to be both the same thing, without getting into confusion.

8. Conjugate relations

Our linear operators are complex quantities, since one can multiply them by complex numbers and get other quantities of the same nature. Hence they must correspond in general to complex dynamical variables, i.e. to complex functions of the coordinates, velocities, etc. We need some further development of the theory to see what kind of linear operator corresponds to a real dynamical variable.
Consider the ket which is the conjugate imaginary of $\langle P|\alpha$. This ket depends antilinearly on $\langle P|$ and thus depends linearly on $|P\rangle$. It may therefore be considered as the result of some linear operator operating on $|P\rangle$. This linear operator is called the adjoint of $\alpha$ and we shall denote it by $\bar{\alpha}$. With this notation, the conjugate imaginary of $\langle P|\alpha$ is $\bar{\alpha}|P\rangle$.


In formula (7) of Chapter I put $\langle P|\alpha$ for $\langle A|$ and its conjugate imaginary $\bar{\alpha}|P\rangle$ for $|A\rangle$. The result is

$$\langle B|\bar{\alpha}|P\rangle = \overline{\langle P|\alpha|B\rangle}. \tag{4}$$

This is a general formula holding for any ket vectors $|B\rangle$, $|P\rangle$ and any linear operator $\alpha$, and it expresses one of the most frequently used properties of the adjoint.

Putting $\bar{\alpha}$ for $\alpha$ in (4), we get

$$\langle B|\bar{\bar{\alpha}}|P\rangle = \overline{\langle P|\bar{\alpha}|B\rangle} = \langle B|\alpha|P\rangle,$$

by using (4) again with $|P\rangle$ and $|B\rangle$ interchanged. This holds for any ket $|P\rangle$, so we can infer from (4) of Chapter I,

$$\langle B|\bar{\bar{\alpha}} = \langle B|\alpha,$$

and since this holds for any bra vector $\langle B|$, we can infer

$$\bar{\bar{\alpha}} = \alpha.$$

Thus the adjoint of the adjoint of a linear operator is the original linear operator. This property of the adjoint makes it like the conjugate complex of a number, and it is easily verified that in the special case when the linear operator is a number, the adjoint linear operator is the conjugate complex number. Thus it is reasonable to assume that the adjoint of a linear operator corresponds to the conjugate complex of a dynamical variable. With this physical significance for the adjoint of a linear operator, we may call the adjoint alternatively the conjugate complex linear operator, which conforms with our notation $\bar{\alpha}$.
A linear operator may equal its adjoint, and is then called self-adjoint. It corresponds to a real dynamical variable, so it may be called alternatively a real linear operator. Any linear operator may be split up into a real part and a pure imaginary part. For this reason the words ‘conjugate complex’ are applicable to linear operators and not the words ‘conjugate imaginary’.

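The conjugate complex of the sum of two linear operators is obviously the sum of their conjugate complexes. To get the conjugate complex of the product of two linear operators $\alpha$ and $\beta$, we apply formula (7) of Chapter I with

$$\langle A| = \langle P|\alpha, \qquad \langle B| = \langle Q|\bar{\beta},$$

so that

$$|A\rangle = \bar{\alpha}|P\rangle, \qquad |B\rangle = \beta|Q\rangle.$$

The result is

$$\langle Q|\bar{\beta}\bar{\alpha}|P\rangle = \overline{\langle P|\alpha\beta|Q\rangle} = \langle Q|\overline{\alpha\beta}|P\rangle$$

from (4). Since this holds for any $|P\rangle$ and $\langle Q|$, we can infer that

$$\bar{\beta}\bar{\alpha} = \overline{\alpha\beta}. \tag{5}$$

Thus the conjugate complex of the product of two linear operators equals the product of the conjugate complexes of the factors in the reverse order.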
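Rule (5) can be checked on a small example. The sketch assumes, as an illustration, that linear operators are 2 × 2 complex arrays and that the adjoint is represented by the conjugated transpose, which is the form property (4) takes when kets are lists of components. The arrays `alpha` and `beta` and the helper names are hypothetical.

```python
def matmul(x, y):
    """Product xy of two square arrays."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(op):
    """Conjugated transpose: entry (i, j) is the conjugate of entry (j, i)."""
    n = len(op)
    return [[op[j][i].conjugate() for j in range(n)] for i in range(n)]


alpha = [[1 + 2j, 3 + 0j], [0j, 1j]]
beta = [[2 + 0j, 1 - 1j], [1j, 4 + 0j]]

adjoint_of_product = adjoint(matmul(alpha, beta))
reversed_order = matmul(adjoint(beta), adjoint(alpha))   # right-hand side of (5)
same_order = matmul(adjoint(alpha), adjoint(beta))       # factors not reversed

print(adjoint_of_product == reversed_order)
print(adjoint_of_product == same_order)
print(adjoint(adjoint(alpha)) == alpha)
```

For these arrays the adjoint of the product agrees with the product of the adjoints taken in the reverse order and not with the product taken in the original order.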

As simple examples of this result, it should be noted that, if $\xi$ and $\eta$ are real, in general $\xi\eta$ is not real. This is an important difference from classical mechanics. However, $\xi\eta + \eta\xi$ is real, and so is $i(\xi\eta - \eta\xi)$. Only when $\xi$ and $\eta$ commute is $\xi\eta$ itself also real. Further, if $\xi$ is real, then so is $\xi^2$ and, more generally, $\xi^n$ with $n$ any positive integer.

We may get the conjugate complex of the product of three linear operators by successive applications of the rule (5) for the conjugate complex of the product of two of them. We have

$$\overline{\alpha\beta\gamma} = \overline{\alpha(\beta\gamma)} = \overline{\beta\gamma}\,\bar{\alpha} = \bar{\gamma}\bar{\beta}\bar{\alpha}, \tag{6}$$

so the conjugate complex of the product of three linear operators equals the product of the conjugate complexes of the factors in the reverse order. The rule may easily be extended to the product of any number of linear operators.

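In the preceding section we saw that the product $|A\rangle\langle B|$ is a linear operator. We may get its conjugate complex by referring directly to the definition of the adjoint. Multiplying $|A\rangle\langle B|$ into a general bra $\langle P|$ we get $\langle P|A\rangle\langle B|$, whose conjugate imaginary ket is

$$\overline{\langle P|A\rangle}\,|B\rangle = \langle A|P\rangle|B\rangle = |B\rangle\langle A|P\rangle.$$

Hence

$$\overline{|A\rangle\langle B|} = |B\rangle\langle A|. \tag{7}$$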

We now have several rules concerning conjugate complexes and conjugate imaginaries of products, namely equation (7) of Chapter I, equations (4), (5), (6), (7) of this chapter, and the rule that the conjugate imaginary of $\langle P|\alpha$ is $\bar{\alpha}|P\rangle$. These rules can all be summed up in a single comprehensive rule: the conjugate complex or conjugate imaginary of any product of bra vectors, ket vectors, and linear operators is obtained by taking the conjugate complex or conjugate imaginary of each factor and reversing the order of all the factors. The rule is easily verified to hold quite generally, also for the cases not explicitly given above.

THEOREM. If $\xi$ is a real linear operator and

$$\xi^m|P\rangle = 0 \tag{8}$$

for a particular ket $|P\rangle$, $m$ being a positive integer, then

$$\xi|P\rangle = 0.$$

To prove the theorem, take first the case when $m = 2$. Equation (8) then gives

$$\langle P|\xi^2|P\rangle = 0,$$

showing that the ket $\xi|P\rangle$ multiplied by the conjugate imaginary bra $\langle P|\xi$ is zero. From the assumption (8) of Chapter I with $\xi|P\rangle$ for $|A\rangle$, we see that $\xi|P\rangle$ must be zero. Thus the theorem is proved for $m = 2$.

Now take $m > 2$ and put

$$\xi^{m-2}|P\rangle = |Q\rangle.$$

Equation (8) now gives

$$\xi^2|Q\rangle = 0.$$

Applying the theorem for $m = 2$, we get

$$\xi|Q\rangle = 0$$

or

$$\xi^{m-1}|P\rangle = 0. \tag{9}$$

By repeating the process by which equation (9) is obtained from (8), we obtain successively

$$\xi^{m-2}|P\rangle = 0, \quad \xi^{m-3}|P\rangle = 0, \quad \ldots, \quad \xi^2|P\rangle = 0, \quad \xi|P\rangle = 0,$$

and so the theorem is proved generally.
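The role played by the reality of $\xi$ in this proof can be illustrated numerically. The sketch assumes kets are lists of two complex components and operators are 2 × 2 arrays, with the adjoint taken as the conjugated transpose. The self-adjoint array `xi`, the non-self-adjoint array `nu`, and the helper names are hypothetical choices made for the illustration.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(op, ket):
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


def bracket(ket_b, ket_a):
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


def adjoint(op):
    n = len(op)
    return [[op[j][i].conjugate() for j in range(n)] for i in range(n)]


xi = [[1 + 0j, 1j], [-1j, 1 + 0j]]    # equal to its adjoint, so real
nu = [[0j, 1 + 0j], [0j, 0j]]         # not equal to its adjoint

# Step used for m = 2: <P|xi^2|P> is the squared length of xi|P>.
ket_p = [2 + 0j, 1 + 0j]
xi_p = apply(xi, ket_p)
lhs = bracket(ket_p, apply(matmul(xi, xi), ket_p))
rhs = bracket(xi_p, xi_p)

# Without reality the conclusion can fail: nu^2|Q> = 0 while nu|Q> is not zero.
ket_q = [0j, 1 + 0j]
nu2_q = apply(matmul(nu, nu), ket_q)
nu_q = apply(nu, ket_q)

print(lhs, rhs, nu2_q, nu_q)
```

The second half of the sketch is not a case of the theorem but a counter-example showing why the hypothesis that the operator is real cannot be dropped.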

9. Eigenvalues and eigenvectors

We must make a further development of the theory of linear operators, consisting in studying the equation

$$\alpha|P\rangle = a|P\rangle, \tag{10}$$

where $\alpha$ is a linear operator and $a$ is a number. This equation usually presents itself in the form that $\alpha$ is a known linear operator and the number $a$ and the ket $|P\rangle$ are unknowns, which we have to try to choose so as to satisfy (10), ignoring the trivial solution $|P\rangle = 0$.

Equation (10) means that the linear operator $\alpha$ applied to the ket $|P\rangle$ just multiplies this ket by a numerical factor without changing its direction, or else multiplies it by the factor zero, so that it ceases to have a direction. This same $\alpha$ applied to other kets will, of course, in general change both their lengths and their directions. It should be noticed that only the direction of $|P\rangle$ is of importance in equation (10). If one multiplies $|P\rangle$ by any number not zero, it will not affect the question of whether (10) is satisfied or not.

Together with equation (10), we should consider also the conjugate imaginary form of equation

$$\langle Q|\alpha = b\langle Q|, \tag{11}$$

where $b$ is a number. Here the unknowns are the number $b$ and the non-zero bra $\langle Q|$. Equations (10) and (11) are of such fundamental importance in the theory that it is desirable to have some special words to describe the relationships between the quantities involved. If (10) is satisfied, we shall call $a$ an eigenvalue† of the linear operator $\alpha$, or of the corresponding dynamical variable, and we shall call $|P\rangle$ an eigenket of the linear operator or dynamical variable. Further, we shall say that the eigenket $|P\rangle$ belongs to the eigenvalue $a$. Similarly, if (11) is satisfied, we shall call $b$ an eigenvalue of $\alpha$ and $\langle Q|$ an eigenbra belonging to this eigenvalue. The words eigenvalue, eigenket, eigenbra have a meaning, of course, only with reference to a linear operator or dynamical variable.
Using this terminology, we can assert that, if an eigenket of $\alpha$ is multiplied by any number not zero, the resulting ket is also an eigenket and belongs to the same eigenvalue as the original one. It is possible to have two or more independent eigenkets of a linear operator belonging to the same eigenvalue of that linear operator, e.g. equation (10) may have several solutions, $|P1\rangle$, $|P2\rangle$, $|P3\rangle$, ... say, all holding for the same value of $a$, with the various eigenkets $|P1\rangle$, $|P2\rangle$, $|P3\rangle$, ... independent. In this case it is evident that any linear combination of the eigenkets is another eigenket belonging to the same eigenvalue of the linear operator, e.g.

$$c_1|P1\rangle + c_2|P2\rangle + c_3|P3\rangle + \ldots$$

is another solution of (10), where $c_1$, $c_2$, $c_3$, ... are any numbers.
In the special case when the linear operator $\alpha$ of equations (10) and (11) is a number, $k$ say, it is obvious that any ket $|P\rangle$ and bra $\langle Q|$ will satisfy these equations provided $a$ and $b$ equal $k$. Thus a number considered as a linear operator has just one eigenvalue, and any ket is an eigenket and any bra is an eigenbra, belonging to this eigenvalue.
The theory of eigenvalues and eigenvectors of a linear operator $\alpha$ which is not real is not of much use for quantum mechanics. We shall therefore confine ourselves to real linear operators for the further development of the theory. Putting for $\alpha$ the real linear operator $\xi$, we have instead of equations (10) and (11)

$$\xi|P\rangle = a|P\rangle, \tag{12}$$

$$\langle Q|\xi = b\langle Q|. \tag{13}$$

† The word ‘proper’ is sometimes used instead of ‘eigen’, but this is not satisfactory as the words ‘proper’ and ‘improper’ are often used with other meanings. For example, in §§ 15 and 46 the words ‘improper function’ and ‘proper-energy’ are used.


Three important results can now be readily deduced.

(i) The eigenvalues are all real numbers. To prove that $a$ satisfying (12) is real, we multiply (12) by the bra $\langle P|$ on the left, obtaining

$$\langle P|\xi|P\rangle = a\langle P|P\rangle.$$

Now from equation (4) with $\langle B|$ replaced by $\langle P|$ and $\alpha$ replaced by the real linear operator $\xi$, we see that the number $\langle P|\xi|P\rangle$ must be real, and from (8) of § 6, $\langle P|P\rangle$ must be real and not zero. Hence $a$ is real. Similarly, by multiplying (13) by $|Q\rangle$ on the right, we can prove that $b$ is real.

Suppose we have a solution of (12) and we form the conjugate imaginary equation, which will read

$$\langle P|\xi = a\langle P|$$

in view of the reality of $\xi$ and $a$. This conjugate imaginary equation now provides a solution of (13), with $\langle Q| = \langle P|$ and $b = a$. Thus we can infer

(ii) The eigenvalues associated with eigenkets are the same as the eigenvalues associated with eigenbras.

(iii) The conjugate imaginary of any eigenket is an eigenbra belonging to the same eigenvalue, and conversely. This last result makes it reasonable to call the state corresponding to any eigenket or to the conjugate imaginary eigenbra an eigenstate of the real dynamical variable $\xi$.
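Results (i) and (iii) can be followed through on a small example that mirrors the steps of the proof. The sketch assumes kets are lists of two complex components, bras are lists of the conjugated components, and the real linear operator is a 2 × 2 array equal to its conjugated transpose. The array `xi`, the eigenket `ket_p`, and the helper names are hypothetical.

```python
def bracket(ket_b, ket_a):
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


def apply(op, ket):
    """op operating on a ket placed on its right."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


def bra_times_op(bra, op):
    """op operating on a bra placed on its left."""
    n = len(bra)
    return [sum(bra[k] * op[k][j] for k in range(n)) for j in range(n)]


xi = [[2 + 0j, 1j], [-1j, 2 + 0j]]     # equal to its adjoint
ket_p = [1 + 0j, -1j]                  # an eigenket of xi

# (i): a = <P|xi|P> / <P|P>, a ratio of two real numbers.
a = bracket(ket_p, apply(xi, ket_p)) / bracket(ket_p, ket_p)

# (iii): the conjugate imaginary bra satisfies <P|xi = a<P|.
bra_p = [x.conjugate() for x in ket_p]
left = bra_times_op(bra_p, xi)
right = [a * x for x in bra_p]

print(a, left, right)
```

The value of `a` is obtained exactly as in the proof of (i), from the two numbers $\langle P|\xi|P\rangle$ and $\langle P|P\rangle$, rather than from a separate eigenvalue routine.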
Eigenvalues and eigenvectors of various real dynamical variables are used very extensively in quantum mechanics, so it is desirable to have some systematic notation for labelling them. The following is suitable for most purposes. If $\xi$ is a real dynamical variable, we call its eigenvalues $\xi'$, $\xi''$, $\xi^r$, etc. Thus we have a letter by itself denoting a real dynamical variable or a real linear operator, and the same letter with primes or an index attached denoting a number, namely an eigenvalue of what the letter by itself denotes. An eigenvector may now be labelled by the eigenvalue to which it belongs. Thus $|\xi'\rangle$ denotes an eigenket belonging to the eigenvalue $\xi'$ of the dynamical variable $\xi$. If in a piece of work we deal with more than one eigenket belonging to the same eigenvalue of a dynamical variable, we may distinguish them one from another by means of a further label, or possibly of more than one further label. Thus, if we are dealing with two eigenkets belonging to the same eigenvalue of $\xi$, we may call them $|\xi'1\rangle$ and $|\xi'2\rangle$.


THEOREM. Two eigenvectors of a real dynamical variable belonging to different eigenvalues are orthogonal.

To prove the theorem, let $|\xi'\rangle$ and $|\xi''\rangle$ be two eigenkets of the real dynamical variable $\xi$, belonging to the eigenvalues $\xi'$ and $\xi''$ respectively. Then we have the equations

$$\xi|\xi'\rangle = \xi'|\xi'\rangle, \tag{14}$$

$$\xi|\xi''\rangle = \xi''|\xi''\rangle. \tag{15}$$

Taking the conjugate imaginary of (14) we get

$$\langle\xi'|\xi = \xi'\langle\xi'|.$$

Multiplying this by $|\xi''\rangle$ on the right gives

$$\langle\xi'|\xi|\xi''\rangle = \xi'\langle\xi'|\xi''\rangle$$

and multiplying (15) by $\langle\xi'|$ on the left gives

$$\langle\xi'|\xi|\xi''\rangle = \xi''\langle\xi'|\xi''\rangle.$$

Hence, subtracting,

$$(\xi' - \xi'')\langle\xi'|\xi''\rangle = 0, \tag{16}$$

showing that, if $\xi' \neq \xi''$, $\langle\xi'|\xi''\rangle = 0$ and the two eigenvectors $|\xi'\rangle$ and $|\xi''\rangle$ are orthogonal. This theorem will be referred to as the orthogonality theorem.

We have been discussing properties of the eigenvalues and eigenvectors of a real linear operator, but have not yet considered the question of whether, for a given real linear operator, any eigenvalues and eigenvectors exist, and if so, how to find them. This question is in general very difficult to answer. There is one useful special case, however, which is quite tractable, namely when the real linear operator, $\xi$ say, satisfies an algebraic equation

$$\phi(\xi) \equiv \xi^n + a_1\xi^{n-1} + a_2\xi^{n-2} + \ldots + a_n = 0, \tag{17}$$

the coefficients $a$ being numbers. This equation means, of course, that the linear operator $\phi(\xi)$ produces the result zero when applied to any ket vector or to any bra vector.

Let (17) be the simplest algebraic equation that $\xi$ satisfies. Then it will be shown that

($\alpha$) The number of eigenvalues of $\xi$ is $n$.

($\beta$) There are so many eigenkets of $\xi$ that any ket whatever can be expressed as a sum of such eigenkets.

The algebraic form $\phi(\xi)$ can be factorized into $n$ linear factors, the result being

$$\phi(\xi) \equiv (\xi - c_1)(\xi - c_2)(\xi - c_3)\ldots(\xi - c_n) \tag{18}$$

say, the $c$'s being numbers, not assumed to be all different. This factorization can be performed with $\xi$ a linear operator just as well as with $\xi$ an ordinary algebraic variable, since there is nothing occurring in (18) that does not commute with $\xi$. Let the quotient when $\phi(\xi)$ is divided by $(\xi - c_r)$ be $\chi_r(\xi)$, so that

$$\phi(\xi) \equiv (\xi - c_r)\chi_r(\xi) \qquad (r = 1, 2, 3, \ldots, n).$$

Then, for any ket $|P\rangle$,

$$(\xi - c_r)\chi_r(\xi)|P\rangle = \phi(\xi)|P\rangle = 0. \tag{19}$$

Now $\chi_r(\xi)|P\rangle$ cannot vanish for every ket $|P\rangle$, as otherwise $\chi_r(\xi)$ itself would vanish and we should have $\xi$ satisfying an algebraic equation of degree $n-1$, which would contradict the assumption that (17) is the simplest equation that $\xi$ satisfies. If we choose $|P\rangle$ so that $\chi_r(\xi)|P\rangle$ does not vanish, then equation (19) shows that $\chi_r(\xi)|P\rangle$ is an eigenket of $\xi$, belonging to the eigenvalue $c_r$. The argument holds for each value of $r$ from 1 to $n$, and hence each of the $c$'s is an eigenvalue of $\xi$. No other number can be an eigenvalue of $\xi$, since if $\xi'$ is any eigenvalue, belonging to an eigenket $|\xi'\rangle$,

$$\xi|\xi'\rangle = \xi'|\xi'\rangle$$

and we can deduce

$$\phi(\xi)|\xi'\rangle = \phi(\xi')|\xi'\rangle,$$

and since the left-hand side vanishes we must have $\phi(\xi') = 0$.

To complete the proof of ($\alpha$) we must verify that the $c$'s are all different. Suppose the $c$'s are not all different and $c_s$ occurs $m$ times say, with $m > 1$. Then $\phi(\xi)$ is of the form

$$\phi(\xi) \equiv (\xi - c_s)^m\,\theta(\xi)$$

with $\theta(\xi)$ a rational integral function of $\xi$. Equation (17) now gives us

$$(\xi - c_s)^m\,\theta(\xi)|A\rangle = 0 \tag{20}$$

for any ket $|A\rangle$. Since $c_s$ is an eigenvalue of $\xi$ it must be real, so that $\xi - c_s$ is a real linear operator. Equation (20) is now of the same form as equation (8) with $\xi - c_s$ for $\xi$ and $\theta(\xi)|A\rangle$ for $|P\rangle$. From the theorem connected with equation (8) we can infer that

$$(\xi - c_s)\,\theta(\xi)|A\rangle = 0.$$

Since the ket $|A\rangle$ is arbitrary,

$$(\xi - c_s)\,\theta(\xi) = 0,$$

which contradicts the assumption that (17) is the simplest equation that $\xi$ satisfies. Hence the $c$'s are all different and ($\alpha$) is proved.

Let $\chi_r(c_r)$ be the number obtained when $c_r$ is substituted for $\xi$ in the algebraic expression $\chi_r(\xi)$. Since the $c$'s are all different, $\chi_r(c_r)$ cannot vanish. Consider now the expression

$$\sum_r \frac{\chi_r(\xi)}{\chi_r(c_r)} - 1. \tag{21}$$

If $c_s$ is substituted for $\xi$ here, every term in the sum vanishes except the one for which $r = s$, since $\chi_r(\xi)$ contains $(\xi - c_s)$ as a factor when $r \neq s$, and the term for which $r = s$ is unity, so the whole expression vanishes. Thus the expression (21) vanishes when $\xi$ is put equal to any of the $n$ numbers $c_1, c_2, \ldots, c_n$. Since, however, the expression is only of degree $n-1$ in $\xi$, it must vanish identically. If we now apply the linear operator (21) to an arbitrary ket $|P\rangle$ and equate the result to zero, we get

$$|P\rangle = \sum_r \frac{1}{\chi_r(c_r)}\,\chi_r(\xi)|P\rangle. \tag{22}$$

Each term in the sum on the right here is, according to (19), an eigenket of $\xi$, if it does not vanish. Equation (22) thus expresses the arbitrary ket $|P\rangle$ as a sum of eigenkets of $\xi$, and thus ($\beta$) is proved.
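The decomposition (22) can be carried out explicitly in a small case. The sketch assumes a real symmetric 3 × 3 array `xi`, chosen for illustration so that the simplest equation it satisfies is $(\xi - 1)(\xi - 3)(\xi - 5) = 0$. The array, the list `roots` of the $c$'s, the ket `ket_p`, and the helper names are all hypothetical. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(op, ket):
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


def chi_operator(xi, roots, r):
    """chi_r(xi): the product of (xi - c_s) over all s different from r."""
    n = len(xi)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for s, c in enumerate(roots):
        if s != r:
            shifted = [[xi[i][j] - (c if i == j else 0) for j in range(n)]
                       for i in range(n)]
            result = matmul(result, shifted)
    return result


def chi_number(roots, r):
    """chi_r(c_r): the same product with xi replaced by the number c_r."""
    value = 1
    for s, c in enumerate(roots):
        if s != r:
            value *= roots[r] - c
    return value


xi = [[2, 1, 0], [1, 2, 0], [0, 0, 5]]
roots = [1, 3, 5]
ket_p = [4, -2, 7]

# The terms on the right-hand side of (22), one for each r.
pieces = []
for r in range(len(roots)):
    coeff = Fraction(1, chi_number(roots, r))
    pieces.append([coeff * x for x in apply(chi_operator(xi, roots, r), ket_p)])

total = [sum(column) for column in zip(*pieces)]
print(pieces, total)
```

Each of the three terms is an eigenket of `xi` belonging to the corresponding entry of `roots`, and their sum reproduces the original ket.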

As a simple example we may consider a real linear operator $\sigma$ that satisfies the equation

$$\sigma^2 = 1. \tag{23}$$

Then $\sigma$ has the two eigenvalues 1 and $-1$. Any ket $|P\rangle$ can be expressed as

$$|P\rangle = \tfrac{1}{2}(1 + \sigma)|P\rangle + \tfrac{1}{2}(1 - \sigma)|P\rangle.$$

It is easily verified that the two terms on the right here are eigenkets of $\sigma$, belonging to the eigenvalues 1 and $-1$ respectively, when they do not vanish.
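The verification mentioned above can be written out for one particular choice of the operator. The sketch assumes kets are lists of two complex components and takes for the operator the 2 × 2 array that interchanges the two components, which is real and has unit square. This choice, the ket `ket_p`, and the helper name `apply` are assumptions of the illustration.

```python
def apply(op, ket):
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


sigma = [[0, 1], [1, 0]]          # real symmetric, and sigma squared is 1
ket_p = [3 + 0j, 1 - 2j]

sigma_p = apply(sigma, ket_p)
plus = [(x + y) / 2 for x, y in zip(ket_p, sigma_p)]     # (1/2)(1 + sigma)|P>
minus = [(x - y) / 2 for x, y in zip(ket_p, sigma_p)]    # (1/2)(1 - sigma)|P>

recombined = [x + y for x, y in zip(plus, minus)]

print(plus, apply(sigma, plus))
print(minus, apply(sigma, minus))
print(recombined)
```

Operating with `sigma` leaves the first term unchanged and reverses the sign of the second, so the two terms belong to the eigenvalues 1 and $-1$.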

10. Observables

We have made a number of assumptions about the way in which states and dynamical variables are to be represented mathematically in the theory. These assumptions are not, by themselves, laws of nature, but become laws of nature when we make some further assumptions that provide a physical interpretation of the theory. Such further assumptions must take the form of establishing connexions between the results of observations, on one hand, and the equations of the mathematical formalism on the other.
When we make an observation we measure some dynamical variable. It is obvious physically that the result of such a measurement must always be a real number, so we should expect that any dynamical variable that we can measure must be a real dynamical variable. One might think one could measure a complex dynamical variable by measuring separately its real and pure imaginary parts. But this would involve two measurements or two observations, which would be all right in classical mechanics, but would not do in quantum mechanics, where two observations in general interfere with one another: it is not in general permissible to consider that two observations can be made exactly simultaneously, and if they are made in quick succession the first will usually disturb the state of the system and introduce an indeterminacy that will affect the second. We therefore have to restrict the dynamical variables that we can measure to be real, the condition for this in quantum mechanics being as given in § 8. Not every real dynamical variable can be measured, however. A further restriction is needed, as we shall see later.
We now make some assumptions for the physical interpretation of the theory. If the dynamical system is in an eigenstate of a real dynamical variable $\xi$, belonging to the eigenvalue $\xi'$, then a measurement of $\xi$ will certainly give as result the number $\xi'$. Conversely, if the system is in a state such that a measurement of a real dynamical variable $\xi$ is certain to give one particular result (instead of giving one or other of several possible results according to a probability law, as is in general the case), then the state is an eigenstate of $\xi$ and the result of the measurement is the eigenvalue of $\xi$ to which this eigenstate belongs. These assumptions are reasonable on account of the eigenvalues of real linear operators being always real numbers.
Some of the immediate consequences of the assumptions will be noted. If we have two or more eigenstates of a real dynamical variable $\xi$ belonging to the same eigenvalue $\xi'$, then any state formed by superposition of them will also be an eigenstate of $\xi$ belonging to the eigenvalue $\xi'$. We can infer that if we have two or more states for which a measurement of $\xi$ is certain to give the result $\xi'$, then for any state formed by superposition of them a measurement of $\xi$ will still be certain to give the result $\xi'$. This gives us some insight into the physical significance of superposition of states. Again, two eigenstates of $\xi$ belonging to different eigenvalues are orthogonal. We can infer that two states for which a measurement of $\xi$ is certain to give two different results are orthogonal. This gives us some insight into the physical significance of orthogonal states.
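Both consequences can be seen in a small numerical case. The sketch assumes kets are lists of three complex components and takes a self-adjoint 3 × 3 array `xi` for which two independent eigenkets belong to the eigenvalue 3 and one belongs to the eigenvalue 1. The array, the kets, the coefficients `c1` and `c2`, and the helper names are hypothetical.

```python
def apply(op, ket):
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in op]


def bracket(ket_b, ket_a):
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


xi = [[2 + 0j, 1j, 0j],
      [-1j, 2 + 0j, 0j],
      [0j, 0j, 3 + 0j]]

first = [1 + 0j, -1j, 0j]     # belongs to the eigenvalue 3
second = [0j, 0j, 1 + 0j]     # also belongs to the eigenvalue 3
other = [1 + 0j, 1j, 0j]      # belongs to the eigenvalue 1

# A superposition of the two eigenkets belonging to the eigenvalue 3.
c1, c2 = 2 - 1j, 1 + 3j
superposed = [c1 * x + c2 * y for x, y in zip(first, second)]

still_eigenket = apply(xi, superposed) == [3 * x for x in superposed]
overlap = bracket(other, superposed)

print(still_eigenket, overlap)
```

The superposed ket still belongs to the eigenvalue 3, and its scalar product with the eigenket belonging to the eigenvalue 1 vanishes.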


When we measure a real dynamical variable $\xi$, the disturbance involved in the act of measurement causes a jump in the state of the dynamical system. From physical continuity, if we make a second measurement of the same dynamical variable $\xi$ immediately after the first, the result of the second measurement must be the same as that of the first. Thus after the first measurement has been made, there is no indeterminacy in the result of the second. Hence, after the first measurement has been made, the system is in an eigenstate of the dynamical variable $\xi$, the eigenvalue it belongs to being equal to the result of the first measurement. This conclusion must still hold if the second measurement is not actually made. In this way we see that a measurement always causes the system to jump into an eigenstate of the dynamical variable that is being measured, the eigenvalue this eigenstate belongs to being equal to the result of the measurement.
We can infer that, with the dynamical system in any state, any result of a measurement of a real dynamical variable is one of its eigenvalues. Conversely, every eigenvalue is a possible result of a measurement of the dynamical variable for some state of the system, since it is certainly the result if the state is an eigenstate belonging to this eigenvalue. This gives us the physical significance of eigenvalues. The set of eigenvalues of a real dynamical variable are just the possible results of measurements of that dynamical variable and the calculation of eigenvalues is for this reason an important problem.
Another assumption we make connected with the physical interpretation of the theory is that, if a certuin real dynumicul variabk 4 is measured with the System in a particulur state, the states into which
the System may jump on account of the measurement are such that the original state is dependent on them. Now these states into which the System may jump are all eigenstates of f, and hence the original state is dependent on eigenstates of 6. But the original state may be any stafe, so we tan conclude that any state is dependent on eigenstates of 4. If we define a complete set of states to be a set such that any state is dependent on them, then our conclusion tan be formulated-the eigenstates of 4 form a complete set.
Not every real dynamical variable has sufficient eigenstates to form a complete set. Those whose eigenstates do not form complete sets are not quantities that can be measured. We obtain in this way a further condition that a dynamical variable has to satisfy in order that it shall be susceptible to measurement, in addition to the condition that it shall be real. We call a real dynamical variable whose eigenstates form a complete set an observable. Thus any quantity that can be measured is an observable.

The question now presents itself: can every observable be measured? The answer theoretically is yes. In practice it may be very awkward, or perhaps even beyond the ingenuity of the experimenter, to devise an apparatus which could measure some particular observable, but the theory always allows one to imagine that the measurement can be made.

Let us examine mathematically the condition for a real dynamical variable $\xi$ to be an observable. Its eigenvalues may consist of a (finite or infinite) discrete set of numbers, or alternatively, they may consist of all numbers in a certain range, such as all numbers lying between $a$ and $b$. In the former case, the condition that any state is dependent on eigenstates of $\xi$ is that any ket can be expressed as a sum of eigenkets of $\xi$. In the latter case the condition needs modification, since one may have an integral instead of a sum, i.e. a ket $|P\rangle$ may be expressible as an integral of eigenkets of $\xi$,

$$|P\rangle = \int |\xi'\rangle \, d\xi', \tag{24}$$

$|\xi'\rangle$ being an eigenket of $\xi$ belonging to the eigenvalue $\xi'$ and the range of integration being the range of eigenvalues, as such a ket is dependent on eigenkets of $\xi$. Not every ket dependent on eigenkets of $\xi$ can be expressed in the form of the right-hand side of (24), since one of the eigenkets itself cannot, and more generally any sum of eigenkets cannot. The condition for the eigenstates of $\xi$ to form a complete set must thus be formulated, that any ket $|P\rangle$ can be expressed as an integral plus a sum of eigenkets of $\xi$, i.e.

$$|P\rangle = \int |\xi' c\rangle \, d\xi' + \sum_r |\xi^r d\rangle, \tag{25}$$

where the $|\xi' c\rangle$, $|\xi^r d\rangle$ are all eigenkets of $\xi$, the labels $c$ and $d$ being inserted to distinguish them when the eigenvalues $\xi'$ and $\xi^r$ are equal, and where the integral is taken over the whole range of eigenvalues and the sum is taken over any selection of them. If this condition is satisfied in the case when the eigenvalues of $\xi$ consist of a range of numbers, then $\xi$ is an observable.

There is a more general case that sometimes occurs, namely the eigenvalues of $\xi$ may consist of a range of numbers together with a discrete set of numbers lying outside the range. In this case the condition that $\xi$ shall be an observable is still that any ket shall be expressible in the form of the right-hand side of (25), but the sum over $r$ is now a sum over the discrete set of eigenvalues as well as a selection of those in the range.
It is often very difficult to decide mathematically whether a particular real dynamical variable satisfies the condition for being an observable or not, because the whole problem of finding eigenvalues and eigenvectors is in general very difficult. However, we may have good reason on experimental grounds for believing that the dynamical variable can be measured and then we may reasonably assume that it is an observable even though the mathematical proof is missing. This is a thing we shall frequently do during the course of development of the theory, e.g. we shall assume the energy of any dynamical system to be always an observable, even though it is beyond the power of present-day mathematical analysis to prove it so except in simple cases.
In the special case when the real dynamical variable is a number, every state is an eigenstate and the dynamical variable is obviously an observable. Any measurement of it always gives the same result, so it is just a physical constant, like the charge on an electron. A physical constant in quantum mechanics may thus be looked upon either as an observable with a single eigenvalue or as a mere number appearing in the equations, the two points of view being equivalent.
If the real dynamical variable satisfies an algebraic equation, then the result ($\beta$) of the preceding section shows that the dynamical variable is an observable. Such an observable has a finite number of eigenvalues. Conversely, any observable with a finite number of eigenvalues satisfies an algebraic equation, since if the observable $\xi$ has as its eigenvalues $\xi'$, $\xi''$, ..., $\xi^n$, then

$$(\xi-\xi')(\xi-\xi'')\cdots(\xi-\xi^n)|P\rangle = 0$$

holds for $|P\rangle$ any eigenket of $\xi$, and thus it holds for any $|P\rangle$ whatever, because any ket can be expressed as a sum of eigenkets of $\xi$ on account of $\xi$ being an observable. Hence

$$(\xi-\xi')(\xi-\xi'')\cdots(\xi-\xi^n) = 0. \tag{26}$$

As an example we may consider the linear operator $|A\rangle\langle A|$, where $|A\rangle$ is a normalized ket. This linear operator is real according to (7), and its square is

$$\{|A\rangle\langle A|\}^2 = |A\rangle\langle A|A\rangle\langle A| = |A\rangle\langle A| \tag{27}$$

since $\langle A|A\rangle = 1$. Thus its square equals itself and so it satisfies an algebraic equation and is an observable. Its eigenvalues are 1 and 0, with $|A\rangle$ as the eigenket belonging to the eigenvalue 1 and all kets orthogonal to $|A\rangle$ as eigenkets belonging to the eigenvalue 0. A measurement of the observable thus certainly gives the result 1 if the dynamical system is in the state corresponding to $|A\rangle$ and the result 0 if the system is in any orthogonal state, so the observable may be described as the quantity which determines whether the system is in the state $|A\rangle$ or not.
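The properties just stated for $|A\rangle\langle A|$ can be checked numerically in a finite number of dimensions. The following is only an illustrative sketch, not part of the original argument: the three-component kets `A` and `B` and the helper names `outer`, `matmul`, `apply` and `close` are assumptions chosen for the example.

```python
import math

def outer(ket):
    """Matrix of |ket><ket|: entry (i, j) is ket[i] * conj(ket[j])."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def matmul(m, n):
    """Product of two square matrices given as nested lists."""
    size = len(m)
    return [[sum(m[i][k] * n[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def apply(m, ket):
    """Multiply the matrix m into the column vector ket."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in m]

def close(u, v, tol=1e-12):
    """Component-wise comparison of two vectors within a tolerance."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

# Hypothetical normalized ket |A> and a ket |B> orthogonal to it.
r = 1 / math.sqrt(2)
A = [complex(r, 0), complex(0, r), complex(0, 0)]
B = [complex(r, 0), complex(0, -r), complex(0, 0)]

P = outer(A)                      # the operator |A><A|
P_squared = matmul(P, P)

# The square equals the operator itself, as in equation (27).
is_idempotent = all(close(r1, r2) for r1, r2 in zip(P_squared, P))

# |A> belongs to the eigenvalue 1 and the orthogonal ket |B> to the eigenvalue 0.
PA = apply(P, A)
PB = apply(P, B)
```

Here `PA` reproduces `A` and `PB` is the zero vector to within rounding, which is the finite-dimensional counterpart of the statement that a measurement of this observable decides whether the system is in the state $|A\rangle$ or not.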
Before concluding this section we should examine the conditions for an integral such as occurs in (24) to be significant. Suppose $|X\rangle$ and $|Y\rangle$ are two kets which can be expressed as integrals of eigenkets of the observable $\xi$,

$$|X\rangle = \int |\xi' x\rangle \, d\xi', \qquad |Y\rangle = \int |\xi'' y\rangle \, d\xi'',$$

$x$ and $y$ being used as labels to distinguish the two integrands. Then we have, taking the conjugate imaginary of the first equation and multiplying by the second

$$\langle X|Y\rangle = \iint \langle \xi' x|\xi'' y\rangle \, d\xi' \, d\xi''. \tag{28}$$

Consider now the single integral

$$\int \langle \xi' x|\xi'' y\rangle \, d\xi''. \tag{29}$$

From the orthogonality theorem, the integrand here must vanish over the whole range of integration except the one point $\xi'' = \xi'$. If the integrand is finite at this point, the integral (29) vanishes, and if this holds for all $\xi'$, we get from (28) that $\langle X|Y\rangle$ vanishes. Now in general $\langle X|Y\rangle$ does not vanish, so in general $\langle \xi' x|\xi' y\rangle$ must be infinitely great in such a way as to make (29) non-vanishing and finite. The form of infinity required for this will be discussed in § 15.

In our work up to the present it has been implied that our bra and ket vectors are of finite length and their scalar products are finite. We see now the need for relaxing this condition when we are dealing with eigenvectors of an observable whose eigenvalues form a range. If we did not relax it, the phenomenon of ranges of eigenvalues could not occur and our theory would be too weak for most practical problems.


Taking $|Y\rangle = |X\rangle$ above, we get the result that in general $\langle \xi' x|\xi' x\rangle$ is infinitely great. We shall assume that if $|\xi' x\rangle \neq 0$

$$\int \langle \xi' x|\xi'' x\rangle \, d\xi'' > 0, \tag{30}$$

as the axiom corresponding to (8) of § 6 for vectors of infinite length.

The space of bra or ket vectors when the vectors are restricted to be of finite length and to have finite scalar products is called by mathematicians a Hilbert space. The bra and ket vectors that we now use form a more general space than a Hilbert space.

We can now see that the expansion of a ket $|P\rangle$ in the form of the right-hand side of (25) is unique, provided there are not two or more terms in the sum referring to the same eigenvalue. To prove this result, let us suppose that two different expansions of $|P\rangle$ are possible. Then by subtracting one from the other, we get an equation of the form

$$0 = \int |\xi' a\rangle \, d\xi' + \sum_s |\xi^s b\rangle, \tag{31}$$

$a$ and $b$ being used as new labels for the eigenvectors, and the sum over $s$ including all terms left after the subtraction of one sum from the other. If there is a term in the sum in (31) referring to an eigenvalue $\xi^t$ not in the range, we get, by multiplying (31) on the left by $\langle \xi^t b|$ and using the orthogonality theorem,

$$0 = \langle \xi^t b|\xi^t b\rangle,$$

which contradicts (8) of § 6. Again, if the integrand in (31) does not vanish for some eigenvalue $\xi''$ not equal to any $\xi^s$ occurring in the sum, we get, by multiplying (31) on the left by $\langle \xi'' a|$ and using the orthogonality theorem,

$$0 = \int \langle \xi'' a|\xi' a\rangle \, d\xi',$$

which contradicts (30). Finally, if there is a term in the sum in (31) referring to an eigenvalue $\xi^t$ in the range, we get, multiplying (31) on the left by $\langle \xi^t b|$,

$$0 = \int \langle \xi^t b|\xi' a\rangle \, d\xi' + \langle \xi^t b|\xi^t b\rangle \tag{32}$$

and multiplying (31) on the left by $\langle \xi^t a|$

$$0 = \int \langle \xi^t a|\xi' a\rangle \, d\xi' + \langle \xi^t a|\xi^t b\rangle. \tag{33}$$

Now the integral in (33) is finite, so $\langle \xi^t a|\xi^t b\rangle$ is finite and $\langle \xi^t b|\xi^t a\rangle$ is finite. The integral in (32) must then be zero, so $\langle \xi^t b|\xi^t b\rangle$ is zero and we again have a contradiction. Thus every term in (31) must vanish and the expansion of a ket $|P\rangle$ in the form of the right-hand side of (25) must be unique.

11. Functions of observables
Let $\xi$ be an observable. We can multiply it by any real number $k$ and get another observable $k\xi$. In order that our theory may be self-consistent it is necessary that, when the system is in a state such that a measurement of the observable $\xi$ certainly gives the result $\xi'$, a measurement of the observable $k\xi$ shall certainly give the result $k\xi'$. It is easily verified that this condition is fulfilled. The ket corresponding to a state for which a measurement of $\xi$ certainly gives the result $\xi'$ is an eigenket of $\xi$, $|\xi'\rangle$ say, satisfying

$$\xi|\xi'\rangle = \xi'|\xi'\rangle.$$

This equation leads to

$$k\xi|\xi'\rangle = k\xi'|\xi'\rangle,$$

showing that $|\xi'\rangle$ is an eigenket of $k\xi$ belonging to the eigenvalue $k\xi'$, and thus that a measurement of $k\xi$ will certainly give the result $k\xi'$.
More generally, we may take any real function of $\xi$, $f(\xi)$ say, and consider it as a new observable which is automatically measured whenever $\xi$ is measured, since an experimental determination of the value of $\xi$ also provides the value of $f(\xi)$. We need not restrict $f(\xi)$ to be real, and then its real and pure imaginary parts are two observables which are automatically measured when $\xi$ is measured. For the theory to be consistent it is necessary that, when the system is in a state such that a measurement of $\xi$ certainly gives the result $\xi'$, a measurement of the real and pure imaginary parts of $f(\xi)$ shall certainly give for results the real and pure imaginary parts of $f(\xi')$. In the case when $f(\xi)$ is expressible as a power series

$$f(\xi) = c_0 + c_1\xi + c_2\xi^2 + c_3\xi^3 + \cdots,$$

the $c$'s being numbers, this condition can again be verified by elementary algebra. In the case of more general functions $f$ it may not be possible to verify the condition. The condition may then be used to define $f(\xi)$, which we have not yet defined mathematically. In this way we can get a more general definition of a function of an observable than is provided by power series.
We define $f(\xi)$ in general to be that linear operator which satisfies

$$f(\xi)|\xi'\rangle = f(\xi')|\xi'\rangle \tag{34}$$


for every eigenket $|\xi'\rangle$ of $\xi$, $f(\xi')$ being a number for each eigenvalue $\xi'$. It is easily seen that this definition is self-consistent when applied to eigenkets $|\xi'\rangle$ that are not independent. If we have an eigenket $|\xi' A\rangle$ dependent on other eigenkets of $\xi$, these other eigenkets must all belong to the same eigenvalue $\xi'$, otherwise we should have an equation of the type (31), which we have seen is impossible. On multiplying the equation which expresses $|\xi' A\rangle$ linearly in terms of the other eigenkets of $\xi$ by $f(\xi)$ on the left, we merely multiply each term in it by the number $f(\xi')$, so we obviously get a consistent equation. Further, equation (34) is sufficient to define the linear operator $f(\xi)$ completely, since to get the result of $f(\xi)$ multiplied into an arbitrary ket $|P\rangle$, we have only to expand $|P\rangle$ in the form of the right-hand side of (25) and take

$$f(\xi)|P\rangle = \int f(\xi')|\xi' c\rangle \, d\xi' + \sum_r f(\xi^r)|\xi^r d\rangle. \tag{35}$$

The conjugate complex $\overline{f(\xi)}$ of $f(\xi)$ is defined by the conjugate imaginary equation to (34), namely

$$\langle \xi'|\,\overline{f(\xi)} = \bar{f}(\xi')\langle \xi'|,$$

holding for any eigenbra $\langle \xi'|$, $\bar{f}(\xi')$ being the conjugate complex function to $f(\xi')$. Let us replace $\xi'$ here by $\xi''$ and multiply the equation on the right by the arbitrary ket $|P\rangle$. Then we get, using the expansion (25) for $|P\rangle$,

$$\begin{aligned}
\langle \xi''|\,\overline{f(\xi)}\,|P\rangle &= \bar{f}(\xi'')\langle \xi''|P\rangle \\
&= \int \bar{f}(\xi'')\langle \xi''|\xi' c\rangle \, d\xi' + \sum_r \bar{f}(\xi'')\langle \xi''|\xi^r d\rangle \\
&= \int \bar{f}(\xi'')\langle \xi''|\xi' c\rangle \, d\xi' + \bar{f}(\xi'')\langle \xi''|\xi'' d\rangle
\end{aligned} \tag{36}$$

with the help of the orthogonality theorem, $\langle \xi''|\xi'' d\rangle$ being understood to be zero if $\xi''$ is not one of the eigenvalues to which the terms in the sum in (25) refer. Again, putting the conjugate complex function $\bar{f}(\xi')$ for $f(\xi')$ in (35) and multiplying on the left by $\langle \xi''|$, we get

$$\langle \xi''|\bar{f}(\xi)|P\rangle = \int \bar{f}(\xi')\langle \xi''|\xi' c\rangle \, d\xi' + \bar{f}(\xi'')\langle \xi''|\xi'' d\rangle.$$

The right-hand side here equals that of (36), since the integrands vanish for $\xi' \neq \xi''$, and hence

$$\langle \xi''|\,\overline{f(\xi)}\,|P\rangle = \langle \xi''|\bar{f}(\xi)|P\rangle.$$

This holds for $\langle \xi''|$ any eigenbra and $|P\rangle$ any ket, so

$$\overline{f(\xi)} = \bar{f}(\xi). \tag{37}$$

Thus the conjugate complex of the linear operator $f(\xi)$ is the conjugate complex function $\bar{f}$ of $\xi$.
It follows as a corollary that if $f(\xi')$ is a real function of $\xi'$, $f(\xi)$ is a real linear operator. $f(\xi)$ is then also an observable, since its eigenstates form a complete set, every eigenstate of $\xi$ being also an eigenstate of $f(\xi)$.
With the above definition we are able to give a meaning to any function $f$ of an observable, provided only that the domain of existence of the function of a real variable $f(x)$ includes all the eigenvalues of the observable. If the domain of existence contains other points besides these eigenvalues, then the values of $f(x)$ for these other points will not affect the function of the observable. The function need not be analytic or continuous. The eigenvalues of a function $f$ of an observable are just the function $f$ of the eigenvalues of the observable.
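For an observable with a finite number of discrete eigenvalues, definition (34) amounts to keeping the eigenkets and replacing each eigenvalue $\xi'$ by $f(\xi')$. The sketch below illustrates this under stated assumptions: the observable is taken to be the hypothetical real symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 1 and 3, and the helper names are chosen for the example only.

```python
import math

s = 1 / math.sqrt(2)
# (eigenvalue, normalized real eigenket) pairs of the assumed observable [[2, 1], [1, 2]].
eigen = [(1.0, [s, -s]), (3.0, [s, s])]

def func_of_observable(f):
    """Matrix of f(xi): the sum over eigenvalues xi' of f(xi') |xi'><xi'|.

    The eigenkets here are real, so no complex conjugation is needed.
    """
    return [[sum(f(value) * ket[i] * ket[j] for value, ket in eigen)
             for j in range(2)] for i in range(2)]

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(m, n, tol=1e-12):
    """Entry-wise comparison of two matrices within a tolerance."""
    return all(abs(a - b) < tol for r1, r2 in zip(m, n) for a, b in zip(r1, r2))

xi = func_of_observable(lambda x: x)           # f(x) = x recovers xi itself
xi_squared = func_of_observable(lambda x: x * x)

# A discontinuous function is equally admissible: unity above 2, zero otherwise.
step = func_of_observable(lambda x: 1.0 if x > 2 else 0.0)
```

In this sketch `xi_squared` agrees with the ordinary matrix product `matmul(xi, xi)`, so the general definition reduces to the power-series one where both apply, while `step` shows that the function need not be analytic or continuous.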
It is important to observe that the possibility of defining a function $f$ of an observable requires the existence of a unique number $f(x)$ for each value of $x$ which is an eigenvalue of the observable. Thus the function $f(x)$ must be single-valued. This may be illustrated by considering the question: when we have an observable $f(A)$ which is a real function of the observable $A$, is the observable $A$ a function of the observable $f(A)$? The answer to this is yes, if different eigenvalues $A'$ of $A$ always lead to different values of $f(A')$. If, however, there exist two different eigenvalues of $A$, $A'$ and $A''$ say, such that $f(A') = f(A'')$, then, corresponding to the eigenvalue $f(A')$ of the observable $f(A)$, there will not be a unique eigenvalue of the observable $A$ and the latter will not be a function of the observable $f(A)$.
It may easily be verified mathematically, from the definition, that the sum or product of two functions of an observable is a function of that observable and that a function of a function of an observable is a function of that observable. Also it is easily seen that the whole theory of functions of an observable is symmetrical between bras and kets and that we could equally well work from the equation

$$\langle \xi'|f(\xi) = f(\xi')\langle \xi'| \tag{38}$$

instead of from (34).

We shall conclude this section with a discussion of two examples which are of great practical importance, namely the reciprocal and the square root. The reciprocal of an observable exists if the observable does not have the eigenvalue zero. If the observable $\alpha$ does not have the eigenvalue zero, the reciprocal observable, which we call $\alpha^{-1}$ or $1/\alpha$, will satisfy

$$\alpha^{-1}|\alpha'\rangle = \alpha'^{-1}|\alpha'\rangle, \tag{39}$$

where $|\alpha'\rangle$ is an eigenket of $\alpha$ belonging to the eigenvalue $\alpha'$. Hence

$$\alpha\alpha^{-1}|\alpha'\rangle = \alpha\alpha'^{-1}|\alpha'\rangle = |\alpha'\rangle.$$

Since this holds for any eigenket $|\alpha'\rangle$, we must have

$$\alpha\alpha^{-1} = 1. \tag{40}$$

Similarly,

$$\alpha^{-1}\alpha = 1. \tag{41}$$

Either of these equations is sufficient to determine $\alpha^{-1}$ completely, provided $\alpha$ does not have the eigenvalue zero. To prove this in the case of (40), let $x$ be any linear operator satisfying the equation

$$\alpha x = 1$$

and multiply both sides on the left by the $\alpha^{-1}$ defined by (39). The result is

$$\alpha^{-1}\alpha x = \alpha^{-1}$$

and hence from (41)

$$x = \alpha^{-1}.$$

Equations (40) and (41) can be used to define the reciprocal, when it exists, of a general linear operator $\alpha$, which need not even be real. One of these equations by itself is then not necessarily sufficient. If any two linear operators $\alpha$ and $\beta$ have reciprocals, their product $\alpha\beta$ has the reciprocal

$$(\alpha\beta)^{-1} = \beta^{-1}\alpha^{-1}, \tag{42}$$

obtained by taking the reciprocal of each factor and reversing their order. We verify (42) by noting that its right-hand side gives unity when multiplied by $\alpha\beta$, either on the right or on the left. This reciprocal law for products can be immediately extended to more than two factors, i.e.,

$$(\alpha\beta\gamma\cdots)^{-1} = \cdots\gamma^{-1}\beta^{-1}\alpha^{-1}.$$
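The reversal of order in (42) can be seen in a small exact calculation. The following sketch uses two hypothetical non-commuting 2x2 matrices standing in for general linear operators; the matrices and the helpers `matmul` and `inverse` are assumptions made for illustration, and rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction as F

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Reciprocal of a 2x2 matrix; fails if the determinant is zero."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("no reciprocal: the determinant is zero")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Two illustrative operators that do not commute with each other.
alpha = [[F(2), F(1)], [F(1), F(1)]]
beta = [[F(1), F(2)], [F(0), F(1)]]
identity = [[F(1), F(0)], [F(0), F(1)]]

lhs = inverse(matmul(alpha, beta))                     # (alpha beta)^-1
rhs = matmul(inverse(beta), inverse(alpha))            # beta^-1 alpha^-1
same_order = matmul(inverse(alpha), inverse(beta))     # alpha^-1 beta^-1, order not reversed
```

With these matrices `lhs` equals `rhs`, and `rhs` gives the unit matrix when multiplied by the product on either side, whereas `same_order` differs, which is why the factors must be reversed.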

The square root of an observable $\alpha$ always exists, and is real if $\alpha$ has no negative eigenvalues. We write it $\sqrt{\alpha}$ or $\alpha^{1/2}$. It satisfies

$$\sqrt{\alpha}\,|\alpha'\rangle = \pm\sqrt{\alpha'}\,|\alpha'\rangle, \tag{43}$$

$|\alpha'\rangle$ being an eigenket of $\alpha$ belonging to the eigenvalue $\alpha'$. Hence

$$\sqrt{\alpha}\sqrt{\alpha}\,|\alpha'\rangle = \sqrt{\alpha'}\sqrt{\alpha'}\,|\alpha'\rangle = \alpha'|\alpha'\rangle = \alpha|\alpha'\rangle,$$

and since this holds for any eigenket $|\alpha'\rangle$ we must have

$$\sqrt{\alpha}\sqrt{\alpha} = \alpha. \tag{44}$$

On account of the ambiguity of sign in (43) there will be several square roots. To fix one of them we must specify a particular sign in (43) for each eigenvalue. This sign may vary irregularly from one eigenvalue to the next and equation (43) will always define a linear operator $\sqrt{\alpha}$ satisfying (44) and forming a square-root function of $\alpha$. If there is an eigenvalue of $\alpha$ with two or more independent eigenkets belonging to it, then we must, according to our definition of a function, have the same sign in (43) for each of these eigenkets. If we took different signs, however, equation (44) would still hold, and hence equation (44) by itself is not sufficient to define $\sqrt{\alpha}$, except in the special case when there is only one independent eigenket of $\alpha$ belonging to any eigenvalue.
The number of different square roots of an observable is $2^n$, where $n$ is the total number of eigenvalues not zero. In practice the square-root function is used only for observables without negative eigenvalues and the particular square root that is useful is the one for which the positive sign is always taken in (43). This one will be called the positive square root.
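The count $2^n$ can be illustrated by enumerating the sign choices in (43) for a small case. The sketch below assumes a hypothetical observable with the two non-zero eigenvalues 1 and 4 and real eigenkets, so that four square roots are expected; all names are illustrative.

```python
import itertools
import math

s = 1 / math.sqrt(2)
# (eigenvalue, normalized real eigenket) pairs of an assumed observable alpha.
eigen = [(1.0, [s, -s]), (4.0, [s, s])]

def from_spectrum(values):
    """Matrix with the fixed eigenkets above and the given eigenvalues."""
    return [[sum(v * ket[i] * ket[j] for v, (_, ket) in zip(values, eigen))
             for j in range(2)] for i in range(2)]

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(m, n, tol=1e-12):
    """Entry-wise comparison of two matrices within a tolerance."""
    return all(abs(a - b) < tol for r1, r2 in zip(m, n) for a, b in zip(r1, r2))

alpha = from_spectrum([value for value, _ in eigen])

# One square root for each assignment of a sign to each eigenvalue, as in (43).
roots = [
    from_spectrum([sign * math.sqrt(value) for sign, (value, _) in zip(signs, eigen)])
    for signs in itertools.product([+1, -1], repeat=len(eigen))
]

# The first entry takes the positive sign throughout: the positive square root.
positive_root = roots[0]
```

Each of the four matrices in `roots` squares to `alpha`, and only `positive_root` has no negative eigenvalues.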

12. The general physical interpretation
The assumptions that we made at the beginning of § 10 to get a physical interpretation of the mathematical theory are of a rather special kind, since they can be used only in connexion with eigenstates. We need some more general assumption which will enable us to extract physical information from the mathematics even when we are not dealing with eigenstates.
In classical mechanics an observable always, as we say, 'has a value' for any particular state of the system. What is there in quantum mechanics corresponding to this? If we take any observable $\xi$ and any two states $x$ and $y$, corresponding to the vectors $\langle x|$ and $|y\rangle$, then we can form the number $\langle x|\xi|y\rangle$. This number is not very closely analogous to the value which an observable can 'have' in the classical theory, for three reasons, namely, (i) it refers to two states of the system, while the classical value always refers to one, (ii) it is in general not a real number, and (iii) it is not uniquely determined by the observable and the states, since the vectors $\langle x|$ and $|y\rangle$ contain arbitrary numerical factors. Even if we impose on $\langle x|$ and $|y\rangle$ the condition that they shall be normalized, there will still be an undetermined factor of modulus unity in $\langle x|\xi|y\rangle$. These three reasons cease to apply, however, if we take the two states to be identical and $|y\rangle$ to be the conjugate imaginary vector to $\langle x|$. The number that we then get, namely $\langle x|\xi|x\rangle$, is necessarily real, and also it is uniquely determined when $\langle x|$ is normalized, since if we multiply $\langle x|$ by the numerical factor $e^{ic}$, $c$ being some real number, we must multiply $|x\rangle$ by $e^{-ic}$ and $\langle x|\xi|x\rangle$ will be unaltered.
One might thus be inclined to make the tentative assumption that the observable $\xi$ 'has the value' $\langle x|\xi|x\rangle$ for the state $x$, in a sense analogous to the classical sense. This would not be satisfactory, though, for the following reason. Let us take a second observable $\eta$, which would have by the above assumption the value $\langle x|\eta|x\rangle$ for this same state. We should then expect, from classical analogy, that for this state the sum of the two observables would have a value equal to the sum of the values of the two observables separately and the product of the two observables would have a value equal to the product of the values of the two observables separately. Actually, the tentative assumption would give for the sum of the two observables the value $\langle x|\xi+\eta|x\rangle$, which is, in fact, equal to the sum of $\langle x|\xi|x\rangle$ and $\langle x|\eta|x\rangle$, but for the product it would give the value $\langle x|\xi\eta|x\rangle$ or $\langle x|\eta\xi|x\rangle$, neither of which is connected in any simple way with $\langle x|\xi|x\rangle$ and $\langle x|\eta|x\rangle$.
However, since things go wrong only with the product and not with the sum, it would be reasonable to call $\langle x|\xi|x\rangle$ the average value of the observable $\xi$ for the state $x$. This is because the average of the sum of two quantities must equal the sum of their averages, but the average of their product need not equal the product of their averages. We therefore make the general assumption that if the measurement of the observable $\xi$ for the system in the state corresponding to $|x\rangle$ is made a large number of times, the average of all the results obtained will be $\langle x|\xi|x\rangle$, provided $|x\rangle$ is normalized. If $|x\rangle$ is not normalized, as is necessarily the case if the state $x$ is an eigenstate of some observable belonging to an eigenvalue in a range, the assumption becomes that the average result of a measurement of $\xi$ is proportional to $\langle x|\xi|x\rangle$. This general assumption provides a basis for a general physical interpretation of the theory. The expression that an observable 'has a particular value' for a particular state is permissible in quantum mechanics in the special case when a measurement of the observable is certain to lead to the particular value, so that the state is an eigenstate of the observable.
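The contrast between sums and products of averages can be made concrete with a two-dimensional example. This is only a sketch under assumptions that are not in the text: the observables $\xi$ and $\eta$ are taken to be two particular real symmetric matrices, and the normalized ket has the real components 3/5 and 4/5, so that exact rational arithmetic can be used and no complex conjugation is needed.

```python
from fractions import Fraction

def average(m, x):
    """The number <x|m|x> for a real normalized ket x."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

def add(m, n):
    """Sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def matmul(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical observables and state, chosen only for illustration.
xi = [[1, 0], [0, -1]]
eta = [[0, 1], [1, 0]]
x = [Fraction(3, 5), Fraction(4, 5)]

avg_xi = average(xi, x)
avg_eta = average(eta, x)
avg_sum = average(add(xi, eta), x)          # average of xi + eta
avg_product = average(matmul(xi, eta), x)   # the number <x|xi eta|x>
```

Here `avg_sum` equals `avg_xi + avg_eta` exactly, while `avg_product` is not equal to `avg_xi * avg_eta`, in line with the reasoning that leads to calling $\langle x|\xi|x\rangle$ an average rather than a value.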

It may easily be verified from the algebra that, with this restricted meaning for an observable 'having a value', if two observables have values for a particular state, then for this state the sum of the two observables (if this sum is an observable†) has a value equal to the sum of the values of the two observables separately and the product of the two observables (if this product is an observable‡) has a value equal to the product of the values of the two observables separately.
In the general case we cannot speak of an observable having a value for a particular state, but we can speak of its having an average value for the state. We can go further and speak of the probability of its having any specified value for the state, meaning the probability of this specified value being obtained when one makes a measurement of the observable. This probability can be obtained from the general assumption in the following way.

Let the observable be $\xi$ and let the state correspond to the normalized ket $|x\rangle$. Then the general assumption tells us, not only that the average value of $\xi$ is $\langle x|\xi|x\rangle$, but also that the average value of any function of $\xi$, $f(\xi)$ say, is $\langle x|f(\xi)|x\rangle$. Take $f(\xi)$ to be that function of $\xi$ which is equal to unity when $\xi = a$, $a$ being some real number, and zero otherwise. This function of $\xi$ has a meaning according to our general theory of functions of an observable, and it may be denoted by $\delta_{\xi a}$ in conformity with the general notation of the symbol $\delta$ with two suffixes given on p. 62 (equation (17)). The average value of this function of $\xi$ is just the probability, $P_a$ say, of $\xi$ having the value $a$. Thus

$$P_a = \langle x|\delta_{\xi a}|x\rangle. \tag{45}$$

If $a$ is not an eigenvalue of $\xi$, $\delta_{\xi a}$ multiplied into any eigenket of $\xi$ is zero, and hence $\delta_{\xi a} = 0$ and $P_a = 0$. This agrees with a conclusion of § 10, that any result of a measurement of an observable must be one of its eigenvalues.
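For an observable with discrete eigenvalues, (45) reduces to summing the squared moduli of the components of $|x\rangle$ along the eigenkets belonging to the eigenvalue $a$. The sketch below works this out for an assumed observable, the real symmetric matrix with rows (2, 1) and (1, 2), and an assumed real normalized ket; the names `eigen`, `probability` and `average` are illustrative only.

```python
import math

s = 1 / math.sqrt(2)
# (eigenvalue, normalized real eigenket) pairs of the assumed observable xi.
eigen = [(1.0, [s, -s]), (3.0, [s, s])]
xi = [[2.0, 1.0], [1.0, 2.0]]

def probability(a, x):
    """P_a = <x| delta_{xi a} |x> for a real normalized ket x."""
    total = 0.0
    for value, ket in eigen:
        if value == a:
            # <xi'|x>; the eigenkets are real, so no conjugation is needed.
            amplitude = sum(k * c for k, c in zip(ket, x))
            total += abs(amplitude) ** 2
    return total

def average(m, x):
    """The number <x|m|x> for a real normalized ket x."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

x = [0.6, 0.8]                       # hypothetical normalized state
p1 = probability(1.0, x)
p3 = probability(3.0, x)
p2 = probability(2.0, x)             # 2 is not an eigenvalue of xi
mean_from_probabilities = 1.0 * p1 + 3.0 * p3
```

In this example `p2` vanishes because 2 is not an eigenvalue, `p1 + p3` is unity to within rounding, and `mean_from_probabilities` agrees with `average(xi, x)`, so the probabilities are consistent with the general assumption about averages.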
If the possible results of a measurement of $\xi$ form a range of numbers, the probability of $\xi$ having exactly a particular value will be zero in most physical problems. The quantity of physical importance is then the probability of $\xi$ having a value within a small range, say from $a$ to $a+da$. This probability, which we may call $P(a)\,da$, is equal to the average value of that function of $\xi$ which is equal to unity for $\xi$ lying within the range $a$ to $a+da$ and zero otherwise. This function of $\xi$ has a meaning according to our general theory of functions of an observable. Denoting it by $\chi(\xi)$, we have

$$P(a)\,da = \langle x|\chi(\xi)|x\rangle. \tag{46}$$

If the range $a$ to $a+da$ does not include any eigenvalues of $\xi$, we have as above $\chi(\xi) = 0$ and $P(a) = 0$. If $|x\rangle$ is not normalized, the right-hand sides of (45) and (46) will still be proportional to the probability of $\xi$ having the value $a$ and lying within the range $a$ to $a+da$ respectively.

† This is not obviously so, since the sum may not have sufficient eigenstates to form a complete set, in which case the sum, considered as a single quantity, would not be measurable.

‡ Here the reality condition may fail, as well as the condition for the eigenstates to form a complete set.

The assumption of § 10, that a measurement of $\xi$ is certain to give the result $\xi'$ if the system is in an eigenstate of $\xi$ belonging to the eigenvalue $\xi'$, is consistent with the general assumption for physical interpretation and can in fact be deduced from it. Working from the general assumption we see that, if $|\xi'\rangle$ is an eigenket of $\xi$ belonging to the eigenvalue $\xi'$, then, in the case of discrete eigenvalues of $\xi$,

$$\delta_{\xi a}|\xi'\rangle = 0 \quad \text{unless } a = \xi',$$

and in the case of a range of eigenvalues of $\xi$

$$\chi(\xi)|\xi'\rangle = 0 \quad \text{unless the range } a \text{ to } a+da \text{ includes } \xi'.$$

In either case, for the state corresponding to $|\xi'\rangle$, the probability of $\xi$ having any value other than $\xi'$ is zero.
An eigenstate of $\xi$ belonging to an eigenvalue $\xi'$ lying in a range is a state which cannot strictly be realized in practice, since it would need an infinite amount of precision to get $\xi$ to equal exactly $\xi'$. The most that could be attained in practice would be to get $\xi$ to lie within a narrow range about the value $\xi'$. The system would then be in a state approximating to an eigenstate of $\xi$. Thus an eigenstate belonging to an eigenvalue in a range is a mathematical idealization of what can be attained in practice. All the same such eigenstates play a very useful role in the theory and one could not very well do without them. Science contains many examples of theoretical concepts which are limits of things met with in practice and are useful for the precise formulation of laws of nature, although they are not realizable experimentally, and this is just one more of them. It may be that the infinite length of the ket vectors corresponding to these eigenstates is connected with their unrealizability, and that all realizable states correspond to ket vectors that can be normalized and that form a Hilbert space.


13. Commutability and compatibility
A state may be simultaneously an eigenstate of two observables. If the state corresponds to the ket vector $|A\rangle$ and the observables are $\xi$ and $\eta$, we should then have the equations

$$\xi|A\rangle = \xi'|A\rangle, \qquad \eta|A\rangle = \eta'|A\rangle,$$

where $\xi'$ and $\eta'$ are eigenvalues of $\xi$ and $\eta$ respectively. We can now deduce

$$\xi\eta|A\rangle = \xi\eta'|A\rangle = \xi'\eta'|A\rangle = \xi'\eta|A\rangle = \eta\xi'|A\rangle = \eta\xi|A\rangle,$$

or

$$(\xi\eta-\eta\xi)|A\rangle = 0.$$

This suggests that the chances for the existence of a simultaneous eigenstate are most favourable if $\xi\eta-\eta\xi = 0$ and the two observables commute. If they do not commute a simultaneous eigenstate is not impossible, but is rather exceptional. On the other hand, if they do commute there exist so many simultaneous eigenstates that they form a complete set, as will now be proved.
Let $\xi$ and $\eta$ be two commuting observables. Take an eigenket of $\eta$, $|\eta'\rangle$ say, belonging to the eigenvalue $\eta'$, and expand it in terms of eigenkets of $\xi$ in the form of the right-hand side of (25), thus

$$|\eta'\rangle = \int |\xi'\eta' c\rangle \, d\xi' + \sum_r |\xi^r\eta' d\rangle. \tag{47}$$

The eigenkets of $\xi$ on the right-hand side here have $\eta'$ inserted in them as an extra label, in order to remind us that they come from the expansion of a special ket vector, namely $|\eta'\rangle$, and not a general one as in equation (25). We can now show that each of these eigenkets of $\xi$ is also an eigenket of $\eta$ belonging to the eigenvalue $\eta'$. We have

$$0 = (\eta-\eta')|\eta'\rangle = \int (\eta-\eta')|\xi'\eta' c\rangle \, d\xi' + \sum_r (\eta-\eta')|\xi^r\eta' d\rangle. \tag{48}$$

Now the ket $(\eta-\eta')|\xi^r\eta' d\rangle$ satisfies

$$\xi(\eta-\eta')|\xi^r\eta' d\rangle = (\eta-\eta')\xi|\xi^r\eta' d\rangle = (\eta-\eta')\xi^r|\xi^r\eta' d\rangle = \xi^r(\eta-\eta')|\xi^r\eta' d\rangle,$$

showing that it is an eigenket of $\xi$ belonging to the eigenvalue $\xi^r$, and similarly the ket $(\eta-\eta')|\xi'\eta' c\rangle$ is an eigenket of $\xi$ belonging to the eigenvalue $\xi'$. Equation (48) thus gives an integral plus a sum of eigenkets of $\xi$ equal to zero, which, as we have seen with equation (31), is impossible unless the integrand and every term in the sum vanishes. Hence

$$(\eta-\eta')|\xi'\eta' c\rangle = 0, \qquad (\eta-\eta')|\xi^r\eta' d\rangle = 0,$$

so that all the kets appearing on the right-hand side of (47) are eigenkets of $\eta$ as well as of $\xi$. Equation (47) now gives $|\eta'\rangle$ expanded in terms of simultaneous eigenkets of $\xi$ and $\eta$. Since any ket can be expanded in terms of eigenkets $|\eta'\rangle$ of $\eta$, it follows that any ket can be expanded in terms of simultaneous eigenkets of $\xi$ and $\eta$, and thus the simultaneous eigenstates form a complete set.
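A small finite example may help to fix the idea of a complete set of simultaneous eigenkets. The sketch below is an illustration under assumptions: $\xi$ and $\eta$ are taken to be two particular commuting 2x2 integer matrices, the simultaneous eigenkets are left unnormalized so that the arithmetic stays exact, and the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, ket):
    """Multiply the matrix m into the column vector ket."""
    return [sum(row[k] * ket[k] for k in range(2)) for row in m]

def scale(c, ket):
    """Multiply the vector ket by the number c."""
    return [c * component for component in ket]

# Assumed commuting observables.
xi = [[2, 1], [1, 2]]
eta = [[0, 1], [1, 0]]
commute = matmul(xi, eta) == matmul(eta, xi)

# Simultaneous eigenkets labelled by their eigenvalues (xi', eta').
simultaneous = {(3, 1): [1, 1], (1, -1): [1, -1]}

def expand(ket):
    """Coefficients of an integer ket along the two simultaneous eigenkets."""
    return (Fraction(ket[0] + ket[1], 2), Fraction(ket[0] - ket[1], 2))

# Any ket can be rebuilt from the simultaneous eigenkets, e.g. the ket (5, 1).
c_plus, c_minus = expand([5, 1])
rebuilt = [a + b for a, b in zip(scale(c_plus, simultaneous[(3, 1)]),
                                 scale(c_minus, simultaneous[(1, -1)]))]
```

In this non-degenerate case every eigenket of $\eta$ is automatically an eigenket of $\xi$; the proof in the text is needed precisely because, when eigenvalues are repeated or form a range, that is no longer automatic.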
The above simultaneous eigenkets of $\xi$ and $\eta$, $|\xi'\eta' c\rangle$ and $|\xi^r\eta' d\rangle$, are labelled by the eigenvalues $\xi'$ and $\eta'$, or $\xi^r$ and $\eta'$, to which they belong, together with the labels $c$ and $d$ which may also be necessary. The procedure of using eigenvalues as labels for simultaneous eigenvectors will be generally followed in the future, just as it has been followed in the past for eigenvectors of single observables.
The converse to the above theorem says that, if $\xi$ and $\eta$ are two observables such that their simultaneous eigenstates form a complete set, then $\xi$ and $\eta$ commute. To prove this, we note that, if $|\xi'\eta'\rangle$ is a simultaneous eigenket belonging to the eigenvalues $\xi'$ and $\eta'$,

$$(\xi\eta-\eta\xi)|\xi'\eta'\rangle = (\xi'\eta'-\eta'\xi')|\xi'\eta'\rangle = 0. \tag{49}$$

Since the simultaneous eigenstates form a complete set, an arbitrary ket $|P\rangle$ can be expanded in terms of simultaneous eigenkets $|\xi'\eta'\rangle$, for each of which (49) holds, and hence

$$(\xi\eta-\eta\xi)|P\rangle = 0$$

and so

$$\xi\eta-\eta\xi = 0.$$

The idea of simultaneous eigenstates may be extended to more than two observables and the above theorem and its converse still hold, i.e. if any set of observables commute, each with all the others, their simultaneous eigenstates form a complete set, and conversely. The same arguments used for the proof with two observables are adequate for the general case; e.g., if we have three commuting observables $\xi$, $\eta$, $\zeta$, we can expand any simultaneous eigenket of $\xi$ and $\eta$ in terms of eigenkets of $\zeta$ and then show that each of these eigenkets of $\zeta$ is also an eigenket of $\xi$ and of $\eta$. Thus the simultaneous eigenket of $\xi$ and $\eta$ is expanded in terms of simultaneous eigenkets of $\xi$, $\eta$, and $\zeta$, and since any ket can be expanded in terms of simultaneous eigenkets of $\xi$ and $\eta$, it can also be expanded in terms of simultaneous eigenkets of $\xi$, $\eta$, and $\zeta$.


The orthogonality theorem applied to simultaneous eigenkets tells us that two simultaneous eigenvectors of a set of commuting observables are orthogonal if the sets of eigenvalues to which they belong differ in any way.
Owing to the simultaneous eigenstates of two or more commuting observables forming a complete set, we can set up a theory of functions of two or more commuting observables on the same lines as the theory of functions of a single observable given in § 11. If $\xi, \eta, \zeta,\ldots$ are commuting observables, we define a general function $f$ of them to be that linear operator $f(\xi,\eta,\zeta,\ldots)$ which satisfies

$$f(\xi,\eta,\zeta,\ldots)|\xi'\eta'\zeta'\ldots\rangle = f(\xi',\eta',\zeta',\ldots)|\xi'\eta'\zeta'\ldots\rangle, \tag{50}$$

where $|\xi'\eta'\zeta'\ldots\rangle$ is any simultaneous eigenket of $\xi, \eta, \zeta,\ldots$ belonging to the eigenvalues $\xi', \eta', \zeta',\ldots$. Here $f$ is any function such that $f(a, b, c,\ldots)$ is defined for all values of $a, b, c,\ldots$ which are eigenvalues of $\xi, \eta, \zeta,\ldots$ respectively. As with a function of a single observable defined by (34), we can show that $f(\xi,\eta,\zeta,\ldots)$ is completely determined by (50), that

$$\overline{f(\xi,\eta,\zeta,\ldots)} = \bar f(\xi,\eta,\zeta,\ldots),$$

corresponding to (37), and that if $f(a, b, c,\ldots)$ is a real function, $f(\xi,\eta,\zeta,\ldots)$ is real and is an observable.
We can now proceed to generalize the results (45) and (46). Given a set of commuting observables $\xi, \eta, \zeta,\ldots$, we may form that function of them which is equal to unity when $\xi = a$, $\eta = b$, $\zeta = c,\ldots$, $a, b, c,\ldots$ being real numbers, and is equal to zero when any of these conditions is not fulfilled. This function may be written $\delta_{\xi a}\,\delta_{\eta b}\,\delta_{\zeta c}\ldots$, and is in fact just the product in any order of the factors $\delta_{\xi a}$, $\delta_{\eta b}$, $\delta_{\zeta c},\ldots$ defined as functions of single observables, as may be seen by substituting this product for $f(\xi,\eta,\zeta,\ldots)$ in the left-hand side of (50). The average value of this function for any state is the probability, $P_{abc\ldots}$ say, of $\xi, \eta, \zeta,\ldots$ having the values $a, b, c,\ldots$ respectively for that state. Thus if the state corresponds to the normalized ket vector $|x\rangle$, we get from our general assumption for physical interpretation

$$P_{abc\ldots} = \langle x|\delta_{\xi a}\,\delta_{\eta b}\,\delta_{\zeta c}\ldots|x\rangle. \tag{51}$$

$P_{abc\ldots}$ is zero unless each of the numbers $a, b, c,\ldots$ is an eigenvalue of the corresponding observable. If any of the numbers $a, b, c,\ldots$ is an eigenvalue in a range of eigenvalues of the corresponding observable, $P_{abc\ldots}$ will usually again be zero, but in this case we ought to replace


the requirement that this observable shall have exactly one value by the requirement that it shall have a value lying within a small range, which involves replacing one of the $\delta$ factors in (51) by a factor like the $\chi(\xi)$ of equation (46). On carrying out such a replacement for each of the observables $\xi, \eta, \zeta,\ldots$, whose corresponding numerical value $a, b, c,\ldots$ lies in a range of eigenvalues, we shall get a probability which does not in general vanish.
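For discrete eigenvalues the probability (51) can be evaluated directly from the components of the ket along the simultaneous eigenkets. The sketch below assumes a hypothetical four-state system whose basic kets are labelled by pairs of eigenvalues of two commuting observables; the labels in `LABELS` and the ket `x` are illustrative assumptions, not taken from the text.

```python
# Hypothetical labels (xi', eta') of four simultaneous eigenkets.
LABELS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def normalize(ket):
    """Scale the ket so that the sum of squared moduli of its components is 1."""
    norm = sum(abs(c) ** 2 for c in ket) ** 0.5
    return [c / norm for c in ket]

def joint_probability(ket, a, b):
    """P_ab = <x| delta_{xi a} delta_{eta b} |x> for a normalized ket.

    The product of the two delta factors keeps only the components of
    the ket along basic kets with xi' == a and eta' == b.
    """
    return sum(abs(c) ** 2
               for (xi, eta), c in zip(LABELS, ket)
               if xi == a and eta == b)

# An arbitrary (assumed) state, given by its components <xi' eta'|x>.
x = normalize([1, 1j, 0, 1 + 1j])

if __name__ == "__main__":
    for a, b in LABELS:
        print(a, b, joint_probability(x, a, b))
    # A value that is not an eigenvalue has zero probability.
    print(joint_probability(x, 2, 0))
```

The probabilities for all pairs of eigenvalues add up to unity, as they must for a normalized ket.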
If certain observables commute, there exist states for which they all have particular values, in the sense explained at the bottom of p. 46, namely the simultaneous eigenstates. Thus one can give a meaning to several commuting observables having values at the same time. Further, we see from (51) that for any state one can give a meaning to the probability of particular results being obtained for simultaneous measurements of several commuting observables. This conclusion is an important new development. In general one cannot make an observation on a system in a definite state without disturbing that state and spoiling it for the purposes of a second observation. One cannot then give any meaning to the two observations being made simultaneously. The above conclusion tells us, though, that in the special case when the two observables commute, the observations are to be considered as non-interfering or compatible, in such a way that one can give a meaning to the two observations being made simultaneously and can discuss the probability of any particular results being obtained. The two observations may, in fact, be considered as a single observation of a more complicated type, the result of which is expressible by two numbers instead of a single number. From the point of view of general theory, any two or more commuting observables may be counted as a single observable, the result of a measurement of which consists of two or more numbers. The states for which this measurement is certain to lead to one particular result are the simultaneous eigenstates.

III
REPRESENTATIONS
14. Basic vectors
IN the preceding chapters we set up an algebraic scheme involving certain abstract quantities of three kinds, namely bra vectors, ket vectors, and linear operators, and we expressed some of the fundamental laws of quantum mechanics in terms of them. It would be possible to continue to develop the theory in terms of these abstract quantities and to use them for applications to particular problems. However, for some purposes it is more convenient to replace the abstract quantities by sets of numbers with analogous mathematical properties and to work in terms of these sets of numbers. The procedure is similar to using coordinates in geometry, and has the advantage of giving one greater mathematical power for the solving of particular problems.
The way in which the abstract quantities are to be replaced by numbers is not unique, there being many possible ways corresponding to the many systems of coordinates one can have in geometry. Each of these ways is called a representation and the set of numbers that replace an abstract quantity is called the representative of that abstract quantity in the representation. Thus the representative of an abstract quantity corresponds to the coordinates of a geometrical object. When one has a particular problem to work out in quantum mechanics, one can minimize the labour by using a representation in which the representatives of the more important abstract quantities occurring in that problem are as simple as possible.
To set up a representation in a general way, we take a complete set of bra vectors, i.e. a set such that any bra can be expressed linearly in terms of them (as a sum or an integral or possibly an integral plus a sum). These bras we call the basic bras of the representation. They are sufficient, as we shall see, to fix the representation completely.
Take any ket $|a\rangle$ and form its scalar product with each of the basic bras. The numbers so obtained constitute the representative of $|a\rangle$. They are sufficient to determine the ket $|a\rangle$ completely, since if there is a second ket, $|a_1\rangle$ say, for which these numbers are the same, the difference $|a\rangle - |a_1\rangle$ will have its scalar product with any basic bra


vanishing, and hence its scalar product with any bra whatever will vanish and $|a\rangle - |a_1\rangle$ itself will vanish.
We may suppose the basic bras to be labelled by one or more parameters, $\lambda_1, \lambda_2,\ldots,\lambda_u$, each of which may take on certain numerical values. The basic bras will then be written $\langle\lambda_1\lambda_2\ldots\lambda_u|$ and the representative of $|a\rangle$ will be written $\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle$. This representative will now consist of a set of numbers, one for each set of values that $\lambda_1, \lambda_2,\ldots,\lambda_u$ may have in their respective domains. Such a set of numbers just forms a function of the variables $\lambda_1, \lambda_2,\ldots,\lambda_u$. Thus the representative of a ket may be looked upon either as a set of numbers or as a function of the variables used to label the basic bras.
If the number of independent states of our dynamical system is finite, equal to $n$ say, it is sufficient to take $n$ basic bras, which may be labelled by a single parameter $\lambda$ taking on the values $1, 2, 3,\ldots, n$. The representative of any ket $|a\rangle$ now consists of the set of $n$ numbers $\langle 1|a\rangle, \langle 2|a\rangle, \langle 3|a\rangle,\ldots,\langle n|a\rangle$, which are precisely the coordinates of the vector $|a\rangle$ referred to a system of coordinates in the usual way. The idea of the representative of a ket vector is just a generalization of the idea of the coordinates of an ordinary vector and reduces to the latter when the number of dimensions of the space of the ket vectors is finite.
In a general representation there is no need for the basic bras to be all independent. In most representations used in practice, however, they are all independent, and also satisfy the more stringent condition that any two of them are orthogonal. The representation is then called an orthogonal representation.
Take an orthogonal representation with basic bras $\langle\lambda_1\lambda_2\ldots\lambda_u|$, labelled by parameters $\lambda_1, \lambda_2,\ldots,\lambda_u$ whose domains are all real. Take a ket $|a\rangle$ and form its representative $\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle$. Now form the numbers $\lambda_1\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle$ and consider them as the representative of a new ket $|b\rangle$. This is permissible since the numbers forming the representative of a ket are independent, on account of the basic bras being independent. The ket $|b\rangle$ is defined by the equation

$$\langle\lambda_1\lambda_2\ldots\lambda_u|b\rangle = \lambda_1\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle.$$

The ket $|b\rangle$ is evidently a linear function of the ket $|a\rangle$, so it may be considered as the result of a linear operator applied to $|a\rangle$. Calling this linear operator $L_1$, we have

$$|b\rangle = L_1|a\rangle$$


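and hence

$$\langle\lambda_1\lambda_2\ldots\lambda_u|L_1|a\rangle = \lambda_1\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle.$$

This equation holds for any ket $|a\rangle$, so we get

$$\langle\lambda_1\lambda_2\ldots\lambda_u|L_1 = \lambda_1\langle\lambda_1\lambda_2\ldots\lambda_u|. \tag{1}$$

Equation (1) may be looked upon as the definition of the linear operator $L_1$. It shows that each basic bra is an eigenbra of $L_1$, the value of the parameter $\lambda_1$ being the eigenvalue belonging to it.
From the condition that the basic bras are orthogonal we can deduce that $L_1$ is real and is an observable. Let $\lambda_1', \lambda_2',\ldots,\lambda_u'$ and $\lambda_1'', \lambda_2'',\ldots,\lambda_u''$ be two sets of values for the parameters $\lambda_1, \lambda_2,\ldots,\lambda_u$. We have, putting $\lambda'$'s for the $\lambda$'s in (1) and multiplying on the right by $|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle$, the conjugate imaginary of the basic bra $\langle\lambda_1''\lambda_2''\ldots\lambda_u''|$,

$$\langle\lambda_1'\lambda_2'\ldots\lambda_u'|L_1|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle = \lambda_1'\langle\lambda_1'\lambda_2'\ldots\lambda_u'|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle.$$

Interchanging $\lambda'$'s and $\lambda''$'s,

$$\langle\lambda_1''\lambda_2''\ldots\lambda_u''|L_1|\lambda_1'\lambda_2'\ldots\lambda_u'\rangle = \lambda_1''\langle\lambda_1''\lambda_2''\ldots\lambda_u''|\lambda_1'\lambda_2'\ldots\lambda_u'\rangle.$$

On account of the basic bras being orthogonal, the right-hand sides here vanish unless $\lambda_r'' = \lambda_r'$ for all $r$ from 1 to $u$, in which case the right-hand sides are equal, and they are also real, $\lambda_1'$ being real. Thus, whether the $\lambda''$'s are equal to the $\lambda'$'s or not,

$$\langle\lambda_1'\lambda_2'\ldots\lambda_u'|L_1|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle = \overline{\langle\lambda_1''\lambda_2''\ldots\lambda_u''|L_1|\lambda_1'\lambda_2'\ldots\lambda_u'\rangle} = \langle\lambda_1'\lambda_2'\ldots\lambda_u'|\bar L_1|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle$$

from equation (4) of § 8. Since the $\langle\lambda_1'\lambda_2'\ldots\lambda_u'|$'s form a complete set of bras and the $|\lambda_1''\lambda_2''\ldots\lambda_u''\rangle$'s form a complete set of kets, we can infer that $L_1 = \bar L_1$. The further condition required for $L_1$ to be an observable, namely that its eigenstates shall form a complete set, is obviously satisfied since it has as eigenbras the basic bras, which form a complete set.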
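The construction of the operator defined by (1) can be sketched for a two-state system. Everything specific here is an assumption made for illustration: the two orthonormal basic kets in `BASIC`, the parameter values 1.0 and 2.5 attached to them, and the helper names. The ket $|b\rangle$ is rebuilt from its representative by summing the basic kets with the representative as coefficients, which is legitimate for a normalized orthogonal basis in a finite number of dimensions.

```python
from math import sqrt

s = 1 / sqrt(2)
# Hypothetical orthonormal basic kets, keyed by the real parameter lambda.
BASIC = {1.0: [s, s], 2.5: [s, -s]}

def bra_ket(bra_of, ket):
    """<bra_of|ket>; bra_of is the ket whose conjugate imaginary is the bra."""
    return sum(b.conjugate() * k for b, k in zip(bra_of, ket))

def representative(ket):
    """The numbers <lambda|a>, one for each basic bra."""
    return {lam: bra_ket(basic, ket) for lam, basic in BASIC.items()}

def apply_L(ket):
    """The ket |b> whose representative is lambda times that of |a>."""
    rep = representative(ket)
    n = len(ket)
    return [sum(lam * rep[lam] * BASIC[lam][i] for lam in BASIC)
            for i in range(n)]

def matrix_of_L():
    """Matrix of L in the original coordinates, built column by column."""
    cols = [apply_L([1.0, 0.0]), apply_L([0.0, 1.0])]
    return [[cols[j][i] for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    a = [1.0, 2.0j]
    print(representative(a))
    print(representative(apply_L(a)))
    print(matrix_of_L())
```

The matrix returned by `matrix_of_L` equals its own conjugate transpose, in line with the conclusion that the operator is real, and each basic ket is reproduced by `apply_L` multiplied by its own parameter value.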
We can similarly introduce linear operators $L_2, L_3,\ldots, L_u$ by multiplying $\langle\lambda_1\lambda_2\ldots\lambda_u|a\rangle$ by the factors $\lambda_2, \lambda_3,\ldots,\lambda_u$ in turn and considering the resulting sets of numbers as representatives of kets. Each of these $L$'s can be shown in the same way to have the basic bras as eigenbras and to be real and an observable. The basic bras are simultaneous eigenbras of all the $L$'s. Since these simultaneous eigenbras form a complete set, it follows from a theorem of § 13 that any two of the $L$'s commute.
It will now be shown that, if $\xi_1, \xi_2,\ldots,\xi_u$ are any set of commuting observables, we can set up an orthogonal representation in which the basic bras are simultaneous eigenbras of $\xi_1, \xi_2,\ldots,\xi_u$. Let us suppose first that


there is only one independent simultaneous eigenbra of $\xi_1, \xi_2,\ldots,\xi_u$ belonging to any set of eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$. Then we may take these simultaneous eigenbras, with arbitrary numerical coefficients, as our basic bras. They are all orthogonal on account of the orthogonality theorem (any two of them will have at least one eigenvalue different, which is sufficient to make them orthogonal) and there are sufficient of them to form a complete set, from a result of § 13. They may conveniently be labelled by the eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$ to which they belong, so that one of them is written $\langle\xi_1'\xi_2'\ldots\xi_u'|$.
Passing now to the general case when there are several independent simultaneous eigenbras of $\xi_1, \xi_2,\ldots,\xi_u$ belonging to some sets of eigenvalues, we must pick out from all the simultaneous eigenbras belonging to a set of eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$ a complete subset, the members of which are all orthogonal to one another. (The condition of completeness here means that any simultaneous eigenbra belonging to the eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$ can be expressed linearly in terms of the members of the subset.) We must do this for each set of eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$ and then put all the members of all the subsets together and take them as the basic bras of the representation. These bras are all orthogonal, two of them being orthogonal from the orthogonality theorem if they belong to different sets of eigenvalues and from the special way in which they were chosen if they belong to the same set of eigenvalues, and they form altogether a complete set of bras, as any bra can be expressed linearly in terms of simultaneous eigenbras and each simultaneous eigenbra can then be expressed linearly in terms of the members of a subset. There are infinitely many ways of choosing the subsets, and each way provides one orthogonal representation.
For labelling the basic bras in this general case, we may use the eigenvalues $\xi_1', \xi_2',\ldots,\xi_u'$ to which they belong, together with certain additional real variables $\lambda_1, \lambda_2,\ldots,\lambda_v$, say, which must be introduced to distinguish basic vectors belonging to the same set of eigenvalues from one another. A basic bra is then written $\langle\xi_1'\xi_2'\ldots\xi_u'\lambda_1\lambda_2\ldots\lambda_v|$. Corresponding to the variables $\lambda_1, \lambda_2,\ldots,\lambda_v$ we can define linear operators $L_1, L_2,\ldots,L_v$ by equations like (1) and can show that these linear operators have the basic bras as eigenbras, and that they are real and observables, and that they commute with one another and with the $\xi$'s. The basic bras are now simultaneous eigenbras of all the commuting observables $\xi_1, \xi_2,\ldots,\xi_u, L_1, L_2,\ldots,L_v$.


Let us define a complete set of commuting observables to be a set of observables which all commute with one another and for which there is only one simultaneous eigenstate belonging to any set of eigenvalues. Then the observables $\xi_1, \xi_2,\ldots,\xi_u, L_1, L_2,\ldots,L_v$ form a complete set of commuting observables, there being only one independent simultaneous eigenbra belonging to the eigenvalues $\xi_1', \xi_2',\ldots,\xi_u', \lambda_1, \lambda_2,\ldots,\lambda_v$, namely the corresponding basic bra. Similarly the observables $L_1, L_2,\ldots,L_u$ defined by equation (1) and the following work form a complete set of commuting observables. With the help of this definition the main results of the present section can be concisely formulated thus:
(i) The basic bras of an orthogonal representation are simultaneous eigenbras of a complete set of commuting observables.
(ii) Given a complete set of commuting observables, we can set up an orthogonal representation in which the basic bras are simultaneous eigenbras of this complete set.
(iii) Any set of commuting observables can be made into a complete commuting set by adding certain observables to it.
(iv) A convenient way of labelling the basic bras of an orthogonal representation is by means of the eigenvalues of the complete set of commuting observables of which the basic bras are simultaneous eigenbras.
The conjugate imaginaries of the basic bras of a representation we call the basic kets of the representation. Thus, if the basic bras are denoted by $\langle\lambda_1\lambda_2\ldots\lambda_u|$, the basic kets will be denoted by $|\lambda_1\lambda_2\ldots\lambda_u\rangle$. The representative of a bra $\langle b|$ is given by its scalar product with each of the basic kets, i.e. by $\langle b|\lambda_1\lambda_2\ldots\lambda_u\rangle$. It may, like the representative of a ket, be looked upon either as a set of numbers or as a function of the variables $\lambda_1, \lambda_2,\ldots,\lambda_u$. We have

$$\langle b|\lambda_1\lambda_2\ldots\lambda_u\rangle = \overline{\langle\lambda_1\lambda_2\ldots\lambda_u|b\rangle},$$

showing that the representative of a bra is the conjugate complex of the representative of the conjugate imaginary ket. In an orthogonal representation, where the basic bras are simultaneous eigenbras of a complete set of commuting observables, $\xi_1, \xi_2,\ldots,\xi_u$ say, the basic kets will be simultaneous eigenkets of $\xi_1, \xi_2,\ldots,\xi_u$.
We have not yet considered the lengths of the basic vectors. With an orthogonal representation, the natural thing to do is to normalize

the basic vectors, rather than leave their lengths arbitrary, and so introduce a further stage of simplification into the representation. However, it is possible to normalize them only if the parameters which label them all take on discrete values. If any of these parameters are continuous variables that can take on all values in a range, the basic vectors are eigenvectors of some observable belonging to eigenvalues in a range and are of infinite length, from the discussion in § 10 (see p. 39 and top of p. 40). Some other procedure is then needed to fix the numerical factors by which the basic vectors may be multiplied. To get a convenient method of handling this question a new mathematical notation is required, which will be given in the next section.
15. The δ function
Our work in § 10 led us to consider quantities involving a certain kind of infinity. To get a precise notation for dealing with these infinities, we introduce a quantity $\delta(x)$ depending on a parameter $x$ satisfying the conditions

$$\int_{-\infty}^{\infty}\delta(x)\,dx = 1, \qquad \delta(x) = 0 \ \text{for}\ x \neq 0. \tag{2}$$

To get a picture of $\delta(x)$, take a function of the real variable $x$ which vanishes everywhere except inside a small domain, of length $\epsilon$ say, surrounding the origin $x = 0$, and which is so large inside this domain that its integral over this domain is unity. The exact shape of the function inside this domain does not matter, provided there are no unnecessarily wild variations (for example provided the function is always of order $\epsilon^{-1}$). Then in the limit $\epsilon \to 0$ this function will go over into $\delta(x)$.
$\delta(x)$ is not a function of $x$ according to the usual mathematical definition of a function, which requires a function to have a definite value for each point in its domain, but is something more general, which we may call an 'improper function' to show up its difference from a function defined by the usual definition. Thus $\delta(x)$ is not a quantity which can be generally used in mathematical analysis like an ordinary function, but its use must be confined to certain simple types of expression for which it is obvious that no inconsistency can arise.


The most important property of $\delta(x)$ is exemplified by the following equation,

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0), \tag{3}$$

where $f(x)$ is any continuous function of $x$. We can easily see the validity of this equation from the above picture of $\delta(x)$. The left-hand side of (3) can depend only on the values of $f(x)$ very close to the origin, so that we may replace $f(x)$ by its value at the origin, $f(0)$, without essential error. Equation (3) then follows from the first of equations (2). By making a change of origin in (3), we can deduce the formula

$$\int_{-\infty}^{\infty} f(x)\,\delta(x-a)\,dx = f(a), \tag{4}$$

where $a$ is any real number. Thus the process of multiplying a function of $x$ by $\delta(x-a)$ and integrating over all $x$ is equivalent to the process of substituting $a$ for $x$. This general result holds also if the function of $x$ is not a numerical one, but is a vector or linear operator depending on $x$.
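The picture of $\delta(x)$ as the limit of a narrow, tall function can be tried out numerically. The sketch below is an illustration under stated assumptions: the rectangular shape `delta_box`, the midpoint-rule integrator, and the test function `cubic` are choices made here, not prescribed by the text. It approximates the left-hand side of (4) and shows the result approaching $f(a)$ as the width shrinks.

```python
def delta_box(x, eps):
    """Rectangular stand-in for delta(x): height 1/eps on |x| < eps/2."""
    return 1.0 / eps if abs(x) < eps / 2 else 0.0

def integrate(g, lo, hi, n=200_000):
    """Midpoint rule for the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def sift(f, a, eps):
    """Approximate the integral of f(x) delta(x - a) with a box of width eps."""
    return integrate(lambda x: f(x) * delta_box(x - a, eps), a - 1.0, a + 1.0)

def cubic(x):
    """An arbitrary continuous test function (assumed for illustration)."""
    return x ** 3 + 2.0

if __name__ == "__main__":
    a = 1.5
    for eps in (0.1, 0.01):
        print(eps, sift(cubic, a, eps), cubic(a))
```

The range of integration is taken as a finite domain surrounding the point $a$, which suffices because the integrand vanishes elsewhere.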

The range of integration in (3) and (4) need not be from $-\infty$ to $\infty$, but may be over any domain surrounding the critical point at which the $\delta$ function does not vanish. In future the limits of integration will usually be omitted in such equations, it being understood that the domain of integration is a suitable one.
Equations (3) and (4) show that, although an improper function does not itself have a well-defined value, when it occurs as a factor in an integrand the integral has a well-defined value. In quantum theory, whenever an improper function appears, it will be something which is to be used ultimately in an integrand. Therefore it should be possible to rewrite the theory in a form in which the improper functions appear all through only in integrands. One could then eliminate the improper functions altogether. The use of improper functions thus does not involve any lack of rigour in the theory, but is merely a convenient notation, enabling us to express in a concise form certain relations which we could, if necessary, rewrite in a form not involving improper functions, but only in a cumbersome way which would tend to obscure the argument.
An alternative way of defining the $\delta$ function is as the differential coefficient $\epsilon'(x)$ of the function $\epsilon(x)$ given by

$$\epsilon(x) = 0 \quad (x < 0), \qquad \epsilon(x) = 1 \quad (x > 0). \tag{5}$$


We may verify that this is equivalent to the previous definition by substituting $\epsilon'(x)$ for $\delta(x)$ in the left-hand side of (3) and integrating by parts. We find, for $g_1$ and $g_2$ two positive numbers,

$$\int_{-g_2}^{g_1} f(x)\,\epsilon'(x)\,dx = \bigl[f(x)\,\epsilon(x)\bigr]_{-g_2}^{g_1} - \int_{-g_2}^{g_1} f'(x)\,\epsilon(x)\,dx = f(g_1) - \int_{0}^{g_1} f'(x)\,dx = f(0),$$

in agreement with (3). The $\delta$ function appears whenever one differentiates a discontinuous function.
There are a number of elementary equations which one can write down about $\delta$ functions. These equations are essentially rules of manipulation for algebraic work involving $\delta$ functions. The meaning of any of these equations is that its two sides give equivalent results as factors in an integrand.
Examples of such equations are

$$\delta(-x) = \delta(x) \tag{6}$$

$$x\,\delta(x) = 0 \tag{7}$$

$$\delta(ax) = a^{-1}\,\delta(x) \quad (a > 0) \tag{8}$$

$$\delta(x^2-a^2) = \tfrac{1}{2}a^{-1}\{\delta(x-a)+\delta(x+a)\} \quad (a > 0) \tag{9}$$

$$\int \delta(a-x)\,dx\,\delta(x-b) = \delta(a-b) \tag{10}$$

$$f(x)\,\delta(x-a) = f(a)\,\delta(x-a). \tag{11}$$

Equation (6), which merely states that $\delta(x)$ is an even function of its variable $x$, is trivial. To verify (7) take any continuous function of $x$, $f(x)$. Then

$$\int f(x)\,x\,\delta(x)\,dx = 0,$$

from (3). Thus $x\,\delta(x)$ as a factor in an integrand is equivalent to zero, which is just the meaning of (7). (8) and (9) may be verified by similar elementary arguments. To verify (10) take any continuous function of $a$, $f(a)$. Then

$$\int f(a)\,da \int \delta(a-x)\,dx\,\delta(x-b) = \int \delta(x-b)\,dx \int f(a)\,da\,\delta(a-x) = \int \delta(x-b)\,dx\,f(x) = \int f(a)\,da\,\delta(a-b).$$

Thus the two sides of (10) are equivalent as factors in an integrand with $a$ as variable of integration. It may be shown in the same way


that they are equivalent also as factors in an integrand with $b$ as variable of integration, so that equation (10) is justified from either of these points of view. Equation (11) is also easily justified, with the help of (4), from two points of view.
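Rules such as (8) and (9) can be checked numerically by using them as factors in an integrand, which is the sense in which they are meant. The following sketch assumes a narrow Gaussian `delta_gauss` as a stand-in for the $\delta$ function, a midpoint-rule integrator, and an arbitrary test function `f`; all three are illustrative choices made here and are not part of the text.

```python
from math import exp, pi, sqrt

def delta_gauss(y, sigma):
    """Narrow Gaussian of unit area, standing in for delta(y)."""
    return exp(-0.5 * (y / sigma) ** 2) / (sigma * sqrt(2 * pi))

def integrate(g, lo, hi, n=80_000):
    """Midpoint rule for the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    """An arbitrary continuous test function (assumed for illustration)."""
    return x * x + x + 1.0

A = 2.0        # the positive number a in rules (8) and (9)
SIGMA = 0.01   # width of the Gaussian stand-in

# Rule (8): delta(a x) as a factor in an integrand acts like delta(x) / a.
lhs8 = integrate(lambda x: f(x) * delta_gauss(A * x, SIGMA), -1.0, 1.0)
rhs8 = f(0.0) / A

# Rule (9): delta(x^2 - a^2) acts like {delta(x - a) + delta(x + a)} / (2a).
lhs9 = integrate(lambda x: f(x) * delta_gauss(x * x - A * A, SIGMA), -4.0, 4.0)
rhs9 = (f(A) + f(-A)) / (2 * A)

if __name__ == "__main__":
    print(lhs8, rhs8)
    print(lhs9, rhs9)
```

The two sides agree up to a small error that shrinks with `SIGMA`, since the Gaussian only approximates the improper function.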

Equation (10) would be given by an application of (4) with $f(x) = \delta(x-b)$. We have here an illustration of the fact that we may often use an improper function as though it were an ordinary continuous function, without getting a wrong result.
Equation (7) shows that, whenever one divides both sides of an equation by a variable $x$ which can take on the value zero, one should add on to one side an arbitrary multiple of $\delta(x)$, i.e. from an equation

$$A = B \tag{12}$$

one cannot infer

$$A/x = B/x,$$

but only

$$A/x = B/x + c\,\delta(x), \tag{13}$$

where $c$ is unknown.
As an illustration of work with the $\delta$ function, we may consider the differentiation of $\log x$. The usual formula

$$\frac{d}{dx}\log x = \frac{1}{x} \tag{14}$$

requires examination for the neighbourhood of $x = 0$. In order to make the reciprocal function $1/x$ well defined in the neighbourhood of $x = 0$ (in the sense of an improper function) we must impose on it an extra condition, such as that its integral from $-\epsilon$ to $\epsilon$ vanishes. With this extra condition, the integral of the right-hand side of (14) from $-\epsilon$ to $\epsilon$ vanishes, while that of the left-hand side of (14) equals $\log(-1)$, so that (14) is not a correct equation. To correct it, we must remember that, taking principal values, $\log x$ has a pure imaginary term $i\pi$ for negative values of $x$. As $x$ passes through the value zero this pure imaginary term vanishes discontinuously. The differentiation of this pure imaginary term gives us the result $-i\pi\,\delta(x)$, so that (14) should read

$$\frac{d}{dx}\log x = \frac{1}{x} - i\pi\,\delta(x). \tag{15}$$

The particular combination of reciprocal function and $\delta$ function appearing in (15) plays an important part in the quantum theory of collision processes (see § 50).
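As a check on (15), one may integrate both sides from $-\epsilon$ to $\epsilon$. The working below is a sketch that uses only the conventions already stated: principal values of the logarithm, so that $\log(-\epsilon) = \log\epsilon + i\pi$, and the extra condition that the integral of $1/x$ over the symmetric range vanishes.

```latex
\begin{align*}
\int_{-\epsilon}^{\epsilon} \frac{d}{dx}\log x \, dx
  &= \log\epsilon - \log(-\epsilon)
   = \log\epsilon - (\log\epsilon + i\pi)
   = -i\pi, \\
\int_{-\epsilon}^{\epsilon} \Bigl\{ \frac{1}{x} - i\pi\,\delta(x) \Bigr\}\, dx
  &= 0 - i\pi \int_{-\epsilon}^{\epsilon} \delta(x)\, dx
   = -i\pi .
\end{align*}
```

The two sides now agree, the common value $-i\pi$ being one of the values of $\log(-1)$.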


16. Properties of the basic vectors
Using the notation of the $\delta$ function, we can proceed with the theory of representations. Let us suppose first that we have a single observable $\xi$ forming by itself a complete commuting set, the condition for this being that there is only one eigenstate of $\xi$ belonging to any eigenvalue $\xi'$, and let us set up an orthogonal representation in which the basic vectors are eigenvectors of $\xi$ and are written $\langle\xi'|$, $|\xi'\rangle$.
In the case when the eigenvalues of $\xi$ are discrete, we can normalize the basic vectors, and we then have

$$\langle\xi'|\xi''\rangle = 0 \quad (\xi' \neq \xi''), \qquad \langle\xi'|\xi'\rangle = 1.$$

These equations can be combined into the single equation

$$\langle\xi'|\xi''\rangle = \delta_{\xi'\xi''}, \tag{16}$$

where the symbol $\delta$ with two suffixes, which we shall often use in the future, has the meaning

$$\delta_{rs} = 0 \ \text{when}\ r \neq s, \qquad \delta_{rs} = 1 \ \text{when}\ r = s. \tag{17}$$

In the case when the eigenvalues of $\xi$ are continuous we cannot normalize the basic vectors. If we now consider the quantity $\langle\xi'|\xi''\rangle$ with $\xi'$ fixed and $\xi''$ varying, we see from the work connected with expression (29) of § 10 that this quantity vanishes for $\xi'' \neq \xi'$ and that its integral over a range of $\xi''$ extending through the value $\xi'$ is finite, equal to $c$ say. Thus

$$\langle\xi'|\xi''\rangle = c\,\delta(\xi'-\xi'').$$

From (30) of § 10, $c$ is a positive number. It may vary with $\xi'$, so we should write it $c(\xi')$ or $c'$ for brevity, and thus we have

$$\langle\xi'|\xi''\rangle = c'\,\delta(\xi'-\xi''). \tag{18}$$

Alternatively, we have

$$\langle\xi'|\xi''\rangle = c''\,\delta(\xi'-\xi''), \tag{19}$$

where $c''$ is short for $c(\xi'')$, the right-hand sides of (18) and (19) being equal on account of (11).
Let us pass to another representation whose basic vectors are eigenvectors of $\xi$, the new basic vectors being numerical multiples of the previous ones. Calling the new basic vectors $\langle\xi'^*|$, $|\xi'^*\rangle$, with the additional label $*$ to distinguish them from the previous ones, we have

$$\langle\xi'^*| = \bar k'\langle\xi'|, \qquad |\xi'^*\rangle = k'|\xi'\rangle,$$


where $k'$ is short for $k(\xi')$ and is a number depending on $\xi'$. We get

$$\langle\xi'^*|\xi''^*\rangle = \bar k' k''\langle\xi'|\xi''\rangle = \bar k' k'' c'\,\delta(\xi'-\xi'')$$

with the help of (18). This may be written

$$\langle\xi'^*|\xi''^*\rangle = \bar k' k' c'\,\delta(\xi'-\xi'')$$

from (11). By choosing $k'$ so that its modulus is $c'^{-1/2}$, which is possible since $c'$ is positive, we arrange to have

$$\langle\xi'^*|\xi''^*\rangle = \delta(\xi'-\xi''). \tag{20}$$

The lengths of the new basic vectors are now fixed so as to make the representation as simple as possible. The way these lengths were fixed is in some respects analogous to the normalizing of the basic vectors in the case of discrete $\xi'$, equation (20) being of the form of (16) with the $\delta$ function $\delta(\xi'-\xi'')$ replacing the $\delta$ symbol $\delta_{\xi'\xi''}$ of equation (16). We shall continue to work with the new representation and shall drop the $*$ labels in it to save writing. Thus (20) will now be written

$$\langle\xi'|\xi''\rangle = \delta(\xi'-\xi''). \tag{21}$$

We can develop the theory on closely parallel lines for the discrete and continuous cases. For the discrete case we have, using (16),

$$\sum_{\xi'} |\xi'\rangle\langle\xi'|\xi''\rangle = \sum_{\xi'} |\xi'\rangle\,\delta_{\xi'\xi''} = |\xi''\rangle,$$

the sum being taken over all eigenvalues. This equation holds for any basic ket $|\xi''\rangle$ and hence, since the basic kets form a complete set,

$$\sum_{\xi'} |\xi'\rangle\langle\xi'| = 1. \tag{22}$$

This is a useful equation expressing an important property of the basic vectors, namely, if $|\xi'\rangle$ is multiplied on the right by $\langle\xi'|$ the resulting linear operator, summed for all $\xi'$, equals the unit operator. Equations (16) and (22) give the fundamental properties of the basic vectors for the discrete case.
Similarly, for the continuous case we have, using (21),

$$\int |\xi'\rangle\,d\xi'\,\langle\xi'|\xi''\rangle = \int |\xi'\rangle\,d\xi'\,\delta(\xi'-\xi'') = |\xi''\rangle \tag{23}$$

from (4) applied with a ket vector for $f(x)$, the range of integration being the range of eigenvalues. This holds for any basic ket $|\xi''\rangle$ and hence

$$\int |\xi'\rangle\,d\xi'\,\langle\xi'| = 1. \tag{24}$$


This is of the same form as (22) with an integral replacing the sum. Equations (21) and (24) give the fundamental properties of the basic vectors for the continuous case.
Equations (22) and (24) enable one to expand any bra or ket in terms of the basic vectors. For example, we get for the ket $|P\rangle$ in the discrete case, by multiplying (22) on the right by $|P\rangle$,

$$|P\rangle = \sum_{\xi'} |\xi'\rangle\langle\xi'|P\rangle, \tag{25}$$

which gives $|P\rangle$ expanded in terms of the $|\xi'\rangle$'s and shows that the coefficients in the expansion are $\langle\xi'|P\rangle$, which are just the numbers forming the representative of $|P\rangle$. Similarly, in the continuous case,

$$|P\rangle = \int |\xi'\rangle\,d\xi'\,\langle\xi'|P\rangle, \tag{26}$$

giving $|P\rangle$ as an integral over the $|\xi'\rangle$'s, with the coefficient in the integrand again just the representative $\langle\xi'|P\rangle$ of $|P\rangle$. The conjugate imaginary equations to (25) and (26) would give the bra vector $\langle P|$ expanded in terms of the basic bras.
Our present mathematical methods enable us in the continuous case to expand any ket as an integral of eigenkets of $\xi$. If we do not use the $\delta$ function notation, the expansion of a general ket will consist of an integral plus a sum, as in equation (25) of § 10, but the $\delta$ function enables us to replace the sum by an integral in which the integrand consists of terms each containing a $\delta$ function as a factor. For example, the eigenket $|\xi''\rangle$ may be replaced by an integral of eigenkets, as is shown by the second of equations (23).
If $\langle Q|$ is any bra and $|P\rangle$ any ket we get, by further applications of (22) and (24),

$$\langle Q|P\rangle = \sum_{\xi'} \langle Q|\xi'\rangle\langle\xi'|P\rangle \tag{27}$$

for discrete $\xi'$ and

$$\langle Q|P\rangle = \int \langle Q|\xi'\rangle\,d\xi'\,\langle\xi'|P\rangle \tag{28}$$

for continuous $\xi'$. These equations express the scalar product of $\langle Q|$ and $|P\rangle$ in terms of their representatives $\langle Q|\xi'\rangle$ and $\langle\xi'|P\rangle$. Equation (27) is just the usual formula for the scalar product of two vectors in terms of the coordinates of the vectors, and (28) is the natural modification of this formula for the case of continuous $\xi'$, with an integral instead of a sum.
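For the discrete case the expansion (25) and the scalar product formula (27) can be verified directly in a small example. The sketch below assumes a hypothetical two-state system with the orthonormal basic kets listed in `BASIC_KETS`; the kets `P` and `Q` and the helper names are likewise illustrative assumptions.

```python
from math import sqrt

s = 1 / sqrt(2)
# Hypothetical orthonormal basic kets |xi'> of a two-state system.
BASIC_KETS = [[s, 1j * s], [s, -1j * s]]

def bra_ket(bra_of, ket):
    """<bra_of|ket>; bra_of is the ket whose conjugate imaginary is the bra."""
    return sum(b.conjugate() * k for b, k in zip(bra_of, ket))

def representative(ket):
    """The numbers <xi'|P>, one for each basic ket."""
    return [bra_ket(basic, ket) for basic in BASIC_KETS]

def expand(rep):
    """Right-hand side of (25): the sum over xi' of |xi'><xi'|P>."""
    n = len(BASIC_KETS[0])
    return [sum(c * basic[i] for c, basic in zip(rep, BASIC_KETS))
            for i in range(n)]

def scalar_product_from_reps(rep_q, rep_p):
    """Right-hand side of (27), using <Q|xi'> = conjugate of <xi'|Q>."""
    return sum(q.conjugate() * p for q, p in zip(rep_q, rep_p))

# Two arbitrary kets, chosen here only for the check.
P = [1.0 + 0j, 2.0 - 1j]
Q = [1j, 3.0]

if __name__ == "__main__":
    print(expand(representative(P)), P)
    print(scalar_product_from_reps(representative(Q), representative(P)),
          bra_ket(Q, P))
```

Rebuilding `P` from its representative reproduces the original ket, which is equation (22) at work, and the scalar product computed from the two representatives agrees with the one computed from the original coordinates.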
The generalization of the foregoing work to the case when $\xi$ has both discrete and continuous eigenvalues is quite straightforward.


Using $\xi^r$ and $\xi^s$ to denote discrete eigenvalues and $\xi'$ and $\xi''$ to denote continuous eigenvalues, we have the set of equations

$$\langle\xi^r|\xi^s\rangle = \delta_{\xi^r\xi^s}, \qquad \langle\xi^r|\xi'\rangle = 0, \qquad \langle\xi'|\xi''\rangle = \delta(\xi'-\xi'') \tag{29}$$

as the generalization of (16) or (21). These equations express that the basic vectors are all orthogonal, that those belonging to discrete eigenvalues are normalized and those belonging to continuous eigenvalues have their lengths fixed by the same rule as led to (20). From (29) we can derive, as the generalization of (22) or (24),

$$\sum_{\xi^r} |\xi^r\rangle\langle\xi^r| + \int |\xi'\rangle\,d\xi'\,\langle\xi'| = 1, \tag{30}$$

the range of integration being the range of continuous eigenvalues. With the help of (30), we get immediately

$$|P\rangle = \sum_{\xi^r} |\xi^r\rangle\langle\xi^r|P\rangle + \int |\xi'\rangle\,d\xi'\,\langle\xi'|P\rangle \tag{31}$$

as the generalization of (25) or (26), and

$$\langle Q|P\rangle = \sum_{\xi^r} \langle Q|\xi^r\rangle\langle\xi^r|P\rangle + \int \langle Q|\xi'\rangle\,d\xi'\,\langle\xi'|P\rangle \tag{32}$$

as the generalization of (27) or (28).
Let us now pass to the general case when we have several commuting observables $\xi_1, \xi_2,\ldots,\xi_u$ forming a complete commuting set and set up an orthogonal representation in which the basic vectors are simultaneous eigenvectors of all of them, and are written $\langle\xi_1'\ldots\xi_u'|$, $|\xi_1'\ldots\xi_u'\rangle$. Let us suppose $\xi_1, \xi_2,\ldots,\xi_v$ ($v < u$) have discrete eigenvalues and $\xi_{v+1},\ldots,\xi_u$ have continuous eigenvalues.
Consider the quantity $\langle\xi_1'\ldots\xi_v'\,\xi_{v+1}'\ldots\xi_u'|\xi_1'\ldots\xi_v'\,\xi_{v+1}''\ldots\xi_u''\rangle$. From the orthogonality theorem, it must vanish unless each $\xi_s'' = \xi_s'$ for $s = v+1,\ldots,u$. By extending the work connected with expression (29) of § 10 to simultaneous eigenvectors of several commuting observables and extending also the axiom (30), we find that the $(u-v)$-fold integral of this quantity with respect to each $\xi_s''$ over a range extending through the value $\xi_s'$ is a finite positive number. Calling this number $c'$, the $'$ denoting that it is a function of $\xi_1',\ldots,\xi_v', \xi_{v+1}',\ldots,\xi_u'$, we can express our results by the equation

$$\langle\xi_1'\ldots\xi_v'\,\xi_{v+1}'\ldots\xi_u'|\xi_1'\ldots\xi_v'\,\xi_{v+1}''\ldots\xi_u''\rangle = c'\,\delta(\xi_{v+1}'-\xi_{v+1}'')\ldots\delta(\xi_u'-\xi_u''), \tag{33}$$

with one $\delta$ factor on the right-hand side for each value of $s$ from $v+1$ to $u$. We now change the lengths of our basic vectors so as to


make c’ unity, by a procedure similar to that which led to (20). By a further use of the orthogonality theorem, we get finally

with a two-suffix 8 Symbol on the right-hand side for each 4 with discrete eigenvalues and a 8 function for each ,$ with continuous eigenvalues. This is the generalization of (16) or (21) to the case when there are several commuting observables in the complete set.
From (34) we can derive, as the generalization of (22) or (24),

$$\sum_{\xi'_1 \ldots \xi'_v} \int\cdots\int |\xi'_1 \ldots \xi'_u\rangle\, d\xi'_{v+1} \cdots d\xi'_u\, \langle\xi'_1 \ldots \xi'_u| = 1, \tag{35}$$

the integral being a $(u-v)$-fold one over all the $\xi'$'s with continuous eigenvalues and the summation being over all the $\xi'$'s with discrete eigenvalues. Equations (34) and (35) give the fundamental properties of the basic vectors in the present case. From (35) we can immediately write down the generalization of (25) or (26) and of (27) or (28).
The case we have just considered can be further generalized by allowing some of the $\xi$'s to have both discrete and continuous eigenvalues. The modifications required in the equations are quite straightforward, but will not be given here as they are rather cumbersome to write down in general form.
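There are some problems in which it is convenient not to make the $c'$ of equation (33) equal unity, but to make it equal to some definite function of the $\xi'$'s instead. Calling this function of the $\xi'$'s $\rho'^{-1}$ we then have, instead of (34),

$$\langle\xi'_1 \ldots \xi'_u | \xi''_1 \ldots \xi''_u\rangle = \rho'^{-1}\, \delta_{\xi'_1\xi''_1} \cdots \delta_{\xi'_v\xi''_v}\, \delta(\xi'_{v+1}-\xi''_{v+1}) \cdots \delta(\xi'_u-\xi''_u), \tag{36}$$

and instead of (35) we get

$$\sum_{\xi'_1 \ldots \xi'_v} \int\cdots\int |\xi'_1 \ldots \xi'_u\rangle\, \rho'\, d\xi'_{v+1} \cdots d\xi'_u\, \langle\xi'_1 \ldots \xi'_u| = 1. \tag{37}$$

$\rho'$ is called the weight function of the representation, $\rho'\, d\xi'_{v+1} \cdots d\xi'_u$ being the 'weight' attached to a small volume element of the space of the variables $\xi'_{v+1}, \ldots, \xi'_u$.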
The representations we considered previously all had the weight function unity. The introduction of a weight function not unity is entirely a matter of convenience and does not add anything to the mathematical power of the representation. The basic bras $\langle\xi'_1 \ldots \xi'_u{}^{*}|$ of a representation with the weight function $\rho'$ are connected with the basic bras $\langle\xi'_1 \ldots \xi'_u|$ of the corresponding representation with the weight function unity by

$$\langle\xi'_1 \ldots \xi'_u{}^{*}| = \rho'^{-1/2}\, \langle\xi'_1 \ldots \xi'_u|, \tag{38}$$

as is easily verified. An example of a useful representation with non-unit weight function occurs when one has two $\xi$'s which are the polar and azimuthal angles $\theta$ and $\phi$ giving a direction in three-dimensional space and one takes $\rho' = \sin\theta'$. One then has the element of solid angle $\sin\theta'\, d\theta'\, d\phi'$ occurring in (37).
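As a rough numerical illustration of this example, the total weight $\int\!\!\int \sin\theta'\, d\theta'\, d\phi'$ over all directions should equal the full solid angle $4\pi$. The sketch below assumes a simple midpoint rule; the function name `total_weight` and the grid sizes are illustrative choices, not part of the text.

```python
import math

def total_weight(n_theta=400, n_phi=400):
    """Midpoint-rule sum of rho' dtheta' dphi' with rho' = sin(theta')."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        # The weight does not depend on phi', so the n_phi cells in phi'
        # all contribute equally.
        total += math.sin(theta) * d_theta * d_phi * n_phi
    return total

solid_angle = total_weight()
```

With the default grid, `solid_angle` should agree with `4 * math.pi` to several decimal places.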

17. The representation of linear operators

In § 14 we saw how to represent ket and bra vectors by sets of numbers. We now have to do the same for linear operators, in order to have a complete scheme for representing all our abstract quantities by sets of numbers. The same basic vectors that we had in § 14 can be used again for this purpose.

Let us suppose the basic vectors are simultaneous eigenvectors of a complete set of commuting observables $\xi_1, \xi_2, \ldots, \xi_u$. If $\alpha$ is any linear operator, we take a general basic bra $\langle\xi'_1 \ldots \xi'_u|$ and a general basic ket $|\xi''_1 \ldots \xi''_u\rangle$ and form the numbers

$$\langle\xi'_1 \ldots \xi'_u|\alpha|\xi''_1 \ldots \xi''_u\rangle. \tag{39}$$

These numbers are sufficient to determine $\alpha$ completely, since in the first place they determine the ket $\alpha|\xi''_1 \ldots \xi''_u\rangle$ (as they provide the representative of this ket), and the value of this ket for all the basic kets $|\xi''_1 \ldots \xi''_u\rangle$ determines $\alpha$. The numbers (39) are called the representative of the linear operator $\alpha$ or of the dynamical variable $\alpha$. They are more complicated than the representative of a ket or bra vector in that they involve the parameters that label two basic vectors instead of one.

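Let us examine the form of these numbers in simple cases. Take first the case when there is only one $\xi$, forming a complete commuting set by itself, and suppose that it has discrete eigenvalues $\xi'$. The representative of $\alpha$ is then the discrete set of numbers $\langle\xi'|\alpha|\xi''\rangle$. If one had to write out these numbers explicitly, the natural way of arranging them would be as a two-dimensional array, thus:

$$\begin{pmatrix}
\langle\xi^1|\alpha|\xi^1\rangle & \langle\xi^1|\alpha|\xi^2\rangle & \langle\xi^1|\alpha|\xi^3\rangle & \cdots \\
\langle\xi^2|\alpha|\xi^1\rangle & \langle\xi^2|\alpha|\xi^2\rangle & \langle\xi^2|\alpha|\xi^3\rangle & \cdots \\
\langle\xi^3|\alpha|\xi^1\rangle & \langle\xi^3|\alpha|\xi^2\rangle & \langle\xi^3|\alpha|\xi^3\rangle & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix} \tag{40}$$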


where $\xi^1, \xi^2, \xi^3, \ldots$ are all the eigenvalues of $\xi$. Such an array is called a matrix and the numbers are called the elements of the matrix. We make the convention that the elements must always be arranged so that those in the same row refer to the same basic bra vector and those in the same column refer to the same basic ket vector.

An element $\langle\xi'|\alpha|\xi'\rangle$ referring to two basic vectors with the same label is called a diagonal element of the matrix, as all such elements lie on a diagonal. If we put $\alpha$ equal to unity, we have from (16) all the diagonal elements equal to unity and all the other elements equal to zero. The matrix is then called the unit matrix.

If $\alpha$ is real, we have

$$\langle\xi'|\alpha|\xi''\rangle = \overline{\langle\xi''|\alpha|\xi'\rangle}. \tag{41}$$

The effect of these conditions on the matrix (40) is to make the diagonal elements all real and each of the other elements equal the conjugate complex of its mirror reflection in the diagonal. The matrix is then called a Hermitian matrix.
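Condition (41) is easy to test on a small example. The following sketch forms the array of numbers $\langle\xi^r|\alpha|\xi^s\rangle$ for an assumed pair of orthonormal basic kets and checks whether it is Hermitian; the helper names `representative` and `is_hermitian`, the basis and the two sample operators are all illustrative assumptions.

```python
def representative(alpha, basis):
    """Array of numbers <xi^r|alpha|xi^s>: rows follow the bra, columns the ket."""
    n = len(basis)

    def apply(op, ket):
        return [sum(op[i][j] * ket[j] for j in range(n)) for i in range(n)]

    def bracket(q, p):
        return sum(complex(a).conjugate() * b for a, b in zip(q, p))

    return [[bracket(bra, apply(alpha, ket)) for ket in basis] for bra in basis]

def is_hermitian(m, tol=1e-12):
    """True when every element equals the conjugate of its mirror image, as in (41)."""
    n = len(m)
    return all(abs(m[r][s] - m[s][r].conjugate()) < tol
               for r in range(n) for s in range(n))

s = 2 ** -0.5
basis = [[s, s], [s, -s]]                 # illustrative orthonormal basic kets
alpha_real = [[2, 1 - 1j], [1 + 1j, 3]]   # equal to its own adjoint
alpha_not_real = [[0, 1], [0, 0]]         # not equal to its adjoint

rep_real = representative(alpha_real, basis)
rep_not_real = representative(alpha_not_real, basis)
```

One would expect `is_hermitian(rep_real)` to hold and `is_hermitian(rep_not_real)` to fail, whichever orthonormal basis is chosen.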

If we put $\alpha$ equal to $\xi$, we get for a general element of the matrix

$$\langle\xi'|\xi|\xi''\rangle = \xi'\langle\xi'|\xi''\rangle = \xi'\, \delta_{\xi'\xi''}. \tag{42}$$

Thus all the elements not on the diagonal are zero. The matrix is then called a diagonal matrix. Its diagonal elements are just equal to the eigenvalues of $\xi$. More generally, if we put $\alpha$ equal to $f(\xi)$, a function of $\xi$, we get

$$\langle\xi'|f(\xi)|\xi''\rangle = f(\xi')\, \delta_{\xi'\xi''}, \tag{43}$$

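and the matrix is again a diagonal matrix.

Let us determine the representative of a product $\alpha\beta$ of two linear operators $\alpha$ and $\beta$ in terms of the representatives of the factors. From equation (22) with $\xi'''$ substituted for $\xi'$ we obtain

$$\langle\xi'|\alpha\beta|\xi''\rangle = \langle\xi'|\alpha \sum_{\xi'''} |\xi'''\rangle\langle\xi'''|\beta|\xi''\rangle = \sum_{\xi'''} \langle\xi'|\alpha|\xi'''\rangle\langle\xi'''|\beta|\xi''\rangle, \tag{44}$$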

which gives us the required result. Equation (44) shows that the matrix formed by the elements $\langle\xi'|\alpha\beta|\xi''\rangle$ equals the product of the matrices formed by the elements $\langle\xi'|\alpha|\xi''\rangle$ and $\langle\xi'|\beta|\xi''\rangle$ respectively, according to the usual mathematical rule for multiplying matrices. This rule gives for the element in the $r$th row and $s$th column of the product matrix the sum of the product of each element in the $r$th row of the first factor matrix with the corresponding element in the $s$th column of the second factor matrix. The multiplication of matrices is non-commutative, like the multiplication of linear operators.
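The row-by-column rule can be written out directly. In the sketch below the function `matmul` and the two small arrays standing for the representatives of $\alpha$ and $\beta$ are illustrative assumptions; the point is only that the two orders of multiplication give different results.

```python
def matmul(a, b):
    """Element (r, s) is the sum over t of a[r][t] * b[t][s], as in (44)."""
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]

alpha = [[0, 1], [1, 0]]     # hypothetical representative of alpha
beta = [[1, 0], [0, -1]]     # hypothetical representative of beta

alpha_beta = matmul(alpha, beta)
beta_alpha = matmul(beta, alpha)
```

For these two arrays `alpha_beta` and `beta_alpha` differ in sign, which illustrates the non-commutative character of the product.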
We can summarize our results for the case when there is only one $\xi$ and it has discrete eigenvalues as follows:

(i) Any linear operator is represented by a matrix.
(ii) The unit operator is represented by the unit matrix.
(iii) A real linear operator is represented by a Hermitian matrix.
(iv) $\xi$ and functions of $\xi$ are represented by diagonal matrices.
(v) The matrix representing the product of two linear operators is the product of the matrices representing the two factors.

Let us now consider the case when there is only one $\xi$ and it has continuous eigenvalues. The representative of $\alpha$ is now $\langle\xi'|\alpha|\xi''\rangle$, a function of two variables $\xi'$ and $\xi''$ which can vary continuously. It is convenient to call such a function a 'matrix', using this word in a generalized sense, in order that we may be able to use the same terminology for the discrete and continuous cases. One of these generalized matrices cannot, of course, be written out as a two-dimensional array like an ordinary matrix, since the number of its rows and columns is an infinity equal to the number of points on a line, and the number of its elements is an infinity equal to the number of points in an area.

We arrange our definitions concerning these generalized matrices so that the rules (i)-(v) which we had above for the discrete case hold also for the continuous case. The unit operator is represented by $\delta(\xi'-\xi'')$ and the generalized matrix formed by these elements we define to be the unit matrix. We still have equation (41) as the condition for $\alpha$ to be real and we define the generalized matrix formed by the elements $\langle\xi'|\alpha|\xi''\rangle$ to be Hermitian when it satisfies this condition. $\xi$ is represented by

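$$\langle\xi'|\xi|\xi''\rangle = \xi'\, \delta(\xi'-\xi'') \tag{45}$$

and $f(\xi)$ by

$$\langle\xi'|f(\xi)|\xi''\rangle = f(\xi')\, \delta(\xi'-\xi''), \tag{46}$$

and the generalized matrices formed by these elements we define to be diagonal matrices. From (11), we could equally well have $\xi''$ and $f(\xi'')$ as the coefficients of $\delta(\xi'-\xi'')$ on the right-hand sides of (45) and (46) respectively. Corresponding to equation (44) we now have, from (24),

$$\langle\xi'|\alpha\beta|\xi''\rangle = \int \langle\xi'|\alpha|\xi'''\rangle\, d\xi'''\, \langle\xi'''|\beta|\xi''\rangle, \tag{47}$$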

with an integral instead of a sum, and we define the generalized matrix formed by the elements on the right-hand side here to be the product of the matrices formed by $\langle\xi'|\alpha|\xi''\rangle$ and $\langle\xi'|\beta|\xi''\rangle$. With these definitions we secure complete parallelism between the discrete and continuous cases and we have the rules (i)-(v) holding for both.
The question arises how a general diagonal matrix is to be defined in the continuous case, as so far we have only defined the right-hand sides of (45) and (46) to be examples of diagonal matrices. One might be inclined to define as diagonal any matrix whose $(\xi', \xi'')$ elements all vanish except when $\xi'$ differs infinitely little from $\xi''$, but this would not be satisfactory, because an important property of diagonal matrices in the discrete case is that they always commute with one another and we want this property to hold also in the continuous case. In order that the matrix formed by the elements $\langle\xi'|\omega|\xi''\rangle$ in the continuous case may commute with that formed by the elements on the right-hand side of (45) we must have, using the multiplication rule (47),

$$\int \langle\xi'|\omega|\xi'''\rangle\, d\xi'''\, \xi'''\, \delta(\xi'''-\xi'') = \int \xi'\, \delta(\xi'-\xi''')\, d\xi'''\, \langle\xi'''|\omega|\xi''\rangle.$$

With the help of formula (4), this reduces to

$$\langle\xi'|\omega|\xi''\rangle\, \xi'' = \xi'\, \langle\xi'|\omega|\xi''\rangle \tag{48}$$

or

$$(\xi'-\xi'')\, \langle\xi'|\omega|\xi''\rangle = 0.$$

This gives, according to the rule by which (13) follows from (12),

$$\langle\xi'|\omega|\xi''\rangle = c'\, \delta(\xi'-\xi''),$$

where $c'$ is a number that may depend on $\xi'$. Thus $\langle\xi'|\omega|\xi''\rangle$ is of the form of the right-hand side of (46). For this reason we define only matrices whose elements are of the form of the right-hand side of (46) to be diagonal matrices. It is easily verified that these matrices all commute with one another. One can form other matrices whose $(\xi', \xi'')$ elements all vanish when $\xi'$ differs appreciably from $\xi''$ and have a different form of singularity when $\xi'$ equals $\xi''$ [we shall later introduce the derivative $\delta'(x)$ of the $\delta$ function and $\delta'(\xi'-\xi'')$ will then be an example, see § 22 equation (19)], but these other matrices are not diagonal according to the definition.
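A crude finite analogue may make the distinction concrete. The sketch below assumes the continuous variable is replaced by four grid points, so that the right-hand sides of (45) and (46) become ordinary diagonal arrays, and it uses a nearest-neighbour difference array as a rough stand-in for $\delta'(\xi'-\xi'')$. The discretization, the helper names and the sample values are all assumptions made for illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]

def diag(values):
    n = len(values)
    return [[values[r] if r == s else 0 for s in range(n)] for r in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[r][s] - ba[r][s] for s in range(n)] for r in range(n)]

xs = [0, 1, 2, 3]                      # grid standing in for the values xi'
n = len(xs)
X = diag(xs)                           # analogue of (45)
C = diag([x * x + 1 for x in xs])      # analogue of (46) with f(xi) = xi^2 + 1

# Elements vanish unless the row and column are neighbours, yet the array
# is not diagonal in the sense defined above.
D = [[1 if s == r + 1 else -1 if s == r - 1 else 0 for s in range(n)]
     for r in range(n)]

comm_XC = commutator(X, C)
comm_XD = commutator(X, D)
```

`comm_XC` vanishes identically, while `comm_XD` has non-zero elements next to the diagonal, so concentration near the diagonal is not enough to secure commutation with $\xi$.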
Let us now pass on to the case when there is only one $\xi$ and it has both discrete and continuous eigenvalues. Using $\xi^r$, $\xi^s$ to denote discrete eigenvalues and $\xi'$, $\xi''$ to denote continuous eigenvalues, we now have the representative of $\alpha$ consisting of four kinds of quantities, $\langle\xi^r|\alpha|\xi^s\rangle$, $\langle\xi^r|\alpha|\xi'\rangle$, $\langle\xi'|\alpha|\xi^r\rangle$, $\langle\xi'|\alpha|\xi''\rangle$. These quantities can all be put together and considered to form a more general kind of matrix having some discrete rows and columns and also a continuous range of rows and columns. We define unit matrix, Hermitian matrix, diagonal matrix, and the product of two matrices also for this more general kind of matrix so as to make the rules (i)-(v) still hold. The details are a straightforward generalization of what has gone before and need not be given explicitly.

Let us now go back to the general case of several $\xi$'s, $\xi_1, \xi_2, \ldots, \xi_u$.
The representative of $\alpha$, expression (39), may still be looked upon as forming a matrix, with rows corresponding to different values of $\xi'_1, \ldots, \xi'_u$ and columns corresponding to different values of $\xi''_1, \ldots, \xi''_u$. Unless all the $\xi$'s have discrete eigenvalues, this matrix will be of the generalized kind with continuous ranges of rows and columns. We again arrange our definitions so that the rules (i)-(v) hold, with rule (iv) generalized to:

(iv') Each $\xi_m$ ($m = 1, 2, \ldots, u$) and any function of them is represented by a diagonal matrix.

A diagonal matrix is now defined as one whose general element $\langle\xi'_1 \ldots \xi'_u|\omega|\xi''_1 \ldots \xi''_u\rangle$ is of the form

$$\langle\xi'_1 \ldots \xi'_u|\omega|\xi''_1 \ldots \xi''_u\rangle = c'\, \delta_{\xi'_1\xi''_1} \cdots \delta_{\xi'_v\xi''_v}\, \delta(\xi'_{v+1}-\xi''_{v+1}) \cdots \delta(\xi'_u-\xi''_u) \tag{49}$$

in the case when $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have continuous eigenvalues, $c'$ being any function of the $\xi'$'s. This definition is the generalization of what we had with one $\xi$ and makes diagonal matrices always commute with one another. The other definitions are straightforward and need not be given explicitly.
We now have a linear operator always represented by a matrix. The sum of two linear operators is represented by the sum of the matrices representing the operators and this, together with rule (v), means that the matrices are subject to the same algebraic relations as the linear operators. If any algebraic equation holds between certain linear operators, the same equation must hold between the matrices representing those operators.
The scheme of matrices can be extended to bring in the representatives of ket and bra vectors. The matrices representing linear operators are all square matrices with the same number of rows and columns, and with, in fact, a one-one correspondence between their rows and columns. We may look upon the representative of a ket $|P\rangle$ as a matrix with a single column by setting all the numbers $\langle\xi'_1 \ldots \xi'_u|P\rangle$ which form this representative one below the other. The number of rows in this matrix will be the same as the number of rows or columns in the square matrices representing linear operators. Such a single-column matrix can be multiplied on the left by a square matrix $\langle\xi'_1 \ldots \xi'_u|\alpha|\xi''_1 \ldots \xi''_u\rangle$ representing a linear operator, by a rule similar to that for the multiplication of two square matrices. The product is another single-column matrix with elements given by

$$\sum_{\xi''_1 \ldots \xi''_v} \int\cdots\int \langle\xi'_1 \ldots \xi'_u|\alpha|\xi''_1 \ldots \xi''_u\rangle\, d\xi''_{v+1} \cdots d\xi''_u\, \langle\xi''_1 \ldots \xi''_u|P\rangle.$$

From (35) this is just equal to $\langle\xi'_1 \ldots \xi'_u|\alpha|P\rangle$, the representative of $\alpha|P\rangle$. Similarly we may look upon the representative of a bra $\langle Q|$ as a matrix with a single row by setting all the numbers $\langle Q|\xi'_1 \ldots \xi'_u\rangle$ side by side. Such a single-row matrix may be multiplied on the right by a square matrix $\langle\xi'_1 \ldots \xi'_u|\alpha|\xi''_1 \ldots \xi''_u\rangle$, the product being another single-row matrix, which is just the representative of $\langle Q|\alpha$. The single-row matrix representing $\langle Q|$ may be multiplied on the right by the single-column matrix representing $|P\rangle$, the product being a matrix with just a single element, which is equal to $\langle Q|P\rangle$. Finally, the single-row matrix representing $\langle Q|$ may be multiplied on the left by the single-column matrix representing $|P\rangle$, the product being a square matrix, which is just the representative of $|P\rangle\langle Q|$. In this way all our abstract symbols, linear operators, bra vectors, and ket vectors, can be represented by matrices, which are subject to the same algebraic relations as the abstract symbols themselves.
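The four kinds of product just described can be sketched for a two-rowed example with discrete eigenvalues only. The numbers chosen for the representatives of $|P\rangle$, $\langle Q|$ and $\alpha$ are arbitrary assumptions; the variable names are illustrative.

```python
ket_P = [1 + 1j, 2]          # single column: the numbers <xi^r|P>
bra_Q = [3, -1j]             # single row: the numbers <Q|xi^r>
alpha = [[0, 1], [1, 0]]     # square array: the numbers <xi^r|alpha|xi^s>

# Square matrix times single column gives a single column (alpha|P>).
alpha_P = [sum(alpha[r][s] * ket_P[s] for s in range(2)) for r in range(2)]

# Single row times square matrix gives a single row (<Q|alpha).
Q_alpha = [sum(bra_Q[r] * alpha[r][s] for r in range(2)) for s in range(2)]

# Single row times single column gives one number (<Q|P>).
Q_P = sum(bra_Q[r] * ket_P[r] for r in range(2))

# Single column times single row gives a square matrix (|P><Q|).
P_Q = [[ket_P[r] * bra_Q[s] for s in range(2)] for r in range(2)]

# <Q|alpha|P> may be formed by grouping the factors either way.
grouped_right = sum(bra_Q[r] * alpha_P[r] for r in range(2))
grouped_left = sum(Q_alpha[s] * ket_P[s] for s in range(2))
```

The two groupings of $\langle Q|\alpha|P\rangle$ agree, as the associative law for the abstract symbols requires.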
18. Probability amplitudes

Representations are of great importance in the physical interpretation of quantum mechanics as they provide a convenient method for obtaining the probabilities of observables having given values. In § 12 we obtained the probability of an observable having any specified value for a given state and in § 13 we generalized this result and obtained the probability of a set of commuting observables simultaneously having specified values for a given state. Let us now apply this result to a complete set of commuting observables, say the set of $\xi$'s which we have been dealing with already. According to formula (51) of § 13, the probability of each $\xi_r$ having the value $\xi'_r$ for the state corresponding to the normalized ket vector $|x\rangle$ is

$$P_{\xi'_1 \ldots \xi'_u} = \langle x|\delta_{\xi_1\xi'_1}\, \delta_{\xi_2\xi'_2} \cdots \delta_{\xi_u\xi'_u}|x\rangle. \tag{50}$$


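If the $\xi$'s all have discrete eigenvalues, we can use (35) with $v = u$, and no integrals, and get

$$P_{\xi'_1 \ldots \xi'_u} = \sum_{\xi''_1 \ldots \xi''_u} \langle x|\delta_{\xi_1\xi'_1} \cdots \delta_{\xi_u\xi'_u}|\xi''_1 \ldots \xi''_u\rangle\langle\xi''_1 \ldots \xi''_u|x\rangle = \langle x|\xi'_1 \ldots \xi'_u\rangle\langle\xi'_1 \ldots \xi'_u|x\rangle = |\langle\xi'_1 \ldots \xi'_u|x\rangle|^2. \tag{51}$$

We thus get the simple result that the probability of the $\xi$'s having the values $\xi'$ is just the square of the modulus of the appropriate coordinate of the normalized ket vector corresponding to the state concerned.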
If the $\xi$'s do not all have discrete eigenvalues, but if, say, $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have continuous eigenvalues, then to get something physically significant we must obtain the probability of each $\xi_r$ ($r = 1, \ldots, v$) having a specified value $\xi'_r$ and each $\xi_s$ ($s = v+1, \ldots, u$) lying in a specified small range $\xi'_s$ to $\xi'_s + d\xi'_s$. For this purpose we must replace each factor $\delta_{\xi_s\xi'_s}$ in (50) by a factor $\chi_s$, which is that function of the observable $\xi_s$ which is equal to unity for $\xi_s$ within the range $\xi'_s$ to $\xi'_s + d\xi'_s$ and zero otherwise. Proceeding as before with the help of (35), we obtain for this probability

$$P_{\xi'_1 \ldots \xi'_u}\, d\xi'_{v+1} \cdots d\xi'_u = |\langle\xi'_1 \ldots \xi'_u|x\rangle|^2\, d\xi'_{v+1} \cdots d\xi'_u. \tag{52}$$

Thus in every case the probability distribution of values for the $\xi$'s is given by the square of the modulus of the representative of the normalized ket vector corresponding to the state concerned.

The numbers which form the representative of a normalized ket (or bra) may for this reason be called probability amplitudes. The square of the modulus of a probability amplitude is an ordinary probability, or a probability per unit range for those variables that have continuous ranges of values.

We may be interested in a state whose corresponding ket $|x\rangle$ cannot be normalized. This occurs, for example, if the state is an eigenstate of some observable belonging to an eigenvalue lying in a range of eigenvalues. The formula (51) or (52) can then still be used to give the relative probability of the $\xi$'s having specified values or having values lying in specified small ranges, i.e. it will give correctly the ratios of the probabilities for different $\xi'$'s. The numbers $\langle\xi'_1 \ldots \xi'_u|x\rangle$ may then be called relative probability amplitudes.
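For discrete eigenvalues the rule (51) amounts to squaring moduli. The sketch below is an illustration under assumptions: the two sample representatives are invented, and the second stands for a list of relative amplitudes which, being finite, can be divided by its total to display the ratios as ordinary probabilities.

```python
def probabilities(representative):
    """Squared moduli of the numbers <xi'|x>, divided by their total."""
    weights = [abs(c) ** 2 for c in representative]
    total = sum(weights)
    return [w / total for w in weights]

x_normalized = [0.6, 0.8j]      # already of unit length
x_relative = [3, 4j]            # same ratios, different overall length

probs = probabilities(x_normalized)
relative = probabilities(x_relative)
```

Both lists of probabilities should coincide, since only the ratios of the squared moduli matter for the relative probabilities.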


The representation for which the above results hold is characterized by the basic vectors being simultaneous eigenvectors of all the $\xi$'s. It may also be characterized by the requirement that each of the $\xi$'s shall be represented by a diagonal matrix, this condition being easily seen to be equivalent to the previous one. The latter characterization is usually the more convenient one. For brevity, we shall formulate it as each of the $\xi$'s 'being diagonal in the representation'.

Provided the $\xi$'s form a complete set of commuting observables, the representation is completely determined by the characterization, apart from arbitrary phase factors in the basic vectors. Each basic bra $\langle\xi'_1 \ldots \xi'_u|$ may be multiplied by $e^{i\gamma'}$, where $\gamma'$ is any real function of the variables $\xi'_1, \ldots, \xi'_u$, without changing any of the conditions which the representation has to satisfy, i.e. the condition that the $\xi$'s are diagonal or that the basic vectors are simultaneous eigenvectors of the $\xi$'s, and the fundamental properties of the basic vectors (34) and (35). With the basic bras changed in this way, the representative $\langle\xi'_1 \ldots \xi'_u|P\rangle$ of a ket $|P\rangle$ gets multiplied by $e^{i\gamma'}$, the representative $\langle Q|\xi'_1 \ldots \xi'_u\rangle$ of a bra $\langle Q|$ gets multiplied by $e^{-i\gamma'}$ and the representative $\langle\xi'_1 \ldots \xi'_u|\alpha|\xi''_1 \ldots \xi''_u\rangle$ of a linear operator $\alpha$ gets multiplied by $e^{i(\gamma'-\gamma'')}$. The probabilities or relative probabilities (51), (52) are, of course, unaltered.
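The effect of the phase factors can be followed in a two-rowed example. In this sketch the values of $\gamma'$, the representative of $|P\rangle$ and that of $\alpha$ are arbitrary assumptions; the helper `expectation` forms $\langle P|\alpha|P\rangle$ from the representatives.

```python
import cmath

gammas = [0.3, -1.2]                          # an arbitrary real gamma' per basic bra
phases = [cmath.exp(1j * g) for g in gammas]

ket_P = [0.6, 0.8j]                           # the numbers <xi'|P>
alpha = [[1, 2 - 1j], [2 + 1j, -1]]           # the numbers <xi'|alpha|xi''>

# Representatives after the change of phase of the basic bras.
ket_P_new = [phases[r] * ket_P[r] for r in range(2)]
alpha_new = [[phases[r] * alpha[r][s] * phases[s].conjugate() for s in range(2)]
             for r in range(2)]

def expectation(ket, op):
    """<P|alpha|P> built from the representatives of |P> and alpha."""
    return sum(ket[r].conjugate() * op[r][s] * ket[s]
               for r in range(2) for s in range(2))

probs_old = [abs(c) ** 2 for c in ket_P]
probs_new = [abs(c) ** 2 for c in ket_P_new]
mean_old = expectation(ket_P, alpha)
mean_new = expectation(ket_P_new, alpha_new)
```

The squared moduli and the number $\langle P|\alpha|P\rangle$ should be the same before and after, up to rounding, although every individual representative has changed.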

The probabilities that one calculates in practical problems in quantum mechanics are nearly always obtained from the squares of the moduli of probability amplitudes or relative probability amplitudes. Even when one is interested only in the probability of an incomplete set of commuting observables having specified values, it is usually necessary first to make the set a complete one by the introduction of some extra commuting observables and to obtain the probability of the complete set having specified values (as the square of the modulus of a probability amplitude), and then to sum or integrate over all possible values of the extra observables. A more direct application of formula (51) of § 13 is usually not practicable.

To introduce a representation in practice

(i) We look for observables which we would like to have diagonal, either because we are interested in their probabilities or for reasons of mathematical simplicity;
(ii) We must see that they all commute, a necessary condition since diagonal matrices always commute;
(iii) We then see that they form a complete commuting set, and if not we add some more commuting observables to them to make them into a complete commuting set;
(iv) We set up an orthogonal representation with this complete commuting set diagonal.
The representation is then completely determined except for the arbitrary phase factors. For most purposes the arbitrary phase factors are unimportant and trivial, so that we may count the representation as being completely determined by the observables that are diagonal in it. This fact is already implied in our notation, since the only indication in a representative of the representation to which it belongs are the letters denoting the observables that are diagonal.
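Steps (ii) and (iii) of the procedure above can be mimicked in a finite example. The sketch assumes two hypothetical observables on a four-dimensional space, already written in a basis where both are diagonal; completeness is then tested by asking whether the eigenvalue labels distinguish all the basic vectors.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]

def diag(values):
    n = len(values)
    return [[values[r] if r == s else 0 for s in range(n)] for r in range(n)]

# Hypothetical observables, assumed already diagonal in the working basis.
a_values = [1, 1, -1, -1]
b_values = [1, -1, 1, -1]
A, B = diag(a_values), diag(b_values)

commute = matmul(A, B) == matmul(B, A)                     # step (ii)
a_alone_complete = len(set(a_values)) == len(a_values)     # step (iii), A by itself
labels = list(zip(a_values, b_values))
pair_complete = len(set(labels)) == len(labels)            # step (iii), A with B
```

Here the first observable alone leaves a twofold ambiguity, which is removed by adding the second, so that the pair of eigenvalues labels each basic vector uniquely.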
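It may be that we are interested in two representations for the same dynamical system. Suppose that in one of them the complete set of commuting observables $\xi_1, \ldots, \xi_u$ are diagonal and the basic bras are $\langle\xi'_1 \ldots \xi'_u|$ and in the other the complete set of commuting observables $\eta_1, \ldots, \eta_w$ are diagonal and the basic bras are $\langle\eta'_1 \ldots \eta'_w|$. A ket $|P\rangle$ will now have the two representatives $\langle\xi'_1 \ldots \xi'_u|P\rangle$ and $\langle\eta'_1 \ldots \eta'_w|P\rangle$. If $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have continuous eigenvalues and if $\eta_1, \ldots, \eta_x$ have discrete eigenvalues and $\eta_{x+1}, \ldots, \eta_w$ have continuous eigenvalues, we get from (35)

$$\langle\eta'_1 \ldots \eta'_w|P\rangle = \sum_{\xi'_1 \ldots \xi'_v} \int\cdots\int \langle\eta'_1 \ldots \eta'_w|\xi'_1 \ldots \xi'_u\rangle\, d\xi'_{v+1} \cdots d\xi'_u\, \langle\xi'_1 \ldots \xi'_u|P\rangle, \tag{53}$$

and interchanging $\xi$'s and $\eta$'s

$$\langle\xi'_1 \ldots \xi'_u|P\rangle = \sum_{\eta'_1 \ldots \eta'_x} \int\cdots\int \langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle\, d\eta'_{x+1} \cdots d\eta'_w\, \langle\eta'_1 \ldots \eta'_w|P\rangle. \tag{54}$$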

These are the transformation equations which give one representative of $|P\rangle$ in terms of the other. They show that either representative is expressible linearly in terms of the other, with the quantities

$$\langle\eta'_1 \ldots \eta'_w|\xi'_1 \ldots \xi'_u\rangle, \qquad \langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle \tag{55}$$

as coefficients. These quantities are called the transformation functions. Similar equations may be written down to connect the two representatives of a bra vector or of a linear operator. The transformation functions (55) are in every case the means which enable one to pass from one representative to the other. Each of the transformation functions is the conjugate complex of the other, and they satisfy the conditions

$$\sum_{\xi'_1 \ldots \xi'_v} \int\cdots\int \langle\eta'_1 \ldots \eta'_w|\xi'_1 \ldots \xi'_u\rangle\, d\xi'_{v+1} \cdots d\xi'_u\, \langle\xi'_1 \ldots \xi'_u|\eta''_1 \ldots \eta''_w\rangle = \delta_{\eta'_1\eta''_1} \cdots \delta_{\eta'_x\eta''_x}\, \delta(\eta'_{x+1}-\eta''_{x+1}) \cdots \delta(\eta'_w-\eta''_w) \tag{56}$$

and the corresponding conditions with $\xi$'s and $\eta$'s interchanged, as may be verified from (35) and (34) and the corresponding equations for the $\eta$'s.

Transformation functions are examples of probability amplitudes or relative probability amplitudes. Let us take the case when all the $\xi$'s and all the $\eta$'s have discrete eigenvalues. Then the basic ket $|\eta'_1 \ldots \eta'_w\rangle$ is normalized, so that its representative in the $\xi$-representation, $\langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle$, is a probability amplitude for each set of values for the $\xi'$'s. The state to which these probability amplitudes refer, namely the state corresponding to $|\eta'_1 \ldots \eta'_w\rangle$, is characterized by the condition that a simultaneous measurement of $\eta_1, \ldots, \eta_w$ is certain to lead to the results $\eta'_1, \ldots, \eta'_w$. Thus $|\langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle|^2$ is the probability of the $\xi$'s having the values $\xi'_1, \ldots, \xi'_u$ for the state for which the $\eta$'s certainly have the values $\eta'_1, \ldots, \eta'_w$. Since

$$|\langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle|^2 = |\langle\eta'_1 \ldots \eta'_w|\xi'_1 \ldots \xi'_u\rangle|^2,$$

we have the theorem of reciprocity: the probability of the $\xi$'s having the values $\xi'$ for the state for which the $\eta$'s certainly have the values $\eta'$ is equal to the probability of the $\eta$'s having the values $\eta'$ for the state for which the $\xi$'s certainly have the values $\xi'$.

If all the $\eta$'s have discrete eigenvalues and some of the $\xi$'s have continuous eigenvalues, $|\langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle|^2$ still gives the probability distribution of values for the $\xi$'s for the state for which the $\eta$'s certainly have the values $\eta'$. If some of the $\eta$'s have continuous eigenvalues, $|\eta'_1 \ldots \eta'_w\rangle$ is not normalized and $|\langle\xi'_1 \ldots \xi'_u|\eta'_1 \ldots \eta'_w\rangle|^2$ then gives only the relative probability distribution of values for the $\xi$'s for the state for which the $\eta$'s certainly have the values $\eta'$.
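The theorem of reciprocity and the discrete form of condition (56) can be verified for two assumed orthonormal sets of basic kets in a two-dimensional space. The angle `t`, the phase factor and the helper `bracket` are illustrative assumptions.

```python
import cmath
import math

def bracket(q, p):
    """<Q|P> for kets given by their components in a common orthonormal frame."""
    return sum(complex(a).conjugate() * b for a, b in zip(q, p))

t, phase = 0.4, cmath.exp(0.7j)
xi_kets = [[1, 0], [0, 1]]
eta_kets = [[math.cos(t), phase * math.sin(t)],
            [-phase.conjugate() * math.sin(t), math.cos(t)]]

xi_eta = [[bracket(x, e) for e in eta_kets] for x in xi_kets]    # <xi'|eta'>
eta_xi = [[bracket(e, x) for x in xi_kets] for e in eta_kets]    # <eta'|xi'>

# Element [r][s] of each array refers to xi^r and eta^s.
prob_xi_given_eta = [[abs(xi_eta[r][s]) ** 2 for s in range(2)] for r in range(2)]
prob_eta_given_xi = [[abs(eta_xi[s][r]) ** 2 for s in range(2)] for r in range(2)]

# Discrete form of (56): sum over xi' of <eta'|xi'><xi'|eta''>.
overlap = [[sum(eta_xi[s][r] * xi_eta[r][s2] for r in range(2)) for s2 in range(2)]
           for s in range(2)]
```

The two arrays of probabilities coincide element by element, and `overlap` reduces to the unit array, in line with (56).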

19. Theorems about functions of observables

We shall illustrate the mathematical value of representations by using them to prove some theorems.

THEOREM 1. A linear operator that commutes with an observable $\xi$ commutes also with any function of $\xi$.

The theorem is obviously true when the function is expressible as a power series. To prove it generally, let $\omega$ be the linear operator, so that we have the equation

$$\xi\omega - \omega\xi = 0. \tag{57}$$

Let us introduce a representation in which $\xi$ is diagonal. If $\xi$ by itself does not form a complete commuting set of observables, we must make it into a complete commuting set by adding certain observables, $\beta$ say, to it, and then take the representation in which $\xi$ and the $\beta$'s are diagonal. (The case when $\xi$ does form a complete commuting set by itself can be looked upon as a special case of the preceding one with the number of $\beta$ variables zero.) In this representation equation (57) becomes

$$\langle\xi'\beta'|\xi\omega - \omega\xi|\xi''\beta''\rangle = 0,$$

which reduces to

$$\xi'\langle\xi'\beta'|\omega|\xi''\beta''\rangle - \langle\xi'\beta'|\omega|\xi''\beta''\rangle\xi'' = 0.$$

In the case when the eigenvalues of $\xi$ are discrete, this equation shows that all the matrix elements $\langle\xi'\beta'|\omega|\xi''\beta''\rangle$ of $\omega$ vanish except those for which $\xi' = \xi''$. In the case when the eigenvalues of $\xi$ are continuous it shows, like equation (48), that $\langle\xi'\beta'|\omega|\xi''\beta''\rangle$ is of the form

$$\langle\xi'\beta'|\omega|\xi''\beta''\rangle = c\, \delta(\xi'-\xi''),$$

where $c$ is some function of $\xi'$ and the $\beta'$'s and $\beta''$'s. In either case we may say that the matrix representing $\omega$ 'is diagonal with respect to $\xi$'. If $f(\xi)$ denotes any function of $\xi$ in accordance with the general theory of § 11, which requires $f(\xi''')$ to be defined for $\xi'''$ any eigenvalue of $\xi$, we can deduce in either case

$$f(\xi')\langle\xi'\beta'|\omega|\xi''\beta''\rangle - \langle\xi'\beta'|\omega|\xi''\beta''\rangle f(\xi'') = 0.$$

This gives

$$\langle\xi'\beta'|f(\xi)\omega - \omega f(\xi)|\xi''\beta''\rangle = 0,$$

so that

$$f(\xi)\omega - \omega f(\xi) = 0$$

and the theorem is proved.
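Theorem 1 can be illustrated with small arrays. The sketch assumes an observable with a repeated eigenvalue, already diagonal, and takes $f$ to be the square root as a function that is defined on the eigenvalues; the operators `omega` and `omega_bad` are invented for the purpose.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]

def diag(values):
    n = len(values)
    return [[values[r] if r == s else 0 for s in range(n)] for r in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[r][s] - ba[r][s] for s in range(n)] for r in range(n)]

def is_zero(m, tol=1e-12):
    return all(abs(v) < tol for row in m for v in row)

xi_values = [1.0, 1.0, 2.0]                       # the eigenvalue 1 occurs twice
xi = diag(xi_values)
f_xi = diag([math.sqrt(v) for v in xi_values])    # f(xi) with f the square root

omega = [[1, 2, 0], [3, 4, 0], [0, 0, 5]]         # vanishes between different xi'
omega_bad = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]     # connects xi' = 1 with xi' = 2

commutes_with_xi = is_zero(commutator(xi, omega))
commutes_with_f = is_zero(commutator(f_xi, omega))
bad_commutes_with_xi = is_zero(commutator(xi, omega_bad))
```

The array `omega` is diagonal with respect to $\xi$ in the sense of the proof and commutes with both $\xi$ and $f(\xi)$, whereas `omega_bad` has an element between different eigenvalues and fails to commute with $\xi$.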

As a special case of the theorem, we have the result that any observable that commutes with an observable $\xi$ also commutes with any function of $\xi$. This result appears as a physical necessity when we identify, as in § 13, the condition of commutability of two observables with the condition of compatibility of the corresponding observations. Any observation that is compatible with the measurement of an observable $\xi$ must also be compatible with the measurement of $f(\xi)$, since any measurement of $\xi$ includes in itself a measurement of $f(\xi)$.


THEOREM 2. A linear operator that commutes with each of a complete set of commuting observables is a function of those observables.

Let $\omega$ be the linear operator and $\xi_1, \xi_2, \ldots, \xi_u$ the complete set of commuting observables, and set up a representation with these observables diagonal. Since $\omega$ commutes with each of the $\xi$'s, the matrix representing it is diagonal with respect to each of the $\xi$'s, by the argument we had above. This matrix is therefore a diagonal matrix and is of the form (49), involving a number $c'$ which is a function of the $\xi'$'s. It thus represents the function of the $\xi$'s that $c'$ is of the $\xi'$'s, and hence $\omega$ equals this function of the $\xi$'s.

THEOREM 3. If an observable $\xi$ and a linear operator $g$ are such that any linear operator that commutes with $\xi$ also commutes with $g$, then $g$ is a function of $\xi$.

This is the converse of Theorem 1. To prove it, we use the same representation with $\xi$ diagonal as we had for Theorem 1. In the first place, we see that $g$ must commute with $\xi$ itself, and hence the representative of $g$ must be diagonal with respect to $\xi$, i.e. it must be of the form

$$\langle\xi'\beta'|g|\xi''\beta''\rangle = a(\xi'\beta'\beta'')\, \delta_{\xi'\xi''} \quad\text{or}\quad a(\xi'\beta'\beta'')\, \delta(\xi'-\xi''),$$

according to whether $\xi$ has discrete or continuous eigenvalues. Now let $\omega$ be any linear operator that commutes with $\xi$, so that its representative is of the form

$$\langle\xi'\beta'|\omega|\xi''\beta''\rangle = b(\xi'\beta'\beta'')\, \delta_{\xi'\xi''} \quad\text{or}\quad b(\xi'\beta'\beta'')\, \delta(\xi'-\xi'').$$

By hypothesis $\omega$ must also commute with $g$, so that

$$\langle\xi'\beta'|g\omega - \omega g|\xi''\beta''\rangle = 0. \tag{58}$$

If we suppose for definiteness that the $\beta$'s have discrete eigenvalues, (58) leads, with the help of the law of matrix multiplication, to

$$\sum_{\beta'''} \{a(\xi'\beta'\beta''')\, b(\xi'\beta'''\beta'') - b(\xi'\beta'\beta''')\, a(\xi'\beta'''\beta'')\} = 0, \tag{59}$$

the left-hand side of (58) being equal to the left-hand side of (59) multiplied by $\delta_{\xi'\xi''}$ or $\delta(\xi'-\xi'')$. Equation (59) must hold for all functions $b(\xi'\beta'\beta'')$. We can deduce that

$$a(\xi'\beta'\beta'') = 0 \ \text{for}\ \beta' \neq \beta'', \qquad a(\xi'\beta'\beta') = a(\xi'\beta''\beta'').$$

The first of these results shows that the matrix representing $g$ is diagonal and the second shows that $a(\xi'\beta'\beta')$ is a function of $\xi'$ only. We can now infer that $g$ is that function of $\xi$ which $a(\xi'\beta'\beta')$ is of $\xi'$, so the theorem is proved. The proof is analogous if some of the $\beta$'s have continuous eigenvalues.

Theorems 1 and 3 are still valid if we replace the observable $\xi$ by any set of commuting observables $\xi_1, \xi_2, \ldots, \xi_r$, only formal changes being needed in the proofs.

20. Developments in notation

The theory of representations that we have developed provides a general system for labelling kets and bras. In a representation in which the complete set of commuting observables $\xi_1, \ldots, \xi_u$ are diagonal any ket $|P\rangle$ will have a representative $\langle\xi'_1 \ldots \xi'_u|P\rangle$, or $\langle\xi'|P\rangle$ for brevity. This representative is a definite function of the variables $\xi'$, say $\psi(\xi')$. The function $\psi$ then determines the ket $|P\rangle$ completely, so it may be used to label this ket, to replace the arbitrary label $P$. In symbols,

$$\text{if}\quad \langle\xi'|P\rangle = \psi(\xi') \quad\text{we put}\quad |P\rangle = |\psi(\xi)\rangle. \tag{60}$$

We must put $|P\rangle$ equal to $|\psi(\xi)\rangle$ and not $|\psi(\xi')\rangle$, since it does not depend on a particular set of eigenvalues for the $\xi$'s, but only on the form of the function $\psi$.

With $f(\xi)$ any function of the observables $\xi_1, \ldots, \xi_u$, $f(\xi)|P\rangle$ will have as its representative

$$\langle\xi'|f(\xi)|P\rangle = f(\xi')\, \psi(\xi').$$

Thus according to (60) we put

$$f(\xi)|P\rangle = |f(\xi)\psi(\xi)\rangle.$$

With the help of the second of equations (60) we now get

$$f(\xi)|\psi(\xi)\rangle = |f(\xi)\psi(\xi)\rangle. \tag{61}$$

This is a general result holding for any functions $f$ and $\psi$ of the $\xi$'s, and it shows that the vertical line $|$ is not necessary with the new notation for a ket: either side of (61) may be written simply as $f(\xi)\psi(\xi)\rangle$. Thus the rule for the new notation becomes:

$$\text{if}\quad \langle\xi'|P\rangle = \psi(\xi') \quad\text{we put}\quad |P\rangle = \psi(\xi)\rangle. \tag{62}$$

We may further shorten $\psi(\xi)\rangle$ to $\psi\rangle$, leaving the variables $\xi$ understood, if no ambiguity arises thereby.

The ket $\psi(\xi)\rangle$ may be considered as the product of the linear operator $\psi(\xi)$ with a ket which is denoted simply by $\rangle$ without a label. We call the ket $\rangle$ the standard ket. Any ket whatever can be expressed as a function of the $\xi$'s multiplied into the standard ket. For example, taking $|P\rangle$ in (62) to be the basic ket $|\xi''\rangle$, we find

$$|\xi''\rangle = \delta_{\xi_1\xi''_1} \cdots \delta_{\xi_v\xi''_v}\, \delta(\xi_{v+1}-\xi''_{v+1}) \cdots \delta(\xi_u-\xi''_u)\rangle \tag{63}$$

in the case when $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have continuous eigenvalues. The standard ket is characterized by the condition that its representative $\langle\xi'|\rangle$ is unity over the whole domain of the variable $\xi'$, as may be seen by putting $\psi = 1$ in (62).
A further contraction may be made in the notation, namely to leave the symbol $\rangle$ for the standard ket understood. A ket is then written simply as $\psi(\xi)$, a function of the observables $\xi$. A function of the $\xi$'s used in this way to denote a ket is called a wave function.† The system of notation provided by wave functions is the one usually used by most authors for calculations in quantum mechanics. In using it one should remember that each wave function is understood to have the standard ket multiplied into it on the right, which prevents one from multiplying the wave function by any operator on the right. Wave functions can be multiplied by operators only on the left. This distinguishes them from ordinary functions of the $\xi$'s, which are operators and can be multiplied by operators on either the left or the right. A wave function is just the representative of a ket expressed as a function of the observables $\xi$, instead of eigenvalues $\xi'$ for those observables. The square of its modulus gives the probability (or the relative probability, if it is not normalized) of the $\xi$'s having specified values, or lying in specified small ranges, for the corresponding state.
The new notation for bras may be developed in the Same way as i for kets. A bra (&I whose representative (&[f’) is #‘) we write r (~/(fl)l. With this notation the conjugate imaginary to I$(t)) is
(g(e) 1. Thus the rule that we have used hitherto, that a ket and its conjugate imaginary bra are both specified by the Same label, must be extended to read-if the labels of a Eet involve cornplex numbers or cmplex functions, the lubels of the conjugate irnaginary bra involve the conjugate cornplex numbers or functions. As in the
case of kets we tan show that (#)lf(k) and (~(@jo)~ are the Same,
so that the vertical line tan be omitted. We tan consider (c#) as the product of the linear Operator +(f) into the Standard bra (, which

† The reason for this name is that in the early days of quantum mechanics all the examples of these functions were of the form of waves. The name is not a descriptive one from the point of view of the modern general theory.


is the conjugate imaginary of the standard ket $\rangle$. We may leave the standard bra understood, so that a general bra is written as $\phi(\xi)$, the conjugate complex of a wave function. The conjugate complex of a wave function can be multiplied by any linear operator on the right, but cannot be multiplied by a linear operator on the left. We can construct triple products of the form $\langle f(\xi)\rangle$. Such a triple product is a number, equal to $f(\xi)$ summed or integrated over the whole domain of eigenvalues for the $\xi$'s,

$$\langle f(\xi)\rangle = \sum_{\xi'_1..\xi'_v} \int..\int f(\xi')\, d\xi'_{v+1}..d\xi'_u \qquad (64)$$

in the case when $\xi_1,..,\xi_v$ have discrete eigenvalues and $\xi_{v+1},..,\xi_u$ have continuous eigenvalues.
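The standard-ket notation can be illustrated on a small discrete domain. The following is a minimal sketch only: the eigenvalues, the wave function `psi`, and the helper names are hypothetical choices made for illustration, and a single observable with four discrete eigenvalues is assumed.

```python
# Sketch: representatives on a discrete domain of eigenvalues (illustrative values only).
xi_values = [0, 1, 2, 3]              # hypothetical eigenvalues xi' of a single observable xi

# The standard ket has representative 1 over the whole domain;
# the standard bra, its conjugate imaginary, has the same representative.
standard_ket = [1.0 for _ in xi_values]
standard_bra = [1.0 for _ in xi_values]

def psi(x):
    """Hypothetical unnormalized wave function psi(xi) = 1 + i*xi."""
    return complex(1.0, x)

def apply_function(f, ket):
    """Multiply a ket by the operator f(xi), which is diagonal in this representation."""
    return [f(x) * c for x, c in zip(xi_values, ket)]

# The ket psi(xi)> is the operator psi(xi) applied to the standard ket.
ket_psi = apply_function(psi, standard_ket)

# Squared moduli give relative probabilities; dividing by their sum normalizes them.
weights = [abs(c) ** 2 for c in ket_psi]
probabilities = [w / sum(weights) for w in weights]

def triple_product(f):
    """<f(xi)>: standard bra times f(xi) times standard ket, i.e. f summed over the domain."""
    return sum(b * c for b, c in zip(standard_bra, apply_function(f, standard_ket)))

print(probabilities)
print(triple_product(lambda x: x ** 2))   # the sum 0 + 1 + 4 + 9
```

For continuous eigenvalues the sum in `triple_product` would be replaced by an integral, as in (64).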
The standard ket and bra are defined with respect to a representation. If we carried through the above work with a different representation in which the complete set of commuting observables $\eta$ are diagonal, or if we merely changed the phase factors in the representation with the $\xi$'s diagonal, we should get a different standard ket and bra. In a piece of work in which more than one standard ket or bra appears one must, of course, distinguish them by giving them labels.
A further development of the notation which is of great importance for dealing with complicated dynamical systems will now be discussed. Suppose we have a dynamical system describable in terms of dynamical variables which can all be divided into two sets, set A and set B say, such that any member of set A commutes with any member of set B. A general dynamical variable must be expressible as a function of the A-variables and B-variables together. We may consider another dynamical system in which the dynamical variables are the A-variables only; let us call it the A-system. Similarly we may consider a third dynamical system in which the dynamical variables are the B-variables only, the B-system. The original system can then be looked upon as a combination of the A-system and the B-system in accordance with the mathematical scheme given below.
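Let us take any ket $|a\rangle$ for the A-system and any ket $|b\rangle$ for the B-system. We assume that they have a product $|a\rangle|b\rangle$ for which the commutative and distributive axioms of multiplication hold, i.e.

$$|a\rangle|b\rangle = |b\rangle|a\rangle,$$

$$\{c_1|a_1\rangle + c_2|a_2\rangle\}|b\rangle = c_1|a_1\rangle|b\rangle + c_2|a_2\rangle|b\rangle,$$

$$|a\rangle\{c_1|b_1\rangle + c_2|b_2\rangle\} = c_1|a\rangle|b_1\rangle + c_2|a\rangle|b_2\rangle,$$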

the $c$'s being numbers. We can give a meaning to any A-variable operating on the product $|a\rangle|b\rangle$ by assuming that it operates only on the $|a\rangle$ factor and commutes with the $|b\rangle$ factor, and similarly we can give a meaning to any B-variable operating on this product by assuming that it operates only on the $|b\rangle$ factor and commutes with the $|a\rangle$ factor. (This makes every A-variable commute with every B-variable.) Thus any dynamical variable of the original system can operate on the product $|a\rangle|b\rangle$, so this product can be looked upon as a ket for the original system, and may then be written $|ab\rangle$, the two labels $a$ and $b$ being sufficient to specify it. In this way we get the fundamental equations

$$|a\rangle|b\rangle = |b\rangle|a\rangle = |ab\rangle. \qquad (65)$$

The multiplication here is of quite a different kind from any that occurs earlier in the theory. The ket vectors $|a\rangle$ and $|b\rangle$ are in two different vector spaces and their product is in a third vector space, which may be called the product of the two previous vector spaces. The number of dimensions of the product space is equal to the product of the number of dimensions of each of the factor spaces. A general ket vector of the product space is not of the form (65), but is a sum or integral of kets of this form.
Let us take a representation for the A-system in which a complete set of commuting observables $\xi_A$ of the A-system are diagonal. We shall then have the basic bras $\langle\xi'_A|$ for the A-system. Similarly, taking a representation for the B-system with the observables $\xi_B$ diagonal, we shall have the basic bras $\langle\xi'_B|$ for the B-system. The products

$$\langle\xi'_A|\langle\xi'_B| = \langle\xi'_A \xi'_B| \qquad (66)$$

will then provide the basic bras for a representation for the original system, in which representation the $\xi_A$'s and the $\xi_B$'s will be diagonal. The $\xi_A$'s and $\xi_B$'s will together form a complete set of commuting observables for the original system. From (65) and (66) we get

$$\langle\xi'_A|a\rangle\langle\xi'_B|b\rangle = \langle\xi'_A \xi'_B|ab\rangle, \qquad (67)$$

showing that the representative of $|ab\rangle$ equals the product of the representatives of $|a\rangle$ and of $|b\rangle$ in their respective representations.
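For finite-dimensional factor spaces, the product of representatives in (67) can be sketched with a Kronecker product. This is only an illustration under stated assumptions: the dimensions (2 and 3), the numerical entries, and all function names are hypothetical, and a fixed ordering of the combined index (A-index slow, B-index fast) is chosen as a bookkeeping convention.

```python
def kron_vec(a, b):
    """Representative of |a>|b>: all products a_i * b_j, with i as the slow index."""
    return [ai * bj for ai in a for bj in b]

def kron_mat(A, B):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[A[i][k] * B[j][l] for k in range(len(A)) for l in range(len(B))]
            for i in range(len(A)) for j in range(len(B))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, v):
    return [sum(x * c for x, c in zip(row, v)) for row in X]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical representatives: a 2-dimensional A-system and a 3-dimensional B-system.
ket_a = [1, 2]
ket_b = [3, 0, -1]
ket_ab = kron_vec(ket_a, ket_b)          # representative of |ab>, as in (67)

# An A-variable acts only on the |a> factor, a B-variable only on the |b> factor.
op_A = [[0, 1], [1, 0]]
op_B = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
A_full = kron_mat(op_A, identity(3))
B_full = kron_mat(identity(2), op_B)

print(len(ket_ab))                                                      # 2 * 3 dimensions
print(matvec(A_full, ket_ab) == kron_vec(matvec(op_A, ket_a), ket_b))   # acts on |a> only
print(matmul(A_full, B_full) == matmul(B_full, A_full))                 # A and B commute
```

A sum such as `kron_vec([1, 0], [1, 0, 0])` plus `kron_vec([0, 1], [0, 1, 0])` is a ket of the product space that is not itself a single product, in line with the remark that a general ket is a sum of kets of the form (65).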

We can introduce the standard ket, $\rangle_A$ say, for the A-system, with respect to the representation with the $\xi_A$'s diagonal, and also the standard ket $\rangle_B$ for the B-system, with respect to the representation with the $\xi_B$'s diagonal. Their product $\rangle_A\rangle_B$ is then the standard ket for the original system, with respect to the representation with the $\xi_A$'s and $\xi_B$'s diagonal. Any ket for the original system may be expressed as

$$\psi(\xi_A \xi_B)\rangle_A\rangle_B. \qquad (68)$$

It may be that in a certain calculation we wish to use a particular representation for the B-system, say the above representation with the $\xi_B$'s diagonal, but do not wish to introduce any particular representation for the A-system. It would then be convenient to use the standard ket $\rangle_B$ for the B-system and no standard ket for the A-system. Under these circumstances we could write any ket for the original system as

$$|\xi_B\rangle\rangle_B, \qquad (69)$$

in which $|\xi_B\rangle$ is a ket for the A-system and is also a function of the $\xi_B$'s, i.e. it is a ket for the A-system for each set of values for the $\xi_B$'s; in fact (69) equals (68) if we take

$$|\xi_B\rangle = \psi(\xi_A \xi_B)\rangle_A.$$

We may leave the standard ket $\rangle_B$ in (69) understood, and then we have the general ket for the original system appearing as $|\xi_B\rangle$, a ket for the A-system and a wave function in the variables $\xi_B$ of the B-system. Examples of this notation will be used in §§ 66 and 79.

The above work can be immediately extended to a dynamical system describable in terms of dynamical variables which can be divided into three or more sets A, B, C,... such that any member of one set commutes with any member of another. Equation (65) gets generalized to

$$|a\rangle|b\rangle|c\rangle... = |abc...\rangle,$$

the factors on the left being kets for the component systems and the ket on the right being a ket for the original system. Equations (66), (67), and (68) get generalized to many factors in a similar way.

IV
THE QUANTUM CONDITIONS
21. Poisson brackets
OUR work so far has consisted in setting up a general mathematical scheme connecting states and observables in quantum mechanics. One of the dominant features of this scheme is that observables, and dynamical variables in general, appear in it as quantities which do not obey the commutative law of multiplication. It now becomes necessary for us to obtain equations to replace the commutative law of multiplication, equations that will tell us the value of $\xi\eta - \eta\xi$ when $\xi$ and $\eta$ are any two observables or dynamical variables. Only when such equations are known shall we have a complete scheme of mechanics with which to replace classical mechanics. These new equations are called quantum conditions or commutation relations.
The problem of finding quantum conditions is not of such a general character as those we have been concerned with up to the present. It is instead a special problem which presents itself with each particular dynamical system one is called upon to study. There is, however, a fairly general method of obtaining quantum conditions, applicable to a very large class of dynamical systems. This is the method of classical analogy and will form the main theme of the present chapter. Those dynamical systems to which this method is not applicable must be treated individually and special considerations used in each case.
The value of classical analogy in the development of quantum mechanics depends on the fact that classical mechanics provides a valid description of dynamical systems under certain conditions, when the particles and bodies composing the systems are sufficiently massive for the disturbance accompanying an observation to be negligible. Classical mechanics must therefore be a limiting case of quantum mechanics. We should thus expect to find that important concepts in classical mechanics correspond to important concepts in quantum mechanics, and, from an understanding of the general nature of the analogy between classical and quantum mechanics, we may hope to get laws and theorems in quantum mechanics appearing as simple generalizations of well-known results in classical mechanics; in particular we may hope to get the quantum conditions appearing as a simple generalization of the classical law that all dynamical variables commute.
Let us take a dynamical system composed of a number of particles in interaction. As independent dynamical variables for dealing with the system we may use the Cartesian coordinates of all the particles and the corresponding Cartesian components of velocity of the particles. It is, however, more convenient to work with the momentum components instead of the velocity components. Let us call the coordinates $q_r$, $r$ going from 1 to three times the number of particles, and the corresponding momentum components $p_r$. The $q$'s and $p$'s are called canonical coordinates and momenta.
The method of Lagrange's equations of motion involves introducing coordinates $q_r$ and momenta $p_r$ in a more general way, applicable also for a system not composed of particles (e.g. a system containing rigid bodies). These more general $q$'s and $p$'s are also called canonical coordinates and momenta. Any dynamical variable is expressible in terms of a set of canonical coordinates and momenta.
An important concept in general dynamical theory is the Poisson Bracket. Any two dynamical variables $u$ and $v$ have a P.B. (Poisson Bracket) which we shall denote by $[u, v]$, defined by

$$[u, v] = \sum_r \left\{ \frac{\partial u}{\partial q_r}\frac{\partial v}{\partial p_r} - \frac{\partial u}{\partial p_r}\frac{\partial v}{\partial q_r} \right\}, \qquad (1)$$

$u$ and $v$ being regarded as functions of a set of canonical coordinates and momenta $q_r$ and $p_r$ for the purpose of the differentiations. The right-hand side of (1) is independent of which set of canonical coordinates and momenta are used, this being a consequence of the general definition of canonical coordinates and momenta, so the P.B. $[u, v]$ is well defined.

The main properties of P.B.'s, which follow at once from their definition (1), are

$$[u, v] = -[v, u], \qquad (2)$$

$$[u, c] = 0, \qquad (3)$$

where $c$ is a number (which may be considered as a special case of a dynamical variable),

$$[u_1 + u_2, v] = [u_1, v] + [u_2, v], \quad [u, v_1 + v_2] = [u, v_1] + [u, v_2], \qquad (4)$$

$$[u_1 u_2, v] = [u_1, v]u_2 + u_1[u_2, v], \quad [u, v_1 v_2] = [u, v_1]v_2 + v_1[u, v_2]. \qquad (5)$$

Also the identity

$$[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0 \qquad (6)$$

is easily verified. Equations (4) express that the P.B. $[u, v]$ involves $u$ and $v$ linearly, while equations (5) correspond to the ordinary rules for differentiating a product.
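The properties (2), (5), and (6) can be checked by exact arithmetic on polynomials. The sketch below is an illustration only: it assumes a single degree of freedom, and the dictionary encoding of polynomials and all function names are hypothetical conveniences, not part of the theory.

```python
# A polynomial in q and p is a dict {(i, j): c} meaning the sum of c * q**i * p**j.

def clean(poly):
    """Drop zero terms so that equal polynomials compare equal."""
    return {k: c for k, c in poly.items() if c != 0}

def add(u, v, sign=1):
    """Return u + sign * v."""
    out = dict(u)
    for k, c in v.items():
        out[k] = out.get(k, 0) + sign * c
    return clean(out)

def mul(u, v):
    out = {}
    for (i, j), c in u.items():
        for (k, l), d in v.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + c * d
    return clean(out)

def d_dq(u):
    return clean({(i - 1, j): i * c for (i, j), c in u.items() if i > 0})

def d_dp(u):
    return clean({(i, j - 1): j * c for (i, j), c in u.items() if j > 0})

def pb(u, v):
    """Classical P.B. of definition (1), for one degree of freedom."""
    return add(mul(d_dq(u), d_dp(v)), mul(d_dp(u), d_dq(v)), sign=-1)

q = {(1, 0): 1}
p = {(0, 1): 1}
one = {(0, 0): 1}

u = add(mul(q, q), mul(q, p))        # q^2 + q p
v = add(mul(p, p), q)                # p^2 + q
w = mul(mul(q, p), p)                # q p^2

print(pb(q, p) == one)                                               # [q, p] = 1
print(pb(u, v) == add({}, pb(v, u), sign=-1))                        # property (2)
print(pb(mul(u, v), w) == add(mul(pb(u, w), v), mul(u, pb(v, w))))   # property (5)
jacobi = add(add(pb(u, pb(v, w)), pb(v, pb(w, u))), pb(w, pb(u, v)))
print(jacobi == {})                                                  # identity (6)
```

Because classical variables commute, the order of the factors in the check of (5) is immaterial here; it is only in the quantum case that the order has to be preserved.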

Let us try to introduce a quantum P.B. which shall be the analogue of the classical one. We assume the quantum P.B. to satisfy all the conditions (2) to (6), it being now necessary that the order of the factors $u_1$ and $u_2$ in the first of equations (5) should be preserved throughout the equation, as in the way we have here written it, and similarly for the $v_1$ and $v_2$ in the second of equations (5). These conditions are already sufficient to determine the form of the quantum P.B. uniquely, as may be seen from the following argument. We can evaluate the P.B. $[u_1 u_2, v_1 v_2]$ in two different ways, since we can use either of the two formulas (5) first, thus,

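$$[u_1 u_2, v_1 v_2] = [u_1, v_1 v_2]u_2 + u_1[u_2, v_1 v_2]$$

$$= \{[u_1, v_1]v_2 + v_1[u_1, v_2]\}u_2 + u_1\{[u_2, v_1]v_2 + v_1[u_2, v_2]\}$$

$$= [u_1, v_1]v_2 u_2 + v_1[u_1, v_2]u_2 + u_1[u_2, v_1]v_2 + u_1 v_1[u_2, v_2]$$

and

$$[u_1 u_2, v_1 v_2] = [u_1 u_2, v_1]v_2 + v_1[u_1 u_2, v_2]$$

$$= [u_1, v_1]u_2 v_2 + u_1[u_2, v_1]v_2 + v_1[u_1, v_2]u_2 + v_1 u_1[u_2, v_2].$$

Equating these two results, we obtain

$$[u_1, v_1](u_2 v_2 - v_2 u_2) = (u_1 v_1 - v_1 u_1)[u_2, v_2].$$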

Since this condition holds with $u_1$ and $v_1$ quite independent of $u_2$ and $v_2$, we must have

$$u_1 v_1 - v_1 u_1 = i\hbar[u_1, v_1],$$

$$u_2 v_2 - v_2 u_2 = i\hbar[u_2, v_2],$$

where $\hbar$ must not depend on $u_1$ and $v_1$, nor on $u_2$ and $v_2$, and also must commute with $(u_1 v_1 - v_1 u_1)$. It follows that $\hbar$ must be simply a number. We want the P.B. of two real variables to be real, as in the classical theory, which requires, from the work at the top of p. 28, that $\hbar$ shall be a real number when introduced, as here, with the coefficient $i$. We are thus led to the following definition for the quantum P.B. $[u, v]$ of any two variables $u$ and $v$,

$$uv - vu = i\hbar[u, v], \qquad (7)$$

in which $\hbar$ is a new universal constant. It has the dimensions of action. In order that the theory may agree with experiment, we must take $\hbar$ equal to $h/2\pi$, where $h$ is the universal constant that was introduced by Planck, known as Planck's constant. It is easily verified that the quantum P.B. satisfies all the conditions (2), (3), (4), (5), and (6).
The problem of finding quantum conditions now reduces to the problem of determining P.B.'s in quantum mechanics. The strong analogy between the quantum P.B. defined by (7) and the classical P.B. defined by (1) leads us to make the assumption that the quantum P.B.'s, or at any rate the simpler ones of them, have the same values as the corresponding classical P.B.'s. The simplest P.B.'s are those involving the canonical coordinates and momenta themselves and have the following values in the classical theory:

$$[q_r, q_s] = 0, \quad [p_r, p_s] = 0, \quad [q_r, p_s] = \delta_{rs}. \qquad (8)$$

We therefore assume that the corresponding quantum P.B.'s also have the values given by (8). By eliminating the quantum P.B.'s with the help of (7), we obtain the equations

$$q_r q_s - q_s q_r = 0, \quad p_r p_s - p_s p_r = 0, \quad q_r p_s - p_s q_r = i\hbar\delta_{rs}, \qquad (9)$$

which are the fundamental quantum conditions. They show us where the lack of commutability among the canonical coordinates and momenta lies. They also provide us with a basis for calculating commutation relations between other dynamical variables. For instance, if $\xi$ and $\eta$ are any two functions of the $q$'s and $p$'s expressible as power series, we may express $\xi\eta - \eta\xi$ or $[\xi, \eta]$, by repeated applications of the laws (2), (3), (4), and (5), in terms of the elementary P.B.'s given in (8) and so evaluate it. The result is often, in simple cases, the same as the classical result, or departs from the classical result only through requiring a special order for factors in a product, this order being, of course, unimportant in the classical theory. Even when $\xi$ and $\eta$ are more general functions of the $q$'s and $p$'s not expressible as power series, equations (9) are still sufficient to fix the value of $\xi\eta - \eta\xi$, as will become clear from the following work.

Equations (9) thus give the solution of the problem of finding the quantum conditions, for all those dynamical systems which have a classical analogue and which are describable in terms of canonical coordinates and momenta. This does not include all possible systems in quantum mechanics.
Equations (7) and (9) provide the foundation for the analogy between quantum mechanics and classical mechanics. They show that classical mechanics may be regarded as the limiting case of quantum mechanics when $\hbar$ tends to zero. A P.B. in quantum mechanics is a purely algebraic notion and is thus a rather more fundamental concept than a classical P.B., which can be defined only with reference to a set of canonical coordinates and momenta. For this reason canonical coordinates and momenta are of less importance in quantum mechanics than in classical mechanics; in fact, we may have a system in quantum mechanics for which canonical coordinates and momenta do not exist and we can still give a meaning to P.B.'s. Such a system would be one without a classical analogue and we should not be able to obtain its quantum conditions by the method here described.
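The purely algebraic character of the quantum P.B. can be illustrated with small matrices standing in for non-commuting variables. The sketch below is only an illustration: the 2x2 integer matrices are arbitrary hypothetical choices, and the function `comm` computes $uv - vu$, which by (7) is $i\hbar$ times the quantum P.B., so that the homogeneous relations (5) and (6) can be tested without fixing a value for $\hbar$.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(X, Y, sign=1):
    """Return X + sign * Y, entry by entry."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def comm(X, Y):
    """XY - YX, which by (7) equals i*hbar times the quantum P.B. [X, Y]."""
    return add(matmul(X, Y), matmul(Y, X), sign=-1)

# Hypothetical non-commuting integer matrices standing in for dynamical variables.
u1 = [[0, 1], [0, 0]]
u2 = [[1, 2], [3, 4]]
v1 = [[0, 0], [1, 0]]
zero = [[0, 0], [0, 0]]

# Product rule (5), with the order of the factors u1, u2 preserved.
lhs = comm(matmul(u1, u2), v1)
rhs = add(matmul(comm(u1, v1), u2), matmul(u1, comm(u2, v1)))
print(lhs == rhs)

# Identity (6).
jacobi = add(add(comm(u1, comm(u2, v1)), comm(u2, comm(v1, u1))), comm(v1, comm(u1, u2)))
print(jacobi == zero)

# With u2 moved to the wrong side of [u1, v1] the rule fails for these matrices.
wrong = add(matmul(u2, comm(u1, v1)), matmul(u1, comm(u2, v1)))
print(lhs == wrong)
```

The last check shows why the order of factors in (5) has to be preserved once the variables no longer commute.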
From equations (9) we see that two variables with different suffixes $r$ and $s$ always commute. It follows that any function of $q_r$ and $p_r$ will commute with any function of $q_s$ and $p_s$ when $s$ differs from $r$. Different values of $r$ correspond to different degrees of freedom of the dynamical system, so we get the result that dynamical variables referring to different degrees of freedom commute. This law, as we have derived it from (9), is proved only for dynamical systems with classical analogues, but we assume it to hold generally. In this way we can make a start on the problem of finding quantum conditions for dynamical systems for which canonical coordinates and momenta do not exist, provided we can give a meaning to different degrees of freedom, as we may be able to do with the help of physical insight.
We can now see the physical meaning of the division, which was discussed in the preceding section, of the dynamical variables into sets, any member of one set commuting with any member of another. Each set corresponds to certain degrees of freedom, or possibly just one degree of freedom. The division may correspond to the physical process of resolving the dynamical system into its constituent parts, each constituent being capable of existing by itself as a physical system, and the various constituents having to be brought into interaction with one another to produce the original system. Alternatively the division may be merely a mathematical procedure of resolving the dynamical system into degrees of freedom which cannot be separated physically, e.g. the system consisting of a particle with internal structure may be divided into the degrees of freedom describing the motion of the centre of the particle and those describing the internal structure.

22. Schrödinger’s representation

Let us consider a dynamical system with $n$ degrees of freedom having a classical analogue, and thus describable in terms of canonical coordinates and momenta $q_r, p_r$ $(r = 1, 2, ..., n)$. We assume that the coordinates $q_r$ are all observables and have continuous ranges of eigenvalues, these assumptions being reasonable from the physical significance of the $q$'s. Let us set up a representation with the $q$'s diagonal. The question arises whether the $q$'s form a complete commuting set for this dynamical system. It seems pretty obvious from inspection that they do. We shall here assume that they do, and the assumption will be justified later (see top of p. 92). With the $q$'s forming a complete commuting set, the representation is fixed except for the arbitrary phase factors in it.

Let us consider first the case of $n = 1$, so that there is only one $q$ and $p$, satisfying

$$qp - pq = i\hbar. \qquad (10)$$

Any ket may be written in the standard ket notation $\psi(q)\rangle$. From it we can form another ket $d\psi/dq\rangle$, whose representative is the derivative of the original one. This new ket is a linear function of the original one and is thus the result of some linear operator applied to the original one. Calling this linear operator $d/dq$, we have

$$\frac{d}{dq}\psi\rangle = \frac{d\psi}{dq}\rangle. \qquad (11)$$

Equation (11) holding for all functions $\psi$ defines the linear operator $d/dq$. We have

$$\frac{d}{dq}\rangle = 0. \qquad (12)$$

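Let us treat the linear operator $d/dq$ according to the general theory of linear operators of § 7. We should then be able to apply it to a bra $\langle\phi(q)$, the product $\langle\phi\, d/dq$ being defined, according to (3) of § 7, by

$$\Big\{\langle\phi\frac{d}{dq}\Big\}\psi\rangle = \langle\phi\Big\{\frac{d}{dq}\psi\rangle\Big\} \qquad (13)$$

for all functions $\psi(q)$. Taking representatives, we get

$$\int \langle\phi\frac{d}{dq}|q'\rangle\, dq'\, \psi(q') = \int \phi(q')\, dq'\, \frac{d\psi(q')}{dq'}. \qquad (14)$$

We can transform the right-hand side by partial integration and get

$$\int \langle\phi\frac{d}{dq}|q'\rangle\, dq'\, \psi(q') = -\int \frac{d\phi(q')}{dq'}\, dq'\, \psi(q'), \qquad (15)$$

provided the contributions from the limits of integration vanish. This gives

$$\langle\phi\frac{d}{dq}|q'\rangle = -\frac{d\phi(q')}{dq'},$$

showing that

$$\langle\phi\frac{d}{dq} = -\langle\frac{d\phi}{dq}. \qquad (16)$$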

Thus $d/dq$ operating to the left on the conjugate complex of a wave function has the meaning of minus differentiation with respect to $q$.
The validity of this result depends on our being able to make the passage from (14) to (15), which requires that we must restrict ourselves to bras and kets corresponding to wave functions that satisfy suitable boundary conditions. The conditions usually holding in practice are that they vanish at the boundaries. (Somewhat more general conditions will be given in the next section.) These conditions do not limit the physical applicability of the theory, but, on the contrary, are usually required also on physical grounds. For example, if $q$ is a Cartesian coordinate of a particle, its eigenvalues run from $-\infty$ to $\infty$, and the physical requirement that the particle has zero probability of being at infinity leads to the condition that the wave function vanishes for $q = \pm\infty$.
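The passage from (14) to (15) can be checked numerically for wave functions that vanish at the boundaries. The following is a rough sketch under stated assumptions: the two real Gaussian-type functions, the finite range $[-10, 10]$ standing in for $(-\infty, \infty)$, and the trapezoidal quadrature are all illustrative choices and not part of the argument.

```python
import math

def integrate(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

# Hypothetical real functions that vanish (to rounding error) at the ends of the range.
def phi(q):
    return math.exp(-q * q)

def dphi(q):
    return -2.0 * q * math.exp(-q * q)

def psi(q):
    return q * math.exp(-0.5 * q * q)

def dpsi(q):
    return (1.0 - q * q) * math.exp(-0.5 * q * q)

a, b = -10.0, 10.0
right_of_14 = integrate(lambda q: phi(q) * dpsi(q), a, b)     # integral of phi * dpsi/dq
right_of_15 = -integrate(lambda q: dphi(q) * psi(q), a, b)    # minus integral of dphi/dq * psi
boundary_term = phi(b) * psi(b) - phi(a) * psi(a)             # contribution from the limits

print(right_of_14, right_of_15, boundary_term)
```

The two integrals agree to within the quadrature error because the boundary term is negligible; with functions that do not vanish at the limits, `boundary_term` would account for the difference.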

The conjugate complex of the linear operator $d/dq$ can be evaluated by noting that the conjugate imaginary of $d/dq\,.\,\psi\rangle$ or $d\psi/dq\rangle$ is $\langle d\bar\psi/dq$, or $-\langle\bar\psi\, d/dq$ from (16). Thus the conjugate complex of $d/dq$ is $-d/dq$, so $d/dq$ is a pure imaginary linear operator.

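To get the representative of $d/dq$ we note that, from an application of formula (63) of § 20,

$$|q''\rangle = \delta(q - q'')\rangle, \qquad (17)$$

and hence

$$\frac{d}{dq}|q''\rangle = \frac{d}{dq}\delta(q - q'')\rangle, \qquad (18)$$

so that

$$\langle q'|\frac{d}{dq}|q''\rangle = \frac{d}{dq'}\delta(q' - q''). \qquad (19)$$

The representative of $d/dq$ involves the derivative of the $\delta$ function.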
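A discrete stand-in for this representative may help to fix ideas. The sketch below replaces the continuous range of $q$ by a finite grid and $d/dq$ by a central-difference matrix; this discretization, the grid size and spacing, and the Gaussian test function are all assumptions made for illustration, and the matrix is only a crude analogue of the derivative of the $\delta$ function. It shows two things: the matrix is real and antisymmetric, so that its conjugate complex (the transposed conjugate) is minus itself, in analogy with $d/dq$ being pure imaginary; and applied to a sampled wave function that vanishes at the boundaries it approximates the derivative.

```python
import math

n, h = 201, 0.05                      # hypothetical grid: n points with spacing h
grid = [(k - n // 2) * h for k in range(n)]

# Central-difference matrix: row k approximates d/dq at grid[k].
D = [[0.0] * n for _ in range(n)]
for k in range(n):
    if k + 1 < n:
        D[k][k + 1] = 1.0 / (2.0 * h)
    if k - 1 >= 0:
        D[k][k - 1] = -1.0 / (2.0 * h)

# Antisymmetry: transposing the real matrix D gives -D.
antisymmetric = all(D[i][j] == -D[j][i] for i in range(n) for j in range(n))

# Apply D to a sampled wave function that is negligible at the ends of the grid.
psi = [math.exp(-q * q) for q in grid]
dpsi_numeric = [sum(D[k][j] * psi[j] for j in range(n)) for k in range(n)]
dpsi_exact = [-2.0 * q * math.exp(-q * q) for q in grid]
max_error = max(abs(x - y) for x, y in zip(dpsi_numeric, dpsi_exact))

print(antisymmetric, max_error)
```

The only non-zero elements of `D` lie next to the diagonal, with opposite signs on the two sides, which is the discrete counterpart of the statement that the representative vanishes except where $q'$ and $q''$ coincide and changes sign across that point.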