This is an introduction to set theory and logic that starts completely from scratch. The text is accompanied by many methodological remarks and explanations. A rigorous axiomatic presentation of Zermelo-Fraenkel set theory is given, demonstrating how the basic concepts of mathematics have apparently been reduced to set theory. This is followed by a presentation of propositional and first-order logic. Concepts and results of recursion theory are explained in intuitive terms, and the author proves and explains the limitative results of Skolem, Tarski, Church and Gödel (the celebrated incompleteness theorems).
For students of mathematics or philosophy this book provides an excellent introduction to logic and set theory.
Cover design by Chris McLeod
CAMBRIDGE
UNIVERSITY PRESS

Set Theory, Logic and their Limitations
Moshe Machover
King's College London

Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia
© Cambridge University Press 1996
First published 1996
A catalogue record for this book is available from the British Library
Library of Congress cataloguing in publication data available
ISBN 0 521 47493 0 hardback
ISBN 0 521 47998 3 paperback
Transferred to digital printing 2003

Contents

Preface
0 Mathematical induction
1 Sets and classes
2 Relations and functions
3 Cardinals
4 Ordinals
5 The axiom of choice
6 Finite cardinals and alephs
7 Propositional logic
8 First-order logic
9 Facts from recursion theory
10 Limitative results
Appendix: Skolem's Paradox
Author index
General index

Preface
This is an edited version of lecture notes distributed to students in two of my courses, one on set theory, the other on quantification theory and limitative results of mathematical logic. These courses are designed primarily for philosophy undergraduates at the University of London who bravely choose the Symbolic Logic paper as one of their Finals options. They are also offered to mathematics undergraduates at King's College, London.
This then is a discourse addressed by a mathematician to an audience with a keen interest in philosophy. The style of technical presentation is mathematical. In particular, in logical notation and terminology I generally conform to the usage of mathematicians. (It seems that in this matter philosophers in any case tend to follow suit after some delay.) But philosophical and methodological issues are often highlighted instead of being glossed over, as is quite common in texts addressed primarily to students of mathematics.
A naive presentation of set theory may be in order if the main aim is instrumental: to acquaint would-be practitioners of mathematics with the basic tools of their chosen trade and to inculcate in them methods whereby nowadays the entire science is apparently reduced to set theory. In a course of that kind, the student is understandably not encouraged to scratch where it does not itch. But in the present course such an attitude would be out of place. To be sure, here as well set-theoretic concepts and results are needed as tools for formulating and proving results in mathematical logic. But it would be perverse not to alert would-be philosophers to the problematic aspects of set-theoretic reductionism.
These considerations have largely dictated the presentation of set theory: axiomatic, albeit unformalized. Critical notes about set-theoretic reductionism are sounded from time to time as a leitmotiv, rounded off in a coda on Skolem's Paradox. Also, the technical exposition of set theory is accompanied by historical remarks, mainly because a historical perspective is needed in order to appreciate the emergence of reductionism and the anti-reductionist critique.
In the exposition of mathematical logic, I have drawn heavily on Chs. 1, 2, 3 and 7 of B&M (see Note below), which I had used for many years as a main text for a postgraduate logic course. However, considerable portions of the material presented in B&M had to be omitted, either because they are too hard or specialized, or simply for lack of time.
My greatest regret is that there is not enough time to include both linear and rule-based logical calculi (my own favourite is the tableau method). For certain technical reasons I had to sacrifice the latter. However, as partial compensation, the linear calculi are developed in a way that makes it clear that the logical axioms are mere stepping-stones towards rules of deduction: once these rules are established, the axioms can be shelved. Thus in practice the presentation comes quite close to being rule-based. The axiom schemes have been designed so as to make their connection with deduction rules quite direct and transparent.
(The connoisseur will note that the propositional axiom schemes have been chosen so that omitting one, two or three of them results in complete systems for intuitionistic implication and negation, classical implication, and intuitionistic implication. In particular, the only axiom scheme that is not intuitionistically valid is a purely implicational one.)
Propositional logic is studied with reference to a purely propositional language, rather than a first-order language as in B&M. This is done for didactic reasons: although propositional languages in themselves are of little interest, students are less intimidated by this approach.
For some tedious proofs that have been omitted, the reader is referred to B&M. These omissions are more than balanced by the addition of extensive methodological and explanatory comments.
A case in point is Lemma 10.10.12 (see Note below), which is the main technical result needed for the present version of the Gödel-Rosser First Incompleteness Theorem. I have omitted its proof, but added a detailed analysis of the meaning of the lemma and the reason why its proof works. When this is understood, the proof itself becomes a mere technicality, almost a foregone conclusion. The analysis is resumed after the proof of the Gödel-Rosser Theorem, to explain the meaning of the Gödel-Rosser sentence and the reason for its remarkable behaviour.

One major respect in which this course is not self-contained is its heavy borrowing from recursion theory. For further details, see Preview at the beginning of Ch. 9.
The Problems are an essential part of the text; the results contained in many of them are used later on.

Moshe Machover

Note
• Throughout, 'B&M' refers to J. L. Bell and M. Machover, A course in mathematical logic, North-Holland, 1977 (second printing 1986).
• The system of cross-references used here is quite common in mathematical texts. It is illustrated by the following example. 'Def. 2.3.4' refers to the fourth numbered article (which in this case is a definition) in § 3 of Ch. 2. Within Ch. 2, this definition is referred to, more briefly, as 'Def. 3.4'.
• I would like to express my gratitude to Roger Astley, Michael Behrend and Tony Tomlinson of Cambridge University Press for their expert help in preparing the manuscript.
Warning
In the last three chapters of this book there is a systematic interplay between parallel sets of symbols; one set consisting of symbols in ordinary (feint) typeface:
'=', '¬', '∨', '∧', '→', '∃', '∀', '×', '+'
and the other of their bold-face counterparts:
'=', '¬', '∨', '∧', '→', '∃', '∀', '×', '+'.
For explanations of the purpose of this system of notation, and warnings against confusing a feint symbol with its bold-face counterpart, see Warnings 8.1.2, 9.1.4 and 10.1.11 and Rem. 10.1.10.
Unfortunately the bold-face characters could not always be made as distinct from their feint counterparts as would be desirable. The reader is therefore urged to exercise special vigilance to discern which typeface is being used in each instance.

0 Mathematical induction

§ 1. Intuitive illustration; preliminaries

A familiar trick: dominoes standing on end are arranged in a row; then the initial domino (here labelled '0') is given a gentle push - and the whole row comes cascading down.

[Figure: a row of dominoes standing on end, labelled 0, 1, 2, ..., n, n + 1, ...]
If you want to perform this trick, how can you make sure that all the dominoes standing in a row will fall? Clearly, the following two conditions are jointly sufficient.

1. The initial domino (domino 0) is made to fall to the right (for example, by giving it a push).
2. The dominoes are arranged in such a way that whenever any one of them (say domino n) falls to the right, it brings down the next domino after it (domino n + 1) and causes it also to fall to the right.

A moment's reflection shows that these two conditions are sufficient whether the row of dominoes is finite or proceeds ad infinitum. (In the former case, Condition 2 does not apply to the last domino.)
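For a finite row, the effect of the two conditions can be imitated by a small simulation. The following is only an illustrative sketch: the function name simulate_row and the flag push_first are hypothetical, chosen for this example, and a finite simulation is of course no substitute for the principle discussed next.

```python
def simulate_row(length, push_first=True):
    """Simulate a finite row of dominoes; return which ones have fallen.

    Condition 1: domino 0 is pushed (only if push_first is True).
    Condition 2: whenever domino n falls, it brings down domino n + 1.
    """
    fallen = [False] * length
    if length > 0 and push_first:
        fallen[0] = True              # Condition 1
    for n in range(length - 1):       # Condition 2 does not apply to the last domino
        if fallen[n]:
            fallen[n + 1] = True      # Condition 2
    return fallen


if __name__ == "__main__":
    print(all(simulate_row(20)))                    # both conditions hold
    print(any(simulate_row(20, push_first=False)))  # Condition 2 alone
```

With both conditions in force every domino falls; if the initial push is withheld, Condition 2 on its own brings down nothing.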
The reasoning that allows us to infer from Conditions 1 and 2 that all the dominoes will fall is based on the Principle of Mathematical (or Complete) Induction. This is a fundamental - arguably the most fundamental - fact about the so-called natural numbers (0, 1, 2, etc.). It has several equivalent forms, three of which will be presented here.

WARNING
The term 'induction' used here has nothing to do with inductive reasoning in the empirical sense.

We shall make use of the following terminology and notation. By number we shall mean natural number. The class {0, 1, 2, ...} of all numbers will be denoted by 'N'. We shall use lower-case italic letters as variables ranging over N. If P is a property of numbers and n is any number, we write 'Pn' to mean that n has the property P. The extension of P is the class of all numbers n such that Pn. This class is denoted by '{n : Pn}'. From an extensional point of view, P is identified with its extension: P = {n : Pn}; and hence Pn is equivalent to n ∈ P. (Here '∈' is short for 'is a member of'.)
We write '⇒' as short for 'implies that', 'iff' or '⇔' as short for 'if and only if', '∀' as short for 'for all', and 'm ≤ n' as short for 'm < n or m = n'.
We state here as 'facts' the following elementary properties of the ordered system of numbers.

1.1. Fact
The relation < between numbers is transitive: whenever k < m and m < n, then also k < n.

1.2. Fact
The relation < obeys the trichotomy: for any numbers m and n, exactly one of the following three holds:
m < n or m = n or n < m.

1.3. Fact
Every number n has an immediate successor n + 1, such that, for any m, n < m iff n + 1 ≤ m.

1.4. Fact
Zero is the least number: 0 ≤ n for all n.

1.5. Fact
For any number m ≠ 0 there is an n such that m = n + 1.
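Facts 1.1-1.5 are stated here without proof. As a sanity check only, they can be tested on an initial segment of the numbers. The sketch below assumes that Python's built-in integers and comparisons model the numbers and the relation <; the function name check_order_facts and the cut-off limit are hypothetical. A finite check of this kind is not a proof.

```python
def check_order_facts(limit=30):
    """Test Facts 1.1-1.5 for all numbers below `limit`."""
    numbers = range(limit)
    return {
        # Fact 1.1: k < m and m < n imply k < n.
        "1.1 transitivity": all(
            k < n
            for k in numbers for m in numbers for n in numbers
            if k < m and m < n
        ),
        # Fact 1.2: exactly one of m < n, m = n, n < m holds.
        "1.2 trichotomy": all(
            (m < n) + (m == n) + (n < m) == 1
            for m in numbers for n in numbers
        ),
        # Fact 1.3: n < m iff n + 1 <= m.
        "1.3 successor": all(
            (n < m) == (n + 1 <= m)
            for n in numbers for m in numbers
        ),
        # Fact 1.4: 0 <= n for all n.
        "1.4 zero is least": all(0 <= n for n in numbers),
        # Fact 1.5: every m other than 0 is of the form n + 1.
        "1.5 predecessor": all(
            any(m == n + 1 for n in numbers)
            for m in numbers if m != 0
        ),
    }


if __name__ == "__main__":
    for name, holds in check_order_facts().items():
        print(name, holds)
```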

§2. Weak induction

Perhaps the most commonly used form of the Principle of Mathematical Induction is the so-called 'Weak' Principle of Induction. This asserts, for any property P of numbers, that in order to prove ∀nPn (i.e., that all numbers have the property P), it is sufficient to prove two things: first, P0 (i.e., that the number zero has P) and second, ∀n[Pn ⇒ P(n + 1)] (i.e., that whenever n is a number having the property P then its successor n + 1 also has P). In schematic form:

        P0,  ∀n[Pn ⇒ P(n + 1)]
(2.1)   ------------------------
        ∀nPn

A proof of a statement ∀nPn by weak induction thus falls into two sections. One section, called the basis of the inductive proof, is a proof that P0 holds. The other section, called the induction step, is a proof that ∀n[Pn ⇒ P(n + 1)]. When these two sections are completed (not necessarily in the above order), the proof that ∀nPn is complete.
In the induction step, in order to prove that ∀n[Pn ⇒ P(n + 1)], you have to show that if n is any number such that Pn holds, then P(n + 1) holds as well. In other words, you have to deduce P(n + 1) from the assumption that Pn holds. The latter assumption is called the induction hypothesis.
The induction step is therefore performed as follows. You consider an arbitrary number, say n, about which you make just one assumption: that Pn holds (the induction hypothesis). Using this assumption, you try to deduce that P(n + 1). When this is achieved, the induction step is complete.
In using the induction hypothesis Pn to deduce P(n + 1), you are merely considering an arbitrary hypothetical n for which Pn holds, without however committing yourself to the assumption that such a number exists; in other words, you are adopting Pn as a provisional hypothesis. If you succeed in deducing P(n + 1) from this provisional hypothesis, then you have established the conditional statement Pn ⇒ P(n + 1); and as you have established this for an arbitrary number n, you are entitled to infer that ∀n[Pn ⇒ P(n + 1)].
Note that if you have completed the induction step only (without the basis - that is, you have not proved that P0) then you are not entitled to conclude that Pn holds for all numbers n; indeed you are not even entitled to conclude that there exist any numbers n for which Pn holds. For example, let P be the property of being a number that is greater than itself; so Pn means that n > n. Now, from the hypothesis n > n it is easy to deduce n + 1 > n + 1 (for example, by adding 1 to both sides of the hypothesis); so we have shown that ∀n[Pn ⇒ P(n + 1)]. But it doesn't follow that there is any number greater than itself.

2.2. Remark
The Weak Principle of Induction was first invoked in 1653 by Pascal in the proof of one of the results (Corollary 12) in his Traité du triangle arithmétique (published in 1665). Pascal does not give an explicit formulation of the principle in general, for arbitrary P; but from his presentation of the method of proof it is clear that the general principle is being invoked. We shall not reproduce Pascal's proof here. Instead, we shall illustrate the use of weak induction in proving a simpler result.

2.3. Example
We shall prove that, for all n,

(*)   0 + 1 + 2 + ··· + n = n(n + 1)/2.

PROOF
Define the property P by stipulating that Pn iff (*) holds for n. We show by weak induction that ∀nPn.
Basis. For n = 0 the sum on the left-hand side reduces to 0, and the value of the right-hand side is 0. Thus P0.
Induction step. Let n be any number such that Pn; thus our induction hypothesis is that (*) holds for this n. Then

0 + 1 + 2 + ··· + n + (n + 1) = n(n + 1)/2 + (n + 1)    by ind. hyp.,
                              = (n + 1)(n/2 + 1)
                              = (n + 1)(n + 2)/2.

(The last two steps consist of simple algebraic manipulation.) Thus from the induction hypothesis we have deduced that

0 + 1 + 2 + ··· + (n + 1) = (n + 1)(n + 2)/2.

This equation says that P(n + 1) - it is the same as (*), but with n + 1 in place of n. So we have shown that Pn ⇒ P(n + 1).
■
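The two sections of this proof can be mirrored numerically. The sketch below is an illustration only, not part of the proof: it checks the basis, the algebraic identity used in the induction step, and the formula (*) itself for the first thousand numbers. The names sum_up_to, closed_form and CHECK_LIMIT are hypothetical.

```python
def sum_up_to(n):
    """Left-hand side of (*): 0 + 1 + 2 + ... + n."""
    return sum(range(n + 1))


def closed_form(n):
    """Right-hand side of (*): n(n + 1)/2 (always a whole number)."""
    return n * (n + 1) // 2


CHECK_LIMIT = 1000

# Basis: both sides are 0 for n = 0.
basis_holds = sum_up_to(0) == closed_form(0) == 0

# Induction step: adding n + 1 to n(n + 1)/2 gives (n + 1)(n + 2)/2.
step_holds = all(
    closed_form(n) + (n + 1) == closed_form(n + 1)
    for n in range(CHECK_LIMIT)
)

# The formula itself, checked directly on an initial segment.
formula_holds = all(
    sum_up_to(n) == closed_form(n) for n in range(CHECK_LIMIT)
)

if __name__ == "__main__":
    print(basis_holds, step_holds, formula_holds)
```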

§ 3. Strong induction
The so-called 'Strong' Principle of Induction can be stated schematically as follows:

        ∀n[∀m < n Pm ⇒ Pn]
(3.1)   --------------------
        ∀nPn

Here, as before, P is any property of numbers. We have written '∀m < n Pm' as short for 'all numbers m smaller than n have the property P'.
Thus, to prove that all numbers have a given property P, it is enough to prove that ∀n[∀m < n Pm ⇒ Pn]. To do this, you have to show that if n is any number such that ∀m < n Pm holds, then Pn holds as well; in other words, you have to deduce Pn from the assumption that ∀m < n Pm. This assumption is called the induction hypothesis.
Note that a proof by strong induction does not have a separate 'basis' section.
As in the case of weak induction, here too the induction hypothesis ∀m < n Pm is adopted provisionally, without presupposing it to be actually true.
However, unlike the case of weak induction, here there is one particular value of n for which the hypothesis ∀m < n Pm is in fact always automatically true. To see this, observe that there does not exist any m such that m < 0; this follows at once from Facts 1.2 and 1.4. Therefore any statement of the form 'for all m < 0, ...' (that is, '∀m < 0 ...') is considered by convention to be vacuously true. In particular, ∀m < 0 Pm is always true.
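The convention of vacuous truth is the same one that Python's built-in all() follows for an empty collection, which makes for a convenient illustration. In the sketch below the helper holds_for_all_below is a hypothetical name; it renders the hypothesis ∀m < n Pm as a test over range(n).

```python
def holds_for_all_below(P, n):
    """Return True iff Pm holds for every number m < n."""
    return all(P(m) for m in range(n))


def never(m):
    """A property that no number has."""
    return False


if __name__ == "__main__":
    # There is no m < 0, so the hypothesis holds vacuously for n = 0,
    # even for a property that no number has.
    print(holds_for_all_below(never, 0))
    # For n = 1 the hypothesis requires P0, which fails here.
    print(holds_for_all_below(never, 1))
```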

3.2. Theorem
The Strong Principle of Induction follows from the Weak Principle of Induction.

PROOF
Assume that P is a property of numbers such that ∀n[∀m < n Pm ⇒ Pn] holds. We shall show, using weak induction, that ∀nPn holds as well. To this end, we define a new property Q by stipulating that, for any number n,

(*)   Qn ⇔df ∀m < n Pm.

(The subscript 'df' is short for 'definition'.) Note that our assumption regarding P can now be rewritten as

(**)   ∀n[Qn ⇒ Pn].

We shall apply weak induction to Q, to prove that ∀nQn holds.
First, observe that by (*) Q0 is the same as ∀m < 0 Pm, which - as we have noted - is vacuously true.
Next, let n be a number and suppose (as induction hypothesis) that Qn holds. From this hypothesis we shall deduce that Q(n + 1) holds as well.
Using our induction hypothesis we infer from (**) that Pn holds. We therefore have both Qn and Pn. But by (*) Qn means ∀m < n Pm. Therefore what we have shown is that

(***)   Pm holds for all m ≤ n.

From Facts 1.2 and 1.3 it is easy to see that m ≤ n is equivalent to m < n + 1, hence (***) can be rephrased as

Pm holds for all m < n + 1,

which, by the definition (*) of Q, means that Q(n + 1) holds. This completes the proof of ∀nQn by weak induction.
From ∀nQn, which we have just proved, together with (**) it follows at once that Pn holds for all n.
■
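The pivot of this proof is that Q(n + 1) amounts to 'Qn and Pn'. This can be spot-checked for a few sample properties. The sketch below is illustrative only; the names Q, step_agrees and the sample properties are hypothetical, and checking finitely many cases proves nothing in general.

```python
def Q(P, n):
    """Qn as in definition (*): Pm holds for all m < n."""
    return all(P(m) for m in range(n))


def step_agrees(P, limit=50):
    """Check that Q(n + 1) iff (Qn and Pn), for all n < limit."""
    return all(
        Q(P, n + 1) == (Q(P, n) and P(n))
        for n in range(limit)
    )


# Sample properties, chosen arbitrarily for the illustration.
sample_properties = [
    lambda n: True,          # holds for every number
    lambda n: n < 10,        # fails from 10 onwards
    lambda n: n % 2 == 0,    # fails already at 1
]

if __name__ == "__main__":
    for P in sample_properties:
        # Q0 is vacuously true whatever P is.
        print(Q(P, 0), step_agrees(P))
```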

§ 4. The Least Number Principle
Let M be any class of numbers; that is, M ⊆ N (M is a subclass of N). By a least member of M we mean a number a ∈ M such that a ≤ m for all m ∈ M.
Using Fact 1.2, it is easy to see that M cannot have more than one least member; so if M has a least member we can refer to the latter as the least member of M.

The Least Number Principle (LNP) states: If M ⊆ N and M is non-empty then M has a least member.
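When membership in M can be tested mechanically, the least member can be found by examining 0, 1, 2, ... in turn. The following sketch assumes that M is given by a hypothetical membership test is_member; the parameter search_limit is an artificial cut-off added only so that the search also stops when M has no member below it.

```python
def least_member(is_member, search_limit=10_000):
    """Return the least n < search_limit with is_member(n), or None.

    Numbers are examined in increasing order, so the first one found
    is less than or equal to every other member of the class.
    """
    for n in range(search_limit):
        if is_member(n):
            return n
    return None


if __name__ == "__main__":
    # M = {n : n * n > 50}; the search stops at the first such n.
    print(least_member(lambda n: n * n > 50))
    # The empty class has no least member.
    print(least_member(lambda n: False, search_limit=100))
```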

4.1. Theorem
The LNP follows from the Strong Principle of Induction.
PROOF
Let M ⊆ N and suppose that M does not have a least member. We must show M is empty. To this end, let P be the property of not belonging to M. Thus, for any n,

Pn ⇔df n ∉ M.

To show that M is empty is tantamount to showing that ∀nPn holds. We shall do so by applying strong induction to P.
So let n be any number, and assume (as induction hypothesis) that ∀m < n Pm holds. By the definition of P, our induction hypothesis means that for all m < n we have m ∉ M. This is equivalent to saying that m < n is not the case for any m ∈ M. But by Fact 1.2 this means that n ≤ m for all m ∈ M. Therefore n cannot belong to M, otherwise it would be the least member of M, contrary to our assumption that M has no such member. Hence Pn holds, and our induction is complete.
■
We shall now complete the cycle by proving:

4.2. Theorem
The Weak Principle of Induction follows from the LNP.
PROOF
Let P be a property of numbers such that P0 and ∀n[Pn ⇒ P(n + 1)] hold. We must prove that ∀nPn holds. This amounts to showing that the class

M =df {n : Pn does not hold}

is empty. By the LNP, it is enough to show that M has no least member.
Suppose that M does have a least member, m. Since P0 holds, 0 is not in M; hence m ≠ 0. Therefore by Fact 1.5 there is a number n such that m = n + 1.
From Fact 1.3 it follows at once that n < m. If n were in M, then we would have m ≤ n, because m is the least member of M; but m ≤ n is excluded by Fact 1.2, since we already have n < m. Therefore n cannot be in M, which means that Pn must hold.
From our assumption that ∀n[Pn ⇒ P(n + 1)] it now follows that P(n + 1) holds; in other words, Pm holds. But then m cannot be a member of M, let alone the least member. Thus our assumption that M has a least member leads to contradiction.
■

We have thus shown that the Weak Principle of Induction, the Strong Principle of Induction and the LNP are equivalent to one another.

4.3. Remark
While there is no evidence that the ancient Greek mathematicians knew the Principles of Weak and Strong Induction, they did use mathematical induction in the form of the LNP. We shall quote here from a proof of Proposition 31 in Euclid's Elements, Book VII.
First we need a few definitions. By arithmós (plural: arithmoi) the Greeks meant what we call natural number greater than 1. An arithmos b is said to measure an arithmos a if b < a and b goes into a (in modern terminology: b is a proper divisor of a). An arithmos a is said to be composite if there is an arithmos that measures it; otherwise, a is said to be prime.
In Proposition 31 of Book VII, Euclid claims that every composite arithmos is measured by some prime arithmos. He writes:
'Let a be a composite arithmos. I say that it is measured by some prime arithmos. For since a is composite, it will be measured by an arithmos, and let b be the least of the arithmoi measuring it.'
Here the LNP is clearly invoked. The proof is now easily concluded: b must be prime; otherwise, it would be measured by some smaller arithmos c, which must then also measure a - contrary to the choice of b as the least of the arithmoi measuring a.
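Euclid's argument can be followed on concrete numbers. The sketch below is a modern illustration, not a rendering of Euclid's text: it uses the definitions just given (an arithmos is a number greater than 1, and b measures a if b < a and b divides a), and the function names are hypothetical.

```python
def least_measure(a):
    """Return the least arithmos measuring a, or None if a is prime.

    Candidates are tried in increasing order from 2, so the first
    proper divisor found is the least one.
    """
    for b in range(2, a):
        if a % b == 0:
            return b
    return None


def is_prime_arithmos(a):
    """An arithmos is prime if no arithmos measures it."""
    return a > 1 and least_measure(a) is None


def check_proposition(limit=500):
    """For every composite a < limit, the least measure of a is prime."""
    return all(
        is_prime_arithmos(least_measure(a))
        for a in range(2, limit)
        if least_measure(a) is not None
    )


if __name__ == "__main__":
    print(least_measure(91), is_prime_arithmos(least_measure(91)))
    print(check_proposition())
```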
Euclid also gives another proof of the same proposition, in which he uses yet another form of the Principle of Induction: There does not exist an infinite decreasing sequence of natural numbers.1
1 On these matters see David Fowler, 'Could the Greeks have used Mathematical Induction? Did they use it?', Physis, vol. 31, 1994, pp. 252-265.

1 Sets and classes

§ 1. Introduction

1.1. Preview

Set theory occupies a fundamental position in the edifice of modern mathematics. Its concepts and results are used nowadays in virtually all standard mathematical discourse - not only in pure mathematics, but also in applied mathematics and hence in all the mathematics-based deductive sciences. In particular, set theory is used extensively in technical discussions of logic and analytical philosophy.
The purpose of Chs. 1-6 is to present a minimal core of set theory, adequate for the kind of application just mentioned. In particular, we shall provide the set-theoretical vocabulary, notation and results needed in later chapters, devoted to Symbolic Logic.
We shall not venture into the higher reaches of the theory, which are of interest to specialist set-theorists. Nor shall we attempt a systematic logical-axiomatic investigation of set theory itself.

1.2. Further reading
There are hundreds of books on set theory, many of them very good. Among those pitched at a level similar to this course, there are two classics:
Abraham A Fraenkel, Abstract set theory,
Paul R Halmos, Naive set theory.
Both contain more material than our course. Fraenkel's book is suitable for readers with relatively little previous mathematical knowledge. If you are mathematically more experienced, you may find it too slow or verbose. Halmos is then likely to be more suitable.
For a more advanced, logical-axiomatic study of set theory, the two original masterpieces are:
Kurt Gödel, The consistency of the continuum hypothesis (1940),
Paul J Cohen, Set theory and the continuum hypothesis (1966).
An alternative exposition of Gödel's results and some additional related material is in Chapter 10 of B&M. An alternative exposition of Cohen's results and much additional related material is in John L Bell, Boolean-valued models and independence proofs in set theory.

1.3. Intuitive explanation
Intuitively speaking, a set is a definite collection, a plurality of objects of any kind, which is itself apprehended as a single object.
For example, think of a lot of sheep grazing in a field. They are a collection of sheep, a plurality of individual objects. However, we may (and often do) think of them - it - as a single object: a herd of sheep.1
Note that in order to qualify as a set, the collection in question must be definite. By this we mean that, if a is any object whatsoever, then a either definitely belongs to the collection or definitely does not. For this reason there is no such thing as the set of all blue cars, if 'blue' and 'car' are understood in their everyday fuzzy sense: my car is sort of bluish, and a friend of mine has a vehicle that is half-way between a car and a sad joke. (Most collections and concepts that are used in everyday thinking and discourse are fuzzy; some philosophers have therefore attempted to construct a theory of so-called fuzzy sets which are clearly not sets at all in the present sense of the term. This difficult subject lies outside the scope of our course.)
From now on, whenever we speak of a collection (or plurality) we shall tacitly take it to be definite, in the sense just explained. We shall also use the word class as synonymous with collection.
The objects belonging to a class may be of any kind whatsoever - physical or mental, real or ideal. In fact, being an object (in the sense in which we shall use this term) is tantamount to being capable of belonging to a collection.
In particular, since a set is a class regarded as a single object, it can itself belong to a class. So we can have a class some, or even all, of whose members are sets. If such a class, in turn, is regarded as a single object, we get a set having sets as (some of its) members. Thus, there are sets of sets (sets all of whose members are sets), sets of sets of sets, and so on.
1 Cf. Eric Partridge, Usage and abusage: 'COLLECTIVE NOUNS; ... Such collective nouns as can be used either in the singular or in the plural (family, clergy, committee, Parliament), are singular when unity (a unit) is intended; plural, when the idea of plurality is predominant.'
The objects dealt with by set theory are therefore of two sorts: sets, and objects that are not sets. An object of the latter sort is called an individual; the German term Urelement (plural: Urelemente) is often used as well for such an object. Somewhat surprisingly, it has turned out that, as far as applications to pure mathematics are concerned, individuals are in principle dispensable, so that set theory can confine itself to sets only. We shall not make any ruling on this matter. Unless otherwise stated, what we shall say will apply regardless of whether, or how many, individuals are present.

1.4. Definition
We write 'a ∈ A' as short for '[the object] a belongs to [the class] A'. The same proposition is also expressed by saying that a is a member of A, or an element of A, or that A contains a. We write 'a ∉ A' to negate the proposition that a ∈ A.
A class is specified by means of a definite property, say P, for which it is stipulated that the condition Px is necessary and sufficient for any object x's membership in the class.

1.5. Definition
If P is any definite property, such that the condition Px is meaningful for an arbitrary object x, then the extension of P, denoted by
'{x : Px}',
is the class of all objects x such that Px. Thus a ∈ {x : Px} iff Pa.
Classes having exactly the same members are regarded as identical. Let us state this more formally:

1.6. Principle of Extensionality (PX)
If A and B are any classes such that, for every object x,
x ∈ A ⇔ x ∈ B,
then A = B.

For example, the two classes

{x : x is an integer such that x² = x},

{y : y is an integer such that -1 < y < 2}

are equal: although the two defining conditions differ in meaning, they are satisfied by the same objects - the integers 0 and 1.
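Python's built-in sets are compared extensionally, so this example can be reproduced almost verbatim. One assumption is needed and should be kept in mind: the integers cannot be enumerated in full, so the sketch below restricts both defining conditions to a finite window of integers (the name WINDOW is hypothetical), which is wide enough to contain every integer satisfying either condition.

```python
# A finite window of integers standing in for the class of all integers.
WINDOW = range(-100, 101)

# {x : x is an integer such that x squared equals x}
A = {x for x in WINDOW if x * x == x}

# {y : y is an integer such that -1 < y < 2}
B = {y for y in WINDOW if -1 < y < 2}

if __name__ == "__main__":
    # The defining conditions differ, but the members are the same.
    print(sorted(A), sorted(B), A == B)
```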

1.7. Remark
Set theory (along with other parts of present-day mathematics) is dominated by a structuralist ideology, which entails an extensionalist view of properties. This means that properties having equal extensions are considered to be equal; thus a property and its extension uniquely determine each other.

§ 2. The antinomies; limitation of size
Since ancient times, mathematicians have dealt with infinite pluralities as a matter of course - an obvious example is the class of positive integers. However, until well into the 19th century there was great reluctance to regard such pluralities as single objects, as sets in the sense explained in 1.3. The infinitude of a class meant that more and more of its members could be constructed or conceived of, without limit. But to apprehend such a plurality as a single object seems to imply that all its members have 'already' been constructed or conceived of, or at least that they are somehow all 'out there'. This idea of a completed or actual - rather than potential - infinity was (rightly!) regarded with utmost suspicion.
However, the needs of mathematics as it developed in the 19th century drove Georg Cantor (1845-1918) to create his Mengenlehre, set theory, which admits infinite classes as objects. Despite early hostility, set theory was soon accepted by the majority of mathematicians as a powerful and indispensable tool; indeed, many regard it as a framework and foundation for the whole of mathematics.
The success of set theory first lured its adherents into assuming that every class can be regarded as a set. This assumption, known as the Comprehension Principle, is however untenable: it leads to certain logical contradictions or antinomies. The first such antinomy to be discovered is called the Burali-Forti Paradox, after the person who first published it, in 1897; but Cantor himself had been aware of it at least two years earlier. The antinomy results directly from the assumption that the class W of all ordinals is a set. (The theory of ordinals is an important but quite technical part of set theory. In Ch. 4, when we study the ordinals, we shall prove that W cannot be a set.) Similar antinomies were later discovered by Cantor himself and by others.
Cantor was not too disturbed by these discoveries. He noticed that the antinomies arose from applying the Comprehension Principle to classes that were not just infinite but extremely vast. (An early result of his set theory was that not all infinite classes have the same 'size'.) He concluded that some classes are not merely infinite but absolutely infinite, hence simply too large to be comprehended as a single object. Set theory would be on safe ground if the Comprehension Principle were restricted to classes of moderate size.1 However, he did not specify precisely how to draw the line between moderately large infinite classes, which can be regarded as sets with impunity, and vast ones, which cannot be so regarded.
Matters came to a head in 1903, when Bertrand Russell published a new antinomy, Russell's Paradox, which he had discovered two years earlier. Whereas previous antinomies arose in rather technical reaches of set theory and therefore required lengthy expositions, Russell's Paradox checkmated the Comprehension Principle in two simple moves, as follows. Let

S =df {x : x is a set such that x ∉ x}.

Assuming that S is a set, it follows that S ∈ S iff S satisfies the defining condition of S - that is, iff S ∉ S. This is absurd.
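The absurdity can be made quite tangible: the statement 'S ∈ S' would need a truth value that equals its own negation, and neither of the two truth values does. The sketch below checks just this and nothing more; the function name consistent_answers is hypothetical.

```python
def consistent_answers():
    """Truth values for 'S is a member of S' that satisfy
    the condition: S is in S iff S is not in S."""
    return [
        s_in_s
        for s_in_s in (True, False)
        if s_in_s == (not s_in_s)
    ]


if __name__ == "__main__":
    # Neither answer is consistent, so the list is empty.
    print(consistent_answers())
```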

The fact that an antinomy follows so easily from apparently sound assumptions plunged set theory and logic (which cannot be sharply demarcated from set theory) into a crisis.
In 1908, two solutions were proposed to this crisis. Both amounted to imposing restrictions on the Comprehension Principle - but in two very different ways. The first, proposed by Russell himself and embodied in his type theory, refused to accept {x : Px} as an object if the condition Px is impredicative (that is, refers to a totality to which the object, if it did exist, would belong).2 Russell's type theory, elaborated by Whitehead and him in their three-volume Principia Mathematica (1910, 1912, 1913) as a total system for logic and mathematics, turned out to be quite complicated and cumbersome; and, at least in part because of this, has won very few adherents.
1 See Michael Hallett, Cantorian set theory and limitation of size.
2 Russell's paper, 'Mathematical logic as based on the theory of types', is reprinted in van Heijenoort, From Frege to Gödel.
The other solution, proposed by Ernst Zermelo, embodied an idea similar to that entertained by Cantor: limitation of size.1 Zermelo proceeded to develop set theory axiomatically: he laid down postulates, or [extralogical] axioms, from which the theorems of set theory were to be deduced by elementary logical means. Besides an Axiom of Extensionality (for sets), Zermelo's axioms include certain particular cases of the Comprehension Principle, which are regarded as safe because - as far as one can tell - they do not allow the formation of over-large sets and do not give rise to antinomies. In addition, Zermelo postulated a special axiom, the Axiom of Choice, which is not a restricted form of the Comprehension Principle, but is needed for proving certain important results in set theory itself and in other branches of mathematics.2
In 1921-2, Abraham Fraenkel, Thoralf Skolem and Nels Lennes (independently of one another) proposed one further postulate, the Axiom of Replacement, which is vital for the internal needs of set theory rather than for applications to other branches of mathematics. This postulate is another apparently safe special case of the Comprehension Principle.3

The resulting theory - known as Zermelo-Fraenkel set theory (ZF) - has proved to be very convenient and has been adopted almost universally by users of set theory.
While Zermelo's axiomatic approach is, as far as we can tell, sufficient for blocking the logical antinomies, such as the Burali-Forti and Russell Paradoxes, it does not ward against another sort of antinomy, which may be called linguistic or semantic.
Here is a modified version of a linguistic antinomy published in 1906 by Russell, who attributed it to G. G. Berry. Some English expressions define natural numbers; for example, 'zero', 'the square of eighty-seven', 'the least prime number greater than eighty-seven million'. Only finitely many numbers can be defined by English expressions that use fewer than 87 letters, since clearly there are only finitely many such expressions. Hence the class M of natural numbers not so definable must be non-empty. By the Least Number Principle (see §4 of Ch. 0), M has a unique least member: the least natural number not definable by an English expression using fewer than eighty-seven letters. But observe: the italicized part of the previous sentence is an English expression using just 86 letters, which (presumably) defines a number that cannot be defined by an English expression using fewer than 87 letters!

1 Russell too had briefly toyed with the same idea in 1905.
2 A translation of Zermelo's paper, 'Investigations in the foundations of set theory I', is printed in van Heijenoort, From Frege to Gödel.
3 This postulate, as well as Zermelo's Axiom of Separation and Axiom of Union Set, had in fact been foreshadowed in 1899 by Cantor, in a letter to Dedekind, a translation of which is printed in van Heijenoort, From Frege to Gödel.

On the face of it, this antinomy affects arithmetic rather than set theory. However, as we shall see in §3 of Ch. 4 and §1 of Ch. 6, the arithmetic of natural numbers can be simulated within set theory, so that Berry's antinomy threatens set theory as well.
We cannot go here into a detailed discussion of the linguistic antinomies. Suffice it to say that the source of the trouble is that the notion of definite property, and hence also that of class (as the extension of such a property) has been left too loose and vague. Thus, for example, the property of being definable by an English expression using fewer than eighty-seven letters does not have a rigorously defined meaning.
These antinomies can be blocked by laying down precise conditions as to what may count as a definite property (or a class).1 This may be done by specifying a formal language with precise structure and rules, and allowing as definite properties only such as can be expressed formally in this language. For a formalized presentation of ZF see, for example, Chapter 10 of B&M.
We shall present a fairly rigorous but unformalized version of ZF. However, if desired it would be easy in principle (though tedious in practice) to formalize our treatment.

1 The first to formulate such precise conditions was Hermann Weyl in Das Kontinuum (1918). A similar (and somewhat more formal) characterization was given independently by Skolem in a 1922 paper whose translation, 'Some remarks on axiomatized set theory', is printed in van Heijenoort, From Frege to Gödel.

§ 3. Zermelo's axioms
Here we present (with minor modifications) Zermelo's axioms except for the Axiom of Choice, which we shall discuss in Ch. 5.
First, we shall assume that our universe of discourse - the class of all objects with which set theory deals - is non-empty. We do not announce this assumption officially as a special postulate, because it is conventional to consider it as a logical presupposition.
The objects in the universe of discourse are of two distinct sorts: sets and individuals. Classes are admitted as extensions of properties: if P is a definite property of objects, then we admit the class A = {x : Px}. Note that, by Def. 1.5, to say that a ∈ A is just another way of saying that Pa (the object a has the property P).
In order to block the semantic antinomies we must however insist that P be defined in purely set-theoretic terms, without using extraneous concepts.
The universe of discourse itself can be presented as a class according to this format: it is {x : x = x}.
Although we refer to a class in the singular, this is merely a manner of speaking and does not imply that the class is necessarily a single object. From the axioms it will follow, however, that certain classes are sets, and hence objects of set theory. Each set is identified with the class of all its members.
The universe may also contain other objects, called individuals. An individual is not a set and has no members. As we shall see shortly, there is also a set that has no members - the empty set.
A class that is not a set is called a proper class; a proper class is not an object, and therefore cannot be a member of any class.

As our first postulate we adopt the Principle of Extensionality 1.6. We shall refer to it briefly as 'PX'.
Zermelo postulated PX for sets only, as he did not consider classes (except the universe of discourse) and used properties instead.

Before stating our next postulate, we introduce a useful piece of notation.

3.1. Definition
If n is any natural number and a1, a2, ..., an are any objects, not necessarily distinct, we put
{a1, a2, ..., an} =df {x : x ≠ x or x = a1 or x = a2 or ... or x = an}.
In particular, for n = 0 we get the empty class { } = {x : x ≠ x}, which we denote by '∅'. (No object can differ from itself!)

3.2. Axiom of Pairing (A2)
For all objects a and b the class {a, b} is a set.

3.3. Remarks
(i) This set is called the pair of a and b. By PX we have
{a, b} = {b, a}.
(ii) For any object a we clearly have {a} = {a, a}, which is a set by A2. This set is called the singleton of a.
(iii) From our assumption that there exists at least one object a, it now follows that there exists at least one set, namely {a}. Note however that we cannot prove the existence of an individual: our postulates are neutral on this matter.

3.4. Definition
Let A and B be classes. If every member of B is also a member of A, we say that B is a subclass of A (also, B is included in A, or A includes B), briefly: B ⊆ A.
If B ⊆ A but A ≠ B, we say that B is a proper subclass of A (also, B is properly included in A, or A properly includes B), briefly: B ⊂ A.

3.5. Warnings
(i) Beware of confusing 'contains' and 'includes'; the former refers to the relation of membership ∈ while the latter refers to the relation ⊆ just defined.
(ii) However, this terminological distinction is not observed by all authors, so watch out for other usages.
(iii) Also, the notation introduced in Def. 3.4 is not universally accepted. Some authors use '⊂' instead of '⊆' for not-necessarily-proper inclusion; and '⊊' instead of '⊂' for proper inclusion.

The following postulate was one of Zermelo's central ideas.

3.6. Axiom of Subsets (AS)
If B ⊆ A and A is a set then so is B.

3.7. Definition
If A is a class and P is a definite property such that the condition Px is meaningful for any object x, we put

{x ∈ A : Px} =df {x : x ∈ A and Px}.

3.8. Remarks
(i) Zermelo's formulation of AS, clearly equivalent to the one used here, said (in effect) that if A is a set then the class {x ∈ A : Px} is always a set. Since this class separates or singles out those members of A that have the property P, he called AS the Axiom of Separation (Aussonderung). This name is still in current use.
(ii) The intuitive idea behind AS is clear: if B ⊆ A and A is not too vast, then B cannot be too vast either.
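Readers who program may recognize separation in the familiar 'filter' idiom. The sketch below is a finite analogy only, under the assumption that Python frozensets stand in for sets and a Python predicate stands in for a definite property; the name separate is ours.

```python
def separate(A, P):
    """Finite analogue of {x ∈ A : Px}: the members of A that have property P."""
    return frozenset(x for x in A if P(x))

A = frozenset(range(10))
evens = separate(A, lambda x: x % 2 == 0)

print(sorted(evens))
print(evens <= A)  # the separated class is always included in A
```

Note that separate can only single out members of a set that is already given; it never builds a collection 'from nothing', which is the point of restricting the Comprehension Principle in this way.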

3.9. Theorem
∅ is a set.

PROOF
Clearly ∅ is included in any class, and in particular in any set. By Rem. 3.3(iii) there exists a set. Hence ∅ is included in some set, and by AS is itself a set. ■

3.10. Theorem
The class of all objects (the universe of discourse) and the class of all sets are proper classes.

PROOF
We saw in § 2 that Russell's class,

{x : x is a set such that x ∉ x},

cannot be a set. Since Russell's class is included in the class of all sets, the latter cannot be a set by AS. The same applies to the universe of discourse. ■

3.11. Definition
If A is any class, we put
⋃A =df {x : x ∈ y for some y ∈ A}.
⋃A is called the union class of A.

3.12. Axiom of Union set (AU)
If A is a set then so is ⋃A.

3.13. Remarks
(i) The members of ⋃A are the members of the members of A.
(ii) Intuitively, the idea behind AU is that if A is a set then it does not have 'too many' members; and each of these, being an object (an individual or a set), in turn does not have 'too many' members. Therefore ⋃A - obtained by pooling together not-too-many collections, none of which is too vast - cannot itself be too vast.

3.14. Definition
For any classes A and B, we put
A ∪ B =df {x : x ∈ A or x ∈ B}.
A ∪ B is called the union (or join) of A and B.

3.15. Theorem
A ∪ B is a set iff both A and B are sets.

PROOF
If A and B are sets, then A ∪ B = ⋃{A, B}, which is a set by A2 and AU. The converse follows easily from AS. ■
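The identity A ∪ B = ⋃{A, B} used in this proof can be tried out on small examples. What follows is a rough finite sketch, assuming that frozensets model sets and that anything else (here a string) models an individual with no members; union_class is our own name.

```python
def union_class(A):
    """Finite analogue of ⋃A: the members of the members of A.
    Members of A that are not frozensets are treated as individuals."""
    return frozenset(x for y in A if isinstance(y, frozenset) for x in y)

X = frozenset({1, 2})
Y = frozenset({2, 3})

# An individual among the members contributes nothing to the union class.
mixed = frozenset({X, Y, "an individual"})
print(sorted(union_class(mixed)))

# The identity from the proof of Thm. 3.15: A ∪ B = ⋃{A, B}.
print(union_class(frozenset({X, Y})) == X | Y)
```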

3.16. Theorem
If n is any natural number and a1, a2, ..., an are any objects, the class {a1, a2, ..., an} is a set.

PROOF
By (weak) induction on n.

Basis. For n = 0 the assertion of our theorem is Thm. 3.9.

Induction step. By Def. 3.14,
{a1, a2, ..., an, an+1} = {a1, a2, ..., an} ∪ {an+1},
which is a set by the induction hypothesis, Rem. 3.3(ii) and Thm. 3.15. ■

3.17. Definition
If A is any class, we put
PA =df {x : x is a set such that x ⊆ A}.
PA is called the power class of A.

3.18. Axiom of Power set (AP)
If A is a set then so is PA.

3.19. Remark
Intuitively, the idea behind AP is that although PA can be very large - in fact, much larger than A - its size is nevertheless bounded provided A itself is not too vast.
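For a finite set the remark can be made quite concrete: a set with n members has 2^n subsets. The following sketch assumes frozensets as finite stand-ins for sets; power_class is a hypothetical helper name.

```python
from itertools import combinations

def power_class(A):
    """Finite analogue of PA: all subsets of A, each as a frozenset."""
    members = list(A)
    return frozenset(
        frozenset(c)
        for r in range(len(members) + 1)
        for c in combinations(members, r)
    )

A = frozenset({1, 2, 3})
PA = power_class(A)

print(len(PA))                   # 2 ** 3 subsets
print(all(x <= A for x in PA))   # every member of PA is included in A
```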

3.20. Problem
Prove that if A is a class of sets (that is, a class all of whose members are sets) such that ⋃A is a set, then A is a set as well.

The last axiom we shall postulate here is

3.21. Axiom of Infinity (AI)
There exists a set Z such that ∅ ∈ Z and such that for every set x ∈ Z also x ∪ {x} ∈ Z.

3.22. Remarks
(i) Without AI it is impossible to prove that there are infinite sets. On the other hand, it is easy to see intuitively that any set Z satisfying the conditions imposed by AI must be infinite. We shall be able to prove this rigorously when we have a rigorous definition of infiniteness.
(ii) A2, AS, AU and AP are clearly particular cases of the Principle of Comprehension: they say that certain classes are sets. Although AI as it stands is not of this form, we shall see later that it is equivalent to the proposition that a certain class, ω, is a set.
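To get a feeling for why a set Z as in AI must be infinite, one may generate the first few objects that AI forces into Z. This is an informal sketch only, with frozensets standing in for sets; the names successor and stages are ours.

```python
def successor(x):
    """x ∪ {x}, the operation appearing in the Axiom of Infinity."""
    return x | frozenset({x})

empty = frozenset()

# ∅, ∅ ∪ {∅}, ... : the first five objects that any Z as in AI must contain.
stages = [empty]
for _ in range(4):
    stages.append(successor(stages[-1]))

print([len(s) for s in stages])
print(len(set(stages)) == len(stages))  # the stages are pairwise distinct
```

Each application of the operation yields an object with one more member than the last, so the process never repeats itself; a finite experiment of course proves nothing about infiniteness, which is defined rigorously later.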

§ 4. Intersections and differences
The following definitions will be needed later on.

4.1. Definition
If A is any class,
⋂A =df {x : x ∈ y for every y ∈ A}.
⋂A is called the intersection class of A.

4.2. Definition
If A and B are classes,
A ∩ B =df {x : x ∈ A and x ∈ B}.
A ∩ B is called the intersection (or meet) of A and B.

4.3. Definition
If A is any class,
A^c =df {x : x ∉ A}.
A^c is called the complement of A.

4.4. Definition
If A and B are any classes,
A - B =df A ∩ B^c.
A - B is called the difference between A and B.

4.5. Problem
(i) Prove that if A is a non-empty class then ⋂A is a set. What is ⋂∅?
(ii) Prove that if A or B is a set then so is A ∩ B.
(iii) Prove that A and A^c cannot both be sets.

2 Relations and functions

§ 1. Ordered n-tuples, cartesian products and relations

1.1. Preview
By Def. 1.1.5, the extension of a property P of objects is the class {x : Px}. Recall (Rem. 1.1.7) that from an extensionalist point of view a property and its extension determine each other uniquely; so that - wielding Occam's razor, the structuralist mathematician's favourite instrument - one can identify the two and pretend that a property simply is its extension. As set theory developed, it transpired that a similar procedure could be applied to other fundamental mathematical notions such as relation (among objects) and function: instead of taking these as independent primitive notions, as had been done in the early days of set theory, they could be reduced to classes and the membership relation. In this and the next section we shall see how this is done.

For any two objects a and b, not necessarily distinct, we need a unique object (a, b) called the ordered pair of a and b [in this order]. It is not really important how the ordered pair is defined, so long as the following condition is satisfied:

(1.2)   (a, b) = (c, d) ⇔ a = c and b = d.

1.3. Warning
The ordered pair (a, b) must not be confused with the set {a, b}, sometimes known as an unordered pair, whose members are just a and b. For example, the sets {a, b} and {b, a} are always equal (see Rem. 1.3.3(i)), but by (1.2) the ordered pairs (a, b) and (b, a) are equal only if a = b. However, when there is no risk of confusion we shall often omit the adjective 'ordered' and say 'pair' when we mean ordered pair.

As part of the reductionist programme aiming to reduce all mathematical concepts to the notion of class and the membership relation, the following rather artificial definition, first proposed by Kazimierz Kuratowski in 1921, has been widely accepted.

1.4. Definition
For any objects a and b,
(a, b) =df {{a}, {a, b}}.

1.5. Problem
Prove that (1.2) follows from Def. 1.4.
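Before attempting the proof, it may help to test condition (1.2) on a handful of objects. The sketch below is a finite sanity check, not a proof; it assumes frozensets as stand-ins for sets and integers as stand-ins for individuals, and kpair is our own name for the pair of Def. 1.4.

```python
from itertools import product

def kpair(a, b):
    """Kuratowski's ordered pair (a, b) = {{a}, {a, b}}, as in Def. 1.4."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# A few sample objects: three individuals and two sets.
objects = [0, 1, 2, frozenset(), frozenset({0})]

# Condition (1.2): (a, b) = (c, d)  iff  a = c and b = d.
condition_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(objects, repeat=4)
)

print(condition_holds)
print(kpair(0, 1) == kpair(1, 0))   # order matters, unlike for {0, 1}
print(len(kpair(0, 0)))             # (a, a) collapses to {{a}}
```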

More generally, for any number n and any n objects a1, a2, ..., an - not necessarily distinct - we need a unique object (a1, a2, ..., an) called the ordered n-tuple of a1, a2, ..., an [in this order]. Again, it is not really important how ordered n-tuples are defined, so long as the following condition - of which (1.2) is a special case - is satisfied:

(1.6)   (a1, a2, ..., an) = (b1, b2, ..., bn) ⇔ ai = bi for i = 1, 2, ..., n.

Again, we shall often say 'n-tuple' as short for 'ordered n-tuple'.

The following definitions deliver the goods. Proceeding inductively, we supplement Def. 1.4 by:

1.7. Definition
For any n ≥ 2 and objects a1, a2, ..., an, an+1,
(a1, a2, ..., an, an+1) =df ((a1, a2, ..., an), an+1).

1.8. Problem
Prove (1.6) for all n ≥ 2. (Use weak induction on n, taking n = 2 as basis.)

There remain the cases n = 1 and n = 0. For n = 1, condition (1.6) reduces to:
(a) = (b) ⇔ a = b.
The simplest way to satisfy this is to adopt the following.

1.9. Definition
(a) =df a.

As for n = 0, condition (1.6) reduces to the unconditional equality ( ) = ( ), which will hold trivially, no matter how we define ( ). Since ∅ is the simplest object, the simplest convention to adopt is

1.10. Definition
( ) =df ∅.

1.11. Remark
The equality
(a1, a2, ..., an, an+1) = ((a1, a2, ..., an), an+1),
which was decreed by Def. 1.7 for n ≥ 2, now holds also for n = 1 by virtue of Def. 1.9. However, it does not hold for n = 0, because by Def. 1.9 (a) = a, whereas by Def. 1.10 (( ), a) = (∅, a).
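Definitions 1.4, 1.7, 1.9 and 1.10 together amount to a recursive recipe, which can be transcribed as follows. This is an illustrative sketch under the assumption that frozensets model sets; the names kpair and ntuple are ours.

```python
def kpair(a, b):
    """Kuratowski's ordered pair, Def. 1.4."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def ntuple(*items):
    """Ordered n-tuple built according to Defs. 1.4, 1.7, 1.9 and 1.10."""
    if len(items) == 0:
        return frozenset()              # Def. 1.10: ( ) = ∅
    if len(items) == 1:
        return items[0]                 # Def. 1.9: (a) = a
    if len(items) == 2:
        return kpair(*items)            # Def. 1.4
    return kpair(ntuple(*items[:-1]), items[-1])   # Def. 1.7

# The equality of Def. 1.7 for n = 2, and its extension to n = 1.
print(ntuple(1, 2, 3) == kpair(ntuple(1, 2), 3))
print(ntuple(1, 2) == kpair(ntuple(1), 2))

# For n = 0 it fails: (a) = a, whereas (( ), a) = (∅, a).
print(ntuple(5) == kpair(ntuple(), 5))
```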
We proceed to define the notions of cartesian product and cartesian power.

1.12. Definition
(i) For any classes A1, A2, ..., An, not necessarily distinct, their cartesian product [in this order] is the class
A1 × A2 × ... × An =df {(x1, x2, ..., xn) : x1 ∈ A1, x2 ∈ A2, ..., xn ∈ An},
that is, the class of all n-tuples whose i-th component belongs to Ai for i = 1, 2, ..., n.
(ii) The n-th cartesian power of a class A is the cartesian product of A with itself n times:
A^n =df A × A × ... × A (n times),
that is, the class of all n-tuples of members of A. In particular, A^1 = A and A^0 = {( )} = {∅}.

1.13. Remarks
(i) In Def. 1.12(i) we have used a convenient generalization of the class notation introduced in Def. 1.1.5. Although it is almost self-explanatory, let us spell it out. Suppose F(x1, x2, ..., xn) is an object whenever x1, x2, ..., xn are objects; and suppose P(x1, x2, ..., xn) is a condition involving x1, x2, ..., xn. Then
{F(x1, x2, ..., xn) : P(x1, x2, ..., xn)}
is defined to be the class
{y : there exist x1, x2, ..., xn such that y = F(x1, x2, ..., xn) and P(x1, x2, ..., xn)}.
(ii) It is easy to see that, for any n ≥ 1, A1 × A2 × ... × An = ∅ iff Ai = ∅ for at least one i.
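Remark (ii) is easily observed on finite examples. The sketch below uses Python's built-in tuples in place of the set-theoretic n-tuples defined above, which is an assumption of convenience: in particular a Python 1-tuple (a,) is not the same thing as a, so the convention (a) = a of Def. 1.9 is not reproduced. The name cartesian is ours.

```python
from itertools import product

def cartesian(*classes):
    """Finite analogue of A1 × A2 × ... × An, using Python tuples as n-tuples."""
    return frozenset(product(*classes))

A = frozenset({1, 2})
B = frozenset({"u", "v", "w"})
E = frozenset()   # the empty set

print(len(cartesian(A, B)))              # 2 * 3 pairs
print(cartesian(A, E, B) == frozenset()) # one empty factor empties the product
print(cartesian() == frozenset({()}))    # the 0-th power has exactly one member
```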
Intuitively, if n ≥ 1 and R is an n-ary relation on a class A, then for any n-tuple of members of A it is meaningful to say that R holds or does not hold for it. The class of all those n-tuples for which R does hold is known as the extension of R. From an extensionalist point of view, two relations are identical iff they have the same extension. Thus, a relation and its extension uniquely determine each other. In the spirit of the reductionist programme mentioned above, a relation is simply identified with its extension. Hence the following

1.14. Definition
(i) For any n ≥ 1 and any class A, an n-ary relation on A is a class of n-tuples of members of A - that is, a subclass of A^n.
(ii) In particular, a property on A is a unary relation on A - that is, a subclass of A.

1.15. Remarks
(i) If R is an n-ary relation we shall often write 'R(a1, a2, ..., an)' as short for '(a1, a2, ..., an) ∈ R'. In the special case where R is a binary relation we shall often write 'aRb' for '(a, b) ∈ R'.
(ii) We could extend Def. 1.14(i) to the case n = 0, but the resulting notion of 0-ary relation is found to be of little use.

§ 2. Functions; the axiom of replacement
Intuitively, if f is a function (or map, or mapping) then f assigns to any object x at most one object fx as value. The class of all objects x to which a value fx is assigned by f is called the domain [of definition] of f and denoted by 'dom f'.
The graph of f is then the class {(x, fx) : x ∈ dom f}. Note that the graph of a function is a class of pairs. But not every class of pairs can be the graph of a function: a class G of pairs is the graph of a function iff for any object x there is at most one object y such that (x, y) ∈ G.
From an extensionalist point of view, two functions are identical iff they have the same graph. In the spirit of reductionism, we can therefore identify a function with its graph:

2.1. Definition
A function (a.k.a. map or mapping) is a class f of ordered pairs satisfying the functionality condition: whenever both (x, y) ∈ f and (x, z) ∈ f then y = z.

2.2. Definition
Let f be a function.
(i) The domain of f is the class
dom f =df {x : (x, y) ∈ f for some y}.
(ii) If x ∈ dom f, then the value of f at x - usually denoted by 'fx' - is the [necessarily unique] y such that (x, y) ∈ f.
(iii) The range of f is the class
ran f =df {fx : x ∈ dom f}.

2.3. Problem
Verify that from Defs. 2.1 and 2.2 it follows that a function f is equal to its own graph; that is,
f = {(x, fx) : x ∈ dom f}.
Hence prove that functions f and g are equal iff dom f = dom g and fx = gx for every x in their common domain.
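The identification of a function with its graph is easy to imitate for finite functions. In the following sketch a function is literally a frozenset of Python pairs (an assumption standing in for the ordered pairs of § 1); the helper names is_function, dom, ran and value are ours.

```python
def is_function(G):
    """Functionality condition of Def. 2.1 for a finite class G of pairs."""
    seen = {}
    for x, y in G:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

def dom(f):
    return frozenset(x for (x, _) in f)

def ran(f):
    return frozenset(y for (_, y) in f)

def value(f, x):
    """The unique y with (x, y) in f; x is assumed to lie in dom f."""
    (y,) = [y for (u, y) in f if u == x]
    return y

f = frozenset({(1, "a"), (2, "b"), (3, "a")})
g = frozenset({(1, "a"), (1, "b")})   # violates the functionality condition

print(is_function(f), is_function(g))
print(sorted(dom(f)), sorted(ran(f)))

# Prob. 2.3 on this example: f coincides with {(x, fx) : x ∈ dom f}.
print(f == frozenset((x, value(f, x)) for x in dom(f)))
```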

2.4. Definition
Let f be a function.
(i) We say that f is a map from A to B (or that f maps A into B) if dom f = A and ran f ⊆ B.
(ii) We say that f is a surjection from A to B (or that f maps A onto B) if dom f = A and ran f = B.
(iii) We say that f is an injection (or a one-to-one map) if whenever x and y are distinct members of dom f then fx and fy are also distinct.
(iv) We say that f is a bijection from A to B if it is an injection as well as a surjection from A to B (that is, a one-to-one map from A onto B).
We shall now enquire when a relation or a function is a set.

2.5. Lemma
Let A and B be non-empty classes. Then A × B is a set iff both A and B are sets.

PROOF
Let a and b be any members of A and B respectively. Then by Defs. 1.4 and 1.12 we have
{a, b} ∈ {{a}, {a, b}} = (a, b) ∈ A × B.
Therefore by Def. 1.3.11
{a, b} ∈ ⋃(A × B).
Since both a and b belong to {a, b}, it follows, again by Def. 1.3.11, that both are members of ⋃⋃(A × B). Thus we have shown that A ⊆ ⋃⋃(A × B) and B ⊆ ⋃⋃(A × B), hence A ∪ B ⊆ ⋃⋃(A × B).
Also, it is easy to see that ⋃⋃(A × B) ⊆ A ∪ B. Therefore by PX we have
⋃⋃(A × B) = A ∪ B.
If A × B is a set, it follows from AU and Thm. 1.3.15 that A and B are sets as well.
Conversely, if A and B are sets, then by Thm. 1.3.15 and Prob. 1.3.20 it follows that A × B is a set as well. ■
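The key identity of this proof, ⋃⋃(A × B) = A ∪ B, depends on the particular definition of ordered pair adopted in Def. 1.4, and can be checked on a small example. The sketch assumes frozensets as sets and integers as individuals; kpair and union_class are our own names.

```python
def kpair(a, b):
    """Kuratowski's ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def union_class(C):
    """⋃C for a finite class C all of whose members are sets."""
    return frozenset(x for y in C for x in y)

A = frozenset({1, 2})
B = frozenset({2, 3, 4})
AxB = frozenset(kpair(a, b) for a in A for b in B)

# One union strips the outer braces of each pair, the second the inner ones.
print(union_class(union_class(AxB)) == A | B)
```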

2.6. Theorem
Let n ≥ 1, and let A1, A2, ..., An be non-empty classes. Then A1 × A2 × ... × An is a set iff Ai is a set for each i = 1, 2, ..., n.

PROOF
By weak induction on n.

Basis. For n = 1 the assertion of our theorem is trivial, since in this case A1 × A2 × ... × An is simply A1 (see Defs. 1.12(i) and 1.9).

Induction step. It is easy to see that
A1 × A2 × ... × An × An+1 = (A1 × A2 × ... × An) × An+1
(use Defs. 1.12(i) and 1.7 and Rem. 1.11). Hence, by Lemma 2.5 and the induction hypothesis, A1 × A2 × ... × An × An+1 is a set iff Ai is a set for each i = 1, 2, ..., n, n + 1. ■

2.7. Corollary
If A is a set and R is an n-ary relation on A (for some n ≥ 1) then R is a set as well.

PROOF
By Def. 1.14 we have R ⊆ A^n. If A = ∅ then A^n = ∅ by Def. 1.12(ii) and Rem. 1.13(ii); hence R = ∅. If A is a non-empty set then A^n is a set by Thm. 2.6, hence R is a set by AS. ■

2.8. Theorem
Let f be a function. Then f is a set iff both dom f and ran f are sets.

PROOF
It is easy to verify that
⋃⋃f = dom f ∪ ran f.
From this the required result follows, using the same argument as in the proof of Lemma 2.5. ■

At this point we introduce

2.9. Axiom of Replacement (AR)
If f is a function and dom f is a set then ran f is a set as well.

2.10. Remarks
(i) AR is clearly a particular case of the Comprehension Principle.
(ii) In view of Thm. 2.8, AR is equivalent to the proposition that if f is a function such that dom f is a set then f itself is a set. The intuitive idea behind AR is that f has exactly 'as many' members as does dom f: for each a ∈ dom f, f contains the corresponding pair (a, fa). Therefore if dom f is not too vast, neither is f itself.
(iii) In mathematical applications, a function f is almost always defined as a mapping from A to B, where both A and B are known in advance to be sets. It then follows from AS and Thm. 2.8 that ran f and f itself are sets. AR is not needed for this. But as we shall see AR plays an important role within set theory itself.

§ 3. Equivalence and order relations

3.1. Preview

In this section we discuss two kinds of relation that are of particular importance, not only in set theory but in mathematics as a whole.
Throughout the section, A is an arbitrary class.

3.2. Definition
R is an equivalence relation on A if R is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xRx (reflexivity),
if xRy then also yRx (symmetry),
if xRy and yRz then also xRz (transitivity).

3.3. Example
The paradigmatic example of an equivalence relation on A is the binary relation {(x, x) : x ∈ A}, called the identity (or diagonal) relation on A, and denoted by 'idA'. By the way, idA is clearly a function; indeed, it is a bijection from A to itself.

3.4. Definition
Let R be an equivalence relation on A. For each a ∈ A we put
[a]R =df {x : xRa}.
We call [a]R the R-class of a, or the equivalence class of a modulo R. Where there is no risk of confusion we omit the subscript 'R' and write simply '[a]'.

3.5. Theorem
Let R be an equivalence relation on A and let a and b be any members of A. Then [a] = [b] iff aRb.

PROOF
(⇒). By reflexivity, aRa, so a ∈ [a]. If [a] = [b] then by PX also a ∈ [b], so that aRb.
(⇐). Suppose aRb. If x ∈ [a], then xRa, hence by transitivity xRb, so that x ∈ [b]. Thus we have shown that [a] ⊆ [b].
Also, from aRb it follows by symmetry that bRa, so the argument we have just used shows that [b] ⊆ [a]. Hence by PX [a] = [b]. ■

3.6. Corollary
Let R be an equivalence relation on A and let a be any member of A. Then a belongs to exactly one R-class, namely [a].

PROOF
We have seen that a ∈ [a]. If also a ∈ [b] then by Def. 3.4 aRb, so by Thm. 3.5 it follows that [a] = [b]. ■
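Thm. 3.5 and Cor. 3.6 can be watched at work on a small example. The sketch below assumes a relation given as a finite set of Python pairs and takes congruence modulo 3 on {0, ..., 5} as a sample equivalence relation; eq_class is our own name for [a]R.

```python
def eq_class(R, a):
    """[a]R = {x : xRa}, for a relation R given as a set of pairs."""
    return frozenset(x for (x, y) in R if y == a)

A = range(6)
# Sample equivalence relation: x R y iff x and y leave the same remainder mod 3.
R = {(x, y) for x in A for y in A if x % 3 == y % 3}

# Thm. 3.5: [a] = [b] iff aRb.
thm_3_5 = all(
    (eq_class(R, a) == eq_class(R, b)) == ((a, b) in R)
    for a in A for b in A
)

# Cor. 3.6: every member of A belongs to exactly one R-class.
classes = {eq_class(R, a) for a in A}
cor_3_6 = all(sum(a in c for c in classes) == 1 for a in A)

print(thm_3_5, cor_3_6)
print(sorted(sorted(c) for c in classes))
```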

3.7. Definition
(i) S is a sharp partial order on A if S is a binary relation on A such that, for any members x, y and z of A, the following two conditions are satisfied:

if xSy, then ySx does not hold (anti-symmetry),
if xSy and ySz then also xSz (transitivity).

(ii) B is a blunt partial order on A if B is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xBx (reflexivity),
if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.8. Example
Let A be a class of sets (that is, all the members of A are sets rather than individuals). Let S and B be the restrictions to A of ⊂ and ⊆ respectively; that is,
S =df {(x, y) ∈ A^2 : x ⊂ y} and B =df {(x, y) ∈ A^2 : x ⊆ y}.
Then it is easy to see that S and B are a sharp and a blunt partial order, respectively, on A.

3.9. Problem
Let S and B be a sharp and a blunt partial order, respectively, on A. Put
S^b =df S ∪ idA and B^# =df B - idA.
(For the definitions of idA and - see Ex. 3.3 and Def. 1.4.4.)
(i) Prove that S^b and B^# are a blunt and a sharp order on A, respectively.
(ii) Verify that S^b# = S and B^#b = B.
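The passage between sharp and blunt orders can be tried out on a concrete finite order before it is proved in general. The sketch below takes proper divisibility on the divisors of 12 as a sample sharp partial order (our choice of example, not the text's); the function names are ours, with blunt and sharp standing for the two operations of this Problem.

```python
A = frozenset({1, 2, 3, 4, 6, 12})
id_A = frozenset((x, x) for x in A)

# Sample sharp partial order: x S y iff x is a proper divisor of y.
S = frozenset((x, y) for x in A for y in A if x != y and y % x == 0)

def transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_sharp_partial(R):
    anti_symmetric = all((y, x) not in R for (x, y) in R)
    return anti_symmetric and transitive(R)

def is_blunt_partial(R, A):
    reflexive = all((x, x) in R for x in A)
    weakly_anti_symmetric = all(x == y for (x, y) in R if (y, x) in R)
    return reflexive and weakly_anti_symmetric and transitive(R)

def blunt(R):
    """Adjoin the identity relation on A."""
    return R | id_A

def sharp(R):
    """Remove the identity relation on A."""
    return R - id_A

B = blunt(S)
print(is_sharp_partial(S), is_blunt_partial(B, A))
print(sharp(blunt(S)) == S, blunt(sharp(B)) == B)
```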
3.10. Remarks
(i) The qualifications 'sharp' and 'blunt' are often omitted and a partial order of either kind is referred to simply as a 'partial order'. There is no real harm in this, for two reasons. First, because it is usually clear from the context which kind of partial order is meant. Second, as shown in Prob. 3.9, there is a natural mutual association between a sharp partial order and a blunt partial order, whereby the latter is obtained from the former by applying b and the former from the latter by applying #.
(ii) Sharp partial orders are often denoted by symbols such as '<' or '≺'; the corresponding blunt partial orders are then denoted by symbols such as '≤' or '≼' respectively.

3.11. Definition
(i) S is a sharp total order on A if S is a binary relation on A such that, for any members x, y and z of A, the following two conditions are satisfied:

exactly one of the following three disjuncts holds: xSy or x = y or ySx (trichotomy),
whenever xSy and ySz then also xSz (transitivity).

(ii) B is a blunt total order on A if B is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xBy or yBx (connectedness),
if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.12. Problem
Let S and B be a sharp and a blunt total order, respectively, on A. Prove that
(i) S is a sharp partial order,
(ii) S^b is a blunt total order,
(iii) B is a blunt partial order,
(iv) B^# is a sharp total order,
on A.

§ 4. Operations on functions
The following definitions will be needed later on.

4.1. Definition
If f and g are functions such that ran f ⊆ dom g, we put
g ∘ f =df {(x, gy) : (x, y) ∈ f}.
g ∘ f - often denoted briefly 'gf' - is called the composition of f and g. (Note reading from right to left!)

4.2. Problem
Show: g ∘ f is a function, dom (g ∘ f) = dom f and ran (g ∘ f) ⊆ ran g. Moreover, for any x in dom (g ∘ f) - which is also dom f - check that
(g ∘ f)x = g(fx).

4.3. Definition
If f is an injective (that is, one-to-one) function we put
f^-1 =df {(y, x) : (x, y) ∈ f}.
f^-1 is called the inverse of f.

4.4. Problem
Verify that f^-1 itself is an injective function and, moreover,
dom (f^-1) = ran f, ran (f^-1) = dom f, f^-1 ∘ f = id_{dom f}.
(For the definition of id see Ex. 3.3.)
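Defs. 4.1 and 4.3 translate directly into operations on finite graphs. The following sketch again assumes that a function is a frozenset of Python pairs; compose and inverse are our own names, and the sample functions f and g are chosen only for illustration.

```python
def compose(g, f):
    """g ∘ f = {(x, gy) : (x, y) ∈ f}; assumes ran f is included in dom g."""
    g_value = dict(g)
    return frozenset((x, g_value[y]) for (x, y) in f)

def inverse(f):
    """The inverse {(y, x) : (x, y) ∈ f} of an injective function f."""
    return frozenset((y, x) for (x, y) in f)

f = frozenset({(1, "a"), (2, "b"), (3, "c")})                 # injective
g = frozenset({("a", 10), ("b", 20), ("c", 10), ("d", 40)})

gf = compose(g, f)
print(sorted(gf))

# Prob. 4.4 on this example: the inverse of f composed with f is the identity on dom f.
identity_on_dom_f = frozenset((x, x) for (x, _) in f)
print(compose(inverse(f), f) == identity_on_dom_f)
```

Note that g here is not injective, so only f has an inverse in the sense of Def. 4.3.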

4.5. Problem
Prove that if f is a function from a proper class to a set, then f is not injective.

4.6. Definition
If f is a function and C ⊆ dom f, we put
(i) f↾C =df {(x, fx) : x ∈ C},
(ii) f[C] =df {fx : x ∈ C}.
f↾C is called the restriction of f to C and f[C] is called the image of C under f.

4.7. Problem
Verify that f↾C is a function, dom (f↾C) = C and ran (f↾C) = f[C]. Moreover, (f↾C)x = fx for every x ∈ C.

4.8. Problem
Let F be a class whose members are functions. Show that ⋃F is a function iff the following coherence condition is fulfilled: fx = gx for all f and g in F and all x ∈ dom f ∩ dom g. Assuming this condition holds, what are dom ⋃F and ran ⋃F?
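The coherence condition of Prob. 4.8 can be explored experimentally before it is proved. The sketch below compares the two sides of the 'iff' on one coherent and one clashing family of finite functions, each given as a frozenset of Python pairs; all names are ours, and the experiment is of course no substitute for the proof.

```python
def dom(f):
    return frozenset(x for (x, _) in f)

def is_function(G):
    """A finite set of pairs is a function iff no first component is repeated."""
    return len(dom(G)) == len(G)

def coherent(F):
    """fx = gx for all f, g in F and all x in dom f ∩ dom g."""
    return all(
        dict(f)[x] == dict(g)[x]
        for f in F for g in F
        for x in dom(f) & dom(g)
    )

def union_class(F):
    return frozenset().union(*F)

agreeing = [frozenset({(1, "a"), (2, "b")}), frozenset({(2, "b"), (3, "c")})]
clashing = [frozenset({(1, "a")}), frozenset({(1, "z")})]

for F in (agreeing, clashing):
    print(coherent(F), is_function(union_class(F)))
```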

3 Cardinals

§ 1. Equipollence and cardinality
We start by defining a binary relation ≈ on the class of all sets:

1.1. Definition
Let A and B be sets. We say that A and B are equipollent, briefly: A ≈ B, if there exists a bijection from A to B (that is, a one-to-one map from A onto B).

1.2. Theorem
Equipollence is an equivalence relation on the class of sets.

PROOF
For any set A, idA is a bijection from A to itself; so ≈ is reflexive.
If f is a bijection from A to B then clearly f^-1 is a bijection from B to A; so ≈ is symmetric.
Finally, if f is a bijection from A to B and g is a bijection from B to C, then g ∘ f is a bijection from A to C; so ≈ is transitive. ■

It is convenient to introduce the following

1.3. Definition (incomplete)
To each set A we assign an object |A|, called the cardinality of A, such that for any two sets A and B, |A| = |B| iff A ≈ B.
An object of the form |A| for some set A is called a cardinal.

1.4. Remarks
(i) Def. 1.3 is incomplete, because we have not specified what the object |A| is or how it is to be chosen. Cantor regarded cardinals as special abstract entities of a new kind. In effect, this amounted to introducing the notion of cardinal as a separate primitive notion. However, it would obviously be more convenient - and conform to the reductionist programme - if cardinals were among the hitherto posited objects of set theory. In this spirit, Frege proposed in 1884 the elegant idea of defining |A| as [A]≈, the equivalence class of A modulo ≈ (see Def. 2.3.4). The condition required by Def. 1.3 - |A| = |B| ⇔ A ≈ B - would then follow at once by Thm. 2.3.5.
This procedure, novel at the time, was to become standard practice, used with respect to various equivalence relations that arise in numerous mathematical situations.
Ironically, Frege's procedure does not work at all well in the present case, where the equivalence relation is ≈. Unaware that the Comprehension Principle had to be restricted, he assumed as a matter of course that [A]≈ is always a set, hence an object. Unfortunately, this is in general false. For example, if A is a singleton, then [A]≈ is the class of all singletons, and hence ⋃[A]≈ is the class of all objects, the entire universe of discourse, which is a proper class by Thm. 1.3.10. Hence by AU [A]≈ must be a proper class as well. This is very inconvenient, because we would like to be able to form classes of cardinals, which is impossible if cardinals are proper classes.
Fortunately there are other ways of defining cardinals, satisfying the requirement of Def. 1.3, while ensuring that the cardinals are sets. Later on, in Ch. 6, we shall follow one such procedure. In each ≈-class we shall be able to select a unique 'distinguished' member. Then, for any set A, we can take |A| to be the distinguished member of [A]≈ rather than that class itself. Then Thm. 2.3.5 ensures that the requirement of Def. 1.3 is satisfied.
(ii) For the time being, let us take it on trust that Def. 1.3 can be completed in a satisfactory way. This is not asking too much, since our reference to cardinals may be regarded as a mere convenience: everything that we shall say in this chapter in terms of cardinals can easily be rephrased (at the cost of some circumlocution) in terms of sets and mapping between sets.

(iii) The cardinality |A| of a set A is a measure of its size. Cardinals can be regarded intuitively as generalized natural numbers. Indeed, if A is a finite set of the form {a_1, a_2, ..., a_n}, where the a_i are distinct, then we could take |A| to be n, the number of members of A. Thus, each natural number may be regarded intuitively as the cardinality of a finite set.
(iv) However, we shall not assume formally that the natural numbers are in fact cardinals. Rather, in §3 we shall posit for each n a corresponding cardinal 𝐧, without necessarily identifying the two.

§2. Ordering the cardinals; the Schröder-Bernstein Theorem
We define a binary relation ≤ on the class of cardinals, which, as we shall soon see, is a [blunt] partial order on that class:
2.1. Definition
Let λ and µ be cardinals. Let A and B be sets such that |A| = λ and |B| = µ. We say that λ is smaller-than-or-equal-to µ - briefly: λ ≤ µ - if there is an injection from A to B.
2.2. Remark
This definition is in need of legitimation: we must make sure that the criterion it provides for asserting that λ ≤ µ depends only on these cardinals themselves rather than on the choice of particular sets A and B such that |A| = λ and |B| = µ. This is done as follows. Let A, A', B, B' be sets such that |A| = |A'| and |B| = |B'|. Given an injection from A to B, it is easy to show - DIY! - that there is also an injection from A' to B'.

2.3. Theorem
Let λ and µ be cardinals and let B be a set such that |B| = µ. Then λ ≤ µ iff B has a subset whose cardinality is λ.

PROOF

Let A be a set such that |A| = λ. By Def. 2.2.4, an injection from A to B is the same thing as a bijection from A to a subset of B.

■

2.4. Theorem
The relation ≤ on the class of cardinals is reflexive and transitive.

PROOF

DIY.

■

To show that ≤ is a partial order, it remains to establish that it is weakly anti-symmetric (see Def. 2.3.7). This fact was conjectured by Cantor and proved independently by F. Bernstein and E. Schröder. The proof we shall present here, due to Zermelo, uses a lemma that is of some interest in its own right.

2.5. Definition
A map 𝔉 from a class of sets to a class of sets is monotone if whenever X and Y are sets in dom 𝔉 such that X ⊆ Y then 𝔉X ⊆ 𝔉Y.

2.6. Lemma
Let A be a set and let 𝔉 be a monotone map from PA to itself. Then A has a subset G such that 𝔉G = G.
PROOF
For any subset X of A, the value 𝔉X is also a subset of A. Let us say X is a good set if it is a subset of A such that 𝔉X ⊆ X. (For example, A itself is clearly good.)
Note that if X is good then 𝔉X is good as well. Indeed, if 𝔉X ⊆ X then by the monotonicity of 𝔉 we get 𝔉(𝔉X) ⊆ 𝔉X, which means that 𝔉X is good.
Let G be the intersection of all good subsets of A, that is:
G = ⋂{X ∈ PA : 𝔉X ⊆ X}.
(See Def. 1.4.1.) We claim that G itself is good. To show this, let X be any good set. Then G ⊆ X because G is the intersection of all good sets. Therefore by the monotonicity of 𝔉 we have 𝔉G ⊆ 𝔉X. Also, since X is good, we have 𝔉X ⊆ X; hence 𝔉G ⊆ X. Thus we see 𝔉G is included in every good set. Hence 𝔉G must also be included in the intersection of all good sets. But this intersection is G itself; this means that 𝔉G ⊆ G, so G is good, as claimed.
It now follows that 𝔉G is good as well. But G, the intersection of all good sets, is included in each of them and in particular in the good set 𝔉G. So we have shown both 𝔉G ⊆ G and G ⊆ 𝔉G. Thus 𝔉G = G. ■
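For a finite set A the construction in this proof can be carried out mechanically. The following Python sketch is an illustration only, under the assumption that A is a small finite set and that the monotone map is supplied as a function F on frozensets; the names powerset, least_fixed_point and the particular F are hypothetical and not part of the text.

```python
from itertools import combinations

def powerset(A):
    """All subsets of the finite set A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def least_fixed_point(A, F):
    """Intersect all 'good' subsets X of A, i.e. those with F(X) included in X."""
    G = frozenset(A)              # A itself is good, so we may start from A
    for X in powerset(A):
        if F(X) <= X:             # X is good
            G &= X
    return G

# Hypothetical example: A = {0, ..., 5}; F(X) consists of 0 together with
# x + 2 for every x in X, as long as x + 2 still lies in A.
A = frozenset(range(6))

def F(X):
    return frozenset({0}) | frozenset(x + 2 for x in X if x + 2 in A)

G = least_fixed_point(A, F)       # the intersection of all good sets
```

In this example the good sets are exactly the subsets of A that contain 0 and are closed under adding 2 within A, so their intersection G satisfies F(G) = G, as the lemma asserts.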

2.7. Theorem (Schröder-Bernstein)
If λ and µ are cardinals such that λ ≤ µ and µ ≤ λ then λ = µ.

PROOF
Let A be a set such that |A| = µ. Since λ ≤ µ, according to Thm. 2.3 A has a subset, say B, such that |B| = λ. Since also µ ≤ λ, according to Def. 2.1 there is an injection, say f, from A to B.
The claim that λ = µ will be proved if we show that there is a bijection from A to B.
Define a map 𝔉 from PA into itself by putting, for any X ⊆ A,
𝔉X = (A - B) ∪ f[X].
(For the definitions of A - B and f[X], cf. Def. 1.4.4 and Def. 2.4.6.) It is easy to see that 𝔉 is monotone. By Lemma 2.6, there exists some G ⊆ A such that G = 𝔉G. Thus
G = (A - B) ∪ f[G].
Note that f[G] ⊆ B because f maps the whole of A into B. (See Fig. 1. The large rectangle represents A; like Gaul, it is divided into three parts.)
Now, f↾G is an injection from G to B and a bijection from G to f[G] (see Prob. 2.4.7). Let us put
h = (f↾G) ∪ id_{A-G}.

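[Fig. 1: a large rectangle representing A, divided into three parts labelled A - B, f[G] and A - G; G consists of the first two parts and B of the last two.]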

Thus h is a map whose domain is the whole of A, such that

hx = fx if x ∈ G,
hx = x if x ∈ A - G.

It is obvious that h is a bijection from A to B.

■
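The bijection h of this proof becomes interesting only for infinite sets, since a finite set admits no injection into a proper subset of itself. As a hedged illustration, the Python sketch below takes A to be the natural numbers, B the positive natural numbers and f the map x ↦ x + 2; these choices, and the names in_G and h, are assumptions made for the example and do not come from the text. Here A - B = {0}, so the condition G = (A - B) ∪ f[G] is met by the set of even numbers.

```python
def f(x):
    """A hypothetical injection from A = {0, 1, 2, ...} into B = {1, 2, 3, ...}."""
    return x + 2

def in_G(x):
    """Membership in G, where G = (A - B) ∪ f[G] and A - B = {0}."""
    while x >= 2:       # x lies in f[G] exactly when x - 2 lies in G
        x -= 2
    return x == 0

def h(x):
    """h = (f restricted to G) ∪ (identity on A - G)."""
    return f(x) if in_G(x) else x

values = [h(x) for x in range(100)]
```

On the initial segment tested, h never takes the value 0, never repeats a value, and reaches every positive number below 100, which is what one expects of a bijection from A to B.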

2.8. Remarks
(i) In view of Thms. 2.4 and 2.7, ≤ is a [blunt] partial order on the class of cardinals.
(ii) As usual in such cases, we denote by '<' the sharp partial order associated with ≤. (Thus < is ≤^♯; see Prob. 2.3.9.) If λ and µ are cardinals such that λ < µ we say that λ is smaller than µ.
(iii) Later on we shall prove (using the Axiom of Choice) that ≤ is a total order on the class of cardinals.

§ 3. Cardinals for natural numbers
3.1. Definition
If n is a natural number and a_1, a_2, ..., a_n are distinct objects, we put
𝐧 =df |{a_1, a_2, ..., a_n}|.
In particular, 𝟎 = |∅| and 𝟏 = |{a}|, where a is any object. We call 𝐧 the cardinal for (or corresponding to) n.
3.2. Remarks
(i) To legitimize Def. 3.1 we must verify that if a_1, a_2, ..., a_n are distinct objects and b_1, b_2, ..., b_n are likewise distinct objects then {a_1, a_2, ..., a_n} ≈ {b_1, b_2, ..., b_n}. This is easy: {(a_1, b_1), (a_2, b_2), ..., (a_n, b_n)} is clearly a bijection from {a_1, a_2, ..., a_n} to {b_1, b_2, ..., b_n}.
(ii) By Thm. 2.3, 𝟎 ≤ µ for every cardinal µ.
3.3. Problem
Define c_n by induction on n as follows:
c_0 = ∅ and c_{n+1} = {c_n} for each n.

Prove that, for each n, the objects c_0, c_1, ..., c_n are distinct. (Use induction on n.)
Thus for any natural number n there exist n distinct objects, and hence the corresponding cardinal 𝐧 exists.
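The objects c_n are easy to model on a computer, which gives a quick sanity check (not a proof) of the claim in Prob. 3.3. The sketch below assumes that sets are represented by Python frozensets; the function name c is chosen here to match the text, and the bound 6 is arbitrary.

```python
def c(n):
    """c_0 = ∅ and c_{n+1} = {c_n}, modelled with frozensets."""
    result = frozenset()
    for _ in range(n):
        result = frozenset({result})
    return result

objects = [c(k) for k in range(6)]
distinct = len(set(objects)) == len(objects)   # True if no two of them coincide
```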

3.4. Theorem
Let a_1, a_2, ..., a_n be any objects. Then there does not exist an injection from the set {a_1, a_2, ..., a_n} to any proper subset of itself.

PROOF

By induction on n. For n = 0 our theorem is trivial, since ∅ has no proper subset.
For the induction step, consider a set A = {a_1, a_2, ..., a_n, a_{n+1}}. We may assume that the objects a_1, a_2, ..., a_n, a_{n+1} are all distinct; otherwise, by eliminating one duplication we can write A in the form '{b_1, b_2, ..., b_n}' and the required result follows at once by the induction hypothesis.
Suppose f is an injection from A to some B ⊆ A. If B ⊂ A then at least one member of A must be outside B; and (by relabelling the a's if necessary) we may assume that a_{n+1} ∉ B.
Since fa_{n+1} must be in B, it cannot be a_{n+1} itself; and (again, by relabelling if necessary) we may assume that fa_{n+1} = a_1. Therefore a_1 ∈ B. Also, since f is injective, a_{n+1} is the only x ∈ A such that fx = a_1.
It would then follow that f↾{a_1, a_2, ..., a_n} is an injection from the set {a_1, a_2, ..., a_n} to its proper subset B - {a_1}, contrary to the induction hypothesis. Thus B cannot be a proper subset of A.

■
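For a small finite set the statement of Thm. 3.4 can also be checked by brute force. The following Python sketch is only an illustration on one example; the helper name injections and the particular sets are assumptions made here, not part of the text.

```python
from itertools import product

def injections(A, B):
    """All injective maps from A to B, as dicts (finite sets only)."""
    domain = sorted(A)
    for values in product(sorted(B), repeat=len(domain)):
        if len(set(values)) == len(domain):      # no value is used twice
            yield dict(zip(domain, values))

A = {1, 2, 3, 4}
proper_subset = {1, 2, 3}
found = list(injections(A, proper_subset))       # expected to be empty
onto_itself = list(injections(A, A))             # the permutations of A
```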

3.5. Theorem
For any natural numbers n and m:
(i) if m ≤ n then 𝐦 ≤ 𝐧;
(ii) if m ≠ n then 𝐦 ≠ 𝐧.
(WARNING. The two '≤' here mean different things: the first denotes the usual order among natural numbers, while the second denotes the partial order on the cardinals.)

PROOF

(i) Assume m ≤ n. Take n distinct objects a_1, a_2, ..., a_n (which exist by Prob. 3.3). Since {a_1, a_2, ..., a_m} is clearly a subset of {a_1, a_2, ..., a_n}, we have 𝐦 ≤ 𝐧 by Thm. 2.3.
(ii) Let m ≠ n. Without loss of generality we may assume m < n. Take n distinct objects a_1, a_2, ..., a_n. By Thm. 3.4 there is no bijection from {a_1, a_2, ..., a_n} to its proper subset {a_1, a_2, ..., a_m}. Therefore 𝐦 ≠ 𝐧.

■

3.6. Remark
A subtle matter: we have not shown that being a natural number is a notion of set theory. Rather, we have taken this notion to be understood in advance, prior to the development of set theory. Therefore Def. 3.1 cannot be regarded as a single definition within this theory. Rather, it is a definition scheme, a sequence of definitions whereby each of the cardinals 𝟎, 𝟏, 𝟐, 𝟑, etc., in turn may be defined separately. Similar caveats apply to the whole of this section as well as to definitions like 1.3.1 and 2.1.7 and theorems like 1.3.16.

§ 4. Addition
In this section we shall see how cardinals may be added. But first we introduce a useful bit of terminology.
4.1. Definition
If A ∩ B = ∅, we say that A and B are disjoint.
4.2. Lemma
For any sets A and B, there are disjoint sets A' and B', such that |A| = |A'| and |B| = |B'|.
PROOF
Take any two distinct objects a and b (for example, ∅ and {∅}; see Prob. 3.3). Then let
A' = {a} × A = {(a, x) : x ∈ A}, B' = {b} × B = {(b, x) : x ∈ B}.

Using (2.1.2) it is easy to see that A' ∩ B' = ∅. Also, a bijection f from A' to A is obtained by putting f(a, x) = x for every x ∈ A; so |A| = |A'|. Similarly, |B| = |B'|.

■

4.3. Lemma
Let A, B, A', B' be sets such that A ∩ B = A' ∩ B' = ∅, |A| = |A'| and |B| = |B'|. Then |A ∪ B| = |A' ∪ B'|.
PROOF
Let f and g be bijections from A to A' and from B to B' respectively. Then it is clear that f ∪ g is a bijection from A ∪ B to A' ∪ B'. ■

4.4. Definition
For any cardinals λ and µ, we define the sum of λ and µ:
λ + µ =df |A ∪ B|,
where A and B are disjoint sets such that |A| = λ and |B| = µ.

4.5. Remarks
(i) Def. 4.4 is legitimized by Lemma 4.3.
(ii) In the proof of Thm. 2.7 we made use of a special case of Lemma 4.3. We had there A = G ∪ (A - G) and B = f[G] ∪ (A - G), where the unions in both cases are between disjoint sets. Also, |G| = |f[G]| because f is injective. Hence we concluded that |A| = |B|.
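For finite sets, the recipe of Lemma 4.2 and Def. 4.4 can be tried out directly. The sketch below is an illustration under the assumption that the two distinct tagging objects are the numbers 0 and 1; the function names disjoint_copies and cardinal_sum are hypothetical.

```python
def disjoint_copies(A, B, a=0, b=1):
    """Lemma 4.2: A' = {a} × A and B' = {b} × B for distinct tags a and b."""
    assert a != b
    return {(a, x) for x in A}, {(b, x) for x in B}

def cardinal_sum(A, B):
    """The number of members of A' ∪ B', for finite sets A and B (Def. 4.4)."""
    A1, B1 = disjoint_copies(A, B)
    assert not (A1 & B1)          # the tagged copies are disjoint
    return len(A1 | B1)

A = {"x", "y", "z"}
B = {"y", "z"}
total = cardinal_sum(A, B)        # counts the common members y and z twice
plain_union = len(A | B)          # the plain union would merge them
```

The comparison with the plain union shows why the definition insists on disjoint sets: A and B themselves overlap, so |A ∪ B| would undercount.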

4.6. Theorem
If k, m and n are natural numbers and k + m = n, then 𝐤 + 𝐦 = 𝐧.

PROOF

DIY. (WARNING. The two '+' here mean different things. The first denotes the operation of addition of numbers. The second denotes addition of cardinals.)

■

4.7. Problem
Verify, for all cardinals κ, λ and µ:
(i) κ + (λ + µ) = (κ + λ) + µ (associativity of addition),
(ii) λ + µ = µ + λ (commutativity of addition),
(iii) λ + 𝟎 = λ (neutrality of 𝟎 w.r.t. addition),
(iv) λ ≤ µ ⇒ κ + λ ≤ κ + µ (weak monotonicity of addition).

4.8. Warning
Although cardinal addition behaves in many ways like ordinary addition of natural numbers, not all rules of ordinary arithmetic apply here. For example, as we shall see later, from κ + λ = κ it does not always follow that λ = 𝟎. Hence the cancellation law does not apply in general (from κ + λ = κ + µ it does not always follow that λ = µ); nor is addition of cardinals strongly monotone (from λ < µ it does not always follow that κ + λ < κ + µ).
Instead of adding just a pair of cardinals at a time, it is possible to define the sum of many - even infinitely many - cardinals simultaneously. However, the legitimation of this definition requires the Axiom of Choice (AC, see Ch. 5). We shall explain the definition here, leaving its legitimation for later. First, we need some new notation:

4.9. Definition
If B is a function whose domain is a set X, we sometimes denote the value of B at x ∈ X by 'B_x' rather than by 'Bx' and denote B itself by
'{B_x | x ∈ X}'.
In this connection we refer to X as the index set and to B as the family of the B_x, indexed by X.

4.10. Remark
Many authors use the vertical stroke '|' instead of the colon for class abstraction (as in Def. 1.1.5) and so use some other notation for indexed families.

4.11. Definition
Let {B_x | x ∈ X} be an indexed family of sets (that is, all the B_x are sets). Let µ_x = |B_x| for each x ∈ X. We put:
Σ{µ_x | x ∈ X} =df |⋃{{x} × B_x : x ∈ X}|.
This is called the sum of the [family of the] µ_x, indexed by X.

4.12. Remarks
(i) Thus, to add up all the µ_x simultaneously, we form the cartesian product {x} × B_x for each x ∈ X. (Note that these products are pairwise disjoint: if x ≠ y then {x} × B_x and {y} × B_y are disjoint, although B_x and B_y need not be disjoint and may even be equal.) Then we take the union of all these products. Using AR and AU it is easy to verify that this union is a set. The cardinality of this set is the required sum.
(ii) To legitimize this definition one must show that if A is another indexed family of sets with the same index set X such that |A_x| = |B_x| for all x ∈ X, then
⋃{{x} × A_x : x ∈ X} ≈ ⋃{{x} × B_x : x ∈ X}.
This can easily be done, using AC (see Rem. 5.1.3(iii) below).
(iii) We need to define the sum of a family, rather than a set, of cardinals because in a set of cardinals each cardinal can occur at most once: a given cardinal either does or does not belong to a given set. However, we must not forbid multiple occurrence of a cardinal in a sum. This is taken care of by our definition, since in the family {µ_x | x ∈ X} we can have µ_x = µ_y for x ≠ y.
(iv) Def. 4.4 is obtained as a special case of Def. 4.11 by taking the index set X to have just two members.
(v) The set ⋃{{x} × B_x : x ∈ X} is called the direct sum of the indexed family {B_x | x ∈ X}.
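The direct sum is simple to compute for a finite indexed family. In the Python sketch below the family is represented, by assumption, as a dict sending each index x to the set B_x; the name direct_sum and the sample family are hypothetical.

```python
def direct_sum(family):
    """⋃{{x} × B_x : x ∈ X} for an indexed family given as a dict x -> B_x."""
    return {(x, b) for x, B_x in family.items() for b in B_x}

# Two members of the family are equal and a third overlaps them.
family = {"p": {1, 2}, "q": {1, 2}, "r": {2, 3, 4}}
total = len(direct_sum(family))                       # the sum of the µ_x
merged = len(set().union(*family.values()))           # the plain union
```

The example illustrates Rem. 4.12(iii): the same cardinal occurs twice in the family and is counted twice in the sum, whereas the plain union of the B_x would lose this multiplicity.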

§5. Multiplication
5.1. Definition
For any cardinals λ and µ, we define the product of λ and µ:
λ · µ =df |A × B|,
where A and B are any sets such that |A| = λ and |B| = µ. We often abbreviate 'λ · µ' as 'λµ'.

5.2. Remarks
(i) A × B is a set by Rem. 2.1.13(ii) and Lemma 2.2.5.
(ii) Def. 5.1 is legitimized by the easily proved fact that if A' ≈ A and B' ≈ B, then also A' × B' ≈ A × B.
For natural numbers m and n, the product mn equals the sum obtained when n is added to itself m times (this is why the product is read as 'm times n'). A similar result also holds in cardinal arithmetic, in the following sense:

5.3. Theorem
Let λ and κ be any cardinals and let {µ_a | a ∈ A} be an indexed family of cardinals such that µ_a = κ for every a ∈ A and such that |A| = λ. Then
Σ{µ_a | a ∈ A} = λκ.

PROOF
Let D be a set such that |D| = κ. Applying Def. 4.11 to the indexed family of sets {B_a | a ∈ A} such that B_a = D for every a ∈ A, we obtain
Σ{µ_a | a ∈ A} = |⋃{{a} × D : a ∈ A}|.
However, it is not difficult to verify (DIY!) that
⋃{{a} × D : a ∈ A} = A × D.
Hence Σ{µ_a | a ∈ A} = |A × D| = λκ.

■

5.4. Theorem
If k, m and n are natural numbers and km = n, then 𝐤𝐦 = 𝐧.

PROOF

DIY.

■

5.5. Problem
Verify, for all cardinals κ, λ and µ:
(i) κ(λµ) = (κλ)µ (associativity of multiplication),
(ii) λµ = µλ (commutativity of multiplication),
(iii) λ𝟏 = λ (neutrality of 𝟏 w.r.t. multiplication),
(iv) λ ≤ µ ⇒ κλ ≤ κµ (weak monotonicity of multiplication),
(v) (κ + λ)µ = κµ + λµ (distributivity of multiplication over addition),
(vi) λµ = 𝟎 ⇔ λ = 𝟎 or µ = 𝟎 (absorptive property of 𝟎).

5.6. Problem
Prove the following generalization of Prob. 5.5(v): if {λ_x | x ∈ X} is any indexed family of cardinals and µ is any cardinal then
(Σ{λ_x | x ∈ X})µ = Σ{λ_x µ | x ∈ X}.

5.7. Warning
The same as 4.8, mutatis mutandis.
As in the case of addition, multiplication can be defined for a whole family of cardinals rather than just a pair of cardinals. (Legitimation again requires AC.) We start from a simple observation:
5.8. Lemma
Let C and D be any sets and let u and v be distinct objects. Let P be the class
{f : f is a function such that dom f = {u, v} and fu ∈ C and fv ∈ D}.
Then P is a set equipollent to C × D.
PROOF
It is quite easy to show, without using AR, that P is a set. However, we shall not bother to do so. Instead, we shall define a bijection F from the set C × D to P. Thus by AR the latter is also a set. We put, for each c ∈ C and d ∈ D,
F(c, d) = {(u, c), (v, d)}.
It is easy to verify that F is indeed a bijection from C × D to P. ■

The following definition generalizes the construction of Lemma 5.8 to an arbitrary family of sets.

5.9. Definition
If {B_x | x ∈ X} is an indexed family of sets, the class
{f : f is a function such that dom f = X and fx ∈ B_x for all x ∈ X}
is denoted by '⨉{B_x | x ∈ X}'
and called the direct product of the family {B_x | x ∈ X}.

5.10. Lemma
If {B_x | x ∈ X} is any indexed family of sets, then ⨉{B_x | x ∈ X} is a set.

PROOF
Recall (Def. 4.9) that {B_x | x ∈ X} is the function having the index set X as its domain, whose value at each x ∈ X is B_x. Therefore the range of this function is
{B_x : x ∈ X}
and this range is a set by AR. Now let us put
U = ⋃{B_x : x ∈ X}.
U is a set by AU. Next, observe that by Def. 5.9, if f is any member of ⨉{B_x | x ∈ X} then f is a map from X to U. Hence f ⊆ X × U, which means that f ∈ P(X × U). Thus we have shown that
⨉{B_x | x ∈ X} ⊆ P(X × U).
Since X × U is a set (cf. Rem. 5.2(i)), it follows that P(X × U) is a set by AP. Hence ⨉{B_x | x ∈ X} is a set by AS.

■

5.11. Definition
Let {B_x | x ∈ X} be a family of sets and let µ_x = |B_x| for each x ∈ X. We put
Π{µ_x | x ∈ X} =df |⨉{B_x | x ∈ X}|.
This is called the product of the [family of] µ_x, indexed by X.

5.12. Remarks
(i) Using AC it is easy to legitimize this definition by showing that if A is another indexed family of sets with the same index set X such that |A_x| = |B_x| for all x ∈ X, then
⨉{A_x | x ∈ X} ≈ ⨉{B_x | x ∈ X}.
(ii) Def. 5.1 can be regarded as a special case of Def. 5.11. Indeed, if C and D are any sets, whose cardinalities are κ and λ respectively, take X = {u, v}, where u and v are distinct objects, and let {B_x | x ∈ X} be the family such that B_u = C and B_v = D. Then Lemma 5.8, rewritten in the notation of Def. 5.9, says that
⨉{B_x | x ∈ X} ≈ C × D.
So in this case we have
|⨉{B_x | x ∈ X}| = |C × D|,
which is what Def. 5.1 says κλ should be.
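For a finite indexed family the direct product of Def. 5.9 can be enumerated outright. In the following Python sketch each function f is represented, as in the text, by its set of ordered pairs; the dict representation of the family, the name direct_product and the sample sets are assumptions made for the illustration.

```python
from itertools import product

def direct_product(family):
    """All functions f with dom f = X and fx ∈ B_x, each as a frozenset of pairs."""
    index = sorted(family)                        # fix an order on the index set X
    choices = product(*(sorted(family[x]) for x in index))
    return {frozenset(zip(index, values)) for values in choices}

# The two-member case of Lemma 5.8, with C = {0, 1} and D = {a, b, c}.
family = {"u": {0, 1}, "v": {"a", "b", "c"}}
P = direct_product(family)
```

Each member of P has the form {(u, c), (v, d)}, and P has as many members as C × D, in agreement with Rem. 5.12(ii).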

§ 6. Exponentiation; Cantor's Theorem
6.1. Definition
Let A and B be any sets. Then
map(A, B) =df {f : f is a map from A to B}.
6.2. Remarks
(i) If f is any member of map(A, B) then f ⊆ A × B, hence f is a member of P(A × B). Thus map(A, B) ⊆ P(A × B), and map(A, B) is a set.
(ii) Perhaps more instructively, the same result can be derived from Lemma 5.10, as follows. Consider the indexed family {D_a | a ∈ A} such that D_a = B for every a ∈ A. Then ⨉{D_a | a ∈ A} - which is a set by Lemma 5.10 - is, by Def. 5.9, equal to
{f : f is a function such that dom f = A and fa ∈ B for all a ∈ A}.
By Def. 6.1 this is exactly map(A, B).

6.3. Definition
For any cardinals λ and µ, we define µ to the [power of] λ:
µ^λ =df |map(A, B)|,
where A and B are sets such that |A| = λ and |B| = µ.

6.4. Remarks
(i) This definition is legitimized by the easily verified fact that if A ≈ A' and B ≈ B' then map(A, B) ≈ map(A', B').
(ii) From Rem. 6.2(ii) it follows that exponentiation (raising to a power) can be achieved by repeated multiplication, in the following sense: if {κ_a | a ∈ A} is an indexed family of cardinals such that κ_a = µ for all a ∈ A, and if |A| = λ, then
Π{κ_a | a ∈ A} = µ^λ.
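For finite sets, map(A, B) can be listed explicitly and counted. The sketch below is an illustration only; representing each map as a Python dict, and the name all_maps, are assumptions made here.

```python
from itertools import product

def all_maps(A, B):
    """map(A, B): every map from A to B, each represented as a dict."""
    domain = sorted(A)
    return [dict(zip(domain, values))
            for values in product(sorted(B), repeat=len(domain))]

A = {1, 2, 3}
B = {"p", "q"}
maps = all_maps(A, B)             # one map for each way of choosing three values
from_empty = all_maps(set(), B)   # only the empty map has empty domain
into_empty = all_maps(A, set())   # no map from a non-empty set into ∅
```

The two degenerate cases show how the definition behaves when λ or µ is 𝟎: there is exactly one map from the empty set, and none from a non-empty set into the empty set.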

6.5. Problem
Let k, m be natural numbers, and let n = m^k. Verify that 𝐧 = 𝐦^𝐤.

6.6. Problem
Verify that for any cardinals κ, λ and µ:
(i) 𝟏^λ = 𝟏,
(ii) µ^𝟏 = µ,
(iii) µ^κ µ^λ = µ^(κ+λ),
(iv) (µ^λ)^κ = µ^(κλ),
(v) (λµ)^κ = λ^κ µ^κ.

6.7. Theorem
For any set A, |PA| = 𝟐^|A|.

PROOF
By Def. 6.3, what we have to show is that PA is equipollent to map(A, B), where B is a set having exactly two members. Let us take B = {∅, {∅}}. Define a map F from map(A, B) to PA, by putting, for every f ∈ map(A, B),
Ff = {a ∈ A : fa = ∅}.
It is easy to verify that F is a bijection from map(A, B) to PA.

■
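The bijection F of this proof can be inspected on a small example. The Python sketch below assumes that sets are modelled by frozensets and maps by dicts; the three-member set A is chosen arbitrarily.

```python
from itertools import product

EMPTY = frozenset()
B = [EMPTY, frozenset({EMPTY})]          # B = {∅, {∅}}

def F(f):
    """Ff = {a ∈ A : fa = ∅}, for a map f given as a dict."""
    return frozenset(a for a, value in f.items() if value == EMPTY)

A = ["a", "b", "c"]
maps = [dict(zip(A, values)) for values in product(B, repeat=len(A))]
images = {F(f) for f in maps}            # the subsets of A obtained as values of F
```

Since there are as many distinct images as there are maps, F is injective on this example, and since every subset of a three-member set is among the images, it is also onto PA.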

6.8. Cantor's Theorem
For any set A, |A| < |PA|.

PROOF
First, we show that |A| ≤ |PA|. We define a map f from A into PA by putting fa = {a} for each a ∈ A. Clearly, f is an injection from A to PA.
We show that |A| ≠ |PA| by reductio. Let g be any map from A to PA. For each x ∈ A, then, gx is a member of PA - that is, a subset of A. Put
D = {x ∈ A : x ∉ gx}.
Then D is a subset of A - that is, a member of PA. If g were to map A onto PA, there would be some d ∈ A for which gd = D. Then
d ∈ gd ⇔ d ∈ D.
But from the definition of D we see that
d ∈ D ⇔ d ∉ gd.
Thus, d belongs to gd iff it doesn't. This contradiction shows that g cannot map A onto PA, and hence cannot be a bijection from A to PA.

■
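The diagonal set D of this proof can be computed for any given map g on a finite set. In the Python sketch below the particular g is a hypothetical example, represented as a dict from members of A to frozensets; the name diagonal is likewise chosen here.

```python
def diagonal(A, g):
    """D = {x ∈ A : x ∉ gx} for a map g from A to PA given as a dict."""
    return frozenset(x for x in A if x not in g[x])

A = {0, 1, 2}
g = {0: frozenset({0, 1}), 1: frozenset(), 2: frozenset({0, 2})}   # hypothetical g
D = diagonal(A, g)
missed = all(g[x] != D for x in A)       # D is not a value of g
```

Whatever g is chosen, D differs from each gx in its treatment of x itself, so D is never among the values of g.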

6.9. Remark
The idea of Russell's Paradox derives from this proof. Indeed, if A is the class of all sets, then it is easy to see that PA ⊆ A. Thus id_A is in fact a bijection from A to a class - A itself - that includes PA. Taking id_A as the g in Cantor's proof, the D of that proof becomes Russell's paradoxical class of all sets that do not belong to themselves.

4 Ordinals
§ 1. Intuitive discussion and preview
The introduction of the set-theoretical cardinals was motivated by the wish to generalize the natural numbers in their capacity as cardinal numbers, answering the question 'how many?'. But the natural numbers are also used, in arithmetic as well as in ordinary life, in other capacities. In my local bank branch there is a number dispenser: on entering the branch, each customer collects from the dispenser a piece of paper showing a number. This number is not (at least, not directly) an answer to a 'how many?' question, but an ordinal number, fixing the place of the customer in the queue.
A finite set can always be arranged as a queue - and if we ignore the identity of the elements being ordered, this can be done in just one way. For example, the first three customers in the bank, arranged according to the numbers assigned to them by the dispenser, always form the following pattern:
We can use the number three as an ordinal number, to describe this general abstract pattern, the order type of three objects arranged in a queue. Note that three is also the number to be assigned to the next customer, who is about to join the queue. This is quite general: the ordinal number assigned to each customer is the order-type (the queue pattern) of the queue of all preceding customers.
Cantor wished to extend this idea of finite queues and finite ordinal numbers into the transfinite. Imagine that all the old (finite) ordinal numbers have been dispensed. We have now got an infinite queue forming the pattern

(*)  • < • < • < • < ...
     0   1   2   3

We need a new ordinal to describe the order type of this infinite queue. Cantor denoted this new ordinal by 'ω'. We can assign this ordinal to the next 'customer' and extend the queue by placing that customer behind all the finite-numbered ones:

• < • < • < • < ... < •
0   1   2   3         ω

The new order type just formed is described by the next ordinal, which Cantor denoted by 'ω + 1'. We can continue in this way, getting not only ω + n for every natural n but also ω + ω, then ω + ω + 1 and so on and on and on.
Examining the 'queues' formed in this way, Cantor saw that they are not merely totally ordered, but have a special property not shared by all totally ordered sets: every non-empty subset of the queue has a least (first) member. Cantor called such queues well-ordered.
An example of a total order that is not a well-ordering is provided by the integers, ordered according to magnitude:

... < -3 < -2 < -1 < 0 < 1 < 2 < 3 < ....

Note that the fact that the pattern (*), described by the ordinal ω, is well-ordered is just the Least Number Principle, a form of the Principle of Mathematical Induction (see § 4 of Ch. 0).
Cantor introduced the ordinals as a new and separate sort of abstract entity, just as he did with cardinals. However, in 1923 John von Neumann pointed out that among all well-ordered sets having a given Cantorian ordinal as their order-type there is a particular one with some very special properties. In the spirit of reductionism, this particular set can then be taken to be the ordinal of that order type.
We shall present von Neumann's theory of ordinals as streamlined by Raphael M. Robinson and others.

§ 2. Definition and basic properties
2.1. Definition
Let ≺ be a [sharp] partial order on a class A and let B ⊆ A. If b ∈ B and b ≺ x for every other x ∈ B, we say that b is least in B with respect to ≺.

2.2. Remarks
(i) Instead of demanding that b ≺ x for every other x ∈ B, we may equivalently demand that b ≼ x for every x ∈ B. Here ≼ is of course ≺^♭, the blunt partial order associated with ≺ (see Prob. 2.3.9 and Rem. 2.3.10).
(ii) When there is no risk of confusion, we omit the phrase 'with respect to ≺'.
(iii) Since ≺ is anti-symmetric, if B does have a least member it is unique and we may therefore refer to it as the least member of B.

2.3. Definition
A well-ordering on a class A is a partial order on A such that every non-empty set included in A has a least member.

2.4. Lemma
If ≺ is a well-ordering on a class A then ≺ is a [sharp] total order on A.

PROOF
According to Def. 2.3.11, we must show that ≺ fulfils the trichotomy and transitivity conditions. The latter condition is fulfilled because by Def. 2.3 ≺ is a partial order; so it only remains to verify the trichotomy.
Let x and y be any members of A. We must show that exactly one of the three disjuncts

x ≺ y or x = y or y ≺ x

holds. That no two of these disjuncts can hold simultaneously follows at once from the anti-symmetry of ≺. On the other hand, the set {x, y} is included in A and so must have a least member; hence at least one of the three disjuncts must hold.

■

2.5. Definition
If A is any class, we define the binary relation ∈_A on A, called the restriction of ∈ to A, by putting
∈_A =df {(x, y) ∈ A² : x ∈ y}.

2.6. Remark
The relation ∈_A can also be characterized by the fact that, for all x and y,
x ∈_A y ⇔ x ∈ A and y ∈ A and x ∈ y.

2.7. Definition
We say that a class A is ∈-well-ordered if the relation ∈_A is a well-ordering on A.

2.8. Problem
(i) Let A be a class such that ∈_A is a sharp total order on A; let B ⊆ A and b ∈ B. Prove that b is least in B w.r.t. ∈_A iff b is either an individual or a set such that b ∩ B = ∅.
(ii) Hence verify that a class A is ∈-well-ordered iff the following two conditions are satisfied:
(1) ∈_A is a sharp total order on A.
(2) Every non-empty set u included in A has a member v such that v is either an individual or a set such that v ∩ u = ∅.
(iii) Prove that in (ii) we may replace (1) by the weaker condition:
(1') For any members x and y of A, at least one of the following three disjuncts holds:
x ∈ y or x = y or y ∈ x.
(Show that if two of these disjuncts hold simultaneously then the set u = {x, y} violates (2). To verify that ∈_A is transitive, let x, y and z be members of A such that x ∈ y ∈ z and apply (2) to the set u = {x, y, z}.)
(iv) Hence (or directly from Def. 2.7) prove that if B ⊆ A and A is ∈-well-ordered, then so is B.

2.9. Theorem
If A is an ∈-well-ordered class and B is a non-empty subclass of A, then B has a least member w.r.t. ∈_A.

PROOF

Take any z ∈ B. If z is the least member of B, we need look no further. So let us suppose z is not the least member of B. Therefore by Prob. 2.8(i) z is a set rather than an individual and z ∩ B ≠ ∅.
By Prob. 1.4.5(ii) z ∩ B is a set; we have just seen that it is non-empty; and it is clearly included in B and hence also in A. So by Def. 2.7 z ∩ B must have a least member w.r.t. ∈_A.
Let y be the least member of z ∩ B. We claim that this y is also the least member of B. Indeed, if this were untrue, then (applying to y the argument we have just applied to z) we would find an x such that x ∈ y ∩ B. Then x ∈ y as well as y ∈ z and by the transitivity of ∈_A it would follow that x ∈ z, hence x ∈ z ∩ B. But this is impossible, because x ∈ y and y is the least member of z ∩ B.

■

2.10. Definition
A class A is transitive if, for all y,
y ∈ A ⇒ y ⊆ A.

2.11. Remarks
(i) Note that every member of a transitive class must be a set rather than an individual, because by Def. 1.3.4 y ⊆ A holds only if y is a class. So a class A is transitive iff: (1) all its members are sets and (2) ⋃A ⊆ A; that is, for all x and y, x ∈ y ∈ A ⇒ x ∈ A.
(ii) Unfortunately, 'transitivity' is used with two meanings: the present one and that applicable to binary relations (as, for example, in Def. 2.3.2). In practice no confusion shall arise, as the context will indicate which meaning is intended.

2.12. Definition
An ordinal is a transitive and ∈-well-ordered set. The class of all ordinals is denoted by 'W'.

2.13. Examples
The empty set ∅ is, vacuously, an ordinal. It is also easy to verify that {∅} and {∅, {∅}} are ordinals.
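These small examples can be checked mechanically. The Python sketch below assumes that pure finite sets are modelled by nested frozensets; all function names are hypothetical. It relies on the fact that on a finite set every sharp total order is a well-ordering, so for finite sets it suffices to test trichotomy and transitivity of the membership relation.

```python
def is_transitive(s):
    """Every member of s is a set included in s (Def. 2.10)."""
    return all(isinstance(y, frozenset) and y <= s for y in s)

def is_membership_well_ordered(s):
    """For a finite set of sets: ∈ restricted to s is a sharp total order."""
    members = list(s)
    trichotomy = all((x in y) + (x == y) + (y in x) == 1
                     for x in members for y in members)
    transitivity = all(x in z
                       for x in members for y in members for z in members
                       if x in y and y in z)
    return trichotomy and transitivity

def is_ordinal(s):
    """Def. 2.12, restricted to finite sets built from frozensets."""
    return is_transitive(s) and is_membership_well_ordered(s)

zero = frozenset()                     # ∅
one = frozenset({zero})                # {∅}
two = frozenset({zero, one})           # {∅, {∅}}
not_ordinal = frozenset({one})         # {{∅}} is not transitive
```

The last example shows that transitivity is a genuine restriction: {{∅}} is ∈-well-ordered but has the member {∅}, which is not included in it.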

2.14. Convention
We shall use lower-case Greek letters - mainly 'α', 'β', 'γ', 'λ', 'ξ' and 'η' - as variables ranging over the ordinals.

2.15. Theorem
All members of an ordinal are ordinals; thus, if α is an ordinal, α = {ξ : ξ ∈ α}.

PROOF

Let y ∈ α. Since α is transitive, we have y ⊆ α. Since α is an ∈-well-ordered set, it follows from Prob. 2.8(iv) that its subset y is also ∈-well-ordered. It remains to show that y is transitive. So let u ∈ x ∈ y. Using the fact that α is a transitive set, we have x ∈ α and then in turn also u ∈ α. Hence u and x, as well as y, are members of α; so by the transitivity of the relation ∈_α we infer from u ∈ x ∈ y that u ∈ y.

■

2.16. Lemma
If y is any transitive subset of an ordinal α then y itself is an ordinal; moreover, y = α or y ∈ α.

PROOF

That y is an ordinal follows at once from Prob. 2.8(iv). Moreover, let u = α - y. If u = ∅ then y = α. If u is non-empty, then it has a (unique) least member x w.r.t. ∈_α. We shall show that y = x.
First, let z ∈ x. Since x ∈ α and α is transitive, it follows that z ∈ α. But z cannot be in u, because z ∈ x, and x is the least member of u; thus z must be in y. This proves that x ⊆ y.
Conversely, let z ∈ y. Then z = x is impossible because x ∉ y. Also, x ∈ z is impossible because, by the transitivity of y, it would imply x ∈ y. Hence by Lemma 2.4 we must have z ∈ x. This proves that y ⊆ x. Thus y = x ∈ α.

■

2.17. Theorem
The class W of all ordinals is transitive and ∈-well-ordered.

PROOF
The transitivity of W follows at once from Thm. 2.15. To prove that W is ∈-well-ordered, we shall make use of Prob. 2.8(iii).
To verify that condition (1') of Prob. 2.8(iii) holds for W, let α and β be any ordinals. Since both α and β are transitive, it is easy to see that α ∩ β is also transitive. Thus by Lemma 2.16 α ∩ β is an ordinal, say γ; moreover, γ = α or γ ∈ α. Likewise, γ = β or γ ∈ β. But we cannot have both γ ∈ α and γ ∈ β because then γ ∈ α ∩ β - that is, γ ∈ γ; and this would violate the anti-symmetry of the well-ordering relation ∈_γ on γ. Therefore γ = α or γ = β. Hence α = β or α ∈ β or β ∈ α, which proves condition (1') for W.
Now let u be any non-empty set of ordinals. We must prove that there exists an ordinal ξ ∈ u such that ξ ∩ u = ∅. Take any α ∈ u. If α ∩ u = ∅, we are through.
On the other hand, suppose α ∩ u ≠ ∅. Since α is ∈-well-ordered, there must exist some member ξ of α ∩ u such that ξ ∩ α ∩ u = ∅. But ξ ∈ α and α is transitive; so ξ ⊆ α. Hence ξ ∩ u = ξ ∩ α ∩ u = ∅.
■

2.18. Corollary
W is a proper class (that is, not a set).

PROOF

If W were a set, then by Def. 2.12 and Thm. 2.17 it would be an ordinal, hence W ∈ W, in violation of the anti-symmetry of the well-ordering relation ∈_W.

■

2.19. Remarks
(i) The (naive) assumption that W is a set led to a contradiction. This was the Burali-Forti Paradox (see § 2 of Ch. 1). Cor. 2.18 is a 'tame' version, within ZF, of the paradox. Similarly, Thm. 1.3.10 is a 'tame' ZF version of Russell's Paradox.
(ii) In the proofs of Thm. 2.17 and Cor. 2.18 we used the argument that an ordinal γ cannot be a member of itself because this would violate the anti-symmetry of the well-ordering relation ∈_γ on γ. In mathematical practice it is often convenient to posit a further postulate - the Axiom of Foundation (or Regularity), first proposed by Dimitry Mirimanoff in 1917 - one of whose effects is to exclude any set that belongs to itself. On the other hand, in some special applications of set theory - notably in so-called situation semantics, developed by Jon Barwise and others, and in abstract computation theory - it is convenient to use an extension of ZF proposed by Peter Aczel, which negates the Axiom of Foundation and admits some sets that belong to themselves. In the present course we do not commit ourselves either way.

2.20. Corollary Any class of ordinals is ∈-well-ordered.

PROOF

Immediate from Thm. 2.17 and Prob. 2.8(iv).

■

2.21. Definition
The ∈-well-ordering on W shall be denoted by '<'. Thus for any ordinals α and β,
α < β ⇔ α ∈ β.

2.22. Remarks
(i) As usual, we denote by '≤' the blunt version of <. Thus
α ≤ β ⇔ α ∈ β or α = β.
(ii) Thm. 2.15 can now be read as saying that if α is any ordinal then α = {ξ : ξ < α}.
(iii) From now on, whenever we use order-related terminology in connection with ordinals, we shall take it for granted that the order relation referred to is the ∈-well-ordering, unless otherwise stated.

2.23. Definition
Let < be a partial order on a class A and let B ⊆ A.
(i) If u ∈ A and x ≤ u for all x ∈ B, then u is said to be an upper bound of (or for) B with respect to <.
(ii) If u is the least member of the class of upper bounds for B w.r.t. < - that is, if u is an upper bound for B w.r.t. < and if u < v whenever v is any other upper bound for B w.r.t. < - then u is said to be the least upper bound (abbreviated 'lub') for B w.r.t. <.

2.24. Remarks
(i) The phrase 'with respect to <' is omitted when there is no danger of confusion.
(ii) A subclass B of A need not in general have any upper bound, let alone a lub; but if it has a lub, it is unique.

2.25. Theorem
If A is a set of ordinals then its union-set ⋃A is an ordinal. Moreover, ⋃A is the lub of A.

PROOF

To show that ⋃A is transitive, assume that x ∈ y ∈ ⋃A. Then for some ordinal α we have x ∈ y ∈ α ∈ A. Since α is transitive, it follows that x ∈ α ∈ A; hence x ∈ ⋃A.

By Thm. 2.15, all the members of ⋃A are ordinals; so by Cor. 2.20 ⋃A is ∈-well-ordered. Thus ⋃A is an ordinal.

If α ∈ A then α ⊆ ⋃A, since ⋃A is a transitive set. Therefore by Lemma 2.16 α ≤ ⋃A. This means that ⋃A is an upper bound for A.

Finally, if β is any upper bound for A, then for each α ∈ A we have α ≤ β - that is, α ∈ β or α = β. By the transitivity of the set β it follows that in either case α ⊆ β. Since this holds for each α ∈ A, it follows that also ⋃A ⊆ β. By Lemma 2.16 we now have ⋃A ≤ β - which proves that ⋃A is the least upper bound for A.

■
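As a small illustration of Thm. 2.25, the union-set of a finite set of finite ordinals can be computed directly. This is a sketch under the assumption that ordinals are modelled as nested `frozenset` values; `ordinal`, `union_set` and `leq` are hypothetical helper names.

```python
def ordinal(n):
    """n-th finite von Neumann ordinal (the empty set iterated under a -> a ∪ {a})."""
    a = frozenset()
    for _ in range(n):
        a = a | frozenset([a])
    return a

def union_set(A):
    """⋃A: the set of all members of members of A."""
    return frozenset().union(*A)

def leq(a, b):
    """a ≤ b in the ∈-ordering: a ∈ b or a = b."""
    return a in b or a == b

A = {ordinal(1), ordinal(4), ordinal(2)}
lub = union_set(A)

# ⋃A is an upper bound for A ...
is_upper_bound = all(leq(a, lub) for a in A)
# ... and it lies below every other upper bound among the first few ordinals.
candidates = [ordinal(n) for n in range(8)]
is_least = all(leq(lub, b) for b in candidates if all(leq(a, b) for a in A))
```

Here A has a greatest member, so ⋃A coincides with it; the theorem is more interesting for sets such as ω, which cannot be enumerated in a finite program.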

2.26. Definition
For any ordinal α we put α' =df α ∪ {α}. We call α' the immediate successor of α. (This terminology is justified by the following theorem.)

2.27. Theorem
For any α, α' is an ordinal. Moreover, for any β, β ≤ α iff β < α' (equivalently: α < β iff α' ≤ β). Hence α < β iff α' < β'.

PROOF

Easy - DIY.

■
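The statements of Thm. 2.27 are easy to check mechanically on the first few ordinals. The sketch below assumes a `frozenset` model of sets; the names `succ`, `ordinal`, `lt` and `le` are illustrative only.

```python
def succ(a):
    """Immediate successor a' = a ∪ {a} (Def. 2.26)."""
    return a | frozenset([a])

def ordinal(n):
    """n-th finite von Neumann ordinal, obtained by iterating succ from ∅."""
    a = frozenset()
    for _ in range(n):
        a = succ(a)
    return a

def lt(a, b):
    return a in b                 # a < b means a ∈ b (Def. 2.21)

def le(a, b):
    return a in b or a == b       # the blunt version (Rem. 2.22(i))

ORDS = [ordinal(n) for n in range(6)]

# β ≤ α iff β < α'
first_claim = all(le(b, a) == lt(b, succ(a)) for a in ORDS for b in ORDS)
# α < β iff α' ≤ β
equivalent_form = all(lt(a, b) == le(succ(a), b) for a in ORDS for b in ORDS)
# α < β iff α' < β'
successor_monotone = all(lt(a, b) == lt(succ(a), succ(b)) for a in ORDS for b in ORDS)
```

This does not replace the DIY proof; it only confirms the claims for ordinals below 6.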

2.28. Definition
(i) An ordinal of the form α' is called a successor ordinal.
(ii) An ordinal that is neither ∅ nor a successor ordinal is called a limit ordinal.

§ 3. The finite ordinals

3.1. Definition
An ordinal α is said to be finite if no ordinal ξ ≤ α is a limit ordinal. Otherwise, α is said to be an infinite ordinal. We put
ω =df {α : α is a finite ordinal}.

3.2. Theorem ω is transitive.

PROOF

Let α be a finite ordinal. We must show that every member of α is also a finite ordinal. This is easily done - DIY, using Rem. 2.22(ii).

■

3.3. Theorem
(i) ∅ is a finite ordinal.
(ii) If α is a finite ordinal then so is α'.

PROOF

(i) We know that ∅ is an ordinal (Ex. 2.13). But by Def. 2.28(ii) ∅ is not a limit ordinal. Since ∅ has no members, the only ξ such that ξ ≤ ∅ is ∅ itself. Hence ∅ is a finite ordinal.

(ii) Let α be a finite ordinal and let ξ ≤ α'. We must show that ξ is not a limit ordinal. Now, α' itself is a successor ordinal, hence not a limit ordinal. It remains to consider the case where ξ < α'. By Thm. 2.27 this means that ξ ≤ α. Since α is a finite ordinal, ξ is not a limit ordinal.

■

3.4. Theorem ω is a set.

PROOF
Using the Axiom of Infinity (Ax. 1.3.21), take a set Z such that ∅ ∈ Z and such that whenever x ∈ Z, then also x ∪ {x} ∈ Z. Thus if an ordinal α belongs to Z then (by Def. 2.26) so does α'.
Consider the class ω - Z, the class of all finite ordinals not belonging to Z. If this class is non-empty, then by Thm. 2.9 it must have a least member, say β. Now, β cannot be ∅, because ∅ does belong to Z. Also, β, being a finite ordinal, cannot be a limit ordinal. So it must be a successor ordinal, say β = α' = α ∪ {α}. But in this case α itself is a finite ordinal (by Thm. 3.2), such that α < β. Since β was supposed to be the least finite ordinal not belonging to Z, it follows that α ∈ Z. Therefore by the assumption on Z also α' ∈ Z. But this is impossible, because α' = β, which is the least finite ordinal not belonging to Z. So ω - Z must be empty. Thus ω ⊆ Z; hence ω is a set by AS. ■

3.5. Corollary
ω is the unique set X having the following three properties:
(i) ∅ ∈ X;
(ii) whenever α ∈ X then also α' ∈ X;
(iii) X ⊆ Z for any set Z such that ∅ ∈ Z and such that whenever α ∈ Z then also α' ∈ Z.

PROOF

Thm. 3.3 says that ω has properties (i) and (ii). The proof of Thm. 3.4 shows that ω has also property (iii). The uniqueness of ω follows by PX, because if X is any set having the three properties then both ω ⊆ X and X ⊆ ω.

■

3.6. Remarks
(i) Our first use of AI was to prove that ω is a set. Conversely, if we postulate that ω is a set, then by Thm. 3.3 ω is a set satisfying the conditions that AI lays down for Z. This shows that (in the presence of the other postulates) AI is equivalent to the proposition that ω is a set, which is a special case of the Comprehension Principle.
(ii) In fact, it now transpires (Cor. 3.5) that ω is simply the smallest set satisfying the conditions of AI.

We restate the fact that ω satisfies condition (iii) of Cor. 3.5 as a principle in its own right:

3.7. Corollary (Weak Principle of Induction on Finite Ordinals)

Let Z be any set such that ∅ ∈ Z and such that whenever α ∈ Z then also α' ∈ Z. Then ω ⊆ Z.

■

3.8. Remarks
(i) We see that the set ω of finite ordinals, with its ∈-well-ordering, simulates, within the confines of ZF set theory, the behaviour that characterizes the system of natural numbers. We can take ∅ as the counterpart of the number 0 and the ∈-well-ordering on ω as the counterpart of the usual ordering of the natural numbers.
Just as each natural number n has an immediate successor, n + 1, so every finite ordinal α has an immediate successor, α'.
Moreover, the basic facts about the ordering of the natural numbers (Facts 0.1.1-0.1.5) are mimicked by theorems about the finite ordinals and their ∈-well-ordering. And, most importantly, the Principle of Mathematical Induction is mimicked by the Principle of Induction on Finite Ordinals. Certainly, within ZF ω impersonates, plays the role of, 'the set of natural numbers'. In fact, Cor. 3.5 reproduces within ZF Richard Dedekind's famous characterization of the natural numbers.[1]
(ii) The obvious reductionist step at this point is to identify the ZF-set ω of finite ordinals as the 'true' (hitherto intuitive) set N of natural numbers. This would be a grand reduction indeed, because work done during the 19th century by several mathematicians (including Hamilton, Bolzano, Weierstrass, Dedekind and Cantor) showed that all the concepts of mathematical analysis could be reduced to those of natural number, set and membership (plus concepts such as relation and function that we have by now reduced to set-theoretic concepts). Thus a huge part, if not the whole, of mathematics would be reduced to set theory.
Many (perhaps most) mathematicians, under the influence of the dominant structuralist ideology, do proceed in this way, and frame (or think of) their mathematical discourse as taking place within set theory.

[1] Was sind und was sollen die Zahlen?, 1888. (English translation in Essays on the theory of numbers, edited by W. W. Beman, 1901.)

3.9. Warning
This reduction, although extremely successful in a formal sense, is by no means unproblematic, as Skolem pointed out in 1922, when he published his famous paradox. (We shall discuss Skolem's Paradox in the Appendix.)

3.10. Theorem ω is the least infinite ordinal and the least limit ordinal.

PROOF

That ω is an ordinal follows at once from Cor. 2.20 and Thms. 3.2 and 3.4. Also, ω cannot be a finite ordinal, because that would mean that ω ∈ ω - which is impossible for an ordinal. Thus ω must be an infinite ordinal. On the other hand, if ξ < ω - that is, ξ ∈ ω - then by Def. 3.1 ξ is a finite ordinal; hence ω must be the least infinite ordinal.

If ξ ∈ ω then, as we have just seen, ξ is a finite ordinal, hence a fortiori, not a limit ordinal. If ω itself were not a limit ordinal then by Def. 3.1 it would follow that ω is a finite ordinal, contrary to what we have proved. Thus ω must be a limit ordinal. As we have just observed, no ordinal smaller than ω can be a limit ordinal. Hence ω is the least limit ordinal.

■

3.11. Preview
We have yet to justify the adjectives finite and infinite introduced in Def. 3.1 in connection with ordinals. Dedekind defined a set as infinite if there exists an injection from it to a proper subset of itself, and as finite if there is no such injection. We will not adopt Dedekind's definition, but we shall show that finite and infinite ordinals in the sense of Def. 3.1 are finite and infinite respectively in Dedekind's sense.

3.12. Theorem
There does not exist an injection from a finite ordinal to a proper subset of itself.

PROOF

We proceed by weak induction on finite ordinals (Cor. 3.7). The proof is a formal (or 'internalized') version of the proof of Thm. 3.3.4.

Let Z be the set of all finite ordinals α such that there is no injection from α to a proper subset of itself. In order to prove our theorem it is enough to show that ∅ ∈ Z and that if α ∈ Z then also α' ∈ Z.

That ∅ ∈ Z is obvious, since ∅ has no proper subsets. Now assume, as induction hypothesis, that α ∈ Z and let f be an injection from α' - that is, from α ∪ {α} - to a subset B of itself. If B is a proper subset of α' then the set α' - B is non-empty.

Without loss of generality we may assume that α belongs to α' - B rather than to B. (In the contrary case, where α ∈ B, take any member β of α' - B and let g be the bijection from α' to itself that interchanges β and α but leaves all other members of α' fixed: thus, gβ = α, gα = β and gξ = ξ for any ξ ∈ α' other than β and α. Then use g ∘ f instead of f itself: it is an injection from α' to its proper subset g[B] = (B - {α}) ∪ {β}.)

Our assumption that α ∈ α' - B means that B ⊆ α. Next, let γ = fα; then γ must belong to B, since f is a map to B. It now follows that f↾α is an injection from α to its proper subset B - {γ}. This contradicts the induction hypothesis. So B cannot be a proper subset of α'.

■
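For very small ordinals the conclusion of Thm. 3.12 can also be confirmed by exhaustive search. In the sketch below the finite ordinal n is represented, as an assumption of the example, by `range(n)`; the function name is hypothetical.

```python
from itertools import product

def injections_into_proper_subset(n):
    """List all injections from {0, ..., n-1} into itself whose range is a
    proper subset. Each map is encoded as the tuple of its values."""
    dom = set(range(n))
    found = []
    for values in product(range(n), repeat=n):      # every map from n to n
        injective = len(set(values)) == n
        if injective and set(values) != dom:        # range misses some element
            found.append(values)
    return found

# The search finds nothing for n = 0, ..., 5, as the theorem predicts.
counterexamples = [injections_into_proper_subset(n) for n in range(6)]
```

The search grows like n to the power n, so it is only practical for tiny n; the induction in the proof is what covers every finite ordinal.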

3.13. Theorem
If a is an infinite ordinal then there is an injection from a to a proper subset of itself.
PROOF
First, consider ω. Define a map f on ω (that is, with ω as its domain) by putting fξ = ξ' for every finite ordinal ξ. Then f is injective. Indeed, if ξ and η are distinct, say ξ < η, then by Thm. 2.27 ξ' < η', hence ξ' and η' are also distinct. Also, f maps ω to (in fact, onto) its proper subset ω - {∅}.
Now let α be any infinite ordinal. By Thm. 3.10 we have ω ≤ α, which means that ω ∈ α or ω = α; and since α is a transitive set, it follows that ω ⊆ α. Then the map f ∪ id_(α-ω) (with f as before) is clearly an injection from α to its proper subset α - {∅}.

■

3.14. Theorem A finite ordinal is not equipollent to any other ordinal.

PROOF

Let α be a finite ordinal and let β be another ordinal. First, suppose β is finite as well. We have α < β or β < α - that is, α ∈ β or β ∈ α - and since ordinals are transitive sets it follows that α ⊆ β or β ⊆ α. As α ≠ β, the inclusion is proper; hence by Thm. 3.12 α and β cannot be equipollent.

Now suppose β is an infinite ordinal. By Thm. 3.13 there exists an injection, say g, from β to a proper subset of itself. If f were a bijection from α to β, then clearly f⁻¹ ∘ g ∘ f would be an injection from α to a proper subset of itself - which is impossible.

■

3.15. Definition
A set is finite if it is equipollent to a finite ordinal (in the sense of Def. 3.1). Otherwise, it is infinite.

3.16. Remarks
(i) By virtue of Thm. 3.14, an ordinal is finite (or infinite) in the sense of Def. 3.1 iff it is finite (or infinite, respectively) in the sense of Def. 3.15; so there is no conflict between the two definitions.
(ii) By Thm. 3.14, a finite set is equipollent to a unique finite ordinal.

3.17. Problem
(i) Prove that there does not exist an injection from a finite set to a proper subset of itself. (Use Thm. 3.12.)
(ii) Prove that if A is a non-empty finite set of ordinals, then A has a greatest member - that is, an ordinal α ∈ A such that ξ ≤ α for each ξ ∈ A. (Otherwise, define a map f on A by taking, for each α ∈ A, fα as the least ξ ∈ A such that α < ξ. Show that f would be an injection from A to a proper subset of itself.)

3.18. Problem
Let n be a natural number. Show that for any objects a₁, a₂, ..., aₙ, the set {a₁, a₂, ..., aₙ} is finite. (Use weak mathematical induction on the number n.)

§ 4. Transfinite induction

Various forms of the Principle of Mathematical Induction have analogues that apply to ordinals. These analogues collectively are known as the Principle of Transfinite Induction. First, by virtue of the fact that W is well-ordered, we have immediately by Thm. 2.9:

4.1. Theorem (Least Ordinal Principle)
If X is a non-empty class of ordinals, then X has a least member. ■

Hence other forms of the Principle of Transfinite Induction can be deduced.

4.2. Theorem (Strong Principle of Transfinite Induction)
If X is a class of ordinals such that for every ordinal ξ

(*) η ∈ X for every η < ξ ⇒ ξ ∈ X,

then X = W.

PROOF

Let Y = W - X. If Y were non-empty, it would have a least member, say ξ. So for each η < ξ we would have η ∈ X. But then by (*) ξ ∈ X, which is impossible. Thus Y must be empty.

■

4.3. Remark
By Rem. 2.22(ii) the antecedent, η ∈ X for every η < ξ, in condition (*) of Thm. 4.2 is equivalent to the statement that ξ ⊆ X.

4.4. Theorem (Weak Principle of Transfinite Induction)
If X is a class of ordinals satisfying the following three conditions
(i) ∅ ∈ X,
(ii) for every ordinal ξ, ξ ∈ X ⇒ ξ' ∈ X,
(iii) for every limit ordinal λ, λ ⊆ X ⇒ λ ∈ X,
then X = W.

PROOF

Assume X satisfies these three conditions. Then by (i) and (iii) X satisfies condition (*) of Thm. 4.2 for ∅ and for limit ordinals.

Now suppose ξ' ⊆ X. By Def. 2.26 it follows that ξ ∈ X; hence by (ii) ξ' ∈ X. Thus X satisfies (*) also for successor ordinals.

■

4.5. Remarks
(i) These principles have restricted forms, in which X is assumed to be a subset of some (arbitrary) given ordinal α rather than a subclass of W. Thus, the form of Thm. 4.1 restricted to an arbitrary ordinal α says that a non-empty subset of α has a least member. The restricted form of Thm. 4.2 says that if X is a subset of α such that for all ξ < α we have ξ ⊆ X ⇒ ξ ∈ X, then X = α.
(ii) The Principle of Transfinite Induction restricted to the particular ordinal w is precisely the Principle of Induction on Finite Ordinals.

4.6. Problem
Prove the restricted form of Thm. 4.2. Formulate and prove a form of Thm. 4.4 restricted to an arbitrary ordinal.

§ 5. The Representation Theorem

5.1. Preview
In this section we shall show that every well-ordered set is similar in its ordering to a unique ordinal.

5.2. Definition
A partially ordered set (briefly, poset) is a pair ⟨A, <⟩, where A is a set and < is a [sharp] partial order on A. A totally ordered set is a poset ⟨A, <⟩, in which < is a total order on A. A well-ordered set is a poset ⟨A, <⟩, in which < is a well-ordering on A.

5.3. Remarks
(i) This is just a convenient way of packaging a set A together with a particular partial order on A into a single object. It saves us having to keep saying 'such-and-such a set with such-and-such a partial order on it'.
(ii) However, we shall often refer, somewhat inaccurately, to A itself as the poset (or ordered set, or well-ordered set) when, strictly speaking, we have in mind the pair ⟨A, <⟩. We shall only commit this peccadillo when it is clear from the context which relation < is involved. Thus, we refer to an ordinal α as a well-ordered set, when strictly speaking we mean the pair ⟨α, <⟩, where < is ∈_α, the ∈-well-ordering on α.

5.4. Definition
A similarity map (a.k.a. isomorphism) from a poset ⟨A, <⟩ to a poset ⟨A', <'⟩ is a bijection f from A to A' such that, for all x and y in A,
x < y ⇔ fx <' fy.
If such a map exists, ⟨A, <⟩ is said to be similar (or isomorphic) to ⟨A', <'⟩.

5.5. Remark
It is easy to see that the identity map id_A is a similarity map from ⟨A, <⟩ to itself. Also if f is a similarity map from ⟨A, <⟩ to ⟨A', <'⟩ then its inverse f⁻¹ is a similarity map from ⟨A', <'⟩ to ⟨A, <⟩.
Finally, if f is a similarity map from ⟨A, <⟩ to ⟨A', <'⟩ and g is a similarity map from ⟨A', <'⟩ to ⟨A'', <''⟩ then the composition g ∘ f is a similarity map from ⟨A, <⟩ to ⟨A'', <''⟩.
It follows that similarity is an equivalence relation on the class of posets.

5.6. Theorem
If f is a similarity map from an ordinal α to an ordinal β then f is the identity map id_α, hence α = β.

PROOF
First, we prove by strong transfinite induction (restricted to α) that ξ ≤ fξ for every ξ ∈ α.

Let ξ ∈ α. By the induction hypothesis, if η < ξ then η ≤ fη. But if η < ξ then also fη < fξ, since f is a similarity map. Thus for every η < ξ we have η < fξ. In particular, η ≠ fξ for every η < ξ; in other words, fξ < ξ is impossible. This proves that ξ ≤ fξ and completes the induction.

Now, f⁻¹ is a similarity map from β to α; therefore by the same token we have also ζ ≤ f⁻¹ζ for all ζ ∈ β. Taking ζ to be fξ, where ξ ∈ α, we obtain fξ ≤ f⁻¹fξ = ξ. Thus fξ ≤ ξ as well as ξ ≤ fξ, which shows that f must be the identity id_α.

■

5.7. Corollary
For any poset ⟨A, ≺⟩, there exists at most one similarity map from ⟨A, ≺⟩ to an ordinal.

PROOF

If f and g are isomorphisms from ⟨A, ≺⟩ to α and β respectively, then the composition g ∘ f⁻¹ is clearly an isomorphism from α to β. Therefore α = β and g ∘ f⁻¹ is the identity mapping, which means that f = g.

■

5.8. Preliminaries
(i) For the rest of this section, we consider a fixed but otherwise arbitrary well-ordered set ⟨A, ≺⟩.
(ii) If B ⊆ A, then B is clearly well-ordered by the relation ≺ ∩ B², that is:
{⟨x, y⟩ : x ∈ B, and y ∈ B, and x ≺ y},
which is called the restriction of ≺ to B. Whenever we refer to a subset B of A as well-ordered, we shall mean B with this well-ordering, inherited by B from A.
(iii) For each a ∈ A, the segment of A determined by a is the set
A_a =df {x ∈ A : x ≺ a}.
(iv) We define a class F as follows:
F =df {⟨x, ξ⟩ : x ∈ A, and ξ is an ordinal, and A_x is similar to ξ}.
By Cor. 5.7, F is a function (see Def. 2.2.1). We may therefore use functional notation in connection with F. Thus 'Fx = ξ' means the same as '⟨x, ξ⟩ ∈ F'.
Clearly, dom F is a subset of A. By AS dom F is a set; hence by AR ran F is a set as well. Note that all the members of ran F are ordinals.

5.9. Lemma
Let Fa = α. Then for any ordinal β < α there exists some b ≺ a such that Fb = β. Conversely, if b ≺ a then b belongs to dom F and Fb is some ordinal β < α.

PROOF

Let f be the similarity map from A_a to α. Suppose β < α. This means that β ∈ α. Therefore fb = β for some b ∈ A_a - that is, b ≺ a. Note that by the transitivity of α we have β ⊆ α. It is easy to verify that f↾A_b, the restriction of f to A_b, is a similarity map from A_b to β. Hence Fb = β.

Conversely, suppose that b ≺ a. This means that b ∈ A_a. Therefore fb = β for some β ∈ α - that is, β < α. As before, it follows that Fb = β.

■

5.10. Lemma F is injective.

PROOF
Let a and b be two distinct members of dom F. We have to show that Fa ≠ Fb. Without loss of generality, we may assume b ≺ a. Let Fa = α. Then by Lemma 5.9 it follows that Fb is some ordinal β < α; hence Fb ≠ Fa.
■

5.11. Lemma The set ran F is an ordinal.

PROOF

As a set of ordinals, ran F is ∈-well-ordered. It remains to prove that it is a transitive set. Let α ∈ ran F; thus Fa = α for some a ∈ A. Now let β ∈ α - that is, β < α. Then by Lemma 5.9 β also belongs to ran F, showing that this set is transitive.

■

5.12. Theorem (Representation Theorem for well-ordered sets)
There exists a unique similarity map from the well-ordered set A to an ordinal.

PROOF

Uniqueness follows from Cor. 5.7. To prove existence, we shall show that F is a similarity map from A to the ordinal ran F. By Lemmas 5.9 and 5.10, F is a similarity map from dom F, which is a subset of A, to ran F; so it only remains to establish that dom F is the whole of A.

Suppose not. Then, since A is well-ordered, there would be a least b ∈ A such that b ∉ dom F. Thus, if a ∈ A such that a ≺ b then a must belong to dom F. On the other hand, if b ≺ a then a cannot be in dom F because if it were then by the second half of Lemma 5.9 b would also be in that domain.

It would follow that dom F is exactly A_b. But then F is a similarity map from A_b to ran F. Thus A_b is similar to the ordinal ran F. By the definition of F it would then follow that ⟨b, ran F⟩ ∈ F, hence b ∈ dom F, contradicting the choice of b.

■
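For a finite well-ordered set the map F of the Representation Theorem can be computed explicitly. The sketch below relies on the observation that Lemma 5.9 together with Rem. 2.22(ii) gives Fa = {Fb : b ≺ a}. Sets are modelled as `frozenset` values, and the names `representation`, `ordinal` and the sample data are assumptions of the example.

```python
def ordinal(n):
    """n-th finite von Neumann ordinal."""
    a = frozenset()
    for _ in range(n):
        a = a | frozenset([a])
    return a

def representation(A, prec):
    """Map each a in the finite well-ordered set (A, prec) to the ordinal
    similar to its segment A_a, using F(a) = {F(b) : b prec a}."""
    F = {}

    def value(a):
        if a not in F:
            F[a] = frozenset(value(b) for b in A if prec(b, a))
        return F[a]

    for a in A:
        value(a)
    return F

# Hypothetical example: three words, well-ordered alphabetically.
A = ["fig", "apple", "kiwi"]
F = representation(A, lambda x, y: x < y)
ran_F = frozenset(F.values())      # expected to be the ordinal 3
```

The recursion terminates because each call only descends to strictly smaller elements; for an infinite well-ordered set the same idea needs the transfinite machinery of the text.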

5.13. Definition
A set is denumerable if it is equipollent to w. A set is countable if it is finite or denumerable.

5.14. Problem
(i) Let D be a subset of an ordinal α. By Cor. 2.20, D is ∈-well-ordered; and by Thm. 5.12, D is similar to an ordinal β. Prove that β ≤ α. (Let f be a similarity map from β to D. Show that ξ ≤ fξ for every ξ ∈ β.)
(ii) Prove that a set is countable iff it is equipollent to a subset of ω. (Use (i) to show that every subset of ω is countable.)

§ 6. Transfinite recursion

6.1. Preview
In this section we validate a powerful method of defining functions on W (that is, having W as domain). Roughly speaking, Fξ, the value of the function F at ξ, is defined in terms of the 'behaviour' of F for all ordinals smaller than ξ.

6.2. Convention
Throughout this section we let C be a fixed but arbitrary function such that dom C is the class of all sets.

6.3. Definition
We shall write 'R_C(F, α)' as short for the statement:
F is a function and α' ⊆ dom F and Fξ = C(F↾ξ) for all ξ ≤ α.
The equation 'Fξ = C(F↾ξ)' is called an ordinal recursion equation.

6.4. Remarks
(i) Recall that α' = {ξ : ξ ≤ α}.
(ii) Note that F↾ξ = {⟨η, Fη⟩ : η ∈ ξ}. Therefore the recursion equation determines Fξ in terms of the 'previous behaviour' of F - the restriction of F to the set of all ordinals η < ξ. Note also that even if F is a proper class, F↾ξ is always a set by AR and Thm. 2.2.8.
(iii) R_C(F, α) means that F is defined and satisfies the recursion equation for all ordinals up to α inclusive. Hence
R_C(F, α) ⇒ R_C(F, β) for all β ≤ α.

6.5. Lemma
If both R_C(F, α) and R_C(G, α) then Fξ = Gξ for all ξ ≤ α.

PROOF

By (strong) transfinite induction, restricted to α'. Let ξ be any ordinal ≤ α (that is, ξ < α') and assume, as induction hypothesis, that Fη = Gη for all η < ξ - that is, for all η ∈ ξ. This means that F↾ξ = G↾ξ, hence C(F↾ξ) = C(G↾ξ). It now follows from R_C(F, α) and R_C(G, α) that Fξ = Gξ.

■

6.6. Lemma
For any ordinal α there exists a unique function f_α such that dom f_α = α' = {ξ : ξ ≤ α} and such that R_C(f_α, α).

PROOF

Uniqueness follows from Lemma 6.5. We prove existence by strong transfinite induction. Assume as induction hypothesis that for each β < α there exists a (necessarily unique) function f_β whose domain is β' = {ξ : ξ ≤ β} such that R_C(f_β, β).
If γ ≤ β < α then by Rem. 6.4(iii) we have R_C(f_β, γ) and hence by Lemma 6.5 f_β(ξ) = f_γ(ξ) for all ξ ≤ γ. This means that f_β and f_γ agree wherever both of them are defined; in fact, it is easy to see that f_γ ⊆ f_β. By Prob. 2.4.8, we can therefore glue all the f_β together to obtain a single function: we put
f = ⋃{f_β : β < α}.

Clearly, f is a function whose domain is {β : β < α} - that is, α itself - and it satisfies the recursion equation fβ = C(f↾β) for all β < α. Finally, we extend f to a function defined for all β ≤ α:
f_α = f ∪ {⟨α, C(f)⟩}.

Then dom f_α = α'. Also, f = f_α↾α and hence f_α(α) = C(f) = C(f_α↾α). Thus f_α satisfies the required recursion equation for all β ≤ α.

■

6.7. Theorem (Definition by transfinite recursion)
We can define a (necessarily unique) function F such that dom F = W and such that Fξ = C(F↾ξ) for all ξ ∈ W.

PROOF

To define F, note that the f_α of Lemma 6.6 satisfy the recursion equation wherever they are defined, and any two of them agree with each other wherever both are defined. Therefore all we have to do is glue them together:
F =df ⋃{f_α : α ∈ W}.

It is easy to see that indeed dom F = W and Fξ = C(F↾ξ) for every ξ ∈ W. Moreover, these two conditions fulfilled by F imply that R_C(F, α) for all α; hence F is unique by Lemma 6.5.

■
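Restricted to finite ordinals, the construction of Lemma 6.6 is an ordinary loop. The sketch below is an illustration only: finite ordinals are represented by Python integers, a function with domain α' is represented by a dict, and the names `recurse` and `C` (the sample rule) are hypothetical.

```python
def recurse(C, alpha):
    """Return f_alpha as a dict with domain {0, ..., alpha}, built so that
    f[xi] == C(f restricted to xi) for every xi <= alpha."""
    f = {}
    for xi in range(alpha + 1):
        restriction = {eta: f[eta] for eta in range(xi)}   # F↾ξ, as a copy
        f[xi] = C(restriction)
    return f

def C(restriction):
    # Sample rule (an assumption of this example): one more than the sum of
    # all previous values. This yields the powers of 2.
    return sum(restriction.values()) + 1

f5 = recurse(C, 5)
f3 = recurse(C, 3)
# Lemma 6.5 in miniature: the two functions agree wherever both are defined.
agree = all(f3[xi] == f5[xi] for xi in f3)
```

Because each value depends on the whole restriction rather than only on the immediate predecessor, the same pattern covers course-of-values definitions; what the finite loop cannot show is the limit step, which the theorem handles by gluing the f_α together.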

6.8. Remarks
(i) Note the phrasing of Thm. 6.7: it does not claim that such-and-such an F exists but that we can define it. To say, in set theory, that F 'exists' would mean that it is an object of the theory - which is false, since F is a proper class. In fact, Thm. 6.7 is not a single theorem of set theory, but a meta-theorem or a theorem scheme which shows how, for any given class C fulfilling a certain condition (Convention 6.2), we can define a class F fulfilling certain other conditions. The same applies to any other theorem, postulate and definition in which general statements or stipulations are made concerning classes - for example Def. 1.3.4 and Ax. 1.3.6 (AS): they are not individual statements of set theory, but schemes. (Compare Rem. 3.3.6.)
(ii) From Thm. 6.7 (or directly from Lemma 6.6) it is easy to obtain a version of definition by transfinite recursion restricted to any given ordinal α, in which dom F is α instead of W and the recursion equation Fξ = C(F↾ξ) is satisfied for all ξ < α.

5 The axiom of choice

§ 1. From the axiom of choice to the well-ordering theorem

1.1. Definition
A choice function on a class 𝒮 of sets is a function g with dom g = 𝒮, such that gX ∈ X for every X ∈ 𝒮.

1.2. The axiom of choice (AC) states:
If 𝒮 is a set of non-empty sets then there exists a choice function on 𝒮.
1.3. Remarks
(i) AC was the first postulate of set theory (apart from PX) to be stated as such. Its first known explicit formulation is due to Giuseppe Peano (1890), who however rejected it as untenable. It was first proposed as a new valid mathematical principle by Beppo Levi in 1902, although it had been used inadvertently by Cantor and others long before that. Zermelo, who was told about AC by Erhard Schmidt, used it almost at once in his first (1904) proof of the Well-Ordering Theorem (WOT, Cor. 1.6 below), a result that had been conjectured by Cantor. Our formulation of AC is essentially that used by Zermelo in his 1904 paper.
(ii) In his 1908 paper on the foundations of set theory, in which the theory is given its first fully fledged axiomatic presentation, Zermelo does not state AC in this form but in a more restricted version. He assumes that 𝒮 is a set of non-empty sets that are pairwise disjoint - that is, X ∩ Y = ∅ for any two distinct members X and Y of 𝒮 (see Def. 3.4.1). He then postulates the existence of a set A such that, for any X ∈ 𝒮, the intersection A ∩ X has exactly one member.
This restricted version follows at once from AC. Indeed, if 𝒮 is a set of non-empty pairwise disjoint sets, then by AC there exists a choice function g on 𝒮. It is then easy to see that, for any X ∈ 𝒮, ran g ∩ X = {gX}.
Conversely, AC in the form we have stated it follows from the restricted version. To show this, let 𝒮 be any set of non-empty sets. Put

𝒢 = {{X} × X : X ∈ 𝒮}.

It is easy to verify that 𝒢 is a set of non-empty and pairwise disjoint sets. According to the restricted version, there exists a set A whose intersection with each member of 𝒢 is a singleton. We now define a function g on 𝒮 as follows. For any X ∈ 𝒮, the set {X} × X belongs to 𝒢 and hence its intersection with A has exactly one member. This member must be of the form ⟨X, x₀⟩, where x₀ is some member of X. We put gX = x₀. Then g is a choice function on 𝒮.
(iii) Using AC, Def. 3.4.11 is easily legitimized. If |A_x| = |B_x| for each x ∈ X, then by AC there exists a family f = {f_x | x ∈ X} such that, for each x, f_x is a bijection from {x} × A_x to {x} × B_x. Then it is easy to see that ⋃ran f is a bijection from ⋃{{x} × A_x : x ∈ X} to ⋃{{x} × B_x : x ∈ X}. A similar argument applies to Def. 3.5.11.
(iv) AC has been regarded with suspicion because it is a purely existential postulate. It asserts the existence of a set - a choice function - without characterizing it as the extension of some previously specified property. In other words, AC is not a special case of the Principle of Comprehension. In this respect AC is markedly different from all other existential postulates of set theory. For example, the Power-set Axiom asserts that, for each set A, there exists the power-set PA, which is characterized as the extension of the property being a subset of A.
(v) In 1938 Gödel proved that AC is consistent relative to the other, commonly accepted, postulates of set theory, in the sense that if they are consistent, then the addition of AC does not result in inconsistency. In 1963 P. J. Cohen proved that the same holds also for the negation of AC.
(vi) AC has some weird (counter-intuitive) consequences. However, its negation has even weirder ones: for example, the direct product of a family of non-empty sets may well be empty. Note also that the finite version of AC - in which the set 𝒮 is assumed to be finite - can be deduced from the remaining postulates of ZF. Thus AC is only needed as an additional postulate for the case where 𝒮 is infinite. It therefore appears as a natural extension to the infinite case of a principle that must in any case be accepted in the finite case.
(vii) Most mathematicians regard AC as indispensable: without it, many results in modern mathematics as well as in set theory itself would be unprovable. However, in view of its somewhat controversial status, when AC is needed for proving a mathematical result, it is customary to point this out.

1.4. Preview Starting from AC, we shall prove a chain of other major principles, all of which tum out to be equivalent to each other and to AC. The first of these principles, which is also the most important, is a corollary of the following theorem.
1.5. Theorem Every set is equipollent to an ordinal.
PROOF
Let A be a set, and let 𝒮 be the set PA − {∅} of all non-empty subsets of A. By AC there exists a choice function g on 𝒮. Since A is a set, it cannot be the universal class (Thm. 1.3.10); so there exists an object b that does not belong to A.
We now define a function C whose domain is the class of all sets, as follows: for any set x we put
\[
Cx = \begin{cases} g(A - \operatorname{ran} x) & \text{if } x \text{ is a map such that } \operatorname{ran} x \subset A, \\ b & \text{otherwise.} \end{cases} \tag{$*$}
\]
Using transfinite recursion (Thm. 4.6.7), we get a function F with W as domain, satisfying the recursion equation Fξ = C(F↾ξ) for all ξ ∈ W. Combining this equation with (*), we obtain for all ξ:
\[
F\xi = \begin{cases} g(A - \operatorname{ran}(F \restriction \xi)) & \text{if } \operatorname{ran}(F \restriction \xi) \subset A, \\ b & \text{otherwise.} \end{cases}
\]

Let ξ be any ordinal such that Fξ ≠ b. This means that F↾ξ must be a map from ξ to A, and
\[
F\xi = g(A - \operatorname{ran}(F \restriction \xi)) \in A - \operatorname{ran}(F \restriction \xi).
\]
Thus Fξ is a 'fresh' member of A, different from Fη for all η < ξ. (What happens is that so long as A is not exhausted by previous values of F, the new value Fξ is chosen, using the choice function g, as a fresh member of A.)
If Fξ ≠ b for all ordinals ξ, it would follow that F is an injection from the proper class W (Cor. 4.2.18) to the set A. This is impossible by Prob. 2.4.5. So there must exist some ordinals ξ for which Fξ = b.
Let α be the least ordinal such that Fα = b. Such an α exists by the Least Ordinal Principle (Thm. 4.4.1). Then it is easy to see that F↾α is an injection from α - that is, from the set {ξ : ξ < α} - to A. Also, ran(F↾α) cannot be a proper subset of A. Thus F↾α is in fact a bijection from α to A.

■
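The recursion in this proof can be imitated for a finite set, where no appeal to AC is needed. The following is only an illustrative sketch: the names `enumerate_by_choice` and `choose` are hypothetical, `choose` plays the role of the choice function g, and a private sentinel object plays the role of b.

```python
def enumerate_by_choice(A, choose):
    """Finite analogue of F(xi) = g(A - ran(F restricted to xi)).

    A      : a finite set
    choose : a hypothetical choice function; given a non-empty subset
             of A it must return one of that subset's members
    """
    b = object()          # stands in for the object b that is not in A
    values = []           # values[k] plays the role of F(k)
    while True:
        unused = set(A) - set(values)          # A - ran(F restricted to k)
        nxt = choose(unused) if unused else b  # the two cases of the recursion
        if nxt is b:
            # len(values) is the least alpha with F(alpha) = b
            return values
        values.append(nxt)


# Two different choice functions give two different enumerations.
print(enumerate_by_choice({3, 1, 2}, min))  # [1, 2, 3]
print(enumerate_by_choice({3, 1, 2}, max))  # [3, 2, 1]
```

The returned list is a bijection from {0, ..., n − 1} to A, which is the finite counterpart of the bijection F↾α.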

1.6. Corollary (Well-Ordering Theorem) For every set A there exists a well-ordering on A.
PROOF
By Thm. 1.5, there exists a bijection F from an ordinal α to A. Now put
\[
{<} =_{\mathrm{df}} \{(F\xi, F\eta) : \xi < \eta < \alpha\}.
\]
This means that for any members x and y of A, x < y iff ξ < η, where ξ and η are the (necessarily unique) ordinals < α such that x = Fξ and y = Fη. Clearly, < is a well-ordering on A.

■
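For a finite set the construction of < from a bijection can be spelled out directly. This is a sketch under the assumption that the bijection is given as a Python list without repetitions; `induced_less` and `least_member` are hypothetical helper names.

```python
def induced_less(enumeration):
    """Return the relation x < y iff x is listed before y.

    enumeration : a list without repetitions, read as a bijection
                  from {0, ..., n-1} to the set A of its entries
    """
    position = {x: k for k, x in enumerate(enumeration)}  # inverse of F
    return lambda x, y: position[x] < position[y]


def least_member(subset, less):
    """Least member of a non-empty subset with respect to `less`."""
    return next(x for x in subset if all(x == y or less(x, y) for y in subset))


less = induced_less(["c", "a", "b"])   # F0 = c, F1 = a, F2 = b
print(least_member({"a", "b"}, less))  # a
```

Every non-empty subset has a least member because its members inherit the order of their positions in the list.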

1.7. Remarks
(i) With F, α and < as above, F↾α is a similarity map from α to the well-ordered set (A, <).
(ii) Thm. 1.5 and Cor. 1.6 are equivalent to each other. Indeed, the former can easily be deduced from the latter using the Representation Theorem 4.5.12. We shall therefore refer to both Thm. 1.5 and Cor. 1.6 as the WOT.
Another important consequence of Thm. 1.5 is that the class of cardinals is totally ordered (see Def. 2.3.11(ii)):

1.8. Corollary For any sets A and B, |A| ≤ |B| or |B| ≤ |A|.

PROOF

By Thm. 1.5, A and B are equipollent to ordinals, say α and β respectively. Since the class of ordinals is ∈-well-ordered, it follows (see Lemma 4.2.4) that α ∈ β or α = β or β ∈ α. But ordinals are transitive sets, hence α ⊆ β or β ⊆ α.

■

§ 2. From the WOT via Zorn's Lemma back to AC
We start by proving two simple lemmas about finite sets, which do not depend on AC.

2.1. Lemma
If B ⊂ A and A is equipollent to a finite ordinal α, then B is equipollent to an ordinal β < α. Hence every subset of a finite set is finite.

PROOF

Let B ⊂ A, where A is equipollent to a finite ordinal α. Then B is clearly equipollent to some D ⊂ α. By Prob. 4.5.14(i), D is similar - and hence equipollent - to some ordinal β ≤ α. However, since here α is finite, Thm. 4.3.12 excludes the possibility that β = α. Therefore β < α.

■

2.2. Lemma If f is a map such that dom f is finite then ran f is finite as well.
PROOF
By Def. 4.3.15, dom f is equipollent to a finite ordinal α. Without loss of generality we may therefore assume that dom f is α itself. (Otherwise, replace f by f ∘ h, where h is a bijection from α to dom f.) Define a map g from ran f to α by putting, for each x ∈ ran f,
\[
gx =_{\mathrm{df}} \text{the least } \xi \in \alpha \text{ such that } f\xi = x.
\]

It is easy to see that g is injective, hence it is a bijection from ran f to some subset D of α. By Lemma 2.1, D is finite; therefore so is ran f.
■
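The map g of this proof is easy to compute when f is given explicitly. In the sketch below a map on the finite ordinal n is represented, as an assumption of the example, by a Python list of length n; the name `least_preimage_map` is hypothetical.

```python
def least_preimage_map(f):
    """For a map f on {0, ..., n-1}, given as a list, return g as a dict:
    g[x] is the least k such that f[k] == x, for each x in ran f."""
    g = {}
    for k, value in enumerate(f):
        g.setdefault(value, k)   # keep only the first (least) index
    return g


f = ["a", "b", "a", "c", "b"]
g = least_preimage_map(f)
print(g)  # {'a': 0, 'b': 1, 'c': 3}
```

Distinct members of ran f receive distinct indices, so g is an injection from ran f into the domain, as the proof requires.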

Next, we lay down a few definitions.

2.3. Definition
Let < be a partial order on a class A. A member a of A is said to be maximal in A with respect to < if there is no x ∈ A such that a < x.

2.4. Remarks
(i) When there is no risk of confusion, we shall omit the phrase 'in A
with respect to <'.
(ii) In general, A may not have a maximal member; or it may have more than one.
(iii) Do not confuse maximal with greatest. However, if < is a total order on A and a is maximal in A then a is also the greatest member of A, in the sense that x < a for any other x ∈ A. In this case it is clear that A cannot have more than one maximal member.

2.5. Definition If 𝒜 is any class of sets, we put
\[
{\subset_{\mathcal{A}}} =_{\mathrm{df}} \{(X, Y) \in \mathcal{A}^2 : X \subset Y\}.
\]
⊂_𝒜 is called the restriction of ⊂ to 𝒜.

2.6. Remarks
(i) We can also characterize the relation ⊂_𝒜 by saying that, for any X and Y, X ⊂_𝒜 Y ⇔ X ∈ 𝒜 and Y ∈ 𝒜 and X ⊂ Y.
(ii) As noted in Ex. 2.3.8, if 𝒜 is any class of sets, ⊂_𝒜 is a [sharp] partial order on 𝒜.

2.7. Definition A class 𝒜 of sets is of finite character if, for any set X,
X ∈ 𝒜 ⇔ Y ∈ 𝒜 for every finite Y ⊆ X.
We shall use the WOT to prove the following useful result.

2.8. Theorem (Tukey-Teichmüller Lemma) If 𝒜 is a set of finite character, then for every A ∈ 𝒜 there exists an M ∈ 𝒜 such that A ⊆ M and M is maximal in 𝒜 w.r.t. ⊂_𝒜.
PROOF
By the WOT, 𝒜 is equipollent to some ordinal α. Let G be a bijection from α to 𝒜. Thus
\[
\mathcal{A} = \{G\xi : \xi < \alpha\}.
\]
Take any A ∈ 𝒜; we shall hold A fixed for the rest of the proof. Without loss of generality, we may assume that A = G0 - otherwise, we could compose G with the bijection from 𝒜 to itself that interchanges A with G0 and leaves all other members of 𝒜 alone.
Using transfinite recursion restricted to α (see Rem. 4.6.8(ii)), we define a map F on α such that, for every ξ < α,
\[
F\xi = \begin{cases} G\xi & \text{if } \bigcup\{F\eta : \eta < \xi\} \subseteq G\xi, \\ \bigcup\{F\eta : \eta < \xi\} & \text{otherwise.} \end{cases}
\]
(Note that {Fη : η < ξ} = ran(F↾ξ), so that here Fξ is indeed being determined in terms of F↾ξ, as required in transfinite recursion.)
It is clear that F is monotone in the sense that whenever η ≤ ξ < α then Fη ⊆ Fξ.
We claim that Fξ ∈ 𝒜 for every ξ < α. We shall prove this claim by strong transfinite induction restricted to α. Let ξ < α; our induction hypothesis is that Fη ∈ 𝒜 for every η < ξ.
Now, Fξ is Gξ or ⋃{Fη : η < ξ}. Since certainly Gξ ∈ 𝒜, we need only prove that the union ⋃{Fη : η < ξ} belongs to 𝒜. But 𝒜 is a set of finite character. So it is enough to show that every finite subset of ⋃{Fη : η < ξ} belongs to 𝒜. We need only deal with non-empty subsets, since ∅ is a finite subset of A, and as such must in any case belong to 𝒜.
Let B be a non-empty finite subset of ⋃{Fη : η < ξ}. Then for each b ∈ B there exists some η < ξ such that b ∈ Fη. Define a map f from B to ξ by putting, for each b ∈ B,
\[
fb =_{\mathrm{df}} \text{the least } \eta < \xi \text{ such that } b \in F\eta.
\]
By Lemma 2.2, ran f is a finite non-empty set of ordinals < ξ. Hence by Prob. 4.3.17(ii) ran f has a greatest member, say η*. This means that for every b ∈ B we have fb ≤ η*; and, since F is monotone, it follows that F(fb) ⊆ F(η*). But by the definition of f we have b ∈ F(fb); hence
\[
b \in F(fb) \subseteq F(\eta^*) \quad \text{for every } b \in B.
\]
Thus B ⊆ F(η*). But η* < ξ, so by our induction hypothesis F(η*) belongs to 𝒜; and since 𝒜 is of finite character B, as a finite subset of F(η*), must also belong to 𝒜. This completes the proof that Fξ ∈ 𝒜 for every ξ < α.
We now put M = ⋃{Fη : η < α}. We shall show that M has the properties claimed by our theorem. The fact that M ∈ 𝒜 is proved by showing, exactly as before, that every finite subset of M belongs to 𝒜.
Also, it is easy to see that F0 = G0 = A, hence A ⊆ M.
It remains to show that M is maximal w.r.t. ⊂_𝒜. Suppose this were not so. Then there would be some X ∈ 𝒜 such that M ⊂ X. Now, X must be Gξ for some ξ < α, so the assumption M ⊂ X means that ⋃{Fη : η < α} ⊂ Gξ. Hence, a fortiori,
\[
\bigcup\{F\eta : \eta < \xi\} \subseteq G\xi.
\]
But in this case the definition of F says that Fξ = Gξ. It would then follow that ⋃{Fη : η < α} ⊂ Fξ - which is impossible, since Fξ is itself one of the sets whose union is being taken.

■
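When 𝒜 is finite, the recursion defining F becomes a simple loop over an enumeration of 𝒜. The sketch below is an illustration only: the function name `tt_maximal` and the sample family (the subsets of {1, 2, 3} that do not contain both 1 and 2) are assumptions of the example, and Python lists stand in for the bijection G.

```python
from itertools import combinations


def tt_maximal(G):
    """Finite analogue of the recursion in the proof of Thm. 2.8.

    G : list of frozensets enumerating a family of finite character,
        with G[0] the member A that is to be extended
    """
    F = []
    for member in G:
        earlier = frozenset().union(*F)   # union of all previous values of F
        # F(xi) = G(xi) if the earlier union is included in G(xi); else that union
        F.append(member if earlier <= member else earlier)
    return frozenset().union(*F)          # M, the union of all values of F


# Hypothetical family: subsets of {1, 2, 3} not containing both 1 and 2.
family = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)
          if not {1, 2} <= set(c)]
A = frozenset({3})
G = [A] + [X for X in family if X != A]   # enumeration starting with A
M = tt_maximal(G)
print(sorted(M))  # [1, 3]
```

A different enumeration G may lead to a different maximal member, for instance {2, 3}; the lemma only asserts that some maximal member including A exists.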

2.9. Definition Let (A, <) be a poset. A chain in (A, <) is any subset C of A such that, for all x and y in C, x < y or x = y or y < x.
2.10. Remark In other words, a chain in (A, <) is a subset of A that is totally ordered by the restriction of < to it.
We shall use the Tukey-Teichmüller (TT) Lemma to prove:

2.11. Theorem (Hausdorff Maximality Principle)
Let (A, <) be a poset and let 𝒞 be the set of all chains in (A, <). Then every member of 𝒞 is included in some member of 𝒞 that is maximal w.r.t. ⊂_𝒞.

PROOF

The condition for C being a chain in (A, <) (see Def. 2.9) involves only two members of C at a time. Hence it is easy to see that the set 𝒞 of all chains is of finite character. Therefore the TT Lemma applies to 𝒞.

■
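The claim that the set of all chains is of finite character can be checked mechanically on a small poset. The sketch below assumes a finite universe, so that only subsets of the universe need to be tested (any other set has a singleton subset outside the family). The helper names and the divisibility example are hypothetical.

```python
from itertools import combinations


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_chain(C, less):
    """Def. 2.9: any two distinct members of C are comparable."""
    return all(less(x, y) or less(y, x) for x, y in combinations(C, 2))


def has_finite_character(family, universe):
    """Check Def. 2.7 for a family of subsets of a finite universe."""
    family = set(family)
    return all((X in family) == all(Y in family for Y in subsets(X))
               for X in subsets(universe))


def divides(x, y):
    """Strict divisibility, a sharp partial order on positive integers."""
    return x != y and y % x == 0


universe = {1, 2, 3, 6}
chains = [C for C in subsets(universe) if is_chain(C, divides)]
print(has_finite_character(chains, universe))                # True
print(has_finite_character([frozenset({1, 2})], universe))   # False
```

The second family fails the test because it contains {1, 2} but not, for example, the empty set.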

The most famous and frequently used of all the maximality principles that are equivalent to AC is generally known as 'Zorn's Lemma' although it is arguably due to Kuratowski, who published a version of it in 1922, thirteen years before Zorn. We shall now deduce it from the Hausdorff Maximality Principle (HMP). (For the meaning of upper bound, see Def. 4.2.23.)

2.12. Theorem (Zorn's Lemma)
Let (A, <) be a poset such that every chain in it has an upper bound in A. Then for each a ∈ A there is some u ∈ A such that u is maximal in A w.r.t. < and such that a ≤ u.

PROOF

As before, let 𝒞 be the set of all chains in (A, <), and consider the poset consisting of 𝒞 with the partial order ⊂_𝒞 on it.
The singleton {a} is, trivially, a chain in (A, <). Hence by the HMP {a} is included in a chain C that is maximal in 𝒞 w.r.t. ⊂_𝒞. By hypothesis, C has an upper bound u in A. Since a ∈ C, it follows that a ≤ u.
It remains to show that u is maximal in A. Suppose it were not maximal. Then there would exist some v ∈ A such that u < v. Since u is an upper bound for C, it would follow that x < v for all x ∈ C. But then C ∪ {v} would be a chain that properly includes C - contradicting the maximality of C in 𝒞.

■
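In a finite poset the two steps of this proof (extend {a} to a maximal chain, then take an upper bound of that chain) can be carried out by a direct search. The following sketch assumes the poset is given as a Python list together with a predicate for the sharp order; `maximal_above` and the divisibility example are hypothetical.

```python
def maximal_above(A, less, a):
    """Finite analogue of the proof of Zorn's Lemma.

    A    : list of the members of a finite poset
    less : the sharp partial order, as a two-argument predicate
    a    : a member of A
    Returns (chain, u): a maximal chain containing a, and its greatest member.
    """
    chain = [a]
    for x in A:
        # add x whenever it is comparable with everything chosen so far
        if x not in chain and all(less(x, c) or less(c, x) for c in chain):
            chain.append(x)
    # a finite non-empty chain has a greatest member, which bounds it above
    u = next(c for c in chain if all(c == d or less(d, c) for d in chain))
    return chain, u


def divides(x, y):
    """Strict divisibility on positive integers."""
    return x != y and y % x == 0


A = [1, 2, 3, 4, 6]
chain, u = maximal_above(A, divides, 2)
print(sorted(chain), u)  # [1, 2, 4] 4
```

One pass suffices because an element rejected at some stage is incomparable with a member that stays in the chain. Here 6 is also maximal above 2; which maximal member is found depends on the order in which A is listed.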

We have shown that AC => WOT => TT Lemma => HMP => Zorn's Lemma.
Now we shall complete the cycle:

2.13. Theorem
AC follows from Zorn's Lemma.

PROOF
Let 𝒮 be a set of non-empty sets. We must show that there exists a choice function on 𝒮.
If 𝒮 is empty then ∅ is the required choice function. So from now on we may assume that 𝒮 is non-empty.
Let us say that f is a partial choice function (pcf), if f is a choice function on a subset of 𝒮. Such creatures do exist: for example, if A is any member of 𝒮 and a is any member of A then {(A, a)} is a choice function on {A} and hence a pcf. Let 𝒢 be the set of all pcfs. (It is easy to verify that 𝒢 is indeed a set; DIY.) As we have just seen, 𝒢 is non-empty.
We now consider the poset (𝒢, ⊂_𝒢). Note that if f and g are pcfs, then f ⊂ g means that dom f ⊂ dom g and fX = gX for each X ∈ dom f.
We shall show that (𝒢, ⊂_𝒢) satisfies the condition of Zorn's Lemma. To this end, let us consider any chain 𝒞 in this poset. We claim that its union, ⋃𝒞, is an upper bound for 𝒞 in 𝒢.
For any f ∈ 𝒞 we obviously have f ⊆ ⋃𝒞. So it only remains to show that ⋃𝒞 belongs to 𝒢; in other words, that ⋃𝒞 is a pcf.
Since every member of 𝒞, being a pcf, is a set of ordered pairs (X, x) such that x ∈ X ∈ 𝒮, it is clear that ⋃𝒞 likewise is a set of ordered pairs of this kind. It only remains to show that ⋃𝒞 is a function.
Now, if both f and g are members of 𝒞 then, since 𝒞 is a chain, we must have f ⊆ g or g ⊆ f. Therefore if X ∈ dom f ∩ dom g then fX = gX. Thus the coherence condition is fulfilled, showing that ⋃𝒞 is indeed a function (see Prob. 2.4.8).
We can now apply Zorn's Lemma to the poset (𝒢, ⊂_𝒢). Since 𝒢 is non-empty, it follows from the Lemma that there exists some g ∈ 𝒢 that is maximal w.r.t. ⊂_𝒢. Such g is a pcf - a choice function on a subset of 𝒮. However, if dom g were not the whole of 𝒮, we could take any A ∈ 𝒮 − dom g and any a ∈ A, and put
\[
f = g \cup \{(A, a)\}.
\]
Then f would be a pcf such that g ⊂ f, contradicting the maximality of g. Therefore g must be a choice function on the whole of 𝒮.

■
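The extension step at the end of this proof can be repeated until no set is left over when the family is finite, which is why the finite case needs no appeal to Zorn's Lemma. The sketch below assumes the family is a list of non-empty frozensets and a pcf is a dict; `extend_pcf` is a hypothetical name.

```python
def extend_pcf(S, g):
    """Extend a partial choice function g to a choice function on all of S.

    S : a finite collection of non-empty frozensets
    g : dict mapping some members X of S to a member of X
    """
    g = dict(g)                    # work on a copy
    for A in S:
        if A not in g:             # A is in S - dom g
            a = next(iter(A))      # any member a of A will do
            g[A] = a               # corresponds to f = g ∪ {(A, a)}
    return g


S = [frozenset({1, 2}), frozenset({2, 3}), frozenset({4})]
partial = {frozenset({1, 2}): 2}
full = extend_pcf(S, partial)
```

Each pass through the loop properly extends the current pcf, so the result is maximal precisely when its domain is the whole of S.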