This is an introduction to set theory and logic that starts completely from scratch. The text is accompanied by many methodological remarks and explanations. A rigorous axiomatic presentation of Zermelo-Fraenkel set theory is given, demonstrating how the basic concepts of mathematics have apparently been reduced to set theory. This is followed by a presentation of propositional and first-order logic. Concepts and results of recursion theory are explained in intuitive terms, and the author proves and explains the limitative results of Skolem, Tarski, Church and Gödel (the celebrated incompleteness theorems). For students of mathematics or philosophy this book provides an excellent introduction to logic and set theory.

Cover design by Chris McLeod

CAMBRIDGE UNIVERSITY PRESS

Set Theory, Logic and their Limitations

Moshe Machover
King's College London

Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1996
First published 1996
A catalogue record for this book is available from the British Library
Library of Congress cataloguing in publication data available
ISBN 0 521 47493 0 hardback
ISBN 0 521 47998 3 paperback
Transferred to digital printing 2003

Contents

Preface
0 Mathematical induction
1 Sets and classes
2 Relations and functions
3 Cardinals
4 Ordinals
5 The axiom of choice
6 Finite cardinals and alephs
7 Propositional logic
8 First-order logic
9 Facts from recursion theory
10 Limitative results
Appendix: Skolem's Paradox
Author index
General index

Preface

This is an edited version of lecture notes distributed to students in two of my courses, one on set theory, the other on
quantification theory and limitative results of mathematical logic. These courses are designed primarily for philosophy undergraduates at the University of London who bravely choose the Symbolic Logic paper as one of their Finals options. They are also offered to mathematics undergraduates at King's College, London.

This then is a discourse addressed by a mathematician to an audience with a keen interest in philosophy. The style of technical presentation is mathematical. In particular, in logical notation and terminology I generally conform to the usage of mathematicians. (It seems that in this matter philosophers in any case tend to follow suit after some delay.) But philosophical and methodological issues are often highlighted instead of being glossed over, as is quite common in texts addressed primarily to students of mathematics.

A naive presentation of set theory may be in order if the main aim is instrumental: to acquaint would-be practitioners of mathematics with the basic tools of their chosen trade and to inculcate in them methods whereby nowadays the entire science is apparently reduced to set theory. In a course of that kind, the student is understandably not encouraged to scratch where it does not itch. But in the present course such an attitude would be out of place. To be sure, here as well set-theoretic concepts and results are needed as tools for formulating and proving results in mathematical logic. But it would be perverse not to alert would-be philosophers to the problematic aspects of set-theoretic reductionism.

These considerations have largely dictated the presentation of set theory: axiomatic, albeit unformalized. Critical notes about set-theoretic reductionism are sounded from time to time as a leitmotiv, rounded off in a coda on Skolem's Paradox.
Also, the technical exposition of set theory is accompanied by historical remarks, mainly because a historical perspective is needed in order to appreciate the emergence of reductionism and the anti-reductionist critique.

In the exposition of mathematical logic, I have drawn heavily on Chs. 1, 2, 3 and 7 of B&M (see Note below), which I had used for many years as a main text for a postgraduate logic course. However, considerable portions of the material presented in B&M had to be omitted, either because they are too hard or specialized, or simply for lack of time.

My greatest regret is that there is not enough time to include both linear and rule-based logical calculi (my own favourite is the tableau method). For certain technical reasons I had to sacrifice the latter. However, as partial compensation, the linear calculi are developed in a way that makes it clear that the logical axioms are mere stepping-stones towards rules of deduction: once these rules are established, the axioms can be shelved. Thus in practice the presentation comes quite close to being rule-based. The axiom schemes have been designed so as to make their connection with deduction rules quite direct and transparent. (The connoisseur will note that the propositional axiom schemes have been chosen so that omitting one, two or three of them results in complete systems for intuitionistic implication and negation, classical implication, and intuitionistic implication. In particular, the only axiom scheme that is not intuitionistically valid is a purely implicational one.)

Propositional logic is studied with reference to a purely propositional language, rather than a first-order language as in B&M. This is done for didactic reasons: although propositional languages in themselves are of little interest, students are less intimidated by this approach.

For some tedious proofs that have been omitted, the reader is referred to B&M.
These omissions are more than balanced by the addition of extensive methodological and explanatory comments. A case in point is Lemma 10.10.12 (see Note below), which is the main technical result needed for the present version of the Gödel-Rosser First Incompleteness Theorem. I have omitted its proof, but added a detailed analysis of the meaning of the lemma and the reason why its proof works. When this is understood, the proof itself becomes a mere technicality, almost a foregone conclusion. The analysis is resumed after the proof of the Gödel-Rosser Theorem, to explain the meaning of the Gödel-Rosser sentence and the reason for its remarkable behaviour.

One major respect in which this course is not self-contained is its heavy borrowing from recursion theory. For further details, see Preview at the beginning of Ch. 9.

The Problems are an essential part of the text; the results contained in many of them are used later on.

Moshe Machover

Note

• Throughout 'B&M' refers to J. L. Bell and M. Machover, A course in mathematical logic, North-Holland, 1977 (second printing 1986).
• The system of cross-references used here is quite common in mathematical texts. It is illustrated by the following example. 'Def. 2.3.4' refers to the fourth numbered article (which in this case is a definition) in § 3 of Ch. 2. Within Ch. 2, this definition is referred to, more briefly, as 'Def. 3.4'.
• I would like to express my gratitude to Roger Astley, Michael Behrend and Tony Tomlinson of Cambridge University Press for their expert help in preparing the manuscript.

Warning

In the last three chapters of this book there is a systematic interplay between parallel sets of symbols; one set consisting of symbols in ordinary (feint) typeface: '$=$', '$\neg$', '$\vee$', '$\wedge$', '$\rightarrow$', '$\exists$', '$\forall$', '$\times$', '$+$' and the other of their bold-face counterparts: '$\boldsymbol{=}$', '$\boldsymbol{\neg}$', '$\boldsymbol{\vee}$', '$\boldsymbol{\wedge}$', '$\boldsymbol{\rightarrow}$', '$\boldsymbol{\exists}$', '$\boldsymbol{\forall}$', '$\boldsymbol{\times}$', '$\boldsymbol{+}$'.
For explanations of the purpose of this system of notation, and warnings against confusing a feint symbol with its bold-face counterpart, see Warnings 8.1.2, 9.1.4 and 10.1.11 and Rem. 10.1.10. Unfortunately the bold-face characters could not always be made as distinct from their feint counterparts as would be desirable. The reader is therefore urged to exercise special vigilance to discern which typeface is being used in each instance.

0 Mathematical induction

§ 1. Intuitive illustration; preliminaries

A familiar trick: dominoes standing on end are arranged in a row; then the initial domino (here labelled '0') is given a gentle push - and the whole row comes cascading down.

[Figure: a row of dominoes labelled 0, 1, 2, ..., n, n + 1, ...]

If you want to perform this trick, how can you make sure that all the dominoes standing in a row will fall? Clearly, the following two conditions are jointly sufficient.

1. The initial domino (domino 0) is made to fall to the right (for example, by giving it a push).
2. The dominoes are arranged in such a way that whenever any one of them (say domino n) falls to the right, it brings down the next domino after it (domino n + 1) and causes it also to fall to the right.

A moment's reflection shows that these two conditions are sufficient whether the row of dominoes is finite or proceeds ad infinitum. (In the former case, Condition 2 does not apply to the last domino.)

The reasoning that allows us to infer from Conditions 1 and 2 that all the dominoes will fall is based on the Principle of Mathematical (or Complete) Induction. This is a fundamental - arguably the most fundamental - fact about the so-called natural numbers (0, 1, 2, etc.). It has several equivalent forms, three of which will be presented here.

WARNING
The term 'induction' used here has nothing to do with inductive reasoning in the empirical sense.

We shall make use of the following terminology and notation. By number we shall mean natural number. The class $\{0, 1, 2, \ldots\}$ of all numbers will be denoted by '$N$'. We shall use lower-case italic letters as variables ranging over $N$. If P is a property of numbers and n is any number, we write 'Pn' to mean that n has the property P. The extension of P is the class of all numbers n such that Pn. This class is denoted by '$\{n : Pn\}$'. From an extensional point of view, P is identified with its extension: $P = \{n : Pn\}$; and hence Pn is equivalent to $n \in P$. (Here '$\in$' is short for 'is a member of'.)

We write '$\Rightarrow$' as short for 'implies that', 'iff' or '$\Leftrightarrow$' as short for 'if and only if', '$\forall$' as short for 'for all', and '$m \le n$' as short for '$m < n$ or $m = n$'.

We state here as 'facts' the following elementary properties of the ordered system of numbers.

1.1. Fact
The relation < between numbers is transitive: whenever k < m and m < n, then also k < n.

1.2. Fact
The relation < obeys the trichotomy: for any numbers m and n, exactly one of the following three holds: $m < n$, $m = n$, $n < m$.

[...]

... $\forall n[Pn \Rightarrow P(n + 1)]$ (i.e., that whenever n is a number having the property P then its successor n + 1 also has P). In schematic form:

$$\frac{P0, \quad \forall n[Pn \Rightarrow P(n + 1)]}{\forall n\,Pn} \tag{2.1}$$

A proof of a statement $\forall n\,Pn$ by weak induction thus falls into two sections. One section, called the basis of the inductive proof, is a proof that P0 holds. The other section, called the induction step, is a proof that $\forall n[Pn \Rightarrow P(n + 1)]$. When these two sections are completed (not necessarily in the above order), the proof that $\forall n\,Pn$ is complete.

In the induction step, in order to prove that $\forall n[Pn \Rightarrow P(n + 1)]$, you have to show that if n is any number such that Pn holds, then P(n + 1) holds as well. In other words, you have to deduce P(n + 1) from the assumption that Pn holds. The latter assumption is called the induction hypothesis.

The induction step is therefore performed as follows. You consider an arbitrary number, say n, about which you make just one assumption: that Pn holds (the induction hypothesis). Using this assumption, you try to deduce that P(n + 1).
When this is achieved, the induction step is complete.

In using the induction hypothesis Pn to deduce P(n + 1), you are merely considering an arbitrary hypothetical n for which Pn holds, without however committing yourself to the assumption that such a number exists; in other words, you are adopting Pn as a provisional hypothesis. If you succeed in deducing P(n + 1) from this provisional hypothesis, then you have established the conditional statement $Pn \Rightarrow P(n + 1)$; and as you have established this for arbitrary number n, you are entitled to infer that $\forall n[Pn \Rightarrow P(n + 1)]$.

Note that if you have completed the induction step only (without the basis - that is, you have not proved that P0) then you are not entitled to conclude that Pn holds for all numbers n; indeed you are not even entitled to conclude that there exist any numbers n for which Pn holds. For example, let P be the property of being a number that is greater than itself; so Pn means that n > n. Now, from the hypothesis n > n it is easy to deduce n + 1 > n + 1 (for example, by adding 1 to both sides of the hypothesis); so we have shown that $\forall n[Pn \Rightarrow P(n + 1)]$. But it doesn't follow that there is any number greater than itself.

2.2. Remark
The Weak Principle of Induction was first invoked in 1653 by Pascal in the proof of one of the results (Corollary 12) in his Traité du triangle arithmétique (published in 1665). Pascal does not give an explicit formulation of the principle in general, for arbitrary P; but from his presentation of the method of proof it is clear that the general principle is being invoked.

We shall not reproduce Pascal's proof here. Instead, we shall illustrate the use of weak induction in proving a simpler result.

2.3. Example
We shall prove that, for all n,

$$0 + 1 + 2 + \cdots + n = n(n + 1)/2. \tag{*}$$

PROOF
Define the property P by stipulating that Pn iff (*) holds for n. We show by weak induction that $\forall n\,Pn$.

Basis.
For n = 0 the sum on the left-hand side reduces to 0, and the value of the right-hand side is 0. Thus P0.

Induction step. Let n be any number such that Pn; thus our induction hypothesis is that (*) holds for this n. Then

$$\begin{aligned}
0 + 1 + 2 + \cdots + n + (n + 1) &= n(n + 1)/2 + (n + 1) && \text{by ind. hyp.,} \\
&= (n + 1)(n/2 + 1) \\
&= (n + 1)(n + 2)/2.
\end{aligned}$$

(The last two steps consist of simple algebraic manipulation.) Thus from the induction hypothesis we have deduced that

$$0 + 1 + 2 + \cdots + (n + 1) = (n + 1)(n + 2)/2.$$

This equation says that P(n + 1) - it is the same as (*), but with n + 1 in place of n. So we have shown that $Pn \Rightarrow P(n + 1)$. ■

§ 3. Strong induction

The so-called 'Strong' Principle of Induction can be stated schematically as follows:

$$\frac{\forall n[\forall m < n\,Pm \Rightarrow Pn]}{\forall n\,Pn} \tag{3.1}$$

Here, as before, P is any property of numbers. We have written '$\forall m < n\,Pm$' as short for 'all numbers m smaller than n have the property P'.

Thus, to prove that all numbers have a given property P, it is enough to prove that $\forall n[\forall m < n\,Pm \Rightarrow Pn]$. To do this, you have to show that if n is any number such that $\forall m < n\,Pm$ holds, then Pn holds as well; in other words, you have to deduce Pn from the assumption that $\forall m < n\,Pm$. This assumption is called the induction hypothesis. Note that a proof by strong induction does not have a separate 'basis' section.

As in the case of weak induction, here too the induction hypothesis $\forall m < n\,Pm$ is adopted provisionally, without presupposing it to be actually true. However, unlike the case of weak induction, here there is one particular value of n for which the hypothesis $\forall m < n\,Pm$ is in fact always automatically true. To see this, observe that there does not exist any m such that m < 0; this follows at once from Facts 1.2 and 1.4. Therefore any statement of the form 'for all m < 0, ...' (that is, '$\forall m < 0 \ldots$') is considered by convention to be vacuously true. In particular, $\forall m < 0\,Pm$ is always true.

3.2.
Theorem
The Strong Principle of Induction follows from the Weak Principle of Induction.

PROOF
Assume that P is a property of numbers such that $\forall n[\forall m < n\,Pm \Rightarrow Pn]$ holds. We shall show, using weak induction, that $\forall n\,Pn$ holds as well.

To this end, we define a new property Q by stipulating that, for any number n,

$$Qn \Leftrightarrow_{\mathrm{df}} \forall m < n\,Pm. \tag{*}$$

(The subscript 'df' is short for 'definition'.) Note that our assumption regarding P can now be rewritten as

$$\forall n[Qn \Rightarrow Pn]. \tag{**}$$

We shall apply weak induction to Q, to prove that $\forall n\,Qn$ holds.

First, observe that by (*) Q0 is the same as $\forall m < 0\,Pm$, which - as we have noted - is vacuously true.

Next, let n be a number and suppose (as induction hypothesis) that Qn holds. From this hypothesis we shall deduce that Q(n + 1) holds as well. Using our induction hypothesis we infer from (**) that Pn holds. We therefore have both Qn and Pn. But by (*) Qn means $\forall m < n\,Pm$. Therefore what we have shown is that

$$Pm \text{ holds for all } m \le n. \tag{***}$$

From Facts 1.2 and 1.3 it is easy to see that $m \le n$ is equivalent to $m < n + 1$, hence (***) can be rephrased as

$$Pm \text{ holds for all } m < n + 1,$$

which, by the definition (*) of Q, means that Q(n + 1) holds. This completes the proof of $\forall n\,Qn$ by weak induction.

From $\forall n\,Qn$, which we have just proved, together with (**) it follows at once that Pn holds for all n. ■

§ 4. The Least Number Principle

Let M be any class of numbers; that is, $M \subseteq N$ (M is a subclass of N). By a least member of M we mean a number $a \in M$ such that $a \le m$ for all $m \in M$.

Using Fact 1.2, it is easy to see that M cannot have more than one least member; so if M has a least member we can refer to the latter as the least member of M.

The Least Number Principle (LNP) states: If $M \subseteq N$ and M is non-empty then M has a least member.

4.1. Theorem
The LNP follows from the Strong Principle of Induction.

PROOF
Let $M \subseteq N$ and suppose that M does not have a least member. We must show M is empty.
To this end, let P be the property of not belonging to M. Thus, for any n,

$$Pn \Leftrightarrow_{\mathrm{df}} n \notin M.$$

To show that M is empty is tantamount to showing that $\forall n\,Pn$ holds. We shall do so by applying strong induction to P.

So let n be any number, and assume (as induction hypothesis) that $\forall m < n\,Pm$ holds. By the definition of P, our induction hypothesis means that for all m < n we have $m \notin M$. This is equivalent to saying that m < n is not the case for any $m \in M$. But by Fact 1.2 this means that $n \le m$ for all $m \in M$.

Therefore n cannot belong to M, otherwise it would be the least member of M, contrary to our assumption that M has no such member. Hence Pn holds, and our induction is complete. ■

We shall now complete the cycle by proving:

4.2. Theorem
The Weak Principle of Induction follows from the LNP.

PROOF
Let P be a property of numbers such that P0 and $\forall n[Pn \Rightarrow P(n + 1)]$ hold. We must prove that $\forall n\,Pn$ holds. This amounts to showing that the class

$$M =_{\mathrm{df}} \{n : Pn \text{ does not hold}\}$$

is empty. By the LNP, it is enough to show that M has no least member.

Suppose that M does have a least member, m. Since P0 holds, 0 is not in M; hence $m \ne 0$. Therefore by Fact 1.5 there is a number n such that m = n + 1. From Fact 1.3 it follows at once that n < m.

If n were in M, then we would have $m \le n$, because m is the least member of M; but $m \le n$ is excluded by Fact 1.2, since we already have n < m. Therefore n cannot be in M, which means that Pn must hold.

From our assumption that $\forall n[Pn \Rightarrow P(n + 1)]$ it now follows that P(n + 1) holds; in other words, Pm holds. But then m cannot be a member of M, let alone the least member. Thus our assumption that M has a least member leads to contradiction. ■

We have thus shown that the Weak Principle of Induction, the Strong Principle of Induction and the LNP are equivalent to one another.

4.3.
Remark
While there is no evidence that the ancient Greek mathematicians knew the Principles of Weak and Strong Induction, they did use mathematical induction in the form of the LNP. We shall quote here from a proof of Proposition 31 in Euclid's Elements, Book VII. First we need a few definitions.

By arithmós (plural: arithmoi) the Greeks meant what we call natural number greater than 1. An arithmos b is said to measure an arithmos a if b < a and b goes into a (in modern terminology: b is a proper divisor of a). An arithmos a is said to be composite if there is an arithmos that measures it; otherwise, a is said to be prime.

In Proposition 31 of Book VII, Euclid claims that every composite arithmos is measured by some prime arithmos. He writes:

'Let a be a composite arithmos. I say that it is measured by some prime arithmos. For since a is composite, it will be measured by an arithmos, and let b be the least of the arithmoi measuring it.'

Here the LNP is clearly invoked. The proof is now easily concluded: b must be prime; otherwise, it would be measured by some smaller arithmos c, which must then also measure a - contrary to the choice of b as the least of the arithmoi measuring a.

Euclid also gives another proof of the same proposition, in which he uses yet another form of the Principle of Induction: There does not exist an infinite decreasing sequence of natural numbers.¹

¹ On these matters see David Fowler, 'Could the Greeks have used Mathematical Induction? Did they use it?', Physis, vol. 31 (1994), pp. 252-265.

1 Sets and classes

§ 1. Introduction

1.1. Preview
Set theory occupies a fundamental position in the edifice of modern mathematics. Its concepts and results are used nowadays in virtually all standard mathematical discourse - not only in pure mathematics, but also in applied mathematics and hence in all the mathematics-based deductive sciences. In particular, set theory is used extensively in technical discussions of logic and analytical philosophy.
The purpose of Chs. 1-6 is to present a minimal core of set theory, adequate for the kind of application just mentioned. In particular, we shall provide the set-theoretical vocabulary, notation and results needed in later chapters, devoted to Symbolic Logic. We shall not venture into the higher reaches of the theory, which are of interest to specialist set-theorists. Nor shall we attempt a systematic logical-axiomatic investigation of set theory itself.

1.2. Further reading
There are hundreds of books on set theory, many of them very good. Among those pitched at a level similar to this course, there are two classics:

Abraham A Fraenkel, Abstract set theory,
Paul R Halmos, Naive set theory.

Both contain more material than our course. Fraenkel's book is suitable for readers with relatively little previous mathematical knowledge. If you are mathematically more experienced, you may find it too slow or verbose. Halmos is then likely to be more suitable.

For a more advanced, logical-axiomatic study of set theory, the two original masterpieces are:

Kurt Gödel, The consistency of the continuum hypothesis (1940),
Paul J Cohen, Set theory and the continuum hypothesis (1966).

An alternative exposition of Gödel's results and some additional related material is in Chapter 10 of B&M. An alternative exposition of Cohen's results and much additional related material is in John L Bell, Boolean-valued models and independence proofs in set theory.

1.3. Intuitive explanation
Intuitively speaking, a set is a definite collection, a plurality of objects of any kind, which is itself apprehended as a single object.

For example, think of a lot of sheep grazing in a field. They are a collection of sheep, a plurality of individual objects. However, we may (and often do) think of them - it - as a single object: a herd of sheep.¹

Note that in order to qualify as a set, the collection in question must be definite.
By this we mean that, if a is any object whatsoever, then a either definitely belongs to the collection or definitely does not. For this reason there is no such thing as the set of all blue cars, if 'blue' and 'car' are understood in their everyday fuzzy sense: my car is sort of bluish, and a friend of mine has a vehicle that is half-way between a car and a sad joke. (Most collections and concepts that are used in everyday thinking and discourse are fuzzy; some philosophers have therefore attempted to construct a theory of so-called fuzzy sets which are clearly not sets at all in the present sense of the term. This difficult subject lies outside the scope of our course.)

¹ Cf. Eric Partridge, Usage and abusage: 'COLLECTIVE NOUNS; ... Such collective nouns as can be used either in the singular or in the plural (family, clergy, committee, Parliament), are singular when unity (a unit) is intended; plural, when the idea of plurality is predominant.'

From now on, whenever we speak of a collection (or plurality) we shall tacitly take it to be definite, in the sense just explained. We shall also use the word class as synonymous with collection.

The objects belonging to a class may be of any kind whatsoever - physical or mental, real or ideal. In fact, being an object (in the sense in which we shall use this term) is tantamount to being capable of belonging to a collection.

In particular, since a set is a class regarded as a single object, it can itself belong to a class. So we can have a class some, or even all, of whose members are sets. If such a class, in turn, is regarded as a single object, we get a set having sets as (some of its) members. Thus, there are sets of sets (sets all of whose members are sets), sets of sets of sets, and so on.

The objects dealt with by set theory are therefore of two sorts: sets, and objects that are not sets.
An object of the latter sort is called an individual; the German term Urelement (plural: Urelemente) is often used as well for such an object.

Somewhat surprisingly, it has turned out that, as far as applications to pure mathematics are concerned, individuals are in principle dispensable, so that set theory can confine itself to sets only. We shall not make any ruling on this matter. Unless otherwise stated, what we shall say will apply regardless of whether, or how many, individuals are present.

1.4. Definition
We write '$a \in A$' as short for '[the object] a belongs to [the class] A'. The same proposition is also expressed by saying that a is a member of A, or an element of A, or that A contains a. We write '$a \notin A$' to negate the proposition that $a \in A$.

A class is specified by means of a definite property, say P, for which it is stipulated that the condition Px is necessary and sufficient for any object x's membership in the class.

1.5. Definition
If P is any definite property, such that the condition Px is meaningful for an arbitrary object x, then the extension of P, denoted by '$\{x : Px\}$', is the class of all objects x such that Px. Thus $a \in \{x : Px\}$ iff Pa.

Classes having exactly the same members are regarded as identical. Let us state this more formally:

1.6. Principle of Extensionality (PX)
If A and B are any classes such that, for every object x,

$$x \in A \Leftrightarrow x \in B,$$

then A = B.

For example, the two classes

$$\{x : x \text{ is an integer such that } x^2 = x\}, \qquad \{y : y \text{ is an integer such that } -1 < y < 2\}$$

are equal: although the two defining conditions differ in meaning, they are satisfied by the same objects - the integers 0 and 1.

1.7. Remark
Set theory (along with other parts of present-day mathematics) is dominated by a structuralist ideology, which entails an extensionalist view of properties. This means that properties having equal extensions are considered to be equal; thus a property and its extension uniquely determine each other.

§ 2.
The antinomies; limitation of size

Since ancient times, mathematicians have dealt with infinite pluralities as a matter of course - an obvious example is the class of positive integers. However, until well into the 19th century there was great reluctance to regard such pluralities as single objects, as sets in the sense explained in 1.3. The infinitude of a class meant that more and more of its members could be constructed or conceived of, without limit. But to apprehend such a plurality as a single object seems to imply that all its members have 'already' been constructed or conceived of, or at least that they are somehow all 'out there'. This idea of a completed or actual - rather than potential - infinity was (rightly!) regarded with utmost suspicion.

However, the needs of mathematics as it developed in the 19th century drove Georg Cantor (1845-1918) to create his Mengenlehre, set theory, which admits infinite classes as objects. Despite early hostility, set theory was soon accepted by the majority of mathematicians as a powerful and indispensable tool; indeed, many regard it as a framework and foundation for the whole of mathematics.

The success of set theory first lured its adherents into assuming that every class can be regarded as a set. This assumption, known as the Comprehension Principle, is however untenable: it leads to certain logical contradictions or antinomies. The first such antinomy to be discovered is called the Burali-Forti Paradox, after the person who first published it, in 1897; but Cantor himself had been aware of it at least two years earlier. The antinomy results directly from the assumption that the class W of all ordinals is a set. (The theory of ordinals is an important but quite technical part of set theory. In Ch. 4, when we study the ordinals, we shall prove that W cannot be a set.) Similar antinomies were later discovered by Cantor himself and by others.
Cantor was not too disturbed by these discoveries. He noticed that the antinomies arose from applying the Comprehension Principle to classes that were not just infinite but extremely vast. (An early result of his set theory was that not all infinite classes have the same 'size'.) He concluded that some classes are not merely infinite but absolutely infinite, hence simply too large to be comprehended as a single object. Set theory would be on safe ground if the Comprehension Principle were restricted to classes of moderate size.¹ However, he did not specify precisely how to draw the line between moderately large infinite classes, which can be regarded as sets with impunity, and vast ones, which cannot be so regarded.

Matters came to a head in 1903, when Bertrand Russell published a new antinomy, Russell's Paradox, which he had discovered two years earlier. Whereas previous antinomies arose in rather technical reaches of set theory and therefore required lengthy expositions, Russell's Paradox checkmated the Comprehension Principle in two simple moves, as follows.

Let $S =_{\mathrm{df}} \{x : x \text{ is a set such that } x \notin x\}$. Assuming that S is a set, it follows that $S \in S$ iff S satisfies the defining condition of S - that is, iff $S \notin S$. This is absurd.

The fact that an antinomy follows so easily from apparently sound assumptions plunged set theory and logic (which cannot be sharply demarcated from set theory) into a crisis.

In 1908, two solutions were proposed to this crisis. Both amounted to imposing restrictions on the Comprehension Principle - but in two very different ways.

The first, proposed by Russell himself and embodied in his type theory, refused to accept $\{x : Px\}$ as an object if the condition Px is impredicative (that is, refers to a totality to which the object, if it did exist, would belong).² Russell's type theory, elaborated by Whitehead and him in their three-volume Principia Mathematica (1910, 1912, 1913) as a total system for logic and mathematics, turned out to be quite complicated and cumbersome; and, at least in part because of this, has won very few adherents.

¹ See Michael Hallett, Cantorian set theory and limitation of size.
² Russell's paper, 'Mathematical logic as based on the theory of types', is reprinted in van Heijenoort, From Frege to Gödel.

The other solution, proposed by Ernst Zermelo, embodied an idea similar to that entertained by Cantor: limitation of size.¹ Zermelo proceeded to develop set theory axiomatically: he laid down postulates, or [extralogical] axioms, from which the theorems of set theory were to be deduced by elementary logical means. Besides an Axiom of Extensionality (for sets), Zermelo's axioms include certain particular cases of the Comprehension Principle, which are regarded as safe because - as far as one can tell - they do not allow the formation of over-large sets and do not give rise to antinomies. In addition, Zermelo postulated a special axiom, the Axiom of Choice, which is not a restricted form of the Comprehension Principle, but is needed for proving certain important results in set theory itself and in other branches of mathematics.²

In 1921-2, Abraham Fraenkel, Thoralf Skolem and Nels Lennes (independently of one another) proposed one further postulate, the Axiom of Replacement, which is vital for the internal needs of set theory rather than for applications to other branches of mathematics. This postulate is another apparently safe special case of the Comprehension Principle.³ The resulting theory - known as Zermelo-Fraenkel set theory (ZF) - has proved to be very convenient and has been adopted almost universally by users of set theory.
¹ Russell too had briefly toyed with the same idea in 1905.
² A translation of Zermelo's paper, 'Investigations in the foundations of set theory I', is printed in van Heijenoort, From Frege to Gödel.
³ This postulate, as well as Zermelo's Axiom of Separation and Axiom of Union Set, had in fact been foreshadowed in 1899 by Cantor, in a letter to Dedekind, a translation of which is printed in van Heijenoort, From Frege to Gödel.

While Zermelo's axiomatic approach is, as far as we can tell, sufficient for blocking the logical antinomies, such as the Burali-Forti and Russell Paradoxes, it does not ward against another sort of antinomy, which may be called linguistic or semantic. Here is a modified version of a linguistic antinomy published in 1906 by Russell, who attributed it to G. G. Berry.

Some English expressions define natural numbers; for example, 'zero', 'the square of eighty-seven', 'the least prime number greater than eighty-seven million'. Only finitely many numbers can be defined by English expressions that use fewer than 87 letters, since clearly there are only finitely many such expressions. Hence the class M of natural numbers not so definable must be non-empty. By the Least Number Principle (see §4 of Ch. 0), M has a unique least member: the least natural number not definable by an English expression using fewer than eighty-seven letters. But observe: the part of the previous sentence after the colon is an English expression using just 86 letters, which (presumably) defines a number that cannot be defined by an English expression using fewer than 87 letters!

On the face of it, this antinomy affects arithmetic rather than set theory. However, as we shall see in §3 of Ch. 4 and §1 of Ch. 6, the arithmetic of natural numbers can be simulated within set theory, so that Berry's antinomy threatens set theory as well. We cannot go here into a detailed discussion of the linguistic antinomies.
Suffice it to say that the source of the trouble is that the notion of definite property, and hence also that of class (as the extension of such a property), has been left too loose and vague. Thus, for example, the property of being definable by an English expression using fewer than eighty-seven letters does not have a rigorously defined meaning. These antinomies can be blocked by laying down precise conditions as to what may count as a definite property (or a class).¹ This may be done by specifying a formal language with precise structure and rules, and allowing as definite properties only such as can be expressed formally in this language. For a formalized presentation of ZF see, for example, Chapter 10 of B&M. We shall present a fairly rigorous but unformalized version of ZF. However, if desired it would be easy in principle (though tedious in practice) to formalize our treatment.

¹ The first to formulate such precise conditions was Hermann Weyl in Das Kontinuum (1918). A similar (and somewhat more formal) characterization was given independently by Skolem in a 1922 paper whose translation, 'Some remarks on axiomatized set theory', is printed in van Heijenoort, From Frege to Gödel.

§ 3. Zermelo's axioms

Here we present (with minor modifications) Zermelo's axioms except for the Axiom of Choice, which we shall discuss in Ch. 5.

First, we shall assume that our universe of discourse - the class of all objects with which set theory deals - is non-empty. We do not announce this assumption officially as a special postulate, because it is conventional to consider it as a logical presupposition. The objects in the universe of discourse are of two distinct sorts: sets and individuals.

Classes are admitted as extensions of properties: if P is a definite property of objects, then we admit the class A = {x : Px}. Note that, by Def. 1.5, to say that a ∈ A is just another way of saying that Pa (the object a has the property P).
In order to block the semantic antinomies we must however insist that P be defined in purely set-theoretic terms, without using extraneous concepts. The universe of discourse itself can be presented as a class according to this format: it is {x : x = x}.

Although we refer to a class in the singular, this is merely a manner of speaking and does not imply that the class is necessarily a single object. From the axioms it will follow, however, that certain classes are sets, and hence objects of set theory. Each set is identified with the class of all its members. The universe may also contain other objects, called individuals. An individual is not a set and has no members. As we shall see shortly, there is also a set that has no members - the empty set. A class that is not a set is called a proper class; a proper class is not an object, and therefore cannot be a member of any class.

As our first postulate we adopt the Principle of Extensionality 1.6. We shall refer to it briefly as 'PX'. Zermelo postulated PX for sets only, as he did not consider classes (except the universe of discourse) and used properties instead.

Before stating our next postulate, we introduce a useful piece of notation.

3.1. Definition
If n is any natural number and a_1, a_2, ..., a_n are any objects, not necessarily distinct, we put

{a_1, a_2, ..., a_n} =df {x : x ≠ x or x = a_1 or x = a_2 or ... or x = a_n}.

In particular, for n = 0 we get the empty class { } = {x : x ≠ x}, which we denote by '∅'. (No object can differ from itself!)

3.2. Axiom of Pairing (A2)
For all objects a and b the class {a, b} is a set.

3.3. Remarks
(i) This set is called the pair of a and b. By PX we have {a, b} = {b, a}.
(ii) For any object a we clearly have {a} = {a, a}, which is a set by A2. This set is called the singleton of a.
(iii) From our assumption that there exists at least one object a, it now follows that there exists at least one set, namely {a}.
Note however that we cannot prove the existence of an individual: our postulates are neutral on this matter.

3.4. Definition
Let A and B be classes. If every member of B is also a member of A, we say that B is a subclass of A (also, B is included in A, or A includes B), briefly: B ⊆ A. If B ⊆ A but A ≠ B, we say that B is a proper subclass of A (also, B is properly included in A, or A properly includes B), briefly: B ⊂ A.

3.5. Warnings
(i) Beware of confusing 'contains' and 'includes'; the former refers to the relation of membership ∈ while the latter refers to the relation ⊆ just defined.
(ii) However, this terminological distinction is not observed by all authors, so watch out for other usages.
(iii) Also, the notation introduced in Def. 3.4 is not universally accepted. Some authors use '⊂' instead of '⊆' for not-necessarily-proper inclusion; and '⊊' instead of '⊂' for proper inclusion.

The following postulate was one of Zermelo's central ideas.

3.6. Axiom of Subsets (AS)
If B ⊆ A and A is a set then so is B.

3.7. Definition
If A is a class and P is a definite property such that the condition Px is meaningful for any object x, we put

{x ∈ A : Px} =df {x : x ∈ A and Px}.

3.8. Remarks
(i) Zermelo's formulation of AS, clearly equivalent to the one used here, said (in effect) that if A is a set then the class {x ∈ A : Px} is always a set. Since this class separates or singles out those members of A that have the property P, he called AS the Axiom of Separation (Aussonderung). This name is still in current use.
(ii) The intuitive idea behind AS is clear: if B ⊆ A and A is not too vast, then B cannot be too vast either.

3.9. Theorem
∅ is a set.

PROOF
Clearly ∅ is included in any class, and in particular in any set. By Rem. 3.3(iii) there exists a set. Hence ∅ is included in some set, and by AS is itself a set. ■

3.10. Theorem
The class of all objects (the universe of discourse) and the class of all sets are proper classes.
PROOF
We saw in § 2 that Russell's class, {x : x is a set such that x ∉ x}, cannot be a set. Since Russell's class is included in the class of all sets, the latter cannot be a set by AS. The same applies to the universe of discourse. ■

3.11. Definition
If A is any class, we put

⋃A =df {x : x ∈ y for some y ∈ A}.

⋃A is called the union class of A.

3.12. Axiom of Union set (AU)
If A is a set then so is ⋃A.

3.13. Remarks
(i) The members of ⋃A are the members of the members of A.
(ii) Intuitively, the idea behind AU is that if A is a set then it does not have 'too many' members; and each of these, being an object (an individual or a set), in turn does not have 'too many' members. Therefore ⋃A - obtained by pooling together not-too-many collections, none of which is too vast - cannot itself be too vast.

3.14. Definition
For any classes A and B, we put

A ∪ B =df {x : x ∈ A or x ∈ B}.

A ∪ B is called the union (or join) of A and B.

3.15. Theorem
A ∪ B is a set iff both A and B are sets.

PROOF
If A and B are sets, then A ∪ B = ⋃{A, B}, which is a set by A2 and AU. The converse follows easily from AS. ■

3.16. Theorem
If n is any natural number and a_1, a_2, ..., a_n are any objects, the class {a_1, a_2, ..., a_n} is a set.

PROOF
By (weak) induction on n.
Basis. For n = 0 the assertion of our theorem is Thm. 3.9.
Induction step. By Def. 3.14,

{a_1, a_2, ..., a_n, a_{n+1}} = {a_1, a_2, ..., a_n} ∪ {a_{n+1}},

which is a set by the induction hypothesis, Rem. 3.3(ii) and Thm. 3.15. ■

3.17. Definition
If A is any class, we put

PA =df {x : x is a set such that x ⊆ A}.

PA is called the power class of A.

3.18. Axiom of Power set (AP)
If A is a set then so is PA.

3.19. Remark
Intuitively, the idea behind AP is that although PA can be very large - in fact, much larger than A - its size is nevertheless bounded provided A itself is not too vast.

3.20.
Problem
Prove that if A is a class of sets (that is, a class all of whose members are sets) such that ⋃A is a set, then A is a set as well.

The last axiom we shall postulate here is

3.21. Axiom of Infinity (AI)
There exists a set Z such that ∅ ∈ Z and such that for every set x ∈ Z also x ∪ {x} ∈ Z.

3.22. Remarks
(i) Without AI it is impossible to prove that there are infinite sets. On the other hand, it is easy to see intuitively that any set Z satisfying the conditions imposed by AI must be infinite. We shall be able to prove this rigorously when we have a rigorous definition of infiniteness.
(ii) A2, AS, AU and AP are clearly particular cases of the Principle of Comprehension: they say that certain classes are sets. Although AI as it stands is not of this form, we shall see later that it is equivalent to the proposition that a certain class, ω, is a set.

§ 4. Intersections and differences

The following definitions will be needed later on.

4.1. Definition
If A is any class,

⋂A =df {x : x ∈ y for every y ∈ A}.

⋂A is called the intersection class of A.

4.2. Definition
If A and B are classes,

A ∩ B =df {x : x ∈ A and x ∈ B}.

A ∩ B is called the intersection (or meet) of A and B.

4.3. Definition
If A is any class,

A^c =df {x : x ∉ A}.

A^c is called the complement of A.

4.4. Definition
If A and B are any classes,

A − B =df A ∩ B^c.

A − B is called the difference between A and B.

4.5. Problem
(i) Prove that if A is a non-empty class then ⋂A is a set. What is ⋂∅?
(ii) Prove that if A or B is a set then so is A ∩ B.
(iii) Prove that A and A^c cannot both be sets.

2 Relations and functions

§ 1. Ordered n-tuples, cartesian products and relations

1.1. Preview
By Def. 1.1.5, the extension of a property P of objects is the class {x : Px}. Recall (Rem.
1.1.7) that from an extensionalist point of view a property and its extension determine each other uniquely; so that - wielding Occam's razor, the structuralist mathematician's favourite instrument - one can identify the two and pretend that a property simply is its extension.

As set theory developed, it transpired that a similar procedure could be applied to other fundamental mathematical notions such as relation (among objects) and function: instead of taking these as independent primitive notions, as had been done in the early days of set theory, they could be reduced to classes and the membership relation. In this and the next section we shall see how this is done.

For any two objects a and b, not necessarily distinct, we need a unique object ⟨a, b⟩ called the ordered pair of a and b [in this order]. It is not really important how the ordered pair is defined, so long as the following condition is satisfied:

(1.2) ⟨a, b⟩ = ⟨c, d⟩ ⇔ a = c and b = d.

1.3. Warning
The ordered pair ⟨a, b⟩ must not be confused with the set {a, b}, sometimes known as an unordered pair, whose members are just a and b. For example, the sets {a, b} and {b, a} are always equal (see Rem. 1.3.3(i)), but by (1.2) the ordered pairs ⟨a, b⟩ and ⟨b, a⟩ are equal only if a = b. However, when there is no risk of confusion we shall often omit the adjective 'ordered' and say 'pair' when we mean ordered pair.

As part of the reductionist programme aiming to reduce all mathematical concepts to the notion of class and the membership relation, the following rather artificial definition, first proposed by Kazimierz Kuratowski in 1921, has been widely accepted.

1.4. Definition
For any objects a and b,

⟨a, b⟩ =df {{a}, {a, b}}.

1.5. Problem
Prove that (1.2) follows from Def. 1.4.

More generally, for any number n and any n objects a_1, a_2, ..., a_n - not necessarily distinct - we need a unique object ⟨a_1, a_2, ..., a_n⟩ called the ordered n-tuple of a_1, a_2, ..., a_n [in this order].
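As a finite sanity check of Def. 1.4 against condition (1.2), the Kuratowski pair can be modelled with Python frozensets and tested exhaustively over a small collection of objects. This is only a sketch and no substitute for the proof asked for in Prob. 1.5; the names `kpair` and `objects` are illustrative assumptions, not notation from the text.

```python
def kpair(a, b):
    """Kuratowski ordered pair <a, b> = {{a}, {a, b}} (Def. 1.4), as frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# A small, arbitrary universe of pairwise distinct objects (an assumption for the check).
objects = [0, 1, 2, "x"]

# Condition (1.2): <a, b> = <c, d> iff a = c and b = d, checked for all choices.
holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a in objects for b in objects
    for c in objects for d in objects
)
print(holds)

# Unordered pairs ignore order (Rem. 1.3.3(i)); ordered pairs do not (Warning 1.3).
print(frozenset({0, 1}) == frozenset({1, 0}))
print(kpair(0, 1) == kpair(1, 0))
```

Note that when a = b the pair collapses to {{a}}, which the check above covers as well.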
Again, it is not really important how ordered n-tuples are defined, so long as the following condition - of which (1.2) is a special case - is satisfied:

(1.6) ⟨a_1, a_2, ..., a_n⟩ = ⟨b_1, b_2, ..., b_n⟩ ⇔ a_i = b_i for i = 1, 2, ..., n.

Again, we shall often say 'n-tuple' as short for 'ordered n-tuple'.

The following definitions deliver the goods. Proceeding inductively, we supplement Def. 1.4 by:

1.7. Definition
For any n ≥ 2 and objects a_1, a_2, ..., a_n, a_{n+1},

⟨a_1, a_2, ..., a_n, a_{n+1}⟩ =df ⟨⟨a_1, a_2, ..., a_n⟩, a_{n+1}⟩.

1.8. Problem
Prove (1.6) for all n ≥ 2. (Use weak induction on n, taking n = 2 as basis.)

There remain the cases n = 1 and n = 0. For n = 1, condition (1.6) reduces to: ⟨a⟩ = ⟨b⟩ ⇔ a = b. The simplest way to satisfy this is to adopt the following.

1.9. Definition
⟨a⟩ =df a.

As for n = 0, condition (1.6) reduces to the unconditional equality ⟨⟩ = ⟨⟩, which will hold trivially, no matter how we define ⟨⟩. Since ∅ is the simplest object, the simplest convention to adopt is

1.10. Definition
⟨⟩ =df ∅.

1.11. Remark
The equality

⟨a_1, a_2, ..., a_n, a_{n+1}⟩ = ⟨⟨a_1, a_2, ..., a_n⟩, a_{n+1}⟩,

which was decreed by Def. 1.7 for n ≥ 2, now holds also for n = 1 by virtue of Def. 1.9. However, it does not hold for n = 0, because by Def. 1.9 ⟨a⟩ = a, whereas by Def. 1.10 ⟨⟨⟩, a⟩ = ⟨∅, a⟩.

We proceed to define the notions of cartesian product and cartesian power.

1.12. Definition
(i) For any classes A_1, A_2, ..., A_n, not necessarily distinct, their cartesian product [in this order] is the class

A_1 × A_2 × ... × A_n =df {⟨x_1, x_2, ..., x_n⟩ : x_1 ∈ A_1, x_2 ∈ A_2, ..., x_n ∈ A_n},

that is, the class of all n-tuples whose i-th component belongs to A_i for i = 1, 2, ..., n.
(ii) The n-th cartesian power of a class A is the cartesian product of A with itself n times:

A^n =df A × A × ... × A (n times),

that is, the class of all n-tuples of members of A. In particular, A^1 = A and A^0 = {⟨⟩} = {∅}.

1.13. Remarks
(i) In Def.
1.12(i) we have used a convenient generalization of the class notation introduced in Def. 1.1.5. Although it is almost self-explanatory, let us spell it out. Suppose F(x_1, x_2, ..., x_n) is an object whenever x_1, x_2, ..., x_n are objects; and suppose P(x_1, x_2, ..., x_n) is a condition involving x_1, x_2, ..., x_n. Then

{F(x_1, x_2, ..., x_n) : P(x_1, x_2, ..., x_n)}

is defined to be the class

{y : there exist x_1, x_2, ..., x_n such that y = F(x_1, x_2, ..., x_n) and P(x_1, x_2, ..., x_n)}.

(ii) It is easy to see that, for any n ≥ 1,

A_1 × A_2 × ... × A_n = ∅ iff A_i = ∅ for at least one i.

Intuitively, if n ≥ 1 and R is an n-ary relation on a class A, then for any n-tuple of members of A it is meaningful to say that R holds or does not hold for it. The class of all those n-tuples for which R does hold is known as the extension of R. From an extensionalist point of view, two relations are identical iff they have the same extension. Thus, a relation and its extension uniquely determine each other. In the spirit of the reductionist programme mentioned above, a relation is simply identified with its extension. Hence the following

1.14. Definition
(i) For any n ≥ 1 and any class A, an n-ary relation on A is a class of n-tuples of members of A - that is, a subclass of A^n.
(ii) In particular, a property on A is a unary relation on A - that is, a subclass of A.

1.15. Remarks
(i) If R is an n-ary relation we shall often write 'R(a_1, a_2, ..., a_n)' as short for '⟨a_1, a_2, ..., a_n⟩ ∈ R'. In the special case where R is a binary relation we shall often write 'aRb' for '⟨a, b⟩ ∈ R'.
(ii) We could extend Def. 1.14(i) to the case n = 0, but the resulting notion of 0-ary relation is found to be of little use.

§ 2. Functions; the axiom of replacement

Intuitively, if f is a function (or map, or mapping) then f assigns to any object x at most one object fx as value.
The class of all objects x to which a value fx is assigned by f is called the domain [of definition] of f and denoted by 'dom f'. The graph of f is then the class {⟨x, fx⟩ : x ∈ dom f}. Note that the graph of a function is a class of pairs. But not every class of pairs can be the graph of a function: a class G of pairs is the graph of a function iff for any object x there is at most one object y such that ⟨x, y⟩ ∈ G. From an extensionalist point of view, two functions are identical if they have the same graphs. In the spirit of reductionism, we can therefore identify a function with its graph:

2.1. Definition
A function (a.k.a. map or mapping) is a class f of ordered pairs satisfying the functionality condition: whenever both ⟨x, y⟩ ∈ f and ⟨x, z⟩ ∈ f then y = z.

2.2. Definition
Let f be a function.
(i) The domain of f is the class dom f =df {x : ⟨x, y⟩ ∈ f for some y}.
(ii) If x ∈ dom f, then the value of f at x - usually denoted by 'fx' - is the [necessarily unique] y such that ⟨x, y⟩ ∈ f.
(iii) The range of f is the class ran f =df {fx : x ∈ dom f}.

2.3. Problem
Verify that from Defs. 2.1 and 2.2 it follows that a function f is equal to its own graph; that is, f = {⟨x, fx⟩ : x ∈ dom f}. Hence prove that functions f and g are equal iff dom f = dom g and fx = gx for every x in their common domain.

2.4. Definition
Let f be a function.
(i) We say that f is a map from A to B (or that f maps A into B) if dom f = A and ran f ⊆ B.
(ii) We say that f is a surjection from A to B (or that f maps A onto B) if dom f = A and ran f = B.
(iii) We say that f is an injection (or a one-to-one map) if whenever x and y are distinct members of dom f then fx and fy are also distinct.
(iv) We say that f is a bijection from A to B if it is an injection as well as a surjection from A to B (that is, a one-to-one map from A onto B).

We shall now enquire when a relation or a function is a set.

2.5. Lemma
Let A and B be non-empty classes. Then A × B is a set iff both A and B are sets.
PROOF
Let a and b be any members of A and B respectively. Then by Defs. 1.4 and 1.12 we have

{a, b} ∈ {{a}, {a, b}} = ⟨a, b⟩ ∈ A × B.

Therefore by Def. 1.3.11 {a, b} ∈ ⋃(A × B). Since both a and b belong to {a, b}, it follows, again by Def. 1.3.11, that both are members of ⋃⋃(A × B). Thus we have shown that A ⊆ ⋃⋃(A × B) and B ⊆ ⋃⋃(A × B), hence A ∪ B ⊆ ⋃⋃(A × B). Also, it is easy to see that ⋃⋃(A × B) ⊆ A ∪ B. Therefore by PX we have

⋃⋃(A × B) = A ∪ B.

If A × B is a set, it follows from AU and Thm. 1.3.15 that A and B are sets as well.
Conversely, if A and B are sets, then by Thm. 1.3.15 and Prob. 1.3.20 it follows that A × B is a set as well. ■

2.6. Theorem
Let n ≥ 1, and let A_1, A_2, ..., A_n be non-empty classes. Then A_1 × A_2 × ... × A_n is a set iff A_i is a set for each i = 1, 2, ..., n.

PROOF
By weak induction on n.
Basis. For n = 1 the assertion of our theorem is trivial, since in this case A_1 × A_2 × ... × A_n is simply A_1 (see Defs. 1.12(i) and 1.9).
Induction step. It is easy to see that

A_1 × A_2 × ... × A_n × A_{n+1} = (A_1 × A_2 × ... × A_n) × A_{n+1}

(use Defs. 1.12(i) and 1.7 and Rem. 1.11). Hence, by Lemma 2.5 and the induction hypothesis, A_1 × A_2 × ... × A_n × A_{n+1} is a set iff A_i is a set for each i = 1, 2, ..., n, n + 1. ■

2.7. Corollary
If A is a set and R is an n-ary relation on A (for some n ≥ 1) then R is a set as well.

PROOF
By Def. 1.14 we have R ⊆ A^n. If A = ∅ then A^n = ∅ by Def. 1.12(ii) and Rem. 1.13(ii); hence R = ∅. If A is a non-empty set then A^n is a set by Thm. 2.6, hence R is a set by AS. ■

2.8. Theorem
Let f be a function. Then f is a set iff both dom f and ran f are sets.

PROOF
It is easy to verify that

⋃⋃f = dom f ∪ ran f.

From this the required result follows, using the same argument as in the proof of Lemma 2.5. ■

At this point we introduce

2.9. Axiom of Replacement (AR)
If f is a function and dom f is a set then ran f is a set as well.

2.10.
Remarks
(i) AR is clearly a particular case of the Comprehension Principle.
(ii) In view of Thm. 2.8, AR is equivalent to the proposition that if f is a function such that dom f is a set then f itself is a set. The intuitive idea behind AR is that f has exactly 'as many' members as does dom f: for each a ∈ dom f, f contains the corresponding pair ⟨a, fa⟩. Therefore if dom f is not too vast, neither is f itself.
(iii) In mathematical applications, a function f is almost always defined as a mapping from A to B, where both A and B are known in advance to be sets. It then follows from AS and Thm. 2.8 that ran f and f itself are sets. AR is not needed for this. But as we shall see AR plays an important role within set theory itself.

§ 3. Equivalence and order relations

3.1. Preview
In this section we discuss two kinds of relation that are of particular importance, not only in set theory but in mathematics as a whole. Throughout the section, A is an arbitrary class.

3.2. Definition
R is an equivalence relation on A if R is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xRx (reflexivity),
if xRy then also yRx (symmetry),
if xRy and yRz then also xRz (transitivity).

3.3. Example
The paradigmatic example of an equivalence relation on A is the binary relation {⟨x, x⟩ : x ∈ A}, called the identity (or diagonal) relation on A, and denoted by 'id_A'. By the way, id_A is clearly a function; indeed, it is a bijection from A to itself.

3.4. Definition
Let R be an equivalence relation on A. For each a ∈ A we put

[a]_R =df {x : xRa}.

We call [a]_R the R-class of a, or the equivalence class of a modulo R. Where there is no risk of confusion we omit the subscript 'R' and write simply '[a]'.

3.5. Theorem
Let R be an equivalence relation on A and let a and b be any members of A. Then [a] = [b] iff aRb.

PROOF
(⇒). By reflexivity, aRa, so a ∈ [a].
If [a] = [b] then by PX also a ∈ [b], so that aRb.
(⇐). Suppose aRb. If x ∈ [a], then xRa, hence by transitivity xRb, so that x ∈ [b]. Thus we have shown that [a] ⊆ [b]. Also, from aRb it follows by symmetry that bRa, so the argument we have just used shows that [b] ⊆ [a]. Hence by PX [a] = [b]. ■

3.6. Corollary
Let R be an equivalence relation on A and let a be any member of A. Then a belongs to exactly one R-class, namely [a].

PROOF
We have seen that a ∈ [a]. If also a ∈ [b] then by Def. 3.4 aRb, so by Thm. 3.5 it follows that [a] = [b]. ■

3.7. Definition
(i) S is a sharp partial order on A if S is a binary relation on A such that, for any members x, y and z of A, the following two conditions are satisfied:

if xSy, then ySx does not hold (anti-symmetry),
if xSy and ySz then also xSz (transitivity).

(ii) B is a blunt partial order on A if B is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xBx (reflexivity),
if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.8. Example
Let A be a class of sets (that is, all the members of A are sets rather than individuals). Let S and B be the restrictions to A of ⊂ and ⊆ respectively; that is,

S =df {⟨x, y⟩ ∈ A^2 : x ⊂ y} and B =df {⟨x, y⟩ ∈ A^2 : x ⊆ y}.

Then it is easy to see that S and B are a sharp and a blunt partial order, respectively, on A.

3.9. Problem
Let S and B be a sharp and a blunt partial order, respectively, on A. Put

S^♭ =df S ∪ id_A and B^♯ =df B − id_A.

(For the definitions of id_A and − see Ex. 3.3 and Def. 1.4.4.)
(i) Prove that S^♭ and B^♯ are a blunt and a sharp order on A, respectively.
(ii) Verify that S^♭♯ = S and B^♯♭ = B.

3.10. Remarks
(i) The qualifications 'sharp' and 'blunt' are often omitted and a partial order of either kind is referred to simply as a 'partial order'. There is no real harm in this, for two reasons.
First, because it is usually clear from the context which kind of partial order is meant. Second, as shown in Prob. 3.9, there is a natural mutual association between a sharp partial order and a blunt partial order, whereby the latter is obtained from the former by applying ♭ and the former from the latter by applying ♯.
(ii) Sharp partial orders are often denoted by symbols such as '<' or '≺'; the corresponding blunt partial orders are then denoted by symbols such as '≤' or '≼' respectively.

3.11. Definition
(i) S is a sharp total order on A if S is a binary relation on A such that, for any members x, y and z of A, the following two conditions are satisfied:

exactly one of the following three disjuncts holds: xSy or x = y or ySx (trichotomy),
whenever xSy and ySz then also xSz (transitivity).

(ii) B is a blunt total order on A if B is a binary relation on A such that, for any members x, y and z of A, the following three conditions are satisfied:

xBy or yBx (connectedness),
if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.12. Problem
Let S and B be a sharp and a blunt total order, respectively, on A. Prove that (i) S is a sharp partial order, (ii) S^♭ is a blunt total order, (iii) B is a blunt partial order, (iv) B^♯ is a sharp total order, on A.

§ 4. Operations on functions

The following definitions will be needed later on.

4.1. Definition
If f and g are functions such that ran f ⊆ dom g, we put

g ∘ f =df {⟨x, gy⟩ : ⟨x, y⟩ ∈ f}.

g ∘ f - often denoted briefly 'gf' - is called the composition of f and g. (Note reading from right to left!)

4.2. Problem
Show: g ∘ f is a function, dom(g ∘ f) = dom f and ran(g ∘ f) ⊆ ran g. Moreover, for any x in dom(g ∘ f) - which is also dom f - check that (g ∘ f)x = g(fx).

4.3. Definition
If f is an injective (that is, one-to-one) function we put

f^-1 =df {⟨y, x⟩ : ⟨x, y⟩ ∈ f}.

f^-1 is called the inverse of f.

4.4.
Problem
Verify that f^-1 itself is an injective function and, moreover,

dom(f^-1) = ran f, ran(f^-1) = dom f, f^-1 ∘ f = id_{dom f}.

(For the definition of id see Ex. 3.3.)

4.5. Problem
Prove that if f is a function from a proper class to a set, then f is not injective.

4.6. Definition
If f is a function and C ⊆ dom f, we put
(i) f↾C =df {⟨x, fx⟩ : x ∈ C},
(ii) f[C] =df {fx : x ∈ C}.
f↾C is called the restriction of f to C and f[C] is called the image of C under f.

4.7. Problem
Verify that f↾C is a function, dom(f↾C) = C and ran(f↾C) = f[C]. Moreover, (f↾C)x = fx for every x ∈ C.

4.8. Problem
Let F be a class whose members are functions. Show that ⋃F is a function iff the following coherence condition is fulfilled: fx = gx for all f and g in F and all x ∈ [...]

[...] A ≈ B - would then follow at once by Thm. 2.3.5. This procedure, novel at the time, was to become standard practice, used with respect to various equivalence relations that arise in numerous mathematical situations.

Ironically, Frege's procedure does not work at all well in the present case, where the equivalence relation is ≈. Unaware that the Comprehension Principle had to be restricted, he assumed as a matter of course that [A]_≈ is always a set, hence an object. Unfortunately, this is in general false. For example, if A is a singleton, then [A]_≈ is the class of all singletons, and hence ⋃[A]_≈ is the class of all objects, the entire universe of discourse, which is a proper class by Thm. 1.3.10. Hence by AU [A]_≈ must be a proper class as well.

This is very inconvenient, because we would like to be able to form classes of cardinals, which is impossible if cardinals are proper classes. Fortunately there are other ways of defining cardinals, satisfying the requirement of Def. 1.3, while ensuring that the cardinals are sets. Later on, in Ch. 6, we shall follow one such procedure. In each ≈-class we shall be able to select a unique 'distinguished' member.
Then, for any set A, we can take |A| to be the distinguished member of [A]_≈ rather than that class itself. Then Thm. 2.3.5 ensures that the requirement of Def. 1.3 is satisfied.
(ii) For the time being, let us take it on trust that Def. 1.3 can be completed in a satisfactory way. This is not asking too much, since our reference to cardinals may be regarded as a mere convenience: everything that we shall say in this chapter in terms of cardinals can easily be rephrased (at the cost of some circumlocution) in terms of sets and mappings between sets.
(iii) The cardinality |A| of a set A is a measure of its size. Cardinals can be regarded intuitively as generalized natural numbers. Indeed, if A is a finite set of the form {a_1, a_2, ..., a_n}, where the a_i are distinct, then we could take |A| to be n, the number of members of A. Thus, each natural number may be regarded intuitively as the cardinality of a finite set.
(iv) However, we shall not assume formally that the natural numbers are in fact cardinals. Rather, in §3 we shall posit for each n a corresponding cardinal 𝐧, without necessarily identifying the two.

§ 2. Ordering the cardinals; the Schröder-Bernstein Theorem

We define a binary relation ≤ on the class of cardinals, which, as we shall soon see, is a [blunt] partial order on that class:

2.1. Definition
Let λ and µ be cardinals. Let A and B be sets such that |A| = λ and |B| = µ. We say that λ is smaller-than-or-equal-to µ - briefly: λ ≤ µ - if there is an injection from A to B.

2.2. Remark
This definition is in need of legitimation: we must make sure that the criterion it provides for asserting that λ ≤ µ depends only on these cardinals themselves rather than on the choice of particular sets A and B such that |A| = λ and |B| = µ. This is done as follows. Let A, A', B, B' be sets such that |A| = |A'| and |B| = |B'|. Given an injection from A to B, it is easy to show - DIY! - that there is also an injection from A' to B'.

2.3.
Theorem
Let λ and µ be cardinals and let B be a set such that |B| = µ. Then λ ≤ µ iff B has a subset whose cardinality is λ.

PROOF
Let A be a set such that |A| = λ. By Def. 2.2.4, an injection from A to B is the same thing as a bijection from A to a subset of B. ■

2.4. Theorem
The relation ≤ on the class of cardinals is reflexive and transitive.

PROOF
DIY. ■

To show that ≤ is a partial order, it remains to establish that it is weakly anti-symmetric (see Def. 2.3.7). This fact was conjectured by Cantor and proved independently by F. Bernstein and E. Schröder. The proof we shall present here, due to Zermelo, uses a lemma that is of some interest in its own right.

2.5. Definition
A map φ from a class of sets to a class of sets is monotone if whenever X and Y are sets in dom φ such that X ⊆ Y then [...]

[...] κλ ≤ κµ (weak monotonicity of multiplication),
(v) (κ + λ)µ = κµ + λµ (distributivity of multiplication over addition),
(vi) λµ = 0 ⇔ λ = 0 or µ = 0 (absorptive property of 0).

5.6. Problem
Prove the following generalization of Prob. 5.5(v): if {λ_x | x ∈ X} is any indexed family of cardinals and µ is any cardinal then [...]

5.7. Warning
The same as 4.8, mutatis mutandis.

As in the case of addition, multiplication can be defined for a whole family of cardinals rather than just a pair of cardinals. (Legitimation again requires AC.) We start from a simple observation:

5.8. Lemma
Let C and D be any sets and let u and v be distinct objects. Let P be the class

{f : f is a function such that dom f = {u, v} and fu ∈ C and fv ∈ D}.

Then P is a set equipollent to C × D.

PROOF
It is quite easy to show, without using AR, that P is a set. However, we shall not bother to do so. Instead, we shall define a bijection F from the set C × D to P. Thus by AR the latter is also a set. We put, for each c ∈ C and d ∈ D,

F(c, d) = {⟨u, c⟩, ⟨v, d⟩}.

It is easy to verify that F is indeed a bijection from C × D to P.
■

The following definition generalizes the construction of Lemma 5.8 to an arbitrary family of sets.

5.9. Definition If {B_x | x ∈ X} is an indexed family of sets, the class

    {f : f is a function such that dom f = X and fx ∈ B_x for all x ∈ X}

is denoted by '⨉{B_x | x ∈ X}' and called the direct product of the family {B_x | x ∈ X}.

5.10. Lemma If {B_x | x ∈ X} is any indexed family of sets, then ⨉{B_x | x ∈ X} is a set.

PROOF Recall (Def. 4.9) that {B_x | x ∈ X} is the function having the index set X as its domain, whose value at each x ∈ X is B_x. Therefore the range of this function is {B_x : x ∈ X} and this range is a set by AR. Now let us put U = ⋃{B_x : x ∈ X}. U is a set by AU. Next, observe that by Def. 5.9, if f is any member of ⨉{B_x | x ∈ X} then f is a map from X to U. Hence f ⊆ X × U, which means that f ∈ P(X × U). Thus we have shown that

    ⨉{B_x | x ∈ X} ⊆ P(X × U).

Since X × U is a set (cf. Rem. 5.2(i)), it follows that P(X × U) is a set by AP. Hence ⨉{B_x | x ∈ X} is a set by AS. ■

5.11. Definition Let {B_x | x ∈ X} be a family of sets and let μ_x = |B_x| for each x ∈ X. We put

    ∏{μ_x | x ∈ X} =df |⨉{B_x | x ∈ X}|.

This is called the product of the [family of] μ_x, indexed by X.

5.12. Remarks
(i) Using AC it is easy to legitimize this definition by showing that if A is another indexed family of sets with the same index set X such that |A_x| = |B_x| for all x ∈ X, then ⨉{A_x | x ∈ X} ≈ ⨉{B_x | x ∈ X}.
(ii) Def. 5.1 can be regarded as a special case of Def. 5.11. Indeed, if C and D are any sets, whose cardinalities are κ and λ respectively, take X = {u, v}, where u and v are distinct objects, and let {B_x | x ∈ X} be the family such that B_u = C and B_v = D. Then Lemma 5.8, rewritten in the notation of Def. 5.9, says that ⨉{B_x | x ∈ X} ≈ C × D. So in this case we have |⨉{B_x | x ∈ X}| = |C × D|, which is what Def. 5.1 says κλ should be.

§ 6. Exponentiation; Cantor's Theorem

6.1. Definition Let A and B be any sets. Then

    map(A, B) =df {f : f is a map from A to B}.

6.2.
Remarks
(i) If f is any member of map(A, B) then f ⊆ A × B, hence f is a member of P(A × B). Thus map(A, B) ⊆ P(A × B), and map(A, B) is a set.
(ii) Perhaps more instructively, the same result can be derived from Lemma 5.10, as follows. Consider the indexed family {D_a | a ∈ A} such that D_a = B for every a ∈ A. Then ⨉{D_a | a ∈ A} - which is a set by Lemma 5.10 - is, by Def. 5.9, equal to

    {f : f is a function such that dom f = A and fa ∈ B for all a ∈ A}.

By Def. 6.1 this is exactly map(A, B).

6.3. Definition For any cardinals λ and μ, we define μ to the [power of] λ:

    μ^λ = |map(A, B)|,

where A and B are sets such that |A| = λ and |B| = μ.

6.4. Remarks
(i) This definition is legitimized by the easily verified fact that if A ≈ A' and B ≈ B' then map(A, B) ≈ map(A', B').
(ii) From Rem. 6.2(ii) it follows that exponentiation (raising to a power) can be achieved by repeated multiplication, in the following sense: if {κ_a | a ∈ A} is an indexed family of cardinals such that κ_a = μ for all a ∈ A, and if |A| = λ, then ∏{κ_a | a ∈ A} = μ^λ.

6.5. Problem Let k, m be natural numbers, and let n = m^k. Verify that 𝐧 = 𝐦^𝐤.

6.6. Problem Verify that for any cardinals κ, λ and μ:
(i) 1^κ = 1,
(ii) μ^1 = μ,
(iii) μ^κ μ^λ = μ^(κ+λ),
(iv) (μ^λ)^κ = μ^(κλ),
(v) (λμ)^κ = λ^κ μ^κ.

6.7. Theorem For any set A, |PA| = 2^|A|.

PROOF By Def. 6.3, what we have to show is that PA is equipollent to map(A, B), where B is a set having exactly two members. Let us take B = {∅, {∅}}. Define a map F from map(A, B) to PA, by putting, for every f ∈ map(A, B),

    Ff = {a ∈ A : fa = ∅}.

It is easy to verify that F is a bijection from map(A, B) to PA. ■

6.8. Cantor's Theorem For any set A, |A| < |PA|.

[…] d ∈ D. But from the definition of D we see that d ∈ D ⇔ d ∉ gd. Thus, d belongs to gd iff it doesn't. This contradiction shows that g cannot map A onto PA, and hence cannot be a bijection from A to PA. ■

6.9. Remark The idea of Russell's Paradox derives from this proof.
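The diagonal construction used in the proof of Cantor's Theorem can be tried out on a small finite set. The following Python sketch is an illustration only and not part of the formal development: the function names are assumptions introduced here, finite sets are modelled by Python sets, and an exhaustive check on a three-element set of course proves nothing about sets in general. For each map g from A to PA it forms D = {x ∈ A : x ∉ gx} and confirms that D is not a value of g.

```python
from itertools import product


def powerset(a):
    """All subsets of the finite set a, each represented as a frozenset."""
    items = list(a)
    return {
        frozenset(x for x, keep in zip(items, mask) if keep)
        for mask in product([False, True], repeat=len(items))
    }


def diagonal(a, g):
    """Cantor's set D = {x in a : x not in g(x)}, for a map g from a to P(a)."""
    return frozenset(x for x in a if x not in g[x])


def every_map_misses_its_diagonal(a):
    """True if, for every map g from a to P(a), D is not a value of g."""
    items = list(a)
    subsets = list(powerset(a))
    for values in product(subsets, repeat=len(items)):
        g = dict(zip(items, values))  # one of the |P(a)|^|a| possible maps
        if diagonal(a, g) in g.values():
            return False
    return True
```

Calling every_map_misses_its_diagonal({0, 1, 2}) runs through all 512 maps from a three-element set to its power set; in the same run, len(powerset({0, 1, 2})) gives 8, in line with Thm. 6.7.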
Indeed, if A is the class of all sets, then it is easy to see that PA ⊆ A. Thus id_A is in fact a bijection from A to a class - A itself - that includes PA. Taking id_A as the g in Cantor's proof, the D of that proof becomes Russell's paradoxical class of all sets that do not belong to themselves.

4 Ordinals

§ 1. Intuitive discussion and preview

The introduction of the set-theoretical cardinals was motivated by the wish to generalize the natural numbers in their capacity as cardinal numbers, answering the question 'how many?'. But the natural numbers are also used, in arithmetic as well as in ordinary life, in other capacities. In my local bank branch there is a number dispenser: on entering the branch, each customer collects from the dispenser a piece of paper showing a number. This number is not (at least, not directly) an answer to a 'how many?' question, but an ordinal number, fixing the place of the customer in the queue.

A finite set can always be arranged as a queue - and if we ignore the identity of the elements being ordered, this can be done in just one way. For example, the first three customers in the bank, arranged according to the numbers assigned to them by the dispenser, always form the following pattern:

    • < • < •

We can use the number three as an ordinal number, to describe this general abstract pattern, the order type of three objects arranged in a queue. Note that three is also the number to be assigned to the next customer, who is about to join the queue. This is quite general: the ordinal number assigned to each customer is the order-type (the queue pattern) of the queue of all preceding customers.

Cantor wished to extend this idea of finite queues and finite ordinal numbers into the transfinite. Imagine that all the old (finite) ordinal numbers have been dispensed. We have now got an infinite queue forming the pattern

    • < • < • < • < ...        (*)
    0   1   2   3

We need a new ordinal to describe the order type of this infinite queue. Cantor denoted this new ordinal by 'ω'.
We can assign this ordinal to the next 'customer' and extend the queue by placing that customer behind all the finite-numbered ones:

    • < • < • < • < ... < •
    0   1   2   3         ω

The new order type just formed is described by the next ordinal, which Cantor denoted by 'ω + 1'. We can continue in this way, getting not only ω + n for every natural n but also ω + ω, then ω + ω + 1 and so on and on and on.

Examining the 'queues' formed in this way, Cantor saw that they are not merely totally ordered, but have a special property not shared by all totally ordered sets: every non-empty subset of the queue has a least (first) member. Cantor called such queues well-ordered. An example of a total order that is not a well-ordering is provided by the integers, ordered according to magnitude:

    ... < -3 < -2 < -1 < 0 < 1 < 2 < 3 < ....

Note that the fact that the pattern (*), described by the ordinal ω, is well-ordered is just the Least Number Principle, a form of the Principle of Mathematical Induction (see § 4 of Ch. 0).

Cantor introduced the ordinals as a new and separate sort of abstract entity, just as he did with cardinals. However, in 1923 John von Neumann pointed out that among all well-ordered sets having a given Cantorian ordinal as their order-type there is a particular one with some very special properties. In the spirit of reductionism, this particular set can then be taken to be the ordinal of that order type. We shall present von Neumann's theory of ordinals as streamlined by Raphael M. Robinson and others.

§ 2. Definition and basic properties

2.1. Definition Let < be a [sharp] partial order on a class A and let B ⊆ A. If b ∈ B and b < x for every other x ∈ B, we say that b is least in B with respect to <.

2.2. Remarks
(i) Instead of demanding that b < x for every other x ∈ B, we may equivalently demand that b ≤ x for every x ∈ B. Here ≤ is of course […] y ⊆ A.

2.11.
Remarks
(i) Note that every member of a transitive class must be a set rather than an individual, because by Def. 1.3.4 y ⊆ A holds only if y is a class. So a class A is transitive iff: (1) all its members are sets and (2) ⋃A ⊆ A; that is, for all x and y, x ∈ y ∈ A ⇒ x ∈ A.
(ii) Unfortunately, 'transitivity' is used with two meanings: the present one and that applicable to binary relations (as, for example, in Def. 2.3.2). In practice no confusion will arise, as the context will indicate which meaning is intended.

2.12. Definition An ordinal is a transitive and ∈-well-ordered set. The class of all ordinals is denoted by 'W'.

2.13. Examples The empty set ∅ is, vacuously, an ordinal. It is also easy to verify that {∅} and {∅, {∅}} are ordinals.

2.14. Convention We shall use lower-case Greek letters - mainly 'α', 'β', 'γ', 'λ', 'ξ' and 'η' - as variables ranging over the ordinals.

2.15. Theorem All members of an ordinal are ordinals; thus, if α is an ordinal, α = {ξ : ξ ∈ α}.

PROOF Let y ∈ α. Since α is transitive, we have y ⊆ α. Since α is an ∈-well-ordered set, it follows from Prob. 2.8(iv) that its subset y is also ∈-well-ordered. It remains to show that y is transitive. So let u ∈ x ∈ y. Using the fact that α is a transitive set, we have x ∈ α and then in turn also u ∈ α. Hence u and x, as well as y, are members of α; so by the transitivity of the relation ∈_α we infer from u ∈ x ∈ y that u ∈ y. ■

2.16. Lemma If y is any transitive subset of an ordinal α then y itself is an ordinal; moreover, y = α or y ∈ α.

PROOF That y is an ordinal follows at once from Prob. 2.8(iv). Moreover, let u = α − y. If u = ∅ then y = α. If u is non-empty, then it has a (unique) least member x w.r.t. ∈_α. We shall show that y = x.

First, let z ∈ x. Since x ∈ α and α is transitive, it follows that z ∈ α. But z cannot be in u, because z ∈ x, and x is the least member of u; thus z must be in y. This proves that x ⊆ y.

Conversely, let z ∈ y.
Then z = x is impossible because x ∉ y. Also, x ∈ z is impossible because, by the transitivity of y, it would imply x ∈ y. Hence by Lemma 2.4 we must have z ∈ x. This proves that y ⊆ x. Thus y = x ∈ α. ■

2.17. Theorem The class W of all ordinals is transitive and ∈-well-ordered.

PROOF The transitivity of W follows at once from Thm. 2.15. To prove that W is ∈-well-ordered, we shall make use of Prob. 2.8(iii).

To verify that condition (1') of Prob. 2.8(iii) holds for W, let α and β be any ordinals. Since both α and β are transitive, it is easy to see that α ∩ β is also transitive. Thus by Lemma 2.16 α ∩ β is an ordinal, say γ; moreover, γ = α or γ ∈ α. Likewise, γ = β or γ ∈ β. But we cannot have both γ ∈ α and γ ∈ β because then γ ∈ α ∩ β - that is, γ ∈ γ; and this would violate the anti-symmetry of the well-ordering relation ∈_γ on γ. Therefore γ = α or γ = β. Hence α = β or α ∈ β or β ∈ α, which proves condition (1') for W.

Now let u be any non-empty set of ordinals. We must prove that there exists an ordinal ξ ∈ u such that ξ ∩ u = ∅. Take any α ∈ u. If α ∩ u = ∅, we are through. On the other hand, suppose α ∩ u ≠ ∅. Since α is ∈-well-ordered, there must exist some member ξ of α ∩ u such that ξ ∩ α ∩ u = ∅. But ξ ∈ α and α is transitive; so ξ ⊆ α. Hence ξ ∩ u = ξ ∩ α ∩ u = ∅. ■

2.18. Corollary W is a proper class (that is, not a set).

PROOF If W were a set, then by Def. 2.12 and Thm. 2.17 it would be an ordinal, hence W ∈ W, in violation of the anti-symmetry of the well-ordering relation ∈_W. ■

2.19. Remarks
(i) The (naive) assumption that W is a set led to a contradiction. This was the Burali-Forti Paradox (see § 2 of Ch. 1). Cor. 2.18 is a 'tame' version, within ZF, of the paradox. Similarly, Thm. 1.3.10 is a 'tame' ZF version of Russell's Paradox.
(ii) In the proofs of Thm. 2.17 and Cor.
2.18 we used the argument that an ordinal γ cannot be a member of itself because this would violate the anti-symmetry of the well-ordering relation ∈_γ on γ. In mathematical practice it is often convenient to posit a further postulate - the Axiom of Foundation (or Regularity), first proposed by Dimitry Mirimanoff in 1917 - one of whose effects is to exclude any set that belongs to itself. On the other hand, in some special applications of set theory - notably in so-called situation semantics, developed by Jon Barwise and others, and in abstract computation theory - it is convenient to use an extension of ZF proposed by Peter Aczel, which negates the Axiom of Foundation and admits some sets that belong to themselves. In the present course we do not commit ourselves either way.

2.20. Corollary Any class of ordinals is ∈-well-ordered.

PROOF Immediate from Thm. 2.17 and Prob. 2.8(iv). ■

2.21. Definition The ∈-well-ordering on W shall be denoted by '<'. Thus for any ordinals α and β,

    α < β ⇔ α ∈ β.

2.22. Remarks
(i) As usual, we denote by '≤' the blunt version of <. Thus α ≤ β ⇔ α ∈ β or α = β.
(ii) Thm. 2.15 can now be read as saying that if α is any ordinal then α = {ξ : ξ < α}.
(iii) From now on, whenever we use order-related terminology in connection with ordinals, we shall take it for granted that the order relation referred to is the ∈-well-ordering, unless otherwise stated.

2.23. Definition Let < be a partial order on a class A and let B ⊆ A.
(i) If u ∈ A and x ≤ u for all x ∈ B, then u is said to be an upper bound of (or for) B with respect to <.
(ii) If u is the least member of the class of upper bounds for B w.r.t. < - that is, if u is an upper bound for B w.r.t. < and if u < v whenever v is any other upper bound for B w.r.t. < - then u is said to be the least upper bound (abbreviated 'lub') for B w.r.t. <.

2.24. Remarks
(i) The phrase 'with respect to <' is omitted when there is no danger of confusion.
(ii) A subclass B of A need not in general have any upper bound, let alone a lub; but if it has a lub, it is unique.

2.25. Theorem If A is a set of ordinals then its union-set ⋃A is an ordinal. Moreover, ⋃A is the lub of A.

PROOF To show that ⋃A is transitive, assume that x ∈ y ∈ ⋃A. Then for some ordinal α we have x ∈ y ∈ α ∈ A. Since α is transitive, it follows that x ∈ α ∈ A; hence x ∈ ⋃A. By Thm. 2.15, all the members of ⋃A are ordinals; so by Cor. 2.20 ⋃A is ∈-well-ordered. Thus ⋃A is an ordinal.

If α ∈ A then α ⊆ ⋃A, since ⋃A is a transitive set. Therefore by Lemma 2.16 α ≤ ⋃A. This means that ⋃A is an upper bound for A.

Finally, if β is any upper bound for A, then for each α ∈ A we have α ≤ β - that is, α ∈ β or α = β. By the transitivity of the set β it follows that in either case α ⊆ β. Since this holds for each α ∈ A, it follows that also ⋃A ⊆ β. By Lemma 2.16 we now have ⋃A ≤ β - which proves that ⋃A is the least upper bound for A. ■

2.26. Definition For any ordinal α we put

    α' =df α ∪ {α}.

We call α' the immediate successor of α. (This terminology is justified by the following theorem.)

2.27. Theorem For any α, α' is an ordinal. Moreover, for any β, β ≤ α iff β < α' (equivalently: α < β iff α' ≤ β). Hence α < β iff α' < β'.

PROOF Easy - DIY. ■

2.28. Definition
(i) An ordinal of the form α' is called a successor ordinal.
(ii) An ordinal that is neither ∅ nor a successor ordinal is called a limit ordinal.

§ 3. The finite ordinals

3.1. Definition An ordinal α is said to be finite if no ordinal ξ ≤ α is a limit ordinal. Otherwise, α is said to be an infinite ordinal. We put

    ω =df {α : α is a finite ordinal}.

3.2. Theorem ω is transitive.

PROOF Let α be a finite ordinal. We must show that every member of α is also a finite ordinal. This is easily done - DIY, using Rem. 2.22(ii). ■

3.3. Theorem
(i) ∅ is a finite ordinal.
(ii) If α is a finite ordinal then so is α'.

PROOF
(i) We know that ∅ is an ordinal (Ex.
2.13). But by Def. 2.28(ii) ∅ is not a limit ordinal. Since ∅ has no members, the only ξ such that ξ ≤ ∅ is ∅ itself. Hence ∅ is a finite ordinal.
(ii) Let α be a finite ordinal and let ξ ≤ α'. We must show that ξ is not a limit ordinal. Now, α' itself is a successor ordinal, hence not a limit ordinal. It remains to consider the case where ξ < α'. By Thm. 2.27 this means that ξ ≤ α. Since α is a finite ordinal, ξ is not a limit ordinal. ■

3.4. Theorem ω is a set.

PROOF Using the Axiom of Infinity (Ax. 1.3.21), take a set Z such that ∅ ∈ Z and such that whenever x ∈ Z, then also x ∪ {x} ∈ Z. Thus if an ordinal α belongs to Z then (by Def. 2.26) so does α'.

Consider the class ω − Z, the class of all finite ordinals not belonging to Z. If this class is non-empty, then by Thm. 2.9 it must have a least member, say β. Now, β cannot be ∅, because ∅ does belong to Z. Also, β, being a finite ordinal, cannot be a limit ordinal. So it must be a successor ordinal, say β = α' = α ∪ {α}. But in this case α itself is a finite ordinal (by Thm. 3.2), such that α < β. Since β was supposed to be the least finite ordinal not belonging to Z, it follows that α ∈ Z. Therefore by the assumption on Z also α' ∈ Z. But this is impossible, because α' = β, which is the least finite ordinal not belonging to Z.

So ω − Z must be empty. Thus ω ⊆ Z; hence ω is a set by AS. ■

3.5. Corollary ω is the unique set X having the following three properties:
(i) ∅ ∈ X;
(ii) whenever α ∈ X then also α' ∈ X;
(iii) X ⊆ Z for any set Z such that ∅ ∈ Z and such that whenever α ∈ Z then also α' ∈ Z.

PROOF Thm. 3.3 says that ω has properties (i) and (ii). The proof of Thm. 3.4 shows that ω has also property (iii). The uniqueness of ω follows by PX, because if X is any set having the three properties then both ω ⊆ X and X ⊆ ω. ■

3.6. Remarks
(i) Our first use of AI was to prove that ω is a set. Conversely, if we postulate that ω is a set, then by Thm.
3.3 ω is a set satisfying the conditions that AI lays down for Z. This shows that (in the presence of the other postulates) AI is equivalent to the proposition that ω is a set, which is a special case of the Comprehension Principle.
(ii) In fact, it now transpires (Cor. 3.5) that ω is simply the smallest set satisfying the conditions of AI.

We restate the fact that ω satisfies condition (iii) of Cor. 3.5 as a principle in its own right:

3.7. Corollary (Weak Principle of Induction on Finite Ordinals) Let Z be any set such that ∅ ∈ Z and such that whenever α ∈ Z then also α' ∈ Z. Then ω ⊆ Z. ■

3.8. Remarks
(i) We see that the set ω of finite ordinals, with its ∈-well-ordering, simulates, within the confines of ZF set theory, the behaviour that characterizes the system of natural numbers. We can take ∅ as the counterpart of the number 0 and the ∈-well-ordering on ω as the counterpart of the usual ordering of the natural numbers. Just as each natural number n has an immediate successor, n + 1, so every finite ordinal α has an immediate successor, α'. Moreover, the basic facts about the ordering of the natural numbers (Facts 0.1.1-0.1.5) are mimicked by theorems about the finite ordinals and their ∈-well-ordering. And, most importantly, the Principle of Mathematical Induction is mimicked by the Principle of Induction on Finite Ordinals. Certainly, within ZF ω impersonates, plays the role of, 'the set of natural numbers'. In fact, Cor. 3.5 reproduces within ZF Richard Dedekind's famous characterization of the natural numbers.¹
(ii) The obvious reductionist step at this point is to identify the ZF-set ω of finite ordinals as the 'true' (hitherto intuitive) set N of natural numbers.
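The way in which finite ordinals mimic the natural numbers (Rem. 3.8(i)) can be made concrete. The following Python sketch is an illustration only: it assumes that hereditarily finite sets are modelled by nested frozensets, and the function names are introduced here for the purpose. The empty set plays the role of 0, and α' = α ∪ {α} (Def. 2.26) plays the role of n + 1.

```python
def successor(a):
    """Immediate successor a' = a ∪ {a} (Def. 2.26)."""
    return a | frozenset([a])


def finite_ordinal(n):
    """The finite ordinal that plays the role of the natural number n."""
    a = frozenset()  # the empty set plays the role of 0
    for _ in range(n):
        a = successor(a)
    return a


def is_transitive(a):
    """True if every member of a is also a subset of a."""
    return all(x <= a for x in a)


def less_than(a, b):
    """The ∈-well-ordering on ordinals: a < b iff a ∈ b (Def. 2.21)."""
    return a in b
```

With these definitions, finite_ordinal(n) has exactly n members, namely the ordinals finite_ordinal(0), ..., finite_ordinal(n - 1), so that less_than agrees with the usual order of the corresponding natural numbers.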
This would be a grand reduction indeed, because work done during the 19th century by several mathematicians (including Hamilton, Bolzano, Weierstrass, Dedekind and Cantor) showed that all the concepts of mathematical analysis could be reduced to those of natural number, set and membership (plus concepts such as relation and function that we have by now reduced to set-theoretic concepts). Thus a huge part, if not the whole, of mathematics would be reduced to set theory. Many (perhaps most) mathematicians, under the influence of the dominant structuralist ideology, do proceed in this way, and frame (or think of) their mathematical discourse as taking place within set theory.

¹ Was sind und was sollen die Zahlen?, 1888. (English translation in Essays on the theory of numbers, edited by W. W. Beman, 1901.)

3.9. Warning This reduction, although extremely successful in a formal sense, is by no means unproblematic, as Skolem pointed out in 1922, when he published his famous paradox. (We shall discuss Skolem's Paradox in the Appendix.)

3.10. Theorem ω is the least infinite ordinal and the least limit ordinal.

PROOF That ω is an ordinal follows at once from Cor. 2.20 and Thms. 3.2 and 3.4. Also, ω cannot be a finite ordinal, because that would mean that ω ∈ ω - which is impossible for an ordinal. Thus ω must be an infinite ordinal. On the other hand, if ξ < ω - that is, ξ ∈ ω - then by Def. 3.1 ξ is a finite ordinal; hence ω must be the least infinite ordinal.

If ξ ∈ ω then, as we have just seen, ξ is a finite ordinal, hence, a fortiori, not a limit ordinal. If ω itself were not a limit ordinal then by Def. 3.1 it would follow that ω is a finite ordinal, contrary to what we have proved. Thus ω must be a limit ordinal. As we have just observed, no ordinal smaller than ω can be a limit ordinal. Hence ω is the least limit ordinal. ■

3.11. Preview We have yet to justify the adjectives finite and infinite introduced in Def. 3.1 in connection with ordinals.
Dedekind defined a set as infinite if there exists an injection from it to a proper subset of itself, and as finite if there is no such injection. We will not adopt Dedekind's definition, but we shall show that finite and infinite ordinals in the sense of Def. 3.1 are finite and infinite respectively in Dedekind's sense.

3.12. Theorem There does not exist an injection from a finite ordinal to a proper subset of itself.

PROOF We proceed by weak induction on finite ordinals (Cor. 3.7). The proof is a formal (or 'internalized') version of the proof of Thm. 3.3.4.

Let Z be the set of all finite ordinals α such that there is no injection from α to a proper subset of itself. In order to prove our theorem it is enough to show that ∅ ∈ Z and that if α ∈ Z then also α' ∈ Z.

That ∅ ∈ Z is obvious, since ∅ has no proper subsets.

Now assume, as induction hypothesis, that α ∈ Z and let f be an injection from α' - that is, from α ∪ {α} - to a subset B of itself. If B is a proper subset of α' then the set α' − B is non-empty. Without loss of generality we may assume that α belongs to α' − B rather than to B. (In the contrary case, where α ∈ B, take any member β of α' − B and let g be the bijection from α' to itself that interchanges β and α but leaves all other members of α' fixed: thus, gβ = α, gα = β and gξ = ξ for any ξ ∈ α' other than β and α. Then use g ∘ f instead of f itself: it is an injection from α' to its proper subset g[B] = (B − {α}) ∪ {β}.)

Our assumption that α ∈ α' − B means that B ⊆ α. Next, let γ = fα; then γ must belong to B, since f is a map to B. It now follows that f↾α is an injection from α to its proper subset B − {γ}. This contradicts the induction hypothesis. So B cannot be a proper subset of α'. ■

3.13. Theorem If α is an infinite ordinal then there is an injection from α to a proper subset of itself.

PROOF First, consider ω. Define a map f on ω (that is, with ω as its domain) by putting fξ = ξ' for every finite ordinal ξ.
Then f is injective. Indeed, if ξ and η are distinct, say ξ < η, then by Thm. 2.27 ξ' < η', hence ξ' and η' are also distinct. Also, f maps ω to (in fact, onto) its proper subset ω − {∅}.

Now let α be any infinite ordinal. By Thm. 3.10 we have ω ≤ α, which means that ω ∈ α or ω = α; and since α is a transitive set, it follows that ω ⊆ α. Then the map f ∪ id_(α−ω) (with f as before) is clearly an injection from α to its proper subset α − {∅}. ■

3.14. Theorem A finite ordinal is not equipollent to any other ordinal.

PROOF Let α be a finite ordinal and let β be another ordinal. First, suppose β is finite as well. We have α < β or β < α - that is, α ∈ β or β ∈ α - and since ordinals are transitive sets it follows that α ⊂ β or β ⊂ α; hence by Thm. 3.12 α and β cannot be equipollent.

Now suppose β is an infinite ordinal. By Thm. 3.13 there exists an injection, say g, from β to a proper subset of itself. If f were a bijection from α to β, then clearly f⁻¹ ∘ g ∘ f would be an injection from α to a proper subset of itself - which is impossible. ■

3.15. Definition A set is finite if it is equipollent to a finite ordinal (in the sense of Def. 3.1). Otherwise, it is infinite.

3.16. Remarks
(i) By virtue of Thm. 3.14, an ordinal is finite (or infinite) in the sense of Def. 3.1 iff it is finite (or infinite, respectively) in the sense of Def. 3.15; so there is no conflict between the two definitions.
(ii) By Thm. 3.14, a finite set is equipollent to a unique finite ordinal.

3.17. Problem
(i) Prove that there does not exist an injection from a finite set to a proper subset of itself. (Use Thm. 3.12.)
(ii) Prove that if A is a non-empty finite set of ordinals, then A has a greatest member - that is, an ordinal α ∈ A such that ξ ≤ α for each ξ ∈ A. (Otherwise, define a map f on A by taking, for each α ∈ A, fα as the least ξ ∈ A such that α < ξ. Show that f would be an injection from A to a proper subset of itself.)
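Thm. 3.12 and Prob. 3.17(i) can be spot-checked by brute force on small finite sets. The following Python sketch is a sanity check under stated assumptions, not a proof: the function name is introduced here for illustration, and finite sets are modelled by ordinary Python sets. It enumerates every map from a finite set to itself and collects those that are injective and whose range is a proper subset of the set.

```python
from itertools import product


def injections_into_proper_subsets(a):
    """List all injections from the finite set a to a proper subset of a."""
    items = list(a)
    found = []
    for values in product(items, repeat=len(items)):
        f = dict(zip(items, values))  # one map from a to a
        image = set(f.values())
        is_injective = len(image) == len(items)
        if is_injective and image < set(items):  # '<' is proper inclusion
            found.append(f)
    return found
```

For each of the sets set(range(0)) up to set(range(4)) the returned list is empty, as the theorem predicts. No such finite search can exhibit the contrasting behaviour of ω, where the successor map of Thm. 3.13 is an injection onto a proper subset.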
3.18. Problem Let n be a natural number. Show that for any objects a_1, a_2, ..., a_n, the set {a_1, a_2, ..., a_n} is finite. (Use weak mathematical induction on the number n.)

§ 4. Transfinite induction

Various forms of the Principle of Mathematical Induction have analogues that apply to ordinals. These analogues collectively are known as the Principle of Transfinite Induction.

First, by virtue of the fact that W is well-ordered, we have immediately by Thm. 2.9:

4.1. Theorem (Least Ordinal Principle) If X is a non-empty class of ordinals, then X has a least member. ■

Hence other forms of the Principle of Transfinite Induction can be deduced.

4.2. Theorem (Strong Principle of Transfinite Induction) If X is a class of ordinals such that for every ordinal ξ

    (*)    η ∈ X for every η < ξ ⇒ ξ ∈ X,

then X = W.

PROOF Let Y = W − X. If Y were non-empty, it would have a least member, say ξ. So for each η < ξ we would have η ∈ X. But then by (*) ξ ∈ X, which is impossible. Thus Y must be empty. ■

4.3. Remark By Rem. 2.22(ii) the antecedent, η ∈ X for every η < ξ, in condition (*) of Thm. 4.2 is equivalent to the statement that ξ ⊆ X.

4.4. Theorem (Weak Principle of Transfinite Induction) If X is a class of ordinals satisfying the following three conditions
(i) ∅ ∈ X,
(ii) for every ordinal ξ, ξ ∈ X ⇒ ξ' ∈ X,
(iii) for every limit ordinal λ, λ ⊆ X ⇒ λ ∈ X,
then X = W.

PROOF Assume X satisfies these three conditions. Then by (i) and (iii) X satisfies condition (*) of Thm. 4.2 for ∅ and for limit ordinals. Now suppose ξ' ⊆ X. By Def. 2.26 it follows that ξ ∈ X; hence by (ii) ξ' ∈ X. Thus X satisfies (*) also for successor ordinals. ■

4.5. Remarks
(i) These principles have restricted forms, in which X is assumed to be a subset of some (arbitrary) given ordinal α rather than a subclass of W. Thus, the form of Thm. 4.1 restricted to an arbitrary ordinal α says that a non-empty subset of α has a least member. The restricted form of Thm.
4.2 says that if X is a subset of α such that for all ξ < α we have ξ ⊆ X ⇒ ξ ∈ X, then X = α.
(ii) The Principle of Transfinite Induction restricted to the particular ordinal ω is precisely the Principle of Induction on Finite Ordinals.

4.6. Problem Prove the restricted form of Thm. 4.2. Formulate and prove a form of Thm. 4.4 restricted to an arbitrary ordinal.

§ 5. The Representation Theorem

5.1. Preview In this section we shall show that every well-ordered set is similar in its ordering to a unique ordinal.

5.2. Definition A partially ordered set (briefly, poset) is a pair ⟨A, <⟩, where A is a set and < is a [sharp] partial order on A. A totally ordered set is a poset ⟨A, <⟩, in which < is a total order on A. A well-ordered set is a poset ⟨A, <⟩, in which < is a well-ordering on A.

5.3. Remarks
(i) This is just a convenient way of packaging a set A together with a particular partial order on A into a single object. It saves us having to keep saying 'such-and-such a set with such-and-such a partial order on it'.
(ii) However, we shall often refer, somewhat inaccurately, to A itself as the poset (or ordered set, or well-ordered set) when, strictly speaking, we have in mind the pair ⟨A, <⟩. We shall only commit this peccadillo when it is clear from the context which relation < is involved. Thus, we refer to an ordinal α as a well-ordered set, when strictly speaking we mean the pair ⟨α, <⟩, where < is ∈_α, the ∈-well-ordering on α.

5.4. Definition A similarity map (a.k.a. isomorphism) from a poset ⟨A, <⟩ to a poset ⟨A', <'⟩ is a bijection f from A to A' such that, for all x and y in A,

    x < y ⇔ fx <' fy.

If such a map exists, ⟨A, <⟩ is said to be similar (or isomorphic) to ⟨A', <'⟩.

5.5. Remark It is easy to see that the identity map id_A is a similarity map from ⟨A, <⟩ to itself. Also, if f is a similarity map from ⟨A, <⟩ to ⟨A', <'⟩ then its inverse f⁻¹ is a similarity map from ⟨A', <'⟩ to ⟨A, <⟩.
Finally, if f is a similarity map from ⟨A, <⟩ to ⟨A', <'⟩ and g is a similarity map from ⟨A', <'⟩ to ⟨A'', <''⟩ then the composition g ∘ f is a similarity map from ⟨A, <⟩ to ⟨A'', <''⟩. It follows that similarity is an equivalence relation on the class of posets.

5.6. Theorem If f is a similarity map from an ordinal α to an ordinal β then f is the identity map id_α, hence α = β.

PROOF First, we prove by strong transfinite induction (restricted to α) that ξ ≤ fξ for every ξ ∈ α.

Let ξ ∈ α. By the induction hypothesis, if η < ξ then η ≤ fη. But if η < ξ then also fη < fξ, since f is a similarity map. Thus for every η < ξ we have η < fξ. In particular, η ≠ fξ for every η < ξ; in other words, fξ < ξ is impossible. This proves that ξ ≤ fξ and completes the induction.

Now, f⁻¹ is a similarity map from β to α; therefore by the same token we have also ζ ≤ f⁻¹ζ for all ζ ∈ β. Taking ζ to be fξ, where ξ ∈ α, we obtain fξ ≤ f⁻¹fξ = ξ.

Thus fξ ≤ ξ as well as ξ ≤ fξ, which shows that f must be the identity id_α. ■

5.7. Corollary For any poset ⟨A, <⟩, there exists at most one similarity map from ⟨A, <⟩ to an ordinal.

PROOF If f and g are isomorphisms from ⟨A, <⟩ to α and β respectively, then the composition g ∘ f⁻¹ is clearly an isomorphism from α to β. Therefore α = β and g ∘ f⁻¹ is the identity mapping, which means that f = g. ■

5.8. Preliminaries
(i) For the rest of this section, we consider a fixed but otherwise arbitrary well-ordered set ⟨A, <⟩.
(ii) If B ⊆ A, then B is clearly well-ordered by the relation < ∩ B², that is:

    {⟨x, y⟩ : x ∈ B, and y ∈ B, and x < y},

which is called the restriction of < to B. Whenever we refer to a subset B of A as well-ordered, we shall mean B with this well-ordering, inherited by B from A.
(iii) For each a ∈ A, the segment of A determined by a is the set

    A_a =df {x ∈ A : x < a}.
(iv) We define a class F as follows:

    F =df {⟨x, ξ⟩ : x ∈ A, and ξ is an ordinal, and A_x is similar to ξ}.

By Cor. 5.7, F is a function (see Def. 2.2.1). We may therefore use functional notation in connection with F. Thus 'Fx = ξ' means the same as '⟨x, ξ⟩ ∈ F'. Clearly, dom F is a subset of A. By AS dom F is a set; hence by AR ran F is a set as well. Note that all the members of ran F are ordinals.

5.9. Lemma Let Fa = α. Then for any ordinal β < α there exists some b < a such that Fb = β. Conversely, if b < a then b belongs to dom F and Fb is some ordinal β < α.

PROOF Let f be the similarity map from A_a to α. Suppose β […] mc(F, β) for all β ≤ α.

6.5. Lemma If both mc(F, α) and mc(G, α) then Fξ = Gξ for all ξ ≤ α.

PROOF By (strong) transfinite induction, restricted to α'. Let ξ be any ordinal ≤ α (that is, ξ < […] Y ∈ 𝒜 for every finite Y ⊆ X.

We shall use the WOT to prove the following useful result.

2.8. Theorem (Tukey-Teichmüller Lemma) If 𝒜 is a set of finite character, then for every A ∈ 𝒜 there exists an M ∈ 𝒜 such that A ⊆ M and M is maximal in 𝒜 w.r.t. ⊂_𝒜.

PROOF By the WOT, 𝒜 is equipollent to some ordinal α. Let G be a bijection from α to 𝒜. Thus

    𝒜 = {Gξ : ξ < α}.

Take any A ∈ 𝒜; we shall hold A fixed for the rest of the proof. Without loss of generality, we may assume that A = G∅ - otherwise, we could compose G with the bijection from 𝒜 to itself that interchanges A with G∅ and leaves all other members of 𝒜 alone.

Using transfinite recursion restricted to α (see Rem. 4.6.S(ii)), we define a map F on α such that, for every ξ < α,

    Fξ = Gξ                  if ⋃{Fη : η < ξ} ⊆ Gξ,
    Fξ = ⋃{Fη : η < ξ}       otherwise.

(Note that {Fη : η < ξ} = ran(F↾ξ), so that here Fξ is indeed being determined in terms of F↾ξ, as required in transfinite recursion.)

It is clear that F is monotone in the sense that whenever η ≤ ξ < α then Fη ⊆ Fξ.

We claim that Fξ ∈ 𝒜 for every ξ < α. We shall prove this claim by strong transfinite induction restricted to α.
Let ; < a; our induction hypothesis is that F17 Ed for every 'fJ < l;. Now, F; is G; or U{F17: 'fJ < ;}. Since certainly Gl; Ed, we need only prove that the union U{F17 : 17 < l;} belongs to d. But di, is a set of finite character. So it is enough to show that every finite subset of U{F17: 1J < l;} belongs to d. We need only deal with non-empty subsets, since 0 is a finite subset of A, and as such must in any case belong to ell. Let B be a non-empty finite subset of U{F'fJ : 17 < l;}. Then for each b E B there exists some '1J < l; such that b e F1]. Define a map f from 84 5. The Axiom of Choice B to ; by putting, for each b e B, fb =dr the least 'f/ <; such that b e FrJ. By Lemma 2.2, ran/ is a finite non-empty set of ordinals < ;. Hence by Prob. 4.3.17(ii) ran/ has a greatest member, say 'f/*. This means that for every be B we have fb ~ rJ*; and, since F is monotone, it follows that F(fb) ~ F(r,*). But by the definition of f we have be F(fb); hence be F(fb) ~ F(r,*) for every be B. Thus B !;;: F(TJ*). But 'f/* < ;, so by our induction hypothesis F(rJ*) belongs to o1.; and since o1. is of finite character B, as a finite subset of F(rJ*), must also belong to o1.. This completes the proof that F; e o1. for every ; ~ a. We now put M = U{FrJ: 'f/ < a}. We shall show that M has the properties claimed by our theorem. The fact that M E o1. is proved by showing, exactly as before, that every finite subset of M belongs to o1.. Also, it is easy to see that F0 = G0 = A, hence A !;;: M. It remains to show that M is maximal w.r.t. Cot. Suppose this were not so. Then there would be some XE o1. such that MC X. Now, X must be G; for some ; < a, so the assumption Mc X means that U{FrJ: 'f/ < a} CG;. Hence, a fortiori, But in this case the definition of F says that F; = G;. It would then follow that U{F'Y/: 'f/ < a} C F;-which is impossible. ■ 2.9. Definition Let (A, <) be a poset. A chain in (A, <) is any subset C of A such that, for all x and yin C, x < y or x = y or y < x. 2.10. 
Remark In other words, a chain in (A, <) is a subset of A that is totally ordered by the restriction of < to it.

We shall use the Tukey-Teichmüller (TT) Lemma to prove:

2.11. Theorem (Hausdorff Maximality Principle) Let (A, <) be a poset and let 𝒞 be the set of all chains in (A, <). Then every member of 𝒞 is included in some member of 𝒞 that is maximal w.r.t. ⊆_𝒞.

PROOF The condition for C being a chain in (A, <) (see Def. 2.9) involves only two members of C at a time. Hence it is easy to see that the set 𝒞 of all chains is of finite character. Therefore the TT Lemma applies to [...] WOT ⇒ TT Lemma ⇒ HMP ⇒ Zorn's Lemma.

Now we shall complete the cycle:

2.13. Theorem AC follows from Zorn's Lemma.

PROOF Let 𝒮 be a set of non-empty sets. We must show that there exists a choice function on 𝒮. If 𝒮 is empty then ∅ is the required choice function. So from now on we may assume that 𝒮 is non-empty.

Let us say that f is a partial choice function (pcf), if f is a choice function on a subset of 𝒮. Such creatures do exist: for example, if A is any member of 𝒮 and a is any member of A then {(A, a)} is a choice function on {A} and hence a pcf.

Let 𝒢 be the set of all pcfs. (It is easy to verify that 𝒢 is indeed a set; DIY.) As we have just seen, 𝒢 is non-empty. We now consider the poset (𝒢, ⊆_𝒢). Note that if f and g are pcfs, then f ⊆ g means that dom f ⊆ dom g and fX = gX for each X ∈ dom f.

We shall show that (𝒢, ⊆_𝒢) satisfies the condition of Zorn's Lemma. To this end, let us consider any chain 𝒞 in this poset. We claim that its union, ⋃𝒞, is an upper bound for 𝒞 in 𝒢. For any f ∈ 𝒞 we obviously have f ⊆ ⋃𝒞. So it only remains to show that ⋃𝒞 belongs to 𝒢; in other words, that ⋃𝒞 is a pcf.

Since every member of 𝒞, being a pcf, is a set of ordered pairs (X, x) such that x ∈ X ∈ 𝒮, it is clear that ⋃𝒞 likewise is a set of ordered pairs of this kind. It only remains to show that ⋃𝒞 is a function.

Now, if both f and g are members of 𝒞 then, since 𝒞 is a chain, we must have f ⊆ g or g ⊆ f. Therefore if X ∈ dom f ∩ dom g then fX = gX. Thus the coherence condition is fulfilled, showing that ⋃𝒞 is indeed a function (see Prob. 2.4.8).

We can now apply Zorn's Lemma to the poset (𝒢, ⊆_𝒢). Since 𝒢 is non-empty, it follows from the Lemma that there exists some g ∈ 𝒢 that is maximal w.r.t. ⊆_𝒢. Such g is a pcf, that is, a choice function on a subset of 𝒮. However, if dom g were not the whole of 𝒮, we could take any A ∈ 𝒮 − dom g and any a ∈ A, and put

    f = g ∪ {(A, a)}.

Then f would be a pcf such that g ⊂ f, contradicting the maximality of g. Therefore g must be a choice function on the whole of 𝒮. ■
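The two key steps of this proof, namely that the union of a chain of pcfs is again a pcf and that a pcf whose domain is not the whole family can be properly extended, can be tried out on a finite family, where of course no choice principle is needed. What follows is only a minimal sketch under stated assumptions: pcfs are modelled as Python dictionaries whose keys are frozensets of integers, and the names `is_pcf`, `is_chain`, `union_of_chain` and `extend` are hypothetical helpers introduced for illustration, not notation from the text.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumption for illustration: a pcf is a dict sending each set in its
# domain to one chosen member of that set.
Pcf = Dict[FrozenSet[int], int]


def is_pcf(f: Pcf, family: Iterable[FrozenSet[int]]) -> bool:
    """True if f is a choice function on a subset of the family."""
    fam = set(family)
    return all(X in fam and x in X for X, x in f.items())


def is_chain(pcfs: List[Pcf]) -> bool:
    """True if any two members are comparable under inclusion (as sets of pairs)."""
    def included(f: Pcf, g: Pcf) -> bool:
        return f.items() <= g.items()
    return all(included(f, g) or included(g, f) for f in pcfs for g in pcfs)


def union_of_chain(pcfs: List[Pcf]) -> Pcf:
    """Union of the given pcfs; raises if the coherence condition fails."""
    union: Pcf = {}
    for f in pcfs:
        for X, x in f.items():
            if X in union and union[X] != x:
                raise ValueError("coherence condition fails: union is not a function")
            union[X] = x
    return union


def extend(g: Pcf, family: Iterable[FrozenSet[int]]) -> Pcf:
    """Return g with one extra pair (A, a) if dom g is not the whole family, else g."""
    for A in family:
        if A not in g:
            a = min(A)  # any member of A would do; min only makes the choice deterministic
            return {**g, A: a}
    return g


# A small family of non-empty sets and a chain of pcfs on it.
family = [frozenset({1, 2}), frozenset({2, 3}), frozenset({5})]
chain = [{}, {family[0]: 1}, {family[0]: 1, family[1]: 3}]
g = union_of_chain(chain)   # an upper bound for the chain
total = extend(g, family)   # g was not maximal: its domain missed {5}
```

In this finite setting, applying `extend` repeatedly reaches a pcf that `extend` leaves unchanged, and such a pcf is defined on the whole family. In the general case no such step-by-step process is available, and it is Zorn's Lemma that supplies the maximal pcf.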