Set theory

From Citizendium
Jump to navigation Jump to search
This article has a Citable Version.
Main Article
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
This editable Main Article has an approved citable version (see its Citable Version subpage). While we have done conscientious work, we cannot guarantee that this Main Article, or its citable version, is wholly free of mistakes. By helping to improve this editable Main Article, you will help the process of generating a new, improved citable version.

A set, in mathematics, is a collection of distinct entities, called its elements, considered as a whole. The early study of sets led to a family of paradoxes and apparent contradictions. It therefore became necessary to abandon "naïve" conceptions of sets, and a precise definition that avoids the paradoxes turns out to be a tricky matter. However, some unproblematic examples from naïve set theory will make the concept clearer. These examples will be used throughout this article:

  • A = the set of the numbers 1, 2 and 3.
  • B = the set of primary light colours—red, green and blue.
  • C = the empty set (the set with no elements).
  • D = the set of all books in the British Library.
  • E = the set of all positive integers, 1, 2, 3, 4, and so on.

Note that the last of these sets is infinite.

A set is the collection of its elements considered as a single, abstract entity. Note that this is different from the elements themselves, and may have different properties. For example, the elements of D are flammable (they are books), but D itself is not flammable, since abstract objects cannot be burnt.

History of set theory

Set theory grew out of the need for a better understanding of the concept of infinity. Georg Cantor (1845–1918), "the father of set theory", used it to demonstrate for the first time that two infinite objects could have different sizes. The proof today seems straightforward, but how great a leap it represented is made evident by the resistance and hostility it met from some of the greatest mathematicians of the day.

Set theory began by reasoning about collections of objects and operations on those collections, the operations of what would now be called "naïve set theory". It occupied a position on the boundary of mathematics and logic, partly because of an apparent, though ultimately illusory, duality between sets and predicates. However, it soon ran into serious paradoxes. These were banished only by axiomatising the theory - expressing the basis of reasoning about sets in the form of formal axioms. The axioms giving rise to the paradoxical behaviour could then be identified; they were eliminated and replaced with something less powerful, but more complicated and certainly less intuitive. Different versions of set theory achieved this in many different ways.

The episode represented something of a crisis for mathematics, since it appeared for a while that sound mathematical reasoning could lead to contradictory conclusions. The security of the foundations of the new set theory that arose in response was used as a bulwark to secure those of the rest of mathematics. Almost all other mathematical objects can be shown to be modelled by set-theoretical constructs. Since it seems reasonable to hope that no further paradoxes lurk in the axioms of modern set theory, other mathematical disciplines are thus protected from logical contradictions. Thus, though the set/predicate duality could not be entirely sustained, set theory remained firmly entwined with mathematical logic.

Naïve set theory


By convention, a set can be written by listing its elements, separated by commas, between {braces}. Using the sets defined above:

  • A = {1, 2, 3}
  • B = {red, green, blue}
  • C = {}

This is impractical for large sets (D), and impossible for infinite ones (E). Thus a set can also be described by naming a particular property that is shared by all its elements and only by them. A common notation uses a bar (|) to separate a variable name (e.g. "x") from a property of the variable that elements of the set must have. For example:

  • D = {x | x is a book and x is in the British Library}
  • E = {x | x is a positive integer}

A simple translation of this notation is that "D is the set of all x, where x is a book and x is in the British Library" or "E is the set of all x, where x is a positive integer".

Definition of a set

A set is defined completely by its elements. Formally, sets X and Y are the same set if they have the same elements; that is, if every element of X is also an element of Y, and vice versa. For example, suppose we define:

  • F = {x | (x is an integer) and (0 < x < 4)}

Then F=A. The definition also makes it clear that {1,2,3} = (3,2,1} = {1,1,2,3,2}. Note also that if X and Y are both empty, then they are equal, justifying referring to "the" empty set.

The equivalence of empty sets has a metaphysical consequence for some theories of the metaphysics of properties that define the property of being x as simply the set of all x, then if the two properties are uninstantiated or coextensive they are equivalent - under this theory, because there are no unicorns and there are no pixies, the property of being a unicorn and being a pixie are the same - but if there were a unicorns and pixies, we could tell them apart. (See Universals for more on this.)

Elements and subsets

The ∈ sign indicates set membership. If x is an element (or "member") of a set X, we write x∈X; e.g. 3∈A. (We may also say "X contains x".)

A very important notion is that of a subset. X is a subset of Y, written X⊆Y (often simply as X⊂Y), if every element of X is also an element of Y. E.g. C⊆A⊆E.

Sets can of course be elements of other sets; for example we can form the set G = {A,B,C,D,E}, whose five elements are the sets we considered earlier. Then, for instance, A∈G. (Note that this is very different from saying A⊆G!)

The tilde (~) is as usual used for negation; e.g. ~(A⊆B).

Set operations

Suppose X and Y are sets. Various operations allow us to build new sets from them.


The union of X and Y, written X∪Y, contains all the elements in X and all those in Y. Thus A∪B = {1, 2, 3, red, green, blue}. As A is a subset of E, the set A∪E is just E.


The intersection of X and Y, written X∩Y, contains all the elements that are common to both X and Y. Thus {1,2,3,red,green,blue} ∩ {2,4,6,8,10} = {2}.

Set difference

The difference X minus Y, written X−Y or X\Y, contains all those elements in X that are not also in Y. For example, E−A contains all integers greater than 3. A−B is just A; red, green and blue were not elements of A, so no difference is made by excluding them.

Complement and universal set

The universal set (if it exists), usually denoted U, is a set of which everything conceivable is a member. In pure set theory, normally sets are the only objects considered (unlike here, where we have also considered numbers, colours and books, for example); in this case U would be the set of all sets. (Non-set objects, where they are allowed, are called 'urelemente' and are discussed below.)

In the presence of a universal set we can define X′, the complement of X, to be U−X. X′ contains everything in the universe apart from the elements of X. We could alternatively have defined it as

X′ = {x | ~ (x∈X)}

and U as the complement of the empty set.


The cardinality |X| of X is, roughly speaking, its size. The empty set has size 0, while B = {red, green, blue} has size 3. This method of expressing cardinality works for finite sets but is not helpful for infinite ones. A more useful notion is that of two sets having the same size: if a direct one-to-one correspondence can be found between the elements of X and those of Y, they have the same size. Consider the sets, both infinite, of positive integers {1,2,3,4, ...} and of even positive integers {2,4,6,8, ...}. One is a subset of the other. Nevertheless they have the same cardinality, as is shown by the correspondence mapping n in the former set to 2n in the latter. We can express this by writing |X| = |Y|, or by saying that X and Y are "equinumerous".

Power set

The power set of X, P(X), is the set whose elements are all the subsets of X. Thus P(A) = {{}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}. The power set of the empty set P({}) = {{}}. Note that in both cases the cardinality of the power set is strictly greater than that of base set: No one-to-one correspondence exists between the set and its power set. Cantor proved that this in fact holds for any set (Cantor's Theorem). This is obvious for a finite set, but Cantor's ingenious proof made no reference to the set being finite; the theorem holds even for infinite sets. This was a powerful generalisation of his previous discovery that different sizes of infinity exist.


Comprehension is the set-formation operation that we have seen expressed in bar notation, which forms a set from a predicate (i.e. a property). For example, "x is purple" is a predicate, and the set of all purple things is the set of all x satisfying the predicate, namely

{x| x is purple}.

Comprehension appears to offer an exact correspondence between (extensional) predicates and sets: the predicate p corresponds to the set of x such that p(x) holds. Conversely, of course, a set X corresponds to a property, namely the property of being an element of X. The correspondence extends to link basic set-theoretic operations with logical ones: if X={x|p(x)} and Y={x|q(x)}, then X∩Y ={x|p(x) and q(x)}, X∪Y={x|p(x) or q(x)}, and X′={x| not p(x)}. Unfortunately full comprehension always leads to paradoxes and cannot be admitted as an axiom to a consistent system, so this elegant correspondence must sometimes fail.

The universal set U may be formed by comprehension over some predicate, such as x=x, which is always true: U={x| x=x}.

The paradoxes

The naïveté of "naïve" set theory lay in not being explicit about what set-formation operations were allowed. A set was seen, not as an entity in a formal axiomatic system, but as a collection of objects being considered as a whole. Any reasonable collection, it seemed, could make a claim to sethood, and in particular, any set formed by comprehension. However, two paradoxes dealt a crushing blow to this happy state of affairs.

Cantor's Paradox

We have met Cantor's Theorem, that the power set P(X) is strictly larger than X. If there is a universal set U (which, as we've seen, can be formed by comprehension), then P(U) must be larger than U. But this contradicts the assumption that U is a universal set, so P(U)⊆U. This contradiction is paradoxical if the existence of U is assumed.

Actually it turns out that it is possible to devise an apparently consistent set theory with a universal set. Such a theory must include some sets, including U, for which Cantor's Theorem does not hold. These sets however ("non-Cantorian" sets) must have rather peculiar and counter-intuitive properties, and standard versions of modern set theory exclude them.

Russell's Paradox

Russell's paradox, discovered by Bertrand Russell (1872-1970), dealt a far more lethal blow to the principle of set-formation by comprehension. Note that one can ask whether a set is an element of itself. The set of all sets would have this property, for example (it is a set); the set of all bananas would not (it is not a banana), nor would the empty set. That is, the latter two sets satisfy the opposite predicate, ~(x∈x). Russell observed that comprehension over this predicate is impossible: one cannot form

{x| ~(x∈x)}.

For it is easy to see that if this set existed (call it R, say), then one could prove both R∈R and ~(R∈R), a contradiction.

Unlike Cantor's paradox, Russell's is absolute; it shows that it is impossible to allow set-formation by comprehension without some restriction on what predicates can be used, no matter what other axioms are in place. In particular, the apparent correspondence between sets and predicates must fail somewhere.

It is worth noting that the discovery of these paradoxes, though they made sweeping changes to the eventual form of set theory, in no way invalidated Cantor's work on infinities, for which he had founded the subject.

Axiomatic set theory

Russell's Paradox is puzzling if we stick to the original intuition that a set is just a collection of objects sharing some property, considered together as an object of thought. Why can we not consider together all those sets that are not elements of themselves? Clearly we can. The problem arose only when we tried to consider R both as a set and as a potential element of other sets. The ability to form sets of sets is essential to set theory; but the correct interpretation of Russell's Paradox is that there are limits to what classes of objects can be collected together, while still considering the collection as being of the same kind as the objects. This insight led to the axiomatic approach to set theory. Axioms require us to be explicit about what set formation rules are allowed, as well as what operations we consider.

A note on axiom schemas

Note that Comprehension, if expressed in axiomatic form, would not be a single axiom. It would state that for a predicate p, one can form the set {x|p(x)}. But any such p is an expression in the language of set theory itself; at least if the language is "first-order", it does not have variables that range over its own expressions. Thus, a separate axiom would be needed for each predicate. Comprehension would be an "axiom schema", with infinitely many instances (axioms).

Type theory

Russell's own solution, propounded with Alfred North Whitehead, was type theory. In type theory, there is a collection of 'types'; there is a base type (type 0 say), and each type has a (distinct) successor type (types 1, 2, 3, ...). Each set is of a particular type, and can be an element only of sets of its successor type. Under these rules, a "set of all sets" is ruled out, as it would contain sets of different types. Comprehension over sets of a particular type is permitted, and yields a set of the successor type. Russell's Paradox cannot even be expressed, since one cannot write x∈x.

Type theory succeeded in avoiding the paradoxes, and is appealingly simple, but proved too weak for the practical needs of mathematics.

Zermelo-Fraenkel axioms

Zermelo's axioms for set theory weaken Comprehension to an axiom schema of Separation, which allows one to form {x∈S| p(x)} for a predicate p and set S. Since this requires a pre-existing set S, there must be some additional set-formation axioms to compensate for the weakening of Comprehension. These are the axiom of Infinity, stating that an infinite set exists, and the Powerset axiom, allowing the power set of a given set to be formed. Some explicit axioms must also be added to allow simple operations such as pair-sets (e.g. {x,y}) and unions. The technical axiom of Foundation ensures that sets exist only if they can be formed by the given set-formation operations.

To this, Fraenkel realised it was necessary to add the axiom schema of Replacement. If S is a set and F some operation ("function-class") on sets, this allows one to operate with F on all the elements of S, and collect the results into a new set. The resulting set theory, called ZF -- sometimes with the addition of the Axiom of Choice (ZFC); see below - has proved to be sufficient for the needs of much of mathematics, and is the set theory most usually assumed as a foundation by mathematicians outside set theory. Set theorists have studied it extensively, and found no reason to suspect that it hides lurking inconsistencies.

Within the Zermelo-Fraenkel (ZF) system one can speak informally of the "class" of sets satisfying a particular predicate. A class that is not a set — such as the Universe — is a "proper class". But classes are not explicitly integrated with the theory.

There are eight ZF axioms:[1]

  1. Axiom of extensionality: If X and Y have the same elements, then X=Y
  2. Axiom of pairing: For any a and b there exists a set {a, b} that contains exactly a and b
  3. Axiom schema of separation: If φ is a property with parameter p, then for any X and p there exists a set Y that contains all those elements uX that have the property φ; that is, the set Y={uX(u, p)}
  4. Axiom of union: For any set X there exists a set Y=∪X, the union of all elements of X
  5. Axiom of power set: For any X there exists a set Y=P(X), the set of all subsets of X
  6. Axiom of infinity: There exists an infinite set
  7. Axiom schema of replacement: If f is a function, then for any X there exists a set Y, denoted F(X) such that F(X)={f(x)|xX}
  8. Axiom of regularity: Every nonempty set has an ∈-minimal element

If to these is added the axiom of choice, the theory is designated as the ZFC theory:

9. Axiom of choice: Every family of nonempty sets has a choice function

A discussion of the interpretation of these axioms and their technical terms is found in Zermelo-Fraenkel axioms and in the cited reference.[1]

Axiom of choice

The axiom of choice (AC) says, loosely speaking, that given a (generally infinite) set of non-empty sets, it is possible to choose an element from each one in a uniform way. The axiom so stated is intuitively plausible, but some of its consequences (for example, the Banach-Tarski Paradox) are not. AC is known to be independent of ZF: that is, in addition to the axioms of ZF, one can assume AC to be either true or false without introducing any inconsistencies.

Other axiomatisations

There are many other sets of axioms for set theory. Here we present some of the most important variations.


NBG set theory was developed by von Neumann, Bernays and Gödel between about 1920 and 1940. Its objects explicitly include classes as well as sets; the difference is that classes are not allowed to be elements of other classes or sets. It is a "conservative extension" of ZFC: a statement about sets is provable in NBG if and only if it is provable in ZFC.

NBG's classes are formed by Comprehension: given a predicate, the collection of sets satisfying it is a class. For example, the Universe is a class, although of course it is not a set. An ingenious axiom of von Neumann's, Limitation of Size, specifies which classes are sets: a class cannot be a set precisely if it is equinumerous with the Universe. This single idea does the work of the Separation and Replacement axioms. It is interesting that, since NBG conservatively extends ZF, the latter also effectively defines sethood by limitation of size: its proper classes are those that are "too big" to be sets.

NBG has another appealing property: it can be finitely axiomatised. Although Comprehension is an axiom schema with infinitely many instances, it is provably equivalent in NBG to a finite (in fact, quite small) collection of its instances. Montague proved in 1961 that ZF cannot be finitely axiomatised.

From a theoretical perspective NBG is more elegant than ZF. It introduces no inconsistencies: because it is a conservative extension of ZF, if the latter is consistent, so is NBG. However the broader mathematical community has not seen any need to adopt it.

NF: set theory with a universal set

Quine's NF (for "New Foundations") allows true comprehension for "stratified" formulae, namely those that can be expressed in simple type theory. For example, x=x is stratified; as this is true for all x, the Universe is a set in NF. However, x∈x is not stratified, since in type theory, one can write x∈y only if the type of y is the successor to that of x. Thus Comprehension does not allow one to form the paradoxical Russell set. Cantor's Paradox is avoided in a more subtle way: Cantor's Theorem for a set X relies on the fact that X is equinumerous with the set of singleton sets of its elements. For example

|{a, b, c}| = |{ {a}, {b}, {c} }|

While this is true for all sets in standard set theory, in NF the equivalence cannot be shown for some sets, including the Universe. Such sets are called "non-Cantorian". NF would be appealing if one considered that limitation of size is an unnatural way to define sethood, that the Universe is a rather natural set, and that the true criterion for sethood should concern the structure rather than merely the size of a class. The existence of non-Cantorian sets is the rather high price paid for satisfying these requirements. NF, like NBG, is finitely axiomatisable. Almost nothing is known about its consistency, though it is probably rather weak (i.e. consistent). The addition of urelemente (see below), a seemingly minor modification, yields a theory, NFU, whose consistency is implied by that of simple type theory (and certainly therefore by that of ZF).


In most formulations of set theory, all entities are sets; the only possible elements of a set are therefore other sets. In particular if an entity has no elements it must, by Extensionality, be the empty set. However, one may wish to allow primordial "atoms" as possible elements of sets. Such atoms are called urelemente. They can be introduced to a theory by weakening the axiom of Extensionality to apply only to non-empty sets.

Relative consistency

An inconsistent theory can prove any proposition at all, including false ones. Mathematicians want a foundational set theory to be 'strong' (that is, to allow lots of proofs), but not so strong as to be inconsistent. Having been burned by the paradoxes that plagued set theory in its early years, set theorists are naturally keen to prove results about the consistency of set theories. Unfortunately this is a very delicate and tricky business. For if a set theory can be used to 'prove' its own consistency, it does not provide any reason to believe that it really is consistent. The proof may work either because the result is true (the theory is consistent), or on the contrary because the theory is inconsistent and can prove anything. Indeed, a famous result of Gödel's shows that if a theory is strong enough to prove its own consistency, then it is in fact inconsistent! This statement of Gödel's result depends on exactly how the 'consistency' of a system is expressed within the system (mathematised): alternative mathematisations of 'consistency' exist that do allow a system to prove its own consistency without being inconsistent. But the problem remains that an inconsistent system will be able to do so too; so such a 'proof' does not really provide evidence of consistency.

Since an absolute proof of the consistency of any system is impossible, set theorists focus instead on relative consistency proofs. A relative consistency proof between two systems R and S might say, for instance, that if R is consistent, then so is S. Put informally, S is weaker or 'more consistent' than R. Such a result is particularly useful if S is an extension of R. In this case, S is actually stronger than R, as it has extra axioms; but the relative consistency proof provides confidence that these have not introduced new inconsistencies into the system. In other words, the extra axioms may be used with impunity, and if an inconsistency is eventually found, it was lurking in the original system all along.

Relative consistency proofs can also be used to compare alternative theories. Any set theory worth the name can prove the consistency of simple type theory (STT). STT is thus very trustworthy, but unfortunately, as we have seen, not strong enough for the needs of real mathematics. ZF has been heavily studied and is known to be consistent relative to many other systems that have been proposed, while having enough expressive power to model almost all of mathematics, and this accounts for its widespread adoption by mathematicians in fields outside set theory.


  1. 1.0 1.1 Thomas J Jech (1978). Set theory. Academic Press. ISBN 0123819504.