Theory (mathematics)

Mathematics is structurally distinguishable

Axiomatic or non-axiomatic

There are two possible approaches to mathematics, called by R. Feynman "the Babylonian tradition" and "the Greek tradition". They answer differently the question, whether or not some mathematical facts are more fundamental than others, more consequential facts. The same approaches apply to any theory, mathematical or not.

The Babylonian (non-axiomatic) tradition treats a theory as a network whose nodes are facts and connections are derivations. If some facts are forgotten they probably can be derived from the others.

The Greek (axiomatic) tradition treats a theory as a tower of more consequential facts called theorems, grounded on the basis of more fundamental facts called axioms. If all theorems are forgotten they surely can be derived from the axioms. The axioms are sparse and simple, not to be forgotten.

Monotonic or non-monotonic

The distinction between these two approaches is closely related to the distinction between monotonic and non-monotonic logic. These answer differently the question, whether or not a fact can be retracted because of new evidence. Monotonic logic answers in the negative, non-monotonic logic answers in the affirmative.

Non-monotonic logic is used routinely in everyday life and research. An example: "being a bird, this animal should fly", but the bird may appear to be a penguin. Another example: "the grass is wet, therefore it rained", but the cause may appear to be a sprinkler.

The non-axiomatic approach is flexible. When needed, some old facts can be retracted, some new facts added, and some derivations changed accordingly. Nowadays this approach is widely used outside mathematics, and only marginally within mathematics (so-called informal mathematics).

The axiomatic approach is inflexible. A theorem cannot be retracted without removing (or replacing) at least one axiom, which usually has dramatic consequences for many other theorems. Nowadays this approach is widely used in mathematics.

The non-axiomatic approach is well suited when new evidence often comes from the outside. The axiomatic approach is well suited for a theory that advances only by extracting new consequences from an immutable list of axioms. It may seem that such a development must be dull. Surprisingly, this is an illusion. Being inflexible in some sense, an axiomatic theory can be very flexible in another sense, see Sect. 2.2.

Defined or undefined

Facts are formulated via notions.

In the non-axiomatic approach, notions are nodes of a network whose connections are definitions. If some notions are forgotten they probably can be restored from the others.

Searching Google for "define:line" we get "a length without breadth or thickness". Similarly we find definitions for breadth, thickness and so on, recursively. Doing so we would get a large subnetwork; here is its small fragment:

Line: a length without breadth or thickness
- Length: linear extent in space
  - Linear: along a line↑
  - Extent: a range of locations
    - Location: point or extent↑ in space
  - Space: unlimited expanse in which everything is located↑
- Breadth: the extent↑ from side to side
  - Side: a surface forming part of the outside of an object
    - Surface: the extended↑ two-dimensional outer boundary of a three-dimensional object
- Thickness: the dimension through an object as opposed to its length↑ or width
  - Width: the extent↑ from side↑ to side↑

(Up arrows mean: see above.) We observe that

circularity appears routinely; for example: line→length→linear→line;
the definition of a single notion involves recursively a large number of other, quite remote notions.

Such system of notions is unsuitable for a mathematical theory. Here, circularity is disallowed, and the set of involved notions is kept reasonably small (whenever possible).

In the axiomatic approach, notions are a tower of defined notions, grounded on the basis of more fundamental notions called undefined primitives. If all defined notions are forgotten they surely can be restored from the undefined primitives. The undefined primitives are sparse and simple, not to be forgotten.

Curiously, when a non-mathematical encyclopedia contains an article on a mathematical notion, two very different "definitions" may appear, one general (informal), the other mathematical (formal).

From now on, in this article, "definition" means a mathematical definition (unless explicitly stated otherwise).

The lack of definition of a primitive notion does not mean lack of any information about this notion. Axioms provide such information, to be used in proofs. Informal (intuitive) understanding of a primitive notion is communicated in a natural language. This information cannot be used in proofs, but is instrumental when guessing what to prove, how to prove, how to apply proved theorems and, last but not least, what to postulate by axioms.

Mathematics is not isolated

Computers metaphor

A conceptual metaphor helps to understand one conceptual domain in terms of another. For example, the desktop metaphor treats the monitor of a computer as if it is the user's desktop, which helps to a user not accustomed to the computer. Nowadays many are more accustomed to computers than to mathematics. Thus, analogies with computers may help to understand mathematics. Such analogies are widely used below.

Flexible or inflexible

In 1960s a computer was an electronic monster able to read from a punch tape simple instructions stipulated by the hardware and execute them quickly, thus performing a lot of boring calculations. Nowadays some parents complain that personal computers are too fascinating. However, without software a personal computer is only able to read (say, from a compact disk) and execute instructions stipulated by the hardware. These instructions are now as technical as before: simple arithmetical and logical operations, loops, conditional execution etc. A computer is dull, be it a monster of 1960s or a nice looking personal computer, unless programmers develop fascinating combinations (called programs) of these technical instructions.

For a programmer, the instruction set of a given computer is an immutable list. The programmer cannot add new elements to this list, nor modify existing elements. In this sense the instruction set is inflexible. New programs are only new combinations of the given elements. Does it mean that program development is dull? In no way! A good instruction set is universal. It means that a competent programmer feels pretty free to implement any well-understood algorithm provided that the time permits (which is usually most problematic) and the memory and the speed of the computer are high enough (which is usually less problematic). In this sense a good instruction set is very flexible.

Likewise a mathematician working within a given theory faces its axioms as an immutable list. However, this list can be universal in the sense that a competent mathematician feels pretty free to express any clear mathematical idea. In this sense some axiomatic systems are very flexible.

Universal or specialized

For a user, the software of a computer is first of all a collection of applications (games, web browsers, media players, word processors, image editors etc.) All applications function within an operating system (Windows, MacOS, Linux etc.) The operating system is a universal infrastructure behind various specialized applications. Each application deals with relevant files. The operating system maintains files in general, and catalogs (directories) containing files and possibly other catalogs.

Likewise, the set theory is a universal infrastructure behind various specialized mathematical theories (algebra, geometry, analysis etc.) Each specialized mathematical theory deals with relevant objects, relations and sets. The sets theory deals with sets in general, possibly containing other sets, and reduces objects and relations to sets.

In many cases a specialized mathematical theory is a theory of mathematical structures of some kind, often called spaces (of this kind). For example: Euclidean geometry is a theory of Euclidean spaces; topology is a theory of topological spaces; linear algebra is a theory of linear spaces. Differential geometry investigates smooth manifolds. Algebra investigates groups, rings, fields etc.

Motivated or indiscriminate

Monkeys could type into a computer a sequence of hardware instructions; the computer could execute them; but the result of such "programming" has almost no chance to be fascinating or useful. Fascinating computer games reflect human predilections. Useful programs reflect human needs. A computer is dull for humans unless its software reflects human life in one way or another.

Likewise, a theorem is of no interest for humans unless it is motivated in one way or another by human life. The motivation may be quite indirect; many theorems "only" help to prove other theorems, many are appreciated "only" for their aesthetic value, etc. But some kind of motivation is necessary. Indiscriminate stream of logical consequences of the axioms is not publishable in the mathematical literature.

Note that "a theorem" does not mean "a motivated theorem", "an important theorem" etc., not even "an already discovered theorem". All theorems are just an indiscriminate stream of logical consequences of the axioms.

Theorems of a theory are, by definition, statements that follow from the given axioms according to the given rules (called by different authors inference rules, derivation rules, deduction rules, transformation rules), including the axioms themselves.

From technical to human: definitions

The gap between a bare hardware and a nice application is too wide for a single jump, or even a triple jump (hardware – operating system – programming language – application). Bridging the gap is a laborious task for many programmers. They compose programs of modules, and modules of subroutines. Each subroutine reduces a bit more useful task to a bit simpler tasks. Ultimately, a useful (or even fascinating) task is reduced to the technical instructions of the bare hardware.

Likewise, mathematicians bridge the wide gap between useful notions (say, "ellipse" or "normal distribution") and the undefined primitives by a large and complicated system of definitions. Each definition reduces a bit more useful notion to a bit more primitive notions.

Mathematical definitions are very diverse.

A definition may be just an abbreviation local to a single calculation, like this: "denoting for convenience $x^{2}+x+1$ by $a$ we have...".

A single definition may embed a whole specialized mathematical theory into the universal set-theoretic framework, like this:

"A Euclidean space consists, by definition, of three sets, whose elements are called points, lines and planes respectively, and six relations, one called betweenness, three called containment and two called congruence, satisfying the following conditions: ..."

A definition may be given mostly for speaking more concisely, like this: "in a triangle, the altitude of a vertex is, by definition, its distance to the line through the other two vertices".

A definition may introduce a new revolutionary concept, like this: "the derivative of a function at a point is, by definition, the limit of the ratio..."

Some mathematicians will tell you that the main aim of their research is to find the right definition, after which their whole area will be illuminated. ... For other mathematicians, the main purpose of definitions is to prove theorems... [The Companion, pages 74-75]

Principles and practices

Sharp or fuzzy; real or ideal

A fair coin is tossed 1000 times; can it happen at random that heads is obtained all the 1000 times? Emile Borel, a famous mathematician, answers in the negative, with an important reservation:

Such is the sort of event which, though its impossibility may not be rationally demonstrable, is, however, so unlikely that no sensible person will hesitate to declare it actually impossible. [Borel, page 9]

Why `may not be rationally demonstrable'? Why does this statement remain outside the list of Borel's theorems in probability theory?

Mathematical truth is sharp, not fuzzy. Every mathematical statement is assumed to be either true or false (even if no one is able to decide) rather than "basically true", "true for all practical purposes" etc. If n heads out of n times is an impossible event for n=1000 then there must exist the least such n (be it known or not). Say, 665 heads can occur, but 666 heads cannot. Then, assuming that 665 heads have just appeared, we see that the next time tails is guaranteed, which contradicts to the assumed memoryless behavior of the coin.

Another example. The number 4 can be written as a sum of units: 1+1+1+1. Can the number 2¹⁰⁰⁰ be written as a sum of units? Mathematics answers in the affirmative. Indeed, if you can write, say, 2⁶⁶⁵ as a sum of units, then you can do it twice (connecting the two copies by the plus sign). Complaints about limited resources, appropriate in the real world, are inappropriate in the imaginary, highly idealized mathematical universe.

Rigor or intuition

Mathematics as a cultural process is not bound by monotonic logic. For example, it was treated till 1872 as intuitively evident fact that any curve on the plane has a tangent line everywhere except maybe a discrete set of points. However, this "fact" was refuted and retracted.

A century later, we have seen so many monsters of this sort that we have become a little blasé. But the effect produced on the majority of nineteenth century mathematicians ranged from disgust to consternation. (Bourbaki, page 311)

Since then, intuition is not a valid argument in mathematical proofs. A mathematical theory is bound by monotonic logic. If needed, mathematics as a cultural process can retract an axiomatic theory as whole (rather than some theorems) and accept another axiomatic theory. (Moreover, in a remote perspective the axiomatic approach could be abandoned.)

Formal or formalizable

An example. If $a^{2}+2b^{2}=1$ then $a^{4}+4b^{4}=(1-2ab)(1+2ab)$ , since $a^{4}+4b^{4}=a^{4}+4a^{2}b^{2}+4b^{4}-4a^{2}b^{2}=(a^{2}+2b^{2})^{2}-4a^{2}b^{2}=1-4a^{2}b^{2}=(1-2ab)(1+2ab)$ . This simple algebraic argument can be checked by a human or even a computer. No need to understand the meaning of the statement; it is sufficient to know the relevant formal rules. The statement follows from the given rules only, rigorously, without explicit or implicit use of intuition. A formal proof guarantees that a theorem will never be retracted.

A universal formal language conceived by Gottfried Leibniz near 1685,

capable of expressing all humans thoughts unambiguously, of strengthening our power of deduction, of avoiding errors by a purely mechanical effort of attention [Bourbaki, p.302]

was partially implemented in mathematics two centuries later, especially, as a formal language of set theory. In the form introduced by Bourbaki this language contains four logical signs, three specific signs, and letters. Its rules are considerably more complicated than elementary algebra.

However, this formal language is never used by a typical mathematician. Likewise, most programmers never use the instructions of the computer hardware. Instead, they develop applications in appropriate programming languages (BASIC, Java, C++, Python etc). A statement of a programming language corresponds usually to many hardware instructions; the correspondence is established by special programs (compilers and interpreters). A programming language is geared to programmers, hardware instructions --- to hardware.

The gap between the language of mathematical texts (books and articles) and the formal language of the set theory is even wider than the gap between a programming language and hardware instructions. It is as wide as the gap between pseudocode and Turing machines.

A programmer, when coding a program, knows that the computer does exactly what it is told to do. In contrast, an author or speaker, when communicating with other people, can be ambiguous, make small errors, and still expect to be understood. Pseudocode is a description of an algorithm, intended for human reading, that can be converted into a code by any competent programmer. A code is formal; a pseudocode is formalizable.

Likewise, communication between mathematicians is made in a formalizable language.

In practice, the mathematician ... is content to bring the exposition to a point where his experience and mathematical flair tell him that translation into formal language would be no more than an exercise of patience (though doubtless a very tedious one). (Bourbaki, page 8)

A mathematician, using a formalizable language, need not know all technical details of the underlying formal language. Many equivalent formal languages may be used interchangeably. Likewise, a single pseudocode can be used when coding in different programming languages, as long as these are not too far from each other.

Axiomatic or axiomatizable

Two finite lists of vectors generate (by summation) the same infinite set of vectors. Similarly, different finite lists of axioms can be used for generating the same infinite set of theorems.

A mathematician, working within an axiomatic theory, need not remember the list of axioms in full detail. It is sufficient to remember a list of theorems that contains all axioms or, more generally, such that each axiom follows easily from these theorems. (Here, as usual, "theorems" are treated inclusively, as "axioms and theorems".)

Two lists of axioms are called equivalent if every axiom of each list follows from the axioms of the other list. Such lists of axioms may be used interchangeably, since they generate the same theorems.

Some details

Consistent or inconsistent

If a theory states that 2+2=5, it is a paradox but not yet a contradiction. By "paradox" people may mean

a contradiction;
an apparent contradiction;
something counterintuitive;
something surprising;
something ironic;

etc. In contrast, a contradiction (in a mathematical theory) is, by definition, a pair of theorems (of the given theory) such that one is the negation of the other. Thus, two theorems

2+2=4,

2+2=5

are still not a contradiction. Two theorems

2+2=4,

2+2\neq 4

are a contradiction.

If a contradiction exists in a given theory, this theory is called inconsistent. Otherwise, if no contradiction exist (rather than merely not found for now), the theory is called consistent.

For a mathematician, an inconsistent theory is completely useless. Some philosophers disagree:

Superstitious dread and veneration by mathematicians in face of a contradiction (Ludwig Wittgenstein)

But a mathematician insists: an inconsistent theory is completely useless, since all statements (in the given language) are theorems! The reason is, proof by contradiction. No matter which statement X is in question, we always can prove X as follows:

Assume that X is false;
... (put the proof of the contradiction here);
the assumption leads to a contradiction, thus X is true.

It is tempting to object that the contradiction has nothing in common with the assumption and therefore cannot invalidate it. However, the rules of formal logic do not demand that the contradiction has something in common with the assumption. Some attempts to change these rules were made (so-called "relevance logic", or "relevant logic"), but with little success. It is always possible to obfuscate the proof of the contradiction, making it seemingly entangled with X. We have no formal criterion able to unmask any possible fictitious participation of X in the proof of the contradiction.

Univalent or multivalent

Plane geometry (called also "planar geometry") is a part of solid geometry that restricts itself to a single plane ("the plane") treated as a geometric universe. The question "which plane?" is inappropriate, since planes do not differ in their geometric properties. Every two planes α, β are isomorphic, that is, there exists an isomorphism f between α and β. Treating α and β as sets of points one defines isomorphism as an invertible (one-to-one and onto) map f : α → β preserving all primitive relations. Namely: f maps lines into lines; the distance between f(A) and f(B) on β is equal to the distance between A and B on α; etc. The same is required of the inverse map $f^{-1}:\beta \to \alpha$ .

Axioms of the plane Euclidean geometry leave no freedom, they determine uniquely all geometric properties of the plane. More exactly: all Euclidean planes are mutually isomorphic. In this sense we have "the" Euclidean plane. In terms of Bourbaki, the plane Euclidean geometry is an univalent theory. In contrast, axioms of a linear space (called also vector space) leave a freedom: a linear space may be one-dimensional, two-dimensional, three-dimensional, four-dimensional and so on (infinite dimension is also possible). Linear algebra is multivalent.

Axioms of topology leave much more freedom; every subset of (say) the plane is an example of a topological space, be it connected or not, compact or not, be it a curve, a domain, a fractal, or whatever; and all these are still a minority among all topological spaces. Topology is multivalent.

According to Bourbaki, the study of multivalent theories is the most striking feature which distinguishes modern mathematics from classical mathematics.

A similar idea occurs in mathematical logic: a theory is called categorical if all its models are mutually isomorphic. However, for Bourbaki a theory is embedded into the set theory, while in logic a theory is standalone (embedded into the first-order logic).

Complete or incomplete

A theory is called complete, if every statement in its language can be proved or disproved. In other words: if such a statement is not a theorem then its negation is necessarily a theorem.

A statement (in the language of the given theory) is called independent (in other words, undecidable) in this theory, if it is not a theorem, but also its negation is not a theorem.

Independent statements appear naturally in multivalent theories. For example, in linear algebra the statement "three vectors cannot be linearly independent" is neither provable nor disprovable, since the dimension of a linear space may be 2 or 3 (or anything else). Thus, multivalent theories are usually incomplete.

What about univalent theories? Seemingly, these should be complete. For example, the plane Euclidean geometry should prove all geometric statements that hold on the Euclidean plane, and only such statements. Similarly, arithmetics of natural numbers should prove all true arithmetical statements about natural numbers, and only these. Unexpectedly, the situation is far more complicated than these naive ideas, see Sect. 4.7.

Definitions: abbreviations or extensions

According to Sect. 2.4, a theorem is a logical consequence of the axioms. However, what about definitions? They are numerous and important, according to Sect. 2.5. Two possible approaches to definitions answer differently the question, whether they belong to the formal theory or not.

One approach treats definitions as abbreviations, used in the formalizable language and eliminated when translating into the formal language. Thus, the formal theory contains only primitive notions and axioms, not definitions.

According to the other approach, each definition extends the formal theory. The new (extended) theory contains one more notion and one more axiom. Informally, the new notion is just defined, and the new axiom is just the definition. But formally, the new theory still contains only primitive notions and axioms, not definitions.

An example.

Definition: a prime number is a natural number having exactly two divisors. (These are 1 and the number itself, of course.)

Theorem: there are infinitely many prime numbers.

The first approach (abbreviations). In the formal language, the notion "prime" is eliminated, as follows. Theorem: there are infinitely many natural numbers having exactly two divisors. The proof also does not involve the notion "prime".

The second approach (extensions). The new formal theory stipulates a new primitive notion "prime" and a new axiom: a natural number is prime if and only if it has exactly two divisors. This axiom is used in the proof of the theorem.

Formalizable in reality or in principle

The integer 1 is a very simple example of a mathematical object that can be defined in the language of set theory. Its definition, treated as an abbreviation (rather than extension), can be translated from this formalizable language into any corresponding formal language, in particular, into the formal language introduced by Bourbaki. However, the length of the resulting formal definition is equal to 4,523,659,424,929. It would take about 4000 gigabytes of computer memory! More complicated mathematical definitions are much longer. Clearly, such formalization is possible only in principle, similarly to the possibility of writing 2¹⁰⁰⁰ as a sum of units.

This formal language is so verbose because of substitution. Here is an algebraic example: substituting $x=(a+1)/(a^{2}+1)$ into $(x^{2}-x+1)/(x^{2}+x+1)$ gives an expression of length 64, containing 8 occurrences of a. Further, substitute $a=u^{3}-2u^{2}-3u+4$ and you get an expression of length 152, containing 24 occurrences of u. And so on.

There exist formal mathematical languages intended for practical use, for example, Mizar. These treat definitions as extensions.

Generating all theorems; decidable or undecidable

It is impossible to list all theorems, since they are infinitely many. However, an endless algorithmic process can generate theorems, only theorems, and all theorems in the sense that every theorem will be generated, sooner or later. (In more technical words: the set of all theorems is recursively enumerable.)

An open question (in a mathematical theory) is a statement neither proved nor disproved. It is possible (in principle, not necessarily in practice) to run the theorem-generating algorithm waiting for one of two events: either the given statement appears to be a theorem, or its negation does; in both cases the (formerly) open question is decided. If the theory is complete, this must happen sooner or later.

A theory is called decidable if there exists an algorithm that, given an arbitrary statement (in the language of the theory) decides whether it is a theorem or not. (In more technical words: if the set of all theorems is recursive.)

The argument above shows that a complete theory is decidable.

A theory as a crystal ball?

Imagine a theory able to describe the whole discrete mathematics (the set theory?) or, at least, arbitrary algorithmic computations, in other words, formal machines (like computers, but with unlimited resources). Assume that the theorems include all true statements about these machines, and no wrong statements about them. Then a formal machine can use this theory as a "crystal ball", — for predicting the future of any formal machine. In particular, for deciding, whether a given machine will eventually halt, or will run forever (so-called halting problem). To this end it runs the theorem-generating algorithm until either the statement "such machine eventually halts" or its negation appears among the theorems.

In order to predict the future of a formal machine Y, a formal machine X needs the code of Y. It does not mean that the code of Y must be inscribed into (and therefore be shorter than) the code of X. Instead, X can generate the code of Y. This way, a single machine X can predict the future of infinitely many machines Y, and moreover, of all possible machines Y, generating their codes in an endless loop.

Can X predict its own future? It may seem that the affirmative answer is ensured by the argument above. However, how can X know, which Y is identical to X? In other words, how can X know (it means, be able to generate) its own code?

It is a well-known and nontrivial fact that some programs, so-called quine programs, can generate their own texts. (See also halting problem.) It may seem to be a trick, but in fact it is a form of self-replication, and it uses a 4×10⁹ years old patent of life: a cell replicates itself using its DNA, and also replicates the DNA itself.

Using the quine technique, a machine X can predict its own future. Is it paradoxical? Seemingly not, since X is a deterministic machine, it has no free will. But nevertheless it is paradoxical, since we may program X as follows:

predict your own future: whether you will eventually halt, or will run forever;
if "halt" is predicted, enter an infinite loop;
otherwise (if "run forever" is predicted), halt.

We get a paradox, which means that the assumptions made in the beginning of this section cannot be satisfied. An axiomatic theory cannot be such that its theorems include all true statements about formal machines, and no wrong statements about them! This important conclusion is closely related to the famous Gödel theorem.^[2]

Notes

↑ This phrase is borrowed from Life# — Cells.
↑ The first incompleteness theorem of Gödel, somewhat stronger than this conclusion, states that a consistent and complete theory cannot contain Robinson arithmetics.

[1] This phrase is borrowed from Life# — Cells.

[2] The first incompleteness theorem of Gödel, somewhat stronger than this conclusion, states that a consistent and complete theory cannot contain Robinson arithmetics.

[1]

[2]