Uniqueness of mathematical structures

This post is an introduction to model theory, of sorts. Occasionally I get asked what model theory is, and I generally find it quite difficult to give someone who doesn’t already know any model theory a good answer to this question, that actually says anything useful about what model theory is really about without leaving them hopelessly lost. This is my attempt to provide a real taste of model theory in a way that should be accessible to a math grad student without a background in logic.

Warm-up exercise

Let’s say I make a graph with the following procedure: I start with a countably infinite set of vertices. For each pair of vertices, I flip a fair coin. If the coin lands heads, I put an edge between those two vertices; if the coin lands tails, no edge.

Now you make another graph in a very similar manner. You also start with a countably infinite set of vertices. But instead of flipping a coin, you roll a fair standard six-sided die for each pair of vertices. If the die comes up 6, you put an edge between those two vertices; if it comes up anything from 1 through 5, no edge.

What is the probability that these two graphs are isomorphic?

For the numerical answer, paste “Gur zhygvcyvpngvir vqragvgl” into https://rot13.com/. An explanation will appear later in this post.

Introduction

There are several cases in which we can identify a mathematical object up to isomorphism with a list of first-order properties it satisfies (I’ll tell you what that means in a sec) and some data about cardinality. Here’s a couple examples: All countable dense linear orders without endpoints are isomorphic. Any two algebraically closed fields of the same characteristic, which have transcendence bases of the same cardinality, are isomorphic. It turns out that the possibility of uniquely specifying a mathematical structure in this way corresponds to interesting structural properties of that structure.

First, the basic definitions:

A first-order language consists of a set of relation symbols, each of which is labeled with a number representing its arity (number of inputs it takes), a set of function symbols, each of which is also labeled with a number representing its arity, and a set of constant symbols (which could also just be thought of as 0-ary function symbols). For example, the language of linear orders has one binary relation < and no functions or constants. The language of fields has constants 0 and 1, binary functions + and \cdot, a unary function - (no unary function for reciprocal, because the functions should be total), and no relations.

A first-order structure in a given language is a set X in which each constant symbol is interpreted as an element of X, each n-ary function symbol is interpreted as a function X^n\rightarrow X, and each n-ary relation symbol is interpreted as a subset of X^n (or alternatively, as a function X^n\rightarrow\{\text{True},\text{False}\}). So linear orders and fields are examples of structures in their respective languages.

We can compose function symbols, constant symbols, and variables into ways of pointing to elements of a structure, called terms. We have as many variables as we want, and they are terms. Constant symbols are terms. And for each n-ary function symbol f and terms t_1,...,t_n, f(t_1,...,t_n) is a term. So in the language of fields, we can construct terms representing integers by adding however many 1s together (and then negating to get negative numbers), and then combine these with variables using addition and multiplication to get terms representing polynomials in however many variables with integer coefficients. In the language of linear orders, since we have no functions or constants, the only terms are variables.

A first-order formula is a way of actually saying things about a first-order structure and elements of it represented by variables. If R is an n-ary relation and t_1,...t_n are terms, then R(t_1,...,t_n) is a formula. If t_1,t_2 are terms, then t_1=t_2 is a formula (you can think of this as just meaning that languages always have the binary relation = by default). Boolean combinations of formulas are formulas (i.e. if \varphi and \psi are formulas, then so are \varphi \& \psi, \varphi\text{or}\psi, and \neg\varphi), and if \varphi is a formula that refers to a variable x, then \forall x\,\varphi and \exists x\,\varphi are formulas. Any variable that appears in a formula without being bound to a quantifier \forall or \exists is called a free variable, and if each free variable is assigned to an element of a structure, the formula makes a claim about them, which can be either true or false. For example, in a ring, \exists y\, x\cdot y=1 is true iff x is a unit.

A first-order formula with no free variables is called a sentence. These are true or false statements about a first-order structure. Many types of mathematical objects are defined by listing first-order sentences that are true of them. For instance, a linear order is a structure with a < relation satisfying transitivity (\forall x \forall y \forall z\, x<y \& y<z \implies x<z), antisymmetry (\forall x \forall y \neg(x<y \& y<x)), and totality (\forall x \forall y\, x<y \text{ or } y<x \text{ or } x=y), and a linear order is dense without endpoints if it also satisfies \forall x \forall y \exists z \, x<z \& z<y and \forall x \exists y \exists z \, y<x \& x<z. These are all first-order sentences. Algebraically closed fields of a given characteristic are another example. The field axioms are first-order sentences. For each positive integer n, we can formulate a first-order sentence saying that every polynomial of degree n has a root: \forall y_0 \forall y_1 ... \forall y_{n-1} \exists x \, y_0 + y_1x + ... + y_{n-1}x^{n-1}+x^n = 0 (the y_is represent the coefficients, with the leading coefficient normalized to 1). So we just add in these infinitely many sentences, one for each n. And we can say that the field has characteristic p by saying 1+...+1=0 (with p ones), or say that it has characteristic 0 by, for each prime p, saying \neg(1+...+1=0).

First-order sentences can tell us a lot about a structure, but not everything, unless the structure is finite.

Löwenheim–Skolem theorem: Given a countable set of first-order sentences (in particular, any set of sentences if the language is countable), if there is any infinite structure in which they are all true, then there are first-order structures of every infinite cardinality in which they are all true.

This is why the uniqueness results all have to say something about cardinality. You might also think of some examples of ways to identify an infinite mathematical object up to isomorphism with a list of axioms without saying directly anything about cardinality, but in all such cases, you’ll be using an axiom that isn’t first-order. For instance, all Dedekind-complete ordered fields are isomorphic to the reals, but Dedekind-completeness isn’t a first-order sentence. Same goes for any way of characterizing the natural numbers up to isomorphism that says something like “every set of natural numbers that contains 0 and is closed under successor contains all of the natural numbers”.

Countable structures

Let’s go back to the example of countable dense linear orders. If you don’t know the proof that all countable dense linear orders are isomorphic, here it goes: suppose we have two countable dense linear orders, X and Y. Since they’re countable, we can label the elements of each of them with distinct natural numbers. We’re going to match elements of X to elements of Y one at a time such that we get an isomorphism at the end. To ensure that every element of X gets matched to something in Y, on odd-numbered steps, we’ll take the lowest-numbered element of X that hasn’t been matched yet, and match it with an element of Y. Similarly, to ensure that every element of Y gets matched to something in X, on even-numbered steps, we’ll take the lowest-numbered element of Y that hasn’t been matched yet, and match it with an element of X. As for what we do on each step (suppose it’s an odd-numbered step; even-numbered steps are the same but with the roles of X and Y reversed), at the start of the step, finitely many elements of X have already been matched. We take the first element that hasn’t yet been matched. Call it x. x is either greater than all previously matched elements, less than all previously matched elements, or between two previously matched elements that don’t already have previously matched elements between them. Since Y is dense and has no endpoints, we know that in the first case, there will be something greater than all previously matched elements of Y, so we can match x to it; in the second case, there will be something less than all previously matched elements of Y for us to match x to; and in the third case, there will be something between the elements matched to the elements on either side of x, which we can match x to. By doing this, we continue to preserve the ordering at each step, so the bijection we get at the end is order-preserving, and thus an isomorphism.

Now let’s get back to the warm-up exercise. A graph can be viewed as a first-order structure whose elements are the vertices, with a single binary relation E (the edge relation) that is symmetric and anti-reflexive (symmetry and anti-reflexivity are both first-order conditions). There are some more first-order sentences satisfied by both of our random graphs with probability 1. Given any two finite disjoint sets of vertices, we can find another vertex that’s connected to everything in the first set and not connected to anything in the second set. This is because each vertex has the same positive probability of having this property, they’re all independent, and there’s infinitely many of them, so there also must be some (in fact, infinitely many) that have all the desired edges and none of the undesired edges. To write this condition using first-order sentences, for each natural number n and m, we have a sentence \forall x_1 ... \forall x_n\forall y_1 ... \forall y_m \, (x_1\neq y_1 \& ... \& x_n\neq y_m) \implies \exists z \, E(z,x_1)\&...\&E(z,x_n)\&\neg E(z,y_1)\&...\&\neg E(z,y_m)
(the big conjunction before “\implies” includes x_i\neq y_j for each 1\leq i\leq n and 1\leq j\leq m, so that this says \{x_1,...,x_n\} and \{y_1,...,y_m\} are disjoint).

This is enough for us to construct an isomorphism, using essentially the same proof as for countable dense linear orders. Since we each started with countably many vertices, we can label each of our vertices with natural numbers, and then iteratively match the next available unmatched vertex in one graph to a vertex on the other, alternating between which graph we take the next available unmatched vertex from on each step, just like before. On each step, only finitely many vertices have been matched. The new vertex shares edges with some of the already matched vertices and doesn’t share edges with some others. We need to match it with a vertex in the other graph that shares exactly the same pattern of edges with previously matched vertices. And we know that somewhere in that graph, there must be such a vertex. So we can match the new vertex and keep going, and the bijection we get at the end preserves the edge relation, and is thus an isomorphism.

For the general argument that these are both special cases of, we’ll need the concept of a type (not to be confused with the identically-named concept from type theory). Given a first-order structure X and c_1,...,c_n\in X, say that a,b\in X have the same type over A if for every first-order formula \varphi(x,y_1,...,y_n) (where x,y_1,...,y_n are its free variables), \varphi(a,c_1,...,c_n) holds iff \varphi(b,c_1,...,c_n) does. So, for example, in a dense linear order without endpoints, if c_1<a<c_2, then, in order for b to have the same type as a over c_1,c_2, it must be the case that c_1<b<c_2 as well, since y_1<x<y_2 is a first-order formula, and a and b must satisfy exactly the same first-order formulas with parameters in c_1,c_2. And as it turns out, this is enough; if c_1<a<c_2 and c_1<b<c_2, then a and b have the same type over c_1,c_2. In an infinite random graph, if vertices a and b have the same type over some other vertices c_1,..,c_n, then a must have an edge to each c_i that b has an edge to, and vice-versa. Again, this turns out to be enough to guarantee that they have the same type.

In both of these cases, for any finite set of elements c_1,...,c_n, there are only finitely many types over c_1,...,c_n. Let’s count them. Each c_i is its own type, since x=y_i is a formula, so if a and c_i have the same type over c_1,...,c_n, then, since c_i=c_i, a=c_i as well. Let’s ignore these and count the rest. In a dense linear order without endpoints, we can assume WLOG that c_1<c_2<...<c_n. There are n+1 nontrivial types over c_1,...,c_n: \{a\mid a<c_1\}, \{a\mid c_n<a\}, and, for each 1\leq i<n, \{a\mid c_i<a<c_{i+1}\}. In an infinite random graph, there are 2^n nontrivial types over n vertices c_1,...,c_n: for each S\subseteq\{c_1,...,c_n\}, there’s a type of vertices that have edges to everything in S and no edges to anything in \{c_1,...,c_n\}\setminus S.

Theorem: Let X be a countably infinite first-order structure such that for every n\in\mathbb{N} and c_1,...,c_n\in X, there are only finitely many types over c_1,...,c_n. Then every countable structure Y satisfying the same first-order sentences that X does is isomorphic to X.

That is, our ability to specify a countable structure (up to isomorphism) by its first-order properties corresponds exactly to the condition that there are only finitely many different behaviors that elements of the structure can have in relation to any given finite subset. The proofs that all countable dense linear orders without endpoints are isomorphic and that all countable random graphs are isomorphic look the same because they both follow the proof of this theorem, which goes like so:

Suppose there are only finitely many types over c_1,...,c_n. Let p be one of those types, and let q_1,...,q_k be the others. For each q_i, there’s some formula \varphi_i(x,c_1,...,c_n) that’s true for x\in p but not for x\in q_i. Then \varphi_1(x,c_1,...,c_n) \& ... \& \varphi_k(x,c_1,...,c_n) is a formula that holds only for x\in p. That is, the entire type is specified by a single formula; it wasn’t just coincidence that we were able to find such a formula for the types in each of our two examples.

Lemma: If X and Y are two first-order structures in the same language which satisfy all the same sentences, c_1,...,c_n\in X and d_1,...,d_n\in Y satisfy all the same formulas (i.e., for any \varphi(x_1,...,x_n), \varphi(c_1,...,c_n) is true in X iff \varphi(d_1,...,d_n) is true in Y), and X has only finitely many types, then there’s a natural bijection between types in X over c_1,...,c_n and types in Y over d_1,...,d_n. A formula \varphi(x,c_1,...,c_n) is true for x in some type over c_1,...,c_n iff \varphi(x,d_1,...,d_n) is true for x in the corresponding type over d_1,...,d_n.

Proof: Let p_1,...,p_k be the types in X over c_1,...,c_n, and for each 1\leq i\leq k, let \varphi_i(x,c_1,...,c_n) be a formula specifying that p_i is the type of x. These formulas \varphi_i specify types in Y over d_1,...,d_n as well; for any other formula \psi(x,y_1,...,y_n), either \forall x\, \varphi_i(x,c_1,...,c_n)\implies\psi(x,c_1,...,c_n) or \forall x\, \varphi_i(x,c_1,...,c_n)\implies\neg\psi(x,c_1,...,c_n) in X. These are first-order formulas, so again since d_1,...,d_n satisfy the same first-order formulas that c_1,...,c_n do, one of \forall x\, \varphi_i(x,d_1,...,d_n)\implies\psi(x,d_1,...,d_n) or \forall x\, \varphi_i(x,d_1,...,d_n)\implies\neg\psi(x,d_1,...,d_n) is true in Y as well. So \varphi_i(x,d_1,...,d_n) determines the truth value of every such formula; that is, it specifies the type of x over d_1,...,d_n, and formulas are true in this type iff they are true in the corresponding type in X over c_1,...,c_n. To show that these are all of the types in Y over d_1,...,d_n, consider the formula \forall x\, \varphi_1(x,y_1,...,y_n)\text{ or ... or }\varphi_k(x,y_1,...,y_n). In X, when we plug in c_1,...,c_n, the formula is true. And d_1,...,d_n satisfies all the same formulas as c_1,...,c_n, so the same formula must also be true in Y when we plug in d_1,...,d_n. That is, for any b\in Y, there’s some i such that \varphi_i(b,d_1,...,d_n) is true, so every element of Y must have one of the above types.

Armed with this lemma, we can prove the theorem. Let X and Y be countable structures satisfying the same first-order sentences, and suppose for every c_1,...,c_n\in X, there are only finitely many types over c_1,...,c_n. We’ll match elements of X to elements of Y one at a time, using the same back-and-forth trick from our two examples to ensure that we get a bijection at the end. After n steps, we’ll have c_1,...,c_n from X matched with d_1,...,d_n from Y, and we’ll want to ensure that c_1,...,c_n and d_1,...,d_n satisfy exactly the same first-order formulas. If we’ve done this, then on step n+1, we’ll have an element either of X or of Y, which we need to match with some element of the other one. We can match it to an element that has the corresponding type; that is, we’re matching c_{n+1}\in X and c_{n+1}\in Y such that the type of c_{n+1} over c_1,...,c_n corresponds to the type of d_{n+1} over d_1,...,d_n. Then c_1,...,c_{n+1} satisfy the same formulas that d_1,...,d_{n+1} do, so by induction, c_1,...,c_n and d_1,...,d_n satisfy the same formulas for every n (the assumption that X and Y satisfy the same first-order sentences provides a base case). Thus, the bijection we get at the end preserves the truth-values of all formulas, so it is an isomorphism, and we’re done.

As it turns out, the converse of the theorem is also true. Given a set of first-order sentences for which there is, up to isomorphism, only one countable model, all models have only finitely many types over any finite list of elements. Whenever there’s infinitely many types, there will be some types (which cannot be specified by a single formula) that appear in some models but not in others.

Uncountable structures

Let’s turn to the other example I introduced at the beginning: any two algebraically closed fields of the same characteristic with transcendence bases of the same cardinality are isomorphic. Every field has a transcendence basis, so a corollary of this is that any two uncountable algebraically closed fields of the same characteristic and cardinality are isomorphic.

A sketch of the proof: Given algebraically closed fields F and K of the same characteristic, with transcendence bases B_F and B_K of the same cardinality, any isomorphism between a subfield of F and a subfield of K extends to a maximal such isomorphism (by Zorn’s lemma). Since B_F and B_K have the same cardinality, there’s a bijection between them, and since F and K have the same characteristic, this bijection extends to an isomorphism between the fields they generate. Thus there is a maximal isomorphism between subfields of F and of K which restricts to a bijection between B_F and B_K. Now we just need to show that these subfields are all of F and K. This is because, given any such isomorphism between subfields F'\subseteq F and K'\subseteq K with B_F\subseteq F' and B_K\subseteq K', if F'\neq F, then let a\in F\setminus F', and let f(x) be the minimal polynomial of a. Applying the isomorphism to the coefficients gives us an irreducible polynomial over K', which must have a root b\in K, and then by matching a with b, we get an isomorphism F'[a]\cong K'[b], contradicting maximality of the isomorphism.

Here’s another example: Any two vector spaces over the same vector space, with bases of the same cardinality, are isomorphic. Since every vector space has a basis, a corollary of this is that, over a countable field, any two uncountable vector spaces of the same cardinality are isomorphic. Citing Zorn’s lemma is overkill, since there’s only one way to extend a bijection between bases to an isomorphism. But the basic idea is the same in each case: We have an appropriate notion of basis, and we extend a bijection between bases to an isomorphism. And vector spaces are also first-order structures; the language has a binary operation +, a constant 0, and, for each scalar \alpha, a unary operation for multiplication by \alpha.

The thing that unites both these cases is called strong minimality. A first-order structure is called minimal if every set defined by a first-order formula is either finite, or the complement of a finite set. More formally: X is minimal if for every formula \varphi(x,y_1,...,y_n) and b_1,...,b_n\in X, one of \{a\in X\mid\varphi(a,b_1,...,b_n)\} or \{a\in X\mid\neg\varphi(a,b_1,...,b_n)\} is finite. We call a structure strongly minimal if every structure satisfying the same first-order sentences is also minimal. (This turns out to be equivalent to, for each \varphi(x,y_1,...,y_n), there’s a finite upper bound on the size of whichever of \{a\in X\mid\varphi(a,b_1,...,b_n)\} or \{a\in X\mid\neg\varphi(a,b_1,...,b_n)\} is finite, as b_1,...,b_n vary.)

Let’s go over the general notion of “basis” we’ll be using: Say that a is algebraic over b_1,...,b_n if there is a formula \varphi(x,y_1,...,y_n) such that \varphi(a,b_1,...,b_n) holds, and \{a'\in X\mid\varphi(a',b_1,...,b_n)\} is finite. In algebraically closed fields, this corresponds to the usual notion of a being algebraic over the subfield generated by b_1,...,b_n. In vector spaces, this corresponds to a being a linear combination of b_1,...,b_n. Call B\subseteq X independent if no element of B is algebraic over any other elements of B. In other words, you can’t pin down an element of B to one of finitely many possibilities by using a single formula and other elements of B. In a vector space, independence is linear independence. In an algebraically closed field, independence is algebraic independence. Now call B a basis if it is a maximal independent set. If X is minimal, this turns out to imply that every a\in X is algebraic over some b_1,...,b_n\in B. An increasing union of independent sets is independent, so by Zorn’s lemma, every structure has a basis.

Now let’s look at type spaces in minimal structures. Let X be a minimal structure and b_1,...,b_n\in X. If a is algebraic over b_1,...,b_n, then there’s some formula \varphi(x,y_1,...,y_n) such that \varphi(a,b_1,...,b_n) holds and \{a'\in X\mid\varphi(a',b_1,...,b_n)\} is as small as possible. Then a and a' have the same type iff \varphi(a',b_1,...,b_n). So this type is implied by a single formula, and there are only finitely many elements of this type. There’s only one remaining type: the type of elements that aren’t algebraic over b_1,...,b_n. If a and a' are both non-algebraic over b_1,...,b_n, then for every formula \varphi(x,y_1,...,y_n), since X is minimal, one of \varphi(x,b_1,...,b_n) and \neg\varphi(x,b_1,...,b_n) must have only finitely many solutions x; a and a', being non-algebraic, must both be solutions to the other one. This shows they have the same type over b_1,...,b_n. This non-algebraic type is optional; in some cases, there might not be any elements that aren’t algebraic over b_1,...,b_n.

Let X and Y be minimal structures in the same language, which satisfy the same first-order sentences. They each have a basis. If those bases have the same cardinality, then X and Y are isomorphic. Say a “partial isomorphism” between X and Y is a bijection between a subset of X and a subset of Y, such that whenever a formula is true about some elements of the subset of X, then it is also true about the corresponding elements of the subset of Y, and vice-versa. If B_X and B_Y are bases for X and Y, respectively, then a bijection between B_X and B_Y is a partial isomorphism (this is because if b_1,...,b_n\in B_X and c_1,...,c_n\in B_Y satisfy all the same formulas, and b_{n+1}\in B_X\setminus\{b_1,...,b_n\} and c_{n+1}\in B_Y\setminus\{c_1,...,c_n\}, then b_{n+1} must have the unique non-algebraic type over b_1,...,b_n, c_{n+1} has the unique non-algebraic type over c_1,...,c_n, and these unique non-algebraic types satisfy the same formulas, so it follows by induction on the number of variables that a formula is true of distinct elements of B_X iff it is true of distinct elements of B_Y). An increasing union of partial isomorphisms is a partial isomorphism, so by Zorn’s lemma, there’s a maximal partial isomorphism extending a bijection between B_X and B_Y. If this maximal partial isomorphism is a bijection between X'\subseteq X and Y'\subseteq Y, and X'\neq X, then let a\in X\setminus X'. a is algebraic over X' (since B_X\subseteq X'), so there’s a single formula \varphi(x,b_1,...,b_n) (b_1,..,b_n\in X') that is true for x=a, and which determines its type over X' (meaning, determines its type over b_1',...,b_m' for every b_1',...,b_m'\in X'). Then, where d_1,...,d_n\in Y' correspond to b_1,...,b_n under the partial isomorphism, there must be c\in Y such that \varphi(c,d_1,...,d_n) (since d_1,...,d_n satisfies the same formulas b_1,...,b_n do, and \exists x \varphi(x,b_1,...,b_n)). c\notin Y', because this can be expressed as part of the type of c over Y', which is the same as the type of a over X'. Thus we can extend the partial isomorphism by matching a with c. Thus, in our maximal partial isomorphism, X'=X, and for the same reason, Y'=Y, so it is an isomorphism.

So for a strongly minimal structure, the structures satisfying the same sentences are classified by the cardinality of a basis. This isn’t quite the end of the story; in some cases, a structure with too small a basis would be finite, and we could thus distinguish it from the rest with a first-order sentence saying that there are n distinct elements (for large enough n). This isn’t the case for algebraically closed fields, which are infinite even when the empty set is a transcendence basis. But for vector spaces, the empty basis generates a one-element vector space, so an infinite vector space must have basis of size at least one.

And if the vector space is over a finite field, then its basis must be infinite. Another case where where the basis must be infinite is an infinite set. A set is a first-order structure in the language with no relations, no functions, and no constants. Every subset of a set is independent, so a basis for the set is just the entire set. In these cases where a basis must be infinite; there’s only one (up to isomorphism) countable model: the model with a countably infinite basis. You can check that both of these examples satisfy the finitely-many-types condition from the previous section for having a unique countable model.

So the general story, for a strongly minimal structure X, is that there is some n\in\mathbb{N}\cup\{\aleph_0\} such that structures satisfying the same sentences as X are classified by cardinalities that are at least n, that being the cardinality of a basis. In a countable language, the cardinality of a structure is the maximum of \aleph_0 and the cardinality of a basis, so it follows that an uncountable strongly minimal structure is isomorphic to all structures of the same cardinality satisfying the same sentences.

In the previous section, we had a converse, so you may ask, if an uncountable structure is isomorphic to all structures of the same cardinality satisfying the same sentences, is it strongly minimal? This is not quite true. For example, consider a vector space V over a countable field, where we add two unary relations W and U to the language, each of which define subspaces of V, which are disjoint and span V, and then add a unary function T to the language, which is a linear function such that T^2=id_V, and T\restriction_W is an isomorphism between W and U. Vector spaces like this are classified by the dimension of W, so there is a unique one (up to isomorphism) of any given uncountable cardinality. It is not strongly minimal because W itself is a formula picking out a set that is neither finite nor the complement of a finite set. But it is almost strongly minimal, in the sense that it is basically just the vector space W^2, and W is strongly minimal. It turns out that for any uncountable structure (in a finite or countable language) that is isomorphic to every structure of the same cardinality and satisfying the same sentences, there’s a formula defining a subset that is strongly minimal in an appropriate sense, such that the rest of the structure can be parameterized somehow using the subset.