Content warning: mathematical logic.
Note: This write-up consists mainly of open questions rather than results, but may contain errors anyway.
I'd like to describe a logic for talking about probabilities of logical sentences. Fix some first-order language . This logic deals with pairs , which I'm calling assertions, where is a formula and . Such a pair is to be interpreted as a claim that has probability at least .
A theory consists of a set of assertions. A model of a theory consists of a probability space whose points are -structures, such that for every assertion , , where is inner probability. I'll write for can be proved from , and for all models of are also models of .
The rules of inference are all rules where is a finite set of assertions, and is an assertion such that in all models of . Can we make an explicit finite list of inference rules that generate this logic? If not, is the set of inference rules at least recursively enumerable? (For recursive enumerability to make sense here, we need to restrict attention to probabilities in some countable dense subset of that has a natural explicit bijection with , such as .) I'm going to assume later that the set of inference rules is recursively enumerable; if it isn't, everything should still work if we use some recursively enumerable subset of the inference rules that includes all of the ones that I use.
Note that the compactness theorem fails for this logic; for example, , but no finite subset of implies , and hence .
Any classical first-order theory can be converted into a theory in this logic as .
Let be a consistent, recursively axiomatizable extension of Peano Arithmetic. By the usual sort of construction, there is a binary predicate such that for any sentence and , where is a coding of sentences with natural numbers. We have a probabilistic analog of Löb's theorem: if , then . Peano arithmetic can prove this theorem, in the sense that .
Proof: Assume . By the diagonal lemma, there is a sentence such that . If , then and , so . This shows that . By the assumption that , this implies that . By a probabilistic version of the deduction theorem, . That is, . Going back around through all that again, we get .
If we change the assumption to be that for some , then the above proof does not go through (if , then it does, because ). Is there a consistent theory extending Peano Arithmetic that proves a soundness schema about itself, , or can this be used to derive a contradiction some other way? If there is no such consistent theory, then can the soundness schema be modified so that it is consistent, while still being nontrivial? If there is such a consistent theory with a soundness schema, can the theory also be sound? That is actually several questions, because there are multiple things I could mean by "sound". The possible syntactic things "sound" could mean, in decreasing order of strictness, are: 1) The theory does not assert a positive probability to any sentence that is false in . 2) There is an upper bound below for all probabilities asserted of sentences that are false in . 3) The theory does not assert probability to any sentence that is false in .
There are also semantic versions of the above questions, which are at least as strict as their syntactic analogs, but probably aren't equivalent to them, since the compactness theorem does not hold. The semantic version of asking if the soundness schema is consistent is asking if it has a model. The first two soundness notions also have semantic analogs. 1') is a model of the theory. 2') There is a model of the theory that assigns positive probability to . I don't have a semantic version of 3, but metaphorically speaking, a semantic version of 3 should mean that there is a model that assigns nonzero probability density at , even though it might not have a point mass at .
This is somewhat similar to Definability of Truth in Probabilistic Logic. But in place of adding a probability predicate to the language, I'm only changing the metalanguage to refer to probabilities, and using this to express statements about probability in the language through conventional metamathematics. An advantage of this approach is that it's constructive. Theories with the properties described by the Christiano et al paper are unsound, so if some reasonably strong notion of soundness applies to an extension of Peano Arithmetic with the soundness schema I described, that would be another advantage of my approach.
A type of situation that this might be useful for is that when an agent is reasoning about what actions it will take in the future, it should be able to trust its future self's reasoning. An agent with the soundness schema can assume that its future self's beliefs are accurate, up to arbitrarily small loss in precision. A related type of situation is if an agent reaches some conclusion, and then writes it to external storage instead of its own memory, and later reads the claim it had written to external storage. With the soundness schema, if the agent has reason to believe that the external storage hasn't been tampered with, it can reason that since its past self had derived the claim, the claim is to be trusted arbitrarily close to as much as it would have been if the agent had remembered it internally.
First Incompleteness Theorem
For a consistent theory , say that a sentence is -measurable if there is some such that for every and for every . So -measurability essentially means that pins down the probability of the sentence. If is not -measurable, then you could say that has Knightian uncertainty about . Say that is complete if every sentence is -measurable. Essentially, complete theories assign a probability to every sentence, while incomplete theories have Knightian uncertainty.
The first incompleteness theorem (that no recursively axiomatizable extension of PA is consistent and complete) holds in this setting. In fact, for every consistent recursively axiomatizable extension of PA, there must be sentences that are given neither a nontrivial upper bound nor a nontrivial lower bound on their probability. Otherwise, we would be able to recursively separate the theorems of PA from the negations of theorems of PA, by picking some recursive enumeration of assertions of the theory, and sorting sentences by whether they are first given a nontrivial lower bound or first given a nontrivial upper bound; theorems of PA will only be given a nontrivial lower bound, and their negations will only be given a nontrivial upper bound. [Thanks to Sam Eisenstat for pointing this out; I had somehow managed not to notice this on my own.]
For an explicit example of a sentence for which no nontrivial bounds on its probability can be established, use the diagonal lemma to construct a sentence which is provably equivalent to "for every proof of for any , there is a proof of for some with smaller Gödel number."
Thus a considerable amount of Knightian uncertainty is inevitable in this framework. Dogmatic Bayesians such as myself might find this unsatisfying, but I suspect that any attempt to unify probability and first-order arithmetic will suffer similar problems.
A side note on model theory and compactness
I'm a bit unnerved about the compactness theorem failing. It occurred to me that it might be possible to fix this by letting models use hyperreal probabilities. Problem is, the hyperreals aren't complete, so the countable additivity axiom for probability measures doesn't mean anything, and it's unclear what a hyperreal-valued probability measure is. One possible solution is to drop countable additivity, and allow finitely-additive hyperreal-valued probability measures, but I'm worried that the logic might not even be sound for such models. A different possible solution to this is to take a countably complete ultrafilter on a set , and use probabilities valued in the ultrapower . Despite not being Cauchy complete, it inherits a notion of convergence of sequences from , since a sequence can be said to converge to , and this is well-defined (if is for a -large set of indices ) by countable completeness. Thus the countable additivity axiom makes sense for -valued probability measures. Allowing models to use -valued probability measures might make the compactness theorem work.