Probabilistic Proofs of Undecidable Sentences?

22 04 2009

As I suggested (though perhaps not made as clear as I would have liked) in my forthcoming paper “Probabilistic Proofs and Transferability”, I think that probabilistic proofs can be used to provide knowledge of mathematical facts, even though there are reasons why we might think they should remain unacceptable as part of mathematical practice. (Basically, even though mathematicians gain knowledge through testimony all the time, there may be reasons for a methodological insistence that all mathematical arguments not rely essentially on testimony, the way probabilistic proofs (just like experiments in the natural sciences) do.) Given that this allows for a non-deductive means of acquiring mathematical knowledge, a natural question would be whether probabilistic proofs can be used to settle questions that are undecided by standard axiom sets. Unfortunately, I think the answer is probably “no”.

The sort of probabilistic proofs that I endorse aren’t just any old argument that isn’t deductively valid. In my paper, I contrast probabilistic proofs of the primality of a given number with certain arguments in favor of the Goldbach Conjecture and the Riemann Hypothesis, as well as lottery arguments. The primality arguments work by showing that if n is prime, then a certain result is guaranteed, while if n is composite, then the probability of the same sort of result is at most 1/4k, where k can be made as large as one wants by generating more random numbers. As a result the evidence “tracks” the truth of the claim involved (that is, the evidence is very likely if the claim is true and very unlikely if the claim is false).

In the case of the Goldbach Conjecture, one of the pieces of “probabilistic” evidence that we have is that none of the first several million even numbers is a counterexample to the claim. However, although this evidence is very likely if the claim is true, we have no good a priori bounds on how likely this evidence would be if the claim were false. Thus, the evidence may give us some reason to believe the Goldbach Conjecture, but it’s hard to see just how much more evidence we’d need in order for this to give us knowledge that the Goldbach Conjecture is in fact true.

In the case of the Riemann Hypothesis, I don’t have room (or the knowledge) to state all the relevant pieces of evidence, but in this case the evidence is more of the type of “inference to the best explanation”, but again lacks any sort of probabilistic bounds on the possibility of it arising in misleading cases. This is not to say that this type of evidence can’t provide knowledge – in fact, we may already have enough evidence to count as knowing that the Riemann Hypothesis is true, though I’m definitely not certain of this. At any rate, if this does provide knowledge, it’s of a different sort than probabilistic proof, and may well have its own problems for acceptability in the mathematical canon.

In “lottery cases”, we have strong probabilistic evidence for the claim that a particular ticket will not win (in particular, that this ticket is in a fair lottery, and that there are a billion separate tickets, exactly one of which is a winner). However, most people want to say that we don’t have knowledge of this claim. Again, at least part of the reason is that our evidence doesn’t track the truth of the claim – nothing about our evidence is sensitive to the ticket’s status, and the evidence is just as likely if the claim is false (that is, the ticket is a winner) as if it’s true (that the ticket isn’t a winner).

Therefore, it seems that for any sort of evidence that we’d count as a “probabilistic proof” that gives us knowledge of some mathematical claim, we’d want to have good bounds on the probability of getting such evidence if the claim were true, as well as good bounds on the probability of getting such evidence if the claim were false. At the moment, I can’t really conceive of ways of generating mathematical evidence where there would be clear bounds on these probabilities other than by randomly sampling numbers from a particular bounded set. (There may be possibilities of sampling from unbounded sets, provided that the sampling method doesn’t rely on a non-countably-additive probability, but it’s a bit harder to see how these sorts of things could be relevant.)

Now let’s consider some claim that is independent of Peano Arithmetic, and see if there might be any way to come up with a probabilistic technique that would give evidence for or against such a claim, that had the relevant probabilistic bounds.

Unfortunately, I don’t think there will be. Consider two models of PA – one in which the claim is true, and one in which it’s false. We can assume that one of these models is the standard model (that is, it just consists of the standard natural numbers) while the other is some end-extension of this model. If our probabilistic method involves generating natural numbers through some random process and then checking their properties, then this method will have the same probabilities of generating the same numbers if applied in either one of these models (since presumably, the method will only ever generate standard natural numbers, and not any of the extra things in the non-standard model). Thus, any result of such a test will have the same probability whether the statement is true or false, and thus can’t be sensitive in the sense demanded above.

Now, one might worry that this argument assumes that conditional probabilities will have to be probabilities given by considering a particular non-standard model. This clearly isn’t what we’re doing in the case of primality testing however – we don’t consider the probabilities of various outcomes in a model where n is prime and in a model where n is not prime, not least because we have no idea what a model of the false one would even look like.

However, I think this is what we’d have to do for statements that could conceivably be independent of PA. Consider any process that involves sampling uniformly among the integers up to k, and then performing some calculations on them. Knowing the truth of PA, we can calculate what all the possible results of such a process could be, and verify that these will be the same whether our independent statement is true or false. Since this calculation will tell us everything we could possibly gain from running the probabilistic process, and it hasn’t given us knowledge about the independent statement, the probabilistic process can’t either, since it adds no new information, and therefore no new knowledge.

If we allow for probabilistic processes that sample from all natural numbers (rather than just sampling uniformly from a bounded range of them) I think we still can’t get any relevant knowledge. Assuming that the sampling process must have some countably additive probability distribution, then for any ε there must be some sufficiently large N that the probability that the process never generates a number larger than N is at most ε. By the above argument, the process can’t tell us anything about the independent statement if it never generates a number larger than N. Therefore, the difference in probability of various outcomes given the statement and its negation can’t be any larger than ε. But since ε was arbitrary, this means that the difference in probabilities must be 0. Thus, the process still fails to meet the sensitivity requirement, and therefore can’t give knowledge about the independent statement.

Thus, we can’t get probabilistic information about independent claims. Of course, since many people will be willing to say that we have good reason to believe the Goldbach Conjecture, and the Riemann Hypothesis, on the evidence mentioned above, we may be able to get arguments of those sorts for independent claims. But such arguments aren’t really “probabilistic” in the sense standardly discussed.

Additionally, there are many claims independent of Peano Arithmetic that we can prove using standard mathematical means. Some examples include Goodstein’s Theorem, the Paris-Harrington Theorem, and Borel Determinacy. (The latter is in fact so strong that it can’t be proven in Zermelo set theory, which is ZF set theory without the Axiom of Replacement.) In fact, the arguments given above show that probabilistic methods not only fail to decide claims that are undecided in PA, but in fact fail to decide claims that are independent of substantially weaker systems, like Robinson arithmetic and probably even Primitive Recursive Arithmetic. Since basically everything that is of mathematical interest is independent of PRA (basically all it can prove is statements saying that a recursive function has a specific value on specified inputs, like “376+91=467”) this means that probabilistic proofs really can’t do much. All they can do is give feasible proofs of claims that would otherwise take a very long time to verify, as in the primality of a particular number.

Coding Truth and Satisfaction

23 07 2007

I just got back from a week visiting the Canada/USA Mathcamp, where I spent a few days teaching a class on Gödel’s theorems. I only got to the incompleteness theorems on the last day, when I introduced Gödel numbering, showed that some syntactic relations were definable, suggested that provability was therefore definable, and showed that truth (in the standard interpretation) is not definable. Thus, I didn’t get to anything like the full incompleteness theorems, but just showed that there is no recursive set of axioms such that truth in the standard interpretation corresponds to provability from those axioms.

The thing about showing that truth is not definable is that you normally go through satisfaction instead. Given the fact that syntactic relations are definable, we can define functions NUMERAL(z) (which returns the code number for the numeral expressing z) and SUBST(x,y) (which takes x as the code of a formula with one free variable, and substitutes in the term that y is the code of, and returns the code of the resulting sentence). Then SAT(x,y) (which says that x is the code of a formula satisfied by number y) can be defined in terms of TRUE(x) by TRUE(SUBST(x,NUMERAL(y))).

Then, it’s fairly straightforward to show that SAT(x,y) is not definable. If it were, then we could define \lnot SAT(x,x), which would have some code number n. But then \lnot SAT(n,n) would be true iff n didn’t satisfy \lnot SAT(x,x), which is a contradiction. Therefore, SAT(x,y) is not definable.

Afterwards, someone pointed out that this proof of the undefinability of SAT(x,y) doesn’t make any assumption about how the coding works, which seems somewhat surprising. After all, I could take highly non-effective codings, and the undefinability of SAT would still hold. This is surprising, because TRUE(x) can be defined under certain encodings. One such encoding is just to separate the sentences into true and false ones, and code them respectively by even and odd numbers, using some sort of lexicographic ordering. Then, since being even and being odd are both definable, truth would be definable as well. But it turns out that no trick like this is going to let satisfaction be definable, because of the diagonalization argument I gave above.

My guess for why this is is because satisfaction requires something like a coding of some of the syntactic structure as well as some of the semantic structure. Standard codings make the syntax definable, but then the semantics fails. Certain bizarre codings let the semantics be definable, but they make the syntax fail. But since the syntax and semantics aren’t recursive in terms of each other, there’s no way to make them both recursive, the way satisfaction seems to require. (Of course, you could probably choose some awful coding that makes both sets of relations undefinable, but why would you want to do that?)

Anyway, going back to camp is always great for precisely this sort of reason – I get to try to teach something new and interesting that I haven’t taught before, and people ask questions that also help me understand the material in a new way.

Penrose and Perry

17 07 2006

Last week, one of the campers asked me about Penrose’s arguments about the mind based on Gödel’s theorems. I wasn’t entirely sure about them, since it’s been years since I read Penrose, but I discussed it for about an hour with the camper and another staff member that was interested, and helped clarify (to myself as well) what seems to be going on here. In the process, I think I’ve found a connection between an issue of these arguments and John Perry’s “The Problem of the Essential Indexical”. (Of course, I should probably check out the book on Gödel’s theorem by the late Torkel Franzén for more about these arguments.)

Anyway, as far as I can remember (which may not be terribly accurate), Penrose’s arguments go something like the following. If humans were computational beings, then the set of mathematical truths knowable by humans would be recursively axiomatizable. The set of truths knowable by humans seems to include Peano arithmetic. Therefore, Gödel’s second incompleteness theorem tells us that the fact that this set of truths is consistent is independent of this set, and thus the consistency can’t be known. However, since we know that the truth is consistent, and we know this set of sentences to be true, this means that we know the set is consistent, which is a contradiction. Therefore, our initial assumption that humans are computational must be false.

Some versions of this argument might use some propositional attitude other than knowledge – I think those would generally be weaker arguments, because there’s no good reason to suppose that beliefs (or anything else) should be consistent. I will also bracket the contention by various non-classical logicians that the truth might not actually be consistent – I think all that we need is that 0=1 is not provable from the knowable truths, and that Gödel’s theorem tells us we can’t prove that. And besides, I can’t conceive of what it would mean for the truth to be inconsistent (except perhaps for some special cases like the liar paradox).

I think the important challenge to this argument comes when I say “we know this set of sentences to be true, [and thus] we know the set is consistent”. There seems to me to be an important ambiguity in knowing a set of sentences to be true. In particular, there is a difference between knowing, of each sentence in the set, that it is true, and knowing of the set as a whole, that all of its elements are true sentences. To know that the set is consistent, one needs the latter, rather than the former, because consistency is a property of the set as a whole, and not of its individual sentences, the way truth is.

In other words, although I might know that “the set of all mathematical truths that I know is a consistent set”, it seems plausible that if I was presented with this same set under a different mode of presentation, I might not recognize it as a consistent set. This seems to be an important step in this version of Penrose’s argument – I don’t think Gödel’s theorems would cause trouble for my knowledge of the statement “the set of all mathematical truths I know is consistent”, though it would cause trouble for any system S proving the statement “system S is consistent”, when system S is presented in some sort of transparent manner, say by listing its axioms. Presenting it as “my system” doesn’t let me prove much about its consequences, so I can know it to be consistent. Presenting it in a more extensional format lets me prove a lot about its consequences, perhaps at the cost of my knowing that it’s consistent. Penrose would need to bridge that gap in order for his argument to be valid. (At least as stated above – it seems quite plausible that he’s got a more subtle argument that gets around these points.)

Anyway, I found an interesting duality here to Perry’s point in “The Problem of the Essential Indexical”. In that paper, Perry points out the importance of learning some essentially indexical information in addition to purely descriptive information. In his example, one can be in a grocery store and see a trail of spilled sugar, and realize “someone has a leaky bag of sugar – I should go let him or her know”. After following the trail for a little bit, one realizes one is walking in a circle, and evetually gains the new information “I have a leaky bag of sugar”, at which point one can remedy the situation. This information must be presented with the indexical “I”, rather than with any description or other third-person presentation (unless one already has the connection between that description and the first person).

Conversely, in this case with the Penrose argument, it seems to me that one might have the indexical information, without the third-person description! That is, one can know “the set of mathematical truths I know is consistent” (because it is a set of truths), without knowing, “system S is consistent”, even if system S characterizes the set of mathematical truths I know. Perhaps all that’s important here is the difference between two different descriptions of the same set, but maybe there’s a role for Perry’s “essential indexical” as well, though the role is the opposite of the one he is concerned with.

Church-Turing Theses

18 04 2006

On my way back from Austin, I visited my friend Bob McGrew in Palo Alto, and we caught up with each other a bit. At a certain point, when we were talking about our respective research, the subject of the Church-Turing Thesis came up (I don’t remember how, because it’s not really directly related to what either of us does, though not terribly far from either). In our discussion, we both clarified distinct versions of the CT Thesis that the other hadn’t realized.

The basic idea is that the CT thesis is an analysis of the pre-theoretic notion of computability – in this version the thesis approximately states: A problem is intuitively computable iff it can be solved by a Turing machine. However, it’s easy to forget that this is a conceptual analysis claim (especially if one isn’t a philosopher, and thus exposed to conceptual analyses all the time). There are only a few other equally widely accepted conceptual analyses – perhaps Tarski’s account of logical consequence (which has been disputed by John Etchemendy), and maybe one or two others. But in general, these are highly contentious claims, that are often disputed, the way Gettier disputed the justified true belief account of knowledge.

Because this particular analysis has been so successful, it’s easy for computer scientists to forget that it is one, and instead think the CT thesis is the following: A problem is physically computable iff it can be solved by a Turing machine. Now that we’re used to physical computers doing this sort of stuff all the time, it seems that this is what we’re talking about – but there was a notion of what an algorithm was, or an effective process, long before we had machines to carry them out. And people can still intuively recognize whether a sequence of instructions is algorithmic without trying to reduce it to a physically programmable algorithm. However, CS texts very often say something like, “the CT thesis is a statement that is not amenable to proof or verification, but is just a matter of faith”. This statement doesn’t accurately describe this second thesis though – after all, it’s a matter of empirical physical investigation whether relativity or quantum mechanics allow for any processes that are not Turing-computable. Perhaps more relevantly, it may also be an empirical matter whether a physical device with the full strength of a Turing machine can be created! After all, a Turing machine needs an unbounded tape, and actual computers have finite memory space – and every device that can only accept or reject finitely many strings is automatically computable on just about every sense of the word (including ones much weaker than Turing-computable, like finite-state-machine recognizable, and the like).

At any rate, one might want to make the three-way identity claim: The pre-theoretic notion of computability, Turing’s account of computability, and physical computability all coincide.

What Bob pointed out to me was that many computer scientists actually subscribe to a claim that’s much stronger than these ones. Rather than focusing on the general notion of what is computable, it deals with the speed of the computability. Roughly speaking, this thesis says: Every physically realistic model of computation is polynomially reducible to every other one. I believe that this thesis has lately been threatened by the feasibility of quantum computation – although it preserves the absolute notion of computability, quantum computation allows computations that could only be done in non-deterministic polynomial time (NP) to be done in deterministic polynomial time (P). I suppose if P=NP turns out to be true, then this doesn’t damage this thesis, but since most people believe that it’s false, this seems to be evidence against this thesis.

However, there is a slightly weaker version available, that is still much stronger than the earlier ones, saying: Every mathematical model of computation from a certain list is polynomially reducible to every other one. This seems to be true with register machines, Turing machines, and apparently lots of the other formalisms that have been developed, both in the 1930’s and later. Of course, the claim can’t be strengthened too much, because an oracle for Turing computability could solve all computable problem in constant time. We also need to specify an appropriate notion of time and/or space complexity for each notion of computing. But so far, the most natural-seeming notions have ended up having this feature. Which is really quite interesting when you think about it.

More on Gödel, Structuralism, and Machines

4 06 2005

I was just talking to Fabrizio Cariani (one of the other students in my program here) about my earlier post on a version of structuralism that treats elementarily equivalent structures as the same, rather than just isomorphic ones. But he pointed out that although this would avoid the problems of the Löwenheim-Skolem theorems, it still wouldn’t avoid the problem of Gödel’s incompleteness theoresm. That is, we need to have a complete theory to specify a structure completely in this sense, but Gödel’s incompleteness theorems show that no recursive theory extending Peano arithmetic or ZFC is complete, so we can’t even pin down the natural numbers or the sets to this extent. It’s sort of nice that the platonist account of mathematical objects runs afoul of isomorphic models, the standard structuralist account runs afoul of the Löwenheim-Skolem theorems, and the “elementary equivalence structuralist” account runs afoul of the Gödel theorems.

Fabrizio had been thinking about this because he had just been looking at the result that there are continuum-many non-equivalent countable models of PA. This is because for any recursive, consistent theory T extending PA, Gödel shows that T+Con(T) and T+~Con(T) are both recursive, consistent theories extending PA. Thus, there is a complete binary tree of such theories, and by the compactness theorem, each infinitary branch of the tree is a consistent theory extending PA. Since any two such branches differ at some stage over whether they add Con(T) or ~Con(T), the models of the theories are going to be non-equivalent, so there are continuum many non-isomorphic models. (This is also the theoretical upper bound on the number of countable models for any theory in a finite language, because there are only countably many objects and tuples of objects, and for each of the finitely many predicates and relations in the language, there are two choices for each tuple of the appropriate size.)

Now, returning to yesterday’s post on mechanism, I think Lewis’ point is stronger than I had thought originally. In the comments, Peter McB suggests that there is a machine that takes any consistent, recursive theory T extending PA and outputs the sentence Con(T). Now that I think about it, I think the procedure Gödel himself used is amenable to this sort of recursive construction. Now, for any countable string of 0’s and 1’s, we can use this machine to recursively enumerate an axiomatization for some consistent extension of PA, by letting T_0 be PA, and using the machine to enumerate the axioms Con(T_i) or ~Con(T_i), depending on whether the ith bit of the string is 1 or 0. The fact that these enumerations are recursive in the string doesn’t violate Gödel’s theorem, because it just means that the theory PA + Con(PA) + Con(PA+Con(PA)) + … is itself not a complete theory. And similarly for the theory PA + ~Con(PA) + ~Con(PA+~Con(PA)) + …, and so on for any other theory given by some recursive string of 0’s and 1’s.

Now, it seems intuitive to me that no such theory is going to be complete, but Gödel’s theorem only tells us that the ones produced by recursive strings of 0’s and 1’s aren’t complete. Is it possible that some non-recursive string of 0’s and 1’s gives a complete theory? That would be weird. I suppose recursion theorists might have a better idea about this.

Lewis on Lucas on Gödel

3 06 2005

I recently bought David Lewis’ Papers in Philosophical Logic and have been reading some of the essays recently, starting with the shorter ones. I hadn’t remembered how engaging his writing style is.

Anyway, I should probably read Roger Penrose again some time, but from what I remember of his theses in The Emperor’s New Mind and Shadows of the Mind, it looks like they were pretty soundly refuted long before he wrote the books. This was by Lewis in his two papers “Lucas against Mechanism” and “Lucas against Mechanism II”. The first one just makes the point that the inference rule from a theory to its consistency statement is in general infinitary, and thus can’t clearly be run in general. But the second one faces a tougher argument, because of the way Lucas seemed to moderate his position. (I haven’t actually read Lucas, or even heard of him before reading these two papers by Lewis, but it sounds like he was advocating the same position that Penrose did, that our ability to recognize Gödel statements proves that humans are not Turing machines.)

In the second paper, Lewis phrases Lucas’ argument as a challenge to the mechanist to produce a Turing machine that generates exactly the set of truths that Lucas recognizes to be true. Lucas can then find the Gödel statement of this machine and recognize it to be true, which shows Lucas not to be this actual machine. But Lewis points out that since this allows Lucas to respond to challenges, the machines proposed must also be the sort that can respond to challenges. So if the mechanist proposes that Lucas is the machine M, then Lucas might either be responding with the Gödel number of the machine M, or with the Gödel number of the machine M would turn into in response to a challenge with the number M. But Lewis claims that there are many machines that make the former response to any true accusation that they are M, and there are many machines that (falsely) make the latter response to any true accusation that they are M, so Lucas’ ability to give such a response doesn’t prove him not to be a machine.

This is where I would like a bit more clarification. At first Lewis’ claim sounded ridiculous, because it sounded like his response to the former was a machine that takes any input and responds with the Gödel number for that input, which sounds impossible to me. (Maybe I’m wrong about that?) Now I think that he actually meant something more like a machine that gives the same response to any challenge, and this response just happens to be its own Gödel number. After all, Lewis only required it to function right on true challenges to its identity. I wish I could link to the text of the article (it’s just 4 pages), but the Canadian Journal of Philosophy seems not to have its issues from 1979 online.

(The reason I filed this under the category “set theory” is because of the last few lines of the first “Lucas against Mechanism”:

We do not know how Lucas verifies theoremhood in Lucas arithmetic, so we do not know how many of its theorems he can produce as true. He can certainly go beyond Peano arithmetic, and he is perfectly justified in claiming the right to do so. But he can go beyond Peano arithmetic and still be a machine, provided that some sort of limitations on his ability to verify theoremhood eventually leave him unable to recognize some theorem of Lucas arithmetic, and hence unwarranted in producing it as true.

This is I think exactly what set theorists are doing – trying to see just how far beyond PA (and in fact, beyond ZFC) they can go.)

Field on Consistency

2 02 2005

In “Realism, Mathematics, and Modality”, pp. 31-32, Hartry Field makes what seems to me to be a muddled argument about the consistency of set theory on platonistic grounds. He starts by suggesting that it is unproblematic that if a theory T is true, then it must be consistent. However, he then suggests that the Gödel and Henkin proofs of the completeness theorem, guaranteeing the existence of a model of T, are a sort of “accident of first order logic”, as if this were an argument for the platonist against identifying logical consistency with the existence of a model.

If S is a sentence that is false in set theory, then set theory shouldn’t imply S. But if no set – no proper part of the set-theoretical reality – could serve as a model for the entire set-theoretical reality, as seems prima facie possible, then set theory would Tarski-imply S. Again, if we are platonists we are saved from this possible extensional divergence between implication and Tarski-implication, by virtue of [the completeness theorem].

I suppose that here by “set theory” he means to refer to the collection of all true first-order statements in the language of set theory (working from a platonistic perspective). What makes this make sense is the fact that “set theory” is not recursively axiomatizable (the way ZFC or any of its standard extentions are), so Gödel’s Incompleteness Theorem doesn’t apply here. Thus, “set theory” can be a theory that implies its own consistency, and thus has models of itself inside any model. I don’t see why the Completeness Theorem is seen as somehow “accidental” and thus not reason enough for the platonist to accept the Tarski definition of implication.

Rather, what seems to me a better argument against the platonist set theorist using the Tarski definition of logical consequence (which relies on the existence of models, which are set-theoretic constructs) is that logical consequence should be prior even to set theory even for the platonist. This is certainly the case for the logicist, who says that the way we know things about the set-theoretic universe is because they are logical truths. Thus, to know that truths about sets are logical truths, we must have some way of recognizing logical truth independent of the existence of set-theoretic entities whose existence has not been justified.

I think even other non-logicist platonists would have to accept the primacy of logical consequence over set theory, because the ideas of model theory are taken to apply to models like “the actual universe of sets” rather than just set-theoretic models. This is an interesting recursivity in foundational mathematics that I have noted, that model theory talks about set-theoretic entities, while set theory assumes that we are working within a model. I think if both are taken literally, then it really is a vicious circle, but we can get out of it by making a move like the one Field argues for in a confused way, by saying that logical consequence is basic, and Tarski merely gave us an extensionally adequate analysis of it.