Guessing the result of infinitely many coin tosses

26 09 2008

I’ve stolen the title of a very interesting post by Andrew Bacon over at Possibly Philosophy. He considers a situation where there will be infinitely many coin flips occurring between noon and 1 pm, but instead of getting faster and faster, they’re getting slower and slower. In particular, one coin flip occurs at 1/n past noon for each integer n. He shows that there exists a strategy for guessing, where each guess may use information about how previous coin flips have come up (which sounds like it should be irrelevant, because they’re all independent 50/50 flips), such that one is guaranteed to get all but finitely many of the guesses correct. In particular, this means that although there is no first flip, there is a first flip on which one gets an incorrect guess, so up to that point one has successfully guessed the result of infinitely many independent fair coin tosses.

Check out his post to see the description of the strategy, or work it out yourself, before going on.

Anyway, I managed to figure this out, only because it’s formally identical to a game with colored hats that I had discussed earlier with math friends. But I think Andrew’s version with the coin flips is somehow even more surprising and shocking!

He begins his discussion with a discussion of a very nice paper by Tim Williamson showing that allowing infinitesimal probabilities doesn’t get us around the requirement that the probability of an infinite sequence of heads be 0. (Coincidentally, I’m currently working on a paper extending this to suggest that infinitesimal probabilities really can’t do much of anything useful for epistemological purposes.) Andrew concludes with a bit more discussion of these probabilistic issues:

Now this raises some further puzzles. For example, suppose that it’s half past twelve, and you know the representative sequence predicts that the next toss will be heads. What should your credence that it will land heads be? On the one hand it should be 1/2 since you know it’s a fair coin. But on the other hand, you know that the chance that this is one of the few times in the hour that you guess incorrectly is very small. In fact, in this scenario it is “infinitely” smaller in comparison, so that your credence in heads should be 1. So it seems this kind of reasoning violates the Principal Principle.

I was going to leave this following series of remarks as a comment, but it got bigger than I expected, and I liked this puzzle so much that I figured I should point other people to it in my own post anyway. So I don’t think that this puzzle leads to a violation of the Principal Principle. (For those who aren’t familiar, this principle comes from David Lewis’ paper “A Subjectivist’s Guide to Objective Chance” and states the seeming triviality that when one knows the objective chance of some outcome, and doesn’t know anything “inadmissible” specifically about the actual result of the process, then regardless of what other historical information one knows, one’s degree of belief in the outcome should be exactly equal to the chances. The only reason this principle is normally in dispute is in terms of figuring out what “inadmissible” means – if we can’t make sense of that concept then we can’t use the principle.)

The set of representatives involved in picking the strategy is a non-measurable set. To see this, note that the construction is basically a version of Vitali’s construction of a non-measurable set of reals. The only difference is that in the case of the reals we make use of the fact that Lebesgue measure is translation invariant – here we make use of the fact that the measure is invariant under transformations that interchange heads and tails on any set of flips. Instead of the set of rational translations, we consider the set of finite interchanges of heads and tails. (This argument is basically outlined in the paper by Kadane, Schervish, and Seidenfeld, “Statistical Implications of Finitely Additive Probability”, which I’ve been working through more carefully recently for another paper I’m working on.)
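For reference, here is the shape of the construction (my paraphrase – see Bacon’s post for the full version). Write the flips as a sequence x\in\{0,1\}^{\mathbb{N}}, with x_n the flip at 1/n past noon, and let r pick a representative from each equivalence class of sequences that agree at all but finitely many coordinates. At time 1/n the guesser has already seen the tail (x_m) for m > n, which determines the equivalence class:

```latex
x \sim y \;\iff\; \{\, n : x_n \neq y_n \,\}\ \text{is finite},
\qquad
\mathrm{guess}_n(x) \;=\; r\bigl([x]_\sim\bigr)_n
```

Since x \sim r([x]_\sim), the guesses disagree with the actual flips at only finitely many n. Picking r requires the Axiom of Choice, which is where the non-measurability comes in.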

This suggests to me that the set of situations in which you make the wrong guess on flip n may well be unmeasurable as well. (I haven’t worked out a proof of this claim though. [UPDATE: I have shown that it can sometimes be measurable depending on the strategy, and I conjecture that whenever it is measurable, it will have probability 1/2. See the comments for the argument.]) So from the outset (before the game starts) it looks like you can’t (or perhaps better, don’t) assign a credence to “I will make the wrong choice on flip n”. However, once it’s actually 1/n past twelve, you’ve got a lot more information – in particular, you know the outcomes of all the previous flips. I suspect that conditional on this information, the probability of making the wrong choice on flip n will just be 1/2, as we expect. (Of course, we have to make sense of what the right conditional probability to use here is, because we’re explicitly conditionalizing on a set of probability 0, as per Williamson.) Thus, there’s no violation of the Principal Principle when the time actually comes.

However, there does appear to be a violation of a principle called “conglomerability”, which is what my interest in the Kadane, Schervish, and Seidenfeld paper is about. One version of this principle states that if there is a partition of the probability space, and an event whose probability conditional on every event in this partition is in some interval, then the unconditional probability of that event must also be in this interval. In this case we’ve got a partition (into all the sequences of flips down to the nth) and the probability of guessing wrong conditional on each event in that partition is 1/2, and yet I’ve said that the unconditional probability is undefined, rather than 1/2.
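Stated a bit more carefully, the version of the principle at issue is the following (a standard formulation; the notation is mine):

```latex
\text{If } \{H_i\}_{i\in I} \text{ is a partition and }
P(E \mid H_i) \in [a,b] \text{ for every } i \in I,
\text{ then } P(E) \in [a,b].
```

The apparent violation above takes the H_i to be the possible sequences of flips down to the nth, with a = b = 1/2.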

I think I have a defense here though, which is part of my general defense of conglomerability against several other apparent counterexamples (including some in this paper and others by Seidenfeld, Schervish, and Kadane, as well as a paper by Arntzenius, Elga, and Hawthorne). The problem here is that the event whose unconditional probability we are interested in is unmeasurable, so it doesn’t even make sense to talk about the unconditional probability. In the case of the conditional probabilities, I’d say that strictly speaking it doesn’t make sense to talk about them either. However, conditional on the outcome of the flips down to the nth, this event is coextensive with something that is measurable, namely “the coin will come up heads” (or else perhaps, “the coin will come up tails”), because the sequence of flips down to that point determines which outcome I will guess for that flip. Thus these events have well-defined probabilities.

The only reason this situation of an apparent violation of conglomerability emerges is because we’re dealing explicitly with unmeasurable sets. However, I’m only interested in the application of probability theory to agents with a certain type of restriction on their potential mental contents. I don’t want to insist that they can only grasp finite amounts of information, because in some sense we actually do have access to infinitely many pieces of information at any time (at least, to the extent that general Chomskyan competence/performance distinctions suggest that we have the competence to deal with infinitely many propositions about things, even if in practice we never deal with extremely large sentences or numbers or whatever). However, I think it is reasonable to insist that any set that an agent can grasp not essentially depend on the Axiom of Choice for the proof that it exists. Although we can grasp infinitely much information, it’s only in cases where it’s nicely structured that we can grasp it. Thus, the strategy that gives rise to this problem is something that can’t be grasped by any agent, and therefore this problem isn’t really a problem.


Foundations of Category Theory

1 12 2007

Yesterday, Solomon Feferman from Stanford was up at Berkeley to give the logic colloquium, and he had a very interesting discussion of foundations for category theory. I figured I’d write up the outlines of the basic theories he discussed, because mathematicians might well be interested in how this sort of thing could work. I’d also be interested in hearing what people who actually use category theory think of all this. The issues that came up are the reason why set theory suffices as a foundation for all mathematics before category theory, but has troubles here. The basic idea is that he wants some formal framework for working with categories in which the following four criteria are satisfied to a sufficient degree:

The Criteria

1. There should be a category of all groups (with homomorphisms), a category of all sets (with functions), a category of all topological spaces (with continuous maps), and a category of all categories (with functors).

2. If A and B are two categories, then there should be a category consisting of all functors from A to B, with morphisms given by natural transformations.

3. Standard set-theoretic operations used throughout mathematics should be possible – things like taking unions and intersections of a set of sets, constructing powersets and ordered pairs and n-tuples and the like.

4. The framework should be provably consistent from some standard set-theoretic background. (This criterion is obviously going to seem unnecessary for someone like Lawvere, who wants to use category theory as an alternate foundation for mathematics. For someone like that, the really interesting criterion will probably be #3, because they’ll have to translate away all talk that mathematicians normally have of sets in other terms. Maybe this has been done, but I’ve had a lot of conceptual difficulty trying to read Lawvere’s textbook treatment of this stuff. At any rate, I don’t think most mathematicians will find this any more congenial than working in ZFC directly. As a result, this sort of thing wasn’t what Feferman was talking about, but he did a good job clarifying the interplay of category theory and foundational work in the more usual frameworks.)

Most of the talk was spent describing several systems that allow for some degree of satisfaction of all these criteria, but it looks like no one yet has come up with a framework that allows for full satisfaction of all of them.

The Frameworks

Mac Lane’s small and large categories

This is probably the most familiar way of talking about these issues for mathematicians. In the background, we use a standard theory to talk about sets and proper classes (Bernays-Gödel in particular). This theory gives us the full power of ZFC when talking about sets, but also allows us to talk about proper classes whose elements are sets. On this picture a category is a class of objects, together with a class of morphisms between them, satisfying the standard axioms. If both of these classes are sets, then the category is said to be small, while if either one is a proper class, then the category is said to be large.

We can see how well it fits all the criteria:

1. There are categories of all groups, of all sets, and all topological spaces, but each of these categories is itself large. There is a category of all small categories, but this category obviously doesn’t have any of the previous categories (or itself) as an object. Thus, it makes sense to talk about the fundamental group operation as a functor from the category of topological spaces to the category of groups, but this functor can’t itself be seen as a morphism in the category of categories.

(Note that these categories might be seen to leave out some members. Once we can talk about proper classes, there are some “large” groups, sets, and topological spaces, just as there are large categories. An example of a large group is Conway’s surreal numbers, which form a field but are a proper class, so they are a large group, a large ring, and a large field. So we might also think there’s a problem with the category of groups, since it doesn’t tell us about morphisms from arbitrary groups into the surreal numbers.)

2. If A is a small category, then functors from A to B can be represented by sets of ordered pairs (in the usual way that functions are) so that there is in fact a category containing all these functors as objects. If B is small, then this category is small, and if B is large, then this category is large as well. However, if A is a large category, then functors would themselves be proper classes, and therefore can’t be members of classes, and therefore there is no category containing these functors. This restriction may or may not be problematic. (I don’t know enough about what mathematicians actually do with categories, so I don’t know if it’s important to have this functor category between large categories.)

3. Fortunately, since the background theory is built from a standard set theory, all standard set-theoretic constructions work just fine.

4. Bernays-Gödel set theory was constructed specifically to be a conservative extension of ZF. Therefore, it is consistent iff ZF is. And just as ZFC is consistent iff ZF is, Bernays-Gödel is consistent with the Axiom of Choice iff it is consistent by itself.

Grothendieck’s “axiom of universes”

This is another background I’ve heard of before. This framework basically extends Mac Lane’s hierarchy so that instead of just small and large categories, there are infinitely many levels of categories, each of which is “small enough” to be an object in a larger category.

The official technical details are as follows. We assume standard ZF set theory, and then add the axiom that every set is contained in a universe. A universe is just a transitive set (meaning that it contains everything contained in any of its members), which also contains the powerset of any of its members, and also contains the range of any definable function applied to any set inside the universe. Basically, this just means that a universe is a standard model of ZF. In this framework, a category is just a set of objects together with a set of morphisms, following the standard axioms (none of the proper class stuff that we get with Mac Lane).
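In symbols, the clauses just listed come to the following (a sketch of my own formalization; the last clause is a schema, with one instance for each definable function f):

```latex
\begin{align*}
&y \in x \wedge x \in U \;\rightarrow\; y \in U && \text{(transitivity)}\\
&x \in U \;\rightarrow\; \mathcal{P}(x) \in U && \text{(powersets)}\\
&x \in U \;\rightarrow\; \{\,f(y) : y \in x\,\} \in U && \text{(ranges of definable functions)}
\end{align*}
```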

1. Now, since the collection of all groups forms a proper class, there is no category of all groups. However, for any universe U, since this universe is a set, the collection of all groups contained in U is itself a set, so we get a category of all U-groups. Similarly, for any universe U there is a category of all U-sets, all U-topological spaces, and (importantly) all U-categories.

By the Axiom of Universes, we see that every set is contained in some universe. Since every category has some underlying set, this category is itself contained in some universe U, and thus is itself an object in the category of all U-categories. None of these categories contains itself, but no matter how big the categories you’re talking about are, there is some category of categories that contains everything of that size. This lets you do lots of work that couldn’t be done in the Mac Lane framework, because on his framework large categories aren’t objects in any category, while here we can always just go up one level. (However, Mac Lane does have the advantage that there is a single category containing all groups, while here no single category does.)

2. Again, if A and B are categories in some universe U (there must be some universe containing the set {A,B}, and this universe must contain both A and B) then every function from one to the other is itself in that universe U, and thus the category of functors from A to B is itself in U. This again is an improvement over the Mac Lane situation.

3. Since we’re using standard ZF set theory, this is straightforwardly satisfied.

4. There is a slight loss over Mac Lane’s framework here. The existence of a universe implies the consistency of ZF, because the universe is itself a model of ZF. Therefore, if the consistency of ZF implied the consistency of the Axiom of Universes, then the Axiom of Universes would prove its own consistency, which is impossible by Gödel’s theorem. Since this Axiom of Universes framework was implicitly used by Wiles in his proof of Fermat’s Last Theorem, we therefore don’t yet know that ZFC is strong enough to prove the result.

However, the situation is not so bad. The existence of a universe containing a set S is just the same thing as the existence of an inaccessible cardinal with higher cardinality than S. Therefore, the Axiom of Universes is equivalent to the existence of unboundedly large inaccessible cardinals. This is provably consistent if we assume the existence of any single large cardinal that is even larger (such as a Mahlo cardinal, or a measurable cardinal), so set theorists are basically just as certain that this theory is consistent as they are of ZFC. You might have further worries about whether these cardinals actually exist (so that the system has no false premises, and therefore leads only to true conclusions), but those are the sorts of worries ordinary mathematicians like to ignore.

Feferman’s 1969 system

It turns out that Grothendieck’s system can be brought down in strength. Instead of using the Axiom of Universes, we just add a bunch of constant symbols U_1, U_2, and so on (through all the ordinals, if necessary), and add some axioms. For each sentence P expressible in the language of set theory, add the axiom:
P^{U_i} \leftrightarrow P
where P^{U_i} is the same as P, with all quantifiers explicitly restricted to range only over U_i. This has the effect of making each set U_i “just like” the entire universe. Thus, we can just formulate Grothendieck’s system here. There are some slight differences: Grothendieck’s smallest universe satisfies the claim “there are no universes”, while other universes don’t – in this system, any two of these universes satisfy exactly the same sentences. But in general, those differences make this system better, because each U_i represents the whole universe more faithfully than Grothendieck’s universes did.

However, there’s a nice advantage – this system is apparently equiconsistent with ZFC, rather than requiring the existence of additional inaccessibles. Thus, proving a theorem in this framework uses less strength than Grothendieck’s system does. I don’t know if the two systems are quite similar enough to translate the proof of FLT into this one, but that would bring down the strength needed quite a bit. So in some sense, mathematicians probably should use this system, rather than Grothendieck’s. There are still the limitations that Grothendieck has on the first two conditions, but the third and fourth are much better satisfied.

Feferman’s 1974 system using Quine’s “New Foundations”

This was the system Feferman spent the most time discussing. This one is also the most confusing, because it uses Quine’s system NF instead of a more standard set theory. The basic way that this system works is that instead of the complicated set of axioms of ZFC, we just have two very intuitive axioms. One is the axiom of extensionality, which says that there are no two distinct sets with exactly the same members. The other is a restricted axiom of comprehension. The basic axiom of comprehension just says that for every property, there is a set consisting of all things that satisfy that property. But as Russell pointed out to Frege in 1902, this full axiom is inconsistent, because it leads to Russell’s paradox of the set of all sets that don’t contain themselves. In ZFC this paradox is avoided by using several axioms to prove the existence of sets defined by various types of properties. Quine decided instead to avoid this (and related) paradoxes by restricting the types of properties that can be used to define sets. The only properties he allowed were ones that could be written correctly in a theory of types. In particular, each variable in the formula could be assigned a number, such that in x=y, x and y get the same number, while in x\in y the number assigned to y is exactly 1 greater than the number assigned to x. This prevents Russell’s paradox, because the formula \lnot(x\in x) can’t be assigned numbers in this way.
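The stratification test just described is mechanical: collect the atomic subformulas, then check that the difference constraints they impose on the type indices are consistent. A minimal sketch in Python (the representation of formulas as lists of atoms is my own):

```python
from collections import defaultdict, deque

def stratifiable(atoms):
    """Check whether a set of atomic formulas can be stratified.

    atoms: list of ('eq', x, y) or ('in', x, y) triples of variable names.
    A stratification assigns an integer type t to each variable with
    t(x) = t(y) for x = y, and t(y) = t(x) + 1 for x in y.
    """
    # Constraint graph: an edge (u, v, d) means t(v) - t(u) = d.
    graph = defaultdict(list)
    for kind, x, y in atoms:
        d = 0 if kind == 'eq' else 1
        graph[x].append((y, d))
        graph[y].append((x, -d))
    level = {}
    for start in list(graph):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, d in graph[u]:
                if v not in level:
                    level[v] = level[u] + d
                    queue.append(v)
                elif level[v] != level[u] + d:
                    return False  # inconsistent type assignment
    return True

# x in y, y in z is stratifiable: t(x)=0, t(y)=1, t(z)=2.
print(stratifiable([('in', 'x', 'y'), ('in', 'y', 'z')]))  # True
# x in x (Russell's formula) is not: t(x) = t(x) + 1 is impossible.
print(stratifiable([('in', 'x', 'x')]))  # False
```

This is just the consistency check for a system of difference constraints, done by breadth-first search on each connected component.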

The nice thing about NF is that it allows sets to contain themselves, and it allows there to be a set of all sets (this set is just defined as the set of all things satisfying x=x). Of course, there are some very messy things that go on with the theory of cardinalities, because the powerset of the universe has to be smaller than the universe itself. There are also messy things because the standard definitions of ordered pairs and powersets and sets of functions from one set to another need to use formulas that aren’t “stratified” in the appropriate way. Feferman discussed some ways of fixing this so it could all work. A further difficulty arises in that so far no one yet knows what sort of system would be strong enough to prove that NF is consistent. However, if we allow for many objects that aren’t sets (this requires weakening the axiom of extensionality so that it only applies to things with at least one element) it turns out that this system is actually provably weaker than ZFC. (In particular, it can be proven to be equiconsistent with Zermelo set theory.) We can then define categories to be given by a set of objects and a set of morphisms.

So now we can see how this theory stands up to the criteria.

1. Since there is a set of all sets, and sets can contain themselves, it turns out that in this theory we really do get a category of all groups, a category of all sets, a category of all topological spaces, and a category of all categories that really contains itself as an object! This is the only theory we’ve listed on which this really works, and we really get all categories into a single category.

2. Again, as long as A and B are categories, we really do get a category of all functors from A to B, with all the natural transformations as morphisms (I think).

3. Here is where we really run into trouble. We’ve patched things up so that we can take unions and intersections and products and powersets and the like. However, the Axiom of Choice is provably false in this theory. Additionally, sets of functions sometimes have bad problems. Feferman said in particular that the Yoneda Lemma can’t be proven in this system, as well as some other standard things that mathematicians want to use. I don’t really know what the problems are, but these seem bad. The problem is that no one yet really understands NF well enough to know how to fix these things. Perhaps for the sake of category theory, this would be worth doing. But, Dana Scott pointed out after the talk that NF has wasted a lot of people’s time, and it’s the only foundational-type system whose full consistency strength still hasn’t been decided, even though it’s been around for 70 years. [Correction: I had mischaracterized what Scott said and updated the post to reflect this. He also points out that Jensen showed that NFU, allowing urelemente, is consistent, though I’m not sure what consistency strength it has.]

4. As mentioned above, allowing basic elements that aren’t sets makes this theory equiconsistent with Zermelo set theory, but this consistency proof is also part of the weakness of the theory, since it can’t do as much with functions as standard set theories. If we managed to fix that part, this would presumably blow up the consistency strength much higher than Grothendieck’s system.


Thus, there’s a bunch of interesting ways out there to get something like what mathematicians want from category theory. However, none of these systems really gets everything that we want. The systems that we know are consistent, and that we know have enough set-theoretic power for everyday mathematics, are also ones that we know can’t really talk about the category of all categories or the category of all groups. There are various replacement categories around (like the category of all U-categories, or the category of all small groups), and for many purposes these are enough. But they don’t quite get the conceptual ideas right. (One could level the same criticism directly at the set theories though – none of them really gets a set of all sets, or a class of all classes, except for the poorly understood New Foundations.)

This might motivate some further work by category theorists on foundations, and in particular something like Lawvere’s program, though his program requires a radical shift in understanding of the notions of set, function, collection, individual, and the like, so it doesn’t seem like it would be much more congenial to most mathematicians for fully formal work.

Of course, most mathematicians don’t need to do this stuff fully formally, so the situation is generally comfortable. But it’s nice to see where foundational work in set theory really could impact the lives of ordinary mathematicians.

Choice and Determinacy

15 11 2007

This is something I thought about a couple years ago when I was giving a talk to the math grad students about the Axiom of Choice and the Axiom of Determinacy. I never got around to writing it up though, so I figured I’d put it here. My contention is going to be that (contrary to standard assumptions) classical mathematicians should accept Determinacy rather than Choice, and that intuitionists should actually support Choice, even though they have traditionally been some of its primary enemies.

The Axiom of Choice (AC) says that for any collection S of sets, if \forall (x\in S)\exists y(y\in x) then \exists f\forall (x\in S)(f(x)\in x). That is, for any collection of non-empty sets, there is a function that assigns an element of each set to that set.

The Axiom of Determinacy (AD) is slightly more complicated to explain. Given a set S of countable sequences of 0 and 1, we can define a game as follows – two players take turns naming either 0 or 1, and if the countable sequence produced is a member of S, then player I wins, and if not, then player II wins. AD says that for any game of this form, one of the two players has a winning strategy (i.e., a function from states that the game could be in on their turn to possible moves on that turn, such that for any sequence of moves the opponent could make, the eventual outcome is a win for this player).
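AD concerns games of infinite length, but its finite analogue is an easy theorem (essentially Zermelo’s): every such game of fixed finite length is determined, and backward induction computes the winner. A minimal sketch in Python (the representation of the payoff set S as a set of tuples is my own):

```python
def winner(S, n, prefix=()):
    """Return 1 if player I wins the length-n game with payoff set S
    from the given position, else 2.  Players alternate moves in {0, 1};
    player I moves at even positions and wins iff the final sequence is in S."""
    if len(prefix) == n:
        return 1 if prefix in S else 2
    mover = 1 if len(prefix) % 2 == 0 else 2
    results = [winner(S, n, prefix + (b,)) for b in (0, 1)]
    # The player to move wins iff some move leads to a position they win.
    return mover if mover in results else (2 if mover == 1 else 1)

# Player I wins by opening with 0: both continuations (0,0), (0,1) are in S.
print(winner({(0, 0), (0, 1)}, 2))  # 1
# Player II wins by mismatching player I's move.
print(winner({(0, 0), (1, 1)}, 2))  # 2
```

The point of AD is that this back-and-forth reasoning, trivial for finite games, is postulated to extend to games of countably infinite length, where no induction from the end of the game is available.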

It turns out that AD and AC are incompatible with one another, given the standard axioms of set theory. In particular, there is a standard proof (using AC) that some sets of real numbers are unmeasurable by Lebesgue measure, while AD actually entails that all sets of reals are Lebesgue measurable. (AD has all sorts of other nice consequences for sets of reals as well, which is why set theorists are interested in it. It basically avoids all the pathologies created by AC.) So mathematicians have to choose at most one of the two to accept.

Traditionally, intuitionists or constructivists have rejected AC, because it asserts the existence of a function that is not constructively given in any way. Even ordinary mathematicians realize that whenever AC is needed for a proof, the result is non-constructive. However, I think that intuitionists actually should accept AC. The reason is that an intuitionist shouldn’t accept a universal claim unless there’s a constructive (or uniform) way to prove every instance of that claim. Thus, if they accept the claim that \forall (x\in S)\exists y(y\in x), then it can only be because there is a constructive proof of it. But such a proof necessarily gives a function to satisfy \exists f\forall (x\in S)(f(x)\in x). Thus, although AC looks like it gives non-constructive existence proofs, it only does so because one has accepted universal claims that weren’t uniformly provable – when restricted to uniform constructions, it’s just a triviality, so the intuitionist should accept it. (The intuitionist avoids the pathological choice constructions by just denying that certain collections of sets were really uniformly proven to be non-empty.)
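This is essentially why the Axiom of Choice is a theorem of Martin-Löf type theory: under the BHK reading, a proof of the universal premise is already a function handing each x a witness together with a proof, and composing with the first projection yields the choice function. Schematically (in type-theoretic notation, with dependent product and sum playing the roles of \forall and \exists):

```latex
\prod_{x \in S} \sum_{y} (y \in x)
\;\longrightarrow\;
\sum_{f} \prod_{x \in S} \bigl(f(x) \in x\bigr)
```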

For the classical mathematician however, who rejects the intuitionist assumptions that proofs should be constructive, this argument for AC is unavailable. Additionally, there is a sort of argument that should support AD. Recall that AD says that for any two-player game with countably many turns where the players take turns naming 0 or 1 (or generally integers – it turns out this general version is a consequence of the 0/1 version), one of the players has a winning strategy. If the variables x_i are all understood to be restricted to {0,1}, then we can express what it means for player I to have a winning strategy as follows:

\exists x_1 \forall x_2 \exists x_3 \forall x_4 \dots (x_1 x_2 x_3 x_4 \dots)\in S

Similarly, what it means for player II to have a winning strategy is:

\forall x_1 \exists x_2 \forall x_3 \exists x_4 \dots (x_1 x_2 x_3 x_4 \dots)\not\in S

If there were only finitely many quantifiers, then these would be real formulas. In fact, these formulas would simply be the negations of one another – classical mathematicians accept that \lnot\forall is equivalent to \exists\lnot, and vice versa. So determinacy is just a generalization of this claim to infinite sequences of quantifiers instead of finite ones, together with the assumption that for infinitely long sentences, either that sentence or its negation is true. But these are just natural generalizations of classical principles, so the classicist should accept AD rather than AC.

There are some worries of course. (Otherwise AD would be standard rather than AC!) This argument for AD looks like it should work even if the x_i aren’t restricted to being integers – however, I’m pretty sure that the version of AD where the x_i are allowed to be arbitrary ordinals is contradictory. Also, classical mathematicians have arguments for AC that rely on the fact that every possible set of objects exists, independently of any construction. Therefore, if there’s a bunch of elements (say, one from each of a collection of non-empty sets), then there is a set that contains exactly those elements, which is equivalent to AC.

But still, I think there’s something interesting to think about in this reversal.

Mathematical Existence

2 05 2007

Last night, I mentioned to my roommate (who’s a mathematician) the fact that some realist philosophers of mathematics have (perhaps in a tongue-in-cheek kind of way) used the infinitude of primes to argue for the existence of mathematical entities. After all, if there are infinitely many primes, then there are primes, and primes are numbers, so there are numbers.

As might be expected, he was unimpressed, and suggested there was some sort of ambiguity in these existence claims. He wanted to push the idea that mathematically speaking, all that it means for a statement to be true (in particular, an existence statement) is that it be provable from the relevant axioms. I was eventually able to argue him out of this position, but it was harder than I expected, and made major use of the Gödel and Löwenheim-Skolem-Tarski theorems.

The picture that is appealing to the mathematician is that we’re just talking about any old structure that satisfies the axioms, just as is the case when we talk about groups, graphs, or topological spaces. However, I pointed out that the relevant axioms for the natural numbers can’t just be Peano Arithmetic, because we’ll all also accept the arithmetical sentence Con(PA), once we’ve figured out what it’s saying. This generalizes to whatever set of axioms we accept for the natural numbers. As a result, Gödel’s theorem says that we don’t have any particular recursive set of axioms in mind.

To make things worse for the person who wants to think of the natural numbers as just given by some set of axioms, there’s also the fact that we only mean to count the members of the smallest model of PA, not any model. By the LST theorem, any first-order set of axioms that could possibly characterize the natural numbers will also have uncountable models. In order to pick out the ones we really mean, we either have to accept some mathematical class of pre-existing entities (the models of PA, so we can pick out the smallest one), or accept a second-order axiomatization, or give up any distinction between standard and non-standard natural numbers.

Of course, the same argument can be run for set theory, and at this point, accepting a second-order axiomatization causes trouble. Some philosophers might argue a way around this using plural quantification, but at least on its face, it seems that second-order quantification requires a prior commitment to sets, so it can’t underlie set theory. And we really don’t want to admit non-standard models of set theory either, because they give rise to non-standard models of arithmetic, which count certain infinite elements as “natural numbers”.

So it seems, for number theory and set theory at least, the mathematician has to admit that something other than just axioms underlies what they’re talking about. I’ve argued that the existence of and agreement on these axioms means that mathematicians don’t often have to think about what’s going on there. And from what I can see so far, the mathematician could take a platonist, fictionalist, or some other standpoint without any trouble. In fact, most of the reasoning is going on at a high enough level that even if there’s some problem with the axioms (say, ZFC turns out to be inconsistent), this will have almost no effect on most other areas of mathematics, just as a discovery that quantum mechanics doesn’t correctly describe the behavior of subatomic particles would have relatively little effect on biology. But the mathematician can’t insist that existence in mathematics just means provability from the axioms. She can insist on some distinction between mathematical existence and ordinary physical existence, but it will be something like the minor distinction that the platonist recognizes, or possibly the distinction between real and fictional existence.

Is the Set Concept Intuitive?

10 12 2006

I’ve recently run into two arguments that there is no single intuition behind the notion of set. One was a talk by Tatiana Arrigoni in Berlin that I mentioned earlier, arguing that there may be multiple intuitions underlying set theory. Another was a talk by Jose Ferreiros at the Berkeley Logic Colloquium last week, arguing that there isn’t even one intuition underlying the notion of set.

Ferreiros started by pointing out that the 19th century founders of modern set theory (Cantor, Frege, and Dedekind) all seemed to be working with different notions. In particular, Cantor seemed to never consider a set as an element of a further set (his famous theorem was apparently proved about the set of functions on a given set, rather than using the modern notion of a power set). In addition, recognizing some paradoxes that would result from this theorem, he rejected the notion of a set of everything. His notion was fairly close to the intuitive notion of “collection” that we often introduce students to set theory with.

Frege (and Russell, following him) thought of sets as extensions of concepts, or as some way of dividing the universe into two parts. It’s a more “top-down” notion than the others, because of the sort of impredicativity involved in dividing a totality (including the set to be defined) into two parts. This notion is probably closest to our intuitive notion of “property”, which the set concept seems to partly inherit through the separation axiom.

Dedekind had a slightly different conception from either. I didn’t quite get the details, but it sounds like it was the only notion that was relatively close to the modern “iterative conception” championed later by Gödel (among others).

The fact that our modern notion of set has inherited properties from each of these suggests that it is not a rigorization of any of these intuitive concepts. He also gave some other arguments, based on the fact that we have intuitions about the existence of “absolutely arbitrary” subsets of any given set, which aren’t successfully cashed out in any of our axioms, but are approximated by the axioms of separation and choice.

Size of a Set

9 12 2006

My good friend Dan Korman from UT Austin has now joined the blog at Close Range, starting with a provocative post arguing that the traditional analysis of “size of a set” as “equivalence class under bijections (one-one mappings)” is incorrect. I’d like to suggest that there is no fact of the matter as to what the “size of a set” is, and that our intuitions correspond to multiple distinct notions, rather than just one notion as Dan (and those who argue in favor of the bijection analysis) suggest.

I will start with an argument from the first page of Gödel’s “What is Cantor’s Continuum Problem?”

This question [Cantor’s continuum problem], of course, could arise only after the concept of “number” had been extended to infinite sets; hence it might be doubted if this extension can be effected in a uniquely determined manner … Cantor’s definition of infinite numbers really has this character of uniqueness. For whatever “number” as applied to infinite sets may mean, we certainly want it to have the property that the number of objects belonging to some class does not change if, leaving the objects the same, one changes in any way whatsoever their properties or mutual relations (e.g., their colors or their distribution in space). From this, however, it follows at once that two sets (at least two sets of changeable objects of the space-time world) will have the same cardinal number if their elements can be brought into a one-to-one correspondence, which is Cantor’s definition of equality between numbers. For if there exists such a correspondence for two sets A and B it is possible (at least theoretically) to change the properties and relations of each element of A into those of the corresponding element of B, whereby A is transformed into a set completely indistinguishable from B, hence of the same cardinal number.

There are two premises: 1. Changing the properties of the elements of a set doesn’t change the size of the set. 2. If two sets are “completely indistinguishable”, then they have the same size. One may also count a third premise: 3. The properties and relations of any set of physical objects may be changed arbitrarily.

Gödel doesn’t quite spell out what “completely indistinguishable” means. He can’t mean just that every sentence true of the elements of one set is true of the elements of the other (i.e., that they are elementarily equivalent as models), because the Löwenheim-Skolem Theorem guarantees that for every infinite set, we can find an elementarily equivalent model of every infinite cardinality. Thus, this interpretation of the term would collapse all infinite cardinalities, and not just the bijectable ones. (Presumably, the lack of a bijection between two sets should guarantee that they’re not the same size, so that any acceptable notion of size must be at least as fine-grained as the standard one, though it’s not absolutely clear why this is intuitive.)

But assuming that being “completely indistinguishable” makes sense, the argument goes through for sets of physical objects. To extend it to arbitrary sets, we either have to assume that the properties and relations of any set of objects can be changed arbitrarily (which seems false in the case of numbers), or else strengthen the first premise to say that replacing elements of a set by other objects doesn’t change the size of the set. This may be a reasonable strengthening, but it does seem dangerously close to assuming the bijection notion of size.

But if these concerns can be fixed, then it looks like the bijection notion is the correct notion of size for infinite sets. Even otherwise, it’s probably the coarsest-grained such notion that could possibly be correct.

Dan discusses another notion that is finer grained: “If S1 is a proper subset of S2, then S1 has fewer members than S2”. This notion conflicts with the bijection notion, but it doesn’t fully specify when two sets have the same size – after all, it doesn’t say anything about two distinct singletons, so we should specify, “any two singletons have the same size”. We can extend this to a consistent notion by saying in addition: “if the set of elements in S1 but not S2 is the same size as the set of elements in S2 but not S1, then S1 and S2 have the same size”. Now these three principles yield all the standard judgments about sizes of finite sets, but still leave many pairs of infinite sets incomparable in size. This may be a flaw, but it is a flaw that the bijection theory shares if one rejects the axiom of choice. I believe this is probably the finest-grained notion that is permissible.
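For finite sets, these three principles can be checked mechanically. Here is a minimal Python sketch (the function names and implementation strategy are my own, not Dan’s) that derives “same size” and “fewer” judgments using only the principles just stated; on finite sets they recover the ordinary comparison of counts:

```python
from itertools import combinations

def same_size(a, b):
    """Derive "a and b have the same size" from the principles above:
    by the symmetric-difference principle only a - b and b - a matter,
    so discard the overlap, then repeatedly peel one element off each
    side -- each step chains the singleton principle with the
    symmetric-difference principle (plus transitivity)."""
    only_a, only_b = a - b, b - a
    if not only_a and not only_b:
        return True
    if only_a and only_b:
        x, y = next(iter(only_a)), next(iter(only_b))
        return same_size(only_a - {x}, only_b - {y})
    return False  # exactly one side has leftovers: a proper-subset situation

def fewer(a, b):
    """a has fewer members than b: after discarding the overlap, a's
    leftover has the same size as a *proper* subset of b's leftover
    (the proper-subset principle, transported along same-size)."""
    only_a, only_b = a - b, b - a
    return any(same_size(only_a, set(c))
               for r in range(len(only_b))  # r < len(only_b): proper subsets only
               for c in combinations(only_b, r))
```

On finite sets these agree with simply counting; the interest of the principles is that, unlike the bijection analysis, they leave many pairs of infinite sets incomparable.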

I’d like to suggest that these two notions are both usable notions of size for sets, and the reason we seem to have conflicting intuitions is that our intuitive judgments don’t track one rather than the other. Just as there is no single intuitive notion of set, there is no single intuitive notion of size of a set.

And this shouldn’t be too surprising – mathematicians have come up with many other notions for size of sets that are finer-grained than cardinality, some of which have strong intuitive justifications like Dan’s. For sets of points in some sort of geometric space, the next coarsest-grained notion after cardinality is that of dimension – we say that a cube is in some sense “larger” than any rectangle, even though both have the same cardinality (equal to the continuum), because it is of a higher dimension. (Of course, it’s quite unclear what our intuitions say about the relative size of a 1 mm cube and a 10 km square with absolutely no thickness.) To get even finer-grained, we can consider the notion of “measure”, which is a generalization of the notion of length/area/volume/whatever-is-appropriate-in-this-dimension. None of these notions is fine enough to say that an open interval is smaller than the corresponding closed interval (which Dan would like to say, I suppose), but they are all usable notions of size. There are some other problems some of them face, but they’re still all interesting.

I think Dan approaches this solution in the comments, but suggests that the honorific term “size” should be reserved for the finest-grained notion. I suggest instead that there’s no fact of the matter as to which should get that term, and set theorists have settled on cardinality as the basic one for technical and theoretical reasons, which are the only reasons that are applicable. However, topologists and analysts are more interested in the dimension and measure notions respectively, so these are also equally good notions. Whether there are enough technical advantages to the more fine-grained notion Dan suggests is unclear, but I don’t see that this should disqualify it from being an acceptable notion.

Set Theory and String Theory

29 10 2006

One remark that Penelope Maddy makes several times in Naturalism in Mathematics is that if the indispensability argument were really important in justifying mathematics, then set theorists should be looking to debates over quantum gravity to settle questions of new axioms. Since this doesn’t seem to be happening, she infers that the indispensability argument can’t play the role Quine and Putnam (and perhaps her earlier book?) argued that it does.

However, since the awarding of the Fields Medal to the physicist Ed Witten in 1990, it’s not totally clear that Maddy is right about this. Set theorists certainly don’t pay much attention to string theory and related theories, but other mathematicians in low-dimensional topology and algebraic geometry seem to. I don’t know much about the details, but from what I understand, physicists have conjectured some deep and interesting connections between seemingly disparate areas of mathematics, in order to explain (or predict?) particular physical phenomena. These connections have rarely been rigorously proved, but they have stimulated mathematical research both in pursuing the analogies and attempting to prove them. Although the mathematicians often find the physicists’ work frustratingly imprecise and non-rigorous, once the analogies and connections have been suggested by physicists, mathematicians get very interested as well.

If, hypothetically, one of these connections were to turn out to be independent of ZFC, I could imagine that there would at least be a certain camp among mathematicians that would take this as evidence for whatever large cardinal (or other) principle was needed to prove the connection. Set theorists themselves haven’t paid too much attention to these issues, because the interesting connections are in mathematical areas traditionally considered quite distant from set theory. Instead, they have traditionally looked at intra-set-theoretic considerations to justify large cardinals. But if it became plausible that some of these other debates would turn out to be connected, I’m sure they would start paying attention to the physics research, contrary to what Maddy suggests.

Banach-Tarski, the Axiom of Choice, and Notions of Size

1 10 2006

At the Australasian Association of Philosophy conference in July, someone raised a question about the Banach-Tarski paradox (that link actually explains and proves it quite nicely) that was supposed to be a problem for someone’s position. It was pointed out that this depended on the Axiom of Choice (and therefore perhaps wasn’t that important), but Jason Grossman said that this wasn’t quite true. In discussion afterwards, he sent me to this paper by Randall Dougherty and Michael Foreman, where the relevant results were published (back in 1994).

It turns out that this is not quite right – full Banach-Tarski-like results do in fact depend on the Axiom of Choice. (This can be seen because the Axiom of Determinacy, which is an alternative to choice that is believed to be consistent, entails that all sets of reals are Lebesgue measurable, which would mean that only sets of the same Lebesgue measure are equidecomposable.) However, Dougherty and Foreman do prove some similarly paradoxical-sounding results without choice, which is what Grossman was referring to.

Perhaps the most surprising result is that for any two closed balls in Rn, one can take a finite collection of disjoint open(!) subsets of the first and rearrange them to form a set whose closure is the other. One can even arrange things so that these sets are dense in each ball. Intuitively, there is a sense in which a dense open subset of a set is “almost equal” to that set (this is in Baire’s sense, rather than the senses of Borel or Lebesgue), and there is also a sense in which open sets are extremely well-behaved. Thus, the result shows that one can take two balls of vastly different sizes, and show that they are “almost the same” as one another, using well-behaved sets and well-behaved transformations of those sets (rotations and translations). This sounds bad – it may weaken the case that Banach-Tarski makes against the Axiom of Choice, since you can get something almost as bad even without it. However, I think this isn’t quite right. Instead, it seems to suggest both that Baire’s sense of “almost all” just isn’t as good as Lebesgue’s, and perhaps also that open sets aren’t as nice as we thought.

Baire’s idea is that a dense open subset of a nice topological space is almost all of it. (We call the complement of a dense open subset “nowhere dense”, because its closure contains no open interval – in fact it contains nothing outside of itself.) This was reinforced when he proved the Baire Category Theorem, which shows that the intersection of countably many dense open sets is itself always dense (any set containing such a countable intersection is said to be “co-meager”, and its complement, a subset of the countable union of nowhere dense sets, is said to be “meager”). Traditionally, meager sets are thought of as “almost empty” and co-meager sets are thought to contain “almost all” points. In particular, the boundary of any open or closed set is meager. A set is said to have “the property of Baire” if its symmetric difference with some open set is meager – thus, it is “almost open”.

By contrast, Lebesgue introduced a much more precise notion of measure, assigning a real number to every measurable set, and saying that sets of measure 0 are “almost empty” and their complements contain “almost all” points. In the reals, this is done by declaring the measure of any open or closed interval to be the difference between the endpoints, and then saying that this measure is additive for disjoint unions of measurable sets. (In higher dimensions, one starts with rectangular products of open or closed intervals in the reals, saying that the measure of the product is the product of the measures, and continuing as before.) Finally, any subset of a measure 0 set is also declared to be measurable, and to have measure 0. It turns out that a set is Lebesgue measurable iff its symmetric difference with some open set has measure 0, so being measurable is Lebesgue’s counterpart of the property of Baire.

It has long been known that Baire’s and Lebesgue’s notions are different. It turns out that one can find dense open sets of arbitrarily small Lebesgue measure (consider the countable collection of basic open sets with rational endpoints, and take one open subset of the nth of measure at most x/2^n, for n = 0, 1, 2, …; their union will be a dense open subset of measure at most 2x), and thus the countable intersection of some of them will be a co-meager set of measure 0, and its complement will be a meager set of full measure.
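This construction can be carried out concretely with exact rational arithmetic. The sketch below (a finite truncation of the infinite construction, with names of my own choosing) covers the rationals in (0, 1) with open intervals of shrinking length, then checks both the measure bound and density:

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Enumerate the rationals in (0, 1) in lowest terms: q = 2, 3, 4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def cover(x, n_terms):
    """Around the n-th rational, an open interval of length x/2^n
    (n = 0, 1, 2, ...).  The full (infinite) union is a dense open
    subset of (0, 1) of measure at most x * (1 + 1/2 + 1/4 + ...) = 2x."""
    intervals = []
    gen = rationals()
    for n in range(n_terms):
        q = next(gen)
        half = x / 2 ** (n + 1)          # half-width, so the length is x/2^n
        intervals.append((q - half, q + half))
    return intervals

x = Fraction(1, 100)
ivs = cover(x, 2000)
total = sum(b - a for a, b in ivs)       # ignores overlaps, so an upper bound
assert total <= 2 * x                    # tiny total measure...
# ...yet dense: every window of width 1/50 already contains some interval's center
centers = [(a + b) / 2 for a, b in ivs]
assert all(any(Fraction(k, 50) < c < Fraction(k + 1, 50) for c in centers)
           for k in range(50))
```

The density check only samples windows of width 1/50, but since every rational eventually gets covered, the full union meets every open subinterval of (0, 1).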

Now, because Lebesgue measure is countably additive, and open sets are all Lebesgue measurable, we can see what must be going on in the Dougherty-Foreman case mentioned above. The dense open sets in the decomposition must have Lebesgue measure less than that of the small ball, but their boundary must be a meager set whose Lebesgue measure makes up for the difference. Thus, although we can rearrange parts of the small ball to be dense in the large one, we haven’t filled “almost all” of it in Lebesgue’s sense. If we stick with Lebesgue measure rather than Baire-influenced notions, we don’t get the paradox unless we use the Axiom of Choice. And with the Axiom of Choice, the sets needed for the decomposition are non-Lebesgue-measurable. And as a further blow against the Baire notion, the central result of Dougherty and Foreman is a Banach-Tarski-style decomposition where every set involved has the property of Baire. Thus, sets with the property of Baire aren’t really all that nice, since we can use them to decompose small sets into large ones.

At one point in the past, it looked like Baire and Lebesgue had come across two different but equally good notions of “almost equality” for sets of reals (and sets in other topological spaces as well). However, it looks like more recent developments in set theory have shown that Lebesgue’s is in general more useful. The Banach-Tarski paradox doesn’t occur without the Axiom of Choice (or at least, some axiom on the way towards it, since we need to construct non-Lebesgue-measurable sets), but with choice we can do it with sets with the property of Baire. Something seemingly nearly as bad occurs with open sets even without choice, but it turns out that it only looks bad when using Baire’s notion of almost equality. So the Banach-Tarski paradox remains approximately as paradoxical as it always was, though we have more understanding now of it through some of its relatives.

Etchemendy on Consequence

15 05 2006

I’m really not totally sure what to make of Etchemendy’s objections to Tarski’s account of consequence in The Concept of Logical Consequence – perhaps I shouldn’t admit that while I’m TAing a class that covers this book, and grading papers about it. In general, the particular points he makes seem largely right, but I’m not really sure how they add up to the supposed conclusion. I suppose this all means that at some point I should put in some more serious study of the book. But the middle of exam week may not be the time for that.

His objections basically seem to be that Tarski’s account is either extensionally inadequate (if the universe is finite) or adequate for purely coincidental reasons; and that it doesn’t seem to be able to guarantee the right modal and epistemic features.

The worry about finiteness runs as follows – Tarski says that a sentence is a logical truth iff there is no model in which it is false. If there are only finitely many objects (including sets and models and the like), then every model has a domain which is at most a subset of this finite set, so there is some finite size n that is not achieved. Thus, any sentence that says there are at most n objects must come out logically true. However, intuitively, this is just an empirical matter, and not one that is up to logic alone, so the account must be wrong. Even worse, sentences of this sort can be expressed in a language that doesn’t even involve quantifiers or identity, so we can’t blame this on some sort of error in identifying logical constants. (One might try to sneak out of this objection by pointing out that the sentences involved always have more than n symbols – so every sentence that would exist in such a case would get the “correct” logical truth-value. However, there are sentences that are true in every finite model, but not in all models, and these raise a similar problem.)
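For a concrete instance of the parenthetical point (the example is my choice, not Etchemendy’s): the sentence below says that R is serial, irreflexive, and transitive, and it is satisfiable only in infinite models.

```latex
\varphi \;:=\; \forall x\,\exists y\,R(x,y)
   \;\wedge\; \forall x\,\neg R(x,x)
   \;\wedge\; \forall x\,\forall y\,\forall z\,
      \bigl(R(x,y)\wedge R(y,z)\rightarrow R(x,z)\bigr)
```

Its negation is therefore true in every finite model but false in \((\mathbb{N}, <)\), so on a finite ontology Tarski’s definition would wrongly count \(\neg\varphi\) as a logical truth, even though every sentence in play is short.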

However, I think this isn’t really a terrible worry – Tarski’s account of consequence (like his account of truth) makes essential use of quantification over sets. Thus, anyone who’s even prepared to consider it as an account of consequence must be making use of some sort of set theory. But just about every set theory that has been proposed guarantees the existence of infinitely many objects (of some sort or another), so we don’t need to worry about finiteness. Etchemendy suggests that this is putting the cart before the horse, in making logical consequence depend on something distinctly non-logical. But perhaps this isn’t really bad – after all, Tarski didn’t claim that his definition was logically the same as the notion of consequence, but rather that it was conceptually the same. Just because the truth of set theory isn’t logical doesn’t mean that set theory isn’t a conceptual truth – and if some set of axioms guaranteeing the existence of infinitely many objects is conceptually necessary (as neo-logicists seem to claim), then Tarski’s account could be extensionally adequate as a matter of conceptual necessity, even if not of logical necessity.

As for the requisite epistemic and modal features, there might be a bit more worry here. After all, nothing about models seems to directly indicate anything modal or epistemic. However, it does seem eminently plausible that every way things (logically) could have been would be represented by some model. In fact, we can basically prove this result for finite possible worlds using an extremely weak set theory (only an empty set, pairing, and union are needed). It seems likely that the same would obtain among the actual sets for infinite possible worlds as well. However, ZFC doesn’t seem to provide any natural way of extending this to all possible worlds – in fact, ZFC can prove that there is no model that correctly represents the actual world, because there are too many things to form a set! Fortunately, this problem doesn’t seem to arise for other set theories, like Quine’s NF, and certainly not for a Russellian type theory. And even in ZFC, the fact that Gödel could prove his completeness theorem provides some guide – any syntactically consistent set of sentences has a model, so that even if there is no model representing a particular logically possible world, there is at least a model satisfying exactly the same sentences, so that logical consequence judgements all come out right. But that’s a bit unsatisfying, seeing as how it makes the semantic notion of consequence depend on the syntactic one.

At any rate, it seems available to classical logicians to suggest that it is a matter of conceptual necessity that every way things could logically have been is adequately represented by the sets – and thus that Tarski’s account is correct and Etchemendy’s criticisms inconclusive. I’m pretty sure something like this has to be right.

Of course, the non-realist about mathematics has to give a different account of consequence (as Hartry Field tries to do starting with “On Conservativeness and Incompleteness”) – but this will just be part of paraphrasing away all the many uses we have for set theory. This one is remarkably central (especially given that linguists now suggest that something like it is at the root of all natural language semantics) and so it will be the important test case in a reduction. But the criticism will be substantially different from Etchemendy’s – before the paraphrase, the non-realist can still make the same arguments I’m suggesting above.

(Of course, if Etchemendy’s criticisms are right, they could themselves form the starting point for a useful dispensability argument for mathematical non-realism – if we don’t need sets for consequence, then the strongest indispensability consideration is gone, and we’re just left with the physical sciences, all of which seem to require something much weaker than full set theory.)

Tarski and Fraenkel-Mostowski

10 04 2006

I had a bit of a blogging hiatus last week because I was fairly busy with some talks. In particular, I gave a talk to the math graduate students about the Fraenkel-Mostowski method of proving the independence of the Axiom of Choice from a slight weakening of Zermelo-Fraenkel set theory. Coincidentally, Sol Feferman was in town giving the Tarski Lectures, and the one that day was on a topic with some similarities, namely Tarski’s notion of a logical constant.

The question of which symbols count as logical is an important one in many characterizations of the notion of logical consequence, so after discussing his notion, Tarski started work on a characterization of the logical symbols. The idea he hit upon is phrased in terms of a sort of Russellian type-theory, but it could probably be equally well formulated in something more like ZF set theory. Basically, the idea is that we start with some domain of basic objects, and then form sets of these objects, and sets of those sets, and so on. Now, we consider any permutation of the basic objects, and see what effect these permutations have on sets of various levels. Tarski’s idea was that the ones that are fixed under all permutations are the sets that can be adequately denoted by a logical symbol. For instance, the identity relation (consisting of all ordered pairs of a basic object and itself) is fixed under every permutation, as is its negation, the trivial relation that holds between no objects, and the relation that holds between any two objects. At a level one step higher, the sets of sets that are fixed include the set of all non-empty sets (which corresponds to the existential quantifier), the set consisting of just the empty set (which corresponds to the negated universal quantifier), the set consisting of all sets of cardinality k (which corresponds to a cardinality quantifier), and so on. So it seems like a pretty good match for the intuitive notion. Feferman went on to point out that this idea was further developed by a variety of people over later decades, and notably Vann McGee who advocated a slightly different characterization.
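Tarski’s criterion is easy to check by brute force on a tiny domain. In this toy sketch (my own illustration, not Tarski’s formulation), we test every binary relation on a three-element domain for invariance under all permutations; exactly the four relations mentioned above survive:

```python
from itertools import chain, combinations, permutations, product

DOM = (0, 1, 2)
PAIRS = list(product(DOM, DOM))

def image(perm, rel):
    """Image of a binary relation under a permutation of the domain."""
    return frozenset((perm[a], perm[b]) for a, b in rel)

def invariant(rel):
    """Tarski's test: fixed by every permutation of the basic objects."""
    return all(image(dict(zip(DOM, p)), rel) == rel
               for p in permutations(DOM))

# all 2^9 binary relations on the domain, as sets of ordered pairs
relations = [frozenset(c)
             for c in chain.from_iterable(combinations(PAIRS, r)
                                          for r in range(len(PAIRS) + 1))]
fixed = [r for r in relations if invariant(r)]
# the four invariant relations: empty, identity, difference, universal
identity = frozenset((a, a) for a in DOM)
```

One level up, the same test picks out families of subsets closed under permutations, which on any domain are exactly the families determined by cardinality facts – hence the existential and cardinality quantifiers pass.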

The Fraenkel-Mostowski method actually works fairly similarly. We start with a collection of basic objects (“urelements”) and construct a hierarchy of sets above them. The existence of urelements is a slight weakening of standard ZF set theory, which only allows a single object with no elements, namely the empty set. Thus, this theory is called ZFU. If we start with a model of ZFU with infinitely many urelements, then we can construct a model of ZFU that falsifies choice fairly easily. Basically, we consider all the permutations, and take just the sets that are fixed under “most” permutations, rather than the sets fixed under all of them, like in Tarski’s idea for the purely logical notions. To measure what “most” means, we say that a set is fixed by most permutations if we can choose finitely many urelements, such that any permutation fixing all of them also fixes the set in question. Of course, the finitely many urelements to be fixed will in general be different for each set in this class, which I will call “FB”, for the sets with “finite basis”.

It is straightforward to see that in FB, every set containing just urelements is either finite, or it contains all but finitely many urelements. But since the set of urelements is infinite, this is a violation of the Axiom of Choice, which is what we want. However, FB isn’t quite a model of ZFU – for instance, it includes the full powerset of the set of urelements, even though there may be sets of urelements that are not included among the sets with a finite basis. We want our model to be transitive – that is, any element of a set in the model is itself in the model. This condition is very useful for verifying the axioms of Extensionality and Foundation. So we define HFB (for “hereditarily FB”) to be the sets in FB whose elements are all in HFB – this is a recursive characterization of HFB that basically guarantees that its elements are in FB, as are their elements, and their elements, and so on. It doesn’t take too much work to prove that HFB is a model of ZFU (the hard axioms to check are powerset and replacement – some lemmas to use are the fact that any formula satisfied by a sequence of sets is also satisfied by their images under any permutation, and that a permutation applied to a set of finite basis gives another set with finite basis).
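The “finite or co-finite” fact can be illustrated with a toy computation. A real FM model needs infinitely many urelements, so this is only a finite caricature (all names are mine): with urelements 0–5 and candidate basis {0, 1}, the subsets fixed by every permutation fixing the basis pointwise are exactly those that, outside the basis, contain nothing or everything:

```python
from itertools import chain, combinations, permutations

U = set(range(6))      # a finite stand-in for an infinite set of urelements
B = {0, 1}             # the candidate finite basis

def perms_fixing(basis):
    """Permutations of U fixing every member of the basis pointwise."""
    rest = sorted(U - basis)
    for p in permutations(rest):
        m = {u: u for u in basis}
        m.update(zip(rest, p))
        yield m

def supported_by_B(s):
    """Is s fixed (setwise) by every permutation fixing B pointwise?"""
    return all({m[x] for x in s} == s for m in perms_fixing(B))

subsets = [set(c) for c in chain.from_iterable(combinations(sorted(U), r)
                                               for r in range(len(U) + 1))]
supported = [s for s in subsets if supported_by_B(s)]
# outside the basis, a supported set contains nothing or everything:
assert all(not (s - B) or s >= U - B for s in supported)
```

With infinitely many urelements the same argument shows that every FB-set of urelements is finite or co-finite, which is the violation of Choice described above.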

Thus, a very similar method gives Tarski’s account of the logical constants, and also a proof of the independence of Choice! Of course, both needed to be updated to deal with trouble cases, but it’s interesting to see that a similar start gives rise to two seemingly unrelated results.