An Economic Argument for a Mathematical Conclusion

27 04 2008

How valuable is an income stream that pays $1000 a year in perpetuity? Naively, one might suspect that since this stream will eventually pay out arbitrarily large amounts of money, it should be worth infinitely much. But of course, this is clearly not true - for a variety of reasons, future money is not as valuable as present money. (One reason economists focus on is the fact that present money can be invested and thus become a larger amount of future money. Another reason is that one may die at any point, and thus one may not live to be able to use the future money. Yet another reason is that one’s interests and desires gradually change, so one naturally cares less about one’s future self’s purchasing power as one’s current purchasing power.) Thus, there must be some sort of discount rate. For now, let’s make the simplifying assumption that the discount rate is constant over future years, so that money in any year from now into the future is worth 1.01 times the same amount of money a year later.

Then we can calculate mathematically that the present value of an income stream of $1000 a year in perpetuity is given by the sum \frac{1000}{1.01}+\frac{1000}{1.01^2}+\frac{1000}{1.01^3}\dots. Going through the work of summing this geometric series, we find that the present value is \frac{1000/1.01}{1-1/1.01}=\frac{1000}{1.01-1}=100,000. However, there is an easier way to calculate this present value that is purely economic. The argument is not mathematically rigorous, but there are probably economic assumptions that could be used to make it so. We know that physical intuition can often suggest mathematical calculations that can later be worked out in full rigor (consider things like the Kepler conjecture on sphere packing, or the work that led to Witten’s Fields Medal) but I’m suggesting here that the same can be true for economic intuition (though of course the mathematical calculation I’m after is much simpler).

The economic argument goes as follows. If money in any year is worth 1.01 times money in the next year, then in an efficient market, there would be investments one could make that pay an interest of 1% in each year. Investing $100,000 permanently in this and taking out the interest each year gives rise to this income stream, and thus one can fairly trade $100,000 to receive this perpetual income stream, so they must be equal in value. We don’t need to sum the series at all.

Now perhaps there is a sense in which the mathematical argument given above and the economic argument given below can be translated into one another, but it’s far from clear to me. Thus, it looks like at least sometimes, economic intuition can solve mathematical problems. People often talk about the “unreasonable effectiveness of mathematics in the sciences”, but here I think I have another example of the unreasonable effectiveness of the sciences in mathematics.




Updates

24 04 2008

I suppose it’s been about four months since I last updated here - partly that’s been because I was busy with the job search, and partly it’s because I’m still finishing up my dissertation. Anyway, I’m glad to announce that I’ll be taking a tenure-track position at USC starting next year. Additionally, I’ll be spending two semesters as a post-doc in the RSSS department of philosophy at ANU, most likely from June to December of 2008 and of 2009. Part of the reason why it took so long to sort out my job situation is that I’ve been trying to make sure these two positions will be compatible. (Part of the reason was also that the tenure-track offers I did receive were all offered to other people first, who turned them down.) At any rate I’m very excited to be affiliated with both of these institutions. The other job offers I had were also quite attractive, and it was very hard to turn them down.

I’ve missed a few things in the past few months that I should mention. My second book review was published, as was my first actual paper:
Review of Jody Azzouni’s Tracking Reason
“The Role of Axioms in Mathematics”
Plus, I also got a paper accepted to Mind, and a paper (with Mark Colyvan) accepted to The Australasian Journal of Logic!

Also, since I was tagged by Shawn, here’s the 5th, 6th, and 7th sentences of p. 123 of the book that happened to be closest to me when I read his post (which is Roger Penrose’s, The Road to Reality, which I’ve been using so far to refresh my multivariable and complex analysis, and hope to eventually learn a bit of physics from).

From the complex perspective, we see that 1/z is indeed a single function. The one place where the function ‘goes wrong’ in the complex plane is the origin z=0. If we remove this one point from the complex plane, we still get a connected region.

From now on I hope I’ll be back to more regular posting.




Probabilistic Causation in Hungary

20 12 2007

Budapest is a very nice city, and this sounds like an interesting program - I’m just not yet sure whether I can plan anything for that time period, or else I would certainly apply.

Course Dates: JULY 21 - AUGUST 1, 2008
Location: Central European University (CEU), Budapest, Hungary,
Detailed course description: http://www.sun.ceu.hu/causality

Faculty: Miklos Redei, Department of Philosophy, Logic and Scientific Method, London School of Economics, UK; Nancy Cartwright, London School of Economics and Political Science, UK; Damian Fennell, London School of Economics and Political Science, UK; Gabor Hofer-Szabo, King Sigismund College, Budapest, Hungary; Ferenc Huoranszki, Central European University, Budapest, Hungary
Laszlo E. Szabo, Eötvös University, Budapest, Hungary; Richard E. Neapolitan, Northeastern Illinois University

Target group: advanced graduate students, postdoctoral fellows, junior faculty and researchers in philosophy, physics, economics and computer science
Language of instruction: English
Tuition fee: EUR 500, financial aid is available.
The application deadline: February 14, 2008 (for scholarship places), April 30, 2008 (for fee-paying applications)
Online application: http://www.sun.ceu.hu/apply (attachments to be sent by email to causality@ceu.hu).

For further information queries can be directed to the SUN office by email (summeru@ceu.hu), via skype (ceu-sun) or telephone (00-36-1-327-3811).




APA Interview Scheduling, 2007

13 12 2007

I realized from looking at the Philosophy Job Market Wiki that I had been waiting for interviews to be announced and worrying about them not being announced substantially earlier than I should have been. So I figured it would be useful to compile statistics on the time-line of when interviews were scheduled. I’ll try to keep updating this as the season goes on. I was considering compiling something like this after the APA was done, but some comments in this thread on the Philosophy Job Market Blog suggested that there would be demand for it now. I think it’ll be more useful for people in future years than for this year (since by definition, the data isn’t available until after the interviews are announced).

Also, for those paying attention in future years note that the distribution may well vary from year to year, both because weekends will fall at different points in the semester and in relation to the date of the APA, and also because there will be a different number and selection of departments hiring, and disciplinary trends will be changing. (For instance, in future years, more or fewer departments may decide to skip the APA interviews, or it might become more standard to e-mail candidates rather than call them or whatever.) Also, in each case, I’m just counting lines on the Philosophy Job Market Wiki - I know that sometimes a single line represents two or more potential hires, or that a single department may have multiple lines on the wiki that don’t all end up representing different hires, but this is the easiest way to count. “Leiter Ranked” means Leiter 2006-2008 top 54 in the US, 15 in UK, 4 in Canada, and 4 in Australasia - I know this unfairly excludes universities in Ireland and continental Europe, and is a fairly arbitrary cutoff in each region that is included, but at least it’s publicly available.

TOTAL TENURE-TRACK JOBS - 278
TOTAL LEITER RANKED - 49
TOTAL NON-US - 30

Number of jobs that first scheduled an interview (phone, APA, or otherwise) or campus visit on a given date (rows are grouped by week):

DATE: by 12/2 12/3 12/4 12/5 12/6 12/7

TOTAL TT 20 1 7 10 7 12

LEITER RANKED 5 0 1 0 0 1

NON-US 4 0 0 0 0 0

DATE: 12/8-9 12/10 12/11 12/12 12/13 12/14

TOTAL TT 4 20 13 25 25 23

LEITER RANKED 0 7 2 6 7 5

NON-US 1 3 0 1 0 0

DATE: 12/15-16 12/17 12/18 12/19 12/20 12/21 12/22-23

TOTAL TT 5 16 13 12 8 2 1

LEITER RANKED 1 2 4 3 1 0 0

NON-US 0 0 2 0 1 0 0

Not entered in Wiki as of 12/24

TOTAL TT 75

LEITER RANKED 14

NON-US 22

If there are other categories that would be easy to track that people would like tracked, mention them in the comments. I might want to track liberal arts colleges, but I won’t necessarily recognize which names to track. I may well have missed a couple international jobs too, because the name wasn’t obviously Canadian.




Probability and Bayesian Epistemology

10 12 2007

From the last Carnival of Philosophy, I found a post by another Kenny about the relation between Bayesian epistemology and probability! He puts forward three views of what this relation might be:

Here are brief definitions of each view, and how each one relates subjective degrees of rational confidence to probabilities (I will explain in more depth later).

* (P) takes subjective degrees of rational confidence as primitive. There is no state space for degrees of rational confidence, because they aren’t probabilities.
* (KPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of all epistemically possible worlds, where the epistemically possible worlds are formal constructions that may or may not be objectively possible.
* (LPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of the subset of the really possible worlds which are epistemically accessible.

However, he seems to be focused on a very particular understanding of the word “probability” that might not quite be what I would mean by it. The very fact that he talks about a relation between rational degrees of confidence and probabilities suggests that he’s understanding the word differently from how I am.

My understanding of the word is that “probability” refers to any function from a Boolean algebra to the real numbers satisfying the following three properties: (1) it is never negative; (2) the tautology is assigned value 1; (3) finite additivity (that is, given two elements whose conjunction is the contradiction, the probability of their disjunction is the sum of their probabilities). I’d also be willing to apply the term “probability” in cases where instead of a Boolean algebra in the strict mathematical sense, one uses any structure where the terms “tautology”, “conjunction”, “contradiction”, and “disjunction” have a natural interpretation.

It seems that Kenny Pearce, by contrast, understands the term to require that the algebra be an algebra of sets over some state space, and that there be some objective fact about the probability values. If this interpretation is right, then I don’t think I’d quite take any of the positions he mentions. At any rate, I think I support something more like (KPW) than the others, where “actual probabilities” isn’t taken in any objective sense. In explaining this position, I think I can give answers to three questions he raises:

1. Why should we suppose that we can use the math of probability theory in dealing with degrees of rational confidence?
2. The math of probability theory is generally interpreted in terms of sets called state spaces, but, ex hypothesi, degrees of rational confidence, not being probabilities, have no state spaces. What, then, does the math mean?
3. Why should we suppose that when an occurrence has a well defined objective probability, our subjective degree of rational confidence should be assigned a value equal to its probability?

In response to the first question, the standard answer would be to refer to something like a Dutch book argument - degrees of rational confidence can be described by the mathematics of probability theory because if degrees of confidence couldn’t, then the agent would be subject to a certain loss from a set of bets they would be willing to take, and therefore would be irrational. (There’s some slipperiness here with generating the bets from the confidences, and concluding irrationality based on a collection of bets the agent may take individually, but I think this can be cleaned up.) There’s also a host of other arguments for something like this same conclusion (though Alan Hájek raises issues for them in his (forthcoming?) “Arguments For - Or Against? - Probabilism”). As Kenny Pearce notes, nothing about these arguments requires there to be a state space, so they don’t end up being probabilities in his sense (due to Kolmogorov), but they do seem to be probabilities in the sense I use (and Popper, and Borel, and others).

As for the second question, I think that there actually is a state space that is relevant for degrees of rational confidence, which is why I lean more towards something like what Kenny Pearce calls (KPW) rather than (P). The state space here would be the set of epistemic possibilities (whatever those are - I don’t really have a good theory of them, do you?). Despite my lack of an account of them, I think they do need to play a role. I think we can’t make very good sense of the notion of a degree of confidence in p, supposing q, without a set of possibilities that we can restrict to the q-possibilities. Also, these epistemic possibilities seem to play an important role in other aspects of epistemic modality, not just degree of belief. And most importantly, I think there’s a rational difference between having an rational confidence of 0 in p and actually being certain that p will not happen. When measuring the speed of light, there’s a difference between my attitude towards it being exactly 2.9980000000000001 x 108 m/s, and my attitude towards it between 3 m/s - I consider the former possible given what I know, and the latter not. However, since there is some interval around 2.998 x 108 m/s that I can’t rule out, and there are infinitely many such values that I am indifferent between, I can’t give any of them a positive value without either violating additivity or assigning values larger than 1 to certain disjunctions. So I propose that the state space contains infinitely many epistemic possibilities, and that my degree of confidence in certain sets of these possibilities is 0, even though the set is non-empty. (Of course, for the empty set, I trivially have confidence 0 in that set of possibilities.) So I think this aspect of the math actually applies quite well to degrees of confidence, though I’m willing to concede that many people will want to challenge this point, and I don’t think it’s as important as the point that degrees of confidence must be probabilities in something like the general sense I outlined earlier.

However, I don’t think such a state space comes with objectively correct probabilities to assign - after all, it’s infinite, and Bertrand’s Paradox shows how all sorts of troubles arise when we think that symmetries of an infinite space constrain probability assignments.

As for the third question, I’m not sure I agree with its premise. I’m not totally convinced that when there is a well-defined objective probability, we should match it with our degrees of confidence. Consider a fair coin that has just been tossed. There is some sense in which it had an objective probability of 1/2 of coming up heads, so this principle would suggest having degree of belief 1/2 in heads. But if I also know that this coin was one of 10 fair coins flipped at that point, 9 of which happened to come up heads, then (in the frequency sense, as opposed to the chance sense) there is also an objective probability of 9/10 of that coin being heads up, so this principle would suggest the contradictory degree of belief of .9. Maybe in this situation one of the two principles wins out (my guess would be the latter), but I don’t really know under what circumstances something like this should be the case. Of course, I also don’t really know what sorts of objective things count as “well-defined objective probabilities” - is it chances, frequencies, or something else? There are many well-defined objective things that obey the mathematics of probability, but it’s an interesting question which (if any) should be tracked by our degrees of confidence.

Kenny Pearce suggests that on the (KPW) theory of degrees of confidence, it’s the fact that “the worlds … divide more or less evenly” that makes us assign 1/6 to each of the propositions about the way the die might land up. I don’t think there’s such thing as an objective measure over this infinite state space, so we can’t even make sense of the worlds dividing more or less evenly. Thus, if there is some objective reason for the degrees of belief we assign, I don’t know what it is yet, but I don’t think it could be anything like what Kenny Pearce suggests in either (KPW) or (LPW). (Also, I don’t think (LPW) is even a viable candidate, because this is supposed to be a theory of degree of rational belief, and actual possibilities have almost nothing to do with rational epistemic possibilities - one could try to make a modified 2-dimensionalist version of this strategy, as Frank Jackson does, but I’m not convinced that this will work.)

I think that these degrees of confidence exist, and are actually often much more precise than we realize (there’s no reason we should have transparent access to exactly what our degrees of belief are), but they’re not constitutively tied to any sort of objective probability in the sense that Kenny Pearce was expecting for a relation between Bayesian epistemology and probability. These degrees of belief are themselves probabilities, just in a different interpretation than Kenny Pearce was specifically considering.




Foundations of Category Theory

1 12 2007

Yesterday, Solomon Feferman from Stanford was up at Berkeley to give the logic colloquium, and he had a very interesting discussion of foundations for category theory. I figured I’d write up the outlines of the basic theories he discussed, because mathematicians might well be interested in how this sort of thing could work. I’d also be interested in hearing what people who actually use category theory think of all this. The issues that came up are the reason why set theory suffices as a foundation for all mathematics before category theory, but has troubles here. The basic idea is that he wants some formal framework in which one can work with categories, in a way that the following four criteria are satisfied to a sufficient degree:

The Criteria

1. There should be a category of all groups (with homomorphisms), a category of all sets (with functions), a category of all topological spaces (with continuous maps), and a category of all categories (with functors).

2. If A and B are two categories, then there should be a category consisting of all functors from A to B, with morphisms given by natural transformations.

3. Standard set-theoretic operations used throughout mathematics should be possible - things like taking unions and intersections of a set of sets, constructing powersets and ordered pairs and n-tuples and the like.

4. The framework should be provably consistent from some standard set-theoretic background. (This criterion is obviously going to seem unnecessary for someone like Lawvere, who wants to use category theory as an alternate foundation for mathematics. For someone like that, the really interesting criterion will probably be #3, because they’ll have to translate away all talk that mathematicians normally have of sets in other terms. Maybe this has been done, but I’ve had a lot of conceptual difficulty trying to read Lawvere’s textbook treatment of this stuff. At any rate, I don’t think most mathematicians will find this any more congenial than working in ZFC directly. As a result, this sort of thing wasn’t what Feferman was talking about, but he did a good job clarifying the interplay of category theory and foundational work in the more usual frameworks.)

Most of the talk was spent describing several systems that allow for some degree of satisfaction of all these criteria, but it looks like no one yet has come up with a framework that allows for full satisfaction of all of them.

The Frameworks

Mac Lane’s small and large categories

This is probably the most familiar way of talking about these issues for mathematicians. In the background, we use a standard theory to talk about sets and proper classes (Bernays-Gödel in particular). This theory gives us the full power of ZFC when talking about sets, but also allows us to talk about proper classes whose elements are sets. On this picture a category is a class of objects, together with a class of morphisms between them, satisfying the standard axioms. If both of these classes are sets, then the category is said to be small, while if either one is a proper class, then the category is said to be large.

We can see how well it fits all the criteria:

1. There are categories of all groups, of all sets, and all topological spaces, but each of these categories is itself large. There is a category of all small categories, but this category obviously doesn’t have any of the previous categories (or itself) as an object. Thus, it makes sense to talk about the fundamental group operation as a functor from the category of topological spaces to the category of groups, but this functor can’t itself be seen as a morphism in the category of categories.

(Note that these categories might be seen to leave out some members. Once we can talk about proper classes, there are some “large” groups, sets, and topological spaces, just as there are large categories. An example of a large group is Conway’s surreal numbers, which form a field, but are a proper class, so they are a large group, and large ring, as well as a large field. So we might also think there’s a problem with the category of groups since it doesn’t tell us about morphisms from arbitrary groups into the surreal numbers.)

2. If A is a small category, then functors from A to B can be represented by sets of ordered pairs (in the usual way that functions are) so that there is in fact a category containing all these functors as objects. If B is small, then this category is small, and if B is large, then this category is large as well. However, if A is a large category, then functors would themselves be proper classes, and therefore can’t be members of classes, and therefore there is no category containing these functors. This restriction may or may not be problematic. (I don’t know enough about what mathematicians actually do with categories, so I don’t know if it’s important to have this functor category between large categories.)

3. Fortunately, since the background theory is built from a standard set theory, all standard set-theoretic constructions work just fine.

4. Bernays-Gödel set theory was constructed specifically to be a conservative extension of ZF. Therefore, it is consistent iff ZF is. And just as ZFC is conservative iff ZF is, Bernays-Gödel is consistent with the Axiom of Choice iff it is consistent by itself.

Grothendieck’s “axiom of universes”

This is another background I’ve heard of before. This framework basically extends Mac Lane’s hierarchy so that instead of just small and large categories, there are infinitely many levels of categories, each of which is “small enough” to be an object in a larger category.

The official technical details are as follows. We assume standard ZF set theory, and then add the axiom that every set is contained in a universe. A universe is just a transitive set (meaning that it contains everything contained in any of its members), which also contains the powerset of any of its members, and also contains the range of any definable function applied to any set inside the universe. Basically, this just means that a universe is a standard model of ZF. In this framework, a category is just a set of objects together with a set of morphisms, following the standard axioms (none of the proper class stuff that we get with Mac Lane).

1. Now, since the collection of all groups forms a proper class, there is no category of all groups. However, for any universe U, since this universe is a set, the collection of all groups contained in U is itself a set, so we get a category of all U-groups. Similarly, for any universe U there is a category of all U-sets, all U-topological spaces, and (importantly) all U-categories.

By the Axiom of Universes, we see that every set is contained in some universe. Since every category has some underlying set, this category is itself contained in some universe U, and thus is itself an object in the category of all U-categories. None of these categories contains itself, but no matter how big the categories are you’re talking about, there is some category of categories that contains everything that big. This lets you do lots of work that couldn’t be done in the Mac Lane framework, because on his framework, large categories aren’t objects in any category, while here we can always just go up one level. (However, Mac Lane does have the advantage that there is a single category containing all groups, while here no single category does.)

2. Again, if A and B are categories in some universe U (there must be some universe containing the set {A,B}, and this universe must contain both A and B) then every function from one to the other is itself in that universe U, and thus the category of functors from A to B is itself in U. This again is an improvement over the Mac Lane situation.

3. Since we’re using standard ZF set theory, this is straightforwardly satisfied.

4. There is a slight loss over Mac Lane’s framework here. The existence of a universe implies the consistency of ZF, because the universe is itself a model of ZF. Therefore, if the consistency of ZF implied the consistency of the Axiom of Universes, then the Axiom of Universes would prove its own consistency, which is impossible by Gödel’s theorem. Since this Axiom of Universes framework was implicitly used by Wiles in his proof of Fermat’s Last Theorem, we therefore don’t yet know that ZFC is strong enough to prove the result.

However, the situation is not so bad. The existence of a universe containing a set S is just the same thing as the existence of an inaccessible cardinal with higher cardinality than S. Therefore, the Axiom of Universes is equivalent to the existence of unboundedly large inaccessible cardinals. This is provably consistent if we assume the existence of any single large cardinal that is even larger (such as a Mahlo cardinal, or a measurable cardinal), so set theorists are basically just as certain that this theory is consistent as they are of ZFC. You might have further worries about whether these cardinals actually exist (so that the system has no false premises, and therefore leads only to true conclusions), but those are the sorts of worries ordinary mathematicians like to ignore.

Feferman’s 1969 system

It turns out that Grothendieck’s system can be brought down in strength. Instead of using the Axiom of Universes, we just add a bunch of terms U1, U2, and so on (through all the ordinals, if necessary), and add some axioms. For each proposition P expressible in the language of set theory, add the axiom:
PU_i iff P
where PU_i is the same as P, with all quantifiers explicitly restricted to range only over Ui. This has the effect of making each set Ui “just like” the entire universe. Thus, we can just formulate Grothendieck’s system here. There are some slight differences: Grothendieck’s smallest universe satisfies the claim “there are no universes”, while other universes don’t - in this system, any two of these universes satisfy exactly the same sentences. But in general, those differences make this system better, because each universe represents the whole universe better than Grothendieck’s did.

However, there’s a nice advantage - this system is apparently equiconsistent with ZFC, rather than requiring the existence of additional inaccessibles. Thus, proving a theorem in this framework uses less strength than Grothendieck’s system does. I don’t know if the two systems are quite similar enough to translate the proof of FLT into this one, but that would bring down the strength needed quite a bit. So in some sense, mathematicians probably should use this system, rather than Grothendieck’s. There are still the limitations that Grothendieck has on the first two conditions, but the third and fourth are much better satisfied.

Feferman’s 1974 system using Quine’s “New Foundations”

This was the system Feferman spent the most time discussing. This one is also the most confusing, because it uses Quine’s system NF instead of a more standard set theory. The basic way that this system works is that instead of the complicated set of axioms of ZFC, we just have two very intuitive axioms. One is the axiom of extensionality, which says that there are no two distinct sets with exactly the same members. The other is a restriced axiom of comprehension. The basic axiom of comprehension just says that for every property, there is a set consisting of all things that satisfy that property. But as Russell pointed out to Frege in 1902, this full axiom is inconsistent, because it leads to Russell’s paradox of the set of all sets that don’t contain themselves. In ZFC this paradox is avoided by using several axioms to prove the existence of sets defined by various types of properties. Quine decided instead to avoid this (and related) paradoxes by restricting the types of properties that can be used to define sets. The only properties he allowed were ones that could be written correctly in a theory of types. In particular, each variable in the formula could be assigned a number, such that in x=y, x and y get the same number, while in x\in y the number assigned to y is exactly 1 greater than the number assigned to x. This prevents Russell’s paradox, because the formula \lnot(x\in x) can’t be assigned numbers in this way.

The nice thing about NF is that it allows sets to contain themselves, and it allows there to be a set of all sets (this set is just defined as the set of all things satisfying x=x). Of course, there are some very messy things that go on with the theory of cardinalities, because the powerset of the universe has to be smaller than the universe itself. There are also messy things because the standard definitions of ordered pairs and powersets and sets of functions from one set to another need to use formulas that aren’t “stratified” in the appropriate way. Feferman discussed some ways of fixing this so it could all work. A further difficulty arises in that so far no one yet knows what sort of system would be strong enough to prove that NF is consistent. However, if we allow for many objects that aren’t sets (this requires weakening the axiom of extensionality so that it only applies to things with at least one element) it turns out that this system is actually provably weaker than ZFC. (In particular, it can be proven to be equiconsistent with Zermelo set theory.) We can then define categories to be given by a set of objects and a set of morphisms.

So now we can see how this theory stands up to the criteria.

1. Since there is a set of all sets, and sets can contain themselves, it turns out that in this theory we really do get a category of all groups, a category of all sets, a category of all topological spaces, and a category of all categories that really contains itself as an object! This is the only theory we’ve listed on which this really works, and we really get all categories into a single category.

2. Again, as long as A and B are categories, we really do get a category of all functors from A to B, with all the natural transformations as morphisms (I think).

3. Here is where we really run into trouble. We’ve patched things up so that we can take unions and intersections and products and powersets and the like. However, the Axiom of Choice is provably false in this theory. Additionally, sets of functions sometimes have bad problems. Feferman said in particular that the Yoneda Lemma can’t be proven in this system, as well as some other standard things that mathematicians want to use. I don’t really know what the problems are, but these seem bad. The problem is that no one yet really understands NF well enough to know how to fix these things. Perhaps for the sake of category theory, this would be worth doing. But, Dana Scott pointed out after the talk that NF has a history of destroying careers, and it’s the only foundational-type system whose full consistency strength still hasn’t been decided, and it’s been around for 70 years.

4. As mentioned above, allowing basic elements that aren’t sets makes this theory equiconsistent with Zermelo set theory, but this consistency proof is also part of the weakness of the theory, since it can’t do as much with functions as standard set theories. If we managed to fix that part, this would presumably blow up the consistency strength much higher than Grothendieck’s system.

Conclusion

Thus, there’s a bunch of interesting ways out there to get something like what mathematicians want from category theory. However, none of these systems really gets everything that we want. The ones that we know are consistent, and we know have enough set-theoretic power for everyday mathematics, we also know can’t really talk about the category of all categories or the category of all groups. There are various replacement categories around (like the category of all U-categories, or the category of all small groups), and for many purposes these are enough. But they don’t quite get the conceptual ideas right. (One could level the same criticism directly at the set theories though - none of them really gets a set of all sets, or a class of all classes, except for the poorly understood New Foundations.)

This might motivate some further work by category theorists on foundations, and in particular something like Lawvere’s program, though his program requires a radical shift in understanding of the notions of set, function, collection, individual, and the like, so it doesn’t seem like it would be much more congenial to most mathematicians for fully formal work.

Of course, most mathematicians don’t need to do this stuff fully formally, so the situation is generally comfortable. But it’s nice to see where foundational work in set theory really could impact the lives of ordinary mathematicians.




Choice and Determinacy

15 11 2007

This is something I thought about a couple years ago when I was giving a talk to the math grad students about the Axiom of Choice and the Axiom of Determinacy. I never got around to writing up though, so I figured I’d put it here. My contention is going to be that (contrary to standard assumptions) classical mathematicians should accept Determinacy rather than Choice, and that intuitionists should actually support Choice, even though they have traditionally been some of its primary enemies.

The Axiom of Choice (AC) says that for any collection S of sets, if \forall (x\in S)\exists y(y\in x) then \exists f\forall (x\in S)(f(x)\in x). That is, for any collection of non-empty sets, there is a function that assigns an element of each set to that set.

The Axiom of Determinacy (AD) is slightly more complicated to explain. Given a set S of countable sequences of 0 and 1, we can define a game as follows - two players take turns naming either 0 or 1, and if the countable sequence produced is a member of S, then player I wins, and if not, then player II wins. AD says that for any game of this form, one of the two players has a winning strategy (i.e., a function from states that the game could be in on their turn to possible moves on that turn, such that for any sequence of moves the opponent could make, the eventual outcome is a win for this player).

It turns out that AD and AC are incompatible with one another, given the standard axioms of set theory. In particular, there is a standard proof (using AC) that some sets of real numbers are unmeasurable by Lebesgue measure, while AD actually entails that all sets of reals are Lebesgue measurable. (AD has all sorts of other nice consequences for sets of reals as well, which is why set theorists are interested in it. It basically avoids all the pathologies created by AC.) So mathematicians have to choose at most one of the two to accept.

Traditionally, intuitionists or constructivists have rejected AC, because it asserts the existence of a function that is not constructively given in any way. Even ordinary mathematicians realize that whenever AC is needed for a proof, the result is non-constructive. However, I think that intuitionists actually should accept AC. The reason is that an intuitionist shouldn’t accept a universal claim unless there’s a constructive (or uniform) way to prove every instance of that claim. Thus, if they accept the claim that \forall (x\in S)\exists y(y\in x), then it can only be because there is a constructive proof of it. But such a proof necessarily gives a function to satisfy \exists f\forall (x\in S)(f(x)\in x). Thus, although AC looks like it gives non-constructive existence proofs, it only does so because one has accepted universal claims that weren’t uniformly provable - when restricted to uniform constructions, it’s just a triviality, so the intuitionist should accept it. (The intuitionist avoids the pathological choice constructions by just denying that certain collections of sets were really uniformly proven to be non-empty.)

For the classical mathematician however, who rejects the intuitionist assumptions that proofs should be constructive, this argument for AC is unavailable. Additionally, there is a sort of argument that should support AD. Recall that AD says that for any two-player game with countably many turns where the players take turns naming 0 or 1 (or generally integers - it turns out this general version is a consequence of the 0/1 version), one of the players has a winning strategy. If the variables x_i are all understood to be restricted to {0,1}, then we can express what it means for the player I to have a winning strategy as follows:

\exists x_1 \forall x_2 \exists x_3 \forall x_4 \dots (x_1 x_2 x_3 x_4 \dots)\in S

Similarly, what it means for player II to have a winning strategy is:

\forall x_1 \exists x_2 \forall x_3 \exists x_4 \dots (x_1 x_2 x_3 x_4 \dots)\not\in S

If there were only finitely many quantifiers, then these would be real formulas. In fact, these formulas would simply be the negations of one another - classical mathematicians accept that \lnot\forall is equivalent to \exists\lnot, and vice versa. So determinacy is just a generalization of this claim to infinite sequences of quantifiers instead of finite ones, together with the assumption that for infinitely long sentences, either that sentence or its negation is true. But these are just natural generalizations of classical principles, so the classicist should accept AD rather than AC.

There are some worries of course. (Otherwise AD would be standard rather than AC!) This argument for AD looks like it should work even if the x_i aren’t restricted to being integers - however, I’m pretty sure that the version of AD where the x_i are allowed to be arbitrary ordinals is contradictory. Also, classical mathematicians have arguments for AC that rely on the fact that every possible set of objects exists, independently of any construction. Therefore, if there’s a bunch of elements (say, one from each of a collection of non-empty sets), then there is a set that contains exactly those elements, which is equivalent to AC.

But still, I think there’s something interesting to think about in this reversal.




Putnam on Indispensability

29 10 2007

I mostly wrote this post three weeks ago, when Hilary Putnam was at Berkeley to give the three Townsend lectures, but I didn’t get around to finalizing it until just now.

Putnam gave three talks, which were all quite interesting. The first discussed his own type of realism, and tried to counteract false impressions people have had of his views changing drastically with regards to this question over time. (He admitted that they had changed at a few points, but several other apparent changes were just due to some poor choice of terminology on his part. In particular, he had used the phrases “internal realism” and “scientific realism” in a few different ways by accident.)

The second discussed the Lucas/Penrose arguments trying to use Gödel’s theorems to undercut mechanism about the mind - he claimed that although these arguments couldn’t really work, there is an interesting modification that should undercut Chomsky’s thesis that there is a shared innate “scientific competence” that underlies the notion of scientific justification. If there were such an algorithm, then we could never recognize that algorithm as the true one - but since this thesis has to be an idealization (because of the infinities involved) it doesn’t make much sense to argue that our competence is represented by a Turing machine without our being able to say which one. However, I wonder if there’s still some sense to be made of the question - perhaps the idealization says that our scientific competence is given by some arithmetically definable set. But in that case, it is an interesting question whether the best such set is itself Turing computable (even though we can’t know which Turing machine gives it), or more complicated.

The third discussed his picture of “mathematics as modality”. What I found most interesting about this talk was his discussion of the so-called “Quine-Putnam indispensability argument”. He pointed out that when he gave the argument, it had a very different form and conclusion from Quine’s - while Quine used it as an argument for “realism in ontology” about mathematics (that is, that mathematical objects actually exist), Putnam claimed merely that it established “realism in truth-value”. Putnam’s argument was intended to show that there is a serious tension between scientific realism and verificationism about mathematics, on which mathematical claims don’t have truth-values unless they can be proven or disproven. The scientific realist says that there are some sets of equations (say, equations of motion, or wave-equations) that correctly describe the physical world. Now, if these equations have enough complexity (I suppose a three-body system under Newtonian gravitation suffices), then given the truth of certain initial conditions, there will be some interval (a,b) and some time t such that the question of whether one of the objects is in that interval or not at that time is undecidable. But if there is a fact of the matter for the physical claim, and the equations correctly describe the physical world, then there must be a fact of the matter about the mathematical claim, despite its being undecidable.

Putnam took care to point out that this only establishes that there is a fact of the matter about the mathematics, and not that the mathematical objects actually exist. Also, he took care to point out that he was not trying to make an epistemological point - nothing in this argument establishes that this is how we know the truth-values of mathematical claims, just that there must be such truth values. Since Quine concluded that mathematical claims must be about something (because of his picture of ontology as being given by the existential commitments of the best theory of the world), he also concluded that this is how we know which mathematical axioms are true.




Jaynes on the Indifference Principle

24 09 2007

I’ve started reading Jaynes’ book on probability theory, to get a better sense of how objective Bayesians think about things. One thing I found interesting (and a bit frustrating) was his argument for the “indifference principle”, stating that, conditional on background information that says nothing about possibility A without also saying it about possibility B, A and B must have the same probability.

The argument for this principle is quite interesting. He starts with the premise that a rational agent (or “robot”, as he often calls it) must assign probabilities to outcomes just based on the information about them, and that the probabilities should be the same in situations with identical information. Thus, if there are two propositions about which the information says nothing different, we can interchange them and end up in an identical situation to how we started, so the probabilities assigned must be the same. It’s a nice little argument, but I think it relies on a missing premise, which states that given any background information, there is a set of probabilities that it is uniquely right to assign - if many probability assignments are all allowed (as most subjective Bayesians will say), then this argument won’t entail that they all have to obey the indifference principle, as long as every permissible assignment with one asymmetry has a corresponding permissible assignment with the other asymmetry.

What makes me more suspicious about this indifference principle is how Jaynes actually goes on to use it. He says that using it requires the background information about the different propositions to actually be identical, but his very first use of it violates this condition!

Consider the traditional ‘Bernoulli urn’ of probability theory; ours is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are black, the other seven are white. We are to shake the urn and draw one ball blindfolded. The background information … consists of the statements in the last two sentences. What is the probability that we draw a black one? (p. 42)

Of course, he goes on to say that the probability is 3/10 (which is obviously the “right” answer in some sense), because “the background information is indifferent to these ten possibilities”, so each ball has probability 1/10 of being drawn, and we can add the three chances for a black ball, since the background information entails that they are mutually exclusive events.

However, it looks to me like this is a mis-application of the principle, as he has stated it. The background information is explicitly not indifferent to the ten possibilities - it says that three of the balls are black and seven are white. A strict use of the indifference principle will say that balls 1,2,3,5,8,9,10 are all equally likely, and balls 4,6,7 are equally likely, but there’s no obvious way to apply the indifference principle to compare possibilities from one set and possibilities from the other. To see why this is the case, consider the following example, which is identical in terms of information content, but gives rise to an intuition other than 3/10:

our cabinet is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are in the black drawer, the other seven are in the white drawer. We are to spin the cabinet and draw one ball blindfolded. The background information consists of the statements in the last two sentences. What is the probability that we draw one from the black drawer?

Unless our information includes something about how drawers and urns and paint and the like behave physically, there is no distinguishing between these two set-ups. However, it would seem quite odd to assign probability 3/10 in the latter set-up of drawing a ball from the black drawer - a better answer (if there is a right answer) seems like 1/2. But Jaynes seems to explicitly state that there is no information about drawers and urns in the background, since he says “the background information consists of the statements in the last two sentences”. (Something like this is exactly what changes between classical and quantum statistics of particle arrangements, so this is a relevant worry if we want to apply this objective Bayesianism to physics.)

Another way to get his answer would be to first consider the set-up where we’re told there are ten balls in the urn, and told their numbers, but not told anything about their colors. Now we see that the probability of drawing one of the balls 4,6,7 is 3/10, so when we learn that these three are black and the others are white, we conclude that the probability of getting a black ball is 3/10.

But this relies on the supposition that telling us the color of the balls has no effect on our rational degree of belief that any ball is drawn. Intuitively this seems right, but that’s only because we know how color behaves in the physical world - if it had been the size or shape, this would have been less clear, and properties about location or stickiness or solidity or whatever should clearly have changed the probabilities. Without this background information explicitly included, this update can’t work.

Additionally, there’s another way to reach this set-up from a slightly smaller set of background information. If we first just say that there is an urn with some balls in it, some of which are black and some of which are white, then the indifference principle would entail that the rational degree of belief in either black or white should be 1/2. But upon learning precisely which balls are black or white, we should somehow update our probabilities in a way that changes things - but how precisely to do this is left unspecified by the indifference principle.

So Jaynes must be implicitly appealing to some extra principles here in order to get the intuitive answers, unless he thinks the problems he set up implicitly contain more information than he has stated. If so, then he won’t be able to apply this objectively in actual physical scenarios where this background information isn’t known (which is why the experiment is being performed). This is no problem for a subjective Bayesian, because she doesn’t claim that an agent has no further information, or that there is a unique probability value that every rational agent must assign in this situation. It’s also no problem for someone who takes either a frequency or chance view of probability, since in any actual physical set-up we can assume that those numbers are well-defined, even though the agent has no access to them. (This causes problems for using frequency or chance as the sole basis of statistical inference, but that’s a different worry.) The situation seems uniquely troubling for the objective Bayesian.




Math Links Roundup

16 09 2007

First, I’ll mention that I’ve updated my blogroll - there’s been a real burst in math blogs over the summer, at least in part instigated by my friends at the Secret Blogging Seminar, but also by the spurt of Fields Medalists with blogs. (Are we up to 10% of the total number now?) I’ve also added a few philosophy blogs that I’ve been reading for a while, and a couple that I should have been reading, but of course I’m sure I’m missing others.

Anyway, there’s new math job search gossip stuff going on on the web - I think the discussion on that post is interesting and relevant across disciplines for people trying to figure out whether this is generally a good thing or not.

Tim Gowers discusses the way logarithms and other abstract things should be taught. He advocates a way that’s a bit more formalist than some others suggest, but it sounds reasonable to me. There’s also interesting discussion of formalism there in the comments, though some of it sounds more like structuralism to me. See for example Terence Tao’s comment, “I guess there is a fundamental transition in mathematical learning when one realises that what mathematical objects are (and how they are constructed) may be less important than what mathematical objects do (e.g. what properties they obey).”

Also, a discussion about the Axiom of Choice at The Everything Seminar (I may add that one to my links later too), focusing on a puzzle I first heard from my friend Lukas Biewald. There’s interesting discussion in the comments that reveals implicit ideas about platonism and formalism among mathematicians. I think the anti-platonist majority there should be a bit more careful though, because similar issues apply in arithmetic, thanks to Gödel’s results. I think we should be much more hesitant to say that the natural numbers are just something we make up than they are with the universe of ZFC (or a topos, or whatever), as I mentioned before.