Computer Proofs Give A Priori Knowledge

30 06 2008

I just read a very interesting paper by Tyler Burge on computer proofs (“Computer Proofs, Apriori Knowledge, and Other Minds”, Phil. Perspectives 12, 1998). I suspect that many mathematicians, as well as philosophers, will find the paper quite interesting. His major claim is that, contrary to most assumptions, computer proofs can in fact give mathematicians a priori knowledge of their theorems.

An important question here is just what it means for knowledge to be a priori. Burge defines the notion as just stating that the knowledge doesn’t depend for its justification on any sensory experience – however, he allows that a priori knowledge may depend for its possibility on sensory experience. This account allows for the knowledge that red is a color to be a priori, even though having this knowledge requires having sensory experience of red in order to have the concepts required to even formulate the idea. Burge’s main goal here is to defend a sort of rationalism, which is the claim that a lot of a priori knowledge is possible, so it’s somewhat important that his account of the a priori is broader than one might initially think. One might worry that this waters down the notion of the a priori so much as to make this form of rationalism uninteresting, but I think it still gets at some interesting distinctions. For instance, his account will end up saying that almost all mathematical knowledge is a priori (or at least, can be) while very little knowledge in the other sciences can be. This may be very interesting for those wanting to claim a principled difference between mathematics and the other sciences. (For those of you familiar with the talk I’ve been giving recently about probabilistic proofs, I suspect that in addition to a priority, the notion I call “transferability” gives mathematics a lot of its specific character, but that’s a different issue.)

The biggest premise that Burge asks us to grant is that most ordinary mathematical knowledge that doesn’t rely on computer proofs is itself a priori. In order to grant this, we have to grant that testimony can preserve a priority, since very many mathematical proofs depend on theorems or lemmas that the author has never worked through, but believes just on the basis of testimony (having skimmed a proof, or even just read a result published in another paper). For an extreme instance, consider the classification of finite simple groups – no one has worked through all the steps in the proof, yet it seems at least plausible that our knowledge of the result is a priori in some interesting sense. The sense in which Burge suggests this is a priori is that although a mathematician may depend on testimony for her knowledge of the theorem, the actual steps in the justification of the result were a priori for whoever carried them out.

Burge needs to do a lot of work to suggest that this transfer of knowledge through testimony can preserve a priority – he ends up showing that we can often get a priori justification for the claim that there are other minds by the same means. He suggests that the very fact that some part of the world appears to express a propositional content gives us some defeasible a priori reason to believe the content of that claim, and also to believe that some mind somewhere is responsible for that content, even if at some remove. (That is, although books and computers can display things expressing propositional contents despite lacking minds themselves, the only reason they normally succeed in doing this is because some mind somewhere gave them this capacity, either by directly putting the symbols on them, or causing them to shuffle symbols in intelligible ways. Although you often get significant and meaningful patterns in nature, it’s exceedingly rare that something natural appears to express a specific proposition. See Grice for more on this.)

Once we’ve got this idea that testimony can preserve a priority, it becomes more plausible to think that computer proofs can generate a priori knowledge. However, he still has to go through a lot of work to argue that we have an undefeated warrant for believing the claims of the computer. Clearly, in many cases where someone utters a sentence, we have strong reason to doubt the truth of that sentence, unless we have positive reason to believe that the person is a very skilled mathematician. In the case of something like Fermat’s Last Theorem, it seems that even that is insufficient (after all, even Fermat’s word on the theorem really doesn’t seem like sufficient grounds for knowledge). Burge needs to do some fancy footwork to argue that the means by which we build up our trust in a source of utterances can itself be a priori, since it only depends on success of apparent utterances, and not on the fact that the source of the utterances really is as it appears to be. (It doesn’t matter to me whether Andrew Wiles (whom Burge embarrassingly refers to as Michael Wiles!) is a real person, a computer, or a simulation in the Matrix – he’s done enough to prove that he’s a reliable source of knowledge of complicated mathematical statements of certain forms.)

I think in the end, Burge spends a lot of time focusing on the wrong sorts of support for the reliability of a person or computer who claims to have proved something difficult. He mostly focuses on the fact that this source has gotten many other difficult statements correct before. I think in actual mathematical practice the more important criterion is that the outline of the strategy used looks much more promising than previous attempts at the problem, and that the source has given good indication of being able to carry out particular sub-parts of the proof. Burge does deal with this sort of justification, but not in as central a way as might be desired.

So I think Burge has come up with some interesting arguments that computer proof still preserves the fact that most mathematical knowledge is a priori, even though I think he makes some errors of mathematical focus and about particular claims of mathematical history. I think his defense of computer proof also still allows for the fact that other kinds of mathematical arguments (like ones relying on DNA computing, for instance) really don’t preserve a priority. After all, in these cases, the computer isn’t going through something like a reasoning process, but rather is doing something more like an observation of an experiment. The way that most ordinary computers process data still shares abstract features of reasoning that are important for a notion of the a priori, in ways that DNA computing doesn’t. (If we didn’t grant this, then it might seem that there’s no such thing as any a priori knowledge, since we always in some sense rely on the physical reliability of our own brains – he gives some good arguments for why we should dismiss this worry.)

I think this sort of epistemology of mathematics is probably of much more practical interest for mathematicians than the standard questions of ontology and logic that more traditional philosophy of mathematics deals with.





An Economic Argument for a Mathematical Conclusion

27 04 2008

How valuable is an income stream that pays $1000 a year in perpetuity? Naively, one might suspect that since this stream will eventually pay out arbitrarily large amounts of money, it should be worth infinitely much. But of course, this is clearly not true – for a variety of reasons, future money is not as valuable as present money. (One reason economists focus on is the fact that present money can be invested and thus become a larger amount of future money. Another reason is that one may die at any point, and thus one may not live to be able to use the future money. Yet another reason is that one’s interests and desires gradually change, so one naturally cares less about one’s future self’s purchasing power than about one’s current purchasing power.) Thus, there must be some sort of discount rate. For now, let’s make the simplifying assumption that the discount rate is constant over future years, so that money in any year from now into the future is worth 1.01 times the same amount of money a year later.

Then we can calculate mathematically that the present value of an income stream of $1000 a year in perpetuity is given by the sum \frac{1000}{1.01}+\frac{1000}{1.01^2}+\frac{1000}{1.01^3}+\dots. Going through the work of summing this geometric series, we find that the present value is \frac{1000/1.01}{1-1/1.01}=\frac{1000}{1.01-1}=100,000. However, there is an easier way to calculate this present value that is purely economic. The argument is not mathematically rigorous, but there are probably economic assumptions that could be used to make it so. We know that physical intuition can often suggest mathematical calculations that can later be worked out in full rigor (consider things like the Kepler conjecture on sphere packing, or the work that led to Witten’s Fields Medal), but I’m suggesting here that the same can be true for economic intuition (though of course the mathematical calculation I’m after is much simpler).
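As a quick check on that arithmetic (a minimal numerical sketch; the $1000 payment and the 1.01 discount factor are just the figures from the example above), one can compare a long partial sum of the series with the closed form:

```python
# Present value of $1000 a year in perpetuity, with money in any year worth
# 1.01 times the same amount of money a year later.

payment = 1000.0
discount = 1.01

# Partial sum of the geometric series sum_{t>=1} payment / discount**t,
# truncated after enough terms that the remaining tail is negligible.
partial_sum = sum(payment / discount**t for t in range(1, 5000))

# Closed form: (payment/discount) / (1 - 1/discount) = payment / (discount - 1).
closed_form = payment / (discount - 1)

print(round(partial_sum, 2))   # ~100000.0
print(round(closed_form, 2))   # ~100000.0
```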

The economic argument goes as follows. If money in any year is worth 1.01 times money in the next year, then in an efficient market, there would be investments one could make that pay an interest of 1% in each year. Investing $100,000 permanently in this and taking out the interest each year gives rise to this income stream, and thus one can fairly trade $100,000 to receive this perpetual income stream, so they must be equal in value. We don’t need to sum the series at all.

Now perhaps there is a sense in which the mathematical argument and the economic argument given above can be translated into one another, but it’s far from clear to me. Thus, it looks like at least sometimes, economic intuition can solve mathematical problems. People often talk about the “unreasonable effectiveness of mathematics in the sciences”, but here I think I have another example of the unreasonable effectiveness of the sciences in mathematics.





Updates

24 04 2008

I suppose it’s been about four months since I last updated here – partly that’s been because I was busy with the job search, and partly it’s because I’m still finishing up my dissertation. Anyway, I’m glad to announce that I’ll be taking a tenure-track position at USC starting next year. Additionally, I’ll be spending two semesters as a post-doc in the RSSS department of philosophy at ANU, most likely from June to December of 2008 and of 2009. Part of the reason why it took so long to sort out my job situation is that I’ve been trying to make sure these two positions will be compatible. (Part of the reason was also that the tenure-track offers I did receive were all offered to other people first, who turned them down.) At any rate I’m very excited to be affiliated with both of these institutions. The other job offers I had were also quite attractive, and it was very hard to turn them down.

I’ve missed a few things in the past few months that I should mention. My second book review was published, as was my first actual paper:
Review of Jody Azzouni’s Tracking Reason
“The Role of Axioms in Mathematics”
Plus, I also got a paper accepted to Mind, and a paper (with Mark Colyvan) accepted to The Australasian Journal of Logic!

Also, since I was tagged by Shawn, here are the 5th, 6th, and 7th sentences of p. 123 of the book that happened to be closest to me when I read his post (which is Roger Penrose’s The Road to Reality, which I’ve been using so far to refresh my multivariable and complex analysis, and hope to eventually learn a bit of physics from).

From the complex perspective, we see that 1/z is indeed a single function. The one place where the function ‘goes wrong’ in the complex plane is the origin z=0. If we remove this one point from the complex plane, we still get a connected region.

From now on I hope I’ll be back to more regular posting.





Probabilistic Causation in Hungary

20 12 2007

Budapest is a very nice city, and this sounds like an interesting program – I’m just not yet sure whether I can plan anything for that time period, or else I would certainly apply.

Course Dates: JULY 21 – AUGUST 1, 2008
Location: Central European University (CEU), Budapest, Hungary
Detailed course description: http://www.sun.ceu.hu/causality

Faculty: Miklos Redei, Department of Philosophy, Logic and Scientific Method, London School of Economics, UK; Nancy Cartwright, London School of Economics and Political Science, UK; Damian Fennell, London School of Economics and Political Science, UK; Gabor Hofer-Szabo, King Sigismund College, Budapest, Hungary; Ferenc Huoranszki, Central European University, Budapest, Hungary; Laszlo E. Szabo, Eötvös University, Budapest, Hungary; Richard E. Neapolitan, Northeastern Illinois University

Target group: advanced graduate students, postdoctoral fellows, junior faculty and researchers in philosophy, physics, economics and computer science
Language of instruction: English
Tuition fee: EUR 500; financial aid is available.
The application deadline: February 14, 2008 (for scholarship places), April 30, 2008 (for fee-paying applications)
Online application: http://www.sun.ceu.hu/apply (attachments to be sent by email to causality@ceu.hu).

For further information queries can be directed to the SUN office by email (summeru@ceu.hu), via skype (ceu-sun) or telephone (00-36-1-327-3811).





APA Interview Scheduling, 2007

13 12 2007

I realized from looking at the Philosophy Job Market Wiki that I had been waiting for interviews to be announced and worrying about them not being announced substantially earlier than I should have been. So I figured it would be useful to compile statistics on the time-line of when interviews were scheduled. I’ll try to keep updating this as the season goes on. I was considering compiling something like this after the APA was done, but some comments in this thread on the Philosophy Job Market Blog suggested that there would be demand for it now. I think it’ll be more useful for people in future years than for this year (since by definition, the data isn’t available until after the interviews are announced).

Also, for those paying attention in future years, note that the distribution may well vary from year to year, both because weekends will fall at different points in the semester and in relation to the date of the APA, and also because there will be a different number and selection of departments hiring, and disciplinary trends will be changing. (For instance, in future years, more or fewer departments may decide to skip the APA interviews, or it might become more standard to e-mail candidates rather than call them or whatever.) Also, in each case, I’m just counting lines on the Philosophy Job Market Wiki – I know that sometimes a single line represents two or more potential hires, or that a single department may have multiple lines on the wiki that don’t all end up representing different hires, but this is the easiest way to count. “Leiter Ranked” means Leiter 2006-2008 top 54 in the US, 15 in the UK, 4 in Canada, and 4 in Australasia – I know this unfairly excludes universities in Ireland and continental Europe, and is a fairly arbitrary cutoff in each region that is included, but at least it’s publicly available.

TOTAL TENURE-TRACK JOBS – 278
TOTAL LEITER RANKED – 49
TOTAL NON-US – 30

Number of jobs that first scheduled an interview (phone, APA, or otherwise) or campus visit on a given date (rows are grouped by week):

DATE:            by 12/2   12/3   12/4   12/5   12/6   12/7
TOTAL TT              20      1      7     10      7     12
LEITER RANKED          5      0      1      0      0      1
NON-US                 4      0      0      0      0      0

DATE:            12/8-9   12/10   12/11   12/12   12/13   12/14
TOTAL TT              4      20      13      25      25      23
LEITER RANKED         0       7       2       6       7       5
NON-US                1       3       0       1       0       0

DATE:            12/15-16   12/17   12/18   12/19   12/20   12/21   12/22-23
TOTAL TT                5      16      13      12       8       2          1
LEITER RANKED           1       2       4       3       1       0          0
NON-US                  0       0       2       0       1       0          0

Not entered in Wiki as of 12/24:
TOTAL TT          75
LEITER RANKED     14
NON-US            22

If there are other categories that would be easy to track that people would like tracked, mention them in the comments. I might want to track liberal arts colleges, but I won’t necessarily recognize which names to track. I may well have missed a couple international jobs too, because the name wasn’t obviously Canadian.





Probability and Bayesian Epistemology

10 12 2007

From the last Carnival of Philosophy, I found a post by another Kenny about the relation between Bayesian epistemology and probability! He puts forward three views of what this relation might be:

Here are brief definitions of each view, and how each one relates subjective degrees of rational confidence to probabilities (I will explain in more depth later).

* (P) takes subjective degrees of rational confidence as primitive. There is no state space for degrees of rational confidence, because they aren’t probabilities.
* (KPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of all epistemically possible worlds, where the epistemically possible worlds are formal constructions that may or may not be objectively possible.
* (LPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of the subset of the really possible worlds which are epistemically accessible.

However, he seems to be focused on a very particular understanding of the word “probability” that might not quite be what I would mean by it. The very fact that he talks about a relation between rational degrees of confidence and probabilities suggests that he’s understanding the word differently from how I am.

My understanding of the word is that “probability” refers to any function from a Boolean algebra to the real numbers satisfying the following three properties: (1) it is never negative; (2) the tautology is assigned value 1; (3) finite additivity (that is, given two elements whose conjunction is the contradiction, the probability of their disjunction is the sum of their probabilities). I’d also be willing to apply the term “probability” in cases where instead of a Boolean algebra in the strict mathematical sense, one uses any structure where the terms “tautology”, “conjunction”, “contradiction”, and “disjunction” have a natural interpretation.
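To make that definition concrete, here is a toy check of the three conditions (my own sketch, using the Boolean algebra of all subsets of a three-element set as the simplest case, with conjunction, disjunction, tautology and contradiction read as intersection, union, the whole space and the empty set):

```python
from itertools import combinations

# The Boolean algebra of all subsets of a small space (frozensets so that
# events can be used as dictionary keys).
space = frozenset({'a', 'b', 'c'})
algebra = [frozenset(s) for r in range(len(space) + 1)
           for s in combinations(sorted(space), r)]

# A candidate "probability", determined here by weights on the three points.
weights = {'a': 0.5, 'b': 0.3, 'c': 0.2}
P = {event: sum(weights[x] for x in event) for event in algebra}

def is_probability(P, algebra, top):
    # (1) never negative
    if any(P[e] < 0 for e in algebra):
        return False
    # (2) the tautology (here, the whole space) is assigned value 1
    if abs(P[top] - 1) > 1e-9:
        return False
    # (3) finite additivity: if the conjunction of two elements is the
    # contradiction (empty set), their disjunction gets the sum of their values
    for e in algebra:
        for f in algebra:
            if not (e & f) and abs(P[e | f] - (P[e] + P[f])) > 1e-9:
                return False
    return True

print(is_probability(P, algebra, space))  # True
```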

It seems that Kenny Pearce, by contrast, understands the term to require that the algebra be an algebra of sets over some state space, and that there be some objective fact about the probability values. If this interpretation is right, then I don’t think I’d quite take any of the positions he mentions. At any rate, I think I support something more like (KPW) than the others, where “actual probabilities” isn’t taken in any objective sense. In explaining this position, I think I can give answers to three questions he raises:

1. Why should we suppose that we can use the math of probability theory in dealing with degrees of rational confidence?
2. The math of probability theory is generally interpreted in terms of sets called state spaces, but, ex hypothesi, degrees of rational confidence, not being probabilities, have no state spaces. What, then, does the math mean?
3. Why should we suppose that when an occurrence has a well defined objective probability, our subjective degree of rational confidence should be assigned a value equal to its probability?

In response to the first question, the standard answer would be to refer to something like a Dutch book argument – degrees of rational confidence must be describable by the mathematics of probability theory, because if an agent’s degrees of confidence couldn’t be so described, then the agent would be subject to a certain loss from a set of bets they would be willing to take, and would therefore be irrational. (There’s some slipperiness here with generating the bets from the confidences, and concluding irrationality based on a collection of bets the agent may take individually, but I think this can be cleaned up.) There’s also a host of other arguments for something like this same conclusion (though Alan Hájek raises issues for them in his (forthcoming?) “Arguments For – Or Against? – Probabilism”). As Kenny Pearce notes, nothing about these arguments requires there to be a state space, so they don’t end up being probabilities in his sense (due to Kolmogorov), but they do seem to be probabilities in the sense I use (and Popper, and Borel, and others).
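Here is a toy version of the Dutch book point (the numbers are mine, purely for illustration): suppose A and B are mutually exclusive, the agent’s confidences are 0.4 in A, 0.4 in B, and 0.9 in their disjunction (violating additivity), and the agent regards a bet that costs cr(X) and pays $1 if X as fair. Then a bookie trading at those prices guarantees the agent a loss:

```python
# Incoherent confidences: A and B are mutually exclusive, but
# cr(A or B) > cr(A) + cr(B), so finite additivity fails.
cr_A, cr_B, cr_AorB = 0.4, 0.4, 0.9

def agent_net(a_true, b_true):
    # The bookie sells the agent a $1 bet on (A or B) at price cr_AorB,
    # and buys $1 bets on A and on B from the agent at prices cr_A and cr_B.
    # Each transaction is fair (or better) by the agent's own lights.
    upfront = -cr_AorB + cr_A + cr_B
    collects = 1 if (a_true or b_true) else 0           # from the (A or B) bet
    owes = (1 if a_true else 0) + (1 if b_true else 0)  # on the A and B bets
    return upfront + collects - owes

# A and B can't both be true; the agent loses 0.1 in every remaining case.
for world in [(True, False), (False, True), (False, False)]:
    print(world, round(agent_net(*world), 2))   # always -0.1
```

Reversing the direction of the trades handles the opposite case, where the disjunction gets less confidence than the sum of the disjuncts.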

As for the second question, I think that there actually is a state space that is relevant for degrees of rational confidence, which is why I lean more towards something like what Kenny Pearce calls (KPW) rather than (P). The state space here would be the set of epistemic possibilities (whatever those are – I don’t really have a good theory of them, do you?). Despite my lack of an account of them, I think they do need to play a role. I think we can’t make very good sense of the notion of a degree of confidence in p, supposing q, without a set of possibilities that we can restrict to the q-possibilities. Also, these epistemic possibilities seem to play an important role in other aspects of epistemic modality, not just degree of belief. And most importantly, I think there’s a rational difference between having a rational confidence of 0 in p and actually being certain that p will not happen. When measuring the speed of light, there’s a difference between my attitude towards it being exactly 2.9980000000000001 x 10^8 m/s, and my attitude towards it being 3 m/s – I consider the former possible given what I know, and the latter not. However, since there is some interval around 2.998 x 10^8 m/s that I can’t rule out, and there are infinitely many such values that I am indifferent between, I can’t give any of them a positive value without either violating additivity or assigning values larger than 1 to certain disjunctions. So I propose that the state space contains infinitely many epistemic possibilities, and that my degree of confidence in certain sets of these possibilities is 0, even though the set is non-empty. (Of course, for the empty set, I trivially have confidence 0 in that set of possibilities.) So I think this aspect of the math actually applies quite well to degrees of confidence, though I’m willing to concede that many people will want to challenge this point, and I don’t think it’s as important as the point that degrees of confidence must be probabilities in something like the general sense I outlined earlier.
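(To spell out the arithmetic behind that additivity point, writing v_1, v_2, \dots for the candidate values I can’t rule out and am indifferent between: if each of them got the same positive confidence \epsilon, then for any n > 1/\epsilon finite additivity would force
P(v_1 \vee v_2 \vee \dots \vee v_n) = n\epsilon > 1,
which no probability in the sense defined above can allow, since complements are non-negative and the tautology itself only gets value 1.)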

However, I don’t think such a state space comes with objectively correct probabilities to assign – after all, it’s infinite, and Bertrand’s Paradox shows how all sorts of troubles arise when we think that symmetries of an infinite space constrain probability assignments.

As for the third question, I’m not sure I agree with its premise. I’m not totally convinced that when there is a well-defined objective probability, we should match it with our degrees of confidence. Consider a fair coin that has just been tossed. There is some sense in which it had an objective probability of 1/2 of coming up heads, so this principle would suggest having degree of belief 1/2 in heads. But if I also know that this coin was one of 10 fair coins flipped at that point, 9 of which happened to come up heads, then (in the frequency sense, as opposed to the chance sense) there is also an objective probability of 9/10 of that coin being heads up, so this principle would suggest the contradictory degree of belief of .9. Maybe in this situation one of the two principles wins out (my guess would be the latter), but I don’t really know under what circumstances something like this should be the case. Of course, I also don’t really know what sorts of objective things count as “well-defined objective probabilities” – is it chances, frequencies, or something else? There are many well-defined objective things that obey the mathematics of probability, but it’s an interesting question which (if any) should be tracked by our degrees of confidence.

Kenny Pearce suggests that on the (KPW) theory of degrees of confidence, it’s the fact that “the worlds … divide more or less evenly” that makes us assign 1/6 to each of the propositions about the way the die might land. I don’t think there’s such a thing as an objective measure over this infinite state space, so we can’t even make sense of the worlds dividing more or less evenly. Thus, if there is some objective reason for the degrees of belief we assign, I don’t know what it is yet, but I don’t think it could be anything like what Kenny Pearce suggests in either (KPW) or (LPW). (Also, I don’t think (LPW) is even a viable candidate, because this is supposed to be a theory of degree of rational belief, and actual possibilities have almost nothing to do with rational epistemic possibilities – one could try to make a modified 2-dimensionalist version of this strategy, as Frank Jackson does, but I’m not convinced that this will work.)

I think that these degrees of confidence exist, and are actually often much more precise than we realize (there’s no reason we should have transparent access to exactly what our degrees of belief are), but they’re not constitutively tied to any sort of objective probability in the sense that Kenny Pearce was expecting for a relation between Bayesian epistemology and probability. These degrees of belief are themselves probabilities, just in a different interpretation than Kenny Pearce was specifically considering.





Foundations of Category Theory

1 12 2007

Yesterday, Solomon Feferman from Stanford was up at Berkeley to give the logic colloquium, and he had a very interesting discussion of foundations for category theory. I figured I’d write up the outlines of the basic theories he discussed, because mathematicians might well be interested in how this sort of thing could work. I’d also be interested in hearing what people who actually use category theory think of all this. The issues that came up explain why set theory suffices as a foundation for all of mathematics before category theory, but runs into trouble here. The basic idea is that he wants some formal framework in which one can work with categories, in such a way that the following four criteria are satisfied to a sufficient degree:

The Criteria

1. There should be a category of all groups (with homomorphisms), a category of all sets (with functions), a category of all topological spaces (with continuous maps), and a category of all categories (with functors).

2. If A and B are two categories, then there should be a category consisting of all functors from A to B, with morphisms given by natural transformations.

3. Standard set-theoretic operations used throughout mathematics should be possible – things like taking unions and intersections of a set of sets, constructing powersets and ordered pairs and n-tuples and the like.

4. The framework should be provably consistent from some standard set-theoretic background. (This criterion is obviously going to seem unnecessary for someone like Lawvere, who wants to use category theory as an alternate foundation for mathematics. For someone like that, the really interesting criterion will probably be #3, because they’ll have to translate away all talk that mathematicians normally have of sets in other terms. Maybe this has been done, but I’ve had a lot of conceptual difficulty trying to read Lawvere’s textbook treatment of this stuff. At any rate, I don’t think most mathematicians will find this any more congenial than working in ZFC directly. As a result, this sort of thing wasn’t what Feferman was talking about, but he did a good job clarifying the interplay of category theory and foundational work in the more usual frameworks.)

Most of the talk was spent describing several systems that allow for some degree of satisfaction of all these criteria, but it looks like no one yet has come up with a framework that allows for full satisfaction of all of them.

The Frameworks

Mac Lane’s small and large categories

This is probably the most familiar way of talking about these issues for mathematicians. In the background, we use a standard theory to talk about sets and proper classes (Bernays-Gödel in particular). This theory gives us the full power of ZFC when talking about sets, but also allows us to talk about proper classes whose elements are sets. On this picture a category is a class of objects, together with a class of morphisms between them, satisfying the standard axioms. If both of these classes are sets, then the category is said to be small, while if either one is a proper class, then the category is said to be large.

We can see how well it fits all the criteria:

1. There are categories of all groups, of all sets, and all topological spaces, but each of these categories is itself large. There is a category of all small categories, but this category obviously doesn’t have any of the previous categories (or itself) as an object. Thus, it makes sense to talk about the fundamental group operation as a functor from the category of topological spaces to the category of groups, but this functor can’t itself be seen as a morphism in the category of categories.

(Note that these categories might be seen to leave out some members. Once we can talk about proper classes, there are some “large” groups, sets, and topological spaces, just as there are large categories. An example of a large group is Conway’s surreal numbers, which form a field but are a proper class, so they are a large group, a large ring, and a large field. So we might also think there’s a problem with the category of groups, since it doesn’t tell us about morphisms from arbitrary groups into the surreal numbers.)

2. If A is a small category, then functors from A to B can be represented by sets of ordered pairs (in the usual way that functions are) so that there is in fact a category containing all these functors as objects. If B is small, then this category is small, and if B is large, then this category is large as well. However, if A is a large category, then functors would themselves be proper classes, and therefore can’t be members of classes, and therefore there is no category containing these functors. This restriction may or may not be problematic. (I don’t know enough about what mathematicians actually do with categories, so I don’t know if it’s important to have this functor category between large categories.)

3. Fortunately, since the background theory is built from a standard set theory, all standard set-theoretic constructions work just fine.

4. Bernays-Gödel set theory was constructed specifically to be a conservative extension of ZF. Therefore, it is consistent iff ZF is. And just as ZFC is consistent iff ZF is, Bernays-Gödel is consistent with the Axiom of Choice iff it is consistent by itself.

Grothendieck’s “axiom of universes”

This is another background I’ve heard of before. This framework basically extends Mac Lane’s hierarchy so that instead of just small and large categories, there are infinitely many levels of categories, each of which is “small enough” to be an object in a larger category.

The official technical details are as follows. We assume standard ZF set theory, and then add the axiom that every set is contained in a universe. A universe is just a transitive set (meaning that it contains everything contained in any of its members), which also contains the powerset of any of its members, and also contains the range of any definable function applied to any set inside the universe. Basically, this just means that a universe is a standard model of ZF. In this framework, a category is just a set of objects together with a set of morphisms, following the standard axioms (none of the proper class stuff that we get with Mac Lane).
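For reference, here is how these closure conditions are usually spelled out (this is the standard textbook presentation of a Grothendieck universe rather than Feferman’s exact wording; the pairing clause is normally part of the package even though it wasn’t mentioned above). A set U is a universe when:

* if x \in U and y \in x, then y \in U (transitivity);
* if x, y \in U, then \{x, y\} \in U;
* if x \in U, then the powerset \mathcal{P}(x) \in U;
* if I \in U and x_i \in U for each i \in I, then \bigcup_{i \in I} x_i \in U.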

1. Now, since the collection of all groups forms a proper class, there is no category of all groups. However, for any universe U, since this universe is a set, the collection of all groups contained in U is itself a set, so we get a category of all U-groups. Similarly, for any universe U there is a category of all U-sets, all U-topological spaces, and (importantly) all U-categories.

By the Axiom of Universes, we see that every set is contained in some universe. Since every category has some underlying set, this category is itself contained in some universe U, and thus is itself an object in the category of all U-categories. None of these categories contains itself, but no matter how big the categories you’re talking about are, there is some category of categories that contains everything of that size. This lets you do lots of work that couldn’t be done in the Mac Lane framework, because in his framework, large categories aren’t objects in any category, while here we can always just go up one level. (However, Mac Lane does have the advantage that there is a single category containing all groups, while here no single category does.)

2. Again, if A and B are categories in some universe U (there must be some universe containing the set {A,B}, and this universe must contain both A and B) then every function from one to the other is itself in that universe U, and thus the category of functors from A to B is itself in U. This again is an improvement over the Mac Lane situation.

3. Since we’re using standard ZF set theory, this is straightforwardly satisfied.

4. There is a slight loss over Mac Lane’s framework here. The existence of a universe implies the consistency of ZF, because the universe is itself a model of ZF. Therefore, if the consistency of ZF implied the consistency of the Axiom of Universes, then the Axiom of Universes would prove its own consistency, which is impossible by Gödel’s theorem. Since this Axiom of Universes framework was implicitly used by Wiles in his proof of Fermat’s Last Theorem, we don’t yet know that ZFC alone is strong enough to prove the result.

However, the situation is not so bad. The existence of a universe containing a set S is just the same thing as the existence of an inaccessible cardinal with higher cardinality than S. Therefore, the Axiom of Universes is equivalent to the existence of unboundedly large inaccessible cardinals. This is provably consistent if we assume the existence of any single large cardinal that is even larger (such as a Mahlo cardinal, or a measurable cardinal), so set theorists are basically just as certain that this theory is consistent as they are of ZFC. You might have further worries about whether these cardinals actually exist (so that the system has no false premises, and therefore leads only to true conclusions), but those are the sorts of worries ordinary mathematicians like to ignore.

Feferman’s 1969 system

It turns out that Grothendieck’s system can be brought down in strength. Instead of using the Axiom of Universes, we just add a bunch of terms U_1, U_2, and so on (through all the ordinals, if necessary), and add some axioms. For each proposition P expressible in the language of set theory, add the axiom:
P^{U_i} iff P
where P^{U_i} is the same as P, with all quantifiers explicitly restricted to range only over U_i. This has the effect of making each set U_i “just like” the entire universe. Thus, we can just formulate Grothendieck’s system here. There are some slight differences: Grothendieck’s smallest universe satisfies the claim “there are no universes”, while other universes don’t – in this system, any two of these universes satisfy exactly the same sentences. But in general, those differences make this system better, because each universe represents the whole universe better than Grothendieck’s did.
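To see a concrete instance of the schema (the particular sentence is just my own illustration, not one from the talk), take P to be “there is an empty set”, i.e. \exists x \, \forall y \, \lnot(y \in x). Its relativization to U_i is
\exists x \in U_i \, \forall y \in U_i \, \lnot(y \in x)
and the axiom added for this P just says that U_i contains something with no members from U_i if and only if there really is a set with no members.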

However, there’s a nice advantage – this system is apparently equiconsistent with ZFC, rather than requiring the existence of additional inaccessibles. Thus, proving a theorem in this framework uses less strength than Grothendieck’s system does. I don’t know if the two systems are quite similar enough to translate the proof of FLT into this one, but that would bring down the strength needed quite a bit. So in some sense, mathematicians probably should use this system, rather than Grothendieck’s. There are still the limitations that Grothendieck has on the first two conditions, but the third and fourth are much better satisfied.

Feferman’s 1974 system using Quine’s “New Foundations”

This was the system Feferman spent the most time discussing. This one is also the most confusing, because it uses Quine’s system NF instead of a more standard set theory. The basic way that this system works is that instead of the complicated set of axioms of ZFC, we just have two very intuitive axioms. One is the axiom of extensionality, which says that there are no two distinct sets with exactly the same members. The other is a restricted axiom of comprehension. The basic axiom of comprehension just says that for every property, there is a set consisting of all things that satisfy that property. But as Russell pointed out to Frege in 1902, this full axiom is inconsistent, because it leads to Russell’s paradox of the set of all sets that don’t contain themselves. In ZFC this paradox is avoided by using several axioms to prove the existence of sets defined by various types of properties. Quine decided instead to avoid this (and related) paradoxes by restricting the types of properties that can be used to define sets. The only properties he allowed were ones that could be written correctly in a theory of types. In particular, each variable in the formula could be assigned a number, such that in x=y, x and y get the same number, while in x\in y the number assigned to y is exactly 1 greater than the number assigned to x. This prevents Russell’s paradox, because the formula \lnot(x\in x) can’t be assigned numbers in this way.
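To make the stratification requirement a bit more concrete, here is a small brute-force check of it (purely my own illustration, not part of NF or of Feferman’s talk), treating a formula as a list of its atomic clauses:

```python
from itertools import product

def stratified(atoms):
    """Check Quine's stratification condition for a conjunction of atomic
    clauses. Each clause is ('eq', x, y) for "x = y" or ('in', x, y) for
    "x is a member of y"; a stratification assigns an integer to each
    variable so that equality forces equal numbers and membership forces
    the number of y to be exactly one more than the number of x."""
    variables = sorted({v for _, x, y in atoms for v in (x, y)})
    # If any stratification exists, one exists using only the levels 0..n-1.
    for levels in product(range(max(len(variables), 1)), repeat=len(variables)):
        lvl = dict(zip(variables, levels))
        if all((lvl[x] == lvl[y]) if kind == 'eq' else (lvl[y] == lvl[x] + 1)
               for kind, x, y in atoms):
            return True
    return False

print(stratified([('in', 'x', 'y'), ('eq', 'y', 'z')]))  # True: stratifiable
print(stratified([('in', 'x', 'x')]))                    # False: the Russell clause
```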

The nice thing about NF is that it allows sets to contain themselves, and it allows there to be a set of all sets (this set is just defined as the set of all things satisfying x=x). Of course, there are some very messy things that go on with the theory of cardinalities, because the powerset of the universe has to be smaller than the universe itself. There are also messy things because the standard definitions of ordered pairs and powersets and sets of functions from one set to another need to use formulas that aren’t “stratified” in the appropriate way. Feferman discussed some ways of fixing this so it could all work. A further difficulty is that no one yet knows what sort of system would be strong enough to prove that NF is consistent. However, if we allow for many objects that aren’t sets (this requires weakening the axiom of extensionality so that it only applies to things with at least one element), it turns out that this system is actually provably weaker than ZFC. (In particular, it can be proven to be equiconsistent with Zermelo set theory.) We can then define categories to be given by a set of objects and a set of morphisms.

So now we can see how this theory stands up to the criteria.

1. Since there is a set of all sets, and sets can contain themselves, it turns out that in this theory we really do get a category of all groups, a category of all sets, a category of all topological spaces, and a category of all categories that really contains itself as an object! This is the only theory we’ve listed on which this really works, and we really get all categories into a single category.

2. Again, as long as A and B are categories, we really do get a category of all functors from A to B, with all the natural transformations as morphisms (I think).

3. Here is where we really run into trouble. We’ve patched things up so that we can take unions and intersections and products and powersets and the like. However, the Axiom of Choice is provably false in this theory. Additionally, sets of functions sometimes have bad problems. Feferman said in particular that the Yoneda Lemma can’t be proven in this system, nor can some other standard things that mathematicians want to use. I don’t really know what the problems are, but these seem bad. The problem is that no one yet really understands NF well enough to know how to fix these things. Perhaps for the sake of category theory, this would be worth doing. But Dana Scott pointed out after the talk that NF has wasted a lot of people’s time, and it’s the only foundational-type system whose full consistency strength still hasn’t been decided, even though it’s been around for 70 years. [Correction: I had mischaracterized what Scott said and updated the post to reflect this. He also points out that Jensen showed that NFU, allowing urelemente, is consistent, though I’m not sure what consistency strength it has.]

4. As mentioned above, allowing basic elements that aren’t sets makes this theory equiconsistent with Zermelo set theory, but this consistency proof is also part of the weakness of the theory, since it can’t do as much with functions as standard set theories. If we managed to fix that part, this would presumably blow up the consistency strength much higher than Grothendieck’s system.

Conclusion

Thus, there are a bunch of interesting ways out there to get something like what mathematicians want from category theory. However, none of these systems really gets everything that we want. The ones that we know are consistent, and know to have enough set-theoretic power for everyday mathematics, are also ones that we know can’t really talk about the category of all categories or the category of all groups. There are various replacement categories around (like the category of all U-categories, or the category of all small groups), and for many purposes these are enough. But they don’t quite get the conceptual ideas right. (One could level the same criticism directly at the set theories, though – none of them really gets a set of all sets, or a class of all classes, except for the poorly understood New Foundations.)

This might motivate some further work by category theorists on foundations, and in particular something like Lawvere’s program, though his program requires a radical shift in understanding of the notions of set, function, collection, individual, and the like, so it doesn’t seem like it would be much more congenial to most mathematicians for fully formal work.

Of course, most mathematicians don’t need to do this stuff fully formally, so the situation is generally comfortable. But it’s nice to see where foundational work in set theory really could impact the lives of ordinary mathematicians.