Yesterday, Solomon Feferman from Stanford was up at Berkeley to give the logic colloquium, and he had a very interesting discussion of foundations for category theory. I figured I’d write up the outlines of the basic theories he discussed, because mathematicians might well be interested in how this sort of thing could work. I’d also be interested in hearing what people who actually use category theory think of all this. The issues that came up illustrate why set theory suffices as a foundation for all mathematics before category theory, but runs into trouble here. The basic idea is that he wants some formal framework in which one can work with categories, so that the following four criteria are satisfied to a sufficient degree:

## The Criteria

1. There should be a category of all groups (with homomorphisms), a category of all sets (with functions), a category of all topological spaces (with continuous maps), and a category of all categories (with functors).

2. If A and B are two categories, then there should be a category consisting of all functors from A to B, with morphisms given by natural transformations.

3. Standard set-theoretic operations used throughout mathematics should be possible – things like taking unions and intersections of a set of sets, constructing powersets and ordered pairs and *n*-tuples and the like.

4. The framework should be provably consistent from some standard set-theoretic background. (This criterion is obviously going to seem unnecessary for someone like Lawvere, who wants to use category theory as an alternate foundation for mathematics. For someone like that, the really interesting criterion will probably be #3, because they’ll have to translate away all talk that mathematicians normally have of sets in other terms. Maybe this has been done, but I’ve had a lot of conceptual difficulty trying to read Lawvere’s textbook treatment of this stuff. At any rate, I don’t think most mathematicians will find this any more congenial than working in ZFC directly. As a result, this sort of thing wasn’t what Feferman was talking about, but he did a good job clarifying the interplay of category theory and foundational work in the more usual frameworks.)

Most of the talk was spent describing several systems that allow for some degree of satisfaction of all these criteria, but it looks like no one yet has come up with a framework that allows for full satisfaction of all of them.

## The Frameworks

### Mac Lane’s small and large categories

This is probably the most familiar way of talking about these issues for mathematicians. In the background, we use a standard theory to talk about sets and proper classes (Bernays-Gödel in particular). This theory gives us the full power of ZFC when talking about sets, but also allows us to talk about proper classes whose elements are sets. On this picture a category is a class of objects, together with a class of morphisms between them, satisfying the standard axioms. If both of these classes are sets, then the category is said to be *small*, while if either one is a proper class, then the category is said to be *large*.
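To make the definition concrete, here is a toy sketch (hypothetical names, not from the talk) of a finite, and hence certainly small, category given as explicit data, together with a check of the identity and associativity axioms:

```python
from itertools import product

class FiniteCategory:
    """A small category presented as finite data."""

    def __init__(self, objects, morphisms, identity, compose):
        # morphisms: dict  name -> (domain, codomain)
        # identity:  dict  object -> name of its identity morphism
        # compose:   dict  (g, f) -> name of g∘f, for each composable pair
        self.objects, self.morphisms = objects, morphisms
        self.identity, self.compose = identity, compose

    def check_axioms(self):
        # identity laws: id∘f = f and f∘id = f
        for f, (a, b) in self.morphisms.items():
            assert self.compose[(self.identity[b], f)] == f
            assert self.compose[(f, self.identity[a])] == f
        # associativity: h∘(g∘f) = (h∘g)∘f for every composable triple
        for f, g, h in product(self.morphisms, repeat=3):
            if (self.morphisms[f][1] == self.morphisms[g][0]
                    and self.morphisms[g][1] == self.morphisms[h][0]):
                assert (self.compose[(h, self.compose[(g, f)])]
                        == self.compose[(self.compose[(h, g)], f)])
        return True

# The "walking arrow" category: two objects and one non-identity morphism.
arrow = FiniteCategory(
    objects={"A", "B"},
    morphisms={"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")},
    identity={"A": "id_A", "B": "id_B"},
    compose={("id_A", "id_A"): "id_A", ("id_B", "id_B"): "id_B",
             ("f", "id_A"): "f", ("id_B", "f"): "f"},
)
assert arrow.check_axioms()
```

The small/large distinction is about exactly this data-level picture: for a large category, the collections of objects and morphisms are proper classes, so they cannot be set-sized data like this.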

We can see how well it fits all the criteria:

1. There are categories of all groups, of all sets, and all topological spaces, but each of these categories is itself large. There is a category of all small categories, but this category obviously doesn’t have any of the previous categories (or itself) as an object. Thus, it makes sense to talk about the fundamental group operation as a functor from the category of topological spaces to the category of groups, but this functor can’t itself be seen as a morphism in the category of categories.

(Note that these categories might be seen to leave out some members. Once we can talk about proper classes, there are some “large” groups, sets, and topological spaces, just as there are large categories. An example of a large group is Conway’s surreal numbers: they form a field but are a proper class, so they are a large group, a large ring, and a large field. So we might also think there’s a problem with the category of groups, since it doesn’t tell us about morphisms from arbitrary groups into the surreal numbers.)

2. If A is a small category, then functors from A to B can be represented by sets of ordered pairs (in the usual way that functions are) so that there is in fact a category containing all these functors as objects. If B is small, then this category is small, and if B is large, then this category is large as well. However, if A is a large category, then functors would themselves be proper classes, and therefore can’t be members of classes, and therefore there is no category containing these functors. This restriction may or may not be problematic. (I don’t know enough about what mathematicians actually do with categories, so I don’t know if it’s important to have this functor category between large categories.)
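As a concrete toy (hypothetical encoding, not from the talk): since functors between small categories are set-sized pairs of maps, functoriality is a checkable condition. Here the source is the “walking arrow” category with one morphism f from A to B, and the target is the group Z/2 viewed as a one-object category with morphisms {1, s} where s∘s = 1:

```python
# Source category data: identities and the composition table.
src_id   = {"A": "id_A", "B": "id_B"}
src_comp = {("id_A", "id_A"): "id_A", ("id_B", "id_B"): "id_B",
            ("f", "id_A"): "f", ("id_B", "f"): "f"}

# Target category data: one object "*", morphisms "1" and "s" with s∘s = 1.
tgt_id   = {"*": "1"}
tgt_comp = {("1", "1"): "1", ("1", "s"): "s", ("s", "1"): "s", ("s", "s"): "1"}

def is_functor(F_obj, F_mor):
    """Check that (F_obj, F_mor) preserves identities and composition."""
    # F(id_a) must be the identity of F(a) ...
    if any(F_mor[src_id[a]] != tgt_id[F_obj[a]] for a in F_obj):
        return False
    # ... and F(g∘f) must equal F(g)∘F(f) for every composable pair.
    return all(F_mor[gf] == tgt_comp[(F_mor[g], F_mor[f])]
               for (g, f), gf in src_comp.items())

collapse = ({"A": "*", "B": "*"}, {"id_A": "1", "id_B": "1", "f": "1"})
flip     = ({"A": "*", "B": "*"}, {"id_A": "1", "id_B": "1", "f": "s"})
bad      = ({"A": "*", "B": "*"}, {"id_A": "s", "id_B": "1", "f": "s"})
assert is_functor(*collapse) and is_functor(*flip) and not is_functor(*bad)
```

The objects of the functor category are exactly the pairs passing this check; its morphisms would be natural transformations, which are again set-sized data when the source is small.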

3. Fortunately, since the background theory is built from a standard set theory, all standard set-theoretic constructions work just fine.

4. Bernays-Gödel set theory was constructed specifically to be a conservative extension of ZF. Therefore, it is consistent iff ZF is. And just as ZFC is consistent iff ZF is, Bernays-Gödel set theory with the Axiom of Choice is consistent iff Bernays-Gödel by itself is.

### Grothendieck’s “axiom of universes”

This is another background I’ve heard of before. This framework basically extends Mac Lane’s hierarchy so that instead of just small and large categories, there are infinitely many levels of categories, each of which is “small enough” to be an object in a larger category.

The official technical details are as follows. We assume standard ZF set theory, and then add the axiom that every set is contained in a universe. A *universe* is just a transitive set (meaning that it contains everything contained in any of its members), which also contains the powerset of any of its members, and also contains the range of any definable function applied to any set inside the universe. Basically, this just means that a universe is a standard model of ZF. In this framework, a category is just a set of objects together with a set of morphisms, following the standard axioms (none of the proper class stuff that we get with Mac Lane).
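In symbols, the closure conditions on a universe U can be spelled out as follows (a standard paraphrase, not necessarily Feferman’s exact formulation):

```latex
\begin{align*}
x \in y \in U &\implies x \in U && \text{(transitivity)}\\
x, y \in U &\implies \{x, y\} \in U && \text{(pairing)}\\
x \in U &\implies \mathcal{P}(x) \in U && \text{(powerset)}\\
I \in U \text{ and } x_i \in U \text{ for all } i \in I
  &\implies \textstyle\bigcup_{i \in I} x_i \in U && \text{(union of a family)}
\end{align*}
```

The last clause is what captures “the range of any definable function applied to any set inside the universe”: a U-indexed family of members of U has its range, and the union of that range, inside U.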

1. Now, since the collection of all groups forms a proper class, there is no category of all groups. However, for any universe U, since this universe is a set, the collection of all groups contained in U is itself a set, so we get a category of all U-groups. Similarly, for any universe U there is a category of all U-sets, all U-topological spaces, and (importantly) all U-categories.

By the Axiom of Universes, we see that every set is contained in some universe. Since every category has some underlying set, this category is itself contained in some universe U, and thus is itself an object in the category of all U-categories. None of these categories contains itself, but no matter how big the categories are you’re talking about, there is some category of categories that contains everything that big. This lets you do lots of work that couldn’t be done in the Mac Lane framework, because on his framework, large categories aren’t objects in any category, while here we can always just go up one level. (However, Mac Lane does have the advantage that there is a single category containing *all* groups, while here no single category does.)

2. Again, if A and B are categories in some universe U (there must be some universe containing the set {A,B}, and this universe must contain both A and B) then every function from one to the other is itself in that universe U, and thus the category of functors from A to B is itself in U. This again is an improvement over the Mac Lane situation.

3. Since we’re using standard ZF set theory, this is straightforwardly satisfied.

4. There is a slight loss over Mac Lane’s framework here. The existence of a universe implies the consistency of ZF, because the universe is itself a model of ZF. Therefore, if the consistency of ZF implied the consistency of the Axiom of Universes, then the Axiom of Universes would prove its own consistency, which is impossible by Gödel’s theorem. Since this Axiom of Universes framework was implicitly used by Wiles in his proof of Fermat’s Last Theorem, we therefore don’t yet know that ZFC is strong enough to prove the result.

However, the situation is not so bad. The existence of a universe containing a set S is just the same thing as the existence of an inaccessible cardinal with higher cardinality than S. Therefore, the Axiom of Universes is equivalent to the existence of unboundedly large inaccessible cardinals. This is provably consistent if we assume the existence of any single large cardinal that is even larger (such as a Mahlo cardinal, or a measurable cardinal), so set theorists are basically just as certain that this theory is consistent as they are of ZFC. You might have further worries about whether these cardinals actually exist (so that the system has no false premises, and therefore leads only to true conclusions), but those are the sorts of worries ordinary mathematicians like to ignore.
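In symbols, the correspondence just mentioned (identifying universes that contain ω with initial segments of the cumulative hierarchy):

```latex
U \text{ is a universe with } \omega \in U
  \iff U = V_\kappa \text{ for some inaccessible } \kappa,
\qquad\text{so}\qquad
\text{Axiom of Universes}
  \iff \forall \lambda\, \exists\, \kappa > \lambda\ (\kappa \text{ inaccessible}).
```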

### Feferman’s 1969 system

It turns out that Grothendieck’s system can be brought down in strength. Instead of using the Axiom of Universes, we just add a bunch of terms U_{1}, U_{2}, and so on (through all the ordinals, if necessary), and add some axioms. For each proposition P expressible in the language of set theory, add the axiom:

P^{U_i} iff P

where P^{U_i} is the same as P, with all quantifiers explicitly restricted to range only over U_{i}. This has the effect of making each set U_{i} “just like” the entire universe. Thus, we can just formulate Grothendieck’s system here. There are some slight differences: Grothendieck’s smallest universe satisfies the claim “there are no universes”, while other universes don’t – in this system, any two of these universes satisfy exactly the same sentences. But in general, those differences make this system better, because each universe represents the whole universe better than Grothendieck’s did.
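Concretely, the relativization is defined by recursion on formulas; atomic formulas are left alone, and the quantifier clauses are (in standard notation, not necessarily Feferman’s own):

```latex
(\forall x\, \psi)^{U_i} := \forall x\,\bigl(x \in U_i \rightarrow \psi^{U_i}\bigr),
\qquad
(\exists x\, \psi)^{U_i} := \exists x\,\bigl(x \in U_i \wedge \psi^{U_i}\bigr).
```

The scheme then says that each U_i reflects every statement of the language of set theory, which is why each one behaves like the whole universe.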

However, there’s a nice advantage – this system is apparently equiconsistent with ZFC, rather than requiring the existence of additional inaccessibles. Thus, proving a theorem in this framework uses less strength than Grothendieck’s system does. I don’t know if the two systems are quite similar enough to translate the proof of FLT into this one, but that would bring down the strength needed quite a bit. So in some sense, mathematicians probably should use this system, rather than Grothendieck’s. There are still the limitations that Grothendieck has on the first two conditions, but the third and fourth are much better satisfied.

### Feferman’s 1974 system using Quine’s “New Foundations”

This was the system Feferman spent the most time discussing. This one is also the most confusing, because it uses Quine’s system NF instead of a more standard set theory. The basic way that this system works is that instead of the complicated set of axioms of ZFC, we just have two very intuitive axioms. One is the axiom of extensionality, which says that there are no two distinct sets with exactly the same members. The other is a restricted axiom of comprehension. The basic axiom of comprehension just says that for every property, there is a set consisting of all things that satisfy that property. But as Russell pointed out to Frege in 1902, this full axiom is inconsistent, because it leads to Russell’s paradox of the set of all sets that don’t contain themselves. In ZFC this paradox is avoided by using several axioms to prove the existence of sets defined by various types of properties. Quine decided instead to avoid this (and related) paradoxes by restricting the types of properties that can be used to define sets. The only properties he allowed were ones that could be written correctly in a theory of types. In particular, each variable in the formula can be assigned a number, such that in x = y, x and y get the same number, while in x ∈ y the number assigned to y is exactly 1 greater than the number assigned to x. This prevents Russell’s paradox, because the formula x ∉ x can’t be assigned numbers in this way.
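Quine’s stratification condition is simple enough to check mechanically. Here is a toy sketch (hypothetical encoding): each atomic subformula becomes an integer constraint on type indices, and a formula is stratified iff the constraints are solvable.

```python
def stratifiable(atoms):
    """atoms: list of ("eq", x, y) for x = y, or ("in", x, y) for x ∈ y.

    "eq" forces index(x) == index(y); "in" forces index(y) == index(x) + 1.
    Returns True iff some assignment of integers satisfies all constraints.
    """
    level = {}

    def assign(v, n):
        if v in level:                      # already assigned: must agree
            return level[v] == n
        level[v] = n
        # propagate the new assignment through every constraint mentioning v
        for kind, x, y in atoms:
            d = 0 if kind == "eq" else 1
            if x == v and not assign(y, n + d):
                return False
            if y == v and not assign(x, n - d):
                return False
        return True

    for _, x, _ in atoms:                   # seed each connected component
        if x not in level and not assign(x, 0):
            return False
    return True

assert stratifiable([("in", "x", "y"), ("eq", "y", "z")])   # x ∈ y ∧ y = z
assert not stratifiable([("in", "x", "x")])                 # Russell: x ∈ x
```

The Russell formula fails because x would need an index exactly 1 greater than its own; that is the whole content of the restriction.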

The nice thing about NF is that it allows sets to contain themselves, and it allows there to be a set of all sets (this set is just defined as the set of all things satisfying x = x). Of course, there are some very messy things that go on with the theory of cardinalities, because the powerset of the universe has to be smaller than the universe itself. There are also messy things because the standard definitions of ordered pairs and powersets and sets of functions from one set to another need to use formulas that aren’t “stratified” in the appropriate way. Feferman discussed some ways of fixing this so it could all work. A further difficulty arises in that so far no one knows what sort of system would be strong enough to prove that NF is consistent. However, if we allow for many objects that aren’t sets (this requires weakening the axiom of extensionality so that it only applies to things with at least one element) it turns out that this system is actually provably weaker than ZFC. (In particular, it can be proven to be equiconsistent with Zermelo set theory.) We can then define categories to be given by a set of objects and a set of morphisms.

So now we can see how this theory stands up to the criteria.

1. Since there is a set of all sets, and sets can contain themselves, it turns out that in this theory we really do get a category of all groups, a category of all sets, a category of all topological spaces, and a category of all categories that really contains itself as an object! This is the only theory we’ve listed on which this really works, and we really get *all* categories into a single category.

2. Again, as long as A and B are categories, we really do get a category of all functors from A to B, with all the natural transformations as morphisms (I think).

3. Here is where we really run into trouble. We’ve patched things up so that we can take unions and intersections and products and powersets and the like. However, the Axiom of Choice is provably false in this theory. Additionally, sets of functions sometimes have bad problems. Feferman said in particular that the Yoneda Lemma can’t be proven in this system, as well as some other standard things that mathematicians want to use. I don’t really know what the problems are, but these seem bad. The problem is that no one yet really understands NF well enough to know how to fix these things. Perhaps for the sake of category theory, this would be worth doing. But, Dana Scott pointed out after the talk that NF has ~~a history of destroying careers~~ wasted a lot of people’s time, and it’s the only foundational-type system whose full consistency strength still hasn’t been decided, and it’s been around for 70 years. [Correction: I had mischaracterized what Scott said and updated the post to reflect this. He also points out that Jensen showed that NFU, allowing urelemente, is consistent, though I'm not sure what consistency strength it has.]

4. As mentioned above, allowing basic elements that aren’t sets makes this theory equiconsistent with Zermelo set theory, but this consistency proof is also part of the weakness of the theory, since it can’t do as much with functions as standard set theories. If we managed to fix that part, this would presumably blow up the consistency strength much higher than Grothendieck’s system.

## Conclusion

Thus, there’s a bunch of interesting ways out there to get something like what mathematicians want from category theory. However, none of these systems really gets everything that we want. The ones that we know are consistent, and that we know have enough set-theoretic power for everyday mathematics, are also ones that we know can’t really talk about the category of all categories or the category of all groups. There are various replacement categories around (like the category of all U-categories, or the category of all small groups), and for many purposes these are enough. But they don’t quite get the conceptual ideas right. (One could level the same criticism directly at the set theories though – none of them really gets a set of all sets, or a class of all classes, except for the poorly understood New Foundations.)

This might motivate some further work by category theorists on foundations, and in particular something like Lawvere’s program, though his program requires a radical shift in understanding of the notions of set, function, collection, individual, and the like, so it doesn’t seem like it would be much more congenial to most mathematicians for fully formal work.

Of course, most mathematicians don’t need to do this stuff fully formally, so the situation is generally comfortable. But it’s nice to see where foundational work in set theory really could impact the lives of ordinary mathematicians.

Theo (19:21:38): When pressed, most category theorists seem to mumble something about “large” and “small”, and perhaps decide that in fact they need “larger”. I usually mumble something about arbitrarily many sufficiently large cardinals, knowing that this is problematic.

So instead, let me ask when we (pretending that I’m a category theorist) actually care about this. For one, we do want the category of all categories. Almost. In fact, the collection of all categories has more structure than just being a category: it has objects (categories), morphisms (functors), and 2-morphisms (natural transformations). And, in general, we really do want to be able to talk about n-categories, requiring that our classes go up and up and up.

And really we want omega-categories. The natural definition of “omega-category” is (i) a collection of objects (ii) between any two objects there is an omega category of morphisms. I.e. the natural definition of “omega category” is “(weak) category enriched over OmegaCat”. I say “weak category” because the point is that composition, for instance, is not associative in an omega-category, but only associative up to something. So really, in the current language, the definition of an “omega-category” is that it is an omega-category enriched over OmegaCat. Anyway, so the natural structure of OmegaCat is as an omega-category, not as an (omega+1)-category, at least as far as I can tell.

So that’s one thing that these proposals don’t deal with: we really want infinite chains of type-increase.

But is there ever a time when a mathematician would want to really think much about functors that themselves break levels? To MacLane, a group, for instance, is always a small category (as it has only one object) and always a small group: a “category” is defined as “a category enriched over Set”, so there must only be a (small) set of morphisms between any pair of objects. In any case, we do want to talk about the category of representations of a group; i.e. the functor category from (small) Grp to (large) Vect. And in general people ask about representations of a category, meaning functors into Vect; e.g. (T)QFTs. But I don’t know if anyone has really wanted to know about representations of Cat, say. We do talk about functors, say, between Cat and BiCat (the former being a 2-category, the latter being the collection of (weak) 2-categories, equipped with its natural structure as a 3-category). For instance, there is a functor Cat → BiCat that introduces only identity 2-morphisms, and its adjoint functor from BiCat to Cat that decategorifies by modding out by all 2-morphisms. (I don’t remember how adjoint these are; in one direction the composition is the identity, and in the other? And, of course, Cat and BiCat are not really in the same n-category, so we have to be precise what kind of functor we mean…)

Certainly we do want to do things like quantizing and q-deforming Cat; doing so may require, e.g., allowing all (N- or C-)linear combinations of categories. So y’all logic people better give us a language that lets us make the vector space with basis _all categories_. Or at least _all topological spaces_ or something.

Kenny (23:40:35): If you’ve got proper classes, then there’s a natural way to talk about a vector space with a proper class as a basis, just as the class of all formal linear combinations of elements of this class. (I suppose you need to be slightly careful that none of the elements of this class are themselves formal linear combinations of each other in whatever set-theoretic structure we’re using to talk about formal linear combinations.) So that gives you a way to have a vector space whose basis is all small categories.

If you’re working with universes, then the same trick lets you make a vector space in one universe whose basis is all topological spaces in all smaller universes.
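The formal-linear-combinations trick can be sketched as finitely-supported coefficient maps (a toy, with hypothetical names; the basis tags here just stand in for whatever objects you like):

```python
from collections import Counter

def add(v, w):
    # pointwise sum of two formal linear combinations, dropping zero terms
    out = Counter(v)
    out.update(w)
    return {b: c for b, c in out.items() if c != 0}

def scale(k, v):
    # scalar multiple of a formal linear combination, dropping zero terms
    return {b: k * c for b, c in v.items() if k * c != 0}

v = {"Top": 2, "Grp": -1}   # the formal combination 2·Top - Grp
w = {"Grp": 1, "Cat": 3}
assert add(v, w) == {"Top": 2, "Cat": 3}
assert scale(0, v) == {}
```

Each vector mentions only finitely many basis elements, which is why the construction goes through even when the basis is a proper class (or lives in a smaller universe).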

Unfortunately, I don’t know enough about the other stuff to really understand why you want these things.

As for the n-category stuff, is it really foundationally any more difficult than doing it for 1-categories? You just need a class of objects, a class of 1-morphisms, …, and a class of n-morphisms, and you just need to write down whatever associativity axioms they need to satisfy. (That’s a problem for category theorists to solve, but I don’t see how it would mess up the set/class stuff underlying it.) I guess at the omega-category level things might get tougher because of the apparent non-well-foundedness of what’s going on.

Presumably Jacob Lurie has written about these foundational issues in his tome on the subject?

nikita (09:12:35):

> Maybe this has been done, but I’ve had a lot of conceptual difficulty trying to read Lawvere’s textbook treatment of this stuff.

A more compact exposition is in Chapter I of Lawvere’s Functorial Semantics of Algebraic Theories.