(The ideas in this post originate in several conversations with Alan Hájek and Aidan Lyon.)

In chapter 15 of his big book on probability, E. T. Jaynes said “Not only in probability theory, but in all mathematics, it is the careless use of infinite sets, and of infinite and infinitesimal quantities, that generates most paradoxes.” (By “paradox”, he means “something which is absurd or logically contradictory, but which appears at first glance to be the result of sound reasoning. … A paradox is simply an error out of control; i.e. one that has trapped so many unwary minds that it has gone public, become institutionalized in our literature, and taught as truth.”) His position is somewhat unorthodox, hinting that in some sense all of infinite set theory (and many classic examples in probability theory) is made up of this sort of paradox. But I think a lot of what he says in the chapter is useful, and I intend to study it more to see what it says about the particular infinities and zeroes that I’ve been worrying about in probability theory.

His recommendation of what to do is as follows:

Apply the ordinary processes of arithmetic and analysis only to expressions with a finite number n of terms. Then after the calculation is done, observe how the resulting finite expressions behave as the parameter n increases indefinitely.

Put more succinctly, passage to a limit should always be the last operation, not the first.
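Jaynes’s slogan can be illustrated with a standard toy example (mine, not his): a double sequence whose two iterated limits disagree, so the order in which we pass to the limit changes the answer.

```python
# A toy illustration of "limits last": the iterated limits of
# a(m, n) = m / (m + n) disagree, so whatever finite calculation
# we do, the order in which we let m and n grow matters.

def a(m, n):
    return m / (m + n)

# Let n grow with m fixed: each value tends to 0.
inner = [a(m, 10**9) for m in (1, 10, 100)]
# Let m grow with n fixed: each value tends to 1.
outer = [a(10**9, n) for n in (1, 10, 100)]

print(inner)  # all very close to 0
print(outer)  # all very close to 1
```

The arithmetic at finite m and n is uncontroversial; the disagreement only appears when a limit is smuggled in before the calculation is finished, which is exactly what Jaynes warns against.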

One suggestion to take from this idea might be that in phrasing any well-defined decision problem, the payoffs should in some sense be a continuous function of the states. (I should point out that I got this suggestion from Aidan Lyon.) For instance, consider the game in the St. Petersburg paradox (interestingly, the argument in section 3 there about boundedness of utilities seems to miss a possibility – Martin considers the case of bounded utilities where the maximum is achievable, and unbounded ones where the maximum is not achievable, but not bounded ones where the maximum is unachievable, which seems to largely invalidate the argument). An objection you might be able to make to this game is that payoffs haven’t been specified for every possibility – although the probability of a fair coin repeatedly flipped coming up heads every time is zero, that doesn’t mean that it’s impossible. So we must specify a payoff for this state. But of course, we’d like not to have to wait forever to start giving you your payout, since then you’d effectively get no payout. So we have to give a sequence of approximations at each step. This basically suggests that we should make the payoff for the limit state be the limit of the payoffs of the finite states. Which is just as Jaynes would like – we shouldn’t do something (specify a payout) *after* taking a limit. Instead, we should specify payouts, and *then* take a limit, making the payout of the limit stage be the limit of the finite payouts. Which in this case means that there’s actually a chance of an infinite payout for the St. Petersburg game! (Even if that chance has probability zero.) So perhaps it’s no longer so problematic a game – the expectation is no longer higher than every payout.
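For concreteness, here is a minimal sketch of the truncation idea, assuming the usual payoff schedule of $2^k for a game that ends on flip *k* (presentations vary on the exact schedule):

```python
from fractions import Fraction

# St. Petersburg sketch, assuming the common payoff schedule: if the game
# ends on flip k (probability 1/2**k), you win 2**k dollars. Truncating at
# n flips and only then asking about limiting behavior, the expectation is
# exactly n, which grows without bound -- the paradox. The post's suggestion
# is to also give the probability-zero "heads forever" state the limit of
# the finite payoffs, i.e. an infinite payout.

def truncated_expectation(n):
    """Expected payout counting only games that end within n flips."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, n + 1))

for n in (1, 10, 100):
    print(n, truncated_expectation(n))  # each flip contributes exactly 1
```

With the limit state paid the limit of the finite payouts, the expectation (infinite) is no longer strictly higher than every possible payout, as the post notes.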

Note the sort of continuity I’m considering here. In some sense the payouts are discontinuous (obviously, they jump with each coin toss). But in the natural topology on the space (where the open sets are exactly the pieces of evidence we could have at some point – in this case that the game will take at least *n* flips) it is continuous. Which leads us to a distinction between two games that classically look the same – in one game I flip a fair coin and give you $1 if it comes up heads and nothing if it comes up tails; in the other game I throw an infinitely thin dart randomly at a dartboard and give you $1 if it hits the left half and nothing if it hits the right half (stipulate that the upper half of the center line counts as left, the lower half counts as right, and the center point doesn’t exist). The difference is that in the former case, it’s always easy to tell which state has occurred, so we can calculate the payoff. In the latter case though, if the dart hits exactly on the middle line, then we can’t tell which payoff you should get unless we can measure the location of the dart with infinite accuracy. If we can only tell to within 1 mm where the dart has hit, then any dart that hits within 1 mm of the center line will be impossible to pay on. If we can refine our observations, then we can pay up for most of these points, but even closer ones will still cause trouble. And no matter how much we refine them, a dart that hits the line exactly (this has probability zero, but it seems that it still might happen, since it’ll hit *some* line) will be one for which we can never know which payoff is right. So you’ll be stuck waiting for your payoff rather than actually getting one or the other. So the game is bad again, although the analogous coin-flipping game is good.
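The measurement trouble here can be put as a small sketch (the precision model is my own illustration, not from the original): take the state to be the dart’s horizontal coordinate, the dividing line at 0, and suppose any measurement only bounds the position to within some error eps.

```python
# Toy dart game: the state is the dart's x-coordinate, the dividing line is
# x = 0, and a measurement with error bound eps can only settle which half
# the dart is in when the dart is more than eps from the line.

def settled(x, eps):
    """Can a measurement with error bound eps decide which half x is in?"""
    return abs(x) > eps

precisions = [10**-k for k in range(1, 8)]  # ever finer measurements

# Any dart strictly off the line is eventually settled by some refinement...
assert any(settled(0.0003, eps) for eps in precisions)
# ...but a dart exactly on the line is never settled, at any finite precision.
assert not any(settled(0.0, eps) for eps in precisions)
```

This is the asymmetry with the coin: every coin outcome is an isolated, observable state, while the on-the-line dart state is a limit of states that every finite observation fails to separate.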

So once we’ve found the right topology for the state space, it seems that we may want to require that the payoffs for any well-defined game be continuous on it. (For other reasons, like representing our limited capacity to know about the world, we might want to require that any isolated points (where the payoff can jump) must have positive probability, like the finite numbers of coin flips in the St. Petersburg case, but not the infinite one.)

In conversation today, Alan Hájek pointed out another sort of discontinuity that can arise in decision theory, namely one where payoffs are discontinuous on one’s actions. In the cases above, what’s discontinuous are the payoffs within some action I might agree to perform (say, playing the St. Petersburg game, or agreeing to a payoff schedule for a dart throw). But we can run into problems when we’re faced with multiple possible actions. The one that Alan mentioned was where you’re at the gates of heaven, and god offers you the possibility to stay as many days as you like, provided that you give him the (finite) number ahead of time on a piece of paper (a really large one, that you can compress as much text on as you want). So you start writing down a large number, say by writing 9999…. No matter how long you keep writing, it’s in your interest to write another 9. However, Alan points out that the worst possible outcome here is that you keep writing 9’s forever and never actually make it into heaven!

To model this in the framework of decision theory, there are a bunch of actions that you can choose from. Action *n* results in you being in heaven for exactly *n* days, and then there is one more action, that’s somehow the limit of all these previous actions, in which you never make it to heaven. No state space or anything else is relevant (in this particular case). But on your preferences, action *n* is preferred to action *m* iff *n>m*, except that the limit of these actions doesn’t get the limit of the payoffs. That is, staying there writing 9’s forever doesn’t give you infinitely many days in heaven – in fact, it gives you none! So this decision problem might also be counted as somehow bad on Jaynes’ account, because it involves an essential discontinuity in the payoffs, though this time it is with respect to the space of actions rather than the space of states. (The topology on the space of actions will have to be defined with open sets being possible partial actions, or something of the sort, just as the topology on the space of states is defined with open sets being the possible partial observations, or something.)
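The discontinuity can be made vivid with a small sketch (the payoff numbers are my illustration; the structure is as described above):

```python
import math

# Days-in-heaven sketch: action n yields n days; the "limit" action --
# writing 9's forever, represented here by math.inf -- yields 0 days.
# The finite payoffs tend to infinity, but the payoff of the limit action
# is 0, so the payoff function is discontinuous at the limit of actions.

def payoff(action):
    return 0 if action == math.inf else action

finite_payoffs = [payoff(n) for n in (1, 100, 10**6)]
assert payoff(math.inf) < min(finite_payoffs)  # the limit action is worst
# And no optimal action exists: payoff(n + 1) > payoff(n) for every finite n.
```

Here the supremum of the payoffs is not attained by any action, and the one action that "reaches" the limit of the others gets none of the limiting payoff.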

Maybe this is a problem for Jaynes? This game that god has given you doesn’t seem to be too paradoxical – somehow it doesn’t seem as bad as St. Petersburg, even though it puts decision theory almost in a worse place (no decision is correct, rather than the correct decision being to pay any finite amount for a single shot at a game).

Anyway, I thought there was an interesting distinction here between these two types of discontinuities. I don’t know if one is more problematic than the other, but it’s something to think about. Also, I should point out that a decision problem like this last one seems to have first been introduced by Cliff Landesman, in “When to Terminate a Charitable Trust?”, which came out in *Analysis* some time in the mid-’90s, I think.

Theo(22:48:05) :Indeed, it’s not unusual for limits to fail to commute with other operations, even when they naively “ought” to. This is why Cauchy’s calculus distinguishes convergence from uniform convergence: your God example feels an awful lot like trying to evaluate \[ \lim_{n\to\infty} \int_0^\infty f_n(x)\,dx = \lim_{n\to\infty} \left( \int_0^n (-1)\,dx + \int_n^{n^2} 1\,dx \right) \] with a suitable function f_n(x) which is 0 for x>n^2, 1 for n^2>x>n, and -1 for n>x. Of course, \lim\int = \infty, whereas \int\lim = -\infty (evaluated pointwise).
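Theo’s example can be checked with finite arithmetic, in the spirit of Jaynes’s rule (a sketch, with f_n as described in the comment):

```python
# Check Theo's example with exact arithmetic. f_n is -1 on (0, n),
# +1 on (n, n**2), and 0 beyond n**2, so its integral is
# -n + (n**2 - n) = n**2 - 2n, which grows without bound with n.
# But at any fixed x, f_n(x) = -1 once n > x, so the pointwise limit
# is the constant -1, whose integral over (0, inf) is -infinity.

def integral_f(n):
    """Exact value of the integral of f_n over (0, infinity)."""
    return -n + (n**2 - n)

print([integral_f(n) for n in (2, 10, 100)])  # 0, 80, 9800: grows without bound

def f(n, x):
    if 0 < x < n:
        return -1
    if n < x < n**2:
        return 1
    return 0

# Pointwise limit at a fixed x: eventually -1, for every n past x.
assert all(f(n, 7.5) == -1 for n in range(8, 50))
```

So the two orders of operations (integrate then take n to infinity, versus take n to infinity then integrate) land at opposite infinities, just as the comment says.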

A famous and funny story about limits failing to commute shows up as a warning in quantum field theory classes:

How would we define a quantum theory of gravity? Conventionally, quantum field theories are described as quantizations of classical field theories, and the classical theories that are straightforward (since Feynman) to quantize are free theories with higher-order correction terms. So, we might start by writing down Einstein’s equations for a general Riemann metric g_{\mu\nu} and Taylor-expand around flat-space \eta_{\mu\nu} (a known solution). Indeed, doing so gives a free theory in h_{\mu\nu} = g_{\mu\nu} - \eta_{\mu\nu} plus higher-order terms. (I will ignore, other than this reference, all problems with nonrenormalizability of the quantized version of the full theory, and focus just on the classical theories.)

Great. Let’s cut off our expansion to just include one term of higher order than the free theory. We end up with what is essentially the only (up to some coefficients) theory in a (symmetric) tensor h_{\mu\nu} with this degree. It does not, of course, facially look like Einstein’s theory. But does it make the same (leading-order) predictions?

Well, that’s actually a hard question. Because making this into a really good theory requires introducing a graviton mass. Otherwise, the integrals from which we can read off physics have poles in them. Introducing a graviton mass is a form of “regularizing” the theory. (And, physically, it shouldn’t be a problem: particles only know that they’re massive by oscillating, and the amount of time it takes to oscillate scales inversely with the mass. If the mass of any particle is, up to factors of \hbar and c, much smaller than the (inverse) length of the universe, then that particle, in any physical (quantum) theory, is essentially massless.)

Anyhoo, so what sorts of physical things can we measure? Given a field theory, we can compute a “propagator”, which essentially controls how well the fields travel from one point in spacetime to another. To leading order, the propagator for the massive-graviton theory has numerator of the form (denominators of propagators are all the same)

\[ G_{\mu\kappa} G_{\nu\lambda} + G_{\mu\lambda} G_{\nu\kappa} - \tfrac{2}{3}\, G_{\mu\nu} G_{\kappa\lambda} \]

plus terms scaling with the mass, which I’ve taken to 0 in the limit. Einstein’s theory (with a massless “graviton”) has a “propagator” of the same shape, except with a -1 in place of the -(2/3).

So what’s up? And does it matter? In fact, Einstein’s most famous GR prediction (actually a postdiction?) of the correct perihelion precession of Mercury’s orbit (noticeably off from Newtonian/Kepler physics) does not depend on this term. It just depends on the overall shape of the propagator: if you believe gravity to be carried by symmetric tensor bosons, then you have no choice but to arrive at Mercury’s precession.

However, an even more important measurement, a few years later, does depend on this 1 vs. 2/3: there was a total eclipse of the sun shortly after Einstein published, in which astronomers successfully measured the gravitational lensing from the sun. And, indeed, Einstein was exactly correct: the coefficient should be 1, not 2/3.

So what gives? Most importantly, what went wrong with our limits? Surely a correct theory would give the correct outcome when the mass is taken to 0. (There is no doubt that a classical theory of a truly massless graviton, rather than this small-mass theory, would have Einstein’s propagator, because Einstein can be rewritten without any geometric intuition into just such a theory, provided the topology stay flat.)

About 10 years after the calculation I described was originally published (this is in the 70s/80s, but I don’t remember details), someone did such a careful calculation. In fact, what happened was that in calculating the lensing effect, folks had treated the sun as infinitely large. But the whole point is that a very-small-mass graviton wouldn’t know it has a mass, so long as its “wavelength” (if you will — really I just mean \hbar/(c m)) is much more than the size of the universe. So certainly it’s more than the size of the sun, and it’s wrong to take the limit as the size of the sun approaches infinity. At least, not before letting m\to 0.

Essentially, it’s a problem of non-commuting limits.

Peter(12:34:24) :Jaynes, of course, was a physicist, not a mathematician. I doubt any pure mathematician is careless in the use of the infinite, at least not since Cantor.

Being finite beings in a finite world, none of us — not even Einstein himself — start with a good, uneducated intuition for the infinite. This is something pure mathematicians learn about themselves with their mothers’ milk (as soon as they encounter the proof of the uncountability of the reals). After a while, we acquire (or most of us do) an educated intuition about what is likely to work and what not in this strange world.

Of course, the irony here is that physicists often use the infinite to approximate the very-large-finite. Probability theory itself, as developed for and alongside statistical physics, falls into this category — an infinitistic abstraction of a finite reality.

Kenny(18:55:15) :I didn’t realize Jaynes was a physicist!

Mathematicians probably still do make a few such errors, though you’re right that it’s not very many since Cantor. But Jaynes also seems to want to count many theorems about probability spaces as errors, because they involve probability spaces that are somehow pathologically defined. There’s almost a hint of finitism in his ideas, as I read them. (Especially if you look at his discussion of specific examples.)

Peter(10:01:54) :The wikipedia entry on Jaynes is thin, but at least does mention his physics background:

http://en.wikipedia.org/wiki/Edwin_Thompson_Jaynes

Interestingly, another keen Bayesian probability theorist, Harold Jeffreys, was also a physicist (although of a different sort). Perhaps some sociologist of science could explain the attraction of Bayesianism to physicists.

For a history of early probability theory and its links to physics see the nice book by von Plato:

@book{vonplato1994,
  author = "J. {von Plato}",
  title = "Creating Modern Probability: Its Mathematics, Physics and Philosophy in Historical Perspective",
  publisher = "Cambridge University Press",
  year = "1994",
  series = "Cambridge Studies in Probability, Induction, and Decision Theory",
  address = "Cambridge, UK"}

Francois(07:13:03) :Hi,

Doesn’t this whole days-in-heaven thing look a lot like Pascal’s bet, with perhaps some modifications to the weight (burden) given to ‘living a virtuous life on Earth’ ?

François

Kenny(00:34:04) :There certainly are some similarities to Pascal’s Wager here, but there are also some significant differences. I think the main similarity is that both situations appeal to heavenly rewards as the only way to get easy intuitions about unboundedly large payoffs. However, in Pascal’s case, one of the payoffs is actually treated as infinite, rather than just very large and finite. And another difference is that Pascal’s case involves probabilities, while in this case it is completely clear to the agent exactly what will happen under any choice.

François Blumenfeld(12:52:15) :Fair enough. By the way, I’m working on Gorgias right now, and speaking of infinites, I was wondering if you had any opinion as to which of a spatial or a temporal infinity would be easier to understand/think about, if any?

I’m very jealous of your blog, by the way; it’s way nerdier than mine, though not for lack of trying.