Jaynes on the Indifference Principle

24 09 2007

I’ve started reading Jaynes’ book on probability theory, to get a better sense of how objective Bayesians think about things. One thing I found interesting (and a bit frustrating) was his argument for the “indifference principle”, stating that, conditional on background information that says nothing about possibility A without also saying it about possibility B, A and B must have the same probability.

The argument for this principle is quite interesting. He starts with the premise that a rational agent (or “robot”, as he often calls it) must assign probabilities to outcomes just based on the information about them, and that the probabilities should be the same in situations with identical information. Thus, if there are two propositions about which the information says nothing different, we can interchange them and end up in an identical situation to how we started, so the probabilities assigned must be the same. It’s a nice little argument, but I think it relies on a missing premise, which states that given any background information, there is a set of probabilities that it is uniquely right to assign – if many probability assignments are all allowed (as most subjective Bayesians will say), then this argument won’t entail that they all have to obey the indifference principle, as long as every permissible assignment with one asymmetry has a corresponding permissible assignment with the other asymmetry.

What makes me more suspicious about this indifference principle is how Jaynes actually goes on to use it. He says that using it requires the background information about the different propositions to actually be identical, but his very first use of it violates this condition!

Consider the traditional ‘Bernoulli urn’ of probability theory; ours is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are black, the other seven are white. We are to shake the urn and draw one ball blindfolded. The background information … consists of the statements in the last two sentences. What is the probability that we draw a black one? (p. 42)

Of course, he goes on to say that the probability is 3/10 (which is obviously the “right” answer in some sense), because “the background information is indifferent to these ten possibilities”, so each ball has probability 1/10 of being drawn, and we can add the three chances for a black ball, since the background information entails that they are mutually exclusive events.
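To make the arithmetic concrete, here is a minimal Python sketch of the calculation Jaynes describes. The variable names are purely illustrative, and the uniform assignment over the ten labels is exactly the step whose justification is in question.

```python
from fractions import Fraction

# Ball labels and colors as stated in Jaynes's set-up.
balls = range(1, 11)
black = {4, 6, 7}

# Indifference over the ten labeled balls: each gets probability 1/10.
p_ball = {b: Fraction(1, 10) for b in balls}

# The draws are mutually exclusive, so the chances for the black balls add.
p_black = sum(p_ball[b] for b in black)
print(p_black)  # 3/10
```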

However, it looks to me like this is a mis-application of the principle, as he has stated it. The background information is explicitly not indifferent to the ten possibilities – it says that three of the balls are black and seven are white. A strict use of the indifference principle will say that balls 1,2,3,5,8,9,10 are all equally likely, and balls 4,6,7 are equally likely, but there’s no obvious way to apply the indifference principle to compare possibilities from one set and possibilities from the other. To see why this is the case, consider the following example, which is identical in terms of information content, but gives rise to an intuition other than 3/10:

Our cabinet is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are in the black drawer, the other seven are in the white drawer. We are to spin the cabinet and draw one ball blindfolded. The background information consists of the statements in the last two sentences. What is the probability that we draw one from the black drawer?

Unless our information includes something about how drawers and urns and paint and the like behave physically, there is no distinguishing between these two set-ups. However, it would seem quite odd to assign probability 3/10 in the latter set-up of drawing a ball from the black drawer – a better answer (if there is a right answer) seems like 1/2. But Jaynes seems to explicitly state that there is no information about drawers and urns in the background, since he says “the background information consists of the statements in the last two sentences”. (Something like this is exactly what changes between classical and quantum statistics of particle arrangements, so this is a relevant worry if we want to apply this objective Bayesianism to physics.)

Another way to get his answer would be to first consider the set-up where we’re told there are ten balls in the urn, and told their numbers, but not told anything about their colors. Now we see that the probability of drawing one of the balls 4,6,7 is 3/10, so when we learn that these three are black and the others are white, we conclude that the probability of getting a black ball is 3/10.

But this relies on the supposition that telling us the color of the balls has no effect on our rational degree of belief that any ball is drawn. Intuitively this seems right, but that’s only because we know how color behaves in the physical world – if it had been the size or shape, this would have been less clear, and properties about location or stickiness or solidity or whatever should clearly have changed the probabilities. Without this background information explicitly included, this update can’t work.

Additionally, there’s another way to reach this set-up from a slightly smaller set of background information. If we first just say that there is an urn with some balls in it, some of which are black and some of which are white, then the indifference principle would entail that the rational degree of belief in either black or white should be 1/2. But upon learning precisely which balls are black or white, we should somehow update our probabilities in a way that changes things – but how precisely to do this is left unspecified by the indifference principle.
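The tension between the two answers can be sketched in a few lines of Python. This is only an illustration, under the assumption that "applying the indifference principle" means assigning equal probability to each member of whichever set of alternatives one starts from; the helper `indifference` is a hypothetical name, not anything from Jaynes.

```python
from fractions import Fraction

def indifference(outcomes):
    """Assign equal probability to each alternative in the given collection."""
    outcomes = list(outcomes)
    return {o: Fraction(1, len(outcomes)) for o in outcomes}

black = {4, 6, 7}

# Partition 1: the ten labeled balls are the alternatives.
by_label = indifference(range(1, 11))
p_black_by_label = sum(p for ball, p in by_label.items() if ball in black)

# Partition 2: the two colors (or the two drawers) are the alternatives.
by_color = indifference(["black", "white"])
p_black_by_color = by_color["black"]

print(p_black_by_label, p_black_by_color)  # 3/10 1/2
```

The same rule gives 3/10 or 1/2 depending on which alternatives it is applied to, and nothing in the rule itself says which partition is the right one.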

So Jaynes must be implicitly appealing to some extra principles here in order to get the intuitive answers, unless he thinks the problems he set up implicitly contain more information than he has stated. If so, then he won’t be able to apply this objectively in actual physical scenarios where this background information isn’t known (which is why the experiment is being performed). This is no problem for a subjective Bayesian, because she doesn’t claim that an agent has no further information, or that there is a unique probability value that every rational agent must assign in this situation. It’s also no problem for someone who takes either a frequency or chance view of probability, since in any actual physical set-up we can assume that those numbers are well-defined, even though the agent has no access to them. (This causes problems for using frequency or chance as the sole basis of statistical inference, but that’s a different worry.) The situation seems uniquely troubling for the objective Bayesian.

16 09 2007

First, I’ll mention that I’ve updated my blogroll – there’s been a real burst in math blogs over the summer, at least in part instigated by my friends at the Secret Blogging Seminar, but also by the spurt of Fields Medalists with blogs. (Are we up to 10% of the total number now?) I’ve also added a few philosophy blogs that I’ve been reading for a while, and a couple that I should have been reading, but of course I’m sure I’m missing others.

Anyway, there’s new math job search gossip circulating on the web – I think the discussion on that post is interesting and relevant across disciplines for people trying to figure out whether this is generally a good thing or not.

Tim Gowers discusses the way logarithms and other abstract things should be taught. He advocates a way that’s a bit more formalist than some others suggest, but it sounds reasonable to me. There’s also interesting discussion of formalism there in the comments, though some of it sounds more like structuralism to me. See for example Terence Tao’s comment, “I guess there is a fundamental transition in mathematical learning when one realises that what mathematical objects are (and how they are constructed) may be less important than what mathematical objects do (e.g. what properties they obey).”

Also, a discussion about the Axiom of Choice at The Everything Seminar (I may add that one to my links later too), focusing on a puzzle I first heard from my friend Lukas Biewald. There’s interesting discussion in the comments that reveals implicit ideas about platonism and formalism among mathematicians. I think the anti-platonist majority there should be a bit more careful though, because similar issues apply in arithmetic, thanks to Gödel’s results. I think we should be much more hesitant to say that the natural numbers are just something we make up than to say the same of the universe of ZFC (or a topos, or whatever), as I mentioned before.

Job Search

12 09 2007

As some of you probably already know, I’m going on the job market this year. I don’t know if that will affect my blogging, except possibly to make it lighter (especially since I won’t be at any conferences for a while, and they tend to give me blogging inspiration). I probably won’t blog about job-search-related stuff though. (You can find that sort of stuff here if you want it.)

But it looks like Aidan has put together some links on this (despite being only in his third year). One he missed is the academic job market wiki, including a philosophy section. I wasn’t too in touch with people on the market last year – does anyone have any comments as to how useful that wiki was then? I suspect it will only be useful if a lot of people use it.

Banff Proposals for 2009

7 09 2007

I just got the following e-mail. People should definitely think about doing something like this – the workshop I went to organized by Richard Zach on “Mathematical Methods in Philosophy” was great, and I think there’s plenty of potential here for fruitful cross-disciplinary collaboration:

The Banff International Research Station for Mathematical Innovation and Discovery (BIRS) is currently accepting proposals for its 2009 programme. The deadline for 5-day Workshop and Summer School proposals is October 1, 2007.

Full information and guidelines are available at the website
http://www.birs.ca/

Proposal submissions should be made online at:
https://www.birs.ca/proposals/.

BIRS will be again hosting a 48-week scientific programme in 2009. The Station provides an environment for creative interaction and the exchange of ideas, knowledge, and methods within the mathematical, statistical, and computing sciences, and with related disciplines and industrial sectors. Each week, the station will be running either a full workshop (42 people for 5 days) or two half-workshops (20 people for 5 days). As usual, BIRS provides full accommodation, board, and research facilities at no cost to the invited participants, in a setting conducive to research and collaboration.

Nassif Ghoussoub,
Scientific Director, Banff International Research Station

Crazy Neo-Falsificationism

2 09 2007

Karl Popper’s criterion of “falsifiability” for scientific theories (saying that a theory counts as scientific only if there is some hypothetical observation that would prove it to be false) is a very good heuristic for thinking about what science (or any sort of evidence-based procedure for finding out about the world) is like. However, regardless of what scientists say (whether they be physicists yelling about string theory, biologists yelling about intelligent design, or anyone railing at crackpots, economists, or whoever else they don’t like), it just isn’t right, even as part of a criterion for what counts as science. But I think there is perhaps a way to use something like it as a criterion for what counts as a belief, though perhaps my suggestion is crazy.

First, a quick rundown of the problems with falsificationism as a criterion for science. As Popper was well aware, it can’t apply to statistical theories – in most cases, no evidence could actually rule out a statistical theory, rather than just making it extremely improbable, and you might think we shouldn’t rule something out just because it’s extremely improbable, because (in the long run) we’re bound to get unlucky and rule out the truth at some point. A bigger problem is the Quine-Duhem problem – basically no theory is falsifiable in a strict sense, because falsification of a theory by evidence always depends on auxiliary hypotheses, which can be let go of to save the theory. For instance, an observation of Uranus or Mercury in a place where you don’t expect it to be might look like a straightforward falsification of Newtonian mechanics, but there’s also room to postulate a so-far-unobserved planet (Neptune or Vulcan), or to argue that there was some optical artifact in the way the telescope was working, or even just that the astronomer misremembered or misrecorded the observation. Thus, there is no sharp line that can be drawn by a falsifiability criterion of this sort. In addition, theories that look straightforwardly unfalsifiable can still serve as useful heuristics for the further development of science – for instance, the theory that there actually are quarks (as opposed to the theory that protons and neutrons and cloud chambers and the like all behave “as if quarks existed”) can lead one to think of different modifications of the Standard Model in the face of recalcitrant data.
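The point about statistical theories can be illustrated with a toy case. The sketch below assumes a hypothesis that a coin is fair and that tosses are independent; the function name and the choice of toss counts are made up for illustration. However long the run of heads, the hypothesis assigns it a probability that is tiny but never zero, so the observation is never logically inconsistent with the theory.

```python
from fractions import Fraction

def prob_all_heads(n, p_heads=Fraction(1, 2)):
    """Probability that n independent tosses all land heads under the hypothesis."""
    return p_heads ** n

# Exact fractions avoid floating-point underflow; the values shrink but stay positive.
for n in (10, 100, 1000):
    p = prob_all_heads(n)
    print(n, p > 0, p.denominator.bit_length() - 1)  # toss count, still possible?, power of two in 1/2**n
```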

But despite all these problems, I think there’s still something very useful about the idea of falsificationism. But rather than a logical criterion, as Popper considered it, I’d rather think of it as an epistemological, or perhaps even psychological one. Popper thought that a theory needed to be specific enough that certain observations would be logically inconsistent with it, in order to count as a scientific theory. I’d rather say that a belief needs to be flexible enough that certain observations would lead the agent to give it up, in order for the belief to count as a “rational” or “scientific” one. (Or perhaps even to count as a belief at all, rather than just an article of faith, or something like that.) That is, it doesn’t need to be inconsistent with any set of observations – it just needs to be held in a way that is not totally unshakable. Although this is a psychological criterion I’m suggesting, I don’t think that the observations that would lead an agent to give it up need to be known to the agent – they just need to actually have the relevant dispositions. This removes the worries about statistical theories and the Quine-Duhem problem – although it might be that any theory could logically be saved from the data by giving up enough auxiliaries, it seems plausible that any rational agent would have some limit to the lengths that they would go to to save the theory. (I don’t know if comparative amounts of evidence needed to shake one’s belief should say anything interesting about the comparison between two agents.) This also applies to the more “standardly unfalsifiable” theories that I’d like to defend – I say that they’re important because they give useful heuristics for how to modify theories that are different from their empirically identical peers. But if these heuristics never seem to lead one to good modifications, then eventually one would likely give up this theory. It can’t be falsified, but one can still be made to give it up by seeing how fruitless it is and how much more fruitful its competitor is (which is just as unfalsifiable in this respect).

One might have worries about mathematical truths, or other potential “analytic” truths. Popper explicitly set these aside and said that his criterion only applied to things that weren’t logical truths (or closely enough related to logical truths). However, I suspect that something like my criterion might still apply here – although there is no possible observation that is inconsistent with Cauchy’s theorem on path integrals in the complex plane, I suspect that there are possible observations that would make anyone give up their belief in this theorem. For instance, someone could uncover a very subtle logical flaw that appears in every published proof of the theorem, and then exhibit some strange function that is complex-differentiable everywhere but whose integral around a closed curve is non-zero. Or at least, someone could do something that looks very much like this and would convince everyone, even though I think they couldn’t actually discover such a function because there isn’t one. It’s tougher to imagine what sort of observation would make mathematicians give up their beliefs in much simpler propositions, like the claim that there are infinitely many primes, or that 2+2=4, but as I said, there’s no need for the agents to actually be able to imagine the relevant observations – the disposition to give up the belief in certain circumstances just has to exist.

I think this is a relatively low bar for a belief to reach – I suspect that just about all apparent beliefs that people have would actually be given up under certain observations. However, with logical beliefs and religious beliefs, people often claim that no possible observation would make them give it up (this is called “analyticity” for logical beliefs, and “faith” for religious beliefs). I don’t know if that should actually count as a defect for either of these types of belief, but I think it is good reason to worry about them, at least to some extent.