I’ve started reading Jaynes’ book on probability theory, to get a better sense of how objective Bayesians think about things. One thing I found interesting (and a bit frustrating) was his argument for the “indifference principle”, stating that, conditional on background information that says nothing about possibility A without also saying it about possibility B, A and B must have the same probability.

The argument for this principle is quite interesting. He starts with the premise that a rational agent (or “robot”, as he often calls it) must assign probabilities to outcomes just based on the information about them, and that the probabilities should be the same in situations with identical information. Thus, if there are two propositions about which the information says nothing different, we can interchange them and end up in an identical situation to how we started, so the probabilities assigned must be the same. It’s a nice little argument, but I think it relies on a missing premise: that given any background information, there is a unique set of probabilities that it is right to assign. If many probability assignments are all permissible (as most subjective Bayesians will say), then this argument won’t entail that each of them obeys the indifference principle – only that every permissible assignment with one asymmetry has a corresponding permissible assignment with the opposite asymmetry.

What makes me more suspicious about this indifference principle is how Jaynes actually goes on to use it. He says that using it requires the background information about the different propositions to actually be identical, but his very first use of it violates this condition!

Consider the traditional ‘Bernoulli urn’ of probability theory; ours is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are black, the other seven are white. We are to shake the urn and draw one ball blindfolded. The background information … consists of the statements in the last two sentences. What is the probability that we draw a black one? (p. 42)

Of course, he goes on to say that the probability is 3/10 (which is obviously the “right” answer in some sense), because “the background information is indifferent to these ten possibilities”, so each ball has probability 1/10 of being drawn, and we can add the three chances for a black ball, since the background information entails that they are mutually exclusive events.
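Jaynes’ counting here can be spelled out in a few lines. Here is a minimal sketch (in Python, purely illustrative – not Jaynes’ own presentation) of the indifference-plus-additivity calculation:

```python
from fractions import Fraction

balls = list(range(1, 11))    # ten distinguishable balls
black = {4, 6, 7}             # the three black ones

# Indifference over the ten draws: each ball gets probability 1/10,
# and "a black ball is drawn" is the disjoint union of three such events.
p_ball = Fraction(1, len(balls))
p_black = sum(p_ball for b in balls if b in black)

print(p_black)  # 3/10
```

The exact-rational arithmetic just mirrors the sum 1/10 + 1/10 + 1/10 licensed by mutual exclusivity.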

However, it looks to me like this is a mis-application of the principle, as he has stated it. The background information is explicitly *not* indifferent to the ten possibilities – it says that three of the balls are black and seven are white. A strict use of the indifference principle will say that balls 1,2,3,5,8,9,10 are all equally likely, and balls 4,6,7 are equally likely, but there’s no obvious way to apply the indifference principle to compare possibilities from one set and possibilities from the other. To see why this is the case, consider the following example, which is identical in terms of information content, but gives rise to an intuition other than 3/10:

Our cabinet is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are in the black drawer, the other seven are in the white drawer. We are to spin the cabinet and draw one ball blindfolded. The background information consists of the statements in the last two sentences. What is the probability that we draw one from the black drawer?

Unless our information includes something about how drawers and urns and paint and the like behave physically, there is no distinguishing between these two set-ups. However, it would seem quite odd to assign probability 3/10 in the latter set-up of drawing a ball from the black drawer – a better answer (if there is a right answer) seems like 1/2. But Jaynes seems to explicitly state that there is no information about drawers and urns in the background, since he says “the background information consists of the statements in the last two sentences”. (Something like this is exactly what changes between classical and quantum statistics of particle arrangements, so this is a relevant worry if we want to apply this objective Bayesianism to physics.)
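The parenthetical about classical versus quantum statistics can be made concrete with a toy example (my illustration, not from the post): with two particles and two boxes, Maxwell-Boltzmann counting applies indifference to arrangements of *labeled* particles, while Bose-Einstein counting applies it to *occupancy patterns* – and the two choices of “equiprobable possibilities” give different answers.

```python
from fractions import Fraction
from itertools import product

# Two particles, two boxes. Classical (Maxwell-Boltzmann) counting treats
# each assignment of labeled particles to boxes as one equiprobable case.
labeled = list(product([0, 1], repeat=2))          # 4 arrangements
p_both_in_0_classical = Fraction(
    sum(1 for a in labeled if a == (0, 0)), len(labeled))

# Bose-Einstein counting treats each occupancy pattern (how many particles
# per box, ignoring labels) as one equiprobable case.
patterns = {tuple(sorted(a)) for a in labeled}     # 3 patterns
p_both_in_0_bose = Fraction(
    sum(1 for p in patterns if p == (0, 0)), len(patterns))

print(p_both_in_0_classical, p_both_in_0_bose)     # 1/4 1/3
```

The same background “two particles, two boxes” supports either 1/4 or 1/3 for both-in-one-box, depending on which set of possibilities indifference is applied to – exactly the urn/drawer ambiguity in miniature.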

Another way to get his answer would be to first consider the set-up where we’re told there are ten balls in the urn, and told their numbers, but not told anything about their colors. Now we see that the probability of drawing one of the balls 4,6,7 is 3/10, so when we learn that these three are black and the others are white, we conclude that the probability of getting a black ball is 3/10.

But this relies on the supposition that telling us the color of the balls has no effect on our rational degree of belief that any ball is drawn. Intuitively this seems right, but only because we know how color behaves in the physical world – had it been size or shape, this would have been less clear, and information about location or stickiness or solidity or the like clearly should change the probabilities. Without this background information explicitly included, this update can’t work.

Additionally, there’s another way to reach this set-up from a slightly smaller set of background information. If we first just say that there is an urn with some balls in it, some of which are black and some of which are white, then the indifference principle would entail that the rational degree of belief in either black or white should be 1/2. But upon learning precisely which balls are black or white, we should somehow update our probabilities in a way that changes things – but how precisely to do this is left unspecified by the indifference principle.

So Jaynes must be implicitly appealing to some extra principles here in order to get the intuitive answers, unless he thinks the problems he set up implicitly contain more information than he has stated. If so, then he won’t be able to apply this objectively in actual physical scenarios where this background information isn’t known (which is why the experiment is being performed). This is no problem for a subjective Bayesian, because she doesn’t claim that an agent has no further information, or that there is a unique probability value that every rational agent must assign in this situation. It’s also no problem for someone who takes either a frequency or chance view of probability, since in any actual physical set-up we can assume that those numbers are well-defined, even though the agent has no access to them. (This causes problems for using frequency or chance as the sole basis of statistical inference, but that’s a different worry.) The situation seems uniquely troubling for the objective Bayesian.

Krzysztof Szymanek (03:15:00): Sorry, this is not a comment.

Please tell me the name of the author of “Jaynes on the Indifference Principle”. I failed to find it.

Best regards

Krzysztof Szymanek

-NG (06:14:08): hi kenny,

one of the interesting things to me about jaynes’ use of the indifference principle in many places (not necessarily the book, which i haven’t read in any detail) is that he really wants indifference to do the work that symmetry often does in modern physics. indeed, he often invokes the indifference principle to extract predictions after first stating the symmetries which relate states to whose differences we should be indifferent.

it seems that this way of doing things would address your concern. in particular, the difference between your two scenarios is (i think) some background assumptions about symmetries (balls vs drawers). of course, if we need to state symmetry assumptions to apply the p.o.i., we should probably be explicit about this in stating the principle….

-NG

Kenny (11:14:22): Krzysztof – I hadn’t noticed until just now that my name and affiliation don’t seem to appear anywhere on the blog! I’ll fix that.

NG – That’s an interesting point – certainly, finding explicit symmetries that have to be respected would put applications of the indifference principle on a better footing, but I start to worry that there might be some sort of circularity here, because I’m not sure how Jaynes would suggest that we find out about symmetries, except by using some sort of Bayesian inference.

-NG (14:14:06): yes, that’s probably true….

i’m not sure that jaynes would have endorsed this, but here’s my take on why it’s not circular (and/or silly) to learn those symmetries by Bayesian inference:

symmetries seem like a good candidate for framework-level knowledge (abstract knowledge shared across many specific systems, or domains). you can formalize this a bit with a hierarchical bayesian model in which symmetries live at an upper level — the principle of indifference gives the prescription for taking a symmetry and turning it into a prior on lower-level theories. because the high-level “symmetry theories” receive support from many specific systems it isn’t circular to use them to constrain inference in any one system. (the symmetries here act very much like (the other) goodman’s “overhypotheses”.)
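a minimal sketch of that hierarchical picture (my own toy construction – the hypothesis labels and numbers are made up, not anything jaynes or goodman wrote): two framework-level hypotheses about which symmetry to be indifferent over, updated by draws pooled from many urns of known composition, after which the winning framework can constrain inference about a fresh urn:

```python
from fractions import Fraction

# Two hypothetical framework-level "symmetry theories" (my labels):
#   F_balls : be indifferent over the ten balls  -> P(black) = 3/10 per urn
#   F_colors: be indifferent over the two colors -> P(black) = 1/2  per urn
p_black = {"F_balls": Fraction(3, 10), "F_colors": Fraction(1, 2)}
prior = {"F_balls": Fraction(1, 2), "F_colors": Fraction(1, 2)}

# Draws pooled from many 3-black/7-white urns; each draw bears on the
# framework level, so no single urn's inference is circular.
draws = ["black", "white", "white", "white", "black", "white", "white",
         "white", "white", "black", "white", "white", "white", "white"]

posterior = dict(prior)
for d in draws:
    for f in posterior:
        lik = p_black[f] if d == "black" else 1 - p_black[f]
        posterior[f] *= lik
total = sum(posterior.values())
posterior = {f: v / total for f, v in posterior.items()}

# With black appearing roughly 3/10 of the time, F_balls gains support.
print(posterior["F_balls"] > posterior["F_colors"])  # True
```

the "overhypothesis" F_balls earns its support from the pooled data, and only then gets turned into a prior for the next urn – which is the non-circularity claim in miniature.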

-NG

Paul Shearer (09:39:39): In this comment, I offer a formulation of the principle one should use to make the “update” from a 1/2 probability of drawing black to a 3/10 probability, once the colors of the balls are known. This principle I call the “minimal sufficient model principle”. Here is a formulation:

Given some information about a situation, a probabilistic model’s partitioning of probability space should be fine enough that all possible states-of-affairs consistent with the information are distinct events in this space (sufficiency); further, the probability space should contain no events inconsistent with the information (minimality).

Here are a couple applications of the minimal sufficient model principle.

When one is told only that the urn contains some black and some white balls, the probability 1/2 for drawing black is derived from applying the indifference principle to the two-member probability space S = { [white drawn] , [black drawn] }. When we are told the color of the balls, we receive extra information about the possible draws which is not incorporated in S. As a result, the 1/2 probability no longer applies, because our space S does not incorporate all the given information. For example, it does not distinguish between drawing ball 1 and ball 2. To incorporate the given information one must refine S to S’ = { [ball 1 drawn], [ball 2 drawn], …}. Once the space is updated for sufficiency, the indifference principle can be correctly applied to S’ and yield the probability 3/10.

Here’s another way 1/2 could be derived: applying the indifference principle to the probability space

S = { [ball 1 drawn & is white], [ball 1 drawn & is black], [ball 2 drawn & is white], …}

One determines the probability of drawing a black ball is 1/2. When we are told the color of the balls, however, some possibilities are no longer possible and must be stricken from S, giving us S’ as in the first example. Here the space S is sufficient but must be modified for minimality before applying the indifference principle.
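Both routes can be made concrete in a short sketch (illustrative Python, not part of the original comment): start from the 20-member space, strike the events inconsistent with the known colors (minimality), and reapply indifference to the refined space:

```python
from fractions import Fraction
from itertools import product

# The 20-member space: [ball i drawn & ball i is white/black] for i = 1..10.
S = list(product(range(1, 11), ["white", "black"]))

# Indifference over S gives P(black drawn) = 10/20 = 1/2.
p_black_before = Fraction(sum(1 for _, c in S if c == "black"), len(S))

# Learning the colors strikes the inconsistent events (minimality),
# leaving one event per ball -- the refined space S' of the first example.
actual = {i: "black" if i in {4, 6, 7} else "white" for i in range(1, 11)}
S_prime = [(i, c) for i, c in S if actual[i] == c]

# Indifference over S' gives P(black drawn) = 3/10.
p_black_after = Fraction(sum(1 for _, c in S_prime if c == "black"),
                         len(S_prime))

print(p_black_before, p_black_after)  # 1/2 3/10
```

The two principles act in sequence: sufficiency fixes the granularity of the space, minimality prunes it, and only then is indifference applied.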

The minimal sufficient model principle reminds us that before any probabilistic reasoning can be done, one must create a reasonable model of the events which are possible given a body of information.

As a final comment, here’s a situation where even the minimal sufficient model principle does not provide a unique way forward for the modeler. Suppose we are shown a jar and told that it contains some proportion of oil and some proportion of water. We cannot determine any more information from looking at the jar. What probability distribution should we assume on the proportions?

There are at least two ways to apply our principles. In the first, we could induce a probability model from the proportion of water in the jar, then apply the indifference principle to that model. The result would be a model in which every proportion of water is equally likely. In the second, we could induce a model from the ratio of water to oil in the jar and apply the indifference principle to that. Then we have a model in which every ratio of water to oil is equally likely. A little calculus will show that the resulting distributions are not the same. [NB: one could avoid calculus by discretizing. Then in the first case our model states that 0-10% water is as likely as 10-20%, 20-30%, etc.; in the second case, our model states that a ratio of oil to water in the 0-10% range is as likely as 10-20%, 20-30%, etc.]
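The disagreement between the two models can be exhibited without any calculus (a sketch of my own; to keep the ratio bounded I add the simplifying assumption that there is at least as much water as oil, so the oil-to-water ratio lies in (0, 1)):

```python
from fractions import Fraction

# Event of interest: the water proportion p lies in (1/2, 3/5].

# Model 1: p itself is uniform on (0, 1), so the probability is the
# length of the interval.
p_model1 = Fraction(3, 5) - Fraction(1, 2)

# Model 2 (assumption: at least as much water as oil, so the ratio
# r = oil/water is uniform on (0, 1)).  Since p = 1/(1 + r),
#   p in (1/2, 3/5]  <=>  r in [2/3, 1),
# and the probability is the length of that r-interval.
p_model2 = Fraction(1, 1) - Fraction(2, 3)

print(p_model1, p_model2)  # 1/10 1/3
```

The same event gets probability 1/10 under one parametrization and 1/3 under the other, so the choice of which quantity to be indifferent over is doing real work.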

It seems like the first model is more justified by the information than the second. But how do we know that? Are we using some hidden information to make that decision?