A Stronger Two-Envelope Paradox

24 05 2009

Consider the standard two-envelope paradox – there are two envelopes in front of you, and all you know is that one of them has twice as much money as the other. It seems that you should be indifferent to which envelope you choose to take. However, once you’ve taken an envelope and opened it, you’ll see some amount x of money there, and you can reason that the other envelope has either 2x or x/2, with equal probability, which gives an expected value of (2x + x/2)/2 = 1.25x, so you’ll wish you had taken the other.

Of course, this reasoning doesn’t really work, because you always know more than just that one envelope has twice as much money as the other, especially if you know who’s giving you the money. (If the amount I see is large enough, I’ll be fairly certain that this one is the larger of the two envelopes, and therefore be happy that I chose it rather than the other.)

But it’s also well-known that the paradox can be fixed up and made more paradoxical again by imposing a probability distribution on the amounts of money in the two envelopes. Let’s say that one envelope has 2^n dollars in it, and the other has 2^{n+1} dollars, where n is the number of tries it takes a biased coin to come up heads, and the probability of tails on any given flip is 2/3. Now, if you see that your envelope has 2 dollars, then you know the other one has 4 dollars, so you’d rather switch than stay. And if you see that your envelope has any amount x = 2^k other than 2, then it came either from the pair with n = k (so the other envelope has 2x) or from the pair with n = k-1 (so the other envelope has x/2), and these two cases are in the ratio (2/3)^{k-1} : (2/3)^{k-2} = 2 : 3. So the expected value of the other envelope is (2/5)(2x) + (3/5)(x/2) = 11x/10, and again you’d rather switch than stay.

This is all fair enough – Dave Chalmers pointed out in his “The St. Petersburg Two-Envelope Problem” that there are other cases where A can be preferred to B conditional on the value of B, while B can be preferred to A conditional on the value of A, even though unconditionally, one should be indifferent between A and B. This just means that we shouldn’t accept Savage’s “sure-thing principle”, which says that if there is a partition of possibilities, and you prefer A to B conditional on any element of the partition, then you should prefer A to B unconditionally. Of course, restricted versions of this principle hold, either when the partition is finite, or the payouts of the two actions are bounded, or one of the unconditional expected values is finite, or when the partition is fine enough that there is no uncertainty conditional on the partition (that is, when you’re talking about strict dominance rather than the sure-thing principle).

What I just noticed is that it’s trivial to come up with an example where we have the same pattern of conditional preferences, but there should be a strict unconditional preference for A over B. To see this, just consider this same example, where you know that the two envelopes are filled with the same pattern as above, but that 5% of the money has been taken out of envelope B. It seems clear that unconditionally one should prefer A to B, since it has the same probabilities of the same pre-tax amounts, and no tax. But once you know how much is in A, you should prefer B, because the 5% loss is smaller than the 10% difference in expected values (or the sure doubling, when A has 2 dollars). And of course, the previous reasoning shows why, once you know how much is in B, you should prefer A.
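The conditional calculation behind the two-envelope example can be checked with exact arithmetic. A minimal sketch in Python, assuming the setup described above: the pair is (2^n, 2^{n+1}), n is the flip on which the first heads appears (so P(n) = (1/3)(2/3)^{n-1}), and you are equally likely to be holding either envelope of the pair. The function names are illustrative.

```python
from fractions import Fraction

P_TAILS = Fraction(2, 3)  # probability of tails on each flip


def prob_n(n: int) -> Fraction:
    """P(first heads on flip n) = (2/3)**(n-1) * (1/3), for n >= 1."""
    return P_TAILS ** (n - 1) * (1 - P_TAILS)


def expected_other_ratio(k: int) -> Fraction:
    """E[other envelope] / x, given that your envelope shows x = 2**k dollars."""
    x = Fraction(2) ** k
    # Case 1: the pair is (2**k, 2**(k+1)) and you hold the smaller one.
    # The factor 1/2 for "which envelope you picked" is common to both cases and cancels.
    w_smaller = prob_n(k)
    # Case 2: the pair is (2**(k-1), 2**k) and you hold the larger one (impossible if k == 1).
    w_larger = prob_n(k - 1) if k >= 2 else Fraction(0)
    expected = (w_smaller * 2 * x + w_larger * x / 2) / (w_smaller + w_larger)
    return expected / x


for k in range(1, 6):
    print(k, expected_other_ratio(k))

# With 5% taken out of B: knowing A = x, B is still better in expectation
# whenever 0.95 * ratio > 1.
kept_after_tax = Fraction(95, 100)
print(all(kept_after_tax * expected_other_ratio(k) > 1 for k in range(1, 50)))
```

For every amount above 2 dollars the ratio comes out as exactly 11/10 (and 2 at the bottom amount), so the 5% deduction from B never outweighs the conditional gain from switching to it.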

Violations of the sure-thing principle definitely feel weird, but I guess we just have to live with them if we’re going to allow decision theory to deal with infinitary cases.

Probabilistic Proofs of Undecidable Sentences?

22 04 2009

As I suggested (though perhaps less clearly than I would have liked) in my forthcoming paper “Probabilistic Proofs and Transferability”, I think that probabilistic proofs can be used to provide knowledge of mathematical facts, even though there are reasons why we might think they should remain unacceptable as part of mathematical practice. (Basically, even though mathematicians gain knowledge through testimony all the time, there may be reasons for a methodological insistence that no mathematical argument rely essentially on testimony, the way probabilistic proofs (just like experiments in the natural sciences) do.) Given that this allows for a non-deductive means of acquiring mathematical knowledge, a natural question is whether probabilistic proofs can be used to settle questions that are undecided by standard axiom sets. Unfortunately, I think the answer is probably “no”.

The sort of probabilistic proofs that I endorse aren’t just any old argument that isn’t deductively valid. In my paper, I contrast probabilistic proofs of the primality of a given number with certain arguments in favor of the Goldbach Conjecture and the Riemann Hypothesis, as well as lottery arguments. The primality arguments work by showing that if n is prime, then a certain result is guaranteed, while if n is composite, then the probability of the same sort of result is at most 1/4^k, where k can be made as large as one wants by generating more random numbers. As a result, the evidence “tracks” the truth of the claim involved (that is, the evidence is very likely if the claim is true and very unlikely if the claim is false).
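The primality argument described here has the shape of the Miller-Rabin test. A minimal sketch in Python, assuming Miller-Rabin is the kind of test intended (the text does not name one): each random base either proves n composite or passes, every base passes when n is prime, and for composite n at most a quarter of bases pass, so k independent rounds all pass with probability at most 1/4^k.

```python
import random
from typing import Optional


def is_probable_prime(n: int, k: int = 20, rng: Optional[random.Random] = None) -> bool:
    """Miller-Rabin test with k randomly chosen bases.

    A False answer is always correct (n is composite). If n is prime the
    answer is always True; if n is composite, True comes back with
    probability at most 1/4**k.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7):  # dispose of small cases and tiny factors
        if n % p == 0:
            return n == p
    if rng is None:
        rng = random.Random()
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = rng.randrange(2, n - 1)  # uniform base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # base a is consistent with n being prime
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break  # consistent with n being prime
        else:
            return False  # base a is a witness: n is certainly composite
    return True


print(is_probable_prime(2**61 - 1))  # a known (Mersenne) prime
print(is_probable_prime(29341))      # 13 * 37 * 61, a Carmichael number
```

The asymmetry is the point: a prime can never fail, while a composite survives k rounds only with probability at most 1/4^k, which is exactly the kind of bound the "tracking" condition asks for.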

In the case of the Goldbach Conjecture, one of the pieces of “probabilistic” evidence that we have is that none of the first several million even numbers is a counterexample to the claim. However, although this evidence is very likely if the claim is true, we have no good a priori bounds on how likely this evidence would be if the claim were false. Thus, the evidence may give us some reason to believe the Goldbach Conjecture, but it’s hard to see just how much more evidence we’d need in order for this to give us knowledge that the Goldbach Conjecture is in fact true.
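The Goldbach evidence is of a simple enumerative kind. A minimal sketch in Python of that kind of check, for even numbers up to an arbitrary illustrative bound of 10,000: a sieve of Eratosthenes marks the primes, and each even number is tested for some decomposition as a sum of two primes. Nothing in the check bounds how likely this outcome would be if the conjecture were false, which is the worry raised above.

```python
def primes_sieve(limit: int) -> list[bool]:
    """Sieve of Eratosthenes: is_prime[i] is True iff i is prime, for 0 <= i <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False  # assumes limit >= 1
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return is_prime


def goldbach_counterexamples(limit: int) -> list[int]:
    """Even numbers 4 <= m <= limit that are NOT a sum of two primes."""
    is_prime = primes_sieve(limit)
    bad = []
    for m in range(4, limit + 1, 2):
        if not any(is_prime[p] and is_prime[m - p] for p in range(2, m // 2 + 1)):
            bad.append(m)
    return bad


print(goldbach_counterexamples(10_000))  # prints []
```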

In the case of the Riemann Hypothesis, I don’t have room (or the knowledge) to state all the relevant pieces of evidence, but here the evidence is more of the “inference to the best explanation” type, and again it lacks any sort of probabilistic bound on the chance of its arising in misleading cases. This is not to say that this type of evidence can’t provide knowledge – in fact, we may already have enough evidence to count as knowing that the Riemann Hypothesis is true, though I’m definitely not certain of this. At any rate, if this does provide knowledge, it’s of a different sort than probabilistic proof, and may well have its own problems for acceptability in the mathematical canon.

In “lottery cases”, we have strong probabilistic evidence for the claim that a particular ticket will not win (in particular, that this ticket is in a fair lottery, and that there are a billion separate tickets, exactly one of which is a winner). However, most people want to say that we don’t have knowledge of this claim. Again, at least part of the reason is that our evidence doesn’t track the truth of the claim – nothing about our evidence is sensitive to the ticket’s status, and the evidence is just as likely if the claim is false (that is, if the ticket is a winner) as if it’s true (that is, if the ticket isn’t a winner).

Therefore, it seems that for any sort of evidence that we’d count as a “probabilistic proof” that gives us knowledge of some mathematical claim, we’d want to have good bounds on the probability of getting such evidence if the claim were true, as well as good bounds on the probability of getting such evidence if the claim were false. At the moment, I can’t really conceive of ways of generating mathematical evidence where there would be clear bounds on these probabilities other than by randomly sampling numbers from a particular bounded set. (There may be possibilities of sampling from unbounded sets, provided that the sampling method doesn’t rely on a non-countably-additive probability, but it’s a bit harder to see how these sorts of things could be relevant.)

Now let’s consider some claim that is independent of Peano Arithmetic, and see if there might be any way to come up with a probabilistic technique that would give evidence for or against such a claim, that had the relevant probabilistic bounds.

Unfortunately, I don’t think there will be. Consider two models of PA – one in which the claim is true, and one in which it’s false. We can assume that one of these models is the standard model (that is, it just consists of the standard natural numbers) while the other is some end-extension of this model. If our probabilistic method involves generating natural numbers through some random process and then checking their properties, then this method will have the same probabilities of generating the same numbers if applied in either one of these models (since presumably, the method will only ever generate standard natural numbers, and not any of the extra things in the non-standard model). Thus, any result of such a test will have the same probability whether the statement is true or false, and thus can’t be sensitive in the sense demanded above.

Now, one might worry that this argument assumes that conditional probabilities will have to be probabilities given by considering a particular non-standard model. This clearly isn’t what we’re doing in the case of primality testing, however – we don’t consider the probabilities of various outcomes in a model where n is prime and in a model where n is not prime, not least because we have no idea what a model of the false one would even look like.

However, I think this is what we’d have to do for statements that could conceivably be independent of PA. Consider any process that involves sampling uniformly among the integers up to k, and then performing some calculations on them. Knowing the truth of PA, we can calculate what all the possible results of such a process could be, and verify that these will be the same whether our independent statement is true or false. Since this calculation will tell us everything we could possibly gain from running the probabilistic process, and it hasn’t given us knowledge about the independent statement, the probabilistic process can’t either, since it adds no new information, and therefore no new knowledge.
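For a concrete bounded process, the full distribution of outcomes can be computed outright by enumerating every possible random choice. A minimal sketch in Python, taking as an illustrative assumption a single Miller-Rabin round with a base drawn uniformly from 2 to n-2 (the text does not fix a particular process): counting the bases that pass gives the exact probability of each outcome, which is the sense in which actually running the random process adds no information beyond this computation.

```python
from fractions import Fraction


def passes_round(n: int, a: int) -> bool:
    """One Miller-Rabin round for odd n > 3 with base a: True if a is not a witness."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False


def pass_probability(n: int) -> Fraction:
    """Exact probability that a uniformly random base in [2, n - 2] passes."""
    bases = range(2, n - 1)
    return Fraction(sum(passes_round(n, a) for a in bases), len(bases))


for n in (97, 91, 561, 29341):
    print(n, pass_probability(n))
```

For a prime the passing probability is exactly 1; for an odd composite greater than 9 the Monier-Rabin bound keeps it at most 1/4, which is where the 1/4^k figure for k independent rounds comes from.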

If we allow for probabilistic processes that sample from all natural numbers (rather than just sampling uniformly from a bounded range of them) I think we still can’t get any relevant knowledge. Assuming that the sampling process must have some countably additive probability distribution, then for any ε > 0 there must be some sufficiently large N such that the probability that the process generates a number larger than N is at most ε. By the above argument, the process can’t tell us anything about the independent statement if it never generates a number larger than N. Therefore, the difference in probability of various outcomes given the statement and its negation can’t be any larger than ε. But since ε was arbitrary, this means that the difference in probabilities must be 0. Thus, the process still fails to meet the sensitivity requirement, and therefore can’t give knowledge about the independent statement.
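The key step is that a countably additive distribution on the natural numbers puts arbitrarily little probability beyond some finite N. A minimal sketch in Python, assuming purely for illustration that the process draws a single number X with the geometric law P(X = j) = (1/2)^j for j ≥ 1: for each ε it finds the least N with P(X > N) ≤ ε. Every countably additive distribution has such an N for every ε > 0, because its tail probabilities decrease to 0; a merely finitely additive "uniform" distribution on the naturals gives every finite set probability 0, so its tail never shrinks.

```python
from fractions import Fraction


def tail_prob(n: int) -> Fraction:
    """P(X > n) for the illustrative geometric law P(X = j) = (1/2)**j, j >= 1."""
    return Fraction(1, 2) ** n


def smallest_cutoff(eps: Fraction) -> int:
    """Least N with P(X > N) <= eps; it exists because the tail tends to 0."""
    n = 0
    while tail_prob(n) > eps:
        n += 1
    return n


for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(eps, smallest_cutoff(eps))
```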

Thus, we can’t get probabilistic information about independent claims. Of course, since many people will be willing to say that we have good reason to believe the Goldbach Conjecture, and the Riemann Hypothesis, on the evidence mentioned above, we may be able to get arguments of those sorts for independent claims. But such arguments aren’t really “probabilistic” in the sense standardly discussed.

Additionally, there are many claims independent of Peano Arithmetic that we can prove using standard mathematical means. Some examples include Goodstein’s Theorem, the Paris-Harrington Theorem, and Borel Determinacy. (The latter is in fact so strong that it can’t be proven in Zermelo set theory, which is ZF set theory without the Axiom of Replacement.) In fact, the arguments given above show that probabilistic methods not only fail to decide claims that are undecided in PA, but in fact fail to decide claims that are independent of substantially weaker systems, like Robinson arithmetic and probably even Primitive Recursive Arithmetic. Since basically everything that is of mathematical interest is independent of PRA (basically all it can prove is statements saying that a recursive function has a specific value on specified inputs, like “376+91=467”) this means that probabilistic proofs really can’t do much. All they can do is give feasible proofs of claims that would otherwise take a very long time to verify, as in the primality of a particular number.

Guessing the result of infinitely many coin tosses

26 09 2008

I’ve stolen the title of a very interesting post by Andrew Bacon over at Possibly Philosophy. He considers a situation where there will be infinitely many coin flips occurring between noon and 1 pm, but instead of getting faster and faster, they’re getting slower and slower. In particular, one coin flip occurs at 1/n of an hour past noon for each positive integer n. He shows that there exists a strategy for guessing, where each guess may use information about how previous coin flips have come up (which sounds like it should be irrelevant, because they’re all independent 50/50 flips), such that one is guaranteed to get all but finitely many of the guesses correct. In particular, this means that although there is no first flip, there is a first flip on which one gets an incorrect guess (if there are any incorrect guesses at all), so up to that point one has successfully guessed the result of infinitely many independent fair coin tosses.

Check out his post to see the description of the strategy, or work it out yourself, before going on.

Anyway, I managed to figure this out only because it’s formally identical to a game with colored hats that I had discussed earlier with math friends. But I think Andrew’s version with the coin flips is somehow even more surprising and shocking!

He begins with a discussion of a very nice paper by Tim Williamson showing that allowing infinitesimal probabilities doesn’t get us around the requirement that the probability of an infinite sequence of heads be 0. (Coincidentally, I’m currently working on a paper extending this to suggest that infinitesimal probabilities really can’t do much of anything useful for epistemological purposes.) Andrew concludes with a bit more discussion of these probabilistic issues:

Now this raises some further puzzles. For example, suppose that it’s half past twelve, and you know the representative sequence predicts that the next toss will be heads. What should your credence that it will land heads be? On the one hand it should be 1/2 since you know it’s a fair coin. But on the other hand, you know that the chance that this is one of the few times in the hour that you guess incorrectly is very small. In fact, in this scenario it is “infinitely” smaller in comparison, so that your credence in heads should be 1. So it seems this kind of reasoning violates the Principal Principle.

I was going to leave the following remarks as a comment, but they got bigger than I expected, and I liked this puzzle so much that I figured I should point other people to it in my own post anyway. In short, I don’t think that this puzzle leads to a violation of the Principal Principle. (For those who aren’t familiar, this principle comes from David Lewis’ paper “A Subjectivist’s Guide to Objective Chance” and states the seeming triviality that when one knows the objective chance of some outcome, and doesn’t know anything “inadmissible” specifically about the actual result of the process, then regardless of what other historical information one knows, one’s degree of belief in the outcome should be exactly equal to the chance. The principle is normally disputed only over what “inadmissible” means – if we can’t make sense of that concept then we can’t use the principle.)

The set of representatives involved in picking the strategy is a non-measurable set. To see this, note that the construction is basically a version of Vitali’s construction of a non-measurable set of reals. The only difference is that in the case of the reals we make use of the fact that Lebesgue measure is translation invariant – here we make use of the fact that the measure is invariant under transformations that interchange heads and tails on any set of flips. Instead of the set of rational translations, we consider the set of finite interchanges of heads and tails. (This argument is basically outlined in the paper by Kadane, Schervish, and Seidenfeld, “Statistical Implications of Finitely Additive Probability”, which I’ve been working through more carefully recently for another paper I’m working on.)
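In symbols, the representative construction can be summarized as follows. This is a standard presentation written out for convenience; the notation is introduced for this sketch and is not taken from Andrew's post. Flip n is the one at 1/n of an hour past noon.

```latex
\begin{align*}
  &x = (x_1, x_2, \dots) \in \{H, T\}^{\mathbb{N}}, \qquad
   x \sim y \iff \{\, n : x_n \neq y_n \,\} \text{ is finite} \\
  &r : \{H, T\}^{\mathbb{N}}/{\sim} \to \{H, T\}^{\mathbb{N}}
   \text{ picks one member of each class (Axiom of Choice)} \\
  &\mathrm{guess}_n(x) = r([x])_n, \text{ where } [x] \text{ is determined by } (x_m)_{m > n} \\
  &\{\, n : \mathrm{guess}_n(x) \neq x_n \,\} = \{\, n : r([x])_n \neq x_n \,\}
   \text{ is finite, since } r([x]) \sim x
\end{align*}
```

The third line is why the strategy can actually be followed in time: by 1/n of an hour past noon every flip with a larger index has already happened, and changing finitely many flips never changes the class. The second line is where non-measurability enters, since the range of r is a Vitali-style set of representatives.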

This suggests to me that the set of situations in which you make the wrong guess on flip n may well be unmeasurable as well. (I haven’t worked out a proof of this claim though. [UPDATE: I have shown that it can sometimes be measurable depending on the strategy, and I conjecture that whenever it is measurable, it will have probability 1/2. See the comments for the argument.]) So from the outset (before the game starts) it looks like you can’t (or perhaps better, don’t) assign a credence to “I will make the wrong choice on flip n”. However, once it’s actually 1/n past twelve, you’ve got a lot more information – in particular, you know the outcomes of all the previous flips. I suspect that conditional on this information, the probability of making the wrong choice on flip n will just be 1/2, as we expect. (Of course, we have to make sense of what the right conditional probability to use here is, because we’re explicitly conditionalizing on a set of probability 0, as per Williamson.) Thus, there’s no violation of the Principal Principle when the time actually comes.

However, there does appear to be a violation of a principle called “conglomerability”, which is what my interest in the Kadane, Schervish, and Seidenfeld paper is about. One version of this principle states that if there is a partition of the probability space, and an event whose probability conditional on every event in this partition is in some interval, then the unconditional probability of that event must also be in this interval. In this case we’ve got a partition (into all the sequences of flips down to the nth) and the probability of guessing wrong conditional on each event in that partition is 1/2, and yet I’ve said that the unconditional probability is undefined, rather than 1/2.

I think I have a defense here though, which is part of my general defense of conglomerability against several other apparent counterexamples (including some in this paper and others by Seidenfeld, Schervish, and Kadane, as well as a paper by Arntzenius, Elga, and Hawthorne). The problem here is that the event whose unconditional probability we are interested in is unmeasurable, so it doesn’t even make sense to talk about the unconditional probability. In the case of the conditional probabilities, I’d say that strictly speaking it doesn’t make sense to talk about them either. However, conditional on the outcome of the flips down to the nth, this event is coextensive with something that is measurable, namely “the coin will come up heads” (or else perhaps, “the coin will come up tails”), because the sequence of flips down to that point determines which outcome I will guess for that flip. Thus these events have well-defined probabilities.

The only reason this situation of an apparent violation of conglomerability emerges is because we’re dealing explicitly with unmeasurable sets. However, I’m only interested in the application of probability theory to agents with a certain type of restriction on their potential mental contents. I don’t want to insist that they can only grasp finite amounts of information, because in some sense we actually do have access to infinitely many pieces of information at any time (at least, to the extent that general Chomskyan competence/performance distinctions suggest that we have the competence to deal with infinitely many propositions about things, even if in practice we never deal with extremely large sentences or numbers or whatever). However, I think it is reasonable to insist that any set that an agent can grasp not essentially depend on the Axiom of Choice for the proof that it exists. Although we can grasp infinitely much information, it’s only in cases where it’s nicely structured that we can grasp it. Thus, the strategy that gives rise to this problem is something that can’t be grasped by any agent, and therefore this problem isn’t really a problem.

An Economic Argument for a Mathematical Conclusion

27 04 2008

How valuable is an income stream that pays $1000 a year in perpetuity? Naively, one might suspect that since this stream will eventually pay out arbitrarily large amounts of money, it should be worth infinitely much. But of course, this is clearly not true – for a variety of reasons, future money is not as valuable as present money. (One reason economists focus on is the fact that present money can be invested and thus become a larger amount of future money. Another reason is that one may die at any point, and thus one may not live to be able to use the future money. Yet another reason is that one’s interests and desires gradually change, so one naturally cares less about one’s future self’s purchasing power than about one’s current purchasing power.) Thus, there must be some sort of discount rate. For now, let’s make the simplifying assumption that the discount rate is constant over future years, so that money in any year from now into the future is worth 1.01 times the same amount of money a year later.

Then we can calculate mathematically that the present value of an income stream of $1000 a year in perpetuity is given by the sum \frac{1000}{1.01}+\frac{1000}{1.01^2}+\frac{1000}{1.01^3}+\cdots. Going through the work of summing this geometric series, we find that the present value is \frac{1000/1.01}{1-1/1.01}=\frac{1000}{1.01-1}=100{,}000. However, there is an easier way to calculate this present value that is purely economic. The argument is not mathematically rigorous, but there are probably economic assumptions that could be used to make it so. We know that physical intuition can often suggest mathematical calculations that can later be worked out in full rigor (consider things like the Kepler conjecture on sphere packing, or the work that led to Witten’s Fields Medal), but I’m suggesting here that the same can be true for economic intuition (though of course the mathematical calculation I’m after is much simpler).

The economic argument goes as follows. If money in any year is worth 1.01 times money in the next year, then in an efficient market, there would be investments one could make that pay an interest of 1% in each year. Investing $100,000 permanently in this and taking out the interest each year gives rise to this income stream, and thus one can fairly trade $100,000 to receive this perpetual income stream, so they must be equal in value. We don’t need to sum the series at all.
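Both routes to the present value can be checked numerically. A minimal sketch in Python, assuming the constant 1.01 discount factor from the text and payments at the end of each year: it sums the discounted payments over a long (arbitrarily chosen) 1000-year horizon, evaluates the closed form of the geometric series, and separately simulates depositing $100,000 at 1% and withdrawing the interest each year. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

RATE = Fraction(1, 100)   # 1% per year, i.e. a 1.01 discount factor
PAYMENT = Fraction(1000)  # dollars paid at the end of each year


def partial_present_value(years: int) -> Fraction:
    """Sum of 1000 / 1.01**t for t = 1..years (partial sums of the geometric series)."""
    total, discount = Fraction(0), Fraction(1)
    for _ in range(years):
        discount /= 1 + RATE
        total += PAYMENT * discount
    return total


def closed_form() -> Fraction:
    """(1000/1.01) / (1 - 1/1.01), which simplifies to 1000 / 0.01."""
    return (PAYMENT / (1 + RATE)) / (1 - 1 / (1 + RATE))


def simulate_investment(principal: Fraction, years: int) -> tuple[Fraction, list[Fraction]]:
    """Invest at 1%; each year the interest accrues and is withdrawn.

    Returns the final balance and the list of yearly withdrawals.
    """
    balance, withdrawals = principal, []
    for _ in range(years):
        balance *= 1 + RATE                # a year's interest accrues
        withdrawal = balance - principal   # take out just the interest
        balance -= withdrawal
        withdrawals.append(withdrawal)
    return balance, withdrawals


print(closed_form())                       # 100000
print(float(partial_present_value(1000)))  # just under 100,000
balance, paid = simulate_investment(Fraction(100_000), 50)
print(balance, all(p == PAYMENT for p in paid))  # 100000 True
```

The simulation is the economic argument in miniature: the principal never changes and every withdrawal is exactly the $1000 payment, so no series needs to be summed.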

Now perhaps there is a sense in which the mathematical argument and the economic argument given above can be translated into one another, but it’s far from clear to me. Thus, it looks like at least sometimes, economic intuition can solve mathematical problems. People often talk about the “unreasonable effectiveness of mathematics in the sciences”, but here I think I have another example of the unreasonable effectiveness of the sciences in mathematics.

Probabilistic Causation in Hungary

20 12 2007

Budapest is a very nice city, and this sounds like an interesting program – I’m just not yet sure whether I can plan anything for that time period, or else I would certainly apply.

Course Dates: JULY 21 – AUGUST 1, 2008
Location: Central European University (CEU), Budapest, Hungary
Detailed course description: http://www.sun.ceu.hu/causality

Faculty: Miklos Redei, Department of Philosophy, Logic and Scientific Method, London School of Economics, UK; Nancy Cartwright, London School of Economics and Political Science, UK; Damian Fennell, London School of Economics and Political Science, UK; Gabor Hofer-Szabo, King Sigismund College, Budapest, Hungary; Ferenc Huoranszki, Central European University, Budapest, Hungary; Laszlo E. Szabo, Eötvös University, Budapest, Hungary; Richard E. Neapolitan, Northeastern Illinois University

Target group: advanced graduate students, postdoctoral fellows, junior faculty and researchers in philosophy, physics, economics and computer science
Language of instruction: English
Tuition fee: EUR 500, financial aid is available.
Application deadline: February 14, 2008 (for scholarship places); April 30, 2008 (for fee-paying applications)
Online application: http://www.sun.ceu.hu/apply (attachments to be sent by email to causality@ceu.hu).

For further information, queries can be directed to the SUN office by email (summeru@ceu.hu), via Skype (ceu-sun), or by telephone (00-36-1-327-3811).

Probability and Bayesian Epistemology

10 12 2007

From the last Carnival of Philosophy, I found a post by another Kenny about the relation between Bayesian epistemology and probability! He puts forward three views of what this relation might be:

Here are brief definitions of each view, and how each one relates subjective degrees of rational confidence to probabilities (I will explain in more depth later).

* (P) takes subjective degrees of rational confidence as primitive. There is no state space for degrees of rational confidence, because they aren’t probabilities.
* (KPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of all epistemically possible worlds, where the epistemically possible worlds are formal constructions that may or may not be objectively possible.
* (LPW) takes subjective degrees of rational confidence to be actual probabilities over the state space of the subset of the really possible worlds which are epistemically accessible.

However, he seems to be focused on a very particular understanding of the word “probability” that might not quite be what I would mean by it. The very fact that he talks about a relation between rational degrees of confidence and probabilities suggests that he’s understanding the word differently from how I am.

My understanding of the word is that “probability” refers to any function from a Boolean algebra to the real numbers satisfying the following three properties: (1) it is never negative; (2) the tautology is assigned value 1; (3) finite additivity (that is, given two elements whose conjunction is the contradiction, the probability of their disjunction is the sum of their probabilities). I’d also be willing to apply the term “probability” in cases where instead of a Boolean algebra in the strict mathematical sense, one uses any structure where the terms “tautology”, “conjunction”, “contradiction”, and “disjunction” have a natural interpretation.
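The three conditions can be checked mechanically on a small finite example. A minimal sketch in Python, assuming the Boolean algebra is the power set of a finite set of atoms (conjunction is intersection, disjunction is union, the contradiction is the empty set, the tautology is the whole set) and that the candidate function is given as a table from subsets to numbers. The fair-die table is an illustrative choice.

```python
from fractions import Fraction
from itertools import chain, combinations


def all_subsets(atoms):
    """Every subset of `atoms`, as frozensets: the elements of the power-set algebra."""
    atoms = list(atoms)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))]


def is_probability(prob: dict, atoms) -> bool:
    """Check (1) non-negativity, (2) P(tautology) = 1, (3) finite additivity."""
    elements = all_subsets(atoms)
    if any(prob[e] < 0 for e in elements):
        return False
    if prob[frozenset(atoms)] != 1:
        return False
    for a in elements:
        for b in elements:
            if not (a & b):  # their conjunction is the contradiction
                if prob[a | b] != prob[a] + prob[b]:
                    return False
    return True


# A fair die: each set of faces gets (number of faces in it) / 6.
faces = range(1, 7)
die = {s: Fraction(len(s), 6) for s in all_subsets(faces)}
print(is_probability(die, faces))  # True
```

Nothing in the check refers to chances or frequencies; any table passing it counts as a probability in the sense just described.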

It seems that Kenny Pearce, by contrast, understands the term to require that the algebra be an algebra of sets over some state space, and that there be some objective fact about the probability values. If this interpretation is right, then I don’t think I’d quite take any of the positions he mentions. At any rate, I think I support something more like (KPW) than the others, where “actual probabilities” isn’t taken in any objective sense. In explaining this position, I think I can give answers to three questions he raises:

1. Why should we suppose that we can use the math of probability theory in dealing with degrees of rational confidence?
2. The math of probability theory is generally interpreted in terms of sets called state spaces, but, ex hypothesi, degrees of rational confidence, not being probabilities, have no state spaces. What, then, does the math mean?
3. Why should we suppose that when an occurrence has a well defined objective probability, our subjective degree of rational confidence should be assigned a value equal to its probability?

In response to the first question, the standard answer would be to refer to something like a Dutch book argument – degrees of rational confidence can be described by the mathematics of probability theory because if they couldn’t, then the agent would be subject to a sure loss from a set of bets they would be willing to take, and therefore would be irrational. (There’s some slipperiness here with generating the bets from the confidences, and concluding irrationality based on a collection of bets the agent may take individually, but I think this can be cleaned up.) There’s also a host of other arguments for something like this same conclusion (though Alan Hájek raises issues for them in his (forthcoming?) “Arguments For – Or Against? – Probabilism”). As Kenny Pearce notes, nothing about these arguments requires there to be a state space, so they don’t end up being probabilities in his sense (due to Kolmogorov), but they do seem to be probabilities in the sense that I use (as do Popper, Borel, and others).
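The Dutch book idea can be made concrete with a tiny example. A minimal sketch in Python, assuming the usual betting interpretation (an agent with credence c in a proposition will pay c for a ticket worth 1 if the proposition is true and 0 otherwise) and a hypothetical agent whose credences in A and in not-A sum to 1.2: buying both tickets loses the same amount however things turn out.

```python
from fractions import Fraction

# Hypothetical credences that violate additivity: they sum to more than 1.
credence = {"A": Fraction(6, 10), "not A": Fraction(6, 10)}


def net_payoff(a_is_true: bool) -> Fraction:
    """Agent buys a $1 ticket on each proposition at a price equal to its credence."""
    price_paid = credence["A"] + credence["not A"]
    winnings = (1 if a_is_true else 0) + (0 if a_is_true else 1)
    return winnings - price_paid


for world in (True, False):
    print("A true" if world else "A false", net_payoff(world))  # -1/5 either way
```

Exactly one ticket pays out in every world, so any pair of credences summing to something other than 1 can be exploited in one direction or the other; that is the additivity condition at work.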

As for the second question, I think that there actually is a state space that is relevant for degrees of rational confidence, which is why I lean more towards something like what Kenny Pearce calls (KPW) rather than (P). The state space here would be the set of epistemic possibilities (whatever those are – I don’t really have a good theory of them, do you?). Despite my lack of an account of them, I think they do need to play a role. I think we can’t make very good sense of the notion of a degree of confidence in p, supposing q, without a set of possibilities that we can restrict to the q-possibilities. Also, these epistemic possibilities seem to play an important role in other aspects of epistemic modality, not just degree of belief. And most importantly, I think there’s a rational difference between having a rational confidence of 0 in p and actually being certain that p will not happen. When measuring the speed of light, there’s a difference between my attitude towards its being exactly 2.9980000000000001 × 10^8 m/s, and my attitude towards its being 3 m/s – I consider the former possible given what I know, and the latter not. However, since there is some interval around 2.998 × 10^8 m/s that I can’t rule out, and there are infinitely many such values that I am indifferent between, I can’t give any of them a positive value without either violating additivity or assigning values larger than 1 to certain disjunctions. So I propose that the state space contains infinitely many epistemic possibilities, and that my degree of confidence in certain sets of these possibilities is 0, even though the set is non-empty. (Of course, for the empty set, I trivially have confidence 0 in that set of possibilities.)

So I think this aspect of the math actually applies quite well to degrees of confidence, though I’m willing to concede that many people will want to challenge this point, and I don’t think it’s as important as the point that degrees of confidence must be probabilities in something like the general sense I outlined earlier.

However, I don’t think such a state space comes with objectively correct probabilities to assign – after all, it’s infinite, and Bertrand’s Paradox shows how all sorts of troubles arise when we think that symmetries of an infinite space constrain probability assignments.
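Bertrand's Paradox is easy to see by simulation. A minimal sketch in Python, under the standard textbook setup (not spelled out above): pick a "random" chord of the unit circle in three ways that each look symmetric, and estimate the probability that the chord is longer than √3, the side of the inscribed equilateral triangle. The three methods give roughly 1/3, 1/2 and 1/4, so symmetry alone does not single out one probability assignment over the infinitely many chords.

```python
import math
import random

SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in the unit circle


def chord_random_endpoints(rng: random.Random) -> float:
    """Two independent uniform points on the circle."""
    t1, t2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((t1 - t2) / 2))


def chord_random_radius(rng: random.Random) -> float:
    """Uniform distance from the centre along a radius; the chord is perpendicular to it.

    The direction of the radius does not affect the length, so it is not sampled.
    """
    d = rng.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)


def chord_random_midpoint(rng: random.Random) -> float:
    """Chord midpoint uniform over the disc, so its distance from the centre is sqrt(U)."""
    r = math.sqrt(rng.uniform(0, 1))
    return 2 * math.sqrt(1 - r * r)


def estimate(method, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(chord length > SIDE) under the given method."""
    rng = random.Random(seed)
    return sum(method(rng) > SIDE for _ in range(trials)) / trials


for method in (chord_random_endpoints, chord_random_radius, chord_random_midpoint):
    print(method.__name__, round(estimate(method), 3))
```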

As for the third question, I’m not sure I agree with its premise. I’m not totally convinced that when there is a well-defined objective probability, we should match it with our degrees of confidence. Consider a fair coin that has just been tossed. There is some sense in which it had an objective probability of 1/2 of coming up heads, so this principle would suggest having degree of belief 1/2 in heads. But if I also know that this coin was one of 10 fair coins flipped at that point, 9 of which happened to come up heads, then (in the frequency sense, as opposed to the chance sense) there is also an objective probability of 9/10 of that coin being heads up, so this principle would suggest the contradictory degree of belief of .9. Maybe in this situation one of the two principles wins out (my guess would be the latter), but I don’t really know under what circumstances something like this should be the case. Of course, I also don’t really know what sorts of objective things count as “well-defined objective probabilities” – is it chances, frequencies, or something else? There are many well-defined objective things that obey the mathematics of probability, but it’s an interesting question which (if any) should be tracked by our degrees of confidence.

Kenny Pearce suggests that on the (KPW) theory of degrees of confidence, it’s the fact that “the worlds … divide more or less evenly” that makes us assign 1/6 to each of the propositions about the way the die might land up. I don’t think there’s such a thing as an objective measure over this infinite state space, so we can’t even make sense of the worlds dividing more or less evenly. Thus, if there is some objective reason for the degrees of belief we assign, I don’t know what it is yet, but I don’t think it could be anything like what Kenny Pearce suggests in either (KPW) or (LPW). (Also, I don’t think (LPW) is even a viable candidate, because this is supposed to be a theory of degree of rational belief, and actual possibilities have almost nothing to do with rational epistemic possibilities – one could try to make a modified 2-dimensionalist version of this strategy, as Frank Jackson does, but I’m not convinced that this will work.)

I think that these degrees of confidence exist, and are actually often much more precise than we realize (there’s no reason we should have transparent access to exactly what our degrees of belief are), but they’re not constitutively tied to any sort of objective probability in the sense that Kenny Pearce was expecting for a relation between Bayesian epistemology and probability. These degrees of belief are themselves probabilities, just in a different interpretation than Kenny Pearce was specifically considering.

Jaynes on the Indifference Principle

24 09 2007

I’ve started reading Jaynes’ book on probability theory, to get a better sense of how objective Bayesians think about things. One thing I found interesting (and a bit frustrating) was his argument for the “indifference principle”, stating that, conditional on background information that says nothing about possibility A without also saying it about possibility B, A and B must have the same probability.

The argument for this principle is quite interesting. He starts with the premise that a rational agent (or “robot”, as he often calls it) must assign probabilities to outcomes just based on the information about them, and that the probabilities should be the same in situations with identical information. Thus, if there are two propositions about which the information says nothing different, we can interchange them and end up in an identical situation to how we started, so the probabilities assigned must be the same. It’s a nice little argument, but I think it relies on a missing premise, which states that given any background information, there is a set of probabilities that it is uniquely right to assign – if many probability assignments are all allowed (as most subjective Bayesians will say), then this argument won’t entail that they all have to obey the indifference principle, as long as every permissible assignment with one asymmetry has a corresponding permissible assignment with the other asymmetry.
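The missing-premise point can be made concrete with a toy model. Here A and B are assumed, for simplicity, to be the only two possibilities, and an “assignment” is just the pair (P(A), P(B)); the helper names are hypothetical. Symmetry of the information only requires that the set of permissible assignments be closed under swapping A and B, and that is compatible with every member being asymmetric unless the set has exactly one member.

```python
from fractions import Fraction

# An assignment is the pair (P(A), P(B)); A and B are assumed exhaustive here.

def swap(assignment):
    """Interchange the labels A and B."""
    p_a, p_b = assignment
    return (p_b, p_a)

def closed_under_swap(permissible):
    """True if swapping A and B maps the permissible set onto itself."""
    return {swap(x) for x in permissible} == set(permissible)

def indifferent(assignment):
    p_a, p_b = assignment
    return p_a == p_b

# Hypothetical subjectivist permission: two mirror-image asymmetric assignments.
permissive = {(Fraction(3, 10), Fraction(7, 10)), (Fraction(7, 10), Fraction(3, 10))}

# Uniqueness premise: only one assignment is permissible. Then closure under
# swapping forces that single assignment to be its own mirror image.
unique_asym = {(Fraction(3, 10), Fraction(7, 10))}
unique_sym = {(Fraction(1, 2), Fraction(1, 2))}

print(closed_under_swap(permissive), any(indifferent(x) for x in permissive))
print(closed_under_swap(unique_asym), closed_under_swap(unique_sym))
```

The permissive set passes the symmetry test while containing no indifferent assignment; only with a singleton set does symmetry deliver the indifference principle.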

What makes me more suspicious about this indifference principle is how Jaynes actually goes on to use it. He says that using it requires the background information about the different propositions to actually be identical, but his very first use of it violates this condition!

Consider the traditional ‘Bernoulli urn’ of probability theory; ours is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are black, the other seven are white. We are to shake the urn and draw one ball blindfolded. The background information … consists of the statements in the last two sentences. What is the probability that we draw a black one? (p. 42)

Of course, he goes on to say that the probability is 3/10 (which is obviously the “right” answer in some sense), because “the background information is indifferent to these ten possibilities”, so each ball has probability 1/10 of being drawn, and we can add the three chances for a black ball, since the background information entails that they are mutually exclusive events.

However, it looks to me like this is a mis-application of the principle, as he has stated it. The background information is explicitly not indifferent to the ten possibilities – it says that three of the balls are black and seven are white. A strict use of the indifference principle will say that balls 1,2,3,5,8,9,10 are all equally likely, and balls 4,6,7 are equally likely, but there’s no obvious way to apply the indifference principle to compare possibilities from one set and possibilities from the other. To see why this is the case, consider the following example, which is identical in terms of information content, but gives rise to an intuition other than 3/10:

our cabinet is known to contain ten balls of identical size and weight, labeled {1,2,…,10}. Three balls (numbers 4, 6, 7) are in the black drawer, the other seven are in the white drawer. We are to spin the cabinet and draw one ball blindfolded. The background information consists of the statements in the last two sentences. What is the probability that we draw one from the black drawer?

Unless our information includes something about how drawers and urns and paint and the like behave physically, there is no distinguishing between these two set-ups. However, it would seem quite odd to assign probability 3/10 in the latter set-up of drawing a ball from the black drawer – a better answer (if there is a right answer) seems like 1/2. But Jaynes seems to explicitly state that there is no information about drawers and urns in the background, since he says “the background information consists of the statements in the last two sentences”. (Something like this is exactly what changes between classical and quantum statistics of particle arrangements, so this is a relevant worry if we want to apply this objective Bayesianism to physics.)
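A small sketch of the contrast, using nothing beyond the stated information: ten labelled balls, three of them (4, 6, 7) in one group. Applying the indifference principle to the individual balls (Jaynes’s route) gives 3/10, while applying it to the two groups, as the drawer framing suggests, gives 1/2. The function names are made up for illustration.

```python
from fractions import Fraction

BALLS = range(1, 11)
BLACK = {4, 6, 7}  # painted black (urn) or stored in the black drawer (cabinet)

def indifferent_over_balls():
    """Equal weight to each labelled ball, then add the weights of the black ones."""
    weight = Fraction(1, len(BALLS))
    return sum(weight for ball in BALLS if ball in BLACK)

def indifferent_over_groups():
    """Equal weight to each of the two groups, black and white."""
    groups = ("black", "white")
    return Fraction(1, len(groups))

print(indifferent_over_balls())   # 3/10
print(indifferent_over_groups())  # 1/2
```

Nothing in the two sentences of background information picks out which partition the principle should be applied to.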

Another way to get his answer would be to first consider the set-up where we’re told there are ten balls in the urn, and told their numbers, but not told anything about their colors. Now we see that the probability of drawing one of the balls 4,6,7 is 3/10, so when we learn that these three are black and the others are white, we conclude that the probability of getting a black ball is 3/10.

But this relies on the supposition that telling us the color of the balls has no effect on our rational degree of belief that any ball is drawn. Intuitively this seems right, but that’s only because we know how color behaves in the physical world – if it had been the size or shape, this would have been less clear, and properties about location or stickiness or solidity or whatever should clearly have changed the probabilities. Without this background information explicitly included, this update can’t work.
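To see why the update needs that extra background assumption, here is a sketch in which each ball is drawn with probability proportional to some weight. The weights are purely hypothetical: uniform weights, meaning the property makes no difference to the draw, recover 3/10, while letting the property double a ball’s chance of being drawn (as stickiness might) gives a different answer.

```python
from fractions import Fraction

BALLS = range(1, 11)
BLACK = {4, 6, 7}

def prob_black(draw_weight):
    """P(black) when ball b is drawn with probability proportional to draw_weight(b)."""
    total = sum(draw_weight(b) for b in BALLS)
    return sum(draw_weight(b) for b in BLACK) / total

def same_weight(b):
    # The property has no effect on the draw: the uniform prior survives the update.
    return Fraction(1)

def black_favoured(b):
    # Hypothetical: the property makes a ball twice as likely to be drawn.
    return Fraction(2) if b in BLACK else Fraction(1)

print(prob_black(same_weight))     # 3/10
print(prob_black(black_favoured))  # 6/13
```

Which weighting is appropriate depends on physical information about the property, which is exactly what the stated background information leaves out.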

Additionally, there’s another way to reach this set-up from a slightly smaller set of background information. If we first just say that there is an urn with some balls in it, some of which are black and some of which are white, then the indifference principle would entail that the rational degree of belief in either black or white should be 1/2. But upon learning precisely which balls are black or white, we should somehow update our probabilities in a way that changes things – but how precisely to do this is left unspecified by the indifference principle.

So Jaynes must be implicitly appealing to some extra principles here in order to get the intuitive answers, unless he thinks the problems he set up implicitly contain more information than he has stated. If so, then he won’t be able to apply this objectively in actual physical scenarios where this background information isn’t known (which is why the experiment is being performed). This is no problem for a subjective Bayesian, because she doesn’t claim that an agent has no further information, or that there is a unique probability value that every rational agent must assign in this situation. It’s also no problem for someone who takes either a frequency or chance view of probability, since in any actual physical set-up we can assume that those numbers are well-defined, even though the agent has no access to them. (This causes problems for using frequency or chance as the sole basis of statistical inference, but that’s a different worry.) The situation seems uniquely troubling for the objective Bayesian.