When I was looking for The Logic of Decision (by Richard Jeffrey) in the library the other day, I found a wonderful collection sitting right next to it called Decision, Probability, and Utility, edited by Peter Gärdenfors and Nils-Eric Sahlin. I’ve read a bunch of essays there to get more of a sense of the history of decision theory, and of what topics and axioms are considered important, because I’ve been working a bit on some paradoxes based on the work of Alan Hájek.
One essay in particular that caught my attention was “Bets and Beliefs” (from 1968) by Henry Kyburg, in which he argues against the subjective account of probability then being pushed by Jeffrey and others (which is now perhaps the leading account, at least in philosophical circles). I haven’t yet gone to Google Scholar to find out what the responses to this paper are like, but it seems to me that all his complaints can be addressed by slightly more sophisticated subjective probabilities. In particular, his two objections are that it seems unreasonable to think that we have a specific probability for every event (rather than some vague interval), and that we seem to be able to get a lot of precise predictions about long-run frequencies from apparently weak assumptions about subjective probabilities.
Both of these complaints are largely solved when we consider the possibility of the subjectivist assigning credences not just to an individual event but to some hidden variable like frequency, or objective chance, and using Lewis’ Principal Principle (that if “p” represents the objective probability and “P” the subjective probability, then P(x|p(x)=r)=r).
If Kyburg wants to say that my subjective probability of the Democrats winning the next election isn’t exactly .4857339… or .5293758…, but merely somewhere in the interval between .4 and .6, then my subjectivist can say that the objective chance is somewhere between .4 and .6, and the subjective probability for the value of this objective chance is uniformly distributed, or perhaps follows some other more sensitive distribution. This still ends up giving an infinitely precise subjective probability for the event (namely, the expectation of the objective chances), but this seems to me no worse than Kyburg’s infinitely precise endpoints of his interval. If I have a non-uniform distribution, I can make more clear the fact that I think it’s close to .5. And I also no longer need to be quite as careful about the endpoints of the interval, because I can just assign very low probabilities to the values near the endpoint.
Now, when I learn more detailed facts about an event, I can adjust the distribution of my credence over these different objective chances. So if I have a coin, and start with a uniform subjective distribution over the possible biases of the coin, then observe a flip in which it comes up heads, my new distribution will favor the biases that lean towards heads. If the second flip comes up tails, then my distribution will start being concentrated towards the center. Thus, I can do just what Kyburg wants and become extremely confident that the objective chance is close to .5 after observing a very large number of flips, while remaining very unsure about the objective chance of a Democrat being elected in 2008, even though my subjective credence in the coin coming up heads and my credence in a Democrat being elected might both be exactly .5.
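To make the coin example concrete, here is a minimal sketch of that updating. It assumes (my choice, not anything from Kyburg or Jeffrey) that credence over the coin's bias is a Beta distribution, which is convenient because a uniform distribution is Beta(1, 1) and each observed flip just adds one to a parameter. The helper names `update`, `mean` and `sd` are made up for illustration.

```python
from math import sqrt

def update(a, b, heads, tails):
    """Turn a Beta(a, b) credence over the coin's bias into the
    posterior after seeing the given numbers of heads and tails."""
    return a + heads, b + tails

def mean(a, b):
    """Expected bias, which is also the credence that the next flip is heads."""
    return a / (a + b)

def sd(a, b):
    """Standard deviation of the credence over the bias: how spread out it is."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

prior = (1, 1)                          # uniform over all possible biases
after_h = update(*prior, 1, 0)          # one head: now leans towards heads
after_ht = update(*after_h, 0, 1)       # then one tail: centred again, but narrower
after_many = update(*prior, 500, 500)   # a long, evenly split run of flips

for label, dist in [("prior", prior), ("H", after_h),
                    ("H,T", after_ht), ("500H,500T", after_many)]:
    print(f"{label:>10}: credence in heads = {mean(*dist):.3f}, spread = {sd(*dist):.3f}")
```

The prior and the distribution after a thousand flips both give credence .5 to heads on the next flip, but the spread drops from about .29 to under .02. That difference in spread is what separates the well-tested coin from the election.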
Kyburg’s other challenge is developed in much more detail. He proves a result about a sequence of exchangeable events. (Basically, exchangeability is de Finetti’s way of talking about independent and identically distributed events without making any assumptions about objective probabilities.) In this case, let’s say that we are drawing balls (with replacement) from an urn and wondering whether or not each ball will be purple. Suppose the unconditional probability of any particular draw being purple is m1, and the conditional probability of a draw being purple given only that some specific other draw is purple is m2. He shows that in a very long sequence, the probability that the proportion of purple balls drawn differs from m1 by more than km2 is less than 1/k^2. In particular, if we assign probability .01 to a purple ball being drawn, and only .02 to a purple ball being drawn given that another draw was purple, then the probability of a long sequence having 11% (sic; I think he means 21%) purple balls is at most .01, and the probability of its having 50% purple balls is at most .0004.
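A quick way to sanity-check these figures is to assume that the bound is just Chebyshev's inequality. That assumption is mine, and it may not be how the paper states the result. For exchangeable draws, the variance of the long-run proportion tends to m1(m2 − m1), so the chance that the proportion lands at least as far from m1 as some frequency f is at most m1(m2 − m1)/(f − m1)^2. The function name `chebyshev_bound` is made up for this sketch.

```python
def chebyshev_bound(m1, m2, freq):
    """Chebyshev upper bound on the probability that the long-run
    proportion of purple draws is at least as far from m1 as `freq` is.

    Assumes exchangeable draws, so the limiting variance of the
    proportion is m1 * m2 - m1**2 = m1 * (m2 - m1).
    """
    variance = m1 * (m2 - m1)
    return min(1.0, variance / (freq - m1) ** 2)

m1, m2 = 0.01, 0.02
for freq in (0.11, 0.21, 0.50):
    print(f"proportion {freq:.2f}: probability at most {chebyshev_bound(m1, m2, freq):.6f}")
```

On this reading the deviation is measured in units of the square root of m1(m2 − m1), which is .01 here. The bound for 11% then comes out at .01 and the bound for 50% at about .0004, matching the quoted figures, while 21% would give .0025.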
The problem here is that my seemingly very weak assumptions (a low credence in a purple ball being drawn, and only a slightly higher one given that another draw was purple) seem to give me an extremely low credence in a long string of balls coming up approximately half purple. It somehow seems like cheating to get this much knowledge about long strings of draws almost entirely a priori. Thus, this subjectivist framework seems to be doing something wrong.
But I think the framework of assigning credence to objective chances (or in this case frequencies) again explains what’s going on. If we assigned probability .01 to a purple ball being drawn initially, it’s probably not because we thought that exactly .01 of the balls in the urn were purple, but rather because we assigned some credence to this possibility, and also some credence to various other higher or lower proportions of purple balls (and probably a fairly high credence to there being no purple balls). Now, let’s see what happens to our credence in the next ball being purple when we draw a ball from the urn and find that it’s purple. We have now eliminated all the cases where there are no purple balls, and reweighted the remaining cases in proportion to how many purple balls they contain, so the situations with a lot of purple balls gain the most credence and those with few gain less or even lose some.
But once we start thinking like this, we see that if we assign a probability appreciably greater than .0004 to at least half the balls being purple, then (doing a bit of calculation) either we have to initially assign a probability greater than .01 to a ball drawn being purple, or else seeing a purple ball on the first draw will confirm a high proportion of purple balls strongly enough that we will have to assign a probability greater than .02 to the second ball drawn being purple as well. For instance, say we give probability .0004 to half the balls being purple, and probability .9996 to .009 of the balls being purple, so that the initial probability of a ball being purple is about .0092, just under .01. Then the probability of the former case occurring and us drawing a purple ball on the first draw is .0002, while the probability of the latter occurring and us drawing a purple ball on the first draw is a bit below .009. Conditioning on a purple first draw, the probability of the half-purple case rises to about .0002/.0092, or roughly .022, so the probability of the second ball being purple is about .022 × .5 + .978 × .009, which is just under .02. Even this tiny initial credence in half the balls being purple uses up almost all the room the two assumptions allow. So those supposedly innocent initial assumptions were actually quite strong: a low probability of the first ball being purple, together with the claim that a purple ball does little to confirm further purple balls, already requires us to assign a very low initial probability to half the balls being purple.
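The two-hypothesis example can be checked mechanically. The sketch below conditions a credence distribution over urn compositions on a purple first draw. The function name `predictive_probs` and the dictionary representation are my own, chosen only for illustration.

```python
def predictive_probs(prior):
    """prior maps each candidate proportion of purple balls to our credence in it.

    Returns (m1, posterior, m2), where m1 is the probability that the first
    draw is purple, posterior is the credence over proportions after a purple
    first draw, and m2 is the probability that the second draw is purple
    given that the first one was.
    """
    m1 = sum(credence * theta for theta, credence in prior.items())
    # Bayes: each hypothesis is reweighted by how likely it made a purple draw.
    posterior = {theta: credence * theta / m1 for theta, credence in prior.items()}
    m2 = sum(credence * theta for theta, credence in posterior.items())
    return m1, posterior, m2

# The example from the text: .0004 on "half purple", .9996 on ".009 purple".
m1, posterior, m2 = predictive_probs({0.5: 0.0004, 0.009: 0.9996})
print(f"first draw purple:  {m1:.4f}")
print(f"half-purple urn after one purple draw: {posterior[0.5]:.4f}")
print(f"second draw purple: {m2:.4f}")

# Doubling the credence in the half-purple urn, for comparison.
m1_double, _, m2_double = predictive_probs({0.5: 0.0008, 0.009: 0.9992})
print(f"with .0008 on half purple: first {m1_double:.4f}, second {m2_double:.4f}")
```

With these inputs the first-draw probability comes out near .0092 and the second-draw probability near .0197. Doubling the credence in the half-purple urn to .0008 pushes the second-draw probability to about .03, well past .02.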
So if we take the standard subjectivist framework, but allow credences in frequencies and objective chances and the like in addition to simple events, then we see that the initial beliefs Kyburg talked about really are stronger than he thought. Thus, Kyburg’s objections no longer seem so problematic, and we see that subjective probability really can do quite a lot, especially when we remember that subjective probabilities are assigned to everything – that is, to frequencies and objective probabilities (if we think they exist), and not just to individual draws from urns. I believe this realization is what has made Bayesianism such a strong contender against frequentism in statistics departments recently, though I’m still a bit uncertain about that particular debate.