Discovery in Mathematics

26 10 2005

Just as I looked for Gettier cases in mathematics a while ago, it looks like Brian Weatherson has some good candidates for Gettier cases in philosophy. Except that he’s looking for them to show that to discover P is not the same as to be the first person to know P. He uses the example of the existence of necessary a posteriori propositions (which he says Aquinas had justified true belief in, while Kripke was the first to properly know) to show that we might come to know something with no one discovering it. Aquinas didn’t discover it because he didn’t know, and Kripke didn’t discover it because he was just properly justifying a result he already knew of. (Even if that’s not how it actually went, it’s how it could have gone, and the moral should be the same for the relation between discovery and knowledge.)

However, I’m not quite convinced that we should deny Aquinas the discovery just because he didn’t know P. I claim that being the first person whose justified true belief in P gives others a good reason to believe P is what it takes to discover P. If Aquinas’ belief was such, then he should be credited with the discovery, and otherwise it seems that Kripke should.

In the comments on Brian’s post, Stefan Ionescu cites Fermat’s Last Theorem as a mathematical example that I think follows Brian’s example fairly closely – FLT is conjectured, Shimura and Taniyama make a (seemingly unrelated) conjecture, Ken Ribet shows S-T implies FLT, and then Andrew Wiles proves S-T. Fermat never knew it to be true, so on an account where discovery implies knowledge, he can’t be the discoverer. Wiles certainly didn’t discover FLT, because the statement was well-known and already widely believed to be true – he just definitively proved something (also widely believed) that was already known to imply FLT. Ribet also seems to be a bad candidate for the discoverer – all he did was relate two already widely believed statements.

So did Fermat discover it? Or was no one the discoverer, because Fermat never knew it to be true? (I think I’ll leave aside the question of whether knowledge of mathematical claims like FLT can come from sources other than proof.)

It certainly seems clear that Fermat discovered something – whether that something is FLT, or just “FLT is a good approximation to the truth” may be controversial. But just as Columbus discovered (from a non-Viking European context) America without knowing that’s what it was, Fermat may have discovered his theorem without having known that it was true. One might say the same thing about Michelson and Morley being the discoverers of the lack of an ether (or maybe some other related fact about light), though they never came up with a theory that properly predicted or explained it. These examples (built on Brian’s model for a mathematical parallel of his example) suggest that discovery may not require knowledge, but merely some strong enough sort of justified true belief. Just what sort of justification is involved, who knows – having mistaken a statement of an open problem for a homework assignment doesn’t seem sufficient justification to count as the discoverer.

Perhaps one way to measure whether A’s justified true belief counts as a discovery is whether B (knowing the incompleteness of A’s justification) could consider A’s belief a reason to believe the proposition. Ribet, Wiles, and the others all knew that Fermat had very good insight into the nature of the natural numbers, so even though they were convinced he didn’t have a proof, they could use his belief as a decent reason to believe that FLT was true. I don’t know enough about Aquinas’ arguments to say whether his belief provided Kripke a reason to believe that there are necessary a posteriori truths, but if it did, then perhaps he could in fact be credited with the discovery. If not, then perhaps Kripke could be, because he didn’t have an antecedent reason to believe before working out his theory.

If this account is right, then perhaps every piece of knowledge does have a discoverer (except maybe the odd open problem solved by a computer science student thinking it’s a homework assignment – apparently, since George Dantzig got his start this way and then went on to discover the Simplex Algorithm for linear programming, computer science professors often put open problems in homework sets). Probably some details will need to be worked out (especially with the “reason to believe” bit), but this seems a bit more plausible to me than the line that discovery requires knowledge. We credit explorers with all sorts of geographic discoveries they don’t properly understand or know the significance of – we should probably do the same for intellectual explorers.

Explanatory and Unexplanatory Proofs

23 10 2005

I remember one theorem that I proved, and yet I really couldn’t see why it was true. It worried me for years and years… I kept worrying about it, and five or six years later I understood why it had to be true. Then I got an entirely different proof… Using quite different techniques, it was quite clear why it had to be true.

Michael Atiyah, in a 1984 interview with the Mathematical Intelligencer, quoted in Jamie Tappenden’s Proof Style and Understanding in Mathematics I

On the FOM e-mail list there have recently been a few discussions of the notion of explanation in mathematics by Allen Hazen, Richard Heck, and Richard Zach, which leave it unclear whether the topic is yet well-defined enough to work on. However, I think the best way to give it better definition is to gather more examples of it.

So if you, or any of your friends or colleagues, have good examples of proofs that are explanatory (or not), then send them to me at easwaran at berkeley dot edu. I suppose if it’s a very short example, or just a link to some example posted elsewhere, a comment would be good too, so that others can see it. Once I have a few of them, I’ll try to figure out some useful way to make these proofs accessible to others too, and credit the submitters.

Probably the most useful examples would be two proofs of the same result, one of which is clearly a better explanation than the other. For instance, Furstenberg’s topological proof of the infinitude of primes, despite being remarkably clever, is clearly not as good an explanation of this fact as Euclid’s original proof. Of course, plenty of good examples will probably be like whatever Atiyah was talking about above, and exist in contemporary research, rather than in well-established results. I don’t expect to necessarily understand the relevant proofs, but it’ll still be helpful to have a collection including them, both for other people’s use, and in case I want to check some general or structural relations between the proofs.

Partial Orders and Dominance Principles

22 10 2005

I suppose this is a more mathematical than philosophical post, but it’s related to issues that have come up in my thoughts on probability and decision theory.

When talking about orderings, we often use three connected relations, symbolized by the three symbols (<, ≤, ≈). The standard axioms are:

  • (Ir)reflexivity: it is not the case that x<x, and it is the case that x≤x and x≈x.
  • (Anti)symmetry: If x<y then it is not the case that y<x; if x≤y and y≤x then x≈y; if x≈y then y≈x.
  • Transitivity: If x<y and y<z then x<z; if x≤y and y≤z then x≤z; if x≈y and y≈z then x≈z.

We usually say in these conditions that < is a partial ordering of the strict type, ≤ is a partial ordering of the weak type, and ≈ is an equivalence relation.

When the relevant ordering involved is total (also called a linear ordering), we add the additional axiom:

  • Trichotomy: For any x and y, either x<y or y<x or x≈y; either x≤y or y≤x.

Of course, if we’re using all three symbols together, we normally want one more condition:

  • Coherence: If x<y is true, then x≤y is true and x≈y is false. If x≤y is true, then either x<y or x≈y is. If x≈y is true, then x<y is false and x≤y is true.

Given this coherence condition, we can reduce the number of axioms drastically by taking coherence to be a definition of one or two of the symbols in terms of the others, rather than an axiomatization of them. In the presence of trichotomy, it works just to lay down the axioms for ≤, and then define > as the negation of this relation, and ≈ as the symmetric part of it. However, without trichotomy, it becomes slightly tougher.

As a result, Jim Joyce uses both ≤ and < in defining the preference ranking of an agent in his book The Foundations of Causal Decision Theory, because he doesn’t want to presume the ordering is total, and thus he can’t define the strict relation as just the negation of the converse of the weak one. (Though he does define the equivalence relation as the intersection of the weak relation with its converse.)

However, it seems to me that a more natural way to define < and ≈ in terms of ≤ is to use antisymmetry of ≤ to define ≈, and then say that x<y holds just in case x≤y does and x≈y doesn’t. This latter definition is the same as saying that x<y just in case x≤y holds and y≤x doesn’t – in the case of a linear ordering we just needed to ask that y≤x doesn’t hold in order to get x<y, but here we also need to require x≤y. I believe there are a few results Joyce could have phrased more economically with this definition, instead of defining a ranking by using two independent relations.
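To make the difference concrete, here is a minimal sketch in Python. It uses divisibility on the positive integers as a stand-in for a weak partial ordering that is not total; that choice of ordering and all of the function names are my own assumptions for illustration, not anything taken from Joyce’s book.

```python
def leq(x, y):
    """Weak relation: x ≤ y iff x divides y (partial, not total)."""
    return y % x == 0

def equiv(x, y):
    """x ≈ y iff x ≤ y and y ≤ x (the symmetric part of ≤)."""
    return leq(x, y) and leq(y, x)

def strict(x, y):
    """x < y iff x ≤ y holds and x ≈ y does not."""
    return leq(x, y) and not equiv(x, y)

def strict_linear_style(x, y):
    """The shortcut that is only safe under trichotomy: x < y iff not y ≤ x."""
    return not leq(y, x)

# 2 and 6 are comparable, so both definitions agree.
print(strict(2, 6), strict_linear_style(2, 6))  # True True
# 2 and 3 are incomparable, so the shortcut wrongly reports 2 < 3.
print(strict(2, 3), strict_linear_style(2, 3))  # False True
```

On incomparable elements the negation-based definition makes each strictly less than the other, which violates the antisymmetry axiom for <. This is why the extra requirement that x≤y holds is needed once trichotomy is dropped.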

However, there might be reasons to keep both definitions separate. For instance, given a preference ranking on the potential outcomes of actions, there’s a natural way to get at least some rankings on the actions themselves. (I will let s denote a possible state of the world, f and g denote possible actions, and f(s) and g(s) denote the corresponding outcomes of these actions.)

  • Dominance: If for every state s we have f(s)<g(s) then we should have f<g; if for every state s we have f(s)≤g(s) then we should have f≤g; if for every state s we have f(s)≈g(s) then we should have f≈g.

However, as it stands, this dominance principle leaves some preference relations among actions underspecified. That is, if f and g are actions such that f strictly dominates g in some states, but they have the same (or equipreferable) outcomes in the others, then we know that f≥g, but we don’t know whether f>g or f≈g. So the axioms for a partial ordering on the outcomes, together with the dominance principle, don’t suffice to uniquely specify an induced partial ordering on the actions.

The natural solution to this situation would be to take one or another of the three dominance principles to be basic and to use the resulting defined relation to fully specify the other two by coherence. The most natural way to do this is to use ≤-dominance and ≈-dominance to define the two corresponding relations, and then say that f<g just in case ≤ holds and ≈ doesn’t. This corresponds to the common supposition that one action is strictly better than another if in some situations it is strictly better, and in no situations is it worse. One should be indifferent between them iff one is indifferent between their outcomes in every state.
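As a rough sketch of this proposal in Python: I assume here that actions are dictionaries from state names to numerical outcomes, that both actions are defined on the same states, and that outcomes are compared by the usual order on numbers (so ≈ on outcomes is just equality). The states, the numbers, and the function names are all hypothetical.

```python
def weakly_below(f, g):
    """f ≤ g by ≤-dominance: f's outcome is no better than g's in every state."""
    return all(f[s] <= g[s] for s in f)

def indifferent(f, g):
    """f ≈ g by ≈-dominance: the outcomes are equipreferable in every state."""
    return all(f[s] == g[s] for s in f)

def strictly_below(f, g):
    """f < g defined by coherence: f ≤ g holds and f ≈ g does not."""
    return weakly_below(f, g) and not indifferent(f, g)

f = {"rain": 1, "sun": 2}
g = {"rain": 1, "sun": 3}  # ties with f in rain, beats it in sun
h = {"rain": 2, "sun": 1}  # better than f in rain, worse in sun

print(strictly_below(f, g))                      # True
print(weakly_below(f, h), weakly_below(h, f))    # False False
```

The pair f and g is exactly the case the bare dominance principle left open, and the coherence definition settles it as f<g. The pair f and h stays incomparable, so the induced ordering on actions is still only partial.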

However, we normally feel that if two actions differ in outcome only on states with a total probability of zero, then the difference isn’t significant. This contradicts the idea that if the only difference is on a set of probability zero, and one action is strictly better than the other on that set, then the former action should be strictly preferred. But there are reasons to be indifferent in this case (not least of which is the fact that we can often permute the set of states in some seemingly unimportant way and transform one action into an action the recommendation of the previous paragraph would strictly prefer). As a result, it seems that we have to live with the incompleteness given by these dominance principles, at least until we come up with some other means of filling out the preference relation on actions (which is of course the goal of decision theory). Knowing about probabilities will certainly help here, but we may have to make these distinctions in circumstances where probabilities are difficult or impossible to come by.
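The tension can be seen in a small numerical sketch. I am assuming, purely for illustration, that outcomes have numerical utilities and that “the difference isn’t significant” can be read as the two actions having equal expected utility; the three states, their probabilities, and the utilities are all made up.

```python
# Hypothetical states; s3 has probability zero.
prob = {"s1": 0.5, "s2": 0.5, "s3": 0.0}

# The two actions differ only on the probability-zero state s3.
f = {"s1": 1.0, "s2": 2.0, "s3": 0.0}
g = {"s1": 1.0, "s2": 2.0, "s3": 5.0}

def expected_utility(action, prob):
    """Probability-weighted sum of the action's outcomes."""
    return sum(prob[s] * action[s] for s in prob)

def strictly_below(f, g):
    """f < g by the rule above: never better, and worse in at least one state."""
    return all(f[s] <= g[s] for s in f) and any(f[s] < g[s] for s in f)

print(expected_utility(f, prob) == expected_utility(g, prob))  # True
print(strictly_below(f, g))                                    # True
```

The probabilistic comparison recommends indifference while the dominance-based definition recommends strictly preferring g, so the two cannot both be adopted without qualification.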

No Gain?

5 10 2005

I’ve been reading Mike Resnik’s book Mathematics as a Science of Patterns and have found a lot of stuff I like in it. He makes the point that if we use the indispensability argument to show that mathematical entities exist, then they shouldn’t be that different from the entities postulated by theoretical physics. I didn’t know much about the particular examples he gives from physics, but I think I would take the argument a little bit in the other direction – if mathematical entities really are indispensable to physical theory, then we might as well take them to be concrete physical objects that just happen to lack a lot of the causal and spatiotemporal properties that other physical objects have.

In addition, the indispensability argument meets the epistemological challenge, because we get epistemic access to the objects by confirming the whole theory to which they are indispensable. I’m not sure if Hartry Field discusses this point much in Science Without Numbers, but Burgess and Rosen (in A Subject with No Object) seem to be puzzled by the fact that Field both gives an epistemological challenge and argues for the dispensability of mathematics. They think either one alone should be enough, if successful, and therefore the fact that he has to give both calls each into question. But if the point of the epistemological challenge is merely to show that there is no direct epistemic access to these objects, then we need to establish dispensability to show that the indispensability argument doesn’t give us indirect epistemic access. So both arguments are needed.

However, on page 109, Resnik criticizes Field, saying “whether space-time points are mathematical or physical, abstract or concrete, there will be no real gain in using them to dispense with (other) mathematical objects unless they are more epistemically accessible than the objects they replace.” This sort of objection is related to the ones that say that space-time points really are mathematical objects, and therefore Field hasn’t succeeded in nominalizing anything.

However, I think all of this misses an important point. Although Field says he’s a nominalist, it seems to me that a more important point is that he’s trying to give an internal explanation of everything in physics rather than an external one. (Resnik compares this to the contrast between synthetic geometry (where we only talk about points and lines and such) and analytic geometry (where we refer to real-number coordinates as well) and thus talks about “synthesizing” physics rather than “nominalizing” it.) Whether space-time points are concrete, physical objects or abstract, mathematical ones, and whether we have good epistemic access to them or not, they are somehow much more intrinsic to the physical system than real numbers seem to be. Real numbers are applied in measuring distances, calculating probabilities, stating temperatures, and many other things. These seem to be many different areas of the natural world, and using real numbers to explain all of them seems to involve some sort of “spooky action at a distance” as I discussed several months ago. Field’s reconstruction of Newtonian mechanics is certainly an advance on this front, whether or not it has any metaphysical, epistemological, or nominalistic gains. Thus, Resnik is wrong when he says there is no real gain in using space-time points instead of real numbers.

The (Un?)reasonable Effectiveness of Mathematics

1 10 2005

The phrase “the unreasonable effectiveness of mathematics” goes back to the title of an essay by the physicist Eugene Wigner in 1960. He points out that mathematics is developed largely on aesthetic grounds, and yet large parts of it eventually get co-opted by physics and the other natural sciences to formalize parts of their theories. There seems to be no reason to believe that mathematics (especially the limited fragment of mathematics that humans actually get around to developing) should have anything to do with the physical world. He then goes on to point out how surprising it should be that it’s even possible to formulate laws of physics in the first place, let alone that they should be mathematical. And he spends the last little bit of the essay discussing the conceivability both of finding a unified theory to which all our scientific theories are approximations, and of there being no such theory, which would leave us with multiple contradictory theories, each good for its own domain. The fact that we’ve managed to come so far seems to cry out for explanation.

Wigner seems to miss some aspects of the development of mathematics though. He suggests that mathematicians find something beautiful and develop it, but doesn’t point out that these theories are actually very often developed just to explain things in already-established areas of mathematics. For instance, complex numbers were developed to fill in the steps in the solution of certain cubic equations over the real numbers. At least some of the theory of groups was first developed specifically by Galois and Abel to show why there was no corresponding method for solving quintic equations. If all of mathematics were developed for motivations resembling these (as I think plausible), then once we realize that the very basic parts of mathematics are applicable, it may be no surprise that the rest of it is as well. If the natural numbers apply to some phenomenon, and some other theory was developed to explain the natural numbers, then it seems plausible that this theory would be applicable to the explanation of the phenomenon the natural numbers apply to.

Of course, this still leaves open the question of why so much mathematics seems to apply in contexts other than these. If group theory was developed to explain properties of real numbers and other fields, then why should it apply to the fundamental particles of physics in a context independent of any such field?

Greg Frost-Arnold has a fascinating post suggesting that in fact in pre-Galilean astronomy, the effectiveness of mathematics might not have seemed so unreasonable. After all, they believed then that the “heavenly bodies” had similar properties of permanence and perfection to the objects of mathematics. And if they were all created by the same God, then it would make sense that mathematics and astronomy had a lot of overlap. The effectiveness only started seeming really unreasonable when Newton showed that there were mathematical theories unifying earthly and astronomical motion.

At any rate, this contemporary effectiveness of mathematics, which seems so unreasonable, for some reason hasn’t been a very central question in the philosophy of mathematics. Instead, people have focused on more foundational questions about mathematics, like what the nature of mathematical truth is, and how it is that we have access to it. But I think Hartry Field’s program in Science Without Numbers gives the closest thing to an explanation for the effectiveness of mathematics. His main goal is to prove a certain claim about the ontology of mathematics (namely, that there is none), but I think it’s more successful as an extension of the methods of Krantz, Luce, Suppes, and Tversky in their Foundations of Measurement to explain how mathematics can be applied in a rigorous manner. He formulates the axioms of Newtonian mechanics in a way that the mathematics that is applied to it can be straightforwardly seen to be a conservative extension. Thus, he justifies this application.

Michael Dummett, in “What is Mathematics About?” criticizes this program, saying that “Field envisages the justification of his conservative extension thesis as being accomplished only piecemeal.” Dummett suggests that this would be unsatisfying, because it would never make mathematics completely justified, but only justify particular applications of particular theories. Whether or not he’s right that this is all that Field would accomplish (Field seems to claim to have shown that all of mathematics is conservative over any non-mathematical theory), I think that this is actually almost exactly the goal we should want to achieve. It wouldn’t do to suggest that any mathematical theory can be applied to any aspect of the world – there are only certain applications that make sense, and only those should be justified. We would still face some puzzles as to why it is that so much mathematics ends up applying to so much of the physical world, but at least each particular application would no longer seem so unreasonable.