Mathematics as a Natural Language

6 06 2007

My friend Luke Biewald pointed me to an interesting post suggesting that the language of mathematics is actually best viewed as a natural language rather than a formal language. I think some of the points the author makes about math don’t actually distinguish natural from formal languages (recursion, self-reference, an alphabet with rules for combination, and so on). And I think that since he works in natural language processing, he may be thinking of naturalness as a sort of simplicity, rather than as the sort of complexity I would take it to be. But it’s an interesting point.

Mathematicians basically never write fully formal proofs in the sense that logicians like to talk about. They regularly “abuse notation” and overload symbols in order to simplify their way of speaking. Many of these changes are in fact quite historically contingent – if we hadn’t originally started abbreviating things one way, further developments that way would have looked incomprehensible to a community that had abbreviated things differently.

Given that mathematical language is actually used by a relatively large community for certain types of communication essential to that community, it has most likely picked up a lot of the irregularities and “irrationalities” that plague natural languages – probably many more than constructed languages like Esperanto and Klingon have. I don’t know what all the relevant differences are between “real” natural languages, Klingon, and math, but they may help reveal something interesting about at least one of these languages.


Time Reversal and Conversations

29 03 2007

My boyfriend was just telling me about a recent cover story in New Scientist magazine, which had a provocative title about remembering the future, but apparently just said something about how the brain processes information about the future and about the past in similar ways. At any rate, it stimulated a conversation about what it would be like if there were beings that “remembered” the future the way that we remember the past. After trying to figure out whether this would necessarily involve them experiencing things in reverse order, I at least came up with an argument that we wouldn’t really be able to effectively communicate with these beings (at least given some aspects of certain theories of conversation). This is true even bracketing any concerns about strange thermodynamic properties of these systems, or concerns about being able to produce sensible utterances that individually have meaning to both these speakers and us.

On some standard models of conversation (with Stalnaker’s, I believe, as a paradigm of some sort), the state of a conversation can be represented by some set of information that is in the common ground, together with the separate information states of the two speakers. (Let’s assume there’s only two speakers for convenience.) Whenever someone utters something, this updates the common ground, and presumably also updates the other speaker’s beliefs.
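This standard picture can be sketched concretely. Treat worlds as simple labels, the common ground as the set of worlds compatible with everything mutually presupposed, and an accepted assertion as intersecting the common ground with the set of worlds where the asserted content is true. (This is only a toy sketch of the familiar Stalnakerian model, not anything from the post itself; the world and proposition names are invented.)

```python
# Toy Stalnakerian conversation model: the common ground is the set of
# worlds compatible with everything mutually presupposed, and accepting
# an assertion shrinks it by intersection.

def assert_content(common_ground, proposition):
    """Update the common ground with an accepted assertion."""
    return common_ground & proposition

# Four worlds, varying in whether it's raining and whether it's cold.
worlds = {"rain_cold", "rain_warm", "dry_cold", "dry_warm"}
raining = {"rain_cold", "rain_warm"}
cold = {"rain_cold", "dry_cold"}

cg = set(worlds)                      # nothing presupposed yet
cg = assert_content(cg, raining)      # "It's raining"
cg = assert_content(cg, cold)         # "It's cold"
print(cg)                             # {'rain_cold'}
```

The time-reversal worry is then that for the mismatched pair of speakers, no utterance can ever be entered into this shared set, so the intersection step never fires.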

However, if one conversational participant remembers only the past utterances in the conversation, while the other “remembers” only the future utterances, then you get a bit of a problem. It no longer makes sense to model the common ground as including all earlier utterances (and their logical consequences), for just the same reason that we don’t model it as containing all later utterances. So in some sense, the common ground shouldn’t contain any utterances at all from this conversation – it can only include the background presuppositions. So the conversational context never changes. But this seems to make individual utterances pointless then, if their point is to change the conversational context.

Of course, we might be able to get information from one another, but it seems like it wouldn’t work like any sort of conversation, or probably even through intentionally saying things to one another. It would have to just be through observation.

Another interesting side note we realized in discussing this is that there’s a sense in which intention, rather than prediction, is the natural future-directed counterpart of memory. After all, intentions are the mental states that have direct causal connections to future events, while predictions generally involve some sort of indirect connection running through past events. (Let’s leave aside self-fulfilling prophecies, like the ones the Oracle delivers in Greek myths.) Of course, a far higher proportion of intentions end up disconnected from the appropriate action than is the case for memories and the appropriate experience. That is, it’s much easier to have an intention that you don’t carry out than it is to have a false memory (though both are certainly possible).

Epistemic Modals and Modality

24 06 2006

On Thursday and Friday this week there was a conference on epistemic modality here at ANU – though more of it ended up being about the semantics of epistemic modal words. Unfortunately, John MacFarlane and Brian Weatherson couldn’t be here, so the conference was trimmed slightly.

On the first day, Frank Jackson wondered about how we should assign probabilities to possible worlds so that we don’t end up with metaphysical omniscience – his answer was basically that our credence in a sentence should be the sum of the probabilities of the worlds contained in its A-intension, rather than its C-intension (I hope I’m getting the terminology right). That is, rather than finding out what proposition the sentence expresses and summing the probabilities over the (centered) worlds where that proposition is true, we should figure out in which centered worlds the sentence expresses a proposition that is also true in that centered world, and sum over those. So for instance, although “water is H2O” actually expresses a necessarily true proposition, there are worlds (ones like Putnam’s twin earth) in which “water” refers to a different substance, so the proposition expressed at that world ends up being false at that world. Since there might be worlds that we can’t tell apart from the actual one, it makes sense that those would play a role in our probability assignments. (My concern about the whole framework is a worry about why we should be assigning probabilities to sets of worlds in the first place, rather than to something more epistemically accessible – in which case the problem doesn’t arise.)
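The summing idea can be made concrete with a toy calculation. Each centered world below records a probability and what the word “water” picks out there; “water is H2O” counts as true at a world, under that world’s own interpretation, just in case the stuff “water” denotes there is H2O. (The worlds and numbers are invented for illustration; this is only a cartoon of the two-dimensional framework, under my possibly garbled memory of the terminology.)

```python
# Toy credence calculation in the two-dimensional framework: sum the
# probabilities of the centered worlds where the sentence, interpreted
# *at that world*, expresses a truth at that world (the A-intension),
# versus summing over the proposition actually expressed (C-intension).

worlds = [
    # (probability, what "water" denotes at that centered world)
    (0.75, "H2O"),   # roughly the actual world
    (0.25, "XYZ"),   # a Twin-Earth-like world
]

def credence_a_intension(worlds):
    # "Water is H2O" is true at w under w's own interpretation iff
    # "water" denotes H2O there.
    return sum(p for (p, denotation) in worlds if denotation == "H2O")

def credence_c_intension(worlds):
    # The proposition actually expressed is necessarily true, so this
    # route yields the metaphysical omniscience Jackson wants to avoid.
    return sum(p for (p, _) in worlds)

print(credence_a_intension(worlds))   # 0.75
print(credence_c_intension(worlds))   # 1.0
```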

Andy Egan spoke about his version of relativist semantics for epistemic modals. Since I’ve mainly only been exposed to John MacFarlane’s version, I was quite interested in this. It does seem to me that the frameworks could be intertranslatable, if done carefully. John does this approximately by saying that the proposition expressed by uttering “might P” at CU (the context of utterance) is true when assessed at CA (the context of assessment) iff the proposition expressed by P (at CU?) is compatible with the knowledge of the agent at CA. Andy does it approximately by saying that the proposition expressed by uttering “might P” is a set of centered worlds including all centers whose knowledge is compatible with P. So roughly, John seems to view the proposition as a function, while Andy views it as a set. However, their main differences are in the norms for asserting and denying such statements, which I don’t think I understand fully enough in either case – but they’ll have to give some convincing story in order to be able to say that propositions aren’t just true or false simpliciter.

Matt Weiner picked up where Andy Egan left off and gave an interesting argument for why we might have relativist semantics for epistemic modals, rather than contextualist semantics, like we do for “I”, “here”, and other similar terms. Basically, the idea is that we’ve got a conversational norm that one shouldn’t let a proposition one judges false remain unchallenged. Then relativist semantics makes it easy for people to assert modals to share their ignorance, and requires others to share their knowledge to fix this state. So this semantics makes them good tools for joint inquiry.

Seth Yalcin started the next morning by pointing out that epistemic modals lead to a certain type of Moore’s paradox. The ordinary paradox is approximately “It’s raining, but I don’t know that it’s raining”, which is certainly a very bad thing for anyone to ever assert, but perfectly reasonable to suppose or to embed in the antecedent of a conditional. Depending on your account of epistemic modals, this should mean approximately the same as “It’s raining, but it might not be raining” – which is just as bad to assert, but, interestingly, is about equally bad to suppose or embed in a conditional. (Compare “If it’s raining, but I don’t know that it’s raining, then I must be confused”, which is fine, with “If it’s raining, but it might not be raining, then (I must be confused/it’s raining/etc.)”, which sounds defective.) Then he attempted to explain this by giving a very interesting semantics involving sets of probability functions over sets of worlds, rather than just sets of worlds. I’m very interested in looking more at the details of that as he works it out.
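One simple way to see why the epistemic version resists even supposition is a cartoon of this kind of semantics, using plain sets of worlds rather than Yalcin’s sets of probability functions: sentences are checked against an information state, where a state supports a bare claim iff the claim holds throughout it, and supports “might P” iff P holds somewhere in it. A brute-force check confirms that no nonempty state supports “P, but it might not be that P”. (The worlds and the `rain` proposition are invented; this is my simplification, not his actual proposal.)

```python
# A cartoon "domain"-style semantics: a state supports a bare claim iff
# the claim holds throughout the state, and supports "might P" iff P
# holds at some world in the state.

from itertools import combinations

def supports(state, prop):
    return state <= prop              # prop true at every world in state

def supports_might(state, prop):
    return bool(state & prop)         # prop true at some world in state

def supports_moore(state, prop, all_worlds):
    # "P, but it might not be that P"
    return supports(state, prop) and supports_might(state, all_worlds - prop)

all_worlds = {"w1", "w2", "w3"}
rain = {"w1", "w2"}

# Check every nonempty information state: none supports the Moore sentence.
states = [set(c) for n in range(1, 4) for c in combinations(all_worlds, n)]
print(any(supports_moore(s, rain, all_worlds) for s in states))   # False
```

The contradiction is structural: supporting P requires the state to sit inside P, while supporting “might not-P” requires it to overlap P’s complement, and no nonempty set does both.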

Jonathan Schaffer then argued for the KGB account of modals over the CIA account. (That is, the contextualist view he called “Kratzer’s Graded Basis” over the relativists’ “Contexts and Indices of Assessment”.) He disputed the accuracy of a lot of the CIA data, showed that the KGB deals better with modals of all sorts (with epistemic modals just a special case), and showed that some of the propaganda pushed by CIA agents is predicted by the KGB, so it shouldn’t mislead us.

Finally, Dave Chalmers returned to the notion of epistemic modality, after four papers on semantics, and disputed Frank Jackson’s idea of doing epistemic modality in terms of worlds. Instead, he suggested that there should be some space of epistemic possibilities, which are effectively something like maximal consistent conjunctions (or perhaps sets, to deal with infinities?) of sentences. However, the sentences should be phrased in some basic vocabulary that is sufficient to deal with all concepts whatsoever, and “consistent” means “not knowable to be false by any a priori means”. Thus, he’s allowing much stronger reasoning principles than just first-order logic, because he thinks that all mathematical claims (for instance) can be settled by a priori reasoning (which I guess must therefore be much stronger than Turing complete). Also, he’d like to modify this view to deal with epistemically non-ideal agents.

Anyway, it was quite an interesting conference, I learned a lot, and I’m very interested in seeing how these projects continue to develop!

Gricean Silences

29 05 2006

This post came out of a conversation I had with Andy Egan and Gillian Russell the other day, and then similar topics came up in Ryan Muldoon’s comments on Ed Epsen’s paper the next day at FEW. I don’t remember exactly how the topic came up, but we were trying to figure out whether silences can carry implicatures, or more ordinarily, whether you can say something without words. And of course the answer is yes:

Q: “What do you like about John?”
A: [silence]

This one works almost exactly like Grice’s recommendation letter example in “Logic and Conversation”. (By the way, anyone who hasn’t read this paper really should. I think the notion of implicature is one of the most significant advances philosophy has made in our understanding of the world. The paper is widely reprinted, but unfortunately not available online – there’s a two-page excerpt here.) Based on the conversational context, A is expected to make a contribution to the conversation mentioning some positive fact about John. A’s silence violates the maxim of quantity (she hasn’t said as much as is expected), so Q can infer that some other conversational principle (one requiring politeness) must conflict with anything that A would be in a position to say. Therefore, Q comes to believe that there is nothing (or at least nothing relevant) about John that A likes.

But then I realized that we should think about this (and perhaps the original recommendation letter example) a bit more carefully. It seems that the story given above could work in at least two different ways. In one case, A is struggling for an answer, and the silence just comes about because she can’t think of anything she likes about John. In the second case, A knows there is nothing she likes about John and remains actively silent. I think the second case is an example of an implicature carried by a silence, but the first is not.

The explanation of the distinction comes from Grice’s earlier classic paper, “Meaning”, in which he suggests that speaker A means y by utterance x iff “A intended the utterance of x to produce some effect [y] in an audience by means of recognition of this intention.” He comes to this recursive-intention account of meaning by way of a bunch of examples, which I think parallel the situation here. If I don’t intend you to believe (or consider) something by means of my action, then I didn’t mean it, even if it’s true. Thus, my silence can reveal my dislike for John, just as an accidentally dropped photograph can reveal where I was the other day, but neither means it. But even performing the action intentionally isn’t enough – Grice suggests that showing someone a photograph doesn’t constitute meaning what is depicted, because my intention plays only an inessential role in the observer’s coming to believe the truth of what is depicted.

However, showing someone a drawing can constitute a meaning, since the person has to recognize the intention of the person who made the drawing in order to come to believe in the truth of what is depicted (assuming that this was in fact the intent of the drawer). One reason for this distinction that Grice doesn’t discuss in that paper is that only a recursive intention like this can help the speaker and hearer achieve “common knowledge” of the content of the proposition. (That is, not only do you know p, but I know you know p, and you know I know you know p, and …) If the utterance succeeds, then the listener believes that p. But in addition, the listener believes that the utterer intended her to believe that p, so the listener can believe that the utterer now believes that the listener believes that p. But the listener believes that the utterer intended her to believe this too, and the cycle can repeat, generating common knowledge. Common knowledge is important in a lot of acts of communication, and a simple, non-recursive, intention can’t generate it.
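The ratchet in that paragraph can be spelled out mechanically: starting from a successful recursive-intention utterance, each round lets one party embed the other’s last belief state one level deeper, so the nesting of “believes” grows without bound. (A toy string-generating sketch of the ladder, nothing more.)

```python
# Toy sketch of the common-knowledge ladder that a recognized recursive
# intention licenses: each round, one party embeds the other's last
# belief state one level deeper.

def iterate_belief(base, rounds):
    """Generate: p, the listener believes p, the utterer believes the
    listener believes p, and so on."""
    ladder = [base]
    agents = ["the listener believes that", "the utterer believes that"]
    for i in range(rounds):
        ladder.append(agents[i % 2] + " " + ladder[-1])
    return ladder

for line in iterate_belief("it is raining", 3):
    print(line)
# it is raining
# the listener believes that it is raining
# the utterer believes that the listener believes that it is raining
# the listener believes that the utterer believes that the listener believes that it is raining
```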

So for a silence to really implicate something (in Grice’s “speaker meaning” sense), it has to be given intentionally, its meaning must be intended as well, and so on. This makes it much harder than I initially thought for a silence to mean something, but a slight re-specification of the original example can fix it.

(Ed’s paper was on zero-knowledge signalling in games of imperfect information, and Ryan pointed out that in Ed’s particular example, one player comes to know that the other player knows some fact, but not because the other player intended her to come to know this. However, it seems that a slightly modified version of the example will put the Gricean condition back in. It was quite an interesting application of the zero-knowledge proof literature in computer science to game theory.)

Thoughts, Words, Objects

17 04 2006

I just got back yesterday from the University of Texas Graduate Philosophy Conference, which was a lot of fun. In fact, I think it was the most fun I’ve had at a conference other than FEW last year, which coincidentally was also in Austin – maybe it’s just a fun town! At any rate, there were a lot of very good papers, and I got a lot of good ideas from the discussion after each one as well. Even the papers about mind-body supervenience and Aristotelian substance (which aren’t issues I’m normally terribly interested in) made important use of logical and mathematical arguments that kept me interested. And the fact that both keynote speakers and several Texas faculty were sitting in on most of the sessions helped foster a very collegial mood. I’d like to thank the organizers for putting on such a good show, and especially Tim Pickavance for making everything run so smoothly, and Aidan McGlynn for being a good commentator on my paper (and distracting Jason Stanley from responding to my criticism!).

Because the theme was “thoughts, words, objects”, most of the papers were about language and metaphysics, and perhaps about their relation. There seems to be a methodological stance expressed in some of the papers, with some degree of explicitness in the talks by Jason Stanley and Josh Dever, that I generally find quite congenial but others might find a bit out there. I’ll just state my version of what’s going on, because I’m sure there are disagreements about the right way to phrase this, and I certainly don’t pretend to be stating how anyone else thinks of what’s going on.

But basically, when Quine brought back metaphysics, it was with the understanding that it wouldn’t be the free-floating discipline it had become with some of the excesses of 19th century philosophy and medieval theology – no counting angels on the head of a pin. Instead, we should work in conjunction with the sciences to establish our best theory of the world, accounting for our experiences and the like and giving good explanations of them. And at the end, if our theories are expressed in the right language (first-order logic), we can just read our ontological commitments off of this theory, and that’s the way we get our best theory of metaphysics. There is no uniquely metaphysical way of figuring out about things, apart from the general scientific and philosophical project of understanding the world.

More recently, it’s become clear that much of our work just won’t come already phrased in first-order logic, so the commitments might not be transparent on the surface. However, the growth of formal semantics since Montague (building on the bits that Frege and Tarski had already put together) led linguists and philosophers to develop much more sophisticated accounts that can give the apparent truth-conditions for sentences of languages more complicated than first-order logic – like, say, ordinary English. Armed with these truth-conditions from the project of semantics, and the truth-values for these statements that we gather from the general scientific project, we can then figure out what our metaphysical commitments are.

Of course, science is often done in much more regimented and formalized languages, so that less semantic work needs to be done to find its commitments, which is why Quine didn’t necessarily see the need to do formal semantics. Not to mention that no one else had really done anything like modern formal semantics for more than a very weak fragment of any natural language, so that the very idea of such a project might well have been foreign to what he was proposing in “On What There Is”.

In addition to the obvious worry that this seems to do so much metaphysical work with so little metaphysical effort, there are more Quinean worries one might have about this project. For one thing, it seems odd that formal semantics, alone among the sciences (or “sciences”, depending on how one sees things) gets a special role to play. On a properly Quinean view there should be no such clear seams. And I wonder if the two projects can really be separated in such a clear way – it seems very plausible to me that what one wants to say about semantics might well be related to what one wants to say about ordinary object language facts of the matter, especially in disciplines like psychology, mathematics, and epistemology.

In discussion this afternoon, John MacFarlane pointed out to me that this sort of project has clear antecedents in Davidson, who talked about the logical structure of action sentences and introduced the semantic tool of quantifying over events. This surprises me, because I always think of Davidson as doing semantics backwards from how I want to do it, but maybe I’ve been reading him wrong.

At any rate, thinking about things this way definitely renews my interest in the problems around various forms of context-sensitivity. The excellent comments by Julie Hunter on Elia Zardini’s paper helped clarify what some of the issues around MacFarlane-style relativism really are. Jason Stanley had been trying to convince me of some problems that might make it fail to make sense, but she expressed them in a way that I could understand, though I still can’t adequately convey them. It seems to be something about not being able to make proper sense of “truth” as a relation rather than a predicate, except within a formal system. This is why it seems that MacFarlane has emphasized the role of norms of assertion and retraction rather than mere truth-conditions, and why he started talking about “accuracy” rather than “truth” in the session in Portland.

Anyway, lots of interesting stuff! Back to regularly-scheduled content about mathematics and probability shortly…

APA Blogging: Relativism

31 03 2006

(Anyone who is not familiar with contextualist and relativist semantics might want to read the last three paragraphs here first.)

The last day of the APA saw an interesting session on relativism by Bob Stalnaker, John MacFarlane, and John Hawthorne. I had unfortunately slept through the morning session on relativism with Michael Glanzberg, Thony Gillies, and Andy Egan. And since it was at the end of the conference, I wasn’t at peak form in catching what was said. But what I got out of it was that Stalnaker gave a good summary of the New Relativist position (together with some challenges about “ambivalence” towards propositions that I didn’t quite catch), MacFarlane gave the start of an account of how it is that an assertion of P by one person can disagree with a denial of P by another person if the relevant context of assessment has changed between people, and Hawthorne gave some arguments to challenge standard intuitions that are taken to motivate the relativist move.

When John says, “rotting flesh is not tasty”, I will agree with him. But it seems that I will disagree with Vinny the Talking Vulture who says “rotting flesh is tasty”. This would be an argument against standard (indexical) contextualism – an indexical account would say there is no more disagreement here than there is when I say “My name is Kenny” and you say “My name is not Kenny”. But Hawthorne suggested that this intuition just doesn’t exist here, so there is no need to reject contextualism for something more radical. Because John (having seen some roadkill a little bit back) can then go to Vinny and say “there’s something tasty down the road” – and it seems both that he is agreeing with Vinny, and that he hasn’t changed his mind from earlier, so his original statement and Vinny’s original statement aren’t really contraries, as the relativist (but not the contextualist) would suggest. I think this example is a bit beside the point, because it seems to me that there’s some odd pragmatic move going on when John says “there’s something tasty down the road”. (I think John MacFarlane pointed this out best by asking what John Hawthorne would say if the vulture said “Aha! So you agree with me now! Rotting flesh is tasty!”) But there may well be challenges in the vicinity.

I think it was a bit unfortunate that so much of the discussion was focused on predicates of personal taste (like “fun”, “funny”, “tasty”, and the like), because it’s generally seemed to me that the relativist’s best case is for epistemic modals (“might” in the sense that’s roughly similar to “as far as I know”), though Branden Fitelson might have convinced me that future contingent statements (“I will go shopping tomorrow”, when nothing about the world guarantees that this either will or won’t be the case) are a better case. All the relativist needs to do is point out that at least some area of discourse is best modeled with their semantics.

In general, it seems that there is a family of problems and a family of solutions available for them. The problem cases include future contingents, epistemic modals, predicates of personal taste, gradable adjectives (like “tall” and “flat” – which Gil Harman pointed out may well fall into two categories depending on whether or not they have an absolute at one end), knowledge, indexicals (“I”, “here”, “tomorrow”), demonstratives (“this”, “that”), and probably others. The solutions that have been proposed include saying that the effect is (1) merely pragmatic, (2) subject-sensitive invariantism (or something similar), (3) indexical contextualism, (4) non-indexical contextualism, and (5) relativism.

(1) attempts to explain away the data by appeal to non-literal speech or conversational practices. (2) says the relation discussed is more complicated than it seems, but ultimately only depends on the status of the situation being talked about, not the circumstance it is mentioned in. The other three suggest that some additional sort of contextually supplied parameter is important in the assessment of sentences involving the relevant concept. The question is just whether the parameter is necessary to find out what proposition is expressed, as in (3), or to find out whether or not the proposition is true, as in (4) and (5). The distinction between (4) and (5) is whether the parameter is taken from the circumstance of utterance or the circumstance of assessment. Since an utterance typically expresses a unique proposition, any parameter needed to find out what that proposition is will need to come from the circumstance of utterance. But when assessing an utterance for truth, we have an extra circumstance available to us, so both (4) and (5) are theoretically options (despite the problems one might find with (5)).

I think we have fairly decisive evidence that the best way to treat indexicals and demonstratives is with (3), and people have generally agreed that a proposition gives not just an individual truth-value, but a function from worlds to truth-values, so that the world of utterance must show up in the parameter needed for (4). Perhaps this means that non-indexical contextualism should be called bicontextualism, and relativism should actually be called tricontextualism! However, it’s a somewhat open question whether anything other than person, time, location, and world appear in (3), anything other than world appears in (4), and whether anything at all shows up in the parameter for (5). A debate about one specific parameter for predicates of personal taste will be relevant to whether anything shows up in (5), but it is by no means decisive about relativism.
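The three parameter placements can be lined up in a toy evaluation procedure: an indexical-contextualist treatment reads the taste parameter off the context of utterance when fixing the content; non-indexical contextualism fixes one taste-neutral content but evaluates it at the circumstance of utterance; relativism evaluates that same content at the circumstance of assessment. (The `tasty_for` table and the contexts are invented for illustration, and contexts are flattened to a single taste parameter.)

```python
# Where does the taste parameter enter?  Three toy treatments of
# "rotting flesh is tasty", differing only in which context supplies
# the standard of taste.

tasty_for = {"vulture": True, "human": False}   # invented taste facts

def indexical_contextualism(ctx_utterance, ctx_assessment):
    # (3): the standard enters when fixing *which proposition* is
    # expressed; the context of assessment is irrelevant.
    return tasty_for[ctx_utterance]

def nonindexical_contextualism(ctx_utterance, ctx_assessment):
    # (4): one taste-neutral content, evaluated at the circumstance of
    # utterance.
    return tasty_for[ctx_utterance]

def relativism(ctx_utterance, ctx_assessment):
    # (5): the same taste-neutral content, evaluated at the
    # circumstance of assessment.
    return tasty_for[ctx_assessment]

# Vinny the vulture utters "rotting flesh is tasty"; a human assesses it.
print(indexical_contextualism("vulture", "human"))   # True
print(relativism("vulture", "human"))                # False
```

In this flattened sketch (3) and (4) deliver the same verdicts; the real difference between them is which proposition gets expressed, which matters for agreement, disagreement, and retraction rather than for these single evaluations.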

APA Blogging: Jason Stanley, Knowledge and Practical Interests

27 03 2006

I should start this post by pointing out that I haven’t read Stanley’s book, I didn’t have anything to take notes with at the session, and all I know about subject-centered invariantism (which I believe is Stanley’s position on knowledge) is what I learned in John MacFarlane’s seminar last year, and occasional discussions with other people. But even given all that, it was quite an interesting session – and other people seem to agree on that, given that it seemed to have the largest audience. (I was sitting on the floor most of the time, with about a dozen other people – there were two other sessions with similar-sized audiences in even smaller rooms, so they were even more crowded.)

Stanley’s position is “invariantism” in that he denies that the situation of the asserter plays any important role in the proposition expressed by “A knows that p” or its truth value (unlike, say, Keith DeRose and others, some of whom suggest that the salience of relevant alternatives in the context of utterance can make “Jane knows that she has hands” go from true (in most contexts) to false (if one has just considered that she has no way to rule out being a brain in a vat)). However, it is “subject-centered”, because having made exactly the same observations, A can know that p while B doesn’t, if something of extreme urgency for B (but not for A) depends on whether p. (Stanley pointed out that there is a connection between “evidence” and “knowledge”, so we have to talk about something more like observations than evidence.)

I don’t remember too many details of the session (though I know I’d like to get a copy of Stephen Schiffer’s handout, since he made a bunch of very interesting points, and seemed to have handed out something like a full transcript of his remarks), but there was one interesting objection raised in the question period by Ryan Wasserman. Jason Stanley had already bitten the bullet and agreed that if John and Jane are on the same airplane, and have the same information about windspeeds and departure time and the like, but Jane has an important talk to give very soon after arrival, then John can know that the plane will arrive on time even though Jane doesn’t. This seems somewhat odd, and Ryan Wasserman pushed it further by pointing out that on Stanley’s position, this can even be possible if Jane has gathered more information about the weather, history of the airline’s performance, and the like.

After Jason Stanley’s response, Delia Graff (who was chairing the session) tried to defuse the worry that the theory makes such predictions by pointing out a related prediction of a related theory. It seems perfectly fine for her to say that her 9-year-old cousin is really really good at playing basketball, and that Ryan Wasserman plays basketball even better than her cousin, but that Ryan Wasserman is not very good at basketball (despite being better than someone that’s really really good at it). The fact that this prediction is perfectly fine for subject-centered invariantism about “good at basketball” (rather than “knows”) seems to support subject-centered invariantism.

At first I agreed with her, but now I think that this piece of evidence actually counts against Stanley’s theory. I think (correct me if I’m wrong) the intuitions suggest that the basketball case sounds much better than the knowledge case. If subject-centered invariantism about both predicates predicts that both should be acceptable, then this suggests that something like subject-centered invariantism may well be true for “good at basketball”, but probably isn’t true about “knows”. If it did apply to both, then in addition to explaining away the intuition in the case of “knows”, Stanley would have to explain why the intuition reappears for the basketball case. Now, it’s possible that such an explanation will emerge (as it will have to if one thinks that subject-centered invariantism is wrong for both predicates, as I think that I do), but it’s quite a convoluted way to get at the data from Stanley’s theory, and starts to disconnect the theory somewhat from the evidence.

Anyway, it’s interesting stuff.

The Vastness of Natural Languages

27 02 2006

I was browsing in a bookstore today and saw a copy of The Vastness of Natural Languages, by Langendoen and Postal (1984), which had been mentioned to me last semester by my semantics professor as a controversial attempt to show natural languages are infinite. I had wondered then why that claim was controversial, but on looking at the book, I can see why – they claim that any actual natural language has not just countably many sentences, but actually a proper class! That is, there are more sentences of English than any finite or transfinite cardinality! I only read the chapter arguing for this conclusion, and skimmed a bit of the chapter discussing some of the implications. But I think that their arguments are possibly somewhat careless (for instance, they would prove too much if applied to domains other than natural language), and that even if their conclusion is correct, it might not be relevant. But it makes some interesting connections between the traditional philosophy of mathematics and linguistics.

They first look at the traditional arguments that there is no finite upper bound on the length of sentences in any natural language. They say that the standard argument (which they attribute to Chomsky, among others, and to some earlier work by Postal) is just unsound. The argument says that if S is any grammatical sentence of English, then “I know that S” is a grammatical sentence. Similar versions say that a word like “very” can obviously be repeated any number of times, or that two sentences can always be conjoined to produce a longer one. However, they point out (I think rightly) that the argument is question-begging against someone who presupposes that there is some finite limit to the length of sentences in a language – we say the resulting longer sentences are grammatical because they are constructed by grammaticality-preserving rules. But the evidence that these rules are grammaticality-preserving should be based on judgements of grammaticality of individual sentences. So we would have to recognize these extremely long sentences as grammatical before accepting the rules as grammaticality-preserving. But the opponent has already claimed that such sentences are too long to be grammatical, so by the opponent’s lights the rules are not grammaticality-preserving at all, even though they appear to be (and are, for shorter sentences). (They also point out the analogy between this argument and Dedekind’s argument for the existence of infinitely many concepts, or sets, or objects of thought, where he assumes that for any thought T, there is a distinct thought, “T is a thought”. They point out that Dedekind’s set is exactly the sort that gets one into the semantic paradoxes.)
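The rules the standard argument relies on are easy to make concrete as string operations. Here is a minimal sketch (the function names and the sample sentence are my own illustrative choices, not from the book): each rule takes a sentence already judged grammatical and returns a strictly longer one, so iterating either rule yields sentences of any desired length.

```python
# Two of the standard argument's grammaticality-preserving rules,
# written as string operations. (Illustrative names and sample
# sentence only; not from Langendoen and Postal.)

def embed(s):
    """If S is grammatical, so (allegedly) is 'I know that S'."""
    return "I know that " + s

def intensify(n):
    """'very' can apparently be repeated any number of times."""
    return "that is a " + "very " * n + "long sentence"

s = "snow is white"
for _ in range(3):
    s = embed(s)
print(s)  # I know that I know that I know that snow is white
```

The point of the dialectic, of course, is that the opponent simply denies that rules like `embed` preserve grammaticality once their output exceeds the supposed bound.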

They also point out what they say is the only non-question-begging argument that natural languages have no finite bound on the length of sentences, which they attribute to Jerrold Katz. This argument depends more specifically on some of the methodology of linguistics. When analyzing the grammatical structure of some language L, we always work with some (necessarily finite) set of sentences that have been judged to be grammatical by a native speaker, which they call “IB(L)”, as an inductive basis. We attempt to find the rules of L by observing regularities in IB(L), but we don’t want to generalize accidental features of IB(L). For instance, we generally don’t want to say that to be grammatical in L just is to be a member of IB(L) – that would be too accidental and tied to the particular set of data we gathered. Similarly, since the set of sentences is finite, there will in fact be some greatest length k of any sentence of IB(L). But to say that every sentence of L must be at most as long as k would also seem to tie features of L too closely to accidental features of IB(L). And to choose any other length as the upper bound would be arbitrary – there is no clear reason to choose one natural number over another as the bound. So there should be no rule of the grammar of L directly specifying the maximum length of a sentence of L.

One might try to suggest that people will never be able to remember or understand a sentence that is a million words in length. However, this ties the language too much to contingent “performance limitations” of speakers. I’m sure there’s a decent amount of controversy around the relevance of performance limitations, but I’m willing to concede to them the claim that to get a good theory of a natural language, one should abstract away from the performance capacities of its actual speakers.

They point out that nothing in this argument depends on the putative size limitation of sentences being finite. Thus, the only good argument showing that there is no in-principle limit on the length of sentences of English also shows (according to them) that there are sentences of English of arbitrary transfinite length, as well as arbitrary finite length. Since there is then a proper class of possible lengths, English must contain a proper class of grammatically well-formed sentences, as they claim.

I think their argument moves too fast. Just because we don’t want to generalize a size limitation that we find in some particular inductive basis IB(L) doesn’t mean that we can’t get any principled size limitation. For instance, we might discover that all the other syntactic rules we extract from the data jointly constrain the possible lengths of sentences. In a propositional logic with only binary connectives, for example, we can see that every sentence of the language must have an odd number of symbols in it – even if we don’t make this generalization from the observation of some IB(L) where every sentence has an odd number of symbols, we will still arrive at it by noticing that the only well-formed sentences are either single-symbol sentences, or of the form (A^B), where A and B are well-formed, or similarly for other connectives. (In fact, these rules generate the stronger constraint that every sentence has a length that is 1 mod 4.) If a language is complicated enough, the rules might well naturally generate some constraint ensuring that sentences can never be more than k symbols long – even if k is not itself the upper bound of the lengths of strings in IB(L).
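The 1-mod-4 claim for this toy propositional language can be checked directly. A small sketch (the atomic symbols and the nesting depth are my own choices for illustration):

```python
from itertools import product

def generate(depth):
    """All sentences built from the atoms by the rule A, B -> (A^B),
    up to `depth` rounds of combination."""
    sentences = {"p", "q"}  # atomic sentences have length 1
    for _ in range(depth):
        sentences |= {f"({a}^{b})" for a, b in product(sentences, repeat=2)}
    return sentences

# Every well-formed sentence has length congruent to 1 mod 4:
# atoms have length 1, and forming (A^B) adds 3 to len(A) + len(B),
# so 1 + 1 + 3 = 5, and the property is preserved inductively.
assert all(len(s) % 4 == 1 for s in generate(3))
```

The constraint comes out of the formation rules alone, not out of any length bound observed in a sample of sentences, which is exactly the kind of principled limitation the argument above envisions.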

Of course, it doesn’t seem that the syntactic rules of English generate such a constraint. (We really do seem to be able to iterate “very” as many times as we want.) But another non-arbitrary constraint may arise. If we observe the frequency of sentences of various lengths in our IB(L), we might discover that sentences of length k occur in proportion to the square root of 1,000,000 − k^2. Then, even if the inductive basis happens not to contain any sentences of length greater than 900, we might suppose that 1000 is the maximal length of any sentence of the language. Perhaps more naturally, we might notice that sentences of length k occur in proportion to 2^(-k), which would lead us not to put any finite bound on the length of sentences. However, this would give us good reason to suppose that there are no sentences of any infinite length, since their occurrence would be proportional to 0.
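The two hypothetical distributions are easy to make concrete (the numbers are the post’s own illustrative choices; the function names are mine):

```python
import math

def circular_freq(k):
    """Frequency proportional to sqrt(1,000,000 - k^2):
    positive for lengths up to 1000, zero beyond."""
    return math.sqrt(max(0, 1_000_000 - k ** 2))

def geometric_freq(k):
    """Frequency proportional to 2^(-k): positive at every
    finite length, but tending to 0."""
    return 2.0 ** -k

assert circular_freq(900) > 0    # lengths actually seen in IB(L)
assert circular_freq(1001) == 0  # inferred hard cutoff at 1000
assert geometric_freq(100) > 0   # no finite cutoff at all
```

On the first hypothesis the inferred bound (1000) outruns the longest observed sentence (900); on the second, no finite length gets frequency zero, but any infinite length would.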

Of course, this presupposes that IB(L) was chosen through some means of representative sampling, which it generally isn’t. (It’s generally invented entirely by the linguist, and then confirmed by a native speaker as containing only grammatical sentences. And actually, there’s normally a set, say IB*(L) of sentences judged ungrammatical by a native speaker, to constrain the theory from the other side. The fact that both sets are likely to be bounded by the same length k makes it even more clear that we shouldn’t just judge that anything beyond length k is ungrammatical.) And it gives more weight to the performance limitations of speakers than Postal and Langendoen might want.

I was struck in the course of reading their chapter that many of the arguments for a finite bound on the length of sentences in a natural language are like the arguments given by ultrafinitists for an upper bound on the size of natural numbers that exist. (For instance, “no one could conceive of counting a number as large as 2^(1,000,000)”, or concerns based on the total number of particles in the universe.) But if Langendoen and Postal are right, then I think their arguments should carry over to the case of natural numbers, which would seem to show that there is no (finite or transfinite) upper bound to the size of natural numbers!

Fine, let’s concede this and just say that all of Cantor’s ordinals are themselves “natural numbers” in the sense that all these transfinitely long strings are in fact “grammatical sentences of English”. There are still many important situations in which people want to work with just the standard natural numbers – and I think it’s even clearer that there are important situations in which people only care about the “finitary fragment” of these extremely vast “natural languages” that Postal and Langendoen want to talk about. One of the important consequences they list for their theory is that every traditional account of syntax is wrong, because such accounts all insist that the set of grammatical sentences is recursively enumerable, and thus countable. Yet if these theories aim not to account for the full “natural language” as Postal and Langendoen conceive of it, but merely for the finitary fragment, then I don’t see how they can criticize these theories. Perhaps they’ll argue that such a restricted theory won’t be able to get at what’s psychologically real in our syntactic structures – but I’d like to see some evidence from their side that they get anything different without severely increasing the complexity of the rules for the finitary fragment. In addition, the techniques of modern proof theory and recursion theory show how to extend notions of computability into the transfinite – perhaps traditional accounts of syntax can be similarly extended into the transfinite, with only small changes to existing theories, to account for all the new sentences Postal and Langendoen want to consider.

Unless they can defuse these criticisms, I don’t see what allowing transfinite sentences gets them. But it’s definitely an interesting place to try comparing arguments about ultrafinitism, finitism, constructivism, and platonism in mathematics with a similar set of positions in language!