The Vastness of Natural Languages

27 02 2006

I was browsing in a bookstore today and saw a copy of The Vastness of Natural Languages, by Langendoen and Postal (1984), which had been mentioned to me last semester by my semantics professor as a controversial attempt to show natural languages are infinite. I had wondered then why that claim was controversial, but on looking at the book, I can see why – they claim that any actual natural language has not just countably many sentences, but actually a proper class! That is, there are more sentences of English than any finite or transfinite cardinality! I only read the chapter arguing for this conclusion, and skimmed a bit of the chapter discussing some of the implications. But I think that their arguments are possibly somewhat careless (for instance, they would prove too much if applied to domains other than natural language), and that even if their conclusion is correct, it might not be relevant. But it makes some interesting connections between the traditional philosophy of mathematics and linguistics.

They first look at the traditional arguments that say that there is no finite upper bound on the length of sentences in any natural language. They say that the standard argument (which they attribute to Chomsky, among others, and to some earlier work by Postal) is just unsound. The argument says that if S is any grammatical sentence of English, then “I know that S” is a grammatical sentence. Similar versions say that a word like “very” can obviously be repeated any number of times, or that two sentences can always be conjoined to produce a longer one. However, they point out (I think rightly) that the argument is question-begging against someone who presupposes that there is some finite limit to the length of sentences in a language – we say the resulting longer sentences are grammatical because they are constructed by grammaticality-preserving rules. But the evidence that these rules are grammaticality-preserving should be based on judgements of grammaticality of individual sentences. So we should have to recognize these extremely large sentences as grammatical before accepting the rule as grammaticality-preserving, but the opponent has already claimed that these long sentences are too long to be grammatical, so on the opponent’s view the rules are clearly not grammaticality-preserving, even though they appear to be (and are for smaller sentences). (They also point out the analogies between this argument and Dedekind’s argument for the existence of infinitely many concepts, or sets, or objects of thought, where he assumes that for any thought T, there is a distinct thought, “T is a thought”. They point out that Dedekind’s set is exactly the sort that gets one into the semantic paradoxes.)

They also point out what they say is the only non-question-begging argument that natural languages have no finite bound on the length of sentences, which they attribute to Jerrold Katz. This argument depends more specifically on some of the methodology of linguistics. When analyzing the grammatical structure of some language L, we always work with some (necessarily finite) set of sentences that have been judged to be grammatical by a native speaker, which they call “IB(L)”, as an inductive basis. We attempt to find the rules of L by observing regularities in IB(L), but we don’t want to generalize accidental features of IB(L). For instance, we generally don’t want to say that to be grammatical in L just is to be a member of IB(L) – that would be too accidental and tied to the particular set of data we gathered. Similarly, since the set of sentences is finite, there will in fact be some greatest length k of any sentence of IB(L). But to say that every sentence of L must be at most as long as k would also seem to tie features of L too closely to accidental features of IB(L). And to choose any other length as the upper bound would be arbitrary – there is no clear reason to choose one natural number over another as the bound. So there should be no rule of the grammar of L directly specifying the maximum length of a sentence of L.

One might try to suggest that people will never be able to remember or understand a sentence that is a million words in length. However, this ties the language too much to contingent “performance limitations” of speakers. I’m sure there’s a decent amount of controversy around the relevance of performance limitations, but I’m willing to concede to them the claim that to get a good theory of a natural language, one should abstract away from the performance capacities of its actual speakers.

They point out that nothing in this argument depends on the putative size limitation of sentences being finite. Thus, the only good argument that shows that there is no in-principle limitation of the length of sentences of English also shows (according to them) that there are sentences of English of arbitrary transfinite length, as well as arbitrary finite length. Since there is a proper class of possible lengths then, we see that English must contain a proper class of grammatically well-formed sentences, as they claim.

I think their argument moves too fast. Just because we don’t want to generalize a size limitation that we find in some particular inductive basis IB(L) doesn’t mean that we can’t get any principled size limitation. For instance, we might discover that all the other syntactic rules that we extract from the data give some constraint on the possible lengths of sentences. For instance, in a propositional logic with only binary connectives, we can see that every sentence of the language must have an odd number of symbols in it – even if we don’t make this generalization from the observation of some IB(L) where every sentence has an odd number of symbols, we will still arrive at it by noticing that the only well-formed sentences are either single-symbol sentences, or of the form (A^B), where A and B are well-formed, or similarly for other connectives. (In fact, these rules will generate the constraint that every sentence has a length that is 1 mod 4.) If a language is complicated enough, the rules might well generate naturally some constraint that ensures sentences can never be more than k symbols long – even if k is not itself the upper bound of the lengths of strings in IB(L).
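As a rough check of the length constraint in the propositional example, here is a minimal sketch that builds every formula up to a small nesting depth and collects the lengths that occur. The atoms "p" and "q", the second connective "v", and the depth cutoff are assumptions chosen for illustration; only the (A^B) form comes from the example above.

```python
ATOMS = ["p", "q"]          # hypothetical single-symbol sentences
CONNECTIVES = ["^", "v"]    # hypothetical binary connectives


def formulas(depth):
    """Return all well-formed formulas with nesting depth at most `depth`."""
    if depth == 0:
        return set(ATOMS)
    smaller = formulas(depth - 1)
    # Each compound formula adds three symbols: "(", a connective, and ")".
    built = {
        f"({a}{c}{b})"
        for a in smaller
        for b in smaller
        for c in CONNECTIVES
    }
    return smaller | built


# Collect the distinct lengths of all formulas of depth at most 2.
lengths = sorted({len(f) for f in formulas(2)})
print(lengths)
print(all(n % 4 == 1 for n in lengths))
```

The reason is visible in the comment: a compound has length 3 + |A| + |B|, so if both parts have length 1 mod 4, the whole has length 5 mod 4, which is again 1 mod 4. No finite sample of sentences is needed to arrive at the constraint.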

Of course, it doesn’t seem that the syntactic rules of English generate such a constraint. (We really do seem to be able to iterate “very” as many times as we want.) But another non-arbitrary constraint may arise. If we observe the frequency of sentences of various lengths in our IB(L), we might discover that sentences of length k occur in proportion to the square root of 1,000,000 - k^2. Then, even if the inductive basis happens not to contain any sentences of length greater than 900, we might suppose that 1000 is the maximal length of any sentence of the language. Perhaps more naturally, we might notice that sentences of length k occur in proportion to 2^(-k), which would lead us not to put any finite bound on the length of sentences. However, this would give us good reason to suppose that there are no sentences of any infinite length, since their occurrence would be proportional to 0.
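To spell out the arithmetic behind these two hypothetical frequency laws, here is a short worked sketch. It assumes that lengths start at k = 1 and that the proportions are normalized into probabilities; the names f and g are introduced here for illustration only.

```latex
% First law: frequency proportional to the square root of 1,000,000 - k^2.
f(k) \propto \sqrt{10^{6} - k^{2}}, \qquad
f(1000) \propto \sqrt{10^{6} - 1000^{2}} = 0,
\qquad 10^{6} - k^{2} < 0 \ \text{for } k > 1000 .

% Second law: frequency proportional to 2^{-k}.
g(k) \propto 2^{-k} > 0 \ \text{for every finite } k, \qquad
\lim_{k \to \infty} 2^{-k} = 0, \qquad
\sum_{k=1}^{\infty} 2^{-k} = 1 .
```

On the first law the frequency vanishes at k = 1000 and is not even defined as a real number beyond it, which is what suggests 1000 as a principled cutoff. On the second law every finite length gets positive weight, and under the normalization assumed here the finite lengths already account for all of the probability, leaving nothing for sentences of infinite length.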

Of course, this presupposes that IB(L) was chosen through some means of representative sampling, which it generally isn’t. (It’s generally invented entirely by the linguist, and then confirmed by a native speaker as containing only grammatical sentences. And actually, there’s normally a set, say IB*(L) of sentences judged ungrammatical by a native speaker, to constrain the theory from the other side. The fact that both sets are likely to be bounded by the same length k makes it even more clear that we shouldn’t just judge that anything beyond length k is ungrammatical.) And it gives more weight to the performance limitations of speakers than Postal and Langendoen might want.

I was struck in the course of reading their chapter that many of the arguments for a finite bound on the length of sentences in a natural language are like the arguments given by ultrafinitists for an upper bound on the size of natural numbers that exist. (For instance, “no one could conceive of counting a number as large as 2^(1,000,000)”, or concerns based on the total number of particles in the universe.) But if Langendoen and Postal are right, then I think their arguments should carry over to the case of natural numbers, which would seem to show that there is no (finite or transfinite) upper bound to the size of natural numbers!

Fine, let’s concede this and just say that all of Cantor’s ordinals are themselves “natural numbers” in the sense that all these transfinitely long strings are in fact “grammatical sentences of English”. There are still many important situations in which people want to just work with the standard natural numbers – and I think it’s even more clear that there are important situations in which people only care about the “finitary fragment” of these extremely vast “natural languages” that Postal and Langendoen want to talk about. One of the important consequences they list for their theory is that every traditional account of syntax is wrong, because they all insist that the set of grammatical sentences is recursively enumerable, and thus countable. Yet if these theories aim not to account for the full “natural language” as Postal and Langendoen want to, but merely for the finitary fragment, then I don’t see how they can criticize these theories. Perhaps they’ll argue that such a restricted theory won’t be able to get at what’s psychologically real in our syntactic structures – but I’d like to see some evidence from their side that they get anything different without severely increasing the complexity of the rules for the finitary fragment. And in addition, the techniques of modern proof theory and recursion theory show how to extend notions of computability into the transfinite – perhaps traditional accounts of syntax can be extended into the transfinite similarly, generating extremely small changes to existing theories to account for all the new sentences Postal and Langendoen want to consider.

Unless they can defuse these criticisms, I don’t see what allowing transfinite sentences gets for them. But it’s definitely an interesting place to try comparing arguments about ultrafinitism, finitism, constructivism, and platonism in mathematics with a similar set of positions in language!

Quantifying into Sentence Position

23 02 2006

In his “Concept of Truth in Formalized Languages”, Tarski considers an alternative truth-definition that involves sentence-position variables inside quotes. In what he calls a “formally correct” truth definition, we would have a condition of the form “forall x (x is true iff …)”. “x” here is a variable that ranges over mentioned sentences, and “…” should be filled in with our definition. The attempt under consideration is
forall x (x is true iff exists σ (x=“σ” and σ)).
Here, “σ” is a variable that is supposed to range over used sentences, rather than mentioned sentences. We will say that x is true if it is a sentence that can be used to mean σ, so we need x to be an expression for σ, which is why we need to say “x=”σ””. However weird sentence-position quantification might be, the worse problem here is that we have to quantify into quotation marks. Note that the quoted letter sigma appears inside the truth-definition in a position where we have to quantify into it, but in my first sentence after that definition, I used that same expression to name the variable, not to give an expression with a free variable inside it naming a sentence. That usage is what we would expect given ordinary rules for using quotation marks, but Tarski considers what would happen if we allowed for this unusual usage (which would make it tough to talk about the language) and shows that we can get versions of the liar paradox, which would undermine the whole goal of trying to define truth.

However, I’ve got another reason to think that we shouldn’t have quantification into quotes that behaves the way we want it to in the attempted truth-definition – instead it should behave the way it does in my first sentence after the definition. The reason is mainly going to be because we often want to have distinct object-language sentences with the same truth-conditions (or perhaps more generally, possibly the same meanings). As a result, the range of values for the sentence-position variable will have to come with both intensional and extensional information. That is, to be used in sentence-position, it will have to have at least the extensional information of the truth-conditions, but in order to get different values for quote sigma given extensionally equivalent values of sigma, it will need to somehow have intensional information carried with it. Now, this is possible, but somewhat unwieldy.

In addition, if sigma is a variable that appears in the object language as well as in the metalanguage, then we’ll have to have a different procedure to indicate that we mean to refer to that variable, rather than have a free sentence-position variable inside quotation marks. This is also possible – in the LaTeX markup language, one can do this for special characters by putting a backslash in front of them; in some other languages one doubles up the special character, or uses some other way to “escape” it. Of course, if one wants to have that expression in quotes, rather than just the letter sigma, then one needs a further set of commands to escape the relevant characters. It’s possible, but it involves replacing a lot of the standard names for certain symbols in the language inside quotes.
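The escaping problem has a familiar analogue in string templates, which may make the point more concrete. The sketch below uses Python’s built-in str.format, where single braces mark a placeholder to be filled in and doubled braces escape it. Treating the placeholder as a stand-in for the sentence-position variable is only an analogy, and the function names are hypothetical.

```python
def quote_with_free_variable(sigma):
    """Substitute the sentence inside the quotation marks (quantifying in)."""
    # Single braces: {sigma} is a placeholder replaced by the argument.
    return '"{sigma}"'.format(sigma=sigma)


def quote_naming_variable(sigma):
    """Produce a quoted name of the placeholder itself (ordinary quotation)."""
    # Doubled braces escape the placeholder, so the argument is ignored.
    return '"{{sigma}}"'.format(sigma=sigma)


print(quote_with_free_variable("snow is white"))
print(quote_naming_variable("snow is white"))
```

The first call yields a quoted sentence and the second yields a quoted name of the variable, whatever sentence is passed in. Every template that wants to mention the placeholder rather than fill it needs the escape, which is the kind of bookkeeping described above.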

So we can reconsider why we wanted to be able to quantify into quotes to begin with. The reason was so that we can have one position that names a sentence while another position uses the same sentence, with the sentence being quantified over. Since every sentence has exactly one meaning (or set of truth-conditions), while truth-conditions are in general shared by multiple sentences, it seems most natural for our metalanguage function to go the other way. Instead of going from use to mention, it should go from mention to use, because that function should be well-defined – multiple intensions correspond to the same extension, but not vice versa. Thus, we should be able to express our truth-definition roughly as the following:
forall x (x is true iff exists S (x=S and F(S))),
where “F” is the function that gets us from mention to use, or intension to extension. That is, “F(S)” corresponds to “σ” and “S” corresponds to ““σ””. We don’t have to worry here about any collision between object language and metalanguage variables, so I think this proposal is overall more natural.

But we can see that this definition is equivalent to
forall x (x is true iff F(x)),
which we see means that “F” just is the truth-predicate. I think this is why natural language has a truth-predicate rather than a quote-quantifying sentence-place variable. They can express all the same things, but one is more convenient than the other. Semantic descent is easier than semantic ascent, so that’s why it’s the function that we have built into our language.
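The equivalence claimed here can be checked in two short steps. This is a sketch that assumes S ranges over the same mentioned sentences as x, and that identity licenses substitution in the context F(...).

```latex
% Claim: for every x,  \exists S\,(x = S \wedge F(S)) \iff F(x).

% Left to right: pick a witness S with x = S and F(S); substituting x for S gives F(x).
\exists S\,\bigl(x = S \wedge F(S)\bigr) \;\Longrightarrow\; F(x)

% Right to left: given F(x), take S := x, so that x = x and F(x) both hold.
F(x) \;\Longrightarrow\; x = x \wedge F(x) \;\Longrightarrow\; \exists S\,\bigl(x = S \wedge F(S)\bigr)

% Hence the two definitions of truth agree:
\forall x\,\bigl(x \text{ is true} \iff \exists S\,(x = S \wedge F(S))\bigr)
\;\iff\;
\forall x\,\bigl(x \text{ is true} \iff F(x)\bigr)
```

The existential quantifier does no work once the identity x = S is in place, which is why the definition collapses to a bare predicate applied to x.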

As a result, we have to go to more work to define truth, but Tarski has shown us that this is generally possible, as long as we don’t mind the problems Field points out of the definition being non-systematic and non-explanatory.

FEW 2006

18 02 2006

The schedule is here. So many good submissions that they still haven’t had any repeat presenters after three years! Everyone should come – there are funds for graduate student travel to Berkeley.


8 02 2006

1997 seems like it must have been quite a year in the philosophy of mathematics. Mike Resnik published Mathematics as a Science of Patterns, and Stewart Shapiro did Philosophy of Mathematics: Structure and Ontology, which are two strong arguments in favor of different versions (I think) of structuralism, which had been a popular idea over the previous few decades, but I think not terribly well-developed before those books. At the same time, John Burgess and Gideon Rosen outlined and attacked fictionalism in their A Subject with No Object, and Penelope Maddy advocated an end to all this investigation into the ontology and epistemology of mathematics in Naturalism in Mathematics. In addition, Synthese did a special issue on proof in mathematics, and in 1998 a couple of other important books came out – Mark Balaguer’s Platonism and Anti-Platonism in Mathematics and the collection Truth in Mathematics edited by Dales and Oliveri.

I don’t know of any other particular year that had such a proliferation of interesting books coming out nearly simultaneously in one relatively small area of philosophy like this. And I feel like there’s another book from 1997 that I’m leaving out as well. Does anyone know any other examples of years like this?

Structuralism and Application

2 02 2006

One form of structuralism says that mathematics is just the proving of theorems from axioms, which may or may not hold of (or approximate) various real systems. Something like this is clearly right for things like topology, group theory, and the like – when we prove something about topological spaces (or better yet, about compact Hausdorff spaces), we don’t mean to prove it of any particular thing, but rather just know that whenever we find some entity that satisfies the axioms of topology (plus the compact Hausdorff axioms), the theorem will be true of this entity. Whether or not there is such an entity is almost a side concern, though the fact that they seem to arise in various areas of mathematics convinces us that the theorems are likely to be useful.

The structuralist says that this is the case with Peano arithmetic, and set theory as well. We’re not proving anything about actual numbers or sets (like the platonist claims) – we’re just proving theorems that will hold of any entities that happen to satisfy the axioms we use. Something like this seems to be the position of Saunders Mac Lane in his 1997 paper “Despite Physicists, Proof is Essential in Mathematics” (Synthese 111:147-154).

However, this position seems problematic, because it doesn’t explain how we are justified in many of our applications of mathematics. (Platonism and fictionalism don’t obviously work much better, but I think I can sketch an account of how they at least partially justify us in these applications.) If our theorems about real numbers are just theorems about whatever happens to satisfy the real number axioms, rather than about any independently existing entities, then to apply the theorems, we would need to show that the system we’re applying them to satisfies the axioms. So in order to apply the intermediate value theorem to distances, we’d need to show that distances are dense, linearly ordered, have no endpoints, and are topologically complete. In order to apply them to probabilities, temperatures, electric charges, masses, times, and the like, we’d need to show that these axioms apply to each of those domains. However, it seems unlikely to me that we’ve actually shown that any of these quantities satisfies the real number axioms, much less all of them.

The platonist has a way out by saying that (somehow) we’ve discovered facts about these entities we call real numbers, which aren’t the physical entities we’re talking about. Instead, we see that in each of these physical situations, our investigation reveals that there is some sort of abstract entity involved, and we can use inductive reasoning to suggest that the real numbers are the appropriate ones. Thus, even if we can’t directly support the infinite divisibility, or the topological completeness, of any of these realms, we might be able to inductively support them by inferring that the reals (a structure we already know about) would give the best explanation of the phenomena we’re observing. If we had to restrict ourselves to a set of structural axioms that we knew to hold, we’d have to convince ourselves that some closely related set of axioms wouldn’t be better. If the platonist has a reason for thinking that the real numbers are a special set of entities, then we could infer this convergence on one set of axioms without having to establish it completely independently each time.