7.3 Words and Eras
For most of the book, I have worked from a dataset built by starting with the JSTOR word list and then making two major edits. First, I cut out words that don’t look like they are part of the content of the papers I’m focussing on. Second, I cut out words that only appear one to three times in a paper. For this section, I’m mostly still excluding the noncontentful words (though I’ll note one point below when I consider them), but I’m restoring the words that appear one to three times.
On the other hand, I’m restricting attention to the five thousand most common words in the data set. In practice, that means restricting attention to words that appear about 2,200 times or more. That is, the word has to appear about once every fifteen articles. That doesn’t mean it has to appear in one-fifteenth of the articles; it could appear very often in a few articles. But it does have to get those 2,200 appearances somewhere. The reason for this restriction will soon become clear.
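Here is a minimal sketch, in Python, of that selection step and of the difference between a word's total count and the share of articles it appears in. The tiny corpus and the names `total_counts`, `top_n_words` and `document_share` are made up for illustration; this is not the code used for the book.

```python
from collections import Counter

# Hypothetical toy corpus: one Counter of word -> occurrences per article.
articles = [
    Counter({"theory": 4, "case": 1}),
    Counter({"theory": 1, "grue": 9}),
    Counter({"case": 2}),
    Counter({"theory": 2, "case": 1}),
]


def total_counts(articles):
    """Sum each word's occurrences over all articles."""
    totals = Counter()
    for article in articles:
        totals.update(article)
    return totals


def top_n_words(totals, n):
    """Return the n most frequent words and the count of the last word kept."""
    ranked = totals.most_common(n)
    words = [word for word, _ in ranked]
    cutoff = ranked[-1][1] if ranked else 0
    return words, cutoff


def document_share(articles, word):
    """Fraction of articles in which the word appears at least once."""
    return sum(1 for article in articles if article[word] > 0) / len(articles)


totals = total_counts(articles)
words, cutoff = top_n_words(totals, 2)
print(words, cutoff)
print(document_share(articles, "grue"))
```

In this toy corpus, grue makes the cut on total count even though it appears in only one article in four, which is the situation the paragraph above allows for.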
To start, here are the most common words from each era—remembering that we’ve filtered out a lot of “stop words”.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
sense | sense | theory | theory | theory |
nature | theory | case | case | case |
experience | fact | true | true | true |
fact | true | sense | might | might |
knowledge | philosophy | might | argument | argument |
time | case | fact | moral | first |
mind | use | first | sense | example |
philosophy | moral | time | first | account |
theory | first | argument | truth | just |
world | time | moral | fact | truth |
That doesn’t tell us a lot. It is a bit interesting that theory and especially case are much more prevalent after World War II than before it, but that’s about it.
We learn more by comparing the frequency of each word in an era to that word’s frequency across the whole data set. So rather than ask how many times a word appears in an era, we might ask what percentage of the word’s appearances are in that era. That gives us a sense of the characteristic words of a given era.
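As a rough illustration of that measure, here is a short Python sketch that turns per-era counts into per-era shares. The counts are invented, and `ERAS`, `counts` and `era_shares` are names I'm assuming for the example.

```python
ERAS = ["1876-1945", "1946-1965", "1966-1981", "1982-1998", "1999-2013"]

# Hypothetical counts: word -> occurrences in each of the five eras.
counts = {
    "consciousness": [600, 150, 100, 90, 60],
    "epistemic": [5, 15, 40, 140, 300],
}


def era_shares(per_era):
    """Share of a word's total occurrences that falls in each era."""
    total = sum(per_era)
    return [count / total for count in per_era]


for word, per_era in counts.items():
    shares = era_shares(per_era)
    # The era holding the largest share of this word's occurrences.
    best = max(range(len(ERAS)), key=lambda i: shares[i])
    print(word, ERAS[best], round(shares[best], 2))
```

The shares for a word always sum to one, so a word's total frequency drops out of the comparison; only its distribution across the eras matters.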
But as it stands, that doesn’t work either. A lot of words have all of their appearances in one or other era. For instance, we don’t see any occurrences of elga, kolodny, knobe, weatherson, rayo, obama or greaves until era 5. So they’d all be tied for being the most characteristic words of the 1999–2013 era, since all of their appearances are in this era. And while some of those words are somewhat important to the era, I don’t think that’s quite what we’re looking for.
What I decided to do, following common practice, is to restrict the study to the five thousand most common words. This excludes all the words I just listed. (I’d have to extend it a fair bit further to get them; elga is at position 10468, greaves at 16583, and the others in between.) And then we can look at which words from that five thousand have the highest percentage of their occurrences in a given era.
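The effect of the cutoff can be sketched in Python as follows. The counts are invented, and `distinctive_words` and its parameters are hypothetical names for this example rather than anything from the actual analysis.

```python
# Hypothetical counts: word -> occurrences in each of five eras.
counts = {
    "theory": [200, 300, 500, 500, 500],
    "epistemic": [5, 15, 40, 140, 300],
    "credence": [0, 0, 5, 15, 80],
    "elga": [0, 0, 0, 0, 12],
}


def distinctive_words(counts, n_top, era_index, k):
    """Among the n_top most frequent words overall, return the k words with
    the largest share of their occurrences in the era at era_index."""
    totals = {word: sum(per_era) for word, per_era in counts.items()}
    common = sorted(totals, key=lambda w: totals[w], reverse=True)[:n_top]

    def share(word):
        return counts[word][era_index] / totals[word]

    return sorted(common, key=share, reverse=True)[:k]


# No effective cutoff: the rare name, with every occurrence in the last era, wins.
print(distinctive_words(counts, n_top=4, era_index=4, k=1))
# Tighter cutoffs drop the rarest words before the shares are ranked.
print(distinctive_words(counts, n_top=3, era_index=4, k=1))
print(distinctive_words(counts, n_top=2, era_index=4, k=2))
```

Each change to `n_top` changes the winner in this toy data, which is a small-scale version of how sensitive the result is to where the cutoff sits.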
But it turns out even that isn’t quite what we want, though it isn’t clearly wrong either. If you do what I just said, most of the words that show up are between the four thousandth and five thousandth most common words. It’s just much easier for rarer words, especially names, to appear at one particular time. Since no single cutoff is obviously the right one, I decided to show you a whole bunch of tables.
For every one of the following tables, I restricted attention to the top n thousand words in the data set, and then asked which of those words have the highest percentage of their occurrences in each era. I think looking at the tables for each value of n from 1 to 5 is useful. There will be some repetition; sometimes a common word has a distinctive distribution. But there is some new information each time n is increased. I cut it off at n = 5, but you could keep going beyond that. (Though if we did, we’d find a few more LaTeX words, and journal names, and OCR errors, that, in retrospect, I might have wanted to delete from the dataset.) So here’s what we get when restricting attention to the one thousand most common words.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
consciousness | statements | jones | intentional | epistemic |
reality | statement | legal | rationality | models |
unity | analytic | quine | rawls | normative |
soul | ethical | predicates | frege | population |
feeling | philosopher | criteria | realist | intuitions |
sensation | art | wants | strategy | worlds |
ultimate | men | logically | beliefs | modal |
whole | phenomenological | wittgenstein | desires | strategy |
absolute | descriptive | rawls | probabilities | agents |
quality | aesthetic | behaviour | realism | david |
Is jones at the top of 1966–1981 because of Sellars, or Gettier, or Frankfurt? I think the answer is, all of the above! I think frege appears prominently in 1982–1998 because of “Frege cases”, not because of a particular upsurge in attention to Frege’s own writings. Both quine and rawls are a bit later than their most famous writings, which makes sense. And note intuitions turning up as a distinctive word in the 21st century literature. It’s interesting, I think, that it isn’t used as much in the era when intuitions were allegedly dominating philosophy as in the era when metaphilosophy became such a big deal.
We can look at the graphs of how frequently these words appeared over time to get a sense of what it means for them to be the distinctive words of an era. I’ll just graph the top five for each era, because otherwise the graphs get too cluttered.
We can confirm that the words in question really do peak in the era in question.
The Y-axis measures the frequency of the words among the words in the JSTOR data. That excludes one- and two-letter words, and whatever stop words JSTOR has excluded (like “the” and “and”), but includes things like bibliographic information and LaTeX code. It probably overstates the actual frequency of the words by something like 25 to 50 percent. So if it says that a word appears one time in four hundred, its real frequency is, as far as I can tell, more like one time in five to six hundred.
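The arithmetic behind that adjustment is simple enough to write down. This Python sketch assumes the overstatement works as a uniform multiplier on the reported frequency; `adjusted_one_in` is a name invented for the example.

```python
def adjusted_one_in(reported_one_in, overstatement):
    """If the reported frequency is inflated by `overstatement` (0.25 means
    25 percent), a word reported once per N words really appears about once
    per N * (1 + overstatement) words."""
    return reported_one_in * (1 + overstatement)


low = adjusted_one_in(400, 0.25)   # 25 percent overstatement
high = adjusted_one_in(400, 0.50)  # 50 percent overstatement
print(low, high)
```

With a reported rate of one in four hundred, the two ends of the range come out at one in five hundred and one in six hundred.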
Note that the distinctive words of the middle eras are much less frequent than the distinctive words of the early eras or even (to a lesser extent) the later eras. The word consciousness seems to have appeared, on average, about once a page in the early years! No word is this prevalent in the later years.
Let’s expand the data set and look at the two thousand most common words.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
bradley | dewey | marx | putnam | global |
organic | statements | popper | davidson | epistemic |
idealism | whitehead | utilitarian | supervenience | worry |
consciousness | usage | strawson | kripke | testimony |
reality | statement | jones | women | phenomenal |
unity | factual | legal | preferences | multiple |
existent | analytic | utilitarianism | intentional | options |
soul | synthetic | punishment | van | option |
feeling | ethical | quine | rationality | counterfactual |
eternal | signs | entailment | rawls | vagueness |
Both marx and davidson turn up one era later than I would have guessed. And I would have thought strawson was either an era earlier or an era (or two) later; earlier for the work on descriptions, later for the work on responsibility. So those are a bit interesting. Testimony really was a big topic in the early twenty-first century. And note worry turning up. Recent philosophy has a very distinctive lexicon, which we’ll see more and more of.
Here are the graphs for the first five words in each column.
Next, here is what we get when we expand to the three thousand most common words.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
psychical | ryle | marx | nuclear | williamson |
esthetic | ayer | hare | parfit | credence |
bradley | dewey | austin | dummett | scenario |
apprehension | statements | hempel | fodor | luck |
connexion | whitehead | popper | dworkin | global |
spiritual | usage | utilitarian | computational | epistemically |
volition | statement | strawson | twin | epistemic |
organic | philosophic | chisholm | nozick | target |
impulse | western | goodman | evans | representational |
intellect | peirce | geach | putnam | worry |
Apart from in the earliest era, we’re starting to see the majority of the list here be names of famous (male) philosophers. And we get a pretty good sense of when they were being most commonly discussed. The graphs show this in slightly more detail. (The fourth graph is a little busted because of one year when nuclear went nuclear.)
The pattern stays the same as we expand to four thousand words.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
bosanquet | ryle | marx | nuclear | williamson |
psychical | ayer | grue | parfit | scanlon |
stout | historian | hare | dummett | credence |
instinct | dewey | austin | dennett | scenario |
bergson | poem | hempel | burge | bob |
esthetic | philosophies | hart | fodor | luck |
apprehended | verifiable | popper | dworkin | arguably |
presentations | sartre | utilitarian | computational | doxastic |
bradley | malcolm | lorentz | twin | robust |
apprehension | civilization | strawson | consequentialist | global |
I’m a bit surprised to see bob here; I’m not sure if this is Stalnaker, or Brandom, or who is being referred to so informally. The graphs don’t show a great deal that isn’t visible on the previous set, so let’s skip over them and move the limit up to the five thousand most common words.
1876–1945 | 1946–1965 | 1966–1981 | 1982–1998 | 1999–2013 |
---|---|---|---|---|
bosanquet | emotive | illocutionary | deterrence | hawthorne |
schiller | stevenson | marx | laudan | chalmers |
psychical | ryle | hintikka | nuclear | contextualism |
spencer | ayer | grue | churchland | williamson |
muscular | historian | capitalist | rorty | normativity |
stout | dewey | hare | parfit | woodward |
instincts | santayana | lakatos | dummett | scenarios |
instinct | poem | alienation | dennett | scanlon |
bergson | philosophies | austin | burge | credence |
antithesis | verifiable | hempel | fodor | egalitarianism |
And finally we see that not all the names are of men. Korsgaard is here, and the model doesn’t discriminate between the Churchlands, so at least part of the reason for churchland in 1982–1998 is Patricia Churchland.
Apart from the names, the words in the final era are all fairly much as expected. I’ll come back in the last chapter to credence, because I don’t think everyone realizes how new a term it is. And egalitarianism was used so much in 1982–1998 that I’m very surprised it can turn up here.
I have no idea why muscular is such a common term in the first era. I suspect I wouldn’t be happy to find out.
Here are the graphs for the five most distinctive words in each of the eras.
Here part of the story, as is shown in the changing scale of the Y-axes, is ever-increasing diversity. Figures who feel like they dominate the current age, like Williamson, Hawthorne and Chalmers, are discussed much less than figures like Ayer or Ryle were a couple of generations ago.