The benefit of using this kind of modeling is that it allows every article to be taken into account. This is the history of philosophy (in these journals) without any gaps whatsoever.
And this is no small feat. Remember that there are 32,261 articles that are being looked at. Let’s say that eight hours a day, five days a week, could be dedicated just to reading these articles, and that one could on average read an article per hour. Some, to be sure, would take less than an hour, even to read closely. But just one hour is an optimistic reading time for the longer articles. Still, let’s make the optimistic assumption. That would mean 807 weeks of just reading through them all. If one takes two weeks a year off, it would take sixteen years just to do the reading. And at the end of that time, one would, at best, have some sketchy notes on the articles and not anything that could be used for an analysis.
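The arithmetic behind that estimate is easy to check. Here is a minimal sketch in Python using only the figures given above; the variable names are mine, and the one-hour-per-article rate is the optimistic assumption already stated.

```python
import math

# Figures taken from the paragraph above.
ARTICLES = 32_261
HOURS_PER_ARTICLE = 1            # the optimistic assumption
HOURS_PER_WEEK = 8 * 5           # eight hours a day, five days a week
WORKING_WEEKS_PER_YEAR = 52 - 2  # two weeks off a year

# Round up: a partly used week is still a week of reading.
weeks = math.ceil(ARTICLES * HOURS_PER_ARTICLE / HOURS_PER_WEEK)
years = weeks / WORKING_WEEKS_PER_YEAR

print(f"{weeks} weeks, or about {years:.1f} years")
```

This gives 807 weeks and a little over sixteen years, matching the numbers in the text.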
To analyze all the articles, and to really have no gaps, the only way is by machine.
But there are a number of downsides to this algorithmic approach, all of which come from the fact that the machine is just doing string recognition. The algorithm doesn’t know any semantics—just syntax. And this causes some complications. I’ll mention five here, along with a brief discussion of how badly they impacted the model I ended up using.
One problem that turned out not to be too big a deal was that the algorithm has a hard time distinguishing between different uses of the same word. But while this is hard, it isn’t impossible. The model seems, for example, to understand the difference between how function is used in philosophy of biology versus how it is used in logic and mathematics. It didn’t run together the different uses of realism or internalism and externalism in a way that I would have expected. There is a hint of running together scepticism in the sense most relevant to epistemology with other kinds of philosophical scepticism. (Someone who is a free-will sceptic doesn’t say that people don’t know whether free will exists but that people know it doesn’t.) But maybe this isn’t too much of a problem, since the views aren’t that separate.
The one time that this particular model seems to have gotten confused over two related meanings of a word concerned “free.” Topic 35 is a mishmash of work on free will and work on political freedom. It’s possible to think this isn’t too bad, since the subjects are somewhat connected. But it’s not optimal, and eventually it’s necessary to separate out free will and political freedom. But the big picture is that something that seemed likely to be a problem turned out, pleasingly, to not be that bad.
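To make the “same word, different uses” worry concrete, here is a toy sketch of what a text looks like once it is reduced to word counts. It is not the model I used. The two sentences are invented for illustration, and `bag_of_words` is a hypothetical helper. The point is only that the string “function” is identical in both, so anything that tells the two uses apart has to come from the words that surround it.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Reduce a text to lowercase word counts, discarding order and meaning."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Two invented sentences using "function" in different senses.
biology = "The function of the heart is to pump blood."
logic = "A function maps each argument to exactly one value."

bio_bag = bag_of_words(biology)
logic_bag = bag_of_words(logic)

# The shared vocabulary includes "function": as a string it is the same word.
shared = set(bio_bag) & set(logic_bag)
print(sorted(shared))

# Only the surrounding words differ, and those are all the counts have to go on.
print(sorted(set(bio_bag) - set(logic_bag)))
print(sorted(set(logic_bag) - set(bio_bag)))
```

If the neighboring words are distinctive enough, as “heart” and “argument” are here, a model built on co-occurrence has something to work with. When they are not, as with free will and political freedom, the uses can run together.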
A second, opposing problem is that sometimes the differences in topics come from a change in terminology. This can be seen most clearly, I think, in the logic topics in the model. Papers about sequents are put in a different topic than papers about syllogisms. Papers about implications are put in a different topic than papers about validities. Now there is a sense in which that’s a good thing, and the model is picking up a philosophically significant change. But it’s a relatively minor change compared to what the model thinks. Still, this isn’t a particularly serious problem. The worst-case scenario is that one has to come back in later and manually note that the papers on validities and papers on implications need to be put back together when we’re doing analysis. That’s a bit of work but it isn’t too bad—just remember that it happens.
A third, and related, problem comes from the model making fine-grained distinctions within a subject. I mentioned earlier that I saw several models that ended up separating out work on causation that didn’t discuss counterfactuals (such as that of Mackie) from post-Lewisian work, where counterfactuals are front and center. That’s not great—these really are on the same topic—but it isn’t too bad. Again, the worst-case scenario is that these topics need to be combined by hand when doing analysis. But in practice I don’t think I really saw this problem arise in this particular run of the model.
A potentially bigger problem is the converse, which I already discussed when talking about choosing the number of topics. Sometimes the topics are just disjunctive. For example, topic 37 ends up being half about sets and half about the grue paradox. There is a connection of sorts here: Nelson Goodman is kind of important to both literatures. But really this shouldn’t be a single topic. As I already noted, this is a hard problem to fix. If the number of topics is increased, the model becomes harder to read, and it’s just as likely to split a coherent topic (like causation) as it is to split a disjunctive topic.
I did three things here to address these disjunctive topics. One, which I’ve already mentioned, was to keep running refinements until the worst of the disjunctiveness was polished away. (Before the refinements, some papers on probabilistic epistemology got classified in with papers on Hume, and I don’t know what the computer was thinking. A handful ended up there after the refinements, but not nearly as many.) A second is to use very clear labels for the topics, like “Sets and Grue,” to indicate that it is a disjunctive topic. And a third is to run a further analysis on articles in that topic to separate the articles on sets from the articles on grue. Eventually there ended up being ten topics where I felt this kind of split was worthwhile.
The fifth and final problem is that the algorithm can’t tell changes of topic apart from changes in style. If it becomes a requirement on all right-thinking philosophers to express oneself more or less exclusively in monosyllables, as seems to have been the case in midcentury Britain, then the algorithm will think that there is a new topic that is being discussed right then. I’m exaggerating of course about midcentury Britain, but there is a trend that matters, and that I’ll talk much more about later.
Or imagine what would happen if every philosopher all at once decided that objections shouldn’t be responded to with a new theory that has distinctive consequences but instead one should respond to worries with a new account that has distinctive commitments. Well, the model will think that there is this cool new subject about “worries,” “accounts,” and “commitments,” and that they’re being talked about. And if this stylistic change happens all at once across philosophy, the model will think that the generalist journals, the philosophy of science journals, and the moral and political journals are suddenly obsessed with the worry/account/commitment subject. Of course, philosophy couldn’t be so caught up chasing trends that something like this would happen all at once, could it? Could it? Let’s return to this issue at the very end and see how bad things got.
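The thought experiment can be put in miniature. The sketch below is illustrative only: the two sentences are built from the vocabulary in the example above, and `vocabulary` and `jaccard` are hypothetical helpers, not part of any model I ran. The two sentences make the same move, but the only words that distinguish them are the stylistic ones.

```python
import re

def vocabulary(text):
    """The set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of the combined vocabulary that the two texts have in common."""
    return len(a & b) / len(a | b)

old_style = "I respond to the objection with a new theory that has distinctive consequences."
new_style = "I respond to the worry with a new account that has distinctive commitments."

old_vocab = vocabulary(old_style)
new_vocab = vocabulary(new_style)

# Words unique to each style: these are what a word-based model would latch onto.
print(sorted(old_vocab - new_vocab))
print(sorted(new_vocab - old_vocab))
print(round(jaccard(old_vocab, new_vocab), 3))
```

A model that sees only words has no way to know that “worry” is doing the job “objection” used to do. If the swap happens everywhere at once, the new words look like a new subject.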