5.3 Summarising the Ninety Binary Sorts
So which topics should be split up in this way? To answer this, I wanted to look at three questions:
1. How confident was the binary sort that it had really found a division in the data?
2. What were the subtopics that the binary sort generated?
3. Were these subtopics from different categories?
If the answer to 1 is negative, then this technique seems too random to be usefully applied. If the answer to 3 is negative, then splitting the topic into subtopics is more trouble than it’s worth. And answering question 3 requires answering question 2. So here’s a summary of what the splits in each topic looked like.
The first column is the topic number, to help line up with the discussion in chapter 2. The second is the name I gave to the topic. The third is the proportion of articles in the topic that the binary sort put in one or other subtopic with probability of at least 0.99. As can be seen by scrolling down, or by sorting on that column, these binary sorts are often very decisive.
The next two columns are the keywords for the two subtopics. That is, they are the (relatively common) words with the highest ratio of their probability of being in one subtopic to their probability of being in the other. This is enough to get a sense of what division the binary sort is making.
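Here is a minimal sketch in R of how both measures can be computed with the topicmodels package. It assumes a fitted two-topic LDA object with the hypothetical name `binary_fit`, and the commonness cutoff is also an assumption, not a value from the original study.

```r
library(topicmodels)

# Posterior quantities from the fitted two-topic model.
gamma <- posterior(binary_fit)$topics   # articles x 2 matrix: P(subtopic | article)
beta  <- posterior(binary_fit)$terms    # 2 x vocabulary matrix: P(word | subtopic)

# Confidence: the proportion of articles that the binary sort assigns to
# one subtopic or the other with probability at least 0.99.
confidence <- mean(apply(gamma, 1, max) >= 0.99)

# Keywords: among relatively common words (the 1e-4 cutoff is an assumption),
# those with the highest ratio of one subtopic's probability to the other's.
common <- colMeans(beta) > 1e-4
ratio  <- beta[1, ] / beta[2, ]
first_keywords  <- names(sort(ratio[common], decreasing = TRUE))[1:7]
second_keywords <- names(sort(ratio[common], decreasing = FALSE))[1:7]
```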
So pretty clearly topic 38 should be split up. The first subtopic is, broadly speaking, in philosophy of biology, and it’s centered around issues about animal cognition and Morgan’s Canon. The second subtopic is about some issues in (loosely speaking) Kripkean metaphysics.
And topic 70 looks fairly disjunctive as well. The first subtopic is about applied ethics; the second subtopic is about psychoanalysis. That’s easy enough to split as well.
A lot of the others don’t look like they are dividing across topic boundaries. Both sides of the speech acts topic are in philosophy of language, as are both sides of belief ascriptions. The speech acts case was actually a little disappointing; I was hoping that the model would tease apart the Austin-inspired work of the middle of the twentieth century from the Langton-inspired work of more recent years. A few other models I’d run had found this division, but sadly this one didn’t.
The model splits decision theory into two parts, one centered around the Pasadena problem and the other around the two envelope paradox. It is far from obvious how to categorise decision theory, but it doesn’t seem that it would get any easier by following this division. So I’ll leave that in one piece.
The really complicated one here is arguments. The keywords of the first subtopic suggest it is primarily about conceivability arguments for dualism. The second subtopic is a bit more of a mixture. There is a hint (backed up by looking at the articles) that it includes some articles about arguments for incompatibilism. But the big thing about this binary sort is that it is very asymmetric. The reason it has such a high confidence measure is that most of the articles are firmly in the second subtopic.
In fact, of the 122 articles, 65 have a probability greater than 0.99 of being in the second subtopic. A better way to think about what’s happening here is that the binary sort didn’t so much split the topic in two as carve out a distinctive subset from the whole. I’ll treat the first subtopic as being about conceivability arguments in particular, and the second as being about arguments in general.
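To see how asymmetric a sort like this is, it’s enough to count the articles each subtopic claims firmly. A one-line check, using the same hypothetical `binary_fit` object as above:

```r
# How many articles does each subtopic claim with probability above 0.99?
colSums(posterior(binary_fit)$topics > 0.99)
```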
Let’s look at the rest of the topics that I decided to split up into subtopics:
Topic | Subject | Confidence | First Subtopic Keywords | Second Subtopic Keywords |
---|---|---|---|---|
24 | Ordinary Language | 0.368 | morality, moral, duty, ethics, agent, ethical, desires | memory, red, perception, seeing, propositions, material, data |
35 | Freedom and Free Will | 0.569 | paternalism, liberal, mill, paternalistic, slave, interference, ideal | blame, causally, effort, excuse, acted, undetermined, caused |
36 | Crime and Punishment | 0.510 | desert, wrongdoing, fault, wrongdoer, forgiveness, mercy, forgive | jurisprudence, kelsen, validity, positivism, constitution, statements, norm |
37 | Sets and Grue | 0.571 | grue, green, emeralds, examined, projectible, verisimilitude, entrenchment | frege, membership, pure, null, abstraction, plural, boolos |
77 | Frankfurt Cases | 0.616 | power, widerker, abilities, transfer, deciding, mckenna, black | fictional, fiction, pretence, judgements, imaginative, characters, imagining |
79 | Races and DNA | 0.655 | classical, acid, crick, amino, biochemical, acids, watson | race, races, populations, profiling, folk, elegans, appiah |
90 | Norms | 0.415 | brandom, kripke, correctness, assertion, judgements, propositional, wittgenstein | love, strawson, contempt, scanlon, persons, emotions, resentment |
Ordinary language is a mess, and the confidence measure isn’t as high as I’d like for making a division, but the subtopics look pretty clearly disjoint. The first is about ethics, the second is about mind. And since we already have a separate topic for contemporaneous British philosophy of language, splitting this topic into ethics and mind seems like a sensible plan.
Freedom and free will is really the one case where the model got thrown by the fact that two different philosophical debates use a common word. But the subtopics bail us out here, splitting it into debates about free will and debates about political freedom.
Crime and punishment could arguably have been left alone. But it looked to me like the first subtopic concerns issues in ethics, especially forgiveness as an interpersonal relationship, and the second is about social and political philosophy.
I’ve already gone over sets and grue at some length.
I don’t quite know what happened in the original model with Frankfurt cases. Most of the models I built had a topic centered around Frankfurt cases. (And the ones that didn’t had a distinct topic for free will; it was just that Frankfurt wasn’t especially central to them.) But no model other than this one threw stuff about fiction in with them. (There was one time when the model insisted on putting works about fiction in with philosophy of biology work on function. And I spent a lot of time worrying that it was being overly influenced by the overlapping letters in ‘fiction’ and ‘function’.) Anyway, the subtopics bail us out here: philosophy of fiction has little to do with the free will debates that this topic is primarily about.
Races and DNA makes a bit more sense as a topic, but it is still hard to classify. The subtopics, however, are easy to classify: the first is about philosophy of science, the second about social and political philosophy.
And the work on norms divides reasonably neatly into language norms and ethical norms. Possibly if we ran the clock forward and included papers after 2013 we’d see more papers on epistemic norms here, and that would complicate the neat division.
So those are the divisions I made. It’s helpful to have them as a table, not least because I’ve already been using their names in the previous chapter.
Topic | Subject | First Subtopic | Second Subtopic |
---|---|---|---|
24 | Ordinary language | OLP ethics | OLP mind |
35 | Freedom and free will | Political freedom | Free will |
36 | Crime and punishment | Forgiveness | Law |
37 | Sets and grue | Grue | Sets |
38 | Origins and purposes | Teleology | Origin essentialism |
55 | Arguments | Conceivability arguments | Arguments |
70 | Medical ethics and Freud | Medical ethics | Freud |
77 | Frankfurt cases | Frankfurt cases | Fiction |
79 | Races and DNA | DNA | Race |
90 | Norms | Language norms | Moral norms |
There is one last technical point to notice. The binary sort tells us how to divide the articles that are in a topic into one or other subtopic. But to calculate the weighted sums, we also have to assign some weights to the articles that are primarily in other topics, but which have some probability of being in this topic. (This is especially pressing for the ordinary language and arguments topics.)
Happily, there is a way to handle this. The topicmodels package lets us apply an LDA model out of sample. That is, once we have a model, we can ask how probable it is that some new article, one that wasn’t used for generating the model, falls in one topic or another. For each of these ten topics, I went back through all 32,183 articles and asked how probable it is that each one is in one of these subtopics or the other. I then multiplied that probability by the probability that the article was in the original topic to get the probability that it landed in a subtopic. And those probabilities are what went into the weighted-sum graphs in the last chapter.
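In outline, the calculation looks like this. It is a sketch rather than the exact code: `binary_fit`, `dtm_all`, and `gamma_main` are hypothetical names for one two-topic model, the document-term matrix for all 32,183 articles, and the article-by-topic matrix from the original ninety-topic model, with topic 24 standing in as the example parent topic.

```r
library(topicmodels)

# Apply the two-topic model out of sample: P(subtopic | article) for every
# article, including those not used to fit the binary sort.
out <- posterior(binary_fit, newdata = dtm_all)$topics

# Multiply by the probability of being in the parent topic (topic 24 here,
# assuming the columns of gamma_main are indexed by topic number) to get
# the probability of landing in each subtopic.
p_first  <- out[, 1] * gamma_main[, 24]
p_second <- out[, 2] * gamma_main[, 24]
```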