8.5 Neighbors

There is another way that we can measure distance between topics. It is the way I measured distance in the topic summaries back in chapter 2. For a pair of topics $\langle x, y\rangle$, look at the articles that are more likely to be in topic $x$ than in any other topic, and find the average probability that these articles are in $y$. Unlike correlation, this is an asymmetric measure: the value for $\langle x, y\rangle$ need not equal the value for $\langle y, x\rangle$. But it tells us something useful about the connections between the topics. I'll start by looking at the top of the resulting table.
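Written as a formula, with $A_x$ the set of articles whose most probable topic is $x$, the measure is

$$d(x, y) = \frac{1}{|A_x|} \sum_{a \in A_x} P(y \mid a).$$

A minimal sketch of that computation, assuming the model's output is a document-topic matrix with one row of topic probabilities per article. The function name `cross_topic_probability` and the toy numbers are hypothetical illustrations, not the code or data behind the tables in this section.

```python
def cross_topic_probability(doc_topic, x, y):
    """Average probability of topic y among articles whose most likely topic is x.

    doc_topic: list of rows, one per article, each a list of topic
    probabilities (hypothetical input format).
    """
    # Articles "in" topic x: those whose highest-probability topic is x.
    # On exact ties, max() picks the lowest-numbered topic.
    in_x = [row for row in doc_topic
            if max(range(len(row)), key=row.__getitem__) == x]
    if not in_x:
        return float("nan")  # no article has x as its top topic
    return sum(row[y] for row in in_x) / len(in_x)


# Made-up toy data: three articles, three topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
    [0.1, 0.8, 0.1],
]
print(round(cross_topic_probability(doc_topic, 0, 1), 4))  # 0.15
print(round(cross_topic_probability(doc_topic, 1, 0), 4))  # 0.1
```

The two calls return different values for the same pair of topics, which is the asymmetry described above.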

Table 8.23: Highest cross-topic probability.
| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Emotions | Ordinary language | 0.0819 |
| Functions | Evolutionary biology | 0.0790 |
| Meaning and use | Ordinary language | 0.0783 |
| Depiction | Ordinary language | 0.0778 |
| Knowledge | Justification | 0.0734 |
| Theory testing | Chance | 0.0731 |
| Promises and imperatives | Ordinary language | 0.0706 |
| Knowledge | Ordinary language | 0.0663 |
| Reasons | Ordinary language | 0.0662 |
| Virtues | Ordinary language | 0.0653 |
| Ontological argument | Arguments | 0.0650 |
| Speech acts | Ordinary language | 0.0647 |
| Moral conscience | Ordinary language | 0.0645 |
| Life and value | Idealism | 0.0630 |
| Vagueness | Truth | 0.0620 |
| Freedom and free will | Ordinary language | 0.0618 |
| Intention | Ordinary language | 0.0609 |
| Value | Ordinary language | 0.0609 |
| Perception | Ordinary language | 0.0607 |
| Models | Causation | 0.0607 |
| Justification | Knowledge | 0.0603 |
| Ontological argument | Faith and theism | 0.0603 |
| Beauty | Ordinary language | 0.0597 |
| Norms | Ordinary language | 0.0593 |
| Virtues | Moral conscience | 0.0588 |

That’s not as helpful as I’d hoped. For lots of topics, the articles in them look a lot like ordinary language philosophy. I’ll deal with this by simple brute force: I’ll filter out the ordinary language philosophy topic and rerun the table.
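The filtering step can be sketched as ranking every ordered pair of topics by the measure while skipping a named topic in either position. This is a minimal illustration assuming the same kind of document-topic matrix as above; `top_pairs`, its `exclude` parameter, and the toy data are hypothetical.

```python
def top_pairs(doc_topic, names, n, exclude=()):
    """Rank ordered topic pairs (x, y), x != y, by the average probability of y
    among articles whose top topic is x, skipping topics named in `exclude`."""
    k = len(names)
    # Assign each article to its highest-probability topic.
    top = [max(range(k), key=row.__getitem__) for row in doc_topic]
    scores = []
    for x in range(k):
        rows = [row for row, t in zip(doc_topic, top) if t == x]
        if not rows or names[x] in exclude:
            continue
        for y in range(k):
            if y == x or names[y] in exclude:
                continue
            scores.append((names[x], names[y], sum(r[y] for r in rows) / len(rows)))
    # Highest average probability first; the sort is stable, so ties keep
    # their original order.
    scores.sort(key=lambda s: s[2], reverse=True)
    return scores[:n]


# Made-up toy data in which one topic soaks up probability everywhere.
names = ["Knowledge", "Justification", "Ordinary language"]
doc_topic = [
    [0.6, 0.1, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
]
print([p[:2] for p in top_pairs(doc_topic, names, 2)])
# [('Knowledge', 'Ordinary language'), ('Justification', 'Ordinary language')]
print([p[:2] for p in top_pairs(doc_topic, names, 2, exclude={"Ordinary language"})])
# [('Justification', 'Knowledge'), ('Knowledge', 'Justification')]
```

Dropping the topic from both positions means its articles no longer contribute rows of their own, and no other topic can be paired with it.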

Table 8.25: Highest cross-topic probability (excluding ordinary language).
| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Functions | Evolutionary biology | 0.0790 |
| Knowledge | Justification | 0.0734 |
| Theory testing | Chance | 0.0731 |
| Ontological argument | Arguments | 0.0650 |
| Life and value | Idealism | 0.0630 |
| Vagueness | Truth | 0.0620 |
| Models | Causation | 0.0607 |
| Justification | Knowledge | 0.0603 |
| Ontological argument | Faith and theism | 0.0603 |
| Virtues | Moral conscience | 0.0588 |
| Theories and realism | Methodology of science | 0.0586 |
| Cognitive science | Mechanisms | 0.0558 |
| Propositions and implications | Deduction | 0.0551 |
| Color/colour | Perception | 0.0549 |
| Kant | Idealism | 0.0548 |
| Belief ascriptions | Sense and reference | 0.0543 |
| Chance | Theory testing | 0.0530 |
| Psychology | Idealism | 0.0520 |
| Intention | Promises and imperatives | 0.0516 |
| History and culture | Life and value | 0.0515 |
| History and culture | Other history | 0.0504 |
| Self-consciousness | Idealism | 0.0500 |
| Frankfurt cases | Arguments | 0.0499 |
| War | Liberal democracy | 0.0495 |
| Marx | Life and value | 0.0493 |

That is a little more interesting, and a little more sensible, but a couple of things jump out.

One is that there are a bunch of things here that don’t appear on the correlations table. It makes sense that Kant and idealism go together, but the correlations table didn’t show that. So maybe this is a better measure of proximity. It’s at least an interestingly different measure.

But the other surprise is that so few pairs are on this list in both directions. Possibly these two surprises are related. Knowledge and justification are there in both directions, as are theory testing and chance, and I think that’s it. In some cases it’s easy to see why. The modeling articles are often about causal modeling, so they feel like causation articles. But lots of causation articles, especially pre-Lewis ones, don’t feel like causal modeling articles, and hence don’t feel like modeling. I would have guessed that pairs like that would be the outliers; instead they seem to be the usual case.
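Checking which pairs show up in both directions is mechanical, so here is a small sketch that runs over the Subject One and Subject Two columns of the table with ordinary language excluded. The helper name `reciprocal_pairs` is hypothetical; the pairs are copied from that table.

```python
def reciprocal_pairs(pairs):
    """Return each pair that appears in both directions in `pairs`,
    a list of (subject_one, subject_two) tuples."""
    seen = set(pairs)
    found = []
    for a, b in pairs:
        # Report each two-way pair once, in its first (higher-ranked) direction.
        if (b, a) in seen and (b, a) not in found:
            found.append((a, b))
    return found


# Rows of the cross-topic table excluding ordinary language, in rank order.
table_pairs = [
    ("Functions", "Evolutionary biology"), ("Knowledge", "Justification"),
    ("Theory testing", "Chance"), ("Ontological argument", "Arguments"),
    ("Life and value", "Idealism"), ("Vagueness", "Truth"),
    ("Models", "Causation"), ("Justification", "Knowledge"),
    ("Ontological argument", "Faith and theism"), ("Virtues", "Moral conscience"),
    ("Theories and realism", "Methodology of science"),
    ("Cognitive science", "Mechanisms"),
    ("Propositions and implications", "Deduction"),
    ("Color/colour", "Perception"), ("Kant", "Idealism"),
    ("Belief ascriptions", "Sense and reference"), ("Chance", "Theory testing"),
    ("Psychology", "Idealism"), ("Intention", "Promises and imperatives"),
    ("History and culture", "Life and value"),
    ("History and culture", "Other history"), ("Self-consciousness", "Idealism"),
    ("Frankfurt cases", "Arguments"), ("War", "Liberal democracy"),
    ("Marx", "Life and value"),
]
print(reciprocal_pairs(table_pairs))
# [('Knowledge', 'Justification'), ('Theory testing', 'Chance')]
```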

Next let’s look at the lower end of this table.

Table 8.27: Lowest cross-topic probability.
| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Space and time | Emotions | 0.0001 |
| Psychology | Formal epistemology | 0.0001 |
| Quantum physics | Medical ethics and Freud | 0.0001 |
| Life and value | Formal epistemology | 0.0001 |
| Perception | Crime and punishment | 0.0001 |
| Chemistry | Liberal democracy | 0.0001 |
| Time | Liberal democracy | 0.0001 |
| Models | Faith and theism | 0.0001 |
| Races and DNA | Ancient | 0.0001 |
| Models | Moral conscience | 0.0001 |
| Color/colour | Crime and punishment | 0.0001 |
| Game theory | Early modern | 0.0001 |
| Population ethics | Psychology | 0.0001 |
| Models | Emotions | 0.0001 |
| Quantum physics | Emotions | 0.0001 |
| Vagueness | Psychology | 0.0001 |
| Wide content | Crime and punishment | 0.0001 |
| Hume | Space and time | 0.0002 |
| Duties | Mathematics | 0.0002 |
| Kant | Medical ethics and Freud | 0.0002 |
| Liberal democracy | Space and time | 0.0002 |
| Vagueness | Races and DNA | 0.0002 |
| Truth | Liberal democracy | 0.0002 |
| Minds and machines | Crime and punishment | 0.0002 |
| Color/colour | Duties | 0.0002 |

And this is why I’ve used this measure as my preferred distance measure. Those all look like topics that have nothing to do with each other. And they don’t!

There is a relatively technical point that’s worth emphasizing here. The model gives a nonzero probability to each article being in each topic. But it pretty clearly doesn’t calculate each of those probabilities particularly carefully. If you look at the probability distribution for any article, there are carefully calculated probabilities for anywhere from one to twenty topics. (Usually five to eight, at least by my impression.) Then all the other topics get the very same probability. What that shared probability is seems, as far as I can tell, to be a function of how confident the model is in its assignment. But it’s always some very, very low number.
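One way to see this pattern in a single article's distribution is to look for a minimum value shared by several topics. The sketch below uses that heuristic, which is an assumption made for illustration rather than anything the model reports; the function `split_residual`, its tolerance, and the example row are hypothetical.

```python
def split_residual(row, tol=1e-9):
    """Split one article's topic distribution into a shared residual value
    and the indices of the topics that get more than that residual.

    Heuristic (an assumption): the residual is the row minimum, provided at
    least two topics share it to within `tol`.
    """
    floor = min(row)
    at_floor = [i for i, p in enumerate(row) if p - floor <= tol]
    if len(at_floor) < 2:
        # No shared floor, so treat every topic as carefully calculated.
        return None, list(range(len(row)))
    fitted = [i for i, p in enumerate(row) if p - floor > tol]
    return floor, fitted


# Made-up article: three fitted topics, five topics at the same residual value.
row = [0.52, 0.30, 0.1795, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001]
floor, fitted = split_residual(row)
print(floor, fitted)  # 0.0001 [0, 1, 2]
```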

What we’re seeing here is that for a bunch of pairs of topics, every one of the articles that is naturally in the first topic gets one of these residual probabilities for the second topic. For example, for every article in psychology, the probability that it is in formal epistemology is just this residual value.
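The claim that every article in one topic gets only the residual for another can be tested directly. A minimal sketch, assuming (as a simplification) that an article's residual is its row minimum; `all_residual` and the toy rows are hypothetical, with topic 0 standing in for psychology and topic 1 for formal epistemology.

```python
def all_residual(doc_topic, x, y, tol=1e-9):
    """True if every article whose top topic is x gives topic y only its
    residual probability (assumed here to be the row minimum)."""
    in_x = [row for row in doc_topic
            if max(range(len(row)), key=row.__getitem__) == x]
    # An empty topic is reported as False rather than vacuously True.
    return bool(in_x) and all(row[y] - min(row) <= tol for row in in_x)


# Made-up rows; topic 0 plays psychology, topic 1 formal epistemology.
doc_topic = [
    [0.70, 0.01, 0.28, 0.01],
    [0.60, 0.02, 0.02, 0.36],
    [0.15, 0.80, 0.025, 0.025],
]
print(all_residual(doc_topic, 0, 1))  # True
print(all_residual(doc_topic, 1, 0))  # False
```

As with the measure itself, the check is asymmetric: the answer for $\langle x, y\rangle$ says nothing about $\langle y, x\rangle$.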

That means we really shouldn’t care about the order of this table. This is a list of topics that the model thinks have basically nothing in common. And apart from being a little surprised about Kant being paired up that way with medical ethics and Freud, I can’t see much to complain about here. Note that even in that case, there are some medical ethics articles that the model thinks are a bit about Kant; it just doesn’t think any Kant articles might be about medical ethics. And that seems perfectly sensible.