8.5 Neighbors

There is another way that we can measure distance between topics. This is the way that I was measuring distance in the topic summaries back in chapter 2. For a pair of topics \(\langle x, y\rangle\), look at the articles that are more likely to be in topic \(x\) than in any other topic, and find the average probability that these articles are in \(y\). Unlike correlations, this is an asymmetric measure: the value for \(\langle x, y\rangle\) need not equal the value for \(\langle y, x\rangle\). But it tells us something useful about the connections between the topics. I'll start by looking at the pairs with the highest average probability.
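Before the table, here is a minimal sketch of how this measure could be computed. It assumes the fitted model's per-article topic probabilities are available as a matrix (one row per article, one column per topic); the name `gamma`, the function name, and the rule that ties go to the lower-numbered topic are my own illustrative choices, not a description of the code actually used for this book.

```python
from collections import defaultdict

def cross_topic_probability(gamma):
    """gamma[a][t] is the model's probability that article a is in topic t.

    Returns a dict mapping (x, y) to the average of gamma[a][y] over the
    articles whose most likely topic is x. The measure is asymmetric:
    (x, y) and (y, x) are computed from different sets of articles.
    """
    n_topics = len(gamma[0])
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in gamma:
        # Assign the article to its most probable topic (first one on ties).
        x = max(range(n_topics), key=lambda t: row[t])
        counts[x] += 1
        for y in range(n_topics):
            if y != x:
                totals[(x, y)] += row[y]
    # Only topics that are the top topic for at least one article appear as x.
    return {pair: total / counts[pair[0]] for pair, total in totals.items()}
```

Topics that are never any article's most probable topic simply get no entries as a first element, which avoids dividing by zero.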

Table 8.15: Highest cross-topic probability.

| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Emotions | Ordinary language | 0.0819 |
| Functions | Evolutionary biology | 0.0790 |
| Meaning and use | Ordinary language | 0.0783 |
| Depiction | Ordinary language | 0.0778 |
| Knowledge | Justification | 0.0734 |
| Theory testing | Chance | 0.0731 |
| Promises and imperatives | Ordinary language | 0.0706 |
| Knowledge | Ordinary language | 0.0663 |
| Reasons | Ordinary language | 0.0662 |
| Virtues | Ordinary language | 0.0653 |
| Ontological argument | Arguments | 0.0650 |
| Speech acts | Ordinary language | 0.0647 |
| Moral conscience | Ordinary language | 0.0645 |
| Life and value | Idealism | 0.0630 |
| Vagueness | Truth | 0.0620 |
| Freedom and free will | Ordinary language | 0.0618 |
| Intention | Ordinary language | 0.0609 |
| Value | Ordinary language | 0.0609 |
| Perception | Ordinary language | 0.0607 |
| Models | Causation | 0.0607 |
| Justification | Knowledge | 0.0603 |
| Ontological argument | Faith and theism | 0.0603 |
| Beauty | Ordinary language | 0.0597 |
| Norms | Ordinary language | 0.0593 |
| Virtues | Moral conscience | 0.0588 |

That’s not as helpful as I’d hoped. Many topics contain articles that look a lot like ordinary language philosophy. I’ll deal with this by simple brute force: I’ll filter out the ordinary language philosophy topic and rerun the table.
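One plausible reading of "filter out and rerun" is to drop every pair that involves the excluded topic and then re-rank what remains; the values themselves are unchanged, which matches how the numbers carry over between the two tables. The sketch below assumes the averages are stored in a dict keyed by (first topic, second topic) pairs; the function name and that data layout are hypothetical.

```python
def top_pairs(avg, k, exclude=()):
    """Return the k highest (x, y, value) triples from avg, a dict mapping
    (x, y) pairs to average probabilities, skipping any pair in which
    either topic is listed in `exclude`."""
    excluded = set(exclude)
    kept = [(x, y, v) for (x, y), v in avg.items()
            if x not in excluded and y not in excluded]
    # Highest average probability first.
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept[:k]

# Usage with three rows copied from Table 8.15.
avg = {
    ("Emotions", "Ordinary language"): 0.0819,
    ("Functions", "Evolutionary biology"): 0.0790,
    ("Knowledge", "Justification"): 0.0734,
}
print(top_pairs(avg, 2, exclude=["Ordinary language"]))
# [('Functions', 'Evolutionary biology', 0.079), ('Knowledge', 'Justification', 0.0734)]
```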

Table 8.16: Highest cross-topic probability (excluding ordinary language).

| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Functions | Evolutionary biology | 0.0790 |
| Knowledge | Justification | 0.0734 |
| Theory testing | Chance | 0.0731 |
| Ontological argument | Arguments | 0.0650 |
| Life and value | Idealism | 0.0630 |
| Vagueness | Truth | 0.0620 |
| Models | Causation | 0.0607 |
| Justification | Knowledge | 0.0603 |
| Ontological argument | Faith and theism | 0.0603 |
| Virtues | Moral conscience | 0.0588 |
| Theories and realism | Methodology of science | 0.0586 |
| Cognitive science | Mechanisms | 0.0558 |
| Propositions and implications | Deduction | 0.0551 |
| Color/colour | Perception | 0.0549 |
| Kant | Idealism | 0.0548 |
| Belief ascriptions | Sense and reference | 0.0543 |
| Chance | Theory testing | 0.0530 |
| Psychology | Idealism | 0.0520 |
| Intention | Promises and imperatives | 0.0516 |
| History and culture | Life and value | 0.0515 |
| History and culture | Other history | 0.0504 |
| Self-consciousness | Idealism | 0.0500 |
| Frankfurt cases | Arguments | 0.0499 |
| War | Liberal democracy | 0.0495 |
| Marx | Life and value | 0.0493 |

That is a little more interesting, and a little more sensible, but a couple of things jump out.

One is that a number of pairs here don’t appear on the correlations table. It makes sense that Kant and idealism go together, but the correlation table didn’t show that. So maybe this is a better measure of proximity. It’s at least an interestingly different one.

The other surprise is that so few pairs are on this list in both directions. Possibly these two surprises are related. Knowledge and justification are there in both directions, as are theory testing and chance, and I think that’s it. In some cases it’s easy to see why. The modeling articles are often about causal modeling, so they feel like causation articles. But lots of causation articles, especially pre-Lewis, don’t feel like causal modeling articles, and hence don’t feel like modeling articles. I would have guessed that pairs like that would be the outliers; instead they seem to be the usual case.
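Checking which pairs show up in both directions is easy to automate. This is a small illustrative sketch, assuming the ranked list is held as (first topic, second topic, value) triples; the function name is hypothetical. Applied to the rows of Table 8.16, it would pick out knowledge and justification, and also theory testing and chance (0.0731 one way, 0.0530 the other).

```python
def mutual_pairs(ranked):
    """Given ranked (x, y, value) triples, return the unordered pairs of
    topics that appear in both directions, as sorted tuples."""
    directed = {(x, y) for x, y, _ in ranked}
    # Keep a pair only if its reverse is also in the list; normalise the
    # order so each mutual pair is reported once.
    return sorted({tuple(sorted((x, y))) for x, y in directed if (y, x) in directed})

# Usage with a few rows copied from Table 8.16.
rows = [
    ("Knowledge", "Justification", 0.0734),
    ("Theory testing", "Chance", 0.0731),
    ("Models", "Causation", 0.0607),
    ("Justification", "Knowledge", 0.0603),
    ("Chance", "Theory testing", 0.0530),
]
print(mutual_pairs(rows))
# [('Chance', 'Theory testing'), ('Justification', 'Knowledge')]
```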

Next let’s look at the other end of the ranking: the pairs with the lowest average probability.

Table 8.17: Lowest cross-topic probability.

| Subject One | Subject Two | Average Probability |
|---|---|---:|
| Space and time | Emotions | 0.0001 |
| Psychology | Formal epistemology | 0.0001 |
| Quantum physics | Medical ethics and Freud | 0.0001 |
| Life and value | Formal epistemology | 0.0001 |
| Perception | Crime and punishment | 0.0001 |
| Chemistry | Liberal democracy | 0.0001 |
| Time | Liberal democracy | 0.0001 |
| Models | Faith and theism | 0.0001 |
| Races and DNA | Ancient | 0.0001 |
| Models | Moral conscience | 0.0001 |
| Color/colour | Crime and punishment | 0.0001 |
| Game theory | Early modern | 0.0001 |
| Population ethics | Psychology | 0.0001 |
| Models | Emotions | 0.0001 |
| Quantum physics | Emotions | 0.0001 |
| Vagueness | Psychology | 0.0001 |
| Wide content | Crime and punishment | 0.0001 |
| Hume | Space and time | 0.0002 |
| Duties | Mathematics | 0.0002 |
| Kant | Medical ethics and Freud | 0.0002 |
| Liberal democracy | Space and time | 0.0002 |
| Vagueness | Races and DNA | 0.0002 |
| Truth | Liberal democracy | 0.0002 |
| Minds and machines | Crime and punishment | 0.0002 |
| Color/colour | Duties | 0.0002 |

And this is why I’ve used this measure as my preferred distance measure. Those all look like topics that have nothing to do with each other. And they don’t!

There is a relatively technical point that’s worth emphasizing here. The model gives a nonzero probability to each article being in each topic. But it pretty clearly doesn’t calculate each of those probabilities particularly carefully. If you look at the probability distribution for any article, there are some carefully calculated probabilities for anywhere from one to twenty topics (usually five to eight, at least by my impression). Then all the other topics get the very same probability. What that shared probability is seems, as far as I can tell, to be a function of how confident the model is in its assignment. But it’s always some very low number.
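The pattern described here can be checked article by article. The sketch below treats the smallest probability in an article's row as the shared residual value and lists the topics that rise above it. That is a heuristic assumption on my part: it works when most topics sit at the same floor, as described above, and the tolerance `tol` and the function name are illustrative, not part of the model's output.

```python
def split_residual(row, tol=1e-12):
    """Split one article's topic probabilities into the shared residual
    value (taken to be the minimum) and the indices of the topics whose
    probability is above it, i.e. the 'carefully calculated' ones."""
    floor = min(row)
    above = [t for t, p in enumerate(row) if p > floor + tol]
    return floor, above

# Usage: two calculated topics, four topics at the same residual value.
floor, above = split_residual([0.6, 0.3, 0.025, 0.025, 0.025, 0.025])
print(floor, above)  # 0.025 [0, 1]
```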

What we’re seeing here is that for a bunch of pairs of topics, every one of the articles that is naturally in the first topic gets one of these residual probabilities for the second topic. For example, every article in psychology gets only this residual probability of being in formal epistemology.
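Here is a minimal sketch of that test for a single pair, again assuming the per-article probabilities are rows of a matrix and taking each row's minimum as its residual value. The function name and the tolerance are hypothetical; since each article can have a different residual value, the comparison is made row by row rather than against one global number.

```python
def all_at_floor(gamma, x, y, tol=1e-12):
    """True if every article whose most likely topic is x gives topic y
    only the residual (minimum) probability in its own row. Returns False
    when no article has x as its most likely topic."""
    members = [row for row in gamma
               if max(range(len(row)), key=row.__getitem__) == x]
    return bool(members) and all(row[y] <= min(row) + tol for row in members)

# Usage: articles 0 and 1 are in topic 0; article 2 is in topic 2.
gamma = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.3, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
]
print(all_at_floor(gamma, 0, 3))  # True: topic 3 is residual for both articles
print(all_at_floor(gamma, 0, 1))  # False: article 1 gives topic 1 a real 0.3
```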

That means we really shouldn’t care about the order of this table. This is a list of pairs of topics that the model thinks have basically nothing in common. Apart from being a little surprised that Kant is paired this way with medical ethics and Freud, I can’t see much to complain about here. And note that even in that case, there are some medical ethics articles that the model thinks are a bit about Kant; it just doesn’t think that any Kant articles are even partly about medical ethics. And that seems perfectly sensible.