8.5 Neighbors
There is another way that we can measure distance between articles. This is the way that I was measuring distance in the topic summaries back in chapter 2. For a pair of topics \(\langle x, y\rangle\), look at the articles that are more likely to be in topic \(x\) than in any other topic, and find the average probability that these articles are in \(y\). Unlike correlations, this is an asymmetric measure. But it tells us something useful about the connections between the topics. I’ll start by looking at the top of the table of all such pairs, sorted by that average probability.
Subject One | Subject Two | Average Probability |
---|---|---|
Emotions | Ordinary language | 0.0819 |
Functions | Evolutionary biology | 0.0790 |
Meaning and use | Ordinary language | 0.0783 |
Depiction | Ordinary language | 0.0778 |
Knowledge | Justification | 0.0734 |
Theory testing | Chance | 0.0731 |
Promises and imperatives | Ordinary language | 0.0706 |
Knowledge | Ordinary language | 0.0663 |
Reasons | Ordinary language | 0.0662 |
Virtues | Ordinary language | 0.0653 |
Ontological argument | Arguments | 0.0650 |
Speech acts | Ordinary language | 0.0647 |
Moral conscience | Ordinary language | 0.0645 |
Life and value | Idealism | 0.0630 |
Vagueness | Truth | 0.0620 |
Freedom and free will | Ordinary language | 0.0618 |
Intention | Ordinary language | 0.0609 |
Value | Ordinary language | 0.0609 |
Perception | Ordinary language | 0.0607 |
Models | Causation | 0.0607 |
Justification | Knowledge | 0.0603 |
Ontological argument | Faith and theism | 0.0603 |
Beauty | Ordinary language | 0.0597 |
Norms | Ordinary language | 0.0593 |
Virtues | Moral conscience | 0.0588 |
That’s not as helpful as I’d hoped. Lots of topics are such that articles in them look a lot like ordinary language philosophy. I’ll deal with this by simple brute force; I’ll filter out the ordinary language philosophy topic, and rerun the table.
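For concreteness, here is a minimal Python sketch of the measure and of the brute-force filter. It is not the code that produced these tables. The function name `neighbor_table`, the `exclude` parameter, the input layout (one dictionary of topic probabilities per article), and the toy numbers are all assumptions made for illustration. The sketch also assumes that filtering drops the topic from both columns.

```python
from collections import defaultdict


def neighbor_table(doc_topics, exclude=()):
    """Average probability of topic y among articles whose top topic is x.

    doc_topics: list of dicts, each mapping topic name -> probability
    for one article (hypothetical input format).
    Returns (x, y, average) rows, largest average first.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for probs in doc_topics:
        # The article's "home" topic is the one it is most likely in.
        home = max(probs, key=probs.get)
        counts[home] += 1
        for topic, p in probs.items():
            if topic != home:
                sums[home][topic] += p
    rows = [
        (x, y, total / counts[x])
        for x, by_topic in sums.items()
        for y, total in by_topic.items()
        if x not in exclude and y not in exclude
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy data, invented purely for illustration.
articles = [
    {"Knowledge": 0.70, "Justification": 0.20, "Ordinary language": 0.10},
    {"Knowledge": 0.60, "Justification": 0.10, "Ordinary language": 0.30},
    {"Justification": 0.80, "Knowledge": 0.12, "Ordinary language": 0.08},
]

full = neighbor_table(articles)
filtered = neighbor_table(articles, exclude={"Ordinary language"})

for x, y, avg in filtered:
    print(f"{x} | {y} | {avg:.4f}")
```

On the toy data, the average for Knowledge then Justification is about 0.15, while the average for Justification then Knowledge is 0.12. That difference is the asymmetry of the measure.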
Subject One | Subject Two | Average Probability |
---|---|---|
Functions | Evolutionary biology | 0.0790 |
Knowledge | Justification | 0.0734 |
Theory testing | Chance | 0.0731 |
Ontological argument | Arguments | 0.0650 |
Life and value | Idealism | 0.0630 |
Vagueness | Truth | 0.0620 |
Models | Causation | 0.0607 |
Justification | Knowledge | 0.0603 |
Ontological argument | Faith and theism | 0.0603 |
Virtues | Moral conscience | 0.0588 |
Theories and realism | Methodology of science | 0.0586 |
Cognitive science | Mechanisms | 0.0558 |
Propositions and implications | Deduction | 0.0551 |
Color/colour | Perception | 0.0549 |
Kant | Idealism | 0.0548 |
Belief ascriptions | Sense and reference | 0.0543 |
Chance | Theory testing | 0.0530 |
Psychology | Idealism | 0.0520 |
Intention | Promises and imperatives | 0.0516 |
History and culture | Life and value | 0.0515 |
History and culture | Other history | 0.0504 |
Self-consciousness | Idealism | 0.0500 |
Frankfurt cases | Arguments | 0.0499 |
War | Liberal democracy | 0.0495 |
Marx | Life and value | 0.0493 |
That is a little more interesting, and a little more sensible, but a couple of things jump out.
One is that there are a bunch of things here that don’t appear on the correlations table. It makes sense that Kant and idealism go together, but the correlation table didn’t show that up. So maybe this is a better measure of proximity. It’s at least an interestingly different measure.
But the other surprise is that there are so few pairs that are on this list in both directions. Possibly these two surprises are related. Knowledge and justification are there in both directions, as are theory testing and chance, and I think that’s it. In some cases it’s easy to see why. The modeling articles are often about causal modeling, so they feel like causation articles. But lots of causation articles, especially pre-Lewis, don’t feel like causal modeling articles, and hence don’t feel like modeling. I would have guessed that pairs like that would be the outliers; instead they seem to be the usual case.
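Checking for pairs that appear in both directions is easy to mechanize. The following is a small Python sketch, with a hypothetical helper name `reciprocal_pairs`, run over the rows of the filtered table exactly as printed above.

```python
def reciprocal_pairs(rows):
    """Return the unordered topic pairs that appear in both directions.

    rows: iterable of (subject_one, subject_two, average) tuples.
    """
    seen = {(x, y) for x, y, _ in rows}
    return {frozenset((x, y)) for x, y in seen if (y, x) in seen}


# Rows copied from the filtered table above.
rows = [
    ("Functions", "Evolutionary biology", 0.0790),
    ("Knowledge", "Justification", 0.0734),
    ("Theory testing", "Chance", 0.0731),
    ("Ontological argument", "Arguments", 0.0650),
    ("Life and value", "Idealism", 0.0630),
    ("Vagueness", "Truth", 0.0620),
    ("Models", "Causation", 0.0607),
    ("Justification", "Knowledge", 0.0603),
    ("Ontological argument", "Faith and theism", 0.0603),
    ("Virtues", "Moral conscience", 0.0588),
    ("Theories and realism", "Methodology of science", 0.0586),
    ("Cognitive science", "Mechanisms", 0.0558),
    ("Propositions and implications", "Deduction", 0.0551),
    ("Color/colour", "Perception", 0.0549),
    ("Kant", "Idealism", 0.0548),
    ("Belief ascriptions", "Sense and reference", 0.0543),
    ("Chance", "Theory testing", 0.0530),
    ("Psychology", "Idealism", 0.0520),
    ("Intention", "Promises and imperatives", 0.0516),
    ("History and culture", "Life and value", 0.0515),
    ("History and culture", "Other history", 0.0504),
    ("Self-consciousness", "Idealism", 0.0500),
    ("Frankfurt cases", "Arguments", 0.0499),
    ("War", "Liberal democracy", 0.0495),
    ("Marx", "Life and value", 0.0493),
]

both_ways = reciprocal_pairs(rows)
for pair in sorted(sorted(p) for p in both_ways):
    print(" / ".join(pair))
```

On these twenty-five rows, the check finds two reciprocal pairs: knowledge with justification, and theory testing with chance.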
Next let’s look at the lower end of this table.
Subject One | Subject Two | Average Probability |
---|---|---|
Space and time | Emotions | 0.0001 |
Psychology | Formal epistemology | 0.0001 |
Quantum physics | Medical ethics and Freud | 0.0001 |
Life and value | Formal epistemology | 0.0001 |
Perception | Crime and punishment | 0.0001 |
Chemistry | Liberal democracy | 0.0001 |
Time | Liberal democracy | 0.0001 |
Models | Faith and theism | 0.0001 |
Races and DNA | Ancient | 0.0001 |
Models | Moral conscience | 0.0001 |
Color/colour | Crime and punishment | 0.0001 |
Game theory | Early modern | 0.0001 |
Population ethics | Psychology | 0.0001 |
Models | Emotions | 0.0001 |
Quantum physics | Emotions | 0.0001 |
Vagueness | Psychology | 0.0001 |
Wide content | Crime and punishment | 0.0001 |
Hume | Space and time | 0.0002 |
Duties | Mathematics | 0.0002 |
Kant | Medical ethics and Freud | 0.0002 |
Liberal democracy | Space and time | 0.0002 |
Vagueness | Races and DNA | 0.0002 |
Truth | Liberal democracy | 0.0002 |
Minds and machines | Crime and punishment | 0.0002 |
Color/colour | Duties | 0.0002 |
And this is why I’ve used this measure as my preferred distance measure. Those all look like topics that have nothing to do with each other. And they don’t!
There is a relatively technical point that’s worth emphasizing here. The model gives a nonzero probability to each article being in each topic. But it pretty clearly doesn’t calculate each of those probabilities particularly carefully. If you look at the probability distribution for any article, there are some carefully calculated probabilities for anywhere from one to twenty topics. (Usually five to eight, at least by my impression.) Then all the other topics get the very same probability. What that same probability is seems, as far as I can tell, to be a function of how confident the model is in its assignment. But it’s just some very very low number.
What we’re seeing here is that for a bunch of pairs of topics, every one of the articles that is naturally in the first topic gets one of these residual probabilities for the second topic. For example, for every article in psychology, the probability that it is in formal epistemology is minimal.
That means we really shouldn’t care about the order of this table. This is a list of topics that the model thinks have basically nothing in common. And apart from being a little surprised about Kant being paired up that way with medical ethics and Freud, I can’t see much to complain about here. And note that even in that case, there are some medical ethics articles that the model thinks are a bit about Kant; it just thinks that no Kant articles are even partly about medical ethics. And that seems perfectly sensible.
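The point about residual probabilities can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the helper name `all_residual`, the input layout, and the invented probabilities. It treats an article's smallest probability as its residual value, and asks whether every article that is naturally in the first topic gives the second topic exactly that value.

```python
def all_residual(doc_topics, x, y, tol=1e-12):
    """True if every article whose top topic is x gives topic y
    that article's smallest (residual) probability.

    doc_topics: list of dicts mapping topic name -> probability
    (hypothetical input format).
    """
    home = [p for p in doc_topics if max(p, key=p.get) == x]
    return bool(home) and all(
        abs(p[y] - min(p.values())) <= tol for p in home
    )


# Invented distributions. Note that the residual value differs
# from one article to the next.
articles = [
    {"Psychology": 0.7, "Perception": 0.2998,
     "Formal epistemology": 0.0001, "Kant": 0.0001},
    {"Psychology": 0.9, "Perception": 0.0996,
     "Formal epistemology": 0.0002, "Kant": 0.0002},
    {"Formal epistemology": 0.8, "Psychology": 0.1998,
     "Perception": 0.0001, "Kant": 0.0001},
]

psych_to_formal = all_residual(articles, "Psychology", "Formal epistemology")
formal_to_psych = all_residual(articles, "Formal epistemology", "Psychology")
print(psych_to_formal, formal_to_psych)
```

In this toy example the check passes from psychology to formal epistemology but fails in the other direction, which mirrors the asymmetry in the Kant and medical ethics case.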