Here’s another interview (with transcript) at Academic Influence with Erik J. Larson, author of The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do (2021). The book was #2 at Amazon as of 11:00 am EST today in the Natural Language Processing category.
In this interview, Larson talks about how he developed an algorithm, based on Wikipedia, that ranks people by how much influence they have. That was one of the projects that got him thinking about myths of artificial intelligence. The idea came to him while he was reading Hannah Arendt, a philosopher of totalitarianism:
Erik Larson: And she has a whole philosophy of technology that I was reading as background to write The Myth of Artificial Intelligence. So I was actually reading this person that has nothing to do with influence algorithms. And it just popped into my head, I felt like, “Shouldn’t we be able to exploit the fact that Wikipedia publishes on influential people and mentions the topics that those influence… That mentions the primary topics where those influential people have influence?” And so that was the original idea.
0:05:09.8 EL: I mean it’s not like I invented the microchip or something, so let’s be clear that this is not the biggest idea in the world. But it is, it was very exciting how it came about. I ended up stopping doing what I was doing. I went home and I stayed up all night looking at Wikipedia pages and scratching out how we could use that to compute an influence score. And I reached a couple of these points where I gave up and said, “It’s not gonna work. I don’t think the information is available on Wikipedia, we’re gonna have to do something else.” And then by 6:00 AM, I had actually scribbled out the whole algorithm…
0:06:36.6 EL: … the basic idea of the algorithm is, if you take a person page on Wikipedia and you start counting from the very first word of the first sentence, if you start counting until you hit a topic, what they do, that topic, roughly speaking, the number of tokens or words between the beginning of their description and the introduction of that topic will tell you how important that topic is to that person’s bio. So if they mentioned that the person is a gardener in the first sentence, the person is probably an influential gardener, just intuitively. If they mentioned that the person gardens in their spare time and they’re into astrophysics, and gardening shows up in the reference or in the final sort of family life section or something at the end of the article, gardening is probably less important. So that’s the first half of the algorithm: the distance between the beginning and the occurrence of a subject matter will give you a rough idea of how important that subject matter is to the influence, the influence of the author, relative to that subject.
0:07:46.7 EL: The second part is to flip it, and this was really the innovation, to flip it and say, “If you go to that topic and you do the exact same thing and you start counting from the beginning of the description of the topic to the occurrence of the name of the person, that distance measured, combined with the original person page distance measured, in other words, person to topic, and then from that topic to person, as a weighted statistical combination of those two will give you a rough idea of how influential Wikipedia, and therefore, via sort of reasonable extension, the world, thinks that person is to that topic.” And so that’s the idea that I had when I was reading Hannah Arendt … and I don’t know where it came from, it just popped into my head. The idea was like, “There has to be some way of exploiting the information that’s sitting in Wikipedia to compute an influence measure.”
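Larson does not give an exact formula in this excerpt, so what follows is only a minimal Python sketch of the two-way idea he describes, built on several assumptions of our own. It treats pages as plain text and tokens as lowercase words. It also assumes that a mention's weight decays as 1 / (1 + distance) and that the two directions are combined with a single hypothetical weight, `w_person`. The function names, the decay function, and the weighting are illustrative and are not his actual method.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase word tokens; the interview does not specify tokenization.
    return re.findall(r"[a-z0-9']+", text.lower())


def first_mention_distance(page_text: str, phrase: str) -> Optional[int]:
    """Count tokens before the first occurrence of `phrase`; None if it never appears."""
    tokens = tokenize(page_text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return None
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return i
    return None


def distance_score(distance: Optional[int]) -> float:
    # Hypothetical decay: an earlier mention scores closer to 1; no mention scores 0.
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + distance)


def influence_score(person_page: str, person_name: str,
                    topic_page: str, topic_term: str,
                    w_person: float = 0.5) -> float:
    """Weighted combination of the person-to-topic and topic-to-person scores."""
    if not 0.0 <= w_person <= 1.0:
        raise ValueError("w_person must be between 0 and 1")
    # First half: distance from the start of the person page to the topic.
    forward = distance_score(first_mention_distance(person_page, topic_term))
    # Second half ("flip it"): distance from the start of the topic page to the person.
    backward = distance_score(first_mention_distance(topic_page, person_name))
    return w_person * forward + (1.0 - w_person) * backward


if __name__ == "__main__":
    # Toy, made-up page texts for illustration only.
    person_page = "Ada Smith is an English gardener and writer known for cottage gardens."
    topic_page = "Gardening is the practice of growing plants. Notable gardeners include Ada Smith."
    score = influence_score(person_page, "Ada Smith", topic_page, "gardener")
    print(round(score, 4))  # 0.1288
```

Because the score decays with distance, a topic named in the first sentence of a person's page outweighs one that surfaces only in a closing family-life section, which matches Larson's gardener example. A real system would also need to handle synonyms, inflections such as "gardener" versus "gardening", and the choice of weight, none of which this sketch attempts.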
The problem Larson was trying to address, as he goes on to say, is that typical web-based measures of influence focus on the amount of traffic to a page. But that is a very blunt measure. A two-headed kitten may attract a lot of attention without having any influence. As Larson puts it, “The problem is that you can have extremely influential people who are not, by and large, popular in modern or in broad media terms, right? Like you can have somebody that’s an expert in string theory or something, but they’re not sort of… They don’t have a huge Instagram following.” [08:52.00 EL]. No, but they may dominate a field in science that transmits basic ideas about our universe to the public. Sometimes such people, Stephen Hawking for example, are well-known. Often they are not.
The way that Larson arrived at his formula for measuring influence using Wikipedia is an instance of “flash of insight” creativity. Mathematician Gregory Chaitin, best known for Chaitin’s unknowable number, reflects on why that sort of creativity is not computable: “… it might be that you could prove theorems about creativity. But the theory wouldn’t give you a formula, a recipe, for being creative. Because once it does that, then it’s not creative. You see? There’s this paradox.”