I'm in the midst of prepping a talk introducing various digital methods to newcomers, and I figured I'd better tackle the issue of Google Ngram. I prepared the slide below and then tweeted it, but got few responses. I am offering a longer explanation here because, hey, maybe I'm totally wrong that Google Ngram is a no-go. I've been a Google Ngram skeptic for a long time, so I'm always surprised when I see academics using it. (But if you are going to, try this interface instead of Google's.)
While a few of the errors in Google Ngram are well known, such as incorrect metadata or OCR issues, I repeatedly bump up against an error no one seems to be talking about: what Ngram says is there is not always there.
For example, I searched Google Ngram for "cultural feminism" (figure 1). (Actually, I started by searching for the relative rates of "radical feminism" and "cultural feminism", which I think is what is really seductive to scholars: showing relational, diachronic language shifts.)
slide

figure 1 (click to get to the Ngram interface)
I then clicked through to the first date range at the bottom of the page (1980-1987). On the second page of results I saw In A Different Voice and The Handmaid's Tale, both of which I was pretty confident did not contain "cultural feminism".
Just to be sure, I searched for "cultural feminism" using the advanced book search in Google, specifying title = In A Different Voice (figure 3).
The Handmaid's Tale yielded a similar result (figure 5).
figure 5
When I dug further I found that while the edition above was the one displayed, the e-text was from a 2009 edition that includes Harold Bloom's introduction, and it is the introduction that contains "cultural feminism".
I also noticed that variants such as "counter-cultural feminism" appeared. Generally, the further down the results you go, the less reliable they are, as with The Heidi Chronicles script (figure 6).
figure 6
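If you have the plain text of a book to hand, the hand check described above can be roughed out in a few lines. This is only a sketch under my own assumptions: the function name `count_exact_phrase` and the sample sentence are made up for illustration, and it says nothing about how Google itself counts phrases. It counts a phrase only when it stands alone, so "counter-cultural feminism" does not count as a hit for "cultural feminism".

```python
import re


def count_exact_phrase(text: str, phrase: str) -> int:
    """Count standalone occurrences of `phrase` in `text`, ignoring case.

    A match is rejected if it is directly preceded or followed by a
    hyphen or a word character, so "counter-cultural feminism" and
    "cultural feminisms" are not counted for "cultural feminism".
    """
    # Escape each word and allow any whitespace (including line breaks,
    # which are common in OCR text) between the words.
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"(?<![\w-])" + r"\s+".join(words) + r"(?![\w-])"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Invented sample sentence, not taken from any of the books discussed.
sample = "Counter-cultural feminism is not the same as cultural feminism."
print(count_exact_phrase(sample, "cultural feminism"))  # prints 1
```

A check like this only tells you about the text you feed it; if the e-text comes from a different edition than the one displayed, it will happily count a phrase from a later introduction.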
As I teach Women and War, I felt pretty certain I'd have noticed a phrase I've long researched in it (figure 7).
It was originally the metadata errors that caused me to distrust Ngram, as in Linda Alcoff's 1988 article being cited in a work dated 1977 (figure 8).
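The same kind of rough check works for the dating problem: a text that mentions a year later than its own listed publication date deserves a second look. Again, this is a sketch resting on my own assumptions. The function names `latest_year_mentioned` and `looks_misdated` are hypothetical, the year bounds are arbitrary, and the sample strings are invented. A four-digit number is not always a year, so treat a flag as a prompt to go look, not as proof.

```python
import re
from typing import Optional


def latest_year_mentioned(text: str, earliest: int = 1500, latest: int = 2100) -> Optional[int]:
    """Return the largest plausible four-digit year found in `text`, or None."""
    # Match exactly four digits that are not part of a longer number.
    candidates = [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", text)]
    years = [y for y in candidates if earliest <= y <= latest]
    return max(years) if years else None


def looks_misdated(text: str, claimed_year: int) -> bool:
    """Flag a text that mentions a year later than its claimed publication year."""
    latest_found = latest_year_mentioned(text)
    return latest_found is not None and latest_found > claimed_year


# Invented sample strings standing in for a book's text.
print(looks_misdated("See Alcoff (1988) on cultural feminism.", 1977))  # prints True
print(looks_misdated("First drafted in 1975.", 1977))  # prints False
```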
I haven't hand-checked every result (I do when I use Google Books as evidence in an argument); that is the seductive appeal of the Ngram: it searches lots and lots of books. Bottom line: most of the texts do contain "cultural feminism" (I can see it bolded in the snippet of text below the title) and are dated correctly. For a very broad, very loose approximation of change over time, the Ngram might suffice, but I'd be wary of using it for anything precise, and I don't think I'd use it in a scholarly presentation without some pretty heavy disclaimers.
I got a great tweet from Jeff Sonstein pointing to a more sophisticated guide to doing Ngram searches, which is FAB, but with the OCR and metadata issues I don't know that it is enough.
figure 7

figure 8
I'm no expert, but I think the Ngram database might be based on something completely different from the Google Books search algorithm. In a normal Google search, you can find a page which doesn't contain the phrase you searched for if other pages link to it in connection with that phrase. Maybe Google Books is the same?
I'd guess the original paper [1] would have more information?
1: Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (Published online ahead of print: 12/16/2010)