Friday, June 14, 2013

Lessons in Antconc #3 with Heather

Third in a series of transatlantic e-collaboration and learning from the fabulous Heather Froelich (@heatherfro) (earlier lessons here and here).

Bolded for the TL;DR crowd


I’ve already blogged about my search for the entry of “essentialism” into Anglophone feminist discourse using JSTOR’s new beta search. However, since I’m committed to writing an intellectual history of the rise and fall of women’s culture that draws on grassroots feminists as well as feminists inside academia, I need another tool beyond JSTOR.

Off Our Backs (OOB), a grassroots women’s liberation newspaper, appears in the JSTOR database, but the search function reveals only that “essentialism” appears fifteen times, the earliest in 1984. Given the results of my analysis of academic feminist journals, it might appear that academics introduced the idea of essentialism to grassroots feminism. However, I know essentialism was an issue earlier for grassroots feminists, who likely would have regarded the word “essentialism” itself as jargon. So I’m left trying to figure out other ways to get at essentialism discourse in OOB.

Enter AntConc, to help me figure out which of the 1,839 full-text OOB articles from 1977-1981 (a pivotal era in my book manuscript) that I have via JSTOR for education might contain information about essentialism, even if the word itself doesn’t appear.

AntConc can help me by finding collocates, a linguistic term for words that habitually occur near one another in a body of texts. However, as McEnery and Hardie point out in Corpus Linguistics: Method, Theory and Practice, “once we move beyond basic generalities and attempt to pin down collocation operationally or conceptually, we find a great multitude of different definitions.”[1] Sinclair defines collocation as words that occur in relation to a “node” “within a specified span.”[2]

Translation: node = the word you are interested in; specified span = how many words on either side of the node count as “near” it, for example 1 word to the left and 2 words to the right, or 3 words to the left and 3 words to the right.
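To make “span” concrete, here is a minimal Python sketch (not AntConc’s code) that pulls out the words inside a left/right span around every occurrence of a node word. The crude letters-only tokenizer and the example sentence are assumptions made up for illustration; AntConc’s own token definition is configurable and may differ.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())

def span_windows(tokens, node, left, right):
    """For each occurrence of `node`, return the words up to `left` positions
    before it and up to `right` positions after it (the node itself excluded)."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == node:
            before = tokens[max(0, i - left):i]
            after = tokens[i + 1:i + 1 + right]
            windows.append((before, after))
    return windows

# Made-up example sentence, not from OOB
tokens = tokenize("Arguments for biological determinism ignore social context.")
print(span_windows(tokens, "biological", 1, 2))
# [(['for'], ['determinism', 'ignore'])]
```

Widening the span to 3L/3R on the same sentence would also pull in “arguments” and “social,” which is why the span setting changes which collocates AntConc reports.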

Now we have a definition of what a collocate is, but how does AntConc determine collocates? Sinclair, quoted in McEnery and Hardie, explains that collocation is determined by “the length of the text in which the words appear, the number of times they both appear in the text, and the number of times they occur together.”[3]
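Sinclair’s three ingredients can be counted directly. The sketch below is an illustration under simple assumptions (whitespace tokenization, a made-up sentence, a 1L/1R span by default); the function name `collocation_counts` is hypothetical, not part of AntConc.

```python
from collections import Counter

def collocation_counts(tokens, node, collocate, left=1, right=1):
    """Return text length, how often each word appears, and how often the
    collocate appears within the span around the node."""
    freq = Counter(tokens)
    together = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            together += window.count(collocate)
    return {
        "text_length": len(tokens),
        "node_freq": freq[node],
        "collocate_freq": freq[collocate],
        "together": together,
    }

# Made-up example sentence
tokens = "biological determinism and hormonal determinism are not biological facts".split()
print(collocation_counts(tokens, "biological", "determinism"))
# {'text_length': 9, 'node_freq': 2, 'collocate_freq': 2, 'together': 1}
```

Note that “determinism” occurs twice, but only once inside the 1L/1R span of “biological”; the other instance belongs to “hormonal.”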

Great, so now all I need to decide is which node word to use. As discussion of “biological determinism” is one of the earliest debates about essentialism in feminist discourse, I decide to use “biological.”

Using AntConc, I see that “biological” appears in thirty-four files, which is better than the 1,839 files I started with, but still a lot for me to read closely.[4] Using collocates, I hope to narrow that further. However, there are a total of 76 collocate types and 162 collocate tokens.[5]
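The “how many files contain the word” count is easy to reproduce outside AntConc. This is a minimal sketch assuming each file’s text has been loaded into a dictionary keyed by file name; the file names and contents below are hypothetical, and the letters-only tokenizer is a simplification.

```python
import re

def files_containing(corpus, word):
    """Return the sorted names of files whose tokens include `word`."""
    return sorted(
        name for name, text in corpus.items()
        if word in re.findall(r"[a-z]+", text.lower())
    )

corpus = {  # hypothetical file names and contents
    "oob_1977_01.txt": "Biological destiny is a myth.",
    "oob_1978_05.txt": "Childcare collectives in the city.",
    "oob_1980_11.txt": "Against biological determinism.",
}
hits = files_containing(corpus, "biological")
print(len(hits), hits)
# 2 ['oob_1977_01.txt', 'oob_1980_11.txt']
```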

Translation: A token is each individual word occurrence in a text; a type is a distinct word. Taking “the” as an example, we know the type “the” will repeat throughout a text. If we have 100 tokens and 10 of them are of the type “the,” then “the” has a relative frequency of 1:10, one occurrence of “the” for every 10 words in our text. (A type-to-token ratio proper compares the number of distinct types to the total number of tokens; see Heather’s blog for more on type-to-token ratios.)
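A small Python sketch separates the two measures. It assumes a synthetic 100-token “text” made of 10 copies of “the” plus 90 distinct filler words, chosen only to match the numbers in the example above.

```python
from collections import Counter

def type_token_stats(tokens, word):
    """Count tokens and types, the relative frequency of `word`,
    and the type-to-token ratio of the whole text."""
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "relative_freq": counts[word] / n_tokens,  # share of tokens that are `word`
        "type_token_ratio": n_types / n_tokens,    # distinct words per token
    }

# Synthetic text: 10 tokens of "the" plus 90 distinct filler words
tokens = ["the"] * 10 + [f"w{i}" for i in range(90)]
stats = type_token_stats(tokens, "the")
print(stats)
# {'tokens': 100, 'types': 91, 'relative_freq': 0.1, 'type_token_ratio': 0.91}
```

The 1-in-10 figure is the relative frequency of one type; the type-to-token ratio (here 91/100) describes the vocabulary of the whole text.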

Within the span around each occurrence of “biological” there are 162 tokens (every word found there), and the software has determined that these represent 76 “potential” collocate types (distinct words).
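Collocate types and tokens can be tallied the same way: collect every word inside the span around the node, then count distinct words (types) and total words (tokens). This sketch assumes whitespace tokenization, a 1L/1R span (AntConc’s default according to note 5), and a made-up sentence.

```python
from collections import Counter

def collocate_counts(tokens, node, left=1, right=1):
    """Count every word found in the span around each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - left):i])
            counts.update(tokens[i + 1:i + 1 + right])
    return counts

# Made-up example sentence
tokens = "on biological determinism and biological sex and biological determinism".split()
counts = collocate_counts(tokens, "biological")
print("collocate types:", len(counts), "collocate tokens:", sum(counts.values()))
# collocate types: 4 collocate tokens: 6
```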

  


AntConc by default ranks the potential collocates in frequency order (the first two columns) and then indicates whether each potential collocate appears to the left (L) or right (R) of the node word. The Stat column gives the significance, expressed (under the default setting) as “a ‘Mutual Information’ score, which is a measure of the probability that the collocate and the node word occur near each other, relative to how many times they each occur in total.” AntConc already “interprets” the relative significance, but here is more information on the “Mutual Information” score in case you are curious.
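For the curious, the textbook form of (pointwise) mutual information compares how often two words actually occur together with how often they would be expected to co-occur by chance. The sketch below uses that standard formula; AntConc’s exact implementation may adjust it (for example, for span size), so treat it as an approximation. The counts are hypothetical.

```python
import math

def mutual_information(together, node_freq, collocate_freq, corpus_size):
    """Pointwise MI: log2 of observed co-occurrences divided by the number
    expected if the two words were distributed independently."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(together / expected)

# Hypothetical counts: a 100,000-token corpus in which the node occurs 50 times,
# the collocate 12 times, and the two occur together 6 times.
print(round(mutual_information(6, 50, 12, 100_000), 2))
# 9.97
```

A score of 0 means the pair co-occurs exactly as often as chance predicts; higher scores mean a stronger association.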

However, I want the results sorted by the Stat column, because “the higher the score the stronger the association between the two.” Setting the “sort by” option to “stat” gives us the results we want in the order we want. Among the highest-scoring collocates are variations on “biological determinism,” which is excellent, as I know that in feminist theory biological determinism is a subset of essentialism.
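Sorting by Stat rather than by frequency can be sketched by scoring every collocate with the standard MI formula and ordering the rows by that score. This is an illustration, not AntConc’s implementation: `rank_collocates` and its `min_freq` cutoff (loosely analogous to the minimum-frequency setting in note 5) are hypothetical names, and the sentence is made up.

```python
import math
from collections import Counter

def rank_collocates(tokens, node, left=1, right=1, min_freq=1):
    """Score each collocate of `node` with pointwise MI and sort by score,
    highest first (analogous to sorting AntConc's results by Stat)."""
    n = len(tokens)
    word_freq = Counter(tokens)
    together = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            together.update(tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right])
    rows = []
    for word, f in together.items():
        if f >= min_freq:
            mi = math.log2(f * n / (word_freq[node] * word_freq[word]))
            rows.append((word, f, round(mi, 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Made-up example sentence
tokens = "the biological determinism of the biological family and the social family".split()
ranked = rank_collocates(tokens, "biological")
for word, freq, stat in ranked:
    print(f"{word:<12} freq={freq} stat={stat}")
```

Here “determinism” ranks above “the” even though “the” sits next to “biological” more often, because “the” is common everywhere while “determinism” is not.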







Clicking on “determinism” to see the word in context reveals not only “biological determinism” but also two instances of “hormonal determinism,” which I think of as a subset of “biological determinism.” Using the last column, I can see the names of the files in which these words appear. Since my plan here is to narrow the 34 files down to those that deserve a close reading, I jot them down. I also see a file containing “biological explanation,” and note that as well.
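The “word in context” view is a keyword-in-context (KWIC) concordance. A minimal Python sketch of one, assuming the same dictionary-of-files setup as before, lists left context, hit, right context, and file name, then collects the files to read. File names and texts are hypothetical.

```python
import re

def kwic(corpus, word, width=3):
    """Return (file name, left context, hit, right context) for every
    occurrence of `word`, with up to `width` words on each side."""
    rows = []
    for name in sorted(corpus):
        tokens = re.findall(r"[a-z]+", corpus[name].lower())
        for i, tok in enumerate(tokens):
            if tok == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((name, left, tok, right))
    return rows

corpus = {  # hypothetical file names and contents
    "oob_1978_03.txt": "She rejects hormonal determinism outright.",
    "oob_1980_11.txt": "Against biological determinism in science.",
}
rows = kwic(corpus, "determinism")
for name, left, hit, right in rows:
    print(f"{left:>25} [{hit}] {right:<20} {name}")

files_to_read = sorted({name for name, *_ in rows})
```

Scanning the left-context column is what surfaces variants like “hormonal determinism” alongside the expected “biological determinism.”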

I’m also intrigued by “difference” as a potential collocate, but clicking on it to reveal the context causes a problem: it appears too many times for me to scan. Using the Concordance tab, I search for “biological difference” and note the relevant files.

Heather’s first lesson: look not only at what I hoped or expected to find, but also at what is unexpected or contradictory. So I return to the results. Hmmm, “sex” appears four times with an MI score about half that of the determinism variants, but “biological sex” still seems like it could be related to essentialist arguments, so I add the two articles in which it appears to my list. “Biological constraints” also looks intriguing, and as that file isn’t on my list yet, I add it.

Heather’s second lesson: learn to love the “small words.” So I click on “or,” expecting to find something like “biological or social,” but instead I find “biological or sexual” and “biological or relatively static characteristics,” both of which intrigue me, so those files go on the list. “Of” could be interesting but gets me articles I’ve already selected for close reading (the same goes for “on”).

Ultimately, of the thirty-four articles that contained “biological,” I’ve used collocates to narrow the list to sixteen that deserve a close reading (some things just cannot be done digitally, the third lesson from Heather; see her blog post). While this may not seem like a big deal, I know from past experience that reading OOB is time-consuming: these sixteen articles will probably take two days of close reading, so I’ve saved myself roughly another two days of close reading.

Resources: AntConc basic YouTube tutorial, AntConc advanced YouTube tutorial



[1] Tony McEnery and Andrew Hardie, Corpus Linguistics: Method, Theory and Practice (Cambridge University Press, 2011).
[2] John Sinclair, Susan Jones, and Robert Daley, English Collocation Studies: The OSTI Report (Continuum International Publishing Group, 2004).
[3] McEnery and Hardie, Corpus Linguistics.
[4] Use the Concordance and Concordance Plot tools to determine this.
[5] In AntConc’s default settings, the span is set to the smallest span, the word immediately before “biological” (1L) and the word immediately after it (1R), and the minimum frequency is set to the lowest value, 1.
