However, even better than the technical support she has provided, Heather has slowly been teaching me to stop thinking like a historian. Initially all I cared about was content words. Boring, I was told. Wait, what? Textual analysis = content words. In other words, I wanted to make tools do what my mind normally did. Heather keeps pushing me to look at the articles and conjunctions and other "stuff" that historians, at least this one, tend to ignore in our close readings.
I of course have limited expectations for conjunctions, which clearly occurred to Heather, who then kindly sent me a detailed email. So get your pen and paper (or open that new doc) and learn with me.
Fill in the blanks for the following. Try to come up with 3-4 examples each (this shouldn't be very difficult - you can use one word, or 2+ words...)
he _______
the ______
_______ not
_______ it
he said
he sucks
he is
the book
the computer
the kid
why not
is not
can not
get it
was it
find it
What were the kinds of words you used to fill in these blanks? My examples for "he" included "he is" "he was" "he has" "he ate" "he likes"; for "not" I had examples like "did not" "dare not" "shall not". The reason this was kind of easy is that we know which words go together VERY often, and are therefore highly salient. (Pick up a book near you and look to see what is appearing next to "he" or "it", for instance.) These words, as you have noticed, are collocates - yes - but they also build stock phrases as n-grams.*
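For anyone who would rather count than flip through a book, here is a minimal Python sketch of the same exercise: it counts which words come directly after a target word. The sample sentence and the function name `next_words` are made up for illustration, and the tokenizer is deliberately crude, so treat it as a starting point rather than a proper tool.

```python
import re
from collections import Counter

def next_words(text, target):
    """Count the words that appear immediately after `target`."""
    # Crude tokenizer (an assumption): lowercase, keep letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = zip(tokens, tokens[1:])  # adjacent word pairs (bigrams)
    return Counter(right for left, right in pairs if left == target)

# Made-up sample text, not taken from any real corpus.
sample = "He is late. He was here, and he is tired. She said he has left."
counts = next_words(sample, "he")
print(counts.most_common(3))
```

Swapping in the text of a real document for `sample` gives a quick list of "he _______" pairs to compare against the ones you wrote down.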
So, me and my expectations. Conjunctions might not have been where I should start, it seems, as that is pretty hard (they can join pretty much anything, right?). However, pronouns could be promising.
So I look into the above. I pick "he" and check to see if what I thought above is true.
When I say expectations, I mean "What words do you expect to find near these words based on your knowledge of the language anyway?" You may not have ever thought about them this way - but start by filling in the blanks. If you can produce a quick list of words you'd expect to find like that, go see if they are indeed appearing in your corpus. If they are, great, that means you're mostly adhering to expectations. If they're NOT, why not? Can you explain it?
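One way to run that check, sketched in Python under some loud assumptions: the window of two words on either side, the sample text, and the names `collocates` and `check_expectations` are all invented here for illustration. A dedicated concordance tool will do this with far more care.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words within `window` tokens on either side of `node`."""
    # Crude tokenizer (an assumption): lowercase, keep letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            found.update(left + right)
    return found

def check_expectations(text, node, expected, window=2):
    """Split the expected words into those seen near `node` and those not seen."""
    seen = collocates(text, node, window)
    met = [w for w in expected if seen[w] > 0]
    unmet = [w for w in expected if seen[w] == 0]
    return met, unmet

# Made-up sample text and a made-up list of expected words.
sample = "It was not what he said. He was sure that it is not over."
met, unmet = check_expectations(sample, "he", ["was", "said", "ate"])
print("found:", met, "missing:", unmet)
```

The "missing" list is the interesting one: those are the expectations the text did not meet, and the ones worth trying to explain.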
Well SH!T it worked for "he":
was
that
is
had
but
as
when
said
"not" also works, as "is" = top collocate, and for "it" I got "was" way up top.
OK so √ homework, now I can play with fun possessive pronouns and the like!
This is also where things can get interesting. What about "male _________"? (patriarchy, hegemony, oppression ...) These are all expected based on your corpus of women's lib writing. But what if "male hegemony" only shows up after a certain date? That's interesting, because we might have assumed that the phrase "male hegemony" was already established by 1979 - but you now have evidence which suggests otherwise. (More questions to ask after that: do they all start using "male hegemony" in 1982? Who does it first? How quickly does this phrase spread? etc.) Get creative!

I'm looking forward to working with 1978-1981 Chrysalis, which my amazing graduate student Whitney Esson is digitizing and converting, as well as Off Our Backs for the same dates, courtesy of JSTOR. I'll be presenting the results at the Albert M. Greenfield Digital Center for the History of Women's Education in March and as always will blog as I go!
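Going back to the "male hegemony" question: once the texts are digitized and dated, the "when does this phrase first show up?" check could look something like the Python sketch below. Everything in it is an assumption for illustration. The `(year, text)` pairs are invented placeholders, not findings from Chrysalis or Off Our Backs, and the function names are hypothetical.

```python
from collections import Counter

def first_appearance(docs, phrase):
    """Return the earliest year whose text contains `phrase`, or None."""
    phrase = phrase.lower()
    years = [year for year, text in docs if phrase in text.lower()]
    return min(years) if years else None

def counts_by_year(docs, phrase):
    """Total occurrences of `phrase` per year, including years with zero."""
    phrase = phrase.lower()
    totals = Counter()
    for year, text in docs:
        # Plain substring counting (an assumption): no tokenizing, no lemmas.
        totals[year] += text.lower().count(phrase)
    return dict(sorted(totals.items()))

# Invented placeholder documents, NOT real data from any periodical.
docs = [
    (1978, "An essay about male power and the patriarchy."),
    (1980, "A review that names male hegemony directly."),
    (1981, "Male hegemony comes up again, twice: male hegemony."),
]
print(first_appearance(docs, "male hegemony"))
print(counts_by_year(docs, "male hegemony"))
```

Adding the publication title or author to each record would be the natural next step for asking who uses the phrase first and how quickly it spreads.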
*These are not quite n-grams as Google would suggest, but these kinds of n-grams: http://en.wikipedia.org/wiki/N-gram. Basically, this means they are really likely to appear near each other, to the point of being highly saturated in language as high-saliency combinations. A good example of this is to repeat the process with ________ dark. Which of the following leaps to mind first: "after dark" or "table dark"? "After dark" is the stock phrase, whereas "table dark" requires more effort to come up with (and "dark table" would be the more salient use of dark + table in a phrase, for example).
heather froehlich
I'm glad this was really helpful as a way to think about having expectations for words. (This is just how I think, so it's been an interesting process to present it to someone else.)
Here are some quick thoughts on watching you do this:
Conjunctions can't go with everything! Example: won't you, won't go, *won't slowly. You probably would not even try to put "slowly" with "won't", because it's considered to be totally ungrammatical in English. Then there are other examples like "won't house" or "won't chair" which would be valid if "house" and "chair" are verbs (eg, put someone up for a night, run a committee) but invalid if they were the noun (the building you live in, the object you sit on), which require some thinking about.
tl;dr: conjunctions could be interesting.
When working with verbs (eg, "is") remember that there are many forms of "is" to consider: is, am, are, was, were... (similarly: ran, run, runs, running...) which can only connect to certain words. Don't worry too much about this in practice, because you know that "I running" or "she are" is wrong and you wouldn't try to put them together naturally. These tense/number variations are all ultimately the same verb (more info: http://en.wikipedia.org/wiki/Lemma_%28morphology%29).
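If you do want to count all the forms of a verb together, a rough Python sketch is to fold each form into its lemma before counting. The `LEMMAS` lookup below is hand-made for this example and nowhere near complete (an assumption, as is the sample word list); real lemmatizers handle this far more thoroughly.

```python
from collections import Counter

# Hand-made lookup (an assumption, far from complete): surface form -> lemma.
LEMMAS = {
    "is": "be", "am": "be", "are": "be", "was": "be", "were": "be",
    "ran": "run", "run": "run", "runs": "run", "running": "run",
}

def lemma_counts(words):
    """Count words after folding known verb forms into their lemma."""
    # Words missing from LEMMAS are kept as they are (lowercased).
    return Counter(LEMMAS.get(w.lower(), w.lower()) for w in words)

# Made-up sample words.
words = "he is she was they were he ran she runs".split()
totals = lemma_counts(words)
print(totals)
```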
This list of high-frequency words in American English http://www.wordfrequency.info/free.asp?s=y is useful for considering terms to try and fill in blanks with. Consider combining function words with more content-heavy phrases you'd expect to find to build some 2-3 word phrases to look at...