So if we take as a given that what historians do is make connections between things, then I can't actually fathom why everyone isn't all over digital history. I'm envisioning all sorts of things, ranging from network maps of authors in and among periodicals, to topic modeling discourses to see if they line up with what we think of as the "strands" of feminism in each periodical.
Right now I'm trying to figure out the methodology of my topic modeling project while simultaneously learning how to topic model. Thankfully for me, the fine folks at The Programming Historian, as well as awesome DH tweeps like Miriam Posner, Ted Underwood, and Scott Weingart, are so helpful and have written huge posts about how to do it.
So, methodology: I'm looking at the concept of culture in women's liberation periodicals in the United States. I'm interested in how discourse shifts around culture, and whether a particular feminist ideology "matches up" with coverage/discussion of culture.
Off Our Backs is the only fully digitized "movement" periodical. It runs 1970-2008, which is a problem in terms of the limits JSTOR puts on downloading, but I can always request the whole thing. I've had Chrysalis, a magazine of women's culture, digitized for its full run (1977-1981). I'm interested in looking at Women: A Journal of Liberation (fall 1969-1981) as a comparison for OOB, and Heresies (1977-1983), which is already in PDF format thanks to this amazing project and documentary by Joan Braderman, which you should SO bring to your campus ASAP.
So do I model all of OOB and Women, then subset out the dates that match Heresies and Chrysalis? Or do I even need to do that?
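If I do go the subsetting route, the date filtering itself is trivial. Here is a minimal sketch in Python; the file paths and issue dates are made up for illustration and would come from a real corpus index:

```python
from datetime import date

# Hypothetical corpus index: (periodical, issue date, path to OCR'd text).
corpus = [
    ("OOB", date(1970, 2, 1), "oob/1970-02.txt"),
    ("OOB", date(1978, 5, 1), "oob/1978-05.txt"),
    ("Women", date(1969, 9, 1), "women/1969-09.txt"),
    ("Women", date(1980, 1, 1), "women/1980-01.txt"),
]

def subset_by_years(docs, start_year, end_year):
    """Keep only issues whose date falls within [start_year, end_year]."""
    return [d for d in docs if start_year <= d[1].year <= end_year]

# The window that overlaps Heresies (1977-1983):
heresies_window = subset_by_years(corpus, 1977, 1983)
```

The same function with a 1977-1981 window would give the Chrysalis-comparable subset.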
Should I model individual articles, or whole issues?
Also wondering if I should stick with David Newman's tool or switch to MALLET. (Confirmed that Newman wrote an awesome GUI for MALLET.)
UPDATE
Frustration and getting it all wrong
So, playing around on the interwebs, I came across some new things I just had to play with. I quickly grabbed a PDF of Heresies #1 and ran it through ABBYY FineReader to create text.
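One thing I've since learned about OCR'd text: words hyphenated across line breaks survive as broken tokens, which is probably where the "tion" in the frequency table below comes from. A rough cleanup sketch in Python (a blunt regex heuristic of my own, not anything ABBYY does):

```python
import re

def rejoin_hyphenated(text):
    """Rejoin words that OCR split across line breaks with a trailing
    hyphen, e.g. 'libera-\\ntion' -> 'liberation'. Blunt heuristic: it
    will also merge genuine compounds that happen to break at a line end."""
    return re.sub(r"(\w+)-\s*\n\s*(\w+)", r"\1\2", text)

sample = "women's libera-\ntion movement"
print(rejoin_hyphenated(sample))  # women's liberation movement
```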
Then I decided to run that through Voyant, which yielded some lovely results.
| Frequencies | Count | Z-Score | Mean |
| --- | --- | --- | --- |
| women | 417 | 7.91 | 64.6 |
| art | 402 | 7.62 | 62.3 |
| work | 198 | 3.7 | 30.7 |
| feminist | 144 | 2.66 | 22.3 |
| class | 132 | 2.43 | 20.4 |
| women's | 128 | 2.35 | 19.8 |
| new | 123 | 2.26 | 19.1 |
| woman | 123 | 2.26 | 19.1 |
| like | 116 | 2.12 | 18 |
| la | 111 | 2.02 | 17.2 |
| men | 107 | 1.95 | 16.6 |
| feminism | 105 | 1.91 | 16.3 |
| male | 99 | 1.79 | 15.3 |
| political | 98 | 1.77 | 15.2 |
| world | 91 | 1.64 | 14.1 |
| artists | 89 | 1.6 | 13.8 |
| time | 89 | 1.6 | 13.8 |
| make | 78 | 1.39 | 12.1 |
| que | 74 | 1.31 | 11.5 |
| people | 71 | 1.26 | 11 |
| york | 71 | 1.26 | 11 |
| mural | 63 | 1.1 | 9.8 |
| female | 61 | 1.06 | 9.4 |
| experience | 60 | 1.04 | 9.3 |
| power | 59 | 1.02 | 9.1 |
| working | 59 | 1.02 | 9.1 |
| left | 55 | 0.95 | 8.5 |
| feminists | 51 | 0.87 | 7.9 |
| just | 51 | 0.87 | 7.9 |
| los | 51 | 0.87 | 7.9 |
| movement | 51 | 0.87 | 7.9 |
| way | 51 | 0.87 | 7.9 |
| artist | 50 | 0.85 | 7.7 |
| prison | 50 | 0.85 | 7.7 |
| social | 50 | 0.85 | 7.7 |
| socialist | 50 | 0.85 | 7.7 |
| society | 50 | 0.85 | 7.7 |
| say | 49 | 0.83 | 7.6 |
| culture | 48 | 0.81 | 7.4 |
| day | 48 | 0.81 | 7.4 |
| life | 48 | 0.81 | 7.4 |
| good | 47 | 0.79 | 7.3 |
| love | 46 | 0.77 | 7.1 |
| tion | 46 | 0.77 | 7.1 |
| collective | 45 | 0.76 | 7 |
| en | 44 | 0.74 | 6.8 |
| money | 44 | 0.74 | 6.8 |
| politics | 44 | 0.74 | 6.8 |
| things | 44 | 0.74 | 6.8 |
| fact | 43 | 0.72 | 6.7 |

(The export also had Difference, Std. Dev., Peakedness, Skew, and Trend columns, but Std. Dev. was 0 for every row and the rest were empty, so I've dropped them here.)
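For what it's worth, the first numeric columns can be approximated in a few lines of Python. The Mean column looks like relative frequency per 10,000 words and the Z-Score like a standardized count across the vocabulary, but that's my guess at what Voyant is doing, not a confirmed reproduction:

```python
from collections import Counter
from statistics import mean, stdev

def frequency_table(tokens, top=5):
    """Counts, relative frequency per 10,000 tokens, and a z-score of
    each count against the whole vocabulary's count distribution.
    An assumption about Voyant's Mean and Z-Score columns, not a
    verified reimplementation of them."""
    counts = Counter(tokens)
    mu, sigma = mean(counts.values()), stdev(counts.values())
    rows = []
    for word, n in counts.most_common(top):
        rows.append((word, n,
                     round(n / len(tokens) * 10000, 1),   # per-10k frequency
                     round((n - mu) / sigma, 2)))         # standardized count
    return rows

# Tiny demo corpus:
for row in frequency_table(["a"] * 5 + ["b"] * 2 + ["c"]):
    print(row)
```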
I saved the above as a CSV spreadsheet, thinking I could run it through Gephi (although I later realized I didn't export ALL of it, which runs to 219 screens). (Actually, I'm skipping the part where I messed up the export to CSV, which I realized when I put the file into Gephi and saw it had 0 edges.)
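In hindsight, the 0-edges problem makes sense: Gephi's spreadsheet importer wants an edge list with Source and Target columns, and a word-frequency table gives it nodes but nothing to connect them. Here's a sketch of what an edge list could look like, using word co-occurrence within documents as a stand-in relationship (my invention for illustration, not what Voyant exports):

```python
import csv
import io
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents):
    """Count how often two words appear in the same document; each
    pair becomes a weighted, undirected edge."""
    pairs = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc.split())), 2):
            pairs[(a, b)] += 1
    return pairs

def to_gephi_csv(pairs):
    """Gephi's edge importer expects 'Source' and 'Target' headers."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in pairs.items():
        writer.writerow([a, b, weight])
    return out.getvalue()

docs = ["culture women art", "women art", "culture politics"]
print(to_gephi_csv(cooccurrence_edges(docs)))
```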
So, happily, I worked my way through the Gephi tutorial, which is lovely. Then I reached "rank." Clicking on that, I realized that the numbers in my Excel spreadsheet were being entered into the graph. DOH. See what happens when you play without knowing the tools? I'm pretty sure the error is in the formatting and then importing into Gephi, but damned if I can figure it out tonight.
Still I’m finding the mistakes as instructive as the successes. Nothing like being the utterly clueless prof to give one insights into teaching.
And Voyant revealed some interesting stuff. For example, looking at the 46 instances of culture gives me the context via the words to the right and the words to the left. This, however, has to be my favorite: knots (or maybe the collocate clusters).
Update again: found some really, really lovely stuff done by Lisa Rhody (@lmrhody, topic modeling poetry) and Aditi Muralidharan (@silverasm, very sophisticated text mining of slave narratives and Shakespeare).
I would recommend modeling individual articles, if that's achievable without crippling labor. It's reasonable to assume that there will be significant differences between articles on different topics, perhaps written by different hands.
But mostly the answer to all these questions is a mixture of (a) it depends on what you want to find, and especially (b) you'll have to try it and see.
I would start by modeling everything you have. Then look skeptically at the results to see if there are fractures/clumps caused by differences in the print dates of different periodicals.
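That "model everything, then look for date clumps" suggestion could be prototyped with plain Python once a topic model has run. The topic weights below are invented for illustration; in practice they'd come from the model's document-topic output:

```python
from collections import defaultdict

# Hypothetical model output: each document's year and its weight on one
# topic of interest (say, a "culture" topic). Values are made up.
doc_topic = [
    (1970, 0.05), (1971, 0.07), (1977, 0.21), (1978, 0.24), (1980, 0.19),
]

def mean_weight_by_year(rows):
    """Average a topic's weight per year, to eyeball whether the
    discourse clumps around particular print dates."""
    by_year = defaultdict(list)
    for year, weight in rows:
        by_year[year].append(weight)
    return {y: sum(ws) / len(ws) for y, ws in sorted(by_year.items())}

print(mean_weight_by_year(doc_topic))
```

A jump like the one between the early-1970s and late-1970s values here is the kind of fracture worth checking against which periodicals were actually in print in each window.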