So based on my corpus of about 1.5m words (1800+ items), they suggested 60 topics, more than double what I'd ever tried.
We also discussed how topic modeling works well for some sources (things with short, relatively coherent content, like newspapers and diaries) but potentially not so well for others. Jeri noted that she and Fred Gibbs had discussed this issue as well but concluded that different algorithms might be necessary for topic modeling different sources. Of course, if we get into individual writing idiosyncrasies, well, we are looking at some custom scripting then, right? (Which is when Bridget told me I could never pass one of her classes. OH SNAP, she is FABULOUS BTW.)
ANYWHO, using David Newman's MALLET tool, I had yet ANOTHER RUN at Off Our Backs and DANG if it didn't work out pretty nicely. HOLLA Jon Goodwin!
List of Topics
So using the same metric, I ran 30 topics on the 600K-word corpus of Chrysalis (156 items). The text is less clean, so the clusters have some non-words, and the results don't seem quite as clear.