Analysis of Topic Modeling

I played around with a few different numbers of topics, topic words, and iterations in class. I ended up choosing topics from lists generated with 50 topics and 1000 iterations, 25 topics and 2500 iterations, and finally 50 topics and 2000 iterations. I also tried making lists with iterations as low as 200, but I couldn't make much sense out of them. I noticed that the higher the number of iterations, the longer it took for the program to generate the lists. This made sense, because the program was going through the text many more times than it did with fewer iterations. (For anyone curious what these two settings look like in code, there is a small sketch at the end of this post.)

The first topics I chose, from our starting point of 50 topics and 1000 iterations, were money, murder, Sherlock's study, and women. From 25 topics and 2500 iterations I chose the topics crime, letter/message, and Sherlock. The final group of topics, from 50 topics and 2000 iterations, were journey/travels, appearance, and case. More iterations and more words per topic definitely helped in deciphering what the topics were. Across all the different lists I compiled using different settings, it was clear that certain themes were always present, such as crime, murder/violence, and words surrounding solving cases.

Using MALLET to topic model Sherlock Holmes's stories definitely helped to show the many themes present throughout the stories, but I found it rather difficult even having read some of the stories and being familiar with what Sherlock Holmes is about. Some word lists made no sense to me at all. Overall, topic modeling can give you the gist of the main underlying themes featured in Sherlock Holmes, but to make sense of some of the names and words that come up in the lists you still need to read the stories to gain more detail and understanding. As we talked about in class, I think topic modeling serves to support close reading, but on its own the data is too general to make use of.
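Here is the sketch mentioned above of the two settings I kept varying. MALLET itself is run from the command line, so as a stand-in this uses Python and the gensim library's LdaModel, which exposes the same two knobs: the number of topics and the number of iterations. The tiny corpus and all of the parameter values below are made up for illustration and are not anything from class; they only mirror the shape of the experiment, since 25 or 50 topics would be meaningless on three toy documents.

```python
# A rough sketch of the topic-count / iteration-count experiment,
# using gensim's LdaModel instead of MALLET. The "stories" here are
# made-up stand-ins, not the actual Sherlock Holmes texts.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical mini-corpus: each "story" is just a list of tokens.
stories = [
    "holmes examined the murder weapon in his study".split(),
    "watson found a letter about the missing money".split(),
    "the woman travelled by train to report the crime".split(),
]

dictionary = Dictionary(stories)                       # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in stories]  # bag-of-words vectors

# The two settings I kept changing in class:
#   num_topics  (25 or 50 in my runs)
#   iterations  (200 up to 2500 in my runs)
lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=3,      # tiny corpus, so a tiny topic count
    iterations=1000,
    passes=10,         # gensim also sweeps the corpus in "passes"
    random_state=42,
)

# Print the top words per topic -- the "lists" I was trying to read.
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```

Raising the iteration count (or, in gensim, the number of passes) makes each run noticeably slower for the same reason MALLET slowed down: the model simply does more work per document on every pass.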