Sherlock Holmes Topic Modeling – In Review

I enjoyed using Mallet and was surprised by how fast it did the topic modeling. I’m glad it’s not a slow process; it actually takes longer to set it up and tell it exactly what to do.

As for titling my topics, I found some easier than others. Some were clear in what they could be titled, while others were more difficult, and for a few I ended up using a word that was already in the list, such as Work. I also found myself determined to make my titles only one word, not realizing that two or even three words would do just fine. I certainly expected popular topics such as Crime and Murder to show up. A lot of my topics are also related to one another, such as Features and Appearance, Crime and Murder, and Communication and Literature. Features and Appearance were especially similar, and I thought some of the words in the two lists could be interchangeable. In my Communication topic, the word “tregennis” appeared and I had no idea what it was. It turns out to be a character’s name in the short story “The Adventure of the Devil’s Foot.” This just goes to show that additional research is always necessary, no matter what academic tool is used.

As I played with different iterations and topic numbers, I noticed that the higher the number, the more variety in the words included in the lists. However, too many topics or words may be too hard to analyze, making the whole process of titling the topics a strenuous task.

I like Mallet as a tool for distant reading, a concept that I think is definitely useful. It’s kind of like a more organized word cloud, one that groups words together instead of just gathering them. As someone who isn’t very fond of reading, I find analyzing texts this way, especially this many at one time, way more enjoyable.