Sherlock Holmes Topic Modeling

Word Cloud for Blog

First, I miscounted earlier and neglected to post a tenth topic, so it is included in the following list:

(50 topics/1000 iterations/20 words printed per topic)

Place: house side road passed walked front round garden hall windows path corner direction window standing ran houses yards led bicycle

Murder/death: found left body blood lay brought examined revolver round examination ground knife carefully wood death stick marks track dead spot

Letter/note: paper note read letter book pocket letters handed wrote written writing write sheet post document slip table reading date envelope

(60 topics/700 iterations/15 words printed per topic)

Woman: woman lady wife young mrs girl love life husband child miss married story daughter beautiful

Spirits/ghosts: doubt lost danger dangerous clear life criminal law friend memory powers presence death care fear

Time: night heard morning evening clock ten past waiting house thirty usual surprise found quarter quiet

Crime: house found examined night body showed show clue signs finally death proved carefully carried servant

Money: years money ago twenty hundred lady king pounds gold months pay photograph age year thousand

Deduction process: case interest facts points point investigation remarked give follow incident theory interesting obvious run conclusion

Family: father made left happened death poor mother imagine story returned died strange mad truth butler

Though I found topic modeling to be an interesting concept and distant reading tool, it was difficult to understand while I was configuring the program and selecting my own topics.  I don’t think I was able to spend enough time with the program.  Since I don’t have any background in programming, I felt like there was something I was missing.  It was difficult even to get MALLET to process the data in the first place.  After that, I could go through the lists of words and find how many times they were used and, to an extent, the way they related to each other, so I was able to better grasp the use of this tool.  Looking at the words this way appears to be a more effective way of finding information about a lot of text than a word cloud.  A word cloud displays all of the words in random positions and shows their frequency [like above, displaying the frequency of the words in my topics]; MALLET lists words in relation to each other, so a reader gets a better idea of the themes throughout the collection of literature.  In theory, this word cloud should illustrate a very condensed version of the Sherlock Holmes stories, but these are only words based on my selections of topics from topic modeling.  For any reader outside of this blog, the word cloud above [which focuses mostly on death and bodies and seems to make the stories out to be much more morbid than they really are] could not possibly produce an authentic understanding of the text.
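To make the comparison concrete, here is a small Python sketch of the kind of counting a word cloud does when it is fed the words from my topics.  It is only an illustration, not how the cloud above was actually produced: the function names are made up for this example, and it uses just three of the ten topics listed earlier, with the words copied from those lists.

```python
from collections import Counter

# Labels and words copied from three of the topics listed above.
topics = {
    "Murder/death": "found left body blood lay brought examined revolver round "
                    "examination ground knife carefully wood death stick marks "
                    "track dead spot",
    "Crime": "house found examined night body showed show clue signs finally "
             "death proved carefully carried servant",
    "Family": "father made left happened death poor mother imagine story "
              "returned died strange mad truth butler",
}

def word_counts(topics):
    """Count how many of the topic lists each word appears in.

    This is roughly the number a word cloud turns into font size.
    """
    counts = Counter()
    for words in topics.values():
        counts.update(words.split())
    return counts

def topics_containing(topics, word):
    """Return the labels of the topics whose word list includes `word`.

    This keeps the grouping that the word cloud throws away.
    """
    return [label for label, words in topics.items() if word in words.split()]

counts = word_counts(topics)
# Words that show up in more than one of these topics.
shared = {word: n for word, n in counts.items() if n > 1}

print(counts.most_common(1))               # [('death', 3)]
print(sorted(shared))
print(topics_containing(topics, "death"))
```

Even with only three topics, “death” comes out on top because it appears in every list, which may be part of why the cloud looks so morbid.  The second function shows what the topic lists add: which group of words a given word belongs to.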

When I chose my topics, I picked out the ones that were the most intriguing to me.  Some were simple and some didn’t make sense.  For example, the final topic [the one I had forgotten] makes so little sense to me that I don’t know how to title it, whereas the “woman” topic features only words that are directly associated with women.  For the “family” topic, I finally chose that word to represent them all, primarily because of “mother” and “father.”  However, I still wonder what “strange,” “mad,” and “truth” have to do with the topic.  Perhaps “family” is incorrect and the topic really has to do with “storytelling,” which is prevalent in the Holmes stories.  Sherlock’s clients and/or Sherlock himself tell their stories in every individual mystery.  Many of the topics feature at least one word that throws me off from what I think the topic is in general.  So, for me, there is still a disconnect in the idea of distant reading as a comprehensible look at lots of text, but I’m really enjoying looking at new technological ways to consider and discuss literature.