A deeper look at topic modeling

[Image: word cloud]

All categories chosen from 50 topics with 1000 iterations:

time – morning night back clock waiting past early morrow quarter arrived

writing – paper note read letter table book handed letters written wrote

physical features – face eyes looked thin features lips figure tall dark expression

household – woman lady wife husband life love girl child married maid

clothing/accessories descriptions – black hair red hat heavy round broad centre coat dress

death/crime – found man dead lay body blood death knife lying round

interrogation/crime solving – give matter idea reason question impossible occurred absolutely explanation true

physical reactions – face turned back instant hand sprang forward moment side head

transportation – station train road carriage passed side drive reached drove hour

darkness/mystery – light suddenly dark long caught sat lamp spoke silence silent

Using MALLET was an interesting experience. I enjoyed how simple and accessible the interface was, and I had no trouble navigating the program or adjusting settings such as the number of iterations. I experimented with several combinations before settling on 50 topics, 1000 iterations, and 10 words per topic. I also tested extreme values to see how they would influence the results. In one trial I ran 500 topics with 3000 iterations. This produced topics that were too specific, many of them tied to particular stories. In another I ran as few as 10 topics with only 500 iterations. This generated topics that were too broad and vague to capture the essence of the mysteries. In the end, I felt that 50 topics with 1000 iterations gave me a sense of the Sherlock Holmes stories that was general yet still useful. The word cloud above displays these words in a creative and interactive way.
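For anyone who wants to repeat the comparison from the command line instead of a graphical interface, here is a rough sketch of how the three settings above could be turned into MALLET `train-topics` commands. It only builds and prints the commands; it does not run MALLET. The file names `holmes.mallet` and `topic_keys.txt`, the `bin/mallet` path, and the helper name `mallet_train_command` are my own assumptions for illustration, not what was actually used for this post.

```python
from typing import List


def mallet_train_command(
    input_path: str,
    num_topics: int,
    num_iterations: int,
    num_top_words: int = 10,
    keys_path: str = "topic_keys.txt",  # hypothetical output file name
) -> List[str]:
    """Build (but do not run) a MALLET train-topics command line."""
    return [
        "bin/mallet", "train-topics",  # assumes you are in the MALLET folder
        "--input", input_path,
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--num-top-words", str(num_top_words),
        "--output-topic-keys", keys_path,
    ]


# The three (topics, iterations) settings compared in the text.
SETTINGS = [(10, 500), (50, 1000), (500, 3000)]

if __name__ == "__main__":
    for topics, iterations in SETTINGS:
        # "holmes.mallet" is a hypothetical name for the imported corpus.
        print(" ".join(mallet_train_command("holmes.mallet", topics, iterations)))
```

Each printed line could then be pasted into a terminal, which makes it easier to rerun the same trial later and compare the topic lists side by side.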

I chose these ten topics out of the fifty because the words in each were similar enough to suggest a clear theme. I gave each the simplest title I could think of, to provide a general structure for understanding the Sherlock Holmes stories as a collection. Ten basic concepts that reflect the entire collection are easier for a reader to grasp and accept. Each title represents an element of the stories that is essential to the work as a murder mystery of its time. Obviously, topics such as death, crime, interrogation, and mystery are blunt examples of what a mystery story encompasses. Other topics, such as physical reactions and physical features, are subtler but play just as important a role. The stories rely heavily on context clues and other literary devices to create an interesting and challenging mystery to solve. Physical expressions and reactions are important elements of any mystery story because they can reveal a lot about an individual character or the way that character responds to certain situations. Clothing descriptions, another topic, seem to be part of the writing style of the collection. Holmes is an icon among fictional investigators, and the way he is dressed is an important part of his appeal. The author pays close attention to describing Holmes's dress, as well as that of other characters, throughout the series.
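The labeling step can also be written down so that it is repeatable. The sketch below assumes MALLET's topic keys output, where each line holds a topic number, a weight, and the top words, separated by tabs. The topic numbers and weights in `SAMPLE_KEYS` are made up for illustration (only the words come from my results), and the function names are my own.

```python
from typing import Dict, List, Tuple

# Two lines in the topic-keys layout: id <TAB> weight <TAB> words.
# The ids (0, 1) and weights (1.0) are placeholders, not real output.
SAMPLE_KEYS = (
    "0\t1.0\tmorning night back clock waiting past early morrow quarter arrived\n"
    "1\t1.0\tpaper note read letter table book handed letters written wrote\n"
)

# Hand-assigned titles, keyed by the (hypothetical) topic number.
LABELS = {0: "time", 1: "writing"}


def parse_topic_keys(text: str) -> Dict[int, List[str]]:
    """Map each topic number to its list of top words."""
    topics: Dict[int, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic_id, _weight, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics


def label_topics(
    topics: Dict[int, List[str]], labels: Dict[int, str]
) -> List[Tuple[str, List[str]]]:
    """Pair each hand-assigned title with that topic's words."""
    return [(labels[tid], topics[tid]) for tid in sorted(labels) if tid in topics]


if __name__ == "__main__":
    for title, words in label_topics(parse_topic_keys(SAMPLE_KEYS), LABELS):
        print(f"{title}: {' '.join(words)}")
```

Keeping the titles in one small table like `LABELS` makes the interpretive choices visible, since the titles are my reading of the words and not something MALLET produces.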

Topic modeling provides a unique framework for examining thousands or millions of texts at once. Distant reading is an interesting concept that I hope to apply in future research. The ability to bring your own ideas and lens to any topic or body of work through topic modeling is truly valuable, and it is something that many traditional tools and academic research methods do not allow or facilitate.