Topic Modeling Analysis

These topic modeling graphs show trends in the Sherlock Holmes stories as well as in the real world.  The themes within the stories give a good sense of what popular culture was like in the late 1800s and early 1900s.  I thought it was interesting to look for relationships with real-world events.

Screen Shot 2015-04-03 at 2.08.40 PM

The first chart shows “crime”, “crime scene”, “murder”, “family and relationships” and “investigation”.  There are a couple of large spikes for family and relationships, especially in the 1920s, although a quick Google search left me empty-handed.  Crime also shows a spike in the early 1920s, and this could be because of the Red Scare, which was not exclusive to the United States.  During this time, several high-profile cases took place in the United States, such as the Sacco and Vanzetti case and the Scopes Monkey Trial. News sources in Great Britain would have got word of these cases. The other three topics are closely related to crime itself.

Screen Shot 2015-04-03 at 2.02.08 PM

The next chart, which shows “finance” and “foreign affairs”, has one large spike for foreign affairs on September 1st, 1917.  The Great War was still going on, and this was the year the United States entered the war.  Germany had also declared unrestricted submarine warfare several months earlier.  Russia’s position in the war was being questioned as the Bolsheviks gained more control in the country, following the abdication of Tsar Nicholas II in March of 1917 and the continuing riots.  Finance, unfortunately, does not receive the same attention as foreign affairs.

Screen Shot 2015-04-03 at 2.57.56 PM

The last chart deals with “smoking”, “residential streets” and “transportation”.  A large spike in residential streets is seen on January 1st, 1904.  In that year, road infrastructure was still in its infancy: roads were still poorly made, cars were not yet widespread and modern traffic laws had not been drafted.  What is quite strange is that transportation does not see as large a spike, even in 1908 when the Ford Motor Company introduced the Model T, which quickly became the most popular car around the world, beating British brands such as Austin, Rolls-Royce and Bentley.

MALLET Results MichealF

word cloud 2

Posted above is the word cloud I made from my MALLET results. We had used MALLET previously in class, and it was interesting to create a key word or category for a group of related words. Making the topic models ourselves, however, was a different experience, because I got to see what goes into making them.

I used 4 separate combinations when topic modeling. My first search was 50 topics/1000 iterations/20 words printed. From this search I picked the three word lists that were easiest to categorize, and the topics I chose for them were “Hallway”, “Communication”, and “Study/Office”. The second search was 25 topics/500 iterations/15 words printed, and the three examples were “Case”, “Suspect” and “Evidence”. The third search was 20 topics/250 iterations/10 words printed, and the four examples I chose were “Suspicious”, “Location”, “Discover/Trace” and “Attack/Violence”.

I felt that it would be best to narrow my search settings after every run. My reasoning was that narrower settings would give me more accurate results each time. I also felt that printing more words per topic makes the topic harder to categorize, because there are more words that you need to relate to each other. The models I got from the narrower searches were easier to understand and easier to categorize.

Overall, topic modeling with MALLET was a helpful way to find the main themes throughout the Sherlock Holmes stories, and I look forward to doing it again in class if given the opportunity.
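For anyone who wants to repeat the three searches described above, here is a minimal Python sketch that builds the matching MALLET `train-topics` commands. The flags (`--num-topics`, `--num-iterations`, `--num-top-words`, `--output-topic-keys`) are MALLET's own options, but the file names `sherlock.mallet` and `keys_*.txt` are made up for illustration, and the sketch assumes the `mallet` launcher is on your path (on some installs it is run as `bin/mallet` instead). It only prints the commands; it does not run them, and it is not necessarily how the runs in this post were launched.

```python
# The three topics / iterations / words-printed combinations from the post.
RUNS = [
    {"topics": 50, "iterations": 1000, "top_words": 20},
    {"topics": 25, "iterations": 500, "top_words": 15},
    {"topics": 20, "iterations": 250, "top_words": 10},
]


def build_command(run, corpus="sherlock.mallet"):
    """Return the MALLET command for one run as a list of arguments.

    `corpus` and the keys file name are hypothetical placeholders.
    """
    keys_file = f"keys_{run['topics']}.txt"
    return [
        "mallet", "train-topics",
        "--input", corpus,
        "--num-topics", str(run["topics"]),
        "--num-iterations", str(run["iterations"]),
        "--num-top-words", str(run["top_words"]),
        "--output-topic-keys", keys_file,
    ]


if __name__ == "__main__":
    # Print each command so it can be copied into a terminal.
    for run in RUNS:
        print(" ".join(build_command(run)))
```

Each keys file would then hold one line per topic with its top words, which is the list you read through when choosing a label such as “Hallway” or “Evidence”.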