Topic Modeling Sherlock Holmes Stories

All categories below were chosen from a 50-topic model trained for 1,000 iterations:

1. morning night back clock waiting past early morrow quarter arrived
Title: time

2. paper note read letter table book handed letters written wrote
Title: writing

3. face eyes looked thin features lips figure tall dark expression
Title: physical features

4. woman lady wife husband life love girl child married maid
Title: household

5. black hair red hat heavy round broad centre coat dress
Title: clothing/accessories descriptions

6. found man dead lay body blood death knife lying round
Title: death/crime

7. give matter idea reason question impossible occurred absolutely explanation true
Title: interrogation/crime solving

8. face turned back instant hand sprang forward moment side head
Title: physical reactions

9. station train road carriage passed side drive reached drove hour
Title: transportation

10. light suddenly dark long caught sat lamp spoke silence silent
Title: darkness/mystery

Topic Modeling

Most of the topics I was able to interpret came from the 100-topic models; only a few came from the 50-topic models. When I displayed more than about 20 words per topic, the topics became too broad to label. All of these models were trained for around 2,000 iterations. I also happened to find one rather interesting and almost humorous topic.
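The post does not name the topic-modeling tool, so the sketch below is an assumption rather than the setup used here: a minimal, pure-Python LDA (latent Dirichlet allocation) trained by collapsed Gibbs sampling. Its `num_topics`, `iterations` and `n` parameters correspond to the topic counts (50 or 100), iteration counts and words per topic discussed above. The function names, the `alpha`/`beta` priors and the toy documents are all hypothetical.

```python
import random


def train_lda(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical minimal LDA via collapsed Gibbs sampling.

    docs: list of token lists. Returns (topic_word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: document-topic, topic-word, and tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start by giving every token a random topic
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Each iteration resamples the topic of every token once
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, vocab


def top_words(topic_word, vocab, n):
    """Return the n highest-count words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result


# Toy usage with made-up "documents"
docs = [
    "train station london train hour".split(),
    "body blood knife dead body".split(),
    "station train hour london line".split(),
    "dead blood knife body lay".split(),
]
topic_word, vocab = train_lda(docs, num_topics=2, iterations=200)
for words in top_words(topic_word, vocab, n=3):
    print(words)
```

On a real corpus, each story (or chunk of a story) would typically be one document after lowercasing and stop-word removal, and a dedicated tool would be far faster than this loop. Raising `n` lists more words per topic, which is where the topics above started to feel too broad.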

100 Topics/20 words:

Room (100 Topics): small floor side left square wood carpet lower piece furnished hand cut chamber match central edge examining hole fashioned evidently
Clothing (100 Topics): black broad coat dress eye von red heavy brown dressed looked bork double hat yellow centre shining wore cap weary
Sailing (100 Topics): wind ship long sea peter east boat captain skin carey rising box time beginning distance board rain seaman securities initials
Actions from a chair (100 Topics): chair holmes back sit pray companion laid amazement arm laughed visitor stared leaned speaking rose seated seat conceal called heartily
Pistol (100 Topics): hand pocket held drew long table opened glass laid revolver crossed noticed bottle pistol drawer sleeve laying wing nerves pressed
Horses (100 Topics): horse colonel straker moor night bicycle boy stables stable john lad stranger miles ross trainer simpson silver horses led maid
Facial Appearance (100 Topics): face eyes hair white cut pale appearance entered staring colour clean cheeks blue faced middle shoulders making shaven strength forehead

50 Topics/20 words:

Train (50 Topics): train london station end late line close west reached points clue hour suppose imagine investigation give affair bridge roof examine
Murder (50 Topics): found dead body head struck lay drawn blood hand finally blow knife deep stick lying round terrible fell unfortunate master
Message (50 Topics): paper note read pocket book held handed written writing wrote hand drew reading sheet put post attention picked slip finger

Comparing History and Sports with Google Ngram Viewer

For my first graph, I used Google Ngram Viewer to visualize how often the names of three legendary presidents appear: George Washington, Thomas Jefferson and Abraham Lincoln. I did so to see how popular each president was during and after his tenure, and to compare their legacies decades after they left office.

Ngrams graph 1

At first the data looks a bit peculiar, because the line for President Lincoln has a minor spike at the far left of the graph, near the start of the time range. Aside from that, the largest spike on the entire graph occurs on Lincoln's line during his presidency, and it is much larger than any spike during or after Washington's or Jefferson's tenures. This may reflect the growing share of American writing in the corpus as the population and literacy of the United States rose during the 19th century. However, decades after all of their tenures, Lincoln's name still appears at a higher percentage than Jefferson's or Washington's. This suggests that Lincoln had a larger overall influence on our country, likely because of the major social issues of his tenure, such as slavery and the Civil War.
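The percentages on an Ngram graph are relative frequencies: for each year, the number of times a phrase occurs divided by the total number of n-grams of the same length in the corpus for that year. A minimal sketch of that calculation follows; the function name and all counts are hypothetical, and the sketch ignores the smoothing option the Viewer applies by default.

```python
def relative_frequency(counts, totals):
    """Percentage of all n-grams in each year that match the phrase.

    counts: {year: occurrences of the phrase} (hypothetical numbers)
    totals: {year: total n-grams of the same length that year}
    """
    return {
        year: 100 * counts.get(year, 0) / totals[year]
        for year in totals
        if totals[year] > 0
    }


# Made-up example: the corpus grows fourfold and so do the mentions
counts = {1850: 50, 1870: 200}
totals = {1850: 1_000_000, 1870: 4_000_000}
print(relative_frequency(counts, totals))
```

Because each year is divided by its own total, a name's line rises only when its share of the text grows, not merely because more books were published that year.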

My second graph displays the usage of the words baseball, cricket and football during the 19th century.

Ngrams graph 2

The line for baseball is very reasonable, since the sport was not played professionally until late in the century (the first openly professional team, the Cincinnati Red Stockings, formed in 1869). The line for football can also be trusted, although the football being referred to is largely what Americans now call soccer. The line for cricket, however, cannot be treated as a reliable measure of the sport, because the word cricket can refer to either the sport or the insect. In conclusion, at the turn of the 20th century it is not clear whether cricket or football (soccer) was the more popular sport, but baseball was clearly still relatively unpopular in comparison.