Topic Modeling Sherlock Holmes’ Short Stories

After trying different settings in the Topic Modeling Tool, I chose the results from 2000 iterations, 40 topics, and 20 words printed per topic.

TOPICS:

1) Investigation

“case point fact facts points remarked investigation evidence mystery interest follow simple theory incident clue confess obvious problem curious afraid”

2) Time

“day morning days evening made news surprised telegram called meet order explain yesterday week hours spent return caused received longer”

3) Violence

“man found dead body death blood struck terrible dreadful creature poor blow knife wild ground unfortunate stick horrible lay picked”

4) Location

“house road side passed hall front place dog drive windows round drove standing direction high houses yards scene building square”

5) Sitting

“chair sat room fire half laid rose pipe bell arm glass lay asked silent seated alarm lit lamp cigar smoke”

6) House

“room window door entered open bed table left key bedroom study inside round safe sitting rushed floor locked lawn dressing”

7) Appearance

“black man red face dark hair thin features tall head appearance figure white middle dress blue eyes glasses yellow faced”

8) Relationship

“husband knew thought heart wife girl told love truth child mind back gentlemen break loved leave mine met mary ferguson”

9) Mystery

“door light stood window opened heard open dark sound passage closed steps silence ran instant front pushed suddenly sharp drew”

10) Conversation

“asked give answered thought matter thing make time business good call start taking kind rest turn happy questions wished excuse”

Lauren Gao’s Topic Modeling

Using Mallet’s topic modeling program, DHM293 ran all 56 Sherlock Holmes short stories with various settings. After varying the number of iterations and topics, I settled on 100 topics and 2000 iterations, with 20 words in each topic.
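For readers who want to reproduce a run like this from the command line, here is a minimal sketch in Python that assembles a `mallet train-topics` command with the settings above. It assumes the stories were already imported into a Mallet file with `mallet import-dir`; the file names `holmes.mallet` and `topic_keys.txt` and the helper `build_train_command` are hypothetical, and this is not necessarily how the class ran the tool.

```python
import os
import shutil
import subprocess


def build_train_command(input_file, num_topics, num_iterations, num_top_words,
                        keys_file="topic_keys.txt", mallet_bin="mallet"):
    """Return the argument list for a `mallet train-topics` run (hypothetical helper)."""
    return [
        mallet_bin, "train-topics",
        "--input", input_file,                      # corpus imported beforehand
        "--num-topics", str(num_topics),            # how many topics to infer
        "--num-iterations", str(num_iterations),    # sampling iterations
        "--num-top-words", str(num_top_words),      # words printed per topic
        "--output-topic-keys", keys_file,           # where the topic word lists go
    ]


# Settings reported in this post: 100 topics, 2000 iterations, 20 words per topic.
# "holmes.mallet" is an assumed file name, not one taken from the post.
command = build_train_command("holmes.mallet", 100, 2000, 20)
print(" ".join(command))

# Only launch the run if Mallet is on the PATH and the input file exists.
if shutil.which("mallet") and os.path.exists("holmes.mallet"):
    subprocess.run(command, check=True)
```

Changing the three numeric arguments is all it takes to try the other settings mentioned in these posts, such as 40 topics with 20 words or 150 topics with 30 words.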

1) Murder in Sherlock Holmes

crime death police murder reason charge scene tragedy night committed arrest violence evidence murdered motive constable caused suspicion escape attempt

2) Watson

watson dr doctor friend means surprised matter natural blessington amberley patient disease days medical continued knowing reasons armstrong trevelyan brougham

3) Men in Sherlock Holmes Stories

face man eyes dark thin tall expression figure features looked beard voice middle manner handsome gray clean age huge fierce

4) Women in Sherlock Holmes

woman wife husband love life knew girl loved married lady women rich daughter soul beautiful power nature beauty marriage young

5) Transportation in Sherlock Holmes

home minutes cab waiting heard wait glad ten ha walking twenty church quiet send reach talking feel driven long drove

6) Deducing in Sherlock Holmes

case facts points explanation fact simple theory admit investigation give solution problem confess correct present obvious formed probable connection false

7) Holmes’ mannerisms

holmes head hands shook easy smiled sank sunk breast short forehead gesture rubbed began forward clapped despair branch leaning eagerly

8) Villains in Sherlock Holmes

great doubt criminal dangerous country brain set career act failed makes gang cunning power war europe compelled sufficient traced remains

9) Smoking in Sherlock Holmes

sat pipe fire looked time cigar tobacco smoke asked sherlock corner long chair armchair smoked lit roylott smoking moran observe

10) Accents in Sherlock Holmes

don ll ve won talk thing give answered didn bit ready bad couldn wait eh minute masser wouldn isn lucky

 

 

Topic Modeling Project

Here are my (informed) guesses for the theme names. The results come from a run with 150 topics, 2,000 iterations, and 30 words per topic.

Clues: paper note read letter papers word book answer letters table importance handed short written wrote received message writing pocket attention sheet write account reading von post secret picture men document

Reaction: face cried hands eyes turned instant holmes suddenly back voice looked head sprang forward words spoke lips threw moment step quick hand feet shook raised amazement staring moved burst stepped

House: room door window open light opened entered floor bed key bedroom closed lamp sound passage safe table bell steps sitting rushed inside ran pushed locked dressing lawn stair stood study

Crime: found police crime place death evidence brought case murder hopkins dead finally making tragedy body remained person arrest unfortunate attempt order hotel violence prisoner missing inspector inquiry appeared naturally weapon

Time: night morning day hour house back evening clock work late heard happened quarter room past start early breakfast returned waited hurried usual mrs left morrow arrived ten signs occurred twelve

Gun: back heard hand head stood lay long struck held turned fell deep round sight dreadful caught blood moment body cry horror strange revolver shot blow lying gave surprise dropped arms

Case: watson case point find dear clear doubt end points possibly affair person follow investigation obvious surely simple difficult clue perfectly confess theory close admit remarked undoubtedly prove suggest formed solution

Estate: house round side front road passed place long hall carriage garden high drive walked direction windows drove dark led corner pulled reached line square standing coming low past miles slowly

Travel: street found station train lord baker office waiting st cab minutes evening started yard half quiet gentleman company west made monday town detective official afternoon scotland service home reached engaged

Physical Description: face eyes black man red dark white figure hair thin light side tall heavy fashion hat appearance blue expression pale broad dressed features mouth centre middle yellow dress sharp brown

Trends Discovered By Topic Modeling Sherlock Holmes Stories

Here is a compilation of topics I have gathered through the topic modeling project.

Settings: 75 Topics, 2,000 Iterations and 20 Topic Words Printed

Female/Romance Issues 2. lady woman left maid young told beautiful fear sudden drawn truth mistress strong voice beauty late send frightened refused fiend

Writing 3. paper note read letter book letters written handed wrote pocket writing write sheet post address drew envelope slip document description

Banking 9. money business hundred pounds company pay thousand price fifty sum terms asked manager firm worth offer ruin milverton hard paid

Murder 11. crime police evidence murder death night arrest tragedy violence charge murderer committed arrested attention escape discovered unfortunate murdered motive instantly

Individual Descriptions 25. black hat heavy dressed brown side broad coat round boots observe grey dress slight nose double pair chin yellow sharp

Romance 29. years time ago father met mother twenty months year death age life weeks married lived friends month sister home uncle

Family Life/Values 40. wife husband woman love life child knew girl loved daughter understand character women ferguson nature heart devoted influence america married

Ominous Scene 42. light dark lamp window stood long sound silent darkness shadow heavily low sharp black lit standing shining yellow vague whispered

Killing/Death 48. found body dead man blood head knife lay stick lying blow house weapon close long wood finally wound carried burned

Morning 75. night morning clock heard early day breakfast hour morrow arrived sleep twelve dropped surprise eleven bright caused late weary started

Best Selling Cars in 2014

https://www.google.com/fusiontables/data?docid=1nQCcpLRRsesXRw9SnXVKGJQTQBgQ2JQ1ZHxsAYf-#card:id=2

Screen Shot 2015-03-14 at 6.08.51 PM
What gas prices? Full-size pickups make up the top three. Then again, a large share of them are used in commercial applications.

 

Screen Shot 2015-03-14 at 6.09.10 PM

Screen Shot 2015-03-14 at 6.09.37 PM

Screen Shot 2015-03-14 at 6.09.47 PM

Screen Shot 2015-03-14 at 6.13.53 PM
Most cars on this list originate from the United States or Japan. Only a few come from South Korea or Germany. That doesn’t mean there aren’t any red-hot Ferraris sold here.

 

Screen Shot 2015-03-14 at 6.14.25 PM

Screen Shot 2015-03-14 at 6.14.31 PM
A breakdown of sales by brand. Ford has the most sales on the list.

 

Screen Shot 2015-03-14 at 6.14.34 PM

Screen Shot 2015-03-14 at 6.14.41 PM

 

Sorry this post has so many photos; I’ll make it up to you all by posting this adorable husky.

Lauren Gao’s Google Fusion Tables: Where in the World did Nancy Drew Go?

Well, wrong game. However, for this week’s assignment, I put together 10 of the Nancy Drew adventure series games to practice using Google’s Fusion Tables. While ten games, each with different locations, are not enough to grasp how well-traveled Nancy Drew is, I collected information on where each game was located, what year the game was published, how many supporting characters there were, and what kind of mystery Nancy Drew had to solve. Most games were set in actual states or locations, but often the town or building in question was a fictionalized location. A trend that came up during my data visualizations was that the later the year, the more supporting characters the game was likely to have. Additionally, the type of mystery with the most supporting characters was “Robbery”, which was the mystery in the game “Secret of the Scarlet Hand”.

https://www.google.com/fusiontables/DataSource?docid=1T8kwjpf1RqWo8QU5uDq4abMY3MH-c-P2IIAx0Aei

Screenshot (36)

Screenshot (37)

Screenshot (38)

Screenshot (39)

 

Screenshot (40)

Screenshot (42)

 

Google Fusion Tables: an easy way to create data visualizations

I selected some of my favorite movies from different genres and nationalities because I was curious how much each one cost to produce. For movie series, I chose the installment I like most: Harry Potter and the Half-Blood Prince; Star Wars Episode III: Revenge of the Sith; Back to the Future Part II; Hunger Games: Catching Fire. I also chose two Brazilian movies that I admire very much. As I expected, the Brazilian productions had much lower budgets than the Hollywood creations, and it is nice to verify this through visualizations.

Chart-card
Default Card image.
pie-graph
My movie preferences by genre.
bar-grah-CORRECT
Comparison of movies’ budgets.
Location-studios-2
Location of the studios. It is interesting to observe that most of the continents host one of my favorite movies’ studios.
Network-graph
Genres such as Animation and Science Fiction share similar locations.

Link to Google Fusion Tables:

https://www.google.com/fusiontables/DataSource?docid=156_b0bEG8Url9J8yqe3xm5m7bFQlQDOQgBEDECcv

Link to the spreadsheet:

https://docs.google.com/spreadsheets/d/1PX_0hpj46zaOQBs3ZmVjtjPjk1kJD-1I0h_OpVHIAzc/edit#gid=0

Spring Must be Here: Baseball and Numbers, Together Again

My Fusion Tables Link and Google Sheets Link

Screen Shot 2015-03-12 at 8.58.46 PM
Pictures and facts for all ten New York starting pitchers.
Screen Shot 2015-03-12 at 8.59.51 PM
This heat map accentuates New England and the Dominican Republic; both areas are home to two NY pitchers. Masahiro Tanaka, a recent signee from Japan, is the only pitcher not from the Western Hemisphere.
Screen Shot 2015-03-12 at 9.29.37 PM
Although CC Sabathia stands alone with 14 years, Harvey, Wheeler and Pineda cluster together in the network, each with 2 years pitched.
Screen Shot 2015-03-12 at 9.11.55 PM
Bartolo Colon and CC Sabathia represent more than half of the collective professional experience of the pitchers.
Screen Shot 2015-03-12 at 9.12.20 PM
Note the overall slope, as velocity decreases with age.
Screen Shot 2015-03-12 at 9.12.33 PM
The majority of the NY pitchers are under 30, and Bartolo Colon is timeless.



If you made it this far thinking about baseball and numbers, congratulations! Let’s elect the pitcher with the “Best Hair”:

images
Jenrry Mejia
images-1
Jacob deGrom