Topic Modeling Graphs – Jen Pereira

The results of my topic modeling graphs were incredibly interesting to me. In my first analysis, I combined the topics of “crime,” “police work,” and “murder/death.” In this graph I found that, while the topic of police work spikes at seemingly random points throughout 1903-1904, crime and murder/death tend to stay lower and roughly level across the two years in question. I did a bit of research and noticed that the two years graphed in this analysis were years of important sports events and a few protests. This could explain the spike in police activity without a corresponding spike in crime.

Crime vs. Police Work vs. Murder/Death

The second graph I charted compared the topics of travel and time. Travel spikes dramatically in 1908, with time rising as well during the same period. I discovered through outside research that travel was becoming increasingly popular throughout 1908: the year began with two expeditions around the world (one specifically from New Zealand to Antarctica); the Olympics were held in London in 1908 (which would have increased travel to the area); and finally, the first aircraft manufacturing company in England was founded in London. This would explain the increase in travel, as well as time.

Travel vs. Time

Another set of topics I compared and contrasted was Business/Commercial and Construction. I noticed in my graphs that there was a corresponding increase in both topics in 1925, and I wondered why that was. Looking at outside research, I noted that a great many economic and commercial events took place in 1925. For instance, primogeniture (the rule that the first-born son would inherit from the father) was abolished, Britain returned to the gold standard, the government granted a subsidy to the coal industry while it investigated the industry's issues, the first double-decker buses with covered tops were introduced, and various bridges and tunnels were constructed. These events could plausibly have influenced the Sherlock Holmes stories, and they may explain the spike in these topics in 1925.

Business/Commercial vs. Construction

Lastly, I decided to compare the topics of literature, description of clothing, and emotional verbs/actions. I thought that these topics were comparable as they all had to do with writing and, to an extent, education. I found that the most interesting time period to look at in this graph was 1891-1893. Literature often spiked dramatically first, and then the descriptive words would follow. One interesting fact that I discovered was that in 1891 elementary education was made free, allowing for an increase in literacy and education. This could explain why the spikes were so dramatic around this time period. Furthermore, in 1892 Scottish universities began accepting women, and in 1893 the Brontë Society was established (the oldest literary society) and the Elementary Education Act raised the age to leave school to 11. All these historical events were significant in the rise of literacy and education, and they may therefore explain the rise of the literature topic and of the descriptive words and actions that followed.

Literature vs. Description of Clothing vs. Emotional Verbs/Actions

 

Topic Modeling: Graphing the Results

The first topic is Travel:


 

In this graph, we see an increase in travel around 1893, and the only other spike occurs later, in 1904. Because the 1904 spike is not as high as the one in 1893, I decided to research why that might have happened. I found out that by the end of the 19th century, a new method of transportation had been invented. According to the website Primary Homework Help The Victorians, “In the 1890s they could travel by motor car.” Based on this research, I think that people decided to travel more after the invention of the motor car, which would explain the spike in 1893.

The second and third topics are Writing with Business:


In this graph, I decided to compare the topics of writing and business. Both topics spike at about the same time: Writing in 1903 and Business in 1904. Therefore, I decided to research this further to find out why this might be. Writing words appear most often in “The Adventure of the Three Students”. According to the plot summary in the Wikipedia article, there is a lot of writing in the story because it deals with students and a university. However, this does not explain why business words showed up so often, so I looked at another story that was published in 1904. Based on the Wikipedia article, business words appear fairly frequently in “The Adventure of the Abbey Grange” because it talks about how a man has been killed by the Randall gang. It is interesting that these words tend to rise and fall together; knowing this helps us understand the stories better, because it suggests whether a story’s topics will be about writing or business.

The fourth topic is Detective Case:


In this graph, we see a spike in detective cases around 1891 that is higher than the other peak. I decided to research why this spike happened when it did. According to the Wikipedia article about the Whitechapel murders, “The Whitechapel murders were committed in or near the impoverished Whitechapel district in the East End of London between 3 April 1888 and 13 February 1891.” Based on this research, it is possible that the Jack the Ripper case influenced the amount of detecting words in the Holmes stories in 1891.

The fifth topic is Death:


In this graph, we see a spike in death around 1903, and there is no other spike like it in the rest of the graph. According to The Guardian, “During the 1880s and 1890s, local authorities, the LCC and the Metropolitan Public Gardens, Boulevard and Playground Association began to clean up and reopen old burial sites.” It is possible that the actions of these authorities influenced the amount of death words in the Holmes stories, given that from 1893 onward there is a steady rise in the amount of death words. However, after researching, I am still not able to explain the sudden peak in 1903.

The sixth and seventh topics are Time with Crime:


In this graph, we see a spike for both time and crime in the year 1904. According to the Wikipedia article on “The Adventure of Charles Augustus Milverton”, which was published in 1904, the story is about the crime of blackmail. The article explains that, in order to help solve the case, Holmes visits Milverton’s Hampstead house, disguised as a plumber, in order to learn the plan of the house and Milverton’s daily routine. The daily routine, therefore, refers to time. Even so, it is evident that crime and time words appear in every Sherlock Holmes story.

The eighth and ninth topics are Physical Description with Building:


I decided to pair these two topics together because I wanted to see if there is a correlation between the two and also because they are both descriptions. Based on this graph, the amount of physical description words and building words tend to rise and fall together. The exception is the year 1904, when the amount of building words increases and the amount of physical description words is not as high. Then, after 1905, they do the complete opposite of each other: when the amount of building words rises, the amount of physical description words falls, or vice versa. It’s possible that this kind of correlation tells us that a story will have either more building words or more physical description words.
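
One way to put a number on "rise and fall together" is a Pearson correlation coefficient between the two yearly series: values near +1 mean the topics move together, and values near -1 mean they move in opposite directions. The sketch below is only an illustration. The yearly weights are invented (they are not read off the graph), and the names `physical` and `building` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical yearly topic weights, invented for illustration only.
years    = [1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908]
physical = [0.10, 0.14, 0.12, 0.13, 0.09, 0.15, 0.08, 0.16]
building = [0.08, 0.12, 0.10, 0.18, 0.14, 0.07, 0.15, 0.06]

# Split the series around 1904, the year the post singles out as an exception.
before = [(p, b) for y, p, b in zip(years, physical, building) if y <= 1903]
after = [(p, b) for y, p, b in zip(years, physical, building) if y >= 1905]

early = pearson([p for p, _ in before], [b for _, b in before])
late = pearson([p for p, _ in after], [b for _, b in after])

print(f"through 1903: {early:+.2f}")
print(f"from 1905 on: {late:+.2f}")
```

With real data, a handful of yearly points is very little to go on, so a coefficient like this is best read as a rough description of the graph rather than as evidence of a relationship.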

The tenth topic is Emotion:


In this graph, we see spikes in the years where emotion words show up most frequently: 1893, 1904, 1913, and 1924. I have come to the conclusion that the stories published in these years all contained female characters, based on the Wikipedia articles for “The Adventure of the Cardboard Box”, “The Adventure of Charles Augustus Milverton”, and “The Adventure of the Sussex Vampire”, and the bubble news article. Given that they all contained female characters, it’s possible that the amount of emotion words increased during these times because, according to the Wikipedia article, women were not considered equal in Victorian times. This helps us understand the stories better because we can connect them to how the past really was.

Topic Modeling Graph Results

I wasn’t sure how to label axes in Google Fusion Tables (oops), but in my graphs the X axis represents the publication year and the Y axis represents theme frequency. Overall, I liked thinking about the graph results and musing over what the data might represent.
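
For anyone rebuilding a graph like this outside of Fusion Tables, here is a minimal sketch of how the two axes could be computed. It assumes (and this is only an assumption, since the post does not say how the frequencies were calculated) that each story has a publication year and a single weight for the topic, and that stories from the same year are averaged. The `rows` values are made up.

```python
from collections import defaultdict

def mean_weight_by_year(rows):
    """Average a topic's weight over all stories published in the same year.

    rows is an iterable of (year, weight) pairs, one per story.
    Returns a dict {year: mean weight}, ordered by year.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, weight in rows:
        totals[year] += weight
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}

# Hypothetical (publication year, topic weight) pairs, invented for illustration.
rows = [(1893, 0.30), (1893, 0.10), (1904, 0.12), (1904, 0.08), (1911, 0.02)]

series = mean_weight_by_year(rows)          # X axis: keys, Y axis: values
peak_year = max(series, key=series.get)     # year with the highest average

for year, weight in series.items():
    print(year, round(weight, 3))
print("peak:", peak_year)
```

The keys of `series` are what goes on the X axis and the values are what goes on the Y axis.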

Gun: There was a large increase in this topic from December 1st 1893 to October 12th 1893. In 1893, The Final Problem was published. Although Holmes dies (insert massive question mark here) in the story, it isn’t gun related. He plummets to his death (insert another massive question mark here) at Reichenbach Falls with Moriarty. However, he is beaten with a police baton, so maybe my topic is faulty. The topic drops the next year, rises again in 1904, and then falls until 1911. After this, the graph shows spikes in 1917, 1922, and 1925. I looked up guns in Victorian London using victorianlondon.org and found an entry detailing a gun-involved murder from 1876. Given the later dates, and presuming that I didn’t mess up the topic, maybe it’s that guns became more available and more recognized in crime stories.

Gun topic


Topic Modeling

Iterations: 1500

Topics: 20

Topics Printed: 20

  1. ESTATE: house road passed round side place walked carriage garden left master dog horse hall drive path led ground walk standing
  2. CRIME: man young found inspector house father colonel dead police heard death body attention son crime evidence murder returned dangerous hopkins
  3. PUBLIC: street found home back station train lord james baker minutes st occurred waiting reached cab hours police late order town
  4. SEARCH: thought time make give made leave knew great find back hear place things doubt bring chance lost position fellow danger
  5. EXPRESSION: cried face turned back hands instant hand suddenly head moment voice words sprang forward eyes fell appeared feet lips threw
  6. APPEARANCE: man eyes face black looked red dark white figure deep hair thin features hat heavy drawn tall appearance blue sharp
  7. REASONING: clear mind point reason question person find matter idea make means absolutely secret presence impossible save excellent aware explanation sign
  8. CASE: interest sherlock case facts strange remarkable friend singular account london cases arthur nature problem extraordinary details public effect give find
  9. SILENT REFLECTION: holmes chair sat gave mrs companion fire fresh visitor rose pipe easy start table glanced silence cold silent horror change
  10. BUSINESS: business london money men papers set office answered letters brother made hundred work address man considerable great company west mycroft

Sherlock Holmes’ Short Stories, Topic Modeling

For this project I started off with 5,000 iterations, 20 topics and 10 words printed, but I realized the words seemed too different, or many were repeated, and I couldn’t easily put a topic on them. I tried a couple more times with fewer iterations, more topics and more words, and as I went down in iterations and up in topics and words I started to get ones that I liked. After trying numerous different options I settled on 2,500 iterations, 30 topics and 20 words, which made it easy to get a topic from each list.
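
To make the three settings above more concrete, here is a toy sketch of a collapsed Gibbs sampler for LDA, a common approach in which "iterations" and "number of topics" are the main settings. This is not the tool used for this project (the post does not name it), and that the tool works this way is an assumption. The function names, the `alpha` and `beta` defaults, and the four tiny "documents" are all hypothetical. `iterations` is how many times every word gets reassigned, `num_topics` is how many word lists come out, and `words_printed` is how many top words are shown per list.

```python
import random

def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns one {word: count} dict per topic.
    alpha and beta are smoothing values chosen for illustration only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]      # tokens per (doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                    # tokens per topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Each iteration revisits every token and resamples its topic.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_words(topic_word, words_printed):
    """Return the words_printed most frequent words for each topic."""
    return [
        sorted(counts, key=lambda w: (-counts[w], w))[:words_printed]
        for counts in topic_word
    ]

# Four tiny made-up "documents", for illustration only.
docs = [
    "police inspector crime murder police arrest".split(),
    "train station carriage journey train road".split(),
    "crime murder inspector police evidence".split(),
    "station train road carriage journey".split(),
]

topics = top_words(lda_gibbs(docs, num_topics=2, iterations=200), words_printed=3)
for number, words in enumerate(topics, start=1):
    print(number, " ".join(words))
```

Rerunning with a different `num_topics` or `words_printed` changes how many lists there are and how long they are, which is the same kind of adjustment described above.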

Topics:

Murder

1.”found, left, lay, end, body, dead, path, ground, feet, death, foot, blood, ran, blow, knife, carried, water, lying, showed, mark”

Travel

2.”house, road, station, place, train, reached, past, line, carriage, direction, drive, haul, walk, back, town, country, drove, dog, pulled, round”

House

3.”room, door, window, open, opened, bed, entered, floor, bedroom, key, heard, closed, sound, passage, inside, step, sitting, safe, light, rushed”

Description

4.”face, eyes, man, black, dark, white, red, spoke, hair, thin, drawn, tall, appearance, features, blue, deep, pale, sharp, mouth, middle”

Religion

5.”wife, told, life, knew, woman, heat, girl, god, secret, hands, speak, love, truth, child, married, sake, thing, mine, understand, loved”

Divorce

6.”lady, woman, Mrs., left, back, husband, bring, pour, brought, story, maid, heard, told, happened, creature, gentleman, beautiful, terrible, real, live”

Schedule

7.”morning, night, day, doctor, clock, hour, morrow, Dr., news, hours, yesterday, days, evening, early, state, breakfast, telegram, return, late, surprise”

Job

8.”London, business, money, time, man, years, office, Hopkins, hundred, twenty, company, pay, west, pounds, country, thirty, thousand, paid, city, advertisement”

Investigation

9.”police, inspector, found, house, crime, made, murder, night, attention, London, shot, tragedy, dead, remained, reason, arrest, attempt, moment, official, charge”

Performance

10.”face, instant, moment, cried, eyes, voice, turned, suddenly, sprang, forward, through, hands, sat, air, caught, struck, quick, sudden, strange, dreadful”

Topics

100 topics, 20 words, 10,000 iterations

Crime: police found case evidence arrest charge constable undoubtedly appeared arrested court robbery official avoid credit unfortunate confession referred instantly jury

Love: woman wife husband love knew married loved women influence give daughter life lived power marriage strong died true fit beauty

Money: money business hundred pounds pay thousand worth price sum single paid check ruin fifty ten offer advance terms fifteen buy

Facial Expressions/Face: face eyes pale white nervous thin lips turned colour forehead angry cheeks told nerves frightened spoke excitement staring breath brow

Chillin’ Like Sherlock: pipe sat fire asked smoke tobacco silence opposite cigar lit room sitting armchair smoked fresh original rest habits dull bachelor

Male Descriptive Words: man hair dark cut middle white tall appearance beard handsome clean age features gentleman pleasant fashion elderly bore short bearded

Detective Words: case watson remarked interest interesting investigation answered methods prove client full cases clue exceedingly greatest contrary obliged result art afraid

Investigation: sherlock arthur adventure cases doyle conan friend public long facts problem notes details years series year famous practice feel record

Sailing: long wind ship sea peter captain boat carey rising london bound board securities spirit names rain pulled seaman cleared command

Death: body lay head found dead blow knife lying drawn shot weapon fell blood finally revolver heavy wound unfortunate carried bullet

Topic Modeling Holmes

1000 iterations, 50 topics, 20 topic words

Nationality: country England world American fear great English law real days present living set bound meet south date british year baron

Writing: paper read note letter book hand papers pocket table written put handed letters writing sheet post wrote write document held

Tracks: boy examined feet blow carefully examination mark left ran wood found path marks stick impression marked showed edge foot unfortunate

Reaction: turned hand face instant moment rushed fell eyes voice felt dreadful horror air white lips arms milverton threw minutes cold

Numeral Values: years money time ago twenty sister hundred months thirty age considerable weeks pounds days club year month named sum

Simple Explanation: case point give points facts curious fact investigation obvious clue incident idea events theory admit solution simple connection criminal explanation

Suspense: head back struck suddenly sprang forward caught quick instant cry began sight eye held feet step dropped broke ears stepped

Family Scandal: woman lady wife husband love girl life child secret maid married beautiful dear character ferguson strong mistress ill daughter engaged

Stakeout: road side place carriage hall dog high direction miles drive house cottage pulled yards passed village bicycle houses trap mile

At Sea: line end thought water full lay place fall black wind ship peter sea memory forever walked captain stackhurst boat brandy

Topics

100 topics, 15 topic words, 5,000 iterations

Face/head: eyes face lips pale expression hands eager hot thin white cheeks fixed grey brow emotion

Relationship: life woman love heart knew loved break mine truth evil power world women hands told

Money: hundred money ten business pounds year thousand worth large made pay paid price fifty sum

Detective: facts case theory simple fact points difficulty formed suggest correct impossible idea test remained explanation

Travel: train station carriage line reached journey started bridge drove return hurried cross roof passing save

Protection/Security: door room key opened safe closed inside long study lock locked shut entered bag fastened

Room Descriptions: round corner side table front left stood dressing covered top square carpet gown furnished books

Appearance: man face dark tall hair features eyes head middle thin figure clean beard cut gray

Writing: note letter paper book letters wrote written writing write read handed date envelope slip address

Time: years ago time twenty months thirty week age daughter weeks lives quiet meet retired engaged

 

Topic Modeling

200 topics, 7500 iterations, 10 topic words

Physical Description- man, hair, cut, middle, appearance, short, clean, faced, eagerly, shaven

Crime Scene- dead, body, found, knife, blood, cut, wound, death, heavy, lag

Travel- train, station, carriage, journey, roof, bridge, started, body, leave, ticket

Writing- paper, note, wrote, written, book, handed, writing, sheet, slip, write

Smoking- room, pipe, sat, fire, cigar, tobacco, chair, smoke, lit, writing

Marriage- woman, love, wife, husband, loved, knew, life, heart, married, women

Crime Solving- case, interest, points, remarkable, facts, singular, problem, fact, experience, solution

Time- night, hour, late, quarter, clock, work, twelve, eleven, time, ten

Business- hundred, money, pounds, business, thousand, company, price, sum, terms, pay

Light- light, lamp, darkness, dark, match, lantern, gas, lit, heavily, burning

DeFranco_Topic Modeling Assignment

Iterations: 2000
Number of Topic Words: 15
Number of Topics: 150

Body/Posture: face eyes turned caught stood looked sunk glimpse mouth staring breast sat chin sight covered

Attack: man found dead body blood knife struck blow weapon fell wound stick head picked wounded

Crime: crime murder night appeared committed scene charge criminal motive violence arrest discovered lucas tragedy police

Smoking: pipe sat fire lit silence tobacco smoke cigar opposite smoking cigarette smoked peculiar thoughts handed

Family/Household: family england people real year high children live friends folk history shows household governess gather

Writing a Letter: paper read note written wrote handed sheet letter writing post write slip pen tossed printed

Neighborhood: house passed garden door walk gate lane cottage walking park windows dark grounds road knocked

Clothing/Wardrobe: black coat dressed hat st clair broad cap wore dress collar den eye trousers coloured

Thought: mind clear idea make remember observed vague effort easily forced absurd draw suspected memory false

Marriage: woman wife husband love life loved knew married girl women nature marriage marry power lover