I played around with a few different numbers of topics, topic words, and iterations in class. I ended up choosing topics from lists generated with 50 topics and 1000 iterations, 25 topics and 2500 iterations, and finally 50 topics and 2000 iterations. I also tried making lists with as few as 200 iterations, but I couldn’t make much sense of them. I noticed that the higher the number of iterations, the longer it took the program to generate the lists. This made sense because it was going through the text significantly more times than with fewer iterations. The first topics, which I chose from our starting point of 50 topics and 1000 iterations, were money, murder, Sherlock’s study, and women. From 25 topics and 2500 iterations I chose the topics crime, letter/message, and Sherlock. The final group of topics, from 50 topics and 2000 iterations, were journey/travels, appearance, and case. More iterations and more words per topic definitely help in deciphering what the topics are. Through all the different lists I compiled using different settings, it was clear that certain themes were always present, such as crime, murder/violence, and words surrounding solving cases. Using MALLET to topic model the Sherlock Holmes stories definitely helped to show the many themes present throughout the stories, but I found it rather difficult, even having read some of the stories and being familiar with what Sherlock Holmes is about. Some word lists made no sense to me at all. Overall, using topic modeling you can get the gist of the main underlying themes featured in Sherlock Holmes, but to make sense of some of the names and words that come up in the lists you still need to read the stories to gain more detail and understanding. As we talked about in class, I think topic modeling serves to help close reading, but alone the data is too general to make use of.
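For anyone who wants to repeat these runs, here is a minimal sketch in Python of how the three settings above might be turned into MALLET command lines. The file names (`sherlock.mallet` and the `keys_...` and `doc_topics_...` outputs), the bare `mallet` executable name, and the choice of 20 words per topic are assumptions for illustration, not a record of what was used in class. The sketch only builds the commands; it does not run them.

```python
from typing import List, Tuple

# (number of topics, number of iterations) for the three runs described above.
RUNS: List[Tuple[int, int]] = [(50, 1000), (25, 2500), (50, 2000)]


def build_train_command(num_topics: int, num_iterations: int,
                        num_top_words: int = 20,
                        corpus: str = "sherlock.mallet") -> List[str]:
    """Return the argument list for one `mallet train-topics` run.

    `corpus` and the output file names are hypothetical; 20 top words
    is an assumed setting, since the post does not state one.
    """
    tag = f"{num_topics}t_{num_iterations}i"
    return [
        "mallet", "train-topics",
        "--input", corpus,
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--num-top-words", str(num_top_words),
        "--output-topic-keys", f"keys_{tag}.txt",
        "--output-doc-topics", f"doc_topics_{tag}.txt",
    ]


# One command per run; more iterations means more sampling passes,
# which is why the longer runs take more time to finish.
commands = [build_train_command(topics, iters) for topics, iters in RUNS]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```

Each printed line could be pasted into a terminal, or the list passed to `subprocess.run`, once the paths match your own setup.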
Class Blog 1
Sherlock Holmes Topic Modeling (10)
No. of Iterations: 1000
No. of topic words printed: 20
Topic Modeling (10)
Number of Topics (40)
1. Deliberation: case, fact, reason, facts, explanation, mystery, obvious, idea, simple, shown, great, effect, prove, evident, impossible, solution, theory, observed, probable, story
Number of Topics (50)
2. Investigation: crime, police, evidence, murder, case, attention, account, death, tragedy, arrest, mark, occurred, inquiry, missing, unfortunate, discovered, charge, complete, naturally, committed
3. Attributes: man, face, eyes, dark, figure, looked, tall, head, drawn, black, features, mouth, thin, middle, appearance, deep, huge, beard, nose, lines
4. Text: paper, note, read, letter, letters, book, handed, table, papers, written, message, writing, wrote, address, short, sheet, post, write, importance, document
5. Expression: face, eyes, turned, lips, spoke, appeared, light, suddenly, pale, manner, sat, staring, sank, expression, nervous, excitement, silent, eager, breath, fixed
Number of Topics (60)
6. Homicide: found, dead, body, left, dreadful, finally, carried, terrible, blow, lying, round, knife, stick, fell, brought, horrible, single, strong, weapon, person
7. Frontyard: road, house, carriage, side, drive, hall, front, direction, drove, back, garden, place, walked, station, yards, pulled, passed, stopped, gate, grounds
8. Setting: room, door, open, window, entered, opened, key, rushed, closed, bedroom, passage, instant, locked, floor, stair, pushed, lock, stairs, led, safe
9. Path: path, passed, showed, foot, round, water, led, track, leaving, ran, walked, edge, traces, feet, hard, grass, marks, fall, lay, ground
10. Mycroft: london, office, brother, suppose, papers, west, mycroft, young, company, evening, monday, club, card, foreign, fog, clerk, pycroft, pocket, daily, government
I kept the number of iterations at 1000 and the number of topic words at 20. I only experimented with the number of topics. I found that the lower the number (ten, for example), the more general the words were, which made the meaning of word combinations difficult to pinpoint. I ended up using 40, 50, and 60.
At times, I found it difficult to understand why some words appeared alongside other terms. I think this is because I haven’t read many Sherlock Holmes stories and I don’t understand some associations. The topics that I did end up using are those whose terms I strongly associate with Sherlock Holmes. Deliberation, investigation, and homicide relate very much to the overall Sherlock Holmes story line. I think these terms are more general and broad. The other terms (frontyard, attributes, text, setting, path, and expression) are more specific. These are the kinds of things Sherlock would use during an investigation, as well as to DO an investigation. These kinds of terms would mostly be used in the middle part of the stories, during the investigation.
Mycroft, of course, is Sherlock’s brother.
Dan Albrechts’ Topics.
Here are my ten topics with the words that constituted them.
10, 2000, 30:
Morbid: Man, Face, Eyes, Head, Hands, Turned, Back, Moment, Sat, Looked, Cried, Woman, Held, Thing, Dead, Voice, Deep, Long, White, Spoke, Struck, Told, Mind, Lay, Gave, Full, Blood, Surprise, Fashion.
Services Request: Sir, Matter, Find, Make, Note, Doubt, Letter, Clear, Gentlemen, Dear, Surely, Book, Order, Present, James, Fear, Position, Imagine, Offices, Letters, Point, Some, Danger, Importance, Important, Mystery, Call, News.
30, 1000, 15:
Death: Found, Lestrade, Man, Dead, Death, Lay, Evidence, Blood, Long, Carried, Murder, Moment, Crime, Master.
Details: Face, Eyes, Looked, Red, Dark, White, Spoke, Hair, Features, Lips, Thin, Drawn, Figure, Appearance.
Setting: Day, London, Morning, Evening, Train, Doubt, Station, Days, Made, Return, Hours, News, Surprised, Found, Police.
Victorian Women: Lady, Woman, Wife, Life, Mrs, Husband, Poor, Young, Girl, Heart, Maid, Love, Child, Married, Story.
Deep Thought: Chair, Sat, Hand, Long, Turned, Back, Fire, Light, Head, Rose, Lamp, Laid, Drew, Companion, Cold.
100, 500, 20:
Action: Suddenly, Forward, Sprang, Instant, Quick, Hands, Step, Feet, Cried, Sight, Appeared, Stepped, Broke, Coming, Burst, Bent, Empty, Rage, Sleeve, Stairs.
Physical: Man, Dark, Tall, Dressed, Beard, Thin, Corner, Great, Use, Fashion, Pair, Description, Handsome, Suit, Seated, Powerful, Glasses, Bearded.
Terror: Eyes, Face, Spoke, Pale, Coming, Fear, Turned, Nervous, Voice, Horror, Thin, Dreadful, Staring, Frightened, Told, Cheeks, Emotion, Terror, Agitation, Sitting.
Analyzation of Topics
My first four topics (crime, death, family, and messages/notes) were found using 25 topics, 2000 iterations, and 20 topic words printed, with the stop words removed. My next three topics (characteristics of a man, love, and house/home) were found using 30 topics, 1500 iterations, and 20 topic words printed, with the stop words removed. My final three topics (traveling, evidence, and time) were found using 20 topics, 2500 iterations, and 20 topic words printed, with the stop words removed. Despite having changed the numbers of topics and iterations multiple times, themes relating to crime, mystery, and investigation kept coming up, showing the main aspect of each of the Sherlock Holmes stories. This shows that topic modeling succeeds at highlighting the recurring topics that multiple texts have in common.
I found it most difficult to come up with common topics for the words that came up when I decreased the number of iterations. However, those topics (characteristics, love, and house/home) all ended up making sense with regard to the Sherlock Holmes stories. Describing one’s characteristics is part of solving any mystery, the theme of love is part of the backstory given in each text of those involved in or affected by the mystery/crime, and house/home can represent Holmes in his house/room or the home where a crime or mystery is being solved. Being somewhat familiar with various Sherlock Holmes stories definitely helped me recognize the topics; if I had never read any Holmes stories, I would have had more trouble.
Overall, MALLET successfully navigates multiple texts in an efficient manner to point out common topics. The fact that it offers additional information about these topics, such as the percentage of a topic’s presence in a text, makes it helpful in pointing out themes and ideas that one may have overlooked during a close reading of the text.
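To show how those percentages might be read, here is a minimal Python sketch that parses a doc-topics file and finds the most prominent topic in each text. It assumes the tab-separated layout in which each row holds a document index, a document name, and then one proportion per topic; some MALLET versions write topic/proportion pairs instead, so it is worth checking your own file first. The sample rows and story file names are made up for illustration and are not results from the Sherlock Holmes corpus.

```python
from typing import Dict, List

# Made-up sample in the assumed layout:
# index <TAB> document name <TAB> one proportion per topic.
SAMPLE = (
    "0\tstory_a.txt\t0.10\t0.70\t0.20\n"
    "1\tstory_b.txt\t0.55\t0.05\t0.40\n"
)


def parse_doc_topics(text: str) -> Dict[str, List[float]]:
    """Map each document name to its list of topic proportions."""
    result: Dict[str, List[float]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any header comment
        fields = line.split("\t")
        result[fields[1]] = [float(value) for value in fields[2:]]
    return result


def top_topic(proportions: List[float]) -> int:
    """Return the index of the topic with the largest share of a document."""
    return max(range(len(proportions)), key=lambda i: proportions[i])


doc_topics = parse_doc_topics(SAMPLE)

if __name__ == "__main__":
    for name, props in doc_topics.items():
        best = top_topic(props)
        print(f"{name}: topic {best} at {props[best]:.0%}")
```

Sorting the documents by one topic’s proportion would be a small extension, and could help decide which story to reread closely for a given theme.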
Topics for modeling Sherlock Holmes
50 topics
1000 iterations
20 topic words printed
Place: house side road passed walked front round garden hall windows path corner direction window standing ran houses yards led bicycle
Murder/death: found left body blood lay brought examined revolver round examination ground knife carefully wood death stick marks track dead spot
Letter/note: paper note read letter book pocket letters handed wrote written writing write sheet post document slip table reading date envelope
60 topics
700 iterations
15 topic words printed
Woman: woman lady wife young mrs girl love life husband child miss married story daughter beautiful
Spirits/ghosts: doubt lost danger dangerous clear life criminal law friend memory powers presence death care fear
Time: night heard morning evening clock ten past waiting house thirty usual surprise found quarter quiet
Crime: house found examined night body showed show clue signs finally death proved carefully carried servant
Money: years money ago twenty hundred lady king pounds gold months pay photograph age year thousand
Deduction process: case interest facts points point investigation remarked give follow incident theory interesting obvious run conclusion
Exploring Sherlock Holmes Stories and Topic Modeling
When using MALLET for topic modeling, I was surprised by how quickly the tool ran its algorithms to sort through over two thousand text elements. I assumed it would take much longer to go through 1000 iterations. After playing with the number of topics, I decided to pick three categories from each of the four HTML outputs, which were composed of 75, 50, 30, and 15 topics. Though I started with 50 topics on a list and moved upward, I found that having over 100 topics on a list was a lot to sift through; besides being visually overwhelming, some of the general themes began repeating after 100. My twelve selected topics are undoubtedly representative of Sherlock Holmes’ world. The topics I’ve chosen to label include:
Found Corpse, Crime in London, Murder, Bedroom,
Baker Street, Holmes in his Room, Attack, Investigation,
Examine, Sudden, House, Holmes Sitting
In three of the four groups of topics, there is at least one topic related to crime. “Found Corpse” (from 75 topics), “Murder” (from 50 topics), and “Attack” (from 30 topics) all reference the violent crimes that make Holmes’ mysteries so engaging. Though the number of topics differs quite significantly between the lists of 75 and 30, the word blood still made it into both groups, showing its definite relevance and recurrence. In lists from two different outputs, there are references to Victorian London. The output of 75 topics includes a group of words that I labeled “Baker Street” and the output of 50 topics has a topic I named “Crime in London.” Another commonality among HTML outputs of varying topic numbers was groups of words relating to investigation or examination.
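The overlap between these crime topics can be checked directly. The following is a small Python sketch, not part of the original exercise, that takes the three crime-related word lists given further down in this post and reports which words they share. The helper name `shared_words` is my own.

```python
# Word lists copied from the topic lists in this post.
TOPICS = {
    "Found Corpse (75 topics)": (
        "body found lay son blood dead lying knife weapon ran mccarthy moved "
        "alive pool wound part head ground examining minutes"
    ),
    "Murder (50 topics)": (
        "man dead hand found lay body blood head struck shot revolver blow "
        "terrible knife lying stick death weapon picked finally"
    ),
    "Attack (30 topics)": (
        "back cried moment hands head held struck face god carried fell threw "
        "cry sake voice creature blood dropped rushed amazement"
    ),
}

# Turn each space-separated list into a set so words can be compared.
word_sets = {label: set(words.split()) for label, words in TOPICS.items()}


def shared_words(*labels):
    """Return the alphabetically sorted words common to all named topics."""
    return sorted(set.intersection(*(word_sets[label] for label in labels)))


shared_by_all = shared_words(*TOPICS)

if __name__ == "__main__":
    print("All three:", shared_by_all)
    print("75 vs 30:", shared_words("Found Corpse (75 topics)",
                                    "Attack (30 topics)"))
```

Comparing sets this way ignores word order and frequency, so it only shows that a word recurs across outputs, not how strongly it belongs to each topic.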
Each time I manipulated the number of topics, the outputs still maintained references to Holmes, Watson, crime, and violence/murder. These being the most obvious elements in Arthur Conan Doyle’s world of story, I was not surprised that they continuously surfaced. Instead, I was mainly interested that different groups of words still brought to light similar themes, no matter what the number of topics. This reinforced my understanding that topic modeling helps to do just as its name suggests: model the kinds of overarching topics within a broad collection of texts. My only issue with this activity was in naming some groups of words. For some of my chosen categories, there was so much variation in the types of words (nouns, adjectives, verbs) that I needed more than one or two words to describe the topic. The ability to further explore details about topics, such as the percentage of each story that a topic takes up, is very useful, but in some instances it does not help in naming a given topic at face value.
Analyzing Topic Modeling
Using my MALLET results, I was able to produce the above word cloud. Creating our own topic models and then using those words to create our own word cloud was interesting because it was different from what we usually do: this time we were able to see our own results in a nice visual. When picking from the topic lists, I only picked the ones whose words correlated well together. In some topics all the words made sense together, while others included words that seemed more random. For instance, in one of my topics, which I named “fatherly advice to son,” all the words can be directly related to that topic. However, in my other topic, labeled “facial features,” all the words could be used to describe a face, but some could also be used in a much broader sense. For example, this topic includes words such as “gray” and “pleasant,” words that describe a face only in context. This suggests a possible theory: the narrower the search, the more specific the topics. I conducted a few different searches: 30 topics and 2000 iterations, 50 topics and 3000 iterations, 40 topics and 5000 iterations, and 20 topics and 2000 iterations. “Fatherly advice to son” was generated from 20 topics and 2000 iterations, but “facial features” was generated from 50 topics and 3000 iterations. If the topic search is larger, it is likely that MALLET will just be searching for any words that could fit in any way, whereas narrower searches are able to use just the words that fit the subject. Although I am just theorizing, MALLET is a very interesting and useful program.
Sherlock Holmes Topic Modeling Lists…
From 75 topics…
Found Corpse: body found lay son blood dead lying knife weapon ran mccarthy moved alive pool wound part head ground examining minutes
Baker Street: street half hour past baker minutes cab quarter waiting wait ten clock glad bound hansom stepped church eleven step men
Examine: left examined found full death examination made carefully marks signs stick finally fire traces showed brought wood burned unfortunate finding
From 50 topics…
Crime in London: crime present case murder night gentleman account tragedy arrest london person appeared caused friends reason discovered committed disappearance charge violence
Holmes in his Room: holmes chair hand sat back turned pipe fire rose arm glanced asked seated shoulder half laid laughed cigar tobacco lit
Murder: man dead hand found lay body blood head struck shot revolver blow terrible knife lying stick death weapon picked finally
From 30 topics…
Bedroom: room door window open light opened entered bed heard key bedroom table closed lamp passage sound steps sitting inside ran
Sudden: suddenly instant face round coming box spring air turned stood forward drawn caught step quick holmes eyes sharp drew appeared
Attack: back cried moment hands head held struck face god carried fell threw cry sake voice creature blood dropped rushed amazement
From 15 topics…
Investigation: case point find matter interest facts friend remarkable singular remarked present fact points curious absolutely cases obvious importance investigation problem
House: room door window light open table round house opened side entered from bed floor large fire led sitting rushed key
Holmes Sitting: holmes hand chair back gave pocket put suddenly answer fellow companion felt laid sort observed pipe cold hat bell answered
Topic Modeling MichaelF
3 Topics from first search (50 topics/ 1000 iterations/ 15 words printed)
1- (Hallway) door room opened open heard key light sound passage stood inside closed hall entered locked steps pass lock dressing stair
2- (Communication) asked friend gentleman mine round godfrey back client fine strong word laughed telegram turn match news rucastle laughing staunton rough
3- (Study/Office) small pocket put study drew cut papers attention eye safe examination bird piece left cigar mark thumb finger seat interest
3 Topics from second search (25 topics/ 500 iterations/ 15 words printed)
1- (Case) inspector colonel crime fact remarked evidence present points train murder observed important follow complete mystery
2- (Suspect) face man head hand dark black cried instant turned white suddenly figure opened quick sight
3- (Evidence) note letter men short word letters handed fashion means written answered writing strong wrong message
4 Topics from third search (20 topics/ 250 iterations/ 10 words printed)
1- (Suspicious) man face eyes head dark hand half deep figure dog
2- (Location) night morning hour made note told brought box clock lord
3- (Discover/Trace) window light black lay side walked led floor corner close
4- (Attack/Violence) woman cried hands hand moment suddenly face instant feet voice
Topic Modeling for Sherlock Holmes
First, I tried my assignment with the computer just sorting them into 10-word categories. These were some of the results.
1. TIME- time, find, turn, hours, knowledge, matters, remarked, present, afternoon, problem
2. MORNING- morning, surprise, breakfast, early, sat, seated, fire, energy, finished, bright
3. NOTES- paper, note, book, read, wrote, sheet, written, writing, write, handed
4. INVESTIGATION- case, points, facts, interest, investigation, remarked, explanation, follow, solution, theory
5. TRANSPORTATION- train, station, carriage, morning, started, found, cross, time, journey, catch
6. CLUES- examined, showed, examination, carefully, cut, marks, bed, top, full, traces
7. CRIME- crime, police, case, evidence, murder, arrest, charge, tragedy, violence, murderer
Next, I updated it to 20 words per category, and I upped the number of categories to 100! These were some of the best categories I found.
1. EVIDENCE- facts obvious clear person theory impossible explanation question idea perfectly mind means confess formed affair absurd probable possibly evident correct
2. RUNNING- door room open passage ran steps rushed led empty stair stairs pushed corridor foot tore seized feeling vague furniture running
3. CRIME- crime death murder occurred showed evidence scene tragedy violence terrible reason committed murdered moran murderer suspicion attempt criminal motive inquest
4. INVESTIGATION- case interest remarked interesting problem investigation remarkable events solution difficult clue find methods prove points give reasoning follow simple connection
5. MANLINESS- sat pipe fire laid smoke tobacco blue corner lit armchair cigar hung silent gas brandy smoked smoking comfortable shining bachelor
6. LOCATIONS- window room open bedroom moment looked sitting fire threw floor lawn garden alarm moving curtain drawn fired energy forget powder
7. WRITING- paper read note book handed table sheet papers slip written importance post piece pen page pencil tossed picked desk printed
8. TIME- years ago time twenty thirty months age year week lives ship forty voyage exposed dozen bought tongue boat popular families
9. TIMES OF DAY- night morning early clock breakfast morrow surprise sleep signs rest disappeared day slept watch midnight fault trail hopes news clearing
10. TRANSPORTATION- train station carriage cab drive waiting journey drove town cross started line follow fresh bridge reach passing hansom class reached
~Austin Carpentieri