Understanding the MALLET lab

[Image: word cloud]

Death:

night poor happened death met terrible father heard mother dead creature sister died wild dreadful

Murder:

hand body found round blood dead head lay close shot revolver moment blow fell carried

Law:

police case inspector found law hands force evidence official arrest charge quietly court arrested constable

Relationship:

wife woman husband life left love knew child married heart made give lived loved happy

Time:

morning night hour work clock train past half time station late early breakfast morrow quarter

Finance:

set money business hundred week ten work began company earth position pay pounds thousand paid

Message:

word short letters words means make message single listened earth men expected true american game

Note:

paper note read hand book letter handed pocket written wrote writing put attention drew reading

Family:

father young wife son girl family returned boy child married mother poor died lived daughter

Expressions:

eyes face looked dark thin figure features tall voice lips expression pale turned spoke drawn

After using MALLET, I realized that it was a very easy tool to use. The interface was extremely user-friendly and easy to work with. The word cloud above is composed of my results from topic modeling. My most used words were child, death, and dead. I leaned toward the more morbid topics when choosing my ten to post.

Before settling on these 10 topics, I ran 2000 iterations and chose three out of fifty for my first 3 topics. I then ran 1000 iterations, and after that, another 1000.

This word cloud helped to emphasize the occurrences of certain words in certain topics.
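
One way to see what a word cloud is emphasizing is to count how often each word recurs across the chosen topics. The following is only a minimal sketch, not the word cloud tool's actual method: it uses three of the topic word lists above, and the names `topics`, `word_counts`, and `repeated` are illustrative. A real word cloud generator may weight words differently.

```python
from collections import Counter

# Three of the topic word lists from this post (labels are the author's own).
topics = {
    "Death": "night poor happened death met terrible father heard mother dead "
             "creature sister died wild dreadful",
    "Murder": "hand body found round blood dead head lay close shot revolver "
              "moment blow fell carried",
    "Family": "father young wife son girl family returned boy child married "
              "mother poor died lived daughter",
}

# Count how many times each word appears across all listed topics.
word_counts = Counter(
    word for words in topics.values() for word in words.split()
)

# Words that recur in more than one topic would be drawn larger in a cloud.
repeated = sorted(word for word, n in word_counts.items() if n > 1)
print(repeated)
```

Adding the remaining topic lists to `topics` would extend the count to all ten.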

I chose topics based on my interest in the words that appeared. I always find myself fascinated with morbidity, so of course I chose the topics associated with death and similar themes.

Topic modeling can be used to uncover the underlying topics in the Sherlock Holmes stories and to help understand what is going on at any given time.

The topics I worked on were very similar to each other, so I wasn't very surprised to see that certain words were more prevalent than others, especially words having to do with death.

Topic modeling is an incredible tool when trying to understand many works at once. The idea of distant reading is a brilliant way of analyzing thousands, or even millions of texts at once. Distant reading and topic modeling are both tools that I know I will use in my future life.

The recurring words help to identify the theme of the story or stories you are reading, which is incredibly useful.

A deeper look at topic modeling

wordcloud

All categories chosen from 50 topics with 1000 iterations:

time – morning night back clock waiting past early morrow quarter arrived

writing – paper note read letter table book handed letters written wrote

physical features – face eyes looked thin features lips figure tall dark expression

household – woman lady wife husband life love girl child married maid

clothing/accessories descriptions – black hair red hat heavy round broad centre coat dress

death/crime – found man dead lay body blood death knife lying round

interrogation/crime solving – give matter idea reason question impossible occurred absolutely explanation true

physical reactions – face turned back instant hand sprang forward moment side head

transportation – station train road carriage passed side drive reached drove hour

darkness/mystery – light suddenly dark long caught sat lamp spoke silence silent

Using MALLET was an interesting experience. I enjoyed how simple and accessible the interface was. I had no trouble navigating the program and tweaking the iterations and other settings to my liking. I experimented with several numbers before choosing to analyze my topics with 50 topics, 1000 iterations, and 10 words per topic. I tested extreme numbers to see how they would influence the data. In one trial I ran 500 topics with 3000 iterations. This resulted in data that was too specific, with topics tied to particular stories. I also ran as few as 10 topics with only 500 iterations. This generated too many broad and vague topics that did not capture the essence of the mysteries. In the end I felt that narrowing it down to 50 different topics with 1000 iterations gave me a good sense of the Sherlock Holmes stories in a general yet helpful way. The word cloud above displays these words in a creative and interactive way.
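
The three settings described above (number of topics, number of iterations, and number of words per topic) correspond to options of MALLET's command-line `train-topics` tool. The sketch below only builds the command lines for the three trials mentioned; it does not run MALLET. It assumes the command-line version rather than the graphical interface used here, and the file names `holmes.mallet` and `keys_*.txt` are hypothetical. The number of words per topic is not stated for the two extreme trials, so 10 is assumed.

```python
def build_command(num_topics, num_iterations, num_top_words,
                  corpus="holmes.mallet"):
    """Return the argument list for one MALLET train-topics run.

    `corpus` is a hypothetical file produced by MALLET's import step.
    """
    keys_file = f"keys_{num_topics}_{num_iterations}.txt"  # hypothetical name
    return [
        "mallet", "train-topics",
        "--input", corpus,
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--num-top-words", str(num_top_words),
        "--output-topic-keys", keys_file,
    ]

# (topics, iterations, words per topic) for the trials described in the text.
# The value 10 is an assumption for the first two trials.
trials = [(500, 3000, 10), (10, 500, 10), (50, 1000, 10)]

commands = [build_command(*trial) for trial in trials]
for command in commands:
    print(" ".join(command))
```

Keeping each run's topic keys in a separate file makes it easier to compare the overly specific, overly broad, and middle-ground results side by side.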

I chose the ten topics out of the fifty total for their overall similarity. I assigned the simplest titles that I could think of to each of them to give a general structure for understanding the Sherlock Holmes stories as a collection. Ten basic concepts that reflect the entire collection are easier for the reader to grasp and accept. Each title represents an element of the stories that is imperative to the work as a murder mystery for the time in which it was written. Obviously topics such as death, crime, interrogation, and mystery are all blunt examples of what a mystery story encompasses.

Some of the other topics, such as physical reactions and features, are more subtle examples yet serve just as important a role. The stories rely primarily on context clues and other literary devices that create an interesting and challenging mystery to solve. Physical expressions and reactions are important elements of any mystery story because they can explain a lot about an individual character or the way they respond to certain situations. Another topic, clothing descriptions, seems to be part of the writing style of the Sherlock Holmes collection. Holmes is an icon for mystery investigators, and the way that he is dressed is an important part of his appeal. The author pays a lot of attention to the way that Holmes’ dress is described, as well as that of other characters, throughout the entire series.

Topic modeling provides a unique framework for examining thousands or millions of texts at once. Distant reading is an interesting concept that I will hopefully be able to exercise in future research. The ability to apply your own ideas and lens to any given topic or series of works through topic modeling is something truly valuable that many other classic tools or academic research methods do not allow or facilitate.

Topics Analysis.

Dan Albrecht.

I feel as though MALLET can be an extremely useful tool in modeling different topics within a given amount of text. When I ran the program with the settings of 30 topics, 1000 iterations, and 15 words, I got Death, Details, Victorian Women, Setting, and Deep Thought. With 10, 2000, and 30, I got Morbid and Services Request. With 10, 500, and 20, I got Action, Physical Characteristics, and Terror.

The topics Services Request, Details, Setting, Physical Characteristics, and Deep Thought did not surprise me as part of Sherlock Holmes, since this is the mystery genre and one would expect to find them. What surprised me a little was the presence of Death, Morbid, and Terror. Since many of these stories deal with murder, then maybe they shouldn’t have surprised me, but I was impressed that the Holmes stories don’t just appeal to those who want logic and analytical detective work; they can also appeal to the emotions of their readers to keep them gripped.

I also got Action and Victorian Women in this list of topics. Action was another plot device that Conan Doyle was able to use to appeal to his readers. Victorian Women was an indicator that many of these stories reflect general attitudes of Victorian culture, including attitudes about gender.

These lists of topics really help to underscore some of the general themes and plot devices of the Holmes stories. They might be harder to understand for a user who has never read the stories, but they can be useful nonetheless.

MALLET Results MichealF

word cloud 2

Posted above is my word cloud made with my MALLET results. We had used MALLET previously in class, and it was interesting to create a key word or category for a group of related words. Making them ourselves, however, was a different experience. I got to see what goes into making these topic models. I used 4 separate combinations when topic modeling. My first search was 50 topics/1000 iterations/20 words printed. Within this search I picked the 3 word lists that were easiest to categorize. The topics for the three examples I chose were “Hallway”, “Communication”, and “Study/Office”. The second search I did was 25 topics/500 iterations/15 words printed. The three examples were “Case”, “Suspect”, and “Evidence”. The third search I did was 20 topics/250 iterations/10 words printed. The four examples I chose were “Suspicious”, “Location”, “Discover/Trace”, and “Attack/Violence”. As I went, I felt that it would be best to narrow my search settings after each run. My reasoning was that by narrowing my search settings, I would get more accurate results every time. I felt that printing more words in the search results would make a topic harder to categorize, because there are more words that you need to relate to each other. The models I got with narrower search settings were easier to understand and easier to categorize. Overall, topic modeling using MALLET was a helpful tool for finding main themes throughout all the Sherlock Holmes stories, and I look forward to doing it again in class if given the opportunity.

Iterations of Sherlock Holmes

Samantha Harris

50 topics, 1000 iterations

1. found dead man body crime police blood murder hopkins death tragedy evidence scene knife account violence lying committed showed weapon - crime, investigation, murder

2. room window bed sitting bedroom entered open empty floor table dressing clothes lamp lawn finally study signs horror fire tregennis - bedroom, scenery, setting

60 topics, 600 iterations

3. table papers pocket hand box put drew small glanced looked thing ready seated examination envelope revolver silent piece contents thrust - mail, investigation

4. found blood examined left round long examination finally carefully knife stick marks wood cut body - investigation, murder, blood, death, crime, detectives

5. paper note letter read hand book handed pocket written letters wrote writing sheet write slip - letters, reading, writing, mystery

70 topics, 700 iterations

6. room bed entered sitting bedroom window table bell dressing lawn floor upstairs drawing lamp furnished - bedroom, description, scenery

7. case facts points point fact investigation mystery interest theory give problem attention solution formed inquiry - facts, theories, knowledge, investigation, police, examination

8. face eyes tall features figure thin dark expression lips drawn pale raised beard looked voice - characterization, looks, expressions

80 topics, 700 iterations

9. face back sat dreadful cry caught sudden sight horrible rushed voice fell sunk broke suddenly - depression, sadness, uncertainty

10. half hour past back late cab waiting minutes quarter ten wait heard time evening clock - time, waiting, clock

Topic Modeling Analysis and Word Cloud

Wordcloud

My topics for MALLET were really interesting, and I think that they say a lot about Sherlock Holmes as a whole. One of the first ones I came across was one that I entitled “Evidence.” This category had words like “facts, clear, theory, possibly…” and many others. The importance of this category to the Sherlock Holmes stories cannot be overstated. Obviously, to a detective, evidence is a pretty important thing. I found many other categories which one would expect to find in detective stories (e.g. Crime and Investigation), but some of the others were a little more interesting. Take for example a category I named “Manliness.” This category had words like “pipe, fire, smoke, tobacco, armchair, cigar” and “brandy.” Just from these words alone, one can get the image of a wax-mustachioed man, sipping brandy and smoking a pipe by the fireside. While this is not exactly how anyone in the Sherlock Holmes stories is portrayed, it does have a certain feel that you get from these stories, an almost Rudyard Kipling type of ambiance. Another big category I noticed, I named “Transportation.” In it were words like “train, station, carriage, cab, drive, waiting” and “journey.” I think that this category illustrates that transportation is a big part of the stories, and also shows that there is not just one way of getting around that the stories focus on. Sherlock and Watson use train, automobile, walking, carriage, and almost any other type of transportation that you can imagine. They are always going somewhere. These were the most interesting and telling categories I discovered with the MALLET tool, and upping the number of words in the categories really did help with creating some more unique categories. Overall, I really enjoyed using MALLET, and look forward to using it in the future.

~Austin Carpentieri

Followup: 2000 iterations and a burning hot computer

My computer is not sluggish: it can handle Battlefield 4 on Ultra at 1080p/60fps (which, for you non-gamers, means very fast and very good looking). However, it would seem skimming through text documents gives it some pause. 62.976 seconds after starting up the topic modeling tool, though, my little machine spit out a list of 50 topics that could be isolated from the various words therein. So that one doesn’t need to refer back to my last post, here’s a refresher:

1. holmes word head words men message revolver shook life shot — Holmes, firearms, and investigations
2. light stood long suddenly lamp dark sound low shoulder figure — Stealth and sneakiness
3. clear doubt mind person possibly obvious idea excellent perfectly point — Deduction and flattery
4. make father made heard son returned left mr view point — Conspiracy and inheritance
5. eyes face man looked dark thin tall features companion pale — Description of characters
6. house small large stone great high place square windows houses — Houses and mansions
7. reason remember fear danger clear told chance strong horror family — Rationale
8. told heart knew god story hands life speak truth leave — Rationalization
9. matter understand position imagine call absolutely important trust force hope — Help me, Holmes, you’re my only hope
10. holmes mr professor fresh work aware surprise action great change —Sudden change in behavior
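
The numbered list above pairs each topic's top words with a label written by hand. As a rough sketch of how such a list could be assembled, the code below parses lines in the tab-separated layout that MALLET's topic-keys output is understood to use (topic number, a weight, then the top words) and attaches labels. The topic numbers and weights in the sample are placeholders, not real output, and `parse_topic_keys` and `labels` are hypothetical names.

```python
# Sample in the assumed topic-keys layout: id <TAB> weight <TAB> words.
# The ids and weights are placeholders; the words come from the list above.
sample_keys = (
    "0\t0.1\tholmes word head words men message revolver shook life shot\n"
    "1\t0.1\tlight stood long suddenly lamp dark sound low shoulder figure\n"
)

def parse_topic_keys(text):
    """Map each topic number to its list of top words."""
    topics = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic_id, _weight, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics

# Hand-written labels, as in the post; MALLET does not generate these.
labels = {
    0: "Holmes, firearms, and investigations",
    1: "Stealth and sneakiness",
}

topics = parse_topic_keys(sample_keys)
for topic_id, words in topics.items():
    print(f"{topic_id + 1}. {' '.join(words)} - {labels[topic_id]}")
```

The labeling step stays manual: the model only groups words, and the interpretation is supplied by the reader.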

So, why did I choose these topics? They all share one thing: each is a general topic narrowed down to instances from specific stories. Examples were plucked from specific passages, but these are overarching sentiments seen again and again in the archives. These sentiments are basic tropes in the mystery canon: implements of murder (1), men creeping in the shadows (2), a victim’s family rationalizing their sorrows (8), and, particularly for Holmes, a plea for help (9).

The simplicity of the fairly elaborate points here makes these 10 topics effective for getting a “feel” for Sherlock Holmes and the universe he inhabits. Together, they detail the basic elements of an average story. Thus, I believe them to be the most effective topics to be chosen out of this fairly bulky list.

As for the generation of the list, I experimented with a variety of settings before settling on the 50 topics/2000 iterations/10 topic word option. I tried as many as 500 topics and 5000 iterations, and as few as 10 topics and 500 iterations. The former produced too many specific topics, focusing on specific plot elements from specific stories. The latter produced too many broad topics, focusing on broadly used vocabulary words from many of the stories. I determined that an appropriate middle ground was found in the 50/2000/10 option, and I believe the topics chosen reflect that.

50 topics, 2000 iterations and a strangely sluggish i7 later

All from a cycle consisting of 50 topics, 2000 iterations, and 10 topic words.

1. holmes word head words men message revolver shook life shot — Holmes, firearms, and investigations
2. light stood long suddenly lamp dark sound low shoulder figure — Stealth and sneakiness
3. clear doubt mind person possibly obvious idea excellent perfectly point — Deduction and flattery
4. make father made heard son returned left mr view point — Conspiracy and inheritance
5. eyes face man looked dark thin tall features companion pale — Description of characters
6. house small large stone great high place square windows houses — Houses and mansions
7. reason remember fear danger clear told chance strong horror family — Rationale
8. told heart knew god story hands life speak truth leave — Rationalization
9. matter understand position imagine call absolutely important trust force hope — Help me, Holmes, you’re my only hope
10. holmes mr professor fresh work aware surprise action great change — Sudden change in behavior

Murder

crime night death murder criminal fact tragedy terrible arrest attempt violence account caused discovered scene proved action murderer save committed

House

house small large front stone side high windows place standing evidently houses centre low butler sun building iron spot narrow

At 50:1000:20

Time Traveling

half back minutes past hour waiting cab ten start drove hurried wait pulled close journey drive station stepped quarter pocket

Letters

paper note read letter table book letters papers handed written wrote short pocket message write importance word put sheet post

Love/Marriage

woman wife husband lady girl love mrs child life married miss beautiful loved ferguson heart rucastle women boy hunter jack

At 30:700:15

Sherlock Shooting

holmes chair sat table sherlock fire looked asked laid rose drew arm glanced pocket early

At 100:1000:20

School

professor fresh young world aware london smith moriarty bennett prepared class change brain general susan great developments university presbury reputation

Death in Family

girl child told mother father died truth heart wife ferguson mad struck poor dear weak fate nurse frightened dreadful mine

Observations/Lighting of Room

light dark lamp suddenly sound sharp darkness figure low stood heard silent shining lit silence struck match shone ears whispered

Case

case interest cases points mystery remarked solution problem prove events connection criminal methods results facts working investigation reasoning engaged observation

MALLET Topic Modeling

1. Money:

money hundred business pounds thousand year ten price sum worth terms pay fifty check paid advance wished bank ruin capital

(100 topics, 1000 iterations, 20 words)

2. Murder:

found man body blood dead knife lay stick blow head carried weapon heavy finally unfortunate neck wound lying drawn struck

(100 topics, 1000 iterations, 20 words)

3. Actions:

holmes chair rose laid pray sit sherlock companion seated quick visitor glance seat satisfaction sat ha questioning listen arm hope

(100 topics, 2000 iterations, 20 words)

4. Married Life:

woman lady wife love maid husband loved young married life beautiful mistress women daughter marriage spite lovely lover hated brackenstall

(100 topics, 2000 iterations, 20 words)

5. Investigation:

case find matter point watson impossible investigation doubt problem explanation clue complete simple question facts confess remember true admit present

(50 topics, 1000 iterations, 20 words)

6. Clothing:

black round hat heavy blue coat large white side yellow broad dress red brown dressed centre head grey bird observe

(50 topics, 1000 iterations, 20 words)

7. Home:

room bed table window house entered fire round sitting bedroom

(50 topics, 1000 iterations, 10 words)

8. Observation:

mind thought matter observed people sort things made study trouble

(50 topics, 1000 iterations, 10 words)

9. Writing:

paper note read letter table book papers pocket letters written

(50 topics, 2000 iterations, 10 words)

10. Time Measures:

time years week ago year country months days age twenty

(50 topics, 2000 iterations, 10 words)