Word Clouds: An Engineer’s Thumb

I’ll be honest: I had not heard of word clouds until a few days ago, when we discussed them in class. I had seen them, but I thought they were only a way of counting the number of times a word appears in a text. They’re more than that. They can reveal some overall themes of a text based on how frequently words occur. In my story, for example, the word “German” appears several times, although this is to denote the fact that the characters Lysander and Elise are German. Also, if the name of a place appears a few times, it may play a minor role in the story. Watson at the time has a practice near Paddington Station, and Paddington is used as a reference several times in the story. “Engineer” and “hydraulics” also appear a few times because the victim, Victor Hatherley, is a hydraulic engineer, which gives some insight into a character’s life.
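The counting that sits underneath a word cloud can be sketched in a few lines of Python. This is only a rough illustration, not how Wordle or Tagxedo actually work: the stopword list and the sample sentences are invented for the example, and real tools use much longer stopword lists.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "that", "it"}

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stopword words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Invented sample text, not a quotation from the story.
sample = (
    "The engineer met a German colonel near Paddington. "
    "The German colonel hired the engineer to inspect a hydraulic press."
)
print(word_frequencies(sample, top_n=3))
```

A word cloud generator would then draw each word at a size proportional to its count, which is why “German” and “engineer” would stand out.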

Now, there are some arguments against the use of word clouds, as explained by Jacob Harris of The New York Times. I agree to a certain extent that word clouds are a crude analysis of a speech, story, or other text, but maybe people should not expect word clouds to be very informative in the first place. If they are well thought out and carefully crafted, they can be somewhat useful as a learning aid, but you still don’t really know how the words connect.

To create these word clouds I used two tools. The first was Wordle. I found it very simple to use, except for one major flaw: in order to get to one of the menus, I had to click the word cloud itself before I could open it. That’s a big demerit in my book, as it is irritating, and I was wondering if my touchpad was thinking the same. Once you get past that, Wordle is very straightforward, and you can easily change the font, color, and shape (though the options are very limited). You can call this the iPhone of word cloud generators, since it does exactly what you need, except there’s a fatal flaw that drives you insane.

The second tool I used was Tagxedo, and I found its advanced options to be great. The official site says “Making word cloud is fun, and is much more fun with Tagxedo!” They are absolutely right: I had a lot of fun making the word clouds with this tool! The possibilities are endless! My word cloud is in the shape of a swan, which does make the cloud look nice. Obviously nothing is perfect, and Tagxedo has a few flaws. First off, even when you move your cursor over a word, the small ones are still hard to read. Another flaw is that it didn’t work on Google Chrome (for me at least), so I had to use Safari. The last flaw was that I couldn’t figure out how to change the colors; they only changed when I changed the theme. I do believe there is a “pro” version that would allow this, or maybe I just missed something. Still, I really enjoyed using Tagxedo, and I think it is a great tool for making word clouds.

A Kiss Goodbye, A Kiss Hello

The book Life and Death in Rebel Prisons by Robert Kellogg, published in 1865, contains an imprisoned marking within the binding. Etched into the fabric of the page is a pair of red lipstick markings. A kiss goodbye, or a kiss hello?

A kiss goodbye to the man on his way to Lee’s army, or a kiss hello to the man who arrived safely in Newbern. A kiss goodbye to someone fallen at the hands of Lee in battle. A kiss hello to the death and demise of the village of Newbern; a kiss goodbye to the village itself. A kiss hello to welcome the men of Charleston. A kiss hello to the anniversary of our nation, to the harbinger summer and the bright Southern sky. Did a woman fall in love with a man who arrived in Charleston? Did she lose him on the journey there? Did she lose him in the prison, Andersonville? Was the last time she saw the man she loved before he was captured, before he was killed? Did a man carry these lips within the binding of his hands as he wept alone in prison? Did he lose it on the way? Is it a kiss goodbye to those Southern days, those summer celebrations, the last of what was left? Was it a kiss hello to the arrival of something fresh and new? The lip prints evoke many questions as to their origins. A kiss of death or a kiss of love? If anyone knows the woman who owns these lips, ask her why she pressed this print.

Check out the post on Book Traces! http://www.booktraces.org/book-submission-life-and-death-in-rebel-prisons/

Adrian Jurek Extra Credit: Victorian London – Police and Policing

Searching through the different terms in the Dictionary of Victorian London, I came across a post about “Victorian Era Police.” This really interested me because I had never really researched Victorian-era law enforcement, and the images that keep popping into my mind are those from the Sherlock Holmes movie that came out in 2009. The article can be found under: Police; City of London Police; -duties and organization.

Since the City of London was the wealthiest business center in the world, it had its own police force that watched only the City, both day and night. The goal of the policemen was to get a high number of convictions using any means necessary. There were no checks on what the police were doing. The press, unlike today, didn’t accuse officers of corruption or any malpractice.

One thing that interested me was that, in addition to arresting criminals, officers helped out civilians by escorting old ladies to their homes, looking out for future crimes, etc. In addition to this, there was a Nightwatch, which patrolled the streets from 10 P.M. to check stores and prevent any thievery. As the author, Alex. Innes Shand, puts it, “the City police has arrived at pretty nearly the perfection of efficiency.” This is an example that you don’t need to have superior technology to be highly efficient in what you are doing. According to the chart found midway through the article, there were 800 officers in total working in London in the public sector and 99 policemen in the private sector, both fairly large numbers of officers.

What makes a good DH project?

A good Digital Humanities project includes five things:

  • Well-done, thorough research
  • Aesthetically pleasing presentation
  • Good formatting
  • Clear information
  • Searchable options

What makes a good DH project?

A good digital humanities project is aesthetically pleasing, meaning it is nice to look at, but it also shows its information clearly. It should have enough information, cited correctly, to show that it is fully researched. The format of the project should be clear and concise, and searchable so that users can find exact details.

How does DH allow scholars to ask new questions?

Digital Humanities allows students as well as scholars to observe information in new ways. DH can provide different ways to see the same information. Through the digital outlet, you might see links you wouldn’t be aware of from looking at a group of books or information on paper. Digital Humanities, through programs like MALLET, can distant-read large amounts of text.

What Makes a Good DH Project?

Five qualities of a good DH project include:

  1. Goal oriented: Clear specifications of what the project is and the content it creates.
  2. Organized: No one likes a confusing, hard-to-understand tool when they are using it for the first time. Therefore, an organized DH project will allow for easier navigation of the website and easier use of the tool in everyday situations.
  3. Aesthetically pleasing: If a DH project is dull and boring, chances are people will begin to stray away from it. Therefore, a project must incorporate techniques that will grab users’ attention right from the start.
  4. Clear citations: Many DH projects are used by researchers (especially students who may be writing a paper). Therefore, having clear citations showing where information is taken from is essential both for legal purposes and for viewers who also need citations.
  5. Thematic: Each DH project should have a central idea that its website is based on. Having a central theme allows scholars to narrow down which websites they will find useful in their studies and which they will not.

Besides the five qualities listed above, I believe a good DH project is one that brings scholars to understand information that is otherwise overlooked. For example, before actually completing the mapping unit, I had never thought of looking up certain streets in Sherlock Holmes to see if they were actual streets. Instead, I just assumed they were. However, DH projects prevent assumption and promote actualization. By completing the mapping unit, I was able to see which streets were actually real and which were invented. I also feel that a good DH project will provide scholars with knowledge that they didn’t have before visiting the website. It is easy to find websites that reiterate what everyone knows, but it is difficult to find websites that teach everyone something new and have citations to prove it! I have quickly learned that the DH projects and websites we have looked at in class do that very well, which is specifically why I have enjoyed using them.

DH lets scholars ask new questions because, as I mentioned before, every time they go on a DH website, they are learning something they didn’t know before. Therefore, I can imagine that questions such as “Is that true?” or “How have I never used this digital tool before?” have become frequent. Not to mention, because DH is only just becoming prevalent in the digital world, many DH scholars may question what other DH projects are available to them and how they can use them in their studies. DH projects also allow for peer-to-peer interaction, as well as peer-to-scholar interaction, allowing everyone to ask questions and have them answered by the DH community.

GIS: The Final Problem–Vere Street

For my GIS (Geographic Information Systems) project on Sherlock Holmes, I picked Vere Street, where, in “The Final Problem,” Sherlock almost gets hit in the head by a “falling” brick. This is the second attempt on his life in this story. The following quote gives the context in which the location is mentioned: “I kept to the pavement after that, Watson, but as I walked down Vere Street a brick came down from the roof of one of the houses and was shattered to fragments at my feet. I called the police and had the place examined. There were slates and bricks piled up on the roof preparatory to some repairs, and they would have me believe that the wind had toppled over one of these. Of course I knew better, but I could prove nothing.” (The Final Problem)

I looked on many of the sites listed to find out what the mention of this street could have to do with the story as a whole. The first data I came across was from the Booth Poverty Map:

[Screenshot: Booth Poverty Map of the area around Vere Street]

This map shows that in and around Vere Street there is a wide mixture of people. It almost covers the whole spectrum in this tiny area, going from blue (the poorest) to yellow (the wealthiest). Could this have something to do with the construction going on? Possibly. You would think, though, that the police would have been a little more attentive in such a rich area. Next I looked on Old Bailey Online, but Vere Street yielded no results, so I tried the keyword “brick,” and there was only one result of someone being killed by a brick:

[Screenshot: Old Bailey Online result for the keyword “brick”]

Hmmm… not exactly a brick falling from a ledge… But I kept trying on the other databases. Locating London gave me some… weird results. It gave me about 5 pins near Vere Street, but when I clicked on them, all it said was “No Results.” Alright… So my last hope for some kind of data was British Histories. I searched Vere Street again… but alas, only one result, which seemed to be a log of a tax collector, or a tax assessment.

[Screenshot: British Histories result for Vere Street]

So not much about the construction history, but possibly something can be said here about the wealth of the people. It seems that most of the people renting here are pretty upper to middle class, so it still surprises me that, in this story, the police didn’t investigate any further… perhaps somebody paid them not to? That might be what Doyle was trying to get across by using this particular street: that whoever is trying to kill Holmes has a lot of power.

~ Austin Carpentieri

Sherlock Collaboration-Rosalba Corrao and Alexis Moody

In our collaboration, and by reviewing our topic modeling results, we have learned that the number of topics and the number of iterations both have a major effect on the results produced. Increasing the number of topics made it easier to find cohesive topics with an identifiable label, though it made picking through the data much more labor intensive, and it got overwhelming as the numbers increased. That seems like a small sacrifice to make, as reducing the number of topics increased the presence of unusable topics. We both agreed that 40-60 topics was an ideal range for achieving good results. In terms of iterations, increasing the number really seemed to improve how well the words within topic groups related to one another. We both increased our number of iterations with each output and noticed that it got easier to identify topics. Ideal settings for the topic modeling tool, to us, seemed to be 50 topics, at least 2000 iterations, and 20-25 words printed.
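For anyone who wants to try these settings outside the tool we used in class, here is a rough Python sketch that builds the equivalent MALLET command line. It assumes MALLET is installed locally and that the stories have already been imported into a file; the file names (sherlock.mallet, topic_keys.txt, doc_topics.txt) and the helper function are made up for illustration.

```python
def build_mallet_command(input_file, num_topics=50, num_iterations=2000, num_top_words=20):
    """Build the argument list for MALLET's train-topics command.

    The defaults mirror the settings recommended above. The output file
    names are hypothetical placeholders.
    """
    return [
        "bin/mallet", "train-topics",
        "--input", input_file,
        "--num-topics", str(num_topics),          # how many topics to sort words into
        "--num-iterations", str(num_iterations),  # how many sampling passes to run
        "--num-top-words", str(num_top_words),    # how many words to print per topic
        "--output-topic-keys", "topic_keys.txt",  # top words for each topic
        "--output-doc-topics", "doc_topics.txt",  # topic proportions for each document
    ]

# "sherlock.mallet" is an assumed name for the imported corpus.
command = build_mallet_command("sherlock.mallet")
print(" ".join(command))

# To actually run it (requires a local MALLET install), something like:
# import subprocess
# subprocess.run(command, check=True)
```

Changing num_topics to 40 or 60, or num_iterations to 1000, would reproduce the other settings we compared.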

In choosing three of our favorite topics we narrowed it down to suicide, physical appearance, and written document.

Suicide: found man body dead lay blood head struck hand shot revolver blow knife stick heavy weapon unfortunate left death sign lying wound bullet handle formidable pistol finally escaped wounded tied fired carried world struggle dragged grotesque injury spot shirt gun

This topic was most prevalent in Norwood Builder, and least prevalent in Empty House.

Questions:

  1. What can these topics tell us about Sir Arthur Conan Doyle’s writing style?
  2. Was suicide an actual term in twentieth-century London?

Physical Appearance: black red white hair hat head large broad coat heavy small middle set short dress cut brown round thick centre grey faced dressed clean glancing

This topic was most prevalent in A Case of Identity, and least prevalent in The Blue Carbuncle.

Questions:

  1. What do the colors symbolize in this short story?
  2. Did the weather factor into the physical appearances of characters in short stories set in twentieth-century London?

Written Document: paper note table read papers box book pocket put handed writing written drew sheet glanced picked document slip envelope piece

This topic was most prevalent in The “Gloria Scott,” and least prevalent in The Second Stain.

Questions:

  1. How prominent is the written document in “Gloria Scott”?
  2. Were written documents important for all investigations?

Topic Modeling Group Project

While working with MALLET, we noticed that a lot of different factors change the types of topics you will get. Here are some of the things which we noticed affected our results.

  • Number of Topics: The number of topics affects the type of topics you get, because if you let the computer sort the words into more categories, the categories will have more variety than if you have just a few to choose from. Having more variety instantly makes you think outside the box about what a specific topic really means.
  • Number of Iterations: The number of iterations affects the topics the tool gives you, because each extra pass lets the computer refine which words belong together, so the words in a topic have more of a foundation in common.

I found that the best settings for me were to let the computer sort the data 1000 times, into 100 categories. It gave me a lot to work with, so I didn’t get caught up on the topics that meant nothing to me.

These were the three categories we found the most interesting, and the stories in which they appeared the most and the least.

  1. Manliness: sat pipe fire laid smoke tobacco blue corner lit armchair cigar hung silent gas brandy smoked smoking comfortable shining bachelor
     MOST: The Man with the Twisted Lip    LEAST: His Last Bow
  2. Transportation: train station carriage cab drive waiting journey drove town cross started line follow fresh bridge reach passing hansom class reached
     MOST: The Final Problem    LEAST: The Noble Bachelor
  3. Evidence: facts obvious clear person theory impossible explanation question idea perfectly mind means confess formed affair absurd probable possibly evident correct
     MOST: The Boscombe Valley Mystery    LEAST: The Red-Headed League

I think that this raises a few questions. Mainly: how accurate is this data in considering ALL of the Holmes stories (considering each has its own specific themes), and how do these topics change chronologically as each of the stories was published?

~Austin Carpentieri & Sammy Harris

Discussing Topic Models with Mary Dellas and Joe Mausler

After discussing the process and results of topic modeling using MALLET, we know that the fewer topics we have, the broader the topic categories MALLET gives us. The more iterations we have, the easier it is to identify a topic name. We recommend the default settings we used in class: 50 topics, 1000 iterations, and 20 topic words. These settings gave us enough topic words to determine a topic name, but not so many that it became confusing and repetitive.

These are our three favorite topics:

1. Physical Description (Male): face man eyes looked thin dark features tall expression appearance middle high pale figure set glasses gray keen clean bear

  • a) The top ranked document in the Physical Description (Male) topic is Charles Augustus Milverton. 26 words in the document are assigned to this topic.
  • b) The story The Sussex Vampire uses this topic the least (2 times).
    • Question 1: Even though 26 words in the document are assigned to Physical Description (Male), does this imply that this document is entirely dedicated to the topic Physical Description (Male)?
    • Question 2: Why does it seem like some of the words (ex. set, bear) do not relate to the other words in the topic?

2. Letter Writing: paper note read letter table book box letters papers written handed wrote writing sheet brought importance post write document address

  • a) The top ranked document in the letter writing topic is The “Gloria Scott”. 18 words in the document are assigned to this topic.
  • b) The story Shoscombe Old Place uses the topic least (2 times).
    • Question 1: Why does the same story name appear multiple times on the list of the top ranked documents?
    • Question 2: When we click the story chunk, why is MALLET only showing us a small part of the document?

3. Crime: police crime case night evidence murder death account occurred arrest unfortunate effect tragedy violence complete charge appeared reason terrible committed

  • a) The top ranked document in the crime topic is The Second Stain. 62 words in The Second Stain were assigned to this topic.
  • b) We found that The Priory School uses crime the least–a total of two times.
    • Question 1: Is crime a more common topic in the later Sherlock Holmes stories or the earlier ones?
    • Question 2: Can MALLET tell us how many stories in total discuss crime?
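Two of the questions above (whether 26 assigned words means a document is “entirely dedicated” to a topic, and how many stories discuss crime) come down to simple arithmetic on MALLET’s output. The Python sketch below is only an illustration: the count of 26 comes from our results, but the chunk length of 500 words, the story names, the shares, and the 10% cutoff are all invented assumptions.

```python
def topic_share(assigned_words, total_words):
    """Fraction of a document's words that are assigned to one topic."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return assigned_words / total_words

def stories_discussing(doc_topic_shares, threshold=0.10):
    """Names of stories whose share for a topic meets the chosen threshold."""
    return sorted(name for name, share in doc_topic_shares.items() if share >= threshold)

# 26 assigned words is from our results; the chunk length of 500 is invented.
share = topic_share(26, 500)
print(f"{share:.1%} of the chunk is assigned to the topic")

# Invented shares for the crime topic, purely for illustration.
crime_shares = {"Story A": 0.31, "Story B": 0.04, "Story C": 0.12}
print(stories_discussing(crime_shares))
```

Under these made-up numbers, a top-ranked document still has most of its words assigned to other topics, and the answer to “how many stories discuss crime” depends entirely on where the cutoff is set.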

Topic Modeling, Sherlock Holmes Edition

This week’s digital tool was very different from the others we have used in class. After playing around with MALLET and topic modeling, I actually enjoyed trying to “un-puzzle” the words (so to speak) and figure out what the major topics were for each Sherlock Holmes story. For my first set of topics, I ran the modeling tool with 1,000 iterations, 20 words printed, and 50 topics. In reviewing the long list of topics I could have chosen, I actually struggled to find one that I understood and thought was relevant enough to twentieth-century London. For my first topic, broken home, I learned that this group of words was most prevalent in The Solitary Cyclist. From reviewing the results, it seemed as though 25% of the words in the story belonged to my topic. When identifying what I thought the topic would be, I initially labeled it abandoned. However, after reviewing the words again I felt broken home was more appropriate, seeing as how, even though a father left, the siblings remained in contact: not an ideal family, but still their version of what a family is. My second topic had to do with investigation. I chose this as a topic because not only did I get it right away, but it also shows that the majority of Sherlock Holmes stories revolve around investigation! It seemed as though 39% of the words in the story revolved around this theme of investigation, but the story didn’t rely too heavily on it. My third topic was household. It seemed as though 25% of the words in The Sussex Vampire revolved around the topic; however, it was only a small portion of the story, seeing as how it only outlined the characters of the short story. Lastly, the fourth topic I chose was written document. I was surprised by these results because only about 13% of the story included this topic. Although it may not have been a major theme in “Gloria Scott,” I assume the document was the premise on which the investigative story was based.

When I played around with MALLET again, I decided to change it up. Instead of doing 1,000 iterations, this time I did 1,500 iterations, 25 words printed, and 40 topics. I found it easier to label topics for these groups of words because I had more to work with and more to compare. The first topic I chose was characteristics. It became very clear that this was the topic for these words because words like “face, grey, man, thin, lips” were very prevalent. The next topic I chose was emotions. This was one of the harder topics I had to label because words like “god, voice, words” threw me off, but after reviewing it once more, I decided that a general topic for these words and more would have to do with a person’s emotions. My last topic for this set was physical appearance. This was kind of a fun group to label because there was a lot of imagery and color involved, so it was very easy for me to imagine this person standing in front of me. Therefore, I knew this topic had to involve some sort of appearance. When I reviewed the top Sherlock Holmes stories for these related topics, I got The Priory School, The Lion’s Mane, and A Case of Identity. Interestingly enough, all of these topics were slightly similar, leading me to believe these stories may have similar themes.

For my last set, I chose 2,000 iterations, 40 words printed, and 60 topics. I was a bit more overwhelmed with this set because although I had a lot more to work with, I felt like it was almost too much. When looking at the different sets of words, I felt like most of the words matched one another to create a topic, but others were kind of extraneous and took away from the major theme of the topic. With that being said, my first topic was schedule. A lot of the words had to do with timing and places to be, which reminded me a lot of when I plan my day out. For this topic, 12% of The Missing Three-Quarter had to do with my topic, and although another 6% had to do with a different topic, one of the major themes still pointed to scheduling. My second topic was suicide. This topic was very easy to label because all of the words involved pointed to something tragic and self-inflicted. Therefore, I thought suicide would be an appropriate label. Looking at the percentages, it seemed as though The Norwood Builder was the top story, with about 13% of its words listed for this topic. Although it was not the number one topic for the story, it is prevalent, seeing as how a murder must have taken place. Lastly, I chose the topic of traveling for this set. This was another topic that I was iffy about because I felt like it could have been journey or travel, but I leaned more toward travel because of words such as “bridge, town, (and) cross.” It seemed as though traveling was prevalent in The Final Problem, with a total of 29% of topic words mentioned throughout the short story.

Overall I thought MALLET was a fun and interesting tool, and I would most definitely try it out again sometime. It taught me a lot about gathering a major theme based on prevalent words in a short story, but in a unique way. Every time I figured out a topic, I felt as though I was unscrambling a really difficult puzzle; however, once I came up with the correct topics, the whole process became extremely entertaining!