A Good DH Project

Digital Humanities (DH) is a fairly new, trending field that many people are experimenting with. Below are some things to keep in mind when delving into the world of DH:

1) Your topic should be of relevance to a larger community. The whole world now has access to all your information. Don’t you want at least some people to care about all of it?

2) Your project or site should be user friendly. Again, the whole world can see this. Don’t you want them to be able to navigate it? What’s the point of having all of this relevant information out there if no one can figure out how to use it?

3) There should be a theme. Your project should not just be a random assortment of facts. Otherwise, your viewers will be confused about what they are supposed to get out of it and will therefore not care about it.

4) Grammar is still important! While this is the internet, and on social media sites it has become more and more acceptable to use “slang”, the DH community is an academic community and you want your readers to be able to understand the point you are trying to get across.

5) Plagiarism is still a crime! Cite everything that is not your original thought! Plagiarism is a crime with severe consequences that still apply to information you post online!

What makes a good DH project?

Aside from all the traits listed above, there are still other important factors in a good DH project. Your project should be easily navigated. Your readers and viewers should not have to do the work; you should have done all the work for them. Make sure it is easy for them to find all of the information on your site! Your project should also be aesthetically appealing, but the colors and themes should not make the information hard to view or read.

How does DH let scholars ask new questions?

Well, the obvious answer is that there is more information, and the information is much more accessible to everyone. DH allows everyone to view information that in the past had to be physically shared and wasn’t as easily found. Today, you can find it just by clicking a few buttons. When everyone has the information, the logical outcome is that more questions will be asked. Also, when everyone has all the information, scholars can ask questions within their projects. Whether it’s through an interactive site or just a survey, scholars can now get answers from non-scholars.

Qualities of a Good DH Project

5 Qualities of a Good DH Project:

1.) Easy to use

2.) Useful information

3.) Easy to search

4.) Sources stated

5.) New Perspective on information

What makes a good DH project?

The user should be able to easily and efficiently navigate the project; necessary directions should be provided and the layout should not be confusing.  The project itself should contain information that would be useful to scholars and the different pieces of data that are collected should be related to each other and relevant to different types of research. The user should be able to efficiently search the project for whatever they need so they can make sure they are using the proper program for what they need and find information quickly. Sources should be clearly stated so the user knows where the information is coming from and can research the sources further if needed. The project should give the scholar a new way to look at the information at hand, whether it be through an interactive map or an interactive digital version of a piece of literature.

How does DH let scholars ask new questions?

Digital Humanities projects give scholars the opportunity to look at data in a completely new way. Maps are no longer only on paper; they can now be on a computer screen with a wide range of features to help the scholar. Maps can be zoomed in and out and different locations can be highlighted; the possibilities seem endless. Thousands of books can be uploaded to one project, and the user can compare all the words that are being used and study the language. Now that scholars have these advanced tools to look at different areas of the humanities, they can ask new questions because they have been given a new perspective. They are able to look at topics in different ways and make connections between different data. For example, a scholar comparing a database of digital copies of books can ask questions about how the use of different words changed over time. Instead of reading every single book and comparing how each word was used, the scholar can use a computer to search through the books in a matter of minutes.
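To make the last example concrete, here is a minimal sketch of how a computer might track one word across dated books. The function name, the tokenizing rule, and the three toy "books" are all made up for illustration; a real project would load full texts and would likely need more careful tokenization.

```python
import re
from collections import Counter, defaultdict


def relative_frequency_by_year(books, word):
    """Return {year: occurrences of `word` per 1,000 words}.

    `books` is a list of (year, text) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in books:
        # Very rough tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += Counter(tokens)[word.lower()]
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Hypothetical toy corpus, invented for this example only.
toy_books = [
    (1850, "The carriage waited while the telegram was read aloud."),
    (1850, "A carriage and a letter arrived."),
    (1900, "The motor car passed the carriage on the road."),
]

trend = relative_frequency_by_year(toy_books, "carriage")
for year, rate in trend.items():
    print(year, round(rate, 1))
```

Dividing by the total number of words per year matters: without it, a year with more surviving books would look like a year in which the word was more popular.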

Leadenhall Street

Looking for mentions in the Sherlock Holmes stories, I came across Leadenhall Street, which appears in A Case of Identity when Miss Sutherland tells Holmes:

“Oh, yes, Mr. Holmes. We were engaged after the first walk that we took. Hosmer—Mr. Angel—was a cashier in an office in Leadenhall Street—and—”

I used Locating London to map the crimes committed by women in Leadenhall Street.

Most of the crimes were theft, and only 2 of them were royal offences. When I searched for all genders, I noticed that theft was the most common crime in the area and that a total of 90 crimes were registered in this street.

But back to my previous search: looking through the items, one caught my attention because its punishment was public whipping. The woman was Elizabeth Bond, and she was accused of stealing three pewter pint mugs on 14th January 1768. Mrs. Bond defended herself, saying “These pots were lying in the street, I saw them as I was coming by, and put them in to my apron, by that reason I could not tell where to carry them; as to striking them, I never did no more than I do this moment.” * but she was found guilty anyway.

The full record can be found here.

After checking the crimes, I decided to see how Leadenhall Street was classified by Charles Booth, but I noticed that it is not classified at all.

Searching more thoroughly, I found out that this was because all of Charles Booth’s entries were from workers and the place was mostly mercantile, as we can see from the notebook entries.

I tried to use Historical Eye and British Histories but they didn’t show me any relevant information about the street so I decided not to use them.

Works cited

The Proceedings of the Old Bailey case t17680114-42 (reference number)

Charles Booth Survey Notebook

A Case of Identity (Leadenhall Street)

“Father was a plumber in the Tottenham Court Road”

My next choice was the Leadenhall Street Post Office, which is where the letters to Mr. Hosmer Angel were addressed. The post office did not show up on the map, so I cut the search down to Leadenhall Street, the street on which Mr. Hosmer Angel worked, and that search returned results.

The first thing I noticed was the amount of information there was about crimes on or relating to Leadenhall Street. When I searched this location on the “Old Bailey Online,” over 1,100 results came back. While obviously not all these crimes had to take place on Leadenhall Street, and this location may just be a background detail to the case, it does show that the location is not remote.

While going through the “Old Bailey Online,” I concluded that the majority of the crimes that happened on or in relation to Leadenhall Street were thefts of some kind.

When searching the Charles Booth Online Archive, I first came across the notebook catalogue. Most of these entries are surveys or questionnaires about businesses regarding pay and working conditions. This leads me to believe that this is not a residential area.

Additionally, I searched the poverty maps on the Charles Booth Online Archive. While I found it nearly impossible to find the street on the 1898-99 map, the 2000 map showed the street. Leadenhall Street itself was not colored in, but the surrounding areas ranged from comfortable to wealthy. This suggests to me that there is little to no poverty in this area.

The information I learned during this research led me to a few conclusions about the Sherlock Holmes stories. One of these concerns the accuracy of the locations in the stories. For example, since Leadenhall Street is not a residential area, it makes sense that Mr. Hosmer Angel would work in an office on this street. Moreover, the findings on theft on Leadenhall Street could be interpreted as symbolic. At the conclusion of the story, we find that Miss Mary Sutherland’s stepfather had been impersonating a fellow named Mr. Hosmer Angel in order to ensure that Miss Sutherland doesn’t stop paying him money for things such as rent. In that sense, her stepfather stole her money, her time, and her heart for a brief while.

Topic Modeling

Using MALLET, I was able to topic model the canon of Sherlock Holmes stories. Once the program was set up, I set “Number of Topics” to 50, “Number of Iterations” to 1000, and the number of topic words printed to 20. I then fed in all the Sherlock Holmes stories. With these settings, MALLET gave me fifty different topics, each containing twenty words. From these fifty topics, I chose ten of the more obvious ones and named them. They are the following:

TOPICS:

Monetary Transactions

  • Business, money, make, hundred, asked, man, year, England, company, pounds, pay, thousand, friends, fifty, lived, ten, gold, paid, price, named

Discovering a Murder

  • Found, dead, man, body, crime, death, murder, police, bloody, finally, blow, knife, tragedy, lay, weapon, criminal, murderer, terrible, committed, scene

Presenting Cases to Holmes

  • Matter, understand, family, gave, brought, trust, complete, confidence, force, absolute, question, son, save, promise, happy, taking, honour, roof, reputation, private

Dinner Party

  • House, night, live, people, large, master, servant, evening, servants, household, purpose, dinner, lodge, baynes, enter, high, Garcia, gregson, scott, children

In a Bedroom

  • Room, window, bed, night, sitting, entered, bedroom, morning, open, heard, dressing, lawn, moment, sleep, drawing, rose, upstairs, gown, rooms, smoking

Waiting for a Taxi

  • Street, half, back, hour, past, baker, waiting, cab, quarter, ten, minutes, waited, drive, found, reach, waking, reach, hurried, passing, presently

Standing at a Door

  • Door, open, opened, heard, light, key, stood, closed, sound, passage, led, inside, room, locked, step, heavy, stair, hall, lock, instant

Stationery

  • Paper, note, letter, read, papers, table, box, handed, written, book, wrote, writing, letters, happened, write, sheet, post, document, slip, pocket

Travel

  • House, road, hall, place, side, front, walked, carriage, windows, round, led, garden, miles, drove, houses, yards, direction, drive, walk, cottage

Detective

  • Case, lestrade, evidence, mystery, yard, points, theory, afraid, arrest, facts, effect, Scotland, undoubtedly, difficulty, prisoner, innocent, charge, simple, probably, disappearance

I was then able to experiment with the settings to see different results. As I increased the number of iterations, the process took longer and longer, until I didn’t really have the time to wait for it. I think the last successful trial was one with 2000 iterations.
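For anyone who wants to repeat these trials without clicking through the settings each time, the same three settings can be passed to MALLET's command-line tool. The sketch below only builds the command; it assumes the command-line version of MALLET (not necessarily the interface used for this post), and the file names `holmes.mallet` and `topic_keys.txt` and the `bin/mallet` path are placeholders, not anything from the original experiment.

```python
def mallet_train_command(input_file, num_topics=50, num_iterations=1000,
                         num_top_words=20, keys_file="topic_keys.txt",
                         mallet_bin="bin/mallet"):
    """Build the argument list for a MALLET `train-topics` run.

    The defaults mirror the settings described in this post.
    `input_file` is assumed to be a corpus already imported into
    MALLET's own format; the path is a placeholder.
    """
    return [
        mallet_bin, "train-topics",
        "--input", input_file,
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--num-top-words", str(num_top_words),
        "--output-topic-keys", keys_file,
    ]


command = mallet_train_command("holmes.mallet")
print(" ".join(command))

# A longer trial, like the 2000-iteration run mentioned above.
longer = mallet_train_command("holmes.mallet", num_iterations=2000)
print(" ".join(longer))

# To actually run it (requires MALLET to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Keeping the settings in one function makes it easier to record exactly which combination produced which set of topics.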

At first I had thought that having twenty words in each topic was a bit too much, so I experimented with decreasing this number. By doing this I learned that having fewer words in each topic often made the process more difficult. Sometimes a smaller group of words isn’t enough to successfully establish a trend, and thus a topic. There may be a better number to use than twenty, but it seems to be a pretty fine line. And, of course, not all of the topics I received using twenty words per topic were very good. Out of the fifty topics I received, I was really only able to find ten topics that made any sense to me, and even those may be pushing the proverbial envelope.

As for Topic Modeling, I went into this experiment as a skeptic and I did not really leave any more confident in the process. What Topic Modeling does is take a large number of files, look through all of the words, then put certain words in groups. The words in each group are supposed to have a common theme, and thus they are meant to tell you something about the group of texts as a whole. I am skeptical about this process because words, when put together, don’t always mean what they may mean if you take them in a literal sense. Examples of this are figurative language such as idioms and metaphors. I’m not sure if the algorithm used in MALLET takes this into consideration, but I wouldn’t think it does. So, if this is the case, MALLET could still be a great program (it really is a great program) and Topic Modeling could still be very useful, but only in the cases where everything is literal. That being said, Topic Modeling has its limits just like everything else has its limits, and it still may be a great tool for distant reading.
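The worry about idioms can be illustrated with a small sketch. Topic models of the kind MALLET trains are generally described as "bag of words" models: each text is reduced to word counts and word order is ignored. Assuming that description holds, an idiom and a scrambled version of the same words look identical to the model. The helper name and the example sentences below are made up for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


idiom = "The old man kicked the bucket"
scrambled = "The bucket kicked the old man"

# Both sentences contain exactly the same words the same number of times,
# so their bags of words are equal even though the meanings differ.
same_bag = bag_of_words(idiom) == bag_of_words(scrambled)
print(same_bag)
print(bag_of_words(idiom))
```

This does not make the results useless, but it does suggest that figurative meaning has to be recovered by reading the passages behind a topic, not from the word list alone.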

Sherlock Holmes Topic Modeling (10)

No. of Iterations: 1000

No. of topic words printed: 20


Topic Modeling (10)

Number of Topics (40)

1. Deliberation: case, fact, reason, facts, explanation, mystery, obvious, idea, simple, shown, great, effect, prove, evident, impossible, solution, theory, observed, probable, story

Number of Topics (50)

2. Investigation: crime, police, evidence, murder, case, attention, account, death, tragedy, arrest, mark, occurred, inquiry, missing, unfortunate, discovered, charge, complete, naturally, committed

3. Attributes: man, face, eyes, dark, figure, looked, tall, head, drawn, black, features, mouth, thin, middle, appearance, deep, huge, beard, nose, lines

4.  Text: paper, note, read, letter, letters, book, handed, table, papers, written, message, writing, wrote, address, short, sheet, post, write, importance, document

5. Expression: face, eyes, turned, lips, spoke, appeared, light, suddenly, pale, manner, sat, staring, sank, expression, nervous, excitement, silent, eager, breath, fixed

Number of Topics (60)

6. Homicide: found, dead, body, left, dreadful, finally, carried, terrible, blow, lying, round, knife, stick, fell, brought, horrible, single, strong, weapon, person

7. Frontyard: road, house, carriage, side, drive, hall, front, direction, drove, back, garden, place, walked, station, yards, pulled, passed, stopped, gate, grounds

8. Setting: room, door, open, window, entered, opened, key, rushed, closed, bedroom, passage, instant, locked, floor, stair, pushed, lock, stairs, led, safe

9. Path: path, passed, showed, foot, round, water, led, track, leaving, ran, walked, edge, traces, feet, hard, grass, marks, fall, lay, ground

10. Mycroft: london, office, brother, suppose, papers, west, mycroft, young, company, evening, monday, club, card, foreign, fog, clerk, pycroft, pocket, daily, government

I kept the number of iterations at 1000 and the number of topic words at 20. I only experimented with the number of topics. I found that the lower the number (ten, for example), the more general the words were, which made the meaning of word combinations difficult to pinpoint. I ended up using 40, 50, and 60.

At times, I found it difficult to understand why some words were grouped with other terms. I think this is because I haven’t read many Sherlock Holmes stories and I don’t understand some associations. The topics that I did end up using are those whose terms I strongly associate with Sherlock Holmes. Deliberation, investigation, and homicide relate very much to the overall Sherlock Holmes story line. I think these topics are more general and broad. The other topics (frontyard, attributes, text, setting, path, and expression) are more specific. These are the kinds of things Sherlock would use during an investigation, as well as to DO an investigation. These kinds of terms would mostly be used in the middle part of the stories, during the investigation.

Mycroft, of course, is Sherlock’s brother.

Topic Modeling

10 favorite topics from Mallet:

1)  Police

crime inspector london police found murder evidence arrest charge tragedy proved violence discovered present remained committed appeared bust trace person

2) Death

 man life poor strange heart told death creature died broke devil met terrible happened human wild loose killed telling terror

3)  Home

  room chair sat fire table sitting half pipe rose corner round lit arm lamp laid cigar tobacco smoke bed chamber

4) Movement

 back station half hour past carriage waiting drive minutes cab train started drove hurried standing start ten pulled quarter reached

5) Scared

light window dark suddenly threw looked round sprang forward shadow steps moved darkness thrown figure black sharp lamp standing water

6) Appearance

face eyes features looked dark tall pale thin expression figure lips glance sprang gray colour manner spoke clean angry handsome

7) Case

holmes street sherlock baker call men king address rooms cross photograph smiling laughing working ha afternoon description seldom absolute admirable

8) Murder

found man dead body blood left head finally lay drawn knife sign fell round sight blow stick lying clothes thing

9) Abandon

man father young found left point true occurred account time view home narrative evidence court single alive witness place general

10)  Stationery

paper note letter read table book st written handed writing papers wrote sheet post write letters simon document slip reading

11) Routine

night morning clock room morrow mrs hour early breakfast bed arrived work late heard twelve change hours hear sleep dressed

Settings: Number of Topics: 50

Number of Iterations: 1000

Number of Topic Words: 20

Stop Words Removed
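Copying topic words out of MALLET by hand is error-prone, so it can help to read them from the topic keys file instead. The sketch below assumes the usual layout of that file, one topic per line with a topic number, a weight, and the topic words separated by tabs. The function name is hypothetical, and the topic number and weight in the sample line are placeholder values; only the words come from the Murder topic above.

```python
def parse_topic_keys(lines):
    """Parse lines of the form '<topic id>\\t<weight>\\t<space-separated words>'."""
    topics = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        topics[int(topic_id)] = {
            "weight": float(weight),
            "words": words.split(),
        }
    return topics


# Sample line: the id (28) and weight (0.1) are illustrative placeholders.
sample_lines = [
    "28\t0.1\tfound man dead body blood left head finally lay drawn "
    "knife sign fell round sight blow stick lying clothes thing\n",
]

parsed = parse_topic_keys(sample_lines)
for topic_id, info in parsed.items():
    print(topic_id, len(info["words"]), " ".join(info["words"][:5]))
```

With a real file, the same function could be called as `parse_topic_keys(open("topic_keys.txt", encoding="utf-8"))`, where the file name is again a placeholder.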

Google Ngrams

Ngram 1- Gender, sex, politics

Ngram 2- Race, homosexuality, evolution

Referencing the Branch Collective website, I chose these terms after looking at their topic clusters page. I mostly focused on their identity section, pulling words such as gender, sex, race, and homosexuality. I added politics and evolution because they seemed relevant: talking about gender and sex often becomes political, and since these are fairly controversial topics I also added evolution. In these Ngram charts, as I expected, gender and homosexuality are barely mentioned in literature during this time period. Evolution was also very low on the chart until about 1870, which makes sense because it was around that time that Darwin started publishing his theories. I find it interesting that during the early 1800s sex was mentioned more than politics, and then in the 1830s they switched. Clearly politics became more important or popular to write about than sex. Race is mentioned pretty heavily all throughout this period and is on the incline. As time goes on, especially in literature today, I would expect almost all of these words to increase.

Google Ngram definitely helps to see trends in literature during specific time periods, but as the blog post by Ted Underwood explains, it really doesn’t give much context. Although we know that race was talked about frequently during this time period, we have no idea how it was being talked about. The same goes for the word sex. Was it so high because Google Books has a lot of erotic novels from this time period? We have no idea what type of books they are drawing on to make these charts.

I do like the visualization aspect of these charts, but once again this information can’t stand alone and we need to look further to find more context for the words.
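One common way to look further is a keyword-in-context listing, which shows the few words on either side of each occurrence of a term. This is a minimal sketch, not a feature of Google Ngram itself; the function name and the sample sentence are made up, and it assumes the full text of a book is available, which the Ngram charts do not provide.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"[A-Za-z']+", text)
    results = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            results.append((left, token, right))
    return results


# Hypothetical sample text, invented for this example.
sample_text = "The race was run at dawn. Men of every race gathered in the city."

contexts = keyword_in_context(sample_text, "race")
for left, word, right in contexts:
    print(f"{left:>15} [{word}] {right}")
```

Even in this tiny sample the two occurrences of "race" mean different things, which is exactly the kind of distinction a frequency chart cannot show.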

Book Traces – Bullet and Shell

Hello again everybody. This week we worked on our Book Traces. We were asked to hunt down books in the library that had different writings or markings inside of them. The idea behind this stems from the question, what is a book? Some could argue that it is a collection of words and we could throw those words onto a Kindle, Nook, or some other electronic reader. But really we’ve discovered that books are much more than that. They are made of different materials, they age over time, and of course, readers may interact with them in ways that will go unnoticed and be lost forever unless documented by projects such as Book Traces.

I know some people found their subjects pretty quickly; I was not so lucky. I had to search about 15 books before finding one I was happy with. Not that I minded though; scouring the library turned out to be a fun way to learn about what books have to offer us beyond the original publication. The book I eventually found is Bullet and Shell: War as the Soldier Saw It by Geo F. Williams with illustrations by Edwin Forbes.

Bullet and Shell is a book about the American Civil War told mostly through the soldiers’ perspectives. The marginalia I discovered was an inscription on the first page of the book. Although in cursive and not entirely legible, it appears to read, “In remembrance of Harry … Brant Christmas 1919 …”

There isn’t much to go on, but I wondered who Harry Brant was. Perhaps he was a soldier who died in the First World War, the year before the inscription. Or maybe he was an ancestor who gave his life in the Civil War. Although no concrete answers can be determined, it is fascinating to think of the human emotion this book may have stirred in whoever received it (presumably as a gift on Christmas).

Last note: I’m posting this just after noon on Sunday, October 5th. As of now my page hasn’t made it onto the site just yet, but I will edit back here with the link once it is up.