What Makes a Good DH Project?

Qualities of a Good DH Project: Large Data Set, Citations, Aesthetically Pleasing, Interactive, Search Feature

The new wave of scholarly work that is known as the ‘Digital Humanities’ is all about data (and lots of it). When the humanities and technology converge to create new research and discoveries, large amounts of information are compiled in order to do so. New questions cannot be asked without analyzing as much information as possible, so as to ensure a secure basis for those new inquiries. Living in the digital space, a Digital Humanities project is worth exploring if it has some key qualities.

A great DH project has a large data set that it pulls from, references, analyzes, reflects, or even contradicts. The more data there is in a project, the more expansive and the greater the opportunity to analyze findings, identify trends, and pose new questions. With that in mind, with large data comes large responsibility …to be scholarly. Incorporating lots of data is a wonderful first step, but citations are the important second step. The credibility of a project determines whether it is considered scholarly, and the best DH projects not only reveal new data, but also openly share the original data that the project pulls from.

The advent of digitizing works of the humanities creates a new question of ‘what will the digitized project look like?’ No matter what the project, a good DH work should be aesthetically pleasing. This does not mean that it needs to be reminiscent of the Romantic era, but it does need to look clear, rather than cluttered or illegible. At a glance an aesthetically pleasing project will look like it relates to the data that it is based upon. For example, a good mapping project or Ngram will be legible and clear, and a good topic modeling or word cloud project will at a glance be somewhat representative of the publications it is pulling from.

The aesthetics of a project are important for analyzing and exploring it, but looks aren’t everything. A good DH project is interactive. Depending on the type of project, this can be interpreted in several ways. In a word cloud visualization, the ability to hover over or click on words to learn about their frequency and other information is one form of interactivity. On a map, this could just mean the ability to zoom in and out, or it could mean having the ability to plot data points on a map or compare a map from the past to a map from the present. No matter what the form of engagement, a good DH project will take advantage of the endless possibilities that the digital space allows.

Related to project interactivity, a great DH project has some kind of search feature. The ability to not only explore the information in front of you in a project, but to also quickly explore related texts, visuals, maps, and more is what makes the digital space unique. Digitized projects that live online allow web surfers to have a world of information at their fingertips. On a map, searching could just mean plotting data on a map or it could mean searching for a specific street in a city. In a topic modeling project it could mean looking up metadata regarding a specific word across a corpus of texts. No matter what the search function, its addition to a project allows for that immediate exploration, analysis, and creation of questions.

The field of Digital Humanities has a modern goal made possible by the World Wide Web – to engage a broader network of scholars in the pursuit of knowledge of the humanities. Pulling discoveries from the modernization of dated texts, works, and arts, this field allows scholars to ask new questions because of the ability to share content online across oceans and continents. The “Digital” in Digital Humanities provides both an increase in accessibility and a modern approach to works of the humanities. Greater access means greater possibility of discovery, and taking a modern look at older works creates a new lens through which scholars can evaluate the past. Increased accessibility of old and new works, a broad network of scholars, and the use of digital tools all allow scholars to ask new questions through collaboration, detailed analysis, and modern technology.

Qualities of a Good DH Project

5 Qualities of a Good DH Project:

1.) Easy to use

2.) Useful information

3.) Easy to search

4.) Sources stated

5.) New Perspective on information

What makes a good DH project?

The user should be able to easily and efficiently navigate the project; necessary directions should be provided and the layout should not be confusing.  The project itself should contain information that would be useful to scholars and the different pieces of data that are collected should be related to each other and relevant to different types of research. The user should be able to efficiently search the project for whatever they need so they can make sure they are using the proper program for what they need and find information quickly. Sources should be clearly stated so the user knows where the information is coming from and can research the sources further if needed. The project should give the scholar a new way to look at the information at hand, whether it be through an interactive map or an interactive digital version of a piece of literature.

How does DH let scholars ask new questions?

Digital Humanities projects give scholars the opportunity to look at data in a completely revolutionary way.  Maps are no longer only on paper; they can now be on a computer screen with a seemingly unlimited number of features to help the scholar.  Maps can be zoomed in and out and different locations can be highlighted; the possibilities seem endless.  Thousands of books can be uploaded to one project, and the user can compare all the words that are being used and study the language.  Now that scholars have these advanced tools to look at different areas of the humanities, they can ask new questions because they have been given a new perspective.  They are able to look at topics in different ways and make connections between different data.  For example, a scholar comparing a database of digital copies of books can ask questions about how the use of different words changed over time.  Instead of reading every single book and comparing how each word was used, the scholar can use a computer to search through the books in a matter of minutes.
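To make the last example concrete, here is a minimal sketch of how a computer might track one word across dated texts. The three-entry corpus, the function names, and the choice of the word "carriage" are all made up for illustration; a real project would load thousands of digitized books instead, and would need to combine several texts from the same year.

```python
from collections import Counter
import re

# Hypothetical mini-corpus of (publication year, text) pairs, one text per year.
corpus = [
    (1850, "The carriage waited by the gate. The carriage was old."),
    (1900, "The motor car passed the carriage on the road."),
    (1950, "The car stopped; another car followed."),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequency(word, text):
    """Return the share of words in `text` that equal `word`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens)

def frequency_by_year(word, dated_texts):
    """Map each year to the relative frequency of `word` in that year's text."""
    return {year: relative_frequency(word, text) for year, text in dated_texts}

trend = frequency_by_year("carriage", corpus)
for year, share in sorted(trend.items()):
    print(year, round(share, 3))
```

Using relative frequency rather than a raw count keeps long and short books comparable, which matters once the texts differ a lot in length.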

Digital Humanities Project: Five Keys to success

1. Good Layout: It should be simple and very easy to manage as a user. There should not be a lot of detours away from the main home page, where the viewer will just end up lost. It should be very easy to navigate.

2. Straightforward: Information should be very straightforward. People aren’t looking for a bunch of information that’s only slightly relevant. This won’t help them; it will just keep them waiting for what they want.

3. Easy to Search: There should be a search bar, nice and big, at the top of the page. This way, if the user knows what they want, they can find it within seconds instead of having to search around.

4. Concentrated: This project should be concentrated with a lot of information on the same topic. Instead of little bits of information here and there on a bunch of different topics, there should be loads of information on one single topic.

5. Cited: It needs to be cited so that it’s not plagiarism and so that the viewer can go to those sites as well as yours for other information.

 

Questions:

1. What makes a good Digital Humanities project is all the things listed above. These are the main components of any DH project. The information should be interesting to the reader so it keeps their attention. It should also be solid information so that they might tell a friend that it was a really good site to look at. They may end up coming back to it again for future reference. All of the information that you are giving out should be easy to follow. It shouldn’t be all over the place, or else the viewer’s attention will keep breaking and they will leave and move on. It needs to be well organized.

 

2. Digital Humanities lets scholars ask new questions by bringing up new theories and different perspectives. The way technology is improving our ways of researching can bring up many questions about how we are able to do such things as make a map of 1776 London and be able to place it side by side with present-day London. Many questions can be asked by scholars; it all depends on what they find interesting and what more information they would like to find out.

Five Qualities of an Excellent Digital Humanities Project

An excellent DH project should be:

1. Focused: A great Digital Humanities Project will have a central theme that brings all of the data together

2. Relevant: The project should concern something that people will care about. Creating a project of your brother’s poetry won’t be successful, because most people do not care to study your brother’s poetry. Rather, a distinguished poet’s work would make your project more relevant because it may be useful for scholars.

3. Searchable: Especially if the project contains many different works, a search bar will make it easier for scholars to sift through all of your data

4. Cited: Be sure to specify where you got your information. It is extremely important that scholars know your data is reliable.

5. Visually Appealing: The format and graphics of a good DH project keep scholars engaged. Make sure the format of your project is clean and structured around your theme. This will make it easier to navigate. Clear images are critical because they ensure accuracy, especially when you can zoom in.

1.  A great digital humanities project should contain interesting, organized, reliable material. Scholars use projects that have compelling themes and useful data. In order to show that your data is interesting, it is important to keep it organized. Well-organized data keeps scholars interested. Data that is not structured and orderly will be difficult to focus on. Lastly, a DH project is great when the data is reliable. A dependable DH project is composed entirely of correctly cited data that has no mistakes (especially in digital editions!).

2. Digital Humanities enables scholars to ask new questions about past issues by providing them with new methods. Old concepts can be analyzed using new technology. A prime example of this is distant reading. Scholars can use distant reading to examine trends across thousands of texts without closely reading each individual one. MALLET, a topic modeling tool used for distant reading, can process multitudes of digital texts in less than two minutes. MALLET reveals countless different topics within groups of texts, allowing scholars to question previous research conducted using traditional tools. New Digital Humanities tools such as MALLET allow scholars to expand their research and ask new questions.

Qualities of a good Digital Humanities project

Five qualities of a good digital humanities project:
1) The project is aesthetically pleasing
2) It is easy to navigate/clean interface
3) Includes context about the data and proper credit/citations
4) There is a functioning search feature
5) It incorporates a vast amount of relevant material

How each person defines a “good” digital humanities project will vary depending on who they are, what they are building, and their perception of the world and the way everything operates. Each person has their own cultural lens of experience through which they understand and interpret everyday life. With that in mind, I feel that in my own experience, a good digital humanities project will incorporate each of the five qualities I listed above. The purpose of a digital humanities project seems to be to serve as a tool or interface that is created not only to display information and data for a particular agenda but also to benefit the greater community of people doing research. Projects should be attractive and easy on the eyes. People will be more inclined to use your project or database if the colors and layout appeal to them. This also connects to the idea of simple navigation. I feel that a good project will be straightforward, clean, and easy to peruse. If there was a project or archive we looked at in class that was hard to read or navigate, I found myself straying away from it. Context is critical for a good project. If someone does not know the context of the material they are viewing, how are they expected to effectively use it? A good project should explicitly describe the mission of the project and provide adequate background information on what is being presented. Citing any sources is also crucial for a good project. If there is one thing I learned, it is to give credit where credit is due. Having a search feature for a project may not be necessary depending on what type of project is chosen, although with an archive or other similar type of project, incorporating a functioning search feature is very helpful for the researcher. Having a search engine allows users to quickly browse for exactly what they are looking for in order to determine whether or not the project is something they could benefit from using.
The final quality on my list discusses using a vast amount of evidence that is relevant to your topic. I feel that in most cases the more data you have, the stronger your project will be. Having a lot of information may be a lot of work to find and incorporate, but it will be worth it for strengthening the final product.

Digital humanities is a significant and exciting emerging discipline. It is constantly evolving and creating new perceptions on how academic research is done. Through different digital tools and archives, the creator enables scholars to actively engage and interact with what they are studying. It challenges classical methods of academic research and allows researchers to understand topics in different perspectives. Using digital outlets to display information allows participants to collectively collaborate and share ideas which is a fantastic benefit to digital humanities. Seeing topics in different lights enables scholars to raise new questions about what they are studying. Projects can create new ideas and discussions that may not have been easy to identify using classic methods of research. Digital humanities is an important, growing, and interesting field that allows scholars to research in new ways and raise questions that had never been thought of before.

Mapping Fleet Street in the Victorian Age

I chose to search the popular London location Fleet Street which is mentioned in the Sherlock Holmes story “The Red-Headed League.” This tale is one of the less dramatic mysteries that Holmes explores, and when I first read it, the mention of Fleet Street caught my eye. I primarily knew the street as home to Stephen Sondheim’s Demon Barber and Mrs. Lovett’s meat pies. In the story of The Red-Headed League, a man gets tricked into working at an office on Fleet Street, assisting with the manual copying of the Encyclopedia Britannica. His new “league” mysteriously disbands very suddenly and with a turn of events, we learn that the office on Fleet Street was a decoy for another crime to take place. As the map from Victorian Google Maps below shows, Fleet Street is broad and stretches across several intersections in central London.

Fleet Street on the Victorian Google Maps
Fleet Street in the Victorian Age, Courtesy of Victorian Google Maps

Fleet Street was known as “a tavern street, as well as a literary centre,” according to historicaleye.com, a website that compiles academic works about various historical events and locations. Through exploring this and several other sites, I learned that Fleet Street is known as more than Sweeney Todd’s home. In fact, by 1896 several notable writers are cited as having inhabited the street’s pubs; “Shakespeare, Ben Johnson, Raleigh, Dryden, Johnson, Goldsmith…are closely associated with this famous street” (historicaleye.com). As for this website as a scholarly archive tool, the section on Fleet Street and The Strand on historicaleye.com is difficult to find again if you accidentally navigate away from it. There seem to be two very different parts of this website – the Then and Now section about London that features historical summaries of London locations in 1896, and the newly “renovated” part of the site that is exposed when clicking on the home button. With no search bar on either of these parts of historicaleye.com, exploration was left only to clicking around the tabs most relevant to London.

The interesting combination of literary greats and taverns is reflected in the socioeconomic status of Fleet Street. Using the Charles Booth Online Archive (http://goo.gl/JgRmhL), I looked for the street to learn about its economic makeup in the 1890s. Based on the Charles Booth Poverty Classification Legend, the map below shows that the end of Fleet Street where it converges with the Strand had many middle-class/well-to-do individuals living there, as noted by the red markings. Both Victorian Google Maps and the Charles Booth map note that there are many banks on the part of Fleet Street that approaches The Strand, so the increase in well-to-do individuals correlates well.  Though the map is not very clear to read, I interpret the light blue/gray along the center of Fleet Street to represent the “poor” class, which the Booth Poverty Classification Legend defines as earning 18s. to 21s. a week for a moderate family. To the right of Fleet Street as it approaches St. Bride Street, all of the light pink represents the population of people who were “fairly comfortable” with “good ordinary earnings.” From well-to-do individuals to poor families, this street had a variety of people passing through it in the late 1800s, further verifying the reputation of taverns and great Victorian writers in one place.

Using Charles Booth's Poverty Classification Legend, this map shows that the end of Fleet Street that converges to The Strand had many middle-class/well-to-do individuals living here as noted by the red markings.
Fleet Street on an 1898-1899 Map of London

 

The broad range of socioeconomic status on Fleet Street prompted me to explore the types of crime that were documented at the time of the Charles Booth Poverty map. Below are cases that either took place on or involved Fleet Street and therefore surfaced as search results on Old Bailey Online (www.oldbaileyonline.org/static/London-life19th.jsp), an archive that houses centuries of London court cases. Limiting my search to 1896-1898 to coincide with the Poverty Classification map, I found an interesting trend in crimes in the late 1890s on Fleet Street. If I were topic modeling the cases below, it would be easy to detect the highest trending topic for court cases…theft. Two counts of burglary, two counts of pickpocketing, and two counts of fraud all point to the majority of crimes on this street revolving around stealing money. The somewhat broad range of socioeconomic status may have been responsible for these crimes. These court cases, including the extreme manslaughter charge and the perjury and larceny charges, all sound like the London that Arthur Conan Doyle depicts by means of Sherlock’s cases, while also relating to the variation of inhabitants’ economic statuses at the time.

A list of cases from the Old Bailey Online Archive that were documented as taking place on Fleet Street in the Victorian Age

 

Sources:

“Booth Poverty Map & Modern Map (Charles Booth Online Archive).” London School of Economics & Political Science, Web. 09 Nov. 2014.

 

Rees, Simon. “Fleet Street and the Strand.” Historicaleye.com. Simon Rees. Web. 09 Nov. 2014.

 

Tim Hitchcock, Robert Shoemaker, Clive Emsley, Sharon Howard and Jamie McLaughlin, et al., The Old Bailey Proceedings Online, 1674-1913 (www.oldbaileyonline.org, version 7.0, 24 March 2012). 09 Nov. 2014.

 

 

 

Traveling to Tottenham: Using GIS to Analyze Locations in Sherlock Holmes

I chose to research Tottenham Court Road because it is mentioned in Sir Arthur Conan Doyle’s The Adventure of the Blue Carbuncle. In the story, Peterson is heading home at night on Tottenham Court Road when he stumbles upon a group of men attacking Henry Baker. Evidently, Tottenham Court Road is located in an unsafe area.

I began my research by taking screenshots of Tottenham Court Road in Victorian Google Maps. Below is Tottenham Court Road during both the Victorian Era and the present day (“London”).

Screen Shot 2014-11-07 at 6.45.57 PM

“British Histories” offered me the most information. At the top of the page is a search bar in which I entered my street name. The search resulted in a plethora of publications concerning Tottenham Court Road. The excerpt that I found most valuable on this website revealed that 1878 marked a turn in the lives of many people in London. The author explains that in 1878, poverty was spreading on Tottenham Court Road in Rathbone Place. Walford describes it as a place “where poverty is almost hopeless” (“Quick”). This helps to explain the violence that Peterson encountered that night he was heading home on Tottenham Court Road. Because of widespread poverty in the area, the men likely attacked Henry Baker because they were hungry and wanted to eat his goose (and take the blue carbuncle!).

Another site that I found useful in researching Tottenham Court Road was “Old Bailey Online.” It was easy to search for criminal records near or on Tottenham Court Road. The search page allowed me to adjust the time period as well. I searched for records between 1800 and 1901. Almost all of the records from Tottenham Court Road and nearby areas documented theft crimes. Most of them were grand larceny, some of them highway robbery, and a few of them theft from a specified place. One of the records, for example, showed that Sarah Crosby stole a shirt and seven stockings (“The Proceedings”). After viewing the records, I reviewed the situation from The Adventure of the Blue Carbuncle during which the men are attempting to take Henry Baker’s goose. With my new knowledge of theft crime on Tottenham Court Road, I realized that it wasn’t uncommon for such situations to occur.

The sites that I found least useful were “Historical Eye,” “Locating London,” and “Charles Booth’s Online Archive.” I was unable to search on “Historical Eye.” On top of that, reading through the site proved to be ineffective because it lacked any information on Tottenham Court Road. “Locating London” turned up only four results even after searching various forms of the street name in 1800 (e.g. Tottenham, Tottenham Ct., Tottenham Court Road). As it turned out, each result led me to the same exact record. The record had nothing to do with Tottenham Court Road; in fact, it only appeared in my search results because the word “Tottenham” appeared on the record once without any context (“Home”). Link to this record: http://www.londonlives.org/browse.jsp?div=NAHOCR70004CR700040070. After an hour of trying to find Tottenham Court Road on the “Charles Booth’s Online Archive” by switching back and forth from Victorian Google Maps to Booth’s map, I still could not locate it. This is unfortunate because Booth’s archive would have been useful for my research considering that it maps poverty.

[Edit Nov. 10th: I now know that there is a search bar on Charles Booth’s Online Archive. I searched my street name and the following picture of my street revealed that in 1898-99, its residents were living fairly comfortably.]

Screen Shot 2014-11-11 at 7.12.45 AM

Screen Shot 2014-11-11 at 7.12.32 AM

Altogether, Victorian Google Maps, “British Histories,” and “Old Bailey Online” were helpful in learning about Tottenham Court Road, but the other GIS maps were difficult to navigate even after reviewing how to use some of them in class.

Works Cited:

“Booth Poverty Map & Modern Map (Charles Booth Online Archive).” Booth Poverty Map & Modern Map (Charles Booth Online Archive). N.p., n.d. Web. 07 Nov. 2014.

“Circa 1896: Reinventing the Wheel.” Historicaleye.com. N.p., n.d. Web. 07 Nov. 2014.

“Home | LOCATING LONDON’S PAST.” Home | LOCATING LONDON’S PAST. N.p., n.d. Web. 07 Nov. 2014.

“London – OS Town Plan 1893-6.” London – OS Town Plan 1893-6. N.p., n.d. Web. 07 Nov. 2014.

“The Proceedings of the Old Bailey.” London History. N.p., n.d. Web. 07 Nov. 2014.

“Quick Introduction || Pause.” British History Online. N.p., n.d. Web. 07 Nov. 2014.

Sherlock Holmes Topic Modeling

Word Cloud for Blog

First and foremost, I accidentally miscounted and neglected to post a tenth topic so it is included in the following list:

(50 topics/1000 iterations/20 topics printed)

Place: house side road passed walked front round garden hall windows path corner direction window standing ran houses yards led bicycle

Murder/death: found left body blood lay brought examined revolver round examination ground knife carefully wood death stick marks track dead spot

Letter/note: paper note read letter book pocket letters handed wrote written writing write sheet post document slip table reading date envelope

(60 topics/700 iterations/15 topics printed)

Woman: woman lady wife young mrs girl love life husband child miss married story daughter beautiful

Spirits/ghosts: doubt lost danger dangerous clear life criminal law friend memory powers presence death care fear

Time: night heard morning evening clock ten past waiting house thirty usual surprise found quarter quiet

Crime: house found examined night body showed show clue signs finally death proved carefully carried servant

Money: years money ago twenty hundred lady king pounds gold months pay photograph age year thousand

Deduction process: case interest facts points point investigation remarked give follow incident theory interesting obvious run conclusion

Family: father made left happened death poor mother imagine story returned died strange mad truth butler

Though I found topic modeling to be an interesting concept and distant reading tool, I thought it was difficult to understand when I was configuring and selecting my own topics.  I don’t think I was able to spend enough time with the program.  Since I don’t have any background with programming, I felt like there was something I was missing.  It was difficult for me even to get MALLET to compute the data in the first place.  After that, I could go through the lists of words and find how many times they were used and, to an extent, the way they related to each other – so I was able to better grasp the use for this tool.  Looking at the words this way appears to be more effective in finding information about a lot of text, as opposed to a word cloud.  A word cloud will display all of the words randomly and show their frequency [like above, displaying the frequency of the words in my topics]; MALLET will list words in relation to each other, so a reader will get a better idea of the themes throughout the collection of literature.  In theory, this word cloud should illustrate a very condensed version of the Sherlock Holmes stories, but these are only words based on my selections of topics from topic modeling.  To any reader outside of this blog, the word cloud above [which focuses mostly on death and bodies and seems to make the stories out to be much more morbid than they really are] could not possibly produce an authentic understanding of the text.
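The difference between the two views can be sketched in a few lines. This is only an illustration, assuming a word cloud that sizes words purely by how often they occur; the variable names are made up, and the two word lists are copied from the 50-topic run above.

```python
from collections import Counter

# Two of the topic word lists from the 50-topic run above.
topics = {
    "Place": "house side road passed walked front round garden hall windows "
             "path corner direction window standing ran houses yards led bicycle",
    "Murder/death": "found left body blood lay brought examined revolver round "
                    "examination ground knife carefully wood death stick marks "
                    "track dead spot",
}

# What a frequency-based word cloud sees: one flat bag of words and counts.
cloud_counts = Counter(word for words in topics.values() for word in words.split())

# What the topic lists preserve: which topic(s) each word belongs to.
topics_containing = {}
for name, words in topics.items():
    for word in words.split():
        topics_containing.setdefault(word, []).append(name)

print(cloud_counts.most_common(1))   # [('round', 2)]
print(topics_containing["round"])    # ['Place', 'Murder/death']
```

In the flat count, "round" is simply the biggest word; only the grouped view shows that it sits in both a place topic and a death topic.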

When I chose my topics, I picked out the ones that were the most intriguing to me.  Some were simple and some didn’t make sense – for example, the final topic [the one I had forgotten] makes so little sense to me I don’t know how to title it, whereas the “woman” topic features only words that have direct correlations with the female gender.  For the “family” topic, I finally chose that word to represent them all primarily because of “mother” and “father.”  However, I still wonder what “strange,” “mad,” and “truth” have to do with the topic.  Perhaps “family” is incorrect and the topic is really to do with “storytelling,” which is prevalent in the Holmes stories.  Sherlock’s clients and/or Sherlock himself tell their stories in every individual mystery.  Many of the topics feature at least one word that throws me off of what I think the topic is in general.  So, for me, there is still a disconnect in the idea of distant reading as a comprehensible look at lots of text, but I’m really enjoying looking at new technological ways to consider and discuss literature.

A deeper look at topic modeling

wordcloud

All categories chosen from 50 topics with 1000 iterations:

time – morning night back clock waiting past early morrow quarter arrived

writing – paper note read letter table book handed letters written wrote

physical features – face eyes looked thin features lips figure tall dark expression

household – woman lady wife husband life love girl child married maid

clothing/accessories descriptions – black hair red hat heavy round broad centre coat dress

death/crime – found man dead lay body blood death knife lying round

interrogation/crime solving – give matter idea reason question impossible occurred absolutely explanation true

physical reactions – face turned back instant hand sprang forward moment side head

transportation – station train road carriage passed side drive reached drove hour

darkness/mystery – light suddenly dark long caught sat lamp spoke silence silent

Using MALLET was an interesting experience. I enjoyed how simple and accessible the interface was. I had no trouble navigating the program and tweaking the iterations and so forth to my liking. I experimented with several numbers before choosing to analyze my topics with 50 topics, 1000 iterations, and 10 topic words printed. I tested extreme numbers to see how they would influence the data. In one trial I searched 500 topics with 3000 iterations. This resulted in data that was too specific, with topics that were tied to particular stories. I also searched as few as 10 topics with only 500 iterations. This generated too many broad and vague topics that did not capture the essence of the mysteries. In the end I felt that narrowing it down to 50 different topics with 1000 iterations gave me a good sense of the Sherlock Holmes stories in a general yet helpful way. The word cloud above displays these words in a creative and interactive way.
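For anyone running MALLET from the command line instead of through a graphical interface, the three trials described above could be written out roughly as follows. This is a sketch under assumptions: the input file name holmes.mallet, the output file names, and the helper mallet_train_command are hypothetical, the stories are assumed to have been imported into MALLET's format already, and 10 top words is assumed for all three trials. The option names are those of MALLET's train-topics command. The code only builds and prints the commands; it does not run them.

```python
def mallet_train_command(input_file, num_topics, num_iterations, num_top_words,
                         mallet_bin="bin/mallet"):
    """Build the argument list for one MALLET train-topics run (hypothetical helper)."""
    tag = f"{num_topics}t_{num_iterations}i"
    return [
        mallet_bin, "train-topics",
        "--input", input_file,                       # corpus already imported by MALLET
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--num-top-words", str(num_top_words),       # topic words printed per topic
        "--output-topic-keys", f"keys_{tag}.txt",    # hypothetical output file name
        "--output-doc-topics", f"doc_topics_{tag}.txt",
    ]

# (topics, iterations) for the too-broad, chosen, and too-specific trials.
settings = [(10, 500), (50, 1000), (500, 3000)]
commands = [mallet_train_command("holmes.mallet", t, i, 10) for t, i in settings]

for command in commands:
    print(" ".join(command))
```

Writing each run to its own output files makes it easier to put the broad, balanced, and overly specific results side by side afterwards.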

I chose ten topics out of the fifty total because of their overall similarity. I assigned the simplest titles that I could think of to each of them to give a general structure for understanding the Sherlock Holmes stories as a collection. Ten basic concepts that are reflective of the entire collection are easier for the reader to grasp and accept. Each title represents an element of the stories that is imperative to the work as a murder mystery relative to the time it was written. Obviously topics such as death, crime, interrogation, and mystery are all blunt examples of what a mystery story encompasses. Some of the other topics, such as physical reactions and features, are more subtle examples yet serve just as important a role. The stories rely primarily on context clues and other literary devices that create an interesting and challenging mystery to solve. Things such as physical expressions and reactions are important elements of any mystery story because they can explain a lot about an individual character or the way they respond to certain situations. Another topic, clothing descriptions, seems to be part of the style of writing of the collection of Sherlock Holmes stories. Holmes is an icon for mystery investigators and the way that he is dressed is an important part of his appeal. The author pays a lot of attention to the way that Holmes’ dress is described as well as that of other characters throughout the entire series.

Topic modeling provides a unique framework for examining thousands or millions of texts at once. Distant reading is an interesting concept that I will hopefully be able to exercise in future research. The ability to apply your own ideas and lens to any given topic or series of works through topic modeling is something truly valuable that many other classic tools or academic research methods do not allow or facilitate.

Topic Modeling with MALLET: Analyzing the Results

Initially, it was difficult for me to understand the definition and purpose of topic modeling. However, after using MALLET, a topic modeling tool, to find patterns in Sherlock Holmes stories, I began to understand how topic modeling works.

After entering the Sherlock Holmes stories into MALLET, I found 10 good topics. The first 6 topics came from 50 topics, 1000 iterations, and 20 topic words printed. The topic names were Letter Writing, Crime, Marriage, Death, Clues, and Physical Description (Male). The other four topics came from 70 topics, 1500 iterations, and 15 topic words printed. These were Holmes in his Chair, Rooms in a House, London Finance, and Investigation Process. I experimented with other variations of iterations, topics, and topic words printed, but only had time to save these output files onto my computer. By testing out many different variations I found that the more iterations and topic words you have, the easier it is to identify the topic name. After I picked out my 10 topics, I clicked on the topic words within them in order to see the top ranked documents within that topic. MALLET then allowed me to see the number of words in a specific document that were assigned to that topic. I found, for example, that 22 words in a document from The Stock Broker’s Clerk were assigned to the London Finance topic. The words in this topic were: money business work hundred answered good pounds company asked thousand advertisement city price headed pay. The document excerpt that MALLET showed at the top of the page revealed that this part of the story was about a “gigantic robbery” in which “nearly a hundred thousand pounds worth of American railway bonds” were found in the robber’s bag. This explains why 22 of the words within the document were assigned to London Finance. MALLET also showed that only 12% of the words in that entire document were assigned to this topic. I went through this same process with all of my topics to figure out which Sherlock Holmes stories discussed certain topics, and how many words in each story were assigned to those topics.
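The two numbers for London Finance, 22 words and 12%, also say something about how long that document was, and the arithmetic can be sketched as follows. This is a rough illustration, not MALLET's own calculation; the function names are hypothetical, and since the 12% is presumably rounded, the implied length is only approximate.

```python
def topic_share(words_assigned, total_words):
    """Fraction of a document's words assigned to one topic."""
    return words_assigned / total_words

def implied_total_words(words_assigned, share):
    """Approximate document length implied by a word count and its share."""
    return round(words_assigned / share)

# 22 words assigned to London Finance, reported as 12% of the document.
total = implied_total_words(22, 0.12)
print(total)                                  # 183
print(round(topic_share(22, total) * 100))    # 12
```

So the document MALLET displayed was on the order of 180 words, which suggests it was an excerpt of the story rather than the whole text.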

Altogether, I think topic modeling with MALLET is a great way of distant reading. MALLET proved to be efficient: it sifted through mass amounts of text from Sherlock Holmes stories and found patterns within them faster than most of us could even finish reading just one of those stories. There were a few aspects of MALLET, however, that I disliked. First, it creates enormous files. These files take up a lot of space, and this makes the process of transferring them onto Google Drive and onto other computers extremely slow. On top of this, some of the topics it creates are extremely difficult to name because the words don’t seem to have much in common. A lot of the topics also reappeared after I changed the number of iterations, topics, and topic words (ex. London Finance, Death, Holmes in his Chair). I suppose that was inevitable though, because the text being read by MALLET didn’t change.

After completing this project, I understand that topic modeling tools such as MALLET are useful in that they can take texts and then find patterns in the use of words. Topic modeling is most effective when we have many documents/texts that we want to understand without actually closely reading each individual text (distant reading!).

Mary Dellas