It’s difficult for me to narrow down how much I’ve learned this semester about Digital Humanities. I guess I’ll choose good examples from my favorite topics: archives, word clouds, and topic modeling.
With archives like the Old Bailey, data can be easily accessible online. So, so much data, including books, photos of paintings, sound clips, video, et cetera. It’s not just digitally scanned documents anymore. You no longer have to travel to some dark basement or well-established college in England to see the original paper documents: they are scanned for you, ready to be read. Although physicality is still important, digital archives offer the general public ways to access once-hidden and/or difficult-to-study materials. Questions about who gets this access and what gets put up online are certainly an issue, but digital archives allow scholars and non-scholars alike to access things… which is pretty neat.
The Old Bailey Proceedings site is easy to navigate, offering plenty of instructive videos on the search functions. Huge plus, especially given how much info there is. It also provides illustrations pertaining to the Old Bailey’s history, in painted or photographic form. The graphic design choices remain uniform and pertinent to the topics at hand. If you are interested, the site allows you to look at original copies of some proceedings. Also, all the data is cited, which is rule number one.
Word clouds allow creative exploration in their graphic representations of texts. They can clue the reader in to spikes in certain words, suggesting an emphasis on specific themes. Again, they are fairly user-friendly, and sites like Tagxedo are basic enough to master without much prior experience. Context is still necessary (reading the whole text still counts, and word clouds do not alleviate this need), but word clouds can offer new insight.
A “good” word cloud is one that takes readability into consideration. The words must be legible, and font comes into play there, along with how you position the text. It must have colors that are visually pleasing without being too distracting, and it’s a huge bonus if your colors somehow correspond to the subject of your cloud. Lastly, the data should not include “stop words,” the very common words like “the” and “and” that would otherwise crowd out everything else.
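To make the stop-word point concrete, here is a minimal Python sketch of the counting that sits behind a word cloud. It is only an illustration, not how Tagxedo or any particular site does it: the tiny stop-word list, the `word_frequencies` function name, and the sample sentence are all made up for this example, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it", "that"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count the words in text, ignoring case and skipping stop words."""
    # Lowercase everything, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Made-up sample sentence, not a quotation from any archive.
sample = "The prisoner was tried at the Old Bailey, and the prisoner was acquitted."
counts = word_frequencies(sample)

# The biggest counts are the words a cloud would draw largest.
for word, n in counts.most_common(3):
    print(word, n)
```

Without the stop-word filter, “the” would be the largest word in the cloud. With it, “prisoner” comes out on top, which says a lot more about the text.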
Topic modeling was a bit harder for me to grasp at first. It seems to take more time and expertise to find the text and then process it into data with MALLET. However, skipping ahead to the part where the themed words are available for labeling, topic modeling was fun. It was something like a crossword puzzle.
While studying topic modeling, we read Robert Nelson’s “Mining the Dispatch” article. The closing comments were what stuck out to me the most when we did our topic modeling for the Holmes stories. In this section, Nelson accurately described what is so interesting about much of digital humanities. What’s missing or not fully there raises a new set of questions, alongside the questions drawn from what is there.
A topic modeling project should make sense and follow a general theme. If you have a word collection and you choose a random or unrelated title for the word cluster, that’s not too great. Again, the data you use should be cited. Your context should be factual, which is another easy concept that makes a large difference in the finished project. When dealing with graphs, the axes must be labeled, and the graph should show some type of pattern that you illuminate when discussing the results.