Discussing Topic Models with Mary Dellas and Joe Mausler

After discussing the process and results of topic modeling using MALLET, we found that the fewer topics we request, the broader each topic category MALLET gives us, and the more iterations we run, the easier it is to identify a topic name. We recommend the default settings we used in class: 50 topics, 1000 iterations, and 20 topic words. These settings gave us enough topic words to determine a topic name, but not so many that the list became confusing and repetitive.
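For anyone running MALLET from the command line rather than through a graphical tool, the settings above correspond to three options of MALLET's train-topics command. The Python sketch below only assembles that command and prints it. The input file name holmes.mallet and the two output file names are placeholders we made up, and the path bin/mallet assumes the script is run from inside the MALLET install folder.

```python
def build_train_topics_command(input_path,
                               num_topics=50,
                               num_iterations=1000,
                               num_top_words=20):
    """Assemble a MALLET train-topics command as a list of arguments."""
    return [
        "bin/mallet", "train-topics",             # assumed location of the launcher
        "--input", input_path,                    # corpus already imported into MALLET format
        "--num-topics", str(num_topics),          # fewer topics give broader categories
        "--num-iterations", str(num_iterations),  # more iterations give clearer topics
        "--num-top-words", str(num_top_words),    # words listed for each topic
        "--output-topic-keys", "topic_keys.txt",  # placeholder output file name
        "--output-doc-topics", "doc_topics.txt",  # placeholder output file name
    ]


# "holmes.mallet" is a hypothetical file name, not one from our class setup.
command = build_train_topics_command("holmes.mallet")
print(" ".join(command))
```

To actually train the model, the same list could be handed to subprocess.run(command, check=True) on a machine where MALLET is installed.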

These are our three favorite topics:

1. Physical Description (Male): face man eyes looked thin dark features tall expression appearance middle high pale figure set glasses gray keen clean bear

  • a) The top ranked document in the Physical Description (Male) topic is Charles Augustus Milverton. 26 words in the document are assigned to this topic.
  • b) The story The Sussex Vampire uses this topic the least (2 times).
    • Question 1: Even though 26 words in the document are assigned to Physical Description (Male), does this imply that this document is entirely dedicated to the topic Physical Description (Male)?
    • Question 2: Why do some of the words (e.g., set, bear) seem unrelated to the other words in the topic?

2. Letter Writing: paper note read letter table book box letters papers written handed wrote writing sheet brought importance post write document address

  • a) The top ranked document in the Letter Writing topic is The “Gloria Scott”. 18 words in the document are assigned to this topic.
  • b) The story Shoscombe Old Place uses this topic the least (2 times).
    • Question 1: Why does the same story name appear multiple times on the list of the top ranked documents?
    • Question 2: When we click the story chunk, why is MALLET only showing us a small part of the document?

3. Crime: police crime case night evidence murder death account occurred arrest unfortunate effect tragedy violence complete charge appeared reason terrible committed

  • a) The top ranked document in the Crime topic is The Second Stain. 62 words in the document are assigned to this topic.
  • b) The story The Priory School uses this topic the least (2 times).
    • Question 1: Is crime a more common topic in the later Sherlock Holmes stories or the earlier ones?
    • Question 2: Can MALLET tell us how many stories in total discuss crime?
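Several of the questions above come down to simple arithmetic on the word counts, so here is a minimal Python sketch of how we might explore them. It assumes the stories were split into chunks before modeling, which would explain why one story can appear several times in a ranking. The counts 26 and 2 come from our Physical Description (Male) results; the chunk numbers, the 500-word chunk length, and the count of 9 are made-up values for illustration only, and the function names are our own, not part of MALLET.

```python
from collections import defaultdict

# Each row: (story title, chunk number, words assigned to the topic, words in the chunk).
# Only the counts 26 and 2 are from our results; everything else is hypothetical.
chunks = [
    ("Charles Augustus Milverton", 3, 26, 500),
    ("Charles Augustus Milverton", 7, 9, 500),
    ("The Sussex Vampire", 1, 2, 500),
]


def topic_share(assigned_words, total_words):
    """Fraction of a chunk's words that are assigned to the topic."""
    return assigned_words / total_words


def words_per_story(rows):
    """Add up the topic words across all chunks of the same story."""
    totals = defaultdict(int)
    for story, _chunk, assigned, _total in rows:
        totals[story] += assigned
    return dict(totals)


def stories_using_topic(rows, min_words=1):
    """List stories whose combined topic word count reaches min_words."""
    totals = words_per_story(rows)
    return sorted(story for story, count in totals.items() if count >= min_words)


share = topic_share(26, 500)
print(f"Share of the top chunk assigned to the topic: {share:.1%}")
print(words_per_story(chunks))
print(stories_using_topic(chunks, min_words=10))
```

Under these assumed chunk lengths, even the top ranked chunk gives only a small share of its words to the topic, which suggests a high rank does not mean a document is entirely about that topic. The min_words cutoff is a judgment call: deciding how many words it takes before a story "discusses" crime is up to the reader, not MALLET.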