Topic Modeling Analysis

While experimenting with topic modeling, I ran Mallet several times to see how the topics changed with different settings. The first time, I used the same settings we had used as a group for my first blog post, so I could get used to the program. Those settings were 50 topics, 1000 iterations, 20 topic words, and stop words removed. Looking at that first set of 50 topics, I realized I am really bad at coming up with a label or category for each group of words. I tend to go with the first word that comes to mind, which often does not fit the whole category. It was amazing to see how fast the program works. The first run took 48.149 seconds for 2,845 files, which is not a lot of time. I had assumed it would take much longer to sort these works into topics, so it was really interesting to see it work so efficiently.

The second time I used Mallet, I changed the settings to see whether there would be large differences. I changed the number of topics from 50 to 25, the number of iterations from 1000 to 1500, and the number of topic words from 20 to 15, and I kept the stop words removed. The program ran slightly faster this time, finishing in 47.219 seconds, less than a second quicker than the first run. I am really impressed with how quickly a program can process all that data.

Because the program runs so fast, it makes the whole process much more efficient. I do not have to read every one of these works, yet I can still see the common themes and topics among them. It was interesting to see how many times words came up when I clicked on a particular topic, and I also liked that the results were then sorted by how frequently each word was used within the works. Personally, I thought it was a good program, though I can see how someone else might notice more of its flaws. Doing it in class was very useful for me, and it helped me see more themes within the Sherlock Holmes stories.


-Erin S.