Topic Modeling

In Mallet, raising or lowering the number of topics changes the topics the tool gives you, because the same words have to be divided among more or fewer groups. If you set the number too low, the topics come out too broad to identify easily; if you set it too high, they become too detailed and specific. The number of iterations also affects the topics, because it sets how many passes the program makes over the texts while refining its groupings. It is the same type of situation: too few passes can leave the topics muddled, while more passes tend to give more settled topics but take longer to run. We would recommend starting with the settings we used in class: 50 topics, 1000 iterations, 20 topic words, and stop words removed. We believed this was a good way to start off with Mallet and get a baseline before we began experimenting with the program and seeing the different outcomes possible from the many combinations of settings.
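For anyone who wants to repeat these settings outside of a point-and-click tool, here is a rough sketch of how they might map onto Mallet's command line. It only builds and prints the commands; it does not run Mallet. The folder and file names (`texts`, `corpus.mallet`, `topic_keys.txt`) are made up for illustration, and we are assuming the `mallet` launcher is on your path.

```python
# Settings from the post: 50 topics, 1000 iterations, 20 topic words.
SETTINGS = {"num_topics": 50, "num_iterations": 1000, "num_top_words": 20}


def import_command(text_dir="texts", corpus_file="corpus.mallet"):
    """Build the command that loads a folder of texts (names are hypothetical)."""
    return [
        "mallet", "import-dir",
        "--input", text_dir,
        "--output", corpus_file,
        "--keep-sequence",     # keep word order, which topic training expects
        "--remove-stopwords",  # drop very common English words
    ]


def train_command(corpus_file="corpus.mallet", keys_file="topic_keys.txt",
                  settings=SETTINGS):
    """Build the command that trains the topic model with the given settings."""
    return [
        "mallet", "train-topics",
        "--input", corpus_file,
        "--num-topics", str(settings["num_topics"]),
        "--num-iterations", str(settings["num_iterations"]),
        "--num-top-words", str(settings["num_top_words"]),
        "--output-topic-keys", keys_file,  # top words per topic, saved to a file
    ]


if __name__ == "__main__":
    # Print the two commands so they can be copied into a terminal.
    for command in (import_command(), train_command()):
        print(" ".join(command))
```

Changing a value in `SETTINGS` and printing the commands again is an easy way to keep a record of which settings produced which topics.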

We decided to look at three topics that came out similar for both of us and that we labeled almost identically. We thought it was interesting that, even with our different settings, we got similar results and came up with the same labels, with only a few outlier words.

Appearance: face eyes features looked dark tall pale thin expression figure lips glance sprang gray colour manner spoke clean angry handsome

Murder: found man dead body blood left head finally lay drawn knife sign fell round sight blow stick lying clothes thing

Bedroom: room window bed night sitting bedroom bell entered half looked floor heard morning dressing lawn finally remained sleep opened alarm

After we picked our favorite topics, the program really let us down. We could not get back to our results. The list of topics appeared, but clicking on a topic brought up an error page. Going back to the program did not work well because of errors in saving, and it seemed like some others in the class were having the same issue. This made it difficult to research further using the abbreviations list. Looking through the blog posts, it seems that some of us had similar topics even with minor differences in settings, and that is what we discussed together. The program was helpful for organizing the topics, but it became frustrating trying to figure out where we went wrong in using it. If the program had worked for us the way it did for others, we feel it would have been a very useful tool.


By: Erin & Paul