On June 9th, 2021, Kai Analytics had the pleasure of being represented at the Association for the Assessment of Learning in Higher Education (AALHE 2021 Online Conference) by Kevin Chang, who was joined by Toshiko Shibano of UBC to give a talk on Topic Modeling and how it can be used to solve a persistent problem: “Every student I survey says they love the course, so why do they keep dropping it?”
If you’re reading this article, you probably know that student attrition is an issue too big for institutions to ignore. It has been estimated to cost universities and colleges in the US about $37 billion per year, so the incentive to tackle it is high. Many institutions rely on quantitative data to understand attrition, and while this is valuable for establishing benchmarks, it often misses the solutions that students themselves suggest.
This talk focused on Topic Modeling, an area of Natural Language Processing (NLP). Kevin laid out how a Topic Modeling procedure works by walking the group through an NLP pipeline. For those of you who aren’t familiar with it, NLP is a field that combines linguistics, computer science, and machine learning, and here it serves as a method of qualitative data analysis. At its core, it is the process of translating words into a format a computer can understand and process.
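To make “a format a computer can understand” a little more concrete, here is a minimal Python sketch of typical first steps in such a pipeline: lowercase the text, split it into words, drop common stop words, and count what remains (a bag-of-words). This is our own illustrative assumption, not necessarily the exact pipeline Kevin presented; the stop-word list, function names, and sample reviews are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real pipelines use much larger ones).
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "was", "were", "to", "of",
              "in", "it", "i", "my", "but", "this", "that", "with", "for", "too"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a shared vocabulary and a document-term count matrix from raw text."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)  # one column per vocabulary word
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical course reviews.
    reviews = [
        "The peer review was confusing and the mentor never replied.",
        "Great course, but the guest speaker was too fast.",
    ]
    vocab, matrix = bag_of_words(reviews)
    print(vocab)
    for row in matrix:
        print(row)
```

Each row is now a vector of word counts, which is the kind of numeric input a topic model works from.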
Topic Modeling is a subject we’ve covered extensively on our blog, so we’ve linked to another article of ours that explains the NLP pipeline and Topic Modeling in detail. You can read that here.
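The talk doesn’t specify which topic model was used, so as an assumption we illustrate the idea with Latent Dirichlet Allocation (LDA), one common choice, fit by collapsed Gibbs sampling. The minimal, unoptimized sketch below repeatedly reassigns each word token to a topic in proportion to how much its document already uses that topic and how strongly that topic already favours the word. The default values for alpha, beta, and iterations are illustrative guesses, not tuned settings, and the sample documents are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, topic_word, doc_topic), where topic_word[k][w] estimates
    P(word w | topic k) and doc_topic[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: tokens per (document, topic), per (topic, word), and per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[word]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], z[d][i]
                # Take this token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # P(topic t | all other assignments), up to a normalizing constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
                  for k in range(K)]
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=5):
    """List the n most probable words in each topic, which an analyst then labels."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in topic_word]


if __name__ == "__main__":
    # Hypothetical, already-tokenized reviews.
    docs = [["peer", "review", "grading", "unfair"], ["peer", "review", "feedback"],
            ["guest", "speaker", "talk", "fast"], ["guest", "speaker", "video"]]
    vocab, topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
    print(top_words(vocab, topic_word, n=3))
```

An analyst would read each topic’s top words to give it a name (such as “peer review”) and use each response’s highest doc_topic entry as its dominant topic. On real survey data, a maintained implementation such as scikit-learn’s LatentDirichletAllocation or gensim would be the more practical choice.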
After Kevin walked the audience through the standard data pipeline for a Topic Modeling procedure, Toshiko Shibano took over to address the question at hand: how can we learn from course reviews? More specifically, most reviews tend to be positive, yet massive open online courses (MOOCs) still suffer from low retention rates, low completion rates, and declining enrolment. This appears to contradict what the course reviews tell us.
Why is this? The answer could lie in the Pollyanna Hypothesis, which seeks to explain why humans are more likely to remember positive events than negative ones.
There is a universal human tendency to use evaluatively positive words more frequently and diversely than evaluatively negative words in communicating.
Boucher, J., & Osgood, C. E. (1969). The Pollyanna Hypothesis. Journal of Verbal Learning and Verbal Behavior, 8(1), 1-8.
Put simply, we tend to find the positive aspects of an experience when we talk about it. Many students choose to be generous in their course reviews, and the Pollyanna Hypothesis attributes this to the mind focusing on the optimistic at a subconscious level, even though at a conscious level we tend to focus on the negative. So, when students are asked to review a course, they tend to give higher scores on a Likert scale, even when describing a negative experience.
To illustrate this problem, Toshiko used a case study analyzing three courses: Digital Marketing, Financial Markets, and Calculus 1. She looked at three elements:
What are the common topics (which equate to student needs)?
Which area receives the most complaints?
How do we develop action plans around this information?
When Topic Modeling is first performed on all responses, the weight of positive reviews overshadows the problem areas of each course. But when the data is segmented to just the two lowest satisfaction scores, the problems reveal themselves. Here’s what that looks like visually:
Darker colours correspond to a higher concentration of responses in that cell. Viewed on this full scale, all three courses appear to be doing very well, but the course retention rates tell us this is not the case. So, Toshiko recommends segmenting the data to the two lowest scores.
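The segmentation Toshiko recommends boils down to a cross-tabulation of topic against Likert score, followed by a filter on the lowest scores. Here is a minimal Python sketch, assuming each response has already been tagged with a topic (for instance, its dominant topic from the topic model) and a score from 1 to 5. The topic names and responses below are made up for illustration.

```python
from collections import Counter

# Hypothetical (topic, Likert score) pairs, one per survey response.
# Scores are assumed to run from 1 (very dissatisfied) to 5 (very satisfied).
RESPONSES = [
    ("course content", 5), ("course content", 5), ("course content", 4),
    ("course content", 5), ("peer review", 5), ("peer review", 2),
    ("peer review", 1), ("mentor", 4), ("mentor", 1),
    ("guest speaker", 5), ("guest speaker", 2),
]


def topic_score_table(pairs):
    """Count responses in each (topic, score) cell: the numbers behind the heatmap."""
    return Counter(pairs)


def complaints_by_topic(pairs, low_scores=(1, 2)):
    """Keep only the lowest satisfaction scores, then rank topics by response count."""
    counts = Counter(topic for topic, score in pairs if score in low_scores)
    return counts.most_common()


if __name__ == "__main__":
    table = topic_score_table(RESPONSES)
    topics = sorted({topic for topic, _ in RESPONSES})
    print("topic".ljust(16) + " ".join(str(s) for s in range(1, 6)))
    for topic in topics:
        print(topic.ljust(16) + " ".join(str(table[(topic, s)]) for s in range(1, 6)))
    print(complaints_by_topic(RESPONSES))
```

In this toy data the full table is dominated by scores of 4 and 5, while the filtered ranking puts peer review at the top. Counting within the low-score segment, rather than averaging over all scores, is what keeps a handful of complaints from being drowned out.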
At this scale, the weaknesses of each course reveal themselves. A brief glance suggests that this particular course is weak in peer review, mentors, and guest speakers, followed by responses to issues raised. Note that these are all topics identified in the Topic Modeling phase.
To begin developing an action plan, the analyst could sample a couple of responses from each of these topics to find out what, specifically, students didn’t like about peer review, mentors, and guest speakers.
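Pulling a few responses per weak topic is easy to script. The sketch below assumes each response is stored as a hypothetical (topic, score, text) triple and draws up to k low-score comments per topic for an analyst to read. Passing a seed makes the draw repeatable, which can help when several people review the same sample. The function name and sample comments are illustrative, not part of any particular tool.

```python
import random


def sample_complaints(responses, topics, low_scores=(1, 2), k=2, seed=None):
    """Return up to k randomly chosen low-score comments for each requested topic."""
    rng = random.Random(seed)
    samples = {}
    for topic in topics:
        pool = [text for t, score, text in responses
                if t == topic and score in low_scores]
        # Never ask for more comments than the topic actually has.
        samples[topic] = rng.sample(pool, min(k, len(pool)))
    return samples


if __name__ == "__main__":
    # Hypothetical survey responses: (topic, Likert score, free-text comment).
    responses = [
        ("peer review", 1, "Grades from other students felt random."),
        ("peer review", 2, "Nobody explained how to review a classmate's work."),
        ("peer review", 5, "Reviewing others taught me a lot."),
        ("mentor", 1, "My mentor never replied to questions."),
        ("guest speaker", 2, "The guest talks were hard to follow."),
    ]
    picks = sample_complaints(responses, ["peer review", "mentor", "guest speaker"], seed=42)
    for topic, comments in picks.items():
        print(topic, comments)
```

Random sampling, rather than simply reading the first few comments, helps avoid anchoring the action plan on whichever responses happened to arrive first.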
This talk demonstrated how NLP can be applied to real organizational problems to find potential answers in qualitative data quickly and effectively. In this case, the Pollyanna Hypothesis offered a research-based explanation for the question “why are retention rates low when satisfaction scores are high?” By using Topic Modeling, we were able to quickly find common themes across multiple courses from different domains (whether that’s 3 courses, 25 courses, or 300+ courses!). We then looked at the Likert scores to identify areas for course improvement and to get a more nuanced understanding of why students would drop out of a course they rate so highly.
A special thanks to Toshiko for her work in generating the visuals for the presentation.
If you want to see a live demonstration of how this works, feel free to check out our qualitative analytics tool, Unigrams, or reach out to us at hello@kaianalytics.com!