Dmitry Abbakumov: Data from Online Educational Platforms Can Be Useful Both for Students and Teachers
On the one hand, the transition to online learning has brought new opportunities. On the other, it has raised several critical questions that are essential for evidence-based education. The pandemic has only made these issues all the more acute. The problems of online education were discussed at the recent eSTARS conference, which was organized by HSE University together with Coursera. Dmitry Abbakumov, Head of the HSE Centre for Computational Educational Sciences, shared his perspective on the matter.
An X-Ray Image of a Course
The easiest way to describe evidence-based education is to compare it with evidence-based medicine – a paradigm in which all treatment decisions are based on existing evidence of efficacy and safety. Accordingly, evidence-based education is a paradigm in which only educational tools and practices that have been proven effective are used.
Today, we can automatically collect large sets of data about users’ behaviour in educational environments, and this data can be a valuable source for evidence-based education. Online platforms record all student activities, literally every mouse click, and store this data in a database from which it can be retrieved for analysis.
Data per se is hardly useful; the added value comes from its analysis. For example, a set of zeros and ones – students’ correct and incorrect answers to assignments, as recorded by the platform – can be fed into statistical (psychometric) analysis to estimate parameters such as students’ preparedness and academic progress.
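As an illustration of such psychometric analysis (not necessarily the centre’s actual pipeline), here is a minimal sketch that fits a Rasch model to a hypothetical 0/1 response matrix; the data, learning rate, and iteration count are invented for the example.

```python
import numpy as np

# Hypothetical response matrix: rows = students, columns = assignments,
# 1 = correct answer, 0 = incorrect (as logged by the platform).
responses = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
])

n_students, n_items = responses.shape
ability = np.zeros(n_students)    # theta: student preparedness
difficulty = np.zeros(n_items)    # b: assignment difficulty

# Joint maximum-likelihood estimation of the Rasch model:
# P(correct) = 1 / (1 + exp(-(theta - b))).
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
    residual = responses - p                   # observed minus expected
    ability += 0.05 * residual.sum(axis=1)     # gradient ascent on theta
    difficulty -= 0.05 * residual.sum(axis=0)  # gradient ascent on b
    difficulty -= difficulty.mean()            # anchor the scale

print("Estimated student preparedness:", np.round(ability, 2))
print("Estimated item difficulty:    ", np.round(difficulty, 2))
```

The estimated abilities play the role of the “preparedness” parameter mentioned above; in practice a production psychometric model would be fitted with more robust tooling, but the principle is the same.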
Data on student activity while they watch video lectures can point to trouble spots in the lectures. A pattern in which a student pauses the video and rewinds a few seconds back is empirical evidence that this moment in the lecture is unclear. If the pattern is observed in individual students, there is no reason to worry – this is normal. But if a considerable group of students shows this kind of activity, the video lecture needs to be improved.
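A minimal sketch of how such a pause-and-rewind pattern might be detected in a clickstream log; the event names, segment length, and threshold are assumptions for illustration.

```python
from collections import Counter

# Hypothetical clickstream log: (student_id, event, position_in_seconds).
events = [
    ("s1", "pause", 312), ("s1", "seek_back", 305),
    ("s2", "pause", 310), ("s2", "seek_back", 304),
    ("s3", "pause",  95), ("s3", "seek_back",  90),
    ("s4", "pause", 308), ("s4", "seek_back", 303),
]

# Count rewind events per 30-second segment of the video.
rewinds = Counter()
for student, event, position in events:
    if event == "seek_back":
        rewinds[position // 30] += 1

# Segments rewound by many students are candidates for revision.
THRESHOLD = 2  # assumed cut-off; in practice, a share of enrolled students
for segment, count in sorted(rewinds.items()):
    if count >= THRESHOLD:
        start = segment * 30
        print(f"Unclear segment: {start}-{start + 30}s "
              f"({count} students rewound here)")
```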
More and more exams are conducted online, and students make errors. We’ve learned to combine these errors into a single network and detect common patterns: to understand the cause of a particular learning difficulty, and whether it is systemic and requires a solution at the course level (such as additional learning materials).
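One simple way to approximate this idea in code is to group errors shared across students and flag the widespread ones. The sketch below uses invented error tags and an assumed cut-off; the source does not describe the actual network model.

```python
from collections import defaultdict

# Hypothetical exam log: the assignment each student answered incorrectly,
# with a short tag describing the detected type of error.
errors = [
    ("s1", "q3", "sign_error"), ("s2", "q3", "sign_error"),
    ("s3", "q3", "sign_error"), ("s1", "q7", "unit_mixup"),
    ("s4", "q5", "off_by_one"),
]

# Group errors by (assignment, error type) to see which are shared.
pattern_students = defaultdict(set)
for student, item, error_type in errors:
    pattern_students[(item, error_type)].add(student)

# An error made by many students looks systemic and may call for a
# course-level fix, e.g. extra learning materials; isolated errors do not.
for (item, error_type), students in pattern_students.items():
    label = "systemic" if len(students) >= 3 else "isolated"
    print(f"{item}/{error_type}: {len(students)} students -> {label}")
```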
Experience shows that data and its analysis serve as an ‘x-ray’ image of a course: professors find out what they need to improve, while students get better content.
High Stakes of Trust
The issue of trust in data is inevitable. Can we be sure that a correct answer to an online problem shows that a user really knows the topic, rather than having copied it from someone else or guessed it at random? To tackle this problem, we need to base all our diagnostic procedures in education on educational goals rather than on content. Here’s how it is often done instead: to create an exam, the teacher looks at their lecture notes and compiles questions from them – for example, a question about the date of an event the class discussed.
Such an approach is fundamentally wrong. First, we should look at the goals of the topic (or formulate them if this hasn’t been done already): what students should remember as a result of studying it, what they should understand, and how they can apply what they’ve learned. The exam assignments should then test this understanding and application.
The second question concerns trust in the machine learning models used in education. There is growing global concern about the use of ‘black box’ models to support and make high-stakes decisions. In education, however, the problem is still poorly understood, and the debates about the need for interpretable models are yet to come.
In education, unlike many other fields of research and practice, it is critical to understand the reasons behind the current situation. For example, when we forecast road traffic, prediction is what matters – we need to know the time and location of the future traffic jam, and it is much less important how exactly the model that produced the forecast works. In education, it is essential to understand why exactly a student made an error and what kind of difficulty they faced. Forecasting errors and difficulties is important, but explaining their reasons is the priority. We need to understand how the machine learning model works, why it makes a certain forecast, and how this forecast is related to a student’s previous experience and other context. In other words, we need models that are interpretable from the very beginning.
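As one possible illustration (not the author’s method), a model that is interpretable by construction, such as logistic regression, lets us read off how each named aspect of a student’s history shifts the forecast of an error; the features and data below are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features summarizing a student's previous experience in the
# course, and a label: did the student make an error on the next assignment?
feature_names = ["prior_score", "videos_rewound", "attempts_on_last_quiz"]
X = np.array([
    [0.9, 0, 1], [0.4, 3, 4], [0.7, 1, 2],
    [0.3, 5, 3], [0.8, 0, 1], [0.5, 2, 3],
])
y = np.array([0, 1, 0, 1, 0, 1])

# Each coefficient says how a named aspect of the student's history shifts
# the forecast, so the prediction can be explained, not just produced.
model = LogisticRegression().fit(X, y)
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```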
Dmitry Abbakumov