AI-Generated Exam Questions Match the Difficulty of Human-Written Ones, Study Finds
A recent study by researchers at University Hospital Bonn (UKB) suggests that AI can generate exam questions comparable in difficulty to those written by human educators. In the study, OpenAI’s ChatGPT language model was used to generate a set of multiple-choice questions (MCQs) for medical studies.
Two sets of 25 MCQs were created, one by an experienced medical lecturer and the other by ChatGPT. A total of 161 students answered the questions and, for each one, indicated whether they believed it had been written by a human or by AI. The results showed that the difficulty of the human-generated and AI-generated questions was virtually identical, and students were unable to correctly identify the origin of the questions in almost half of the cases.
Matthias Laupichler, a research associate at the Institute of Medical Didactics at the UKB and one of the study’s authors, expressed his astonishment at the findings: “We were surprised that the difficulty of human-generated and ChatGPT-generated questions was virtually identical. Even more surprising for us is that the students could not correctly identify the question’s origin in almost half of the cases.”
The findings suggest that automated generation of exam questions using AI tools such as ChatGPT could become a valuable aid in medical education, both for lecturers drafting exams and for students testing themselves.
Johanna Rother, a co-author of the study and colleague of Laupichler, explained: “Lecturers can use ChatGPT to generate ideas for exam questions, which are then checked and, if necessary, revised by the lecturers. In our opinion, however, students in particular benefit from the automated generation of medical practice questions, as it has long been known that self-testing one’s own knowledge is very beneficial for learning.”
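To make the workflow Rother describes concrete, the following is a minimal sketch of how a lecturer might ask a ChatGPT-style model to draft a single MCQ through the OpenAI API. It assumes the openai Python package (v1 or later) and an OPENAI_API_KEY set in the environment; the prompt wording and model name here are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch: asking a ChatGPT-style model to draft one MCQ.
# Assumes the `openai` Python package (v1+) with OPENAI_API_KEY set
# in the environment. The prompt and model name are illustrative
# choices, not those used in the Bonn study.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

prompt = (
    "Write one multiple-choice exam question for third-year medical "
    "students on cardiac physiology. Give five answer options labeled "
    "A-E, exactly one of which is correct, and state the correct "
    "answer on the final line."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model would work here
    messages=[{"role": "user", "content": prompt}],
)

draft_question = response.choices[0].message.content
print(draft_question)  # a lecturer would review and revise this draft
```

Consistent with the study’s framing, the model output is treated here as a first draft: the final step of checking and revising the question remains with the lecturer.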
Tobias Raupach, Director of the Institute of Medical Didactics, emphasized the significance of the research: “We have now shown for the first time that the software can also be used to write new questions that hardly differ from those of experienced teachers.”
Study participant Tizian Kaiser, a seventh-semester student of human medicine, offered insight into the experience of answering the questions. Kaiser said he was surprised by how difficult it was to tell human-generated and AI-generated questions apart during the mock exam; he admitted he was largely reduced to guessing. This convinced him that a meaningful assessment of knowledge is possible even with exclusively AI-generated questions.
Kaiser also highlighted the benefits of ChatGPT for student learning, particularly for repeated practice. Through AI-generated quizzes, mock exams, and written simulations of oral exams, students can engage with the material in a variety of formats. Because this repetition can be tailored to the exam format, it offers students virtually unlimited opportunities to practice.
The study’s findings support the view that regular testing, even without grading, aids long-term retention of learning content. Because AI makes such tests easy to create, educators can more readily incorporate regular testing into their teaching. However, further research is needed to confirm the findings across different subjects, semesters, and countries, and to explore AI-generated questions beyond the multiple-choice format common in medical studies.
In conclusion, the UKB study shows that AI-generated exam questions can match the difficulty of those written by experienced human teachers. These results have the potential to enhance medical education and improve student learning outcomes, but future studies will need to validate them and explore the broader application of AI across educational contexts.