What They Did
The researchers had medical and humanities experts identify
which of a set of German-language term papers were written by medical students
and which were generated by ChatGPT 3.5, and also rate several aspects of
each paper. Each participant had a week to make their decisions, though they
were instructed not to discuss the project. Participants correctly identified student or
AI authorship of the papers 70% of the time, with no significant difference
between the medical and humanities experts, nor any correlation with participant
traits such as experience in academia, experience with ChatGPT, or knowledge of
the subject matter.
For papers that were correctly identified, the medical
experts rated the student papers as having better language use, logic, and
scientific approach, while the humanities experts rated those same student
papers better only on scientific approach, not on the other traits. Medical
experts also gave student papers better ratings on citation of sources, even
when they incorrectly identified them as AI-generated. On suggesting new
research directions, medical experts rated student papers better when papers
were correctly identified, but rated the AI papers better when they were
incorrectly identified.
In follow-up interviews, participants frequently indicated
that they identified papers as AI-generated because they were redundant,
repetitive, or lacked a sense of coherence. The researchers point out that
although the participants relied heavily on linguistic style to distinguish between
the student and AI-generated papers, it is not clear how effective they would
be if they didn’t already know that one of the papers was AI-generated.
Further Exploration
Difficulty distinguishing between papers written by students
and those generated by AI poses a major challenge to education, and advances in
AI will likely make it even harder. AI-generated books have appeared for sale
on Amazon, including a mushroom-foraging guide whose inaccurate information
could pose a real danger (see https://www.theguardian.com/technology/2023/sep/01/mushroom-pickers-urged-to-avoid-foraging-books-on-amazon-that-appear-to-be-written-by-ai).
At the same time, students may be falsely accused of cheating with AI and have
trouble proving otherwise (see https://odsc.medium.com/ai-detectors-wrongly-accuse-students-of-cheating-sparking-controversy-7afb2ea7edc8).
The prompts the researchers used for generating papers with
ChatGPT simply told the program to write sections of the paper one at a time, with
citations (see the sketch below for what that might look like). I imagine that a
tech-savvy student could edit such a paper to sound more human without having to
know much about the topic. At the same time, using natural language to prompt an
AI can make it a more effective word-finding tool than a traditional thesaurus,
and students can also use AI to help organize and structure their own ideas.
These uses of AI seem akin to the use of a calculator in a math class.
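To make the section-by-section approach concrete, here is a minimal sketch of how that kind of prompting might look against the OpenAI chat API. The topic, section list, prompt wording, and model choice are my own assumptions for illustration; the paper doesn't publish the researchers' exact prompts in this form.

```python
# Hypothetical sketch of section-by-section paper generation.
# The topic, section list, prompt wording, and model are assumptions
# for illustration -- not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TOPIC = "the role of gut microbiota in type 2 diabetes"  # placeholder topic
SECTIONS = ["Introduction", "Methods", "Results", "Discussion"]

paper_parts = []
for section in SECTIONS:
    # Ask for one section at a time, with citations, as the study describes.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Write the {section} section of a term paper on "
                           f"{TOPIC}. Include citations.",
            },
        ],
    )
    paper_parts.append(f"{section}\n{response.choices[0].message.content}")

print("\n\n".join(paper_parts))
```

In practice a student would probably thread earlier sections back into the messages list so each new section stays consistent with what came before, which only underscores how little topic knowledge the workflow demands.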
More complex is the question of what constitutes fair use of written material in building an AI. On one hand, the major AIs have been built using copyrighted material without permission (see https://authorsguild.org/advocacy/artificial-intelligence/faq/). On the other hand, my own human “brain soup” is full of fragments of books I’ve read and ideas I’ve absorbed without the ability to truly credit all of the sources. Figuring out what’s ethical in this sense is a huge question, but that’s a rabbit hole for another day!
Image credit: Salino01, https://commons.wikimedia.org/wiki/File:Literatursuche_mit_KI_(3).jpg