When AI Is Fooled: Hidden Risks in LLM-Assisted Grading / Milani, Alfredo; Franzoni, Valentina; Florindi, Emanuele; Omarbekova, Assel; Bekmanova, Gulmira; Yergesh, Banu. - In: EDUCATION SCIENCES. - ISSN 2227-7102. - 15:11(2025), pp. 1-18. [10.3390/educsci15111419]
When AI Is Fooled: Hidden Risks in LLM-Assisted Grading
Florindi, Emanuele
2025
Abstract
This study investigates how targeted attacks can compromise the reliability of large language models (LLMs) and their applications in educational assessment, highlighting security vulnerabilities that are frequently underestimated in current AI-supported learning environments. Educators are increasingly integrating LLMs and other AI tools into grading, feedback, and the broader evaluation workflow, attracted by their potential to improve efficiency and scalability. However, this rapid adoption also introduces new risks. One largely unexplored threat is prompt injection, in which a student acting as an attacker embeds malicious instructions within an apparently ordinary assignment submission to influence the model's behaviour and obtain a more favourable evaluation. To the best of our knowledge, this is the first systematic comparative study to investigate the vulnerability of popular LLMs to such attacks in a real-world educational context. We analyse a representative scenario of prompt injection in exam assessment, showing how easily such manipulations can bypass the teacher's oversight and distort results, thereby undermining the entire evaluation process. By modelling the structure and behavioural patterns of LLMs under attack, we aim to clarify the underlying mechanisms and expose the limitations of LLMs when used in educational settings.
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal licence, while publication files are released under the Attribution 4.0 International licence (CC BY 4.0), unless otherwise stated.