Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling / Cappelletti, Silvia; Poppi, Tobia; Poppi, Samuele; Yong, Zheng-Xin; Garcia-Olano, Diego; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - (2026). (International Conference on Pattern Recognition, Lyon, France, August 17-22, 2026).
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling
Silvia Cappelletti; Tobia Poppi; Samuele Poppi; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara
2026
Abstract
Large Language Models (LLMs) are traditionally evaluated on multiple-choice question answering (MCQA) tasks using First-Token Probability (FTP), which selects the answer option whose initial token has the highest likelihood. While efficient, FTP can be fragile: models may assign high probability to unrelated tokens (misalignment) or use a valid token merely as part of a generic preamble rather than as a clear answer choice (misinterpretation), undermining the reliability of symbolic evaluation. We propose a simple solution: output prefilling, a structured natural-language prefix (e.g., 'The correct option is:') prepended to the model output. Originally explored in AI safety as an attack strategy, we repurpose prefilling to steer the model to respond with a clean, valid option, without modifying its parameters. Through extensive evaluation, we find that the FTP with prefilling strategy substantially improves accuracy, calibration, and output consistency across a broad set of LLMs and MCQA benchmarks. It outperforms standard FTP and often matches the performance of open-ended generation approaches that require full decoding and external classifiers, while being significantly more efficient. Our analysis suggests that prefilling is a simple, robust, and zero-cost method to enhance the reliability of FTP-based evaluation in multiple-choice settings.

| File | Type | Size | Format | |
|---|---|---|---|---|
| 2026_ICPR_Prefilling.pdf (Open access) | AAM - Author's revised version accepted for publication | 813.91 kB | Adobe PDF | View/Open |
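The FTP-with-prefilling strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `first_token_logprobs` interface and the toy stand-in model are assumptions, standing in for any LLM API that can return log-probabilities for the first generated token after a prompt.

```python
# Hedged sketch of First-Token Probability (FTP) scoring with output
# prefilling. The prefill string steers the model so that its next token
# should be a bare option letter, which is then scored directly.
import math

OPTIONS = ["A", "B", "C", "D"]
PREFILL = "The correct option is:"  # structured prefix prepended to the output


def ftp_answer(question: str, choices: dict, first_token_logprobs) -> str:
    """Pick the option whose first output token gets the highest probability."""
    # Build the MCQA prompt and append the prefill so the model's next
    # token is expected to be one of the valid option letters.
    prompt = question + "\n"
    prompt += "\n".join(f"{k}. {v}" for k, v in choices.items())
    prompt += "\n" + PREFILL + " "
    logprobs = first_token_logprobs(prompt)  # maps token -> log-probability
    # Restrict scoring to valid option tokens and take the argmax.
    return max(OPTIONS, key=lambda o: logprobs.get(o, -math.inf))


# Toy stand-in "model" (an assumption for illustration): it returns fixed
# first-token log-probabilities favouring "B".
def toy_logprobs(prompt: str) -> dict:
    return {"A": -2.3, "B": -0.4, "C": -3.1, "D": -2.9, "The": -1.5}


choice = ftp_answer(
    "Which planet is largest?",
    {"A": "Mars", "B": "Jupiter", "C": "Venus", "D": "Mercury"},
    toy_logprobs,
)
print(choice)  # → B
```

With a real model, `first_token_logprobs` would be backed by a single forward pass over the prefilled prompt, which is what makes FTP evaluation cheap relative to full open-ended decoding plus an external answer classifier.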