Beyond Accuracy: Evaluating LLMs for Validating Community Service Provider Directory
ORCID
Saei: https://orcid.org/0000-0001-8125-435X; Anreddy: https://orcid.org/0000-0003-3362-1332
MSU Affiliation
James Worth Bagley College of Engineering; Department of Industrial and Systems Engineering
Creation Date
2026-06-01
Abstract
As artificial intelligence tools are increasingly adopted to validate community service provider directories, it is critical to assess whether large language models (LLMs) can reliably verify structured data in these systems. This study evaluates five LLMs (LLaMA 3.3 70B Versatile, LLaMA 3.1 8B Instant, LLaMA 3 70B 8192, LLaMA 3 8B 8192, and Gemma2 9B IT) using community service provider data from Mississippi across three evaluation conditions: clean records (baseline), systematically corrupted entries, and records with missing fields. Model responses were categorized as “Verified,” “Not Verified,” or “Needs Checking” to assess each model’s ability to confirm correct data, reject erroneous records, and handle uncertainty, respectively. Among the models tested, LLaMA 3.3 70B Versatile demonstrated the most robust overall performance, achieving high verification accuracy on clean data (96%) and the strongest error detection capabilities by rejecting 47% of corrupted entries. In contrast, LLaMA 3 8B 8192 incorrectly verified 79% of corrupted records, indicating unsafe over-permissiveness and weak anomaly detection. These results underscore that high verification accuracy alone is insufficient; effective referral system design must prioritize models that exhibit strong error detection capabilities and appropriately defer uncertain cases to human oversight.
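The per-condition percentages reported in the abstract can be read as label shares within each evaluation condition. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the function name `label_rates`, the condition keys, and the toy responses are all hypothetical, and the numbers it produces are illustrative only.

```python
from collections import Counter

# The three response categories named in the abstract.
LABELS = ("Verified", "Not Verified", "Needs Checking")


def label_rates(responses):
    """Return the share of each label among a list of model responses.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not responses:
        raise ValueError("responses must be non-empty")
    counts = Counter(responses)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(responses)
    return {label: counts[label] / total for label in LABELS}


# Toy, made-up responses for one model (NOT the study's data).
toy_responses = {
    "clean": ["Verified", "Verified", "Verified", "Needs Checking"],
    "corrupted": ["Verified", "Not Verified", "Not Verified", "Needs Checking"],
    "missing": ["Needs Checking", "Needs Checking", "Verified", "Not Verified"],
}

rates = {cond: label_rates(resp) for cond, resp in toy_responses.items()}

# On clean records, the "Verified" share is the verification accuracy.
clean_accuracy = rates["clean"]["Verified"]
# On corrupted records, "Not Verified" is correct error detection, while
# "Verified" is the unsafe outcome the abstract calls over-permissiveness.
error_detection_rate = rates["corrupted"]["Not Verified"]
false_verification_rate = rates["corrupted"]["Verified"]

for cond, shares in rates.items():
    print(cond, {label: f"{share:.0%}" for label, share in shares.items()})
```

Keeping the three conditions separate, rather than pooling them into one accuracy figure, is what lets a model that verifies nearly everything be distinguished from one that actually detects corrupted entries.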
Publication Date
10-13-2025
Publication Title
Software and Data Engineering: 34th International Conference, SEDE 2025, New Orleans, LA, USA, October 20-21, 2025, Proceedings
Publisher
Springer
First Page
373
Last Page
380
Rights
© 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG
Recommended Citation
Saei, S., Ghimire, S., Anreddy, S. (2026). Beyond Accuracy: Evaluating LLMs for Validating Community Service Provider Directory. In: Rahimi, N., Margapuri, V., Golilarz, N.A. (eds) Software and Data Engineering. SEDE 2025. Communications in Computer and Information Science, vol 2720. Springer, Cham. https://doi.org/10.1007/978-3-032-08649-5_23