Comparative Empirical Evaluation of Hallucination Mitigation Strategies in LLM-Based Text Generation

Authors

  • Shuyang Xu, Master of Professional Studies, Applied Statistics, Cornell University, Ithaca, NY, USA
  • Minhao Li, Master of Science in Computer Engineering, University of California, Davis, Davis, CA, USA
  • Fanyi Zhao, Computer Science, Stevens Institute of Technology, Hoboken, NJ, USA

Keywords:

large language models, hallucination mitigation, retrieval-augmented generation, factuality evaluation

Abstract

Large language models (LLMs) have achieved remarkable performance across natural language tasks, yet their tendency to generate factually incorrect content (commonly termed hallucination) remains a critical barrier to deployment in high-stakes domains. Two dominant families of mitigation strategies have emerged: retrieval-augmented generation (RAG) approaches that ground outputs in external knowledge, and prompting-based approaches that leverage self-verification without external retrieval. While both families have demonstrated promising results individually, no systematic comparative evaluation exists across standardized benchmarks under unified conditions. This paper presents a comparative empirical analysis of hallucination mitigation strategies spanning four RAG variants (Naive RAG, Self-RAG, Corrective RAG, FLARE) and three prompting-based methods (Chain-of-Verification, self-consistency decoding, self-contradiction detection), evaluated on five public benchmarks: TruthfulQA, HaluEval, FActScore, FELM, and RAGBench. Drawing exclusively from published experimental results, the analysis shows that advanced RAG strategies achieve improvements of 10 to 25 percentage points in factual precision over naive baselines, while prompting-based methods offer competitive performance on reasoning-intensive tasks without retrieval infrastructure. Task-dependent performance patterns emerge: knowledge-intensive factoid tasks favor retrieval augmentation, whereas logical consistency tasks benefit from self-verification prompting. A practical decision matrix is derived to guide practitioners in selecting appropriate strategies based on task characteristics and resource constraints.

Published

2026-05-06