Abstract
Introduction: The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches. Methods: We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations. Results: Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments. Discussion: These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, providing robust query parsing for real-world search systems.
| Original language | English |
|---|---|
| Article number | 1611389 |
| Journal | Frontiers in Big Data |
| Volume | 8 |
| DOIs | |
| Publication status | Published - 2025 |
Bibliographic note
Publisher Copyright: Copyright © 2025 Baysan, Uysal, İşlek, Çığ Karaman and Güngör.