
LLM-as-a-Judge: automated evaluation of search query parsing using large language models

Mehmet Selman Baysan*, Serkan Uysal, İrem İşlek, Çağla Çığ Karaman, Tunga Güngör

*Corresponding author for this work

Bogazici University

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Introduction: The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches. Methods: We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations. Results: Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments. Discussion: These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, providing robust query parsing for real-world search systems.
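
The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it mentions: routing an item to a context-specific evaluation prompt, and measuring agreement between judge verdicts and human labels in the Pass/Fail setting. Every name here (`ParsedQuery`, `route_prompt`, `agreement_rate`, the `category` routing key, the prompt texts) is a hypothetical illustration, not the authors' code, and no LLM call is made.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ParsedQuery:
    """A search query plus the structured parse to be judged (assumed schema)."""
    query: str
    parse: dict


def route_prompt(item: ParsedQuery, prompts: dict[str, str], default: str = "generic") -> str:
    """Select an evaluation prompt for an item.

    Assumption: the parse carries a 'category' field that serves as the
    routing key. Unknown or missing categories fall back to the default prompt.
    """
    category = item.parse.get("category", default)
    return prompts.get(category, prompts[default])


def agreement_rate(judge_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of items where the judge's Pass/Fail verdict equals the human's."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not judge_labels:
        raise ValueError("need at least one labelled item")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Hypothetical prompt table; a real system would hold full judge instructions.
EXAMPLE_PROMPTS = {
    "generic": "Judge whether the parse matches the query.",
    "vehicle": "Judge the parse, checking brand, model, and year fields.",
}
```

In a sketch like this, the verdicts passed to `agreement_rate` would come from an LLM judge prompted with the text returned by `route_prompt`. Raw agreement is the simplest comparison; a chance-corrected statistic could be substituted without changing the structure.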

Original language: English
Article number: 1611389
Journal: Frontiers in Big Data
Volume: 8
Publication status: Published - 2025

Bibliographical note

Publisher Copyright:
Copyright © 2025 Baysan, Uysal, İşlek, Çığ Karaman and Güngör.
