
Evaluating Biases in Large Language Models Over Time: A Framework With a GPT Case Study on Political Bias

  • Meltem Aksoy
  • Erik Weber
  • Jérôme Rutinowski*
  • Niklas Jost
  • Markus Pauly

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Large Language Models (LLMs) have repeatedly been shown to reflect systematic biases. At the same time, commercial LLMs are updated rapidly, often without notice to end-users, so that a bias profile captured today may already be outdated tomorrow. However, the literature still leans heavily on one-shot evaluations of single model versions, leaving a gap in our understanding of how biases evolve over time and how they should be monitored. We address this gap by introducing a framework for longitudinal evaluation of biases in LLMs, focusing on political bias as a case study. The framework is model-agnostic, reproducible, and user-friendly. It consists of (i) locking model versions via dated identifiers to guarantee temporal comparability, (ii) administering multi-prompt questionnaires on position statements to analyze potential biases, and (iii) applying a longitudinal statistical evaluation that quantifies, and draws inferences about, absolute bias and drift between models. Moreover, we suggest conducting (iv) cross-questionnaire correlation analyses to reveal orthogonal biases, as well as (v) sensitivity analyses on the model's role-assignment mechanisms to analyze robustness to concrete instructions. All code, prompts, and outputs are openly available to facilitate replication and extension to other bias analyses. To illustrate the framework, we investigate the political biases and personality traits of ChatGPT, specifically comparing GPT-3.5, GPT-4, GPT-4o, and GPT-5.2. In addition, the ability of the models to emulate political viewpoints (e.g., liberal or conservative positions) is analyzed. Across 4000 generated answers, we observe clear political shifts between versions: while newer models appear less left-leaning, they still mimic progressive personality profiles and exhibit biases. These findings demonstrate the persistence and transformation of biases across updates, underlining the need for longitudinal monitoring.
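As a rough illustration of step (iii), the sketch below compares questionnaire scores from two pinned model versions with a two-sided permutation test on the difference in means. This is not the authors' published procedure: the function name `permutation_drift_test`, the identifiers `model-v1-dated` and `model-v2-dated`, the score scale, and the choice of a permutation test are all assumptions made for illustration, and the numbers are placeholders rather than results from the study.

```python
import random
from statistics import mean


def permutation_drift_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean bias score.

    scores_a, scores_b: per-run questionnaire scores for two model versions
    (hypothetical scale: negative = left-leaning, positive = right-leaning).
    Returns (observed mean difference b - a, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = mean(scores_b) - mean(scores_a)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis of no drift, version labels are exchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Placeholder scores keyed by hypothetical dated model identifiers.
    runs = {
        "model-v1-dated": [-2.0, -1.5, -1.0, -2.5],
        "model-v2-dated": [-0.5, 0.0, -1.0, 0.5],
    }
    drift, p = permutation_drift_test(runs["model-v1-dated"], runs["model-v2-dated"])
    print(f"drift in mean score: {drift:+.2f}, permutation p-value: {p:.4f}")
```

Keying the results by a dated identifier rather than a floating alias is what makes a later rerun comparable to an earlier one; the test itself makes no distributional assumption beyond exchangeability of runs under the null hypothesis.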

Original language: English
Article number: e70078
Journal: Applied Stochastic Models in Business and Industry
Volume: 42
Issue number: 2
DOIs
Publication status: Published - 1 Mar 2026
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2026 The Author(s). Applied Stochastic Models in Business and Industry published by John Wiley & Sons Ltd.
