Efficient Semantic Retrieval via Multilingual Embeddings and Reranking

  • Rabia Eda Yilmaz
  • Mehmet Anil Taysi
  • Ayse Irem Ozmen
  • Gokhan Ince

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Retrieving semantically relevant information from large-scale multilingual corpora remains a fundamental challenge in information retrieval. While dense retrieval with multilingual sentence encoders enables scalable matching, it often lacks the granularity required for accurate top-k ranking, particularly in domain-specific or noisy query contexts. This paper presents a compact, cascaded retrieval framework that integrates dense multilingual encoders with a lightweight reranking module, all operating under a sub-7B parameter constraint. Evaluations are conducted on two representative English subsets of the BEIR benchmark, FIQA (financial-domain, long-form queries) and Quora (short, general-domain paraphrases). Results demonstrate that reranking provides substantial improvements for mid-tier models in challenging settings (e.g., +13% MAP@10 for e5-base on FIQA), and modest but consistent gains even in cleaner retrieval scenarios. Through ablation on reranking components, we show their critical role in refining semantic alignment, particularly for models with weaker initial rankings. Overall, the proposed framework achieves robust, high-accuracy retrieval with minimal computational overhead, offering a practical solution for scalable deployment in real-world, resource-constrained environments.
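The cascade described in the abstract (dense first-stage retrieval followed by reranking of the top candidates, scored with MAP@10) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-dimensional vectors stand in for encoder embeddings, `token_overlap` is a hypothetical stand-in for a learned reranker, and average precision is normalised by min(number of relevant documents, k), which is one common convention among several.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dense_retrieve(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Stage 1: rank every document by embedding similarity, keep the top k."""
    ranked = sorted(doc_vecs,
                    key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
                    reverse=True)
    return ranked[:k]


def rerank(query: str,
           candidates: List[str],
           texts: Dict[str, str],
           scorer: Callable[[str, str], float]) -> List[str]:
    """Stage 2: reorder only the stage-1 candidates with a costlier scorer."""
    return sorted(candidates,
                  key=lambda doc_id: scorer(query, texts[doc_id]),
                  reverse=True)


def average_precision_at_k(ranking: List[str],
                           relevant: Set[str],
                           k: int = 10) -> float:
    """AP@k, normalised by min(|relevant|, k) (an assumed convention)."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def token_overlap(query: str, text: str) -> float:
    """Hypothetical toy reranker: fraction of query tokens found in the text."""
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


# Toy corpus; the vectors are made up, not produced by any real encoder.
texts = {
    "d1": "how to open a savings account",
    "d2": "tax rules for stock dividends",
    "d3": "best pasta recipes",
}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.8, 0.3], "d3": [0.0, 1.0]}
query, query_vec = "stock dividends tax", [1.0, 0.1]
relevant = {"d2"}

first_stage = dense_retrieve(query_vec, doc_vecs, k=2)
reranked = rerank(query, first_stage, texts, token_overlap)

print(first_stage, average_precision_at_k(first_stage, relevant))  # ['d1', 'd2'] 0.5
print(reranked, average_precision_at_k(reranked, relevant))        # ['d2', 'd1'] 1.0
```

MAP@10 is the mean of this per-query value over all queries. In this toy case the reranker moves the single relevant document from rank 2 to rank 1, which is the kind of correction the abstract attributes to reranking when the initial ranking is weak. Because the second stage only scores the k candidates that survive the first stage, its cost stays bounded regardless of corpus size.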

Original language: English
Title of host publication: 19th International Conference on Innovations in Intelligent Systems and Applications, INISTA 2025 - Proceedings
Editors: Schahram Dustdar, Tulay Yildirim, Mahmoud Barhamgi, Elio Masciari, Yannis Manolopoulos
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331570248
DOIs
Publication status: Published - 2025
Event: 19th International Conference on Innovations in Intelligent Systems and Applications, INISTA 2025 - Ras Al Khaimah, United Arab Emirates
Duration: 29 Oct 2025 to 31 Oct 2025

Publication series

Name: 19th International Conference on Innovations in Intelligent Systems and Applications, INISTA 2025 - Proceedings

Conference

Conference: 19th International Conference on Innovations in Intelligent Systems and Applications, INISTA 2025
Country/Territory: United Arab Emirates
City: Ras Al Khaimah
Period: 29/10/25 to 31/10/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Dense Representations
  • Information Retrieval
  • Multilingual Embeddings
  • Reranking
  • Semantic Retrieval
