BaMCo: Balanced Multimodal Contrastive Learning for Knowledge-Driven Medical VQA

Ziya Ata Yazici*, Hazim Kemal Ekenel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Medical Visual Question Answering (Medical VQA) enables large language models (LLMs) to answer questions about clinical images. Domain-specific LLMs are capable of strong reasoning, but their development can be costly. General-purpose models are more efficient but often lack deep domain understanding. Previous research has shown that integrating external knowledge improves the performance of general-purpose LLMs, particularly on questions that involve complex medical terminology. To make better use of external knowledge, we introduce a novel multimodal knowledge space pretraining method trained with the proposed Balanced Multimodal Contrastive Learning Loss. Our approach optimizes knowledge spaces through balanced contrastive learning across modalities, together with an auxiliary classification task. Additionally, we develop a novel framework that improves knowledge-driven Medical VQA for LLMs by integrating the pretrained knowledge space. Experiments on the Slake, VQA-RAD, and PathVQA datasets show that our approach outperforms state-of-the-art Medical VQA methods, with average accuracies of 85.8%, 76.7%, and 60.0%, respectively. The source code is available at https://github.com/yaziciz/BaMCo.
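The abstract names the loss but does not define it. As a rough illustration only, the Python sketch below combines a generic CLIP-style symmetric InfoNCE contrastive term between paired image and text embeddings with an auxiliary cross-entropy classification term. The "balancing" shown here (a hypothetical weight `alpha` between the image-to-text and text-to-image directions) and the names `info_nce`, `cross_entropy`, `balanced_contrastive_loss`, `temperature`, and `aux_weight` are assumptions made for illustration. This is not the BaMCo loss, whose exact formulation is given in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _normalize(v: Vector) -> List[float]:
    # L2-normalize a vector (assumes it is non-zero).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector], temperature: float) -> float:
    # anchors[i] and positives[i] form a matched pair; every other
    # positives[j] in the batch acts as a negative for anchors[i].
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        loss += _logsumexp(logits) - logits[i]  # -log softmax of the true pair
    return loss / len(a)


def cross_entropy(logits_batch: Sequence[Vector], labels: Sequence[int]) -> float:
    # Mean softmax cross-entropy for the auxiliary classification head.
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total += _logsumexp(logits) - logits[y]
    return total / len(labels)


def balanced_contrastive_loss(
    image_emb: Sequence[Vector],
    text_emb: Sequence[Vector],
    class_logits: Sequence[Vector],
    labels: Sequence[int],
    temperature: float = 0.07,
    alpha: float = 0.5,      # hypothetical weight between the two directions
    aux_weight: float = 1.0,  # hypothetical weight of the auxiliary task
) -> float:
    # Contrastive term in both directions, weighted by alpha.
    l_i2t = info_nce(image_emb, text_emb, temperature)
    l_t2i = info_nce(text_emb, image_emb, temperature)
    contrastive = alpha * l_i2t + (1.0 - alpha) * l_t2i
    # Auxiliary classification term added on top.
    return contrastive + aux_weight * cross_entropy(class_logits, labels)
```

With correctly matched pairs the contrastive term approaches zero, so an uninformative classifier leaves a total close to log 2 for two classes; swapping the pairs instead makes the contrastive term large.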

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, Proceedings
Editors: James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 77-87
Number of pages: 11
ISBN (Print): 9783032049803
Publication status: Published, 2026
Event: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025, Daejeon, Korea, Republic of
Duration: 23 Sept 2025 to 27 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15966 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Country/Territory: Korea, Republic of
City: Daejeon
Period: 23/09/25 to 27/09/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

Keywords

  • Knowledge Space
  • Medical VQA
  • Multimodal LLMs
