Abstract
Medical Visual Question Answering (Medical VQA) enables large language models (LLMs) to answer questions about clinical images. While domain-specific LLMs are capable of strong reasoning, they can be costly to develop. General-purpose models, in contrast, are more efficient but often lack deep medical understanding. Previous research has shown that integrating external knowledge improves the performance of general-purpose LLMs, particularly on questions that involve complex medical terminology. To make better use of external knowledge, we introduce a novel multimodal knowledge space pretraining method trained with the proposed Balanced Multimodal Contrastive Learning Loss. Our approach optimizes the knowledge space through balanced contrastive learning across modalities, together with an auxiliary classification task. Additionally, we develop a novel framework that integrates the pretrained knowledge space to improve knowledge-driven Medical VQA for LLMs. Experiments on the Slake, VQA-RAD, and PathVQA datasets show that our approach outperforms state-of-the-art Medical VQA methods, achieving average accuracies of 85.8%, 76.7%, and 60.0%, respectively. The source code is available at https://github.com/yaziciz/BaMCo.
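The abstract does not define the Balanced Multimodal Contrastive Learning Loss, so the following is only an orientation sketch of the general family it belongs to, not the BaMCo loss (the linked repository holds the actual implementation). It assumes a generic CLIP-style symmetric InfoNCE objective between paired image and text embeddings plus an auxiliary cross-entropy classification term. The function names, the direction weight `alpha`, the auxiliary weight `lam`, and the temperature default are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _log_softmax(row: Vector) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries: Sequence[Vector], keys: Sequence[Vector], temperature: float) -> float:
    # Row i of `queries` is paired with row i of `keys`; all other keys act as negatives.
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        total -= _log_softmax(logits)[i]
    return total / len(q)


def cross_entropy(logits: Sequence[Vector], labels: Sequence[int]) -> float:
    # Mean negative log-likelihood of the true class (the auxiliary classification term).
    return -sum(_log_softmax(row)[y] for row, y in zip(logits, labels)) / len(labels)


def contrastive_plus_aux_loss(
    image_emb: Sequence[Vector],
    text_emb: Sequence[Vector],
    class_logits: Sequence[Vector],
    labels: Sequence[int],
    temperature: float = 0.07,
    alpha: float = 0.5,
    lam: float = 1.0,
) -> float:
    # HYPOTHETICAL weighting, not taken from the paper: alpha balances the
    # image->text and text->image directions; lam scales the auxiliary loss.
    l_i2t = info_nce(image_emb, text_emb, temperature)
    l_t2i = info_nce(text_emb, image_emb, temperature)
    return alpha * l_i2t + (1.0 - alpha) * l_t2i + lam * cross_entropy(class_logits, labels)
```

With two orthogonal image embeddings paired to identical text embeddings the contrastive terms approach zero, while swapping the text rows makes each positive pair the least similar one and the loss rises sharply. Any modality-balancing scheme would change how the per-direction or per-modality terms are weighted, which this sketch reduces to the single assumed scalar `alpha`.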
| Original language | English |
|---|---|
| Title of host publication | Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, Proceedings |
| Editors | James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 77-87 |
| Number of pages | 11 |
| ISBN (Print) | 9783032049803 |
| DOIs | |
| Publication status | Published - 2026 |
| Event | 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of; Duration: 23 Sept 2025 → 27 Sept 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15966 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Daejeon |
| Period | 23/09/25 → 27/09/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
Keywords
- Knowledge Space
- Medical VQA
- Multimodal LLMs