Abstract
Deep learning architectures such as Transformers and Convolutional Neural Networks (CNNs) have driven ground-breaking advances across numerous fields. However, their large parameter counts pose challenges for deployment in resource-constrained environments. We propose a strategy that exploits the column and row spaces of weight matrices to significantly reduce the number of model parameters without substantially affecting performance. Applied to both Bottleneck and Attention layers, the technique achieves a notable reduction in parameters with minimal impact on model efficacy. Our proposed model, HaLViT, exemplifies a parameter-efficient Vision Transformer. Through rigorous experiments on the ImageNet and COCO datasets, we validate the effectiveness of our method, obtaining results comparable to those of conventional models.
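The abstract describes the technique only at a high level. As an illustrative point of reference (not the authors' actual HaLViT construction), a common way to exploit the column and row spaces of a weight matrix is to parameterize it as a low-rank product of a column-space factor and a row-space factor. The minimal PyTorch sketch below assumes a hypothetical `LowRankLinear` module and an arbitrarily chosen `rank` hyperparameter.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a linear layer whose weight is constrained to a
# low-rank product, so it is parameterized by a small column-space basis
# and a small row-space basis instead of a full (out x in) matrix.
# The module name and the rank value are illustrative, not from the paper.
class LowRankLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Column-space factor: (out_features x rank)
        self.col = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        # Row-space factor: (rank x in_features)
        self.row = nn.Parameter(torch.randn(rank, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (col @ row).T, computed without ever
        # materializing the full (out_features x in_features) weight.
        return (x @ self.row.t()) @ self.col.t()

# Parameter count drops from out*in to rank*(out + in); e.g. a
# 768 -> 768 projection at rank 64 shrinks from 589,824 to 98,304.
layer = LowRankLinear(768, 768, rank=64)
print(sum(p.numel() for p in layer.parameters()))  # 98304
```

Such a factored projection could stand in for the dense weight of an attention or bottleneck layer; whether HaLViT uses this exact parameterization is not stated in the abstract.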
Original language | English
---|---
Title of host publication | Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Publisher | IEEE Computer Society
Pages | 3669-3678
Number of pages | 10
ISBN (Electronic) | 9798350365474
Publication status | Published - 2024
Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 - Seattle, United States
Duration | 16 Jun 2024 → 22 Jun 2024
Publication series
Name | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
---|---
ISSN (Print) | 2160-7508
ISSN (Electronic) | 2160-7516
Conference
Conference | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
---|---
Country/Territory | United States
City | Seattle
Period | 16/06/24 → 22/06/24
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- CNN
- Efficient
- Parameter
- Transformer
- ViT
- Vision Transformer
- computer vision
- deep learning