Abstract
Modern GPUs rely on atomic operations to perform global communication. These atomic operations can be used to construct fine-grained locks that provide mutual exclusion. However, relying only on these basic synchronization primitives for mutual exclusion results in inefficient use of resources. In this paper, we propose a new hardware-based blocking synchronization mechanism that uses hierarchical queuing for scalability and efficiency. We evaluate our design using a set of GPU applications that stress synchronization mechanisms. We perform detailed simulation using the Multi2Sim heterogeneous simulation infrastructure. Our results indicate that we can reduce the number of instructions executed by a GPU application by as much as 84%, while improving execution performance by as much as 73%.
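To illustrate the baseline the abstract refers to (a lock constructed from atomic operations), here is a minimal sketch of a compare-and-swap spin lock. It assumes CPU threads and `std::atomic` stand in for GPU threads and GPU atomics, and it is not the paper's hardware queuing mechanism. The names `SpinLock`, `SpinResult`, and `run_contended_counter` are hypothetical, and counting failed compare-and-swap attempts is only a rough proxy for the instructions a waiting thread spends spinning.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: a lock built from a single atomic word.
// CPU threads stand in for GPU threads; this is NOT the paper's
// hardware-based blocking/queuing mechanism.
class SpinLock {
public:
    // Spins until the compare-and-swap succeeds and returns the number
    // of failed attempts (a rough proxy for wasted instructions).
    long lock() {
        long failed = 0;
        int expected = 0;
        while (!flag_.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            expected = 0;  // a failed CAS overwrites `expected`; reset it
            ++failed;
        }
        return failed;
    }

    void unlock() { flag_.store(0, std::memory_order_release); }

private:
    std::atomic<int> flag_{0};  // 0 = free, 1 = held
};

struct SpinResult {
    long counter;     // final value of the lock-protected counter
    long failed_cas;  // total failed lock attempts across all threads
};

// Each of `threads` workers increments a shared counter `iters` times
// inside a critical section guarded by SpinLock.
SpinResult run_contended_counter(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::atomic<long> failed{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                failed.fetch_add(lock.lock(), std::memory_order_relaxed);
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();  // join makes `counter` safe to read
    return {counter, failed.load()};
}
```

Calling `run_contended_counter(4, 10000)` from `main` should yield a `counter` of 40000, while `failed_cas` varies from run to run with contention. Every waiting thread keeps issuing atomic operations until it wins the lock; a blocking mechanism would instead suspend waiters, which is the kind of overhead the abstract's instruction-count reduction targets.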
Original language | English |
---|---|
Pages | 475-486 |
Number of pages | 12 |
DOIs | |
Publication status | Published - 2013 |
Externally published | Yes |
Event | 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States. Duration: 20 May 2013 → 24 May 2013 |
Conference
Conference | 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 |
---|---|
Country/Territory | United States |
City | Boston, MA |
Period | 20/05/13 → 24/05/13 |
Keywords
- GPUs
- Mutual exclusion
- Synchronization