Abstract
At the core of current-generation high-performance multiprocessor systems are out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work focusing on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore the effects of WP memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these WP memory references can increase the number of cache-to-cache transfers by 32%, the number of invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, WP memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. To reduce the performance impact of these WP memory references, we introduce two simple mechanisms, filtering WP blocks that are not likely to be used and WP-aware cache replacement, which yield speedups of up to 37%.
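The abstract names WP-aware cache replacement but does not describe how it works. The sketch below is a hypothetical illustration of one way such a policy could be built, not the authors' design: each line in a set remembers whether it was filled by a wrong-path access and has not yet been touched on the correct path, and such lines are evicted before ordinary LRU victims. The class `WPAwareSet`, its `access` method, and the "evict WP-only lines first" rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class WPAwareSet:
    """One set of a set-associative cache with a hypothetical WP-aware LRU policy."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        # tag -> True if the line was filled by a wrong-path access and has
        # not yet been referenced on the correct path. Order is LRU -> MRU.
        self.lines: "OrderedDict[int, bool]" = OrderedDict()

    def _victim(self) -> int:
        # Assumed rule: prefer the least recently used line that is still
        # marked wrong-path-only; otherwise fall back to plain LRU.
        for tag, wp_only in self.lines.items():
            if wp_only:
                return tag
        return next(iter(self.lines))

    def access(self, tag: int, wrong_path: bool) -> bool:
        """Access a block; return True on a hit, False on a miss (line is filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # update recency
            if not wrong_path:
                self.lines[tag] = False  # correct path used it: clear the WP mark
            return True
        if len(self.lines) >= self.ways:
            del self.lines[self._victim()]
        self.lines[tag] = wrong_path  # new line is MRU, marked if filled on WP
        return False
```

For example, in a 2-way set, accessing block 1 on the correct path and then block 2 on a wrong path leaves block 1 as the LRU line. A subsequent correct-path miss to block 3 evicts block 2 (WP-only) rather than block 1, which plain LRU would have chosen. A block that a wrong path prefetched and the correct path later uses loses its mark and is treated like any other line, so useful WP prefetches are not penalized.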
Original language | English |
---|---|
Pages (start-end) | 1256-1269 |
Number of pages | 14 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 67 |
Issue number | 12 |
DOIs | |
Publication status | Published - Dec 2007 |
Externally published | Yes |
Funding
This research is supported in part by US National Science Foundation grant CCF-0541162. We would like to thank Babak Falsafi and Thomas Wenisch for supplying us with the em3d benchmark. A preliminary version of this work was presented at the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006) [31].
Funders | Funder number |
---|---|
US National Science Foundation | CCF-0541162 |