Luke, Edward A.
Date of Degree
Graduate Thesis - Open Access
Master of Science (M.S.)
James Worth Bagley College of Engineering
Department of Computer Science and Engineering
Irregular applications, such as unstructured mesh operations, do not map easily onto the typical GPU programming paradigms endorsed by GPU manufacturers, which mostly focus on maximizing concurrency for latency hiding. In this work, we show how alternative techniques focused on latency amortization can be used to control overall latency while requiring less concurrency. We used a custom-built microbenchmarking framework to test several GPU kernels and show how the GPU behaves under relevant workloads. We demonstrate that coalescing is not required for good performance: an uncoalesced access pattern can achieve high bandwidth, in certain circumstances exceeding 80% of the theoretical global memory bandwidth. We also report further observations on specific relevant behaviors of GPUs. We hope that this study opens the door for further investigation into techniques that can exploit latency amortization when latency hiding does not achieve sufficient performance.
Winans-Pruitt, Dalton R., "GPGPU microbenchmarking for irregular application optimization" (2022). Theses and Dissertations. 5591.