Theses and Dissertations

ORCID

https://orcid.org/0000-0003-2289-0780

Advisor

Luke, Edward A.

Committee Member

Jankun-Kelly, T.J.

Committee Member

Banicescu, Ioana

Committee Member

Zope, Anup

Date of Degree

8-9-2022

Document Type

Graduate Thesis - Open Access

Major

Computer Science

Degree Name

Master of Science (M.S.)

College

James Worth Bagley College of Engineering

Department

Department of Computer Science and Engineering

Abstract

Irregular applications, such as unstructured mesh operations, do not map easily onto the typical GPU programming paradigms endorsed by GPU manufacturers, which mostly focus on maximizing concurrency for latency hiding. In this work, we show how alternative techniques focused on latency amortization can be used to control overall latency while requiring less concurrency. We use a custom-built microbenchmarking framework to test several GPU kernels and show how the GPU behaves under relevant workloads. We demonstrate that coalescing is not required for good performance: an uncoalesced access pattern can achieve high bandwidth, in certain circumstances exceeding 80% of the theoretical global memory bandwidth. We also report further observations on specific relevant GPU behaviors. We hope that this study opens the door to further investigation of techniques that exploit latency amortization when latency hiding does not achieve sufficient performance.
