Author

Anup D Zope

Issuing Body

Mississippi State University

Advisor

Luke, Edward A.

Committee Member

Janus, J. Mark

Committee Member

Young, Maxwell

Committee Member

Kim, Seongjai

Committee Member

Bhushan, Shanti

Date of Degree

1-1-2019

Document Type

Dissertation - Open Access

Degree Name

Doctor of Philosophy

Abstract

In this work, a model of computation for shared memory parallelism is presented. To address fundamental constraints of modern memory systems, the presented model constrains how parallelism interacts with memory access patterns and in doing so provides a method for design and analysis of algorithms that estimates reliable execution time based on a few architectural parameters. This model is presented as an alternative to modern thread based models that focus on computational concurrency but rely on reactive hardware policies to hide and amortize memory latency. Since modern processors use reactive mechanisms and heuristics to deduce the data access requirement of computations, the memory access costs of these threaded programs may be difficult to predict reliably. This research presents the Queue Streaming Model (QSM) that aims to address these shortcomings by providing a prescriptive mechanism to achieve latency-amortized and predictable-cost data access. Further, the work presents application of the QSM to algorithms commonly used in a number of applications. These algorithms include structured regular computations represented by merge sort, unstructured irregular computations represented by sparse matrix dense vector multiplication, and dynamic computations represented by MapReduce. The analysis of these algorithms reveal architectural tradeoffs between memory system bottlenecks and algorithm design. The techniques described in this dissertation reveal a general software approach that could be used to construct more general irregular applications, provided they can be transformed into a relational query form. It demonstrates that the QSM can be used to design algorithms that enhance utilization of memory system resources by structuring concurrency and memory accesses such that system bandwidths are balanced and latency is amortized. Finally, the benefit of applying the QSM algorithm to the Euler inviscid flow solver is demonstrated through experiments on the Intel(R) Xeon(R) E5-2680 v2 processor using ten cores. The transformation produced a speed-up of 25% over an optimized OpenMP implementation having identical computational structure.

URI

https://hdl.handle.net/11668/21146

Comments

queue streaming model||bridging model of computation||streaming access||performance predictability and portability

Share

COinS