Characterization and exploitation of nested parallelism and concurrent kernel execution to accelerate high performance applications
PhD Thesis: Northeastern University
Over the past decade, GPU computing has evolved from the relatively simple task of mapping data-parallel kernels onto Single Instruction Multiple Thread (SIMT) hardware into a more complex challenge: mapping multiple, potentially irregular kernels onto more powerful and sophisticated many-core engines. Recent advances in GPU architectures, including support for features such as nested parallelism and concurrent kernel execution, further complicate the mapping task.