Offering a CS/EE TCAD journal review opportunity
Board: Immigration - 落地生根
t*e
Manuscript ID TCAD-2015-0250 entitled "An Accurate GPU Performance Model for
Effective Control Flow Divergence Optimization" with Prof. Liang as contact
author has been submitted to the Transactions on Computer-Aided Design of
Integrated Circuits and Systems.
The abstract appears at the end of this letter, along with the names of the
authors.
.....
MANUSCRIPT DETAILS
TITLE: An Accurate GPU Performance Model for Effective Control Flow
Divergence Optimization
AUTHORS: Liang, Yun; Satria, Muhammad; Rupnow, Kyle; Chen, Deming
ABSTRACT: Graphics processing units (GPUs) are composed of a group of
single-instruction multiple-data (SIMD) streaming multiprocessors (SMs).
GPUs are able to efficiently execute highly data parallel tasks through SIMD
execution on the SMs. However, if those threads take diverging control
paths, all divergent paths are executed serially. In the worst case, every
thread takes a different control path and the highly parallel architecture
is used serially by each thread. This control flow divergence problem is
well known in GPU development; code transformation, memory access
redirection, and data layout reorganization are commonly used to reduce the
impact of divergence. These techniques attempt to eliminate divergence by
grouping together threads or data to ensure identical behavior.
However, prior efforts using these techniques do not model the performance
impact of any particular divergence or consider that complete elimination of
divergence may not be possible. Thus, we perform analysis of the
performance impact of divergence and potential thread regrouping algorithms
that eliminate divergence or minimize the impact of remaining divergence.
Finally, we develop a divergence optimization framework that analyzes and
transforms the kernel at compile-time and regroups the threads at run-time.
Our proposed metrics achieve performance estimation accuracy within 6.2% of
measured performance. Using these metrics, we develop thread regrouping
algorithms, which consider the impact of divergence, and speed up kernel
execution by up to 4.7X on an NVIDIA GTX480.
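
For readers unfamiliar with the divergence problem the abstract describes, here is a rough toy sketch in Python. It is not the manuscript's performance model or its regrouping algorithm, neither of which is given in the letter. It only assumes that a warp of 32 threads executes every distinct control path taken by its threads one after another, and that placing threads with the same path into the same warp removes that serialization. The names `warp_cost`, `kernel_cost`, `regroup`, and the per-path costs are all hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs such as the GTX480


def warp_cost(paths, path_cost):
    """Toy cost of one warp: every distinct path in the warp runs serially."""
    return sum(path_cost[p] for p in set(paths))


def kernel_cost(thread_paths, path_cost, warp_size=WARP_SIZE):
    """Sum the toy warp cost over consecutive groups of warp_size threads."""
    return sum(
        warp_cost(thread_paths[i:i + warp_size], path_cost)
        for i in range(0, len(thread_paths), warp_size)
    )


def regroup(thread_paths):
    """Hypothetical regrouping: reorder thread ids so equal paths are adjacent."""
    return sorted(range(len(thread_paths)), key=lambda t: thread_paths[t])


# Assumed example: 64 threads alternate between a cheap path A and a costly path B.
path_cost = {"A": 10, "B": 30}
thread_paths = ["A" if t % 2 == 0 else "B" for t in range(64)]

before = kernel_cost(thread_paths, path_cost)
order = regroup(thread_paths)
after = kernel_cost([thread_paths[t] for t in order], path_cost)

print(before, after)  # prints "80 40"
```

In this made-up case both warps initially contain both paths, so each warp pays for A and B. After regrouping, one warp runs only A and the other only B, which halves the summed cost. A real GPU adds costs this sketch ignores, such as the memory access changes caused by moving threads or data.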