sort a matrix (1M rows x 100 columns) for each row in GPU
# DataSciences - Data Science
c*l
2
Does anyone think it is feasible to sort a matrix (1M rows x 100 columns) row
by row on a GPU? We repeat this sorting every day and want to know whether the
performance could be improved to 10x or 20x faster (we just bought a server
with 8 K40 GPUs).
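For reference, the row-wise sort being asked about can be sketched on the CPU in NumPy (a minimal baseline; the 1000-row array below is a scaled-down stand-in for the real 1M x 100 matrix):

```python
import numpy as np

# Baseline: sort every row of a (rows x cols) matrix independently.
# np.sort with axis=1 sorts along each row and returns a new array.
rng = np.random.default_rng(0)
a = rng.random((1000, 100))  # stand-in for the 1M x 100 matrix

sorted_a = np.sort(a, axis=1)

# Every row is now in ascending order.
assert np.all(np.diff(sorted_a, axis=1) >= 0)
```

Timing this baseline on the full matrix gives the number any GPU approach has to beat.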
l*m
3
Please refer to https://solarianprogrammer.com/2013/02/04/sorting-data-in-parallel-cpu-gpu/
In my opinion, the CPU should be fast enough for this size if the sorting
algorithm and implementation are correct. CPU-GPU data copy is a big overhead
for such a task.
【Quoting c*****l:】
: Does anyone think it is feasible to sort a matrix (1M rows x 100 columns)
: row by row on a GPU? We repeat this sorting every day and want to know
: whether the performance could be improved to 10x or 20x faster (we just
: bought a server with 8 K40 GPUs).
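The copy overhead mentioned above can be put in rough numbers. Assuming float32 data and ~12 GB/s effective PCIe bandwidth (both are assumptions, not figures from the thread):

```python
# Back-of-envelope PCIe transfer estimate for the 1M x 100 matrix.
rows, cols = 1_000_000, 100
bytes_total = rows * cols * 4        # float32: 4 bytes per element
pcie_bps = 12e9                      # assumed effective PCIe bandwidth, bytes/s
one_way_s = bytes_total / pcie_bps   # host->device (or device->host) copy time
print(f"{bytes_total / 1e6:.0f} MB, ~{one_way_s * 1e3:.1f} ms per direction")
# → 400 MB, ~33.3 ms per direction
```

Tens of milliseconds each way is a meaningful fraction of the total time for such a cheap sort, which is why the copy overhead dominates the trade-off.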

y*0
4
I haven't tried it, but my rough thinking is this:
Since you have relatively few columns, log2(100) is not even 7, so the
complexity is roughly 7 * matrix_size. That is too little work to be worth
shipping to the GPU.
If you use the CPU cache well, walking each row through cache and
parallelizing directly on the CPU, that should be the best choice.


【Quoting c*****l:】
: Does anyone think it is feasible to sort a matrix (1M rows x 100 columns)
: row by row on a GPU? We repeat this sorting every day and want to know
: whether the performance could be improved to 10x or 20x faster (we just
: bought a server with 8 K40 GPUs).
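The cache-friendly, parallel-on-CPU idea above can be sketched with a thread pool over contiguous row blocks. NumPy's ndarray.sort releases the GIL, so the blocks sort concurrently; the thread count and the 10k-row test size are illustrative, not tuned:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def sort_rows_parallel(a: np.ndarray, n_threads: int = 4) -> None:
    """Sort each row of `a` in place, one contiguous row block per thread."""
    bounds = np.linspace(0, a.shape[0], n_threads + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # a[lo:hi] is a view, so the in-place sort writes back into `a`.
        futures = [pool.submit(a[lo:hi].sort, axis=1)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any worker exception

rng = np.random.default_rng(1)
m = rng.random((10_000, 100))  # scaled-down stand-in for the 1M-row matrix
sort_rows_parallel(m)
assert np.all(np.diff(m, axis=1) >= 0)  # every row ascending
```

Contiguous blocks keep each thread streaming through adjacent rows, which matches the "walk each row through cache" suggestion.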
