I got fooled by Old Huang (Jensen Huang) # Programming - 葵花宝典
w*r
Post #1
I just tested multi-GPU performance with NCCL, and two TITAN Vs deliver only about 1/3 the
performance of two GTX 1080s. That is not a typo: yes, they are slower than the 1080s...
Following the test at https://github.com/NVIDIA/nccl, I got only about 1/3 of the
performance of my previous dual GTX 1080 setup. The inter-GPU
communication performance seems very bad! Any insight?
-- The command I used is "./all_reduce_test 10000000 2 0 1"
# Using devices
#   Rank 0 uses device 0 [0x19] TITAN V
#   Rank 1 uses device 1 [0x1a] TITAN V
#
#                                        out-of-place                   in-place
#    bytes         N  type    op    time  algbw  busbw    res     time  algbw  busbw    res
  10000000  10000000  char   sum   2.739   3.65   3.65  0e+00    2.759   3.63   3.63  0e+00
  10000000  10000000  char  prod   2.741   3.65   3.65  0e+00    2.759   3.62   3.62  0e+00
  10000000  10000000  char   max   2.743   3.65   3.65  0e+00    2.754   3.63   3.63  0e+00
  10000000  10000000  char   min   2.742   3.65   3.65  0e+00    2.765   3.62   3.62  0e+00
# Using devices
#   Rank 0 uses device 0 [0x02] GeForce GTX 1080
#   Rank 1 uses device 1 [0x03] GeForce GTX 1080
#
#                                        out-of-place                   in-place
#    bytes         N  type    op    time  algbw  busbw    res     time  algbw  busbw    res
  10000000  10000000  char   sum   1.077   9.28   9.28  0e+00    1.092   9.16   9.16  0e+00
  10000000  10000000  char  prod   1.194   8.38   8.38  0e+00    1.105   9.05   9.05  0e+00
  10000000  10000000  char   max   1.181   8.47   8.47  0e+00    1.097   9.12   9.12  0e+00
  10000000  10000000  char   min   1.182   8.46   8.46  0e+00    1.100   9.09   9.09  0e+00
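
For anyone who wants to check the numbers, here is a small sketch that recomputes algbw from the out-of-place times in the two tables above and compares the cards. It assumes "time" is in milliseconds and bandwidth is in GB/s (1e9 bytes/s); the log does not state units, but that assumption reproduces the printed algbw values. The names `algbw_gbps`, `titan_v_ms` and `gtx_1080_ms` are made up for this illustration and are not part of NCCL.

```python
# Out-of-place "time" values copied from the two logs above, keyed by op.
# Assumption: times are in milliseconds, bandwidth is in GB/s (1e9 bytes/s).
NBYTES = 10_000_000

titan_v_ms = {"sum": 2.739, "prod": 2.741, "max": 2.743, "min": 2.742}
gtx_1080_ms = {"sum": 1.077, "prod": 1.194, "max": 1.181, "min": 1.182}


def algbw_gbps(nbytes, time_ms):
    """Algorithm bandwidth: bytes moved divided by elapsed time."""
    return nbytes / (time_ms * 1e-3) / 1e9


def mean_algbw(times_ms):
    """Average algbw over all ops for one card."""
    values = [algbw_gbps(NBYTES, t) for t in times_ms.values()]
    return sum(values) / len(values)


titan_bw = mean_algbw(titan_v_ms)
gtx_bw = mean_algbw(gtx_1080_ms)
ratio = titan_bw / gtx_bw

print(f"TITAN V  mean algbw: {titan_bw:.2f} GB/s")
print(f"GTX 1080 mean algbw: {gtx_bw:.2f} GB/s")
print(f"TITAN V / GTX 1080 : {ratio:.2f}")
```

On these out-of-place numbers the ratio comes out at roughly 0.4, so "about 1/3" is a slight exaggeration, but the TITAN V pair is still well under half the speed of the GTX 1080 pair.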