say push file with size F to N nodes, each node has download speed and
upload speed di and si, the lower bound of completion time is max{F/u_s, F/
min{di}, NF/(u_s + sum{ui})} and I think this lower bound can be achieved
closely.
example, F=16G, N=16, divide the file into 16 pieces
after 1s: each node has one piece
after 1s: each node has two pieces by exchanging (0-1, 2-3, etc)
after 2s: each node has four ((01)-(23), etc)
after 4s: each node has eight
after 8s: each node has 16
total 16s