Asking the networking experts: how can I increase NIC throughput? - 未名空间 (MITBBS) historical archive


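z*i 2018-03-01 08:03

#1

I'm currently an engineer and will very likely be promoted to manager at year end. My 485 was just filed on December 5. If I get promoted, do I need to redo the H1-B, PERM, or 485? Thanks!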

i*t 2018-03-01 08:03

#2

Then what tower did he use afterwards?

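v*e 2018-03-01 08:03

#3

[The following text is reposted from the Chinook club]
From: vankie (Sina Weibo @洛城王二), Board: Chinook
Subject: I'm a true Apple fan
Posted at: BBS 未名空间站 (Thu Mar 15 21:10:54 2012, US Eastern)
I've bought every Apple product I have a use for, so now all I can do is branch out into accessories.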

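d*n 2018-03-01 08:03

#4

Today it finally won't power on: a few seconds after I press the power button it restarts, so either the power supply or the motherboard has died.
Is P67 still a no-go these days?
Or should I just go the budget route with an AMD setup?
Or make a last-ditch attempt and try swapping the motherboard?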

d*h 2018-03-01 08:03

#5

I have a Windows application. My network supports 10 Gbit/s. However, I can
only push network throughput to 200 MB/s (1.6 Gbit/s).
But my CPU and memory usage are both low: CPU is only 50% and memory is only 40%.
Where should I optimize in order to increase network throughput?
I don't use any disk I/O. Everything is in memory.
My application performs a broadcast function.
I receive the data from a single connection. Then I broadcast it to 1000+
clients (using websocket TCP connections).

d*r 2018-03-01 08:03

#6

You're too naive. With a Made-in-China tower like this, his household surely keeps a few hundred spares.

w*2 2018-03-01 08:03

#7

Watched too much Pixar.

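A*s 2018-03-01 08:03

#8

Go AMD.

[Quoting d****n:]

: Today it finally won't power on: a few seconds after I press the power button it restarts, so either the power supply or the motherboard has died.
: Is P67 still a no-go these days?
: Or should I just go the budget route with an AMD setup?
: Or make a last-ditch attempt and try swapping the motherboard?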

p*o 2018-03-01 08:03

#9

CPU is full at 50%. You need a better CPU or more cores.

[Quoting d****h:]

: I have a Windows application. My network supports 10 Gbit/s. However, I can
: only push network throughput to 200 MB/s (1.6 Gbit/s).
: But my CPU and memory usage are both low: CPU is only 50% and memory is only 40%.
: Where should I optimize in order to increase network throughput?
: I don't use any disk I/O. Everything is in memory.
: My application performs a broadcast function.
: I receive the data from a single connection. Then I broadcast it to 1000+
: clients (using websocket TCP connections).

I*s 2018-03-01 08:03

#10

He has a formula for building towers; really he should be called the Tower-Building Heavenly King.

M*n 2018-03-01 08:03

#11

Ever since that lamp of mine broke, I've felt that this kind of desk lamp breaks easily.

d*n 2018-03-01 08:03

#12

Does Microcenter still have cheap combos?

[Quoting A*****s:]

: Go AMD.

w*g 2018-03-01 08:03

#13

I think hyperthreading just jumps from 50% to 100% very quickly, but
one still sees 100%.
Many factors can keep one below 10 Gbps. Not enabling jumbo frames is one
of them. I would seriously consider DPDK or f-stack for 10 Gbps service
programming.

[Quoting p***o:]

: CPU is full at 50%. You need a better CPU or more cores.
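
To put the jumbo-frame suggestion in numbers, here is a rough back-of-the-envelope sketch. It assumes TCP over IPv4 over Ethernet with no TCP options and full-size frames; the function names are made up for illustration. It compares how much of the wire carries payload, and how many frames per second a 10 Gbit/s link carries, at an MTU of 1500 versus 9000.

```python
# Bytes on the wire per Ethernet frame beyond the MTU-sized packet:
# preamble + start delimiter (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), assuming no options.
IP_TCP_HEADERS = 40


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that carry application payload in a full frame."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Number of full-size frames per second that saturate the link."""
    return link_bps / ((mtu + ETH_OVERHEAD) * 8)


for mtu in (1500, 9000):
    print(
        f"MTU {mtu}: efficiency {payload_efficiency(mtu):.2%}, "
        f"{frames_per_second(10e9, mtu):,.0f} frames/s at 10 Gbit/s"
    )
```

Under these assumptions jumbo frames gain only a few percent of bandwidth; the larger effect is the roughly sixfold drop in frames per second that the host has to process. With small real-time messages the frames are not full anyway, so the benefit may be smaller than these figures suggest.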

p*e 2018-03-01 08:03

#14

That's a really pointed question.

v*e 2018-03-01 08:03

#15

A Luxo shouldn't break easily; it's industrial-grade stuff.

[Quoting M*****n:]

: Ever since that lamp of mine broke, I've felt that this kind of desk lamp breaks easily.

A*s 2018-03-01 08:03

#16

I've never had the pleasure of an MC.

[Quoting d****n:]

: Does Microcenter still have cheap combos?

e*g 2018-03-01 08:03

#17

It may be because there are too many Send calls. Try collecting the data from several sends and sending it all out in a single Send.

m*x 2018-03-01 08:03

#18

Stitch it for a year, patch it for a year, stitch and patch for yet another year.

M*n 2018-03-01 08:03

#19

Mine isn't. It looks the same but isn't this brand; maybe it's a knockoff.

[Quoting v****e:]

: A Luxo shouldn't break easily; it's industrial-grade stuff.

t*s 2018-03-01 08:03

#20

You'll need to check availability at your local store.

[Quoting d****n:]

: Does Microcenter still have cheap combos?

d*h 2018-03-01 08:03

#21

Thank you all. In my case:
1. I can't use jumbo frames because I need to broadcast to thousands of
clients across the internet. Not all switches support jumbo frames.
2. I verified that the CPU is stable at 50%; it didn't go up or down.
3. I can't reduce the number of sends because of the real-time requirement.
I am using async calls.
Are there any network perf counters I should watch for network throughput?

j*u 2018-03-01 08:03

#22

lol

[Quoting d********r:]

: You're too naive. With a Made-in-China tower like this, his household surely keeps a few hundred spares.

v*e 2018-03-01 08:03

#23

Oh, this one costs close to 200 bucks, so it should be fairly durable.

[Quoting M*****n:]

: Mine isn't. It looks the same but isn't this brand; maybe it's a knockoff.

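d*n 2018-03-01 08:03

#24

Didn't someone say last time that it had already ended? If I can place an order online, does that mean the store definitely has it in stock?
Sigh, time to install Windows again.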

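l*0 2018-03-01 08:03

#25

This is sound advice.
Or try increasing the MTU.

[Quoting e*****g:]

: It may be because there are too many Send calls. Try collecting the data from several sends and sending it all out in a single Send.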

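r*k 2018-03-01 08:03

#26

The OP clearly hasn't read many cultivation novels. Many magic treasures appear to shatter in battle and then restore themselves right away; they just lose much of their spiritual power and need some time to be nurtured back.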

f*r 2018-03-01 08:03

#27

Nice, I was just thinking of buying a similar lamp.

d*h 2018-03-01 08:03

#28

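K*r 2018-03-01 08:03

#29

The tower was given to him by the Tathagata Buddha to subdue Nezha.
Even shattered porcelain can be glued back together, let alone a treasure.

★ Sent from iPhone App: ChineseWeb 8.2.2

[Quoting i******t:]

: Then what tower did he use afterwards?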

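l*t 2018-03-01 08:03

#30

I'm floored. Is this the house you bought? What color is this paint? What fabric are these curtains? You're really embarrassing us Apple fans... Save the money you'd spend on Apple gear and hire an interior designer first.

[Quoting v****e:]

: Oh, this one costs close to 200 bucks, so it should be fairly durable.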

d*h 2018-03-01 08:03

#31

Can you tell me more about why "send" would cause trouble?
I know send costs CPU/memory. But given that my CPU/memory is not saturated,
I would think I should be able to do more sends.

[Quoting l**********0:]

: This is sound advice.
: Or try increasing the MTU.

l*u 2018-03-01 08:03

#32

He's an immortal; restoring a shattered tower is child's play :)

[Quoting i******t:]

: Then what tower did he use afterwards?

L*t 2018-03-01 08:03

#33

Dusting it is rather a hassle.

[Quoting v****e:]

: Oh, this one costs close to 200 bucks, so it should be fairly durable.

e*g 2018-03-01 08:03

#34

3. I can't reduce the number of sends because of the real-time requirement.
I am using async calls.
============================
Async calls aren't a problem either. An async call usually has a completion event callback. If a send is
still pending, don't issue another send; put the data in a buffer, and when the completion callback fires,
send all the accumulated data in one go.
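
The pattern described in this post can be sketched roughly as follows. This is an illustration in Python's asyncio, not the original poster's C# code; `CoalescingSender` and the `send` callable are hypothetical names. It also assumes messages are self-delimiting, so the receiver can split a concatenated batch. A message is sent immediately when the connection is idle; messages that arrive while a send is in flight are buffered and flushed in a single call once that send completes.

```python
import asyncio


class CoalescingSender:
    """Buffers messages while a send is in flight, then flushes them in one call."""

    def __init__(self, send):
        self._send = send      # hypothetical async callable: await send(bytes)
        self._pending = []     # messages that arrived during an in-flight send
        self._busy = False     # True while some caller is draining the buffer

    async def submit(self, data: bytes) -> None:
        self._pending.append(data)
        if self._busy:
            return             # the in-flight sender will pick this message up
        self._busy = True
        try:
            while self._pending:
                batch = b"".join(self._pending)
                self._pending.clear()
                await self._send(batch)
        finally:
            self._busy = False


async def demo():
    calls = []

    async def slow_send(payload: bytes) -> None:
        # Stand-in for a websocket send that takes 10 ms to complete.
        calls.append(payload)
        await asyncio.sleep(0.01)

    sender = CoalescingSender(slow_send)
    await asyncio.gather(*(sender.submit(b"m%d;" % i) for i in range(5)))
    return calls


print(asyncio.run(demo()))
```

With this shape no timer-based delay is added: batching only happens when the previous send has not finished, which is exactly when an extra send call could not have gone out sooner anyway.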

l*0 2018-03-01 08:03

#35

The idea is to use a buffer to reduce the number of sends and so cut the overhead on the TCP connection. This is the usual trade-off between low latency and high
throughput.

[Quoting d****h:]

: Can you tell me more about why "send" would cause trouble?
: I know send costs CPU/memory. But given that my CPU/memory is not saturated,
: I would think I should be able to do more sends.

d*h 2018-03-01 08:03

#36

Yes, that's what I am doing. I am using async/await. I await the
websocket send, and only after the await returns do I send the next data.

[Quoting e*****g:]

: 3. I can't reduce the number of sends because of the real-time requirement.
: I am using async calls.
: ============================
: Async calls aren't a problem either. An async call usually has a completion event callback. If a send is
: still pending, don't issue another send; put the data in a buffer, and when the completion callback fires,
: send all the accumulated data in one go.

d*h 2018-03-01 08:03

#37

I need to send the data in real time. If I use a buffer to collect the
data, even for a few ms, the latency will be too high for my clients.

[Quoting l**********0:]

: The idea is to use a buffer to reduce the number of sends and so cut the overhead on the TCP connection. This is the usual trade-off between low latency and high
: throughput.

m*u 2018-03-01 08:03

#38

Although in principle we shouldn't go looking for fault elsewhere, are you sure your clients are quick enough?

d*a 2018-03-01 08:03

#39

Is the program single-threaded? It sounds like a dual-core processor with the program using only one core. That could be the bottleneck.
What language is the program written in?

[Quoting d****h:]

: Thank you all. In my case:
: 1. I can't use jumbo frames because I need to broadcast to thousands of
: clients across the internet. Not all switches support jumbo frames.
: 2. I verified that the CPU is stable at 50%; it didn't go up or down.
: 3. I can't reduce the number of sends because of the real-time requirement.
: I am using async calls.
: Are there any network perf counters I should watch for network throughput?

d*h 2018-03-01 08:03

#40

This is a Windows Server 2016 machine with 4 physical cores and 8 logical cores.
I am using C# on .NET 4.6 with websocket programming.
Yes, it is multithreaded. I am using the .NET TPL (Task Parallel Library).

[Quoting d***a:]

: Is the program single-threaded? It sounds like a dual-core processor with the program using only one core. That could be the bottleneck.
: What language is the program written in?

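l*0 2018-03-01 08:03

#41

A few things need checking:
1) The size of the packet in each send.
2) The time each send takes, at nanosecond resolution, including the whole data receive and processing time. For example, if XML
objects are created, that needs to be included. Print it out and see how long each one takes.
3) How you calculate your TPS and average delay.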
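
As a loose illustration of item 2, the sketch below times each call at nanosecond resolution and summarizes the distribution. It is written in Python rather than the original poster's C#, and `time_calls` and `summarize` are hypothetical helper names. In a real measurement the timed function would wrap the whole receive, process, and send path rather than the send alone.

```python
import statistics
import time


def time_calls(fn, payloads):
    """Return the duration of each fn(payload) call in nanoseconds."""
    durations = []
    for payload in payloads:
        start = time.perf_counter_ns()
        fn(payload)
        durations.append(time.perf_counter_ns() - start)
    return durations


def summarize(durations_ns):
    """Summarize per-call durations; the tail (p99, max) often matters more than the mean."""
    ordered = sorted(durations_ns)
    n = len(ordered)
    return {
        "count": n,
        "mean_ns": statistics.mean(ordered),
        "p50_ns": ordered[n // 2],
        "p99_ns": ordered[min(n - 1, int(n * 0.99))],
        "max_ns": ordered[-1],
    }


# Stand-in workload: replace the lambda with the real per-message handling.
stats = summarize(time_calls(lambda p: bytes(p), [b"x" * 512] * 1000))
print(stats)
```

Multiplying the mean per-message time by the message rate and the number of clients gives a quick sanity check on whether the per-send cost alone could account for the observed ceiling.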

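d*a 2018-03-01 08:03

#42

In that case, CPU at 50% may mean that all 4 physical cores are busy but the system isn't using hyperthreading. In other
words, actual CPU utilization is 100% but it shows up as 50%.
Does your program send fairly small pieces of data each time? If so, try doubling the size of each send
and see whether throughput rises significantly. That will show whether the CPU (including memory) is the bottleneck.

[Quoting d****h:]

: This is a Windows Server 2016 machine with 4 physical cores and 8 logical cores.
: I am using C# on .NET 4.6 with websocket programming.
: Yes, it is multithreaded. I am using the .NET TPL (Task Parallel Library).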

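d*h 2018-03-01 08:03

#43

I believe I have already enabled hyperthreading.
That's why, in Task Manager/System, I see more logical cores than physical
cores.

[Quoting d***a:]

: In that case, CPU at 50% may mean that all 4 physical cores are busy but the system isn't using hyperthreading. In other
: words, actual CPU utilization is 100% but it shows up as 50%.
: Does your program send fairly small pieces of data each time? If so, try doubling the size of each send
: and see whether throughput rises significantly. That will show whether the CPU (including memory) is the bottleneck.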

h*c 2018-03-01 08:03

#44

I was going to chime in, but then I saw it's about NICs, not networks. dat is not mein area

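b*s 2018-03-01 08:03

#45

The so-called Nagle algorithm.

[Quoting l**********0:]

: The idea is to use a buffer to reduce the number of sends and so cut the overhead on the TCP connection. This is the usual trade-off between low latency and high
: throughput.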

b*s 2018-03-01 08:03

#46

just my two cents
1) have you checked the context switching of your program? how many threads are
you running? what if you keep just one thread for the actual send operations?
2) have you ever tried a Solarflare NIC with OpenOnload?
wdong mentioned DPDK, but it is not as user friendly as Solarflare NICs.
3) take a tcpdump with Ethernet frame info please, just in case
something bad is happening at the lower layers
4) websocket is less efficient than native TCP/IP, so go easy on
your expectations of its performance
cheers

b*s 2018-03-01 08:03

#47

and please try multiplexing if you are holding thousands of sockets

d*a 2018-03-01 08:03

#48

The system may decide not to use more than one thread per core if it thinks
that will hurt performance.
The same may happen with memory. The system may limit the amount of memory
used for TCP/IP buffers, even though overall memory usage is low.

[Quoting d****h:]

: I believe I have already enabled hyperthreading.
: That's why, in Task Manager/System, I see more logical cores than physical
: cores.

X*n 2018-03-01 08:03

#49

Have you checked that your OS/infrastructure/network can handle the traffic?
What's the highest throughput you can get if you cut the number of
websocket clients by half or to 1/4? What about only 1 client?
What's the threading model on the server side? How many threads do you use,
say, to handle 10000 clients?
Other areas to look into:
- As others have pointed out, TCP's Nagle algorithm can greatly impact
throughput. Have you tried TCP_NODELAY?
- Do you use inter-thread synchronization to any extent?
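
For reference, here is a minimal sketch of what setting TCP_NODELAY looks like at the socket level, shown in Python; the helper names are made up for illustration. In .NET the corresponding switch on a raw socket is the `Socket.NoDelay` property. Whether the websocket layer the original poster uses exposes the underlying socket is not stated in the thread, so treat that part as an open question.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes are sent without waiting to be coalesced.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def nagle_disabled(s: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = make_low_latency_socket()
print("Nagle disabled:", nagle_disabled(sock))
sock.close()
```

Disabling Nagle trades fewer, larger segments for lower per-message delay, so for a latency-sensitive broadcast it is worth measuring both settings rather than assuming one is better.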

v*n 2018-03-01 08:03

#50

try capturing a trace using the Windows Performance Toolkit
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/
and see whether you can find any obvious bottleneck

[Quoting d****h:]

: I believe I have already enabled hyperthreading.
: That's why, in Task Manager/System, I see more logical cores than physical
: cores.