Twitter Search is Now 3x Faster using Java server - 未名空间 (MITBBS) archive

Twitter Search is Now 3x Faster using Java server # Java - 爪哇娇娃

T*o 2011-04-07 07:04

Post 1

http://engineering.twitter.com/2011/04/twitter-search-is-now-3x

r*l 2011-04-07 07:04

Post 2

"changing our back-end from MySQL to a real-time version of Lucene"
This may contribute quite a lot to the performance gain.

[Quoting T*o:]

: http://engineering.twitter.com/2011/04/twitter-search-is-now-3x

g*g 2011-04-07 07:04

Post 3

I don't get this part: MySQL is a DB and Lucene is a search engine.
How is one a replacement for the other?

[Quoting r*****l:]

: "changing our back-end from MySQL to a real-time version of Lucene"
: This may contribute quite a lot to the performance gain.

g*g 2011-04-07 07:04

Post 4

From reading the blog post, it seems most of the gain comes from changing
the architecture from synchronous to asynchronous. They also imply that
Ruby on Rails was becoming unmaintainable for this kind of change, or that
it lacked NIO libraries. I am surprised they didn't do it in Scala, though.

[Quoting T*o:]

: http://engineering.twitter.com/2011/04/twitter-search-is-now-3x

l*e 2011-04-07 07:04

Post 5

indexing?

[Quoting g*****g:]

: I don't get this part: MySQL is a DB and Lucene is a search engine.
: How is one a replacement for the other?

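i*e 2011-04-07 07:04

Post 6

NIO processes web requests asynchronously. Is there any web server that can
fetch data from the back end asynchronously and then return to the original
socket connection to serve the page?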

[Quoting g*****g:]

: From reading the blog post, it seems most of the gain comes from changing
: the architecture from synchronous to asynchronous. They also imply that
: Ruby on Rails was becoming unmaintainable for this kind of change, or that
: it lacked NIO libraries. I am surprised they didn't do it in Scala, though.

F*n 2011-04-07 07:04

Post 7

If you think of a DBMS as nothing but indexing, Lucene has its own index
management and access mechanism, which is much faster than other DBs for
Lucene's own specific tasks.

[Quoting g*****g:]

: I don't get this part: MySQL is a DB and Lucene is a search engine.
: How is one a replacement for the other?

r*l 2011-04-07 07:04

Post 8

Yes. My feeling is that the index engine and new architecture help directly.
The title implies Java is the main reason though.

[Quoting g*****g:]

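z*e 2011-04-07 07:04

Post 9

Implement it yourself. The web server is just a front end that accepts
requests; after that you handle things on your own.
The principle is not complicated: give each connection an ID, then do
whatever you like in the background. When the data comes back, write it back
to the connection by ID, much like a proxy server.
Alternatively, the client drops the connection after sending the request and
keeps polling to fetch the data.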

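[Quoting i**e:]

: NIO processes web requests asynchronously. Is there any web server that can
: fetch data from the back end asynchronously and then return to the original
: socket connection to serve the page?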
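The polling alternative at the end of post 9 (send the request, disconnect, then poll for the data) could be sketched roughly as below. This is only an illustration: `PollingJobStore`, `submit`, `finish`, and `poll` are made-up names, and an in-memory map stands in for whatever store a real service would use.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical job store for the "disconnect and poll" variant.
class PollingJobStore {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // First request: hand back a ticket at once so the connection can close.
    String submit() {
        return UUID.randomUUID().toString();
    }

    // Called by the back end whenever the data for a ticket is ready.
    void finish(String ticket, String data) {
        results.put(ticket, data);
    }

    // Each poll is a short, independent request: empty until the data arrives.
    // The result is handed out once and then forgotten.
    Optional<String> poll(String ticket) {
        return Optional.ofNullable(results.remove(ticket));
    }

    public static void main(String[] args) {
        PollingJobStore store = new PollingJobStore();
        String ticket = store.submit();

        System.out.println(store.poll(ticket).isPresent()); // not ready yet
        store.finish(ticket, "search results");
        System.out.println(store.poll(ticket).orElse("still waiting"));
    }
}
```

The trade-off is extra round trips from the client in exchange for the server never holding a connection open while the back end works.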

g*g 2011-04-07 07:04

Post 10

In Java terms, they create a Future in the servlet and block on the
Future until it returns. Inside the Future they do all kinds of
async processing. On a loaded system, less time is wasted blocking
on I/O, so they can achieve better throughput.
They don't actually use a servlet, though; that part is in RoR.

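[Quoting i**e:]

: NIO processes web requests asynchronously. Is there any web server that can
: fetch data from the back end asynchronously and then return to the original
: socket connection to serve the page?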
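A minimal sketch of the "create a Future and block on it" pattern from post 10, using only `java.util.concurrent`. The service names and the `fetch` stand-in are assumptions for illustration and are not taken from Twitter's code. The point is that all back-end calls are started before the request thread blocks, so the waits overlap instead of adding up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingOnFutures {
    // Stand-in for a back-end call; the name and the sleep are illustrative only.
    static String fetch(String service) throws InterruptedException {
        Thread.sleep(50); // pretend network latency
        return service + ":ok";
    }

    // Start every call first, then block on each Future in turn.
    static List<String> fetchAll(ExecutorService pool, List<String> services) {
        List<Future<String>> futures = new ArrayList<>();
        for (String s : services) {
            Callable<String> task = () -> fetch(s);
            futures.add(pool.submit(task));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                results.add(f.get()); // the request thread blocks here
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(fetchAll(pool, Arrays.asList("tweets", "users", "trends")));
        } finally {
            pool.shutdown();
        }
    }
}
```

The request thread is still parked in `get()`, which is the objection raised later in the thread; the gain here is only that the three calls run concurrently.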

s*o 2011-04-07 07:04

Post 11

Will Oracle sue over this?

[Quoting T*o:]

: http://engineering.twitter.com/2011/04/twitter-search-is-now-3x

i*e 2011-04-07 07:04

Post 12

Right, it is probably as you say. What I want to know is which (open source)
web framework implements this today.

[Quoting g*****g:]

: In Java terms, they create a Future in the servlet and block on the
: Future until it returns. Inside the Future they do all kinds of
: async processing. On a loaded system, less time is wasted blocking
: on I/O, so they can achieve better throughput.
: They don't actually use a servlet, though; that part is in RoR.

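i*e 2011-04-07 07:04

Post 13

The crude way is probably to put the request (along with its connection,
headers, etc.) in a hashtable and then send an NIO-based asynchronous request
to the back end. But how do you tie that in with the front-end servlet?
Servlets are synchronous by design, aren't they?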

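[Quoting z***e:]

: Implement it yourself. The web server is just a front end that accepts
: requests; after that you handle things on your own.
: The principle is not complicated: give each connection an ID, then do
: whatever you like in the background. When the data comes back, write it back
: to the connection by ID, much like a proxy server.
: Alternatively, the client drops the connection after sending the request and
: keeps polling to fetch the data.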

i*e 2011-04-07 07:04

Post 14

What you describe still seems to block, only it is blocked on the Future?
How is that different from blocking in memory or in a thread?

[Quoting g*****g:]

c*n 2011-04-07 07:04

Post 15

This is essentially the thread-vs-message processing argument.
Cassandra does exactly what you said: every request creates a handler, and
Cassandra puts it in a huge map keyed by request ID. When the reply message
comes back, the ID is used to look up the request handler. So overall
there are very few "processor" threads, but there can be many more
requests on the queue.

[Quoting z***e:]
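The handler-map idea in post 15 (and the ID scheme from post 9) might look something like the sketch below. It is not Cassandra's actual code: `HandlerTable`, `send`, and `onReply` are hypothetical names, and a single-threaded executor plays the role of the few "processor" threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch of message-style processing keyed by request ID.
class HandlerTable {
    private final Map<Integer, Consumer<String>> handlers = new ConcurrentHashMap<>();
    // One "processor" thread serves every reply, however many requests are pending.
    private final ExecutorService processor = Executors.newSingleThreadExecutor();

    // Park the handler under its request ID; no thread waits for the reply.
    void send(int requestId, Consumer<String> handler) {
        handlers.put(requestId, handler);
    }

    // When a reply arrives, look up the handler by ID and run it once.
    void onReply(int requestId, String payload) {
        Consumer<String> handler = handlers.remove(requestId);
        if (handler != null) {
            processor.execute(() -> handler.accept(payload));
        }
    }

    int pending() {
        return handlers.size();
    }

    // Returns true if all queued handler work finished within the timeout.
    boolean shutdownAndWait() {
        processor.shutdown();
        try {
            return processor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HandlerTable table = new HandlerTable();
        AtomicInteger handled = new AtomicInteger();
        for (int id = 0; id < 10_000; id++) {
            table.send(id, payload -> handled.incrementAndGet());
        }
        System.out.println("pending: " + table.pending());
        for (int id = 0; id < 10_000; id++) {
            table.onReply(id, "reply-" + id);
        }
        table.shutdownAndWait();
        System.out.println("handled: " + handled.get());
    }
}
```

Ten thousand requests are outstanding at once here, yet only one thread ever runs handler code, which is the contrast with a thread-per-request model.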

g*g 2011-04-07 07:04

Post 16

You can use a plain servlet to hook up Netty or MINA. They use Netty
here.

[Quoting i**e:]

: Right, it is probably as you say. What I want to know is which (open source)
: web framework implements this today.

i*e 2011-04-07 07:04

Post 17

Hmm. I must be missing something. In the case we are
discussing, there are two web servers involved: one
front-end server serving web requests, which in turn
calls a back-end server for mashing up data.
I thought Netty or MINA used async network handling.
But for the servlet running on the front-end server,
the requests going to back-end servers are still
blocking?

[Quoting g*****g:]

: You can use a plain servlet to hook up Netty or MINA. They use Netty
: here.

g*g 2011-04-07 07:04

Post 18

HTTP is a request/response protocol. Unless you are using long
polling (a Comet-like framework) in the web layer, it has to block
in the front end. You can, however, do the heavy lifting in another
component.

[Quoting i**e:]

: Hmm. I must be missing something. In the case we are
: discussing, there are two web servers involved: one
: front-end server serving web requests, which in turn
: calls a back-end server for mashing up data.
: I thought Netty or MINA used async network handling.
: But for the servlet running on the front-end server,
: the requests going to back-end servers are still
: blocking?

i*e 2011-04-07 07:04

Post 19

Isn't this what they did at Twitter? I think they
made the front end async. When a request is received
by the front end, it sends a request to the back-end service
and continues on. When the back-end response comes back,
something picks up the response, mashes it up, and
sends it to the original front-end client.
"Creating a fully asynchronous aggregation service.
No thread waits on network I/O to complete."

[Quoting g*****g:]

: HTTP is a request/response protocol. Unless you are using long
: polling (a Comet-like framework) in the web layer, it has to block
: in the front end. You can, however, do the heavy lifting in another
: component.

g*g 2011-04-07 07:04

Post 20

They made the heavy lifting part async, that's all.
HTTP is a synchronous protocol and you can't
change that. It's not as if there's a connection open
and the server can push data to the client whenever it wants.

[Quoting i**e:]

: Isn't this what they did at Twitter? I think they
: made the front end async. When a request is received
: by the front end, it sends a request to the back-end service
: and continues on. When the back-end response comes back,
: something picks up the response, mashes it up, and
: sends it to the original front-end client.
: "Creating a fully asynchronous aggregation service.
: No thread waits on network I/O to complete."

i*e 2011-04-07 07:04

Post 21

The request handler code can be async, though.
Traditionally the request-handling thread is blocked
(as in servlets) waiting for back-end I/O (file system
or network). It sounds like Twitter has made this non-blocking,
which means the thread is freed to do other things. When
back-end I/O is done, the back-end response thread sends
data back to the front-end client.

[Quoting g*****g:]

: They made the heavy lifting part async, that's all.
: HTTP is a synchronous protocol and you can't
: change that. It's not as if there's a connection open
: and the server can push data to the client whenever it wants.
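The non-blocking handler described in post 21 can be illustrated with `CompletableFuture`, which arrived in Java 8 and so postdates this thread. This is a sketch under assumptions, not what Twitter built: `callBackend` is a made-up stand-in that completes on a pool thread, where a real service would complete the future from a non-blocking client such as Netty.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class NonBlockingAggregation {
    // Illustrative stand-in for a back-end service call.
    static CompletableFuture<String> callBackend(String name, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> name + "-data", io);
    }

    // Returns immediately; the merge runs as a callback once both replies arrive,
    // so the thread that called handle() is free to take the next request.
    static CompletableFuture<String> handle(ExecutorService io) {
        CompletableFuture<String> tweets = callBackend("tweets", io);
        CompletableFuture<String> users = callBackend("users", io);
        return tweets.thenCombine(users, (t, u) -> t + "+" + u);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        CompletableFuture<String> page = handle(io);
        // Only this demo blocks, so it can print before exiting; a server would
        // write the response from a callback such as thenAccept instead.
        System.out.println(page.join()); // prints "tweets-data+users-data"
        io.shutdown();
    }
}
```

HTTP itself stays request/response, as post 20 points out: the client still waits for one reply on its connection. What changes is that no server thread is tied up while that reply is being assembled.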