How is Chemical Engineering at U of Virginia? Asking about a potential advisor - 未名空间 (MITBBS) Historical Archive

How is Chemical Engineering at U of Virginia? Asking about a potential advisor

ChemEng - Chemical Engineering

c*n 2009-06-02 07:06

Post 1

I'm new here. I posted the following question on the mailing list but
haven't gotten an answer. Maybe someone here could help? Thanks!
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201104.mbox/browser

c*i 2009-06-02 07:06

Post 2

Matthew Neurock, a professor who does computational work. What is he like as a person? Is his research direction good?

g*y 2009-06-02 07:06

Post 3

What's your question? The link points to the April archive.

[Quoting c******n's post:]

: I'm new here. I posted the following question on the mailing list but
: haven't gotten an answer. Maybe someone here could help? Thanks!
: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201104.mbox/browser

E*d 2009-06-02 07:06

Post 4

I heard he's completely hands-off with his students?

[Quoting c*******i's post:]

: Matthew Neurock, a professor who does computational work. What is he like as a person? Is his research direction good?

c*n 2009-06-02 07:06

Post 5

Thanks, I didn't realize the link would show something different... Here it is:
########################################################
I'm new to Lucene and search engines, and have been struggling with these
questions recently.
I'd really appreciate it if you could shed some light on this.
Let's say I query for
dog greyhound
Note that I did not quote the terms, i.e., this is not a phrase search.
What happens under the hood?
Which term does Lucene use to look up the inverted index?
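A toy illustration of what "looking up the inverted index" means, in Python. This is a minimal sketch, not Lucene's actual index format: each term maps to the sorted IDs of the documents that contain it, and in this toy model the unquoted query `dog greyhound` is split into two terms that are looked up separately. The five-document corpus and the `build_inverted_index` helper are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical five-document corpus.
docs = {
    0: "the dog barked at the greyhound",
    1: "my dog likes to sleep",
    2: "a greyhound is a dog and every greyhound loves to run",
    3: "the cat ignored the dog",
    4: "greyhound racing is controversial",
}
index = build_inverted_index(docs)

# The unquoted query is split into terms, and each term is looked up on its own.
for term in "dog greyhound".split():
    print(term, index.get(term, []))
```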
I read somewhere that Lucene uses the term with the higher IDF (i.e., the
more distinguishing term), which in this case is "greyhound". But what about
"dog"? Does Lucene traverse the doclist of "dog" at all? If I provide multiple
terms in my query, how does Lucene generally decide how many doclists to
traverse?
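The "higher IDF = more distinguishing term" intuition can be checked with the textbook formula idf(t) = log10(N / df(t)) from "Introduction to Information Retrieval". This is a sketch over hypothetical posting lists. Lucene's similarity classes use their own IDF variants, so its absolute numbers will differ, and the sketch says nothing about which term Lucene actually looks up first.

```python
import math


def idf(term, index, num_docs):
    """Textbook inverse document frequency, log10(N / df); 0.0 for unseen terms."""
    df = len(index.get(term, []))
    return math.log10(num_docs / df) if df else 0.0


# Hypothetical posting lists from a five-document corpus.
index = {"dog": [0, 1, 2, 3], "greyhound": [0, 2, 4]}
N = 5

for term in ("dog", "greyhound"):
    print(term, round(idf(term, index, N), 3))  # dog 0.097, greyhound 0.222

# The rarer term gets the higher IDF, i.e. it is the more distinguishing one.
rarest = max(index, key=lambda t: idf(t, index, N))
print("higher-IDF term:", rarest)  # greyhound
```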
I also read that Lucene uses a combination of the "binary model" and the VSM.
In the case above, it seems that it fetches the full doclist of "dog" and that
of "greyhound" (the binary-model part), finds the documents common to both
doclists, and then orders them by score (the VSM part). Is it true that the
FULL doclists are fetched first, or is some pruning done on the individual
doclists? The talk at http://www.slideshare.net/abial/eurocon2010 discusses
pruning and tiered search, but is that Lucene's default behavior?
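The two-phase model described in this paragraph (Boolean matching first, then ranking) can be sketched as below. This is a toy model of the description in the question, not Lucene's implementation: it materializes the full match set exactly as the question assumes, and it ranks with the simple tf-idf overlap score (sum of tf x idf over the query terms) from "Introduction to Information Retrieval", ch. 6, rather than a full cosine-normalized vector space model. The `search` function and the corpus are hypothetical.

```python
import math
from collections import Counter


def search(docs, query, require_all=True):
    """Toy two-phase search: Boolean matching, then tf-idf ranking.

    Phase 1 ("binary model"): keep docs containing all query terms (AND)
    or any of them (OR). Phase 2 ("VSM" stand-in): score each matching doc
    by summing tf * idf over the query terms and sort by descending score.
    """
    tokenized = {d: Counter(text.lower().split()) for d, text in docs.items()}
    terms = query.lower().split()
    n = len(docs)
    df = {t: sum(1 for tf in tokenized.values() if t in tf) for t in terms}
    idf = {t: math.log10(n / df[t]) if df[t] else 0.0 for t in terms}

    match = all if require_all else any
    hits = [d for d, tf in tokenized.items() if match(t in tf for t in terms)]
    # Counter returns 0 for a missing term, so absent terms add nothing.
    scores = {d: sum(tokenized[d][t] * idf[t] for t in terms) for d in hits}
    # Ties are broken by ascending doc ID to keep the output deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical five-document corpus.
docs = {
    0: "the dog barked at the greyhound",
    1: "my dog likes to sleep",
    2: "a greyhound is a dog and every greyhound loves to run",
    3: "the cat ignored the dog",
    4: "greyhound racing is controversial",
}
print([d for d, _ in search(docs, "dog greyhound")])  # AND: [2, 0]
print([d for d, _ in search(docs, "dog greyhound", require_all=False)])  # OR: [2, 0, 4, 1, 3]
```

Doc 2 ranks first because "greyhound", the higher-IDF term, appears in it twice.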
How are the doclists sorted? (By IDF?? Sorry, I'm just beginning to sift
through a lot of documents online. I somehow got this impression but can't
reach a firm conclusion.)
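On the sorting question: in the standard inverted index described in "Introduction to Information Retrieval" (ch. 1), each posting list is kept sorted by document ID rather than by IDF, so two lists can be intersected in a single linear pass. Below is a minimal sketch of that merge with hypothetical posting lists for "dog" and "greyhound". As far as I know, Lucene also stores postings in doc-ID order, with compression and skip data layered on top, but this sketch is not its code.

```python
def intersect(p1, p2):
    """Intersect two posting lists that are sorted by ascending document ID.

    Two pointers walk the lists once, so the cost is O(len(p1) + len(p2)).
    """
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return out


# Hypothetical posting lists, sorted by doc ID.
dog = [0, 1, 2, 3]
greyhound = [0, 2, 4]
print(intersect(dog, greyhound))  # [0, 2]
```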
Also, more generally, could you recommend some good articles on how Lucene and
search engines work? I've read the "Anatomy of a Search Engine" paper (the
Google paper by Sergey Brin & Larry Page), "Introduction to Information
Retrieval" (Manning et al.), and "Lucene in Action"...
Thanks
Yang

[Quoting g**********y's post:]

: What's your question? The link points to the April archive.

S*n 2009-06-02 07:06

Post 6

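Lots of people seem to be interested in Matt these days;
I've seen several people ask about him this year :)
His research direction is pretty good, but I don't know what he's like as a person.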

[Quoting c*******i's post:]

: Matthew Neurock, a professor who does computational work. What is he like as a person? Is his research direction good?

g*y 2009-06-02 07:06

Post 7

Sorry, I just use Lucene as a search engine in our product; I didn't dive
into how it works.
Out of curiosity, I did read some documentation and code from the Lucene
project. My impression is that it is a C-style Java program, painful to read
and use. Maybe you can contact the developers directly for technical details.

[Quoting c******n's post:]

: Thanks, I didn't realize the link would show something different... Here it is:
: ########################################################
: I'm new to Lucene and search engines, and have been struggling with these
: questions recently.
: I'd really appreciate it if you could shed some light on this.
: Let's say I query for
: dog greyhound
: Note that I did not quote the terms, i.e., this is not a phrase search.
: What happens under the hood?
: Which term does Lucene use to look up the inverted index?

i*e 2009-06-02 07:06

Post 8

I don't know much about Lucene's internals.
With Solr, it's possible to set the default operator to either OR or AND.
I think you were mostly talking about the OR case. Optionally, when AND
gives you very few results, you could fall back to OR to enrich the
result set.

[Quoting c******n's post:]
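The AND-then-OR fallback suggested in post 8 can be sketched in plain Python over a hypothetical in-memory index. This is not Solr's API: `search_with_fallback` and the `min_results` threshold are made-up names, and with Solr you would instead issue the AND query and, if needed, a second OR query from your client code.

```python
def search_with_fallback(postings, terms, min_results=3):
    """Run an AND query; if it matches fewer than min_results docs, rerun it as OR.

    postings maps each term to a set of doc IDs (hypothetical in-memory index).
    Returns (sorted doc IDs, operator that produced them).
    """
    sets = [postings.get(t, set()) for t in terms]
    if not sets:
        return [], "AND"
    and_hits = set.intersection(*sets)
    if len(and_hits) >= min_results:
        return sorted(and_hits), "AND"
    # Too few AND matches: widen to docs containing any of the terms.
    return sorted(set.union(*sets)), "OR"


# Hypothetical in-memory index.
postings = {"dog": {0, 1, 2, 3}, "greyhound": {0, 2, 4}}
print(search_with_fallback(postings, ["dog", "greyhound"]))  # ([0, 1, 2, 3, 4], 'OR')
print(search_with_fallback(postings, ["dog", "greyhound"], min_results=2))  # ([0, 2], 'AND')
```

Keeping the threshold as a parameter makes the trade-off explicit: a higher `min_results` favors recall, a lower one keeps the stricter AND semantics more often.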

c*n 2009-06-02 07:06

Post 9

Exactly!!
I read it and it was all abstract classes. In typical Java style, most of it
would have been written as interfaces plus base implementations (baseImpl).

[Quoting g**********y's post:]

: Sorry, I just use Lucene as a search engine in our product; I didn't dive
: into how it works.
: Out of curiosity, I did read some documentation and code from the Lucene
: project. My impression is that it is a C-style Java program, painful to read
: and use. Maybe you can contact the developers directly for technical details.

c*n 2009-06-02 07:06

Post 10

BTW, have you built Solr in Eclipse?
Its directory layout is not very standard; I had to tweak it a lot manually
to make it work.

[Quoting g**********y's post:]

g*y 2009-06-02 07:06

Post 11

No, I don't use Solr at work.

[Quoting c******n's post:]

: BTW, have you built Solr in Eclipse?
: Its directory layout is not very standard; I had to tweak it a lot manually
: to make it work.