avatar
an interview problem# Programming - 葵花宝典
m*t
1
write the program for displaying the ten most frequent words in a file such
that your program should be efficient in all complexity measures.
I think hash table with counting will be a natual solution, but not sure if
it is a best fit for this problem.
avatar
m*k
2
hashtable or b+ tree/trie like dictinonary data structure etc only provide
fast search/index, so still need additional data structure on the frequnecy
order statistics. A min-heap is ok. or you can use another b+ tree to store
the frequency as key

such
if

【在 m***t 的大作中提到】
: write the program for displaying the ten most frequent words in a file such
: that your program should be efficient in all complexity measures.
: I think hash table with counting will be a natual solution, but not sure if
: it is a best fit for this problem.

avatar
m*t
3
I guess u mean max heap to store frequency, right?

frequnecy
store

【在 m******k 的大作中提到】
: hashtable or b+ tree/trie like dictinonary data structure etc only provide
: fast search/index, so still need additional data structure on the frequnecy
: order statistics. A min-heap is ok. or you can use another b+ tree to store
: the frequency as key
:
: such
: if

avatar
m*k
4
I mean a min-heap of size 10 store top 10 most frequence words on the fly.
because we need to compare other candidates with the least frequent one of
the top 10 to determine whether do a replcaement. Other operation on the min
-heap includes increase_key if anyone of the current top 10 get hit again.
of course, a max-heap also works, but it seems you need to store the entire
N elements to determine the top 10.

I guess u mean max heap to store frequency, right?
frequnecy
store

【在 m***t 的大作中提到】
: I guess u mean max heap to store frequency, right?
:
: frequnecy
: store

avatar
m*t
5
//blush. That is absolutely right.

min
entire

【在 m******k 的大作中提到】
: I mean a min-heap of size 10 store top 10 most frequence words on the fly.
: because we need to compare other candidates with the least frequent one of
: the top 10 to determine whether do a replcaement. Other operation on the min
: -heap includes increase_key if anyone of the current top 10 get hit again.
: of course, a max-heap also works, but it seems you need to store the entire
: N elements to determine the top 10.
:
: I guess u mean max heap to store frequency, right?
: frequnecy
: store

相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。