Redian新闻
>
Find top K most frequent numbers?
avatar
Find top K most frequent numbers?# JobHunting - 待字闺中
l*r
1
I think I saw this before, but can't recall it.
How to find the top K most frequent numbers from a large number set? (or
from a stream of numbers)
Thanks!
avatar
p*2
2

priority queue?

【在 l********r 的大作中提到】
: I think I saw this before, but can't recall it.
: How to find the top K most frequent numbers from a large number set? (or
: from a stream of numbers)
: Thanks!

avatar
w*x
3
selection sort?
avatar
c*a
4
priority queue+comparator( value, freq)?
avatar
c*t
5
use files as hashmap?

【在 l********r 的大作中提到】
: I think I saw this before, but can't recall it.
: How to find the top K most frequent numbers from a large number set? (or
: from a stream of numbers)
: Thanks!

avatar
c*s
6
This is what I heard.
You can first build a hash table to map number to its frequency.
Then you know range of the number and has enough space, you can use counting
sort to sort the frequency to get the top k elements.
If not, you can use merge sort to sort the frequency. But because of most of
the frequency can be very small. It is very likely that they are not in the
top K positions. So you can estimate a threshold to filter this frequency.
The frequency below this threshold in each sublist is not needed to be
considered when sorting. After merging, if you have top K numbers, then you
are done.
If not, lower the threshold, and sort the remaining frequency again until
you get K numbers.
To come up with an threshold, we can randomly sample a subset of all
frequency and do the estimate, or we can use a max-heap of the frequency.
The later approach seems to be a standard one.
avatar
c*t
7
请问ls这几位,priority queue, selection sort 和 hashmap 都需要大空间储存吧,
尤其hashmap不太可能用。
我觉得external merge sort和 用file作hashmap似乎更可行啊。

counting
of
the
considered
you

【在 c********s 的大作中提到】
: This is what I heard.
: You can first build a hash table to map number to its frequency.
: Then you know range of the number and has enough space, you can use counting
: sort to sort the frequency to get the top k elements.
: If not, you can use merge sort to sort the frequency. But because of most of
: the frequency can be very small. It is very likely that they are not in the
: top K positions. So you can estimate a threshold to filter this frequency.
: The frequency below this threshold in each sublist is not needed to be
: considered when sorting. After merging, if you have top K numbers, then you
: are done.

avatar
w*x
8

题目看错了, 求正解

【在 l********r 的大作中提到】
: I think I saw this before, but can't recall it.
: How to find the top K most frequent numbers from a large number set? (or
: from a stream of numbers)
: Thanks!

avatar
d*g
9

用file作hashmap,求解释~

【在 c********t 的大作中提到】
: 请问ls这几位,priority queue, selection sort 和 hashmap 都需要大空间储存吧,
: 尤其hashmap不太可能用。
: 我觉得external merge sort和 用file作hashmap似乎更可行啊。
:
: counting
: of
: the
: considered
: you

相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。