Redian新闻
Pick k lines from a large file uniformly at random
JobHunting - 待字闺中
t*e
1
I know the solution:
read the first k lines from the file,
then repeat the following steps:
- read one line from the file
- with probability X, keep the new line and randomly drop one of the
k previously selected lines.
- with probability (1-X), drop the new line.
until all the lines of the file have been read.
My questions are:
- What should the value of X be?
- How can one give a rigorous mathematical proof that this method yields k
lines chosen uniformly at random?
Thanks a lot for your help.
l*8
2
CareerCup 150 has a problem that asks you to pick one number at random from
a stream of numbers. This one should use the same method.
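For reference, the single-item stream problem mentioned here can be sketched as below. This is a minimal illustration, not the CareerCup solution itself; the function name `pick_one` and the `rng` parameter are hypothetical names chosen for this example. The i-th item replaces the current choice with probability 1/i, which is the k = 1 case of the method in the question.

```python
import random

def pick_one(stream, rng=random):
    """Return one item from an iterable of unknown length, chosen uniformly.

    Assumption: `rng` is the `random` module or a `random.Random` instance.
    """
    chosen = None
    for i, item in enumerate(stream, start=1):
        # randrange(i) is uniform over 0..i-1, so this branch is taken
        # with probability 1/i for the i-th item.
        if rng.randrange(i) == 0:
            chosen = item
    return chosen
```

Something like `pick_one(open("big.txt"))` would then pick one line while reading the file only once (the file name is just a placeholder).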
a*y
3
See Wikipedia: reservoir sampling.
Basically, you need a random() function that generates an integer from 1 to
i, where i is the number of lines you have seen so far. If it falls between
1 and k, you replace the selected line at that position with the new line;
otherwise, you drop the new line. In other words, X = k/i.
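The random()-based description above could look roughly like this in Python. It is a sketch under a few assumptions: the names `reservoir_sample` and `rng` are made up for this example, and `lines` can be any iterable, such as an open file object.

```python
import random

def reservoir_sample(lines, k, rng=random):
    """Keep k items from an iterable of unknown length, reading it once.

    Assumption: `rng` is the `random` module or a `random.Random` instance.
    """
    reservoir = []
    for i, line in enumerate(lines, start=1):
        if i <= k:
            # The first k lines fill the reservoir directly.
            reservoir.append(line)
        else:
            # randint(1, i) is uniform over 1..i inclusive, so the new line
            # is kept with probability k/i.
            j = rng.randint(1, i)
            if j <= k:
                # Replace slot j; each old line is evicted with chance 1/i.
                reservoir[j - 1] = line
    return reservoir
```

On the proof, the usual argument is by induction on i: suppose that after i lines each of them is in the reservoir with probability k/i. Line i+1 enters with probability k/(i+1). A line already in the reservoir is evicted only if the new line is kept and its slot is the one picked, which happens with probability (k/(i+1)) * (1/k) = 1/(i+1), so it survives with probability i/(i+1). Its overall probability becomes (k/i) * (i/(i+1)) = k/(i+1). After all n lines, every line is in the sample with probability k/n.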