Pick k lines uniformly at random from a large file # JobHunting - 待字闺中
t*e
Post #1
I know the solution:
read the first k lines from the file,
then repeat the following steps:
- read one line from the file
- with probability X, keep the new line and drop one of the k previously
selected lines, chosen uniformly at random.
- with probability (1-X), drop the new line.
until all lines of the file have been read.
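The steps above can be sketched in Python as follows. This is only a minimal sketch, not necessarily what the poster has in mind: it assumes X = k/i when reading the i-th line (counting from 1), which is the usual reservoir sampling choice, while the post itself leaves X open. The function name sample_k_lines and the rng parameter are made up for illustration.

```python
import random
from typing import Iterable, List, Optional


def sample_k_lines(lines: Iterable[str], k: int,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Keep k lines from a stream, reading each line exactly once."""
    if rng is None:
        rng = random.Random()
    reservoir: List[str] = []
    for i, line in enumerate(lines, start=1):
        if i <= k:
            # The first k lines are always kept.
            reservoir.append(line)
        elif rng.random() < k / i:
            # Assumed X = k/i: keep the new line and overwrite
            # one of the k kept lines, chosen uniformly.
            reservoir[rng.randrange(k)] = line
        # Otherwise (probability 1 - k/i) the new line is dropped.
    return reservoir
```

Since a file object iterates line by line, something like sample_k_lines(open(path), 10) should work for a large file without loading it into memory (path being whatever file you have). If the file has fewer than k lines, this sketch simply returns all of them.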
My questions are:
- What should the value of X be?
- How can one give a rigorous mathematical proof that this method yields k lines
chosen uniformly at random?
Thanks a lot for your help.
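For what it's worth, here is a sketch of the usual induction argument for both questions. It assumes X = k/i for the i-th line (i > k) and that the dropped line is chosen uniformly among the k kept ones; the notation R_i for the set of kept lines after i lines have been read is introduced here and is not from the post.

```latex
\textbf{Claim.} For every $i \ge k$ and every $k$-subset
$S \subseteq \{1,\dots,i\}$,
\[
  \Pr[R_i = S] = \frac{1}{\binom{i}{k}} .
\]

\textbf{Base case} ($i = k$): $R_k = \{1,\dots,k\}$ with probability
$1 = 1/\binom{k}{k}$.

\textbf{Inductive step} ($i > k$), assuming the claim for $i-1$ and
using $\binom{i}{k} = \binom{i-1}{k}\cdot\frac{i}{i-k}$.

Case $i \notin S$: we need $R_{i-1} = S$ and line $i$ rejected, so
\[
  \Pr[R_i = S]
  = \frac{1}{\binom{i-1}{k}}\left(1 - \frac{k}{i}\right)
  = \frac{1}{\binom{i-1}{k}}\cdot\frac{i-k}{i}
  = \frac{1}{\binom{i}{k}} .
\]

Case $i \in S$: we need $R_{i-1} = (S \setminus \{i\}) \cup \{j\}$ for
one of the $(i-1)-(k-1) = i-k$ lines $j \le i-1$ outside $S$, then
line $i$ accepted (probability $k/i$) and $j$ evicted (probability $1/k$):
\[
  \Pr[R_i = S]
  = (i-k)\cdot\frac{1}{\binom{i-1}{k}}\cdot\frac{k}{i}\cdot\frac{1}{k}
  = \frac{1}{\binom{i-1}{k}}\cdot\frac{i-k}{i}
  = \frac{1}{\binom{i}{k}} .
\]
```

Taking i = n (the total number of lines) gives that every set of k lines is equally likely, and in particular each single line ends up kept with probability k/n.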