s*s
1
I have a question I'd like to ask everyone. Thanks!
You have a 200 GB text file and a Linux box with 8GB of RAM and 4 cores.
Write a program/script that outputs a file listing the frequency of all
words in the file (i.e., a TSV file with two columns: word and count). Note
that the set of words in the file may not fit in memory.
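Because the question warns that the set of distinct words may not fit in memory, one common external-memory technique is hash partitioning. It is not the approach proposed in the reply, and what follows is only a C++17 sketch under stated assumptions. Pass 1 streams the file once and appends each word to one of `num_buckets` bucket files chosen by `std::hash`, so every occurrence of a word lands in the same bucket. Pass 2 counts each bucket separately with an `unordered_map` and appends `word<TAB>count` lines to the TSV output. The function names `partition_words` and `count_buckets`, the bucket file naming scheme, and the "whitespace-delimited" definition of a word are all hypothetical; `num_buckets` is a tuning assumption and must be large enough that each bucket's distinct words fit in 8 GB.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Pass 1: stream the input once and append each word to the bucket file
// bucket_prefix + std::to_string(hash(word) % num_buckets). Every occurrence
// of a given word lands in the same bucket.
void partition_words(const std::string& input_path, const std::string& bucket_prefix,
                     std::size_t num_buckets) {
    assert(num_buckets > 0);
    std::ifstream in(input_path);
    std::vector<std::ofstream> buckets;
    for (std::size_t i = 0; i < num_buckets; ++i)
        buckets.emplace_back(bucket_prefix + std::to_string(i));
    std::hash<std::string> hasher;
    std::string word;
    while (in >> word)  // operator>> reads one whitespace-delimited word
        buckets[hasher(word) % num_buckets] << word << '\n';
}   // the ofstream destructors flush and close every bucket here

// Pass 2: load one bucket at a time, count it in memory, and append
// "word<TAB>count" lines to the TSV output. Peak memory is about the
// distinct words of the largest bucket, not of the whole file.
void count_buckets(const std::string& bucket_prefix, std::size_t num_buckets,
                   const std::string& output_path) {
    std::ofstream out(output_path);
    for (std::size_t i = 0; i < num_buckets; ++i) {
        std::unordered_map<std::string, std::size_t> counts;
        std::ifstream bucket(bucket_prefix + std::to_string(i));
        std::string word;
        while (bucket >> word) ++counts[word];
        for (const auto& [w, n] : counts) out << w << '\t' << n << '\n';
    }
}
```

Pass 1 keeps all `num_buckets` files open at once, so the bucket count is bounded by the per-process descriptor limit (`ulimit -n`), and the bucket files together take up to roughly the size of the input on disk. Pass 2 treats each bucket independently, so the four cores could count different buckets in parallel.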
f*t
2
mlock 8 GB as a buffer;
4 threads: the 1st processes bytes 0-2 GB of the buffer, the 2nd 2-4 GB, the 3rd 4-6 GB, and so on, each producing its own unordered_map;
mmap the file into memory 8 GB at a time;
merge the unordered_maps.
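A minimal C++17 sketch of the per-thread counting and merge steps above, under these assumptions: the buffer is passed as a `std::string_view` (in the setup described, it would be the mmap'd window of the file), a word is a maximal run of non-whitespace bytes, and the names `count_range` and `parallel_count` are hypothetical. Each split point is moved forward to the next separator so that no word is cut between two threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Counts = std::unordered_map<std::string, std::size_t>;

// Assumption: a word is a maximal run of non-whitespace bytes.
static bool is_sep(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Count the words in buf[begin, end) into one thread's private map.
static void count_range(std::string_view buf, std::size_t begin, std::size_t end,
                        Counts& out) {
    assert(begin <= end && end <= buf.size());
    std::size_t i = begin;
    while (i < end) {
        while (i < end && is_sep(buf[i])) ++i;   // skip separators
        std::size_t start = i;
        while (i < end && !is_sep(buf[i])) ++i;  // scan one word
        if (i > start) ++out[std::string(buf.substr(start, i - start))];
    }
}

// Split buf into nthreads ranges, count each range on its own thread,
// then merge the per-thread maps into one.
Counts parallel_count(std::string_view buf, unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    // Move each cut forward to the next separator so no word straddles two ranges.
    std::vector<std::size_t> cuts{0};
    for (unsigned t = 1; t < nthreads; ++t) {
        std::size_t c = std::max(cuts.back(), buf.size() * t / nthreads);
        while (c < buf.size() && !is_sep(buf[c])) ++c;
        cuts.push_back(c);
    }
    cuts.push_back(buf.size());

    std::vector<Counts> partial(nthreads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t)
        workers.emplace_back(count_range, buf, cuts[t], cuts[t + 1], std::ref(partial[t]));
    for (auto& w : workers) w.join();

    // Merge step: fold every per-thread map into the first one.
    Counts merged = std::move(partial[0]);
    for (unsigned t = 1; t < nthreads; ++t)
        for (const auto& [word, n] : partial[t]) merged[word] += n;
    return merged;
}
```

Build with `-std=c++17 -pthread`. For example, `parallel_count("the cat the", 4)` yields `the` → 2 and `cat` → 1. Because every thread keeps all of its distinct words in memory, this step on its own does not cover the case the question raises where the vocabulary may not fit in RAM.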

Note

