How do you solve this FB question? # JobHunting - 待字闺中
d*n
1
Given 1 trillion messages on FB and each message has at most 10 words, how do
you build the index table and how many machines do you need on the cluster
to store the index table?
k*a
2
What kind of index table?
p*a
4
Why do you need an index table?

[d****n wrote:]
: Given 1 trillion messages on FB and each message has at most 10 words, how do
: you build the index table and how many machines do you need on the cluster
: to store the index table?

d*8
5
I am not sure whether my analysis below is correct:
We have about 2^40 messages (1 trillion ≈ 2^40). Each message has a unique ID, which is an 8-byte integer.
Assume each message has 8 words on average, and there are 2^14 unique words.
So each word appears in roughly 2^40 * 8 / 2^14 = 2^29 messages.
So in the index table, each word has roughly 2^29 corresponding records and
each record is a message ID (8B size). So the size of the index of each word
is 2^32B = 4GB.
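The table described so far is an inverted index: a map from each word to the list of IDs of the messages that contain it. A minimal in-memory sketch follows, assuming whitespace tokenization and lowercasing (the thread specifies neither); `build_index` and the sample messages are hypothetical names and data for illustration only.

```python
from array import array
from collections import defaultdict


def build_index(messages):
    """Map each word to an array of 8-byte message IDs (a posting list).

    `messages` is an iterable of (message_id, text) pairs. Whitespace
    tokenization and lowercasing are assumptions, not part of the thread.
    """
    index = defaultdict(lambda: array("Q"))  # "Q" = unsigned 8-byte integer
    # Process messages in ID order so every posting list ends up sorted.
    for message_id, text in sorted(messages):
        # Use a set so a word repeated within one message is recorded once.
        for word in set(text.lower().split()):
            index[word].append(message_id)
    return index


# Hypothetical sample data.
messages = [
    (1, "happy birthday to you"),
    (2, "see you at lunch"),
    (3, "Happy new year"),
]
index = build_index(messages)
print(list(index["happy"]))  # [1, 3]
print(list(index["you"]))    # [1, 2]
```

On a cluster the same map would be split across machines, for example by hashing the word, but that partitioning choice is outside what the thread states.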
Since there are 2^14 unique words, the total size of the index table is 2^14
* 4GB = 64TB. Suppose each machine has 2TB of storage; then we need 32
machines. If we keep a second copy to survive machine failures, we need 32 * 2 =
64 machines.
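The estimate can be checked with a few lines of arithmetic. This sketch takes 1 trillion ≈ 2^40 messages and otherwise reuses the assumptions in this post (8 words per message, 2^14 unique words spread evenly, 8-byte IDs, 2TB per machine, 2 copies). The constant names are mine, and binary units (1TB = 2^40 bytes) are assumed.

```python
import math

MESSAGES = 2**40            # 1 trillion is about 2^40 (roughly 1.0995e12)
WORDS_PER_MESSAGE = 8       # assumed average; the question caps it at 10
UNIQUE_WORDS = 2**14        # assumed vocabulary size
ID_BYTES = 8                # one message ID per index record
MACHINE_BYTES = 2 * 2**40   # assumed 2 TB of storage per machine
REPLICAS = 2                # one extra copy for fault tolerance

GB, TB = 2**30, 2**40

# Every (word, message) occurrence becomes one 8-byte record.
records_per_word = MESSAGES * WORDS_PER_MESSAGE // UNIQUE_WORDS
bytes_per_word = records_per_word * ID_BYTES
total_bytes = bytes_per_word * UNIQUE_WORDS

machines = math.ceil(total_bytes / MACHINE_BYTES)

print(records_per_word == 2**29)      # True
print(bytes_per_word // GB)           # 4  (GB per word)
print(total_bytes // TB)              # 64 (TB for the whole index)
print(machines, machines * REPLICAS)  # 32 64
```

Note that the total does not depend on the vocabulary size: it is just messages × words per message × ID size, so a larger vocabulary only makes each word's list shorter.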