I am not sure whether my analysis below is correct:
We have 2^42 messages. Each message has a unique ID, which is an 8B integer.
Assume each message has 8 words on average, and there are 2^14 unique words.
So, if words are spread evenly, each word appears in roughly 2^42 * 8 / 2^14
= 2^31 messages.
So in the index table, each word has roughly 2^31 corresponding records and
each record is a message ID (8B size). So the size of the index of each word
is 2^31 * 8B = 2^34B = 16GB.
Since there are 2^14 unique words, the total size of the index table is 2^14
* 16GB = 2^48B = 256TB. Suppose each machine's storage is 2TB, then we need
128 machines. If we add redundancy in case of system failure (one extra copy
of everything), we need 128 * 2 = 256 machines.
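
One way to sanity-check the arithmetic is to recompute it in a few lines of Python. This is only a rough sketch under the same assumptions as above (words spread evenly, 8B IDs, 2x redundancy), plus the assumption that 1TB means 2^40 bytes. The function name and parameter names are made up for illustration.

```python
def index_estimate(num_messages=2**42, words_per_message=8, vocab_size=2**14,
                   id_bytes=8, machine_bytes=2 * 2**40, replicas=2):
    """Back-of-envelope size of an inverted index (word -> list of message IDs)."""
    # Records per word, assuming words are spread evenly over the vocabulary.
    records_per_word = num_messages * words_per_message // vocab_size
    # Each record is one message ID.
    bytes_per_word = records_per_word * id_bytes
    total_bytes = bytes_per_word * vocab_size
    # Ceiling division: a partly filled machine still counts as a machine.
    machines = -(-total_bytes // machine_bytes)
    return {
        "records_per_word": records_per_word,
        "bytes_per_word": bytes_per_word,
        "total_bytes": total_bytes,
        "machines": machines,
        "machines_with_redundancy": machines * replicas,
    }


est = index_estimate()
print(est["bytes_per_word"] // 2**30, "GB per word")   # 16 GB per word
print(est["total_bytes"] // 2**40, "TB in total")      # 256 TB in total
print(est["machines"], "machines,",
      est["machines_with_redundancy"], "with redundancy")
```

Changing any of the parameters (for example a 4B ID or a different vocabulary size) shows how sensitive the machine count is to each assumption.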