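I was asked an inverted index question: there is a huge pile of docs, and you have to build an inverted index over the words that appear in them. There are so many docs that they are distributed across many machines. How would you implement this inverted index?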
g*g
Post #2
Cassandra is a perfect DB for illustration. Each row maps a word to a list of doc ids. The doc id can be a UUID or a URL, as long as it's unique. For each index row, the row key (the word) is hashed and the row is replicated, so you can have N copies in the cluster and the keys are evenly distributed. You may also use a timestamp etc. to order the entries in an index row, so you can optionally run a time range query, which is very common in this kind of design.
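[Quoting s*******m]: I was asked an inverted index question: there is a huge pile of docs, and you have to build an inverted index over the words that appear in them. There are so many docs that they are distributed across many machines. How would you implement this inverted index?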
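To make the "hash the row key, keep N copies" idea concrete, here is a minimal toy sketch in Python. It is not how Cassandra is actually implemented; `ToyRing`, its method names, the use of MD5, and the single token per node are all assumptions made up for illustration.

```python
import hashlib
from bisect import bisect_right


class ToyRing:
    """Toy hash ring: each row key (word) is stored on n_replicas nodes."""

    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = min(n_replicas, len(nodes))
        # One token per node, derived from the node name (an assumption;
        # real systems typically assign many tokens per node).
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.store = {n: {} for n in nodes}  # node -> {word: set(doc_ids)}

    @staticmethod
    def _token(key):
        # Hash the key onto the ring; the caller never sees this value.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def replicas_for(self, word):
        # Walk clockwise from the word's token and take the next n_replicas nodes.
        start = bisect_right(self.tokens, self._token(word)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.n_replicas)]

    def add(self, word, doc_id):
        # Write the posting to every replica of the word's row.
        for node in self.replicas_for(word):
            self.store[node].setdefault(word, set()).add(doc_id)

    def lookup(self, word):
        # Any replica can answer; this sketch reads from the first one.
        node = self.replicas_for(word)[0]
        return set(self.store[node].get(word, set()))
```

A caller only ever passes the word itself, e.g. `ring.add("cassandra", "doc-1")` and then `ring.lookup("cassandra")`; the hashing and replica placement stay internal.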
p*2
Post #4
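Looking up a key is very fast, and beyond that there is basically no index. But isn't an inverted index usually kept in memory? I would probably hack it together with Redis.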
[Quoting s*******m]: Thanks. I have a beginner question: does a key-value database like Cassandra have an index internally? For example, if I look up a key, will it complete quickly?
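For the in-memory variant, a rough sketch of what that could look like is below. It uses a plain Python dict of sets as a stand-in; with Redis you would presumably keep one set per word (`SADD` to add a doc id, `SINTER` to intersect several words). The class and method names are made up for illustration.

```python
from collections import defaultdict


class InMemoryIndex:
    """Word -> set of doc ids, held entirely in memory."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Naive whitespace tokenization; a real tokenizer would do more.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, *words):
        """Return the doc ids that contain every given word (AND query)."""
        if not words:
            return set()
        sets = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets)
```

A single-word lookup is one hash lookup, and a multi-word query is a set intersection over the matching posting sets.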
The app doesn't need to know. It knows the keyword, which is a unique word; it doesn't need to know the hash value. Cassandra can cache rows in memory, so you don't need memcache for access. But memcache can be convenient for other things, like caching a rich object in memory, which you don't do in NoSQL.
b*5
Post #14
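I grind problems day and night, and then in the interview I get asked how to generate a random Bejeweled board. What am I supposed to do? I think I basically solved it, but I didn't get all the code written. What am I supposed to do? Then I interviewed at a second-tier company and solved every problem. As I was leaving, the interviewer said, "We will get back to you very soon..." Two weeks went by, I emailed to ask, and they didn't bother to reply at all.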
Think of it as a JSON object, a doc. Anything that's a value and is too big to fit into the C* (Cassandra) row cache.
[Quoting h*******0]: Guru goodbug, what is a rich object? Could you give an example?
g*g
Post #21
How is this MapReduce? It's just an index. Everybody knows what an inverted index is; the question is how to implement it in a distributed system so that it can scale.
g*g
Post #23
If you are taking counts, it can be MapReduce, otherwise what are you reducing in an inverted index?
b*5
Post #25
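All I'm saying is that the original problem is this: you only have some HDFS files, and you need to build the inverted index from them. You can store the resulting inverted index in either HBase or Cassandra.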
[Quoting g*****g]: If you are taking counts, it can be MapReduce, otherwise what are you reducing in an inverted index?
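For what it's worth, the build step being argued about can be sketched as map / shuffle / reduce in a few lines of single-process Python. This is only an illustration of the data flow, not a Hadoop job: the `docs` dict stands in for the HDFS files, all function names are hypothetical, and the reduce step keeps a per-document count so that it has something to reduce, as suggested above.

```python
import re
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word occurrence."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        yield word, doc_id


def shuffle(pairs):
    """Shuffle step: group the emitted doc ids by word."""
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        grouped[word].append(doc_id)
    return grouped


def reduce_word(word, doc_ids):
    """Reduce step: collapse one word's doc ids into {doc_id: count}."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[doc_id] += 1
    return word, dict(counts)


def build_index(docs):
    """Run map, shuffle, and reduce over {doc_id: text} in one process."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_doc(doc_id, text))
    return dict(reduce_word(w, ids) for w, ids in shuffle(pairs).items())
```

Each `(word, {doc_id: count})` result is one index row, which could then be written to HBase or Cassandra keyed by the word. If counts are not needed, the reduce step degenerates to deduplicating the doc ids.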