avatar
o*0
1
Design a data structure that supports kind of full text search but in
numbers.
We are given file with lot of 10-digits numbers, for example:
1234 567 890
4124 123 123
3123 123 322
On a given number X we should return all numbers that contain X.
For example, if the number 123 was given, we should return all numbers (from
the list above) because 123 is in all of them.
If the number 41 was given we should return only the middle number - because
the number 41 is only in it.
能想到的是转成string,再search.
avatar
S*n
2
4124 123 123 与 123
4124 123 123 / 10^7 % 1000 = 412 != 123
4124 123 123 / 10^6 % 1000 = 124 != 123
4124 123 123 / 10^5 % 1000 = 241 != 123
4124 123 123 / 10^4 % 1000 = 412 != 123
4124 123 123 / 10^3 % 1000 = 123 == 123 YES
4124 123 123 / 10^2 % 1000 = 231 != 123
4124 123 123 / 10^1 % 1000 = 312 != 123
4124 123 123 / 10^0 % 1000 = 123 == 123 YES
avatar
m*s
3
用reverted index

from
because

【在 o******0 的大作中提到】
: Design a data structure that supports kind of full text search but in
: numbers.
: We are given file with lot of 10-digits numbers, for example:
: 1234 567 890
: 4124 123 123
: 3123 123 322
: On a given number X we should return all numbers that contain X.
: For example, if the number 123 was given, we should return all numbers (from
: the list above) because 123 is in all of them.
: If the number 41 was given we should return only the middle number - because

avatar
b*b
4
seems standard application of use suffix tree.

from
because

【在 o******0 的大作中提到】
: Design a data structure that supports kind of full text search but in
: numbers.
: We are given file with lot of 10-digits numbers, for example:
: 1234 567 890
: 4124 123 123
: 3123 123 322
: On a given number X we should return all numbers that contain X.
: For example, if the number 123 was given, we should return all numbers (from
: the list above) because 123 is in all of them.
: If the number 41 was given we should return only the middle number - because

avatar
c*t
5
brilliant!

【在 b******b 的大作中提到】
: seems standard application of use suffix tree.
:
: from
: because

avatar
r*g
6
牛b,面试遇到suffix tree基本上就准备直接挂,看wiki也感觉好难。

【在 b******b 的大作中提到】
: seems standard application of use suffix tree.
:
: from
: because

avatar
m*s
7
看了suffix tree, 但是仍然觉得不是最好的解决方法. 比如找出含有123的数字, 仍然
需要先查找123所在的边, 然后再找数字. 这样和一个一个数字扫描没有太大区别.

【在 b******b 的大作中提到】
: seems standard application of use suffix tree.
:
: from
: because

avatar
x*9
8
想了个土方法
对于needle为一位或两位的情况,我们用个hashmap预处理出来。
然后将一个10位数拆成8个三位数
for example:
1 234 567 890 => (123) (234) (345) ...(890)
然后建倒排索引。
接下来的每一次查找,对于needle为一位或两位的情况,直接查表来做。
对于needle.length() >= 3的情况,我们用同样的方法将needle进行分拆,拆成多个
key。
之后用这些key拉出倒排链,做一次merge操作。在筛选过后,再暴力检查一下是否真含
有这个子串。
如果数据是随机的,这种方法应该不差。
avatar
v*N
9
mark
avatar
r*g
10
这道题应该没有好办法吧,suffix tree很难写的。
avatar
h*u
11
I feel it is still invert index.
Each string has an ID.
Each sub-string maps to a set of raw string IDs.
For example,
Let 3123 123 322 have ID C.
3 -> C
31 -> C
312 -> C
...
Finally combine them, say
3 -> {A, B, C}
312 -> {B, C}
...
For each string, n^2 combinations, or words. M strings, M n^2 total words, n
as average length.
avatar
y*x
12
最直接的难道不是正则表达式
[在 ocean100 (ocean100) 的大作中提到:]
:Design a data structure that supports kind of full text search but in
:numbers.
:...........
avatar
y*x
13
最直接的难道不是正则表达式
[在 ocean100 (ocean100) 的大作中提到:]
:Design a data structure that supports kind of full text search but in
:numbers.
:...........
avatar
h*u
14
So, replace Google by "ctrl-F".
avatar
K*r
15
用String开销很大吗?

★ 发自iPhone App: ChineseWeb 11

【在 y***x 的大作中提到】
: 最直接的难道不是正则表达式
: [在 ocean100 (ocean100) 的大作中提到:]
: :Design a data structure that supports kind of full text search but in
: :numbers.
: :...........

avatar
r*g
16
n^2开销还不如直接建一个suffix tree了,也是n*m, n是字符串个数,m是字符串长度。

【在 h*****u 的大作中提到】
: I feel it is still invert index.
: Each string has an ID.
: Each sub-string maps to a set of raw string IDs.
: For example,
: Let 3123 123 322 have ID C.
: 3 -> C
: 31 -> C
: 312 -> C
: ...
: Finally combine them, say

相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。