avatar
求问一道G家Onsite题# JobHunting - 待字闺中
l*9
1
【 以下文字转载自 Military 讨论区 】
发信人: lqm1989 (Jeremy Lin), 信区: Military
标 题: 带两斤干大枣过美国海关没事吧?
发信站: BBS 未名空间站 (Wed Feb 20 10:23:55 2013, 美东)
多谢
avatar
e*y
2
Given millions of lines of strings, each of which have a length of 1000
characters without spaces. Among of these strings, only one of them is real
English consisting of valid English words. All others and just random
letter combinations that is not real English. Also, bear in mind that the
English may contains several mismatches. Your task is to identify the only
string that is English.
avatar
b*l
3
当然没事,我父母带过。
avatar
l*a
4
这是题吗?
你怎么知道所谓的那些不是英文的就不是mismatch呢
那你说ab,ac这两个词哪个是English with mismatch,哪个是random letter
combination?

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
r*t
5
没事,刚带回干红枣,木耳,干桂圆,小米。
avatar
e*y
6
这是题。这种题目,只需要大致的跟面试官交流思路怎么做。
random letter combination的话,1000个字都是这样,没有能在字典里找到的词
English with mismatch是1000个字里面大部分你都能在字典里面找到,少部分不能。

【在 l*****a 的大作中提到】
: 这是题吗?
: 你怎么知道所谓的那些不是英文的就不是mismatch呢
: 那你说ab,ac这两个词哪个是English with mismatch,哪个是random letter
: combination?
:
: real

avatar
s*6
7
查查海关规定?

【在 l*****9 的大作中提到】
: 【 以下文字转载自 Military 讨论区 】
: 发信人: lqm1989 (Jeremy Lin), 信区: Military
: 标 题: 带两斤干大枣过美国海关没事吧?
: 发信站: BBS 未名空间站 (Wed Feb 20 10:23:55 2013, 美东)
: 多谢

avatar
h*c
8
sum of the lengths is 1000.
avatar
h*c
9
Is the dictionary given?
avatar
h*c
10
Ok, approximately,
Suppose you have the dictionary in a hashed structure O(m)
So the maxim len of one word possilble is known.
start a with the first character with max length window, hash search in the
dictionary. Not found window size minus one.
Found, slide the window.
Iterate through.
Time complexity N * (1000/min_word_length) * m

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
r*7
11
难道是建一个后缀树数组?复杂度也不低啊。。。

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
p*t
12
假设random strings是uniform的 统计下每行里各字母出现的次数
大概就可以大致找到哪行是英文单词组成的了吧。。。
毕竟英语单词各字母频率是有很大bias的。。。

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
j*r
13
可以计算每一行的q-gram分布,再计算熵,熵最小的就是real English,线性时间可以
完成。
avatar
s*d
14
这个题怎么搞?

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
B*L
15
for each line, randomly sample a few segments of 10 chars and then check if
these segments contain valid English words or not. If none of them does,
then the corresponding line is not valid English. The time complexity could
be sublinear.

real

【在 e**********y 的大作中提到】
: Given millions of lines of strings, each of which have a length of 1000
: characters without spaces. Among of these strings, only one of them is real
: English consisting of valid English words. All others and just random
: letter combinations that is not real English. Also, bear in mind that the
: English may contains several mismatches. Your task is to identify the only
: string that is English.

avatar
x*9
16
+1

【在 p**t 的大作中提到】
: 假设random strings是uniform的 统计下每行里各字母出现的次数
: 大概就可以大致找到哪行是英文单词组成的了吧。。。
: 毕竟英语单词各字母频率是有很大bias的。。。
:
: real

avatar
t*5
17
mark
相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。