请教一道google的数组遍历题 - 未名空间MITBBS历史存档

国际科技财经博客移民网络热点娱乐民生时事公众号

Redian新闻

>未名空间

>JobHunting - 待字闺中

请教一道google的数组遍历题

请教一道google的数组遍历题# JobHunting - 待字闺中

k*r2014-01-05 08:01

1 楼

2. Given a list of words, find two strings S & T such that:
a. S & T have no common character
b. S.length() * T.length() is maximized
刚从板上看到这题，我能想到的就是greedy search解法, 都需要O(n^2 * l), n是单
词个数, l是单词平均字母数
这题怎么才能做得比O(n^2)更快

l*a2014-01-05 08:01

2 楼

单词按照长度排个序就是nlgn*l了吧
然后显然从最长单词开始处理。。。

【在 k*******r 的大作中提到】

: 2. Given a list of words, find two strings S & T such that:
: a. S & T have no common character
: b. S.length() * T.length() is maximized
: 刚从板上看到这题，我能想到的就是greedy search解法, 都需要O(n^2 * l), n是单
: 词个数, l是单词平均字母数
: 这题怎么才能做得比O(n^2)更快

k*r2014-01-05 08:01

3 楼

可是拍完序后greedy search，比较word pair, 不还得O(n^2)吗

l*n2014-01-05 08:01

4 楼

O(n^2)是少不了的吧？毕竟是否overlap不是偏序关系，没办法做到排序的nlogn。
可以优化的地方在于*l。用一个26bit的int来表示某个单词的字母使用，然后看是否有
共同的字符的时候用AND操作就好了。

【在 k*******r 的大作中提到】

x*62014-01-05 08:01

5 楼

先sort这个list nlogn,建一个matrix记录字符串是否有相同字符，找出所有有相同字
符的字符串mn记录在matrix里面,再从matrix的一角开始安顺序找到没有被mark的位置
得到那两个字符串worst case n^2,但是average应该还不错

d*12014-01-05 08:01

6 楼

就是先对每个word排序，如果是假设全都是小写字母的话，
然后建立trie。
总的复杂度是 O(n*l)

a*s2014-01-05 08:01

7 楼

具体怎么做？这样回答在interview的时候肯定fail啊

【在 d*****1 的大作中提到】

: 就是先对每个word排序，如果是假设全都是小写字母的话，
: 然后建立trie。
: 总的复杂度是 O(n*l)

k*r2014-01-05 08:01

8 楼

k*r2014-01-05 08:01

9 楼

可是拍完序后greedy search，比较word pair, 不还得O(n^2)吗

l*n2014-01-05 08:01

10 楼

x*62014-01-05 08:01

11 楼

d*12014-01-05 08:01

12 楼

就是先对每个word排序，如果是假设全都是小写字母的话，
然后建立trie。
总的复杂度是 O(n*l)

a*s2014-01-05 08:01

13 楼

具体怎么做？这样回答在interview的时候肯定fail啊

【在 d*****1 的大作中提到】

: 就是先对每个word排序，如果是假设全都是小写字母的话，
: 然后建立trie。
: 总的复杂度是 O(n*l)

b*p2014-01-05 08:01

14 楼

这个是python solution, worst case 是O(n^2). 比普通的Greedy Solution, 有两个
改进
1. 计算bit map of word, and AND operation to check if there is common char
2. 遍历是从最长的word开始
还有一个可以改进的地方是，利用元音字母，用word_len+vowel 作key，减少不必要
的compare
class DiffWordsMaxLenProduct:
def __init__(self):
# dict: word -> word char bit map
self.dict_word_bit = {}
# dict: word_len -> [word1, word2, word3...]
self.dict_word_len = {}
#
# compute the bit map of word char
#
def bit_word(self, word):
bit_word = 0
ord_a = ord('a')
for ch in list(word):
bit_word = bit_word | (int('1', 2) << (ord(ch) - ord_a))
return bit_word
#
# by AND bit map of words to see if they have common char
#
def has_common_char(self, word1, word2):
print word1, word2, self.dict_word_bit[word1] & self.dict_word_bit[
word2]
return self.dict_word_bit[word1] & self.dict_word_bit[word2]
def solution(self, list_words):
#
# scan all words and calculate the bitwise int of char
# : O(n*l), n - # of words, l - avg len of word
#
max_word_len = 0
for word in list_words:
word = word.lower()
word_len = len(word)
# get the max len of word
if word_len > max_word_len:
max_word_len = word_len
# skip words already processed
if word in self.dict_word_bit:
continue
# get bitwise integer of word
self.dict_word_bit[word] = self.bit_word(word)
# update word len dict
if word_len in self.dict_word_len:
self.dict_word_len[word_len].append(word)
else:
self.dict_word_len[word_len] = [word]
#
# compare all words in the order of len
# worest case: N^2
#
word_pool = []
while max_word_len > 0:
if max_word_len not in self.dict_word_len:
max_word_len -= 1
continue
for word in self.dict_word_len[max_word_len]:
for base_word in word_pool:
if not self.has_common_char(word, base_word):
return (word, base_word)
word_pool.append(word)
max_word_len -= 1
if __name__ == "__main__":
word_list = ['abcdefg', 'ghijklmnop', 'qrstug', 'avwxyz',]
s = DiffWordsMaxLenProduct()
print s.solution(word_list)

【在 k*******r 的大作中提到】

s*r2014-01-05 08:01

15 楼

得到每个word就是n^2了吧？

【在 d*****1 的大作中提到】

: 就是先对每个word排序，如果是假设全都是小写字母的话，
: 然后建立trie。
: 总的复杂度是 O(n*l)

i*n2014-01-05 08:01

16 楼

假设忽略大小写，那么可以建一个n*26的bitmap矩阵。然后遍历各列，去除overlap的
word，这需要o（n）。
剩下的word都是互不overlap的，那么可以在线性时间内找到长度最大的两个word，返
回之。
整体的复杂度还是o（n）

m*h2014-01-05 08:01

17 楼

直接去除overlap是不对的，e.g., a和b 是overlap的，但是并不意味着a和c也是
overlap的，所以不能直接去除a和b。

【在 i**********n 的大作中提到】

: 假设忽略大小写，那么可以建一个n*26的bitmap矩阵。然后遍历各列，去除overlap的
: word，这需要o（n）。
: 剩下的word都是互不overlap的，那么可以在线性时间内找到长度最大的两个word，返
: 回之。
: 整体的复杂度还是o（n）

Q*a2014-01-05 08:01

18 楼

没想到什么比O(n^2)好的方法，毕竟是partially order, 不是total order
但到不了O(n^2*l), 只是O(n^2 + nl).只需要过一遍string计算sign就可以了，另外将
数组按长度从大到小排列，提前剪枝也可以优化一些，但复杂度还是O(n^2)
const int INT_BITS = sizeof(int) * 8;
const int DATA_LEN = 256/INT_BITS;
class Int256 {

vector data;
public:
Int256(): data(DATA_LEN, 0) {
}
Int256(string s): data(DATA_LEN, 0) {
for (auto c: s) {
Set((unsigned char)c);
}
}
void Set(int l) {
data[l/INT_BITS] |= 1 << (l%INT_BITS);
}
bool IsZero() const {
for (int i = 0; i < DATA_LEN; ++i) {
if (data[i]) return false;
}
return true;
}
Int256 BitAnd(const Int256& rhs) {
Int256 results;
for (int i = 0; i < DATA_LEN; ++i) {
results.data[i] = data[i] & rhs.data[i];
}
return results;
}
bool operator==(const Int256& rhs) {
for (int i = 0; i < DATA_LEN; ++i) {
if (data[i] != rhs.data[i]) return false;
}
return true;
}
};
struct Node {
string s;
Int256 sign;
Node(string s1):s(s1), sign(s1){};
};
bool CompareNodes(const Node& lhs, const Node& rhs) {
return lhs.s.size() > rhs.s.size();
}
int NoOverlapMultply(vector ss) {
vector nodes;
for (string s: ss) {
nodes.push_back(Node(s));
}
sort(nodes.begin(), nodes.end(), CompareNodes);
int curMax = 0;
for (int i = 0; i < nodes.size(); ++i) {
int iLen = nodes[i].s.size();
Int256 iSign = nodes[i].sign;
for (int j = i + 1; j < nodes.size(); ++j) {
if (nodes[j].s.size() * iLen <= curMax) break;
Int256 andSign = iSign.BitAnd(nodes[j].sign);
if (andSign.IsZero()) {
curMax = nodes[j].s.size() * iLen;
break;
}
}
}
return curMax;
}

【在 k*******r 的大作中提到】