问道G家的题 - 未名空间MITBBS历史存档

f*j2014-07-27 07:07

1 楼

看面经看来的，求大牛开解
Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
Given a target string (internationalization), and a set of strings,
return the minimal length of abbreviation of this target string so that it
won’t conflict with abbrs of the strings in the set.
“apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
“apple”, [“plain”, “amber”, “blade”] -> ???
Problem changed to:
If given a string and an abbreviation, return if the string matches abbr.
“internationalization”, “i5a11o1” -> true
后半道是正常的string processing, 问题是前半道毫无头绪，跪求思路!

r*a2014-07-27 07:07

2 楼

我很怀疑第一题有没有多项式解法。因为似乎可以从图的最小支配集问题（一个NPC）
归约到它。

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

b*e2014-07-27 07:07

3 楼

This reduces to:
http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
Only strings of the same length as the target string are candidates of a
potential conflict. Compute the intersection of each candidates with the
target. In your example:
“apple”, [“plain”, “amber”, “blade”]
We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
not totally included in any of the intersections. In this case, {2} is good
. So the answer should be "2p2". From another angle to look at
it, it is the same as the following problem:
Given a set of subsets S, where for each s subset I of U, such that I intersects with every s be reduced to a max flow and min cost problem.

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

b*e2014-07-27 07:07

4 楼

你的reduction可能不是polynormial。

【在 r**a 的大作中提到】

: 我很怀疑第一题有没有多项式解法。因为似乎可以从图的最小支配集问题（一个NPC）
: 归约到它。

r*a2014-07-27 07:07

5 楼

实际上你想的应该和我差不多，但最后抽象出来的那个问题用费用流是解不了的，那其
实就是更一般化的最小支配集问题

good

【在 b***e 的大作中提到】

: This reduces to:
: http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
: Only strings of the same length as the target string are candidates of a
: potential conflict. Compute the intersection of each candidates with the
: target. In your example:
: “apple”, [“plain”, “amber”, “blade”]
: We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
: not totally included in any of the intersections. In this case, {2} is good
: . So the answer should be "2p2". From another angle to look at
: it, it is the same as the following problem:

f*j2014-07-27 07:07

6 楼

如果所有的position 都有confict呢？
"aabadaa"
["aabaxaa",
"aaxadaa"]
应该返回 2b1d2

good

【在 b***e 的大作中提到】

: This reduces to:
: http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
: Only strings of the same length as the target string are candidates of a
: potential conflict. Compute the intersection of each candidates with the
: target. In your example:
: “apple”, [“plain”, “amber”, “blade”]
: We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
: not totally included in any of the intersections. In this case, {2} is good
: . So the answer should be "2p2". From another angle to look at
: it, it is the same as the following problem:

m*n2014-07-27 07:07

7 楼

非科班出身，看都看不懂什么最小子集，唉~~。
用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
建一个bitset table，C++用vector >
每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
与target第column位的异同值，比如
apple, [blade] -> [[0],[0],[0],[0],[1]]
apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
,[0,0,1]]。
完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
二题1p3,2p2,3l1都是答案。
其中，头尾两个row最特殊，比中间row要abbr度要高，比如第一题a4,显然比1p3,2p2,
3l1要高。
对于复杂问题，没有一个row全0的话，就要用BFS，逐个logic_and组合，得到全0的row
。组合时
可以把全1的row全部踢掉，因为没帮助。
比如 "aabadaa"，["aabaxaa","aaxadaa"] -> [[1,1],[1,1],[1,0],[1,1],[0,1],[1,1
],[1,1]] -> 第3位[1,0] & 第5位[0,1] -> 2b1d2

w*e2014-07-27 07:07

8 楼

0]

【在 m*****n 的大作中提到】

: 非科班出身，看都看不懂什么最小子集，唉~~。
: 用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
: 建一个bitset table，C++用vector >
: 每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
: 与target第column位的异同值，比如
: apple, [blade] -> [[0],[0],[0],[0],[1]]
: apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
: ,[0,0,1]]。
: 完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
: 二题1p3,2p2,3l1都是答案。

Z*42014-07-27 07:07

9 楼

可以详细说说你怎么使用BFS嘛？
我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
个单词就是一个二进制如下）:
比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
然后我们需要找到apple的一个最短abbr X满足一定的条件。
X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
ap3相当于11000
你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in
dict）
比如你看a4在这里就不行因为10000 | 10011 == 10011 所以必须用1p3才能最短。
知道这个以后就把apple的各种缩写进行遍历，找到第一个不喝dict里面的binaries冲
突的就行。
复杂度2^(target lenth)*O(dict size)
当然可以优化，我觉得可以把dict转化成binaries的时候就可以记录下很多可以排除的
abbr了。但是具体怎么弄还没想清楚。

0]

【在 m*****n 的大作中提到】

: 非科班出身，看都看不懂什么最小子集，唉~~。
: 用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
: 建一个bitset table，C++用vector >
: 每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
: 与target第column位的异同值，比如
: apple, [blade] -> [[0],[0],[0],[0],[1]]
: apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
: ,[0,0,1]]。
: 完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
: 二题1p3,2p2,3l1都是答案。

f*j2014-07-27 07:07

10 楼

这道题当电面题太逆天了

【在 Z**********4 的大作中提到】

: 可以详细说说你怎么使用BFS嘛？
: 我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
: 个单词就是一个二进制如下）:
: 比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
: 然后我们需要找到apple的一个最短abbr X满足一定的条件。
: X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
: ap3相当于11000
: 你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in

m*n2014-07-27 07:07

11 楼

target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
编码是[[10000],[01000],[00100],[00010],[00001]]所以最小的是取任意两个row
and就可以了。但是要优先取头尾两个row，原因上面说过了。所以这题答案是ab3, 3de
, a3e都行，其他的如a1c2,等等，abbr度不够，

【在 Z**********4 的大作中提到】

: 可以详细说说你怎么使用BFS嘛？
: 我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
: 个单词就是一个二进制如下）:
: 比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
: 然后我们需要找到apple的一个最短abbr X满足一定的条件。
: X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
: ap3相当于11000
: 你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in

w*e2014-07-27 07:07

12 楼

这个方法是我看到最好的一个但是最后这个选row的过程可能没有那么简单啊这步
复杂度是多少能分析一下么

3de

【在 m*****n 的大作中提到】

: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 编码是[[10000],[01000],[00100],[00010],[00001]]所以最小的是取任意两个row
: and就可以了。但是要优先取头尾两个row，原因上面说过了。所以这题答案是ab3, 3de
: , a3e都行，其他的如a1c2,等等，abbr度不够，

m*n2014-07-27 07:07

13 楼

不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
string的长度。

【在 w******e 的大作中提到】

: 这个方法是我看到最好的一个但是最后这个选row的过程可能没有那么简单啊这步
: 复杂度是多少能分析一下么
:
: 3de

Z*42014-07-27 07:07

14 楼

你的算法是有问题的。
看这个例子
target:
apple
dict: [applt, hpple]
字典里面转化成二进制是[11110, 01111]
而此题的答案应该是a3e -- 10001

【在 m*****n 的大作中提到】

: 不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
: string的长度。

m*n2014-07-27 07:07

15 楼

你没看懂我的意思，你的例子的编码是
[[10],[11],[11],[11],[01]]，不是[11110, 01111]。
所以很快就能得到a3e的答案。

【在 Z**********4 的大作中提到】

: 你的算法是有问题的。
: 看这个例子
: target:
: apple
: dict: [applt, hpple]
: 字典里面转化成二进制是[11110, 01111]
: 而此题的答案应该是a3e -- 10001

Z*42014-07-27 07:07

16 楼

恩恩。这下看懂你的意思。
但是到底是怎么得到最短的编码还是有点不明白。
能再解释解释吗？

【在 m*****n 的大作中提到】

: 你没看懂我的意思，你的例子的编码是
: [[10],[11],[11],[11],[01]]，不是[11110, 01111]。
: 所以很快就能得到a3e的答案。

y*n2014-07-27 07:07

17 楼

用Brute Force 会不会被秒杀。
apple can be abbreviated to 5, a4, 4e, a3e, …，用每一个abbreviation 去和每
个Candiate比较。

b*e2014-07-27 07:07

18 楼

Let’s see. The problem reduces to the following, formally:
Let U be the universe and S is a set of subsets of U. Find the minimal
subset of U, namely i, such that i intersects each s in S.
Here is how this problem can be reduced to max flow&min cost:
1. Build a bii-secting graph, where the left side vertices are elements of U
, and the right side vertices are elements of S. Connect left side with
right side if there is a “belongs-to” relation. Each connection has a
capacity of 1 and cost 0.
2. Build a source that is connecting to each vertex on the left side. Each
connection has a capacity of infinity and cost 1.
3. Build a target that is connecting to each vertex on the right side. Each
connection has a capacity of 1 and cost 0.
4. Resolving the flow/cost problem from the source vertex to the target
vertex as laid out in step 1 through 3 solves the original supporting set
problem.

【在 r**a 的大作中提到】

: 实际上你想的应该和我差不多，但最后抽象出来的那个问题用费用流是解不了的，那其
: 实就是更一般化的最小支配集问题
:
: good

r*a2014-07-27 07:07

19 楼

>> 2. Build a source that is connecting to each vertex on the left side.
Each
connection has a capacity of infinity and cost 1.
如果结果里某个点cover了N(大于1)个set，费用流的cost会把它算成N，但实际上应该
是1

U
Each

【在 b***e 的大作中提到】

: Let’s see. The problem reduces to the following, formally:
: Let U be the universe and S is a set of subsets of U. Find the minimal
: subset of U, namely i, such that i intersects each s in S.
: Here is how this problem can be reduced to max flow&min cost:
: 1. Build a bii-secting graph, where the left side vertices are elements of U
: , and the right side vertices are elements of S. Connect left side with
: right side if there is a “belongs-to” relation. Each connection has a
: capacity of 1 and cost 0.
: 2. Build a source that is connecting to each vertex on the left side. Each
: connection has a capacity of infinity and cost 1.

b*e2014-07-27 07:07

20 楼

对。

【在 r**a 的大作中提到】

: >> 2. Build a source that is connecting to each vertex on the left side.
: Each
: connection has a capacity of infinity and cost 1.
: 如果结果里某个点cover了N(大于1)个set，费用流的cost会把它算成N，但实际上应该
: 是1
:
: U
: Each

l*62014-07-27 07:07

21 楼

I think this problem is more complicated because it asks for the shortest
length of such a abbre.
With 2 letters remain, ap3 is different from a1p2

U
Each

【在 b***e 的大作中提到】

: Let’s see. The problem reduces to the following, formally:
: Let U be the universe and S is a set of subsets of U. Find the minimal
: subset of U, namely i, such that i intersects each s in S.
: Here is how this problem can be reduced to max flow&min cost:
: 1. Build a bii-secting graph, where the left side vertices are elements of U
: , and the right side vertices are elements of S. Connect left side with
: right side if there is a “belongs-to” relation. Each connection has a
: capacity of 1 and cost 0.
: 2. Build a source that is connecting to each vertex on the left side. Each
: connection has a capacity of infinity and cost 1.

m*n2014-07-27 07:07

22 楼

BFS算法的确不行。
这道题确实还能优化，最快时间复杂度是 M*N, M是target string的长度，N是dict的
长度。
我觉得这个算法基本到极限了。

【在 m*****n 的大作中提到】

: 不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
: string的长度。

b*p2014-07-27 07:07

23 楼

这个行吗？
#!/usr/bin/python
class WordAbbr:
def __init__(self):
pass
def word_abbr(self, word, start, len_abbr):
if len(word) < start + len_abbr:
return False
return word[:start] + word[start+len_abbr:]
def solution(self, word, list_words):
word_len = len(word)
for i in range(word_len - 1):
len_abbr = word_len - i
for j in range(word_len + 1 - len_abbr):
word_abbr = self.word_abbr(word, j, len_abbr)
conflict = False
for ref_word in list_words:
if word_abbr == self.word_abbr(ref_word, j, len_abbr):
conflict = True
break
if not conflict:
return word[:j]+str(len_abbr)+word[j+len_abbr:]
if __name__ == "__main__":
s = WordAbbr()
print s.solution('apple', ['blad'])
print s.solution('apple', ['blade'])
print s.solution('apple', ['blade', 'plain', 'amber'])
print s.solution('abcdef', ['ablade', 'xxxxef', 'abdefi'])
print s.solution('aaaaa', ['bbbbb', 'ccccc', 'ddddd', 'eeeee'])

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

f*j2014-07-27 07:07

24 楼

看面经看来的，求大牛开解
Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
Given a target string (internationalization), and a set of strings,
return the minimal length of abbreviation of this target string so that it
won’t conflict with abbrs of the strings in the set.
“apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
“apple”, [“plain”, “amber”, “blade”] -> ???
Problem changed to:
If given a string and an abbreviation, return if the string matches abbr.
“internationalization”, “i5a11o1” -> true
后半道是正常的string processing, 问题是前半道毫无头绪，跪求思路!

r*a2014-07-27 07:07

25 楼

我很怀疑第一题有没有多项式解法。因为似乎可以从图的最小支配集问题（一个NPC）
归约到它。

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

b*e2014-07-27 07:07

26 楼

This reduces to:
http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
Only strings of the same length as the target string are candidates of a
potential conflict. Compute the intersection of each candidates with the
target. In your example:
“apple”, [“plain”, “amber”, “blade”]
We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
not totally included in any of the intersections. In this case, {2} is good
. So the answer should be "2p2". From another angle to look at
it, it is the same as the following problem:
Given a set of subsets S, where for each s subset I of U, such that I intersects with every s be reduced to a max flow and min cost problem.

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

b*e2014-07-27 07:07

27 楼

你的reduction可能不是polynormial。

【在 r**a 的大作中提到】

: 我很怀疑第一题有没有多项式解法。因为似乎可以从图的最小支配集问题（一个NPC）
: 归约到它。

r*a2014-07-27 07:07

28 楼

实际上你想的应该和我差不多，但最后抽象出来的那个问题用费用流是解不了的，那其
实就是更一般化的最小支配集问题

good

【在 b***e 的大作中提到】

: This reduces to:
: http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
: Only strings of the same length as the target string are candidates of a
: potential conflict. Compute the intersection of each candidates with the
: target. In your example:
: “apple”, [“plain”, “amber”, “blade”]
: We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
: not totally included in any of the intersections. In this case, {2} is good
: . So the answer should be "2p2". From another angle to look at
: it, it is the same as the following problem:

f*j2014-07-27 07:07

29 楼

如果所有的position 都有confict呢？
"aabadaa"
["aabaxaa",
"aaxadaa"]
应该返回 2b1d2

good

【在 b***e 的大作中提到】

: This reduces to:
: http://en.wikipedia.org/wiki/Minimum-cost_flow_problem
: Only strings of the same length as the target string are candidates of a
: potential conflict. Compute the intersection of each candidates with the
: target. In your example:
: “apple”, [“plain”, “amber”, “blade”]
: We get [{1}, {0}, {4}]. Then the task is to find the smallest set that is
: not totally included in any of the intersections. In this case, {2} is good
: . So the answer should be "2p2". From another angle to look at
: it, it is the same as the following problem:

m*n2014-07-27 07:07

30 楼

非科班出身，看都看不懂什么最小子集，唉~~。
用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
建一个bitset table，C++用vector >
每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
与target第column位的异同值，比如
apple, [blade] -> [[0],[0],[0],[0],[1]]
apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
,[0,0,1]]。
完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
二题1p3,2p2,3l1都是答案。
其中，头尾两个row最特殊，比中间row要abbr度要高，比如第一题a4,显然比1p3,2p2,
3l1要高。
对于复杂问题，没有一个row全0的话，就要用BFS，逐个logic_and组合，得到全0的row
。组合时
可以把全1的row全部踢掉，因为没帮助。
比如 "aabadaa"，["aabaxaa","aaxadaa"] -> [[1,1],[1,1],[1,0],[1,1],[0,1],[1,1
],[1,1]] -> 第3位[1,0] & 第5位[0,1] -> 2b1d2

w*e2014-07-27 07:07

31 楼

0]

【在 m*****n 的大作中提到】

: 非科班出身，看都看不懂什么最小子集，唉~~。
: 用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
: 建一个bitset table，C++用vector >
: 每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
: 与target第column位的异同值，比如
: apple, [blade] -> [[0],[0],[0],[0],[1]]
: apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
: ,[0,0,1]]。
: 完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
: 二题1p3,2p2,3l1都是答案。

Z*42014-07-27 07:07

32 楼

可以详细说说你怎么使用BFS嘛？
我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
个单词就是一个二进制如下）:
比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
然后我们需要找到apple的一个最短abbr X满足一定的条件。
X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
ap3相当于11000
你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in
dict）
比如你看a4在这里就不行因为10000 | 10011 == 10011 所以必须用1p3才能最短。
知道这个以后就把apple的各种缩写进行遍历，找到第一个不喝dict里面的binaries冲
突的就行。
复杂度2^(target lenth)*O(dict size)
当然可以优化，我觉得可以把dict转化成binaries的时候就可以记录下很多可以排除的
abbr了。但是具体怎么弄还没想清楚。

0]

【在 m*****n 的大作中提到】

: 非科班出身，看都看不懂什么最小子集，唉~~。
: 用个最土的方法，BFS来解，时间复杂度可能不是最优，抛砖引玉。
: 建一个bitset table，C++用vector >
: 每个row就是dynamic_bitset<> 存dict里面长度与target相同的string的第column位的
: 与target第column位的异同值，比如
: apple, [blade] -> [[0],[0],[0],[0],[1]]
: apple, [“plain”, “amber”, “blade”] -> [[0,1,0],[0,0,0],[0,0,0],[0,0,0]
: ,[0,0,1]]。
: 完成转换以后可以一眼看出，如何得到答案，就是每个row全0的可以完全区分。以上第
: 二题1p3,2p2,3l1都是答案。

f*j2014-07-27 07:07

33 楼

这道题当电面题太逆天了

【在 Z**********4 的大作中提到】

: 可以详细说说你怎么使用BFS嘛？
: 我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
: 个单词就是一个二进制如下）:
: 比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
: 然后我们需要找到apple的一个最短abbr X满足一定的条件。
: X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
: ap3相当于11000
: 你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in

m*n2014-07-27 07:07

34 楼

target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
编码是[[10000],[01000],[00100],[00010],[00001]]所以最小的是取任意两个row
and就可以了。但是要优先取头尾两个row，原因上面说过了。所以这题答案是ab3, 3de
, a3e都行，其他的如a1c2,等等，abbr度不够，

【在 Z**********4 的大作中提到】

: 可以详细说说你怎么使用BFS嘛？
: 我看了一下你的解法，大概理解但是还是有疑问。比如这样的例子你的解法怎么计算？
: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 我的想法跟你差不多不过是按照行进行编码（你把一个位置的放到一个数组里面，我一
: 个单词就是一个二进制如下）:
: 比如target apple [amble, plain] 这里amble就是10011 plain就是00000。
: 然后我们需要找到apple的一个最短abbr X满足一定的条件。
: X也是可以对应二进制的，比如5就相当于00000 a4相当于10000 4e相当于00001
: ap3相当于11000
: 你就会发现符合条件的X是满足X|dict[i] != dict[i] （i is any binary code in

w*e2014-07-27 07:07

35 楼

这个方法是我看到最好的一个但是最后这个选row的过程可能没有那么简单啊这步
复杂度是多少能分析一下么

3de

【在 m*****n 的大作中提到】

: target: abcde 字典：[aaaaa, bbbbb, ccccc, ddddd, eeeee]
: 编码是[[10000],[01000],[00100],[00010],[00001]]所以最小的是取任意两个row
: and就可以了。但是要优先取头尾两个row，原因上面说过了。所以这题答案是ab3, 3de
: , a3e都行，其他的如a1c2,等等，abbr度不够，

m*n2014-07-27 07:07

36 楼

不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
string的长度。

【在 w******e 的大作中提到】

: 这个方法是我看到最好的一个但是最后这个选row的过程可能没有那么简单啊这步
: 复杂度是多少能分析一下么
:
: 3de

Z*42014-07-27 07:07

37 楼

你的算法是有问题的。
看这个例子
target:
apple
dict: [applt, hpple]
字典里面转化成二进制是[11110, 01111]
而此题的答案应该是a3e -- 10001

【在 m*****n 的大作中提到】

: 不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
: string的长度。

m*n2014-07-27 07:07

38 楼

你没看懂我的意思，你的例子的编码是
[[10],[11],[11],[11],[01]]，不是[11110, 01111]。
所以很快就能得到a3e的答案。

【在 Z**********4 的大作中提到】

: 你的算法是有问题的。
: 看这个例子
: target:
: apple
: dict: [applt, hpple]
: 字典里面转化成二进制是[11110, 01111]
: 而此题的答案应该是a3e -- 10001

Z*42014-07-27 07:07

39 楼

恩恩。这下看懂你的意思。
但是到底是怎么得到最短的编码还是有点不明白。
能再解释解释吗？

【在 m*****n 的大作中提到】

: 你没看懂我的意思，你的例子的编码是
: [[10],[11],[11],[11],[01]]，不是[11110, 01111]。
: 所以很快就能得到a3e的答案。

y*n2014-07-27 07:07

40 楼

用Brute Force 会不会被秒杀。
apple can be abbreviated to 5, a4, 4e, a3e, …，用每一个abbreviation 去和每
个Candiate比较。

b*e2014-07-27 07:07

41 楼

Let’s see. The problem reduces to the following, formally:
Let U be the universe and S is a set of subsets of U. Find the minimal
subset of U, namely i, such that i intersects each s in S.
Here is how this problem can be reduced to max flow&min cost:
1. Build a bii-secting graph, where the left side vertices are elements of U
, and the right side vertices are elements of S. Connect left side with
right side if there is a “belongs-to” relation. Each connection has a
capacity of 1 and cost 0.
2. Build a source that is connecting to each vertex on the left side. Each
connection has a capacity of infinity and cost 1.
3. Build a target that is connecting to each vertex on the right side. Each
connection has a capacity of 1 and cost 0.
4. Resolving the flow/cost problem from the source vertex to the target
vertex as laid out in step 1 through 3 solves the original supporting set
problem.

【在 r**a 的大作中提到】

: 实际上你想的应该和我差不多，但最后抽象出来的那个问题用费用流是解不了的，那其
: 实就是更一般化的最小支配集问题
:
: good

r*a2014-07-27 07:07

42 楼

>> 2. Build a source that is connecting to each vertex on the left side.
Each
connection has a capacity of infinity and cost 1.
如果结果里某个点cover了N(大于1)个set，费用流的cost会把它算成N，但实际上应该
是1

U
Each

【在 b***e 的大作中提到】

: Let’s see. The problem reduces to the following, formally:
: Let U be the universe and S is a set of subsets of U. Find the minimal
: subset of U, namely i, such that i intersects each s in S.
: Here is how this problem can be reduced to max flow&min cost:
: 1. Build a bii-secting graph, where the left side vertices are elements of U
: , and the right side vertices are elements of S. Connect left side with
: right side if there is a “belongs-to” relation. Each connection has a
: capacity of 1 and cost 0.
: 2. Build a source that is connecting to each vertex on the left side. Each
: connection has a capacity of infinity and cost 1.

b*e2014-07-27 07:07

43 楼

对。

【在 r**a 的大作中提到】

: >> 2. Build a source that is connecting to each vertex on the left side.
: Each
: connection has a capacity of infinity and cost 1.
: 如果结果里某个点cover了N(大于1)个set，费用流的cost会把它算成N，但实际上应该
: 是1
:
: U
: Each

l*62014-07-27 07:07

44 楼

I think this problem is more complicated because it asks for the shortest
length of such a abbre.
With 2 letters remain, ap3 is different from a1p2

U
Each

【在 b***e 的大作中提到】

: Let’s see. The problem reduces to the following, formally:
: Let U be the universe and S is a set of subsets of U. Find the minimal
: subset of U, namely i, such that i intersects each s in S.
: Here is how this problem can be reduced to max flow&min cost:
: 1. Build a bii-secting graph, where the left side vertices are elements of U
: , and the right side vertices are elements of S. Connect left side with
: right side if there is a “belongs-to” relation. Each connection has a
: capacity of 1 and cost 0.
: 2. Build a source that is connecting to each vertex on the left side. Each
: connection has a capacity of infinity and cost 1.

m*n2014-07-27 07:07

45 楼

BFS算法的确不行。
这道题确实还能优化，最快时间复杂度是 M*N, M是target string的长度，N是dict的
长度。
我觉得这个算法基本到极限了。

【在 m*****n 的大作中提到】

: 不敢当，其实我觉得还可以优化。BFS是最省力的方法，最差时间复杂度是2^(M/2) M是
: string的长度。

b*p2014-07-27 07:07

46 楼

这个行吗？
#!/usr/bin/python
class WordAbbr:
def __init__(self):
pass
def word_abbr(self, word, start, len_abbr):
if len(word) < start + len_abbr:
return False
return word[:start] + word[start+len_abbr:]
def solution(self, word, list_words):
word_len = len(word)
for i in range(word_len - 1):
len_abbr = word_len - i
for j in range(word_len + 1 - len_abbr):
word_abbr = self.word_abbr(word, j, len_abbr)
conflict = False
for ref_word in list_words:
if word_abbr == self.word_abbr(ref_word, j, len_abbr):
conflict = True
break
if not conflict:
return word[:j]+str(len_abbr)+word[j+len_abbr:]
if __name__ == "__main__":
s = WordAbbr()
print s.solution('apple', ['blad'])
print s.solution('apple', ['blade'])
print s.solution('apple', ['blade', 'plain', 'amber'])
print s.solution('abcdef', ['ablade', 'xxxxef', 'abdefi'])
print s.solution('aaaaa', ['bbbbb', 'ccccc', 'ddddd', 'eeeee'])

【在 f***j 的大作中提到】

: 看面经看来的，求大牛开解
: Abbreviation: apple can be abbreviated to 5, a4, 4e, a3e, …
: Given a target string (internationalization), and a set of strings,
: return the minimal length of abbreviation of this target string so that it
: won’t conflict with abbrs of the strings in the set.
: “apple”, [“blade”] -> a4 (5 is conflicted with “blade”)
: “apple”, [“plain”, “amber”, “blade”] -> ???
: Problem changed to:
: If given a string and an abbreviation, return if the string matches abbr.
: “internationalization”, “i5a11o1” -> true

y*i2014-07-27 07:07

47 楼

照着Maxthon的编码解法写了一个：
#include
#include
#include
#include
using namespace std;
bool bool_vec_none(const vector & vec) {
for(auto elem : vec) {
if (elem) {
return false;
}
}
return true;
}
bool bool_vec_all(const vector & vec) {
for(auto elem : vec) {
if (!elem) {
return false;
}
}
return true;
}
string word_abbreviation(string target, vector dict)
{
if (target.empty() || target.size() == 1) {
return target;
}
dict.erase(remove_if(dict.begin(), dict.end(),
[target](string word){ return word.length() !=
target.length(); }),
dict.end());
if (dict.empty()) {
return to_string(target.size());
}
int length = target.size();
vector> encoded(length, vector(dict.size(), false));
for(int word_index = 0; word_index < dict.size(); word_index++) {
for(int i = 0; i < length; i++) {
if(dict[word_index][i] == target[i]) {
encoded[i][word_index] = true;
}
}
}
string result;
int ignore_count = 0;
for(int i = 0; i < length; i++) {
if (bool_vec_none(encoded[i])) {
if (i == 0) {
return target.substr(i, 1) + to_string(length - 1 - i);
}
return to_string(i) + target.substr(i, 1) + (i == length - 1 ? "
" : to_string(length - i - 1));
}
if (bool_vec_all(encoded[i])) {
ignore_count++;
continue;
}
if (ignore_count) {
result += to_string(ignore_count) + target.substr(i, 1);
ignore_count = 0;
}
}
return result;
}
int main()
{
cout << word_abbreviation("apple", vector({"kkkk"})) << endl;
cout << word_abbreviation("apple", vector({"kkkkk"})) << endl;
cout << word_abbreviation("apple", vector({"blade"})) << endl;
cout << word_abbreviation("apple", vector({"kkkk", "blade"})) <<
endl;
cout << word_abbreviation("apple", vector({"blade", "plain", "
amber"})) << endl;
cout << word_abbreviation("abcdef", vector({"ablade", "xxxxef",
"abdefi"})) << endl;
cout << word_abbreviation("aaaaa", vector({"bbbbb", "ccccc", "
ddddd", "eeeee"})) << endl;
}

d*o2014-07-27 07:07

48 楼

其实我反倒觉得Brute Force才是面试里的正解
第一步：先把给定单词的所有abbreviation列出来
之后，去字典里match，也就是用到楼主提到的：
Problem changed to:
If given a string and an abbreviation, return if the string matches abbr.
其中，第一步也没有想象中的那么简单，需要用DFS去求出所有可能的组合，同时
ignore掉类似1pple这样明显无效的情况。在GlassDoor上看到Google也考过一个类似的题
目（不过简单些）：http://www.careercup.com/question?id=5733696185303040

【在 y***n 的大作中提到】

: 用Brute Force 会不会被秒杀。
: apple can be abbreviated to 5, a4, 4e, a3e, …，用每一个abbreviation 去和每
: 个Candiate比较。