一道电面题，分享下，这个题应该用哪几个data structure? - 未名空间MITBBS历史存档

一道电面题，分享下，这个题应该用哪几个data structure?# JobHunting - 待字闺中

d*i2013-01-09 08:01

1 楼

given input as an array of strings.
Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
hI"};
return an array of strings. in the above case, will return "APPLe, oRange",
""THERE hI".
Here are the rules:
1. two strings are the same when words matches(case insensitive) and order
doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
2. if multiple identical strings exist, only return the one that occurs at
the last location, so "APPLe oRange" and "THERE hI" will in the result.
3. the relative order cannot be changed, so we cannot have result as "THERE
hI", "APPLe oRange".
Any idea on what data structure is the best for this question?

j*y2013-01-09 08:01

2 楼

set > ?

THERE
,

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

p*22013-01-09 08:01

3 楼

THERE
,
HashMap应该就可以了。

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

d*i2013-01-09 08:01

4 楼

key and value是什么

【在 p*****2 的大作中提到】

:
: THERE
: ,
: HashMap应该就可以了。

p*22013-01-09 08:01

5 楼

把一个string的words转换成小写，然后排序做key
value就是真正string的index

d*i2013-01-09 08:01

6 楼

将string中的word排序，然后这个排序的array当做key。
我当时也是这么说的，面试官貌似不同意。
能把这个hashtable如何声明写下吗？

【在 p*****2 的大作中提到】

: 把一个string的words转换成小写，然后排序做key
: value就是真正string的index

p*22013-01-09 08:01

7 楼

array不行，要string才可以
HashMap

【在 d******i 的大作中提到】

: 将string中的word排序，然后这个排序的array当做key。
: 我当时也是这么说的，面试官貌似不同意。
: 能把这个hashtable如何声明写下吗？

p*22013-01-09 08:01

8 楼

这道题我做了一下，放到了博客里
http://blog.sina.com.cn/s/blog_b9285de20101h24d.html

M*52013-01-09 08:01

9 楼

排序会增加不必要的复杂度，这个不需要排序这一步。。。

【在 d******i 的大作中提到】

: 将string中的word排序，然后这个排序的array当做key。
: 我当时也是这么说的，面试官貌似不同意。
: 能把这个hashtable如何声明写下吗？

p*22013-01-09 08:01

10 楼

感觉问题不大。你的想法是什么？

【在 M********5 的大作中提到】

: 排序会增加不必要的复杂度，这个不需要排序这一步。。。

M*52013-01-09 08:01

11 楼

跟你想法差不多，不过我还在想如果用这个算法的话，最后还是要根据index来做排序
输出最后的结果，因为题目说relative order不能变，有没有可能省去这一步？

【在 p*****2 的大作中提到】

:
: 感觉问题不大。你的想法是什么？

p*22013-01-09 08:01

12 楼

又想了一下。有个办法。用一个set
每次碰到重复的把前一个放到set里。这样最后走一边，不在set里的输出。我写写。

【在 M********5 的大作中提到】

: 跟你想法差不多，不过我还在想如果用这个算法的话，最后还是要根据index来做排序
: 输出最后的结果，因为题目说relative order不能变，有没有可能省去这一步？

M*52013-01-09 08:01

13 楼

除了用一个map的structure, 建立一个原来数组长度的bitset，在遍历原来的数组的时
候，如果发现有相同的string存在过，就把原来的那个bit set为0，把新的bit set为1
，然后最后从头到尾扫一遍bitset，就可以得到最后的结果。不过这个提法跟你的提法
可能会适用于不同的地方，如果原数组的重复的string很少，可能这个会更好，因为
sorting一般会nlogn的时间，这个是线性的；不过如果重复的多的话，sorting会更划
算，比如如果10000个数组最后可能只有100个unique的，显然遍历一遍也要花10000，
但是sorting比这个快。

【在 p*****2 的大作中提到】

:
: 又想了一下。有个办法。用一个set
: 每次碰到重复的把前一个放到set里。这样最后走一边，不在set里的输出。我写写。

p*22013-01-09 08:01

14 楼

我更新了一下blog，把这个算法也实现了。

【在 p*****2 的大作中提到】

:
: 又想了一下。有个办法。用一个set
: 每次碰到重复的把前一个放到set里。这样最后走一边，不在set里的输出。我写写。

p*22013-01-09 08:01

15 楼

为1
你这个跟我那个算法差不多，我是用set的。

【在 M********5 的大作中提到】

: 除了用一个map的structure, 建立一个原来数组长度的bitset，在遍历原来的数组的时
: 候，如果发现有相同的string存在过，就把原来的那个bit set为0，把新的bit set为1
: ，然后最后从头到尾扫一遍bitset，就可以得到最后的结果。不过这个提法跟你的提法
: 可能会适用于不同的地方，如果原数组的重复的string很少，可能这个会更好，因为
: sorting一般会nlogn的时间，这个是线性的；不过如果重复的多的话，sorting会更划
: 算，比如如果10000个数组最后可能只有100个unique的，显然遍历一遍也要花10000，
: 但是sorting比这个快。

p*22013-01-09 08:01

16 楼

java的code好复杂呀。

M*52013-01-09 08:01

17 楼

大概记得c++里面的set本来就是排好序的对不对？如果是的话那么用set就不需要最后
sort了，不过这是c++ std::set的功劳，哈哈

【在 p*****2 的大作中提到】

:
: java的code好复杂呀。

y*m2013-01-09 08:01

18 楼

Set sets = new HashSet();

THERE
,

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

c*t2013-01-09 08:01

19 楼

还没看你的codes,不过感觉set不好，因为最后的输出是要求顺序的。
我想的与lanti一样，用list与map结合。for each string, convert to small case
and bucket sort, use the converted string as key in map, value is the index
in arraylist, maintain/update the list by using this index.
the result is the arraylist.

【在 p*****2 的大作中提到】

:
: java的code好复杂呀。

M*52013-01-09 08:01

20 楼

如果我没有记错的话，c++里面的set应该是按顺序输出的。。。

index

【在 c********t 的大作中提到】

: 还没看你的codes,不过感觉set不好，因为最后的输出是要求顺序的。
: 我想的与lanti一样，用list与map结合。for each string, convert to small case
: and bucket sort, use the converted string as key in map, value is the index
: in arraylist, maintain/update the list by using this index.
: the result is the arraylist.

p*22013-01-09 08:01

21 楼

index
我给了两种解法。

【在 c********t 的大作中提到】

: 还没看你的codes,不过感觉set不好，因为最后的输出是要求顺序的。
: 我想的与lanti一样，用list与map结合。for each string, convert to small case
: and bucket sort, use the converted string as key in map, value is the index
: in arraylist, maintain/update the list by using this index.
: the result is the arraylist.

e*e2013-01-09 08:01

22 楼

Loop through list from end to front, use LinkedHashMap.
Map map = new LinkedHashMap();
for( int i = words.size()-1; i >= 0; i-- ) {
String sortedWord = sort( words.get( i ) );
if ( !map.containsKey( sortedWord ) )
map.put( sortedWord, word );
}
String[] r = new String[map.size()];
int i = r.size() - 1;
for( Map.Entry entry : map.entrySet() )
r[i--] = entry.getValue();
return r;

p*22013-01-09 08:01

23 楼

不错呀。LinkedHashMap我前不久还扫过一下，不过从来没用过，学习了。

【在 e****e 的大作中提到】

: Loop through list from end to front, use LinkedHashMap.
: Map map = new LinkedHashMap();
: for( int i = words.size()-1; i >= 0; i-- ) {
: String sortedWord = sort( words.get( i ) );
: if ( !map.containsKey( sortedWord ) )
: map.put( sortedWord, word );
: }
: String[] r = new String[map.size()];
: int i = r.size() - 1;
: for( Map.Entry entry : map.entrySet() )

e*e2013-01-09 08:01

24 楼

Thank er ye's compliments.

p*22013-01-09 08:01

25 楼

想了想关键是从后往前扫描这点太有用了，我一直没想到。这一点太赞了。看来思路还
是不够灵活。感觉如果对LinkedHashMap不熟，用HashSet+ArrayList也可以了。

【在 e****e 的大作中提到】

: Thank er ye's compliments.

e*e2013-01-09 08:01

26 楼

IMHO, HashSet+ArrayList+从后往前扫描 is the best solution. Time complexity
is log(n+m), n is num of words, m is num of anagrams. The space complexity
is log(m).
Set set = new HashSet();
List list = new ArrayList();
for( int i = words.size()-1; i >= 0; i-- ) {
String sortedWord = sort( words.get( i ).toLowerCase() );
if ( !set.contains( sortedWord ) ) {
set.add( sortedWord );
list.add( word );
}
}
Collections.reverse( list );
return list;

s*02013-01-09 08:01

27 楼

in java, use linkedhashmap of string string.
first string is lower case sorted words.
second string is the original.

THERE
,

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

l*b2013-01-09 08:01

28 楼

太NB了，从tail scan啊，这样根本没必要用LinkedHashMap了吧？直接HashMap +
LinkedList就可以了吧？每次看见没有key的value就直接加到LinkedList的head。

s*r2013-01-09 08:01

29 楼

这个直接这么sort是不是有问题？
比如 "Hi there" 和 “ther hir" sort成一样的了
还有最后直接用vector存结果是不是好一些，直接从尾部输出，这样就不用reverse
list了

【在 e****e 的大作中提到】

: IMHO, HashSet+ArrayList+从后往前扫描 is the best solution. Time complexity
: is log(n+m), n is num of words, m is num of anagrams. The space complexity
: is log(m).
: Set set = new HashSet();
: List list = new ArrayList();
: for( int i = words.size()-1; i >= 0; i-- ) {
: String sortedWord = sort( words.get( i ).toLowerCase() );
: if ( !set.contains( sortedWord ) ) {
: set.add( sortedWord );
: list.add( word );

p*22013-01-09 08:01

30 楼

他只是assume有个sort函数吧。

【在 s****r 的大作中提到】

: 这个直接这么sort是不是有问题？
: 比如 "Hi there" 和 “ther hir" sort成一样的了
: 还有最后直接用vector存结果是不是好一些，直接从尾部输出，这样就不用reverse
: list了

M*52013-01-09 08:01

31 楼

std::set + std::deque 就可以搞定了，如果tail scan的话，tail scan这个想法太好
了。。。果然是有更efficient的算法。。。

【在 l**b 的大作中提到】

: 太NB了，从tail scan啊，这样根本没必要用LinkedHashMap了吧？直接HashMap +
: LinkedList就可以了吧？每次看见没有key的value就直接加到LinkedList的head。

l*a2013-01-09 08:01

32 楼

what is the difference between LinkedHashMap and
LinkedList+HashMap

【在 l**b 的大作中提到】

: 太NB了，从tail scan啊，这样根本没必要用LinkedHashMap了吧？直接HashMap +
: LinkedList就可以了吧？每次看见没有key的value就直接加到LinkedList的head。

l*a2013-01-09 08:01

33 楼

这样复杂度很高，不明白这里为什么不用Set作为key

【在 p*****2 的大作中提到】

:
: 他只是assume有个sort函数吧。

l*a2013-01-09 08:01

34 楼

如果输入中还有一个"hi"需要返回吗？

THERE
,

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

l*b2013-01-09 08:01

35 楼

用HashMap + LinkedList的话可以省略最后那个copy那个步骤吧。因为LinkedList的
addFirst也是O(1)的。

【在 l*****a 的大作中提到】

: 如果输入中还有一个"hi"需要返回吗？
:
: THERE
: ,

p*22013-01-09 08:01

36 楼

Set怎么做key？

【在 l*****a 的大作中提到】

: 这样复杂度很高，不明白这里为什么不用Set作为key

l*a2013-01-09 08:01

37 楼

什么东西不能做key?
连null都可以做key ah

【在 p*****2 的大作中提到】

:
: Set怎么做key？

e*e2013-01-09 08:01

38 楼

I agree. LinkedList is good. Don't need to copy or reverse.

【在 l**b 的大作中提到】

: 用HashMap + LinkedList的话可以省略最后那个copy那个步骤吧。因为LinkedList的
: addFirst也是O(1)的。

p*22013-01-09 08:01

39 楼

牛逼。

【在 l*****a 的大作中提到】

: 什么东西不能做key?
: 连null都可以做key ah

p*22013-01-09 08:01

40 楼

arraylist reverse就可以了linkedlist不是还得copy吗？如果返回list的话。

【在 e****e 的大作中提到】

: I agree. LinkedList is good. Don't need to copy or reverse.

l*b2013-01-09 08:01

41 楼

大牛那个tail scan太NB了，太NB了。佩服得我们五体投地啊。

【在 e****e 的大作中提到】

: I agree. LinkedList is good. Don't need to copy or reverse.

p*22013-01-09 08:01

42 楼

知道大牛的背景吗？

【在 l**b 的大作中提到】

: 大牛那个tail scan太NB了，太NB了。佩服得我们五体投地啊。

l*b2013-01-09 08:01

43 楼

没有研究过，和背景什么关系？

【在 p*****2 的大作中提到】

:
: 知道大牛的背景吗？

l*a2013-01-09 08:01

44 楼

英雄莫问出处

【在 p*****2 的大作中提到】

:
: 知道大牛的背景吗？

p*22013-01-09 08:01

45 楼

感觉背景应该很牛。

【在 l**b 的大作中提到】

: 没有研究过，和背景什么关系？

e*e2013-01-09 08:01

46 楼

Yes. Or we could use char array to represent the given string, and list of
char array for words. Use Arrays.sort() to do the sorting. However the map
key has to be string type which means we have to convert char array to
string in the map loop. If the average word length is k, the time complexity
for sort is k if counting sort is used and the total time complexity is O(
nk), n is num of words.

【在 p*****2 的大作中提到】

:
: 感觉背景应该很牛。

l*b2013-01-09 08:01

47 楼

没有背景也一样佩服，佩服的是做题的能力，不是在什么公司工作，做什么。哪怕是扫
大街的也一样佩服。

【在 p*****2 的大作中提到】

:
: 感觉背景应该很牛。

e*e2013-01-09 08:01

48 楼

Not Niu at all. I've learned a lot from your 诸位大牛!

p*22013-01-09 08:01

49 楼

扫大街的算法应该不会这么牛的。

【在 l**b 的大作中提到】

: 没有背景也一样佩服，佩服的是做题的能力，不是在什么公司工作，做什么。哪怕是扫
: 大街的也一样佩服。

l*b2013-01-09 08:01

50 楼

只是大哥比方，就没可能是扫大街的。我看过大牛的几个帖子，都很NB的说。

【在 p*****2 的大作中提到】

:
: 扫大街的算法应该不会这么牛的。

d*g2013-01-09 08:01

51 楼

排序没错啊，恢复成string后确保key唯一

【在 M********5 的大作中提到】

: 排序会增加不必要的复杂度，这个不需要排序这一步。。。

d*g2013-01-09 08:01

52 楼

来一个Python版的
def test_strings(string_list):
key_list, word_dict = [], {}
for x in string_list:
word_key = ' '.join(sorted(x.lower().split()))
if not word_key in key_list:
key_list.append(word_key)
word_dict[word_key] = x
return [word_dict[key] for key in key_list]

THERE
,

【在 d******i 的大作中提到】

: given input as an array of strings.
: Such as: {"apple Orange", "ORANGE apple", "APPLe oRange", "HI There", "THERE
: hI"};
: return an array of strings. in the above case, will return "APPLe, oRange",
: ""THERE hI".
: Here are the rules:
: 1. two strings are the same when words matches(case insensitive) and order
: doesn't matter, so "apple Orange" == "APPLe oRange" == "ORANGE apple".
: 2. if multiple identical strings exist, only return the one that occurs at
: the last location, so "APPLe oRange" and "THERE hI" will in the result.

l*b2013-01-09 08:01

53 楼

Java的HashSet还真的是O(1)呢，所以如果用Set做key的话，理论上说应该更
加快。

t*a2013-01-09 08:01

54 楼

一行搞定(clojure)：
(use 'clojure.string)
(let [x ["apple Orange" "ORANGE apple" "APPLe oRange" "HI There" "THERE hI"]]
(map last (partition-by #(sort (split (upper-case %) #" ")) x)))
; ("APPLe oRange" "THERE hI")