avatar
f*4
2
有比O(n^2)更快的办法咩?
avatar
c*t
3
我觉得lolhaha的算法挺好,就是不知道怎么算复杂度。

【在 s*******t 的大作中提到】
: 一个单词的列表,要找出两个单词,它们没有相同的字母出现,而且长度乘积最大
: 原帖在这里:
: http://mitbbs.com/article_t/JobHunting/31744763.html
: 感觉没有哪个解法比较好。

avatar
i*e
4
lolhaha 的算法的复杂度是 O(N^2).
应该没有比 O(N^2) 更好的算法了,
但是有一个优化可以把算法运行速度大大提升。
根据 lolhaha 的算法,
1.create hash for each word
2.sort the dictionary by word length.
assume the dictionary become
a[0],a[1]......
find the largest k so that hash(a[0])&hash(a[k])==0,
get max=len(a[0])*len(a[k])
find max i in [2,k-1] so that hash(a[1])&hash(a[i])==0,
compare max and len(a[1])*len(a[i])
next step find max j in [3,i-1]....
Using this online word list:
http://www.sil.org/linguistics/wordlists/english/wordlist/words
(containing total 109562 valid words with characters a-z (deleting 20
invalid words with apostrophe ')),
the run time is 20 seconds using lolhaha's algorithm on my machine.
However, by just adding this condition before the inner loop:
if (a[j].length * a[j].length < max) break; // stop comparing
the run time is reduced to less than 1 second!!!
If you're interested, the answer using the above word list is:
microphotographic defenselessness 255
photomicrographic defenselessness 255
Isn't this interesting?
It is able to reduce the run time by such a large factor because the
dictionary only contains a handful of long words, compared to the majority
words which are shorter. Since the word list is sorted by descending order
in length, the max is likely to be found during the beginning of loop and
could be terminated earlier.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com

【在 c********t 的大作中提到】
: 我觉得lolhaha的算法挺好,就是不知道怎么算复杂度。
avatar
s*t
5
then I guess how to design the hash function will be key issue here.
someone suggest use bitset to calculate the hash, however in some word,
there are duplicated characters which maybe an issue.

【在 i**********e 的大作中提到】
: lolhaha 的算法的复杂度是 O(N^2).
: 应该没有比 O(N^2) 更好的算法了,
: 但是有一个优化可以把算法运行速度大大提升。
: 根据 lolhaha 的算法,
: 1.create hash for each word
: 2.sort the dictionary by word length.
: assume the dictionary become
: a[0],a[1]......
: find the largest k so that hash(a[0])&hash(a[k])==0,
: get max=len(a[0])*len(a[k])

avatar
i*e
6
Duplicated characters is not an issue here.
It is only required that characters that appeared in word A do not appear in
word B.
Therefore, assuming a word can contain only characters from 'a' to 'z' all
lowercase, then each character can be mapped to a bit in an integer using
only 26 bits.
If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
This integer will form the hash for a word.
To find out if all characters that are in word A are not in word B, just
simply do bit AND operation of their hash. Duplicate characters will not
cause an issue here.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com
avatar
s*t
7
Thanks. I understand now. Even two words may have the same hash value, it
doesn't affect the final result.

in

【在 i**********e 的大作中提到】
: Duplicated characters is not an issue here.
: It is only required that characters that appeared in word A do not appear in
: word B.
: Therefore, assuming a word can contain only characters from 'a' to 'z' all
: lowercase, then each character can be mapped to a bit in an integer using
: only 26 bits.
: If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
: This integer will form the hash for a word.
: To find out if all characters that are in word A are not in word B, just
: simply do bit AND operation of their hash. Duplicate characters will not

avatar
g*s
8
good observation.
however, how can you ensure this hash is evenly distributed?

in

【在 i**********e 的大作中提到】
: Duplicated characters is not an issue here.
: It is only required that characters that appeared in word A do not appear in
: word B.
: Therefore, assuming a word can contain only characters from 'a' to 'z' all
: lowercase, then each character can be mapped to a bit in an integer using
: only 26 bits.
: If 'a' appeared in a word, bit 1 is set. If 'z' appeared, bit 26 is set.
: This integer will form the hash for a word.
: To find out if all characters that are in word A are not in word B, just
: simply do bit AND operation of their hash. Duplicate characters will not

avatar
i*e
9
Do you mean hash table?
This 'hash' is not a hash table. It is like a signature for a word which
contains just enough information of the set of characters that appeared in
the word.
一些常见面试题的答案与总结 -
http://www.ihas1337code.com

【在 g*********s 的大作中提到】
: good observation.
: however, how can you ensure this hash is evenly distributed?
:
: in

avatar
c*t
10
太赞了,我觉得还可以优化的就是,不需要一开始就把所有单词的hashcode算出来储存
。不如用到的时候才算出来存。这样因为a[j].length*a[j].length用算了。

【在 i**********e 的大作中提到】
: lolhaha 的算法的复杂度是 O(N^2).
: 应该没有比 O(N^2) 更好的算法了,
: 但是有一个优化可以把算法运行速度大大提升。
: 根据 lolhaha 的算法,
: 1.create hash for each word
: 2.sort the dictionary by word length.
: assume the dictionary become
: a[0],a[1]......
: find the largest k so that hash(a[0])&hash(a[k])==0,
: get max=len(a[0])*len(a[k])

avatar
z*o
11
1. use an integer Mark[i] to represent the character set for word[i] in the
dict; use lower 26 bits, each represent whether a certain character appears
in the word.
(Mark[i] & Mark[j]) == 0 means word[i] and
word[j] are disjoint.
2. at the mean time of calculating the Mark[i], we update the maximum-length
word index in
Longest[(~Mark[i]) & 0x03ffffff];
this means eg.
if word[i] = "abcdefg........uvwabcdefg.....uvw", Longest[(~
Mark[i]) & 0x03ffffff] = i :
the longest word contains and only contains {a,b,c,....,u,v,w} is
word[i] with length 46.
3. we modify the definition of Longest[mark] from
"longest word contains and only contains all unmarked
characters" to
"longest word contains at most all unmarked characters".
To modify the values for Longest[mark], we update Longest[ mark] using
all the Longest[ mark' ] values, where:
mark' = turn one 1 in mark into 0;
if we update the Longest[ mark] using "number of 1s in mark" as order,
increasingly, every Longest[mark] will only be recalculated once.
4. using queue to recalculate every "updated" Longest[mark], instead of
recalculate every Longest[mark] can sometimes dramatically reduce the
calculation. But complexity keeps the same.
5. For complexity: 2^26*26+O(n*m), which is not a good bit-O representation
but more useful.
6. I wrote it just practicing to explain something in English. It sucks to
me.
相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。