问一下Repeated DNA Sequences# JobHunting - 待字闺中
a*e
1 楼
自己做的总是memory exceeded,
看了讨论,
https://oj.leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c
还是不懂为啥m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1就能达到rolling hash
的效果呢?
i只是一个char啊,这个是哪里推导出来的?多谢!
Neat idea. The additional 1 bit per letter still encode each substring in
10x3 = 30 bits, just under 4 bytes for a 32-bit integer.
Your code could be further simplified. By observing that s[i] & 7 is never 0
, each of the first nine substrings with length < 10 will have unique hash
key and will never collide with other 10-letter long sequences. Therefore
the first loop could be removed and be compacted into a single loop.
vector findRepeatedDnaSequences(string s) {
unordered_map m;
vector r;
for (int t = 0, i = 0; i < s.size(); i++)
if (m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
r.push_back(s.substr(i - 9, 10));
return r;
}
看了讨论,
https://oj.leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c
还是不懂为啥m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1就能达到rolling hash
的效果呢?
i只是一个char啊,这个是哪里推导出来的?多谢!
Neat idea. The additional 1 bit per letter still encode each substring in
10x3 = 30 bits, just under 4 bytes for a 32-bit integer.
Your code could be further simplified. By observing that s[i] & 7 is never 0
, each of the first nine substrings with length < 10 will have unique hash
key and will never collide with other 10-letter long sequences. Therefore
the first loop could be removed and be compacted into a single loop.
vector
unordered_map
vector
for (int t = 0, i = 0; i < s.size(); i++)
if (m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
r.push_back(s.substr(i - 9, 10));
return r;
}