Redian新闻
>
问一下Repeated DNA Sequences
avatar
问一下Repeated DNA Sequences# JobHunting - 待字闺中
a*e
1
自己做的总是memory exceeded,
看了讨论,
https://oj.leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c
还是不懂为啥m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1就能达到rolling hash
的效果呢?
i只是一个char啊,这个是哪里推导出来的?多谢!
Neat idea. The additional 1 bit per letter still encode each substring in
10x3 = 30 bits, just under 4 bytes for a 32-bit integer.
Your code could be further simplified. By observing that s[i] & 7 is never 0
, each of the first nine substrings with length < 10 will have unique hash
key and will never collide with other 10-letter long sequences. Therefore
the first loop could be removed and be compacted into a single loop.
vector findRepeatedDnaSequences(string s) {
unordered_map m;
vector r;
for (int t = 0, i = 0; i < s.size(); i++)
if (m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
r.push_back(s.substr(i - 9, 10));
return r;
}
avatar
s*a
2
因为他说ACTG在ASC里面后三位不一样吧
其实你就说A001 C010 T011 G100啥都没有是000 一样的
无非赶得巧省几行代码
avatar
p*e
3
你不一定非要用这种方法,一个字母用2个bit表示,就不会memory exceed了
m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1
这里的s[i] & 7 是取下一个字母N,
t << 3 & 0x3FFFFFFF 是保留当前10-letter string的后9个letter S,
| 是把 N 合并到S中,这样就更新了key
相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。