Newbie question about CNV callers # Biology
k*g
1
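I have just started working on CNVs. I know how to use CNVnator, but it does not seem very sensitive. As for mrFAST + mrCaNaVaR, the exact algorithm behind mrCaNaVaR has never been clearly documented.
Which CNV callers are commonly used for NGS data these days? Thanks.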
u*1
2
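Read-depth: CNVnator
Read-pair: BreakDancer
Split-read: Pindel
These are the three signals generally used to find CNVs from NGS data, and they are what the 1000 Genomes Project used.
CNVnator (read-depth) will gradually be phased out, because read depth is not a very reliable signal to begin with. Unless you have a very obvious large deletion, read alignment itself fluctuates a lot, which easily produces many false positives. In short, CNVnator is fairly unreliable, though it is still probably the best of the read-depth tools.
Split-read is the most accurate and is the method for the future. Of course, the real future trend should be assembly, but that demands a lot from the sequencing data: very high coverage and long reads.
mrFAST and its relatives belong to a different school (the Eichler lab), built around multiple alignment of reads. The goal is to take care of segmental duplications and improve calling specificity/sensitivity in complex regions. But the computational cost goes up a lot, so for now it remains a niche tool; if you are not particularly interested in repeats, don't bother with it.
What I do now is combine these methods: if something obvious, such as a large deletion, is supported by at least two of the signals, I believe it. That way I can at least find some obvious events, deletions at any rate, with high confidence.
In short, the biggest limitation for SV/CNV calling is that read length is still too short.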
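To make the point about read-depth fluctuation concrete, here is a minimal Python sketch of a windowed read-depth deletion call. It is not how CNVnator works; the function name, the 0.6 depth-ratio cutoff, the 3-window minimum, and the toy depths are all assumptions chosen for illustration. It shows why a sustained drop in depth is easy to trust while a dip in a single window is indistinguishable from noise.

```python
from statistics import median

def call_depth_deletions(window_depths, ratio_cutoff=0.6, min_windows=3):
    """Return (start, end) window-index ranges (end exclusive) whose depth
    stays below ratio_cutoff * median depth for at least min_windows windows.

    ratio_cutoff and min_windows are illustrative assumptions, not values
    taken from any published caller.
    """
    baseline = median(window_depths)
    calls = []
    run_start = None
    # A sentinel at baseline depth closes a run that reaches the last window.
    for i, depth in enumerate(list(window_depths) + [baseline]):
        low = depth < ratio_cutoff * baseline
        if low and run_start is None:
            run_start = i
        elif not low and run_start is not None:
            if i - run_start >= min_windows:
                calls.append((run_start, i))
            run_start = None
    return calls

# Toy per-window mean depths: a 4-window drop and an isolated 1-window dip.
depths = [30, 31, 29, 14, 15, 16, 15, 30, 12, 31, 29, 30]
print(call_depth_deletions(depths))  # [(3, 7)]
```

Only the 4-window drop is reported; the single low window at index 8 is discarded as fluctuation. Lowering `min_windows` to 1 would report it too, which is the false-positive trade-off described above.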

y*k
3
I think mrFAST and the like are still essentially read-depth; they just improve how reads with multiple alignments are counted.
One more question: how exactly do you "combine"?
u*1
4
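mrFAST/mrsFAST are alignment tools, the counterparts of BWA/Bowtie.
On top of the alignment files produced by mrFAST, the Eichler group developed a set of programs based on the various signals. The read-depth one you mention is mrCaNaVaR, the counterpart of CNVnator in the BWA world.
As for combining, I do it the dumbest way: call with each tool separately, then use bedtools to find overlaps.
That is about all I can do for now; some people go on to do local assembly on top of this.
Of course, there are also programs that call from two or three signals at once, such as Genome STRiP and DELLY, but my impression is that they all perform about the same. As long as read length does not increase, no matter what tricks the programs play, this field will not make substantial progress.
My principle is that I only need to find rare SVs, not to find all SVs optimally. For example, if a disease is caused by an obvious, rare 10 kb deletion, I am confident that combining the signals above will find it.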
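The "call separately, then intersect" idea can be sketched in a few lines of Python. This is not the poster's actual script: the caller names, the coordinates, and the 50% reciprocal-overlap threshold are assumptions for illustration. On the command line, `bedtools intersect` with `-f 0.5 -r` applies the same kind of reciprocal-overlap filter to a pair of BED files.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))

def high_confidence_calls(callsets, min_callers=2, min_ro=0.5):
    """Keep calls supported by at least min_callers call sets.

    callsets maps a caller name to a list of (chrom, start, end) tuples.
    Each event is reported once, from the alphabetically first supporting
    caller; this is a simplification that may mishandle chains of partially
    overlapping calls.
    """
    kept = []
    names = sorted(callsets)
    for name in names:
        for call in callsets[name]:
            support = {name}
            for other in names:
                if other != name and any(
                    reciprocal_overlap(call, c) >= min_ro for c in callsets[other]
                ):
                    support.add(other)
            if len(support) >= min_callers and min(support) == name:
                kept.append((call, sorted(support)))
    return kept

# Hypothetical deletion calls from three signal types.
callsets = {
    "read_depth": [("chr1", 10000, 20000), ("chr2", 5000, 6000)],
    "read_pair": [("chr1", 10200, 19800)],
    "split_read": [("chr1", 10150, 19900), ("chr3", 100, 400)],
}
for call, support in high_confidence_calls(callsets):
    print(call, support)
```

Here only the roughly 10 kb chr1 deletion survives, because all three call sets agree on it; the chr2 and chr3 calls each have a single supporting signal and are dropped.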

k*g
5
Thank you very much. I cannot type Chinese on the desktop in my office. I
apologize for the inconvenience.
I am actually interested in the repeats, and that is why I looked into mrFAST +
mrCaNaVaR. But I cannot find the algorithm behind mrCaNaVaR, though the
algorithm of mrFAST is well documented. CNVnator, on the other hand, is not
sensitive to duplications in my experience.
Regarding split-read, this is the first time I have heard that SR methods are
the most accurate. The read length of my data is 101 bp; do you think that is
too short for split-read methods?
I will also check out Genome STRiP and DELLY, which you mentioned. Thank you
very much!

u*1
6
SR methods are definitely the most accurate because they give the exact
breakpoint. But we are not lucky enough to have reads spanning the
breakpoints all the time, even for SVs in unique regions, not to mention
complex structural variants involving repeats/duplications.
So even now, I would say SV calling, and even indel calling, is still quite
messy, with lots of false positives, and the whole field lags behind SNP
calling.
If you are interested in repeats, please first define "repeats" here. Do you
mean short tandem repeats (microsatellites)? For di-, tri-, and
tetranucleotide repeats, if the copy number is not that large, i.e.
tandem-repeat polymorphism with, say, around 10 copies, GATK/samtools can call
them just like SNPs; I think split-read-based SV programs such as Pindel will
also call them. But also look at the link below:
http://erlichlab.wi.mit.edu/lobSTR/
Though I haven't tried it, I think lobSTR should achieve better
performance.
Again, that is for polymorphism. If you are looking for repeat expansions, say
a trinucleotide expanded to 1000 copies, I don't think any program right now
will give a good answer with 101 bp reads.
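The arithmetic behind that last point can be written out as a small Python sketch. It assumes that a read must cover the whole repeat tract plus some unique flanking sequence on each side to genotype it directly; the 10 bp flank and the function names are illustrative assumptions, not parameters of lobSTR or any other tool.

```python
def read_can_span(repeat_unit_bp, copies, read_len=101, min_flank=10):
    """True if one read can cover the repeat tract plus min_flank bases
    of unique sequence on each side (min_flank is an assumed value)."""
    tract_len = repeat_unit_bp * copies
    return tract_len + 2 * min_flank <= read_len

def max_spannable_copies(repeat_unit_bp, read_len=101, min_flank=10):
    """Largest copy number a single read can span under the same assumption."""
    return (read_len - 2 * min_flank) // repeat_unit_bp

print(read_can_span(3, 10))     # True: a 30 bp tract fits in a 101 bp read
print(read_can_span(3, 1000))   # False: a 3000 bp tract cannot fit
print(max_spannable_copies(3))  # 27
```

Under these assumptions a 101 bp read tops out at about 27 trinucleotide copies, so a 10-copy polymorphism is well within reach while a 1000-copy expansion is far beyond what any single read can span.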

b*r
7
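This thread is worth bookmarking.
Could the experts here give a forecast: at the current stage, which is better at calling CNVs, array CGH or Illumina NGS, and which has more potential?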
k*g
8
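Thanks a lot, I have learned a great deal. My background is in statistics, and at this stage I am indeed more interested in longer indels, because from our perspective they are simpler to model. I will study the papers you mentioned carefully. Thanks.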

o*a
9
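My impression is that array CGH can detect large SVs but cannot locate the breakpoints precisely.
As for split-read methods, detecting deletions is no problem at any length, but insertions can only be detected when they are shorter than the read length, and the only duplications they find are tandem duplications.
DELLY is a relatively new program that combines the split-read and read-pair approaches. It is also fairly easy to use.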