Thank you very much. I cannot type Chinese on the desktop in my office; I apologize for the inconvenience.

I am actually interested in the repeats, which is why I looked into MrFast+ MrCaNaVar. However, I cannot find the algorithm behind MrCaNaVar, although the algorithm of MrFast is well documented. CNVnator, on the other hand, is not sensitive to duplications in my experience.

Regarding split-read: this is the first time I have heard that SR methods are the most accurate. My reads are 101 bp long; do you think that is too short for split-read methods?

I will also check out the GenomeSTRiP and DELLY you mentioned. Thank you very much!
SR methods are definitely the most accurate because they provide the exact breakpoint; but we are not lucky enough to always have reads spanning the breakpoint, even for SVs in unique regions, not to mention complex structural variants involving repeats or duplications.

So for now I would say the SV field, and even indel calling, is still quite messy, with lots of false positives, and the whole field lags behind SNP calling.

If you are interested in repeats, please first define "repeats": do you mean short tandem repeats (microsatellites)? For di-, tri-, and tetranucleotide repeats, if the copy number is not that large (i.e., tandem repeat polymorphisms of around 10 copies), GATK/samtools can call them just as they call SNPs; split-read based SV programs such as Pindel should, I think, call them as well. But also look at the link below: http://erlichlab.wi.mit.edu/lobSTR/ I haven't tried it myself, but I think lobSTR should achieve better performance. Again, this is for polymorphisms; if you are looking for repeat expansions, say a trinucleotide expanded to 1000 copies, I don't think any program right now will give a good answer with only 101 bp reads.
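To put rough numbers on the read-length point, here is a minimal back-of-the-envelope sketch in Python. It only does the arithmetic of whether a single read can cover a whole repeat tract plus some unique sequence on both sides. The function names and the 10 bp minimum flank are assumptions made for illustration; they are not taken from lobSTR, Pindel, or any other caller.

```python
def max_spannable_copies(read_len: int, unit_len: int, min_flank: int = 10) -> int:
    """Largest number of whole repeat units one read can span while
    keeping min_flank non-repeat bases on each side (assumed value)."""
    usable = read_len - 2 * min_flank  # bases left for the repeat tract itself
    return max(usable // unit_len, 0)


def is_spannable(read_len: int, unit_len: int, copies: int, min_flank: int = 10) -> bool:
    """True if a tract of `copies` units fits inside one read with both flanks."""
    return copies <= max_spannable_copies(read_len, unit_len, min_flank)


if __name__ == "__main__":
    read_len = 101  # read length mentioned in this thread
    # (unit length in bp, copy number): polymorphism-sized tracts vs. an expansion
    cases = [(2, 10), (3, 10), (4, 10), (3, 1000)]
    for unit_len, copies in cases:
        tract = unit_len * copies
        verdict = "fits in one read" if is_spannable(read_len, unit_len, copies) else "too long for one read"
        print(f"{copies} copies of a {unit_len}-mer = {tract} bp: {verdict}")
```

With these assumptions, a tract of about 10 copies of a di-, tri-, or tetranucleotide is at most 40 bp and fits inside a 101 bp read, while 1000 trinucleotide copies is 3000 bp, so no single read can contain both breakpoints.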