How do I align reads that contain indels? # Biology
j*g
1
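A small fraction (<0.1%) of my sequencing reads are special: they are 300 bp
reads carrying gRNA-induced indels [the rest are normal RNA-seq reads]. How
should I align them to the reference genome, and with what method?
Thanks!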
g*a
2
A tool called CRISPResso may help: http://crispresso.rocks/
But <0.1% is already within the range of sequencing error. How do you know
the indels were caused by the gRNA?
j*g
3
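Thanks! I'll take a look.
No, what I mean is this: one sample was sequenced by RNA-seq, and out of
0.5M reads about 500 (or even fewer) carry indels. The other reads map to
other places in the genome, have no indels, and match essentially 100%. I'm
not sure whether I've made that clear.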
g*a
4
If I'm not mistaken, you used CRISPR to introduce indels into a particular
target gene, then ran RNA-seq to measure mRNA expression genome-wide,
including the target itself. You are worried about the indels at the target
because they will probably cause alignment problems in the RNA-seq analysis.
Am I right?
j*i
5
This is the method used in my paper; I had a bioinformatician do the
analysis for me.
The indel frequencies were analyzed using CRISPResso (Pinello, 2015) using
the CRISPRessoPooled amplicons mode, while ignoring substitutions.
Alternatively, and for further validation, the HiSeq reads were trimmed with
Sickle (Joshi and Fass, 2011), merged with FLASH (Magoč and Salzberg,
2011), and mapped to the amplicons using bowtie2 (Langmead and Salzberg,
2012). Reads with an average Phred score below 23 were discarded. The reads
were then aligned using EMBOSS water (Rice, 2000) and analyzed for indels.
The maximum-likelihood estimate of the true-indel fraction was calculated as
previously described (Hsu et al., 2013).
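The quality filter in the methods above (discard reads whose average Phred
score is below 23) can be sketched in a few lines of Python. This is a minimal
illustration, not the code used in the paper: it assumes plain four-line FASTQ
records with Phred+33 quality encoding, and the function names `read_fastq`,
`mean_phred`, and `filter_by_mean_phred` are made up for this example.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or an empty line) stops the reader
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (assumes Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_by_mean_phred(records: Iterable[Record],
                         min_mean: float = 23.0) -> Iterator[Record]:
    """Keep reads whose average Phred score is at least `min_mean`."""
    for header, seq, qual in records:
        if mean_phred(qual) >= min_mean:
            yield header, seq, qual


# Toy input: r1 is all 'I' (Q40), r2 is all '#' (Q2).
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
kept = [header for header, _, _ in filter_by_mean_phred(read_fastq(demo))]
print(kept)  # ['@r1']
```

Averaging per-base Phred scores is the simplest reading of "average Phred
score"; a real pipeline might instead average error probabilities, which gives
a stricter filter.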
C*5
6
jessecai
You've given away your identity.
ELIFE
g*a
7
I guess you addressed a different bioinformatics need. You used targeted
sequencing, whereas he used RNA-seq. CRISPResso was designed for your
application, but I'm not sure whether it works for RNA-seq as well. My gut
feeling is that alignment in RNA-seq is more challenging.
BTW, congrats on your paper.

[In reply to j*i, post 5]

j*i
8
Sorry, I didn't notice it was RNA ;)

j*g
9
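Exactly right! The indels are in the 3' UTR of the gene, so they don't
affect gene expression.
How should I do the alignment? Is there an established, published method?
Thanks!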

[In reply to g*a, post 4]

j*g
10
Thanks. But mine is not targeted sequencing, so I'm not sure whether the
method from your paper will work; it may need some modification.

[In reply to j*i, post 5]

g*a
11
It's an interesting question. To the best of my knowledge, no RNA-seq
alignment tool to date was designed to tolerate CRISPR-mediated indels. I'm
not an RNA-seq expert, so it's possible that some are on the way. You can
also email Luca Pinello, the author of CRISPResso. He might know more, and
he's a really nice guy.
Here is my attempt at a solution. There are two situations, and I don't know
which one is your case.
a) The sample for RNA-seq was derived from a single mutant clone. It would
be easy to use targeted sequencing to determine the indel patterns (e.g. a
2 nt insertion, a 5 nt deletion). Normally there are two allele-specific
patterns, and there can be more if the gene lies in a copy-number-amplified
region. Once you know the EXACT indel patterns, you can modify the reference
genome and build a new index for alignment with TopHat. This is the most
precise way I can think of.
b) The sample was derived from a population of mutant cells, where it's
difficult to enumerate the indel patterns because the indels introduced by
the same sgRNA can be stochastically distributed. Note that CRISPR-induced
indels are mostly shorter than 20 nt, so with paired-end sequencing an indel
will rarely affect both ends. You might therefore align one end (TopHat
supports single-end alignment) to identify the read pairs that come from the
target transcript. If one end is mappable and the other is not, the second
end is likely to carry an indel. You can then BLAST those second-end reads
against the transcript of the target gene to determine the indels; BLAST
performs better at detecting relatively large indels. This way you will
rescue the reads that overlap the mutations.
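For situation (a), patching the reference with a known indel amounts to simple
string surgery. The Python sketch below is only an illustration under stated
assumptions: positions are 0-based offsets into a single sequence held in
memory, and `apply_deletion` and `apply_insertion` are hypothetical helper
names, not part of TopHat or any other tool.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at 0-based offset `pos`."""
    if pos < 0 or length < 0 or pos + length > len(ref):
        raise ValueError("deletion falls outside the reference")
    return ref[:pos] + ref[pos + length:]


def apply_insertion(ref: str, pos: int, inserted: str) -> str:
    """Insert `inserted` so that its first base lands at 0-based offset `pos`."""
    if not 0 <= pos <= len(ref):
        raise ValueError("insertion point falls outside the reference")
    return ref[:pos] + inserted + ref[pos:]


# Toy example: one patched sequence per allele-specific pattern.
wild_type = "ACGTACGTAC"
allele_1 = apply_deletion(wild_type, 2, 5)      # 5 nt deletion  -> "ACTAC"
allele_2 = apply_insertion(wild_type, 4, "GG")  # 2 nt insertion -> "ACGTGGACGTAC"
```

Each patched sequence would be written to FASTA before building the aligner
index. Coordinates downstream of the edit shift by the indel length, so any
annotation used with the patched reference would need the same shift.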
One thing to be cautious about: mutations in the 3'UTR can alter gene
expression by perturbing miRNA regulation or even chromatin structure. It
would be better to account for this potential confounding factor during data
interpretation.
Hope this helps.
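The rescue idea in situation (b) can be sketched as follows. This is a rough
Python illustration, not a tested pipeline: it assumes each end was aligned
separately in single-end mode, that both alignments are available as SAM text,
that unmapped reads either carry flag bit 0x4 or are simply absent, and that
mates share a read name apart from an optional /1 or /2 suffix. The function
names and the region tuple are invented for this example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

UNMAPPED = 0x4  # SAM flag bit: this read did not align

Region = Tuple[str, int, int]  # (reference name, 1-based start, 1-based end)


def strip_mate_suffix(name: str) -> str:
    """Drop a trailing /1 or /2 so both mates share one name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mapped_names(sam_lines: Iterable[str],
                 region: Optional[Region] = None) -> Set[str]:
    """Names of aligned reads; with `region`, only those starting inside it."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & UNMAPPED:
            continue
        if region is not None:
            chrom, start, end = region
            if rname != chrom or not start <= pos <= end:
                continue
        names.add(strip_mate_suffix(qname))
    return names


def rescue_candidates(end1_sam: Iterable[str], end2_sam: Iterable[str],
                      region: Region) -> Dict[str, Set[str]]:
    """Pairs with one end on target and the other end aligned nowhere.

    The key names the end that failed to align and may carry the indel.
    """
    end1, end2 = list(end1_sam), list(end2_sam)
    return {
        "end2": mapped_names(end1, region) - mapped_names(end2),
        "end1": mapped_names(end2, region) - mapped_names(end1),
    }
```

The names returned under "end2" are pairs whose first end maps inside the
target region while the second end maps nowhere; those second-end sequences
would be the ones to BLAST against the target transcript, and vice versa for
"end1".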

j*g
12
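Thank you very much! This is exactly the problem I've been thinking about.
My case is the second situation you described. Do you think paired-end
sequencing is better than single-end? I had planned to read 300 nt,
single-end. From what you say, would 2×150 be better after all?
I've sent you a private message.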

g*a
13
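I saw your message. It's a very interesting idea, and different from what I
had understood. Sorry, I registered this ID on the spur of the moment just to
answer your question after I happened to see it. There seems to be a
three-day trial period, so I can't send private messages yet. If you don't
mind, I'll reply in three days, or give me a personal email address and I'll
email you.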


j*g
14
Check your inbox.
