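Thanks, I was actually just waiting for you to answer :)
I looked it up: this is the format Illumina has used since Casava 1.8, with the index at the end of the ID line:

With Casava 1.8 the format of the '@' line has changed:
@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
EAS139    the unique instrument name
136       the run id
FC706VJ   the flowcell id
2         flowcell lane
2104      tile number within the flowcell lane
15343     'x'-coordinate of the cluster within the tile
197393    'y'-coordinate of the cluster within the tile
1         the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y         Y if the read is filtered, N otherwise
18        0 when none of the control bits are on, otherwise it is an even number
ATCACG    index sequence

I checked, and this is generated automatically when demultiplexing with the i7/i5 indexes. What I want to handle is inline barcode sequences, which is a different thing.
But this raised another question for me: is the information in the FASTQ ID actually useful? I don't think I have ever paid attention to read IDs. The only part that might matter is the /1 and /2 for paired-end reads; some older code seems to have relied on it to match up the two reads of a pair. Nowadays the two reads are simply stored in separate files.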
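For anyone who does want to pull the fields out of such an ID line, here is a minimal Python sketch based only on the field list quoted above. The names `CasavaHeader` and `parse_casava_header` are made up for illustration, and the code assumes exactly seven colon-separated fields before the space and four after it; headers with extra fields (for example a UMI) would need adjusting.

```python
from typing import NamedTuple


class CasavaHeader(NamedTuple):
    """Fields of a Casava 1.8+ '@' line (hypothetical container for illustration)."""
    instrument: str
    run_id: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int            # member of a pair: 1 or 2
    is_filtered: bool    # True when the flag is 'Y'
    control_bits: int
    index: str


def parse_casava_header(line: str) -> CasavaHeader:
    """Split a Casava 1.8-style header into its documented fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # The read name and the comment are separated by the first space.
    name, _, comment = line[1:].strip().partition(" ")
    name_fields = name.split(":")
    comment_fields = comment.split(":")
    if len(name_fields) != 7 or len(comment_fields) != 4:
        raise ValueError("not a 7+4 field Casava 1.8 header: " + line)
    instrument, run_id, flowcell, lane, tile, x, y = name_fields
    read, filtered, control, index = comment_fields
    return CasavaHeader(
        instrument, int(run_id), flowcell, int(lane), int(tile),
        int(x), int(y), int(read), filtered == "Y", int(control), index,
    )


header = parse_casava_header(
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
)
print(header.index, header.lane, header.is_filtered)
```

Note that the index lives in the comment part after the space, so any tool that keeps only the read name will lose it.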
[Quoting n******7]
n*0
#12
Like my Korean neighbor. There are quite a few pretty Korean women in our neighborhood; surely they weren't all made this way?
n*7
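#13
Thanks. As I recall, BAM does not keep the full FASTQ ID line (the read name is stored, but the comment after the space is usually dropped), so I'll just do whatever is convenient with it.
The Broad does like BAM. The tool I mentioned earlier that stores demultiplexed reads as BAM was written by someone who came out of the Broad. I haven't read the source yet, but it feels like it is built on Picard. I asked whether it could take fastq.gz, and he said FASTQ is only a temporary format...
For just storing sequences I still prefer fastq.gz: simple and clear, compatible with every read-processing tool, and at most you pipe it through gzip. Unaligned BAM should have a similar compression ratio, but most third-party downstream tools don't support it. My guess is the Broad likes to build its whole toolchain in-house.
The HiSeq 4000 error rate you mentioned is frightening. Illumina will surely deny it, or else fix it quickly.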
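To illustrate the "simple, at most pipe it through gzip" point, here is a small Python sketch that streams records from a fastq.gz file using only the standard library. `read_fastq` and `count_reads` are illustrative names, not from any tool mentioned in the thread, and the parser assumes strict four-line records with no wrapped sequence lines and no blank lines.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (id_line, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        # Drop the leading '@' but keep the full ID line, comment included.
        yield header[1:].rstrip("\n"), seq, qual


def count_reads(path: str) -> int:
    """Count records in a gzip-compressed FASTQ file (path is caller-supplied)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in read_fastq(handle))


# In-memory example so the sketch runs without any file on disk.
example = io.StringIO("@r1 1:N:0:ATCACG\nACGT\n+\nIIII\n")
records = list(read_fastq(example))
print(records)
```

The same `read_fastq` generator works on an uncompressed file opened with `open(path)`, which is part of why the format is convenient for ad hoc processing.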
http://biorxiv.org/content/early/2017/04/09/125724
HiSeq 4000 problems
```
We discovered that up to 5-10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this "spreading-of-signals" arises from low levels of free index primers present in the pool. These index primers can prime pooled library fragments at random via complementary 3′ ends, and get extended by DNA polymerase, creating a new library molecule with a new index before binding to the patterned flow cell to generate a cluster for sequencing. This causes the resulting read from that cluster to be assigned to a different sample, causing the spread of signals within multiplexed samples.
```
Damn, if this is true, it's a big deal.
Going by the abstract, it is not just the HiSeq 4000; the HiSeq 3000 and X Ten have the same problem:

In 2015, a new chemistry of cluster generation was introduced in the newer Illumina machines (HiSeq 3000/4000/X Ten) called exclusion amplification (ExAmp), which was a fundamental shift from the earlier method of random cluster generation by bridge amplification on a non-patterned flow cell.

The latest NovaSeq may well have this problem too.
If this was used to sequence tumor samples, the results are completely ruined.
---
I read the main text. It talks about the HiSeq 4000 throughout because that is the only instrument they tested:

Since the HiSeq 3000 and HiSeq X Ten share the same chemistry as the HiSeq 4000, it is possible that such index switching may also occur at a similar rate using these sequencers, although we have not tested this directly.
[Quoting s******s]
p*n
#18
With that approach the next generation still has to go under the knife. Genetic modification is the real way to go.
[Quoting z**c]
s*s
#19
Waiting for Illumina to jump out and spin this.
[Quoting n******7]
z*c
#20
I hear that Korea has overdone the surgery so much that single eyelids are coming back into fashion :)
z*t
#21
Naive question: would the NextSeq 500/550 be affected?
[Quoting n******7]
y*n
#22
A pig's head reworked into Sun Feifei; truly turning the rotten into the miraculous.
n*7
#23
no
[Quoting z*t]