avatar
NGS data analysis workflow # Biology
c*e
1
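Brief comment: after the market opened today, banks have already raised rates once. For purchase
loans, rates on every program except the 30-year fixed have gone up!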
******************************** CALIFORNIA ********************************
REFINANCE RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.250 4.750 4.250
<=$729K 5.125 4.500

REFINANCE CASH-OUT RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.375 4.875 4.250
<=$729K

PURCHASE RATE - PRIMARY RESIDENCE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.750 4.125 4.625 4.000 3.375 3.625
APR: 4.860 4.314 4.775 4.271 3.477 3.728

<=$729K 4.875 4.375 4.000
APR: 4.966 4.531 4.086

PURCHASE RATE - INVESTMENT PROPERTY:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 5.125 4.500 4.875 4.375
APR: 5.238 4.691 5.027 4.648

******************************** MARYLAND ********************************
REFINANCE RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.250 4.750 4.125 4.250
<=$729K 5.000 4.375

REFINANCE CASH-OUT RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.375 4.875 4.250
<=$729K

PURCHASE RATE - PRIMARY RESIDENCE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.750 4.125 4.625 4.000 3.375 3.625
APR: 4.860 4.314 4.775 4.271 3.477 3.728

<=$729K 4.875 4.250 4.000
APR: 4.966 4.405 4.086

PURCHASE RATE - INVESTMENT PROPERTY:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 5.000 4.375 4.875 4.375
APR: 5.112 4.565 5.027 4.648

******************************** VIRGINIA ********************************
REFINANCE RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.250 4.750 4.125
<=$729K 5.000 4.375

REFINANCE CASH-OUT RATE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.875 4.375 4.750 4.250
<=$729K

PURCHASE RATE - PRIMARY RESIDENCE:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 4.750 4.000 4.625 3.875 3.375 3.625
APR: 4.860 4.188 4.775 4.146 3.477 3.728

<=$729K 4.875 4.250 4.000
APR: 4.966 4.405 4.086

PURCHASE RATE - INVESTMENT PROPERTY:
30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
<=$417K 5.000 4.375 4.875 4.250
APR: 5.112 4.565 5.027 4.523
Rates above are based on the following assumptions:
1. Loan Amount: $380K for Conforming loan, $550K for Super-Conforming loans;
2. Loan-To-Value (LTV):
- <=80% for Primary Residence Conforming Purchase & Refinance;
- <=70% for Primary Residence Conforming Cash-Out Refinance;
- <=75% for Primary Residence Super-Conforming Refinance and Purchase;
- <=65% for Primary Residence Super-Conforming Cash-Out Refinance;
- <=75% for Investment Property Conforming Purchase.
3. Primary Residence unless otherwise specified;
4. Single Family Residence;
5. Credit score>=740;
6. No Impound/Escrow Account.
For Refinance, rates are based on No Point No Fee. So APR for Refinance =
Refinance Rate.
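Note: our daily rate postings now include purchase rates for investment properties. Take a look!
Any loan program above with no rate listed is one for which we cannot offer a No Point loan. If
you would like the rate for that program, please contact us. Our contact information is
available by clicking our User ID.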
avatar
v*r
2
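About genotype calling: could someone with experience walk through the steps and files
involved, from the data files the company delivers to the final results? Thanks in advance.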
avatar
j*e
3
good sign

**

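[Quoting c**********e:]
: Brief comment: after the market opened today, banks have already raised rates once. For purchase
: loans, rates on every program except the 30-year fixed have gone up!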
: ******************************** CALIFORNIA ********************************
: REFINANCE RATE:
: 30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM
: <=$417K 4.875 4.250 4.750 4.250
: <=$729K 5.125 4.500
:
: REFINANCE CASH-OUT RATE:
: 30-Yr 15-Yr 20-Yr 10-Yr 5/1 ARM 7/1 ARM

avatar
T*u
4
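This type of analysis needs training.
A few files are not enough to fulfill this task.

[Quoting v***r:]
: About genotype calling: could someone with experience walk through the steps and files
: involved, from the data files the company delivers to the final results? Thanks in advance.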

avatar
c*e
5
Looks like you are hoping rates go up!

[Quoting j***e:]
: good sign
:
: **

avatar
v*r
6
I just need to know which step the company's deliverables reach and what the main steps after that are.

[Quoting T****u:]
: This type of analysis needs training.
: A few files are not enough to fulfill this task.

avatar
l*1
7
OP can try
GATK (Broad Institute)
>http://www.broadinstitute.org/gatk/guide/best-practices
Ref:
Nucleic Acids Res. 2014 Jan 11. [Epub ahead of print]
An integrated framework for discovery and genotyping of genomic variants
from high-throughput sequencing experiments.
>http://www.ncbi.nlm.nih.gov/pubmed/24413664

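[Quoting v***r:]
: About genotype calling: could someone with experience walk through the steps and files
: involved, from the data files the company delivers to the final results? Thanks in advance.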

avatar
a*e
8
your sample --> company --> FASTQ (.fq) or FASTA (.fa) file
--> blat or bowtie or Tophat to align --> (.sam, .bam file)
--> Samtools or GATK to call variants --> .vcf file (tab-delimited text; it can be opened in Excel)
--> igvtools or genome browser to visualize
Alternatively, CLC can reportedly replace the last three steps.
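
The chain above can be sketched as a list of shell commands. This is only a rough sketch, not the poster's exact pipeline: it assumes paired-end FASTQ input, swaps in `bwa mem` as the aligner (the post names blat, bowtie and Tophat), and uses samtools and bcftools for sorting and calling. All file names are hypothetical. The function only builds the command strings so they can be reviewed before anything is run on a server.

```python
import shlex


def build_pipeline(ref, fq1, fq2, prefix):
    """Return the shell commands for a basic align -> sort -> call chain.

    Assumptions (not from the thread): paired-end reads, bwa mem as the
    aligner, bcftools as the caller. All paths are placeholders.
    """
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Index the reference once so bwa can align against it.
        f"bwa index {q(ref)}",
        # Align reads and pipe the SAM stream straight into a sorted BAM.
        f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} | samtools sort -o {q(bam)} -",
        # Index the BAM for random access (needed by callers and IGV).
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and call variants.
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
    ]


commands = build_pipeline("ref.fa", "sample_R1.fq", "sample_R2.fq", "sample")
for cmd in commands:
    print(cmd)
```

If the company already delivers an aligned BAM, only the last two commands would apply.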
avatar
s*y
9
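Usually you can get a BAM file from the instrument's output by running it through the bundled software.
Training on this generally comes with the instrument purchase. It isn't hard, and it covers everything from loading samples to simple analysis of the resulting data.
But analysis beyond the BAM file is not something you can knock out quickly; you need to be able to code.

[Quoting v***r:]
: I just need to know which step the company's deliverables reach and what the main steps after that are.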
avatar
v*r
10
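So the company delivers an already-aligned BAM file, and the next step is just SNP calling
with samtools or GATK? Why wouldn't that be easy to get done? Where is the difficulty?

[Quoting s******y:]
: Usually you can get a BAM file from the instrument's output by running it through the bundled software.
: Training on this generally comes with the instrument purchase. It isn't hard, and it covers everything from loading samples to simple analysis of the resulting data.
: But analysis beyond the BAM file is not something you can knock out quickly; you need to be able to code.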

avatar
v*r
11
Thanks a lot!
This is very helpful!
Is there toy data that I can play with?

[Quoting a***e:]
: your sample --> company --> FQ or FA file
: --> blat or bowtie or Tophat to align --> (.sam, .bam file)
: --> Samtools or GATK to call variants --> .vcf file (excel file)
: --> igvtools or genome browser to visualize
: or it is said u can use CLC to replace the last three steps.

avatar
d*e
13
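If you know the principles, or have done it once yourself, it really isn't hard. The difficulties are:
1. This kind of data is usually large, so the computation is generally done on a server rather than a local machine.
2. The main software runs on Linux, so the user needs basic command-line skills and at least a little
scripting.
3. The real difficulty is downstream analysis, i.e. what to do once you have the variant calls. Everyone's
requirements differ, so there is no unified standard. If stopping at the calls is enough for the user, that is fine too.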
avatar
v*r
14
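In other words, if I am doing downstream analysis, having the VCF files is enough, and the upstream
steps don't need to be redone or adjusted?

[Quoting d*******e:]
: If you know the principles, or have done it once yourself, it really isn't hard. The difficulties are:
: 1. This kind of data is usually large, so the computation is generally done on a server rather than a local machine.
: 2. The main software runs on Linux, so the user needs basic command-line skills and at least a little
: scripting.
: 3. The real difficulty is downstream analysis, i.e. what to do once you have the variant calls. Everyone's
: requirements differ, so there is no unified standard. If stopping at the calls is enough for the user, that is fine too.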

avatar
d*e
15
If you trust the procedure and pipeline, it surely is.
I would suggest running different pipelines to get VCF files and comparing
the results if you can. There are quite a lot of differences between methods.
Everybody claims their own method is the best.
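
One way such a comparison could look is sketched below. It assumes each pipeline's output is available as plain VCF text lines and treats two calls as the same when chromosome, position, REF and ALT match exactly; it does no indel normalization, so equivalent indels written differently would count as discordant. The toy records are made up for illustration.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per ALT allele
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_calls(lines_a, lines_b):
    """Split calls into shared, A-only and B-only sets."""
    a, b = variant_keys(lines_a), variant_keys(lines_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy records standing in for the output of two pipelines.
pipeline_a = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.",
]
pipeline_b = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr2\t300\t.\tG\tGA\t30\tPASS\t.",
]

result = compare_calls(pipeline_a, pipeline_b)
for name, keys in result.items():
    print(name, sorted(keys))
```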

[Quoting v***r:]
: In other words, if I am doing downstream analysis, having the VCF files is enough, and the upstream
: steps don't need to be redone or adjusted?

avatar
W*o
16
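Actually, I think the main work should be how to analyze the VCF. A high school student could run the
steps before the VCF; the real expertise lies in analyzing the VCF.

[Quoting v***r:]
: In other words, if I am doing downstream analysis, having the VCF files is enough, and the upstream
: steps don't need to be redone or adjusted?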

avatar
v*r
17
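Please elaborate: why is analyzing the VCF so important?

[Quoting W***o:]
: Actually, I think the main work should be how to analyze the VCF. A high school student could run the
: steps before the VCF; the real expertise lies in analyzing the VCF.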

avatar
s*r
18
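How to analyze the VCF is indeed the main thing.
But the VCFs that come out of different pipelines all differ.
If you have a pipeline already built with the parameters settled, it saves a lot of trouble; if you are
setting one up from scratch, you still need to optimize it.

[Quoting W***o:]
: Actually, I think the main work should be how to analyze the VCF. A high school student could run the
: steps before the VCF; the real expertise lies in analyzing the VCF.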

avatar
W*o
19
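VCF data is still too large for a typical project; you need some way to extract the useful
information. I have to get to sleep; if you're interested I'll continue tomorrow.

[Quoting v***r:]
: Please elaborate: why is analyzing the VCF so important?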
avatar
v*r
20
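I'm interested, please continue.

[Quoting W***o:]
: VCF data is still too large for a typical project; you need some way to extract the useful
: information. I have to get to sleep; if you're interested I'll continue tomorrow.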

avatar
v*r
21
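This is exactly what I'm worried about. The coverage of the data I have is fairly low. skycolor, if you
have any good suggestions, please share them.

[Quoting s******r:]
: How to analyze the VCF is indeed the main thing.
: But the VCFs that come out of different pipelines all differ.
: If you have a pipeline already built with the parameters settled, it saves a lot of trouble; if you are
: setting one up from scratch, you still need to optimize it.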

avatar
l*1
22
I have no idea about mouse breeding problems
such as the one posted at this link:
http://www.mitbbs.com/article_t/Biology/31869195.html
Those questions should go to mitbbs Mouse Queen 'Dua' or
other IDs.

[Quoting v***r:]
: haha, thanks.
: Ylotkaeuler11 can find answer for every question posted here.

avatar
v*r
23
LOL
Queen of math/comp bio?

[Quoting l**********1:]
: i have no idea to mouse breeding trouble,
: such as posted on the link,
: http://www.mitbbs.com/article_t/Biology/31869195.html
: those queries should ask mitbbs Mouse Queen 'Dua' or
: other IDs..

avatar
t*d
24
With low coverage nothing works. You first have to ensure adequate coverage before you can ensure
the accuracy of your variant calling.
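
A quick way to see how much coverage a dataset has is to summarize per-position depth. The sketch below assumes the input lines come from `samtools depth` on a single BAM (three tab-separated columns: chromosome, position, depth). The 10-read threshold is only an example value, not a recommendation from this thread. Unless `-a` is passed, `samtools depth` leaves out zero-depth positions, which would inflate these numbers.

```python
def summarize_depth(depth_lines, min_depth=10):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    Each line is expected to look like `samtools depth` output for one BAM:
    chrom <TAB> position <TAB> depth. min_depth=10 is an arbitrary example.
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        return 0.0, 0.0
    mean = sum(depths) / len(depths)
    covered = sum(d >= min_depth for d in depths) / len(depths)
    return mean, covered


# Toy input standing in for real `samtools depth` output.
toy_depth = ["chr1\t1\t12", "chr1\t2\t8", "chr1\t3\t3", "chr1\t4\t17"]
mean_depth, frac_covered = summarize_depth(toy_depth)
print(mean_depth, frac_covered)
```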

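[Quoting v***r:]
: This is exactly what I'm worried about. The coverage of the data I have is fairly low. skycolor, if you
: have any good suggestions, please share them.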

avatar
t*d
25
For example, which gene a variant site in the VCF file belongs to, or whether it falls on an important
splice site: all of this requires writing code to parse the VCF file.
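
As a rough illustration of that kind of parsing, the sketch below looks up each variant position in a small table of gene intervals and flags positions close to an exon boundary. The gene names, coordinates, boundary positions and the 2-base window are all hypothetical; a real analysis would load them from an annotation file or use a dedicated annotation tool.

```python
# Hypothetical gene intervals: (chrom, start, end, gene name), 1-based inclusive.
GENES = [
    ("chr1", 100, 500, "GENE_A"),
    ("chr1", 800, 1200, "GENE_B"),
]
# Hypothetical exon boundary positions; splice sites lie within a few bases.
EXON_BOUNDARIES = {"chr1": [300, 900]}
SPLICE_WINDOW = 2  # bases on either side of a boundary (assumed cutoff)


def annotate(chrom, pos):
    """Return (gene name or None, True if pos is near an exon boundary)."""
    gene = next(
        (name for c, start, end, name in GENES if c == chrom and start <= pos <= end),
        None,
    )
    near_splice = any(
        abs(pos - b) <= SPLICE_WINDOW for b in EXON_BOUNDARIES.get(chrom, [])
    )
    return gene, near_splice


def annotate_vcf(vcf_lines):
    """Yield (chrom, pos, gene, near_splice) for each data line."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split("\t")[:2]
        yield (chrom, int(pos), *annotate(chrom, int(pos)))


toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t301\t.\tA\tG\t50\tPASS\t.",
    "chr1\t650\t.\tC\tT\t40\tPASS\t.",
]
annotations = list(annotate_vcf(toy_vcf))
print(annotations)
```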

[Quoting v***r:]
: Please elaborate: why is analyzing the VCF so important?
avatar
l*1
26
Also, to OP:
just check,
>http://bcbio.wordpress.com/tag/ngs/
cited:
>Access VCF variant information
>In addition to extending the GATK through walkers and annotations you can
also utilize the extensive API directly, taking advantage of parsers and
data structures to handle common file formats. Using Clojure’s Java
interoperability, the variantcontext module provides a high level API to
parse and extract information from VCF files. To loop through a VCF file and
print the location, reference allele and called alleles for each variant we:
- Open a VCF source providing access to the underlying file inside a with-open
statement to ensure closing of the resource.
- Parse the VCF source, returning an iterator of VariantContext maps for each
variant in the file.
- Extract values from the map: the chromosome, start, reference allele and
called alleles for the first genotype.
******
(use 'bcbio.variation.variantcontext)

(with-open [vcf-source (get-vcf-source "test/data/gatk-calls.vcf")]
  (doseq [vc (parse-vcf vcf-source)]
    (println (:chr vc) (:start vc) (:ref-allele vc)
             (-> vc :genotypes first :alleles))))
*****
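
For readers who prefer Python, here is a rough counterpart to the Clojure loop above. It uses only the standard library and a hand-rolled parser rather than the bcbio.variation API, so the function name and the dict keys are illustrative choices, not part of any library. It assumes uncompressed VCF text with a GT field in the first sample column.

```python
def parse_vcf(lines):
    """Yield one dict per variant with the fields the Clojure example prints."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
        called = []
        if len(fields) > 9:  # FORMAT and at least one sample column present
            fmt = fields[8].split(":")
            sample = fields[9].split(":")
            gt = sample[fmt.index("GT")] if "GT" in fmt else "."
            # Treat phased (|) and unphased (/) genotypes the same way.
            for idx in gt.replace("|", "/").split("/"):
                called.append(alleles[int(idx)] if idx.isdigit() else ".")
        yield {"chr": chrom, "start": pos, "ref": ref, "alleles": called}


toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12",
    "chr1\t200\t.\tC\tT,G\t40\tPASS\t.\tGT:DP\t1|2:9",
]
variants = list(parse_vcf(toy_vcf))
for v in variants:
    print(v["chr"], v["start"], v["ref"], v["alleles"])
```

For real files, a maintained parser such as the PyVCF package linked later in this post is likely a safer choice than hand-rolled splitting.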
or
To further identify causes of discordance, we subdivide the missing and
extra variants using annotations from the GEMINI variation framework:
- Low coverage: positions with limited read coverage (4 to 9 reads).
- Repetitive: regions identified by RepeatMasker.
- Error prone: variants falling in motifs found to induce sequencing errors.
We subdivide and restrict our comparisons to help identify sources of
differences between methods indistinguishable when looking at total
discordant counts. A critical subdivision is comparing SNPs and indels
separately. With lower total counts of indels but higher error rates, each
variant type needs independent visualization. Secondly, it’s crucial to
distinguish between discordance caused by a lack of coverage, and
discordance caused by an actual difference in variant assessment. We
evaluate only in callable regions with 4 or more reads. This low minimum
cutoff provides a valuable evaluation of low coverage regions, which differ
the most between alignment and calling methods.
I’ll use this data to provide recommendations for alignment, post-alignment
preparation and variant calling. In addition to these high level summaries,
the full dataset and summary plots available below providing a starting
place for digging further into the data.
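
The subdivision described in the excerpt can be sketched as a small tally. This is not the bcbio or GEMINI implementation: it only reuses the cutoffs quoted above (4 or more reads to be callable, 4 to 9 reads as low coverage) and splits SNPs from indels by allele length. The input tuples are made-up examples.

```python
def coverage_class(depth):
    """Bin a read depth using the cutoffs quoted in the excerpt."""
    if depth < 4:
        return "not_callable"  # below the 4-read callable minimum
    if depth <= 9:
        return "low_coverage"  # the 4 to 9 read band
    return "covered"


def tally_discordant(variants):
    """Count discordant variants per (variant type, coverage class)."""
    counts = {}
    for ref, alt, depth in variants:
        # Single-base REF and ALT is a SNP; anything else is treated as an indel.
        vtype = "snp" if len(ref) == 1 and len(alt) == 1 else "indel"
        key = (vtype, coverage_class(depth))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up discordant calls: (REF, ALT, read depth at the site).
toy_calls = [("A", "G", 5), ("C", "T", 30), ("G", "GA", 7), ("T", "C", 2)]
counts = tally_discordant(toy_calls)
for key in sorted(counts):
    print(key, counts[key])
```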
Aligners
We compared two recently released aligners designed to work with longer
reads coming from new sequencing technologies: novoalign (3.00.02) and bwa
mem (0.7.3a). bwa mem identified 1389 additional concordant SNPs and 145
indels not seen with novoalign. 1024 of these missing variants are in
regions where novoalign does not provide sufficient coverage for calling. Of
those, 92% (941) have low coverage with less than 10 reads in the bwa
alignments. Algorithmic changes impact low coverage regions more due to the
decreased evidence and susceptibility to crossing calling coverage
thresholds, so we need extra care and consideration of calls in these
regions.
Our standard workflow uses novoalign based on its stringency in resolving
large insertions and deletions. These results suggest equally good results
using bwa mem, along with improved processing times. One caveat to these
results is that some of the available Illumina call data that feeds into
NIST’s reference genomes comes from a bwa alignment, so some differences
may reflect a bias towards bwa alignment heuristics. Using non-simulated
reference data sets has the advantage of capturing real biological and
process errors, but requires iterative improvement of the reference
materials to avoid this type of potential algorithmic bias.
Alternatively, OP can try 'Platypus' (n.b. Python-based),
>http://www.well.ox.ac.uk/platypus
or
>http://www.well.ox.ac.uk/~rimmer/README.txt
For more, try
>http://www-huber.embl.de/users/anders/HTSeq/doc/tour.html
or
>http://pyvcf.readthedocs.org/en/latest/INTRO.html

[Quoting t****d:]
: For example, which gene a variant site in the VCF file belongs to, or whether it falls on an important
: splice site: all of this requires writing code to parse the VCF file.
