added an evaluation section
parent
cee8149b12
commit
eb664c2fe8
@@ -135,6 +135,31 @@ other programs for typing such as [Warren et al (2012)][hla4], [Liu et al
(2013)][hla2], [Bai et al (2014)][hla3] and [Dilthey et al (2014)][hla1], though
most of them are distributed under restrictive licenses.

## Preliminary Evaluation

To check whether GRCh38 is better than GRCh37, we mapped the CHM1 and NA12878
unitigs to GRCh37 primary (hs37), GRCh38 primary (hs38) and GRCh38+ALT+decoy
(hs38d6), and called small variants from the alignments. CHM1 is haploid, so
ideally every heterozygous call made from it is a false positive (FP). NA12878
is diploid, and the number of true positive (TP) heterozygous calls from
NA12878 is approximately the difference between the NA12878 and CHM1
heterozygous call counts. A better assembly should yield higher TP and lower
FP. The following table shows the numbers for these assemblies:

|Assembly|   hs37|   hs38| hs38d6|CHM1_1.1|  huref|
|:------:|------:|------:|------:|-------:|------:|
|FP      | 255706| 168068| 142516|  307172| 575634|
|TP      |2142260|2163113|2150844| 2167235|2137053|

By this measure, hs38 is clearly better than hs37. Genome hs38d6 reduces
FP by ~25k but also reduces TP by ~12k. We manually inspected variants called
only from hs38 and found that the majority of them are associated with
excessive read depth, clustered variants or weak alignment, so we believe most
hs38-only calls are problematic. In addition, if we compare two NA12878
replicates from HiSeq X10 with nearly identical library construction, the
difference is ~140k, an order of magnitude higher than the difference between
hs38 and hs38d6. ALT contigs, decoy and HLA genes in hs38d6 therefore improve
variant calling at little cost.
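
As a rough illustration of the arithmetic behind these comparisons, the
following Python sketch recomputes the differences between hs37, hs38 and
hs38d6 from the table. The names `FP`, `TP`, `estimate_tp` and `delta` are
hypothetical and not part of any tool described here, and the relation
"NA12878 heterozygous calls = TP + FP" is only what the TP definition above
implies, not a separately reported number.

```python
# Counts copied from the table above (hs37, hs38 and hs38d6 columns only).
FP = {"hs37": 255706, "hs38": 168068, "hs38d6": 142516}
TP = {"hs37": 2142260, "hs38": 2163113, "hs38d6": 2150844}


def estimate_tp(na12878_het, chm1_het):
    """Approximate TP as NA12878 het calls minus CHM1 het calls (the FP proxy)."""
    return na12878_het - chm1_het


def delta(old, new):
    """Return (change in FP, change in TP) when moving from `old` to `new`."""
    return FP[new] - FP[old], TP[new] - TP[old]


if __name__ == "__main__":
    for old, new in [("hs37", "hs38"), ("hs38", "hs38d6")]:
        d_fp, d_tp = delta(old, new)
        print(f"{old} -> {new}: FP {d_fp:+d}, TP {d_tp:+d}")
```

A negative FP change together with a non-negative TP change is an unambiguous
improvement under this measure; when both decrease, as for hs38d6 versus hs38,
the sizes of the two changes have to be weighed against each other.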

## Problems and Future Development

There are some uncertainties about ALT mappings - we are not sure whether they
@@ -104,17 +104,17 @@
\f0\fs24 \cf2 Read: A\cf0 TCAGCATC\
\cf2 \
ALT ctg 1: \cf3 TGA\cf3 AA---CGAATGCAAATCA
ALT ctg 1: \cf3 TGA\cf3 AA---CGAATGCAAATGGTCA
\f1\b \cf4 ATCAGCATC
\f0\b0 \cf3 GAACTAGTCACAT\cf2 \
\cf3 |||||\cf5 (high div) \cf3 |||\cf5 (novel ins)\cf3 ||||||||||\cf2 \
\cf3 |||||\cf5 (high div) \cf3 ||||||\cf5 (novel ins)\cf3 ||||||||||\cf2 \
Chromosome:\cf3 GCGTACATGATACGA
\f1\b \cf6 ATCgGCATC
\f0\b0 \cf3 ATC-------------CTAGTCACATCGTAATCGA\
\cf2 \cf3 |||||||||||| |||||||\cf5 (novel ins) \cf3 ||||||||||\
\f0\b0 \cf3 ATGGTC-------------CTAGTCACATCGTAATC\
\cf2 \cf3 |||||||||||| ||||||||||\cf5 (novel ins) \cf3 ||||||||||\
\cf2 ALT ctg 2:\cf3 TGATACGA
\f1\b \cf7 ATCgcCATC
\f0\b0 \cf3 ATCA
\f0\b0 \cf3 ATGGTCA
\f1\b \cf8 ATCgcCAgC
\f0\b0 \cf3 GAACTAGTCACAT\
\
@@ -140,7 +140,9 @@ Chromosome:\cf3 GCGTACATGATACGA
\cf0 Hits considered in mapQ:
\f1\b \cf4 ATCAGCATC
\f0\b0 \cf0 and
\f1\b \cf6 ATCgGCATC\
\f1\b \cf6 ATCgGCATC
\f0\b0 \cf2 (best from each group)
\f1\b \cf6 \
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural

\f0\b0 \cf3 \
@@ -217,7 +219,7 @@ Chromosome:\cf3 GCGTACATGATACGA
<key>MasterSheets</key>
<array/>
<key>ModificationDate</key>
<string>2014-11-17 18:01:49 +0000</string>
<string>2014-11-17 18:28:10 +0000</string>
<key>Modifier</key>
<string>Heng Li</string>
<key>NotesVisible</key>
Binary file not shown.
Before Width: | Height: | Size: 45 KiB After Width: | Height: | Size: 47 KiB |