added an evaluation section
parent cee8149b12
commit eb664c2fe8
@ -135,6 +135,31 @@ other programs for typing such as [Warren et al (2012)][hla4], [Liu et al
(2013)][hla2], [Bai et al (2014)][hla3] and [Dilthey et al (2014)][hla1], though
most of them are distributed under restrictive licenses.

## Preliminary Evaluation

To check whether GRCh38 is better than GRCh37, we mapped the CHM1 and NA12878
unitigs to GRCh37 primary (hs37), GRCh38 primary (hs38) and GRCh38+ALT+decoy
(hs38d6), and called small variants from the alignments. CHM1 is haploid, so
ideally every heterozygous call on it is a false positive (FP). NA12878 is
diploid, and the number of true positive (TP) heterozygous calls from NA12878
is approximately the difference between the NA12878 and CHM1 heterozygous call
counts. A better assembly should yield higher TP and lower FP. The following
table shows the numbers for these assemblies, along with CHM1_1.1 and huref:

|Assembly|hs37   |hs38   |hs38d6 |CHM1_1.1|huref  |
|:------:|------:|------:|------:|-------:|------:|
|FP      | 255706| 168068| 142516|  307172| 575634|
|TP      |2142260|2163113|2150844| 2167235|2137053|

With this measurement, hs38 is clearly better than hs37. Relative to hs38,
hs38d6 reduces FP by ~25k but also reduces TP by ~12k. We manually inspected
variants called from hs38 only and found that the majority of them are
associated with excessive read depth, clustered variants or weak alignment. We
believe most hs38-only calls are problematic. In addition, if we compare two
NA12878 replicates from HiSeq X10 with nearly identical library construction,
their calls differ by ~140k, an order of magnitude more than the difference
between hs38 and hs38d6. In short, the ALT contigs, decoy and HLA genes in
hs38d6 improve variant calling at little cost.
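
The rounded figures above can be checked against the table with a few lines of
Python. This is only an illustrative sketch: the `counts` dictionary and the
`delta` helper are hypothetical names introduced here, the numbers are copied
from the table, and the ~140k replicate difference is the approximate value
quoted in the text. Comparing it against the TP change is one possible reading
of "the difference between hs38 and hs38d6".

```python
# FP/TP counts copied from the table above (hs37, hs38 and hs38d6 only).
counts = {
    "hs37":   {"FP": 255706, "TP": 2142260},
    "hs38":   {"FP": 168068, "TP": 2163113},
    "hs38d6": {"FP": 142516, "TP": 2150844},
}


def delta(a, b, key):
    """Change in `key` (FP or TP) when moving from assembly `a` to `b`."""
    return counts[b][key] - counts[a][key]


# hs38 -> hs38d6: the "~25k" FP reduction and "~12k" TP reduction.
fp_change = delta("hs38", "hs38d6", "FP")
tp_change = delta("hs38", "hs38d6", "TP")
print("FP change:", fp_change)
print("TP change:", tp_change)

# Approximate replicate-to-replicate difference quoted in the text.
# Assumption: it is compared here against the TP change only.
replicate_diff = 140_000
print("replicate diff / |TP change|:", round(replicate_diff / abs(tp_change), 1))
```

The same helper applied to hs37 and hs38 shows FP falling and TP rising
together, which is why hs38 is described as clearly better than hs37.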

## Problems and Future Development

There are some uncertainties about ALT mappings - we are not sure whether they
@ -104,17 +104,17 @@
|
||||||
|
|
||||||
\f0\fs24 \cf2 Read: A\cf0 TCAGCATC\
|
\f0\fs24 \cf2 Read: A\cf0 TCAGCATC\
|
||||||
\cf2 \
|
\cf2 \
|
||||||
ALT ctg 1: \cf3 TGA\cf3 AA---CGAATGCAAATCA
|
ALT ctg 1: \cf3 TGA\cf3 AA---CGAATGCAAATGGTCA
|
||||||
\f1\b \cf4 ATCAGCATC
|
\f1\b \cf4 ATCAGCATC
|
||||||
\f0\b0 \cf3 GAACTAGTCACAT\cf2 \
|
\f0\b0 \cf3 GAACTAGTCACAT\cf2 \
|
||||||
\cf3 |||||\cf5 (high div) \cf3 |||\cf5 (novel ins)\cf3 ||||||||||\cf2 \
|
\cf3 |||||\cf5 (high div) \cf3 ||||||\cf5 (novel ins)\cf3 ||||||||||\cf2 \
|
||||||
Chromosome:\cf3 GCGTACATGATACGA
|
Chromosome:\cf3 GCGTACATGATACGA
|
||||||
\f1\b \cf6 ATCgGCATC
|
\f1\b \cf6 ATCgGCATC
|
||||||
\f0\b0 \cf3 ATC-------------CTAGTCACATCGTAATCGA\
|
\f0\b0 \cf3 ATGGTC-------------CTAGTCACATCGTAATC\
|
||||||
\cf2 \cf3 |||||||||||| |||||||\cf5 (novel ins) \cf3 ||||||||||\
|
\cf2 \cf3 |||||||||||| ||||||||||\cf5 (novel ins) \cf3 ||||||||||\
|
||||||
\cf2 ALT ctg 2:\cf3 TGATACGA
|
\cf2 ALT ctg 2:\cf3 TGATACGA
|
||||||
\f1\b \cf7 ATCgcCATC
|
\f1\b \cf7 ATCgcCATC
|
||||||
\f0\b0 \cf3 ATCA
|
\f0\b0 \cf3 ATGGTCA
|
||||||
\f1\b \cf8 ATCgcCAgC
|
\f1\b \cf8 ATCgcCAgC
|
||||||
\f0\b0 \cf3 GAACTAGTCACAT\
|
\f0\b0 \cf3 GAACTAGTCACAT\
|
||||||
\
|
\
|
||||||
|
|
@ -140,7 +140,9 @@ Chromosome:\cf3 GCGTACATGATACGA
|
||||||
\cf0 Hits considered in mapQ:
|
\cf0 Hits considered in mapQ:
|
||||||
\f1\b \cf4 ATCAGCATC
|
\f1\b \cf4 ATCAGCATC
|
||||||
\f0\b0 \cf0 and
|
\f0\b0 \cf0 and
|
||||||
\f1\b \cf6 ATCgGCATC\
|
\f1\b \cf6 ATCgGCATC
|
||||||
|
\f0\b0 \cf2 (best from each group)
|
||||||
|
\f1\b \cf6 \
|
||||||
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural
|
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural
|
||||||
|
|
||||||
\f0\b0 \cf3 \
|
\f0\b0 \cf3 \
|
||||||
|
|
@ -217,7 +219,7 @@ Chromosome:\cf3 GCGTACATGATACGA
|
||||||
<key>MasterSheets</key>
|
<key>MasterSheets</key>
|
||||||
<array/>
|
<array/>
|
||||||
<key>ModificationDate</key>
|
<key>ModificationDate</key>
|
||||||
<string>2014-11-17 18:01:49 +0000</string>
|
<string>2014-11-17 18:28:10 +0000</string>
|
||||||
<key>Modifier</key>
|
<key>Modifier</key>
|
||||||
<string>Heng Li</string>
|
<string>Heng Li</string>
|
||||||
<key>NotesVisible</key>
|
<key>NotesVisible</key>
|
||||||
|
|
|
||||||
Binary file not shown.
|
Size before: 45 KiB. Size after: 47 KiB.