gatk-3.8

Guillermo del Angel 55d5f2194c Read Error Corrector for haplotype assembly	2013-06-11 12:26:24 -04:00

The principle is simple: when coverage is deep enough, any single-base read error will look like a rare k-mer, but the correct sequence will be supported by many reads, so correct sequences will look like common k-mers.

The algorithm has 3 main steps:
1. K-mer graph buildup. For each read in an active region, a map from k-mers to the number of times they have been seen is built.
2. Building the correction map. All "rare" k-mers (by default, seen only once) get mapped to k-mers that are good (by default, seen at least 20 times, but this is a CL argument) and that lie within a given Hamming distance (by default, 1). This map can be empty (i.e. k-mers can be uncorrectable).
3. Correction proposal. For each constituent k-mer of each read, if this k-mer is rare and maps to a good k-mer, get the differing base positions in the k-mer and add these to a list of corrections for each base in each read. Then, correct the read at positions where the correction proposal is unanimous and non-empty.

The algorithm defaults are chosen to be very stringent and conservative in the correction: we only try to correct singleton k-mers, we only look for good k-mers lying at Hamming distance = 1 from them, and we only correct a base in a read if all correction proposals are congruent.

By default, the algorithm is disabled, but it can be enabled in HaplotypeCaller via the -readErrorCorrect CL option. However, at this point it's about 3x-10x more expensive, so it needs to be optimized if it's to be used.
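The three steps described in the commit message can be sketched roughly as follows. This is a minimal Python illustration, not the GATK implementation (which is written in Java). The function names, the parameter names (`max_rare`, `min_good`, `max_dist`), and the rule of picking the most frequent good k-mer when several lie within the Hamming distance are all assumptions made for the example.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping k-mers of a read, in order of start position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_correction_map(counts, max_rare=1, min_good=20, max_dist=1):
    """Step 2: map each rare k-mer to a nearby good k-mer, if one exists."""
    good = [km for km, n in counts.items() if n >= min_good]
    correction = {}
    for km, n in counts.items():
        if n > max_rare:
            continue  # not rare, nothing to correct
        candidates = [g for g in good if g != km and hamming(km, g) <= max_dist]
        if candidates:
            # Assumption: when several good k-mers qualify, take the most frequent.
            correction[km] = max(candidates, key=counts.get)
    return correction


def correct_reads(reads, k, max_rare=1, min_good=20, max_dist=1):
    """Steps 1-3: count k-mers, build the map, apply unanimous proposals."""
    # Step 1: k-mer counts over all reads (standing in for one active region).
    counts = Counter(km for read in reads for km in kmers(read, k))
    cmap = build_correction_map(counts, max_rare, min_good, max_dist)

    corrected = []
    for read in reads:
        # Step 3: collect the proposed bases for every position in the read.
        proposals = [set() for _ in read]
        for start, km in enumerate(kmers(read, k)):
            target = cmap.get(km)
            if target is None:
                continue
            for offset in range(k):
                if km[offset] != target[offset]:
                    proposals[start + offset].add(target[offset])
        # Correct only where the proposal is non-empty and unanimous.
        bases = [next(iter(p)) if len(p) == 1 else base
                 for base, p in zip(read, proposals)]
        corrected.append("".join(bases))
    return corrected
```

For instance, with five copies of a read and one copy carrying a single-base error, calling `correct_reads(reads, k=5, min_good=3)` restores the erroneous read, while the stricter default of `min_good=20` leaves it untouched because no k-mer reaches the "good" threshold.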
..
annotator	Refactor rsID and overlap detection in VariantOverlapAnnotator utility class	2013-06-10 15:51:13 -04:00
beagle	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
bqsr	Make BQSR calculateIsIndel robust to indel CIGARs at start/end of read	2013-05-31 13:58:37 -04:00
compression/reducereads	Fix error in merging code in HC	2013-05-31 16:29:29 -04:00
diagnostics	Update MD5s and the Diagnose Target scala script	2013-05-13 12:06:17 -04:00
diffengine	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs)	2013-03-12 10:57:14 -04:00
fasta	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
filters	Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec).	2013-04-05 00:52:32 -04:00
genotyper	Refactor rsID and overlap detection in VariantOverlapAnnotator utility class	2013-06-10 15:51:13 -04:00
haplotypecaller	Read Error Corrector for haplotype assembly	2013-06-11 12:26:24 -04:00
indels	Secondary alignments were not handled correctly in IndelRealigner	2013-05-06 19:09:10 -04:00
phasing	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
validation	MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods	2013-03-29 14:52:10 -04:00
varianteval	Move some VCF/VariantContext methods back to the GATK based on feedback	2013-01-29 16:56:55 -05:00
variantrecalibration	Update MD5s for VQSR header change	2013-04-16 11:45:45 -04:00
variantutils	CombineVariants no longer adds PASS to unfiltered records	2013-05-20 16:53:51 -04:00