gatk-3.8/public
Mark DePristo 34ea443cdb Better algorithm for choosing which indel alleles are present in samples
-- The previous approach (requiring > 5 copies among all reads) breaks down in call sets with many samples (>1000), where sequencing errors alone supply enough copies.
-- This breakdown is producing spurious clustered indels (lots of these!) around real common indels
-- The new approach requires >X% of the reads in a sample to carry an indel of any type (no allele matching) before that sample's reads are included in the count towards 5.  This makes sense: with enough data we expect most reads to carry the indel, even though the allele might be wrong because of alignment, etc.  With very few reads, any indel-containing read crosses the threshold, and it's counted.
-- As far as I can tell this is the right thing to do in general.  We'll make another call set in ESP and see how it works at scale.
-- Added integration tests to ensure that the system behaves as I expect on the ESP site I developed the code on
2012-03-26 16:28:49 -04:00
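The per-sample filter described in the commit message above can be sketched roughly as follows. This is an illustration only, not the GATK code: the function name, the input layout (one list per sample, with None for a read without an indel), the 25% value standing in for X, and the use of >= for the count of 5 are all assumptions.

```python
from collections import Counter


def select_indel_alleles(reads_by_sample, min_fraction=0.25, min_count=5):
    """Return the indel alleles that pass the pooled count threshold.

    reads_by_sample: dict of sample name -> list with one entry per read,
        either None (no indel) or a string naming the indel allele.
    min_fraction: hypothetical stand-in for the "X%" in the commit message.
    min_count: pooled copies required (assumed to be compared with >=).
    """
    pooled = Counter()
    for reads in reads_by_sample.values():
        if not reads:
            continue
        indel_reads = [allele for allele in reads if allele is not None]
        # Any indel counts towards the fraction; alleles are not matched here.
        if len(indel_reads) / len(reads) <= min_fraction:
            continue  # too few indel reads in this sample: likely errors
        pooled.update(indel_reads)
    return {allele for allele, n in pooled.items() if n >= min_count}


# Six deep samples with one erroneous indel read each, plus two real carriers.
example = {f"deep{i}": [None] * 99 + ["+A"] for i in range(6)}
example["carrier"] = ["+AC"] * 4 + [None] * 4
example["shallow"] = ["+AC", None]
print(sorted(select_indel_alleles(example)))
```

In this toy input the six isolated "+A" reads would reach a pooled count of 6 under a purely global threshold, but each deep sample falls below the per-sample fraction, so only "+AC" is selected.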
..
R Further cleanup of R 2012-03-22 21:24:37 -04:00
c At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
chainFiles Reorganized the codebase beneath top-level public and private directories, 2011-06-28 06:55:19 -04:00
doc Reorganized the codebase beneath top-level public and private directories, 2011-06-28 06:55:19 -04:00
java Better algorithm for choosing which indel alleles are present in samples 2012-03-26 16:28:49 -04:00
keys Public-key authorization scheme to restrict use of NO_ET 2012-03-06 00:09:43 -05:00
packages Public-key authorization scheme to restrict use of NO_ET 2012-03-06 00:09:43 -05:00
perl Update to the bindings for liftOverVCF.pl (to -V from -B) 2011-09-15 15:33:09 -04:00
scala Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-13 09:28:16 -04:00
testdata BQSR with GATKReport implementation 2012-03-23 15:42:32 -04:00