gatk-3.8/public
Eric Banks 379a9841ce Various bug fixes for recent Reduce Reads additions plus solution implemented for low MQ reads.
1. Using cumulative binomial probability was not working at high coverage sites (because p-values quickly
got out of hand) so instead we use a hybrid system for determining significance: at low coverage sites
use binomial prob and at high coverage sites revert to using the old base proportions.  Then we get the
best of both worlds.  As a note, coverage refers to just the individual base counts and not the entire pileup.

2. Reads were getting lost because of the comparator being used in the SlidingWindow. When read pairs had
the same alignment end position the 2nd one encountered would get dropped (but added to the header!). We
now use a PriorityQueue instead of a TreeSet to allow for such cases.

3. Each consensus keeps track of its own number of softclipped bases.  There was no reason that that number
should be shared between them.

4. We output consensus filtered (i.e. low MQ) reads whenever they are present for now.  Don't lose that
information.  Maybe we'll decide to change this in the future, but for now we are conservative.

5. Also implemented various small performance optimizations based on profiling.

Added unit tests to cover these changes; systematic assessment now tests against low MQ reads too.
2013-04-24 18:18:50 -04:00
..
R Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
c At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
chainFiles
doc Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
java Various bug fixes for recent Reduce Reads additions plus solution implemented for low MQ reads. 2013-04-24 18:18:50 -04:00
keys Public-key authorization scheme to restrict use of NO_ET 2012-03-06 00:09:43 -05:00
packages ValidatingPileup was renamed to CheckPileup 2013-02-15 11:56:19 -05:00
perl Split out contig names from Reference .fai file on white space (to support the GATK resource bundle's file human_g1k_v37.fasta.fai.gz, which does not use tab delimiters) 2012-06-07 16:56:32 -04:00
scala Adding the 1000G_phase1.snps.high_confidence callset to the GATK resource bundle for use in the April 2013 updated best practices. 2013-04-24 11:41:32 -04:00
testdata Updating the exampleGRP.grp test file 2013-04-13 09:07:13 -04:00