Refactored several Interval utilties from GenomeLocParser to IntervalUtils, as one might expect they go
Removed GenomeLoc.clone() method, as this was not correctly implemented, and actually unnecessary, as GenomeLocs are immutable. Several iterator classes have changed to remove their use of clone()
Removed misc. unnecessary imports
Disabled, temporarily, the validating pileup integration test, as it uses reads mapped to an different reference sequence for ecoli, and this now does not satisfy the contracts for GenomeLoc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5827 348d0f76-0448-11de-a6fe-93d51630548a
and its three or four nearest neighbors could be in memory at once. Tweaking
the iterators to ensure that previous AlignmentContexts don't have strong
references which means that the garbage collector can work effectively to
help us trundle through these regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5820 348d0f76-0448-11de-a6fe-93d51630548a
Also added more logging when extension generation fails.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5812 348d0f76-0448-11de-a6fe-93d51630548a
This step just changes storage of likelihoods so now we have, instead of an internal matrix, a class member which stores, as a hash table, a mapping from pileup element to an (allele, likelihood) pair. There's no functional change aside from internal data storage.
As a bonus, we get for free a 2-3x improvement in speed in calling because redundant likelihood computations are removed.
Next step will hook this up to, and redefine annotation engine interaction with UG for indel case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5809 348d0f76-0448-11de-a6fe-93d51630548a
Added doc string for getNBoundRodTracks()
Intermediate commit for CalibrateGenotypeLikelihoods and GenotypeConcordanceTable, so I have a record of my work. Not ready for public consumption. Really looking forward to making local commits so I can track my progress without needing to push incomplete functionality up to the server.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5807 348d0f76-0448-11de-a6fe-93d51630548a
FindLargeShards. Runtime of FindLargeShards on papuan dataset is now 75min.
GATK proper should benefit as well, although the benefits might be so small
as to not be measurable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5798 348d0f76-0448-11de-a6fe-93d51630548a
BAM files looking for outliers (outliers right now are defined naively as
shards whose sizes are more than 5 stddevs away from the mean). Runs in
13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but
much faster than running UG on the same region to discover anomalies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a
only download that test configuration when running unit/integration tests.
This means that the build will (hopefully) never break because it can't
fetch a file that isn't required for the GATK to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
a) Forgot to convert from phred to log-prob when computing gap penalties from recal table.
b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a
Minor updates to the FCPTest to match the changes due to using the old indel caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than
directly from net.sf.picard.filter.SamRecordFilter, which allows us to add
an initialize(GATKEngine) method so that filters can do any initialization
they'd like based on CL arguments, SAM headers, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a
- Added sensitivity and Specificity to the report.
- With the changes in genotype likelihoods, the indel analysis only happens if the BAM file also has an extended event. Not great, but at least it's not broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5759 348d0f76-0448-11de-a6fe-93d51630548a