Commit Graph

805 Commits (5dca1e4d2e44d734b95a89e46cb9e416dfd4bae0)

Author SHA1 Message Date
chartl 480859db50 Contractified version of MannWhitneyU. Some behavior has been changed:
- Running a test when there are no observations of at least one of the sets now breaks the MWU contract
   + MWU returns Pair(Double.NaN,Double.NaN) in these instances to maintain the contract of never returning null
   + No more Double.Infinity values will appear
 - RankSumTests now probe the return values for NaNs, and don't annotate if they appear
 - For small sets where the probability is calculated recursively, the z-value is now the inversion of the error function
    and not the approximate z-value
 - UG and Annotator integration tests updated to reflect changes



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5845 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 13:57:15 +00:00
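The NaN contract described in commit 480859db50 can be sketched roughly as follows. This is a minimal illustration, not the GATK MannWhitneyU class: the class name `RankSumSketch`, the method names, the `double[]` return type (standing in for the Pair), and the use of a normal approximation for the z-value are all assumptions made for the example.

```java
// Hypothetical sketch of the "never return null" contract; not the GATK MannWhitneyU class.
final class RankSumSketch {
    /** Returns {U, z}. Both entries are NaN when either set has no observations. */
    static double[] runTest(double[] setA, double[] setB) {
        if (setA.length == 0 || setB.length == 0) {
            // Contract: signal "no result" with NaNs instead of null or infinities.
            return new double[] {Double.NaN, Double.NaN};
        }
        // U counts pairs (a, b) with a < b; ties contribute one half.
        double u = 0.0;
        for (double a : setA) {
            for (double b : setB) {
                if (a < b) u += 1.0;
                else if (a == b) u += 0.5;
            }
        }
        // Normal approximation (an assumption of this sketch).
        double n1 = setA.length, n2 = setB.length;
        double mean = n1 * n2 / 2.0;
        double sd = Math.sqrt(n1 * n2 * (n1 + n2 + 1.0) / 12.0);
        return new double[] {u, (u - mean) / sd};
    }

    /** Caller-side probe: annotate only when neither value is NaN. */
    static boolean shouldAnnotate(double[] result) {
        return !Double.isNaN(result[0]) && !Double.isNaN(result[1]);
    }

    public static void main(String[] args) {
        System.out.println(shouldAnnotate(runTest(new double[] {}, new double[] {1.0})));
        System.out.println(shouldAnnotate(runTest(new double[] {1, 2}, new double[] {3, 4})));
    }
}
```

Returning NaNs keeps callers free of null checks while still letting them detect the degenerate case with a single probe.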
depristo 6a49e8df34 Significant change to the way subsetting by sample works with monomorphic sites. Now keeps the alt allele, even if a record is AC=0 after the subset. Previously, the system dropped the alt allele, which I don't think is the right behavior. If you really want a VCF without monomorphic sites, use the option to drop monomorphic sites after subsetting. See detailed information below.
Right now, if you select a multi-sample VCF file (or one with filters, I see) down to a smaller set of samples, and the site isn't polymorphic in that subgroup, then the alt allele is lost.  For example, when selecting down NA12878 from the OMNI, I previously received the following VCF:

1       82154   rs4477212       A       .       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       .       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

Here the first two records lost the ALT allele, because NA12878 is hom-ref at these sites.  My change results in a VCF that looks like:

1       82154   rs4477212       A       G       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       T       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

The genotype remains unchanged, but the ALT allele is now preserved.  I think this is the correct behavior, as reducing samples down shouldn't change the character of the site, only the AC in the subpopulation.  This is related to the tricky issue of isPolymorphic() vs. isVariant().  

isVariant => is there an ALT allele?
isPolymorphic => is some sample non-ref in the samples?

In part this is complicated by the semantics of sites-only VCFs, where ALT = . is used to mean not-polymorphic.  Unfortunately, I just don't think there's a consistent convention right now, but it might be worth adopting a single approach to handling this at some point.  Wiki docs updated.

Does anyone have critical infrastructure that depends on the previous convention?  Let me know so we can coordinate the change.

There's a new function subContextFromGenotypes() that also takes a Set<Allele> to handle this type of behavior.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5832 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 13:59:16 +00:00
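The isVariant() vs. isPolymorphic() distinction from commit 6a49e8df34 can be illustrated with a small sketch. It does not use the GATK VariantContext API: `SiteSketch`, its fields, and `subsetToSamples` are hypothetical names, and a `null` ALT stands in for `.` in the VCF.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a VCF record; not the GATK VariantContext API.
final class SiteSketch {
    final String ref;
    final String alt;                       // null stands for ALT = "."
    final Map<String, String[]> genotypes;  // sample name -> its two alleles

    SiteSketch(String ref, String alt, Map<String, String[]> genotypes) {
        this.ref = ref;
        this.alt = alt;
        this.genotypes = genotypes;
    }

    /** Is there an ALT allele on the record? */
    boolean isVariant() {
        return alt != null;
    }

    /** Is some sample non-ref among the samples on the record? */
    boolean isPolymorphic() {
        for (String[] alleles : genotypes.values()) {
            for (String allele : alleles) {
                if (!allele.equals(ref)) return true;
            }
        }
        return false;
    }

    /** Subsetting keeps the ALT allele even if no retained sample carries it. */
    SiteSketch subsetToSamples(Set<String> keep) {
        Map<String, String[]> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : genotypes.entrySet()) {
            if (keep.contains(e.getKey())) kept.put(e.getKey(), e.getValue());
        }
        return new SiteSketch(ref, alt, kept);
    }
}
```

Under this model, a record subset down to a hom-ref sample stays variant (ALT is present) but is no longer polymorphic, which matches the AC=0 records with a preserved ALT shown in the commit message.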
depristo e234589240 Contracts for GenomeLocParser and GenomeLoc are now fully implemented.
GenomeLocs can officially have any start/stop values from -Inf to +Inf.  Bounds w.r.t. the reference are enforced, optionally, by GenomeLocParser.  General code cleanup throughout the subsystem.

All validation code for GLs is now centralized, and all I/O systems now validate their inputs.  Because of this, the Picard interval processing code has been changed to examine whether an interval is valid, and only keep the valid intervals.  Note that the scatter/gather test was changed, because the original hg18 chr20 interval files were actually malformed (all records for some reason were on chr20).

Many interval processing routines were moved to IntervalUtils, as this is their natural home.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5830 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 02:01:59 +00:00
depristo e16bc2cbd9 Contracts for Java now written for GenomeLoc and GenomeLocParser. The semantics of GenomeLoc are now much clearer. It is no longer allowed to create invalid GenomeLocs -- you can only create them with well formed start, end, and contigs, with respect to the master dictionary. Where one previously created an invalid GenomeLoc, and asked is this valid, you must now provide the raw arguments to helper functions to assess this. Providing bad arguments to GenomeLoc generates UserExceptions now. Added utility functions contigIsInDictionary and indexIsInDictionary to help with this.
Refactored several Interval utilities from GenomeLocParser to IntervalUtils, where one might expect them to live

Removed GenomeLoc.clone() method, as this was not correctly implemented, and actually unnecessary, as GenomeLocs are immutable.  Several iterator classes have changed to remove their use of clone()

Removed misc. unnecessary imports

Disabled, temporarily, the validating pileup integration test, as it uses reads mapped to a different reference sequence for ecoli, and this now does not satisfy the contracts for GenomeLoc


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5827 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-20 15:43:27 +00:00
hanna 03452c15c0 Cleanup GATKBAMIndex unit test to allow a more efficient access pattern for
FindLargeShards.  Runtime of FindLargeShards on papuan dataset is now 75min.
GATK proper should benefit as well, although the benefits might be so small
as to not be measurable.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5798 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 21:50:33 +00:00
rpoplin 40797f9d45 Ensuring a minimum number of variants when clustering with bad variants. Better error message when Matrix library fails to calculate inverse.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5793 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 01:48:37 +00:00
ebanks dfdef2d29b PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible to vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5787 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 12:07:44 +00:00
hanna f275be6968 A 'fat shard' finder. Cranks through the indices of a BAM file or list of
BAM files looking for outliers (outliers right now are defined naively  as 
shards whose sizes are more than 5 stddevs away from the mean).  Runs in
13 minutes per chromosome on 707 low pass whole genome BAMs -- not great, but
much faster than running UG on the same region to discover anomalies.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5782 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-10 12:56:47 +00:00
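The naive outlier rule from commit f275be6968 (sizes more than 5 standard deviations from the mean) might look like the sketch below. `FatShardSketch` and `findOutliers` are hypothetical names, the shard sizes are assumed to be already extracted from the BAM indices, and the population standard deviation is used for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real FindLargeShards reads sizes from BAM indices.
final class FatShardSketch {
    /** Returns indices of sizes lying more than maxStdDevs standard deviations from the mean. */
    static List<Integer> findOutliers(long[] sizes, double maxStdDevs) {
        List<Integer> outliers = new ArrayList<>();
        if (sizes.length == 0) return outliers;

        double sum = 0.0;
        for (long s : sizes) sum += s;
        double mean = sum / sizes.length;

        double squaredDeviations = 0.0;
        for (long s : sizes) squaredDeviations += (s - mean) * (s - mean);
        double stdDev = Math.sqrt(squaredDeviations / sizes.length); // population stddev

        for (int i = 0; i < sizes.length; i++) {
            if (Math.abs(sizes[i] - mean) > maxStdDevs * stdDev) outliers.add(i);
        }
        return outliers;
    }
}
```

Because a single extreme value also inflates the mean and standard deviation, this rule only fires when there are enough ordinary shards to anchor the statistics, which is one reason the commit calls the definition naive.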
ebanks 15c7bd82a5 Fix for IndelRealigner memory problem. Now the Constrained mate fixing writer is told whether a read has been modified and, if it wasn't, can dump it when the cache needs to get flushed at places with tons of coverage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5777 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-06 19:34:41 +00:00
hanna c2e8c460cb Factor out all testing dependencies into a separate test configuration and
only download that test configuration when running unit/integration tests.
This means that the build will (hopefully) never break because it can't
fetch a file that isn't required for the GATK to run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 22:42:11 +00:00
delangel 7d7ce6cf00 Two embarrassing bug fixes:
a) Forgot to convert from phred to log-prob when computing gap penalties from recal table.
b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 17:08:34 +00:00
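Fix (a) in commit 7d7ce6cf00 concerns the conversion between phred-scaled values and log-probabilities. A minimal sketch of that conversion follows, assuming base-10 logarithms (the log does not state the base) and using the hypothetical class name `PhredSketch`. A phred score Q corresponds to an error probability of 10^(-Q/10).

```java
// Hypothetical helper; assumes "log-prob" means log10 of the probability.
final class PhredSketch {
    /** Q = -10 * log10(p), so log10(p) = -Q / 10. */
    static double phredToLog10Prob(double phred) {
        return -phred / 10.0;
    }

    /** Inverse conversion, from log10 probability back to phred scale. */
    static double log10ProbToPhred(double log10Prob) {
        return -10.0 * log10Prob;
    }

    public static void main(String[] args) {
        // A phred-scaled penalty of 30 corresponds to a probability of 10^-3.
        System.out.println(phredToLog10Prob(30.0));
    }
}
```

Skipping the conversion means treating a value like 30 as if it were a log-probability, which is off by both sign and a factor of ten.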
carneiro 3882d1b9c0 fixing the build \o/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 00:57:49 +00:00
hanna 5c6965575e Some refactoring that Mauricio and I worked through together. Changed filters
to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than
directly from net.sf.picard.filter.SamRecordFilter, which allows us to add
an initialize(GATKEngine) method so that filters can do any initialization
they'd like based on CL arguments, SAM headers, etc.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:29:08 +00:00
rpoplin 6c7a0adc76 Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 13:56:01 +00:00
rpoplin 23cd3a7a5d Moving VQSR v2 to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 20:20:06 +00:00
ebanks d4cbd8691c Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 16:38:55 +00:00
ebanks deed7c47a1 Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 17:54:41 +00:00
ebanks ab9ffb1a74 Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 16:03:45 +00:00
rpoplin b7334dcc1e Rank sum test annotations are the Z-scores from the test instead of the p-value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 14:35:00 +00:00
ebanks 45081c32d7 continuing from last night, the integration tests weren't covering the right behavior either
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 13:30:57 +00:00
droazen d650efd40a Fix for bug GSA-449: Intervals that are not in GATK format are not validated
to the same standard as GATK format intervals. Full validation against contig
bounds is now performed for all intervals, regardless of their source. Also
fixed a few tests for validation exclusions that were backwards.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 18:12:10 +00:00
delangel 600617a63c Enabled code to deal with hard-clipping adaptor sequence when processing reads in pileup in indel caller. Proven now that changes are minimal (4 fewer calls in NA12878 chr20, quals slightly different), minor changes in vcf fields in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5679 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 14:10:33 +00:00
hanna 7428ae338a A fix for Marian Thieme's NPE in the new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5675 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 19:47:14 +00:00
kshakir 8619f49d20 Added a utility method to retrieve the contig lengths for WG chunking.
Added a rudimentary GATKReportParser for parsing VE3 results.
Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils.
The tag type for .rod files is DBSNP, not ROD.
More explicit return types on implicit methods.
Added null checks for implicit string to/from file conversions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:22:21 +00:00
hanna 54660a8c25 Fix requested by Lee Lichtenstein: first check to see whether it's time for
a progress message, then aggregate metrics.  Makes the overhead of
printProgress in RealignerTargetCreator go from >20% to ~3%.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5663 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 03:22:48 +00:00
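The reordering described in commit 54660a8c25 (check whether a progress message is due before doing the expensive aggregation) can be sketched as below. `ProgressSketch`, its fields, and the injected timestamp parameter are hypothetical; the counter stands in for the real metric aggregation.

```java
// Hypothetical sketch of "check the clock first, aggregate second".
final class ProgressSketch {
    private final long intervalMillis;
    private long lastPrintMillis;
    int aggregations = 0; // counts how often the expensive step actually ran

    ProgressSketch(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastPrintMillis = startMillis;
    }

    /** Returns true if a progress message was emitted at time nowMillis. */
    boolean maybePrintProgress(long nowMillis) {
        // Cheap test first: most calls return here without touching any metrics.
        if (nowMillis - lastPrintMillis < intervalMillis) return false;

        aggregations++; // stand-in for the expensive metric aggregation
        lastPrintMillis = nowMillis;
        return true;
    }
}
```

With this ordering the per-call cost is a subtraction and a comparison, and the aggregation cost is paid only once per reporting interval.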
ebanks 49ea07acce My fixes to Tribble yesterday revealed that some of the test VCFs for integration tests were actually malformed. Also, Guillermo updated the b37 dbSNP VCF and that broke some tests. Should be good for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5655 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-17 03:39:11 +00:00
depristo 8ed9c0f518 VariantsToTable now blows up by default if you ask for a field that isn't present in a record.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5636 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 14:42:43 +00:00
hanna 22a11e41e1 Rewrite of GATKBAMIndex to avoid mmaps causing false reports of heavy memory
usage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5620 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:49:58 +00:00
rpoplin 30a19a00fe Fix for when running with EMIT_ALL_SITES but not GENOTYPE_GIVEN_ALLELES. Still want to emit a site even when over the deletion fraction for example.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5617 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 20:07:06 +00:00
delangel 3b424fd74d Enable new indel likelihood model by default, cleanup code, remove dead arguments, still more cleanups to follow. This isn't final version but at least it performs better in all cases than previous Dindel-based version, so no reason to keep old one around.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5615 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 17:54:46 +00:00
ebanks b6e7b5dace Updating to reflect my recent Tribble fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5601 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 11:48:00 +00:00
ebanks cd61ef7169 Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 03:07:39 +00:00
ebanks af09170167 As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 17:03:48 +00:00
rpoplin 3f3f35dea0 UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:06:24 +00:00
ebanks 4b451314b2 Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-05 22:20:09 +00:00
droazen db9908ec02 Small correction to the unit test code from my last commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:55:38 +00:00
droazen a5acb0b7a6 Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is
now thrown if the combination of -L and -XL intervals specified on the command 
line results in an empty interval set after set subtraction. 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 18:41:55 +00:00
depristo 095125152b Updated to no longer include 2nd-best base output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 20:13:10 +00:00
droazen 0927b7c297 Fix for bug GSA-441: BAM file list with blank lines gives a confusing error
message. Lines containing only whitespace in .list files are now ignored. 
Also added support for comments in .list files: lines whose first
non-whitespace character is '#' are now also ignored.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 15:04:35 +00:00
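The .list file rules from commit 0927b7c297 (skip whitespace-only lines, skip lines whose first non-whitespace character is '#') amount to a small filter. The sketch below is illustrative only; `ListFileSketch` and `parseEntries` are hypothetical names, and the input is assumed to be already split into lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the .list filtering rules; not the GATK implementation.
final class ListFileSketch {
    /** Returns the non-blank, non-comment entries, trimmed of surrounding whitespace. */
    static List<String> parseEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Ignore whitespace-only lines and lines starting with '#' after trimming.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            entries.add(trimmed);
        }
        return entries;
    }
}
```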
droazen 7b452ea2b9 Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 22:23:24 +00:00
rpoplin 5ddc0e464a Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
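The key-value tag syntax from commit 5ddc0e464a (`-B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0`) can be parsed along the following lines. This is a sketch under assumptions, not the GATK argument parser: `RodBindingSketch` and `parse` are hypothetical, and the input is assumed to be the text after `-B:`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "name,type[,key=value...]"; not the GATK command-line code.
final class RodBindingSketch {
    final String name;
    final String type;
    final Map<String, String> tags;

    private RodBindingSketch(String name, String type, Map<String, String> tags) {
        this.name = name;
        this.type = type;
        this.tags = tags;
    }

    static RodBindingSketch parse(String spec) {
        String[] fields = spec.split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected name,type[,key=value...]: " + spec);
        }
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 2; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) throw new IllegalArgumentException("malformed tag: " + fields[i]);
            tags.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
        }
        return new RodBindingSketch(fields[0], fields[1], tags);
    }
}
```

Tag values are kept as strings here; a walker would convert them (for example, `prior` to a double) at the point of use.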
depristo cd8321cdc9 Removed the completely unused generic but extremely expensive infrastructure for dynamic LocusIteratorFilters. Now the one, and probably only useful one, is called directly in the LocusIteratorByState itself to filter adaptor bases from reads. This shaves 10% off the runtime of all walkers, apparently. Has the additional benefit of eliminating a lot of complex infrastructure that resulted ultimately in only a single function call.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5525 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:48:24 +00:00
depristo 3bcd4c5d75 --simplifyBAM is now in the SAMFileWriterArgumentTypeDescriptor, as suggested by map. PrintReads has an integrationtest now that writes out a 1 MB bit of HiSeq normally, with compress 0, and with simplifyBAM on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5521 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 14:57:18 +00:00
ebanks 69646ff840 ... and the corresponding integration test update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5496 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 01:58:07 +00:00
ebanks 1c95208e26 Finally found the bug that everyone is reporting on GS. Iterators on PriorityQueues aren't guaranteed to return elements in sorted order (a pretty stupid contract) - so we were passing items to the constrained writer out of order. Just do a Collections.sort instead (1 line of code). Happy father's day!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5476 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 21:28:19 +00:00
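The PriorityQueue pitfall from commit 1c95208e26 is easy to reproduce: `java.util.PriorityQueue` documents that its iterator is not guaranteed to traverse elements in any particular order. The sketch below shows the copy-then-sort approach the commit describes; `PriorityQueueOrderSketch` and `sortedSnapshot` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the fix: never rely on PriorityQueue iteration order.
final class PriorityQueueOrderSketch {
    /** Copies the queue contents and sorts them, leaving the queue untouched. */
    static List<Integer> sortedSnapshot(PriorityQueue<Integer> queue) {
        List<Integer> items = new ArrayList<>(queue); // copy is in heap order, not sorted order
        Collections.sort(items);
        return items;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        Collections.addAll(queue, 5, 1, 4, 2, 3);
        System.out.println(sortedSnapshot(queue));
    }
}
```

The alternative of repeatedly calling `poll()` also yields sorted order but drains the queue, so sorting a copy is the simpler choice when the queue must be preserved.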
depristo 3e3ec85807 Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-13 15:24:22 +00:00
depristo ee8f2871f7 A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated integration tests to reflect this change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-12 02:03:48 +00:00
ebanks 3596c56602 New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 02:48:29 +00:00
rpoplin ff7edc4493 Minor bug fix in empiricalMu prior calculation in VQSR.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-10 00:42:38 +00:00
rpoplin 509daac9f7 Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-09 00:26:28 +00:00