gatk-3.8

Commit Graph

Author	SHA1	Message	Date
delangel	7d7ce6cf00	Two embarrassing bug fixes: a) Forgot to convert from phred to log-prob when computing gap penalties from recal table. b) Forgot to uncomment code to correctly deal with hard-clipped bases in a read. But because of this, had to do a short term workaround to at least temporarily return class from hardClipAdaptorSequence to GATKSAMRecord. Otherwise, I get exceptions when casting because somehow some reads in HiSeq get to be SAMRecord (which GATKSAMRecord inherits from) but some reads get to be BAMRecords (which can't be cast into GATKSAMRecord), not sure why. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5771 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 17:08:34 +00:00
kshakir	28b897d5de	Fixed O(N^2) operation when scattering interval files. Cleaned up intervals contig count function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 03:32:35 +00:00
carneiro	3882d1b9c0	fixing the build \o/ git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5767 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-05 00:57:49 +00:00
kshakir	8ad547e6c2	Fixed another interval bug where dividing up N intervals into N parts wasn't working. Minor updates to the FCPTest to match the changes due to using the old indel caller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 20:49:35 +00:00
hanna	5c6965575e	Some refactoring that Mauricio and I worked through together. Changed filters to extend from org.broadinstitute.sting.gatk.filters.ReadFilter rather than directly from net.sf.picard.filter.SamRecordFilter, which allows us to add an initialize(GATKEngine) method so that filters can do any initialization they'd like based on CL arguments, SAM headers, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5760 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 19:29:08 +00:00
rpoplin	6c7a0adc76	Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5750 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-04 13:56:01 +00:00
rpoplin	23cd3a7a5d	Moving VQSR v2 to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5740 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 20:20:06 +00:00
rpoplin	e73720c2db	Updating VQSLOD annotation description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5735 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 19:01:08 +00:00
ebanks	d4cbd8691c	Make the default that we only output SNPs (so that when I make another release we don't get flooded with questions about why the UG is all of a sudden so slow) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5729 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-03 16:38:55 +00:00
rpoplin	3224bbe750	New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-02 19:14:42 +00:00
ebanks	deed7c47a1	Continuing the epic fail, some of our existing integration tests were wrong because of the lazy loading failure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5712 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 17:54:41 +00:00
ebanks	ab9ffb1a74	Epic failure on the lazy loading of genotypes: if the input VCF had its samples unsorted and we used a walker that didn't require genotypes, then we would sort the samples but not load genotypes (and therefore the genotypes wouldn't match the samples anymore). Added simple integration test to cover this case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5711 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-29 16:03:45 +00:00
rpoplin	b7334dcc1e	Rank sum test annotations are the Z-scores from the test instead of the p-value. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5707 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 14:35:00 +00:00
ebanks	45081c32d7	continuing from last night, the integration tests weren't covering the right behavior either git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5706 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-28 13:30:57 +00:00
droazen	d650efd40a	Fix for bug GSA-449: Intervals that are not in GATK format are not validated to the same standard as GATK format intervals. Full validation against contig bounds is now performed for all intervals, regardless of their source. Also fixed a few tests for validation exclusions that were backwards. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5698 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-27 18:12:10 +00:00
chartl	7afeb1ab17	Removing broken imports (boo) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5692 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 18:55:25 +00:00
chartl	bc3fd70b0a	Removing the old association walker, switching test to just validate that MannWhitneyU is doing the right thing. Unit tests still pass. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5690 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 18:05:19 +00:00
kshakir	f619dd3ca7	Refactored IntervalUtils used to parse and scatter intervals for Queue. Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals. Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly. TODO: general purpose function to divide data evenly. Skip over comments when parsing picard analysis files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-26 00:06:00 +00:00
chartl	a56a2dfdb7	Nothing to see here. Move along. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5681 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-22 15:01:02 +00:00
delangel	600617a63c	Enabled code to deal with hard-clipping adaptor sequence when processing reads in pileup in indel caller. Proven now that changes are minimal (4 less calls in NA12878 chr20, quals slightly different), minor changes in vcf fields in integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5679 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-22 14:10:33 +00:00
hanna	7428ae338a	A fix for Marian Thieme's NPE in the new sharding system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5675 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-21 19:47:14 +00:00
kshakir	8619f49d20	Added a utility method to retrieve the contig lengths for WG chunking. Added a rudimentary GATKReportParser for parsing VE3 results. Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils. The tag type for .rod files is DBSNP, not ROD. More explicit return types on implicit methods. Added null checks for implicit string to/from file conversions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 19:22:21 +00:00
hanna	54660a8c25	Fix requested by Lee Lichtenstein: first check to see whether it's time for a progress message, then aggregate metrics. Makes the overhead of printProgress in RealignerTargetCreator go from >20% to ~3%. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5663 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-20 03:22:48 +00:00
ebanks	49ea07acce	My fixes to Tribble yesterday revealed that some of the test VCFs for integration tests were actually malformed. Also, Guillermo updated the b37 dbSNP VCF and that broke some tests. Should be good for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5655 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-17 03:39:11 +00:00
depristo	8ed9c0f518	VariantsToTable now blows up by default if you ask for a field that isn't present in a record. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5636 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-14 14:42:43 +00:00
kshakir	4bb573b1f5	Centralizing a bunch of Broad specific utility functions from code scattered in GSA-Firehose, PipelineTest, custom QScripts, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5631 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 21:29:02 +00:00
chartl	efe6c539ac	Re-enabling disabled test. Apparently T-tests are very picky about your using an unbiased variance. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5622 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 03:05:50 +00:00
chartl	42bc003f46	Oops. I'll need to look at this, I think it was accidentally enabled. Disabling for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5621 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-13 00:54:52 +00:00
hanna	22a11e41e1	Rewrite of GATKBAMIndex to avoid mmaps causing false reports of heavy memory usage. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5620 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-12 23:49:58 +00:00
rpoplin	30a19a00fe	Fix for when running with EMIT_ALL_SITES but not GENOTYPE_GIVEN_ALLELES. Still want to emit a site even when over the deletion fraction for example. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5617 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-12 20:07:06 +00:00
delangel	3b424fd74d	Enable new indel likelihood model by default, cleanup code, remove dead arguments, still more cleanups to follow. This isn't final version but at least it performs better in all cases than previous Dindel-based version, so no reason to keep old one around. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5615 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-12 17:54:46 +00:00
ebanks	b6e7b5dace	Updating to reflect my recent Tribble fix git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5601 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 11:48:00 +00:00
hanna	53db7b8faa	Did some refactoring which broke some unit tests, and then failed to run the unit tests. Definitely not my best effort... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5599 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 03:31:52 +00:00
ebanks	cd61ef7169	Re-enabling multi-threaded integration tests. To make this work, downsampling and annotations are disabled for this test so that we don't have randomization issues for it based on which shards get executed first. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5597 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-08 03:07:39 +00:00
hanna	32d502c122	Enable BAM OTF index writing by default. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5594 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 23:44:25 +00:00
ebanks	af09170167	As I threatened yesterday, I've moved the various and disparate randomization code out of the walkers. Now they all (except VQSRv1, whose days are numbered anyways) use a static generator available in the engine itself. Please use this from now on. The seed is reset before every individual integration test is run. I think there may still be an issue with the IndelRealigner but I need to confirm with the commit to see what testNG does. Integration tests are already broken anyways, so no big deal. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5589 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-07 17:03:48 +00:00
kshakir	45ebbf725c	Instead of always merging Picard interval files they are optionally merged by Sting Utils. Disabled the MFCP while the FCP gets an update. Minor updates to email messages for upcoming scala 2.9. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-06 21:12:05 +00:00
rpoplin	3f3f35dea0	UnifiedGenotyper now BAQs via ADD_TAG to facilitate using BAQed quals for GL calculations but unBAQed quals for annotation calculations. UnifiedGenotyper now produces SNP and indel calls simultaneously. 40 base mismatch intrinsic filter removed from UG to greatly simplify the code. RankSumTests are now standard annotations but the integration tests are commented out pending changes that will allow random annotations to work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5585 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-06 19:06:24 +00:00
ebanks	4b451314b2	Only store a read in the mate hash if it could possibly be moved. This reduces memory consumption especially when dealing with a case of tons of unmapped reads at the end of the bam; however, it's only mildly helpful for chr1 of the Papuans (there's a truly massive pileup 120Mb into it; more thought needed at a later point). Integration tests changed only because some of the reads in the original bam were busted to begin with (it's an old pilot 1000G bam). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5580 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-05 22:20:09 +00:00
chartl	79b5fa6cc5	Structural refactoring in advance of dichotomization statistics; generalization of statistical test infrastructure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5579 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-05 18:52:32 +00:00
chartl	bb6a30611c	Forgot to modify the test too. What a bad commit. Sorry guys. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5575 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-05 02:11:08 +00:00
droazen	db9908ec02	Small correction to the unit test code from my last commit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5572 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-04 18:55:38 +00:00
droazen	a5acb0b7a6	Fix for bug GSA-314: Detect -XL and -L incompatibility. An ArgumentException is now thrown if the combination of -L and -XL intervals specified on the command line results in an empty interval set after set subtraction. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5571 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-04 18:41:55 +00:00
depristo	095125152b	Updated to no longer include 2nd-best base output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5567 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-03 20:13:10 +00:00
droazen	0927b7c297	Fix for bug GSA-441: BAM file list with blank lines gives a confusing error message. Lines containing only whitespace in .list files are now ignored. Also added support for comments in .list files: lines whose first non-whitespace character is '#' are now also ignored. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5550 348d0f76-0448-11de-a6fe-93d51630548a	2011-04-01 15:04:35 +00:00
droazen	7b452ea2b9	Fix for bug GSA-430: Can't specify same BAM file twice on the command line. An ArgumentException with an appropriate error message and a list of the duplicate BAMs is now thrown in this case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5542 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-30 22:23:24 +00:00
hanna	deab9f0aa5	Initial work on proto-shard merger: - create size() method that returns an approximation of the uncompressed size in bytes of BAM span. I'll use this method as a protoshard weighting function until we determine how to normalize the weights across the different data access mechanisms (reads, reference, RODs). - Implementations of basic union/intersection/subtraction mechanisms for BAM spans; should be enough to get an accurate weight for two proto-shards put together. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5541 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-30 22:03:43 +00:00
chartl	328f89f66a	Minor changes to MannWhitneyU: - Comment fixes to better explain why two-sided test wants to use the LOWER (not higher) value for U - Much more direct testing of MWU functions - Uniform approximation was always using the < cumulant (sometimes the > cumulant should be used instead) - Uniform approximation currently not used (regime in which it was being used was not the right one -- not necessarily bad, but not an improvement over normal) + this particular approximation is for major imbalances of the form m >> n. Code may be altered in the future to use this method for this particular regime, if the method's not too slow. - Hook into one-sided test. RegionalAssociationRecalibrator: NaNs were being caused by presence of Infinity and -Infinity values out of the walker. Currently I'm just re-setting them to arbitrary post-whitened values, but the walker will be changed to prevent output of these values, and the "fix" will be undone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5539 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-30 17:03:02 +00:00
rpoplin	5ddc0e464a	Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 21:04:09 +00:00
chartl	f6dfdc7f3b	Single-tailed hypothesis testing in MWU git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5533 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-29 15:53:40 +00:00
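
Commit 0927b7c297 (GSA-441) states the .list file rules only in prose: lines containing only whitespace are ignored, and so are lines whose first non-whitespace character is '#'. The sketch below illustrates that filtering in Python. It is not the GATK's Java implementation; the function name `parse_list_lines` and the sample file names are hypothetical, chosen only for the example.

```python
from typing import Iterable, List


def parse_list_lines(lines: Iterable[str]) -> List[str]:
    """Return the usable entries of a .list file (hypothetical helper).

    Skips lines that contain only whitespace and lines whose first
    non-whitespace character is '#', as the commit message describes.
    """
    entries: List[str] = []
    for raw in lines:
        stripped = raw.strip()
        if not stripped:
            # Blank or whitespace-only line: ignore.
            continue
        if stripped.startswith("#"):
            # Comment line, possibly indented: ignore.
            continue
        entries.append(stripped)
    return entries


# Illustrative input; the file names are made up.
example = [
    "# first batch\n",
    "sample1.bam\n",
    "   \n",
    "  # indented comment\n",
    "sample2.bam\n",
]
print(parse_list_lines(example))  # ['sample1.bam', 'sample2.bam']
```

Stripping each line before testing it is what lets one check cover both indented comments and whitespace-only lines.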

1135 Commits (1d11e88899d37fcfae7020e959653569b821bc8d)