gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully) if everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Eric Banks	cac439bc5e	Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one. Added a small fix that slightly changed md5s.	2013-01-18 11:17:31 -05:00
Chris Hartl	08d2da9057	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-18 10:28:45 -05:00
Chris Hartl	bf5748a538	Forgot to actually put in the md5. Also with the new change to record pairing and filtering, the multiple-records integration test changed: the indel records (T/TG | T/TGACA) are matched up (rather than left separate) resulting in properly identifying mismatching alleles, rather than HET-UNAVAILABLE and UNAVAILABLE-HET. Very nice.	2013-01-18 10:25:36 -05:00
Chris Hartl	91030e9afa	Bugfix: records that get paired up during the resolution of multiple-records-per-site were not going into genotype-level filtering. Caught via testing. Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items: 1) Docs 2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp) 3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)	2013-01-18 09:49:48 -05:00
Eric Banks	39c73a6cf5	1. Ryan and I noticed that the FisherStrand annotation was completely busted for indels with reduced reads; fixed. 2. While making the previous fix and unifying FS for SNPs and indels, I noticed that FS was slightly broken in the general case for indels too; fixed. 3. I also fixed a minor bug in the Allele Biased Downsampling code for reduced reads.	2013-01-18 03:35:48 -05:00
Eric Banks	6a903f2c23	I finally gave up on trying to get the Haplotype/Allele merging to work in the HaplotypeCaller. I've resigned myself instead to create a mapping from Allele to Haplotype. It's cheap so not a big deal, but really shouldn't be necessary. Ryan and I are talking about refactoring for GATK2.5.	2013-01-18 01:21:08 -05:00
Eric Banks	953592421b	I think we got out of sync with the HC tests as we were clobbering each other's changes. Only differences here are to some RankSumTest values.	2013-01-17 09:19:21 -05:00
Eric Banks	a623cca89a	Bug fix for HaplotypeCaller, as reported on the forum: when reduced reads didn't completely overlap a deletion call, we were incorrectly trying to find the reference position of a base on the read that didn't exist. Added integration test to cover this case.	2013-01-16 22:47:58 -05:00
Mark DePristo	2a42b47e4a	Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART -- UnitTests now include combinational tiling of reads within and spanning shard boundaries -- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads -- Updating HC and CountReadsInActiveRegions integration tests	2013-01-16 15:30:00 -05:00
Eric Banks	d18dbcbac1	Added tests for changing IUPAC bases to Ns, for failing on bad ref bases, and for the HaplotypeCaller not failing when running over a region with an IUPAC base. Out of curiosity, why does Picard's IndexedFastaSequenceFile allow one to query for start position 0? When doing so, that base is a line feed (-1 offset to the first base in the contig) which is an illegal base (and which caused me no end of trouble)...	2013-01-16 14:55:33 -05:00
Eric Banks	392b5cbcdf	The CachingIndexedFastaSequenceFile now automatically converts IUPAC bases to Ns and errors out on other non-standard bases. This way walkers won't see anything except the standard bases plus Ns in the reference. Added option to turn off this feature (to maintain backwards compatibility). As part of this commit I cleaned up the BaseUtils code by adding a Base enum and removing all of the static indexes for each of the bases. This uncovered a bug in the way the DepthOfCoverage walker counts deletions (it was counting Ns instead!) that isn't covered by tests. Fortunately that walker is being deprecated soon...	2013-01-16 10:22:43 -05:00
Eric Banks	4fb3e48099	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 00:13:38 -05:00
Chris Hartl	327169b283	Refactor the method that identifies the site overlap type into the type enum class (so it can be used elsewhere potentially). Completed todo item: for sites like (eval) 20 12345 A C 20 12345 A AC (comp) 20 12345 A C 20 12345 A ACCC the records will be matched by the presence of a non-empty intersection of alleles. Any leftover records are then paired with an empty variant context (as though the call was unique). This has one somewhat counterintuitive feature, which is that normally (eval) 20 12345 A AC (comp) 20 12345 A ACCC would be classified as 'ALLELES_DO_NOT_MATCH' (and not counted in genotype tables), in the presence of the SNP, they're counted as EVAL_ONLY and TRUTH_ONLY respectively. + integration test	2013-01-15 12:13:45 -05:00
Mark DePristo	3c37ea014b	Retire original TraverseActiveRegion, leaving only the new optimized version -- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests	2013-01-15 10:24:45 -05:00
Eric Banks	94800771e3	1. Initial implementation of bam writing for the HaplotypeCaller with -bam argument; currently only assembled haplotypes are emitted. 2. Framework is set up in the VariantAnnotator for the HaplotypeCaller to be able to call in to annotate dbSNP plus comp RODs. Until the HC uses meta data though, this won't work.	2013-01-15 10:19:18 -05:00
Chris Hartl	682c59ff04	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-14 13:27:34 -05:00
Chris Hartl	61bc334df1	Ensure output table formatting does not contain NaNs. For (0 eval ref calls)/(0 comp ref calls), set the proportion to 0.00. Added integration tests (checked against manual tabulation)	2013-01-14 09:21:30 -05:00
Ryan Poplin	a7fe334a3f	calculating the md5s for the new tests.	2013-01-11 15:43:52 -05:00
Ryan Poplin	65afec2a53	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-11 15:22:52 -05:00
Mark DePristo	85b529cced	Updating MD5s in HC and UG that changed due to new LIBS -- Resolved what was clearly a bug in UG (GGA mode was returning a neighboring, equivalent indel site that wasn't in input list. Not ideal) -- Trivial read count differences in HC	2013-01-11 15:17:19 -05:00
Mark DePristo	fb9eb3d4ee	PileupElement and LIBS cleanup -- function to create pileup elements in AlignmentStateMachine and LIBS -- Cleanup pileup element constructors, directing users to LIBS.createPileupFromRead() that really does the right thing	2013-01-11 15:17:17 -05:00
Mark DePristo	b53286cc3c	HaplotypeCaller mode to skip assembly and genotyping for performance testing -- Added HCPerformance evaluation Qscript -- Added some docs about one of the HC integration tests -- HaplotypeCaller / ART performance evaluation script	2013-01-11 15:17:16 -05:00
Ryan Poplin	e952296c10	Adding HC GGA integration test to cover duplicated input alleles.	2013-01-11 15:01:27 -05:00
Ryan Poplin	7f7f40f851	Adding additional HC GGA integration tests to cover more complicated input alleles.	2013-01-11 14:36:21 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Chris Hartl	31a5f88c4f	Expanded unit tests to cover the Concordance Metrics class fairly uniformly.	2013-01-10 14:33:47 -05:00
Chris Hartl	c1de92b511	Add in some todo items	2013-01-09 13:16:06 -05:00
Chris Hartl	8d126161e2	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-09 13:15:04 -05:00
Eric Banks	4fa439d89e	Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependencies now	2013-01-09 11:06:10 -05:00
Chris Hartl	b56754606b	Initial break-out of GenotypeConcordance as a standalone walker. Some basic functionality testing. Currently performs only a pairwise comparison, but is very careful about proper tabulation through the GenotypeType enum.	2013-01-09 00:34:07 -05:00
Eric Banks	ee7d85c6e6	Move around the DiploidGenotype classes (so it can be used by the GATKPaperGenotyper)	2013-01-08 15:53:11 -05:00
Eric Banks	b099e2b4ae	Moving integration tests to protected	2013-01-08 09:34:08 -05:00
Ryan Poplin	4f95f850b3	Bug fix in the HC's allele mapping for multi-allelic events. Using the allele alone as a key isn't sufficient because alleles change when the reference allele changes during VariantContextUtils.simpleMerge for multi-allelic events.	2013-01-07 11:05:44 -05:00
Eric Banks	52067f0549	Handle merge conflicts	2013-01-06 12:29:12 -05:00
Chris Hartl	41bc416b65	Remove AAL and update MD5s.	2013-01-04 16:46:14 -05:00
Eric Banks	dd7f5e2be7	Hooking up the Bayesian estimate code for calculating Qemp in BQSR; various fixes after adding unit tests.	2013-01-04 14:43:11 -05:00
Chris Hartl	3753209584	One md5sum slipped past in the HC integration test.	2013-01-02 15:09:28 -05:00
Chris Hartl	e1d09ab0db	QD is now divided by the average length of the alternate allele (weighted by the allele count). The average length is stored in a related annotation, "AAL", which can be used to re-compute the "old" QD by simple multiplication. Integration tests should all pass.	2013-01-02 14:41:29 -05:00
Ryan Poplin	c8cd6ac465	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-20 14:58:04 -05:00
Ryan Poplin	a098888f4d	Updating missed UG md5	2012-12-20 14:57:53 -05:00
Tad Jordan	b491c177ff	Added functionality of outputting sorted GATKReport Tables - Added an optional argument to BaseRecalibrator to produce sorted GATKReport Tables - Modified BQSR Integration Tests to include the optional argument. Tests now produce sorted tables	2012-12-20 14:02:21 -05:00
Ryan Poplin	54e5c84018	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-19 11:31:40 -05:00
Ryan Poplin	aa39037be8	updating UG integration tests.	2012-12-19 11:31:35 -05:00
Eric Banks	70479cb71d	RR bug fix: we were failing when a read started with an insertion just at the edge of the consensus region. The weird part is that the comments claimed it was doing what it was supposed to, but it didn't actually do it. Now we maintain the last header element of the consensus (but without bases and quals) if it adjoins an element with an insertion. Added the user's test file as an integration test.	2012-12-19 10:59:07 -05:00
David Roazen	07b369ca7e	Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package This is an intermediate commit so that there is a record of these changes in our commit history. Next step is to isolate the test classes as well, and then move the entire package to the Picard repository and replace it with a jar in our repo. -Removed all dependencies on org.broadinstitute.sting (still need to do the test classes, though) -Had to split some of the utility classes into "GATK-specific" vs generic methods (eg., GATKVCFUtils vs. VCFUtils) -Placement of some methods and choice of exception classes to replace the StingExceptions and UserExceptions may need to be tweaked until everyone is happy, but this can be done after the move.	2012-12-19 10:25:22 -05:00
Ryan Poplin	92185dd5f4	updating HC integration tests.	2012-12-19 10:12:07 -05:00
Ryan Poplin	98f18b5f9e	Changing the HC over to using the non-contamination-downsampled read maps for the purposes of annotations. This behavior now matches the UG. There is a new command line option to go back to the older behavior to explore the differences.	2012-12-17 11:27:44 -05:00
Mauricio Carneiro	5f1afb4136	Fixing an off-by-one clipping error in ReduceReads for reads off the contig Reads that are soft-clipped off the contig (before the beginning of the contig) were being soft-clipped to position 0 instead of 1 because of an off-by-one issue. Fixed and included in the integration test.	2012-12-13 22:10:11 -05:00
Mauricio Carneiro	74344a3871	Bringing in the changes from the CMI repo	2012-12-13 21:59:37 -05:00
Mark DePristo	aeab932c63	Actual working version of unflushing VCFWriter -- Uses high-performance local writer backed by byte array that writes the entire VCF line in some write operation to the underlying output stream. -- Fixes problems with indexing of unflushed writes while still allowing efficient block zipping -- Same (or better) IO performance as previous implementation -- IndexingVariantContextWriter now properly closes the underlying output stream when it's closed -- Updated compressed VCF output file	2012-12-13 16:15:08 -05:00
Mauricio Carneiro	33290bfe0c	Added integration test to catch the read off contig in ReduceReads. So upstream changes won't break it again.	2012-12-12 13:49:54 -05:00
Mark DePristo	5632c13bf2	Resolves GSA-681 / Compressed VCF.gz output is too big because of unnecessary call to flush(). -- Now compressed output VCFs are properly blocked compressed (i.e., they are actually smaller than the uncompressed VCF)	2012-12-12 10:27:07 -05:00
Mark DePristo	dd52a70d45	Fix AFCalcResult unit test -- I was simply passing in the wrong values into the function. Fixed the calls, and expanded the docs on what needs to be passed in.	2012-12-11 10:40:12 -05:00
Eric Banks	bdda63d973	Related bug fixes to GGA mode in the HC: some variants (especially MNPs) were causing problems because they don't have to start at the current location to match the allele being genotyped. Fixed.	2012-12-10 14:47:04 -05:00
David Roazen	46edab6d6a	Use the new downsampling implementation by default -Switch back to the old implementation, if needed, with --use_legacy_downsampler -LocusIteratorByStateExperimental becomes the new LocusIteratorByState, and the original LocusIteratorByState becomes LegacyLocusIteratorByState -Similarly, the ExperimentalReadShardBalancer becomes the new ReadShardBalancer, with the old one renamed to LegacyReadShardBalancer -Performance improvements: locus traversals used to be 20% slower in the new downsampling implementation, now they are roughly the same speed. -Tests show a very high level of concordance with UG calls from the previous implementation, with some new calls and edge cases that still require more examination. -With the new implementation, can now use -dcov with ReadWalkers to set a limit on the max # of reads per alignment start position per sample. Appropriate value for ReadWalker dcov may be in the single digits for some tools, but this too requires more investigation.	2012-12-10 09:44:50 -05:00
Eric Banks	574d5b467f	Bug fix for indel HMM: protect against situation where long reads (e.g. Sanger) in a pileup can lead to a read starting after the haplotype end for a given haplotype.	2012-12-09 02:09:34 -05:00
Mark DePristo	d0cab795b7	Got caught in the middle of a bad integration test, that was fixed in independent push. Moved test bam into testdata.	2012-12-05 14:49:22 -05:00
Eric Banks	ef87b18e09	In retrospect, it wasn't a good idea to have FisherStrand handle reduced reads since they are always on the forward strand. For now, FS ignores reduced reads but I've added a note (and JIRA) to make this work once the RR het compression is enabled (since we will have directionality in reads then).	2012-12-05 02:00:35 -05:00
Eric Banks	726332db79	Disabling the testNoCmdLineHeaderStdout test in UG because it keeps crashing when I run it locally	2012-12-05 00:54:00 -05:00
Eric Banks	bca860723a	Updating tests to handle bad validation data files (that used the wrong qual score encoding); overrides push from stable.	2012-12-03 22:01:07 -05:00
Ryan Poplin	d5ed184691	Updating the HC integration test md5s. According to the NA12878 knowledge base this commit cuts down the FP rate by more than 50 percent with no loss in sensitivity.	2012-12-03 15:38:59 -05:00
Ryan Poplin	156d6a5e0b	misc minor bug fixes to GenotypingEngine.	2012-12-03 12:47:35 -05:00
Mark DePristo	2849889af5	Updating md5 for UG	2012-12-01 14:24:19 -05:00
Mark DePristo	c676853731	Merged bug fix from Stable into Unstable. Updating md5s Conflicts: protected/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java	2012-11-28 12:54:36 -05:00
Mark DePristo	a1d6461121	Critical bugfix to AFCalcResult affecting UG/HC quality score emission thresholds As reported by Menachem Fromer: a critical bug in AFCalcResult: Specifically, the implementation: public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) { return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef; } seems incorrect and should probably be: getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef The issue here is that the 30 represents a Phred-scaled probability of error and it's currently being compared to a log probability of non-error. Instead, we need to require that our probability of error be less than the error threshold. This bug has only a minor impact on the calls -- hardly any sites change -- which is good. But the inverted logic affects multi-allelic sites significantly. Basically you only hit this logic with multiple alleles, and in that case it's including extra alt alleles incorrectly, and throwing out good ones. Change was to create a new function that properly handles thresholds that are PhredScaled quality scores: /** * Same as #isPolymorphic but takes a phred-scaled quality score as input */ public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) { if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 "); final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual)); return isPolymorphic(allele, log10Threshold); }	2012-11-28 12:08:02 -05:00
Ryan Poplin	59cef880d1	Updating HC integration tests because experimental, HC-specific annotations have been removed.	2012-11-26 12:20:07 -05:00
Ryan Poplin	c3b7dd1374	Misc cleanup in the HaplotypeCaller. Cleaning up unused arguments after recent changes to HC-GenotypingEngine	2012-11-26 12:19:11 -05:00
Ryan Poplin	fedc4fde6c	Merged bug fix from Stable into Unstable	2012-11-25 21:55:55 -05:00
Ryan Poplin	d978cfe835	Soft clipped bases shouldn't be counted in the delocalized BQSR.	2012-11-25 21:55:29 -05:00
Eric Banks	937ac7290f	Lots more GGA fixes for the HC now that I understand what's going on internally. Integration tests pass except for the GGA test which I believe now produces better results.	2012-11-20 16:13:29 -05:00
Eric Banks	ff180a8e02	Significant refactoring of the Haplotype Caller to handle problems with GGA. The main fix is that we now maintain a mapping from 'original' allele to 'Smith-Waterman-based' allele so that we no longer need to do a (buggy) matching throughout the calling process.	2012-11-19 09:09:57 -05:00
Mauricio Carneiro	e35fd1c717	Merging CMI-0.5.0 and GATK-2.2 together.	2012-11-14 10:42:03 -05:00
Mauricio Carneiro	a17cd54b68	Co-Reduction implementation in ReduceReads ReduceReads now co-reduces bams if they're passed in together with multiple -I. Co-reduction forces every variant region in one sample to be a variant region in all samples. Also: * Added integration test for co-reduction * Fixed bug with new no-recalculation implementation of the marksites object where the last object wasn't being removed after finalizing a variant region (updated MD5's accordingly) DEV-200 #resolve #time 8m	2012-11-14 10:33:21 -05:00
Eric Banks	e93d461910	Adding integration test to BQSR for the csv file	2012-11-09 09:11:04 -05:00
Eric Banks	2da76db945	Updating integration tests	2012-11-06 22:23:05 -08:00
Eric Banks	0a2dded093	Fixes for bugs uncovered by unit tests	2012-11-06 16:07:40 -08:00
Eric Banks	b07106b3a7	Reimplement the allele biased downsampling to be smarter. Now we don't blindly pull n% of reads off of each allele. Instead, we try all possible genotype conformations for the contaminating sample and choose the one that provides the best genotype for the target sample (based heuristically on allele balance). This method allows us to save some of the reads that belong to the target sample, which should make Daniel M happy. Added unit tests to test the biased downsampling functionality.	2012-11-06 14:39:58 -08:00
Mark DePristo	1444cd753b	Bugfix for GSA-647 HaplotypeCaller misses good variant because the active region doesn't trigger for an exome -- The logic for determining active regions was a bit broken in the HC when intervals were used in the system -- TraverseActiveRegions now uses the AllLocus view, since we always want to see all reference sites, not just those covered. Simplifies logic of TAR -- Non-overlapping intervals are always treated as separate objects for determining active / inactive state. This means that each exon will stand on its own when deciding if it should be active or inactive -- Misc. cleanup, docs of some TAR infrastructure to make it safer and easier to debug in the future. -- Committing the SingleExomeCalling script that I used to find this problem, and will continue to use in evaluating calling of a single exome with the HC -- Make sure to get all of the reads into the set of potentially active reads, even for genomic locations that themselves don't overlap the engine intervals but may have reads that overlap the regions -- Remove excessively expensive calls to check bases are upper cased in ReferenceContext -- Update md5s after a lot of manual review and discussion with Ryan	2012-11-01 15:34:04 -04:00
Eric Banks	f8af8a2355	Moving UG integration tests to protected since they use protected-only contamination filtering. Adding a new UGLite integration test to confirm that contamination filtering is ignored in lite.	2012-10-31 21:28:07 -04:00
Guillermo del Angel	51a9ce28e1	Merge remote-tracking branch 'unstable/master' into develop	2012-10-31 10:29:48 -04:00
Ryan Poplin	4e661847b2	DelocalizedBaseRecalibrator becomes the BaseRecalibrator.	2012-10-29 12:53:39 -04:00
David Roazen	35483a7eef	Update MD5s for PrintReads with BQSR Integration Test The MD5s for these tests were changed in commit 87435f1074615b2cd016f042980109fd53962c8d to match the output of a broken version of BaseRecalibration. With the patch in commit c397102ecc1fd1d2cd8f209a8f358ab4a60b50a7, the output once again matches the original MD5s for these tests, and does not vary as you increase -nct. Final resolution to GSA-632	2012-10-26 14:25:25 -04:00
Eric Banks	b06f689d4b	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-26 02:13:26 -04:00
Eric Banks	bf3d61ce82	The default value for --contamination_fraction_to_filter is now 0.05 (5%) in both UG and HC. Users of GATK-lite get pushed down to 0% by default (since it's not enabled) or get a user error if they try to set it.	2012-10-26 01:04:51 -04:00
Mark DePristo	cc8c12b954	Committing a broken version of BaseRecalibration -- I'm committing because there's some kind of fundamental problem with the ReadCovariates cache, in that historical data isn't being cleared / computed properly, and I'd rather it fail for a while than leave it in JIRA. -- The integration tests test the -nct with PrintReads to get 1, 2, 4 and the 4 fails. But that's because of this incorrect calculation -- Updating GATKPerformanceOverTime with the new @ClassType annotation	2012-10-25 14:46:35 -04:00
David Roazen	02018ca764	Legacy BaseRecalibrator walker is neither TreeReducible nor NanoSchedulable The old BaseRecalibrator walker is not and never will be thread-safe, since it's a LocusWalker that uses read attributes to track state. ONLY the newer DelocalizedBaseRecalibrator is believed likely to be thread-safe at this point. It is safe to run the DelocalizedBaseRecalibrator with -nct > 1 for testing purposes, but wait for further testing to be done before using it for production purposes in multithreaded mode.	2012-10-24 15:22:50 -04:00
Ryan Poplin	a27ee26481	updating HC integration test.	2012-10-24 14:08:39 -04:00
Ryan Poplin	094db7bf24	We now require at least 10 samples to merge variants into complex events in the HC. Added a new population based bam for the complex event integration test.	2012-10-24 14:07:36 -04:00
Mauricio Carneiro	4cd1a92358	Updating RR integration tests Forgot to update the integration tests after merging DEV-117 with optimizations from GATK main repo.	2012-10-23 11:26:26 -04:00
Mauricio Carneiro	c210b7cde4	Merge GATK repo into CMI-GATK Bringing in the following relevant changes: * Fixes the indel realigner N-Way out null pointer exception DEV-10 * Optimizations to ReduceReads that bring the run time to 1/3rd. Conflicts: protected/java/src/org/broadinstitute/sting/gatk/walkers/compression/reducereads/SlidingWindow.java DEV-10 #resolve #time 2m	2012-10-23 10:59:11 -04:00
Mauricio Carneiro	bbf7a0fb09	Adding integration test to ReduceReads coreduction DEV-117 #resolve	2012-10-23 10:56:33 -04:00
Mark DePristo	90f59803fd	MaxAltAlleles now defaults to 6, no more MaxAltAllelesForIndels -- Updated StandardCallerArgumentCollection to remove MaxAltAllelesForIndels. Previous argument is deprecated with meaningful doc message for people to use maxAltAlleles -- All constructors, factory methods, and test builders and their users updated to provide just a single argument -- Updating MD5s for integration tests that change due to genotyping more alleles -- Adding more alleles to genotyping results in slight changes in the QUAL value for multi-allelic loci where one or more alleles aren't polymorphic. That's simply due to the way that alternative hypotheses contribute as reference evidence against each true allele. The effect can be large (new qual = old qual / 2 in one case here). -- If we want more precision in our estimates we could decide (Eric, should we discuss?) to actually separately do a discovery phase in the genotyping, eliminate all variants not considered polymorphic, and then do a final round of calling to get the exact QUAL value for only those that are segregating. This would have the value of having the QUAL stay constant as more alleles are genotyped, at the cost of some code complexity increase and runtime. Might be worth it though	2012-10-22 13:47:56 -04:00
Mark DePristo	9f2851d769	Updating UnifiedGenotyperGeneralPloidyIntegrationTest following rebasing -- Created a JIRA ticket https://jira.broadinstitute.org/browse/GSA-623 for Guillermo to look at the differences as the multi-allelic nature of many sites seems to change with the new more protected infrastructure. This may be due to implementation issues in the pooled caller, problems with my interface, or could be a genuine improvement.	2012-10-21 20:23:11 -04:00
Mark DePristo	d21e42608a	Updating integration tests for minor changes due to switching to EXACT_INDEPENDENT model by default	2012-10-21 12:43:46 -04:00
Mark DePristo	0fcd358ace	Original EXACT model implementation lives, providing another reference (bi-allelic only) EXACT model -- Potentially a very fast implementation (it's very clean) but restricted to the biallelic case -- A starting point for future bi-allelic only optimized (logless) or generalized (bi-allelic general ploidy) implementations -- Added systematic unit tests covering this implementation, and comparing it to others -- Uncovered a nasty normalization bug in StateTracker that was capping our likelihoods at 0, even after summing up multiple likelihoods, which is just not safe to do and was causing us to lose likelihood in some cases -- Removed the restriction that a likelihood be <= 0 in StateTracker, and the protection for these cases in GeneralPloidyExactAFCalc which just wasn't right	2012-10-21 12:42:31 -04:00
Mark DePristo	eaffb814d3	IndependentExactAFCalc is now the default EXACT model implementation -- Changed UG / HC to use this one via the StandardCallerArgumentCollection -- Update the AFCalcFactory.Calculation to have a getDefault() value instead of having a duplicate entry in the enums	2012-10-21 12:42:31 -04:00
Mark DePristo	326f429270	Bugfixes to make new AFCalc system pass integration tests -- GeneralPloidyExactAFCalc turns -Infinity values into -Double.MAX_VALUE, so our calculations pass unit tests -- Bugfix for GeneralPloidyGenotypeLikelihoodsCalculationModel, return a null VC when the only allele we get from our final alleles to use method is the reference base -- Fix calculation of reference posteriors when P(AF == 0) = 0.0 and P(AF == 0) = X for some meaningful value of X. Added unit test to ensure this behavior is correct -- Fix horrible sorting bug in IndependentAllelesDiploidExactAFCalc that applied the theta^N priors in the wrong order. Add contract to ensure this doesn't ever happen again -- Bugfix in GLBasedSampleSelector, where VCs without any polymorphic alleles were being sent to the exact model	2012-10-21 12:42:31 -04:00
Mark DePristo	99c9031cb4	Merge AFCalcResultTracker into StateTracker, cleanup -- These two classes were really the same, and now they are actually the same! -- Cleaned up the interfaces, removed duplicate data -- Added lots of contracts, some of which found numerical issues with GeneralPloidyExactAFCalc (which have been patched over but not fixed) -- Moved goodProbability and goodProbabilityVector utilities to MathUtils. Very useful for contracts!	2012-10-21 12:42:31 -04:00
Guillermo del Angel	e9b7324dc1	Merge branch 'master' of ssh://gsa3/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-10-21 12:38:49 -04:00
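
Note on commit a1d6461121: the fix described there comes down to a unit conversion, turning a Phred-scaled quality threshold into a log10 probability before comparing it with a log10 posterior. The following is a minimal standalone sketch of that arithmetic, not the GATK code. The class name PhredThresholdSketch is hypothetical, the posterior is passed in directly as a number instead of being looked up per allele, and qualToProb is assumed to mean 1 - 10^(-Q/10), which may differ from what QualityUtils.qualToProb does in the actual codebase.

```java
public class PhredThresholdSketch {

    /** Error probability for a Phred-scaled quality Q: 10^(-Q/10). */
    public static double qualToErrorProb(final double qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    /** Assumed meaning of qualToProb: probability of being correct, 1 - 10^(-Q/10). */
    public static double qualToProb(final double qual) {
        return 1.0 - qualToErrorProb(qual);
    }

    /** Both arguments are log10 probabilities, so they are directly comparable. */
    public static boolean isPolymorphic(final double log10PosteriorOfAFGt0,
                                        final double log10minPNonRef) {
        return log10PosteriorOfAFGt0 >= log10minPNonRef;
    }

    /** Same test, but the threshold is given as a Phred-scaled quality (e.g. 30). */
    public static boolean isPolymorphicPhredScaledQual(final double log10PosteriorOfAFGt0,
                                                       final double minPNonRefPhredScaledQual) {
        if (minPNonRefPhredScaledQual < 0)
            throw new IllegalArgumentException(
                    "phredScaledQual " + minPNonRefPhredScaledQual + " < 0");
        // Convert the Phred threshold to log10 space before comparing.
        final double log10Threshold = Math.log10(qualToProb(minPNonRefPhredScaledQual));
        return isPolymorphic(log10PosteriorOfAFGt0, log10Threshold);
    }

    public static void main(final String[] args) {
        // Q30 corresponds to an error probability of 0.001, so P(AF > 0) must be at least 0.999.
        final double confident = Math.log10(0.9995); // above the 0.999 cutoff
        final double marginal = Math.log10(0.99);    // below the 0.999 cutoff
        System.out.println(isPolymorphicPhredScaledQual(confident, 30.0));
        System.out.println(isPolymorphicPhredScaledQual(marginal, 30.0));
    }
}
```

With a threshold of Q30, a posterior P(AF > 0) of 0.9995 passes and 0.99 does not, because the cutoff in probability space is 0.999. Comparing a log10 posterior (always at most 0) against the raw value 30 mixes two different scales, which is the mismatch the commit message describes.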

259 Commits (d004bfbe6f3fda62fc9d83c0611db00dfd0b530e)