gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	b7d59ea13b	LIBS unit test debugging should be false	2013-04-08 12:47:47 -04:00
Mauricio Carneiro	ebe2edbef3	Fix caching indices in the PairHMM Problem: -------- PairHMM was generating positive likelihoods (even after the re-work of the model) Solution: --------- The caching indices were never re-initializing the initial conditions in the first position of the deletion matrix. Also the match matrix was being wrongly initialized (there is not necessarily a match in the first position). This commit fixes both issues on both the Logless and the Log10 versions of the PairHMM. Summarized Changes: ------------------ * Redesign the matrices to have only 1 col/row of padding instead of 2. * PairHMM class now owns the caching of the haplotype (keeps track of last haplotypes, and decides where the caching should start) * Initial condition (in the deletionMatrix) is now updated every time the haplotypes differ in length (this was wrong in the previous version) * Adjust the prior and probability matrices to be one based (logless) * Update Log10PairHMM to work with prior and probability matrices as well * Move prior and probability matrices to parent class * Move and rename padded lengths to parent class to simplify interface and prevent off by one errors in new implementations * Simple cleanup of PairHMMUnitTest class for a little speedup * Updated HC and UG integration test MD5's because of the new initialization (without enforcing match on first base). * Create static indices for the transition probabilities (for better readability) [fixes #47399227]	2013-04-08 11:05:12 -04:00
Eric Banks	6253ba164e	Using --keepOriginalAC in SelectVariants was causing it to emit bad VCFs * This occurred when one or more alleles were lost from the record after selection * Discussed here: http://gatkforums.broadinstitute.org/discussion/comment/4718#Comment_4718 * Added some integration tests for --keepOriginalAC (there were none before)	2013-04-05 00:53:28 -04:00
Eric Banks	7897d52f32	Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec). * As reported here: http://gatkforums.broadinstitute.org/discussion/comment/4270#Comment_4270 * This was a commit into the variant.jar; the changes here are a rev of that jar and handling of errors in VF * Added integration test to confirm failure with User Error * Removed illegal header line in KB test VCF that was causing related tests to fail.	2013-04-05 00:52:32 -04:00
Eric Banks	14bbba0980	Optimization to method for getting values in ArgumentMatch * Very trivial, but I happened to see this code and it drove me nuts so I felt compelled to refactor it. * Instead of iterating over keys in map to get the values, just iterate over the values...	2013-04-04 23:30:47 -04:00
Ryan Poplin	8a93bb687b	Critical bug fix for the case of duplicate map calls in ActiveRegionWalkers with exome interval lists. -- When consecutive intervals were within the bandpass filter size the ActiveRegion traversal engine would create duplicate active regions. -- Now when flushing the activity profile after we jump to a new interval we remove the extra states which are outside of the current interval. -- Added integration test which ensures that the output VCF contains no duplicate records. Was failing test before this commit.	2013-04-03 13:15:30 -04:00
David Roazen	2eac97a76c	Remove auto-creation of fai/dict files for fasta references -A UserException is now thrown if either the fai or dict file for the reference does not exist, with pointers to instructions for creating these files. -Gets rid of problematic file locking that was causing intermittent errors on our farm. -Integration tests to verify that correct exceptions are thrown in the case of a missing fai / dict file. GSA-866 #resolve	2013-04-02 18:34:08 -04:00
Mark DePristo	e7a8e6e8ee	Merge pull request #140 from broadinstitute/dr_interval_intersection_bug_GSA-909 Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists	2013-04-02 11:59:01 -07:00
David Roazen	5baf906c28	Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists -The algorithm for finding the intersection of two sets of intervals relies on the sortedness of the intervals within each set, but the engine was not sorting the intervals before attempting to find the intersection. -The result was that if one or both interval lists was unsorted / lexicographically sorted, we would often fail to find the intersection correctly. -Now the IntervalBinding sorts all sets of intervals before returning them, solving the problem. -Added an integration test for this case. GSA-909 #resolve	2013-04-02 14:01:52 -04:00
Ryan Poplin	a58a3e7e1e	Merge pull request #134 from broadinstitute/mc_phmm_experiments PairHMM rework	2013-04-01 12:10:43 -07:00
Mark DePristo	7c83efc1b9	Merge pull request #135 from broadinstitute/mc_pgtag_fix Fixing @PG tag uniqueness issue	2013-03-31 11:36:40 -07:00
Guillermo del Angel	9686e91a51	Added small feature to VariantFiltration to filter sites outside of a given mask: -- Sometimes it's desirable to specify a set of "good" regions and filter out other stuff (like say an alignability mask or a "good regions" mask). But by default, the -mask argument in VF will only filter sites inside a particular mask. New argument -filterNotInMask will reverse default logic and filter outside of a given mask. -- Added integration test, and made sure we also test with a BED rod.	2013-03-31 08:48:16 -04:00
Mauricio Carneiro	ec475a46b1	Fixing @PG tag uniqueness issue The Problem: ------------ The SAM spec does not allow multiple @PG tags with the same id. Our @PG tag writing routines were allowing that to happen with the boolean parameter "keep_all_pg_records". How this fixes it: ------------------ This commit removes that option from all the utility functions and cleans up the code around the classes that used these methods off-spec. Summarized changes: ------------------- * Remove keep_all_pg_records option from setupWriter utility methods in Util * Update all walkers to now replace the last @PG tag of the same walker (if it already exists) * Cleanup NWaySamFileWriter now that it doesn't need to keep track of the keep_all_pg_records variable * Simplify the multiple implementations to setupWriter Bamboo: ------- http://gsabamboo.broadinstitute.org/browse/GSAUNSTABLE-PARALLEL31 Issue Tracker: -------------- [fixes 47100885]	2013-03-30 20:31:33 -04:00
Mauricio Carneiro	52e67a6973	ReviewedStingException -> IllegalStateException	2013-03-30 20:11:55 -04:00
Guillermo del Angel	6b8bed34d0	Big bad bug fix: feature added to LeftAlignAndTrimVariants to left align multiallelic records didn't work. -- Corrected logic to pick biallelic vc to left align. -- Added integration test to make sure this feature is tested and feature to trim bases is also tested.	2013-03-30 19:31:28 -04:00
Mauricio Carneiro	0de6f55660	PairHMM rework The current implementation of the PairHMM had issues with the probabilities and the state machines. Probabilities were not adding up to one because: # Initial conditions were not being set properly # Emission probabilities in the last row were not adding up to 1 The following commit fixes both by # averaging all potential start locations (giving an equal prior to the state machine in its first iteration -- allowing the read to start its alignment anywhere in the haplotype with equal probability) # discounting all paths that end in deletions by not adding the last row of the deletion matrix and summing over all paths ending in matches and insertions (this saves us from a fourth matrix to represent the end state) Summarized changes: * Fix LoglessCachingPairHMM and Log10PairHMM according to the new algorithm * Refactor probabilities check to throw exception if we ever encounter probabilities greater than 1. * Rename LoglessCachingPairHMM to LoglessPairHMM (this is the default implementation in the HC now) * Rename matrices to matchMatrix, insertionMatrix and deletionMatrix for clarity * Rename metric lengths to read and haplotype lengths for clarity * Rename private methods to initializePriors (distance) and initializeProbabilities (constants) for clarity * Eliminate first row constants (because they're not used anyway!) and directly assign initial conditions in the deletionMatrix * Remove unnecessary parameters from updateCell() * Fix the expected probabilities coming from the exact model in PairHMMUnitTest * Neatify PairHMM class (removed unused methods) and PairHMMUnitTest (removed unused variables) * Update MD5s: Probabilities have changed according to the new PairHMM model and as expected HC and UG integration tests have new MD5s. [fix 47164949]	2013-03-30 10:50:06 -04:00
Guillermo del Angel	8fbf9c947f	Upgrades and changes to LeftAlignVariants, motivated by 1000G consensus indel production: -- Added ability to trim common bases in front of indels before left-aligning. Otherwise, records may not be left-aligned if they have common bases, as they will be mistaken for complex records. -- Added ability to split multiallelic records and then left align them, otherwise we miss a lot of good left-alignable indels. -- Motivated by this, renamed walker to LeftAlignAndTrimVariants. -- Code refactoring, cleanup and bring up to latest coding standards. -- Added unit testing to make sure left alignment is performed correctly for all offsets. -- Changed phase 3 HC script to new syntax. Add command line options, more memory and reduce alt alleles because jobs keep crashing.	2013-03-29 10:02:06 -04:00
Chris Hartl	73d1c319bf	Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils Currently, the multi-allelic test is covering the following case: Eval A T,C Comp A C reciprocate this so that the reverse can be covered. Eval A C Comp A T,C And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample. This comes from the fact that "truth" reference alleles can be paired with any allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine: Eval: A G,T 0/0 2/0 2/2 1/1 Comp: A C,T 0/0 1/0 0/0 0/0 Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows: Sample1: ref called ref Sample2: alleles don't match (the alt allele of the comp was not assessed in eval) Sample3: ref called hom-var Sample4: alleles don't match (the alt allele of the eval was not assessed in comp) Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference. Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed: - dotProduct(several types). logDotProduct is used extensively, but not the real-space version. 
- vectorSum - array shuffle, random subset - countOccurances (general forms, the char form is used in the codebase) - getNMaxElements - array permutation - sorted array permutation - compare floats - sum() (for integer arrays and lists). Final keyword was extensively added to MathUtils. The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired). The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself. In addition, more extensive tests were added for - logBinomialCoefficient (Newton's identity should always hold) - logFactorial - log10sumlog10 and its approximation All unit tests pass	2013-03-28 23:25:28 -04:00
MauricioCarneiro	a2b69790a6	Merge pull request #128 from broadinstitute/eb_rr_polyploid_compression_GSA-639	2013-03-28 06:39:43 -07:00
Mark DePristo	12475cc027	Display the active MappingQualityFilter if mmq > 0 in the HaplotypeCaller	2013-03-26 14:27:18 -04:00
Mark DePristo	ad04fdb233	PerReadAlleleLikelihoodMap getMostLikelyAllele now returns a MostLikelyAllele object -- This new functionality allows the client to make decisions about how to handle non-informative reads, rather than having a single enforced constant that isn't really appropriate for all users. The previous functionality is maintained now and used by all of the updated pieces of code, except the BAM writers, which now emit reads to display to their best allele, regardless of whether this is particularly informative or not. That way you can see all of your data realigned to the new HC structure, rather than just those that are specifically informative. -- This all makes me concerned that the informative thresholding isn't appropriately used in the annotations themselves. There are many cases where nearby variation makes specific reads non-informative about one event, due to not being informative about the second. For example, suppose you have two SNPs A/B and C/D that are in the same active region but separated by more than the read length of the reads. All reads would be non-informative as no read provides information about the full combination of 4 haplotypes, as the reads only span a single event. In this case our annotations will all fall apart, returning their default values. Added a JIRA to address this (should be discussed in group meeting)	2013-03-26 14:27:13 -04:00
Eric Banks	593d3469d4	Refactored the het (polyploid) consensus creation in ReduceReads. * It is now cleaner and easier to test; added tests for newly implemented methods. * Many fixes to the logic to make it work * The most important change was that after triggering het compression we actually need to back it out if it creates reads that incorporated too many softclips at any one position (because they get unclipped). * There was also an off-by-one error in the general code that only manifested itself with het compression. * Removed support for creating a het consensus around deletions (which was broken anyways). * Mauricio gave his blessing for this. * Het compression now works only against known sites (with -known argument). * The user can pass in one or more VCFs with known SNPs (other variants are ignored). * If no known SNPs are provided het compression will automatically be disabled. * Added SAM tag to stranded (i.e. het compressed) reduced reads to distinguish their strandedness from normal reduced reads. * GATKSAMRecord now checks for this tag when determining whether or not the read is stranded. * This allows us to update the FisherStrand annotation to count het compressed reduced reads towards the FS calculation. * [It would have been nice to mark the normal reads as unstranded but then we wouldn't be backwards compatible.] * Updated integration tests accordingly with new het compressed bams (both for RR and UG). * In the process of fixing the FS annotation I noticed that SpanningDeletions wasn't handling RR properly, so I fixed it too. * Also, the test in the UG engine for determining whether there are too many overlapping deletions is updated to handle RR. * I added a special hook in the RR integration tests to additionally run the systematic coverage checking tool I wrote earlier. * AssessReducedCoverage is now run against all RR integration tests to ensure coverage is not lost from original to reduced bam. 
* This helped uncover a huge bug in the MultiSampleCompressor where it would drop reads from all but 1 sample (now fixed). * AssessReducedCoverage moved from private to protected for packaging reasons. * #resolve GSA-639 At this point, this commit encompasses most of what is needed for het compression to go live. There are still a few TODO items that I want to get in before the 2.5 release, but I will save those for a separate branch because as it is I feel bad for the person who needs to review all these changes (sorry, Mauricio).	2013-03-25 09:34:54 -04:00
Mauricio Carneiro	eb33da6820	Added support to reduce reads to Callable Loci -- added calls to representativeCount() of the pileup instead of using ++ -- renamed CallableLoci integration test -- added integration test for reduce read support on callable loci	2013-03-21 15:53:04 -04:00
Mark DePristo	7ae15dadbe	HC now by default only uses reads with MAPQ >= 20 for assembly and calling -- Previously we tried to include lots of these low mapping quality reads in the assembly and calling, but we effectively were just filtering them out anyway while generating an enormous amount of computational expense to handle them, as well as much larger memory requirements. The new version simply uses a read filter to remove them upfront. This causes no major problems -- at least, none that don't have other underlying causes -- compared to 10-11mb of the KB -- Update MD5s to reflect changes due to no longer including mmq < 20 by default	2013-03-21 13:10:50 -04:00
Mark DePristo	3a8f001c27	Misc. fixes upon pull request review -- DeBruijnAssemblerUnitTest and AlignmentUtilsUnitTest were both in DEBUG = true mode (bad!) -- Remove the maxHaplotypesToConsider feature of HC as it's not useful	2013-03-20 22:54:37 -04:00
Mark DePristo	98c4cd060d	HaplotypeCaller now uses SeqGraph instead of kmer graph to build haplotypes. -- DeBruijnAssembler functions are no longer static. This isn't the right way to unit test your code -- Add a HaplotypeCaller command line option to use low-quality bases in the assembly -- Refactored DeBruijnGraph and associated libraries into base class -- Refactored out BaseEdge, BaseGraph, and BaseVertex from DeBruijn equivalents. These DeBruijn versions now inherit from these base classes. Added some reasonable unit tests for the base and Debruijn edges and vertex classes. -- SeqVertex: allows multiple vertices in the sequence graph to have the same sequence and yet be distinct -- Further refactoring of DeBruijnAssembler in preparation for the full SeqGraph <-> DeBruijnGraph split -- Moved generic methods in DeBruijnAssembler into BaseGraph -- Created a simple SeqGraph that contains SeqVertex objects -- Simple chain zipper for SeqGraph that reproduces the results for the mergeNode function on DeBruijnGraphs -- A working version of the diamond remodeling algorithm in SeqGraph that converts graphs that look like A -> Xa, A -> Ya, Xa -> Z, Ya -> Z into A -> X -> a, A -> Y -> a, a -> Z -- Allow SeqGraph zip merging of vertices where the in vertex has multiple incoming edges or the out vertex has multiple outgoing edges -- Fix all unit tests so they work with the new SeqGraph system. All tests passed without modification. -- Debugging makes it easier to tell which kmer graph contributes to a haplotype -- Better docs and unit tests for BaseVertex, SeqVertex, BaseEdge, and KMerErrorCorrector -- Remove unnecessary printing of cleaning info in BaseGraph -- Turn off kmer graph creation in DeBruijnAssembler.java -- Only print SeqGraphs when debugGraphTransformations is set to true -- Rename DeBruijnGraphUnitTest to SeqGraphUnitTest. Now builds DeBruijnGraph, converts to SeqGraph, uses SeqGraph.mergenodes and tests for equality. 
-- Update KBestPathsUnitTest to use SeqGraphs not DebruijnGraphs -- DebruijnVertex no longer takes kmer argument -- it's implicit that the kmer length is the sequence.length now	2013-03-20 22:54:36 -04:00
Mark DePristo	ffea6dd95f	HaplotypeCaller now has the ability to only consider the best N haplotypes for genotyping -- Added a -dontGenotype mode for testing assembly efficiency -- However, it looks like this has a very negative impact on the quality of the results, so the code should be deleted	2013-03-20 22:54:36 -04:00
Mark DePristo	a8fb26bf01	A generic downsampler that reduces coverage for a bunch of reads -- Exposed the underlying minElementsPerStack parameter for LevelingDownsampler	2013-03-20 22:54:35 -04:00
Mark DePristo	752440707d	AlignmentUtils.calcNumDifferentBases computes the number of bases that differ between a reference and read sequence given a cigar between the two.	2013-03-20 22:54:35 -04:00
Geraldine Van der Auwera	d70bf64737	Created new DeprecatedToolChecks class --Based on existing code in GenomeAnalysisEngine --Hashmaps hold mapping of deprecated tool name to version number and recommended replacement (if any) --Using FastUtils for maps; specifically Object2ObjectMap but there could be a better type for Strings... --Added user exception for deprecated annotations --Added deprecation check to AnnotationInterfaceManager.validateAnnotations --Run when annotations are initialized --Made annotation sets instead of lists	2013-03-20 06:46:02 -04:00
Geraldine Van der Auwera	6b4d88ebe9	Created ListAnnotations utility (extends CommandLineProgram) --Refactored listAnnotations basic method out of VA into HelpUtils --HelpUtils.listAnnotations() is now called by both VA and the new ListAnnotations utility (lives in sting.tools) --This way we keep the VA --list option but we also offer a way to list annotations without a full valid VA command-line, which was a pain users continually complained about --We could get rid of the VA --list option altogether ...?	2013-03-20 06:15:27 -04:00
Geraldine Van der Auwera	95a9ed853d	Made some documentation updates & fixes --Mostly doc block tweaks --Added @DocumentedGATKFeature to some walkers that were undocumented because they were ending up in "uncategorized". Very important for GSA: if a walker is in public or protected, it HAS to be properly tagged-in. If it's not ready for the public, it should be in private.	2013-03-20 06:15:20 -04:00
Mark DePristo	d7bec9eb6e	AssessNA12878 bugfixes -- @Output isn't required for AssessNA12878 -- Previous version would count non-variant sites in NA12878 that resulted from subsetting a multi-sample VC to NA12878 as CALLED_BUT_NOT_IN_DB sites. Now they are properly skipped -- Bugfix for subsetting samples to NA12878. Previous version wouldn't trim the alleles when subsetting down a multi-sample VCF, so we'd have false FN/FP sites at indels when the multi-sample VCF has alleles that result in the subset for NA12878 having non-trimmed alleles. Fixed and unit tested now.	2013-03-18 15:48:08 -04:00
Ami Levy-Moonshine	0e9c1913ff	Fix typos in argument docs and in printed output in CoveredByNSamplesSites and rewrite an inaccurate comment	2013-03-18 13:54:21 -04:00
Mark DePristo	2b80068164	Merged bug fix from Stable into Unstable	2013-03-18 12:36:21 -04:00
Mark DePristo	7ab7c873a1	Temp. fix to PairHMM to avoid bad likelihoods -- Simply caps PairHMM likelihoods from rising above 0 by taking the min of the likelihood and 0. Will be properly fixed in GATK 2.5 with better PairHMM implementation.	2013-03-18 12:34:51 -04:00
David Roazen	a67d8c8dd6	Bump timeout for MaxRuntimeIntegrationTest Looks like returning this timeout to its original value was a bit too aggressive -- adding 40 seconds to the tolerance limit.	2013-03-17 16:17:29 -04:00
David Roazen	742a7651e9	Further tweaking of test timeouts Increase one timeout, restore others that were only timing out due to the Java crypto lib bug to their original values. -DOUBLE timeout for NanoSchedulerUnitTest.testNanoSchedulerInLoop() -REDUCE timeout for EngineFeaturesIntegrationTest to its original value -REDUCE timeout for MaxRuntimeIntegrationTest to its original value -REDUCE timeout for GATKRunReportUnitTest to its original value	2013-03-15 14:49:21 -04:00
Mark DePristo	8317cc155e	Merge pull request #108 from broadinstitute/eb_bqsr_out_of_bounds_fix Added check in the MalformedReadFilter for reads without stored bases (i...	2013-03-14 17:29:35 -07:00
MauricioCarneiro	6f0269df2c	Merge pull request #107 from broadinstitute/eb_fix_bqsr_clip_exception	2013-03-14 14:40:06 -07:00
Eric Banks	232afdcbea	Added check in the MalformedReadFilter for reads without stored bases (i.e. that use ''). We now throw a User Error for such reads * User can override this to filter instead with --filter_bases_not_stored * Added appropriate unit test	2013-03-14 17:17:26 -04:00
droazen	0fd9f0e77c	Merge pull request #104 from broadinstitute/eb_fix_output_annotation_GSA-837 Fixed the logic of the @Output annotation and its interaction with 'required'	2013-03-14 12:52:00 -07:00
Ryan Poplin	38914384d1	Changing CALLED_IN_DB_UNKNOWN_STATUS to count as TRUE_POSITIVEs in the simplified stats for AssessNA12878.	2013-03-14 14:44:18 -04:00
Eric Banks	6d6264b108	Merge pull request #105 from broadinstitute/gg_annotations_cleanup_45802765 Cleaned up annotations	2013-03-14 11:35:00 -07:00
Geraldine Van der Auwera	61349ecefa	Cleaned up annotations - Moved AverageAltAlleleLength, MappingQualityZeroFraction and TechnologyComposition to Private - VariantType, TransmissionDisequilibriumTest, MVLikelihoodRatio and GCContent are no longer Experimental - AlleleBalanceBySample, HardyWeinberg and HomopolymerRun are Experimental and available to users with a big bold caveat message - Refactored getMeanAltAlleleLength() out of AverageAltAlleleLength into GATKVariantContextUtils in order to make QualByDepth independent of where AverageAltAlleleLength lives - Unrelated change, bundled in for convenience: made HC argument includeUnmappedreads @Hidden - Removed unnecessary check in AverageAltAlleleLength	2013-03-14 14:26:48 -04:00
Eric Banks	7cab709a88	Fixed the logic of the @Output annotation and its interaction with 'required'. ALL GATK DEVELOPERS PLEASE READ NOTES BELOW: I have updated the @Output annotation to behave differently and to include a 'defaultToStdout' tag. * The 'defaultToStdout' tags lets walkers specify whether to default to stdout if -o is not provided. * The logic for @Output is now: * if required==true then -o MUST be provided or a User Error is generated. * if required==false and defaultToStdout==true then the output is assigned to stdout if no -o is provided. * this is the default behavior (i.e. @Output with no modifiers). * if required==false and defaultToStdout==false then the output object is null. * use this combination for truly optional outputs (e.g. the -badSites option in AssessNA12878). * I have updated walkers so that previous behavior has been maintained (as best I could). * In general, all @Outputs with default long/short names have required=false. * Walkers with nWayOut options must have required==false and defaultToStdout==false (I added checks for this) * I added unit tests for @Output changes with David's help (thanks!). * #resolve GSA-837	2013-03-14 11:58:51 -04:00
Eric Banks	573ed07ad0	Fixed reported bug in BQSR for RNA seq alignments with Ns. * ClippingOp updated to incorporate Ns in the hard clips. * ReadUtils.getReadCoordinateForReferenceCoordinate() updated to account for Ns. * Added test that covers the BQSR case we saw. * Created GSA-856 (for Mauricio) to add lots of tests to ReadUtils. * It will require refactoring code and not in the scope of what I was willing to do to fix this.	2013-03-14 11:26:52 -04:00
Eric Banks	ff87b62fe3	Fixed bug in SelectVariants where maxIndelSize argument wasn't getting applied to deletions. Added unit tests and docs.	2013-03-13 15:11:34 -04:00
Mark DePristo	b5b63eaac7	New GATKSAMRecord concept of a strandless read, update to FS -- Strandless GATK reads are ones where they don't really have a meaningful strand value, such as Reduced Reads or fragment merged reads. Added GATKSAMRecord support for such reads, along with unit tests -- The merge overlapping fragments code in FragmentUtils now produces strandless merged fragments -- FisherStrand annotation generalized to treat strandless as providing 1/2 the representative count for both strands. This means that merged fragments are properly handled from the HC, so we don't hallucinate fake strand-bias just because we managed to merge a lot of reads together. -- The previous getReducedCount() wouldn't work if a read was made into a reduced read after getReducedCount() had been called. Added new GATKSAMRecord method setReducedCounts() that does the right thing. Updated SlidingWindow and SyntheticRead to explicitly call this function, and so the readTag parameter is now gone. -- Update MD5s for change to FS calculation. Differences are just minor updates to the FS	2013-03-13 11:16:36 -04:00
Mark DePristo	925846c65f	Cleanup of FragmentUtils -- Code was undocumented, big, and not well tested. All three things fixed. -- Currently not passing, but the framework works well for testing -- Added concat(byte[] ... arrays) to utils	2013-03-13 07:36:20 -04:00
David Roazen	8ed78b453f	Increase timeout for a test in the EngineFeaturesIntegrationTest -This test was intermittently failing when run on the farm	2013-03-12 23:53:26 -04:00
Mark DePristo	b3f67899b5	Merge pull request #101 from broadinstitute/dr_fix_failing_parallel_tests Fix more tests that fail when run in parallel on the farm	2013-03-12 14:11:02 -07:00
David Roazen	cdb1fa1105	Fix more tests that fail when run in parallel on the farm -Allow the default S3 put timeout of 30 seconds for GATKRunReports to be overridden via a constructor argument, and use a timeout of 300 seconds for tests. The timeout remains 30 seconds in all other cases. -Change integration tests that themselves dispatch farm jobs into pipeline tests. Necessary because some farm nodes are not set up as submit hosts. Pipeline tests are still run directly on gsa4. -Bump up the timeout for the MaxRuntimeIntegrationTest even more (was still occasionally failing on the farm!)	2013-03-12 16:53:30 -04:00
Geraldine Van der Auwera	f972963918	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) GATK-73 updated docs for bqsr args GATK-9 differentiate CountRODs from CountRODsByRef GATK-76 generate GATKDoc for CatVariants GATK-4 made resource arg required GATK-10 added -o, some docs to CountMales; some docs to CountLoci GATK-11 fixed by MC's -o change; straightened out the docs. GATK-77 fixed references to wiki GATK-76 Added Ami's doc block GATK-14 Added note that these annotations can only be used with VariantAnnotator GATK-15 specified required=false for two arguments GATK-23 Added documentation block GATK-33 Added documentation GATK-34 Added documentation GATK-32 Corrected arg name and docstring in DiffObjects GATK-32 Added note to DO doc about reference (required but unused) GATK-29 Added doc block to CountIntervals GATK-31 Added @Output PrintStream to enable -o GATK-35 Touched up docs GATK-36 Touched up docs, specified verbosity is optional GATK-60 Corrected GContent annot module location in gatkdocs GATK-68 touched up docs and arg docstrings GATK-16 Added note of caution about calling RODRequiringAnnotations as a group GATK-61 Added run requirements (num samples, min genotype quality) Tweaked template and generic doc block formatting (h2 to h3 titles) GATK-62 Added a caveat to HR annot Made experimental annotation hidden GATK-75 Added setup info regarding BWA GATK-22 Clarified some argument requirements GATK-48 Clarified -G doc comments GATK-67 Added arg requirement GATK-58 Added annotation and usage docs GSATDG-96 Corrected doc Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)	2013-03-12 10:57:14 -04:00
Guillermo del Angel	695723ba43	Two features useful for ancient DNA processing. Ancient DNA sequencing data is in many ways different from modern data, and methods to analyze it need to be adapted accordingly. Feature 1: Read adaptor trimming. Ancient DNA libraries typically have very short inserts (on the order of 50 bp), so typical Illumina libraries sequenced in, say, 100bp HiSeq will have a large adaptor component being read after the insert. If this adaptor is not removed, data will not be alignable. There are third party tools that remove adaptor and potentially merge read pairs, but are cumbersome to use and require precise knowledge of the library construction and adaptor sequence. -- New walker ReadAdaptorTrimmer walks through paired end data, computes pair overlap and trims auto-detected adaptor sequence. -- Unit tests added for trimming operation. -- Utility walker (may be retired later) DetailedReadLengthDistribution computes insert size or read length distribution stratified by read group and mapping status and outputs a GATKReport with data. -- Renamed MaxReadLengthFilter to ReadLengthFilter and added ability to specify minimum read length as a filter (may be useful if, as a consequence of adaptor trimming, we're left with a lot of very short reads which will map poorly and will just clutter output BAMs). Feature 2: Unbiased site QUAL estimation: many times ancestral allele status is not known and VCF fields like QUAL, QD, GQ, etc. are affected by the pop. gen. prior at a site. This might introduce subtle biases in studies where a species is aligned against the reference of another species, so an option for UG and HC not to apply such prior is introduced. -- Added -noPrior argument to StandardCallerArgumentCollection. -- Added option not to fill priors if such argument is set. -- Added an integration test.	2013-03-09 18:18:13 -05:00
Yossi Farjoun	baad965a57	- Changed loadContaminationFile file parser to delimit by tab only. This allows spaces in sampleIDs, which apparently are allowed. - This was needed since samples with spaces in their names are regularly found in the picard pipeline. - Modified the tests to account for this (removed spaces from the good tests, and changed the failing tests accordingly) - Cleaned up the unit tests using a @DataProvider (I'm in love...). - Moved AlleleBiasedDownsamplingUtilsUnitTest to public to match location of class it is testing (due to the way bamboo operates)	2013-03-07 13:04:24 -05:00
David Roazen	3ab78543a7	Fix tests that were consistently or intermittently failing when run in parallel on the farm -Make MaxRuntimeIntegrationTest more lenient by assuming that startup overhead might be as long as 120 seconds on a very slow node, rather than the original assumption of 20 seconds -In TraverseActiveRegionsUnitTest, write temp bam file to the temp directory, not to the current working directory -SimpleTimerUnitTest: This test was internally inconsistent. It asserted that a particular operation should take no more than 10 milliseconds, and then asserted again that this same operation should take no more than 100 microseconds (= 0.1 millisecond). On a slow node it could take slightly longer than 100 microseconds, however. Changed the test to assert that the operation should require no more than 10000 microseconds (= 10 milliseconds) -change global default test timeout from 20 to 40 minutes (things just take longer on the farm!) -build.xml: allow runtestonly target to work with scala test classes	2013-03-06 13:56:54 -05:00
Eric Banks	3759d9dd67	Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. * ReadTransformers can say they must be first, must be last, or don't care. * By default, none of the existing ones care about ordering except BQSR (must be first). * This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR. * The engine now orders the read transformers up front before applying iterators. * The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully). * Added unit tests.	2013-03-06 12:38:59 -05:00
Mark DePristo	446cd61f7e	Merge pull request #84 from broadinstitute/eb_allelic_primitives Added new walker to split MNPs into their allelic primitives (SNPs).	2013-03-06 09:02:21 -08:00
Eric Banks	78721ee09b	Added new walker to split MNPs into their allelic primitives (SNPs). * Can be extended to complex alleles at some point. * Currently only works for bi-allelics (documented). * Added unit and integration tests.	2013-03-05 23:16:42 -05:00
Mauricio Carneiro	e2d41f0282	Turning @Output required to false By default all output is assigned to stdout if a -o is not provided. Technically this makes @Output a not required parameter, and the documentation is misleading because it's reading from the annotation. GSA-820 #resolve	2013-03-05 17:26:16 -05:00
Eric Banks	2be57fbcfb	Merged bug fix from Stable into Unstable	2013-03-05 13:28:46 -05:00
Eric Banks	5e89f01e10	Don't allow the use of compressed (.gz) references in the GATK.	2013-03-05 13:28:19 -05:00
Mauricio Carneiro	d0c8105387	Cleaning up hilarious exception messages Too many users (with RNASeq reads) are hitting these exceptions that were never supposed to happen. Let's give them (and us) a better and clearer error message.	2013-03-04 16:52:22 -05:00
Mark DePristo	42d3919ca4	Expanded functionality for writing BAMs from HaplotypeCaller -- The new code includes a new mode to write out a BAM containing reads realigned to the called haplotypes from the HC, which can be easily visualized in IGV. -- Previous functionality maintained, with bug fixes -- Haplotype BAM writing code now lives in utils -- Created a base class that includes most of the functionality of writing reads realigned to haplotypes onto haplotypes. -- Created two subclasses, one that writes all haplotypes (previous functionality) and a CalledHaplotypeBAMWriter that will only write reads aligned to the actually called haplotypes -- Extended PerReadAlleleLikelihoodMap.getMostLikelyAllele to optionally restrict set of alleles to consider best -- Massive increase in unit tests in AlignmentUtils, along with several new powerful functions for manipulating cigars -- Fix bug in SWPairwiseAlignment that produces cigar elements with 0 size, and are now fixed with consolidateCigar in AlignmentUtils -- HaplotypeCaller now tracks the called haplotypes in the GenotypingEngine, and returns this information to the HC for use in visualization. -- Added extensive docs to HaplotypeCaller on how to use this capability -- BUGFIX -- don't modify the read bases in GATKSAMRecord in LikelihoodCalculationEngine in the HC -- Cleaned up SWPairwiseAlignment. Refactored out the big main and supplementary static methods. Added a unit test with a bug TODO to fix what seems to be an edge case bug in SW -- Integration test to make sure we can actually write a BAM for each mode. This test only ensures that the code runs and doesn't exception out. It doesn't actually enforce any MD5s -- HaplotypeBAMWriter also left aligns indels in the reads, as SW can return a random placement of a read against the haplotype. Calls leftAlign to make the alignments more clear, with unit test of real read to cover this case -- Writes out haplotypes for both all haplotype and called haplotype mode -- Haplotype writers now get the active region call, regardless of whether an actual call was made. Only emitting called haplotypes is moved down to CalledHaplotypeBAMWriter	2013-03-03 12:07:29 -05:00
depristo	6204e6ccc9	Merge pull request #76 from broadinstitute/md_kb_bugfix_GSA-795 Bug fixes and optimizations for NA12878 KB	2013-03-01 10:52:16 -08:00
Eric Banks	ebd5404124	Fixed the add functionality of GenomeLocSortedSet. * Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown. * Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases. * Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs. * Added more unit tests for add functionality of GLSS. * Resolves GSA-775.	2013-02-28 23:31:00 -05:00
Mark DePristo	4095a9ef32	Bugfixes for AssessNA12878 -- Refactor initialization routine into BadSitesWriter. This now adds the GQ and DP genotype header lines which are necessary if the input VCF doesn't have proper headers -- GATKVariantContextUtils subset to biallelics now tolerates samples with bad GL values for multi-allelics, where it just removes the PLs and issues a warning.	2013-02-28 10:35:06 -05:00
depristo	92d6a4f441	Merge pull request #75 from broadinstitute/eb_missing_rg_error_GSA-407 Added better error message for BAMs with bad read groups.	2013-02-28 05:20:39 -08:00
Eric Banks	12fc198b80	Added better error message for BAMs with bad read groups. * Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header. * Added integration tests to make sure that the correct error is thrown. * Resolved GSA-407.	2013-02-27 16:02:56 -05:00
Eric Banks	69b8173535	Replace uses of NestedHashMap with NestedIntegerArray. * Removed from codebase NestedHashMap since it is unused and untested. * Integration tests change because the BQSR CSV is now sorted automatically. * Resolves GSA-732	2013-02-27 14:03:39 -05:00
David Roazen	752f4335a5	Merged bug fix from Stable into Unstable	2013-02-27 05:20:41 -05:00
David Roazen	2a7af43164	Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest -Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper. Changed them to use PrintReads instead. -Moved ExampleUnifiedGenotyperPipelineTest to protected -Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest: After looking at this class a bit, I think the problem was the use of global arrays for the quals shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception will be thrown.	2013-02-27 04:45:53 -05:00
David Roazen	a53b4a7521	Merged bug fix from Stable into Unstable	2013-02-26 21:41:13 -05:00
David Roazen	65d31ba4ad	Fix runtime public -> protected dependencies in the test suite -replace unnecessary uses of the UnifiedGenotyper by public integration tests with PrintReads -move NanoSchedulerIntegrationTest to protected, since it's completely dependent on the UnifiedGenotyper	2013-02-26 21:19:12 -05:00
depristo	93205154b5	Merge pull request #63 from broadinstitute/eb_fix_pairhmm_unittest_GSA-776 Eb fix pairhmm unittest gsa 776	2013-02-26 11:56:58 -08:00
Mauricio Carneiro	711cbd3b5a	Archiving CoverageBySample This walker was not updated since 2009, and users were getting wrong answers when running it with ReduceReads. I don't want to deal with this because DiagnoseTargets does everything this walker does.	2013-02-26 13:49:00 -05:00
depristo	51d618de97	Merge pull request #62 from broadinstitute/rp_increase_max_kmer_in_assembly The maximum kmer length is derived from the reads.	2013-02-26 05:37:02 -08:00
Eric Banks	7519484a38	Refactored PairHMM.initialize to first take haplotype max length and then the read max length so that it is consistent with other PairHMM methods.	2013-02-25 15:04:23 -05:00
Ryan Poplin	89e2943dd1	The maximum kmer length is derived from the reads. -- This is done to take advantage of longer reads which can produce less ambiguous haplotypes -- Integration tests change for HC and BiasedDownsampling	2013-02-25 14:40:25 -05:00
David Roazen	3645ea9bb6	Sequence dictionary validation: detect problematic contig indexing differences The GATK engine does not behave correctly when contigs are indexed differently in the reads sequence dictionaries vs. the reference sequence dictionary, and the inconsistently-indexed contigs are included in the user's intervals. For example, given the dictionaries: Reference dictionary = { chrM, chr1, chr2, ... } BAM dictionary = { chr1, chr2, ... } and the interval "-L chr1", the engine would fail to correctly retrieve the reads from chr1, since chr1 has a different index in the two dictionaries. With this patch, we throw an exception if there are contig index differences between the dictionaries for reads and reference, AND the user's intervals include at least one of the mismatching contigs. The user can disable this exception via -U ALLOW_SEQ_DICT_INCOMPATIBILITY In all other cases, dictionary validation behaves as before. I also added comprehensive unit tests for the (previously-untested) SequenceDictionaryUtils class. GSA-768 #resolve	2013-02-25 11:14:22 -05:00
Ryan Poplin	6a639c8ffc	Replace Smith-Waterman alignment with the bubble traversal. -- Instead of doing a full SW alignment against the reference we read off bubbles from the assembly graph. -- Smith-Waterman is run only on the base composition of the bubbles which drastically reduces runtime. -- Refactoring graph functions into a new DeBruijnAssemblyGraph class. -- Bug fix in path.getBases(). -- Adding validation code to the assembly engine. -- Renaming SimpleDeBruijnAssembler to match the naming of the new Assembly graph class. -- Adding bug fixes, docs and unit tests for DeBruijnAssemblyGraph and KBestPaths classes. -- Added ability to ignore bubbles that are too divergent from the reference -- Max kmer can't be bigger than the extension size. -- Reverse the order that we create the assembly graphs so that the bigger kmers are used first. -- New algorithm for determining unassembled insertions based on the bubble traversal instead of the full SW alignment. -- Don't need the full read span reference loc for anything any more now that we clip down to the extended loc for both assembly and likelihood evaluation. -- Updating HaplotypeCaller and BiasedDownsampling integration tests. -- Rebased everything into one commit as requested by Eric -- improvements to the bubble traversal are coming as a separate push	2013-02-22 15:42:16 -05:00
depristo	2ad559cf58	Merge pull request #59 from broadinstitute/mc_reving_testng_GSA-695 Updating TestNG to the latest version	2013-02-22 10:39:04 -08:00
Mauricio Carneiro	4ac50c89ad	Updating TestNG to the latest version -- changed SkipException constructors that are now private in TestNG -- Updated build.xml to use the latest testng -- Added guice dependency to ivy -- Fixed broken SampleDBUnitTest The SampleDBUnitTest was only passing before because the map comparison in the old TestNG was broken. It was comparing two DIFFERENT samples and testing for "equals" GSA-695 #resolve	2013-02-22 09:40:23 -05:00
Mark DePristo	182c32a2b7	Relax bounds checking in QualityUtils.boundQual -- Previous version did runtime checking that qual >= 0 but BQSR was relying on boundQual to restore -1 to 1. So relax the bound.	2013-02-22 08:46:59 -05:00
Mark DePristo	8ac6d3521f	Vast improvements to AssessNA12878 code and functionality -- AssessNA12878 now breaks out multi-allelics into bi-allelic components. This means that we can properly assess multi-allelic calls against the bi-allelic KB -- Refactor AssessNA12878, moving into assess package in KB. Split out previously private classes in the walker itself into separate classes. Added real docs for all of the classes. -- Vastly expand (from 0) unit tests for NA12878 assessments -- Allow sites only VCs to be evaluated by Assessor -- Move utility for creating simple VCs from a list of string alleles from GATKVariantContextUtilsUnitTest to GATKVariantContextUtils -- Assessor bugfix for discordant records at a site. Previous version didn't handle properly the case where one had a non-matching call in the callset w.r.t. the KB, so that the KB element was eaten during the analysis. Fixed. UnitTested -- See GSA-781 -- Handle multi-allelic variants in KB for more information -- Bugfix for missing site counting in AssessNA12878. Previous version would count N misses for every missed value at a site. Not that this has much impact but it's worth fixing -- UnitTests for BadSitesWriter -- UnitTests for filtered and filtering sites in the Assessor -- Cleanup end report generation code (simplify the code). Note that instead of "indel" the new code will print out "INDELS" -- Assessor DoC calculations now use LIBS and RBPs for the depth calculation. The previous version was broken for reduced reads. Added unit test that reads a complex reduced read example and matches the DoC of this BAM with the output of the GATK DoC tool here. -- Added convenience constructor for LIBS using just SAMFileReader and an iterator. It's now easy to create a LIBS from a BAM at a locus. Added advanceToLocus function that moves the LIBS to a specific position. UnitTested via the assessor (which isn't ideal, but is a proper test)	2013-02-21 20:43:12 -05:00
Mark DePristo	29319bf222	Improved allele trimming code in GATKVariantContextUtils -- Now supports trimming the alleles from both the reverse and forward direction. -- Added lots of unit tests for forward allele trimming, as well as creating VC from forward and reverse trimming. -- Added docs and tests for the code, to bring it up to GATK spec	2013-02-21 12:01:43 -05:00
Eric Banks	6996a953a8	Haplotype/Allele based optimizations for the HaplotypeCaller that knock off nearly 20% of the total runtime (multi-sample). These 2 changes improve runtime performance almost as much as Ryan's previous attempt (with ID-based comparisons): * Don't unnecessarily overload Allele.getBases() in the Haplotype class. * Haplotype.getBases() was calling clone() on the byte array. * Added a constructor to Allele (and Haplotype) that takes in an Allele as input. * It makes a copy of the given allele without having to go through the validation of the bases (since the Allele has already been validated). * Rev'ed the variant jar accordingly. For the reviewer: all tests passed before rebasing, so this should be good to go as far as correctness.	2013-02-21 10:14:11 -05:00
Geraldine Van der Auwera	c3e01fea40	Added several more info types / annotations to GATKDocs -- top-level walker type (locus, read etc) -- parallelism options (nt or nct) -- annotation type (for Variant Annotations) -- downsampling settings that override engine defaults -- reference window size -- active region settings -- partitionBy info	2013-02-21 03:12:40 -05:00
Geraldine Van der Auwera	e674b4a524	Added new ReadFilter that allows users to specifically reassign one single mapping quality to a different value. Useful for TopHat and other RNA-seq software users.	2013-02-20 01:24:45 -05:00
MauricioCarneiro	76810465aa	Merge pull request #40 from broadinstitute/gg_retrieve_readfilters_GSATDG-63	2013-02-19 19:42:35 -08:00
Mark DePristo	910d966428	Extend timeout of NanoScheduler deadlock tests -- The previous timeout of 1 second was just dangerously short. Increase the timeout to 10 seconds	2013-02-19 20:25:25 -05:00
Eric Banks	0055a6f1cd	Merge pull request #45 from broadinstitute/mc_fix_indelrealigner_GSA-774 Fix to the Indel Realigner bug described in GSA-774	2013-02-19 16:16:48 -08:00
Geraldine Van der Auwera	faef85841b	Added GATKDocs fct to indicate default Read Filters for each tool -- Added getClazzAnnotations() as hub to retrieve various annotations values and class properties through reflection -- Added getReadFilters() method to retrieve Read Filter annotations -- getReadFilters() uses recursion to walk up the inheritance to also capture superclass annotations -- getClazzAnnotations() stores collected info in doc handler root, which is unit.forTemplate in Doclet -- Modified FreeMarker template to use the Readfilters info (displayed after arg table, before additional capabilities) -- Tadaaa :-) #GSATDG-63 resolve	2013-02-19 16:12:29 -05:00
Mauricio Carneiro	371ea2f24c	Fixed IndelRealigner reference length bug (GSA-774) -- modified ReadBin GenomeLoc to keep track of softStart() and softEnd() of the reads coming in, to make sure the reference will always be sufficient even if we want to use the soft-clipped bases -- changed the verification from readLength to aligned bases to allow reads with soft-clipped bases -- switched TreeSet -> PriorityQueue in the ConstrainedMateFixer as some different reads can be considered equal by picard's SAMRecordCoordinateComparator (the Set was replacing them) -- pulled out ReadBin class so it can be testable -- added unit tests for ReadBin with soft-clips -- added tests for getMismatchCount (AlignmentUtils) to make sure it works with soft-clipped reads GSA-774 #resolve	2013-02-19 16:00:36 -05:00
Mauricio Carneiro	815028edd4	Added verbose error message to the PluginManager -- added a logger.error with a more descriptive message of what the most likely cause of the error is Typical error happens when a walker's global variable is not initialized properly (usually in test conditions). The old error message was very hard to understand "Could not create module because of an exception of type NullPointerException ocurred caused by exception null"	2013-02-19 16:00:35 -05:00
Ryan Poplin	c025e84c8b	Fix for calculating read pos rank sum test with reads that are informative but don't actually overlap the variant due to some hard clipping. -- Updated a few integration tests for HC, UG, and UG general ploidy	2013-02-19 14:09:24 -05:00
Mark DePristo	be45edeff2	ActivityProfile and ActiveRegions respects engine interval boundaries -- Active regions are created as normal, but they are split and trimmed to the engine intervals when added to the traversal, if there are intervals present. -- UnitTests for ActiveRegion.splitAndTrimToIntervals -- GenomeLocSortedSet.getOverlapping uses binary search to efficiently in ~ log N time find overlapping intervals -- UnitTesting overlap function in GenomeLocSortedSet -- Discovered fundamental implementation bug in that adding genome locs out of order (elements on 20 then on 19) produces an invalid GenomeLocSortedSet. Created a JIRA to address this: https://jira.broadinstitute.org/browse/GSA-775 -- Constructor that takes a collection of genome locs now sorts its input and merges overlapping intervals -- Added docs for the constructors in GLSS -- Update HaplotypeCaller MD5s, which change because ActiveRegions are now restricted to the engine intervals, which changes slightly the regions in the tests and so the reads in the regions, and thus the md5s -- GenomeAnalysisEngineUnitTest needs to provide non-null genome loc parser	2013-02-18 10:40:25 -05:00
Mark DePristo	3b67aa8aee	Final edge case bug fixes to QualityUtil routines -- log10 functions in QualityUtils allow -Infinity to allow log10(0.0) values -- Fix edge condition of log10OneMinusX failing with Double.MIN_VALUE -- Fix another edge condition of log10OneMinusX failing with a small but not min_value double	2013-02-16 07:31:38 -08:00
Mark DePristo	b393c27f07	QualityUtils now uses runtime argument checks instead of contract -- There's some runtime cost for these tests, but it's not big enough to outweigh the value of catching errors quickly	2013-02-16 07:31:38 -08:00

3553 Commits (cb49b8cc714528fa7ecccfa8dc038e1c8e0c09c1)