Commit Graph

257 Commits (9826192854c8c66fdf2e802c70f03e32dec51d74)

Author SHA1 Message Date
Eric Banks 9826192854 Added contracts, docs, and tests for several methods in AlignmentUtils. There are over 74K tests being run now for this class!
* AlignmentUtils.getMismatchCount()
* AlignmentUtils.calcAlignmentByteArrayOffset()
* AlignmentUtils.readToAlignmentByteArray().
* AlignmentUtils.leftAlignIndel()
2013-02-07 13:04:24 -05:00
Eric Banks 481982202d Fixing the failing RR integration tests.
* After consulting Tim/David/Mauricio we determined that the md5 changes were due to different encodings of binary arrays in samjdk
   * However, it made no functional difference to the results (confirmed by Eric) so we agreed to update md5s
 * Also, the header of one of the test bams was malformed but old picard jar didn't perform checks so it only started failing now
   * Fixed the bam
2013-02-06 12:40:56 -05:00
eitanbanks 584899329c Merge pull request #13 from broadinstitute/dr_variant_migration_GSA-692
Replace org.broadinstitute.variant with jar built from the Picard repo
2013-02-06 07:22:30 -08:00
Eric Banks 562f2406d7 Added check that BaseRecalibrator is not being run on a reduced bam.
- Throws user exception if it is.
 - Can be turned off with --allow_bqsr_on_reduced_bams_despite_repeated_warnings argument.
 - Added test to check this is working.
 - Added docs to BQSRReadTransformer explaining why this check is not performed on PrintReads end.
 - Added small bug fix to GenomeAnalysisEngine that I uncovered in this process.
 - Added comment about not changing the program record name, as per reviewer comments.
 - Removed unused variable.
2013-02-06 10:14:27 -05:00
Eric Banks e7c35a907f Fixes to BQSR for the --maximum_cycle_value argument.
- It's now written into the recal report so that it can be used in the PrintReads step.
  - Note that we also now write the --deletions_default_quality value which accidentally wasn't being written before!
  - Added tests to make sure that the value of the --maximum_cycle_value is being used properly by PR with -BQSR.
(This is my last non-branch commit; all future pushes will follow new GATK practices)
2013-02-05 17:38:03 -05:00
David Roazen e7e76ed76e Replace org.broadinstitute.variant with jar built from the Picard repo
The migration of org.broadinstitute.variant into the Picard repo is
complete. This commit deletes the org.broadinstitute.variant sources
from our repo and replaces it with a jar built from a checkout of the
latest Picard-public svn revision.
2013-02-05 17:24:25 -05:00
Ryan Poplin cb2dd470b6 Moving the random number generator over to using GenomeAnalysisEngine.getRandomGenerator in the logless versus exact pair hmm unit test. We don't believe this will fix the problem with the non-deterministic test failures but it will give us more information the next time it fails. 2013-02-05 12:56:20 -05:00
MauricioCarneiro 050c4794a5 Merge pull request #11 from yfarjoun/per_sample2
-Added Per-Sample Contamination Removal to UnifiedGenotyper: Added an @A...
2013-02-05 08:04:29 -08:00
Eric Banks 23c6aee236 Added in some basic unit tests for polyploid consensus creation in RR.
- Uncovered small bug in the fix that I added yesterday, which is now fixed properly.
- Uncovered massive general bug: polyploid consensus is totally busted for deletions (because of call to read.getReadBases()[readPos]).
  - Need to consult Mauricio on what to do here (are we supporting het compression for deletions? Insertions are definitely not supported).
2013-02-05 10:35:45 -05:00
Yossi Farjoun de03f17be4 -Added Per-Sample Contamination Removal to UnifiedGenotyper: Added an @Advanced option to the StandardCallerArgumentCollection, a file which should
contain two columns, Sample (String) and Fraction (Double) that form the Sample-Fraction map for the per-sample AlleleBiasedDownsampling.
-Integration tests to UnifiedGenotyper (Using artificially contaminated BAMs created from a mixture of two broadly consented samples) were added
-includes throwing an exception in HC if called using per-sample contamination file (not implemented); tested in a new integration test.
-(Note: HaplotypeCaller already has "Flat" contamination--using the same fraction for all samples--what it doesn't have is
   _per-sample_ AlleleBiasedDownsampling, which is what has been added here to the UnifiedGenotyper.)
-New class: DefaultHashMap (a Defaulting HashMap...) and new function: loadContaminationFile (which reads a Sample-Fraction file and returns a map).
-Unit tests to the new class and function are provided.
-Added tests to see that malformed contamination files are found and that spaces and tabs are now read properly.
-Merged the integration tests that pertain to biased downsampling, whether HaplotypeCaller or UnifiedGenotyper, into a new IntegrationTest class.
2013-02-04 18:24:36 -05:00
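The Sample-Fraction file described in commit de03f17be4 (two whitespace- or tab-separated columns, a sample name and a fraction, loaded into a map with a default for unlisted samples) could be read roughly as follows. This is a minimal sketch, not the GATK's actual DefaultHashMap or loadContaminationFile: the class and method names, the [0, 1] range check, and the sample names in the usage note are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the GATK's DefaultHashMap or loadContaminationFile. */
public class ContaminationFileSketch {

    /** Hypothetical stand-in for a defaulting map: missing keys yield a fixed value. */
    public static class DefaultingMap<K, V> extends HashMap<K, V> {
        private final V defaultValue;

        public DefaultingMap(V defaultValue) {
            this.defaultValue = defaultValue;
        }

        @Override
        public V get(Object key) {
            // Fall back to the default only when the key is truly absent.
            return containsKey(key) ? super.get(key) : defaultValue;
        }
    }

    /**
     * Parses "Sample Fraction" rows (spaces or tabs between columns).
     * Blank lines are skipped; malformed rows raise IllegalArgumentException.
     */
    public static Map<String, Double> parse(List<String> lines, double defaultFraction) {
        Map<String, Double> fractions = new DefaultingMap<>(defaultFraction);
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected 2 columns, got: " + raw);
            }
            double fraction;
            try {
                fraction = Double.parseDouble(fields[1]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Fraction is not a number: " + raw, e);
            }
            // Assumption: a contamination fraction must lie in [0, 1].
            if (!(fraction >= 0.0 && fraction <= 1.0)) {
                throw new IllegalArgumentException("Fraction out of range: " + raw);
            }
            fractions.put(fields[0], fraction);
        }
        return fractions;
    }
}
```

Calling `ContaminationFileSketch.parse(List.of("sampleA\t0.05", "sampleB 0.10"), 0.0)` would give a map in which the two listed (made-up) samples carry their own fractions and any other sample falls back to 0.0.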
Eric Banks 70f3997a38 More RR tests and fixes.
* Fixed implementation of polyploid (het) compression in RR.
  * The test for a usable site was all wrong.  Worked out details with Mauricio to get it right.
  * Added comprehensive unit tests in HeaderElement class to make sure this is done right.
  * Still need to add tests for the actual polyploid compression.
  * No longer allow non-diploid het compression; I don't want to test/handle it, do you?
* Added nearly full coverage of tests for the BaseCounts class.
2013-02-04 15:55:15 -05:00
Ryan Poplin 79ef41e7b1 Added some docs, unit test, and contracts to SimpleDeBruijnAssembler.
-- Testing that cycles in the reference graph fail graph construction appropriately.
-- Minor bug fix in assembly with reduced reads.

Added some docs and contracts to SimpleDeBruijnAssembler

Added a unit test to SimpleDeBruijnAssembler
2013-02-04 15:17:22 -05:00
Chris Hartl 41a030f4b7 Apparently I'm a failure at rebasing...there should have been only one commit message to write. But whatever, here it is again:
Part 1 of Variant Annotator Unit tests: PerReadAlleleLikelihoodMap

 - Added contract enforcement for public methods
 - Refactored the conversion from read -> (allele -> likelihood) to allele -> list[read] into its own method
 - added method documentation for non getters/setters
 - finals, finals everywhere
 - Add in a unit test for the PerReadAlleleLikelihoodMap. Complete coverage except for .clear() and a method that is a straight call into a separately-tested utility class.
2013-02-04 14:16:28 -05:00
Ryan Poplin d9fd89ecaa Somehow these md5 updates got lost in my previous git rebase disaster. Sorry for the trouble. 2013-02-04 13:26:18 -05:00
Eric Banks 2d518f3063 More RR-related updates and tests.
- ReduceReads by default now sets up-front ReadWalker downsampling to 40x per start position.
   - This is the value I used in my tests with Picard to show that memory issues pretty much disappeared.
   - This should hopefully take care of the memory issues being reported on the forum.
- Added javadocs to SlidingWindow (the main RR class) to follow GATK conventions.
- Added more unit tests to increase coverage of BaseCounts class.
- Added more unit tests to test I/D operators in the SlidingWindow class.
2013-02-04 12:57:43 -05:00
Guillermo del Angel 971ded341b Swap java Random generator for GATK one to ensure test determinism 2013-02-04 10:57:34 -05:00
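The determinism that commit 971ded341b is after comes from drawing test data from a generator with a known seed. The sketch below uses a plain `java.util.Random` with a fixed seed rather than the GATK's shared generator; the class name, the seed value, and the `randomBases` helper are hypothetical.

```java
import java.util.Random;

/** Illustrative only: shows why a fixed seed makes randomized tests repeatable. */
public class SeededRandomSketch {

    // Assumed seed for illustration; any constant gives repeatable output.
    public static final long ASSUMED_SEED = 12345L;

    private static final byte[] ALPHABET = {'A', 'C', 'G', 'T'};

    /** Draws a random base sequence of the given length from the supplied generator. */
    public static byte[] randomBases(Random rng, int length) {
        byte[] bases = new byte[length];
        for (int i = 0; i < length; i++) {
            bases[i] = ALPHABET[rng.nextInt(ALPHABET.length)];
        }
        return bases;
    }
}
```

Two generators created with `new Random(SeededRandomSketch.ASSUMED_SEED)` produce identical sequences, so a test failure seen once can be reproduced on the next run.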
Guillermo del Angel f31bf37a6f First step in better BQSR unit tests for covariates (not done yet): more test coverage in basic covariates, test logging several read groups/read lengths and more combinations simultaneously.
Add basic Javadocs headers for PerReadAlleleLikelihoodMap.
2013-02-03 15:31:30 -05:00
Eric Banks 03df5e6ee6 - Added more comprehensive tests for consensus creation to RR. Still need to add tests for I/D ops.
- Added RR qual correctness tests (note that this is a case where we don't add code coverage but still need to test critical infrastructure).
- Also added minor cleanup of BaseUtils
2013-02-01 15:37:19 -05:00
Ryan Poplin 2fee000dba Adding unit tests for KBestPaths class and fixing edge case bugs. 2013-02-01 13:51:31 -05:00
David Roazen c6581e4953 Update MD5s to reflect version number change in the BAM header
I've confirmed via a script that all of these differences only
involve the version number bump in the BAM headers and nothing
else:

< @HD   VN:1.0  GO:none SO:coordinate
---
> @HD   VN:1.4  GO:none SO:coordinate
2013-02-01 13:51:31 -05:00
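The script mentioned in commit c6581e4953 is not shown. One way such a check might work is to mask the `VN:` field on the `@HD` line and then require the two headers to match exactly. The following is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: compares two BAM/SAM text headers while ignoring the @HD version. */
public class HeaderVersionDiffSketch {

    /** Replaces the VN value on an @HD line with a placeholder; other lines pass through. */
    static String maskVersion(String line) {
        if (!line.startsWith("@HD")) {
            return line;
        }
        return line.replaceAll("VN:[0-9.]+", "VN:*");
    }

    /** True when the headers are identical once the @HD version number is masked. */
    public static boolean equalIgnoringVersion(List<String> oldHeader, List<String> newHeader) {
        List<String> maskedOld = oldHeader.stream()
                .map(HeaderVersionDiffSketch::maskVersion)
                .collect(Collectors.toList());
        List<String> maskedNew = newHeader.stream()
                .map(HeaderVersionDiffSketch::maskVersion)
                .collect(Collectors.toList());
        return maskedOld.equals(maskedNew);
    }
}
```

For the pair of `@HD` lines quoted in the commit message (VN:1.0 versus VN:1.4, everything else equal), `equalIgnoringVersion` would return true. A change to any other field, such as `SO`, would make it return false.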
Guillermo del Angel a520058ef6 Add option to specify maximum STR length to RepeatCovariates from command line to ease testing 2013-02-01 13:51:31 -05:00
Ryan Poplin 495bca3d1a Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-31 10:12:26 -05:00
Ryan Poplin ca6968d038 Use base List and Map types in the GenotypingEngineUnitTest. 2013-01-31 10:12:18 -05:00
Eric Banks 75ceddf9e5 Adding new unit tests for RR. These tests took a frustratingly long time to get to pass, but now we have a framework for
testing the adding of reads into the SlidingWindow plus consensus creation.  Will flesh these out more after I take care of
some other items on my plate.
2013-01-31 09:46:38 -05:00
Ryan Poplin 5f4a063def Breaking up my massive commits into smaller pieces that I can successfully merge and digest. This one enables downsampling in the HaplotypeCaller (by lowering the default dcov to 20) and removes my long-standing, temporary region-based downsampling. 2013-01-30 16:14:07 -05:00
Ryan Poplin ff8ba03249 Updating BQSR integration test md5s to reflect the updates to the hierarchicalBayesianQualityEstimate function 2013-01-30 13:30:18 -05:00
Ryan Poplin 85dabd321f Adding unit tests for hierarchicalBayesianQualityEstimate function 2013-01-30 13:26:07 -05:00
Ryan Poplin 07fe3dd1ef Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2013-01-30 13:19:24 -05:00
David Roazen 9985f82a7a Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
Ryan Poplin 2967776458 The Empirical quality column in the recalibration report can't be compared in the BQSRGatherer because the value is calculated using the Bayesian estimate with different priors. This value should never be used from a recalibration report anyway except during plotting. 2013-01-30 12:28:14 -05:00
Eric Banks 9025567cb8 Refactoring the SimpleGenomeLoc into the now public utility UnvalidatingGenomeLoc and the RR-specific FinishedGenomeLoc.
Moved the merging utility methods into GenomeLoc and moved the unit tests around accordingly.
2013-01-30 10:45:29 -05:00
Mark DePristo 92c5635e19 Cleanup, document, and unit test ActiveRegion
-- All functions tested.  In the testing / review I discovered several bugs in the ActiveRegion routines that manipulate reads.  New version should be correct
-- Enforce correct ordering of supporting states in constructor
-- Enforce read ordering when adding reads to an active region in add
-- Fix bug in HaplotypeCaller map with new updating read spans.  Now get the full span before clipping down reads in map, so that variants are correctly placed w.r.t. the full reference sequence
-- Encapsulate isActive field with an accessor function
-- Make sure that all state lists are unmodifiable, and that the docs are clear about this
-- ActiveRegion equalsExceptReads is for testing only, so make it package protected
-- ActiveRegion.hardClipToRegion must re-sort reads as they can become out of order
-- Previous version of HC clipped reads but, due to clipping, these reads could no longer overlap the active region.  The old version of HC kept these reads, while the enforced contracts on the ActiveRegion detected this was a problem and those reads are removed.  Has a minor impact on PLs and RankSumTest values
-- Updating HaplotypeCaller MD5s to reflect changes to ActiveRegions read inclusion policy
2013-01-30 09:47:12 -05:00
Mauricio Carneiro 29fd536c28 Updating licenses manually
Please check that your commit hook is properly pointing at ../../private/shell/pre-commit

Conflicts:
	public/java/test/org/broadinstitute/variant/VariantBaseTest.java
2013-01-29 17:27:53 -05:00
David Roazen a536e1da84 Move some VCF/VariantContext methods back to the GATK based on feedback
-Moved some of the more specialized / complex VariantContext and VCF utility
 methods back to the GATK.

-Due to this re-shuffling, was able to return things like the Pair class back
 to the GATK as well.
2013-01-29 16:56:55 -05:00
Eric Banks e4ec899a87 First pass at adding unit tests for the RR framework: I have added 3 tests and all 3 uncovered RR bugs!
One of the fixes was critical: SlidingWindow was not converting between global and relative positions correctly.
Besides not being correct, it was resulting in a massive slowdown of the RR traversal.
That fix definitely breaks at least one of the integration tests, but it's not worth changing md5s now because I'll be
changing things all over RR for the next few days, so I am going to let that test fail indefinitely until I can confirm
general correctness of the tool.
2013-01-29 15:51:07 -05:00
Guillermo del Angel 1d5b29e764 Unit tests for repeat covariates: generate 100 random reads consisting of tandem repeat units of random content and size, and check that covariates match expected values at all positions in reads.
Fixed corner case where value of covariate at border between 2 tandem repeats of different length/content wasn't consistent
2013-01-29 15:23:02 -05:00
Ryan Poplin 35543b9cba updating BQSR integration test values for the PR half of BQSR. 2013-01-29 09:47:57 -05:00
Guillermo del Angel ff799cc79a Fixed bad merge 2013-01-28 20:04:25 -05:00
Guillermo del Angel 5995f01a01 Big intermediate commit (mostly so that I don't have to go again through merge/rebase hell) in expanding BQSR capabilities. Far from done yet:
a) Add option to stratify CalibrateGenotypeLikelihoods by repeat - will add integration test in next push.
b) Simulator to produce BAM files with given error profile - for now only a SNP/indel error rate can be specified. A bad context can be specified and if such context is present then error rate is increased to given value.
c) Rewrote RepeatLength covariate to do the right thing - not fully working yet, work in progress.
d) Additional experimental covariates to log repeat unit and combined repeat unit+length. Needs code refactoring/testing
2013-01-28 19:55:46 -05:00
David Roazen f63f27aa13 org.broadinstitute.variant refactor, part 2
-removed sting dependencies from test classes
-removed org.apache.log4j dependency
-misc cleanup
2013-01-28 09:03:46 -05:00
Ami Levy-Moonshine b4447cdca2 In cases where one uses VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE we used to verify that the sample names are unique in VariantContextUtils.simpleMerge for each VC. This caused a bug that was reported on the forum (when the VCs contained 2 VCs from the same sample).
Now we will check it only in CombineVariants.init using the headers. A new function was added to SamplesUtils with unitTests in CVunitTest.java.
2013-01-25 15:49:51 -05:00
Mark DePristo 3f95f39be3 Updating HC md5s for new cutting algorithm and default band pass filter parameters 2013-01-25 11:07:29 -05:00
Eric Banks 6dd0e1ddd6 Pulled out the --regenotype functionality from SelectVariants into its own tool: RegenotypeVariants.
This allows us to move SelectVariants into the public suite of tools now.
2013-01-25 09:42:04 -05:00
Chris Hartl a3b98daf1a Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2013-01-23 14:49:34 -05:00
Chris Hartl 7fcfa4668c Since GenotypeConcordance is now a standalone walker, remove the old GenotypeConcordance evaluation module and the associated integration tests. 2013-01-23 14:47:23 -05:00
Mark DePristo 8026199e4c Updating md5s for CountReadsInActiveRegions and HaplotypeCaller to reflect new activity profile mechanics
-- In this process I discovered a few missed sites in the old code.  The new approach actually produces better HC results than the previous version.
2013-01-23 13:46:01 -05:00
Chris Hartl c500e1d8ac Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable 2013-01-22 15:31:30 -05:00
Chris Hartl 7060e01a8e Fix for broken unit test plus some minor changes to comments. Unit tests were broken by my pulling the site status utility function into the enum. Thankfully the unit tests caught my silly duplication of a line. 2013-01-22 15:14:41 -05:00
Mauricio Carneiro 7b8b064165 Last manual license update (hopefully)
if everyone updates their git hook accordingly, this will be the last time I have to manually run the script.

GSATDG-5
2013-01-18 16:13:07 -05:00
Eric Banks cac439bc5e Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one.
Added a small fix that slightly changed md5s.
2013-01-18 11:17:31 -05:00