Commit Graph

1590 Commits (c8c06c775326b964be22842bbb6e45dbeec8c4a7)

Author SHA1 Message Date
Ryan Poplin 3caa1b83bb Updating HC integration tests 2012-02-11 11:48:32 -05:00
Ryan Poplin 9b8fd4c2ff Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the bqsr. Reverting the covariate interface change in the original bqsr because the error model enum was moved to a different class and didn't make sense any more. 2012-02-11 10:57:20 -05:00
Eric Banks f52f1f659f Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1. 2012-02-10 14:15:59 -05:00
Mauricio Carneiro 1fb19a0f98 Moving the covariates and shared functionality to public
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Eric Banks 5e18020a5f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-10 11:08:33 -05:00
Eric Banks f53cd3de1b Based on Ryan's suggestion, there's a new contract for genotyping multiple alleles. Now the requester submits alleles in any arbitrary order - rankings aren't needed. If the Exact model decides that it needs to subset the alleles because too many were requested, it does so based on PL mass (in other words, I moved this code from the SNPGenotypeLikelihoodsCalculationModel to the Exact model). Now subsetting alleles is consistent. 2012-02-10 11:07:32 -05:00
Mauricio Carneiro 5af373a3a1 BQSR with indels integrated!
* added support to base before deletion in the pileup
   * refactored covariates to operate on mismatches, insertions and deletions at the same time
   * all code is in private so original BQSR is still working as usual in public
   * outputs a molten CSV with mismatches, insertions and deletions, time to play!
   * barely tested, passes my very simple tests... haven't tested edge cases.
2012-02-09 18:46:45 -05:00
Eric Banks 7a937dd1eb Several bug fixes to new genotyping strategy. Update integration tests for multi-allelic indels accordingly. 2012-02-09 16:14:22 -05:00
Eric Banks 0f728a0604 The Exact model now subsets the VC to the first N alleles when the VC contains more than the maximum number of alleles (instead of throwing it out completely as it did previously). [Perhaps the culling should be done by the UG engine? But theoretically the Exact model can be called outside of the UG and we'd still want the context subsetted.] 2012-02-09 14:02:34 -05:00
Matt Hanna aa097a83d5 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-09 11:26:48 -05:00
Matt Hanna b57d4250bf Documentation request by Eric. At each stage of the GATK where filtering occurs, added documentation suggesting the goal of the filtering along with examples of suggested inputs and outputs. 2012-02-09 11:24:52 -05:00
Mauricio Carneiro d561914d4f Revert "First implementation of GATKReportGatherer"
premature push from my part. Roger is still working on the new format and we need to update the other tools to operate correctly with the new GATKReport.

This reverts commit aea0de314220810c2666055dc75f04f9010436ad.
2012-02-08 23:28:55 -05:00
Eric Banks 2f800b078c Changes to default behavior of UG: multi-allelic mode is always on; max number of alternate alleles to genotype is 3; alleles in the SNP model are ranked by their likelihood sum (Guillermo will do this for indels); SB is computed again. 2012-02-08 15:27:16 -05:00
Matt Hanna 51ac87b28c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-08 08:43:55 -05:00
Matt Hanna 5b58fe741a Retiring Picard customizations for async I/O and cleaning up parts of the code to use common Picard utilities I recently discovered.
Also embedded bug fix for issues reading sparse shards and did some cleanup based on comments during BAM reading code transition meetings.
2012-02-08 08:34:37 -05:00
Mauricio Carneiro 337819e791 disabling the test while we fix it 2012-02-07 19:22:32 -05:00
Roger Zurawicki c0c676590b First implementation of GATKReportGatherer
- Added the GATKReportGatherer
- Added private methods in GATKReport to combine Tables and Reports
- It is very conservative and it will only gather if the table columns, match.
- At the column level it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
Added the gatherer functions to CoverageByRG

Also added the scatterCount parameter in the Interval Coverage script
Made some more GATKReport methods public

The UnitTest included shows that the merging methods work
Added a getter for the PrimaryKeyName
Fixed bugs that prevented the gatherer form working

Working GATKReportGatherer
Has only the functional to addLines
The input file parser assumes that the first column is the primary key

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-02-07 18:14:47 -05:00
Mauricio Carneiro e89887cd8e laying groundwork to have insertions and deletions going through the system. 2012-02-07 18:11:53 -05:00
Mauricio Carneiro 0d3ea0401c BQSR Parameter cleanup
* get rid of 320C argument that nobody uses.
   * get rid of DEFAULT_READ_GROUP parameter and functionality (later to become an engine argument).
2012-02-07 14:42:11 -05:00
Eric Banks 717cd4b912 Document -L unmapped 2012-02-07 13:30:54 -05:00
Eric Banks 718da7757e Fixes to ValidateVariants as per GS post: ref base of mixed alleles were sometimes wrong, error print out of bad ACs was throwing a RuntimeException, don't validate ACs if there are no genotypes. 2012-02-07 13:15:58 -05:00
Eric Banks 9d1a19bbaa Multi-allelic indels were not being printed out correctly in VariantsToTable; fixed. 2012-02-06 22:49:29 -05:00
Mauricio Carneiro 5961868a7f fixup for BQSR (HC integration tests)
In the new BQSR implementation, covariates do depend on the RecalibrationArgumentCollection.
2012-02-06 22:47:27 -05:00
Mauricio Carneiro 6e6f0f10e1 BaseQualityScoreRecalibration walker (bqsr v2) first commit includes
* Adding the context covariate standard in both modes (including old CountCovariates) with parameters
   * Updating all covariates and modules to use GATKSAMRecord throughout the code.
   * BQSR now processes indels in the pileup (but doesn't do anything with them yet)
2012-02-06 17:38:29 -05:00
Eric Banks 0717c79901 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 16:23:36 -05:00
Eric Banks 91897f5fe7 Transpose rows/cols in AF table to make it molten (so I can plot easily in R) 2012-02-06 16:23:32 -05:00
Guillermo del Angel fb5786385c Merged bug fix from Stable into Unstable 2012-02-06 13:22:56 -05:00
Guillermo del Angel 6ec686b877 Complement to previous commit: make sure we also don't inherit filter from input VCF when genotyping at an empty site 2012-02-06 13:19:26 -05:00
Guillermo del Angel 93ffca1e3a Merged bug fix from Stable into Unstable 2012-02-06 11:58:58 -05:00
Guillermo del Angel 827be878b4 Bug fix when running UG in GenotypeGivenAlleles mode: if an input site to genotype had no coverage, the output VCF had AC,AF and AN inherited from input VCF, which could have nothing to do with given BAM so numbers could be non-sensical. Now new vc has clear attributes instead of attributes inherited from input VCF. 2012-02-06 11:58:13 -05:00
Eric Banks fbbd04621d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 11:53:31 -05:00
Eric Banks edb4edc08f Commented out unused metrics for now 2012-02-06 11:53:15 -05:00
Ryan Poplin 096c23a473 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-06 11:10:38 -05:00
Ryan Poplin dc05b71e39 Updating Covariate interface with Mauricio to include an errorModel parameter. On the fly recalibration of base insertion and base deletion quals is live for the HaplotypeCaller 2012-02-06 11:10:24 -05:00
Guillermo del Angel 1e11408f8b Merged bug fix from Stable into Unstable 2012-02-06 10:34:26 -05:00
Guillermo del Angel 090d87b48b Bug fix in ValidationSiteSelector: when input vcf had genotypes and was multiallelic, the parsing of the AF/AC fields was wrong. Better logic to unify parsing of field 2012-02-06 10:33:12 -05:00
Eric Banks 9d94f310f1 Break AF histogram into max and min AFs 2012-02-06 09:01:19 -05:00
Ryan Poplin b7ffd144e8 Cleaning up the covariate classes and removing unused code from the bqsr optimizations in 2009. 2012-02-06 08:54:42 -05:00
Eric Banks cef550903e Minor optimization 2012-02-06 00:48:00 -05:00
Ryan Poplin 5343f8ba67 Initial version of on-the-fly, lazy loading base quality score recalibration. It isn't completely hooked up yet but I'm committing so Mauricio and Mark can see how I envision it will fit together. Look it over and give any feedback. With the exception of the Solid specific code we are very very close to being able to remove TableRecalibrationWalker from the code base and just replace it with PrintReads -BQSR recal.csv 2012-02-05 13:09:03 -05:00
Ryan Poplin f94d547e97 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 17:14:20 -05:00
Ryan Poplin 894d3340be Active Region Traversal should use GATKSAMRecords everywhere instead of SAMRecords. misc cleanup. 2012-02-03 17:13:52 -05:00
Mauricio Carneiro 4a57add6d0 First implementation of DiagnoseTargets
* calculates and interprets the coverage of a given interval track
   * allows to expand intervals by specified number of bases
   * classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY.
   * outputs text file for now (testing purposes only), soon to be VCF.
   * filters are overly aggressive for now.
2012-02-03 17:12:43 -05:00
Mauricio Carneiro 3dd6a1f962 Adding some generic sum and average functions to MathUtils 2012-02-03 17:12:43 -05:00
Mauricio Carneiro e1d69e4060 make the size of a GenomeLoc int instead of long
it will never be bigger than an int and it's actually useful to be an int so we can use it as parameters to array/list/hash size creation.
2012-02-03 17:12:42 -05:00
Ryan Poplin 0e44430e47 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 13:45:11 -05:00
Christopher Hartl aa3638ecb3 Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-03 13:42:09 -05:00
Eric Banks 3abfbcbcf2 Generalized the TDT for multi-allelic events 2012-02-03 12:23:21 -05:00
Ryan Poplin 601e53d633 Fix when specifying preset active regions with -AR argument 2012-02-02 16:34:26 -05:00
Christopher Hartl 0111505ea9 Terrible. Swapping the paternal and sample ids. 2012-02-02 11:41:16 -05:00