Mauricio Carneiro
a7c6f255e9
Adding the old gatherer to BQSR
...
for now, the old gatherer will still work for us to scatter/gather our tests.
2012-02-10 13:33:57 -05:00
Mauricio Carneiro
1fb19a0f98
Moving the covariates and shared functionality to public
...
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Mark DePristo
48cc4b913a
bugfix for incremental refresh in gsafolkLSFlogs
2012-02-10 11:30:51 -05:00
Eric Banks
8ad4046642
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-10 11:18:02 -05:00
Mark DePristo
0722df46db
gsafolkLSFLogs creates a subset of LSF MySQL db related to gsafolk only
...
Creates a SQL table in the MySQL server calcium at the Broad that contains only
key information about the LSF usage of members of the gsafolk fairshare group.
It does this by first building a list of gsafolk uids, then selecting LSF info from
matter's table, and finally inserting this information into the gsafolk_lsf queue as
part of the GATK schema. The standard way to run this is with incremental refreshes
enabled, so that the program only fetches new raw LSF records with timestamps beyond
the max timestamp present in the GATK LSF table.
The default way to run this program is via cron with 'python private/python/gsafolkLSFLogs.py'
2012-02-10 11:12:20 -05:00
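The incremental refresh described in the commit above can be sketched roughly as follows. This is only an illustration, not the code in gsafolkLSFLogs.py: it uses an in-memory-friendly sqlite3 connection instead of the MySQL server, and the table names `raw_lsf` and `gsafolk_lsf`, the columns `uid`, `ts`, `cpu_sec`, and the function name `incremental_refresh` are all assumptions made for the example.

```python
import sqlite3


def incremental_refresh(conn, gsafolk_uids):
    """Copy raw LSF records newer than the subset table's max timestamp.

    Table and column names are hypothetical; only records belonging to
    the given uids are copied. Returns the number of rows inserted.
    """
    if not gsafolk_uids:
        return 0
    cur = conn.cursor()
    # Largest timestamp already present in the subset table (None if empty).
    (max_ts,) = cur.execute("SELECT MAX(ts) FROM gsafolk_lsf").fetchone()
    if max_ts is None:
        max_ts = -1  # empty subset table: fetch everything
    placeholders = ",".join("?" for _ in gsafolk_uids)
    rows = cur.execute(
        "SELECT uid, ts, cpu_sec FROM raw_lsf "
        f"WHERE ts > ? AND uid IN ({placeholders})",
        [max_ts, *gsafolk_uids],
    ).fetchall()
    cur.executemany("INSERT INTO gsafolk_lsf VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

One consequence of keying on the max timestamp is that a raw record arriving late with an older timestamp would not be picked up by a later refresh; a full rebuild would be needed to catch it.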
Eric Banks
5e18020a5f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-10 11:08:33 -05:00
Eric Banks
f53cd3de1b
Based on Ryan's suggestion, there's a new contract for genotyping multiple alleles. Now the requester submits alleles in any arbitrary order - rankings aren't needed. If the Exact model decides that it needs to subset the alleles because too many were requested, it does so based on PL mass (in other words, I moved this code from the SNPGenotypeLikelihoodsCalculationModel to the Exact model). Now subsetting alleles is consistent.
2012-02-10 11:07:32 -05:00
Mauricio Carneiro
5af373a3a1
BQSR with indels integrated!
...
* added support to base before deletion in the pileup
* refactored covariates to operate on mismatches, insertions and deletions at the same time
* all code is in private so original BQSR is still working as usual in public
* outputs a molten CSV with mismatches, insertions and deletions, time to play!
* barely tested, passes my very simple tests... haven't tested edge cases.
2012-02-09 18:46:45 -05:00
Eric Banks
7a937dd1eb
Several bug fixes to new genotyping strategy. Update integration tests for multi-allelic indels accordingly.
2012-02-09 16:14:22 -05:00
Eric Banks
0f728a0604
The Exact model now subsets the VC to the first N alleles when the VC contains more than the maximum number of alleles (instead of throwing it out completely as it did previously). [Perhaps the culling should be done by the UG engine? But theoretically the Exact model can be called outside of the UG and we'd still want the context subset.]
2012-02-09 14:02:34 -05:00
Matt Hanna
aa097a83d5
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-09 11:26:48 -05:00
Matt Hanna
b57d4250bf
Documentation request by Eric. At each stage of the GATK where filtering occurs, added documentation suggesting the goal of the filtering along with examples of suggested inputs and outputs.
2012-02-09 11:24:52 -05:00
Ryan Poplin
5b3d875833
Incorporating Mark's suggestions on the AnalyzeCovariates plots.
2012-02-09 09:05:08 -05:00
Mauricio Carneiro
d561914d4f
Revert "First implementation of GATKReportGatherer"
...
premature push on my part. Roger is still working on the new format and we need to update the other tools to operate correctly with the new GATKReport.
This reverts commit aea0de314220810c2666055dc75f04f9010436ad.
2012-02-08 23:28:55 -05:00
Ryan Poplin
270b160d87
Incorporating feedback from Mauricio on the plots
2012-02-08 16:26:32 -05:00
Ryan Poplin
4316437a62
Initial version of R script that will be called by new BQSR to replace the entire AnalyzeCovariates program
2012-02-08 16:00:12 -05:00
Eric Banks
2f800b078c
Changes to default behavior of UG: multi-allelic mode is always on; max number of alternate alleles to genotype is 3; alleles in the SNP model are ranked by their likelihood sum (Guillermo will do this for indels); SB is computed again.
2012-02-08 15:27:16 -05:00
Matt Hanna
51ac87b28c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-08 08:43:55 -05:00
Matt Hanna
5b58fe741a
Retiring Picard customizations for async I/O and cleaning up parts of the code to use common Picard utilities I recently discovered.
...
Also embedded bug fix for issues reading sparse shards and did some cleanup based on comments during BAM reading code transition meetings.
2012-02-08 08:34:37 -05:00
Khalid Shakir
cda1e1b207
Minor manual merge update for List class to Seq interface usage.
2012-02-08 02:24:54 -05:00
Khalid Shakir
ef74363b1b
Merged bug fix from Stable into Unstable
2012-02-08 02:14:26 -05:00
Khalid Shakir
23e7f1bed9
When an interval list specifies overlapping intervals merge them before scattering.
2012-02-08 02:12:16 -05:00
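Merging overlapping intervals before scattering, as in the commit above, can be sketched like this. It is a minimal illustration and not the Queue implementation: the `(contig, start, stop)` tuple representation, the inclusive coordinates, and the function name `merge_overlapping` are assumptions, and contigs are sorted by name here rather than by reference order. Abutting intervals are left separate in this sketch; whether the real code merges them is not stated in the commit.

```python
def merge_overlapping(intervals):
    """Merge overlapping (contig, start, stop) intervals, inclusive coordinates."""
    merged = []
    # Sorting groups intervals by contig and orders them by start position.
    for contig, start, stop in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps the previous interval on the same contig: extend it.
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], stop))
        else:
            merged.append((contig, start, stop))
    return merged
```

Merging first means each scattered piece covers a given base at most once, so no region is processed twice across scatter jobs.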
Mauricio Carneiro
f30731f19b
these were not supposed to be committed. Pulling them out
2012-02-07 21:38:39 -05:00
Mauricio Carneiro
337819e791
disabling the test while we fix it
2012-02-07 19:22:32 -05:00
Roger Zurawicki
c0c676590b
First implementation of GATKReportGatherer
...
- Added the GATKReportGatherer
- Added private methods in GATKReport to combine Tables and Reports
- It is very conservative and it will only gather if the table columns match.
- At the column level it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
Added the gatherer functions to CoverageByRG
Also added the scatterCount parameter in the Interval Coverage script
Made some more GATKReport methods public
The UnitTest included shows that the merging methods work
Added a getter for the PrimaryKeyName
Fixed bugs that prevented the gatherer from working
Working GATKReportGatherer
Has only the functionality to addLines
The input file parser assumes that the first column is the primary key
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-02-07 18:14:47 -05:00
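The conservative gathering rules listed in the commit above (gather only when columns match, add rows by row id, fail rather than overwrite) might look roughly like the sketch below. It is not the GATKReportGatherer code: the dictionary layout for a table and the name `gather_tables` are assumptions chosen for illustration.

```python
def gather_tables(tables):
    """Combine scattered tables shaped like {"columns": [...], "rows": {row_id: [...]}}.

    Raises ValueError if the columns differ or a row id appears twice.
    """
    if not tables:
        raise ValueError("nothing to gather")
    columns = tables[0]["columns"]
    gathered = {}
    for table in tables:
        # Conservative: refuse to gather unless the columns match exactly.
        if table["columns"] != columns:
            raise ValueError("table columns do not match")
        for row_id, values in table["rows"].items():
            # Never overwrite data that an earlier table already supplied.
            if row_id in gathered:
                raise ValueError(f"row {row_id!r} would be overwritten")
            gathered[row_id] = list(values)
    return {"columns": list(columns), "rows": gathered}
```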
Mauricio Carneiro
e89887cd8e
laying groundwork to have insertions and deletions going through the system.
2012-02-07 18:11:53 -05:00
Mauricio Carneiro
0d3ea0401c
BQSR Parameter cleanup
...
* get rid of 320C argument that nobody uses.
* get rid of DEFAULT_READ_GROUP parameter and functionality (later to become an engine argument).
2012-02-07 14:42:11 -05:00
Eric Banks
717cd4b912
Document -L unmapped
2012-02-07 13:30:54 -05:00
Eric Banks
718da7757e
Fixes to ValidateVariants as per GS post: ref base of mixed alleles was sometimes wrong, error printout of bad ACs was throwing a RuntimeException, don't validate ACs if there are no genotypes.
2012-02-07 13:15:58 -05:00
Ryan Poplin
a6477e558a
adding docs to HaplotypeCaller
2012-02-07 09:37:32 -05:00
Eric Banks
9d1a19bbaa
Multi-allelic indels were not being printed out correctly in VariantsToTable; fixed.
2012-02-06 22:49:29 -05:00
Mauricio Carneiro
5961868a7f
fixup for BQSR (HC integration tests)
...
In the new BQSR implementation, covariates do depend on the RecalibrationArgumentCollection.
2012-02-06 22:47:27 -05:00
Mauricio Carneiro
6e6f0f10e1
BaseQualityScoreRecalibration walker (bqsr v2) first commit includes
...
* Adding the context covariate standard in both modes (including old CountCovariates) with parameters
* Updating all covariates and modules to use GATKSAMRecord throughout the code.
* BQSR now processes indels in the pileup (but doesn't do anything with them yet)
2012-02-06 17:38:29 -05:00
Eric Banks
0717c79901
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-06 16:23:36 -05:00
Eric Banks
91897f5fe7
Transpose rows/cols in AF table to make it molten (so I can plot easily in R)
2012-02-06 16:23:32 -05:00
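For readers unfamiliar with the term, a "molten" table has one row per (id, variable, value) triple instead of one column per variable, which is the shape that R plotting code usually expects. The sketch below shows the general idea only; the actual layout of the AF table is not given in the commit, so the column names and the function name `melt` are hypothetical.

```python
def melt(wide_rows, id_column):
    """Turn wide rows (dicts) into molten rows: one (id, variable, value) per cell."""
    molten = []
    for row in wide_rows:
        for name, value in row.items():
            if name != id_column:
                molten.append(
                    {id_column: row[id_column], "variable": name, "value": value}
                )
    return molten
```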
Guillermo del Angel
fb5786385c
Merged bug fix from Stable into Unstable
2012-02-06 13:22:56 -05:00
Guillermo del Angel
6ec686b877
Complement to previous commit: make sure we also don't inherit filter from input VCF when genotyping at an empty site
2012-02-06 13:19:26 -05:00
Guillermo del Angel
93ffca1e3a
Merged bug fix from Stable into Unstable
2012-02-06 11:58:58 -05:00
Guillermo del Angel
827be878b4
Bug fix when running UG in GenotypeGivenAlleles mode: if an input site to genotype had no coverage, the output VCF had AC, AF and AN inherited from the input VCF, which could have nothing to do with the given BAM, so the numbers could be nonsensical. Now the new VC has clear attributes instead of attributes inherited from the input VCF.
2012-02-06 11:58:13 -05:00
Eric Banks
fbbd04621d
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-06 11:53:31 -05:00
Eric Banks
edb4edc08f
Commented out unused metrics for now
2012-02-06 11:53:15 -05:00
Ryan Poplin
096c23a473
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-06 11:10:38 -05:00
Ryan Poplin
dc05b71e39
Updating Covariate interface with Mauricio to include an errorModel parameter. On the fly recalibration of base insertion and base deletion quals is live for the HaplotypeCaller
2012-02-06 11:10:24 -05:00
Guillermo del Angel
1e11408f8b
Merged bug fix from Stable into Unstable
2012-02-06 10:34:26 -05:00
Guillermo del Angel
090d87b48b
Bug fix in ValidationSiteSelector: when input vcf had genotypes and was multiallelic, the parsing of the AF/AC fields was wrong. Better logic to unify parsing of field
2012-02-06 10:33:12 -05:00
Eric Banks
9d94f310f1
Break AF histogram into max and min AFs
2012-02-06 09:01:19 -05:00
Ryan Poplin
b7ffd144e8
Cleaning up the covariate classes and removing unused code from the bqsr optimizations in 2009.
2012-02-06 08:54:42 -05:00
Eric Banks
cef550903e
Minor optimization
2012-02-06 00:48:00 -05:00
Ryan Poplin
5343f8ba67
Initial version of on-the-fly, lazy loading base quality score recalibration. It isn't completely hooked up yet but I'm committing so Mauricio and Mark can see how I envision it will fit together. Look it over and give any feedback. With the exception of the Solid specific code we are very very close to being able to remove TableRecalibrationWalker from the code base and just replace it with PrintReads -BQSR recal.csv
2012-02-05 13:09:03 -05:00
Mark DePristo
2cd33b2f1f
Better display of LSF usage for gsafolk
2012-02-04 08:22:55 -05:00