Commit Graph

6731 Commits (f534c2e7bb44b03250cf9a580f0cf5a43bc3ff91)

Author SHA1 Message Date
Khalid Shakir f534c2e7bb Merged bug fix from Stable into Unstable 2011-08-06 10:43:52 -04:00
Khalid Shakir eaa2f16d83 When a job finishes successfully in the ShellJobRunner, mark it as DONE instead of FAILED. 2011-08-06 10:42:04 -04:00
Guillermo del Angel a8eb8c27f0 a) Minor changes to indel consensus scripts to better reflect good default values, b) Fixed up Mills/Devine codec so it always produces correct ref padded bases, and added option to VariantsToVCF to fix reference base 2011-08-04 15:34:49 -04:00
Eric Banks 406982284c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-04 12:49:02 -04:00
Eric Banks e48492f3c3 Validate that the reference padding base for indels is correct. 2011-08-04 12:48:56 -04:00
Eric Banks f10588420c Fixing path to dbSNP file as the other one was replaced 2011-08-04 12:36:24 -04:00
Mauricio Carneiro 0739b7f75b Merged bug fix from Stable into Unstable 2011-08-04 11:07:25 -04:00
Mauricio Carneiro aff681e407 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-04 11:05:25 -04:00
Mauricio Carneiro fa97bd8ac1 Merged bug fix from Stable into Unstable 2011-08-04 09:52:10 -04:00
Mauricio Carneiro 23ec5b94cf fixed a missing check for null
There was a missed check for the case when you don't provide an indels vcf for the cleaner.
2011-08-04 09:50:02 -04:00
Eric Banks a831af1166 Another misprint when removing the references to -D 2011-08-03 21:29:21 -04:00
Mauricio Carneiro 8981367307 Updating memory usage for picard programs 2011-08-03 15:48:28 -04:00
Eric Banks f62f47d476 Not sure why this didn't fail before, but bringing VE up to date with previous changes 2011-08-03 14:27:07 -04:00
Eric Banks 3de10b1ef8 Fixing misprint from Ryan's commit 2011-08-03 12:37:50 -04:00
Eric Banks db2e0aaa1a Darn, forgot to update unit tests. 2011-08-03 12:31:08 -04:00
Eric Banks 020b2408a8 Adding integration test for left alignment of indels 2011-08-03 12:19:44 -04:00
Eric Banks f6648e0144 Don't left-align complex indels because it's too complicated. 2011-08-03 12:03:50 -04:00
Eric Banks 5dc324ff35 Dealing with merge confict 2011-08-03 11:03:47 -04:00
Eric Banks 7c89fe01b3 Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record. 2011-08-03 11:00:36 -04:00
Khalid Shakir 3e043a633c Merged bug fix from Stable into Unstable 2011-08-03 02:23:16 -04:00
Khalid Shakir a587f38808 Fixed example unified genotyper pipeline to wrap filter expressions with quotes and use rod binding name "variant" instead of "vcf". 2011-08-03 02:21:01 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
David Roazen d3437e62da Added a simple utility method Utils.optimumHashSize() to calculate the optimum
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.

Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));

I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:

(n + n/2 + n/4 + ... 16 ~= (1 + 1/2 + 1/4...) * n ~= 2 * n
2011-08-02 21:59:06 -04:00
Guillermo del Angel df37716857 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 18:27:13 -04:00
Khalid Shakir daeabe6a2f Updated HybridSelectionPipelineTest's variant count and titv to match expectations based on ebanks removing strand bias.
TODO: update filtering/recalibrations in HSP.
2011-08-02 17:09:16 -04:00
Ryan Poplin 20afe2a437 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 15:34:52 -04:00
Ryan Poplin b2cde87378 Removing --DBSNP syntax from BQSR integration tests 2011-08-02 15:34:38 -04:00
Khalid Shakir c2dc7c9c99 Updated flanks eval the same way the targets eval was updated by ebanks. 2011-08-02 15:30:04 -04:00
Ryan Poplin c0653514b3 minor update to comment in UG 2011-08-02 13:34:48 -04:00
Ryan Poplin 2ba57bb502 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 13:30:46 -04:00
Ryan Poplin 38e4ae4176 minor update to comment in UG 2011-08-02 13:30:38 -04:00
Guillermo del Angel 821bbfa9e0 Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup 2011-08-02 13:17:20 -04:00
Eric Banks 65c5d55b72 Not sure how I missed these. These lines are now superfluous. 2011-08-02 12:48:36 -04:00
Eric Banks b9d0d2af22 Adding back temporarily removed integration test now that the file permissions have been fixed. 2011-08-02 12:39:11 -04:00
Eric Banks 9497c303e6 Temporarily comment out annotation from the pipeline test; David will re-enable. 2011-08-02 12:38:47 -04:00
Eric Banks 1c387848de No more use of -D in the integration tests but instead stick with VCFs only. Since all of these tests were duplicated (one each for dbSNP format and for VCF), we don't actually lose coverage in the integration tests. 2011-08-02 10:39:50 -04:00
Eric Banks a2ca994e9a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 10:34:53 -04:00
Eric Banks 2c5e526eb7 Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed. 2011-08-02 10:34:46 -04:00
Eric Banks 5626199bb6 The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD 2011-08-02 10:14:21 -04:00
Guillermo del Angel a6a87006a5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 08:26:05 -04:00
Guillermo del Angel 3b2d1dee66 Final changes to indel project consensus script 2011-08-02 08:25:50 -04:00
Eric Banks 3a9b6eacdf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-01 11:23:18 -04:00
Mauricio Carneiro e7b4959ebe Merged bug fix from Stable into Unstable 2011-07-30 02:06:45 -04:00
Mauricio Carneiro 2d94037ad0 Remove temporary index files (*.bai)
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Ryan Poplin b06deac9ea Merged bug fix from Stable into Unstable 2011-07-29 10:02:36 -04:00
Ryan Poplin c0d4110ffd Correcting redundant warning text. 2011-07-29 10:01:11 -04:00
Eric Banks 33b32c4211 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-28 13:57:22 -04:00
Eric Banks 7a2a65155f Merged bug fix from Stable into Unstable 2011-07-28 13:56:43 -04:00
Eric Banks 1afc49a297 There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor. 2011-07-28 13:55:58 -04:00
Eric Banks 1865211b6d Merged bug fix from Stable into Unstable 2011-07-27 22:52:06 -04:00