Commit Graph

6710 Commits (5dcac7b0643cb03a4666d220441bfb1d8692b2ee)

Author SHA1 Message Date
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
David Roazen d3437e62da Added a simple utility method Utils.optimumHashSize() to calculate the optimum
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.

Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));

I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:

(n + n/2 + n/4 + ... 16 ~= (1 + 1/2 + 1/4...) * n ~= 2 * n
2011-08-02 21:59:06 -04:00
Guillermo del Angel df37716857 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 18:27:13 -04:00
Khalid Shakir daeabe6a2f Updated HybridSelectionPipelineTest's variant count and titv to match expectations based on ebanks removing strand bias.
TODO: update filtering/recalibrations in HSP.
2011-08-02 17:09:16 -04:00
Ryan Poplin 20afe2a437 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 15:34:52 -04:00
Ryan Poplin b2cde87378 Removing --DBSNP syntax from BQSR integration tests 2011-08-02 15:34:38 -04:00
Khalid Shakir c2dc7c9c99 Updated flanks eval the same way the targets eval was updated by ebanks. 2011-08-02 15:30:04 -04:00
Ryan Poplin c0653514b3 minor update to comment in UG 2011-08-02 13:34:48 -04:00
Ryan Poplin 2ba57bb502 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 13:30:46 -04:00
Ryan Poplin 38e4ae4176 minor update to comment in UG 2011-08-02 13:30:38 -04:00
Guillermo del Angel 821bbfa9e0 Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup 2011-08-02 13:17:20 -04:00
Eric Banks 65c5d55b72 Not sure how I missed these. These lines are now superfluous. 2011-08-02 12:48:36 -04:00
Eric Banks b9d0d2af22 Adding back temporarily removed integration test now that the file permissions have been fixed. 2011-08-02 12:39:11 -04:00
Eric Banks 9497c303e6 Temporarily comment out annotation from the pipeline test; David will re-enable. 2011-08-02 12:38:47 -04:00
Eric Banks 1c387848de No more use of -D in the integration tests but instead stick with VCFs only. Since all of these tests were duplicated (one each for dbSNP format and for VCF), we don't actually lose coverage in the integration tests. 2011-08-02 10:39:50 -04:00
Eric Banks a2ca994e9a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 10:34:53 -04:00
Eric Banks 2c5e526eb7 Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed. 2011-08-02 10:34:46 -04:00
Eric Banks 5626199bb6 The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD 2011-08-02 10:14:21 -04:00
Guillermo del Angel a6a87006a5 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 08:26:05 -04:00
Guillermo del Angel 3b2d1dee66 Final changes to indel project consensus script 2011-08-02 08:25:50 -04:00
Eric Banks 3a9b6eacdf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-01 11:23:18 -04:00
Mauricio Carneiro e7b4959ebe Merged bug fix from Stable into Unstable 2011-07-30 02:06:45 -04:00
Mauricio Carneiro 2d94037ad0 Remove temporary index files (*.bai)
some temporary index files were not being removed.
2011-07-30 02:05:22 -04:00
Ryan Poplin b06deac9ea Merged bug fix from Stable into Unstable 2011-07-29 10:02:36 -04:00
Ryan Poplin c0d4110ffd Correcting redundant warning text. 2011-07-29 10:01:11 -04:00
Eric Banks 33b32c4211 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-28 13:57:22 -04:00
Eric Banks 7a2a65155f Merged bug fix from Stable into Unstable 2011-07-28 13:56:43 -04:00
Eric Banks 1afc49a297 There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor. 2011-07-28 13:55:58 -04:00
Eric Banks 1865211b6d Merged bug fix from Stable into Unstable 2011-07-27 22:52:06 -04:00
Eric Banks 6230315ff2 Along with my half-written commit message from earlier, I also forgot to commit the integration test updates. This is what happens when you try to do things 30 seconds before you leave for the day. To finish up from before: complex events weren't being padded with the reference base as per the VCF spec. They are now. 2011-07-27 22:51:21 -04:00
Eric Banks ff31fa7990 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-27 16:15:23 -04:00
Eric Banks 5809a61b20 Merged bug fix from Stable into Unstable 2011-07-27 16:14:59 -04:00
Eric Banks 64aad67b5f Fixing dbSNP adaptor for complex indels (wasn) 2011-07-27 16:13:45 -04:00
Kiran V Garimella ab69b8e4ee Merged bug fix from Stable into Unstable 2011-07-27 12:37:34 -04:00
Kiran V Garimella 6ebd83478b Fixed build.xml to reflect path changes for gsalib 2011-07-27 12:37:00 -04:00
Kiran V Garimella fe52f2dd8c Merged bug fix from Stable into Unstable 2011-07-27 12:30:15 -04:00
Kiran V Garimella ca35defdcd Moved gsalib sources from private/ to public/ 2011-07-27 12:29:43 -04:00
Kiran V Garimella ada2f21976 Revert "Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable"
This reverts commit 9c81ef835a3ac581d4eb9cf1243e30df20a46795, reversing
changes made to f23d3ad5aec1c70cc1ecc48b295258aa70d30c7d.
2011-07-27 12:27:17 -04:00
Kiran V Garimella 86d38b7f0b Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-27 10:35:19 -04:00
Kiran V Garimella dc8061e7a6 Moved gsalib from private/ to public/ 2011-07-27 10:34:56 -04:00
Mauricio Carneiro e607461db1 leftover </ol> removed... 2011-07-26 19:31:33 -04:00
Mauricio Carneiro 20a3b31b61 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 19:29:45 -04:00
Mauricio Carneiro 321afac4e8 Updates to the help layout.
*New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker.

 *Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.
2011-07-26 19:29:25 -04:00
Kiran V Garimella 405e521d44 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 17:56:48 -04:00
Kiran V Garimella 92a11ed8dc Updated MD5 for PhaseByTransmissionIntegrationTest 2011-07-26 17:52:25 -04:00
Kiran V Garimella 412c466de6 Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement 2011-07-26 17:43:43 -04:00
Mark DePristo 81f8e05bfa Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 17:35:46 -04:00
Mark DePristo f6a5e0e36a Go for global integrationtest path first, if possible. 2011-07-26 17:35:30 -04:00
Kiran V Garimella 36daaa7bda Extract GA, AR2, and DR2 from the BEAGLE output 2011-07-26 17:29:23 -04:00
Matt Hanna fec495e292 Fix a nasty little bug in the sharding system: if the last shard in contig n
overlaps exactly on disk with the first shard in contig n+1, the shards
would be merged together to avoid duplicate extraction.  Unfortunately,
the interval overlap filter couldn't handle shards spanning contigs, and
was choosing to filter out reads from contig n+1 which should have been
included.
I'm not completely sure why the BAM indexing code would ever specify that the
end of one chromosome had the same on-disk location as the start of the next
one.  I suspect that this is a indexer performance bug.
2011-07-26 15:43:20 -04:00