Commit Graph

7638 Commits (93f7e632bd2febd5f8af2e846bf054893997dee0)

Author SHA1 Message Date
Guillermo del Angel 93f7e632bd Minor fix/enhancement for VariantEval: if a VCF has symbolic alleles, the program would crash ungracefully; now we just skip the record without processing it. This is a big issue since we can't process 1000G integration files with the code as is. 2011-10-06 10:07:46 -04:00
Guillermo del Angel f75573dd54 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 10:35:26 -04:00
Guillermo del Angel 6637cd9dd5 Fixes to CalibrateGenotypeLikelihoods: a) Fix up the walker itself which was broken since the new rod binding system, b) added ability to do indels and to optionally test banded implementation, c) experimental ability to do multiple samples (may need more work) 2011-09-30 10:35:10 -04:00
Mauricio Carneiro cabacf028d Intermediate commit to fix interval skipping
May need additional testing.
2011-09-29 18:45:12 -04:00
Mauricio Carneiro 9508220157 fixed hard clipping both ends inside deletion
If both ends of the interval fall within a deletion in the read, hardClipBothEnds would cut the right tail first, including the entire deletion, and then fail to cut the left tail because there would not be any bases left there. Fixed.
2011-09-29 15:36:49 -04:00
Mauricio Carneiro a5e75cd14c Outputting both consensus base qualities and counts
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (the most common base), and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00
Mauricio Carneiro d62f2f33bc Added indel specific context size parameter
Parameter was added to the framework but implementing the functionality is pending.
2011-09-29 12:54:41 -04:00
Mauricio Carneiro 21c4abdd36 Disabling all SlidingReadUnitTests 2011-09-29 12:20:35 -04:00
Mauricio Carneiro 4086fa768f Disabling all ReadClipperUnitTests 2011-09-29 12:20:35 -04:00
Khalid Shakir 6dec932ca9 Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-29 11:47:13 -04:00
Khalid Shakir c08468eb9d A couple of updates while trying to get desired R 2.13 compactPDF support.
preQC:
- For R 2.13 when parsing fingerprints explicitly coercing the text before parsing
- Added LOD geom_line() at +/-3 based on Tim's presentation at PM meeting (ppt to go to pipeline wiki asap)
- PF_INDEL_RATE of zero replaced with NA
- NA's are not "violations" auto filter samples since 0+NA = NA, and subset test only looks for 0 violations
- Restored plots for MEAN_READ_LENGTH, BAD_CYCLES, and MEDIAN_INSERT_SIZE by explicitly print()'ing the created plots

postQC:
- Fixed R 2.13 font scaling by moving size out of aes, except when using highlighting
- TODO: Don't know how to scale by aes for highlighting *and* use a smaller overall font size outside aes
2011-09-29 11:21:50 -04:00
Mauricio Carneiro fc86cd6fd8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr 2011-09-29 00:12:15 -04:00
Roger Zurawicki 4fd5630f6a Added ReadClipper Unit Test
* Includes tests for hard clipping by read and by reference coordinates.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Mauricio Carneiro f49a12de6b Updating latest changes from the repository to reduce reads repo 2011-09-28 22:31:57 -04:00
Matt Hanna 9272ed03b5 Merged bug fix from Stable into Unstable 2011-09-28 21:26:43 -04:00
Matt Hanna 0acaf2df65 Fix an embarrassing issue where a specific configuration of minimal coverage
over small intervals could cause reads to be dropped from the pileup.  Nothing
to see here...
2011-09-28 21:23:01 -04:00
Roger Zurawicki 07b0a75d96 Added SlidingRead Unit Test
Includes tests for clipStart and trimToVariableRegion
2011-09-28 21:22:57 -04:00
Khalid Shakir c5f1a4325f Updated preQC:
- full 8.5x11
- concatenating multiple initiatives / bait_sets
- Using NA instead of python None when WR dates are unavailable
- In new aggregations where the sample may have per library metrics, only using the sample level metrics, i.e. library is null
Updated postQC:
- Renamed some variables to assist with traceback()
- Fixed crashes on batches with two alleles or two samples such as Seminara_MC_1_09222011 or Engle_MC_2_09222011
- Added dependency tracking to PostCallingQC.scala so that the R script does not try to run before the evals are complete
Other minor cleanup.
Tried to use R 2.13 compactPDF, but there are a few issues to work out with fingerprint boxplots in preQC and geom_text font size in postQC.
2011-09-28 20:23:30 -04:00
Mauricio Carneiro edf852d47d Adding lists to ReduceReads script
The script can now handle a single file or a list of files separately. Always scatter/gathering.
2011-09-28 18:40:30 -04:00
Mauricio Carneiro 64e7b3000c Fix read spans deletion through the entire interval
If the read has a deletion that spans the entire length of the interval, it should not be added to mapped reads.
2011-09-28 18:40:30 -04:00
Mauricio Carneiro a93ece07e3 ScatterGatherable reduce reads script
Get your reduce read in a matter of seconds...
2011-09-28 18:40:30 -04:00
Guillermo del Angel c8d3a720f9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 18:17:34 -04:00
Guillermo del Angel 7e3cb45093 Further performance optimization in banded HMM: about 60% speed improvement over the current implementation now 2011-09-28 16:27:28 -04:00
Ryan Poplin 1b1ca80df2 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 16:17:39 -04:00
Ryan Poplin 3b73dc89fe Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle. 2011-09-28 16:17:31 -04:00
Mauricio Carneiro ff2f4df043 Fixed hardclipping inside indel (right tail)
When the hard-clipped right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro 3c7b7f74ef Optimized interval iteration
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
Mauricio Carneiro 5c9b659c02 clipping both ends of the reads was modifying the original read
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel fe23e4d10c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 15:53:11 -04:00
Guillermo del Angel e2b9030e93 First mostly fully functional implementation of banded pair HMM likelihood computation for indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point. 2011-09-28 15:51:48 -04:00
Eric Banks 1b45f21774 Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki. 2011-09-28 13:18:32 -04:00
Eric Banks 1f0e354fae Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 13:13:21 -04:00
Eric Banks bb619a9a3c Fixing docs 2011-09-28 13:13:03 -04:00
Mark DePristo 5812004e06 Merge branch 'stable' 2011-09-28 11:36:40 -04:00
Mark DePristo a88b7c1203 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 11:36:33 -04:00
Mark DePristo a5006831d7 Shows "" not empty space when default string value is "" 2011-09-28 11:35:52 -04:00
Mark DePristo 1e32281a15 Fix to not show -null when missing short name argument 2011-09-28 11:31:20 -04:00
Mauricio Carneiro 89544c209c Fixing contracts
Changed return type to Pair and updated contracts accordingly.
2011-09-28 11:19:17 -04:00
Mark DePristo 2e2463633f Queue script to find missing calls between full and reduced bams 2011-09-28 11:17:25 -04:00
Eric Banks eacbee3fe5 Merged bug fix from Stable into Unstable 2011-09-27 20:35:18 -04:00
Eric Banks 43b0c98298 Fix docs 2011-09-27 20:34:46 -04:00
Eric Banks 232a6df11c Add longhand form to the error message. 2011-09-27 20:29:31 -04:00
Eric Banks 1d6fcb6eb1 Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS."
This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.
2011-09-27 20:27:00 -04:00
Eric Banks 269b9826b6 Add longhand form to the error message to prevent users from posting borderline dumb posts to GS. 2011-09-27 20:26:36 -04:00
Mauricio Carneiro 3b6e43b7c4 Use reads that span multiple intervals
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where, if the requested reference coordinate fell inside a deletion in the read, the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Khalid Shakir 84bd355690 Merged bug fix from Stable into Unstable 2011-09-27 14:34:39 -04:00
Khalid Shakir b090751f62 Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Matt Hanna db785eb50d Fix a bug where no fingerprint LODs across an entire project would cause the
R script to blow up.

Also, correct the sample names displayed at the bottom of the fingerprint plot;
previously, it displayed the order of the sample in a sequence sorted by
last sequencing date.
2011-09-27 12:39:25 -04:00
Matt Hanna e5ce5e265a Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-27 11:07:01 -04:00
Eric Banks 26e71f6688 The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism. 2011-09-27 11:03:17 -04:00
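Commit 26e71f6688 attributes VariantEval's non-determinism to placing VariantContexts in a HashSet and fixes it with a LinkedHashMap. The following is a minimal sketch of that effect, not the GATK code. It rests on two assumptions. First, it uses a hypothetical VcfRecord class that does not override hashCode(); the commit does not say how VariantContext hashes. Under that assumption, HashSet iteration order follows identity hash codes, which are not guaranteed to match between JVM runs, whereas a LinkedHashMap always iterates in insertion order. Second, grouping by locus is an illustrative choice; the commit does not say what the map is keyed on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeterministicRecords {
    // Hypothetical stand-in for a VCF record. It deliberately does NOT override
    // hashCode(), so hashing falls back to identity hash codes, which are not
    // guaranteed to be the same from one JVM run to the next.
    static final class VcfRecord {
        final String locus;   // e.g. "1:1000"
        final String filter;  // e.g. "PASS" or a filter name
        VcfRecord(String locus, String filter) {
            this.locus = locus;
            this.filter = filter;
        }
    }

    // Iteration order follows identity hash codes, so it may differ between runs.
    static List<String> filtersViaHashSet(List<VcfRecord> records) {
        Set<VcfRecord> set = new HashSet<>(records);
        List<String> out = new ArrayList<>();
        for (VcfRecord r : set) {
            out.add(r.filter);
        }
        return out;
    }

    // Groups records by locus. LinkedHashMap iterates keys in insertion order and
    // each list keeps file order, so the result is identical on every run.
    static List<String> filtersInFileOrder(List<VcfRecord> records) {
        Map<String, List<VcfRecord>> byLocus = new LinkedHashMap<>();
        for (VcfRecord r : records) {
            byLocus.computeIfAbsent(r.locus, k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (List<VcfRecord> atLocus : byLocus.values()) {
            for (VcfRecord r : atLocus) {
                out.add(r.filter);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two records at the same site, one PASSing and one filtered.
        List<VcfRecord> site = Arrays.asList(
                new VcfRecord("1:1000", "PASS"),
                new VcfRecord("1:1000", "LowQual"));
        System.out.println(filtersInFileOrder(site)); // [PASS, LowQual] on every run
        System.out.println(filtersViaHashSet(site));  // order is not guaranteed
    }
}
```

The point of the design choice is that only the container changes. The records, and whichever one ends up considered first at a duplicated site, are otherwise treated identically, so the output stops depending on hash codes.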
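Commit 3c7b7f74ef speeds up interval iteration by putting getToolkit.getIntervals() into a TreeSet and being selective about which intervals to test. The sketch below shows one way a sorted set allows that. It assumes hypothetical non-overlapping, closed intervals on a single contig; the Interval and IntervalLookup names are illustrative and are not GATK classes. Each position is checked against one candidate found with TreeSet.floor, a logarithmic-time lookup, instead of scanning every interval. TreeSet.higher gives the next interval to skip ahead to.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class IntervalLookup {
    // Hypothetical closed interval [start, end] on one contig.
    static final class Interval {
        final int start;
        final int end;
        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
        @Override
        public String toString() {
            return "[" + start + ", " + end + "]";
        }
    }

    // Intervals are assumed non-overlapping, so ordering by start alone is enough.
    private final TreeSet<Interval> intervals =
            new TreeSet<>(Comparator.comparingInt((Interval iv) -> iv.start));

    void add(Interval iv) {
        intervals.add(iv);
    }

    // floor() returns the last interval starting at or before pos; that is the
    // only interval that could contain pos, so just one candidate is tested.
    Interval containing(int pos) {
        Interval candidate = intervals.floor(new Interval(pos, pos));
        return (candidate != null && candidate.end >= pos) ? candidate : null;
    }

    // First interval starting strictly after pos, or null if there is none.
    Interval nextAfter(int pos) {
        return intervals.higher(new Interval(pos, pos));
    }

    public static void main(String[] args) {
        IntervalLookup lookup = new IntervalLookup();
        lookup.add(new Interval(100, 200));
        lookup.add(new Interval(500, 650));
        lookup.add(new Interval(900, 1000));
        System.out.println(lookup.containing(550)); // [500, 650]
        System.out.println(lookup.containing(300)); // null
        System.out.println(lookup.nextAfter(300));  // [500, 650]
    }
}
```

Keying the comparator on start alone is only valid because the intervals are assumed not to overlap. Overlapping intervals would need merging first, or a richer structure such as an interval tree.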