7652 Commits (c4dfc1fb8bf9b69bd620bcb1b1b2490a31751140)

Author SHA1 Message Date
Eric Banks c4dfc1fb8b Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt. 2011-10-06 13:41:36 -04:00
Eric Banks c3eff7451a Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. The change makes a noticeable (albeit small) difference when reading sites-only files. 2011-10-03 14:20:39 -04:00
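The commit does not show ParsingUtils.split, so its signature is not reproduced here. As a minimal sketch of the general idea, assuming a single delimiter character: String.split treats its argument as a regular expression, while a splitter that scans the string once for one character avoids that machinery. InfoFieldSplit and splitOnChar are hypothetical names, not the Tribble/GATK API.

```java
import java.util.ArrayList;
import java.util.List;

final class InfoFieldSplit {
    private InfoFieldSplit() {}

    /**
     * Hypothetical helper: splits value on a single delimiter character by
     * scanning it once, without going through String.split's regex-based API.
     * Unlike String.split, trailing empty fields are kept, so "1,2," yields
     * ["1", "2", ""].
     */
    static List<String> splitOnChar(String value, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) == delimiter) {
                parts.add(value.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(value.substring(start)); // last field (possibly empty)
        return parts;
    }
}
```

For an INFO array value such as AC=3,12, splitOnChar("3,12", ',') returns the list [3, 12].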
Andrey Sivachenko c7898a9be7 inconsequential change to string constants printed into the VCF, which no one uses anyway... 2011-09-30 16:40:21 -04:00
Mark DePristo 010899f886 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 15:51:09 -04:00
Mauricio Carneiro 05fba6f23a Clipping ends inside deletion and before insertion
Fixed.
2011-09-30 15:44:43 -04:00
Mauricio Carneiro e0b771c233 Pre-sorting output
Sorting variable regions independently so that the output is pre-sorted. Major speedup.
2011-09-30 15:44:43 -04:00
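One way to read "sorting variable regions independently": if the regions are disjoint and emitted in genomic order, sorting each region's reads on its own and concatenating the results yields globally sorted output. A minimal sketch under those assumptions (regions disjoint, in order, every read starting inside its own region); RegionPreSort, Read and emitPreSorted are hypothetical names, not the ReduceReads code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RegionPreSort {
    private RegionPreSort() {}

    /** Hypothetical minimal read: a name and a 1-based alignment start. */
    static final class Read {
        final String name;
        final int start;

        Read(String name, int start) {
            this.name = name;
            this.start = start;
        }
    }

    /**
     * Sorts the reads of each variable region on their own and concatenates
     * the regions in order. Assumes the regions are disjoint, supplied in
     * genomic order, and that every read starts inside its own region; under
     * those assumptions the concatenated output is globally sorted by start
     * without a single sort over all reads.
     */
    static List<Read> emitPreSorted(List<List<Read>> readsPerRegion) {
        Comparator<Read> byStart = Comparator.comparingInt((Read r) -> r.start);
        List<Read> out = new ArrayList<>();
        for (List<Read> region : readsPerRegion) {
            List<Read> sorted = new ArrayList<>(region); // leave input untouched
            sorted.sort(byStart);
            out.addAll(sorted);
        }
        return out;
    }
}
```

Sorting k regions of sizes n_1..n_k costs the sum of n_i log n_i, which is never more than n log n for all reads together, and each region can be written out as soon as it is finished instead of buffering everything for one final sort.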
Mauricio Carneiro b5bdea1cb9 Fix: reads hard-clipped at their first interval overlap that also overlap subsequent intervals
If a read had its left tail clipped (the part overlapping the current interval) but its right tail overlapped other intervals, this caused out-of-order reads to be processed before they should have been. Fixed.
2011-09-30 15:44:43 -04:00
Mauricio Carneiro d211b3b7f0 Fixing interval overlap logic
Complete rewrite of the optimized interval overlap logic. Added lots of exceptions to trap future wrongdoings.
2011-09-30 15:44:42 -04:00
Ryan Poplin 73577b72cf Misc minor updates in HaplotypeCaller 2011-09-30 13:41:24 -04:00
Ryan Poplin af6c053435 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 13:33:31 -04:00
Mark DePristo a881d6f145 Now only generates the poly VCF with select variants if the file doesn't exist 2011-09-30 08:42:09 -04:00
Mark DePristo d901fed617 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 08:41:44 -04:00
Mauricio Carneiro cabacf028d Intermediate commit to fix interval skipping
May need additional testing.
2011-09-29 18:45:12 -04:00
Mark DePristo 98ecaf8aa0 Support for ReducedReads with reduced counts and average quals
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mauricio Carneiro 9508220157 Fixed hard clipping of both ends inside a deletion
If both ends of the interval fall within a deletion in the read, hardClipBothEnds would cut the right tail first (including the entire deletion) and then fail to cut the left tail because no bases were left there. Fixed.
2011-09-29 15:36:49 -04:00
Mauricio Carneiro a5e75cd14c Outputting both consensus base qualities and counts
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (the most common base), and the consensus quality tag now carries an array with the count of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00
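The rule above can be sketched for a single consensus position: count each base, take the most common one, and average the qualities of only the bases that agree with it. This is a minimal sketch, assuming one byte per read at the position, A/C/G/T only (other symbols are not counted) and ties broken in A, C, G, T order; ConsensusColumn, Summary and summarize are hypothetical names, and the actual tag encoding is not reproduced.

```java
final class ConsensusColumn {
    private ConsensusColumn() {}

    private static final byte[] BASES = {'A', 'C', 'G', 'T'};

    /** Hypothetical summary of one consensus position. */
    static final class Summary {
        final byte base;      // most common base at this position
        final byte avgQual;   // mean quality of the bases that agree with it
        final int[] counts;   // how many reads show A, C, G, T here

        Summary(byte base, byte avgQual, int[] counts) {
            this.base = base;
            this.avgQual = avgQual;
            this.counts = counts;
        }
    }

    private static int indexOf(byte b) {
        switch (b) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1; // N or anything else is not counted
        }
    }

    static Summary summarize(byte[] bases, byte[] quals) {
        if (bases.length != quals.length) {
            throw new IllegalArgumentException("bases and quals differ in length");
        }
        int[] counts = new int[4];
        long[] qualSums = new long[4];
        for (int i = 0; i < bases.length; i++) {
            int idx = indexOf(bases[i]);
            if (idx < 0) continue;
            counts[idx]++;
            qualSums[idx] += quals[i];
        }
        int best = 0;
        for (int j = 1; j < 4; j++) {
            if (counts[j] > counts[best]) best = j; // ties keep the earlier base
        }
        if (counts[best] == 0) {
            throw new IllegalArgumentException("no A/C/G/T bases at this position");
        }
        // Integer division: the average is rounded down.
        byte avgQual = (byte) (qualSums[best] / counts[best]);
        return new Summary(BASES[best], avgQual, counts);
    }
}
```

For bases AACGA with qualities 30, 20, 40, 10, 25, the consensus base is A, its quality is (30 + 20 + 25) / 3 = 25, and the counts are [3, 1, 1, 0]. Storing the counts alongside the averaged quality is what lets a caller see how much support the consensus base had.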
Mauricio Carneiro d62f2f33bc Added indel specific context size parameter
The parameter was added to the framework, but the functionality is not implemented yet.
2011-09-29 12:54:41 -04:00
Mauricio Carneiro 21c4abdd36 Disabling all SlidingReadUnitTests 2011-09-29 12:20:35 -04:00
Mauricio Carneiro 4086fa768f Disabling all ReadClipperUnitTests 2011-09-29 12:20:35 -04:00
Khalid Shakir 6dec932ca9 Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-29 11:47:13 -04:00
Khalid Shakir c08468eb9d A couple of updates while trying to get desired R 2.13 compactPDF support.
preQC:
- For R 2.13, explicitly coercing the text before parsing fingerprints
- Added LOD geom_line() at +/-3 based on Tim's presentation at PM meeting (ppt to go to pipeline wiki asap)
- PF_INDEL_RATE of zero replaced with NA
- NAs are not counted as "violations"; otherwise they auto-filter samples, since 0+NA = NA and the subset test only looks for 0 violations
- Restored plots for MEAN_READ_LENGTH, BAD_CYCLES, and MEDIAN_INSERT_SIZE by explicitly print()'ing the created plots

postQC:
- Fixed R 2.13 font scaling by moving size out of aes, except when using highlighting
- TODO: Don't know how to scale by aes for highlighting *and* use a smaller overall font size outside aes
2011-09-29 11:21:50 -04:00
Ryan Poplin e366ee18bc Adding ability to read in and make use of kmer quality tables during HMM evaluation 2011-09-29 07:46:19 -04:00
Ryan Poplin 9e32ede5b2 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-29 07:40:08 -04:00
Mauricio Carneiro fc86cd6fd8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr 2011-09-29 00:12:15 -04:00
Roger Zurawicki 4fd5630f6a Added ReadClipper Unit Test
* Includes tests for HardClip by Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Mauricio Carneiro f49a12de6b Pulling the latest changes from the main repository into the reduce reads repo 2011-09-28 22:31:57 -04:00
Matt Hanna 9272ed03b5 Merged bug fix from Stable into Unstable 2011-09-28 21:26:43 -04:00
Matt Hanna 0acaf2df65 Fix an embarrassing issue where a specific configuration of minimal coverage
over small intervals could cause reads to be dropped from the pileup.  Nothing
to see here...
2011-09-28 21:23:01 -04:00
Roger Zurawicki 07b0a75d96 Added SlidingRead Unit Test
Includes tests for clipStart and trimToVariableRegion
2011-09-28 21:22:57 -04:00
Khalid Shakir c5f1a4325f Updated preQC:
- full 8.5x11
- concatenating multiple initiatives / bait_sets
- Using NA instead of python None when WR dates are unavailable
- In new aggregations where the sample may have per library metrics, only using the sample level metrics, i.e. library is null
Updated postQC:
- Renamed some variables to assist with traceback()
- Fixed crashes on batches with two alleles or two samples such as Seminara_MC_1_09222011 or Engle_MC_2_09222011
- Added dependency tracking to PostCallingQC.scala so that the R script does not try to run before the evals are complete
Other minor cleanup.
Tried to use R 2.13 compactPDF but a few issues to work out with fingerprint boxplots in preQC and geom_text font size in postQC.
2011-09-28 20:23:30 -04:00
Mauricio Carneiro edf852d47d Adding lists to ReduceReads script
The script can now handle a single file or a list of files separately. Always scatter/gathering.
2011-09-28 18:40:30 -04:00
Mauricio Carneiro 64e7b3000c Fix: read spans a deletion through the entire interval
If the read has a deletion that spans the entire length of the interval, it should not be added to the mapped reads.
2011-09-28 18:40:30 -04:00
Mauricio Carneiro a93ece07e3 ScatterGatherable reduce reads script
Get your reduce read in a matter of seconds...
2011-09-28 18:40:30 -04:00
Guillermo del Angel c8d3a720f9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 18:17:34 -04:00
Guillermo del Angel 7e3cb45093 Further performance optimization in the banded HMM, now about a 60% speed improvement over the current implementation 2011-09-28 16:27:28 -04:00
Ryan Poplin 1b1ca80df2 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 16:17:39 -04:00
Ryan Poplin 3b73dc89fe Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle. 2011-09-28 16:17:31 -04:00
Mauricio Carneiro ff2f4df043 Fixed hardclipping inside indel (right tail)
When the hard-clipping point for the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro 3c7b7f74ef Optimized interval iteration
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
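The commit names a TreeSet over the toolkit's intervals but does not show the code. A minimal sketch, assuming non-overlapping closed 1-based intervals on one contig (for example, already merged): because the set is ordered by start, only two candidates can be the leftmost interval overlapping a read. IntervalIndex, Interval and firstOverlap are hypothetical names, not GATK's. Each TreeSet lookup here is O(log n); the commit's O(1) figure presumably also relies on the "being smart about which intervals to test" part, which this sketch does not attempt.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class IntervalIndex {
    /** Hypothetical closed interval [start, end] on a single contig. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            if (end < start) throw new IllegalArgumentException("end < start");
            this.start = start;
            this.end = end;
        }
    }

    // Ordered by start; assumes the intervals do not overlap one another.
    private final TreeSet<Interval> intervals =
            new TreeSet<>(Comparator.comparingInt((Interval iv) -> iv.start));

    void add(int start, int end) {
        intervals.add(new Interval(start, end));
    }

    /**
     * Returns the leftmost interval overlapping [readStart, readEnd], or null.
     * Only two candidates need checking: the interval starting at or before
     * readStart (floor) and the first one starting after it (higher).
     */
    Interval firstOverlap(int readStart, int readEnd) {
        Interval probe = new Interval(readStart, readStart);
        Interval before = intervals.floor(probe);
        if (before != null && before.end >= readStart) {
            return before;
        }
        Interval after = intervals.higher(probe);
        if (after != null && after.start <= readEnd) {
            return after;
        }
        return null;
    }
}
```

With intervals [100, 200] and [300, 400], a read spanning 250-310 is matched to [300, 400] by the higher() lookup, and a read spanning 210-290 overlaps neither.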
Mauricio Carneiro 5c9b659c02 Clipping both ends of a read was modifying the original read
This violates the ReadClipper contract and was affecting the second part of reads that span multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel fe23e4d10c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 15:53:11 -04:00
Guillermo del Angel e2b9030e93 First mostly fully functional implementation of banded pair HMM likelihood computation for the indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point 2011-09-28 15:51:48 -04:00
Eric Banks 1b45f21774 Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki. 2011-09-28 13:18:32 -04:00
Eric Banks 1f0e354fae Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 13:13:21 -04:00
Eric Banks bb619a9a3c Fixing docs 2011-09-28 13:13:03 -04:00
Mark DePristo 5812004e06 Merge branch 'stable' 2011-09-28 11:36:40 -04:00
Mark DePristo a88b7c1203 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-28 11:36:33 -04:00
Mark DePristo a5006831d7 Shows "" not empty space when default string value is "" 2011-09-28 11:35:52 -04:00
Mark DePristo 1e32281a15 Fix to not show -null when missing short name argument 2011-09-28 11:31:20 -04:00
Ryan Poplin fd287da189 Final tweaking of Smith-Waterman parameters based on large-scale parameter search with Queue 2011-09-28 11:22:01 -04:00