-- Added 'jobRunnerJobName' definition to QFunction, defaults to value of shortDescription
-- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler
Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
We have generalized the processing script to be able to handle multiple scenarios. Originally it was
designed for PCR free data only, we added all the steps necessary to start from fastq and process
RNA-seq as well as non-human data. This is our go to script in TechDev.
* add optional "starting from fastq" path to the pipeline
* add mark duplicates (optionally) to the pipeline
* add an option to run with the mouse data (without dbsnp and with single ended fastq)
* add option to process RNA-seq data from topHat (add RG and reassign mapping quality if necessary)
* add option to filter or include reads with N in the cigar string
* add parameter to allow keeping the intermediate files
1. Updated NIST path to its proper place on the file system (and updated the NIST calls to the latest, v2.17).
2. Don't assess genotype concordance for multi-allelic sites because we mess up the GTs when we break
them into their component parts (and therefore the GTs look wrong when they really aren't).
3. Add an argument to control the minimum GQ for a GT to be considered called.
This improves genotyping accuracy assessments which were unfairly penalizing low confidence GT calls.
Delivers PT #59846158.
-- We use the RegenotypeVariants walker to recompute the qual field. (instead of the discussed idea of adding this functionality to CombineVariants)
-- QualByDepth will now be recomputed even if the stratified contexts are missing. This greatly improves the QD estimate for this pipeline. Doesn't work for multi-allelics since the qual can't be recomputed.
Making the usage more clear since the parameter is being used over and over to define baited
regions. Updated the headers accordingly and made it more readable.
--Previously it gave a cryptic message:
----IO error while decoding blarg.script with UTF-8
----Please try specifying another one using the -encoding option
Pool Caller scripts with last minute fixes. Also committed script that plotted 1000G FDR that I used in ASHG2012.
Added also a README.txt file in /humgen/gsa-hpprojects/dev/validationExperiments/largeScaleValidation/finalPaperData/README.txt
in case things need to get run again.
this script downsamples an exome BAM several times and makes a coverage distribution
analysis (of bases that pass filters) as well as haplotype caller calls with a NA12878
Knowledge Base assessment with comparison against multi-sample calling
with the UG.
This script was used for the "downsampling the exome" presentation
Quick fix the missing column header in the QualifyMissingIntervals
report.
Adding a QScript for the tool as well as a few minor updates to the
GATKReportGatherer.
* add a length of the overlaping interval metric as per CSER request
* standardized the distance metrics to be positive when fully overlapping and the longest off-target tail (as a negative number) when not overlapping
* add gatkdocs to the tool (finally!)