-- Use the default VQSR parameters (QD, FS, DP and MQ) for SNPs, adding ReadPosRankSum and HaplotypeScore for UG SNPs
-- Add combine variants to GeneralCallingPipeline
-- Fix incorrect intervals in HaplotypeCaller in GeneralCallingPipeline.scala
-- GCP now emits tables for VCFs by default
-- GCP now runs HC before UG
-- GeneralCallingPipeline now jointly calls all input BAMs instead of processing them separately. Ready to handle CEU trio calling
-- Assess NA12878 on the particularly well-reviewed 10-11 Mb region in addition to all of chromosome 20
-- Use 4G for HC
In particular, someone produced a tandem repeat site with 57 alt alleles (sic), which made the caller blow up.
The inelegant fix is to detect whether the number of alleles exceeds our maximum cached capacity and, if so, emit an informative warning and skip the site.
-- Added unit test to UG engine to cover this case.
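A minimal sketch of the allele-count guard described above, assuming the engine has some fixed cache capacity to compare against. The class AlleleCountGuard, the method shouldSkipSite and the value 50 for MAX_CACHED_ALLELES are hypothetical illustrations, not the UG engine's actual names or limit.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: names and the capacity value are illustrative, not the UG engine API.
final class AlleleCountGuard {
    private static final Logger LOG = Logger.getLogger(AlleleCountGuard.class.getName());

    // Assumed capacity of the precomputed likelihood cache (illustrative value only).
    static final int MAX_CACHED_ALLELES = 50;

    /** Returns true, after logging a warning, if the site has more alleles than the cache supports. */
    static boolean shouldSkipSite(String contig, int position, List<String> alleles) {
        if (alleles.size() > MAX_CACHED_ALLELES) {
            LOG.warning(String.format(
                "Skipping site %s:%d: it has %d alleles, more than the maximum of %d supported",
                contig, position, alleles.size(), MAX_CACHED_ALLELES));
            return true;
        }
        return false;
    }
}
```

A site like the one above (1 ref + 57 alts = 58 alleles) returns true and is skipped with a warning instead of crashing the caller; ordinary biallelic sites pass through untouched.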
-- Commit for posterity the private Scala script currently used for the 1000G indel consensus (still very much subject to change).
GSA-878 #resolve
--Based on existing code in GenomeAnalysisEngine
--Hash maps hold a mapping from each deprecated tool name to a version number and a recommended replacement (if any)
--Using FastUtils for maps; specifically Object2ObjectMap but there could be a better type for Strings...
--Added user exception for deprecated annotations
--Added deprecation check to AnnotationInterfaceManager.validateAnnotations
--Run when annotations are initialized
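A minimal sketch of the deprecation lookup described above: a map from a deprecated name to its version and optional replacement, checked when tools or annotations are initialized. It uses java.util.HashMap rather than the fastutil Object2ObjectMap mentioned above so it stays self-contained, and IllegalArgumentException stands in for the GATK user exception; DeprecationRegistry and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real code uses fastutil maps and GATK UserException classes.
final class DeprecationRegistry {
    /** Version and optional replacement recorded for a deprecated tool or annotation. */
    static final class Deprecation {
        final String version;
        final String replacement; // null when there is no recommended replacement

        Deprecation(String version, String replacement) {
            this.version = version;
            this.replacement = replacement;
        }
    }

    private final Map<String, Deprecation> deprecated = new HashMap<>();

    void register(String name, String version, String replacement) {
        deprecated.put(name, new Deprecation(version, replacement));
    }

    boolean isDeprecated(String name) {
        return deprecated.containsKey(name);
    }

    /** Throws with an informative message if the name is deprecated; otherwise does nothing. */
    void check(String name) {
        Deprecation d = deprecated.get(name);
        if (d == null) {
            return;
        }
        String advice = d.replacement != null
            ? "; please use " + d.replacement + " instead"
            : "; there is no replacement";
        throw new IllegalArgumentException(name + " was deprecated in version " + d.version + advice);
    }
}
```

Usage (tool names and version are made up): registering ExampleOldTool with version "2.4" and replacement ExampleNewTool makes check("ExampleOldTool") fail with a message pointing at ExampleNewTool, while check("ExampleNewTool") returns normally.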
--Made annotation sets instead of lists
--Refactored the basic listAnnotations method out of VA into HelpUtils
--HelpUtils.listAnnotations() is now called by both VA and the new ListAnnotations utility (lives in sting.tools)
--This way we keep the VA --list option, but we also offer a way to list annotations without a full, valid VA command line, which users had continually complained about
--We could get rid of the VA --list option altogether ...?
--Mostly doc block tweaks
--Added @DocumentedGATKFeature to some walkers that were undocumented because they were ending up in "uncategorized". Very important for GSA: if a walker is in public or protected, it HAS to be properly tagged. If it's not ready for the public, it should be in private.
The name cache was filling up with the names of all reads in the entire file, which for a large file eventually
consumed all available memory. Now we only keep the read name cache for reads that are together in one variant
region, so that a pair of reads within the same variant region is still joined via read name.
Otherwise the ability to connect a read to its mate would be lost.
Update MD5s in integration test to reflect altered output.
Add a new integration test that confirms that a pair within a variant region is joined by read name.
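A minimal sketch of a region-scoped read name cache, assuming the cache is simply cleared when a variant region is finished so names never accumulate across the whole file. RegionMateCache, addAndFindMate and endRegion are hypothetical names; the actual change may scope the cache differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a mate-joining cache that only lives for one variant region.
final class RegionMateCache<R> {
    private final Map<String, R> unpairedByName = new HashMap<>();

    /**
     * Adds a read seen in the current region. If its mate was already cached in this
     * region, removes and returns the mate; otherwise caches this read and returns null.
     */
    R addAndFindMate(String readName, R read) {
        R mate = unpairedByName.remove(readName);
        if (mate == null) {
            unpairedByName.put(readName, read);
        }
        return mate;
    }

    /** Drops all unpaired names once the region is done, bounding memory to one region. */
    void endRegion() {
        unpairedByName.clear();
    }

    int size() {
        return unpairedByName.size();
    }
}
```

Both mates of a pair inside the same region are still joined by name; a read whose mate falls outside the region stays unpaired, which is the trade-off accepted above in exchange for bounded memory.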
Turns out the email script doesn't work correctly from cron.
Converting the webhook script back to a daemon for now until
it can be made to work as a cron job.
This reverts commit 9679accb641537f5c637cce0aeb63f3925521b42.
-having this as a daemon was annoying because we had to be sure to
re-spawn the daemon whenever it got killed
-now it will be run as a cron job once per minute
-delete now-unnecessary spawn script
-don't check exit status of wget in the trigger_pdfgen script;
it was exiting with a non-zero status even though the pdf generation
was being triggered correctly
-introduce a delay after filtering the git history to allow HEAD
to be properly reset
-re-enable sanity checks in filter_stable and source_release scripts
that had temporarily been disabled while the new protected repository
was being set up
-- @Output isn't required for AssessNA12878
-- The previous version would count non-variant NA12878 sites that resulted from subsetting a multi-sample VC to NA12878 as CALLED_BUT_NOT_IN_DB sites. Now they are properly skipped
-- Bugfix for subsetting samples to NA12878. The previous version didn't trim the alleles when subsetting a multi-sample VCF down to NA12878, so we'd get false FN/FP sites at indels whenever the multi-sample VCF's alleles left the NA12878 subset with untrimmed alleles. Fixed and unit tested now.
-- Simply caps PairHMM likelihoods so they cannot rise above 0, by taking the min of the likelihood and 0. Will be properly fixed in GATK 2.5 with a better PairHMM implementation.
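A minimal sketch of the cap, assuming the likelihoods are on a log10 scale, where a value above 0 would correspond to a probability greater than 1. LikelihoodCap and its methods are hypothetical names.

```java
// Hypothetical sketch: clamp log10 likelihoods so none exceeds 0 (i.e. probability <= 1).
final class LikelihoodCap {
    /** Returns min(log10Likelihood, 0). */
    static double capLog10Likelihood(double log10Likelihood) {
        return Math.min(log10Likelihood, 0.0);
    }

    /** Applies the same cap to every entry of an array, in place. */
    static void capInPlace(double[] log10Likelihoods) {
        for (int i = 0; i < log10Likelihoods.length; i++) {
            log10Likelihoods[i] = Math.min(log10Likelihoods[i], 0.0);
        }
    }
}
```

This only hides the symptom (values drifting above 0), which is why it is described above as a stopgap until the PairHMM itself is fixed.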
-Jars will get updated every time the "Serial Commit Tests" plan in
Bamboo passes on the master branch
-Differs from the nightly builds in that it includes "private" and
has actually passed the test suite
-latest jar is always located at:
/humgen/gsa-hpprojects/GATK/private_unstable_builds/GenomeAnalysisTK_latest_unstable.jar
Increase one timeout, restore others that were only timing out due to the
Java crypto lib bug to their original values.
-DOUBLE timeout for NanoSchedulerUnitTest.testNanoSchedulerInLoop()
-REDUCE timeout for EngineFeaturesIntegrationTest to its original value
-REDUCE timeout for MaxRuntimeIntegrationTest to its original value
-REDUCE timeout for GATKRunReportUnitTest to its original value
-- This is a temporary fix / hack to deal with the very high QD values that the HaplotypeCaller generates when nearby events occur within reads. In that case, the QUAL field can be many-fold higher than normal, which results in an inflated QD value. This hack projects such high QD values back into the good range (as these are generally good variants) so they aren't filtered away by VQSR.
-- The long-term solution to this problem is to move the HaplotypeCaller to the full bubble calling algorithm
-- Update md5s
- Moved AverageAltAlleleLength, MappingQualityZeroFraction and TechnologyComposition to Private
- VariantType, TransmissionDisequilibriumTest, MVLikelihoodRatio and GCContent are no longer Experimental
- AlleleBalanceBySample, HardyWeinberg and HomopolymerRun are Experimental and available to users with a big bold caveat message
- Refactored getMeanAltAlleleLength() out of AverageAltAlleleLength into GATKVariantContextUtils in order to make QualByDepth independent of where AverageAltAlleleLength lives
- Unrelated change, bundled in for convenience: made HC argument includeUnmappedreads @Hidden
- Removed unnecessary check in AverageAltAlleleLength
ALL GATK DEVELOPERS PLEASE READ NOTES BELOW:
I have updated the @Output annotation to behave differently and to include a 'defaultToStdout' tag.
* The 'defaultToStdout' tag lets walkers specify whether to default to stdout if -o is not provided.
* The logic for @Output is now:
* if required==true then -o MUST be provided or a User Error is generated.
* if required==false and defaultToStdout==true then the output is assigned to stdout if no -o is provided.
* this is the default behavior (i.e. @Output with no modifiers).
* if required==false and defaultToStdout==false then the output object is null.
* use this combination for truly optional outputs (e.g. the -badSites option in AssessNA12878).
* I have updated walkers so that previous behavior has been maintained (as best I could).
* In general, all @Outputs with default long/short names have required=false.
* Walkers with nWayOut options must have required==false and defaultToStdout==false (I added checks for this)
* I added unit tests for @Output changes with David's help (thanks!).
* #resolve GSA-837
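A minimal sketch of the @Output resolution rules listed above, written as a plain function over the two flags. OutputResolver, resolve, validateNWayOut, the STDOUT sentinel and MissingOutputException are hypothetical names, with MissingOutputException standing in for the GATK User Error; this is not the engine's actual argument-parsing code.

```java
// Hypothetical sketch of the @Output rules; names are illustrative only.
final class OutputResolver {
    /** Sentinel meaning "write to standard output". */
    static final String STDOUT = "<stdout>";

    /** Stands in for the User Error raised when a required output is missing. */
    static final class MissingOutputException extends RuntimeException {
        MissingOutputException(String message) {
            super(message);
        }
    }

    /**
     * Resolves an output from the user-supplied value (null if absent).
     * Returns the supplied path, STDOUT, or null for a truly optional output.
     */
    static String resolve(String argName, String providedPath, boolean required, boolean defaultToStdout) {
        if (providedPath != null) {
            return providedPath;
        }
        if (required) {
            throw new MissingOutputException("Argument " + argName + " is required but was not provided");
        }
        return defaultToStdout ? STDOUT : null;
    }

    /** nWayOut outputs must have required=false and defaultToStdout=false. */
    static void validateNWayOut(String argName, boolean required, boolean defaultToStdout) {
        if (required || defaultToStdout) {
            throw new IllegalStateException(argName
                + ": nWayOut outputs must have required=false and defaultToStdout=false");
        }
    }
}
```

With no modifiers (required=false, defaultToStdout=true) a missing -o resolves to stdout; an option like -badSites (required=false, defaultToStdout=false) resolves to null when omitted.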
* ClippingOp updated to incorporate Ns in the hard clips.
* ReadUtils.getReadCoordinateForReferenceCoordinate() updated to account for Ns.
* Added test that covers the BQSR case we saw.
* Created GSA-856 (for Mauricio) to add lots of tests to ReadUtils.
* It will require refactoring code, which is beyond the scope of what I was willing to do to fix this.
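A minimal sketch of mapping a reference coordinate to a read offset while accounting for N, assuming a CIGAR string and a 1-based alignment start. The key point is that N (skipped region) consumes reference bases but no read bases, like D. CigarCoordinates and readOffsetForRefCoord are hypothetical names; this is not GATK's ReadUtils.getReadCoordinateForReferenceCoordinate().

```java
// Hypothetical sketch, not the ReadUtils implementation.
final class CigarCoordinates {
    /**
     * Returns the 0-based read offset aligned to refCoord, or -1 if refCoord falls in a
     * deletion or skipped (N) region, or outside the alignment.
     */
    static int readOffsetForRefCoord(String cigar, int alignmentStart, int refCoord) {
        int readOffset = 0;          // bases of the read consumed so far (hard clips excluded)
        int refPos = alignmentStart; // first reference position of the current element
        int len = 0;
        for (int i = 0; i < cigar.length(); i++) {
            char c = cigar.charAt(i);
            if (Character.isDigit(c)) {
                len = len * 10 + (c - '0');
                continue;
            }
            switch (c) {
                case 'M': case '=': case 'X': // consume both read and reference
                    if (refCoord >= refPos && refCoord < refPos + len) {
                        return readOffset + (refCoord - refPos);
                    }
                    readOffset += len;
                    refPos += len;
                    break;
                case 'D': case 'N':           // consume reference only: no read base here
                    if (refCoord >= refPos && refCoord < refPos + len) {
                        return -1;
                    }
                    refPos += len;
                    break;
                case 'I': case 'S':           // consume read only
                    readOffset += len;
                    break;
                case 'H': case 'P':           // consume neither
                    break;
                default:
                    throw new IllegalArgumentException("Unknown CIGAR operator: " + c);
            }
            len = 0;
        }
        return -1;
    }
}
```

For example, with CIGAR 10M100N10M starting at position 1, reference position 111 maps to read offset 10; a version that treated N as consuming read bases would instead run off the end of the read.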
-"git clone" was failing intermittently with disturbing error messages about
missing certain files. Use cp -r instead.
-Add extra checks and steps to try to ensure we have a complete checkout
with no missing files.
-Turns out the Java 6 JCE crypto library (used to decrypt our AWS keys)
uses the current list of files in the java.io.tmpdir as a source of
entropy. This file list operation was prohibitively slow with a large,
shared temp directory.
-Starting with an independent, empty temp dir for each run should solve
this problem, and get rid of all/most of the test timeouts we've been
seeing.
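A minimal sketch of giving each run its own empty temp directory, assuming the run launches a child JVM whose java.io.tmpdir can be set on the command line. FreshTmpDir, its methods and the "test-run-" prefix are hypothetical; the actual change lives in the test scripts and may do this differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one fresh, empty java.io.tmpdir per test run.
final class FreshTmpDir {
    /** Creates a new, empty, uniquely named directory under the given parent. */
    static Path createEmptyRunTmpDir(Path parent) {
        try {
            return Files.createTempDirectory(parent, "test-run-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a java command line that points the child JVM's java.io.tmpdir at tmpDir. */
    static List<String> javaCommand(Path tmpDir, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Djava.io.tmpdir=" + tmpDir.toAbsolutePath());
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }
}
```

The resulting list can be passed to ProcessBuilder; because the directory starts empty, listing java.io.tmpdir in the child JVM is cheap regardless of how crowded the shared temp directory is.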