Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B,
sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion).
Now, sample B is genotyped as having a symbolic DEL allele.
Minor cleanup added. Note that I also removed Laura's previous fix for this problem.
Existing integration tests change because I've added a new header line to the VCF being output.
I also added several tests for the new functionality showing:
1. genotyping from separate and already combined gvcfs give the same output
2. genotyping over multiple spanning deletions works
3. combining works too
Existing unit tests also cover this case.
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now annotation interface (StandardUGAnnotation) holding annots that are standard in UG but should't be used as they are now with HC. This allows us to not have to exclude these annotations explicitly in HC, but still be able to use them for development purposes.
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
When -qsub-broad is specified instead of -qsub, use the "h_vmem" parameter
instead of "h_rss" to specify memory limit requests.
Also cause the GridEngine native arguments to be output by default to the logger,
instead of only when in debug mode.
The GATK command line header keys were being repeated in the VCF and
subsequently lost to a single key value by HTSJDK. This resolves
the issue by appending the name of the walker after the text
"GATKCommandLine" and a number after that if the same walker was
used more than once in the form: GATKCommandLine.(walker name) for
the first occurrence of the walker, and GATKCommandLine.(walker name).#
where # is the number of the occurrence of the walker (e.g.
GATKCommandLine.SomeWalker.2 for the second occurrence of SomeWalker).
Integration test added to EngineFeaturesIntegrationTest to verify
two runs of same walker follow expected form.
Resolves#909
See also: HTSJDK #43
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
- ASEReadCounter (public tool) replce Tuuli's script to produce the input to Manny's tool.
It count the number of reads that support the ref allele and the alt allele, filtereing low qual reads and bases and keep only properPaired reads
- ASECaller (private tool) take both RNA and DNA, and produce ontingencyTables ** still under development **
minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASEcaller
In ASEReadCounter:
- allow different option to deal with overlaping read from the same fragment
- add option to ignore or include indels in the pileups
- add option to disabled DuplicateRead
add ASEReadCounterIntegrationTest.java and files for the test
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.
Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls). Let me know if this was a mistake (although Laura gave me a thumbs up).
* The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads
* This patch fixes a runtime error when FindCoveredIntervals was used with Queue
* PT 81777160