Commit Graph

4535 Commits (847c832ef92ebf92c6caa44c10576bd0b57a99fa)

Author SHA1 Message Date
droazen 847c832ef9 Merge pull request #999 from broadinstitute/rhl_load_vector_pair_hmm
Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version
2015-06-04 12:54:59 -04:00
Eric Banks 27d3bafcbd Merge pull request #997 from broadinstitute/eb_add_foreign_read_filter
Added a new filter that can be used to remove reads that are too smal…
2015-05-22 14:34:28 -04:00
Eric Banks 8c81e7df95 Added a new filter that can be used to remove reads that are too small and overly clipped. 2015-05-22 14:33:35 -04:00
Ron Levine 3b0cb028e6 Fix loading of VectorLoglessPairHMM by rolling back to Intel's lib version 2015-05-22 14:16:00 -04:00
Ron Levine a6ca97ef14 Site-level selection based on genotype filter status 2015-05-21 11:27:20 -04:00
Kristian Cibulskis 3b1ee17727 added "artifact detection mode" for PON creation
added "str_contraction" artifact filter (improves specificity, especially in exomes)
refactored out VCF constants and added descriptions

added new dream evaluation markdown

added results for SMC 4

fixed up documentation, moved location to /dsde/working/mutect/dream_smc, and checked in scala script

fixed bug which would overwrite germline_risk filter errors
updated "how to" documents and records

fixed license text

thinned down FP regression test from 700 sites to 100.  we have better ways (DREAM, NN) to check accuracy of the method and 100 is good enough to catch regressions

why oh why do the MD5-based unit tests produce different results on different machine architectures?  I hate that :/

Thanks to GG, LDG and DR -- test should now produce the same results regardless of machine architecture

disabled downsampling... hopefully in the final attempt to make this work cross architecture!

enforced LOGLESS_CACHING... hopefully in the final final attempt to make this work cross architecture!
2015-05-15 07:14:33 -04:00
Geraldine Van der Auwera d1a7edd796 Update pom versions to mark the start of GATK 3.5 development 2015-05-15 00:44:54 -04:00
Geraldine Van der Auwera f19618653a Update pom versions for the 3.4 release 2015-05-15 00:40:39 -04:00
Geraldine Van der Auwera 8b20523f5e Merge pull request #979 from broadinstitute/ami-fixASE-bug
Fix bug: now also works when reads do not have a mate
2015-05-14 21:09:52 -04:00
David Roazen caafe84e74 Rev htsjdk to version 1.132 and picard to version 1.131, and switch to using the versions in maven central
-We now pull htsjdk and picard from maven central.

-Updated the GATK codebase as necessary to adapt to changes in the Feature
 interface.

-Since VCFHeader now requires that all header lines have unique keys, uniquified
 the keys of GVCFBlock header lines by including the min/max GQ in the key.
 Updated MD5s accordingly.

-Other MD5s changed as a result of an htsjdk fix to eliminate "-0" in VCF output.
2015-05-14 15:26:23 -04:00
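The key uniquification described in this commit can be pictured with a minimal sketch. The key layout `GVCFBlock<min>-<max>`, the value text, and the function name are assumptions for illustration; the commit only states that the min/max GQ are included in the key.

```python
def gvcf_block_header_lines(gq_bands):
    """Build one header line per GQ band, with the band limits in the key.

    gq_bands: list of (min_gq, max_gq) pairs.
    The key and value formats below are assumed, not taken from GATK.
    """
    lines = []
    for min_gq, max_gq in gq_bands:
        # Putting the limits in the key keeps every key distinct,
        # which is what a unique-key header requirement needs.
        key = f"GVCFBlock{min_gq}-{max_gq}"
        lines.append(f"##{key}=minGQ={min_gq}(inclusive),maxGQ={max_gq}(exclusive)")
    return lines
```

With a shared key such as a bare `GVCFBlock`, all bands would collide; with the limits in the key, each band keeps its own header line.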
Ami Levy-Moonshine 536d550794 Fix bug: now also works when reads do not have a mate
Reads with no mate will be counted as valid reads
2015-05-12 17:51:01 -04:00
Ron Levine 4a75d54e65 Added invert and exclude flags for variant selection queries 2015-05-12 15:08:28 -04:00
Geraldine Van der Auwera 7a75f4ae79 Merge pull request #974 from broadinstitute/jw_Var2BinPEDSwap
Correct errant array element swap in FAM file output.
2015-05-12 08:49:16 -04:00
Eric Banks 53a34cea4a Merge pull request #938 from broadinstitute/eb_fix_spanning_deletions_in_genotyping
Added a fix for genotyping positions over spanning deletions.
2015-05-11 23:11:47 -04:00
Joseph White abb6bc6f57 Correct errant array element swap in FAM file output.
dad and mom are swapped; paternal first, then maternal

updated MD5 chksums for test files

remove commented lines
2015-05-11 20:45:50 -04:00
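The column order this commit restores can be sketched as follows. This assumes the standard six-column PLINK FAM layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype); the function name and the space separator are illustrative and not taken from the GATK source.

```python
def fam_line(family_id, individual_id, paternal_id, maternal_id, sex, phenotype):
    """Format one FAM record with the paternal ID before the maternal ID."""
    # Columns 3 and 4 are the ones the commit says were swapped:
    # paternal first, then maternal.
    fields = [family_id, individual_id, paternal_id, maternal_id, sex, phenotype]
    return " ".join(str(field) for field in fields)
```

For example, `fam_line("F1", "child", "dad", "mom", 1, -9)` places `dad` in the third column and `mom` in the fourth.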
Eric Banks 530e0e5ea6 Added a fix for combining/genotyping positions over spanning deletions.
Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B,
sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion).
Now, sample B is genotyped as having a symbolic DEL allele.

Minor cleanup added.  Note that I also removed Laura's previous fix for this problem.

Existing integration tests change because I've added a new header line to the VCF being output.
I also added several tests for the new functionality showing:
1. genotyping from separate and already combined gvcfs give the same output
2. genotyping over multiple spanning deletions works
3. combining works too

Existing unit tests also cover this case.
2015-05-11 15:11:16 -04:00
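A toy sketch of the behaviour change described above, under loud assumptions: a sample's deletions are given as 1-based inclusive ranges of deleted positions, the spelling `<DEL>` is a placeholder (the commit only says "symbolic DEL allele"), and the function name is hypothetical. This is not the GATK implementation.

```python
SPANNING_DEL = "<DEL>"  # placeholder spelling for the symbolic DEL allele

def allele_for_sample(position, deleted_ranges):
    """Pick the allele to report for one sample at a site called in another sample.

    deleted_ranges: list of (first_deleted_pos, last_deleted_pos) pairs,
    1-based and inclusive, for deletions carried by this sample.
    """
    for first, last in deleted_ranges:
        if first <= position <= last:
            # The site lies inside this sample's deletion, so it is not
            # reference here; report the symbolic deletion allele instead.
            return SPANNING_DEL
    # No deletion spans the site; in this toy model the sample is reference.
    return "REF"
```

Before the fix, the first branch would effectively have been missing, so a sample with a spanning deletion was reported as reference.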
Geraldine Van der Auwera 5d8b9a7c20 Moved MQ0 out of HC exclusion and into StandardUGAnnotation 2015-05-03 01:04:49 +02:00
Geraldine Van der Auwera 071d82d1bf Un-exclude SD and TRA from HC annotators; resolves #966
Exclude MQ0BySample
Move SD and TRA to new StandardUGAnnotation interface
There is now an annotation interface (StandardUGAnnotation) holding annotations that are standard in UG but shouldn't be used as they are now with HC. This allows us to avoid excluding these annotations explicitly in HC while still being able to use them for development purposes.
2015-05-03 00:45:53 +02:00
Geraldine Van der Auwera e49f6dfd0f Merge pull request #970 from broadinstitute/gg_minor_docfixes
Fairly minor if plentiful fixes to various gatkdocs. Merging this without formal review since all tests pass, the gatkdocs build, and no one really wants to review corrections to grammar, typos and layout for 120+ documents. Review will be done by users in production ;-)
2015-05-03 00:36:12 +02:00
Geraldine Van der Auwera 919c3eaa2e Numerous doc fixes; mostly formatting and clarifications 2015-05-03 00:28:46 +02:00
Geraldine Van der Auwera fddc5331e1 Merge pull request #965 from broadinstitute/gg_nsubtil_clamp_hmm_fix
Clamp the HMM window starting coordinate to 1 instead of 0
2015-05-01 22:18:20 +02:00
Ron Levine 9ff827c83a More allele trimming for VariantAnnotator 2015-04-29 21:11:49 -04:00
Geraldine Van der Auwera f2b34d0823 Clamp the HMM window starting coordinate to 1 instead of 0 2015-04-30 01:37:20 +02:00
David Roazen 19ceca5e86 Queue: add -qsub-broad argument
When -qsub-broad is specified instead of -qsub, use the "h_vmem" parameter
instead of "h_rss" to specify memory limit requests.

Also cause the GridEngine native arguments to be output by default to the logger,
instead of only when in debug mode.
2015-04-27 17:43:25 -04:00
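The switch between the two memory parameters can be sketched roughly as below. The function name and the `-l <resource>=<n>g` formatting are assumptions about how a Grid Engine resource request might be assembled; they are not copied from Queue.

```python
def memory_limit_request(mem_gb, qsub_broad):
    """Return a Grid Engine resource request for a memory limit.

    qsub_broad=True mirrors -qsub-broad (use h_vmem);
    qsub_broad=False mirrors plain -qsub (use h_rss).
    """
    resource = "h_vmem" if qsub_broad else "h_rss"
    return f"-l {resource}={mem_gb}g"
```

Only the resource name differs between the two modes; the requested amount is passed through unchanged.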
Ron Levine d5f98e99f0 Bypass reads with a bad CIGAR length 2015-04-21 11:55:56 -04:00
Geraldine Van der Auwera bfcac455c9 Merge pull request #932 from broadinstitute/yf_fix_picard_md
Fix the scala wrapper for Picard MarkDuplicates
2015-04-16 12:08:39 -04:00
Khalid Shakir 90b579c78e CatVariants now allows different input / output file types.
Escaping the CatVariantsIntegrationTest classpaths for possible spaces in the directory names.
2015-04-13 14:39:46 -03:00
Yossi Farjoun a7487e282a since Picard mark duplicates moved to a different package, this class was broken. here's the fix. it would be good to have tests for all the scala picard-wrappers, but that is out of scope for this commit. 2015-04-13 08:44:30 -04:00
Yossi Farjoun d30a6258bc added the missing file to the error message 2015-04-06 08:21:55 -04:00
Alex Baumann 024ec69e97 Modify GATK command line header for unique keys
The GATK command line header keys were being repeated in the VCF and
subsequently lost to a single key value by HTSJDK.  This resolves
the issue by appending the name of the walker after the text
"GATKCommandLine" and a number after that if the same walker was
used more than once in the form: GATKCommandLine.(walker name) for
the first occurrence of the walker, and GATKCommandLine.(walker name).#
where # is the number of the occurrence of the walker (e.g.
GATKCommandLine.SomeWalker.2 for the second occurrence of SomeWalker).
Integration test added to EngineFeaturesIntegrationTest to verify
two runs of same walker follow expected form.

Resolves #909
See also: HTSJDK #43
2015-04-02 13:56:11 -04:00
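The naming rule in this commit message can be sketched in a few lines. The function name is hypothetical and this is not the GATK code; it only follows the stated form: `GATKCommandLine.(walker name)` for the first occurrence and `GATKCommandLine.(walker name).#` afterwards.

```python
from collections import Counter

def command_line_header_keys(walker_names):
    """Return one header key per walker run, in the order given."""
    seen = Counter()
    keys = []
    for name in walker_names:
        seen[name] += 1
        key = f"GATKCommandLine.{name}"
        if seen[name] > 1:
            # Second and later runs of the same walker get the occurrence number.
            key += f".{seen[name]}"
        keys.append(key)
    return keys
```

Two runs of `SomeWalker` therefore yield `GATKCommandLine.SomeWalker` and `GATKCommandLine.SomeWalker.2`, so neither header line is collapsed into the other.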
Ron Levine fe87484074 Update -mv example documentation
Made general doc fixes
2015-04-01 02:37:42 -04:00
Geraldine Van der Auwera d7f7022dce Merge pull request #904 from broadinstitute/pd_orig_dp
Added keepOriginalDP argument to SelectVariants
2015-03-30 09:01:33 -04:00
ldgauthier 0101003138 Merge pull request #899 from broadinstitute/ldg_M2_tandemRepeatsAndContamination
Lots of changes to M2:
2015-03-30 07:58:35 -04:00
Geraldine Van der Auwera 87b3dddb39 Merge pull request #894 from broadinstitute/gg_ami_docs_license
Edited ASEReadCounter documentation
2015-03-28 13:15:24 -04:00
Laura Gauthier 5a10758e2e Annotation changes for M2:
Build a ReferenceContext in ActiveRegionWalkers to pass in to annotation engine so we can call the TandemRepeatAnnotator from M2
Make TandemRepeatAnnotator default annotation for M2.
Setup (but don't use yet) HC-style contamination downsampling.
New HC integration test with TandemRepeatAnnotator
2015-03-27 18:25:23 -04:00
Ron Levine aef0a83c52 Automatically choose indexing strategy by file extension 2015-03-27 11:10:35 -04:00
Geraldine Van der Auwera 9b812308b1 Edited ASEReadCounter documentation
Also changed output file variable type from String to Enum
2015-03-26 02:43:53 -04:00
Phillip Dexheimer c97c253ec8 Added keepOriginalDP argument to SelectVariants
Fixes #830
2015-03-25 22:45:31 -04:00
Geraldine Van der Auwera dfa18a8fc6 Merge pull request #887 from broadinstitute/pd_vcf_cmdline_hdr
Fixed logging of 'out' command line parameter in VCF headers
2015-03-25 00:48:55 -04:00
Ami Levy-Moonshine c5fc5c4f8c Create 2 new tools:
- ASEReadCounter (public tool) replaces Tuuli's script to produce the input to Manny's tool.
   It counts the number of reads that support the ref allele and the alt allele, filtering low-quality reads and bases and keeping only properly paired reads
- ASECaller (private tool) takes both RNA and DNA, and produces contingency tables ** still under development **

minor changes in other tools:
- update RNA HC variant calling scala script
- expose FS method pValueForContingencyTable to be able to call it from ASECaller

In ASEReadCounter:
- allow different options for dealing with overlapping reads from the same fragment
- add option to ignore or include indels in the pileups
- add option to disable DuplicateRead

add ASEReadCounterIntegrationTest.java and files for the test
2015-03-21 16:56:00 -04:00
Phillip Dexheimer 3b567d7a98 Fixed logging of 'out' command line parameter in VCF headers 2015-03-18 23:12:13 -04:00
Geraldine Van der Auwera a75e1d4ce4 Fixes the test that was failing due to gsalib build failure 2015-03-17 04:26:03 -04:00
Phillip Dexheimer 4d4d33404e Added gsa.reshape.concordance.table function to gsalib 2015-03-16 22:52:27 -04:00
Geraldine Van der Auwera 1d39ed9156 Merge pull request #814 from broadinstitute/biocyberman_maven_patches
Biocyberman maven patches
2015-03-13 16:26:02 -04:00
Geraldine Van der Auwera 39a972f348 Merge pull request #872 from broadinstitute/eb_create_rgq_format_field
Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Fixes #870
2015-03-13 13:59:53 -04:00
Geraldine Van der Auwera 7681e89454 Merge pull request #869 from broadinstitute/gg_fix_vqsr_plots_GSA-860
Switched VQSR tranches plot ordering rule
2015-03-13 10:46:55 -04:00
Eric Banks 1ff9463285 Added the RGQ format annotation to monomorphic sites in the VCF output of GenotypeGVCFs.
Now, instead of stripping out the GQs for mono sites, we transfer them to the RGQ.
This is extremely useful for people who want to know how confident the hom ref genotype calls are.
Perhaps this is just what CRSP needs for pertinent negatives.

Note that I also changed the tool to no longer use the GenotypeSummaries annotation by default since
it was adding some seemingly unnecessary annotations (like mean GQ now that we keep the GQ around and
number of no-calls).  Let me know if this was a mistake (although Laura gave me a thumbs up).
2015-03-13 10:27:20 -04:00
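The GQ-to-RGQ transfer described above can be illustrated with a small sketch. Representing a genotype's FORMAT fields as a plain dict, and the function name, are assumptions made for illustration only.

```python
def transfer_gq_to_rgq(format_fields, site_is_monomorphic):
    """Return a copy of the FORMAT fields with GQ moved to RGQ at mono sites."""
    if not site_is_monomorphic or "GQ" not in format_fields:
        # Variant sites (or genotypes without a GQ) are left untouched.
        return dict(format_fields)
    updated = {key: value for key, value in format_fields.items() if key != "GQ"}
    # Keep the confidence value instead of dropping it, under the new name.
    updated["RGQ"] = format_fields["GQ"]
    return updated
```

The value itself is preserved, which is what lets a reader see how confident a hom-ref call is.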
Phillip Dexheimer 6ffa295963 Regression: The new 'includeUnmapped' PartitionBy annotation was incorrectly set for HC
Fixes #828
2015-03-13 00:24:57 -04:00
Geraldine Van der Auwera aa4084d42f Switched VQSR tranches plot ordering rule 2015-03-12 19:57:03 -04:00
Geraldine Van der Auwera f8a081a262 Updated readme in public/doc to just point to the website 2015-03-12 11:52:48 -04:00