gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	559a4bc05d	Updating general calling pipeline to work with newer HC and UG arguments and filtering -- Use default VQSR params of QD, FS, DP and MQ for SNPs, with ReadPosRankSum and HaplotypeScore for UG SNPs -- Add combine variants to GeneralCallingPipeline -- Fix incorrect intervals in HaplotypeCaller in GeneralCallingPipeline.scala -- GCP now emits tables for VCFs by default -- GCP runs HC first before UG -- GeneralCallingPipeline now jointly calls input BAMs, not separately processes them. Ready to handle CEU trio calling -- Assess NA12878 on the particularly well reviewed 10-11mb in addition to all of 20 -- Use 4G for HC	2013-03-20 22:54:35 -04:00
Eric Banks	1fae750ebe	Merge pull request #120 from broadinstitute/aw_reduce_reads_clear_name_cache Clear ReduceReads name cache after each set of reads produced by ReduceR...	2013-03-20 19:47:42 -07:00
Mark DePristo	7e29beadff	Merge pull request #121 from broadinstitute/gda_hc_gls_for_1000g_GSA-878 Fix (rather workaround) encountered when running HaplotypeCaller in GGA ...	2013-03-20 14:08:10 -07:00
Guillermo del Angel	ea01dbf130	Fix to issue encountered when running HaplotypeCaller in GGA mode with data from other 1000G callers. In particular, someone produced a tandem repeat site with 57 alt alleles (sic) which made the caller blow up. Inelegant fix is to detect if # of alleles is > our max cached capacity, and if so, emit an informative warning and skip site. -- Added unit test to UG engine to cover this case. -- Commit to posterity private scala script currently used for 1000G indel consensus (still very much subject to changes). GSA-878 #resolve	2013-03-20 14:30:37 -04:00
MauricioCarneiro	470746c907	Merge pull request #117 from broadinstitute/gg_handling_deprecated_tools_45941819 gg handling deprecated tools 45941819	2013-03-20 07:31:33 -07:00
Geraldine Van der Auwera	d70bf64737	Created new DeprecatedToolChecks class --Based on existing code in GenomeAnalysisEngine --Hashmaps hold mapping of deprecated tool name to version number and recommended replacement (if any) --Using FastUtils for maps; specifically Object2ObjectMap but there could be a better type for Strings... --Added user exception for deprecated annotations --Added deprecation check to AnnotationInterfaceManager.validateAnnotations --Run when annotations are initialized --Made annotation sets instead of lists	2013-03-20 06:46:02 -04:00
Geraldine Van der Auwera	6b4d88ebe9	Created ListAnnotations utility (extends CommandLineProgram) --Refactored listAnnotations basic method out of VA into HelpUtils --HelpUtils.listAnnotations() is now called by both VA and the new ListAnnotations utility (lives in sting.tools) --This way we keep the VA --list option but we also offer a way to list annotations without a full valid VA command-line, which was a pain users continually complained about --We could get rid of the VA --list option altogether ...?	2013-03-20 06:15:27 -04:00
Geraldine Van der Auwera	95a9ed853d	Made some documentation updates & fixes --Mostly doc block tweaks --Added @DocumentedGATKFeature to some walkers that were undocumented because they were ending up in "uncategorized". Very important for GSA: if a walker is in public or protected, it HAS to be properly tagged-in. If it's not ready for the public, it should be in private.	2013-03-20 06:15:20 -04:00
Alec Wysoker	bccc9d79e5	Clear ReduceReads name cache after each set of reads produced by ReduceReadsStash. Name cache was filling up with names of all reads in entire file, which for large file eventually consumes all of memory. Only keep read name cache for the reads that are together in one variant region, so that a pair of reads within the same variant region will still be joined via read name. Otherwise the ability to connect a read to its mate is lost. Update MD5s in integration test to reflect altered output. Add new integration test that confirms that pair within variant region is joined by read name.	2013-03-19 14:12:33 -04:00
Ryan Poplin	c813259283	Merge pull request #119 from broadinstitute/md_assessn12878_bugfixes AssessNA12878 bugfixes	2013-03-19 05:11:50 -07:00
David Roazen	d4f873f664	Revert "github webhook handler: convert from daemon to cron job" Turns out the email script doesn't work correctly from cron. Converting the webhook script back to a daemon for now until it can be made to work as a cron job. This reverts commit 9679accb641537f5c637cce0aeb63f3925521b42.	2013-03-19 03:50:39 -04:00
David Roazen	ff79118379	github webhook handler: convert from daemon to cron job -having this as a daemon was annoying because we had to be sure to re-spawn the daemon whenever it got killed -now it will be run as a cron job once per minute -delete now-unnecessary spawn script	2013-03-19 02:47:13 -04:00
David Roazen	f9ad8d4325	Merged bug fix from Stable into Unstable Conflicts: private/gsa-engineering/pdfgen/trigger_pdfgen.sh	2013-03-19 01:23:58 -04:00
David Roazen	532efad8cd	Release scripts: small changes to reduce intermittent failures -don't check exit status of wget in the trigger_pdfgen script; it was exiting with non-0 status even though the pdf generation was being triggered correctly -introduce a delay after filtering the git history to allow HEAD to be properly reset -re-enable sanity checks in filter_stable and source_release scripts that had temporarily been disabled while the new protected repository was being set up	2013-03-19 01:09:30 -04:00
Mark DePristo	d7bec9eb6e	AssessNA12878 bugfixes -- @Output isn't required for AssessNA12878 -- Previous version would count non-variant sites in NA12878 that resulted from subsetting a multi-sample VC to NA12878 as CALLED_BUT_NOT_IN_DB sites. Now they are properly skipped -- Bugfix for subsetting samples to NA12878. Previous version wouldn't trim the alleles when subsetting down a multi-sample VCF, so we'd have false FN/FP sites at indels when the multi-sample VCF has alleles that result in the subset for NA12878 having non-trimmed alleles. Fixed and unit tested now.	2013-03-18 15:48:08 -04:00
Eric Banks	a36e2b8f9d	Merge pull request #118 from broadinstitute/ami-typoInCoveredByNSamplesSites fix typos in argument docs in CoveredByNSamplesSites and rewrite an unac...	2013-03-18 11:10:10 -07:00
Ami Levy-Moonshine	0e9c1913ff	fix typos in argument docs and in printed output in CoveredByNSamplesSites and rewrite an unaccurate comment	2013-03-18 13:54:21 -04:00
Mark DePristo	2b80068164	Merged bug fix from Stable into Unstable	2013-03-18 12:36:21 -04:00
Mark DePristo	7ab7c873a1	Temp. to PairHMM to avoid bad likelihoods -- Simply caps PairHMM likelihoods from rising above 0 by taking the min of the likelihood and 0. Will be properly fixed in GATK 2.5 with better PairHMM implementation.	2013-03-18 12:34:51 -04:00
David Roazen	a67d8c8dd6	Bump timeout for MaxRuntimeIntegrationTest Looks like returning this timeout to its original value was a bit too aggressive -- adding 40 seconds to the tolerance limit.	2013-03-17 16:17:29 -04:00
droazen	a67aae0261	Merge pull request #114 from broadinstitute/dr_tweak_test_timeouts Further tweaking of test timeouts	2013-03-15 15:43:55 -07:00
Mark DePristo	d86a1242d1	Merge pull request #115 from broadinstitute/md_kb_unstable_server_GSA-778 NA12878 KB startup script takes full path to GATK.jar	2013-03-15 13:34:10 -07:00
Mark DePristo	2f27e5682a	NA12878 KB startup script takes full path to GATK.jar	2013-03-15 16:33:29 -04:00
David Roazen	236eb54abd	Trivial script to publish private unstable jars for group use -Jars will get updated every time the "Serial Commit Tests" plan in Bamboo passes on the master branch -Differs from the nightly builds in that it includes "private" and has actually passed the test suite -latest jar is always located at: /humgen/gsa-hpprojects/GATK/private_unstable_builds/GenomeAnalysisTK_latest_unstable.jar	2013-03-15 16:00:59 -04:00
Mark DePristo	090db06793	Merge pull request #110 from broadinstitute/rp_fix_extending_partial_haplotype_bug_GSA-840 Bug fix in assembly for edge case in which the extendPartialHaplotype fu...	2013-03-15 11:53:31 -07:00
David Roazen	742a7651e9	Further tweaking of test timeouts Increase one timeout, restore others that were only timing out due to the Java crypto lib bug to their original values. -DOUBLE timeout for NanoSchedulerUnitTest.testNanoSchedulerInLoop() -REDUCE timeout for EngineFeaturesIntegrationTest to its original value -REDUCE timeout for MaxRuntimeIntegrationTest to its original value -REDUCE timeout for GATKRunReportUnitTest to its original value	2013-03-15 14:49:21 -04:00
droazen	e681df68c9	Merge pull request #113 from broadinstitute/dr_parallel_tests_print_exited_classes parallel tests: print names of test classes that had an error in real time	2013-03-15 11:41:40 -07:00
David Roazen	68c6ebd93f	parallel tests: print names of test classes that had an error in real time	2013-03-15 14:28:20 -04:00
Ryan Poplin	0cf5d30dac	Bug fix in assembly for edge case in which the extendPartialHaplotype function was filling in deletions in the middle of haplotypes.	2013-03-15 14:20:25 -04:00
droazen	9d6d1f94b0	Merge pull request #112 from broadinstitute/dr_parallel_tests_print_unfinished_classes parallel tests: start printing the names of unfinished test classes once...	2013-03-15 10:57:59 -07:00
Mark DePristo	4a042e9bff	Merge pull request #111 from broadinstitute/rp_no_ref_padding_bug_GSA-860 Fix for edge case bug of trying to create insertions/deletions on the ed...	2013-03-15 10:34:45 -07:00
David Roazen	f42a52c090	parallel tests: start printing the names of unfinished test classes once there are < 10 jobs left This will let us see in real time in Bamboo which classes are preventing our runs from finishing	2013-03-15 13:34:30 -04:00
Ryan Poplin	b8991f5e98	Fix for edge case bug of trying to create insertions/deletions on the edge of contigs. -- Added integration test using MT that previously failed	2013-03-15 12:32:13 -04:00
David Roazen	0fd40dbde9	parallel tests: use experimental Class A storage (We were previously using Class C storage)	2013-03-15 10:20:27 -04:00
Ryan Poplin	daa0f8b551	Merge pull request #109 from broadinstitute/md_qd_fix_for_high_depth QualityByDepth remaps QD values > 40 to a gaussian around 30	2013-03-15 07:05:32 -07:00
Mark DePristo	8317cc155e	Merge pull request #108 from broadinstitute/eb_bqsr_out_of_bounds_fix Added check in the MalformedReadFilter for reads without stored bases (i...	2013-03-14 17:29:35 -07:00
MauricioCarneiro	6f0269df2c	Merge pull request #107 from broadinstitute/eb_fix_bqsr_clip_exception	2013-03-14 14:40:06 -07:00
Eric Banks	232afdcbea	Added check in the MalformedReadFilter for reads without stored bases (i.e. that use '*'). We now throw a User Error for such reads * User can override this to filter instead with --filter_bases_not_stored * Added appropriate unit test	2013-03-14 17:17:26 -04:00
Mark DePristo	2d35065238	QualityByDepth remaps QD values > 40 to a gaussian around 30 -- This is a temporary fix / hack to deal with the very high QD values that are generated by the haplotype caller when nearby events occur within reads. In that case, the QUAL field can be many fold higher than normal, and results in an inflated QD value. This hack projects such high QD values back into the good range (as these are good variants in general) so they aren't filtered away by VQSR. -- The long-term solution to this problem is to move the HaplotypeCaller to the full bubble calling algorithm -- Update md5s	2013-03-14 16:09:41 -04:00
droazen	0fd9f0e77c	Merge pull request #104 from broadinstitute/eb_fix_output_annotation_GSA-837 Fixed the logic of the @Output annotation and its interaction with 'required'	2013-03-14 12:52:00 -07:00
David Roazen	c3b5f66386	run_parallel_tests: further attempts to work around git issues in bamboo	2013-03-14 15:35:55 -04:00
Mark DePristo	5d6faef50e	Merge pull request #106 from broadinstitute/rp_unknown_sites_assess_as_tp_in_kb Changing CALLED_IN_DB_UNKNOWN_STATUS to count as TRUE_POSITIVEs in the s...	2013-03-14 11:50:12 -07:00
Ryan Poplin	38914384d1	Changing CALLED_IN_DB_UNKNOWN_STATUS to count as TRUE_POSITIVEs in the simplified stats for AssessNA12878.	2013-03-14 14:44:18 -04:00
Eric Banks	6d6264b108	Merge pull request #105 from broadinstitute/gg_annotations_cleanup_45802765 Cleaned up annotations	2013-03-14 11:35:00 -07:00
delangel	ec43112d28	Merge pull request #100 from broadinstitute/eb_maxIndelSize_SV_fix Fixed bug in SelectVariants where maxIndelSize argument wasn't getting a...	2013-03-14 11:32:56 -07:00
Geraldine Van der Auwera	61349ecefa	Cleaned up annotations - Moved AverageAltAlleleLength, MappingQualityZeroFraction and TechnologyComposition to Private - VariantType, TransmissionDisequilibriumTest, MVLikelihoodRatio and GCContent are no longer Experimental - AlleleBalanceBySample, HardyWeinberg and HomopolymerRun are Experimental and available to users with a big bold caveat message - Refactored getMeanAltAlleleLength() out of AverageAltAlleleLength into GATKVariantContextUtils in order to make QualByDepth independent of where AverageAltAlleleLength lives - Unrelated change, bundled in for convenience: made HC argument includeUnmappedreads @Hidden - Removed unnecessary check in AverageAltAlleleLength	2013-03-14 14:26:48 -04:00
Eric Banks	7cab709a88	Fixed the logic of the @Output annotation and its interaction with 'required'. ALL GATK DEVELOPERS PLEASE READ NOTES BELOW: I have updated the @Output annotation to behave differently and to include a 'defaultToStdout' tag. * The 'defaultToStdout' tags lets walkers specify whether to default to stdout if -o is not provided. * The logic for @Output is now: * if required==true then -o MUST be provided or a User Error is generated. * if required==false and defaultToStdout==true then the output is assigned to stdout if no -o is provided. * this is the default behavior (i.e. @Output with no modifiers). * if required==false and defaultToStdout==false then the output object is null. * use this combination for truly optional outputs (e.g. the -badSites option in AssessNA12878). * I have updated walkers so that previous behavior has been maintained (as best I could). * In general, all @Outputs with default long/short names have required=false. * Walkers with nWayOut options must have required==false and defaultToStdout==false (I added checks for this) * I added unit tests for @Output changes with David's help (thanks!). * #resolve GSA-837	2013-03-14 11:58:51 -04:00
Eric Banks	573ed07ad0	Fixed reported bug in BQSR for RNA seq alignments with Ns. * ClippingOp updated to incorporate Ns in the hard clips. * ReadUtils.getReadCoordinateForReferenceCoordinate() updated to account for Ns. * Added test that covers the BQSR case we saw. * Created GSA-856 (for Mauricio) to add lots of tests to ReadUtils. * It will require refactoring code and not in the scope of what I was willing to do to fix this.	2013-03-14 11:26:52 -04:00
David Roazen	acaa96f853	parallel_tests: use a safer method to copy the working dir into an LSF-accessible location -"git clone" was failing intermittently with disturbing error messages about missing certain files. Use cp -r instead. -Add extra checks and steps to try to ensure we have a complete checkout with no missing files.	2013-03-14 11:23:56 -04:00
David Roazen	be729410b9	run_parallel_tests: use independent java.io.tmpdir for each run -Turns out the Java 6 JCE crypto library (used to decrypt our AWS keys) uses the current list of files in the java.io.tmpdir as a source of entropy. This file list operation was prohibitively slow with a large, shared temp directory. -Starting with an independent, empty temp dir for each run should solve this problem, and get rid of all/most of the test timeouts we've been seeing.	2013-03-14 08:55:26 -04:00

12079 Commits (559a4bc05dcddc35e2a5282f0a7ad21ffcf80144)