gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	7cab709a88	Fixed the logic of the @Output annotation and its interaction with 'required'. ALL GATK DEVELOPERS PLEASE READ NOTES BELOW: I have updated the @Output annotation to behave differently and to include a 'defaultToStdout' tag. * The 'defaultToStdout' tag lets walkers specify whether to default to stdout if -o is not provided. * The logic for @Output is now: * if required==true then -o MUST be provided or a User Error is generated. * if required==false and defaultToStdout==true then the output is assigned to stdout if no -o is provided. * this is the default behavior (i.e. @Output with no modifiers). * if required==false and defaultToStdout==false then the output object is null. * use this combination for truly optional outputs (e.g. the -badSites option in AssessNA12878). * I have updated walkers so that previous behavior has been maintained (as best I could). * In general, all @Outputs with default long/short names have required=false. * Walkers with nWayOut options must have required==false and defaultToStdout==false (I added checks for this) * I added unit tests for @Output changes with David's help (thanks!). * #resolve GSA-837	2013-03-14 11:58:51 -04:00
David Roazen	acaa96f853	parallel_tests: use a safer method to copy the working dir into an LSF-accessible location -"git clone" was failing intermittently with disturbing error messages about missing certain files. Use cp -r instead. -Add extra checks and steps to try to ensure we have a complete checkout with no missing files.	2013-03-14 11:23:56 -04:00
David Roazen	be729410b9	run_parallel_tests: use independent java.io.tmpdir for each run -Turns out the Java 6 JCE crypto library (used to decrypt our AWS keys) uses the current list of files in the java.io.tmpdir as a source of entropy. This file list operation was prohibitively slow with a large, shared temp directory. -Starting with an independent, empty temp dir for each run should solve this problem, and get rid of all/most of the test timeouts we've been seeing.	2013-03-14 08:55:26 -04:00
Ryan Poplin	3b4dca1b94	Merge pull request #103 from broadinstitute/md_fragutils Cleanup FragmentUtils; Add concept of strandless reads	2013-03-13 10:12:40 -07:00
Mark DePristo	b5b63eaac7	New GATKSAMRecord concept of a strandless read, update to FS -- Strandless GATK reads are ones where they don't really have a meaningful strand value, such as Reduced Reads or fragment merged reads. Added GATKSAMRecord support for such reads, along with unit tests -- The merge overlapping fragments code in FragmentUtils now produces strandless merged fragments -- FisherStrand annotation generalized to treat strandless as providing 1/2 the representative count for both strands. This means that merged fragments are properly handled from the HC, so we don't hallucinate fake strand-bias just because we managed to merge a lot of reads together. -- The previous getReducedCount() wouldn't work if a read was made into a reduced read after getReducedCount() had been called. Added new GATKSAMRecord method setReducedCounts() that does the right thing. Updated SlidingWindow and SyntheticRead to explicitly call this function, and so the readTag parameter is now gone. -- Update MD5s for change to FS calculation. Differences are just minor updates to the FS	2013-03-13 11:16:36 -04:00
Mark DePristo	925846c65f	Cleanup of FragmentUtils -- Code was undocumented, big, and not well tested. All three things fixed. -- Currently not passing, but the framework works well for testing -- Added concat(byte[] ... arrays) to utils	2013-03-13 07:36:20 -04:00
David Roazen	8ed78b453f	Increase timeout for a test in the EngineFeaturesIntegrationTest -This test was intermittently failing when run on the farm	2013-03-12 23:53:26 -04:00
David Roazen	3847de5290	run_parallel_tests: detect farm glitches -add a function to detect the case where there were no ant test failures, but one or more jobs exited with an error	2013-03-12 23:26:33 -04:00
Mark DePristo	c289103c7d	Merge pull request #102 from broadinstitute/dr_parallel_test_runner_improvements parallel test runner: support multiple kinds of tests per run, logging, ...	2013-03-12 18:04:55 -07:00
David Roazen	7d06d15f3c	parallel test runner: support multiple kinds of tests per run, logging, improved script output -script now supports a variable number of test class suffixes (eg., UnitTest, IntegrationTest, etc.) meaning we can, for example, dispatch all unit and integration tests at once in a single job array -write an entry to a log file at the end of each run including the build ID, exit status (COMPLETED or TIMED_OUT), total runtime, and time spent waiting for farm jobs to complete -more detailed output: print how many jobs are pending vs. running vs. done, instead of just how many jobs are unfinished -all errors now go to stderr rather than stdout	2013-03-12 20:46:38 -04:00
Mark DePristo	b3f67899b5	Merge pull request #101 from broadinstitute/dr_fix_failing_parallel_tests Fix more tests that fail when run in parallel on the farm	2013-03-12 14:11:02 -07:00
David Roazen	cdb1fa1105	Fix more tests that fail when run in parallel on the farm -Allow the default S3 put timeout of 30 seconds for GATKRunReports to be overridden via a constructor argument, and use a timeout of 300 seconds for tests. The timeout remains 30 seconds in all other cases. -Change integration tests that themselves dispatch farm jobs into pipeline tests. Necessary because some farm nodes are not set up as submit hosts. Pipeline tests are still run directly on gsa4. -Bump up the timeout for the MaxRuntimeIntegrationTest even more (was still occasionally failing on the farm!)	2013-03-12 16:53:30 -04:00
MauricioCarneiro	4403e3572a	Merge pull request #94 from broadinstitute/gg_gatkdoc_docfixes_GSATDG-111	2013-03-12 13:02:35 -07:00
MauricioCarneiro	3a16ba04d4	Merge pull request #97 from broadinstitute/eb_refactor_sliding_window Refactoring of SlidingWindow class in RR to reduce complexity and fix important bug	2013-03-12 12:27:26 -07:00
droazen	dcdd6e3e60	Merge pull request #96 from broadinstitute/md_assess_only_reviewed Add mode to AssessNA12878 that will only consider reviewed sites	2013-03-12 10:29:07 -07:00
Geraldine Van der Auwera	f972963918	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) GATK-73 updated docs for bqsr args GATK-9 differentiate CountRODs from CountRODsByRef GATK-76 generate GATKDoc for CatVariants GATK-4 made resource arg required GATK-10 added -o, some docs to CountMales; some docs to CountLoci GATK-11 fixed by MC's -o change; straightened out the docs. GATK-77 fixed references to wiki GATK-76 Added Ami's doc block GATK-14 Added note that these annotations can only be used with VariantAnnotator GATK-15 specified required=false for two arguments GATK-23 Added documentation block GATK-33 Added documentation GATK-34 Added documentation GATK-32 Corrected arg name and docstring in DiffObjects GATK-32 Added note to DO doc about reference (required but unused) GATK-29 Added doc block to CountIntervals GATK-31 Added @Output PrintStream to enable -o GATK-35 Touched up docs GATK-36 Touched up docs, specified verbosity is optional GATK-60 Corrected GContent annot module location in gatkdocs GATK-68 touched up docs and arg docstrings GATK-16 Added note of caution about calling RODRequiringAnnotations as a group GATK-61 Added run requirements (num samples, min genotype quality) Tweaked template and generic doc block formatting (h2 to h3 titles) GATK-62 Added a caveat to HR annot Made experimental annotation hidden GATK-75 Added setup info regarding BWA GATK-22 Clarified some argument requirements GATK-48 Clarified -G doc comments GATK-67 Added arg requirement GATK-58 Added annotation and usage docs GSATDG-96 Corrected doc Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)	2013-03-12 10:57:14 -04:00
Mark DePristo	01c2e6e9fa	Merge pull request #99 from broadinstitute/ami-fix-compilationError-LScallingPipeline Ami fix compilation error l scalling pipeline	2013-03-12 07:47:57 -07:00
Ami Levy-Moonshine	e2d4d1da20	fix compilation error in ReduceReadsScript (missing import)	2013-03-12 10:31:57 -04:00
Ami Levy-Moonshine	eaf9c30257	fix compilation error (change from org.broadinstitute.variant.variantcontext.VariantContextUtils.FilteredRecordMergeType.KEEP_IF_ANY_UNFILTERED to GATKVariantContextUtils.FilteredRecordMergeType.KEEP_IF_ANY_UNFILTERED)	2013-03-12 10:31:57 -04:00
Mark DePristo	72f9abfcab	Merge pull request #98 from broadinstitute/rp_hc_glm_both Use the indel heterozygosity prior when calling indels with the HC	2013-03-12 07:09:43 -07:00
Eric Banks	05e69b6294	Refactoring of SlidingWindow class in RR to reduce complexity and fix important bug. * Allow RR to write its BAM to stdout by setting required=true for @Output. * Fixed bug in sliding window where a break in coverage after a long stretch without a variant region was causing a doubling of all the reads before the break. * Refactored SlidingWindow.updateHeaderCounts() into 3 separate tested methods. * Refactored polyploid consensus code out of SlidingWindow.compressVariantRegion().	2013-03-12 09:06:55 -04:00
Mark DePristo	08db3b5155	Add mode to AssessNA12878 that will only consider reviewed sites	2013-03-11 21:31:02 -04:00
Ryan Poplin	c96fbcb995	Use the indel heterozygosity prior when calling indels with the HC	2013-03-11 14:12:43 -04:00
Mark DePristo	7dce4f8630	Merge pull request #95 from broadinstitute/dr_parallel_tests_with_job_arrays run_parallel_tests: add job array support	2013-03-11 10:57:39 -07:00
David Roazen	df9821614c	run_parallel_tests: add job array support -With one bsub command per job, dispatch time could vary from 2 minutes to 2 hours (!) -By dispatching all jobs at once using a job array, this potential bottleneck is removed	2013-03-11 13:36:55 -04:00
Eric Banks	508b58376c	Merge pull request #93 from broadinstitute/gda_ancient_dna Two features useful for ancient DNA processing. Ancient DNA sequencing d...	2013-03-10 17:57:28 -07:00
Guillermo del Angel	695723ba43	Two features useful for ancient DNA processing. Ancient DNA sequencing data is in many ways different from modern data, and methods to analyze it need to be adapted accordingly. Feature 1: Read adaptor trimming. Ancient DNA libraries typically have very short inserts (in the order of 50 bp), so typical Illumina libraries sequenced in, say, 100bp HiSeq will have a large adaptor component being read after the insert. If this adaptor is not removed, data will not be alignable. There are third party tools that remove adaptor and potentially merge read pairs, but are cumbersome to use and require precise knowledge of the library construction and adaptor sequence. -- New walker ReadAdaptorTrimmer walks through paired end data, computes pair overlap and trims auto-detected adaptor sequence. -- Unit tests added for trimming operation. -- Utility walker (may be retired later) DetailedReadLengthDistribution computes insert size or read length distribution stratified by read group and mapping status and outputs a GATKReport with data. -- Renamed MaxReadLengthFilter to ReadLengthFilter and added ability to specify minimum read length as a filter (may be useful if, as a consequence of adaptor trimming, we're left with a lot of very short reads which will map poorly and will just clutter output BAMs). Feature 2: Unbiased site QUAL estimation: many times ancestral allele status is not known and VCF fields like QUAL, QD, GQ, etc. are affected by the pop. gen. prior at a site. This might introduce subtle biases in studies where a species is aligned against the reference of another species, so an option for UG and HC not to apply such prior is introduced. -- Added -noPrior argument to StandardCallerArgumentCollection. -- Added option not to fill priors if such argument is set. -- Added an integration test.	2013-03-09 18:18:13 -05:00
droazen	21a6b4add2	Merge pull request #92 from broadinstitute/yf_allow_spaces_in_sampleID_in_contam_file Changed loadContaminationFile file parser to delimit by tab only (not spaces)	2013-03-07 12:07:51 -08:00
Yossi Farjoun	baad965a57	- Changed loadContaminationFile file parser to delimit by tab only. This allows spaces in sampleIDs, which apparently are allowed. - This was needed since samples with spaces in their names are regularly found in the picard pipeline. - Modified the tests to account for this (removed spaces from the good tests, and changed the failing tests accordingly) - Cleaned up the unit tests using a @DataProvider (I'm in love...). - Moved AlleleBiasedDownsamplingUtilsUnitTest to public to match location of class it is testing (due to the way bamboo operates)	2013-03-07 13:04:24 -05:00
Mark DePristo	ecb2599cde	Merge pull request #91 from broadinstitute/dr_fix_failing_parallel_tests Fix tests that were consistently or intermittently failing when run in parallel on the farm	2013-03-06 11:47:36 -08:00
David Roazen	3ab78543a7	Fix tests that were consistently or intermittently failing when run in parallel on the farm -Make MaxRuntimeIntegrationTest more lenient by assuming that startup overhead might be as long as 120 seconds on a very slow node, rather than the original assumption of 20 seconds -In TraverseActiveRegionsUnitTest, write temp bam file to the temp directory, not to the current working directory -SimpleTimerUnitTest: This test was internally inconsistent. It asserted that a particular operation should take no more than 10 milliseconds, and then asserted again that this same operation should take no more than 100 microseconds (= 0.1 millisecond). On a slow node it could take slightly longer than 100 microseconds, however. Changed the test to assert that the operation should require no more than 10000 microseconds (= 10 milliseconds) -change global default test timeout from 20 to 40 minutes (things just take longer on the farm!) -build.xml: allow runtestonly target to work with scala test classes	2013-03-06 13:56:54 -05:00
Mark DePristo	7d833256e8	Merge pull request #90 from broadinstitute/eb_allow_read_transform_ordering Added the functionality to impose a relative ordering on ReadTransformer...	2013-03-06 09:52:26 -08:00
Eric Banks	3759d9dd67	Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. * ReadTransformers can say they must be first, must be last, or don't care. * By default, none of the existing ones care about ordering except BQSR (must be first). * This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR. * The engine now orders the read transformers up front before applying iterators. * The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully). * Added unit tests.	2013-03-06 12:38:59 -05:00
Mark DePristo	446cd61f7e	Merge pull request #84 from broadinstitute/eb_allelic_primitives Added new walker to split MNPs into their allelic primitives (SNPs).	2013-03-06 09:02:21 -08:00
Mark DePristo	dadc079dbc	Merge pull request #89 from broadinstitute/mc_fix_output_annotation_GSA-820 Turning @Output required to false	2013-03-06 09:01:20 -08:00
Mark DePristo	64a9ccded6	Merge pull request #77 from broadinstitute/mc_postqc_tsca One line change to the post calling QC pipeline	2013-03-06 07:13:10 -08:00
Eric Banks	78721ee09b	Added new walker to split MNPs into their allelic primitives (SNPs). * Can be extended to complex alleles at some point. * Currently only works for bi-allelics (documented). * Added unit and integration tests.	2013-03-05 23:16:42 -05:00
Mauricio Carneiro	e2d41f0282	Turning @Output required to false By default all output is assigned to stdout if a -o is not provided. Technically this makes @Output a not required parameter, and the documentation is misleading because it's reading from the annotation. GSA-820 #resolve	2013-03-05 17:26:16 -05:00
delangel	f10723df3b	Merge pull request #85 from broadinstitute/md_simple_kb_report AssessNA12878 now emits a simplified assessment table by default	2013-03-05 10:39:39 -08:00
Eric Banks	2be57fbcfb	Merged bug fix from Stable into Unstable	2013-03-05 13:28:46 -05:00
Eric Banks	5e89f01e10	Don't allow the use of compressed (.gz) references in the GATK.	2013-03-05 13:28:19 -05:00
Mark DePristo	92ac9e7f65	AssessNA12878 now emits a simplified assessment table by default -- New report collapses the detailed states in the 5 key states: TP, FP, FN, TN, unknown, such as in the following example: Name VariantType AssessmentType Count variant SNPS TRUE_POSITIVE 6 variant SNPS FALSE_POSITIVE 9 variant SNPS FALSE_NEGATIVE 1213 variant SNPS TRUE_NEGATIVE 172 variant SNPS CALLED_NOT_IN_DB_AT_ALL 0 variant INDELS TRUE_POSITIVE 19 variant INDELS FALSE_POSITIVE 13 variant INDELS FALSE_NEGATIVE 262 variant INDELS TRUE_NEGATIVE 57 variant INDELS CALLED_NOT_IN_DB_AT_ALL 39 -- Use --detailed to see the previous full version -- Expanded unittests for Assessment	2013-03-05 11:51:38 -05:00
Eric Banks	b5a07da04c	Merge pull request #88 from broadinstitute/eb_fix_pairHMM_from_stable Revert push from stable	2013-03-05 06:07:50 -08:00
Eric Banks	bbbaf9ad20	Revert push from stable (I forgot that pushing from stable overwrites current unstable changes)	2013-03-05 09:06:02 -05:00
Eric Banks	a037423225	Merged bug fix from Stable into Unstable	2013-03-05 09:03:48 -05:00
Eric Banks	7e1bfd6a7c	Included an accidental change from unstable into the previous push	2013-03-05 09:03:31 -05:00
Mauricio Carneiro	3e118a5b41	Adding interval list to Postcalling QC script It used to accept only interval strings, but I needed to pass it interval files for custom targeted projects.	2013-03-05 08:17:19 -05:00
David Roazen	74a5cd5956	run_parallel_tests: archive working directories for completed runs -deleting is too time-consuming and adds precious minutes to each run -old working directories can be deleted later by a cron job -delete working directory if global timeout has elapsed, however, since in that case we've already spent an excessive amount of time on the run	2013-03-05 05:49:25 -05:00
David Roazen	754226907e	run_parallel_tests.sh: improved test class search and post-test cleanup -search for compiled classes rather than source files to avoid picking up archived tests -add function (currently disabled) to remove test working directory when run completes -better log messages	2013-03-05 04:22:51 -05:00
Eric Banks	bd4e4f4ee3	Merged bug fix from Stable into Unstable	2013-03-04 23:24:44 -05:00
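
The @Output rules described in commit 7cab709a88 above can be read as a small decision procedure. The following is only an illustrative sketch in Python, not GATK code: the function name resolve_output, the UserError class, and the parameter names are hypothetical stand-ins, and the real annotation handling lives in the GATK's Java argument-parsing layer, which may differ in detail.

```python
import sys
from typing import Optional, TextIO


class UserError(Exception):
    """Hypothetical stand-in for the 'User Error' named in the commit message."""


def resolve_output(provided: Optional[TextIO],
                   required: bool = False,
                   default_to_stdout: bool = True) -> Optional[TextIO]:
    """Pick the output stream following the rules listed in commit 7cab709a88.

    `provided` models the stream opened for -o, or None when -o was not given.
    The defaults mirror "@Output with no modifiers" as described in the commit.
    """
    if provided is not None:
        # -o was given, so it is used regardless of the other flags.
        return provided
    if required:
        # required==true: -o MUST be provided.
        raise UserError("-o must be provided for a required output")
    if default_to_stdout:
        # required==false and defaultToStdout==true: fall back to stdout.
        return sys.stdout
    # required==false and defaultToStdout==false: truly optional output.
    return None
```

Under this reading, a walker with a truly optional output (the commit cites the -badSites option in AssessNA12878) would correspond to required=False, default_to_stdout=False and would have to handle a None result.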

12031 Commits (7cab709a88c86145d3be601c5ec2ea6476aa02a3)