gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	7d833256e8	Merge pull request #90 from broadinstitute/eb_allow_read_transform_ordering Added the functionality to impose a relative ordering on ReadTransformer...	2013-03-06 09:52:26 -08:00
Eric Banks	3759d9dd67	Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. * ReadTransformers can say they must be first, must be last, or don't care. * By default, none of the existing ones care about ordering except BQSR (must be first). * This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR. * The engine now orders the read transformers up front before applying iterators. * The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully). * Added unit tests.	2013-03-06 12:38:59 -05:00
Mark DePristo	446cd61f7e	Merge pull request #84 from broadinstitute/eb_allelic_primitives Added new walker to split MNPs into their allelic primitives (SNPs).	2013-03-06 09:02:21 -08:00
Mark DePristo	dadc079dbc	Merge pull request #89 from broadinstitute/mc_fix_output_annotation_GSA-820 Turning @Output required to false	2013-03-06 09:01:20 -08:00
Mark DePristo	64a9ccded6	Merge pull request #77 from broadinstitute/mc_postqc_tsca One line change to the post calling QC pipeline	2013-03-06 07:13:10 -08:00
Eric Banks	78721ee09b	Added new walker to split MNPs into their allelic primitives (SNPs). * Can be extended to complex alleles at some point. * Currently only works for bi-allelics (documented). * Added unit and integration tests.	2013-03-05 23:16:42 -05:00
Mauricio Carneiro	e2d41f0282	Turning @Output required to false By default all output is assigned to stdout if a -o is not provided. Technically this makes @Output a not required parameter, and the documentation is misleading because it's reading from the annotation. GSA-820 #resolve	2013-03-05 17:26:16 -05:00
delangel	f10723df3b	Merge pull request #85 from broadinstitute/md_simple_kb_report AssessNA12878 now emits a simplified assessment table by default	2013-03-05 10:39:39 -08:00
Eric Banks	2be57fbcfb	Merged bug fix from Stable into Unstable	2013-03-05 13:28:46 -05:00
Eric Banks	5e89f01e10	Don't allow the use of compressed (.gz) references in the GATK.	2013-03-05 13:28:19 -05:00
Mark DePristo	92ac9e7f65	AssessNA12878 now emits a simplified assessment table by default -- New report collapses the detailed states in the 5 key states: TP, FP, FN, TN, unknown, such as in the following example: Name VariantType AssessmentType Count variant SNPS TRUE_POSITIVE 6 variant SNPS FALSE_POSITIVE 9 variant SNPS FALSE_NEGATIVE 1213 variant SNPS TRUE_NEGATIVE 172 variant SNPS CALLED_NOT_IN_DB_AT_ALL 0 variant INDELS TRUE_POSITIVE 19 variant INDELS FALSE_POSITIVE 13 variant INDELS FALSE_NEGATIVE 262 variant INDELS TRUE_NEGATIVE 57 variant INDELS CALLED_NOT_IN_DB_AT_ALL 39 -- Use --detailed to see the previous full version -- Expanded unittests for Assessment	2013-03-05 11:51:38 -05:00
Eric Banks	b5a07da04c	Merge pull request #88 from broadinstitute/eb_fix_pairHMM_from_stable Revert push from stable	2013-03-05 06:07:50 -08:00
Eric Banks	bbbaf9ad20	Revert push from stable (I forgot that pushing from stable overwrites current unstable changes)	2013-03-05 09:06:02 -05:00
Eric Banks	a037423225	Merged bug fix from Stable into Unstable	2013-03-05 09:03:48 -05:00
Eric Banks	7e1bfd6a7c	Included an accidental change from unstable into the previous push	2013-03-05 09:03:31 -05:00
Mauricio Carneiro	3e118a5b41	Adding interval list to Postcalling QC script It used to accept only interval strings, but I needed to pass it interval files for custom targeted projects.	2013-03-05 08:17:19 -05:00
David Roazen	74a5cd5956	run_parallel_tests: archive working directories for completed runs -deleting is too time-consuming and adds precious minutes to each run -old working directories can be deleted later by a cron job -delete working directory if global timeout has elapsed, however, since in that case we've already spent an excessive amount of time on the run	2013-03-05 05:49:25 -05:00
David Roazen	754226907e	run_parallel_tests.sh: improved test class search and post-test cleanup -search for compiled classes rather than source files to avoid picking up archived tests -add function (currently disabled) to remove test working directory when run completes -better log messages	2013-03-05 04:22:51 -05:00
Eric Banks	bd4e4f4ee3	Merged bug fix from Stable into Unstable	2013-03-04 23:24:44 -05:00
Eric Banks	b715218bfe	Fix for mismatching indel quals error: need to adjust for softclips just like we do for bases and normal quals.	2013-03-04 23:23:18 -05:00
Mark DePristo	1b7164ccdb	Merge pull request #86 from broadinstitute/mc_fix_exception_messages Just a quick cleanup on the exception messages no need to wait for bamboo.	2013-03-04 13:55:00 -08:00
Mauricio Carneiro	d0c8105387	Cleaning up hilarious exception messages Too many users (with RNASeq reads) are hitting these exceptions that were never supposed to happen. Let's give them (and us) a better and clearer error message.	2013-03-04 16:52:22 -05:00
Ryan Poplin	ce7554e9d6	Merged bug fix from Stable into Unstable	2013-03-04 12:36:04 -05:00
Ryan Poplin	0697594778	Active regions that don't contain any usable reads should just be skipped over instead of throwing an IllegalStateException.	2013-03-04 12:35:40 -05:00
Ryan Poplin	b3ecbb011d	Merge pull request #81 from broadinstitute/md_hc_bam_writing Expanded functionality of writing BAMs from the haplotype caller	2013-03-04 06:39:19 -08:00
Mark DePristo	42d3919ca4	Expanded functionality for writing BAMs from HaplotypeCaller -- The new code includes a new mode to write out a BAM containing reads realigned to the called haplotypes from the HC, which can be easily visualized in IGV. -- Previous functionality maintained, with bug fixes -- Haplotype BAM writing code now lives in utils -- Created a base class that includes most of the functionality of writing reads realigned to haplotypes onto haplotypes. -- Created two subclasses, one that writes all haplotypes (previous functionality) and a CalledHaplotypeBAMWriter that will only write reads aligned to the actually called haplotypes -- Extended PerReadAlleleLikelihoodMap.getMostLikelyAllele to optionally restrict set of alleles to consider best -- Massive increase in unit tests in AlignmentUtils, along with several new powerful functions for manipulating cigars -- Fix bug in SWPairwiseAlignment that produces cigar elements with 0 size, and are now fixed with consolidateCigar in AlignmentUtils -- HaplotypeCaller now tracks the called haplotypes in the GenotypingEngine, and returns this information to the HC for use in visualization. -- Added extensive docs to HaplotypeCaller on how to use this capability -- BUGFIX -- don't modify the read bases in GATKSAMRecord in LikelihoodCalculationEngine in the HC -- Cleaned up SWPairwiseAlignment. Refactored out the big main and supplementary static methods. Added a unit test with a bug TODO to fix what seems to be an edge case bug in SW -- Integration test to make sure we can actually write a BAM for each mode. This test only ensures that the code runs and doesn't exception out. It doesn't actually enforce any MD5s -- HaplotypeBAMWriter also left aligns indels in the reads, as SW can return a random placement of a read against the haplotype. Calls leftAlign to make the alignments more clear, with unit test of real read to cover this case -- Writes out haplotypes for both all haplotype and called haplotype mode -- Haplotype writers now get the active region call, regardless of whether an actual call was made. Only emitting called haplotypes is moved down to CalledHaplotypeBAMWriter	2013-03-03 12:07:29 -05:00
Mark DePristo	ec3bf9f362	Adding 1mb of 2x250 bp PCR-free reads to private testdata	2013-03-01 20:44:17 -05:00
Mark DePristo	b1ea2f6125	Merge pull request #83 from broadinstitute/dr_gatk_jar_with_private_GSA-803 Ant target to package a GATK jar with private included	2013-03-01 13:15:57 -08:00
David Roazen	2a1a20fc9d	Parallel tests: switch working directory from /humgen/gsa-scr1 to /humgen/gsa-hpprojects Hoping that the higher class of storage will get us down from the current ~40 minutes for a parallel run of the integration tests to the goal of ~20 minutes.	2013-03-01 16:11:29 -05:00
David Roazen	a0be74c2ef	Ant target to package a GATK jar with private included Needed before we can start emitting full unstable jars from Bamboo for our internal use.	2013-03-01 15:33:59 -05:00
David Roazen	3f7d888ea5	run_parallel_tests.sh: further improvements -accept global timeout as a command-line argument -kill outstanding jobs when timeout reached -print job output files to stdout so that they get recorded in bamboo's logs -periodically print number of jobs outstanding during run -documentation / comments	2013-03-01 14:59:10 -05:00
Mark DePristo	0cff9b8027	Merge pull request #82 from broadinstitute/dr_split_long_integration_test_classes Split long-running integration test classes into multiple classes	2013-03-01 11:07:23 -08:00
David Roazen	c5c99c8339	Split long-running integration test classes into multiple classes This is to facilitate the current experiment with class-level test suite parallelism. It's our hope that with these changes, we can get the runtime of the integration test suite down to 20 minutes or so. -UnifiedGenotyper tests: these divided nicely into logical categories that also happened to distribute the runtime fairly evenly -UnifiedGenotyperPloidy: these had to be divided arbitrarily into two classes in order to halve the runtime -HaplotypeCaller: turns out that the tests for complex and symbolic variants make up half the runtime here, so merely moving these into a separate class was sufficient -BiasedDownsampling: most of these tests use excessively large intervals that likely can't be reduced without defeating the goals of the tests. I'm disabling these tests for now until they can either be redesigned to use smaller intervals around the variants of interest, or refactored into unit tests (creating a JIRA for Yossi for this task)	2013-03-01 13:55:23 -05:00
depristo	6204e6ccc9	Merge pull request #76 from broadinstitute/md_kb_bugfix_GSA-795 Bug fixes and optimizations for NA12878 KB	2013-03-01 10:52:16 -08:00
depristo	c05d1352b1	Merge pull request #80 from broadinstitute/eb_cleanup_genomelocsortedset_GSA-775 Fixed the add functionality of GenomeLocSortedSet.	2013-03-01 08:35:20 -08:00
Eric Banks	ebd5404124	Fixed the add functionality of GenomeLocSortedSet. * Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown. * Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases. * Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs. * Added more unit tests for add functionality of GLSS. * Resolves GSA-775.	2013-02-28 23:31:00 -05:00
David Roazen	6a77eee5f4	parallel tests script: pass in bamboo build number to make globally unique working directories for each run	2013-02-28 18:06:18 -05:00
David Roazen	2a7f55ae45	Further run_parallel_tests.sh quick fixes -Apparently the version of "basename" on gsa4 lacks the -s option...	2013-02-28 17:40:20 -05:00
David Roazen	394e8889f1	Fix silly typo in run_parallel_tests.sh script	2013-02-28 17:15:32 -05:00
MauricioCarneiro	e5fa1672c1	Merge pull request #79 from broadinstitute/dr_parallel_tests_prototype fingers crossed!	2013-02-28 14:12:37 -08:00
David Roazen	e6ac94fd75	Experimental script to run tests using class-level parallelism on the farm -script to dispatch one farm job per test class and monitor jobs until completion -new ant target to run tests without doing ANY compilation or extra steps at all allows multiple instances of the test suite to share the same working directory	2013-02-28 16:51:58 -05:00
droazen	ca42be9788	Merge pull request #78 from broadinstitute/dr_pdfgen_bamboo_script_GSA-794 Trivial shell script for bamboo to trigger the website pdfgen script	2013-02-28 11:48:41 -08:00
David Roazen	b050d16b22	Trivial shell script for bamboo to trigger the website pdfgen script	2013-02-28 14:45:25 -05:00
Mark DePristo	0931afab39	NA12878 KB performance improvement -- updateConsensus now doesn't call remove when it's updating the entire db from scratch. This radically improves performance when you are simply dropping the entire consensus and rebuilding from scratch, as the server does upon start up	2013-02-28 10:51:59 -05:00
Mark DePristo	4095a9ef32	Bugfixes for AssessNA12878 -- Refactor initialization routine into BadSitesWriter. This now adds the GQ and DP genotype header lines which are necessary if the input VCF doesn't have proper headers -- GATKVariantContextUtils subset to biallelics now tolerates samples with bad GL values for multi-allelics, where it just removes the PLs and issues a warning.	2013-02-28 10:35:06 -05:00
depristo	92d6a4f441	Merge pull request #75 from broadinstitute/eb_missing_rg_error_GSA-407 Added better error message for BAMs with bad read groups.	2013-02-28 05:20:39 -08:00
depristo	cac3f80c64	Merge pull request #73 from broadinstitute/eb_remove_nested_hashmap_GSA-732 Replace uses of NestedHashMap with NestedIntegerArray.	2013-02-28 05:19:56 -08:00
Eric Banks	12fc198b80	Added better error message for BAMs with bad read groups. * Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header. * Added integration tests to make sure that the correct error is thrown. * Resolved GSA-407.	2013-02-27 16:02:56 -05:00
Eric Banks	45fc0ed261	Merge pull request #74 from broadinstitute/eb_update_rtc_docs_GSA-716 Update docs for RTC.	2013-02-27 11:58:09 -08:00
Eric Banks	d2904cb636	Update docs for RTC.	2013-02-27 14:56:44 -05:00

12000 Commits (7d833256e8d8202118bdb056962c4f7d2265f81a)
