Commit Graph

10312 Commits (b9dab068eebd962a7af3fd73a18430c118cfb9fd)

Author SHA1 Message Date
Mauricio Carneiro b9dab068ee New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning 2012-09-26 16:16:53 -04:00
Mauricio Carneiro f8b954334e Revised implementation of the RAWBAM => BAM pipeline
stripped out all the FQ pipeline and tumor/normal information.
2012-09-26 13:37:15 -04:00
Mauricio Carneiro c9c2682f86 removing annoying xml from IDEA configuration 2012-09-25 17:18:44 -04:00
Mauricio Carneiro 9486131d17 First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
Not ready for prime time yet, need more work!
2012-09-25 17:15:42 -04:00
Mauricio Carneiro cb8d4c97e1 First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-09-25 17:13:50 -04:00
Mauricio Carneiro 65b100f9b0 Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-09-25 12:02:34 -04:00
Mauricio Carneiro 4324bd72fd Updating IntelliJ environment and adding Scala 2012-09-25 10:51:53 -04:00
Mauricio Carneiro 4aad135f8c Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it) 2012-09-24 17:01:17 -04:00
Mauricio Carneiro ca84586443 Adding default intellij configuration files 2012-09-24 16:15:57 -04:00
Mauricio Carneiro 7cf9911924 Fixed ReduceReads bug where variant regions were missing.
This affected variant regions with more than 100 reads and less than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin eb63221875 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-08-30 09:19:35 -04:00
Eric Banks 150a969279 Be careful with String manipulation when constructing alleles in SomaticIndelDetector 2012-08-29 15:13:28 -04:00
Eric Banks 3d476487c6 LIBS is totally busted for deletions. Putting a check in AD for bad pileup event bases so that we don't produce busted alleles. We must fix LIBS ASAP. 2012-08-27 12:13:12 -04:00
Mark DePristo dcc972a557 Usability cleanup for BQSR
-- I'm seeing a lot of people trying to use BinaryTagCovariate in the community.  They really shouldn't do this, so I moved it to private.
-- Throw an exception if its required bintag argument is missing
-- Check explicitly if user is requesting DinucCovariate and tell them that it's been retired in favor of ContextCovariate
-- Show the type (Required, Experimental, Standard) of the covariates when running --list
2012-08-25 14:53:00 -04:00
Ryan Poplin 5f8574bd15 Fixing typo in error message. 2012-08-24 10:48:41 -04:00
Ryan Poplin e5cfdb4811 Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt() 2012-08-22 14:39:35 -04:00
Eric Banks 03017855e4 WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test. 2012-08-22 00:24:01 -04:00
Eric Banks 40d5efc804 Fix for Adam K's reported bug: we weren't handling reads that were entirely insertions properly in LIBS. Specifically, the event bases were off-by-one (which was disastrous in Adam's case with a 1bp read). Added a unit test to cover this case. 2012-08-20 23:12:41 -04:00
Eric Banks 5b1781fdac Merge remote-tracking branch 'unstable/master' 2012-08-20 21:18:54 -04:00
Ryan Poplin 5db3bd6fd2 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-20 15:28:57 -04:00
Ryan Poplin 464d49509a Pulling out common caller arguments into its own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller. 2012-08-20 15:28:39 -04:00
Eric Banks 4450d66c64 Fixing the docs for DP and AD 2012-08-20 15:10:24 -04:00
Ryan Poplin c67d708c51 Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them. 2012-08-20 13:41:08 -04:00
Eric Banks 154f65e0de Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons. 2012-08-20 12:43:17 -04:00
Eric Banks 97b191f578 Thanks to Guillermo I was able to isolate an instance of where the MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification. 2012-08-20 01:16:23 -04:00
Mark DePristo 7fa76f719b Print "Parsing data stream with BCF version BCFx.y" in BCF2 codec as .debug not .info 2012-08-19 10:32:55 -04:00
Mark DePristo 9121b98167 CombineVariants outputs the first non-MISSING qual, not the maximum
-- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value.  The previous behavior was to take the max QUAL, which sometimes resulted in strange downstream confusion.
2012-08-19 10:29:38 -04:00
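The QUAL selection rule described in the CombineVariants commit above can be illustrated with a small sketch. This is not the GATK code: the class and method names are hypothetical, and a missing QUAL is modelled here as a null Double purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not GATK code. A null entry stands in for a MISSING QUAL.
final class CombinedQualSketch {

    /** New behavior: QUAL of the first record whose QUAL is not missing, or null if all are missing. */
    public static Double firstNonMissingQual(List<Double> quals) {
        for (Double qual : quals) {
            if (qual != null) {
                return qual;
            }
        }
        return null;
    }

    /** Previous behavior: the maximum non-missing QUAL, or null if all are missing. */
    public static Double maxQual(List<Double> quals) {
        Double best = null;
        for (Double qual : quals) {
            if (qual != null && (best == null || qual > best)) {
                best = qual;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three records at one site: the first has no QUAL.
        List<Double> quals = Arrays.asList(null, 30.0, 50.0);
        System.out.println("first non-missing: " + firstNonMissingQual(quals));
        System.out.println("max (old rule):    " + maxQual(quals));
    }
}
```

With the first rule the combined QUAL depends on input order rather than on the largest value, which is the behavioral change the commit message describes.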
David Roazen 342a5b68ed Bring bamboo performance test runner script under version control 2012-08-18 21:08:29 -04:00
Mark DePristo d3206e35e0 Cleanup and expansion of GATKPerformanceOfTime
-- Does BQSR parallelism test
-- Does CountLoci parallelism test
-- Updated R script
2012-08-18 18:47:26 -04:00
Mauricio Carneiro d16cb68539 Updated and more thorough version of the BadCigar read filter
* No reads with Hard/Soft clips in the middle of the cigar
   * No reads starting with deletions (with or without preceding clips)
   * No reads ending in deletions (with or without follow-up clips)
   * No reads that are fully hard or soft clipped
   * No reads that have consecutive indels in the cigar (II, DD, ID or DI)

 Also added systematic test for good cigars and iterative test for bad cigars.
2012-08-17 17:05:27 -04:00
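The BadCigar rules listed in the commit above can be sketched as a check on a CIGAR string. This is a minimal illustration under stated assumptions, not the GATK filter: the class name and method are hypothetical, the CIGAR is taken as a plain string rather than a parsed read, and an unavailable CIGAR ("*") is simply rejected as malformed input.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the rules listed in the commit message; not the GATK implementation.
final class BadCigarSketch {
    // One or more <length><operator> pairs, using the SAM CIGAR operators.
    private static final Pattern WELL_FORMED = Pattern.compile("(\\d+[MIDNSHP=X])+");
    // Two adjacent indel operators: II, DD, ID or DI.
    private static final Pattern CONSECUTIVE_INDELS = Pattern.compile("[ID][ID]");

    /** Returns true if the CIGAR violates any of the rules from the commit message. */
    public static boolean isBadCigar(String cigar) {
        if (!WELL_FORMED.matcher(cigar).matches()) {
            throw new IllegalArgumentException("Not a well-formed CIGAR: " + cigar);
        }
        // Keep only the operator letters, e.g. "5S10M2D" -> "SMD".
        String ops = cigar.replaceAll("\\d+", "");
        // Drop leading and trailing clips; what remains is the core of the alignment.
        String core = ops.replaceAll("^[HS]+", "").replaceAll("[HS]+$", "");

        if (core.isEmpty()) {
            return true; // fully hard or soft clipped
        }
        if (core.indexOf('H') >= 0 || core.indexOf('S') >= 0) {
            return true; // clip in the middle of the cigar
        }
        if (core.charAt(0) == 'D' || core.charAt(core.length() - 1) == 'D') {
            return true; // starts or ends with a deletion, with or without flanking clips
        }
        return CONSECUTIVE_INDELS.matcher(core).find(); // II, DD, ID or DI
    }

    public static void main(String[] args) {
        String[] examples = {"10M", "5S10M5S", "5M2S5M", "2S3D5M", "5M3D2H", "10S", "5M2I3D5M"};
        for (String cigar : examples) {
            System.out.println(cigar + " bad? " + isBadCigar(cigar));
        }
    }
}
```

Stripping the flanking clips first lets the "with or without preceding clips" and "with or without follow-up clips" cases share one check on the remaining core.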
Mark DePristo 980685af16 Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users.
-- Removed REFERENCE_BASES option.  You only have REFERENCE now.  There's no efficiency savings for the REFERENCE_BASES option any longer, since the reference bases are loaded lazily, so if you don't use them there's effectively no cost to making the RefContext that could load them.
2012-08-17 14:55:38 -04:00
Eric Banks 2676b7fc2e Put in a sanity check that MLEAC <= AN 2012-08-17 11:49:53 -04:00
Mark DePristo 0a706c9105 Add support for CombineVariants nt option in GATKPerformanceOverTime
-- Also includes some nicer PDF formatting
2012-08-17 11:49:02 -04:00
Mark DePristo bf6c0aaa57 Fix for missing formatter in R 2.15
-- VariantCallQC now works on newest ESP call set
2012-08-17 11:49:02 -04:00
Mark DePristo daa26cc64e Print to logger not to System.out in CachingIndexFastaSequenceFile when profiling cache performance 2012-08-17 11:49:02 -04:00
Mark DePristo be0f8beebb Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in.
-- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference.  If you do, the file can be .gz.  If not, the GATK will fail on creating the FAI and DICT files.  Added an error message that handles this case and clearly says what to do.
2012-08-17 11:49:02 -04:00
Mark DePristo a3d2764d11 Fixed: GSA-392 @arguments with just a short name get the wrong argument bindings
-- Now blows up if an argument begins with -.  Implementation isn't pretty, as it actually blows up during Queue extension creation with a somewhat obscure error message, but at least it's something.
2012-08-17 11:49:01 -04:00
Mark DePristo 4c0f198d48 Potential fix for GSA-484: Incomplete writing of temp BCF when running CombineVariants in parallel
-- Keep reading from BCF2 input stream when read(byte[]) returns < number of needed bytes
-- It's possible (I think) that the failure in GSA-484 is due to multi-threaded writing/reading of BCF2 records where the underlying stream is not yet flushed so read(byte[]) returns a partial result.  Now loops until we get all of the needed bytes or EOF is encountered
2012-08-17 11:49:01 -04:00
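The read loop described in the GSA-484 commit above relies on the fact that InputStream.read may return fewer bytes than requested. A minimal sketch of such a loop follows; the class and method names are hypothetical and this is not the BCF2 codec code. The trickle helper exists only to simulate a stream that hands back partial results.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a "keep reading until full or EOF" loop; not the BCF2 codec code.
final class ReadFullySketch {

    /**
     * Fills the buffer from the stream, looping over partial reads.
     * Returns the number of bytes actually read, which is less than
     * buffer.length only if EOF was reached first.
     */
    public static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // EOF before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    /** Test helper: wraps data in a stream that returns at most one byte per read call. */
    public static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] needed = new byte[4];
        // A single read call on the trickle stream yields one byte; the loop collects all four.
        int got = readFully(trickle(new byte[] {1, 2, 3, 4, 5}), needed);
        System.out.println("bytes read: " + got);
    }
}
```

A caller would compare the returned count with the number of needed bytes to tell a complete record from a truncated one.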
Mark DePristo de3be45806 Proper function call in BCF2Decoder to validateReadBytes 2012-08-17 11:49:01 -04:00
Mark DePristo 67ebd65512 Bugfix for potential SEGFAULT with JNA getting execution hosts for LSF with multiple hosts 2012-08-17 11:49:01 -04:00
Mark DePristo 54e7302daf Improvements to GATKPerformanceOverTime
-- CombineVariants parallelism test
-- Easy way to ask for specific runs with enum argument
-- Update for R to handle new outputs
2012-08-17 11:49:01 -04:00
Eric Banks 53383e82ec Hmm, not good. Fixing the math in PBT resulted in changed MD5s for integration tests that look like significant changes. I am reverting and will report this to Laurent. 2012-08-16 21:41:18 -04:00
Eric Banks 65c594afff Better error message for reads that begin/end with a deletion in LIBS 2012-08-16 21:27:07 -04:00
Mark DePristo 6a2862e8bc GSA-483: Bug in GATKdocs for Enums
-- Fixed to no longer show constants in enums as constant values in the gatkdocs
2012-08-16 16:24:17 -04:00
Eric Banks 3253fc216b FindBugs 'Maintainability' fixes 2012-08-16 15:53:06 -04:00
Eric Banks 05cbf1c8c0 FindBugs 'Efficiency' fixes 2012-08-16 15:40:52 -04:00
Mark DePristo d8071c66ed Removing SlowGenotype object from GATK 2012-08-16 15:23:06 -04:00
Eric Banks a22e7a5358 Should've run 'ant clean' instead of just 'ant'. In any event, these are 2 cases where we are setting a class's internal static variable directly. Very dangerous. 2012-08-16 15:07:32 -04:00
Eric Banks 47b4f7b7e5 One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze. 2012-08-16 14:59:05 -04:00
Eric Banks ded0e11b45 Killing off some FindBugs 'Reliability' issues 2012-08-16 14:00:48 -04:00