Mauricio Carneiro
b9dab068ee
New version of the pipeline starting from an ALIGNED bam going all the way to reducing using n-way out cleaning
2012-09-26 16:16:53 -04:00
Mauricio Carneiro
f8b954334e
Revised implementation of the RAWBAM => BAM pipeline
...
stripped out all the FQ pipeline and tumor/normal information.
2012-09-26 13:37:15 -04:00
Mauricio Carneiro
c9c2682f86
removing annoying xml from IDEA configuration
2012-09-25 17:18:44 -04:00
Mauricio Carneiro
9486131d17
First implementation of the CMI data processing pipeline, handling both germline and cancer BAM/FQ => BAM.
...
Not ready for prime time yet, need more work!
2012-09-25 17:15:42 -04:00
Mauricio Carneiro
cb8d4c97e1
First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
...
not ready for prime time yet!
2012-09-25 17:13:50 -04:00
Mauricio Carneiro
65b100f9b0
Reverting the DPP to the original version, going to create a new simplified version for CMI in private.
2012-09-25 12:02:34 -04:00
Mauricio Carneiro
4324bd72fd
Updating IntelliJ environment and adding Scala
2012-09-25 10:51:53 -04:00
Mauricio Carneiro
4aad135f8c
Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it)
2012-09-24 17:01:17 -04:00
Mauricio Carneiro
ca84586443
Adding default IntelliJ configuration files
2012-09-24 16:15:57 -04:00
Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 reads and fewer than 250 reads. Only BAMs reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
eb63221875
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-08-30 09:19:35 -04:00
Eric Banks
150a969279
Be careful with String manipulation when constructing alleles in SomaticIndelDetector
2012-08-29 15:13:28 -04:00
Eric Banks
3d476487c6
LIBS is totally busted for deletions. Putting a check in AD for bad pileup event bases so that we don't produce busted alleles. We must fix LIBS ASAP.
2012-08-27 12:13:12 -04:00
Mark DePristo
dcc972a557
Usability cleanup for BQSR
...
-- I'm seeing a lot of people trying to use BinaryTagCovariate in the community. They really shouldn't do this, so I moved it to private.
-- Throw an exception if its required bintag argument is missing
-- Check explicitly if user is requesting DinucCovariate and tell them that it's been retired in favor of ContextCovariate
-- Show the type (Required, Experimental, Standard) of the covariates when running --list
2012-08-25 14:53:00 -04:00
Ryan Poplin
5f8574bd15
Fixing typo in error message.
2012-08-24 10:48:41 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()
2012-08-22 14:39:35 -04:00
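The case-normalization idea in the commit above can be sketched as follows. This is a minimal illustration in Python, not the GATK's Java code; `find_mismatches` and its `normalize_case` flag are hypothetical names introduced only for this example.

```python
def find_mismatches(ref_bases: str, read_bases: str, normalize_case: bool = True):
    """Return (position, ref_base, read_base) tuples where the read differs
    from the reference. Hypothetical helper, not part of the GATK."""
    if normalize_case:
        # Soft-masked (lower-case) reference bases are upper-cased first,
        # so that 'c' vs 'C' is not reported as a mismatch.
        ref_bases = ref_bases.upper()
    return [
        (i, r, b)
        for i, (r, b) in enumerate(zip(ref_bases, read_bases))
        if r != b
    ]
```

Without normalization, a lower-case reference base compared against the same upper-case read base would surface as a spurious event such as g/G, which is the kind of duplicate allele the commit describes.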
Eric Banks
03017855e4
WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test.
2012-08-22 00:24:01 -04:00
Eric Banks
40d5efc804
Fix for Adam K's reported bug: we weren't handling reads that were entirely insertions properly in LIBS. Specifically, the event bases were off-by-one (which was disastrous in Adam's case with a 1bp read). Added a unit test to cover this case.
2012-08-20 23:12:41 -04:00
Eric Banks
5b1781fdac
Merge remote-tracking branch 'unstable/master'
2012-08-20 21:18:54 -04:00
Ryan Poplin
5db3bd6fd2
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 15:28:57 -04:00
Ryan Poplin
464d49509a
Pulling out common caller arguments into its own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller.
2012-08-20 15:28:39 -04:00
Eric Banks
4450d66c64
Fixing the docs for DP and AD
2012-08-20 15:10:24 -04:00
Ryan Poplin
c67d708c51
Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.
2012-08-20 13:41:08 -04:00
Eric Banks
154f65e0de
Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.
2012-08-20 12:43:17 -04:00
Eric Banks
97b191f578
Thanks to Guillermo I was able to isolate an instance of where the MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification.
2012-08-20 01:16:23 -04:00
Mark DePristo
7fa76f719b
Print "Parsing data stream with BCF version BCFx.y" in BCF2 codec as .debug not .info
2012-08-19 10:32:55 -04:00
Mark DePristo
9121b98167
CombineVariants outputs the first non-MISSING qual, not the maximum
...
-- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. The previous behavior was to take the max QUAL, which sometimes caused strange downstream confusion.
2012-08-19 10:29:38 -04:00
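The QUAL selection rule described in the commit above can be sketched like this. It is a minimal Python illustration under the assumption that a MISSING QUAL is represented as `None`; `first_non_missing_qual` is a hypothetical name, not a CombineVariants function.

```python
from typing import Optional, Sequence


def first_non_missing_qual(quals: Sequence[Optional[float]]) -> Optional[float]:
    """Return the first QUAL that is not missing (None), in record order.
    Returns None when every record has a missing QUAL."""
    for qual in quals:
        if qual is not None:
            return qual
    return None
```

For records with QUALs `[None, 30.0, 99.0]` this picks 30.0, whereas the previous max-based behavior would have picked 99.0.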
David Roazen
342a5b68ed
Bring bamboo performance test runner script under version control
2012-08-18 21:08:29 -04:00
Mark DePristo
d3206e35e0
Cleanup and expansion of GATKPerformanceOverTime
...
-- Does BQSR parallelism test
-- Does CountLoci parallelism test
-- Updated R script
2012-08-18 18:47:26 -04:00
Mauricio Carneiro
d16cb68539
Updated and more thorough version of the BadCigar read filter
...
* No reads with Hard/Soft clips in the middle of the cigar
* No reads starting with deletions (with or without preceding clips)
* No reads ending in deletions (with or without follow-up clips)
* No reads that are fully hard or soft clipped
* No reads that have consecutive indels in the cigar (II, DD, ID or DI)
Also added systematic test for good cigars and iterative test for bad cigars.
2012-08-17 17:05:27 -04:00
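The five rules listed in the BadCigar commit above can be sketched as a single predicate. This is a minimal Python illustration that works on CIGAR strings, not the GATK's Java read filter; `parse_cigar_ops` and `is_bad_cigar` are hypothetical names, and treating an unparseable CIGAR as an error is an assumption of this sketch.

```python
import re

CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
CLIPS = {"H", "S"}
INDELS = {"I", "D"}


def parse_cigar_ops(cigar: str):
    """Return the list of operator characters in a CIGAR string."""
    elements = CIGAR_ELEMENT.findall(cigar)
    if not elements or "".join(n + op for n, op in elements) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return [op for _, op in elements]


def is_bad_cigar(cigar: str) -> bool:
    ops = parse_cigar_ops(cigar)
    # Peel off leading and trailing hard/soft clips.
    start, end = 0, len(ops)
    while start < end and ops[start] in CLIPS:
        start += 1
    while end > start and ops[end - 1] in CLIPS:
        end -= 1
    core = ops[start:end]
    if not core:
        return True  # read is fully hard or soft clipped
    if any(op in CLIPS for op in core):
        return True  # clip in the middle of the cigar
    if core[0] == "D" or core[-1] == "D":
        return True  # starts or ends with a deletion, ignoring clips
    # Consecutive indels: II, DD, ID or DI.
    return any(a in INDELS and b in INDELS for a, b in zip(core, core[1:]))
```

Under these rules `5S10M5S` passes, while `5M2S5M`, `5S2D10M`, `10M2D5H`, `10S` and `5M2I2D5M` are all rejected.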
Mark DePristo
980685af16
Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users.
...
-- Removed REFERENCE_BASES option. You only have REFERENCE now. There's no efficiency savings for the REFERENCE_BASES option any longer, since the reference bases are loaded lazily, so if you don't use them there's effectively no cost to making the RefContext that could load them.
2012-08-17 14:55:38 -04:00
Eric Banks
2676b7fc2e
Put in a sanity check that MLEAC <= AN
2012-08-17 11:49:53 -04:00
Mark DePristo
0a706c9105
Add support for CombineVariants nt option in GATKPerformanceOverTime
...
-- Also includes some nicer PDF formatting
2012-08-17 11:49:02 -04:00
Mark DePristo
bf6c0aaa57
Fix for missing formatter in R 2.15
...
-- VariantCallQC now works on newest ESP call set
2012-08-17 11:49:02 -04:00
Mark DePristo
daa26cc64e
Print to logger not to System.out in CachingIndexFastaSequenceFile when profiling cache performance
2012-08-17 11:49:02 -04:00
Mark DePristo
be0f8beebb
Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in.
...
-- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference. If you do, the file can be .gz. If not, the GATK will fail on creating the FAI and DICT files. Added an error message that handles this case and clearly says what to do.
2012-08-17 11:49:02 -04:00
Mark DePristo
a3d2764d11
Fixed: GSA-392 @arguments with just a short name get the wrong argument bindings
...
-- Now blows up if an argument begins with -. Implementation isn't pretty, as it actually blows up during Queue extension creation with a somewhat obscure error message, but at least it's something.
2012-08-17 11:49:01 -04:00
Mark DePristo
4c0f198d48
Potential fix for GSA-484: Incomplete writing of temp BCF when running CombineVariants in parallel
...
-- Keep reading from BCF2 input stream when read(byte[]) returns < number of needed bytes
-- It's possible (I think) that the failure in GSA-484 is due to multi-threaded writing/reading of BCF2 records where the underlying stream is not yet flushed, so read(byte[]) returns a partial result. Now loops until we get all of the needed bytes or EOF is encountered
2012-08-17 11:49:01 -04:00
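The keep-reading-until-complete pattern described in the commit above can be sketched as follows. This is a minimal Python illustration, not the BCF2 codec's Java code; `read_fully` and the `TrickleStream` test double, which simulates a stream that returns short reads, are hypothetical names introduced for this example.

```python
import io


class TrickleStream:
    """Test double: a stream whose read() returns at most max_chunk bytes,
    mimicking a partially flushed underlying stream."""

    def __init__(self, data: bytes, max_chunk: int):
        self._buf = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, n: int) -> bytes:
        return self._buf.read(min(n, self._max_chunk))


def read_fully(stream, n: int) -> bytes:
    """Keep calling read() until n bytes have arrived or EOF is reached.
    The result is shorter than n only if EOF came first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:  # empty result signals EOF
            break
        buf.extend(chunk)
    return bytes(buf)
```

A single `read(8)` on `TrickleStream(b"abcdefgh", 3)` yields only `b"abc"`, while `read_fully` on the same stream returns all eight bytes. The caller can still detect a truncated record by comparing the returned length against `n`.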
Mark DePristo
de3be45806
Proper function call in BCF2Decoder to validateReadBytes
2012-08-17 11:49:01 -04:00
Mark DePristo
67ebd65512
Bugfix for potential SEGFAULT with JNA getting execution hosts for LSF with multiple hosts
2012-08-17 11:49:01 -04:00
Mark DePristo
54e7302daf
Improvements to GATKPerformanceOverTime
...
-- CombineVariants parallelism test
-- Easy way to ask for specific runs with enum argument
-- Update for R to handle new outputs
2012-08-17 11:49:01 -04:00
Eric Banks
53383e82ec
Hmm, not good. Fixing the math in PBT resulted in changed MD5s for integration tests that look like significant changes. I am reverting and will report this to Laurent.
2012-08-16 21:41:18 -04:00
Eric Banks
65c594afff
Better error message for reads that begin/end with a deletion in LIBS
2012-08-16 21:27:07 -04:00
Mark DePristo
6a2862e8bc
GSA-483: Bug in GATKdocs for Enums
...
-- Fixed to no longer show constants in enums as constant values in the gatkdocs
2012-08-16 16:24:17 -04:00
Eric Banks
3253fc216b
FindBugs 'Maintainability' fixes
2012-08-16 15:53:06 -04:00
Eric Banks
05cbf1c8c0
FindBugs 'Efficiency' fixes
2012-08-16 15:40:52 -04:00
Mark DePristo
d8071c66ed
Removing SlowGenotype object from GATK
2012-08-16 15:23:06 -04:00
Eric Banks
a22e7a5358
Should've run 'ant clean' instead of just 'ant'. In any event, these are 2 cases where we are setting a class's internal static variable directly. Very dangerous.
2012-08-16 15:07:32 -04:00
Eric Banks
47b4f7b7e5
One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze.
2012-08-16 14:59:05 -04:00
Eric Banks
ded0e11b45
Killing off some FindBugs 'Reliability' issues
2012-08-16 14:00:48 -04:00