Commit Graph

6850 Commits (5f8264dddb0ded5f0e039bc50c02fb2a2c8cfbb2)

Author SHA1 Message Date
Mark DePristo 5f8264dddb RMS calculation protected against n == 0 bug 2011-08-09 20:45:34 -04:00
Mark DePristo 2af206b0ae .variant not .variants 2011-08-09 19:58:10 -04:00
Mauricio Carneiro 481630da00 BWA parameters added 2011-08-09 17:05:24 -04:00
Mauricio Carneiro 22d2563823 added BWA SW alignment
The pipeline now accepts fasta/fastq files and aligns them using BWA SW, adds default base qualities, creates read groups and performs BQSR.
2011-08-09 17:05:24 -04:00
Mauricio Carneiro bd1cf4c7bc Pacbio Pipeline
Added the base quality "filling" step to allow the pipeline to handle raw pacbio BAM files. This is the first step towards a generic pacbio data processing pipeline.
2011-08-09 17:05:24 -04:00
Eric Banks 489e5cffc1 Missed a few 'variants' 2011-08-09 14:29:15 -04:00
Eric Banks 5a3c99b7b9 Fixing 'variants' change in qscript 2011-08-09 12:30:46 -04:00
Eric Banks c3c9876391 Move reference to .rods away from symlinks in preparation for their going away in GATK 1.2 2011-08-09 12:20:50 -04:00
Eric Banks b20c4d5286 Thanks to Mark for agreeing to transition from 'variants' back to 'variant'. I think I got them all but I've been jumping all around the code, so there might be a straggler or two. 2011-08-09 12:04:55 -04:00
Eric Banks 78aa6db076 added the 'reference' header line too. We are now header-compliant for vcf4.1. 2011-08-09 11:45:54 -04:00
Eric Banks ec76bf6d4a VCF headers now include 'contig' lines describing the name, length, and assembly (when easily parsable) for each contig in the reference. 2011-08-09 11:24:48 -04:00
Eric Banks 7afb5c9f1c More updates to be consistent with the new rod syntax. 2011-08-09 10:11:37 -04:00
Eric Banks 1e490e0dec Bringing up to speed with new syntax 2011-08-09 09:26:06 -04:00
Eric Banks 70b3daf689 VariantsToVCF is up and running again; integration tests are reenabled (and added one for dbSNP). 2011-08-09 03:03:43 -04:00
Khalid Shakir cb28875c2a Updated rod binding syntax usage on CombineVariants from .rodBind to .variants. 2011-08-09 00:46:39 -04:00
Mauricio Carneiro d15852be0a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-09 00:04:59 -04:00
Mauricio Carneiro 2db6225c53 A read filter that sets all mapping qualities to a given value
Pacbio has decided to assign 255 to the MQ of all their reads since they claim their aligner does not produce a number equivalent to a mapping quality. Despite much back and forth, they are dead set on not using this field, so if we want to use their bams, we will need to override that. This filter does just that: it replaces all values with a given one. The default is 60.
2011-08-09 00:04:42 -04:00
David Roazen 2efa376619 Made the necessary changes to get SnpEff support working with the new rodbinding system. 2011-08-08 23:29:39 -04:00
David Roazen b180a1311a Merge branch 'snpEff' 2011-08-08 22:12:14 -04:00
David Roazen 28d8c8fcbc Modified the SnpEff integration test to run on a much smaller interval. 2011-08-08 21:51:16 -04:00
David Roazen a13bc7b929 Added an integration test for the SnpEff annotation support, as well as some extra safety checks and comments. 2011-08-08 20:01:24 -04:00
Mark DePristo 80924d24de Single positional arguments are now treated as names unless they actually match a tribble feature 2011-08-08 19:26:27 -04:00
Mark DePristo f8a56bc64b Merge branch 'master' into rodRefactor 2011-08-08 16:58:18 -04:00
Mark DePristo f8ad91b16f Reverting a bunch of bad -B type drops 2011-08-08 16:57:38 -04:00
David Roazen 5e288136e0 Added unit tests for the SnpEff codec, and made minor adjustments to the codec itself. 2011-08-08 16:51:43 -04:00
Eric Banks f0ee789a99 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-08 16:25:43 -04:00
Eric Banks d7813db217 Combine Variants was actually outputting invalid VCFs in cases where it was combining Variant Contexts with different alternate alleles: if any of the genotypes had PLs they were no longer valid/correct. Added a check for such cases (the combined VC has more alleles than an original VC) and strip out the PLs when triggered; added integration test to cover it. I also added the check to Select Variants, although it currently doesn't remove unused alleles so it should never trigger. Is there any reason not to strip out unused alleles after a select? 2011-08-08 16:25:35 -04:00
Mark DePristo 383bb6f0e0 Merge branch 'master' into rodRefactor 2011-08-08 15:25:55 -04:00
Mark DePristo 4f8fc0f2f1 VCF3 now dynamically determined 2011-08-08 15:05:47 -04:00
Mark DePristo ba7353c561 Updated IntegrationTests to use the new type free format for VCF files 2011-08-08 15:04:38 -04:00
Mark DePristo 0810c42309 GATK now does dynamic type determination for VCF files
Added UnitTests covering all of the cases.
2011-08-08 14:45:46 -04:00
Eric Banks 9756c1d7d7 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-08 14:16:31 -04:00
Eric Banks 221b71c5a4 add 2nd round of counting covariates 2011-08-08 14:16:16 -04:00
Mark DePristo e36994e36b Refactored a FeatureManager class from RMDTrackBuilder
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed.  This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.
2011-08-08 14:04:46 -04:00
Eric Banks 197169e47b Submitting patch from Larry Singh to make MathUtils compatible with java 1.7 2011-08-08 13:34:04 -04:00
David Roazen dd974040af When finding the highest-impact effect at a locus, all effects that are not within a
non-coding gene are now considered higher impact than all effects that are within a
non-coding gene.
2011-08-08 13:29:54 -04:00
David Roazen c1061e994c Initial support for adding genomic annotations through VariantAnnotator using
the output from the SnpEff tool, which replaces the old Genomic Annotator.
2011-08-08 13:29:53 -04:00
Eric Banks 1a0e5ab4ba Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-08 13:08:25 -04:00
Eric Banks a06f341685 qscript to assess BQSR known sets 2011-08-08 13:08:17 -04:00
Ryan Poplin 99e3a72343 Merged bug fix from Stable into Unstable 2011-08-08 12:36:17 -04:00
Ryan Poplin 8072bd9831 Updating resource bundle generation qscript for changeover to git 2011-08-08 12:35:39 -04:00
Mark DePristo 0db79207e8 Refactored the javadoc dependency out of CommandLineGATK
This allows us to run the GATK again in environments where Javadoc is not loaded on the classpath by default
2011-08-08 12:27:13 -04:00
Mauricio Carneiro 0db46d0648 Merged bug fix from Stable into Unstable 2011-08-08 10:50:09 -04:00
Mauricio Carneiro 2fd101135c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-08 10:49:43 -04:00
Mauricio Carneiro 4d6cb33612 removing temporary bam index
The clean bai file was left behind after the data processing pipeline was done
2011-08-08 10:49:28 -04:00
Mark DePristo e5fde0d16b Merge branch 'master' into rodRefactor 2011-08-08 10:08:43 -04:00
Mark DePristo 88061ed5fa rmdir the empty tmp dir if possible 2011-08-08 09:18:20 -04:00
Mark DePristo 526b524c3c CombineVariants with new RodBinding. Bugfix
-- CombineVariants now uses the new RodBinding syntax, -V / --variants.  Passed all integration tests on first run
-- Exposed a gaping bug in the List<RodBinding<T>> system, now fixed.  ParserEngine now has an addRodBinding() that is called by RodBindingArgumentTypeDescriptor when it encounters each RodBinding.  This allows the system to work with collection types that are recursively parsed by the system.
2011-08-07 20:16:51 -04:00
Ryan Poplin 6693407bd8 Merged bug fix from Stable into Unstable 2011-08-07 17:39:03 -04:00
Ryan Poplin 738e94efcb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-07 17:36:45 -04:00