4179 Commits (4d4ef5b42c0f722aa0ac44fe01ac0b153bf37cff)

Author SHA1 Message Date
ebanks 4d4ef5b42c In the end, it's not worth rewriting TranscriptToInfo from scratch. I'm keeping the old one around for a bit so I can play with this new version which 1. doesn't store the records in memory so can be run in under 1Gb of memory, 2. actually emits all of the records (the original fails in some cases), and 3. is refactored to cut out ~20% of the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4215 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 02:37:34 +00:00
kiran 0dd5a0990d Now annotates sites marked as filtered out (this is important if sites are in a lower-quality tranche).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4214 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-04 00:36:55 +00:00
kiran e9af893bf4 Write headers that are VCF4.0 compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4213 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 23:46:27 +00:00
delangel ef7454a241 Minor improvements to indel genotyper:
a) Ability to specify haplotype size from command line
b) Expand reference context window so we can form haplotypes for longer indel events.
c) Small bug fix in temp output writer (to be removed once I can emit vcfs)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4212 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 22:52:08 +00:00
depristo 7eeabe534a QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
rpoplin e3962c0d13 VR integration tests are longer but much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
hanna da11efa1a2 Automatically write BAM file indices for coordinate-sorted BAMs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4209 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 14:10:44 +00:00
fromer 529eecd4dc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer c0ce9ca8cc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
rpoplin 60003aeaca Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4206 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:31:49 +00:00
fromer c119f64514 Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
fromer fc13191352 Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4204 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:23:24 +00:00
depristo 3c9597d45a OnTraversalDone writes output to out now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4203 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:55:03 +00:00
depristo 73d41bfa24 CountLoci now writes out to a file for Queue status tracking. VariantAnnotatorEngine has a special group None that doesn't add any annotations; useful for those who are testing UG performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4202 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:52:33 +00:00
ebanks b59d62927e Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:51:53 +00:00
hanna 70bb480939 The battle is over. Picard is revved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
ebanks fdaac4aa78 As the VCF guru, I'll take this one for Andrey. Someone has actually found a deletion at the beginning of the chromosome. Instead of failing with an ArrayIndexOutOfBoundsException, just don't try to print out the record. Our VCF writer doesn't really support this case (yet).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4199 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:27:43 +00:00
ebanks c45ffcdaed Changing documentation (temporarily) to warn people that -U is not supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4198 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:18:07 +00:00
delangel 8a7f5aba4b First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer a1cf3398a5 Added basic version of phasing evaluation: GenotypePhasingEvaluator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir fd5970fdd4 At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings during the Queue compile to help with debugging issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin 0bb05fb472 Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl 3a4844ebde Additional partition types into DepthOfCoverage:
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing

The fact that the latter three needed their own partition types, rather than being specifiable from the command line, combined with the new hierarchical traversal types and the new output formatting engine, suggests that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.

For now, this will do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
chartl 590bb50d16 Test for missing read group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran acd6bd2430 Experimental tool that annotates indels provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna dc5f858d29 Replaced placeholder support for splitting by read group with real support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin b28f63a948 Base recalibrator now uses -o and deprecates -outputBam
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir 33400074fa Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
depristo 924e16f4f0 More robust analysis tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4187 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 20:24:04 +00:00
depristo ca503e5801 Queue scripts for recalibration and running nSample UG jobs pre and dynamic merging
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4186 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 20:23:37 +00:00
depristo 995cfe34fe You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin a975db2c2e Bug fix for the case of reads with no read bases!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
depristo 0c54bf4195 Better reporting and now with a special mode for listing exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4183 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:19:51 +00:00
corin cdad243645 updated version of the DPR. Now produces part of the tearsheet as well as good depth of coverage figures
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4182 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:38:58 +00:00
rpoplin 469bbaa240 Added more integration tests for the variant quality score recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
depristo fc5caa98a5 Improved reporting now with metrics by day/week/etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4180 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 02:43:13 +00:00
depristo 8c4009ee18 Oops, don't enable reporting in integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4179 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 22:56:18 +00:00
rpoplin 5b94c926c8 More precise language.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin 96040726ac Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo 8683087756 Suppl. tools for working with and displaying GATK run reports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4176 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:32:22 +00:00
depristo 32c6b48106 Proper memory metrics in the file. Please use -et if at all possible
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl 63c7cbd89b Forgot to commit this long ago, change so the tables are correctly propagated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron db4ff7317f allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks 3d6c4fc55f Removing the obsolete --hapmap and --hapmap_chip options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo b33873206a GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
chartl 5e710050d6 minor change, bamFiles comes from the input list, not the script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4170 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:03:35 +00:00
chartl 1a14dbee1e Adding in .bam indexing; commit for Khalid
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4169 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 15:21:41 +00:00
ebanks 3c956110f3 Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron 69d92fab4f adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer 50f7f18cbd Changed ReadBackedPhasing default PQ threshold to 10
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00