Commit Graph

4548 Commits (cbce3e3c83c72a8c7dff7b8fec00f00a2b419e83)

Author SHA1 Message Date
hanna 6c27a9bfc6 - Include the fasta index builder in the package.
- Unit/integration  test against the jars built to the dist directory, so that 
  the tests reveal classes that don't exist in the final jars.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4234 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 20:37:54 +00:00
fromer ce031b2f05 PhasingEvaluator prints out interesting sites (only 1 phased, or phases disagree)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4233 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:21:21 +00:00
ebanks 40283f6456 Success! TranscriptToGenomicInfo now works without the delicate hacks that Ben had put in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4232 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:06:00 +00:00
ebanks cd091d7309 This walker can NOT be tree-reducible (in its current state). Given that it's meant to be run just once for any given transcript set, this is not at all a problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4231 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:47:51 +00:00
ebanks ae9cba1c73 After an epic battle with this code until 3am last night, I have discovered that it is tragically and fatally busted. Ben clearly didn't understand how the ROD system works when writing it and so it is unusable in its current state. I've ripped out all code and it now gracefully exits telling the user that we are actively working on a replacement for this tool. Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4230 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:39:41 +00:00
ebanks 29f7b1e6d6 Trivial update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4229 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 14:02:38 +00:00
ebanks cd2bfb09ef Change for Tim: invalidate the MD tag (temporarily) if it exists in a read that gets realigned
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4228 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 13:59:09 +00:00
ebanks 65edbced36 Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
kiran 84ddadca64 One more fix: exclude input VCF file's directory name from the output file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4226 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 03:09:08 +00:00
depristo 594fb4a547 More plots in report
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4225 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 02:56:51 +00:00
depristo c524a9ec83 Minor usability improvements to reporting tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4224 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 02:56:25 +00:00
chartl 71046e650e Added a more robust check for Jishu -- am pretty sure the .bam header is busticated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4223 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 01:11:22 +00:00
fromer ae3f7026a4 Corrected phasing quality evaluation to correctly account for hom sites that break phase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4222 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 22:43:54 +00:00
kiran bd878565f6 Change for Firehose: infers output path from the input VCF, so that we don't have to change a whole bunch of stuff so that Firehose knows where to expect the output file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4221 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 21:07:46 +00:00
hanna 501f6a0e14 Temporary hack to disable index creation when target BAM is /dev/null. Tim
promises me that Picard will put in a real solution next week.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4220 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 16:57:51 +00:00
fromer 754c2c761e Added minimum phasing quality for phasing evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4219 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 14:29:11 +00:00
ebanks 5d0d9c7dce My parallel version of TranscriptToInfo now emits 'chr start end' instead of 'chr:start-end' for records so that 1) they can be easily sorted in coordinate order (allowing me to emit records out of order if I choose) and 2) the file can be tabix indexed (when we stop finding 'critical' bugs in that code).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4218 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 05:20:40 +00:00
hanna fe2a0bb3a6 Fix for DoC issue with multiplexer -- will retire use of multiplexer when
GATK reporting structure comes online.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4217 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 00:44:07 +00:00
kiran 19e22cfa87 Fixed a bug where the script looked for the wrong column name. Also, all results are now returned in a single plot.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4216 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 14:19:57 +00:00
ebanks 4d4ef5b42c In the end, it's not worth rewriting TranscriptToInfo from scratch. I'm keeping the old one around for a bit so I can play with this new version which 1. doesn't store the records in memory so can be run in under 1Gb of memory, 2. actually emits all of the records (the original fails in some cases), and 3. is refactored to cut out ~20% of the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4215 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 02:37:34 +00:00
kiran 0dd5a0990d Now annotates sites marked as filtered out (this is important if sites are in a lower-quality tranche).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4214 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-04 00:36:55 +00:00
kiran e9af893bf4 Write headers that are VCF4.0 compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4213 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 23:46:27 +00:00
delangel ef7454a241 Minor improvements to indel genotyper:
a) Ability to specify haplotype size from command line
b) Expand reference context  window so we can form haplotypes for longer indel events.
c) small bug fix in temp output writer (to be removed once I can emit vcfs)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4212 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 22:52:08 +00:00
depristo 7eeabe534a QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughtout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
rpoplin e3962c0d13 VR integration tests are longer but much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
hanna da11efa1a2 Automatically write BAM file indices for coordinate-sorted BAMs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4209 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 14:10:44 +00:00
fromer 529eecd4dc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer c0ce9ca8cc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
rpoplin 60003aeaca Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4206 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:31:49 +00:00
fromer c119f64514 Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
fromer fc13191352 Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4204 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:23:24 +00:00
depristo 3c9597d45a OnTraversalDone writes output to out now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4203 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:55:03 +00:00
depristo 73d41bfa24 CountLoci nows writes out to a file for Queue status tracking. VariantAnnotatorEngine has a special group None that doesn't add any annotations; useful for those who are testing UG performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4202 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:52:33 +00:00
ebanks b59d62927e Fix busted performance test (-outputBam has been deprecated in the BQ recalibrator in favor of -o)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4201 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:51:53 +00:00
hanna 70bb480939 The battle is over. Picard is revved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
ebanks fdaac4aa78 As the VCF guru, I'll take this one for Andrey. Someone has actually found a deletion at the beginning of the chromosome. Instead of failing with an ArrayIndexOutOfBoundsException, just don't try to print out the record. Our VCF writer doesn't really support this case (yet).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4199 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:27:43 +00:00
ebanks c45ffcdaed Changing documentation (temporarily) to warn people that -U is not supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4198 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:18:07 +00:00
delangel 8a7f5aba4b First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer a1cf3398a5 Added basic version of phasing evaluation: GenotypePhasingEvaluator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir fd5970fdd4 At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin 0bb05fb472 Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl 3a4844ebde Additional partition types into DepthOfCoverage:
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing

The fact that the latter three needed their own partition types, rather than being dictatable from the command line, combined with the new hierarchical traversal types, and new output formatting engine, suggest that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.

For now, this will do.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
chartl 590bb50d16 Test for missing read group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran acd6bd2430 Experimental tool to annotates indels that are provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna dc5f858d29 Replaced placeholder support for splitting by read group with read support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin b28f63a948 Base recalibrator now uses -o and deprecates -outputBam
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
kshakir 33400074fa Updated tribble BED parsing code to use the official UCSC spec, and updated tests to match expected results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4188 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 21:49:06 +00:00
depristo 924e16f4f0 More robust analysis tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4187 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 20:24:04 +00:00
depristo ca503e5801 Queue scripts for recalibration and running nSample UG jobs pre and dynamic merging
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4186 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 20:23:37 +00:00
depristo 995cfe34fe You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00