3441 Commits (f978c25b9daec76bddc7e89b43c03f85aef8306b)

Author SHA1 Message Date
ebanks 61d511f601 Small memory performance improvement: remove the mapping from the hash instead of setting the value to null (i.e. remove the key too)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4256 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:19:09 +00:00
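The distinction this commit relies on can be illustrated with a plain `java.util.HashMap`. This is a minimal sketch and not the GATK code itself; the class name, the keys, and the cached values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveVersusNull {
    // Hypothetical cache keyed by read name; not taken from the GATK source.
    public static Map<String, int[]> newCache() {
        Map<String, int[]> cache = new HashMap<>();
        cache.put("read1", new int[] {1, 2, 3});
        cache.put("read2", new int[] {4, 5, 6});
        return cache;
    }

    // Setting the value to null releases the payload but keeps the entry and its key.
    public static int sizeAfterNulling(String key) {
        Map<String, int[]> cache = newCache();
        cache.put(key, null);
        return cache.size();
    }

    // Removing the mapping releases the entry, the key, and the value.
    public static int sizeAfterRemoving(String key) {
        Map<String, int[]> cache = newCache();
        cache.remove(key);
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("after put(key, null): " + sizeAfterNulling("read1"));
        System.out.println("after remove(key):    " + sizeAfterRemoving("read1"));
    }
}
```

With many keys, the entries left behind by `put(key, null)` add up, which is presumably the saving the commit message refers to.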
ebanks a0231f073f Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
hanna 87aca64716 Jumped the gun a bit on bam on-the-fly indexing -- Tim says it's not ready yet.
Turned it off by default and added a property to turn it back on.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4254 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 21:16:03 +00:00
rpoplin 7b113a4886 Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
depristo 8f1a32acae All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
rpoplin 61e848c4f0 It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
depristo 1de713f354 Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 23:21:17 +00:00
aaron f5c295b6b2 add a little bit of documentation to the RMD track builder and wrap any exceptions thrown in tribble with the file source and line that caused the error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4243 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 17:56:36 +00:00
rpoplin aeb897db7f VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
depristo 6a30617a60 Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses (UserError.MalFormedBam, for example) should be used when the GATK detects errors on the part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replaced some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
depristo ca9c7389ee Not useful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:33:03 +00:00
depristo 8708753a6a checkin for removal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:32:46 +00:00
hanna 5119bdb55e - Update DoC to support output to /dev/null.
- Add a release sanity check for DoC.
- Update release sanity checks with new command-line argument system.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4236 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 23:43:18 +00:00
fromer 1b1ec7e52d Changed default phasing window size to 10
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 21:28:36 +00:00
fromer ce031b2f05 PhasingEvaluator prints out interesting sites (only 1 phased, or phases disagree)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4233 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:21:21 +00:00
ebanks 40283f6456 Success! TranscriptToGenomicInfo now works without the delicate hacks that Ben had put in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4232 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:06:00 +00:00
ebanks cd091d7309 This walker can NOT be tree-reducible (in its current state). Given that it's meant to be run just once for any given transcript set, this is not at all a problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4231 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:47:51 +00:00
ebanks ae9cba1c73 After an epic battle with this code until 3am last night, I have discovered that it is tragically and fatally busted. Ben clearly didn't understand how the ROD system works when writing it and so it is unusable in its current state. I've ripped out all code and it now gracefully exits telling the user that we are actively working on a replacement for this tool. Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4230 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 16:39:41 +00:00
ebanks 29f7b1e6d6 Trivial update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4229 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 14:02:38 +00:00
ebanks cd2bfb09ef Change for Tim: invalidate the MD tag (temporarily) if it exists in a read that gets realigned
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4228 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 13:59:09 +00:00
ebanks 65edbced36 Addition for Tim: recalculate the NM and UQ tags after realignment. Also, don't fix the insert size calculation, since that's done by fix mate information.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4227 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 04:02:14 +00:00
chartl 71046e650e Added a more robust check for Jishu -- am pretty sure the .bam header is busticated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4223 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 01:11:22 +00:00
fromer ae3f7026a4 Corrected phasing quality evaluation to correctly account for hom sites that break phase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4222 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 22:43:54 +00:00
hanna 501f6a0e14 Temporary hack to disable index creation when target BAM is /dev/null. Tim
promises me that Picard will put in a real solution next week.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4220 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 16:57:51 +00:00
fromer 754c2c761e Added minimum phasing quality for phasing evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4219 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 14:29:11 +00:00
ebanks 5d0d9c7dce My parallel version of TranscriptToInfo now emits 'chr start end' instead of 'chr:start-end' for records so that 1) they can be easily sorted in coordinate order (allowing me to emit records out of order if I choose) and 2) the file can be tabix indexed (when we stop finding 'critical' bugs in that code).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4218 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-07 05:20:40 +00:00
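To illustrate why separate 'chr start end' columns are easier to sort than a single 'chr:start-end' token, here is a small sketch. The record layout (whitespace-delimited, three leading columns) and the class name `CoordinateSort` are assumptions for illustration, and chromosome names are compared as plain strings, which is not necessarily the ordering the GATK uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CoordinateSort {
    // Order by chromosome name (lexicographic, an assumption), then numeric start, then numeric end.
    static final Comparator<String[]> BY_COORDINATE =
            Comparator.<String[], String>comparing(f -> f[0])
                    .thenComparingInt(f -> Integer.parseInt(f[1]))
                    .thenComparingInt(f -> Integer.parseInt(f[2]));

    // Split each record into fields, sort by coordinate, and re-join with tabs.
    public static List<String> sort(List<String> records) {
        List<String[]> split = new ArrayList<>();
        for (String record : records) {
            split.add(record.trim().split("\\s+"));
        }
        split.sort(BY_COORDINATE);
        List<String> sorted = new ArrayList<>();
        for (String[] fields : split) {
            sorted.add(String.join("\t", fields));
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("chr1 200 300", "chr1 50 60", "chr1 1000 1100");
        for (String line : sort(records)) {
            System.out.println(line);
        }
    }
}
```

Because start and end are their own fields, they can be compared as integers. A plain string sort on 'chr1:1000-1100' and 'chr1:200-300' would place position 1000 before position 200.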
ebanks 4d4ef5b42c In the end, it's not worth rewriting TranscriptToInfo from scratch. I'm keeping the old one around for a bit so I can play with this new version which 1. doesn't store the records in memory so can be run in under 1Gb of memory, 2. actually emits all of the records (the original fails in some cases), and 3. is refactored to cut out ~20% of the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4215 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-06 02:37:34 +00:00
kiran 0dd5a0990d Now annotates sites marked as filtered out (this is important if sites are in a lower-quality tranche).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4214 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-04 00:36:55 +00:00
delangel ef7454a241 Minor improvements to indel genotyper:
a) Ability to specify haplotype size from command line
b) Expand reference context  window so we can form haplotypes for longer indel events.
c) small bug fix in temp output writer (to be removed once I can emit vcfs)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4212 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 22:52:08 +00:00
depristo 7eeabe534a QSample walker for 1KG -- measures aggregate quality of sequencing. Includes misc. improvements throughout the code, including using the new Tribble GenotypeLikelihoods class for working with VCF GLs from the UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4211 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 18:21:43 +00:00
rpoplin e3962c0d13 VR integration tests are longer but much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4210 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 15:50:19 +00:00
hanna da11efa1a2 Automatically write BAM file indices for coordinate-sorted BAMs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4209 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 14:10:44 +00:00
fromer 529eecd4dc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4208 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:38:46 +00:00
fromer c0ce9ca8cc Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4207 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:32:30 +00:00
rpoplin 60003aeaca Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4206 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:31:49 +00:00
fromer c119f64514 Added phasing sub-directory to keep walkers directory clean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4205 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 13:24:18 +00:00
depristo 3c9597d45a OnTraversalDone writes output to out now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4203 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:55:03 +00:00
depristo 73d41bfa24 CountLoci now writes out to a file for Queue status tracking. VariantAnnotatorEngine has a special group None that doesn't add any annotations; useful for those who are testing UG performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4202 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 12:52:33 +00:00
hanna 70bb480939 The battle is over. Picard is revved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4200 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 05:28:01 +00:00
ebanks fdaac4aa78 As the VCF guru, I'll take this one for Andrey. Someone has actually found a deletion at the beginning of the chromosome. Instead of failing with an ArrayIndexOutOfBoundsException, just don't try to print out the record. Our VCF writer doesn't really support this case (yet).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4199 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:27:43 +00:00
ebanks c45ffcdaed Changing documentation (temporarily) to warn people that -U is not supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4198 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 03:18:07 +00:00
delangel 8a7f5aba4b First more or less sort of functional framework for statistical Indel error caller. Current implementation computes Pr(read|haplotype) based on Dindel's error model. A simple walker that takes an existing vcf, generates haplotypes around calls and computes genotype likelihoods is used to test this as first example. No attempt yet to use prior information on indel AF, nor to use multi-sample caller abilities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4197 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-03 00:25:34 +00:00
fromer a1cf3398a5 Added basic version of phasing evaluation: GenotypePhasingEvaluator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4196 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 22:09:50 +00:00
kshakir fd5970fdd4 At chartl's superb suggestion, command line files are now all Files instead of old method of sometimes "has a File". Should be easier when reassigning them.
No longer generating deprecated GATK arguments on the Queue extensions.
Emitting deprecation warnings to Queue compile to help debugging issues.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4195 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:30:48 +00:00
rpoplin 0bb05fb472 Bug fix in VariantRecalibrator. Only add sample names from the input rod bindings, not from all rod bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4194 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 21:12:09 +00:00
chartl 3a4844ebde Additional partition types into DepthOfCoverage:
- Sequencing Center
- Platform
- Sample by Center
- Sample by Platform
- Sample by Platform by Center <---- needed for analysis I'm doing

The fact that the latter three needed their own partition types, rather than being dictatable from the command line, combined with the new hierarchical traversal types, and new output formatting engine, suggest that DepthOfCoverageV3 is about ready to be retired in favor of a newer, sleeker version.

For now, this will do.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4193 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 19:30:03 +00:00
chartl 590bb50d16 Test for missing read group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4192 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-02 14:22:13 +00:00
kiran acd6bd2430 Experimental tool to annotate indels that are provided in a VCF file based on RefGene. Specifies gene, transcript, strand, type (Non-frameshift, frameshift, 5'-UTR, 3'-UTR, SpliceSiteDisruption, Intron, or Unknown).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4191 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 23:30:28 +00:00
hanna dc5f858d29 Replaced placeholder support for splitting by read group with real support (sorry everyone), and added relatively comprehensive unit tests to ensure that splitting by read group works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4190 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:24:50 +00:00
rpoplin b28f63a948 Base recalibrator now uses -o and deprecates -outputBam
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4189 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 22:13:50 +00:00
depristo 995cfe34fe You can have an error so early that some engine fields are uninitialized. Commit protects RunReport from these errors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4185 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 19:00:25 +00:00
rpoplin a975db2c2e Bug fix for the case of reads with no read bases!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4184 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 16:58:54 +00:00
rpoplin 469bbaa240 Added more integration tests for the variant quality score recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4181 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-01 15:31:24 +00:00
rpoplin 5b94c926c8 More precise language.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4178 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:44:22 +00:00
rpoplin 96040726ac Better exception text for the common error of providing only dbsnp but giving dbsnp sites zero clustering weight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4177 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 21:36:43 +00:00
depristo 32c6b48106 Proper memory metrics in the file. Please use -et if at all possible
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4175 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 20:30:09 +00:00
chartl 63c7cbd89b Forgot to commit this long ago, change so the tables are correctly propagated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4174 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 19:06:52 +00:00
aaron db4ff7317f allowing empty RMD files (we need to not validate their sequence dictionaries against the reference in this case)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4173 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 17:45:33 +00:00
ebanks 3d6c4fc55f Removing the obsolete --hapmap and --hapmap_chip options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4172 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:57:05 +00:00
depristo b33873206a GATKRunReport now has an ID (random 32 char string) that uniquely identifies the JOB run and can be used to find a run in the run repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4171 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 16:18:57 +00:00
ebanks 3c956110f3 Fixing up the VCFWriter storage code: instead of assuming all samples are coming from the input bam file (they're not), just use the original VCF header for writing the temporary thread files. Now parallelization in e.g. the Genomic Annotator works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4168 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-31 02:16:07 +00:00
aaron 69d92fab4f adding the ability to get iterators from Tribble without having an index, and updating the Tabix code to the latest Samtools SVN version (this still doesn't fix the outstanding tabix bugs, waiting for Heng on that).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4167 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:49:23 +00:00
fromer 50f7f18cbd Changed ReadBackedPhasing default PQ threshold to 10
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4166 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 21:26:15 +00:00
chartl e64d1be475 Check if VC is null before trying to subset it (can happen with indels)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4165 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-30 20:43:37 +00:00
depristo 1ddb5d17c9 hostname now fully qualified and working
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4163 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 17:04:37 +00:00
depristo 4c28fc3a39 Clear documentation for GATKRunReport
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4161 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 15:59:25 +00:00
kiran 16b75e3b9a A new version of the ErrorRateByReadPosition walker, using the GATKReport functionality to store and emit its output. This version of the walker is roughly half the number of lines as the previous version, owing simply to the removal of all of the output formatting that's now handled by GATKReport.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4160 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:41:13 +00:00
kiran fd19c63aaf A data structure that allows data to be collected over the course of a walker's computation, then have that data written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module).
This object is designed to be both the structure that holds data during the execution of the walker, as well as the object that properly formats and emits the data so that it can be easily loaded into R.  In the end, you get a table that looks like this:

##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7         qualavg.61PA8.7
0      0.007451835696110506      25.474613284804366
1      0.002362777171937477      29.844949954504095
2      9.087604507451836E-4      32.87590975254731
3      5.452562704471102E-4      34.498999090081895
4      9.087604507451836E-4      35.14831665150137
5      5.452562704471102E-4      36.07223435225619
6      5.452562704471102E-4      36.1217248908297
7      5.452562704471102E-4      36.1910480349345
8      5.452562704471102E-4      36.00345705967977
...

A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession.  Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone.  This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect.

The display property of individual columns can be turned off.  This is useful when a column is used to store intermediate results, necessary for the computation of some later value, but the contents of the intermediate column itself are not required in the final output file.

Finally, the GATKReportTable allows for some simple, mathematical, element-wise and column-wise operations.  For instance, two whole columns can be divided, the results of the operation being stored in a third column.  This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 05:39:24 +00:00
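The column-wise arithmetic described in this commit can be pictured with a small sketch. This is not the GATKReportTable API: `SimpleTable`, `addColumn`, `divideColumns`, and `get` are hypothetical names, and the column names and values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleTable {
    // Columns are kept in insertion order; every column holds one value per row.
    private final Map<String, double[]> columns = new LinkedHashMap<>();

    public void addColumn(String name, double[] values) {
        columns.put(name, values.clone());
    }

    // Element-wise division of two whole columns, stored in a third column,
    // so the caller does not have to write the loop.
    public void divideColumns(String result, String numerator, String denominator) {
        double[] n = columns.get(numerator);
        double[] d = columns.get(denominator);
        if (n == null || d == null) {
            throw new IllegalArgumentException("unknown column");
        }
        if (n.length != d.length) {
            throw new IllegalArgumentException("column lengths differ");
        }
        double[] out = new double[n.length];
        for (int i = 0; i < n.length; i++) {
            out[i] = n[i] / d[i];
        }
        columns.put(result, out);
    }

    public double get(String column, int row) {
        return columns.get(column)[row];
    }

    public static void main(String[] args) {
        SimpleTable table = new SimpleTable();
        table.addColumn("mismatches", new double[] {2, 1, 4});
        table.addColumn("bases", new double[] {4, 4, 8});
        table.divideColumns("errorrate", "mismatches", "bases");
        for (int row = 0; row < 3; row++) {
            System.out.println(row + "\t" + table.get("errorrate", row));
        }
    }
}
```

In this sketch the intermediate columns (`mismatches`, `bases`) are the kind of data whose display the commit says can be turned off, leaving only the derived column in the output.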
ebanks df76474b34 Proper filtering when indels are being lifted over
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4158 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-29 04:48:31 +00:00
depristo 3fd2392090 Improved interface to getting command line options. Now fully traverses all objects to get all internal argument collections. Preliminary (but disabled version) of phoning home (see -et argument for more information). Captures correct and erroring out runs and writes out gzipped, xml report with lots of useful information. Needs a bit more information but is approximately working. Reports going to /humgen/gsa-hpprojects/GATK/reports/ in submitted directory that will be collated by some external tool. Only operating if -et STANDARD or -et STDOUT are provided currently and REPORT_DIR contains a file called ENABLE. WalkerTest now adds -et NO_ET to tests to avoid populating the reports with tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4155 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-28 22:53:32 +00:00
rpoplin 9c3f403307 Add the calculated lod value to the info field of each recalibrated VCF record.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4153 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 21:33:58 +00:00
delangel fe19539188 Small bug fix: if a read falls at the edge of an indel event (but is not part of it), don't count it towards consistency computation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4152 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 20:37:27 +00:00
rpoplin 54355b1864 In the variant quality score recalibrator, preserve the definition of known and novel as presence in dbSNP or not, even when training with 1KG project calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4151 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:07:59 +00:00
ebanks 7a5f297083 actually modify the vcf when a sample has been down-sampled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4150 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 19:03:21 +00:00
ebanks 9860db64a3 Fix up liftover to enable lifting over indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4148 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:55:27 +00:00
hanna fb177c4fee If only dcov is specified, assume that selected downsample type is BY_SAMPLE.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4147 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 17:35:41 +00:00
ebanks 9584cbc05e UG now downsamples to 250x by default
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4146 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:53:15 +00:00
ebanks 431392330e Re-enable the max records in ram argument, which I accidentally removed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4145 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:42:49 +00:00
hanna de5ccfb0b1 Moved hasPileupBeenDownsampled() based on Eric's request. Also eliminated
@Deprecated constructors from AlignmentContext.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4142 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 16:12:05 +00:00
ebanks 427a2f85e9 The Indel Realigner now lets the engine do all of the setup for args affecting the SAM writer. Thanks, Matt!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4141 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:19:47 +00:00
asivache a3d9d23b0f Now prints het genotype with GQ=0 for each indel; in two-sample (normal-tumor) mode, prints both genotypes (N and T) as hets for germline events or hom ref for N and het for T for somatic events (all genotypes still have GQ=0)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4140 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:06:42 +00:00
ebanks dda84a0e54 Re-enabling indels for the Genomic Annotator as per Steve's patch. Steve assures me that he will test this out really well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4139 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 15:01:25 +00:00
hanna 6f4af47aac setMaxRecordsInRam now a member of StingSAMFileWriter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4138 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 14:50:41 +00:00
ebanks bfcac33e80 Cleaning up playground utils and tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:25:47 +00:00
ebanks 4979dcc9a7 Finishing up the playground cleanup (for now)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4135 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:19:37 +00:00
ebanks 0452b1ab68 archiving, removing, or promoting to core from playground
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4134 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-27 01:07:42 +00:00
hanna d773b3264b Eliminated -mrl option.
Eliminated -fmq0 option.
Eliminated read group hallucination.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4133 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 21:38:03 +00:00
depristo f384d4a5d6 A java reimplementation of vcf2table in python; supports getting more useful information about genotypes (HET, e.g.) than was possible in python.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4130 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 17:50:33 +00:00
asivache 1e193e4c20 printing '\n' at the end of line leads to some aesthetic advantages
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4129 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:29:42 +00:00
asivache 9b3ffa5f64 Now outputs VCF (as standard output associated with -o)! Can also output, in parallel, a lightweight bed and fully annotated .txt (old verbose format) with --bed and --verbose, respectively
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4128 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 16:26:03 +00:00
ebanks dfae48cee0 Moving supported tools to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4127 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 13:56:19 +00:00
ebanks 45d895dcf4 Remove the check in the Unified Genotyper for hitting the max reads at locus value. Instead, simply add a flag to the INFO field if any of the samples has been downsampled. 95% hooked up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4126 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:50:47 +00:00
ebanks e06b2c90ef Cap the default size of join tables; this can be modified with the --maxJoinTableSize argument. Also, misc cleanup of the comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4125 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 05:21:26 +00:00
ebanks 79cd716671 More cleanup of the Genomic Annotator. Also, we now require join tables to have unique entries for the column keyed on the join.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4124 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-26 04:43:52 +00:00
kshakir 0105e8d063 Updated Queue GATK generation to reflect -B and -I changes.
To add support for "-I:tumor tumor.bam", the GATK argument
import_file (-I) is now generated as a List of NamedFile objects.
Could not get sugar working 100%.  To activate sugar import the
gatk package.  This effectively adds a new method to java.io.File
called toNamedFile.  When adding a file to the list call
  countReads.import_file :+= myJavaFile.toNamedFile
See scala/qscript/examples for actual examples.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4122 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:17:36 +00:00
hanna bdb3a7ebe6 The tagger was automatically combining identical tags, but this is a problem
for the ROD system.  Eliminate tag combine operation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4121 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 22:01:32 +00:00
fromer 39da567d48 Changed ReadBackedPhasing to be a RodWalker (corrected to By(READS))
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4120 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:53:04 +00:00
ebanks 4678613893 Significant fixes for the Genomic Annotator.
1. Rip out all of Ben's code intended to circumvent the stable VCF Writer output system in multi-threaded mode (I threw up a little when 
I saw this code).  This will improve memory consumption when running with -nt.
2. Don't annotate indels or > bi-allelic sites.
3. Fix bug where not all records were making it into the output VCF.
4. General code clean up.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4118 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 20:16:50 +00:00
fromer 41e53d37e1 Changed ReadBackedPhasing to be a RodWalker (more efficient, since it is ROD-focused)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4117 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 19:43:57 +00:00
rpoplin ac58eb3cbb Slightly better error message for the common error of only providing a dbsnp track but giving it zero clustering weight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4114 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:41:21 +00:00
rpoplin 5623e01602 GenerateVariantClusters and VariantRecalibrator now use hapmap and 1kg ROD bindings (in addition to dbsnp) to distinguish between knowns and novels. They no longer look at by-hapmap validation status, so providing hapmap is highly recommended. Example on the wiki. Input variant tracks now must start with input.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4113 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:33:40 +00:00
hanna bf0b6bd486 Update integration tests to use the new ROD syntax.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
asivache 14198b74d5 Can now compute av. qualities and stddevs per cycle for both original (when present in bam) and recalibrated quals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4111 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 17:14:58 +00:00
asivache 23dbaa68e6 Can design assays when multiple (distinct) events occur at the same locus (one assay per event)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4110 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 16:52:47 +00:00
ebanks b4baa3eb8f Cleanup. INDELS model is now disconnected (and renamed 'DINDEL' in preparation for adding plumbing for Guillermo soon)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4106 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 14:52:51 +00:00
hanna 3dc78855fd Command-line argument tagging is in, and the ROD system is hacked slightly to support the new syntax
(-B:name,type file) as well as the old syntax.  Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 03:47:57 +00:00
fromer aa8cf25d08 Implemented fully symmetric sliding window read-backed phaser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4104 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 21:12:32 +00:00
ebanks cba5f05538 Small fixes for consistency in the numbers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4103 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:48:25 +00:00
rpoplin 7bbd67f3c4 Fixing stray comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4102 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:19:39 +00:00
rpoplin 85007ffa87 Some clean up for the variant recalibrator. Now uses @Input and @Output so that it can join the Queue party. Users now specify a -o, -clusterFile, -tranchesFile, and -reportDatFile. Example on the wiki. ApplyVariantCuts now has an integration test. Base quality recalibrator now requires a dbsnp rod or vcf file. Now that the base quality recalibrator is using @Output the PrintStream shouldn't be closed in OnTraversalDone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4101 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 20:14:58 +00:00
delangel f2b138d975 Small refactoring: make Haplotype a public class since it will be soon extended and shared with other callers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4100 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 17:52:36 +00:00
ebanks 43f1fb2380 Okay, finally done with VCF compression. Now:
1. Uses blocked gzip compression.
2. No more -bzip option available (since we can't compress to stdout).
3. The only file extensions that trigger compression are .gz and .gzip.
4. No more need for CompressedVCFWriter.java



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4099 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 16:36:54 +00:00
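A minimal sketch of the suffix rule described in 43f1fb2380 (together with the toLowerCase() fix in 25fb53e7a2), written in Python for illustration only. The function name and the tuple of suffixes are hypothetical; the actual GATK code is Java and is not shown in this log.

```python
# Hypothetical helper: decide whether an output VCF should be block-gzipped,
# based only on the file suffix. The suffix list follows the commit message.
COMPRESSED_SUFFIXES = (".gz", ".gzip")

def should_compress(path: str) -> bool:
    """Return True when the (case-insensitive) file suffix requests compression."""
    # Lowercase first so that "OUT.VCF.GZ" is treated like "out.vcf.gz".
    return path.lower().endswith(COMPRESSED_SUFFIXES)
```

Under these assumptions, should_compress("calls.vcf.GZ") is true, while "calls.vcf" and "calls.vcf.bz2" are left uncompressed.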
ebanks 25fb53e7a2 Oops, forgot to call toLowerCase().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4097 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:43:24 +00:00
ebanks 7957b60768 We now automatically compress the output VCF if the file suffix is one of the supported types (.gz, .bz, .bz2). You can still specify -bzip if you want to use another file suffix (or pipe it to stdout for some reason).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4096 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 14:39:59 +00:00
rpoplin 7a8b6b87da Committing Michael Yourshaw's patch for AnalyzeCovariates. We spawn each RScript process and wait for it to finish in series. Thanks Michael!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4095 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 13:06:25 +00:00
ebanks 9fb151f417 Minor update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4094 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 05:17:10 +00:00
ebanks 44f3c5639a I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
We can now emit bzipped VCFs from the GATK.

Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument.  When set, it will emit a VCF that is compressed with bzip2.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
hanna 691333f75c Force isRequired() to be false for @Deprecated args.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4092 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:50:30 +00:00
hanna 5d6a6420a9 New behavior for filling in output streams: if required==true for a field and the field
is an output stream, we'll automatically create it and point it to stdout.  Otherwise, 
we'll leave it empty.  
I think about it like this: marking a field 'required' indicates to the GATK that the 
walker author requires a value for this field, and if the GATK can provide one without 
end user intervention, it will.  Maybe this is hackish.  We'll try it and see.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4091 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:39:13 +00:00
ebanks 90aef66ec5 Minor fixes for my last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4090 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 23:25:29 +00:00
ebanks ef795825fd Yet more argument consistency updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4089 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 20:52:30 +00:00
aaron 7474afa7a3 allow other objects access to the static method that resolves bam lists, and some renaming and improved documentation for the function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4087 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:52:00 +00:00
ebanks ccda4f6ec1 More output consistency changes (updating wiki docs as I go along).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4086 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:46:08 +00:00
ebanks c9c6ff49c2 Deprecated 'O' in favor of 'o' in the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4085 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 18:09:24 +00:00
ebanks 55a8306a0d Update the @RMD tags to look for VariantContext.class instead of ReferenceOrderedDatum.class. Since the test for rod type is broken this won't affect anything right now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4084 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:49:37 +00:00
aaron 35b9883dd6 vcfwriter is in tribble now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4083 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 17:01:04 +00:00
aaron 2d3b6d89dc adding the ability in Tribble to create indexes from a stream of features, so that we can create multiple indexes from one pass of the file. In the GATK we now create multiple indexes, and choose the
most appropriate based on feature density, and the longest feature in the file.  Also:

- Converted Tribble to TestNG; it has better features and is about 6x faster.
- As much code clean-up as I could get done.  More to do, especially in the example code.
- Moved asserts in the code to throw exceptions.
- Added getBinSize to the index interface; both indexes already implemented this.
- Removed the abstract parts of the indexCreator interface; this is now more simple.
- Added an IndexType enumeration; might be overkill but it is at least a single point of entry for index information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4082 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 06:54:59 +00:00
kiran 295472bf69 Simple change to handle a no-call (must avoid asking for the second allele, which will be null in this case). Also, added a hack to deal with input VCFs where there are no genotype likelihoods (needed in order to process Hapmap and 1KG VCFs). In this mode, called genotypes are assigned a likelihood of 0.96, and alternative genotypes are given 0.02 each. I know Beagle actually takes genotype data without likelihoods, so this might not be the right way to do this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4081 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:13:09 +00:00
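The fallback in 295472bf69 can be sketched as follows. This is an illustration in Python, not the committed Java; the function name is hypothetical, and it assumes a diploid, bi-allelic site with three possible genotypes, so that 0.96 + 0.02 + 0.02 sums to 1.

```python
from typing import List

def fallback_likelihoods(called_index: int,
                         n_genotypes: int = 3,
                         called: float = 0.96,
                         other: float = 0.02) -> List[float]:
    """Build likelihoods for a VCF record that carries a genotype call only.

    The called genotype gets `called`; every alternative gets `other`.
    The defaults come from the commit message; n_genotypes=3 is an assumption.
    """
    if not 0 <= called_index < n_genotypes:
        raise ValueError("called_index out of range")
    return [called if i == called_index else other for i in range(n_genotypes)]
```

For a heterozygous call at index 1, this yields 0.02, 0.96, 0.02 for the three genotypes.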
kiran dec713a184 Simple test code from Steve Schaffner to compute R^2 and D'. This is just for educational purposes. Don't use this code for anything, ever!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4080 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-23 05:06:16 +00:00
hanna c177801d81 Add deprecated command-line arguments, and switched over UG to output to
-o/--out instead of -varout.  Let's watch as our intrepid support engineer
gracefully responds to all the incoming questions of the form: "the GATK told
me to use -o instead of -varout.  What do I do?"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4078 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 21:01:44 +00:00
hanna b80cf7d1d9 Modifications to the output system for better interaction with @Output. Multiplexed arguments. More details in the Monday meeting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4077 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 14:27:05 +00:00
ebanks 30a104228a Don't require entropy reduction when cleaning only at known sites; instead we need to trust the known indels. This will improve consistency between lane-level and aggregated cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4076 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-22 02:44:38 +00:00
depristo b6989289fc Potential bug fix for bad references where some codons may have Ns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4075 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 12:09:33 +00:00
kiran 121b4f23b6 Simple change to allow a list of samples or regular expressions to be provided in a text file (one line per sample).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4074 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-21 00:01:48 +00:00
ebanks 165dc6d3b0 Ryan, what did you decide about supporting this tool? Is it still useful?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4073 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 19:16:14 +00:00
ebanks 2ef2f1b24a Fix UG's simple indel calculation model so that deletions are created correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4072 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-20 15:35:47 +00:00
fromer 1c4784999a Updated to work exclusively in log10 space
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4069 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 21:31:07 +00:00
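Commit 1c4784999a does not say which computations moved to log10 space. As a general illustration of why one would do this, the Python sketch below adds and normalizes probabilities that are stored as log10 values without ever converting them back to linear space, which avoids underflow for very small probabilities. Both function names are hypothetical.

```python
import math
from typing import List, Sequence

def log10_sum(log10_values: Sequence[float]) -> float:
    """Return log10(sum(10**v for v in log10_values)) without underflow."""
    m = max(log10_values)
    if m == float("-inf"):
        return m  # every term is probability zero
    # Factor out the largest term so each exponent is <= 0.
    return m + math.log10(sum(10.0 ** (v - m) for v in log10_values))

def log10_normalize(log10_values: Sequence[float]) -> List[float]:
    """Rescale log10 values so that the underlying probabilities sum to 1."""
    total = log10_sum(log10_values)
    return [v - total for v in log10_values]
```

With inputs such as -1000 and -1000, the linear-space values would underflow to zero, but log10_sum still returns approximately -999.699.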
fromer 3af4e618cc Fixed precision issues with PQ (phasing quality)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4068 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:34:47 +00:00
kshakir 88ca1fb22c Lazy loading reflections so Queue can hack the classpath before the PluginManager looks for classes.
Removed extra quotes from 'cd' pre-exec command.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4067 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:29:52 +00:00
aaron 63ada20da5 allow RefSeq files to optionally contain the header line, which is the default output from the UCSC table browser
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4065 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 20:25:37 +00:00
fromer effeedf1a3 Updated Bayesian phasing method to output per-site phasing statistics (and to not cap PQ at 40)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4064 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:55:47 +00:00
aaron 04e5b28f6d updates for VCF; we can no longer cache genotypes or alleles in a static array, since this is bad for shared-memory parallel runs. One instance per codec was better for performance than using ThreadLocal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4063 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:34:44 +00:00
corin 8054b6b295 Changing a name of a column for variantevals output for easier reading by R--let me know if this needs to be updated elsewhere; it's just a space to an underscore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4062 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 19:18:16 +00:00
ebanks 4b94f8c21b Silly me, I forgot to check for the contig boundaries. Thank goodness for performance tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4061 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 18:40:26 +00:00
aaron f16bb1e830 fix for a bug in package utils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4060 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 15:01:50 +00:00
fromer 15c5aa6e48 Efficient iteration over all possible combinations of variable assignments, for variables of arbitrary cardinalities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4059 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 14:14:37 +00:00
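One common way to iterate over all assignments of variables with arbitrary cardinalities, as described in 15c5aa6e48, is a mixed-radix "odometer" counter. The Python sketch below shows that technique under the assumption that each variable's values are indexed 0..cardinality-1; it is not the committed implementation, and the function name is hypothetical.

```python
from typing import Iterator, List, Sequence

def iter_assignments(cardinalities: Sequence[int]) -> Iterator[List[int]]:
    """Yield every assignment as a list of value indices.

    The last variable varies fastest, like the rightmost odometer wheel.
    """
    if any(c <= 0 for c in cardinalities):
        return  # a variable with no values admits no assignments
    current = [0] * len(cardinalities)
    while True:
        yield list(current)  # copy, so callers may keep the result
        # Bump the last position and carry leftwards on overflow.
        pos = len(cardinalities) - 1
        while pos >= 0:
            current[pos] += 1
            if current[pos] < cardinalities[pos]:
                break
            current[pos] = 0
            pos -= 1
        if pos < 0:
            return  # carried past the first variable: all assignments seen
```

Only one assignment is held in memory at a time, so the cost per step is small even when the total number of combinations (the product of the cardinalities) is large.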
ebanks 1ec305cd15 Fix for running the cleaner at the lane-level for known indels only: instead of relying on the reads to get the reference sequence, we now use an IndexedFastaSequenceFile in all cases and pad the reference with bases on either end. This allows us to deal with cases in which we are trying to clean just a single deletion-containing read with tiny LOD (so the read needs to be pushed off the seen reference; @Reference doesn't yet work for Read Walkers) and has the added benefit of allowing us now to get much larger known indels that aren't completely covered with reads.
Thanks to Matt for the advice.

Also, for Guillermo: while I was at it, I changed the .stats debug output to emit the original interval instead of the cleaned region.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4058 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 11:31:13 +00:00
ebanks 98f7679619 Fixed the bug reported on GS regarding a clipped read that got moved several hundred bases away. The code that got triggered here was written back in the original version of the cleaner and it never actually did the right thing.
While I was fixing it, I noticed that we weren't allowing the cleaner to un-clean reads with indels when they're wrong even though we should.  Hypothetically, that should rarely happen: only when we can left-align out an indel or when the original mapper really went haywire.  This situation is rare enough that I'm calling logger.info to let the user know it's happening and suggesting that they double-check that everything looks right with their reads.  Better to be extra-cautious now that the cleaner is moving into the 1kg and Broad production pipelines soon.
Mark, have no fear: this was truly a rare edge case - one that won't affect the cleaning stats.  There is no need to re-clean the data processing paper bams!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4057 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 01:42:48 +00:00
aaron 3dc4d3c3a9 removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
fromer 1336ea17a3 quality-score-based Bayesian phasing algorithm implemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4055 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-18 21:17:46 +00:00