Commit Graph

3553 Commits (b5d2e299d21c4e595caf8fe829582cf33733c445)

Author SHA1 Message Date
rpoplin b5d2e299d2 Make it more clear what is going on with the by-hapmap validation status in the dbSNP rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4289 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 17:29:31 +00:00
rpoplin 0a06fbdb94 Adding header lines to output of VR walkers to settle validator warnings. Command lines are added to the VCF header. GATK version numbers will be added to the header lines by Matt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4288 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:45:03 +00:00
depristo 41fa323e63 Added iterator for tribble, fixing GS bug report. Removed unnecessary tabix double wrapping. Intergation tests to ensure the BTI works with both vcfs and vcf.gz
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4287 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 16:38:04 +00:00
asivache d7b5baf8e5 Now uses tagging of -I arguments. Multiple -I options (merging) is now allowed. In somatic mode 'tumor' and 'normal' tags are required for each input bam, the order does not matter anymore (since we use tags!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4286 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 13:58:51 +00:00
bthomas e5f81d25d4 Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource. 

This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit. 

In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.

Note that this also adds a new dependency on SnakeYaml, a YAML parser.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks dd23f204ab Making the UG args that allow users to proceed with insufficient bam headers (no SM or PL tags) @Hidden; removed them from wiki.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4283 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:54:50 +00:00
ebanks 514b28210e Have VF write to sdout when no -o is supplied
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4282 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:48:33 +00:00
ebanks 1901e3208e Oops, ran integration tests before Guillermo committed his change to the Beagle code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4281 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 01:41:02 +00:00
ebanks 4e83ba411f We now do lazy loading for the genotype data in VCF. Practically, almost all walkers end of loading the genotype data because we need to be smarter about transfering the unparsed genotype string when modifying VariantContexts; however, this does solve the problem for VR's piece to generate clusters (shaved off 75% of runtime for Ryan's large case). That further optimization will happen later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4279 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 00:18:17 +00:00
depristo 74d4f124b1 Bug fixes to allow us to generate GATKRunReports for very early errors that leave the engine in a corrupt state. Vastly better error handling of common command line problems. Analysis output now notes whether an exception is a a UserException or a StingException
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4278 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 22:45:15 +00:00
delangel 2be5e862f1 forgot to commit change to MD5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4277 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:28:03 +00:00
delangel 6d07181dc9 When processing Beagle output and creating new vcf, output the filtered records in the original input vcf as is, so that we don't lose the information on them when we run Beagle.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4276 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 19:18:45 +00:00
hanna 7fa6b2135b Added a back door so that integration tests can reset the sequence dictionary
in the reference.  Reset routine is not accessible to any class outside
GenomeLocParser's package.

We'll have to do something more intelligent with this when the GATK goes
distributed.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4275 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 18:58:08 +00:00
depristo dbb641280e CycleCovariate now tolerates SOLEXA as machine type. Also, exception handling is now written to stderr.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4274 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 12:35:57 +00:00
ebanks 71d2d69b41 Better error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4273 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-14 05:04:26 +00:00
fromer 248cc308b2 ReadBackedPhasing silently ignores sites with ploidy != 2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4272 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 21:14:17 +00:00
fromer 528f6344af Moved ReadBackedPhasingWalker to phasing sub-directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4271 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 19:36:41 +00:00
depristo fa3be2209f Improvements to the error display code to print out the SVN number in all messages. Fixes to CallableLoci and tests to check for that case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4270 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-13 18:36:45 +00:00
depristo 4d0ff336c2 Missed update input
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4269 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:46:13 +00:00
depristo 7880863eb7 Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 15:07:38 +00:00
depristo 7ad8fbdd5a Moved GATKException to exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:47:19 +00:00
depristo 1876c9856a Moved stingexception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4265 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:39:22 +00:00
depristo bccebf8899 Newly placed StingException
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4264 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:46 +00:00
depristo 3964e02fb6 Newly placed StingException
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4263 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:38:32 +00:00
depristo 595907e98e Moving StingException
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:34:15 +00:00
depristo 40e6179911 Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
delangel da2e879bbc Miscellaneous improvements to indel genotyper:
- Add a simple calculation model for Pr(R|H) that doesn't rely on Dindel's HMM model. MUCH faster, at a cost of slightly worse performance since we're more sensitive to bad reads coming from sequencing artifacts (add -simple to command line to activate).
- Add debug option to calculation model so that we can optionally output useful info on current read being evaluated. (add -debugout to commandline).
- Small performance improvement: instead of evaluating haplotype to the right of indel (just with a 5 base addition to the left), it seems better to center the indel and to add context evenly around event.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4257 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 13:50:28 +00:00
ebanks 61d511f601 Small memory performance improvement: remove the mapping from the hash instead of setting the value to null (i.e. remove the key too)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4256 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:19:09 +00:00
ebanks a0231f073f Damnit. Enabling the Picard code to recalculate all of the relevant SAMRecord attribute tags means that I need to have reference bases over all read bases even after realignment (and there are some big indels in dbsnp). Fortunately, I have my trusty IndexedFastaSequenceFile reader handy! Re-enabling the previously broken performance test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4255 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 05:06:37 +00:00
hanna 87aca64716 Jumped the gun a bit on bam on-the-fly indexing -- Tim says it's not ready yet.
Turned it off by default and added a property to turn it back on.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4254 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 21:16:03 +00:00
rpoplin 7b113a4886 Truncate the floating point numbers coming out of the variant recalibration walkers. Integration tests now work with both 1.6.0_16-b01 and 1.6.0_21-b06
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4253 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 18:37:49 +00:00
depristo 8f1a32acae All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 15:25:30 +00:00
aaron cf33614ddc remove the test that's failing the performance tests, please don't release until this is figured out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4251 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:30:40 +00:00
aaron 4adb07683d all fixed..thanks Matt!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4250 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 06:18:59 +00:00
aaron bd4bc84abd comment out the broken aligner test again - I'll take a crack at fixing it tomorrow. Each software engineer is going to take a pass at fixing it, and we'll see who can do it with the most style.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4249 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 05:22:24 +00:00
rpoplin 61e848c4f0 It's clear from Sendu's calling and my own calling that -qScale 100.0 is a much better default value for low pass data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4248 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 01:47:21 +00:00
hanna e183b6598c - Fix our private repository of bwa reference support files.
- Update the test to point to our repository.
- Update the md5 to reflect new Picard tag ordering.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4247 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-10 00:29:26 +00:00
depristo 1de713f354 Massive review of maybe 50% of the exceptions in the GATK. GATKException is a tmp. tracker so that I can tell which StingExceptions I've reviewed. Please don't use it. If you are working on new code and are considering throwing exceptions, it's either UserError or StingException, please
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4246 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 23:21:17 +00:00
kshakir 4183e8805a Fixed reference (via busted symlink) /broad/1KG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4245 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 20:34:51 +00:00
aaron f5c295b6b2 add a little bit of documentation to the RMD track builder and wrap any exceptions thrown in tribble with the file source and line that caused the error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4243 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 17:56:36 +00:00
rpoplin aeb897db7f VR walkers look at by-hapmap validation status by default. Eric will be updating the syntax to allow for more flexibility here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4242 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:40:56 +00:00
kshakir d7f55574e2 Re-enabling aligner integration test now that we're back to having more than 1 or 2GB memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4241 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:09:48 +00:00
rpoplin d625186796 I think the VR integration tests are fine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4240 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 15:00:41 +00:00
depristo 6a30617a60 Initial implementation of UserError exceptions and error message overhaul. UserErrors and their subclasses UserError.MalFormedBam for example should be used when the GATK detects errors on part of the user. The output for errors is now much clearer and hopefully will reduce GS posts. Please start using UserError and its subclasses in your code. I've replace some, but not all, of the StingExceptions in the GATK with UserError where appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4239 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 11:32:20 +00:00
depristo ca9c7389ee Not useful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4238 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:33:03 +00:00
depristo 8708753a6a checkin for removal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4237 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-09 02:32:46 +00:00
hanna 5119bdb55e - Update DoC to support output to /dev/null.
- Add a release sanity check for DoC.
- Update release sanity checks with new command-line argument system.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4236 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 23:43:18 +00:00
fromer 1b1ec7e52d Changed default phasing window size to 10
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4235 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 21:28:36 +00:00
fromer ce031b2f05 PhasingEvaluator prints out interesting sites (only 1 phased, or phases disagree)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4233 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:21:21 +00:00
ebanks 40283f6456 Success! TranscriptToGenomicInfo now works without the delicate hacks that Ben had put in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4232 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-08 18:06:00 +00:00