Commit Graph

3863 Commits (8768e1a2409885e07326e07f2b4d51cc4378e514)

Author SHA1 Message Date
depristo 8768e1a240 Useful profiling tool that reads in a single rod and evalutes the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
ebanks 7a8b85dd15 Catch the JEXL exception when trying to match a variable that's not in the context - and don't filter in these cases. Now everyone can happily go back to using the stupid (and hopefully temporary) AlleleBalance filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4727 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 05:00:41 +00:00
ebanks caf2c21f61 Must close the writer to flush the cache
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4726 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:33:08 +00:00
ebanks 816c33c821 indel-related fixes to the strict validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4725 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:08:34 +00:00
delangel 9cdc341be5 Trivial update for data processing paper: change syntax of output argument for Beagle by depth walker to update to new GATK format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4724 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 01:45:44 +00:00
ebanks ea6e2218c1 1. dbsnp has some massive indels which my left-aligner was barfing on because there isn't enough reference context; fixed. 2. Lower default calling threshold to Q30 for UGv2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4722 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 19:28:33 +00:00
aaron 53672361cc capture more details when something IO-related goes wrong in writing a Tribble index
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4720 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:06:28 +00:00
hanna 082073ca3c Stop RBP.getPileupBySample() from throwing a NullPointerException if the
sample doesn't exist -- now returns null.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4719 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 05:17:06 +00:00
kshakir 787e5d85e9 Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'.
Added an initial test for genotyping chr20 on ten 1000G bams.
Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting".
Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory.
Added refseq tables and dbsnps to validation data in BaseTest.
Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files.
Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running.
Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 22:59:42 +00:00
hanna 8ca5edf89f Fix issue where non-required file inputs can throw a NullPointerException
rather than a UserException when an the input argument is specified without
an argument value. 
The magnitude of code required to fix this points to a need to give the
command-line argument system a good spring cleaning.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4714 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 01:49:17 +00:00
ebanks b9a59ea54f Adding Het/Hom ratio to the temp per sample metrics. Because I'm in a generous mood tonight, I'm going ahead and fixing the paths for the classes I'm touching...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4713 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 04:24:42 +00:00
ebanks cff7c6ddce These are user exceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4712 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 02:08:11 +00:00
bthomas 374c0deba2 Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 19:59:05 +00:00
kshakir c723db1f4b Added a -summary jexl argument to VariantEval similar to -validate.
Updated the package of ValidationGenotyper to match the file location.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4710 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 04:42:46 +00:00
kshakir 79725f2d9c Excluding the QFunction log files from the set of files to delete on completion.
When a QGraph is empty displaying a warning instead of crashing with an JGraph internal assertion error.
Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting.
When integration tests are run detecting that the logger has already been setup so that messages aren't logged twice.
Updated from Ivy 2.2.0-rc1 to 2.2.0.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 20:22:01 +00:00
depristo 721e8cb679 VariantsToTable now supports wildcard captures. -F PREFIX* now captures all fields that begin with PREFIX, output as a comma-separated list of unique values. Added integration test for VariantsToTable since I find it so useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4706 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 18:54:59 +00:00
depristo 8cba86a69d Trivial code organization for the haplotype score
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4703 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 12:32:55 +00:00
hanna 9f356b6cd0 Package all walkers in org/broadinstitute/sting/gatk/walkers directory in release.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4702 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 02:33:45 +00:00
hanna 90711d445c Change the interface for RMDTrackBuilder, therefore always mandating the specification
of a sequence dictionary and related info.  This will hopefully eliminate the cases in
which the refseq track depends a sequence dictionary / contig parser that hasn't been
specified.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4700 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 19:00:17 +00:00
fromer 367cc9135f Use VariantContext and Genotype accessor methods for attributes that will return null for unparseable data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4699 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 18:19:56 +00:00
fromer 2f3578182a Added VERY preliminary version for merging refseq annotations as SNPs are merged
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4698 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:49:12 +00:00
fromer e2f7f33ce7 Added getIntegerAttribute()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4697 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:33:07 +00:00
depristo d86ab2becb JEXL expressions now generate exceptions, not warnings. Tools should catch the runtime exception to handle correctly. Removed unncessary complexity from the JEXL contexts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4695 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:08:16 +00:00
delangel 539651de30 Initial version of Indel Statistics module for Variant Eval - not for general use yet, needs more verification and more work. Older IndelHistogram module will be obsolete with this new walker. Right now, for each sample (and for all samples), the following are computed:
- Number of insertions
- Number of deletions
- Length distribution for indels.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4694 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 15:52:01 +00:00
kshakir 01b721ab61 Passing ReviewedStingExceptions through the HMS.
Added a @Hidden experimental argument -validate to VariantEval that allows external JEXL assertions that must evaluate to true will throw an exception.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4692 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 21:50:42 +00:00
hanna 24ec35deaf - Reintroduce test dependency so that the tests passing / failing is not
dependent on the contents of the integrationtest directory.  Will figure
  out how to better manage the integrationtest directory at some point in
  the future.
- Up the max heap size for tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4691 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 19:55:20 +00:00
fromer 62f02bf30a Minor JAVA visibility updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4690 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 15:28:58 +00:00
ebanks f1b0f3bc49 Putting my changes from earlier in the day back in after someone (rhymes with 'Dark') trounced on them with his last commit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4687 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 01:55:50 +00:00
hanna 8ff4e4cb25 Cleanup testng listener configuration.
- Add StingTextReporter, which provides a text dump of the errors to the
  console.  Had to create our own reporter (inheriting from the standard
  TestNG TextReporter) to work around a configuration issue with the
  TextReporter.  In an ideal world, I'd report this on the TestNG mailing
  list and help them resolve the issue, but this solution is relatively
  robust at the moment and life is too short.
- Added back the failed test listener, which generates the testng-failed.xml
  file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4686 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 23:43:14 +00:00
rpoplin b677080858 Initial checkin of the ValidationGenotyper. Not intended to be used by anybody yet. Only here for archival purposes at this point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4685 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:33:49 +00:00
depristo ef2f6d90d2 VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4684 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:19:22 +00:00
hanna 5b83942cee - Fix DepthOfCoverage so that, when it abuses the ROD system by instantiating a track in onTraversalDone, it also supplies the correct sequence dictionary and parser.
- Changed RMDTrackBuilder to use SequenceDictionaryUtils.validateDictionaries for ref <-> ROD sequence dictionary validation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4683 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 20:34:04 +00:00
ebanks 2af508ef83 Better docs, as requested by Matt
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4681 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 18:24:15 +00:00
depristo 62be55376b no longer useful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4677 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 17:53:33 +00:00
ebanks 35382468ee Better error checking/output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4676 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 16:36:34 +00:00
depristo 7a3a464959 Finally, the logic is right
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4674 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 14:02:09 +00:00
depristo 8d66637fc2 Bug fix for VariantsToTable with filtered records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4673 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:49:16 +00:00
depristo d76b87d6e3 Useful debug file output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4672 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:36:52 +00:00
ebanks 28142408ff Refactoring so that all counting in UGv2 is done on the filtered context. In particular, tests for empty pileups and too many spanning deletions now use the correct counts. Also, -all_bases mode now trumps all; this one is for you, chartl.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4671 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 05:01:12 +00:00
ebanks c7229abbf7 Get rid of 'meaningless and random values' that prevent Sendu from merging PG lines. I have to admit that he did have a good point there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4670 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 03:35:12 +00:00
kshakir 2fd816ac5f Updated ordering of integration tests. GVC > VR > AVC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4669 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 06:33:28 +00:00
delangel cb1e8ad43a Temp bug fix for indel genotyper: if there are two or more variant contexts at a site, just choose the first one containing an indel and genotype that. There might be cases where IGv2 emits 2 indel variant contexts in at the same ref location which made us fail there. A better solution will be to form underlying haplotypes supported by reads and compute likelihoods of that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4667 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 00:21:54 +00:00
depristo 82f9327b5e Throw the right exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4666 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:18:42 +00:00
depristo 44d0cb6cde New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to not specify no. input files, if it can infer input counts from MD5s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
kshakir 62a106ca5a Disabled VariantGaussianMixtureModelUnitTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4663 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 03:53:33 +00:00
kshakir 673fa841a4 Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic intergration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
depristo 42acc968b1 Unit tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4660 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:09:39 +00:00
ebanks b51762c279 When you commit code late at night you tend to make careless mistakes... like forgetting to update integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4658 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:41:10 +00:00
depristo 988da428ae Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:38:26 +00:00
depristo c5f8c4dd0d VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 13:52:40 +00:00