Commit Graph

4655 Commits (d492eb94ad2bb7e5c79a4d9cd051feca3abfd8a7)

Author SHA1 Message Date
kiran d492eb94ad Actually subsets the resulting table now, like it was supposed to all along.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4696 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:18:23 +00:00
depristo d86ab2becb JEXL expressions now generate exceptions, not warnings. Tools should catch the runtime exception to handle correctly. Removed unnecessary complexity from the JEXL contexts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4695 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:08:16 +00:00
delangel 539651de30 Initial version of Indel Statistics module for Variant Eval - not for general use yet, needs more verification and more work. Older IndelHistogram module will be obsolete with this new walker. Right now, for each sample (and for all samples), the following are computed:
- Number of insertions
- Number of deletions
- Length distribution for indels.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4694 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 15:52:01 +00:00
kiran 50dbbdb8ab Retrieves per-sample or per-lane metrics from the SQUID database and populates a dataframe with the results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4693 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 22:46:07 +00:00
kshakir 01b721ab61 Passing ReviewedStingExceptions through the HMS.
Added a @Hidden experimental argument -validate to VariantEval that accepts external JEXL assertions, which must evaluate to true or an exception is thrown.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4692 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 21:50:42 +00:00
hanna 24ec35deaf - Reintroduce test dependency so that whether the tests pass or fail does not
  depend on the contents of the integrationtest directory.  Will figure
  out how to better manage the integrationtest directory at some point in
  the future.
- Up the max heap size for tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4691 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 19:55:20 +00:00
fromer 62f02bf30a Minor Java visibility updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4690 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 15:28:58 +00:00
fromer d204355a32
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4689 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 15:17:57 +00:00
fromer 1a567b3de8
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4688 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 15:17:50 +00:00
ebanks f1b0f3bc49 Putting my changes from earlier in the day back in after someone (rhymes with 'Dark') trounced on them with his last commit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4687 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 01:55:50 +00:00
hanna 8ff4e4cb25 Cleanup testng listener configuration.
- Add StingTextReporter, which provides a text dump of the errors to the
  console.  Had to create our own reporter (inheriting from the standard
  TestNG TextReporter) to work around a configuration issue with the
  TextReporter.  In an ideal world, I'd report this on the TestNG mailing
  list and help them resolve the issue, but this solution is relatively
  robust at the moment and life is too short.
- Added back the failed test listener, which generates the testng-failed.xml
  file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4686 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 23:43:14 +00:00
rpoplin b677080858 Initial checkin of the ValidationGenotyper. Not intended to be used by anybody yet. Only here for archival purposes at this point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4685 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:33:49 +00:00
depristo ef2f6d90d2 VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4684 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:19:22 +00:00
hanna 5b83942cee - Fix DepthOfCoverage so that, when it abuses the ROD system by instantiating a track in onTraversalDone, it also supplies the correct sequence dictionary and parser.
- Changed RMDTrackBuilder to use SequenceDictionaryUtils.validateDictionaries for ref <-> ROD sequence dictionary validation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4683 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 20:34:04 +00:00
corin f8e1ea7b64 Script based on the second part of Eric's createTranscriptToGenomicInfoTables.pl script which sorts flat files by reference order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4682 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 19:44:32 +00:00
ebanks 2af508ef83 Better docs, as requested by Matt
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4681 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 18:24:15 +00:00
corin 5466365575 Fixing a silly typo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4680 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 18:16:51 +00:00
corin a64f693b20 Updated pipeline script to include dbSnp for UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4679 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 18:09:47 +00:00
kshakir 302e8f0239 Fixed bug where the command directory was not being set to an absolute path, leading LSF to write some .done files to /tmp.
No longer using the command directory for temporary .done files, and instead using the user specified temporary directory.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4678 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 17:59:39 +00:00
depristo 62be55376b no longer useful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4677 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 17:53:33 +00:00
ebanks 35382468ee Better error checking/output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4676 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 16:36:34 +00:00
asivache f2ee5dc319 regexp fixed, now can find OBS_COUNTS followed by both [C/A/R] (obsolete) and [C/A/T] (current)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4675 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 15:28:38 +00:00
depristo 7a3a464959 Finally, the logic is right
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4674 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 14:02:09 +00:00
depristo 8d66637fc2 Bug fix for VariantsToTable with filtered records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4673 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:49:16 +00:00
depristo d76b87d6e3 Useful debug file output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4672 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:36:52 +00:00
ebanks 28142408ff Refactoring so that all counting in UGv2 is done on the filtered context. In particular, tests for empty pileups and too many spanning deletions now use the correct counts. Also, -all_bases mode now trumps all; this one is for you, chartl.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4671 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 05:01:12 +00:00
ebanks c7229abbf7 Get rid of 'meaningless and random values' that prevent Sendu from merging PG lines. I have to admit that he did have a good point there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4670 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 03:35:12 +00:00
kshakir 2fd816ac5f Updated ordering of integration tests. GVC > VR > AVC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4669 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 06:33:28 +00:00
kshakir 801c562909 Now actually checking in the integration test mentioned in the prior commit: compiles the full calling pipeline.
Removed QScript usages of VariantRecalibrator's -reportDatFile, --report_dat_file


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4668 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 04:27:10 +00:00
delangel cb1e8ad43a Temp bug fix for indel genotyper: if there are two or more variant contexts at a site, just choose the first one containing an indel and genotype that. There might be cases where IGv2 emits 2 indel variant contexts at the same ref location, which made us fail there. A better solution will be to form underlying haplotypes supported by reads and compute likelihoods of that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4667 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 00:21:54 +00:00
depristo 82f9327b5e Throw the right exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4666 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:18:42 +00:00
depristo ac52b64b77 correct test data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4665 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:17:38 +00:00
depristo 44d0cb6cde New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now lets you omit the number of input files if it can infer input counts from MD5s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
kshakir 62a106ca5a Disabled VariantGaussianMixtureModelUnitTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4663 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 03:53:33 +00:00
kshakir a2c160da2d Explicitly set TestNG to run headless with no UI.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4662 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 22:02:02 +00:00
kshakir 673fa841a4 Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic integration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
depristo 42acc968b1 Unit tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4660 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:09:39 +00:00
depristo 4f4eec12dd Minor improvement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4659 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 19:30:54 +00:00
ebanks b51762c279 When you commit code late at night you tend to make careless mistakes... like forgetting to update integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4658 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:41:10 +00:00
depristo 988da428ae Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:38:26 +00:00
depristo c5f8c4dd0d VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 13:52:40 +00:00
ebanks 69de3e51bf Better precision for the calculated AF value. Now looks at the total number of samples to determine how much precision is necessary. Also, changing default min BQ used for calling in UGv2 to Q17.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4655 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 08:31:40 +00:00
depristo ec83a4b765 Initial commit, without any tool changes, of a new infrastructure for determining tranches. This new version walks up from the lowest quality SNPs and determines Ti/Tv. This is marginally more stable than moving in the other direction when there are few novel variants (exomes). Can make a substantial difference in the size of the call set (10-20%). I'll hook it into the main system now. Includes a new class Tranche and isolated reading/writing utilities that are now tested in TestVariantRecalibrator, which should be moved to UnitTest as soon as I can figure out how to do this on my mac.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4654 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:52:49 +00:00
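
The idea in the commit above of walking up from the lowest quality SNPs while tracking Ti/Tv can be illustrated with a small sketch. This is not the code from the commit: the function names, the `(quality, ref, alt)` tuple layout, and the stopping rule (first threshold at which the remaining calls reach a target Ti/Tv) are all assumptions made for illustration, and ties in quality are ignored.

```python
def is_transition(ref, alt):
    """True for A<->G and C<->T substitutions; anything else is a transversion."""
    return {ref, alt} in ({"A", "G"}, {"C", "T"})


def find_cutoff(variants, target_titv):
    """Hypothetical helper: walk up from the lowest-quality SNP.

    variants: list of (quality, ref, alt) tuples (assumed layout).
    Returns the lowest quality q such that the SNPs with quality >= q
    have Ti/Tv >= target_titv, or None if no such threshold exists.
    """
    ordered = sorted(variants, key=lambda v: v[0])
    ti = sum(1 for _, ref, alt in ordered if is_transition(ref, alt))
    tv = len(ordered) - ti
    for qual, ref, alt in ordered:
        # ti and tv describe the current SNP plus everything above it.
        if tv == 0 or ti / tv >= target_titv:
            return qual
        # Drop the current (lowest remaining) SNP and keep walking up.
        if is_transition(ref, alt):
            ti -= 1
        else:
            tv -= 1
    return None


snps = [(10, "A", "C"), (20, "A", "T"), (30, "A", "G"), (40, "C", "T"), (50, "G", "T")]
print(find_cutoff(snps, 2.0))
```

In this toy input, discarding the two lowest-quality transversions leaves two transitions and one transversion, so the walk stops at the third SNP.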
depristo ed6396ed43 No longer getting the inet, it seems to potentially hang the JVM
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4653 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:49:42 +00:00
ebanks 2f6666a988 Correcting traversal statistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4652 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 22:46:58 +00:00
depristo dbde721dd0 Bug fix for filtered records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4651 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:54:51 +00:00
aaron 698e5cf345 for GATK style codecs, make sure we fill in their GenomeLocParser from the RMDIndexer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4650 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:44:15 +00:00
aaron fd78ce6c86 include the codecs into the RMD indexer that are available in the GATK, not just Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4649 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 06:35:04 +00:00
depristo 0e062ae040 V1 of the data processing paper, produced results for the manuscript we presented. Commit for archival purposes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4648 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 01:43:21 +00:00
delangel 2f3be24a00 Improvement in exact allele frequency calculation model (still under test, but this is definitely better than what I had before). Instead of approximating log(10^x+10^y) as max(x,y), approximate full Jacobian formula max(x,y)+log(1+10^-abs(x-y)) with static lookup table for the second term.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4647 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 01:22:35 +00:00
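
The approximation described in the commit above rests on the identity log10(10^x + 10^y) = max(x, y) + log10(1 + 10^-|x-y|), with the second term read from a precomputed table. A minimal sketch follows, assuming base-10 logs throughout; the table step, table size, and function names are illustrative choices, not values taken from the commit.

```python
import math

# Assumed table parameters (not from the commit).
TABLE_STEP = 0.001
TABLE_SIZE = 8001
MAX_DIFF = (TABLE_SIZE - 1) * TABLE_STEP  # correction term is negligible beyond this

# Static lookup table for log10(1 + 10^-d), d = 0, TABLE_STEP, 2*TABLE_STEP, ...
_CORRECTION = [math.log10(1.0 + 10.0 ** (-i * TABLE_STEP)) for i in range(TABLE_SIZE)]


def approx_log10_sum(x, y):
    """Approximate log10(10**x + 10**y) as max(x, y) plus a table lookup."""
    diff = abs(x - y)
    if diff >= MAX_DIFF:
        # The smaller term contributes almost nothing; fall back to max(x, y).
        return max(x, y)
    index = min(int(round(diff / TABLE_STEP)), TABLE_SIZE - 1)
    return max(x, y) + _CORRECTION[index]


def exact_log10_sum(x, y):
    """Reference value computed without the table, for comparison."""
    return max(x, y) + math.log10(1.0 + 10.0 ** (-abs(x - y)))


print(approx_log10_sum(-1.0, -1.0), exact_log10_sum(-1.0, -1.0))
```

Because the larger exponent is factored out first, neither function ever evaluates 10**x for a very negative x, so the sum stays finite even when both likelihoods would underflow on a linear scale.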