depristo
5539c2d9f3
--performanceLog (-PF) X.dat argument now enabled. Writes out a table (R-friendly) of the performance of the GATK over time, exactly as a more detailed version of the INFO progress meter. R script for useful plotting of the performance of the GATK over time. Will be helpful for upcoming scalability testing and debugging of memory leaks and other incremental performance problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4921 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 23:34:21 +00:00
depristo
4c9746f463
Disabled performance log intermediate commit. Will be refactored and committed to the responsiblity along with documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4919 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-02 22:18:12 +00:00
hanna
3fc9862964
Unit test fixed - Tribble codecs aren't designed to be stateless, but I was
...
using one as though it was. Fixed, and debug code reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4917 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 17:47:52 +00:00
hanna
b9cb57f4b9
A unit test is failing on bamboo in a way I can't reproduce (or even explain).
...
Checking in some debugging info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4916 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 16:35:04 +00:00
hanna
cba18116e4
A significant refactoring of the ROD system, done largely to simplify the process of
...
streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:52:22 +00:00
ebanks
d70483c50a
Automatically filter out reads with consecutive indel operators in the CIGAR string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4914 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-31 04:42:54 +00:00
ebanks
848977678d
No reason to convert the GLs to a String for formatting when they're just going to be converted to PLs later. That was 5% of the UG runtime...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4913 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 22:06:19 +00:00
aaron
85f2968104
add convenience methods for RODs-for-reads: the ability to get all the RODs covering the read, regardless of their type or position on the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4912 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 20:46:03 +00:00
depristo
d7e74f8be6
Temporary phasing evalution walker that needs to be incorporated into the newest VariantEval, whenever it is available
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4911 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-29 20:43:15 +00:00
ebanks
a31f6e4e99
Need to check isBiallelic before calling getSNPSubstitutionType for the allele swap warning
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4909 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-27 20:17:14 +00:00
ebanks
8a0c07b865
Support for indels in hapmap. This was non-trivial because not only does hapmap not tell you whether the allele is an insertion or deletion, but it also has a completely different positioning strategy (rightmost base). I'll send out an email tomorrow when the new HapMap3.3 VCF is ready.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4908 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-27 07:37:46 +00:00
chartl
6ebf5b30de
Transposing the table, and fixing some null pointer exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4906 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-23 16:22:57 +00:00
ebanks
cebfd01857
Properly output .bed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4905 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-23 14:49:24 +00:00
depristo
464d0e18e3
Bringing us back to passing integrationtests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4904 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-23 14:36:11 +00:00
depristo
8c583ea405
RBP now operates correctly at non-variant sites so we can phase hom-ref genotypes with -sampleToPhase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4903 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-23 13:11:22 +00:00
delangel
376bc563d4
Trivial change to allow GenerateVariantClusters to be run on indels - not that VQSR now works on indels, far from it, but at least it's a first step and it allows us to generate cluster plots to see how well known/novel sites differentiate in their covariates (short answer: no difference/separation :( ).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4902 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 22:39:09 +00:00
hanna
e313eeede8
Push command-line expansions, such as BAM list unpacking and -B tag parsing, out
...
into the CommandLine* classes. This makes it easier for external functionality
(such as the VCF streamer) to use GenomeAnalysisEngine directly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 19:00:17 +00:00
depristo
66cca7de0f
renamed genotypesArePhased to isPhased, as the previous name was incorrect for several reasons. Added setPhase() to MutableGenotype. Other classes changed to reflect renaming to isPhased(). CombineVariants now supports an experimental MASTER mode where it consumes -B:master,vcf and -B:xi,vcf for any number i and updates the master with phasing information in xi.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4896 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 17:42:05 +00:00
chartl
2235245af0
PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 15:16:15 +00:00
delangel
a1653f0c83
Another major redo for indel genotyper: this time, add ability to do allele and variant discovery, and don't rely necessarily on external vcf's to provide candidate variants and alleles (e.g. by using IndelGenotyperV2). This has two major advantages: speed, and more fine-grained control of discovery process. Code is still under test and analysis but this version should be hopefully stable.
...
Ability to genotype candidate variants from input vcf is retained and can be turned on by command line argument but is disabled by default.
Code, by default, will build a consensus of the most common indel event at a pileup. If that consensus allele has a count bigger than N (=5 by default), we proceed to genotype by computing probabilistic realigmment, AF distribution etc. and possibly emmiting a call.
Needed for this, also added ability to build haplotypes from list of alleles instead of from a variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4893 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-22 02:38:06 +00:00
hanna
09c7ea879d
Merging GenomeAnalysisEngine and AbstractGenomeAnalysisEngine back together.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4889 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-21 02:09:46 +00:00
depristo
b3ac47812c
No longer emits records at filtered sites, in sub-sampling mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4883 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 16:43:50 +00:00
depristo
60880b925f
VC utils prune method now will keep genotype attributes as well as info keys. RBP now emits a far reduce (NO INFO, only GT:GQ:PG) records, further reducing size of phasing output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4882 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 16:33:14 +00:00
depristo
8604335566
Minor improvements to further reduce debugging output. When running in -samplesToPhase mode, now only including the samples to phase in the output VCF, making it very much smaller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4881 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 16:19:47 +00:00
depristo
ff90c24f28
RBP now supports operating on a subset of samples, outputting a much reduced VCF file appropriate for merging later. Also, general optimization to avoid printing enormous amounts of data to logger.debug by using a glocal static variable DEBUG that conditionally allows writing to the variable. Passes integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4880 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-20 16:03:28 +00:00
depristo
a3729bd59c
Now I call BeforeMethod correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4872 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 22:45:45 +00:00
depristo
b7e4a015c0
static thread cache reset in UnitTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4870 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 21:53:10 +00:00
depristo
3bbc6a0540
Slightly more thread safe CachingIndexedFastaSequenceFile.java. Likely passes parallel testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4869 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 21:05:17 +00:00
depristo
5dd0e8388b
Fixed a bug in UnitTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4867 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 19:44:35 +00:00
depristo
4a54f3f230
ThreadLocal version of CachingIndexedFastaSequenceFile. More efficient support for shared memory BAQ calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4865 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 15:44:48 +00:00
depristo
32d5397c01
Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 15:08:31 +00:00
handsake
21dc05138a
Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4863 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-17 14:49:15 +00:00
chartl
2bd2667516
Another privately-owned class to add before re-checking out repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4858 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:14:51 +00:00
chartl
e406eb0f95
Adding a useful accessor method to TableFeature
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4856 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 18:11:51 +00:00
ebanks
8ab4704b4c
Adding a command-line argument to allow missing values to evaluate as false instead of true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4854 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 05:18:12 +00:00
ebanks
9f3e56e487
VariantAnnotator shouldn't die when multiple records occur at the same position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4853 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-16 04:05:47 +00:00
hanna
acfe83920b
'-L unmapped': adding integration tests for explicitly including (-L unmapped)
...
unmapped reads and explicitly excluding (-XL unmapped) unmapped reads, augmenting
the suite of unit tests already put in place.
'-L unmapped' seems safe to use; go for it, but please validate results against
samtools flagstat when the process finishes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4849 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 23:11:46 +00:00
ebanks
dabdeb729e
Eric broke the build. Eric broke the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4847 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 17:01:38 +00:00
ebanks
5c0b66cb7c
3 big changes that all kill the integration tests: 1. Don't cap the PLs by 255 anymore. 2. Move over to the 3state model as the only available base model for UG (no more base transition tables). 3. New QD implementation when GLs/PLs are available.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4846 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 16:24:28 +00:00
chartl
5a27d231fa
Rename it so that nobody else falls into the trap laid out (the test is VariantToTable, the walker is Variant[s]ToTable)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4844 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:43:00 +00:00
chartl
5e27e9162f
Huh? I thought we parsed out comma-separated command line arguments into list automatically...just change the syntax of the integration test, no need to update the md5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4843 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 11:40:27 +00:00
chartl
3e75431bc8
Thanks to mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again. This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-15 03:33:36 +00:00
kshakir
01323447c6
Removed LibBat.SUB2_BSUB_BLOCK since the use of it exits the JVM.
...
Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK.
Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync whele SUB2_BSUB_BLOCK was exiting in the middle of integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 19:57:20 +00:00
hanna
67c07d1a6a
Fixed recently introduced multiplexer issue where DoC couldn't be written
...
directly to command-line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4839 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 19:35:15 +00:00
hanna
526ae92093
Getting back to '-L unmapped':
...
- basic unit tests for interval sorting and merging with mix of mapped/unmapped.
- validation to ensure that locus walkers (really all non-read walkers) blow up with a user error when -L unmapped is specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4837 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 18:24:18 +00:00
ebanks
afd4655674
Use @Output instead of @Argument. As a side note, Chris I'm ready for this nightmare to go away...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4835 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 17:13:15 +00:00
ebanks
cf7d932a17
Fix for f***ed up BWA alignments that adhere to SAM specs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4834 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 17:12:25 +00:00
kshakir
d550fdfd60
Disabling integration test to see if this restores the full test suite.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4833 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-14 15:27:02 +00:00
delangel
a5008faca8
Bug fix: when getting variant contexts at a site, we need to get only variants that start at current location, otherwise we get duplicated records when filtering indels.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4830 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 19:23:10 +00:00
delangel
17db2e0e24
(forgot I hadn't committed this) - refactored IndelStatistics module and added a new inner class to compute Indel classification along with other statistics. So, we now get an extra table specifying, per sample, counts of whether indels are:
...
- Repeat Expansions
- Novel sequence
And for indels of size <=2 we get a per-mononuc. or dinuc. breakdown of novels and expansions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4828 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-13 17:43:43 +00:00