Commit Graph

2518 Commits (035d4170aa8236d8d560fc830ea0fe693d1872dd)

Author SHA1 Message Date
ebanks 035d4170aa fix bug in read clipper: output bam can be null, so check for it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3005 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:49:26 +00:00
ebanks 411d25c8d1 -Integration tests for walkers that use original quals.
-framework for pushing -OQ into GATK (not done)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:46:31 +00:00
aaron e365d308d4 add a new JEXLContext that lazy-evaluates JEXL expressions given the VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3003 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 16:00:55 +00:00
kcibul 9f519af06d new method to filter out overlapping PE reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3002 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 15:40:09 +00:00
hanna 45f70de6df Fixed bug that failed to reset an accumulator when crossing contig boundaries,
meaning that in special cases of shallow coverage, an interval might get dropped.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2999 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 04:45:55 +00:00
ebanks 73d6167bd6 Fixing broken integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo 4dd7c5972c Unit tests for -XL arguments; expt. annotation calculating the GC content within 100 bp of the current SNP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 21:08:14 +00:00
ebanks e367a50e9b Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
Also misc cleanup and fixes (e.g. perform evaluation even when no comp tracks are provided).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
aaron ecb59f5d0d removed old tests and old code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:57:01 +00:00
depristo e7eae9b61d High performance, correct implementation of -XL exclusion lists. Enjoy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2994 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:39:20 +00:00
aaron 88a48821ea removed the dependence on removeRegion() in GenomeLocSortedSet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:35:49 +00:00
depristo b39b5edca8 Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
aaron 1eb5f97255 fixed dropping single base intervals from deleteRegion, moving onto performance fixes.
(stop - start is length-1 on closed intervals, so we need to check greater than OR equal to zero)

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:14:21 +00:00
hanna 7aa7a5f9b8 Bug fixes for edge cases and filtration in the earlier performance fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2989 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 04:46:08 +00:00
hanna 5e8654fcdc Oops! Introduced a performance bug in read interval sharding, when the new sharding system is available. Track more state to avoid this problem in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2987 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 23:19:42 +00:00
asivache d804bdf210 New option: --maxReadsInRam. When using the ON_DISK sorting option, the tool may still run out of memory in regions of pathologically deep coverage because of the generous memory usage limit set in the underlying samtools sorting SAM writers. With this option, the user can lower the number of reads the writer keeps in memory before spilling them to disk.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2985 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:15:03 +00:00
aaron 661a043cef adding methods to get RODs by name or type in read traversals, performance improvements to RODs for Reads in general, and some more Tribble infrastructure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2984 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:13:39 +00:00
depristo 18ba9929f9 notes for eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
hanna cbd529d544 Better chopping up of data for ref walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2982 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:13:26 +00:00
hanna a7ba88e649 Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
with less memory consumption.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
ebanks 4a05757a2a Fixed strand bias calculation because of -Infinity issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2980 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 16:05:51 +00:00
aaron dde9fd8a15 some rods-for-reads cleaning and performance improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2979 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:54:58 +00:00
depristo 4f4555c80f PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
ebanks 40d305bc7e Added test of Nway cleaning for Matt; thanks to Aaron for the help.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2977 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 21:00:41 +00:00
depristo 486bef9318 Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
ebanks c85ed1ce90 Plumbing is now in place to emit indel calls from the UnifiedGenotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2975 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 04:30:12 +00:00
ebanks 5c35be39ef Now that extended events work for reference traversals, turn it off in the genotyper for non-indel models (thereby fixing busted integration tests).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2974 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:14:06 +00:00
ebanks 7ddd45d059 Hmm. I thought I removed this already.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2973 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:09:13 +00:00
ebanks 1a576525e9 misc improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2972 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:00:28 +00:00
ebanks 6e855809e1 Renaming and moving relevant tools into a sequenom directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2971 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 02:31:10 +00:00
asivache c638c29eea In reference traversals, this view did not expect the possibility of TWO alignment contexts (base pileup followed by extended event pileup) associated with the same location. As a result, extended event pileups were silently skipped even when enabled in the traversal engine. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2970 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 22:18:44 +00:00
ebanks bc3761dc16 allow clipper to use original quals if requested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2969 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 21:50:31 +00:00
ebanks f096a958d6 Initial commit for Andrey of plumbing for indels. Not finished - need to track down bug with him.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2967 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:13:01 +00:00
chartl 0a49dffa8f Row/Column names are now R-friendly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks 0e360ea8af Alleles now hash correctly.
Special thanks to Matt & Aaron.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2965 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 18:09:44 +00:00
ebanks e5475a7ba9 re-enabling PlinkToVCF integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2964 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:35:49 +00:00
ebanks 5a20bf0e64 3 changes to UG which break integration tests:
1. emit AA,AB,BB likelihoods in the FORMAT field for Mark
2. remove constraint that genotype alleles (in the GT field) need to be lexicographically sorted.
3. Add bam file(s) used by genotyper to header for Kiran


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2963 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:16:47 +00:00
hanna cdce639bae Partially reclaim performance lost during integration test fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2961 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 12:36:11 +00:00
ebanks 9f3b99c11b Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
hanna 02f48b6457 Fix bug that's been in the GATK for a very long time: update nReads (as well
as nRecords), so that INFO logging doesn't say 'skipped 0 of 0 reads'.  While
I'm in there, update TraversalStatistics to store longs.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2959 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 22:44:54 +00:00
chartl bca9bdcc68 Add integration test for quartiles overflowing on interval reduce
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2957 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 16:18:45 +00:00
chartl 21bf8b4b93 Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 15:54:41 +00:00
rpoplin fe8a8b9199 Hooked up both optimization models via command line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2955 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:49:59 +00:00
chartl cc6a714c09 Handle excess coverage in interval output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:40:05 +00:00
rpoplin ca2a0266dc Converting annotation values that are set to Double.Infinity
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2953 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:04:33 +00:00
rpoplin b42e0a398e Bug fix in variant optimizer for when there are more novel variants than known variants in the callset. Changing the magic numbers related to the starting sigma values for the gaussian clusters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2952 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 13:02:08 +00:00
hanna e4360bac6a More comprehensive support when sharding for ref walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2951 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 11:25:20 +00:00
hanna eb165ca844 Celebrate the fact that the new sharding system works with integration tests
by removing the scary debug line.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2950 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:40:56 +00:00
hanna 9e107513d0 In the new sharding system, if no read group is present, hallucinate one. Added
for test compatibility, but not sure whether we still need this feature.  TODO: Poll the group about this feature.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2949 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:01:34 +00:00
hanna a7fe07c404 A few stopgap fixes to get the GATK to the point where the old sharding
infrastructure can be torn down:
1) New sharding system emulates old MonolithicSharding mechanism.
2) Better awareness of differences between fasta and BAM files when creating
   shards.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2948 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 21:01:25 +00:00