Commit Graph

2538 Commits (4965d6b26a443fb6d6cacfbceea781b652fdec34)

Author SHA1 Message Date
aaron 182f1061ff Bamboo isn't picking up commits for some reason; updating a copyright to see if it'll get this commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3025 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:56:48 +00:00
ebanks 5e29d0c219 Be smarter about dealing with infinite quals for ref calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3024 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:35:23 +00:00
rpoplin 1bb4394aa9 Adding a skeleton for the second step of the variant optimization process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3023 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:03:40 +00:00
ebanks ded4ba8966 Let's make artificial reads that actually adhere to the specs...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3022 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:51:42 +00:00
bthomas 5b34bb9ab0 Adding three minor new features:
+ -L all now walks over all intervals
+ If a -L argument is passed with a .list extension and the file does not exist, a File Not Found error is returned instead of a "bad interval" error. We plan to revisit interval lists soon and generate a concrete list of filenames, so this is likely temporary.
+ An error is thrown if the start position of an interval is greater than its end position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3021 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:24:10 +00:00
ebanks 4340601c26 -Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically
-Better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:00:10 +00:00
ebanks 76d14d17dc oops, need to update class names too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3019 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:01:31 +00:00
ebanks 85a030069d renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3018 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:00:28 +00:00
ebanks af5fd99444 Added filter for bad cigars (based on consecutive indels) - and cleaned up bad mates filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3017 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 13:53:42 +00:00
hanna 2cc040aa1c New sharding system is live. Disable with -ds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3016 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 03:32:45 +00:00
ebanks 1fd909cdaf Fix for Kiran: -1 is a valid value for genotype qualities in VCF, so VariantContext shouldn't die. Cleaned up the relevant VCF code while I was in there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3015 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 00:20:15 +00:00
hanna 849bd1f451 Set the eagerDecode flag in such a way that the binary data block in the BAM will always be considered dirty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3014 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 22:01:23 +00:00
rpoplin 933823c8bc Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
hanna 2525ecaa43 Oops. Commented out some tests to improve performance and then checked in the commented out tests. Reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3012 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:34:50 +00:00
hanna 59045ccb28 Filter,merge performs much better than merge,filter. Many thanks to Eric for checking in an integration test that so compellingly demonstrates this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3011 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:23:37 +00:00
hanna 6dd5f192e7 Performance improvements for RODs in conjunction with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3010 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 14:54:12 +00:00
kiran f20f78d77f Don't crash if the tracker is null. Reset the alternate alleles based on the alts present in the subset of samples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3009 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 04:00:04 +00:00
aaron 10e76abbbc adding some VE2 report infrastructure; work-in-progress.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 03:57:42 +00:00
ebanks 586f87fa35 Quick fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3007 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:59:26 +00:00
ebanks 202231141c -Push the --use_original_qualities argument into the engine.
-Check that base and qual strings are the same length
-Fix one more bug in the clipper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:06:11 +00:00
ebanks 035d4170aa fix bug in read clipper: output bam can be null, so check for it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3005 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:49:26 +00:00
ebanks 411d25c8d1 -Integration tests for walkers that use original quals.
-framework for pushing -OQ into GATK (not done)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:46:31 +00:00
aaron e365d308d4 add a new JEXLContext that lazy-evaluates JEXL expressions given the VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3003 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 16:00:55 +00:00
kcibul 9f519af06d new method to filter out overlapping PE reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3002 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 15:40:09 +00:00
hanna 45f70de6df Fixed a bug in which an accumulator was not reset when crossing contig boundaries, meaning that in special cases of shallow coverage, an interval might get dropped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2999 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 04:45:55 +00:00
ebanks 73d6167bd6 Fixing broken integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo 4dd7c5972c Unit tests for -XL arguments; experimental annotation calculating the GC content within 100 bp of the current SNP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 21:08:14 +00:00
ebanks e367a50e9b Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
Also misc cleanup and fixes (e.g. perform evaluation even when no comp tracks are provided).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
aaron ecb59f5d0d removed old tests and old code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:57:01 +00:00
depristo e7eae9b61d High performance, correct implementation of -XL exclusion lists. Enjoy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2994 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:39:20 +00:00
aaron 88a48821ea removed the dependence on removeRegion() in GenomeLocSortedSet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:35:49 +00:00
depristo b39b5edca8 Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
aaron 1eb5f97255 fixed dropping single base intervals from deleteRegion, moving on to performance fixes.
(stop - start is length-1 on closed intervals, so we need to check greater than OR equal to zero)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:14:21 +00:00
hanna 7aa7a5f9b8 Bug fixes for edge cases and filtration in the earlier performance fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2989 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 04:46:08 +00:00
hanna 5e8654fcdc Oops! Introduced a performance bug in read interval sharding, when the new sharding system is available. Track more state to avoid this problem in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2987 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 23:19:42 +00:00
asivache d804bdf210 New option: --maxReadsInRam. When using the ON_DISK sorting option, the tool may still run out of memory in regions of pathologically deep coverage because of the generous memory usage limit set in the underlying samtools' sorting sam writers. With this option, the user can lower the number of reads the writer keeps in memory before spilling them to disk.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2985 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:15:03 +00:00
aaron 661a043cef adding methods to get RODs by name or type in read traversals, performance improvements to RODs for Reads in general, and some more Tribble infrastructure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2984 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:13:39 +00:00
depristo 18ba9929f9 notes for eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
hanna cbd529d544 Better chopping up of data for ref walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2982 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:13:26 +00:00
hanna a7ba88e649 Rework the way the MicroScheduler handles locus shards to handle intervals that span shards with less memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
ebanks 4a05757a2a Fixed strand bias calculation because of -Infinity issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2980 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 16:05:51 +00:00
aaron dde9fd8a15 some rods-for-reads cleaning and performance improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2979 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:54:58 +00:00
depristo 4f4555c80f PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
ebanks 40d305bc7e Added test of Nway cleaning for Matt; thanks to Aaron for the help.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2977 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 21:00:41 +00:00
depristo 486bef9318 Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
ebanks c85ed1ce90 Plumbing is now in place to emit indel calls from the UnifiedGenotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2975 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 04:30:12 +00:00
ebanks 5c35be39ef Now that extended events work for reference traversals, turn it off in the genotyper for non-indel models (thereby fixing busted integration tests).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2974 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:14:06 +00:00
ebanks 7ddd45d059 Hmm. I thought I removed this already.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2973 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:09:13 +00:00
ebanks 1a576525e9 misc improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2972 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:00:28 +00:00
ebanks 6e855809e1 Renaming and moving relevant tools into a sequenom directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2971 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 02:31:10 +00:00