2568 Commits (00971069385d2c3f880146cd961fd8c76d6cb3ef)

Author SHA1 Message Date
ebanks 0097106938 VariantFiltration can now filter specific samples.
This is *NOT* an ideal implementation.  One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL.  For now, though, this will have to do.
Docs coming tonight.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 20:45:11 +00:00
asivache 543aefc3d7 Fixing the bug introduced with the earlier commit. When trimming locus to the current bases, we need to take into account expanded boundaries (for windowed reference traversals)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3059 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 19:20:34 +00:00
asivache ee1dc6092f Test updated. We no longer throw an exception when the locus interval is out of bounds; we just silently return a reference context trimmed to the current shard boundaries. The new test checks for trimming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
asivache d2944461ef We also have to allow the window to be (partially) outside the bounds and trimming to the contig size is not enough (thanks to shards). Now we trim to the current bounds too (i.e. if the interval is not completely within current bounds, we create reference context that contains only bases from the overlap between the interval and the bounds).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3057 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:36:29 +00:00
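The trimming described in the commit above amounts to intersecting the requested interval with the current bounds. A minimal sketch, assuming 1-based inclusive coordinates; the `Interval` type and `trim_to_bounds` name are hypothetical and do not come from the codebase:

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical genomic interval: 1-based, inclusive on both ends."""
    contig: str
    start: int
    stop: int


def trim_to_bounds(locus: Interval, bounds: Interval) -> Optional[Interval]:
    """Return the overlap between locus and bounds, or None if there is none."""
    if locus.contig != bounds.contig:
        return None
    start = max(locus.start, bounds.start)
    stop = min(locus.stop, bounds.stop)
    if start > stop:
        # The locus lies entirely outside the bounds.
        return None
    return Interval(locus.contig, start, stop)
```

A locus that sticks out on either side of the bounds comes back shortened rather than raising an error, which matches the behavior the commit message describes.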
asivache 9053406798 LocusReferenceView: If the locus a view is requested for spans beyond the reference contig ends, create the actual window bounded by contig ends (so that the locus will not be fully contained in the window!!).
ReferenceContext: the constructor no longer throws an exception when the locus is not fully contained inside the window. So now we can have a reference context associated with a locus such that the window/actual bases do not cover the whole locus. Scary. I am not sure I like this...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3056 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 15:59:15 +00:00
aaron 439c34ed38 clean-up before annotating VariantEval2 for output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
depristo 076d21d394 Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
hanna 6cd97b78ab An additional safety check to ensure that we only walk over coordinate-sorted
data when doing locus traversals.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3053 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:31:45 +00:00
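The commit above does not say how the sort order is verified. One possible form of such a check, sketched here as an assumption, is to confirm that successive records never move backwards in (contig index, start position) order. The function name and the tuple representation are hypothetical:

```python
from typing import Iterable, Optional, Tuple


def is_coordinate_sorted(records: Iterable[Tuple[int, int]]) -> bool:
    """Check that (contig_index, start) pairs are in non-decreasing order."""
    previous: Optional[Tuple[int, int]] = None
    for record in records:
        # Tuples compare by contig index first, then by start position.
        if previous is not None and record < previous:
            return False
        previous = record
    return True
```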
hanna b4b4e8d672 For Sarah Calvo: initial implementation of read pair traversal, for BAM files
sorted by read name.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna c0eb5c27ea Lower memory support for merged sharding. Merged sharding is still not available.
WARNING: If you update frequently, you might have to rm -rf ~/.ant/cache -- this is an unfortunate side effect of the way we
	 distribute picard-private.jar.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3050 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:03:47 +00:00
ebanks 4d4db7fe63 Renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3049 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:45:01 +00:00
ebanks 4c4d048f14 Moving VariantFiltration over to use VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks c88a2a3027 Fixing/cleaning up the vcf merge util
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
rpoplin cdec84aa8f Bug fix for variant optimizer. Remember to close the PrintStreams it uses to output the cluster files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3046 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:07:32 +00:00
depristo d8ff552311 Support for EXPERIMENTAL sampling-based genotype likelihoods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3044 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:40 +00:00
depristo 7b17bcd0af Refactoring a few useful routines for detecting mendelian violations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
depristo 56092a0fc2 Slight cleanup for mathutils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:18:08 +00:00
depristo b221ce94ce Trio-aware genotyper that calculates P(de novo); still being tested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3041 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:11:39 +00:00
ebanks 03480c955c And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks e757f6f078 Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:56:27 +00:00
ebanks 0311980668 The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
hanna 9b61d95d9c Khalid found an out-of-memory condition with the new sharding system when
merging lots of BAMs, and the fix is taking longer than I thought.  Disable
experimental sharding when merging until the fix is ready.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3036 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 02:43:46 +00:00
ebanks b8e8852b4f Better interface for the Annotator in how it interacts with VariantContext.
Also, added a proof of concept genotype-level annotation (not working yet, almost there).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
hanna 96662d8d1b Moving from GATK dependencies on isolated classes checked into the GATK
codebase to a dependency on a jar file compiled from my private picard branch.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3034 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 17:43:42 +00:00
aaron 8a5f0b746e some cleanup for the output system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:54:39 +00:00
rpoplin c78fc23ec5 Minor updates to output of variant optimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
ebanks 0247548400 Fixed one test and (temporarily) punted on another
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3030 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 06:22:48 +00:00
ebanks ee0e833616 Some significant changes to the annotator:
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now not only specify specific annotations to use, but also the interface names from #1.  Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
rpoplin 58a31bab6a Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
hanna d9398dc347 Remove some of the restrictions on getStart() and getStop(); getStart() and getStop()
now do the minimum validation rather than the more rigorous only-within-the-contig-bounds 
header validation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3027 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 19:39:30 +00:00
aaron 182f1061ff Bamboo isn't picking up commits for some reason; updating a copyright to see if it'll get this commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3025 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:56:48 +00:00
ebanks 5e29d0c219 Be smarter about dealing with infinite quals for ref calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3024 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:35:23 +00:00
rpoplin 1bb4394aa9 Adding a skeleton for the second step of the variant optimization process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3023 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:03:40 +00:00
ebanks ded4ba8966 Let's make artificial reads that actually adhere to the specs...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3022 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:51:42 +00:00
bthomas 5b34bb9ab0 Adding three minor new features:
+ -L all now walks over all intervals

+ if a -L argument is passed with a .list extension and the file does not exist, a File Not Found error is returned instead of a "bad interval" error. We plan to revisit interval lists soon and generate a concrete list of filenames, so this is likely temporary.

+ An error is thrown if the start position of an interval is greater than the end position.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3021 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:24:10 +00:00
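The three behaviors listed in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the actual argument parser: the `contig:start-stop` string format, the function names, and the use of `None` to mean "all intervals" are all hypothetical.

```python
import os
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (contig, start, stop), hypothetical representation


def parse_interval(text: str) -> Interval:
    """Parse 'contig:start-stop' (assumed format) and reject start > stop."""
    contig, _, span = text.partition(":")
    start_text, _, stop_text = span.partition("-")
    start, stop = int(start_text), int(stop_text)
    if start > stop:
        raise ValueError(f"interval start {start} is greater than stop {stop}: {text}")
    return (contig, start, stop)


def resolve_intervals(arg: str) -> Optional[List[Interval]]:
    """Resolve a -L style argument; None stands for 'walk over everything'."""
    if arg == "all":
        return None
    if arg.endswith(".list"):
        if not os.path.exists(arg):
            # Report the missing file directly rather than a generic "bad interval".
            raise FileNotFoundError(f"interval list file not found: {arg}")
        with open(arg) as handle:
            return [parse_interval(line.strip()) for line in handle if line.strip()]
    return [parse_interval(arg)]
```

Checking for the `.list` extension before attempting to parse the argument is what lets a missing file surface as a file error instead of a parse error.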
ebanks 4340601c26 -Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically
-Better integration test


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:00:10 +00:00
ebanks 76d14d17dc oops, need to update class names too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3019 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:01:31 +00:00
ebanks 85a030069d renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3018 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:00:28 +00:00
ebanks af5fd99444 Added filter for bad cigars (based on consecutive indels) - and cleaned up bad mates filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3017 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 13:53:42 +00:00
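The commit above only says the bad-cigar filter is "based on consecutive indels". A minimal sketch of one reading of that rule, assuming it means an insertion or deletion operator immediately adjacent to another one in the CIGAR string; the function name is hypothetical:

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def has_consecutive_indels(cigar: str) -> bool:
    """Return True if two adjacent CIGAR operators are both I or D."""
    operators = [op for _, op in CIGAR_ELEMENT.findall(cigar)]
    return any(a in "ID" and b in "ID" for a, b in zip(operators, operators[1:]))
```

A read filter built on this would drop reads for which the function returns True, for example a read with CIGAR `10M2I3D10M`.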
hanna 2cc040aa1c New sharding system is live. Disable with -ds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3016 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 03:32:45 +00:00
ebanks 1fd909cdaf Fix for Kiran: -1 is a valid value for genotype qualities in VCF, so VariantContext shouldn't die. Cleaned up the relevant VCF code while I was in there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3015 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 00:20:15 +00:00
hanna 849bd1f451 Set the eagerDecode flag in such a way that the binary data block in the BAM will always be considered dirty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3014 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 22:01:23 +00:00
rpoplin 933823c8bc Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
hanna 2525ecaa43 Oops. Commented out some tests to improve performance and then checked in the commented out tests. Reverted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3012 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:34:50 +00:00
hanna 59045ccb28 Filter,merge performs much better than merge,filter. Many thanks to Eric for checking in an integration test that so compellingly demonstrates this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3011 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:23:37 +00:00
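The commit above reports that filtering before merging outperforms merging before filtering but does not explain why. A plausible reason, stated here as an assumption, is that filtering each input first means fewer records pass through the merge. The sketch below shows that both orders yield the same output for sorted inputs and a per-record filter; the function names are hypothetical:

```python
import heapq
from typing import Callable, Iterable, Iterator, Sequence


def merge_then_filter(streams: Sequence[Iterable[int]],
                      keep: Callable[[int], bool]) -> Iterator[int]:
    """Merge all sorted streams, then discard unwanted records."""
    return (record for record in heapq.merge(*streams) if keep(record))


def filter_then_merge(streams: Sequence[Iterable[int]],
                      keep: Callable[[int], bool]) -> Iterator[int]:
    """Discard unwanted records per stream, then merge what remains."""
    return heapq.merge(*(filter(keep, stream) for stream in streams))
```

Filtering a sorted stream leaves it sorted, so the merge step stays valid in the second version while handling only the records that survive the filter.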
hanna 6dd5f192e7 Performance improvements for RODs in conjunction with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3010 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 14:54:12 +00:00
kiran f20f78d77f Don't crash if the tracker is null. Reset the alternate alleles based on the alts present in the subset of samples.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3009 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 04:00:04 +00:00
aaron 10e76abbbc adding some VE2 report infrastructure; work-in-progress.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 03:57:42 +00:00
ebanks 586f87fa35 Quick fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3007 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:59:26 +00:00
ebanks 202231141c -Push the --use_original_qualities argument into the engine.
-Check that base and qual strings are the same lengths
-Fix one more bug in the clipper.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:06:11 +00:00