Commit Graph

2596 Commits (8ea98faf472305410ea3dbd4872ac69dbb04af8e)

Author SHA1 Message Date
depristo 8ea98faf47 Deleting the pooled calculation model -- no longer supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
hanna 85037ab13f Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of
Picard patch, including move of some of the Picard private classes we use to Picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 03:21:27 +00:00
depristo a45ac220aa Removing unnecessary printing routines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3086 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 22:34:54 +00:00
depristo b8ab74a6dc Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 21:45:22 +00:00
kshakir e9e53f68ab Filter lists can now end with .list or .txt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3084 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 17:41:24 +00:00
aaron 074ec77dcc First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
specified with the reportType command line option in VE2.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
kiran 85f4f66180 Updated to use VariantContext. Output has been reformatted: variant and genotype concordance are emitted for every coverage level per variant. If the requested sampling level is higher than what's available, the maximum available coverage at that locus is used. This makes it much easier to make plots indicating the percentage of comparison callset recovered at a certain sampling depth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3082 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 21:02:43 +00:00
kiran 391e5843e4 If the annotation engine has not been supplied, don't try to annotate anything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3081 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:52:21 +00:00
kiran 8048b709a0 Selects a single sample on which to operate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3080 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:50:58 +00:00
kshakir 20e3ba15ca Added an optional argument -rgbl --read_group_black_list to filter read groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks 73a14a985b Moving VariantsToVCF to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks 14bf6923a8 HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
hanna 78af6d5a40 New sharding system is going live again for on-the-fly merging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3076 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 18:39:04 +00:00
hanna 46c14ec63f New, much less memory intensive implementation of BAM file sharding. Streams indices together with the expectation
that bins will be present in the bin sparse array, which avoids the problem of having to hold the sparse bin array
stored in every BAM file index in memory at the same time.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3075 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 17:41:22 +00:00
ebanks 4398a8b370 Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 04:53:31 +00:00
ebanks 2373a4618f bug caused by a misprint: context != contexts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
ebanks 3176715c74 1. Alignability mask returns null when not available.
2. --list now prints out the available classes/groups too.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3072 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 20:49:07 +00:00
rpoplin 06a212e612 Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
ebanks 47e30aba92 Rods for reads hooked up into the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 18:17:56 +00:00
aaron 5079f35e40 better method names for read based reference ordered data access.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3069 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 16:13:31 +00:00
ebanks 49117819f5 For the cleaner to clean, it must beat the entropy produced by the aligner (and not just the raw reads).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3068 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 15:21:58 +00:00
aaron 60dfba997b added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
hanna 1f451e17e5 Changing preloaded index to only "preload" reference sequences on demand.
Results in drastic lowering of startup cost when multiple BAM files are 
merged.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3066 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 22:02:28 +00:00
hanna 884a577013 Phase 2 of Picard patch refactoring: kill off SAMFileReader2/BAMFileReader2, merging the changes back into the base classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3065 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 16:48:11 +00:00
aaron 7462a0b2d1 Cleaned up VariantContextAdapter tests, fixed the double comparisons in equals() in RodGeliText (nice MathUtils.compareDoubles, Kiran)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3064 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 15:18:30 +00:00
aaron a69b8555dd Geli to variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 06:45:29 +00:00
aaron eafdd047f7 GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 03:43:25 +00:00
hanna 3767adb0bb Processing intervals as they stream in means much lower memory usage and
quicker runtime.  Making change as minimal as possible to avoid conflicts
with BT's incoming patch.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3061 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 22:04:45 +00:00
ebanks 0097106938 VariantFiltration can now filter specific samples.
This is *NOT* an ideal implementation.  One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL.  For now, though, this will have to do.
Docs coming tonight.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 20:45:11 +00:00
asivache 543aefc3d7 Fixing the bug introduced with the earlier commit. When trimming locus to the current bases, we need to take into account expanded boundaries (for windowed reference traversals)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3059 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 19:20:34 +00:00
asivache ee1dc6092f Test updated. We no longer throw an exception when the locus interval is out of bounds; we silently return a reference context trimmed to the current shard boundaries. New test checks for trimming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
asivache d2944461ef We also have to allow the window to be (partially) outside the bounds and trimming to the contig size is not enough (thanks to shards). Now we trim to the current bounds too (i.e. if the interval is not completely within current bounds, we create reference context that contains only bases from the overlap between the interval and the bounds).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3057 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:36:29 +00:00
asivache 9053406798 LocusReferenceView: If the locus a view is requested for spans beyond the reference contig ends, create the actual window bounded by contig ends (so that the locus will not be fully contained in the window!!).
ReferenceContext: constructor does not throw an exception anymore when locus is not fully contained inside the window. So now we can have a reference context associated with a locus such that the window/actual bases do not cover the whole locus. Scary. I am not sure I like this...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3056 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 15:59:15 +00:00
aaron 439c34ed38 clean-up before annotating VariantEval2 for output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
depristo 076d21d394 Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
hanna 6cd97b78ab An additional safety check to ensure that we only walk over coordinate-sorted
data when doing locus traversals.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3053 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:31:45 +00:00
hanna b4b4e8d672 For Sarah Calvo: initial implementation of read pair traversal, for BAM files
sorted by read name.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna c0eb5c27ea Lower memory support for merged sharding. Merged sharding is still not available.
WARNING: If you update frequently, you might have to rm -rf ~/.ant/cache -- this is an unfortunate side effect of the way we
	 distribute picard-private.jar.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3050 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:03:47 +00:00
ebanks 4d4db7fe63 Renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3049 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:45:01 +00:00
ebanks 4c4d048f14 Moving VariantFiltration over to use VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks c88a2a3027 Fixing/cleaning up the vcf merge util
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
rpoplin cdec84aa8f Bug fix for variant optimizer. Remember to close the PrintStreams it uses to output the cluster files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3046 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:07:32 +00:00
depristo d8ff552311 Support for EXPERIMENTAL sampling-based genotype likelihoods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3044 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:40 +00:00
depristo 7b17bcd0af Refactoring a few useful routines for detecting mendelian violations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
depristo 56092a0fc2 Slight cleanup for mathutils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:18:08 +00:00
depristo b221ce94ce Trio-aware genotyper that calculates P(de novo); still being tested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3041 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:11:39 +00:00
ebanks 03480c955c And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks e757f6f078 Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:56:27 +00:00
ebanks 0311980668 The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
hanna 9b61d95d9c Khalid found an out-of-memory condition with the new sharding system when
merging lots of BAMs, and the fix is taking longer than I thought.  Disable
experimental sharding when merging until the fix is ready.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3036 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 02:43:46 +00:00