chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
ebanks
1e8b3ca6ba
Fare thee well, oh LocusWindowTraversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3089 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:17:26 +00:00
depristo
8ea98faf47
Deleting the pooled calcluation model -- no longer supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
hanna
85037ab13f
Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of
...
Picard patch, including move of some of the Picard private classes we use to Picard public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 03:21:27 +00:00
depristo
a45ac220aa
Removing unnecessary printing routines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3086 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 22:34:54 +00:00
depristo
b8ab74a6dc
Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines to the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 21:45:22 +00:00
kshakir
e9e53f68ab
Filter lists can now end with .list or .txt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3084 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 17:41:24 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
...
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
kiran
85f4f66180
Updated to use VariantContext. Output has been reformatted: variant and genotype concordance are emitted for every coverage level per variant. If the requested sampling level is higher than what's available, the maximum available coverage at that locus is used. This makes it much easier to make plots indicating the percentage of comparison callset recovered at a certain sampling depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3082 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 21:02:43 +00:00
kiran
391e5843e4
If the annotation engine has not been supplied, don't try to annotate anything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3081 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:52:21 +00:00
kiran
8048b709a0
Selects a single sample on which to operate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3080 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:50:58 +00:00
kshakir
20e3ba15ca
Added an optional argument -rgbl --read_group_black_list to filter read groups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks
73a14a985b
Moving VariantsToVCF to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks
14bf6923a8
HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
hanna
78af6d5a40
New sharding system is going live again for on-the-fly merging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3076 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 18:39:04 +00:00
hanna
46c14ec63f
New, much less memory intensive implementation of BAM file sharding. Streams indices together with the expectation
...
that bins will be present in the bin sparse array, which avoids the problem of having to hold the sparse bin array
stored in every BAM file index in memory at the same time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3075 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 17:41:22 +00:00
ebanks
4398a8b370
Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 04:53:31 +00:00
ebanks
2373a4618f
bug caused by a misprint: context != contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
ebanks
3176715c74
1. Alignability mask returns null when not available.
...
2. --list now prints out the available classes/groups too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3072 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 20:49:07 +00:00
rpoplin
06a212e612
Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
ebanks
47e30aba92
Rods for reads hooked up into the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 18:17:56 +00:00
aaron
5079f35e40
better method names for read based reference ordered data access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3069 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 16:13:31 +00:00
ebanks
49117819f5
For the cleaner to clean, it must beat the entropy produced by the aligner (and not just the raw reads).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3068 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 15:21:58 +00:00
aaron
60dfba997b
added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
hanna
1f451e17e5
Changing preloaded index to only "preload" reference sequences on demand.
...
Results in drastic lowering of startup cost when multiple BAM files are
merged.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3066 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 22:02:28 +00:00
hanna
884a577013
Phase 2 of Picard patch refactoring: kill off SAMFileReader2/BAMFileReader2, merging the changes back into the base classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3065 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 16:48:11 +00:00
aaron
7462a0b2d1
cleaned-up of VariantContextAdapter tests, fixed the double comparisons in equals() in RodGeliText (nice MathUtils.compareDoubles Kiran)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3064 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 15:18:30 +00:00
aaron
a69b8555dd
Geli to variant context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 06:45:29 +00:00
aaron
eafdd047f7
GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 03:43:25 +00:00
hanna
3767adb0bb
Processing intervals as they stream in means much lower memory usage and
...
quicker runtime. Making change as minimal as possible to avoid conflicts
with BT's incoming patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3061 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 22:04:45 +00:00
ebanks
0097106938
VariantFiltration can now filter specific samples.
...
This is *NOT* an ideal implementation. One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL. For now, though, this will have to do.
Docs coming tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 20:45:11 +00:00
asivache
543aefc3d7
Fixing the bug introduced with the earlier commit. When trimming locus to the current bases, we need to take into account expanded boundaries (for windowed reference traversals)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3059 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 19:20:34 +00:00
asivache
ee1dc6092f
Test updated. Now we do not throw an exception when locus interval is out of bounds, we just return silently a reference context trimmed to the current shard boundaries. New test checks for trimming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
asivache
d2944461ef
We also have to allow the window to be (partially) outside the bounds and trimming to the contig size is not enough (thanks to shards). Now we trim to the current bounds too (i.e. if the interval is not completely within current bounds, we create reference context that contains only bases from the overlap between the interval and the bounds).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3057 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:36:29 +00:00
asivache
9053406798
LocusReferenceView: If the locus a view is requested for spans beyond the reference contig ends, create the actual window bounded by contig ends (so that the locus will not be fully contained in the window!!).
...
ReferenceContext: constructor does not throw an excepion anymore when locus is not fully contained inside the window. So now we can have a reference context associated with a locus such that the window/actual bases do not cover the whole locus. Scary. I am not sure I like this...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3056 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 15:59:15 +00:00
aaron
439c34ed38
clean-up before annotating VariantEval2 for output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
depristo
076d21d394
Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
hanna
6cd97b78ab
An additional safety check to ensure that we only walk over coordinate-sorted
...
data when doing locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3053 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:31:45 +00:00
hanna
b4b4e8d672
For Sarah Calvo: initial implementation of read pair traversal, for BAM files
...
sorted by read name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna
c0eb5c27ea
Lower memory support for merged sharding. Merged sharding is still not available.
...
WARNING: If you update frequently, you might have to rm -rf ~/.ant/cache -- this is an unfortunate side effect of the way we
distribute picard-private.jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3050 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:03:47 +00:00
ebanks
4d4db7fe63
Renaming for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3049 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:45:01 +00:00
ebanks
4c4d048f14
Moving VariantFiltration over to use VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks
c88a2a3027
Fixing/cleaning up the vcf merge util
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
rpoplin
cdec84aa8f
Bug fix for variant optimizer. Remember to close the PrintStreams it uses to output the cluster files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3046 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:07:32 +00:00
depristo
d8ff552311
Support for EXPERIMENT sampling-based genotype likelihoods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3044 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:40 +00:00
depristo
7b17bcd0af
Refactoring a few useful routines for detecting mendelian violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
depristo
56092a0fc2
Slight cleanup for mathutils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:18:08 +00:00
depristo
b221ce94ce
Still being tested trio-aware genotyper that calculates P(de novo)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3041 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:11:39 +00:00
ebanks
03480c955c
And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks
e757f6f078
Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:56:27 +00:00
ebanks
0311980668
The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
hanna
9b61d95d9c
Khalid found an out-of-memory condition with the new sharding system when
...
merging lots of BAMs, and the fix is taking longer than I thought. Disable
experimental sharding when merging until the fix is ready.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3036 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 02:43:46 +00:00
ebanks
b8e8852b4f
Better interface for the Annotator in how it interacts with VariantContext.
...
Also, added a proof of concept genotype-level annotation (not working yet, almost there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
hanna
96662d8d1b
Moving from GATK dependencies on isolated classes checked into the GATK
...
codebase to a dependency on a jar file compiled from my private picard branch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3034 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 17:43:42 +00:00
aaron
8a5f0b746e
some cleanup for the output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:54:39 +00:00
rpoplin
c78fc23ec5
Minor updates to output of variant optimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
ebanks
0247548400
Fixed one test and (temporarily) punted on another
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3030 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 06:22:48 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now not only specify specific annotations to use, but also the interface names from #1 . Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
rpoplin
58a31bab6a
Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
hanna
d9398dc347
Remove some of the restrictions on getStart() and getStop(); getStart() and getStop()
...
now do the minimum validation rather than the more rigorous only-within-the-contig-bounds
header validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3027 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 19:39:30 +00:00
aaron
182f1061ff
Bamboo isn't picking up commits for some reason; updating a copyright to see if it'll get this commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3025 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:56:48 +00:00
ebanks
5e29d0c219
Be smarter about dealing with infinite quals for ref calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3024 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:35:23 +00:00
rpoplin
1bb4394aa9
Adding a skeleton for the second step of the variant optimization process.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3023 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:03:40 +00:00
ebanks
ded4ba8966
Let's make artificial reads that actually adhere to the specs...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3022 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:51:42 +00:00
bthomas
5b34bb9ab0
Adding three minor new features:
...
+ -L all now walks over all intervals
+ if a -L argument is passed with a .list extension, and file does not exist, returns a \
File Not Found error instead of "bad interval" error. We plan to soon revisit interval \
lists and generate a concrete list of filenames, so this is likely temporary.
+ Error is thrown if the start position on an interval is higher number than the end position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3021 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:24:10 +00:00
ebanks
4340601c26
-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically
...
-Better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:00:10 +00:00
ebanks
76d14d17dc
oops, need to update class names too
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3019 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:01:31 +00:00
ebanks
85a030069d
renaming for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3018 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 14:00:28 +00:00
ebanks
af5fd99444
Added filter for bad cigars (based on consecutive indels) - and cleaned up bad mates filter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3017 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 13:53:42 +00:00
hanna
2cc040aa1c
New sharding system is live. Disable with -ds.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3016 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 03:32:45 +00:00
ebanks
1fd909cdaf
Fix for Kiran: -1 is a valid value for genotype qualities in VCF, so VariantContext shouldn't die. Cleaned up the relevant VCF code while I was in there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3015 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 00:20:15 +00:00
hanna
849bd1f451
Set the eagerDecode flag in such a way that the binary data block in the BAM will always be considered dirty.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3014 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 22:01:23 +00:00
rpoplin
933823c8bc
Removed the StingException when mkdir fails for Sendu in AnalyzeCovariates. Incremental updates to VariantOptimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3013 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 19:45:02 +00:00
hanna
2525ecaa43
Oops. Commented out some tests to improve performance and then checked in the commented out tests. Reverted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3012 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:34:50 +00:00
hanna
59045ccb28
Filter,merge performs much better than merge,filter. Many thanks to Eric for checking in an integration test that so compellingly demonstrates this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3011 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:23:37 +00:00
hanna
6dd5f192e7
Performance improvements for RODs in conjunction with new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3010 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 14:54:12 +00:00
kiran
f20f78d77f
Don't crash if the tracker is null. Reset the alternate alleles based on the alts present in the subset of samples.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3009 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 04:00:04 +00:00
aaron
10e76abbbc
adding some VE2 report infrastructure; work-in-progress.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 03:57:42 +00:00
ebanks
586f87fa35
Quick fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3007 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:59:26 +00:00
ebanks
202231141c
-Push the --use_original_qualities argument into the engine.
...
-Check that base and qual strings are the same lengths
-Fix one more bug in the clipper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:06:11 +00:00
ebanks
035d4170aa
fix bug in read clipper: output bam can be null, so check for it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3005 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:49:26 +00:00
ebanks
411d25c8d1
-Integration tests for walkers that use original quals.
...
-framework for pushing -OQ into GATK (not done)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:46:31 +00:00
aaron
e365d308d4
add a new JEXLContext that lazy-evaluates JEXL expressions given the VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3003 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 16:00:55 +00:00
kcibul
9f519af06d
new method to filter out overlapping PE reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3002 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 15:40:09 +00:00
hanna
45f70de6df
Fixed bug that failed to reset an accumulator when crossing contig boundaries,
...
meaning that in special cases of shallow coverage, an interval might get dropped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2999 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 04:45:55 +00:00
ebanks
73d6167bd6
Fixing broken integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo
4dd7c5972c
Unit tests for -XL arguments; expt. annotation calculating the GC content within 100 bp of the current SNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 21:08:14 +00:00
ebanks
e367a50e9b
Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
...
Also misc cleanup and fixes (e.g. perform evalulation even when no comp tracks are provided).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
aaron
ecb59f5d0d
removed old tests and old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:57:01 +00:00
depristo
e7eae9b61d
High performance, correct implementation of -XL exclusion lists. Enjoy.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2994 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:39:20 +00:00
aaron
88a48821ea
removed the dependence on removeRegion() in GenomeLocSortedSet
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:35:49 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
aaron
1eb5f97255
fixed dropping single base intervals from deleteRegion, moving onto performance fixes.
...
(stop - start is length-1 on closed intervals, so we need to check greater than OR equals to zero)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:14:21 +00:00
hanna
7aa7a5f9b8
Bug fixes for edge cases and filtration in the earlier performance fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2989 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 04:46:08 +00:00
hanna
5e8654fcdc
Oops! Introduced a performance bug in read interval sharding, when the new sharding system is available. Track more state to avoid this problem in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2987 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 23:19:42 +00:00
asivache
d804bdf210
New option: --maxReadsInRam . When using ON_DISK sorting option, the tool may still run out of memory in the regions of pathologically deep coverage because of the generous memory usage limit set in the underlying samtools' sorting sam writers. With this option, the user can lower the number of reads the writer keeps in memory before spilling them on disk.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2985 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:15:03 +00:00
aaron
661a043cef
adding methods to get RODs by name or type in read traversals, performance improvements to RODs for Reads in general, and some more Tribble infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2984 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:13:39 +00:00
depristo
18ba9929f9
notes for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
hanna
cbd529d544
Better chopping up of data for ref walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2982 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:13:26 +00:00
hanna
a7ba88e649
Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
...
with less memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00