aaron
4014a8a674
A long overdue correction; all unit tests now end in 'UnitTest'. This was something we wanted to do for a while, and now with the performance tests coming, it was a good time to clean-up. Please label any new test appropriately: *UnitTest and *IntegrationTest are the two valid file name patterns for tests.
...
Thanks!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3135 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 06:14:15 +00:00
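As a rough illustration of the naming rule in the commit above, the sketch below checks whether a test file name matches one of the two accepted patterns. The class and method names are hypothetical and are not taken from the GATK codebase; the `.java` suffix is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the GATK codebase: checks a test file
// name against the two patterns named in the commit message.
final class TestNameCheck {
    // Assumes test sources are Java files, e.g. FooUnitTest.java.
    private static final Pattern VALID =
            Pattern.compile(".+(UnitTest|IntegrationTest)\\.java");

    static boolean isValidTestFileName(String fileName) {
        return VALID.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Example file names; only the suffix matters for the check.
        for (String name : new String[] {
                "ExampleUnitTest.java", "ExampleIntegrationTest.java", "ExampleTest.java"}) {
            System.out.println(name + " -> " + isValidTestFileName(name));
        }
    }
}
```

A check like this could run as part of a build step to flag test files that do not follow the convention.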
aaron
e148a3ac61
added the ability to create interval lists directly from a ROD, using the command line arg '-BTI' (long name '--rodToIntervalTrackName'). The parameter to this arg is the name of the ROD track, which must be a track name specified in the -B option.
...
Using this feature, sites covered by the target ROD will be iterated over. The list of intervals generated is merged with any intervals from the -L and -XL args, and the Walker is run over the resulting merged list.
WARNING: for very large RODs this can be costly. Consider this experimental for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3134 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 05:14:41 +00:00
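The merge step described in the commit above might look roughly like the sketch below. It assumes "merged" means a union in which overlapping intervals are collapsed, it works on a single contig, and it does not show how -XL exclusions are applied. All names are hypothetical and not taken from the GATK code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: union of ROD-derived intervals and -L intervals,
// collapsing overlaps. Not the actual GATK implementation.
final class IntervalMergeSketch {
    // Closed interval [start, stop] on a single contig (contig name omitted).
    static final class Interval {
        final int start;
        final int stop;
        Interval(int start, int stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return start + "-" + stop; }
    }

    static List<Interval> mergeOverlapping(List<Interval> rodIntervals,
                                           List<Interval> cmdLineIntervals) {
        List<Interval> all = new ArrayList<>(rodIntervals);
        all.addAll(cmdLineIntervals);
        all.sort(Comparator.comparingInt((Interval i) -> i.start));

        List<Interval> merged = new ArrayList<>();
        for (Interval cur : all) {
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && cur.start <= merged.get(lastIdx).stop) {
                // Overlaps the previous interval: extend it instead of adding a new one.
                Interval last = merged.remove(lastIdx);
                merged.add(new Interval(last.start, Math.max(last.stop, cur.stop)));
            } else {
                merged.add(cur);
            }
        }
        return merged;
    }
}
```

Sorting first keeps the merge to a single pass, though building the full list in memory is part of why very large RODs could be costly.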
aaron
20cc2a85a4
removed the hashmap from Genotype Concordance, moved it into a table
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3133 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 21:24:48 +00:00
aaron
e55f27b3b1
forgot a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3132 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:51:13 +00:00
aaron
9ca8e345fc
bye-bye old junk.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3131 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:41:48 +00:00
aaron
8fd59c8823
Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:39:55 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
ebanks
e7dad728df
Trivial output changes for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3128 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 14:47:43 +00:00
depristo
058e7d3d12
Bug fix for Gregory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3127 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 00:21:35 +00:00
rpoplin
7b44e6bd55
ApplyVariantClusters now outputs interesting threshold points based on hitting the target novel TiTv
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3126 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 19:47:29 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
asivache
3530ef5a41
Explicit type cast fixed in order to work with new ROD implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3124 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:02:56 +00:00
rpoplin
2d002c56c3
Added histogram of variant quality scores broken out by true positive and false positive calls to the GenotypeConcordance module of VariantEval2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3123 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 13:48:31 +00:00
aaron
12e4f88ca7
a little bit more clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3122 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-05 20:49:06 +00:00
aaron
df7e7921ce
removing some unused code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3121 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-05 19:30:08 +00:00
ebanks
56eb15f91f
Error checking for bad input (thanks, Aaron).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3120 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-04 03:17:01 +00:00
weisburd
705b28e90d
First attempt at implementing record filtering based on special 'hap_ref', 'hap_alt' columns in the input files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3118 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 21:52:26 +00:00
weisburd
d78e7f6c0a
Added documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3117 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 21:51:28 +00:00
aaron
8017fb123f
changed the depth of coverage walkers class name, and added a dependency in the packaging system so that RODs will all get imported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3116 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 20:55:19 +00:00
weisburd
6b7b07f178
First checkin of GenomicAnnotator which annotates an input VCF file by pulling data in a generic way from an arbitrary set of TabularRODs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3114 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 17:49:42 +00:00
rpoplin
642c969896
reverting optimizer changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3112 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 16:59:13 +00:00
chartl
d7880ef7ad
Forgot to uncomment the AlignerIntegrationTest before committing. And yes, matt, commenting it out is, in fact, easier than just setting my classpath.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3110 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 17:17:16 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
...
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
hanna
d00bde22db
Reverting one of Brett's changes that should not have been committed. Will
...
address with Brett separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3107 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:10:46 +00:00
bthomas
b4f6f54502
Reorganizing the way interval arguments are processed
...
Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java:
-- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments
-- initializeIntervals modified
-- some helper functions deprecated for cleanliness
Includes new set of unit tests, GenomeAnalysisEngineTest.java
New restrictions:
-- all interval arguments are now checked to be on the reference contig
-- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 12:47:48 +00:00
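The extension restriction listed in the commit above can be sketched as a simple suffix check. The class and method names are hypothetical; the real validation in GenomeAnalysisEngine.java or GenomeLocParser.java may differ, and the case-sensitive comparison is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the interval-file extension rule; not GATK code.
final class IntervalFileExtensions {
    // Extensions copied from the commit message.
    static final List<String> ALLOWED = Arrays.asList(
            ".picard", ".bed", ".list", ".intervals", ".interval_list");

    // Case-sensitive match is an assumption.
    static boolean hasAllowedExtension(String fileName) {
        for (String ext : ALLOWED) {
            if (fileName.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```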
aaron
c3c6e632d1
support for two new VCF header info field value-types, Flag (for fields that are just boolean truths), and Character (for single character info fields).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3105 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 03:11:32 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
a caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
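To make the caveat in the commit above concrete, the sketch below shows the general shape of a named wrapper and how calling code might unwrap it. `NamedFeature` and `FakeRodVcf` are hypothetical stand-ins for GATKFeature and RodVCF; only the method name getUnderlyingObject() comes from the commit message, and the real classes may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the wrapper layer described in the commit.
final class FeatureWrapperSketch {
    // Stand-in for a traditional ROD type such as RodVCF.
    static final class FakeRodVcf {
        final String id;
        FakeRodVcf(String id) { this.id = id; }
    }

    // Stand-in for GATKFeature: pairs a track name with the wrapped record.
    static final class NamedFeature {
        private final String trackName;
        private final Object underlying;
        NamedFeature(String trackName, Object underlying) {
            this.trackName = trackName;
            this.underlying = underlying;
        }
        String getTrackName() { return trackName; }
        Object getUnderlyingObject() { return underlying; }
    }

    // Keeps only the wrapped records of the requested type.
    static <T> List<T> unwrap(List<NamedFeature> features, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (NamedFeature f : features) {
            Object o = f.getUnderlyingObject();
            if (type.isInstance(o)) {
                out.add(type.cast(o));
            }
        }
        return out;
    }
}
```

Code that previously iterated over raw ROD objects would need an unwrapping step of this kind; code that asks for a track by name would be unaffected.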
hanna
4fcee248f9
For Kristian: functions which, given a read, can uniquely identify the BAM file storing that read.
...
Introducing this into the pile of code which peeks under the covers of the SAMDataSource in the hopes
that this function can help to replace the others and provide a single path for crosstalk.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3103 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 20:46:44 +00:00
rpoplin
d58fe70708
Correctly ignore filtered calls and indel calls in the truth sets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3101 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 14:33:01 +00:00
hanna
b60197ae10
Another round of cleanup and simplification in Picard -- Picard's unit tests
...
are now passing for my branch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3100 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 01:02:59 +00:00
depristo
40f8e7644c
Better, multi-haplotype aware haplotype scores. Looking very good now, seems to be vastly better at dealing with incorrect calls in deep and low pass data. Almost ready for use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3099 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 23:57:36 +00:00
depristo
f992f51a3b
Deleting incorrect sampling genotype likelihoods from the codebase
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3098 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 23:56:35 +00:00
kiran
b9d3fc3fbb
Now checks if the i-th element of the FiltrationContext[] is null before trying to access it. This seems to happen occasionally at the very end of a VCF file... the array will be 6 elements long, but the last element will actually be null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3097 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 22:40:17 +00:00
hanna
400684542c
Revisions to take into account finalization of Picard patch: naming changes, better definition
...
of public interfaces. This won't be the last Picard patch, but it should be the last big one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3096 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 19:28:14 +00:00
aaron
b00d2bf2bc
fixing an annotation that was breaking the error log output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3095 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 15:34:04 +00:00
aaron
a6e8687d71
implementing a clean way to import the template files into the GATK jar (they should not always get bundled). All further resources should be added to the gatk.resources path id in the build script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3094 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 04:20:19 +00:00
ebanks
babb9fb825
snp cluster filter should ignore ref calls when determining the clusters
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3093 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 17:57:33 +00:00
chartl
24461a2503
Let's *not* import classes that no longer exist. How my own ant test compiled is beyond me.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3091 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:59:01 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
ebanks
1e8b3ca6ba
Fare thee well, oh LocusWindowTraversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3089 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:17:26 +00:00
depristo
8ea98faf47
Deleting the pooled calculation model -- no longer supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
hanna
85037ab13f
Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of
...
Picard patch, including move of some of the Picard private classes we use to Picard public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 03:21:27 +00:00
depristo
a45ac220aa
Removing unnecessary printing routines
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3086 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 22:34:54 +00:00
depristo
b8ab74a6dc
Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 21:45:22 +00:00
kshakir
e9e53f68ab
Filter lists can now end with .list or .txt.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3084 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 17:41:24 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
...
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
kiran
85f4f66180
Updated to use VariantContext. Output has been reformatted: variant and genotype concordance are emitted for every coverage level per variant. If the requested sampling level is higher than what's available, the maximum available coverage at that locus is used. This makes it much easier to make plots indicating the percentage of comparison callset recovered at a certain sampling depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3082 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 21:02:43 +00:00
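The fallback rule in the commit above (use the maximum available coverage when the requested sampling level is too high) amounts to a clamp. A minimal sketch, with hypothetical names:

```java
// Hypothetical sketch of the coverage fallback described in the commit.
final class CoverageClampSketch {
    // Returns the coverage actually used for a requested sampling level.
    static int effectiveCoverage(int requestedLevel, int availableAtLocus) {
        return Math.min(requestedLevel, availableAtLocus);
    }

    public static void main(String[] args) {
        int available = 12; // example depth at one locus
        for (int requested : new int[] {5, 10, 20, 40}) {
            System.out.println(requested + " -> " + effectiveCoverage(requested, available));
        }
    }
}
```

Because every requested level yields a value, each variant gets a row for every coverage level, which is what makes the downstream plots straightforward.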
kiran
391e5843e4
If the annotation engine has not been supplied, don't try to annotate anything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3081 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:52:21 +00:00
kiran
8048b709a0
Selects a single sample on which to operate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3080 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:50:58 +00:00
kshakir
20e3ba15ca
Added an optional argument -rgbl --read_group_black_list to filter read groups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks
73a14a985b
Moving VariantsToVCF to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks
14bf6923a8
HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
hanna
78af6d5a40
New sharding system is going live again for on-the-fly merging.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3076 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 18:39:04 +00:00
hanna
46c14ec63f
New, much less memory intensive implementation of BAM file sharding. Streams indices together with the expectation
...
that bins will be present in the bin sparse array, which avoids the problem of having to hold the sparse bin array
stored in every BAM file index in memory at the same time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3075 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 17:41:22 +00:00
ebanks
4398a8b370
Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 04:53:31 +00:00
ebanks
2373a4618f
bug caused by a misprint: context != contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
ebanks
3176715c74
1. Alignability mask returns null when not available.
...
2. --list now prints out the available classes/groups too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3072 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 20:49:07 +00:00
rpoplin
06a212e612
Adding VariantConcordanceROCCurveWalker to create ROC curves comparing concordance between optimized call sets and validation truth sets in VCF format in order to evaluate performance of variant optimizer independently of achieving a particular novel ti/tv ratio. Added option to ignore only the specified filters in the input call sets via --ignore_filter <String>. Added option to provide a prior estimate of error for known snps via --known_prior <qual>. The het and hom calls are clustered independently. Infrastructure in place to use titv of known snps to inform p(true) of novel snps. Tweaked protection against overfitting based on suggestions from several people. Minor edits to AnalyzeAnnotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3071 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 19:43:10 +00:00
ebanks
47e30aba92
Rods for reads hooked up into the cleaner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 18:17:56 +00:00
aaron
5079f35e40
better method names for read based reference ordered data access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3069 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 16:13:31 +00:00
ebanks
49117819f5
For the cleaner to clean, it must beat the entropy produced by the aligner (and not just the raw reads).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3068 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 15:21:58 +00:00
aaron
60dfba997b
added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
hanna
1f451e17e5
Changing preloaded index to only "preload" reference sequences on demand.
...
Results in drastic lowering of startup cost when multiple BAM files are
merged.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3066 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 22:02:28 +00:00
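The on-demand loading idea in the commit above can be illustrated with a small cache that loads a reference sequence's index data only on first access. This is a generic sketch with hypothetical names; it is not how the Picard or GATK index classes are actually structured.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: load per-reference-sequence data lazily and cache it.
final class LazyIndexSketch<T> {
    private final Map<Integer, T> cache = new HashMap<>();
    private final IntFunction<T> loader;
    private int loads = 0;

    LazyIndexSketch(IntFunction<T> loader) {
        this.loader = loader;
    }

    // Loads the entry for this reference sequence on first use only.
    T get(int referenceIndex) {
        T value = cache.get(referenceIndex);
        if (value == null) {
            value = loader.apply(referenceIndex);
            loads++;
            cache.put(referenceIndex, value);
        }
        return value;
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }
}
```

With many BAM files, startup no longer pays for sequences that are never visited; the cost moves to the first access of each sequence.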
hanna
884a577013
Phase 2 of Picard patch refactoring: kill off SAMFileReader2/BAMFileReader2, merging the changes back into the base classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3065 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 16:48:11 +00:00
aaron
7462a0b2d1
cleaned up VariantContextAdapter tests, fixed the double comparisons in equals() in RodGeliText (nice MathUtils.compareDoubles, Kiran)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3064 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 15:18:30 +00:00
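For context on the equals() fix in the commit above: comparing doubles with == is fragile, so a tolerance-based comparison is the usual remedy. The sketch below is a generic version with a hypothetical name and an assumed tolerance; it is not the actual MathUtils.compareDoubles signature or implementation.

```java
// Hypothetical tolerance-based comparison; not the real MathUtils method.
final class DoubleCompareSketch {
    // Assumed default tolerance, chosen for illustration only.
    static final double DEFAULT_EPSILON = 1e-6;

    // Returns 0 if a and b differ by less than eps, otherwise -1 or 1.
    static int compareWithTolerance(double a, double b, double eps) {
        if (Math.abs(a - b) < eps) {
            return 0;
        }
        return a > b ? 1 : -1;
    }

    static boolean approxEquals(double a, double b) {
        return compareWithTolerance(a, b, DEFAULT_EPSILON) == 0;
    }
}
```

An equals() built on a tolerance is not transitive in general, so a matching hashCode() needs care; that trade-off is usually acceptable for test comparisons.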
aaron
a69b8555dd
Geli to variant context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 06:45:29 +00:00
aaron
eafdd047f7
GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 03:43:25 +00:00
hanna
3767adb0bb
Processing intervals as they stream in means much lower memory usage and
...
quicker runtime. Making change as minimal as possible to avoid conflicts
with BT's incoming patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3061 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 22:04:45 +00:00
ebanks
0097106938
VariantFiltration can now filter specific samples.
...
This is *NOT* an ideal implementation. One day when we have lots of free time (or a greater desire), we will implement this correctly and sophisticatedly using all the power of JEXL. For now, though, this will have to do.
Docs coming tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3060 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 20:45:11 +00:00
asivache
543aefc3d7
Fixing the bug introduced with the earlier commit. When trimming locus to the current bases, we need to take into account expanded boundaries (for windowed reference traversals)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3059 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 19:20:34 +00:00
asivache
ee1dc6092f
Test updated. Now we do not throw an exception when locus interval is out of bounds, we just return silently a reference context trimmed to the current shard boundaries. New test checks for trimming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
asivache
d2944461ef
We also have to allow the window to be (partially) outside the bounds and trimming to the contig size is not enough (thanks to shards). Now we trim to the current bounds too (i.e. if the interval is not completely within current bounds, we create reference context that contains only bases from the overlap between the interval and the bounds).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3057 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:36:29 +00:00
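The trimming described in the commit above is an interval intersection: keep only the part of the requested window that overlaps the current bounds. A minimal sketch with hypothetical names, using closed integer coordinates on a single contig:

```java
// Hypothetical sketch of trimming a window to the current bounds.
final class IntervalTrimSketch {
    // Returns {start, stop} of the overlap, or null if there is none.
    static int[] trimToBounds(int start, int stop, int boundStart, int boundStop) {
        int trimmedStart = Math.max(start, boundStart);
        int trimmedStop = Math.min(stop, boundStop);
        if (trimmedStart > trimmedStop) {
            return null; // window lies entirely outside the bounds
        }
        return new int[] {trimmedStart, trimmedStop};
    }
}
```

How the real code handles a window with no overlap at all is not stated in the commit; returning null here is an assumption.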
asivache
9053406798
LocusReferenceView: If the locus a view is requested for spans beyond the reference contig ends, create the actual window bounded by contig ends (so that the locus will not be fully contained in the window!!).
...
ReferenceContext: constructor does not throw an exception anymore when locus is not fully contained inside the window. So now we can have a reference context associated with a locus such that the window/actual bases do not cover the whole locus. Scary. I am not sure I like this...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3056 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 15:59:15 +00:00
aaron
439c34ed38
clean-up before annotating VariantEval2 for output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
depristo
076d21d394
Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
hanna
6cd97b78ab
An additional safety check to ensure that we only walk over coordinate-sorted
...
data when doing locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3053 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:31:45 +00:00
hanna
b4b4e8d672
For Sarah Calvo: initial implementation of read pair traversal, for BAM files
...
sorted by read name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna
c0eb5c27ea
Lower memory support for merged sharding. Merged sharding is still not available.
...
WARNING: If you update frequently, you might have to rm -rf ~/.ant/cache -- this is an unfortunate side effect of the way we
distribute picard-private.jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3050 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 22:03:47 +00:00
ebanks
4d4db7fe63
Renaming for consistency
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3049 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:45:01 +00:00
ebanks
4c4d048f14
Moving VariantFiltration over to use VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks
c88a2a3027
Fixing/cleaning up the vcf merge util
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
rpoplin
cdec84aa8f
Bug fix for variant optimizer. Remember to close the PrintStreams it uses to output the cluster files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3046 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:07:32 +00:00
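The kind of bug fixed in the commit above (output streams left unclosed) is commonly avoided in later Java versions with try-with-resources. The sketch below uses hypothetical names and is not the variant optimizer's actual code, which predates that language feature.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.List;

// Hypothetical sketch: write lines and guarantee the stream is closed.
final class ClusterOutputSketch {
    static void writeLines(OutputStream target, List<String> lines) {
        // The PrintStream is closed automatically when the block exits,
        // which also flushes and closes the underlying stream.
        try (PrintStream out = new PrintStream(target)) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }
}
```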
depristo
d8ff552311
Support for EXPERIMENTAL sampling-based genotype likelihoods
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3044 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:40 +00:00
depristo
7b17bcd0af
Refactoring a few useful routines for detecting mendelian violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
depristo
56092a0fc2
Slight cleanup for mathutils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3042 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:18:08 +00:00
depristo
b221ce94ce
Trio-aware genotyper that calculates P(de novo); still being tested
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3041 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:11:39 +00:00
ebanks
03480c955c
And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks
e757f6f078
Missing value for arbitrary format entries is empty string (need to revisit at some point, but it will require updating the VCF spec).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3038 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:56:27 +00:00
ebanks
0311980668
The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
hanna
9b61d95d9c
Khalid found an out-of-memory condition with the new sharding system when
...
merging lots of BAMs, and the fix is taking longer than I thought. Disable
experimental sharding when merging until the fix is ready.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3036 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 02:43:46 +00:00
ebanks
b8e8852b4f
Better interface for the Annotator in how it interacts with VariantContext.
...
Also, added a proof of concept genotype-level annotation (not working yet, almost there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
hanna
96662d8d1b
Moving from GATK dependencies on isolated classes checked into the GATK
...
codebase to a dependency on a jar file compiled from my private picard branch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3034 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 17:43:42 +00:00
aaron
8a5f0b746e
some cleanup for the output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:54:39 +00:00
rpoplin
c78fc23ec5
Minor updates to output of variant optimizer.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3031 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:46:47 +00:00
ebanks
0247548400
Fixed one test and (temporarily) punted on another
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3030 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 06:22:48 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now not only specify specific annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
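Points 1 and 2 of the commit above describe grouping annotations by marker interface and selecting them by group name or by individual name. A rough sketch of that selection logic follows; every type and method name is hypothetical, and the real annotator's plugin discovery is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of group-based annotation selection; not GATK code.
final class AnnotationGroupSketch {
    // Marker interfaces play the role of the "decorations" from point 1.
    interface StandardAnnotation {}
    interface ExperimentalAnnotation {}

    static final class DepthAnnotation implements StandardAnnotation {}
    static final class HaplotypeAnnotation implements ExperimentalAnnotation {}
    static final class RankSumAnnotation {}

    // Selects annotations that belong to a requested group (as with -G)
    // or whose simple class name was requested directly (as with -A).
    static List<Object> select(List<Object> all, Set<Class<?>> groups, Set<String> names) {
        List<Object> chosen = new ArrayList<>();
        for (Object annotation : all) {
            boolean byName = names.contains(annotation.getClass().getSimpleName());
            boolean byGroup = false;
            for (Class<?> group : groups) {
                if (group.isInstance(annotation)) {
                    byGroup = true;
                    break;
                }
            }
            if (byName || byGroup) {
                chosen.add(annotation);
            }
        }
        return chosen;
    }
}
```

Using interfaces as groups means an annotation can belong to several groups at once simply by implementing more than one of them.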
rpoplin
58a31bab6a
Variant optimizer now outputs VCF files via ApplyVariantClustersWalker. Documentation to be added to the wiki. It is ready to be used by other people but only with great caution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3028 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 20:41:42 +00:00
hanna
d9398dc347
Remove some of the restrictions on getStart() and getStop(); getStart() and getStop()
...
now do the minimum validation rather than the more rigorous only-within-the-contig-bounds
header validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3027 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 19:39:30 +00:00
aaron
182f1061ff
Bamboo isn't picking up commits for some reason; updating a copyright to see if it'll get this commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3025 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 17:56:48 +00:00