Commit Graph

1291 Commits (5f7564bf0ae441e2a689f42772b710ac63d2b115)

Author SHA1 Message Date
aaron e682460c1f add a fix so that XL arguments won't cancel out -BTI arguments, fixed a bug for Ben where the ROD -> interval list conversion was throwing an exception, and some old code removal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3174 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 16:31:43 +00:00
weisburd 74ec72d1ac Added AnnotatorROD - the TabularROD format specific to GenomicAnnotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3164 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 14:39:50 +00:00
weisburd 77a6608784 Changed a variable name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3163 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 14:38:18 +00:00
hanna 8573b0bc6f Refactoring intervals, separating the process of parsing interval lists,
sorting and merging interval lists, and creating RODs from intervals.  This
gives Doug the ability to keep using our interval list parsing code when
sorting intervals on our behalf.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 15:50:38 +00:00
chartl 7b05091c04 DoC now does not require a -o argument. (Change for Matt)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3157 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 13:58:17 +00:00
ebanks e413882302 Generalizing the SequenomValidationConverter to be able to take in any arbitrary rod type (provided it can be converted to VariantContext).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3155 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 20:42:18 +00:00
hanna 14b8101d45 Error message fail. Failed to supply one of the valid interval file types.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3153 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 01:19:01 +00:00
hanna 60d54e69f3 Hackish fix to present a better error message if the file does not have the proper extension. Will work with Brett to come up with a better solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3152 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 01:11:27 +00:00
ebanks 3434a61146 Don't trigger when ref=N (which can happen when a dbsnp track is provided)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3150 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 02:59:11 +00:00
ebanks 961ca05abc Removed outdated Sequenom rod and renamed HapMapGenotypeROD to HapMapROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3149 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 01:43:07 +00:00
ebanks 0cc6d0fbbb One more quick memory improvement: reuse Alleles in a given context instead of creating new ones for each sample (duh).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3147 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:48:36 +00:00
ebanks e73e6a4fb0 Significant memory improvements to plink code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3144 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 16:12:38 +00:00
ebanks fba48b515a Heads up everyone:
For consistency, these tools should be writing to the walker's output stream and no longer use the -vcf argument.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3140 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 05:37:25 +00:00
ebanks e286623f6f Use byte[] instead of String in an attempt to cut down on memory usage
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3139 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 05:32:54 +00:00
chartl 7025f5b51d Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
aaron e148a3ac61 added the ability to create interval lists directly from a ROD, using the command line arg '-BTI' (long name '--rodToIntervalTrackName'). The parameter to this arg is the name of the ROD track, which must be a track name specified in the -B option.
Using this feature, sites covered by the target ROD will be iterated over.  The generated list of intervals is merged with any intervals from the -L and -XL args, and the Walker is run over the resulting merged list.

WARNING: for very large RODs this can be costly.  Consider this experimental for now.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3134 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 05:14:41 +00:00
ebanks e7dad728df Trivial output changes for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3128 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 14:47:43 +00:00
depristo 058e7d3d12 Bug fix for Gregory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3127 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 00:21:35 +00:00
asivache 3530ef5a41 Explicit type cast fixed in order to work with new ROD implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3124 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:02:56 +00:00
ebanks 56eb15f91f Error checking for bad input (thanks, Aaron).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3120 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-04 03:17:01 +00:00
aaron 8017fb123f changed the depth of coverage walkers class name, and added a dependency in the packaging system so that RODs will all get imported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3116 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 20:55:19 +00:00
weisburd 6b7b07f178 First checkin of GenomicAnnotator which annotates an input VCF file by pulling data in a generic way from an arbitrary set of TabularRODs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3114 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-02 17:49:42 +00:00
chartl f7d1b8f5de CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
bthomas b4f6f54502 Reorganizing the way interval arguments are processed
Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java: 
-- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments
-- initializeIntervals modified
-- some helper functions deprecated for cleanliness
Includes new set of unit tests, GenomeAnalysisEngineTest.java

New restrictions: 
-- all interval arguments are now checked to be on the reference contig
-- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 12:47:48 +00:00
aaron 3d3d19a6a7 the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
a caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects.  This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc.  This layer is needed so we can integrate Tribble tracks (which don't natively have names).  Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).

Sorry for the inconvenience!  More changes to come, but this is by far the largest (and has the greatest effect on end users).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
hanna 4fcee248f9 For Kristian: functions which, given a read, can uniquely identify the BAM file storing that read.
Introducing this into the pile of code which peeks under the covers of the SAMDataSource in the hopes
that this function can help to replace the others and provide a single path for crosstalk.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3103 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 20:46:44 +00:00
hanna b60197ae10 Another round of cleanup and simplification in Picard -- Picard's unit tests
are now passing for my branch.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3100 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 01:02:59 +00:00
depristo 40f8e7644c Better, multi-haplotype aware haplotype scores. Looking very good now, seems to be vastly better at dealing with incorrect calls in deep and low pass data. Almost ready for use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3099 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 23:57:36 +00:00
depristo f992f51a3b Deleting incorrect sampling genotype likelihoods from the codebase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3098 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 23:56:35 +00:00
kiran b9d3fc3fbb Now checks if the i-th element of the FiltrationContext[] is null before trying to access it. This seems to happen occasionally at the very end of a VCF file... the array will be 6 elements long, but the last element will actually be null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3097 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 22:40:17 +00:00
hanna 400684542c Revisions to take into account finalization of Picard patch: naming changes, better definition
of public interfaces.  This won't be the last Picard patch, but it should be the last big one.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3096 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 19:28:14 +00:00
aaron b00d2bf2bc fixing an annotation that was breaking the error log output system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3095 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-30 15:34:04 +00:00
ebanks babb9fb825 snp cluster filter should ignore ref calls when determining the clusters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3093 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 17:57:33 +00:00
chartl 24461a2503 Let's *not* import classes that no longer exist. How my own ant test compiled is beyond me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3091 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:59:01 +00:00
chartl dc802aa26f Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
ebanks 1e8b3ca6ba Fare thee well, oh LocusWindowTraversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3089 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:17:26 +00:00
depristo 8ea98faf47 Deleting the pooled calculation model -- no longer supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
hanna 85037ab13f Fix for Kiran's sharding issue (Invalid GZIP header). General cleanup of
Picard patch, including move of some of the Picard private classes we use to Picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3087 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 03:21:27 +00:00
depristo a45ac220aa Removing unnecessary printing routines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3086 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 22:34:54 +00:00
depristo b8ab74a6dc Minor useful changes to BaseUtils and MathUtils to support a new haplotype score annotation that determines the two most likely haplotypes over an interval and scores variants by their consistency with a diploid model. Appears to be useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3085 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-28 21:45:22 +00:00
kshakir e9e53f68ab Filter lists can now end with .list or .txt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3084 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 17:41:24 +00:00
kiran 391e5843e4 If the annotation engine has not been supplied, don't try to annotate anything.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3081 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:52:21 +00:00
kiran 8048b709a0 Selects a single sample on which to operate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3080 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 20:50:58 +00:00
kshakir 20e3ba15ca Added an optional argument -rgbl --read_group_black_list to filter read groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks 73a14a985b Moving VariantsToVCF to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks 14bf6923a8 HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
hanna 78af6d5a40 New sharding system is going live again for on-the-fly merging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3076 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 18:39:04 +00:00
hanna 46c14ec63f New, much less memory intensive implementation of BAM file sharding. Streams indices together with the expectation
that bins will be present in the bin sparse array, which avoids the problem of having to hold the sparse bin array
stored in every BAM file index in memory at the same time.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3075 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 17:41:22 +00:00
ebanks 3176715c74 1. Alignability mask returns null when not available.
2. --list now prints out the available classes/groups too.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3072 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 20:49:07 +00:00
ebanks 47e30aba92 Rods for reads hooked up into the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3070 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 18:17:56 +00:00