Commit Graph

3360 Commits (618c69f8dcf75b1bf458b1869c35f636f7acffbb)

Author SHA1 Message Date
kiran 48e311a5ea Added copyright notice.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3980 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:11:51 +00:00
kiran 9aa70d9c7c Replaced by SelectVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3979 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:07:42 +00:00
kiran 758ab428f5 Better logging info for the samples being selected and the sample expressions being ignored.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3978 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 07:03:37 +00:00
kiran e242a8f143 Put single quotes around the regex. This isn't strictly necessary through the integration test machinery, but *is* necessary at the console, and it's convenient to be able to cut and paste this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3977 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:56:57 +00:00
kiran 13f29660bb Integration test for SelectVariants. Tests a complex case with an explicit sample selection, sample selection by regex, exclusion of non-variant and filtered loci, and JEXL selection on low allele-frequency variants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3976 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:49:47 +00:00
ebanks 637a1e5055 Updating to use the new VA interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3975 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:31:01 +00:00
ebanks bd6d5a8d51 Adding command-line header to VA and VF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3974 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:21:15 +00:00
kiran 64446f0ddf Avoid NaNs in the final output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3973 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 05:16:52 +00:00
ebanks 3f6e44dc71 Updated recalibrator and cleaner to output full command-lines in the bam header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3972 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:39:18 +00:00
kiran 0da0dfa1da Cosmetic change - lower-case for all command-line arguments' short names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3971 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:12:01 +00:00
kiran eb1bb94d1c Moved the evaluation of the JEXL expressions to a point *after* the samples are subset and the INFO-field annotations are updated. I think this makes more sense than having the evaluations happen beforehand, since it seems jarring to have the JEXL expressions operate on the annotations before they're updated, and have the file contain the annotations after they're updated. Now, selecting on something like allele frequency will actually apply to the annotations that actually end up in the file, while selection on other annotations (which are carried over without modification) will act exactly the same regardless.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3970 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 04:09:02 +00:00
ebanks 594b7912f1 Added a generic method for returning the complete command-line used when calling a walker, to be used in the bam/vcf headers. As requested, every possible engine/walker argument is included. I've added it to the Unified Genotyper output, so people can try it out and let me know what they think. Something that needs to be discussed in group meeting: what happens when we merge VCFs? Do we keep all of the command-lines?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3969 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 03:53:07 +00:00
kiran 6e389059cf An improved version of VariantSubset and VariantSelect, meant to replace those walkers. Takes in a VCF and creates a subsetted VCF by sample(s), JEXL expressions, or both.
When subsetting by sample, the -SN argument is treated as a literal sample name and, if no match is found, as a regular expression.  This allows for a large number of samples to be selected at once (useful when, for instance, cases are given one sample name prefix and controls are given another).

After the subsetting procedure, the INFO-field annotations AC, AN, AF, and DP are all recalculated to properly reflect the new contents of the VCF.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3968 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-08 02:57:06 +00:00
ebanks ac4699a650 Re-enabling this test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3962 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 20:20:37 +00:00
depristo f275041b1c -minimalVCF for CombineVariants. Work around for broken locking code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3960 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 16:10:59 +00:00
aaron 9076c0b28b removing unused code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3958 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 14:24:39 +00:00
ebanks 341e752c6c 1) AlleleBalance is no longer a standard annotation, but the Allelic Depth (AD) is for each sample.
2) Small fixes in the VCFWriter:
a) Trailing missing values weren't being removed if their count was > 1 (e.g. ".,.")
b) We were handling key values that were Lists, but not Arrays.  We now handle both.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3956 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-06 12:05:14 +00:00
aaron c68625f055 Fixes from Mark for the MutableContexts; this fixes the clearGenotypes() and the clearFilters() methods, and adds a method to clear the attributes. Also added is a method for creating a variant context where the attribute list is pruned to a specific subset, which can be null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3955 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 22:39:51 +00:00
aaron 72ae81c6de VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from 
inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN

I'll send out an email to GSAMembers with some more details.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
fromer b21f90aee0 Added preliminary framework for performing short-range phasing (ReadBackedPhasingWalker.java)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3953 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:56:34 +00:00
rpoplin a8d37da10b Checking in everyone's changes to the variant recalibrator. We now calculate the variant quality score as a LOD score between the true and false hypothesis. Allele Count prior is changed to be (1 - 0.5^ac). Known prior breaks out HapMap sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3952 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 14:12:19 +00:00
ebanks 07addf1187 Fix for Kiran: since the Variant Annotator will re-annotate on top of existing annotations it makes sense to remove old headers if they conflict with the definitions being added by VA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3951 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 06:44:39 +00:00
ebanks 1539791a04 Fix for Kiran: when using VCFs for the comp tracks in the Annotator(s), don't put the headers from them into the output VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3950 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:45:47 +00:00
ebanks 227c4b10f0 Bug fix for Chris: convert comp tracks to VC so that we can respect the filter field. Added an integration test to cover this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3949 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 04:13:16 +00:00
ebanks 84ca2f27bb Bug fix for Chris: added method createPotentiallyInvalidGenomeLoc() to the GenomeLocParser that doesn't check that the contig exists in the sequence dictionary. This is crucial for lifting over from one reference to another, as sometimes contigs names change in the liftover (e.g. chrM to MT).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3948 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 03:19:02 +00:00
ebanks f247cbf68e I want to be the first to use the new super-cool Hidden annotation! No more telling people not to use the cleaner debugging options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3947 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:44:37 +00:00
hanna 78bfe6ac48 Added @Hidden annotation, a way to deliberately exclude experimental fields and
walkers from the help system.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3946 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 02:26:46 +00:00
chartl 82d6c5073b A simple read strand filter for potluri on get satisfaction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3945 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 23:23:50 +00:00
asivache d53d5ffbf6 A utility class that computes running average and standard deviation for a stream of numbers it is being fed with. Updates mean/stddev on the fly and does not cache the observations, so it uses no memory and also should be stable against overflow/loss of precision. Simple unit test is also provided (does *not* stress-test the engine with millions of numbers though).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3944 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:39:02 +00:00
ebanks 8d8acc9fae Moving G's MyHapScore to replace the old HapScore
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3943 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 21:00:54 +00:00
ebanks 7858ffec32 Spit out the error in the warning message so that Sendu can tell me what his problem is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3942 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 20:40:28 +00:00
delangel 86211b74e8 Bug fix: when padding alleles in creating a Variant context from an indel, leave no-call alleles as no-call alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3940 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:51:10 +00:00
chartl 38e65f6e1b Added: A VariantEval module that gives simple metrics by sample, an an abstract class that makes per-sample modules easy to write (but a little bit clunky since a class needs be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and ensuring integration tests pass, I am committing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:37:39 +00:00
hanna f13d52e427 Attempt to determine whether underlying filesystem supports file locking and
disable on-the-fly dict and fai generation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3938 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 19:28:27 +00:00
ebanks 340bd0e2c1 Removed hard-coded pointers to references
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3934 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 17:59:37 +00:00
asivache a47824d680 A couple of type specific implementations of a single extend() method: takes an array (byte[] or short[] currently) and "extends" it to the left or to the right by the specified number of elements. Returns newly allocated array, with the content of original array copied in (if we extend by n elements to the left, then the returned array will have n default-filled elements *followed* by the content of the old array).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3932 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:30:48 +00:00
asivache 012a7cf0a5 mismatchCount now has a version that counts mismatches only along a part of the read (takes additional args start_on_read and length_on_read to specify the read's subsequence to be interrogated);
isMateUnmapped() convenience shortcut method added.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3931 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:27:35 +00:00
delangel e6e8a20a1e 1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological non-existing indels make some sites' score blow up. If a site is only covered by 454 reads, we (hopefully) detect this graciously and just emit a score of 0.0 for the site.
2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000. 
3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:23:08 +00:00
ebanks bf60ed0b25 Needed it here too: warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:11:27 +00:00
ebanks 40ffe34686 Warn user instead of dying if the R script cannot be executed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:08:15 +00:00
ebanks 17d5e89734 Now --list annotates which modules are Standard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 21:00:37 +00:00
ebanks 72875cf717 Removing annoying printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 19:55:00 +00:00
ebanks 2307bed742 VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E).
Generalized some of the packaging code from VariantAnnotator.  Matt might want to take a look to make this nicer...?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 16:51:10 +00:00
ebanks a7ff9caf54 Added sanity check against bad people and/or crazy big indels at edges of ref context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 05:37:17 +00:00
hanna 5f1b67c1de Coping out and forcing the entire GATK (and associated JVM) to use US English
locale.  Method to force JVM into proper locale exists in CommandLineProgram
and is disabled by default, but implementers of CommandLineProgram can opt in
to the forced US locale by calling a static method.

Question for the VCF developers: I removed the code to explicitly output doubles
in US locale.  Do you / how do you want to handle this in applications that use
Tribble outside the GATK?


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 03:48:26 +00:00
chartl 2bc69572cb Make transcript2info capable of handling b37/hg19 contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 17:32:08 +00:00
depristo c203e0fb02 Added JEXL support for hetCount, homRefCount, and homVarCount in VCs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 12:24:11 +00:00
depristo 7fab5c0a8f support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-31 21:17:47 +00:00
ebanks 6d91cd587e Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:18:31 +00:00
depristo ac8048f17b Support for automated selects for tranches in variant eval -- use -tf to make tranch-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs are uncalled, and so won't emit -filteredInOther set features
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:16:43 +00:00
chartl 9231d13252 Minor modification: adding an argument to make slightly more general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 05:20:20 +00:00
chartl db54d63fc7 Hahaha yes, ownage. This now works.
BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case!




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 04:23:02 +00:00
chartl 3e3f8c7692 Simple count intervals walker, as per my recent email to GSAMembers. Never use this. It doesn't behave the way you think it does.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3905 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 03:39:23 +00:00
delangel ba1a330293 Corrected location and made more explicit the error message thrown if someone tries to read a VCF 3.3 file with indels, which is not supported.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3901 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:02:47 +00:00
delangel 5af986e0c1 Add an integration test for Beagle (one for ProduceBeagleInput and one for BeagleOutputToVCFWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3897 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:49:22 +00:00
delangel e1a34685fd Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementaiton is also better, it computes better consensus haplotypes, ranks them by sum of quality score.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 21:23:19 +00:00
hanna 6c93b13428 A Java sizeof, implemented using the Java instrumentation API. Can either get the memory consumed either only by a single
object or by a single object and all the references it contains.  Requires a command-line change to add a Java agent to
the command-line; see the Sizeof.java javadoc for details.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 18:44:15 +00:00
rpoplin f5566a6593 Knocking out some quick findBugs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 14:10:59 +00:00
delangel 894623858d OK, bad idea to add new temporary annotation - revert to keep integration tests hapy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3886 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 12:07:13 +00:00
delangel 71bfb1ee35 First redesign of HaplotypeScore - now, a different approach is taken to build possible haplotypes at a site: first, all possible haplotypes consistent with reads are formed (reference is not used). After this list has been formed, it is ranked according to the number of reads that are consistent with it and the two most popular haplotypes are chosen.
this reduces to the old method in typical cases, but it builds haplotypes correctly if there are two variants close by within a context window.

Annotation is temporarily named MyHaplotypeScore so it can be run in parallel with old one, soon it will be renamed after some more testing.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3885 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 10:54:56 +00:00
delangel cffebcc867 Small utility walker used for production of the Beagle data processing paper section. Walker will print out to output file, for every site common to a reference vcf and an eval vcf, a given sample's depth, hapmap AC and AF and pre/post Beagle genotype as well as corresponding reference (e.g. Hapmap) genotype.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3884 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 03:00:17 +00:00
ebanks 1d9ed1e214 Cleanup of old VCFRecord code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3883 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:56:47 +00:00
ebanks 7dd55fbf13 Archiving
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:47:18 +00:00
aaron 9667942e52 fix for Ryan's issue: we also need to sync when we store a resource.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3881 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 22:17:47 +00:00
hanna 8b072b59e2 Returning index dumping functionality in BAMFileStat to a useable state.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3880 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 20:03:50 +00:00
depristo 19ad44d332 Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 13:46:11 +00:00
ebanks 7c5a3836db Trivial changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3875 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 04:00:47 +00:00
ebanks 56de475f11 Based on feedback from non-GSA users, who claim that our exceptions are 'scary and overwhelming,' I've cleaned up the error message to first describe the error and what users should do and then ask them to copy the subsequent stack trace into their GetSatisfaction posting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3874 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 03:57:44 +00:00
ebanks 9bd8a2685b Because the performance tests were busted on LSF, no one caught this error until now: when Matt changed over the contract for the AlignmentContext, this line needed to get updated too. All is well now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3873 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 02:53:01 +00:00
depristo b551eaf8fd Actually commit the code that makes variant eval run in a reasonable amount of time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3872 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-24 17:32:03 +00:00
depristo b0b37c3476 No handles (I believe) reference only VCs correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3871 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 23:09:23 +00:00
depristo e21376219d Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 22:35:52 +00:00
delangel 26bb1cd9ce Fix broken test correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3869 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 20:47:41 +00:00
delangel 4fc1db7aaf Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 20:24:03 +00:00
aaron b3fd145161 fix for a bug deep in the tribble indexing: if you had a single record in the first contig, the second contig's index blocks would point to the wrong file seek location, and you'd see no
features in that contig. Thanks to Mark for finding this.  I'm not rev'ing the index version (which would cause all indexes to be rebuilt), since this seems like a pretty rare edge case.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3865 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 18:39:55 +00:00
depristo 33090629ea VariantEval can now see the EvaluationContext group objects, so they can decide if/when to print interesting sites. GenotypeConcordance has a hard-coded option to print FNs that is on the way to being generally useful. VCFWriter now uses the US locale for formatting floating point numbers; I believe this fixes a long-standing annoyance. Italian guys will check on this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3864 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 17:16:50 +00:00
delangel 5eef15cfdf a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns bease at start of window. Instead, the reference at current locus should be used.
b) Cosmetic change to Beagle annotation description.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 15:13:47 +00:00
ebanks 4ff8b8fc0e 1. Fixing a bug that Mark found where indel-containing clipped reads would get an original cigar tag even when they didn't actually get modified.
2. Added some useful logging messages.
3. Added a oneoffs walker to calculate the number of realigned reads and intervals containing them.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3860 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:24:01 +00:00
chartl 973934f769 Depth of coverage now uses longs rather than ints. We can now successfully run on the Lepidosiren paradoxa genome. (about 80 GB)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3859 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:14:12 +00:00
depristo 536399eaa0 Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 13:33:11 +00:00
aaron 9579aace1f updates to code dependent on Tribble, as well as the following Tribble changes:
- makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it)
- removed some system.out debugging code
- fixed version checking in interval tree 
- made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file)
- index creators now read the file before creating the index
- changed the Index.write() method to take a LEDataStream instead of a file
- removed the sequence dictionary code on the header
- added utils for getting LEDataStreams
- added a base Tribble exception




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:56:10 +00:00
ebanks c5325b03be 1) Removed hard-coded strings. Please let's use the fields defined in VCFConstants.
2) General code cleanup.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3856 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:49:47 +00:00
hanna e9d243babb More improvements to exception handling during multithreaded runs based on
a bug reported by Ryan.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 22:13:01 +00:00
hanna 83798225ac Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 19:50:34 +00:00
delangel 98caedb5f0 Forgot to update VCF4 unit test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3853 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:25:51 +00:00
asivache 485023ba8e this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:00:30 +00:00
asivache 3308d956f4 Added utility shortcut method: getOriginalQualsInCycleOrder(read)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 15:44:25 +00:00
delangel 473ec91633 a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...
b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output.
Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles like for example when merging/combining VC's.

Pending issues:
- PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested.
- Need to verify Indel genotyper and other modules that create VC's with indels.- Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing.
 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 02:36:45 +00:00
chartl b696c3ea98 No more traversal reduce results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3849 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:34:54 +00:00
chartl 365b42390d Support for generating (very basic) wiggle files for use with IGV (see UCSC for wiggle spec); and a walker to take in a variant track and create a transition transversion rate track for the whole genome (due to the wiggle spec, this has to be done by chromosome). It's interesting to see the effect of genes!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3848 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:04:30 +00:00
depristo f7957bc7f2 Fixed memory leak in VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3845 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 12:35:46 +00:00
aaron 1cba81c16f updates to tribble with fixes for some bugs I've found in some new indexing code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 22:08:04 +00:00
ebanks ff6748d1cd oops - missed one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3841 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 18:55:19 +00:00
ebanks c6ad26e04f 1) When quals/GQs are really integers (x.00), strip off the floating points.
2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output.
3) No more "ID" hard-coded all over the code to set the VariantContext ID.  Use a static variable instead.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 18:01:45 +00:00
ebanks 0db7fab1a9 Fixing genotype filtering for VF and adding integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:30:21 +00:00
aaron 2a6c2d3098 re-enable test; I was moving the input file in prep for my last commit around on Eric, so he rightfully removed the test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3838 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:14:59 +00:00
aaron 0108517b98 updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:08:10 +00:00
ebanks f742980864 1. Refactoring of GenoypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon.
2. Moved Jared's VCFTool code into archive so that everything would compile.
3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader.
4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn'r exist, so I commented it out.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 06:16:45 +00:00
depristo 70b07206a2 CombineVariants tests for Guillermo and Eric to explore the correctness of the in/out reader, writer behavior of the system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3834 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:41:48 +00:00
depristo c47a5ff5ab Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:13:18 +00:00