
1072 Commits (714cac21c9d4b1a5a3d96b0c1299d1aa4d4348fd)

Author SHA1 Message Date
Mark DePristo dc4932f93d VariantEval module to stratify the variants by whether they overlap an interval set
The primary use of this stratification is to provide a mechanism for dividing the assessment of a call set according to whether each variant overlaps an interval set. I use this to differentiate between variants in CCDS exons and those in non-coding regions of the 1000G call set, using a command line like:

-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification

Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. To use this module safely, provide entire contigs' worth of variants and let the interval stratification decide overlap, rather than using -L, which does not work properly with symbolic variants.

Minor improvements to the interval create() method in GenomeLocParser.
2011-11-10 10:58:40 -05:00
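The overlap rule described above can be sketched as follows. This is a minimal illustration, not the GATK implementation: the Interval and Variant types are hypothetical stand-ins for GenomeLoc and VariantContext, intervals are assumed to be 1-based and closed (BED input would need converting from its 0-based, half-open form first), and a variant's span is taken from INFO END when present, otherwise from POS plus the REF length. Requires Java 16+ for records.

```java
import java.util.List;

/**
 * Hypothetical sketch of END-aware overlap testing. These types are
 * illustrative stand-ins, not the GATK GenomeLoc or VariantContext classes.
 */
public final class IntervalOverlapSketch {

    /** A 1-based, fully closed interval. */
    public record Interval(String contig, long start, long end) {}

    /**
     * A variant site. infoEnd holds the INFO END value for symbolic alleles
     * (e.g. <DEL>); it is null for ordinary alleles.
     */
    public record Variant(String contig, long pos, String ref, Long infoEnd) {
        /** Last reference base spanned: END if present, otherwise POS + len(REF) - 1. */
        public long end() {
            return infoEnd != null ? infoEnd : pos + ref.length() - 1;
        }
    }

    /** Two closed intervals on the same contig overlap iff each starts before the other ends. */
    public static boolean overlaps(Variant v, Interval i) {
        return v.contig().equals(i.contig()) && v.pos() <= i.end() && i.start() <= v.end();
    }

    /** Linear scan for clarity; a real implementation would use a sorted or indexed interval set. */
    public static boolean overlapsAny(Variant v, List<Interval> intervals) {
        for (Interval i : intervals) {
            if (overlaps(v, i)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a symbolic deletion at POS 100 with END=500 overlaps an interval covering 400-450 even though its POS lies outside it; a check based on POS alone would miss it.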
Mark DePristo e639f0798e mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set
-- A bit of code cleanup in VCFUtils
-- VariantEval table to create 1000G Phase I variant summary table
-- First version of 1000G Phase I summary table Qscript
2011-11-09 14:35:50 -05:00
Eric Banks 759f4fe6b8 Moving unclaimed walker with bad integration test to archive 2011-11-07 13:16:38 -05:00
Eric Banks c1986b6335 Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated. 2011-11-07 11:06:19 -05:00
Eric Banks 724e3f3b0d Merged bug fix from Stable into Unstable 2011-11-06 22:23:22 -05:00
Eric Banks cdd40d1222 Removing contracts for the SimpleTimer 2011-11-06 22:22:49 -05:00
Ryan Poplin 5c565d28b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-06 10:26:19 -05:00
Eric Banks 3517489a22 Better --sample selection integration test for VE. The previous one would return true even if --sample was not working at all. 2011-11-06 01:07:49 -04:00
Eric Banks 1c4e429a1c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-06 00:05:56 -04:00
Eric Banks a12bc63e5c Get rid of support for BAMs without sample information in the read groups. This hidden option wasn't being used anyway because it wasn't hooked up properly in the AlignmentContext. 2011-11-05 23:54:28 -04:00
Eric Banks ad57bcd693 Adding integration test to cover using expressions with IDs (-E foo.ID) 2011-11-05 23:53:15 -04:00
Eric Banks 90a053ea93 Don't change the mapping quality of MQ=255 reads in IR 2011-11-05 22:40:45 -04:00
Ryan Poplin 611a395783 Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK. 2011-11-05 12:18:56 -04:00
Mark DePristo e99871f587 Bug fix for decode loc
-- decodeLoc() wasn't skipping input header lines, so the system blew up when it tried to split a header line containing '='.
2011-11-04 13:20:54 -04:00
Mark DePristo a340a1aeac Bug fix: decodeLoc() should update lineNo so you get a meaningful line number
when indexing fails due to malformed VCF files.
2011-11-04 11:44:24 -04:00
Mark DePristo 9f260c0dc1 Zero byte index bug fix for RandomlySplitVariants + cleanup
-- vcfWriter2 was never being closed in onTraversalDone(), so the on-the-fly index file was being created but never properly written to the file.

-- Ultimately, though, this bug is due to the GATK's inability to accept multiple VCF output writers as @Output arguments.

-- Removed the unnecessary local variable iFraction (= 1000 * the input fraction argument). Now the system just draws a random double and compares it directly to the input fraction. Is there some subtle reason I don't appreciate for this programming construct?
2011-11-04 09:45:20 -04:00
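The simplified split described above, drawing a uniform random double and comparing it directly to the fraction, can be sketched like this. It is a hypothetical stand-alone helper rather than the RandomlySplitVariants source; in the real walker each record goes to one of two VCF writers, both of which must be closed in onTraversalDone().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a random two-way split; not the RandomlySplitVariants source. */
public final class RandomSplitSketch {

    /**
     * Sends each item to the first list with probability {@code fraction},
     * otherwise to the second list.
     */
    public static <T> List<List<T>> split(List<T> items, double fraction, Random rng) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T item : items) {
            // nextDouble() is uniform on [0, 1), so this test succeeds with probability
            // `fraction`; no integer scaling such as 1000 * fraction is needed.
            if (rng.nextDouble() < fraction) {
                first.add(item);
            } else {
                second.add(item);
            }
        }
        return List.of(first, second);
    }
}
```

Because nextDouble() never returns 1.0, a fraction of 1.0 sends everything to the first output and 0.0 sends everything to the second, with no special cases.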
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools, which incorporates the GATK-specific functionality built on top of the BAM record (ReadGroups, Reduced Reads, ...).

* Tools should no longer create SAMRecords; use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Eric Banks e8bceb1eaa Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 21:13:54 -04:00
Eric Banks 78a00d2ddc Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one). 2011-11-02 21:13:44 -04:00
Eric Banks 52b16bf739 Must check whether there's a normal vs. extended pileup before asking for it. 2011-11-02 20:45:24 -04:00
Eric Banks e1edd6bd12 Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain. 2011-11-02 20:32:58 -04:00
Ryan Poplin e94fcf537b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:29:19 -04:00
Ryan Poplin 4d35272916 Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller. 2011-11-02 16:29:10 -04:00
Mark DePristo 8a2929c1dd Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:21:00 -04:00
Eric Banks 967ff647b8 Reduced reads shouldn't contribute to Fisher Strand calculations 2011-11-02 13:07:20 -04:00
Eric Banks cf0e699226 QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes. 2011-11-02 12:58:38 -04:00
Eric Banks 4501dce58d Fixing merge conflict 2011-11-02 12:50:32 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
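The distinction drawn above between the physical number of pileup elements and the representative depth of coverage can be sketched as follows. The Element record and both method names are hypothetical illustrations, not the GATK pileup API; the only assumption carried over from the commit is that a normal read represents one read and a reduced read represents one or more.

```java
import java.util.List;

/** Hypothetical sketch; names are illustrative, not the GATK pileup API. */
public final class PileupDepthSketch {

    /** One pileup element: a normal read stands for 1 read, a reduced read for count >= 1 reads. */
    public record Element(boolean reduced, int count) {
        public Element {
            if (count < 1) {
                throw new IllegalArgumentException("count must be >= 1");
            }
            if (!reduced && count != 1) {
                throw new IllegalArgumentException("a normal read represents exactly 1 read");
            }
        }
    }

    /** Physical number of elements stored in the pileup data structure. */
    public static int numberOfElements(List<Element> pileup) {
        return pileup.size();
    }

    /** Representative depth of coverage: the sum of what each element stands for. */
    public static int depthOfCoverage(List<Element> pileup) {
        int depth = 0;
        for (Element e : pileup) {
            depth += e.count();
        }
        return depth;
    }
}
```

A pileup holding two normal reads and one reduced read that represents 10 reads has 3 physical elements but a representative depth of 12, which is why a single size() method would be ambiguous.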
Mark DePristo 392e0aeace Moved unit tests into master IntervalUtilsUnitTest 2011-11-02 10:52:00 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
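A completely balanced locus-based scatter can be illustrated by cutting the intervals into n groups whose total base counts differ by at most one, splitting an interval wherever a group fills up. This is a minimal sketch of that idea under that assumed balancing rule; scatterLocusIntervals itself is not reproduced here, and the Interval record and scatter method are hypothetical. Intervals are assumed to be 1-based, closed, and already sorted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of balanced locus scattering; not the GATK IntervalUtils code. */
public final class LocusScatterSketch {

    /** A 1-based, fully closed interval. */
    public record Interval(String contig, long start, long end) {
        public long size() {
            return end - start + 1;
        }
    }

    /** Splits sorted intervals into n groups whose base counts differ by at most one. */
    public static List<List<Interval>> scatter(List<Interval> intervals, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        long total = 0;
        for (Interval iv : intervals) {
            total += iv.size();
        }
        List<List<Interval>> groups = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            groups.add(new ArrayList<>());
        }
        int g = 0;      // index of the group currently being filled
        long used = 0;  // bases already placed in group g
        for (Interval iv : intervals) {
            long start = iv.start();
            while (start <= iv.end()) {
                // The first (total % n) groups receive one extra base.
                long quota = total / n + (g < total % n ? 1 : 0);
                if (used == quota) { // group g is full; move on to the next one
                    g++;
                    used = 0;
                    continue;
                }
                long take = Math.min(quota - used, iv.end() - start + 1);
                groups.get(g).add(new Interval(iv.contig(), start, start + take - 1));
                start += take;
                used += take;
            }
        }
        return groups;
    }
}
```

For example, scattering chr1:1-10 and chr1:21-25 (15 bases) four ways yields groups of 4, 4, 4 and 3 bases, with the third group made of chr1:9-10 and chr1:21-22.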
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro 36600fd8e9 added MQ of low MQ/BQ to consensus RMS
Bases that were excluded by the MQ and BQ filters now contribute to the MQ RMS (but not to the consensus base counts or the variant/non-variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro b004489c6d Moving ReduceRead TAG to GATKSAMRecord
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro 17cc484dbd Revert "ReduceReads ref bases are now output as '='
Reducing the reference bases to '=' results in an extra 13% compression on average. However, the GATK is not ready to handle files with '=' bases, and the decision was to implement this as engine support rather than as part of ReduceReads.
2011-11-01 16:35:07 -04:00
Eric Banks 0839c75c8d More minor fixes to docs 2011-10-31 21:49:27 -04:00
Eric Banks 74b018a1f3 Minor fixes to docs 2011-10-31 21:41:43 -04:00
Eric Banks 31ee5432c5 Merged bug fix from Stable into Unstable 2011-10-31 14:56:59 -04:00
David Roazen cdde32acbd Merged bug fix from Stable into Unstable 2011-10-31 14:21:15 -04:00
Eric Banks f62af0291b Check for invalid VCF records (not enough tokens) instead of assuming they are there. 2011-10-31 14:09:51 -04:00
Andrey Sivachenko bed0acaed4 nWayOut now adds a PG tag to the header, as it should. Also added a hidden option, keepPGTags: if it is invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added instead of overriding them. 2011-10-31 12:28:28 -04:00
Mauricio Carneiro 389380a590 ReduceReads ref bases are now output as '=' to save space
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 649dfe98f0 Add VCF header for any expressions that are requested 2011-10-28 10:22:19 -04:00
Eric Banks 8b1a62da27 Adding unit test to cover overlapping intervals from the same source with the intersection rule. 2011-10-28 09:59:43 -04:00
Eric Banks 057a79f598 This argument should be annotated as @Input 2011-10-28 09:44:49 -04:00
Eric Banks 4ba7c0cecd Moving to private 2011-10-28 09:29:28 -04:00
Eric Banks 1bdd76c2f2 These tools now use the IntervalBinding system to handle intervals instead of doing it all manually 2011-10-28 09:28:12 -04:00
Eric Banks 6ba08a103d Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory. 2011-10-28 09:23:25 -04:00
Eric Banks 3d04bb5608 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 23:55:18 -04:00
Eric Banks 19e27d4568 Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative. 2011-10-27 23:55:11 -04:00