1078 Commits (c8b1c92a6ca2d99e8050d04bb30778a2d4008099)

Author SHA1 Message Date
Guillermo del Angel 1bfe28067f Don't try to genotype an indel even bigger than the reference window size, or else we'll be out of bounds. Necessary to handle Phase 1 integrated callset with large deletions. Better error indication when validating a GenomeLoc. 2011-12-08 12:54:08 -05:00
Ryan Poplin cb284eebde Further updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 14:00:57 -05:00
Ryan Poplin 447e9bff9e Updating VQSR tutorial wiki docs to reflect the bundle 2011-11-29 09:57:45 -05:00
Guillermo del Angel 75d93e6335 Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left 2011-11-22 22:46:12 -05:00
Guillermo del Angel 32a77a8a56 Prevent out of bound error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle. 2011-11-22 13:57:24 -05:00
Eric Banks c62082ba1b Making this class public again as per request from Cancer folks 2011-11-18 12:34:27 -05:00
Matt Hanna 6a5d5e7ac9 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2011-11-16 09:57:13 -05:00
Matt Hanna 7ac5cf8430 Getting rid of unsupported CountReadPairs walker in stable. Removal of remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Ryan Poplin 348f2db7fd Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0. 2011-11-09 22:00:52 -05:00
Christopher Hartl d828eba7f4 Allow comments in a table-formatted file to precede the header line. 2011-11-09 15:27:38 -05:00
Christopher Hartl 11abb4f9d1 Better error message. 2011-11-09 11:25:28 -05:00
Christopher Hartl d3a533b82e Revert "a"
This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.
2011-11-09 11:22:26 -05:00
Christopher Hartl 5eaf800281 a 2011-11-09 11:22:20 -05:00
Christopher Hartl 091229e4db MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods.
Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten.
Table feature now has a toString method.

These are minor bug fixes.
2011-11-09 11:03:29 -05:00
Ryan Poplin b0e6afec48 Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy. 2011-11-08 14:51:25 -05:00
Ryan Poplin 0b181be61f Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this. 2011-11-07 15:25:16 -05:00
Ryan Poplin 2d1e385ca4 Adding note to VQSR docs about Rscript being needed in the environment PATH. 2011-11-07 14:04:13 -05:00
Eric Banks cdd40d1222 Removing contracts for the SimpleTimer 2011-11-06 22:22:49 -05:00
Eric Banks 90a053ea93 Don't change the mapping quality of MQ=255 reads in IR 2011-11-05 22:40:45 -04:00
Mark DePristo e99871f587 Bug fix for decode loc
-- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.
2011-11-04 13:20:54 -04:00
Mark DePristo a340a1aeac Bug fix. decodeLoc() should update lineNo so you get a meaningful line number when indexing errors occur due to malformed VCF files.
2011-11-04 11:44:24 -04:00
Mark DePristo 9f260c0dc1 Zero byte index bug fix for RandomlySplitVariants + cleanup
-- vcfWriter2 was never being closed in onTraversalDone(), so the on-the-fly index file was being created but never actually properly written to the file.

-- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though

-- Removed the unnecessary local variable iFraction (= 1000 * the input fraction argument). Now the system just draws a random double and compares it to the input fraction directly. Is there some subtle reason I don't appreciate for this programming construct?
2011-11-04 09:45:20 -04:00
Mauricio Carneiro e89ff063fc GATKSAMRecord refactor
The GATK engine will now provide a GATKSAMRecord to all tools, which incorporates the functionality the GATK adds to the BAM file (ReadGroups, Reduced Reads, ...).

* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Eric Banks e8bceb1eaa Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 21:13:54 -04:00
Eric Banks 78a00d2ddc Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one). 2011-11-02 21:13:44 -04:00
Eric Banks 52b16bf739 Must check whether there's a normal vs. extended pileup before asking for it. 2011-11-02 20:45:24 -04:00
Eric Banks e1edd6bd12 Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain. 2011-11-02 20:32:58 -04:00
Ryan Poplin e94fcf537b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:29:19 -04:00
Ryan Poplin 4d35272916 Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller. 2011-11-02 16:29:10 -04:00
Mark DePristo 8a2929c1dd Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:21:00 -04:00
Eric Banks 967ff647b8 Reduced reads shouldn't contribute to Fisher Strand calculations 2011-11-02 13:07:20 -04:00
Eric Banks cf0e699226 QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes. 2011-11-02 12:58:38 -04:00
Eric Banks 4501dce58d Fixing merge conflict 2011-11-02 12:50:32 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
Mark DePristo 392e0aeace Moved unit tests into master IntervalUtilsUnitTest 2011-11-02 10:52:00 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro 36600fd8e9 added MQ of low MQ/BQ to consensus RMS
Bases that were excluded by the MQ and BQ filters now contribute to the MQ RMS (but not to consensus base counts or to the variant/non-variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro b004489c6d Moving ReduceRead TAG to GATKSAMRecord
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro 17cc484dbd Revert "ReduceReads ref bases are now output as '='
Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this as engine support, not as a part of ReduceReads.
2011-11-01 16:35:07 -04:00
Eric Banks 0839c75c8d More minor fixes to docs 2011-10-31 21:49:27 -04:00
Eric Banks 74b018a1f3 Minor fixes to docs 2011-10-31 21:41:43 -04:00
Eric Banks 31ee5432c5 Merged bug fix from Stable into Unstable 2011-10-31 14:56:59 -04:00
David Roazen cdde32acbd Merged bug fix from Stable into Unstable 2011-10-31 14:21:15 -04:00
Eric Banks f62af0291b Check for invalid VCF records (not enough tokens) instead of assuming they are there. 2011-10-31 14:09:51 -04:00
Andrey Sivachenko bed0acaed4 nWayOut now adds a PG tag to the header as it should. Also, an additional hidden option was added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them. 2011-10-31 12:28:28 -04:00
Mauricio Carneiro 389380a590 ReduceReads ref bases are now output as '=' to save space
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 649dfe98f0 Add VCF header for any expressions that are requested 2011-10-28 10:22:19 -04:00
Eric Banks 8b1a62da27 Adding unit test to cover overlapping intervals from the same source with the intersection rule. 2011-10-28 09:59:43 -04:00