Commit Graph

7955 Commits (8a2929c1dd131498ffbd4adf2aab003013dfd377)

Author SHA1 Message Date
Mark DePristo 8a2929c1dd Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-02 16:21:00 -04:00
Mark DePristo e2f40da27f Scala script to run multi-sample analysis 2011-11-02 16:20:57 -04:00
Mark DePristo bd977c2d92 Bug fix to avoid infinite loop in GATKScatterFunction 2011-11-02 16:20:42 -04:00
Eric Banks cf0e699226 QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes. 2011-11-02 12:58:38 -04:00
Eric Banks 4501dce58d Fixing merge conflict 2011-11-02 12:50:32 -04:00
Eric Banks 54331b44e9 New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths. 2011-11-02 12:47:30 -04:00
Mark DePristo c1da8cd5e7 Final version of bp-resolved locus scatter/gather
-- Minor refactoring to allow LocusScatterFunction to have maxIntervals be the original scatter count, rather than capping this by the interval count as Contig and Interval do
2011-11-02 11:26:34 -04:00
Mark DePristo 392e0aeace Moved unit tests into master IntervalUtilsUnitTest 2011-11-02 10:52:00 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro 53c9f49050 Fixing contracts!
forgot to revert the contract changes. This will fix bamboo.
2011-11-01 18:09:29 -04:00
Mauricio Carneiro 7d194afda8 Revert "Using isReduceRead from GATKSAMRecord"
Apparently the casting SAMRecord to GATKSAMRecord is not allowed.
2011-11-01 17:54:30 -04:00
Mauricio Carneiro 36600fd8e9 added MQ of low MQ/BQ to consensus RMS
Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro 18f4c63d44 Using isReduceRead from GATKSAMRecord
centralizing functionality of the reduced reads.
2011-11-01 17:15:58 -04:00
Mauricio Carneiro b004489c6d Moving ReduceRead TAG to GATKSAMRecord
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro 2b200c34a6 Removing testEqualBases
No need for the test if ReduceReads is not producing '=' bases anymore.
2011-11-01 17:05:27 -04:00
Mauricio Carneiro 17cc484dbd Revert "ReduceReads ref bases are now output as '='
Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this a an engine support, not a part of ReduceReads.
2011-11-01 16:35:07 -04:00
Mauricio Carneiro 76c32f5409 Revert "Compressed read group information"
We decided not to compress read group information because read groups should be universally unique. The gain of 3% compression was not worth it.

This reverts commit 79f1c3b70de240d8060ecb9a86d2f1d4ff2a8efb.
2011-11-01 16:33:21 -04:00
Eric Banks 0839c75c8d More minor fixes to docs 2011-10-31 21:49:27 -04:00
Eric Banks 74b018a1f3 Minor fixes to docs 2011-10-31 21:41:43 -04:00
Mauricio Carneiro b51c36abb3 TEST: equal '=' base test implementation 2011-10-31 18:01:22 -04:00
Eric Banks 31ee5432c5 Merged bug fix from Stable into Unstable 2011-10-31 14:56:59 -04:00
David Roazen cdde32acbd Merged bug fix from Stable into Unstable 2011-10-31 14:21:15 -04:00
Eric Banks f62af0291b Check for invalid VCF records (not enough tokens) instead of assuming they are there. 2011-10-31 14:09:51 -04:00
Andrey Sivachenko bed0acaed4 nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them 2011-10-31 12:28:28 -04:00
Mauricio Carneiro 7a8e49154d Fixing contracts
forgot to rename the variables in the contracts. This should resolve Bamboo failures.
2011-10-31 09:21:19 -04:00
Mauricio Carneiro ab4f58a5e9 A base for a test walker for reduce reads
this walker allows general testing on reduce reads given two bam files to make sure the reduce reads are not screwing up the format. This is just the base of the walker, the tests need to be implemented independently inside the framework.
2011-10-30 22:26:16 -04:00
Mauricio Carneiro e220507ff9 Compressed read group information
Adding another 3% reduction in file size by compressing the read group ID.
2011-10-30 21:09:06 -04:00
Mauricio Carneiro 389380a590 ReduceReads ref bases are now output as '=' to save space
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Mauricio Carneiro dbd8c25787 No more R resources in the DPP
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Khalid Shakir e25d40882a Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out. 2011-10-28 15:51:03 -04:00
Eric Banks 0ca7428e76 Allow processing of empty intervals, but warn user when this case is encountered. 2011-10-28 12:12:14 -04:00
Eric Banks 649dfe98f0 Add VCF header for any expressions that are requested 2011-10-28 10:22:19 -04:00
Eric Banks 8b1a62da27 Adding unit test to cover overlapping intervals from the same source with the intersection rule. 2011-10-28 09:59:43 -04:00
Eric Banks 057a79f598 This argument should be annotated as @Input 2011-10-28 09:44:49 -04:00
Eric Banks 4ba7c0cecd Moving to private 2011-10-28 09:29:28 -04:00
Eric Banks 1bdd76c2f2 These tools now use the IntervalBinding system to handle intervals instead of doing it all manually 2011-10-28 09:28:12 -04:00
Eric Banks 6ba08a103d Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory. 2011-10-28 09:23:25 -04:00
Eric Banks 3d04bb5608 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 23:55:18 -04:00
Eric Banks 19e27d4568 Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative. 2011-10-27 23:55:11 -04:00
Eric Banks cafc245a43 For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing? 2011-10-27 23:54:28 -04:00
Guillermo del Angel cbc43683ee Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 20:54:18 -04:00
Guillermo del Angel 8907e42007 First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, b) Optionally, a set of samples which will restrict sites to be polymorphic in those samples, c) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks ccfd853b34 Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed. 2011-10-27 20:43:50 -04:00
Mauricio Carneiro 49b616b8e6 Outputting compressed read names
In an attempt to minimize file size, read names are compressed to a number. Reads with the same name will have the same number. Consensus reads will have a C in front of the number. Tests show an average 10% reduction (extra) in file size.
2011-10-27 16:59:58 -04:00
Mark DePristo 648a17a30b Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-27 16:27:30 -04:00
Mark DePristo 1ecd08a296 Better docs for VariantsToPed 2011-10-27 16:27:19 -04:00
Eric Banks c2f343773e Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test. 2011-10-27 15:32:08 -04:00
Khalid Shakir 4d0e34109f Compacting pdfs when running under R 2.13+. 2011-10-27 14:51:56 -04:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00