Commit Graph

9489 Commits (a13d125ba15345a17ad8f4de5e7183493b2d0ea2)

Author SHA1 Message Date
Menachem Fromer a13d125ba1 Split out contig names from Reference .fai file on white space (to support the GATK resource bundle's file human_g1k_v37.fasta.fai.gz, which does not use tab delimiters) 2012-06-07 16:56:32 -04:00
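The whitespace split described in the commit above can be sketched as follows. This is a minimal illustration, not the GATK implementation: the class name `FaiContigNames` and both methods are hypothetical, and it assumes only that the contig name is the first whitespace-delimited token on each index line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GATK): extracts contig names from .fai lines
// whether the columns are separated by tabs or by spaces.
final class FaiContigNames {
    private FaiContigNames() {}

    // Returns the first whitespace-delimited token of one index line.
    static String contigName(String faiLine) {
        // Split on any run of whitespace; only the first field is needed.
        String[] fields = faiLine.trim().split("\\s+", 2);
        return fields[0];
    }

    // Returns the contig names of all non-blank lines, in file order.
    static List<String> contigNames(List<String> faiLines) {
        List<String> names = new ArrayList<>();
        for (String line : faiLines) {
            if (!line.trim().isEmpty()) {
                names.add(contigName(line));
            }
        }
        return names;
    }
}
```

Splitting on `\\s+` rather than on a literal tab accepts both layouts, at the cost of assuming that contig names never contain spaces.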
Mark DePristo a90482c772 Rev. tribble to v101 with another putative open file leak fix
Scalability bugfixes; can issue tens of thousands of queries to a reader
without opening too many files

-- Fixed missing close() statement in TribbleIndexedFeatureReader
-- Fixed NPE in TabixIteratorLineReader
-- Added scalability test that confirms .query() failure and subsequent fix

Note this actually fixes a tested and reproducible scalability issue.  Might not be the only one but I believe it should do the trick.  Sorry everyone for the inconvenience.  Note that we now have a test in Tribble to ensure this doesn't happen again.
2012-05-04 15:40:41 -04:00
David Roazen 9424acb3c8 BCF2: Fix issue with parsing of filters 2012-05-04 15:08:53 -04:00
David Roazen e506de47b3 BCF2: Use the reference's sequence dictionary in BCF2Writer, don't require the VCF header to have contig declarations 2012-05-04 14:54:50 -04:00
David Roazen b28de6674d BCF2: set VC stop position to allow BCF2ToVCF walker to work correctly
Stop position is not yet correct for multi-nucleotide events, but that can
be fixed later
2012-05-04 13:24:49 -04:00
David Roazen 6b769e91d8 BCF2: third checkpoint
* writer mostly implemented
* walkers to convert BCF2 <-> VCF
* almost working for sites-only files; genotypes still need work
* initial performance tests this afternoon will be on sites-only files
2012-05-04 13:00:15 -04:00
Mark DePristo fa84d50a2b Rev. tribble for putative bugfixes for not closing streams 2012-05-04 10:20:46 -04:00
Khalid Shakir 23e3668e2c Added JUST_BCF2 to PRS walker based on GVCF tests.
Example: -T ProfileRodSystem -mode JUST_BCF2 -R <fasta> -vcf <input> -o out.txt [-performanceTest]
2012-05-03 22:08:18 -04:00
Khalid Shakir a9da9598f5 Implemented getSamplesFromVCF. 2012-05-03 21:57:57 -04:00
Khalid Shakir 7c11dde328 Updated DPP test MD5's due to template length (TLEN) changes when Picard was revved. 2012-05-03 14:47:58 -04:00
David Roazen fbb40c3c42 BCF2: checkpoint for Mark 2012-05-03 14:31:25 -04:00
Eric Banks c9829374d3 Oops, was using the wrong variables to print in the HaplotypeResolver. Fixing for Ryan. 2012-05-03 13:39:49 -04:00
Eric Banks f3433201b1 Merged bug fix from Stable into Unstable 2012-05-03 11:11:00 -04:00
Eric Banks 557da77a1a Don't compute QD if there is no QUAL; added integration test for this 2012-05-03 11:02:37 -04:00
Eric Banks 1fc7b5d58b Merged bug fix from Stable into Unstable 2012-05-03 10:37:58 -04:00
Laurent Francioli 567d01cee8 - Added option to output the father's allele first in phased child haplotypes
- Corrected a bug causing wrong phasing of child/father pairs
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-05-03 10:36:49 -04:00
Laurent Francioli 96e5a26223 PED support for Inbreeding Coefficient annotation
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-05-03 10:36:20 -04:00
Mark DePristo 0f4cc1884d Rev to tribble 99, optimized AsciiFeatureCodec
-- Removed tmp. GeneralizedFeatureCodec
-- BCF2 Reader update to use new style, but this entire class can be deleted now
-- Rev. tribble to r99
2012-05-03 07:31:48 -04:00
Mark DePristo 43d97c2e00 Rev Tribble to r97, adding binary feature support
From tribble logs:

Binary feature support in tribble

-- Massive refactoring and cleanup
-- Many bug fixes throughout
-- FeatureCodec is now general, with decode etc. taking a PositionalBufferedStream
as an argument, not a String
-- See ExampleBinaryCodec for an example binary codec
-- AbstractAsciiFeatureCodec provides to its subclasses the same String decode and
readHeader functionality as before.  Old ASCII codecs should inherit from this base
class, and will work without additional modifications
-- Split AsciiLineReader into a position tracking stream
(PositionalBufferedStream).  The new AsciiLineReader takes as an argument a
PositionalBufferedStream and provides the readLine() functionality of before.
Could potentially use optimizations (it's a TODO in the code)
-- The Positional interface includes some more functionality that's now
necessary to support the more general decoding of binary features
-- FeatureReaders now work using the general FeatureCodec interface, so they can
index binary features
-- Bugfix for a LinearIndexCreator off-by-one error in setting the end block
position
-- Deleted VariantType, since this wasn't used anywhere and it's not a particularly
clean way of thinking about the problem
-- Moved DiploidGenotype, which is specific to Gelitext, to the gelitext package
-- TabixReader requires an AsciiFeatureCodec as it's currently only implemented
to handle line oriented records
-- Renamed AsciiFeatureReader to TribbleIndexedFeatureReader now that it handles
Ascii and binary features
-- Removed unused functions here and there as encountered
-- Fixed build.xml to be truly headless
-- FeatureCodec readHeader returns a FeatureCodecHeader object that contains a
value and the position in the file where the header ends (not inclusive).
TribbleReaders now skip the header if the position is set, so it's no longer
necessary, if one implements the general readHeader(PositionalBufferedStream)
version, to see header lines in the decode functions.  Necessary for binary
codecs but a nice side benefit for ascii codecs as well
-- Cleaned up the IndexFactory interface so there's a truly general createIndex
function that takes the enumerated index type.  Added a writeIndex() function
that writes an index to disk.
-- Vastly expanded the index unit tests and reader tests to really test linear,
interval, and tabix indexed files.  Updated test.bed, and created a tabix
version of it as well.
-- Significant BinaryFeaturesTest suite.
-- Some test files have indent changes
2012-05-03 07:31:48 -04:00
Mark DePristo 58c470a6c5 Rev'ing Tribble from 53 to 94
-- Other tribble contributors did major refactoring / simplification of tribble, which required some changes to GATK code
-- Integration tests pass without modification, though some very old index files (callable loci beds) were apparently corrupt and no longer tolerated by the newer tribble codebase
2012-05-03 07:31:47 -04:00
Eric Banks e448cfcc59 Forgot to update these md5s 2012-05-02 21:09:50 -04:00
Khalid Shakir b8b7f28aa9 Revving Picard to pick up new SamFileHeaderMerger.
Updated ReadFilter abstract class to implement (via UnsupportedOperationException) the new SamRecordFilter.filterOut().
In IndelRealignerIntegrationTest updates for Picard fixes to SAMRecord.getInferredInsertSize() in svn r1115 & r1124.
- Ran FixMates to create new input BAM, since running IR with variable maxReadsInMemory means not all reads were realigned, leading to different outputs.
- Updated md5s to match new expectations after looking at TLEN diff engine output.
2012-05-02 16:47:28 -04:00
Mauricio Carneiro b32f09b949 some more updates to the BQSR scala script 2012-05-02 16:23:02 -04:00
Mauricio Carneiro f51a1d0d61 Better error message to the BAMScheduler
In the case where the BAM file was aligned using a reference but analysis is being attempted with a different reference.
2012-05-02 16:10:00 -04:00
Mauricio Carneiro a5d17e02c7 quick lua script to merge recalibration reports by hand. 2012-05-02 16:06:04 -04:00
Mauricio Carneiro 940029fa5d Fixing on-the-fly recalibration (caught by Ryan)
low quality bases in the tails were being turned to N's in the final read.
2012-05-02 16:06:04 -04:00
Eric Banks 623b36fbc4 Add header lines for AC,AF, and AN tags 2012-05-02 15:33:34 -04:00
Guillermo del Angel 6fac8f2c70 More test coverage on PoolAFCalculationModel: add more tests for multiallelic case with higher ploidy 2012-05-02 14:12:02 -04:00
Joel Thibault bb756447e2 Move mongodb package to a location where walkers will be visible from the command line 2012-05-02 11:58:06 -04:00
Guillermo del Angel 429800a192 Fix corner case rounding issue in MathUtils unit test: 10^logFactorial(4) was 23.999999..., which if cast directly yielded 23, so do pre-rounding to ensure correct integer result if caller will cast value. 2012-05-02 09:57:06 -04:00
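The rounding pitfall in the commit above can be illustrated with a short sketch. The class `RoundBeforeCast` and its methods are hypothetical stand-ins, not the MathUtils code: the log factorial is computed by summing `Math.log10` terms, and whether the truncating variant actually returns 23 for n = 4 depends on the floating-point details of that computation.

```java
// Hypothetical illustration (not the GATK MathUtils code) of why a value computed
// in log space should be rounded before it is converted to an integer.
final class RoundBeforeCast {
    private RoundBeforeCast() {}

    // log10(n!) computed as a sum of logs; small floating-point error is expected.
    static double log10Factorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log10(i);
        }
        return sum;
    }

    // Truncating cast: a result such as 23.999999... becomes 23.
    static long factorialTruncated(int n) {
        return (long) Math.pow(10.0, log10Factorial(n));
    }

    // Round to the nearest integer first, so 23.999999... becomes 24.
    static long factorialRounded(int n) {
        return Math.round(Math.pow(10.0, log10Factorial(n)));
    }
}
```

Rounding is safe here only while the accumulated error stays well below 0.5, which holds for small n but not for arbitrarily large factorials.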
Guillermo del Angel 76a95fdedf Full implementation of multiallelic exact model for pools. Still super-linear so not usable at scale, but it should be a gold standard to compare to. Unit tests are not exhaustive yet, will be expanded to provide better test coverage. Small inconsequential optimization in MathUtils: we're already caching log10(factorial(n)) for large n, so might as well use the cached values to compute binomial and multinomial coefficients instead of the log-gamma approximation, which is more expensive (doesn't seem to save much time in either PoolCaller or UG though). 2012-05-02 09:24:28 -04:00
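The MathUtils optimization mentioned in the commit above, reusing cached log10(n!) values for binomial and multinomial coefficients, might look roughly like the sketch below. The class `Log10FactorialCache`, its constructor argument, and its method names are assumptions made for illustration; the actual GATK code may be organized differently.

```java
// Hypothetical sketch (not the GATK MathUtils code): binomial and multinomial
// coefficients in log10 space, computed from a precomputed table of log10(n!).
final class Log10FactorialCache {
    private final double[] cache; // cache[n] == log10(n!)

    // Precomputes log10(n!) for 0 <= n <= maxN using log10(n!) = log10((n-1)!) + log10(n).
    Log10FactorialCache(int maxN) {
        cache = new double[maxN + 1];
        for (int n = 1; n <= maxN; n++) {
            cache[n] = cache[n - 1] + Math.log10(n);
        }
    }

    double log10Factorial(int n) {
        return cache[n];
    }

    // log10 of n! / (k! (n-k)!), using three table lookups.
    double log10Binomial(int n, int k) {
        return cache[n] - cache[k] - cache[n - k];
    }

    // log10 of n! / (c_1! c_2! ... c_m!), where the counts are expected to sum to n.
    double log10Multinomial(int n, int[] counts) {
        double result = cache[n];
        for (int c : counts) {
            result -= cache[c];
        }
        return result;
    }
}
```

Each coefficient then costs a few array lookups and subtractions, with the table size (here `maxN`) bounding the largest supported n.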
Joel Thibault 4d732fa586 Move all MongoDB files into private/java/src/org/broadinstitute/sting/mongodb 2012-05-01 18:23:51 -04:00
Mauricio Carneiro bdf6d1f109 updates to BQSR queue script 2012-05-01 17:36:33 -04:00
Eric Banks 619a69a5f1 As promised in the release notes for 1.6, I am removing the old deprecated genotyping framework revolving around the misordering of alleles and have moved the fixed version in its place in preparation for release 1.7 (or 2.0?). 2012-05-01 16:18:24 -04:00
Joel Thibault c255dd5917 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-01 16:10:38 -04:00
Ryan Poplin 51af61b5d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-01 16:07:23 -04:00
Ryan Poplin cc646690d6 updating HaplotypeCaller integration tests 2012-05-01 16:07:18 -04:00
Ryan Poplin fc55dcec3c Unfortunately the reverse trimming of alleles still doesn't work with mixed records in some corner cases. Turning it off for now. 2012-05-01 16:02:36 -04:00
Ryan Poplin 2187d71bb2 Adding some quick debugging, custom annotations to the calls coming out of the HaplotypeCaller. 2012-05-01 15:55:14 -04:00
Ryan Poplin 20a0078f23 Merging active regions across shard boundaries if they are contiguous, have the same active status and don't grow too big. 2012-05-01 15:51:36 -04:00
Eric Banks 0f3af9555b Adding an option to SelectVariants which allows the user to re-genotype through the exact model (if PLs are present) the samples in order to recalculate the QUAL and genotypes. This is really the correct way to select a subset of samples, especially when originally called from low coverage data. Also added integration test to cover this case. 2012-05-01 14:58:06 -04:00
Joel Thibault aa4d41cce0 Minor cleanup before push 2012-05-01 14:16:44 -04:00
Joel Thibault b101b9c30b Add Mongo switch 2012-05-01 14:00:48 -04:00
Joel Thibault 1b609e9075 Move Mongo to server couchdb 2012-05-01 13:59:47 -04:00
Joel Thibault fd57d27f45 Move MongoDB connection handling to a separate class 2012-05-01 13:59:37 -04:00
Joel Thibault db3cd1abd5 Use 2 MongoDB collections (tables): one for INFO/attributes, one for samples/genotypes. 2012-05-01 13:57:23 -04:00
Joel Thibault 04e1be9106 Better handling of Mongo errors + exceptions 2012-05-01 13:57:23 -04:00
Joel Thibault ca737479cf Query for stop locations because we don't have that information in the reference 2012-05-01 13:57:23 -04:00
Joel Thibault 1cda87a4ad Set ROD priority list to input 2012-05-01 13:57:23 -04:00
Joel Thibault a7fe847faf Set the priority list and don't bother combining if not needed 2012-05-01 13:57:23 -04:00