9527 Commits (229d1aa904e8943a500b9b7b8200c8dbf18b515c)

Author SHA1 Message Date
Joel Thibault 229d1aa904 Bjorn -> Nexus 2012-05-15 13:30:29 -04:00
Joel Thibault ca39387ec7 Retrieve from the DB in block mode
Reorder query fields
Throw exception when DB data cannot be found
2012-05-15 13:29:45 -04:00
Joel Thibault 747e3f6c94 Modify the DB schema to use block writes 2012-05-15 13:29:07 -04:00
Eric Banks d44886d9e8 Very naughty bug: VE output is not at all gatherable but no one told this to Queue. Fixed. 2012-05-15 10:29:04 -04:00
Eric Banks 819c3d0c15 Adding to the Hrun docs 2012-05-15 10:27:52 -04:00
Christopher Hartl b16e169412 Variant Call QC was calling melt() without importing the reshape package. It's unclear how this ever worked... 2012-05-15 07:34:36 -04:00
Guillermo del Angel 2abd1e06cc More fixes to prevent NPE in pool caller 2012-05-14 22:02:39 -04:00
Guillermo del Angel 7b81559a9b One more pool caller bug fix: don't create output file for noisy simulation in unit test, or else previous results will be deleted 2012-05-14 16:27:38 -04:00
Guillermo del Angel 617ac0b88f More pool caller bug fixes 2012-05-14 16:15:54 -04:00
Guillermo del Angel 578092b120 Pool caller bug fixes: avoid NPE in null tracker positions, fix so that we can (in theory) use Pool AF with non-pool GL model for testing 2012-05-14 15:03:53 -04:00
Guillermo del Angel 5fc3adbb04 One more VariantsToTable bug fix 2012-05-14 14:10:07 -04:00
Guillermo del Angel 04d691f04a Forgot to update MD5's due to new Exact AF model in pool caller (all changes legit, minor QUAL/QD/SB differences). Fixed bug in VariantsToTable from previous commit 2012-05-14 14:01:29 -04:00
Guillermo del Angel ae26f0fe14 a) Fully functional and working multiallelic exact model for pools. Needs cleanup/more testing. b) Better unit test for pool genotype likelihoods - it now optionally generates actual noisy pileups that can be used for assessing GL accuracy, c) Totally experimental, hidden option in VariantsToTable to output genotype fields. Specifying -GF will output columns of form Sample.FieldName - needs also more testing 2012-05-14 10:55:35 -04:00
Guillermo del Angel 67e5c3ff9f Solved major scalability problem in pool caller - exact model may have been linear but computing pool GL's was O(n^p) where p was max # of alleles (4 in SNP discovery mode). Linearized approach follows exact AF model with queue of AC conformations to add - may refactor code to eliminate duplication later, as linear multiallelic pool AF model will use same approach. TBD: how to print PL's with -Infinity value, right now since we never cap PL printing we end up with big nonsense numbers in those positions and vcf's look ugly. Calling MT in CEU trio with pool size = 100 goes from 2 days to 55 minutes (sic) 2012-05-11 10:05:09 -04:00
Guillermo del Angel 9acef4b206 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-10 16:00:58 -04:00
Guillermo del Angel da6f16986e Preparatory refactorings for pool indel calling and for optimizations: restructure code in PoolSNPGenotypeLikelihoods that will be shared with indels, and make it easier to rewrite when optimized version that's linear in pool size is ready (current version is linear in # of pools but not yet in pool size). 2012-05-10 16:00:37 -04:00
Ryan Poplin c9dd0f3173 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-10 13:09:10 -04:00
Ryan Poplin 0cdadffe14 Committing the best of the frantic pre-CSHL experiments: Better algorithm for partitioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes. 2012-05-10 13:09:03 -04:00
Guillermo del Angel 89f8a6b2e6 Revert bad part of last commit that shouldn't have been pushed 2012-05-10 10:41:08 -04:00
Guillermo del Angel 27b1aa5dd3 Don't allow N's in insertions when discovering indels. Maybe better solution will be to use them as wildcards and merge them with compatible regular insertion alleles but for now it's easier to ignore them. Minor refactoring of Allele.accepableAlleleBases to support this. Added unit test to test consensus allele counter in presence of N's 2012-05-10 10:29:19 -04:00
Eric Banks 4f37d6d399 Fixing docs 2012-05-10 00:56:00 -04:00
Joel Thibault 51936dcef3 Update indices to better match queries
Only query for the requested samples
2012-05-09 17:14:50 -04:00
Joel Thibault f4ae4a0a70 Initial versions of Queue scripts for Mongo testing 2012-05-09 14:22:34 -04:00
Joel Thibault 5427f8dffa Mongo Long/Integer confusion 2012-05-09 14:22:34 -04:00
David Roazen c56370a503 Update GATKPerformanceOverTime script for GATK 1.6 2012-05-09 13:47:41 -04:00
Mark DePristo 398dceec56 Basic test script to cut up a BAM and run it through cramtools 2012-05-08 19:46:51 -04:00
Mark DePristo c81acfc15d Working implementation of BCF2
-- Nearly complete on spec implementation.  Slow but clean
-- Some refactoring of VariantContext to support common functions for BCF and VCF
2012-05-08 19:46:51 -04:00
Mark DePristo a5193c2399 Mostly complete reference implementation of BCF2
-- Can run VariantEval on 3000 sample exome VCF and get the same output as the original VCF
2012-05-08 19:46:51 -04:00
Mark DePristo 237a41d3d3 Phase II of BCF2 reader / writer
-- New encoder decoder implementation with cleaner interface that supports newer spec versions
-- Checkpoint to read / write sites files
2012-05-08 19:46:50 -04:00
Mark DePristo eb6721bd44 Initial test simple BCF2 encoder / decoder 2012-05-08 19:46:50 -04:00
Eric Banks 473d07b0c5 fixing up docs from previous Pool Caller commit 2012-05-08 11:02:55 -04:00
Eric Banks b4999d14c1 updating docs 2012-05-08 10:58:46 -04:00
Guillermo del Angel 33a1dd2048 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-08 10:42:12 -04:00
Guillermo del Angel 7584b1ea17 Back off optimization of pool vcf that didn't print genotypes. Many site attributes require GT in genotypes to be computed correctly. Better to change string representation of polyploid genotypes, TBD better solution 2012-05-08 10:41:39 -04:00
Eric Banks 5cf4fd63c2 Catch malformed base qualities and throw as a User Error 2012-05-08 09:34:57 -04:00
Guillermo del Angel a4f4b5007b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-08 09:34:33 -04:00
Guillermo del Angel 605984353f Pool Caller improvements: a) New non-standard private annotation Heteroplasmy which measures mean heteroplasmy (pool AF) across called samples, meant for easier mtDNA calling. Pure homoplasmic variants (pool AF = 1 or 0) would have heteroplasmy=1. b) Don't output pool genotypes by default for large pool sizes because it makes file sizes explode and they're unreadable. c) Refactored classes ExactACCounts and ExactACSet and moved to superclass AlleleFrequencyCalculationModel because both Pool and Exact AF calculation models will use it. d) Initial refactorings and skeleton for linearized multi-allelic exact model (not done yet). e) Unit test for Pool AF calculation model. 2012-05-08 09:33:38 -04:00
Eric Banks c40cda7e3c Nope, loads of integration tests had to be changed. 2012-05-07 14:30:42 -04:00
Eric Banks 66838a073e Very annoying: we have been emitting an extra TAB in the header of the VCF (which breaks some parsers) for sites-only file. Hopefully not too many integration tests will need to be fixed... 2012-05-07 12:20:11 -04:00
Mark DePristo a90482c772 Rev. tribble to v101 with another putative open file leak fix
Scalability bugfixes; can issue tens of thousands of queries to a reader
without opening too many files

-- Fixed missing close() statement in TribbleIndexedFeatureReader
-- Fixed NPE in TabixIteratorLineReader
-- Added scalability test that confirms .query() failure and subsequent fix

Note this actually fixes a tested and reproducible scalability issue.  Might not be the only one but I believe it should do the trick.  Sorry everyone for the inconvenience.  Note that we now have a test in Tribble to ensure this doesn't happen again.
2012-05-04 15:40:41 -04:00
David Roazen 9424acb3c8 BCF2: Fix issue with parsing of filters 2012-05-04 15:08:53 -04:00
David Roazen e506de47b3 BCF2: Use the reference's sequence dictionary in BCF2Writer, don't require the VCF header to have contig declarations 2012-05-04 14:54:50 -04:00
David Roazen b28de6674d BCF2: set VC stop position to allow BCF2ToVCF walker to work correctly
Stop position is not yet correct for multi-nucleotide events, but that can
be fixed later
2012-05-04 13:24:49 -04:00
David Roazen 6b769e91d8 BCF2: third checkpoint
* writer mostly implemented
* walkers to convert BCF2 <-> VCF
* almost working for sites-only files; genotypes still need work
* initial performance tests this afternoon will be on sites-only files
2012-05-04 13:00:15 -04:00
Mark DePristo fa84d50a2b Rev. tribble for putative bugfixes for not closing streams 2012-05-04 10:20:46 -04:00
Khalid Shakir 23e3668e2c Added JUST_BCF2 to PRS walker based on GVCF tests.
Example: -T ProfileRodSystem -mode JUST_BCF2 -R <fasta> -vcf <input> -o out.txt [-performanceTest]
2012-05-03 22:08:18 -04:00
Khalid Shakir a9da9598f5 Implemented getSamplesFromVCF. 2012-05-03 21:57:57 -04:00
Khalid Shakir 7c11dde328 Updated DPP test MD5's due to template length (TLEN) changes when Picard was revved. 2012-05-03 14:47:58 -04:00
David Roazen fbb40c3c42 BCF2: checkpoint for Mark 2012-05-03 14:31:25 -04:00
Eric Banks c9829374d3 Oops, was using the wrong variables to print in the HaplotypeResolver. Fixing for Ryan. 2012-05-03 13:39:49 -04:00