Joel Thibault
229d1aa904
Bjorn -> Nexus
2012-05-15 13:30:29 -04:00
Joel Thibault
ca39387ec7
Retrieve from the DB in block mode
Reorder query fields
Throw exception when DB data cannot be found
2012-05-15 13:29:45 -04:00
Joel Thibault
747e3f6c94
Modify the DB schema to use block writes
2012-05-15 13:29:07 -04:00
Eric Banks
d44886d9e8
Very naughty bug: VE output is not at all gatherable but no one told this to Queue. Fixed.
2012-05-15 10:29:04 -04:00
Eric Banks
819c3d0c15
Adding to the Hrun docs
2012-05-15 10:27:52 -04:00
Christopher Hartl
b16e169412
Variant Call QC was calling melt() without importing the reshape package. It's unclear how this ever worked...
2012-05-15 07:34:36 -04:00
Guillermo del Angel
2abd1e06cc
More fixes to prevent NPE in pool caller
2012-05-14 22:02:39 -04:00
Guillermo del Angel
7b81559a9b
One more pool caller bug fix: don't create output file for noisy simulation in unit test, or else previous results will be deleted
2012-05-14 16:27:38 -04:00
Guillermo del Angel
617ac0b88f
More pool caller bug fixes
2012-05-14 16:15:54 -04:00
Guillermo del Angel
578092b120
Pool caller bug fixes: avoid NPE in null tracker positions, fix so that we can (in theory) use Pool AF with non-pool GL model for testing
2012-05-14 15:03:53 -04:00
Guillermo del Angel
5fc3adbb04
One more VariantsToTable bug fix
2012-05-14 14:10:07 -04:00
Guillermo del Angel
04d691f04a
Forgot to update MD5's due to new Exact AF model in pool caller (all changes legit, minor QUAL/QD/SB differences). Fixed bug in VariantsToTable from previous commit
2012-05-14 14:01:29 -04:00
Guillermo del Angel
ae26f0fe14
a) Fully functional and working multiallelic exact model for pools. Needs cleanup/more testing. b) Better unit test for pool genotype likelihoods: it now optionally generates actual noisy pileups that can be used for assessing GL accuracy. c) Totally experimental, hidden option in VariantsToTable to output genotype fields. Specifying -GF will output columns of the form Sample.FieldName. Also needs more testing.
2012-05-14 10:55:35 -04:00
Guillermo del Angel
67e5c3ff9f
Solved major scalability problem in pool caller: the exact model may have been linear, but computing pool GLs was O(n^p), where p was the max # of alleles (4 in SNP discovery mode). The linearized approach follows the exact AF model with a queue of AC conformations to add. May refactor code to eliminate duplication later, as the linear multiallelic pool AF model will use the same approach. TBD: how to print PLs with -Infinity values; right now, since we never cap PL printing, we end up with big nonsense numbers in those positions and VCFs look ugly. Calling MT in CEU trio with pool size = 100 goes from 2 days to 55 minutes (sic)
2012-05-11 10:05:09 -04:00
Guillermo del Angel
9acef4b206
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-10 16:00:58 -04:00
Guillermo del Angel
da6f16986e
Preparatory refactorings for pool indel calling and for optimizations: restructure code in PoolSNPGenotypeLikelihoods that will be shared with indels, and make it easier to rewrite when the optimized version that's linear in pool size is ready (current version is linear in # of pools but not yet in pool size).
2012-05-10 16:00:37 -04:00
Ryan Poplin
c9dd0f3173
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-10 13:09:10 -04:00
Ryan Poplin
0cdadffe14
Committing the best of the frantic pre-CSHL experiments: Better algorithm for partitioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes.
2012-05-10 13:09:03 -04:00
Guillermo del Angel
89f8a6b2e6
Revert bad part of last commit that shouldn't have been pushed
2012-05-10 10:41:08 -04:00
Guillermo del Angel
27b1aa5dd3
Don't allow N's in insertions when discovering indels. A better solution might be to use them as wildcards and merge them with compatible regular insertion alleles, but for now it's easier to ignore them. Minor refactoring of Allele.accepableAlleleBases to support this. Added unit test for the consensus allele counter in the presence of N's
2012-05-10 10:29:19 -04:00
Eric Banks
4f37d6d399
Fixing docs
2012-05-10 00:56:00 -04:00
Joel Thibault
51936dcef3
Update indices to better match queries
Only query for the requested samples
2012-05-09 17:14:50 -04:00
Joel Thibault
f4ae4a0a70
Initial versions of Queue scripts for Mongo testing
2012-05-09 14:22:34 -04:00
Joel Thibault
5427f8dffa
Mongo Long/Integer confusion
2012-05-09 14:22:34 -04:00
David Roazen
c56370a503
Update GATKPerformanceOverTime script for GATK 1.6
2012-05-09 13:47:41 -04:00
Mark DePristo
398dceec56
Basic test script to cut up a BAM and run it through cramtools
2012-05-08 19:46:51 -04:00
Mark DePristo
c81acfc15d
Working implementation of BCF2
-- Nearly complete on-spec implementation. Slow but clean
-- Some refactoring of VariantContext to support common functions for BCF and VCF
2012-05-08 19:46:51 -04:00
Mark DePristo
a5193c2399
Mostly complete reference implementation of BCF2
-- Can run VariantEval on 3000 sample exome VCF and get the same output as the original VCF
2012-05-08 19:46:51 -04:00
Mark DePristo
237a41d3d3
Phase II of BCF2 reader / writer
-- New encoder decoder implementation with cleaner interface that supports newer spec versions
-- Checkpoint to read / write sites files
2012-05-08 19:46:50 -04:00
Mark DePristo
eb6721bd44
Initial test simple BCF2 encoder / decoder
2012-05-08 19:46:50 -04:00
Eric Banks
473d07b0c5
fixing up docs from previous Pool Caller commit
2012-05-08 11:02:55 -04:00
Eric Banks
b4999d14c1
updating docs
2012-05-08 10:58:46 -04:00
Guillermo del Angel
33a1dd2048
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-08 10:42:12 -04:00
Guillermo del Angel
7584b1ea17
Back off optimization of pool VCF that didn't print genotypes. Many site attributes require GT in genotypes to be computed correctly. Better to change the string representation of polyploid genotypes; better solution TBD
2012-05-08 10:41:39 -04:00
Eric Banks
5cf4fd63c2
Catch malformed base qualities and throw as a User Error
2012-05-08 09:34:57 -04:00
Guillermo del Angel
a4f4b5007b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-08 09:34:33 -04:00
Guillermo del Angel
605984353f
Pool Caller improvements: a) New non-standard private annotation Heteroplasmy which measures mean heteroplasmy (pool AF) across called samples, meant for easier mtDNA calling. Pure homoplasmic variants (pool AF = 1 or 0) would have heteroplasmy=1. b) Don't output pool genotypes by default for large pool sizes because it makes file sizes explode and they're unreadable. c) Refactored classes ExactACCounts and ExactACSet and moved to superclass AlleleFrequencyCalculationModel because both Pool and Exact AF calculation models will use it. d) Initial refactorings and skeleton for linearized multi-allelic exact model (not done yet). e) Unit test for Pool AF calculation model.
2012-05-08 09:33:38 -04:00
Eric Banks
c40cda7e3c
Nope, loads of integration tests had to be changed.
2012-05-07 14:30:42 -04:00
Eric Banks
66838a073e
Very annoying: we have been emitting an extra TAB in the header of the VCF (which breaks some parsers) for sites-only file. Hopefully not too many integration tests will need to be fixed...
2012-05-07 12:20:11 -04:00
Mark DePristo
a90482c772
Rev. tribble to v101 with another putative open file leak fix
Scalability bugfixes; can issue tens of thousands of queries to a reader
without opening too many files
-- Fixed missing close() statement in TribbleIndexedFeatureReader
-- Fixed NPE in TabixIteratorLineReader
-- Added scalability test that confirms .query() failure and subsequent fix
Note this actually fixes a tested and reproducible scalability issue. Might not be the only one, but I believe it should do the trick. Sorry everyone for the inconvenience. Note that we now have a test in Tribble to ensure this doesn't happen again.
2012-05-04 15:40:41 -04:00
David Roazen
9424acb3c8
BCF2: Fix issue with parsing of filters
2012-05-04 15:08:53 -04:00
David Roazen
e506de47b3
BCF2: Use the reference's sequence dictionary in BCF2Writer, don't require the VCF header to have contig declarations
2012-05-04 14:54:50 -04:00
David Roazen
b28de6674d
BCF2: set VC stop position to allow BCF2ToVCF walker to work correctly
Stop position is not yet correct for multi-nucleotide events, but that can
be fixed later
2012-05-04 13:24:49 -04:00
David Roazen
6b769e91d8
BCF2: third checkpoint
* writer mostly implemented
* walkers to convert BCF2 <-> VCF
* almost working for sites-only files; genotypes still need work
* initial performance tests this afternoon will be on sites-only files
2012-05-04 13:00:15 -04:00
Mark DePristo
fa84d50a2b
Rev. tribble for putative bugfixes for not closing streams
2012-05-04 10:20:46 -04:00
Khalid Shakir
23e3668e2c
Added JUST_BCF2 to PRS walker based on GVCF tests.
Example: -T ProfileRodSystem -mode JUST_BCF2 -R <fasta> -vcf <input> -o out.txt [-performanceTest]
2012-05-03 22:08:18 -04:00
Khalid Shakir
a9da9598f5
Implemented getSamplesFromVCF.
2012-05-03 21:57:57 -04:00
Khalid Shakir
7c11dde328
Updated DPP test MD5's due to template length (TLEN) changes when Picard was revved.
2012-05-03 14:47:58 -04:00
David Roazen
fbb40c3c42
BCF2: checkpoint for Mark
2012-05-03 14:31:25 -04:00
Eric Banks
c9829374d3
Oops, was using the wrong variables to print in the HaplotypeResolver. Fixing for Ryan.
2012-05-03 13:39:49 -04:00