
9461 Commits (bb756447e2e76bc11df7a09bcb32eeba66731f6d)

Author SHA1 Message Date
Joel Thibault bb756447e2 Move mongodb package to a location where walkers will be visible from the command line 2012-05-02 11:58:06 -04:00
Guillermo del Angel 429800a192 Fix corner-case rounding issue in MathUtils unit test: 10^logFactorial(4) evaluated to 23.999999..., which yielded 23 when cast directly to an integer, so pre-round the result to ensure a correct integer value if the caller will cast it. 2012-05-02 09:57:06 -04:00
Guillermo del Angel 76a95fdedf Full implementation of the multiallelic exact model for pools. Still super-linear, so not usable at scale, but it should serve as a gold standard for comparison. Unit tests are not exhaustive yet and will be expanded to provide better test coverage. Small inconsequential optimization in MathUtils: since log10(factorial(n)) is already cached for large n, use the cached values to compute binomial and multinomial coefficients instead of the more expensive log-gamma approximation (this doesn't seem to save much time in either PoolCaller or UG, though). 2012-05-02 09:24:28 -04:00
Joel Thibault 4d732fa586 Move all MongoDB files into private/java/src/org/broadinstitute/sting/mongodb 2012-05-01 18:23:51 -04:00
Mauricio Carneiro bdf6d1f109 updates to BQSR queue script 2012-05-01 17:36:33 -04:00
Eric Banks 619a69a5f1 As promised in the release notes for 1.6, I am removing the old deprecated genotyping framework revolving around the misordering of alleles and have moved the fixed version in its place in preparation for release 1.7 (or 2.0?). 2012-05-01 16:18:24 -04:00
Joel Thibault c255dd5917 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-01 16:10:38 -04:00
Ryan Poplin 51af61b5d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-01 16:07:23 -04:00
Ryan Poplin cc646690d6 updating HaplotypeCaller integration tests 2012-05-01 16:07:18 -04:00
Ryan Poplin fc55dcec3c Unfortunately the reverse trimming of alleles still doesn't work with mixed records in some corner cases. Turning it off for now. 2012-05-01 16:02:36 -04:00
Ryan Poplin 2187d71bb2 Adding some quick debugging, custom annotations to the calls coming out of the HaplotypeCaller. 2012-05-01 15:55:14 -04:00
Ryan Poplin 20a0078f23 Merging active regions across shard boundaries if they are contiguous, have the same active status, and don't grow too big. 2012-05-01 15:51:36 -04:00
Eric Banks 0f3af9555b Adding an option to SelectVariants which allows the user to re-genotype the samples through the exact model (if PLs are present) in order to recalculate the QUAL and genotypes. This is really the correct way to select a subset of samples, especially when they were originally called from low-coverage data. Also added an integration test to cover this case. 2012-05-01 14:58:06 -04:00
Joel Thibault aa4d41cce0 Minor cleanup before push 2012-05-01 14:16:44 -04:00
Joel Thibault b101b9c30b Add Mongo switch 2012-05-01 14:00:48 -04:00
Joel Thibault 1b609e9075 Move Mongo to server couchdb 2012-05-01 13:59:47 -04:00
Joel Thibault fd57d27f45 Move MongoDB connection handling to a separate class 2012-05-01 13:59:37 -04:00
Joel Thibault db3cd1abd5 Use 2 MongoDB collections (tables): one for INFO/attributes, one for samples/genotypes. 2012-05-01 13:57:23 -04:00
Joel Thibault 04e1be9106 Better handling of Mongo errors + exceptions 2012-05-01 13:57:23 -04:00
Joel Thibault ca737479cf Query for stop locations because we don't have that information in the reference 2012-05-01 13:57:23 -04:00
Joel Thibault 1cda87a4ad Set ROD priority list to input 2012-05-01 13:57:23 -04:00
Joel Thibault a7fe847faf Set the priority list and don't bother combining if not needed 2012-05-01 13:57:23 -04:00
Joel Thibault f739305f43 Combine the variants found at a location 2012-05-01 13:57:23 -04:00
Joel Thibault 020f884d5a Use new key of source ROD plus alleles 2012-05-01 13:57:23 -04:00
Joel Thibault 221ce9c3d6 Add alleles to the primary key 2012-05-01 13:57:23 -04:00
Joel Thibault 3198ce5471 Can have multiple variants at a location 2012-05-01 13:57:22 -04:00
Joel Thibault 11ed8e61c9 Add referenceBaseForIndel to the Mongo VariantContext objects 2012-05-01 13:53:44 -04:00
Joel Thibault 7ed0ee7ed0 Skip locations with no genotypes instead of throwing an NPE 2012-05-01 13:53:44 -04:00
Joel Thibault 4bdfeacdaa Handle multiple samples/genotypes per location
TODO: sample selection
2012-05-01 13:53:43 -04:00
Joel Thibault 1f7c628796 Insert the ROD filename into MongoDB as part of the primary key 2012-05-01 13:53:43 -04:00
Joel Thibault bb8a6e9b0a Initial test of write and read from MongoDB 2012-05-01 13:53:43 -04:00
Joel Thibault d93a413f2e Add MongoDB dependency 2012-05-01 13:53:43 -04:00
Mark DePristo 0cf3603c73 Merged bug fix from Stable into Unstable 2012-05-01 13:39:27 -04:00
Mark DePristo c2b74eca64 Remove unnecessary and obscure usage of old R 2012-05-01 13:39:09 -04:00
David Roazen c0084c741b Pilot BCF2 Implementation: Checkpointing the code
* Not working yet, still very much a work-in-progress with lots of placeholders
* Needed to check this in to enable possible collaboration, since it's
  going slower than anticipated and the conference deadline looms.
2012-05-01 12:23:10 -04:00
Eric Banks fdffe1d61b Merged bug fix from Stable into Unstable 2012-05-01 11:04:46 -04:00
Eric Banks 0c8e801021 Removing public to private dependency 2012-05-01 11:04:11 -04:00
Eric Banks e964d17518 Removing public to private dependency 2012-05-01 11:02:28 -04:00
Eric Banks ef082356e9 Merge remote-tracking branch 'unstable/master' 2012-05-01 08:47:08 -04:00
Mauricio Carneiro 462450c3e3 disabling all BQSR unit tests
With the changes to the cycle covariate, some tests need updates and others need to be completely rewritten.
2012-04-30 14:39:55 -04:00
Mauricio Carneiro 825ad30477 Adding readgroup filter option to BQSR queue script 2012-04-30 14:39:55 -04:00
Guillermo del Angel e185632013 Exhaustive unit tests for Pool SNP genotype likelihoods:
a) Add ability for ErrorModel to be specified by external log-probability vector for testing.
b) For a given depth and ploidy (=2*samples/pool), create an artificial high-quality pileup for testing from AC=0 to AC=ploidy, and test that the pool GLs have the expected content.
c) Misc. refactorings, cleanups and beautification.
2012-04-30 14:29:46 -04:00
Christopher Hartl 7d029b9a28 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-30 12:16:30 -04:00
Christopher Hartl 944a7d815e Bringing VQSRV3 up to date. Lots of new features (un-classifying the worst-performing training sites, treating the x% best/worst sites as positive/negative points, ability to pass in a monomorphic track to output ROC curves). Minor changes to AlleleBalance: the weighted average was incorrectly specified (using log scale actually biased the average towards the AB of low-quality genotypes), and AB is now broken out by het, hom, and diploid to bring it in line with some (private) changes to the indel likelihood model, which (correctly) computes these values for indels. 2012-04-28 11:31:03 -04:00
Ryan Poplin 54a9bc2da2 Bug fix in reverse trim alleles for the case of mixed records that become non-mixed after subsetting the alleles. 2012-04-28 09:12:26 -04:00
Ryan Poplin e332aeaf70 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-27 16:21:21 -04:00
Ryan Poplin 2b5dd28550 Bug fix in reverse trim alleles for the case of mixed records. 2012-04-27 16:21:02 -04:00
Mauricio Carneiro c2472b3c45 parallel BQSR implementation. 2012-04-27 15:18:08 -04:00
Mauricio Carneiro 1db2d1ba82 Do not add the first and last 4 cycles to the recalibration tables. 2012-04-27 15:18:07 -04:00
Mauricio Carneiro 08dbd756f3 Quick QC walkers to look at the error profile of indels in the read 2012-04-27 15:18:07 -04:00