11158 Commits (4ced2e4ffc7d457cb9a8aad4c4aa2cb3cd3fb705)

Author SHA1 Message Date
Guillermo del Angel 4ced2e4ffc Merge branch 'develop' of github.com:broadinstitute/cmi-gatk into develop 2012-12-03 20:14:43 -05:00
Guillermo del Angel c2c6b858e3 Better checks/more flexibility in fastq2bam parsing. Immediate benefit: we can now process normal-only samples, and metadata should be able to specify tumor/normal pairs in any order. Hard-coded hacks removed. DEV-134 #resolve #time 3m 2012-12-03 20:14:37 -05:00
Douglas Voet e1b5b562eb fix TrivalTask compile issues 2012-11-30 10:48:38 -05:00
Douglas Voet 3db877f4f2 Merge branch 'develop' of https://github.com/broadinstitute/cmi-gatk into develop 2012-11-30 10:47:08 -05:00
kshakir a6c1fcd151 Removed default use of @Output syntax.
If compile completes for QScripts, sending runtime errors during execute.
2012-11-29 13:40:36 -05:00
Guillermo del Angel cf56ca3bc9 Call at known sites (initially from 1000G) at end of fastq2bam. DEV-272 #resolve #time 2m 2012-11-28 11:15:16 -05:00
Mauricio Carneiro c0a1d2fe31 Rolling back merge changes that undid Doug's commits. 2012-11-27 16:01:03 -05:00
Mauricio Carneiro 97fd5de260 Merging latest CMI updates with UNSTABLE 2012-11-27 09:08:00 -05:00
Eric Banks b1969a66bd Update docs 2012-11-27 08:24:41 -05:00
Eric Banks cc72aaefeb Minor efficiency: use >= instead of > in test 2012-11-27 01:11:23 -05:00
Eric Banks 405f3c675d Fix for GSA-649: GenomeLocSortedSet.overlaps is crazy slow. Also improved GenomeLocSortedSet.sizeBeforeLoc. 2012-11-27 01:07:00 -05:00
Ryan Poplin e27d677c13 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-26 12:20:32 -05:00
Ryan Poplin 59cef880d1 Updating HC integration tests because experimental, HC-specific annotations have been removed. 2012-11-26 12:20:07 -05:00
Ryan Poplin c3b7dd1374 Misc cleanup in the HaplotypeCaller. Cleaning up unused arguments after recent changes to HC-GenotypingEngine 2012-11-26 12:19:11 -05:00
Eric Banks 4f7fa3009a I forget why I thought that the VariantAnnotator couldn't run multi-threaded because it works just fine. Now you can specify -nt with VA. 2012-11-26 11:34:59 -05:00
Mauricio Carneiro c0261f75ce Merging master and develop together
(because I forgot to do so when I merged on Nov 14th; develop now has a few extra commits not present in master).
2012-11-26 11:31:47 -05:00
Mauricio Carneiro a3f5932501 Fixed null pointer exception in Integration Tests
When running Utils.setupWriter with NO_PG_TAG set, the writer was attempting to create a program record with the null pointer. Fixed.
2012-11-26 11:12:27 -05:00
Eric Banks b15b62157a Use correct path in imports 2012-11-26 10:09:13 -05:00
Menachem Fromer 3784bb5258 Fixes to process all SNPs and indels simultaneously (even those at same site) 2012-11-26 03:59:36 -05:00
Ryan Poplin fedc4fde6c Merged bug fix from Stable into Unstable 2012-11-25 21:55:55 -05:00
Ryan Poplin d978cfe835 Soft clipped bases shouldn't be counted in the delocalized BQSR. 2012-11-25 21:55:29 -05:00
Eric Banks 9719ba7adc Remove -number example from the docs since it's no longer supported. 2012-11-22 21:53:42 -05:00
Menachem Fromer 2306518ab6 Fix to deal with 'proper' options of casting 2012-11-22 01:45:18 -05:00
Menachem Fromer d33a412b5f Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-22 01:42:29 -05:00
Mark DePristo 48f271c5bd Adding 80% support for multi-allelic variants
-- Multi-allelic variants are split into their bi-allelic versions and trimmed, and we attempt to provide a meaningful genotype for NA12878 here.  It's not perfect and needs some discussion on how to handle het/alt variants
-- Adding splitInBiallelic function to VariantContextUtils as well as extensive unit tests that also indirectly test reverseTrimAlleles (which worked perfectly FYI)
2012-11-21 17:24:59 -05:00
Mark DePristo e14bfa9f5c Update reviews.vcf to have genotypes, and downstream changes to tools to support this 2012-11-21 17:24:59 -05:00
Mark DePristo 5d2ee32936 Documentation and validation for NA12878 KB tools
-- MongoVariantContexts and MongoGenotype have a validate() function that ensures that the information is consistent, in anticipation of potential problems with the data coming in from reviews via IGV
-- Divide the world into production, development, and test DB, via the NA12878DBArgumentCollection
2012-11-21 17:24:59 -05:00
Joel Thibault c68bc95db6 Initial read mapping tests
- Failing tests are commented out
2012-11-21 17:16:46 -05:00
Joel Thibault 3ad9128800 Add some reads
- Move intervals and reads to init
- Update intervals and reads
2012-11-21 17:16:46 -05:00
Joel Thibault 3fa3b00f4a Add ActiveRegion tests and refactor 2012-11-21 17:16:45 -05:00
Joel Thibault e8defcb20d Test multiple bases and intervals 2012-11-21 17:16:45 -05:00
Joel Thibault c08b782743 Count isActive calls directly 2012-11-21 17:16:45 -05:00
Eric Banks 7580a487f3 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-21 16:20:49 -05:00
Eric Banks 4f2229d399 As per the TODO message, I removed a check that was no longer necessary. Now ID is an allowable INFO field key. 2012-11-21 16:01:26 -05:00
Menachem Fromer a8c7edca05 Fixed fragment handling in DepthOfCoverage 2012-11-21 16:01:10 -05:00
Menachem Fromer 06261b58c2 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-21 15:57:08 -05:00
Eric Banks ed50814ccb Finally found a case where user errors were being masked behind other errors and could debug. It turns out that the checkForMaskedUserErrors() method needs to run recursively over all levels (calling exception.getCause()) to check for the original cause. 2012-11-21 15:57:05 -05:00
Menachem Fromer c8be7c3102 Keep SNPs and indels separately for batch merging; Add options to DepthOfCoverage to count fragments (to not double-count overlapping reads of same fragment); DepthOfCoverage should now support ReducedReads; Replace recursion with loop in DoC/package.scala (for lists longer than 5000 elements) 2012-11-21 15:56:53 -05:00
Ryan Poplin 17d1c9ed53 Updating NA12878 reviews with Mark. 2012-11-21 12:47:04 -05:00
Douglas Voet d3817e789c made dedup setup not intermediate 2012-11-21 09:07:08 -05:00
Eric Banks 2e1a055aca Merged bug fix from Stable into Unstable 2012-11-20 23:20:33 -05:00
Eric Banks c54fc94505 Protect against features that start off the end of the read (otherwise, Arrays.fill fails) 2012-11-20 23:19:59 -05:00
Eric Banks c2efb04657 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-20 22:43:15 -05:00
Eric Banks 72e2d569c5 The user can now set the maximum allowable cycle on the command line with --maximum_cycle_value. This value is (now) enforced in the Cycle covariate and a User Error is thrown if the maximum value is exceeded (with a helpful error message). Added unit tests to cover this new functionality. 2012-11-20 22:41:57 -05:00
Eric Banks ff87642a91 Enable cycle covariate unit tests 2012-11-20 22:29:56 -05:00
Mark DePristo cc7680e601 NA12878 knowledge base backed by MongoDB
-- Idea is simply to create a persistent database of all TP/FP sites on chr20 in NA12878.  Individual callsets can be imported, and a consensus algorithm is run over all callsets in the database to create a consensus collection, which can be used to assess NA12878 callsets for GATK and methods development
-- Framework for representing simple VariantContexts and Genotypes in MongoDB, querying for records, and iterating over them in the GATK
-- Not hooked up to Tribble, but could be done reasonably easily now (future TODO)
-- Tools to import callsets, create consensus callsets, import and export reviews
-- Scripts to reset the knowledge base and repopulate it with the standard data files (Eric will expand)
-- Actually scales to all of chr20, includes AssessNA12878 that reads a VCF and itemizes it against the truth data set
-- ImportCallset can load OMNI, HM3, CEU best practices, mills/devine sites and genotypes, properly marking sites as poly/mono/unk as well as TP/FP/UNK based on command line parameters
-- Added shell scripts that start up a local mongo db, that connect to a local or BI hosted mongo for NA12878.db for debugging, and a setupNA12878db script that can load OMNI, HM3, CEU best practices, Mills/Devine into the db and then update the consensus.
-- Reviewed sites can be exported to a VCF, and imported again, as a mechanism to safely store the only non-recoverable data from the Mongo DB.
-- Created a NA12878DBWalker that manages the outer DB interaction, and that all MongoDB interacting walkers inherit from.  Added a NA12878DBArgumentCollection.java consolidating all of the common command line arguments (though not strictly necessary as all of this occurs in the root walker)

UnitTests
-- Can connect to a test knowledge base for development and unit testing
-- PolymorphicStatus, TruthStatus, SiteIterator
-- NA12878KBUnitTestBase provides simple utilities for connecting to the test mongo db, getting calls, etc
-- MongoVariantContext tests creation, matching, and encoding -> writing -> reading -> decoding from the mongodb

AssessNA12878
-- Generic tool for comparing a NA12878 callset against the knowledge base.  See http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for detailed documentation
-- Performs trivial filtering on FS, MQ, QD for SNPs and non-SNPs to separate out variants likely to be filtered from those that are honest-to-goodness FPs

Misc
-- Ability to provide Description for Simplified GATK report
2012-11-20 18:50:52 -05:00
Eric Banks 7dadbae068 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-20 16:13:34 -05:00
Eric Banks 937ac7290f Lots more GGA fixes for the HC now that I understand what's going on internally. Integration tests pass except for the GGA test which I believe now produces better results. 2012-11-20 16:13:29 -05:00
Menachem Fromer 8376c28728 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-20 12:35:39 -05:00
Menachem Fromer d2a6e2526d Fixed Input vs. Argument 2012-11-20 12:34:47 -05:00