Commit Graph

9558 Commits (085588cb043fd1befdec32e5e42d4257dc8d48ae)

Author SHA1 Message Date
Joel Thibault 085588cb04 Not Nexus. Need new name. Navel? 2012-05-24 10:11:58 -04:00
Guillermo del Angel 43919078cd Merged bug fix from Stable into Unstable 2012-05-23 21:21:01 -04:00
Guillermo del Angel 4bc04e2a9e Correct way in which start/stop positions in a VC are computed when creating an indel VC. Old way was incorrect in case GENOTYPE_GIVEN_ALLELES was specified with a complex record. New way should work in general for all cases and is simpler. 2012-05-23 21:19:30 -04:00
Guillermo del Angel 7fe07a4ae6 Bug fix: prevent index out of bounds error if reference sample in pool caller has a call present at a site but genotype is a no-call allele 2012-05-22 21:06:53 -04:00
Joel Thibault dad75babf1 Increase Queue memory limits to 16 GB 2012-05-22 10:50:47 -04:00
Joel Thibault af3d73b884 Re-enable partitioning for Mongo reads (but not writes) 2012-05-22 10:50:47 -04:00
Ryan Poplin 692addb498 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-22 10:25:03 -04:00
Ryan Poplin c3fb321014 Minor updates to pacbio data processing script to make it work with the latest bwa version/settings. 2012-05-22 10:24:45 -04:00
Christopher Hartl d366cce714 Initial commit of a burden testing framework. Currently tests against only one phenotype and only one weighting function, but computes robust weighted dosages and calls into an R script that calculates both a direct glm LRT and an asymptotic normal p-value. Weights are currently read in from an external file (beta-values). Future work is to let these be calculated on the fly from e.g. annotation, potential impact, conservation, etc., and to enable multiple weighting schemes tested jointly for association against multiple phenotypes. 2012-05-21 16:56:32 -04:00
Ryan Poplin 08dfd6cab6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-21 16:47:07 -04:00
Ryan Poplin 04000d920c Bug fix in BadCigar read filter for index out of bounds exception when used with a bam file that contains unmapped reads. 2012-05-21 16:46:59 -04:00
Khalid Shakir 94cd4e6a7d Updated WGP min confidence from 4 to 10 based on recommendations from depristo and ebanks. 2012-05-21 16:41:45 -04:00
Eric Banks 666862af19 Added @Hidden option for GSA production use to cap the max alleles for indels at a lower number than for SNPs 2012-05-21 16:03:29 -04:00
Khalid Shakir e57cd78bba Killed two more resource leakers that ignored requests to close wrapped file pointers, and added Unit Tests for each.
This bug will happen in all adapter/wrapper classes that are passed a resource, and then in their close method they ignore requests to close the wrapped resource, causing a leak when the adapter is the only one left with a reference to the resource.

Ex:

public Wrapper getNewWrapper(File path) {
  FileStream myStream = new FileStream(path); // This stream must be eventually closed.
  return new Wrapper(myStream);
}

public void close(Wrapper wrapper) {
  wrapper.close(); // If wrapper.close() does nothing, NO ONE else has a reference to close myStream.
}
2012-05-21 15:41:56 -04:00
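The leak described in commit e57cd78bba can be reproduced in a few lines. The following is a minimal sketch, not the code touched by that commit: `TrackedResource`, `LeakyWrapper`, and `ClosingWrapper` are hypothetical names introduced only for illustration. It contrasts an adapter whose `close()` does nothing with one that delegates to the wrapped resource.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a file stream; it only records whether close() was called.
final class TrackedResource implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Leaky adapter (hypothetical): close() ignores the wrapped resource.
final class LeakyWrapper implements Closeable {
    private final Closeable wrapped;

    LeakyWrapper(Closeable wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void close() {
        // Does nothing. If this wrapper holds the last reference,
        // the wrapped resource is never closed.
    }
}

// Fixed adapter (hypothetical): close() delegates to the wrapped resource.
final class ClosingWrapper implements Closeable {
    private final Closeable wrapped;

    ClosingWrapper(Closeable wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void close() {
        try {
            wrapped.close();
        } catch (IOException e) {
            // Rethrow unchecked so callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

Closing a `LeakyWrapper` leaves its `TrackedResource` open, while closing a `ClosingWrapper` closes it. A unit test for such an adapter would presumably check exactly that kind of behaviour.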
Eric Banks 7f5ec17d22 Fixed up the comments in the GATKReportTable code and added some sanity checks to make sure that the user doesn't inconsistently add rows and corresponding IDs to the table. 2012-05-21 14:16:13 -04:00
Joel Thibault 27c46b8071 Better matching and searching between sites and samples 2012-05-21 09:50:49 -04:00
Joel Thibault 8fb6fc9ff9 Contigs as blocks are too large for MongoDB documents 2012-05-21 09:50:49 -04:00
Eric Banks c1c70f3b41 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-21 09:39:08 -04:00
Eric Banks 92d8aa3d4c Don't exception out in these VE modules if the VCF has records that aren't just SNPs or indels 2012-05-21 09:38:52 -04:00
Guillermo del Angel 5cc9a12fbb Fixed definition in VCF header for pool caller genotype parameters MLAC and MLAF 2012-05-19 14:53:37 -04:00
Eric Banks 3af3834d50 Fixing 2 bugs in the SAMRecord printing argument descriptor code (as reported by Kristian):
* For some reason, the original implementor decided to use Booleans instead of booleans and didn't always check for null so we'd occasionally get a NPE.  Switched over to booleans.
* We'd also generate a NPE if SAMRecord writing specific arguments (e.g. --simplifyBAM) were used while writing to stdout.
2012-05-18 11:55:41 -04:00
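The first bug in commit 3af3834d50 is the usual auto-unboxing pitfall: a `Boolean` field that was never set is `null`, and using it where a `boolean` is expected throws a NullPointerException. The sketch below is illustrative only. `WriterArgsSketch` and its field names are hypothetical and are not the GATK argument descriptor classes.

```java
// Hypothetical argument holder; names are illustrative, not taken from the GATK source.
final class WriterArgsSketch {
    Boolean simplifyBoxed;      // boxed: defaults to null when the argument is never set
    boolean simplifyPrimitive;  // primitive: defaults to false

    // Auto-unboxing a null Boolean throws NullPointerException.
    static boolean unboxUnchecked(Boolean flag) {
        return flag;
    }

    // Null-safe alternative when a boxed Boolean cannot be avoided.
    static boolean unboxChecked(Boolean flag) {
        return flag != null && flag;
    }
}
```

Switching the field to a primitive `boolean`, as the commit message describes, removes the `null` state entirely, so no null check is needed at each use.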
Eric Banks 26968ae8eb Forgot that the VCFStreamingIntegrationTest uses VE 2012-05-18 02:51:53 -04:00
Eric Banks 52c206d5db Has anyone else ever noticed that the DiffEngine outputs were always doubled for some reason? That no longer happens with the new reports. 2012-05-18 02:32:20 -04:00
Eric Banks 03d40272c8 Removed old GATKReport code and moved the new stuff in its place. 2012-05-18 01:44:31 -04:00
Eric Banks a26b04ba17 Extensive refactoring of the GATKReports. This was a beast.
The practical differences between version 1.0 and this one (v1.1) are:

* the underlying data structure now uses arrays instead of hashes, which should drastically reduce the memory overhead required to create large tables.
* no more primary keys; you can still create arbitrary IDs to index into rows, but there is no special cased primary key column in the table.
* no more dangerous/ugly table operations supported except to increment a cell's value (if an int) or to concatenate 2 tables.

Integration tests change because table headers are different.
Old classes are still lying around.  Will clean those up in a subsequent commit.
2012-05-18 01:11:26 -04:00
Guillermo del Angel 5189b06468 New annotation for indels that describes whether they are STRs and their characteristics. If an indel is an STR, 3 fields are added to INFO: STR (boolean), RU = repeat unit (String), RPA = number of repetitions per allele. So, for example, if an ATATAT* context gets changed to ATAT and ATATATAT, then RU=AT and RPA=3,2,4. Will be made a standard annotation shortly. Added unit tests for new functionality. Pending: refactor VariantContextUtils.isRepeat() to unify code, and fix VariantEval functionality. 2012-05-17 15:28:19 -04:00
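The RU/RPA example in commit 5189b06468 can be worked through with a small sketch. This is a simplified illustration under stated assumptions, not the logic of VariantContextUtils.isRepeat(): it treats each allele context as a plain string, takes the shortest unit that tiles the reference context as RU, and counts leading copies of that unit in each allele for RPA. `StrAnnotationSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a simplification of the annotation described in the commit message.
final class StrAnnotationSketch {

    // Counts consecutive copies of unit at the start of seq.
    static int countRepeats(String seq, String unit) {
        if (unit.isEmpty()) {
            throw new IllegalArgumentException("repeat unit must be non-empty");
        }
        int n = 0;
        while (seq.startsWith(unit, n * unit.length())) {
            n++;
        }
        return n;
    }

    // Returns the shortest unit whose repetition reproduces seq exactly,
    // or seq itself when no shorter unit tiles it.
    static String repeatUnit(String seq) {
        for (int len = 1; len <= seq.length() / 2; len++) {
            if (seq.length() % len != 0) {
                continue;
            }
            String unit = seq.substring(0, len);
            if (countRepeats(seq, unit) * len == seq.length()) {
                return unit;
            }
        }
        return seq;
    }

    // Repeat counts for the reference context followed by each alternate context.
    static List<Integer> repeatsPerAllele(String unit, String... alleles) {
        List<Integer> counts = new ArrayList<>();
        for (String allele : alleles) {
            counts.add(countRepeats(allele, unit));
        }
        return counts;
    }
}
```

With the contexts from the commit message, `repeatUnit("ATATAT")` yields `AT`, and `repeatsPerAllele("AT", "ATATAT", "ATAT", "ATATATAT")` yields the counts 3, 2, 4, matching RU=AT and RPA=3,2,4.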
David Roazen 9c6bccfd8b build system overhaul
* Added support for a protected directory whose contents are only made public in binary form

* Simplified and reorganized build.xml to improve readability and maintainability

* build.xml now autodetects most build properties:
    -Includes private/protected if they exist
    -No more STING_BUILD_TYPE or specialized targets for public-only, etc.

* Build targets have changed! There are now two main build options:

"ant"       build everything (GATK and Queue)
"ant gatk"  build just the GATK

It was too hard to build everything before -- now it is the default.

* To run tests with debugging, use -Dtest.debug=true -Dtest.debug.port=XXXX on the command line.
  Much better than the old comment/uncomment method!
2012-05-17 15:16:29 -04:00
Eric Banks 0f7c917e7a Better error checking and messages for bad alleles 2012-05-17 13:36:42 -04:00
David Roazen 6967b3de6c Downsampler: checking in the new standalone downsampler implementations without the engine modifications
Mauricio and I need to collaborate on downsampling for ReducedReads, so
for now I'm checking in my downsampler implementations without any of the
(still problematic) engine modifications.
2012-05-16 16:15:58 -04:00
Joel Thibault b9a2e41c4b Can't partition these in their current state 2012-05-15 13:42:42 -04:00
Joel Thibault 76905e9342 Minor cleanup before push 2012-05-15 13:34:49 -04:00
Joel Thibault 229d1aa904 Bjorn -> Nexus 2012-05-15 13:30:29 -04:00
Joel Thibault ca39387ec7 Retrieve from the DB in block mode
Reorder query fields
Throw exception when DB data cannot be found
2012-05-15 13:29:45 -04:00
Joel Thibault 747e3f6c94 Modify the DB schema to use block writes 2012-05-15 13:29:07 -04:00
Eric Banks d44886d9e8 Very naughty bug: VE output is not at all gatherable but no one told this to Queue. Fixed. 2012-05-15 10:29:04 -04:00
Eric Banks 819c3d0c15 Adding to the Hrun docs 2012-05-15 10:27:52 -04:00
Christopher Hartl b16e169412 Variant Call QC was calling melt() without importing the reshape package. It's unclear how this ever worked... 2012-05-15 07:34:36 -04:00
Guillermo del Angel 2abd1e06cc More fixes to prevent NPE in pool caller 2012-05-14 22:02:39 -04:00
Guillermo del Angel 7b81559a9b One more pool caller bug fix: don't create output file for noisy simulation in unit test, or else previous results will be deleted 2012-05-14 16:27:38 -04:00
Guillermo del Angel 617ac0b88f More pool caller bug fixes 2012-05-14 16:15:54 -04:00
Guillermo del Angel 578092b120 Pool caller bug fixes: avoid NPE in null tracker positions, fix so that we can (in theory) use Pool AF with non-pool GL model for testing 2012-05-14 15:03:53 -04:00
Guillermo del Angel 5fc3adbb04 One more VariantsToTable bug fix 2012-05-14 14:10:07 -04:00
Guillermo del Angel 04d691f04a Forgot to update MD5's due to new Exact AF model in pool caller (all changes legit, minor QUAL/QD/SB differences). Fixed bug in VariantsToTable from previous commit 2012-05-14 14:01:29 -04:00
Guillermo del Angel ae26f0fe14 a) Fully functional and working multiallelic exact model for pools. Needs cleanup/more testing. b) Better unit test for pool genotype likelihoods: it now optionally generates actual noisy pileups that can be used for assessing GL accuracy. c) Totally experimental, hidden option in VariantsToTable to output genotype fields. Specifying -GF will output columns of the form Sample.FieldName; this also needs more testing. 2012-05-14 10:55:35 -04:00
Guillermo del Angel 67e5c3ff9f Solved major scalability problem in pool caller - exact model may have been linear but computing pool GL's was O(n^p) where p was max # of alleles (4 in SNP discovery mode). Linearized approach follows exact AF model with queue of AC conformations to add - may refactor code to eliminate duplication later, as linear multiallelic pool AF model will use same approach. TBD: how to print PL's with -Infinity value, right now since we never cap PL printing we end up with big nonsense numbers in those positions and vcf's look ugly. Calling MT in CEU trio with pool size = 100 goes from 2 days to 55 minutes (sic) 2012-05-11 10:05:09 -04:00
Guillermo del Angel 9acef4b206 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-10 16:00:58 -04:00
Guillermo del Angel da6f16986e Preparatory refactorings for pool indel calling and for optimizations: restructure code in PoolSNPGenotypeLikelihoods that will be shared with indels, and make it easier to rewrite when the optimized version that's linear in pool size is ready (current version is linear in the number of pools but not yet in pool size). 2012-05-10 16:00:37 -04:00
Ryan Poplin c9dd0f3173 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-10 13:09:10 -04:00
Ryan Poplin 0cdadffe14 Committing the best of the frantic pre-CSHL experiments: Better algorithm for partitioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes. 2012-05-10 13:09:03 -04:00
Guillermo del Angel 89f8a6b2e6 Revert bad part of last commit that shouldn't have been pushed 2012-05-10 10:41:08 -04:00