Joel Thibault
085588cb04
Not Nexus. Need new name. Navel?
2012-05-24 10:11:58 -04:00
Guillermo del Angel
43919078cd
Merged bug fix from Stable into Unstable
2012-05-23 21:21:01 -04:00
Guillermo del Angel
4bc04e2a9e
Correct way in which start/stop positions in a VC are computed when creating an indel VC. Old way was incorrect in case GENOTYPE_GIVEN_ALLELES was specified with a complex record. New way should work in general for all cases and is simpler.
2012-05-23 21:19:30 -04:00
Guillermo del Angel
7fe07a4ae6
Bug fix: prevent index out of bounds error if reference sample in pool caller has a call present at a site but genotype is a no-call allele
2012-05-22 21:06:53 -04:00
Joel Thibault
dad75babf1
Increase Queue memory limits to 16 GB
2012-05-22 10:50:47 -04:00
Joel Thibault
af3d73b884
Re-enable partitioning for Mongo reads (but not writes)
2012-05-22 10:50:47 -04:00
Ryan Poplin
692addb498
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-22 10:25:03 -04:00
Ryan Poplin
c3fb321014
Minor updates to pacbio data processing script to make it work with the latest bwa version/settings.
2012-05-22 10:24:45 -04:00
Christopher Hartl
d366cce714
Initial commit of a burden testing framework. Currently tests against only one phenotype and only one weighting function, but computes robust weighted dosages and calls into an R script that calculates both a direct glm LRT and an asymptotic normal p-value. Weights are currently read in from an external file (beta-values). Future work is to let these be calculated on the fly from e.g. annotation, potential impact, conservation, etc, and to enable multiple weighting schemes to be tested jointly for association against multiple phenotypes.
2012-05-21 16:56:32 -04:00
Ryan Poplin
08dfd6cab6
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-21 16:47:07 -04:00
Ryan Poplin
04000d920c
Bug fix in BadCigar read filter for index out of bounds exception when used with a bam file that contains unmapped reads.
2012-05-21 16:46:59 -04:00
Khalid Shakir
94cd4e6a7d
Updated WGP min confidence from 4 to 10 based on recommendations from depristo and ebanks.
2012-05-21 16:41:45 -04:00
Eric Banks
666862af19
Added @Hidden option for GSA production use to cap the max alleles for indels at a lower number than for SNPs
2012-05-21 16:03:29 -04:00
Khalid Shakir
e57cd78bba
Killed two more resource leakers that ignored requests to close wrapped file pointers, and added Unit Tests for each.
This bug occurs in any adapter/wrapper class that is passed a resource and whose close method ignores requests to close the wrapped resource, causing a leak when the adapter holds the only remaining reference to the resource.
Ex:
public Wrapper getNewWrapper(File path) {
    FileStream myStream = new FileStream(path); // This stream must be eventually closed.
    return new Wrapper(myStream);
}
public void close(Wrapper wrapper) {
    wrapper.close(); // If wrapper.close() does nothing, NO ONE else has a reference to close myStream.
}
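A minimal sketch of the fixed pattern, under stated assumptions: TrackedResource, ClosingWrapper and WrapperCloseDemo are hypothetical names made up for illustration and are not the classes touched by this commit. The only point shown is that the adapter's close() forwards to the resource it wraps.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a wrapped resource; it only records whether close() ran.
class TrackedResource implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical adapter that may hold the only reference to the resource it wraps.
class ClosingWrapper implements Closeable {
    private final Closeable wrapped;

    ClosingWrapper(Closeable wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void close() throws IOException {
        wrapped.close(); // forward the request instead of ignoring it
    }
}

class WrapperCloseDemo {
    public static void main(String[] args) throws IOException {
        TrackedResource resource = new TrackedResource();
        try (ClosingWrapper wrapper = new ClosingWrapper(resource)) {
            // work with the wrapper here
        }
        System.out.println("closed = " + resource.isClosed()); // prints "closed = true"
    }
}
```

A unit test for such a wrapper can follow the same shape: wrap a resource that records its state, close the wrapper, and check the resource.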
2012-05-21 15:41:56 -04:00
Eric Banks
7f5ec17d22
Fixed up the comments in the GATKReportTable code and added some sanity checks to make sure that the user doesn't inconsistently add rows and corresponding IDs to the table.
2012-05-21 14:16:13 -04:00
Joel Thibault
27c46b8071
Better matching and searching between sites and samples
2012-05-21 09:50:49 -04:00
Joel Thibault
8fb6fc9ff9
Contigs as blocks are too large for MongoDB documents
2012-05-21 09:50:49 -04:00
Eric Banks
c1c70f3b41
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-21 09:39:08 -04:00
Eric Banks
92d8aa3d4c
Don't exception out in these VE modules if the VCF has records that aren't just SNPs or indels
2012-05-21 09:38:52 -04:00
Guillermo del Angel
5cc9a12fbb
Fixed definition in VCF header for pool caller genotype parameters MLAC and MLAF
2012-05-19 14:53:37 -04:00
Eric Banks
3af3834d50
Fixing 2 bugs in the SAMRecord printing argument descriptor code (as reported by Kristian):
* For some reason, the original implementor decided to use Booleans instead of booleans and didn't always check for null so we'd occasionally get a NPE. Switched over to booleans.
* We'd also generate a NPE if SAMRecord writing specific arguments (e.g. --simplifyBAM) were used while writing to stdout.
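The first bug above comes down to auto-unboxing a null Boolean. A small sketch, assuming hypothetical holder classes (BoxedArgs, PrimitiveArgs and the simplify field are illustrative names, not the argument descriptor code itself):

```java
// Hypothetical argument holders; the field name is illustrative only.
class BoxedArgs {
    Boolean simplify; // boxed: stays null until someone assigns it
}

class PrimitiveArgs {
    boolean simplify; // primitive: defaults to false, can never be null
}

class BoxedFlagDemo {
    // Returns true if reading the boxed flag as a primitive throws a NullPointerException.
    static boolean unboxingThrows(BoxedArgs args) {
        try {
            boolean value = args.simplify; // auto-unboxing calls simplify.booleanValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingThrows(new BoxedArgs())); // prints "true"
        System.out.println(new PrimitiveArgs().simplify);    // prints "false"
    }
}
```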
2012-05-18 11:55:41 -04:00
Eric Banks
26968ae8eb
Forgot that the VCFStreamingIntegrationTest uses VE
2012-05-18 02:51:53 -04:00
Eric Banks
52c206d5db
Has anyone else ever noticed that the DiffEngine outputs were always doubled for some reason? That no longer happens with the new reports.
2012-05-18 02:32:20 -04:00
Eric Banks
03d40272c8
Removed old GATKReport code and moved the new stuff in its place.
2012-05-18 01:44:31 -04:00
Eric Banks
a26b04ba17
Extensive refactoring of the GATKReports. This was a beast.
The practical differences between version 1.0 and this one (v1.1) are:
* the underlying data structure now uses arrays instead of hashes, which should drastically reduce the memory overhead required to create large tables.
* no more primary keys; you can still create arbitrary IDs to index into rows, but there is no special cased primary key column in the table.
* no more dangerous/ugly table operations supported except to increment a cell's value (if an int) or to concatenate 2 tables.
Integration tests change because table headers are different.
Old classes are still lying around. Will clean those up in a subsequent commit.
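A rough sketch of the layout described above, not the GATKReportTable API: ArrayTableSketch and all of its methods are hypothetical. It assumes rows are stored as plain arrays, that an optional ID-to-row-index map replaces the old primary key, and that increment is the only in-place cell operation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical array-backed table; names and methods are illustrative only.
class ArrayTableSketch {
    private final int numColumns;
    private final List<Object[]> rows = new ArrayList<>();
    private final Map<Object, Integer> rowIndexById = new HashMap<>();

    ArrayTableSketch(int numColumns) {
        this.numColumns = numColumns;
    }

    // Returns the row index for an ID, appending an empty row the first time the ID is seen.
    int rowFor(Object id) {
        Integer index = rowIndexById.get(id);
        if (index == null) {
            index = rows.size();
            rows.add(new Object[numColumns]);
            rowIndexById.put(id, index);
        }
        return index;
    }

    // Returns the cell value, or null if the ID is unknown or the cell is unset.
    Object get(Object id, int column) {
        Integer index = rowIndexById.get(id);
        return index == null ? null : rows.get(index)[column];
    }

    // Adds 1 to an int cell, treating an unset cell as 0; rejects non-int cells.
    void increment(Object id, int column) {
        Object[] row = rows.get(rowFor(id));
        Object current = row[column];
        if (current != null && !(current instanceof Integer)) {
            throw new IllegalStateException("cell is not an int");
        }
        row[column] = (current == null ? 0 : (Integer) current) + 1;
    }

    public static void main(String[] args) {
        ArrayTableSketch table = new ArrayTableSketch(2);
        table.increment("rowA", 0);
        table.increment("rowA", 0);
        System.out.println(table.get("rowA", 0)); // prints "2"
    }
}
```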
2012-05-18 01:11:26 -04:00
Guillermo del Angel
5189b06468
New annotation for indels that describes whether they are STRs and, if so, their characteristics. If an indel is an STR, 3 fields are added to INFO: STR (boolean), RU = repeat unit (String), RPA = number of repetitions per allele. So, for example, if an ATATAT* context gets changed to ATAT and ATATATAT, then RU=AT and RPA=3,2,4. Will be made a standard annotation shortly. Added unit tests for new functionality. Pending: refactor VariantContextUtils.isRepeat() to unify code, and fix VariantEval functionality.
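To make the RU/RPA example concrete, here is a sketch that reproduces RU=AT and RPA=3,2,4 from the three context strings. It assumes each allele is given as its fully expanded repeat context; StrSketch and its methods are hypothetical and are not the annotation's actual implementation.

```java
import java.util.Arrays;

// Hypothetical helper; works on fully expanded allele contexts such as "ATATAT".
class StrSketch {
    // Counts how many back-to-back copies of unit appear at the start of seq.
    static int countLeadingRepeats(String seq, String unit) {
        if (unit.isEmpty()) {
            return 0; // guard: an empty unit would otherwise match forever
        }
        int count = 0;
        while (seq.startsWith(unit, count * unit.length())) {
            count++;
        }
        return count;
    }

    // Smallest unit whose repetition spells out the whole sequence, e.g. "ATATAT" -> "AT".
    static String repeatUnit(String seq) {
        for (int len = 1; len <= seq.length(); len++) {
            if (seq.length() % len != 0) {
                continue;
            }
            String unit = seq.substring(0, len);
            if (countLeadingRepeats(seq, unit) * len == seq.length()) {
                return unit;
            }
        }
        return seq; // only reached for the empty string
    }

    // Repeat count per allele, reference context first.
    static int[] repeatsPerAllele(String unit, String... alleles) {
        int[] counts = new int[alleles.length];
        for (int i = 0; i < alleles.length; i++) {
            counts[i] = countLeadingRepeats(alleles[i], unit);
        }
        return counts;
    }

    public static void main(String[] args) {
        String ru = repeatUnit("ATATAT");
        int[] rpa = repeatsPerAllele(ru, "ATATAT", "ATAT", "ATATATAT");
        System.out.println("RU=" + ru + " RPA=" + Arrays.toString(rpa)); // prints "RU=AT RPA=[3, 2, 4]"
    }
}
```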
2012-05-17 15:28:19 -04:00
David Roazen
9c6bccfd8b
build system overhaul
* Added support for a protected directory whose contents are only made public in binary form
* Simplified and reorganized build.xml to improve readability and maintainability
* build.xml now autodetects most build properties:
    - Includes private/protected if they exist
    - No more STING_BUILD_TYPE or specialized targets for public-only, etc.
* Build targets have changed! There are now two main build options:
    "ant"       build everything (GATK and Queue)
    "ant gatk"  build just the GATK
  It was too hard to build everything before -- now it is the default.
* To run tests with debugging, use -Dtest.debug=true -Dtest.debug.port=XXXX on the command line.
  Much better than the old comment/uncomment method!
2012-05-17 15:16:29 -04:00
Eric Banks
0f7c917e7a
Better error checking and messages for bad alleles
2012-05-17 13:36:42 -04:00
David Roazen
6967b3de6c
Downsampler: checking in the new standalone downsampler implementations without the engine modifications
Mauricio and I need to collaborate on downsampling for ReducedReads, so
for now I'm checking in my downsampler implementations without any of the
(still problematic) engine modifications.
2012-05-16 16:15:58 -04:00
Joel Thibault
b9a2e41c4b
Can't partition these in their current state
2012-05-15 13:42:42 -04:00
Joel Thibault
76905e9342
Minor cleanup before push
2012-05-15 13:34:49 -04:00
Joel Thibault
229d1aa904
Bjorn -> Nexus
2012-05-15 13:30:29 -04:00
Joel Thibault
ca39387ec7
Retrieve from the DB in block mode
Reorder query fields
Throw exception when DB data cannot be found
2012-05-15 13:29:45 -04:00
Joel Thibault
747e3f6c94
Modify the DB schema to use block writes
2012-05-15 13:29:07 -04:00
Eric Banks
d44886d9e8
Very naughty bug: VE output is not at all gatherable but no one told this to Queue. Fixed.
2012-05-15 10:29:04 -04:00
Eric Banks
819c3d0c15
Adding to the Hrun docs
2012-05-15 10:27:52 -04:00
Christopher Hartl
b16e169412
Variant Call QC was calling melt() without importing the reshape package. It's unclear how this ever worked...
2012-05-15 07:34:36 -04:00
Guillermo del Angel
2abd1e06cc
More fixes to prevent NPE in pool caller
2012-05-14 22:02:39 -04:00
Guillermo del Angel
7b81559a9b
One more pool caller bug fix: don't create output file for noisy simulation in unit test, or else previous results will be deleted
2012-05-14 16:27:38 -04:00
Guillermo del Angel
617ac0b88f
More pool caller bug fixes
2012-05-14 16:15:54 -04:00
Guillermo del Angel
578092b120
Pool caller bug fixes: avoid NPE in null tracker positions, fix so that we can (in theory) use Pool AF with non-pool GL model for testing
2012-05-14 15:03:53 -04:00
Guillermo del Angel
5fc3adbb04
One more VariantsToTable bug fix
2012-05-14 14:10:07 -04:00
Guillermo del Angel
04d691f04a
Forgot to update MD5's due to new Exact AF model in pool caller (all changes legit, minor QUAL/QD/SB differences). Fixed bug in VariantsToTable from previous commit
2012-05-14 14:01:29 -04:00
Guillermo del Angel
ae26f0fe14
a) Fully functional and working multiallelic exact model for pools. Needs cleanup/more testing. b) Better unit test for pool genotype likelihoods: it now optionally generates actual noisy pileups that can be used for assessing GL accuracy. c) Totally experimental, hidden option in VariantsToTable to output genotype fields. Specifying -GF will output columns of the form Sample.FieldName; this also needs more testing.
2012-05-14 10:55:35 -04:00
Guillermo del Angel
67e5c3ff9f
Solved major scalability problem in pool caller - exact model may have been linear but computing pool GL's was O(n^p) where p was max # of alleles (4 in SNP discovery mode). Linearized approach follows exact AF model with queue of AC conformations to add - may refactor code to eliminate duplication later, as linear multiallelic pool AF model will use same approach. TBD: how to print PL's with -Infinity value, right now since we never cap PL printing we end up with big nonsense numbers in those positions and vcf's look ugly. Calling MT in CEU trio with pool size = 100 goes from 2 days to 55 minutes (sic)
2012-05-11 10:05:09 -04:00
Guillermo del Angel
9acef4b206
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-10 16:00:58 -04:00
Guillermo del Angel
da6f16986e
Preparatory refactorings for pool indel calling and for optimizations: restructure code in PoolSNPGenotypeLikelihoods that will be shared with indels, and make it easier to rewrite once the optimized version that's linear in pool size is ready (the current version is linear in the number of pools but not yet in pool size).
2012-05-10 16:00:37 -04:00
Ryan Poplin
c9dd0f3173
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-10 13:09:10 -04:00
Ryan Poplin
0cdadffe14
Committing the best of the frantic pre-CSHL experiments: Better algorithm for partitioning reads amongst the alleles they support. Require the read's original alignment to actually overlap the variant. QD uses the non-informative reads when calculating D. More HC-specific annotations for potential use in a statistical filtering strategy. Increasing the minimum kmer length in the assembly graphs. Misc minor bug fixes.
2012-05-10 13:09:03 -04:00
Guillermo del Angel
89f8a6b2e6
Revert bad part of last commit that shouldn't have been pushed
2012-05-10 10:41:08 -04:00