Mark DePristo
ee2f12e2ac
Simpler naming convention for AlleleFrequencyCalculation => AFCalc
2012-10-15 07:53:55 -04:00
Mark DePristo
cf3f9d6ee8
Reorganize and cleanup AFCalculations
...
-- Now contained in a package called afcalc
-- Extracted standalone classes from private static classes in ExactAF
-- Most fields are now private, with accessors
-- Overall cleaner organization now
2012-10-15 07:53:55 -04:00
Mark DePristo
13211231c7
Restructure and cleanup ExactAFCalculations
...
-- Now there's no duplication between the old exact and constrained models. The behavior is controlled by an overloaded abstract function
-- No more static function to access the linear exact model -- you have to create the surrounding class. Updated code in the system
-- Everything passes unit tests
2012-10-15 07:53:54 -04:00
Mark DePristo
99ad7b2d71
GeneralPloidyExact should use indel max alt alleles
2012-10-15 07:53:54 -04:00
Mark DePristo
bf276baca0
Don't try to compute full exact model for > 100 samples
2012-10-15 07:53:54 -04:00
Mark DePristo
b924e9ebb4
Add OptimizedDiploidExactAF to PerformanceTesting framework
2012-10-15 07:53:54 -04:00
Mark DePristo
f800f3fb88
Optimized diploid exact AF calculation uses maxACs to stop the calculation at the max AC for each allele
...
-- Added unit tests to ensure the approximation isn't too far from our reference implementation (DiploidExactAFCalculation)
2012-10-15 07:53:54 -04:00
Mark DePristo
efad215edb
Greedy version of function to compute the max achievable AC for each alt allele
...
-- walks over the genotypes in VC, and computes for each alt allele the maximum AC we need to consider in that alt allele dimension. Does the calculation based on the PLs in each genotype g, choosing to update the max AC for the alt alleles corresponding to that PL. Only takes the first lowest PL, if there are multiple genotype configurations with the same PL value. It takes values in the order of the alt alleles.
2012-10-15 07:53:54 -04:00
Mark DePristo
7666a58773
Function to compute the max achievable AC for each alt allele
...
-- Additional minor cleanup of ExactAFCalculation
2012-10-15 07:53:53 -04:00
Mark DePristo
b3cb33a416
simple script to run nano schedule main[]
2012-10-15 07:52:02 -04:00
Eric Banks
a8efa5451a
Protect against bad bases when users have screwy data (or try to use zipped references)
2012-10-12 15:05:03 -04:00
David Roazen
da1cffbfca
Run performance tests in gsa-engineering queue on gsa4 rather than gsa queue
...
Running the performance tests on the farm wasn't working out very well --
it's been too long since they've run to completion. Switching back to
running them on gsa4 for now.
2012-10-12 14:21:27 -04:00
Guillermo del Angel
5971006678
Bug fix when running nondiploid mode in UG with EMIT_ALL_SITES: if site was reference-only, QUAL is produced OK but genotypes were being set to no-call because of unnecessary likelihood normalization. May change integration test md5 which I'll fix later today
2012-10-12 12:45:55 -04:00
Eric Banks
81532a0529
Missing files are user errors.
2012-10-12 09:48:12 -04:00
Eric Banks
fa77a83783
Update the out of space error to include another permutation
2012-10-12 09:38:12 -04:00
Eric Banks
85525d9e6e
Make Geraldine's life easier: from now on we treat problems where a temp file cannot be found when running the GATK with multiple threads as User Errors (since they are 99.9% of the time). This is an extremely large class of errors in Tableau and on the forums. Helpful error message tells users exactly what we tell them on the forums anyways (Geraldine: feel free to edit).
2012-10-12 09:19:50 -04:00
Eric Banks
ad60300bee
Catch malformed BAM files at the source since this is the largest class of errors in Tableau.
2012-10-12 09:07:57 -04:00
Eric Banks
593c8065d9
Fix docs for BadMateFilter
2012-10-12 08:35:45 -04:00
Christopher Hartl
6b9987cf1b
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2012-10-12 00:48:42 -04:00
Christopher Hartl
c1211ad3a1
Full test suite of LD-corrected GRM calculation. The correctness of this code is now largely verified. Matches GCTA when no correction is used (up to 6 decimal places). Bed reading relies on a particular test directory that is still local. The rest is all generated in unit test fashion.
2012-10-12 00:46:02 -04:00
David Roazen
3861212dab
Fix inefficiency in FilePointer GenomeLoc validation
...
Validation of GenomeLocs in the FilePointer class was extremely inefficient
when the GenomeLocs were added one at a time rather than all at once.
Appears to mostly fix GSA-604
2012-10-11 19:55:14 -04:00
Mark DePristo
9b19f5ce99
No longer include stack traces for user exceptions in GATK logs
...
-- Was taking a shockingly large amount of space on the server, and slowing down Tableau so much that all stack traces had to be disabled
2012-10-10 20:41:03 -04:00
Ryan Poplin
08b8ce6903
Fixing merge conflicts related to the comment formatting in the BQSR.
2012-10-10 16:03:58 -04:00
Ryan Poplin
45717349dc
Fixing BQSR bug reported on the forum for reads that begin with insertions.
2012-10-10 16:01:37 -04:00
David Roazen
40a3b5bfe2
Revert "Testing github auto-mirroring attempt #2 ; please ignore"
...
This reverts commit aacbe369446af8d7901820bf828ed15d72497005.
2012-10-10 15:28:50 -04:00
David Roazen
fba6a084e4
Testing github auto-mirroring attempt #2 ; please ignore
2012-10-10 15:28:13 -04:00
David Roazen
267d1ff59c
Revert "Testing the new github auto-mirroring; please ignore"
...
This reverts commit bd8b321132167f6f393f234ea0e93edcfd8701ff.
2012-10-10 15:07:48 -04:00
David Roazen
66ee3f230f
Testing the new github auto-mirroring; please ignore
2012-10-10 15:06:50 -04:00
Ryan Poplin
15b405d458
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-10 10:47:40 -04:00
Ryan Poplin
2a9ee89c19
Turning on allele trimming for the haplotype caller.
2012-10-10 10:47:26 -04:00
Christopher Hartl
7381d5c243
Since this GRM now matches GCTA output for uncorrected intervals, implement and start proofing methods for LD-correction for genome partitioning. Very rudimentary tests just to solidify current position.
...
Wish I could do this in the GATK, but it has to run on bed files natively. Phooey.
2012-10-10 01:59:13 -04:00
Khalid Shakir
f66284658d
RetryMemoryLimit now works with Scatter/Gather.
2012-10-09 21:51:03 -04:00
Johan Dahlberg
e9b9e2318c
Fixed SortSam bug, for .done file
...
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.
Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
Ryan Poplin
b543bddbb7
Fixing merge conflicts related to the comment formatting in the BQSR.
2012-10-08 10:23:08 -04:00
Ryan Poplin
b3cc04976f
Fixing BQSR bug reported on the forum for reads that begin with insertions.
2012-10-08 10:18:29 -04:00
Eric Banks
be9fcba546
Don't allow triggering of polyploid consensus creation in regions where there is more than one het, as it just doesn't work properly. We could probably refactor at some point to make it work, but it's not worth doing that now (especially as it should be rare to have multiple proximal known hets in a single sample exome).
2012-10-07 16:32:48 -04:00
Eric Banks
08ac80c080
RR bug: when the last base in the window around the polyploid consensus is filtered (low quality), the filtered consensus is not flushed and subsequent filtered bases (but importantly not contiguous to this one) are just added to this position. In other words, bases were being added to the wrong genomic positions. Fixed.
2012-10-07 10:52:01 -04:00
Eric Banks
36a26a7da6
md5s failed because I forgot to add --no_cmdline_in_header so it is different depending on where you run from. Fixed.
2012-10-07 08:35:55 -04:00
Eric Banks
a5aaa14aaa
Fix for GSA-601: Indels dropped during liftover. This was a true bug that was an effect of the switch over to the non-null representation of alleles in the VariantContext. Unfortunately, this tool didn't have integration tests - but it does now.
2012-10-07 01:19:52 -04:00
Eric Banks
82e40340c0
Use StringBuilder over StringBuffer
2012-10-07 00:02:15 -04:00
Eric Banks
5d6aad67e2
Fix for bug reported on forums: VariantsToTable does not handle lists and nested arrays correctly. Added an integration test to cover printing of PLs.
2012-10-07 00:01:27 -04:00
Eric Banks
e7798ddd2a
Fix for JIRA GSA-598: AD field not handled properly by CombineVariants. It was not handled by SelectVariants either. We now strip the AD field out whenever combining/selecting makes it invalid due to a change in the number of ALT alleles.
2012-10-06 23:02:36 -04:00
Eric Banks
bfc551f612
Fix for GSA-589: SelectVariants with -number gives biased results. The implementation was not good and it's not worth keeping this busted code around given that we have a working implementation of a fractional random sampling already in place, so I removed it.
2012-10-06 22:39:49 -04:00
Eric Banks
e8a6460a33
After merging with Yossi's fix I can confirm that the AD is fixed when going through the HC too. Added similar fixes to DP and FS annotations too.
2012-10-05 16:37:42 -04:00
Eric Banks
b7639d7ceb
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-05 16:21:17 -04:00
Eric Banks
52326942cf
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-10-05 16:15:07 -04:00
Eric Banks
04853252a0
Possible fix for reduced reads coming from the HaplotypeCaller in the AD
2012-10-05 16:15:04 -04:00
Yossi Farjoun
ef90beb827
- forgot to use git rm to delete a file from git. Now that VCF is deleted.
...
- uncommented a HC test that I missed.
2012-10-05 16:14:51 -04:00
Yossi Farjoun
6874a5ce76
This bam and bai are needed for testing the ADAnnotation tests (both UG and HC)
...
The vcf file was mistakenly added previously, now removed.
2012-10-05 16:10:41 -04:00
Yossi Farjoun
d419a33ed1
* Added an integration test for AD annotation in the Haplotype caller.
...
* Corrected FS Annotation for UG as for AD.
* HC still does not annotate ReducedReads correctly (for either FS or AD)
2012-10-05 15:23:59 -04:00