Mauricio Carneiro
cec7107762
Better location for the downsampling of reads in PrintReads
...
* using the filter() instead of map() makes for a cleaner walker.
* renaming the unit tests to make more sense with the other unit and integration tests
2012-01-14 14:06:09 -05:00
Mauricio Carneiro
28aa353501
Added "unbiased" downsampling parameter to PrintReads
...
* also cleaned up and updated part of the unit tests for print reads. Needs a more thorough cleaning.
2012-01-12 16:33:55 -05:00
Mauricio Carneiro
77a03c9709
Patching special case in the adaptor clipping
...
* if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, then let's not clip anything and consider the fragment to be undetermined for this read pair.
* updated md5's accordingly
2012-01-11 17:47:44 -05:00
Eric Banks
c5320ef1af
Resolving changes in integration test during merge
2012-01-10 12:14:16 -05:00
Eric Banks
0f36f6947e
Resolving merge conflicts
2012-01-10 11:44:16 -05:00
Eric Banks
f2cecce10f
Much better implementation of the approximate summing of an array of log10 values (including more efficient rounding). Now effectively takes 0% of UG runtime on T2D GENES (as opposed to 11% previously).
2012-01-10 11:34:23 -05:00
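For readers unfamiliar with the operation named in commit f2cecce10f, here is a minimal sketch of approximately summing an array of log10 values. It is not the implementation from that commit: the class name, method name, and the `MAX_DIFF` cutoff are assumptions chosen for illustration. The idea is to factor out the maximum and skip terms too small to affect the result.

```java
public final class Log10SumSketch {
    // Hypothetical cutoff: terms more than this many log10 units below the
    // maximum are treated as contributing nothing to the sum.
    private static final double MAX_DIFF = 8.0;

    /** Returns an approximation of log10(sum_i 10^log10Values[i]). */
    public static double approximateLog10Sum(double[] log10Values) {
        if (log10Values.length == 0) {
            return Double.NEGATIVE_INFINITY; // log10 of an empty sum (zero)
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every term is log10(0)
        }
        double sum = 0.0;
        for (double v : log10Values) {
            double diff = v - max;
            if (diff >= -MAX_DIFF) {
                sum += Math.pow(10.0, diff); // term is rescaled into (0, 1]
            }
        }
        return max + Math.log10(sum);
    }
}
```

Factoring out the maximum keeps every exponentiated term in (0, 1], which avoids underflow for very negative inputs; the cutoff trades a small amount of accuracy for fewer `Math.pow` calls.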
Mark DePristo
dd80ffbbbe
Merged bug fix from Stable into Unstable
2012-01-05 21:51:48 -05:00
Mark DePristo
c96fee477c
Bug fix for VariantSummary
...
-- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional. Fixed. Has the annoying side effect that indels > 50 bp in size are not counted as indels, and so are subtracted from both the novel and known counts for indels. C'est la vie
-- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels. Using this more recent and representative file is probably a good idea for future tests in VE and other tools. File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data
2012-01-05 21:51:06 -05:00
Guillermo del Angel
58d4539304
Enabled banded indel computation by default. Reversed logic in input UG argument so that we can still disable it if required. Minor changes to integration tests due to minor differences in GL's and in annotations
2012-01-04 15:28:26 -05:00
David Roazen
621ee2b613
Merged bug fix from Stable into Unstable
2012-01-03 16:56:49 -05:00
David Roazen
ea6e718cb8
SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline.
...
For now, we recommend only running with the GRCh37.64 database.
2012-01-03 15:18:36 -05:00
David Roazen
4984ca5e31
Merged bug fix from Stable into Unstable
2012-01-03 11:03:30 -05:00
David Roazen
f3f01da1af
Enforce serial dependencies in RecalibrationWalkersIntegrationTest
...
Some tests in this class were intermittently not being executed due
to being randomly scheduled before tests whose results they depend on.
Now the serial dependencies are enforced to avoid problematic orderings.
2012-01-03 10:42:41 -05:00
Eric Banks
ab8d47d9a5
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-03 09:38:49 -05:00
Mauricio Carneiro
1b6d52817e
fixing adaptor clipping effect on recalibration integration test
2012-01-01 22:20:06 -05:00
Eric Banks
393993e0c7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-31 20:42:46 -05:00
Mauricio Carneiro
55cfa76cf3
Updated integration tests for the new adaptor clipping fix.
2011-12-30 18:47:14 -05:00
Eric Banks
d20a25d681
A much better way of choosing the alternate allele(s) to genotype in the SNP model of UG: instead of looking at the sum of base qualities (which can and did lead to us over-genotyping esp. when allowing multiple alternate alleles), we look at the likelihoods themselves (free since we are already calculating likelihoods for all 10 genotypes). Now, even if the base quals exceed some arbitrary threshold, we only bother genotyping an alternate allele when there's a sample for which it is more likely than ref/ref (I can generate weird edge cases where this falls apart, but none that model truly variable sites that we actually want to call). This leads to a huge efficiency improvement esp. for exomes (and esp. for many samples) where we almost always were trying to genotype all 3 alternate alleles. Integration tests change only because ref calls have slight QUAL differences (because the best alt allele is still chosen arbitrarily, but differently).
2011-12-27 16:50:38 -05:00
David Roazen
506c0e9c97
Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline
...
SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released
and we've verified the quality of its annotations.
2011-12-23 19:12:57 -05:00
David Roazen
510c71158c
Merged bug fix from Stable into Unstable
2011-12-22 10:49:52 -05:00
David Roazen
32cdef9682
Rename *PerformanceTest test classes to *LargeScaleTest
...
This is in preparation for the installation of the new performance test suite in Bamboo.
Note that "ant performancetest" is now "ant largescaletest"
2011-12-22 10:38:49 -05:00
Mauricio Carneiro
731a463415
Updated IntegrationTests with new adaptor clipper
...
phew!
2011-12-20 17:48:52 -05:00
Laurent Francioli
16cc2b864e
- Corrected bug causing cases where both parents are HET to be counted twice in the TDT calculation - Adapted TDT integration test to the corrected version of TDT
...
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2011-12-19 10:30:59 -05:00
Eric Banks
3069a689fe
Bug fix: if there are multiple records at a given position, it turns out that SelectVariants would drop all variants that follow after one that fails filters (instead of dropping just the failing one). Added an integration test to cover this case.
2011-12-19 10:04:33 -05:00
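The behavior described in commit 3069a689fe (dropping only the failing record rather than everything after it) can be illustrated with a small sketch. The commit message does not say what caused the bug; an early exit from the per-position loop is one common way such a symptom arises, and that is purely an assumption here. The class and method names are hypothetical and do not come from SelectVariants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public final class FilterAtPositionSketch {
    /**
     * Keeps every record at a position that passes the filter.
     * A failing record is skipped on its own; later records are still examined.
     */
    public static <T> List<T> keepPassing(List<T> recordsAtPosition,
                                          Predicate<? super T> passes) {
        List<T> kept = new ArrayList<>();
        for (T record : recordsAtPosition) {
            if (!passes.test(record)) {
                continue; // a 'break' or 'return' here would drop all later records too
            }
            kept.add(record);
        }
        return kept;
    }
}
```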
Eric Banks
76bd13a1ed
Forgot to update the unit test
2011-12-18 01:13:49 -05:00
Eric Banks
c5ffe0ab04
No reason to sum the normalized posteriors array to get Pr(AF>0) given that we can just compute 1.0 - array[0]. Integration tests change only because of trivial precision artifacts for reference calls using EMIT_ALL_SITES.
2011-12-18 00:31:47 -05:00
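The shortcut in commit c5ffe0ab04 relies on the posteriors being normalized: if the entries sum to 1 and entry 0 is the posterior for AF = 0, then Pr(AF>0) is the complement of entry 0. The sketch below compares the two computations; the class and method names are assumptions made for illustration and are not taken from the commit.

```java
public final class NonRefProbabilitySketch {
    /** Pr(AF>0) by summing every entry except index 0 (the slower route). */
    public static double bySumming(double[] normalizedPosteriors) {
        double sum = 0.0;
        for (int i = 1; i < normalizedPosteriors.length; i++) {
            sum += normalizedPosteriors[i];
        }
        return sum;
    }

    /** Pr(AF>0) as the complement of the AF = 0 entry; valid only if the array sums to 1. */
    public static double byComplement(double[] normalizedPosteriors) {
        return 1.0 - normalizedPosteriors[0];
    }
}
```

The two results can differ in the last few bits of floating-point precision, which is consistent with the commit's note that the integration tests changed only through trivial precision artifacts.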
Eric Banks
6dc52d42bf
Implemented the proper QUAL calculation for multi-allelic calls. Integration tests pass except for the ones making multi-allelic calls (duh) and one of the SLOD tests (which used to print 0 when one of the LODs was NaN but now we just don't print the SB annotation for that record).
2011-12-18 00:01:42 -05:00
Eric Banks
4fddac9f22
Updating busted integration tests
2011-12-14 16:24:43 -05:00
Eric Banks
1e90d602a4
Optimization: cache up front the PL index to the pair of alleles it represents for all possible numbers of alternate alleles.
2011-12-14 13:38:20 -05:00
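As a rough illustration of the caching idea in commit 1e90d602a4, the sketch below precomputes a table from PL index to allele pair for a given number of alternate alleles. It assumes the genotype ordering defined in the VCF specification, where genotype j/k with j <= k sits at index k*(k+1)/2 + j; the ordering actually used by the code at the time of that commit is not stated in the log. The class and method names are hypothetical.

```java
public final class PLIndexCacheSketch {
    /**
     * Builds a lookup table: result[plIndex] = {j, k}, the allele indices
     * (0 = reference) of the diploid genotype stored at that PL index.
     * Assumes VCF-specification ordering: index(j/k) = k*(k+1)/2 + j for j <= k.
     */
    public static int[][] buildIndexToAllelePair(int numAltAlleles) {
        int numAlleles = numAltAlleles + 1; // alternates plus the reference
        int numGenotypes = numAlleles * (numAlleles + 1) / 2;
        int[][] pairs = new int[numGenotypes][];
        for (int k = 0; k < numAlleles; k++) {
            for (int j = 0; j <= k; j++) {
                pairs[k * (k + 1) / 2 + j] = new int[]{j, k};
            }
        }
        return pairs;
    }
}
```

Building one such table per supported number of alternate alleles at startup turns each later index-to-alleles conversion into an array lookup.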
Mauricio Carneiro
5cc1e72fdb
Parallelized SelectVariants
...
* can now use -nt with SelectVariants for significant speedup in large files
* added parallelization integration tests for SelectVariants
2011-12-12 18:41:14 -05:00
Laurent Francioli
7cf27bb66e
Updated md5sum for MendelianViolationEvaluator test to reflect the change in column alignment in VariantEval.
2011-12-12 12:22:43 +01:00
Laurent Francioli
025bdfe2cc
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-12 12:19:44 +01:00
Eric Banks
7b6338c742
Merge branch 'master' into trialleles
2011-12-11 00:28:46 -05:00
Eric Banks
7c4b9338ad
The old bi-allelic implementation of the Exact model has been completely deprecated - you can only use the multi-allelic implementation now.
2011-12-11 00:23:33 -05:00
Eric Banks
044f211a30
Don't collapse likelihoods over all alt alleles - that's just not right. For now, the QUAL is calculated for just the most likely of the alt alleles; I need to think about the right way to handle this properly.
2011-12-10 23:57:14 -05:00
Eric Banks
442ceb6ad9
The Exact model now computes both the likelihoods and posteriors (in separate arrays); likelihoods are used for assigning genotypes, not the posteriors.
2011-12-09 10:16:44 -05:00
Laurent Francioli
a79144f7db
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-09 15:57:24 +01:00
Laurent Francioli
72fbfba97d
Added UnitTests for getFamilies() and getChildrenWithParents()
2011-12-09 15:57:07 +01:00
Eric Banks
aa4a8c5303
No dynamic programming solution for assigning genotypes; just done greedily now. Fixed QualByDepth to skip no-call genotypes. No-calls are no longer given annotations (attributes).
2011-12-09 02:25:06 -05:00
Eric Banks
2fe50c64da
Updating md5s
2011-12-09 00:47:01 -05:00
Eric Banks
4aebe99445
Need to use longs for the set index (because we can run out of ints when there are too many alternate alleles). Integration tests now use the multiallelic implementation.
2011-12-08 15:31:02 -05:00
Mark DePristo
4055877708
Prints 0.0 TiTv not NaN when there are no variants
...
-- Updated md5
2011-12-07 12:07:54 -05:00
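The NaN mentioned in commit 4055877708 is what floating-point division yields for 0/0. A minimal sketch of the guard, assuming the ratio is computed from a transition count and a transversion count (the names below are hypothetical and not taken from the commit):

```java
public final class TiTvSketch {
    /**
     * Transition/transversion ratio that reports 0.0 instead of NaN
     * when no variants were counted at all.
     */
    public static double titvRatio(long nTransitions, long nTransversions) {
        if (nTransitions + nTransversions == 0) {
            return 0.0; // no variants: 0/0 would otherwise be NaN
        }
        // If only transversions are zero, this still yields Infinity;
        // the commit message does not address that case.
        return (double) nTransitions / nTransversions;
    }
}
```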
Eric Banks
79d18dc078
Fixing indexing bug on the ACsets. Added unit tests for the Exact model code.
2011-12-06 16:17:18 -05:00
Matt Hanna
f5b977fc88
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-06 10:11:35 -05:00
Matt Hanna
4001c22a11
Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup.
2011-12-06 10:10:38 -05:00
Khalid Shakir
677bea0abd
Right aligning GATKReport numeric columns and updated MD5s in tests.
...
PreQC parses file with spaces in sample names by using tabs only.
PostQC allows passing the file names for the evals so that flanks can be evaled.
BaseTest's network temp dir now adds the user name to the path so files aren't created in the root.
HybridSelectionPipeline:
- Updated to latest versions of reference data.
- Refactored Picard parsing code replacing YAML.
2011-12-05 23:22:15 -05:00
Eric Banks
29662be3d7
Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them).
2011-12-03 23:12:04 -05:00
Laurent Francioli
9574be0394
Updated MendelianViolationEvaluator integration test
2011-11-30 14:44:15 +01:00
Laurent Francioli
a4606f9cfe
Merge branch 'MendelianViolation'
...
Conflicts:
public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java
2011-11-30 11:13:15 +01:00
Laurent Francioli
7d58db626e
Added MendelianViolationEvaluator integration test
2011-11-30 10:09:20 +01:00