Eric Banks
0c8e801021
Removing public to private dependency
2012-05-01 11:04:11 -04:00
Eric Banks
ef082356e9
Merge remote-tracking branch 'unstable/master'
2012-05-01 08:47:08 -04:00
Mauricio Carneiro
462450c3e3
disabling all BQSR unit tests
With the changes to the cycle covariate, some tests need updating and others need to be completely rewritten.
2012-04-30 14:39:55 -04:00
Mauricio Carneiro
825ad30477
Adding readgroup filter option to BQSR queue script
2012-04-30 14:39:55 -04:00
Guillermo del Angel
e185632013
Exhaustive unit tests for Pool SNP genotype likelihoods:
a) Add the ability to specify the ErrorModel with an external log-probability vector for testing.
b) For a given depth and ploidy (= 2 * samples/pool), create artificial high-quality pileups covering AC=0 through AC=ploidy, and test that the pool GLs have the expected content.
c) Misc. cleanups and beautification.
2012-04-30 14:29:46 -04:00
Christopher Hartl
7d029b9a28
Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-30 12:16:30 -04:00
Christopher Hartl
944a7d815e
Bringing VQSRV3 up to date. Lots of new features: un-classifying the worst-performing training sites, treating the x% best/worst sites as positive/negative points, and the ability to pass in a monomorphic track to get ROC curve output. Minor changes to AlleleBalance: the weighted average was incorrectly specified (using log scale actually biased the average toward the AB of low-quality genotypes), and AB is now broken out by het, hom, and diploid to bring it in line with some (private) changes to the indel likelihood model that (correctly) compute these values for indels.
2012-04-28 11:31:03 -04:00
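A minimal Java sketch of a linear-scale weighted allele balance, to illustrate why the weighting scale matters. It assumes each genotype carries a per-sample AB value and a phred-scaled GQ, and that the weight is the linear probability 1 - 10^(-GQ/10). The class and method names are hypothetical, and the commit does not say which weights AlleleBalance actually uses. With linear weights a GQ 0 genotype contributes nothing, whereas a weight taken from the magnitude of a log10 probability grows as confidence falls, which is one way the bias described above can arise.

```java
/**
 * Hypothetical sketch: allele balance (AB) averaged with linear-scale weights.
 * Assumes the weight of each genotype is 1 - 10^(-GQ/10); this is not the AlleleBalance code.
 */
final class AlleleBalanceSketch {

    /** Converts a phred-scaled GQ into the linear probability that the genotype is correct. */
    static double linearConfidence(int gq) {
        return 1.0 - Math.pow(10.0, -gq / 10.0);
    }

    /**
     * Weighted mean of per-genotype AB values; ab[i] and gq[i] describe the same genotype.
     * Returns NaN if every weight is zero.
     */
    static double weightedAlleleBalance(double[] ab, int[] gq) {
        if (ab.length != gq.length) {
            throw new IllegalArgumentException("ab and gq must have the same length");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < ab.length; i++) {
            double w = linearConfidence(gq[i]);   // GQ 0 -> weight 0, GQ 99 -> weight close to 1
            weightedSum += w * ab[i];
            totalWeight += w;
        }
        return totalWeight > 0.0 ? weightedSum / totalWeight : Double.NaN;
    }
}
```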
Ryan Poplin
54a9bc2da2
Bug fix in reverse trim alleles for the case of mixed records that become non-mixed after subsetting the alleles.
2012-04-28 09:12:26 -04:00
Ryan Poplin
e332aeaf70
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-27 16:21:21 -04:00
Ryan Poplin
2b5dd28550
Bug fix in reverse trim alleles for the case of mixed records.
2012-04-27 16:21:02 -04:00
Mauricio Carneiro
c2472b3c45
parallel BQSR implementation.
2012-04-27 15:18:08 -04:00
Mauricio Carneiro
1db2d1ba82
Do not add the first and last 4 cycles to the recalibration tables.
2012-04-27 15:18:07 -04:00
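A minimal sketch of the edge-cycle exclusion, assuming 0-based cycle indexes in sequencing order. `EdgeCycleFilterSketch` and its methods are hypothetical names used for illustration, not the BQSR code.

```java
/** Hypothetical sketch: skip the first and last 4 cycles when filling recalibration tables. */
final class EdgeCycleFilterSketch {

    /** Number of cycles skipped at each end of the read (4, per the commit message). */
    static final int EDGE_CYCLES = 4;

    /**
     * Returns true if the base at the given 0-based cycle should be added to the tables.
     * Assumes cycle indexes run from 0 to readLength - 1 in sequencing order.
     */
    static boolean shouldCount(int cycle, int readLength) {
        return cycle >= EDGE_CYCLES && cycle < readLength - EDGE_CYCLES;
    }

    /** Counts how many bases of a read of the given length would reach the tables. */
    static int countedBases(int readLength) {
        int n = 0;
        for (int cycle = 0; cycle < readLength; cycle++) {
            if (shouldCount(cycle, readLength)) {
                n++;
            }
        }
        return n;
    }
}
```

For a 101-cycle read, cycles 0-3 and 97-100 are skipped, leaving 93 bases.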
Mauricio Carneiro
08dbd756f3
Quick QC walkers to look at the error profile of indels in the read
2012-04-27 15:18:07 -04:00
Guillermo del Angel
24173d860a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-27 14:41:38 -04:00
Guillermo del Angel
730208133b
Several fixes and improvements to Pool caller with ancillary test functions (not done yet):
a) Utility class called Probability Vector that holds a log-probability vector and can clip the ends that deviate greatly from the max value.
b) Used this class to hold the site error model, since error model likelihoods far from the peak are so small that computing with them just wastes time.
c) Expand unit tests and add an exhaustive test for ErrorModel class.
d) Corrected major math bug in ErrorModel uncovered by exhaustive test: log(e^x) is NOT x if log's base = 10.
e) Refactored the utility functions that created artificial pileups for testing into a separate class, ArtificialPileupTestProvider. Functionality is currently limited (one artificial 10 bp contig, and pileups can only be specified at one position, with a given number of matches and mismatches to the reference), but it will be expanded in the future to cover more test cases.
f) Use this utility class for the IndelGenotypeLikelihoods unit test and for the PoolGenotypeLikelihoods unit test (the latter is still not done).
g) Linearized implementation of biallelic exact model (very simple approach, similar to diploid exact model, just abort if we're past the max value of AC distribution and below a threshold). Still need to add unit tests for this and to expand to multiallelic model.
h) Update integration test md5s due to minor differences stemming from the linearized exact model and better error model math.
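Item a) can be sketched as a class that stores only the slice of a log10-probability vector lying within some distance of its maximum, plus the offset of that slice. This is a hypothetical illustration: the class name, the `maxDeviation` parameter, and the choice to clip only the two ends (keeping any interior dips) are assumptions, not the Pool caller's actual Probability Vector.

```java
import java.util.Arrays;

/** Hypothetical sketch of a log10-probability vector whose far-from-peak ends are clipped. */
final class ClippedLogProbVector {
    private final double[] values;   // retained log10 probabilities
    private final int offset;        // index in the original vector of values[0]

    ClippedLogProbVector(double[] log10Probs, double maxDeviation) {
        int maxIdx = 0;
        for (int i = 1; i < log10Probs.length; i++) {
            if (log10Probs[i] > log10Probs[maxIdx]) {
                maxIdx = i;
            }
        }
        double floor = log10Probs[maxIdx] - maxDeviation;
        // Trim from the left and from the right, never past the peak.
        int start = 0;
        while (start < maxIdx && log10Probs[start] < floor) {
            start++;
        }
        int end = log10Probs.length - 1;
        while (end > maxIdx && log10Probs[end] < floor) {
            end--;
        }
        this.values = Arrays.copyOfRange(log10Probs, start, end + 1);
        this.offset = start;
    }

    /** Returns the log10 probability at an original index, or negative infinity if it was clipped. */
    double get(int index) {
        int i = index - offset;
        return (i >= 0 && i < values.length) ? values[i] : Double.NEGATIVE_INFINITY;
    }

    int size() { return values.length; }

    int offset() { return offset; }
}
```

For example, `{-20, -12, -1, 0, -2, -15}` with `maxDeviation = 10` keeps `{-1, 0, -2}` at offset 2; clipped indexes read back as negative infinity.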
2012-04-27 14:41:17 -04:00
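Item d) of the commit above comes down to a change of base: log10(e^x) = x * log10(e), roughly 0.434x, so treating it as x overstates the magnitude by a factor of ln(10). A small Java illustration; the class and helper names are hypothetical.

```java
/** Illustrates that log10(e^x) equals x * log10(e), not x. */
final class LogBaseSketch {

    /** log10(e), roughly 0.4343. */
    static final double LOG10_E = Math.log10(Math.E);

    /** Converts a natural-log value x (that is, ln p) into log10 p. */
    static double lnToLog10(double x) {
        return x * LOG10_E;
    }
}
```

`Math.log10(Math.exp(-2.0))` is about -0.869, not -2.0, and `lnToLog10(-2.0)` gives the same value without the round trip through `exp`.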
Eric Banks
da83076ab5
The name still sounds silly to me, but less silly than before
2012-04-27 14:39:47 -04:00
Eric Banks
959f5417f2
Handle complex multi-allelic events
2012-04-27 14:19:41 -04:00
Eric Banks
08bebcecbd
Updating the status values
2012-04-27 14:07:48 -04:00
Eric Banks
d8f6bc232b
Adding support for non-haplotype-based comparison of multi-allelics (i.e. we do compare alleles and test for partial equality, but we don't construct haplotypes for this yet).
2012-04-27 13:43:52 -04:00
Eric Banks
1bbb156afa
Fixing compile error
2012-04-27 12:52:39 -04:00
Eric Banks
0439047269
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-27 10:49:45 -04:00
Eric Banks
05b44dd017
The genotypeCounts array wasn't always initialized before it was accessed, leading to an NPE (which was caught and rethrown as a JEXL expression error when used in selection). Added a unit test to cover all genotype count methods.
2012-04-27 10:49:36 -04:00
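The fix amounts to making every accessor go through a lazy initializer, so the array can never be read while still null. A hypothetical Java sketch: the class, the genotype strings, and the count layout are illustrative assumptions, not the VariantContext internals.

```java
/** Hypothetical sketch of lazily initialized genotype counts; names are illustrative. */
final class GenotypeCountsSketch {
    private static final int NO_CALL = 0, HOM_REF = 1, HET = 2, HOM_VAR = 3;

    private final String[] genotypes;   // e.g. "./.", "0/0", "0/1", "1/1"
    private int[] genotypeCounts;       // null until first requested

    GenotypeCountsSketch(String... genotypes) {
        this.genotypes = genotypes;
    }

    /** Single entry point for the counts: fills the array on first use. */
    private int[] counts() {
        if (genotypeCounts == null) {
            genotypeCounts = new int[4];
            for (String g : genotypes) {
                switch (g) {
                    case "0/0": genotypeCounts[HOM_REF]++; break;
                    case "0/1": genotypeCounts[HET]++; break;
                    case "1/1": genotypeCounts[HOM_VAR]++; break;
                    default: genotypeCounts[NO_CALL]++; break;   // simplification: anything else is a no-call
                }
            }
        }
        return genotypeCounts;
    }

    int getHomRefCount() { return counts()[HOM_REF]; }
    int getHetCount()    { return counts()[HET]; }
    int getHomVarCount() { return counts()[HOM_VAR]; }
    int getNoCallCount() { return counts()[NO_CALL]; }
}
```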
Khalid Shakir
9801dd114f
Bug fix for: https://getsatisfaction.com/gsa/topics/problem_with_indelrealigner_and_l_unmapped
In the GATK, -L unmapped refers to GenomeLocs with SAMRecord.NO_ALIGNMENT_REFERENCE_NAME, not to reads with SAMRecord.getReadUnmappedFlag() set.
Previously, reads in the last bin with the unmapped flag set were being printed while the code was also seeking the reads without a reference contig.
2012-04-27 09:58:38 -04:00
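A minimal sketch of the corrected membership test, assuming a simplified read type. Under the SAM specification an unmapped read may be placed next to its mapped mate and carry that mate's contig and position, so the unmapped flag alone does not tell you whether a read lacks a reference contig. Only the `"*"` value of `NO_ALIGNMENT_REFERENCE_NAME` comes from htsjdk's `SAMRecord`; the `Read` class and `inUnmappedInterval` are hypothetical.

```java
/** Hypothetical sketch: which reads belong to the "-L unmapped" interval. */
final class UnmappedIntervalSketch {

    /** Same value as SAMRecord.NO_ALIGNMENT_REFERENCE_NAME. */
    static final String NO_ALIGNMENT_REFERENCE_NAME = "*";

    /** Minimal stand-in for a read; fields are illustrative. */
    static final class Read {
        final String referenceName;
        final boolean unmappedFlag;

        Read(String referenceName, boolean unmappedFlag) {
            this.referenceName = referenceName;
            this.unmappedFlag = unmappedFlag;
        }
    }

    /** A read is in "-L unmapped" only if it has no reference contig, whatever its unmapped flag says. */
    static boolean inUnmappedInterval(Read read) {
        return NO_ALIGNMENT_REFERENCE_NAME.equals(read.referenceName);
    }
}
```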
Khalid Shakir
005cdcad5b
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-27 09:58:14 -04:00
Menachem Fromer
64077ec7c8
Add option to use XHMM to genotype all possible consecutive sub-segments of CNVs
2012-04-27 01:42:20 -04:00
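Genotyping "all possible consecutive sub-segments" of a CNV that spans n targets means n(n+1)/2 candidate segments. A hypothetical sketch of the enumeration, assuming the CNV is given as an inclusive range of target indexes; it is not XHMM code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerate every consecutive sub-segment of a CNV's targets. */
final class SubSegmentSketch {

    /** Returns inclusive [start, end] target-index pairs; n targets yield n * (n + 1) / 2 pairs. */
    static List<int[]> consecutiveSubSegments(int firstTarget, int lastTarget) {
        List<int[]> segments = new ArrayList<>();
        for (int start = firstTarget; start <= lastTarget; start++) {
            for (int end = start; end <= lastTarget; end++) {
                segments.add(new int[] {start, end});
            }
        }
        return segments;
    }
}
```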
Khalid Shakir
2ad1aa2a2c
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-26 19:31:47 -04:00
Eric Banks
db41d10f54
First (very rough) version of a haplotype-based resolver of alleles from two provided VCF files. It works well on standard cases (e.g. MNP vs. 2 SNPs) but still needs to be tested more thoroughly and I need to add support for multi-allelics (although I know how to do that now). Being committed in private for Ryan's benefit, but no one else should be using it now.
2012-04-26 16:30:22 -04:00
Khalid Shakir
b8c0405715
Updates to the WGP to only run eval on chr20.
PicardIntervals object now prints out a meaningful toString when HSP batches return multiple interval lists.
2012-04-26 16:28:45 -04:00
Guillermo del Angel
2f86ccb086
Correct md5s for previous code change
2012-04-26 16:20:41 -04:00
Guillermo del Angel
972d6531b6
Corner case fix for indel GL computation: sometimes (depending on the surrounding context) reads that are not informative about the two candidate haplotypes end up with marginally higher likelihoods under one haplotype than under the other, because of alignment uncertainty in the surrounding regions. As a result, a sample whose GLs are -0.0001,-0.0005,-0.001 may have its genotype set to 1/1 purely from this statistical noise. We already compare max(gl)-min(gl) against a tolerance to decide whether to genotype; this tolerance is now increased from 0.001 to 0.1 (equivalent to 1 PL unit), so a sample is not genotyped if all its PLs are within this threshold. Changed 2 integration test md5s that hit this case.
2012-04-26 10:15:26 -04:00
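The tolerance check can be sketched as follows, assuming log10-scaled GLs and a strict greater-than comparison; the class and method names are hypothetical. Because PL = -10 * log10(likelihood), a spread of 0.1 in log10 space corresponds to 1 PL unit.

```java
/** Hypothetical sketch of the "all GLs within tolerance" check. */
final class GlToleranceSketch {

    /** 0.1 in log10 likelihood space, i.e. 1 PL unit. */
    static final double GL_TOLERANCE = 0.1;

    /** Returns true if the GLs are spread by more than the tolerance, so the sample should be genotyped. */
    static boolean isInformative(double[] log10Gls) {
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double gl : log10Gls) {
            max = Math.max(max, gl);
            min = Math.min(min, gl);
        }
        return max - min > GL_TOLERANCE;
    }
}
```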
Laurent Francioli
ab2a952ad1
PED support for Inbreeding Coefficient annotation
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-04-25 12:56:47 -04:00
Laurent Francioli
219b0a128b
PED support for ChromosomeCounts annotation
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-04-25 12:50:04 -04:00
Laurent Francioli
19d5213d5a
Added function to get founder IDs in SampleDB
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-04-25 12:49:36 -04:00
Mark DePristo
120deaa010
Remove old licensing
2012-04-25 12:23:08 -04:00
Mark DePristo
dab25afc88
Add warning message about ratios in variantQCreport, give ratio for MAF > 10%
2012-04-25 12:22:32 -04:00
Mauricio Carneiro
902277856e
fix for RBP getPileupsForSamples()
Do not differentiate per-sample pileups from generic pileups; do the same for both, since it's O(n) either way.
2012-04-24 17:20:30 -04:00
Mauricio Carneiro
82b4798913
CountBasesWalker -- a quick QC walker.
2012-04-24 17:20:30 -04:00
Mauricio Carneiro
e440d0ce69
BQSR triage #4
* fixed queue script plot file names
* updated the ReadGroupCovariate to use the platform unit instead of sample + lane.
* fixed plotting of marginalized reported qualities
2012-04-24 17:19:54 -04:00
Eric Banks
d6277b70d8
Forgot to consider the optimized case in hasAllele
2012-04-24 11:32:28 -04:00
Eric Banks
91bad244d5
Using a VCF whose ALT is the reference in GGA mode is a User Error
2012-04-24 11:08:37 -04:00
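A hypothetical sketch of the validation: each ALT allele of an input record is compared with its REF, and a match is reported as a user error rather than passed on to genotyping. The exception class is a stand-in for the GATK's user exception type, and comparing plain base strings is a simplification.

```java
/** Hypothetical sketch of rejecting GGA input records whose ALT allele equals the REF allele. */
final class GgaAlleleCheckSketch {

    /** Stand-in for the GATK's user-error exception type; the name is illustrative. */
    static final class UserErrorSketch extends RuntimeException {
        UserErrorSketch(String message) {
            super(message);
        }
    }

    /** Throws UserErrorSketch if any ALT allele is the same base string as REF (case-insensitive). */
    static void validateAlleles(String contig, int position, String ref, String... alts) {
        for (String alt : alts) {
            if (alt.equalsIgnoreCase(ref)) {
                throw new UserErrorSketch("Record at " + contig + ":" + position
                        + " lists the reference allele " + ref + " as an ALT allele,"
                        + " which cannot be used in GGA mode");
            }
        }
    }
}
```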
Eric Banks
74ad008163
Adding VariantContext.hasAlternateAllele functionality
2012-04-24 11:07:46 -04:00
Eric Banks
66f3315548
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-24 09:39:55 -04:00
Eric Banks
bcb93dda5f
Fixing docs (rank sum test values are not phred-scaled)
2012-04-24 09:39:42 -04:00
Mauricio Carneiro
e39a59594a
BQSR triage and test routines
* updated BQSR queue script for faster turnaround
* implemented plot generation for scatter/gathered runs
* adjusted output file names to be compatible with the queue script
* added the recalibration report file to the argument table in the report
* added ReadCovariates unit test -- guarantees that all the covariates are being generated for every base in the read
* added RecalibrationReport unit test -- guarantees the integrity of the delta tables
2012-04-23 11:23:00 -04:00
Eric Banks
a733723439
Merged bug fix from Stable into Unstable
2012-04-23 10:30:30 -04:00
Eric Banks
2761da975e
Handle null VCs (which can arise when indels are present in the file)
2012-04-23 10:30:00 -04:00
Eric Banks
cd63bcb1b8
Fixing unit tests to register the user exception being thrown (instead of the NumberFormatException)
2012-04-23 10:06:51 -04:00
Eric Banks
63aa79df82
Slightly better error message
2012-04-23 09:37:28 -04:00
Eric Banks
7b5fbf9567
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-04-23 09:34:08 -04:00
Eric Banks
4edb005411
Catch poorly formatted PL/GL fields
2012-04-23 09:33:50 -04:00
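A minimal sketch of catching a malformed PL field and reporting it as a user-facing error instead of letting a raw NumberFormatException escape, in line with the unit-test change above. `MalformedFieldException` and `parsePL` are hypothetical names.

```java
/** Hypothetical sketch: parse a VCF PL field, converting malformed input into a user-facing error. */
final class PlFieldParserSketch {

    /** Stand-in for a user exception; keeps the NumberFormatException as its cause. */
    static final class MalformedFieldException extends RuntimeException {
        MalformedFieldException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses a comma-separated list of integer PLs, e.g. "0,30,300". */
    static int[] parsePL(String field) {
        String[] parts = field.split(",");
        int[] pls = new int[parts.length];
        try {
            for (int i = 0; i < parts.length; i++) {
                pls[i] = Integer.parseInt(parts[i].trim());
            }
        } catch (NumberFormatException e) {
            throw new MalformedFieldException("Poorly formatted PL field: '" + field + "'", e);
        }
        return pls;
    }
}
```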