gatk-3.8

Commit Graph

Author	SHA1	Message	Date
carneiro	91fb664135	Many updates to SelectVariants : 1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked. (if you had a file or directory named after a sample name, SV wouldn't work) 1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1) 2) Discordance and Concordance now work in combination with all other parameters. 3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information. 5) When samples are provided (-sn, -se or -sf) discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has a different genotype information for that sample. 6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagree. --- Integration tests: 1) Discordance and concordance test added 2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples. --- Methods for handling sample expressions and files with list of samples were added to SampleUtils. I recommend NOT USING the old getSamplesFromCommandLineInput as this mixing of sample names with expressions and files creates a rogue error that can be challenging to catch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-23 20:18:45 +00:00
droazen	658e65d26c	2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6070 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:56:07 +00:00
droazen	d323ef0461	As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6060 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:55:29 +00:00
droazen	26d837f59e	Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6058 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:55:20 +00:00
droazen	8d5b4af8ca	Binomial and Multinomial interfaces for probability and coefficients in log and real space. Passed all unit tests. BinomialCumulativeProbability was reformatted to follow the now standard parameter order. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6057 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:55:15 +00:00
droazen	4abb7c424b	implementation of the Gamma function and log10 Binomial / Multinomial coefficients. Unit tests for gamma and binomial passed with honors. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6056 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:55:09 +00:00
droazen	4f7a64a798	Fixing broken walker as per GS; adding integration test to cover it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6040 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:54:04 +00:00
droazen	0e057276ae	Changing the default behavior of the IndelRealigner to run without Smith-Waterman. Changed around the integration tests accordingly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6039 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:53:58 +00:00
droazen	53c089949e	Added integration test for -n parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6032 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 22:53:22 +00:00
ebanks	745935ffc2	No longer used - instead see the ConstrainedMateFixingManager class git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6030 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-22 19:38:17 +00:00
kshakir	a1f8aa90c0	Added an integration test showing how to use LSF C API to get LSF parameters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6025 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-21 22:54:55 +00:00
depristo	4c6d0e6143	Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on wiki, and an integration test using Kiran's very nice fundamental VCF. VariantEvalWalker now passes a pointer to itself to the Stratification setVariantEvalWalker (and assoc. get method) so that stratifications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratification to include an initialize function that has all arguments necessary for any Strat. JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXLs that directly use VariantContext and associated functions from the command line. ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and led to strange punctile behavior when actual AF quantization was out of sync with the fixed quantization of the AF strat). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-19 03:11:00 +00:00
ebanks	4e85416af1	[Foiled yet again when trying to do this in git] Slight modifications in the argument structure for the IndelRealigner. Instead of boolean flags -knownsOnly and -doNotUseSW, we now have an enum --consensusDeterminationModel which lets you specify knowns only, also use indels in reads, or also use SW. Please note that the default behavior of IR has not changed at all (and won't for a few more days) - that'll be done in GIT (fingers crossed). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6008 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 17:35:37 +00:00
depristo	4304fc4862	Fixed up md5s git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6007 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-16 16:20:41 +00:00
ebanks	5be4f31515	Surprisingly, the TileCovariate was indeed covered in integration tests. Updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5997 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-14 17:40:23 +00:00
rpoplin	17e17d3c3c	Misc cleanup in VQSR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5972 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 18:37:37 +00:00
depristo	ac3620839c	Very basic integration tests for ReducedReads, to allow safe optimization of the code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5970 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 17:06:32 +00:00
rpoplin	895e86c544	Annotations used to build the 1000G consensus callsets are now standard annotations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5969 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-09 17:03:39 +00:00
kiran	49b021d435	Changed the definition of degeneracy (it's at the site level - degeneracy of a position in a codon, not degeneracy of the amino acid itself like I initially thought). Added the ability to supply an ancestral allele track (available in /humgen/gsa-hpprojects/GATK/data/Ancestor/). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5963 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 15:07:31 +00:00
depristo	a331e13721	Slightly more extensive test includes a 0/0 site to genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5961 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 14:48:55 +00:00
depristo	0f43b10c39	Optimization in CombineVariants when merging into a sites_only VCF. VariantContextUtils now has a utility function that creates a sitesOnlyVariantContext from an input VC. Added a complex merge test of SNPs and indels from the new batch merge wiki at http://www.broadinstitute.org/gsa/wiki/index.php/Merging_batched_call_sets with multiple alleles for an indel. Created a BatchMergeIntegrationTest that uses GGA with the complex merged input alleles to genotype SNPs and Indels with multiple alleles simultaneously in NA12878. Looks great. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5959 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-08 14:31:46 +00:00
delangel	7e49e1668f	Finished changing md5's due to recent change in definition of mixed and indel vc's. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5948 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-05 00:40:51 +00:00
delangel	d534241f35	Major revamp of annotations for indels: a) All rank sum tests now work for indels including multiallelic sites. For the latter cases, rank sum test is REF vs most common allele b) Redid computation of HaplotypeScore for indels. It's now trivially easy to do because we are already computing likelihoods of each read vs haplotypes in GL computation so we reuse that if available. For multiallelic case, we score against N haplotypes where N is total called alleles. Drawback is that all cases need information contained in likelihood table that stores likelihood for each pileup element, for each allele. If this table is not available we don't annotate, so we can only fully annotate indels right now when running UG but not when running VariantAnnotator alone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5947 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 15:34:24 +00:00
delangel	1448a1f155	Change md5 because conversion of a tri-allelic dbsnp indel record is now legit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5946 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 11:24:16 +00:00
delangel	53667ce8fa	Disabled test that checks whether output is the same whether in Genotype Given Alleles mode or not - it won't as long as extended events are finally removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5945 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-04 00:52:54 +00:00
delangel	35df80de14	Updated md5 due to changes in QUAL field when in Genotype Given Alleles mode w/indels when in insertions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5944 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 23:52:38 +00:00
ebanks	b93829e505	The underlying bam file for this test was busted for many reasons preventing Picard folks from making unrelated changes, so I needed to fix it. Updating md5s accordingly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5943 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 22:26:06 +00:00
depristo	cd293f145b	More stable reduced reads representation. Bug fixes throughout. No diffs by <1% of sites in an exome, and the majority of these differences are filtered out, or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 20:11:31 +00:00
ebanks	420d8feff6	No one should be calling the createHeader method(s) directly, but instead should be going through the full readHeader method (because it first sets the version); therefore I made them package protected and merged them. Updated the various unit tests that were using createHeader and were dangerously assuming that the header version was defaulting to 4.0 (which it no longer does). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5934 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-03 02:17:37 +00:00
depristo	86df10ec09	UnitTests for ConsensusSpan infrastructure git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5929 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 22:44:52 +00:00
ebanks	dd6d61c031	Adding integration test to cover the case of a read that only covers an insertion (i.e. no M in the CIGAR string). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5923 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 21:02:47 +00:00
chartl	84c2c5d7e6	Stop running away from my commits, test modules. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5919 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 13:05:53 +00:00
chartl	092952db44	After verifying that the changes to these tests were all in the RankSum annotations, I'm committing fixes to the test md5s. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5918 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-02 13:01:18 +00:00
chartl	511cd48d7a	There is an edge case ( |Set1| = 5, |Set2| = 4) where the exact p-value exceeds the range of the normal distribution we want to invert. For the edge cases, this happens exactly at the mean, and so this can be safely replaced with a z value of 0.0 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5915 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 17:30:09 +00:00
chartl	a79967d9af	After extensive testing of MannWhitneyU: - Verified that exact calculations do agree with R's dwilcox() - Verified that exact calculations do not agree with R's wilcox.test + This is because R does a correction, and calculates CDFs rather than PDFs (e.g. sums over dwilcox() values) - Can now specify MWU to calculate cumulative exact tests, rather than point probabilities - Z-scores are now calculated properly for exact tests + Previously, z-values calculated by inverting normal CDF from U-statistic PDF + Now both inversions are done, with a smart heuristic (biased variance) to make the point-calculated Z-value more accurate + Additional tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5911 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 15:51:27 +00:00
rpoplin	2b5683909e	Updated VQSR integration tests because of the new Omni file. Fixed overflow condition in FisherStrand when the depth is too high. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5910 348d0f76-0448-11de-a6fe-93d51630548a	2011-06-01 14:20:37 +00:00
ebanks	44cb7e4980	Renaming to make grepping through the output less confusing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5908 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-31 19:54:44 +00:00
rpoplin	9e834391fe	We now skip over all covering RODs in the BQSR as intended instead of just those which can be converted into a VariantContext. All the integration tests change because of subtleties in how certain dbsnp rod records are being converted into VCs. Added integration test which uses a bed file as the list of known polymorphic sites. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5892 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 16:32:17 +00:00
depristo	8ed82e5a08	The previous version of the UG was always creating BAQ'd pileups for the underlying site QUAL calculation. This resulted in some slowdown in the code. But as far as I can tell, the code actually didn't apply the BAQ'd base quality anywhere when the BAQ field wasn't in the read, so this just saves us 20% of the runtime when BAQ isn't enabled from heading into the BAQ subsystem when we don't actually want to get the BAQ'd base qualities. Fixed minor problem with WalkerTest for "" (for parameterization) md5s. Added an explicit integration test for BAQ NONE. Now only creates the BAQ'd pileup if the useBAQPileup parameter is provided in initializeAlternateAllele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5891 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 14:00:52 +00:00
depristo	136c8c7900	ClipReads now supports HARDCLIP_BASES, though in fact this turned out to be not necessary for my desired tests. In the process of developing the HARDCLIP mode, I added some proper ReadUtils unit tests, which would ideally be expanded to include other ReadUtil functions, as added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5890 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-27 11:42:22 +00:00
delangel	f7298f4a7f	First of many baby steps to redo way in which we trigger events for indel calling and to eliminate extended events: get rid of SpanningDeletions annotation for indels. It's completely useless, and even more so once we no longer trigger at extended events (because we'll trigger by definition a base before a deletion starts, so deletions present in the current pileup are not informative). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5876 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-26 00:49:23 +00:00
depristo	1bd1404aa9	Sometimes md5s can be null git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5867 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 19:17:18 +00:00
depristo	e582a92af6	WalkerTest now checks for valid md5s in the integration tests themselves, so no more stray whitespace errors. Added a WalkerTestTest to ensure that bad MD5s are detected and an error thrown git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5865 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 14:34:55 +00:00
hanna	06486c134a	Kill extra space in the md5. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5863 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 12:00:31 +00:00
depristo	57e4693e4c	Slightly better error message when failing to create the index on the fly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5861 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 11:04:08 +00:00
depristo	cf3dbfee97	Renamed variantMergeOptions to filteredRecordsMergeType, as this is really what it does. Cleaned up the wiki so that it's clear what this does, as well as included an example of how to create an intersection with CombineVariants and SelectVariants. Added integration tests of CombineVariants with OMNI and HapMap that deal with the two ways to merge filtered/unfiltered records at the same site. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5860 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-24 01:54:29 +00:00
hanna	4bfec4c55b	Reenabling E.coli ValidatingPileup with MV1994 realigned using the BWA/C bindings. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5856 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 21:32:53 +00:00
hanna	5dca1e4d2e	Make IntervalIntegrationTest aware of the new alignments in the MV1994.bam testset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5852 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 19:59:47 +00:00
chartl	7ff5375493	Removing build-killing dependency on a private package. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5851 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 18:13:15 +00:00
chartl	0b07373909	Incorporating old feedback from eric: @deprecated methods should not be @deprecated, but rather protected, and the test's package moved to where it can access those test methods. Also allows for the slightly more awesome name "MWUnitTest" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5850 348d0f76-0448-11de-a6fe-93d51630548a	2011-05-23 18:06:05 +00:00


1198 Commits (fbe157137f33107289302a24ca022a9fbc2ba046)