gatk-3.8/java/test/org/broadinstitute/sting
carneiro 91fb664135 Many updates to SelectVariants :
1) There are now separate parameters for sample name (-sn), sample file (-sf) and sample expression (-se). The unexpected behavior of the previous implementation was too tricky to leave unchecked: if you had a file or directory named after a sample name, SV wouldn't work.

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.
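
The rules in 4) to 6) can be sketched roughly as below. This is an illustration only, not the SelectVariants source: the class and method names are hypothetical, genotypes are modeled as plain strings keyed by sample name, and a null map stands for "the other VCF has no line at this locus".

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the concordance/discordance rules described above.
// Not the actual SelectVariants code; genotypes are simplified to strings such as "0/1".
public class ConcordanceSketch {

    // variant: sample -> genotype for the record being considered.
    // other:   sample -> genotype for the same locus in the disc track,
    //          or null if that track has no line there.
    // samples: samples requested via -sn/-se/-sf; empty means "no samples given".
    public static boolean isDiscordant(Map<String, String> variant,
                                       Map<String, String> other,
                                       Set<String> samples) {
        if (other == null) {
            return true;                 // missing line is always a discordance
        }
        if (samples.isEmpty()) {
            return false;                // rule 4: site-level only, genotypes ignored
        }
        for (String sample : samples) {  // rule 6: one disagreeing sample is enough
            if (!Objects.equals(variant.get(sample), other.get(sample))) {
                return true;
            }
        }
        return false;
    }

    public static boolean isConcordant(Map<String, String> variant,
                                       Map<String, String> other,
                                       Set<String> samples) {
        if (other == null) {
            return false;                // nothing to be concordant with
        }
        if (samples.isEmpty()) {
            return true;                 // rule 4: both VCFs have the line
        }
        for (String sample : samples) {  // rule 6: ALL samples must agree
            if (!Objects.equals(variant.get(sample), other.get(sample))) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a record that exists in the variant track is either concordant or discordant with respect to one comparison track, never both.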

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as its mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.
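
A minimal sketch of why keeping the three inputs separate helps; it does not reproduce the SampleUtils API. All names here are hypothetical, and it assumes that sample expressions are regular expressions matched anywhere in the sample name and that a sample file holds one name per line. Because each input arrives through its own parameter, a literal sample name is never probed as a file path. The result is a TreeSet, so samples come out sorted alphabetically.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch: resolving -sn, -se and -sf inputs independently.
// Not the SampleUtils implementation.
public class SampleSelectionSketch {

    // Assumption: each expression is a regex that may match anywhere in the name.
    public static Set<String> matchExpressions(Collection<String> vcfSamples,
                                               Collection<String> expressions) {
        Set<String> matched = new TreeSet<>();
        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            for (String sample : vcfSamples) {
                if (pattern.matcher(sample).find()) {
                    matched.add(sample);
                }
            }
        }
        return matched;
    }

    // Assumption: a sample file lists one sample name per line; blank lines are skipped.
    public static Set<String> parseSampleLines(List<String> lines) {
        Set<String> samples = new TreeSet<>();
        for (String line : lines) {
            String name = line.trim();
            if (!name.isEmpty()) {
                samples.add(name);
            }
        }
        return samples;
    }

    // Union of the three sources; with no input at all, every VCF sample is used.
    public static Set<String> resolve(Collection<String> vcfSamples,
                                      Collection<String> names,
                                      Collection<String> expressions,
                                      List<String> fileLines) {
        if (names.isEmpty() && expressions.isEmpty() && fileLines.isEmpty()) {
            return new TreeSet<>(vcfSamples);
        }
        Set<String> selected = new TreeSet<>(names);
        selected.addAll(matchExpressions(vcfSamples, expressions));
        selected.addAll(parseSampleLines(fileLines));
        return selected;
    }
}
```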

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
..
alignment 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests. 2011-06-22 22:56:07 +00:00
commandline Fixed long-standing bug reported by Mauricio where @Arguments assigned to 2011-01-12 22:18:24 +00:00
datasources/pipeline Removed deprecated getDbsnpFile. 2011-02-08 21:12:15 +00:00
gatk Many updates to SelectVariants : 2011-06-23 20:18:45 +00:00
jna Added an integration test showing how to use LSF C API to get LSF parameters. 2011-06-21 22:54:55 +00:00
oneoffprojects/walkers Incorporating old feedback from eric: @deprecated methods should not be @deprecated, but rather protected, and the test's package moved to where it can access those test methods. 2011-05-23 18:06:05 +00:00
playground/gatk/walkers Fixed up md5s 2011-06-16 16:20:41 +00:00
utils Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great. 2011-06-22 22:55:20 +00:00
BaseTest.java Many updates to SelectVariants : 2011-06-23 20:18:45 +00:00
StingTextReporter.java Cleanup testng listener configuration. 2010-11-15 23:43:14 +00:00
WalkerTest.java The previous version of the UG was always creating BAQ'd pileups for the underlying site QUAL calculation. This resulted in some slowdown in the code. But as far as I can tell, the code actually didn't apply the BAQ'd base quality anywhere when the BAQ field wasn't in the read, so this just saves us 20% of the runtime when BAQ isn't enabled from heading into the BAQ subsystem when we don't actually want to get the BAQ'd base qualities. 2011-05-27 14:00:52 +00:00