gatk-3.8/java/test/org/broadinstitute/sting/gatk/walkers
carneiro 91fb664135 Many updates to SelectVariants :
1) There is now a separate parameter for sample name (-sn), sample file (-sf) and sample expression (-se). The previous implementation behaved unexpectedly in ways too tricky to leave unchecked: if you had a file or directory named after a sample name, SelectVariants wouldn't work.

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.
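
Rules 4 to 6 can be sketched roughly as follows. This is an illustration only, not the SelectVariants code: the function names, the dict-based record model (sample name mapped to a genotype string) and the use of `None` for "the other track has no line at this locus" are all assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical record model: sample name -> genotype string such as "0/1".
Genotypes = Mapping[str, str]


def is_concordant(variant: Genotypes,
                  comp: Optional[Genotypes],
                  samples: Sequence[str] = ()) -> bool:
    """Concordance check for one locus.

    comp is None when the comparison track has no line at this locus.
    """
    if comp is None:
        return False
    if not samples:
        # Rule 4: no samples given, so the line existing in both VCFs is enough.
        return True
    # Rules 5 and 6: ALL requested samples must show the same genotype.
    return all(variant.get(s) == comp.get(s) for s in samples)


def is_discordant(variant: Genotypes,
                  disc: Optional[Genotypes],
                  samples: Sequence[str] = ()) -> bool:
    """Discordance check for one locus.

    disc is None when the disc track is missing the line.
    """
    if disc is None:
        # Rules 4 and 5: a missing line is always a discordance.
        return True
    if not samples:
        # Rule 4: genotypes are ignored when no samples are given.
        return False
    # Rule 6: AT LEAST ONE requested sample with a different genotype.
    return any(variant.get(s) != disc.get(s) for s in samples)
```

In this sketch a sample absent from both records compares as equal; how the real walker treats no-calls or missing samples is not stated in this message.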

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files containing lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as its mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.
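
A minimal sketch of why keeping the three inputs separate avoids the ambiguity, and of the alphabetically sorted output. It is not the SampleUtils API: `select_samples` and its parameters are hypothetical, treating a sample expression as a regular expression that must match the whole name is an assumption, and so is dropping requested names that are absent from the VCF.

```python
import re
from typing import Iterable, List, Sequence


def select_samples(vcf_samples: Sequence[str],
                   names: Sequence[str] = (),
                   file_lines: Sequence[str] = (),
                   expressions: Sequence[str] = ()) -> List[str]:
    """Resolve the samples to keep, returned in alphabetical order.

    names       -- literal sample names (the -sn style of input)
    file_lines  -- lines read from a sample file (the -sf style of input)
    expressions -- patterns matched against sample names (the -se style)
    """
    if not (names or file_lines or expressions):
        # Nothing requested: work with every sample in the VCF.
        return sorted(vcf_samples)

    requested = set(names)
    # One sample name per line; blank lines are skipped.
    requested.update(line.strip() for line in file_lines if line.strip())

    for expr in expressions:
        pattern = re.compile(expr)  # assumption: expressions are regexes
        requested.update(s for s in vcf_samples if pattern.fullmatch(s))

    # Assumption: names not present in the VCF are silently dropped.
    return sorted(requested & set(vcf_samples))
```

Because each kind of input arrives through its own argument, a literal name is never reinterpreted as a file path or a pattern, which is the failure mode described for the old mixed input.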

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
..
annotator Annotations used to build the 1000G consensus callsets are now standard annotations 2011-06-09 17:03:39 +00:00
beagle PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible with vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only, as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only. 2011-05-11 12:07:44 +00:00
coverage -ct x no longer includes coverage in the previous bin 2011-02-24 15:52:04 +00:00
fasta 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests. 2011-06-22 22:56:07 +00:00
filters As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality. 2011-06-22 22:55:29 +00:00
genotyper Annotations used to build the 1000G consensus callsets are now standard annotations 2011-06-09 17:03:39 +00:00
indels Changing the default behavior of the IndelRealigner to run without Smith-Waterman. Changed around the integration tests accordingly. 2011-06-22 22:53:58 +00:00
phasing Remove the extra trailing tab at the end of the VCF ## header line. Unfortunately, this meant updating every freaking integration test. 2010-12-08 17:22:29 +00:00
qc Reenabling E.coli ValidatingPileup with MV1994 realigned using the BWA/C bindings. 2011-05-23 21:32:53 +00:00
recalibration 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests. 2011-06-22 22:56:07 +00:00
sequenom continuing from last night, the integration tests weren't covering the right behavior either 2011-04-28 13:30:57 +00:00
varianteval Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on wiki, and an integration test using Kiran's very nice fundamental VCF 2011-06-19 03:11:00 +00:00
variantrecalibration Misc cleanup in VQSR. 2011-06-09 18:37:37 +00:00
variantutils Many updates to SelectVariants : 2011-06-23 20:18:45 +00:00
BAQIntegrationTest.java Better query start / stop function that directly parses the cigar string, unlike the previous version. Now properly handles H (hard-clipped) reads. Added -baq OFF and -baq RECALCULATE integration tests on all three 1KG technologies. Please let me know if this new code somehow fails. 2011-01-28 15:08:21 +00:00
ClipReadsWalkersIntegrationTest.java Changing testing framework from junit -> testng, for its enhanced configurability. 2010-11-01 21:31:44 +00:00
PileupWalkerIntegrationTest.java Updated to no longer include 2nd-best base output 2011-04-03 20:13:10 +00:00
PrintReadsIntegrationTest.java Added integration test for -n parameter 2011-06-22 22:53:22 +00:00
PrintReadsWalkerUnitTest.java Changing testing framework from junit -> testng, for its enhanced configurability. 2010-11-01 21:31:44 +00:00