gatk-3.8/java/test/org/broadinstitute/sting/gatk
carneiro 91fb664135 Many updates to SelectVariants :
1) There is now a separate parameter for sample name (-sn), sample file (-sf) and sample expression (-se). The previous implementation's behavior was too error-prone to leave unchecked: if you had a file or directory named after a sample, SelectVariants wouldn't work.

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.
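
The rules in 4) to 6) can be sketched roughly as below. This is an illustrative sketch only, not the SelectVariants code: the class and method names are hypothetical, a VCF line is modelled as a plain map from sample name to genotype string, and a null map stands for "this track has no line at this locus". Treating a sample that is absent from both lines as agreeing is an assumption of the sketch.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of the concordance/discordance rules; not GATK code. */
final class ConcordanceSketch {

    /**
     * Concordance: both tracks have a line at the locus and, if samples were
     * requested, ALL requested samples show the same genotype in both lines.
     */
    static boolean isConcordant(Map<String, String> variantLine,
                                Map<String, String> concLine,
                                Set<String> samples) {
        if (variantLine == null || concLine == null) {
            return false;                 // a line missing from either track cannot be concordant
        }
        if (samples.isEmpty()) {
            return true;                  // no samples given: genotypes are ignored
        }
        for (String sample : samples) {
            if (!Objects.equals(variantLine.get(sample), concLine.get(sample))) {
                return false;             // one disagreeing sample breaks concordance
            }
        }
        return true;
    }

    /**
     * Discordance: the disc track is missing the line, or, if samples were
     * requested, AT LEAST ONE requested sample has a different genotype.
     */
    static boolean isDiscordant(Map<String, String> variantLine,
                                Map<String, String> discLine,
                                Set<String> samples) {
        if (variantLine == null) {
            return false;                 // nothing to report without a variant line
        }
        if (discLine == null) {
            return true;                  // the "missing line" case
        }
        if (samples.isEmpty()) {
            return false;                 // line present in both tracks, genotypes ignored
        }
        for (String sample : samples) {
            if (!Objects.equals(variantLine.get(sample), discLine.get(sample))) {
                return true;              // a single disagreeing sample is enough
            }
        }
        return false;
    }
}
```

With samples given, the two predicates are complementary whenever both tracks have a line; without samples, they only look at whether the line is present.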

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as mixing sample names with expressions and files in a single input creates errors that are hard to catch.
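
For illustration, keeping the three kinds of input apart could look roughly like the sketch below. The class and method names are hypothetical and are not the methods added to SampleUtils; treating each -se value as a Java regular expression that may match anywhere in the sample name is an assumption, as is the one-sample-per-line file layout. The point of the sketch is that a name is never reinterpreted as a file or an expression, and that the result is always sorted alphabetically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch of separate -sn / -sf / -se handling; not GATK code. */
final class SampleSelectionSketch {

    /** Reads one sample name per line (assumed layout), skipping blank lines. */
    static List<String> readSampleFile(Path file) throws IOException {
        List<String> samples = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                samples.add(trimmed);
            }
        }
        return samples;
    }

    /**
     * Combines the three inputs into an alphabetically sorted set.
     * If no input is given at all, every sample in the VCF is selected.
     */
    static SortedSet<String> selectSamples(Collection<String> vcfSamples,
                                           Collection<String> names,        // from -sn
                                           Collection<String> fromFile,     // from -sf
                                           Collection<String> expressions)  // from -se
    {
        SortedSet<String> selected = new TreeSet<>();   // TreeSet keeps names sorted
        if (names.isEmpty() && fromFile.isEmpty() && expressions.isEmpty()) {
            selected.addAll(vcfSamples);
            return selected;
        }
        selected.addAll(names);       // names are taken literally, never as paths
        selected.addAll(fromFile);
        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            for (String sample : vcfSamples) {
                if (pattern.matcher(sample).find()) {   // assumption: partial match
                    selected.add(sample);
                }
            }
        }
        return selected;
    }
}
```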

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
arguments A significant refactoring of the ROD system, done largely to simplify the process of 2010-12-31 04:52:22 +00:00
contexts/variantcontext 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests. 2011-06-22 22:56:07 +00:00
datasources Contracts for GenomeLocParser and GenomeLoc are now fully implemented. 2011-05-21 02:01:59 +00:00
executive Changing testing framework from junit -> testng, for its enhanced configurability. 2010-11-01 21:31:44 +00:00
filters Convert GenomeLocParser into an instance variable. This change is required 2010-11-10 17:59:50 +00:00
iterators Some refactoring that Mauricio and I worked through together. Changed filters 2011-05-04 19:29:08 +00:00
refdata PLEASE READ ME! In order to prepare for the upcoming changes to VCF4, we felt it was best to split up the vcf3 and vcf4 codecs (vcf4 is not backwards compatible with vcf3 and certain changes are too complex to handle in both codecs). Using the 'VCF' rod type in the GATK will now throw a UserException for vcf3.2 or vcf3.3 files telling you to use the 'VCF3' type instead (and vice versa). Integration/unit tests have been updated. For programmers: note that there is currently a lot of code duplication in the two codecs (although I pulled out the easy stuff to a VCFCodecUtils class); however WE ARE FREEZING THE VCF3 CODEC AND WILL NO LONGER MAKE CHANGES TO IT. All updates/improvements will be targeted to the vcf4 codec only, as vcf3 is there only to be able to read legacy files. People should really be using vcf4 files only. 2011-05-11 12:07:44 +00:00
report Added a utility method to retrieve the contig lengths for WG chunking. 2011-04-20 19:22:21 +00:00
traversals Fix requested by Lee Lichtenstein: first check to see whether it's time for 2011-04-20 03:22:48 +00:00
walkers Many updates to SelectVariants : 2011-06-23 20:18:45 +00:00
GenomeAnalysisEngineUnitTest.java Contracts for GenomeLocParser and GenomeLoc are now fully implemented. 2011-05-21 02:01:59 +00:00
WalkerManagerUnitTest.java Changing testing framework from junit -> testng, for its enhanced configurability. 2010-11-01 21:31:44 +00:00