The last classic release of GATK3: version 3.8
 
 
 
 
depristo 6a49e8df34 Significant change to the way subsetting by sample works with monomorphic sites. Now keeps the alt allele, even if a record is AC=0 after the subset. Previously, the system dropped the alt allele, which I don't think is the right behavior. If you really want a VCF without monomorphic sites, use the option to drop monomorphic sites after subsetting. See detailed information below.
Right now, if you select a multi-sample VCF file (or, I see, one with filters) down to a smaller set of samples, and the site isn't polymorphic in that subgroup, then the alt allele is lost.  For example, when selecting down NA12878 from the OMNI, I previously received the following VCF:

1       82154   rs4477212       A       .       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       .       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

The first two records lost the ALT allele, because NA12878 is hom-ref at those sites.  My change results in a VCF that looks like:

1       82154   rs4477212       A       G       .       PASS    AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0     GT:GC   0/0:0.7205
1       534247  SNP1-524110     C       T       .       PASS    AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0  GT:GC   0/0:0.6491
1       565286  SNP1-555149     C       T       .       PASS    AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0   GT:GC   1/1:0.3471
1       569624  SNP1-559487     T       C       .       PASS    AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0   GT:GC   1/1:0.3942

The genotype remains unchanged, but the ALT allele is now preserved.  I think this is the correct behavior, as reducing samples down shouldn't change the character of the site, only the AC in the subpopulation.  This is related to the tricky issue of isPolymorphic() vs. isVariant().  

isVariant => is there an ALT allele?
isPolymorphic => is some sample non-ref in the samples?
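
The two definitions above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: SiteSketch, its fields, and the integer genotype encoding (0 = REF, positive = an ALT allele, negative = no-call) are hypothetical stand-ins, not the GATK VariantContext class or its actual implementation.

```java
import java.util.List;

// Hypothetical stand-in for one VCF record; NOT the GATK VariantContext class.
final class SiteSketch {
    // ALT column entries; an empty list models ALT = "."
    private final List<String> altAlleles;
    // One int[] per sample holding allele indices: 0 = REF, >0 = ALT, <0 = no-call
    private final List<int[]> genotypes;

    SiteSketch(List<String> altAlleles, List<int[]> genotypes) {
        this.altAlleles = altAlleles;
        this.genotypes = genotypes;
    }

    // isVariant => is there an ALT allele?
    boolean isVariant() {
        return !altAlleles.isEmpty();
    }

    // isPolymorphic => is some sample non-ref in the samples?
    boolean isPolymorphic() {
        for (int[] genotype : genotypes) {
            for (int alleleIndex : genotype) {
                if (alleleIndex > 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Under this sketch, the record at position 82154 after the change (ALT = G, genotype 0/0) is variant but not polymorphic, while the same record before the change (ALT = .) is neither.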

In part this is complicated by the semantics of sites-only VCFs, where ALT = . is used to mean not-polymorphic.  Unfortunately, I just don't think there's a consistent convention right now, but it might be worth adopting a single approach to handling this at some point.  Wiki docs updated.

Does anyone have critical infrastructure that depends on the previous convention?  Let me know so we can coordinate the change.

There's a new function subContextFromGenotypes() that also takes a Set<Allele> to handle this type of behavior.
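
As a rough illustration of the two conventions discussed in this message, the sketch below recomputes AC and AN for a subset of samples and then decides what goes in the ALT column. Everything here is an assumption made for illustration: SubsetSketch, alleleCounts(), altColumn(), and the keepOriginalAlleles flag are hypothetical names, the site is assumed to be biallelic, and none of this is the actual signature or body of subContextFromGenotypes().

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of sample subsetting; NOT the GATK implementation.
final class SubsetSketch {
    // Returns {AC, AN} over the kept samples, assuming a biallelic site.
    // Allele indices: 0 = REF, >0 = ALT, <0 = no-call (ignored).
    static int[] alleleCounts(Map<String, int[]> genotypesBySample, Set<String> samplesToKeep) {
        int ac = 0;
        int an = 0;
        for (Map.Entry<String, int[]> entry : genotypesBySample.entrySet()) {
            if (!samplesToKeep.contains(entry.getKey())) {
                continue;
            }
            for (int alleleIndex : entry.getValue()) {
                if (alleleIndex < 0) {
                    continue;
                }
                an++;
                if (alleleIndex > 0) {
                    ac++;
                }
            }
        }
        return new int[] {ac, an};
    }

    // ALT column after subsetting. With keepOriginalAlleles = true the ALT
    // alleles survive even when AC = 0 (the new behavior described above);
    // with false, a site with AC = 0 is written as "." (the old behavior).
    static String altColumn(List<String> originalAlts, int ac, boolean keepOriginalAlleles) {
        if (originalAlts.isEmpty() || (ac == 0 && !keepOriginalAlleles)) {
            return ".";
        }
        return String.join(",", originalAlts);
    }
}
```

For a site with ALT = G where the one kept sample is 0/0, alleleCounts() gives AC=0 and AN=2 either way; only the ALT column differs between the two conventions.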

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5832 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 13:59:16 +00:00
R Removing requirement of providing known track in VQSR for the non-humans. Updating placement of legend on tranche plot. 2011-05-05 20:24:06 +00:00
analysis/depristo Added splitContextByReadGroup() and fixed bug in getPileupForReadGroup() that resulted in a NPE when no reads were present for a read group. 2011-05-15 17:36:07 +00:00
archive Moving GLF code to archive 2011-01-15 22:42:42 +00:00
c Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base. 2010-12-17 14:49:15 +00:00
chainFiles Renaming for consistency 2011-05-10 16:36:39 +00:00
doc removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also 2010-08-19 00:42:37 +00:00
java Significant change to the way subsetting by sample works with monomorphic sites. Now keeps the alt allele, even if a record is AC=0 after the subset. Previously, the system dropped the alt allele, which I don't think is the right behavior. If you really want a VCF without monomorphic sites, use the option to drop monomorphic sites after subsetting. See detailed information below. 2011-05-21 13:59:16 +00:00
lua forgot to remove a debug line. 2011-02-15 16:25:48 +00:00
matlab Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function 2009-11-30 20:02:25 +00:00
packages Added JavaMail dependencies to Queue package since bcel wasn't picking them up. 2011-04-23 20:48:40 +00:00
perl Quit immediately with an error message if any of the individual steps fails. 2011-04-22 13:23:33 +00:00
python A helper script that will take a list of bams, a list of case sample IDs, and a list of control sample IDs, and generate a sample meta data yaml (which includes the bamfiles) 2011-03-21 16:11:55 +00:00
ruby accidentally committed an old tool 2010-08-25 15:42:02 +00:00
scala Contracts for GenomeLocParser and GenomeLoc are now fully implemented. 2011-05-21 02:01:59 +00:00
settings Contracts for Java (http://code.google.com/p/cofoja/) infrastructure enabled. No piece of code actually uses this, so it's possible to remove easily. Does not build by default (you must modify build.xml). Really an intermediate commit so I can play around with the system in my java classes and revert safely. Very much looking forward to DVCS 2011-05-18 18:05:59 +00:00
shell Useful utility for looking at the file size of GSA file systems 2011-04-02 03:47:27 +00:00
testdata Updating VariantGaussianMixtureModelUnitTest to use truth sensitivity cutting 2011-05-04 13:56:01 +00:00
LICENSE Adding a license to the root directory in case BOSC checks for one. Has the 2010-04-20 16:04:29 +00:00
build.xml Contracts for java now enabled by default in GATK build. The contract checking is automatically enabled when running tests and integrationtests. If you want to run the GATK with Contract checking enabled, add -javaagent:lib/cofoja.jar to your jvm args 2011-05-20 02:53:42 +00:00
ivy.xml Factor out all testing dependencies into a separate test configuration and 2011-05-05 22:42:11 +00:00