Final PED documentation

Mark DePristo 2011-10-05 15:17:41 -07:00
parent 3226d5dc0d
commit be2d29ce69
2 changed files with 45 additions and 29 deletions


@@ -213,55 +213,63 @@ public class GATKArgumentCollection {
// --------------------------------------------------------------------------------------------------------------
/**
* Reads PED file-formatted tabular text files describing meta-data about the samples being
* processed in the GATK.
* <p>Reads PED file-formatted tabular text files describing meta-data about the samples being
* processed in the GATK.</p>
*
* See http://www.broadinstitute.org/mpg/tagger/faq.html
* See http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
* <ul>
* <li>see <a href="http://www.broadinstitute.org/mpg/tagger/faq.html">http://www.broadinstitute.org/mpg/tagger/faq.html</a></li>
* <li>see <a href="http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped">http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped</a></li>
* </ul>
*
* The PED file is a white-space (space or tab) delimited file: the first six columns are mandatory:
* <p>The PED file is a white-space (space or tab) delimited file: the first six columns are mandatory:</p>
*
* Family ID
* Individual ID
* Paternal ID
* Maternal ID
* Sex (1=male; 2=female; other=unknown)
* Phenotype
* <ul>
* <li>Family ID</li>
* <li>Individual ID</li>
* <li>Paternal ID</li>
* <li>Maternal ID</li>
* <li>Sex (1=male; 2=female; other=unknown)</li>
* <li>Phenotype</li>
* </ul>
*
* The IDs are alphanumeric: the combination of family and individual ID should uniquely identify a person.
* <p>The IDs are alphanumeric: the combination of family and individual ID should uniquely identify a person.
* A PED file must have 1 and only 1 phenotype in the sixth column. The phenotype can be either a
* quantitative trait or an affection status column: GATK will automatically detect which type
* (i.e. based on whether a value other than 0, 1, 2 or the missing genotype code is observed).
* (i.e. based on whether a value other than 0, 1, 2 or the missing genotype code is observed).</p>
*
* If an individual's sex is unknown, then any character other than 1 or 2 can be used.
* <p>If an individual's sex is unknown, then any character other than 1 or 2 can be used.</p>
*
* You can add a comment to a PED or MAP file by starting the line with a # character. The rest of that
* line will be ignored. Therefore, do not start any family ID with this character.
* <p>You can add a comment to a PED or MAP file by starting the line with a # character. The rest of that
* line will be ignored. Therefore, do not start any family ID with this character.</p>
*
* Affection status should be coded:
* <p>Affection status should be coded:</p>
*
* -9 missing
* 0 missing
* 1 unaffected
* 2 affected
* <ul>
* <li>-9 missing</li>
* <li>0 missing</li>
* <li>1 unaffected</li>
* <li>2 affected</li>
* </ul>
*
* If any value outside of -9,0,1,2 is detected, then the
* <p>If any value outside of -9,0,1,2 is detected, then the
* phenotype values are interpreted as string phenotype values. In this case -9 uniquely
* represents the missing value.
* represents the missing value.</p>
*
* Genotypes (column 7 onwards) cannot be specified to the GATK.
* <p>Genotypes (column 7 onwards) cannot be specified to the GATK.</p>
*
* For example, here are two individuals (one row = one person):
* <p>For example, here are two individuals (one row = one person):</p>
*
* <pre>
* FAM001 1 0 0 1 2
* FAM001 2 0 0 1 2
* </pre>
*
* Each -ped argument can be tagged with NO_FAMILY_ID, NO_PARENTS, NO_SEX, NO_PHENOTYPE to
* tell the GATK PED parser that the corresponding fields are missing from the ped file.
* <p>Each -ped argument can be tagged with NO_FAMILY_ID, NO_PARENTS, NO_SEX, NO_PHENOTYPE to
* tell the GATK PED parser that the corresponding fields are missing from the ped file.</p>
*
* Note that most GATK walkers do not use pedigree information. Walkers that require pedigree
* <p>Note that most GATK walkers do not use pedigree information. Walkers that require pedigree
* data should clearly indicate so in their arguments and will throw errors if required pedigree
* information is missing.
* information is missing.</p>
*/
@Argument(fullName="pedigree", shortName = "ped", doc="Pedigree files for samples",required=false)
public List<File> pedigreeFiles = Collections.emptyList();
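
The column layout and codes described in the Javadoc above can be sketched as a small standalone parser. This is only an illustration under stated assumptions, not the GATK PED parser: the names `PedLineSketch`, `PedRecord`, `parseLine`, `parseSex` and `parseAffection` are hypothetical, blank lines are skipped as a convenience, and rejecting rows that do not have exactly six columns is this sketch's own reading of "genotypes cannot be specified".

```java
import java.util.Optional;

// Hypothetical sketch of the six-column PED layout; not the GATK implementation.
class PedLineSketch {
    enum Sex { MALE, FEMALE, UNKNOWN }
    enum Affection { MISSING, UNAFFECTED, AFFECTED }

    /** One parsed PED row; the phenotype is kept as raw text. */
    static final class PedRecord {
        final String familyId, individualId, paternalId, maternalId, phenotype;
        final Sex sex;

        PedRecord(String[] f) {
            familyId = f[0];
            individualId = f[1];
            paternalId = f[2];
            maternalId = f[3];
            sex = parseSex(f[4]);
            phenotype = f[5];
        }
    }

    /** Returns empty for comment lines (leading '#') and, by assumption, blank lines. */
    static Optional<PedRecord> parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty();
        }
        // Columns are delimited by runs of spaces or tabs.
        String[] fields = trimmed.split("\\s+");
        if (fields.length != 6) {
            // Assumption: genotype columns (7 onwards) are rejected rather than ignored.
            throw new IllegalArgumentException(
                    "Expected 6 columns but found " + fields.length + ": " + line);
        }
        return Optional.of(new PedRecord(fields));
    }

    /** 1 = male, 2 = female, anything else = unknown. */
    static Sex parseSex(String code) {
        switch (code) {
            case "1": return Sex.MALE;
            case "2": return Sex.FEMALE;
            default:  return Sex.UNKNOWN;
        }
    }

    /** -9 and 0 = missing, 1 = unaffected, 2 = affected. */
    static Affection parseAffection(String code) {
        switch (code) {
            case "-9":
            case "0": return Affection.MISSING;
            case "1": return Affection.UNAFFECTED;
            case "2": return Affection.AFFECTED;
            default:
                throw new IllegalArgumentException("Not an affection status code: " + code);
        }
    }

    public static void main(String[] args) {
        String[] lines = { "# a comment line", "FAM001 1 0 0 1 2", "FAM001 2 0 0 1 2" };
        for (String line : lines) {
            parseLine(line).ifPresent(r -> System.out.println(
                    r.familyId + "/" + r.individualId
                    + " sex=" + r.sex
                    + " affection=" + parseAffection(r.phenotype)));
        }
    }
}
```

The phenotype is stored as text because the Javadoc allows it to be an affection status, a quantitative trait, or a string value; `parseAffection` applies only once a caller has decided the column holds affection codes.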


@@ -28,6 +28,14 @@ package org.broadinstitute.sting.gatk.samples;
*
*/
public enum PedigreeValidationType {
/**
* If a pedigree file is provided at all, require that all samples in the VCF or BAM files have a
* corresponding entry in the pedigree file(s).
*/
STRICT,
/**
* Do not enforce any overlap between the VCF/BAM samples and the pedigree data.
*/
SILENT
}
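
The difference between the two validation modes documented above can be sketched as follows. This is a minimal illustration, assuming sample names are compared as plain strings; `PedigreeValidationSketch`, its local `ValidationType` enum, and the `validate` and `missingFromPedigree` helpers are hypothetical names, and the exception type is a stand-in for whatever error the GATK actually reports.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of STRICT vs SILENT pedigree validation; not the GATK code.
class PedigreeValidationSketch {
    /** Local stand-in for PedigreeValidationType, so the sketch is self-contained. */
    enum ValidationType { STRICT, SILENT }

    /** Samples present in the VCF/BAM data but absent from the pedigree, in sorted order. */
    static Set<String> missingFromPedigree(Set<String> dataSamples, Set<String> pedigreeSamples) {
        Set<String> missing = new TreeSet<>(dataSamples);
        missing.removeAll(pedigreeSamples);
        return missing;
    }

    /**
     * STRICT: if any pedigree file was provided, every data sample needs a pedigree entry.
     * SILENT: no overlap is enforced.
     */
    static void validate(ValidationType type, boolean pedigreeProvided,
                         Set<String> dataSamples, Set<String> pedigreeSamples) {
        if (type == ValidationType.SILENT || !pedigreeProvided) {
            return;
        }
        Set<String> missing = missingFromPedigree(dataSamples, pedigreeSamples);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Samples without a pedigree entry: " + missing);
        }
    }
}
```

Note that STRICT only applies when a pedigree file is supplied, so a run with no `-ped` argument passes under either mode in this sketch.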