Add info about multiple input samples (as relevant for M2)

Also generalize references to the tool/caller since this code is now shared by HC and M2
2015-07-23 09:44:13 -04:00 · 2015-07-23 09:44:13 -04:00 · 85b340caed
parent 66cf22b28f
commit 85b340caed
1 changed files with 11 additions and 6 deletions
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/AssemblyBasedCallerArgumentCollection.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/AssemblyBasedCallerArgumentCollection.java
@@ -89,11 +89,12 @@ public class AssemblyBasedCallerArgumentCollection extends StandardCallerArgumen
    }

    /**
-     * The assembled haplotypes will be written as BAM to this file if requested.  Really for debugging purposes only.
-     * Note that the output here does not include uninformative reads so that not every input read is emitted to the bam.
+     * The assembled haplotypes and locally realigned reads will be written as BAM to this file if requested.  Really
+     * for debugging purposes only. Note that the output here does not include uninformative reads so that not every
+     * input read is emitted to the bam.
     *
-     * Turning on this mode may result in serious performance cost for the HC.  It's really only appropriate to
-     * use in specific areas where you want to better understand why the HC is making specific calls.
+     * Turning on this mode may result in serious performance cost for the caller.  It's really only appropriate to
+     * use in specific areas where you want to better understand why the caller is making specific calls.
     *
     * The reads are written out containing an "HC" tag (integer) that encodes which haplotype each read best matches
     * according to the haplotype caller's likelihood calculation.  The use of this tag is primarily intended
@@ -101,14 +102,18 @@ public class AssemblyBasedCallerArgumentCollection extends StandardCallerArgumen
     * easily see which reads go with these haplotype.
     *
     * Note that the haplotypes (called or all, depending on mode) are emitted as single reads covering the entire
-     * active region, coming from read HC and a special read group.
+     * active region, coming from sample "HC" and a special read group called "ArtificialHaplotype". This will increase the
+     * pileup depth compared to what would be expected from the reads only, especially in complex regions.
     *
     * Note also that only reads that are actually informative about the haplotypes are emitted.  By informative we mean
     * that there's a meaningful difference in the likelihood of the read coming from one haplotype compared to
     * its next best haplotype.
     *
+     * If multiple BAMs are passed as input to the tool (as is common for M2), then they will be combined in the bamout
+     * output and tagged with the appropriate sample names.
+     *
     * The best way to visualize the output of this mode is with IGV.  Tell IGV to color the alignments by tag,
-     * and give it the HC tag, so you can see which reads support each haplotype.  Finally, you can tell IGV
+     * and give it the "HC" tag, so you can see which reads support each haplotype.  Finally, you can tell IGV
     * to group by sample, which will separate the potential haplotypes from the reads.  All of this can be seen in
     * <a href="https://www.dropbox.com/s/xvy7sbxpf13x5bp/haplotypecaller%20bamout%20for%20docs.png">this screenshot</a>
     *