 * Compares two record-oriented files, itemizing specific differences between equivalent
 * records in the two files. Reports both itemized and summarized differences.
+ *
+ * What are the summarized differences and the DiffObjectsWalker
+ *
+ * The GATK contains a summarizing difference engine that compares hierarchical data structures to emit:
+ *
+ * The GATK contains a private walker DiffObjects that allows you access to the DiffEngine capabilities on the command line. Simply provide the walker with the master and test files and it will emit summarized differences for you.
+ *
+ *
+ * Why?
+ *
+ * The reason for this system is that it allows you to compare two structured files -- such as BAMs and VCFs -- for common differences among them. This is primarily useful in regression testing or optimization, where you want to ensure that the differences are those that you expect and not any others.
+ *
+ * Understanding the output
+ * The DiffEngine system compares hierarchical data structures for specific differences in the values of named
+ * nodes. Suppose I have three trees:
+ *
+ * Tree1=(A=1 B=(C=2 D=3))
+ * Tree2=(A=1 B=(C=3 D=3 E=4))
+ * Tree3=(A=1 B=(C=4 D=3 E=4))
+ *
+ *
+ * where every node in the tree is named, or is a raw value (here all leaf values are integers). The DiffEngine
+ * traverses these data structures by name, identifies equivalent nodes by fully qualified names
+ * (Tree1.A is distinct from Tree2.A), and determines whether their values are equal (Tree1.A=1, Tree2.A=1, so they are).
+ * The resulting itemized differences are listed as:
+ *
+ * Tree1.B.C=2 != Tree2.B.C=3
+ * Tree1.B.C=2 != Tree3.B.C=4
+ * Tree2.B.C=3 != Tree3.B.C=4
+ * Tree1.B.E=MISSING != Tree2.B.E=4
+ *
+ *
+ * This is conceptually very similar to the output of the Unix command-line tool diff. What's nice about DiffEngine, though,
+ * is that it computes similarity among the itemized differences and displays the count of each difference name
+ * in the system. In the above example, the field C is not equal three times, while the missing E in Tree1 occurs
+ * only once. So the summary is:
+ *
+ *
+ * *.B.C : 3
+ * *.B.E : 1
+ *
+ * where the * operator indicates that any named field matches. This output is sorted by counts, and provides an
+ * immediate picture of the commonly occurring differences among the files.
+ *
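The itemize-then-summarize idea described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the GATK DiffEngine implementation: the class `DiffSummarySketch`, its methods `diffPaths` and `summarize`, and the nested-`Map` representation of a tree are all hypothetical names and choices made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DiffSummarySketch {
    // Placeholder value for a node that exists in only one of the two trees.
    static final String MISSING = "MISSING";

    /** Collects the relative paths (e.g. "B.C") of nodes whose values differ between two trees. */
    @SuppressWarnings("unchecked")
    public static List<String> diffPaths(Map<String, Object> left, Map<String, Object> right, String prefix) {
        List<String> out = new ArrayList<>();
        // Visit the union of node names in sorted order so the output is deterministic.
        TreeSet<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        for (String name : names) {
            String path = prefix.isEmpty() ? name : prefix + "." + name;
            Object l = left.containsKey(name) ? left.get(name) : MISSING;
            Object r = right.containsKey(name) ? right.get(name) : MISSING;
            if (l instanceof Map && r instanceof Map) {
                // Both sides are subtrees: recurse by name.
                out.addAll(diffPaths((Map<String, Object>) l, (Map<String, Object>) r, path));
            } else if (!l.equals(r)) {
                out.add(path);
            }
        }
        return out;
    }

    /** Counts how often each path differs, replacing the tree name with "*", sorted by count (descending). */
    public static Map<String, Integer> summarize(List<String> relativePaths) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : relativePaths) {
            counts.merge("*." + p, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Object> tree1 = Map.of("A", 1, "B", Map.of("C", 2, "D", 3));
        Map<String, Object> tree2 = Map.of("A", 1, "B", Map.of("C", 3, "D", 3, "E", 4));
        System.out.println(diffPaths(tree1, tree2, ""));
        // The four itemized differences listed in the text, written as relative paths.
        System.out.println(summarize(List.of("B.C", "B.C", "B.C", "B.E")));
    }
}
```

Feeding `summarize` the four itemized differences from the example gives `*.B.C` a count of 3 and `*.B.E` a count of 1, matching the summary in the text. How the real engine chooses which pairs to compare and how it generalizes paths with more than one `*` is not covered by this sketch.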
+ * Below is a detailed example of two VCF files that differ because of a bug in the AC, AF, and AN counting routines,
+ * detected by the integrationtest integration (more below). You can see that although there are many specific
+ * instances of these differences between the two files, the summarized differences provide an immediate picture that
+ * the AC, AF, and AN fields are the major causes of the differences.
+ *
+ *
+ [testng] path                                                              count
+ [testng] *.*.*.AC                                                          6
+ [testng] *.*.*.AF                                                          6
+ [testng] *.*.*.AN                                                          6
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AC   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AF   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AN   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AC   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AF   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AN   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AC   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AF   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AN   1
+ [testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000598.AC   1
+
+ *
 * @author Mark DePristo
- * @version 0.1
+ * @since 7/4/11
 */
 @Requires(value={})
 public class DiffObjectsWalker extends RodWalker
+ * It is a three-state stratification: + *
@WalkerName tag.
- * @return false always
- */
- @Override
- public boolean inOverview() {
- return true;
- }
-
- /**
- * Will return true to indicate that packages can be given useful
- * description.
- * @return true always
- */
- @Override
- public boolean inPackage() {
- return true;
- }
-
- /**
- * Register this Taglet.
- * @param tagletMap the map to register this tag to.
- */
- public static void register(Map tagletMap) {
- DescriptionTaglet tag = new DescriptionTaglet();
- Taglet t = (Taglet)tagletMap.get(tag.getName());
- if (t != null) {
- tagletMap.remove(tag.getName());
- }
- tagletMap.put(tag.getName(), tag);
- }
-}
\ No newline at end of file
diff --git a/public/java/src/org/broadinstitute/sting/utils/help/DisplayNameTaglet.java b/public/java/src/org/broadinstitute/sting/utils/help/DisplayNameTaglet.java
deleted file mode 100644
index 6c6dad736..000000000
--- a/public/java/src/org/broadinstitute/sting/utils/help/DisplayNameTaglet.java
+++ /dev/null
@@ -1,49 +0,0 @@
-package org.broadinstitute.sting.utils.help;
-
-import com.sun.tools.doclets.Taglet;
-
-import java.util.Map;
-
-/**
- * Provide a display name in the help for packages
- *
- * @author mhanna
- * @version 0.1
- */
-public class DisplayNameTaglet extends HelpTaglet {
- /**
- * The display name for this taglet.
- */
- public static final String NAME = "help.display.name";
-
- /**
- * Return the name of this custom tag.
- */
- @Override
- public String getName() {
- return NAME;
- }
-
- /**
- * Will return true to indicate that packages can be given useful
- * display text.
- * @return true always
- */
- @Override
- public boolean inPackage() {
- return true;
- }
-
- /**
- * Register this Taglet.
- * @param tagletMap the map to register this tag to.
- */
- public static void register(Map tagletMap) {
- DisplayNameTaglet tag = new DisplayNameTaglet();
- Taglet t = (Taglet)tagletMap.get(tag.getName());
- if (t != null) {
- tagletMap.remove(tag.getName());
- }
- tagletMap.put(tag.getName(), tag);
- }
-}
diff --git a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java
new file mode 100644
index 000000000..710503ca8
--- /dev/null
+++ b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeature.java
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2011, The Broad Institute
+ *
+ * Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+package org.broadinstitute.sting.utils.help;
+
+import java.lang.annotation.*;
+
+/**
+ * An annotation to identify a class as a GATK capability for documentation
+ *
+ * @author depristo
+ */
+@Documented
+@Inherited
+@Retention(RetentionPolicy.RUNTIME)
+@Target(ElementType.TYPE)
+public @interface DocumentedGATKFeature {
+ public boolean enable() default true;
+ public String groupName();
+ public String summary() default "";
+ public Class<? extends DocumentedGATKFeatureHandler> handler() default GenericDocumentationHandler.class;
+ public Class[] extraDocs() default {};
+}
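As an illustration of how an annotation of this shape might be applied and read back, here is a minimal sketch. It declares a simplified stand-in for `DocumentedGATKFeature` (the `handler` and `extraDocs` members are omitted so the example compiles on its own). `ExampleWalker`, `ExampleSubWalker`, and `describe` are hypothetical names, and reading the annotation through reflection is an assumption made for the example; the handler class added in this change works from `ClassDoc` objects instead.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DocumentedFeatureSketch {
    // Simplified stand-in for the annotation in the diff (handler and extraDocs omitted).
    @Documented
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface DocumentedGATKFeature {
        boolean enable() default true;
        String groupName();
        String summary() default "";
    }

    // Hypothetical class that opts in to documentation.
    @DocumentedGATKFeature(groupName = "Example walkers", summary = "A hypothetical feature")
    public static class ExampleWalker {}

    // Carries no annotation of its own; @Inherited makes the superclass annotation visible here.
    public static class ExampleSubWalker extends ExampleWalker {}

    /** Returns "group: summary" for an enabled, annotated class, or null otherwise. */
    public static String describe(Class<?> c) {
        DocumentedGATKFeature feature = c.getAnnotation(DocumentedGATKFeature.class);
        if (feature == null || !feature.enable()) {
            return null;
        }
        return feature.groupName() + ": " + feature.summary();
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleWalker.class));
        System.out.println(describe(ExampleSubWalker.class));
        System.out.println(describe(String.class));
    }
}
```

Because the annotation is marked `@Inherited` and targets types, a subclass is treated as documented without repeating the annotation, and `groupName` is the only member a caller must supply.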
diff --git a/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureHandler.java b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureHandler.java
new file mode 100644
index 000000000..366df0c3a
--- /dev/null
+++ b/public/java/src/org/broadinstitute/sting/utils/help/DocumentedGATKFeatureHandler.java
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2011, The Broad Institute
+ *
+ * Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+package org.broadinstitute.sting.utils.help;
+
+import com.sun.javadoc.ClassDoc;
+import com.sun.javadoc.RootDoc;
+
+import java.io.*;
+import java.util.Set;
+
+/**
+ *
+ */
+public abstract class DocumentedGATKFeatureHandler {
+ private GATKDoclet doclet;
+
+ protected RootDoc getRootDoc() {
+ return this.doclet.rootDoc;
+ }
+
+ public void setDoclet(GATKDoclet doclet) {
+ this.doclet = doclet;
+ }
+
+ public GATKDoclet getDoclet() {
+ return doclet;
+ }
+
+ public boolean shouldBeProcessed(ClassDoc doc) { return true; }
+
+ public String getDestinationFilename(ClassDoc doc) {
+ return HelpUtils.getClassName(doc).replace(".", "_") + ".html";
+ }
+
+ public abstract String getTemplateName(ClassDoc doc) throws IOException;
+ public abstract void processOne(RootDoc rootDoc, GATKDocWorkUnit toProcess, Set