The last classic release of GATK3: version 3.8
 
 
 
 
depristo 27c8fb1e4d Added support for a general GATK option --simplifyBAM to automatically remove and simplify kept reads in an output BAM file. Specifically, duplicate, non-PF, and unmapped reads are removed, and all extended tags in the retained SAM records are removed except the RG:Z tag. This option is very useful when creating temporary BAM files (merged per-population or multi-sample cleaned) for future calling (as in the 1000G processing pipeline). Results in a significant reduction in space of the resulting BAM, faster reading of the BAM, and surprisingly even faster UG performance:
1-10mb of chromosome one, from NA12878 HiSeq 64x data set on hg18:

Full BAM:
Write time: 8.6 m
Size: 866M
CountReads time: 2.9 m
UG time: 11.3 m

Simplified BAM:
Write time: 6.2
Size: 458M
CountReads time: 85.7 s
UG time: 10.1 m


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5517 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 01:21:35 +00:00
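
The filtering described in the commit message above can be illustrated with a minimal sketch. This is not GATK's implementation: it works on tab-separated SAM text lines rather than BAM, and the function names `simplify_sam_line` and `simplify_sam` are hypothetical. The flag bits (0x4 unmapped, 0x200 failing quality checks, i.e. non-PF, and 0x400 duplicate) come from the SAM specification. Whether GATK applies the filter in exactly this way is an assumption.

```python
# Illustrative sketch only; names are hypothetical, not part of GATK.
# SAM flag bits as defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_FAILS_QC = 0x200    # "non-PF" read
FLAG_DUPLICATE = 0x400
DROP_MASK = FLAG_UNMAPPED | FLAG_FAILS_QC | FLAG_DUPLICATE


def simplify_sam_line(line):
    """Return a simplified SAM record, or None if the read is dropped."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & DROP_MASK:
        # Duplicate, non-PF, or unmapped: remove the read entirely.
        return None
    mandatory = fields[:11]  # the 11 mandatory SAM columns
    # Keep only the read-group tag; discard every other optional tag.
    kept_tags = [tag for tag in fields[11:] if tag.startswith("RG:Z:")]
    return "\t".join(mandatory + kept_tags)


def simplify_sam(lines):
    """Yield header lines unchanged and simplified records for kept reads."""
    for line in lines:
        if line.startswith("@"):
            yield line.rstrip("\n")
            continue
        simplified = simplify_sam_line(line)
        if simplified is not None:
            yield simplified
```

Dropping the optional tags is what shrinks each retained record, which is consistent with the smaller file size and faster read times reported in the commit message.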
R Misc changes 2011-02-26 15:35:49 +00:00
analysis/depristo Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. 2011-03-24 14:03:51 +00:00
archive Moving GLF code to archive 2011-01-15 22:42:42 +00:00
c Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base. 2010-12-17 14:49:15 +00:00
doc removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also 2010-08-19 00:42:37 +00:00
java Added support for a general GATK option --simplifyBAM to automatically remove and simplify kept reads in an output BAM file. Specifically, duplicate, non-PF, and unmapped reads are removed, and all extended tags in the retained SAM records are removed except the RG:Z tag. This option is very useful when creating temporary BAM files (merged per-population or multi-sample cleaned) for future calling (as in the 1000G processing pipeline). Results in a significant reduction in space of the resulting BAM, faster reading of the BAM, and surprisingly even faster UG performance: 2011-03-26 01:21:35 +00:00
lua forgot to remove a debug line. 2011-02-15 16:25:48 +00:00
matlab Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function 2009-11-30 20:02:25 +00:00
packages Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. 2011-03-24 14:03:51 +00:00
perl 2 more scripts I found helpful in syncing (and cleaning up) the 1000G mirror 2011-02-22 04:17:36 +00:00
python A helper script that will take a list of bams, a list of case sample IDs, and a list of control sample IDs, and generate a sample meta data yaml (which includes the bamfiles) 2011-03-21 16:11:55 +00:00
ruby accidentally committed an old tool 2010-08-25 15:42:02 +00:00
scala Enabled the parameterize option for debugging PipelineTest MD5s. 2011-03-26 00:41:47 +00:00
settings Update Picard / sam-jdk at Tim's request. 2011-01-03 02:17:25 +00:00
shell Fixing this so it gets the right 129 dbsnp for b37 samples 2011-03-22 17:43:20 +00:00
testdata ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard. 2011-02-21 14:02:41 +00:00
LICENSE Adding a license to the root directory in case BOSC checks for one. Has the 2010-04-20 16:04:29 +00:00
build.xml Build.xml contained references to tools now in picard 2011-03-17 18:29:46 +00:00
ivy.xml Added commons math, for Kristian. 2011-02-14 18:57:21 +00:00