gatk-3.8/public
Mark DePristo af3613cc5f GATKSAMRecord commit branch summary
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory.  What's the best way to do this?  Rebase?

Now, on to the changes here:

-- Picard added a SamRecordFactory that is used to create instances of subclasses of SamRecord or BAMRecord.  This factory allows us to have low-level Picard readers (SamFileReader) create objects of type GATKSamRecord.  The abomination of a GATKSamRecord that both extends and contains another record is now gone.  GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord.  This results in up to a 2x performance improvement when writing BAMs and a ~10% improvement when reading BAM files.

-- As a consequence of this, we no longer officially support SAM records.  Attempting to create SAMRecord objects with the factory will throw a user exception.

-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value.  The real BQSR (not the copy indel version) got the efficient code to use this.  Please add all future platforms to this enum.

-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality.  This is performed in a wrapper iterator that's only added when these command line options are used.

-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.

-- ArtificialSamUtils creates GATKSamRecords now, not just SAMRecords.  Added code here to create artificial pairs, and used that code to create artificial ReadBackedPileups with specific properties.

-- New smarter algorithm for FragmentPileup.  This new code is up to 3x faster than the previous version, and is lazy, so it is more efficient when no overlapping pairs are actually in the pileup.  Created extensive DataProvider-driven UnitTest.  Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms.  TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller.
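
For readers unfamiliar with the factory arrangement described in the first item, here is a minimal sketch of the idea.  Every name in it (BaseRecord, AppRecord, RecordFactory, ToyReader) is a hypothetical stand-in, not the Picard or GATK API, and UnsupportedOperationException stands in for the user exception mentioned above.  It only shows how a reader that is handed a factory can emit the application's own subclass directly, so no wrapper object is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecordFactorySketch {
    /** Hypothetical stand-in for the record type a low-level reader produces. */
    static class BaseRecord {
        final String readName;
        BaseRecord(String readName) { this.readName = readName; }
    }

    /** Hypothetical application record: it extends the base record instead of wrapping one. */
    static class AppRecord extends BaseRecord {
        AppRecord(String readName) { super(readName); }
    }

    /** Hypothetical factory interface the reader calls for every record it decodes. */
    interface RecordFactory {
        BaseRecord createTextRecord(String readName);
        BaseRecord createBinaryRecord(String readName);
    }

    /** Factory that only supports the binary path, mirroring the "no text records" rule. */
    static class AppRecordFactory implements RecordFactory {
        public BaseRecord createTextRecord(String readName) {
            // Assumption: a plain unchecked exception stands in for the user exception.
            throw new UnsupportedOperationException("text records are not supported");
        }
        public BaseRecord createBinaryRecord(String readName) {
            return new AppRecord(readName);
        }
    }

    /** Toy reader: it never names AppRecord, it just asks the factory for records. */
    static class ToyReader {
        private final RecordFactory factory;
        ToyReader(RecordFactory factory) { this.factory = factory; }

        List<BaseRecord> readAll(List<String> names) {
            List<BaseRecord> out = new ArrayList<>();
            for (String name : names) {
                out.add(factory.createBinaryRecord(name));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(new AppRecordFactory());
        for (BaseRecord r : reader.readAll(Arrays.asList("read1", "read2"))) {
            System.out.println(r.readName + " is AppRecord: " + (r instanceof AppRecord));
        }
    }
}
```

The reader depends only on the factory interface, so swapping in a different record subclass does not require touching the reader code.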
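
The platform enum item can be illustrated with a small sketch as well.  This is not the NGSPlatform enum from this commit: the constants, the matching strings, and the Read class below are illustrative assumptions.  It shows one way a record could resolve its platform string to an enum once and then reuse the cached value on later calls.

```java
import java.util.Locale;

final class PlatformSketch {
    /** Hypothetical platform enum; the constants and tags are illustrative only. */
    enum Platform {
        ILLUMINA("ILLUMINA", "SLX", "SOLEXA"),
        SOLID("SOLID"),
        LS454("454"),
        UNKNOWN();

        private final String[] tags;
        Platform(String... tags) { this.tags = tags; }

        /** Maps a free-text platform string to a constant by case-insensitive substring match. */
        static Platform fromTag(String pl) {
            if (pl == null) return UNKNOWN;
            String upper = pl.toUpperCase(Locale.ROOT);
            for (Platform p : values()) {
                for (String tag : p.tags) {
                    if (upper.contains(tag)) return p;
                }
            }
            return UNKNOWN;
        }
    }

    /** Hypothetical read that parses its platform string once and caches the result. */
    static class Read {
        private final String platformTag;
        private Platform cached;   // null until the first lookup
        int lookups = 0;           // counts how often the string was actually parsed

        Read(String platformTag) { this.platformTag = platformTag; }

        Platform getPlatform() {
            if (cached == null) {
                cached = Platform.fromTag(platformTag);
                lookups++;
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        Read read = new Read("illumina");
        System.out.println(read.getPlatform());
        System.out.println(read.getPlatform());
        System.out.println("string parsed " + read.lookups + " time(s)");
    }
}
```

Code that branches on the platform can then compare enum constants instead of repeating string comparisons per read.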
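
The wrapper iterator item (OQ / defaultBaseQuality) follows a common decorator pattern, sketched below under stated assumptions.  The Read class, OriginalQualityIterator, and wrapIfRequested are hypothetical names, and only the original-quality case is shown.  The point is that the record class stays free of this logic, and the extra per-read work is paid only when the option is switched on.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class QualityWrapperSketch {
    /** Hypothetical read with current qualities and optional original qualities. */
    static class Read {
        byte[] quals;
        final byte[] originalQuals;   // may be null when no original qualities are stored

        Read(byte[] quals, byte[] originalQuals) {
            this.quals = quals;
            this.originalQuals = originalQuals;
        }
    }

    /** Decorator that swaps in the original qualities as reads stream through. */
    static class OriginalQualityIterator implements Iterator<Read> {
        private final Iterator<Read> inner;
        OriginalQualityIterator(Iterator<Read> inner) { this.inner = inner; }

        public boolean hasNext() { return inner.hasNext(); }

        public Read next() {
            Read r = inner.next();
            if (r.originalQuals != null) {
                r.quals = r.originalQuals.clone();
            }
            return r;
        }
    }

    /** Adds the wrapper only when requested; otherwise returns the source untouched. */
    static Iterator<Read> wrapIfRequested(Iterator<Read> source, boolean useOriginalQuals) {
        return useOriginalQuals ? new OriginalQualityIterator(source) : source;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read(new byte[] {10, 10}, new byte[] {30, 31}),
                new Read(new byte[] {20, 20}, null));
        Iterator<Read> it = wrapIfRequested(reads.iterator(), true);
        while (it.hasNext()) {
            System.out.println(Arrays.toString(it.next().quals));
        }
    }
}
```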
2011-10-25 20:52:56 -04:00
R Embedding gsalib source and queueJobReport R scripts in the dist and package jars. 2011-10-24 15:58:34 -04:00
c Reinitialize random seed in the bwa bindings from the fixed seed stored in the 2011-07-22 13:41:53 -04:00
chainFiles Reorganized the codebase beneath top-level public and private directories, 2011-06-28 06:55:19 -04:00
doc Reorganized the codebase beneath top-level public and private directories, 2011-06-28 06:55:19 -04:00
java GATKSAMRecord commit branch summary 2011-10-25 20:52:56 -04:00
packages Embedding gsalib source and queueJobReport R scripts in the dist and package jars. 2011-10-24 15:58:34 -04:00
perl Update to the bindings for liftOverVCF.pl (to -V from -B) 2011-09-15 15:33:09 -04:00
scala Embedding gsalib source and queueJobReport R scripts in the dist and package jars. 2011-10-24 15:58:34 -04:00
testdata Oops, forgot the PED test file 2011-10-05 21:09:08 -07:00