gatk-3.8


Guillermo del Angel 695723ba43	2013-03-09 18:18:13 -05:00
Two features useful for ancient DNA processing. Ancient DNA sequencing data differ in many ways from modern data, and methods to analyze them need to be adapted accordingly.

Feature 1: Read adaptor trimming. Ancient DNA libraries typically have very short inserts (on the order of 50 bp), so typical Illumina libraries sequenced with, say, 100 bp HiSeq reads will have a large adaptor component read after the insert. If this adaptor is not removed, the data will not be alignable. Third-party tools exist that remove adaptors and optionally merge read pairs, but they are cumbersome to use and require precise knowledge of the library construction and adaptor sequence.
-- New walker ReadAdaptorTrimmer walks through paired-end data, computes the pair overlap, and trims the auto-detected adaptor sequence.
-- Unit tests added for the trimming operation.
-- Utility walker (may be retired later) DetailedReadLengthDistribution computes the insert size or read length distribution stratified by read group and mapping status, and outputs a GATKReport with the data.
-- Renamed MaxReadLengthFilter to ReadLengthFilter and added the ability to specify a minimum read length as a filter (useful if adaptor trimming leaves many very short reads, which will map poorly and only clutter output BAMs).

Feature 2: Unbiased site QUAL estimation. Often the ancestral allele status is not known, and VCF fields such as QUAL, QD, and GQ are affected by the population-genetics prior at a site. This can introduce subtle biases in studies where one species is aligned against the reference of another species, so an option is introduced for UnifiedGenotyper (UG) and HaplotypeCaller (HC) not to apply this prior.
-- Added -noPrior argument to StandardCallerArgumentCollection.
-- Added option not to fill priors if this argument is set.
-- Added an integration test.
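The overlap-based trimming described for ReadAdaptorTrimmer can be illustrated with a minimal Python sketch. This is not the walker's actual code: the function names (`find_insert_length`, `trim_adapters`), the 10% mismatch threshold, and the 10-base minimum overlap are hypothetical choices. It assumes both mates have the same length and that the insert is no longer than the reads. In that case read 1 is the insert followed by adaptor, and the reverse complement of read 2 is adaptor followed by the insert, so the insert length is the longest L for which the first L bases of read 1 match the last L bases of the reverse-complemented read 2.

```python
"""Minimal sketch of overlap-based adaptor trimming for paired-end reads.

Hypothetical helpers, not GATK's ReadAdaptorTrimmer. Assumes uppercase
A/C/G/T/N sequences, mates of equal length, and inserts no longer than
the read length.
"""
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_insert_length(read1: str, read2: str, min_overlap: int = 10,
                       max_mismatch_rate: float = 0.1) -> Optional[int]:
    """Estimate the insert length from the overlap of read1 and rc(read2).

    If the insert F is shorter than the reads, read1 = F + adaptor and
    rc(read2) = rc(adaptor') + F, so F is both read1[:L] and rc(read2)[-L:].
    Candidate lengths are tried from longest to shortest; None is returned
    if no overlap of at least min_overlap bases passes the mismatch cutoff.
    """
    n = len(read1)
    rc2 = reverse_complement(read2)
    for length in range(n, min_overlap - 1, -1):
        prefix = read1[:length]
        suffix = rc2[n - length:]
        if mismatches(prefix, suffix) <= max_mismatch_rate * length:
            return length
    return None


def trim_adapters(read1: str, read2: str, **kwargs) -> Tuple[str, str]:
    """Trim both mates to the detected insert length (unchanged if none)."""
    if len(read1) != len(read2):
        raise ValueError("this sketch assumes mates of equal length")
    length = find_insert_length(read1, read2, **kwargs)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


if __name__ == "__main__":
    # 14 bp insert read with 20 bp mates; the last 6 bases of each mate
    # are adaptor ("AGATCG" here, an illustrative stand-in).
    r1 = "GATTACAGGCTTACAGATCG"
    r2 = "GTAAGCCTGTAATCAGATCG"
    print(trim_adapters(r1, r2))  # ('GATTACAGGCTTAC', 'GTAAGCCTGTAATC')
```

Trying candidate lengths from longest to shortest favours the most extensive overlap, which is the least likely to arise by chance; a production trimmer would also weigh base qualities and handle inserts longer than the reads, which this sketch simply leaves untrimmed when no overlap passes the cutoff.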
R	Fix a nasty bug in reading GATK reports with a single line	2012-09-10 20:14:13 -04:00
c	At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings.	2012-01-17 14:47:53 -05:00
chainFiles	Reorganized the codebase beneath top-level public and private directories,	2011-06-28 06:55:19 -04:00
doc	Reorganized the codebase beneath top-level public and private directories,	2011-06-28 06:55:19 -04:00
java	Two features useful for ancient DNA processing.	2013-03-09 18:18:13 -05:00
keys	Public-key authorization scheme to restrict use of NO_ET	2012-03-06 00:09:43 -05:00
packages	ValidatingPileup was renamed to CheckPileup	2013-02-15 11:56:19 -05:00
perl	Split out contig names from Reference .fai file on white space (to support the GATK resource bundle's file human_g1k_v37.fasta.fai.gz, which does not use tab delimiters)	2012-06-07 16:56:32 -04:00
scala	Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest	2013-02-27 04:45:53 -05:00
testdata	Reverting move of BQSR tests to public, as per DR's email	2012-07-19 10:02:05 -04:00
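The perl entry above describes splitting contig names out of a reference .fai index on any whitespace rather than only on tabs, so that an index such as human_g1k_v37.fasta.fai.gz, which does not use tab delimiters, still parses. A minimal Python sketch of that parsing rule follows, assuming the usual .fai layout in which the contig name is the first column. The function names are hypothetical, this is not the perl script itself, and the sample lines in the usage note are illustrative rather than taken from a real index.

```python
"""Sketch: extract contig names from a (possibly gzipped) .fai index.

Hypothetical helpers, not the GATK perl code. Splits each line on any run
of whitespace, so both tab- and space-delimited indexes are accepted.
"""
import gzip
from typing import Iterable, List


def contig_names_from_fai(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited field of each non-blank line."""
    names = []
    for line in lines:
        fields = line.split()  # no argument: split on any whitespace run
        if fields:
            names.append(fields[0])
    return names


def read_fai_contigs(path: str) -> List[str]:
    """Read contig names from a .fai file, decompressing it if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return contig_names_from_fai(handle)
```

For example, `contig_names_from_fai(["ctg1\t1000\t6\t60\t61\n", "ctg2 500 1030 60 61\n"])` returns `["ctg1", "ctg2"]` even though the second line uses spaces. Using `str.split()` with no separator also ignores trailing newlines and blank lines, which a strict tab split would not.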