-Changes in Java 7 related to comparators / sorting produce a large number
of innocuous differences in our test output. Updating expectations now
that we've moved to using Java 7 internally.
-Also incorporate Eric's fix to the GATKSAMRecordUnitTest to prevent
intermittent failures.
This class, being unused, was no longer getting packaged into the
GATK release jar by bcel, and so attempting to run its unit test
on the release jar was producing an error.
-Even though we're no longer compiling/using contracts in tests,
we still need the cofoja jar in the classpath when testing the
release jars due to some bad behavior on the part of TestNG in
not being able to handle missing annotation classes.
-We don't need to package the cofoja classes in the actual GATK
jar, however (and we never have).
RR counts are represented as offsets from the first count, but that wasn't being done
correctly when counts are adjusted on the fly. Also, we were triggering the expensive
conversion and writing to binary tags even when we weren't going to write the read
to disk.
The code has been updated so that unconverted counts are passed to the GATKSAMRecord
and it knows how to encode the tag correctly. Also, there are now methods to write
to the reduced counts array without forcing the conversion (and methods that do force
the conversion).
Also:
1. counts are now maintained as ints whenever possible. Only the GATKSAMRecord knows
about the internal encoding.
2. as discussed in meetings today, we updated the encoding so that it can now handle
a range of values that extends to 255 instead of 127 (and is backwards compatible).
3. tests have been moved from SyntheticReadUnitTest to GATKSAMRecordUnitTest accordingly.
Added also R script used to process everything into a couple of ggplot-friendly data frames.
Functionality is basically the same. Enhancements:
-- Add annotation to log axiom and Exome Chip AC along with LOF results for concordance comparisons.
-- General Cleanup.
-- Used base path for files as a variable in case directory structure in gsa-hpprojects changes again.
-- Output also per-pool data by subsetting genotypes per pool and comparing with corresponding genotypes from Axiom, exome chip and omni.
-- Commit R scripts that load all tables and crunch them to analyze them.
-- Added check to see if read spans beyond reference window MINUS padding and event length. This guarantees that read will always be contained in haplotype.
-- Changed md5's that happen when long reads from old 454 data have their likelihoods changed because of the extra base clipping.
-- The previous version of the read clipping operations wouldn't modify the reduced reads counts, so hardClipToRegion would result in a read with, say, 50 bp of sequence and base qualities but 250 bp of reduced read counts. Updated the hardClip operation to handle reduce reads, and added a unit test to make sure this works properly. Also had to update GATKSAMRecord.emptyRead() to set the reduced count to new byte[0] if the template read is a reduced read
-- Update md5s, where the new code recovers a TP variant with count 2 that was missed previously
Use case:
The default AF priors used (infinite sites model, neutral variation) is appropriate in the case where the reference allele is ancestral, and the called allele is a derived allele.
Most of the times this is true but in several population studies and in ancient DNA analyses this might introduce reference biases, and in some other cases it's hard to ascertain what the ancestral allele is (normally requiring to look up homologous chimp sequence).
Specifying no prior is one solution, but this may introduce a lot of artifactual het calls in shallower coverage regions.
With this option, users can specify what the prior for each AC should be according to their needs, subject to the restrictions documented in the code and in GATK docs.
-- Updated ancient DNA single sample calling script with filtering options and other cleanups.
-- Added integration test. Removed old -noPrior syntax.
-Do not throw an exception when parsing snpEff output files
generated by not-officially-supported versions of snpEff,
PROVIDED that snpEff was run with -o gatk
-Requested by the snpEff author
-Relevant integration tests updated/expanded
Note that this works only in the case of pileups (i.e. coming from UG);
allele-biased down-sampling for RR just cannot work for haplotypes.
Added lots of unit tests for new functionality.
-- The previous version was unclipping soft clipped bases, and these were sometimes adaptor sequences. If the two reads successfully merged, we'd lose all of the information necessary to remove the adaptor, producing a very high quality read that matched reference. Updated the code to first clip the adapter sequences from the incoming fragments
-- Update MD5s
-Acquire file locks in a background thread with a timeout of 30 seconds,
and throw a UserException if a lock acquisition call times out
* should solve the locking issue for most people provided they
RETRY failed farm jobs
* since we use NON-BLOCKING lock acquisition calls, any call that
takes longer than a second or two indicates a problem with the
underlying OS file lock support
* use daemon threads so that stuck lock acquisition tasks don't
prevent the JVM from exiting
-Disable both auto-index creation and file locking for integration tests
via a hidden GATK argument --disable_auto_index_creation_and_locking_when_reading_rods
* argument not safe for general use, since it allows reading from
an index file without first acquiring a lock
* this is fine for the test suite, since all index files already
exist for test files (or if they don't, they should!)
-Added missing indices for files in private/testdata
-Had to delete most of RMDTrackBuilderUnitTest, since it mostly tested auto-index
creation, which we can't test with locking disabled, but I replaced the deleted
tests with some tests of my own.
-Unit test for FSLockWithShared to test the timeout feature