gatk-3.8/protected/java/test/org/broadinstitute/sting/gatk/walkers
Ami Levy-Moonshine 6da53aea09 Write a new tool for splitting reads whose cigar string contains N elements.
For example, this tool can be used for processing bowtie RNA-seq data.
Each read with k N-cigar elements is split into k+1 reads. The split is done by hard clipping the rest of the bases.
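
As an illustration of the splitting rule described above, here is a minimal sketch in Python. It is not the GATK implementation: the function name `split_on_n`, the tuple layout, and the assumption that the input cigar contains no hard clips are all hypothetical choices made for this example.

```python
import re

# Hypothetical helper for illustration only; not part of GATK.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
READ_CONSUMING = set("MIS=X")   # operators that consume read bases
REF_CONSUMING = set("MDN=X")    # operators that consume reference bases


def split_on_n(ref_start, cigar, bases):
    """Split a read with k N elements into k+1 (ref_start, cigar, bases) tuples.

    Assumes the input cigar has no hard clips. Bases outside each section
    are represented as hard clips (H) in that section's cigar.
    """
    elements = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    sections = []                      # (section ref start, read offset, elements)
    current = []
    ref_pos, read_pos = ref_start, 0
    sec_ref, sec_read = ref_pos, read_pos
    for length, op in elements:
        if op == "N":
            # Close the current section and skip the N on the reference.
            sections.append((sec_ref, sec_read, current))
            current = []
            ref_pos += length
            sec_ref, sec_read = ref_pos, read_pos
            continue
        current.append((length, op))
        if op in READ_CONSUMING:
            read_pos += length
        if op in REF_CONSUMING:
            ref_pos += length
    sections.append((sec_ref, sec_read, current))

    total = read_pos                   # total number of read bases
    result = []
    for sec_ref, sec_read, elems in sections:
        n_read = sum(l for l, op in elems if op in READ_CONSUMING)
        left = sec_read                # bases hard clipped before the section
        right = total - sec_read - n_read  # bases hard clipped after it
        new_cigar = (
            (f"{left}H" if left else "")
            + "".join(f"{l}{op}" for l, op in elems)
            + (f"{right}H" if right else "")
        )
        result.append((sec_ref, new_cigar, bases[sec_read:sec_read + n_read]))
    return result


parts = split_on_n(100, "3M2N4M", "ACGTACG")
```

With one N element (k = 1) the sketch yields two reads: the first keeps `3M` and hard clips the trailing four bases, the second hard clips the leading three bases and starts after the skipped reference region.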

To do this, a few changes were made to other clipping methods:
- make a significant change in ClippingOp.hardClip() that prevents the splitting of a read with cigar: 1M2I1N1M3I.
- change getReadCoordinateForReferenceCoordinate in ReadUtils to recognize Ns

create unit tests for that walker:
- change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication
- move some useful methods from ReadClipperTestUtils to CigarUtils

create an integration test for that class

small change in a comment in FullProcessingPipeline

last commit:

Address review comments:
- move to protected under walkers/rnaseq
- change the read splitting methods to be more readable and more efficient
- make minor changes to some methods in ReadClipper to allow the changes in split reads
- add one method (a minor change) to CigarUtils to allow the changes in split reads
- change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar
- address the rest of the review comments (minor changes)

- fix ReadUtilsUnitTest.testReadWithNs according to the default behaviour of getReadCoordinateForReferenceCoordinate (when the reference index falls into a deletion, return the read index of the base before the deletion).
- add another test to ReadUtilsUnitTest.testReadWithNs
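
The default behaviour noted above (a reference index inside a deletion maps to the read base before the deletion) can be sketched as follows. This is a Python illustration under stated assumptions, not the ReadUtils code: the function name `read_index_for_ref` is hypothetical, indices are 0-based, and treating N the same way as D is an assumption made for this example.

```python
import re

# Hypothetical helper for illustration only; not the GATK ReadUtils method.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_index_for_ref(ref_start, cigar, ref_coord):
    """Return the 0-based read index aligned to ref_coord.

    If ref_coord falls inside a deletion (D), return the index of the read
    base just before it. Treating N like D is an assumption of this sketch.
    """
    read_pos = 0
    ref_pos = ref_start
    for count, op in CIGAR_RE.findall(cigar):
        length = int(count)
        if op in "M=X":
            if ref_pos <= ref_coord < ref_pos + length:
                return read_pos + (ref_coord - ref_pos)
            read_pos += length
            ref_pos += length
        elif op in "DN":
            if ref_pos <= ref_coord < ref_pos + length:
                if read_pos == 0:
                    raise ValueError("no read base precedes the gap")
                return read_pos - 1
            ref_pos += length
        elif op in "IS":
            # Insertions and soft clips consume read bases only.
            read_pos += length
        # H and P consume neither read nor reference bases.
    raise ValueError("reference coordinate is outside the alignment")


inside_deletion = read_index_for_ref(100, "3M2D4M", 103)
```

Here the reference positions 103 and 104 are deleted from the read, so both map to read index 2, the last base before the deletion.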

- Allow the user to print the split positions (not working properly currently)
2014-01-01 22:21:36 -05:00
annotator Improvements to the reference model pipeline. 2013-11-01 17:58:25 -04:00
beagle Simpler FILTER and info field encoding for BeagleOutputToVCF 2013-06-14 15:56:13 -04:00
bqsr Removed plots generation from the BaseRecalibration software 2013-06-19 14:47:56 -04:00
compression/reducereads Bug fix for RR: stop (incorrectly) pulling the MQ out of the SAMRecord as a byte instead of an int. 2013-11-27 18:55:03 -05:00
diagnostics Add GC Content to DiagnoseTargets 2013-12-03 23:04:40 -05:00
diffengine Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
fasta Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec). 2013-04-05 00:52:32 -04:00
genotyper Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads. 2013-11-27 13:54:39 -05:00
haplotypecaller Fixed issue > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio 2013-12-13 11:19:57 -05:00
indels Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads. 2013-11-27 13:54:39 -05:00
phasing Fixed bug in PhaseByTransmission where it was completely dropping multi-allelic records. 2013-08-21 15:46:57 -04:00
rnaseq Write a new tool for splitting reads whose cigar string contains N elements. 2014-01-01 22:21:36 -05:00
validation MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods 2013-03-29 14:52:10 -04:00
varianteval adding a check for the UNAVAILABLE case of GenotypeType in CountVariants 2013-08-29 17:27:00 -04:00
variantrecalibration Various VQSR optimizations in both runtime and accuracy. 2013-11-29 13:04:46 -05:00
variantutils Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline. 2013-12-31 12:07:56 -05:00