gatk-3.8/public/java/test/org/broadinstitute/sting/utils
Ami Levy-Moonshine 6da53aea09 Write a new tool for splitting reads that have N elements in their cigar string.
For example, this tool can be used for processing bowtie RNA-seq data.
Each read with k N-cigar elements is split into k+1 reads. The split is done by hard clipping the rest of the bases.
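
The split described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the class name NCigarSplitSketch and the method split are hypothetical, the input cigar is assumed to contain no hard clips of its own, and the alignment start of each piece is not tracked. Each output cigar keeps one segment between N elements and hard clips (H) the read bases that belong to the other segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the GATK implementation.
public class NCigarSplitSketch {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** True for cigar operators that consume read bases. */
    private static boolean consumesReadBases(char op) {
        return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
    }

    /**
     * Splits a cigar with k N elements into k+1 cigars.
     * Assumption: the input has no H elements and no empty segments.
     */
    static List<String> split(String cigar) {
        List<StringBuilder> segments = new ArrayList<>();
        List<Integer> segmentBases = new ArrayList<>();
        segments.add(new StringBuilder());
        segmentBases.add(0);
        int totalBases = 0;

        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'N') {
                // An N element closes the current segment and opens a new one.
                segments.add(new StringBuilder());
                segmentBases.add(0);
                continue;
            }
            int last = segments.size() - 1;
            segments.get(last).append(len).append(op);
            if (consumesReadBases(op)) {
                segmentBases.set(last, segmentBases.get(last) + len);
                totalBases += len;
            }
        }

        List<String> result = new ArrayList<>();
        int before = 0; // read bases belonging to earlier segments
        for (int i = 0; i < segments.size(); i++) {
            int own = segmentBases.get(i);
            int after = totalBases - before - own;
            StringBuilder sb = new StringBuilder();
            if (before > 0) sb.append(before).append('H');
            sb.append(segments.get(i));
            if (after > 0) sb.append(after).append('H');
            result.add(sb.toString());
            before += own;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("3M5N4M"));     // one N element, two pieces
        System.out.println(split("2M1N3M2N1M")); // two N elements, three pieces
    }
}
```

For example, under these assumptions a read with cigar 3M5N4M (k = 1) yields two pieces, 3M4H and 3H4M.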

To do this, a few changes were made to some other clipping methods:
- make a significant change in ClippingOp.hardClip() that prevents the splitting of a read with cigar: 1M2I1N1M3I.
- change getReadCoordinateForReferenceCoordinate in ReadUtils to recognize Ns

create unit tests for that walker:
- change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication
- move some useful methods from ReadClipperTestUtils to CigarUtils

create integration test for that class

small change in a comment in FullProcessingPipeline

last commit:

Address review comments:
- move to protected under walkers/rnaseq
- change the read splitting methods to be more readable and more efficient
- change (minor changes) some methods in ReadClipper to allow the changes in split reads
- add (minor change) one method to CigarUtils to allow the changes in split reads
- change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar
- address the rest of the review comments (minor changes)

- fix ReadUtilsUnitTest.testReadWithNs according to the default behaviour of getReadCoordinateForReferenceCoordinate (when the reference index falls into a deletion, return the read index of the base before the deletion).
- add another test to ReadUtilsUnitTest.testReadWithNs
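
The default behaviour mentioned above can be illustrated with a small sketch. It is not the ReadUtils code: the class ReadCoordinateSketch and the method readIndexFor are hypothetical, and treating an N element the same way as a deletion is an assumption made for this example. The method walks the cigar and returns the 0-based read index for a 1-based reference position, or -1 when the position lies outside the alignment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the GATK implementation.
public class ReadCoordinateSketch {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /**
     * Maps a reference position to a read index.
     * Assumption: a position inside a D or N element maps to the read base
     * just before that element.
     */
    static int readIndexFor(int alignmentStart, String cigar, int refPos) {
        int ref = alignmentStart; // next reference position to consume
        int read = 0;             // next read index to consume
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X':
                    if (refPos >= ref && refPos < ref + len) {
                        return read + (refPos - ref);
                    }
                    ref += len;
                    read += len;
                    break;
                case 'D': case 'N':
                    if (refPos >= ref && refPos < ref + len) {
                        return read - 1; // base before the gap
                    }
                    ref += len;
                    break;
                case 'I': case 'S':
                    read += len; // consumes read bases only
                    break;
                default:
                    break; // H and P consume neither read nor reference
            }
        }
        return -1; // position not covered by the alignment
    }

    public static void main(String[] args) {
        // Read aligned at reference position 10 with cigar 3M2N4M.
        System.out.println(readIndexFor(10, "3M2N4M", 13)); // inside the N element
        System.out.println(readIndexFor(10, "3M2N4M", 15)); // first base after the N element
    }
}
```

With an alignment start of 10 and cigar 3M2N4M, reference positions 13 and 14 fall inside the N element and map to read index 2, while position 15 maps to read index 3.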

- Allow the user to print the split positions (not working properly currently)
2014-01-01 22:21:36 -05:00
..
R Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
activeregion Adaptations to accommodate Tribble API changes, comprising mostly of the following. 2013-08-19 15:52:47 -04:00
baq Fixing BQSR/BAQ bug: 2013-01-31 11:03:17 -05:00
classloader Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements 2013-05-10 19:00:33 -04:00
clipping Write a new tool for splitting reads that have N elements in their cigar string. 2014-01-01 22:21:36 -05:00
codecs/hapmap Adaptations to accommodate Tribble API changes, comprising mostly of the following. 2013-08-19 15:52:47 -04:00
collections Fixing license on Yossi's file 2013-02-05 11:14:43 -05:00
crypt Update expected test output for Java 7 2013-05-01 16:18:01 -04:00
fasta Move BaseUtils back to the GATK by request, along with associated utility methods 2013-01-30 13:09:44 -05:00
file Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
fragments A whole slew of improvements to the Haplotype Caller and related code. 2013-07-12 10:09:10 -04:00
haplotype Major improvements to HC that trims down active regions before genotyping 2013-04-08 12:47:49 -04:00
interval Intervals: fix bug where we could fail to find the intersection of unsorted/missorted interval lists 2013-04-02 14:01:52 -04:00
io Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
locusiterator Working version of HaplotypeCaller ReferenceConfidenceModel that accounts for indels as well as SNP confidences 2013-07-02 15:46:38 -04:00
nanoScheduler Further tweaking of test timeouts 2013-03-15 14:49:21 -04:00
pileup Fixing ReadBackedPileup to represent mapping qualities as ints, not (signed) bytes. 2013-07-23 23:47:15 -04:00
progressmeter Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
recalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
runtime Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
sam Write a new tool for splitting reads that have N elements in their cigar string. 2014-01-01 22:21:36 -05:00
smithwaterman New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00
text Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
threading Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variant Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline. 2013-12-31 12:07:56 -05:00
AutoFormattingTimeUnitTest.java AutoFormattingTimeUnitTest should be in utils 2013-01-30 09:47:47 -05:00
BaseUtilsUnitTest.java More aggressive checking of AWS key quality upon startup in the GATK 2013-01-31 09:08:38 -05:00
BitSetUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
GenomeLocParserBenchmark.java Optimize GenomeLocParser.createGenomeLoc 2013-01-30 09:47:47 -05:00
GenomeLocParserUnitTest.java Adding Graph-based likelihood ratio calculation to HC 2013-12-02 19:37:19 -05:00
GenomeLocSortedSetUnitTest.java Fixed the add functionality of GenomeLocSortedSet. 2013-02-28 23:31:00 -05:00
GenomeLocUnitTest.java Added distance across contigs calculation to GenomeLocs 2013-02-07 16:31:41 -05:00
MRUCachingSAMSequencingDictionaryUnitTest.java Refactoring and unit testing GenomeLocParser 2013-01-30 09:47:47 -05:00
MWUnitTest.java Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
MathUtilsUnitTest.java Introducing the latest-and-greatest in genotyping: CalculatePosteriors. 2013-11-27 13:00:45 -05:00
MedianUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
NGSPlatformUnitTest.java Expand NGSPlatform to meet SAM 1.4 spec, with full unit tests 2013-02-09 11:16:21 -05:00
PathUtilsUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
QualityUtilsUnitTest.java Final edge case bug fixes to QualityUtil routines 2013-02-16 07:31:38 -08:00
SequenceDictionaryUtilsUnitTest.java Sequence dictionary validation: detect problematic contig indexing differences 2013-02-25 11:14:22 -05:00
SimpleTimerUnitTest.java Fix tests that were consistently or intermittently failing when run in parallel on the farm 2013-03-06 13:56:54 -05:00
UtilsUnitTest.java New faster Smith-Waterman implementation that is edge greedy and assumes that ref and haplotype have same global start/end points. 2013-05-13 09:36:39 -04:00