gatk-3.8/public/java/test/org/broadinstitute/sting/gatk
Valentin Ruano-Rubio 96073c3058 This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK.
The previous behavior was to process reads with N CIGAR operators as they are, even though many of the tools do not actually support that operator and the results become unpredictable.

Now, if any read contains the N operator, the engine throws a user exception. The error message describes the problem (including the offending read and its mapping position) and gives two alternatives that the user can take in order to move forward:

a) ask for those reads to be filtered out (with --filter_reads_with_N_cigar or -filterRNC)

b) keep them in as before (with -U ALLOW_N_CIGAR_READS or -U ALL)

Note that (b) has no effect if (a) is enabled; i.e. filtering takes precedence over letting the reads through.
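The precedence between the two options can be sketched as a small decision function. This is only an illustrative sketch, not the code from this commit: the class `NCigarPolicySketch`, the `Outcome` enum, and the method names are hypothetical, and the two boolean parameters simply stand in for "-filterRNC was given" and "-U ALLOW_N_CIGAR_READS or -U ALL was given". It covers the N-operator check only and ignores every other malformed-read condition.

```java
/** Hypothetical sketch of the N-CIGAR policy described above; not the GATK implementation. */
final class NCigarPolicySketch {

    /** What happens to a single read under the described policy. */
    enum Outcome { KEEP, FILTER_OUT, USER_ERROR }

    /** True if the CIGAR string contains the N (skipped reference region) operator. */
    static boolean hasNOperator(String cigar) {
        // In a SAM CIGAR string, letters occur only as operators, so a plain scan is enough.
        return cigar.indexOf('N') >= 0;
    }

    /**
     * @param cigar            CIGAR string of the read, e.g. "20M100N30M"
     * @param filterNCigar     assumed to mirror --filter_reads_with_N_cigar / -filterRNC
     * @param allowNCigarReads assumed to mirror -U ALLOW_N_CIGAR_READS or -U ALL
     */
    static Outcome decide(String cigar, boolean filterNCigar, boolean allowNCigarReads) {
        if (!hasNOperator(cigar)) {
            return Outcome.KEEP;        // nothing special about this read
        }
        if (filterNCigar) {
            return Outcome.FILTER_OUT;  // (a) wins even when (b) is also set
        }
        if (allowNCigarReads) {
            return Outcome.KEEP;        // (b): process the read as before
        }
        return Outcome.USER_ERROR;      // default: stop and report the offending read
    }

    public static void main(String[] args) {
        System.out.println(decide("50M", false, false));        // KEEP
        System.out.println(decide("20M100N30M", false, false)); // USER_ERROR
        System.out.println(decide("20M100N30M", false, true));  // KEEP
        System.out.println(decide("20M100N30M", true, true));   // FILTER_OUT
    }
}
```

Checking the filter flag before the unsafe flag is what makes (a) override (b) in this sketch.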

Implementation:

* Added filterReadsWithNCigar argument to MalformedReadFilter with the corresponding changes in the code to get it to work.
* Added ALLOW_N_CIGAR_READS unsafe flag so that N cigar containing reads can be processed as they are if that is what the user wants.
* Added ReadFilterTest class as a common parent for ReadFilter test cases.
* Refactored ReadGroupBlackListFilterUnitTest to extend ReadFilterTest and pushed up some functionality to that class.
* Modified MalformedReadFilterUnitTest to extend ReadFilterTest and to test the new filter functionality.
* Added AllowNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALLOW_N_CIGAR_READS flag is used.
* Added UnsafeNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALL flag is used.
* Updated a broken test case in UnifiedGenotyperIntegrationTest resulting from the new behavior.
* Updated EngineFeaturesIntegrationTest test data to comply with the new behavior.
2013-06-10 10:44:42 -04:00
..
datasources Require a minimum dcov value of 200 for Locus and ActiveRegion walkers when downsampling to coverage 2013-05-29 12:07:12 -04:00
downsampling Require a minimum dcov value of 200 for Locus and ActiveRegion walkers when downsampling to coverage 2013-05-29 12:07:12 -04:00
executive Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK. 2013-06-10 10:44:42 -04:00
iterators Refactor LIBS into utils.locusiterator before refactoring 2013-01-11 15:17:16 -05:00
refdata Detect stuck lock-acquisition calls, and disable file locking for tests 2013-04-24 22:49:02 -04:00
report Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
samples Trivial update to ceutrio.ped file to make it really the CEU trio sample names 2013-05-14 17:08:13 -04:00
traversals Count Reads should use a Long instead of an Integer for counts to prevent overflows. Added unit test. 2013-05-21 15:23:51 -04:00
walkers Fixes to get accurate read counts for Read traversals 2013-05-21 15:24:07 -04:00
CommandLineGATKUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
EngineFeaturesIntegrationTest.java This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK. 2013-06-10 10:44:42 -04:00
GenomeAnalysisEngineUnitTest.java Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine. 2013-03-06 12:38:59 -05:00
MaxRuntimeIntegrationTest.java Subshard timeouts in the GATK 2013-05-15 07:00:39 -04:00
ReadMetricsUnitTest.java Optimized counting of filtered records by filter. 2013-05-21 21:54:49 -04:00
WalkerManagerUnitTest.java Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00