gatk-3.8/protected/java/test/org/broadinstitute/sting/gatk/walkers
Valentin Ruano-Rubio 96073c3058 This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK.
The previous behavior is to process reads with N CIGAR operators as they are despite that many of the tools do not actually support such operator and results become unpredictible.

Now if the there is some read with the N operator, the engine returns a user exception. The error message indicates what is the problem (including the offending read and mapping position) and give a couple of alternatives that the user can take in order to move forward:

a) ask for those reads to be filtered out (with --filter_reads_with_N_cigar or -filterRNC)

b) keep them in as before (with -U ALLOW_N_CIGAR_READS or -U ALL)

Notice that (b) does not have any effect if (a) is enacted; i.e. filtering overrides ignoring.

Implementation:

* Added filterReadsWithMCigar argument to MalformedReadFilter with the corresponding changes in the code to get it to work.
* Added ALLOW_N_CIGAR_READS unsafe flag so that N cigar containing reads can be processed as they are if that is what the user wants.
* Added ReadFilterTest class commont parent for ReadFilter test cases.
* Refactor ReadGroupBlackListFilterUnitTest to extend ReadFilterTest and push up some functionality to that class.
* Modified MalformedReadFilterUnitTest to extend ReadFilterTest and to test the new filter functionality.
* Added AllowNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALLOW_N_CIGAR_READS flag is used.
* Added UnsafeNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALL flag is used.
* Updated a broken test case in UnifiedGenotyperIntegrationTest resulting from the new behavior.
* Updated EngineFeaturesIntegrationTest testdata to be compliant with new behavior
2013-06-10 10:44:42 -04:00
..
annotator Update expected test output for Java 7 2013-05-01 16:18:01 -04:00
beagle Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
bqsr Make BQSR calculateIsIndel robust to indel CIGARs are start/end of read 2013-05-31 13:58:37 -04:00
compression/reducereads Fix error in merging code in HC 2013-05-31 16:29:29 -04:00
diagnostics Update MD5s and the Diagnose Target scala script 2013-05-13 12:06:17 -04:00
diffengine Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
fasta Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec). 2013-04-05 00:52:32 -04:00
genotyper This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK. 2013-06-10 10:44:42 -04:00
haplotypecaller HaplotypeCaller now emits per-sample DP 2013-06-06 09:47:32 -04:00
indels Secondary alignments were not handled correctly in IndelRealigner 2013-05-06 19:09:10 -04:00
phasing Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
validation MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods 2013-03-29 14:52:10 -04:00
varianteval Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
variantrecalibration Update MD5s for VQSR header change 2013-04-16 11:45:45 -04:00
variantutils CombineVariants no longer adds PASS to unfiltered records 2013-05-20 16:53:51 -04:00