Commit Graph

53 Commits (b1073fb17bc9a22eb58fa0db1a02998c791a39c8)

Author SHA1 Message Date
Valentin Ruano-Rubio 1f8282633b Removed plots generation from the BaseRecalibration software
Improved AnalyzeCovariates (AC) integration test.
Renamed AC test files ending with .grp to .table

Implementation:

* Removed RECAL_PDF/CSV_FILE from RecalibrationArgumentCollection (RAC). Updated rest of the code accordingly.
* Fixed BQSRIntegrationTest to work with new changes
2013-06-19 14:47:56 -04:00
Valentin Ruano-Rubio 96073c3058 This commit addresses JIRA issue GSA-948: Prevent users from doing the wrong thing with RNA-Seq data and the GATK.
The previous behavior is to process reads with N CIGAR operators as they are despite that many of the tools do not actually support such operator and results become unpredictible.

Now if the there is some read with the N operator, the engine returns a user exception. The error message indicates what is the problem (including the offending read and mapping position) and give a couple of alternatives that the user can take in order to move forward:

a) ask for those reads to be filtered out (with --filter_reads_with_N_cigar or -filterRNC)

b) keep them in as before (with -U ALLOW_N_CIGAR_READS or -U ALL)

Notice that (b) does not have any effect if (a) is enacted; i.e. filtering overrides ignoring.

Implementation:

* Added filterReadsWithMCigar argument to MalformedReadFilter with the corresponding changes in the code to get it to work.
* Added ALLOW_N_CIGAR_READS unsafe flag so that N cigar containing reads can be processed as they are if that is what the user wants.
* Added ReadFilterTest class commont parent for ReadFilter test cases.
* Refactor ReadGroupBlackListFilterUnitTest to extend ReadFilterTest and push up some functionality to that class.
* Modified MalformedReadFilterUnitTest to extend ReadFilterTest and to test the new filter functionality.
* Added AllowNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALLOW_N_CIGAR_READS flag is used.
* Added UnsafeNCigarMalformedReadFilterUnittest to check on the behavior when the unsafe ALL flag is used.
* Updated a broken test case in UnifiedGenotyperIntegrationTest resulting from the new behavior.
* Updated EngineFeaturesIntegrationTest testdata to be compliant with new behavior
2013-06-10 10:44:42 -04:00
David Roazen 4d56142163 Detect stuck lock-acquisition calls, and disable file locking for tests
-Acquire file locks in a background thread with a timeout of 30 seconds,
 and throw a UserException if a lock acquisition call times out

    * should solve the locking issue for most people provided they
      RETRY failed farm jobs

    * since we use NON-BLOCKING lock acquisition calls, any call that
      takes longer than a second or two indicates a problem with the
      underlying OS file lock support

    * use daemon threads so that stuck lock acquisition tasks don't
      prevent the JVM from exiting

-Disable both auto-index creation and file locking for integration tests
 via a hidden GATK argument --disable_auto_index_creation_and_locking_when_reading_rods

    * argument not safe for general use, since it allows reading from
      an index file without first acquiring a lock

    * this is fine for the test suite, since all index files already
      exist for test files (or if they don't, they should!)

-Added missing indices for files in private/testdata

-Had to delete most of RMDTrackBuilderUnitTest, since it mostly tested auto-index
 creation, which we can't test with locking disabled, but I replaced the deleted
 tests with some tests of my own.

-Unit test for FSLockWithShared to test the timeout feature
2013-04-24 22:49:02 -04:00
David Roazen 2eac97a76c Remove auto-creation of fai/dict files for fasta references
-A UserException is now thrown if either the fai or dict file for the
 reference does not exist, with pointers to instructions for creating
 these files.

-Gets rid of problematic file locking that was causing intermittent
 errors on our farm.

-Integration tests to verify that correct exceptions are thrown in
 the case of a missing fai / dict file.

GSA-866 #resolve
2013-04-02 18:34:08 -04:00
Geraldine Van der Auwera d70bf64737 Created new DeprecatedToolChecks class
--Based on existing code in GenomeAnalysisEngine
	--Hashmaps hold mapping of deprecated tool name to version number and recommended replacement (if any)
	--Using FastUtils for maps; specifically Object2ObjectMap but there could be a better type for Strings...
	--Added user exception for deprecated annotations
	--Added deprecation check to AnnotationInterfaceManager.validateAnnotations
	--Run when annotations are initialized
	--Made annotation sets instead of lists
2013-03-20 06:46:02 -04:00
Eric Banks 3759d9dd67 Added the functionality to impose a relative ordering on ReadTransformers in the GATK engine.
* ReadTransformers can say they must be first, must be last, or don't care.
  * By default, none of the existing ones care about ordering except BQSR (must be first).
    * This addresses a bug reported on the forum where BAQ is incorrectly applied before BQSR.
  * The engine now orders the read transformers up front before applying iterators.
  * The engine checks for enabled RTs that are not compatible (e.g. both must be first) and blows up (gracefully).
  * Added unit tests.
2013-03-06 12:38:59 -05:00
Eric Banks 2be57fbcfb Merged bug fix from Stable into Unstable 2013-03-05 13:28:46 -05:00
Eric Banks 5e89f01e10 Don't allow the use of compressed (.gz) references in the GATK. 2013-03-05 13:28:19 -05:00
Eric Banks 12fc198b80 Added better error message for BAMs with bad read groups.
* Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header.
  * Added integration tests to make sure that the correct error is thrown.
  * Resolved GSA-407.
2013-02-27 16:02:56 -05:00
Geraldine Van der Auwera 6208742f7c Refactored GATKDocs categories some more ( GSATDG-62 )
-- Renamed ValidatePileup to CheckPileup since validation is reserved word
-- Renamed AlignmentValidation to CheckAlignment (same as above)
-- Refactored category definitions to use constants defined in HelpConstants
-- Fixed a couple of minor typos and an example error
-- Reorganized the GATKDocs index template to use supercategories
-- Refactored integration tests for renamed walkers (my earlier refactoring had screwed them up or not carried over)
2013-02-13 16:49:18 -05:00
Geraldine Van der Auwera dff5ef562b Reorganized walker categories in GATKDocs (@DocumentedGATKFeature details)
-- Sorted out contents of BAM Processing vs. Diagnostics & QC Tools
-- Moved two validation-related walkers from Diagnostics & QC to Validation Utilities
-- Reworded some category names and descriptions to be more explicit and user-friendly
2013-02-12 13:36:15 -05:00
Mark DePristo 6382d5bdc9 Final cleanup and unit testing for GATKRunReport
-- Bringing code up to document, style, and code coverage specs
-- Move GATKRunReportUnitTest to private
-- Fully expand GATKRunReportUnitTests to coverage writing and reading GATKRunReport to local disk, to standard out, to AWS.
-- Move documentation URL from GATKRunReport to UserException
-- Delete a few unused files from s3GATKReport
-- Added capabilities to GATKRunReport to make testing easier
-- Added capabilities to deserialize GATKRunReports from an InputStream
2013-02-02 15:06:56 -05:00
Mauricio Carneiro 2a4ccfe6fd Updated all JAVA file licenses accordingly
GSATDG-5
2013-01-10 17:06:41 -05:00
Eric Banks 35d9bd377c Moved (nearly) all Walkers from public to protected and removed GATKLite utils 2013-01-07 14:42:40 -05:00
Eric Banks 0249e1f497 Resolving merge conflicts from VCF move 2013-01-06 14:32:31 -05:00
Eric Banks 8822b8e7c8 Moving HelpConstants out of HelpUtils so that we stop getting these ProgramElementDoc errors when com.sun.javadoc cannot load on a user's system. 2013-01-06 14:30:45 -05:00
David Roazen 07b369ca7e Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package
This is an intermediate commit so that there is a record of these changes in our
commit history. Next step is to isolate the test classes as well, and then move
the entire package to the Picard repository and replace it with a jar in our repo.

-Removed all dependencies on org.broadinstitute.sting (still need to do the test classes,
though)

-Had to split some of the utility classes into "GATK-specific" vs generic methods
(eg., GATKVCFUtils vs. VCFUtils)

-Placement of some methods and choice of exception classes to replace the StingExceptions
and UserExceptions may need to be tweaked until everyone is happy, but this can be
done after the move.
2012-12-19 10:25:22 -05:00
Randal Moore 8d2d0253a2 introduce a level of indirection for the forum URLs - this new function will allow me a place to morph the URL into something that is supported by Confluence
Signed-off-by: Eric Banks <ebanks@broadinstitute.org>
2012-12-03 22:33:02 -05:00
Eric Banks b6839b3049 Added checking in the GATK for mis-encoded quality scores.
The check is performed by a Read Transformer that samples (currently set to once
every 1000 reads so that we don't hurt overall GATK performance) from the input
reads and checks to make sure that none of the base quals is too high (> Q60). If
we encounter such a base then we fail with a User Error.

* Can be over-ridden with --allow_potentially_misencoded_quality_scores.
* Also, the user can choose to fix his quals on the fly (presumably using PrintReads
  to write out a fixed bam) with the --fix_misencoded_quality_scores argument.

Added unit tests.
2012-12-03 11:18:41 -05:00
Eric Banks e199562c25 I have pulled out all of the documentation URLs and put them into the HelpUtils class as static variables; this way, Appistry can change links as needed to point commercial users to their own internal forum without having to muck things up all over our source. Added some TODOs for Geraldine to update links in the GATK docs that still point to the old wiki. Sorry that I am pushing into stable, but that's what Appistry is pulling from for their release next week (and unstable has been failing forever). 2012-11-27 10:26:17 -05:00
Eric Banks be902375ac 'Bug' fix: fix the error message from the vcf validator so people realize that the file fails strict validation but still adheres to the spec. 2012-10-29 16:29:27 -04:00
Khalid Shakir fd59e7d5f6 Better error message when generic types are erased from scala collections. 2012-10-22 16:27:31 -04:00
Eric Banks 81532a0529 Missing file are user errors. 2012-10-12 09:48:12 -04:00
Eric Banks 85525d9e6e Make Geraldine's life easier: from now on we treat problems where a temp file cannot be found when running the GATK with multiple threads as User Errors (since they are 99.9% of the time). This is an extremely large class of errors in Tableau and on the forums. Helpful error message tells users exactly what we tell them on the forums anyways (Geraldine: feel free to edit). 2012-10-12 09:19:50 -04:00
Eric Banks 576c7280d9 Extensions to the ErrorThrowing framework for testing purposes 2012-09-06 22:03:18 -04:00
Christopher Hartl d795437202 - New UserExceptions added for when ReadFilters or Walkers specified on the command line are not found. When -rf xxxx cannot find the class corresponding to xxxx, all read filters are printed in a better formatted way, with links to their gatk docs.
- VariantAnnotatorEngine changed to call genotype annotations even if pilups and allele -> likelihood mappings are not present. Current genotype annotations altered to check for null pilupes and null mappings.
2012-09-04 16:41:44 -04:00
Mark DePristo be0f8beebb Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in.
-- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference.  If you do, the file can be .gz.  If not, the GATK will fail on creating the FAI and DICT files.  Added an error message that handles this case and clearly says what to do.
2012-08-17 11:49:02 -04:00
David Roazen a7811d673f Update URL for phone home / GATK key documentation output by the GATK upon error 2012-08-08 09:29:54 -04:00
Eric Banks 56f8afab97 Requested by Geraldine: adding a utility to register deprecated walkers (and the major version of the first release since they were removed) so that the User Error printed out for e.g. CountCovariates now states: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0. 2012-08-01 09:50:00 -04:00
Eric Banks 357e0b35af Register GATK-full-only walkers and rethrow the missing walker error as a not supported in GATK lite error 2012-07-25 14:11:03 -04:00
Eric Banks 62c5228048 1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method. 2012-07-17 12:23:40 -04:00
Eric Banks 40618ac471 A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots). 2012-07-17 10:52:43 -04:00
Khalid Shakir c4f7df4dce When an underlying exception occurs because of the user error, if the exception instance does not include a message instead of telling the user "because null", tell them "because <exception class name>". 2012-05-30 16:39:06 -04:00
David Roazen c0084c741b Pilot BCF2 Implementation: Checkpointing the code
* Not working yet, still very much a work-in-progress with lots of placeholders
* Needed to check this in to enable possible collaboration, since it's
  going slower than anticipated and the conference deadline looms.
2012-05-01 12:23:10 -04:00
Eric Banks 4edb005411 Catch poorly formatted PL/GL fields 2012-04-23 09:33:50 -04:00
Eric Banks f6aa95685d OutOfMemory exceptions are User Errors 2012-04-02 22:46:56 -04:00
David Roazen 91d10431d3 BAMScheduler: detect contigs from the interval list that are not in the merged BAM header's sequence dictionary
This is a quick-and-dirty patch for the null pointer error Mauricio reported earlier.

Later on we might want to address in a more general way the fact that we validate user intervals
against the reference but not against the merged BAM header produced by the engine at runtime.
2012-03-09 15:20:16 -05:00
David Roazen 0702ee1587 Public-key authorization scheme to restrict use of NO_ET
-Running the GATK with the -et NO_ET or -et STDOUT options now
 requires a key issued by us. Our reasons for doing this, and the
 procedure for our users to request keys, are documented here:
 http://www.broadinstitute.org/gsa/wiki/index.php/Phone_home

-A GATK user key is an email address plus a cryptographic signature
 signed using our private key, all wrapped in a GZIP container.
 User keys are validated using the public key we now distribute with
 the GATK. Our private key is kept in a secure location.

-Keys are cryptographically secure in that valid keys definitely
 came from us and keys cannot be fabricated, however keys are not
 "copy-protected" in any way.

-Includes private, standalone utilities to create a new GATK user key
 (GenerateGATKUserKey) and to create a new master public/private key
 pair (GenerateKeyPair). Usage of these tools will be documented on
 the internal wiki shortly.

-Comprehensive unit/integration tests, including tests to ensure the
 continued integrity of the GATK master public/private key pair.

-Generation of new user keys and the new unit/integration tests both
 require access to the GATK private key, which can only be read by
 members of the group "gsagit".
2012-03-06 00:09:43 -05:00
Mark DePristo 0b29d54937 Changed most BAMSchedule ReviewedStingExceptions to UserExceptions
-- As these represent the bulk of the StingExceptions coming from BAMSchedule and are caused by simple problems like the user providing bad input tmp directories, etc.
2012-02-27 17:08:41 -05:00
Eric Banks 09a5a9eac0 Don't update lineNo for decodeLoc - only for decode (otherwise they get double-counted). Even still, because of the way the GATK currently utilizes Tribble we can parse the same line multiple times, which knocks the line counter out of sync. For now, I've added a TODO in the code to remind us and the error messages note that it's an approximate line number. 2011-12-14 10:43:52 -05:00
Eric Banks 5821c11fad For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead. 2011-11-22 10:50:22 -05:00
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Mark DePristo 3550798c4c Merged bug fix from Stable into Unstable 2011-10-17 13:58:56 -04:00
Mark DePristo 4108a294f7 Better error message when a RodBinding file doesn't exist 2011-10-17 13:58:46 -04:00
Mark DePristo 73f9d1f217 GATK read group requirement iron hand
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formated versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo 84160bd83f Reorganization of Sample
-- Moved Gender and Afflication to separate public enums
-- PedReader 90% implemented
-- Improve interface cleanup to XReadLines and UserException
2011-09-30 15:50:54 -04:00
Eric Banks 8f8b59a932 My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs. 2011-09-21 22:23:28 -04:00
Menachem Fromer 7ed120361d Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles 2011-08-12 12:23:44 -04:00
Mark DePristo 8f696c7731 Continuing progress towards RodBinding 1.0
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
2011-08-03 17:19:28 -04:00
Mark DePristo e262f4e10b gatkdoc now generalized to use @Annotation. Multiple subsystems now use annotation to receive docs
Index expanded to use summary() annotation field
UserExceptions, ReadFilters, GATK engine all use the system to generate docs
Doclet expanded to handle lots of new cases
2011-07-23 20:00:35 -04:00