270 Commits (0a56fe5bc33f9dfd40e25005f2e037bc36e9ffdc)

Author SHA1 Message Date
Eric Banks eccb76c304 Only run UG in the bundle for chr20 2012-10-30 15:09:46 -04:00
Eric Banks 8a402024c2 Updating bundle script to handle new naming convention of CEU trio best practices callset 2012-10-30 09:11:56 -04:00
Ryan Poplin 5ee2feb2a3 updating pipeline test md5s 2012-10-29 18:53:27 -04:00
Ami Levy Moonshine dde3060bb8 add the CEUtrio best practices results (UG + PBT) to the bundle 2012-10-25 15:36:17 -04:00
Khalid Shakir fd59e7d5f6 Better error message when generic types are erased from scala collections. 2012-10-22 16:27:31 -04:00
Khalid Shakir 2ef456d51a Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since scala seems to change the reflected type to Option[Object] on some systems.
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Khalid Shakir 403654d40a Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
kshakir 0196dbeaca Added more logging to push/pull of RemoteFiles. 2012-10-17 09:52:17 -04:00
kshakir f93b279151 Moved the class field caching from QScript to a ClassFieldCache utility.
Using ClassFieldCache to pull values from QScript for passing to done() method of QStatusMessenger.
2012-10-16 18:49:31 -04:00
kshakir c4ee31075c Fixed package error and a few deprecated scala warnings. 2012-10-15 15:29:40 -04:00
kshakir 213cc00abe Refactored argument matching to support other plugins in addition to file lists.
Added plugin support for sending Queue status messages.
Argument parsing can store subclasses of java.io.File, for example RemoteFile.
2012-10-15 15:10:45 -04:00
Kristian Cibulskis dad7ca281e upgraded mutation caller with VCF output
raw indel calls (non filtered,non vcf)
2012-10-15 13:49:08 -04:00
Guillermo del Angel 22b79fb4dd Resolve [DEV-7]: add single-sample VCF calling at end of FASTQ-BAM pipeline. Initial steps of [DEV-4]: queue extensions for Picard QC metrics 2012-10-15 13:49:08 -04:00
Kristian Cibulskis 658f355171 initial cancer pipeline with mutations and partial indel support 2012-10-15 13:49:07 -04:00
Mauricio Carneiro 322ea1262c First implementation of a generic 'bundled' Data Processing Pipeline for germline and cancer.
not ready for prime time yet!
2012-10-15 13:49:06 -04:00
Mauricio Carneiro f1fb51b222 Reverting the DPP to the original version, going to create a new simplified version for CMI in private. 2012-10-15 13:49:06 -04:00
Mauricio Carneiro 429c96e723 Generic input file name recognition (still need to implement support for FastQ, but it can now at least accept it) 2012-10-15 13:49:06 -04:00
Khalid Shakir f66284658d RetryMemoryLimit now works with Scatter/Gather. 2012-10-09 21:51:03 -04:00
Johan Dahlberg e9b9e2318c Fixed SortSam bug, for .done file
The *.bai.done file for the .bai file was written in the run directory instead of in the specified output directory.
Changing getName() to getAbsolutePath() fixes this.

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-10-09 16:25:18 -04:00
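The getName() versus getAbsolutePath() difference behind the SortSam fix above can be illustrated with plain java.io.File calls. This is only a minimal sketch: the class, the two helper methods, the ".done" suffix handling, and the example path are hypothetical stand-ins, not the actual Queue code.

```java
import java.io.File;

public class DoneFilePath {
    // Hypothetical helper: builds the marker from the bare file name, so the
    // result is a relative path that resolves against the working directory.
    static File doneFileFromName(File output) {
        return new File(output.getName() + ".done");
    }

    // Hypothetical helper: builds the marker from the absolute path, so the
    // marker stays in the same directory as the output file.
    static File doneFileFromAbsolutePath(File output) {
        return new File(output.getAbsolutePath() + ".done");
    }

    public static void main(String[] args) {
        // Example path chosen for illustration only.
        File bai = new File("/data/out/sample.bai");
        System.out.println(doneFileFromName(bai).getPath());
        System.out.println(doneFileFromAbsolutePath(bai).getPath());
    }
}
```

Because getName() drops the directory component, a marker built from it has no parent and lands wherever the process happens to be running, which matches the symptom described in the commit message.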
Mauricio Carneiro 9a8f53e76c Probably the GATK's most seen typo in the world 2012-10-02 13:34:37 -04:00
David Roazen 3f44b3e019 Update DataProcessingPipelineTest MD5s 2012-09-24 15:38:07 -04:00
Eric Banks 277ba94c7b Update from dbsnp135 to dbsnp137. 2012-08-31 14:06:29 -04:00
Eric Banks 5ea7cd6dcc Updating resource bundle: no reason to include both genotype and sites files for Omni and HM3, sites are enough. Also, don't include duplicate entry for the Mills indels. 2012-08-31 14:01:54 -04:00
Khalid Shakir 2d1ea7124b One less Queue command line requirement: -tempDir now defaults to .queue/tmp.
Also moved queueScatterGather to .queue/scatterGather.
2012-08-27 12:04:50 -04:00
Mark DePristo 9eec33ec3b Complete GSA-497: Let Queue write out runInfo on the fly, after each job group finishes running
-- Queue will now incrementally write out its jobReport.txt file whenever jobs finish running (FAIL or DONE)
-- This makes it far easier to track what's going on, or to incrementally analyze performance results coming out of Queue
-- Generally cleaned up the QJobsReporting code, creating a new clean class QJobsReporter that holds all of the information on what to log and where to put it, which was previously scattered across QCommandLine and QJobReport
2012-08-21 14:44:18 -04:00
Khalid Shakir 3514fb6e66 Changed the default memory limit from none to 2GB upon suggestions from delangel, carneiro, and depristo. 2012-08-20 21:41:13 -04:00
Mark DePristo 67ebd65512 Bugfix for potential SEGFAULT with JNA getting execution hosts for LSF with multiple hosts 2012-08-17 11:49:01 -04:00
Khalid Shakir 22b4466cf5 Added setupRetry() to modify jobs when Queue is run with '-retry' and jobs are about to restart after an error.
Implemented a mixin called "RetryMemoryLimit" which will by default double the memory.
GridEngine memory request parameter can be selected on the command line via '-resMemReqParam mem_free' or '-resMemReqParam virtual_free'.
Java optimizations now enabled by default:
- Only 4 GC threads instead of each job using java's default O(number of cores) GC threads. Previously, on a machine with N cores running N jobs, java allocated N GC threads per job by default, so the machine could use up to N^2 threads if all jobs were in heavy GC (thanks elauzier).
- Exit if GC spends more than 50% of time in GC (thanks ktibbett).
- Exit if GC reclaims less than 10% of max heap (thanks ktibbett).
Added a -noGCOpt command line option to disable new java optimizations.
2012-08-13 15:43:05 -04:00
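The thread arithmetic in the commit above, and one plausible way to express its three GC defaults, can be sketched as follows. The class and method names are hypothetical, and the mapping to the HotSpot options -XX:ParallelGCThreads, -XX:GCTimeLimit and -XX:GCHeapFreeLimit is an assumption based on the commit's wording, not a copy of the Queue source.

```java
import java.util.ArrayList;
import java.util.List;

public class GcOptionsSketch {
    // Worst case described in the commit: every job is in heavy GC at once,
    // so the machine runs (jobs x GC threads per job) collector threads.
    static long worstCaseGcThreads(int jobs, int gcThreadsPerJob) {
        return (long) jobs * gcThreadsPerJob;
    }

    // Assumed flag mapping for the three defaults; the noGcOpt parameter
    // mirrors the -noGCOpt switch by returning no extra options.
    static List<String> gcOptions(boolean noGcOpt) {
        List<String> options = new ArrayList<>();
        if (!noGcOpt) {
            options.add("-XX:ParallelGCThreads=4"); // cap GC threads per job
            options.add("-XX:GCTimeLimit=50");      // more than 50% of time in GC
            options.add("-XX:GCHeapFreeLimit=10");  // less than 10% of heap reclaimed
        }
        return options;
    }

    public static void main(String[] args) {
        int cores = 16; // illustrative machine size
        System.out.println(worstCaseGcThreads(cores, cores)); // N jobs x N threads
        System.out.println(worstCaseGcThreads(cores, 4));     // N jobs x 4 threads
        System.out.println(String.join(" ", gcOptions(false)));
    }
}
```

On a hypothetical 16-core machine running 16 jobs, the uncapped worst case is 256 GC threads, while the 4-thread cap limits it to 64.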
Eric Banks 0381fd7c83 Hmm, I thought I used the right md5s last time. Let's try again. 2012-08-02 11:25:10 -04:00
Eric Banks 05bf6e3726 Updating md5s in pipeline tests so that they finally pass 2012-08-01 10:27:00 -04:00
Eric Banks 7cf4b63d76 Disabling indel quals in BaseRecalibrator as it should be, not PrintReads. 2012-08-01 09:23:04 -04:00
Eric Banks 675ccab2fa Renaming BQSR to BaseRecalibrator 2012-07-23 10:17:17 -04:00
Mauricio Carneiro d446d34227 GATK Error messages now point to the new website instead of GetSatisfaction. 2012-07-20 17:27:11 -04:00
Eric Banks a9f27e5b02 Updated md5s for DPP test 2012-07-17 21:54:46 -04:00
Eric Banks 4e3780fd4f Updated md5 for PBPP 2012-07-17 15:47:43 -04:00
Eric Banks 863eb5b5c0 Use Context not Dinuc covariate 2012-07-17 15:18:11 -04:00
Eric Banks 17d627b86d Update the DPP and PBPP to use the BQSRv2 walkers 2012-07-17 13:15:32 -04:00
Joel Thibault 9ee58d323a Pass the original GATK unsafe parameter to the VcfGatherFunction 2012-07-02 16:03:11 -04:00
Khalid Shakir 746a5e95f3 Refactored parsing of Rod/IntervalBinding. Queue S/G now uses all interval arguments passed to CommandLineGATK QFunctions including support for BED/tribble types, XL, ISR, and padding.
Updated HSP to use new padding arguments instead of flank intervals file, plus latest QC evals.
IntervalUtils returns unmodifiable lists so that utilities don't mutate the collections.
Added a JavaCommandLineFunction.javaGCThreads option to test reducing java's automatic GC thread allocation based on num cpus.
Added comma to list of characters to convert to underscores in GridEngine job names so that GE JSV doesn't choke on the -N values.
JobRunInfo handles the null done times when jobs crash with strange errors.
2012-06-27 01:15:22 -04:00
Mauricio Carneiro bbd46690e6 fixing conflicts 2012-06-26 17:12:24 -04:00
Mauricio Carneiro 91f02dfd85 fixing pipeline tests (sorry, my bad) 2012-06-26 17:10:58 -04:00
Mauricio Carneiro 9346c5b37a Merged bug fix from Stable into Unstable 2012-06-26 14:55:41 -04:00
Mauricio Carneiro 334d66f2b1 Updating validation parameter in the DPP
Users were very confused by the failing validation of their 'unpicarded' bam files. Changed the default to OFF and added an option to turn it on.
2012-06-26 14:54:37 -04:00
Mark DePristo 567dba0f76 Cleanup of VCF header lines and constants, BCF2 bugfixes
-- Created public static UnifiedGenotyper.getHeaderInfo that loads UG standard header lines, and use this in tools like PoolCaller
-- Created VCFStandardHeaderLines class that keeps standard header lines in the GATK in a single place.  Provides convenient methods to add these to a header, as well as functionality to repair standard lines in incoming VCF headers
-- VCF parsers now automatically repair standard VCF header lines when reading the header
-- Updating integration tests to reflect header changes
-- Created private and public testdata directories (public/testdata and private/testdata).  Updated tests to use test
-- SelectHeaders now always updates the header to include the contig lines
-- SelectVariants add UG header lines when in regenotype mode
-- Renamed PHRED_GENOTYPE_LIKELIHOODS_KEY to GENOTYPE_PL_KEY
-- Bugfix in BCF2 to handle lists of null elements (can happen in genotype field values from VCFs)
-- Throw error when VCF has unbounded non-flag values that don't have = value bindings
-- By default we no longer allow writing of BCF2 files without contig lines in the header
2012-06-21 15:16:31 -04:00
Mark DePristo 982192e2e4 MD5DB for integrationtest management now writes out an md5mismatches file for clean analysis
-- This file is in integrationtests/md5mismatches.txt, and looks like:

expected        observed        test
7fd0d0c2d1af3b16378339c181e40611        2339d841d3c3c7233ebba9a6ace895fd        test BeagleOutputToVCF
43865f3f0d975ee2c5912b31393842f8        1b9c4734274edd3142a05033e520beac        testBeagleChangesSitesToRef
daead9bfab1a5df72c5e3a239366118e        27be14f9fc951c4e714b4540b045c2df        testDiffObjects:master=/local/dev/depristo/itest/public/testdata/diffTestMaster.vcf,test=/local/dev/depristo/itest/public/testdata/diffTestTest.vcf,md5=daead9bfab1a5df72c5e3a239366118e

-- Associated cleanup with making md5db an instantiated object, rather than a bunch of static methods
2012-06-14 16:42:27 -04:00
Mark DePristo 96dbd8df63 Fix a nasty script bug in Queue
-- If you are using user-defined configurations (configureJobFeatures) and you didn't override the analysisName of your jobs, and there were other jobs using the same name, then you got very strange errors at the end of your script.  For example, in my script I was using SelectVariants to prepare VCF files, and SelectVariants to generate a useful performance table.  Since I forgot to make a special analysisName for my table commands, the generic SV commands were being included in the analysis group, and these were throwing an error since the special features added for the table weren't added to those SV commands
2012-06-14 16:42:26 -04:00
Mark DePristo f77d2e6965 Renamed NO_HEADER to the more accurate no_cmdline_in_header
-- Also no_cmdline_in_header permits us to write contigs into the header, so that the shadow BCF system can work as well
2012-05-24 10:57:08 -04:00
Ryan Poplin c3fb321014 Minor updates to pacbio data processing script to make it work with the latest bwa version/settings. 2012-05-22 10:24:45 -04:00
Eric Banks 03d40272c8 Removed old GATKReport code and moved the new stuff in its place. 2012-05-18 01:44:31 -04:00
Eric Banks a26b04ba17 Extensive refactoring of the GATKReports. This was a beast.
The practical differences between version 1.0 and this one (v1.1) are:

* the underlying data structure now uses arrays instead of hashes, which should drastically reduce the memory overhead required to create large tables.
* no more primary keys; you can still create arbitrary IDs to index into rows, but there is no special cased primary key column in the table.
* no more dangerous/ugly table operations supported except to increment a cell's value (if an int) or to concatenate 2 tables.

Integration tests change because table headers are different.
Old classes are still lying around.  Will clean those up in a subsequent commit.
2012-05-18 01:11:26 -04:00