139 Commits (8eda0c50df451e94d46e5098bc4205adb2ddd4dd)

Author SHA1 Message Date
Mauricio Carneiro bc64d4240f Licensing update -- batch #2
- caught all scala files that didn't have proper package information / class names
- included all source files in archive as well

GSATDG-5
2013-01-11 13:38:11 -05:00
Mauricio Carneiro 28235f57f2 Adding package information to scala scripts that were missing it. Including archived ones.
GSATDG-5
2013-01-11 13:38:05 -05:00
Mauricio Carneiro e5913e50b2 Updating licenses for all scala files
GSATDG-5
2013-01-10 17:46:10 -05:00
Mauricio Carneiro d3e2352072 Moved processing pipelines to private
These pipelines were supposed to serve as an example for the community. They were written a long, long time ago and are being used today by users as the 'best practice pipeline'. Unless we decide we want to support and maintain an example best-practices pipeline, I'm moving these to private.
2013-01-07 14:49:57 -05:00
Eric Banks 18728ec5bd Updates to the bundle script:
1. Add the symbolic 'current' link for the new bundle dir
2. Don't gzip and copy .out files
3. Don't call chr20 SNPs on the example BAM because it's now just a few reads on chr1
2012-12-18 11:16:42 -05:00
Menachem Fromer a8c7edca05 Fixed fragment handling in DepthOfCoverage 2012-11-21 16:01:10 -05:00
Menachem Fromer c8be7c3102 Keep SNPs and indels separately for batch merging; Add options to DepthOfCoverage to count fragments (to not double-count overlapping reads of same fragment); DepthOfCoverage should now support ReducedReads; Replace recursion with loop in DoC/package.scala (for lists longer than 5000 elements) 2012-11-21 15:56:53 -05:00
Menachem Fromer 9111966261 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-20 12:19:58 -05:00
Eric Banks 843384e435 Rename hg19 files in bundle to b37 since that's what they are 2012-11-14 11:47:09 -05:00
Eric Banks eccb76c304 Only run UG in the bundle for chr20 2012-10-30 15:09:46 -04:00
Eric Banks 8a402024c2 Updating bundle script to handle new naming convention of CEU trio best practices callset 2012-10-30 09:11:56 -04:00
Menachem Fromer 9af4b34fd8 Changed @Input to @Argument for non-File types 2012-10-26 01:21:05 -04:00
Menachem Fromer 0fe36b1c72 Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-10-25 16:18:57 -04:00
Menachem Fromer cde4f037d3 Begin moving XHMM scripts to public 2012-10-25 16:18:25 -04:00
Ami Levy Moonshine dde3060bb8 add the CEUtrio best practices results (UG + PBT) to the bundle 2012-10-25 15:36:17 -04:00
Khalid Shakir 2ef456d51a Added explicit @ClassType annotations to @Argument for Option[Int] or Option[Double] since scala seems to change the reflected type to Option[Object] on some systems.
Changed ReflectionUtils.getGenericTypes' order of looking for @ClassType since the primitive generic wasn't completely erased, only changed to Object which is incorrect.
More fixes to @Arguments labeled as java.io.File via incorrect @Input annotation.
Put in a default undocumented implementation of @Argument doc() to match the one added to @Input.
2012-10-19 13:20:29 -04:00
Khalid Shakir 403654d40a Fixed null checks in ArgumentTypeDescriptor due to ArgumentMatchValue updates.
Fixed @Arguments such as scatter count that were labeled as java.io.File via incorrect @Input annotation.
2012-10-18 16:57:15 -04:00
Khalid Shakir f66284658d RetryMemoryLimit now works with Scatter/Gather. 2012-10-09 21:51:03 -04:00
Eric Banks 277ba94c7b Update from dbsnp135 to dbsnp137. 2012-08-31 14:06:29 -04:00
Eric Banks 5ea7cd6dcc Updating resource bundle: no reason to include both genotype and sites files for Omni and HM3, sites are enough. Also, don't include duplicate entry for the Mills indels. 2012-08-31 14:01:54 -04:00
Khalid Shakir 22b4466cf5 Added setupRetry() to modify jobs when Queue is run with '-retry' and jobs are about to restart after an error.
Implemented a mixin called "RetryMemoryLimit" which will by default double the memory.
GridEngine memory request parameter can be selected on the command line via '-resMemReqParam mem_free' or '-resMemReqParam virtual_free'.
Java optimizations now enabled by default:
- Only 4 GC threads instead of each job using java's default O(number of cores) GC threads. Previously on a machine with N cores if you have N jobs running and java allocates N GC threads by default, then the machines are using up to N^2 threads if all jobs are in heavy GC (thanks elauzier).
- Exit if the JVM spends more than 50% of its time in GC (thanks ktibbett).
- Exit if GC reclaims less than 10% of max heap (thanks ktibbett).
Added a -noGCOpt command line option to disable new java optimizations.
2012-08-13 15:43:05 -04:00
Eric Banks 7cf4b63d76 Disabling indel quals in BaseRecalibrator as it should be, not PrintReads. 2012-08-01 09:23:04 -04:00
Eric Banks 675ccab2fa Renaming BQSR to BaseRecalibrator 2012-07-23 10:17:17 -04:00
Eric Banks 863eb5b5c0 Use Context not Dinuc covariate 2012-07-17 15:18:11 -04:00
Eric Banks 17d627b86d Update the DPP and PBPP to use the BQSRv2 walkers 2012-07-17 13:15:32 -04:00
Mauricio Carneiro 9346c5b37a Merged bug fix from Stable into Unstable 2012-06-26 14:55:41 -04:00
Mauricio Carneiro 334d66f2b1 Updating validation parameter in the DPP
Users were very confused by the failing validation of their 'unpicarded' bam files. Changed the default to OFF and added an option to turn it on.
2012-06-26 14:54:37 -04:00
Ryan Poplin c3fb321014 Minor updates to pacbio data processing script to make it work with the latest bwa version/settings. 2012-05-22 10:24:45 -04:00
Khalid Shakir 91cb654791 AggregateMetrics:
- Ported from jython to java, so it is now accessible to Queue via automatic extension generation.
- Better handling for problematic sample names by using PicardAggregationUtils.
GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name.
CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering.
Added SelectHeaders walker for filtering headers for dbGAP submission.
Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter.
Latest WholeGenomePipeline.
Other minor cleanup to utility methods.
2012-04-17 11:45:32 -04:00
Eric Banks ed69f4ff7c Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-13 09:28:16 -04:00
Eric Banks 9b9856ead5 quick todo for next time we make a bundle 2012-03-13 09:28:11 -04:00
Eric Banks 6e9b8559d8 Unfortunately need to bump up memory needed for liftover to get Omni file sorted 2012-03-12 23:20:00 -04:00
Eric Banks 359090c4b7 Updating dbsnp to v135 2012-03-12 13:17:58 -04:00
Eric Banks 7e9a535c4d Updated the bundle to use the official filtered (final) indel calls 2012-03-12 12:12:24 -04:00
Christopher Hartl 2c1b14d35e Mostly small changes to my own scala scripts: .vcf.gz compatibility for output files, smarter beagle generation, simple script to scatter-gather combine variants. Whole genome indel calling now uses the gold standard indel set. 2012-02-22 17:20:04 -05:00
Christopher Hartl 974c2499cc Bug fixes to script. 2012-02-02 12:55:54 -05:00
Christopher Hartl 27ea6426a4 Small script to chunk up a VCF into equal-sized chunks 2012-02-02 12:29:03 -05:00
Christopher Hartl 0c562756eb Add a memory limit so this thing doesn't get killed on the farm 2012-02-02 10:30:09 -05:00
Christopher Hartl 45bf2562cc . 2012-02-02 09:11:17 -05:00
Christopher Hartl f8c5406084 Add the ability to extract samples 2012-02-02 09:06:39 -05:00
Christopher Hartl b567ed8793 Use the right reference path :( 2012-02-01 12:35:18 -05:00
Christopher Hartl 87a63d54d6 fix the script! 2012-02-01 12:05:29 -05:00
Christopher Hartl 810996cfca Introducing: VariantsToPed, the world's most annoying walker! And also a busted QScript to run it that I need Khalid's help debugging ( frownie face ). Note that VariantsToPed and PlinkSeq generate the same binary file (up to strand flips...thanks PlinkSeq), so I know it's working properly. Hooray! 2012-02-01 10:39:03 -05:00
Mauricio Carneiro 052a4bdb9c Turning off PHONE HOME option in the MDCP
* MDCP is for internal use and there is no need to report to the Amazon cloud.
* Reporting to AWS_S3 is not allowing jobs to finish; this is probably a bug.
2012-01-27 11:13:30 -05:00
Mauricio Carneiro 97499529c7 another small bug with the file extension. 2012-01-24 16:14:35 -05:00
Mauricio Carneiro 945cf03889 IntelliJ ate my import! 2012-01-23 21:46:45 -05:00
Mauricio Carneiro 2bb9525e7f Don't set base qualities if fastQ is provided
* Pacbio Processing pipeline now works with the new fastQ files outputted by the Pacbio instrument
2012-01-23 17:57:29 -05:00
Khalid Shakir c18beadbdb Device files like /dev/null are now tracked as special by Queue and are not used to generate .out file paths, scattered into a temporary directory, gathered, deleted, etc.
Attempted workaround for xdr_resourceInfoReq unsatisfied link during loading of libbat.so.
2012-01-23 16:17:04 -05:00
Ryan Poplin 75f87db468 Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5 2012-01-17 15:02:45 -05:00
Mauricio Carneiro 5bf960deb8 adding dbsnp to indel VQSR 2012-01-10 12:38:49 -05:00