424 Commits (ec443e89cf58e9548ea8beb91c211f8ef65a8b25)

Author SHA1 Message Date
dheiman 16db86e6cb Grid Engine backend to GATK-Queue, initial commit of implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5788 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 13:21:45 +00:00
kshakir 3ffc2ccd81 Implemented Broad-specific LSF requirement in the LSF job runner ahead of the GridEngine check-in by dheiman.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 22:14:04 +00:00
rpoplin 1d11e88899 Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 21:16:33 +00:00
fromer 04f156d86b Removed extraneous import
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 18:51:03 +00:00
kshakir 4d08d39849 Moved some of the Java-to-Scala conversions from production to test code, as they're not needed in production and slow down the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5769 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 04:11:15 +00:00
kshakir 28b897d5de Fixed O(N^2) operation when scattering interval files.
Cleaned up intervals contig count function.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 03:32:35 +00:00
kshakir 8ad547e6c2 Fixed another interval bug where dividing up N intervals into N parts wasn't working.
Minor updates to the FCPTest to match the changes due to using the old indel caller.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:49:35 +00:00
rpoplin 825682f58c oops, putting the script back into a sensible state
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin b5ab2274f6 Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
kshakir 4d251fb91f Why won't you die?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5758 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:13:39 +00:00
kshakir f7d9f0a1f3 Removing QPipeline directory as there's no one to support it at the moment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5757 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:36:02 +00:00
kshakir 08f0509a5c Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
Also added the old model of indel calling to the FCP.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
carneiro f35d955490 recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro 9f2a8033ff justRecalibrate now recalibrates one sample fully, without splitting intervals (the naming makes more sense)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro c2f8536e02 removing old GATK options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro 8bb92160b5 Script to identify Mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro e2b9227d8d script to test BQSR on good/bad regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
rpoplin 4bbce42861 Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
rpoplin 3224bbe750 New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use Apache Commons Math instead of approximations from Wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:14:42 +00:00
carneiro a93a9ac663 adding gold standard (full coverage) to the variant eval analysis output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 16:29:11 +00:00
kshakir 2d81262f87 Fixed a bug where empty intervals were being scattered zero ways parallel. Would be awesome to use the GAE at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5718 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 22:42:48 +00:00
carneiro 2384e23274 Added the capability of running count covariates only on a given interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:30:14 +00:00
carneiro 3868a7e778 One-off project to downsample, bootstrap and call SNPs to test sensitivity/specificity of downsampled coverage in WEX projects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
carneiro f04cc4321f fixed a bug when the pipeline was used on a single bam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
depristo 122d5845d3 GATK Resource bundle, latest version (now with b37 -> b36 support). One-off Scala script that assesses chip coverage of calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00
kshakir df35a143b2 Removed -debug/--debug_mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5697 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 10:56:39 +00:00
kshakir ca817356b6 Quick disabling test to restore build. TODO fix test or complete removal of the MFCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5696 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 04:26:11 +00:00
kshakir 6b1b4931e7 Added FCP VE stratifications for Filter, FunctionalClass, and Stratification as requested by Corin.
Feeding FCP UG the bam list instead of individual bams to cut scatter gather time from O(m^100) as measured by Chris to O(m^1).
Fixed NPE when eval values aren't found in PipelineTests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5694 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 02:29:56 +00:00
kshakir 58c7b27ccc Missing file from last checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5688 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:12:41 +00:00
kshakir f619dd3ca7 Refactored IntervalUtils used to parse and scatter intervals for Queue.
Scattering non-contig interval lists by number of loci in the intervals instead of just number of intervals.
Queue caches the list of locs and how to split them up instead of reloading them from disk repeatedly.
TODO: general purpose function to divide data evenly.
Skip over comments when parsing picard analysis files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5687 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 00:06:00 +00:00
kshakir 6ca4e3cebf Updating FCPT nCalledLoci due to fixed QD<2.0 filter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5686 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-25 21:37:04 +00:00
kshakir 1158c99726 Only running chr20 test on the hour queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5684 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 22:09:42 +00:00
kshakir 00b57c751b Added missing ".0".
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5682 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 21:50:07 +00:00
chartl 5b9a8555cd Queue graph time is currently O(n^m), where n = num jobs and m = num unique base files. This script was therefore running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue; once I've fixed up scatter-gather in core, outputs can be uncommented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
corin 9f006be425 Updates Omni path and removes a typo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5673 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 04:17:13 +00:00
kshakir 8619f49d20 Added a utility method to retrieve the contig lengths for WG chunking.
Added a rudimentary GATKReportParser for parsing VE3 results.
Re-enabled the FCPTest using VE3, the GATKRP, and the PicardAggregationUtils.
The tag type for .rod files is DBSNP, not ROD.
More explicit return types on implicit methods.
Added null checks for implicit string to/from file conversions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5668 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 19:22:21 +00:00
depristo d8b8f857f3 V2 -- now working -- of a core walker that creates the standard GATK resource bundle
See https://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle

The bundle lives locally in /humgen/gsa-hpprojects/GATK/bundle/current

Use the following command to create the bundle:

java -Djava.io.tmpdir=/broad/shptmp/depristo/tmp -jar dist/Queue.jar -S scala/qscript/core/GATKResourcesBundle.scala --gatkjarfile dist/GenomeAnalysisTK.jar -bsub -jobQueue gsa -svn 5660 $* 

Annoyingly, it must be run in the trunk directory, and requires an explicit svn version number to create the directory.  It also must be run in two stages manually.  First, the local bundle is created, and then with the -phase2 argument all of the files in the local bundle are compressed and pushed to the FTP server.  I'm likely going to shift most of my processes over to using this location for data file access, especially for b37 data sets.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5665 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-20 12:48:47 +00:00
carneiro d35c7d1029 - minor changes to the 'justclean' script to handle the Trio Cleaning.
- fixing a bug in the single-ended BWA option of the data processing pipeline.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-19 16:35:24 +00:00
depristo 541c9109b3 V1 of GATK Resource Bundling system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5659 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-18 19:23:45 +00:00
chartl 23fac043d9 Fix the outputs so the proper files are gathered (not automatic due to multiplexer)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 23:55:12 +00:00
chartl e5ef8388fc BatchMerge - AlleleVCF --> AllelesVCF, this (combined with Eric's fix) will solve James P.'s forum issue.
After viewing results on real case/control data from RAW -- it's really working quite well. ReadIndels, however, needs to use a T-test rather than a U-test, especially in deep coverage (at indel sites, the reads with indels will have mostly the same number of CIGAR indel elements -- one -- which doesn't really play nicely with the UTest when sample sets are large). Modified ReadsLargeInsertSize to be a two-way test (e.g. ReadsLarge and ReadsSmall). BaseQualityScore also suffers from the same issue as read indels, so switching over to a T-test in that case as well.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5653 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 22:03:16 +00:00
kshakir 798178b167 Another case of just because you can do something doesn't mean you should.
Scala type inference for the implicit return types on implicit methods was a little too much for poor IntelliJ IDEA to handle, and it was breaking things like copy/paste, auto-complete, etc.
Also updated the Queue package to include all Sting utils.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5646 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 15:39:56 +00:00
chartl 104d5515fe Huh, somehow this change didn't make it through last time
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5639 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 17:09:37 +00:00
chartl 47fa7e2227 + Added override to extractFileEntries
+ UG now doesn't care whether it's given SNPs or indels to genotype; it will do the right thing, so the option to specify which GM the user wants is removed
+ Max mismatches argument removed

Integration test will follow.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5638 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 15:13:35 +00:00
kshakir cad6722cf6 Emailing on function start.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5637 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 14:55:35 +00:00
kshakir 475ad1259d Put a band-aid on the FCP by switching use of DINDEL to INDEL and explicitly running UG the old way with just indels and just snps.
Switched YAML parser to new Broad parser which will additionally update picard cleaned bams to the latest version if the project and sample are specified.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5634 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 02:22:31 +00:00
corin 9ee30ce594 Whole genome pipeline script. Currently chunks, cleans, calls, merges, selects and filters indels, recalibrates, and evals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5627 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 16:59:48 +00:00
chartl 8125b8b901 Old changes to the exome VQSR search.
SGA updated to include new proportion-based insert size test.

Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
kshakir 4b7c3af763 When /etc/mailname is unreadable fall back to the hostname.
Implicit conversions for String to/from File.
Small updates to the example QScripts.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5614 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-11 20:22:44 +00:00
rpoplin 05ad6ecf72 bug fix in MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5613 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-11 18:27:47 +00:00