Commit Graph

459 Commits (36db9bdcd55aa9668d7b5c2bed620a2eb96f21bc)

Author SHA1 Message Date
carneiro 36db9bdcd5 Implemented and tested BWA alignment in the data processing pipeline.
caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:03:07 +00:00
carneiro c85a1d9210 Implemented and tested BWA alignment in the data processing pipeline.
Renamed it and moved to core. Happy to support it.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 22:58:55 +00:00
fromer ef56b48eef Add CNV sub-dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:47:13 +00:00
carneiro 355be57539 fixing the pipeline so that it still works while I'm adding support for BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
kshakir 8d294dd6e6 To create the combined SNPs and filtered indels, now using a VCF with just SNPs instead of a VCF with SNPs plus unfiltered indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5904 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-29 04:17:18 +00:00
carneiro 5974675b43 Two intermediate commits, to work over the weekend.
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
carneiro 2524216d4b Added the R script for VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:56:56 +00:00
depristo 549172af10 removing dependence on jobQueue == gsa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
kshakir fd21c5d100 Minor update so the debug messages don't show temp files as chromosome 208799060637697164972
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5887 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 22:56:33 +00:00
fromer b4af28c7df Handle case where -L argument (intervals) not given
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
ebanks d393f59ad2 Moving the hg19 reference to a new location as per instruction from our intrepid leader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5875 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:48:59 +00:00
kshakir 9d8c963fcc Switched arguments from short name to long name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5873 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:24:56 +00:00
kshakir 6ec3dd0f8c Updated GridEngineJobRunner to return status RUNNING instead of PENDING when a job has been sent to GridEngine, even if it hasn't started.
Added GridEngine to pipeline tests.
Removed passing -jobProject since GridEngine projects must be predefined.
Writing the HybridSelectionPipelineTest yaml into the temp directory.
Disabled job priority as it needs to be refactored for use by GridEngine and LSF.
Fixed WholeGenomePipeline variantmergeoption rename to filteredRecordsMergeType.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5872 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:16:44 +00:00
delangel 3565eca2dd Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Parallels and generalizes SNP version, so in theory this script can be used for both SNP and Indel consensus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
ebanks 3d134a8497 Updated to produce (actual) hg19 resources too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5870 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 02:14:55 +00:00
delangel e6396062c0 Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets.
Not for general use yet!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
depristo 0448ef28d3 Actually use the right parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5864 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:09:01 +00:00
depristo d551ce720d Updated with new CombineVariants syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5862 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:38:09 +00:00
carneiro 2efd807952 No more default callsets, they're now mandatory arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
fromer bc4305c956 Added memory limit parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
fromer 833dff658a Small script to do full variant annotation in parallel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
chartl 912c6cdbfa Moving this script out of playground while I figure out what's going on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
depristo 72ad8ded19 Removed unused imports, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
depristo e234589240 Contracts for GenomeLocParser and GenomeLoc are now fully implemented.
GenomeLocs can officially have any start/stop values from -Inf to +Inf. Bounds w.r.t. the reference are enforced, optionally, by GenomeLocParser. General code cleanup throughout the subsystem.

All validation code for GLs is now centralized, and all I/O systems now validate their inputs. Because of this, the Picard interval processing code has been changed to examine whether an interval is valid, and only keep the valid intervals. Note that the scatter/gather test was changed, because the original hg18 chr20 interval files were actually malformed (all records for some reason were on chr20).

Many interval processing routines were moved to IntervalUtils, as this is their natural home.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5830 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 02:01:59 +00:00
carneiro 3a2e32eef3 wex is wex, wgs is wgs.... i think i got it right this time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-20 16:44:25 +00:00
kshakir 6c6e52def9 Renamed FCP to HybridSelectionPipeline.
Reviewed pipelines with dev team.
HSP updates:
- Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering
- Moved logs next to the files like in WGP
- Flattened outputs into one directory
- The file names for the final outputs are now <projectName>.vcf and <projectName>.eval
- Updated test to pass the chr20 intervals instead of a boolean
- Removed MultiFCP
WGP updates:
- Only cleaning and calling chromosomes 1-22, X, Y, MT
- Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants
- Still running VQSR on the recombined SNPs plus hard filtered indels
- Using hard indel filters from delangel
- Reduced number of tranches with rpoplin
- Changed prior for dbsnp from 10 to 8 with rpoplin
- Assuming identical samples on both CombineVariants
- Explicitly using variant merge option UNION even though it's the default
- Not setting the default genotype merge option PRIORITIZE
- Generating a vcf and eval for each tranche


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 22:47:02 +00:00
carneiro 76c87c9f1d trio WGS was creating trio WEX filenames.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 17:45:45 +00:00
carneiro ebcd333ed8 Quick small updates:
SelectVariants: typo
MethodsDevelopmentPipeline: Added CEU Trio WGS dataset


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:08:39 +00:00
carneiro b5b8cb959a Added VQSR to the downsampling script and changed memory limits for the clean script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:07:42 +00:00
kshakir 83e207d9dd Added option to exclude intervals during chunk calling.
Removed job priority as temp space isn't as tight at the moment and planning on changing the priority interface.
Updated chunk calling with ebanks:
- Using "the bundle" of resources.
- Using dbsnp 132 and 1000G indel RODs for both RTC & IR.
- Using the default maxIntervalSize in RTC.
- Removed use of UG.exactCalculation argument.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5814 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 03:48:02 +00:00
depristo 9423652ad8 Computes how well a genotype chip covers a reference panel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-14 15:07:28 +00:00
kshakir 95fc6c0a83 Changed VR tranches from old 0.1-10 to new 100 to 90.
Using hapmap training and truth based on wiki.
Explicitly setting the ts_filter_level even though 99.0 is the default.
Recal file path now ends with .recal.
Added ar's vcf input.
Omni rod name now omni instead of 1kg.
The VR RodBind tags had spaces in them.
Was passing both the full intervals and the chunk intervals to chunk jobs.
Switched back to chr20 for default since the VR crashes on small interval sets with "MESSAGE: Matrix is singular."
Log file names based on the file paths + .out.
Added eval stratifications by sample based on the Hybrid Selection / Whole Exome pipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5800 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-13 14:38:56 +00:00
kshakir 08c13f3944 Using embedded GATK.
Hardcoded the reference and dbsnp since the training rods are also hardcoded, for now.
Changed freeze/chr20 to wg/chr20/cent1 to also test the heaviest known shard.
Other cleanup.
TODO: Memory command line options or have the script figure it out using FLS or similar.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5799 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 23:19:49 +00:00
dheiman 9e08a699c6 Corrected memory handling and jobName formatting issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5797 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 17:47:56 +00:00
chartl 66c8fa5c48 James P says this change worked for him, so I'm committing it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5795 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 16:55:18 +00:00
dheiman 16db86e6cb Grid Engine backend to GATK-Queue, initial commit of implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5788 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-11 13:21:45 +00:00
kshakir 3ffc2ccd81 Implemented Broad-specific LSF requirement in the LSF job runner ahead of GridEngine check-in by dheiman.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5781 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-09 22:14:04 +00:00
rpoplin 1d11e88899 Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 21:16:33 +00:00
fromer 04f156d86b Removed extraneous import
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 18:51:03 +00:00
kshakir 4d08d39849 Moved some of the java to scala conversions from production to test code as it's not needed in production and slows down the code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5769 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 04:11:15 +00:00
kshakir 28b897d5de Fixed O(N^2) operation when scattering interval files.
Cleaned up intervals contig count function.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5768 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 03:32:35 +00:00
kshakir 8ad547e6c2 Fixed another interval bug where dividing up N intervals into N parts wasn't working.
Minor updates to the FCPTest to match the changes due to using the old indel caller.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5766 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:49:35 +00:00
rpoplin 825682f58c oops, putting the script back into a sensible state
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin b5ab2274f6 Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
kshakir 4d251fb91f Why won't you die?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5758 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 19:13:39 +00:00
kshakir f7d9f0a1f3 Removing QPipeline directory as there's no one to support it at the moment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5757 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 18:36:02 +00:00
kshakir 08f0509a5c Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
Also added the old model of indel calling to the FCP.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
carneiro f35d955490 recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro 9f2a8033ff justRecalibrate now recalibrates one sample, fully, without splitting intervals (naming makes more sense)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro c2f8536e02 removing old GATK options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00