carneiro
355be57539
fixing the pipeline so that it still works while I'm adding support for BWA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
kshakir
8d294dd6e6
For the snps to create combine snps and filtered indels, now using a VCF with just snps instead of vcf with snps plus unfiltered indels.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5904 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-29 04:17:18 +00:00
carneiro
5974675b43
Two intermediate commits, to work over the weekend.
...
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
carneiro
2524216d4b
Added the R script for VQSR
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:56:56 +00:00
depristo
549172af10
removing dependance on jobQueue == gsa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
fromer
b4af28c7df
Handle case where -L argument (intervals) not given
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
ebanks
d393f59ad2
Moving the hg19 reference to a new location as per instruction from our intrepid leader
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5875 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:48:59 +00:00
kshakir
9d8c963fcc
Switched arguments from short name to long name.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5873 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:24:56 +00:00
kshakir
6ec3dd0f8c
Updated GridEngineJobRunner to return status RUNNING instead of PENDING when a job has been sent to GridEngine, even if it hasn't started.
...
Added GridEngine to pipeline tests.
Removed passing -jobProject since GridEngine projects must be predefined.
Writing the HybridSelectionPipelineTest yaml into the temp directory.
Disabled job priority as it needs to be refactored for use by GridEngine and LSF.
Fixed WholeGenomePipeline variantmergeoption rename to filteredRecordsMergeType.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5872 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:16:44 +00:00
delangel
3565eca2dd
Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Paralleles and generalizes SNP version, so in theory this script can be used for both SNP and Indel consensus.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
ebanks
3d134a8497
Updated to produce (actual) hg19 resources too
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5870 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 02:14:55 +00:00
delangel
e6396062c0
Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets.
...
Not for general use yet!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
depristo
0448ef28d3
Actually use the right parameter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5864 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:09:01 +00:00
depristo
d551ce720d
Updated with new CombineVariants syntax
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5862 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:38:09 +00:00
carneiro
2efd807952
No more default callsets, they're now mandatory arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
fromer
bc4305c956
Added memory limit parameter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
fromer
833dff658a
Small script to do full variant annotation in parallel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
chartl
912c6cdbfa
Moving this script out of playground while I figure out what's going on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
depristo
72ad8ded19
Removed unused importants, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
carneiro
3a2e32eef3
wex is wex, wgs is wgs.... i think i got it right this time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-20 16:44:25 +00:00
kshakir
6c6e52def9
Renamed FCP to HybridSelectionPipeline.
...
Reviewed pipelines with dev team.
HSP updates:
- Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering
- Moved logs next to the files like in WGP
- Flattened outputs into one directory
- The file names for the final outputs are now <projectName>.vcf and <projectName>.eval
- Updated test to pass the chr20 intervals instead of a boolean
- Removed MultiFCP
WGP updates:
- Only cleaning and calling chromosomes 1-22, X, Y, MT
- Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants
- Still running VQSR on the recombined SNPs plus hard filtered indels
- Using hard indel filters from delangel
- Reduced number of tranches with rpoplin
- Changed prior for dbsnp from 10 to 8 with rpoplin
- Assuming identical samples on both CombineVariants
- Explicitly using variant merge option UNION even though it's the default
- Not setting the default genotype merge option PRIORITIZE
- Generating a vcf and eval for each tranche
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 22:47:02 +00:00
carneiro
76c87c9f1d
trio WGS was creating trio WEX filenames.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 17:45:45 +00:00
carneiro
ebcd333ed8
Quick small updates:
...
SelectVariants: typo
MethodsDevelopmentPipeline: Added CEU Trio WGS dataset
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5818 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:08:39 +00:00
carneiro
b5b8cb959a
Added VQSR to the downsampling script and changed memory limits for the clean script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:07:42 +00:00
kshakir
83e207d9dd
Added option to exclude intervals during chunk calling.
...
Removed job priority as temp space isn't as tight at the moment and planning on changing the priority interface.
Updated chunk calling with ebanks:
- Using "the bundle" of resources.
- Using dbsnp 132 and 1000G indel RODs for both RTC & IR.
- Using the default maxIntervalSize in RTC.
- Removed use of UG.exactCalculation argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5814 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 03:48:02 +00:00
depristo
9423652ad8
Computes how well a genotype chip covers a reference panel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-14 15:07:28 +00:00
kshakir
95fc6c0a83
Changed VR tranches from old 0.1-10 to new 100 to 90.
...
Using hapmap training and truth based on wiki.
Explicitly setting the ts_filter_level even though 99.0 is the default.
Recal file path now ends with with .recal.
Added ar's vcf input.
Omni rod name now omni instead of 1kg.
The VR RodBind tags had spaces in them.
Was passing both the full intervals and the chunk intervals to chunk jobs.
Switched back to chr20 for default since the VR crashes on small intervals sets with "MESSAGE: Matrix is singular."
Log files names based on the file paths + .out.
Added eval statifications by sample based on the Hybrid Selection / Whole Exome pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5800 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-13 14:38:56 +00:00
kshakir
08c13f3944
Using embedded GATK.
...
Hardcoded the reference and dbsnp since the training rods are also hardcoded, for now.
Changed freeze/chr20 to wg/chr20/cent1 to also test the heaviest known shard.
Other cleanup.
TODO: Memory command line options or have the script figure it out using FLS or similar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5799 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 23:19:49 +00:00
chartl
66c8fa5c48
James P says this change worked for him, so I'm committing it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5795 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-12 16:55:18 +00:00
rpoplin
1d11e88899
Adding another example call set to GATK resource bundle for use in VQSR wiki tutorial
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5774 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 21:16:33 +00:00
fromer
04f156d86b
Removed extraneous import
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5772 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 18:51:03 +00:00
rpoplin
825682f58c
oops, putting the script back into a sensible state
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin
b5ab2274f6
Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
kshakir
08f0509a5c
Disabling the queue/pipeline package by default so that scala code can build. If it's not going to be fixed the package should be removed. If it is going to be fixed this patch to build.xml should be reverted.
...
Also added the old model of indel calling to the FCP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5749 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 12:17:33 +00:00
carneiro
f35d955490
recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro
9f2a8033ff
just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro
c2f8536e02
removing old GATK options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro
8bb92160b5
Script to identify mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro
e2b9227d8d
script to test BQSR on good/bad regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
rpoplin
4bbce42861
Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
rpoplin
3224bbe750
New visualization output for VQSR. It creates the R script file on the fly and then runs Rscript on it. Adding 1000G Project consensus code. First pass of having VQSR work with missing data by marginalizing over the missing dimension for that data point (thanks Chris and Bob for ideas). Updated math functions to use apache math commons instead of approximations from wikipedia. New parameters available for the priors based on further reading in Bishop and looking at the new visualizations. Updated integration test to use more modern files. Updated MDCP to use new best practices w.r.t. annotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5723 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 19:14:42 +00:00
carneiro
a93a9ac663
adding gold standard (full coverage) to the variant eval analysis output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 16:29:11 +00:00
carneiro
2384e23274
Added the capability of running count covariates only on a given interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:30:14 +00:00
carneiro
3868a7e778
Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
carneiro
f04cc4321f
fixed a bug when the pipeline was used on a single bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
depristo
122d5845d3
GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00
kshakir
6b1b4931e7
Added FCP VE stratifications for Filter, FunctionalClass, and Stratification as requested by Corin.
...
Feeding FCP UG the bam list instead of individual bams to cut scatter gather time from O(m^100) as measured by Chris to O(m^1).
Fixed NPE when eval values aren't found in PipelineTests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5694 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 02:29:56 +00:00
kshakir
00b57c751b
Added missing ".0".
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5682 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 21:50:07 +00:00
chartl
5b9a8555cd
Queue graph time is currently of O(n^m) where n = num jobs, m = num unique base files. This script therefore was running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue, once I've fixed up scatter-gather in core, outputs can be uncommented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
corin
9f006be425
Updates Omni path and removes a typo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5673 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 04:17:13 +00:00