carneiro
9c1b8ea796
Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 21:05:28 +00:00
carneiro
087a25d9e3
quick memory upgrade to BWA classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:53:32 +00:00
carneiro
fbe157137f
removing the old processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:19:13 +00:00
droazen
48055d45cb
Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions.
Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly.
Will only update the wiki once this change goes live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:14 +00:00
droazen
c8124496d0
now with the new 'consensus model' parameter to the cleaner.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6064 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:42 +00:00
droazen
29a0e08aa2
Testing bug fix process #3 (changes are irrelevant)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6048 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:40 +00:00
droazen
e148a75c32
Testing the 'bug fix' process #2 (changes are irrelevant)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6047 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:37 +00:00
droazen
9b90e9385d
Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files; I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:37 +00:00
depristo
4c6d0e6143
Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on the wiki, and an integration test using Kiran's very nice fundamental VCF.
VariantEvalWalker now passes a pointer to itself to the Stratification setVariantEvalWalker (and associated get method) so that stratifications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratification to include an initialize function that has all the arguments necessary for any Strat.
JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXL expressions that directly use VariantContext and associated functions from the command line.
ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and led to strange punctile behavior when the actual AF quantization was out of sync with the fixed quantization of the AF strat).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-19 03:11:00 +00:00
fromer
03a0185566
Control unscattered output file location
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6011 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 15:53:25 +00:00
ebanks
4e85416af1
[Foiled yet again when trying to do this in git] Slight modifications in the argument structure for the IndelRealigner. Instead of boolean flags -knownsOnly and -doNotUseSW, we now have an enum --consensusDeterminationModel which lets you specify knowns only, also use indels in reads, or also use SW. Please note that the default behavior of IR has not changed at all (and won't for a few more days) - that'll be done in GIT (fingers crossed).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6008 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 17:35:37 +00:00
depristo
27d4b317fc
Simple program that calls indels in CEU trio exomes and WGS and compares the results. Overall, the indel calls look really good to me, given reasonably good input BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:56:04 +00:00
depristo
43fdd31e20
Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:54:59 +00:00
depristo
38d7733989
Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below:
setenv DATA ~/Desktop/broadLocal/localData/
java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:49:54 +00:00
fromer
b4c30bf124
Added option of minMappingQuality
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 00:02:26 +00:00
depristo
165befd38a
V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
carneiro
95f3da1126
limiting the number of reads in memory for the SamValidateFile.jar
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5976 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 20:14:30 +00:00
ebanks
077862958d
Oops, forgot to define the hg19 variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5975 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-10 18:26:48 +00:00
depristo
44287ea8dc
ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00
delangel
78f5309656
Intermediate commit of indel consensus VQSR script, a couple of new features added, not for general use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5951 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 13:27:02 +00:00
kshakir
ac3f1be7f0
Added a samtools merge CLF.
Using samtools to merge the low pass bams before cleaning to avoid "Too many open files." with 1500+ bams.
Other minor cleanup as pointed out by the IntelliJ scala plugin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5942 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 22:20:38 +00:00
kshakir
4c6751ec3c
Added argument to WGP and HSP to allow more memory.
Upped the WGP VQSR memory to 32g to power through the filtering whole genome. TODO: Figure out what the right amount is.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5940 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 20:48:37 +00:00
depristo
cd293f145b
More stable reduced reads representation. Bug fixes throughout. Differs at <1% of sites in an exome, and the majority of these differences are filtered out or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 20:11:31 +00:00
carneiro
32ac7be86a
New name for the pipeline; it's now in core, happy to support it.
ps: Can't wait for GIT !
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5933 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:34:54 +00:00
carneiro
a4ffae880d
Subversion crashed my intellij BADLY, so now moving the data processing pipeline to core in 2 steps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5932 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:31:24 +00:00
carneiro
36db9bdcd5
Implemented and tested BWA alignment in the data processing pipeline.
caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:03:07 +00:00
carneiro
c85a1d9210
Implemented and tested BWA alignment in the data processing pipeline.
Renamed it and moved to core. Happy to support it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 22:58:55 +00:00
fromer
ef56b48eef
Add CNV sub-dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:47:13 +00:00
carneiro
355be57539
fixing the pipeline so that it still works while I'm adding support for BWA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
kshakir
8d294dd6e6
For the SNPs used to create the combined SNPs and filtered indels, now using a VCF with just SNPs instead of a VCF with SNPs plus unfiltered indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5904 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-29 04:17:18 +00:00
carneiro
5974675b43
Two intermediate commits, to work over the weekend.
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
carneiro
2524216d4b
Added the R script for VQSR
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5898 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 21:56:56 +00:00
depristo
549172af10
removing dependence on jobQueue == gsa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
fromer
b4af28c7df
Handle case where -L argument (intervals) not given
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
ebanks
d393f59ad2
Moving the hg19 reference to a new location as per instruction from our intrepid leader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5875 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:48:59 +00:00
kshakir
9d8c963fcc
Switched arguments from short name to long name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5873 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:24:56 +00:00
kshakir
6ec3dd0f8c
Updated GridEngineJobRunner to return status RUNNING instead of PENDING when a job has been sent to GridEngine, even if it hasn't started.
Added GridEngine to pipeline tests.
Removed passing -jobProject since GridEngine projects must be predefined.
Writing the HybridSelectionPipelineTest yaml into the temp directory.
Disabled job priority as it needs to be refactored for use by GridEngine and LSF.
Fixed WholeGenomePipeline for the rename of variantmergeoption to filteredRecordsMergeType.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5872 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:16:44 +00:00
delangel
3565eca2dd
Script to run UG to create annotated all-pop VCF files to use for the Phase1 VQSR indel project consensus. Parallels and generalizes the SNP version, so in theory this script can be used for both SNP and indel consensus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
ebanks
3d134a8497
Updated to produce (actual) hg19 resources too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5870 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 02:14:55 +00:00
delangel
e6396062c0
Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets.
Not for general use yet!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
depristo
0448ef28d3
Actually use the right parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5864 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:09:01 +00:00
depristo
d551ce720d
Updated with new CombineVariants syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5862 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:38:09 +00:00
carneiro
2efd807952
No more default callsets, they're now mandatory arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
fromer
bc4305c956
Added memory limit parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
fromer
833dff658a
Small script to do full variant annotation in parallel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
chartl
912c6cdbfa
Moving this script out of playground while I figure out what's going on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
depristo
72ad8ded19
Removed unused imports, but some of these scripts are now out of date (they have been for a long time), so they don't compile anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
carneiro
3a2e32eef3
wex is wex, wgs is wgs.... i think i got it right this time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5828 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-20 16:44:25 +00:00
kshakir
6c6e52def9
Renamed FCP to HybridSelectionPipeline.
Reviewed pipelines with dev team.
HSP updates:
- Calling SNPs and Indels at the same time then using SelectVariants to separate them for filtering
- Moved logs next to the files like in WGP
- Flattened outputs into one directory
- The file names for the final outputs are now <projectName>.vcf and <projectName>.eval
- Updated test to pass the chr20 intervals instead of a boolean
- Removed MultiFCP
WGP updates:
- Only cleaning and calling chromosomes 1-22, X, Y, MT
- Splitting SNPs from indels, filtering indels, then merging the selected SNPs and selected Indels back together to make sure there are no collisions in CombineVariants
- Still running VQSR on the recombined SNPs plus hard filtered indels
- Using hard indel filters from delangel
- Reduced number of tranches with rpoplin
- Changed prior for dbsnp from 10 to 8 with rpoplin
- Assuming identical samples on both CombineVariants
- Explicitly using variant merge option UNION even though it's the default
- Not setting the default genotype merge option PRIORITIZE
- Generating a vcf and eval for each tranche
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5825 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 22:47:02 +00:00
carneiro
76c87c9f1d
trio WGS was creating trio WEX filenames.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5822 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-19 17:45:45 +00:00