droazen
29a0e08aa2
Testing bug fix process #3 (changes are irrelevant)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6048 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:40 +00:00
droazen
e148a75c32
Testing the 'bug fix' process #2 (changes are irrelevant)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6047 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:37 +00:00
droazen
9b90e9385d
Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files; I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:37 +00:00
depristo
4c6d0e6143
Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on the wiki, and an integration test using Kiran's very nice fundamental VCF.
...
VariantEvalWalker now passes a pointer to itself to the Stratification setVariantEvalWalker (and assoc. get method) so that stratifications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratification to include an initialize function that has all arguments necessary for any Strat.
JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXL's that directly use VariantContext and associated functions from the command line.
ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and led to strange punctate behavior when the actual AF quantization was out of sync with the fixed quantization of the AF strat).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-19 03:11:00 +00:00
fromer
03a0185566
Control unscattered output file location
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6011 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 15:53:25 +00:00
depristo
27d4b317fc
Simple program that calls indels in CEU trio exomes and WGS and compares the results. Overall the indel calls really look good to me, given reasonably good input BAM files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6006 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:56:04 +00:00
depristo
43fdd31e20
Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:54:59 +00:00
depristo
38d7733989
Now accepts any number of VCFs to evaluate. Runs the standard (now three) variant eval commands and invokes the exomeQC R script. Has some annoying assumptions about paths encoded right now. Example usage below:
...
setenv DATA ~/Desktop/broadLocal/localData/
java -Djava.io.tmpdir=tmp -jar ../dist/Queue.jar -S ../scala/qscript/oneoffs/depristo/ExomePostQCEval.scala --gatkjarfile ../dist/GenomeAnalysisTK.jar -R $DATA/human_g1k_v37.fasta $* -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch1.vcf -intervals ~/Desktop/broadLocal/localData/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.targets.interval_list -dbSNP ~/Desktop/broadLocal/localData/dbsnp_132_b37.vcf -eval $DATA/ESPGO_Gabriel_NHLBI_eomi_june_2011_batch2.vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6004 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:49:54 +00:00
fromer
b4c30bf124
Added option of minMappingQuality
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6002 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 00:02:26 +00:00
depristo
165befd38a
V1 of the post processing QC plotting scala script and R function. The scala script runs VariantEval on a VCF file, and computes QC metrics. The R script generates the report. Will discuss usage with data processing group. Ryan -- please add your additional plotting routines to this script, as you see fit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5980 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 17:06:42 +00:00
depristo
44287ea8dc
ReducedBAM changes to downsample to a fixed coverage over the variable regions. Evaluation script now has filters and eval. commands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5965 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-08 19:36:08 +00:00
delangel
78f5309656
Intermediate commit of indel consensus VQSR script, a couple of new features added, not for general use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5951 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-06 13:27:02 +00:00
depristo
cd293f145b
More stable reduced reads representation. Bug fixes throughout. Now differs at <1% of sites in an exome, and the majority of these differences are filtered out, or are obvious artifacts. UnitTests for BaseCounts. BaseCounts extended to handle indels, but not yet enabled in the consensus reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5939 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-03 20:11:31 +00:00
carneiro
a4ffae880d
Subversion crashed my intellij BADLY, so now moving the data processing pipeline to core in 2 steps.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5932 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:31:24 +00:00
carneiro
36db9bdcd5
Implemented and tested BWA alignment in the data processing pipeline.
...
caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:03:07 +00:00
carneiro
c85a1d9210
Implemented and tested BWA alignment in the data processing pipeline.
...
Renamed it and moved to core. Happy to support it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 22:58:55 +00:00
fromer
ef56b48eef
Add CNV sub-dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:47:13 +00:00
carneiro
355be57539
fixing the pipeline so that it still works while I'm adding support for BWA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
carneiro
5974675b43
Two intermediate commits, to work over the weekend.
...
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
depristo
549172af10
removing dependence on jobQueue == gsa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
fromer
b4af28c7df
Handle case where -L argument (intervals) not given
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
delangel
3565eca2dd
Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Parallels and generalizes the SNP version, so in theory this script can be used for both SNP and Indel consensus.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
delangel
e6396062c0
Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets.
...
Not for general use yet!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
carneiro
2efd807952
No more default callsets, they're now mandatory arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
fromer
bc4305c956
Added memory limit parameter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
fromer
833dff658a
Small script to do full variant annotation in parallel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
chartl
912c6cdbfa
Moving this script out of playground while I figure out what's going on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
depristo
72ad8ded19
Removed unused imports, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
carneiro
b5b8cb959a
Added VQSR to the downsampling script and changed memory limits for the clean script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:07:42 +00:00
depristo
9423652ad8
Computes how well a genotype chip covers a reference panel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-14 15:07:28 +00:00
rpoplin
825682f58c
oops, putting the script back into a sensible state
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin
b5ab2274f6
Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
carneiro
f35d955490
recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro
9f2a8033ff
justRecalibrate now recalibrates one sample fully, without splitting intervals (naming makes more sense)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro
c2f8536e02
removing old GATK options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro
8bb92160b5
Script to identify Mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro
e2b9227d8d
script to test BQSR on good/bad regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
rpoplin
4bbce42861
Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
carneiro
a93a9ac663
adding gold standard (full coverage) to the variant eval analysis output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 16:29:11 +00:00
carneiro
2384e23274
Added the capability of running count covariates only on a given interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:30:14 +00:00
carneiro
3868a7e778
Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
carneiro
f04cc4321f
fixed a bug when the pipeline was used on a single bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
depristo
122d5845d3
GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00
chartl
5b9a8555cd
Queue graph time is currently O(n^m), where n = num jobs and m = num unique base files. This script was therefore running on the order of 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue; once I've fixed up scatter-gather in core, outputs can be uncommented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
carneiro
d35c7d1029
- minor changes to the 'justclean' script to handle the Trio Cleaning.
...
- fixing a bug in the single-ended BWA option of the data processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-19 16:35:24 +00:00
chartl
23fac043d9
Fix the outputs so the proper files are gathered (not automatic due to multiplexer)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 23:55:12 +00:00
chartl
8125b8b901
Old changes to the exome VQSR search.
...
SGA updated to include new proportion-based insert size test.
Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
chartl
b81228fec1
Minor bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 17:30:40 +00:00
hanna
437db28937
Incorporating Khalid's feedback.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 16:22:49 +00:00
chartl
cc58e19621
This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 21:12:59 +00:00