carneiro
36db9bdcd5
Implemented and tested BWA alignment in the data processing pipeline.
...
caveat: Right now bwa only supports one read group, so if the original file had multiple @RG lines, only the first one will be kept. (working on a solution to this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5931 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 23:03:07 +00:00
carneiro
c85a1d9210
Implemented and tested BWA alignment in the data processing pipeline.
...
Renamed it and moved to core. Happy to support it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5930 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 22:58:55 +00:00
fromer
ef56b48eef
Add CNV sub-dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5928 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 21:47:13 +00:00
carneiro
355be57539
fixing the pipeline so that it still works while I'm adding support for BWA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5921 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-02 19:32:28 +00:00
carneiro
5974675b43
Two intermediate commits, to work over the weekend.
...
ReplicationValidationWalker: Just the skeleton of what will be the implementation of the replication/validation model.
dataProcessingV2: Committing an UNTESTED implementation of BWA alignment. I am running tests on it over the weekend.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5900 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 22:03:08 +00:00
depristo
549172af10
removing dependance on jobQueue == gsa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5889 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-27 10:12:09 +00:00
fromer
b4af28c7df
Handle case where -L argument (intervals) not given
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5886 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 20:24:56 +00:00
delangel
3565eca2dd
Script to run UG to create annotated all-pop VCF files to use for Phase1 VQSR indel project consensus. Paralleles and generalizes SNP version, so in theory this script can be used for both SNP and Indel consensus.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
delangel
e6396062c0
Script to use VQSR on indels - does VR, AR on each continental group, combines variants and then does VariantEval comparing with different chr20 all-pop 1000G callsets.
...
Not for general use yet!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
carneiro
2efd807952
No more default callsets, they're now mandatory arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
fromer
bc4305c956
Added memory limit parameter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
fromer
833dff658a
Small script to do full variant annotation in parallel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
chartl
912c6cdbfa
Moving this script out of playground while I figure out what's going on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
depristo
72ad8ded19
Removed unused importants, but some of these scripts are now out of date (they have been for a long time) so they don't compile anyway
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
carneiro
b5b8cb959a
Added VQSR to the downsampling script and changed memory limits for the clean script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5817 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-18 20:07:42 +00:00
depristo
9423652ad8
Computes how well a genotype chip covers a reference panel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5806 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-14 15:07:28 +00:00
rpoplin
825682f58c
oops, putting the script back into a sensible state
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5765 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:17:05 +00:00
rpoplin
b5ab2274f6
Committing the base qscript I used to make the Phase1 Project Consensus. Does per-population cleaning and simplifyBAM, and then per-analysis-panel calling with genotype given alleles. Combines info fields using the panel with max AC.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5764 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-04 20:13:26 +00:00
carneiro
f35d955490
recalibrates a dataset splitting between good and bad regions for comparison (used to be named justRecalibrate)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5747 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:43:09 +00:00
carneiro
9f2a8033ff
just recalibrates now recalibrates one sample, fully, not splitting intervals (naming makes more sense)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5746 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:42:23 +00:00
carneiro
c2f8536e02
removing old GATK options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5745 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:40:39 +00:00
carneiro
8bb92160b5
Script to identify mendelian violations in the CEU Trio and follow up with supposedly incorrect SNP calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5744 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:19:42 +00:00
carneiro
e2b9227d8d
script to test BQSR on good/bad regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5743 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 21:16:37 +00:00
rpoplin
4bbce42861
Renaming ContrastiveRecalibrator --> VariantRecalibrator in preparation for move to core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5733 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-03 18:12:47 +00:00
carneiro
a93a9ac663
adding gold standard (full coverage) to the variant eval analysis output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5721 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-02 16:29:11 +00:00
carneiro
2384e23274
Added the capability of running count covariates only on a given interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5717 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 21:30:14 +00:00
carneiro
3868a7e778
Oneoff project to downsample, bootstrap and call snps to test sensitivity/specificity of downsampled coverage in WEX projects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5713 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-29 19:17:30 +00:00
carneiro
f04cc4321f
fixed a bug when the pipeline was used on a single bam.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5708 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-28 17:19:22 +00:00
depristo
122d5845d3
GATK Resource bundle, latest version (now with b37 -> b36 support). Oneoff scala script that assesses chip coverage of calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5703 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-27 22:01:36 +00:00
chartl
5b9a8555cd
Queue graph time is currently of O(n^m) where n = num jobs, m = num unique base files. This script therefore was running in order 1200^16, which I don't think would finish before the heat death of the universe. For now, push down the number of files to 1 and gather them outside of Queue, once I've fixed up scatter-gather in core, outputs can be uncommented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5674 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-21 12:56:25 +00:00
carneiro
d35c7d1029
- minor changes to the 'justclean' script to handle the Trio Cleaning.
...
- fixing a bug on single ended BWA option of the data processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5662 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-19 16:35:24 +00:00
chartl
23fac043d9
Fix the outputs so the proper files are gathered (not automatic due to multiplexer)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5654 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-15 23:55:12 +00:00
chartl
8125b8b901
Old changes to the exome VQSR search.
...
SGA updated to include new proportion-based insert size test.
Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object iself (e.g. no data? no entry), but it's kept like this for transparency for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
chartl
b81228fec1
Minor bug fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 17:30:40 +00:00
hanna
437db28937
Incorporating Khalid's feedback.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 16:22:49 +00:00
chartl
cc58e19621
This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 21:12:59 +00:00
chartl
6a26957b65
Bug squashing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5592 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:11:28 +00:00
chartl
a1b7d28375
Initial VQSR full search script
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5591 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:03:48 +00:00
hanna
798fb6a7a2
First draft of a script to measure performance of read walkers when merging
...
dynamically.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5570 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 15:35:14 +00:00
carneiro
b722ebf244
quick help/comments updates to match the wikipage.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 12:55:55 +00:00
depristo
349661b958
Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5563 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:35:09 +00:00
carneiro
0a772688fe
implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced!
...
QuickCCTest is my test script for the gatherer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:15:21 +00:00
carneiro
20344a27b4
Quick updates to the data processing pipeline after successfully cleaning the papuans. It now scatter gathers everything and runs in the hour queue for low pass data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5546 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:13:33 +00:00
carneiro
5d26c66769
Count Covariates is almost scatter-gatherable now!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:25:33 +00:00
carneiro
c3f70cc5cb
DPP: Updated after some tests with BWA. Still needs more testing.
...
MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 18:22:09 +00:00
carneiro
ccdc021207
Added BWA (option) to the data processing pipeline. Lots of testing still happening...
...
little fix to the calling pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5528 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 20:17:57 +00:00
depristo
cdb0bde952
Bringing script up to date
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5526 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:49:07 +00:00
depristo
bae0b6cba8
A script for playing with BEAGLE refinement parameters. Supports construction of reference panels from NGS data sets with varying niteration and calibration curve parameters, as well as imputing missing genotypes in a VCF with this reference panel, and comparison to a deeply sequenced individual.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5523 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 12:44:25 +00:00
chartl
fe7f45ee2e
First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track").
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 19:35:39 +00:00
carneiro
1281c842ad
quick updates to conform with the new picard bam function structure
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5505 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 16:58:37 +00:00