corin
1e2892a35d
Preliminary QC script in R, which checks coverage, fingerprints, library duplication, total SNPs, dbSNP%, and availability of sample data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5881 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:57:00 +00:00
delangel
6ecbfa9013
OK, this time REALLY fix the cut-and-paste error
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5880 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 19:47:12 +00:00
kshakir
dab269160b
Added cofoja to the Queue package. Although BCEL doesn't think they're needed, the Scala compiler respectfully disagrees.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5879 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 17:42:34 +00:00
delangel
efe6602827
Fix copy-paste error from previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5878 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 16:02:08 +00:00
delangel
7a43673599
Bug fix: also enclose fetching FS or HRun in a try/catch block, or else the code will blow up if an annotation is absent (e.g., when there is no evidence for a variant in a VC)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5877 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 15:00:36 +00:00
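The guard described in the FS/HRun bug fix can be sketched as follows. This is a Python illustration, not the GATK's Java code: it assumes annotations arrive as a plain dict keyed by annotation name, and `get_annotation` is a hypothetical helper.

```python
from typing import Mapping, Optional


def get_annotation(attributes: Mapping[str, object], key: str) -> Optional[float]:
    """Return the numeric value of an optional annotation such as FS or HRun.

    A missing or unparseable annotation yields None instead of raising,
    mirroring the try/catch guard: a site with no evidence for a variant
    may simply lack FS or HRun.
    """
    try:
        return float(attributes[key])
    except (KeyError, TypeError, ValueError):
        return None


site = {"HRun": "3"}                 # FS absent (hypothetical record)
fs = get_annotation(site, "FS")      # None: caller can skip this annotation
hrun = get_annotation(site, "HRun")  # 3.0
```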
delangel
f7298f4a7f
First of many baby steps to redo the way in which we trigger events for indel calling and to eliminate extended events: get rid of the SpanningDeletions annotation for indels. It's completely useless, and even more so once we no longer trigger at extended events (because by definition we'll trigger a base before a deletion starts, so deletions present in the current pileup are not informative).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5876 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-26 00:49:23 +00:00
ebanks
d393f59ad2
Moving the hg19 reference to a new location as per instruction from our intrepid leader
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5875 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:48:59 +00:00
ebanks
bafdd4f8f7
Ask for existence of extended pileup before grabbing it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5874 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:39:03 +00:00
kshakir
9d8c963fcc
Switched arguments from short name to long name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5873 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:24:56 +00:00
kshakir
6ec3dd0f8c
Updated GridEngineJobRunner to return status RUNNING instead of PENDING when a job has been sent to GridEngine, even if it hasn't started.
Added GridEngine to pipeline tests.
Removed passing -jobProject since GridEngine projects must be predefined.
Writing the HybridSelectionPipelineTest yaml into the temp directory.
Disabled job priority as it needs to be refactored for use by GridEngine and LSF.
Fixed WholeGenomePipeline for the rename of variantMergeOptions to filteredRecordsMergeType.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5872 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 17:16:44 +00:00
delangel
3565eca2dd
Script to run UG to create annotated all-pop VCF files to use for the Phase1 VQSR indel project consensus. Parallels and generalizes the SNP version, so in theory this script can be used for both SNP and indel consensus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5871 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 16:50:59 +00:00
ebanks
3d134a8497
Updated to produce (actual) hg19 resources too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5870 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 02:14:55 +00:00
ebanks
389c546757
Renamed for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5869 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-25 02:12:21 +00:00
ebanks
6ed71cf683
Annotation that adds a list of samples who are polymorphic at a site based on the GTs. Very useful if you are looking at rare variants among many samples, esp. in Evoker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5868 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 20:12:27 +00:00
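A minimal sketch of the idea behind the polymorphic-samples annotation, assuming each sample's GT is a VCF-style string such as `0/1` or `1|1`. It is written in Python, `polymorphic_samples` is a hypothetical name, and the real annotation's key and output format are not shown here.

```python
from typing import Dict, List


def polymorphic_samples(genotypes: Dict[str, str]) -> List[str]:
    """Return, sorted, the samples whose GT carries a non-reference allele."""
    carriers = []
    for sample, gt in genotypes.items():
        alleles = gt.replace("|", "/").split("/")
        # "." is a missing allele call and "0" is the reference allele
        if any(a not in (".", "0") for a in alleles):
            carriers.append(sample)
    return sorted(carriers)
```

For a rare variant among many samples, the returned list is short, which is what makes it handy when paging through sites in a viewer such as Evoker.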
depristo
1bd1404aa9
Sometimes md5s can be null
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5867 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 19:17:18 +00:00
delangel
e6396062c0
Script to use VQSR on indels: runs VariantRecalibrator (VR) and ApplyRecalibration (AR) on each continental group, combines variants, and then runs VariantEval against different chr20 all-pop 1000G callsets.
Not for general use yet!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5866 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 17:19:30 +00:00
depristo
e582a92af6
WalkerTest now checks for valid MD5s in the integration tests themselves, so no more stray-whitespace errors. Added a WalkerTestTest to ensure that bad MD5s are detected and an error is thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5865 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 14:34:55 +00:00
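The MD5 check described above amounts to a format test on the expected value before the test runs. A Python sketch, assuming expected MD5s are lowercase hex strings as printed by `md5sum`; `check_md5` is a hypothetical name, not WalkerTest's API.

```python
import re

_MD5_RE = re.compile(r"[0-9a-f]{32}")


def check_md5(md5: str) -> str:
    """Reject an expected MD5 that is not exactly 32 lowercase hex digits.

    Stray whitespace (e.g. a trailing space pasted from a log) fails
    fullmatch, so the test errors out immediately instead of never matching.
    """
    if not _MD5_RE.fullmatch(md5):
        raise ValueError(f"Invalid md5 in integration test: {md5!r}")
    return md5
```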
depristo
0448ef28d3
Actually use the right parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5864 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:09:01 +00:00
hanna
06486c134a
Kill extra space in the md5.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5863 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 12:00:31 +00:00
depristo
d551ce720d
Updated with new CombineVariants syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5862 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:38:09 +00:00
depristo
57e4693e4c
Slightly better error message when failing to create the index on the fly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5861 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 11:04:08 +00:00
depristo
cf3dbfee97
Renamed variantMergeOptions to filteredRecordsMergeType, as this is really what it does. Cleaned up the wiki so that it's clear what this does, and included an example of how to create an intersection with CombineVariants and SelectVariants. Added integration tests of CombineVariants with OMNI and HapMap that cover the two ways to merge filtered/unfiltered records at the same site.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5860 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 01:54:29 +00:00
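The two ways of merging filtered and unfiltered records at one site can be illustrated with a small Python sketch. The policy strings `any_unfiltered` and `all_unfiltered` are hypothetical stand-ins, not necessarily the exact values filteredRecordsMergeType accepts, and the union-of-filters rule for failing merges is an assumption.

```python
from typing import List, Set


def merged_filters(input_filters: List[Set[str]], policy: str) -> Set[str]:
    """Decide the FILTER field of the merged record at one site.

    input_filters holds one set per input record; an empty set means PASS.
    "any_unfiltered": the merged record passes if any input passed.
    "all_unfiltered": it passes only if every input passed.
    Otherwise the merged record carries the union of the inputs' filters.
    """
    passed = [not f for f in input_filters]
    if policy == "any_unfiltered":
        keep_pass = any(passed)
    elif policy == "all_unfiltered":
        keep_pass = all(passed)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return set() if keep_pass else set().union(*input_filters)
```

For example, if the OMNI record passes but the HapMap record at the same site is filtered (toy filter name `LowQual`), the first policy emits PASS while the second keeps `LowQual`.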
kiran
653475ce12
Now finds the most likely configuration of genotypes given the genotype likelihoods and inheritance constraints. The parental genotypes are now phased as well (the alleles are ordered as A_transmitted|A_untransmitted). Rewrote the way the transmission probability is calculated. This will probably move into core soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5859 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-24 01:35:40 +00:00
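The search for the most likely trio configuration can be sketched as an exhaustive scan over parental genotypes and transmitted alleles, scoring each Mendelian-consistent combination by summed log10 likelihoods. This Python sketch assumes a biallelic site and no transmission prior (the commit's rewritten transmission-probability calculation is not reproduced). The function names are hypothetical.

```python
import itertools
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]
GENOTYPES = [("A", "A"), ("A", "B"), ("B", "B")]  # biallelic: A = REF, B = ALT


def other_allele(gt: Genotype, transmitted: str) -> str:
    """Return the allele of gt that was not transmitted to the child."""
    return gt[1] if gt[0] == transmitted else gt[0]


def best_trio_configuration(mom: Dict[Genotype, float],
                            dad: Dict[Genotype, float],
                            child: Dict[Genotype, float]):
    """Most likely Mendelian-consistent trio configuration.

    Inputs map each unphased genotype to a log10 likelihood, so scores add.
    Returns (score, mom, dad, child): parents phased as
    (transmitted, untransmitted), the child as (maternal, paternal).
    """
    best: Optional[tuple] = None
    for m_gt, d_gt in itertools.product(GENOTYPES, repeat=2):
        # Each parent passes exactly one of its alleles to the child
        for m_t, d_t in itertools.product(sorted(set(m_gt)), sorted(set(d_gt))):
            c_gt = tuple(sorted((m_t, d_t)))
            score = mom[m_gt] + dad[d_gt] + child[c_gt]
            if best is None or score > best[0]:
                best = (score,
                        (m_t, other_allele(m_gt, m_t)),
                        (d_t, other_allele(d_gt, d_t)),
                        (m_t, d_t))
    return best
```

Because only consistent combinations are scored, a child whose likelihoods favor hom-var can still be called hom-ref when both parents are confidently hom-ref.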
carneiro
2efd807952
No more default callsets, they're now mandatory arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5858 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:56:43 +00:00
carneiro
133712faec
Have a list of BAM files, but Picard updated their versions from v1 to v17? This script will update all your v* numbers for you.
PS: don't hate Lua. :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5857 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:46:14 +00:00
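The original script is in Lua. The Python sketch below only illustrates the rewrite step, assuming the version appears as its own `v<N>` path component and that the target version is passed in rather than discovered on disk. `bump_versions` and the example path are hypothetical.

```python
import re
from typing import List

# Assumed layout: the version is a whole path component such as ".../v1/..."
_VERSION_DIR = re.compile(r"(?<=/)v\d+(?=/)")


def bump_versions(bam_paths: List[str], new_version: int) -> List[str]:
    """Rewrite every v<N> directory component of each path to v<new_version>."""
    return [_VERSION_DIR.sub(f"v{new_version}", p) for p in bam_paths]


print(bump_versions(["/seq/picard/PROJ/S1/v1/S1.bam"], 17))
```

Note that any other directory named like `v<digits>` in the same path would be rewritten too, so a real script would need a stricter notion of which component holds the version.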
hanna
4bfec4c55b
Reenabling E.coli ValidatingPileup with MV1994 realigned using the BWA/C bindings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5856 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:32:53 +00:00
fromer
bc4305c956
Added memory limit parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5855 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:11:44 +00:00
chartl
c7f4674fe2
Great! Contracts is working. Fixing some misspecified ones.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5854 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 21:00:52 +00:00
fromer
833dff658a
Small script to do full variant annotation in parallel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5853 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 20:33:20 +00:00
hanna
5dca1e4d2e
Make IntervalIntegrationTest aware of the new alignments in the MV1994.bam testset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5852 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 19:59:47 +00:00
chartl
7ff5375493
Removing build-killing dependency on a private package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5851 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 18:13:15 +00:00
chartl
0b07373909
Incorporating old feedback from Eric: @deprecated methods should not be @deprecated, but rather protected, and the test's package moved to where it can access those test methods.
Also allows for the slightly more awesome name "MWUnitTest"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5850 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 18:06:05 +00:00
kiran
f8f37a786d
Now emits much more informative filter names and includes all of the other proper VCF header details (filter description line, tag definitions, etc.). Currently rewriting the way the transmission probability is calculated. This is shaping up to be a lovely little piece of code...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5849 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:50:59 +00:00
chartl
912c6cdbfa
Moving this script out of playground while I figure out what's going on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5848 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 17:48:44 +00:00
chartl
15dc632570
The U-value can be zero (edge case)
The z-value cannot be NaN (and can't possibly be null)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5847 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 14:15:36 +00:00
chartl
3c31007da4
Stupid brackets. How did this even compile?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5846 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 14:00:53 +00:00
chartl
480859db50
Contractified version of MannWhitneyU. Some behavior has been changed:
- Running a test when there are no observations of at least one of the sets now breaks the MWU contract
+ MWU returns Pair(Double.NaN,Double.NaN) in these instances to maintain the contract of never returning null
+ No more infinite values (Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY) will appear
- RankSumTests now probe the return values for NaNs, and don't annotate if they appear
- For small sets where the probability is calculated recursively, the z-value is now obtained by inverting the error function, not from the approximate z-value
- UG and Annotator integration tests updated to reflect changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5845 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 13:57:15 +00:00
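The revised MannWhitneyU contract can be sketched in Python (not the GATK's Java). The sketch assumes U is the pairwise count with ties scored as one half, uses the normal approximation without tie correction, and treats the exact small-sample p-value as a lower-tail probability. `mann_whitney_u` and `z_from_exact_p` are hypothetical names.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def mann_whitney_u(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (U, z) for xs against ys using the normal approximation.

    Never returns None: if either set is empty, returns (nan, nan) so that
    callers such as rank-sum annotations can test for NaN and skip.
    U may legitimately be 0 (every x ranks below every y).
    """
    n1, n2 = len(xs), len(ys)
    if n1 == 0 or n2 == 0:
        return (math.nan, math.nan)
    # U counts pairs in which x beats y; ties count one half
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # > 0 whenever n1, n2 >= 1
    return (u, (u - mean) / sd)


def z_from_exact_p(p: float) -> float:
    """Map an exact p-value (0 < p < 1) to z via the inverse normal CDF,
    i.e. sqrt(2) * erfinv(2p - 1)."""
    return NormalDist().inv_cdf(p)
```

Since the standard deviation is strictly positive for non-empty sets, the division never produces an infinity, and the empty case is reported as NaN rather than null.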
depristo
b814f4bbd6
Contracts for HasGenomeLocation. BAQ iterator variables are all final. Contracts added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5844 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 02:21:59 +00:00
depristo
43057bd15c
Remove Param annotation and associated broken processing code, as this was never used in the codebase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5843 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 02:21:15 +00:00
depristo
d005c4bf09
GenomeLocProcessingTracker was using SimpleTimer in a non-thread safe way. No longer providing an interface to time parallel operations. Now issues warning if someone enables distributed GATK, as this is considered an unstable, experimental engine feature.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5842 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-23 02:10:27 +00:00
depristo
a18b0152df
Contracts for SimpleTimer, as well as UnitTests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5841 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 19:45:31 +00:00
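A Python sketch of what contracts on a simple timer might look like. The preconditions and postconditions are written as assertions in the spirit of Cofoja's `@Requires`/`@Ensures`, and a lock guards the shared state, given the earlier note that SimpleTimer was being used in a non-thread-safe way. `ContractedTimer` and its methods are hypothetical and not SimpleTimer's actual API.

```python
import threading
import time


class ContractedTimer:
    """Hypothetical timer whose pre/postconditions are checked by assertions."""

    def __init__(self, name: str):
        assert name, "requires a non-empty name"
        self.name = name
        self._lock = threading.Lock()  # guards state if shared across threads
        self._start = None             # perf_counter() value while running
        self._elapsed = 0.0

    def start(self) -> "ContractedTimer":
        with self._lock:
            self._start = time.perf_counter()
        return self

    def stop(self) -> "ContractedTimer":
        with self._lock:
            assert self._start is not None, "requires a running timer"
            self._elapsed += time.perf_counter() - self._start
            self._start = None
        return self

    def elapsed(self) -> float:
        with self._lock:
            result = self._elapsed
        assert result >= 0.0, "ensures a non-negative elapsed time"
        return result
```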
depristo
a02ab4028b
Build for tests now includes path to contracts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5840 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 19:44:49 +00:00
depristo
0dc0d586f1
Phasing-specific utilities are now in the Phasing walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5839 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:51:35 +00:00
depristo
a1349f3520
report packages are no more
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5838 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:45:08 +00:00
depristo
72ad8ded19
Removed unused imports, but some of these scripts are now out of date (and have been for a long time), so they don't compile anyway
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5837 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:43:48 +00:00
depristo
f608ed6d5a
Removed old (and unused) reporting system, now that Kiran's VE reporting system is working. Refactors dictionary creation error messages into UserExceptions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5836 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 18:42:52 +00:00
rpoplin
4e7ecbdcb2
FS values need to be jittered just like HRun
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5835 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 16:44:12 +00:00
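Jittering here presumably means adding a small amount of random noise to an annotation whose values pile up on a few discrete points (HRun is an integer, and FS is often exactly 0), so that the recalibrator's model does not see degenerate, zero-variance clusters. Both that rationale and the uniform noise scale below are assumptions. `jitter` is a hypothetical helper, sketched in Python:

```python
import random
from typing import List, Optional


def jitter(values: List[float], scale: float = 0.01,
           rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of values with independent uniform noise in [-scale, scale]."""
    rng = rng if rng is not None else random.Random(0)  # fixed seed: reproducible
    return [v + rng.uniform(-scale, scale) for v in values]
```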
depristo
9cc049f80f
Contracted ReferenceContext. Removed deprecated accessors that aren't used in the GATK at all
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5834 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-22 02:41:15 +00:00
depristo
d77f4ebe31
CalibrateGenotypeLikelihoods now emits a molten data set with REF and ALT alleles, so that GL calibration can be evaluated as a function of the REF/ALT bases. DigestTable is a stand-alone Rscript that digests the multi-GB molten data table into a tiny table that shows reported vs. empirical GLs, as a function of a variety of features of the data, like REF/ALT, comp GT, eval GT, and GL itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5833 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 14:02:30 +00:00
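DigestTable is an Rscript. The Python sketch below only illustrates the kind of aggregation it performs, under assumed details: the molten row layout, grouping by REF, ALT, eval GT and the reported value rounded to an integer, and defining the empirical value as log10 of the fraction of rows where the comp GT agrees with the eval GT. `digest` and `Row` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One molten row (assumed layout): (ref, alt, comp_gt, eval_gt, reported_log10)
Row = Tuple[str, str, str, str, float]


def digest(rows: Iterable[Row]) -> Dict[Tuple[str, str, str, int], Tuple[int, float]]:
    """Collapse molten rows into (count, empirical log10 value) per group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_rows, n_concordant]
    for ref, alt, comp_gt, eval_gt, reported in rows:
        key = (ref, alt, eval_gt, round(reported))
        counts[key][0] += 1
        counts[key][1] += comp_gt == eval_gt
    # A group with no concordant rows gets -inf (log10 of 0)
    return {k: (n, math.log10(c / n) if c else -math.inf)
            for k, (n, c) in counts.items()}
```

Because rows are consumed one at a time, the multi-GB input never has to be held in memory; only one small counter per group survives.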
depristo
6a49e8df34
Significant change to the way subsetting by sample works with monomorphic sites. Now keeps the alt allele, even if a record is AC=0 after the subset. Previously, the system dropped the alt allele, which I don't think is the right behavior. If you really want a VCF without monomorphic sites, use the option to drop monomorphic sites after subsetting. See detailed information below.
Right now, if you select a multi-sample VCF file (or, I see, one with filters) down to a smaller set of samples, and the site isn't polymorphic in that subgroup, then the alt allele is lost. For example, when selecting NA12878 out of the OMNI, I previously received the following VCF:
1 82154 rs4477212 A . . PASS AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0 GT:GC 0/0:0.7205
1 534247 SNP1-524110 C . . PASS AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0 GT:GC 0/0:0.6491
1 565286 SNP1-555149 C T . PASS AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0 GT:GC 1/1:0.3471
1 569624 SNP1-559487 T C . PASS AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0 GT:GC 1/1:0.3942
Here the first two records lost the ALT allele because NA12878 is hom-ref at these sites. My change results in a VCF that looks like:
1 82154 rs4477212 A G . PASS AC=0;AF=0.00;AN=2;CR=100.0;DP=0;GentrainScore=0.7826;HW=1.0 GT:GC 0/0:0.7205
1 534247 SNP1-524110 C T . PASS AC=0;AF=0.00;AN=2;CR=99.93414;DP=0;GentrainScore=0.7423;HW=1.0 GT:GC 0/0:0.6491
1 565286 SNP1-555149 C T . PASS AC=2;AF=1.00;AN=2;CR=98.8266;DP=0;GentrainScore=0.7029;HW=1.0 GT:GC 1/1:0.3471
1 569624 SNP1-559487 T C . PASS AC=2;AF=1.00;AN=2;CR=97.8022;DP=0;GentrainScore=0.8070;HW=1.0 GT:GC 1/1:0.3942
The genotype remains unchanged, but the ALT allele is now preserved. I think this is the correct behavior, as reducing samples down shouldn't change the character of the site, only the AC in the subpopulation. This is related to the tricky issue of isPolymorphic() vs. isVariant().
isVariant => is there an ALT allele?
isPolymorphic => is some sample non-ref in the samples?
This is complicated in part by the semantics of sites-only VCFs, where ALT = . is used to mean not polymorphic. Unfortunately, I don't think there's a consistent convention right now, but it might be worth adopting a single approach to handling this at some point. Wiki docs updated.
Does anyone have critical infrastructure that depends on the previous convention? Let me know so we can coordinate the change.
There's a new function subContextFromGenotypes() that also takes a Set<Allele> to handle this type of behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5832 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-21 13:59:16 +00:00
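The isVariant/isPolymorphic distinction and the new subsetting behavior can be sketched in Python. `Site` and `subset_samples` are hypothetical and only loosely modeled on the VariantContext methods named above. Genotypes are lists of allele indices (0 = REF), and recomputing AC/AN after the subset is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    ref: str
    alts: List[str]
    genotypes: Dict[str, List[int]]  # sample -> allele indices, 0 = REF

    def is_variant(self) -> bool:
        """Is there an ALT allele at all?"""
        return len(self.alts) > 0

    def is_polymorphic(self) -> bool:
        """Does any sample carry a non-reference allele?"""
        return any(i > 0 for gt in self.genotypes.values() for i in gt)


def subset_samples(site: Site, keep: List[str],
                   drop_monomorphic: bool = False) -> Optional[Site]:
    """Subset a site to the samples in keep, preserving its ALT alleles.

    Returns None only when drop_monomorphic is set and no kept sample is non-ref.
    """
    sub = Site(site.ref, list(site.alts),
               {s: gt for s, gt in site.genotypes.items() if s in keep})
    if drop_monomorphic and not sub.is_polymorphic():
        return None
    return sub
```

With the rs4477212 record (A/G) and NA12878 at 0/0, the subset keeps ALT = G, so it is still a variant site but no longer polymorphic. Dropping it becomes an explicit choice rather than a side effect of subsetting.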