Mauricio Carneiro
2ad4d2f0bb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-29 16:06:14 -04:00
Mauricio Carneiro
1085df8b7b
Making the BQSR pipeline publicly available and supported.
...
This is for all the PacBio validation that is going on right now in the cancer group. They are all using this script, and I'm happy to support it.
2011-06-29 16:05:32 -04:00
Mauricio Carneiro
ec9d8313ee
Giving the BQSR pipeline script a more appropriate name since more people are using it now.
2011-06-29 16:03:26 -04:00
Eric Banks
70ba851478
Might as well check for the illegal state and throw an exception
2011-06-29 15:59:10 -04:00
Eric Banks
1f19afe1d9
Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible for a variant to be an indel but neither an insertion nor a deletion; added an isComplexIndel() method and now we check for such an event in the realigner (we don't use them to generate alternate consensuses). Also, added an isMNP() method while I was there so that it would be consistent with other variant types.
2011-06-29 15:54:09 -04:00
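To illustrate the distinction drawn in commit 1f19afe1d9, here is a minimal sketch of how a variant can be an indel without being a plain insertion or deletion. It is not the VariantContext implementation: the `classify` and `usable_for_consensus` functions, the string labels, and the prefix-based rule are all assumptions made for the example.

```python
def classify(ref: str, alt: str) -> str:
    """Classify a biallelic variant from its allele strings.

    Hypothetical simplification: alleles may share a leading padding base
    (VCF style) or be given unpadded, with "" standing in for a null allele.
    """
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    if len(ref) == len(alt):
        # Same length: a substitution of one base (SNP) or several (MNP).
        return "SNP" if len(ref) == 1 else "MNP"
    if alt.startswith(ref):
        # Alt is ref plus extra bases: a pure insertion.
        return "INSERTION"
    if ref.startswith(alt):
        # Ref is alt plus extra bases: a pure deletion.
        return "DELETION"
    # Lengths differ but neither allele extends the other.
    return "COMPLEX_INDEL"


def usable_for_consensus(ref: str, alt: str) -> bool:
    """Mirror the commit's rule: complex indels are not used for alternate consensuses."""
    return classify(ref, alt) in ("INSERTION", "DELETION")
```

For example, `classify("ATG", "CA")` falls through to the complex case, so `usable_for_consensus("ATG", "CA")` is false.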
Eric Banks
f018c27050
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-29 14:56:22 -04:00
Eric Banks
33c67a139c
Wrong package; this should have been moved when VC got moved in from Tribble
2011-06-29 14:56:02 -04:00
Matt Hanna
9a22d78e48
Fix up gsalib target to work with new public/private directory structure.
2011-06-29 14:48:10 -04:00
Guillermo del Angel
dee10140dd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-29 13:58:04 -04:00
Eric Banks
8586c86bc4
My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals.
2011-06-29 13:56:37 -04:00
Guillermo del Angel
f736a1d61b
Updated md5's from previous checkin
2011-06-29 13:37:15 -04:00
Guillermo del Angel
5b6d279a2e
Two bug fixes:
...
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. The CIGAR string cannot be trusted because BWA can clip good, high-quality bases, and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
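Fix (b) in commit 5b6d279a2e follows from how PL is defined: the field carries one likelihood per possible genotype, so its length grows with the number of alternate alleles and cannot be a fixed number in the header. The sketch below counts the genotypes; the function name `pl_length` is made up for this illustration.

```python
from math import comb


def pl_length(num_alt_alleles: int, ploidy: int = 2) -> int:
    """Number of PL values for a site: one per unordered genotype.

    With k alleles in total (reference plus alternates) and the given ploidy,
    the number of unordered genotypes is C(k + ploidy - 1, ploidy).
    """
    if num_alt_alleles < 1:
        raise ValueError("a variant site needs at least one alternate allele")
    num_alleles = num_alt_alleles + 1
    return comb(num_alleles + ploidy - 1, ploidy)


# A biallelic diploid site has 3 genotypes, a triallelic one has 6,
# which is why a fixed header length breaks on multiallelic sites.
print(pl_length(1), pl_length(2))
```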
David Roazen
643458d7db
Updated the tribble jar -- this should fix most of the integration test
...
failures we've been seeing.
Note that with tribble's new svn repository the revision numbers have reset,
hence "revision 3"
2011-06-29 01:11:03 -04:00
David Roazen
0a96f53772
One last test...
2011-06-28 19:18:17 -04:00
David Roazen
efff6060b7
Revert "Final test commits before going live."
...
This reverts commit ba1443e54b829dd97af636415ed0d5cdaf256ad3.
2011-06-28 18:58:55 -04:00
David Roazen
755b80dc74
Final test commits before going live.
2011-06-28 18:56:00 -04:00
David Roazen
7e243fae6e
Bug fixes for build.xml related to the public/private restructuring.
2011-06-28 12:55:18 -04:00
David Roazen
139c6b84a1
Modified build.xml and the help extractor doclet to use the output of "git
...
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
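The versioning scheme described in commit 139c6b84a1 can be sketched as follows. This is not the build.xml logic itself, only an illustration of the same fallback; the helper names `run_git` and `build_version` are assumptions, and the git runner is injectable so the fallback can be exercised without a repository.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> str:
    """Run a git command and return its trimmed stdout; raises on failure."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def build_version(git: Callable[[Sequence[str]], str] = run_git) -> str:
    """Prefer `git describe`; fall back to the raw commit hash without tags."""
    try:
        # `git describe` exits non-zero when it finds no tag to describe from.
        return git(["describe"])
    except subprocess.CalledProcessError:
        # No usable tag: use the full SHA-1 of HEAD as the version instead.
        return git(["rev-parse", "HEAD"])
```

Calling `build_version()` inside a checkout returns the `git describe` string when a tag is reachable, and the 40-character hash otherwise.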
David Roazen
3c9497788e
Reorganized the codebase beneath top-level public and private directories,
...
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00
carneiro
b46279d62e
required RODs are now checked by annotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6080 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-25 06:38:19 +00:00
ebanks
3879b02cdd
updating a package
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6079 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:13:28 +00:00
ebanks
86aa82caf8
Missed this integration test during my move of VC from Tribble
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6078 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:07:25 +00:00
ebanks
c2ec2891d1
Other people besides Mark also wanted VariantContext moved to the GATK, so I listened. I am moving VariantContext and all codecs that rely on it (VCF, SoapSNP, HapMap, and CGvar) to the GATK - including relevant unit tests and data files. Additionally, Matt has modified build.xml to generate the necessary jar files so that people can use our VCF codec with Tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6077 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 16:56:04 +00:00
carneiro
be123d1399
missed a check for null on sampleNames. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6076 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 22:42:00 +00:00
carneiro
9c1b8ea796
Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 21:05:28 +00:00
carneiro
087a25d9e3
quick memory upgrade to BWA classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:53:32 +00:00
carneiro
fbe157137f
removing the old processing pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:19:13 +00:00
carneiro
91fb664135
Many updates to SelectVariants:
...
1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked. (if you had a file or directory named after a sample name, SV wouldn't work)
1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)
2) Discordance and Concordance now work in combination with all other parameters.
3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation.
4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.
5) When samples are provided (-sn, -se or -sf) discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.
6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.
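A minimal sketch of rules 4 to 6, assuming each record is reduced to a mapping from sample name to genotype string and that a missing line in the comparison track is represented by `None`. The names `is_concordant`, `is_discordant`, `record` and `other` are hypothetical and do not come from SelectVariants.

```python
from typing import Dict, Optional, Sequence

Genotypes = Dict[str, str]  # sample name -> genotype string such as "0/1"


def is_concordant(record: Genotypes, other: Optional[Genotypes],
                  samples: Sequence[str] = ()) -> bool:
    """True when the comparison track has the line and all samples agree."""
    if other is None:
        return False  # the line is missing from the comparison track
    if not samples:
        return True   # rule 4: no samples given, compare at the line level only
    # Rule 6: every requested sample must show the same genotype.
    return all(record.get(s) == other.get(s) for s in samples)


def is_discordant(record: Genotypes, other: Optional[Genotypes],
                  samples: Sequence[str] = ()) -> bool:
    """True when the line is missing or at least one sample disagrees."""
    if other is None:
        return True
    if not samples:
        return False
    # Rule 6: a single disagreeing sample is enough.
    return any(record.get(s) != other.get(s) for s in samples)
```

Under these assumptions the two predicates are exact complements for any given locus.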
---
Integration tests:
1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.
---
Methods for handling sample expressions and files with list of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput as this mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
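The separation of sample names, sample files and sample expressions described in commit 91fb664135 could look roughly like the sketch below. It is not the SampleUtils code: `resolve_samples` is a hypothetical name, treating expressions as regular expressions matched anywhere in the name is an assumption, and so is dropping requested names that are absent from the VCF header.

```python
import re
from typing import List, Sequence


def resolve_samples(vcf_samples: Sequence[str],
                    names: Sequence[str] = (),
                    file_lines: Sequence[str] = (),
                    expressions: Sequence[str] = ()) -> List[str]:
    """Combine the three kinds of sample selectors into one sorted list."""
    if not (names or file_lines or expressions):
        # No selector given: work with every sample in the VCF.
        return sorted(vcf_samples)

    requested = set(names)
    # One sample name per line; blank lines are skipped.
    requested.update(line.strip() for line in file_lines if line.strip())

    for expression in expressions:
        pattern = re.compile(expression)
        requested.update(s for s in vcf_samples if pattern.search(s))

    # Assumption: names that are not in the VCF header are silently dropped.
    # Output is always sorted alphabetically, regardless of input order.
    return sorted(requested.intersection(vcf_samples))
```

Keeping each selector in its own argument means a sample name is never reinterpreted as a file path or a pattern, which is the ambiguity the commit message warns about.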
droazen
48055d45cb
Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions.
...
Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly.
Will only update the wiki once this change goes live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:14 +00:00
droazen
658e65d26c
2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6070 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:07 +00:00
droazen
d92055d1f9
Checkpointing some bugfixes with zero-length version directories and missing
...
Picard metrics files before the push back into svn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6069 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:01 +00:00
droazen
171e20a111
Updated the tribble jar to revision 351
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6068 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:57 +00:00
droazen
3d27e5eb98
Default operating parameters in addition to the parameterized Rscript version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6067 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:53 +00:00
droazen
0d07c979e9
added comments on how to use this very useful script!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6066 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:50 +00:00
droazen
ab1de3bfda
Updated the tribble jar to revision 350
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6065 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:46 +00:00
droazen
c8124496d0
now with the new 'consensus model' parameter to the cleaner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6064 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:42 +00:00
droazen
c956e154a0
Kill silly plots.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6063 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:39 +00:00
droazen
772291c38f
Error model is now built by lane and each pool is called separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6062 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:36 +00:00
droazen
28d8b28bdf
Density plots.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6061 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:33 +00:00
droazen
d323ef0461
As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6060 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:29 +00:00
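The windowed masking described in commit d323ef0461 amounts to padding each mask interval on both sides before testing a site. The sketch below assumes 1-based inclusive intervals on a single contig; `is_masked` and its parameters are illustrative names, not the VariantFiltration API.

```python
from typing import Iterable, Tuple


def is_masked(position: int,
              mask_intervals: Iterable[Tuple[int, int]],
              mask_extension: int = 0) -> bool:
    """True if the site lies within mask_extension bases of any mask interval."""
    if mask_extension < 0:
        raise ValueError("mask_extension must be non-negative")
    return any(
        start - mask_extension <= position <= stop + mask_extension
        for start, stop in mask_intervals
    )
```

With the default extension of 0 only sites that overlap a mask interval are flagged, matching the behaviour the commit describes as the default.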
droazen
ea47ccf032
Implemented HET case with binomial distribution. Separated events from normal events and for now skip all extended events.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6059 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:24 +00:00
droazen
26d837f59e
Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6058 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:20 +00:00
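The idea behind commit 26d837f59e is that n! equals Gamma(n + 1), so the logarithm of a factorial can be taken from the log-gamma function without ever forming the huge integer. A minimal sketch using the standard library follows; `log10_factorial` is a name chosen for the example, not the utility added in the commit.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, avoiding overflow."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    # lgamma returns the natural log of Gamma(n + 1) = n!; convert to base 10.
    return math.lgamma(n + 1) / math.log(10)
```

This stays finite for values such as n = 1000, where the factorial itself is far outside the range of a double.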
droazen
8d5b4af8ca
Binomial and Multinomial interfaces for probability and coefficients in log and real space. Passed all unit tests.
...
BinomialCumulativeProbability was reformatted to follow the now standard parameter order.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6057 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:15 +00:00
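As a sketch of what "log and real space" means in commit 8d5b4af8ca: the binomial probability is assembled as a sum of log10 terms and only exponentiated at the end. The function names and the (n, k, p) parameter order are assumptions for illustration; the commit does not state the standard order.

```python
import math


def log10_binomial_coefficient(n: int, k: int) -> float:
    """log10 of C(n, k), computed from log-gamma to avoid overflow."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    log_e = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_e / math.log(10)


def log10_binomial_probability(n: int, k: int, p: float) -> float:
    """log10 of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch requires 0 < p < 1")
    return (log10_binomial_coefficient(n, k)
            + k * math.log10(p)
            + (n - k) * math.log10(1.0 - p))


def binomial_probability(n: int, k: int, p: float) -> float:
    """Real-space probability, obtained by exponentiating the log10 value."""
    return 10.0 ** log10_binomial_probability(n, k, p)
```

Working in log10 space keeps very small probabilities representable and lets them be added directly to other log10 likelihoods.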
droazen
4abb7c424b
implementation of the Gamma function and log10 Binomial / Multinomial coefficients. Unit tests for gamma and binomial passed with honors.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6056 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:09 +00:00
droazen
3392c67e0f
Support for command-line arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6055 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:05 +00:00
droazen
d9973d3da7
Adding in a template for many other plots based on Mark's initial list of metrics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6054 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:02 +00:00
droazen
237f73c1b1
Initial fingerprint boxplot for exome PreQC metrics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6053 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:59 +00:00
droazen
ff6386c29b
binomial coefficient was in log2, changed to log10.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6052 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:55 +00:00
droazen
082abfd84f
implementation of the truth allele, different cases for REF, HOMVAR, FILTERED and HET.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6051 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:51 +00:00
droazen
3f974c62e6
Reorganized init() to check for RODs (reference / truth)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6050 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:48 +00:00