6058 Commits (2ad4d2f0bbb94ce31164f0a79ec74e0b7d3d8b54)

Author SHA1 Message Date
Mauricio Carneiro 2ad4d2f0bb Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 16:06:14 -04:00
Mauricio Carneiro 1085df8b7b Making the BQSR pipeline publicly available and supported.
This is for all the PacBio validation that is going on right now in the cancer group. They are all using this script, and I'm happy to support it.
2011-06-29 16:05:32 -04:00
Mauricio Carneiro ec9d8313ee Giving the BQSR pipeline script a more appropriate name since more people are using it now. 2011-06-29 16:03:26 -04:00
Eric Banks 70ba851478 Might as well check for the illegal state and throw an exception 2011-06-29 15:59:10 -04:00
Eric Banks 1f19afe1d9 Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible that a variant can be an indel but neither an insertion nor a deletion; added an isComplexIndel() method and now we check for such an event in the realigner (we don't use them to generate alternate consensuses). Also added an isMNP() method while I was there so that it would be consistent with other variant types. 2011-06-29 15:54:09 -04:00
Eric Banks f018c27050 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 14:56:22 -04:00
Eric Banks 33c67a139c Wrong package; this should have been moved when VC got moved in from Tribble 2011-06-29 14:56:02 -04:00
Matt Hanna 9a22d78e48 Fix up gsalib target to work with new public/private directory structure. 2011-06-29 14:48:10 -04:00
Guillermo del Angel dee10140dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 13:58:04 -04:00
Eric Banks 8586c86bc4 My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals. 2011-06-29 13:56:37 -04:00
Guillermo del Angel f736a1d61b Updated md5's from previous checkin 2011-06-29 13:37:15 -04:00
Guillermo del Angel 5b6d279a2e Two bug fixes:
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. Cigar string cannot be trusted because BWA can clip good high quality bases and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
David Roazen 643458d7db Updated the tribble jar -- this should fix most of the integration test
failures we've been seeing.

Note that with tribble's new svn repository the revision numbers have reset,
hence "revision 3"
2011-06-29 01:11:03 -04:00
David Roazen 0a96f53772 One last test... 2011-06-28 19:18:17 -04:00
David Roazen efff6060b7 Revert "Final test commits before going live."
This reverts commit ba1443e54b829dd97af636415ed0d5cdaf256ad3.
2011-06-28 18:58:55 -04:00
David Roazen 755b80dc74 Final test commits before going live. 2011-06-28 18:56:00 -04:00
David Roazen 7e243fae6e Bug fixes for build.xml related to the public/private restructuring. 2011-06-28 12:55:18 -04:00
David Roazen 139c6b84a1 Modified build.xml and the help extractor doclet to use the output of "git
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
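
The fallback described in this commit can be sketched as follows. This is only an illustration in Python, not the project's build.xml or doclet code: `run_git` and `resolve_version` are hypothetical names, and the sketch assumes that `git describe` exits with a non-zero status when the repository has no tag it can use.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> str:
    """Run a git command in the current directory and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def resolve_version(run: Callable[[Sequence[str]], str] = run_git) -> str:
    """Prefer the output of `git describe`; fall back to the raw SHA-1 hash."""
    try:
        # Fails with a non-zero exit status when no usable tag exists.
        return run(["describe"])
    except subprocess.CalledProcessError:
        # No tags: use the full commit hash as the version string instead.
        return run(["rev-parse", "HEAD"])
```

Passing the command runner in as a parameter keeps the fallback logic testable without a real repository.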
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00
carneiro b46279d62e required RODs are now checked by annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6080 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-25 06:38:19 +00:00
ebanks 3879b02cdd updating a package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6079 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:13:28 +00:00
ebanks 86aa82caf8 Missed this integration test during my move of VC from Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6078 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:07:25 +00:00
ebanks c2ec2891d1 Other people besides Mark also wanted VariantContext moved to the GATK, so I listened. I am moving VariantContext and all codecs that rely on it (VCF, SoapSNP, HapMap, and CGvar) to the GATK - including relevant unit tests and data files. Additionally, Matt has modified build.xml to generate the necessary jar files so that people can use our VCF codec with Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6077 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 16:56:04 +00:00
carneiro be123d1399 missed a check for null on sampleNames. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6076 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 22:42:00 +00:00
carneiro 9c1b8ea796 Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 21:05:28 +00:00
carneiro 087a25d9e3 quick memory upgrade to BWA classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:53:32 +00:00
carneiro fbe157137f removing the old processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:19:13 +00:00
carneiro 91fb664135 Many updates to SelectVariants:
1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked. (if you had a file or directory named after a sample name, SV wouldn't work)

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf) discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with list of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput as this mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
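
Items 4 to 6 of this commit message can be read as the decision rule sketched below. This is a simplified Python illustration rather than the SelectVariants code: the function names, the dictionary representation of a VCF line (sample name to genotype string), and the sample names in the usage note are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional

# Hypothetical stand-in for one VCF line: sample name -> genotype string.
Genotypes = Dict[str, str]


def is_concordant(record: Genotypes, other: Optional[Genotypes],
                  samples: Iterable[str] = ()) -> bool:
    """True when the other track has the line and every requested sample agrees."""
    if other is None:
        # The other track is missing this line entirely.
        return False
    # With no samples requested, genotype information is ignored (item 4),
    # so all() over an empty sequence makes any shared line concordant.
    return all(record.get(s) == other.get(s) for s in samples)


def is_discordant(record: Genotypes, other: Optional[Genotypes],
                  samples: Iterable[str] = ()) -> bool:
    """True when the line is missing or at least one requested sample disagrees."""
    return not is_concordant(record, other, samples)
```

For example, with `a = {"NA12878": "0/1", "NA12891": "1/1"}` and `b = {"NA12878": "0/1", "NA12891": "0/1"}`, the pair is concordant for `["NA12878"]` alone but discordant once `"NA12891"` is also requested.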
droazen 48055d45cb Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions.
Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly.

Will only update the wiki once this change goes live.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:14 +00:00
droazen 658e65d26c 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6070 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:07 +00:00
droazen d92055d1f9 Checkpointing some bugfixes with zero-length version directories and missing
Picard metrics files before the push back into svn.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6069 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:01 +00:00
droazen 171e20a111 Updated the tribble jar to revision 351
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6068 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:57 +00:00
droazen 3d27e5eb98 Default operating parameters in addition to the parameterized Rscript version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6067 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:53 +00:00
droazen 0d07c979e9 added comments on how to use this very useful script!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6066 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:50 +00:00
droazen ab1de3bfda Updated the tribble jar to revision 350
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6065 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:46 +00:00
droazen c8124496d0 now with the new 'consensus model' parameter to the cleaner.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6064 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:42 +00:00
droazen c956e154a0 Kill silly plots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6063 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:39 +00:00
droazen 772291c38f Error model is now built by lane and each pool is called separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6062 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:36 +00:00
droazen 28d8b28bdf Density plots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6061 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:33 +00:00
droazen d323ef0461 As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6060 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:29 +00:00
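
The window behaviour behind --maskExtension might look roughly like this. The sketch is in Python and is not the VariantFiltration implementation; it assumes 1-based inclusive coordinates on a single contig and a window applied symmetrically on both sides of each mask interval, and `is_masked` is a hypothetical name.

```python
from typing import Iterable, Tuple


def is_masked(position: int, mask_intervals: Iterable[Tuple[int, int]],
              extension: int = 0) -> bool:
    """True if position lies within `extension` bases of any mask interval."""
    # Each interval is (start, end), inclusive; extension=0 matches the default.
    return any(start - extension <= position <= end + extension
               for start, end in mask_intervals)
```

With the default extension of 0 only sites inside the mask itself are flagged; a site 3 bases away from a mask record is flagged only when the extension is at least 3.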
droazen ea47ccf032 Implemented HET case with binomial distribution. Separated events from normal events and for now skip all extended events.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6059 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:24 +00:00
droazen 26d837f59e Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6058 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:20 +00:00
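
The overflow-avoiding idea mentioned here rests on the identity n! = Γ(n + 1), so the logarithm of a factorial can be computed from the log-gamma function without ever forming n! itself. A minimal Python sketch, using the standard library's `math.lgamma` rather than the project's own gamma implementation (`log10_factorial` is a hypothetical name):

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, without computing n!."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    # lgamma returns the natural log of Gamma(n + 1) = n!; convert to base 10.
    return math.lgamma(n + 1) / math.log(10)
```

Because the result stays in log space, values such as `log10_factorial(100000)` remain ordinary floats even though the factorial itself is far beyond floating-point range.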
droazen 8d5b4af8ca Binomial and Multinomial interfaces for probability and coefficients in log and real space. Passed all unit tests.
BinomialCumulativeProbability was reformatted to follow the now standard parameter order.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6057 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:15 +00:00
droazen 4abb7c424b implementation of the Gamma function and log10 Binomial / Multinomial coefficients. Unit tests for gamma and binomial passed with honors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6056 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:09 +00:00
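
One common way to obtain log10 binomial and multinomial coefficients from a gamma function is to subtract log-factorials, as sketched below. This is a Python illustration under that assumption, not the code from this commit; all function names are hypothetical.

```python
import math
from typing import Sequence


def _log10_factorial(n: int) -> float:
    """log10(n!) via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(10)


def log10_binomial_coefficient(n: int, k: int) -> float:
    """log10 of n! / (k! * (n - k)!)."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return _log10_factorial(n) - _log10_factorial(k) - _log10_factorial(n - k)


def log10_multinomial_coefficient(counts: Sequence[int]) -> float:
    """log10 of (sum of counts)! divided by the product of each count's factorial."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    return _log10_factorial(total) - sum(_log10_factorial(c) for c in counts)
```

The binomial coefficient is the two-category special case: `log10_binomial_coefficient(n, k)` equals `log10_multinomial_coefficient([k, n - k])` up to rounding.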
droazen 3392c67e0f Support for command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6055 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:05 +00:00
droazen d9973d3da7 Adding in a template for many other plots based on Mark's initial list of metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6054 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:02 +00:00
droazen 237f73c1b1 Initial fingerprint boxplot for exome PreQC metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6053 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:59 +00:00
droazen ff6386c29b binomial coefficient was in log2, changed to log10.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6052 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:55 +00:00
droazen 082abfd84f implementation of the truth allele, different cases for REF, HOMVAR, FILTERED and HET.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6051 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:51 +00:00
droazen 3f974c62e6 Reorganized init() to check for RODs (reference / truth)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6050 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:48 +00:00