Commit Graph

6042 Commits (7e243fae6ef3e7a8d6606a5433c66b39d53c9952)

Author SHA1 Message Date
David Roazen 7e243fae6e Bug fixes for build.xml related to the public/private restructuring. 2011-06-28 12:55:18 -04:00
David Roazen 139c6b84a1 Modified build.xml and the help extractor doclet to use the output of "git
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
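
The version lookup described in this commit can be sketched roughly as follows. This is a minimal illustration in Python, not the actual build.xml or doclet logic: the helper names `run_git` and `version_string` are hypothetical, and the only git commands assumed are `git describe` and `git rev-parse HEAD`. Note that plain `git describe` considers annotated tags only, so a repository with only lightweight tags would also take the fallback path in this sketch.

```python
import subprocess


def run_git(args, cwd="."):
    """Run a git command; return its stripped stdout, or None if it fails."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Non-zero exit (e.g. "git describe" with no tags) or git not installed.
        return None
    return result.stdout.strip() or None


def version_string(run=run_git):
    """Prefer the output of "git describe"; fall back to the raw SHA-1 of HEAD."""
    described = run(["describe"])
    if described is not None:
        return described
    # Hypothetical fallback path: no usable tag, so use the commit hash itself.
    return run(["rev-parse", "HEAD"])
```

The `run` parameter exists only so the fallback can be exercised without a real repository.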
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00
carneiro b46279d62e required RODs are now checked by annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6080 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-25 06:38:19 +00:00
ebanks 3879b02cdd updating a package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6079 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:13:28 +00:00
ebanks 86aa82caf8 Missed this integration test during my move of VC from Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6078 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:07:25 +00:00
ebanks c2ec2891d1 Other people besides Mark also wanted VariantContext moved to the GATK, so I listened. I am moving VariantContext and all codecs that rely on it (VCF, SoapSNP, HapMap, and CGvar) to the GATK - including relevant unit tests and data files. Additionally, Matt has modified build.xml to generate the necessary jar files so that people can use our VCF codec with Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6077 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 16:56:04 +00:00
carneiro be123d1399 missed a check for null on sampleNames. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6076 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 22:42:00 +00:00
carneiro 9c1b8ea796 Updated BQSR script to be more general and work with the new PacBio BAM files - for Kristian Cibulskis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6075 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 21:05:28 +00:00
carneiro 087a25d9e3 quick memory upgrade to BWA classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6074 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:53:32 +00:00
carneiro fbe157137f removing the old processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6073 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:19:13 +00:00
carneiro 91fb664135 Many updates to SelectVariants:
1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked. (if you had a file or directory named after a sample name, SV wouldn't work)

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as its mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
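
Rules 4 to 6 of the SelectVariants commit above can be illustrated with a small sketch. This is not the SelectVariants code: the function names, the dict-of-genotypes record representation, and the use of `None` for a line missing from the comparison track are all assumptions made for illustration. A sample absent from one record but present in the other is treated here as a disagreement, which the commit message does not specify.

```python
def is_concordant(variant, comp, samples=()):
    """variant/comp map sample name -> genotype string; comp is None if the
    comparison track has no line at this locus (hypothetical representation)."""
    if comp is None:
        return False
    if not samples:
        # Rule 4: without samples, any line present in both tracks is concordant.
        return True
    # Rules 5 and 6: every requested sample must show the same genotype.
    return all(variant.get(s) == comp.get(s) for s in samples)


def is_discordant(variant, comp, samples=()):
    """True if the comparison track is missing the line, or at least one
    requested sample has a different genotype."""
    if comp is None:
        return True
    if not samples:
        # Rule 4: genotype information is ignored when no samples are given.
        return False
    return any(variant.get(s) != comp.get(s) for s in samples)


# Usage with made-up records for a single locus.
mine = {"NA12878": "0/1", "NA12891": "1/1"}
theirs = {"NA12878": "0/1", "NA12891": "0/1"}
print(is_concordant(mine, theirs, ["NA12878"]))             # one agreeing sample
print(is_discordant(mine, theirs, ["NA12878", "NA12891"]))  # one sample disagrees
```

With these definitions the two predicates are exact complements for any given sample list, which matches the "ALL agree" versus "AT LEAST ONE disagrees" wording.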
droazen 48055d45cb Added support for PICARD functions to QUEUE after following Khalid's pointers on where to do it. I have added the 6 functions used by the Data Processing Pipeline, but from now on it should be a matter of seconds to copy/paste and create bindings to more functions.
Updated the Data Processing Pipeline to use the new Picard classes and reorganized the pre-processing of the pipeline accordingly.

Will only update the wiki once this change goes live.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6071 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:14 +00:00
droazen 658e65d26c 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6070 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:07 +00:00
droazen d92055d1f9 Checkpointing some bugfixes with zero-length version directories and missing
Picard metrics files before the push back into svn.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6069 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:01 +00:00
droazen 171e20a111 Updated the tribble jar to revision 351
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6068 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:57 +00:00
droazen 3d27e5eb98 Default operating parameters in addition to the parameterized Rscript version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6067 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:53 +00:00
droazen 0d07c979e9 added comments on how to use this very useful script!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6066 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:50 +00:00
droazen ab1de3bfda Updated the tribble jar to revision 350
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6065 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:46 +00:00
droazen c8124496d0 now with the new 'consensus model' parameter to the cleaner.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6064 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:42 +00:00
droazen c956e154a0 Kill silly plots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6063 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:39 +00:00
droazen 772291c38f Error model is now built by lane and each pool is called separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6062 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:36 +00:00
droazen 28d8b28bdf Density plots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6061 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:33 +00:00
droazen d323ef0461 As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6060 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:29 +00:00
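
The window behaviour described for `--maskExtension` can be pictured with a short sketch. The function name, the `(start, end)` interval tuples, and the single-contig, inclusive-coordinate convention are assumptions for illustration only; the actual VariantFiltration implementation is not shown in this log.

```python
def is_masked(pos, mask_intervals, mask_extension=0):
    """Return True if pos falls within mask_extension bases of any mask interval.

    mask_intervals is an iterable of (start, end) pairs with inclusive
    coordinates on a single contig (an assumption of this sketch).
    """
    return any(
        start - mask_extension <= pos <= end + mask_extension
        for start, end in mask_intervals
    )


# Usage: with the default window of 0 only sites inside the mask are flagged.
mask = [(100, 110)]
print(is_masked(112, mask))                    # outside the mask itself
print(is_masked(112, mask, mask_extension=5))  # inside the extended window
```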
droazen ea47ccf032 Implemented HET case with binomial distribution. Separated events from normal events and for now skip all extended events.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6059 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:24 +00:00
droazen 26d837f59e Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6058 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:20 +00:00
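
For readers unfamiliar with the trick this commit mentions: since n! = Γ(n + 1), the logarithm of a factorial can be computed from the log-gamma function without ever forming the huge integer. The sketch below uses Python's `math.lgamma` and hypothetical function names; it is not the committed Java utility.

```python
import math


def log10_factorial(n):
    """log10(n!) computed via ln Gamma(n + 1), so it never overflows."""
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    return math.lgamma(n + 1) / math.log(10)


def factorial_real(n):
    """n! as a float; overflows to an error only when the result itself
    exceeds the float range (around n = 171)."""
    return math.exp(math.lgamma(n + 1))


# Usage: 1000! has more than 2500 digits, but its log10 is an ordinary float.
print(log10_factorial(1000))
```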
droazen 8d5b4af8ca Binomial and Multinomial interfaces for probability and coefficients in log and real space. Passed all unit tests.
BinomialCumulativeProbability was reformatted to follow the now standard parameter order.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6057 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:15 +00:00
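
A rough sketch of what binomial and multinomial helpers in log10 and real space might look like is given below. The function names and the `(n, k, p)` parameter order are assumptions; the commit does not state what the "standard parameter order" is. The formulas themselves are the usual ones: log10 C(n, k) = log10 n! - log10 k! - log10 (n - k)!, with the factorials taken from the log-gamma function.

```python
import math

LN10 = math.log(10)


def log10_binomial_coefficient(n, k):
    """log10 of C(n, k), computed from log-gamma to avoid overflow."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / LN10


def log10_multinomial_coefficient(counts):
    """log10 of n! / (k_1! * ... * k_m!) where n = sum(counts)."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)) / LN10


def log10_binomial_probability(n, k, p):
    """log10 of P(X = k) for X ~ Binomial(n, p); this sketch requires 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch requires 0 < p < 1")
    return (
        log10_binomial_coefficient(n, k)
        + k * math.log10(p)
        + (n - k) * math.log10(1.0 - p)
    )


def binomial_probability(n, k, p):
    """Real-space probability, obtained by exponentiating the log10 value."""
    return 10.0 ** log10_binomial_probability(n, k, p)


# Usage: probability of exactly 2 successes in 4 fair trials.
print(binomial_probability(4, 2, 0.5))
```

Working in log space first and exponentiating last keeps intermediate values small even when n is in the thousands.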
droazen 4abb7c424b implementation of the Gamma function and log10 Binomial / Multinomial coefficients. Unit tests for gamma and binomial passed with honors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6056 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:09 +00:00
droazen 3392c67e0f Support for command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6055 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:05 +00:00
droazen d9973d3da7 Adding in a template for many other plots based on Mark's initial list of metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6054 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:02 +00:00
droazen 237f73c1b1 Initial fingerprint boxplot for exome PreQC metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6053 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:59 +00:00
droazen ff6386c29b binomial coefficient was in log2, changed to log10.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6052 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:55 +00:00
droazen 082abfd84f implementation of the truth allele, different cases for REF, HOMVAR, FILTERED and HET.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6051 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:51 +00:00
droazen 3f974c62e6 Reorganized init() to check for RODs (reference / truth)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6050 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:48 +00:00
droazen 6f5a08ddc6 Simple walker to look at SNPs near indels. Didn't need to make this a walker and commit it, but used it as an opportunity to play with GIT in unstable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6049 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:44 +00:00
droazen 29a0e08aa2 Testing bug fix process #3 (changes are irrelevant)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6048 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:40 +00:00
droazen e148a75c32 Testing the 'bug fix' process #2 (changes are irrelevant)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6047 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:37 +00:00
droazen 2e3d6754cd First implementation of the Error Model.
Added stratification by lane to ReadBackedPileup.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6046 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:33 +00:00
droazen 27b1418b84 PSP2 output much better. Good masking of repetitive regions. Flagging of invalid amplicons rather than omission of them, reasons properly given. Kiran doesn't like the trailing comma, but the trailing comma also doesn't like Kiran.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6045 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:27 +00:00
droazen f7fa373643 Incorporate lists of fingerprint data rather than summaries.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6044 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:23 +00:00
droazen 9a00d81d57 Is git commit -a different than git commit?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6043 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:19 +00:00
droazen 84dd72e6cb Adding in some read filters, updating MathUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6042 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:15 +00:00
droazen e0d203434f Add a column summing the fingerprint LOD scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6041 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:09 +00:00
droazen 4f7a64a798 Fixing broken walker as per GS; adding integration test to cover it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6040 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:04 +00:00
droazen 0e057276ae Changing the default behavior of the IndelRealigner to run without Smith-Waterman. Changed around the integration tests accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6039 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:58 +00:00
droazen 751aa8bfa6 Partial rewrite of the summary metrics aggregator to accumulate all metrics
from sample-level summaries, rather than only specific metrics.  Continues to
manually handle fingerprinting.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6038 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:53 +00:00
droazen 4288ca1c24 Fix doc bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6037 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:49 +00:00
droazen cc1f94310d A prototype script and library dependencies to extract a BAM list from a
reasonably well-formed PM's xls{x}-format spreadsheet or tsv file.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6036 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:45 +00:00
droazen df71d5b965 bye bye
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6035 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:42 +00:00
droazen 9b90e9385d Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files, I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:37 +00:00