Commit Graph

4752 Commits (3879b02cdd24438f8c55ccc32cefb816cf9fd717)

Author SHA1 Message Date
ebanks 86aa82caf8 Missed this integration test during my move of VC from Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6078 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 20:07:25 +00:00
ebanks c2ec2891d1 Other people besides Mark also wanted VariantContext moved to the GATK, so I listened. I am moving VariantContext and all codecs that rely on it (VCF, SoapSNP, HapMap, and CGvar) to the GATK - including relevant unit tests and data files. Additionally, Matt has modified build.xml to generate the necessary jar files so that people can use our VCF codec with Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6077 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-24 16:56:04 +00:00
carneiro be123d1399 missed a check for null on sampleNames. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6076 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 22:42:00 +00:00
carneiro 91fb664135 Many updates to SelectVariants:
1) There is now a different parameter for sample name (-sn), sample file (-sf) or sample expression (-se). The unexpected behavior of the previous implementation was way too tricky to leave unchecked (if you had a file or directory named after a sample name, SV wouldn't work).

1b) Fixed a TODO added by Eric -- now the output vcf always has the samples sorted alphabetically regardless of input (this came as a byproduct of the implementation of 1)

2) Discordance and Concordance now work in combination with all other parameters.

3) Discordance now follows Guillermo's suggestion where the discordance track is your VCF and the variant track is the one you are comparing to. I have updated the example in the wiki to reflect this change in interpretation. 

4) If you DON'T provide any samples (-sn, -se or -sf), SelectVariants works with all samples from the VCF and ignores sample/genotype information when doing concordance or discordance. That is, it will report every "missing line" or "concordant line" in the two vcfs, regardless of sample or genotype information.

5) When samples are provided (-sn, -se or -sf), discordance and concordance will go down to the genotypes to determine whether or not you have a discordance/concordance event. In this case, a concordance happens only when the two VCFs display the same sample/genotype information for that locus, and discordance happens when the disc track is missing the line or has different genotype information for that sample.

6) When dealing with multiple samples, concordance only happens if ALL your samples agree, and discordance happens if AT LEAST ONE of your samples disagrees.

---

Integration tests:

1) Discordance and concordance test added
2) All other tests updated to comply with the new 'sorted output' format and different inputs for samples.

---

Methods for handling sample expressions and files with lists of samples were added to SampleUtils. I recommend *NOT USING* the old getSamplesFromCommandLineInput, as its mixing of sample names with expressions and files creates a rogue error that can be challenging to catch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6072 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-23 20:18:45 +00:00
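
The concordance and discordance rules in items 4 to 6 of the commit above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the SelectVariants code: the function names and the dict-based record representation are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical representation: the VCF line at one locus is a dict mapping
# sample name -> genotype string (e.g. "0/1"); None means the line is missing.
Record = Optional[Dict[str, str]]


def is_concordant(variant: Record, conc: Record, samples: Sequence[str] = ()) -> bool:
    """Concordant if both tracks have the line and, when samples are given,
    ALL of those samples show the same genotype in both tracks."""
    if variant is None or conc is None:
        return False
    if not samples:
        # No samples provided: ignore genotypes and compare at the site level.
        return True
    return all(variant.get(s) == conc.get(s) for s in samples)


def is_discordant(variant: Record, disc: Record, samples: Sequence[str] = ()) -> bool:
    """Discordant if the disc track is missing the line or, when samples are
    given, AT LEAST ONE of those samples has a different genotype."""
    if variant is None:
        return False  # nothing to report without a line in the variant track
    if disc is None:
        return True
    if not samples:
        return False
    return any(variant.get(s) != disc.get(s) for s in samples)
```

With no samples, both functions reduce to a presence check on the line, which mirrors the "missing line" and "concordant line" wording of item 4.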
droazen 658e65d26c 2 unrelated changes: 1) fix the variant context adaptor for dbsnp; conversion of deletions was totally broken. 2) stop using paths that include gsa-scr1 in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6070 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:56:07 +00:00
droazen 772291c38f Error model is now built by lane and each pool is called separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6062 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:36 +00:00
droazen d323ef0461 As promised, VariantFiltration can now mask out sites within a user-specified window around the provided mask rod. By default the window is 0, but you can now use the --maskExtension argument to increase that value. Added integration tests to cover this new functionality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6060 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:29 +00:00
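
The mask window described in the commit above might look like the sketch below. It assumes inclusive, 1-based mask intervals and a symmetric window; the function name `is_masked` and the interval representation are hypothetical, and the exact boundary handling in VariantFiltration is not stated in the commit.

```python
from typing import Iterable, Tuple


def is_masked(pos: int, mask_intervals: Iterable[Tuple[int, int]], extension: int = 0) -> bool:
    """Return True if pos falls within `extension` bases of any mask interval.

    Intervals are (start, end), inclusive. With extension=0 (the default),
    only positions inside an interval are masked.
    """
    return any(start - extension <= pos <= end + extension
               for start, end in mask_intervals)
```

For example, with a single mask site at position 100, an extension of 3 would mask positions 97 through 103.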
droazen ea47ccf032 Implemented HET case with binomial distribution. Separated events from normal events and for now skip all extended events.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6059 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:24 +00:00
droazen 26d837f59e Factorial and log Factorial utilities avoiding overflow using the gamma function. Lots of unit tests. Everything is working great.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6058 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:20 +00:00
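
As a rough illustration of the idea in the commit above: n! equals Gamma(n + 1), so the log of a factorial can be taken from the log-gamma function without ever forming the huge integer. The sketch below uses Python's `math.lgamma`; the function name `log10_factorial` is hypothetical and this is not the committed utility.

```python
import math


def log10_factorial(n: int) -> float:
    """log10(n!) computed as lgamma(n + 1) / ln(10), which avoids overflow."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return math.lgamma(n + 1) / math.log(10)
```

Working in log space keeps values such as log10(1000!) in an ordinary floating-point range, even though 1000! itself has more than 2,500 digits.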
droazen 8d5b4af8ca Binomial and Multinomial interfaces for probability and coefficients in log and real space. Passed all unit tests.
BinomialCumulativeProbability was reformatted to follow the now standard parameter order.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6057 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:15 +00:00
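
A binomial probability in log10 and real space, as mentioned in the commit above, could be sketched like this. The function names and the (n, k, p) parameter order are assumptions for illustration; the commit does not spell out the "standard parameter order".

```python
import math


def log10_binomial_coefficient(n: int, k: int) -> float:
    """log10 of C(n, k), computed from log-gamma to avoid overflow."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    ln_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_coeff / math.log(10)


def log10_binomial_probability(n: int, k: int, p: float) -> float:
    """log10 of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("require 0 < p < 1")
    return (log10_binomial_coefficient(n, k)
            + k * math.log10(p)
            + (n - k) * math.log10(1.0 - p))


def binomial_probability(n: int, k: int, p: float) -> float:
    """Real-space probability, obtained by exponentiating the log10 value."""
    return 10.0 ** log10_binomial_probability(n, k, p)
```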
droazen 4abb7c424b implementation of the Gamma function and log10 Binomial / Multinomial coefficients. Unit tests for gamma and binomial passed with honors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6056 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:55:09 +00:00
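
The multinomial coefficient generalizes the binomial case: for counts c1..cm summing to n, it is n! / (c1! * ... * cm!). A log10 version built on log-gamma might look like the sketch below; the name `log10_multinomial_coefficient` is hypothetical and not taken from the commit.

```python
import math
from typing import Sequence


def log10_multinomial_coefficient(counts: Sequence[int]) -> float:
    """log10 of n! / (c1! * ... * cm!), where n is the sum of the counts."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    ln_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_coeff / math.log(10)
```

With exactly two counts this reduces to the binomial coefficient.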
droazen ff6386c29b binomial coefficient was in log2, changed to log10.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6052 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:55 +00:00
droazen 082abfd84f implementation of the truth allele, different cases for REF, HOMVAR, FILTERED and HET.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6051 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:51 +00:00
droazen 3f974c62e6 Reorganized init() to check for RODs (reference / truth)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6050 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:48 +00:00
droazen 6f5a08ddc6 Simple walker to look at SNPs near indels. Didn't need to make this a walker and commit it, but used it as an opportunity to play with GIT in unstable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6049 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:44 +00:00
droazen 2e3d6754cd First implementation of the Error Model.
Added stratification by lane to ReadBackedPileup.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6046 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:33 +00:00
droazen 27b1418b84 PSP2 output much better. Good masking of repetitive regions. Flagging of invalid amplicons rather than omission of them, reasons properly given. Kiran doesn't like the trailing comma, but the trailing comma also doesn't like Kiran.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6045 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:27 +00:00
droazen 9a00d81d57 Is git commit -a different than git commit?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6043 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:19 +00:00
droazen 84dd72e6cb Adding in some read filters, updating MathUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6042 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:15 +00:00
droazen 4f7a64a798 Fixing broken walker as per GS; adding integration test to cover it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6040 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:54:04 +00:00
droazen 0e057276ae Changing the default behavior of the IndelRealigner to run without Smith-Waterman. Changed around the integration tests accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6039 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:58 +00:00
droazen df71d5b965 bye bye
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6035 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:42 +00:00
droazen 9b90e9385d Putting new association files, some qscripts, and the new pick sequenom probes file under local version control. I notice some dumb emacs backup files, I'll kill those off momentarily. Also minor changes to GenomeLoc (getStartLoc() and getEndLoc() convenience methods)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6034 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:37 +00:00
droazen 53c089949e Added integration test for -n parameter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6032 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:22 +00:00
ebanks 745935ffc2 No longer used - instead see the ConstrainedMateFixingManager class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6030 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 19:38:17 +00:00
ebanks b35df9a0f7 Removing unnecessary String.format calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6028 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 15:30:52 +00:00
delangel b7a1beff3c Bug fixes and rewrite of logic of several parts in SelectVariants:
a) If we were selecting SNPs or Indels and there was one of each at the same location, only whichever one was pulled first was processed.
b) Fixed logic error when selecting Mendelian Violations: if that option was used it wasn't possible to combine with other options.
c) Fixed logic error when using -disc option: you shouldn't parse genotypes to check whether a site is present or not because a vcf can be sites-only and this is slow.
d) Made -disc option work in the same way as other options: variants are now selected from "variant" track all the time, and variants which are not in disc track are kept. Inverse logic (keep disc variants not present in "variant") is confusing and prevents users from combining with different options.

With these changes it is now possible to ask for example "Give me all indels which are Mendelian Violations, not in dbsnp and present in these samples" which was not possible before.

Integration tests covering the above are forthcoming.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6027 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 14:05:00 +00:00
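
Items (c) and (d) of the commit above describe a site-level check that combines with other selection options. A minimal sketch, assuming records are plain dicts with hypothetical `contig`, `pos` and `type` keys (this is not the SelectVariants implementation):

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

Site = Tuple[str, int]  # (contig, position)
Variant = Dict[str, object]


def select(variants: Iterable[Variant],
           disc_sites: Set[Site],
           predicates: Sequence[Callable[[Variant], bool]] = ()) -> List[Variant]:
    """Keep records from the variant track that are absent from the disc track
    and that pass every additional predicate. Genotypes are never parsed for
    the presence check, so a sites-only disc track works too."""
    return [v for v in variants
            if (v["contig"], v["pos"]) not in disc_sites
            and all(p(v) for p in predicates)]
```

Because every option is just another predicate over records from the variant track, a request such as "indels not in the disc track" becomes `select(variants, disc_sites, [lambda v: v["type"] == "INDEL"])`.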
kshakir a1f8aa90c0 Added an integration test showing how to use LSF C API to get LSF parameters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6025 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-21 22:54:55 +00:00
ebanks 8e149cc52f Fixing a silly bug of mine: when a realignment target begins at position 1 of a contig, it was possible to have some reads get emitted out of order (triggering an exception in the SAMFileWriter). This is fixed by moving around some parentheses. Tim, if you are reading this: feel free to take this fix in whenever it's convenient. I.e. it's not critical as the only user who has been hit by it has a reference with over 130K short contigs. Committing in SVN so that it gets incorporated immediately (and I can respond on GS now).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6024 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-21 21:42:38 +00:00
asivache 6dd41c8489 Nway writer takes another argument: whether to create index on the fly. Realigner in NWayOut mode currently will ALWAYS create index on the fly as there seems to be no clean way to extract the requested value from argument collection in the presence of a different @Output stream.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6023 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-21 17:26:04 +00:00
asivache 78461bac1e Default logic (and name) has changed. Somatic mode is now the default. To run in single-sample (unpaired) mode, one has to use the (hidden) --unpaired option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6022 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-21 17:08:41 +00:00
chartl c5de06a641 Fixing up the RefSeqCodec so a bad entry in RefSeq (some transcripts are odd and have a negative length, which may signify something special (?)) doesn't cause failure, but issues a warning instead. Integration tests pass.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6021 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-21 14:07:58 +00:00
asivache 7c322780d3 Nway out fixed: in this mode a special nwayout sam writer is instantiated and passed to the constrained manager. All the dispatching of the reads into separate output sam streams is taken care of by that writer, so no other special processing is needed at the realigner/manager level.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6020 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-20 19:35:26 +00:00
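
The dispatching writer described above could be sketched as follows. This is an in-memory illustration only: the class name, the `source` key on each read, and the input-to-output mapping are assumptions, and the commit does not say how the real writer ties a read to its input file.

```python
from typing import Dict, List


class NWayWriterSketch:
    """Routes each read to the output stream that corresponds to its input file."""

    def __init__(self, input_to_output: Dict[str, str]) -> None:
        self._route = dict(input_to_output)
        # One in-memory "stream" per output name; a real writer would hold files.
        self.outputs: Dict[str, List[dict]] = {out: [] for out in self._route.values()}

    def add_read(self, read: dict) -> None:
        """Append the read to the output mapped from its source input."""
        self.outputs[self._route[read["source"]]].append(read)
```

Callers see a single writer and call `add_read` for every read, which is why no extra handling is needed at the realigner or manager level.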
ebanks 600a6a43a6 Reverting previous commit, as promised.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6019 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-20 16:30:19 +00:00
ebanks ee18c9b0c2 Temporary commit to please those in 320: re-support the -knownsOnly argument (@Hidden). This will be reverted in a sec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6018 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-20 16:28:58 +00:00
rpoplin e8738f95c5 This warning message scares too many people.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6016 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-20 13:43:16 +00:00
depristo 4c6d0e6143 Added stratification by discrete allele count, just like AF, but requiring genotypes so it can be exact. Added docs on wiki, and an integration test using Kiran's very nice fundamental VCF.
VariantEvalWalker now passes a pointer to itself to the Stratification setVariantEvalWalker method (and assoc. get method) so that stratifications can look at VEWalker variables to obtain information necessary for their calculations, like the list of eval samples. This is a better interface, in my opinion, than the current approach of extending the base abstract Stratification to include an initialize function that has all arguments necessary for any Strat.
JEXL expressions now provide access to the VariantContext vc object itself, so you can write JEXLs that directly use VariantContext and associated functions from the command line.
ExomePostQC Queue script now creates a byAC eval using the new strat, and no longer produces a byAF file (as this was not exact, and led to strange punctile behavior when actual AF quantization was out of sync with the fixed quantization of the AF strat).

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6015 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-19 03:11:00 +00:00
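
Stratifying by an exact allele count, as described above, requires genotypes because the count is tallied directly from the called alleles. A minimal sketch, assuming VCF-style genotype strings such as "0/1" or "1|1" and hypothetical function names (this is not the VariantEval code):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def allele_count(genotypes: Iterable[str]) -> int:
    """Count non-reference alleles across genotypes; '.' (no-call) is skipped."""
    count = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in ("0", "."):
                count += 1
    return count


def stratify_by_ac(sites: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Group site names by their exact allele count."""
    strata: Dict[int, List[str]] = defaultdict(list)
    for name, genotypes in sites.items():
        strata[allele_count(genotypes)].append(name)
    return dict(strata)
```

Because the key is an integer count rather than a binned frequency, there is no quantization mismatch between sites.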
asivache 64196b6c7a Writer implementation that can dispatch reads to multiple underlying bam files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6013 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 20:44:17 +00:00
depristo 1afd24c831 SliceBams now handles properly the case where multiple read groups clash in the input BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6012 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-17 20:19:19 +00:00
ebanks dd1d9cd76f Forgot to deprecate the old args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6009 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 17:54:44 +00:00
ebanks 4e85416af1 [Foiled yet again when trying to do this in git] Slight modifications in the argument structure for the IndelRealigner. Instead of boolean flags -knownsOnly and -doNotUseSW, we now have an enum --consensusDeterminationModel which lets you specify knowns only, also use indels in reads, or also use SW. Please note that the default behavior of IR has not changed at all (and won't for a few more days) - that'll be done in GIT (fingers crossed).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6008 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 17:35:37 +00:00
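
Replacing two boolean flags with a single three-valued enum, as the commit above describes, might be sketched like this. The member names and the mapping from the old flags are assumptions for illustration; the commit only states that the enum covers knowns only, also using indels in reads, and also using SW.

```python
from enum import Enum


class ConsensusDeterminationModel(Enum):
    # Member names are illustrative; only the three behaviors come from the commit.
    KNOWNS_ONLY = "knowns only"
    USE_READS = "knowns plus indels in reads"
    USE_SW = "knowns, indels in reads, plus Smith-Waterman"


def from_legacy_flags(knowns_only: bool, do_not_use_sw: bool) -> ConsensusDeterminationModel:
    """Hypothetical mapping from the old -knownsOnly / -doNotUseSW booleans."""
    if knowns_only:
        return ConsensusDeterminationModel.KNOWNS_ONLY
    if do_not_use_sw:
        return ConsensusDeterminationModel.USE_READS
    return ConsensusDeterminationModel.USE_SW
```

A single enum rules out contradictory combinations that two independent booleans allow, and makes the default a one-line choice.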
depristo 4304fc4862 Fixed up md5s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6007 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 16:20:41 +00:00
depristo 43fdd31e20 Significant performance optimization for reduced reads due to better algorithm for including reads in the variable regions. Fixed a critical bug that actually produced multiple copies of the same read in the variable regions with this optimization as well. Scala exploration script updated as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6005 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-16 12:54:59 +00:00
rpoplin d7430c23f8 Bringing VQSR up to date with the 1000G v2b changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6000 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 20:23:43 +00:00
asivache 04ecbf10ab Fixes the constraint-generated error about stop being less than start in GenomeLocParser.createGenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5999 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:44:11 +00:00
ebanks 5be4f31515 Surprisingly, the TileCovariate was indeed covered in integration tests. Updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5997 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 17:40:23 +00:00
ebanks d00d4fd4d6 Obsolete covariate class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5993 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-14 14:11:47 +00:00
rpoplin db43e3f1ab Fixing an apparent parenthesis matching problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5986 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 18:52:14 +00:00
rpoplin 3534f412c9 Better error message for the case of input variants found in ApplyRecalibration that were never seen during VariantRecalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5979 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-13 14:45:28 +00:00
rpoplin 6231bba288 Bug fix for mergeInfoWithMaxAC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5978 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-12 20:10:16 +00:00