gatk-3.8


Author	SHA1	Message	Date
hanna	e313eeede8	Push command-line expansions, such as BAM list unpacking and -B tag parsing, out into the CommandLine* classes. This makes it easier for external functionality (such as the VCF streamer) to use GenomeAnalysisEngine directly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4897 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 19:00:17 +00:00
depristo	66cca7de0f	renamed genotypesArePhased to isPhased, as the previous name was incorrect for several reasons. Added setPhase() to MutableGenotype. Other classes changed to reflect renaming to isPhased(). CombineVariants now supports an experimental MASTER mode where it consumes -B:master,vcf and -B:xi,vcf for any number i and updates the master with phasing information in xi. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4896 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 17:42:05 +00:00
chartl	2235245af0	PrivatePermutations generalized to compute transition counts and average probabilities (and thus was renamed). Changes in some pipelines to reflect the change. Bugfix in the batch merging pipeline (it would halt because the allele VCF for genotyping batches could become off-spec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4894 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 15:16:15 +00:00
delangel	a1653f0c83	Another major redo for indel genotyper: this time, add ability to do allele and variant discovery, and don't rely necessarily on external vcf's to provide candidate variants and alleles (e.g. by using IndelGenotyperV2). This has two major advantages: speed, and more fine-grained control of discovery process. Code is still under test and analysis but this version should be hopefully stable. Ability to genotype candidate variants from input vcf is retained and can be turned on by command line argument but is disabled by default. Code, by default, will build a consensus of the most common indel event at a pileup. If that consensus allele has a count bigger than N (=5 by default), we proceed to genotype by computing probabilistic realignment, AF distribution etc. and possibly emitting a call. Needed for this, also added ability to build haplotypes from list of alleles instead of from a variant context. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4893 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-22 02:38:06 +00:00
hanna	09c7ea879d	Merging GenomeAnalysisEngine and AbstractGenomeAnalysisEngine back together. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4889 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-21 02:09:46 +00:00
depristo	b3ac47812c	No longer emits records at filtered sites, in sub-sampling mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4883 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:43:50 +00:00
depristo	60880b925f	VC utils prune method now will keep genotype attributes as well as info keys. RBP now emits far reduced records (NO INFO, only GT:GQ:PG), further reducing the size of the phasing output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4882 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:33:14 +00:00
depristo	8604335566	Minor improvements to further reduce debugging output. When running in -samplesToPhase mode, now only including the samples to phase in the output VCF, making it very much smaller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4881 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:19:47 +00:00
depristo	ff90c24f28	RBP now supports operating on a subset of samples, outputting a much reduced VCF file appropriate for merging later. Also, general optimization to avoid printing enormous amounts of data to logger.debug by using a global static variable DEBUG that conditionally enables writing to the logger. Passes integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4880 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-20 16:03:28 +00:00
depristo	a3729bd59c	Now I call BeforeMethod correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4872 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 22:45:45 +00:00
depristo	b7e4a015c0	static thread cache reset in UnitTest git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4870 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 21:53:10 +00:00
depristo	3bbc6a0540	Slightly more thread safe CachingIndexedFastaSequenceFile.java. Likely passes parallel testing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4869 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 21:05:17 +00:00
depristo	5dd0e8388b	Fixed a bug in UnitTest git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4867 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 19:44:35 +00:00
depristo	4a54f3f230	ThreadLocal version of CachingIndexedFastaSequenceFile. More efficient support for shared memory BAQ calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4865 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:44:48 +00:00
depristo	32d5397c01	Experimental support for sided annotations. Currently not more/less valuable than two-tailed testing. Future experiments are needed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4864 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 15:08:31 +00:00
handsake	21dc05138a	Bug fixes for the bwa aligner and changes to support compiling against newer releases of the bwa code base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4863 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-17 14:49:15 +00:00
chartl	2bd2667516	Another privately-owned class to add before re-checking out repository git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4858 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:14:51 +00:00
chartl	e406eb0f95	Adding a useful accessor method to TableFeature git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4856 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 18:11:51 +00:00
ebanks	8ab4704b4c	Adding a command-line argument to allow missing values to evaluate as false instead of true git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4854 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 05:18:12 +00:00
ebanks	9f3e56e487	VariantAnnotator shouldn't die when multiple records occur at the same position git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4853 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-16 04:05:47 +00:00
hanna	acfe83920b	'-L unmapped': adding integration tests for explicitly including (-L unmapped) unmapped reads and explicitly excluding (-XL unmapped) unmapped reads, augmenting the suite of unit tests already put in place. '-L unmapped' seems safe to use; go for it, but please validate results against samtools flagstat when the process finishes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4849 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 23:11:46 +00:00
ebanks	dabdeb729e	Eric broke the build. Eric broke the build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4847 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 17:01:38 +00:00
ebanks	5c0b66cb7c	3 big changes that all kill the integration tests: 1. Don't cap the PLs by 255 anymore. 2. Move over to the 3state model as the only available base model for UG (no more base transition tables). 3. New QD implementation when GLs/PLs are available. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4846 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 16:24:28 +00:00
chartl	5a27d231fa	Rename it so that nobody else falls into the trap laid out (the test is VariantToTable, the walker is Variant[s]ToTable) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4844 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 11:43:00 +00:00
chartl	5e27e9162f	Huh? I thought we parsed out comma-separated command line arguments into list automatically...just change the syntax of the integration test, no need to update the md5 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4843 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 11:40:27 +00:00
chartl	3e75431bc8	Thanks to Mark: VCFInfoToTable removed in favor of a more flexible walker. Slight change to the argument structure of the walker to make it play more nicely with Queue: the field list parsing is pushed into the command line system (e.g. the variable is exposed as a List<String> and not a String, so Queue doesn't have to join a list into a string only to have it broken out again). This also allows the user to specify -F field1 -F field2 -F field3 if he/she so desires. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4842 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-15 03:33:36 +00:00
kshakir	01323447c6	Removed LibBat.SUB2_BSUB_BLOCK since the use of it exits the JVM. Fixed integration tests to wait on their own for the job to run instead of using SUB2_BSUB_BLOCK. Updated VariantRecalibrationIntegrationTests MD5s which were knocked out of sync while SUB2_BSUB_BLOCK was exiting in the middle of integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4840 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 19:57:20 +00:00
hanna	67c07d1a6a	Fixed recently introduced multiplexer issue where DoC couldn't be written directly to command-line. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4839 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 19:35:15 +00:00
hanna	526ae92093	Getting back to '-L unmapped': - basic unit tests for interval sorting and merging with mix of mapped/unmapped. - validation to ensure that locus walkers (really all non-read walkers) blow up with a user error when -L unmapped is specified. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4837 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 18:24:18 +00:00
ebanks	afd4655674	Use @Output instead of @Argument. As a side note, Chris I'm ready for this nightmare to go away... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4835 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 17:13:15 +00:00
ebanks	cf7d932a17	Fix for f***ed up BWA alignments that adhere to SAM specs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4834 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 17:12:25 +00:00
kshakir	d550fdfd60	Disabling integration test to see if this restores the full test suite. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4833 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-14 15:27:02 +00:00
delangel	a5008faca8	Bug fix: when getting variant contexts at a site, we need to get only variants that start at current location, otherwise we get duplicated records when filtering indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4830 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 19:23:10 +00:00
delangel	17db2e0e24	(forgot I hadn't committed this) - refactored IndelStatistics module and added a new inner class to compute Indel classification along with other statistics. So, we now get an extra table specifying, per sample, counts of whether indels are: - Repeat Expansions - Novel sequence And for indels of size <=2 we get a per-mononuc. or dinuc. breakdown of novels and expansions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4828 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-13 17:43:43 +00:00
chartl	cf75caf653	java changes: VariantEvalWalker's logger is made public, so that variant eval modules can access it through the parent object. DesignFileGenerator comment lists how best to bind things to it, and the feature accessor is better refined to grab the genome loc. (old change) scala changes: convenience addAll( List[CommandLineFunction] ) added to QScript class (and thus removed from the fCPV2) useful command line functions added to a new library package for command line functions (these are fast simple VCF command lines) bug fixed in ProjectManagement for the class where there's only one batch to be batch-merged (not really part of the use-case, but an edge-condition that came up during pipeline testing) first draft of a private mutations pipeline which will be elaborated in future git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4823 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-12 05:10:45 +00:00
depristo	abd6ce1c77	A TiTv-free approach for cutting variants! Apparently much better than previous approach, and will work for indels and SV with truly minor modifications to the code. Will discuss with methods group on Monday. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4822 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 23:08:13 +00:00
depristo	974aaa134d	Trivial fix to broken build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4820 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-11 13:56:03 +00:00
kshakir	895cb39f41	Thanks to Platform Computing tech support, found the magical environment variable BSUB_QUIET. Minor refactoring to add more of the CLibrary including setenv(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4819 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 21:27:12 +00:00
depristo	5b46a900b3	Final version of BAQ calculation. default gap open is 1e-4, a good sensitive value. Useful timer class SimpleTimer added. BAQ is now live. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4818 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 19:35:12 +00:00
ebanks	491a599b59	Minor optimization git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4817 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 18:56:35 +00:00
kshakir	56433ebf6b	Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects: - bsub command line is no longer fully printed out. - extraBsubArgs hack is now a callback function updateJobRun. Updated FullCallingPipelineTest to reflect latest changes to fullCallingPipeline.q. Added a pipeline that tests the UGv2 runtimes at different bam counts and memory limits. Updated VE packages that live in oneoffs to compile to oneoffs. Added a hack to replace the deprecated symbol environ in Mac OS X 10.5+ which is needed by LSF7 on Mac. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4816 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-10 04:36:06 +00:00
hanna	d4d3170436	Support for '-L unmapped' in read walkers. DO NOT USE THIS PATCH YET. It has been subjected to and passes cursory testing on one dataset (and all integration tests pass). However, there's a small library of validation checks, and unit and integration tests that must be added. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4813 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 19:51:48 +00:00
delangel	a2d6cef181	Weird corner condition fix in indel genotyper: if there are 2 consecutive locations on candidate sites to genotype, we can get both when calling getVariantContexts and if we are triggering on an extended event - this leads to confusion and we can end up picking the wrong one. So, we require start of the vc to be the same as the start of the ref locus to be sure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4812 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 19:34:23 +00:00
depristo	722819688a	Minor utility improvements to ValidateBAQ git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4809 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 02:19:32 +00:00
depristo	a63bbb2fec	Optimized BAQ implementation. No longer does excessive amounts of copying of arrays. At this point I'm not 100% certain where additional performance improvements would come from git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4808 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 21:26:30 +00:00
depristo	db55b2b0c6	Better testing of BAQ. Now really handles soft clipped reads properly by doing an expensive copy operation :-( will need to be transformed to a ByteBuffer in the near future. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4807 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 17:37:00 +00:00
ebanks	f1f01610f8	Remove the extra trailing tab at the end of the VCF ## header line. Unfortunately, this meant updating every freaking integration test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4806 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 17:22:29 +00:00
depristo	16e1bbd380	Hidden command line option to control BAQ gap open penalty for testing by me and eric. ValidateBAQWalker has misc. useful improvements. PrintReads now adds BAQ tags on output, if requested. BAQ has generally useful improvements. Refactor code to make it easier for BAQUnitTest to run. minBaseQuality enforced on output, as well as input now. Added BAQUnitTest that checks that the BAQ calculation is performing as expected. Still needs to be expanded significantly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4804 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 01:01:39 +00:00
depristo	1b6bec8e6b	Trivial changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4803 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-07 20:06:54 +00:00
delangel	ca7810f11d	First major update of indel genotyper: a) Really fix strand bias computation for indels this time; previous version was a partial fix only. b) Change way in which we deal with bad bases at the edge of reads. Even if a base is soft clipped in CIGAR string, there may still be dangling bases with Q=2 that may throw off QUAL computation in some sites. So, we're stricter and we also trim those bases off read edges even if they are not soft-clipped officially. c) First feeble-minded attempt at runtime optimization - don't compute log and 10^base_qual every time. Rather, cache 10^-k/10 and log(1-10^-k/10) for all k <=60. This speeds up code about 4x. d) Further optimization: don't compute log(10^x+10^y) but rather use softMax function recently put into ExactAFCalculationModel. e) Skip bad reads where all Q=2 (sic) f) Avoid log to lin and back to log conversions of genotype likelihoods - this was legacy code from back when exact model did stuff in linear domain. This improves precision overall. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4802 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-07 18:35:22 +00:00
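Items c) and d) of commit ca7810f11d describe two standard log-space optimizations. The sketch below illustrates them under the following assumptions:

- Logarithms are base 10.
- Qualities 0 to 60 are cached.
- The helper names `ERROR_PROB`, `LOG10_MATCH_PROB` and `log10_sum` are hypothetical and are not GATK identifiers.

`log10_sum` uses the exact identity log10(10^x + 10^y) = max + log10(1 + 10^(min - max)). The softMax in ExactAFCalculationModel may be implemented differently, for example with an approximation.

```python
import math

# Hypothetical cache bound, following "for all k <=60" in the commit message.
MAX_QUAL = 60

# Precompute the error probability 10^(-k/10) for each quality k once,
# instead of recomputing a power for every base.
ERROR_PROB = [10.0 ** (-k / 10.0) for k in range(MAX_QUAL + 1)]

# Precompute log10(1 - 10^(-k/10)), the log probability that the base is correct.
# At k = 0 the error probability is 1, so the log is -infinity.
LOG10_MATCH_PROB = [
    math.log10(1.0 - p) if p < 1.0 else float("-inf") for p in ERROR_PROB
]


def log10_sum(x: float, y: float) -> float:
    """Return log10(10**x + 10**y) without converting out of log space."""
    hi, lo = max(x, y), min(x, y)
    if lo == float("-inf"):
        # 10**-inf contributes nothing to the sum.
        return hi
    # Factoring out the larger term keeps 10**(lo - hi) in (0, 1], avoiding overflow.
    return hi + math.log10(1.0 + 10.0 ** (lo - hi))
```

Looking up `LOG10_MATCH_PROB[q]` replaces a power and a logarithm per base with a list index. This is the kind of saving item c) describes. Staying in log space throughout, as item f) recommends, avoids the precision loss of repeated log-to-linear round trips.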


3966 Commits (e313eeede8d4d54a1e55148c064d60056690d214)