gatk-3.8

Commit Graph

Author	SHA1	Message	Date
kshakir	fc8acd503e	Enabled the parameterize option for debugging PipelineTest MD5s. Fixed escaping expressions that have more than one space between arguments. Updated example to match the wiki. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5516 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-26 00:41:47 +00:00
chartl	fe7f45ee2e	First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track"). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 19:35:39 +00:00
hanna	ac39f5532e	Turn off index caching. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5514 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 18:48:23 +00:00
hanna	8d8aed6a67	Fix correctness issue when dynamically merging many files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5512 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 16:35:43 +00:00
delangel	c9283e6bc5	Refinement to previous commit: no need to duplicate code to annotate rsID since variantAnnotatorEngine is called from UG anyways. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5511 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 15:00:32 +00:00
delangel	3383733379	Same commit as previous one for VariantAnnotator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5510 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 12:07:18 +00:00
delangel	8701dfe8d3	Hideous, horrible, hairy mutant bug: when we annotate ID field in indels, we were looking for SNP records matching the position, instead of indel records. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5509 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 12:04:08 +00:00
kshakir	3e3ff4a9e7	Bam gathering passes on the compression_level and the create_index flag to MergeSamFiles. VCF gathering passes on the no_header and sites_only flags to CombineVariants. Fixed deletion of gathered log files. Although they are intermediate and do not need to be re-run if not present, they should not be deleted. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5508 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-25 03:58:38 +00:00
carneiro	47279ee56e	Added --concordance option that outputs the intersection between two VCF files. Useful to see what calls were made in both technologies/algorithms. Wiki has been updated accordingly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5507 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 21:27:16 +00:00
kshakir	e47513f043	Minor updates to match the wiki documentation. Upper cased the PartitionType enum values. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5506 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 20:22:23 +00:00
kshakir	f3e94ef2be	Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output. JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar. JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar. Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath. Walkers from the GATK package are now also embedded into the Queue package. Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP. Removed the GATK jar argument from the example QScripts. Removed one of the most FAQ when getting started with Scala/Queue, the use of Option[_] in QScripts: 1) Fixed mistaken assumption with java enums. In java enums can be null so they don't need nullable wrappers. 2) Added syntactic sugar for Nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, ex: myFunc.memoryLimit = 3 Removed other unused code. Re-fixed dry run function ordering. Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 14:03:51 +00:00
ebanks	18271aa1f4	It never fails to amaze me that aligners can find so many different ways to place indels off the ends of contigs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5503 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 04:17:23 +00:00
ebanks	48b15d42e0	More fixes and improvements. We no longer use any bases under Q20 because random ~Q5s were cluttering the graphs; instead we grab any contiguous segments of size at least MIN_SEQUENCE_LENGTH where all bases are above Q20. Also, I implemented a quick algorithm to traverse the graph (using DFS) to choose the two best scoring paths (haplotypes). Used it successfully at NA12878 HM3 SNP sites to determine whether they are homozygous (no distinction yet between ref and alt) or heterozygous! Indels are the next target. Still have some issues to work out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5502 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-24 03:51:19 +00:00
hanna	26e3bea76e	Fix for == used to test object equality. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5499 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 18:15:19 +00:00
ebanks	401d1cb97f	Bug fixes plus some debugging code added. Broke out DeBruijnVertex into its own class so that the interface is now cleaner. Still very much a work in progress. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5498 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 17:35:34 +00:00
hanna	37fbf17da8	Finally restored code after accidentally removing three days worth of work: schedule file infrastructure has been restored, and is now a single file. Only the exact bins required for the traversal are stored in the schedule. Very close to being able to merge schedule entries. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5497 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 05:52:40 +00:00
ebanks	69646ff840	... and the corresponding integration test update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5496 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 01:58:07 +00:00
ebanks	ded80e0c57	Trivial change to remove space at the end of the description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5495 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-23 01:47:46 +00:00
carneiro	3414bccb46	documentation changes to agree with the wiki git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5494 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 21:48:49 +00:00
carneiro	28149e5c5e	GenotypeAndValidate version 2, ready to be used. - now it differentiates between confident REF calls and not confident calls. - you can now use a BAM file as the truth set. - output is much clearer now dataProcessingPipeline version 2, ready to be used. - All the processing is now done at the sample level - Reads the input bam file headers to combine all lanes of the same sample. - Cleaning is now scattered/gathered. Intelligently breaks down in as many intervals as possible, given the dataset. - Outputs one processed bam file per sample (and a .list file with all processed files listed) - Much faster, low pass (read Papuans) can run in the hour queue. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 20:18:02 +00:00
chartl	687b2e51b4	Switch from togglable wiggle output to togglable bedgraph format. Can be pulled directly into IGV to show the statistics values. I'll need to bug jim to allow value-toggling in a bedgraph, currently 2nd and 3rd columns are just ignored. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5492 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 17:58:53 +00:00
chartl	5a79f16ea4	Fixed an edge case where an exception was thrown if either of the sets was empty for the MWU test. Also altered the output format so U itself is not printed (which though interesting, isn't so useful for recalibration), but rather a value I call V (really the deviation of U from its expectation). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5490 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 16:28:44 +00:00
ebanks	af7f78e8ba	Minor debugging output change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5488 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 12:59:26 +00:00
ebanks	b463faad92	Fixing typo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5487 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 03:57:11 +00:00
ebanks	1a9e65bcd4	Updating other walkers now that VCC extends from VC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5486 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 03:10:40 +00:00
ebanks	0ee687e49d	For Mauricio: now, even in GENOTYPE_GIVEN_ALLELES mode, the VariantCallContext (which now inherits directly from VC) will report reference calls as confidently called if they pass the threshold even if the QUAL of the record itself is low because we were forced to have an ALT allele. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5485 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-22 02:42:28 +00:00
ebanks	ab6a815184	As per the comments in the commit itself: when reads get mapped to the junction of two chromosomes (e.g. MT since it is actually circular DNA), their unmapped bit is set, but they are given legitimate coordinates. The Picard code will come in and move the read all the way back to its mate - which can be arbitrarily far away and cause records to be written out of order. Very evil. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5484 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 20:30:24 +00:00
ebanks	d9202f2764	Don't try to create a GenomeLoc from an unmapped read git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5480 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-21 13:46:55 +00:00
ebanks	1c95208e26	Finally found the bug that everyone is reporting on GS. Iterators on PriorityQueues aren't guaranteed to return elements in sorted order (a pretty stupid contract) - so we were passing items to the constrained writer out of order. Just do a Collections.sort instead (1 line of code). Happy father's day! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5476 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 21:28:19 +00:00
ebanks	9568c84af9	Don't output these messages in INFO mode because they are scaring people unnecessarily git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5475 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:55:22 +00:00
depristo	22ff2573d5	Removed MAG entirely git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5474 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:43:23 +00:00
kiran	55897631ad	Initial attempt at identifying potentially interesting variants in a Mendelian disease context when the called genotypes are uncertain. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5473 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:41:35 +00:00
kshakir	b2b8a4f19f	Re-un-final'ed BAQ.MAG as it was pre r5469. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5472 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 19:40:31 +00:00
asivache	1d5326ff0c	Minor fixes to the cmd-line help messages git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5470 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 18:18:04 +00:00
depristo	7857cb5a22	Waiting to go to the hospital -- fixed a bug in the BAQ calculation where the BAQ would NPE if a read had no usable bases (all clipped, for example) but didn't fail the PF filter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5469 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 17:45:21 +00:00
fromer	e84a27ceea	OverlapWithBedInIntervalWalker calculates the average per-input-interval coverage by the BED intervals track git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5468 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 17:44:46 +00:00
depristo	abc7d1aef9	BeagleOutputToVCF now accepts an option to keep monomorphic sites. This is useful to genotype a single sample, where having AC=0 just means that the sample is hom-ref at the site. ProduceBeagleInputWalker can optionally emit a beagle markers file, necessary to use the beagled reference panel for imputation. Also supports the VQSR calibration curve idea that a site can be flagged as a certain FP, based on the VQSLOD field. This allows us to have both continuous quality in the refinement of sites as well as hard filtering at some threshold so we don't end up with lots of sites with all 1/3 1/3 1/3 likelihoods for all samples (i.e., a definite FP site where we don't know anything about the samples). Added a new VariantsToBeagleUnphased walker that writes out a marker drive hard-call unphased genotypes file suitable for imputating missing genotypes with a reference panel with beagle. Can optionally keep back a fraction of sites, marked as missing in the genotypes file, for assessment of imputation accuracy and power. The bootstrap sites can be written to a separate VCF for assessment as well. Finally, my general Queue script for creating and evaluating reference panels from VCF files. Supports explicitly genotyping a BAM file at each panel SNP site, for assessment of imputation accuracy of a reference panel. Lots of options for exploring the impact of the VQS likelihooods, multiple VCFs for constructing the reference panel, as well as fraction of sites left out in assessing the panel's power. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5467 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 03:08:38 +00:00
depristo	9b8d41160b	GENOTYPE_GIVEN_ALLELES now respects the filter status of the incoming alleles file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5466 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 02:59:28 +00:00
depristo	6281c1db6f	A nicer error (UserException now) for malformed genome locs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5465 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-18 02:58:29 +00:00
delangel	b45afe5ba8	Several major fixes and changes to new indel likelihood model: a) Scrapped the way in which we constructed candidate haplotypes because it wasnt fully correct and yielded corner conditions with incorrect genotyping and likelihood computation. Ideally, a haplotype should "cover" the read and the most likely alignments should be such that the ends of the read are inside the ends of the haplotype. This wasn't happening, and if you have a "dangling read off a haplotype" the probabilistic alignment model may prefer to shift a read instead of scoring it correctly - this is especially bad with tandem repeat insertions. So now, we build haplotypes based on the reference context and adaptively change them based on read alignment positions, plus some padding and uncertainty in the alignment. b) Changed the way soft clipped based are dealt with. Instead of either ignoring them or using them, we only use them if the read start or end position (after soft clipping) are within eventDistance of the current location. This is done because it's very common that BWA's strictly local SW implementation will soft clip every single read at an insertion position because it couldn't place that end of the read without too many mismatches, but the read is legit and the bases are good quality. If we don't take these bases into consideration, reads which are informative of an insertion event are essentially discarded because the informative part is clipped away. c) Several cleanups and fixes to the context-dependent gap penalty model based on length of HRun. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5464 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 18:39:31 +00:00
depristo	cd38dfb4ef	Now with a clearer, grammatically correct message git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5462 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 18:06:05 +00:00
depristo	10466dc7d1	I finally broke down and added a default documentation string to @Input for use in Queue scripts. It's not ideal, but I couldn't take any more queue scripts with doc="x" all over the place. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5461 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 18:05:25 +00:00
depristo	c1798a7dbc	Whitespace cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5460 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 18:04:08 +00:00
corin	30237e6824	Updated the walker to specify the build based on the user's input file name if the user does not specify the build. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5459 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 17:49:17 +00:00
carneiro	3de300e504	A walker that moves annotations from the filter field to the info field of truth annotated vcfs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5458 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 17:11:28 +00:00
ebanks	481750cbf9	Probable patch to Jerry Glenn's GetSatisfaction report. I'm having him test it out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5456 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 16:00:50 +00:00
ebanks	3eea6e92b7	An extremely basic implementation of a deBruijn-based local assembler, using the jgrapht graph library. This is not at all optimized and has only been tested on my very simple 3-read test bams. I'm sure there are bugs in there - more testing coming soon. Insertions and deletions confirmed to generate identical graphs (except for the multiplicity of edges of course). Not worth using yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5455 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-17 14:03:07 +00:00
hanna	28a5a177ce	Very crude implementation of writing BAM 'schedules' to disk rather than 'meta-indexes'. Not yet elegant, but proves that it circumvents the performance issues associated with the meta-index. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5454 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-16 21:48:47 +00:00
rpoplin	8d0880d33e	Misc cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5453 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-16 17:33:19 +00:00
rpoplin	c6ef6ee8b7	Recal file is an input to ApplyRecalibration, not an output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5452 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-16 12:08:58 +00:00
rpoplin	8e89ff170e	Can't check substitution type of tri-allelic SNPs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5451 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-16 03:06:03 +00:00
carneiro	e2e435d52c	GenotypeAndValidate: now looks at annotations in the INFO field instead of filter field. Better output and filters repetitive calls to indel extended events. IndelUtils: added a isInsideExtendedIndel() method to filter the above mentioned. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5449 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 21:54:40 +00:00
rpoplin	d98503ca50	Removing some debug code from VQSRv2. VariantEval can now be stratified by contig with -ST Contig. New hidden option in CombineVariants for overlapping records to take the info fields from the record with the highest AC (while still updating AC/AN/AF correctly) instead of dropping info fields which aren't exactly the same. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5448 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 21:28:10 +00:00
carneiro	4b9b767eb1	SelectVariants: now keeps the YAML stuff internal... it's there if you wanna use it, but won't be published anymore. Official parameter is the string for now. VariantEval: now sports the new MendelianViolation utility class. MendelianViolationClassifier: I noticed I had broken chartl's walker by changing VariantEval, so I took the liberty to modify it to use the new library too, though I kept modifications to a minimum, could have gone into full integration if this is a useful tool, but since it's in oneoffs, I decided not to go all out. MendelianViolation: Some getter methods were added for chartl and VariantEval. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5447 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 18:36:55 +00:00
delangel	653fb09bb7	a) Next iteration of context-dependent gap penalty model for new probabilistic alignment indel model. Actual model is now implemented, computes homopolymer run profile for candidate haplotypes and looks up in table gap penalties based on hrun length at each position. Initial penalty model is a very naive affine penalty model with each extra hrun increment decreasing Q2 the gap open penalty, until a minimum is reached. Still needs to be tuned and ideally get data from recalibration. b) small bug fix when setting debug arguments git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5446 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-15 16:46:28 +00:00
rpoplin	bbcc4ed700	The second pass of the contrastive VQSRv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5444 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 21:05:02 +00:00
rpoplin	2a2538136d	A version of VQSRv2 that does contrastive clustering in two passes. The walkers will be renamed when they are moved to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5443 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 21:03:56 +00:00
carneiro	fcc347bb05	making sure the output is as pretty as I said it would be on the wiki. wikipage for this walker is up, at : http://www.broadinstitute.org/gsa/wiki/index.php/Genotype_and_Validate#Examples use it ;) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5442 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 20:32:09 +00:00
ebanks	239dae0985	Absolutely nothing to get excited about. This is just the skeleton for the local assembler. It doesn't do anything at all now except for collect reads over each -L interval and pass them to an assembly engine (which isn't implemented yet). The interface for the AssemblyEngine will change later, but for now this one is the most conducive to debugging. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5441 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 20:31:54 +00:00
corin	6d09cdd4bc	This is a walker that lets the user generate the bed file for declaring variants true positives or false positives. For use with the IGV crowd sourcing project. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5440 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 19:56:16 +00:00
depristo	f75ad0dee3	Now in Picard, and released to the public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5439 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 19:36:56 +00:00
carneiro	9dfe4c9cb7	moving GenotypeAndValidate to the playground. It's ready to be used. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5438 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 19:19:18 +00:00
carneiro	33c7593218	YAML integrated mendelian violation utility class, integrated and tested through select variants. Wiki is updated. ps: I moved it out of tribble. If you think it should reside in a different place, just yell at me. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5436 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 16:43:37 +00:00
hanna	5406e779d2	Ryan noticed that I accidentally killed a public interface method for getting tag information. Reinstated. Proper unit test to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5434 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-14 15:51:19 +00:00
depristo	3e3ec85807	Checked for consistency with the previous integration tests, and updated the walker and test to use the new I/O system (always prints 4 digits on floats). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5433 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-13 15:24:22 +00:00
depristo	b99e27bf9b	In the process of optimizing ProduceBeagleInputWalker, discovered that the GenotypeLikelihoods, the UG, and Genotype objects were using old-style GL tags internally, and then converting from Likelihoods -> GL String -> Likelihoods -> PL String throughout the GATK. It was both painful and led to convoluted code throughout the system. Removed everything but GL conversion -> PL in the GenotypeLikelihoods objects, and now all of the codes in UG now immediately provides GenotypeLikelihoods to the Genotype objects, which is converted straight to PL now. Resulted in a 30% speed up in ProduceBeagleLikelihoods, passes integration tests without any modifications, and likely speeds up writing any VCFs with likelihoods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5432 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-13 00:07:51 +00:00
rpoplin	ceb08f9ee6	Moving some math around in VQSRv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5431 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 15:15:05 +00:00
depristo	d01d4fdeb5	Optimized version of produce beagle tool, along with experimental (hidden) support for combining likelihoods depending on estimated false positive rate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5430 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 02:06:28 +00:00
depristo	ee8f2871f7	A better output for Genotype Concordance summary. Now does only % comp hom-ref called hom-ref, het called het, and hom-var called hom-var, which are the quantities we typically show in slides. Updated integration tests to reflect this change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5429 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 02:03:48 +00:00
kshakir	93de326066	Added a new @PartitionBy for walkers to specify how to cut up their inputs. Now building all javadoc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5428 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 01:33:08 +00:00
delangel	8ca3390ee0	Low level plumbing work required to have a context dependent error model with the new indel probabilistic alignment model. This just adds an extra input argument and does some refactoring so that when an actual model is ready it will be easy to plug in. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5427 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-12 00:00:55 +00:00
carneiro	e35a67b3cc	changed the name of the parameter to make the wiki more uniform. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5426 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-11 17:54:53 +00:00
carneiro	4a84a81d17	SelectVariants: added parameters for mendelian violation. Given a trio vcf, it will generate a VCF with the sites that are mendelian violations. GenotypeAndValidate: now annotates the validations with callStatus. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5425 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-11 17:47:53 +00:00
delangel	b03055099a	a) Changed the way we classify and log indel events (e.g. in IndelClasses table inside IndelStatistics VE module). Made names clearer, and split logging of event length with number of repetitions of event. b) Add an experimental annotation to log indel type string inside the INFO field, just for debugging/temp analysis purposes (will consider making it standard if it proves useful). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5424 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-11 17:37:41 +00:00
rpoplin	b3464a6031	Initial commit of VQSRv2 that passes the old integration tests. Not ready to be used yet unless your name rhymes with ... oh wait, that's me. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5419 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-11 15:18:34 +00:00
depristo	ccc773d175	Refactoring, cleanup, and performance improvements to ProduceBeagleInput. It's really a shame that there's no integration tests... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5418 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-11 13:55:30 +00:00
kshakir	097a9a59e8	Updated LSF libraries to use Pointer instead of Structure.ByReference for struct arrays since the latter is autoRead() and LSF doesn't always return null for empty arrays. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5417 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-10 22:58:54 +00:00
ebanks	4baeb5979f	It turns out that Math.log10() can return 0, which leads to QUALs being set to -0, which is off-spec. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5415 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-10 03:08:56 +00:00
ebanks	3596c56602	New attempt at the constrained movement version of the indel realigner (I've kept around the old writer for now). The new contract is that the realigner must ask permission before trying to clean an area; permission will be denied by the CM-Manager if it was required to flush its cache of reads because of too much depth within a distance of maxInsertSizeForMovingReadPairs. Added integration tests to cover different max cache sizes, including an expected exception when too small a value is chosen. The actual logic changes were fairly minor - much of this commit is really just some cleanup. I'd like to throw 1000G Phase I at it, but will respectfully wait for Ryan to hit his deadline before doing so. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5414 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-10 02:48:29 +00:00
rpoplin	ff7edc4493	Minor bug fix in empiricalMu prior calculation in VQSR. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5412 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-10 00:42:38 +00:00
fromer	0b45de14ed	Some minor updates to fully utilize the functionality of reduceByInterval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5411 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-09 20:38:08 +00:00
rpoplin	509daac9f7	Minor bug fix in k-means implementation. Updating VQSR integration tests in preparation for VQSRv2 by removing some unused features such as VariantDatum.weight and ti/tv cutting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5410 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-09 00:26:28 +00:00
carneiro	fa7284b7a1	Genotype And Validate walker is now ready to be used by anyone. given an annotated VCF and a BAM file, it genotypes (using the reads in the BAM) each variant in the VCF (for snp or indel) and validates (or not) the 'known' annotation. Outputs a truth table with the PPV and NPV values, and optionally a vcf file with the variants that had enough coverage to be validated. You can optionally provide a minimum depth of coverage and only do the analysis conditional on that. (will write a wiki for this walker, as it might be useful for future validation essays). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5409 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 22:10:38 +00:00
chartl	da88c29b6e	Added a module to test for reference mismatch associations, and a self-normalized/self-normalizing version. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5408 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 20:01:28 +00:00
chartl	31a2575c7b	Fixes: - Don't know how I got the wiggle header so utterly wrong. Fixed. - Q-values now have a static maximum of 2000 so IGV averaging won't make everything look spikey and ugly. - Changing windows to size 100 for (hopefully) better resolution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5406 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 17:16:21 +00:00
chartl	1b310401fe	Due to the approximation not being well-founded in this case, (and the non-existence of a pre-computed table at this time), pushing up the cutoff git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5405 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 16:24:42 +00:00
delangel	00ac51acc8	Added several integration tests for UG indel caller: - Basic - Multiple technology - Test minIndelCnt parameter Added also 2 disabled tests: - Parallelization: issue w/code right now is that if -nt > 1, filter field shows "PASS" instead of ".", cause TBD - Genotype given alleles mode: code not working yet. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5404 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 16:21:21 +00:00
chartl	77fe902dbd	Testing modules now use wider windows and heftier shift, hopefully this will remove some of the noisiness of the results. Some UStatistics were changed to TStatistics to try and limit noisiness as well. Walker will also additionally write out wiggle files directly (which can be converted into "proper" tdf files via igvtools tile [args] [in].wig [out].tdf [ref]) subject to some restrictions. MWU could get stuck in a long-running recursive regime, it'd be nice to have a table lookup or a good small-n large-m approximation, for now the uniform should work just fine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5403 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 15:26:13 +00:00
carneiro	b733cba7c7	re-fixing for a different approach suggested by eric! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5402 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-08 04:54:49 +00:00
kiran	d0598c7a04	Somehow missed this test when I was updating the md5s git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5400 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 23:53:42 +00:00
kiran	b6339967f8	Updated GenomicAnnotator integration tests to include the -NO_HEADER argument so that the tests stop yelling about trivial differences git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5398 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 23:07:01 +00:00
hanna	85ff983a59	Failed to include some required GenomeLoc utilities in my last commit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5397 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 23:00:17 +00:00
carneiro	02006954bc	UG: small bug fix when creating empty variant contexts in UG for the -EMIT_ALL_SITES to allow indels. GAV: First version of the walker that validates reads from a BAM file based on an annotated VCF with TP/FP annotations. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5396 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 22:51:04 +00:00
hanna	9384b2ff65	A few quick fixes to temporarily make the LowMemorySharder return exactly the same shards as the previous sharder, so that I can directly compare filespans to see where some performance bugs lie. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5395 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 22:43:14 +00:00
depristo	0b4e51317b	Now includes project consensus high sensitivity data set git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5394 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 20:52:11 +00:00
kiran	43056d0188	Fixed integration test to reflect changes regarding when comp tracks got subset to fewer samples and whether no-call sites would get pulled in for comp tracks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5393 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 20:25:57 +00:00
carneiro	73e43d8d2c	Added functionality: -disc (--discordance) parameter together with a ROD track will output a VCF with the variants in the ROD track that are not present in the 'variants' VCF. Useful tool to list the variants from hapmap (for example) that weren't called in a dataset. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5392 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 19:18:15 +00:00
kshakir	dc33fbed7c	Switched the CVUnitTest broken info from an Integer to a String since as of r5383 Integers are no longer broken when converted to Floats. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5390 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 16:33:14 +00:00
delangel	8c262eb605	Initial commit of new likelihood model to evaluate indel quality. Principle is simple, a plain Pair HMM with affine gap penalties (in log space) that does quasi-local alignment between reads and candidate haplotypes and which in theory should be more solid and more reliable than the older Dindel-based model. It also allows to be easily extensible in the future if we decide to introduce either context-dependent and/or read-dependent gap penalties. Model is disabled by default and we're still using the old Dindel model until I'm more confident that new model is a definitive improvement, so right now this is enabled by hidden command line arguments, and it's not to be used yet. In detail: a) Several refactorings to share softMax() available to other modules, so its now part of MathUtils. b) Refactored a couple of read utilities and moved from BAQ to ReadUtils. c) New PairHMMIndelErrorModel class implementing new likelihood model d) Several new hidden debug arguments in UAC. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5389 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 15:31:58 +00:00
kshakir	96fe540d66	Removing .tmp~ file. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5388 348d0f76-0448-11de-a6fe-93d51630548a	2011-03-07 14:52:38 +00:00

4436 Commits (ab5c4064edb2ade90da3fef96c10df5cd5a2d430)