gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	04b122be29	Fix for bug reported on GetSatisfaction	2011-11-09 20:33:36 -05:00
Mauricio Carneiro	d00b2c6599	Adding a synthetic read for filtered data * Generalized the concept of a synthetic read to create both running consensus and a synthetic reads of filtered data. * Synthetic reads can now have deletions (but not insertions) * New reduced read tag for filtered data synthetic reads (RF) * Sliding window header now keeps information of consensus and filtered data * Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads	2011-11-09 20:16:22 -05:00
Eric Banks	21bf43f3bb	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-09 15:34:40 -05:00
Eric Banks	02d5e3025e	Added integration test for intervals from bed file	2011-11-09 15:34:19 -05:00
Christopher Hartl	85bffe1dca	Merged bug fix from Stable into Unstable	2011-11-09 15:29:14 -05:00
Christopher Hartl	d828eba7f4	Allow comments in a table-formatted file to precede the header line.	2011-11-09 15:27:38 -05:00
Eric Banks	8205efbb29	Merge branch 'master' into intervals	2011-11-09 15:27:15 -05:00
Eric Banks	d64f8a89a9	Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.	2011-11-09 15:24:29 -05:00
Mauricio Carneiro	f080f64f99	Preserve RG information on new GATKSAMRecord from SAMRecord	2011-11-09 14:39:20 -05:00
Mauricio Carneiro	f9530e0768	Clean unnecessary attributes from the read this gives on average 40% file size reduction.	2011-11-09 14:39:20 -05:00
Mauricio Carneiro	9427ada498	Fixing no cigar bug empty GATKSAMRecords will have a null cigar. Treat them accordingly.	2011-11-09 14:39:20 -05:00
Mark DePristo	e639f0798e	mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set -- A bit of code cleanup in VCFUtils -- VariantEval table to create 1000G Phase I variant summary table -- First version of 1000G Phase I summary table Qscript	2011-11-09 14:35:50 -05:00
Christopher Hartl	149b79eaad	Merged bug fix from Stable into Unstable	2011-11-09 11:26:30 -05:00
Christopher Hartl	11abb4f9d1	Better error message.	2011-11-09 11:25:28 -05:00
Christopher Hartl	d3a533b82e	Revert "a" This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.	2011-11-09 11:22:26 -05:00
Christopher Hartl	5eaf800281	a	2011-11-09 11:22:20 -05:00
Christopher Hartl	5451fbc2b2	Merged bug fix from Stable into Unstable	2011-11-09 11:06:15 -05:00
Christopher Hartl	091229e4db	MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods. Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten. Table feature now has a toString method. These are minor bug fixes.	2011-11-09 11:03:29 -05:00
Mauricio Carneiro	e1b4c3968f	Fixing GATKSAMRecord bug when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.	2011-11-08 16:50:36 -05:00
Ryan Poplin	e973ca2010	fixing merge conflict.	2011-11-08 14:55:05 -05:00
Ryan Poplin	b0e6afec48	Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy.	2011-11-08 14:51:25 -05:00
Laurent Francioli	571c724cfd	Added reporting of the number of genotypes updated.	2011-11-08 15:15:51 +01:00
Ryan Poplin	94dc447a70	Merged bug fix from Stable into Unstable	2011-11-07 15:26:35 -05:00
Ryan Poplin	0b181be61f	Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.	2011-11-07 15:25:16 -05:00
Ryan Poplin	0534149708	Merged bug fix from Stable into Unstable	2011-11-07 14:07:08 -05:00
Ryan Poplin	2d1e385ca4	Adding note to VQSR docs about Rscript being needed in the environment PATH.	2011-11-07 14:04:13 -05:00
Eric Banks	759f4fe6b8	Moving unclaimed walker with bad integration test to archive	2011-11-07 13:16:38 -05:00
Eric Banks	c1986b6335	Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated.	2011-11-07 11:06:19 -05:00
Eric Banks	724e3f3b0d	Merged bug fix from Stable into Unstable	2011-11-06 22:23:22 -05:00
Eric Banks	cdd40d1222	Removing contracts for the SimpleTimer	2011-11-06 22:22:49 -05:00
Ryan Poplin	5c565d28b9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-06 10:26:19 -05:00
Eric Banks	3517489a22	Better --sample selection integration test for VE. The previous one would return true even if --sample was not working at all.	2011-11-06 01:07:49 -04:00
Eric Banks	1c4e429a1c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-06 00:05:56 -04:00
Eric Banks	a12bc63e5c	Get rid of support for bams without sample information in the read groups. This hidden option wasn't being used anyways because it wasn't hooked up properly in the AlignmentContext.	2011-11-05 23:54:28 -04:00
Eric Banks	ad57bcd693	Adding integration test to cover using expressions with IDs (-E foo.ID)	2011-11-05 23:53:15 -04:00
Eric Banks	90a053ea93	Don't change the mapping quality of MQ=255 reads in IR	2011-11-05 22:40:45 -04:00
Ryan Poplin	611a395783	Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK.	2011-11-05 12:18:56 -04:00
Mark DePristo	e99871f587	Bug fix for decode loc -- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.	2011-11-04 13:20:54 -04:00
Mark DePristo	a340a1aeac	Bug fix. decodeLoc() should update lineNo so you get meaningful line no when indexing due to malformed VCF files.	2011-11-04 11:44:24 -04:00
Mark DePristo	9f260c0dc1	Zero byte index bug fix for RandomlySplitVariants + cleanup -- vcfWriter2 was never being closed in onTraversalDone(), so the on the fly index file was being created but never actually properly written to the file. -- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though -- Removed the unnecessary local variable iFraction, = 1000 * the input fraction argument. Now the system just uses a double random number and compares to the input fraction at all. Is there some subtle reason I don't appreciate for this programming construct?	2011-11-04 09:45:20 -04:00
Mauricio Carneiro	e89ff063fc	GATKSAMRecord refactor The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...). * No tools should create SAMRecord anymore, use GATKSAMRecord instead *	2011-11-03 15:43:26 -04:00
Laurent Francioli	385a6abec1	Fixed a bug that wrongly swapped the mother and father genotypes in case the child genotype missing.	2011-11-03 13:04:53 +01:00
Laurent Francioli	893787de53	Functions getAsMap and getNegLog10GQ now handle missing genotype case.	2011-11-03 13:04:11 +01:00
Eric Banks	e8bceb1eaa	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 21:13:54 -04:00
Eric Banks	78a00d2ddc	Updating UG integration tests (needed updating only because the -mbq default is different from the old -mmq one).	2011-11-02 21:13:44 -04:00
Eric Banks	52b16bf739	Must check whether there's a normal vs. extended pileup before asking for it.	2011-11-02 20:45:24 -04:00
Eric Banks	e1edd6bd12	Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.	2011-11-02 20:32:58 -04:00
Ryan Poplin	e94fcf537b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 16:29:19 -04:00
Ryan Poplin	4d35272916	Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller.	2011-11-02 16:29:10 -04:00
Mark DePristo	8a2929c1dd	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 16:21:00 -04:00
Laurent Francioli	19ad5b635a	- Calculation of parent/child pairs corrected - Separated the reporting of single and double mendelian violations in trios	2011-11-02 18:35:31 +01:00
Eric Banks	967ff647b8	Reduced reads shouldn't contribute to Fisher Strand calculations	2011-11-02 13:07:20 -04:00
Eric Banks	cf0e699226	QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes.	2011-11-02 12:58:38 -04:00
Eric Banks	4501dce58d	Fixing merge conflict	2011-11-02 12:50:32 -04:00
Eric Banks	54331b44e9	New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.	2011-11-02 12:47:30 -04:00
Mark DePristo	392e0aeace	Moved unit tests into master IntervalUtilsUnitTest	2011-11-02 10:52:00 -04:00
Mark DePristo	c2b97030a4	IntervalUtils for completely balanced locus-based scatter/gather -- scatterLocusIntervals master utility -- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc -- Util function for reversing a list (List<T> -> List<T>, unlike Collections version) -- DoC is PartitionType.INTERVAL -- Significant unit tests on new functionality (all passing) -- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work	2011-11-02 10:49:40 -04:00
Laurent Francioli	119ca7d742	Fixed a bug in parent/child pairs reporting causing a crash in case the -mvf option was used and mother was not provided	2011-11-02 08:22:33 +01:00
Laurent Francioli	b91a9c4711	- Fixed parent/child pairs handling (was crashing before) - Added parent/child pair reporting	2011-11-02 08:04:01 +01:00
Mark DePristo	5fc613f972	Better default partition types for walkers -- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better -- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.	2011-11-01 19:47:10 -04:00
Mauricio Carneiro	36600fd8e9	added MQ of low MQ/BQ to consensus RMS Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).	2011-11-01 17:46:12 -04:00
Mauricio Carneiro	b004489c6d	Moving ReduceRead TAG to GATKSAMRecord ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.	2011-11-01 17:12:09 -04:00
Mauricio Carneiro	17cc484dbd	Revert "ReduceReads ref bases are now output as '=' Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this a an engine support, not a part of ReduceReads.	2011-11-01 16:35:07 -04:00
Eric Banks	0839c75c8d	More minor fixes to docs	2011-10-31 21:49:27 -04:00
Eric Banks	74b018a1f3	Minor fixes to docs	2011-10-31 21:41:43 -04:00
Eric Banks	31ee5432c5	Merged bug fix from Stable into Unstable	2011-10-31 14:56:59 -04:00
David Roazen	cdde32acbd	Merged bug fix from Stable into Unstable	2011-10-31 14:21:15 -04:00
Eric Banks	f62af0291b	Check for invalid VCF records (not enough tokens) instead of assuming they are there.	2011-10-31 14:09:51 -04:00
Andrey Sivachenko	bed0acaed4	nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them	2011-10-31 12:28:28 -04:00
Mauricio Carneiro	389380a590	ReduceReads ref bases are now output as '=' to save space Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.	2011-10-30 12:04:39 -04:00
Eric Banks	0ca7428e76	Allow processing of empty intervals, but warn user when this case is encountered.	2011-10-28 12:12:14 -04:00
Eric Banks	649dfe98f0	Add VCF header for any expressions that are requested	2011-10-28 10:22:19 -04:00
Eric Banks	8b1a62da27	Adding unit test to cover overlapping intervals from the same source with the intersection rule.	2011-10-28 09:59:43 -04:00
Eric Banks	057a79f598	This argument should be annotated as @Input	2011-10-28 09:44:49 -04:00
Eric Banks	4ba7c0cecd	Moving to private	2011-10-28 09:29:28 -04:00
Eric Banks	1bdd76c2f2	These tools now use the IntervalBinding system to handle intervals instead of doing it all manually	2011-10-28 09:28:12 -04:00
Eric Banks	6ba08a103d	Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.	2011-10-28 09:23:25 -04:00
Eric Banks	3d04bb5608	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-27 23:55:18 -04:00
Eric Banks	19e27d4568	Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.	2011-10-27 23:55:11 -04:00
Eric Banks	cafc245a43	For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?	2011-10-27 23:54:28 -04:00
Guillermo del Angel	cbc43683ee	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-27 20:54:18 -04:00
Guillermo del Angel	8907e42007	First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, b) Optionally, a set of samples which will restrict sites to be polymorphic in those samples, c) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible. More testing is needed and docs need improving but so far all functionality seems up and running	2011-10-27 20:53:48 -04:00
Eric Banks	ccfd853b34	Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.	2011-10-27 20:43:50 -04:00
Eric Banks	c2f343773e	Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.	2011-10-27 15:32:08 -04:00
Khalid Shakir	b80d407dc7	No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path. Other minor cleanup.	2011-10-27 14:17:07 -04:00
Eric Banks	8c4dbce6d8	Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.	2011-10-27 13:58:19 -04:00
Eric Banks	4a7e6fee3f	Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.	2011-10-27 13:38:08 -04:00
Matt Hanna	f7df8bdecc	Merged bug fix from Stable into Unstable	2011-10-27 11:31:17 -04:00
Matt Hanna	41ddc7bce7	Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.	2011-10-27 11:30:04 -04:00
Eric Banks	44f905b5e5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 23:31:11 -04:00
Eric Banks	68283b1651	Fixing docs and adding GATKdocs for the new interval functionality	2011-10-26 22:14:43 -04:00
Mark DePristo	c9978316a3	Merge branch 'FragmentUtils'	2011-10-26 19:51:49 -04:00
Mauricio Carneiro	add9ad97ec	No scatter gather for VQSR or ApplyVQSR. These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.	2011-10-26 16:35:44 -04:00
Ryan Poplin	74aeb22eeb	Merged bug fix from Stable into Unstable	2011-10-26 15:57:30 -04:00
Ryan Poplin	86871bd1e3	Throw a UserException in the BQSR when there is no data instead of creating an empty csv file	2011-10-26 15:56:41 -04:00
Mark DePristo	034a997d07	Generalized Reads -> Fragment calculation -- Supports ReadBackedPileup -> FragmentCollection as before -- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller -- General cleanup, renaming, move to separate package, more extensive unit tests, etc. -- Added toFragment() function to ReadBackedPileup interface	2011-10-26 15:54:38 -04:00
Eric Banks	2f21b6ecfb	Removed debugging output	2011-10-26 15:50:20 -04:00
Eric Banks	b39fcb1bea	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 15:44:25 -04:00
Eric Banks	b6ce6ed3f8	Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.	2011-10-26 15:42:53 -04:00
Eric Banks	3273c20c98	Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.	2011-10-26 15:29:18 -04:00
Eric Banks	9424e8b2ca	Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.	2011-10-26 14:11:49 -04:00
Mark DePristo	7fa943aef1	Renamed FragmentPileup to FragmentUtils	2011-10-26 14:01:45 -04:00
Laurent Francioli	1f044faedd	- Genotype assignment in case of equally likely combination is now random - Genotype combinations with 0 confidence are now left unphased	2011-10-26 19:57:09 +02:00
Laurent Francioli	81b163ff4d	Indentation	2011-10-26 14:49:12 +02:00
Laurent Francioli	62cff266d4	GQ calculation corrected for most likely genotype	2011-10-26 14:40:04 +02:00
Mark DePristo	af3613cc5f	GATKSAMRecord commit branch summary First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase? Now, on to the changes here: -- Picard added a SamRecordFactory that is used to create instances the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAMs files. -- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception. -- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum. -- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used. -- ReducedRead code has been moved from ReadUtils until efficiency caching assessors in GATKSamRecord. -- ArtificialSamUtils creates GATKSamRecords now, just SAMRecords. Added code here to create artifical pairs and using that code to create artificial ReadBackedPileups with specific properties -- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make a efficient version that works for non-pileups for the HaplotypeCaller	2011-10-25 20:52:56 -04:00
Mark DePristo	2822f0dc27	Merge branch 'SamRecordFactory'	2011-10-25 20:34:47 -04:00
Mark DePristo	1b722c21cf	merge master	2011-10-25 16:08:39 -04:00
Ryan Poplin	56fdf0b865	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-25 15:58:56 -04:00
Ryan Poplin	4a34c1862e	misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.	2011-10-25 15:22:28 -04:00
David Roazen	2794e5c1d4	Modified the VCFJarClassLoadingUnitTest to play nice with the packaged-jar test targets.	2011-10-25 14:47:15 -04:00
Guillermo del Angel	b559936b7a	a)New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different.	2011-10-25 09:56:43 -04:00
Khalid Shakir	fac9932938	Embedding gsalib source and queueJobReport R scripts in the dist and package jars. Moved gsalib and queueJobReport.R to embeddable namespaced locations. Updated packager dependencies/dir to add an @includes which filters the embedded fileset. RScriptExecutor can now JIT compiles the gsalib. RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG. Refactored ProcessController and IOUtils from Queue to Sting Utils. Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count. Replaced uses of some IOUtils with Apache Commons IO. ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown. Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().	2011-10-24 15:58:34 -04:00
Khalid Shakir	89a581a66f	Added ability to specify arguments in files via -args/--arg_file Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()	2011-10-24 15:58:34 -04:00
Mark DePristo	502592671d	Cleanup FragmentPileup before main repo commit -- removed intermediate functions. Now only original version and best optimized new version remain -- Moved general artificial read backed pileup creation code into ArtificialSamUtils	2011-10-24 14:40:05 -04:00
Mark DePristo	166174a551	Google caliper example execution script -- FragmentPileup with final performance testing	2011-10-24 14:04:53 -04:00
Laurent Francioli	62477a0810	Added documentation and comments	2011-10-24 13:45:21 +02:00
Laurent Francioli	38ebf3141a	- Now supports parent/child pairs - Sites with missing genotypes in pairs/trios are handled as follows: -- Missing child -> Homozygous parents are phased, no transmission probability is emitted -- Two individuals missing -> Phase if homozygous, no transmission probability is emitted -- One parent missing -> Phased / transmission probability emitted - Mutation prior set as argument	2011-10-24 12:30:04 +02:00
Laurent Francioli	7312e35c71	Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.	2011-10-24 10:25:53 +02:00
Laurent Francioli	01b16abc8d	Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.	2011-10-24 10:24:41 +02:00
Mark DePristo	f6ccac889b	Merged bug fix from Stable into Unstable	2011-10-23 16:37:12 -04:00
Mark DePristo	585a45b7a3	Bug fix for ClipReadsWalker when stats output isn't provided -- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification	2011-10-23 16:36:48 -04:00
Ryan Poplin	f5d910b8a5	Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.	2011-10-23 13:29:08 -04:00
Mark DePristo	42bf9adede	Initial version of "fast" FragmentPileup code -- Uses mayOverlapRoutine in ReadUtils -- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations -- PileupElement now comparable (sorts on offset than on start) -- Caliper microbenchmark to assess performance	2011-10-22 21:36:37 -04:00
Mauricio Carneiro	4913f8a60f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-21 17:45:07 -04:00
Mauricio Carneiro	102dafdcbc	Validation of GATKSamRecord in read filters Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.	2011-10-21 17:40:43 -04:00
Guillermo del Angel	f4b409fa0d	CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result	2011-10-21 14:07:20 -04:00
Mark DePristo	b863390cb1	Moving reduced read functionality into GATKSAMRecord -- More functions take / produce GATKSAMRecords instead of SAMRecord	2011-10-21 13:28:05 -04:00
Mark DePristo	2403e96062	Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.	2011-10-21 09:59:24 -04:00
Mark DePristo	110e13bc1e	Merge branch 'master' into SamRecordFactory	2011-10-21 09:43:52 -04:00
Mark DePristo	be797a8a1f	Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system	2011-10-21 09:39:21 -04:00
Mark DePristo	ed74ebcfa1	GATKSamRecords with efficiency NGSPlatform method	2011-10-21 09:38:41 -04:00
Mark DePristo	94e1898d8f	A canonical set of NGS platforms as enums with convenient manipulation methods	2011-10-21 09:37:45 -04:00
Laurent Francioli	edea90786a	Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment.	2011-10-20 17:04:19 +02:00
Laurent Francioli	1c61a57329	Original rewrite of PhaseByTransmission: - Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument - Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster - Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8) - Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het) - Basic reporting added: -- mvf argument let the user specify a file to report remaining MVs -- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output Known problems: - GQ is not recalculated even if the genotype changes Possible improvements: - Phase partially typed trios - Use standard Allele/Genotype Classes for the storage of the pre-calculated phase	2011-10-20 13:06:44 +02:00
Laurent Francioli	ef6a6fdfe4	Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.	2011-10-20 12:49:18 +02:00
Laurent Francioli	76dd816e70	Added getParents() -> returns an arrayList containing the sample's parent(s) if available	2011-10-20 12:47:27 +02:00
Mark DePristo	999a8998ae	Constructor for GATKSamRecord with header only, for unit testing	2011-10-19 17:51:48 -04:00
Mark DePristo	3227143a1c	Systematic test code for FragmentPileup -- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.	2011-10-19 17:50:27 -04:00
Mark DePristo	bba69701b5	Now creates GATKSamRecords now SamRecords	2011-10-19 17:49:17 -04:00
Christopher Hartl	cd8a6d62bb	You know how the wiki has a big section on commiting local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that. This commit contains: - IntronLossGenotyper is brought into its current incarnation - A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate) - RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type. + the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there. - MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added) + use this rather than a hard GQ threshold if you're doing MV analyses. - Some miscellaneous QScripts	2011-10-19 17:42:37 -04:00
Mark DePristo	52345f0aec	Meaningful documentation string	2011-10-19 15:47:36 -04:00
Mark DePristo	1b38aa1a7e	Cleaning up reduced read code accessors	2011-10-19 15:46:44 -04:00
Eric Banks	d8d73fe4f2	Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.	2011-10-19 15:11:13 -04:00
Mark DePristo	7928b287fc	GATKSamRecord now produced by SAMFileReaders by default -- Removed all of the unnecessary caching operations in GATKSAMRecord -- GATKSAMRecord renamed to GATKSamRecord for consistency	2011-10-19 13:15:27 -04:00
Eric Banks	5a6468c11e	Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.	2011-10-19 11:52:05 -04:00
Eric Banks	48c4a8cb33	Make error messages clearer (even I was confused)	2011-10-19 11:49:16 -04:00
Eric Banks	6cadaa84c9	Just use validate() from super class since it does the same thing	2011-10-19 11:48:23 -04:00
Mark DePristo	df3e4e1abd	First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader	2011-10-19 11:22:35 -04:00
Mauricio Carneiro	c27e2fb676	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-18 15:23:05 -04:00

1212 Commits (f392d330c3ba13a0ed8bb83f7b99f4d2a2cde522)