gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	c9978316a3	Merge branch 'FragmentUtils'	2011-10-26 19:51:49 -04:00
Mauricio Carneiro	add9ad97ec	No scatter gather for VQSR or ApplyVQSR. These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.	2011-10-26 16:35:44 -04:00
Ryan Poplin	74aeb22eeb	Merged bug fix from Stable into Unstable	2011-10-26 15:57:30 -04:00
Ryan Poplin	86871bd1e3	Throw a UserException in the BQSR when there is no data instead of creating an empty csv file	2011-10-26 15:56:41 -04:00
Mark DePristo	034a997d07	Generalized Reads -> Fragment calculation -- Supports ReadBackedPileup -> FragmentCollection as before -- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller -- General cleanup, renaming, move to separate package, more extensive unit tests, etc. -- Added toFragment() function to ReadBackedPileup interface	2011-10-26 15:54:38 -04:00
Eric Banks	2f21b6ecfb	Removed debugging output	2011-10-26 15:50:20 -04:00
Eric Banks	b39fcb1bea	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 15:44:25 -04:00
Eric Banks	b6ce6ed3f8	Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.	2011-10-26 15:42:53 -04:00
Eric Banks	9424e8b2ca	Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.	2011-10-26 14:11:49 -04:00
Mark DePristo	7fa943aef1	Renamed FragmentPileup to FragmentUtils	2011-10-26 14:01:45 -04:00
Laurent Francioli	1f044faedd	- Genotype assignment in case of equally likeli combination is now random - Genotype combinations with 0 confidence are now left unphased	2011-10-26 19:57:09 +02:00
Laurent Francioli	81b163ff4d	Indentation	2011-10-26 14:49:12 +02:00
Laurent Francioli	62cff266d4	GQ calculation corrected for most likely genotype	2011-10-26 14:40:04 +02:00
Mark DePristo	af3613cc5f	GATKSAMRecord commit branch summary First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase? Now, on to the changes here: -- Picard added a SamRecordFactory that is used to create instances the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAMs files. -- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception. -- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum. -- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used. -- ReducedRead code has been moved from ReadUtils until efficiency caching assessors in GATKSamRecord. -- ArtificialSamUtils creates GATKSamRecords now, just SAMRecords. Added code here to create artifical pairs and using that code to create artificial ReadBackedPileups with specific properties -- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make a efficient version that works for non-pileups for the HaplotypeCaller	2011-10-25 20:52:56 -04:00
Mark DePristo	2822f0dc27	Merge branch 'SamRecordFactory'	2011-10-25 20:34:47 -04:00
Mark DePristo	1b722c21cf	merge master	2011-10-25 16:08:39 -04:00
Ryan Poplin	56fdf0b865	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-25 15:58:56 -04:00
Ryan Poplin	4a34c1862e	misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.	2011-10-25 15:22:28 -04:00
Guillermo del Angel	b559936b7a	a)New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different.	2011-10-25 09:56:43 -04:00
Khalid Shakir	fac9932938	Embedding gsalib source and queueJobReport R scripts in the dist and package jars. Moved gsalib and queueJobReport.R to embeddable namespaced locations. Updated packager dependencies/dir to add an @includes which filters the embedded fileset. RScriptExecutor can now JIT compiles the gsalib. RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG. Refactored ProcessController and IOUtils from Queue to Sting Utils. Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count. Replaced uses of some IOUtils with Apache Commons IO. ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown. Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().	2011-10-24 15:58:34 -04:00
Khalid Shakir	89a581a66f	Added ability to specify arguments in files via -args/--arg_file Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()	2011-10-24 15:58:34 -04:00
Mark DePristo	502592671d	Cleanup FragmentPileup before main repo commit -- removed intermiate functions. Now only original version and best optimized new version remain -- Moved general artificial read backed pileup creation code into ArtificialSamUtils	2011-10-24 14:40:05 -04:00
Mark DePristo	166174a551	Google caliper example execution script -- FragmentPileup with final performance testing	2011-10-24 14:04:53 -04:00
Laurent Francioli	62477a0810	Added documentation and comments	2011-10-24 13:45:21 +02:00
Laurent Francioli	38ebf3141a	- Now supports parent/child pairs - Sites with missing genotypes in pairs/trios are handled as follows: -- Missing child -> Homozygous parents are phased, no transmission probability is emitted -- Two individuals missing -> Phase if homozygous, no transmission probability is emitted -- One parent missing -> Phased / transmission probability emitted - Mutation prior set as argument	2011-10-24 12:30:04 +02:00
Laurent Francioli	7312e35c71	Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.	2011-10-24 10:25:53 +02:00
Laurent Francioli	01b16abc8d	Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.	2011-10-24 10:24:41 +02:00
Mark DePristo	f6ccac889b	Merged bug fix from Stable into Unstable	2011-10-23 16:37:12 -04:00
Mark DePristo	585a45b7a3	Bug fix for ClipReadsWalker when stats output isn't provided -- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification	2011-10-23 16:36:48 -04:00
Ryan Poplin	f5d910b8a5	Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.	2011-10-23 13:29:08 -04:00
Mark DePristo	42bf9adede	Initial version of "fast" FragmentPileup code -- Uses mayOverlapRoutine in ReadUtils -- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations -- PileupElement now comparable (sorts on offset than on start) -- Caliper microbenchmark to assess performance	2011-10-22 21:36:37 -04:00
Mauricio Carneiro	4913f8a60f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-21 17:45:07 -04:00
Mauricio Carneiro	102dafdcbc	Validation of GATKSamRecord in read filters Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.	2011-10-21 17:40:43 -04:00
Guillermo del Angel	f4b409fa0d	CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result	2011-10-21 14:07:20 -04:00
Mark DePristo	b863390cb1	Moving reduced read functionality into GATKSAMRecord -- More functions take / produce GATKSAMRecords instead of SAMRecord	2011-10-21 13:28:05 -04:00
Mark DePristo	2403e96062	Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.	2011-10-21 09:59:24 -04:00
Mark DePristo	110e13bc1e	Merge branch 'master' into SamRecordFactory	2011-10-21 09:43:52 -04:00
Mark DePristo	be797a8a1f	Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system	2011-10-21 09:39:21 -04:00
Mark DePristo	ed74ebcfa1	GATKSamRecords with efficiency NGSPlatform method	2011-10-21 09:38:41 -04:00
Mark DePristo	94e1898d8f	A canonical set of NGS platforms as enums with convenient manipulation methods	2011-10-21 09:37:45 -04:00
Laurent Francioli	edea90786a	Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily loose a little precision on the genotypes that do not change after assignment.	2011-10-20 17:04:19 +02:00
Laurent Francioli	1c61a57329	Original rewrite of PhaseByTransmission: - Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument - Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster - Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8) - Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het) - Basic reporting added: -- mvf argument let the user specify a file to report remaining MVs -- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output Known problems: - GQ is not recalculated even if the genotype changes Possible improvements: - Phase partially typed trios - Use standard Allele/Genotype Classes for the storage of the pre-calculated phase	2011-10-20 13:06:44 +02:00
Laurent Francioli	ef6a6fdfe4	Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.	2011-10-20 12:49:18 +02:00
Laurent Francioli	76dd816e70	Added getParents() -> returns an arrayList containing the sample's parent(s) if available	2011-10-20 12:47:27 +02:00
Mark DePristo	999a8998ae	Constructor for GATKSamRecord with header only, for unit testing	2011-10-19 17:51:48 -04:00
Mark DePristo	bba69701b5	Now creates GATKSamRecords now SamRecords	2011-10-19 17:49:17 -04:00
Christopher Hartl	cd8a6d62bb	You know how the wiki has a big section on commiting local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that. This commit contains: - IntronLossGenotyper is brought into its current incarnation - A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate) - RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type. + the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there. - MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added) + use this rather than a hard GQ threshold if you're doing MV analyses. - Some miscellaneous QScripts	2011-10-19 17:42:37 -04:00
Mark DePristo	52345f0aec	Meaningful documentation string	2011-10-19 15:47:36 -04:00
Mark DePristo	1b38aa1a7e	Cleaning up reduced read code accessors	2011-10-19 15:46:44 -04:00
Eric Banks	d8d73fe4f2	Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.	2011-10-19 15:11:13 -04:00
Mark DePristo	7928b287fc	GATKSamRecord now produced by SAMFileReaders by default -- Removed all of the unnecessary caching operations in GATKSAMRecord -- GATKSAMRecord renamed to GATKSamRecord for consistency	2011-10-19 13:15:27 -04:00
Eric Banks	5a6468c11e	Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.	2011-10-19 11:52:05 -04:00
Eric Banks	48c4a8cb33	Make error messages clearer (even I was confused)	2011-10-19 11:49:16 -04:00
Eric Banks	6cadaa84c9	Just use validate() from super class since it does the same thing	2011-10-19 11:48:23 -04:00
Mark DePristo	df3e4e1abd	First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader	2011-10-19 11:22:35 -04:00
Mauricio Carneiro	c27e2fb676	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-18 15:23:05 -04:00
Mark DePristo	f77f2eeb7d	Fix for new ID structure	2011-10-18 13:04:43 -04:00
Mark DePristo	1a92ee3593	No longer adds a binding of ID -> . when the ID field is dot in the VCF -- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now	2011-10-18 10:57:02 -04:00
Ryan Poplin	e45fcb66eb	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-17 15:56:19 -04:00
Ryan Poplin	1e6794c539	fixing typo in VariantsToTable docs	2011-10-17 15:56:02 -04:00
Mark DePristo	0de8550f17	Merged bug fix from Stable into Unstable	2011-10-17 15:29:53 -04:00
Mark DePristo	c1329c4dde	Fixing a binary to logical or	2011-10-17 15:29:45 -04:00
Mark DePristo	9e4963efc8	Merged bug fix from Stable into Unstable	2011-10-17 15:27:38 -04:00
Mark DePristo	ec911ce5bb	Even better error messages	2011-10-17 15:27:22 -04:00
Mark DePristo	d065bf1715	Merged bug fix from Stable into Unstable	2011-10-17 15:25:47 -04:00
Mark DePristo	a7cf9cdc67	Fixing error message typo	2011-10-17 15:25:35 -04:00
Ryan Poplin	589df6b7cf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-17 14:35:14 -04:00
Ryan Poplin	6b02354d84	Adding a new getter in VariantsToTable to extract the indel event length.	2011-10-17 14:34:52 -04:00
Mark DePristo	3550798c4c	Merged bug fix from Stable into Unstable	2011-10-17 13:58:56 -04:00
Mark DePristo	4108a294f7	Better error message when a RodBinding file doesn't exist	2011-10-17 13:58:46 -04:00
Mark DePristo	cc76826f78	Merged bug fix from Stable into Unstable	2011-10-17 13:38:11 -04:00
Mark DePristo	fd4540cd32	Fixed extraordinarily subtle race condition with contracts invariant -- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state	2011-10-17 13:37:55 -04:00
Mark DePristo	5a881360df	Merged bug fix from Stable into Unstable	2011-10-13 15:54:43 -04:00
Mark DePristo	7cab6f6bb0	Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils -- SimpleTimer is now threadsafe using synchronized method keywords -- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength	2011-10-13 15:53:12 -04:00
Mauricio Carneiro	e12ffb6547	Updating docs for GCContentByInterval This walker does not take any BAMs. It only walks over the reference.	2011-10-13 13:27:00 -04:00
Eric Banks	9aecd50473	Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.	2011-10-12 15:44:54 -04:00
Mauricio Carneiro	e53a952aeb	Added ION Torrent support to CountCovariates.	2011-10-12 01:57:02 -04:00
Mauricio Carneiro	a2733a451f	Added NotCalled feature to GAV Added "not called" and "no status" to the truth table. Very useful.	2011-10-11 19:31:45 -04:00
David Roazen	ae83420637	Merged bug fix from Stable into Unstable	2011-10-11 12:26:08 -04:00
David Roazen	794f275871	SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation. Having SnpEff grouped with the Experimental annotations was proving problematic, since it requires a rod. Placing it in its own group should improve the situation somewhat, making it easier to request "all annotations except for SnpEff".	2011-10-11 12:08:56 -04:00
David Roazen	cfd0ac8410	Merged bug fix from Stable into Unstable Conflicts: public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java	2011-10-11 12:03:51 -04:00
David Roazen	24b72334b3	UnifiedGenotyper now correctly initializes the VariantAnnotator engine. This allows the annotation classes to perform any necessary initialization/validation. For example, it allows the SnpEff annotator to (among other things) validate its rod binding. This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding is present. Added an integration test to cover this case so that it doesn't break again.	2011-10-11 12:02:05 -04:00
Guillermo del Angel	0429b38021	Merged bug fix from Stable into Unstable	2011-10-11 11:19:38 -04:00
Guillermo del Angel	1c485d8b5e	Forgot that no matter how trivial a change it's a good idea to compile first	2011-10-11 11:18:41 -04:00
Guillermo del Angel	6418f4d69b	Merged bug fix from Stable into Unstable	2011-10-11 11:13:18 -04:00
Guillermo del Angel	1975de1b32	Second try: hide --do_indel_quality in AnalyzeCovariates	2011-10-11 11:11:29 -04:00
Guillermo del Angel	6506ea83e8	Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through. This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.	2011-10-11 11:03:12 -04:00
Guillermo del Angel	4c1d8c8d44	Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users	2011-10-11 11:01:06 -04:00
Eric Banks	77c983c5b5	No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.	2011-10-10 15:17:54 -04:00
Mark DePristo	fb72bcf732	DiffObjects no longer prints out the file name in the status so MD5 are stable	2011-10-10 15:10:57 -04:00
Mark DePristo	46e7370128	this.allele, getAlleles(), and getAltAlleles() now return List not set -- Changes associated code throughout the codebase -- Updated necessary (but minimal) UnitTests to reflect new behavior -- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC	2011-10-09 11:45:55 -07:00
Mark DePristo	c67f6c076b	simpleMerge now preserves allele order -- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct	2011-10-08 17:39:53 -07:00
Mark DePristo	ec14a4a606	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-07 08:38:50 -07:00
Eric Banks	ca9cd9b688	Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.	2011-10-06 22:38:44 -04:00
Mark DePristo	c7864c7256	Filter application order is now deterministic, in the order defined by the walker -- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is (1) the order of the walker intrinsic filters (2) read group black list (if provided) (3) command line filters (if provided)	2011-10-06 18:51:40 -07:00
Mark DePristo	0b88af4af9	Counts of records failing filters are displayed sorted -- Stops random ordering of the output, as the counts are returned sorted by string name of the class -- Deleted now unused sh*tty assessors in Utils	2011-10-06 18:42:26 -07:00
Mark DePristo	d1e70d6ec2	Removed Nx counting of reads in metrics with -nt > 1	2011-10-06 18:29:26 -07:00
Eric Banks	c61804a450	Rename the long version of the argument name to more accurately reflect its purpose.	2011-10-06 16:14:04 -04:00
Eric Banks	61a3dfae24	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 15:58:04 -04:00
Eric Banks	6eb87bf58a	RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.	2011-10-06 15:57:49 -04:00
Eric Banks	1b0735f0a3	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 13:41:45 -04:00
Eric Banks	c4dfc1fb8b	Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.	2011-10-06 13:41:36 -04:00
Mark DePristo	73f9d1f217	GATK read group requirement iron hand -- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined -- LIBS again throws an error if the complete list of samples isn't provided -- Updating ExmpleCountLociPipeline test to use the well-formated versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data. -- Convenience constructors for UserExceptions.MalformedBAM	2011-10-06 08:40:35 -07:00
Mark DePristo	23845ac798	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 08:17:08 -07:00
Mark DePristo	daa5999489	Fixed typo in argument description	2011-10-06 08:16:25 -07:00
Guillermo del Angel	8a474e38ff	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 10:08:39 -04:00
Guillermo del Angel	93f7e632bd	Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is.	2011-10-06 10:07:46 -04:00
Mark DePristo	190be4d0d1	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-05 21:27:11 -07:00
Mark DePristo	8e6845806a	Allowing empty samples list in LIBS -- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly	2011-10-05 21:26:21 -07:00
Matt Hanna	180c8f286f	Merged bug fix from Stable into Unstable	2011-10-05 20:37:43 -04:00
Matt Hanna	55b9f06527	Ensure that IndelRealigner n-way out option supports MD5 generation.	2011-10-05 20:36:28 -04:00
Mark DePristo	be2d29ce69	Final PED documentation	2011-10-05 15:17:41 -07:00
Mark DePristo	3226d5dc0d	Merge branch 'master' into ped	2011-10-05 15:03:09 -07:00
Mark DePristo	6a573437af	Details documentation arguments for -ped	2011-10-05 15:00:58 -07:00
Mark DePristo	e7c80f7c45	Renaming quantitative trait to OtherPhenotype which is now a String not a double -- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles	2011-10-05 12:26:33 -07:00
Mark DePristo	51ecc20867	getFamily() and associated methods implemented and tested -- Sample no longer serializable -- Sample now implements Comparable	2011-10-05 09:55:05 -07:00
Mark DePristo	a45d985818	TODO method stubs	2011-10-04 15:54:09 -07:00
Mark DePristo	fee89e47ff	Only throws an error when there are no samples but there are reads -- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.	2011-10-04 06:50:54 -07:00
Mark DePristo	f552aede42	Only provide the sample names in the BAM file for efficiency	2011-10-04 06:50:12 -07:00
Mark DePristo	a27641e1fc	Cleaned up imports	2011-10-04 06:28:36 -07:00
Mark DePristo	b20689ff55	No longer supports extraProperties -- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem -- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown -- addSample() in Sample.class now invokes mergeSample() when appropriate -- Validation types are now only STRICT or SILENT -- Validation code implemented in SampleDBBuilder -- Extensive unit tests for SampleDBBuilder	2011-10-03 19:20:33 -07:00
Mauricio Carneiro	3837aa45b4	Fixing conflicts Conflicts: public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java	2011-10-03 19:07:59 -07:00
Mark DePristo	2e3dc52088	Minor function renaming	2011-10-03 14:41:13 -07:00
Mark DePristo	dd71884b0c	On path to SampleDB engine integration -- PedReader tag parser -- Separation of SampleDBBuilder from SampleDB (now immutable) -- Removed old sample engine arguments	2011-10-03 12:08:07 -07:00
Eric Banks	c3eff7451a	Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. There was a noticeable (albeit not big) difference in the change when reading sites only files.	2011-10-03 14:20:39 -04:00
Mark DePristo	8ee0f91904	Remove residual processing tracker arguments	2011-10-03 09:50:01 -07:00
Mark DePristo	89ac50e86e	SampleDataSource -> SampleDB	2011-10-03 09:33:30 -07:00
Mark DePristo	93fba06cb5	Support for whitespace only lines	2011-10-03 09:30:10 -07:00
Mark DePristo	0604ce55d1	PedReader support for ; separated lines, not only newline	2011-10-03 09:19:58 -07:00
Mark DePristo	52f670c8b8	100% version of PedReader -- Passes all unit tests -- Added unit tests for missing fields	2011-10-03 06:12:58 -07:00
Mark DePristo	dd75ad9f49	95% PedReader -- Passes significiant unit tests -- Implicit sample creation for mom / dad when you create single samples -- Continuing cleanup of Sample and SampleDataSource	2011-09-30 18:03:34 -04:00
Andrey Sivachenko	c7898a9be7	inconsequential change in string constants printed into the vcf which noone uses anyway...	2011-09-30 16:40:21 -04:00
Mark DePristo	010899f886	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 15:51:09 -04:00
Mark DePristo	84160bd83f	Reorganization of Sample -- Moved Gender and Afflication to separate public enums -- PedReader 90% implemented -- Improve interface cleanup to XReadLines and UserException	2011-09-30 15:50:54 -04:00
Mauricio Carneiro	05fba6f23a	Clipping ends inside deletion and before insertion fixed.	2011-09-30 15:44:43 -04:00
Mark DePristo	c1cf6bc45a	PEDReader should be in samples	2011-09-30 14:22:19 -04:00
Mark DePristo	56f10b40a8	Fixing test bugs for WindowMaker that required empty sample list	2011-09-30 14:18:27 -04:00
Ryan Poplin	af6c053435	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 13:33:31 -04:00
Mark DePristo	810e8ad011	Removed getXByReaders() function from the engine -- These could be simplied in their downstream uses -- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream	2011-09-30 10:43:51 -04:00
Mark DePristo	178ba24c27	Move getSamplesForSamFile to SampleUtils -- A nearly identical piece of code already lived in SampleUtils. Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself. Much cleaner	2011-09-30 10:28:18 -04:00
Mark DePristo	30d23942b1	Renamed ReadBackedPileup getXSampleName() functions to getXSample -- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions	2011-09-30 10:02:57 -04:00
Mark DePristo	3289a325fc	Removed final use of Sample in RBP	2011-09-30 09:57:39 -04:00
Mark DePristo	a69a4dda2f	SamplesDB no longer has null sample -- Updated getSamples().size() == 2 test in CallableLociWalker that really ensured there was one sample in the system	2011-09-30 09:56:23 -04:00
Mark DePristo	e055a78f6e	LIBS now requires at least one sample be present -- UnitTest provides a "null" sample for matching the reads without read groups	2011-09-30 09:49:35 -04:00
Mark DePristo	9860a2c989	Merge branch 'master' into ped	2011-09-30 09:28:18 -04:00
Mark DePristo	d901fed617	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 08:41:44 -04:00
Mauricio Carneiro	cabacf028d	Intermediate commit to fix interval skipping may need additional testing.	2011-09-29 18:45:12 -04:00
Mark DePristo	1765fbeb6b	Merge branch 'master' into ped	2011-09-29 17:18:51 -04:00
Mark DePristo	98ecaf8aa0	Support for ReducedReads with reduced counts and average quals -- ReadUtils and UnitTest updated to support new byte[] style -- Removed unnecessary read transformer in PairHMM	2011-09-29 17:18:39 -04:00
Mauricio Carneiro	9508220157	fixed hard clipping both ends inside deletion If both ends of the interval falls within a deletion in the read then hardClipBothEnds would cut the right tail first including the entire deletion, then fail to cut the left tail because there would not be any bases there anymore. Fixed.	2011-09-29 15:36:49 -04:00
Mark DePristo	625ffb6a07	LocusIteratorByState and ReadBackedPileups no long use Sample	2011-09-29 14:52:11 -04:00
Mark DePristo	b3a2371925	Merge branch 'master' into ped	2011-09-29 14:32:17 -04:00
Mark DePristo	68761a6e28	Removed sample from header	2011-09-29 14:13:05 -04:00
Mauricio Carneiro	a5e75cd14c	Outputting both consensus base qualities and counts The base qualities of a consensus reads are now the average quality of the bases forming the consensus base (most common base) and the consensus quality tag now carry an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.	2011-09-29 12:54:41 -04:00
Mark DePristo	505416b6c0	Merge branch 'master' into ped	2011-09-29 12:22:39 -04:00
Mark DePristo	9536845e35	Cleaning up unused code in MV	2011-09-29 12:20:07 -04:00
Mark DePristo	5043d76c3d	Removing more bad uses of SampleDataSource creation	2011-09-29 12:16:34 -04:00
Mark DePristo	5c9227cf5e	Further cleanup of Sample database -- Removing more and more unnecessary code -- Partial removal of type safe Sample usage. On the road to SampleDB only	2011-09-29 11:50:05 -04:00
Mark DePristo	2a0cd556d3	Further cleanup of Sample -- Cleaned up interface functions in GAE -- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db	2011-09-29 10:34:51 -04:00
Mark DePristo	e76f381628	Moved sample package from DataSources to gatk, and renamed it samples -- All associated changes to the codebase are just header updates	2011-09-29 09:57:15 -04:00
Mark DePristo	e197dcd1f3	Pre-cleanup commit of Sample and SampleDataSource -- SampleDataSource has all reader functionality disabled	2011-09-29 09:44:18 -04:00
Mark DePristo	4d31673cc5	No longer supporting YAML file allows us to delete 75% of the sample's codebase	2011-09-29 09:43:31 -04:00
Ryan Poplin	e366ee18bc	Adding ability to read in and make use of kmer quality tables during HMM evaluation	2011-09-29 07:46:19 -04:00
Mauricio Carneiro	fc86cd6fd8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr	2011-09-29 00:12:15 -04:00
Roger Zurawicki	4fd5630f6a	Added ReadClipper Unit Test * Includes tests that include HardClip to Read and Reference Coords. * Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing	2011-09-28 23:13:50 -04:00
Matt Hanna	9272ed03b5	Merged bug fix from Stable into Unstable	2011-09-28 21:26:43 -04:00
Matt Hanna	0acaf2df65	Fix an embarrassing issue where a specific configuration of minimal coverage over small intervals could cause reads to be dropped from the pileup. Nothing to see here...	2011-09-28 21:23:01 -04:00
Guillermo del Angel	c8d3a720f9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 18:17:34 -04:00
Guillermo del Angel	7e3cb45093	Further performance optim in banded hmm, about 60% speed improvement over current implementation now	2011-09-28 16:27:28 -04:00
Ryan Poplin	1b1ca80df2	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 16:17:39 -04:00
Ryan Poplin	3b73dc89fe	Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.	2011-09-28 16:17:31 -04:00
Mauricio Carneiro	ff2f4df043	Fixed hardclipping inside indel (right tail) when hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.	2011-09-28 16:07:34 -04:00
Mauricio Carneiro	3c7b7f74ef	Optimized interval iteration Using a TreedSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).	2011-09-28 16:07:34 -04:00
Mauricio Carneiro	5c9b659c02	clipping both ends of the reads was modifying the original read This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.	2011-09-28 16:07:34 -04:00
Guillermo del Angel	fe23e4d10c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 15:53:11 -04:00
Guillermo del Angel	e2b9030e93	First mostly fully functional implementation of banded pair HMM likelihood computation for indel caller. More experimentation to follow but it right now works in small data sets and at least it doesn't break existing things. Disabled by default at this point	2011-09-28 15:51:48 -04:00
Eric Banks	1b45f21774	Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.	2011-09-28 13:18:32 -04:00
Eric Banks	1f0e354fae	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 13:13:21 -04:00
Eric Banks	bb619a9a3c	Fixing docs	2011-09-28 13:13:03 -04:00
Mark DePristo	5812004e06	Merge branch 'stable'	2011-09-28 11:36:40 -04:00
Mark DePristo	a5006831d7	Shows "" not empty space when default string value is ""	2011-09-28 11:35:52 -04:00
Mark DePristo	1e32281a15	Fix to not show -null when missing short name argument	2011-09-28 11:31:20 -04:00
Mauricio Carneiro	89544c209c	Fixing contracts changed return type to Pair, changing contracts accordingly.	2011-09-28 11:19:17 -04:00
Eric Banks	eacbee3fe5	Merged bug fix from Stable into Unstable	2011-09-27 20:35:18 -04:00
Eric Banks	43b0c98298	Fix docs	2011-09-27 20:34:46 -04:00
Eric Banks	232a6df11c	Add longhand form to the error message.	2011-09-27 20:29:31 -04:00
Eric Banks	1d6fcb6eb1	Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS." This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.	2011-09-27 20:27:00 -04:00
Eric Banks	269b9826b6	Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.	2011-09-27 20:26:36 -04:00
Mauricio Carneiro	3b6e43b7c4	Use reads that span multiple intervals * RR will now compress reads that span across multiple intervals correctly and output them in the correct order. * Fixed bug in getReadCoordinateForReferenceCoordinate where if the requested reference coordinate fell inside a deletion in the read the read would be clipped up to one element past the deletion.	2011-09-27 18:39:06 -04:00
Khalid Shakir	84bd355690	Merged bug fix from Stable into Unstable	2011-09-27 14:34:39 -04:00
Khalid Shakir	b090751f62	Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths. Updates to HybridSelectionPipeline: - Added annotations back via snpEff - Minor updates to VQSR paths and lowered memory	2011-09-27 14:33:57 -04:00
Eric Banks	26e71f6688	The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.	2011-09-27 11:03:17 -04:00
Guillermo del Angel	ceffefa6a6	Intermediate version with banded pair HMM	2011-09-27 10:18:58 -04:00
Mark DePristo	e99ff3caae	Removed lots of old, and not to be used, HMM options -- resulted in massive code cleanup -- GdA will integrate his new banded algorithm here -- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY	2011-09-27 10:08:40 -04:00
Mark DePristo	fa0efbc4ca	Refactoring of PairHMM to support reduced reads	2011-09-26 13:28:56 -04:00
Mark DePristo	a6b65d6347	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-26 13:26:21 -04:00
Mark DePristo	4f09453470	Refactored reduced read utilities -- UnitTests for key functions on reduced reads -- PileupElement calls static functions in ReadUtils -- Simple routine that takes a reduced read and fills in its quals with its reduced qual	2011-09-26 12:58:31 -04:00
Eric Banks	234b74dd05	Merged bug fix from Stable into Unstable	2011-09-26 11:47:23 -05:00
Eric Banks	317b95fa57	Fixing some annotator docs	2011-09-26 11:46:45 -05:00
Mauricio Carneiro	b76dbc72f0	Fixed interval navigation bug. If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.	2011-09-26 08:13:44 -04:00
Guillermo del Angel	9afccd11b1	Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just substract max value from log values and return. Use this function in snp and indel GL computation.	2011-09-25 21:18:56 -04:00
Guillermo del Angel	3eef800889	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-24 21:20:11 -04:00
Guillermo del Angel	203517fbb7	a) Cleanups/bug fixes to previous commit to CombineVariants. b) Change md5 to reflect records that are now merged correctly. c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.	2011-09-24 19:08:00 -04:00
Mauricio Carneiro	c31f4cb2f6	Cleaning leading insertions With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.	2011-09-24 14:33:32 -04:00
Guillermo del Angel	cd058dd10f	a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case. b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.	2011-09-24 13:40:11 -04:00
Mark DePristo	bb11951255	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-24 09:26:45 -04:00
Mark DePristo	8d9e136bba	Merge branch 'stable'	2011-09-24 09:26:28 -04:00
Mark DePristo	6804ab6d2f	Bug fix for NPE in very short GATK runs -- Was already in unstable, but not stable...	2011-09-24 09:25:29 -04:00
Mark DePristo	92acff46e5	Moved Haplotype into Utils root	2011-09-24 09:14:05 -04:00
Mark DePristo	f792353dcd	Framework for genotype unit test	2011-09-24 08:56:45 -04:00
Mark DePristo	c0bb0cb465	Make DiploidGenotype enum private to walkers.genotyper	2011-09-24 08:48:33 -04:00
Guillermo del Angel	3a4469a236	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-23 21:58:34 -04:00
Guillermo del Angel	0e74cc3c74	a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation. b) Intermediate CombineVariants fixes, not ready yet	2011-09-23 21:58:20 -04:00
Mauricio Carneiro	7cac75ae1d	Merged bug fix from Stable into Unstable	2011-09-23 19:00:43 -04:00
Mauricio Carneiro	fbe3c1e0b3	Adding warning on HardClipping Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.	2011-09-23 19:00:19 -04:00
Mark DePristo	b66841f179	Static cache for binomial probability -- Very low level performance optimization	2011-09-23 17:29:34 -04:00
Mauricio Carneiro	1a45c331b2	bringing the latest bug fixes to Reduce Reads	2011-09-23 16:40:06 -04:00
Mauricio Carneiro	9ea40f2e41	Deletions/Insertions in hard clip and bug fixes * Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read. * Insertions do not count as hard clipped bases for the same reason. * This created a bug in the previous cigar cleaning function. Fixed.	2011-09-23 16:37:08 -04:00
David Roazen	40202c85e0	Merged bug fix from Stable into Unstable	2011-09-23 16:35:55 -04:00
David Roazen	e1cb5f6459	SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers. -We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF. -Effects are now prioritized according to both biological impact and functional class, instead of impact only. -Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that really describe the location of the variant rather than its biological effect. This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these features directly. Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification in VariantEval is basically broken and urgently needs to be fixed for production purposes.	2011-09-23 16:06:52 -04:00
Matt Hanna	e388c357ca	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-23 14:53:28 -04:00
Matt Hanna	cc23b0b8a9	Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards.	2011-09-23 14:52:31 -04:00
Mark DePristo	e3d4efb283	Remove N2 EXACT model code, which should never be used	2011-09-23 11:55:21 -04:00
Mark DePristo	27ce3c822e	Merge branch 'stable'	2011-09-23 09:04:52 -04:00
Mark DePristo	2bb77a7978	Docs for all VariantAnnotator annotations	2011-09-23 09:04:16 -04:00
Mark DePristo	dd65ba5bae	@Hidden for DocumentationTest and GATKDocsExample	2011-09-23 09:03:37 -04:00
Mark DePristo	dfce301beb	Looks for @Hidden annotation on all classes and excludes them from the docs	2011-09-23 09:03:04 -04:00
Mark DePristo	4397ce8653	Moved removePLs to VariantContextUtils	2011-09-23 08:24:20 -04:00
Mark DePristo	c49cc623de	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-22 17:26:21 -04:00
Mark DePristo	5cf82f9236	simpleMerge UnitTest tests filtered VC merging	2011-09-22 17:05:12 -04:00
Mauricio Carneiro	96c875399c	Merging many bug fixes to reduce reads	2011-09-22 17:04:11 -04:00
Mauricio Carneiro	39b54211d0	Fixed hard clipping soft clipped bases after hard clips if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.	2011-09-22 15:46:55 -04:00
Mauricio Carneiro	1acf7945c5	Fixed hard clipped cigar and alignment start * Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions. * The alignment start is now recalculated according to the new hard clipped cigar representation	2011-09-22 14:51:14 -04:00
Mauricio Carneiro	4e9020c9f7	Fixed alignment start for hard clipping insertions	2011-09-22 13:28:25 -04:00
Mark DePristo	ba5f83fee2	start of VariantContextUtils UnitTest -- tests rsID merging	2011-09-22 12:10:39 -04:00
Mark DePristo	93dd1faa5f	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-22 11:20:10 -04:00
Mark DePristo	a05c959e5a	Empty unit tests for VariantContextUtils -- will be expanded over the day	2011-09-22 11:20:07 -04:00
Christopher Hartl	4f4a0fc38a	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git	2011-09-22 11:01:58 -04:00
Christopher Hartl	982c47bfa7	Remove duplicate effort in ReadUtils (with apologies to Mauricio) Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model Activated the early-abort check for ILG. I think it should be better this way.	2011-09-22 10:58:26 -04:00
Eric Banks	8f8b59a932	My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs.	2011-09-21 22:23:28 -04:00
Christopher Hartl	dc96f6da79	Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git	2011-09-21 18:18:41 -04:00
Christopher Hartl	f9cdc119af	Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips). Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).	2011-09-21 18:16:42 -04:00
Christopher Hartl	faff6e4019	Failed to commit changes to the GATKReport required for more easy access when using the files as data sources (read: histograms) for walkers	2011-09-21 18:15:23 -04:00
Mauricio Carneiro	96768c8a18	Sending latest bug fixes to Reduce Reads to the main repository	2011-09-21 17:43:11 -04:00
Mauricio Carneiro	70335b2b0a	Hard clipping soft clipped reads to fix misalignments. Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.	2011-09-21 17:12:01 -04:00
Christopher Hartl	ef05827c7b	Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-21 16:40:47 -04:00
Christopher Hartl	3b51d9106a	Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line.	2011-09-21 16:40:29 -04:00
Mark DePristo	04968c88b3	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-21 15:43:25 -04:00
Mark DePristo	6bcfce225f	Fix for dynamic type determination for bgzip files -- GZipInputStream handles bgzip files under linux, but not mac -- Added BlockCompressedInputStream test as well, which works properly on bgzip files	2011-09-21 15:39:19 -04:00
Mark DePristo	9f6f0c443c	Marginally cleaner isVCFStream() function -- cleanup trying to debug minor bug. Failed to fix the bug, but the code is nicer now	2011-09-21 15:25:01 -04:00
Ryan Poplin	5fef6dc5d0	Merged bug fix from Stable into Unstable	2011-09-21 15:23:06 -04:00
Ryan Poplin	2585fc3d6c	Updating Rscript path doc text for Broad users	2011-09-21 15:22:26 -04:00
Mark DePristo	74f9ccf6dd	Merge	2011-09-21 11:30:11 -04:00
Mark DePristo	6592972f82	Putative fix for BAQ array out of bounds -- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant -- Unittest to enforce this behavior	2011-09-21 11:25:08 -04:00
Eric Banks	174859fc68	Don't allow whitespace in the INFO field	2011-09-21 11:14:54 -04:00
Mark DePristo	ecc7f34774	Putative fix for BAQ problem.	2011-09-21 11:09:54 -04:00
Mark DePristo	7d11f93b82	Final bugfix for CombineVariants -- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp -- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list	2011-09-21 10:58:32 -04:00
Mark DePristo	a91ac0c5db	Intermediate commit of bugfixes to CombineVariants	2011-09-21 10:15:05 -04:00
David Roazen	b04d8eab55	Merged bug fix from Stable into Unstable	2011-09-20 17:24:14 -04:00
Mauricio Carneiro	758ecf2d43	Bringing latest updates of ReduceReads to the master repository	2011-09-20 16:35:09 -04:00
David Roazen	d9ea764611	SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file. This change is urgently required for production, which is why it's going into Stable+Unstable instead of just Unstable. The keys for the SnpEff version and command header lines in the VCF file output by VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally different from the keys for those same lines in the SnpEff output file (SnpEffVersion and SnpEffCmd), so that output files from VariantAnnotator won't be confused with output files from SnpEff itself.	2011-09-20 16:30:55 -04:00
Mark DePristo	bffd3cca6f	Bug fix for reduced read; only adds regular bases for calculation -- No longer passes on deletions for genotyping	2011-09-20 15:07:06 -04:00
Mark DePristo	a1b4cafe7a	Bug fix for NPE when timer wasn't initialized	2011-09-20 13:59:59 -04:00
Mark DePristo	b7511c5ff3	Fixed long-standing bug in tribble index creation -- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write -- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index -- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils -- VCFWriter now requires the master sequence dictionary -- Updated walkers that create VCFWriters to provide the master sequence dictionary	2011-09-20 10:53:18 -04:00
Mark DePristo	230e16d7c0	Merge branch 'master' into rodrewrite	2011-09-20 06:54:18 -04:00
Mark DePristo	aa8afa3899	Merge	2011-09-19 21:16:47 -04:00
Mauricio Carneiro	56106d54ed	Changing ReadUtils behavior to comply with GenomeLocParser Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.	2011-09-19 14:00:00 -04:00
Mauricio Carneiro	080c957547	Fixing contracts for SoftUnclippedEnd utils Now accepts reads that are entirely contained inside an insertion.	2011-09-19 13:53:53 -04:00
Mauricio Carneiro	5e832254a4	Fixing ReadAndInterval overlap comments.	2011-09-19 13:28:41 -04:00
Christopher Hartl	ecb8466662	Merged bug fix from Stable into Unstable	2011-09-19 12:32:08 -04:00
Christopher Hartl	8143def292	Fix the -T argument in the DepthOfCoverage docs Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file	2011-09-19 12:31:47 -04:00
Christopher Hartl	034b868588	Revert "Fix the -T argument in the DepthOfCoverage docs" This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.	2011-09-19 12:16:07 -04:00
Mark DePristo	cfde0e674b	Merge branch 'sgintervals'	2011-09-19 12:02:41 -04:00
Mark DePristo	3e93f246f7	Support for sample sets in AssignSomaticStatus -- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.	2011-09-19 11:40:45 -04:00
Mark DePristo	41ffb25b74	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-19 10:55:18 -04:00
Christopher Hartl	ca1b30e4a4	Fix the -T argument in the DepthOfCoverage docs Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file	2011-09-19 10:29:06 -04:00
Mark DePristo	4ad330008d	Final intervals cleanup -- No functional changes (my algorithm wouldn't work) -- Major structural cleanup (returning more basic data structures that allow us to development new algorithm) -- Unit tests for the efficiency of interval partitioning	2011-09-19 10:19:10 -04:00
Mark DePristo	6ea57bf036	Merge branch 'master' into sgintervals	2011-09-19 09:50:19 -04:00
Mark DePristo	6bd42c053d	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-18 20:18:39 -04:00
Roger Zurawicki	091c7197cd	Fixed memory leak and bug with deletions in clipping The ClippingOp clip cigar function would run into a endless loop if the parameter were out of the reads range, I stopped the bug. * There is no check to make sure the read coordinate are covered by the read though When Hard clipping to interval, I added a check for deletions. NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized	2011-09-18 19:21:51 -04:00
Guillermo del Angel	e7b9a009b7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 12:48:30 -04:00
Menachem Fromer	b2e8e11128	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 00:52:27 -04:00
Christopher Hartl	57b3efa2e2	Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 21:06:38 -04:00
Christopher Hartl	939babc820	Updating formating for ValidationAmplicons GATK docs	2011-09-15 21:05:51 -04:00
Christopher Hartl	9fdf1f8eb6	Fix some doc formatting for Depth of Coverage	2011-09-15 21:05:22 -04:00
Menachem Fromer	e6e9b08c9a	Must provide alleles VCF to UGCallVariants	2011-09-15 18:51:09 -04:00
David Roazen	d78e00e5b2	Renaming VariantAnnotator SnpEff keys This is to head off potential confusion with the output from the SnpEff tool itself, which also uses a key named EFF.	2011-09-15 17:42:15 -04:00
Ryan Poplin	2a8b8efd2f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 16:26:35 -04:00
Ryan Poplin	2f58fdb369	Adding expected output doc to CountCovariates	2011-09-15 16:26:11 -04:00
Eric Banks	fd1831b4a5	Updating docs to include more details	2011-09-15 16:25:03 -04:00
Eric Banks	6d02a34bfb	Updating docs to include output	2011-09-15 16:17:54 -04:00
Eric Banks	4ef6a4598c	Updating docs to include output	2011-09-15 16:10:34 -04:00
Eric Banks	fe474b77f8	Updating docs so printing looks nicer	2011-09-15 16:05:39 -04:00
Eric Banks	f04e51c6c2	Adding docs from Andrey since his repo was all screwed up.	2011-09-15 15:38:56 -04:00
Guillermo del Angel	86480b2e13	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 15:31:07 -04:00
Eric Banks	d369d10593	Adding documentation before the release for GATK wiki page	2011-09-15 13:56:23 -04:00
Eric Banks	202405b1a1	Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.	2011-09-15 13:52:31 -04:00
David Roazen	1e682deb26	Minor html-formatting-related documentation fix to the SnpEff class.	2011-09-15 13:07:50 -04:00
Guillermo del Angel	a942fa38ef	Refine the way we merge records in CombineVariants of different types. As of before, two records of different types were not combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of alleles of another record. For example, a SNP with alleles {A,T} and a mixed record with alleles {A,T, AAT} are now combined when start position matches.	2011-09-15 10:22:28 -04:00
David Roazen	3db457ed01	Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames" After discussing this with Mark, it seems clear that the old version of the VariantEval FunctionalClass stratification is preferable to this version. By reverting, we maintain backwards compatibility with legacy output files from the old GenomicAnnotator, and can add SnpEff support later without breaking that backwards compatibility. This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.	2011-09-14 10:47:28 -04:00
David Roazen	e0c8c0ddcb	Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames This is a temporary and hopefully short-lived solution. I've modified the FunctionalClass stratification to stratify by effect impact as defined by SnpEff annotations (high, moderate, and low impact) rather than by the silent/missense/nonsense categories. If we want to bring back the silent/missense/nonsense stratification, we should probably take the approach of asking the SnpEff author to add it as a feature to SnpEff rather than coding it ourselves, since the whole point of moving to SnpEff was to outsource genomic annotation.	2011-09-14 07:09:47 -04:00
David Roazen	1213b2f8c6	SnpEff 2.0.2 support -Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2) -Removed support for SnpEff 1.9.6 (and associated tribble codec) -Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag) -Correctly matches ref/alt alleles before annotating a record, unlike the previous version -Correctly handles indels (again, unlike the previous version	2011-09-14 07:09:47 -04:00
Guillermo del Angel	5b1bf6e244	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-13 17:04:43 -04:00
Guillermo del Angel	c6672f2397	Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf	2011-09-13 16:57:37 -04:00
Mark DePristo	edf29d0616	Explicit info message about uploading S3 log	2011-09-12 22:16:52 -04:00
Mark DePristo	2316b6aad3	Trying to fix problems with S3 uploading behind firewalls -- Cannot reproduce the very long waits reported by some users. -- Fixed problem that exception might result in an undeleted file, which is now fixed with deleteOnExit()	2011-09-12 22:02:42 -04:00
Matt Hanna	64707c33bb	Merged bug fix from Stable into Unstable	2011-09-12 21:54:11 -04:00
Matt Hanna	e63d9d8f8e	Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list) was completely broken. Sorry about this! Fixed.	2011-09-12 21:50:59 -04:00
Eric Banks	ec4b30de6d	Patch from Laurent: typo leads to bad error messages.	2011-09-12 14:45:53 -04:00
David Roazen	9d9d438bc4	New VariantAnnotatorEngine capability: an initialize() method for all annotation classes. All VariantAnnotator annotation classes may now have an (optional) initialize() method that gets called by the VariantAnnotatorEngine ONCE before annotation starts. As an example of how this can be used, the SnpEff annotation class will use the initialize() method to check whether the SnpEff version number stored in the vcf header is a supported version, and also to verify that its required RodBinding is present.	2011-09-12 13:00:53 -04:00
Ryan Poplin	981b78ea50	Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.	2011-09-12 12:17:43 -04:00
Ryan Poplin	60ebe68aff	Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.	2011-09-12 09:43:23 -04:00
Guillermo del Angel	9344938360	Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly	2011-09-10 19:41:01 -04:00
Guillermo del Angel	e95d484757	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-09 18:31:14 -04:00
Guillermo del Angel	a807205fc3	a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling. b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count. c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.	2011-09-09 18:00:23 -04:00
Mauricio Carneiro	9e650dfc17	Fixing SelectVariants documentation getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.	2011-09-09 16:25:31 -04:00
Mark DePristo	72536e5d6d	Done	2011-09-09 15:44:47 -04:00
Mark DePristo	3c8445b934	Performance bugfix for GenomeLoc.hashcode -- old version overflowed so most GenomeLocs had 0 hashcode. Now uses or not plus to combine	2011-09-09 14:25:37 -04:00
Mark DePristo	c6436ee5f0	Whitespace cleanup	2011-09-09 14:24:29 -04:00
Mark DePristo	87dc5cfb24	Whitespace cleanup	2011-09-09 14:23:42 -04:00
Ryan Poplin	91c949db74	Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.	2011-09-09 12:57:14 -04:00
Mark DePristo	06cb20f2a5	Intermediate commit cleaning up scatter intervals -- Adding unit tests to ensure uniformity of intervals	2011-09-09 12:56:45 -04:00
Eric Banks	6ad8943ca0	CompOverlap no longer keeps track of the number of comp sites since it wasn't (and cannot) keeping track of them correctly.	2011-09-09 09:45:24 -04:00
Mark DePristo	48461b34af	Added TYPE argument to print out VariantType	2011-09-08 15:01:13 -04:00
Ryan Poplin	9cba1019c8	Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap	2011-09-08 09:25:13 -04:00
Ryan Poplin	e0020b2b29	Fixing PrintRODs. Now has input and only prints out one copy of each record	2011-09-08 08:58:37 -04:00
Ryan Poplin	29c968ab60	clean up	2011-09-08 08:42:43 -04:00
Ryan Poplin	59841f8232	Fixing genotype given alleles for indels. Only take the records that start at this locus.	2011-09-08 08:41:16 -04:00
Mark DePristo	cd2c511c4a	GCF improvements -- Support for streaming VCF writing via the VCFWriter interface -- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file. -- Readers now read the header, and then jump to the footer to get the rest of the "header" information -- Version now a field in GCF	2011-09-07 23:28:46 -04:00
Mark DePristo	fe5724b6ea	Refactored indexing part of StandardVCFWriter into superclass -- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly	2011-09-07 23:27:08 -04:00
Mark DePristo	01b6177ce1	Renaming GVCF -> GCF	2011-09-07 17:10:56 -04:00
Mark DePristo	b220ed0d75	Merge branch 'master' into rodrewrite	2011-09-07 17:05:35 -04:00
Guillermo del Angel	45d54f6258	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 16:49:49 -04:00
Guillermo del Angel	9604fb2ba3	Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted	2011-09-07 16:49:16 -04:00
Mark DePristo	2ded027762	Removed dysfunctional tranches support from VariantEval	2011-09-07 16:09:24 -04:00
Eric Banks	aa9e32f2f1	Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.	2011-09-07 15:48:06 -04:00
Eric Banks	3a04955a30	We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.	2011-09-07 14:01:42 -04:00
Guillermo del Angel	743bf7784c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:21:26 -04:00
Guillermo del Angel	5f22ef9a8c	Added missing javadoc info to Beagle arguments	2011-09-07 13:21:11 -04:00
Mark DePristo	3bcbfa6e06	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:13:17 -04:00
Mark DePristo	430da23446	At least 2 minutes must pass before a status message is printed, further stabilizing time estimates	2011-09-07 13:13:07 -04:00
Mauricio Carneiro	6857d0324e	Merge branch 'master' into rr	2011-09-07 12:59:08 -04:00
Mark DePristo	7e9e20fed0	Forgot to delete previous call	2011-09-07 12:54:52 -04:00
Mark DePristo	d23d620494	Pushing traversal engine timer start to as close to actual start as possible -- Should make initial timings more accurate	2011-09-07 12:52:33 -04:00
Mark DePristo	6ff432e1f2	BugFix for TF argument to VariantEval, actually making it work properly	2011-09-07 12:50:17 -04:00
Mauricio Carneiro	131cb7effd	Bringing Reduce Reads bug fixes to the main repository	2011-09-07 12:25:53 -04:00
Mark DePristo	a1920397e8	Major bugfix for per sample VariantEval -- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.	2011-09-07 12:18:11 -04:00
Mark DePristo	a02636a1ac	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite	2011-09-07 10:50:00 -04:00
Mark DePristo	d5641cfac5	Merge branch 'variantEvalST'	2011-09-07 10:44:23 -04:00
Mark DePristo	2f4cf82e3b	VariantEval cleanup. Added VariantType Stratification -- ArrayList are List where possible -- states refactored into VariantStratifier base class (reduces many lines of duplicate code) -- Added VariantType stratification that partitions report by VariantContext.Type	2011-09-07 10:43:53 -04:00
Christopher Hartl	436f6eb52b	Reverting Eric's change and pushing in some command-line-option documentation.	2011-09-07 08:53:30 -04:00
Eric Banks	1ef8a1750a	I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.	2011-09-06 21:07:49 -04:00
Eric Banks	da9c8ab386	Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.	2011-09-06 20:39:42 -04:00
Mark DePristo	3db7ecb920	ReducedRead flag cached in GATKSAMRecord. 20% performance improvement	2011-09-06 15:11:38 -04:00
Roger Zurawicki	47607a7eff	Fixed bug where deletions messed up interval clipping - Instead of using readLength, the ReadUtil function are used to get a proper read coordinate - Added debug info in interval clipping ( with -dl) NOTE: method might not be safe for production and checks need to be added to the ClippingOp code	2011-09-06 14:25:57 -04:00
Khalid Shakir	0adb388dee	Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue. Updates for HybridSelectionPipeline: - Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5. - If a whole_exome_agilent_1 project has less than 50 samples also mixing in 1000G samples to reach VQSR thresholds. - Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches. - Removed GSA_PRODUCTION_ONLY flag from indel caller. - Updated indel hard filters based on delangel's analysis. - Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.	2011-09-06 12:41:46 -04:00
Mark DePristo	d471617c65	GATK binary VCF (gvcf) prototype format for efficiency testing -- Very minimal working version that can read / write binary VCFs with genotypes -- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading	2011-09-02 21:15:19 -04:00
Mark DePristo	048202d18e	Bugfix for cached quals	2011-09-02 21:13:28 -04:00
Mark DePristo	03aa04e37c	Simple refactoring to make formating functions public	2011-09-02 21:13:08 -04:00
Mark DePristo	124ef6c483	MISSING_VALUE now gets defaultValue in getAttribute functions	2011-09-02 21:12:28 -04:00
Mark DePristo	82f2131777	Simplied getAttributeAsX interfaces -- Removed versions getAttribriteAsX(key) that except on not having the value. -- Removed version that getAttributeAsXNoException(key) -- The only available assessors are now getAttributeAsX(key, default). -- This single accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double. If the key isn't found, default is returned.	2011-09-02 12:27:11 -04:00
Mauricio Carneiro	08ae6c0c61	ReadClipper is now handling unmapped reads	2011-09-02 11:32:30 -04:00
Mark DePristo	c57198a1b9	Optimizations in VCFCodec -- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead. -- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed	2011-09-02 08:46:17 -04:00
Mark DePristo	c3ea96d856	Removing many unused functions of unquestionable purpose	2011-09-02 08:42:01 -04:00
Eric Banks	d241f0e903	Adding docs for the pcr error rate argument.	2011-09-01 21:57:02 -04:00
Eric Banks	827fe6130c	Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).	2011-09-01 11:40:35 -04:00
Mark DePristo	ac49b8d26b	Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance -- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackerBuilder	2011-09-01 10:41:55 -04:00
Mauricio Carneiro	4b5a7046c5	Making ReadLengthDistribution Public Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline. * generalize it so it works with non pair ended reads. * generalize it to work with no read group information	2011-08-31 15:52:28 -04:00
Mauricio Carneiro	7d79de91c5	Merge branch 'master' into rr	2011-08-30 02:50:19 -04:00
Mauricio Carneiro	0cd9438ac2	fixed soft unclipped calculation * getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed. * getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.	2011-08-30 02:45:29 -04:00
Mauricio Carneiro	fd540592ab	Added RMS calculation for consensus MQ Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.	2011-08-30 02:45:20 -04:00
Mauricio Carneiro	6f9264d2b3	Hard Clipping no longer leaves indels on the tails The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.	2011-08-30 02:44:58 -04:00
Mauricio Carneiro	943876c6eb	Added QUAL/MINVAR parameters to the walker	2011-08-30 02:44:46 -04:00
Mauricio Carneiro	7532be7f5a	Allowing to clip after AlignmentEnd if end is soft clipped. Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.	2011-08-30 02:44:46 -04:00
Mauricio Carneiro	90a1f5e15c	Several bug fixes * When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases. * Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases * Sliding the window was causing reads that started on the new start position to be entirely clipped.	2011-08-30 02:44:19 -04:00
Mauricio Carneiro	66a8b36cf5	Fixed most indexing bugs * added bases and quals to consensus * fixed consensus read cigar generation.	2011-08-30 02:43:41 -04:00
Mark DePristo	1e5001b447	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-29 17:04:21 -04:00
Mark DePristo	3b09d42ed6	Now only prints 1 warning message about duplicate headers in simpleMerge	2011-08-29 14:41:29 -04:00
Eric Banks	c2f0db969b	Don't use the default deletion value from UG if not asking to have it set	2011-08-29 13:48:10 -04:00
Eric Banks	bb7a37e8f2	We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth.	2011-08-29 13:19:35 -04:00
Ryan Poplin	bc252a0d62	misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR	2011-08-29 08:11:31 -04:00
Mark DePristo	a5c65fc133	Debugging information to print out the Query tracks	2011-08-28 18:54:49 -04:00
Mark DePristo	7bf006278d	Moved ResolveHostname to general utils as a static function	2011-08-28 12:04:16 -04:00
Mark DePristo	ccec0b4d73	AnalyzeCovariates uses the general RScript system now -- Convenience constructor for collection for testing -- callRScript() now accepts Objects not Strings, for convenience	2011-08-27 12:54:13 -04:00
Mark DePristo	1ceb020fae	UnitTests for RScript	2011-08-27 10:50:05 -04:00
Mark DePristo	e37a638e09	Fix for disallowed characters in GATKReportTable -- Illegal characters are automatically replaced with _	2011-08-26 13:24:06 -04:00
Mark DePristo	eef1ac415a	Merge branch 'master' into rodTesting Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java	2011-08-26 00:35:41 -04:00
Eric Banks	9b7512fd94	Just because there's a ref base doesn't mean the VC needs to be padded	2011-08-25 22:42:14 -04:00
Mark DePristo	e01273ca7c	Queue now writes out queueJobReport.pdf -- General purpose RScript executor in java (please use when invoking RScripts) -- Removed groupName. This is now analysisName -- Explicitly added capability to enable/disable individual QFunction	2011-08-25 16:57:11 -04:00
Eric Banks	09a729da3a	Removing incorrect comment	2011-08-25 15:42:52 -04:00
Eric Banks	8bbef79fc2	Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch.	2011-08-25 15:37:26 -04:00
Ryan Poplin	29c7b10f7b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-24 15:18:58 -04:00
Ryan Poplin	e5008aba00	Output the top two haplotypes as a variant call by running smith-waterman alignment against the reference and calling any difference as variation. This is the first verion that runs end-to-end by taking in reads as bam file and writing out variant calls in VCF.	2011-08-24 15:18:44 -04:00
Guillermo del Angel	e618cb1e79	a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED\|INDEL\|SNP\|MNP\|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC\|MULTIALLELIC]. Corresponding gatkdocs changes. b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk). c) Added integrationtest for new SelectVariants commands	2011-08-24 12:25:50 -04:00
Mark DePristo	28ee6dac41	Fixed spelling mistake	2011-08-24 10:14:45 -04:00
Mark DePristo	569e1a1089	Walker.isDone() aborts execution early -- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.	2011-08-23 16:53:06 -04:00
Ryan Poplin	a1a1fac9e4	Likelihood engine now gives non-zero likelihoods. Using HMM function that can handle context specific gap open and gap continuation penalties	2011-08-23 13:43:07 -04:00
Guillermo del Angel	6e2552a9ef	Merge fix	2011-08-23 12:40:43 -04:00
Guillermo del Angel	8b7a0b3b62	Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf	2011-08-23 12:40:01 -04:00
Roger Zurawicki	ac36271457	Fixed extra reads showing up in Variable Sites Reads that were not hard clipped for the variable site no longer show up in output file Walker now uses unclippedStart of Read to determine position in the sliding Window	2011-08-23 11:26:00 -04:00

... 6 7 8 9 10 ...

1226 Commits (621ee2b613bc518480599b8e0939ebfd79f7bbe5)