gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Guillermo del Angel	827be878b4	Bug fix when running UG in GenotypeGivenAlleles mode: if an input site to genotype had no coverage, the output VCF had AC, AF and AN inherited from the input VCF, which could have nothing to do with the given BAM, so the numbers could be nonsensical. Now the new vc has clear attributes instead of attributes inherited from the input VCF.	2012-02-06 11:58:13 -05:00
Eric Banks	fbbd04621d	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-06 11:53:31 -05:00
Eric Banks	edb4edc08f	Commented out unused metrics for now	2012-02-06 11:53:15 -05:00
Ryan Poplin	096c23a473	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-06 11:10:38 -05:00
Ryan Poplin	dc05b71e39	Updating Covariate interface with Mauricio to include an errorModel parameter. On the fly recalibration of base insertion and base deletion quals is live for the HaplotypeCaller	2012-02-06 11:10:24 -05:00
Guillermo del Angel	1e11408f8b	Merged bug fix from Stable into Unstable	2012-02-06 10:34:26 -05:00
Guillermo del Angel	090d87b48b	Bug fix in ValidationSiteSelector: when input vcf had genotypes and was multiallelic, the parsing of the AF/AC fields was wrong. Better logic to unify parsing of field	2012-02-06 10:33:12 -05:00
Eric Banks	9d94f310f1	Break AF histogram into max and min AFs	2012-02-06 09:01:19 -05:00
Ryan Poplin	b7ffd144e8	Cleaning up the covariate classes and removing unused code from the bqsr optimizations in 2009.	2012-02-06 08:54:42 -05:00
Eric Banks	cef550903e	Minor optimization	2012-02-06 00:48:00 -05:00
Ryan Poplin	5343f8ba67	Initial version of on-the-fly, lazy loading base quality score recalibration. It isn't completely hooked up yet but I'm committing so Mauricio and Mark can see how I envision it will fit together. Look it over and give any feedback. With the exception of the Solid specific code we are very very close to being able to remove TableRecalibrationWalker from the code base and just replace it with PrintReads -BQSR recal.csv	2012-02-05 13:09:03 -05:00
Ryan Poplin	f94d547e97	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-03 17:14:20 -05:00
Ryan Poplin	894d3340be	Active Region Traversal should use GATKSAMRecords everywhere instead of SAMRecords. misc cleanup.	2012-02-03 17:13:52 -05:00
Mauricio Carneiro	4a57add6d0	First implementation of DiagnoseTargets * calculates and interprets the coverage of a given interval track * allows to expand intervals by specified number of bases * classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY. * outputs text file for now (testing purposes only), soon to be VCF. * filters are overly aggressive for now.	2012-02-03 17:12:43 -05:00
Mauricio Carneiro	3dd6a1f962	Adding some generic sum and average functions to MathUtils	2012-02-03 17:12:43 -05:00
Mauricio Carneiro	e1d69e4060	Make the size of a GenomeLoc an int instead of a long: it will never be bigger than an int, and an int is more useful because we can use it as a parameter for array/list/hash size creation.	2012-02-03 17:12:42 -05:00
Ryan Poplin	0e44430e47	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-03 13:45:11 -05:00
Christopher Hartl	aa3638ecb3	Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-03 13:42:09 -05:00
Eric Banks	3abfbcbcf2	Generalized the TDT for multi-allelic events	2012-02-03 12:23:21 -05:00
Ryan Poplin	601e53d633	Fix when specifying preset active regions with -AR argument	2012-02-02 16:34:26 -05:00
Christopher Hartl	0111505ea9	Terrible. Swapping the paternal and sample ids.	2012-02-02 11:41:16 -05:00
Ryan Poplin	1f50f6970b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-02 10:17:13 -05:00
Ryan Poplin	4ed06801a7	Updating HaplotypeCaller's HMM calc to use GOP as a function of the read instead of a function of the haplotype in preparation for IQSR	2012-02-02 10:17:04 -05:00
Matt Hanna	8adfc79123	Merged bug fix from Stable into Unstable	2012-02-01 16:07:41 -05:00
Matt Hanna	30b937d2af	Fix bug discovered in FGTP branch in which BlockInputStream returns -1 in cases where some data could be read, but not all the data requested by the caller.	2012-02-01 16:06:22 -05:00
Mauricio Carneiro	45da892ecc	Better exceptions to catch malformed reads * throw exceptions in LocusIteratorByState when hitting reads starting or ending with deletions	2012-02-01 11:56:19 -05:00
Christopher Hartl	810996cfca	Introducing: VariantsToPed, the world's most annoying walker! And also a busted QScript to run it that I need Khalid's help debugging ( frownie face ). Note that VariantsToPed and PlinkSeq generate the same binary file (up to strand flips...thanks PlinkSeq), so I know it's working properly. Hooray!	2012-02-01 10:39:03 -05:00
Christopher Hartl	25d943f706	Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-02-01 10:32:11 -05:00
Ryan Poplin	056b24ccd6	Resolving merge conflicts with LocusIteratorByState	2012-01-31 16:13:32 -05:00
Ryan Poplin	febc634557	Changing PileupElement's isSoftClipped to isNextToSoftClip since soft clipped bases aren't actually added to pileups, oops. Removing the intrinsic clustered variants filter from the HaplotypeCaller	2012-01-31 16:06:14 -05:00
Matt Hanna	7f70612beb	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-31 11:59:25 -05:00
Matt Hanna	a630db1703	Oops...HierarchicalMicroScheduler was transforming any exception from the walker level into a ReviewedStingException. Thanks to Ryan for pointing this out.	2012-01-31 11:58:21 -05:00
Christopher Hartl	faba3dd530	Merge branch 'master' of ssh://chartl@ni.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-31 10:25:29 -05:00
Mauricio Carneiro	17dbe9a95d	A few cleanups in the LocusIteratorByState * No more N's in the extended event pileups * Only add to the pileup MQ0 counter if the read actually goes into the pileup	2012-01-31 09:40:51 -05:00
Ryan Poplin	f9162ea705	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-30 19:45:19 -05:00
Ryan Poplin	abb91cf26b	Increasing the size of the active regions that are produced by the active probability integrator, more context is needed to call more complex events	2012-01-30 15:36:12 -05:00
Mauricio Carneiro	d5d4fa8a88	Fixed discordance bug reported by Brad Chapman: discordance now reports discordance between genotypes as well (just like concordance)	2012-01-30 09:50:45 -05:00
Mark DePristo	3164c8dee5	S3 upload now directly creates the XML report in memory and puts that in S3 -- This is a partial fix for the problem with uploading S3 logs reported by Mauricio. There the problem is that the java.io.tmpdir is not accessible (network just hangs). Because of that the s3 upload fails because the underlying system uses tmpdir for caching, etc. As far as I can tell there's no way around this bug -- you cannot overload the java.io.tmpdir programmatically and even if I could what value would we use? The only solution seems to me is to detect that tmpdir is hanging (how?!) and fail with a meaningful error.	2012-01-29 15:14:58 -05:00
Menachem Fromer	0e17cbbce9	Merged bug fix from Stable into Unstable	2012-01-27 16:03:16 -05:00
Menachem Fromer	a9671b73ca	Fix to permit proper handling of mapping qualities between 128 and 255 (which get converted to byte values of -128 to -1)	2012-01-27 16:01:30 -05:00
Ryan Poplin	f7ac1f4a69	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-27 15:12:55 -05:00
Ryan Poplin	fc08235ff3	Bug fix in active region traversal, locusView.getNext() skips over pileups with zero coverage but still need to count them in the active probability integrator	2012-01-27 15:12:37 -05:00
Mark DePristo	0f2e8400b5	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-27 10:12:50 -05:00
Mauricio Carneiro	ec9920b04f	Updating the SAM TAG for Original Alignment Start to "OP" per Mark's recommendation to reuse the Indel Realigner tag that made it to the SAM spec. The Alignment end tag is still "OE" as there is no official tag to reuse.	2012-01-27 08:51:39 -05:00
Mark DePristo	13d1626f51	Minor improvements in ref QC walker. Unfortunately this doesn't actually catch Chris's error	2012-01-27 08:24:22 -05:00
Mauricio Carneiro	0d4027104f	Reduced reads are now aware of their original alignments * Added annotations for reads that had been soft clipped prior to being reduced so that we can later recuperate their original alignments (start and end). * Tags keep the alignment shifts, not real alignment, for better compression * Tags are defined in the GATKSAMRecord * GATKSAMRecord has new functionality to retrieve original alignment start of all reads (trimmed or not) -- getOriginalAlignmentStart() and getOriginalAligmentEnd() * Updated ReduceReads MD5s accordingly	2012-01-26 17:06:36 -05:00
Eric Banks	07f72516ae	Unsupported platform should be a user error	2012-01-26 16:14:25 -05:00
Ryan Poplin	cdff23269d	HaplotypeCaller now uses insertions and softclipped bases as possible triggers. LocusIteratorByState tags pileup elements with the required info to make this calculation efficient. The days of the extended event pileup are coming to a close.	2012-01-26 15:56:33 -05:00
Christopher Hartl	673ceadd11	While this fix worked for the evaluator module, it could potentially have bad effects in the phasing walkers. Special-case nocalls in the PhasingEvaluator and return AllelePair to previous state.	2012-01-26 13:06:36 -05:00
Christopher Hartl	9c6fda7e15	Yup. I was right.	2012-01-26 12:54:11 -05:00
Christopher Hartl	7d059540a4	Allow segments of genome to be excluded in generating a reference panel. Occasionally targets would contain no variation (typically, in the middle of the centromere), which beagle doesn't particularly like, and errors out rather than producing empty output files. The best way to deal with these is to just exclude the regions on a second-pass, and the remaining bits will be gathered with no additional work. AllelePair is being mean and not telling me what genotype it sees when it finds a non-diploid genotype, but i suspect it's a no-call (".") rather than a no call ("./.").	2012-01-26 12:43:52 -05:00
Ryan Poplin	25532bdc37	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-26 11:43:32 -05:00
Ryan Poplin	390d493049	Updating ActiveRegionWalker interface to output a probability of active status instead of a boolean. Integrator runs a band-pass filter over this probability to produce actual active regions. First version of HaplotypeCaller which decides for itself where to trigger and assembles those regions.	2012-01-26 11:37:08 -05:00
Eric Banks	859dd882c9	Don't make it standard for now	2012-01-26 00:38:16 -05:00
Eric Banks	c5e81be978	Adding pairwise AF table. Not polished at all, but usable nonetheless.	2012-01-26 00:37:06 -05:00
Eric Banks	702a2d768f	Initial version of multi-allelic summary module in VariantEval	2012-01-25 19:42:55 -05:00
Eric Banks	9a60887567	Lost an import in the merge	2012-01-25 19:41:41 -05:00
Eric Banks	cba5f1a8b1	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-25 19:19:03 -05:00
Eric Banks	add6918f32	Cleaner, more efficient way of determining the last dependent set in the queue.	2012-01-25 16:21:10 -05:00
Menachem Fromer	db645a94ca	Added options to make the batch-merger more all-inclusive: keep all indels, SNPs (even filtered ones) but maintain their annotations. Also, VariantContextUtils.simpleMerge can now merge variants of all types using the Hidden non-default enum MultipleAllelesMergeType=MIX_TYPES	2012-01-25 16:10:59 -05:00
Eric Banks	ef335a5812	Better implementation of the fix; PL index is now traversed in order.	2012-01-25 15:15:42 -05:00
Eric Banks	8e2d372ab0	Use remove instead of setting the value to null	2012-01-25 14:41:34 -05:00
Eric Banks	05816955aa	It was possible that we'd clean up a matrix column too early when a dependent column aborted early (with not enough probability mass) because we weren't being smart about the order in which we created dependencies. Fixed.	2012-01-25 14:28:21 -05:00
Eric Banks	2799a1b686	Catch exception for bad type and throw as a TribbleException	2012-01-25 12:15:51 -05:00
Eric Banks	96b62daff3	Minor tweak to the warning message.	2012-01-25 11:55:33 -05:00
Eric Banks	fb863dc6a7	Warn user when trying to run with EMIT_ALL_SITES with indels; better docs for that option.	2012-01-25 11:50:12 -05:00
Eric Banks	e349b4b14b	Allow appending with the dbSNP ID even if a (different) ID is already present for the variant rod.	2012-01-25 11:35:54 -05:00
Eric Banks	ea3d4d60f2	This annotation requires rods and should be annotated as such	2012-01-25 11:35:13 -05:00
Ryan Poplin	bbefe4a272	Added option to be able to write out the active regions to an interval list file	2012-01-25 09:47:06 -05:00
Ryan Poplin	9818c69df6	Can now specify active regions to process at the command line, mainly for debugging purposes	2012-01-25 09:32:52 -05:00
Mauricio Carneiro	ffd61f4c1c	Refactor the Pileup Element with regards to indels. Eric reported this bug due to the reduced reads failing with an index out of bounds on what we thought was a deletion, but turned out to be a read starting with insertion. * Refactored PileupElement to distinguish clearly between deletions and read starting with insertion * Modified ExtendedEventPileup to correctly distinguish elements with deletion when creating new pileups * Refactored most of the lazyLoadNextAlignment() function of the LocusIteratorByState for clarity and to create clear separation between what is a pileup with a deletion and what's not one. Got rid of many useless if statements. * Changed the way LocusIteratorByState creates extended event pileups to differentiate between insertions in the beginning of the read and deletions. * Every deletion now has an offset (start of the event) * Fixed bug when LocusIteratorByState found a read starting with insertion that happened to be a reduced read. * Separated the definitions of deletion/insertion (in the beginning of the read) in all UG annotations (and the annotator engine). * Pileup depth of coverage for a deleted base will now return the average coverage around the deletion. * Indel ReadPositionRankSum test now uses the deletion true offset from the read, changed all appropriate md5's * The extra pileup elements now properly read by the Indel mode of the UG made any subsequent call have a different random number and therefore all RankSum tests have slightly different values (in the 10^-3 range). Updated all appropriate md5s after extremely careful inspection -- Thanks Ryan! phew!	2012-01-24 16:07:21 -05:00
Matt Hanna	c312bd5960	Weirdly, PicardException inherits from SAMException, which means that our specialty code for reporting malformed BAMs was actually misreporting any error that happened in the Picard layer as a BAM ERROR. Specifically changing PicardException to report as a ReviewedStingException; we might want to change it in the future. I'll followup with the Picard team to make sure they really, really want PicardException to inherit from SAMException.	2012-01-24 15:30:04 -05:00
Mark DePristo	0a3172a9f1	Fix for ref 0 bases for Chris -- Disturbingly, fixing this bug doesn't actually cause any test failures. -- Wrote a new QCRefWalker to actually check in detail that the reference bases coming into the RefWalker are all correct when comparing against a clean uncached load of the contig bases directly. -- However, I cannot run this tool due to some kind of weird BAM error -- sending this on to Matt	2012-01-24 10:55:09 -05:00
Khalid Shakir	c18beadbdb	Device files like /dev/null are now tracked as special by Queue and are not used to generate .out file paths, scattered into a temporary directory, gathered, deleted, etc. Attempted workaround for xdr_resourceInfoReq unsatisfied link during loading of libbat.so.	2012-01-23 16:17:04 -05:00
Mark DePristo	02450e4b12	Merged bug fix from Stable into Unstable	2012-01-23 12:08:39 -05:00
Christopher Hartl	798596257b	Enable the Genotype Phasing Evaluator. Because it didn't have the same argument structure as the base class, update2 of VariantEvaluator was being called, rather than update2 of the actual module.	2012-01-23 10:50:16 -05:00
Mark DePristo	80a4ce0edf	Bugfix for incorrect error messages for missing BAMs and VCFs -- Missing BAMs were appearing as StingExceptions -- Missing VCFs were showing up as CommandLineErrors, but it's clearer for them to be CouldNotReadInputFile exceptions -- Added integration tests to ensure missing BAMs, VCFs, and -L files are properly thrown as CouldNotReadInputFile exceptions -- Added path to standard b37 BAM to BaseTest -- Cleaned up code in SAMDataSource, removing my parallel loading code as this just didn't prove to be useful.	2012-01-23 09:52:07 -05:00
Guillermo del Angel	31d2f04368	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-23 09:23:03 -05:00
Guillermo del Angel	966387ca0b	Next intermediate commit in the pool caller. Lots of bug fixes and now we can emit true vcf's with calls in discovery mode (still of unknown quality) - old validation mode is temporarily broken, will be fixed in next refactoring.	2012-01-23 09:22:31 -05:00
Ryan Poplin	4d6312d4ea	HaplotypeCaller is now an ActiveRegionWalker.	2012-01-22 14:31:01 -05:00
Christopher Hartl	3b1aad4f17	After a minor and abject freakout, alter the T2D script to seek out truth sensitivities between 80 and 100, rather than between 0.8 and 1. Also, don't consider a genotype "changed by beagle" if the initial genotype is a no-call.	2012-01-20 23:43:51 -05:00
Christopher Hartl	9b4f6afa21	Alterations to scripts for better performance. Grid search now expands the sens/spec tradeoff (90 was far too aggressive against hapmap chr20), and 20 max gaussians was too many, and caused errors. For consensus genotypes: remember to gunzip the beagle outputs before converting to VCF. Also, beagle can in fact create 'null' alleles in certain circumstances. I'm not sure what exactly those circumstances are, but those sites should be ignored. When it does, all alleles appear to be set to null, so this should not affect the actual phasing in the output VCF.	2012-01-20 23:07:59 -05:00
Ryan Poplin	4b18786b5d	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-19 22:05:20 -05:00
Ryan Poplin	ace9333068	Active region walkers can now see the reads in a buffer around their active regions. This buffer size is specified as a walker annotation. Intervals are internally extended by this buffer size so that the extra reads make their way through the traversal engine but the walker author only needs to see the original interval. Also, several corner case bug fixes in active region traversal.	2012-01-19 22:05:08 -05:00
Menachem Fromer	066da80a3d	Added KEEP_UNCONDTIONAL option which permits even sites with only filtered records to be included as unfiltered sites in the output	2012-01-19 18:19:58 -05:00
Christopher Hartl	7f3ad25b01	Adding a mode to VariantFiltration to invalidate previously-applied filters to allow complete re-filtering of a VCF. T2D VQSR: re-calling now done with appropriate quality settings and using BAQ.	2012-01-19 10:54:48 -05:00
Ryan Poplin	7e082c7750	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-19 09:11:23 -05:00
Eric Banks	ab8f499bc3	Annotate with FS even for filtered sites	2012-01-18 22:04:51 -05:00
Guillermo del Angel	b123416c4c	Resolve stale merge changes	2012-01-18 20:56:36 -05:00
Guillermo del Angel	2eb45340e1	Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. Main modification is that there is a new operation mode, set by argument -ALLELE_DISCOVERY_MODE, which if true will determine optimal alt allele at each computable site and will compute AC distribution on it. Current implementation is not working yet if there's more than one pool and it will only output biallelic sites, no functionality for true multi-allelics yet	2012-01-18 20:54:10 -05:00
Ryan Poplin	0268da7560	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-18 09:53:00 -05:00
Ryan Poplin	11982b5a34	We no longer calculate the population-level TDT statistic if there are fewer than 5 trios with full genotype likelihood information. When there is a high degree of missingness the results are skewed or in the worst case come out as NaN.	2012-01-18 09:42:41 -05:00
Mark DePristo	763c81d520	No longer enforce MAX_ALLELE_SIZE in VCF codec -- Instead issue a warning when a large (>1MB) record is encountered -- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) allocation each iteration through computeReverseClipping()	2012-01-18 07:35:11 -05:00
Mark DePristo	0c7865fdb5	UnitTest for reverseAlleleClipping -- No code modified yet, just implementing a unit test to ensure correctness of the existing code	2012-01-18 07:35:11 -05:00
Mark DePristo	62801e430a	Bugfix for unnecessary optimization -- don't cache the ref bytes	2012-01-17 16:40:26 -05:00
Mark DePristo	f2b0575dee	Detect unreasonably large allele strings (>2^16) and throw an error -- samtools can emit alleles where the ref is 42M Ns and this caused the GATK (via tribble) to hang in several places. -- Tribble was updated so we actually could read the line properly (rev. to 51 here). -- Still the parsing algorithms in the GATK aren't happy with such a long allele. Instead of optimizing the code around an improper use case I put in a limit of 2^16 bp for any allele, and throw a meaningful exception when encountered.	2012-01-17 16:40:26 -05:00
Ryan Poplin	8b0ddf0aaf	Adding notes to CountCovariates docs about using interval lists as database of known variation	2012-01-17 16:13:13 -05:00
Matt Hanna	40ebc17437	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-17 14:49:17 -05:00
Matt Hanna	41d70abe4e	At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings.	2012-01-17 14:47:53 -05:00
Ryan Poplin	ae259f81cc	Bug fixing for merging of read fragments when one fragment contained an indel	2012-01-17 14:39:27 -05:00
Christopher Hartl	cde224746f	Bait Redesign supports baits that overlap, by picking only the start of intervals. CalibrateGenotypeLikelihoods supports using an external VCF as input for genotype likelihoods. Currently can be a per-sample VCF, but has un-implemented methods for allowing a read-group VCF to be used. Removed the old constrained genotyping code from UGE -- the trellis calculated is exactly the same as that done in the MLE AC estimate; so we should just re-use that one.	2012-01-17 13:51:05 -05:00
Ryan Poplin	8e23c98dd9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-17 13:46:28 -05:00
Matt Hanna	32ccde374b	Merged bug fix from Stable into Unstable	2012-01-17 11:08:35 -05:00
Matt Hanna	3ba918aff1	Error message cleanup in BAM indexing code.	2012-01-17 11:05:42 -05:00
Mauricio Carneiro	cec7107762	Better location for the downsampling of reads in PrintReads * using the filter() instead of map() makes for a cleaner walker. * renaming the unit tests to make more sense with the other unit and integration tests	2012-01-14 14:06:09 -05:00
Mark DePristo	b06074d6e7	Updated SortingVCFWriterBase to use PriorityBlockingQueue so that the class is thread-safe -- Uses PriorityBlockingQueue instead of PriorityQueue -- synchronized keywords added to all key functions that modify internal state. Note that this hasn't been tested extensively. Based on report: http://getsatisfaction.com/gsa/topics/missing_loci_output_in_multi_thread_mode_when_implement_sortingvcfwriterbase?utm_content=topic_link&utm_medium=email&utm_source=new_topic	2012-01-13 09:33:16 -05:00
Mauricio Carneiro	28aa353501	Added "unbiased" downsampling parameter to PrintReads * also cleaned up and updated part of the unit tests for print reads. Needs a more thorough cleaning.	2012-01-12 16:33:55 -05:00
Matt Hanna	2c3176eb80	Merged bug fix from Stable into Unstable	2012-01-12 13:31:10 -05:00
Matt Hanna	cd43f016ce	Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added integrationtest to verify behavior.	2012-01-12 13:29:11 -05:00
Eric Banks	ed34b4f088	Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-12 10:27:26 -05:00
Eric Banks	e7fe9910f7	Create the temp storage for calculating cell values just once as per Mark's TODO	2012-01-12 10:27:10 -05:00
Eric Banks	f5f5ed5dcd	Don't initialize the cell conformation values (use an else in the loop instead) as per Mark's TODO	2012-01-12 08:50:03 -05:00
Eric Banks	410a340ef5	Swapping the iteration order to run over AF conformations and then samples instead of the reverse minimizes calls to HashMap.get; instead of it being O(n) since we called it for each sample it's now O(1). Runtime on T2D GENES test set is reduced by 5-10%. More optimizations to follow.	2012-01-12 02:04:03 -05:00
Mauricio Carneiro	77a03c9709	Patching special case in the adaptor clipping * if the adaptor boundary is more than MAXIMUM_ADAPTOR_SIZE bases away from the read, then let's not clip anything and consider the fragment to be undetermined for this read pair. * updated md5's accordingly	2012-01-11 17:47:44 -05:00
Eric Banks	25d0d53d88	Moving the approximate summing of log10 vals to MathUtils; keeping the more efficient implementation of fast rounding.	2012-01-10 12:38:47 -05:00
Eric Banks	589397d611	Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-10 12:36:48 -05:00
Matt Hanna	e923a2e512	Revving Picard to incorporate final version of ReadWalker performance improvements.	2012-01-10 12:12:33 -05:00
Eric Banks	0f36f6947e	Resolving merge conflicts	2012-01-10 11:44:16 -05:00
Eric Banks	f2cecce10f	Much better implementation of the approximate summing of an array of log10 values (including more efficient rounding). Now effectively takes 0% of UG runtime on T2D GENES (as opposed to 11% previously).	2012-01-10 11:34:23 -05:00
Matt Hanna	509c3d87b0	Merged bug fix from Stable into Unstable	2012-01-09 23:08:46 -05:00
Matt Hanna	dc60757b68	Eliminate unnecessary strong references (and therefore memory held) by tree reduce entries that have already been processed. Thanks to Tim Fennell for the bug report.	2012-01-09 23:04:53 -05:00
Matt Hanna	fda1795791	Merged bug fix from Stable into Unstable	2012-01-08 22:04:44 -05:00
Matt Hanna	1f1233b669	Fix for a rare but insidious bug in position tracking during async BAM file reading. Thanks to Khalid for spotting and reporting the issue.	2012-01-08 22:03:35 -05:00
Khalid Shakir	5793625592	No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice. QScript accessor to QSettings to specify a default runName and other default function settings. Because log files are no longer pseudo-random their presence can be used to tell if a job without other file outputs is "done". For now still using the log's .done file in addition to original outputs. Gathered log files concatenate all log files together into the stdout. InProcessFunctions now have PrintStreams for stdout and stderr. Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml. During graph tracking of outputs the Index files, and now BAM MD5s, are tracked with the gathering of the original file. In Queue generated wrappers for the GATK the Index and MD5s used for tracking are switched to private scope. Added more detailed output when running with -l DEBUG. Simplified graphviz visualization for additional debugging. Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List). Minor cleanup to build including sending ant gsalib to R's default libloc.	2012-01-08 12:11:55 -05:00
Guillermo del Angel	d4e7655d14	Added ability to call multiallelic indels, if -multiallelic is included in UG arguments. Simple idea: we genotype all alleles with count >= minIndelCnt. To support this, refactored code that computes consensus alleles. To ease merging of multiple alt alleles, we create a single vc for each alt allele and then use VariantContextUtils.simpleMerge to carry out merging, which takes care of handling all corner conditions already. In order to use this, interface to GenotypeLikelihoodsCalculationModel changed to pass in a GenomeLocParser object (why are these objects so hard to handle??). More testing is required and the feature is turned off by default.	2012-01-06 11:24:38 -05:00
Ryan Poplin	616ff8ea01	fixed typo in help text	2012-01-06 10:36:11 -05:00
Mark DePristo	dd80ffbbbe	Merged bug fix from Stable into Unstable	2012-01-05 21:51:48 -05:00
Mark DePristo	c96fee477c	Bug fix for VariantSummary -- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional. Fixed. Has the annoying side effect that indels > 50bp in size are not counted as indels, and so are subtracted from both the novel and known counts for indels. C'est la vie -- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels. Using this more recent and representative file is probably a good idea for more future tests in VE and other tools. File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data	2012-01-05 21:51:06 -05:00
Eric Banks	f5e10e9879	Merged bug fix from Stable into Unstable	2012-01-05 15:35:09 -05:00
Eric Banks	18ed954741	Compute Ti/Tv only if bi-allelic	2012-01-05 15:33:26 -05:00
Ryan Poplin	a6886a4cc0	Initial commit of the Active Region Traversal. Not ready to be used by anyone yet.	2012-01-04 17:03:21 -05:00
Guillermo del Angel	58d4539304	Enabled banded indel computation by default. Reversed logic in input UG argument so that we can still disable it if required. Minor changes to integration tests due to minor differences in GL's and in annotations	2012-01-04 15:28:26 -05:00
Mauricio Carneiro	9ff8a01da2	Merged bug fix from Stable into Unstable	2012-01-03 18:10:39 -05:00
Mauricio Carneiro	9b55505c03	Fixing PairHMMIndelErrorModel array out of bounds. This error was due to the ReadClipper change of contract. Before, the read utils would return null if a read was entirely clipped; now they return an empty (safe) GATKSAMRecord.	2012-01-03 18:08:46 -05:00
Christopher Hartl	2c3a9ce02f	Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable	2012-01-03 17:25:56 -05:00
David Roazen	621ee2b613	Merged bug fix from Stable into Unstable	2012-01-03 16:56:49 -05:00
Christopher Hartl	9093de1132	Cleanup: remove code to calculate the MLE AC in the UGE.	2012-01-03 15:58:51 -05:00
Christopher Hartl	2d093828a4	Final changes to Junky (been frozen for a while, but uncommitted) and the qscript for it. A first cursory implementation of the trellis-based Exact AC-constrained genotyping algorithm in UGE. Nothing calls into it, so this should be entirely safe (and, no surprise, it passes UG integration tests).	2012-01-03 15:33:04 -05:00
David Roazen	ea6e718cb8	SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline. For now, we recommend only running with the GRCh37.64 database.	2012-01-03 15:18:36 -05:00
Christopher Hartl	93e1417b6e	Update to the VSS GATK documentation.	2012-01-03 13:39:31 -05:00
Eric Banks	ab8d47d9a5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-01-03 09:38:49 -05:00
Mauricio Carneiro	3d4bf273de	Added getPileupForReadGroups to ReadBackPileup * returns a pileup for all the read groups provided. * saves us from multiple calls to getPileup (which is very inefficient)	2012-01-03 09:35:11 -05:00
Mauricio Carneiro	4a208c7c06	Refactor of the downsampling machinery to accept different strategies * Implemented Adaptive downsampler * Added integration test * Added option to RRead scala script to choose downsampling strategy	2012-01-03 09:29:47 -05:00
Mauricio Carneiro	21ae3ef5f9	Added downsampling support to ReduceReads * Downsampling is now a parameter to the walker with default value of 0 (no downsampling) * Downsampling selects reads at random at the variant region window and strives to achieve uniform coverage if possible around the desired downsampling value. * Added integration test	2012-01-03 09:29:46 -05:00
Mauricio Carneiro	cd68cc239b	Added knuth-shuffle (KS) and randomSubset using KS to MathUtils * Knuth-shuffle is a simple, yet effective array permutator (hope this is good english). * added a simple randomSubset that returns a random subset without repeats of any given array with the same probability for every permutation. * added unit tests to both functions	2012-01-03 09:29:46 -05:00
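The shuffle and subset helpers described in the commit above can be sketched as follows. This is a minimal Python illustration of the Knuth (Fisher-Yates) technique, not the MathUtils code itself; the names `knuth_shuffle` and `random_subset` and the `rng` parameter are assumptions made for this example.

```python
import random


def knuth_shuffle(items, rng=random):
    """Return a shuffled copy of items using the Knuth (Fisher-Yates) shuffle."""
    shuffled = list(items)
    # Walk from the last slot down; swap each slot with a uniformly chosen
    # slot at or before it. Every permutation is then equally likely.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def random_subset(items, n, rng=random):
    """Return n elements drawn without repeats: the first n of a shuffled copy."""
    if n > len(items):
        raise ValueError("cannot draw more elements than are available")
    return knuth_shuffle(items, rng)[:n]
```

Passing a seeded `random.Random` instance as `rng` makes the result reproducible, which is convenient in unit tests.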
Mauricio Carneiro	94791a2a75	Add support for reads starting with insertion * Modified cleanCigarShift to allow insertions in the beginning and end of the read * Allowed cigars starting/ending in insertions in the systematic ReadClipper tests * Updated all ReadClipper unit tests * ReduceReads does not hard clip leading insertions by default anymore * SlidingWindow adjusts start location if read starts with insertion * SlidingWindow creates an empty element with insertions to the right * Fixed all potential divide by zero with totalCount() (from BaseCounts) * Updated all Integration tests * Added new integration test for multiple interval reducing	2012-01-03 09:29:45 -05:00
Mark DePristo	d05f0c2318	GATKPerformanceOverTime script update -- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4) -- Uses 1.4 now -- By default we do 9 runs of each non-parallel test -- In PathUtils added convenience utility to find most recent release GATK jar with a specific release number	2012-01-02 09:58:46 -05:00
Eric Banks	393993e0c7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-31 20:42:46 -05:00
Mauricio Carneiro	55cfa76cf3	Updated integration tests for the new adaptor clipping fix.	2011-12-30 18:47:14 -05:00
Mauricio Carneiro	c7d0a9ebee	Forgot to test for inter-chromosomal mates in the adaptor clipping * Fixing bug caught by Eric (and Kristian)	2011-12-30 00:19:53 -05:00
Matt Hanna	a259bfefd4	First commit addressing problems running RTC in parallel. Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is released.	2011-12-29 16:22:14 -05:00
Eric Banks	1a45ea5a05	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-29 11:37:15 -05:00
Mauricio Carneiro	f692911903	GATKSAMRecord emptyRead static constructor * Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking. * All ReadClipper utilities now emit empty reads for fully clipped reads	2011-12-27 17:01:17 -05:00
Mauricio Carneiro	8259c748f2	No more Filtered Reads tag. All synthetic reads are marked with the reduced read tag.	2011-12-27 17:01:17 -05:00
Eric Banks	d20a25d681	A much better way of choosing the alternate allele(s) to genotype in the SNP model of UG: instead of looking at the sum of base qualities (which can and did lead to us over-genotyping esp. when allowing multiple alternate alleles), we look at the likelihoods themselves (free since we are already calculating likelihoods for all 10 genotypes). Now, even if the base quals exceed some arbitrary threshold, we only bother genotyping an alternate allele when there's a sample for which it is more likely than ref/ref (I can generate weird edge cases where this falls apart, but none that model truly variable sites that we actually want to call). This leads to a huge efficiency improvement esp. for exomes (and esp. for many samples) where we almost always were trying to genotype all 3 alternate alleles. Integration tests change only because ref calls have slight QUAL differences (because the best alt allele is still chosen arbitrarily, but differently).	2011-12-27 16:50:38 -05:00
Eric Banks	adff40ff58	Minor optimizations to avoid extra processing (esp. for reduced reads)	2011-12-27 13:16:25 -05:00
Mauricio Carneiro	17bfe48d5e	Made all class methods private in the ReadClipper * ReadClipperUnitTest now uses static methods * Haplotype caller now uses static methods * Exon Junction Genotyper now uses static methods	2011-12-27 02:11:32 -05:00
Eric Banks	dd990061f6	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-26 14:45:35 -05:00
Eric Banks	2130b39f33	Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt.	2011-12-26 14:45:19 -05:00
Mauricio Carneiro	35c41409a1	Better contracts and docs for the ReadClipper * Described the ReadClipper contract in the top of the class * Added contracts where applicable * Added descriptive information to all tools in the read clipper * Organized public members and static methods together with the same javadoc	2011-12-23 19:36:57 -05:00
David Roazen	506c0e9c97	Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline. SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released and we've verified the quality of its annotations.	2011-12-23 19:12:57 -05:00
Eric Banks	24c84da60d	'Fixing' the changes in ReferenceDataSource so that a shard properly contains a list of GenomeLocs instead of a single merged one. However, that uncovered a probable bug in the engine, so instead of letting this code fester unfixed in the build (affecting everyone in the group) I've decided to revert the previous (slow, but working) version and fix the engine in my own branch.	2011-12-23 15:39:12 -05:00
Eric Banks	8762313a0d	Better TODO message	2011-12-22 20:54:35 -05:00
Eric Banks	a815e875a8	Removing debugging output	2011-12-22 15:49:11 -05:00
Eric Banks	deef542a38	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-22 15:44:58 -05:00
Eric Banks	6d260ec6ae	Start printing traversal stats after 30 seconds. I can't stand waiting 2 minutes.	2011-12-22 15:40:59 -05:00
Mauricio Carneiro	cadff40247	getRefCoordSoftUnclippedStart and End refactor. These functions are methods of the read, and supplement getAlignmentStart() and getUnclippedStart() by calculating the unclipped start counting only soft clips. * Removed from ReadUtils * Added to GATKSAMRecord * Changed name to getSoftStart() and getSoftEnd() * Updated third party code accordingly.	2011-12-20 17:48:51 -05:00
Mauricio Carneiro	07128a2ad2	ReadUtils cleanup * Removed all clipping functionality from ReadUtils (it should all be done using the ReadClipper now) * Cleaned up functionality that wasn't being used or had been superseded by other code (in an effort to reduce multiple unsupported implementations) * Made all meaningful functions public and added better comments/explanation to the headers	2011-12-20 17:48:40 -05:00
Mauricio Carneiro	1c4774c475	Static versions of the hard clipping utilities. For simplified access to the hard clipping utilities. No need to create a ReadClipper object if you are not doing multiple complicated clipping operations, just use the static methods. Examples: ReadClipper.hardClipLowQualEnds(2); ReadClipper.hardClipAdaptorSequence();	2011-12-20 17:48:39 -05:00
Mauricio Carneiro	f73ad1c2e2	Bugfix/Rewrite: Algorithm to determine adaptor boundaries. The algorithm wasn't accounting for the case where the read is on the reverse strand and the insert size is negative. * Fixed and rewrote for more clarity (with Ryan, Mark and Eric). * Restructured the code to handle GATKSAMRecords only * Cleaned up the other structures and functions around it to minimize clutter and potential for error. * Added unit tests for all 4 cases of adaptor boundaries.	2011-12-20 17:48:39 -05:00
Mark DePristo	0cc5c3d799	General improvements to Queue -- Support for collecting resources info from DRMAA runners -- Disabled the non-standard mem_free argument so that we can actually use our own SGE cluster gsa4 -- NCoresRequest is a testing queue script for this. -- Added two command line arguments: -- multiCoreJerk: don't request multiple cores for jobs with nt > 1. This was the old behavior but it's really not the best way to run parallel jobs. Now with queue if you run nt = 4 the system requests 4 cores on your host. If this flag is thrown, though, it will only request 1 and you'll just use 4, like a jerk -- job_parallel_env: parallel environment name used with SGE to request multicore jobs. Equivalent to -pe job_parallel_env NT for NT > 1 jobs	2011-12-20 14:05:09 -05:00
Eric Banks	7204fcc2c3	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-20 12:59:11 -05:00
Eric Banks	8ade2d6ac2	max_alternate_alleles also ready to be made public	2011-12-20 12:59:02 -05:00
Eric Banks	6f52bd580b	--multiallelic mode is not hidden anymore (but it is annotated as advanced); added docs	2011-12-20 12:47:38 -05:00
Mauricio Carneiro	37e0044c48	Removing unclipSoftClipBases from ReadUtils * it was buggy and dangerous. * Updated Chris' code to use the ReadClipper.	2011-12-20 00:11:26 -05:00
Mauricio Carneiro	78d9bf7196	Added REVERT_SOFTCLIPPED_BASES capability to ReadClipper * New ClippingOp REVERT_SOFTCLIPPED_BASES turns soft clipped bases into matches. * Added functionality to clipping op to revert all soft clip bases in a read into matches * Added revertSoftClipBases function to the ReadClipper for public use * Wrote systematic unit tests	2011-12-20 00:04:30 -05:00
Christopher Hartl	24585062f8	Merge branch 'incoming'	2011-12-19 23:16:36 -05:00
Christopher Hartl	67298f8a11	AFCR made public (for use in VSS). Minor changes to ValidationSiteSelector logic (SampleSelectors determine whether a site is valid for output, no actual subset context need be operated on beyond that determination). Implementation of GL-based site selection. Minor changes to EJG.	2011-12-19 23:14:26 -05:00
Eric Banks	06d385e619	Simplifying the interface a bit	2011-12-19 15:29:46 -05:00
Christopher Hartl	339ef92eac	Goodbye SW by default. Now aligned reads that overlap intron-exon junctions are scored where they are by default, but warns the user (and flags the record in the VCF) if there's evidence to suggest that there is an indel throwing off the scoring (e.g. if the best score of a realigned unmapped read is >5 log orders better than the best score of a scored mapped read). Unmapped reads are still SW-aligned to the junction-junction sequence. This should result in a rather massive speedup, so far untested. UGBoundAF has to go in at some point. In the process of rewriting the math for bounding the allele frequency (it was assuming uniform tails, which is silly since i derived the posterior distribution in closed form sometime back, just need to find it)	2011-12-19 12:18:18 -05:00
Christopher Hartl	418d22b67e	Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/genotyper/IntronLossGenotyperV2.java	2011-12-19 10:59:18 -05:00
Christopher Hartl	69661da37d	Moving ValidationSiteSelector to validation package in public under my ownership. JunctionGenotyper added and modified several times, this commit is due to merge conflict fixes.	2011-12-19 10:57:28 -05:00
Laurent Francioli	16cc2b864e	- Corrected bug causing cases where both parents are HET to be accounted twice in the TDT calculation - Adapted TDT Integration test to corrected version of TDT Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>	2011-12-19 10:30:59 -05:00
Eric Banks	5fd19ae734	Commented exactly how the results are represented from the exact model so developers can know how to use them.	2011-12-19 10:19:00 -05:00
Eric Banks	3069a689fe	Bug fix: if there are multiple records at a given position, it turns out that SelectVariants would drop all variants that follow after one that fails filters (instead of dropping just the failing one). Added an integration test to cover this case.	2011-12-19 10:04:33 -05:00
Mauricio Carneiro	5b678e3b94	Remove ClippingOp UnitTests * all testing functionality is in the ReadClipperUnitTest, no need to double test. * class and package naming cleanup	2011-12-19 07:49:26 -05:00
Matt Hanna	1ead00cac5	New fork of SamFileHeaderMerger should be cached at the thread level to enable fast (and valid) thread lookups.	2011-12-18 19:04:26 -05:00
Ryan Poplin	bc842ab3a5	Adding option to VariantAnnotator to do strict allele matching when annotating with comp track concordance.	2011-12-18 15:27:23 -05:00
Ryan Poplin	953998dcd0	Now that getSampleDB is public in the walker base class this override in VariantAnnotator isn't necessary.	2011-12-18 14:38:59 -05:00
Eric Banks	07f9d14d9f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-18 00:43:15 -05:00
Eric Banks	c5ffe0ab04	No reason to sum the normalized posteriors array to get Pr(AF>0) given that we can just compute 1.0 - array[0]. Integration tests change only because of trivial precision artifacts for reference calls using EMIT_ALL_SITES.	2011-12-18 00:31:47 -05:00
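The shortcut in the commit above rests on the posteriors array being normalized: if the entries sum to 1 and index 0 holds Pr(AF=0), then Pr(AF>0) is 1 minus that entry. A small Python sketch of the identity follows; the function name and the example values are made up for illustration.

```python
import math


def prob_af_greater_than_zero(normalized_posteriors):
    """Pr(AF > 0) for posteriors that sum to 1, with index 0 holding Pr(AF = 0)."""
    return 1.0 - normalized_posteriors[0]


# Hypothetical normalized posteriors over allele counts 0, 1 and 2.
posteriors = [0.90, 0.07, 0.03]

shortcut = prob_af_greater_than_zero(posteriors)
summed = sum(posteriors[1:])

# The two routes agree up to floating-point rounding.
agree = math.isclose(shortcut, summed)
```

The two results can differ in the last few bits, which is consistent with the commit's note that integration tests changed only through trivial precision artifacts.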
Eric Banks	6dc52d42bf	Implemented the proper QUAL calculation for multi-allelic calls. Integration tests pass except for the ones making multi-allelic calls (duh) and one of the SLOD tests (which used to print 0 when one of the LODs was NaN but now we just don't print the SB annotation for that record).	2011-12-18 00:01:42 -05:00
Khalid Shakir	7486696c07	When using bam list mode in HSP, deriving VCF name from bam list instead of requiring an additional parameter. Creating a single temporary directory per ant test run instead of putting temp files across all runs in the same directory. Updated various tests for above items and other small fixes.	2011-12-16 18:09:25 -05:00
Mauricio Carneiro	fcc21180e8	Added hardClipLeadingInsertions UnitTest for the ReadClipper. Fixed issue where a read starting with an insertion followed by a deletion would break; clipper can now safely clip the insertion and the deletion if that's the case. Note: test is turned off until contract changes to allow hanging insertions (left/right).	2011-12-16 18:02:47 -05:00
Mauricio Carneiro	5bba44d693	Added hardClipByReferenceCoordinates UnitTest for the ReadClipper * fixed edge case when requested to hard clip beginning of a read that had hanging soft clipped bases on the left tail. * fixed edge case when requested to hard clip end of a read that had hanging soft clipped bases on the right tail. * fixed AlignmentStart of a clipped read that results in only hard clips and soft clips note: added tests to all these beautiful cases...	2011-12-16 18:01:33 -05:00
Mark DePristo	1994c3e3bc	Only print warning about allele incompatibility when there are genotypes in the file in CombineVariants	2011-12-16 16:50:51 -05:00
Mark DePristo	b6067be952	Support for selecting only variants with specific IDs from a file in SelectVariants -- Cleaned up unused variables as well	2011-12-16 16:50:39 -05:00
Mark DePristo	d6d2f49c88	Don't print log if there are no BAMs	2011-12-16 16:50:36 -05:00
Mark DePristo	78e0950a77	Minor bug fix for printing in SAMDataSource	2011-12-16 11:45:40 -05:00
Mark DePristo	7bc0d18418	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-16 11:42:42 -05:00
Ryan Poplin	5aa79dacfc	Changing hidden optimization argument to advanced.	2011-12-16 10:29:20 -05:00
Matt Hanna	3642a73c07	Performance improvements for dynamically merging BAMs in read walkers. This change and my previous change have dropped runtime when dynamically merging 2k BAM files from 72.6min/1M reads to 46.8sec/1M reads. Note that many of these changes are stopgaps -- the real problem is the way ReadWalkers interface with Picard, and I'll have to work with Tim&Co to produce a more maintainable patch.	2011-12-16 09:37:44 -05:00
Mark DePristo	3414ecfe2e	Restored serial version of reader initialization. Serial mode is default, as the performance gains aren't so huge. -- Parallel version can be re-enabled with a static boolean, if we decide to return to the parallel version -- Comparison of serial and parallel reader with cached and uncached files: Initialization time: serial with 500 fully cached BAMs: 8.20 seconds; Initialization time: serial with 500 uncached BAMs: 197.02 seconds; Initialization time: parallel with 500 fully cached BAMs: 30.12 seconds; Initialization time: parallel with 500 uncached BAMs: 75.47 seconds	2011-12-16 09:22:10 -05:00
Mark DePristo	fb1c9d2abc	Restored serial version of reader initialization. Parallel mode is default. -- Serial version can be re-enabled with a static boolean, if we decide to return to the serial version	2011-12-16 09:05:28 -05:00
Mauricio Carneiro	e61e5c7589	Refactor of ReadClipper unit tests * expanded the systematic cigar string space test framework Roger wrote to all tests * moved utility functions into Utils and ReadUtils * cleaned up unused classes	2011-12-15 19:05:43 -05:00
Mauricio Carneiro	4748ae0a14	Bugfix: Softclips before Hardclips weren't being accounted for. Caught a bug in the hard clipper where it does not account for hard clipping softclipped bases in the resulting cigar string, if there is already a hard clipped base immediately after it. * updated unit test for hardClipSoftClippedBases with corresponding test-case.	2011-12-15 12:17:25 -05:00
Mauricio Carneiro	62a2e335bc	Changing HardClipper contract to allow UNMAPPED reads. Shifted the contract to functions that operate on reference based coordinates. The clipper should do the right thing with unmapped reads, but it needs more testing (Ryan is using it at the moment and says it works). Will write some unit tests.	2011-12-15 11:08:19 -05:00
Matt Hanna	9333b678b5	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-14 18:05:44 -05:00
Matt Hanna	6fb4be1a09	Cache header merger.	2011-12-14 18:05:31 -05:00
Mauricio Carneiro	128bdf9c09	Create artificial reads with "default" parameters * added functions to create synthetic reads for unit testing with reasonable default parameters * added more functions to create synthetic reads based on cigar string + bases and quals.	2011-12-14 16:58:14 -05:00
Mauricio Carneiro	c85100ce9c	Fix ClippingOp bug when performing multiple hardclip ops. Bug: when performing multiple hard clip operations in a read that has indels, if the N+1 hardclip requests to clip inside an indel that has been removed by one of the (1..N) previous hardclips, the hard clipper would go out of bounds. Fix: dynamically adjust the boundaries according to the new hardclipped read length. (This maintains the current contract that hardclipping will never return a read starting or ending in indels.)	2011-12-14 16:57:47 -05:00
Eric Banks	de5928ac5a	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-14 16:24:56 -05:00
Eric Banks	4fddac9f22	Updating busted integration tests	2011-12-14 16:24:43 -05:00
Mark DePristo	01e547eed3	Parallel SAMDataSource initialization -- Uses 8 threads to load BAM files and indices in parallel, decreasing costs to read thousands of BAM files by a significant amount -- Added logger.info message noting progress and cost of reading low-level BAM data.	2011-12-14 16:14:26 -05:00
Mark DePristo	71b4bb12b7	Bug fix for incorrect logic in subsetSamples -- Now properly handles the case where a sample isn't present (no longer adds a null to the genotypes list) -- Fix for logic failure where if the number of requested samples equals the number of known genotypes then all of the records were returned, which isn't correct when there are missing samples. -- Unit tests added to handle these cases	2011-12-14 16:14:26 -05:00
Eric Banks	35fc2e13c3	Using the new PL cache, fix a bug: when only a subset of the genotyped alleles are used for assigning genotypes (because the exact model determined that they weren't all real) the PLs need to be adjusted to reflect this. While fixing this I discovered that the integration tests are busted because ref calls (ALT=.) were getting annotated with PLs, which makes no sense at all.	2011-12-14 15:31:09 -05:00
Eric Banks	1e90d602a4	Optimization: cache up front the PL index to the pair of alleles it represents for all possible numbers of alternate alleles.	2011-12-14 13:38:20 -05:00
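A cache like the one in the commit above maps each PL index to the pair of alleles it represents. The Python sketch below assumes the standard VCF genotype ordering, where the diploid genotype j/k (with j <= k) sits at index k*(k+1)/2 + j; the function name is hypothetical and the code is not the UG implementation.

```python
def build_pl_index_cache(max_alt_alleles):
    """Return a list whose entry i is the (j, k) allele pair for PL index i.

    Allele 0 is the reference and alleles 1..max_alt_alleles are the alternates.
    """
    cache = []
    num_alleles = max_alt_alleles + 1
    # Iterating k in the outer loop and j <= k in the inner loop yields pairs
    # in exactly the order index = k * (k + 1) // 2 + j.
    for k in range(num_alleles):
        for j in range(k + 1):
            cache.append((j, k))
    return cache


# With two alternate alleles there are six genotypes.
pl_cache = build_pl_index_cache(2)
```

Because the ordering for fewer alleles is a prefix of the ordering for more, one cache built for the largest supported number of alternate alleles also serves every smaller number.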
Eric Banks	988d60091f	Forgot to add in the new result class	2011-12-14 13:37:15 -05:00
Eric Banks	106bf13056	Use a thread local result object to collect the results of the exact calculation instead of passing in multiple pre-allocated arrays.	2011-12-14 12:05:50 -05:00
Eric Banks	7648521718	Add check for mixed genotype so that we don't exception out for a valid record	2011-12-14 11:26:43 -05:00
Eric Banks	9497e9492c	Bug fix for complex records: do not ever reverse clip out a complete allele.	2011-12-14 11:21:28 -05:00
Eric Banks	09a5a9eac0	Don't update lineNo for decodeLoc - only for decode (otherwise they get double-counted). Even still, because of the way the GATK currently utilizes Tribble we can parse the same line multiple times, which knocks the line counter out of sync. For now, I've added a TODO in the code to remind us and the error messages note that it's an approximate line number.	2011-12-14 10:43:52 -05:00
Eric Banks	d3f4a5a901	Fail gracefully when encountering malformed VCFs without enough data columns	2011-12-14 10:37:38 -05:00
Eric Banks	079932ba2a	The log10cache needs to be larger if we want to handle 10K samples in the UG.	2011-12-13 23:36:10 -05:00
Ryan Poplin	7fa1ab1bae	Fix to allow haplotype caller to call indels after UG engine entry points were unified. Adding Haplotype Caller integration test	2011-12-13 17:19:40 -05:00
Eric Banks	e47a113c9f	Enabled multi-allelic SNP discovery in the UG. Needs loads of testing so do not use yet. While working in the UG engine, I removed the extraneous and unnecessary MultiallelicGenotypeLikelihoods class: now a VariantContext with PL-annotated Genotypes is passed around instead. Integration tests pass so it must all work, right?	2011-12-12 23:02:45 -05:00
Mauricio Carneiro	5cc1e72fdb	Parallelized SelectVariants * can now use -nt with SelectVariants for significant speedup in large files * added parallelization integration tests for SelectVariants	2011-12-12 18:41:14 -05:00
Mauricio Carneiro	a70a0f25fb	Better debug output for SAMDataSource. Output the name and number of the files being loaded by the GATK instead of "coordinate sorted".	2011-12-12 17:57:29 -05:00
Mark DePristo	d03425df2f	TODO optimization targets	2011-12-12 17:39:51 -05:00
Laurent Francioli	025bdfe2cc	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-12 12:19:44 +01:00
Eric Banks	7b6338c742	Merge branch 'master' into trialleles	2011-12-11 00:28:46 -05:00
Eric Banks	7c4b9338ad	The old bi-allelic implementation of the Exact model has been completely deprecated - you can only use the multi-allelic implementation now.	2011-12-11 00:23:33 -05:00
Eric Banks	044f211a30	Don't collapse likelihoods over all alt alleles - that's just not right. For now, the QUAL is calculated for just the most likely of the alt alleles; I need to think about the right way to handle this properly.	2011-12-10 23:57:14 -05:00
Eric Banks	364f1a030b	Plumbing added so that the UG engine can handle multiple alleles and they can successfully be genotyped. Alleles that aren't likely are not allowed to be used when assigning genotypes, but otherwise the greedy PL-based approach is what is used. Moved assign genotypes code to UG engine since it has nothing to do with the Exact model. Still have some TODOs in here before I can push this out to everyone.	2011-12-09 14:25:28 -05:00
Eric Banks	64dad13e2d	Don't carry around an extra copy of the code for the Haplotype Caller	2011-12-09 11:09:40 -05:00
Eric Banks	442ceb6ad9	The Exact model now computes both the likelihoods and posteriors (in separate arrays); likelihoods are used for assigning genotypes, not the posteriors.	2011-12-09 10:16:44 -05:00
Laurent Francioli	a79144f7db	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-09 15:57:24 +01:00
Laurent Francioli	5a06170804	Corrected bug causing getChildrenWithParents() to not take the last family member into consideration.	2011-12-09 14:51:34 +01:00
Eric Banks	aa4a8c5303	No dynamic programming solution for assigning genotypes; just done greedily now. Fixed QualByDepth to skip no-call genotypes. No-calls are no longer given annotations (attributes).	2011-12-09 02:25:06 -05:00
Eric Banks	8777288a9f	Don't throw a UserException if too many alt alleles are trying to be genotyped. Instead, I've added an argument that allows the user to set the max number of alt alleles to genotype and the UG warns and skips any sites with more than that number.	2011-12-09 00:00:20 -05:00
Eric Banks	3e7714629f	Scrapped the whole idea of an int/long as an index into the ACset: with lots of alternate alleles we run into overflow issues. Instead, simply use the ACcounts array as the hash key since it is unique for each AC conformation. To do this, it needed to be wrapped inside an object so hashcode() would work.	2011-12-08 23:50:54 -05:00
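The wrapping step in the commit above is needed because a Java array inherits identity-based hashCode() and equals() from Object, so two arrays with the same contents do not match as hash keys. The Python sketch below shows the same idea: a small wrapper that hashes and compares by content. The class name `AlleleCountKey` is an assumption for illustration and is not taken from the GATK source.

```python
class AlleleCountKey:
    """Hashable wrapper around an array of per-allele counts."""

    def __init__(self, counts):
        # Store an immutable copy so the key cannot change while it is in a map.
        self._counts = tuple(counts)

    def __eq__(self, other):
        return isinstance(other, AlleleCountKey) and self._counts == other._counts

    def __hash__(self):
        # Hash on the contents, so equal conformations land in the same bucket.
        return hash(self._counts)

    def __repr__(self):
        return "AlleleCountKey(%r)" % (self._counts,)


# Each distinct AC conformation maps to its own entry, with no integer index
# that could overflow when there are many alternate alleles.
ac_sets = {AlleleCountKey([1, 0, 2]): "conformation A"}
found = ac_sets[AlleleCountKey([1, 0, 2])]
```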
Eric Banks	4aebe99445	Need to use longs for the set index (because we can run out of ints when there are too many alternate alleles). Integration tests now use the multiallelic implementation.	2011-12-08 15:31:02 -05:00
Eric Banks	7750bafb12	Fixed bug where last dependent set index wasn't properly being transferred for sites with many alleles. Adding debugging output.	2011-12-08 13:50:50 -05:00
Guillermo del Angel	252e0f3d0a	Merged bug fix from Stable into Unstable	2011-12-08 13:11:39 -05:00
Guillermo del Angel	1bfe28067f	Don't try to genotype an indel even bigger than the reference window size, or else we'll be out of bounds. Necessary to handle Phase 1 integrated callset with large deletions. Better error indication when validating a GenomeLoc.	2011-12-08 12:54:08 -05:00
Mark DePristo	9def841275	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 13:36:16 -05:00
Mark DePristo	4055877708	Prints 0.0 TiTv not NaN when there are no variants -- Updated md5	2011-12-07 12:07:54 -05:00
Matt Hanna	15533e08df	Fixed issue with RODWalker parallelization. Turns out that someone previously upped the declared size of a ROD shard to 100M bases, making each ROD shard larger than the size of chr20. Why didn't we see this in Stable? Because the ShardStrategy/ShardStrategyFactory mechanism was dutifully ignoring the shard size specification. When I rolled the ShardStrategy/ShardStrategyFactory mechanics back into the DataSources as part of the async I/O project, I inadvertently reenabled this specifier.	2011-12-07 11:55:42 -05:00
Mark DePristo	5d2212bc8e	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-07 09:03:17 -05:00
Mark DePristo	6bf18899df	Fix for variant summary -- now treats all 50 bp deletions or insertions as CNVs	2011-12-07 09:02:49 -05:00
Matt Hanna	c9b2cd8ba5	Fix for chartl's stale null representation issue.	2011-12-06 18:05:17 -05:00
Eric Banks	79d18dc078	Fixing indexing bug on the ACsets. Added unit tests for the Exact model code.	2011-12-06 16:17:18 -05:00
Matt Hanna	f5b977fc88	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-06 10:11:35 -05:00
Matt Hanna	4001c22a11	Better file count / buffering variation in test suite. Parameterized read shard buffering. Misc cleanup.	2011-12-06 10:10:38 -05:00
Khalid Shakir	677bea0abd	Right aligning GATKReport numeric columns and updated MD5s in tests. PreQC parses file with spaces in sample names by using tabs only. PostQC allows passing the file names for the evals so that flanks can be evaled. BaseTest's network temp dir now adds the user name to the path so files aren't created in the root. HybridSelectionPipeline: - Updated to latest versions of reference data. - Refactored Picard parsing code replacing YAML.	2011-12-05 23:22:15 -05:00
Eric Banks	7a0f6feda4	Make sure that too many alternate alleles aren't being passed to the genotyper (10 for now) and exit with a UserError if there are.	2011-12-05 16:18:52 -05:00
Eric Banks	7fac4afab3	Fixed priors (now initialized upon engine startup in a multi-dimensional array) and cell coefficients (properly handles the generalized closed form representation for multiple alleles).	2011-12-05 15:57:25 -05:00
Eric Banks	a7cb941417	The posteriors vector is now 2 dimensional so that it supports multiple alleles (although the UG is still hard-coded to use only array[0] for now); the exact model now collapses probabilities for all conformations over a given AC into the posteriors array (in the appropriate dimension). Fixed a bug where the priors and posteriors were being passed in swapped.	2011-12-04 13:02:53 -05:00
Eric Banks	eab2b76c9b	Added loads of comments for future reference	2011-12-03 23:54:42 -05:00
Eric Banks	29662be3d7	Fixed bug where k=2N case wasn't properly being computed. Added optimization for BB genotype case not in old model. At this point, integration tests pass except for 1 case where QUALs differ by 0.01 (this is okay because I occasionally need to compute extra cells in the matrix which affects the approximations) and 2 cases where multi-allelic indels are being genotyped (some work still needs to be done to support them).	2011-12-03 23:12:04 -05:00
Eric Banks	71f793b71b	First partially working version of the multi-allelic version of the Exact AF calculation	2011-12-02 14:13:14 -05:00
David Roazen	d014c7faf9	Queue now properly escapes all shell arguments in generated shell scripts. This has implications for both Qscript authors and CommandLineFunction authors. Qscript authors: You no longer need to (and in fact must not) manually escape String values to avoid interpretation by the shell when setting up Walker parameters. Queue will safely escape all of your Strings for you so that they'll be interpreted literally. E.g., Old way: filterSNPs.filterExpression = List("\"QD<2.0\"", "\"MQ<40.0\"", "\"HaplotypeScore>13.0\"") New way: filterSNPs.filterExpression = List("QD<2.0", "MQ<40.0", "HaplotypeScore>13.0") CommandLineFunction authors: If you're writing a one-off CommandLineFunction in a Qscript and don't really care about quoting issues, just keep doing things the direct, simple way: def commandLine = "cat %s | grep -v \"#\" > %s".format(files, out) If you're writing a CommandLineFunction that will become part of Queue and will be used by other QScripts, however, it's advisable to do things the newer, safer way, i.e.: When you construct your commandLine, you should do so ONLY using the API methods required(), optional(), conditional(), and repeat(). These will manage quoting and whitespace separation for you, so you shouldn't insert quotes/extraneous whitespace in your Strings. By default you get both (quoting and whitespace separation), but you can disable either of these via parameters. E.g., override def commandLine = super.commandLine + required("eff") + conditional(verbose, "-v") + optional("-c", config) + required("-i", "vcf") + required("-o", "vcf") + required(genomeVersion) + required(inVcf) + required(">", escape=false) /* This will be shell-interpreted */ + required(outVcf) I've ported the Picard/Samtools/SnpEff CommandLineFunction classes to the new system, so you'll get free shell escaping when you use those in Qscripts just like with walkers.	2011-12-01 18:13:44 -05:00
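The effect of automatic escaping described in the commit above can be illustrated outside Queue. The Python sketch below uses the standard library's `shlex.quote` as a stand-in; it is an analogy only and says nothing about how Queue implements escaping. The helper name `build_command` is hypothetical.

```python
import shlex


def build_command(program, args):
    """Join a program and its arguments, quoting each argument for the shell."""
    return " ".join([program] + [shlex.quote(arg) for arg in args])


# The filter expressions contain < and >, which the shell would otherwise
# treat as redirections. The caller passes them unquoted, as in the "new way".
command = build_command("echo", ["QD<2.0", "MQ<40.0", "HaplotypeScore>13.0"])

# Splitting the command the way a POSIX shell would recovers the literal values.
round_trip = shlex.split(command)
```

The caller never adds quotes by hand; quoting an already-quoted string would make the quote characters part of the value, which is why the commit says manual escaping must stop.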
Mark DePristo	3060a4a15e	Support for list of known CNVs in VariantEval -- VariantSummary now includes novelty of CNVs by reciprocal overlap detection using the standard variant eval -knownCNVs argument -- Genericizes loading for intervals into interval tree by chromosome -- GenomeLoc methods for reciprocal overlap detection, with unit tests	2011-11-30 17:05:16 -05:00
Matt Hanna	b65db6a854	First draft of a test script for I/O performance with the new asynchronous I/O processing. Also includes convenience parameters for specifying the IO/CPU threading balance outside of a tag. Will be killed when Queue gets better support for tagged arguments (hopefully soon).	2011-11-30 13:13:16 -05:00
Laurent Francioli	1d5d200790	Cleaned up unused import statements	2011-11-30 15:30:30 +01:00
Mark DePristo	28b286ad39	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-30 09:11:53 -05:00
Laurent Francioli	20bffe0430	Adapted for the new version of MendelianViolation	2011-11-30 14:46:38 +01:00
Laurent Francioli	1cb5e9e149	Removed outdated (and unused) -familyStr commandline argument	2011-11-30 14:45:04 +01:00
Laurent Francioli	f49dc5c067	Added functionality to get all children that have both parents (useful when trios are needed)	2011-11-30 14:43:37 +01:00
Laurent Francioli	a4606f9cfe	Merge branch 'MendelianViolation' Conflicts: public/java/src/org/broadinstitute/sting/utils/MendelianViolation.java	2011-11-30 11:13:15 +01:00
Laurent Francioli	b279ae4ead	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-30 10:10:21 +01:00
Ryan Poplin	91413cf0d9	Merged bug fix from Stable into Unstable	2011-11-29 14:01:23 -05:00
Ryan Poplin	cb284eebde	Further updating VQSR tutorial wiki docs to reflect the bundle	2011-11-29 14:00:57 -05:00
Ryan Poplin	dcb889665d	Merged bug fix from Stable into Unstable	2011-11-29 09:58:49 -05:00
Ryan Poplin	447e9bff9e	Updating VQSR tutorial wiki docs to reflect the bundle	2011-11-29 09:57:45 -05:00
Ryan Poplin	110298322c	Adding Transmission Disequilibrium Test annotation to VariantAnnotator and integration test to test it.	2011-11-29 09:29:18 -05:00
Laurent Francioli	ab67011791	Corrected bug introduced in the last update and causing no families to be returned by getFamilies in case the samples were not specified	2011-11-29 11:18:15 +01:00
Eric Banks	d7d8b8e380	Tribble v42 changes the Codec.canDecode method to take in a String instead of a File; this is something that Jim was adamant about (because Tribble can handle streams other than files). I didn't want the next person who needed to rev Tribble to deal with this change additionally, so I took care of updating the GATK now.	2011-11-28 14:18:28 -05:00
Laurent Francioli	a09c01fcec	Removed walker argument FamilyStructure as this is now supported by the engine (ped file)	2011-11-28 17:18:11 +01:00
Laurent Francioli	795c99d693	Adapted MendelianViolation to the new ped family representation. Adapted all classes using MendelianViolation too. Added a number of useful metrics on allele transmission and MVs to MendelianViolationEvaluator.	2011-11-28 17:13:14 +01:00
Laurent Francioli	e877db8f42	Changed visibility of getSampleDB from protected to public as the sampleDB needs to be accessible from Annotators and Evaluators too.	2011-11-28 17:11:30 +01:00
Laurent Francioli	5c2595701c	Added a function to get families only for a given list of samples.	2011-11-28 17:10:33 +01:00
Mark DePristo	3c36428a20	Bug fix for TiTv calculation -- shouldn't be rounding	2011-11-28 10:20:34 -05:00
Eric Banks	436b4dc855	Updated docs	2011-11-28 08:59:48 -05:00
Laurent Francioli	b1dd632d5d	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java	2011-11-25 16:16:44 +01:00
Mark DePristo	e319079c32	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-23 13:02:11 -05:00
Mark DePristo	4107636144	VariantEval updates -- Performance optimizations -- Tables now are cleanly formatted (floats are %.2f printed) -- VariantSummary is a standard report now -- Removed CompEvalGenotypes (it didn't do anything) -- Deleted unused classes in GenotypeConcordance -- Updates integration tests as appropriate	2011-11-23 13:02:07 -05:00
David Roazen	e5b85f0a78	A toString() method for IntervalBindings Necessary since we're currently writing things like this to our VCF headers: intervals=[org.broadinstitute.sting.commandline.IntervalBinding@4ce66f56]	2011-11-23 11:56:12 -05:00
Mark DePristo	5a4856b82e	GATKReports now support a format field per column -- You can tell the table to format your object with "%.2f" for example.	2011-11-23 11:31:04 -05:00
Mark DePristo	c8bf7d2099	Check for null comment	2011-11-23 10:47:21 -05:00
Mark DePristo	6c2555885c	Caching getSimpleName() in VariantEval is a big performance improvement -- Removed the SimpleMetricsByAC table, as one should just use the AlleleCount Stratification and the upcoming VariantSummary table	2011-11-23 08:34:05 -05:00
Guillermo del Angel	32adbd614f	Solve merge conflict	2011-11-22 22:48:46 -05:00
Guillermo del Angel	941f3784dc	Solve merge conflict	2011-11-22 22:48:03 -05:00
Guillermo del Angel	75d93e6335	Another corner condition fix: skip likelihood computation in case we cut so many bases there's no haplotype or read left	2011-11-22 22:46:12 -05:00
Mark DePristo	a3aef8fa53	Final performance optimization for GenotypesContext	2011-11-22 17:19:30 -05:00
Mark DePristo	990c02e4de	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-22 17:19:11 -05:00
Guillermo del Angel	38a90da92c	Fixed merge conflict to Unstable	2011-11-22 14:39:45 -05:00
Guillermo del Angel	32a77a8a56	Prevent out of bound error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle.	2011-11-22 13:57:24 -05:00
Eric Banks	5821c11fad	For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead.	2011-11-22 10:50:22 -05:00
Mark DePristo	7087310373	Embarrassing bug fixed	2011-11-22 10:16:36 -05:00
Mark DePristo	e484625594	GenotypesContext now updates cached data for add, set, replace operations when possible -- Involved separately managing the sample -> offset and sample sorted list operations. This should improve performance throughout the system	2011-11-22 08:40:48 -05:00
Mark DePristo	2b51c01df4	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-21 19:16:06 -05:00
Mark DePristo	5443d3634a	Again, fixing the add call when we really mean replace -- Updating MD5s for UG to reflect that what was previously called ./.:.:10:0,0,0 is now just ./. Eric will fix long-standing bug in QD observed from this change -- VFW MD5s restored to their old correct values. There was a bug in my implementation that caused the genotypes to not be parsed from the lazy output even though the header was incorrect.	2011-11-21 19:15:56 -05:00
Mauricio Carneiro	5ad3dfcd62	BugFix: byte overflow in SyntheticRead compressed base counts * fixed and added unit test	2011-11-21 17:11:50 -05:00
Mark DePristo	9ea7b70a02	Added decode method to LazyGenotypesContext -- AbstractVCFCodec calls this if the samples are not sorted. Previously called getGenotypes() which didn't actually trigger the decode	2011-11-21 16:21:23 -05:00
Mark DePristo	ab2efe3bd3	Reverting bad exact model changes	2011-11-21 16:14:40 -05:00
Eric Banks	44554b2bfd	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-21 15:01:45 -05:00
Eric Banks	022832bd74	Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker.	2011-11-21 14:49:47 -05:00
Mark DePristo	1561af22af	Exact model code cleanup -- Fixed up code when fixing a bug detected by aggressive contracts in GenotypesContext.	2011-11-21 14:35:15 -05:00
Mark DePristo	2c501364b8	GenotypesContext no longer have immutability in constructor -- additional bug fixes throughout VariantContext and GenotypesContext objects	2011-11-21 14:34:31 -05:00
David Roazen	1296dd41be	Removing the legacy -L "interval1;interval2" syntax This syntax predates the ability to have multiple -L arguments, is inconsistent with the syntax of all other GATK arguments, requires quoting to avoid interpretation by the shell, and was causing problems in Queue. A UserException is now thrown if someone tries to use this syntax.	2011-11-21 13:18:53 -05:00
Mark DePristo	e467b8e1ae	More contracts on LazyGenotypesContext	2011-11-21 09:34:57 -05:00
Mark DePristo	2e9ecf639e	Generalized interface to LazyGenotypesContext -- Now you provide a LazyParsing object -- LazyGenotypesContext now knows nothing about the VCF parser itself. The parser holds all of the necessary data to parse the VCF genotypes when necessary, and the LGC only has a pointer to this object -- Using new interface added LazyGenotypesContext to unit tests with a simple lazy version -- Deleted VCFParser interface, as it was no longer necessary	2011-11-21 09:30:40 -05:00
Mark DePristo	bc44f6fd9e	Utility function Collection<Genotype> -> Collection<String>	2011-11-20 18:26:56 -05:00
Mark DePristo	9445326c6c	Genotype is Comparable via sampleName	2011-11-20 18:26:27 -05:00
Mark DePristo	f9e25081ab	Completed documented LazyGenotypesContext	2011-11-20 08:35:52 -05:00
Mark DePristo	9cb3fe3a59	Vastly better way of doing on-demand genotype loading -- With our GenotypesContext class we can naturally create a LazyGenotypesContext subclass that does the on-demand loading. -- This new class replaced all of the old, complex functionality -- Better still, there were many cases where the genotypes were being loaded unnecessarily, resulting in inefficiency. This was detected because some of the integration tests changed as the genotypes were no longer being parsed unnecessarily -- Misc. bug fixes throughout the system -- Bug fixes for PhaseByTransmission with new GenotypesContext	2011-11-20 08:23:09 -05:00
Mark DePristo	f392d330c3	Proper use of builder. Previous conversion attempt was flawed	2011-11-19 22:09:56 -05:00
Mark DePristo	7d09c0064b	Bug fixes and code cleanup throughout -- chromosomeCounts now takes builder as well, cleaning up a lot of code throughout the codebase.	2011-11-19 18:40:15 -05:00
Mark DePristo	8f7eebbaaf	Bugfix for pError not being checked correctly in CommonInfo -- UnitTests to ensure correct behavior -- UnitTests to ensure correct behavior for pass filters vs. failed filters vs. unfiltered	2011-11-19 15:58:59 -05:00
Mark DePristo	b7b57ef39a	Updating MD5 to reflect canonical ordering of calculation -- We should no longer have md5s changing because of hashmaps changing their sort order on us -- Added GenotypeLikelihoodsUnitTests -- Refactored ExactAFCalculation to put the PL -> QUAL calculation in the GenotypeLikelihoods class to avoid the code copy.	2011-11-19 15:57:33 -05:00
Mark DePristo	73119c8e3c	Merge with master -- A few bug fixes	2011-11-19 09:56:06 -05:00
Mark DePristo	f685fff79b	Killing the final versions of old new VariantContext interface	2011-11-18 21:32:43 -05:00
Mark DePristo	6cf315e17b	Change interface to getNegLog10PError to getLog10PError	2011-11-18 21:07:30 -05:00
Mark DePristo	c7f2d5c7c7	Final minor fix to contract	2011-11-18 19:40:05 -05:00
Mauricio Carneiro	b5de182014	isEmpty now checks if mReadBases is null Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.	2011-11-18 18:34:05 -05:00
Mauricio Carneiro	8ab3ee9c65	Merge remote-tracking branch 'unstable/master' into rr	2011-11-18 16:50:25 -05:00
Mauricio Carneiro	333e5de812	returning read instead of GATKSAMRecord Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.	2011-11-18 16:49:59 -05:00
Matt Hanna	8bb4d4dca3	First pass of the asynchronous block loader. Block loads are only triggered on queue empty at this point. Disabled by default (enable with nt:io=?).	2011-11-18 15:02:59 -05:00
Mark DePristo	a2e79fbe8a	Fixes to contracts	2011-11-18 14:18:53 -05:00
Mark DePristo	660d6009a2	Documentation and contracts for GenotypesContext and VariantContextBuilder	2011-11-18 13:59:30 -05:00
Mark DePristo	f54afc19b4	VariantContextBuilder -- New approach to making VariantContexts modeled on StringBuilder -- No more modify routines -- use VariantContextBuilder -- Renamed isPolymorphic to isPolymorphicInSamples. Same for mono -- getChromosomeCount -> getCalledChrCount -- Walkers changed to use new VariantContext. Some deprecated new VariantContext calls remain -- VCFCodec now uses optimized cached information to create GenotypesContext.	2011-11-18 12:39:10 -05:00
Eric Banks	6459784351	Merged bug fix from Stable into Unstable	2011-11-18 12:34:57 -05:00
Eric Banks	c62082ba1b	Making this class public again as per request from Cancer folks	2011-11-18 12:34:27 -05:00
Eric Banks	8710673a97	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-18 12:29:33 -05:00
Eric Banks	768b27322b	I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed.	2011-11-18 12:29:15 -05:00
Mark DePristo	7490dbb6eb	First version of VariantContextBuilder	2011-11-18 11:06:15 -05:00
Roger Zurawicki	f48d4cfa79	Bug fix: fully clipping GATKSAMRecords and flushing ops Reads that are emptied after clipping become new GATKSAMRecords. When applying ClippingOps, the ops are cleared after the clipping	2011-11-18 00:24:39 -05:00
Mark DePristo	fa454c88bb	UnitTests for VariantContext for chrCount, getSampleNames, Order function -- Major change to how chromosomeCounts is computed. Now NO_CALL alleles are always excluded. So ChromosomeCounts(A/.) is 1, the previous result would have been 2. -- Naming changes for getSamplesNameInOrder()	2011-11-17 20:37:22 -05:00
Mark DePristo	23359d1c6c	Bugfix for pruneVariantContext, which was dropping the ref base for padding	2011-11-17 15:32:52 -05:00
Mark DePristo	473b860312	Major determinism fix for UG and RankSumTest -- Now these routines all iterate in sample name order (genotypes.iterateInSampleNameOrder) so that the results of UG and the annotator do not depend on the particular order of samples we see for the exact model and the RankSumTest	2011-11-17 15:31:45 -05:00
Khalid Shakir	c50274e02e	During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice. Moved flanking interval utilities to IntervalUtils with UnitTests.	2011-11-17 13:56:42 -05:00
Eric Banks	16a021992b	Updated header description for the INFO and FORMAT DP fields to be more accurate.	2011-11-17 13:17:53 -05:00
Eric Banks	e7d41d8d33	Minor cleanup	2011-11-17 12:00:28 -05:00
Mark DePristo	7e66677769	Expanded UnitTests for VariantContext Tests for -- getGenotype and getGenotypes -- subContextBySample -- modify routines	2011-11-16 20:45:15 -05:00
Mark DePristo	aa0610ea92	GenotypeCollection renamed to GenotypesContext	2011-11-16 16:24:05 -05:00
Mark DePristo	caf6080402	Better algorithm for merging genotypes in CombineVariants	2011-11-16 15:17:33 -05:00
Mark DePristo	e56d52006a	Continuing bugfixes to get new VC working	2011-11-16 10:39:17 -05:00
Matt Hanna	eb8e031f75	Merged bug fix from Stable into Unstable	2011-11-16 09:57:37 -05:00
Matt Hanna	6a5d5e7ac9	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable	2011-11-16 09:57:13 -05:00
Matt Hanna	7ac5cf8430	Getting rid of unsupported CountReadPairs walker in stable. Removal of remainder of pairs processing framework to follow in unstable.	2011-11-16 09:53:59 -05:00
Eric Banks	c2ebe58712	Merge remote-tracking branch 'Laurent/master'	2011-11-16 09:34:47 -05:00
Laurent Francioli	0dc3d20d58	Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type	2011-11-16 09:33:13 +01:00
Laurent Francioli	7d77fc51f5	Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type	2011-11-16 03:32:43 -05:00
David Roazen	0d163e3f52	SnpEff 2.0.4 support -Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format -Assigning functional classes and effect impacts now handled directly by SnpEff rather than the GATK -Removed support for SnpEff 2.0.2, as we no longer trust the output of that version since it doesn't exclude effects associated with certain nonsensical transcripts. These effects are excluded as of 2.0.4. -Updated unit and integration tests This support is based on a release-candidate of SnpEff 2.0.4, and so is subject to change between now and the next GATK release.	2011-11-15 18:36:22 -05:00
Mark DePristo	df415da4ab	More bug fixes on the way to passing all tests	2011-11-15 17:38:12 -05:00
Mark DePristo	0be23aae4e	Bugfixes on way to a working refactored VariantContext	2011-11-15 17:20:14 -05:00
Mark DePristo	231c47c039	Bugfixes on way to a working refactored VariantContext	2011-11-15 16:42:50 -05:00
Laurent Francioli	fb685f88ec	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-15 16:23:53 -05:00
Mark DePristo	2b2514dad2	Moved many unused phasing walkers and utilities to archive	2011-11-15 16:14:50 -05:00
Mark DePristo	460a51f473	ID field now stored in the VariantContext itself, not the attributes	2011-11-15 14:56:33 -05:00
Eric Banks	b45d10e6f1	The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads.	2011-11-15 10:23:59 -05:00
Mark DePristo	233e581828	Merging in Master	2011-11-15 09:28:24 -05:00
Eric Banks	b66556f4a0	Update error message so that it's clear ReadPair Walkers are exceptions	2011-11-15 09:22:57 -05:00
Mark DePristo	6e1a86bc3e	Bug fixes to VariantContext and GenotypeCollection	2011-11-15 09:21:30 -05:00
Mauricio Carneiro	cde829899d	compress Reduce Read counts bytes by offset compressed the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size. Example compression --> from : 10, 10, 11, 11, 12, 12, 12, 11, 10 to: 10, 0, 1, 1,2, 2, 2, 1, 0	2011-11-14 18:30:24 -05:00
Mark DePristo	f0234ab67f	GenotypeMap -> GenotypeCollection part 2 -- Code actually builds	2011-11-14 17:42:55 -05:00
David Roazen	ab0ee9b847	Perform only necessary validation in VariantContext modify methods	2011-11-14 16:49:59 -05:00
Mark DePristo	2e9d5363e7	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-14 15:32:06 -05:00
Mark DePristo	1fbdcb4f43	GenotypeMap -> GenotypeCollection	2011-11-14 15:32:03 -05:00
Eric Banks	4dc9dbe890	One quick fix to previous commit	2011-11-14 14:42:12 -05:00
Eric Banks	7b2a7cfbe7	Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes.	2011-11-14 14:31:27 -05:00
Mark DePristo	9b5c79b49d	Renamed InferredGeneticContext to CommonInfo -- I have no idea why I named this InferredGeneticContext, a totally meaningless term -- Renamed to CommonInfo. -- Made package protected, as no one should use this outside of VariantContext and Genotype -- UGEngine was using IGC constant, but it's now using the public one in VariantContext.	2011-11-14 14:28:52 -05:00
Mark DePristo	077397cb4b	Deleted MutableVariantContext -- All methods that used this capability now use VariantContext directly instead	2011-11-14 14:19:06 -05:00
Mark DePristo	b11c535527	Deleted MutableGenotype -- This class wasn't really used anywhere, and so removed to control code bloat.	2011-11-14 13:16:36 -05:00
Mark DePristo	79987d685c	GenotypeMap contains a Map, not extends it -- On path to replacing it with GenotypeCollection	2011-11-14 12:55:03 -05:00
Eric Banks	7aee80cd3b	Fix to deal with reduced reads containing a deletion	2011-11-14 12:23:46 -05:00
Eric Banks	3d2970453b	Misc minor cleanup	2011-11-14 09:41:54 -05:00
Laurent Francioli	1347beef40	Merge branch 'PhaseByTransmission'	2011-11-14 11:31:28 +01:00
Eric Banks	b7c33116af	Minor docs update	2011-11-12 23:21:07 -05:00
Eric Banks	76d357be40	Updating docs example to use -L since that's best practice	2011-11-12 23:20:05 -05:00
Mark DePristo	fee9b367e4	VariantContext genotypes are now stored as GenotypeMap objects -- Enables further sophisticated optimizations, as this class can be smarter about storing the data and will directly support operations like subset to samples -- All instances in the gatk that used Map<String, Genotype> now use GenotypeMap type. -- Amazingly, there were many places where HashMap<String, Genotype> is used, so that the order of the genotypes is technically undefined and could be dangerous. Now everything uses GenotypeMap with a specific ordering of samples (by name) -- Integration tests updated and all pass	2011-11-11 15:00:35 -05:00
Guillermo del Angel	cd3146f4cf	Add hidden option to ValidationAmplicons to output slightly modified format to make file work with downstream SQNM tools more seamlessly at request of GAP: one line per record, keep probe identifier to 20 characters, no * in ref allele.	2011-11-11 14:07:07 -05:00
Ryan Poplin	40fbeafa37	VQSR will now detect if the negative model failed to converge properly because of having too few data points and automatically retry with more appropriate clustering parameters.	2011-11-11 11:52:30 -05:00
Mark DePristo	ef9f8b5d46	Added subContextOfSamples to VariantContext -- This is a more convenient accessor than subContextOfGenotypes, represents nearly all of the use cases of the former function, and potentially can be implemented more efficiently.	2011-11-11 10:07:11 -05:00
Mark DePristo	ee40791776	Attributes are now Map<String,Object> not Map<String,?> -- Allows us to avoid an unnecessary copy when creating InferredGeneticContext (whose name really needs to change).	2011-11-11 09:55:42 -05:00
Mark DePristo	dc9b351b5e	Meaningful error message when an IntervalArg file fails to parse correctly	2011-11-10 17:10:26 -05:00
Mark DePristo	bb7bf74aa8	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-10 16:05:43 -05:00
Mauricio Carneiro	060c7ce8ae	It wouldn't harm integrationtests if we had our logic right... :-)	2011-11-10 14:03:22 -05:00
Eric Banks	39678b6a20	Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests.	2011-11-10 13:34:45 -05:00
Mark DePristo	dd1810140f	-stratIntervals is optional	2011-11-10 13:27:32 -05:00
Mark DePristo	67b022c34b	Cleanup for new SampleUtils function -- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase	2011-11-10 13:27:13 -05:00
Mark DePristo	35fe9c8a06	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-10 11:11:33 -05:00
Mark DePristo	dc4932f93d	VariantEval module to stratify the variants by whether they overlap an interval set The primary use of this stratification is to provide a mechanism to divide assessment of a call set up by whether a variant overlaps an interval or not. I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like: -T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. In order to safely use this module you should provide entire contigs worth of variants, and let the interval strat decide overlap, as opposed to using -L which will not properly work with symbolic variants. Minor improvements to create() interval in GenomeLocParser.	2011-11-10 10:58:40 -05:00
Mauricio Carneiro	0d8983feee	outputting the RG information setReadGroup now sets the read group attribute for the GATKSAMRecord	2011-11-09 23:35:00 -05:00
Eric Banks	315ac68b0b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-09 22:37:36 -05:00
Eric Banks	6313aae2c4	Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread	2011-11-09 22:37:26 -05:00
Ryan Poplin	74a18d3de8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-09 22:29:40 -05:00
Ryan Poplin	24712c0221	Merged bug fix from Stable into Unstable	2011-11-09 22:28:27 -05:00
Ryan Poplin	8942406aa2	Use MathUtils to compare doubles instead of testing for equality	2011-11-09 22:05:21 -05:00
Ryan Poplin	348f2db7fd	Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0.	2011-11-09 22:00:52 -05:00
Eric Banks	82bf09edf3	Mark Standard Annotations with an asterisk	2011-11-09 20:42:31 -05:00
Eric Banks	04b122be29	Fix for bug reported on GetSatisfaction	2011-11-09 20:33:36 -05:00
Mauricio Carneiro	d00b2c6599	Adding a synthetic read for filtered data * Generalized the concept of a synthetic read to create both a running consensus and synthetic reads of filtered data. * Synthetic reads can now have deletions (but not insertions) * New reduced read tag for filtered data synthetic reads (RF) * Sliding window header now keeps information of consensus and filtered data * Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads	2011-11-09 20:16:22 -05:00
Eric Banks	21bf43f3bb	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-09 15:34:40 -05:00
Christopher Hartl	85bffe1dca	Merged bug fix from Stable into Unstable	2011-11-09 15:29:14 -05:00
Christopher Hartl	d828eba7f4	Allow comments in a table-formatted file to precede the header line.	2011-11-09 15:27:38 -05:00
Eric Banks	8205efbb29	Merge branch 'master' into intervals	2011-11-09 15:27:15 -05:00
Eric Banks	d64f8a89a9	Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.	2011-11-09 15:24:29 -05:00
Mauricio Carneiro	f080f64f99	Preserve RG information on new GATKSAMRecord from SAMRecord	2011-11-09 14:39:20 -05:00
Mauricio Carneiro	f9530e0768	Clean unnecessary attributes from the read this gives on average 40% file size reduction.	2011-11-09 14:39:20 -05:00
Mauricio Carneiro	9427ada498	Fixing no cigar bug empty GATKSAMRecords will have a null cigar. Treat them accordingly.	2011-11-09 14:39:20 -05:00
Mark DePristo	e639f0798e	mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set -- A bit of code cleanup in VCFUtils -- VariantEval table to create 1000G Phase I variant summary table -- First version of 1000G Phase I summary table Qscript	2011-11-09 14:35:50 -05:00
Christopher Hartl	149b79eaad	Merged bug fix from Stable into Unstable	2011-11-09 11:26:30 -05:00
Christopher Hartl	11abb4f9d1	Better error message.	2011-11-09 11:25:28 -05:00
Christopher Hartl	d3a533b82e	Revert "a" This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.	2011-11-09 11:22:26 -05:00
Christopher Hartl	5eaf800281	a	2011-11-09 11:22:20 -05:00
Christopher Hartl	5451fbc2b2	Merged bug fix from Stable into Unstable	2011-11-09 11:06:15 -05:00
Christopher Hartl	091229e4db	MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods. Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten. Table feature now has a toString method. These are minor bug fixes.	2011-11-09 11:03:29 -05:00
Mauricio Carneiro	e1b4c3968f	Fixing GATKSAMRecord bug when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.	2011-11-08 16:50:36 -05:00
Ryan Poplin	e973ca2010	fixing merge conflict.	2011-11-08 14:55:05 -05:00
Ryan Poplin	b0e6afec48	Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy.	2011-11-08 14:51:25 -05:00
Laurent Francioli	571c724cfd	Added reporting of the number of genotypes updated.	2011-11-08 15:15:51 +01:00
Ryan Poplin	94dc447a70	Merged bug fix from Stable into Unstable	2011-11-07 15:26:35 -05:00
Ryan Poplin	0b181be61f	Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.	2011-11-07 15:25:16 -05:00
Ryan Poplin	0534149708	Merged bug fix from Stable into Unstable	2011-11-07 14:07:08 -05:00
Ryan Poplin	2d1e385ca4	Adding note to VQSR docs about Rscript being needed in the environment PATH.	2011-11-07 14:04:13 -05:00
Eric Banks	759f4fe6b8	Moving unclaimed walker with bad integration test to archive	2011-11-07 13:16:38 -05:00
Eric Banks	c1986b6335	Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated.	2011-11-07 11:06:19 -05:00
Eric Banks	724e3f3b0d	Merged bug fix from Stable into Unstable	2011-11-06 22:23:22 -05:00
Eric Banks	cdd40d1222	Removing contracts for the SimpleTimer	2011-11-06 22:22:49 -05:00
Ryan Poplin	5c565d28b9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-06 10:26:19 -05:00
Eric Banks	1c4e429a1c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-06 00:05:56 -04:00
Eric Banks	a12bc63e5c	Get rid of support for bams without sample information in the read groups. This hidden option wasn't being used anyways because it wasn't hooked up properly in the AlignmentContext.	2011-11-05 23:54:28 -04:00
Eric Banks	90a053ea93	Don't change the mapping quality of MQ=255 reads in IR	2011-11-05 22:40:45 -04:00
Ryan Poplin	611a395783	Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK.	2011-11-05 12:18:56 -04:00
Mark DePristo	e99871f587	Bug fix for decode loc -- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.	2011-11-04 13:20:54 -04:00
Mark DePristo	a340a1aeac	Bug fix. decodeLoc() should update lineNo so you get meaningful line no when indexing due to malformed VCF files.	2011-11-04 11:44:24 -04:00
Mark DePristo	9f260c0dc1	Zero byte index bug fix for RandomlySplitVariants + cleanup -- vcfWriter2 was never being closed in onTraversalDone(), so the on the fly index file was being created but never actually properly written to the file. -- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though -- Removed the unnecessary local variable iFraction, = 1000 * the input fraction argument. Now the system just uses a double random number and compares to the input fraction at all. Is there some subtle reason I don't appreciate for this programming construct?	2011-11-04 09:45:20 -04:00
Mauricio Carneiro	e89ff063fc	GATKSAMRecord refactor The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...). * No tools should create SAMRecord anymore, use GATKSAMRecord instead *	2011-11-03 15:43:26 -04:00
Laurent Francioli	385a6abec1	Fixed a bug that wrongly swapped the mother and father genotypes in case the child genotype is missing.	2011-11-03 13:04:53 +01:00
Laurent Francioli	893787de53	Functions getAsMap and getNegLog10GQ now handle missing genotype case.	2011-11-03 13:04:11 +01:00
Eric Banks	e8bceb1eaa	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 21:13:54 -04:00
Eric Banks	52b16bf739	Must check whether there's a normal vs. extended pileup before asking for it.	2011-11-02 20:45:24 -04:00
Eric Banks	e1edd6bd12	Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.	2011-11-02 20:32:58 -04:00
Ryan Poplin	e94fcf537b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 16:29:19 -04:00
Ryan Poplin	4d35272916	Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller.	2011-11-02 16:29:10 -04:00
Mark DePristo	8a2929c1dd	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-11-02 16:21:00 -04:00
Laurent Francioli	19ad5b635a	- Calculation of parent/child pairs corrected - Separated the reporting of single and double mendelian violations in trios	2011-11-02 18:35:31 +01:00
Eric Banks	967ff647b8	Reduced reads shouldn't contribute to Fisher Strand calculations	2011-11-02 13:07:20 -04:00
Eric Banks	cf0e699226	QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes.	2011-11-02 12:58:38 -04:00
Eric Banks	4501dce58d	Fixing merge conflict	2011-11-02 12:50:32 -04:00
Eric Banks	54331b44e9	New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.	2011-11-02 12:47:30 -04:00
Mark DePristo	c2b97030a4	IntervalUtils for completely balanced locus-based scatter/gather -- scatterLocusIntervals master utility -- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc -- Util function for reversing a list (List<T> -> List<T>, unlike Collections version) -- DoC is PartitionType.INTERVAL -- Significant unit tests on new functionality (all passing) -- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work	2011-11-02 10:49:40 -04:00
Laurent Francioli	119ca7d742	Fixed a bug in parent/child pairs reporting causing a crash in case the -mvf option was used and mother was not provided	2011-11-02 08:22:33 +01:00
Laurent Francioli	b91a9c4711	- Fixed parent/child pairs handling (was crashing before) - Added parent/child pair reporting	2011-11-02 08:04:01 +01:00
Mark DePristo	5fc613f972	Better default partition types for walkers -- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better -- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.	2011-11-01 19:47:10 -04:00
Mauricio Carneiro	36600fd8e9	added MQ of low MQ/BQ to consensus RMS Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).	2011-11-01 17:46:12 -04:00
Mauricio Carneiro	b004489c6d	Moving ReduceRead TAG to GATKSAMRecord ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.	2011-11-01 17:12:09 -04:00
Mauricio Carneiro	17cc484dbd	Revert "ReduceReads ref bases are now output as '=' Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this as an engine support, not a part of ReduceReads.	2011-11-01 16:35:07 -04:00
Eric Banks	0839c75c8d	More minor fixes to docs	2011-10-31 21:49:27 -04:00
Eric Banks	74b018a1f3	Minor fixes to docs	2011-10-31 21:41:43 -04:00
Eric Banks	31ee5432c5	Merged bug fix from Stable into Unstable	2011-10-31 14:56:59 -04:00
David Roazen	cdde32acbd	Merged bug fix from Stable into Unstable	2011-10-31 14:21:15 -04:00
Eric Banks	f62af0291b	Check for invalid VCF records (not enough tokens) instead of assuming they are there.	2011-10-31 14:09:51 -04:00
Andrey Sivachenko	bed0acaed4	nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them	2011-10-31 12:28:28 -04:00
Mauricio Carneiro	389380a590	ReduceReads ref bases are now output as '=' to save space Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.	2011-10-30 12:04:39 -04:00
Eric Banks	0ca7428e76	Allow processing of empty intervals, but warn user when this case is encountered.	2011-10-28 12:12:14 -04:00
Eric Banks	649dfe98f0	Add VCF header for any expressions that are requested	2011-10-28 10:22:19 -04:00
Eric Banks	057a79f598	This argument should be annotated as @Input	2011-10-28 09:44:49 -04:00
Eric Banks	4ba7c0cecd	Moving to private	2011-10-28 09:29:28 -04:00
Eric Banks	1bdd76c2f2	These tools now use the IntervalBinding system to handle intervals instead of doing it all manually	2011-10-28 09:28:12 -04:00
Eric Banks	6ba08a103d	Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.	2011-10-28 09:23:25 -04:00
Eric Banks	3d04bb5608	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-27 23:55:18 -04:00
Eric Banks	19e27d4568	Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.	2011-10-27 23:55:11 -04:00
Eric Banks	cafc245a43	For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?	2011-10-27 23:54:28 -04:00
Guillermo del Angel	cbc43683ee	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-27 20:54:18 -04:00
Guillermo del Angel	8907e42007	First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible. More testing is needed and docs need improving but so far all functionality seems up and running	2011-10-27 20:53:48 -04:00
Eric Banks	ccfd853b34	Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.	2011-10-27 20:43:50 -04:00
Eric Banks	c2f343773e	Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.	2011-10-27 15:32:08 -04:00
Khalid Shakir	b80d407dc7	No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path. Other minor cleanup.	2011-10-27 14:17:07 -04:00
Eric Banks	8c4dbce6d8	Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.	2011-10-27 13:58:19 -04:00
Eric Banks	4a7e6fee3f	Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.	2011-10-27 13:38:08 -04:00
Matt Hanna	f7df8bdecc	Merged bug fix from Stable into Unstable	2011-10-27 11:31:17 -04:00
Matt Hanna	41ddc7bce7	Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.	2011-10-27 11:30:04 -04:00
Eric Banks	44f905b5e5	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 23:31:11 -04:00
Eric Banks	68283b1651	Fixing docs and adding GATKdocs for the new interval functionality	2011-10-26 22:14:43 -04:00
Mark DePristo	c9978316a3	Merge branch 'FragmentUtils'	2011-10-26 19:51:49 -04:00
Mauricio Carneiro	add9ad97ec	No scatter gather for VQSR or ApplyVQSR. These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.	2011-10-26 16:35:44 -04:00
Ryan Poplin	74aeb22eeb	Merged bug fix from Stable into Unstable	2011-10-26 15:57:30 -04:00
Ryan Poplin	86871bd1e3	Throw a UserException in the BQSR when there is no data instead of creating an empty csv file	2011-10-26 15:56:41 -04:00
Mark DePristo	034a997d07	Generalized Reads -> Fragment calculation -- Supports ReadBackedPileup -> FragmentCollection as before -- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller -- General cleanup, renaming, move to separate package, more extensive unit tests, etc. -- Added toFragment() function to ReadBackedPileup interface	2011-10-26 15:54:38 -04:00
Eric Banks	2f21b6ecfb	Removed debugging output	2011-10-26 15:50:20 -04:00
Eric Banks	b39fcb1bea	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-26 15:44:25 -04:00
Eric Banks	b6ce6ed3f8	Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.	2011-10-26 15:42:53 -04:00
Eric Banks	9424e8b2ca	Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.	2011-10-26 14:11:49 -04:00
Mark DePristo	7fa943aef1	Renamed FragmentPileup to FragmentUtils	2011-10-26 14:01:45 -04:00
Laurent Francioli	1f044faedd	- Genotype assignment in case of equally likely combinations is now random - Genotype combinations with 0 confidence are now left unphased	2011-10-26 19:57:09 +02:00
Laurent Francioli	81b163ff4d	Indentation	2011-10-26 14:49:12 +02:00
Laurent Francioli	62cff266d4	GQ calculation corrected for most likely genotype	2011-10-26 14:40:04 +02:00
Mark DePristo	af3613cc5f	GATKSAMRecord commit branch summary First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase? Now, on to the changes here: -- Picard added a SamRecordFactory that is used to create instances the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files. -- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception. -- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum. -- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used. -- ReducedRead code has been moved from ReadUtils until efficiency caching assessors in GATKSamRecord. -- ArtificialSamUtils creates GATKSamRecords now, just SAMRecords. Added code here to create artificial pairs and using that code to create artificial ReadBackedPileups with specific properties -- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller	2011-10-25 20:52:56 -04:00
Mark DePristo	2822f0dc27	Merge branch 'SamRecordFactory'	2011-10-25 20:34:47 -04:00
Mark DePristo	1b722c21cf	merge master	2011-10-25 16:08:39 -04:00
Ryan Poplin	56fdf0b865	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-25 15:58:56 -04:00
Ryan Poplin	4a34c1862e	misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.	2011-10-25 15:22:28 -04:00
Guillermo del Angel	b559936b7a	a)New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different.	2011-10-25 09:56:43 -04:00
Khalid Shakir	fac9932938	Embedding gsalib source and queueJobReport R scripts in the dist and package jars. Moved gsalib and queueJobReport.R to embeddable namespaced locations. Updated packager dependencies/dir to add an @includes which filters the embedded fileset. RScriptExecutor can now JIT compile the gsalib. RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG. Refactored ProcessController and IOUtils from Queue to Sting Utils. Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count. Replaced uses of some IOUtils with Apache Commons IO. ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown. Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().	2011-10-24 15:58:34 -04:00
Khalid Shakir	89a581a66f	Added ability to specify arguments in files via -args/--arg_file Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()	2011-10-24 15:58:34 -04:00
Mark DePristo	502592671d	Cleanup FragmentPileup before main repo commit -- removed intermediate functions. Now only original version and best optimized new version remain -- Moved general artificial read backed pileup creation code into ArtificialSamUtils	2011-10-24 14:40:05 -04:00
Mark DePristo	166174a551	Google caliper example execution script -- FragmentPileup with final performance testing	2011-10-24 14:04:53 -04:00
Laurent Francioli	62477a0810	Added documentation and comments	2011-10-24 13:45:21 +02:00
Laurent Francioli	38ebf3141a	- Now supports parent/child pairs - Sites with missing genotypes in pairs/trios are handled as follows: -- Missing child -> Homozygous parents are phased, no transmission probability is emitted -- Two individuals missing -> Phase if homozygous, no transmission probability is emitted -- One parent missing -> Phased / transmission probability emitted - Mutation prior set as argument	2011-10-24 12:30:04 +02:00
Laurent Francioli	7312e35c71	Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.	2011-10-24 10:25:53 +02:00
Laurent Francioli	01b16abc8d	Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.	2011-10-24 10:24:41 +02:00
Mark DePristo	f6ccac889b	Merged bug fix from Stable into Unstable	2011-10-23 16:37:12 -04:00
Mark DePristo	585a45b7a3	Bug fix for ClipReadsWalker when stats output isn't provided -- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification	2011-10-23 16:36:48 -04:00
Ryan Poplin	f5d910b8a5	Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.	2011-10-23 13:29:08 -04:00
Mark DePristo	42bf9adede	Initial version of "fast" FragmentPileup code -- Uses mayOverlapRoutine in ReadUtils -- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations -- PileupElement now comparable (sorts on offset then on start) -- Caliper microbenchmark to assess performance	2011-10-22 21:36:37 -04:00
Mauricio Carneiro	4913f8a60f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-21 17:45:07 -04:00
Mauricio Carneiro	102dafdcbc	Validation of GATKSamRecord in read filters Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.	2011-10-21 17:40:43 -04:00
Guillermo del Angel	f4b409fa0d	CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result	2011-10-21 14:07:20 -04:00
Mark DePristo	b863390cb1	Moving reduced read functionality into GATKSAMRecord -- More functions take / produce GATKSAMRecords instead of SAMRecord	2011-10-21 13:28:05 -04:00
Mark DePristo	2403e96062	Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.	2011-10-21 09:59:24 -04:00
Mark DePristo	110e13bc1e	Merge branch 'master' into SamRecordFactory	2011-10-21 09:43:52 -04:00
Mark DePristo	be797a8a1f	Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system	2011-10-21 09:39:21 -04:00
Mark DePristo	ed74ebcfa1	GATKSamRecords with efficiency NGSPlatform method	2011-10-21 09:38:41 -04:00
Mark DePristo	94e1898d8f	A canonical set of NGS platforms as enums with convenient manipulation methods	2011-10-21 09:37:45 -04:00
Laurent Francioli	edea90786a	Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment.	2011-10-20 17:04:19 +02:00
Laurent Francioli	1c61a57329	Original rewrite of PhaseByTransmission: - Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument - Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster - Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8) - Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het) - Basic reporting added: -- mvf argument lets the user specify a file to report remaining MVs -- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output Known problems: - GQ is not recalculated even if the genotype changes Possible improvements: - Phase partially typed trios - Use standard Allele/Genotype Classes for the storage of the pre-calculated phase	2011-10-20 13:06:44 +02:00
Laurent Francioli	ef6a6fdfe4	Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.	2011-10-20 12:49:18 +02:00
Laurent Francioli	76dd816e70	Added getParents() -> returns an arrayList containing the sample's parent(s) if available	2011-10-20 12:47:27 +02:00
Mark DePristo	999a8998ae	Constructor for GATKSamRecord with header only, for unit testing	2011-10-19 17:51:48 -04:00
Mark DePristo	bba69701b5	Now creates GATKSamRecords, not SamRecords	2011-10-19 17:49:17 -04:00
Christopher Hartl	cd8a6d62bb	You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that. This commit contains: - IntronLossGenotyper is brought into its current incarnation - A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate) - RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type. + the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there. - MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added) + use this rather than a hard GQ threshold if you're doing MV analyses. - Some miscellaneous QScripts	2011-10-19 17:42:37 -04:00
Mark DePristo	52345f0aec	Meaningful documentation string	2011-10-19 15:47:36 -04:00
Mark DePristo	1b38aa1a7e	Cleaning up reduced read code accessors	2011-10-19 15:46:44 -04:00
Eric Banks	d8d73fe4f2	Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.	2011-10-19 15:11:13 -04:00
Mark DePristo	7928b287fc	GATKSamRecord now produced by SAMFileReaders by default -- Removed all of the unnecessary caching operations in GATKSAMRecord -- GATKSAMRecord renamed to GATKSamRecord for consistency	2011-10-19 13:15:27 -04:00
Eric Banks	5a6468c11e	Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.	2011-10-19 11:52:05 -04:00
Eric Banks	48c4a8cb33	Make error messages clearer (even I was confused)	2011-10-19 11:49:16 -04:00
Eric Banks	6cadaa84c9	Just use validate() from super class since it does the same thing	2011-10-19 11:48:23 -04:00
Mark DePristo	df3e4e1abd	First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader	2011-10-19 11:22:35 -04:00
Mauricio Carneiro	c27e2fb676	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-18 15:23:05 -04:00
Mark DePristo	f77f2eeb7d	Fix for new ID structure	2011-10-18 13:04:43 -04:00
Mark DePristo	1a92ee3593	No longer adds a binding of ID -> . when the ID field is dot in the VCF -- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now	2011-10-18 10:57:02 -04:00
Ryan Poplin	e45fcb66eb	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-17 15:56:19 -04:00
Ryan Poplin	1e6794c539	fixing typo in VariantsToTable docs	2011-10-17 15:56:02 -04:00
Mark DePristo	0de8550f17	Merged bug fix from Stable into Unstable	2011-10-17 15:29:53 -04:00
Mark DePristo	c1329c4dde	Fixing a binary to logical or	2011-10-17 15:29:45 -04:00
Mark DePristo	9e4963efc8	Merged bug fix from Stable into Unstable	2011-10-17 15:27:38 -04:00
Mark DePristo	ec911ce5bb	Even better error messages	2011-10-17 15:27:22 -04:00
Mark DePristo	d065bf1715	Merged bug fix from Stable into Unstable	2011-10-17 15:25:47 -04:00
Mark DePristo	a7cf9cdc67	Fixing error message typo	2011-10-17 15:25:35 -04:00
Ryan Poplin	589df6b7cf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-17 14:35:14 -04:00
Ryan Poplin	6b02354d84	Adding a new getter in VariantsToTable to extract the indel event length.	2011-10-17 14:34:52 -04:00
Mark DePristo	3550798c4c	Merged bug fix from Stable into Unstable	2011-10-17 13:58:56 -04:00
Mark DePristo	4108a294f7	Better error message when a RodBinding file doesn't exist	2011-10-17 13:58:46 -04:00
Mark DePristo	cc76826f78	Merged bug fix from Stable into Unstable	2011-10-17 13:38:11 -04:00
Mark DePristo	fd4540cd32	Fixed extraordinarily subtle race condition with contracts invariant -- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state	2011-10-17 13:37:55 -04:00
Mark DePristo	5a881360df	Merged bug fix from Stable into Unstable	2011-10-13 15:54:43 -04:00
Mark DePristo	7cab6f6bb0	Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils -- SimpleTimer is now threadsafe using synchronized method keywords -- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength	2011-10-13 15:53:12 -04:00
Mauricio Carneiro	e12ffb6547	Updating docs for GCContentByInterval This walker does not take any BAMs. It only walks over the reference.	2011-10-13 13:27:00 -04:00
Eric Banks	9aecd50473	Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.	2011-10-12 15:44:54 -04:00
Mauricio Carneiro	e53a952aeb	Added ION Torrent support to CountCovariates.	2011-10-12 01:57:02 -04:00
Mauricio Carneiro	a2733a451f	Added NotCalled feature to GAV Added "not called" and "no status" to the truth table. Very useful.	2011-10-11 19:31:45 -04:00
David Roazen	ae83420637	Merged bug fix from Stable into Unstable	2011-10-11 12:26:08 -04:00
David Roazen	794f275871	SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation. Having SnpEff grouped with the Experimental annotations was proving problematic, since it requires a rod. Placing it in its own group should improve the situation somewhat, making it easier to request "all annotations except for SnpEff".	2011-10-11 12:08:56 -04:00
David Roazen	cfd0ac8410	Merged bug fix from Stable into Unstable Conflicts: public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java	2011-10-11 12:03:51 -04:00
David Roazen	24b72334b3	UnifiedGenotyper now correctly initializes the VariantAnnotator engine. This allows the annotation classes to perform any necessary initialization/validation. For example, it allows the SnpEff annotator to (among other things) validate its rod binding. This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding is present. Added an integration test to cover this case so that it doesn't break again.	2011-10-11 12:02:05 -04:00
Guillermo del Angel	0429b38021	Merged bug fix from Stable into Unstable	2011-10-11 11:19:38 -04:00
Guillermo del Angel	1c485d8b5e	Forgot that no matter how trivial a change it's a good idea to compile first	2011-10-11 11:18:41 -04:00
Guillermo del Angel	6418f4d69b	Merged bug fix from Stable into Unstable	2011-10-11 11:13:18 -04:00
Guillermo del Angel	1975de1b32	Second try: hide --do_indel_quality in AnalyzeCovariates	2011-10-11 11:11:29 -04:00
Guillermo del Angel	6506ea83e8	Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through. This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.	2011-10-11 11:03:12 -04:00
Guillermo del Angel	4c1d8c8d44	Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users	2011-10-11 11:01:06 -04:00
Eric Banks	77c983c5b5	No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.	2011-10-10 15:17:54 -04:00
Mark DePristo	fb72bcf732	DiffObjects no longer prints out the file name in the status so MD5 are stable	2011-10-10 15:10:57 -04:00
Mark DePristo	46e7370128	this.allele, getAlleles(), and getAltAlleles() now return List not set -- Changes associated code throughout the codebase -- Updated necessary (but minimal) UnitTests to reflect new behavior -- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC	2011-10-09 11:45:55 -07:00
Mark DePristo	c67f6c076b	simpleMerge now preserves allele order -- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct	2011-10-08 17:39:53 -07:00
Mark DePristo	ec14a4a606	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-07 08:38:50 -07:00
Eric Banks	ca9cd9b688	Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.	2011-10-06 22:38:44 -04:00
Mark DePristo	c7864c7256	Filter application order is now deterministic, in the order defined by the walker -- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is (1) the order of the walker intrinsic filters (2) read group black list (if provided) (3) command line filters (if provided)	2011-10-06 18:51:40 -07:00
Mark DePristo	0b88af4af9	Counts of records failing filters are displayed sorted -- Stops random ordering of the output, as the counts are returned sorted by string name of the class -- Deleted now unused sh*tty assessors in Utils	2011-10-06 18:42:26 -07:00
Mark DePristo	d1e70d6ec2	Removed Nx counting of reads in metrics with -nt > 1	2011-10-06 18:29:26 -07:00
Eric Banks	c61804a450	Rename the long version of the argument name to more accurately reflect its purpose.	2011-10-06 16:14:04 -04:00
Eric Banks	61a3dfae24	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 15:58:04 -04:00
Eric Banks	6eb87bf58a	RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.	2011-10-06 15:57:49 -04:00
Eric Banks	1b0735f0a3	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 13:41:45 -04:00
Eric Banks	c4dfc1fb8b	Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.	2011-10-06 13:41:36 -04:00
Mark DePristo	73f9d1f217	GATK read group requirement iron hand -- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined -- LIBS again throws an error if the complete list of samples isn't provided -- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data. -- Convenience constructors for UserExceptions.MalformedBAM	2011-10-06 08:40:35 -07:00
Mark DePristo	23845ac798	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 08:17:08 -07:00
Mark DePristo	daa5999489	Fixed typo in argument description	2011-10-06 08:16:25 -07:00
Guillermo del Angel	8a474e38ff	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-06 10:08:39 -04:00
Guillermo del Angel	93f7e632bd	Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is.	2011-10-06 10:07:46 -04:00
Mark DePristo	190be4d0d1	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-10-05 21:27:11 -07:00
Mark DePristo	8e6845806a	Allowing empty samples list in LIBS -- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly	2011-10-05 21:26:21 -07:00
Matt Hanna	180c8f286f	Merged bug fix from Stable into Unstable	2011-10-05 20:37:43 -04:00
Matt Hanna	55b9f06527	Ensure that IndelRealigner n-way out option supports MD5 generation.	2011-10-05 20:36:28 -04:00
Mark DePristo	be2d29ce69	Final PED documentation	2011-10-05 15:17:41 -07:00
Mark DePristo	3226d5dc0d	Merge branch 'master' into ped	2011-10-05 15:03:09 -07:00
Mark DePristo	6a573437af	Details documentation arguments for -ped	2011-10-05 15:00:58 -07:00
Mark DePristo	e7c80f7c45	Renaming quantitative trait to OtherPhenotype which is now a String not a double -- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles	2011-10-05 12:26:33 -07:00
Mark DePristo	51ecc20867	getFamily() and associated methods implemented and tested -- Sample no longer serializable -- Sample now implements Comparable	2011-10-05 09:55:05 -07:00
Mark DePristo	a45d985818	TODO method stubs	2011-10-04 15:54:09 -07:00
Mark DePristo	fee89e47ff	Only throws an error when there are no samples but there are reads -- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.	2011-10-04 06:50:54 -07:00
Mark DePristo	f552aede42	Only provide the sample names in the BAM file for efficiency	2011-10-04 06:50:12 -07:00
Mark DePristo	a27641e1fc	Cleaned up imports	2011-10-04 06:28:36 -07:00
Mark DePristo	b20689ff55	No longer supports extraProperties -- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem -- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown -- addSample() in Sample.class now invokes mergeSample() when appropriate -- Validation types are now only STRICT or SILENT -- Validation code implemented in SampleDBBuilder -- Extensive unit tests for SampleDBBuilder	2011-10-03 19:20:33 -07:00
Mauricio Carneiro	3837aa45b4	Fixing conflicts Conflicts: public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java	2011-10-03 19:07:59 -07:00
Mark DePristo	2e3dc52088	Minor function renaming	2011-10-03 14:41:13 -07:00
Mark DePristo	dd71884b0c	On path to SampleDB engine integration -- PedReader tag parser -- Separation of SampleDBBuilder from SampleDB (now immutable) -- Removed old sample engine arguments	2011-10-03 12:08:07 -07:00
Eric Banks	c3eff7451a	Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. There was a noticeable (albeit not big) difference in the change when reading sites only files.	2011-10-03 14:20:39 -04:00
Mark DePristo	8ee0f91904	Remove residual processing tracker arguments	2011-10-03 09:50:01 -07:00
Mark DePristo	89ac50e86e	SampleDataSource -> SampleDB	2011-10-03 09:33:30 -07:00
Mark DePristo	93fba06cb5	Support for whitespace only lines	2011-10-03 09:30:10 -07:00
Mark DePristo	0604ce55d1	PedReader support for ; separated lines, not only newline	2011-10-03 09:19:58 -07:00
Mark DePristo	52f670c8b8	100% version of PedReader -- Passes all unit tests -- Added unit tests for missing fields	2011-10-03 06:12:58 -07:00
Mark DePristo	dd75ad9f49	95% PedReader -- Passes significant unit tests -- Implicit sample creation for mom / dad when you create single samples -- Continuing cleanup of Sample and SampleDataSource	2011-09-30 18:03:34 -04:00
Andrey Sivachenko	c7898a9be7	inconsequential change in string constants printed into the vcf which no one uses anyway...	2011-09-30 16:40:21 -04:00
Mark DePristo	010899f886	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 15:51:09 -04:00
Mark DePristo	84160bd83f	Reorganization of Sample -- Moved Gender and Afflication to separate public enums -- PedReader 90% implemented -- Improve interface cleanup to XReadLines and UserException	2011-09-30 15:50:54 -04:00
Mauricio Carneiro	05fba6f23a	Clipping ends inside deletion and before insertion fixed.	2011-09-30 15:44:43 -04:00
Mark DePristo	c1cf6bc45a	PEDReader should be in samples	2011-09-30 14:22:19 -04:00
Mark DePristo	56f10b40a8	Fixing test bugs for WindowMaker that required empty sample list	2011-09-30 14:18:27 -04:00
Ryan Poplin	af6c053435	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 13:33:31 -04:00
Mark DePristo	810e8ad011	Removed getXByReaders() function from the engine -- These could be simplified in their downstream uses -- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream	2011-09-30 10:43:51 -04:00
Mark DePristo	178ba24c27	Move getSamplesForSamFile to SampleUtils -- A nearly identical piece of code already lived in SampleUtils. Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself. Much cleaner	2011-09-30 10:28:18 -04:00
Mark DePristo	30d23942b1	Renamed ReadBackedPileup getXSampleName() functions to getXSample -- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions	2011-09-30 10:02:57 -04:00
Mark DePristo	3289a325fc	Removed final use of Sample in RBP	2011-09-30 09:57:39 -04:00
Mark DePristo	a69a4dda2f	SamplesDB no longer has null sample -- Updated getSamples().size() == 2 test in CallableLociWalker that really ensured there was one sample in the system	2011-09-30 09:56:23 -04:00
Mark DePristo	e055a78f6e	LIBS now requires at least one sample be present -- UnitTest provides a "null" sample for matching the reads without read groups	2011-09-30 09:49:35 -04:00
Mark DePristo	9860a2c989	Merge branch 'master' into ped	2011-09-30 09:28:18 -04:00
Mark DePristo	d901fed617	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-30 08:41:44 -04:00
Mauricio Carneiro	cabacf028d	Intermediate commit to fix interval skipping may need additional testing.	2011-09-29 18:45:12 -04:00
Mark DePristo	1765fbeb6b	Merge branch 'master' into ped	2011-09-29 17:18:51 -04:00
Mark DePristo	98ecaf8aa0	Support for ReducedReads with reduced counts and average quals -- ReadUtils and UnitTest updated to support new byte[] style -- Removed unnecessary read transformer in PairHMM	2011-09-29 17:18:39 -04:00
Mauricio Carneiro	9508220157	fixed hard clipping both ends inside deletion If both ends of the interval fall within a deletion in the read then hardClipBothEnds would cut the right tail first including the entire deletion, then fail to cut the left tail because there would not be any bases there anymore. Fixed.	2011-09-29 15:36:49 -04:00
Mark DePristo	625ffb6a07	LocusIteratorByState and ReadBackedPileups no longer use Sample	2011-09-29 14:52:11 -04:00
Mark DePristo	b3a2371925	Merge branch 'master' into ped	2011-09-29 14:32:17 -04:00
Mark DePristo	68761a6e28	Removed sample from header	2011-09-29 14:13:05 -04:00
Mauricio Carneiro	a5e75cd14c	Outputting both consensus base qualities and counts The base qualities of a consensus read are now the average quality of the bases forming the consensus base (most common base) and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.	2011-09-29 12:54:41 -04:00
Mark DePristo	505416b6c0	Merge branch 'master' into ped	2011-09-29 12:22:39 -04:00
Mark DePristo	9536845e35	Cleaning up unused code in MV	2011-09-29 12:20:07 -04:00
Mark DePristo	5043d76c3d	Removing more bad uses of SampleDataSource creation	2011-09-29 12:16:34 -04:00
Mark DePristo	5c9227cf5e	Further cleanup of Sample database -- Removing more and more unnecessary code -- Partial removal of type safe Sample usage. On the road to SampleDB only	2011-09-29 11:50:05 -04:00
Mark DePristo	2a0cd556d3	Further cleanup of Sample -- Cleaned up interface functions in GAE -- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db	2011-09-29 10:34:51 -04:00
Mark DePristo	e76f381628	Moved sample package from DataSources to gatk, and renamed it samples -- All associated changes to the codebase are just header updates	2011-09-29 09:57:15 -04:00
Mark DePristo	e197dcd1f3	Pre-cleanup commit of Sample and SampleDataSource -- SampleDataSource has all reader functionality disabled	2011-09-29 09:44:18 -04:00
Mark DePristo	4d31673cc5	No longer supporting YAML file allows us to delete 75% of the sample's codebase	2011-09-29 09:43:31 -04:00
Ryan Poplin	e366ee18bc	Adding ability to read in and make use of kmer quality tables during HMM evaluation	2011-09-29 07:46:19 -04:00
Mauricio Carneiro	fc86cd6fd8	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr	2011-09-29 00:12:15 -04:00
Roger Zurawicki	4fd5630f6a	Added ReadClipper Unit Test * Includes tests that include HardClip to Read and Reference Coords. * Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing	2011-09-28 23:13:50 -04:00
Matt Hanna	9272ed03b5	Merged bug fix from Stable into Unstable	2011-09-28 21:26:43 -04:00
Matt Hanna	0acaf2df65	Fix an embarrassing issue where a specific configuration of minimal coverage over small intervals could cause reads to be dropped from the pileup. Nothing to see here...	2011-09-28 21:23:01 -04:00
Guillermo del Angel	c8d3a720f9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 18:17:34 -04:00
Guillermo del Angel	7e3cb45093	Further performance optim in banded hmm, about 60% speed improvement over current implementation now	2011-09-28 16:27:28 -04:00
Ryan Poplin	1b1ca80df2	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 16:17:39 -04:00
Ryan Poplin	3b73dc89fe	Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.	2011-09-28 16:17:31 -04:00
Mauricio Carneiro	ff2f4df043	Fixed hardclipping inside indel (right tail) when hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.	2011-09-28 16:07:34 -04:00
Mauricio Carneiro	3c7b7f74ef	Optimized interval iteration Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).	2011-09-28 16:07:34 -04:00
Mauricio Carneiro	5c9b659c02	clipping both ends of the reads was modifying the original read This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.	2011-09-28 16:07:34 -04:00
Guillermo del Angel	fe23e4d10c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 15:53:11 -04:00
Guillermo del Angel	e2b9030e93	First mostly fully functional implementation of banded pair HMM likelihood computation for indel caller. More experimentation to follow but it right now works in small data sets and at least it doesn't break existing things. Disabled by default at this point	2011-09-28 15:51:48 -04:00
Eric Banks	1b45f21774	Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.	2011-09-28 13:18:32 -04:00
Eric Banks	1f0e354fae	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-28 13:13:21 -04:00
Eric Banks	bb619a9a3c	Fixing docs	2011-09-28 13:13:03 -04:00
Mark DePristo	5812004e06	Merge branch 'stable'	2011-09-28 11:36:40 -04:00
Mark DePristo	a5006831d7	Shows "" not empty space when default string value is ""	2011-09-28 11:35:52 -04:00
Mark DePristo	1e32281a15	Fix to not show -null when missing short name argument	2011-09-28 11:31:20 -04:00
Mauricio Carneiro	89544c209c	Fixing contracts changed return type to Pair, changing contracts accordingly.	2011-09-28 11:19:17 -04:00
Eric Banks	eacbee3fe5	Merged bug fix from Stable into Unstable	2011-09-27 20:35:18 -04:00
Eric Banks	43b0c98298	Fix docs	2011-09-27 20:34:46 -04:00
Eric Banks	232a6df11c	Add longhand form to the error message.	2011-09-27 20:29:31 -04:00
Eric Banks	1d6fcb6eb1	Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS." This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.	2011-09-27 20:27:00 -04:00
Eric Banks	269b9826b6	Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.	2011-09-27 20:26:36 -04:00
Mauricio Carneiro	3b6e43b7c4	Use reads that span multiple intervals * RR will now compress reads that span across multiple intervals correctly and output them in the correct order. * Fixed bug in getReadCoordinateForReferenceCoordinate where if the requested reference coordinate fell inside a deletion in the read the read would be clipped up to one element past the deletion.	2011-09-27 18:39:06 -04:00
Khalid Shakir	84bd355690	Merged bug fix from Stable into Unstable	2011-09-27 14:34:39 -04:00
Khalid Shakir	b090751f62	Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths. Updates to HybridSelectionPipeline: - Added annotations back via snpEff - Minor updates to VQSR paths and lowered memory	2011-09-27 14:33:57 -04:00
Eric Banks	26e71f6688	The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.	2011-09-27 11:03:17 -04:00
Guillermo del Angel	ceffefa6a6	Intermediate version with banded pair HMM	2011-09-27 10:18:58 -04:00
Mark DePristo	e99ff3caae	Removed lots of old, and not to be used, HMM options -- resulted in massive code cleanup -- GdA will integrate his new banded algorithm here -- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY	2011-09-27 10:08:40 -04:00
Mark DePristo	fa0efbc4ca	Refactoring of PairHMM to support reduced reads	2011-09-26 13:28:56 -04:00
Mark DePristo	a6b65d6347	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-26 13:26:21 -04:00
Mark DePristo	4f09453470	Refactored reduced read utilities -- UnitTests for key functions on reduced reads -- PileupElement calls static functions in ReadUtils -- Simple routine that takes a reduced read and fills in its quals with its reduced qual	2011-09-26 12:58:31 -04:00
Eric Banks	234b74dd05	Merged bug fix from Stable into Unstable	2011-09-26 11:47:23 -05:00
Eric Banks	317b95fa57	Fixing some annotator docs	2011-09-26 11:46:45 -05:00
Mauricio Carneiro	b76dbc72f0	Fixed interval navigation bug. If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.	2011-09-26 08:13:44 -04:00
Guillermo del Angel	9afccd11b1	Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just subtract max value from log values and return. Use this function in snp and indel GL computation.	2011-09-25 21:18:56 -04:00
Guillermo del Angel	3eef800889	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-24 21:20:11 -04:00
Guillermo del Angel	203517fbb7	a) Cleanups/bug fixes to previous commit to CombineVariants. b) Change md5 to reflect records that are now merged correctly. c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.	2011-09-24 19:08:00 -04:00
Mauricio Carneiro	c31f4cb2f6	Cleaning leading insertions With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.	2011-09-24 14:33:32 -04:00
Guillermo del Angel	cd058dd10f	a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case. b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.	2011-09-24 13:40:11 -04:00
Mark DePristo	bb11951255	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-24 09:26:45 -04:00
Mark DePristo	8d9e136bba	Merge branch 'stable'	2011-09-24 09:26:28 -04:00
Mark DePristo	6804ab6d2f	Bug fix for NPE in very short GATK runs -- Was already in unstable, but not stable...	2011-09-24 09:25:29 -04:00
Mark DePristo	92acff46e5	Moved Haplotype into Utils root	2011-09-24 09:14:05 -04:00
Mark DePristo	f792353dcd	Framework for genotype unit test	2011-09-24 08:56:45 -04:00
Mark DePristo	c0bb0cb465	Make DiploidGenotype enum private to walkers.genotyper	2011-09-24 08:48:33 -04:00
Guillermo del Angel	3a4469a236	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-23 21:58:34 -04:00
Guillermo del Angel	0e74cc3c74	a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation. b) Intermediate CombineVariants fixes, not ready yet	2011-09-23 21:58:20 -04:00
Mauricio Carneiro	7cac75ae1d	Merged bug fix from Stable into Unstable	2011-09-23 19:00:43 -04:00
Mauricio Carneiro	fbe3c1e0b3	Adding warning on HardClipping Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.	2011-09-23 19:00:19 -04:00
Mark DePristo	b66841f179	Static cache for binomial probability -- Very low level performance optimization	2011-09-23 17:29:34 -04:00
Mauricio Carneiro	1a45c331b2	bringing the latest bug fixes to Reduce Reads	2011-09-23 16:40:06 -04:00
Mauricio Carneiro	9ea40f2e41	Deletions/Insertions in hard clip and bug fixes * Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read. * Insertions do not count as hard clipped bases for the same reason. * This created a bug in the previous cigar cleaning function. Fixed.	2011-09-23 16:37:08 -04:00
David Roazen	40202c85e0	Merged bug fix from Stable into Unstable	2011-09-23 16:35:55 -04:00
David Roazen	e1cb5f6459	SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers. -We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF. -Effects are now prioritized according to both biological impact and functional class, instead of impact only. -Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that really describe the location of the variant rather than its biological effect. This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these features directly. Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification in VariantEval is basically broken and urgently needs to be fixed for production purposes.	2011-09-23 16:06:52 -04:00
Matt Hanna	e388c357ca	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-23 14:53:28 -04:00
Matt Hanna	cc23b0b8a9	Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards.	2011-09-23 14:52:31 -04:00
Mark DePristo	e3d4efb283	Remove N2 EXACT model code, which should never be used	2011-09-23 11:55:21 -04:00
Mark DePristo	27ce3c822e	Merge branch 'stable'	2011-09-23 09:04:52 -04:00
Mark DePristo	2bb77a7978	Docs for all VariantAnnotator annotations	2011-09-23 09:04:16 -04:00
Mark DePristo	dd65ba5bae	@Hidden for DocumentationTest and GATKDocsExample	2011-09-23 09:03:37 -04:00
Mark DePristo	dfce301beb	Looks for @Hidden annotation on all classes and excludes them from the docs	2011-09-23 09:03:04 -04:00
Mark DePristo	4397ce8653	Moved removePLs to VariantContextUtils	2011-09-23 08:24:20 -04:00
Mark DePristo	c49cc623de	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-22 17:26:21 -04:00
Mark DePristo	5cf82f9236	simpleMerge UnitTest tests filtered VC merging	2011-09-22 17:05:12 -04:00
Mauricio Carneiro	96c875399c	Merging many bug fixes to reduce reads	2011-09-22 17:04:11 -04:00
Mauricio Carneiro	39b54211d0	Fixed hard clipping soft clipped bases after hard clips if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.	2011-09-22 15:46:55 -04:00
Mauricio Carneiro	1acf7945c5	Fixed hard clipped cigar and alignment start * Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions. * The alignment start is now recalculated according to the new hard clipped cigar representation	2011-09-22 14:51:14 -04:00
Mauricio Carneiro	4e9020c9f7	Fixed alignment start for hard clipping insertions	2011-09-22 13:28:25 -04:00
Mark DePristo	ba5f83fee2	start of VariantContextUtils UnitTest -- tests rsID merging	2011-09-22 12:10:39 -04:00
Mark DePristo	93dd1faa5f	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-22 11:20:10 -04:00
Mark DePristo	a05c959e5a	Empty unit tests for VariantContextUtils -- will be expanded over the day	2011-09-22 11:20:07 -04:00
Christopher Hartl	4f4a0fc38a	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git	2011-09-22 11:01:58 -04:00
Christopher Hartl	982c47bfa7	Remove duplicate effort in ReadUtils (with apologies to Mauricio) Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model Activated the early-abort check for ILG. I think it should be better this way.	2011-09-22 10:58:26 -04:00
Eric Banks	8f8b59a932	My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs.	2011-09-21 22:23:28 -04:00
Christopher Hartl	dc96f6da79	Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git	2011-09-21 18:18:41 -04:00
Christopher Hartl	f9cdc119af	Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips). Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).	2011-09-21 18:16:42 -04:00
Christopher Hartl	faff6e4019	Failed to commit changes to the GATKReport required for easier access when using the files as data sources (read: histograms) for walkers	2011-09-21 18:15:23 -04:00
Mauricio Carneiro	96768c8a18	Sending latest bug fixes to Reduce Reads to the main repository	2011-09-21 17:43:11 -04:00
Mauricio Carneiro	70335b2b0a	Hard clipping soft clipped reads to fix misalignments. Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.	2011-09-21 17:12:01 -04:00
Christopher Hartl	ef05827c7b	Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-21 16:40:47 -04:00
Christopher Hartl	3b51d9106a	Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line.	2011-09-21 16:40:29 -04:00
Mark DePristo	04968c88b3	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-21 15:43:25 -04:00
Mark DePristo	6bcfce225f	Fix for dynamic type determination for bgzip files -- GZipInputStream handles bgzip files under linux, but not mac -- Added BlockCompressedInputStream test as well, which works properly on bgzip files	2011-09-21 15:39:19 -04:00
Mark DePristo	9f6f0c443c	Marginally cleaner isVCFStream() function -- cleanup trying to debug minor bug. Failed to fix the bug, but the code is nicer now	2011-09-21 15:25:01 -04:00
Ryan Poplin	5fef6dc5d0	Merged bug fix from Stable into Unstable	2011-09-21 15:23:06 -04:00
Ryan Poplin	2585fc3d6c	Updating Rscript path doc text for Broad users	2011-09-21 15:22:26 -04:00
Mark DePristo	74f9ccf6dd	Merge	2011-09-21 11:30:11 -04:00
Mark DePristo	6592972f82	Putative fix for BAQ array out of bounds -- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant -- Unittest to enforce this behavior	2011-09-21 11:25:08 -04:00
Eric Banks	174859fc68	Don't allow whitespace in the INFO field	2011-09-21 11:14:54 -04:00
Mark DePristo	ecc7f34774	Putative fix for BAQ problem.	2011-09-21 11:09:54 -04:00
Mark DePristo	7d11f93b82	Final bugfix for CombineVariants -- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp -- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list	2011-09-21 10:58:32 -04:00
Mark DePristo	a91ac0c5db	Intermediate commit of bugfixes to CombineVariants	2011-09-21 10:15:05 -04:00
David Roazen	b04d8eab55	Merged bug fix from Stable into Unstable	2011-09-20 17:24:14 -04:00
Mauricio Carneiro	758ecf2d43	Bringing latest updates of ReduceReads to the master repository	2011-09-20 16:35:09 -04:00
David Roazen	d9ea764611	SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file. This change is urgently required for production, which is why it's going into Stable+Unstable instead of just Unstable. The keys for the SnpEff version and command header lines in the VCF file output by VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally different from the keys for those same lines in the SnpEff output file (SnpEffVersion and SnpEffCmd), so that output files from VariantAnnotator won't be confused with output files from SnpEff itself.	2011-09-20 16:30:55 -04:00
Mark DePristo	bffd3cca6f	Bug fix for reduced read; only adds regular bases for calculation -- No longer passes on deletions for genotyping	2011-09-20 15:07:06 -04:00
Mark DePristo	a1b4cafe7a	Bug fix for NPE when timer wasn't initialized	2011-09-20 13:59:59 -04:00
Mark DePristo	b7511c5ff3	Fixed long-standing bug in tribble index creation -- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write -- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index -- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils -- VCFWriter now requires the master sequence dictionary -- Updated walkers that create VCFWriters to provide the master sequence dictionary	2011-09-20 10:53:18 -04:00
Mark DePristo	230e16d7c0	Merge branch 'master' into rodrewrite	2011-09-20 06:54:18 -04:00
Mark DePristo	aa8afa3899	Merge	2011-09-19 21:16:47 -04:00
Mauricio Carneiro	56106d54ed	Changing ReadUtils behavior to comply with GenomeLocParser Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.	2011-09-19 14:00:00 -04:00
Mauricio Carneiro	080c957547	Fixing contracts for SoftUnclippedEnd utils Now accepts reads that are entirely contained inside an insertion.	2011-09-19 13:53:53 -04:00
Mauricio Carneiro	5e832254a4	Fixing ReadAndInterval overlap comments.	2011-09-19 13:28:41 -04:00
Christopher Hartl	ecb8466662	Merged bug fix from Stable into Unstable	2011-09-19 12:32:08 -04:00
Christopher Hartl	8143def292	Fix the -T argument in the DepthOfCoverage docs Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file	2011-09-19 12:31:47 -04:00
Christopher Hartl	034b868588	Revert "Fix the -T argument in the DepthOfCoverage docs" This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.	2011-09-19 12:16:07 -04:00
Mark DePristo	cfde0e674b	Merge branch 'sgintervals'	2011-09-19 12:02:41 -04:00
Mark DePristo	3e93f246f7	Support for sample sets in AssignSomaticStatus -- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.	2011-09-19 11:40:45 -04:00
Mark DePristo	41ffb25b74	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-19 10:55:18 -04:00
Christopher Hartl	ca1b30e4a4	Fix the -T argument in the DepthOfCoverage docs Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file	2011-09-19 10:29:06 -04:00
Mark DePristo	4ad330008d	Final intervals cleanup -- No functional changes (my algorithm wouldn't work) -- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm) -- Unit tests for the efficiency of interval partitioning	2011-09-19 10:19:10 -04:00
Mark DePristo	6ea57bf036	Merge branch 'master' into sgintervals	2011-09-19 09:50:19 -04:00
Mark DePristo	6bd42c053d	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-18 20:18:39 -04:00
Roger Zurawicki	091c7197cd	Fixed memory leak and bug with deletions in clipping. The ClippingOp clip cigar function would run into an endless loop if the parameters were out of the read's range; this is now fixed. * There is still no check to make sure the read coordinates are covered by the read. When hard clipping to interval, I added a check for deletions. NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized	2011-09-18 19:21:51 -04:00
Guillermo del Angel	e7b9a009b7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 12:48:30 -04:00
Menachem Fromer	b2e8e11128	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-16 00:52:27 -04:00
Christopher Hartl	57b3efa2e2	Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 21:06:38 -04:00
Christopher Hartl	939babc820	Updating formatting for ValidationAmplicons GATK docs	2011-09-15 21:05:51 -04:00
Christopher Hartl	9fdf1f8eb6	Fix some doc formatting for Depth of Coverage	2011-09-15 21:05:22 -04:00
Menachem Fromer	e6e9b08c9a	Must provide alleles VCF to UGCallVariants	2011-09-15 18:51:09 -04:00
David Roazen	d78e00e5b2	Renaming VariantAnnotator SnpEff keys This is to head off potential confusion with the output from the SnpEff tool itself, which also uses a key named EFF.	2011-09-15 17:42:15 -04:00
Ryan Poplin	2a8b8efd2f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 16:26:35 -04:00
Ryan Poplin	2f58fdb369	Adding expected output doc to CountCovariates	2011-09-15 16:26:11 -04:00
Eric Banks	fd1831b4a5	Updating docs to include more details	2011-09-15 16:25:03 -04:00
Eric Banks	6d02a34bfb	Updating docs to include output	2011-09-15 16:17:54 -04:00
Eric Banks	4ef6a4598c	Updating docs to include output	2011-09-15 16:10:34 -04:00
Eric Banks	fe474b77f8	Updating docs so printing looks nicer	2011-09-15 16:05:39 -04:00
Eric Banks	f04e51c6c2	Adding docs from Andrey since his repo was all screwed up.	2011-09-15 15:38:56 -04:00
Guillermo del Angel	86480b2e13	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-15 15:31:07 -04:00
Eric Banks	d369d10593	Adding documentation before the release for GATK wiki page	2011-09-15 13:56:23 -04:00
Eric Banks	202405b1a1	Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.	2011-09-15 13:52:31 -04:00
David Roazen	1e682deb26	Minor html-formatting-related documentation fix to the SnpEff class.	2011-09-15 13:07:50 -04:00
Guillermo del Angel	a942fa38ef	Refine the way we merge records in CombineVariants of different types. As of before, two records of different types were not combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of alleles of another record. For example, a SNP with alleles {A,T} and a mixed record with alleles {A,T, AAT} are now combined when start position matches.	2011-09-15 10:22:28 -04:00
David Roazen	3db457ed01	Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames" After discussing this with Mark, it seems clear that the old version of the VariantEval FunctionalClass stratification is preferable to this version. By reverting, we maintain backwards compatibility with legacy output files from the old GenomicAnnotator, and can add SnpEff support later without breaking that backwards compatibility. This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.	2011-09-14 10:47:28 -04:00
David Roazen	e0c8c0ddcb	Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames This is a temporary and hopefully short-lived solution. I've modified the FunctionalClass stratification to stratify by effect impact as defined by SnpEff annotations (high, moderate, and low impact) rather than by the silent/missense/nonsense categories. If we want to bring back the silent/missense/nonsense stratification, we should probably take the approach of asking the SnpEff author to add it as a feature to SnpEff rather than coding it ourselves, since the whole point of moving to SnpEff was to outsource genomic annotation.	2011-09-14 07:09:47 -04:00
David Roazen	1213b2f8c6	SnpEff 2.0.2 support -Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2) -Removed support for SnpEff 1.9.6 (and associated tribble codec) -Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag) -Correctly matches ref/alt alleles before annotating a record, unlike the previous version -Correctly handles indels (again, unlike the previous version)	2011-09-14 07:09:47 -04:00
Guillermo del Angel	5b1bf6e244	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-13 17:04:43 -04:00
Guillermo del Angel	c6672f2397	Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf	2011-09-13 16:57:37 -04:00
Mark DePristo	edf29d0616	Explicit info message about uploading S3 log	2011-09-12 22:16:52 -04:00
Mark DePristo	2316b6aad3	Trying to fix problems with S3 uploading behind firewalls -- Cannot reproduce the very long waits reported by some users. -- Fixed problem that exception might result in an undeleted file, which is now fixed with deleteOnExit()	2011-09-12 22:02:42 -04:00
Matt Hanna	64707c33bb	Merged bug fix from Stable into Unstable	2011-09-12 21:54:11 -04:00
Matt Hanna	e63d9d8f8e	Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list) was completely broken. Sorry about this! Fixed.	2011-09-12 21:50:59 -04:00
Eric Banks	ec4b30de6d	Patch from Laurent: typo leads to bad error messages.	2011-09-12 14:45:53 -04:00
David Roazen	9d9d438bc4	New VariantAnnotatorEngine capability: an initialize() method for all annotation classes. All VariantAnnotator annotation classes may now have an (optional) initialize() method that gets called by the VariantAnnotatorEngine ONCE before annotation starts. As an example of how this can be used, the SnpEff annotation class will use the initialize() method to check whether the SnpEff version number stored in the vcf header is a supported version, and also to verify that its required RodBinding is present.	2011-09-12 13:00:53 -04:00
Ryan Poplin	981b78ea50	Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.	2011-09-12 12:17:43 -04:00
Ryan Poplin	60ebe68aff	Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.	2011-09-12 09:43:23 -04:00
Guillermo del Angel	9344938360	Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly	2011-09-10 19:41:01 -04:00
Guillermo del Angel	e95d484757	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-09 18:31:14 -04:00
Guillermo del Angel	a807205fc3	a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling. b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count. c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.	2011-09-09 18:00:23 -04:00
Mauricio Carneiro	9e650dfc17	Fixing SelectVariants documentation getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.	2011-09-09 16:25:31 -04:00
Mark DePristo	72536e5d6d	Done	2011-09-09 15:44:47 -04:00
Mark DePristo	3c8445b934	Performance bugfix for GenomeLoc.hashcode -- old version overflowed so most GenomeLocs had 0 hashcode. Now uses 'or', not 'plus', to combine	2011-09-09 14:25:37 -04:00
Mark DePristo	c6436ee5f0	Whitespace cleanup	2011-09-09 14:24:29 -04:00
Mark DePristo	87dc5cfb24	Whitespace cleanup	2011-09-09 14:23:42 -04:00
Ryan Poplin	91c949db74	Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.	2011-09-09 12:57:14 -04:00
Mark DePristo	06cb20f2a5	Intermediate commit cleaning up scatter intervals -- Adding unit tests to ensure uniformity of intervals	2011-09-09 12:56:45 -04:00
Eric Banks	6ad8943ca0	CompOverlap no longer keeps track of the number of comp sites since it wasn't keeping track of them correctly (and cannot).	2011-09-09 09:45:24 -04:00
Mark DePristo	48461b34af	Added TYPE argument to print out VariantType	2011-09-08 15:01:13 -04:00
Ryan Poplin	9cba1019c8	Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap	2011-09-08 09:25:13 -04:00
Ryan Poplin	e0020b2b29	Fixing PrintRODs. Now has input and only prints out one copy of each record	2011-09-08 08:58:37 -04:00
Ryan Poplin	29c968ab60	clean up	2011-09-08 08:42:43 -04:00
Ryan Poplin	59841f8232	Fixing genotype given alleles for indels. Only take the records that start at this locus.	2011-09-08 08:41:16 -04:00
Mark DePristo	cd2c511c4a	GCF improvements -- Support for streaming VCF writing via the VCFWriter interface -- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file. -- Readers now read the header, and then jump to the footer to get the rest of the "header" information -- Version now a field in GCF	2011-09-07 23:28:46 -04:00
Mark DePristo	fe5724b6ea	Refactored indexing part of StandardVCFWriter into superclass -- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly	2011-09-07 23:27:08 -04:00
Mark DePristo	01b6177ce1	Renaming GVCF -> GCF	2011-09-07 17:10:56 -04:00
Mark DePristo	b220ed0d75	Merge branch 'master' into rodrewrite	2011-09-07 17:05:35 -04:00
Guillermo del Angel	45d54f6258	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 16:49:49 -04:00
Guillermo del Angel	9604fb2ba3	Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted	2011-09-07 16:49:16 -04:00
Mark DePristo	2ded027762	Removed dysfunctional tranches support from VariantEval	2011-09-07 16:09:24 -04:00
Eric Banks	aa9e32f2f1	Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.	2011-09-07 15:48:06 -04:00
Eric Banks	3a04955a30	We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.	2011-09-07 14:01:42 -04:00
Guillermo del Angel	743bf7784c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:21:26 -04:00
Guillermo del Angel	5f22ef9a8c	Added missing javadoc info to Beagle arguments	2011-09-07 13:21:11 -04:00
Mark DePristo	3bcbfa6e06	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-09-07 13:13:17 -04:00
Mark DePristo	430da23446	At least 2 minutes must pass before a status message is printed, further stabilizing time estimates	2011-09-07 13:13:07 -04:00
Mauricio Carneiro	6857d0324e	Merge branch 'master' into rr	2011-09-07 12:59:08 -04:00
Mark DePristo	7e9e20fed0	Forgot to delete previous call	2011-09-07 12:54:52 -04:00
Mark DePristo	d23d620494	Pushing traversal engine timer start to as close to actual start as possible -- Should make initial timings more accurate	2011-09-07 12:52:33 -04:00
Mark DePristo	6ff432e1f2	BugFix for TF argument to VariantEval, actually making it work properly	2011-09-07 12:50:17 -04:00
Mauricio Carneiro	131cb7effd	Bringing Reduce Reads bug fixes to the main repository	2011-09-07 12:25:53 -04:00
Mark DePristo	a1920397e8	Major bugfix for per sample VariantEval -- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.	2011-09-07 12:18:11 -04:00
Mark DePristo	a02636a1ac	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite	2011-09-07 10:50:00 -04:00
Mark DePristo	d5641cfac5	Merge branch 'variantEvalST'	2011-09-07 10:44:23 -04:00
Mark DePristo	2f4cf82e3b	VariantEval cleanup. Added VariantType Stratification -- ArrayList are List where possible -- states refactored into VariantStratifier base class (reduces many lines of duplicate code) -- Added VariantType stratification that partitions report by VariantContext.Type	2011-09-07 10:43:53 -04:00
Christopher Hartl	436f6eb52b	Reverting Eric's change and pushing in some command-line-option documentation.	2011-09-07 08:53:30 -04:00
Eric Banks	1ef8a1750a	I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.	2011-09-06 21:07:49 -04:00
Eric Banks	da9c8ab386	Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.	2011-09-06 20:39:42 -04:00
Mark DePristo	3db7ecb920	ReducedRead flag cached in GATKSAMRecord. 20% performance improvement	2011-09-06 15:11:38 -04:00
Roger Zurawicki	47607a7eff	Fixed bug where deletions messed up interval clipping - Instead of using readLength, the ReadUtil functions are used to get a proper read coordinate - Added debug info in interval clipping (with -dl) NOTE: method might not be safe for production and checks need to be added to the ClippingOp code	2011-09-06 14:25:57 -04:00
Khalid Shakir	0adb388dee	Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue. Updates for HybridSelectionPipeline: - Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5. - If a whole_exome_agilent_1 project has less than 50 samples also mixing in 1000G samples to reach VQSR thresholds. - Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches. - Removed GSA_PRODUCTION_ONLY flag from indel caller. - Updated indel hard filters based on delangel's analysis. - Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.	2011-09-06 12:41:46 -04:00
Mark DePristo	d471617c65	GATK binary VCF (gvcf) prototype format for efficiency testing -- Very minimal working version that can read / write binary VCFs with genotypes -- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading	2011-09-02 21:15:19 -04:00
Mark DePristo	048202d18e	Bugfix for cached quals	2011-09-02 21:13:28 -04:00
Mark DePristo	03aa04e37c	Simple refactoring to make formatting functions public	2011-09-02 21:13:08 -04:00
Mark DePristo	124ef6c483	MISSING_VALUE now gets defaultValue in getAttribute functions	2011-09-02 21:12:28 -04:00
Mark DePristo	82f2131777	Simplified getAttributeAsX interfaces -- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing. -- Removed the getAttributeAsXNoException(key) versions -- The only available accessors are now getAttributeAsX(key, default). -- These accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double. If the key isn't found, default is returned.	2011-09-02 12:27:11 -04:00
Mauricio Carneiro	08ae6c0c61	ReadClipper is now handling unmapped reads	2011-09-02 11:32:30 -04:00
Mark DePristo	c57198a1b9	Optimizations in VCFCodec -- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead. -- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed	2011-09-02 08:46:17 -04:00
Mark DePristo	c3ea96d856	Removing many unused functions of unquestionable purpose	2011-09-02 08:42:01 -04:00
Eric Banks	d241f0e903	Adding docs for the pcr error rate argument.	2011-09-01 21:57:02 -04:00
Eric Banks	827fe6130c	Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).	2011-09-01 11:40:35 -04:00
Mark DePristo	ac49b8d26b	Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance -- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackBuilder	2011-09-01 10:41:55 -04:00
Mauricio Carneiro	4b5a7046c5	Making ReadLengthDistribution Public Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline. * generalize it so it works with non pair ended reads. * generalize it to work with no read group information	2011-08-31 15:52:28 -04:00
Mauricio Carneiro	7d79de91c5	Merge branch 'master' into rr	2011-08-30 02:50:19 -04:00
Mauricio Carneiro	0cd9438ac2	fixed soft unclipped calculation * getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed. * getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.	2011-08-30 02:45:29 -04:00
Mauricio Carneiro	fd540592ab	Added RMS calculation for consensus MQ Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.	2011-08-30 02:45:20 -04:00
Mauricio Carneiro	6f9264d2b3	Hard Clipping no longer leaves indels on the tails The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.	2011-08-30 02:44:58 -04:00
Mauricio Carneiro	943876c6eb	Added QUAL/MINVAR parameters to the walker	2011-08-30 02:44:46 -04:00
Mauricio Carneiro	7532be7f5a	Allowing to clip after AlignmentEnd if end is soft clipped. Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.	2011-08-30 02:44:46 -04:00
Mauricio Carneiro	90a1f5e15c	Several bug fixes * When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases. * Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases * Sliding the window was causing reads that started on the new start position to be entirely clipped.	2011-08-30 02:44:19 -04:00
Mauricio Carneiro	66a8b36cf5	Fixed most indexing bugs * added bases and quals to consensus * fixed consensus read cigar generation.	2011-08-30 02:43:41 -04:00
Mark DePristo	1e5001b447	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-29 17:04:21 -04:00
Mark DePristo	3b09d42ed6	Now only prints 1 warning message about duplicate headers in simpleMerge	2011-08-29 14:41:29 -04:00
Eric Banks	c2f0db969b	Don't use the default deletion value from UG if not asking to have it set	2011-08-29 13:48:10 -04:00
Eric Banks	bb7a37e8f2	We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth.	2011-08-29 13:19:35 -04:00
Ryan Poplin	bc252a0d62	misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR	2011-08-29 08:11:31 -04:00
Mark DePristo	a5c65fc133	Debugging information to print out the Query tracks	2011-08-28 18:54:49 -04:00
Mark DePristo	7bf006278d	Moved ResolveHostname to general utils as a static function	2011-08-28 12:04:16 -04:00
Mark DePristo	ccec0b4d73	AnalyzeCovariates uses the general RScript system now -- Convenience constructor for collection for testing -- callRScript() now accepts Objects not Strings, for convenience	2011-08-27 12:54:13 -04:00
Mark DePristo	1ceb020fae	UnitTests for RScript	2011-08-27 10:50:05 -04:00
Mark DePristo	e37a638e09	Fix for disallowed characters in GATKReportTable -- Illegal characters are automatically replaced with _	2011-08-26 13:24:06 -04:00
Mark DePristo	eef1ac415a	Merge branch 'master' into rodTesting Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java	2011-08-26 00:35:41 -04:00
Eric Banks	9b7512fd94	Just because there's a ref base doesn't mean the VC needs to be padded	2011-08-25 22:42:14 -04:00
Mark DePristo	e01273ca7c	Queue now writes out queueJobReport.pdf -- General purpose RScript executor in java (please use when invoking RScripts) -- Removed groupName. This is now analysisName -- Explicitly added capability to enable/disable individual QFunction	2011-08-25 16:57:11 -04:00
Eric Banks	09a729da3a	Removing incorrect comment	2011-08-25 15:42:52 -04:00
Eric Banks	8bbef79fc2	Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch.	2011-08-25 15:37:26 -04:00
Ryan Poplin	29c7b10f7b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-24 15:18:58 -04:00
Ryan Poplin	e5008aba00	Output the top two haplotypes as a variant call by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end by taking in reads as a BAM file and writing out variant calls in VCF.	2011-08-24 15:18:44 -04:00
Guillermo del Angel	e618cb1e79	a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes. b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk). c) Added integrationtest for new SelectVariants commands	2011-08-24 12:25:50 -04:00
Mark DePristo	28ee6dac41	Fixed spelling mistake	2011-08-24 10:14:45 -04:00
Mark DePristo	569e1a1089	Walker.isDone() aborts execution early -- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.	2011-08-23 16:53:06 -04:00
Ryan Poplin	a1a1fac9e4	Likelihood engine now gives non-zero likelihoods. Using HMM function that can handle context specific gap open and gap continuation penalties	2011-08-23 13:43:07 -04:00
Guillermo del Angel	6e2552a9ef	Merge fix	2011-08-23 12:40:43 -04:00
Guillermo del Angel	8b7a0b3b62	Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf	2011-08-23 12:40:01 -04:00
Roger Zurawicki	ac36271457	Fixed extra reads showing up in Variable Sites Reads that were not hard clipped for the variable site no longer show up in output file Walker now uses unclippedStart of Read to determine position in the sliding Window	2011-08-23 11:26:00 -04:00
Mark DePristo	6d6feb5540	Better error message when you cannot determine a ROD type because the file doesn't exist or cannot be read	2011-08-23 10:56:37 -04:00
Mauricio Carneiro	feeab6075f	Merging ReduceReads development with unstable repo It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.	2011-08-22 23:03:03 -04:00
Guillermo del Angel	ee68713267	Further Bug fixes to CountVariants: stratifications were wrong in case genotypes had no-calls, for example if we stratified by sample and a sample had a no-call, this no-call was considered a true variant and counts were incorrectly increased	2011-08-22 20:42:47 -04:00
Guillermo del Angel	c270384b2e	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-22 20:39:32 -04:00
Guillermo del Angel	8ae24912f4	a) Misc fixes in Phase1 indel vqsr script, b) More R-friendly VariantsToTable printing of AC in case of multiple alt alleles c) Rename FixPLOrderingWalker to FixGenotypesWalker and rewrote: no longer need older code, replaced with code to replace genotypes with all-zero PL's with a no-call.	2011-08-22 20:39:06 -04:00
Mark DePristo	85c5a6f890	Merge branch 'rodTesting' Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java	2011-08-22 17:43:47 -04:00
Mark DePristo	1eab9be35d	Now with accurate javadoc	2011-08-22 17:25:15 -04:00
Mark DePristo	3612a3501d	info, not warn, about dynamic type determination	2011-08-22 17:24:51 -04:00
Eric Banks	dc42571dd9	Only create the genotype map when necessary	2011-08-22 15:40:36 -04:00
Khalid Shakir	c4c90c8826	Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline: - Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size. - Fixed jobPriority to work for all job runners. Now must be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values. - Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8 - Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.	2011-08-22 15:13:27 -04:00
Eric Banks	2c24b68a96	Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster.	2011-08-22 15:11:21 -04:00
Eric Banks	518b3dd291	Don't let the genotypes map be null	2011-08-22 15:10:30 -04:00
Ryan Poplin	f93a554b01	updating exome specific parameters in MDCP	2011-08-21 10:25:36 -04:00
Ryan Poplin	dbff84c54e	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-21 10:09:19 -04:00
Khalid Shakir	22ca44c015	Fixed Queue's tagging of RodBindings. Fixed argument definition names.	2011-08-21 02:34:20 -04:00
Eric Banks	a8cbced71b	Bug fix for Ryan: check for no context	2011-08-20 22:49:51 -04:00
Eric Banks	0ccd173967	Fixing the recent SelectVariants fix	2011-08-20 21:30:08 -04:00
Ryan Poplin	b008676878	fixing the previous fix	2011-08-20 21:21:55 -04:00
Guillermo del Angel	782453235a	Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants Misc updates to WholeGenomeIndelCalling.scala Bug fix in VariantEval (may be temporary, need more investigation): if -disc option is used in sites-only vcf's then a null pointer exception is produced, caused by recent introduction of -xl_sf options.	2011-08-20 12:24:22 -04:00
Ryan Poplin	539e157ecd	Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR	2011-08-20 11:28:48 -04:00
Guillermo del Angel	4939648fd4	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-20 08:50:43 -04:00
Ryan Poplin	a96ecbab71	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-19 19:30:05 -04:00
Ryan Poplin	ddb5045e14	Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.	2011-08-19 19:29:51 -04:00
Mark DePristo	8b3cfb2f1c	Final documented version of GATKDoclet and associated classes -- Docs on everything. -- Feature complete. At this point only minor improvements and bugfixes are anticipated	2011-08-19 16:52:17 -04:00
Mark DePristo	b08d63a6b8	Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable -- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments. -- @Advanced arguments	2011-08-19 15:06:37 -04:00
Mark DePristo	49e831a13b	Should have checked in	2011-08-19 14:35:16 -04:00
Mauricio Carneiro	7b5fa4486d	GenotypeAndValidate - Added docs to the @Arguments	2011-08-19 13:35:11 -04:00
Mark DePristo	9f7d4beb89	Merge branch 'help'	2011-08-19 13:14:02 -04:00
Mark DePristo	4d1fd17a97	GATKDoclet cleanup and documentation -- Fixed bug in the way ArgumentCollections were handled that led to failure in handling the dbsnp argument collection.	2011-08-19 13:13:41 -04:00
Ryan Poplin	0f25167efd	minor fix in VariantEval docs	2011-08-19 11:01:04 -04:00
Mark DePristo	198955f752	GATKDoc descriptions for all standard codecs, or TODO for their owners -- Also added vcf.gz support in the VCF codec. This wasn't committed in the last round, because it was missed by the parallel documentation effort.	2011-08-19 09:57:21 -04:00
Guillermo del Angel	269ed1206c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-19 09:32:20 -04:00
Eric Banks	40e67cff1b	I like the @Advanced annotation	2011-08-18 22:27:34 -04:00
Mark DePristo	2457c7b8f5	Merge branch 'master' into help	2011-08-18 22:20:43 -04:00
Mark DePristo	5fbdf968f7	ArgumentSource no longer comparable. Arguments sorted by GATKDoclet	2011-08-18 22:20:14 -04:00
Eric Banks	77fa2c1546	Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter).	2011-08-18 22:01:33 -04:00
Mark DePristo	1d3799ddf7	Merge branch 'master' into help	2011-08-18 22:00:29 -04:00
Mark DePristo	d1892cd0d7	Bug fixes -- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap -- Fixed broken exception throwing in the case where a file's type could not be determined	2011-08-18 21:58:36 -04:00
Mark DePristo	c5efb6f40e	Usability improvements to GATKDocs -- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan) -- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKs. There's now an advanced section. Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?	2011-08-18 21:39:11 -04:00
Mark DePristo	d94da0b1cf	Moved CG and SOAP codecs to private	2011-08-18 21:20:26 -04:00
Mark DePristo	f7414e39bc	Improvements to GATKDocs -- Allowed values for RodBinding<T> are displayed in the GATKDocs -- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio) -- Features are listed in alphabetical order -- Moved useful getParameterizedType() function to JVMUtils -- Tests of these features in the Documentation Test	2011-08-18 21:20:09 -04:00
Ryan Poplin	09d099cada	Added GATKDocs to the UnifiedGenotyper.	2011-08-18 20:57:02 -04:00
Mauricio Carneiro	6ef01e40b8	Complete rewrite of Hard Clipping (ReadClipper). Hard clipping is now completely independent from soft clipping and plows through previously hard or soft clipped reads.	2011-08-18 18:35:45 -04:00
Guillermo del Angel	626cbf9411	Bug fixes and cleanups for IndelStatistics	2011-08-18 16:28:40 -04:00
Guillermo del Angel	58560a6d50	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-18 16:17:52 -04:00
Guillermo del Angel	3dfb60a46e	Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases. VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before they were being conflated). VariantEval module IndelStatistics is considerably simplified as the sample stratification was wrong and redundant, now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful	2011-08-18 16:17:38 -04:00
Chris Hartl	6b256a8ac5	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git	2011-08-18 15:29:24 -04:00
Chris Hartl	a8935c99fc	Adding docs for DepthOfCoverage and ValidationAmplicons	2011-08-18 15:28:35 -04:00
Mark DePristo	f2f51e35e3	Merge branch 'master' into help	2011-08-18 14:05:33 -04:00
Mark DePristo	faa3f8b6f6	Only concrete classes are now documented	2011-08-18 14:04:47 -04:00
Ryan Poplin	7c4ce6d969	Added GATKDocs for the VQSR walkers.	2011-08-18 14:00:39 -04:00
Mark DePristo	5772766dd5	Improvements to GATKDocs -- Now supports a static list of root classes / interfaces that should receive docs. A complementary approach to documenting features to the DocumentedGATKFeature annotation -- Tribble codecs are now documented! -- No longer displayed sub and super classes	2011-08-18 14:00:09 -04:00
Mark DePristo	e03db30ca0	Now uses DocumentedGATKFeatureObject instead of annotation directly -- Step 1 on the way to creating a static list of additional classes that we want to document.	2011-08-18 12:31:04 -04:00
Mark DePristo	d4511807ed	Merge branch 'master' into help	2011-08-18 11:53:37 -04:00
Mark DePristo	c787fd0b70	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-18 11:52:45 -04:00
Mark DePristo	c797616c65	If you have one sample in your BAM, getToolkit().getSamples().size() == 2. Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.	2011-08-18 11:51:53 -04:00
Mark DePristo	cbec69a130	Merge branch 'master' into help Conflicts: public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java	2011-08-18 11:33:27 -04:00
Eric Banks	aa21fc7c9c	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-18 11:30:59 -04:00
Mark DePristo	f5d7cabb20	Fix for reintroducing an already solved problem.	2011-08-18 11:20:12 -04:00
Eric Banks	a45498150a	Remove non-ascii char	2011-08-18 11:18:29 -04:00
Ryan Poplin	c08a9964d4	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-18 10:58:04 -04:00
Ryan Poplin	bb79d3edae	Added GATKDocs for the BQSR walkers.	2011-08-18 10:57:48 -04:00
Mark DePristo	47bbddb724	Now provides type-specific user feedback. For RodBinding<VariantContext>, error messages now list only the Tribble types that produce VariantContexts	2011-08-18 10:47:16 -04:00
Mark DePristo	2d41ba15a4	Vastly better Tribble help message Here's a new example: ##### ERROR ------------------------------------------------------------------------------------------ ##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd): ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed ##### ERROR Please do not post this error to the GATK forum ##### ERROR ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments. ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa ##### ERROR ##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. 
Please add an explicit type tag :TYPE listing the correct type from among the supported types: ##### ERROR Name FeatureType Documentation ##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html ##### ERROR BED BEDFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html ##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html ##### ERROR CGVAR VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html ##### ERROR DBSNP DbSNPFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html ##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html ##### ERROR MAF MafFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html ##### ERROR MILLSDEVINE VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html ##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html ##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html ##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html ##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html ##### ERROR SNPEFF SnpEffFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html 
##### ERROR SOAPSNP VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html ##### ERROR TABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html ##### ERROR VCF VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html ##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html ##### ERROR ------------------------------------------------------------------------------------------	2011-08-18 10:31:32 -04:00
Mark DePristo	c2287c93d7	Cleanup of codec locations. No more dbSNPHelper -- refdata/features now in utils/codecs with the other codecs -- Deleted dbsnpHelper. rsID function now in VCFutils. Remaining code either deleted or put into VariantContextAdaptors -- Many associated import updates due to code move	2011-08-18 10:02:46 -04:00
Mark DePristo	9c17d54cb6	getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error	2011-08-18 09:39:20 -04:00
Mark DePristo	c30e1db744	Better location for help utils	2011-08-18 09:38:51 -04:00
Mark DePristo	4da42d9f39	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-18 09:32:57 -04:00
Eric Banks	c91a442be1	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 22:40:16 -04:00
Eric Banks	a7b70e6bb4	Adding feature for Khalid: ability to exclude particular samples.	2011-08-17 22:28:22 -04:00
Mauricio Carneiro	cc3df8f11a	Moving GAV walker to public. Walker is updated to the new RodBinding system and has the new GATKDocs layout.	2011-08-17 21:55:17 -04:00
Eric Banks	fa1db3913b	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 21:49:25 -04:00
Eric Banks	8e83b6646b	Bug fix for Chris: don't validate ref base for complex events.	2011-08-17 21:49:14 -04:00
Matt Hanna	c104dd7a09	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 16:59:12 -04:00
Matt Hanna	81a792afeb	Reverting optimization disable in unstable.	2011-08-17 16:58:24 -04:00
Mark DePristo	2e35592295	GATKDocs for CallableLoci	2011-08-17 16:32:01 -04:00
Guillermo del Angel	c193f52e5d	Fixed up examples: pasting from wiki still had old rod syntax	2011-08-17 16:29:45 -04:00
Matt Hanna	2b2a4e0795	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable	2011-08-17 16:26:45 -04:00
Matt Hanna	297c9e513c	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable	2011-08-17 16:24:02 -04:00
Matt Hanna	a210a62ab9	Merged bug fix from Stable into Unstable	2011-08-17 16:23:31 -04:00
Mark DePristo	d59e6ed274	Fix for RefSeqCodec bug and better error messages -- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class. Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking -- Better error messages that actually list the available tribble types, when there's a type error	2011-08-17 16:22:07 -04:00
Matt Hanna	d170187896	Disable optimization that increases marginal speed of the GATK slightly but can produce data loss in a narrow corner case where the BGZF block(s) locations and offsets in the last index bucket of contig n overlap exactly with the BGZF block locations and offset in the last index bucket of contig n+1. A proper fix that keeps the optimization has already been introduced into unstable, but disabling the optimization is a low risk way to make sure that users of stable experience no data loss.	2011-08-17 16:16:05 -04:00
David Roazen	53006da9a5	Improved descriptions for the SnpEff annotations in the VCF header (based on Eric's feedback).	2011-08-17 16:09:10 -04:00
Guillermo del Angel	784fb148b9	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 15:47:01 -04:00
Guillermo del Angel	671330950d	Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden	2011-08-17 15:46:31 -04:00
Andrey Sivachenko	0af68e052a	Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 15:17:47 -04:00
Andrey Sivachenko	a423546cdd	fix: RefSeq contains records with zero coding length and the RefSeq codec/feature used to crash on those; now such records are ignored, with a warning printed (once)	2011-08-17 15:17:31 -04:00
Andrey Sivachenko	710d34633e	now the reads that are too long are truly ignored (fix of the fix)	2011-08-17 15:16:23 -04:00
Eric Banks	2f19046f0c	Adding docs to the 2 beasts. Saved the worst for last.	2011-08-17 14:19:14 -04:00
Andrey Sivachenko	069554efe5	somatic indel detector does not die on reads that are too long (likely contain a huge deletion) anymore; instead print a warning and ignore the read	2011-08-17 14:05:19 -04:00
Eric Banks	c405a75f54	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 13:28:25 -04:00
Eric Banks	575303ae6b	Renaming for consistency and bringing up to speed with new rod system	2011-08-17 13:28:19 -04:00
Eric Banks	6d629c176c	Adding docs	2011-08-17 13:27:36 -04:00
Eric Banks	a21e193a9e	Adding docs to 3 more walkers	2011-08-17 12:35:08 -04:00
Menachem Fromer	98acb546a9	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-17 12:22:29 -04:00
Menachem Fromer	d1bb302d12	Added GatkDocs documentation	2011-08-17 12:21:37 -04:00
Mark DePristo	3da71a9bb6	Clean up summary	2011-08-17 12:04:45 -04:00
Mark DePristo	c6fb215faf	GATKDocs for VariantsToTable -- Made a previously required argument optional, as this was a long-standing bug	2011-08-17 12:02:41 -04:00
Mark DePristo	5f794d16a7	Fixed bad character in documentation	2011-08-17 12:01:08 -04:00
Mark DePristo	9d1d5bd27a	Revert "Fixed bad character in documentation" This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.	2011-08-17 11:57:31 -04:00
Mark DePristo	78deb3f195	Fixed bad character in documentation	2011-08-17 11:57:00 -04:00
Mark DePristo	79dcfca25f	Fixed bad character in documentation	2011-08-17 11:56:51 -04:00
Eric Banks	b3b5d608ca	Adding docs to yet more walkers	2011-08-17 09:57:19 -04:00
Eric Banks	fadcbf68fd	Adding docs to QC walkers	2011-08-17 09:39:33 -04:00
Mauricio Carneiro	5d6a6fab98	Renamed softUnclipped functions to refCoord*. These functions return reference coordinates, so they should be named accordingly.	2011-08-16 18:56:28 -04:00
Mauricio Carneiro	ed8f769dce	Fixed index for getSoftUnclippedEnd(). Unclipped end can be calculated simply by looking at the last cigar element and adding its length in case it's a soft clip.	2011-08-16 18:54:28 -04:00
Eric Banks	5f3f46aad1	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-16 16:26:33 -04:00
Eric Banks	946f5c53fe	Adding docs to more walkers	2011-08-16 16:26:26 -04:00
Mark DePristo	6e828260a0	Removed -B support. Now explodes with error if -B provided.	2011-08-16 16:13:47 -04:00
Ryan Poplin	2d5bbecd9e	Merged bug fix from Stable into Unstable	2011-08-16 14:19:04 -04:00
Mauricio Carneiro	07c1e113cd	Fixed interval traversal for previously hard clipped reads. If a read was hard clipped for being low quality and no longer overlaps the interval, this read will now be discarded instead of treated as an error by the GATK traversal engine.	2011-08-16 14:18:05 -04:00
Ryan Poplin	9d4add3268	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2011-08-16 14:18:03 -04:00
Ryan Poplin	170d1ff7b6	Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode	2011-08-16 14:17:46 -04:00
Mauricio Carneiro	b135565183	Added low quality clipping. Clips both tails of a read if the tails are below a given quality threshold (default Q2). * Added special treatment for reads that get completely clipped.	2011-08-16 13:51:25 -04:00
Andrey Sivachenko	9f3328db53	fixing read group name collision: before writing the read into respective stream in nway-out mode we now retrieve the original rg, not the merged/modified one	2011-08-16 13:45:40 -04:00
Eric Banks	ab0b56ed11	Minor doc fixes	2011-08-16 12:55:45 -04:00
Eric Banks	125ad0bcfa	Added docs to RTC	2011-08-16 12:46:48 -04:00
Eric Banks	ef9216011e	Added docs to IR	2011-08-16 12:24:53 -04:00
Eric Banks	ab1e3d6a98	Use the right set of sample names	2011-08-16 01:03:05 -04:00
Eric Banks	36c7f83208	Refactoring VE stratifications so that they don't pass around bulky data; instead just pull needed data from the VE parent. This allows us stop using deprecated features of the rod system.	2011-08-15 16:31:57 -04:00
Eric Banks	1246b89049	Forgot to initialize variants on the merge	2011-08-15 16:00:43 -04:00
Mauricio Carneiro	993ecb85da	Added Hard Clipping Tail Ends. Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)	2011-08-15 15:22:54 -04:00
Eric Banks	045e8a045e	Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker	2011-08-15 14:05:23 -04:00
Eric Banks	fc2c21433b	Updating random walkers to new rod system	2011-08-15 13:29:31 -04:00
Eric Banks	3d56bbf087	Resolving merge conflicts	2011-08-15 12:28:05 -04:00
Eric Banks	9ddbfdcb9f	Check filtered status before applying to alt reference	2011-08-15 12:25:23 -04:00
Mauricio Carneiro	0d976d6211	Fixed second time clipping. When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.	2011-08-15 12:04:53 -04:00
Mauricio Carneiro	489c15b99d	Fixed indexing issue in coordinate conversion. When a read had been previously soft clipped, the UnclippedEnd could not be used directly as Reference Coordinate for clipping, because the read does not go that far.	2011-08-15 01:42:34 -04:00
Mauricio Carneiro	c7b69a4574	Fixed integration tests	2011-08-14 16:38:20 -04:00
Mauricio Carneiro	6ae3f9e322	Wrapped clipping op information. The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim. (This is a quick commit to unbreak the build; performing integration tests and will make further commits if necessary.)	2011-08-14 15:44:48 -04:00
Mauricio Carneiro	8a51732049	Fixes to ReadClipper and added Reference Coordinate clipping. * Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels. * soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed. * hard clipping now works with previously hard clipped reads.	2011-08-14 14:54:33 -04:00
Mauricio Carneiro	291d8c7596	Fixed HardClipping and Interval containment * Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure for every mapped read. * Interval containment needed a <= and >= to make sure it caught the borders right.	2011-08-14 14:54:33 -04:00
Mauricio Carneiro	0be1dacddb	Refactored interval clipping utility. Reads are clipped in map() and now we cover almost all cases. Left behind the case where the read stretches through two intervals. This will need special treatment later.	2011-08-14 14:54:33 -04:00
David Roazen	bb4ced3201	SnpEff-related fixes. -To correctly handle indels and MNPs, only consider features that start at the current locus, rather than features that span the current locus, when selecting the most significant effect. -Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding any annotations and silently returning.	2011-08-12 15:26:24 -04:00
Mauricio Carneiro	10e873d9c6	Merge branch 'repval'	2011-08-12 15:24:31 -04:00
Guillermo del Angel	31dc831531	Merged bug fix from Stable into Unstable	2011-08-12 13:26:41 -04:00
Menachem Fromer	9121b8ed65	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-12 12:24:19 -04:00
Menachem Fromer	7ed120361d	Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles	2011-08-12 12:23:44 -04:00
Eric Banks	7ea9196321	Better error message for name/type clashes.	2011-08-12 11:18:14 -04:00
Eric Banks	27f0748b33	Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature).	2011-08-12 11:11:56 -04:00
Menachem Fromer	c7ca33cbff	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-12 10:12:09 -04:00
Eric Banks	41f3da75d7	Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests.	2011-08-12 02:22:44 -04:00
Eric Banks	eba316621d	Finish moving VE over to new rod system and fixing up the type inconsistency between eval and comp rods. Now the novel count is always 0 under the known stratification. :)	2011-08-12 00:40:08 -04:00
Menachem Fromer	9de06560df	Update to new RodBinding system	2011-08-11 17:54:16 -04:00
Eric Banks	90771b74b4	When matching eval to comps, try to choose the one with the same alt allele.	2011-08-11 13:55:01 -04:00
Eric Banks	200f73b008	No reason to warn the user anymore because it's no longer possible for them to specify a dbsnp file on the command-line.	2011-08-11 13:44:07 -04:00
Eric Banks	e93538cdf7	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-11 13:39:36 -04:00
Eric Banks	265c3d744b	Fixing VariantEval logic and having it use the new rod system.	2011-08-11 13:39:34 -04:00
Ryan Poplin	b705d9cf15	Oops, these VariantAnnotator input bindings aren't needed during the UG	2011-08-11 13:17:16 -04:00
Ryan Poplin	7fade88070	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-11 11:02:47 -04:00
Ryan Poplin	c7b9a9ef0a	Updating UnifiedGenotyper to use the new rod binding system.	2011-08-11 11:02:11 -04:00
Mark DePristo	418a4d541f	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-11 11:01:38 -04:00
Mark DePristo	e71255d3c2	GATKDocsExample walker -- Shows the best practice for documenting a walker with the GATKdocs -- See http://www.broadinstitute.org/gsa/wiki/index.php/GATKdocs#Writing_GATKdocs_for_your_walkers for a brief discussion	2011-08-11 11:01:21 -04:00
Ryan Poplin	79c86e211f	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-11 09:59:20 -04:00
Ryan Poplin	ea42ee4a95	Updating BQSR for the new rod binding system.	2011-08-11 09:58:42 -04:00
Mark DePristo	8cdc0cbd9c	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-11 08:58:49 -04:00
Mark DePristo	40e06f9afb	Fixed broken RodBinding defaults. -- Verified now to be correct at runtime -- UnitTest covers this -- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.	2011-08-11 08:58:30 -04:00
Ryan Poplin	dd5fe8291d	Fixing up some comments in the BQSR	2011-08-11 08:36:00 -04:00
Eric Banks	f1b09db39e	Fixes for rod bindings	2011-08-10 23:08:47 -04:00
Eric Banks	75985c2fa0	Resolving merge conflicts	2011-08-10 22:45:11 -04:00
Eric Banks	bdb1da30fd	Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system.	2011-08-10 22:43:08 -04:00
Mark DePristo	0086e27741	makeUnbound now package protected -- Removed references to it in the codebase -- Fixed documentation I saw that had the summary + body style	2011-08-10 22:29:32 -04:00
Mark DePristo	cb6cf25bb0	Updating SelectVariants documentation to reflect best practice	2011-08-10 22:24:18 -04:00
Mark DePristo	00b4d6ec57	Updated the best practice on documenting a field -- Best practice is now to skip the summary, as this is the @annotation doc value.	2011-08-10 22:21:12 -04:00
Mark DePristo	2007d2fcad	Better documentation for default value fields -- DocString function for types that create default outputs "stdout" -- RodBinding now creates a makeUnbound default value automatically for you if your RodBinding isn't required -- Removed warning about sparse help from TextFormattingUtils	2011-08-10 22:16:22 -04:00
Mauricio Carneiro	bb557266ca	Merge branches to get new RodBinding framework Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/replication_validation/ReplicationValidationWalker.java	2011-08-10 18:23:01 -04:00
Guillermo del Angel	8325cb8c26	Fixing up apparent source control/merge snafu: fix to correctly output PL ordering in multi-allelic sites by UG was only half-committed and hence not working. This completes fix	2011-08-10 15:31:49 -04:00
Eric Banks	07ad8c78a9	More tools moved over. Fixed the VariantContextIntegrationTest which was not useful because the md5s were all removed. In the future, instead of removing md5s (putting it in 'parameterization' mode), you should instead use @Test(enabled=false) since it's easier to track.	2011-08-10 14:24:40 -04:00
Eric Banks	8d14d32a62	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-10 13:42:37 -04:00
Eric Banks	749c8bfbcd	Moving more tools over to the new rod system	2011-08-10 13:42:35 -04:00
David Roazen	0497170bc9	SnpEffCodec now implements SelfScopingFeatureCodec so that we no longer have to specify the codec name on the command line for SnpEff files.	2011-08-10 13:12:09 -04:00
David Roazen	577f861f69	Pass the rodBindings into the VariantAnnotator engine, and from there to the annotation classes themselves.	2011-08-10 13:11:57 -04:00
David Roazen	480e7a7984	Correctly initialize the optional SnpEff rod binding in VariantAnnotator using RodBinding.makeUnbound()	2011-08-10 12:25:26 -04:00
Eric Banks	a42f90db11	Moving more tools over to use the standard VC arg collection. Also, while I'm in there, I removed all of the empty references to @Requires given that it's no longer relevant.	2011-08-10 12:20:18 -04:00
Eric Banks	c884b6bf1f	Fixed comment	2011-08-10 12:07:43 -04:00
Eric Banks	06cdc4d5f9	Added a StandardVariantContextInputArgumentCollection that is now used for consistency by many of the core tools.	2011-08-10 12:00:56 -04:00
Ryan Poplin	bc125f104a	TrainingSets class is obsolete now.	2011-08-10 10:23:33 -04:00
Ryan Poplin	c60cf52f73	Updating VQSR for new RodBinding syntax. Cleaning up indel specific parts of VQSR.	2011-08-10 10:20:37 -04:00
Eric Banks	1ea5ec276b	Minor cleanup	2011-08-09 23:28:59 -04:00
Eric Banks	bc2d4f554d	Bringing Indel Realigner up to speed with the new rod binding syntax; now use -known to specify the known indels track.	2011-08-09 23:21:17 -04:00
Eric Banks	b8f572b571	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-09 23:19:51 -04:00
Eric Banks	08631546c8	Partial commit for David so he can see what I want to do with the VariantAnnotator. Added a DbsnpArgumentCollection that people can use in their walkers to ensure that we have a standard syntax whenever allowing dbsnp rods. Added it to UG, but didn't hook it up. Maybe we should do the same for the 'variant' rod?	2011-08-09 23:19:40 -04:00
Mark DePristo	86afe878a7	ReducedRead optimization: single pass likelihood calculation -- Low level add() now takes a nObs argument and rather than += likelihood now does += nObs * likelihood	2011-08-09 20:55:15 -04:00
Eric Banks	489e5cffc1	Missed a few 'variants'	2011-08-09 14:29:15 -04:00
Eric Banks	b20c4d5286	Thanks to Mark for agreeing to transition from 'variants' back to 'variant'. I think I got them all but I've been jumping all around the code, so there might be a straggler or two.	2011-08-09 12:04:55 -04:00
Eric Banks	78aa6db076	added the 'reference' header line too. We are now header-compliant for vcf4.1.	2011-08-09 11:45:54 -04:00
Eric Banks	ec76bf6d4a	VCF headers now include 'contig' lines describing the name, length, and assembly (when easily parsable) for each contig in the reference.	2011-08-09 11:24:48 -04:00
Eric Banks	7afb5c9f1c	More updates to be consistent with the new rod syntax.	2011-08-09 10:11:37 -04:00
Eric Banks	70b3daf689	VariantsToVCF is up and running again; integration tests are reenabled (and added one for dbSNP).	2011-08-09 03:03:43 -04:00
Mauricio Carneiro	d15852be0a	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-09 00:04:59 -04:00
Mauricio Carneiro	2db6225c53	A read filter that sets all mapping qualities to a given value. Pacbio has decided to assign 255 to the MQ of all their reads since they claim their aligner does not produce a number equivalent to a mapping quality. Despite much back and forth, they are dead set on not using this field, so if we want to use their bams, we will need to override that. This filter does just that, replacing all values with a given one. Default is 60.	2011-08-09 00:04:42 -04:00
David Roazen	2efa376619	Made the necessary changes to get SnpEff support working with the new rodbinding system.	2011-08-08 23:29:39 -04:00
David Roazen	b180a1311a	Merge branch 'snpEff'	2011-08-08 22:12:14 -04:00
David Roazen	a13bc7b929	Added an integration test for the SnpEff annotation support, as well as some extra safety checks and comments.	2011-08-08 20:01:24 -04:00
Mark DePristo	80924d24de	Single positional arguments are now treated as names unless they actually match a tribble feature	2011-08-08 19:26:27 -04:00
Mark DePristo	f8a56bc64b	Merge branch 'master' into rodRefactor	2011-08-08 16:58:18 -04:00
Mark DePristo	f8ad91b16f	Reverting a bunch of bad -B type drops	2011-08-08 16:57:38 -04:00
David Roazen	5e288136e0	Added unit tests for the SnpEff codec, and made minor adjustments to the codec itself.	2011-08-08 16:51:43 -04:00
Eric Banks	d7813db217	Combine Variants was actually outputting invalid VCFs in cases where it was combining Variant Contexts with different alternate alleles: if any of the genotypes had PLs they were no longer valid/correct. Added a check for such cases (the combined VC has more alleles than an original VC) and strip out the PLs when triggered; added integration test to cover it. I also added the check to Select Variants, although it currently doesn't remove unused alleles so it should never trigger. Is there any reason not to strip out unused alleles after a select?	2011-08-08 16:25:35 -04:00
Mark DePristo	383bb6f0e0	Merge branch 'master' into rodRefactor	2011-08-08 15:25:55 -04:00
Mark DePristo	ba7353c561	Updated IntegrationTests to use the new type free format for VCF files	2011-08-08 15:04:38 -04:00
Mark DePristo	0810c42309	GATK now does dynamic type determination for VCF files Added UnitTests covering all of the cases.	2011-08-08 14:45:46 -04:00
Mark DePristo	e36994e36b	Refactored a FeatureManager class from RMDTrackBuilder New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK. Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.	2011-08-08 14:04:46 -04:00
Eric Banks	197169e47b	Submitting patch from Larry Singh to make MathUtils compatible with java 1.7	2011-08-08 13:34:04 -04:00
David Roazen	dd974040af	When finding the highest-impact effect at a locus, all effects that are not within a non-coding gene are now considered higher impact than all effects that are within a non-coding gene.	2011-08-08 13:29:54 -04:00
David Roazen	c1061e994c	Initial support for adding genomic annotations through VariantAnnotator using the output from the SnpEff tool, which replaces the old Genomic Annotator.	2011-08-08 13:29:53 -04:00
Mark DePristo	0db79207e8	Refactored out the dependency of CommandLineGATK on javadocs. This allows us to run the GATK again in environments without Javadoc loading by default in the classpath	2011-08-08 12:27:13 -04:00
Mark DePristo	e5fde0d16b	Merge branch 'master' into rodRefactor	2011-08-08 10:08:43 -04:00
Mark DePristo	526b524c3c	CombineVariants with new RodBinding. Bugfix -- CombineVariants now uses the new RodBinding syntax, -V / --variants. Passed all integration tests on first run -- Exposed gapping bug in the List<RodBinding<T>> system now fixed. ParserEngine now has a addRodBinding() that is called by RodBindingArgumentTypeDescriptor when it encounters each RodBinding. This allows the system to work with collection types that are recursively parsed by the system.	2011-08-07 20:16:51 -04:00
Ryan Poplin	6693407bd8	Merged bug fix from Stable into Unstable	2011-08-07 17:39:03 -04:00
Mark DePristo	5f8bc3aa8a	Documenting classes, and name cleanup	2011-08-07 15:17:50 -04:00
Mark DePristo	1c63d43176	Help now points to GATKDocs instead of spitting out full, garbled description	2011-08-07 15:02:46 -04:00
Mark DePristo	b0e91f85cf	fix merge from Khalid's Queue fix	2011-08-07 10:33:20 -04:00
Mark DePristo	4d88e72958	Merge remote-tracking branch 'remotes/khalid/rodRefactor' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/test/org/broadinstitute/sting/BaseTest.java	2011-08-07 10:32:27 -04:00
Khalid Shakir	f049461120	Changed @Argument to @Input on input RodBindings. Changed shortname collision with longname. Restored scala builds. Updated HSP to use new syntax.	2011-08-06 20:44:19 -04:00
Mark DePristo	d7f98e5c2a	Fixed merge conflict deleting a {	2011-08-04 18:48:34 -04:00
Mark DePristo	75632abf88	Merge branch 'master' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToVCF.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/RealignerTargetCreatorIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java	2011-08-04 18:44:14 -04:00
Mark DePristo	f21f7f6335	SelectVariants fully documented, now the shining example of the new RodBinding system.	2011-08-04 18:28:59 -04:00
Mark DePristo	9be1ee59cc	TODO comments for Eric	2011-08-04 18:07:50 -04:00
Mauricio Carneiro	b22a3d6508	Functional VCF output. It is outputting a VCF with the 'second best guess' for the alternate allele correctly. Annotations are added at the pool level, but may get overwritten at the lane and site level. Still need to implement the merging of the annotations at higher levels.	2011-08-04 17:49:08 -04:00
Guillermo del Angel	a8eb8c27f0	a) Minor changes to indel consensus scripts to better reflect good default values, b) Fixed up Mills/Devine codec so it always produces correct ref padded bases, and added option to VariantsToVCF to fix reference base	2011-08-04 15:34:49 -04:00
Ryan Poplin	98a96f07c1	Updated standard deviation parameter in VQSR to our current recommended value	2011-08-04 14:06:26 -04:00
Eric Banks	e48492f3c3	Validate that the reference padding base for indels is correct.	2011-08-04 12:48:56 -04:00
Mark DePristo	f0d798d47c	Bug fix: call RodBinding.resetNameCounter() in new ParsingEngine() so that we don't magically misnumber arguments in the integration tests where the GATK is only instantiated once.	2011-08-04 12:06:10 -04:00
Mark DePristo	d0279bb28c	RodBinding names are now defaulting to the ArgumentTypeDescriptor fullname Nearly all of the tools are passing integrationtests	2011-08-03 20:48:11 -04:00
Mark DePristo	0ef85647f7	A working version of a GATKReportDiffableReader for the diffEngine!	2011-08-03 18:21:18 -04:00
Mark DePristo	acbd3d0922	Fixing up integration tests so more	2011-08-03 17:26:35 -04:00
Mark DePristo	8f696c7731	Continuing progress towards RodBinding 1.0 -- Cleaning up old interface to RMDT, docs and contracts added -- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type	2011-08-03 17:19:28 -04:00
Mark DePristo	800bb97f0b	Removed getFeaturesAsGATKFeature and created createGenomeLoc(Feature) in genomeLocParser Updated all walkers that used the now deleted methods.	2011-08-03 16:04:51 -04:00
Mark DePristo	f6563c0f9f	Removed support for RMD in @Requires and @Allows Merge as well Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-03 15:36:55 -04:00
Mark DePristo	79e4a8f6d3	Merge Conflicts: private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-03 15:09:47 -04:00
Mark DePristo	38efd3066c	Bug fix for mask RodBinding	2011-08-03 14:58:18 -04:00
Eric Banks	f62f47d476	Not sure why this didn't fail before, but bringing VE up to date with previous changes	2011-08-03 14:27:07 -04:00
Mark DePristo	b25140db83	Contracts and documentation for some of RefMetaDataTracker Continuing to fix integration tests that don't pass / run	2011-08-03 13:34:20 -04:00
Eric Banks	f6648e0144	Don't left-align complex indels because it's too complicated.	2011-08-03 12:03:50 -04:00
Mark DePristo	85c67e9891	Contracts and documentation for Rodbinding	2011-08-03 11:16:06 -04:00
Eric Banks	5dc324ff35	Dealing with merge conflict	2011-08-03 11:03:47 -04:00
Eric Banks	7c89fe01b3	Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record.	2011-08-03 11:00:36 -04:00
Khalid Shakir	5dcac7b064	GATKReport v0.2: - Floating point column widths are measured correctly - Using fixed width columns instead of white space separated which allows spaces embedded in cell values - Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width - Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly Replaced GATKReportTableParser with existing functionality in GATKReport	2011-08-03 00:24:47 -04:00
Mark DePristo	2874835997	Bug fix for type checking RodBindings Now compares the feature class not the codec class. UnitTests improvements integrationtests on their way to actually running	2011-08-02 22:25:41 -04:00
Mark DePristo	b5e843f8f0	Approaching the end for the new RodBinding system -- support for explicit naming of bindings (-X:name,type x) -- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2) -- ParserEngineUnitTest expanded to cover all of the Rodbinding cases -- RodBindingUnitTest tests all of the low-level accessors -- Parsing engine throws UserExceptions when bad bindings are provided on the command line	2011-08-02 22:00:06 -04:00
David Roazen	d3437e62da	Added a simple utility method Utils.optimumHashSize() to calculate the optimum initial size for a Java hash table (HashMap, HashSet, etc.) given an expected maximum number of elements. The optimum size is the smallest size that's guaranteed not to result in any rehash / table-resize operations. Example Usage: Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements)); I think we're paying way too heavy a price in unnecessary rehash operations across the GATK. If you don't specify an initial size, you get a table of size 16 that gets completely rehashed and doubles in size every time it becomes 75% full. This means you do at least twice as much work as you need to in order to populate your table: (n + n/2 + n/4 + ... + 16) ~= (1 + 1/2 + 1/4 + ...) * n ~= 2 * n	2011-08-02 21:59:06 -04:00
Mark DePristo	83891271b5	--variants throughout integrationtests	2011-08-02 20:28:47 -04:00
Mark DePristo	3a27a25cfc	Validates that the tribble binding provides the right object types at startup Tests to ensure this remains working	2011-08-02 20:11:24 -04:00
Guillermo del Angel	df37716857	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 18:27:13 -04:00
Mark DePristo	e4a67f3df1	RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>> Including unit tests	2011-08-02 14:28:35 -04:00
Mark DePristo	03741fb640	Merge branch 'master' into rodRefactor Conflicts: public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java	2011-08-02 14:21:58 -04:00
Mark DePristo	a366f9a18d	Updating tools to use the RodBinding<T> syntax	2011-08-02 14:05:51 -04:00
Ryan Poplin	c0653514b3	minor update to comment in UG	2011-08-02 13:34:48 -04:00
Ryan Poplin	2ba57bb502	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-02 13:30:46 -04:00
Ryan Poplin	38e4ae4176	minor update to comment in UG	2011-08-02 13:30:38 -04:00
Guillermo del Angel	821bbfa9e0	Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup	2011-08-02 13:17:20 -04:00
Eric Banks	65c5d55b72	Not sure how I missed these. These lines are now superfluous.	2011-08-02 12:48:36 -04:00
Eric Banks	2c5e526eb7	Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed.	2011-08-02 10:34:46 -04:00
Eric Banks	5626199bb6	The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD	2011-08-02 10:14:21 -04:00
Mark DePristo	184030dd56	RefMetaDataTracker no longer automagically converts inputs to VariantContexts This was no longer working properly given that DBSNP indels needed to be moved around. The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.	2011-08-01 15:21:16 -04:00
Mark DePristo	8b1adb8c95	Removed getVariantContext() code	2011-08-01 13:41:09 -04:00
Eric Banks	3a9b6eacdf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-08-01 11:23:18 -04:00
Mark DePristo	7b07c4e04e	RefMetaDataTracker now has get() methods accepting RodBindings RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT. Updated programs to use new style Added UnitTests for the RodBinding accessors.	2011-07-30 15:34:11 -04:00
Mark DePristo	a6691ab2fd	List<RodBinding<T>> now working (sort of). At least the argument parsing system tolerates it.	2011-07-29 16:11:22 -04:00
Mark DePristo	6acb4aad3b	RodBinding<T> are properly generic now. VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.	2011-07-29 14:37:12 -04:00
Mark DePristo	3b799db61a	RefMetaDataTracker cleanup and unit tests You now have to provide an explicit list of RODRecordLists upfront to the constructor. RefMetaDataTracker is now immutable. Changes in engine to incorporate these differences Extensive UnitTests for RefMetaDataTracker now.	2011-07-29 13:23:17 -04:00
Ryan Poplin	b06deac9ea	Merged bug fix from Stable into Unstable	2011-07-29 10:02:36 -04:00
Ryan Poplin	c0d4110ffd	Correcting redundant warning text.	2011-07-29 10:01:11 -04:00
Mauricio Carneiro	a58ddab93b	minQual and minPower filters added. VCF output added. Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the gatk's PoolCaller, no longer Replication Validation framework. Lots of testing ensue....	2011-07-28 18:58:36 -04:00
Mark DePristo	39b4e76fde	Continuing refactoring of RefMetaDataTracker. On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types	2011-07-28 17:48:28 -04:00
Mark DePristo	7c5c656b46	Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval. So you get lines where the stratification is known but there are 10 novel sites!	2011-07-28 14:19:27 -04:00
Eric Banks	33b32c4211	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-28 13:57:22 -04:00
Eric Banks	7a2a65155f	Merged bug fix from Stable into Unstable	2011-07-28 13:56:43 -04:00
Eric Banks	1afc49a297	There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor.	2011-07-28 13:55:58 -04:00
Mark DePristo	f7a126722b	Cleaned up VariantContext accessors in RefMetaDataTracker It's no longer possible to provide allowed types, as this was a very rarely used feature in the engine. These get methods have been removed and local uses replaced with tests directly in their code. This simplified the RefMetaDataTracker significantly VariantContextRodBinding now forwards along all of the RefMetaDataTracker methods, so it is possible to create a full equivalent VariantContextRodBinding now as a walker field variable. All walkers updated to the new RefMetaDataTracker function call style	2011-07-28 00:16:34 -04:00
Mark DePristo	c83f9432eb	Cleaned up RefMetaDataTracker Renamed many functions to more clearly state what they are actually doing Removed unnecessary / unused functionality, reducing interface complexity Updated all uses of this code in GATK Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz) Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can for generic rods with the tracker directly with the RodBinding	2011-07-27 23:25:52 -04:00
Mark DePristo	f3ad4ec94b	Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not.	2011-07-27 22:06:23 -04:00
Eric Banks	ff31fa7990	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-27 16:15:23 -04:00
Eric Banks	5809a61b20	Merged bug fix from Stable into Unstable	2011-07-27 16:14:59 -04:00
Eric Banks	64aad67b5f	Fixing dbSNP adaptor for complex indels (wasn)	2011-07-27 16:13:45 -04:00
Mark DePristo	15be383d5b	Merge branch 'master' into rodRefactor	2011-07-27 15:36:49 -04:00
Mark DePristo	38a2518668	Merge branch 'master' into rodRefactor	2011-07-27 15:34:54 -04:00
Mark DePristo	60db6cc836	Warnings for old ROD system use. Removed unused class GATKRODFeature	2011-07-27 12:39:12 -04:00
Mark DePristo	097828a466	ParsingEngine now maintains the list of rodBindings No longer try to reparse objects to find the right fields Direct support in RodBinding for getTags()	2011-07-27 11:36:53 -04:00
Mauricio Carneiro	20a3b31b61	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 19:29:45 -04:00
Mauricio Carneiro	321afac4e8	Updates to the help layout. New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker. Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.	2011-07-26 19:29:25 -04:00
Kiran V Garimella	405e521d44	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-26 17:56:48 -04:00
Kiran V Garimella	412c466de6	Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement	2011-07-26 17:43:43 -04:00
Matt Hanna	fec495e292	Fix a nasty little bug in the sharding system: if the last shard in contig n overlaps exactly on disk with the first shard in contig n+1, the shards would be merged together to avoid duplicate extraction. Unfortunately, the interval overlap filter couldn't handle shards spanning contigs, and was choosing to filter out reads from contig n+1 which should have been included. I'm not completely sure why the BAM indexing code would ever specify that the end of one chromosome had the same on-disk location as the start of the next one. I suspect that this is an indexer performance bug.	2011-07-26 15:43:20 -04:00
Mark DePristo	9dfb57168a	RodBinding source is no longer assumed to be a file	2011-07-26 13:59:44 -04:00
Mark DePristo	d0badd5bd6	RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs	2011-07-26 13:54:55 -04:00
Mark DePristo	7ab8b53339	Support for List<RodBinding> argument type	2011-07-26 11:37:31 -04:00
Mark DePristo	38969b9783	Prototype of RODBinding @Arguments instead of -B syntax Initial version of RodBinding class. Flow from walker Rodbinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard). Will need refactoring.	2011-07-26 11:09:06 -04:00
Matt Hanna	088fc39308	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:54:56 -04:00
Eric Banks	a53aeb75ab	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 15:10:35 -04:00
Eric Banks	a29554e565	Removing the Genomic Annotator and its supporting classes	2011-07-25 15:10:25 -04:00
Mark DePristo	3afcb3415d	Max of 1000 records will be loaded and compared to avoid heap size problem.	2011-07-25 14:58:31 -04:00
Mark DePristo	f3049fba63	refdata directory cleanup Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest Refactored dbSNP and refseq utilities to be closer to the other files implementing these features	2011-07-25 13:21:52 -04:00
Matt Hanna	8014fad6ff	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 13:20:44 -04:00
Matt Hanna	2ac490dbdf	Fix improper detection of command-line arguments with missing values.	2011-07-25 13:20:00 -04:00
Mark DePristo	90947ab359	Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 12:53:56 -04:00
Mark DePristo	acda8eb09c	Commented out test that causes new CommandLineGATK() to fail	2011-07-25 12:43:27 -04:00
Mauricio Carneiro	95b48eface	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into repval	2011-07-25 12:09:09 -04:00
Kiran V Garimella	357f503a21	Merge branch 'desktop'	2011-07-25 11:36:27 -04:00
Kiran V Garimella	0b43ee117c	Added the required=false tag to the -noST and -noEV arguments so the auto-help output doesn't look weird (i.e. listing arguments as required when their value has already been specified by default).	2011-07-25 11:35:34 -04:00
Kiran V Garimella	bbb8473f03	Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-07-25 10:59:00 -04:00
Mark DePristo	1a268ff1fd	Refactor so that GenotypeAnnotation and InfoFieldAnnotation share common superclass VariantAnnotatorAnnotation	2011-07-25 10:55:09 -04:00
Mark DePristo	7f8e6a97ee	InfoFieldAnnotation now an abstract class extended by annotations so doc system works	2011-07-25 10:47:11 -04:00
Mauricio Carneiro	4c6c16f895	Documented following the new gatkdoc framework	2011-07-25 00:25:08 -04:00
Mark DePristo	2039ce6102	Default values now displayed in arguments DiffEngine fixed so that newInstance() would work. Pretty quickly encountered a situation where newInstance() failed. Debug output now written when this occurs in the log. Logger now used instead of standard out, with INFO the default level.	2011-07-24 22:56:55 -04:00
Mark DePristo	c43b5981f2	Hidden variables are hidden by default. Settable by command line option DiffObjectsWalker test arguments removed. Minor refactoring of GATKDoclet	2011-07-24 20:52:44 -04:00
Mark DePristo	1c1f1da349	Fixing compilation	2011-07-24 20:01:59 -04:00
Mark DePristo	9f06f6c493	Split GATKDoclet from ResourceBundleDoclet. Refactored GATKDocWorkUnit	2011-07-24 20:00:04 -04:00
Mark DePristo	ff85687679	Merge branch 'master' into help	2011-07-24 18:14:32 -04:00
Mark DePristo	83996f7951	Enumerated types are working.	2011-07-24 18:14:21 -04:00
Mark DePristo	3c34e9fa65	Cleanup enums and tables	2011-07-24 17:45:58 -04:00
Mark DePristo	c620d96c96	Inline enum documentation is working	2011-07-24 17:22:14 -04:00

2513 Commits (ee2f12e2ac5c4e04d7e99135ee17f4faf4d731be)