gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	d5fce22d78	Disable HaplotypeCaller integration tests in Stable. These tests use out-of-date files that no longer exist, and only need to be enabled in Unstable for now.	2012-02-13 16:28:19 -05:00
David Roazen	03e5184741	Fix serious engine bug that could cause reads to be dropped under certain circumstances When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine file spans when it can. Unfortunately, the method that combines two BAM file spans was seriously flawed, and would produce a truncated union if the file spans overlapped in certain ways. This could cause entire regions of the BAM file containing reads within the requested intervals to be dropped. Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests to verify that the correct union is produced regardless of how the file spans happen to overlap. Thanks to Khalid, who did at least as much work on this bug as I did.	2012-02-13 16:25:21 -05:00
Khalid Shakir	23e7f1bed9	When an interval list specifies overlapping intervals merge them before scattering.	2012-02-08 02:12:16 -05:00
Guillermo del Angel	6ec686b877	Complement to previous commit: make sure we also don't inherit filter from input VCF when genotyping at an empty site	2012-02-06 13:19:26 -05:00
Guillermo del Angel	827be878b4	Bug fix when running UG in GenotypeGivenAlleles mode: if an input site to genotype had no coverage, the output VCF had AC, AF and AN inherited from the input VCF, which could have nothing to do with the given BAM, so the numbers could be nonsensical. Now the new vc has clear attributes instead of attributes inherited from the input VCF.	2012-02-06 11:58:13 -05:00
Guillermo del Angel	090d87b48b	Bug fix in ValidationSiteSelector: when input vcf had genotypes and was multiallelic, the parsing of the AF/AC fields was wrong. Better logic to unify parsing of field	2012-02-06 10:33:12 -05:00
Matt Hanna	30b937d2af	Fix bug discovered in FGTP branch in which BlockInputStream returns -1 in cases where some data could be read, but not all the data requested by the caller.	2012-02-01 16:06:22 -05:00
Menachem Fromer	a9671b73ca	Fix to permit proper handling of mapping qualities from 128 to 255 (which get converted to byte values of -128 to -1)	2012-01-27 16:01:30 -05:00
David Roazen	b07fdb1089	Rename alltests* targets in build.xml "ant alltests" is now "ant committests" "ant alltests.public" is now "ant committests.public" "ant alltests.gatk.packagejar" is now "ant releasetests.gatk.packagejar" "ant alltests.queue.packagejar" is now "ant releasetests.queue.packagejar" This is going into both Stable + Unstable so that all Bamboo plans can be properly updated at the same time.	2012-01-24 14:58:30 -05:00
Mark DePristo	80a4ce0edf	Bugfix for incorrect error messages for missing BAMs and VCFs -- Missing BAMs were appearing as StingExceptions -- Missing VCFs were showing up as CommandLineErrors, but it's clearer for them to be CouldNotReadInputFile exceptions -- Added integration tests to ensure missing BAMs, VCFs, and -L files are properly thrown as CouldNotReadInputFile exceptions -- Added path to standard b37 BAM to BaseTest -- Cleaned up code in SAMDataSource, removing my parallel loading code as this just didn't prove to be useful.	2012-01-23 09:52:07 -05:00
David Roazen	d5199db8ec	Be explicit about setting the snpEff -onlyCoding option in the pipeline When run without an explicit -onlyCoding option, as we've been doing up to now, snpEff automatically sets -onlyCoding to "true" provided that there is at least one transcript marked as "protein_coding", which will always be the case for us in practice (and indeed, all pipeline runs so far with snpEff 2.0.5 have run with -onlyCoding auto-set to "true"). However, given the disastrous effect on annotation quality setting "-onlyCoding false" has, we wish to be explicit with this option rather than relying on snpEff's auto-detection logic.	2012-01-17 20:04:27 -05:00
Matt Hanna	3ba918aff1	Error message cleanup in BAM indexing code.	2012-01-17 11:05:42 -05:00
Matt Hanna	cd43f016ce	Fixed NPE in getNextOverlappingBAMScheduleEntry() when mixed mapped/unmapped interval lists are used. Added integrationtest to verify behavior.	2012-01-12 13:29:11 -05:00
Mark DePristo	2e47336a81	Only print out error report for most recent release in runGATKReport.py	2012-01-11 08:54:46 -05:00
Khalid Shakir	ef50e77ee2	When running Queue jobs locally, merge the stderr to the stdout log if the error file is NOT specified. Updated VE strats in the HSP for plotting Ka/Ks by AC.	2012-01-10 16:10:25 -05:00
Matt Hanna	dc60757b68	Eliminate unnecessary strong references (and therefore memory held) by tree reduce entries that have already been processed. Thanks to Tim Fennell for the bug report.	2012-01-09 23:04:53 -05:00
Mark DePristo	845c0b1c66	Merge branch 'master' of ssh://depristo@gsa1/humgen/gsa-scr1/gsa-engineering/git/stable	2012-01-09 08:40:59 -05:00
Mark DePristo	f5add25c72	Improved formatting of queueStatus	2012-01-09 08:40:53 -05:00
Matt Hanna	1f1233b669	Fix for a rare but insidious bug in position tracking during async BAM file reading. Thanks to Khalid for spotting and reporting the issue.	2012-01-08 22:03:35 -05:00
Mark DePristo	63b7a70c44	Removing very costly analyses of all GATK versions. Will be replaced by Tableau website	2012-01-06 18:13:19 -05:00
Mark DePristo	c96fee477c	Bug fix for VariantSummary -- Call sets with indels > 50 bp in length are tagged as CNVs in the tag (following the 1000 Genomes convention) and were unconditionally checking whether the CNV is already known, by looking at the known cnvs file, which is optional. Fixed. Has the annoying side effect that indels > 50bp in size are not counted as indels, and so are subtracted from both the novel and known counts for indels. C'est la vie -- Added integration test to check for this case, using Mauricio's most recent VCF file for NA12878 which has many large indels. Using this more recent and representative file is probably a good idea for more future tests in VE and other tools. File is NA12878.HiSeq.WGS.b37_decoy.indel.recalibrated.vcf in Validation_Data	2012-01-05 21:51:06 -05:00
Eric Banks	18ed954741	Compute Ti/Tv only if bi-allelic	2012-01-05 15:33:26 -05:00
Khalid Shakir	253a07fdb1	Implicits conversion issue/bug: QScript String<==>File shortcuts at compile time do not make String.equals(File) at runtime.	2012-01-03 18:43:45 -05:00
Mauricio Carneiro	9b55505c03	Fixing PairHMMIndelErrorModel array out of bounds. This error was due to the ReadClipper change of contract. Previously the read utils would return null if a read was entirely clipped; now they return an empty (safe) GATKSAMRecord.	2012-01-03 18:08:46 -05:00
David Roazen	ea6e718cb8	SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline. For now, we recommend only running with the GRCh37.64 database.	2012-01-03 15:18:36 -05:00
David Roazen	f3f01da1af	Enforce serial dependencies in RecalibrationWalkersIntegrationTest. Some tests in this class were intermittently not being executed due to being randomly scheduled before tests whose results they depend on. Now the serial dependencies are enforced to avoid problematic orderings.	2012-01-03 10:42:41 -05:00
Mauricio Carneiro	1b6d52817e	fixing adaptor clipping effect on recalibration integration test	2012-01-01 22:20:06 -05:00
Eric Banks	b0d68eb0e3	Merge remote-tracking branch 'unstable/master'	2011-12-31 20:26:44 -05:00
Mauricio Carneiro	55cfa76cf3	Updated integration tests for the new adaptor clipping fix.	2011-12-30 18:47:14 -05:00
Mauricio Carneiro	c7d0a9ebee	Forgot to test for inter-chromosomal mates in the adaptor clipping * Fixing bug caught by Eric (and Kristian)	2011-12-30 00:19:53 -05:00
Matt Hanna	a259bfefd4	First commit addressing problems running RTC in parallel. Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is released.	2011-12-29 16:22:14 -05:00
Matt Hanna	e6e80e8d3f	Update Picard to fix a bug Mauricio found in Picard where Picard unnecessarily depends on Snappy during some usages of SortingCollection.	2011-12-29 14:35:02 -05:00
Roger Zurawicki	efe33a0a1b	BUG FIX: Output is correct. The output would show zero coverage because the pileup was filtered using the wrong method. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 23:05:43 -05:00
Roger Zurawicki	5672688a73	Optimized CoverageByRG and Added GCContent - CoverageByRG now uses a hashmap for its value instead of a list. It runs about 4 times faster. - Cleaned up some of the code - CoverageByRG now calculates GCContent Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 15:25:07 -05:00
Roger Zurawicki	0c05998c4c	Added CoverageByRG LocusWalker. Will take any number of input bams and intervals. Returns a ReportTable with Average Coverage of each Read Group per Interval. Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2011-12-28 15:25:07 -05:00
Mauricio Carneiro	f692911903	GATKSAMRecord emptyRead static constructor * Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking. * All ReadClipper utilities now emit empty reads for fully clipped reads	2011-12-27 17:01:17 -05:00
Mauricio Carneiro	8259c748f2	No more Filtered Reads tag. All synthetic reads are marked with the reduced read tag.	2011-12-27 17:01:17 -05:00
Ryan Poplin	ef31b2f0a7	fixing merge conflicts.	2011-12-27 14:26:36 -05:00
Ryan Poplin	4f09a95221	Updating HaplotypeCaller for the new contracts in the adapter clipping.	2011-12-27 14:25:03 -05:00
Mauricio Carneiro	17bfe48d5e	Made all class methods private in the ReadClipper * ReadClipperUnitTest now uses static methods * Haplotype caller now uses static methods * Exon Junction Genotyper now uses static methods	2011-12-27 02:11:32 -05:00
Mauricio Carneiro	ce493bf257	Added adaptor clipping to ReduceReads * made all clipping steps optional with arguments.	2011-12-27 01:19:06 -05:00
Mauricio Carneiro	f7a5752025	Let this one slip through my commits.	2011-12-26 21:55:02 -05:00
Mauricio Carneiro	c1eaf7cf81	ReduceReads now allows different context sizes for different events * Rename contextSize to contextSizeMismatches * Indel context size is now different from mismatches context size	2011-12-26 21:17:29 -05:00
Mauricio Carneiro	4633637af6	Moved ReduceReads to static ReadClipper * all clipping done in ReduceReads is done using the static methods of the ReadClipper now.	2011-12-26 21:14:40 -05:00
Mauricio Carneiro	9aa1c0c6e5	Better documentation and contracts for ReduceReads * added javadoc to all methods * added GATKDocs style documentation to the ReduceReadsWalker * revised contracts and made explicit in the documentation	2011-12-26 21:12:23 -05:00
Mauricio Carneiro	3051cdf9c5	fixed reduced reads integration tests	2011-12-26 21:12:22 -05:00
Mauricio Carneiro	256a7d8bd2	fixing the arguments for RRead script	2011-12-26 21:12:22 -05:00
Eric Banks	dd990061f6	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2011-12-26 14:45:35 -05:00
Eric Banks	2130b39f33	Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt.	2011-12-26 14:45:19 -05:00
Mauricio Carneiro	02495a5fd5	renaming script, once more	2011-12-23 20:01:25 -05:00
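
Commit a9671b73ca mentions that mapping qualities from 128 to 255 turn into byte values of -128 to -1. The sketch below illustrates that effect in plain Java and one common way to undo it, masking with 0xFF. The class and method names are hypothetical and are not taken from the GATK source; the actual fix in that commit may differ.

```java
// Hypothetical helper, not GATK code: shows why a mapping quality stored in a
// signed Java byte has to be masked before it is used as a number.
final class MappingQualityBytes {
    private MappingQualityBytes() {}

    /** Narrow a mapping quality in 0..255 to a byte; values 128..255 wrap to -128..-1. */
    static byte toByte(int mappingQuality) {
        if (mappingQuality < 0 || mappingQuality > 255) {
            throw new IllegalArgumentException("mapping quality out of range: " + mappingQuality);
        }
        return (byte) mappingQuality;
    }

    /** Recover the original 0..255 value by masking off the sign extension. */
    static int toUnsigned(byte stored) {
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        for (int mq : new int[] {0, 127, 128, 255}) {
            byte stored = toByte(mq);
            System.out.println(mq + " -> byte " + stored + " -> " + toUnsigned(stored));
        }
    }
}
```

Any comparison such as "quality >= threshold" on the raw byte would treat 255 as -1, which is why the unsigned view is needed before thresholds are applied.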
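
Commit 03e5184741 describes a union of BAM file spans that came out truncated when the spans overlapped in certain ways, and commit 23e7f1bed9 merges overlapping intervals before scattering. The sketch below shows a generic sort-and-merge union over half-open ranges that handles nested and partially overlapping inputs. It is an illustration under assumptions: the `long[] {start, end}` representation and the `SpanUnion` class are hypothetical stand-ins, not the real `GATKBAMFileSpan.union()` implementation, and the commit does not say which overlap pattern triggered the original bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a file-span union; ranges are half-open [start, end).
final class SpanUnion {
    private SpanUnion() {}

    /** Union of two range lists, returned sorted by start and non-overlapping. */
    static List<long[]> union(List<long[]> a, List<long[]> b) {
        List<long[]> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparingLong((long[] r) -> r[0]));

        List<long[]> merged = new ArrayList<>();
        for (long[] r : all) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                // Overlapping or adjacent: keep the larger end so that a range
                // nested inside an earlier one cannot shrink the result.
                last[1] = Math.max(last[1], r[1]);
            } else {
                // Copy so the caller's arrays are never modified.
                merged.add(new long[] {r[0], r[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<long[]> a = List.of(new long[] {0, 100});
        List<long[]> b = List.of(new long[] {10, 20}, new long[] {90, 150}, new long[] {200, 250});
        for (long[] r : union(a, b)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Taking the maximum of the two ends, rather than the end of whichever range came last, is the step that keeps a nested range such as [10, 20) from cutting the merged range short.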
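
Commit 30b937d2af fixes a stream that returned -1 when some, but not all, of the requested data could be read. The sketch below shows the behavior that the standard `java.io.InputStream.read(byte[], int, int)` contract calls for: a short count when fewer bytes are available, and -1 only when nothing can be read because the stream has ended. `PartialReadStream` is a hypothetical in-memory class written for this illustration; it is not GATK's BlockInputStream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical in-memory stream, not GATK's BlockInputStream.
final class PartialReadStream extends InputStream {
    private final byte[] data;
    private int position = 0;

    PartialReadStream(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public int read() {
        return position < data.length ? (data[position++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        int remaining = data.length - position;
        if (remaining <= 0) {
            return -1; // end of stream: nothing at all could be read
        }
        // Fewer bytes available than requested: return the short count, not -1.
        int count = Math.min(length, remaining);
        System.arraycopy(data, position, buffer, offset, count);
        position += count;
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new PartialReadStream(new byte[] {1, 2, 3})) {
            byte[] buffer = new byte[8];
            System.out.println(in.read(buffer, 0, 8)); // short read of the 3 available bytes
            System.out.println(in.read(buffer, 0, 8)); // end of stream
        }
    }
}
```

Callers that need exactly `length` bytes are expected to loop until the count is reached or -1 is returned, which only works if partial reads report their true size.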

8504 Commits (d5fce22d78db9638b2ebda858f1ca8a67f928e33)