As reported by Menachem Fromer: a critical bug in AFCalcResult:
Specifically, the implementation:
public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) {
    return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef;
}
seems incorrect and should probably be:
getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef
The issue here is that the threshold (e.g., 30) represents a Phred-scaled probability of *error*, yet it's currently being compared to a log10 probability of *non-error*.
Instead, we need to require that our probability of error be less than the error threshold.
This bug has only a minor impact on the calls -- hardly any sites change -- which is good. But the inverted logic affects multi-allelic sites significantly. You only hit this code path with multiple alt alleles, and in that case it's including extra alt alleles incorrectly and throwing out good ones.
Change was to create a new function that properly handles thresholds that are PhredScaled quality scores:
/**
 * Same as #isPolymorphic but takes a Phred-scaled quality score as input
 */
public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) {
    if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 ");
    final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual));
    return isPolymorphic(allele, log10Threshold);
}
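To make the conversion concrete, here's a minimal standalone sketch (not the GATK class itself; qualToProb is re-implemented inline, on the assumption that it returns 1 - 10^(-Q/10), i.e. the probability of being correct):

public class PhredThresholdSketch {
    // P(error) for a Phred-scaled quality Q is 10^(-Q/10); qualToProb is assumed to return 1 - P(error)
    static double qualToProb(final double qual) {
        return 1.0 - Math.pow(10.0, -qual / 10.0);
    }

    public static void main(String[] args) {
        final double qual = 30.0;                                    // e.g., a Q30 threshold
        final double pError = Math.pow(10.0, -qual / 10.0);          // 0.001
        final double log10Threshold = Math.log10(qualToProb(qual));  // log10(0.999) ~= -4.34e-4
        System.out.printf("Q%.0f: P(error)=%.4f, log10 threshold=%.6f%n", qual, pError, log10Threshold);
        // A site is polymorphic iff log10 P(AF > 0) >= log10Threshold,
        // i.e. iff its posterior probability of being non-ref exceeds 1 - P(error).
    }
}

So for a Q30 threshold, the method above ends up requiring log10 P(AF > 0) >= -0.000434, which is the same as requiring P(error) <= 0.001 -- the corrected semantics described above.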
-- Multi-allelic variants are split into their bi-allelic versions, trimmed, and we attempt to provide a meaningful genotype for NA12878 here. It's not perfect and needs some discussion on how to handle het/alt variants
-- Adding a splitInBiallelic function to VariantContextUtils, as well as extensive unit tests that also indirectly test reverseTrimAlleles (which worked perfectly, FYI)
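For illustration only, here's a hypothetical sketch of the splitting idea using modern htsjdk types; the actual splitInBiallelic in VariantContextUtils also handles genotypes and trimming, which this sketch deliberately omits since het/alt genotype handling is exactly the open question:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.variantcontext.VariantContextBuilder;

public final class BiallelicSplitSketch {
    public static List<VariantContext> splitInBiallelic(final VariantContext vc) {
        final List<VariantContext> biallelics = new ArrayList<>();
        if (vc.isBiallelic()) {              // nothing to split for 0 or 1 alt allele
            biallelics.add(vc);
            return biallelics;
        }
        for (final Allele alt : vc.getAlternateAlleles()) {
            // One ref + one alt per output record; genotypes dropped in this sketch
            final VariantContext bi = new VariantContextBuilder(vc)
                    .alleles(Arrays.asList(vc.getReference(), alt))
                    .noGenotypes()
                    .make();
            biallelics.add(bi);              // the real code would also reverse-trim alleles here
        }
        return biallelics;
    }
}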
-- Idea is simply to create a persistent database of all TP/FP sites on chr20 in NA12878. Individual callsets can be imported, and a consensus algorithm is run over all callsets in the database to create a consensus collection, which can be used to assess NA12878 callsets for GATK and methods development
-- Framework for representing simple VariantContexts and Genotypes in MongoDB, querying for records, and iterating over them in the GATK (a sketch of the storage shape follows this list)
-- Not hooked up to Tribble, but could be done reasonably easily now (future TODO)
-- Tools to import callsets, create consensus callsets, import and export reviews
-- Scripts to reset the knowledge base and repopulate it with the standard data files (Eric will expand)
-- Actually scales to all of chr20, includes AssessNA12878 that reads a VCF and itemizes it against the truth data set
-- ImportCallset can load OMNI, HM3, CEU best practices, and Mills/Devine sites and genotypes, properly marking sites as poly/mono/unk as well as TP/FP/UNK based on command line parameters
-- Added shell scripts that start up a local MongoDB and connect to a local or BI-hosted Mongo for NA12878.db for debugging, as well as a setupNA12878db script that can load OMNI, HM3, CEU best practices, and Mills/Devine into the db and then update the consensus.
-- Reviewed sites can be exported to a VCF, and imported again, as a mechanism to safely store the only non-recoverable data from the Mongo DB.
-- Created a NA12878DBWalker that manages the outer DB interaction and that all MongoDB-interacting walkers inherit from. Added a NA12878DBArgumentCollection.java consolidating all of the common command line arguments (though strictly not necessary, as all of this occurs in the root walker)
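As a rough illustration of the storage shape, here's a hedged sketch using the current MongoDB Java driver; the database/collection names and field layout below are assumptions for illustration, not the actual MongoVariantContext schema:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;

public final class MongoVariantSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            final MongoCollection<Document> sites =
                    client.getDatabase("NA12878kb").getCollection("sites");

            // One simple VariantContext-like record: location, alleles, statuses
            sites.insertOne(new Document("chr", "20")
                    .append("start", 10_000_000)
                    .append("ref", "A")
                    .append("alt", "G")
                    .append("truthStatus", "TRUE_POSITIVE")
                    .append("polymorphicStatus", "POLYMORPHIC")
                    .append("callset", "OMNI"));

            // Query for all records on chr20 and iterate over them
            for (final Document d : sites.find(eq("chr", "20"))) {
                System.out.println(d.toJson());
            }
        }
    }
}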
UnitTests
-- Can connect to a test knowledge base for development and unit testing
-- PolymorphicStatus, TruthStatus, SiteIterator
-- NA12878KBUnitTestBase provides simple utilities for connecting to the test Mongo DB, getting calls, etc.
-- MongoVariantContext tests creation, matching, and the encoding -> writing -> reading -> decoding roundtrip against the MongoDB
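A hypothetical sketch of that roundtrip test shape in TestNG (the encode helper and field names are illustrative; the real test writes to and reads from the actual MongoDB instead of going through JSON):

import org.bson.Document;
import org.testng.Assert;
import org.testng.annotations.Test;

public class MongoVariantRoundtripSketchTest {
    // Illustrative encoder: flatten a call into a BSON document
    private static Document encode(final String chr, final int start, final String ref, final String alt) {
        return new Document("chr", chr).append("start", start).append("ref", ref).append("alt", alt);
    }

    @Test
    public void testEncodeDecodeRoundtrip() {
        final Document encoded = encode("20", 123456, "A", "C");
        // Stand-in for the write -> read hop through MongoDB
        final Document decoded = Document.parse(encoded.toJson());
        Assert.assertEquals(decoded.getString("chr"), "20");
        Assert.assertEquals(decoded.getInteger("start").intValue(), 123456);
        Assert.assertEquals(decoded.getString("ref"), "A");
        Assert.assertEquals(decoded.getString("alt"), "C");
    }
}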
AssessNA12878
-- Generic tool for comparing a NA12878 callset against the knowledge base. See http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for detailed documentation
-- Performs trivial filtering on FS, MQ, QD for SNPs and non-SNPs to separate out variants likely to be filtered from those that are honest-to-goodness FPs
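A sketch of what that trivial filtering might look like; the cutoffs below are the familiar GATK hard-filter defaults and are assumptions here, not necessarily the tool's actual values:

public final class TrivialFilterSketch {
    public static boolean likelyFiltered(final boolean isSNP, final double fs, final double mq, final double qd) {
        if (isSNP) {
            return fs > 60.0 || mq < 40.0 || qd < 2.0;   // SNP cutoffs
        } else {
            return fs > 200.0 || qd < 2.0;               // non-SNP (indel) cutoffs; MQ not used
        }
    }

    public static void main(String[] args) {
        System.out.println(likelyFiltered(true, 75.0, 59.0, 10.0));  // true: high strand bias (FS)
        System.out.println(likelyFiltered(true, 5.0, 60.0, 15.0));   // false: passes all cutoffs
    }
}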
Misc
-- Ability to provide a Description for the simplified GATK report
ReduceReads now co-reduces BAMs if they're passed in together with multiple -I. Co-reduction forces every variant region in one sample to be a variant region in all samples.
Also:
* Added integration test for co-reduction
* Fixed a bug in the new no-recalculation implementation of the marksites object where the last object wasn't being removed after finalizing a variant region (updated MD5s accordingly)
DEV-200 #resolve #time 8m
Removed some generics from PluginManager for now, until we can figure out the syntax for requesting an explicit subclass.
QStatusMessenger uses a slightly more primitive Map[String, Seq[RemoteFile]] instead of Map[ArgumentSource, Seq[RemoteFile]].
Added a QCommandPlugin.initScript utility method for handling specialized script types.
Numbers larger than 999 in the Errors column were printed out with commas (which looks like a separate column in a CSV).
This wasn't caught earlier because there are no integration tests covering the CSV output; I'll add one into unstable shortly.
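A minimal reproduction of the pitfall (illustrative only; the actual report code path may differ):

import java.util.Locale;

public final class CsvCommaSketch {
    public static void main(String[] args) {
        final int errors = 12345;
        // Grouped output: "sample1,12,345,PASS" -- the comma reads as an extra column in CSV
        System.out.println(String.format(Locale.US, "sample1,%,d,PASS", errors));
        // Plain formatting keeps the CSV intact: "sample1,12345,PASS"
        System.out.println(String.format(Locale.US, "sample1,%d,PASS", errors));
    }
}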
-- The new Tribble library now uses 64-bit sizes. The 26K VCF has so much data that low-level Tribble block indices were overflowing their int-sized values (a minimal illustration of the overflow follows below). This includes a to-be-committed Tribble jar that fixes this problem
-- See https://jira.broadinstitute.org/browse/GSA-652
-- Minor cleanup of error messages that were useful on the way to solving this monster problem
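Here's the promised minimal illustration of the overflow class (illustrative numbers, not Tribble's actual index layout):

public final class IndexOverflowSketch {
    public static void main(String[] args) {
        final int blockSize = 1 << 20;               // 1 MB blocks
        int intOffset = 0;
        long longOffset = 0L;
        for (int block = 0; block < 3000; block++) { // ~3 GB of data
            intOffset += blockSize;                  // wraps negative past 2^31-1 (~2 GB)
            longOffset += blockSize;                 // 64-bit arithmetic stays correct
        }
        System.out.println("int offset:  " + intOffset);   // negative garbage
        System.out.println("long offset: " + longOffset);  // 3145728000
    }
}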
-- Closes GSA-494 / Add maximum runtime for integration tests, running them in a timeout thread
-- Needed to debug locking issues
-- Needed to debug excessively long-running integration tests
-- Added a build.xml maximum runtime of 10 hours for all TestNG tests. We will ultimately fail the build if it runs for more than 10 hours
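As a hedged sketch of the same idea at the per-test level, TestNG's timeOut attribute fails any test that exceeds its budget; the actual change wires the 10-hour cap into build.xml rather than annotating each test:

import org.testng.annotations.Test;

public class TimeoutSketchTest {
    @Test(timeOut = 10 * 60 * 60 * 1000L)  // 10 hours in milliseconds
    public void testFinishesWithinBudget() throws InterruptedException {
        Thread.sleep(10);                   // stand-in for real work
    }
}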
-- The logic for determining active regions was a bit broken in the HC when intervals were used
-- TraverseActiveRegions now uses the AllLocus view, since we always want to see all reference sites, not just those covered. Simplifies logic of TAR
-- Non-overlapping intervals are always treated as separate objects for determining active / inactive state. This means that each exon will stand on its own when deciding if it should be active or inactive
-- Misc. cleanup, docs of some TAR infrastructure to make it safer and easier to debug in the future.
-- Committing the SingleExomeCalling script that I used to find this problem, and will continue to use in evaluating calling of a single exome with the HC
-- Make sure to get all of the reads into the set of potentially active reads, even for genomic locations that themselves don't overlap the engine intervals but may have reads that overlap the regions
-- Remove excessively expensive calls to check bases are upper cased in ReferenceContext
-- Update md5s after a lot of manual review and discussion with Ryan
-- As one might expect, CachingIndexedFastaSequenceFile now internally upper cases the FASTA reference bases. This is now done by default, unless requested explicitly to preserve the original bases.
-- This is really the correct place to do this for a variety of reasons. First, you don't need to worry about upper casing bases throughout the code. Second, the cache is only upper cased once, no matter how often the bases are accessed, which walkers cannot optimize themselves. Finally, this uses the fastest function for the job -- Picard's toUpperCase(byte[]), which is way better than String.toUpperCase() (see the sketch after this list)
-- Added unit tests to ensure this functionality works correctly.
-- Removing unnecessary upper casing of bases in some core GATK tools, now that RefContext guarantees that the reference bases are all upper case.
-- Added contracts to ensure this is the case.
-- Remove a ton of sh*t from BaseUtils that was so old I had no idea what it was doing any longer, had no unit tests to ensure it was correct, and wasn't used anywhere in our code
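And the sketch referenced above: upper casing at the cache layer is a single in-place pass over the byte array when a block is loaded (the loop here is my own for illustration; htsjdk/Picard's StringUtil.toUpperCase(byte[]) does the equivalent in place):

import java.nio.charset.StandardCharsets;

public final class UpperCaseBasesSketch {
    static void toUpperCaseInPlace(final byte[] bases) {
        for (int i = 0; i < bases.length; i++) {
            final byte b = bases[i];
            if (b >= 'a' && b <= 'z') bases[i] = (byte) (b - ('a' - 'A'));
        }
    }

    public static void main(String[] args) {
        final byte[] bases = "acGTn".getBytes(StandardCharsets.US_ASCII);
        toUpperCaseInPlace(bases);   // soft-masked lower-case bases become upper case
        System.out.println(new String(bases, StandardCharsets.US_ASCII)); // ACGTN
    }
}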