gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Christopher Hartl	1be8a88909	Changes: 1) GATKArgumentCollection has a command to turn off randomization if setting the seed isn't enough. Right now it's only hooked into RankSumTest. 2) RankSumTest now can be passed a boolean telling it whether to use a dithering or non-randomizing comparator. Unit tested. 3) VariantsToBinaryPed can now output in both individual-major and SNP-major mode. Integration test. 4) Updates to PlinkBed-handling python scripts and utilities. 5) Tool for calculating (LD-corrected) GRMs put under version control. This is analysis for T2D, but I don't want to lose it should something happen to my computer.	2012-10-03 16:02:42 -04:00
David Roazen	ac87ed47bb	BQSR: allow logging recal table updates to a file For testing/debugging purposes only	2012-10-01 14:18:34 -04:00
Christopher Hartl	2508b0f5a7	Merged bug fix from Stable into Unstable	2012-09-29 00:57:43 -04:00
Christopher Hartl	365f1d2429	hmk123's error on the forum came from the reference context occasionally lacking bases needed for validating the reference bases in the variant context. (no @Window for VariantsToBinaryPed). This bugfix adresses this and other minor items: 1) ValidateVariants removed in favor of direct validation VariantContexts. Integration test added to test broken contexts. 2) Enabling indel and SV output. Still bi-allelic sites only. Integration tests added for these cases. 3) Found a bug where GQ recalculation (if a genotype has PLs but no GQ) would only happen for flipped encoding. Fixed. Integration test added.	2012-09-29 00:55:31 -04:00
Eric Banks	2df5be702c	Added an argument to RR to allow polyploid consensus creation (by default it is turned off). This will eventually be replaced by the known SNPs track trigger.	2012-09-28 11:44:25 -04:00
David Roazen	e740977994	GATK Engine: do not merge FilePointers that span multiple contigs This affects both the non-experimental and experimental engine paths, and so may break tests, but this is a necessary change.	2012-09-27 18:02:25 -04:00
David Roazen	e82946e5c9	ExperimentalReadShardBalancer: create one monolithic FilePointer per contig Merge all FilePointers for each contig into a single, merged, optimized FilePointer representing all regions to visit in all BAM files for a given contig. This helps us in several ways: -It allows us to create a single, persistent set of iterators for each contig, finally and definitively eliminating all Shard/FilePointer boundary issues for the new experimental ReadWalker downsampling -We no longer need to track low-level file positions in the sharding system (which was no longer possible anyway given the new experimental downsampling system) -We no longer revisit BAM file chunks that we've visited in the past -- all BAM file access is purely sequential -We no longer need to constantly recreate our full chain of read iterators There are also potential dangers: -We hold more BAM index data in memory at once. Given that we merge and optimize the index data during the merge, and only hold one contig's worth of data at a time, this does not appear to be a major issue. TODO: confirm this! -With a huge number of samples and intervals, the FilePointer merge operation might become expensive. With the latest implementation, this does not appear to be an issue even with a huge number of intervals (for one sample, at least), but if it turns out to be a problem for > 1 sample there are things we can do. Still TODO: unit tests for the new FilePointer.union() method	2012-09-27 14:47:54 -04:00
Christopher Hartl	abbe757907	Merged bug fix from Stable into Unstable	2012-09-27 00:15:35 -04:00
Christopher Hartl	55cdf4f9b7	Commit changes in Variants To Binary Ped to the stable repository to be available prior to next release.	2012-09-27 00:13:32 -04:00
Mark DePristo	33b2f65bbd	Script to evaluate SNP and indel calls for the experimental downsampler -- Calls NA12878 with and without the expt. downsampler on chr1 -- Creates combined vcf, annotating sites as overlapping omni SNPs and Mills indels -- Creates simple combined.table that has chr, pos, set, and type to easily ID missed good sites with the new downsampler	2012-09-26 11:33:06 -04:00
Ryan Poplin	f009424952	Adding Phase2 HC calling qscripts for both the original calls and the project consensus.	2012-09-26 11:26:24 -04:00
Ryan Poplin	e49fe74612	Adding some of the qscripts from my BQSR experiments.	2012-09-26 10:55:34 -04:00
Mark DePristo	e1524ebbc8	NA12878 HiSeq b37 20:10-11 mb test files	2012-09-25 08:54:02 -04:00
Eric Banks	caa431c367	Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-09-24 21:46:36 -04:00
Eric Banks	11a71e0390	RR bug: when determining the most common base at a position, break ties by which base has the highest sum of base qualities. Otherwise, sites with 1 Q2 N and 1 Q30 C are ending up as Ns in the consensus. I think perhaps we don't even care about which base has the most observations - it should just be determined by which has the highest sum of base qualities - but I'm not sure that's what users would expect.	2012-09-24 21:46:14 -04:00
David Roazen	3f44b3e019	Update DataProcessingPipelineTest MD5s	2012-09-24 15:38:07 -04:00
David Roazen	0b488cce66	ExperimentalReadShardBalancer: close() exhausted iterators Fixes a truly awful SAMReaders resource leak reported by Eric -- thanks Eric!	2012-09-24 14:52:59 -04:00
Mark DePristo	9fd30d6f1c	When writing the initial commit for nt + nct I realized this class was really just a ThreadGroupOutputTracker -- The code is cleaner and the logical more obvious now.	2012-09-24 14:15:36 -04:00
Mark DePristo	3e8d992828	Remove bad error test from MicroScheduler, as it's no longer applicable.	2012-09-24 14:15:36 -04:00
Mark DePristo	a6b3497eac	Fixes GSA-515 Nanoscheduler GSA-577 -nt and -nct together appear to not close resources properly -- Fixes monster bug in the way that traversal engines interacted with the NanoScheduler via the output tracker. -- ThreadLocalOutputTracker is now a ThreadBasedOutputTracker that associates via a map from a master thread -> the storage map. Lookups occur by walking through threads in the same thread group, not just the thread itself (TBD -- should have a map from ThreadGroup instead) -- Removed unnecessary debug statement in GenomeLocParser -- nt and nct officially work together now	2012-09-24 14:15:35 -04:00
Mark DePristo	4749fc114f	Temp. disable -nt > 1 and -nct > 1 while bugs are worked out	2012-09-24 14:15:35 -04:00
Mark DePristo	847e79247d	Use SNP only model for NCT big exome test	2012-09-24 14:15:35 -04:00
Mark DePristo	f42e55c9df	Add NCT specific performance test for 5K exomes	2012-09-24 14:15:35 -04:00
Mark DePristo	09bbd2c4c3	Include exception in VCFWriter when one is found when rethrowing as ReviewedStingException	2012-09-24 14:15:35 -04:00
Mark DePristo	10a6b57be6	Fix thread name: should be master executor not input	2012-09-24 14:15:35 -04:00
Eric Banks	9464dfdbf2	Don't penalize the reduced reads for spanning deletions (when surrounding base quals are Q2s)	2012-09-24 14:06:07 -04:00
Eric Banks	6a73265a06	RR bug: we were adding synthetic reads from the header only before the variant region, which meant that reads that overlap the variant region but that weren't used for the consensus (because e.g. of low base quality for the spanning base) were never being used at all. Instead, add synthetic reads from before and spanning the variant region.	2012-09-24 13:29:37 -04:00
Eric Banks	ef680e1e13	RR fix: push the header removal all the way into the inner loops so that we literally remove a read from the general header only if it was added to the polyploid header. Add comments.	2012-09-24 11:14:18 -04:00
Eric Banks	1509153b4b	Adding my little walker to assess reduced bam coverage against the original bam because it's turning out to be very useful.	2012-09-23 00:47:40 -04:00
Eric Banks	0187f04a90	Proper fix for a previous RR bug fix: only remove reads from the header if they were actually used in the creation of the polyploid consensus.	2012-09-23 00:39:19 -04:00
Eric Banks	74bb4e2739	Fixing the VariantContextUtilsUnitTest	2012-09-22 23:24:55 -04:00
Eric Banks	344083051b	Reverting the fix to the generalized ploidy exact model since it cannot handle it computationally. Will file this in the JIRA.	2012-09-22 23:07:28 -04:00
Eric Banks	25e3ea879a	Oops, missed this test before when updating md5s	2012-09-22 22:16:35 -04:00
Eric Banks	ced652b3dd	RR bug: we need to call removeFromHeader() for reads that were used in creating a polyploid consensus or else they are reused later in creating synthetic reads. In the worst case, this bug caused the tool to create 2 copies of the reduced read.	2012-09-22 21:50:10 -04:00
Eric Banks	60b93acf7d	RR bug: we need to test that the mapping and base quals are >= the MIN values and not just >. This was causing us to drop Q20 bases.	2012-09-22 21:32:29 -04:00
David Roazen	f6a22e5f50	ExperimentalReadShardBalancerUnitTest was being skipped; fixed TestNG skips tests when an exception occurs in a data provider, which is what was happening here. This was due to an AWFUL AWFUL use of a non-final static for ReadShard.MAX_READS. This is fine if you assume only one instance of SAMDataSource, but with multiple tests creating multiple SAMDataSources, and each one overwriting ReadShard.MAX_READS, you have a recipe for problems. As a result of this the test ran fine individually, but not as part of the unit test suite. Quick fix for now to get the tests running -- this "mutable static" interface should really be refactored away though, when I have time.	2012-09-22 01:56:39 -04:00
David Roazen	e077347cc2	Re-allow running the GATK with experimental downsampling It's now possible to run with experimental downsampling enabled using the --enable_experimental_downsampling engine argument. This is scheduled to become the GATK-wide default next week after diff engine output for failing tests has been examined.	2012-09-21 23:20:46 -04:00
David Roazen	34eed20aa6	PerSampleDownsamplingReadsIterator: fix for incorrect use of DOWNSAMPLER_POSITIONAL_UPDATE_INTERVAL Notify all downsamplers in our pool of the current global genomic position every DOWNSAMPLER_POSITIONAL_UPDATE_INTERVAL position changes, not every single positional change after that threshold is first reached.	2012-09-21 22:43:39 -04:00
David Roazen	133085469f	Experimental, downsampler-friendly read shard balancer -Only used when experimental downsampling is enabled -Persists read iterators across shards, creating a new set only when we've exhausted the current BAM file region(s). This prevents the engine from revisiting regions discarded by the downsamplers / filters, as could happen in the old implementation. -SAMDataSource no longer tracks low-level file positions in experimental mode. Can strip out all related code when the engine fork is collapsed. -Defensive implementation that assumes BAM file regions coming out of the BAM Schedule can overlap; should be able to improve performance if we can prove they cannot possibly overlap. -Tests a bit on the extreme side (~8 minute runtime) for now; will scale these back once confidence in the code is gained	2012-09-21 22:17:58 -04:00
Guillermo del Angel	ab8fa8f359	Bug fix: AlleleCount stratification in VariantEval didn't support higher ploidy and was producing bad tables	2012-09-21 20:48:12 -04:00
Eric Banks	dcd31e654d	Turn off RR tests while I debug	2012-09-21 17:26:00 -04:00
Eric Banks	21251c29c2	Off-by-one error in sliding window manifests itself at end of a coverage region dropping the last covered base.	2012-09-21 17:22:30 -04:00
Mauricio Carneiro	2c3dc291c0	Added positive/negative strand to the synthetic reads	2012-09-21 10:00:48 -04:00
Mauricio Carneiro	51cb5098e4	Fixed the alignment issues with reads that started with empty consensus headers	2012-09-21 10:00:47 -04:00
Mauricio Carneiro	aa1d2f3a5b	Not every consensus is well aligned. Need to check more, but starting position has been fixed.	2012-09-21 10:00:45 -04:00
Mauricio Carneiro	97874b92d1	Program runs, but the consensus reads are all out of place and need more tags	2012-09-21 10:00:44 -04:00
Mauricio Carneiro	3494a52ddc	another intermediate commit to update changes from stable	2012-09-21 10:00:43 -04:00
Mauricio Carneiro	a89ff7b5dd	Intermediate commit to resolve conflicts coming from stable	2012-09-21 10:00:41 -04:00
Mark DePristo	5d758bf97f	Better run a shorter test -- should take 3 minutes total	2012-09-20 18:54:14 -04:00
Mark DePristo	d29218825d	Fix grouping for display of GATKPerformanceOverTime -- God I hate R	2012-09-20 18:45:16 -04:00

1 2 3 4 5 ...

10659 Commits (1be8a88909abe9fbab855e8b63f1ca73e0175e84) All Branches Search

10659 Commits (1be8a88909abe9fbab855e8b63f1ca73e0175e84)

All Branches