Mark DePristo
bb7bf74aa8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 16:05:43 -05:00
Mauricio Carneiro
060c7ce8ae
It wouldn't harm integration tests if we had our logic right... :-)
2011-11-10 14:03:22 -05:00
Eric Banks
39678b6a20
Check for reads with missing read groups and throw a UserException when encountered. Mauricio said this wouldn't break integration tests.
2011-11-10 13:34:45 -05:00
Mark DePristo
dd1810140f
-stratIntervals is optional
2011-11-10 13:27:32 -05:00
Mark DePristo
67b022c34b
Cleanup for new SampleUtils function
...
-- getVCFHeadersFromRods(rods) is now available so that you don't have getVCFHeadersFromRods(rods, null) throughout the codebase
2011-11-10 13:27:13 -05:00
Mark DePristo
35fe9c8a06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-10 11:11:33 -05:00
Mark DePristo
dc4932f93d
VariantEval module to stratify the variants by whether they overlap an interval set
...
The primary use of this stratification is to provide a mechanism to divide assessment of a call set up by whether a variant overlaps an interval or not. I use this to differentiate between variants occurring in CCDS exons vs. those in non-coding regions, in the 1000G call set, using a command line that looks like:
-T VariantEval -R human_g1k_v37.fasta -eval 1000G.vcf -stratIntervals:BED ccds.bed -ST IntervalStratification
Note that the overlap algorithm properly handles symbolic alleles with an INFO field END value. In order to safely use this module you should provide entire contigs worth of variants, and let the interval strat decide overlap, as opposed to using -L which will not properly work with symbolic variants.
Minor improvements to create() interval in GenomeLocParser.
2011-11-10 10:58:40 -05:00
Mauricio Carneiro
0d8983feee
outputting the RG information
...
setReadGroup now sets the read group attribute for the GATKSAMRecord
2011-11-09 23:35:00 -05:00
Eric Banks
315ac68b0b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:37:36 -05:00
Eric Banks
6313aae2c4
Adding checks for hasBasePileup() before calling getBasePileup() as per GS thread
2011-11-09 22:37:26 -05:00
Ryan Poplin
74a18d3de8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 22:29:40 -05:00
Ryan Poplin
24712c0221
Merged bug fix from Stable into Unstable
2011-11-09 22:28:27 -05:00
Ryan Poplin
8942406aa2
Use MathUtils to compare doubles instead of testing for equality
2011-11-09 22:05:21 -05:00
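A minimal sketch of the idea behind the commit above: compare doubles within a tolerance rather than with `==`. The class name, method signature, and the `EPSILON` value are assumptions made for illustration; this is not the MathUtils code from the repository.

```java
final class DoubleCompareSketch {
    // Hypothetical tolerance, chosen only for this example.
    static final double EPSILON = 1e-6;

    // Returns 0 when a and b are within epsilon of each other,
    // -1 when a is smaller, and 1 when a is larger.
    static int compareDoubles(double a, double b, double epsilon) {
        if (Math.abs(a - b) < epsilon) {
            return 0;
        }
        return a < b ? -1 : 1;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Direct equality fails because of floating-point rounding.
        System.out.println("sum == 0.3: " + (sum == 0.3));
        // The tolerance-based comparison treats the two values as equal.
        System.out.println("compareDoubles: " + compareDoubles(sum, 0.3, EPSILON));
    }
}
```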
Ryan Poplin
348f2db7fd
Fix for HMM optimization. If the two penalty arrays match exactly the function should return the end of the array instead of 0.
2011-11-09 22:00:52 -05:00
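The fix above can be illustrated with a small sketch. It assumes the optimization looks for the first index at which the current penalty arrays differ from the previous ones, so that the HMM matrices only need recomputing from that point. All names are hypothetical, and the use of `int[]` for penalties is an assumption. When nothing differs, the function returns the array length (nothing to recompute) rather than 0 (recompute everything).

```java
final class FirstDiscrepancySketch {
    // Returns the first index where either penalty array differs from its
    // previous version, or the shared length when both pairs match exactly.
    static int firstDiscrepancy(int[] prevOpen, int[] currOpen,
                                int[] prevCont, int[] currCont) {
        int n = Math.min(Math.min(prevOpen.length, currOpen.length),
                         Math.min(prevCont.length, currCont.length));
        for (int i = 0; i < n; i++) {
            // Check both the gap open and the gap continuation penalties.
            if (prevOpen[i] != currOpen[i] || prevCont[i] != currCont[i]) {
                return i;
            }
        }
        return n; // exact match: the "end of the array", not 0
    }

    public static void main(String[] args) {
        int[] open = {45, 45, 45};
        int[] cont = {10, 10, 10};
        int[] contChanged = {10, 12, 10};
        System.out.println(firstDiscrepancy(open, open, cont, cont));
        System.out.println(firstDiscrepancy(open, open, cont, contChanged));
    }
}
```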
Eric Banks
82bf09edf3
Mark Standard Annotations with an asterisk
2011-11-09 20:42:31 -05:00
Eric Banks
04b122be29
Fix for bug reported on GetSatisfaction
2011-11-09 20:33:36 -05:00
Mauricio Carneiro
d00b2c6599
Adding a synthetic read for filtered data
...
* Generalized the concept of a synthetic read to create both the running consensus and synthetic reads of filtered data.
* Synthetic reads can now have deletions (but not insertions)
* New reduced read tag for filtered data synthetic reads *(RF)*
* Sliding window header now keeps information of consensus and filtered data
* Synthetic reads are created simultaneously, new functionality is controlled internally by addToSyntheticReads
2011-11-09 20:16:22 -05:00
Eric Banks
21bf43f3bb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-09 15:34:40 -05:00
Christopher Hartl
85bffe1dca
Merged bug fix from Stable into Unstable
2011-11-09 15:29:14 -05:00
Christopher Hartl
d828eba7f4
Allow comments in a table-formatted file to precede the header line.
2011-11-09 15:27:38 -05:00
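A rough sketch of the behavior described above: skip leading comment lines until the header line is found. It assumes that comment lines start with `#` and that the header line starts with `HEADER`. Both markers and all names here are assumptions for illustration, not the actual table codec.

```java
final class TableHeaderSketch {
    // Returns the index of the header line, skipping any comment lines
    // that precede it. Throws if data appears before a header is seen.
    static int headerLineIndex(String[] lines) {
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.startsWith("#")) {
                continue; // assumed comment marker
            }
            if (line.startsWith("HEADER")) {
                return i; // assumed header marker
            }
            throw new IllegalArgumentException(
                    "Expected a HEADER line before data, found: " + line);
        }
        throw new IllegalArgumentException("No HEADER line found");
    }

    public static void main(String[] args) {
        String[] lines = {
            "# produced by some tool",
            "# another comment",
            "HEADER\tchr\tpos",
            "1:100\tA\tB"
        };
        System.out.println("header at line index " + headerLineIndex(lines));
    }
}
```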
Eric Banks
8205efbb29
Merge branch 'master' into intervals
2011-11-09 15:27:15 -05:00
Eric Banks
d64f8a89a9
Instead of the SelfScopingFeatureCodec interface, pushed this functionality into Tribble itself. Now we can e.g. determine that a file can be parsed by the BedCodec on the fly.
2011-11-09 15:24:29 -05:00
Mauricio Carneiro
f080f64f99
Preserve RG information on new GATKSAMRecord from SAMRecord
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
f9530e0768
Clean unnecessary attributes from the read
...
this gives on average 40% file size reduction.
2011-11-09 14:39:20 -05:00
Mauricio Carneiro
9427ada498
Fixing no cigar bug
...
empty GATKSAMRecords will have a null cigar. Treat them accordingly.
2011-11-09 14:39:20 -05:00
Mark DePristo
e639f0798e
mergeEvals allows you to treat -eval 1.vcf -eval 2.vcf as a single call set
...
-- A bit of code cleanup in VCFUtils
-- VariantEval table to create 1000G Phase I variant summary table
-- First version of 1000G Phase I summary table Qscript
2011-11-09 14:35:50 -05:00
Christopher Hartl
149b79eaad
Merged bug fix from Stable into Unstable
2011-11-09 11:26:30 -05:00
Christopher Hartl
11abb4f9d1
Better error message.
2011-11-09 11:25:28 -05:00
Christopher Hartl
d3a533b82e
Revert "a"
...
This reverts commit 1175f50ddbf389f5da74d27dc725596582ae15af.
2011-11-09 11:22:26 -05:00
Christopher Hartl
5eaf800281
a
2011-11-09 11:22:20 -05:00
Christopher Hartl
5451fbc2b2
Merged bug fix from Stable into Unstable
2011-11-09 11:06:15 -05:00
Christopher Hartl
091229e4db
MVLikelihoodRatio now checks if the family string is provided before attempting to instantiate. Also check that variant contexts have both genotypes and genotype likelihoods.
...
Table codec now yells at users for not providing a HEADER with the table - parsing tables without a header line was causing the first line of the file to be eaten.
Table feature now has a toString method.
These are minor bug fixes.
2011-11-09 11:03:29 -05:00
Mauricio Carneiro
e1b4c3968f
Fixing GATKSAMRecord bug
...
when constructing a GATKSAMRecord from scratch, we should set "mRestOfBinaryData" to null so the BAMRecord doesn't try to retrieve missing information from the non-existent bam file.
2011-11-08 16:50:36 -05:00
Ryan Poplin
e973ca2010
fixing merge conflict.
2011-11-08 14:55:05 -05:00
Ryan Poplin
b0e6afec48
Bug fix for HMM optimization. Need to also check the gap continuation penalty array for the index with the first discrepancy.
2011-11-08 14:51:25 -05:00
Laurent Francioli
571c724cfd
Added reporting of the number of genotypes updated.
2011-11-08 15:15:51 +01:00
Ryan Poplin
94dc447a70
Merged bug fix from Stable into Unstable
2011-11-07 15:26:35 -05:00
Ryan Poplin
0b181be61f
Bug fix in SelectVariants when using a discordance track but no sample specifications. Added integration test to test this.
2011-11-07 15:25:16 -05:00
Ryan Poplin
0534149708
Merged bug fix from Stable into Unstable
2011-11-07 14:07:08 -05:00
Ryan Poplin
2d1e385ca4
Adding note to VQSR docs about Rscript being needed in the environment PATH.
2011-11-07 14:04:13 -05:00
Eric Banks
759f4fe6b8
Moving unclaimed walker with bad integration test to archive
2011-11-07 13:16:38 -05:00
Eric Banks
c1986b6335
Add notes to the GATKdocs as to when a particular annotation can/cannot be calculated.
2011-11-07 11:06:19 -05:00
Eric Banks
724e3f3b0d
Merged bug fix from Stable into Unstable
2011-11-06 22:23:22 -05:00
Eric Banks
cdd40d1222
Removing contracts for the SimpleTimer
2011-11-06 22:22:49 -05:00
Ryan Poplin
5c565d28b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 10:26:19 -05:00
Eric Banks
1c4e429a1c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-06 00:05:56 -04:00
Eric Banks
a12bc63e5c
Get rid of support for bams without sample information in the read groups. This hidden option wasn't being used anyways because it wasn't hooked up properly in the AlignmentContext.
2011-11-05 23:54:28 -04:00
Eric Banks
90a053ea93
Don't change the mapping quality of MQ=255 reads in IR
2011-11-05 22:40:45 -04:00
Ryan Poplin
611a395783
Now properly extending candidate haplotypes with bases from the reference context instead of filling with padding bases. Functionality in the private Haplotype class is no longer necessary so removing it. No need to have four different Haplotype classes in the GATK.
2011-11-05 12:18:56 -04:00
Mark DePristo
e99871f587
Bug fix for decode loc
...
-- decodeLoc() wasn't skipping input header lines, so the system blew up when there was an = line being split.
2011-11-04 13:20:54 -04:00
Mark DePristo
a340a1aeac
Bug fix. decodeLoc() should update lineNo so you get meaningful line no when indexing
...
due to malformed VCF files.
2011-11-04 11:44:24 -04:00
Mark DePristo
9f260c0dc1
Zero byte index bug fix for RandomlySplitVariants + cleanup
...
-- vcfWriter2 was never being closed in onTraversalDone(), so the on the fly index file was being created but never actually properly written to the file.
-- This bug is ultimately due to the inability of the GATK to allow multiple VCF output writers as @Output arguments, though
-- Removed the unnecessary local variable iFraction, = 1000 * the input fraction argument. Now the system just draws a double random number and compares it to the input fraction directly. Is there some subtle reason I don't appreciate for this programming construct?
2011-11-04 09:45:20 -04:00
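The simplified split decision described above might look like the following sketch: draw a random double in [0, 1) and compare it to the fraction, with no scaled integer threshold. The class and method names are hypothetical and this is not the walker's actual code.

```java
import java.util.Random;

final class RandomSplitSketch {
    // True when the record should go to the first output file.
    // nextDouble() is uniform in [0, 1), so this is true with
    // probability equal to fraction.
    static boolean goesToFirstFile(Random rng, double fraction) {
        return rng.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int first = 0;
        int total = 10000;
        for (int i = 0; i < total; i++) {
            if (goesToFirstFile(rng, 0.25)) {
                first++;
            }
        }
        // Roughly a quarter of the records should land in the first file.
        System.out.println(first + " of " + total + " records in first file");
    }
}
```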
Mauricio Carneiro
e89ff063fc
GATKSAMRecord refactor
...
The GATK engine will now provide a GATKSAMRecord to all tools which incorporates the functionality used by the GATK to the bam file (ReadGroups, Reduced Reads, ...).
* No tools should create SAMRecord anymore, use GATKSAMRecord instead *
2011-11-03 15:43:26 -04:00
Laurent Francioli
385a6abec1
Fixed a bug that wrongly swapped the mother and father genotypes in case the child genotype is missing.
2011-11-03 13:04:53 +01:00
Laurent Francioli
893787de53
Functions getAsMap and getNegLog10GQ now handle missing genotype case.
2011-11-03 13:04:11 +01:00
Eric Banks
e8bceb1eaa
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 21:13:54 -04:00
Eric Banks
52b16bf739
Must check whether there's a normal vs. extended pileup before asking for it.
2011-11-02 20:45:24 -04:00
Eric Banks
e1edd6bd12
Removing the min mapping quality argument since it wasn't being used in the normal processing of the pileups in UG - only for indel pileups. Instead, we apply the min base quality to the reads in the pileup for indels and define it to be the min 'confidence' of the base. Docs are updated but I didn't rename the argument as I don't want people to complain.
2011-11-02 20:32:58 -04:00
Ryan Poplin
e94fcf537b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:29:19 -04:00
Ryan Poplin
4d35272916
Bug fixes with Mauricio to functions in ReadUtils used by reduced reads and the haplotype caller.
2011-11-02 16:29:10 -04:00
Mark DePristo
8a2929c1dd
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-02 16:21:00 -04:00
Laurent Francioli
19ad5b635a
- Calculation of parent/child pairs corrected
...
- Separated the reporting of single and double mendelian violations in trios
2011-11-02 18:35:31 +01:00
Eric Banks
967ff647b8
Reduced reads shouldn't contribute to Fisher Strand calculations
2011-11-02 13:07:20 -04:00
Eric Banks
cf0e699226
QualByDepth was inefficiently iterating over the pileup 2 times for some reason. Removed non-useful annotation classes.
2011-11-02 12:58:38 -04:00
Eric Banks
4501dce58d
Fixing merge conflict
2011-11-02 12:50:32 -04:00
Eric Banks
54331b44e9
New way of looking at the size of a pileup: there's a physical number of elements in the data structure and there's a representative depth of coverage (since a reduced read represents depth >= 1). The size() method has been removed because its meaning is ambiguous. Updated several annotations and the UG engine to make use of the representative depths.
2011-11-02 12:47:30 -04:00
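To illustrate the distinction drawn above, here is a small sketch in which each pileup element carries a representative count (1 for an ordinary read, possibly more for a reduced read). Representing the pileup as an `int[]` of counts, and the method names, are assumptions for this example only.

```java
final class PileupDepthSketch {
    // Physical size: how many elements are stored in the data structure.
    static int numberOfElements(int[] representativeCounts) {
        return representativeCounts.length;
    }

    // Representative depth: the sum of the depths each element stands for.
    static int depthOfCoverage(int[] representativeCounts) {
        int depth = 0;
        for (int count : representativeCounts) {
            depth += count;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Two ordinary reads and one reduced read representing five reads.
        int[] counts = {1, 1, 5};
        System.out.println("elements: " + numberOfElements(counts));
        System.out.println("depth: " + depthOfCoverage(counts));
    }
}
```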
Mark DePristo
c2b97030a4
IntervalUtils for completely balanced locus-based scatter/gather
...
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
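The list-reversal utility mentioned above differs from `Collections.reverse`, which reverses its argument in place and returns `void`. A minimal sketch of a version that returns a new list (hypothetical class and method names, not the repository's utility) could be:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReverseListSketch {
    // Returns a reversed copy and leaves the input list untouched.
    static <T> List<T> reverse(List<T> list) {
        List<T> copy = new ArrayList<>(list);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> contigs = Arrays.asList("chr1", "chr2", "chr3");
        System.out.println(reverse(contigs)); // new list, reversed
        System.out.println(contigs);          // original order preserved
    }
}
```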
Laurent Francioli
119ca7d742
Fixed a bug in parent/child pairs reporting causing a crash in case the -mvf option was used and mother was not provided
2011-11-02 08:22:33 +01:00
Laurent Francioli
b91a9c4711
- Fixed parent/child pairs handling (was crashing before)
...
- Added parent/child pair reporting
2011-11-02 08:04:01 +01:00
Mark DePristo
5fc613f972
Better default partition types for walkers
...
-- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro
36600fd8e9
added MQ of low MQ/BQ to consensus RMS
...
Bases that were excluded for MQ and BQ filters are now contributing to the MQ RMS (but not to consensus base counts and variant/not variant region triggers).
2011-11-01 17:46:12 -04:00
Mauricio Carneiro
b004489c6d
Moving ReduceRead TAG to GATKSAMRecord
...
ReduceReads are now a feature of a GATKSAMRecord, so the tag and the special methods needed to use it will now be housed by the GATKSAMRecord.
2011-11-01 17:12:09 -04:00
Mauricio Carneiro
17cc484dbd
Revert "ReduceReads ref bases are now output as '='
...
Reducing the reference bases to '=' results in an extra compression of 13% on average. The GATK is not ready to handle files with '=' bases, and the decision was to implement this as engine support, not as a part of ReduceReads.
2011-11-01 16:35:07 -04:00
Eric Banks
0839c75c8d
More minor fixes to docs
2011-10-31 21:49:27 -04:00
Eric Banks
74b018a1f3
Minor fixes to docs
2011-10-31 21:41:43 -04:00
Eric Banks
31ee5432c5
Merged bug fix from Stable into Unstable
2011-10-31 14:56:59 -04:00
David Roazen
cdde32acbd
Merged bug fix from Stable into Unstable
2011-10-31 14:21:15 -04:00
Eric Banks
f62af0291b
Check for invalid VCF records (not enough tokens) instead of assuming they are there.
2011-10-31 14:09:51 -04:00
Andrey Sivachenko
bed0acaed4
nWayOut now adds PG tag to the header as it should. Also, additional hidden option added: keepPGTags. If invoked, IndelRealigner PG tags from previous runs (if any) are kept in the header and the new PG tag is simply added, instead of overriding them
2011-10-31 12:28:28 -04:00
Mauricio Carneiro
389380a590
ReduceReads ref bases are now output as '=' to save space
...
Restructured the sliding window framework to manipulate a wrapped version of the SAMRecord that contains information about the reference.
2011-10-30 12:04:39 -04:00
Eric Banks
0ca7428e76
Allow processing of empty intervals, but warn user when this case is encountered.
2011-10-28 12:12:14 -04:00
Eric Banks
649dfe98f0
Add VCF header for any expressions that are requested
2011-10-28 10:22:19 -04:00
Eric Banks
057a79f598
This argument should be annotated as @Input
2011-10-28 09:44:49 -04:00
Eric Banks
4ba7c0cecd
Moving to private
2011-10-28 09:29:28 -04:00
Eric Banks
1bdd76c2f2
These tools now use the IntervalBinding system to handle intervals instead of doing it all manually
2011-10-28 09:28:12 -04:00
Eric Banks
6ba08a103d
Empty ROD files should generate an exception when used for creating intervals. Moved some now obsolete files to the archive as the realigner will now read all target intervals into memory.
2011-10-28 09:23:25 -04:00
Eric Banks
3d04bb5608
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 23:55:18 -04:00
Eric Banks
19e27d4568
Removing all instances of -BTI (in tests and in GATKdocs) and replacing them with the appropriate alternative.
2011-10-27 23:55:11 -04:00
Eric Banks
cafc245a43
For some reason, a class of Codecs (including TableCodec) require that a GenomeLocParser be passed in to do the position processing. Why can't they just return a Feature with chr, start, stop? Isn't that the right thing?
2011-10-27 23:54:28 -04:00
Guillermo del Angel
cbc43683ee
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-27 20:54:18 -04:00
Guillermo del Angel
8907e42007
First fully functional implementation of ValidationSiteSelectorWalker. User gives a) a set of input variants, b) a desired number of output variants, c) optionally, a set of samples which will restrict sites to be polymorphic in those samples, d) a frequency selection mode: either uniform (no AF matching), or matching AF so that output sites mirror the input AF spectrum as closely as possible.
...
More testing is needed and docs need improving but so far all functionality seems up and running
2011-10-27 20:53:48 -04:00
Eric Banks
ccfd853b34
Added further integration tests for rod-based intervals that deal with more complex cases. Good call by Mark to test the empty VCF example because we were failing on it; fixed.
2011-10-27 20:43:50 -04:00
Eric Banks
c2f343773e
Oops, working too quickly last time. This is the proper fix for the potential NPE in the equals() test.
2011-10-27 15:32:08 -04:00
Khalid Shakir
b80d407dc7
No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
...
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks
8c4dbce6d8
Don't serialize the GATKArgumentCollection for the GATKRunReports (which would have meant dealing with the new IntervalBindings). Also, forgot to remove a test that's no longer relevant to BED parsing.
2011-10-27 13:58:19 -04:00
Eric Banks
4a7e6fee3f
Remove support for BED file interval parsing in the GATK; it should all go through Tribble now. IndelRealigner no longer supports unordered interval input (which shouldn't have been used anyways). Temporarily commenting out serialization of arguments so that tests pass; this whole piece will be deleted soon anyways.
2011-10-27 13:38:08 -04:00
Matt Hanna
f7df8bdecc
Merged bug fix from Stable into Unstable
2011-10-27 11:31:17 -04:00
Matt Hanna
41ddc7bce7
Make sure we output a full stack trace when we encounter Tribble error messages on VCF header merge.
2011-10-27 11:30:04 -04:00
Eric Banks
44f905b5e5
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 23:31:11 -04:00
Eric Banks
68283b1651
Fixing docs and adding GATKdocs for the new interval functionality
2011-10-26 22:14:43 -04:00
Mark DePristo
c9978316a3
Merge branch 'FragmentUtils'
2011-10-26 19:51:49 -04:00
Mauricio Carneiro
add9ad97ec
No scatter gather for VQSR or ApplyVQSR.
...
These walkers should not be scatter gatherable. Annotating them accordingly so that Queue doesn't allow a less than knowledgeable user to try and scatter/gather VQSR.
2011-10-26 16:35:44 -04:00
Ryan Poplin
74aeb22eeb
Merged bug fix from Stable into Unstable
2011-10-26 15:57:30 -04:00
Ryan Poplin
86871bd1e3
Throw a UserException in the BQSR when there is no data instead of creating an empty csv file
2011-10-26 15:56:41 -04:00
Mark DePristo
034a997d07
Generalized Reads -> Fragment calculation
...
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00
Eric Banks
2f21b6ecfb
Removed debugging output
2011-10-26 15:50:20 -04:00
Eric Banks
b39fcb1bea
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 15:44:25 -04:00
Eric Banks
b6ce6ed3f8
Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.
2011-10-26 15:42:53 -04:00
Eric Banks
9424e8b2ca
Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.
2011-10-26 14:11:49 -04:00
Mark DePristo
7fa943aef1
Renamed FragmentPileup to FragmentUtils
2011-10-26 14:01:45 -04:00
Laurent Francioli
1f044faedd
- Genotype assignment in case of equally likely combinations is now random
...
- Genotype combinations with 0 confidence are now left unphased
2011-10-26 19:57:09 +02:00
Laurent Francioli
81b163ff4d
Indentation
2011-10-26 14:49:12 +02:00
Laurent Francioli
62cff266d4
GQ calculation corrected for most likely genotype
2011-10-26 14:40:04 +02:00
Mark DePristo
af3613cc5f
GATKSAMRecord commit branch summary
...
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase?
Now, on to the changes here:
-- Picard added a SamRecordFactory that is used to create instances of the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files.
-- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception.
-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum.
-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used.
-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.
-- ArtificialSamUtils creates GATKSamRecords now, not just SAMRecords. Added code here to create artificial pairs, and used that code to create artificial ReadBackedPileups with specific properties
-- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller
2011-10-25 20:52:56 -04:00
Mark DePristo
2822f0dc27
Merge branch 'SamRecordFactory'
2011-10-25 20:34:47 -04:00
Mark DePristo
1b722c21cf
merge master
2011-10-25 16:08:39 -04:00
Ryan Poplin
56fdf0b865
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-25 15:58:56 -04:00
Ryan Poplin
4a34c1862e
misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.
2011-10-25 15:22:28 -04:00
Guillermo del Angel
b559936b7a
a) New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing the likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either the haplotype is different or GOP/GCP are different.
2011-10-25 09:56:43 -04:00
Khalid Shakir
fac9932938
Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
...
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Khalid Shakir
89a581a66f
Added ability to specify arguments in files via -args/--arg_file
...
Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()
2011-10-24 15:58:34 -04:00
Mark DePristo
502592671d
Cleanup FragmentPileup before main repo commit
...
-- removed intermediate functions. Now only original version and best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
2011-10-24 14:40:05 -04:00
Mark DePristo
166174a551
Google caliper example execution script
...
-- FragmentPileup with final performance testing
2011-10-24 14:04:53 -04:00
Laurent Francioli
62477a0810
Added documentation and comments
2011-10-24 13:45:21 +02:00
Laurent Francioli
38ebf3141a
- Now supports parent/child pairs
...
- Sites with missing genotypes in pairs/trios are handled as follows:
-- Missing child -> Homozygous parents are phased, no transmission probability is emitted
-- Two individuals missing -> Phase if homozygous, no transmission probability is emitted
-- One parent missing -> Phased / transmission probability emitted
- Mutation prior set as argument
2011-10-24 12:30:04 +02:00
Laurent Francioli
7312e35c71
Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.
2011-10-24 10:25:53 +02:00
Laurent Francioli
01b16abc8d
Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.
2011-10-24 10:24:41 +02:00
Mark DePristo
f6ccac889b
Merged bug fix from Stable into Unstable
2011-10-23 16:37:12 -04:00
Mark DePristo
585a45b7a3
Bug fix for ClipReadsWalker when stats output isn't provided
...
-- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification
2011-10-23 16:36:48 -04:00
Ryan Poplin
f5d910b8a5
Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.
2011-10-23 13:29:08 -04:00
Mark DePristo
42bf9adede
Initial version of "fast" FragmentPileup code
...
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
Mauricio Carneiro
4913f8a60f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-21 17:45:07 -04:00
Mauricio Carneiro
102dafdcbc
Validation of GATKSamRecord in read filters
...
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel
f4b409fa0d
CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result
2011-10-21 14:07:20 -04:00
Mark DePristo
b863390cb1
Moving reduced read functionality into GATKSAMRecord
...
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo
2403e96062
Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.
2011-10-21 09:59:24 -04:00
Mark DePristo
110e13bc1e
Merge branch 'master' into SamRecordFactory
2011-10-21 09:43:52 -04:00
Mark DePristo
be797a8a1f
Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system
2011-10-21 09:39:21 -04:00
Mark DePristo
ed74ebcfa1
GATKSamRecords with efficient NGSPlatform method
2011-10-21 09:38:41 -04:00
Mark DePristo
94e1898d8f
A canonical set of NGS platforms as enums with convenient manipulation methods
2011-10-21 09:37:45 -04:00
Laurent Francioli
edea90786a
Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment.
2011-10-20 17:04:19 +02:00
Laurent Francioli
1c61a57329
Original rewrite of PhaseByTransmission:
...
- Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument
- Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster
- Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8)
- Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het)
- Basic reporting added:
-- mvf argument lets the user specify a file to report remaining MVs
-- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output
Known problems:
- GQ is not recalculated even if the genotype changes
Possible improvements:
- Phase partially typed trios
- Use standard Allele/Genotype Classes for the storage of the pre-calculated phase
2011-10-20 13:06:44 +02:00
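The squared-prior rule in the PhaseByTransmission rewrite above can be sketched as raising the per-violation prior to the number of Mendelian violations a genotype combination requires. The class and method names are hypothetical; only the 10^-8 to 10^-16 example comes from the commit message.

```java
final class MvPriorSketch {
    // Prior for a genotype combination that requires numViolations
    // independent Mendelian violations, each with prior mvPrior.
    static double priorForViolations(double mvPrior, int numViolations) {
        return Math.pow(mvPrior, numViolations);
    }

    public static void main(String[] args) {
        double mvPrior = 1e-8;
        // One violation keeps the prior; two violations square it
        // (approximately 1e-16, up to floating-point rounding).
        System.out.println(priorForViolations(mvPrior, 1));
        System.out.println(priorForViolations(mvPrior, 2));
    }
}
```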
Laurent Francioli
ef6a6fdfe4
Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.
2011-10-20 12:49:18 +02:00
Laurent Francioli
76dd816e70
Added getParents() -> returns an arrayList containing the sample's parent(s) if available
2011-10-20 12:47:27 +02:00
Mark DePristo
999a8998ae
Constructor for GATKSamRecord with header only, for unit testing
2011-10-19 17:51:48 -04:00
Mark DePristo
bba69701b5
Now creates GATKSamRecords, not SamRecords
2011-10-19 17:49:17 -04:00
Christopher Hartl
cd8a6d62bb
You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
...
This commit contains:
- IntronLossGenotyper is brought into its current incarnation
- A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
- RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
+ the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
- MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
+ use this rather than a hard GQ threshold if you're doing MV analyses.
- Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo
52345f0aec
Meaningful documentation string
2011-10-19 15:47:36 -04:00
Mark DePristo
1b38aa1a7e
Cleaning up reduced read code accessors
2011-10-19 15:46:44 -04:00
Eric Banks
d8d73fe4f2
Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.
2011-10-19 15:11:13 -04:00
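A minimal sketch of the ./X handling described in d8d73fe4f2 above, assuming a simplified model in which alleles are plain strings and "." is a no-call. `GenotypeTypeSketch` and its methods are hypothetical and do not reproduce the real Genotype class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical, simplified model; not the real Genotype class.
final class GenotypeTypeSketch {
    enum Type { NO_CALL, HOM_REF, HET, HOM_VAR, MIXED }

    // alleles: one string per chromosome copy, "." meaning no-call.
    // ref: the reference allele at this site.
    static Type classify(List<String> alleles, String ref) {
        int noCalls = 0;
        for (String a : alleles) {
            if (a.equals(".")) noCalls++;
        }
        if (noCalls == alleles.size()) return Type.NO_CALL;
        if (noCalls > 0) return Type.MIXED;              // the ./X case
        if (new HashSet<String>(alleles).size() > 1) return Type.HET;
        return alleles.get(0).equals(ref) ? Type.HOM_REF : Type.HOM_VAR;
    }

    static boolean isHet(List<String> alleles, String ref) {
        return classify(alleles, ref) == Type.HET;
    }

    static boolean isHom(List<String> alleles, String ref) {
        Type t = classify(alleles, ref);
        return t == Type.HOM_REF || t == Type.HOM_VAR;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList(".", "T"), "A"));
        System.out.println(isHet(Arrays.asList(".", "T"), "A"));
    }
}
```

Because ./X gets its own type in this model, isHet and isHom both return false for it instead of guessing.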
Mark DePristo
7928b287fc
GATKSamRecord now produced by SAMFileReaders by default
...
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks
5a6468c11e
Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.
2011-10-19 11:52:05 -04:00
Eric Banks
48c4a8cb33
Make error messages clearer (even I was confused)
2011-10-19 11:49:16 -04:00
Eric Banks
6cadaa84c9
Just use validate() from super class since it does the same thing
2011-10-19 11:48:23 -04:00
Mark DePristo
df3e4e1abd
First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader
2011-10-19 11:22:35 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Mark DePristo
f77f2eeb7d
Fix for new ID structure
2011-10-18 13:04:43 -04:00
Mark DePristo
1a92ee3593
No longer adds a binding of ID -> . when the ID field is dot in the VCF
...
-- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin
e45fcb66eb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 15:56:19 -04:00
Ryan Poplin
1e6794c539
fixing typo in VariantsToTable docs
2011-10-17 15:56:02 -04:00
Mark DePristo
0de8550f17
Merged bug fix from Stable into Unstable
2011-10-17 15:29:53 -04:00
Mark DePristo
c1329c4dde
Fixing a binary to logical or
2011-10-17 15:29:45 -04:00
Mark DePristo
9e4963efc8
Merged bug fix from Stable into Unstable
2011-10-17 15:27:38 -04:00
Mark DePristo
ec911ce5bb
Even better error messages
2011-10-17 15:27:22 -04:00
Mark DePristo
d065bf1715
Merged bug fix from Stable into Unstable
2011-10-17 15:25:47 -04:00
Mark DePristo
a7cf9cdc67
Fixing error message typo
2011-10-17 15:25:35 -04:00
Ryan Poplin
589df6b7cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 14:35:14 -04:00
Ryan Poplin
6b02354d84
Adding a new getter in VariantsToTable to extract the indel event length.
2011-10-17 14:34:52 -04:00
Mark DePristo
3550798c4c
Merged bug fix from Stable into Unstable
2011-10-17 13:58:56 -04:00
Mark DePristo
4108a294f7
Better error message when a RodBinding file doesn't exist
2011-10-17 13:58:46 -04:00
Mark DePristo
cc76826f78
Merged bug fix from Stable into Unstable
2011-10-17 13:38:11 -04:00
Mark DePristo
fd4540cd32
Fixed extraordinarily subtle race condition with contracts invariant
...
-- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state
2011-10-17 13:37:55 -04:00
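The rule stated in fd4540cd32 above (every method synchronized, even ones that ignore the internal state) can be sketched as follows. `TallySketch` is a hypothetical class invented for illustration; the class actually fixed in that commit is not named in the message.

```java
// Hypothetical class for illustration; not the class fixed in the commit.
final class TallySketch {
    private int evens = 0;
    private int odds = 0;
    private int total = 0;   // invariant: total == evens + odds

    synchronized void add(int value) {
        if (value % 2 == 0) evens++; else odds++;
        total++;              // invariant is briefly false before this line
    }

    synchronized boolean invariantHolds() {
        return total == evens + odds;
    }

    // Does not read the counters, but is synchronized anyway so that an
    // invariant check run on method entry can never see a half-finished add().
    synchronized String description() {
        return "tally";
    }

    synchronized int total() {
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        final TallySketch tally = new TallySketch();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 10000; i++) tally.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(tally.total() + " " + tally.invariantHolds());
    }
}
```

If description() were left unsynchronized, a contract framework that checks the invariant on entry to every method could evaluate it in the middle of another thread's add().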
Mark DePristo
5a881360df
Merged bug fix from Stable into Unstable
2011-10-13 15:54:43 -04:00
Mark DePristo
7cab6f6bb0
Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils
...
-- SimpleTimer is now threadsafe using synchronized method keywords
-- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength
2011-10-13 15:53:12 -04:00
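The N fix in 7cab6f6bb0 above amounts to advancing the reference position by the full element length rather than by one. A simplified sketch, assuming CIGAR elements given as parallel arrays; `RefPosSketch` is hypothetical and is not the code of alignmentToByteArray().

```java
// Hypothetical sketch; not the actual alignmentToByteArray() code.
final class RefPosSketch {
    // Number of reference bases consumed by one CIGAR element.
    static int referenceAdvance(char op, int elementLength) {
        switch (op) {
            case 'M': case '=': case 'X': case 'D':
            case 'N':                    // skipped region: whole element, not one base
                return elementLength;
            default:                     // I, S, H, P consume no reference bases
                return 0;
        }
    }

    // Total reference bases spanned by a list of CIGAR elements.
    static int referenceSpan(char[] ops, int[] lengths) {
        int refPos = 0;
        for (int i = 0; i < ops.length; i++) {
            refPos += referenceAdvance(ops[i], lengths[i]);
        }
        return refPos;
    }

    public static void main(String[] args) {
        // 5M100N5M: a spliced-style alignment with a 100-base skip
        System.out.println(referenceSpan(new char[] {'M', 'N', 'M'}, new int[] {5, 100, 5}));
    }
}
```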
Mauricio Carneiro
e12ffb6547
Updating docs for GCContentByInterval
...
This walker does not take any BAMs. It only walks over the reference.
2011-10-13 13:27:00 -04:00
Eric Banks
9aecd50473
Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.
2011-10-12 15:44:54 -04:00
Mauricio Carneiro
e53a952aeb
Added ION Torrent support to CountCovariates.
2011-10-12 01:57:02 -04:00
Mauricio Carneiro
a2733a451f
Added NotCalled feature to GAV
...
Added "not called" and "no status" to the truth table. Very useful.
2011-10-11 19:31:45 -04:00
David Roazen
ae83420637
Merged bug fix from Stable into Unstable
2011-10-11 12:26:08 -04:00
David Roazen
794f275871
SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation.
...
Having SnpEff grouped with the Experimental annotations was proving problematic, since it
requires a rod. Placing it in its own group should improve the situation somewhat, making it
easier to request "all annotations except for SnpEff".
2011-10-11 12:08:56 -04:00
David Roazen
cfd0ac8410
Merged bug fix from Stable into Unstable
...
Conflicts:
public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2011-10-11 12:03:51 -04:00
David Roazen
24b72334b3
UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
...
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.
Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel
0429b38021
Merged bug fix from Stable into Unstable
2011-10-11 11:19:38 -04:00
Guillermo del Angel
1c485d8b5e
Forgot that no matter how trivial a change it's a good idea to compile first
2011-10-11 11:18:41 -04:00
Guillermo del Angel
6418f4d69b
Merged bug fix from Stable into Unstable
2011-10-11 11:13:18 -04:00
Guillermo del Angel
1975de1b32
Second try: hide --do_indel_quality in AnalyzeCovariates
2011-10-11 11:11:29 -04:00
Guillermo del Angel
6506ea83e8
Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
...
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel
4c1d8c8d44
Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users
2011-10-11 11:01:06 -04:00
Eric Banks
77c983c5b5
No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.
2011-10-10 15:17:54 -04:00
Mark DePristo
fb72bcf732
DiffObjects no longer prints out the file name in the status so MD5 are stable
2011-10-10 15:10:57 -04:00
Mark DePristo
46e7370128
this.allele, getAlleles(), and getAltAlleles() now return List not set
...
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo
c67f6c076b
simpleMerge now preserves allele order
...
-- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo
ec14a4a606
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 08:38:50 -07:00
Eric Banks
ca9cd9b688
Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.
2011-10-06 22:38:44 -04:00
Mark DePristo
c7864c7256
Filter application order is now deterministic, in the order defined by the walker
...
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is
(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
2011-10-06 18:51:40 -07:00
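The ordering listed in c7864c7256 above can be sketched with an ordered list instead of a HashSet. Filters are represented here by plain names; `FilterOrderSketch` and `orderedFilters` are hypothetical and only illustrate the three-step order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the ordering; filters are stand-in strings.
final class FilterOrderSketch {
    static List<String> orderedFilters(List<String> walkerFilters,
                                       String readGroupBlackList,      // null if not provided
                                       List<String> commandLineFilters) {
        // (1) walker intrinsic filters, in the order the walker defines them
        List<String> ordered = new ArrayList<String>(walkerFilters);
        // (2) read group black list, if provided
        if (readGroupBlackList != null) ordered.add(readGroupBlackList);
        // (3) command line filters, if provided
        ordered.addAll(commandLineFilters);
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(orderedFilters(
                Arrays.asList("WalkerFilterA", "WalkerFilterB"),
                "BlackList",
                Arrays.asList("CommandLineFilter")));
    }
}
```

A list keeps insertion order on every run, whereas iteration order over a HashSet depends on hash codes and is not guaranteed.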
Mark DePristo
0b88af4af9
Counts of records failing filters are displayed sorted
...
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now unused sh*tty accessors in Utils
2011-10-06 18:42:26 -07:00
Mark DePristo
d1e70d6ec2
Removed Nx counting of reads in metrics with -nt > 1
2011-10-06 18:29:26 -07:00
Eric Banks
c61804a450
Rename the long version of the argument name to more accurately reflect its purpose.
2011-10-06 16:14:04 -04:00
Eric Banks
61a3dfae24
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 15:58:04 -04:00
Eric Banks
6eb87bf58a
RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.
2011-10-06 15:57:49 -04:00
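The output change mentioned in 6eb87bf58a above (single-base intervals printed as chr:start) can be sketched as below. `IntervalFormatSketch` is a hypothetical helper, not the GenomeLoc code.

```java
// Hypothetical formatter for illustration; not the GenomeLoc implementation.
final class IntervalFormatSketch {
    // Intervals whose start equals stop are printed without the "-stop" part.
    static String format(String contig, int start, int stop) {
        return start == stop
                ? contig + ":" + start
                : contig + ":" + start + "-" + stop;
    }

    public static void main(String[] args) {
        System.out.println(format("chr1", 100, 100));
        System.out.println(format("chr1", 100, 250));
    }
}
```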
Eric Banks
1b0735f0a3
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 13:41:45 -04:00
Eric Banks
c4dfc1fb8b
Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.
2011-10-06 13:41:36 -04:00
Mark DePristo
73f9d1f217
GATK read group requirement iron hand
...
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo
23845ac798
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 08:17:08 -07:00
Mark DePristo
daa5999489
Fixed typo in argument description
2011-10-06 08:16:25 -07:00
Guillermo del Angel
8a474e38ff
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:08:39 -04:00
Guillermo del Angel
93f7e632bd
Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is.
2011-10-06 10:07:46 -04:00
Mark DePristo
190be4d0d1
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-05 21:27:11 -07:00
Mark DePristo
8e6845806a
Allowing empty samples list in LIBS
...
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna
180c8f286f
Merged bug fix from Stable into Unstable
2011-10-05 20:37:43 -04:00
Matt Hanna
55b9f06527
Ensure that IndelRealigner n-way out option supports MD5 generation.
2011-10-05 20:36:28 -04:00
Mark DePristo
be2d29ce69
Final PED documentation
2011-10-05 15:17:41 -07:00
Mark DePristo
3226d5dc0d
Merge branch 'master' into ped
2011-10-05 15:03:09 -07:00
Mark DePristo
6a573437af
Detailed documentation of arguments for -ped
2011-10-05 15:00:58 -07:00
Mark DePristo
e7c80f7c45
Renaming quantitative trait to OtherPhenotype which is now a String not a double
...
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo
51ecc20867
getFamily() and associated methods implemented and tested
...
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo
a45d985818
TODO method stubs
2011-10-04 15:54:09 -07:00
Mark DePristo
fee89e47ff
Only throws an error when there are no samples but there are reads
...
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo
f552aede42
Only provide the sample names in the BAM file for efficiency
2011-10-04 06:50:12 -07:00
Mark DePristo
a27641e1fc
Cleaned up imports
2011-10-04 06:28:36 -07:00
Mark DePristo
b20689ff55
No longer supports extraProperties
...
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
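The merge rule from b20689ff55 above (a mostly full record merges with a consistent, emptier one; inconsistent records raise an error) might look roughly like this. The sketch assumes a sample is just an array of nullable string fields; `SampleMergeSketch` is hypothetical and does not mirror mergeSample() in the Sample code.

```java
// Hypothetical sketch; the real Sample class has typed fields.
final class SampleMergeSketch {
    // An unset (null) value yields to a set one; two set values must agree.
    static String mergeField(String a, String b) {
        if (a == null) return b;
        if (b == null || a.equals(b)) return a;
        throw new IllegalArgumentException("inconsistent values: " + a + " vs " + b);
    }

    // Merge two records field by field.
    static String[] merge(String[] x, String[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("records differ in length");
        }
        String[] merged = new String[x.length];
        for (int i = 0; i < x.length; i++) {
            merged[i] = mergeField(x[i], y[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Fields here are assumed to be family, father, mother.
        String[] merged = merge(new String[] {"fam1", "dad", null},
                                new String[] {"fam1", null, "mom"});
        System.out.println(String.join(",", merged));
    }
}
```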
Mauricio Carneiro
3837aa45b4
Fixing conflicts
...
Conflicts:
public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo
2e3dc52088
Minor function renaming
2011-10-03 14:41:13 -07:00
Mark DePristo
dd71884b0c
On path to SampleDB engine integration
...
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Eric Banks
c3eff7451a
Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. There was a noticeable (albeit not big) difference when reading sites-only files.
2011-10-03 14:20:39 -04:00
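For context on c3eff7451a above: String.split takes a regular expression, while splitting on a single known character can be done with a plain scan. The sketch below shows that idea only; `CharSplitSketch.splitOnChar` is a hypothetical helper and makes no claim about how ParsingUtils.split is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-character splitter; not the ParsingUtils code.
final class CharSplitSketch {
    static List<String> splitOnChar(String input, char delim) {
        List<String> parts = new ArrayList<String>();
        int start = 0;
        // Walk from delimiter to delimiter without building a regex.
        for (int i = input.indexOf(delim); i >= 0; i = input.indexOf(delim, start)) {
            parts.add(input.substring(start, i));
            start = i + 1;
        }
        parts.add(input.substring(start));   // trailing piece, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("0.12,0.34,0.54", ','));
    }
}
```

Unlike String.split, this version keeps trailing empty strings, so callers that depend on that detail would need to adjust.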
Mark DePristo
8ee0f91904
Remove residual processing tracker arguments
2011-10-03 09:50:01 -07:00
Mark DePristo
89ac50e86e
SampleDataSource -> SampleDB
2011-10-03 09:33:30 -07:00
Mark DePristo
93fba06cb5
Support for whitespace only lines
2011-10-03 09:30:10 -07:00
Mark DePristo
0604ce55d1
PedReader support for ; separated lines, not only newline
2011-10-03 09:19:58 -07:00
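A rough sketch of the record splitting described in 0604ce55d1 above, combined with the whitespace-only line handling from the commit listed just before it. `PedRecordSplitSketch` is hypothetical; it only separates records and does not parse PED fields the way PedReader does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only splits records, does not parse fields.
final class PedRecordSplitSketch {
    static List<String> records(String text) {
        List<String> out = new ArrayList<String>();
        // Records may be separated by a newline or by ';'.
        for (String piece : text.split("[\n;]")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {       // skip whitespace-only lines
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("fam1 kid1 dad1 mom1 1 2;fam1 dad1 0 0 1 1\n   \nfam1 mom1 0 0 2 1"));
    }
}
```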
Mark DePristo
52f670c8b8
100% version of PedReader
...
-- Passes all unit tests
-- Added unit tests for missing fields
2011-10-03 06:12:58 -07:00
Mark DePristo
dd75ad9f49
95% PedReader
...
-- Passes significant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
2011-09-30 18:03:34 -04:00
Andrey Sivachenko
c7898a9be7
inconsequential change in string constants printed into the vcf which no one uses anyway...
2011-09-30 16:40:21 -04:00
Mark DePristo
010899f886
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 15:51:09 -04:00
Mark DePristo
84160bd83f
Reorganization of Sample
...
-- Moved Gender and Afflication to separate public enums
-- PedReader 90% implemented
-- Improve interface cleanup to XReadLines and UserException
2011-09-30 15:50:54 -04:00
Mauricio Carneiro
05fba6f23a
Clipping ends inside deletion and before insertion
...
fixed.
2011-09-30 15:44:43 -04:00
Mark DePristo
c1cf6bc45a
PEDReader should be in samples
2011-09-30 14:22:19 -04:00
Mark DePristo
56f10b40a8
Fixing test bugs for WindowMaker that required empty sample list
2011-09-30 14:18:27 -04:00
Ryan Poplin
af6c053435
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 13:33:31 -04:00
Mark DePristo
810e8ad011
Removed getXByReaders() function from the engine
...
-- These could be simplified in their downstream uses
-- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream
2011-09-30 10:43:51 -04:00
Mark DePristo
178ba24c27
Move getSamplesForSamFile to SampleUtils
...
-- A nearly identical piece of code already lived in SampleUtils. Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself. Much cleaner
2011-09-30 10:28:18 -04:00
Mark DePristo
30d23942b1
Renamed ReadBackedPileup getXSampleName() functions to getXSample
...
-- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions
2011-09-30 10:02:57 -04:00
Mark DePristo
3289a325fc
Removed final use of Sample in RBP
2011-09-30 09:57:39 -04:00
Mark DePristo
a69a4dda2f
SamplesDB no longer has null sample
...
-- Updated getSamples().size() == 2 test in CallableLociWalker that really ensured there was one sample in the system
2011-09-30 09:56:23 -04:00
Mark DePristo
e055a78f6e
LIBS now requires at least one sample be present
...
-- UnitTest provides a "null" sample for matching the reads without read groups
2011-09-30 09:49:35 -04:00
Mark DePristo
9860a2c989
Merge branch 'master' into ped
2011-09-30 09:28:18 -04:00
Mark DePristo
d901fed617
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 08:41:44 -04:00
Mauricio Carneiro
cabacf028d
Intermediate commit to fix interval skipping
...
may need additional testing.
2011-09-29 18:45:12 -04:00
Mark DePristo
1765fbeb6b
Merge branch 'master' into ped
2011-09-29 17:18:51 -04:00
Mark DePristo
98ecaf8aa0
Support for ReducedReads with reduced counts and average quals
...
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mauricio Carneiro
9508220157
fixed hard clipping both ends inside deletion
...
If both ends of the interval fall within a deletion in the read, then hardClipBothEnds would cut the right tail first, including the entire deletion, and then fail to cut the left tail because there would not be any bases left. Fixed.
2011-09-29 15:36:49 -04:00
Mark DePristo
625ffb6a07
LocusIteratorByState and ReadBackedPileups no longer use Sample
2011-09-29 14:52:11 -04:00
Mark DePristo
b3a2371925
Merge branch 'master' into ped
2011-09-29 14:32:17 -04:00
Mark DePristo
68761a6e28
Removed sample from header
2011-09-29 14:13:05 -04:00
Mauricio Carneiro
a5e75cd14c
Outputting both consensus base qualities and counts
...
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (most common base), and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00
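One possible reading of a5e75cd14c above, for a single position: pick the most common base, average the qualities of the reads carrying it, and report how many reads those were. `ConsensusSketch` is a hypothetical illustration under that assumption, not the reduced reads code.

```java
// Hypothetical per-position sketch; not the actual reduced reads code.
final class ConsensusSketch {
    // Returns {consensus base, average quality of the bases supporting it,
    // number of supporting bases}. The average uses integer division.
    static int[] consensus(byte[] bases, byte[] quals) {
        if (bases.length == 0 || bases.length != quals.length) {
            throw new IllegalArgumentException("need equally sized, non-empty arrays");
        }
        int[] counts = new int[256];
        for (byte b : bases) counts[b & 0xFF]++;

        int best = 0;
        for (int b = 1; b < counts.length; b++) {
            if (counts[b] > counts[best]) best = b;   // ties keep the lowest byte value
        }

        int qualSum = 0;
        for (int i = 0; i < bases.length; i++) {
            if ((bases[i] & 0xFF) == best) qualSum += quals[i];
        }
        return new int[] { best, qualSum / counts[best], counts[best] };
    }

    public static void main(String[] args) {
        int[] r = consensus(new byte[] {'A', 'A', 'C'}, new byte[] {30, 20, 10});
        System.out.println((char) r[0] + " " + r[1] + " " + r[2]);
    }
}
```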
Mark DePristo
505416b6c0
Merge branch 'master' into ped
2011-09-29 12:22:39 -04:00
Mark DePristo
9536845e35
Cleaning up unused code in MV
2011-09-29 12:20:07 -04:00
Mark DePristo
5043d76c3d
Removing more bad uses of SampleDataSource creation
2011-09-29 12:16:34 -04:00
Mark DePristo
5c9227cf5e
Further cleanup of Sample database
...
-- Removing more and more unnecessary code
-- Partial removal of type safe Sample usage. On the road to SampleDB only
2011-09-29 11:50:05 -04:00
Mark DePristo
2a0cd556d3
Further cleanup of Sample
...
-- Cleaned up interface functions in GAE
-- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db
2011-09-29 10:34:51 -04:00
Mark DePristo
e76f381628
Moved sample package from DataSources to gatk, and renamed it samples
...
-- All associated changes to the codebase are just header updates
2011-09-29 09:57:15 -04:00
Mark DePristo
e197dcd1f3
Pre-cleanup commit of Sample and SampleDataSource
...
-- SampleDataSource has all reader functionality disabled
2011-09-29 09:44:18 -04:00
Mark DePristo
4d31673cc5
No longer supporting YAML file allows us to delete 75% of the sample's codebase
2011-09-29 09:43:31 -04:00
Ryan Poplin
e366ee18bc
Adding ability to read in and make use of kmer quality tables during HMM evaluation
2011-09-29 07:46:19 -04:00
Mauricio Carneiro
fc86cd6fd8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr
2011-09-29 00:12:15 -04:00
Roger Zurawicki
4fd5630f6a
Added ReadClipper Unit Test
...
* Includes tests that include HardClip to Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Matt Hanna
9272ed03b5
Merged bug fix from Stable into Unstable
2011-09-28 21:26:43 -04:00
Matt Hanna
0acaf2df65
Fix an embarrassing issue where a specific configuration of minimal coverage
...
over small intervals could cause reads to be dropped from the pileup. Nothing
to see here...
2011-09-28 21:23:01 -04:00
Guillermo del Angel
c8d3a720f9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 18:17:34 -04:00
Guillermo del Angel
7e3cb45093
Further performance optim in banded hmm, about 60% speed improvement over current implementation now
2011-09-28 16:27:28 -04:00
Ryan Poplin
1b1ca80df2
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 16:17:39 -04:00
Ryan Poplin
3b73dc89fe
Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.
2011-09-28 16:17:31 -04:00
Mauricio Carneiro
ff2f4df043
Fixed hardclipping inside indel (right tail)
...
when hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro
3c7b7f74ef
Optimized interval iteration
...
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
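The sorted-structure idea in 3c7b7f74ef above can be sketched with a TreeMap keyed by interval start (a TreeMap is used here instead of the TreeSet named in the commit, purely to keep the example short). `IntervalLookupSketch` is hypothetical and assumes non-overlapping intervals on a single contig. Note that a lookup like this is logarithmic in the number of intervals; the constant-time behavior claimed in the commit presumably also relies on the "being smart about which intervals to test" part, which is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes non-overlapping intervals on one contig.
final class IntervalLookupSketch {
    private final TreeMap<Integer, Integer> stopByStart = new TreeMap<Integer, Integer>();

    void add(int start, int stop) {
        stopByStart.put(start, stop);
    }

    // Only the interval with the greatest start <= pos can contain pos,
    // so one sorted lookup replaces a scan over all intervals.
    boolean contains(int pos) {
        Map.Entry<Integer, Integer> candidate = stopByStart.floorEntry(pos);
        return candidate != null && pos <= candidate.getValue();
    }

    public static void main(String[] args) {
        IntervalLookupSketch intervals = new IntervalLookupSketch();
        intervals.add(100, 200);
        intervals.add(500, 600);
        System.out.println(intervals.contains(150) + " " + intervals.contains(300));
    }
}
```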
Mauricio Carneiro
5c9b659c02
clipping both ends of the reads was modifying the original read
...
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel
fe23e4d10c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 15:53:11 -04:00
Guillermo del Angel
e2b9030e93
First mostly fully functional implementation of banded pair HMM likelihood computation for indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point
2011-09-28 15:51:48 -04:00
Eric Banks
1b45f21774
Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.
2011-09-28 13:18:32 -04:00
Eric Banks
1f0e354fae
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 13:13:21 -04:00
Eric Banks
bb619a9a3c
Fixing docs
2011-09-28 13:13:03 -04:00
Mark DePristo
5812004e06
Merge branch 'stable'
2011-09-28 11:36:40 -04:00
Mark DePristo
a5006831d7
Shows "" not empty space when default string value is ""
2011-09-28 11:35:52 -04:00
Mark DePristo
1e32281a15
Fix to not show -null when missing short name argument
2011-09-28 11:31:20 -04:00
Mauricio Carneiro
89544c209c
Fixing contracts
...
changed return type to Pair, changing contracts accordingly.
2011-09-28 11:19:17 -04:00
Eric Banks
eacbee3fe5
Merged bug fix from Stable into Unstable
2011-09-27 20:35:18 -04:00
Eric Banks
43b0c98298
Fix docs
2011-09-27 20:34:46 -04:00
Eric Banks
232a6df11c
Add longhand form to the error message.
2011-09-27 20:29:31 -04:00
Eric Banks
1d6fcb6eb1
Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS."
...
This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.
2011-09-27 20:27:00 -04:00
Eric Banks
269b9826b6
Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.
2011-09-27 20:26:36 -04:00
Mauricio Carneiro
3b6e43b7c4
Use reads that span multiple intervals
...
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where, if the requested reference coordinate fell inside a deletion in the read, the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Khalid Shakir
84bd355690
Merged bug fix from Stable into Unstable
2011-09-27 14:34:39 -04:00
Khalid Shakir
b090751f62
Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
...
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Eric Banks
26e71f6688
The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.
2011-09-27 11:03:17 -04:00
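The determinism point in 26e71f6688 above comes down to iteration order: a LinkedHashMap iterates in insertion order, while a HashSet gives no ordering guarantee. A small sketch, with hypothetical string keys standing in for the VariantContexts; `InsertionOrderSketch` is not VariantEval code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; string keys stand in for VariantContexts.
final class InsertionOrderSketch {
    // Returns the keys in the order the records were first added.
    static List<String> keysInInsertionOrder(String[] keys) {
        Map<String, Integer> records = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < keys.length; i++) {
            if (!records.containsKey(keys[i])) {
                records.put(keys[i], i);
            }
        }
        return new ArrayList<String>(records.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInInsertionOrder(
                new String[] {"site1:PASS", "site1:filtered", "site0:PASS"}));
    }
}
```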
Guillermo del Angel
ceffefa6a6
Intermediate version with banded pair HMM
2011-09-27 10:18:58 -04:00
Mark DePristo
e99ff3caae
Removed lots of old, and not to be used, HMM options
...
-- resulted in massive code cleanup
-- GdA will integrate his new banded algorithm here
-- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY
2011-09-27 10:08:40 -04:00
Mark DePristo
fa0efbc4ca
Refactoring of PairHMM to support reduced reads
2011-09-26 13:28:56 -04:00
Mark DePristo
a6b65d6347
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-26 13:26:21 -04:00
Mark DePristo
4f09453470
Refactored reduced read utilities
...
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
2011-09-26 12:58:31 -04:00
Eric Banks
234b74dd05
Merged bug fix from Stable into Unstable
2011-09-26 11:47:23 -05:00
Eric Banks
317b95fa57
Fixing some annotator docs
2011-09-26 11:46:45 -05:00
Mauricio Carneiro
b76dbc72f0
Fixed interval navigation bug.
...
If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.
2011-09-26 08:13:44 -04:00