Mark DePristo
74f9ccf6dd
Merge
2011-09-21 11:30:11 -04:00
Mark DePristo
6592972f82
Putative fix for BAQ array out of bounds
...
-- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unit test to enforce this behavior
2011-09-21 11:25:08 -04:00
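The BAQ fix above replaces a hard-coded `< 64` bound with a maximum Phred score constant. The sketch below shows why the bound matters. It assumes a per-quality lookup table, which is not confirmed as the actual BAQ code. If the table is sized from the same constant used for validation, a legal quality such as 70 can no longer index past its end. `QualTable` is hypothetical, and the local `MAX_PHRED_SCORE = 93` is an assumed stand-in for Picard's `SAMUtils.MAX_PHRED_SCORE`.

```java
/** Hypothetical per-quality lookup table; not the GATK BAQ implementation. */
final class QualTable {
    // Assumption: local stand-in for Picard's SAMUtils.MAX_PHRED_SCORE (93 in Picard/htsjdk).
    static final int MAX_PHRED_SCORE = 93;

    // One entry per legal quality 0..MAX_PHRED_SCORE inclusive, so the table and the bound cannot disagree.
    private static final double[] ERROR_PROB = new double[MAX_PHRED_SCORE + 1];

    static {
        for (int q = 0; q <= MAX_PHRED_SCORE; q++) {
            ERROR_PROB[q] = Math.pow(10.0, -q / 10.0); // Phred: P(error) = 10^(-Q/10)
        }
    }

    /** Returns the error probability for a Phred quality, rejecting values the table cannot hold. */
    static double errorProbability(int qual) {
        if (qual < 0 || qual > MAX_PHRED_SCORE) {
            throw new IllegalArgumentException("Quality out of range: " + qual);
        }
        return ERROR_PROB[qual];
    }
}
```

Deriving the array size from the constant means that changing the bound later only requires a single edit.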
Eric Banks
174859fc68
Don't allow whitespace in the INFO field
2011-09-21 11:14:54 -04:00
Mark DePristo
ecc7f34774
Putative fix for BAQ problem.
2011-09-21 11:09:54 -04:00
Mark DePristo
7d11f93b82
Final bugfix for CombineVariants
...
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
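The CombineVariants fix above describes two behaviours: IDs from several records at one site are merged into a comma-separated list, and a source name should not repeat in `set=`. The following is a minimal sketch of both, assuming IDs arrive as strings where "." means missing. `IdMerge` and its methods are hypothetical and are not the CombineVariants code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helpers illustrating the merge rules; not the GATK implementation. */
final class IdMerge {
    /** Merges IDs into one comma-separated list, keeping first-seen order and dropping duplicates and ".". */
    static String mergeIds(List<String> ids) {
        Set<String> merged = new LinkedHashSet<>();
        for (String id : ids) {
            if (id == null || id.equals(".")) {
                continue; // "." is VCF's missing-value marker
            }
            for (String part : id.split(",")) {
                merged.add(part);
            }
        }
        return merged.isEmpty() ? "." : String.join(",", merged);
    }

    /** Names each source once, e.g. "dbsnp" rather than "dbsnp-dbsnp-dbsnp". */
    static String setAnnotation(List<String> sourceNames) {
        return String.join("-", new LinkedHashSet<>(sourceNames));
    }
}
```

A `LinkedHashSet` is used so that deduplication does not scramble the order in which IDs and sources were first seen.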
Mark DePristo
a91ac0c5db
Intermediate commit of bugfixes to CombineVariants
2011-09-21 10:15:05 -04:00
David Roazen
b04d8eab55
Merged bug fix from Stable into Unstable
2011-09-20 17:24:14 -04:00
Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
David Roazen
d9ea764611
SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
...
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.
The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo
bffd3cca6f
Bug fix for reduced read; only adds regular bases for calculation
...
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo
a1b4cafe7a
Bug fix for NPE when timer wasn't initialized
2011-09-20 13:59:59 -04:00
Mark DePristo
b7511c5ff3
Fixed long-standing bug in tribble index creation
...
-- Previously, on-the-fly indices didn't have the sequence dictionary set when they were created, so the GATK would read the index, add the dictionary, and rewrite it. This is now fixed: the on-the-fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo
230e16d7c0
Merge branch 'master' into rodrewrite
2011-09-20 06:54:18 -04:00
Mark DePristo
aa8afa3899
Merge
2011-09-19 21:16:47 -04:00
Mauricio Carneiro
56106d54ed
Changing ReadUtils behavior to comply with GenomeLocParser
...
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is entirely contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro
080c957547
Fixing contracts for SoftUnclippedEnd utils
...
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro
5e832254a4
Fixing ReadAndInterval overlap comments.
2011-09-19 13:28:41 -04:00
Christopher Hartl
ecb8466662
Merged bug fix from Stable into Unstable
2011-09-19 12:32:08 -04:00
Christopher Hartl
8143def292
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl
034b868588
Revert "Fix the -T argument in the DepthOfCoverage docs"
...
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo
cfde0e674b
Merge branch 'sgintervals'
2011-09-19 12:02:41 -04:00
Mark DePristo
3e93f246f7
Support for sample sets in AssignSomaticStatus
...
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
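The SampleUtils change above returns a set rather than a list and trims sample names. Below is a minimal sketch of just that behaviour; reading sample files named on the command line is left out. `SampleArgs.samplesFromCommandLine` is a hypothetical stand-in, not the real `SampleUtils.getSamplesFromCommandLine()` signature.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; only the set-and-trim behaviour is illustrated. */
final class SampleArgs {
    /** Collects sample names into an insertion-ordered set, trimming whitespace and dropping blanks and duplicates. */
    static Set<String> samplesFromCommandLine(List<String> rawNames) {
        Set<String> samples = new LinkedHashSet<>();
        for (String raw : rawNames) {
            String name = raw.trim(); // " NA12878" and "NA12878 " become the same sample
            if (!name.isEmpty()) {
                samples.add(name);
            }
        }
        return samples;
    }
}
```

Returning a set lets callers test membership directly and guarantees each sample appears only once.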
Mark DePristo
41ffb25b74
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-19 10:55:18 -04:00
Christopher Hartl
ca1b30e4a4
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo
4ad330008d
Final intervals cleanup
...
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo
6ea57bf036
Merge branch 'master' into sgintervals
2011-09-19 09:50:19 -04:00
Mark DePristo
6bd42c053d
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-18 20:18:39 -04:00
Roger Zurawicki
091c7197cd
Fixed memory leak and bug with deletions in clipping
...
The ClippingOp clip cigar function would enter an endless loop if its parameters were outside the read's range; I fixed this.
* There is still no check that the read coordinates are actually covered by the read
When hard clipping to an interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel
e7b9a009b7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 12:48:30 -04:00
Menachem Fromer
b2e8e11128
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 00:52:27 -04:00
Christopher Hartl
57b3efa2e2
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 21:06:38 -04:00
Christopher Hartl
939babc820
Updating formatting for ValidationAmplicons GATK docs
2011-09-15 21:05:51 -04:00
Christopher Hartl
9fdf1f8eb6
Fix some doc formatting for Depth of Coverage
2011-09-15 21:05:22 -04:00
Menachem Fromer
e6e9b08c9a
Must provide alleles VCF to UGCallVariants
2011-09-15 18:51:09 -04:00
David Roazen
d78e00e5b2
Renaming VariantAnnotator SnpEff keys
...
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Ryan Poplin
2a8b8efd2f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 16:26:35 -04:00
Ryan Poplin
2f58fdb369
Adding expected output doc to CountCovariates
2011-09-15 16:26:11 -04:00
Eric Banks
fd1831b4a5
Updating docs to include more details
2011-09-15 16:25:03 -04:00
Eric Banks
6d02a34bfb
Updating docs to include output
2011-09-15 16:17:54 -04:00
Eric Banks
4ef6a4598c
Updating docs to include output
2011-09-15 16:10:34 -04:00
Eric Banks
fe474b77f8
Updating docs so printing looks nicer
2011-09-15 16:05:39 -04:00
Eric Banks
f04e51c6c2
Adding docs from Andrey since his repo was all screwed up.
2011-09-15 15:38:56 -04:00
Guillermo del Angel
86480b2e13
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 15:31:07 -04:00
Eric Banks
d369d10593
Adding documentation before the release for GATK wiki page
2011-09-15 13:56:23 -04:00
Eric Banks
202405b1a1
Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.
2011-09-15 13:52:31 -04:00
David Roazen
1e682deb26
Minor html-formatting-related documentation fix to the SnpEff class.
2011-09-15 13:07:50 -04:00
Guillermo del Angel
a942fa38ef
Refine the way CombineVariants merges records of different types. Previously, two records of different types were not combined and were kept separate. This is still the case, except when the alleles of one record are a strict subset of the alleles of the other. For example, a SNP with alleles {A*,T} and a mixed record with alleles {A*,T,AAT} are now combined when their start positions match.
2011-09-15 10:22:28 -04:00
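The CombineVariants refinement above can be expressed as a small predicate. The sketch below assumes alleles are plain strings with "A*" marking the reference, as in the commit's example. It also assumes that same-type records at the same start were already combined before this change. `AlleleSubsetMerge` is hypothetical and is not the GATK merge code.

```java
import java.util.Set;

/** Hypothetical predicate illustrating the strict-subset rule; not the GATK implementation. */
final class AlleleSubsetMerge {
    /** True if every allele of a is in b and b has at least one additional allele. */
    static boolean isStrictSubset(Set<String> a, Set<String> b) {
        return b.size() > a.size() && b.containsAll(a);
    }

    /** Records of different types combine only when they share a start and one allele set strictly contains the other. */
    static boolean shouldCombine(String typeA, int startA, Set<String> allelesA,
                                 String typeB, int startB, Set<String> allelesB) {
        if (startA != startB) {
            return false;
        }
        if (typeA.equals(typeB)) {
            return true; // assumption: same-type records were already combined before this change
        }
        return isStrictSubset(allelesA, allelesB) || isStrictSubset(allelesB, allelesA);
    }
}
```

With this rule, the SNP {A*,T} folds into the mixed record {A*,T,AAT}, but an indel {A*,AAT} stays separate from the SNP because neither allele set contains the other.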
David Roazen
3db457ed01
Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
...
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.
This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen
e0c8c0ddcb
Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
...
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.
If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen
1213b2f8c6
SnpEff 2.0.2 support
...
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
2011-09-14 07:09:47 -04:00
Guillermo del Angel
5b1bf6e244
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-13 17:04:43 -04:00
Guillermo del Angel
c6672f2397
Intermediate (but necessary) fix for Beagle walkers: if a marker is absent from the Beagle output files but present in the input VCF, there's no reason to omit it from the output VCF. Instead, the VC is written as-is from the input VCF
2011-09-13 16:57:37 -04:00
Mark DePristo
edf29d0616
Explicit info message about uploading S3 log
2011-09-12 22:16:52 -04:00
Mark DePristo
2316b6aad3
Trying to fix problems with S3 uploading behind firewalls
...
-- Cannot reproduce the very long waits reported by some users.
-- Fixed a problem where an exception could leave an undeleted file; this is now handled with deleteOnExit()
2011-09-12 22:02:42 -04:00
Matt Hanna
64707c33bb
Merged bug fix from Stable into Unstable
2011-09-12 21:54:11 -04:00
Matt Hanna
e63d9d8f8e
Mauricio pointed out to me that dynamically merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list)
...
was completely broken. Sorry about this! Fixed.
2011-09-12 21:50:59 -04:00
Eric Banks
ec4b30de6d
Patch from Laurent: typo leads to bad error messages.
2011-09-12 14:45:53 -04:00
David Roazen
9d9d438bc4
New VariantAnnotatorEngine capability: an initialize() method for all annotation classes.
...
All VariantAnnotator annotation classes may now have an (optional) initialize() method
that gets called by the VariantAnnotatorEngine ONCE before annotation starts.
As an example of how this can be used, the SnpEff annotation class will use the initialize()
method to check whether the SnpEff version number stored in the vcf header is a supported
version, and also to verify that its required RodBinding is present.
2011-09-12 13:00:53 -04:00
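The initialize() hook above runs exactly once per annotation before any record is annotated. The sketch below uses modern Java default methods, which the GATK code of 2011 predates. `VariantAnnotation`, `SnpEffLikeAnnotation`, `AnnotatorEngine`, and the `SnpEffVersion` header key holding a bare version string are all assumptions for illustration, not the real GATK types. The version check mirrors the "refuse unsupported SnpEff versions" behaviour described in the SnpEff 2.0.2 commit.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical annotation contract: initialize() is optional and runs once before annotation starts. */
interface VariantAnnotation {
    default void initialize(Map<String, String> vcfHeader) {
        // no-op by default, so annotations that need no setup are unaffected
    }

    Map<String, Object> annotate(String record);
}

/** Hypothetical SnpEff-style annotation that validates the header once, up front. */
final class SnpEffLikeAnnotation implements VariantAnnotation {
    static final String VERSION_KEY = "SnpEffVersion"; // assumption: bare version string as the value
    static final Set<String> SUPPORTED_VERSIONS = Set.of("2.0.2");

    @Override
    public void initialize(Map<String, String> vcfHeader) {
        String version = vcfHeader.get(VERSION_KEY);
        if (version == null || !SUPPORTED_VERSIONS.contains(version)) {
            throw new IllegalStateException("Missing or unsupported SnpEff version: " + version);
        }
    }

    @Override
    public Map<String, Object> annotate(String record) {
        return Map.of(); // real annotation logic omitted from the sketch
    }
}

final class AnnotatorEngine {
    /** Calls initialize() once on every annotation before any record is processed. */
    static void initializeAll(List<? extends VariantAnnotation> annotations, Map<String, String> vcfHeader) {
        for (VariantAnnotation annotation : annotations) {
            annotation.initialize(vcfHeader);
        }
    }
}
```

Failing in initialize() surfaces a bad header before any output is written, rather than partway through a run.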
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Ryan Poplin
60ebe68aff
Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.
2011-09-12 09:43:23 -04:00
Guillermo del Angel
9344938360
Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly
2011-09-10 19:41:01 -04:00
Guillermo del Angel
e95d484757
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 18:31:14 -04:00
Guillermo del Angel
a807205fc3
a) Minor optimization to softMax() computation to avoid redundant operations, resulting in a roughly 5-10% speedup in indel calling.
...
b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count.
c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.
2011-09-09 18:00:23 -04:00
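The commit above touches two computations. The sketch below assumes that softMax() denotes the log10-space sum log10(10^a + 10^b), which the commit does not state. Under that assumption, the sum can be computed once around the larger term without leaving log space. The sketch also shows the PL=0,0,0 rule: such samples are no-called and left out of the AC distribution. `IndelCallingSketch` and its methods are hypothetical, not GATK's MathUtils or genotyping code.

```java
/** Hypothetical helpers illustrating both ideas; not the GATK implementation. */
final class IndelCallingSketch {
    /** Computes log10(10^a + 10^b) in log space, evaluating max and min only once. */
    static double softMax(double a, double b) {
        double max = Math.max(a, b);
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // both terms are log10(0)
        }
        double min = Math.min(a, b);
        // log10(10^a + 10^b) = max + log10(1 + 10^(min - max)); the exponent is <= 0, so it cannot overflow
        return max + Math.log10(1.0 + Math.pow(10.0, min - max));
    }

    /** A sample whose PLs are all zero carries no information and should be no-called. */
    static boolean isUninformative(int[] pls) {
        for (int pl : pls) {
            if (pl != 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts samples that contribute to the AC distribution, skipping uninformative ones. */
    static int informativeSampleCount(int[][] plsPerSample) {
        int n = 0;
        for (int[] pls : plsPerSample) {
            if (!isUninformative(pls)) {
                n++;
            }
        }
        return n;
    }
}
```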
Mauricio Carneiro
9e650dfc17
Fixing SelectVariants documentation
...
Getting rid of messages telling users to use the YAML file, since we no longer intend to support these.
2011-09-09 16:25:31 -04:00
Mark DePristo
72536e5d6d
Done
2011-09-09 15:44:47 -04:00
Mark DePristo
3c8445b934
Performance bugfix for GenomeLoc.hashcode
...
-- The old version overflowed, so most GenomeLocs had a hashcode of 0. Now uses OR, not plus, to combine the fields
2011-09-09 14:25:37 -04:00
Mark DePristo
c6436ee5f0
Whitespace cleanup
2011-09-09 14:24:29 -04:00
Mark DePristo
87dc5cfb24
Whitespace cleanup
2011-09-09 14:23:42 -04:00
Ryan Poplin
91c949db74
Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.
2011-09-09 12:57:14 -04:00
Mark DePristo
06cb20f2a5
Intermediate commit cleaning up scatter intervals
...
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Eric Banks
6ad8943ca0
CompOverlap no longer keeps track of the number of comp sites, since it wasn't (and cannot be) tracking them correctly.
2011-09-09 09:45:24 -04:00
Mark DePristo
48461b34af
Added TYPE argument to print out VariantType
2011-09-08 15:01:13 -04:00
Ryan Poplin
9cba1019c8
Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap
2011-09-08 09:25:13 -04:00
Ryan Poplin
e0020b2b29
Fixing PrintRODs. Now has input and only prints out one copy of each record
2011-09-08 08:58:37 -04:00
Ryan Poplin
29c968ab60
clean up
2011-09-08 08:42:43 -04:00
Ryan Poplin
59841f8232
Fixing genotype given alleles for indels. Only take the records that start at this locus.
2011-09-08 08:41:16 -04:00
Mark DePristo
cd2c511c4a
GCF improvements
...
-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version now a field in GCF
2011-09-07 23:28:46 -04:00
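The GCF layout above uses a minimal header containing a forward pointer to a footer. Readers then jump straight to the footer, and a streaming writer only has to patch one field at the end. Below is a minimal in-memory sketch of that pattern. The byte layout, the magic number, and `ForwardPointerFile` are hypothetical; this is not the real GCF format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical layout: [magic][version][footerOffset][records...][footer]. Not the real GCF format. */
final class ForwardPointerFile {
    static final int MAGIC = 0x47434631; // "GCF1" in ASCII; hypothetical
    static final int VERSION = 1;

    static byte[] write(List<String> records, String footer) {
        ByteBuffer buf = ByteBuffer.allocate(1 << 16); // fixed capacity keeps the sketch short
        buf.putInt(MAGIC);
        buf.putInt(VERSION);
        int pointerPos = buf.position();
        buf.putLong(0L); // placeholder: the footer position is unknown until all records are streamed out
        for (String record : records) {
            putString(buf, record);
        }
        long footerOffset = buf.position();
        putString(buf, footer);
        buf.putLong(pointerPos, footerOffset); // absolute put patches the header without moving the position
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Reads the minimal header, checks the version, then jumps straight to the footer. */
    static String readFooter(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("Bad magic number");
        }
        int version = buf.getInt();
        if (version != VERSION) {
            throw new IllegalArgumentException("Unsupported version " + version);
        }
        long footerOffset = buf.getLong();
        buf.position((int) footerOffset);
        return getString(buf);
    }

    private static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length); // length prefix, then the UTF-8 bytes
        buf.put(bytes);
    }

    private static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

On a real file, the same patch step would be a seek back to the pointer position, for example with `RandomAccessFile`, once the footer has been written.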
Mark DePristo
fe5724b6ea
Refactored indexing part of StandardVCFWriter into superclass
...
-- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly
2011-09-07 23:27:08 -04:00
Mark DePristo
01b6177ce1
Renaming GVCF -> GCF
2011-09-07 17:10:56 -04:00
Mark DePristo
b220ed0d75
Merge branch 'master' into rodrewrite
2011-09-07 17:05:35 -04:00
Guillermo del Angel
45d54f6258
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 16:49:49 -04:00
Guillermo del Angel
9604fb2ba3
Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted
2011-09-07 16:49:16 -04:00
Mark DePristo
2ded027762
Removed dysfunctional tranches support from VariantEval
2011-09-07 16:09:24 -04:00
Eric Banks
aa9e32f2f1
Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.
2011-09-07 15:48:06 -04:00
Eric Banks
3a04955a30
We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.
2011-09-07 14:01:42 -04:00
Guillermo del Angel
743bf7784c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:21:26 -04:00
Guillermo del Angel
5f22ef9a8c
Added missing javadoc info to Beagle arguments
2011-09-07 13:21:11 -04:00
Mark DePristo
3bcbfa6e06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:13:17 -04:00
Mark DePristo
430da23446
At least 2 minutes must pass before a status message is printed, further stabilizing time estimates
2011-09-07 13:13:07 -04:00
Mauricio Carneiro
6857d0324e
Merge branch 'master' into rr
2011-09-07 12:59:08 -04:00
Mark DePristo
7e9e20fed0
Forgot to delete previous call
2011-09-07 12:54:52 -04:00
Mark DePristo
d23d620494
Pushing traversal engine timer start to as close to actual start as possible
...
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo
6ff432e1f2
BugFix for TF argument to VariantEval, actually making it work properly
2011-09-07 12:50:17 -04:00
Mauricio Carneiro
131cb7effd
Bringing Reduce Reads bug fixes to the main repository
2011-09-07 12:25:53 -04:00
Mark DePristo
a1920397e8
Major bugfix for per sample VariantEval
...
-- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
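The per-sample VariantEval bug above comes from keeping every alt allele when a site is restricted to one sample. The eval modules assume a site is variant only if the sample carries an alt allele. The sketch below shows that subsetting rule, assuming alleles are plain strings and "." marks a no-call. `PerSampleAlleles` is hypothetical and is not the VariantEval code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating per-sample allele subsetting; not the GATK implementation. */
final class PerSampleAlleles {
    /** Keeps the reference allele plus only those alternate alleles the sample's genotype actually carries. */
    static List<String> allelesForSample(String ref, List<String> alts, List<String> genotypeAlleles) {
        List<String> kept = new ArrayList<>();
        kept.add(ref);
        for (String alt : alts) {
            if (genotypeAlleles.contains(alt)) {
                kept.add(alt);
            }
        }
        return kept;
    }

    /** A site counts as variant for this sample only if some alternate allele survives subsetting. */
    static boolean isVariantInSample(String ref, List<String> alts, List<String> genotypeAlleles) {
        return allelesForSample(ref, alts, genotypeAlleles).size() > 1;
    }
}
```

A hom-ref or no-called sample keeps only the reference allele, so the site no longer inflates that sample's variant counts.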
Mark DePristo
a02636a1ac
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite
2011-09-07 10:50:00 -04:00
Mark DePristo
d5641cfac5
Merge branch 'variantEvalST'
2011-09-07 10:44:23 -04:00
Mark DePristo
2f4cf82e3b
VariantEval cleanup. Added VariantType Stratification
...
-- ArrayLists replaced with Lists where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl
436f6eb52b
Reverting Eric's change and pushing in some command-line-option documentation.
2011-09-07 08:53:30 -04:00
Eric Banks
1ef8a1750a
I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.
2011-09-06 21:07:49 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
3db7ecb920
ReducedRead flag cached in GATKSAMRecord. 20% performance improvement
2011-09-06 15:11:38 -04:00
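The speedup above comes from caching the reduced-read flag rather than recomputing it on every call. The sketch below shows that lazy-caching pattern. `CachedReducedReadFlag` is hypothetical, and the "RR" tag name is an assumption rather than a confirmed GATK constant.

```java
import java.util.Map;

/** Hypothetical read wrapper illustrating a lazily cached flag; not GATKSAMRecord. */
final class CachedReducedReadFlag {
    static final String REDUCED_READ_TAG = "RR"; // assumption: tag marking reduced-read data

    private final Map<String, Object> attributes;
    private Boolean reducedRead;  // null until first queried
    private int attributeLookups; // instrumentation only, to show the lookup happens once

    CachedReducedReadFlag(Map<String, Object> attributes) {
        this.attributes = attributes;
    }

    boolean isReducedRead() {
        if (reducedRead == null) {
            attributeLookups++;
            reducedRead = attributes.containsKey(REDUCED_READ_TAG);
        }
        return reducedRead;
    }

    int attributeLookups() {
        return attributeLookups;
    }
}
```

This pattern is only safe if the underlying attribute is not modified after the first query. Otherwise the cache must be invalidated whenever the tag changes.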
Roger Zurawicki
47607a7eff
Fixed bug where deletions messed up interval clipping
...
- Instead of using readLength, the ReadUtils functions are used to get a proper read coordinate
- Added debug info in interval clipping (with -dl)
NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir
0adb388dee
Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input, meaning they weren't tracked in Queue.
...
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mark DePristo
d471617c65
GATK binary VCF (gvcf) prototype format for efficiency testing
...
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
2011-09-02 21:15:19 -04:00
Mark DePristo
048202d18e
Bugfix for cached quals
2011-09-02 21:13:28 -04:00
Mark DePristo
03aa04e37c
Simple refactoring to make formatting functions public
2011-09-02 21:13:08 -04:00
Mark DePristo
124ef6c483
MISSING_VALUE now gets defaultValue in getAttribute functions
2011-09-02 21:12:28 -04:00
Mark DePristo
82f2131777
Simplified getAttributeAsX interfaces
...
-- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing.
-- Removed getAttributeAsXNoException(key).
-- The only available accessors are now getAttributeAsX(key, default).
-- These accessors properly handle their argument types: if the value is a double it is returned directly by getAttributeAsDouble(), and if it's a string it's converted to a double. If the key isn't found, the default is returned.
2011-09-02 12:27:11 -04:00
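The two commits above leave a single accessor shape, getAttributeAsX(key, default). It returns numeric values directly, parses strings, and falls back to the default for both a missing key and MISSING_VALUE. The sketch below assumes MISSING_VALUE is VCF's "." marker. `AttributeAccess` is hypothetical and is not the VariantContext API.

```java
import java.util.Map;

/** Hypothetical accessors modelled on the getAttributeAsX(key, default) shape; not the GATK API. */
final class AttributeAccess {
    static final String MISSING_VALUE = "."; // assumption: VCF's missing-value marker

    static double getAttributeAsDouble(Map<String, Object> attributes, String key, double defaultValue) {
        Object value = attributes.get(key);
        if (value == null || MISSING_VALUE.equals(value)) {
            return defaultValue; // absent key or explicit "." both fall back to the default
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue(); // already numeric: no parsing needed
        }
        return Double.parseDouble(value.toString()); // e.g. a raw string from the VCF line
    }

    static int getAttributeAsInt(Map<String, Object> attributes, String key, int defaultValue) {
        Object value = attributes.get(key);
        if (value == null || MISSING_VALUE.equals(value)) {
            return defaultValue;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        return Integer.parseInt(value.toString());
    }
}
```

Because the default is a required argument, every call site has to decide what a missing value means, and no exception handling is needed.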
Mauricio Carneiro
08ae6c0c61
ReadClipper is now handling unmapped reads
2011-09-02 11:32:30 -04:00
Mark DePristo
c57198a1b9
Optimizations in VCFCodec
...
-- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead.
-- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed
2011-09-02 08:46:17 -04:00
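The VCFCodec optimizations above return the shared `Collections.emptySet()` for PASS instead of allocating a new set, and return an unmodifiable set for real filters. The sketch below covers only PASS and semicolon-separated filter lists; handling of the "." (unfiltered) value is deliberately left out. `FilterParsing` is hypothetical and is not the VCFCodec code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical FILTER-field parser; "." handling is omitted. */
final class FilterParsing {
    static final String PASS = "PASS";

    static Set<String> parseFilters(String field) {
        if (field.equals(PASS)) {
            return Collections.emptySet(); // shared immutable instance: no allocation per record
        }
        Set<String> filters = new LinkedHashSet<>(Arrays.asList(field.split(";")));
        return Collections.unmodifiableSet(filters); // callers cannot mutate the parsed record
    }
}
```

Both return values are read-only, so callers see consistent behaviour whether or not a record passed.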
Mark DePristo
c3ea96d856
Removing many unused functions of questionable purpose
2011-09-02 08:42:01 -04:00
Eric Banks
d241f0e903
Adding docs for the pcr error rate argument.
2011-09-01 21:57:02 -04:00
Eric Banks
827fe6130c
Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).
2011-09-01 11:40:35 -04:00
Mark DePristo
ac49b8d26b
Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
...
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackBuilder
2011-09-01 10:41:55 -04:00
Mauricio Carneiro
4b5a7046c5
Making ReadLengthDistribution Public
...
Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline.
* generalize it so it works with non-paired-end reads.
* generalize it to work with no read group information
2011-08-31 15:52:28 -04:00
Mauricio Carneiro
7d79de91c5
Merge branch 'master' into rr
2011-08-30 02:50:19 -04:00
Mauricio Carneiro
0cd9438ac2
fixed soft unclipped calculation
...
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
2011-08-30 02:45:29 -04:00
Mauricio Carneiro
fd540592ab
Added RMS calculation for consensus MQ
...
Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.
2011-08-30 02:45:20 -04:00
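The commit message above can be read more than one way. One plausible reading, assumed here, is to compute the root-mean-square mapping quality of the reads at each consensus site, then average those per-site values. `ConsensusMQ` is hypothetical and is not the ReduceReads code.

```java
/** Hypothetical sketch of an RMS-based consensus mapping quality; not the ReduceReads implementation. */
final class ConsensusMQ {
    /** Root mean square of the mapping qualities of the reads at one site. */
    static double rms(int[] mappingQuals) {
        if (mappingQuals.length == 0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (int mq : mappingQuals) {
            sumSq += (double) mq * mq;
        }
        return Math.sqrt(sumSq / mappingQuals.length);
    }

    /** Average of the per-site RMS values across the consensus. */
    static double consensusMQ(int[][] perSiteQuals) {
        if (perSiteQuals.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (int[] site : perSiteQuals) {
            total += rms(site);
        }
        return total / perSiteQuals.length;
    }
}
```

RMS weights high mapping qualities more heavily than a plain mean would. For example, qualities {3, 4} give an RMS of about 3.54 rather than 3.5.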
Mauricio Carneiro
6f9264d2b3
Hard Clipping no longer leaves indels on the tails
...
The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.
2011-08-30 02:44:58 -04:00
Mauricio Carneiro
943876c6eb
Added QUAL/MINVAR parameters to the walker
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
7532be7f5a
Allow clipping after AlignmentEnd if the end is soft clipped.
...
Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
90a1f5e15c
Several bug fixes
...
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.
2011-08-30 02:44:19 -04:00
Mauricio Carneiro
66a8b36cf5
Fixed most indexing bugs
...
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Mark DePristo
1e5001b447
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-29 17:04:21 -04:00
Mark DePristo
3b09d42ed6
Now only prints 1 warning message about duplicate headers in simpleMerge
2011-08-29 14:41:29 -04:00
Eric Banks
c2f0db969b
Don't use the default deletion value from UG if not asking to have it set
2011-08-29 13:48:10 -04:00
Eric Banks
bb7a37e8f2
We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth.
2011-08-29 13:19:35 -04:00
Ryan Poplin
bc252a0d62
misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR
2011-08-29 08:11:31 -04:00
Mark DePristo
a5c65fc133
Debugging information to print out the Query tracks
2011-08-28 18:54:49 -04:00
Mark DePristo
7bf006278d
Moved ResolveHostname to general utils as a static function
2011-08-28 12:04:16 -04:00
Mark DePristo
ccec0b4d73
AnalyzeCovariates uses the general RScript system now
...
-- Convenience constructor for collection for testing
-- callRScript() now accepts Objects not Strings, for convenience
2011-08-27 12:54:13 -04:00
Mark DePristo
1ceb020fae
UnitTests for RScript
2011-08-27 10:50:05 -04:00
Mark DePristo
e37a638e09
Fix for disallowed characters in GATKReportTable
...
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
Mark DePristo
eef1ac415a
Merge branch 'master' into rodTesting
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Eric Banks
9b7512fd94
Just because there's a ref base doesn't mean the VC needs to be padded
2011-08-25 22:42:14 -04:00
Mark DePristo
e01273ca7c
Queue now writes out queueJobReport.pdf
...
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName. This is now analysisName
-- Explicitly added capability to enable/disable individual QFunctions
2011-08-25 16:57:11 -04:00
Eric Banks
09a729da3a
Removing incorrect comment
2011-08-25 15:42:52 -04:00
Eric Banks
8bbef79fc2
Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch.
2011-08-25 15:37:26 -04:00
Ryan Poplin
29c7b10f7b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-24 15:18:58 -04:00
Ryan Poplin
e5008aba00
Output the top two haplotypes as a variant call by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end, taking in reads as a BAM file and writing out variant calls in VCF.
2011-08-24 15:18:44 -04:00
Guillermo del Angel
e618cb1e79
a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes.
...
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integration test for new SelectVariants commands
2011-08-24 12:25:50 -04:00
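The VariantsToTable change above logs the maximum AC at multiallelic sites by default, with a hidden option to log the sum. This saves R from splitting comma-separated strings itself. Below is a minimal sketch of both summaries, assuming the AC value arrives as a raw string such as "3,7,1". `AlleleCountSummary` is hypothetical.

```java
/** Hypothetical helpers reducing a multiallelic AC field to one number; not the VariantsToTable code. */
final class AlleleCountSummary {
    /** Default behaviour described in the commit: the largest per-allele count. */
    static int maxAC(String acField) {
        int max = Integer.MIN_VALUE;
        for (String token : acField.split(",")) {
            max = Math.max(max, Integer.parseInt(token.trim()));
        }
        return max;
    }

    /** Alternative behaviour (the -logACSum idea): the total across alt alleles. */
    static int sumAC(String acField) {
        int sum = 0;
        for (String token : acField.split(",")) {
            sum += Integer.parseInt(token.trim());
        }
        return sum;
    }
}
```

Either way, each row gets a single integer that R can read as a numeric column without further parsing.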
Mark DePristo
28ee6dac41
Fixed spelling mistake
2011-08-24 10:14:45 -04:00
Mark DePristo
569e1a1089
Walker.isDone() aborts execution early
...
-- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.
2011-08-23 16:53:06 -04:00
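The Walker.isDone() hook above lets a walker stop traversal cleanly, for example after a MAX_RECORDS-style limit, instead of calling System.exit(). The sketch below shows the check-before-each-map pattern. `SketchWalker`, `MaxRecordsWalker`, and `Traversal` are hypothetical; the real GATK Walker has a richer map/reduce interface.

```java
/** Hypothetical minimal walker contract; not the GATK Walker class. */
abstract class SketchWalker<T> {
    abstract void map(T record);

    /** Returning true asks the traversal to stop before the next map() call. */
    boolean isDone() {
        return false;
    }
}

final class MaxRecordsWalker extends SketchWalker<String> {
    private final int maxRecords; // hypothetical MAX_RECORDS-style parameter
    private int seen;

    MaxRecordsWalker(int maxRecords) {
        this.maxRecords = maxRecords;
    }

    @Override
    void map(String record) {
        seen++;
    }

    @Override
    boolean isDone() {
        return seen >= maxRecords;
    }

    int seen() {
        return seen;
    }
}

final class Traversal {
    /** Feeds records to the walker, checking isDone() before each call instead of relying on System.exit(). */
    static <T> void traverse(Iterable<T> records, SketchWalker<T> walker) {
        for (T record : records) {
            if (walker.isDone()) {
                break;
            }
            walker.map(record);
        }
    }
}
```

Because the traversal returns normally, any cleanup that follows it, such as flushing writers, still runs.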
Ryan Poplin
a1a1fac9e4
Likelihood engine now gives non-zero likelihoods. Using HMM function that can handle context specific gap open and gap continuation penalties
2011-08-23 13:43:07 -04:00
Guillermo del Angel
6e2552a9ef
Merge fix
2011-08-23 12:40:43 -04:00
Guillermo del Angel
8b7a0b3b62
Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf
2011-08-23 12:40:01 -04:00
Roger Zurawicki
ac36271457
Fixed extra reads showing up in Variable Sites
...
Reads that were not hard clipped for the variable site no longer show up in output file
Walker now uses unclippedStart of Read to determine position in the sliding Window
2011-08-23 11:26:00 -04:00
Mark DePristo
6d6feb5540
Better error message when you cannot determine a ROD type because the file doesn't exist or cannot be read
2011-08-23 10:56:37 -04:00
Mauricio Carneiro
feeab6075f
Merging ReduceReads development with unstable repo
...
It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.
2011-08-22 23:03:03 -04:00
Guillermo del Angel
ee68713267
Further bug fixes to CountVariants: stratifications were wrong when genotypes had no-calls. For example, if we stratified by sample and a sample had a no-call, this no-call was considered a true variant and counts were incorrectly increased
2011-08-22 20:42:47 -04:00
Guillermo del Angel
c270384b2e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-22 20:39:32 -04:00
Guillermo del Angel
8ae24912f4
a) Misc fixes in the Phase1 indel VQSR script
...
b) More R-friendly VariantsToTable printing of AC when there are multiple alt alleles
c) Renamed FixPLOrderingWalker to FixGenotypesWalker and rewrote it: the older code is no longer needed and was replaced with code that turns genotypes with all-zero PLs into no-calls.
2011-08-22 20:39:06 -04:00
Mark DePristo
85c5a6f890
Merge branch 'rodTesting'
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java
2011-08-22 17:43:47 -04:00
Mark DePristo
1eab9be35d
Now with accurate javadoc
2011-08-22 17:25:15 -04:00
Mark DePristo
3612a3501d
info, not warn, about dynamic type determination
2011-08-22 17:24:51 -04:00
Eric Banks
dc42571dd9
Only create the genotype map when necessary
2011-08-22 15:40:36 -04:00
Khalid Shakir
c4c90c8826
Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
...
- Ability to pass separate resident memory reservations and limits. Useful for large pileups of low-pass genome data that sometimes need a high -Xmx6g but usually don't exceed 2-3g of actual heap.
- Fixed jobPriority to work for all job runners. It must now be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Eric Banks
2c24b68a96
Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster.
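One way to decode only the location of a VCF record is to read CHROM and POS from the first two tab-separated columns and never tokenize the rest of the line. The sketch below assumes that approach; it is not the actual DecodeLoc implementation, and VcfLocDecoder is a hypothetical name.

```java
// Hypothetical sketch of decoding only the location of a VCF data line:
// CHROM and POS are read from the first two tab-separated columns and the
// remaining columns (REF, ALT, INFO, genotypes, ...) are never tokenized.
final class VcfLocDecoder {
    static final class Loc {
        final String contig;
        final int position;

        Loc(String contig, int position) {
            this.contig = contig;
            this.position = position;
        }
    }

    private VcfLocDecoder() {}

    static Loc decodeLoc(String line) {
        int firstTab = line.indexOf('\t');
        int secondTab = firstTab < 0 ? -1 : line.indexOf('\t', firstTab + 1);
        if (secondTab < 0) {
            throw new IllegalArgumentException("Not a VCF data line: " + line);
        }
        String contig = line.substring(0, firstTab);
        int position = Integer.parseInt(line.substring(firstTab + 1, secondTab));
        return new Loc(contig, position);
    }
}
```

Indexing only needs each record's location, so skipping genotype parsing for wide multi-sample lines is where the savings would come from.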
2011-08-22 15:11:21 -04:00
Eric Banks
518b3dd291
Don't let the genotypes map be null
2011-08-22 15:10:30 -04:00
Ryan Poplin
f93a554b01
updating exome specific parameters in MDCP
2011-08-21 10:25:36 -04:00
Ryan Poplin
dbff84c54e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-21 10:09:19 -04:00
Khalid Shakir
22ca44c015
Fixed Queue's tagging of RodBindings.
...
Fixed argument definition names.
2011-08-21 02:34:20 -04:00
Eric Banks
a8cbced71b
Bug fix for Ryan: check for no context
2011-08-20 22:49:51 -04:00
Eric Banks
0ccd173967
Fixing the recent SelectVariants fix
2011-08-20 21:30:08 -04:00
Ryan Poplin
b008676878
fixing the previous fix
2011-08-20 21:21:55 -04:00
Guillermo del Angel
782453235a
Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
...
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary; needs more investigation): using the -disc option with sites-only VCFs produced a NullPointerException, caused by the recent introduction of the -xl_sf options.
2011-08-20 12:24:22 -04:00
Ryan Poplin
539e157ecd
Fixing misc parameters in MDCP. The pipeline now runs VariantEval on its output by default. Fix for NaN vqslod values in VQSR
2011-08-20 11:28:48 -04:00
Guillermo del Angel
4939648fd4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-20 08:50:43 -04:00
Ryan Poplin
a96ecbab71
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-19 19:30:05 -04:00
Ryan Poplin
ddb5045e14
Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.
2011-08-19 19:29:51 -04:00
Mark DePristo
8b3cfb2f1c
Final documented version of GATKDoclet and associated classes
...
-- Docs on everything.
-- Feature complete. At this point only minor improvements and bugfixes are anticipated
2011-08-19 16:52:17 -04:00
Mark DePristo
b08d63a6b8
Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
...
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mark DePristo
49e831a13b
Should have checked in
2011-08-19 14:35:16 -04:00
Mauricio Carneiro
7b5fa4486d
GenotypeAndValidate - Added docs to the @Arguments
2011-08-19 13:35:11 -04:00
Mark DePristo
9f7d4beb89
Merge branch 'help'
2011-08-19 13:14:02 -04:00
Mark DePristo
4d1fd17a97
GATKDoclet cleanup and documentation
...
-- Fixed bug in the way ArgumentCollections were handled that led to a failure when handling the dbsnp argument collection.
2011-08-19 13:13:41 -04:00
Ryan Poplin
0f25167efd
minor fix in VariantEval docs
2011-08-19 11:01:04 -04:00
Mark DePristo
198955f752
GATKDoc descriptions for all standard codecs, or TODO for their owners
...
-- Also added vcf.gz support in the VCF codec. This wasn't committed in the last round, because it was missed by the parallel documentation effort.
2011-08-19 09:57:21 -04:00
Guillermo del Angel
269ed1206c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-19 09:32:20 -04:00
Eric Banks
40e67cff1b
I like the @Advanced annotation
2011-08-18 22:27:34 -04:00
Mark DePristo
2457c7b8f5
Merge branch 'master' into help
2011-08-18 22:20:43 -04:00
Mark DePristo
5fbdf968f7
ArgumentSource no longer comparable. Arguments sorted by GATKDoclet
2011-08-18 22:20:14 -04:00
Eric Banks
77fa2c1546
Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter).
2011-08-18 22:01:33 -04:00
Mark DePristo
1d3799ddf7
Merge branch 'master' into help
2011-08-18 22:00:29 -04:00
Mark DePristo
d1892cd0d7
Bug fixes
...
-- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap
-- Fixed broken exception throwing in the case where a file's type could not be determined
2011-08-18 21:58:36 -04:00
Mark DePristo
c5efb6f40e
Usability improvements to GATKDocs
...
-- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan)
-- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKDocs. There's now an advanced section. Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?
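Case-insensitive ordering of argument names can be done with the JDK's String.CASE_INSENSITIVE_ORDER comparator. The helper below is a hypothetical sketch over plain name strings, not the GATKDoclet code that sorts ArgumentSources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: order argument names alphabetically regardless of case,
// so that e.g. "alpha" is not pushed after "Zeta" by plain String ordering.
final class ArgumentOrdering {
    private ArgumentOrdering() {}

    static List<String> sortCaseInsensitive(List<String> names) {
        List<String> sorted = new ArrayList<>(names); // leave the caller's list untouched
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }
}
```

With natural String ordering, every uppercase initial sorts before every lowercase one, which is why "Zeta" would otherwise come first.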
2011-08-18 21:39:11 -04:00
Mark DePristo
d94da0b1cf
Moved CG and SOAP codecs to private
2011-08-18 21:20:26 -04:00
Mark DePristo
f7414e39bc
Improvements to GATKDocs
...
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
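The "longest name up to 30 characters" rule can be sketched as follows. DisplayNameChooser is hypothetical, and the fallback when no alias fits (use the shortest one) is an assumption; the commit does not say what happens in that case.

```java
// Hypothetical helper: pick the name shown in the main argument list as the
// longest alias that fits in 30 characters. The fallback when no alias fits
// (use the shortest one) is an assumption, not something the commit states.
final class DisplayNameChooser {
    static final int MAX_DISPLAY_LENGTH = 30;

    private DisplayNameChooser() {}

    static String choose(String... aliases) {
        String best = null;
        for (String alias : aliases) {
            if (alias.length() <= MAX_DISPLAY_LENGTH
                    && (best == null || alias.length() > best.length())) {
                best = alias;
            }
        }
        if (best == null) { // no alias fits: fall back to the shortest
            for (String alias : aliases) {
                if (best == null || alias.length() < best.length()) {
                    best = alias;
                }
            }
        }
        return best;
    }
}
```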
2011-08-18 21:20:09 -04:00
Ryan Poplin
09d099cada
Added GATKDocs to the UnifiedGenotyper.
2011-08-18 20:57:02 -04:00
Mauricio Carneiro
6ef01e40b8
Complete rewrite of Hard Clipping (ReadClipper)
...
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Guillermo del Angel
626cbf9411
Bug fixes and cleanups for IndelStatistics
2011-08-18 16:28:40 -04:00
Guillermo del Angel
58560a6d50
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 16:17:52 -04:00
Guillermo del Angel
3dfb60a46e
Fixing up and refactoring usage of indel categories. isInsertion() and isDeletion() have been removed from VariantContext because their behavior was wrong at multiallelic sites. The new methods isSimpleInsertion() and isSimpleDeletion() return true only if the site is biallelic; at multiallelic sites, isComplex() returns true in all cases.
...
VariantEval module CountVariants is corrected, and an additional column is added so that mixed events and complex indels are logged separately (before, they were being conflated).
VariantEval module IndelStatistics is considerably simplified because its sample stratification was wrong and redundant; it should now work with the VE-generic Sample stratification. Several columns were renamed or removed because they weren't really useful.
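The classification rules above (simple only when biallelic, complex for any multiallelic indel site) can be sketched with plain allele strings. This is a hypothetical illustration, not the VariantContext implementation; comparing allele lengths alone is a simplification.

```java
import java.util.List;

// Hypothetical classification following the rules in this commit message:
// a site is a "simple" insertion or deletion only when it is biallelic, and
// any multiallelic indel site is treated as complex. Alleles are plain
// strings here; this is not the VariantContext implementation.
enum IndelCategory { SIMPLE_INSERTION, SIMPLE_DELETION, COMPLEX }

final class IndelClassifier {
    private IndelClassifier() {}

    /** Assumes the caller already knows the site is an indel site. */
    static IndelCategory classify(String ref, List<String> alts) {
        if (alts.isEmpty()) {
            throw new IllegalArgumentException("An indel site needs at least one alt allele");
        }
        if (alts.size() > 1) {
            return IndelCategory.COMPLEX; // multiallelic: never "simple"
        }
        int refLength = ref.length();
        int altLength = alts.get(0).length();
        if (altLength > refLength) {
            return IndelCategory.SIMPLE_INSERTION;
        }
        if (altLength < refLength) {
            return IndelCategory.SIMPLE_DELETION;
        }
        throw new IllegalArgumentException("Equal-length alleles are not an indel");
    }
}
```

Under the old behavior, a site with one insertion allele and one deletion allele could not be answered correctly by a single isInsertion()/isDeletion() flag, which is why such sites now fall into the complex category.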
2011-08-18 16:17:38 -04:00
Chris Hartl
6b256a8ac5
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git
2011-08-18 15:29:24 -04:00
Chris Hartl
a8935c99fc
Adding docs for DepthOfCoverage and ValidationAmplicons
2011-08-18 15:28:35 -04:00
Mark DePristo
f2f51e35e3
Merge branch 'master' into help
2011-08-18 14:05:33 -04:00
Mark DePristo
faa3f8b6f6
Only concrete classes are now documented
2011-08-18 14:04:47 -04:00
Ryan Poplin
7c4ce6d969
Added GATKDocs for the VQSR walkers.
2011-08-18 14:00:39 -04:00
Mark DePristo
5772766dd5
Improvements to GATKDocs
...
-- Now supports a static list of root classes / interfaces that should receive docs. A complementary approach to documenting features to the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- Sub- and superclasses are no longer displayed
2011-08-18 14:00:09 -04:00
Mark DePristo
e03db30ca0
Now uses DocumentedGATKFeatureObject instead of the annotation directly
...
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo
d4511807ed
Merge branch 'master' into help
2011-08-18 11:53:37 -04:00
Mark DePristo
c787fd0b70
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 11:52:45 -04:00
Mark DePristo
c797616c65
If you have one sample in your BAM, getToolkit().getSamples().size() == 2
...
Also deleted a double initialization, where a line of code was duplicated when creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo
cbec69a130
Merge branch 'master' into help
...
Conflicts:
public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks
aa21fc7c9c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 11:30:59 -04:00
Mark DePristo
f5d7cabb20
Fix for reintroducing an already solved problem.
2011-08-18 11:20:12 -04:00
Eric Banks
a45498150a
Remove non-ascii char
2011-08-18 11:18:29 -04:00
Ryan Poplin
c08a9964d4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 10:58:04 -04:00
Ryan Poplin
bb79d3edae
Added GATKDocs for the BQSR walkers.
2011-08-18 10:57:48 -04:00
Mark DePristo
47bbddb724
Now provides type-specific user feedback
...
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo
2d41ba15a4
Vastly better Tribble help message
...
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR BED BEDFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR CGVAR VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR DBSNP DbSNPFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR MAF MafFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR SNPEFF SnpEffFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR SOAPSNP VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR TABLE TableFeature http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR VCF VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo
c2287c93d7
Cleanup of codec locations. No more dbSNPHelper
...
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper. rsID function now in VCFutils. Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo
9c17d54cb6
getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error
2011-08-18 09:39:20 -04:00
Mark DePristo
c30e1db744
Better location for help utils
2011-08-18 09:38:51 -04:00
Mark DePristo
4da42d9f39
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 09:32:57 -04:00
Eric Banks
c91a442be1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 22:40:16 -04:00
Eric Banks
a7b70e6bb4
Adding feature for Khalid: ability to exclude particular samples.
2011-08-17 22:28:22 -04:00
Mauricio Carneiro
cc3df8f11a
Moving GAV walker to public
...
Walker is updated to the new RodBinding system and has the new GATKDocs layout.
2011-08-17 21:55:17 -04:00
Eric Banks
fa1db3913b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 21:49:25 -04:00
Eric Banks
8e83b6646b
Bug fix for Chris: don't validate ref base for complex events.
2011-08-17 21:49:14 -04:00
Matt Hanna
c104dd7a09
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 16:59:12 -04:00
Matt Hanna
81a792afeb
Reverting optimization disable in unstable.
2011-08-17 16:58:24 -04:00
Mark DePristo
2e35592295
GATKDocs for CallableLoci
2011-08-17 16:32:01 -04:00
Guillermo del Angel
c193f52e5d
Fixed up examples: those pasted from the wiki still used the old rod syntax
2011-08-17 16:29:45 -04:00
Matt Hanna
2b2a4e0795
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-17 16:26:45 -04:00
Matt Hanna
297c9e513c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable
2011-08-17 16:24:02 -04:00
Matt Hanna
a210a62ab9
Merged bug fix from Stable into Unstable
2011-08-17 16:23:31 -04:00
Mark DePristo
d59e6ed274
Fix for RefSeqCodec bug and better error messages
...
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class. Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
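The compile-time check suggested above can be illustrated with a codec interface parameterized by its feature type. All types here are hypothetical and are not Tribble's actual Feature or FeatureCodec interfaces.

```java
// Hypothetical types (not Tribble's): parameterising the codec by its feature
// type lets the compiler check what getFeatureClass() returns.
interface SketchFeature {
    String getChr();
}

final class RefSeqLikeFeature implements SketchFeature {
    @Override
    public String getChr() {
        return "1";
    }
}

interface SketchCodec<T extends SketchFeature> {
    Class<T> getFeatureClass();
}

final class RefSeqLikeCodec implements SketchCodec<RefSeqLikeFeature> {
    @Override
    public Class<RefSeqLikeFeature> getFeatureClass() {
        // Writing "return RefSeqLikeCodec.class;" here is a compile-time error,
        // which is exactly the mistake described in the commit message.
        return RefSeqLikeFeature.class;
    }
}
```

With a raw Class return type the same mistake compiles and only surfaces at runtime.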
2011-08-17 16:22:07 -04:00
Matt Hanna
d170187896
Disable an optimization that slightly increases the speed of the GATK but
...
can cause data loss in a narrow corner case where the BGZF block locations
and offsets in the last index bucket of contig n overlap exactly with the BGZF
block locations and offsets in the last index bucket of contig n+1.
A proper fix that keeps the optimization has already been introduced into
unstable, but disabling the optimization is a low risk way to make sure that
users of stable experience no data loss.
2011-08-17 16:16:05 -04:00
David Roazen
53006da9a5
Improved descriptions for the SnpEff annotations in the VCF header
...
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Guillermo del Angel
784fb148b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 15:47:01 -04:00
Guillermo del Angel
671330950d
Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden
2011-08-17 15:46:31 -04:00
Andrey Sivachenko
0af68e052a
Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 15:17:47 -04:00
Andrey Sivachenko
a423546cdd
fix: RefSeq contains records with zero coding length, and the RefSeq codec/feature used to crash on them; now such records are ignored and a warning is printed (once)
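A warn-once filter for such records might look like the sketch below. The class is hypothetical, it assumes zero coding length means cdsStart == cdsEnd in half-open coordinates, and it collects warnings in a list instead of printing them so the behavior is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: records whose coding region has zero length
// (cdsStart == cdsEnd, using half-open coordinates as an assumption) are
// skipped, and only the first skip produces a warning.
final class ZeroCodingLengthFilter {
    private boolean warned = false;
    final List<String> warnings = new ArrayList<>();

    /** Returns true if the record should be kept. */
    boolean accept(String transcriptName, int cdsStart, int cdsEnd) {
        if (cdsEnd > cdsStart) {
            return true;
        }
        if (!warned) {
            warnings.add("Ignoring RefSeq record(s) with zero coding length; first seen: " + transcriptName);
            warned = true;
        }
        return false;
    }
}
```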
2011-08-17 15:17:31 -04:00
Andrey Sivachenko
710d34633e
now the reads that are too long are truly ignored (fix of the fix)
2011-08-17 15:16:23 -04:00
Eric Banks
2f19046f0c
Adding docs to the 2 beasts. Saved the worst for last.
2011-08-17 14:19:14 -04:00
Andrey Sivachenko
069554efe5
The somatic indel detector no longer dies on reads that are too long (likely containing a huge deletion); instead it prints a warning and ignores the read
2011-08-17 14:05:19 -04:00
Eric Banks
c405a75f54
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 13:28:25 -04:00
Eric Banks
575303ae6b
Renaming for consistency and bringing up to speed with new rod system
2011-08-17 13:28:19 -04:00
Eric Banks
6d629c176c
Adding docs
2011-08-17 13:27:36 -04:00
Eric Banks
a21e193a9e
Adding docs to 3 more walkers
2011-08-17 12:35:08 -04:00
Menachem Fromer
98acb546a9
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 12:22:29 -04:00
Menachem Fromer
d1bb302d12
Added GatkDocs documentation
2011-08-17 12:21:37 -04:00
Mark DePristo
3da71a9bb6
Clean up summary
2011-08-17 12:04:45 -04:00
Mark DePristo
c6fb215faf
GATKDocs for VariantsToTable
...
-- Made a previously required argument optional, as this was a long-standing bug
2011-08-17 12:02:41 -04:00
Mark DePristo
5f794d16a7
Fixed bad character in documentation
2011-08-17 12:01:08 -04:00
Mark DePristo
9d1d5bd27a
Revert "Fixed bad character in documentation"
...
This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.
2011-08-17 11:57:31 -04:00
Mark DePristo
78deb3f195
Fixed bad character in documentation
2011-08-17 11:57:00 -04:00
Mark DePristo
79dcfca25f
Fixed bad character in documentation
2011-08-17 11:56:51 -04:00
Eric Banks
b3b5d608ca
Adding docs to yet more walkers
2011-08-17 09:57:19 -04:00
Eric Banks
fadcbf68fd
Adding docs to QC walkers
2011-08-17 09:39:33 -04:00
Mauricio Carneiro
5d6a6fab98
Renamed softUnclipped functions to refCoord*
...
These functions return reference coordinates, so they should be named accordingly.
2011-08-16 18:56:28 -04:00
Mauricio Carneiro
ed8f769dce
Fixed index for getSoftUnclippedEnd()
...
The unclipped end can be calculated simply by looking at the last cigar element and adding its length if it is a soft clip.
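The calculation can be sketched on a CIGAR string with java.util.regex. This hypothetical helper also skips a trailing hard clip that follows the soft clip (e.g. "50M10S5H", which the SAM format allows), an extra case the commit message does not mention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compute the soft-unclipped end from the aligned end and
// the CIGAR string by adding the length of a trailing soft clip, if any.
// A trailing hard clip after the soft clip (e.g. "50M10S5H") is skipped,
// since hard-clipped bases are no longer present in the read.
final class SoftUnclippedEnd {
    private static final Pattern TRAILING_SOFT_CLIP = Pattern.compile("(\\d+)S(?:\\d+H)?$");

    private SoftUnclippedEnd() {}

    /** alignmentEnd is the 1-based, inclusive reference position of the last aligned base. */
    static int compute(int alignmentEnd, String cigar) {
        Matcher m = TRAILING_SOFT_CLIP.matcher(cigar);
        return m.find() ? alignmentEnd + Integer.parseInt(m.group(1)) : alignmentEnd;
    }
}
```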
2011-08-16 18:54:28 -04:00
Eric Banks
5f3f46aad1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-16 16:26:33 -04:00
Eric Banks
946f5c53fe
Adding docs to more walkers
2011-08-16 16:26:26 -04:00
Mark DePristo
6e828260a0
Removed -B support. Now explodes with error if -B provided.
2011-08-16 16:13:47 -04:00
Ryan Poplin
2d5bbecd9e
Merged bug fix from Stable into Unstable
2011-08-16 14:19:04 -04:00
Mauricio Carneiro
07c1e113cd
Fixed interval traversal for previously hard clipped reads.
...
If a read was hard clipped for being low quality and no longer overlaps the interval, it will now be discarded instead of being treated as an error by the GATK traversal engine.
2011-08-16 14:18:05 -04:00
Ryan Poplin
9d4add3268
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-16 14:18:03 -04:00
Ryan Poplin
170d1ff7b6
Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode
2011-08-16 14:17:46 -04:00
Mauricio Carneiro
b135565183
Added low quality clipping
...
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
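Clipping both low-quality tails amounts to scanning inward from each end. In this hypothetical sketch a base counts as low quality when its quality is at most the threshold, following the "lowQual <= 2" wording of the earlier tail-clipping commit; whether the real ReadClipper uses < or <= is an assumption.

```java
// Hypothetical sketch: find the bases that survive clipping low-quality
// tails from both ends of a read. A base counts as low quality when its
// quality is <= lowQual (the "<=" comparison is an assumption).
final class LowQualityTailClipper {
    static final byte DEFAULT_LOW_QUAL = 2;

    private LowQualityTailClipper() {}

    /**
     * Returns {start, end} (end exclusive) of the kept bases. When every base
     * is low quality the range is empty (start == end == quals.length), which
     * a caller must handle as a completely clipped read.
     */
    static int[] keptRange(byte[] quals, byte lowQual) {
        int start = 0;
        while (start < quals.length && quals[start] <= lowQual) {
            start++;
        }
        int end = quals.length;
        while (end > start && quals[end - 1] <= lowQual) {
            end--;
        }
        return new int[] {start, end};
    }
}
```

The empty range is the case that needs the special treatment mentioned above, since clipping away every base leaves nothing to emit.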
2011-08-16 13:51:25 -04:00
Andrey Sivachenko
9f3328db53
fixing read group name collision: before writing a read to its respective stream in nway-out mode, we now retrieve the original read group, not the merged/modified one
2011-08-16 13:45:40 -04:00
Eric Banks
ab0b56ed11
Minor doc fixes
2011-08-16 12:55:45 -04:00
Eric Banks
125ad0bcfa
Added docs to RTC
2011-08-16 12:46:48 -04:00
Eric Banks
ef9216011e
Added docs to IR
2011-08-16 12:24:53 -04:00
Eric Banks
ab1e3d6a98
Use the right set of sample names
2011-08-16 01:03:05 -04:00
Eric Banks
36c7f83208
Refactoring VE stratifications so that they don't pass around bulky data; instead just pull needed data from the VE parent. This allows us to stop using deprecated features of the rod system.
2011-08-15 16:31:57 -04:00
Eric Banks
1246b89049
Forgot to initialize variants on the merge
2011-08-15 16:00:43 -04:00
Mauricio Carneiro
993ecb85da
Added Hard Clipping Tail Ends
...
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Eric Banks
045e8a045e
Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker
2011-08-15 14:05:23 -04:00
Eric Banks
fc2c21433b
Updating random walkers to new rod system
2011-08-15 13:29:31 -04:00
Eric Banks
3d56bbf087
Resolving merge conflicts
2011-08-15 12:28:05 -04:00
Eric Banks
9ddbfdcb9f
Check filtered status before applying to alt reference
2011-08-15 12:25:23 -04:00
Mauricio Carneiro
0d976d6211
Fixed second time clipping
...
When a read was clipped once and then, in a second operation, did not reach the coordinate initially set for hard clipping because of indels, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro
489c15b99d
Fixed indexing issue in coordinate conversion
...
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as a reference coordinate for clipping, because the read does not extend that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro
c7b69a4574
Fixed integration tests
2011-08-14 16:38:20 -04:00
Mauricio Carneiro
6ae3f9e322
Wrapped clipping op information
...
The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim.
(This is a quick commit to unbreak the build; I am running the integration tests and will make further commits if necessary.)
2011-08-14 15:44:48 -04:00
Mauricio Carneiro
8a51732049
Fixes to ReadClipper and added Reference Coordinate clipping.
...
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* Soft clipping was messing up the cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro
291d8c7596
Fixed HardClipping and Interval containment
...
* Hard clipping was wrongly hard clipping unmapped reads while soft clipping and then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads, and use the soft->hard clip procedure for every mapped read.
* Interval containment needed <= and >= comparisons so that it handles the borders correctly.
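Border-inclusive containment for 1-based, closed intervals can be sketched as below. ClosedInterval is a hypothetical type for illustration, not the GATK's GenomeLoc.

```java
// Hypothetical interval type with 1-based, closed coordinates: both borders
// belong to the interval, so containment uses >= and <=; strict comparisons
// would wrongly exclude reads that touch the first or last base.
final class ClosedInterval {
    final String contig;
    final int start;
    final int end;

    ClosedInterval(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    boolean containsPosition(String otherContig, int position) {
        return contig.equals(otherContig) && position >= start && position <= end;
    }

    boolean contains(ClosedInterval other) {
        return contig.equals(other.contig) && other.start >= start && other.end <= end;
    }
}
```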
2011-08-14 14:54:33 -04:00
Mauricio Carneiro
0be1dacddb
Refactored interval clipping utility
...
Reads are clipped in map(), and we now cover almost all cases. The case where a read stretches across two intervals is left out; it will need special treatment later.
2011-08-14 14:54:33 -04:00
David Roazen
bb4ced3201
SnpEff-related fixes.
...
-To correctly handle indels and MNPs, only consider features that start at the current locus,
rather than features that span the current locus, when selecting the most significant effect.
-Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding
any annotations and silently returning.
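The "starts at" versus "spans" distinction can be shown with a small filter over features that overlap the current locus. StartingFeatureSelector and its Feature type are hypothetical, not the SnpEff annotator's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: among features overlapping the current locus, keep only
// those that start there. A deletion spanning 100-102 overlaps locus 101 but
// starts at 100, so it is not re-considered when annotating 101.
final class StartingFeatureSelector {
    static final class Feature {
        final String effect;
        final int start;
        final int end;

        Feature(String effect, int start, int end) {
            this.effect = effect;
            this.start = start;
            this.end = end;
        }
    }

    private StartingFeatureSelector() {}

    static List<Feature> startingAt(List<Feature> overlapping, int locus) {
        List<Feature> result = new ArrayList<>();
        for (Feature f : overlapping) {
            if (f.start == locus) {
                result.add(f);
            }
        }
        return result;
    }
}
```

The most significant effect would then be chosen only from the returned list.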
2011-08-12 15:26:24 -04:00
Mauricio Carneiro
10e873d9c6
Merge branch 'repval'
2011-08-12 15:24:31 -04:00
Guillermo del Angel
31dc831531
Merged bug fix from Stable into Unstable
2011-08-12 13:26:41 -04:00
Menachem Fromer
9121b8ed65
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-12 12:24:19 -04:00
Menachem Fromer
7ed120361d
Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles
2011-08-12 12:23:44 -04:00
Eric Banks
7ea9196321
Better error message for name/type clashes.
2011-08-12 11:18:14 -04:00
Eric Banks
27f0748b33
Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature).
2011-08-12 11:11:56 -04:00
Menachem Fromer
c7ca33cbff
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-12 10:12:09 -04:00
Eric Banks
41f3da75d7
Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests.
2011-08-12 02:22:44 -04:00
Eric Banks
eba316621d
Finish moving VE over to new rod system and fixing up the type inconsistency between eval and comp rods. Now the novel count is always 0 under the known stratification. :)
2011-08-12 00:40:08 -04:00
Menachem Fromer
9de06560df
Update to new RodBinding system
2011-08-11 17:54:16 -04:00
Eric Banks
90771b74b4
When matching eval to comps, try to choose the one with the same alt allele.
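Preferring the comp with the same alt allele can be sketched as a first-match search. CompChooser is hypothetical, and falling back to the first comp when nothing matches is an assumption the commit does not state.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: when several comp records sit at the eval's site,
// prefer the one with the same alt allele. Falling back to the first comp
// when none matches is an assumption, not taken from the commit.
final class CompChooser {
    static final class Comp {
        final String id;
        final String alt;

        Comp(String id, String alt) {
            this.id = id;
            this.alt = alt;
        }
    }

    private CompChooser() {}

    static Optional<Comp> choose(String evalAlt, List<Comp> comps) {
        for (Comp c : comps) {
            if (c.alt.equals(evalAlt)) {
                return Optional.of(c);
            }
        }
        return comps.isEmpty() ? Optional.empty() : Optional.of(comps.get(0));
    }
}
```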
2011-08-11 13:55:01 -04:00
Eric Banks
200f73b008
No reason to warn the user anymore because it's no longer possible for them to specify a dbsnp file on the command-line.
2011-08-11 13:44:07 -04:00
Eric Banks
e93538cdf7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 13:39:36 -04:00
Eric Banks
265c3d744b
Fixing VariantEval logic and having it use the new rod system.
2011-08-11 13:39:34 -04:00
Ryan Poplin
b705d9cf15
Oops, these VariantAnnotator input bindings aren't needed during the UG
2011-08-11 13:17:16 -04:00
Ryan Poplin
7fade88070
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 11:02:47 -04:00
Ryan Poplin
c7b9a9ef0a
Updating UnifiedGenotyper to use the new rod binding system.
2011-08-11 11:02:11 -04:00
Mark DePristo
418a4d541f
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 11:01:38 -04:00
Mark DePristo
e71255d3c2
GATKDocsExample walker
...
-- Shows the best practice for documenting a walker with the GATKdocs
-- See http://www.broadinstitute.org/gsa/wiki/index.php/GATKdocs#Writing_GATKdocs_for_your_walkers for a brief discussion
2011-08-11 11:01:21 -04:00
Ryan Poplin
79c86e211f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 09:59:20 -04:00
Ryan Poplin
ea42ee4a95
Updating BQSR for the new rod binding system.
2011-08-11 09:58:42 -04:00
Mark DePristo
8cdc0cbd9c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 08:58:49 -04:00
Mark DePristo
40e06f9afb
Fixed broken RodBinding defaults.
...
-- Verified now to be correct at runtime
-- UnitTest covers this
-- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.
2011-08-11 08:58:30 -04:00
Ryan Poplin
dd5fe8291d
Fixing up some comments in the BQSR
2011-08-11 08:36:00 -04:00
Eric Banks
f1b09db39e
Fixes for rod bindings
2011-08-10 23:08:47 -04:00
Eric Banks
75985c2fa0
Resolving merge conflicts
2011-08-10 22:45:11 -04:00
Eric Banks
bdb1da30fd
Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system.
2011-08-10 22:43:08 -04:00
Mark DePristo
0086e27741
makeUnbound now package protected
...
-- Removed references to it in the codebase
-- Fixed documentation I saw that had the summary + body style
2011-08-10 22:29:32 -04:00
Mark DePristo
cb6cf25bb0
Updating SelectVariants documentation to reflect best practice
2011-08-10 22:24:18 -04:00
Mark DePristo
00b4d6ec57
Updated the best practice on documenting a field
...
-- Best practice is now to skip the summary, as this is the @annotation doc value.
2011-08-10 22:21:12 -04:00
Mark DePristo
2007d2fcad
Better documentation for default value fields
...
-- DocString function for types that create default outputs "stdout"
-- RodBinding now creates a makeUnbound default value automatically for you if your RodBinding isn't required
-- Removed warning about sparse help from TextFormattingUtils
2011-08-10 22:16:22 -04:00
Mauricio Carneiro
bb557266ca
Merge branches to get new RodBinding framework
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/replication_validation/ReplicationValidationWalker.java
2011-08-10 18:23:01 -04:00
Guillermo del Angel
8325cb8c26
Fixing up an apparent source control/merge snafu: the fix to make UG output the correct PL ordering at multiallelic sites was only half-committed and hence not working. This completes the fix.
2011-08-10 15:31:49 -04:00
Eric Banks
07ad8c78a9
More tools moved over. Fixed the VariantContextIntegrationTest, which was not useful because the md5s had all been removed. In the future, instead of removing md5s (putting the test in 'parameterization' mode), you should use @Test(enabled = false), since it's easier to track.
2011-08-10 14:24:40 -04:00
Eric Banks
8d14d32a62
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-10 13:42:37 -04:00
Eric Banks
749c8bfbcd
Moving more tools over to the new rod system
2011-08-10 13:42:35 -04:00
David Roazen
0497170bc9
SnpEffCodec now implements SelfScopingFeatureCodec so that we no longer have to specify the codec name on the command line for SnpEff files.
2011-08-10 13:12:09 -04:00
David Roazen
577f861f69
Pass the rodBindings into the VariantAnnotator engine, and from there to the
...
annotation classes themselves.
2011-08-10 13:11:57 -04:00
David Roazen
480e7a7984
Correctly initialize the optional SnpEff rod binding in VariantAnnotator using
...
RodBinding.makeUnbound()
2011-08-10 12:25:26 -04:00
Eric Banks
a42f90db11
Moving more tools over to use the standard VC arg collection. Also, while I'm in there, I removed all of the empty references to @Requires given that it's no longer relevant.
2011-08-10 12:20:18 -04:00
Eric Banks
c884b6bf1f
Fixed comment
2011-08-10 12:07:43 -04:00
Eric Banks
06cdc4d5f9
Added a StandardVariantContextInputArgumentCollection that is now used for consistency by many of the core tools.
2011-08-10 12:00:56 -04:00
Ryan Poplin
bc125f104a
TrainingSets class is obsolete now.
2011-08-10 10:23:33 -04:00
Ryan Poplin
c60cf52f73
Updating VQSR for new RodBinding syntax. Cleaning up indel specific parts of VQSR.
2011-08-10 10:20:37 -04:00
Eric Banks
1ea5ec276b
Minor cleanup
2011-08-09 23:28:59 -04:00
Eric Banks
bc2d4f554d
Bringing Indel Realigner up to speed with the new rod binding syntax; now use -known to specify the known indels track.
2011-08-09 23:21:17 -04:00
Eric Banks
b8f572b571
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-09 23:19:51 -04:00
Eric Banks
08631546c8
Partial commit for David so he can see what I want to do with the VariantAnnotator. Added a DbsnpArgumentCollection that people can use in their walkers to ensure that we have a standard syntax whenever allowing dbsnp rods. Added it to UG, but didn't hook it up. Maybe we should do the same for the 'variant' rod?
2011-08-09 23:19:40 -04:00
Mark DePristo
86afe878a7
ReducedRead optimization: single pass likelihood calculation
...
-- Low level add() now takes an nObs argument and, rather than += likelihood, now does += nObs * likelihood
2011-08-09 20:55:15 -04:00
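A minimal sketch of the single-pass idea in the ReducedRead commit above, assuming the accumulated values are log10 likelihoods (which is why weighting is a multiplication by nObs rather than a power). The class and method names (GenotypeLikelihoodAccumulator, add) are hypothetical and not the GATK's actual API.

```java
import java.util.Arrays;

/** Hypothetical accumulator of per-genotype log10 likelihoods (illustrative only). */
class GenotypeLikelihoodAccumulator {
    private final double[] log10Likelihoods;

    GenotypeLikelihoodAccumulator(int nGenotypes) {
        this.log10Likelihoods = new double[nGenotypes];
    }

    /**
     * Adds one observation's log10 likelihood for a genotype, weighted by the number
     * of identical observations it stands for. In log space, nObs independent identical
     * observations contribute nObs * log10Likelihood.
     */
    void add(int genotypeIndex, double log10Likelihood, int nObs) {
        if (nObs < 1) {
            throw new IllegalArgumentException("nObs must be >= 1");
        }
        log10Likelihoods[genotypeIndex] += nObs * log10Likelihood;
    }

    /** Returns a defensive copy of the accumulated values. */
    double[] get() {
        return Arrays.copyOf(log10Likelihoods, log10Likelihoods.length);
    }
}
```

One call with nObs = 5 gives the same result as five calls with nObs = 1, which is what lets a reduced read be processed in a single pass.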
Eric Banks
489e5cffc1
Missed a few 'variants'
2011-08-09 14:29:15 -04:00
Eric Banks
b20c4d5286
Thanks to Mark for agreeing to transition from 'variants' back to 'variant'. I think I got them all but I've been jumping all around the code, so there might be a straggler or two.
2011-08-09 12:04:55 -04:00
Eric Banks
78aa6db076
added the 'reference' header line too. We are now header-compliant with VCF 4.1.
2011-08-09 11:45:54 -04:00
Eric Banks
ec76bf6d4a
VCF headers now include 'contig' lines describing the name, length, and assembly (when easily parsable) for each contig in the reference.
2011-08-09 11:24:48 -04:00
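For reference, VCF 4.1 contig meta-information lines take the form ##contig=<ID=...,length=...,assembly=...>. The helper below is a hypothetical sketch (ContigHeaderLines is not a GATK class) showing how such a line could be rendered, omitting the assembly key when it can't be parsed; the contig name and length in the usage are made up.

```java
/** Hypothetical helper that renders VCF 4.1 ##contig header lines (illustrative only). */
class ContigHeaderLines {
    /**
     * Builds a ##contig line for one reference contig. The assembly key is only
     * written when a non-empty value is known.
     */
    static String contigLine(String id, long length, String assembly) {
        StringBuilder sb = new StringBuilder("##contig=<ID=")
                .append(id)
                .append(",length=")
                .append(length);
        if (assembly != null && !assembly.isEmpty()) {
            sb.append(",assembly=").append(assembly);
        }
        return sb.append('>').toString();
    }
}
```

For example, contigLine("chrTest", 1000L, null) yields ##contig=<ID=chrTest,length=1000>.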
Eric Banks
7afb5c9f1c
More updates to be consistent with the new rod syntax.
2011-08-09 10:11:37 -04:00
Eric Banks
70b3daf689
VariantsToVCF is up and running again; integration tests are re-enabled (and added one for dbSNP).
2011-08-09 03:03:43 -04:00
Mauricio Carneiro
d15852be0a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-09 00:04:59 -04:00
Mauricio Carneiro
2db6225c53
A read filter that sets all mapping qualities to a given value
...
PacBio has decided to assign 255 to the MQ of all their reads, since they claim their aligner does not produce a number equivalent to a mapping quality. Despite much back and forth, they are dead set on not using this field, so if we want to use their BAMs, we will need to override it. This filter does just that, replacing all values with a given one (default: 60).
2011-08-09 00:04:42 -04:00
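A minimal sketch of the behaviour described above. SimpleRead and SimpleReadFilter are hypothetical stand-ins (not the GATK or Picard types), assuming a filter contract where filterOut returns true to drop a read; this filter never drops anything and only rewrites the mapping quality.

```java
/** Minimal stand-in for an aligned read; only the mapping quality is modelled (hypothetical). */
class SimpleRead {
    private int mappingQuality;

    SimpleRead(int mappingQuality) {
        this.mappingQuality = mappingQuality;
    }

    int getMappingQuality() {
        return mappingQuality;
    }

    void setMappingQuality(int mappingQuality) {
        this.mappingQuality = mappingQuality;
    }
}

/** Hypothetical filter contract: return true to remove the read from the stream. */
interface SimpleReadFilter {
    boolean filterOut(SimpleRead read);
}

/** Overwrites every read's mapping quality with a fixed value and keeps all reads. */
class ReassignMappingQualityFilter implements SimpleReadFilter {
    static final int DEFAULT_MAPPING_QUALITY = 60;
    private final int newMappingQuality;

    ReassignMappingQualityFilter() {
        this(DEFAULT_MAPPING_QUALITY);
    }

    ReassignMappingQualityFilter(int newMappingQuality) {
        this.newMappingQuality = newMappingQuality;
    }

    @Override
    public boolean filterOut(SimpleRead read) {
        read.setMappingQuality(newMappingQuality); // e.g. replaces a blanket MQ of 255
        return false; // side effect only: no read is ever filtered out
    }
}
```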
David Roazen
2efa376619
Made the necessary changes to get SnpEff support working with the new rodbinding system.
2011-08-08 23:29:39 -04:00
David Roazen
b180a1311a
Merge branch 'snpEff'
2011-08-08 22:12:14 -04:00
David Roazen
a13bc7b929
Added an integration test for the SnpEff annotation support, as well as some extra safety checks and comments.
2011-08-08 20:01:24 -04:00
Mark DePristo
80924d24de
Single positional arguments are now treated as names unless they actually match a tribble feature
2011-08-08 19:26:27 -04:00
Mark DePristo
f8a56bc64b
Merge branch 'master' into rodRefactor
2011-08-08 16:58:18 -04:00
Mark DePristo
f8ad91b16f
Reverting a bunch of bad -B type drops
2011-08-08 16:57:38 -04:00
David Roazen
5e288136e0
Added unit tests for the SnpEff codec, and made minor adjustments to the codec itself.
2011-08-08 16:51:43 -04:00
Eric Banks
d7813db217
Combine Variants was actually outputting invalid VCFs in cases where it was combining Variant Contexts with different alternate alleles: if any of the genotypes had PLs they were no longer valid/correct. Added a check for such cases (the combined VC has more alleles than an original VC) and strip out the PLs when triggered; added integration test to cover it. I also added the check to Select Variants, although it currently doesn't remove unused alleles so it should never trigger. Is there any reason not to strip out unused alleles after a select?
2011-08-08 16:25:35 -04:00
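Why merged PLs become invalid: for a diploid sample the PL field has one entry per unordered genotype, n(n+1)/2 for n alleles, so a biallelic record carries 3 PLs while a record with a third allele needs 6. The sketch below is hypothetical (PlSanity is not a GATK class) and simply mirrors the check described in the commit: if the combined record has more alleles than the original, drop PL from the genotype's attributes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers illustrating the PL-stripping check (illustrative only). */
class PlSanity {
    /** Number of unordered diploid genotypes over nAlleles alleles: n(n+1)/2. */
    static int expectedDiploidPlCount(int nAlleles) {
        return nAlleles * (nAlleles + 1) / 2;
    }

    /**
     * Returns a copy of the genotype attributes with PL removed when the merged record
     * has more alleles than the record the genotype came from; other keys are kept.
     */
    static Map<String, Object> stripPlsIfAllelesGrew(Map<String, Object> genotypeAttributes,
                                                     int originalAlleleCount,
                                                     int mergedAlleleCount) {
        Map<String, Object> copy = new HashMap<>(genotypeAttributes);
        if (mergedAlleleCount > originalAlleleCount) {
            copy.remove("PL");
        }
        return copy;
    }
}
```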
Mark DePristo
383bb6f0e0
Merge branch 'master' into rodRefactor
2011-08-08 15:25:55 -04:00
Mark DePristo
ba7353c561
Updated IntegrationTests to use the new type free format for VCF files
2011-08-08 15:04:38 -04:00
Mark DePristo
0810c42309
GATK now does dynamic type determination for VCF files
...
Added UnitTests covering all of the cases.
2011-08-08 14:45:46 -04:00
Mark DePristo
e36994e36b
Refactored a FeatureManager class from RMDTrackBuilder
...
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.
2011-08-08 14:04:46 -04:00
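A sketch of how a self-scoping codec enables dynamic type determination: each codec inspects the file and the first one that claims it decides the binding's type. SelfScopingCodec, ToyVcfCodec and CodecSelector are hypothetical names; the real SelfScopingFeatureCodec interface may use a different method signature.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a codec that can inspect a file and decide whether it can parse it. */
interface SelfScopingCodec {
    String name();

    boolean canDecode(File potentialInput);
}

/** Toy codec that claims any file whose first line is a VCF fileformat header. */
class ToyVcfCodec implements SelfScopingCodec {
    @Override
    public String name() {
        return "VCF";
    }

    @Override
    public boolean canDecode(File potentialInput) {
        try (BufferedReader reader = new BufferedReader(new FileReader(potentialInput))) {
            String firstLine = reader.readLine();
            return firstLine != null && firstLine.startsWith("##fileformat=VCF");
        } catch (IOException e) {
            return false; // unreadable or missing files are simply not claimed
        }
    }
}

/** Determines a binding's type by asking each registered codec in turn. */
class CodecSelector {
    static Optional<SelfScopingCodec> select(List<SelfScopingCodec> codecs, File input) {
        for (SelfScopingCodec codec : codecs) {
            if (codec.canDecode(input)) {
                return Optional.of(codec);
            }
        }
        return Optional.empty();
    }
}
```

First match wins here, so the order of registration matters if two codecs could both claim a file.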
Eric Banks
197169e47b
Submitting patch from Larry Singh to make MathUtils compatible with java 1.7
2011-08-08 13:34:04 -04:00
David Roazen
dd974040af
When finding the highest-impact effect at a locus, all effects that are not within a
...
non-coding gene are now considered higher impact than all effects that are within a
non-coding gene.
2011-08-08 13:29:54 -04:00
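The ranking rule above can be expressed as a two-key comparator: first "not within a non-coding gene", then impact. The sketch assumes SnpEff-style impact levels (HIGH, MODERATE, LOW, MODIFIER); Effect, Impact and EffectRanking are hypothetical names, not the GATK's classes.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical impact levels, declared from least to most severe. */
enum Impact { MODIFIER, LOW, MODERATE, HIGH }

/** Minimal stand-in for one SnpEff effect at a locus (hypothetical fields). */
class Effect {
    final String name;
    final Impact impact;
    final boolean inNonCodingGene;

    Effect(String name, Impact impact, boolean inNonCodingGene) {
        this.name = name;
        this.impact = impact;
        this.inNonCodingGene = inNonCodingGene;
    }
}

class EffectRanking {
    /**
     * Any effect outside a non-coding gene outranks every effect inside one
     * (Boolean ordering puts false before true); ties are broken by impact.
     */
    static final Comparator<Effect> BY_IMPACT =
            Comparator.comparing((Effect e) -> !e.inNonCodingGene)
                      .thenComparing(e -> e.impact);

    static Effect highestImpact(List<Effect> effects) {
        return effects.stream()
                .max(BY_IMPACT)
                .orElseThrow(() -> new IllegalArgumentException("no effects at locus"));
    }
}
```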
David Roazen
c1061e994c
Initial support for adding genomic annotations through VariantAnnotator using
...
the output from the SnpEff tool, which replaces the old Genomic Annotator.
2011-08-08 13:29:53 -04:00
Mark DePristo
0db79207e8
Refactored CommandLineGATK to remove its dependency on javadocs
...
This allows us to run the GATK again in environments where Javadoc is not on the classpath by default
2011-08-08 12:27:13 -04:00
Mark DePristo
e5fde0d16b
Merge branch 'master' into rodRefactor
2011-08-08 10:08:43 -04:00
Mark DePristo
526b524c3c
CombineVariants with new RodBinding. Bugfix
...
-- CombineVariants now uses the new RodBinding syntax, -V / --variants. Passed all integration tests on first run
-- Exposed a gaping bug in the List<RodBinding<T>> system, now fixed. ParserEngine now has an addRodBinding() that is called by RodBindingArgumentTypeDescriptor whenever it encounters a RodBinding. This allows the system to work with collection types that are recursively parsed by the system.
2011-08-07 20:16:51 -04:00
Ryan Poplin
6693407bd8
Merged bug fix from Stable into Unstable
2011-08-07 17:39:03 -04:00
Mark DePristo
5f8bc3aa8a
Documenting classes, and name cleanup
2011-08-07 15:17:50 -04:00
Mark DePristo
1c63d43176
Help now points to GATKDocs instead of spitting out full, garbled description
2011-08-07 15:02:46 -04:00
Mark DePristo
b0e91f85cf
fix merge from Khalid's Queue fix
2011-08-07 10:33:20 -04:00
Mark DePristo
4d88e72958
Merge remote-tracking branch 'remotes/khalid/rodRefactor' into rodRefactor
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
public/java/test/org/broadinstitute/sting/BaseTest.java
2011-08-07 10:32:27 -04:00
Khalid Shakir
f049461120
Changed @Argument to @Input on input RodBindings.
...
Changed shortname collision with longname.
Restored scala builds.
Updated HSP to use new syntax.
2011-08-06 20:44:19 -04:00
Mark DePristo
d7f98e5c2a
Fixed merge conflict deleting a {
2011-08-04 18:48:34 -04:00
Mark DePristo
75632abf88
Merge branch 'master' into rodRefactor
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToVCF.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/RealignerTargetCreatorIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
2011-08-04 18:44:14 -04:00
Mark DePristo
f21f7f6335
SelectVariants fully documented, now the shining example of the new RodBinding system.
2011-08-04 18:28:59 -04:00
Mark DePristo
9be1ee59cc
TODO comments for Eric
2011-08-04 18:07:50 -04:00
Mauricio Carneiro
b22a3d6508
Functional VCF output.
...
It is outputting a VCF with the 'second best guess' for the alternate allele correctly. Annotations are added at the pool level, but may get overwritten at the lane and site level. Still need to implement the merging of the annotations at higher levels.
2011-08-04 17:49:08 -04:00
Guillermo del Angel
a8eb8c27f0
a) Minor changes to indel consensus scripts to better reflect good default values, b) Fixed up Mills/Devine codec so it always produces correct ref padded bases, and added option to VariantsToVCF to fix reference base
2011-08-04 15:34:49 -04:00
Ryan Poplin
98a96f07c1
Updated standard deviation parameter in VQSR to our current recommended value
2011-08-04 14:06:26 -04:00
Eric Banks
e48492f3c3
Validate that the reference padding base for indels is correct.
2011-08-04 12:48:56 -04:00
Mark DePristo
f0d798d47c
Bug fix: call RodBinding.resetNameCounter() in new ParsingEngine() so that we don't magically misnumber arguments in the integration tests where the GATK is only instantiated once.
2011-08-04 12:06:10 -04:00
Mark DePristo
d0279bb28c
RodBinding names are now defaulting to the ArgumentTypeDescriptor fullname
...
Nearly all of the tools are passing integrationtests
2011-08-03 20:48:11 -04:00
Mark DePristo
0ef85647f7
A working version of a GATKReportDiffableReader for the diffEngine!
2011-08-03 18:21:18 -04:00
Mark DePristo
acbd3d0922
Fixing up integration tests so more
2011-08-03 17:26:35 -04:00
Mark DePristo
8f696c7731
Continuing progress towards RodBinding 1.0
...
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
2011-08-03 17:19:28 -04:00
Mark DePristo
800bb97f0b
Removed getFeaturesAsGATKFeature and created createGenomeLoc(Feature) in genomeLocParser
...
Updated all walkers that used the now deleted methods.
2011-08-03 16:04:51 -04:00
Mark DePristo
f6563c0f9f
Removed support for RMD in @Requires and @Allows
...
Merge as well
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:36:55 -04:00
Mark DePristo
79e4a8f6d3
Merge
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:09:47 -04:00
Mark DePristo
38efd3066c
Bug fix for mask RodBinding
2011-08-03 14:58:18 -04:00
Eric Banks
f62f47d476
Not sure why this didn't fail before, but bringing VE up to date with previous changes
2011-08-03 14:27:07 -04:00
Mark DePristo
b25140db83
Contracts and documentation for some of RefMetaDataTracker
...
Continuing to fix integration tests that don't pass / run
2011-08-03 13:34:20 -04:00
Eric Banks
f6648e0144
Don't left-align complex indels because it's too complicated.
2011-08-03 12:03:50 -04:00
Mark DePristo
85c67e9891
Contracts and documentation for RodBinding
2011-08-03 11:16:06 -04:00
Eric Banks
5dc324ff35
Dealing with merge conflict
2011-08-03 11:03:47 -04:00
Eric Banks
7c89fe01b3
Instead of having the padded reference base be some hackish attribute, it is now an actual variable in the VariantContext class. More importantly, we now always require that it be present when padding is necessary, and validate this upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record.
2011-08-03 11:00:36 -04:00
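A hypothetical sketch of "validate upon construction": the padding base is a real field, and the constructor refuses to build an indel record without it. It assumes alleles are stored unpadded (so an insertion's reference allele can be empty) and treats "padding is necessary" as "the alleles differ in length"; PaddedVariant is an illustrative name, not VariantContext's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical variant record that validates its padding base on construction. */
class PaddedVariant {
    final List<String> alleles;        // reference allele first, stored without padding
    final Byte referenceBaseForIndel;  // null when no padding base was supplied

    PaddedVariant(List<String> alleles, Byte referenceBaseForIndel) {
        this.alleles = new ArrayList<>(alleles);
        this.referenceBaseForIndel = referenceBaseForIndel;
        validate();
    }

    /** Assumed rule: padding is needed whenever any allele differs in length from the reference. */
    boolean needsPadding() {
        int refLength = alleles.get(0).length();
        for (String allele : alleles) {
            if (allele.length() != refLength) {
                return true;
            }
        }
        return false;
    }

    private void validate() {
        if (needsPadding() && referenceBaseForIndel == null) {
            throw new IllegalArgumentException("Indel record requires a padded reference base");
        }
    }
}
```

Because the check runs in the constructor, downstream writers can rely on the base being present instead of taking it as a parameter.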
Khalid Shakir
5dcac7b064
GATKReport v0.2:
...
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
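Why fixed-width columns allow embedded spaces: if cells are located by column offsets rather than by splitting on whitespace, a value such as "high coverage" survives a round trip. FixedWidthTable is a hypothetical sketch, not GATKReport's implementation; the two-space gap between columns is an assumption.

```java
import java.util.List;

/** Hypothetical fixed-width table writer/reader (illustrative only). */
class FixedWidthTable {
    static final int GAP = 2; // spaces between columns (assumed)

    /** Width of each column = length of its longest cell. */
    static int[] columnWidths(List<String[]> rows) {
        int[] widths = new int[rows.get(0).length];
        for (String[] row : rows) {
            for (int c = 0; c < widths.length; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    /** Left-aligns each cell and pads it to its column width plus GAP (last column unpadded). */
    static String render(List<String[]> rows, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int c = 0; c < widths.length; c++) {
                sb.append(row[c]);
                if (c < widths.length - 1) {
                    for (int i = row[c].length(); i < widths[c] + GAP; i++) {
                        sb.append(' ');
                    }
                }
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Splits a line by column offsets rather than whitespace, so cells may contain spaces. */
    static String[] parse(String line, int[] widths) {
        String[] cells = new String[widths.length];
        int start = 0;
        for (int c = 0; c < widths.length; c++) {
            int from = Math.min(start, line.length());
            int to = (c == widths.length - 1)
                    ? line.length()
                    : Math.min(line.length(), start + widths[c]);
            cells[c] = line.substring(from, to).trim();
            start += widths[c] + GAP;
        }
        return cells;
    }
}
```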
Mark DePristo
2874835997
Bug fix for type checking RodBindings
...
Now compares the feature class not the codec class.
UnitTests improvements
integrationtests on their way to actually running
2011-08-02 22:25:41 -04:00
Mark DePristo
b5e843f8f0
Approaching the end for the new RodBinding system
...
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the Rodbinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
2011-08-02 22:00:06 -04:00
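A hypothetical sketch of the two naming rules listed above: "-X:name,type file" names the binding explicitly, while "-X:type file" gets an automatic name (X, then X2, X3, ...). BindingNamer is illustrative only; IllegalArgumentException stands in for the GATK's UserExceptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parser for "-X:name,type file" style bindings (illustrative only). */
class BindingNamer {
    /** One parsed binding: its internal name, tribble type and source. */
    static final class Binding {
        final String name;
        final String type;
        final String source;

        Binding(String name, String type, String source) {
            this.name = name;
            this.type = type;
            this.source = source;
        }
    }

    /** How many automatically named bindings each argument has produced so far. */
    private final Map<String, Integer> unnamedCounts = new HashMap<>();

    Binding parse(String argWithTags, String source) {
        String arg = argWithTags.replaceFirst("^-+", ""); // drop leading dashes
        int colon = arg.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No tags in binding: " + argWithTags);
        }
        String argName = arg.substring(0, colon);
        String[] tags = arg.substring(colon + 1).split(",", -1); // keep empty tags so we can reject them
        for (String tag : tags) {
            if (tag.isEmpty()) {
                throw new IllegalArgumentException("Empty tag in binding: " + argWithTags);
            }
        }
        if (tags.length == 2) {
            // Explicit naming: -X:name,type
            return new Binding(tags[0], tags[1], source);
        }
        if (tags.length == 1) {
            // Automatic naming: first X, then X2, X3, ...
            int n = unnamedCounts.merge(argName, 1, Integer::sum);
            return new Binding(n == 1 ? argName : argName + n, tags[0], source);
        }
        throw new IllegalArgumentException("Too many tags in binding: " + argWithTags);
    }
}
```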
David Roazen
d3437e62da
Added a simple utility method Utils.optimumHashSize() to calculate the optimum
...
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.
Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));
I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:
n + n/2 + n/4 + ... + 16 ~= (1 + 1/2 + 1/4 + ...) * n ~= 2 * n
2011-08-02 21:59:06 -04:00
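One way such a size could be computed, assuming java.util.HashMap's default load factor of 0.75: the map resizes once its size exceeds capacity * 0.75, so the smallest safe initial capacity is ceil(n / 0.75). HashMap rounds the requested capacity up to a power of two, which only raises the threshold. This is a sketch, not necessarily the exact formula in Utils.optimumHashSize().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sizing helper illustrating the idea behind optimumHashSize (illustrative only). */
class HashSizing {
    /** java.util.HashMap's default load factor. */
    static final double DEFAULT_LOAD_FACTOR = 0.75;

    /**
     * Smallest initial capacity c with c * 0.75 >= expectedMaxElements, so inserting
     * up to expectedMaxElements entries never crosses the resize threshold.
     */
    static int optimumHashSize(int expectedMaxElements) {
        if (expectedMaxElements < 0) {
            throw new IllegalArgumentException("expectedMaxElements must be >= 0");
        }
        return (int) Math.ceil(expectedMaxElements / DEFAULT_LOAD_FACTOR);
    }
}
```

Usage follows the commit's example: new HashMap<String, Object>(HashSizing.optimumHashSize(expectedMaxElements)).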
Mark DePristo
83891271b5
--variants throughout integrationtests
2011-08-02 20:28:47 -04:00
Mark DePristo
3a27a25cfc
Validates that the tribble binding provides the right object types at startup
...
Tests to ensure this remains working
2011-08-02 20:11:24 -04:00
Guillermo del Angel
df37716857
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-02 18:27:13 -04:00
Mark DePristo
e4a67f3df1
RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>>
...
Including unit tests
2011-08-02 14:28:35 -04:00
Mark DePristo
03741fb640
Merge branch 'master' into rodRefactor
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo
a366f9a18d
Updating tools to use the RodBinding<T> syntax
2011-08-02 14:05:51 -04:00
Ryan Poplin
c0653514b3
minor update to comment in UG
2011-08-02 13:34:48 -04:00
Ryan Poplin
2ba57bb502
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-02 13:30:46 -04:00
Ryan Poplin
38e4ae4176
minor update to comment in UG
2011-08-02 13:30:38 -04:00
Guillermo del Angel
821bbfa9e0
Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup
2011-08-02 13:17:20 -04:00
Eric Banks
65c5d55b72
Not sure how I missed these. These lines are now superfluous.
2011-08-02 12:48:36 -04:00
Eric Banks
2c5e526eb7
Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed.
2011-08-02 10:34:46 -04:00
Eric Banks
5626199bb6
The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD
2011-08-02 10:14:21 -04:00
Mark DePristo
184030dd56
RefMetaDataTracker no longer automagically converts inputs to VariantContexts
...
This was no longer working properly given that DBSNP indels needed to be moved around. The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.
2011-08-01 15:21:16 -04:00
Mark DePristo
8b1adb8c95
Removed getVariantContext() code
2011-08-01 13:41:09 -04:00
Eric Banks
3a9b6eacdf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-01 11:23:18 -04:00
Mark DePristo
7b07c4e04e
RefMetaDataTracker now has get() methods accepting RodBindings
...
RodBinding no longer duplicates the get() methods in RMDT. This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
2011-07-30 15:34:11 -04:00
Mark DePristo
a6691ab2fd
List<RodBinding<T>> now working (sort of).
...
At least the argument parsing system tolerates it.
2011-07-29 16:11:22 -04:00
Mark DePristo
6acb4aad3b
RodBinding<T> are properly generic now.
...
VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.
2011-07-29 14:37:12 -04:00
Mark DePristo
3b799db61a
RefMetaDataTracker cleanup and unit tests
...
You now have to provide an explicit list of RODRecordLists up front to the constructor. RefMetaDataTracker is now immutable. Changes in the engine incorporate these differences.
Extensive UnitTests for RefMetaDataTracker now.
2011-07-29 13:23:17 -04:00
Ryan Poplin
b06deac9ea
Merged bug fix from Stable into Unstable
2011-07-29 10:02:36 -04:00
Ryan Poplin
c0d4110ffd
Correcting redundant warning text.
2011-07-29 10:01:11 -04:00
Mauricio Carneiro
a58ddab93b
minQual and minPower filters added. VCF output added.
...
Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the GATK's PoolCaller, no longer the Replication Validation framework. Lots of testing to ensue...
2011-07-28 18:58:36 -04:00
Mark DePristo
39b4e76fde
Continuing refactoring of RefMetaDataTracker.
...
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
2011-07-28 17:48:28 -04:00
Mark DePristo
7c5c656b46
Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team
...
Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval. So you get lines where the stratification is known but there are 10 novel sites!
2011-07-28 14:19:27 -04:00
Eric Banks
33b32c4211
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-28 13:57:22 -04:00
Eric Banks
7a2a65155f
Merged bug fix from Stable into Unstable
2011-07-28 13:56:43 -04:00
Eric Banks
1afc49a297
There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor.
2011-07-28 13:55:58 -04:00
Mark DePristo
f7a126722b
Cleaned up VariantContext accessors in RefMetaDataTracker
...
It's no longer possible to provide allowed types, as this was a very rarely used feature in the engine. These get methods have been removed and local uses replaced with tests directly in their code. This simplified the RefMetaDataTracker significantly.
VariantContextRodBinding now forwards along all of the RefMetaDataTracker methods, so it is possible to create a full equivalent VariantContextRodBinding now as a walker field variable.
All walkers updated to the new RefMetaDataTracker function call style
2011-07-28 00:16:34 -04:00
Mark DePristo
c83f9432eb
Cleaned up RefMetaDataTracker
...
Renamed many functions to more clearly state what they are actually doing
Removed unnecessary / unused functionality, reducing interface complexity
Updated all uses of this code in GATK
Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz)
Added standard RefMetaDataTracker accessors to RodBinding, so everything you can do for generic rods with the tracker can also be done directly with the RodBinding
2011-07-27 23:25:52 -04:00
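The generic accessor signature quoted above, public <T> List<T> getValues(final String name, Class<T> clazz), can be made type-safe with Class.isInstance and Class.cast, so callers never write unchecked casts. TypedTracker is a hypothetical stand-in for the tracker, storing values in a plain map keyed by binding name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker holding the values bound under each name (illustrative only). */
class TypedTracker {
    private final Map<String, List<Object>> valuesByName;

    TypedTracker(Map<String, List<Object>> valuesByName) {
        this.valuesByName = new HashMap<>(valuesByName); // snapshot of the caller's map
    }

    /** Returns the values bound under name that are instances of clazz, already cast to T. */
    <T> List<T> getValues(final String name, final Class<T> clazz) {
        List<T> result = new ArrayList<>();
        for (Object value : valuesByName.getOrDefault(name, Collections.emptyList())) {
            if (clazz.isInstance(value)) {
                result.add(clazz.cast(value));
            }
        }
        return result;
    }
}
```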
Mark DePristo
f3ad4ec94b
Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not.
2011-07-27 22:06:23 -04:00
Eric Banks
ff31fa7990
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-27 16:15:23 -04:00
Eric Banks
5809a61b20
Merged bug fix from Stable into Unstable
2011-07-27 16:14:59 -04:00
Eric Banks
64aad67b5f
Fixing dbSNP adaptor for complex indels (wasn)
2011-07-27 16:13:45 -04:00
Mark DePristo
15be383d5b
Merge branch 'master' into rodRefactor
2011-07-27 15:36:49 -04:00
Mark DePristo
38a2518668
Merge branch 'master' into rodRefactor
2011-07-27 15:34:54 -04:00
Mark DePristo
60db6cc836
Warnings for old ROD system use.
...
Removed unused class GATKRODFeature
2011-07-27 12:39:12 -04:00
Mark DePristo
097828a466
ParsingEngine now maintains the list of rodBindings
...
No longer try to reparse objects to find the right fields
Direct support in RodBinding for getTags()
2011-07-27 11:36:53 -04:00
Mauricio Carneiro
20a3b31b61
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-26 19:29:45 -04:00
Mauricio Carneiro
321afac4e8
Updates to the help layout.
...
*New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker.
*Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.
2011-07-26 19:29:25 -04:00
Kiran V Garimella
405e521d44
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-26 17:56:48 -04:00
Kiran V Garimella
412c466de6
Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement
2011-07-26 17:43:43 -04:00
Matt Hanna
fec495e292
Fix a nasty little bug in the sharding system: if the last shard in contig n
...
overlaps exactly on disk with the first shard in contig n+1, the shards
would be merged together to avoid duplicate extraction. Unfortunately,
the interval overlap filter couldn't handle shards spanning contigs, and
was choosing to filter out reads from contig n+1 which should have been
included.
I'm not completely sure why the BAM indexing code would ever specify that the
end of one chromosome had the same on-disk location as the start of the next
one. I suspect that this is an indexer performance bug.
2011-07-26 15:43:20 -04:00
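A hypothetical illustration of the failure mode: once two shards are merged, the shard's intervals can lie on two contigs. An overlap check that compares the contig of each interval individually, rather than assuming a single contig per shard, keeps reads from both. ShardInterval and ShardOverlapFilter are illustrative names, not the GATK's sharding classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical genomic interval: contig plus 1-based inclusive start and stop. */
class ShardInterval {
    final String contig;
    final int start;
    final int stop;

    ShardInterval(String contig, int start, int stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
    }

    /** True if the read is on the same contig and the coordinate ranges intersect. */
    boolean overlaps(String readContig, int readStart, int readStop) {
        return contig.equals(readContig) && readStart <= stop && readStop >= start;
    }
}

/** Keeps reads that overlap any interval of a shard, even if the shard spans contigs. */
class ShardOverlapFilter {
    private final List<ShardInterval> intervals;

    ShardOverlapFilter(List<ShardInterval> intervals) {
        this.intervals = new ArrayList<>(intervals);
    }

    boolean keep(String readContig, int readStart, int readStop) {
        for (ShardInterval interval : intervals) {
            if (interval.overlaps(readContig, readStart, readStop)) {
                return true;
            }
        }
        return false;
    }
}
```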
Mark DePristo
9dfb57168a
RodBinding source is no longer assumed to be a file
2011-07-26 13:59:44 -04:00
Mark DePristo
d0badd5bd6
RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs
2011-07-26 13:54:55 -04:00
Mark DePristo
7ab8b53339
Support for List<RodBinding> argument type
2011-07-26 11:37:31 -04:00
Mark DePristo
38969b9783
Prototype of RODBinding @Arguments instead of -B syntax
...
Initial version of RodBinding class.
Flow from walker Rodbinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard). Will need refactoring.
2011-07-26 11:09:06 -04:00
Matt Hanna
088fc39308
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 15:54:56 -04:00
Eric Banks
a53aeb75ab
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 15:10:35 -04:00
Eric Banks
a29554e565
Removing the Genomic Annotator and its supporting classes
2011-07-25 15:10:25 -04:00
Mark DePristo
3afcb3415d
Max of 1000 records will be loaded and compared to avoid heap size problem.
2011-07-25 14:58:31 -04:00
Mark DePristo
f3049fba63
refdata directory cleanup
...
Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest
Refactored dbSNP and refseq utilities to be closer to the other files implementing these features
2011-07-25 13:21:52 -04:00
Matt Hanna
8014fad6ff
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 13:20:44 -04:00
Matt Hanna
2ac490dbdf
Fix improper detection of command-line arguments with missing values.
2011-07-25 13:20:00 -04:00
Mark DePristo
90947ab359
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 12:53:56 -04:00
Mark DePristo
acda8eb09c
Commented out test that causes new CommandLineGATK() to fail
2011-07-25 12:43:27 -04:00
Mauricio Carneiro
95b48eface
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into repval
2011-07-25 12:09:09 -04:00
Kiran V Garimella
357f503a21
Merge branch 'desktop'
2011-07-25 11:36:27 -04:00
Kiran V Garimella
0b43ee117c
Added the required=false tag to the -noST and -noEV arguments so the auto-help output doesn't look weird (i.e. listing arguments as required when their value has already been specified by default).
2011-07-25 11:35:34 -04:00
Kiran V Garimella
bbb8473f03
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-25 10:59:00 -04:00
Mark DePristo
1a268ff1fd
Refactor so that GenotypeAnnotation and InfoFieldAnnotation share common superclass VariantAnnotatorAnnotation
2011-07-25 10:55:09 -04:00
Mark DePristo
7f8e6a97ee
InfoFieldAnnotation now an abstract class extended by annotations so doc system works
2011-07-25 10:47:11 -04:00
Mauricio Carneiro
4c6c16f895
Documented following the new gatkdoc framework
2011-07-25 00:25:08 -04:00
Mark DePristo
2039ce6102
Default values now displayed in arguments
...
DiffEngine fixed so that newInstance() would work. Pretty quickly encountered a situation where newInstance() failed. Debug output now written when this occurs in the log.
Logger now used instead of standard out, with INFO the default level.
2011-07-24 22:56:55 -04:00
Mark DePristo
c43b5981f2
Hidden variables are hidden by default. Settable by command line option
...
DiffObjectsWalker test arguments removed.
Minor refactoring of GATKDoclet
2011-07-24 20:52:44 -04:00
Mark DePristo
1c1f1da349
Fixing compilation
2011-07-24 20:01:59 -04:00
Mark DePristo
9f06f6c493
Split GATKDoclet from ResourceBundleDoclet. Refactored GATKDocWorkUnit
2011-07-24 20:00:04 -04:00
Mark DePristo
ff85687679
Merge branch 'master' into help
2011-07-24 18:14:32 -04:00
Mark DePristo
83996f7951
Enumerated types are working.
2011-07-24 18:14:21 -04:00
Mark DePristo
3c34e9fa65
Clean up enums and tables
2011-07-24 17:45:58 -04:00
Mark DePristo
c620d96c96
Inline enum documentation is working
2011-07-24 17:22:14 -04:00
Mark DePristo
793e7d3d1d
Improved header and argument details
...
Argument detail structure cleaned up. Only relevant pieces of information are shown now, and in a cleaner layout.
Misc. cleanup in the code.
2011-07-24 16:36:25 -04:00
Mark DePristo
c6af4efcdc
Implemented see also and version header
2011-07-24 16:10:17 -04:00
Mark DePristo
5e0fe2d0f9
Support for style.css via refactored common.html included in all files
2011-07-24 15:42:39 -04:00
Mark DePristo
d0ab6bf7a9
Now links to sub and superclass documentation, where possible.
2011-07-24 09:56:17 -04:00
Mark DePristo
e2dabb70b8
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-24 08:57:47 -04:00
Mauricio Carneiro
7ffedf211c
Contig comparator -- sorting contigs like Picard
...
This is very useful if you want to output your text files or manipulate data in the usual chromosome ordering:
1
2
3
...
21
22
X
Y
GL???
...
Just use this comparator in any SortedSet class constructor and your data will be sorted like in the BAM file.
2011-07-24 02:33:19 -04:00
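A sketch of a comparator producing the ordering shown above (numbered contigs numerically, then X, then Y, then everything else lexicographically). It is a natural-order heuristic that also ignores an optional "chr" prefix, which is an assumption. Picard itself orders contigs by the sequence dictionary, so this hypothetical ContigComparator only approximates that for standard human references.

```java
import java.util.Comparator;

/** Hypothetical contig comparator: 1..22, X, Y, then other contigs (illustrative only). */
class ContigComparator implements Comparator<String> {
    private static String stripChr(String contig) {
        return contig.startsWith("chr") ? contig.substring(3) : contig;
    }

    /** ASCII digits only; the length cap keeps Long.parseLong from overflowing. */
    private static boolean isAllDigits(String s) {
        if (s.isEmpty() || s.length() > 18) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                return false;
            }
        }
        return true;
    }

    /** 0 for numbered contigs, 1 for X, 2 for Y, 3 for everything else. */
    private static int category(String contig) {
        String c = stripChr(contig);
        if (isAllDigits(c)) return 0;
        if (c.equals("X")) return 1;
        if (c.equals("Y")) return 2;
        return 3;
    }

    @Override
    public int compare(String a, String b) {
        int ca = category(a);
        int cb = category(b);
        if (ca != cb) {
            return Integer.compare(ca, cb);
        }
        if (ca == 0) {
            int byNumber = Long.compare(Long.parseLong(stripChr(a)), Long.parseLong(stripChr(b)));
            if (byNumber != 0) {
                return byNumber;
            }
        }
        return a.compareTo(b); // tie-breaker keeps the ordering consistent
    }
}
```

As the commit suggests, pass it to a SortedSet constructor, e.g. new TreeSet<String>(new ContigComparator()).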
Mark DePristo
6b501e267b
Includes non-concrete classes in docs
...
CommandLineGATK has extraDocs to ReadFilter and UserException as well
2011-07-23 22:15:01 -04:00
Mark DePristo
7420ed098e
Semi-working version of extraDocs tag in annotation to refer to one capability being accessible in another
...
Required a significant refactoring of the GATKDoclet, which now has a unified place where the ClassDoc, class, annotation, and handler are all stored together.
2011-07-23 22:07:30 -04:00
Mark DePristo
999acacfa1
Merge branch 'master' into help
2011-07-23 20:19:33 -04:00
Mark DePristo
1d3bcce2c4
Merge branch 'master' into NoDistributedGATK
2011-07-23 20:04:50 -04:00
Mark DePristo
e262f4e10b
gatkdoc now generalized to use @Annotation. Multiple subsystems now use annotation to receive docs
...
Index expanded to use summary() annotation field
UserExceptions, ReadFilters, GATK engine all use the system to generate docs
Doclet expanded to handle lots of new cases
2011-07-23 20:00:35 -04:00
Kiran V Garimella
1dba8b768c
Merge branch 'laptop'
2011-07-23 01:39:15 -04:00
Kiran V Garimella
57e3d136eb
Don't try to phase triple-hets either.
2011-07-23 01:38:58 -04:00
Kiran V Garimella
5af9d50183
Merge branch 'laptop'
2011-07-23 01:12:06 -04:00
Kiran V Garimella
5521919cc9
Fixed bug where variants to phase were not being selected properly.
2011-07-23 01:11:28 -04:00
Kiran V Garimella
7da99388ac
Merge branch 'laptop'
2011-07-23 01:01:11 -04:00
Kiran V Garimella
58eed20b83
Copy all entries from the attributes map, rather than attempting to modify an unmodifiable map.
2011-07-23 01:00:46 -04:00
Kiran V Garimella
ffa361f57f
Merge branch 'laptop'
2011-07-23 00:50:38 -04:00
Kiran V Garimella
9417ba8c2c
Modified to accept multi-sample VCFs, removed the application of filters, and changed transmission probability field to be a genotype field rather than an INFO field.
2011-07-23 00:48:26 -04:00
Mark DePristo
28b9432d26
Docs for read filters, the engine, and the UserExceptions.
2011-07-22 16:09:21 -04:00
Kiran V Garimella
051c1dc639
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-22 15:59:00 -04:00
Mark DePristo
f0be7348be
Generalized handler to allow it to be used with any arbitrary class structure.
...
DocumentedGATKFeature now includes a field for the group name.
Build.xml works with public / private now.
2011-07-22 14:07:40 -04:00
Mark DePristo
453954182e
Generalized the documentation system to use a class-specific annotation and processor.
...
Need to generalize and bug fix the system. But at a high level it's working now.
2011-07-22 13:18:33 -04:00
Kiran V Garimella
b8a0fd2a8d
Multiply fractionRandom by 100.0 so that the line that indicates the percentage of variants that will be output says (for instance) 90%, not 0.9%
2011-07-22 11:54:59 -04:00
Mark DePristo
9e88d51db9
Removed now unused @version tags from walker docs.
2011-07-22 09:57:03 -04:00
Mark DePristo
421b70ca4f
Removed previous, and largely unused, help system extensions.
...
This involved deleting the utils/help/*Taglet.java classes, which parsed out these fields unnecessarily
This also involved removing the few uses of these from the codebase. For these uses, though, almost all were an identical copy of the first line of the docs, which is the default javadoc behavior anyway.
2011-07-22 09:42:44 -04:00
Mark DePristo
172b35372b
Moved all of the distributed GATK code to archive.
2011-07-22 09:20:32 -04:00
Mauricio Carneiro
8d7ef1bb51
Complete refactor of the ReplicationValidation framework, plus the following new functionality:
* merges all pools in a lane.
* merges all lanes in a site.
2011-07-21 21:39:00 -04:00
Mark DePristo
81d0cab27e
Walker index HTML now emitted.
2011-07-21 16:01:54 -04:00
Mark DePristo
e892489696
V2 of the document system.
...
Now uses GATKDoc class to organize documentation for arguments.
Arguments now listed by feature (required, optional, hidden, etc) and link to detailed information about the argument in the html
Lots of code moving between Class and ClassDoc objects. Should be refactored into a single static utility class.
2011-07-21 15:20:34 -04:00
Christopher Hartl
2f5d10d16b
Fix bug wherein aligner could be closed prior to its being used to lowercase sequences.
2011-07-21 13:21:48 -04:00
Matt Hanna
7054c5342f
When using the BWA bindings, you have to explicitly call close() to get the bindings to release memory.
...
It may or may not be possible to close them implicitly when triggered by the GC; I'll add a JIRA.
2011-07-21 12:13:29 -04:00
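Because the bindings only release memory on an explicit close(), a natural Java pattern is try-with-resources, which calls close() when the block exits, even on an exception. The sketch below uses a hypothetical NativeAligner stand-in implementing AutoCloseable; it does not reproduce the real BWA bindings class or its methods, and the "aligned:" output is a placeholder.

```java
/**
 * Hypothetical stand-in for an aligner backed by native (JNI) memory.
 * The real BWA bindings class and its API are not reproduced here.
 */
final class NativeAligner implements AutoCloseable {
    private boolean open = true;

    String align(String bases) {
        // Guard against use after close, which would touch freed native memory.
        if (!open) throw new IllegalStateException("aligner already closed");
        return "aligned:" + bases; // placeholder for real alignment work
    }

    boolean isOpen() {
        return open;
    }

    /** Releases native resources; per the commit message this must be called explicitly. */
    @Override
    public void close() {
        open = false;
    }

    public static void main(String[] args) {
        NativeAligner aligner = new NativeAligner();
        // try-with-resources (Java 9+ form) calls close() when the block exits.
        try (aligner) {
            System.out.println(aligner.align("ACGT")); // aligned:ACGT
        }
        System.out.println(aligner.isOpen()); // false
    }
}
```

Scoping every use inside the try block also makes it harder to close the aligner before its last use.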
Mark DePristo
6fa17d86ae
Completely hacked together version of a FreeMarker + javadoc + custom doclet walker documentation generator
2011-07-21 00:18:07 -04:00
Mark DePristo
45c73ff0e5
Runs and emits an HTML document
2011-07-20 17:16:33 -04:00
Mark DePristo
d31b176e15
Removed GATK use of distributed parallelism framework.
...
Moved distributed GATK prototype code into distributedutils, separating from threading package
2011-07-20 16:26:09 -04:00
Guillermo del Angel
0a1d2df8cb
Merged bug fix from Stable into Unstable
2011-07-20 13:19:35 -04:00
Guillermo del Angel
f15023b7d2
Bad bug fix: output GLs in multiallelic records were in incorrect order (misread spec)
2011-07-20 12:10:48 -04:00
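For reference, the VCF 4.1 specification orders diploid genotype likelihoods by F(j/k) = k(k+1)/2 + j for allele indices j <= k, which gives AA, AB, BB, AC, BC, CC for a triallelic site. A minimal sketch of that index calculation; the class and method names are illustrative, not GATK's.

```java
/** Illustrative only: VCF 4.1 diploid genotype-likelihood ordering. */
final class GenotypeLikelihoodOrder {

    /** Index of genotype j/k in a GL or PL array, per F(j/k) = k(k+1)/2 + j. */
    static int index(int j, int k) {
        if (j > k) { int t = j; j = k; k = t; } // unordered genotypes: 1/0 is the same as 0/1
        return k * (k + 1) / 2 + j;
    }

    /** Number of diploid genotypes for nAlleles alleles, REF included. */
    static int count(int nAlleles) {
        return nAlleles * (nAlleles + 1) / 2;
    }

    public static void main(String[] args) {
        String alleles = "ABC"; // REF=A, ALT=B,C
        String[] order = new String[count(alleles.length())];
        for (int k = 0; k < alleles.length(); k++) {
            for (int j = 0; j <= k; j++) {
                order[index(j, k)] = "" + alleles.charAt(j) + alleles.charAt(k);
            }
        }
        System.out.println(String.join(",", order)); // AA,AB,BB,AC,BC,CC
    }
}
```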
Guillermo del Angel
b9c9e0e952
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-20 10:45:16 -04:00
Guillermo del Angel
7140280bf6
Further bug fixes/cleanups for PrintReadsWalker
2011-07-20 10:44:37 -04:00
Guillermo del Angel
a2d90a3590
Bug fix: reverted logic so that default behavior skips over sample lookup
2011-07-20 10:23:10 -04:00
Guillermo del Angel
e8409c80fa
Further protection vs null pointers in PrintReadsWalker
2011-07-19 21:59:24 -04:00
Christopher Hartl
5d706c9e92
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
...
Removing PSP and CSM
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/CreateSequenomMask.java
public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/PickSequenomProbes.java
2011-07-19 20:25:33 -04:00
Guillermo del Angel
fb2d475c22
Bug fix to prevent null pointer
2011-07-19 20:13:56 -04:00
Christopher Hartl
92c7cfa1c8
BWA bindings and tests moved to public (was required for ValidationAmplicons)
...
Integration tests for ValidationAmplicons. New argument to disable BWA, lowercase letters only for repetitiveness instead.
2011-07-19 20:11:31 -04:00
David Roazen
baae381acb
Revert "Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable"
...
This reverts commit 039a6bb01f345322ce2be50ae3634308bb24e77e, reversing
changes made to b9c9973d1c638dfc9f8c19b5eb845e99844f9d29.
2011-07-19 18:38:53 -04:00
Christopher Hartl
07e716d23a
PickSequenomProbes2 expanded functionality: lowercasing based on sequence uniqueness, preserving reference base prior to indel (not a part of the VC as I thought it was), masking deletion bases with 'N's, flanking insertion with 'N's, output is a fasta formatted file. Renamed to ValidationAmplicons since this is really not for picking sequenom probes, but for generating amplicon sequence from which other applications (like sequenom) can choose PCR primers. Moved from private to public.
2011-07-19 15:21:47 -04:00
Guillermo del Angel
e6d306458c
Merge bug fixes
2011-07-19 14:36:20 -04:00
Guillermo del Angel
989dd17f95
a) Add ability in PrintReads to specify a sample file to easily subset samples, useful for IGV visualization, b) VariantsToTable is more R-friendly with Indels when printing ref/alt columns, c) Changes to SelectVariants' ability to specify a mask to randomly sample from a given AF distribution
2011-07-19 14:29:07 -04:00
Mark DePristo
c05451047c
Support for multiple records at the same site. The first record gets chr:start, and subsequent records get chr:start_2, chr:start_3, etc.
2011-07-18 15:43:52 -04:00
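The key scheme above (chr:start for the first record at a site, then chr:start_2, chr:start_3, ...) can be sketched with a per-site counter. SiteKeys and nextKey are hypothetical names for illustration, not the classes touched by this commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the per-site key scheme described in the commit message. */
final class SiteKeys {
    // How many records have been seen so far at each chr:start.
    private final Map<String, Integer> seenAtSite = new HashMap<>();

    /** Returns chr:start for the first record at a site, then chr:start_2, chr:start_3, ... */
    String nextKey(String chr, long start) {
        String base = chr + ":" + start;
        int n = seenAtSite.merge(base, 1, Integer::sum); // new count for this site
        return n == 1 ? base : base + "_" + n;
    }

    public static void main(String[] args) {
        SiteKeys keys = new SiteKeys();
        List<String> out = new ArrayList<>();
        out.add(keys.nextKey("20", 1000));
        out.add(keys.nextKey("20", 1000));
        out.add(keys.nextKey("20", 1001));
        out.add(keys.nextKey("20", 1000));
        System.out.println(out); // [20:1000, 20:1000_2, 20:1001, 20:1000_3]
    }
}
```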
Mark DePristo
782a05e9b5
Support for sorting the diff output in reverse order.
2011-07-18 15:43:01 -04:00
Mark DePristo
45702d3084
Now supports a mode where the primary key isn't sorted. In this case the records are displayed in the order in which they are added to the table.
2011-07-18 15:40:15 -04:00
Eric Banks
83ba2c066a
Making it deterministic
2011-07-18 13:59:02 -04:00
Eric Banks
92fa410450
Check that it's a valid bam file before parsing or bad things can happen
2011-07-18 13:43:34 -04:00
Eric Banks
80b5c5261a
CombineVariants no longer combines records of different types. So now when combining SNP and indel callsets, overlapping calls get their own records. Useful for Khalid in the pipeline. For those interested, it turns out the previous behavior was doing the wrong thing occasionally (and this was even captured in the integration tests).
2011-07-18 13:42:45 -04:00
Eric Banks
bc8b5da698
Added docs while I was reading through the code to understand it
2011-07-18 12:25:54 -04:00
Mark DePristo
51b0dd01c3
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-18 10:47:29 -04:00
Mark DePristo
6f26c07b85
Removed the SpecificDifference class. Now Difference classes always have the option to remember specific master and test values. This means that all summarized differences carry with them specific examples of their differences. Consequently, even summarized differences now give at least one example of the specific difference, even when the count of the difference is > 1. Unit tests updated. Added DiffObjects integration test. VCFDiffableReader now specifically reads the first line of the VCF file to capture the version number.
2011-07-18 10:42:35 -04:00
Kiran V Garimella
b2b7d27fed
Merge branch 'laptop'
2011-07-18 00:25:46 -04:00
Kiran V Garimella
497721a799
Added class documentation string.
2011-07-18 00:25:21 -04:00
Kiran V Garimella
ac9c66138d
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-18 00:20:33 -04:00
Kiran V Garimella
8167aba601
Moved (poorly named) MergeAndMatchHaplotypes to public. Added integration test
2011-07-17 22:47:32 -04:00
Mark DePristo
9992c373be
Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes.
2011-07-17 20:29:58 -04:00
Kiran V Garimella
4ea433f8e1
Moved PhaseByTransmission to public
2011-07-17 19:42:00 -04:00
Mark DePristo
4db2b13e9e
Rev tribble.
...
Just added more documentation for diffEngine and pointer to new wiki:
http://www.broadinstitute.org/gsa/wiki/index.php/DiffEngine
2011-07-17 13:05:04 -04:00
Mark DePristo
92a1c0c278
Moved the varianteval/tags/DataPoint.java and varianteval/tags/Analysis.java to varianteval/utils. This allows rsync to see these files with the -C option, since rsync's CVS-style exclude list ignores anything named "tags".
2011-07-17 10:14:23 -04:00
Menachem Fromer
72f4cf9c0e
Walker to perform deterministic annotation of phasing by transmission (to be compatible with RBP's definition of consecutive pairwise phasing)
2011-07-15 17:44:31 -04:00
Guillermo del Angel
9d59c2cb61
a) Made indel VQSR consensus script operational again, b) Made VariantsToTable more indel-friendly when printing out REF and ALT fields: strip out * from REF and print out alleles in the same way as the VCF so that offline processing is easier
2011-07-15 10:13:02 -04:00
Guillermo del Angel
10cf9245d7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-14 19:18:05 -04:00
Mark DePristo
5ffeddd3b1
better to use _ instead of ., as this is a special case later.
2011-07-14 14:45:16 -04:00
Eric Banks
ed6beae1f3
Adding headers to diffable reading for VCFs
2011-07-14 13:55:35 -04:00
Eric Banks
66c652d687
Added some extra error checks in the VCF codec. Now that we've moved this back into the GATK, changed some of the standard exceptions to be UserErrors (instead of TribbleExceptions).
2011-07-14 11:56:10 -04:00
Eric Banks
0c54c796ed
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-13 14:57:33 -04:00
Eric Banks
bb0e3a26fc
Added integration test for VCF writing. Also, bug fix for writing the GT-free records.
2011-07-13 14:57:21 -04:00
Eric Banks
6a431da554
Don't output source and ref header lines anymore. Short-term motivation for this is that I'd like this tool when run on a VCF to emit the exact same VCF. Long-term motivation is that these tags should be output by the VCF writer itself for all tools.
2011-07-13 14:40:01 -04:00
Menachem Fromer
74aa49e423
Merged bug fix from Stable into Unstable
2011-07-13 12:12:42 -04:00
Menachem Fromer
fa3ff53508
Filters should only be applied to the new VC if the old VC had filters applied
2011-07-13 11:58:16 -04:00
Eric Banks
969227c657
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-13 10:01:28 -04:00
Eric Banks
6007eea3ff
Allowing VCF records without GTs in VCF 4.1
2011-07-13 09:56:08 -04:00
Guillermo del Angel
1e81d521c0
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 20:12:29 -04:00
Ryan Poplin
837fb8f689
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 15:39:26 -04:00
Ryan Poplin
5077c94d85
Adding MappingQualityUnavailableReadFilter to the SNP and indel CountCovariates
2011-07-12 15:39:07 -04:00
Mark DePristo
01fd6a6949
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 15:20:44 -04:00
Mark DePristo
ccedd6ff4c
Difference is now the general form -- used to be SummarizedDifference. The old Difference class is now a subclass of Difference that includes pointers to the specific master and test DiffElements.
...
Added a size() function that calculates the number of elements in the tree rooted at a DiffElement.
2011-07-12 15:20:28 -04:00
Eric Banks
a2597e7f00
This commit incorporates several different changes that each pretty much break all the VCF-based integration tests, so I bunched them all together. We now officially emit VCF4.1 files (woo hoo), which means that the VCF headers are now all different (header version is 4.1 plus counts for some of the annotations are 'A' or 'G'). Also, I've added a Read Filter for reads with MQ=255 ('unavailable' in the SAM spec) and have applied this to the UG and the RMS MQ annotation.
2011-07-12 14:11:53 -04:00
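The SAM specification uses MAPQ 255 to mean "mapping quality unavailable", so such reads should not feed into an averaged annotation. A minimal sketch of a root-mean-square mapping quality that skips them; RmsMappingQuality is a hypothetical standalone class and does not reproduce GATK's ReadFilter or annotation interfaces.

```java
import java.util.stream.IntStream;

/**
 * Illustrative sketch: root-mean-square mapping quality that skips reads whose
 * MAPQ is 255, the SAM specification's "mapping quality unavailable" value.
 */
final class RmsMappingQuality {
    static final int MAPQ_UNAVAILABLE = 255;

    static boolean isUnavailable(int mapq) {
        return mapq == MAPQ_UNAVAILABLE;
    }

    /** Returns sqrt(mean(mq^2)) over reads with an available MAPQ, or NaN if none remain. */
    static double rms(int[] mapqs) {
        double[] kept = IntStream.of(mapqs)
                .filter(mq -> !isUnavailable(mq))
                .asDoubleStream()
                .toArray();
        if (kept.length == 0) return Double.NaN;
        double sumSq = 0.0;
        for (double mq : kept) sumSq += mq * mq;
        return Math.sqrt(sumSq / kept.length);
    }

    public static void main(String[] args) {
        // Without the filter, the 255 would dominate the RMS.
        System.out.println(rms(new int[] {60, 60, 255})); // 60.0
    }
}
```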
Ryan Poplin
329c3d8050
Merged bug fix from Stable into Unstable
2011-07-12 13:55:51 -04:00
Ryan Poplin
73735863b0
Fix for the case of requesting genotype for a sample that doesn't exist in a VariantContext
2011-07-12 13:55:21 -04:00
Guillermo del Angel
c4c145afb9
Merged bug fix from Stable into Unstable
2011-07-12 13:44:48 -04:00
Guillermo del Angel
cfe43e3971
Bug fix for Genotype given alleles: if we are in INDEL mode ignore SNPs and MNPs instead of emitting an empty site with alleles but no annotations
2011-07-12 13:43:46 -04:00
Guillermo del Angel
bfbca8b194
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-12 12:11:58 -04:00
Mark DePristo
05212aea62
reader now takes an argument for the maximum number of elements to read from the file.
2011-07-12 08:53:19 -04:00
Mark DePristo
8056a3fe89
getElement() now uses O(1) get from hash instead of linear O(n) search. Enables us to read large files easily.
2011-07-12 08:52:31 -04:00
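The change from a linear search to a hash lookup can be sketched as a node that indexes its children by name. DiffNode is a hypothetical class, not the DiffEngine's own type; a LinkedHashMap is used here on the assumption that insertion order should still be preserved for display.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: children indexed by name, so getElement() is an
 * average O(1) hash lookup instead of an O(n) scan.
 */
final class DiffNode {
    private final String name;
    // LinkedHashMap keeps insertion order for printing while hashing on name.
    private final Map<String, DiffNode> children = new LinkedHashMap<>();

    DiffNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void add(DiffNode child) {
        children.put(child.getName(), child);
    }

    /** Average O(1): a single hash lookup; returns null if absent. */
    DiffNode getElement(String childName) {
        return children.get(childName);
    }

    /** The linear O(n) approach, kept only for comparison. */
    DiffNode getElementLinear(String childName) {
        for (DiffNode c : children.values()) {
            if (c.getName().equals(childName)) return c;
        }
        return null;
    }

    public static void main(String[] args) {
        DiffNode root = new DiffNode("root");
        for (int i = 0; i < 100_000; i++) root.add(new DiffNode("rec" + i));
        System.out.println(root.getElement("rec99999").getName()); // rec99999
    }
}
```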
Eric Banks
d7d15019dd
Adding support for other simple header line types (e.g. ALT) and cleaning up the interface a bit.
2011-07-12 01:16:21 -04:00
Eric Banks
400b0d4422
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 23:38:57 -04:00
Mark DePristo
d5056ad899
Merge branch 'master' into diffit
2011-07-11 23:16:15 -04:00
Mark DePristo
893cc2e103
Making the package public, so there are no dependencies from public -> private
2011-07-11 23:15:08 -04:00
Eric Banks
e3748675db
Support for VCF 4.1 header counts
2011-07-11 17:40:45 -04:00
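In VCF 4.1 a header Number of "A" means one value per ALT allele and "G" means one value per possible genotype (for example, AC is Number=A and PL is Number=G). A minimal sketch of how a header count maps to the expected number of values at a site; VcfHeaderCount is a hypothetical name, diploid samples are assumed for "G", and "." is treated as unbounded.

```java
/**
 * Sketch of how a VCF 4.1 header Number field maps to an expected value count.
 * "A" = one per ALT allele, "G" = one per possible genotype (diploid assumed),
 * "." = unknown or varying, otherwise a fixed integer.
 */
final class VcfHeaderCount {
    static final int UNBOUNDED = -1;

    static int expectedCount(String number, int nAltAlleles) {
        switch (number) {
            case "A":
                return nAltAlleles;
            case "G": {
                int nAlleles = nAltAlleles + 1;         // REF plus ALTs
                return nAlleles * (nAlleles + 1) / 2;   // unordered diploid genotypes
            }
            case ".":
                return UNBOUNDED;
            default:
                return Integer.parseInt(number);        // fixed count such as "1"
        }
    }

    public static void main(String[] args) {
        // Biallelic site: one AC value, three PL values.
        System.out.println(expectedCount("A", 1) + " " + expectedCount("G", 1)); // 1 3
        // Triallelic site (two ALT alleles).
        System.out.println(expectedCount("A", 2) + " " + expectedCount("G", 2)); // 2 6
    }
}
```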
Guillermo del Angel
f54c2ae3b4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 16:26:27 -04:00
Christopher Hartl
d6517adb42
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-11 16:16:37 -04:00
Christopher Hartl
86890c6357
N and K (in binomial probability) got switched in RFA Walker with the last commit. No longer will NaNs be produced.
...
Added: TableToVCF. Kind of a longer-term project, but there are lots of variant calls available in a weird tabular format. I used this to convert Ju et al. small indels to VCF. I'll check against the 1000G ASN superpopulation calls to see if we see a good amount of recapitulation, and if so, I'll put them in unvalidated comparisons. Minor changes to the TableCodec and TableFeatures to allow for this (the codec can sometimes drop a column, and the feature now allows you to grab on to its header).
2011-07-11 16:16:15 -04:00
Guillermo del Angel
d587856f2d
Private feature to input a list of family descriptions from a file and to look for Mendelian violations (MVs) in all of these. The feature can also output a detailed description of each violation into a separate file.
2011-07-11 14:17:59 -04:00
Guillermo del Angel
6e7b5e1e7a
Merged bug fix from Stable into Unstable
...
Merge branch 'master' into unstable
2011-07-08 21:19:45 -04:00
Guillermo del Angel
7fbc5987d0
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-07-08 21:17:32 -04:00
Mark DePristo
bd29236684
Merge branch 'master' into diffengine
2011-07-08 14:08:17 -04:00
Guillermo del Angel
224574424e
Bug fix: if we're genotyping a very long indel (>100 bp) fail gracefully instead of with an array out of bounds exception
2011-07-08 12:48:49 -04:00
Ryan Poplin
2a4b3ae4a2
Cleaning up / removing most of the monkeying around with annotation values that happens in VariantDataManager
2011-07-08 12:48:33 -04:00
Mark DePristo
8add2a3866
Merge branch 'master' into diffengine
2011-07-08 09:15:54 -04:00
Eric Banks
cc143493e3
Merged bug fix from Stable into Unstable
2011-07-07 23:01:24 -04:00
Eric Banks
4cfe0dd857
Test for bad alleles so that we don't generate IndexOutOfBoundsExceptions
2011-07-07 23:01:03 -04:00
Mark DePristo
3d4f0e9dd7
Now supports the case where you have multiple AC values in the info field.
2011-07-07 17:21:15 -04:00
Ryan Poplin
212e9a1a0c
Fixing unstable build after stable commit
2011-07-07 15:18:57 -04:00
Ryan Poplin
11d9a0473a
Merged bug fix from Stable into Unstable
2011-07-07 15:03:58 -04:00
Ryan Poplin
50111db2b7
Fixing non-determinism in single-threaded VQSR by moving references to cern.Normal over to the static random generator available in GenomeAnalysisEngine
2011-07-07 15:02:48 -04:00
Guillermo del Angel
4d565b0811
Merge branch 'incoming'
2011-07-07 06:21:05 -04:00
Guillermo del Angel
55c8c05060
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-07 06:18:29 -04:00
Guillermo del Angel
5ab2e83904
a) Cosmetic modifications to IndelType annotation. b) Add ability to select samples from a file in PrintReads, c) fixes to shaped AF random selection in SelectVariants
2011-07-07 06:15:10 -04:00
Eric Banks
52f6f9fdcc
Merged bug fix from Stable into Unstable
2011-07-06 16:05:48 -04:00
Eric Banks
54121eb082
Catch malformed bams that cause the writer to run in infinite loops
2011-07-06 16:05:08 -04:00
Eric Banks
76a01a7453
Merged bug fix from Stable into Unstable
2011-07-06 12:53:09 -04:00
Eric Banks
14fee4ccbd
Patch from Bob to deal with symbolic alleles: these weren't getting padded but they should be.
2011-07-06 12:51:44 -04:00
Ryan Poplin
bdef233d4d
Merged bug fix from Stable into Unstable
2011-07-06 10:05:02 -04:00
Ryan Poplin
e8ed6b7f0f
Adding more comments to main VQSR walker. Fixing copyright lines. Bug fix for default paths to now point to public/R/ instead of R/. Bug fix in VQSR for the path to the R scripts not ending in a slash.
2011-07-06 10:01:14 -04:00
Guillermo del Angel
8e8b901d12
Merged bug fix from Stable into Unstable
...
Merge branch 'master' into unstable
2011-07-06 09:57:55 -04:00
Guillermo del Angel
81a4d18468
Mark several indel-related arguments as @Hidden
2011-07-06 09:56:38 -04:00
Guillermo del Angel
9124c84a7c
bug fixes
2011-07-04 21:10:44 -04:00
Guillermo del Angel
bb85f232b9
bug fixes
2011-07-04 21:04:49 -04:00
Guillermo del Angel
f26ffeaea0
bug fixes
2011-07-04 20:48:45 -04:00
Guillermo del Angel
04df153f47
bug fixes
2011-07-04 20:45:10 -04:00
Guillermo del Angel
7a04872a3f
bug fixes
2011-07-04 20:33:59 -04:00
Guillermo del Angel
08bc843d4c
SelectVariants can get a table to boost AF when choosing randomly
2011-07-04 20:23:22 -04:00
Guillermo del Angel
fac082de64
Report only highest AF and AC in multiallelic records in VariantsToTable or else R can't parse table
2011-07-03 14:32:12 -04:00
Guillermo del Angel
abe9480c6d
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 21:19:15 -04:00
Ryan Poplin
fb315b5f8c
Merge branch 'incoming'
2011-07-02 18:10:48 -04:00
Ryan Poplin
41d46059e7
fixing bad format statement
2011-07-02 18:09:17 -04:00
Ryan Poplin
3804afeb8a
Merge branch 'incoming'
2011-07-02 17:55:39 -04:00
Ryan Poplin
781c0c33a4
Use the worst X% of calls in addition to the bad training sites list. Don't include the already added calls in the calculation of X%
2011-07-02 17:55:10 -04:00
Ryan Poplin
6b8af6afd8
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 17:15:56 -04:00
Ryan Poplin
fdc2ebb321
Adding ability to specify in VQSR a list of bad sites to use when training the negative model. Just add bad=true to the list of rod tags for your bad sites track.
2011-07-02 17:15:13 -04:00
Guillermo del Angel
09af6bbc6c
Ugh - backed out experimental code, not meant for public consumption, that was unintentionally committed
2011-07-02 16:58:57 -04:00
Guillermo del Angel
c6c0dba040
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-07-02 16:45:34 -04:00
Ryan Poplin
4532a84314
Merged bug fix from Stable into Unstable
2011-07-02 10:48:55 -04:00
Ryan Poplin
5faf40b79d
Moving AnalyzeAnnotations into the archive because it has outlived its usefulness.
2011-07-02 10:39:53 -04:00
Ryan Poplin
17ff5bb094
Variant records coming out of the VQSR are now annotated with which input annotation was most divergent from the Gaussian mixture model. This gives a general sense for why each variant was removed from the callset.
2011-07-02 09:55:35 -04:00
Khalid Shakir
c65e52f88a
Merged bug fix from Stable into Unstable
2011-07-01 20:50:56 -04:00
Khalid Shakir
b6bc64a0c8
Cleanup of the utils.broad package.
...
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
Eric Banks
0c9105ca22
Minor fix of description
2011-07-01 18:07:35 -04:00
David Roazen
d647ea4fdc
Long-delayed change to CachingIndexedFastaSequenceFile.
...
Made the cache non-static to avoid problems when multiple references are used within the same thread (e.g., during integration tests). This should kill the intermittent IndelRealignerIntegrationTest failures.
2011-07-01 16:04:30 -04:00
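Why a static cache breaks with multiple references can be shown with a toy reader that caches bases by position. CachingReferenceReader is a hypothetical illustration, not CachingIndexedFastaSequenceFile, and a String stands in for the indexed FASTA. Because the cache is an instance field, each reader keeps its own entries; had it been static, the second reader below would return the first reader's cached base for position 0.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration: a reader that caches bases by position.
 * The cache is per instance, so readers over different references never
 * share entries.
 */
final class CachingReferenceReader {
    private final String reference;                                // stands in for an indexed FASTA
    private final Map<Integer, Character> cache = new HashMap<>(); // one cache per reader

    CachingReferenceReader(String reference) {
        this.reference = reference;
    }

    char baseAt(int position) {
        // Read from the underlying sequence only on a cache miss.
        return cache.computeIfAbsent(position, p -> reference.charAt(p));
    }

    public static void main(String[] args) {
        CachingReferenceReader first = new CachingReferenceReader("ACGT");
        CachingReferenceReader second = new CachingReferenceReader("TTTT");
        System.out.println(first.baseAt(0) + " " + second.baseAt(0)); // A T
    }
}
```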
Eric Banks
761347b8d5
The VariantContext utility method used by SelectVariants wasn't checking the filter status (unfiltered vs. passing filters) and always returned a VC that was passing filters. This is fixed and the md5 from the VCF Streaming test has been re-updated.
2011-06-30 15:26:09 -04:00
Mark A. DePristo
defa3cfe85
Moved around private walkers into appropriate directories in private gatk.walkers. Moved a few public walkers into private qc package, and some private qc walkers into the public directory. Removed several obviously broken and/or unused walkers.
2011-06-30 14:59:58 -04:00
Eric Banks
804d5f22d5
Reverting previous change, as promised.
2011-06-30 13:18:30 -04:00
Eric Banks
9e234cf5d6
This is a temporary commit for Picard. It will absolutely break integration tests, but I'm going to revert it in 1 minute. Because we don't want them in unstable, I need to push this into stable.
2011-06-30 13:17:14 -04:00
Guillermo del Angel
331b47afbd
Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-06-30 08:29:11 -04:00
Guillermo del Angel
50c32ce52e
VariantsToTableFix
2011-06-29 21:39:53 -04:00
Guillermo del Angel
9b134f3b96
VariantsToTableFix
2011-06-29 21:33:41 -04:00
Guillermo del Angel
2b88033ef4
Enable considering 454 reads; just lower the gap open penalty (GOP) by 15
2011-06-29 16:12:55 -04:00
Guillermo del Angel
dc4f63a1a8
a) consensus goes to week queue
...
b) New experimental TechnologyComposition annotation
c) SelectVariants fixes
2011-06-29 16:00:23 -04:00
Eric Banks
70ba851478
Might as well check for the illegal state and throw an exception
2011-06-29 15:59:10 -04:00
Eric Banks
1f19afe1d9
Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible for a variant to be an indel but neither an insertion nor a deletion. Added an isComplexIndel() method, and the realigner now checks for such events (we don't use them to generate alternate consensuses). Also added an isMNP() method while I was there, for consistency with the other variant types.
2011-06-29 15:54:09 -04:00
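The distinction between simple and complex indels can be sketched for a biallelic site given VCF-style allele strings, where indels carry a shared leading padding base. This AlleleClassifier is hypothetical and assumes that padded string representation; it is not GATK's VariantContext logic, which at the time represented indel alleles differently.

```java
/**
 * Hypothetical classifier for a biallelic site given VCF-style allele strings
 * (indels padded with a shared leading base). Not GATK's VariantContext logic.
 */
final class AlleleClassifier {
    enum Type { SNP, MNP, INSERTION, DELETION, COMPLEX_INDEL }

    static Type classify(String ref, String alt) {
        if (ref.length() == alt.length()) {
            return ref.length() == 1 ? Type.SNP : Type.MNP;
        }
        // Lengths differ, so this is some kind of indel.
        if (alt.length() > ref.length() && alt.startsWith(ref)) return Type.INSERTION;
        if (ref.length() > alt.length() && ref.startsWith(alt)) return Type.DELETION;
        // Length changes but neither allele extends the other: a complex indel.
        return Type.COMPLEX_INDEL;
    }

    public static void main(String[] args) {
        System.out.println(classify("A", "AT"));   // INSERTION
        System.out.println(classify("ATG", "A"));  // DELETION
        System.out.println(classify("AT", "GCC")); // COMPLEX_INDEL
        System.out.println(classify("AT", "GC"));  // MNP
    }
}
```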
Guillermo del Angel
e91ae6b265
AF matching when selecting random variants
2011-06-29 15:00:26 -04:00
Guillermo del Angel
dee10140dd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-06-29 13:58:04 -04:00
Eric Banks
8586c86bc4
My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals.
2011-06-29 13:56:37 -04:00
Guillermo del Angel
5b6d279a2e
Two bug fixes:
...
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. Cigar string cannot be trusted because BWA can clip good high quality bases and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
David Roazen
139c6b84a1
Modified build.xml and the help extractor doclet to use the output of "git describe" as an absolute version number (if the repository has at least one tag), using the raw SHA-1 hash value as a fallback version number in the case where there are no tags.
2011-06-28 08:37:05 -04:00
David Roazen
3c9497788e
Reorganized the codebase beneath top-level public and private directories, removing the playground and oneoffprojects directories in the process.
...
Updated build.xml accordingly.
2011-06-28 06:55:19 -04:00