2974 Commits (7dcafe8b8194ce8a9d0b8825812fd11c8f9a0612)

Author SHA1 Message Date
Guillermo del Angel ee68713267 Further Bug fixes to CountVariants: stratifications were wrong in case genotypes had no-calls, for example if we stratified by sample and a sample had a no-call, this no-call was considered a true variant and counts were incorrectly increased 2011-08-22 20:42:47 -04:00
Guillermo del Angel c270384b2e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-22 20:39:32 -04:00
Guillermo del Angel 8ae24912f4 a) Misc fixes in Phase1 indel vqsr script,
b) More R-friendly VariantsToTable printing of AC in case of multiple alt alleles
c) Rename FixPLOrderingWalker to FixGenotypesWalker and rewrote: no longer need older code, replaced with code to replace genotypes with all-zero PL's with a no-call.
2011-08-22 20:39:06 -04:00
Mark DePristo 85c5a6f890 Merge branch 'rodTesting'
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java
2011-08-22 17:43:47 -04:00
Mark DePristo 1eab9be35d Now with accurate javadoc 2011-08-22 17:25:15 -04:00
Mark DePristo 3612a3501d info, not warn, about dynamic type determination 2011-08-22 17:24:51 -04:00
Eric Banks dc42571dd9 Only create the genotype map when necessary 2011-08-22 15:40:36 -04:00
Khalid Shakir c4c90c8826 Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. Now must be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
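The jobPriority change in the commit above (a 0 to 100 value mapped onto each dispatcher's own scale) can be sketched roughly as below. This is an illustration only, not the Queue implementation: `map_priority` is a hypothetical helper, the linear rescaling is an assumption, and the Grid Engine range of -1023 to 1024 is taken from the usual `qsub -p` convention rather than from this log.

```python
def map_priority(priority: int, low: int, high: int) -> int:
    """Linearly rescale a 0-100 priority onto a dispatcher range [low, high].

    Hypothetical helper: the real mapping used by each JobRunner is not
    shown in the commit log, so a plain linear rescale is assumed here.
    """
    if not isinstance(priority, int) or not 0 <= priority <= 100:
        raise ValueError("jobPriority must be an integer between 0 and 100")
    span = high - low
    # Integer arithmetic, rounding to the nearest step of the target range.
    return low + (priority * span + 50) // 100


# Assumed target range for Grid Engine (qsub -p); other dispatchers differ.
GRID_ENGINE_RANGE = (-1023, 1024)

if __name__ == "__main__":
    for p in (0, 50, 100):
        print(p, "->", map_priority(p, *GRID_ENGINE_RANGE))
```

Keeping the user-facing value on one 0 to 100 scale means a QScript does not need to know which dispatcher it will eventually run on.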
Eric Banks 2c24b68a96 Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster. 2011-08-22 15:11:21 -04:00
Eric Banks 518b3dd291 Don't let the genotypes map be null 2011-08-22 15:10:30 -04:00
Ryan Poplin f93a554b01 updating exome specific parameters in MDCP 2011-08-21 10:25:36 -04:00
Ryan Poplin dbff84c54e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-21 10:09:19 -04:00
Khalid Shakir 22ca44c015 Fixed Queue's tagging of RodBindings.
Fixed argument definition names.
2011-08-21 02:34:20 -04:00
Eric Banks a8cbced71b Bug fix for Ryan: check for no context 2011-08-20 22:49:51 -04:00
Eric Banks 0ccd173967 Fixing the recent SelectVariants fix 2011-08-20 21:30:08 -04:00
Ryan Poplin b008676878 fixing the previous fix 2011-08-20 21:21:55 -04:00
Guillermo del Angel 782453235a Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary, need more investigation): if -disc option is used in sites-only vcf's then a null pointer exception is produced, caused by recent introduction of -xl_sf options.
2011-08-20 12:24:22 -04:00
Ryan Poplin 539e157ecd Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR 2011-08-20 11:28:48 -04:00
Guillermo del Angel 4939648fd4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-20 08:50:43 -04:00
Ryan Poplin a96ecbab71 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 19:30:05 -04:00
Ryan Poplin ddb5045e14 Updating the methods development calling pipeline for the new rod binding syntax and the new best practices. 2011-08-19 19:29:51 -04:00
Mark DePristo 8b3cfb2f1c Final documented version of GATKDoclet and associated classes
-- Docs on everything.
-- Feature complete.  At this point only minor improvements and bugfixes are anticipated
2011-08-19 16:52:17 -04:00
Mark DePristo b08d63a6b8 Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mark DePristo 49e831a13b Should have checked in 2011-08-19 14:35:16 -04:00
Mauricio Carneiro 7b5fa4486d GenotypeAndValidate - Added docs to the @Arguments 2011-08-19 13:35:11 -04:00
Mark DePristo 9f7d4beb89 Merge branch 'help' 2011-08-19 13:14:02 -04:00
Mark DePristo 4d1fd17a97 GATKDoclet cleanup and documentation
-- Fixed bug in the way ArgumentCollections were handled that led to failure in handling the dbsnp argument collection.
2011-08-19 13:13:41 -04:00
Ryan Poplin 0f25167efd minor fix in VariantEval docs 2011-08-19 11:01:04 -04:00
Mark DePristo 198955f752 GATKDoc descriptions for all standard codecs, or TODO for their owners
-- Also added vcf.gz support in the VCF codec.  This wasn't committed in the last round, because it was missed by the parallel documentation effort.
2011-08-19 09:57:21 -04:00
Guillermo del Angel 269ed1206c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 09:32:20 -04:00
Eric Banks 40e67cff1b I like the @Advanced annotation 2011-08-18 22:27:34 -04:00
Mark DePristo 2457c7b8f5 Merge branch 'master' into help 2011-08-18 22:20:43 -04:00
Mark DePristo 5fbdf968f7 ArgumentSource no longer comparable. Arguments sorted by GATKDoclet 2011-08-18 22:20:14 -04:00
Eric Banks 77fa2c1546 Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter). 2011-08-18 22:01:33 -04:00
Mark DePristo 1d3799ddf7 Merge branch 'master' into help 2011-08-18 22:00:29 -04:00
Mark DePristo d1892cd0d7 Bug fixes
-- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap
-- Fixed broken exception throwing in the case where a file's type could not be determined
2011-08-18 21:58:36 -04:00
Mark DePristo c5efb6f40e Usability improvements to GATKDocs
-- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan)
-- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKs.  There's now an advanced section.  Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?
2011-08-18 21:39:11 -04:00
Mark DePristo d94da0b1cf Moved CG and SOAP codecs to private 2011-08-18 21:20:26 -04:00
Mark DePristo f7414e39bc Improvements to GATKDocs
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
2011-08-18 21:20:09 -04:00
Ryan Poplin 09d099cada Added GATKDocs to the UnifiedGenotyper. 2011-08-18 20:57:02 -04:00
Mauricio Carneiro 6ef01e40b8 Complete rewrite of Hard Clipping (ReadClipper)
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Guillermo del Angel 626cbf9411 Bug fixes and cleanups for IndelStatistics 2011-08-18 16:28:40 -04:00
Guillermo del Angel 58560a6d50 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 16:17:52 -04:00
Guillermo del Angel 3dfb60a46e Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases.
VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before they were being conflated).
VariantEval module IndelStatistics is considerably simplified as the sample stratification was wrong and redundant, now it should work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful
2011-08-18 16:17:38 -04:00
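The indel categories described in the commit above can be illustrated with a small sketch. It assumes VCF-style allele strings that share a leading reference base, and `classify_indel` is a hypothetical stand-in, not the actual `isSimpleInsertion()` / `isSimpleDeletion()` / `isComplex()` code on the variant context.

```python
from typing import Sequence


def classify_indel(ref: str, alts: Sequence[str]) -> str:
    """Classify an indel site by the rules stated in the commit message.

    Assumption: alleles are VCF-style strings padded with a shared
    leading base. Only biallelic sites can be "simple"; any site with
    more than one alternate allele is reported as complex.
    """
    if len(alts) != 1:
        return "complex"  # multiallelic sites are always complex
    alt = alts[0]
    if len(alt) > len(ref) and alt.startswith(ref):
        return "simple_insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "simple_deletion"
    return "other"  # biallelic, but not a simple insertion or deletion


if __name__ == "__main__":
    print(classify_indel("A", ["ATG"]))        # biallelic insertion
    print(classify_indel("ATG", ["A"]))        # biallelic deletion
    print(classify_indel("A", ["AT", "ATG"]))  # multiallelic site
```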
Chris Hartl 6b256a8ac5 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-08-18 15:29:24 -04:00
Chris Hartl a8935c99fc Adding docs for DepthOfCoverage and ValidationAmplicons 2011-08-18 15:28:35 -04:00
Mark DePristo f2f51e35e3 Merge branch 'master' into help 2011-08-18 14:05:33 -04:00
Mark DePristo faa3f8b6f6 Only concrete classes are now documented 2011-08-18 14:04:47 -04:00
Ryan Poplin 7c4ce6d969 Added GATKDocs for the VQSR walkers. 2011-08-18 14:00:39 -04:00
Mark DePristo 5772766dd5 Improvements to GATKDocs
-- Now supports a static list of root classes / interfaces that should receive docs.  A complementary approach to documenting features to the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- No longer displayed sub and super classes
2011-08-18 14:00:09 -04:00
Mark DePristo e03db30ca0 Now uses DocumentedGATKFeatureObject instead of annotation directly
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo d4511807ed Merge branch 'master' into help 2011-08-18 11:53:37 -04:00
Mark DePristo c787fd0b70 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:52:45 -04:00
Mark DePristo c797616c65 If you have one sample in your BAM, getToolkit().getSamples().size() == 2
Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo cbec69a130 Merge branch 'master' into help
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks aa21fc7c9c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:30:59 -04:00
Mark DePristo f5d7cabb20 Fix for reintroducing an already solved problem. 2011-08-18 11:20:12 -04:00
Eric Banks a45498150a Remove non-ascii char 2011-08-18 11:18:29 -04:00
Ryan Poplin c08a9964d4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 10:58:04 -04:00
Ryan Poplin bb79d3edae Added GATKDocs for the BQSR walkers. 2011-08-18 10:57:48 -04:00
Mark DePristo 47bbddb724 Now provides type-specific user feedback
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo 2d41ba15a4 Vastly better Tribble help message
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR        Name        FeatureType   Documentation
##### ERROR      BEAGLE      BeagleFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR         BED         BEDFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR    BEDTABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR       CGVAR     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR       DBSNP       DbSNPFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR    GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR         MAF         MafFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR   RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR      REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR   SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR     SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR      SNPEFF      SnpEffFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR     SOAPSNP     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR       TABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR         VCF     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR        VCF3     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo c2287c93d7 Cleanup of codec locations. No more dbSNPHelper
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper.  rsID function now in VCFutils.  Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo 9c17d54cb6 getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error 2011-08-18 09:39:20 -04:00
Mark DePristo c30e1db744 Better location for help utils 2011-08-18 09:38:51 -04:00
Mark DePristo 4da42d9f39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 09:32:57 -04:00
Eric Banks c91a442be1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 22:40:16 -04:00
Eric Banks a7b70e6bb4 Adding feature for Khalid: ability to exclude particular samples. 2011-08-17 22:28:22 -04:00
Mauricio Carneiro cc3df8f11a Moving GAV walker to public
Walker is updated to the new RodBinding system and has the new GATKDocs layout.
2011-08-17 21:55:17 -04:00
Eric Banks fa1db3913b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 21:49:25 -04:00
Eric Banks 8e83b6646b Bug fix for Chris: don't validate ref base for complex events. 2011-08-17 21:49:14 -04:00
Matt Hanna c104dd7a09 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 16:59:12 -04:00
Matt Hanna 81a792afeb Reverting optimization disable in unstable. 2011-08-17 16:58:24 -04:00
Mark DePristo 2e35592295 GATKDocs for CallableLoci 2011-08-17 16:32:01 -04:00
Guillermo del Angel c193f52e5d Fixed up examples: pasting from wiki still had old rod syntax 2011-08-17 16:29:45 -04:00
Matt Hanna 2b2a4e0795 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-17 16:26:45 -04:00
Matt Hanna 297c9e513c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable 2011-08-17 16:24:02 -04:00
Matt Hanna a210a62ab9 Merged bug fix from Stable into Unstable 2011-08-17 16:23:31 -04:00
Mark DePristo d59e6ed274 Fix for RefSeqCodec bug and better error messages
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class.  Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
2011-08-17 16:22:07 -04:00
Matt Hanna d170187896 Disable optimization that increases marginal speed of the GATK slightly but
can produce data loss in a narrow corner case where the BGZF block(s) locations
and offsets in the last index bucket of contig n overlap exactly with the BGZF
block locations and offset in the last index bucket of contig n+1.

A proper fix that keeps the optimization has already been introduced into
unstable, but disabling the optimization is a low risk way to make sure that
users of stable experience no data loss.
2011-08-17 16:16:05 -04:00
David Roazen 53006da9a5 Improved descriptions for the SnpEff annotations in the VCF header
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Guillermo del Angel 784fb148b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:47:01 -04:00
Guillermo del Angel 671330950d Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden 2011-08-17 15:46:31 -04:00
Andrey Sivachenko 0af68e052a Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 15:17:47 -04:00
Andrey Sivachenko a423546cdd fix: RefSeq contains records with zero coding length and the refseq codec/feature used to crash on those; now such records are ignored, with a warning printed (once) 2011-08-17 15:17:31 -04:00
Andrey Sivachenko 710d34633e now the reads that are too long are truly ignored (fix of the fix) 2011-08-17 15:16:23 -04:00
Eric Banks 2f19046f0c Adding docs to the 2 beasts. Saved the worst for last. 2011-08-17 14:19:14 -04:00
Andrey Sivachenko 069554efe5 somatic indel detector does not die on reads that are too long (likely contain a huge deletion) anymore; instead print a warning and ignore the read 2011-08-17 14:05:19 -04:00
Eric Banks c405a75f54 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 13:28:25 -04:00
Eric Banks 575303ae6b Renaming for consistency and bringing up to speed with new rod system 2011-08-17 13:28:19 -04:00
Eric Banks 6d629c176c Adding docs 2011-08-17 13:27:36 -04:00
Eric Banks a21e193a9e Adding docs to 3 more walkers 2011-08-17 12:35:08 -04:00
Menachem Fromer 98acb546a9 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 12:22:29 -04:00
Menachem Fromer d1bb302d12 Added GatkDocs documentation 2011-08-17 12:21:37 -04:00
Mark DePristo 3da71a9bb6 Clean up summary 2011-08-17 12:04:45 -04:00
Mark DePristo c6fb215faf GATKDocs for VariantsToTable
-- Made a previously required argument optional, as this was a long-standing bug
2011-08-17 12:02:41 -04:00
Mark DePristo 5f794d16a7 Fixed bad character in documentation 2011-08-17 12:01:08 -04:00
Mark DePristo 9d1d5bd27a Revert "Fixed bad character in documentation"
This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.
2011-08-17 11:57:31 -04:00
Mark DePristo 78deb3f195 Fixed bad character in documentation 2011-08-17 11:57:00 -04:00
Mark DePristo 79dcfca25f Fixed bad character in documentation 2011-08-17 11:56:51 -04:00
Eric Banks b3b5d608ca Adding docs to yet more walkers 2011-08-17 09:57:19 -04:00
Eric Banks fadcbf68fd Adding docs to QC walkers 2011-08-17 09:39:33 -04:00
Mauricio Carneiro 5d6a6fab98 Renamed softUnclipped functions to refCoord*
These functions return reference coordinates, so they should be named accordingly.
2011-08-16 18:56:28 -04:00
Mauricio Carneiro ed8f769dce Fixed index for getSoftUnclippedEnd()
Unclipped end can be calculated simply by looking at the last cigar element and adding its length in case it's a soft clip.
2011-08-16 18:54:28 -04:00
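The calculation described in the commit above can be sketched as follows. This is a minimal illustration that works on a CIGAR string rather than on the GATK read classes; `soft_unclipped_end` and its parameters are hypothetical names. It follows the commit message literally, so a soft clip that is followed by a trailing hard clip is not added.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM format.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_unclipped_end(alignment_end: int, cigar: str) -> int:
    """Return the reference end coordinate including a trailing soft clip.

    Only the last CIGAR element is inspected, as the commit describes.
    """
    elements = CIGAR_ELEMENT.findall(cigar)
    if not elements:
        return alignment_end
    length, op = elements[-1]
    if op == "S":
        return alignment_end + int(length)
    return alignment_end


if __name__ == "__main__":
    print(soft_unclipped_end(100, "5S90M5S"))
    print(soft_unclipped_end(100, "95M"))
```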
Eric Banks 5f3f46aad1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-16 16:26:33 -04:00
Eric Banks 946f5c53fe Adding docs to more walkers 2011-08-16 16:26:26 -04:00
Mark DePristo 6e828260a0 Removed -B support. Now explodes with error if -B provided. 2011-08-16 16:13:47 -04:00
Ryan Poplin 2d5bbecd9e Merged bug fix from Stable into Unstable 2011-08-16 14:19:04 -04:00
Mauricio Carneiro 07c1e113cd Fixed interval traversal for previously hard clipped reads.
If a read was hard clipped for being low quality and no longer overlaps the interval, this read will now be discarded instead of treated as an error by the GATK traversal engine.
2011-08-16 14:18:05 -04:00
Ryan Poplin 9d4add3268 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-08-16 14:18:03 -04:00
Ryan Poplin 170d1ff7b6 Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode 2011-08-16 14:17:46 -04:00
Mauricio Carneiro b135565183 Added low quality clipping
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
2011-08-16 13:51:25 -04:00
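A rough sketch of the tail clipping described in the commit above, assuming a read is just a base string plus a list of integer qualities. `clip_low_quality_tails` is a hypothetical function; the real ReadClipper also rewrites the CIGAR and alignment coordinates, which is omitted here. Treating "low quality" as `<= threshold` is an assumption, chosen to match the `lowQual <= 2` wording of the earlier hard-clipping commit.

```python
from typing import List, Tuple


def clip_low_quality_tails(
    bases: str, quals: List[int], threshold: int = 2
) -> Tuple[str, List[int]]:
    """Trim both ends of a read while the end quality is <= threshold.

    Low-quality bases in the interior of the read are kept. A read whose
    bases are all low quality comes back empty, which the caller has to
    handle as a special case.
    """
    left, right = 0, len(quals)
    while left < right and quals[left] <= threshold:
        left += 1
    while right > left and quals[right - 1] <= threshold:
        right -= 1
    return bases[left:right], quals[left:right]


if __name__ == "__main__":
    print(clip_low_quality_tails("ACGTAC", [2, 2, 30, 2, 30, 2]))
    print(clip_low_quality_tails("ACG", [2, 1, 0]))
```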
Andrey Sivachenko 9f3328db53 fixing read group name collision: before writing the read into respective stream in nway-out mode we now retrieve the original rg, not the merged/modified one 2011-08-16 13:45:40 -04:00
Eric Banks ab0b56ed11 Minor doc fixes 2011-08-16 12:55:45 -04:00
Eric Banks 125ad0bcfa Added docs to RTC 2011-08-16 12:46:48 -04:00
Eric Banks ef9216011e Added docs to IR 2011-08-16 12:24:53 -04:00
Eric Banks ab1e3d6a98 Use the right set of sample names 2011-08-16 01:03:05 -04:00
Eric Banks 36c7f83208 Refactoring VE stratifications so that they don't pass around bulky data; instead just pull needed data from the VE parent. This allows us to stop using deprecated features of the rod system. 2011-08-15 16:31:57 -04:00
Eric Banks 1246b89049 Forgot to initialize variants on the merge 2011-08-15 16:00:43 -04:00
Mauricio Carneiro 993ecb85da Added Hard Clipping Tail Ends
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Eric Banks 045e8a045e Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker 2011-08-15 14:05:23 -04:00
Eric Banks fc2c21433b Updating random walkers to new rod system 2011-08-15 13:29:31 -04:00
Eric Banks 3d56bbf087 Resolving merge conflicts 2011-08-15 12:28:05 -04:00
Eric Banks 9ddbfdcb9f Check filtered status before applying to alt reference 2011-08-15 12:25:23 -04:00
Mauricio Carneiro 0d976d6211 Fixed second time clipping
When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro 489c15b99d Fixed indexing issue in coordinate conversion
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as Reference Coordinate for clipping, because the read does not go that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro c7b69a4574 Fixed integration tests 2011-08-14 16:38:20 -04:00
Mauricio Carneiro 6ae3f9e322 Wrapped clipping op information
The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim.

(this is a quick commit to unbreak the build, performing integration tests and will make further commits if necessary)
2011-08-14 15:44:48 -04:00
Mauricio Carneiro 8a51732049 Fixes to ReadClipper and added Reference Coordinate clipping.
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 291d8c7596 Fixed HardClipping and Interval containment
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure for every mapped read.

 * Interval containment needed a <= and >= to make sure it caught the borders right.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro 0be1dacddb Refactored interval clipping utility
reads are clipped in map() and now we cover almost all cases. Left behind the case where the read stretches through two intervals. This will need special treatment later.
2011-08-14 14:54:33 -04:00
David Roazen bb4ced3201 SnpEff-related fixes.
-To correctly handle indels and MNPs, only consider features that start at the current locus,
rather than features that span the current locus, when selecting the most significant effect.

-Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding
any annotations and silently returning.
2011-08-12 15:26:24 -04:00
Mauricio Carneiro 10e873d9c6 Merge branch 'repval' 2011-08-12 15:24:31 -04:00
Guillermo del Angel 31dc831531 Merged bug fix from Stable into Unstable 2011-08-12 13:26:41 -04:00
Menachem Fromer 9121b8ed65 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 12:24:19 -04:00
Menachem Fromer 7ed120361d Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles 2011-08-12 12:23:44 -04:00
Eric Banks 7ea9196321 Better error message for name/type clashes. 2011-08-12 11:18:14 -04:00
Eric Banks 27f0748b33 Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature). 2011-08-12 11:11:56 -04:00
Menachem Fromer c7ca33cbff Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-12 10:12:09 -04:00
Eric Banks 41f3da75d7 Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests. 2011-08-12 02:22:44 -04:00
Eric Banks eba316621d Finish moving VE over to new rod system and fixing up the type inconsistency between eval and comp rods. Now the novel count is always 0 under the known stratification. :) 2011-08-12 00:40:08 -04:00
Menachem Fromer 9de06560df Update to new RodBinding system 2011-08-11 17:54:16 -04:00
Eric Banks 90771b74b4 When matching eval to comps, try to choose the one with the same alt allele. 2011-08-11 13:55:01 -04:00
Eric Banks 200f73b008 No reason to warn the user anymore because it's no longer possible for them to specify a dbsnp file on the command-line. 2011-08-11 13:44:07 -04:00
Eric Banks e93538cdf7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 13:39:36 -04:00
Eric Banks 265c3d744b Fixing VariantEval logic and having it use the new rod system. 2011-08-11 13:39:34 -04:00
Ryan Poplin b705d9cf15 Oops, these VariantAnnotator input bindings aren't needed during the UG 2011-08-11 13:17:16 -04:00
Ryan Poplin 7fade88070 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 11:02:47 -04:00
Ryan Poplin c7b9a9ef0a Updating UnifiedGenotyper to use the new rod binding system. 2011-08-11 11:02:11 -04:00
Mark DePristo 418a4d541f Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 11:01:38 -04:00
Mark DePristo e71255d3c2 GATKDocsExample walker
-- Shows the best practice for documenting a walker with the GATKdocs
-- See http://www.broadinstitute.org/gsa/wiki/index.php/GATKdocs#Writing_GATKdocs_for_your_walkers for a brief discussion
2011-08-11 11:01:21 -04:00
Ryan Poplin 79c86e211f Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 09:59:20 -04:00
Ryan Poplin ea42ee4a95 Updating BQSR for the new rod binding system. 2011-08-11 09:58:42 -04:00
Mark DePristo 8cdc0cbd9c Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-11 08:58:49 -04:00
Mark DePristo 40e06f9afb Fixed broken RodBinding defaults.
-- Verified now to be correct at runtime
-- UnitTest covers this
-- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.
2011-08-11 08:58:30 -04:00
Ryan Poplin dd5fe8291d Fixing up some comments in the BQSR 2011-08-11 08:36:00 -04:00
Eric Banks f1b09db39e Fixes for rod bindings 2011-08-10 23:08:47 -04:00
Eric Banks 75985c2fa0 Resolving merge conflicts 2011-08-10 22:45:11 -04:00
Eric Banks bdb1da30fd Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system. 2011-08-10 22:43:08 -04:00
Mark DePristo 0086e27741 makeUnbound now package protected
-- Removed references to it in the codebase
-- Fixed documentation I saw that had the summary + body style
2011-08-10 22:29:32 -04:00
Mark DePristo cb6cf25bb0 Updating SelectVariants documentation to reflect best practice 2011-08-10 22:24:18 -04:00
Mark DePristo 00b4d6ec57 Updated the best practice on documenting a field
-- Best practice is now to skip the summary, as this is the @annotation doc value.
2011-08-10 22:21:12 -04:00
Mark DePristo 2007d2fcad Better documentation for default value fields
-- DocString function for types that create default outputs "stdout"
-- RodBinding now creates a makeUnbound default value automatically for you if your RodBinding isn't required
-- Removed warning about sparse help from TextFormattingUtils
2011-08-10 22:16:22 -04:00
Mauricio Carneiro bb557266ca Merge branches to get new RodBinding framework
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/replication_validation/ReplicationValidationWalker.java
2011-08-10 18:23:01 -04:00
Guillermo del Angel 8325cb8c26 Fixing up apparent source control/merge snafu: fix to correctly output PL ordering in multi-allelic sites by UG was only half-committed and hence not working. This completes fix 2011-08-10 15:31:49 -04:00
Eric Banks 07ad8c78a9 More tools moved over. Fixed the VariantContextIntegrationTest which was not useful because the md5s were all removed. In the future, instead of removing md5s (putting it in 'parameterization' mode), you should use @Test(enabled=false) since it's easier to track. 2011-08-10 14:24:40 -04:00
Eric Banks 8d14d32a62 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-10 13:42:37 -04:00
Eric Banks 749c8bfbcd Moving more tools over to the new rod system 2011-08-10 13:42:35 -04:00
David Roazen 0497170bc9 SnpEffCodec now implements SelfScopingFeatureCodec so that we no longer have to specify the codec name on the command line for SnpEff files. 2011-08-10 13:12:09 -04:00
David Roazen 577f861f69 Pass the rodBindings into the VariantAnnotator engine, and from there to the
annotation classes themselves.
2011-08-10 13:11:57 -04:00
David Roazen 480e7a7984 Correctly initialize the optional SnpEff rod binding in VariantAnnotator using
RodBinding.makeUnbound()
2011-08-10 12:25:26 -04:00
Eric Banks a42f90db11 Moving more tools over to use the standard VC arg collection. Also, while I'm in there, I removed all of the empty references to @Requires given that it's no longer relevant. 2011-08-10 12:20:18 -04:00
Eric Banks c884b6bf1f Fixed comment 2011-08-10 12:07:43 -04:00
Eric Banks 06cdc4d5f9 Added a StandardVariantContextInputArgumentCollection that is now used for consistency by many of the core tools. 2011-08-10 12:00:56 -04:00
Ryan Poplin bc125f104a TrainingSets class is obsolete now. 2011-08-10 10:23:33 -04:00
Ryan Poplin c60cf52f73 Updating VQSR for new RodBinding syntax. Cleaning up indel specific parts of VQSR. 2011-08-10 10:20:37 -04:00
Eric Banks 1ea5ec276b Minor cleanup 2011-08-09 23:28:59 -04:00
Eric Banks bc2d4f554d Bringing Indel Realigner up to speed with the new rod binding syntax; now use -known to specify the known indels track. 2011-08-09 23:21:17 -04:00
Eric Banks b8f572b571 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-09 23:19:51 -04:00
Eric Banks 08631546c8 Partial commit for David so he can see what I want to do with the VariantAnnotator. Added a DbsnpArgumentCollection that people can use in their walkers to ensure that we have a standard syntax whenever allowing dbsnp rods. Added it to UG, but didn't hook it up. Maybe we should do the same for the 'variant' rod? 2011-08-09 23:19:40 -04:00
Mark DePristo 86afe878a7 ReducedRead optimization: single pass likelihood calculation
-- Low level add() now takes a nObs argument and rather than += likelihood now does += nObs * likelihood
2011-08-09 20:55:15 -04:00
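A minimal sketch of the single-pass idea in 86afe878a7, assuming the accumulated values are additive (for example log-likelihoods). The class and method names here are hypothetical and are not taken from the GATK source; the point is only that adding nObs * likelihood once gives the same total as adding likelihood nObs times.

```java
public class LikelihoodAccumulator {
    private double sum = 0.0;

    /** Adds a single observation (the behaviour before the optimization). */
    public void add(double likelihood) {
        add(likelihood, 1);
    }

    /** Adds nObs identical observations in one step: sum += nObs * likelihood. */
    public void add(double likelihood, int nObs) {
        if (nObs < 0) {
            throw new IllegalArgumentException("nObs must be non-negative");
        }
        sum += nObs * likelihood;
    }

    public double sum() {
        return sum;
    }

    public static void main(String[] args) {
        LikelihoodAccumulator onePass = new LikelihoodAccumulator();
        onePass.add(-0.5, 4);

        LikelihoodAccumulator repeated = new LikelihoodAccumulator();
        for (int i = 0; i < 4; i++) {
            repeated.add(-0.5);
        }
        System.out.println(onePass.sum() + " vs " + repeated.sum());
    }
}
```

For values that are not exactly representable in floating point the two totals may differ in the last bits, which is usually acceptable for this kind of accumulation.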
Eric Banks 489e5cffc1 Missed a few 'variants' 2011-08-09 14:29:15 -04:00
Eric Banks b20c4d5286 Thanks to Mark for agreeing to transition from 'variants' back to 'variant'. I think I got them all but I've been jumping all around the code, so there might be a straggler or two. 2011-08-09 12:04:55 -04:00
Eric Banks 78aa6db076 added the 'reference' header line too. We are now header-compliant for vcf4.1. 2011-08-09 11:45:54 -04:00
Eric Banks ec76bf6d4a VCF headers now include 'contig' lines describing the name, length, and assembly (when easily parsable) for each contig in the reference. 2011-08-09 11:24:48 -04:00
Eric Banks 7afb5c9f1c More updates to be consistent with the new rod syntax. 2011-08-09 10:11:37 -04:00
Eric Banks 70b3daf689 VariantsToVCF is up and running again; integration tests are reenabled (and added one for dbSNP). 2011-08-09 03:03:43 -04:00
Mauricio Carneiro d15852be0a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-09 00:04:59 -04:00
Mauricio Carneiro 2db6225c53 A read filter that sets all mapping qualities to a given value
PacBio has decided to assign 255 to the MQ of all their reads since they claim their aligner does not produce a number equivalent to a mapping quality. Despite much back and forth, they are dead set on not using this field, so if we want to use their bams, we will need to override that. This filter does just that, replacing all values with a given one. Default is 60.
2011-08-09 00:04:42 -04:00
David Roazen 2efa376619 Made the necessary changes to get SnpEff support working with the new rodbinding system. 2011-08-08 23:29:39 -04:00
David Roazen b180a1311a Merge branch 'snpEff' 2011-08-08 22:12:14 -04:00
David Roazen a13bc7b929 Added an integration test for the SnpEff annotation support, as well as some extra safety checks and comments. 2011-08-08 20:01:24 -04:00
Mark DePristo 80924d24de Single positional arguments are now treated as names unless they actually match a tribble feature 2011-08-08 19:26:27 -04:00
Mark DePristo f8a56bc64b Merge branch 'master' into rodRefactor 2011-08-08 16:58:18 -04:00
Mark DePristo f8ad91b16f Reverting a bunch of bad -B type drops 2011-08-08 16:57:38 -04:00
David Roazen 5e288136e0 Added unit tests for the SnpEff codec, and made minor adjustments to the codec itself. 2011-08-08 16:51:43 -04:00
Eric Banks d7813db217 Combine Variants was actually outputting invalid VCFs in cases where it was combining Variant Contexts with different alternate alleles: if any of the genotypes had PLs they were no longer valid/correct. Added a check for such cases (the combined VC has more alleles than an original VC) and strip out the PLs when triggered; added integration test to cover it. I also added the check to Select Variants, although it currently doesn't remove unused alleles so it should never trigger. Is there any reason not to strip out unused alleles after a select? 2011-08-08 16:25:35 -04:00
Mark DePristo 383bb6f0e0 Merge branch 'master' into rodRefactor 2011-08-08 15:25:55 -04:00
Mark DePristo ba7353c561 Updated IntegrationTests to use the new type free format for VCF files 2011-08-08 15:04:38 -04:00
Mark DePristo 0810c42309 GATK now does dynamic type determination for VCF files
Added UnitTests covering all of the cases.
2011-08-08 14:45:46 -04:00
Mark DePristo e36994e36b Refactored a FeatureManager class from RMDTrackBuilder
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed.  This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.
2011-08-08 14:04:46 -04:00
Eric Banks 197169e47b Submitting patch from Larry Singh to make MathUtils compatible with java 1.7 2011-08-08 13:34:04 -04:00
David Roazen dd974040af When finding the highest-impact effect at a locus, all effects that are not within a
non-coding gene are now considered higher impact than all effects that are within a
non-coding gene.
2011-08-08 13:29:54 -04:00
David Roazen c1061e994c Initial support for adding genomic annotations through VariantAnnotator using
the output from the SnpEff tool, which replaces the old Genomic Annotator.
2011-08-08 13:29:53 -04:00
Mark DePristo 0db79207e8 Refactored away CommandLineGATK's dependency on javadocs
This allows us to run the GATK again in environments without Javadoc loading by default in the classpath
2011-08-08 12:27:13 -04:00
Mark DePristo e5fde0d16b Merge branch 'master' into rodRefactor 2011-08-08 10:08:43 -04:00
Mark DePristo 526b524c3c CombineVariants with new RodBinding. Bugfix
-- CombineVariants now uses the new RodBinding syntax, -V / --variants.  Passed all integration tests on first run
-- Exposed a gaping bug in the List<RodBinding<T>> system, now fixed.  ParserEngine now has an addRodBinding() that is called by RodBindingArgumentTypeDescriptor when it encounters each RodBinding.  This allows the system to work with collection types that are recursively parsed by the system.
2011-08-07 20:16:51 -04:00
Ryan Poplin 6693407bd8 Merged bug fix from Stable into Unstable 2011-08-07 17:39:03 -04:00
Mark DePristo 5f8bc3aa8a Documenting classes, and name cleanup 2011-08-07 15:17:50 -04:00
Mark DePristo 1c63d43176 Help now points to GATKDocs instead of spitting out full, garbled description 2011-08-07 15:02:46 -04:00
Mark DePristo b0e91f85cf fix merge from Khalid's Queue fix 2011-08-07 10:33:20 -04:00
Mark DePristo 4d88e72958 Merge remote-tracking branch 'remotes/khalid/rodRefactor' into rodRefactor
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
	public/java/test/org/broadinstitute/sting/BaseTest.java
2011-08-07 10:32:27 -04:00
Khalid Shakir f049461120 Changed @Argument to @Input on input RodBindings.
Changed shortname collision with longname.
Restored scala builds.
Updated HSP to use new syntax.
2011-08-06 20:44:19 -04:00
Mark DePristo d7f98e5c2a Fixed merge conflict deleting a { 2011-08-04 18:48:34 -04:00
Mark DePristo 75632abf88 Merge branch 'master' into rodRefactor
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToVCF.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/RealignerTargetCreatorIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
2011-08-04 18:44:14 -04:00
Mark DePristo f21f7f6335 SelectVariants fully documented, now the shining example of the new RodBinding system. 2011-08-04 18:28:59 -04:00
Mark DePristo 9be1ee59cc TODO comments for Eric 2011-08-04 18:07:50 -04:00
Mauricio Carneiro b22a3d6508 Functional VCF output.
It is outputting a VCF with the 'second best guess' for the alternate allele correctly. Annotations are added at the pool level, but may get overwritten at the lane and site level. Still need to implement the merging of the annotations at higher levels.
2011-08-04 17:49:08 -04:00
Guillermo del Angel a8eb8c27f0 a) Minor changes to indel consensus scripts to better reflect good default values, b) Fixed up Mills/Devine codec so it always produces correct ref padded bases, and added option to VariantsToVCF to fix reference base 2011-08-04 15:34:49 -04:00
Ryan Poplin 98a96f07c1 Updated standard deviation parameter in VQSR to our current recommended value 2011-08-04 14:06:26 -04:00
Eric Banks e48492f3c3 Validate that the reference padding base for indels is correct. 2011-08-04 12:48:56 -04:00
Mark DePristo f0d798d47c Bug fix: call RodBinding.resetNameCounter() in new ParsingEngine() so that we don't magically misnumber arguments in the integration tests where the GATK is only instantiated once. 2011-08-04 12:06:10 -04:00
Mark DePristo d0279bb28c RodBinding names are now defaulting to the ArgumentTypeDescriptor fullname
Nearly all of the tools are passing integrationtests
2011-08-03 20:48:11 -04:00
Mark DePristo 0ef85647f7 A working version of a GATKReportDiffableReader for the diffEngine! 2011-08-03 18:21:18 -04:00
Mark DePristo acbd3d0922 Fixing up integration tests so more 2011-08-03 17:26:35 -04:00
Mark DePristo 8f696c7731 Continuing progress towards RodBinding 1.0
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
2011-08-03 17:19:28 -04:00
Mark DePristo 800bb97f0b Removed getFeaturesAsGATKFeature and created createGenomeLoc(Feature) in genomeLocParser
Updated all walkers that used the now deleted methods.
2011-08-03 16:04:51 -04:00
Mark DePristo f6563c0f9f Removed support for RMD in @Requires and @Allows
Merge as well

Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:36:55 -04:00
Mark DePristo 79e4a8f6d3 Merge
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/qc/TestVariantContextWalker.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/phasing/PhaseByTransmission.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantrecalibration/VariantDataManager.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/SelectVariants.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantValidationAssessor.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/recalibration/RecalibrationWalkersPerformanceTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/varianteval/VariantEvalIntegrationTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-03 15:09:47 -04:00
Mark DePristo 38efd3066c Bug fix for mask RodBinding 2011-08-03 14:58:18 -04:00
Eric Banks f62f47d476 Not sure why this didn't fail before, but bringing VE up to date with previous changes 2011-08-03 14:27:07 -04:00
Mark DePristo b25140db83 Contracts and documentation for some of RefMetaDataTracker
Continuing to fix integration tests that don't pass / run
2011-08-03 13:34:20 -04:00
Eric Banks f6648e0144 Don't left-align complex indels because it's too complicated. 2011-08-03 12:03:50 -04:00
Mark DePristo 85c67e9891 Contracts and documentation for Rodbinding 2011-08-03 11:16:06 -04:00
Eric Banks 5dc324ff35 Dealing with merge conflict 2011-08-03 11:03:47 -04:00
Eric Banks 7c89fe01b3 Instead of having the padded reference base be some hackish attribute it is now an actual variable in the Variant Context class. More importantly, we now always require that it be present when padding is necessary - and validate as such upon construction of the VC. This cleans up the interface significantly because we no longer require that a reference base be passed in when writing a VC/VCF record. 2011-08-03 11:00:36 -04:00
Khalid Shakir 5dcac7b064 GATKReport v0.2:
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
2011-08-03 00:24:47 -04:00
Mark DePristo 2874835997 Bug fix for type checking RodBindings
Now compares the feature class not the codec class.
UnitTests improvements
integrationtests on their way to actually running
2011-08-02 22:25:41 -04:00
Mark DePristo b5e843f8f0 Approaching the end for the new RodBinding system
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the Rodbinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
2011-08-02 22:00:06 -04:00
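The automatic naming described in b5e843f8f0 (repeated -X bindings become X and X2) can be sketched roughly as follows. This is an illustration only: BindingNamer and its methods are hypothetical names, and the continuation to X3 for a third binding is an assumption that the commit message does not state.

```java
import java.util.HashMap;
import java.util.Map;

public class BindingNamer {
    // How many bindings have been seen so far for each argument name.
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    /** Returns the argument name for the first binding, then name2, name3, ... (assumed). */
    public String nextName(String argumentName) {
        Integer seen = counts.get(argumentName);
        int n = (seen == null) ? 1 : seen + 1;
        counts.put(argumentName, n);
        return n == 1 ? argumentName : argumentName + n;
    }

    /** Clears all counters so that numbering starts over. */
    public void reset() {
        counts.clear();
    }

    public static void main(String[] args) {
        BindingNamer namer = new BindingNamer();
        System.out.println(namer.nextName("X")); // first -X:vcf foo.vcf
        System.out.println(namer.nextName("X")); // second -X:vcf bar.vcf
    }
}
```

Keeping the counters in an object that can be reset avoids numbering carrying over between runs when several parsers are created in one JVM.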
David Roazen d3437e62da Added a simple utility method Utils.optimumHashSize() to calculate the optimum
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.

Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));

I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:

n + n/2 + n/4 + ... + 16 ~= (1 + 1/2 + 1/4 + ...) * n ~= 2 * n
2011-08-02 21:59:06 -04:00
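A sketch of the sizing rule from d3437e62da, assuming the default java.util.HashMap load factor of 0.75. The optimumHashSize() below is a hypothetical stand-in and may differ from the actual Utils.optimumHashSize(); entriesRehashed() is a simplified model, also hypothetical, of how many entries get moved when a table starts at 16 and doubles each time it passes 75% full.

```java
import java.util.HashMap;
import java.util.Map;

public class HashSizing {
    // Default load factor used by java.util.HashMap and HashSet.
    private static final double DEFAULT_LOAD_FACTOR = 0.75;

    /** Smallest initial capacity c such that c * 0.75 >= expectedMaxElements. */
    public static int optimumHashSize(int expectedMaxElements) {
        return (int) Math.ceil(expectedMaxElements / DEFAULT_LOAD_FACTOR);
    }

    /** Simplified model: entries moved by resizes while inserting n elements into a default table. */
    public static long entriesRehashed(int n) {
        long moved = 0;
        int capacity = 16;
        for (int size = 1; size <= n; size++) {
            if (size > capacity * DEFAULT_LOAD_FACTOR) {
                moved += size;   // every entry present is moved to the new table
                capacity *= 2;   // the table doubles
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int expectedMaxElements = 100;
        Map<String, Object> hash =
                new HashMap<String, Object>(optimumHashSize(expectedMaxElements));
        hash.put("example", Boolean.TRUE);
        System.out.println("initial capacity requested: " + optimumHashSize(expectedMaxElements));
        System.out.println("entries moved without presizing: " + entriesRehashed(expectedMaxElements));
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the value returned here is a lower bound on the table size actually allocated.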
Mark DePristo 83891271b5 --variants throughout integrationtests 2011-08-02 20:28:47 -04:00
Mark DePristo 3a27a25cfc Validates that the tribble binding provides the right object types at startup
Tests to ensure this remains working
2011-08-02 20:11:24 -04:00
Guillermo del Angel df37716857 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 18:27:13 -04:00
Mark DePristo e4a67f3df1 RefMetaDataTracker has complete set of get() functions for List<RodBinding<T>>
Including unit tests
2011-08-02 14:28:35 -04:00
Mark DePristo 03741fb640 Merge branch 'master' into rodRefactor
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/annotator/VariantAnnotatorEngine.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerIntegrationTest.java
	public/java/test/org/broadinstitute/sting/gatk/walkers/indels/IndelRealignerPerformanceTest.java
	public/java/test/org/broadinstitute/sting/utils/variantcontext/VariantContextIntegrationTest.java
2011-08-02 14:21:58 -04:00
Mark DePristo a366f9a18d Updating tools to use the RodBinding<T> syntax 2011-08-02 14:05:51 -04:00
Ryan Poplin c0653514b3 minor update to comment in UG 2011-08-02 13:34:48 -04:00
Ryan Poplin 2ba57bb502 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-02 13:30:46 -04:00
Ryan Poplin 38e4ae4176 minor update to comment in UG 2011-08-02 13:30:38 -04:00
Guillermo del Angel 821bbfa9e0 Bug fixes and enhancements to run whole-genome indel VQSR, removed old chr20-only code and cleanup 2011-08-02 13:17:20 -04:00
Eric Banks 65c5d55b72 Not sure how I missed these. These lines are now superfluous. 2011-08-02 12:48:36 -04:00
Eric Banks 2c5e526eb7 Don't use the mismatch fraction by default in the RealignerTargetCreator (since it's only useful when using SW in the indel realigner). Also, no more use of -D but instead move over to using VCFs. One integration test is temporarily commented out while I wait for a VCF file to get fixed. 2011-08-02 10:34:46 -04:00
Eric Banks 5626199bb6 The Unified Genotyper now does NOT emit SLOD/SB by default; to compute SB use --computeSLOD 2011-08-02 10:14:21 -04:00
Mark DePristo 184030dd56 RefMetaDataTracker no longer automagically converts inputs to VariantContexts
This was no longer working properly given that DBSNP indels needed to be moved around.  The adaptor system is being refactored and you will need to convert files from X -> VCF for many tools to work.
2011-08-01 15:21:16 -04:00
Mark DePristo 8b1adb8c95 Removed getVariantContext() code 2011-08-01 13:41:09 -04:00
Eric Banks 3a9b6eacdf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-01 11:23:18 -04:00
Mark DePristo 7b07c4e04e RefMetaDataTracker now has get() methods accepting RodBindings
RodBinding no longer duplicates the get() methods in RMDT.  This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
2011-07-30 15:34:11 -04:00
Mark DePristo a6691ab2fd List<RodBinding<T>> now working (sort of).
At least the argument parsing system tolerates it.
2011-07-29 16:11:22 -04:00
Mark DePristo 6acb4aad3b RodBinding<T> are properly generic now.
VariantContextRodBinding removed, as RodBinding<VariantContext> is the right style now.
2011-07-29 14:37:12 -04:00
Mark DePristo 3b799db61a RefMetaDataTracker cleanup and unit tests
You now have to provide an explicit list of RODRecordLists upfront to the constructor.  RefMetaDataTracker is now immutable.  Changes in engine to incorporate these differences
Extensive UnitTests for RefMetaDataTracker now.
2011-07-29 13:23:17 -04:00
Ryan Poplin b06deac9ea Merged bug fix from Stable into Unstable 2011-07-29 10:02:36 -04:00
Ryan Poplin c0d4110ffd Correcting redundant warning text. 2011-07-29 10:01:11 -04:00
Mauricio Carneiro a58ddab93b minQual and minPower filters added. VCF output added.
Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the gatk's PoolCaller, no longer Replication Validation framework. Lots of testing ensue....
2011-07-28 18:58:36 -04:00
Mark DePristo 39b4e76fde Continuing refactoring of RefMetaDataTracker.
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
2011-07-28 17:48:28 -04:00
Mark DePristo 7c5c656b46 Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team
Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval.  So you get lines where the stratification is known but there are 10 novel sites!
2011-07-28 14:19:27 -04:00
Eric Banks 33b32c4211 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-28 13:57:22 -04:00
Eric Banks 7a2a65155f Merged bug fix from Stable into Unstable 2011-07-28 13:56:43 -04:00
Eric Banks 1afc49a297 There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor. 2011-07-28 13:55:58 -04:00
Mark DePristo f7a126722b Cleaned up VariantContext accessors in RefMetaDataTracker
It's no longer possible to provide allowed types, as this was a very rarely used feature in the engine.  These get methods have been removed and local uses replaced with tests directly in their code.  This simplified the RefMetaDataTracker significantly
VariantContextRodBinding now forwards along all of the RefMetaDataTracker methods, so it is possible to create a full equivalent VariantContextRodBinding now as a walker field variable.
All walkers updated to the new RefMetaDataTracker function call style
2011-07-28 00:16:34 -04:00
Mark DePristo c83f9432eb Cleaned up RefMetaDataTracker
Renamed many functions to more clearly state what they are actually doing
Removed unnecessary / unused functionality, reducing interface complexity
Updated all uses of this code in GATK
Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz)
Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can for generic rods with the tracker directly with the RodBinding
2011-07-27 23:25:52 -04:00
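The generic accessor signature quoted in c83f9432eb, public <T> List<T> getValues(final String name, Class<T> clazz), can be illustrated with a small stand-alone sketch. TypedTrackerSketch is a hypothetical class, not RefMetaDataTracker itself, and silently skipping values of the wrong type is an assumption; the real tracker may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypedTrackerSketch {
    // Values stored per track name, with no static type information.
    private final Map<String, List<Object>> valuesByName = new HashMap<String, List<Object>>();

    public void put(String name, Object value) {
        List<Object> values = valuesByName.get(name);
        if (values == null) {
            values = new ArrayList<Object>();
            valuesByName.put(name, values);
        }
        values.add(value);
    }

    /** Returns the values bound to name that are instances of clazz, already cast to T. */
    public <T> List<T> getValues(final String name, Class<T> clazz) {
        List<T> result = new ArrayList<T>();
        List<Object> values = valuesByName.get(name);
        if (values == null) {
            return result; // unknown name: empty list rather than null
        }
        for (Object value : values) {
            if (clazz.isInstance(value)) {
                result.add(clazz.cast(value)); // checked cast, no unchecked warning
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TypedTrackerSketch tracker = new TypedTrackerSketch();
        tracker.put("eval", "a string record");
        tracker.put("eval", Integer.valueOf(3));
        List<String> strings = tracker.getValues("eval", String.class);
        System.out.println(strings);
    }
}
```

Passing the Class<T> token lets callers avoid casts at the call site, which is the main benefit of this accessor style.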
Mark DePristo f3ad4ec94b Removed annoying FastaSequenceIndexBuilderProgressListener infrastructure that was just a boolean switch on whether to print progress or not. 2011-07-27 22:06:23 -04:00
Eric Banks ff31fa7990 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-27 16:15:23 -04:00
Eric Banks 5809a61b20 Merged bug fix from Stable into Unstable 2011-07-27 16:14:59 -04:00
Eric Banks 64aad67b5f Fixing dbSNP adaptor for complex indels (wasn) 2011-07-27 16:13:45 -04:00
Mark DePristo 15be383d5b Merge branch 'master' into rodRefactor 2011-07-27 15:36:49 -04:00
Mark DePristo 38a2518668 Merge branch 'master' into rodRefactor 2011-07-27 15:34:54 -04:00
Mark DePristo 60db6cc836 Warnings for old ROD system use.
Removed unused class GATKRODFeature
2011-07-27 12:39:12 -04:00
Mark DePristo 097828a466 ParsingEngine now maintains the list of rodBindings
No longer try to reparse objects to find the right fields
Direct support in RodBinding for getTags()
2011-07-27 11:36:53 -04:00
Mauricio Carneiro 20a3b31b61 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 19:29:45 -04:00
Mauricio Carneiro 321afac4e8 Updates to the help layout.
*New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker.

 *Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.
2011-07-26 19:29:25 -04:00
Kiran V Garimella 405e521d44 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 17:56:48 -04:00
Kiran V Garimella 412c466de6 Bug fix, wherein triple-hets after genotype refinement need to be left unphased, not just prior to refinement 2011-07-26 17:43:43 -04:00
Matt Hanna fec495e292 Fix a nasty little bug in the sharding system: if the last shard in contig n
overlaps exactly on disk with the first shard in contig n+1, the shards
would be merged together to avoid duplicate extraction.  Unfortunately,
the interval overlap filter couldn't handle shards spanning contigs, and
was choosing to filter out reads from contig n+1 which should have been
included.
I'm not completely sure why the BAM indexing code would ever specify that the
end of one chromosome had the same on-disk location as the start of the next
one.  I suspect that this is an indexer performance bug.
2011-07-26 15:43:20 -04:00
Mark DePristo 9dfb57168a RodBinding source is no longer assumed to be a file 2011-07-26 13:59:44 -04:00
Mark DePristo d0badd5bd6 RodBinding subclassed to VariantContextRodBinding for easy access to VariantContext providing RODs 2011-07-26 13:54:55 -04:00
Mark DePristo 7ab8b53339 Support for List<RodBinding> argument type 2011-07-26 11:37:31 -04:00
Mark DePristo 38969b9783 Prototype of RODBinding @Arguments instead of -B syntax
Initial version of RodBinding class.
Flow from walker Rodbinding @Arguments -> RMDTriplet (old system) -> GATK engine (standard).  Will need refactoring.
2011-07-26 11:09:06 -04:00
Matt Hanna 088fc39308 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 15:54:56 -04:00
Eric Banks a53aeb75ab Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 15:10:35 -04:00
Eric Banks a29554e565 Removing the Genomic Annotator and its supporting classes 2011-07-25 15:10:25 -04:00
Mark DePristo 3afcb3415d Max of 1000 records will be loaded and compared to avoid heap size problem. 2011-07-25 14:58:31 -04:00
Mark DePristo f3049fba63 refdata directory cleanup
Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest
Refactored dbSNP and refseq utilities to be closer to the other files implementing these features
2011-07-25 13:21:52 -04:00
Matt Hanna 8014fad6ff Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 13:20:44 -04:00
Matt Hanna 2ac490dbdf Fix improper detection of command-line arguments with missing values. 2011-07-25 13:20:00 -04:00
Mark DePristo 90947ab359 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 12:53:56 -04:00
Mark DePristo acda8eb09c Commented out test that causes new CommandLineGATK() to fail 2011-07-25 12:43:27 -04:00
Mauricio Carneiro 95b48eface Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable into repval 2011-07-25 12:09:09 -04:00
Kiran V Garimella 357f503a21 Merge branch 'desktop' 2011-07-25 11:36:27 -04:00
Kiran V Garimella 0b43ee117c Added the required=false tag to the -noST and -noEV arguments so the auto-help output doesn't look weird (i.e. listing arguments as required when their value has already been specified by default). 2011-07-25 11:35:34 -04:00
Kiran V Garimella bbb8473f03 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 10:59:00 -04:00
Mark DePristo 1a268ff1fd Refactor so that GenotypeAnnotation and InfoFieldAnnotation share common superclass VariantAnnotatorAnnotation 2011-07-25 10:55:09 -04:00
Mark DePristo 7f8e6a97ee InfoFieldAnnotation now an abstract class extended by annotations so doc system works 2011-07-25 10:47:11 -04:00
Mauricio Carneiro 4c6c16f895 Documented following the new gatkdoc framework 2011-07-25 00:25:08 -04:00
Mark DePristo 2039ce6102 Default values now displayed in arguments
DiffEngine fixed so that newInstance() would work.  Pretty quickly encountered a situation where newInstance() failed.  Debug output now written when this occurs in the log.
Logger now used instead of standard out, with INFO the default level.
2011-07-24 22:56:55 -04:00
Mark DePristo c43b5981f2 Hidden variables are hidden by default. Settable by command line option
DiffObjectsWalker test arguments removed.
Minor refactoring of GATKDoclet
2011-07-24 20:52:44 -04:00
Mark DePristo 1c1f1da349 Fixing compilation 2011-07-24 20:01:59 -04:00
Mark DePristo 9f06f6c493 Split GATKDoclet from ResourceBundleDoclet. Refactored GATKDocWorkUnit 2011-07-24 20:00:04 -04:00
Mark DePristo ff85687679 Merge branch 'master' into help 2011-07-24 18:14:32 -04:00
Mark DePristo 83996f7951 Enumerated types are working. 2011-07-24 18:14:21 -04:00
Mark DePristo 3c34e9fa65 Cleanup enums and tables 2011-07-24 17:45:58 -04:00
Mark DePristo c620d96c96 Inline enum documentation is working 2011-07-24 17:22:14 -04:00
Mark DePristo 793e7d3d1d Improved header and argument details
Argument detail structure cleaned up. Only relevant pieces of information are shown now, and in a cleaner layout.
Misc. cleanup in the code.
2011-07-24 16:36:25 -04:00
Mark DePristo c6af4efcdc Implemented see also and version header 2011-07-24 16:10:17 -04:00
Mark DePristo 5e0fe2d0f9 Support for style.css via refactored common.html included in all files 2011-07-24 15:42:39 -04:00
Mark DePristo d0ab6bf7a9 Now links to sub and superclass documentation, where possible. 2011-07-24 09:56:17 -04:00
Mark DePristo e2dabb70b8 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-24 08:57:47 -04:00
Mauricio Carneiro 7ffedf211c Contig comparator -- sorting contigs like Picard
This is very useful if you want to output your text files or manipulate data in the usual chromosome ordering :
 1
 2
 3
 ...
 21
 22
 X
 Y
 GL???
 ...

 Just use this comparator in any SortedSet class constructor and your data will be sorted like in the BAM file.
2011-07-24 02:33:19 -04:00
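A rough sketch of the ordering described in 7ffedf211c, used with a SortedSet as the message suggests. This is not the committed comparator: ContigOrderSketch is a hypothetical class that derives the order from the contig names alone (numeric names first, then X, then Y, then everything else alphabetically), whereas the real implementation may rely on the sequence dictionary instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class ContigOrderSketch {
    // 0 = numeric contig, 1 = X, 2 = Y, 3 = anything else (e.g. GL contigs).
    private static int rank(String contig) {
        if (contig.matches("\\d{1,9}")) return 0;
        if (contig.equals("X")) return 1;
        if (contig.equals("Y")) return 2;
        return 3;
    }

    public static int compare(String a, String b) {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        if (ra == 0) {
            int byNumber = Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
            if (byNumber != 0) return byNumber;
        }
        // Tie-break on the raw name so distinct names never compare as equal.
        return a.compareTo(b);
    }

    public static List<String> sortContigs(Collection<String> contigs) {
        SortedSet<String> sorted = new TreeSet<String>(ContigOrderSketch::compare);
        sorted.addAll(contigs);
        return new ArrayList<String>(sorted);
    }

    public static void main(String[] args) {
        System.out.println(sortContigs(Arrays.asList("X", "10", "GL000191.1", "2", "Y", "1")));
    }
}
```

The tie-break on the raw name matters when the comparator backs a TreeSet, because elements that compare as equal are treated as duplicates and dropped.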
Mark DePristo 6b501e267b Includes non-concrete classes in docs
CommandLineGATK has extraDocs to ReadFilter and UserException as well
2011-07-23 22:15:01 -04:00
Mark DePristo 7420ed098e Semi-working version of extraDocs tag in annotation to refer to one capability being accessible in another
Required a significant refactoring of the GATKDoclet, which now has a unified place where the ClassDoc, class, annotation, and handler are all stored together.
2011-07-23 22:07:30 -04:00
Mark DePristo 999acacfa1 Merge branch 'master' into help 2011-07-23 20:19:33 -04:00
Mark DePristo 1d3bcce2c4 Merge branch 'master' into NoDistributedGATK 2011-07-23 20:04:50 -04:00
Mark DePristo e262f4e10b gatkdoc now generalized to use @Annotation. Multiple subsystems now use annotation to receive docs
Index expanded to use summary() annotation field
UserExceptions, ReadFilters, GATK engine all use the system to generate docs
Doclet expanded to handle lots of new cases
2011-07-23 20:00:35 -04:00
Kiran V Garimella 1dba8b768c Merge branch 'laptop' 2011-07-23 01:39:15 -04:00
Kiran V Garimella 57e3d136eb Don't try to phase triple-hets either. 2011-07-23 01:38:58 -04:00
Kiran V Garimella 5af9d50183 Merge branch 'laptop' 2011-07-23 01:12:06 -04:00
Kiran V Garimella 5521919cc9 Fixed bug where variants to phase were not being selected properly. 2011-07-23 01:11:28 -04:00
Kiran V Garimella 7da99388ac Merge branch 'laptop' 2011-07-23 01:01:11 -04:00
Kiran V Garimella 58eed20b83 Copy all entries from the attributes map, rather than attempting to modify an unmodifiable map. 2011-07-23 01:00:46 -04:00
Kiran V Garimella ffa361f57f Merge branch 'laptop' 2011-07-23 00:50:38 -04:00
Kiran V Garimella 9417ba8c2c Modified to accept multi-sample VCFs, removed the application of filters, and changed transmission probability field to be a genotype field rather than an INFO field. 2011-07-23 00:48:26 -04:00
Mark DePristo 28b9432d26 Docs for read filters, the engine, and the UserExceptions. 2011-07-22 16:09:21 -04:00
Kiran V Garimella 051c1dc639 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-22 15:59:00 -04:00
Mark DePristo f0be7348be Generalized handler to allow it to be used with any arbitrary class structure.
DocumentedGATKFeature now includes a field for the group name.
Build.xml works with public / private now.
2011-07-22 14:07:40 -04:00
Mark DePristo 453954182e Generalized the documentation system to use a class-specific annotation and processor.
Need to generalize and bug fix the system.  But at a high level it's working now.
2011-07-22 13:18:33 -04:00
Kiran V Garimella b8a0fd2a8d Multiply fractionRandom by 100.0 so that the line that indicates the percentage of variants that will be output says (for instance) 90%, not 0.9% 2011-07-22 11:54:59 -04:00
Mark DePristo 9e88d51db9 Removed now unused @version tags from walker docs. 2011-07-22 09:57:03 -04:00
Mark DePristo 421b70ca4f Removed previous, and largely unused, help system extensions.
This involved deleting the utils/help/*Taglet.java classes, which parsed out these fields unnecessarily
This also involved removing the few uses of these from the codebase.  For these uses, though, almost all were an identical copy of the first line of the docs, which is the default javadoc behavior anyway.
2011-07-22 09:42:44 -04:00
Mark DePristo 172b35372b Moved all of the distributed GATK code to archive. 2011-07-22 09:20:32 -04:00
Mauricio Carneiro 8d7ef1bb51 Complete refactor of the ReplicationValidation framework, plus the following new functionality:
* merges all pools in a lane.
* merges all lanes in a site.
2011-07-21 21:39:00 -04:00
Mark DePristo 81d0cab27e Walker index html now emitted. 2011-07-21 16:01:54 -04:00
Mark DePristo e892489696 V2 of the document system.
Now uses GATKDoc class to organize documentation for arguments.
Arguments now listed by feature (required, optional, hidden, etc) and link to detailed information about the argument in the html
Lots of code moving between Class and ClassDoc objects.  Should be refactored into a single static utility class.
2011-07-21 15:20:34 -04:00
Christopher Hartl 2f5d10d16b Fix bug wherein aligner could be closed prior to its being used to lowercase sequences. 2011-07-21 13:21:48 -04:00
Matt Hanna 7054c5342f When using the BWA bindings, you have to explicitly call close() to get the
bindings to release memory.
It may or may not be possible to implicitly close triggered by the GC; I'll add a JIRA.
2011-07-21 12:13:29 -04:00
Mark DePristo 6fa17d86ae Completely hacked together version of a FreeMarker + javadoc + custom doclet walker documentation generator 2011-07-21 00:18:07 -04:00
Mark DePristo 45c73ff0e5 Runs and emits an HTML document 2011-07-20 17:16:33 -04:00
Mark DePristo d31b176e15 Removed GATK use of distributed parallelism framework.
Moved distributed GATK prototype code into distributedutils, separating from threading package
2011-07-20 16:26:09 -04:00
Guillermo del Angel 0a1d2df8cb Merged bug fix from Stable into Unstable 2011-07-20 13:19:35 -04:00
Guillermo del Angel f15023b7d2 Bad bug fix: output GLs in multiallelic records were in incorrect order (misread spec) 2011-07-20 12:10:48 -04:00
Guillermo del Angel b9c9e0e952 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-20 10:45:16 -04:00
Guillermo del Angel 7140280bf6 Further bug fixes/cleanups for PrintReadsWalker 2011-07-20 10:44:37 -04:00
Guillermo del Angel a2d90a3590 Bug fix: reverted logic so that default behavior skips over sample lookup 2011-07-20 10:23:10 -04:00
Guillermo del Angel e8409c80fa Further protection vs null pointers in PrintReadsWalker 2011-07-19 21:59:24 -04:00
Christopher Hartl 5d706c9e92 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
Removing PSP and CSM

Conflicts:

	public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/CreateSequenomMask.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/PickSequenomProbes.java
2011-07-19 20:25:33 -04:00
Guillermo del Angel fb2d475c22 Bug fix to prevent null pointer 2011-07-19 20:13:56 -04:00
Christopher Hartl 92c7cfa1c8 BWA bindings and tests moved to public (was required for ValidationAmplicons)
Integration tests for ValidationAmplicons. New argument to disable BWA, lowercase letters only for repetitiveness instead.
2011-07-19 20:11:31 -04:00
David Roazen baae381acb Revert "Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable"
This reverts commit 039a6bb01f345322ce2be50ae3634308bb24e77e, reversing
changes made to b9c9973d1c638dfc9f8c19b5eb845e99844f9d29.
2011-07-19 18:38:53 -04:00
Christopher Hartl 07e716d23a PickSequenomProbes2 expanded functionality: lowercasing based on sequence uniqueness, preserving reference base prior to indel (not a part of the VC as I thought it was), masking deletion bases with 'N's, flanking insertion with 'N's, output is a fasta formatted file. Renamed to ValidationAmplicons since this is really not for picking sequenom probes, but for generating amplicon sequence from which other applications (like sequenom) can choose PCR primers. Moved from private to public. 2011-07-19 15:21:47 -04:00
Guillermo del Angel e6d306458c Merge bug fixes 2011-07-19 14:36:20 -04:00
Guillermo del Angel 989dd17f95 a) Add ability in PrintReads to specify a sample file to easily subset samples, useful for IGV visualization, b) VariantsToTable is more R-friendly with Indels when printing ref/alt columns, c) Changes to SelectVariants ability to specify a mask to randomly sample from a given AF distribution 2011-07-19 14:29:07 -04:00
Mark DePristo c05451047c Support for multiple records at the same site. The first record gets chr:start, and subsequent records get chr:start_2, chr:start_3, etc. 2011-07-18 15:43:52 -04:00
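The naming scheme in the commit above (chr:start, then chr:start_2, chr:start_3, and so on) can be sketched as follows. The class `SiteKeySketch` and its method `nextKey` are hypothetical names introduced for illustration; the commit's actual implementation is not shown in this log.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-site key disambiguation; not the code from the commit. */
class SiteKeySketch {
    // How many records have been seen so far at each chr:start site.
    private final Map<String, Integer> seen = new HashMap<>();

    /** Returns chr:start for the first record at a site, then chr:start_2, chr:start_3, ... */
    String nextKey(String chr, int start) {
        String base = chr + ":" + start;
        int n = seen.merge(base, 1, Integer::sum); // increments and returns the running count
        return n == 1 ? base : base + "_" + n;
    }

    public static void main(String[] args) {
        SiteKeySketch keys = new SiteKeySketch();
        System.out.println(keys.nextKey("1", 100));
        System.out.println(keys.nextKey("1", 100));
        System.out.println(keys.nextKey("1", 100));
    }
}
```

Using `_` rather than `.` as the separator matches the later commit 5ffeddd3b1, which notes that `.` is treated as a special case elsewhere.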
Mark DePristo 782a05e9b5 Support for sorting the diff output in reverse order. 2011-07-18 15:43:01 -04:00
Mark DePristo 45702d3084 Now supports a mode where the primary key isn't sorted. In this case the records are displayed in the order in which they are added to the table. 2011-07-18 15:40:15 -04:00
Eric Banks 83ba2c066a Making it deterministic 2011-07-18 13:59:02 -04:00
Eric Banks 92fa410450 Check that it's a valid bam file before parsing or bad things can happen 2011-07-18 13:43:34 -04:00
Eric Banks 80b5c5261a CombineVariants no longer combines records of different types. So now when combining SNP and indel callsets, overlapping calls get their own records. Useful for Khalid in the pipeline. For those interested, it turns out the previous behavior was doing the wrong thing occasionally (and this was even captured in the integration tests). 2011-07-18 13:42:45 -04:00
Eric Banks bc8b5da698 Added docs while I was reading through the code to understand it 2011-07-18 12:25:54 -04:00
Mark DePristo 51b0dd01c3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 10:47:29 -04:00
Mark DePristo 6f26c07b85 Removed the SpecificDifference class. Now Difference classes always have the option to remember specific master and test values. This means that all summarized differences carry with them specific examples of their differences. Consequently, now even summarized differences give at least one example of the specific difference, even when the count of the difference is > 1. Unit tests updated. Added DiffObjects integrationtest. VCFDiffableReader now specifically reads the first line of the VCF file to capture the version number. 2011-07-18 10:42:35 -04:00
Kiran V Garimella b2b7d27fed Merge branch 'laptop' 2011-07-18 00:25:46 -04:00
Kiran V Garimella 497721a799 Added class documentation string. 2011-07-18 00:25:21 -04:00
Kiran V Garimella ac9c66138d Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 00:20:33 -04:00
Kiran V Garimella 8167aba601 Moved (poorly named) MergeAndMatchHaplotypes to public. Added integration test 2011-07-17 22:47:32 -04:00
Mark DePristo 9992c373be Optimize imports run on the whole project, public and private. I just got too tired of all of the unused imports floating around. Confirmed that the system builds after the changes. 2011-07-17 20:29:58 -04:00
Kiran V Garimella 4ea433f8e1 Moved PhaseByTransmission to public 2011-07-17 19:42:00 -04:00
Mark DePristo 4db2b13e9e Rev tribble.
Just added more documentation for diffEngine and pointer to new wiki:

http://www.broadinstitute.org/gsa/wiki/index.php/DiffEngine
2011-07-17 13:05:04 -04:00
Mark DePristo 92a1c0c278 Moved the varianteval/tags/DataPoint.java and varianteval/tags/Analysis.java to varianteval/utils. This allows rsync to see these files with the -C option, as tags is some kind of reserved CVS keyword. 2011-07-17 10:14:23 -04:00
Menachem Fromer 72f4cf9c0e Walker to perform deterministic annotation of phasing by transmission (to be compatible with RBP's definition of consecutive pairwise phasing) 2011-07-15 17:44:31 -04:00
Guillermo del Angel 9d59c2cb61 a) Made indel VQSR consensus script operational again, b) Made VariantsToTable more indel-friendly when printing out REF and ALT fields: strip out * from REF and print out alleles in the same way as the VCF so that offline processing is easier 2011-07-15 10:13:02 -04:00
Guillermo del Angel 10cf9245d7 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-14 19:18:05 -04:00
Mark DePristo 5ffeddd3b1 better to use _ instead of ., as this is a special case later. 2011-07-14 14:45:16 -04:00
Eric Banks ed6beae1f3 Adding headers to diffable reading for VCFs 2011-07-14 13:55:35 -04:00
Eric Banks 66c652d687 Added some extra error checks in the VCF codec. Now that we've moved this back into the GATK, changed some of the standard exceptions to be UserErrors (instead of TribbleExceptions). 2011-07-14 11:56:10 -04:00
Eric Banks 0c54c796ed Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-13 14:57:33 -04:00
Eric Banks bb0e3a26fc Added integration test for VCF writing. Also, bug fix for writing the GT-free records. 2011-07-13 14:57:21 -04:00
Eric Banks 6a431da554 Don't output source and ref header lines anymore. Short-term motivation for this is that I'd like this tool when run on a VCF to emit the exact same VCF. Long-term motivation is that these tags should be output by the VCF writer itself for all tools. 2011-07-13 14:40:01 -04:00
Menachem Fromer 74aa49e423 Merged bug fix from Stable into Unstable 2011-07-13 12:12:42 -04:00
Menachem Fromer fa3ff53508 Filters should only be applied to the new VC if the old VC had filters applied 2011-07-13 11:58:16 -04:00
Eric Banks 969227c657 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-13 10:01:28 -04:00
Eric Banks 6007eea3ff Allowing VCF records without GTs in VCF4.1 2011-07-13 09:56:08 -04:00
Guillermo del Angel 1e81d521c0 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 20:12:29 -04:00
Ryan Poplin 837fb8f689 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 15:39:26 -04:00
Ryan Poplin 5077c94d85 Adding MappingQualityUnavailableReadFilter to the SNP and indel CountCovariates 2011-07-12 15:39:07 -04:00
Mark DePristo 01fd6a6949 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 15:20:44 -04:00
Mark DePristo ccedd6ff4c Difference is now the general form -- used to be SummarizedDifference. The old Difference class is now a subclass of Difference that includes pointers to the specific master and test DiffElements.
Added a size() function that calculates the number of elements in the tree rooted at a DiffElement.
2011-07-12 15:20:28 -04:00
Eric Banks a2597e7f00 This commit incorporates several different changes that each pretty much break all the VCF-based integration tests, so I bunched them all together. We now officially emit VCF4.1 files (woo hoo), which means that the VCF headers are now all different (header version is 4.1 plus counts for some of the annotations are 'A' or 'G'). Also, I've added a Read Filter for reads with MQ=255 ('unavailable' in the SAM spec) and have applied this to the UG and the RMS MQ annotation. 2011-07-12 14:11:53 -04:00
Ryan Poplin 329c3d8050 Merged bug fix from Stable into Unstable 2011-07-12 13:55:51 -04:00
Ryan Poplin 73735863b0 Fix for the case of requesting genotype for a sample that doesn't exist in a VariantContext 2011-07-12 13:55:21 -04:00
Guillermo del Angel c4c145afb9 Merged bug fix from Stable into Unstable 2011-07-12 13:44:48 -04:00
Guillermo del Angel cfe43e3971 Bug fix for Genotype given alleles: if we are in INDEL mode ignore SNPs and MNPs instead of emitting an empty site with alleles but no annotations 2011-07-12 13:43:46 -04:00
Guillermo del Angel bfbca8b194 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 12:11:58 -04:00
Mark DePristo 05212aea62 reader now takes an argument for the maximum number of elements to read from the file. 2011-07-12 08:53:19 -04:00
Mark DePristo 8056a3fe89 getElement() now uses O(1) get from hash instead of linear O(n) search. Enables us to read large files easily. 2011-07-12 08:52:31 -04:00
Eric Banks d7d15019dd Adding support for other simple header line types (e.g. ALT) and cleaning up the interface a bit. 2011-07-12 01:16:21 -04:00
Eric Banks 400b0d4422 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-11 23:38:57 -04:00
Mark DePristo d5056ad899 Merge branch 'master' into diffit 2011-07-11 23:16:15 -04:00
Mark DePristo 893cc2e103 Making the package public, so there are no dependencies from public -> private 2011-07-11 23:15:08 -04:00
Eric Banks e3748675db Support for VCF 4.1 header counts 2011-07-11 17:40:45 -04:00
Guillermo del Angel f54c2ae3b4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-11 16:26:27 -04:00
Christopher Hartl d6517adb42 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-11 16:16:37 -04:00
Christopher Hartl 86890c6357 N and K (in binomial probability) got switched in RFA Walker with the last commit. No longer will NaNs be produced.
Added: TableToVCF. Kind of a longer-term project, but there are lots of variant calls available in a weird tabular format. I used this to convert Ju et al. small indels to VCF. I'll check against the 1000G ASN superpopulation calls to see if we see a good amount of recapitulation, and if so, I'll put them in unvalidated comparisons. Minor changes to the TableCodec and TableFeatures to allow for this (the codec can sometimes drop a column, and the feature now allows you to grab on to its header).
2011-07-11 16:16:15 -04:00
Guillermo del Angel d587856f2d Private feature to input a list of family descriptions from a file and to look for MV's on all of these. Feature can also output a detailed description of the violation into a separate file 2011-07-11 14:17:59 -04:00
Guillermo del Angel 6e7b5e1e7a Merged bug fix from Stable into Unstable
Merge branch 'master' into unstable
2011-07-08 21:19:45 -04:00
Guillermo del Angel 7fbc5987d0 Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-07-08 21:17:32 -04:00
Mark DePristo bd29236684 Merge branch 'master' into diffengine 2011-07-08 14:08:17 -04:00
Guillermo del Angel 224574424e Bug fix: if we're genotyping a very long indel (>100 bp) fail gracefully instead of with an array out of bounds exception 2011-07-08 12:48:49 -04:00
Ryan Poplin 2a4b3ae4a2 Cleaning up / removing most of the monkeying around with annotation values that happens in VariantDataManager 2011-07-08 12:48:33 -04:00
Mark DePristo 8add2a3866 Merge branch 'master' into diffengine 2011-07-08 09:15:54 -04:00
Eric Banks cc143493e3 Merged bug fix from Stable into Unstable 2011-07-07 23:01:24 -04:00
Eric Banks 4cfe0dd857 Test for bad alleles so that we don't generate IndexOutOfBoundsExceptions 2011-07-07 23:01:03 -04:00
Mark DePristo 3d4f0e9dd7 Now supports the case where you have multiple AC values in the info field. 2011-07-07 17:21:15 -04:00
Ryan Poplin 212e9a1a0c Fixing unstable build after stable commit 2011-07-07 15:18:57 -04:00
Ryan Poplin 11d9a0473a Merged bug fix from Stable into Unstable 2011-07-07 15:03:58 -04:00
Ryan Poplin 50111db2b7 Fixing non-determinism in single-threaded VQSR by moving references to cern.Normal over to the static random generator available in GenomeAnalysisEngine 2011-07-07 15:02:48 -04:00
Guillermo del Angel 4d565b0811 Merge branch 'incoming' 2011-07-07 06:21:05 -04:00
Guillermo del Angel 55c8c05060 Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-07 06:18:29 -04:00
Guillermo del Angel 5ab2e83904 a) Cosmetic modifications to IndelType annotation. b) Add ability to select samples from a file in PrintReads, c) fixes to shaped AF random selection in SelectVariants 2011-07-07 06:15:10 -04:00
Eric Banks 52f6f9fdcc Merged bug fix from Stable into Unstable 2011-07-06 16:05:48 -04:00
Eric Banks 54121eb082 Catch malformed bams that cause the writer to run in infinite loops 2011-07-06 16:05:08 -04:00
Eric Banks 76a01a7453 Merged bug fix from Stable into Unstable 2011-07-06 12:53:09 -04:00
Eric Banks 14fee4ccbd Patch from Bob to deal with symbolic alleles: these weren't getting padded but they should be. 2011-07-06 12:51:44 -04:00
Ryan Poplin bdef233d4d Merged bug fix from Stable into Unstable 2011-07-06 10:05:02 -04:00
Ryan Poplin e8ed6b7f0f Adding more comments to main VQSR walker. Fixing copyright lines. Bug fix for default paths to now point to public/R/ instead of R/. Bug fix in VQSR for the path to the R scripts not ending in a slash. 2011-07-06 10:01:14 -04:00
Guillermo del Angel 8e8b901d12 Merged bug fix from Stable into Unstable
Merge branch 'master' into unstable
2011-07-06 09:57:55 -04:00
Guillermo del Angel 81a4d18468 Mark several indel-related arguments as @Hidden 2011-07-06 09:56:38 -04:00
Guillermo del Angel 9124c84a7c bug fixes 2011-07-04 21:10:44 -04:00
Guillermo del Angel bb85f232b9 bug fixes 2011-07-04 21:04:49 -04:00
Guillermo del Angel f26ffeaea0 bug fixes 2011-07-04 20:48:45 -04:00
Guillermo del Angel 04df153f47 bug fixes 2011-07-04 20:45:10 -04:00
Guillermo del Angel 7a04872a3f bug fixes 2011-07-04 20:33:59 -04:00
Guillermo del Angel 08bc843d4c SelectVariants can get a table to boost AF when choosing randomly 2011-07-04 20:23:22 -04:00
Guillermo del Angel fac082de64 Report only highest AF and AC in multiallelic records in VariantsToTable or else R can't parse table 2011-07-03 14:32:12 -04:00
Guillermo del Angel abe9480c6d Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-02 21:19:15 -04:00
Ryan Poplin fb315b5f8c Merge branch 'incoming' 2011-07-02 18:10:48 -04:00
Ryan Poplin 41d46059e7 fixing bad format statement 2011-07-02 18:09:17 -04:00
Ryan Poplin 3804afeb8a Merge branch 'incoming' 2011-07-02 17:55:39 -04:00
Ryan Poplin 781c0c33a4 Use the worst X% of calls in addition to the bad training sites list. Don't include the already added calls in the calculation of X% 2011-07-02 17:55:10 -04:00
Ryan Poplin 6b8af6afd8 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-02 17:15:56 -04:00
Ryan Poplin fdc2ebb321 Adding ability to specify in VQSR a list of bad sites to use when training the negative model. Just add bad=true to the list of rod tags for your bad sites track. 2011-07-02 17:15:13 -04:00
Guillermo del Angel 09af6bbc6c Ugh - backed out experimental code not for public consumption that was unintentionally committed 2011-07-02 16:58:57 -04:00
Guillermo del Angel c6c0dba040 Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-02 16:45:34 -04:00
Ryan Poplin 4532a84314 Merged bug fix from Stable into Unstable 2011-07-02 10:48:55 -04:00
Ryan Poplin 5faf40b79d Moving AnalyzeAnnotations into the archive because it has outlived its usefulness. 2011-07-02 10:39:53 -04:00
Ryan Poplin 17ff5bb094 Variant records coming out of the VQSR are now annotated with which input annotation was most divergent from the Gaussian mixture model. This gives a general sense for why each variant was removed from the callset. 2011-07-02 09:55:35 -04:00
Khalid Shakir c65e52f88a Merged bug fix from Stable into Unstable 2011-07-01 20:50:56 -04:00
Khalid Shakir b6bc64a0c8 Cleanup of the utils.broad package.
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
Eric Banks 0c9105ca22 Minor fix of description 2011-07-01 18:07:35 -04:00
David Roazen d647ea4fdc Long-delayed change to CachingIndexedFastaSequenceFile. Made the cache
non-static to avoid problems when multiple references are used within the same
thread (eg., during integration tests). This should kill the intermittent
IndelRealignerIntegrationTest failures.
2011-07-01 16:04:30 -04:00
Eric Banks 761347b8d5 The VariantContext utility method used by SelectVariants wasn't checking the filter status (unfiltered vs. passing filters) and always returned a VC that was passing filters. This is fixed and the md5 from the VCF Streaming test has been re-updated. 2011-06-30 15:26:09 -04:00
Mark A. DePristo defa3cfe85 Moved around private walkers into appropriate directories in private gatk.walkers. Moved a few public walkers into private qc package, and some private qc walkers into the public directory. Removed several obviously broken and/or unused walkers. 2011-06-30 14:59:58 -04:00
Eric Banks 804d5f22d5 Reverting previous change, as promised. 2011-06-30 13:18:30 -04:00
Eric Banks 9e234cf5d6 This is a temporary commit for Picard. It will absolutely break integration tests, but I'm going to revert it in 1 minute. Because we don't want them in unstable, I need to push this into stable. 2011-06-30 13:17:14 -04:00
Guillermo del Angel 331b47afbd Merge branch 'master' of ssh://delangel@nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-06-30 08:29:11 -04:00
Guillermo del Angel 50c32ce52e VariantsToTableFix 2011-06-29 21:39:53 -04:00
Guillermo del Angel 9b134f3b96 VariantsToTableFix 2011-06-29 21:33:41 -04:00
Guillermo del Angel 2b88033ef4 Enable considering 454 reads, just lower GOP by 15 2011-06-29 16:12:55 -04:00
Guillermo del Angel dc4f63a1a8 a) consensus goes to week queue
b) New experimental TechnologyComposition annotation
c) SelectVariants fixes
2011-06-29 16:00:23 -04:00
Eric Banks 70ba851478 Might as well check for the illegal state and throw an exception 2011-06-29 15:59:10 -04:00
Eric Banks 1f19afe1d9 Fixed bug in the IndelRealigner: now that variants are correctly typed in VariantContext, it is possible that a variant can be an indel but neither an insertion nor a deletion; added an isComplexIndel() method and now we check for such an event in the realigner (we don't use them to generate alternate consensuses). Also, added an isMNP() method while I was there so that it would be consistent with other variant types. 2011-06-29 15:54:09 -04:00
Guillermo del Angel e91ae6b265 AF matching when selecting random variants 2011-06-29 15:00:26 -04:00
Guillermo del Angel dee10140dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 13:58:04 -04:00
Eric Banks 8586c86bc4 My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals. 2011-06-29 13:56:37 -04:00
Guillermo del Angel 5b6d279a2e Two bug fixes:
a) Modified the way clipped bases are dealt with in ReadPosRankSumTest when annotating indels. Cigar string cannot be trusted because BWA can clip good high quality bases and some sites get incorrect ReadPos annotations if BWA systematically clips at an indel breakpoint.
b) PL header needs to specify "." as length. Otherwise we fail VCF validation if multiallelic sites are present.
2011-06-29 10:21:27 -04:00
David Roazen 139c6b84a1 Modified build.xml and the help extractor doclet to use the output of "git
describe" as an absolute version number (if the repository has at least one
tag), using the raw SHA-1 hash value as a fallback version number in the case
where there are no tags.
2011-06-28 08:37:05 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00