Commit Graph

625 Commits (cd2c511c4ae8a7d13ca6fe3604308ca5fdea5c00)

Author SHA1 Message Date
Mauricio Carneiro 66a8b36cf5 Fixed most indexing bugs
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Mark DePristo 1e5001b447 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-29 17:04:21 -04:00
Mark DePristo 3af001fff2 Bugfix for file that must not exist on disk 2011-08-29 17:00:10 -04:00
Mark DePristo 3b09d42ed6 Now only prints 1 warning message about duplicate headers in simpleMerge 2011-08-29 14:41:29 -04:00
Eric Banks c2f0db969b Don't use the default deletion value from UG if not asking to have it set 2011-08-29 13:48:10 -04:00
Eric Banks bb7a37e8f2 We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth. 2011-08-29 13:19:35 -04:00
Ryan Poplin bc252a0d62 misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR 2011-08-29 08:11:31 -04:00
Mark DePristo a5c65fc133 Debugging information to print out the Query tracks 2011-08-28 18:54:49 -04:00
Mark DePristo 7bf006278d Moved ResolveHostname to general utils as a static function 2011-08-28 12:04:16 -04:00
Mark DePristo ccec0b4d73 AnalyzeCovariates uses the general RScript system now
-- Convenience constructor for collection for testing
-- callRScript() now accepts Objects not Strings, for convenience
2011-08-27 12:54:13 -04:00
Mark DePristo 1ceb020fae UnitTests for RScript 2011-08-27 10:50:05 -04:00
Mark DePristo e37a638e09 Fix for disallowed characters in GATKReportTable
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
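The automatic replacement of illegal characters can be sketched as a single regex substitution. This is a minimal illustration, not the GATKReportTable code. The class name `ReportNameSanitizer` is hypothetical, and the rule that anything other than letters, digits, `_`, `-` and `.` counts as illegal is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: replaces characters assumed illegal in report names with '_'. */
final class ReportNameSanitizer {
    // Assumption: only letters, digits, '_', '.' and '-' are legal in a table or column name.
    private static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9_.\\-]");

    private ReportNameSanitizer() {}

    /** Returns true if the name contains no illegal characters. */
    static boolean isValid(String name) {
        return !ILLEGAL.matcher(name).find();
    }

    /** Replaces every illegal character with '_' and leaves legal characters untouched. */
    static String sanitize(String name) {
        return ILLEGAL.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("count variants (all)")); // count_variants__all_
    }
}
```

Replacing rather than rejecting keeps the report writable; the trade-off is that two distinct names can collapse to the same sanitized name.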
Mark DePristo c0503283df Spelling fix requires md5 updates 2011-08-26 07:40:44 -04:00
Mark DePristo eef1ac415a Merge branch 'master' into rodTesting
Conflicts:
	public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Eric Banks 9b7512fd94 Just because there's a ref base doesn't mean the VC needs to be padded 2011-08-25 22:42:14 -04:00
Mark DePristo e01273ca7c Queue now writes out queueJobReport.pdf
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName.  This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Eric Banks 09a729da3a Removing incorrect comment 2011-08-25 15:42:52 -04:00
Eric Banks 8bbef79fc2 Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch. 2011-08-25 15:37:26 -04:00
Ryan Poplin 29c7b10f7b Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-24 15:18:58 -04:00
Ryan Poplin e5008aba00 Output the top two haplotypes as variant calls by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end, taking in reads as a BAM file and writing out variant calls in VCF. 2011-08-24 15:18:44 -04:00
Guillermo del Angel e618cb1e79 a) Renamed/expanded the SelectVariants arguments that choose particular kinds of variants and particular allelic types. Instead of -Indels or -SNPs, we can now specify, for example, -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic or multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding GATKDocs changes.
b) More useful AC and AF logging in VariantsToTable at multiallelic sites: instead of logging comma-separated values, the max value is logged by default. A hidden, experimental argument -logACSum logs the sum of ACs instead. This is because R is extremely slow at parsing strings into tokens and computing the max/sum itself (~100x slower than the GATK).
c) Added integration tests for the new SelectVariants commands
2011-08-24 12:25:50 -04:00
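The max-by-default / sum-on-request behavior described in (b) can be sketched as follows. This is an illustration under assumptions, not the VariantsToTable implementation: the class `AlleleCountSummary` and its method are hypothetical, and only the `-logACSum` name comes from the commit message.

```java
import java.util.Arrays;

/** Hypothetical helper: collapses a multiallelic INFO value such as AC=3,5 into one number. */
final class AlleleCountSummary {
    private AlleleCountSummary() {}

    /**
     * Parses a comma-separated AC or AF value and returns its maximum (the default),
     * or its sum when logSum is true, mirroring what -logACSum is described as doing.
     */
    static double summarize(String commaSeparated, boolean logSum) {
        double[] values = Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
        if (values.length == 0) {
            throw new IllegalArgumentException("No values in: " + commaSeparated);
        }
        return logSum ? Arrays.stream(values).sum() : Arrays.stream(values).max().getAsDouble();
    }

    public static void main(String[] args) {
        System.out.println(summarize("3,5", false)); // 5.0
        System.out.println(summarize("3,5", true));  // 8.0
    }
}
```

Doing this reduction once in Java means R only ever reads a single numeric column, which is the speed argument the commit makes.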
Mark DePristo 28ee6dac41 Fixed spelling mistake 2011-08-24 10:14:45 -04:00
Ryan Poplin f37875600a Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-24 09:02:44 -04:00
Khalid Shakir 1ecbf05aae Avoid segfaults due to an out-of-date and possibly abandoned LSF DRMAA implementation when use'ing LSF instead of .combined_LSF_SGE 2011-08-23 23:49:36 -04:00
Mark DePristo 569e1a1089 Walker.isDone() aborts execution early
-- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.
2011-08-23 16:53:06 -04:00
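The early-abort contract can be sketched with a stripped-down traversal loop that consults isDone() before each map call. Everything here except the isDone() name is hypothetical: `SimpleWalker`, `FirstNRecordsWalker` and `TraversalSketch` are illustrative stand-ins, not the GATK Walker or traversal engine classes.

```java
import java.util.List;

/** Hypothetical, stripped-down walker contract: map each record, optionally stop early. */
interface SimpleWalker<T> {
    void map(T record);

    /** Consulted before each map call; returning true ends the traversal. */
    default boolean isDone() { return false; }
}

/** Example walker that stops after a fixed number of map calls instead of calling System.exit(). */
final class FirstNRecordsWalker implements SimpleWalker<String> {
    private final int maxRecords; // plays the role of a MAX_RECORDS-style parameter
    private int seen = 0;

    FirstNRecordsWalker(int maxRecords) { this.maxRecords = maxRecords; }

    @Override public void map(String record) { seen++; }

    @Override public boolean isDone() { return seen >= maxRecords; }

    int seen() { return seen; }
}

final class TraversalSketch {
    private TraversalSketch() {}

    /** Feeds records to the walker, aborting as soon as isDone() reports true. */
    static <T> void traverse(List<T> records, SimpleWalker<T> walker) {
        for (T record : records) {
            if (walker.isDone()) break;
            walker.map(record);
        }
    }

    public static void main(String[] args) {
        FirstNRecordsWalker walker = new FirstNRecordsWalker(2);
        traverse(List.of("r1", "r2", "r3", "r4"), walker);
        System.out.println(walker.seen()); // 2
    }
}
```

Unlike System.exit(), returning from the loop lets the engine still run whatever cleanup and reduce steps follow the traversal.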
Ryan Poplin a1a1fac9e4 Likelihood engine now gives non-zero likelihoods. Uses an HMM function that can handle context-specific gap open and gap continuation penalties 2011-08-23 13:43:07 -04:00
Guillermo del Angel 6e2552a9ef Merge fix 2011-08-23 12:40:43 -04:00
Guillermo del Angel 8b7a0b3b62 Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf 2011-08-23 12:40:01 -04:00
Roger Zurawicki ac36271457 Fixed extra reads showing up in Variable Sites
Reads that were not hard clipped for the variable site no longer show up in output file
Walker now uses unclippedStart of Read to determine position in the sliding Window
2011-08-23 11:26:00 -04:00
Mark DePristo 6d6feb5540 Better error message when you cannot determine a ROD type because the file doesn't exist or cannot be read 2011-08-23 10:56:37 -04:00
Mauricio Carneiro feeab6075f Merging ReduceReads development with unstable repo
It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.
2011-08-22 23:03:03 -04:00
Guillermo del Angel ee68713267 Further bug fixes to CountVariants: stratifications were wrong when genotypes had no-calls. For example, when stratifying by sample, a sample's no-call was counted as a true variant and the counts were incorrectly increased 2011-08-22 20:42:47 -04:00
Guillermo del Angel c270384b2e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-22 20:39:32 -04:00
Guillermo del Angel 8ae24912f4 a) Misc fixes in Phase1 indel VQSR script
b) More R-friendly VariantsToTable printing of AC at sites with multiple alt alleles
c) Renamed FixPLOrderingWalker to FixGenotypesWalker and rewrote it: the older code is no longer needed and was replaced with code that replaces genotypes having all-zero PLs with a no-call.
2011-08-22 20:39:06 -04:00
Mark DePristo 85c5a6f890 Merge branch 'rodTesting'
Conflicts:
	private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java
2011-08-22 17:43:47 -04:00
Mark DePristo 1eab9be35d Now with accurate javadoc 2011-08-22 17:25:15 -04:00
Mark DePristo 3612a3501d info, not warn, about dynamic type determination 2011-08-22 17:24:51 -04:00
Eric Banks dc42571dd9 Only create the genotype map when necessary 2011-08-22 15:40:36 -04:00
Khalid Shakir c4c90c8826 Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
- Ability to pass different resident memory reservations and limits. Useful for large pileups of low-pass genome data that sometimes need a high -Xmx6g but usually don't exceed 2-3g of actual heap.
- Fixed jobPriority to work for all job runners. It must now be an integer between 0 and 100 (even for GridEngine) and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Eric Banks 2c24b68a96 Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster. 2011-08-22 15:11:21 -04:00
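Indexing needs only where each record lies, so a location-only decoder can stop after the REF column and skip INFO and genotype parsing entirely. The sketch below illustrates that idea; `Loc` and `VcfLocDecoder` are hypothetical names, the end coordinate is assumed to be POS + length(REF) - 1, and symbolic alleles or an INFO END key are not handled. It does not reproduce the GATK codec or its measured speedup.

```java
/** Hypothetical location-only record: contig plus 1-based inclusive start and end. */
final class Loc {
    final String contig;
    final int start;
    final int end;

    Loc(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}

final class VcfLocDecoder {
    private VcfLocDecoder() {}

    /**
     * Decodes only CHROM, POS and REF from a VCF data line; returns null for header lines.
     * Assumption: the end is derived from the REF allele length.
     */
    static Loc decodeLoc(String line) {
        if (line.startsWith("#")) return null;
        int t1 = line.indexOf('\t');          // end of CHROM
        int t2 = line.indexOf('\t', t1 + 1);  // end of POS
        int t3 = line.indexOf('\t', t2 + 1);  // end of ID
        int t4 = line.indexOf('\t', t3 + 1);  // end of REF
        if (t1 < 0 || t2 < 0 || t3 < 0 || t4 < 0) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        String contig = line.substring(0, t1);
        int start = Integer.parseInt(line.substring(t1 + 1, t2));
        int refLength = t4 - t3 - 1;          // REF spans (t3, t4)
        return new Loc(contig, start, start + refLength - 1);
    }

    public static void main(String[] args) {
        Loc loc = decodeLoc("20\t14370\trs6054257\tGA\tG\t29\tPASS\tNS=3\tGT\t0|0");
        System.out.println(loc.contig + ":" + loc.start + "-" + loc.end); // 20:14370-14371
    }
}
```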
Eric Banks 518b3dd291 Don't let the genotypes map be null 2011-08-22 15:10:30 -04:00
Ryan Poplin f93a554b01 updating exome specific parameters in MDCP 2011-08-21 10:25:36 -04:00
Ryan Poplin dbff84c54e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-21 10:09:19 -04:00
Khalid Shakir 22ca44c015 Fixed Queue's tagging of RodBindings.
Fixed argument definition names.
2011-08-21 02:34:20 -04:00
Eric Banks a8cbced71b Bug fix for Ryan: check for no context 2011-08-20 22:49:51 -04:00
Eric Banks 0ccd173967 Fixing the recent SelectVariants fix 2011-08-20 21:30:08 -04:00
Ryan Poplin b008676878 fixing the previous fix 2011-08-20 21:21:55 -04:00
Guillermo del Angel 782453235a Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary; needs more investigation): if the -disc option is used with sites-only VCFs, a NullPointerException is thrown. This was caused by the recent introduction of the -xl_sf options.
2011-08-20 12:24:22 -04:00
Ryan Poplin 539e157ecd Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR 2011-08-20 11:28:48 -04:00
Guillermo del Angel 4939648fd4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-20 08:50:43 -04:00
Ryan Poplin a96ecbab71 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 19:30:05 -04:00
Ryan Poplin ddb5045e14 Updating the methods development calling pipeline for the new rod binding syntax and the new best practices. 2011-08-19 19:29:51 -04:00
Mark DePristo ff018c7964 Swapped argument order but not MD5 order 2011-08-19 16:55:56 -04:00
Mark DePristo 8b3cfb2f1c Final documented version of GATKDoclet and associated classes
-- Docs on everything.
-- Feature complete.  At this point only minor improvements and bugfixes are anticipated
2011-08-19 16:52:17 -04:00
Mark DePristo b08d63a6b8 Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mark DePristo 49e831a13b Should have checked in 2011-08-19 14:35:16 -04:00
Mauricio Carneiro 7b5fa4486d GenotypeAndValidate - Added docs to the @Arguments 2011-08-19 13:35:11 -04:00
Mark DePristo 9f7d4beb89 Merge branch 'help' 2011-08-19 13:14:02 -04:00
Mark DePristo 4d1fd17a97 GATKDoclet cleanup and documentation
-- Fixed a bug in the way ArgumentCollections were handled that led to a failure in handling the dbsnp argument collection.
2011-08-19 13:13:41 -04:00
Ryan Poplin 0f25167efd minor fix in VariantEval docs 2011-08-19 11:01:04 -04:00
Mark DePristo 198955f752 GATKDoc descriptions for all standard codecs, or TODO for their owners
-- Also added vcf.gz support in the VCF codec.  This wasn't committed in the last round, because it was missed by the parallel documentation effort.
2011-08-19 09:57:21 -04:00
Guillermo del Angel 269ed1206c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-19 09:32:20 -04:00
Mark DePristo a5e279d697 Dynamic typing of vcf.gz files
-- CombineVariantsIntegrationTests now use dynamic typing of vcf.gz files
-- FeatureManagerUnitTests tests for correctness.
2011-08-19 09:05:11 -04:00
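One common way to type a compressed VCF dynamically is to check for the gzip magic bytes, decompress if present, and then look for the `##fileformat=VCF` header line. This is a sketch of that general technique under stated assumptions, not the FeatureManager logic; `VcfSniffer` and its methods are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sniffer: decides whether a plain or gzip-compressed stream holds a VCF. */
final class VcfSniffer {
    private VcfSniffer() {}

    /** Returns true if the (possibly gzipped) stream starts with "##fileformat=VCF". Caller owns the stream. */
    static boolean looksLikeVcf(InputStream raw) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            in.mark(2);                      // remember the start so the magic bytes can be re-read
            int b1 = in.read();
            int b2 = in.read();
            in.reset();
            // gzip (and bgzip) data begins with the magic bytes 0x1f 0x8b
            InputStream decoded = (b1 == 0x1f && b2 == 0x8b) ? new GZIPInputStream(in) : in;
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(decoded, StandardCharsets.UTF_8));
            String firstLine = reader.readLine();
            return firstLine != null && firstLine.startsWith("##fileformat=VCF");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses bytes in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "##fileformat=VCFv4.1\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeVcf(new ByteArrayInputStream(plain)));       // true
        System.out.println(looksLikeVcf(new ByteArrayInputStream(gzip(plain)))); // true
    }
}
```

Sniffing content rather than trusting the `.vcf.gz` extension also catches misnamed files, at the cost of opening each file once before choosing a codec.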
Eric Banks 40e67cff1b I like the @Advanced annotation 2011-08-18 22:27:34 -04:00
Mark DePristo 2457c7b8f5 Merge branch 'master' into help 2011-08-18 22:20:43 -04:00
Mark DePristo 5fbdf968f7 ArgumentSource no longer comparable. Arguments sorted by GATKDoclet 2011-08-18 22:20:14 -04:00
Eric Banks 77fa2c1546 Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter). 2011-08-18 22:01:33 -04:00
Mark DePristo 1d3799ddf7 Merge branch 'master' into help 2011-08-18 22:00:29 -04:00
Mark DePristo d1892cd0d7 Bug fixes
-- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap
-- Fixed broken exception throwing in the case where a file's type could not be determined
2011-08-18 21:58:36 -04:00
Mark DePristo c5efb6f40e Usability improvements to GATKDocs
-- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan)
-- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKDocs. There's now an advanced section. Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?
2011-08-18 21:39:11 -04:00
Mark DePristo d94da0b1cf Moved CG and SOAP codecs to private 2011-08-18 21:20:26 -04:00
Mark DePristo f7414e39bc Improvements to GATKDocs
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
2011-08-18 21:20:09 -04:00
Ryan Poplin 09d099cada Added GATKDocs to the UnifiedGenotyper. 2011-08-18 20:57:02 -04:00
Mauricio Carneiro 6ef01e40b8 Complete rewrite of Hard Clipping (ReadClipper)
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Guillermo del Angel 626cbf9411 Bug fixes and cleanups for IndelStatistics 2011-08-18 16:28:40 -04:00
Guillermo del Angel 58560a6d50 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 16:17:52 -04:00
Guillermo del Angel 3dfb60a46e Fixing up and refactoring usage of indel categories. On VariantContext, isInsertion() and isDeletion() have been removed because their behavior was wrong at multiallelic sites. The new methods isSimpleInsertion() and isSimpleDeletion() return true only if the site is biallelic. For multiallelic sites, isComplex() returns true in all cases.
VariantEval module CountVariants is corrected, and an additional column is added so that mixed events and complex indels are logged separately (before, they were being conflated).
VariantEval module IndelStatistics is considerably simplified: its sample stratification was wrong and redundant, and it should now work with the VE-generic Sample stratification. Several columns are renamed or removed since they weren't really useful.
2011-08-18 16:17:38 -04:00
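The biallelic-only rule for simple insertions and deletions can be sketched as a classifier over REF and ALT strings. This is a hypothetical illustration, not VariantContext's code: `IndelKind` and `IndelClassifier` are invented names, alleles are assumed to use VCF-style padding (the shorter allele is a prefix of the longer), and the mixed SNP/indel case is not separated from complex events here.

```java
import java.util.List;

/** Hypothetical categories mirroring the commit's simple-insertion / simple-deletion / complex split. */
enum IndelKind { SIMPLE_INSERTION, SIMPLE_DELETION, COMPLEX, NOT_INDEL }

final class IndelClassifier {
    private IndelClassifier() {}

    /** Classifies a site from its REF allele and ALT alleles (assumed VCF-style, with the padding base). */
    static IndelKind classify(String ref, List<String> alts) {
        boolean isIndelSite = alts.stream().anyMatch(alt -> alt.length() != ref.length());
        if (!isIndelSite) return IndelKind.NOT_INDEL;
        if (alts.size() == 1) {
            String alt = alts.get(0);
            // Simple events only at biallelic sites where the shorter allele prefixes the longer one.
            if (alt.length() > ref.length() && alt.startsWith(ref)) return IndelKind.SIMPLE_INSERTION;
            if (alt.length() < ref.length() && ref.startsWith(alt)) return IndelKind.SIMPLE_DELETION;
        }
        // Multiallelic indel sites, and biallelic ones that are not a clean prefix change, are complex.
        return IndelKind.COMPLEX;
    }

    public static void main(String[] args) {
        System.out.println(classify("A", List.of("AT")));        // SIMPLE_INSERTION
        System.out.println(classify("AT", List.of("A")));        // SIMPLE_DELETION
        System.out.println(classify("A", List.of("AT", "ATT"))); // COMPLEX
        System.out.println(classify("A", List.of("G")));         // NOT_INDEL
    }
}
```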
Chris Hartl 6b256a8ac5 Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git 2011-08-18 15:29:24 -04:00
Chris Hartl a8935c99fc Adding docs for DepthOfCoverage and ValidationAmplicons 2011-08-18 15:28:35 -04:00
Mark DePristo f2f51e35e3 Merge branch 'master' into help 2011-08-18 14:05:33 -04:00
Mark DePristo faa3f8b6f6 Only concrete classes are now documented 2011-08-18 14:04:47 -04:00
Ryan Poplin 7c4ce6d969 Added GATKDocs for the VQSR walkers. 2011-08-18 14:00:39 -04:00
Mark DePristo 5772766dd5 Improvements to GATKDocs
-- Now supports a static list of root classes / interfaces that should receive docs. This approach to documenting features complements the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- Sub- and superclasses are no longer displayed
2011-08-18 14:00:09 -04:00
Mark DePristo e03db30ca0 Now uses DocumentedGATKFeatureObject instead of the annotation directly
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo d4511807ed Merge branch 'master' into help 2011-08-18 11:53:37 -04:00
Mark DePristo c787fd0b70 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:52:45 -04:00
Mark DePristo c797616c65 If you have one sample in your BAM, getToolkit().getSamples().size() == 2
Also deleted a double initialization, where a line of code was duplicated in creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo cbec69a130 Merge branch 'master' into help
Conflicts:
	public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks aa21fc7c9c Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 11:30:59 -04:00
Mark DePristo f5d7cabb20 Fix for reintroducing an already solved problem. 2011-08-18 11:20:12 -04:00
Eric Banks a45498150a Remove non-ascii char 2011-08-18 11:18:29 -04:00
Ryan Poplin c08a9964d4 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 10:58:04 -04:00
Ryan Poplin bb79d3edae Added GATKDocs for the BQSR walkers. 2011-08-18 10:57:48 -04:00
Mark DePristo 47bbddb724 Now provides type-specific user feedback
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo 2d41ba15a4 Vastly better Tribble help message
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR        Name        FeatureType   Documentation
##### ERROR      BEAGLE      BeagleFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR         BED         BEDFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR    BEDTABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR       CGVAR     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR       DBSNP       DbSNPFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR    GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR         MAF         MafFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR   RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR      REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR   SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR     SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR      SNPEFF      SnpEffFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR     SOAPSNP     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR       TABLE       TableFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR         VCF     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR        VCF3     VariantContext   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo c2287c93d7 Cleanup of codec locations. No more dbSNPHelper
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper.  rsID function now in VCFutils.  Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo 9c17d54cb6 getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error 2011-08-18 09:39:20 -04:00
Mark DePristo c30e1db744 Better location for help utils 2011-08-18 09:38:51 -04:00
Mark DePristo 4da42d9f39 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-18 09:32:57 -04:00
Eric Banks c91a442be1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-08-17 22:40:16 -04:00