Commit Graph

381 Commits (47fa7e2227d16fa8bbb4354dc68df16489a2b830)

Author SHA1 Message Date
chartl 47fa7e2227 + Added override to extractFileEntries
+ UG now doesn't care whether it's given SNPs or indels to genotype, it will do the right thing -- so remove the option to specify which GM user wants

+ Max mismatches argument removed

integration test will follow



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5638 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 15:13:35 +00:00
kshakir cad6722cf6 Emailing on function start.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5637 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 14:55:35 +00:00
kshakir 475ad1259d Put a band-aid on the FCP by switching use of DINDEL to INDEL and explicitly running UG the old way with just indels and just snps.
Switched YAML parser to new Broad parser which will additionally update picard cleaned bams to the latest version if the project and sample are specified.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5634 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-14 02:22:31 +00:00
corin 9ee30ce594 Whole genome pipeline script. currently chunks, cleans, calls, merges, selects and filters indels, recalibrates, and evals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5627 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-13 16:59:48 +00:00
chartl 8125b8b901 Old changes to the exome VQSR search.
SGA updated to include new proportion-based insert size test.

Major fix for dichotomization test: MathUtils now optionally ignores NaN values for sums, averages, variances. In the future this feature can be pushed back into the AssociationContext object itself (e.g. no data? no entry), but it's kept like this for transparency for now.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5618 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-12 23:00:50 +00:00
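
The optional NaN handling described in the commit above can be sketched roughly as follows. This is only an illustration: the class name NanAwareStats, the ignoreNaN flag, and the choice of sample variance (n - 1 denominator) are assumptions and are not taken from the actual MathUtils code.

```java
// Hypothetical helper; NOT the MathUtils API from the commit.
final class NanAwareStats {
    private NanAwareStats() {}

    /** Number of entries that take part in the statistics. */
    static int count(double[] values, boolean ignoreNaN) {
        int n = 0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            n++;
        }
        return n;
    }

    /** Sum of the values; NaN entries are skipped when ignoreNaN is true. */
    static double sum(double[] values, boolean ignoreNaN) {
        double total = 0.0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            total += v;
        }
        return total;
    }

    /** Mean over the counted entries; NaN when nothing is counted. */
    static double average(double[] values, boolean ignoreNaN) {
        int n = count(values, ignoreNaN);
        return n == 0 ? Double.NaN : sum(values, ignoreNaN) / n;
    }

    /** Sample variance (assumed n - 1 denominator); NaN for fewer than two entries. */
    static double variance(double[] values, boolean ignoreNaN) {
        int n = count(values, ignoreNaN);
        if (n < 2) return Double.NaN;
        double mean = average(values, ignoreNaN);
        double squares = 0.0;
        for (double v : values) {
            if (ignoreNaN && Double.isNaN(v)) continue;
            squares += (v - mean) * (v - mean);
        }
        return squares / (n - 1);
    }
}
```

With ignoreNaN set to false, a single NaN propagates into every statistic, which matches ordinary floating-point behaviour.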
kshakir 4b7c3af763 When /etc/mailname is unreadable fall back to the hostname.
Implicit conversions for String to/from File.
Small updates to the example QScripts.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5614 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-11 20:22:44 +00:00
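
A minimal sketch of the /etc/mailname fallback mentioned in the commit above. The class MailDomain and its resolve method are hypothetical names for illustration, and the rule of taking the first non-blank line of the file is an assumption rather than the committed behaviour.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the real Queue code may be structured differently.
final class MailDomain {
    private MailDomain() {}

    /**
     * Returns the first non-blank line of mailnameFile, or the given
     * hostname when the file is missing, unreadable, or empty.
     */
    static String resolve(Path mailnameFile, String hostname) {
        try {
            for (String line : Files.readAllLines(mailnameFile, StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    return trimmed;
                }
            }
        } catch (IOException | SecurityException e) {
            // Missing or unreadable file: fall through to the hostname.
        }
        return hostname;
    }
}
```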
rpoplin 05ad6ecf72 bug fix in MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5613 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-11 18:27:47 +00:00
kshakir 0a58d7aa1a Marked boolean SAMFileWriterATD arguments as flags so scala generator maps them to Boolean instead of Option[Boolean].
Using the VCFWriterATD isCompressed to check if the VCF index will be auto generated.
Tracking BAM and Tribble indexes as @Inputs and @Outputs in generated QFunctions.
Updates to the BamGatherFunction to disable the index during merge when disable_bam_indexing = true.
Made a shortcut for live-running pipelinetest, pipelinetestrun.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5606 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 18:44:32 +00:00
chartl b81228fec1 Minor bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5603 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 17:30:40 +00:00
hanna 437db28937 Incorporating Khalid's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5602 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 16:22:49 +00:00
hanna 1763a41e94 Oops...broke a Queue compile dependency on the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5596 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-08 02:53:22 +00:00
chartl cc58e19621 This is now running. Expect results in a few weeks when the ~7k jobs have percolated through the week queue. Pray gsa1 doesn't go down.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5593 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 21:12:59 +00:00
chartl 6a26957b65 Bug squashing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5592 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:11:28 +00:00
chartl a1b7d28375 Initial VQSR full search script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5591 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-07 20:03:48 +00:00
kshakir 45ebbf725c Instead of always merging Picard interval files they are optionally merged by Sting Utils.
Disabled the MFCP while the FCP gets an update.
Minor updates to email messages for upcoming scala 2.9.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5588 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 21:12:05 +00:00
rpoplin febb883511 updates to MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5586 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-06 19:44:46 +00:00
hanna 798fb6a7a2 First draft of a script to measure performance of read walkers when merging
dynamically.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5570 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 15:35:14 +00:00
carneiro b722ebf244 quick help/comments updates to match the wikipage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5569 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-04 12:55:55 +00:00
depristo 349661b958 Renamed StratifyAlignmentContext to AlignmentContextUtils, and StatiefyContextType to ReadOrientation. Also, went through the system and deleted all references to second bases. That ship passed long ago.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5563 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 15:35:09 +00:00
rpoplin 40a25af58e Bug fixes in MDCP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5561 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-03 00:04:38 +00:00
kshakir 73f0610abf When getCanonicalHostName fails use getHostName instead of getHostAddress as it's more compatible with our mail servers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5553 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 20:26:26 +00:00
depristo f2c4356a40 Minor usability improvements to the standard eval script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5551 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-01 17:36:50 +00:00
carneiro 0a772688fe implementation of the Gatherer class for CountCovariates, which makes it now scatter/gatherable. Kudos to the @Gather annotation Khalid just introduced!
QuickCCTest is my test script for the gatherer.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5547 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:15:21 +00:00
carneiro 20344a27b4 Quick updates to the data processing pipeline after successfully cleaning the papuans. It now scatter gathers everything and runs in the hour queue for low pass data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5546 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-31 21:13:33 +00:00
kshakir d5ac822e97 When @Gather annotation is missing (probably due to an unclean build) printing out the full field+class name for debugging purposes.
Custom gatherer prints out the class name in the logs.
Try to retrieve mail domain from /etc/mailname before falling back to the hostname.
Building oneoff jars during ant oneoffs.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5540 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 21:43:37 +00:00
carneiro 5d26c66769 Count Covariates is almost scatter-gatherable now!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5537 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 22:25:33 +00:00
rpoplin 5ddc0e464a Under guidance from Matt added ability to use key-value tags with ROD binding command line arguments, so now one can say -B:hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf and get the tags in a walker. Look at ContrastiveRecalibrator for an example of how to use the new ReferenceOrderedDataSource.getTags(). Removed references to FDR in tranches since we are only using truth sensitivity. Finally fixed long standing bug where tranche filters weren't set appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5536 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 21:04:09 +00:00
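
To illustrate the key-value tag syntax from the commit above, here is a small sketch that splits a -B: argument into a name, a type, and a tag map. TaggedBinding is a hypothetical class written for this example; it is not the GATK parser, and it assumes the name and type are always the first two comma-separated fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; NOT the GATK command-line parser.
final class TaggedBinding {
    final String name;
    final String type;
    final Map<String, String> tags;

    private TaggedBinding(String name, String type, Map<String, String> tags) {
        this.name = name;
        this.type = type;
        this.tags = tags;
    }

    /** Parses an argument such as "-B:hapmap,VCF,known=false,prior=12.0". */
    static TaggedBinding parse(String arg) {
        if (!arg.startsWith("-B:")) {
            throw new IllegalArgumentException("Expected -B: prefix: " + arg);
        }
        String[] parts = arg.substring(3).split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected name,type: " + arg);
        }
        // Everything after name and type is treated as a key=value tag.
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 2; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value: " + parts[i]);
            }
            tags.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return new TaggedBinding(parts[0], parts[1], tags);
    }
}
```

Tag values are kept as strings here; a caller that needs the prior as a number would convert it, for example with Double.parseDouble.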
carneiro c3f70cc5cb DPP: Updated after some tests with BWA. Still needs more testing.
MDP: Removed ApplyVariantCut as it's no longer necessary with VQSR2.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5534 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 18:22:09 +00:00
kshakir f443137dda Fixed RodBind with tag order.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5532 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 14:47:26 +00:00
carneiro ccdc021207 Added BWA (option) to the data processing pipeline. Lots of testing still happening...
little fix to the calling pipeline.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5528 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-28 20:17:57 +00:00
depristo cdb0bde952 Bringing script up to date
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5526 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 20:49:07 +00:00
depristo bae0b6cba8 A script for playing with BEAGLE refinement parameters. Supports construction of reference panels from NGS data sets with varying niteration and calibration curve parameters, as well as imputing missing genotypes in a VCF with this reference panel, and comparison to a deeply sequenced individual.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5523 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-27 12:44:25 +00:00
kshakir fc8acd503e Enabled the parameterize option for debugging PipelineTest MD5s.
Fixed escaping expressions that have more than one space between arguments.
Updated example to match the wiki.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5516 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-26 00:41:47 +00:00
chartl fe7f45ee2e First pass at recalibrating associations, with optional data whitening. Modification to the TableCodec so it can natively read bedgraph files (just needed to add an extra header marker: "track").
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5515 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 19:35:39 +00:00
kshakir 8e67c5567c When host name lookup fails just use the whole internet address instead of truncating to the last two octets of the IP address.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5513 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 18:18:22 +00:00
kshakir 3e3ff4a9e7 Bam gathering passes on the compression_level and the create_index flag to MergeSamFiles.
VCF gathering passes on the no_header and sites_only flags to CombineVariants.
Fixed deletion of gathered log files. Although they are intermediate and do not need to be re-run if not present, they should not be deleted.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5508 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-25 03:58:38 +00:00
kshakir e47513f043 Minor updates to match the wiki documentation.
Upper cased the PartitionType enum values.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5506 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 20:22:23 +00:00
carneiro 1281c842ad quick updates to conform with the new picard bam function structure
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5505 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 16:58:37 +00:00
kshakir f3e94ef2be Walkers can now specify a class extending from Gatherer to merge custom output formats. Add @Gather(MyGatherer.class) to the walker @Output.
JavaCommandLineFunctions can now specify the classpath+mainclass as an alternative to specifying a path to an executable jar.
JCLF by default pass on the current classpath and only require the mainclass be specified by the developer extending the JCLF, relieving the QScript author from having to explicitly specify the jar.
Like the Picard MergeSamFiles, GATK engine by default is now run from the current classpath. The GATK can still be overridden via .jarFile or .javaClasspath.
Walkers from the GATK package are now also embedded into the Queue package.
Updated AnalyzeCovariates to make it easier to guess the main class, AnalyzeCovariates instead of AnalyzeCovariatesCLP.
Removed the GATK jar argument from the example QScripts.
Removed one of the most frequently asked questions when getting started with Scala/Queue, the use of Option[_] in QScripts:
1) Fixed a mistaken assumption about Java enums. In Java, enums can be null, so they don't need nullable wrappers.
2) Added syntactic sugar for nullable primitives to the QScript trait. Any variable defined as Option[Int] can just be assigned an Int value or None, e.g. myFunc.memoryLimit = 3
Removed other unused code.
Re-fixed dry run function ordering.
Re-ordered the QCommandline companion object so that IntelliJ doesn't complain about missing main methods.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5504 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-24 14:03:51 +00:00
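
The custom gatherer mechanism described in the commit above might look roughly like the sketch below. Everything here is a stand-in defined for the example: the Gatherer base class, the shape of the @Gather annotation, the gather(inputs, output) signature, and the ConcatGatherer and MyWalker classes are assumptions, not the real GATK/Queue types.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Stand-in for the Gatherer base class named in the commit (signature assumed).
abstract class Gatherer {
    abstract void gather(List<File> inputs, File output);
}

// Stand-in for the @Gather annotation (shape assumed).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Gather {
    Class<? extends Gatherer> value();
}

// Hypothetical gatherer: concatenates the scattered outputs in order.
final class ConcatGatherer extends Gatherer {
    @Override
    void gather(List<File> inputs, File output) {
        try {
            // Create or truncate the output, then append each piece.
            Files.write(output.toPath(), new byte[0]);
            for (File input : inputs) {
                Files.write(output.toPath(),
                        Files.readAllBytes(input.toPath()),
                        StandardOpenOption.APPEND);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical walker whose output field names its gatherer.
final class MyWalker {
    @Gather(ConcatGatherer.class)
    File out;
}
```

A framework could then read the annotation reflectively from the output field and instantiate the named class to merge the scattered pieces.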
chartl cd90fdeca1 Right. The issue was not setting the scatter/gather classes appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5501 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 20:08:53 +00:00
chartl 3c1bf40a45 QScript for scatter-gathering regional association (not quite as easy as using the built-in extension, due to the multiplexer). Currently does not work due to something I'm missing re: scatter gather class, this commit is an interim one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5500 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-23 19:42:29 +00:00
carneiro 3414bccb46 documentation changes to agree with the wiki
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5494 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 21:48:49 +00:00
carneiro 28149e5c5e GenotypeAndValidate version 2, ready to be used.
- now it differentiates between confident REF calls and not confident calls.
- you can now use a BAM file as the truth set. 
- output is much clearer now

dataProcessingPipeline version 2, ready to be used.
- All the processing is now done at the sample level
- Reads the input bam file headers to combine all lanes of the same sample.
- Cleaning is now scattered/gathered. Intelligently breaks down into as many intervals as possible, given the dataset.
- Outputs one processed bam file per sample (and a .list file with all processed files listed)
- Much faster, low pass (read Papuans) can run in the hour queue.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5493 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 20:18:02 +00:00
carneiro 748787c509 helper script to the papuan processing... minor updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5489 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 14:11:02 +00:00
kshakir f6d4b0aaf5 Using an embedded version of Picard for merging un-indexed bam files after scatter/gather instead of requiring the QScripts to specify the picard JAR. May do this for the GATK jar too.
Fixed initialization of pending counts when using -startFromScratch so the count doesn't start at zero and end at -<#njobs>.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5483 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-21 18:20:01 +00:00
carneiro 1198a90ac7 cosmetic change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5481 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-21 15:46:04 +00:00
carneiro 96628457cb pacbio calling pipeline also using VQSR2 now, minor updates on the other pipelines to get the papuans through.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5479 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 22:06:52 +00:00
carneiro 4e449905d1 methods development pipeline now sports VQSR2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5478 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 22:00:46 +00:00
carneiro c9442e4b21 now merging bam files per sample and processing according to cleaning options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5477 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 21:31:29 +00:00
carneiro 18fac5112c first step towards the new sample based processing pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5471 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-18 19:25:15 +00:00