gatk-3.8

Author	SHA1	Message	Date
corin	27acede64d	Removing old arguments. We'll now be running with the defaults. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4811 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-09 18:58:56 +00:00
chartl	f8dd59c1d1	Tightening of the batch merging pipeline. Optimized to run on hour queue, so please: if you run this, crush 'hour' with it. Testing is forthcoming, but it merged 700 samples overnight. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4805 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-08 14:36:23 +00:00
chartl	02de9a9764	With multi-sample genotyping must come scatter+gather. Also Khalid informed me of the .group(size) method, so removing my useless (but pretty) code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4797 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 20:12:23 +00:00
chartl	f4c43f013f	Due to the overhead for reading VCF files (>32g for 700 5MB VCF files), batched merging has to generate likelihoods in batches. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4796 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-06 18:23:54 +00:00
chartl	0944184832	Major refactoring of library and full calling pipeline (v2) structure. Arguments to the full calling qscript (and indeed, any qscript that wants them) are now specified via the PipelineArgumentCollection Libraries require a Pipeline object for instantiation -- eliminating their previous dependence on yaml files Functions added to PipelineUtils to build out the proper Pipeline object from the PipelineArgumentCollection, which now contains additional arguments to specify pipeline properties (name, ref, bams, dbsnp, interval list); which are mutually exclusive with the yaml file. Pipeline length reduced to a mere 62 lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4790 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-05 02:33:54 +00:00
corin	bdc7516168	Taking out recalibrating for now, since having these files is confusing people and we've not gone to dbsnp 132 yet so cluster generation's broken with these command lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4786 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-03 22:12:09 +00:00
kshakir	c7dbf66d41	Added a javaMemoryLimit option for cases where the java -Xmx memory should be lower than the bsub memory limit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4778 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 22:38:06 +00:00
chartl	670ae814b3	Get rid of files from the grep string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4773 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 18:39:59 +00:00
chartl	220fb0c44a	Added a pipeline for merging batches. For now takes a file containing a list of VCFs, and a file containing a list of bams. Does not do anything smart (e.g. if you leave out some .bams or add some extra ones, you will not be warned). Heavy lifting done in (the beginnings of) a library for managing multi-batch or multi-project tasks. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4771 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-02 07:31:59 +00:00
chartl	9f03f09cc9	Changes to V2 pipeline and libraries. AB dropped. Cleaning enabled. Project name now properly propagated to intermediate files (instead of the string repr of the object). Indel mask is now expanded prior to filtering at indels. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4769 348d0f76-0448-11de-a6fe-93d51630548a	2010-12-01 18:55:48 +00:00
chartl	06a0fb4489	Library-ized pipeline now functions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4759 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 21:34:59 +00:00
kshakir	e21a66d876	Updated the Queue GATK generator and packaging to include more dependencies for fullCallingPipeline.q. Set the -bigMemQueue in the FullCallingPipelineTest to GSA to avoid waiting for the week queue when it is busy. Fixed the package definition of PipelineTest so that scalac won't recompile it every time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4755 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 15:29:40 +00:00
ebanks	4413208c45	Removing unnecessary and incorrect includes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4752 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-30 02:06:48 +00:00
ebanks	d89e17ec8c	Fare thee well, UGv1. Here come the days UGv2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-29 21:51:19 +00:00
kshakir	6f8cd97673	Added a ten sample 1000G whole exome test along with SimpleMetricsBySample to the pipeline validation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4737 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-26 23:17:23 +00:00
corin	6b70cde0b9	Adding a forgotten quote mark git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4729 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-24 16:38:27 +00:00
corin	e15d18129c	Adding by sample metrics. Not sure why we didn't have this in here in the first place git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4723 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 21:36:03 +00:00
corin	fe28f8da9c	Removing Uniquify from main pipeline indel merge, since the pipeline isn't merging from samples with the same name anyway. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4721 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-23 17:25:22 +00:00
kshakir	787e5d85e9	Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'. Added an initial test for genotyping chr20 on ten 1000G bams. Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting". Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory. Added refseq tables and dbsnps to validation data in BaseTest. Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files. Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running. Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 22:59:42 +00:00
kiran	28805d17ca	Commenting out allele-balance for now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4715 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-22 16:48:08 +00:00
corin	8dca5bd861	Putting the annotation back in, both to the filters and to UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4709 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 21:02:15 +00:00
corin	da1fe5bb37	Removing the AB filter given that we don't have that in the VCF anymore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4708 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:05 +00:00
kshakir	79725f2d9c	Excluding the QFunction log files from the set of files to delete on completion. When a QGraph is empty displaying a warning instead of crashing with a JGraph internal assertion error. Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting. When integration tests are run detecting that the logger has already been setup so that messages aren't logged twice. Updated from Ivy 2.2.0-rc1 to 2.2.0. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 20:22:01 +00:00
hanna	302cc13735	Trying out Queue for the first time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4705 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-18 18:29:12 +00:00
corin	5466365575	Fixing a silly typo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4680 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:16:51 +00:00
corin	a64f693b20	Updated pipeline script to include dbSnp for UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4679 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 18:09:47 +00:00
kshakir	302e8f0239	Fixed bug where the command directory was not being set to an absolute path, leading LSF to write some .done files to /tmp. No longer using the command directory for temporary .done files, and instead using the user specified temporary directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4678 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-15 17:59:39 +00:00
kshakir	801c562909	Now actually checking in the integration test mentioned in the prior commit: compiles the full calling pipeline. Removed QScript usages of VariantRecalibrator's -reportDatFile, --report_dat_file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4668 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-14 04:27:10 +00:00
kshakir	673fa841a4	Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader. Removed obsolete usages of PackageUtils with updated PluginManager. Ported Queue interval utilities written in scala over to Sting's java IntervalUtils. Added a very basic integration test to ensure that the fullCallingPipeline.q compiles. Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test). While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1". Upgraded to scala 2.8.1 and updated calls to deprecated functions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-12 20:14:28 +00:00
kshakir	f35d1aa43f	Moving all file cleanup to IOUtils for easier debugging. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4646 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 21:00:58 +00:00
hanna	8e36a07bea	Convert GenomeLocParser into an instance variable. This change is required for anything that needs to be simultaneously aware of multiple references, eg Queue's interval sharding code, liftover support, distributed GATK etc. GenomeLocParser instances must now be used to create/parse GenomeLocs. GenomeLocParser instances are available in walkers by calling either -getToolkit().getGenomeLocParser() or -refContext.getGenomeLocParser() This is an intermediate change; GenomeLocParser will eventually be merged with the reference, but we're not clear exactly how to do that yet. This will become clearer when contig aliasing is implemented. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-10 17:59:50 +00:00
chartl	c19f567424	Sometimes, inputs are really outputs in disguise. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4631 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-05 19:51:16 +00:00
chartl	0e40321a52	Brütall hack: make the bam list creator job wait for the interval creator job, so that there is an implicit dependency of UG on the interval list, by way of the bam list git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4628 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 20:43:11 +00:00
chartl	cb0b2f9811	My analysis script for private mutations. I'm committing it because it contains a number of specialized command line functions that could prove useful in the future. (For example: ConcatVCF and ExtractSample) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4626 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-04 19:57:27 +00:00
chartl	42e9987e69	Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indices outside the bounds of the array. Bug fix to LiftoverVariants - no barfing at reference sites. AlleleFrequencyComparison - local changes added to make sure parsing works properly Added HammingDistance annotation. Mostly useless. But only mostly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-03 19:23:03 +00:00
hanna	861ee3e37a	Changing testing framework from junit -> testng, for its enhanced configurability. Initial test to see how Bamboo will respond. More detailed email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a	2010-11-01 21:31:44 +00:00
kshakir	d768c6558d	Now that the user is required to set the java temp directory, it is safer for the LsfJobRunner to write to the java temp directory instead of the command directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4593 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-28 15:00:21 +00:00
kshakir	5cdd7a7ba4	There's no such thing as a sam index, so the GATK extension generator doesn't need to add an @Input for them. Updated a call to swapExt to specify the directory. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4586 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 20:39:03 +00:00
hanna	4c23b1fe9c	Get rid of the static cache of ArgumentTypeDescriptors by making them an integral part of the parsing engine. Hugely lowers our memory footprint in integrationtests, but not yet enough to run Mark's new parallelized VariantEvalIntegrationTests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4585 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-27 19:44:55 +00:00
corin	6d7ed5781c	Added Dbsnp to Indel Realigner; added known indels rod-binding to realigner. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4576 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 22:22:28 +00:00
kshakir	8211cee0b2	Queue UI Improvements: - Forcing user to set the temp directory via -Djava.io.tmpdir to avoid filling up /tmp. - By default deleting job outputs tagged as intermediate. - Defaulting pipeline to scatter count 1 (no reads deleted). - Cleaning up temp classes even when scripting fails. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4573 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-26 19:49:08 +00:00
kshakir	80259b9e20	Changed fullCallingPipeline to output all contigs in the reference if scattering. When the cleaner interval scatter count is set to one explicitly setting the intervals to Nil. TODO: Need to add an option that lets the user choose from the command line to scatter all contigs or just those in the intervals list. For now can get relatively the same behavior by setting the interval scatter count equal to the number of contigs+1, assuming the random contigs come at the end of the sequence dictionary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4565 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-24 03:01:06 +00:00
kshakir	e9c6f681a4	Instead of the pipeline's cleaner only writing BAMs with the target intervals, now pulling the list of contigs from the target intervals and outputting reads in those contigs. Added a brute force -retry <count> option to Queue for transient errors. Waiting up to 2 minutes for the LSF logs to appear before trying to display the errors from the logs. Updates to the local job runner error logging when a job fails. Refactored QGraph's settings as duplicate code was getting out of control. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4563 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-22 22:22:30 +00:00
kshakir	b954a5a4d5	- After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list. - More cleanup including removing the temporary classes and intermediate error files. Quieting any errors using Apache Commons IO 2.0. - Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-21 06:37:28 +00:00
kshakir	88a0d77433	Changed parsing engine to store the order the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines. Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator. Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered. Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers. Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-20 21:43:52 +00:00
kshakir	81479229e1	QScript authors can now tag functions as intermediate. Functions tagged as intermediate will be skipped unless another function in the graph needs their output. Re-logging the failed jobs and the path to their log files at the end of a run. Added a parameter -bigMemQueue for the fullCallingPipeline.q instead of hardcoding gsa (gsa was backed up and it was actually faster to run on week). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4520 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-18 22:11:14 +00:00
chartl	2bc5971ca1	Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it. Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC. IMPORTANT I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed -- it's unclear what to do when the header specifies AC of being one class, but recalculating it casts to another class, and I'm not sure what to do. I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-17 03:18:01 +00:00
kshakir	9dc2e931b6	Saving the order functions are added to in the QScript. Using the order during submission of ready jobs (but not currently dryrun) and during -status. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4508 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 20:00:35 +00:00
kshakir	7157cb9090	While bkill'ing on the shutdown thread Queue will no longer try to submit more jobs on the original thread. Updated pipeline output structure to current recommendations by Corin. Directories are now automatically created before the function runs. Fixed several bugs with scatter gather binding when the script author needs to change the directories. Fixed bug with tracking of log files for CloneFunctions. More error handling and logging of exceptions (good test environment while LSF was down this early AM!) Removed cleanup utility for scatter gather. SG Output structure has changed significantly. Will need to discuss and find a better approach for Queue programmatically deleting files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4504 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-15 17:01:36 +00:00
corin	5e0c4ecc21	Added DbSnp to VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4497 348d0f76-0448-11de-a6fe-93d51630548a	2010-10-14 17:02:17 +00:00

155 Commits (c64bf80b57beefe09ad11cd78f4e86f7d5902de6)