gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	e1174a582d	Merge pull request #379 from broadinstitute/mc_dpp_updates_part2 Including SplitByRG in the FullProcessingPipeline	2013-08-19 18:42:12 -07:00
Michael McCowan	c3a933ce84	Adaptations to accommodate Tribble API changes, consisting mostly of the following. * Refactoring implementations of readHeader(LineReader) -> readActualHeader(LineIterator), including nullary implementations where applicable. * Galvanizing for generic types. * Test fixups, mostly to pass around LineIterators instead of LineReaders. * New rev of tribble, which incorporates a fix that addresses a problem with TribbleIndexedFeatureReader reading a header twice in some instances. * New rev of sam, to make AbstractIterator visible (was moved from picard -> sam in Tribble API refactor).	2013-08-19 15:52:47 -04:00
Mauricio Carneiro	e991307eb5	Including SplitByRG in the FullProcessingPipeline Why wasn't it there before, you ask ---------------------------------- Before I was running it separately (by hand), but now it's integrated in the FullProcessingPipeline. Integration was a pain because of Queue's limitation of only allowing 1 @Output file. This forced me to write the ugliest piece of code of my life, but it's working and it's processing the YRI from scratch using that right now. So I'm happy... somewhat. Other changes to the pipeline ----------------------------- * Add --filter_bases_not_stored to the IndelRealigner step -- sometimes BAM files have reads with no bases stored in the unmapped section (no idea why) but this disrupts the pipeline. * Change adaptor marking parameter to "dual indexed" instead of "pair-ended" -- for PCR Free data.	2013-08-18 00:51:32 -04:00
Mauricio Carneiro	765f5450ac	Updated Full Processing Pipeline * add interleaved fastq option to sam2fastq * add optional adapter trimming path * add "skip_revert" option to skip reverting the bams (sometimes useful -- hidden parameter) * add a walker that reads in one bam file and outputs N bam files, one for each read group in the original bam. This is a very important step in any BAM reprocessing pipeline. I am using this new pipeline to process the CEU and YRI PCR Free WGS trios.	2013-08-13 23:35:32 -04:00
lbergelson	af36c7ce9a	Update QScript.scala Relaxing addAll parameter type from Seq to Traversable to make it slightly more flexible.	2013-08-02 14:09:26 -04:00
David Roazen	6d69c7dc71	Disable RetryMemoryLimit pipeline test -This test is failing intermittently for unexplained reasons (see GSA-943) -In the interest of keeping the rest of the pipeline test suite running, it's best to disable this one test until GSA-943 is resolved	2013-07-03 13:38:28 -04:00
David Roazen	c3d59d890d	Update licenses for new PbsEngine* classes	2013-07-01 15:50:20 -04:00
Khalid Shakir	ec206eccfc	Switch "all" test pipeline job runners to mean the job runners that run at The Broad.	2013-07-01 15:12:55 -04:00
Francesco	acf90ca027	corrected number of arguments passed to PbsEngineJobRunner when requesting multiple cores Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2013-07-01 15:08:15 -04:00
Francesco	948b2fca20	added PbsEngine plugin into engine folders, to be called in Queue with -jobRunner PbsEngine; the plugin is written modifying the existing GridEngine plugin, used as a template Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2013-07-01 15:08:14 -04:00
David Roazen	31827022db	Fix pipeline tests that were not respecting the pipeline test dry run setting There are a few pipeline test classes that do not run Queue, but are classified as pipeline tests because they submit farm jobs. Make these unconventional pipeline tests respect the pipeline test dry run setting.	2013-06-28 15:27:17 -04:00
Guillermo del Angel	f6025d25ae	Feature requested by Reich lab and Paavo lab in Leipzig for ancient DNA processing: -- When doing cross-species comparisons and studying population history and ancient DNA data, having SOME measure of confidence is needed at every single site that doesn't depend on the reference base, even in a naive per-site SNP mode. Old versions of GATK provided GQ and some wrong PL values at reference sites but these were wrong. This commit addresses this need by adding a new UG command line argument, -allSitePLs, that, if enabled will: a) Emit all 3 ALT snp alleles in the ALT column. b) Emit all corresponding 10 PL values. It's up to the user to process these PL values downstream to make sense of these. Note that, in order to follow VCF spec, the QUAL field in a reference call when there are non-null ALT alleles present will be zero, so QUAL will be useless and filtering will need to be done based on other fields. -- Tweaks and fixes to processing pipelines for Reich lab.	2013-06-17 13:21:09 -04:00
David Roazen	639030bd6d	Enable convenient display of diff engine output in Bamboo, plus misc. minor test-related improvements -Diff engine output is now included in the actual exception message thrown as a result of an MD5 mismatch, which allows it to be conveniently viewed on the main page of a build in Bamboo. Minor Additional Improvements: -WalkerTestSpec now auto-detects test class name via new JVMUtils.getCallingClass() method, and the test class name is now included as a regular part of integration test output for each test. -Fix race condition in MD5DB.ensureMd5DbDirectory() -integrationtests dir is now cleaned by "ant clean" GSA-915 #resolve	2013-05-10 19:00:33 -04:00
Eric Banks	d981fd01b8	Now that we don't generate dict and fai files, the resource script needs to copy them to the bundle.	2013-05-02 15:18:13 -04:00
Eric Banks	6d0e383a60	Fixing the bundle script 1. someone out there busted it when adding high confidence 1000G calls 2. new path to NA12878 bam 3. updated clashing version argument	2013-05-02 09:40:36 -04:00
Ryan Poplin	80131ac996	Adding the 1000G_phase1.snps.high_confidence callset to the GATK resource bundle for use in the April 2013 updated best practices.	2013-04-24 11:41:32 -04:00
Guillermo del Angel	c9d3c67a9b	Small Queue/scala improvements, and committing pipeline scripts developed for ancient DNA processing for posterity: -- Picard extension so Queue scripts can use FastqToSam -- Single-sample BAM processing: merge/trim reads + BWA + IR + MD + BQSR. Mostly identical to standard pipeline, except for the adaptor trimming/merging which is critical for short-insert libraries. -- Single-sample calling (experimental, work in progress): standard UG run but outputting at all sites, meant for deep whole genomes. New scripts	2013-04-08 11:52:13 -04:00
Geraldine Van der Auwera	f972963918	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) GATK-73 updated docs for bqsr args GATK-9 differentiate CountRODs from CountRODsByRef GATK-76 generate GATKDoc for CatVariants GATK-4 made resource arg required GATK-10 added -o, some docs to CountMales; some docs to CountLoci GATK-11 fixed by MC's -o change; straightened out the docs. GATK-77 fixed references to wiki GATK-76 Added Ami's doc block GATK-14 Added note that these annotations can only be used with VariantAnnotator GATK-15 specified required=false for two arguments GATK-23 Added documentation block GATK-33 Added documentation GATK-34 Added documentation GATK-32 Corrected arg name and docstring in DiffObjects GATK-32 Added note to DO doc about reference (required but unused) GATK-29 Added doc block to CountIntervals GATK-31 Added @Output PrintStream to enable -o GATK-35 Touched up docs GATK-36 Touched up docs, specified verbosity is optional GATK-60 Corrected GContent annot module location in gatkdocs GATK-68 touched up docs and arg docstrings GATK-16 Added note of caution about calling RODRequiringAnnotations as a group GATK-61 Added run requirements (num samples, min genotype quality) Tweaked template and generic doc block formatting (h2 to h3 titles) GATK-62 Added a caveat to HR annot Made experimental annotation hidden GATK-75 Added setup info regarding BWA GATK-22 Clarified some argument requirements GATK-48 Clarified -G doc comments GATK-67 Added arg requirement GATK-58 Added annotation and usage docs GSATDG-96 Corrected doc Updated MD5 for DiffObjectsIntegrationTests (only change is link in table title)	2013-03-12 10:57:14 -04:00
David Roazen	2a7af43164	Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest -Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper. Changed them to use PrintReads instead. -Moved ExampleUnifiedGenotyperPipelineTest to protected -Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest: After looking at this class a bit, I think the problem was the use of global arrays for the quals shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception will be thrown.	2013-02-27 04:45:53 -05:00
Tad Jordan	eb847fa102	Message "script failed" moved to the correct place in the code GSA-719 fixed	2013-02-04 15:37:23 -05:00
Mauricio Carneiro	e7c9e3639e	Making metrics a required parameter in MarkDuplicates As requested by user (forum)	2013-01-25 17:49:49 -05:00
Khalid Shakir	c58e02a3bd	Added a QFunction.jobLocalDir for optionally tracking a node local directory that may have faster intermediate storage, with SGF ensuring that if the directory happens to be on the same machine, it gets a clone-specific sub-directory to avoid collisions.	2013-01-25 14:28:04 -05:00
Ami Levy-Moonshine	0fb7b73107	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine	826c29827b	change the default VCFs gatherer of the GATK (not just the UG)	2013-01-18 15:03:12 -05:00
Khalid Shakir	4ffb43079f	Re-committing the following changes from Dec 18: Refactored interval specific arguments out of GATKArgumentCollection into InvtervalArgumentCollection such that it can be used in other CommandLinePrograms. Updated SelectHeaders to print out full interval arguments. Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.	2013-01-16 12:43:15 -05:00
Mauricio Carneiro	bc64d4240f	Licensing update -- batch #2 - caught all scala files that didn't have proper package information / class names - included all source files in archive as well GSATDG-5	2013-01-11 13:38:11 -05:00
Mauricio Carneiro	28235f57f2	Adding package information to scala scripts that were missing it. Including archived ones. GSATDG-5	2013-01-11 13:38:05 -05:00
Ami Levy-Moonshine	352cb831d0	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-10 21:27:06 -05:00
Ami Levy-Moonshine	fac0bce916	add RunCoveredByNSamplesSites; changes in CoveredByNSamplesSites so it can work in parallel; also, move it to diagnostics	2013-01-10 21:26:49 -05:00
Mauricio Carneiro	ea8c8573d2	Fixing ParseLicense script for scala syntax - Scala allows package objects in its syntax, so the script needs to be aware of that and not add "*/" every time it sees it. GSATDG-5	2013-01-10 18:24:24 -05:00
Mauricio Carneiro	e5913e50b2	Updating licenses for all scala files GSATDG-5	2013-01-10 17:46:10 -05:00
Mauricio Carneiro	d3e2352072	Moved processing pipelines to private These pipelines were supposed to serve as an example for the community, they were written a long-long-long time ago and are being used today by users as the 'best practice pipeline'. Unless we decide we want to support and maintain an example best-practices pipeline, I'm moving these to private.	2013-01-07 14:49:57 -05:00
Eric Banks	78f7a4e300	Received permission from Mauricio to archive the DPP and PBPP PipelineTests	2013-01-07 14:03:08 -05:00
Ami Levy-Moonshine	81eef3aa37	merge development branches of log-less HMM and FastGatherer to master	2013-01-06 23:01:58 -05:00
Ami Levy-Moonshine	fe427cdd77	add a few queue scripts and the CatVariantsGatherer scala class	2012-12-26 13:06:36 -05:00
David Roazen	07b369ca7e	Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package This is an intermediate commit so that there is a record of these changes in our commit history. Next step is to isolate the test classes as well, and then move the entire package to the Picard repository and replace it with a jar in our repo. -Removed all dependencies on org.broadinstitute.sting (still need to do the test classes, though) -Had to split some of the utility classes into "GATK-specific" vs generic methods (eg., GATKVCFUtils vs. VCFUtils) -Placement of some methods and choice of exception classes to replace the StingExceptions and UserExceptions may need to be tweaked until everyone is happy, but this can be done after the move.	2012-12-19 10:25:22 -05:00
Eric Banks	18728ec5bd	Updates to the bundle script: 1. Add the symbolic 'current' link for the new bundle dir 2. Don't gzip and copy .out files 3. Don't call chr20 SNPs on the example BAM because it's now just a few reads on chr1	2012-12-18 11:16:42 -05:00
Mauricio Carneiro	6d22f4f737	Bringing latest performance updates from the GATK to CMI	2012-12-05 21:40:03 -05:00
kshakir	61bde6210b	Restored RemoteFile push and pull in base QScript.	2012-12-04 12:34:07 -05:00
Joel Thibault	97d29f203e	Add walltime changes to LSF - Check whether the specified attribute is available - Add pipeline test (disabled due to missing attribute)	2012-11-29 15:23:37 -05:00
Johan Dahlberg	daf6269b65	Setting the walltime Signed-off-by: Joel Thibault <thibault@broadinstitute.org>	2012-11-29 15:23:36 -05:00
kshakir	a6c1fcd151	Removed default use of @Output syntax. If compile completes for QScripts, sending runtime errors during execute.	2012-11-29 13:40:36 -05:00
Menachem Fromer	a8c7edca05	Fixed fragment handling in DepthOfCoverage	2012-11-21 16:01:10 -05:00
Menachem Fromer	c8be7c3102	Keep SNPs and indels separately for batch merging; Add options to DepthOfCoverage to count fragments (to not double-count overlapping reads of same fragment); DepthOfCoverage should now support ReducedReads; Replace recursion with loop in DoC/package.scala (for lists longer than 5000 elements)	2012-11-21 15:56:53 -05:00
Menachem Fromer	9111966261	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-11-20 12:19:58 -05:00
Eric Banks	843384e435	Rename hg19 files in bundle to b37 since that's what they are	2012-11-14 11:47:09 -05:00
Mauricio Carneiro	e35fd1c717	Merging CMI-0.5.0 and GATK-2.2 together.	2012-11-14 10:42:03 -05:00
kshakir	6d59dd3455	Scala classes were only returning direct subclasses (confirmed when inspected in debugger) so changed PluginManager to allow specifying the explicit subclass. Removed some generics from PluginManager for now until able to figure out syntax for requesting explicit subclass. QStatusMessenger uses a slightly more primitive Map[String, Seq[RemoteFile]] instead of Map[ArgumentSource, Seq[RemoteFile]]. Added an QCommandPlugin.initScript utility method for handling specialized script types.	2012-11-14 10:33:20 -05:00
David Roazen	73157ae3d3	Allow each pipeline test the max of 10 hours to run The runtime of these tests is extremely variable -- sometimes they will complete almost instantly, other times they will wait in an LSF queue for 5-10+ hours. Minimize timeout errors by setting the timeout for these tests to the maximum of 10 hours.	2012-11-02 12:40:56 -04:00
Guillermo del Angel	51a9ce28e1	Merge remote-tracking branch 'unstable/master' into develop	2012-10-31 10:29:48 -04:00


330 Commits (cdfd07f9eb4e2ca18b2b6b10d00797cd3a156ebd)