161 Commits (101ffc4dfd5ce83f7d6bf0b5de4a99ebfac447de)

Author SHA1 Message Date
Mark DePristo 0111e58d4e Don't generate PDF unless you have -run specified 2011-11-09 14:45:40 -05:00
Mark DePristo 849c0757f2 Bug fix for LocusScatterFunction when no intervals are provided
-- Now correctly grabs reference contigs and cuts them all up, rather than NPE as intervalString == null.
2011-11-04 10:55:09 -04:00
Mark DePristo bd977c2d92 Bug fix to avoid infinite loop in GATKScatterFunction 2011-11-02 16:20:42 -04:00
Mark DePristo c1da8cd5e7 Final version of bp-resolved locus scatter/gather
-- Minor refactoring to allow LocusScatterFunction to have maxIntervals be the original scatter count, rather than capping this by the interval count as Contig and Interval do
2011-11-02 11:26:34 -04:00
Mark DePristo c2b97030a4 IntervalUtils for completely balanced locus-based scatter/gather
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
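The balanced, base-pair-resolved scatter described in the commit above can be illustrated with a minimal Java sketch. This is not the actual IntervalUtils.scatterLocusIntervals code: the class BalancedScatter, its Interval type (a stand-in for GenomeLoc), and the scatter method are all hypothetical names chosen for illustration. The sketch assumes sorted, non-overlapping, closed intervals and cuts an interval in the middle whenever that is needed to give every part an equal share of bases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the GATK implementation.
class BalancedScatter {
    /** Closed interval [start, stop] on a contig (stand-in for GenomeLoc). */
    static final class Interval {
        final String contig;
        final long start;
        final long stop;

        Interval(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }

        long size() { return stop - start + 1; }

        @Override
        public String toString() { return contig + ":" + start + "-" + stop; }
    }

    /** Splits the intervals into `parts` lists whose base counts differ by at most one. */
    static List<List<Interval>> scatter(List<Interval> intervals, int parts) {
        if (parts <= 0) throw new IllegalArgumentException("parts must be positive");
        long total = 0;
        for (Interval i : intervals) total += i.size();

        List<List<Interval>> result = new ArrayList<>();
        int idx = 0;
        Interval current = intervals.isEmpty() ? null : intervals.get(0);
        long assigned = 0; // bases handed out so far
        for (int p = 1; p <= parts; p++) {
            // Cumulative target keeps rounding error from piling up in the last part.
            long target = total * p / parts - assigned;
            List<Interval> part = new ArrayList<>();
            while (target > 0) {
                if (current.size() <= target) {
                    // The whole remaining interval fits in this part.
                    part.add(current);
                    target -= current.size();
                    assigned += current.size();
                    idx++;
                    current = idx < intervals.size() ? intervals.get(idx) : null;
                } else {
                    // Cut the interval: the head goes to this part, the tail to the next.
                    long cut = current.start + target - 1;
                    part.add(new Interval(current.contig, current.start, cut));
                    assigned += target;
                    current = new Interval(current.contig, cut + 1, current.stop);
                    target = 0;
                }
            }
            result.add(part);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Interval> in = Arrays.asList(
                new Interval("chr1", 1, 10), new Interval("chr2", 1, 10));
        System.out.println(scatter(in, 4));
    }
}
```

With two 10-base intervals and four parts, each part receives 5 bases: chr1:1-5, chr1:6-10, chr2:1-5, and chr2:6-10. Unlike a contig- or interval-based split, the number of parts here is not capped by the number of input intervals.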
Mark DePristo 5fc613f972 Better default partition types for walkers
-- Added PartitionType.READ, and associated ReadScatterFunction.  ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro dbd8c25787 No more R resources in the DPP
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Khalid Shakir e25d40882a Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out. 2011-10-28 15:51:03 -04:00
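The lock-up noted above follows from documented Java behavior: Thread.sleep(0) returns almost immediately, whereas Object.wait(0) treats a zero timeout as "wait until notified". The following is a small self-contained sketch of that difference, not Queue's actual code; the class WaitVsSleep and its helper methods are hypothetical names used only for this illustration.

```java
// Hypothetical demonstration; not taken from Queue.
class WaitVsSleep {
    /** Runs body on a daemon thread and reports whether it finished within timeoutMillis. */
    static boolean finishesWithin(Runnable body, long timeoutMillis) {
        Thread worker = new Thread(body);
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    /** Thread.sleep(0): a zero-length sleep that returns right away. */
    static boolean sleepZeroReturns() {
        return finishesWithin(() -> {
            try {
                Thread.sleep(0);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 1000);
    }

    /** Object.wait(0): blocks until notify()/notifyAll(); nobody notifies here. */
    static boolean waitZeroReturns() {
        final Object lock = new Object();
        return finishesWithin(() -> {
            synchronized (lock) {
                try {
                    // A zero timeout means "no timeout". The JVM permits spurious
                    // wakeups, so production code would wait in a loop on a condition.
                    lock.wait(0);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, 1000);
    }

    public static void main(String[] args) {
        System.out.println("Thread.sleep(0) returned: " + sleepZeroReturns());
        System.out.println("Object.wait(0) returned:  " + waitZeroReturns());
    }
}
```

Barring a spurious wakeup, the waiting thread is still blocked when the one-second check expires, which is the kind of hang a polling loop would hit after such a swap.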
Khalid Shakir b80d407dc7 No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks b39fcb1bea Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-26 15:44:25 -04:00
Eric Banks 3273c20c98 Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes. 2011-10-26 15:29:18 -04:00
Khalid Shakir fac9932938 Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Mauricio Carneiro 86305a5dcf Adjusting the memory limits of the MDCP
Indel caller needs more than 3G for large datasets.
2011-10-21 17:41:52 -04:00
Mauricio Carneiro 9f867d77ca no sort order
subtle bug fixed.
2011-10-20 18:44:09 -04:00
Mauricio Carneiro c9d8b22092 Added BWASW support to the pipeline
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Mauricio Carneiro 093cd95c5d Merged bug fix from Stable into Unstable 2011-10-20 17:03:22 -04:00
Mauricio Carneiro d7367c152a Fixing 'revert' when not realigning
RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.
2011-10-20 17:01:54 -04:00
Mauricio Carneiro ed402588cc Adding the "gold standard NA12878" target 2011-10-20 16:19:13 -04:00
Mauricio Carneiro c27e2fb676 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-18 15:23:05 -04:00
Menachem Fromer e5fc828546 With Khalid's implicit approval, I have removed this line that overrides the memory limit of the VCF-gathering function, so that the inherited limit remains 2011-10-18 14:47:39 -04:00
Mauricio Carneiro 0939d16a8d String not empty bug
Apparently var X: String = _ is not the same as var X: String = "".  :(
2011-10-13 13:22:05 -04:00
Mauricio Carneiro 66b5646f95 Adding hidden options to the DPP
Hidden parameters now control the default platform parameter passed to Count Covariates and the number of scatter/gather jobs to generate.
2011-10-11 13:56:00 -04:00
Mark DePristo 73f9d1f217 GATK read group requirement iron hand
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExampleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo a91509e7dd Shouldn't be public 2011-10-05 15:22:57 -07:00
Khalid Shakir 84bd355690 Merged bug fix from Stable into Unstable 2011-09-27 14:34:39 -04:00
Khalid Shakir b090751f62 Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Khalid Shakir 77ba59e30a Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-27 00:51:45 -04:00
Khalid Shakir 648b959361 Minor change to log an info message when a signal such as Ctrl-C is caught. 2011-09-27 00:50:19 -04:00
Mauricio Carneiro d3cc25454c Updating the MDCP 2011-09-22 11:27:40 -04:00
Mauricio Carneiro 623c49765d NO BAQ ON EXOMES!
says the boss.
2011-09-22 11:13:40 -04:00
Ryan Poplin 5d0f284305 Fixing exome specific arguments to the VQSR in the methods development calling pipeline 2011-09-21 20:26:28 -04:00
Mauricio Carneiro 758ecf2d43 Bringing latest updates of ReduceReads to the master repository 2011-09-20 16:35:09 -04:00
Mauricio Carneiro 08ffb18b96 Renaming datasets in the MDCP
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Eric Banks ba150570f3 Updating to use new rod system syntax plus name change for CountRODs 2011-09-19 13:30:32 -04:00
Eric Banks 095f75ff7d Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-19 12:24:12 -04:00
Eric Banks 85626e7a5d We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too. 2011-09-19 12:24:05 -04:00
Mark DePristo 6ea57bf036 Merge branch 'master' into sgintervals 2011-09-19 09:50:19 -04:00
Khalid Shakir 33967a4e0c Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Ryan Poplin 981b78ea50 Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts. 2011-09-12 12:17:43 -04:00
Mauricio Carneiro 7f9000382e Making indel calls default in the MDCP
You can turn off indel calling by using -noIndels.
2011-09-09 14:09:26 -04:00
Mark DePristo 06cb20f2a5 Intermediate commit cleaning up scatter intervals
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Khalid Shakir 510d5e7730 Merged bug fix from Stable into Unstable 2011-09-09 01:34:55 -04:00
Khalid Shakir 367bbee25a Fixed typo when printing the contents or last N lines of a file. Thanks to larryns. 2011-09-09 01:33:25 -04:00
Mauricio Carneiro ee9d599558 Just cleaning up
clean up old commented code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Mauricio Carneiro 28d782b4c7 Allowing multiple dbsnp and indel files in the DPP 2011-09-02 13:38:47 -04:00
Mauricio Carneiro ad4ea0b80b Merged bug fix from Stable into Unstable 2011-09-01 18:14:45 -04:00
Mauricio Carneiro e253f6f05d Fixing typo in DPP
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro d2a33beff7 Added WGS/WEX b37-decoy CEU trio datasets 2011-09-01 13:14:40 -04:00
Mark DePristo 61633c95a8 Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt 2011-08-28 19:19:45 -04:00
Mark DePristo b38de1fa35 Now captures the exechost in the job report
-- Works for in process, shell, and LSF runners
-- Cleanup of debugging output
2011-08-28 12:05:56 -04:00