Commit Graph

85 Commits (06258c8a0154db2ac3fca6e20fe7036e27794485)

Author SHA1 Message Date
Mark DePristo 8db4e787b1 V1 of tool to visualize the quality score information in the context covariates
-- Upgraded jgrapht to latest version (0.8.3)
2012-07-31 08:11:02 -04:00
Roger Zurawicki f3c504769b Added the ability to update the Forum
GATKDocs looks for a key on gsa4, and updates the forum with new walker if it exists.
More changes were made to the GATKDocs. Works nicely with bootstrap on and offline.
Cleaned up the code as well

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-07-23 17:17:33 -04:00
Mauricio Carneiro 116885a450 Removed the "Walker" suffix from all walkers that had it.
* Did not touch archived walkers... those can be named whatever.
   * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
   * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
2012-07-20 17:27:11 -04:00
Joel Thibault abe74dc32d Navel -> GXDB 2012-06-28 13:38:00 -04:00
Mark DePristo 31ee8aa01a JEXL update
-- Update to 2.1.1 from 2.0
-- VariantFiltrationWalker now allows you to run with type unsafe selects, which all default to false when matching.  So "AF < 0.5" works even in the presence of multi-allelics now.
--
2012-06-21 15:17:21 -04:00
Mark DePristo e9c22b9aad Final updates to integration tests for BCF2
-- Fully working version
-- Use -generateShadowBCF to write out foo.bcf as well as foo.vcf anywhere you use -o foo.vcf
-- Moved MedianUnitTest to its proper home in Utils
-- Added reportng to ivy and testng, so build/report/X/html/ is a nicely formatted output for Unit and Integration tests.  From this website it's easy to see md5 diffs, etc.  This is a vastly better way to manage unit and integration test output
2012-05-24 10:58:59 -04:00
Joel Thibault 085588cb04 Not Nexus. Need new name. Navel? 2012-05-24 10:11:58 -04:00
David Roazen 9c6bccfd8b build system overhaul
* Added support for a protected directory whose contents are only made public in binary form

* Simplified and reorganized build.xml to improve readability and maintainability

* build.xml now autodetects most build properties:
    -Includes private/protected if they exist
    -No more STING_BUILD_TYPE or specialized targets for public-only, etc.

* Build targets have changed! There are now two main build options:

"ant"       build everything (GATK and Queue)
"ant gatk"  build just the GATK

It was too hard to build everything before -- now it is the default.

* To run tests with debugging, use -Dtest.debug=true -Dtest.debug.port=XXXX on the command line.
  Much better than the old comment/uncomment method!
2012-05-17 15:16:29 -04:00
Joel Thibault 229d1aa904 Bjorn -> Nexus 2012-05-15 13:30:29 -04:00
Joel Thibault aa4d41cce0 Minor cleanup before push 2012-05-01 14:16:44 -04:00
Joel Thibault d93a413f2e Add MongoDB dependency 2012-05-01 13:53:43 -04:00
Mark DePristo 3164c8dee5 S3 upload now directly creates the XML report in memory and puts that in S3
-- This is a partial fix for the problem with uploading S3 logs reported by Mauricio.  There the problem is that the java.io.tmpdir is not accessible (network just hangs).  Because of that the s3 upload fails because the underlying system uses tmpdir for caching, etc.  As far as I can tell there's no way around this bug -- you cannot overload the java.io.tmpdir programmatically and even if I could what value would we use?  The only solution seems to me is to detect that tmpdir is hanging (how?!) and fail with a meaningful error.
2012-01-29 15:14:58 -05:00
Mark DePristo cb04c0bf11 Removing javaassist 3.7, lucene library dependancies 2012-01-27 08:24:22 -05:00
Khalid Shakir 5793625592 No more "Q-<pid>@<host>". Generated log file names now use the first output + ".out" (ex. my.vcf.out) or the name of the first QScript plus the order the function was added (ex. MyScript-1.out). The same function added twice with the same outputs will now have the same default logs, meaning the 2nd instance of the function won't be added to the graph twice.
QScript accessor to QSettings to specify a default runName and other default function settings.
Because log files are no longer pseudo-random their presense can be used to tell if a job without other file outputs is "done". For now still using the log's .done file in addition to original outputs.
Gathered log files concatenate all log files together into the stdout.
InProcessFunctions now have PrintStreams for stdout and stderr.
Updated ivy to use commons-io 2.1 for copying logs to the stdout PrintStream. Removed snakeyaml.
During graph tracking of outputs the Index files, and now BAM MD5s, are tracked with the gathering of the original file.
In Queue generated wrappers for the GATK the Index and MD5s used for tracking are switched to private scope.
Added more detailed output when running with -l DEBUG.
Simplified graphviz visualization for additional debugging.
Switched usage of the scala class 'List' to the trait 'Seq' (think java.util.ArrayList vs. using the interface java.util.List)
Minor cleanup to build including sending ant gsalib to R's default libloc.
2012-01-08 12:11:55 -05:00
David Roazen ea6e718cb8 SnpEff 2.0.5 support. Re-enabled SnpEff in the HybridSelectionPipeline.
For now, we recommend only running with the GRCh37.64 database.
2012-01-03 15:18:36 -05:00
Mauricio Carneiro f7a5752025 Let this one slip through my commits. 2011-12-26 21:55:02 -05:00
Mauricio Carneiro 4633637af6 Moved ReduceReads to static ReadClipper
* all clipping done in ReduceReads is done using the static methods of the ReadClipper now.
2011-12-26 21:14:40 -05:00
David Roazen 68b2a0968c Updating the HybridSelectionPipeline for SnpEff 2.0.4 RC3
This will have to be done again when the 2.0.4 release becomes official,
but it's necessary to do now in order to re-enable the pipeline tests.
2011-11-17 14:46:12 -05:00
Khalid Shakir b090751f62 Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Khalid Shakir 61b89e236a To work around potential problem with invalid javax.mail 1.4.1 in ivy cache, added explicit javax.mail 1.4.4 along with build.xml code to remove 1.4.1. 2011-09-20 00:14:35 -04:00
David Roazen bd5cdb8a43 The tribble dependency is now handled through ivy. Revved tribble to r18 and removed obsolete build targets in build.xml 2011-08-11 16:38:29 -04:00
Mark DePristo 45c73ff0e5 Runs and emits an HTML document 2011-07-20 17:16:33 -04:00
David Roazen 68e19edf59 Merged bug fix from Stable into Unstable, and resolved merge conflicts.
Conflicts:
	build.xml
	settings/ivysettings.xml
2011-07-08 15:50:31 -04:00
David Roazen a3c9d9c3ff Fixing Contracts for Java, and enabling contracts by default for unit/integration tests.
The NullPointerException we were seeing when trying to run with contracts enabled was being caused
by an outdated version of the asm library.

To run tests without contracts and disable their compilation, pass in "-Duse.contracts=false" to ant.

Also did some minor unrelated cleanup in build.xml
2011-07-08 15:34:39 -04:00
Khalid Shakir 7b699f8b17 Switched GridEngine from looking from environment variable to using embedded jar. 2011-07-05 21:59:00 -04:00
droazen cc1f94310d A prototype script and library dependencies to extract a BAM list from a
reasonably well-formed PM's xls{x}-format spreadsheet or tsv file.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@6036 348d0f76-0448-11de-a6fe-93d51630548a
2011-06-22 22:53:45 +00:00
hanna c2e8c460cb Factor out all testing dependencies into a separate test configuration and
only download that test configuration when running unit/integration tests.
This means that the build will (hopefully) never break because it can't
fetch a file that isn't required for the GATK to run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5775 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 22:42:11 +00:00
hanna 45d8634522 Intermediate commit: bring Google Caliper into our private repository (even
though sonatype is back up).  This will tide us over until I figure out how
to add caliper to test configuration, so that it's only swapped in when we
actually run our unit / performance tests.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5770 348d0f76-0448-11de-a6fe-93d51630548a
2011-05-05 14:33:14 +00:00
hanna b915520653 Updating to apache commons math v2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5689 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-26 17:31:49 +00:00
hanna 57a4700299 Ported small BAM performance test suite to the Google Caliper microbenchmarking suite. Looks promising,
but I'm still not sure that GC is a good long-term solution.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5683 348d0f76-0448-11de-a6fe-93d51630548a
2011-04-22 22:09:17 +00:00
hanna b8c3c3ae6e Added commons math, for Kristian.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5238 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 18:57:21 +00:00
depristo 197c91e2fb Working implementation of GATKRunReport POSTing to Amazon Web Services S3 storage. Requires users to explicitly provide the secret key to do the upload. Am investigating options to avoid having to do this in the future. Pretty cool little experiment for those who are interested in S3 interaction (extremely trivial)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5130 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-30 21:23:54 +00:00
depristo 7b92cd5008 Adding lucene dependency for file locking -- may be removed in the near future
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5065 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-24 18:59:42 +00:00
kshakir b34e2f733f Removed stochasticity from IndelRealigner by random sampling using and seed based on the read list.
Updated the Queue scatter/gather for read walkers to include -L unmapped on the last scatter job when intervals aren't specified, and to map it correctly when it is explicitly set.
Simplified the build.xml/ivy.xml to fix a bug reported with "ant clean dist test" where the scalac target wasn't found.
Now building all scala code at the same time, just like all java code is compiled at the same time.
Sped up the build for everyone by uncommenting a small bit of classes so that javac/scalac will not constantly launch trying to build .class files that will never compile.
Moved some source files to their expected location so that the .java/.scala -> .class is a one-to-one match, again keeping the compilers from wasting cycles.
Used <uptodate> and <touch> to skip extracting the help text and generating the GATK Queue extensions when the source files haven't been modified.
Fixed a couple errors when the <javadoc> task is run.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4963 348d0f76-0448-11de-a6fe-93d51630548a
2011-01-07 22:03:36 +00:00
kshakir 56433ebf6b Switched from LSF command line wrappers to JNA wrappers around the C API. Side effects:
- bsub command line is no longer fully printed out.
- extraBsubArgs hack is now a callback function updateJobRun.
Updated FullCallingPipelineTest to reflect latest changes to fullCallingPipeline.q.
Added a pipeline that tests the UGv2 runtimes at different bam counts and memory limits.
Updated VE packages that live in oneoffs to compile to oneoffs.
Added a hack to replace the deprecated symbol environ in Mac OS X 10.5+ which is needed by LSF7 on Mac.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4816 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-10 04:36:06 +00:00
kshakir 673fa841a4 Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic intergration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
hanna 861ee3e37a Changing testing framework from junit -> testng, for its enhanced configurability.
Initial test to see how Bamboo will respond.  More detailed email to follow.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 21:31:44 +00:00
kshakir b954a5a4d5 - After removing special code for intervals, instead of being of type File they are generated as List[File]. Changed previous checkin that was appending to this list and instead assigning a singleton list.
- More cleanup including removing the temporary classes and intermediate error files.  Quieting any errors using Apache Commons IO 2.0.
- Counting the contigs during the QScript generation instead of the end user having to pass a separate contig interval list.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4539 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 06:37:28 +00:00
kshakir 63e3848187 Added status email support with -statusTo. Will send emails on failure of an individual function or success/failure of the whole pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4496 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-14 15:58:52 +00:00
kshakir db47230dd9 Wrapping ScatterGatherableFunctions with a facade instead of using slower clone library. Will require keeping Clone's facade code in sync with CommandLineFunction but runs *much* faster.
Shell invoking scripts so that even really long shell scripts make it through LSF.
Using the truncated (up to 1000 characters) of the command line for the job name for use with bjobs.
Switched the default from re-running everything to re-running only files that need to be regenerated.  --skip_up_to_date replaced with --start_clean for those who want to regenerate everything.
Updated logging to let users know when the scatter gather generator is running, which still takes a while but is orders of magnatudes faster for large lists of functions.  (40s for a 100 function graph exploding to a 2500 function graph)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4448 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-07 01:19:18 +00:00
kshakir 20b38b38f3 Updated from SnakeYAML 1.6 to 1.7.
Added a pipeline java bean and YAML utility to serialize java beans.
Added a getFirehosePipelineYaml.sh that can pull firehose data into the pipeline yaml file format.
Updated the fullCallingPipeline.q to begin using the pipeline yaml file format for bams and reference.
More changes to come as this code gets tested out in the fullCallingPipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4329 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-22 19:47:49 +00:00
bthomas e5f81d25d4 Adding the --sample-metadata (-SM) command line argument and associated functionality. This is something Matt and I have been working on for a while. Basically, it allows you to integrate sample metadata into an analysis, by including a sample file. More detailed documentation is on the wiki: http://www.broadinstitute.org/gsa/wiki/index.php/Adding_Sample_data_to_an_analysis
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource. 

This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit. 

In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.

Note that this also adds a new dependency on SnakeYaml, a YAML parser.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-15 11:50:22 +00:00
ebanks 43f1fb2380 Okay, finally done with VCF compression. Now:
1. Uses blocked gzip compression.
2. No more -bzip option available (since we can't compress to sdout).
3. Only file extensions that are compressed are .gz and .gzip.
4. No more need for CompressedVCFWriter.java



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4099 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 16:36:54 +00:00
ebanks 44f3c5639a I have finally figured out that when you volunteer to do something in group meeting, you keep getting pestered about it on Mark's Omniplan doc until it gets done (except for contig aliasing, of course). As such...
We can now emit bzipped VCFs from the GATK.

Details: any walker that defines a VCFWriter for its @Output (i.e. pretty much every core walker from UG and on), also has associated with it the -bzip (--bzip_compression) boolean argument.  When set, it will emit a VCF that is compressed with bzip2.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4093 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-24 04:14:50 +00:00
aaron 3dc4d3c3a9 removing the custom reflections library from the libs, and adding a release version. Hopefully this will fix the problem Menachem has been seeing with random JVM crashes. Also
removed the auto-deletion of the reflections jar, and removed the very old OmniPlan document we had checked-in.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4056 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-19 00:42:37 +00:00
kshakir 4f51a02dea Changed logging level to default at INFO instead of WARN.
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-09 16:42:48 +00:00
aaron 72ae81c6de VariantContext has now moved over to Tribble, and the VCF4 parser is now the only VCF parser in town. Other changes include:
- Tribble is included directly in the GATK repo; those who have access to commit to Tribble can now directly commit from the GATK directory from Intellij; command line users can commit from 
inside the tribble directory.
- Hapmap ROD now in Tribble; all mentions have been switched over.
- VariantContext does not know about GenomeLoc; use VariantContextUtils.getLocation(VariantContext vc) to get a genome loc.
- VariantContext.getSNPSubstitutionType is now in VariantContextUtils.
- This does not include the checked-in project files for Intellij; still running into issues with changes to the iml files being marked as changes by SVN

I'll send out an email to GSAMembers with some more details.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3954 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-05 18:47:53 +00:00
aaron 35ce367898 adding the annotations for findbugs as dependencies in the GATK. They have to be in the default config so that we can
annotate code without running findbugs.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3829 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 16:34:57 +00:00
aaron 7ff6106c14 adding Ivy lines for findbug, and adding a build task (to run it locally you need to have installation of findbug). I'll put more information on the wiki when it's up and running.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3744 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 19:10:19 +00:00
kshakir 30cf78fdc0 Refactoring for a first version of scatter gather api with basic shell script implementations.
Modified build script so that queue is cleaned during "ant clean".



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3611 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:39:20 +00:00