Khalid Shakir
c50274e02e
During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
...
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
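The flank merge described in the commit above can be illustrated with a minimal sketch. It assumes intervals on a single contig represented as inclusive `{start, stop}` pairs; the class `FlankMerge` and method `mergeOverlapping` are hypothetical names for illustration and are not the IntervalUtils API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the GATK IntervalUtils code.
// Each interval is an inclusive {start, stop} pair on one contig.
final class FlankMerge {
    public static List<int[]> mergeOverlapping(List<int[]> flanks) {
        List<int[]> sorted = new ArrayList<>(flanks);
        sorted.sort(Comparator.comparingInt((int[] iv) -> iv[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] cur : sorted) {
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && cur[0] <= merged.get(lastIdx)[1]) {
                // Overlaps the previous interval: extend it instead of adding a duplicate region.
                int[] last = merged.get(lastIdx);
                merged.set(lastIdx, new int[]{last[0], Math.max(last[1], cur[1])});
            } else {
                merged.add(new int[]{cur[0], cur[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Flanks of two nearby targets; the middle two overlap.
        List<int[]> flanks = Arrays.asList(
                new int[]{50, 99}, new int[]{201, 250}, new int[]{200, 249}, new int[]{301, 350});
        for (int[] iv : mergeOverlapping(flanks)) {
            System.out.println(iv[0] + "-" + iv[1]);
        }
    }
}
```

With overlapping flanks collapsed into one interval, no site appears in two scattered pieces.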
Mark DePristo
0111e58d4e
Don't generate PDF unless you have -run specified
2011-11-09 14:45:40 -05:00
Mark DePristo
849c0757f2
Bug fix for LocusScatterFunction when no intervals are provided
...
-- Now correctly grabs reference contigs and cuts them all up, rather than throwing an NPE when intervalString == null.
2011-11-04 10:55:09 -04:00
Mark DePristo
bd977c2d92
Bug fix to avoid infinite loop in GATKScatterFunction
2011-11-02 16:20:42 -04:00
Mark DePristo
c1da8cd5e7
Final version of bp-resolved locus scatter/gather
...
-- Minor refactoring to allow LocusScatterFunction to have maxIntervals be the original scatter count, rather than capping this by the interval count as Contig and Interval do
2011-11-02 11:26:34 -04:00
Mark DePristo
c2b97030a4
IntervalUtils for completely balanced locus-based scatter/gather
...
-- scatterLocusIntervals master utility
-- Moved around some general functionality from GenomeLocSortedSet to GenomeLoc
-- Util function for reversing a list (List<T> -> List<T>, unlike Collections version)
-- DoC is PartitionType.INTERVAL
-- Significant unit tests on new functionality (all passing)
-- Ready for real-world testing, as soon as I can get LocusScatterFunction.scala to actually work
2011-11-02 10:49:40 -04:00
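As a rough illustration of base-pair-balanced scattering, the sketch below splits sorted, non-overlapping intervals into a fixed number of parts of near-equal size, cutting intervals mid-way where needed. It assumes a single contig and inclusive `{start, stop}` pairs; `LocusScatterSketch` and `scatter` are hypothetical names, and the real scatterLocusIntervals may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of bp-resolved balancing; not the GATK implementation.
// Input intervals must be sorted, non-overlapping, inclusive {start, stop} pairs.
final class LocusScatterSketch {
    public static List<List<int[]>> scatter(List<int[]> intervals, int parts) {
        long total = 0;
        for (int[] iv : intervals) total += iv[1] - iv[0] + 1;

        List<List<int[]>> out = new ArrayList<>();
        int idx = 0;                                           // interval currently being consumed
        int pos = intervals.isEmpty() ? 0 : intervals.get(0)[0]; // next unassigned base
        long assigned = 0;
        for (int p = 1; p <= parts; p++) {
            // Cumulative target keeps part sizes within one base of each other.
            long target = total * p / parts - assigned;
            List<int[]> part = new ArrayList<>();
            while (target > 0) {
                int[] iv = intervals.get(idx);
                long avail = iv[1] - pos + 1;
                long take = Math.min(avail, target);
                part.add(new int[]{pos, (int) (pos + take - 1)});
                target -= take;
                assigned += take;
                if (take == avail) {
                    idx++;
                    if (idx < intervals.size()) pos = intervals.get(idx)[0];
                } else {
                    pos += (int) take;
                }
            }
            out.add(part);
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> intervals = Arrays.asList(new int[]{1, 10}, new int[]{21, 30});
        List<List<int[]>> parts = scatter(intervals, 3);
        for (List<int[]> part : parts) {
            StringBuilder sb = new StringBuilder();
            for (int[] iv : part) sb.append(iv[0]).append('-').append(iv[1]).append(' ');
            System.out.println(sb.toString().trim());
        }
    }
}
```

Because the number of parts is taken as given and not capped by the interval count, two input intervals can still be scattered three ways.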
Mark DePristo
5fc613f972
Better default partition types for walkers
...
-- Added PartitionType.READ, and associated ReadScatterFunction. ReadScatterFunction is literally just ContigScatterFunction until someone wants to implement something better
-- LocusWalkers (and subclasses RodWalkers and RefWalkers) are by default PartitionType.LOCUS.
2011-11-01 19:47:10 -04:00
Mauricio Carneiro
dbd8c25787
No more R resources in the DPP
...
updating the DPP to conform with Analyze Covariates changes.
2011-10-28 16:57:01 -04:00
Khalid Shakir
e25d40882a
Swapping Thread.sleep(0) with Object.wait(0) caused Queue to lock up. Thanks to rpoplin for pointing it out.
2011-10-28 15:51:03 -04:00
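The lock-up noted in the commit above is consistent with the documented semantics of the two calls: `Thread.sleep(0)` returns almost immediately, while `Object.wait(0)` means "no timeout" and blocks until the object is notified or the thread is interrupted. The sketch below demonstrates the difference; `WaitVsSleep` and its methods are hypothetical names and are not Queue code.

```java
// Hypothetical demonstration, not taken from Queue.
final class WaitVsSleep {
    /** Runs body on a daemon thread and reports whether it is still running after millis. */
    public static boolean stillBlockedAfter(Runnable body, long millis) {
        Thread t = new Thread(body);
        t.setDaemon(true);
        t.start();
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean blocked = t.isAlive();
        t.interrupt(); // release the thread if it is still waiting
        return blocked;
    }

    public static void sleepZero() {
        try {
            Thread.sleep(0); // returns right away
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void waitZero() {
        Object lock = new Object();
        synchronized (lock) {
            try {
                lock.wait(0); // a timeout of 0 means wait until notified
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("sleep(0) still blocked: " + stillBlockedAfter(WaitVsSleep::sleepZero, 500));
        System.out.println("wait(0) still blocked: " + stillBlockedAfter(WaitVsSleep::waitZero, 200));
    }
}
```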
Khalid Shakir
b80d407dc7
No more hunting down R "resources". As a tradeoff Rscript cannot be specified on the commandline and will be found in the environment path.
...
Other minor cleanup.
2011-10-27 14:17:07 -04:00
Eric Banks
b39fcb1bea
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 15:44:25 -04:00
Eric Banks
3273c20c98
Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.
2011-10-26 15:29:18 -04:00
Khalid Shakir
fac9932938
Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
...
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
Mauricio Carneiro
86305a5dcf
Adjusting the memory limits of the MDCP
...
Indel caller needs more than 3G for large datasets.
2011-10-21 17:41:52 -04:00
Mauricio Carneiro
9f867d77ca
no sort order
...
subtle bug fixed.
2011-10-20 18:44:09 -04:00
Mauricio Carneiro
c9d8b22092
Added BWASW support to the pipeline
...
Data Processing Pipeline can now use BWASW for realigning the reads. Useful for Ion Torrent data.
2011-10-20 18:36:28 -04:00
Mauricio Carneiro
093cd95c5d
Merged bug fix from Stable into Unstable
2011-10-20 17:03:22 -04:00
Mauricio Carneiro
d7367c152a
Fixing 'revert' when not realigning
...
RevertSam was reverting the alignment information and that was screwing up the pipeline if you didn't want to run it with BWA. Fixed.
2011-10-20 17:01:54 -04:00
Mauricio Carneiro
ed402588cc
Adding the "gold standard NA12878" target
2011-10-20 16:19:13 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Menachem Fromer
e5fc828546
With Khalid's implicit approval, I have removed this line that overrides the memory limit of the VCF-gathering function, so that the inherited limit remains
2011-10-18 14:47:39 -04:00
Mauricio Carneiro
0939d16a8d
String not empty bug
...
Apparently var X: String = _ is not the same as var X: String = "". :(
2011-10-13 13:22:05 -04:00
Mauricio Carneiro
66b5646f95
Adding hidden options to the DPP
...
The default platform parameter for Count Covariates and the number of scatter/gather jobs to generate can now be controlled through hidden parameters.
2011-10-11 13:56:00 -04:00
Mark DePristo
73f9d1f217
GATK read group requirement iron hand
...
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExampleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo
a91509e7dd
Shouldn't be public
2011-10-05 15:22:57 -07:00
Khalid Shakir
84bd355690
Merged bug fix from Stable into Unstable
2011-09-27 14:34:39 -04:00
Khalid Shakir
b090751f62
Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
...
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Khalid Shakir
77ba59e30a
Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-27 00:51:45 -04:00
Khalid Shakir
648b959361
Minor change to log an info message when a signal such as Ctrl-C is caught.
2011-09-27 00:50:19 -04:00
Mauricio Carneiro
d3cc25454c
Updating the MDCP
2011-09-22 11:27:40 -04:00
Mauricio Carneiro
623c49765d
NO BAQ ON EXOMES!
...
says the boss.
2011-09-22 11:13:40 -04:00
Ryan Poplin
5d0f284305
Fixing exome specific arguments to the VQSR in the methods development calling pipeline
2011-09-21 20:26:28 -04:00
Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
Mauricio Carneiro
08ffb18b96
Renaming datasets in the MDCP
...
Making dataset names and files generated by the MDCP more uniform.
2011-09-20 11:02:51 -04:00
Eric Banks
ba150570f3
Updating to use new rod system syntax plus name change for CountRODs
2011-09-19 13:30:32 -04:00
Eric Banks
095f75ff7d
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-19 12:24:12 -04:00
Eric Banks
85626e7a5d
We no longer want people to use the August 2010 Dindel calls for indel realignment but instead Guillermo's new whole genome bi-allelic indel calls; updating the bundle accordingly. Also, there was some confusion by the 1000G data processing folks as to exactly what these indel files are, so I've renamed them so that it's clear. Wiki updated too.
2011-09-19 12:24:05 -04:00
Mark DePristo
6ea57bf036
Merge branch 'master' into sgintervals
2011-09-19 09:50:19 -04:00
Khalid Shakir
33967a4e0c
Fixed issue reported by chartl where cloned functions lost tags on @Inputs.
...
Updated ExampleUnifiedGenotyper.scala with new syntax.
2011-09-16 12:46:07 -04:00
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Mauricio Carneiro
7f9000382e
Making indel calls default in the MDCP
...
You can turn off indel calling by using -noIndels.
2011-09-09 14:09:26 -04:00
Mark DePristo
06cb20f2a5
Intermediate commit cleaning up scatter intervals
...
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Khalid Shakir
510d5e7730
Merged bug fix from Stable into Unstable
2011-09-09 01:34:55 -04:00
Khalid Shakir
367bbee25a
Fixed typo when printing the contents or last N lines of a file. Thanks to larryns.
2011-09-09 01:33:25 -04:00
Mauricio Carneiro
ee9d599558
Just cleaning up
...
clean up old commented code from the data processing pipeline.
2011-09-07 13:32:40 -04:00
Mauricio Carneiro
28d782b4c7
Allowing multiple dbsnp and indel files in the DPP
2011-09-02 13:38:47 -04:00
Mauricio Carneiro
ad4ea0b80b
Merged bug fix from Stable into Unstable
2011-09-01 18:14:45 -04:00
Mauricio Carneiro
e253f6f05d
Fixing typo in DPP
...
platform and library were exchanged when rebuilding the read group information
2011-09-01 18:13:52 -04:00
Mauricio Carneiro
d2a33beff7
Added WGS/WEX b37-decoy CEU trio datasets
2011-09-01 13:14:40 -04:00
Mark DePristo
61633c95a8
Default jobreport is now jobPrefix, so you see logs like Q-2508.jobreport.txt
2011-08-28 19:19:45 -04:00