Commit Graph

5278 Commits (81414a21ddeb2409c8a9d759d0ee11c8f23f9db2)

Author SHA1 Message Date
carneiro 81414a21dd dpp: back to using 4 GB of memory, assuming all is right with IndelRealigner now.
mdcp: Some class structural changes due to the inclusion of indel calls. ApplyCut now chooses the tranche differently for each dataset.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5319 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 19:21:02 +00:00
kshakir 3e0a722672 MFCP waits for other pipelines to finish by using the previous log file of one pipeline as virtual input to the next pipeline.
Using the name of the yaml in the log file name instead of each writing to "queue.out", so that two yamls can run from the same directory without creating cycles in the graph.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5318 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:51:01 +00:00
hanna 600f73cbd6 A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
are works in progress, and this checkin will provide a baseline against which to gauge 
improvements to both projects.

Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as the original implementation.
- Uses 10+ file handles per BAM, but should use 3.

BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
kshakir ad1e4f47b1 Fixed fatal typo in TSV to YAML converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5316 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:18:54 +00:00
ebanks 5d28cbda27 When crossing contigs it's crucial that the queue get flushed or else it will continue to accumulate reads without emitting. This is the last time I trust someone when they tell me that they are 'confident there are no bugs' in a tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5315 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 05:18:30 +00:00
chartl 44a48b4178 If you scatter depth of coverage and need to do something more sophisticated than gathering up (e.g. concatenating) the interval summary file, such as smartly gathering up a full summary file, modify (stress on MODIFY) this script to do it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5314 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 01:23:53 +00:00
kshakir 24ef2be02d Updated firehose pulldown shell scripts:
- a LOT more error reporting to stderr and exit codes
- split the firehose pulldown into TSV generators and a TSV-to-YAML converter
- YAML converter is compatible with the TSVs generated by the front end website and will grab only the appropriate columns
- deprecated getFirehosePipelineYaml.sh mode with a single Sample_Set name which uses the Firehose test harness
- new getFirehosePipelineYaml.sh mode that uses the web services API and requires an additional parameter, a password config file containing "-u <user>:<pass>"; this mode has been tested on problematic Sample_Sets



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5313 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 00:23:05 +00:00
ebanks cba88a8861 Elegant solution to the determinism problem: force testNG to run tests in the order that I want it to.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5312 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 21:32:35 +00:00
kcibul d9ea7daa73 fixed problem with Matt re: packaging commands from external walker codebases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5311 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 21:11:38 +00:00
rpoplin 1129f1535d Fix for the HaplotypeScore optimization in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5310 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:40:18 +00:00
ebanks 15dfac6bf7 Updating integration test to be in sync with previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5309 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:21:58 +00:00
ebanks 06e3c34e7f Updating performance test to be in sync with previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5308 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:13:35 +00:00
chartl 0f1c1fa26f First general association module. Let the bug fixing begin!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5307 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 19:55:33 +00:00
chartl 292b421113 Framework for generalized association testing. Heavy lifting done in implementation of the AssociationContext(s) and AssociationContextAtom(s). Nothing really implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5306 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 18:12:39 +00:00
carneiro 6db3210387 the data processing pipeline needs more memory...
directory updates in the methods pipeline.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5305 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 17:22:58 +00:00
carneiro 897a333aba Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases, and the new NA12878 HiSeq hg19 that Tim just funneled to us is updated and called.
Small fixes on the data processing pipeline


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 17:12:55 +00:00
asivache 2f2aa339d9 Now makes all pairs, not only the good ones. The logic of selecting the "best" pair when the data are messy (e.g. multiple alignments available for an end) is still very naive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache abf3fcbb72 Little changes in recognized annotation terms; columns in annotated maf are now prioritized and multiple alternatives do not cause 'i don't know what to do' crash: e.g. if Chromosome and chr columns are both present, then Chromosome is taken (has a priority).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
rpoplin 255cc246a2 Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl 97e1a5262e -ct x no longer includes coverage in the previous bin
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks ee6f112556 Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks 93888e570b Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks c59c8b9872 Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
carneiro 75bd0129e7 quick bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
kshakir f1f9bd6dcc Due to recent LSF hiccups, put a very brief (0.5-2 min) retry around getting status. Can't wait too long because statuses are archived an hour after exit.
TODO: Switch to bulk status checks and add status archive lookups.
Sending SIGTERM(15) instead of SIGKILL(9) to allow for graceful termination of child process.
Printing out the name of the QScripts in the compile error text.
Added a pipelineretry -PR pass through for the MFCP and MFCPTest.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5295 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:59:08 +00:00
chartl 07d381ec51 BatchMerge now uses the correct UG settings, recently added by Eric
ExpandIntervals now checks that identical intervals are not created by (un)fortunately-spaced targets
VCFExtractIntervals no longer creates duplicate intervals in the case where a VCF has multiple entries at the same site



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5294 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:46:15 +00:00
ebanks 9357bee921 Don't skip tri-allelic alleles passed in - just choose the first one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro a2301383bb quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
carneiro 2a48ec1307 now only accepts intervals files if the user specifically requests to report BAMs at intervals only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 16:49:58 +00:00
carneiro ecfb51bcd8 Few organizational changes, queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 22:49:38 +00:00
carneiro 8ea71fd294 minor dataset changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 20:18:10 +00:00
ebanks 318035c147 Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks 63f40215b3 2 more scripts I found helpful in syncing (and cleaning up) the 1000G mirror
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5287 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:17:36 +00:00
ebanks d7f98ccd9c Adding --doNotWriteOriginalQuals argument to BQ recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
kshakir dee130ad1b Gather the log files before the actual outputs and mark the log files gatherers as intermediates.
Since the outputs will only be gathered iff the logs were gathered this allows the job name to change without causing SG to re-run.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5285 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 22:04:35 +00:00
depristo 1a5d296737 ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo aa4a4e515d Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
ebanks 463bb737c3 Checking in the scripts I'm using for syncing the 1000G mirror. Note that very few people can actually use them because you most likely don't have permission to write to /humgen/1kg/DCC, but these should be used as a resource if anyone ever needs to do this in the future. These scripts are very naive and consist of just the actual pulling down of data. Currently aspera and wget are supported, but Mark should feel free to add lftp if he wants. :) Also, while I'm here, I'm removing obsolete scripts for running an obsolete pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5282 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 03:59:26 +00:00
depristo cd7a7091ba Lexicographic error points users to the ReorderSam wiki entry
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo 7323b05dc1 A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5280 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:42 +00:00
depristo 444bf83acf A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
depristo 87e5c448cd Forgot to enable printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5278 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 14:51:33 +00:00
carneiro c61dd2f09f data processing pipeline now has on-the-fly BAM indexing (powered by Matt), some new parameters, and Indel Cleaning with constrained movement; fixMates is gone.
setting up methods development pipeline for some cosmetic changes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 23:13:54 +00:00
kshakir 290afae047 GSA-423 Better reporting for errors in QScript.script().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
depristo d97ed3e080 Comments for Mauricio
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 16:58:34 +00:00
kiran 52f860c9b2 Modified MD5s to account for Andrey's new MNP column in CountVariants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran cb95e68fc0 CpG is no longer a standard stratification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran 9ddee96f93 When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel 1bc5c7e99b boneheaded mistake, mixed up my min and max
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
carneiro acad3ada06 changed baq to calculate_as_necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:50:46 +00:00