Commit Graph

5263 Commits (897a333aba5990a2edfedcf79df1fd83f3fd4a1d)

Author SHA1 Message Date
carneiro 897a333aba Methods Development Pipeline now has the option of calling indels with the -indels parameter. Also updated some databases; the new NA12878 HiSeq hg19 data that Tim just funneled to us has been updated and called.
Small fixes to the data processing pipeline.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5304 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 17:12:55 +00:00
asivache 2f2aa339d9 Now makes all pairs, not only the good ones. The logic for selecting the "best" pair when the data are messy (e.g., multiple alignments available for an end) is still very naive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache abf3fcbb72 Small changes to recognized annotation terms; columns in the annotated MAF are now prioritized, so multiple alternatives no longer cause an 'I don't know what to do' crash. For example, if both Chromosome and chr columns are present, Chromosome takes priority.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
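The column-priority rule in the commit above can be sketched as follows. This is a minimal illustration, assuming priority is simply the order in which aliases are listed; the alias tuple and the `pick_column` helper are hypothetical, not the tool's actual code.

```python
from typing import Optional, Sequence

# Hypothetical alias list: earlier names take priority when several are present.
CHROMOSOME_ALIASES = ("Chromosome", "chr")


def pick_column(header: Sequence[str], aliases: Sequence[str]) -> Optional[str]:
    """Return the highest-priority alias present in the header, or None if none is."""
    present = set(header)
    for name in aliases:
        if name in present:
            return name
    return None


# Both columns present: "Chromosome" wins instead of triggering an ambiguity error.
print(pick_column(["chr", "Chromosome", "Start_position"], CHROMOSOME_ALIASES))  # Chromosome
```

Resolving ambiguity by an ordered list keeps the behavior deterministic; an unknown header still yields None, which the caller can report as a missing column.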
rpoplin 255cc246a2 Change in the Methods Development pipeline: dbsnp130 can't be used for anything, so changed it to dbsnp129. Optimization in AlignmentUtils for HaplotypeScore and the to-be-committed ReadPosRankSumTest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl 97e1a5262e -ct x no longer includes coverage in the previous bin
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks ee6f112556 Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks 93888e570b Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks c59c8b9872 Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
carneiro 75bd0129e7 quick bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
kshakir f1f9bd6dcc Due to recent LSF hiccups, put a very brief (0.5-2 min) retry around getting status. Can't wait too long because statuses are archived an hour after exit.
TODO: Switch to bulk status checks and add status archive lookups.
Sending SIGTERM(15) instead of SIGKILL(9) to allow for graceful termination of child process.
Printing out the name of the QScripts in the compile error text.
Added a pipelineretry -PR pass through for the MFCP and MFCPTest.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5295 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:59:08 +00:00
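A bounded retry like the one described above can be sketched generically. This assumes the status query is any zero-argument callable that raises on failure; `retry_within`, its defaults, and the injectable `sleep`/`clock` parameters are hypothetical and do not call any LSF API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_within(fn: Callable[[], T], max_wait_s: float = 120.0, delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> T:
    """Call fn until it succeeds; re-raise the last error once another wait
    would push past max_wait_s. The window is deliberately short so that a
    persistently failing query gives up well before the job status is archived."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return fn()
        except Exception:
            if clock() + delay_s > deadline:
                raise  # out of time: surface the most recent failure
            sleep(delay_s)
```

Usage would look like `status = retry_within(lambda: query_status(job_id))`, where `query_status` stands in for whatever status lookup is being wrapped. Injecting `sleep` and `clock` keeps the timing logic testable without real waits.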
chartl 07d381ec51 BatchMerge now uses the correct UG settings, recently added by Eric
ExpandIntervals now checks that identical intervals are not created by (un)fortunately-spaced targets
VCFExtractIntervals no longer creates duplicate intervals in the case where a VCF has multiple entries at the same site



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5294 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:46:15 +00:00
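Avoiding duplicate intervals, as both fixes above do, amounts to order-preserving de-duplication. A minimal sketch, assuming intervals are compared as exact (contig, start, stop) tuples; `unique_intervals` is a hypothetical helper and does not merge overlapping intervals.

```python
from typing import Iterable, List, Tuple

# (contig, start, stop); the coordinate convention does not matter for exact de-duplication.
Interval = Tuple[str, int, int]


def unique_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop exact duplicates while preserving first-seen order."""
    seen = set()
    result = []
    for interval in intervals:
        if interval not in seen:
            seen.add(interval)
            result.append(interval)
    return result


# Two VCF records at the same site would otherwise yield the same interval twice.
sites = [("20", 1000, 1000), ("20", 1000, 1000), ("20", 2000, 2000)]
print(unique_intervals(sites))  # [('20', 1000, 1000), ('20', 2000, 2000)]
```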
ebanks 9357bee921 Don't skip tri-allelic sites in the passed-in alleles; just choose the first one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro a2301383bb quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
carneiro 2a48ec1307 now only accepts intervals files if the user specifically requests to report bams at interval only.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 16:49:58 +00:00
carneiro ecfb51bcd8 A few organizational changes: Queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 22:49:38 +00:00
carneiro 8ea71fd294 Minor dataset changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 20:18:10 +00:00
ebanks 318035c147 Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
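The commit above names the three --output_mode values but does not define them. The sketch below encodes one plausible reading as an emit decision; the semantics in the docstring and the `min_confidence` threshold are assumptions for illustration, not the Unified Genotyper's actual logic.

```python
from enum import Enum


class OutputMode(Enum):
    EMIT_VARIANTS_ONLY = "EMIT_VARIANTS_ONLY"
    EMIT_ALL_CONFIDENT_SITES = "EMIT_ALL_CONFIDENT_SITES"
    EMIT_ALL_SITES = "EMIT_ALL_SITES"


def should_emit(mode: OutputMode, is_variant: bool, confidence: float,
                min_confidence: float = 30.0) -> bool:
    """Decide whether a site is written out under the given output mode.

    Assumed semantics (illustrative only):
      EMIT_VARIANTS_ONLY       -> confident variant calls only
      EMIT_ALL_CONFIDENT_SITES -> any confident call, variant or reference
      EMIT_ALL_SITES           -> every site, regardless of confidence
    """
    if mode is OutputMode.EMIT_ALL_SITES:
        return True
    confident = confidence >= min_confidence  # hypothetical threshold
    if mode is OutputMode.EMIT_ALL_CONFIDENT_SITES:
        return confident
    return is_variant and confident
```

Replacing two boolean flags (-all_bases, -genotype) with one enum per concern makes the combinations explicit and rules out contradictory flag settings.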
ebanks 63f40215b3 2 more scripts I found helpful in syncing (and cleaning up) the 1000G mirror
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5287 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:17:36 +00:00
ebanks d7f98ccd9c Adding --doNotWriteOriginalQuals argument to BQ recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
kshakir dee130ad1b Gather the log files before the actual outputs and mark the log file gatherers as intermediates.
Since the outputs are gathered if and only if the logs were gathered, this allows the job name to change without causing SG to re-run.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5285 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 22:04:35 +00:00
depristo 1a5d296737 ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo aa4a4e515d Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
ebanks 463bb737c3 Checking in the scripts I'm using for syncing the 1000G mirror. Note that very few people can actually use them because you most likely don't have permission to write to /humgen/1kg/DCC, but these should be used as a resource if anyone ever needs to do this in the future. These scripts are very naive and consist of just the actual pulling down of data. Currently aspera and wget are supported, but Mark should feel free to add lftp if he wants. :) Also, while I'm here, I'm removing obsolete scripts for running an obsolete pipeline.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5282 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 03:59:26 +00:00
depristo cd7a7091ba Lexicographic error points users to the ReorderSam wiki entry
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo 7323b05dc1 A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5280 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:42 +00:00
depristo 444bf83acf A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
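Conceptually, reordering a BAM for a new reference means sorting records by the new reference's contig order and then by position. A minimal sketch, assuming each record is reduced to a hypothetical (contig, start) tuple; it ignores the BAM header, whose sequence dictionary must also be replaced, and sorts an in-memory list purely for illustration.

```python
from typing import List, Sequence, Tuple

# Stand-in for a BAM record: (contig name, 1-based alignment start).
Read = Tuple[str, int]


def reorder_reads(reads: Sequence[Read], new_contig_order: Sequence[str]) -> List[Read]:
    """Sort reads by the contig order of the new reference, then by position."""
    rank = {name: i for i, name in enumerate(new_contig_order)}
    missing = sorted({contig for contig, _ in reads if contig not in rank})
    if missing:
        raise ValueError(f"contigs absent from the new reference: {missing}")
    return sorted(reads, key=lambda read: (rank[read[0]], read[1]))


# Lexicographic order puts chr10 before chr2; the new reference lists chr2 first.
reads = [("chr1", 500), ("chr10", 30), ("chr2", 70), ("chr1", 20)]
print(reorder_reads(reads, ["chr1", "chr2", "chr10"]))
# [('chr1', 20), ('chr1', 500), ('chr2', 70), ('chr10', 30)]
```

Failing loudly on contigs absent from the new reference mirrors the "better error checking" goal: silently dropping such reads would lose data.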
depristo 87e5c448cd Forgot to enable printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5278 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 14:51:33 +00:00
carneiro c61dd2f09f Data processing pipeline now has on-the-fly BAM indexing (powered by Matt), some new parameters, and indel cleaning with constrained movement; fixMates is gone.
Setting up the methods development pipeline for some cosmetic changes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 23:13:54 +00:00
kshakir 290afae047 GSA-423 Better reporting for errors in QScript.script().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
depristo d97ed3e080 Comments for Mauricio
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 16:58:34 +00:00
kiran 52f860c9b2 Modified MD5s to account for Andrey's new MNP column in CountVariants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran cb95e68fc0 CpG is no longer a standard stratification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran 9ddee96f93 When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
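The subsetting pitfall described above can be shown with a small sketch: a site that is variant across all samples may be hom-ref within the retained subset, so variant status must be recomputed from the retained genotypes only. The genotype representation and `is_variant_in_subset` are hypothetical illustrations, not VariantEval's code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Allele indices per sample: 0 = reference, 1+ = alternate, None = no-call.
Genotype = Tuple[Optional[int], ...]


def is_variant_in_subset(genotypes: Dict[str, Genotype], keep: Iterable[str]) -> bool:
    """True only if some retained sample carries a non-reference allele."""
    for sample in keep:
        for allele in genotypes.get(sample, ()):
            if allele is not None and allele != 0:
                return True
    return False


site = {"NA12878": (0, 0), "NA12891": (0, 1)}
print(is_variant_in_subset(site, ["NA12878"]))  # False: hom-ref once NA12891 is dropped
```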
delangel 1bc5c7e99b boneheaded mistake, mixed up my min and max
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
carneiro acad3ada06 changed baq to calculate_as_necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:50:46 +00:00
carneiro 7f9ca6b28a full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:34:00 +00:00
kiran 4f83151c4e Evaluates within standard target and expanded target separately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5268 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:04:24 +00:00
kiran 92c82200c9 Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache 7ffcade3c3 Added MNP to recognized and counted event types
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
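Recognizing MNPs alongside SNPs and indels can be sketched as a length-based classification. This assumes alleles are given MAF-style with "-" for an empty allele; `event_type` is hypothetical, and it lumps complex events (e.g. ref "AT" to alt "G") in with deletions or insertions.

```python
def event_type(ref: str, alt: str) -> str:
    """Classify a simple event from its reference and alternate alleles ('-' = empty allele)."""
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    if not ref and not alt:
        raise ValueError("both alleles are empty")
    if len(ref) == len(alt):
        # Equal-length substitution: one base is a SNP, several adjacent bases an MNP.
        return "SNP" if len(ref) == 1 else "MNP"
    return "DEL" if len(ref) > len(alt) else "INS"


for ref, alt in [("A", "G"), ("AC", "GT"), ("A", "-"), ("-", "TT")]:
    print(ref, alt, event_type(ref, alt))
```

Normalizing "-" to an empty string first matters: otherwise a single-base deletion ("A" vs "-") would have equal allele lengths and be misread as a SNP.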
depristo 57c66b5602 Supports GQ now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir 860b172ef1 Defaulting the MFCP to run without a tear script.
Added a missing virtual output for the inner FCP, so that Queue can tell a run of the FCP is dot-done.
Enabled the MFCPTest for the first time, running without the tear script.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5264 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 21:13:14 +00:00
kshakir 49931b12fb Ignore missing external directory, take two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5263 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 20:00:00 +00:00
kshakir 188c4f67b0 Ignore missing external directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5262 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:22:21 +00:00
kshakir a189454343 FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel f1d708f4d4 Fixes for HRun annotation in case of indels:
a) For deletions, the value was completely broken; we'd report 0 or -1.
b) For indels, we report the maximum of the forward and backward values. Empirically, I've seen many sites that are not strand-biased but seem to be artifacts, and the homopolymer run is always only to the right (because we left-align by convention).




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
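Taking the maximum of the forward and backward runs, as described above, can be sketched on a plain reference string. This assumes `anchor` is the 0-based index of the last unchanged base before the indel, with the forward run starting just right of it and the backward run ending at it; `run_length` and `indel_hrun` are hypothetical, and the real annotation may count differently.

```python
def run_length(ref: str, start: int, step: int) -> int:
    """Count consecutive copies of ref[start], walking from start in direction step (+1 or -1)."""
    if not 0 <= start < len(ref):
        return 0
    base = ref[start]
    count = 0
    i = start
    while 0 <= i < len(ref) and ref[i] == base:
        count += 1
        i += step
    return count


def indel_hrun(ref: str, anchor: int) -> int:
    """Report the larger of the run right of the anchor and the run ending at the anchor."""
    forward = run_length(ref, anchor + 1, +1)
    backward = run_length(ref, anchor, -1)
    return max(forward, backward)


# Left-aligned indel in "GCAAAAT": the anchor is the C, so the AAAA run lies to its right.
print(indel_hrun("GCAAAAT", 1))  # 4
```

With left alignment the run usually sits to the right of the anchor, which is why a single-direction count can miss it; taking the maximum covers both sides.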
hanna fb9f92d09c For Kristian... bug fixes for the mechanism allowing the external source directory to live anywhere on the filesystem.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5259 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:35:27 +00:00
asivache 0e04e95245 Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
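The deletion/MNP distinction in the fix above can be sketched with string slicing. This assumes `current_pos` is the 0-based index of the anchor base for a deletion and of the first substituted base for an MNP; `event_ref_bases` is a hypothetical helper, not the tool's code.

```python
def event_ref_bases(ref: str, current_pos: int, length: int, kind: str) -> str:
    """Reference bases covered by an event of the given kind and length."""
    if kind == "DEL":
        # Deleted bases follow the anchor base: ref[current_pos+1 .. current_pos+length].
        return ref[current_pos + 1: current_pos + 1 + length]
    if kind == "MNP":
        # Substituted bases start at current_pos itself: ref[current_pos .. current_pos+length-1].
        return ref[current_pos: current_pos + length]
    raise ValueError(f"unsupported event kind: {kind}")


ref = "ACGTACGT"
print(event_ref_bases(ref, 1, 2, "DEL"), event_ref_bases(ref, 1, 2, "MNP"))  # GT CG
```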
carneiro 497e9ab83b too hasty... cleaning up debug messages ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:11:03 +00:00
carneiro b4da843c49 now processes either a single bam file or a list of bam files in parallel.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:07:22 +00:00
asivache 52eedaf22d Subtle but very annoying bug due to an incorrect exit condition on backward traversal. Example of the incorrect old behavior (found by Martha Borkan; this would normally NOT happen with the combination of match/mismatch/open/extend parameters we have been using; to reproduce, use match=10.0, mismatch=-9.0, open=-15.0, extend=-6.66 in older builds):
let's align two sequences (shown below, good alignment)

AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT

now let's reverse the very same sequences and align again

 TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA

Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.

New version fixes this issue and produces correct alignment of reverse sequences (up to the different choice of redundant position for the deletion):

T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA

This version also has the main() method reinstated, so the aligner can be run on its own as a little app.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
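The claim that the best score cannot depend on traversal direction can be checked with a small sketch. This is a plain global (Needleman-Wunsch) aligner with affine gaps using Gotoh's three-matrix recurrence, not the local aligner the commit fixes; it assumes a gap of length k scores open + (k - 1) * extend and computes only the best score, which must be the same for both sequences and their reversals.

```python
NEG_INF = float("-inf")


def global_affine_score(a: str, b: str, match: float = 10.0, mismatch: float = -9.0,
                        gap_open: float = -15.0, gap_extend: float = -6.66) -> float:
    """Best global alignment score with affine gaps (Gotoh's three-matrix recurrence).

    A gap of length k scores gap_open + (k - 1) * gap_extend (assumed convention).
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


top, bottom = "AAATTTGGTAAAAGT", "AAATTTGGTAAAAGGT"
forward = global_affine_score(top, bottom)
backward = global_affine_score(top[::-1], bottom[::-1])
print(forward, backward)  # 135.0 135.0
```

Fifteen matches and one single-base gap give 15 * 10 - 15 = 135 in both directions, matching the deletion-based alignments shown above; a traceback that ends early (the bug described) can change which alignment is reported, but never the optimal score itself.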