rpoplin
255cc246a2
Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
...
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks
c59c8b9872
Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
carneiro
75bd0129e7
quick bug fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
kshakir
f1f9bd6dcc
Due to recent LSF hiccups put a very brief (.5-2min) retry around getting status. Can't wait too long because statuses are archived an hour after exit.
...
TODO: Switch to bulk status checks and add status archive lookups.
Sending SIGTERM(15) instead of SIGKILL(9) to allow for graceful termination of child process.
Printing out the name of the QScripts in the compile error text.
Added a pipelineretry -PR pass through for the MFCP and MFCPTest.
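A minimal sketch of the bounded status retry described above, not the Queue implementation. The function name, the `fetch_status` callable, and the 30/60/120 second delays are assumptions chosen to fit the ".5-2min" wording; the total wait stays far below the one-hour archive window.

```python
import time


def get_status_with_retry(fetch_status, delays=(30, 60, 120), sleep=time.sleep):
    """Call fetch_status(); on failure, wait and retry a bounded number of times.

    fetch_status is a hypothetical callable that returns a job status or
    raises when the scheduler hiccups. delays are seconds between attempts.
    """
    last_error = None
    # First attempt runs immediately; later attempts wait for the given delay.
    for delay in (0,) + tuple(delays):
        if delay:
            sleep(delay)
        try:
            return fetch_status()
        except Exception as error:  # assumption: any failure is a transient hiccup
            last_error = error
    # Every attempt failed: surface the last error instead of waiting longer.
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.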
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5295 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:59:08 +00:00
chartl
07d381ec51
BatchMerge now uses the correct UG settings, recently added by Eric
...
ExpandIntervals now checks that identical intervals are not created by (un)fortunately-spaced targets
VCFExtractIntervals no longer creates duplicate intervals in the case where a VCF has multiple entries at the same site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5294 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 18:46:15 +00:00
ebanks
9357bee921
Don't skip tri-allelic alleles passed in - just choose the first one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro
a2301383bb
quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
carneiro
2a48ec1307
now only accepts intervals files if the user specifically requests to report bams at interval only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5291 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 16:49:58 +00:00
carneiro
ecfb51bcd8
Few organizational changes, queue output is now categorized and hidden. Also changed NA12878.Wex to dbsnp 129.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5290 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 22:49:38 +00:00
carneiro
8ea71fd294
minor dataset changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5289 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 20:18:10 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks
63f40215b3
2 more scripts I found helpful in syncing (and cleaning up) the 1000G mirror
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5287 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:17:36 +00:00
ebanks
d7f98ccd9c
Adding --doNotWriteOriginalQuals argument to BQ recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
kshakir
dee130ad1b
Gather the log files before the actual outputs and mark the log files gatherers as intermediates.
...
Since the outputs will only be gathered iff the logs were gathered this allows the job name to change without causing SG to re-run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5285 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 22:04:35 +00:00
depristo
1a5d296737
ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo
aa4a4e515d
Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
ebanks
463bb737c3
Checking in the scripts I'm using for syncing the 1000G mirror. Note that very few people can actually use them because you most likely don't have permission to write to /humgen/1kg/DCC, but these should be used as a resource if anyone ever needs to do this in the future. These scripts are very naive and consist of just the actual pulling down of data. Currently aspera and wget are supported, but Mark should feel free to add lftp if he wants. :) Also, while I'm here, I'm removing obsolete scripts for running an obsolete pipeline.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5282 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 03:59:26 +00:00
depristo
cd7a7091ba
Lexicographic error points users to the ReorderSam wiki entry
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo
7323b05dc1
A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5280 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:42 +00:00
depristo
444bf83acf
A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
depristo
87e5c448cd
Forgot to enable printing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5278 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 14:51:33 +00:00
carneiro
c61dd2f09f
data processing pipeline now has on-the-fly bam indexing (powered by Matt), some new parameters, and Indel Cleaning with constrained movement; fixMates is gone.
...
setting up methods development pipeline for some cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5277 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 23:13:54 +00:00
kshakir
290afae047
GSA-423 Better reporting for errors in QScript.script().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
depristo
d97ed3e080
Comments for Mauricio
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5275 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 16:58:34 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel
1bc5c7e99b
boneheaded mistake, mixed up my min and max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
carneiro
acad3ada06
changed baq to calculate_as_necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5270 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:50:46 +00:00
carneiro
7f9ca6b28a
full data processing pipeline, now deleting intermediate files and performing both phases (per lane and combined) of the processing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5269 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:34:00 +00:00
kiran
4f83151c4e
Evaluates within standard target and expanded target separately.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5268 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:04:24 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache
7ffcade3c3
Added MNP to recognized and counted event types
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
depristo
57c66b5602
Supports GQ now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir
860b172ef1
Defaulting the MFCP to run without a tear script.
...
Added a missing virtual output for the inner FCP, so that Queue can tell a run of the FCP is dot-done.
Enabled the MFCPTest for the first time, running without the tear script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5264 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 21:13:14 +00:00
kshakir
49931b12fb
Ignore missing external directory, take two.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5263 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 20:00:00 +00:00
kshakir
188c4f67b0
Ignore missing external directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5262 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:22:21 +00:00
kshakir
a189454343
FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
...
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel
f1d708f4d4
Fixes for HRun annotation in case of indels:
...
a) In case of a deletion, the value was completely broken: we'd report 0 or -1.
b) For indels, we report maximum of forward and backward values - I've seen empirically many sites which are not strand biased but which seem to be artifacts and the homopolymer run is always to the right only (because we left align by convention).
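A rough sketch of point (b), reporting the larger of the forward and backward homopolymer runs around an indel. This is not the annotation's actual code: the function names, the 0-based `pos` of the anchor base, and measuring each run from the base adjacent to the anchor are all assumptions.

```python
def homopolymer_run(ref, pos, step):
    """Length of the run of identical bases starting next to pos.

    step=+1 walks right (forward), step=-1 walks left (backward).
    """
    start = pos + step
    if start < 0 or start >= len(ref):
        return 0
    base = ref[start]
    length = 0
    i = start
    while 0 <= i < len(ref) and ref[i] == base:
        length += 1
        i += step
    return length


def hrun_for_indel(ref, pos):
    """Report the maximum of the forward and backward run lengths."""
    return max(homopolymer_run(ref, pos, +1), homopolymer_run(ref, pos, -1))
```

With left-aligned indels the run tends to sit to the right of the anchor, so taking the maximum keeps it from being missed by a one-sided scan.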
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
hanna
fb9f92d09c
For Kristian...bug fixes for mechanism allowing external source
...
directory to live anywhere on the filesystem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5259 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:35:27 +00:00
asivache
0e04e95245
Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
...
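A small illustration of the distinction, under stated assumptions. The deletion slice follows the message above; the MNP slice starting at `current_pos` itself is an assumption based on the usual convention that a deletion is anchored on the base before the deleted bases while an MNP replaces bases starting at its own position. The function name and the string event types are hypothetical, and positions are 0-based here.

```python
def event_ref_bases(ref_bases, current_pos, length, event_type):
    """Return the reference bases covered by an event of the given length."""
    if event_type == "DELETION":
        # Deleted bases follow the anchor base: [current_pos+1, ..., current_pos+N]
        start = current_pos + 1
    elif event_type == "MNP":
        # Assumption: substituted bases start at the event position itself.
        start = current_pos
    else:
        raise ValueError("unsupported event type: " + event_type)
    return ref_bases[start:start + length]
```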
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
carneiro
497e9ab83b
too hasty... cleaning up debug messages ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5257 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:11:03 +00:00
carneiro
b4da843c49
now processes either a single bam file or a list of bam files in parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5256 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 02:07:22 +00:00
asivache
52eedaf22d
Subtle but very annoying bug due to incorrect exit condition on backward traversal. Example of incorrect old behavior (found by Martha Borkan, this normally would NOT happen with the combination of match/mismatch/open/extend parameters we have been using; use match=10.0, mismatch= -9.0, open= -15.0, extend= -6.66 in older builds in order to reproduce):
...
let's align two sequences (shown below, good alignment)
AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT
now let's reverse the very same sequences and align again
TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA
Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.
New version fixes this issue and produces correct alignment of reverse sequences (up to the different choice of redundant position for the deletion):
T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA
This version also has the main() method reinstated, so the aligner can be run on its own as a little app.
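To illustrate the claim that an alignment's score does not depend on traversal direction, here is a minimal sketch that scores a fixed gapped alignment column by column. It is not the aligner's code: the function name is hypothetical, and the gap convention (first gap column costs `gap_open`, each following column of the same gap costs `gap_extend`) is an assumption. The default parameters are the ones quoted above.

```python
def score_alignment(top, bottom, match=10.0, mismatch=-9.0,
                    gap_open=-15.0, gap_extend=-6.66):
    """Score two equal-length gapped strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("gapped sequences must have equal length")
    score = 0.0
    prev_gap = None  # which row held the gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row extends it; otherwise open one.
            score += gap_extend if prev_gap == gap_row else gap_open
            prev_gap = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


forward = ("AAATTTGGTAAAA-GT", "AAATTTGGTAAAAGGT")
fixed_reverse = ("T-GAAAATGGTTTAAA", "TGGAAAATGGTTTAAA")
print(score_alignment(*forward), score_alignment(*fixed_reverse))
```

Both alignments consist of 15 matches and one single-base gap, so they score the same; reversing both rows of any alignment likewise leaves its score unchanged under this scheme.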
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
fromer
6e291820d3
GeneNamesIntervalWalker outputs all genes in each interval; walkers now require a ROD named "intervals"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5254 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 19:58:09 +00:00
carneiro
50c870cfce
Data Processing Pipeline: local indel realignment, mark duplicates and BQSR. Done.
...
Pacbio pipeline: now all pacbio bams have baq annotated in so running UG is uber fast.
Methods pipeline: minor cosmetic changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5253 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:22:30 +00:00
fromer
b304ced801
Updated haplotype calculator to correctly terminate haplotypes RIGHT BEFORE an unphased het
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5252 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:10:01 +00:00