rpoplin
1129f1535d
Fix for the HaplotypeScore optimization in AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5310 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:40:18 +00:00
ebanks
15dfac6bf7
Updating integration test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5309 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:21:58 +00:00
ebanks
06e3c34e7f
Updating performance test to be in sync with previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5308 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:13:35 +00:00
chartl
0f1c1fa26f
First general association module. Let the bug fixing begin!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5307 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 19:55:33 +00:00
chartl
292b421113
Framework for generalized association testing. Heavy lifting done in implementation of the AssociationContext(s) and AssociationContextAtom(s). Nothing really implemented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5306 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 18:12:39 +00:00
asivache
2f2aa339d9
Now makes all pairs, not only the good ones. The logic of selecting the "best" pair when the data are messy (e.g. multiple alignments available for an end) is still very naive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache
abf3fcbb72
Little changes in recognized annotation terms; columns in annotated maf are now prioritized and multiple alternatives do not cause 'i don't know what to do' crash: e.g. if Chromosome and chr columns are both present, then Chromosome is taken (has a priority).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
rpoplin
255cc246a2
Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadPosRankSumTest in AlignmentUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl
97e1a5262e
-ct x no longer includes coverage in the previous bin
...
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks
ee6f112556
Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks
93888e570b
Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
ebanks
c59c8b9872
Phase I of my promise to Mark: fleshed out integration tests for Indel Realigner
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5297 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 21:05:20 +00:00
carneiro
75bd0129e7
quick bug fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
ebanks
9357bee921
Don't skip tri-allelic alleles passed in - just choose the first one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro
a2301383bb
quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
ebanks
318035c147
Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks
d7f98ccd9c
Adding --doNotWriteOriginalQuals argument to BQ recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
depristo
1a5d296737
ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo
aa4a4e515d
Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
depristo
cd7a7091ba
Lexicographic error points users to the ReorderSam wiki entry
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo
444bf83acf
A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
kshakir
290afae047
GSA-423 Better reporting for errors in QScript.script().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel
1bc5c7e99b
boneheaded mistake, mixed up my min and max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache
7ffcade3c3
Added MNP to recognized and counted event types
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
depristo
57c66b5602
Supports GQ now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir
a189454343
FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
...
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel
f1d708f4d4
Fixes for HRun annotation in case of indels:
...
a) In case of a deletion, the value was completely broken: we'd report 0 or -1.
b) For indels, we report the maximum of the forward and backward values. Empirically, I've seen many sites which are not strand biased but which seem to be artifacts, and where the homopolymer run is always to the right only (because we left-align by convention).
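A rough sketch of the "maximum of forward and backward" idea in (b), in Python for brevity. This is not the annotation's actual code: the function names, the 0-based `pos`, and the choice to start counting at the base adjacent to `pos` are all assumptions made for illustration.

```python
def homopolymer_run(ref, pos, step):
    """Length of the run of identical bases adjacent to `pos`,
    walking in direction `step` (+1 = forward, -1 = backward).
    Hypothetical helper; the base at `pos` itself is not counted."""
    i = pos + step
    if i < 0 or i >= len(ref):
        return 0
    base = ref[i]
    run = 0
    while 0 <= i < len(ref) and ref[i] == base:
        run += 1
        i += step
    return run


def hrun_for_indel(ref, pos):
    """Report the larger of the forward and backward runs, so a run that
    sits on only one side of a left-aligned indel is still seen."""
    return max(homopolymer_run(ref, pos, +1), homopolymer_run(ref, pos, -1))


ref = "ACGTTTTTAC"
print(hrun_for_indel(ref, 2))  # run lies to the right of this position
print(hrun_for_indel(ref, 8))  # run lies to the left of this position
```

Taking the maximum means the reported value does not depend on which side of the event the run happens to fall.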
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
asivache
0e04e95245
Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
asivache
52eedaf22d
Subtle but very annoying bug due to incorrect exit condition on backward traversal. Example of incorrect old behavior (found by Martha Borkan, this normally would NOT happen with the combination of match/mismatch/open/extend parameters we have been using; use match=10.0, mismatch= -9.0, open= -15.0, extend= -6.66 in older builds in order to reproduce):
...
let's align two sequences (shown below, good alignment)
AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT
now let's reverse the very same sequences and align again
TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA
Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.
The new version fixes this issue and produces the correct alignment of the reversed sequences (up to a different choice of redundant position for the deletion):
T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA
This version also has the main() method reinstated, so the aligner can be run on its own as a little app.
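The claim that the best score cannot depend on traversal direction can be checked with a small independent scorer. The sketch below is a textbook Gotoh-style global alignment with affine gaps, written in Python; it is not the aligner changed in this commit. It assumes a gap of length k costs open + (k-1)*extend, and reuses the parameter values quoted above as defaults. It computes only the optimal score, not the alignment itself.

```python
import math


def affine_score(a, b, match=10.0, mismatch=-9.0,
                 gap_open=-15.0, gap_extend=-6.66):
    """Optimal global alignment score with affine gap penalties.
    Assumed gap model: first gap base costs gap_open, each further
    base costs gap_extend."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns b[j-1] with a gap
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


short_seq = "AAATTTGGTAAAAGT"
long_seq = "AAATTTGGTAAAAGGT"
forward = affine_score(short_seq, long_seq)
backward = affine_score(short_seq[::-1], long_seq[::-1])
# Reversing both sequences should leave the optimal score unchanged.
print(math.isclose(forward, backward))
```

If a traceback over the reversed pair yields an alignment whose score is lower than this value, the traceback (not the scoring) is at fault, which is the kind of bug described here.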
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
fromer
6e291820d3
GeneNamesIntervalWalker outputs all genes in each interval; walkers now require a ROD named "intervals"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5254 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 19:58:09 +00:00
fromer
b304ced801
Updated haplotype calculator to correctly terminate haplotypes RIGHT BEFORE an unphased het
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5252 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:10:01 +00:00
depristo
5a51c9a815
AWS_S3 logging is now enabled by default. It first tries to log internally at the Broad, and if it can't, it falls back to AWS_S3. The DEV option is removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5249 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 20:20:14 +00:00
kshakir
d185c2961f
Added pipeline for calling FCP in batches called MultiFullCallingPipeline.
...
Bug smashes for the MFCP:
Synchronized access to LSF library and modifications to the QGraph.
If values are missing from the graph with -run make sure to exit with a non-zero exit code.
Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph.
Added jobPriority and removed jobLimitSeconds from QFunction.
All scatter gather is by default in a single sub directory queueScatterGather.
Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest.
Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 18:26:14 +00:00
fromer
d6e3f2eba6
Added GC content calculator for CNV data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
asivache
7a11b4f35d
Another change in variant classification values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5237 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:47:58 +00:00
asivache
7f7d7eb2d1
Inconsequential changes, more 'variant classification' values are recognized
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5236 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:36:39 +00:00
kiran
d3660aa00e
Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
hanna
8d6db5d188
Additional logging of the temp file creation, management, and merging process
...
for VCF files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5234 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 22:07:25 +00:00
asivache
03482bf7c4
Number of MQ0 reads in each sample (format field)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5229 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:16:26 +00:00
asivache
8560bb290b
Allelic fractions are now computed on MQ>0 reads only; total depth in each sample still includes MQ0 as per usual convention. Also renamed for clarity.
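A small illustration of the convention described here, in Python. The read representation (a list of `(allele, mapq)` tuples) and the function name are hypothetical and not taken from the tool: depth counts every read, while the fraction uses only reads with mapping quality above zero.

```python
def depth_and_alt_fraction(reads, alt):
    """reads: list of (allele, mapq) tuples (assumed representation).
    Returns (total depth including MQ0 reads,
             alt fraction among MQ>0 reads, or None if there are none)."""
    depth = len(reads)
    informative = [allele for allele, mapq in reads if mapq > 0]
    if not informative:
        return depth, None
    alt_count = sum(1 for allele in informative if allele == alt)
    return depth, alt_count / len(informative)


reads = [("A", 60), ("T", 60), ("T", 0), ("A", 30)]
depth, fraction = depth_and_alt_fraction(reads, "T")
print(depth, fraction)
```

The MQ0 read supporting "T" contributes to the depth but not to the fraction's numerator or denominator.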
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5228 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:13:15 +00:00
ebanks
9554df1a7c
Adding integration test for indels in VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
hanna
b992abb6eb
A few more unit tests plus some extra
...
functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kshakir
a8ab5a5fb9
After code review with APSG, trying a patch for SIGSEGV errors which checks the LSF result codes from lsb_openjobinfo instead of checking for a null return value from lsb_readjobinfo.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5220 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:08:22 +00:00
delangel
f3de9ee3e0
Refactoring of indel evaluation code to make it easier for external functions to get access to indel classification, in preparation for IndelMetricsByAC to stratify indel classes by AC (not done yet).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5219 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:35:16 +00:00
delangel
3635606cd8
Temp checkin just for experimentation: exposed probabilistic alignment parameters to command line interface to make it easier to experiment on their effects, although a full scrap/rewrite of this should be coming soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5218 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:33:29 +00:00