ebanks
d7f98ccd9c
Adding --doNotWriteOriginalQuals argument to BQ recalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
depristo
1a5d296737
ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo
aa4a4e515d
Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
depristo
cd7a7091ba
Lexicographic error points users to the ReorderSam wiki entry
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo
444bf83acf
A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
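For the ReorderSam entry above (444bf83acf), a minimal Python sketch of the idea, not the tool's actual code. A lexicographic sort places chr10 before chr2; reordering ranks each contig by its position in the new reference instead. The name `reorder_by_reference` and the `(contig, position)` tuples are hypothetical stand-ins for real BAM records.

```python
def reorder_by_reference(records, reference_contigs):
    """Sort (contig, position) records by the contig order of a reference.

    `records` and `reference_contigs` are illustrative inputs; a real tool
    would read them from a BAM file and a sequence dictionary.
    """
    rank = {name: i for i, name in enumerate(reference_contigs)}
    unknown = sorted({contig for contig, _ in records if contig not in rank})
    if unknown:
        # Fail loudly rather than silently dropping reads.
        raise ValueError("contigs missing from reference: " + ", ".join(unknown))
    return sorted(records, key=lambda rec: (rank[rec[0]], rec[1]))


# Lexicographic order is not karyotypic order:
lexicographic = sorted(["chr1", "chr2", "chr10"])  # ['chr1', 'chr10', 'chr2']

reads = [("chr1", 5), ("chr10", 1), ("chr2", 3)]
fixed = reorder_by_reference(reads, ["chr1", "chr2", "chr10"])
```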
kshakir
290afae047
GSA-423 Better reporting for errors in QScript.script().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
kiran
52f860c9b2
Modified MD5s to account for Andrey's new MNP column in CountVariants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5274 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 13:13:58 +00:00
kiran
cb95e68fc0
CpG is no longer a standard stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran
9ddee96f93
When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel
1bc5c7e99b
boneheaded mistake, mixed up my min and max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
kiran
92c82200c9
Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache
7ffcade3c3
Added MNP to recognized and counted event types
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
depristo
57c66b5602
Supports GQ now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir
a189454343
FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
...
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel
f1d708f4d4
Fixes for HRun annotation in case of indels:
...
a) In case of a deletion, the value was completely broken: we'd report 0 or -1.
b) For indels, we report maximum of forward and backward values - I've seen empirically many sites which are not strand biased but which seem to be artifacts and the homopolymer run is always to the right only (because we left align by convention).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
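For the HRun entry above (f1d708f4d4), point (b) can be sketched as follows. This is an illustration under assumptions, not the annotation's code: the helper names and the convention that `anchor` is the 0-based reference index just before the indel are hypothetical.

```python
def run_length(ref, start, step):
    """Length of the homopolymer run beginning at `start`, walking by `step`."""
    if not 0 <= start < len(ref):
        return 0
    base = ref[start]
    count = 0
    i = start
    while 0 <= i < len(ref) and ref[i] == base:
        count += 1
        i += step
    return count


def indel_hrun(ref, anchor):
    """Report the larger of the forward and backward runs around an indel.

    Assumption: `anchor` is the last reference base before the indel, so the
    forward run starts at anchor + 1 and the backward run starts at anchor.
    """
    forward = run_length(ref, anchor + 1, +1)
    backward = run_length(ref, anchor, -1)
    return max(forward, backward)


ref = "ACGTTTTTAC"
right_only = indel_hrun(ref, 2)  # run of T lies to the right of the anchor
left_only = indel_hrun(ref, 7)   # run of T lies to the left of the anchor
```

Taking the maximum means a left-aligned indel, whose run sits only on one side, still gets the full run length.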
asivache
0e04e95245
Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
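For the bug fix above (0e04e95245), a hedged sketch of the distinction. The entry only states that `ref_bases[current_pos+1,...,current_pos+N]` is right for deletions and wrong for MNPs; it does not give the corrected MNP range. The sketch assumes the MNP's reference bases start at `current_pos` itself, and the function name is hypothetical.

```python
def event_ref_bases(ref_bases, current_pos, length, kind):
    """Return the reference bases spanned by an event of the given length.

    DEL: bases after the anchor position, as stated in the commit message.
    MNP: ASSUMPTION, not from the commit: bases starting at the position itself.
    """
    if kind == "DEL":
        start = current_pos + 1
    elif kind == "MNP":
        start = current_pos
    else:
        raise ValueError("unsupported event type: " + kind)
    return ref_bases[start:start + length]


ref_bases = "ACGTACGT"
deleted = event_ref_bases(ref_bases, 2, 2, "DEL")
replaced = event_ref_bases(ref_bases, 2, 2, "MNP")
```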
asivache
52eedaf22d
Subtle but very annoying bug due to incorrect exit condition on backward traversal. Example of incorrect old behavior (found by Martha Borkan, this normally would NOT happen with the combination of match/mismatch/open/extend parameters we have been using; use match=10.0, mismatch= -9.0, open= -15.0, extend= -6.66 in older builds in order to reproduce):
...
let's align two sequences (shown below, good alignment)
AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT
now let's reverse the very same sequences and align again
TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA
Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.
New version fixes this issue and produces correct alignment of reverse sequences (up to the different choice of redundant position for the deletion):
T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA
This version also has the main() method reinstated, so the aligner can be run on its own as a little app.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
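The claim in the entry above (52eedaf22d) that an alignment's score does not depend on traversal direction can be checked with a small scorer. This is a sketch, not the aligner's code: it assumes a gap of length k costs `open + (k - 1) * extend`, which the commit message does not specify, and uses the parameter values quoted there.

```python
def alignment_score(top, bottom, match=10.0, mismatch=-9.0,
                    gap_open=-15.0, gap_extend=-6.66):
    """Score a gapped pairwise alignment given as two equal-length rows.

    Assumed gap model: the first column of a gap costs gap_open, each
    further column of the same gap costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = opens = extends = 0
    gap_row = None  # which row carried a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("column with gaps in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row == gap_row:
                extends += 1
            else:
                opens += 1
            gap_row = row
        else:
            gap_row = None
            if a == b:
                matches += 1
            else:
                mismatches += 1
    # Combine integer counts so the result is independent of column order.
    return (matches * match + mismatches * mismatch
            + opens * gap_open + extends * gap_extend)


forward = alignment_score("AAATTTGGTAAAA-GT", "AAATTTGGTAAAAGGT")
backward = alignment_score("AAATTTGGTAAAA-GT"[::-1], "AAATTTGGTAAAAGGT"[::-1])
fixed = alignment_score("T-GAAAATGGTTTAAA", "TGGAAAATGGTTTAAA")
```

Under these assumptions the forward alignment, its mirror image, and the alignment produced by the fixed version all score the same, which is why the best alignment should survive reversal.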
fromer
6e291820d3
GeneNamesIntervalWalker outputs all genes in each interval; walkers now require a ROD named "intervals"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5254 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 19:58:09 +00:00
fromer
b304ced801
Updated haplotype calculator to correctly terminate haplotypes RIGHT BEFORE an unphased het
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5252 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:10:01 +00:00
depristo
5a51c9a815
AWS_S3 logging is now enabled by default. It first tries to log internally at the Broad, and if it can't, it falls back to AWS_S3. DEV option is removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5249 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 20:20:14 +00:00
kshakir
d185c2961f
Added pipeline for calling FCP in batches called MultiFullCallingPipeline.
...
Bug smashes for the MFCP:
Synchronized access to LSF library and modifications to the QGraph.
If values are missing from the graph with -run make sure to exit with a non-zero.
Refactored QGraph to pre-generate a unique Int for each QNode speeding up getHashCode/equals inside the graph.
Added jobPriority and removed jobLimitSeconds from QFunction.
All scatter gather is by default in a single sub directory queueScatterGather.
Moved some FCPTest into BaseTest/PipelineTest for use by MFCPTest.
Rev'ed the 1000G bams used for validation from v1 to v2 and added code to look for the bams before running other tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5247 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 18:26:14 +00:00
fromer
d6e3f2eba6
Added GC content calculator for CNV data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
asivache
7a11b4f35d
Another change in variant classification values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5237 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:47:58 +00:00
asivache
7f7d7eb2d1
Inconsequential changes, more 'variant classification' values are recognized
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5236 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:36:39 +00:00
kiran
d3660aa00e
Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
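The three indel classes named in the entry above (d3660aa00e) follow from the length change modulo three. A minimal sketch, assuming the caller already knows whether the site is coding; the function name and arguments are hypothetical and this is not the annotator's code.

```python
def classify_indel(ref_allele, alt_allele, in_coding_region):
    """Classify an indel as 'non-coding', 'inframe', or 'frameshift'."""
    delta = abs(len(alt_allele) - len(ref_allele))
    if delta == 0:
        raise ValueError("alleles have equal length; not an indel")
    if not in_coding_region:
        return "non-coding"
    # A length change that is a multiple of 3 keeps the reading frame intact.
    return "inframe" if delta % 3 == 0 else "frameshift"


examples = [
    classify_indel("A", "ACGT", True),   # 3-base insertion
    classify_indel("ACG", "A", True),    # 2-base deletion
    classify_indel("A", "AC", False),    # outside coding sequence
]
```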
hanna
8d6db5d188
Additional logging of the temp file creation, management, and merging process
...
for VCF files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5234 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 22:07:25 +00:00
asivache
03482bf7c4
Number of MQ0 reads in each sample (format field)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5229 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:16:26 +00:00
asivache
8560bb290b
Allelic fractions are now computed on MQ>0 reads only; total depth in each sample still includes MQ0 as per usual convention. Also renamed for clarity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5228 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:13:15 +00:00
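The convention described in the entry above (8560bb290b), together with the per-sample MQ0 count from 03482bf7c4, can be sketched like this. The function name, the `(base, mapq)` read tuples, and the returned keys are illustrative assumptions, not the tool's field names.

```python
def sample_stats(reads, alt_base):
    """Compute depth, MQ0 count, and allelic fraction for one sample.

    `reads` is a list of (base, mapping_quality) tuples. Total depth counts
    every read; the allelic fraction uses only reads with MQ > 0.
    """
    depth = len(reads)
    informative = [base for base, mapq in reads if mapq > 0]
    mq0 = depth - len(informative)
    if informative:
        fraction = informative.count(alt_base) / len(informative)
    else:
        fraction = None  # undefined when every read has MQ 0
    return {"depth": depth, "mq0": mq0, "alt_fraction": fraction}


stats = sample_stats([("A", 60), ("T", 30), ("T", 0), ("A", 0)], "T")
```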
ebanks
9554df1a7c
Adding integration test for indels in VF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
hanna
b992abb6eb
A few more unit tests plus some extra
...
functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir
4d1cca95bb
Removed deprecated getDbsnpFile.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kshakir
a8ab5a5fb9
After code review with APSG, trying a patch for SIGSEGV errors which checks the LSF result codes from lsb_openjobinfo instead of checking for a null return value from lsb_readjobinfo.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5220 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:08:22 +00:00
delangel
f3de9ee3e0
Refactoring of indel evaluation code to make it easier for external functions to get access to indel classification, in preparation for IndelMetricsByAC to stratify indel classes by AC (not done yet).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5219 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:35:16 +00:00
delangel
3635606cd8
Temp checkin just for experimentation: exposed probabilistic alignment parameters to command line interface to make it easier to experiment on their effects, although a full scrap/rewrite of this should be coming soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5218 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:33:29 +00:00
ebanks
196eb77699
CG var format is screwed up and doesn't quite fit into the VariantsToVCF mold (we need to see multiple records before we can assign genotypes to a given position), so it's safer to keep this separate from the other well-behaved formats. Hopefully, it's temporary anyways.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5216 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:18:38 +00:00
ebanks
4fe0fcd707
Updates to handle CG data, headers, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5215 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:16:05 +00:00
kshakir
8040998c15
Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp.
...
Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod.
Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info).
Renamed fCP.q to FCP.q.
Though it's still disabled until VariantEval is updated, added changes above to the FCPTest.
Removed refseq table from the queue.sh wrapper script. Only specified in the yaml.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:01:09 +00:00
fromer
bceb2a9460
Now that Mauricio has updated the PacBio BAM to properly have RG, can use sample name in the walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5212 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 20:26:57 +00:00
kiran
ecbc38aff0
If no comp rod is specified, specify the dummy name none so that we still get counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
carneiro
1fbfd4082e
Cycle covariate now works with pacbio reads. No need to override the platform anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5210 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:14:55 +00:00
asivache
2a04e0d378
Explicitly set logger's level to info - otherwise samtools is too chatty
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5209 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:08:50 +00:00
ebanks
698096dc5a
Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a legitimate CG variant Feature class in the works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran
35c688ac67
Updated md5 for testVCFStreamingChain to reflect latest changes to VariantEval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5206 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 21:22:05 +00:00
kiran
1f820d5026
Added two files from some refactoring changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5205 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:20:12 +00:00
kiran
1085bbf303
Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
kshakir
cc5d695bcf
Renamed the IPFL Test to IPFL PipelineTest so that it'll be picked up by the PipelineTests.
...
HACK: Turned off JNA autoRead() in the jobInfoEnt LSF structure to try and dodge the SIGSEGV during strlen calls during bmods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5201 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-05 00:06:12 +00:00
depristo
ce51ffb56e
Oops, old local paths committed on accident.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5200 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 23:35:56 +00:00
depristo
29f3ad72f3
SAMFileWriter that allows the user to move reads, but only a bit, in an incoming coordinate-sorted BAM file. Does some local reordering and local mate fixing, under specified constraints. These constraints allow us to make a special model of the realigner (under testing for Eric, who promised to try this out a bit and expand test cases and integration tests, but soon to be the default and only model) that only moves reads with ISIZE < 3000 and directly emits a coordinate-sorted, mate-fixed, validating BAM file without needing FixMates externally. Preliminary testing shows this runs in a totally fine amount of memory and produces equivalent results to the previous version.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5199 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:27:05 +00:00
depristo
11ea321b39
Trivial header cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5198 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:23:15 +00:00
depristo
fe4aa58d35
Removing unused class
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00