Commit Graph

5197 Commits (b8c3c3ae6ea71e061decda17115870251e29bbd2)

Author SHA1 Message Date
hanna b8c3c3ae6e Added commons math, for Kristian.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5238 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 18:57:21 +00:00
asivache 7a11b4f35d Another change in variant classification values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5237 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:47:58 +00:00
asivache 7f7d7eb2d1 Inconsequential changes, more 'variant classification' values are recognized
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5236 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:36:39 +00:00
kiran d3660aa00e Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
hanna 8d6db5d188 Additional logging of the temp file creation, management, and merging process for VCF files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5234 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 22:07:25 +00:00
carneiro 5f10fffa47 merge intervals now prints a sorted list in the end.
added the ccs datasets to the pbCalling pipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5233 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 20:57:59 +00:00
carneiro 50c2fa3c3a this -1 made ALL the difference in the world. Minor bug fix.
Regular updates to the pbCalling pipeline.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5232 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:25:09 +00:00
fromer cdf53188d6 Updated DoC to work with scatter-gather, and also manually implemented scatter-gather by sample above the scatter-gather by interval. Thanks to Khalid for his support!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5231 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 19:14:42 +00:00
carneiro e7d38247bb chunkIntervals.lua creates 1Mb interval chunks out of any .intervals file. Useful for methods development pipeline datasets.
remapAmplicons.lua takes a sam file with reads aligned to amplicon references, a reference genome, and an amplicon reference mapping table, and rewrites the sam file with mappings to the reference sequence.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5230 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 18:21:31 +00:00
asivache 03482bf7c4 Number of MQ0 reads in each sample (format field)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5229 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:16:26 +00:00
asivache 8560bb290b Allelic fractions are now computed on MQ>0 reads only; total depth in each sample still includes MQ0 as per usual convention. Also renamed for clarity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5228 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:13:15 +00:00
ebanks 9554df1a7c Adding integration test for indels in VF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5227 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 16:58:57 +00:00
carneiro c630701a76 Following Ryan's suggestion, I am moving the Methods Development Calling pipeline to the Core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5226 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 17:36:05 +00:00
carneiro 9c2c5efe35 a modified version of the Methods Development calling pipeline made to work with pacbio data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5225 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 16:06:50 +00:00
depristo b1e4e1afb6 Slightly better output now -- no longer emitting pdfs by default. Emails will go to gsamembers now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5224 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-10 13:02:24 +00:00
fromer 947cc44854 Thanks to Matt for walking me through a proper version of VCF_BAM_utilities! Feel free to add to it, or use it to get the samples in a VCF file, a BAM file, or a collection of BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5223 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 18:08:27 +00:00
hanna b992abb6eb A few more unit tests plus some extra functionality for BAM index visualization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir 4d1cca95bb Removed deprecated getDbsnpFile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kshakir a8ab5a5fb9 After code review with APSG, trying a patch for SIGSEGV errors which checks the LSF result codes from lsb_openjobinfo instead of checking for a null return value from lsb_readjobinfo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5220 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:08:22 +00:00
delangel f3de9ee3e0 Refactoring of indel evaluation code to make it easier for external functions to get access to indel classification, in preparation for IndelMetricsByAC to stratify indel classes by AC (not done yet).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5219 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:35:16 +00:00
delangel 3635606cd8 Temp checkin just for experimentation: exposed probabilistic alignment parameters to command line interface to make it easier to experiment on their effects, although a full scrap/rewrite of this should be coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5218 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:33:29 +00:00
carneiro e5cfc6ae74 NA12878 hg19 dataset was added to the methods pipeline. (and I am running it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5217 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 16:17:46 +00:00
ebanks 196eb77699 CG var format is screwed up and doesn't quite fit into the VariantsToVCF mold (we need to see multiple records before we can assign genotypes to a given position), so it's safer to keep this separate from the other well-behaved formats. Hopefully, it's temporary anyways.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5216 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:18:38 +00:00
ebanks 4fe0fcd707 Updates to handle CG data, headers, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5215 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:16:05 +00:00
fromer 8d0f1b75d5 Added queue/util/BAMutilities Object [with BAM and VCF parsing utilities], which is now used by my qscripts that robustly split runs by sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5214 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:17:29 +00:00
kshakir 8040998c15 Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp.
Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod.
Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info).
Renamed fCP.q to FCP.q.
Though it's still disabled until VariantEval is updated, added changes above to the FCPTest.
Removed refseq table from the queue.sh wrapper script. Only specified in the yaml.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:01:09 +00:00
fromer bceb2a9460 Now that Mauricio has updated the PacBio BAM to properly have RG, can use sample name in the walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5212 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 20:26:57 +00:00
kiran ecbc38aff0 If no comp rod is specified, specify the dummy name none so that we still get counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
carneiro 1fbfd4082e Cycle covariate now works with pacbio reads. No need to override the platform anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5210 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:14:55 +00:00
asivache 2a04e0d378 Explicitly set logger's level to info - otherwise samtools is too chatty
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5209 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:08:50 +00:00
fromer 3c1a026c94 Updated script to properly bin DoC values so that down-sampling corresponds to range of DoC values obtainable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5208 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 16:47:55 +00:00
ebanks 698096dc5a Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a legitimate CG variant Feature class in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran 35c688ac67 Updated md5 for testVCFStreamingChain to reflect latest changes to VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5206 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 21:22:05 +00:00
kiran 1f820d5026 Added two files from some refactoring changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5205 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:20:12 +00:00
kiran 1085bbf303 Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
depristo c4707631e2 MethodsDevelopmentPipeline is now the test bed for large scale AWS_S3 logging. Can be disabled from command line if this is necessary
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5203 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 17:03:45 +00:00
fromer 8b8b4fced1 Removed explicit memoryLimit, so that memLimit given on the command-line will NOT be ignored...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5202 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 01:55:17 +00:00
kshakir cc5d695bcf Renamed the IPFL Test to IPFL PipelineTest so that it'll be picked up by the PipelineTests.
HACK: Turned off JNA autoRead() in the jobInfoEnt LSF structure to try and dodge the SIGSEGV during strlen calls during bmods. 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5201 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-05 00:06:12 +00:00
depristo ce51ffb56e Oops, old local paths committed on accident.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5200 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 23:35:56 +00:00
depristo 29f3ad72f3 SAMFileWriter that allows the user to move reads, but only a bit, in an incoming coordinate-sorted BAM file. Does some local reordering and local mate fixing, under specified constraints. These constraints allow us to make a special model of the realigner (under testing for Eric, who promised to try this out a bit and expand test cases and integration tests, but soon to be the default and only model) that only moves reads with ISIZE < 3000 and directly emits a coordinate-sorted, mate-fixed, validating BAM file without needing FixMates externally. Preliminary testing shows this runs in a totally fine amount of memory and produces equivalent results to the previous version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5199 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:27:05 +00:00
depristo 11ea321b39 Trivial header cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5198 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:23:15 +00:00
depristo fe4aa58d35 Removing unused class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
depristo 0ad1ea4aa1 Fixed Umapped misspelling
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5196 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:21:41 +00:00
asivache 03f265d8bd Change DP format field description in the header line (expected count=1)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5195 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:28:25 +00:00
fromer 4cdc974c5f Preliminary Qscript to run DoC for the purpose of CNV detection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5194 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:25:59 +00:00
asivache c0e998621c Computes two format (genotype) level annotations: total read depth in the given sample (DP format field) and fraction of reads supporting alt allele(s) in the given sample (FA format field)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5193 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:23:55 +00:00
asivache 8700b74640 Now annotates indels as well. Probably can also annotate mixed vcf with indels +snps, but not tested in that mode...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5192 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 20:28:03 +00:00
corin cd6ace1b47 Includes UG version of indel genotyping rather than IGV2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5191 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 20:25:46 +00:00
chartl bfc6ef1753 A successful attempt at a queue integration test, ensuring that the InProcessFunction libraries are working as expected.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5190 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 21:30:35 +00:00
carneiro 358a400474 made ApplyVariantCut a default part of the pipeline, added the -noCut option if you don't want to use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5189 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 19:29:36 +00:00