Commit Graph

4061 Commits (4a84a81d17807eef1194e832db665dd372ddfd63)

Author SHA1 Message Date
depristo bafa61c1fe LINEAR_EXACT now the default model. Passes all integration tests. 2-3x faster in low-pass data. Tests on exome data ongoing, but potentially vastly faster there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5357 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 17:14:36 +00:00
rpoplin 8e1aa6059a New mode for CombineVariants to assume the incoming VCFs have the same samples and disjoint calls. Drastically reduces the runtime for routine combining operations. Very useful with Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5356 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:52:17 +00:00
hanna 5e4b321f86 Add hidden command-line argument for low-memory sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5355 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 15:13:16 +00:00
ebanks ae42c0c7da Bug fix based on GATK run report
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5354 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 14:18:12 +00:00
ebanks 660998065b 'Okay, now I'm absolutely certain that there are no more bugs in the constrained writer.'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5353 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 03:48:40 +00:00
hanna 880c607d79 Disable validation of linear index against original linear index process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5352 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:51:26 +00:00
hanna dc62685a2f For Ryan: force creation of BAM index when no reads are present in the BAM
file.  Temporary fix until Picard changes the behavior of indexing.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5351 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 01:50:42 +00:00
asivache 570186fa42 Added (deep) clone() and merge() to the RunningAverage utility class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5350 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:35:23 +00:00
hanna 43567b7fe3 Load the linear index without forcing the index for the entire contig to be
loaded into memory.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5349 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-02 00:08:39 +00:00
ebanks a20ce1436d A temporary @hidden hack to get indel calling done for Phase I: don't try to call if there's too much coverage. Do not use this unless your last name rhymes with Shmoplin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5348 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 19:22:27 +00:00
hanna 3c7ae0d1a6 Special case handling of unmapped region in low memory sharder.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5346 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:38:30 +00:00
hanna dd30ad751a Fix bug in low memory sharder's interval accumulator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5345 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 17:11:22 +00:00
hanna d6145de970 More comprehensive tracking of position when bin trees are sparse.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5344 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 15:53:43 +00:00
ebanks bb969cd3a2 EMIT_ALL_SITES now does exactly that - even when there's no coverage or too many deletions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5343 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 05:05:00 +00:00
chartl 0723b0f44c Generalized association is now working. Output is in a horrific format. Implementation of T-testing. Improvements are to look for classes dynamically (a la VariantEval/VariantAnnotator), beautify output, and do optimizations where they exist.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5341 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-01 01:23:37 +00:00
rpoplin ce34a8a918 New hidden option in VQSR to not parse the genotypes of the incoming training data. Updated VQSR training in methods development pipeline to be more in line with best practices.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5340 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 23:19:51 +00:00
hanna e7089f9870 Fix for particularly small, isolated intervals: make sure the bounds of the
bin tree are dictated by the lowest bin level, whether it exists or not.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5339 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 22:35:53 +00:00
hanna c869d1c9cf Fix misc issues in new protosharder regarding proper iterator termination when
an unexpectedly small amount of data is present.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5338 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 21:14:18 +00:00
hanna e75366f738 Fixed performance issue in protosharding code -- turns out that the index
optimizer was mutating the data stored in the indices.  Protosharding still
disabled by default.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5334 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 17:32:12 +00:00
ebanks 8de83725f9 Simple walker to randomly break VCF files into (potentially unequal) subsets. Useful for e.g. cutting hapmap into training and evaluation sets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5333 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 16:51:46 +00:00
delangel d059d89a9d Fixes and cleanups for indel eval module. Also outputs AT/CG ratio in dedicated column in IndelStatistics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5332 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 12:07:50 +00:00
ebanks 05fac8583d Following up Mark's recent commit: hooking up the --maxPositionalMoveAllowed argument into the indel realigner and through to the SAM writer. We now ensure that no read is realigned more than N bases (200 by default, which is nowhere close to realistically possible). If anyone ever sees a warning message about this with the default value then please let me know because I need to see it for myself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5331 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:40:54 +00:00
depristo 874406352c Accidentally commited the N2 comparing test as well...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5330 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 04:15:30 +00:00
depristo 1dedfdb11b Fixes for constrained movement Indel Realigner. Now sorts all of the reads in the interval before handing them to ConstrainedMateFixingSAMFileWriter to maintain correct contract between the two pieces of software
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5329 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:52:18 +00:00
depristo d216830b92 Experimental linear version of the exact model. In testing, but gives identical results to N2 gold standard version, and passes integration tests. Performance optimizations still ongoing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5328 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-28 03:48:11 +00:00
ebanks 54facb2c51 Small change for Mauricio so that the correct metrics get output when running in GENOTYPE_GIVEN_ALLELES mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5327 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-27 06:08:32 +00:00
depristo 7ff8d23c64 Don't do genotype concordance on comp tracks without genotypes, even if they have an AC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5321 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 21:11:50 +00:00
hanna 600f73cbd6 A checkpoint commit of two BAM reading projects going on simultaneously. These two projects
are works in progress, and this checkin will provide a baseline against which to gauge 
improvements to both projects.

Low-memory BAM protoshards (disabled by default):
- Currently passing ValidatingPileupIntegrationTest.
- Gets progressively slower throughout the traversal, but should run at least as fast as original implementation.
- Uses 10+ file handles per BAM, but should use 3.

BAM performance microbenchmark test system:
- Currently tests performance of BAM reading using SAM-JDK vs. GATK
- Tests do not hit all GATK performance hotspots.
- New tests that require input data in a slightly different form are hard to implement.
- Output of test results is not easily parseable (investigating Google Caliper for possible improvements).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5317 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 17:50:32 +00:00
ebanks 5d28cbda27 When crossing contigs it's crucial that the queue get flushed or else it will continue to accumulate reads without emitting. This is the last time I trust someone when they tell me that they are 'confident there are no bugs' in a tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5315 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-25 05:18:30 +00:00
rpoplin 1129f1535d Fix for the HaplotypeScore optimization in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5310 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 20:40:18 +00:00
chartl 0f1c1fa26f First general association module. Let the bug fixing begin!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5307 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 19:55:33 +00:00
chartl 292b421113 Framework for generalized association testing. Heavy lifting done in implementation of the AssociationContext(s) and AssociationContextAtom(s). Nothing really implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5306 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 18:12:39 +00:00
asivache 2f2aa339d9 Now makes all pairs, not only the good ones. The logic of selecting the "best" pair when the data are messy (e.g. multiple alignments available for an end) is still very naive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5303 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:21:26 +00:00
asivache abf3fcbb72 Little changes in recognized annotation terms; columns in annotated maf are now prioritized and multiple alternatives do not cause 'i don't know what to do' crash: e.g. if Chromosome and chr columns are both present, then Chromosome is taken (has a priority).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5302 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:19:06 +00:00
rpoplin 255cc246a2 Change in Methods development pipeline: dbsnp130 can't be used for anything, changed it to dbsnp129. Optimization for HaplotypeScore and the to-be-committed ReadRosRankSumTest in AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5301 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 16:09:03 +00:00
chartl 97e1a5262e -ct x no longer includes coverage in the previous bin
BatchMerge - additional support for indels (can't just test the alternate allele when it's an extended event, must also specify that you want to use the dindel model when you actually test the allele)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5300 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 15:52:04 +00:00
ebanks ee6f112556 Phase 3: constrained movement is now the only option available in the realigner (so I guess technically it's not really an option). Several command-line options are deprecated. Code cleaned up. Wiki updated. Release coming. One phase left...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5299 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 14:59:48 +00:00
ebanks 93888e570b Phase 2: after hours of testing, confirming that constrained mode looks good so moving the integration tests over to use it. Some cleanup. More cleanup coming in Phase 3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5298 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-24 06:23:41 +00:00
carneiro 75bd0129e7 quick bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5296 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 19:16:20 +00:00
ebanks 9357bee921 Don't skip tri-allelic alleles passed in - just choose the first one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5293 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:25:50 +00:00
carneiro a2301383bb quick walker to find out where the reads mapped to huref were mapped in the consensus reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5292 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-23 17:00:17 +00:00
ebanks 318035c147 Fixing up the output system of the Unified Genotyper. Deprecating the -all_bases and -genotype arguments. Adding instead the --output_mode (EMIT_VARIANTS_ONLY, EMIT_ALL_CONFIDENT_SITES, EMIT_ALL_SITES) and --genotyping_mode (DISCOVERY, GENOTYPE_GIVEN_ALLELES) arguments. UG now does the correct thing when passed alleles (bound to the 'alleles' rod) to use for genotyping; added several integration tests to cover this case. This commit will break the batched calls merging script, but Chris knows this and is ready for it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5288 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 06:07:18 +00:00
ebanks d7f98ccd9c Adding --doNotWriteOriginalQuals argument to BQ recalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5286 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-22 04:00:00 +00:00
depristo 1a5d296737 ReplaceReadGroups. Fixes BAM files without read group info. MissingReadGroup points people to this tool now. Please point users on the forum to this tool now. Will migrate to Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5284 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-21 14:02:41 +00:00
depristo aa4a4e515d Safer interface for ReorderSam. Better error checking. Documentation. Moving into Picard now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5283 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-20 14:35:44 +00:00
depristo cd7a7091ba Lexicographic error points users to the ReorderSam wiki entry
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5281 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:45:37 +00:00
depristo 444bf83acf A simple utility for reordering a BAM file based on a new reference sequence. This tool can be used to efficiently correct a lexicographically sorted BAM file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5279 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-19 23:24:32 +00:00
kshakir 290afae047 GSA-423 Better reporting for errors in QScript.script().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5276 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 22:21:15 +00:00
kiran cb95e68fc0 CpG is no longer a standard stratification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5273 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 07:17:35 +00:00
kiran 9ddee96f93 When subsetting by sample, need to take extra care that hom-ref sites don't accidentally get treated as variant sites in CompOverlap. Renamed convenience method for creating command-lines in integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5272 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 06:26:38 +00:00
delangel 1bc5c7e99b boneheaded mistake, mixed up my min and max
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5271 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-18 04:02:14 +00:00
kiran 92c82200c9 Fixed an issue where an eval module with TableType objects would get an extra, empty table in the output, screwing up the parse in R.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5267 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 23:03:46 +00:00
asivache 7ffcade3c3 Added MNP to recognized and counted event types
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5266 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:37:38 +00:00
depristo 57c66b5602 Supports GQ now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5265 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 22:30:25 +00:00
kshakir a189454343 FCP only adds the expand intervals QFunction once per script instead of once per QFunction using the ExpandTargets scala trait.
Eval dbSNP's type now based on eval dbSNP instead of genotype dbSNP.
Using an external treemap instead of the JGraphT internal node set to speed up larger graph generation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5261 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 19:09:03 +00:00
delangel f1d708f4d4 Fixes for HRun annotation in case of indels:
a) In case of a deletion value was completely broken, we'd report 0 or -1.
b) For indels, we report maximum of forward and backward values - I've seen empirically many sites which are not strand biased but which seem to be artifacts and the homopolymer run is always to the right only (because we left align by convention).




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5260 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 18:57:21 +00:00
asivache 0e04e95245 Bug fix: when extracting reference sequence for the event from the reference genome, the tool was treating Deletions and MNPs of length N in exactly the same way: ref_bases[current_pos+1,...,current_pos+N]. This is correct for Deletions but not for MNPs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5258 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 16:15:42 +00:00
asivache 52eedaf22d Subtle but very annoying bug due to incorrect exit condition on backward traversal. Example of incorrect old behavior (found by Martha Borkan, this normally would NOT happen with the combination of match/mismatch/open/extend parameters we have been using; use match=10.0, mismatch= -9.0, open= -15.0, extend= -6.66 in older builds in order to reproduce):
let's align two sequences (shown below, good alignment)

AAATTTGGTAAAA-GT
AAATTTGGTAAAAGGT

now let's reverse the same very sequences and align again

 TGAAAATGGTTTAAA
TGGAAAATGGTTTAAA

Note how we lost the deletion and got a mismatch instead at the very first letter of the upper sequence. The overall score of any particular alignment does not depend on the direction of the traversal, so the best alignment (with the highest score) should stay the same too.

New version fixes this issue and produces correct alignment of reverse sequences (up to the different choice of redundant position for the deletion):

T-GAAAATGGTTTAAA
TGGAAAATGGTTTAAA

This version also has the main() method reinstated, so the aligner can be run on its own as a little app.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5255 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-17 00:02:32 +00:00
fromer 6e291820d3 GeneNamesIntervalWalker outputs all genes in each interval; walkers now require a ROD named "intervals"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5254 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 19:58:09 +00:00
fromer b304ced801 Updated haplotype calculator to correctly terminate haploptypes RIGHT BEFORE an unphased het
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5252 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-16 17:10:01 +00:00
depristo 5a51c9a815 AWS_S3 logging is now enabled by default. It first tries to log internally at the Broad, and if it can't goes to AWS_S3. DEV option is removed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5249 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-15 20:20:14 +00:00
fromer d6e3f2eba6 Added GC content calculator for CNV data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5240 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 22:29:55 +00:00
asivache 7a11b4f35d Another change in variant classification values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5237 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:47:58 +00:00
asivache 7f7d7eb2d1 Inconsequential changes, more 'variant classification' values are recognized
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5236 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-14 17:36:39 +00:00
kiran d3660aa00e Very basic functionality for annotating indels (specifies whether the indel is frameshift, inframe, or non-coding). Does not attempt to recalculate the variant codon, variant amino acid, or whether the site falls within a splice region. Added a convenience method to WalkerTest for building command-line arguments with the proper spacing (so that I stop getting annoyed when I've gotten it wrong and the test system yells at me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5235 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-13 17:58:20 +00:00
hanna 8d6db5d188 Additional logging of the temp file creation, management, and merging process
for VCF files.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5234 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 22:07:25 +00:00
asivache 03482bf7c4 Number of MQ0 reads in each sample (format field)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5229 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:16:26 +00:00
asivache 8560bb290b Allelic fractions are now computed on MQ>0 reads only; total depth in each sample still includes MQ0 as per usual convention. Also renamed for clarity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5228 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-11 17:13:15 +00:00
hanna b992abb6eb A few more unit tests plus some extra
functionality for BAM index visualization.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5222 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-09 01:51:34 +00:00
kshakir 4d1cca95bb Removed deprecated getDbsnpFile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5221 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:12:15 +00:00
kshakir a8ab5a5fb9 After code review with APSG, trying a patch for SIGSEGV errors which checks the LSF result codes from lsb_openjobinfo instead of checking for a null return value from lsb_readjobinfo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5220 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 21:08:22 +00:00
delangel f3de9ee3e0 Refactoring of indel evaluation code to make it easier for external functions to get access to indel classification, in preparation for IndelMetricsByAC to stratify indel classes by AC (not done yet).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5219 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:35:16 +00:00
delangel 3635606cd8 Temp checkin just for experimentation: exposed probabilistic alignment parameters to command line interface to make it easier to experiment on their effects, although a full scrap/rewrite of this should be coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5218 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 17:33:29 +00:00
ebanks 196eb77699 CG var format is screwed up and doesn't quite fit into the VariantsToVCF mold (we need to see multiple records before we can assign genotypes to a given position), so it's safer to keep this separate from the other well-behaved formats. Hopefully, it's temporary anyways.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5216 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:18:38 +00:00
ebanks 4fe0fcd707 Updates to handle CG data, headers, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5215 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-08 03:16:05 +00:00
kshakir 8040998c15 Renamed the pipeline yaml dbsnpFile to genotypeDbsnp, and added an evalDbsnp.
Added a genotypeDbsnpType and evalDbsnpType to check the extensions for .vcf or .rod.
Moved renaming of "recalibrated" bams to "cleaned" from sed to yaml generation template (see diff for more info).
Renamed fCP.q to FCP.q.
Though it's still disabled until VariantEval is updated, added changes above to the FCPTest.
Removed refseq table from the queue.sh wrapper script. Only specified in the yaml.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5213 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 22:01:09 +00:00
fromer bceb2a9460 Now that Mauricio has updated the PacBio BAM to properly have RG, can use sample name in the walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5212 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 20:26:57 +00:00
kiran ecbc38aff0 If no comp rod is specified, specify the dummy name none so that we still get counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5211 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 19:24:52 +00:00
carneiro 1fbfd4082e Cycle covariate now works with pacbio reads. No need to override the platform anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5210 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:14:55 +00:00
asivache 2a04e0d378 Explicitly set logger's level to info - otherwise samtools is too chatty
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5209 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 17:08:50 +00:00
ebanks 698096dc5a Moving VariantsToVCF to the proper directory; removing the oneoffs CG indel converter in preparation for a ligitimate CG variant Feature class in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5207 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-07 05:21:01 +00:00
kiran 1f820d5026 Added two files from some refactoring changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5205 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:20:12 +00:00
kiran 1085bbf303 Fixed issue where all comp tracks were being treated as known tracks. Fixed issue where multiple JEXL expressions were causing an exception because the underlying object did not implement the Comparable interface. Fixed issue where variants being compared to the known track were not being checked for equality of variation type. Fixed issue where functional annotations were not being iterated over properly. Refactored a lot of helper methods into a separate VariantEvalUtils utility class. Significantly expanded the test suite using a small VCF with SNPs, indels, and non-variant loci which makes it much easier to see what the proper answer should be, and included the appropriate grep and awk commands in the comments to confirm the values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5204 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-06 19:19:20 +00:00
kshakir cc5d695bcf Renamed the IPFL Test to IPFL PipelineTest so that it'll be picked up by the PipelineTests.
HACK: Turned off JNA autoRead() in the jobInfoEnt LSF structure to try and dodge the SIGSEGV during strlen calls during bmods. 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5201 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-05 00:06:12 +00:00
depristo 29f3ad72f3 SAMFileWriter that allows the user to move reads, but only a bit, in an incoming coordinated sorted BAM files. Does some local reordering and local mate fixing, under specified constrained. These constrains allow us to make a special -- under testing for Eric, who promised to try this out a bit, expand test cases and integration tests -- but soon to be the default and only model of the realigner that only moves reads with ISIZE < 3000 that directly emits a coordinate sorted, mate fixed validating BAM file without needing FixMates externally. Preliminary testing shows this runs in a totally fine amount of memory and produces equivalent results to the previous version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5199 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:27:05 +00:00
depristo 11ea321b39 Trivial header cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5198 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:23:15 +00:00
depristo fe4aa58d35 Removing unused class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5197 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:22:28 +00:00
depristo 0ad1ea4aa1 Fixed Umapped misspelling
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5196 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 22:21:41 +00:00
asivache 03f265d8bd Change DP format field description in the header line (expected count=1)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5195 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:28:25 +00:00
asivache c0e998621c Computes two format (genotype) level annotations: total read depth in the given sample (DP format field) and fraction of reads supporting alt allele(s) in the given sample (FA format field)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5193 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 21:23:55 +00:00
asivache 8700b74640 Now annotates indels as well. Probably can also annotate mixed vcf with indels +snps, but not tested in that mode...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5192 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-04 20:28:03 +00:00
hanna 5c3198520c A few minor modifications masquerading as significant changes according to
svn's logs:
- Copied BAM indexing engine from Picard back into the GATK anticipating
  shard merging algorithm.  Tried to leave most of the building blocks in
  Picard.  If this turns into a logistical nightmare, I'll merge the building
  blocks into the GATK as well.
- Reorganized the org.broadinstitute.sting.gatk.datasources package, giving
  better separation of query and management functionality for reads, ref, rmd,
  and samples.  
- Merged Shard building blocks into org.broadinstitute.sting.gatk.datasources.
  reads package, indicating it's current strong relationship with the reads,
  rather than the general unifying element I wish this would be.
- Collapsed BAMFormatAwareShard into Shard.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5184 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 17:59:19 +00:00
kiran 9ddc95c833 NewEvaluationContext needs to be generated in the inner loop. Otherwise, multiple comp tracks end up getting routed to the same row of the output table. Added test to cover multiple comp tracks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5181 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 07:04:53 +00:00
ebanks 918cc09477 Allow multiple records at a position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5178 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-03 04:19:05 +00:00
kiran cb6454bf98 Multiple eval tracks should be bound with different names, rather than just 'eval'. Added tests to cover usage with multiple tracks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5177 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 22:33:50 +00:00
rpoplin 5a8e2c2739 Going through the backlog of emergency hacks I put in for the 1000G release. It is possible to call a site in an analysis panel but when using all samples the site isn't called because of going over the minimum deletion threshold, for example.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5174 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 15:12:26 +00:00
fromer 3839fd1a25 Updated phasing pipeline to properly read samples from VCF and BAM files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5172 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 07:16:05 +00:00
ebanks 43fb11b923 Removing stray non-ASCII character
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5171 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 03:10:08 +00:00
kiran 2732c839d4 Restored parallelism and associated tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5170 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 02:04:03 +00:00
kiran fd8dd8fb9b Fixed an issue where a no-call in the eval track would prevent a site from a comparison track from being loaded. Added a new test to cover the use case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5169 348d0f76-0448-11de-a6fe-93d51630548a
2011-02-02 01:47:53 +00:00