2344 Commits (57a168c0dbd0124c858484e49a9ff0a4aea4a231)

Author SHA1 Message Date
depristo 3b1ab86d11 Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo 995d55da81 now uses the new RMDT getVariantContext() functions instead of doing the work itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo 33760834d6 commented out inactive (due to string ==) but actually incorrect code. Sometimes two wrongs do make a right
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2801 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:22:26 +00:00
hanna c7e006a996 Bug fixes for interval batching in sharding system. Sharding system now batches intervals and passes
basic tests for small and large intervals and intervals that cross bin boundaries.  Currently works
only with a single BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2800 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 21:47:54 +00:00
asivache a1d5a384f4 Reverting the last reversal. bestConsensus points to something also kept in a set, so just reassigning it will NOT automatically destroy the underlying data; explicit clearing of unneeded data reinstated. STUPIDO!!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2796 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:08:53 +00:00
asivache cf7e6d0c0b Memory-saving change, same as in old IntervalCleaner (if alt consensus does not beat the best one, destroy its data immediately)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2795 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:05:04 +00:00
asivache df0be25afb ooops, no need to destroy old best's data explicitly, it will be done automatically of course
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2794 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:03:16 +00:00
asivache 9f44018b7d Reducing memory footprint: if alt consensus does not beat the best alt observed so far, destroy its data immediately, instead of keeping them around. If new alt is better than the old best, then destroy the old best right away instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2793 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 17:58:54 +00:00
rpoplin be33d1852c Reverting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2792 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:57:09 +00:00
depristo af8c47fc2f Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
rpoplin 0d8d6e0a14 Ti/Tv module in VariantEval shows known and novel ratios if possible
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2790 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:37:40 +00:00
depristo 1494dc875f fixing up tests. Moves are complete
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2789 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:24:00 +00:00
depristo c6d86da4b8 almost managed to move things around perfectly in one go
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo e0af3bf761 updating back names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00
depristo 777617b6c7 managed to actually move the files too! Damn you svn
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:47:19 +00:00
depristo 8938a4146d moving varianteval2 to its own dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:37:04 +00:00
depristo 69132c81aa Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:33:27 +00:00
hanna e53432d54d Checkpoint for combining adjacent intervals into the same shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2782 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 02:48:02 +00:00
asivache 0d347d662a More plumbing: if, after the shift, the window contains indel(s) at the first position, do not throw an exception, just print the warning (we cannot deal with this situation!!) and discard those indels without trying to call them. This situation will most probably arise after forced shift over a messy region anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2781 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 21:06:28 +00:00
depristo 1d86dd7fd1 Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject is now an InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00
asivache e7b710791f OK, we finally ran into a messy dataset where we cannot find a place to shift the window to: there's an indel at every position. Don't panic, don't throw an exception, just ignore the whole window completely, we do not want to call there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2779 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:49:56 +00:00
asivache 152f65b362 Do not die in --cycleOnly mode when the lane is not paired end, just count all single end basequals into the first column and leave the second column filled with 0s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2778 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:48:12 +00:00
asivache a3cd56897d moving older versions of the oneoff project to archive, bye-bye
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2777 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:46:27 +00:00
asivache f7e7bcd2ef Oneoff project, totally unrelated to anything
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2776 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:44:50 +00:00
hanna 334da80e8b Fixed Mark's bad checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2775 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 12:40:58 +00:00
depristo 1ce0f06216 temp checkin for reorganization
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2774 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 11:10:24 +00:00
ebanks 83b9d63d59 1. Added functionality to the data sources to allow engine to get mapping from input files to (merged) read group ids from those files.
2. Used said mapping to implement N-way-in,N-way-out functionality in the new indel cleaner.  Still needs more testing (to be done after vacation but preliminary tests look good).
3. Fixes to VCF validator: ignore case when testing VCF reference base against true reference base and allow quals of -1 (as per spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2773 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 04:12:49 +00:00
rpoplin 210c4c9913 AnalyzeAnnotations now makes plots for the value in the QUAL column as if it were an annotation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2771 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 20:33:15 +00:00
hanna 3f35e181d5 Add an alternate implementation of the BAM file reader that keeps the entire index in memory. Initial revision of BAMFileStat, a tool to inspect BAM file BGZF blocks and index entries.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2769 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 19:48:15 +00:00
depristo c89ba7b1a4 improvements to variant eval 2. Now has titv calculations and mendelian violation detection support. we only make ~80 mendelian violations in 380K calls for the YRI trio, in case you are interested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2768 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 16:03:19 +00:00
aaron af7cd9cf58 some very old tests relied on cancer data that got moved. Reset one to use data in the validation directory, the other to the artificial sam utils (the best approach).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2767 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 23:13:10 +00:00
depristo fa2cd432fd better printing in VE2. Added support for TiTv analysis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2766 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 21:20:29 +00:00
depristo cbbc0e98d2 fix for broken imports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2765 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 15:20:27 +00:00
depristo 681c196097 V2 of VariantEval2. Framework is essentially complete, and very simple and clear now compared to VE1. Support for any number of JEXL expressions. dbSNP% evaluation added to show paired comparison evaluation. Pretty printing output tables. Performance is poor but can easily be fixed (see todo notes).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2764 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 14:18:46 +00:00
hanna 9dbdfff786 Moved VariantEval to core. Updated integration test md5s to reflect new Analysis class names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2762 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 00:22:15 +00:00
asivache 4ddbaeed07 In an attempt to reuse: --pairCountsOutput is now optional, if not specified then only per-locus statistics are collected; --silent - do not echo results into stdout; --minMapQ - count only bases coming from reads mapped with specified quality or better; --blacklistedlanes - do not count reads/bases coming from specific lanes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2761 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 22:05:19 +00:00
chartl 2c4f709f6f Bunch of oneoff stuff that I don't want to lose. Also:
VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not)
VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly)
VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:35:10 +00:00
asivache 421282cfa3 Convenience method: getMappingFilteredPileup(int minMapQ)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2759 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:19:53 +00:00
ebanks 506d39f751 The UG calculations are now driven by an independent engine.
This completely separates the genotyper walker from other walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2758 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:57:31 +00:00
hanna d8e75cf631 Fix for Kiran's memory issue running UG...turned out to be a particularly bad interaction between @By(Reference) traversals and TreeReduce.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2757 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:27:06 +00:00
depristo d9671dffba Documentation for VariantContext. Please read it and start using it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 17:49:51 +00:00
asivache 990af3f76e Will now work with simplest tabular format - genotype string ("+ACTT") does not have to be followed by ':'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2755 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 15:40:01 +00:00
ebanks e0808e6c37 Moved old EM model to archive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2754 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 02:55:32 +00:00
rpoplin 64fc76e4bf Added an option to AnalyzeCovariates to set the max value of the histograms to make them easier to directly compare.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2753 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 23:13:57 +00:00
ebanks f6da57dc79 1. For Matt: JIRA GSA-270. Other walkers needing to call into the Unified Genotyper now use static methods (e.g. runGenotyper()) instead of calling initialize and map.
2. Set the default confidence cutoff to 50 (instead of 0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2752 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 21:14:57 +00:00
ebanks ce9d3dcefb Removing deprecated version of indel genotyper (putting it in archive in case we need to reproduce original 1KG indel calls for some reason).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2749 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 14:05:36 +00:00
depristo 3d45457595 VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 20:51:24 +00:00
chartl 236764b249 Major (and useful) changes to MultiSampleConcordance:
1) Now cares about Genotype filtering. If it is flagged as filtered, it can count as a FP/FN/TP; but goes into a "non-confident genotype" bin, rather than het/hom.

2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q"

3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident
   ref calls for all individuals in the variant VCF; and the calculators updated accordingly.

*** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get around that;
    however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible.

VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3

SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 01:18:31 +00:00
jmaguire ea7e737441 Two new annotations:
1. LowMQ: fraction of reads at MQ=0 or MQ<=10.
2. Alignability: annotate SNPs with Heng's (or anyone else's) alignability mask.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2746 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 23:23:00 +00:00
chartl 97f60dbc4b Moving stuff around. ( core;playground ) ----> ( oneoffs ). I've been a bad boy, sullying the core codebase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2745 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 22:50:03 +00:00