aaron
5546aa4416
adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
hanna
88d0677379
Misc correctness enhancements: develop the bin selector into a recursive algorithm and return a shard when reads are missing. Also improve the performance of the read filter that clips reads not actually present in the shard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2870 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:19:06 +00:00
ebanks
8b555ff17c
Killed the old cleaner code. Bye bye.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00
kshakir
3738b76320
Added a playground concordance analyzer for summarizing VariantEval across a group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
ebanks
a640bd2d79
ignore uninteresting extended events
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2866 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:55:46 +00:00
rpoplin
32e5dceef9
Moving comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2865 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:27:31 +00:00
alecw
b236714c8a
Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 17:35:25 +00:00
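Note on the commit above: the batch `getValues` idea can be sketched as follows. This is a minimal illustration, not the project's code. The `Covariate` interface here is a simplified stand-in, a plain `String` of bases replaces `SAMRecord`, and `CycleCovariate` with its "cycle = offset + 1" rule is a hypothetical example covariate.

```java
public class BatchCovariateSketch {
    /** Simplified stand-in for a covariate; a String of bases replaces SAMRecord. */
    interface Covariate {
        // Per-offset lookup: one call for every position in the read.
        Comparable<?> getValue(String readBases, int offset);

        // Batch lookup: fills out[0 .. readBases.length()-1] in a single call.
        // The array must have at least readBases.length() slots.
        void getValues(String readBases, Comparable<?>[] out);
    }

    /** Hypothetical covariate whose value is the 1-based machine cycle. */
    static final class CycleCovariate implements Covariate {
        @Override
        public Comparable<?> getValue(String readBases, int offset) {
            return offset + 1;
        }

        @Override
        public void getValues(String readBases, Comparable<?>[] out) {
            for (int i = 0; i < readBases.length(); i++) {
                out[i] = i + 1;
            }
        }
    }

    public static void main(String[] args) {
        String read = "ACGT";
        // The array may be larger than the read; extra slots are left untouched.
        Comparable<?>[] values = new Comparable<?>[6];
        new CycleCovariate().getValues(read, values);
        for (int i = 0; i < read.length(); i++) {
            System.out.println("offset " + i + " -> " + values[i]);
        }
    }
}
```

A caller can allocate the array once and reuse it across reads, so the per-position cost is a loop iteration rather than a method call per covariate and offset.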
ebanks
32d14d988e
Overload parseIntervalRegion() to allow for the interval merging rule to be passed in (so one is not required to use the value from the GATK arg collection).
...
Now the IndelRealigner can use this functionality without being forced to merge abutting intervals (which was actually causing a problem with the cleaning).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2862 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 04:13:54 +00:00
hanna
cc09f48cd8
Correctness fix: index can concat chunks around shard edges, and my code didn't account for that.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2861 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 21:44:33 +00:00
chartl
0e05a3acb0
Adding depth of coverage features to firehose summary tools
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
hanna
71f18e941f
Significant performance improvements made by subtracting out the contents of the prior highest-level bin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2859 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 16:46:16 +00:00
rpoplin
3e0e7aad2d
Removing debug statement. oops.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2858 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:26:22 +00:00
rpoplin
7f19ff1fa1
Added a new option in the recalibrator to be used by people who have SOLiD data in which only a few of the reads have no-calls in the color space. These reads will be skipped over and left in the bam file untouched.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2857 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:25:23 +00:00
aaron
b1a4e6d840
removing non-ascii characters from my Copyright and from VariantEval2Walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
aaron
33ae256186
a start to some of the infrastructure for Tribble, including dynamic detection of new RMD; not nearly wired in or complete yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2855 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:43:52 +00:00
ebanks
bbbad79f8c
Forgot to remove debugging code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2854 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:12:58 +00:00
ebanks
7669eaaeb3
Optimizations to the cleaner algorithm; reduce total runtime by almost 20%.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2852 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:10:56 +00:00
ebanks
79ab7affda
- Change sortOnDisk option to sortInMemory
...
- Fix horrible cleaner bug
- Trivial optimizations to cleaner code - more significant ones coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2850 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-17 20:52:57 +00:00
ebanks
2520889cb3
Check for bad intervals and don't emit them
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2849 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 21:42:36 +00:00
aaron
653f70efa2
added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 20:35:35 +00:00
chartl
01af3d0663
Update an error message :)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2842 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 23:24:06 +00:00
jmaguire
81313d9452
added class VCFMerge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2840 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:41:50 +00:00
jmaguire
0ef50bcae7
- update to match recent changes in the VCF parser
...
- compute Het Error Rate in VCFConcordance
- changes to the frequency-specific optimizer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2839 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:27:01 +00:00
depristo
8072e9aed5
should never commit without running integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2838 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 23:42:37 +00:00
depristo
a1a3d5fcb0
Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00
kcibul
28f24ca2ae
made some private member/methods protected to allow for subclassing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2836 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 21:16:00 +00:00
hanna
232d884578
Got back most of the performance lost when I fixed the dropped reads problem.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2835 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:59:56 +00:00
chartl
04a2784bf7
Initial commit of tools under development for data QC through firehose.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:13:24 +00:00
hanna
77af5822d4
Correcting my incomplete understanding of how the BAM file index actually works.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2833 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 16:15:19 +00:00
depristo
5f74fffa02
Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
depristo
197dd540b5
added root GATKData variable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2831 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:25:34 +00:00
ebanks
c6f6948f9d
Haiku:
...
Eric is a fool.
Matt found his really dumb bug.
Eric is humbled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2830 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 04:51:56 +00:00
rpoplin
ecebf0bc62
Bug fix for null pointer exception in AnalyzeAnnotations if -name argument isn't specified
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2828 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 18:39:26 +00:00
mmelgar
ad608d0e9d
Cleaned up documentation on SecondaryBaseTransitionTableWalker and added Read Group and Allele Balance to the info.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2827 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:20:35 +00:00
hanna
34e566c90d
Fixed bug where new sharding system wasn't grabbing the reads that start at the end of a bin. Caused by what I currently believe to be a bug in Picard -- will verify with Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2826 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:00:04 +00:00
ebanks
96fee7cf7a
Disabling input of known indels for use as alternate consensuses. When we get rods in a read traversal, it will be trivial to hook it into the cleaner (the code is already there).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2825 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 15:52:21 +00:00
ebanks
a4a2c9b172
Deal with bad input; also N-way out isn't default.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2823 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 03:44:56 +00:00
hanna
dc885ba386
Fix for some correctness bugs found during early performance testing, phase 1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2822 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 22:32:25 +00:00
depristo
c66861746a
improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic integration test for VE2 started
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
rpoplin
3de72daa88
Removing an accidentally added import statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2818 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:54:24 +00:00
rpoplin
0b1e243a7b
CountCovariates now sorts the list of standard covariate classes coming from PackageUtils.getClassesImplementingInterface(). As a result some of the integration tests now make use of -standard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2817 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:52:20 +00:00
ebanks
6652b992f7
The new cleaner can now use known indels to create alternate consensuses for cleaning.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2816 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 04:39:15 +00:00
hanna
0250338ce7
Basic use cases for merging BAM files with the new sharding system work.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2815 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 22:14:37 +00:00
depristo
934d4b93a2
VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 19:02:25 +00:00
andrewk
369cc50802
Added playground walker that does a basic concordance check between two VCF files - an eval and a truth file - across all samples in the eval file. Produces per-sample, per-locus debug info and simple concordance stats. This is not meant to be extended, but rather used for validating the HapMap to VCF conversion in preparation for retiring GFF-based HapMap data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2813 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 02:41:18 +00:00
depristo
94f892ad42
VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:22:05 +00:00
depristo
457568485a
simple Beagle input ROD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2811 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:21:04 +00:00
hanna
57b8c9a53c
Supporting infrastructure for merging SAM files. Not yet integrated into the datasource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2810 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 23:59:38 +00:00
kshakir
fc810a1800
Updated VCF Reader to parse VCFs according to the VCFv3.3 spec. Column headers are tab separated since sample names might have spaces.
...
Updated test files in /humgen/gsa-scr1/GATK_Data/Validation_Data/*.vcf to remove spaces except for when they are supposed to be in the sample name.
Added @Test before VCFReaderTest.testHeaderNoRecords()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2809 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 22:55:59 +00:00
chartl
935e76daa1
Minor changes to oneoff walkers. PlinkRod altered but still commented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 18:49:56 +00:00
hanna
21369869b7
Extend regex that supports every 'word' character to use any printable character except ':'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2807 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 03:29:55 +00:00
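Note on the commit above: the log does not say which pattern or field the regex belongs to, so the following is only a sketch of the general change, with hypothetical class and method names. It contrasts a 'word'-only pattern with one that accepts any printable ASCII character except ':' using Java's character-class intersection.

```java
import java.util.regex.Pattern;

public class PrintableNamePatternSketch {
    // Before: only 'word' characters, i.e. [a-zA-Z_0-9].
    static final Pattern WORD_ONLY = Pattern.compile("\\w+");

    // After: any printable ASCII character (including space) except ':'.
    // "&&" intersects \p{Print} with "everything that is not a colon".
    static final Pattern PRINTABLE_NO_COLON = Pattern.compile("[\\p{Print}&&[^:]]+");

    static boolean matchesOld(String s) {
        return WORD_ONLY.matcher(s).matches();
    }

    static boolean matchesNew(String s) {
        return PRINTABLE_NO_COLON.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"dbsnp", "my-tag.1", "a:b"}) {
            System.out.println(name + " old=" + matchesOld(name) + " new=" + matchesNew(name));
        }
    }
}
```

Names such as `my-tag.1` are rejected by the word-only pattern but accepted by the extended one, while anything containing ':' is still rejected, presumably because ':' acts as a separator.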
ebanks
4fe851a83d
Optimization: don't keep scoring an alternate consensus if it's already worse than the best alt seen so far.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2806 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 05:06:32 +00:00
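Note on the commit above: the optimization can be illustrated with a small sketch. It assumes, as a simplification not stated in the log, that a consensus score is a sum of non-negative per-read mismatch scores and that lower is better. All names are hypothetical.

```java
public class EarlyExitScoringSketch {
    /**
     * Sums per-read scores for one candidate, but stops as soon as the running
     * total exceeds bestSoFar. The returned value may therefore be a partial
     * sum; it is only guaranteed to be exact when it is <= bestSoFar.
     */
    static long scoreWithCutoff(int[] perReadScores, long bestSoFar) {
        long total = 0;
        for (int score : perReadScores) {
            total += score;
            if (total > bestSoFar) {
                break; // already worse than the best candidate seen so far
            }
        }
        return total;
    }

    /** Returns the index of the lowest-scoring candidate, or -1 if there are none. */
    static int bestCandidate(int[][] candidates) {
        int best = -1;
        long bestScore = Long.MAX_VALUE;
        for (int i = 0; i < candidates.length; i++) {
            long score = scoreWithCutoff(candidates[i], bestScore);
            if (score < bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] candidates = { {5, 5, 5}, {1, 2, 3}, {10, 0, 0} };
        System.out.println("best candidate index: " + bestCandidate(candidates));
    }
}
```

The cutoff is safe only because the per-read scores are non-negative: once the partial sum passes the current best, adding more reads cannot bring it back down.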
ebanks
ca1917507f
Various improvements and fixes:
...
In indel cleaner:
1. allow the user to specify that he wants to use Picard's SAMFileWriter sorting on disk instead of having us sort in memory; this is useful if the input consists of long reads.
2. for N-way-out mode: output bams now use the original headers from the corresponding input bams - as opposed to the merged header. This entailed some reworking of the datasources code.
3. intermediate check-in of code that allows user to input known indels to be used as alternate consensuses. Not done yet.
In UG: fix bug in beagle output for Jared.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2805 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 04:21:04 +00:00
depristo
3b1ab86d11
Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo
995d55da81
now uses the new RMDT getVariantContext() functions instead of doing the work itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo
33760834d6
commented out inactive (due to string ==) but actually incorrect code. Sometimes two wrongs do make a right
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2801 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:22:26 +00:00
hanna
c7e006a996
Bug fixes for interval batching in sharding system. Sharding system now batches intervals and passes
...
basic tests for small and large intervals and intervals that cross bin boundaries. Currently works
only with a single BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2800 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 21:47:54 +00:00
asivache
a1d5a384f4
Reverting the last reversal. bestConsensus points to something also kept in a set, so just reassigning it will NOT automatically destroy the underlying data; explicit clearing of unneeded data reinstated. STUPIDO!!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2796 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:08:53 +00:00
asivache
cf7e6d0c0b
Memory-saving change, same as in old IntervalCleaner (if alt consensus does not beat the best one, destroy its data immediately)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2795 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:05:04 +00:00
asivache
df0be25afb
ooops, no need to destroy old best's data explicitly, it will be done automatically of course
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2794 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:03:16 +00:00
asivache
9f44018b7d
Reducing memory footprint: if alt consensus does not beat the best alt observed so far, destroy its data immediately, instead of keeping them around. If new alt is better than the old best, then destroy the old best right away instead.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2793 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 17:58:54 +00:00
rpoplin
be33d1852c
Reverting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2792 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:57:09 +00:00
depristo
af8c47fc2f
Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
rpoplin
0d8d6e0a14
Ti/Tv module in VariantEval shows known and novel ratios if possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2790 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:37:40 +00:00
depristo
1494dc875f
fixing up tests. Moves are complete
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2789 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:24:00 +00:00
depristo
c6d86da4b8
almost managed to move things around perfectly in one go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo
e0af3bf761
updating back names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00
depristo
777617b6c7
managed to actually move the files too! Damn you svn
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:47:19 +00:00
depristo
8938a4146d
moving varianteval2 to its own dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:37:04 +00:00
depristo
69132c81aa
Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:33:27 +00:00
hanna
e53432d54d
Checkpoint for combining adjacent intervals into the same shard.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2782 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 02:48:02 +00:00
asivache
0d347d662a
More plumbing: if, after the shift, the window contains indel(s) at the first position, do not throw an exception; just print a warning (we cannot deal with this situation!) and discard those indels without trying to call them. This situation will most probably arise after a forced shift over a messy region anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2781 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 21:06:28 +00:00
depristo
1d86dd7fd1
Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject now an InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00
asivache
e7b710791f
OK, we finally ran into a messy dataset where we cannot find a place to shift the window to: there's an indel at every position. Don't panic, don't throw an exception, just ignore the whole window completely; we do not want to call there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2779 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:49:56 +00:00
asivache
152f65b362
Do not die in --cycleOnly mode when the lane is not paired end, just count all single end basequals into the first column and leave the second column filled with 0s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2778 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:48:12 +00:00
asivache
a3cd56897d
moving older versions of the oneoff project to archive, bye-bye
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2777 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:46:27 +00:00
asivache
f7e7bcd2ef
Oneoff project, totally unrelated to anything
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2776 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:44:50 +00:00
hanna
334da80e8b
Fixed Mark's bad checkin.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2775 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 12:40:58 +00:00
depristo
1ce0f06216
temp checkin for reorganization
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2774 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 11:10:24 +00:00
ebanks
83b9d63d59
1. Added functionality to the data sources to allow engine to get mapping from input files to (merged) read group ids from those files.
...
2. Used said mapping to implement N-way-in,N-way-out functionality in the new indel cleaner. Still needs more testing (to be done after vacation but preliminary tests look good).
3. Fixes to VCF validator: ignore case when testing VCF reference base against true reference base and allow quals of -1 (as per spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2773 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 04:12:49 +00:00
rpoplin
210c4c9913
AnalyzeAnnotations now makes plots for the value in the QUAL column as if it were an annotation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2771 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 20:33:15 +00:00
hanna
3f35e181d5
Add an alternate implementation of the BAM file reader that keeps the entire index in memory. Initial revision of BAMFileStat, a tool to inspect BAM file BGZF blocks and index entries.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2769 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 19:48:15 +00:00
depristo
c89ba7b1a4
improvements to variant eval 2. Now has titv calculations and mendelian violation detection support. we only make ~80 mendelian violations in 380K calls for the YRI trio, in case you are interested
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2768 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 16:03:19 +00:00
aaron
af7cd9cf58
some very old tests relied on cancer data that got moved. Reset one to use data in the validation directory, the other to the artificial sam utils (the best approach).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2767 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 23:13:10 +00:00
depristo
fa2cd432fd
better printing in VE2. Added support for TiTv analysis
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2766 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 21:20:29 +00:00
depristo
cbbc0e98d2
fix for broken imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2765 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 15:20:27 +00:00
depristo
681c196097
V2 of VariantEval2. Framework is essentially complete, and very simple and clear now compared to VE1. Support for any number of JEXL expressions. dbSNP% evaluation added to show paired comparison evaluation. Pretty printing output tables. Performance is poor but can easily be fixed (see todo notes).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2764 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 14:18:46 +00:00
hanna
9dbdfff786
Moved VariantEval to core. Updated integration test md5s to reflect new Analysis class names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2762 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 00:22:15 +00:00
asivache
4ddbaeed07
In an attempt to reuse: --pairCountsOutput is now optional, if not specified then only per-locus statistics are collected; --silent - do not echo results into stdout; --minMapQ - count only bases coming from reads mapped with specified quality or better; --blacklistedlanes - do not count reads/bases coming from specific lanes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2761 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 22:05:19 +00:00
chartl
2c4f709f6f
Bunch of oneoff stuff that I don't want to lose. Also:
...
VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not)
VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly)
VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:35:10 +00:00
asivache
421282cfa3
Convenience method: getMappingFilteredPileup(int minMapQ)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2759 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:19:53 +00:00
ebanks
506d39f751
The UG calculations are now driven by an independent engine.
...
This completely separates the genotyper walker from other walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2758 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:57:31 +00:00
hanna
d8e75cf631
Fix for Kiran's memory issue running UG...turned out to be a particularly bad interaction between @By(Reference) traversals and TreeReduce.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2757 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:27:06 +00:00
depristo
d9671dffba
Documentation for VariantContext. Please read it and start using it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 17:49:51 +00:00
asivache
990af3f76e
Will now work with simplest tabular format - genotype string ("+ACTT") does not have to be followed by ':'
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2755 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 15:40:01 +00:00
ebanks
e0808e6c37
Moved old EM model to archive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2754 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 02:55:32 +00:00
rpoplin
64fc76e4bf
Added an option to AnalyzeCovariates to set the max value of the histograms to make them easier to directly compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2753 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 23:13:57 +00:00
ebanks
f6da57dc79
1. For Matt: JIRA GSA-270. Other walkers needing to call into the Unified Genotyper now use static methods (e.g. runGenotyper()) instead of calling initialize and map.
...
2. Set the default confidence cutoff to 50 (instead of 0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2752 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 21:14:57 +00:00
ebanks
ce9d3dcefb
Removing deprecated version of indel genotyper (putting it in archive in case we need to reproduce original 1KG indel calls for some reason).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2749 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 14:05:36 +00:00
depristo
3d45457595
VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 20:51:24 +00:00
chartl
236764b249
Major (and useful) changes to MultiSampleConcordance:
...
1) Now cares about Genotype filtering. If it is flagged as filtered, it can count as a FP/FN/TP; but goes into a "non-confident genotype" bin, rather than het/hom.
2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q"
3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident
ref calls for all individuals in the variant VCF; and the calculators updated accordingly.
*** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get around that;
however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible.
VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3
SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 01:18:31 +00:00
jmaguire
ea7e737441
Two new annotations:
...
1. LowMQ: fraction of reads at MQ=0 or MQ<=10.
2. Alignability: annotate SNPs with Heng's (or anyone else's) alignability mask.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2746 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 23:23:00 +00:00
chartl
97f60dbc4b
Moving stuff around. ( core;playground ) ----> ( oneoffs ). I've been a bad boy, sullying the core codebase.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2745 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 22:50:03 +00:00
rpoplin
16da5011c0
Added a new option for indicating the mean number of variants on the AnalyzeAnnotations plots. This way one can say, for example, filtering at this point will keep 75 percent of all the variants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2744 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:58:31 +00:00
hanna
668c7da33d
Bug fix in custom override of queryOverlapping.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2743 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:35:59 +00:00
rpoplin
c6cc844e55
Added -name argument to AnalyzeAnnotations that allows one to specify the name of the annotation to be used on the plots. Instead of seeing AB and DP, one can add -name AB,AlleleBalance -name DP,Depth
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2742 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:48:53 +00:00
depristo
62a80f2b6f
fixed out of date tests. Also, tests uncovered a subtle bug in new implementation that was also fixed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2741 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:03:48 +00:00
rpoplin
4f29a1d4f6
AnalyzeAnnotations now plots true positive rate instead of percentage of variants found in the truth set. Committing GCContentCovariate to help people experiment with correcting the pilot3/Kristian base calling error mode in slx.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2740 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:01:56 +00:00
aaron
ac2a207b0b
added a wrapper exception for anything that goes wrong in VCF parsing; this way the problematic file line is emitted, no matter what happens. Makes debugging a lot easier, especially in large files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2739 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 19:58:51 +00:00
hanna
e7f5c93fe5
Cleaning up the inheritance hierarchy from the previous commit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2738 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 19:13:36 +00:00
depristo
88495a39d4
better formatting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2737 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:38:21 +00:00
depristo
1993472b38
Just like VariantFiltration but lets you match info fields out of the VCF instead of annotating them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2736 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:38:03 +00:00
depristo
0a7426c29c
Computes SNP density over the genome. Doesn't work with intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2735 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:36:49 +00:00
depristo
9decd20f46
Fix to priors to allow lower het values for mouse guys; no integration test changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2734 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:36:12 +00:00
chartl
d57a86ad41
Not nearly as badass as it looks. The problem I mentioned yesterday with "bleeding in" of samples comes from VCFUtils and SampleUtils looking for all VCF-class RODs in the tracker, and stealing the name from them. I have introduced a new HapmapVCF - type rod for use
...
when you want to protect your VCF header from being infected by the samples in a bound hapmap VCF. Changes are as follows:
VCFRecord - minor change to adapt isNovel() to the case where the dbsnp ID field is empty, but the info field has DB=1
HapmapVCFRod - introduced for the reason at the top
RODRecordIterator - was: catch ( Exception e ) { throw new StingException("long ass message") }
is now: catch ( Exception e ) { throw new StingException("long ass message",e) }
to preserve the full stack trace.
RodVCF - Now with more brackets!
ReferenceOrderedData - registering HapmapVCF as a bindable string
VariantAnnotator - There's an extra space on a line. And some new brackets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2733 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:19:50 +00:00
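The RODRecordIterator change in the commit above (passing the caught exception as the second constructor argument) is ordinary Java exception chaining. A minimal sketch of the idea follows; `RodParseException` and `parseField` are hypothetical stand-ins, not the project's `StingException` or any real GATK code.

```java
// Hypothetical stand-in for a project exception type that accepts a cause.
class RodParseException extends RuntimeException {
    RodParseException(String message, Throwable cause) {
        super(message, cause); // keeps the original exception reachable via getCause()
    }
}

class CauseChainingDemo {
    // Parses one field; on failure, wraps the low-level exception without discarding it.
    static int parseField(String field) {
        try {
            return Integer.parseInt(field);
        } catch (Exception e) {
            // Passing e as the cause preserves the original stack trace.
            throw new RodParseException("could not parse field '" + field + "'", e);
        }
    }

    public static void main(String[] args) {
        try {
            parseField("abc");
        } catch (RodParseException e) {
            // The printed trace includes a "Caused by:" section for the NumberFormatException.
            e.printStackTrace();
        }
    }
}
```

Without the cause argument, only the wrapper's message and stack would be reported, and the location of the underlying failure would be lost.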
depristo
5aaf4e6434
VariantFiltration now accepts any number of --name --filter expressions, and annotates the VCF file with each name that matches. Very useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2732 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 12:13:08 +00:00
ebanks
01e73fc39e
Yuck - Picard's SAMRecord Comparator only deals with mapped reads. Adding an extended version that works for all reads.
...
After adding some more minor changes to the new realigner it now gets the same exact results as the original version - except that sometimes it doesn't clean when it shouldn't!
More testing coming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2731 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 07:49:47 +00:00
hanna
3d922a019f
Basic support for very simple index-driven locus traversals. Interface has been changed to
...
support batched intervals in a single shard, but intervals are not yet compressed into a single
shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2730 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 03:14:26 +00:00
asivache
4810e9c9cd
And now the DOCS!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2729 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 23:21:33 +00:00
asivache
40262e2070
Now calls single-sample indels too, with all the V2 level stats and bells. This officially obsoletes IndelGenotyperWalker (V1). In addition, the alignments spanning beyond the contig end are now completely ignored (with a user warning), this applies to both single-sample and paired (somatic) calls. You just wait, Eric, I'll get you the docs with the next commit!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2728 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 22:28:02 +00:00
rpoplin
79c4cc1db7
AnalyzeAnnotations now breaks out titv by calls in hapmap and also plots true positive rates. Any ROD passed in whose name starts with 'truth' is considered to be the truth set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2726 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:41:23 +00:00
chartl
7a10c40fb3
Much clearer (and, like, not totally incorrect) implementation of isNovel
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2725 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:16:21 +00:00
chartl
8de6a8d246
Lots of changes; all to do something relatively minor.
...
1) Changed VCF/RodVCF to allow for inquiries to whether or not the site is novel; isNovel() looks at the ID field, and those members of the info field that indicate membership in dbsnp, hapmap2, or hapmap3; and if none can be found, returns true.
2) Changed VariantAnnotator to annotate hapmap2 and hapmap3, if you bind rods to it with those names. Works in the same way as DBSNP does -- if you give it a rod named "hapmap2" it'll annotate membership in it. -- Passes integration tests
3) Changed UnifiedGenotyper to do the same thing (since it uses Annotations as a subroutine) -- Passes integration tests
4) Changed MultiSampleConcordanceWalker to take a flag --ignoreKnownSites (or -novels) to examine concordance only on sites that are not marked as in dbSNP or in Hapmap in the variant VCF
5) Changed VCFConcordanceCalculator (the object MultiSampleConcordanceWalker runs on) to output Concordant_Het_Calls and Concordant_Hom_Calls separately, rather than combined as Concordant_Calls
6) AlleleBalanceHistogramWalker -- I don't know what I did to this thing. I've been jury-rigging System.outs to do stuff it was never really intended to do; so there's probably some dumb System.out.print("HI I AM AT LOCUS:"+loc) stuck somewhere. It compiles at any rate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2724 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:06:56 +00:00
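The isNovel() rule described in item 1 of the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method signature, and the info keys `hapmap2` and `hapmap3` are hypothetical (the log only confirms a `DB` annotation), and the real VCFRecord/RodVCF code may differ.

```java
import java.util.Map;

class NoveltySketch {
    // Hypothetical info-field keys that would mark membership in a known-sites set.
    static final String[] KNOWN_SITE_KEYS = {"DB", "hapmap2", "hapmap3"};

    // A site counts as novel only if it has no ID and no membership flag set.
    static boolean isNovel(String id, Map<String, String> info) {
        boolean hasId = id != null && !id.isEmpty() && !id.equals(".");
        if (hasId) {
            return false; // a populated ID field means the site is already known
        }
        for (String key : KNOWN_SITE_KEYS) {
            String value = info.get(key);
            if (value != null && !value.equals("0")) {
                return false; // e.g. DB=1 with an empty ID field is still a known site
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNovel(".", Map.of()));          // no ID, no flags
        System.out.println(isNovel(".", Map.of("DB", "1"))); // empty ID but flagged in dbSNP
    }
}
```

Checking the info flags even when the ID field is empty covers the case of an empty dbSNP ID with DB=1 in the info field, which a later VCFRecord commit in this log addresses.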
ebanks
6f11fe442a
Sync with Andrey's changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2723 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 20:49:38 +00:00
asivache
db429e1096
Some alt consensuses may have a cigar string starting with an insertion. Not a bug, strictly speaking, since the cleaner had been detecting this and crashing deliberately. Now it knows how to deal with this special case though. Also, uppercase the ref before using it in SW aligner!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2722 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:53:02 +00:00
depristo
956b570c8e
V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:37:17 +00:00
depristo
9876645a5d
Now drives the walker by reference, not by reads, so we see even loci with no reads. This allows us to accurately calculate the true total callable area
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2720 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 11:12:46 +00:00
ebanks
1dd9996f3a
New realigner now completely uses bytes, plus misc fixes. Still not ready for use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2719 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:17:20 +00:00
depristo
f6bca7873c
V3 of VariantContext. Support for Genotypes and NO_CALL alleles. QUAL fields fully implemented. Can parse VCF records and dbSNP. More complete validation. Detailed testing routines for VariantContext and Allele.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2718 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:10:16 +00:00
chartl
23fc9737b4
Added the ability to filter out variant (not truth) calls based on read depth. Using -NLD 5 will not update concordant counts for calls with 0, 1, 2, 3, or 4 reads supporting them. Not to be used with VCF files that do not have DP in the format field.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2716 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 23:28:04 +00:00
chartl
1b9184a1c7
Added a multisample concordance walker which takes the place of the VCF python library I've been using. Takes a truth VCF and a variant VCF and outputs a TSV that looks like this:
...
Sample_ID  Concordant_Refs  Concordant_Vars  Homs_called_het  Het_called_homs  False_Positives  False_Negatives_Due_To_Ref_Call  False_Negatives_Due_To_No_Call
NA19381    491              294              2                0                0                0                                1
NA19451    489              298              1                0                0                0                                0
NA19463    486              289              2                3                1                4                                3
NA19376    488              296              1                0                2                0                                1
NA19317    489              284              5                3                3                3                                1
This walker will be merged with GenotypeConcordance once it's clear how to do so.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2715 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 22:59:17 +00:00
asivache
bd11060e72
Ups, I did it again. Fixing the bug introduced in a previous commit: use correct length of the indel event.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2713 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:51:54 +00:00
ebanks
fddca032bb
Initial commit of v2.0 of the cleaner. DO NOT USE. (this means you, Chris)
...
Cleaned up SW code and started moving over everything to use byte[] instead of String or char[].
Added a wrapper class for SAMFileWriter that allows for adding reads out of order.
Not even close to done, but I need to commit now to sync up with Andrey.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2712 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:36:42 +00:00
rpoplin
b8ae083d1b
AnalyzeAnnotations creates a plot of dbsnp rate as a function of the annotations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2711 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:08:33 +00:00
rpoplin
3999a8d2c8
IntelliJ no longer complains that my methods are too complex to analyze.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2708 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 20:12:13 +00:00
rpoplin
fc4285f9fd
AnalyzeAnnotations seems to be popular so I've rewritten the guts to be easier to extend and maintain.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2707 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 19:30:31 +00:00
hanna
fa3589e5c5
Update our error messages to point to getsatisfaction.com/gsa.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2706 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 19:16:28 +00:00
depristo
3399ad9691
Incremental update 2 -- refined allele and VariantContext classes; support for AttributedObject class; extensive testing for Allele class, and partial for VariantContext. Now possible to easily convert dbSNP to VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2705 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:19:37 +00:00
asivache
3edcefb7fb
add _gI and _gD to the indel probe names according to the spec (in the hope that wiki is not obsolete); added optional cmd line param -project_id to prefix all probe names with.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2704 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:06:49 +00:00
chartl
ed9b7edee3
Changed " to ' to stop the
...
[javadoc] /humgen/gsa-scr1/chartl/sting/java/src/org/broadinstitute/sting/oneoffprojects/variantcontext/VariantContext.java:99: warning: unmappable character for encoding ASCII
[javadoc] * if one of the alleles is deleted (?-?).
warnings on compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2703 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 15:23:55 +00:00
depristo
40c242d2b8
Fix for overflow issues
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2702 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 13:37:16 +00:00
aaron
8453676b71
added a method to AlignmentContext called hasExceededMaxPileup, which you can use to determine if the current site exceeded the maximum pileup size (reads were dropped). Added this as a check to unified genotyper according to Eric's instructions, and added the plumbing to the engine.
...
Also deleted the FixBamSortOrder package that isn't used anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2701 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 05:17:01 +00:00
rpoplin
4bcdab580c
--output_dir has been changed to --output_prefix to give the user more control over the names of the resulting mass of files in AnalyzeAnnotations. The fontsize of the axes is increased. Cumulative filtering plots are removed since the binned filtering plots are much more useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2700 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:50:54 +00:00
chartl
df112e64b8
Minor tweaks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2699 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:17:47 +00:00
ebanks
476d6f3076
RealignerTargetCreator is officially live
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2697 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 03:41:52 +00:00
asivache
1f64c5d41a
Do not slurp the whole set of snp mask sites into memory (gets pretty heavy on full dbSNP!); instantiate a private ROD iterator instead and drag it across the sites we are designing probes for.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2694 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 22:39:46 +00:00
ebanks
47440bc029
- Removed max_coverage argument from UG; Aaron will set it up so that we don't call when the GATK had to drop reads.
...
- Reimplemented optimization in UG to not call when there are no non-ref bases.
- Compute reference confidence accurately in UG for ref calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2693 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 21:56:33 +00:00
chartl
2c8d7b0c44
Forgot the onTraversalDone. That was dumb.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2692 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 21:02:46 +00:00
chartl
04e1832968
Added - AlleleBalanceHistogramWalker -- hopefully this'll be able to tell us very clearly whether bad genotype concordance is a result of systematic contamination (consistent wonky allele balances)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2691 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 20:57:12 +00:00
rpoplin
a1054efe8a
Default platform and default read group are no longer set to values by default. The recalibrator throws an exception if needed values are empty in the bam file and the args weren't set by the user. This is done to make it more obvious to the user when the bam file is malformed. Similarly, the recalibrator now refuses to recalibrate any solid reads in which it can't find the color space information with an exception message explaining this. The recalibrator no longer maintains its own version number and instead uses the new global GATK version number.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2690 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 18:47:40 +00:00
rpoplin
0345d9f6a5
Updating the recalibrator to use non-deprecated getPileup() method. Adding documentation to AnalyzeAnnotations so that the walker isn't marked as unclean at compile time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2688 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 14:15:09 +00:00
depristo
c231547204
Refactoring and migration of new allele/variantcontext/genotype code into oneoffprojects. NOT FOR USE. PlinkRod commented out due to dependence on this new, rapidly changing interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2687 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 13:53:29 +00:00
aaron
2e57bc7879
added a better message for the SO flag error in MergingSAMIterator2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2685 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 22:57:18 +00:00
rpoplin
24d4082925
AnalyzeAnnotations can now process only variants that are found in samples that match the -sampleName argument. X-axis of plots no longer use annoying scientific notation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2684 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 20:52:11 +00:00
hanna
022601b1a5
Warnings for walkers w/o Javadoc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2683 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 20:34:50 +00:00
rpoplin
894a2b511b
Fixing no platform warning message.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2682 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:46:50 +00:00
rpoplin
2b51cf18f0
AnalyzeAnnotations now outputs plots with log x-axis in addition to standard x-axis so things like DP and MQ0 are easier to see. AnalyzeAnnotations now skips over all annotations that aren't floating point values. Recalibrator now warns users when PL tags are missing and it is therefore reverting to illumina.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2681 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:39:18 +00:00
asivache
6cf413e630
Bug: ExpandedSAMRecord did not treat hard-clipped bases ('H') correctly. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2680 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:23:44 +00:00
ebanks
dc170caafc
Now, if a dbsnp rod is passed to either the UnifiedGenotyper or VariantAnnotator, a DB=0/1 annotation is added (in addition to filling in the ID field); this is in line with 1KG project calls. If no dbsnp rod is used, the annotation is not added (as opposed to setting every entry to DB=0).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2678 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 17:27:12 +00:00
rpoplin
5d2f8aaa54
Updating recalibrator version number after the several emergency changes last week.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2677 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 14:35:47 +00:00
jmaguire
588417e17d
Don't reference that optimization library I'm not using anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2676 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:30:50 +00:00
jmaguire
d3e3c1c2e0
don't require that optimization lib that I'm not using yet... (doh)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2675 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:28:21 +00:00
jmaguire
1d6d2b26f7
tools for optimizing calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2674 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:16:55 +00:00
jmaguire
877957761f
lots of new stuff, some generally useful, some one-off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2673 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 19:50:48 +00:00
ebanks
78890c0bee
First version of walker that combines the functionality of IndelIntervalWalker, MismatchIntervalWalker, SNPClusterWalker, and IntervalMergerWalker - plus it allows the user to input rods containing known indels (e.g. dbSNP or 1KG calls) for automatic cleaning. Basically, all pre-processing steps for cleaning are now done in a single pass.
...
More testing needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2672 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 05:32:38 +00:00
chartl
d6b9b788a8
Renamed -- PlinkRodWithGenomeLoc --> PlinkRod
...
Since binary files do not need encoded locus information in the SNP names there's no need to suggest that it is so in the name of the rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2671 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 18:19:28 +00:00
chartl
ac983e7a0b
Ran the rod on a binary plink file with indels and it just worked. Love it when that happens! Unit test to ensure this behaviour is maintained.
...
****** PLINK ROD IS NOW READY TO GO ********
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2670 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 18:13:05 +00:00
chartl
ae22d35212
PlinkRod now correctly parses binary files without indels; unit test added for this behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2669 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 17:34:06 +00:00
chartl
94dc09c865
PlinkRod now successfully instantiates on the binary ped file trio (.bim, .bed, .fam) for non-indel files.
...
Upcoming: Test that the instantiation is correct, do it for indel-containing files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2668 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 16:13:24 +00:00
chartl
01db93299c
PlinkRodWithGenomeLoc now properly handles indels.
...
There is now a DELETION_REFERENCE allele type to allow for the storage of multi-base references rather than point-mutation references.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2667 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 07:34:52 +00:00
chartl
42fb85e7f3
PlinkRodWithGenomeLoc now properly parses text plink files. Unit test added to test this functionality. Indels and binary files to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2666 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 06:19:26 +00:00
depristo
c871a0f221
UG map() now returns a VariantCallContext object. Also has a field for confidentlyCalledBases. UG reduce() emits statistics on the confident called % of bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2664 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:43 +00:00
chartl
fbf82526cb
Minor renaming changes.
...
PlinkRodWithGenomeLoc now supports .bed file parsing (and doesn't require |c#_p# conventions for SNPs -- still requires _g[I/D] for indels)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2663 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:32 +00:00
rpoplin
fd223e955c
Reverting the previous solid change. We now refuse to recalibrate if the solid read doesn't contain proper color space information. The exception message has been updated to say this. Also, Tile has been downgraded to an ExperimentalCovariate due to performance issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2662 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:55:28 +00:00
rpoplin
7732f98e56
Fix for Solid reads that have '.' in their color space field. The recalibrator will just set them to be illumina reads and won't apply color space correction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2661 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:09:16 +00:00
aaron
2ea768d902
ant clean is your friend....fixed test code dependent on an interface change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2660 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:07:46 +00:00
rpoplin
a11503819a
AnalyzeAnnotations now breaks out its TiTv plots into novel SNPs, dbSNP sites, and combined.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2659 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 19:00:23 +00:00
aaron
cc3b818268
cleanup of the pile-up limit exceeded warning, and a little code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2657 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 22:17:24 +00:00
ebanks
c1e09efb23
- Fixed output for beagle header
...
- Better description for QualByDepth annotation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2655 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:25:56 +00:00
rpoplin
d9df72e1b5
AnalyzeAnnotations now bins variants per each annotation and outputs plots of TiTv ratio as a function of the annotation's value.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2654 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:15:11 +00:00
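As a rough illustration of the binned TiTv plots described in the commit above: transitions are the substitutions A<->G and C<->T, all other substitutions are transversions, and the ratio is computed per bin. The sketch below assumes the SNPs are already sorted by the annotation's value and uses fixed-size bins; the class name, method names, and binning scheme are assumptions, not AnalyzeAnnotations' actual implementation.

```java
class TiTvBinSketch {
    // Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
    static boolean isTransition(char ref, char alt) {
        String pair = "" + Character.toUpperCase(ref) + Character.toUpperCase(alt);
        return pair.equals("AG") || pair.equals("GA") || pair.equals("CT") || pair.equals("TC");
    }

    // snps[i] = {ref, alt}, assumed sorted by annotation value; returns one Ti/Tv ratio per bin.
    static double[] titvPerBin(char[][] snps, int binSize) {
        int nBins = (snps.length + binSize - 1) / binSize; // last bin may be partial
        double[] ratios = new double[nBins];
        for (int b = 0; b < nBins; b++) {
            int ti = 0;
            int tv = 0;
            int end = Math.min(snps.length, (b + 1) * binSize);
            for (int i = b * binSize; i < end; i++) {
                if (isTransition(snps[i][0], snps[i][1])) {
                    ti++;
                } else {
                    tv++;
                }
            }
            // A bin with no transversions has an undefined ratio.
            ratios[b] = (tv == 0) ? Double.NaN : (double) ti / tv;
        }
        return ratios;
    }

    public static void main(String[] args) {
        char[][] snps = {{'A', 'G'}, {'C', 'T'}, {'A', 'C'}, {'G', 'A'}, {'A', 'T'}, {'C', 'G'}};
        for (double ratio : titvPerBin(snps, 3)) {
            System.out.println(ratio);
        }
    }
}
```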
chartl
f51cffe220
Alteration of PlinkToVCF to be much more flexible about parsing .ped file headers, which can have one of a number of different standard fields, and be in different orders.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2650 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 18:02:28 +00:00
chartl
5b2a1e483e
Renamed SequenomToVCF as PlinkToVCF. Wiki will be changed accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2649 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 17:35:20 +00:00
asivache
74779a9a78
First version of the tool that tries to determine indel error rate (basically, counts indels that look like sequencing/alignment errors - such as a single observation at deeply covered locus, and reports the rate of their occurrence)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 15:28:20 +00:00
hanna
d25a2fe120
Better handling of enums by the command-line argument system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 21:36:46 +00:00
ebanks
9c7b281b4f
Set default value for max_coverage to be 100K (since 10K is too small).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
hanna
1e9fe2a334
Clean up error output when enums have missing arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:48:26 +00:00
aaron
8d1d37302c
a quick change to GLF to keep as much precision in our likelihoods for as long as possible, before we put it into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
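One way to read the GLF change above: do the arithmetic on likelihoods in floating point and round only at the final conversion to a byte-range value. The sketch below is an assumption-laden illustration, not the committed code. It assumes log10 likelihoods as input, a scaling of -10 times the difference from the best likelihood, and a cap at 255 (the GLF maximum mentioned elsewhere in this log); the class and method names are hypothetical.

```java
class GlfByteSketch {
    static final int GLF_MAX = 255; // largest value that fits in an unsigned byte

    // Converts log10 likelihoods to byte-range values, rounding once at the very end.
    static int[] toByteSpace(double[] log10Likelihoods) {
        double best = Double.NEGATIVE_INFINITY;
        for (double lk : log10Likelihoods) {
            best = Math.max(best, lk);
        }
        int[] out = new int[log10Likelihoods.length];
        for (int i = 0; i < out.length; i++) {
            // Subtract in double precision first; rounding each term before
            // subtracting could shift the result by one unit.
            double scaled = -10.0 * (log10Likelihoods[i] - best);
            out[i] = (int) Math.min(GLF_MAX, Math.round(scaled));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] bytes = toByteSpace(new double[] {-1.0, -3.04, -40.0});
        System.out.println(java.util.Arrays.toString(bytes));
    }
}
```

For example, with log10 likelihoods -1.26 and -3.34, rounding each scaled term first gives 33 - 13 = 20, while rounding the difference 20.8 gives 21.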
hanna
908d399670
Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:18:19 +00:00
chartl
ab289872e4
Changes:
...
- Annotations return null when given pileups with no second-base information
- SequenomRodWithGenomeLoc -- better handling of indels
Eric; I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation):
Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG; one's got to be ref, one's got to be SNP, but until I have
an actual reference base in hand, I don't know which is which. That's what this entry is for.
Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to
represent indels, so we should talk about this. Plus now that there's a direct link between my ROD and the genotype; when we do decide
how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 16:45:17 +00:00
aaron
a1b4cc4baf
changes to intelligently log overflowing locus pile-ups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks
4ac9eb7cb2
- Smarter strand bias calculation
...
- Better debug/verbose printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
depristo
ff66023d83
Trivial change to support filter field in VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2636 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:56:22 +00:00
asivache
4625261d79
Bug fix: alignments ending with 'I' were not counted into the overall coverage, which resulted in inaccurate stats and, on rare occasions, outright messed up ones.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2635 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:12:16 +00:00
hanna
8dafd26100
Print out the current version number in the application header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:58:36 +00:00
depristo
9e0ae993c7
-B 1kg_ceu,VCF,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP%)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
rpoplin
c98df0a862
Updated solid_recal_modes to work with bfast aligned data. Added an integration test that uses the BFAST file provided by TGen.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2630 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:18:02 +00:00
chartl
53352e1bb4
First pass at a sequenom ROD. Nothing uses it; currently undergoing testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2629 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 17:09:36 +00:00
hanna
1488578617
Working with Aaron to get svnversion running within the build system. This change will break the build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2628 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 16:55:42 +00:00
rpoplin
bca436578f
Added the -maxQ argument to the list of arguments in the PG tag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2627 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:55:23 +00:00
rpoplin
d61cafd19f
Make the formatting of the list of args in the PG tag consistent.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2626 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:31:37 +00:00
rpoplin
a12465b6d5
The recalFile argument is no longer added into the PG tag of a bam produced by TableRecalibration. Based on a request from the Sanger.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2625 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:25:57 +00:00
rpoplin
ba19afd529
Draft version of AnalyzeAnnotations which creates plots of cumulative TiTv ratio versus filter value per each annotation in the input VCF rod. Minor cleanup of recalibration walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2623 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 20:47:10 +00:00
kiran
ff6877a15e
Added a forgotten column label
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2622 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 01:00:52 +00:00
kiran
dd6d5aadf9
Computes empirical confusion matrices, optionally with up to five bases of preceding context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2621 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 00:55:12 +00:00
ebanks
12453fa163
Misc cleanup of UG args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2620 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 04:38:52 +00:00
ebanks
b8cdf64c20
Better descriptions for max reads/downsampling args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2618 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 02:30:27 +00:00
depristo
d8e74c5795
Update to MD5s for old tests and added extensive VCF testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2615 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:58 +00:00
depristo
64225b28fd
Convenience methods for getting the VCFReader and VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2614 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:31 +00:00
depristo
d0af7f6c7b
Now analyzes filtered SNP like all, novel subsets; support for selecting a single sample to analyze from a multi-sample VCF, support for trivial selection of records with INFO field key/value pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2613 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:04 +00:00
depristo
8ae8e120f8
New annotateUnion operation -- provides clearer annotations on where a call came from when unioning two VCF call sets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2612 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:20:37 +00:00
depristo
41392f8ff5
functions for setting genotype records and alternate bases; function for getting all rods implementing VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2611 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:19:43 +00:00
hanna
ac4756db20
Add the svn version on the fly to the version number properties.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2607 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 00:28:01 +00:00
hanna
420cef4094
Added version numbers to the help doclet extractor. Since the help system is behaving
...
more like a resource bundle at this point, changed it over to use the Java ResourceBundle
support classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2606 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 23:31:29 +00:00
rpoplin
4de7d6a59b
Initial checkin of skeleton code for AnalyzeAnnotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2605 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:52:34 +00:00
hanna
930082314a
Put a major.minor version into the GATK Javadoc for reading. Also,
...
update some straggler packages to the new package-info.java format introduced in 1.5.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2604 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:48:30 +00:00
mmelgar
3063224446
SecondaryBaseTransitionTableWalker now breaks by genotype and read group, is javadoc annotated, and is compatible with ReadBackedPileup's methods.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2603 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:43:39 +00:00
asivache
7a991421f7
-erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2601 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:11:14 +00:00
asivache
c8c5c176cd
-erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2600 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:07:49 +00:00
asivache
a12933a26d
Bug fixed: now the length of an insertion is determined correctly. Thought I committed this...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2599 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:58:48 +00:00
asivache
404b95183f
This is a LocusWalker, not a RodWalker (thanks Mark!!). RodWalkers currently are not capable of attaching alignment contexts (reads) to the ROD-annotated loci they traverse over...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2596 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:33:41 +00:00
rpoplin
7078219b89
Updating outdated comments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2595 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 19:17:52 +00:00
rpoplin
ba2acda406
Clarifying the comment regarding differentiating between first and second of pair in CycleCovariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2594 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:36:14 +00:00
ebanks
b911b7df82
Fixing the AC annotation to be in line with the VCF spec
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:28:52 +00:00
rpoplin
f2e539c52f
As per discussions with Tim we are reverting the previous change regarding PairedReadOrderCovariate. The CycleCovariate now differentiates between first and second of pair by multiplying the cycle by -1. PairedReadOrderCovariate has been removed completely.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2592 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:18:59 +00:00
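A minimal sketch of the sign convention this commit message describes, assuming the cycle is a positive integer and that the read's position in the pair is already known. The class and method names are hypothetical and are not taken from the CycleCovariate source.

```java
// Hypothetical illustration of the convention in the commit message above;
// none of these names come from the actual GATK code.
final class SignedCycleSketch {

    /**
     * Returns the cycle unchanged for a first-of-pair (or unpaired) read,
     * and multiplied by -1 for a second-of-pair read.
     */
    static int signedCycle(int cycle, boolean secondOfPair) {
        return secondOfPair ? -cycle : cycle;
    }
}
```

With this encoding a single covariate value carries both the cycle and the read's place in the pair, which is why a separate PairedReadOrderCovariate is no longer needed.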
asivache
eae1b73945
Fixed a bug in left-adjusting the indels introduced in previous commit :-/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2591 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 17:41:23 +00:00
rpoplin
df998041a8
Minor change to solid warning message. Added note for a future solid recalibration integration test when we get the required data file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2590 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 16:31:25 +00:00
rpoplin
70df30fc1b
Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 15:36:59 +00:00
ebanks
2a116bb5d6
Made the VCF validator a simple rod walker instead of having it be in a separate package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 06:39:06 +00:00
hanna
b19bb19f3d
First successful test of new sharding system prototype. Can traverse over reads from a single
...
BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron
db9570ae29
Looks bigger than it is:
...
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for default enums; we needed this so we could preserve the current unsafe flag and, at the same time, allow finer-grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
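The default-enum behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the GATK argument parser: the class name, the `parse` helper, and the `ALL` constant are hypothetical, and only `ALLOW_UNINDEXED_BAM` and `NO_READ_ORDER_VERIFICATION` appear in the commit message.

```java
import java.util.List;

// Hypothetical sketch of a flag whose enum value falls back to a default
// when no value follows it; not the actual GATK parsing code.
final class UnsafeFlagSketch {

    // ALL is an assumed name for the default; the other two come from the commit message.
    enum Unsafe { ALL, ALLOW_UNINDEXED_BAM, NO_READ_ORDER_VERIFICATION }

    /**
     * Returns the value selected by "-U", or null when "-U" is absent.
     * A bare "-U" (last token, or followed by another flag) maps to ALL.
     */
    static Unsafe parse(List<String> args) {
        int i = args.indexOf("-U");
        if (i < 0) {
            return null;
        }
        boolean hasValue = i + 1 < args.size() && !args.get(i + 1).startsWith("-");
        // valueOf throws IllegalArgumentException for an unknown constant name.
        return hasValue ? Unsafe.valueOf(args.get(i + 1)) : Unsafe.ALL;
    }
}
```

Treating a bare `-U` as `ALL` keeps existing command lines working while letting newer ones opt in to a single unsafe operation.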
asivache
cff8b705c0
Oh, and the test would not work anymore...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2585 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:47:09 +00:00
kiran
04fdbbfa65
This is the beginning of a new version of VariantEval that can cut VCF files up in a variety of ways with JEXL expressions, select one sample out of a multi-sample VCF, and can load analysis modules dynamically.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2584 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:58 +00:00
asivache
df63f51253
No changes, just sync-ing; only some commented out debugging prints are added...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2583 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:15 +00:00
asivache
d85461c463
MergingIterator completely re-done. Now it is not a generic class (sorry guys), but rather it is tailored for merging ROD tracks. This implementation peeks the locations of next ROD annotations in each track, but does not actually read these RODs from underlying streams until the location is reached and it is time to actually return the object. Now underlying ROD track iterators (registered in the resource pool!) are not advanced prematurely past the current position and all the way to the next ROD record wherever it is, so that the sharding system can reuse them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2582 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:43:36 +00:00
asivache
c0891d512f
added: peekNextLocation(); it's quite hard (and probably unnecessary, ever) to make a seekable iterator a peekable one, but it is quite easy and useful to be able to peek at just the next location the iterator will jump to after the next call to next()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2581 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:38:19 +00:00
rpoplin
9bf0d7250a
Fixing the testOtherOutput UG integration test so it will run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2580 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 13:40:14 +00:00
ebanks
a082b948a3
Support throughout for S and N cigar elements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2579 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 03:45:42 +00:00
chartl
424d1b57f7
Sequenom to VCF now allows user to specify filters for QC, and they will appear in the filter field of the output VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin
f96b2b211e
My last checkin updating R code broke an unrelated UnifiedGenotyper integration test. Eric says that I should take out the verbose test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2576 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 22:28:10 +00:00
rpoplin
49c44e7b36
PairedReadOrderCovariate is now a standard covariate and because of this CycleCovariate no longer multiplies by negative one for second of pair reads. Added PairedReadOrderCovariate to some of the integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
hanna
05575e2e56
Better bounding for the locus window. Don't make the locus window calculation blow up if the GenomeLoc ends
...
up being outside the reference. Force the blowup elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2573 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 17:03:54 +00:00
ebanks
8ca5bba738
We emit genotype data in the VCF record if the format string instructs us to (regardless of whether or not genotypes are provided - this was the wrong test).
...
SequenomToVCF now correctly has no-calls when probes fail.
Re-enabled SequenomToVCF integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl
6d1107a4ed
Update to SequenomToVCF
...
Output changing slightly so integration test disabled temporarily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks
f99586f91b
Added integration test for beagle and verbose output in UG.
...
Minor cleanup of VCFRecord code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
hanna
02e23e2d9c
Threading support for beagle output files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2569 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 02:42:16 +00:00
aaron
0513690416
two fixes in the new cached DbSNP code:
...
-isBiallelic would incorrectly say triallelic sites are biallelic.
-getAlternateAlleleList was broken: since the new cached list is immutable, we couldn’t remove list items.
Also added a dbSNP validating walker to the one-offs, for testing the new b37 130 dbSNP rod.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 00:27:34 +00:00
asivache
a138bad95a
A rare but not-so-subtle bug fixed: a funky alignment (a kind that should not have been generated in the first place) could make the indel left-adjusting method overshoot the read start and build a cigar like -3M6I...
...
also, a few minor fix-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2567 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 21:29:50 +00:00
rpoplin
b51f4aae11
Updating the recalibrator to make use of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2566 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 20:58:27 +00:00
rpoplin
c8ad025ad0
cleaning up unused import statements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2565 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:52:37 +00:00
rpoplin
189829841b
The recalibrator now uses all input RODs when looking for known polymorphic sites, not just the one named dbsnp. Added an integration test which uses both dbsnp and an input vcf file and skips over the union of the two.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00